[ntp:questions] Re: Large offset problem with xntpd 3.4x, DEC UNIX, and TrueTime NTS-100s

Tom Smith smith at cag.lkg.hp.com
Sat Sep 10 14:44:37 UTC 2005


Excellent description!

There is no fundamental problem I can recall with xntpd 3.4x on
Digital UNIX 4.0D, but there is one thing you might check, especially
if xntpd is running with the "-x" option. If the clients are configured
only to slew, large time jumps on the servers such as you have had may
well have caused the clients to acquire and institutionalize an absurdly
large drift rate in /etc/ntp.driftfile that they are now laboring to correct.
Even if the servers are now perfectly well behaved, this residual
problem on the clients could cause them to run persistently
fast or slow, drifting out of synch in one direction, and unable
to be stepped back to the correct time.

Look at a few of those drift files and see if they have unusually large
numbers (in the hundreds). As I recall, in that generation of the
code, you could get drift rates of 900 or so. Anything above 50 on
an Alpha is seriously wrong, and it's unusual to see anything above 20.
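A quick way to apply those rules of thumb to a drift file's contents is sketched here in Python. It assumes the file holds a single number in its first field, and that the number is a frequency offset in parts per million (PPM), as in later ntpd releases. Neither detail is confirmed here for xntpd 3.4x, and the helper names are hypothetical. Under the PPM assumption, a value of 900 would mean the clock is being steered off by roughly 78 s per day (900 × 10^-6 × 86400 = 77.76 s), compared with about 1.7 s per day at 20. The sign is ignored when applying the thresholds, which is also an assumption.

```python
SECONDS_PER_DAY = 86_400

# Rules of thumb from the post for an Alpha (units assumed to be PPM).
SERIOUS_PPM = 50.0
UNUSUAL_PPM = 20.0


def parse_drift(text: str) -> float:
    """Return the first field of a drift file as a float (assumed format)."""
    return float(text.split()[0])


def classify_drift(ppm: float) -> str:
    """Label a drift value with the 20/50 thresholds; the sign is ignored."""
    magnitude = abs(ppm)
    if magnitude > SERIOUS_PPM:
        return "seriously wrong"
    if magnitude > UNUSUAL_PPM:
        return "unusual"
    return "normal"


def seconds_per_day(ppm: float) -> float:
    """Daily clock error implied by a frequency error of `ppm` parts per million."""
    return ppm * 1e-6 * SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical drift-file contents, not values reported in the thread.
    for sample in ["12.345\n", "-35.1\n", "912.0\n"]:
        ppm = parse_drift(sample)
        print(f"{ppm:8.3f} PPM  {classify_drift(ppm):15}  {seconds_per_day(ppm):7.2f} s/day")
```

On a client, the real file could be checked with something like `with open("/etc/ntp.driftfile") as f: print(classify_drift(parse_drift(f.read())))`.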

If that's what you have, you will have to do the following on
each of the clients to get them back to normal within your lifetime. :-)

1) /sbin/init.d/xntpd stop
2) rm /etc/ntp.driftfile
3) /sbin/init.d/settime start (runs "ntpdate -b [server] [server] ...")
4) /sbin/init.d/xntpd start
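With several hundred clients, the same four steps can be wrapped so they are listed (dry run) or executed in order, stopping at the first failure. This is a minimal Python sketch that assumes Python is available on the clients, which may well not be true on a frozen Digital UNIX 4.0D system; the four shell commands alone do the same job. The name `reset_client`, the dry-run default, and the use of `rm -f` (so a missing drift file doesn't abort the run) are choices made here, not part of any existing tool. The order matters: the daemon is stopped first, presumably so it can't rewrite the drift file, and ntpdate generally can't use the NTP port while xntpd holds it.

```python
import subprocess
from typing import Callable, List

DRIFTFILE = "/etc/ntp.driftfile"

# The four steps from the post, in order.
RESET_STEPS: List[List[str]] = [
    ["/sbin/init.d/xntpd", "stop"],
    ["rm", "-f", DRIFTFILE],
    ["/sbin/init.d/settime", "start"],  # steps the clock via "ntpdate -b"
    ["/sbin/init.d/xntpd", "start"],
]


def reset_client(dry_run: bool = True,
                 run: Callable[..., object] = subprocess.run) -> List[str]:
    """Print (dry run) or execute the reset steps in order.

    With the default subprocess.run and check=True, a non-zero exit status
    raises CalledProcessError, so later steps are skipped.
    """
    executed: List[str] = []
    for cmd in RESET_STEPS:
        line = " ".join(cmd)
        print(("would run: " if dry_run else "running: ") + line)
        if not dry_run:
            run(cmd, check=True)
        executed.append(line)
    return executed


if __name__ == "__main__":
    reset_client(dry_run=True)
```

Passing `dry_run=False` on one test client first, and checking the drift file that xntpd writes afterwards, seems safer than pushing it to every machine at once.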

-Tom

Spence Green wrote:
> I was recently tasked with fixing a time synchronization setup on a closed network.  We have two old TrueTime GPS XL receivers each connected over IRIG-B to two TrueTime NTS-100 (560-5151) NTP time servers (four total NTS-100s).  I don't know which version of the ntp daemon the NTS-100s run, but they were released in 1996.  They cannot peer with each other and are locked in mode 4 (server) operation.  Each NTS-100 has an ethernet connection.  From the client side, we have several hundred DEC Alpha workstations, each running Digital UNIX 4.0d and xntpd 3.4x.  I've tested a number of synchronization configurations, but for this example, assume the following:
> 
> #ntp.conf
> 
> driftfile /etc/ntp.driftfile
> 
> server time001 version 3
> server time002 version 3
> server time003 version 3
> server time004 version 3
> 
> #Some logging directives
> 
> 
> Each client thus runs at stratum 2.  We have a problem with large offsets, though.  Several weeks ago, one of the NTS-100s went down due to a power failure.  When it came back up, it had incremented its year (IRIG-B does not contain year information, so on these NTS-100s, the admin must set the year via RS-232).  Within a few days, all of the daemons had detected the offset of more than 1000 s and committed suicide (we aren't using "-g" or any other command line option).  The admins were not aware of this behavior and thus did not detect the failures.  The machines started drifting, etc.
> 
> I've read the NTP RFCs and most of Dr. Mills' website.  From my understanding, the intersection algorithm, if given a sufficient number of low stratum samples, should eliminate falsetickers.  In this scenario, three of the time servers remained within a millisecond of each other, while the fourth was a year and a day off.  I've run a number of tests: using 8+ stratum 1 hosts per client, using peering between clients, adding a layer of stratum 2 servers and forcing clients to stratum 3, implementing a local clock driver.  Without exception, I observe the following behavior in ntpq after restarting xntpd (delete drift files and logs, run ntpdate to step time, start xntpd with no flags):
> 
> 1) Clients select one of the four sources and synchronize their clocks.  All client daemons operating correctly.
> 2) I manually increment the year on one of the servers.
> 3) All client daemons detect the large offset and switch to another server.  ntpq shows an "x" by the faulty server, indicating elimination by the intersection algorithm.
> 4) Eventually, I see an "x" by every server/peer.  xntpd writes "Synchronisation lost" in the syslog and then aborts.
> 
> I've searched the list archives and the internet about large offsets; most sources say that xntpd detects falsetickers with an appropriate number of sources.  Is there a bug in this version of xntpd?  I cannot update the daemons due to a configuration freeze on these systems.  I've tried many different synchronization subtrees to no avail.  Our program cannot purchase new time server equipment that consistently stores year information.  Does this intersection algorithm fail for large offsets?  I'm at a total loss on this one.  Does anyone have experience with this issue?
> 
> 
> Thanks in advance,
> Spence
> 
> 
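The intersection step the question asks about can be modeled in a few lines of Python. This is a simplified sketch loosely following the endpoint-counting approach of later ntpd releases, not the code in xntpd 3.4x or in the NTS-100 firmware, and the offsets and distances are made-up illustrative numbers. Each server contributes an interval of offset ± distance; the search looks for the smallest range shared by at least n - f intervals while tolerating f falsetickers, with f < n/2. The bad server's offset is taken as a year and a day, 366 × 86400 = 31,622,400 s, which is more than 31,000 times the 1000 s limit mentioned in the question.

```python
from typing import Dict, List, Optional, Tuple

SECONDS_PER_DAY = 86_400

# name -> (offset from the client clock, synchronization distance), in seconds
Peers = Dict[str, Tuple[float, float]]


def intersect(peers: Peers) -> Optional[Tuple[float, float]]:
    """Return the (low, high) range agreed on by a majority, or None."""
    n = len(peers)
    # Each peer adds a lower edge (-1), its midpoint (0) and an upper edge (+1).
    edges: List[Tuple[float, int]] = []
    for offset, dist in peers.values():
        edges += [(offset - dist, -1), (offset, 0), (offset + dist, +1)]
    edges.sort()

    for allow in range((n + 1) // 2):        # number of falsetickers tolerated
        found = 0                            # midpoints outside the range
        chime = 0
        low = high = 0.0
        for value, kind in edges:            # scan upward for the low end
            low = value
            chime -= kind
            if chime >= n - allow:
                break
            if kind == 0:
                found += 1
        chime = 0
        for value, kind in reversed(edges):  # scan downward for the high end
            high = value
            chime += kind
            if chime >= n - allow:
                break
            if kind == 0:
                found += 1
        if found <= allow and high > low:
            return low, high
    return None


def split_peers(peers: Peers) -> Tuple[List[str], List[str]]:
    """Return (truechimers, falsetickers); a peer survives if its
    interval overlaps the intersection range."""
    interval = intersect(peers)
    if interval is None:
        return [], sorted(peers)
    low, high = interval
    good: List[str] = []
    bad: List[str] = []
    for name, (offset, dist) in sorted(peers.items()):
        outside = offset + dist < low or offset - dist > high
        (bad if outside else good).append(name)
    return good, bad


if __name__ == "__main__":
    peers = {
        "time001": (0.0002, 0.010),
        "time002": (-0.0003, 0.010),
        "time003": (0.0001, 0.010),
        "time004": (366 * SECONDS_PER_DAY, 0.010),  # a year and a day ahead
    }
    print(split_peers(peers))
```

In this model one falseticker among four servers is excluded and the three agreeing servers survive, which is what the question expects; it does not reproduce or explain the "x" on every server seen with xntpd 3.4x.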



