[ntp:questions] Frequent time reset messages

Bob Robison bob.robison at swri.org
Thu Dec 1 22:12:13 UTC 2005


I'm running around 50 dual-Opteron machines that boot diskless with a
Linux 2.6.12 SMP kernel and try to sync with a Symmetricom XLI-GPS
stratum-1 NTP server on an isolated network.

The problem is that when I run "ntpq -c peers" on a number of these
machines to check NTP synchronization, I see offsets ranging over almost
1000 ms. If I grep through /var/log/messages, I often see messages like
these roughly every 20 minutes:

Dec  1 20:30:28 (none) ntpd[27203]: time reset 0.613771 s
Dec  1 20:30:28 (none) ntpd[27203]: synchronisation lost
Dec  1 20:50:45 (none) ntpd[27203]: time reset 0.931388 s
Dec  1 20:50:45 (none) ntpd[27203]: synchronisation lost
Dec  1 21:19:23 (none) ntpd[27203]: time reset 0.451491 s
Dec  1 21:19:23 (none) ntpd[27203]: synchronisation lost
Dec  1 21:36:24 (none) ntpd[27203]: time reset 0.391510 s
Dec  1 21:36:24 (none) ntpd[27203]: synchronisation lost
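
A rough way to put numbers on these steps is to divide each step by the
time since the previous one. The Python sketch below pulls the "time
reset" lines out of the syslog excerpt and does that arithmetic.
`parse_resets` and `implied_rates_ppm` are hypothetical helper names,
not part of the ntp distribution. The sketch assumes each step corrects
error that built up since the previous step, which is only an
approximation because ntpd also slews the clock between steps.

```python
import re
from datetime import datetime

# The syslog excerpt above, copied verbatim.
LOG = """\
Dec  1 20:30:28 (none) ntpd[27203]: time reset 0.613771 s
Dec  1 20:30:28 (none) ntpd[27203]: synchronisation lost
Dec  1 20:50:45 (none) ntpd[27203]: time reset 0.931388 s
Dec  1 20:50:45 (none) ntpd[27203]: synchronisation lost
Dec  1 21:19:23 (none) ntpd[27203]: time reset 0.451491 s
Dec  1 21:19:23 (none) ntpd[27203]: synchronisation lost
Dec  1 21:36:24 (none) ntpd[27203]: time reset 0.391510 s
Dec  1 21:36:24 (none) ntpd[27203]: synchronisation lost
"""

# Captures the syslog timestamp and the step size in seconds.
RESET = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) .*time reset (-?[\d.]+) s$")


def parse_resets(text, year=2005):
    """Return (timestamp, step_seconds) for every 'time reset' line."""
    resets = []
    for line in text.splitlines():
        m = RESET.match(line)
        if m:
            # syslog omits the year, so the caller supplies it.
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            resets.append((ts, float(m.group(2))))
    return resets


def implied_rates_ppm(resets):
    """(seconds since previous step, step / interval in ppm) per step.

    Rough indicator only: assumes each step corrects error accumulated
    since the previous step.
    """
    rates = []
    for (t0, _), (t1, step) in zip(resets, resets[1:]):
        interval = (t1 - t0).total_seconds()
        rates.append((interval, step / interval * 1e6))
    return rates


for interval, ppm in implied_rates_ppm(parse_resets(LOG)):
    print(f"{interval:6.0f} s since last step -> ~{ppm:5.0f} ppm")
```

For the four resets above this gives intervals of 1217, 1718 and 1021 s
and implied rates of roughly 765, 263 and 383 ppm. For comparison,
ntpd's documented frequency tolerance is 500 ppm, so if the
approximation holds, at least the first interval needs more correction
than slewing alone can supply.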

These seem like large (and frequent) steps. I have a fairly simple
ntp.conf file:
---------------------------------
restrict default ignore
restrict 10.2.40.1 mask 255.255.255.255 nomodify notrap noquery
restrict 127.0.0.1

server  10.2.40.1   iburst
server  127.127.1.0     iburst # local clock
fudge   127.127.1.0 stratum 5 # default was 10

driftfile /var/lib/ntp/drift
----------------------------------
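
The two server lines are different kinds of source. ntpd treats
addresses of the form 127.127.t.u as reference clocks, where t selects
the driver type (type 1 is the undisciplined local clock) and u the unit
number. The Python sketch below classifies the server lines of the file
above that way and collects the fudge settings. `parse_conf`,
`describe_servers` and `fudge_settings` are hypothetical helpers written
for illustration; they handle only this simple file, not the full
ntp.conf grammar.

```python
# The ntp.conf shown above, copied verbatim.
CONF = """\
restrict default ignore
restrict 10.2.40.1 mask 255.255.255.255 nomodify notrap noquery
restrict 127.0.0.1

server  10.2.40.1   iburst
server  127.127.1.0     iburst # local clock
fudge   127.127.1.0 stratum 5 # default was 10

driftfile /var/lib/ntp/drift
"""


def parse_conf(text):
    """Yield (directive, args) for each non-empty line, dropping '#' comments."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            directive, *args = line.split()
            yield directive, args


def describe_servers(text):
    """Classify each 'server' line as a network server or a refclock."""
    out = []
    for directive, args in parse_conf(text):
        if directive != "server":
            continue
        addr, options = args[0], args[1:]
        octets = addr.split(".")
        if octets[:2] == ["127", "127"]:
            # 127.127.t.u: t = refclock driver type, u = unit number
            kind = f"refclock type {octets[2]} unit {octets[3]}"
        else:
            kind = "network server"
        out.append((addr, kind, options))
    return out


def fudge_settings(text):
    """Map each fudged refclock address to its fudge arguments."""
    return {args[0]: args[1:] for d, args in parse_conf(text) if d == "fudge"}


for addr, kind, options in describe_servers(CONF):
    print(addr, kind, options, fudge_settings(CONF).get(addr, []))
```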

Each of these machines has a Gigabit network connection to a high-end
network switch. I believe the NTP server has only a 100 Mbit link and
handles all of the NTP traffic, but I don't think that is the problem.
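
A back-of-the-envelope estimate supports that. The Python sketch below
computes the average NTP load at the server, assuming steady-state
polling at 2^7 = 128 s (hpoll=7 in the ntpq output), one request and one
response per poll, and minimal 48-byte NTP packets in IPv4/Ethernet
frames. It ignores the short startup burst that iburst adds and any
non-NTP traffic on the link. `ntp_load_bps` is a hypothetical helper
name.

```python
# Per-packet size on the wire, in bytes (Ethernet II, IPv4 without options).
NTP_PAYLOAD = 48        # NTP header without extension fields or MAC
UDP_HEADER = 8
IPV4_HEADER = 20
ETHERNET_OVERHEAD = 18  # 14-byte header + 4-byte FCS; preamble/IFG ignored


def ntp_load_bps(clients, poll_exponent, packets_per_exchange=2):
    """Average bits/s at the server for steady-state client/server polling."""
    frame = NTP_PAYLOAD + UDP_HEADER + IPV4_HEADER + ETHERNET_OVERHEAD
    poll_seconds = 2 ** poll_exponent
    return clients * packets_per_exchange * frame * 8 / poll_seconds


load = ntp_load_bps(clients=50, poll_exponent=7)
print(f"{load:.1f} bit/s, {load / 100e6:.2e} of a 100 Mbit/s link")
```

With 50 clients this comes to 587.5 bit/s, a few millionths of a
100 Mbit/s link, so raw bandwidth alone is unlikely to matter.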

The main issue is probably the CPU and I/O load on these Opteron
machines. Each one handles streaming data from a FireWire (IEEE 1394a)
card, and the CPUs stay fairly busy with that data, though they are not
pegged at 100% or anything.

Here is typical ntpq output:
ntpq> as
ind assID status  conf reach auth condition  last_event cnt
===========================================================
  1 48644  9634   yes   yes  none  sys.peer   reachable  3
  2 48645  9034   yes   yes  none    reject   reachable  3
ntpq> rv 48644
status=9634 reach, conf, sel_sys.peer, 3 events, event_reach,
srcadr=ntpserv, srcport=123, dstadr=10.1.1.1, dstport=123, leap=00,
stratum=1, precision=-9, rootdelay=0.000, rootdispersion=5.554,
refid=GPSM, reach=377, unreach=0, hmode=3, pmode=4, hpoll=7, ppoll=7,
flash=00 ok, keyid=0, offset=360.879, delay=2.544, dispersion=3.803,
jitter=6.636, reftime=c739efcd.cf993b0f  Thu, Dec  1 2005 21:55:25.810,
org=c739efde.6ea22848  Thu, Dec  1 2005 21:55:42.432,
rec=c739efde.1292f6e8  Thu, Dec  1 2005 21:55:42.072,
xmt=c739efde.0c8ede54  Thu, Dec  1 2005 21:55:42.049,
filtdelay=     2.54    4.42    2.50    2.98    2.55    2.61    2.44    2.68,
filtoffset=  360.88  354.24  412.02  412.20  464.11  -95.25  -78.39  -56.90,
filtdisp=      1.96    3.90    5.82    7.77    9.70   11.62   12.61   13.57
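
The same peer variables can be checked numerically. The Python sketch
below parses a subset of the "rv 48644" output (the reftime, org, rec
and xmt fields are left out because their date strings contain commas,
which this naive comma splitting would break). It then measures how far
the clock-filter offsets spread compared with the filter delays.
`parse_rv` is a hypothetical helper; the values are in milliseconds, as
ntpq reports them.

```python
import statistics

# Subset of the "rv 48644" output above; whitespace is normalized before
# splitting, so the line breaks from wrapping do not matter.
RV = """\
offset=360.879, delay=2.544, dispersion=3.803, jitter=6.636,
filtdelay=     2.54    4.42    2.50    2.98    2.55    2.61    2.44    2.68,
filtoffset=  360.88  354.24  412.02  412.20  464.11  -95.25  -78.39  -56.90,
filtdisp=      1.96    3.90    5.82    7.77    9.70   11.62   12.61   13.57
"""


def parse_rv(text):
    """Map each variable name to a float or a list of floats."""
    fields = {}
    for item in " ".join(text.split()).split(","):
        if "=" not in item:
            continue
        name, value = (part.strip() for part in item.split("=", 1))
        numbers = [float(v) for v in value.split()]
        fields[name] = numbers[0] if len(numbers) == 1 else numbers
    return fields


rv = parse_rv(RV)
offsets = rv["filtoffset"]
spread = max(offsets) - min(offsets)
# Largest difference between adjacent entries of the filter register.
biggest_jump = max(abs(a - b) for a, b in zip(offsets, offsets[1:]))
median_delay = statistics.median(rv["filtdelay"])

print(f"offset {rv['offset']} ms, jitter {rv['jitter']} ms")
print(f"filter offsets span {spread:.2f} ms, largest adjacent jump {biggest_jump:.2f} ms")
print(f"median filter delay {median_delay:.2f} ms")
```

On this sample the eight filter offsets span about 559 ms, and that
entire span is a single jump between two adjacent entries (464.11 to
-95.25), while the round-trip delays stay near 2.5 ms (median 2.58 ms).
Because the offset error NTP can attribute to path asymmetry is bounded
by half the round-trip delay, a jump of that size is hard to explain by
network delay alone.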

If anyone has suggestions about what might be happening, or how to keep
these machines synced more tightly, I would certainly appreciate it.
I've dug through FAQs, wikis, docs, etc., but I'm not sure exactly why
my time is bouncing around so much.

thanks in advance,
bob
-- 
Bob Robison                        bob.robison at swri.org
Staff Engineer                     210-522-3935
Southwest Research Institute       San Antonio, TX


