[ntp:questions] Re: Frequent time reset messages

Richard B. Gilbert rgilbert88 at comcast.net
Fri Dec 2 13:40:29 UTC 2005

Bob Robison wrote:

>I'm running a moderate number (around 50) dual-opterons that are
>diskless booting a Linux 2.6.12 smp kernel and trying to synch with a
>Symmetricom XLI-GPS stratum-1 NTP server on an isolated network.
>The problem I have is that when I run "ntpq -c peers" on a number of
>these machines to check the status of the ntp synchronization, I see
>offsets ranging over almost 1000 msecs.  If I grep through the /var/log/
>messages file, I see that there are often messages around every 20
>minutes like this:
>Dec  1 20:30:28 (none) ntpd[27203]: time reset 0.613771 s
>Dec  1 20:30:28 (none) ntpd[27203]: synchronisation lost
>Dec  1 20:50:45 (none) ntpd[27203]: time reset 0.931388 s
>Dec  1 20:50:45 (none) ntpd[27203]: synchronisation lost
>Dec  1 21:19:23 (none) ntpd[27203]: time reset 0.451491 s
>Dec  1 21:19:23 (none) ntpd[27203]: synchronisation lost
>Dec  1 21:36:24 (none) ntpd[27203]: time reset 0.391510 s
>Dec  1 21:36:24 (none) ntpd[27203]: synchronisation lost
>These seem like large (and frequent) steps to be occurring.  I have a
>fairly simple ntp.conf file: 
>restrict default ignore
>restrict mask nomodify notrap noquery
>server   iburst
>server     iburst # local clock
>fudge stratum 5 # default was 10
>driftfile /var/lib/ntp/drift
>These machines each have a Gigabit network connection to a high-end
>network switch.  I believe the NTP Server probably has only a 100MBit
>link, and he has all the traffic, but I don't think that is the problem.
>Probably the main issue is the CPU and I/O loading on these opteron
>machines.  They are each handling streaming data from a firewire card
>(IEEE-1394a) and the CPUs stay fairly busy handling that data -- though
>they are not pegged at 100% or anything.
>Here is a typical ntpq output:
>ntpq> as
>ind assID status  conf reach auth condition  last_event cnt
>  1 48644  9634   yes   yes  none  sys.peer   reachable  3
>  2 48645  9034   yes   yes  none    reject   reachable  3
>ntpq> rv 48644
>status=9634 reach, conf, sel_sys.peer, 3 events, event_reach,
>srcadr=ntpserv, srcport=123, dstadr=, dstport=123, leap=00,
>stratum=1, precision=-9, rootdelay=0.000, rootdispersion=5.554,
>refid=GPSM, reach=377, unreach=0, hmode=3, pmode=4, hpoll=7, ppoll=7,
>flash=00 ok, keyid=0, offset=360.879, delay=2.544, dispersion=3.803,
>jitter=6.636, reftime=c739efcd.cf993b0f  Thu, Dec  1 2005 21:55:25.810,
>org=c739efde.6ea22848  Thu, Dec  1 2005 21:55:42.432,
>rec=c739efde.1292f6e8  Thu, Dec  1 2005 21:55:42.072,
>xmt=c739efde.0c8ede54  Thu, Dec  1 2005 21:55:42.049,
>filtdelay=     2.54    4.42    2.50    2.98    2.55    2.61    2.44
>filtoffset=  360.88  354.24  412.02  412.20  464.11  -95.25  -78.39  -56.90,
>filtdisp=      1.96    3.90    5.82    7.77    9.70   11.62   12.61   13.57
>If anyone has any suggestions about what might be happening, or how to
>keep these guys synched up more tightly, I would certainly appreciate
>it.  I've dug around through FAQs, Wikis, Docs, etc... but I'm not sure
>exactly why my time is bouncing around so much.
>thanks in advance,
It's possible that your systems are losing clock interrupts!  Some 
device drivers disable or mask interrupts a little too freely.  If 
interrupts are masked or disabled for two consecutive clock "ticks", one 
will be lost.  If your flavor of Linux has a kernel build setting called 
"HZ" (the timer interrupt rate) and it is set to 1000, try rebuilding 
with it set to 100.
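
A rough way to check whether lost ticks could account for the steps is to 
turn the quoted "time reset" messages into an implied drift rate.  The 
following is only a minimal sketch in Python, under two assumptions: that 
each step is roughly the error built up since the previous step (it ignores 
whatever ntpd slewed in between), and that the log year is 2005.  The names 
parse_resets, implied_rates_ppm and lost_ticks are made up for illustration.

```python
import re
from datetime import datetime

# The four "time reset" lines quoted in the original message.
LOG = """\
Dec  1 20:30:28 (none) ntpd[27203]: time reset 0.613771 s
Dec  1 20:50:45 (none) ntpd[27203]: time reset 0.931388 s
Dec  1 21:19:23 (none) ntpd[27203]: time reset 0.451491 s
Dec  1 21:36:24 (none) ntpd[27203]: time reset 0.391510 s
"""

RESET = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s.*ntpd\[\d+\]: time reset ([-+]?\d+\.\d+) s"
)


def parse_resets(text, year=2005):
    """Return (timestamp, step_seconds) pairs; syslog omits the year, so it is assumed."""
    resets = []
    for line in text.splitlines():
        m = RESET.match(line)
        if m:
            stamp_text = " ".join(m.group(1).split())  # collapse "Dec  1" padding
            stamp = datetime.strptime(f"{year} {stamp_text}", "%Y %b %d %H:%M:%S")
            resets.append((stamp, float(m.group(2))))
    return resets


def implied_rates_ppm(resets):
    """Step size divided by time since the previous step, in parts per million."""
    rates = []
    for (t0, _), (t1, step) in zip(resets, resets[1:]):
        elapsed = (t1 - t0).total_seconds()
        rates.append(abs(step) / elapsed * 1e6)
    return rates


def lost_ticks(step_seconds, hz):
    """Number of timer ticks of length 1/hz seconds that add up to the step."""
    return round(abs(step_seconds) * hz)


if __name__ == "__main__":
    resets = parse_resets(LOG)
    for (stamp, step), rate in zip(resets[1:], implied_rates_ppm(resets)):
        print(f"{stamp:%H:%M:%S}  step {step:.3f} s  ~{rate:.0f} ppm  "
              f"~{lost_ticks(step, 1000)} ticks at HZ=1000")
```

By this estimate the implied rates run from a few hundred ppm up to well 
over 500 ppm.  ntpd will only correct frequency errors up to 500 ppm by 
slewing and steps the clock once the offset passes its default 128 ms 
threshold, so errors of this size would be consistent with the repeated 
steps.  With HZ=1000 each lost tick costs 1 ms, which is why the tick 
count is simply the step expressed in milliseconds.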
