[ntp:questions] NTPd loses sync regularly / 12 hour intervals.

Michael Nielsen hael at tv2.dk
Wed Sep 8 06:40:43 UTC 2010


Hi,

This may not be the right place to post this; if so, I apologise in
advance.

I have a problem that really baffles me, even after running ntp servers
for years.

I have two ntp servers set up in the local network, from which another
40-odd servers synchronise.

Some of the servers function as expected and keep time quite
accurately. However, the majority of servers report an error that I
cannot pin down, and it is currently baffling me.
(It's probably something obvious I'm overlooking, or so I hope.)

The logs on the systems that have problems usually show the following
pattern:


Sep  7 17:43:48 sn ntpd[11721]: synchronized to 10.7.100.28, stratum 2
Sep  7 17:57:51 sn ntpd[11721]: time reset +71.784598 s
Sep  7 17:59:10 sn ntpd[11721]: synchronized to 10.7.100.27, stratum 2
Sep  8 05:29:11 sn ntpd[11721]: no servers reachable
Sep  8 05:34:37 sn ntpd[11721]: synchronized to 10.7.100.28, stratum 2
Sep  8 05:35:52 sn ntpd[11721]: time reset +74.977115 s
Sep  8 05:36:41 sn ntpd[11721]: synchronized to 10.7.100.28, stratum 2
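
To check how regular the resets really are, the interval between "time reset" lines can be computed from the log. The following is a minimal sketch, assuming the syslog format shown above; the function names are made up for illustration, and the year is an assumption because syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Matches syslog lines such as:
#   Sep  7 17:57:51 sn ntpd[11721]: time reset +71.784598 s
RESET_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+ntpd\[\d+\]:\s+"
    r"time reset ([+-]?\d+\.\d+) s"
)

def parse_resets(lines, year=2010):
    """Return a list of (timestamp, step_seconds) for each 'time reset' line.

    The year is supplied by the caller (assumption: all lines fall in one year).
    """
    resets = []
    for line in lines:
        m = RESET_RE.match(line)
        if m:
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            resets.append((ts, float(m.group(2))))
    return resets

def reset_intervals(resets):
    """Return the hours elapsed between consecutive resets."""
    return [
        (later[0] - earlier[0]).total_seconds() / 3600.0
        for earlier, later in zip(resets, resets[1:])
    ]

# The log excerpt from this message.
LOG = """
Sep  7 17:43:48 sn ntpd[11721]: synchronized to 10.7.100.28, stratum 2
Sep  7 17:57:51 sn ntpd[11721]: time reset +71.784598 s
Sep  7 17:59:10 sn ntpd[11721]: synchronized to 10.7.100.27, stratum 2
Sep  8 05:29:11 sn ntpd[11721]: no servers reachable
Sep  8 05:34:37 sn ntpd[11721]: synchronized to 10.7.100.28, stratum 2
Sep  8 05:35:52 sn ntpd[11721]: time reset +74.977115 s
Sep  8 05:36:41 sn ntpd[11721]: synchronized to 10.7.100.28, stratum 2
""".strip()

resets = parse_resets(LOG.splitlines())
for ts, step in resets:
    print(f"{ts}  step {step:+.6f} s")
for hours in reset_intervals(resets):
    print(f"interval between resets: {hours:.2f} h")
```

Run against a longer stretch of /var/log/messages, the same functions would show whether the interval stays close to 12 hours and how the step size grows over time.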

The issue appears to occur nearly exactly every 12 hours, and the time
difference is getting worse: it started at around 1-2 seconds, has
steadily increased, and now deviates by more than 74 seconds.

One day I was watching the event and saw the machine go from being +/-
4 ms out to suddenly being +/- 36000+ out, so it appears that something
specific causes the problem.

The strange thing is that the servers appear to be fully synchronised
for most of that time, but then the jitter suddenly increases
dramatically and the clock offset changes to something very large,
which ntp resets a few cycles later. I've decreased the maximum poll
interval to 256 seconds, as that seems to correct the problem faster.
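
One way to catch the moment the offset and jitter jump is to record the peer billboard from `ntpq -pn` at regular intervals. The sketch below assumes the usual ten-column billboard layout (remote, refid, st, t, when, poll, reach, delay, offset, jitter, with a one-character tally code in front); the function names and the 128 ms default limit are illustrative choices, the latter picked because it matches ntpd's default step threshold.

```python
import subprocess

# Column names of the `ntpq -p` billboard, in order.
FIELDS = ("remote", "refid", "st", "t", "when", "poll",
          "reach", "delay", "offset", "jitter")

def parse_billboard(text):
    """Parse `ntpq -pn` output into a list of dicts, one per peer."""
    peers = []
    for line in text.splitlines():
        # Skip blank lines, the "====" separator and the header row.
        if not line.strip() or line.startswith("=") or "refid" in line:
            continue
        # The first character is the tally code (' ', '*', '+', '-', 'x', ...).
        tally, values = line[0], line[1:].split()
        if len(values) != len(FIELDS):
            continue  # not a peer line in the expected layout
        peer = dict(zip(FIELDS, values))
        peer["tally"] = tally
        for key in ("delay", "offset", "jitter"):
            peer[key] = float(peer[key])  # reported in milliseconds
        peers.append(peer)
    return peers

def read_peers():
    """Run `ntpq -pn` on the local machine and parse its output."""
    result = subprocess.run(["ntpq", "-pn"], capture_output=True,
                            text=True, check=True)
    return parse_billboard(result.stdout)

def over_threshold(peers, limit_ms=128.0):
    """Return the peers whose absolute offset exceeds limit_ms."""
    return [p for p in peers if abs(p["offset"]) > limit_ms]
```

Calling read_peers() every few seconds from a small loop or a cron job, and appending the result to a file with a timestamp, would show whether the jump appears on all peers at once (which points at the local clock) or only on one server.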

The configuration of all servers (ntp.conf) is:

restrict default nomodify notrap noquery
restrict 127.0.0.1
server ntp1.i.tv2.dk minpoll 1 maxpoll 8 iburst
server ntp2.i.tv2.dk minpoll 1 maxpoll 8 iburst
server ntp.i.tv2.dk  iburst

tinker huffpuff 7200
driftfile /var/lib/ntp/drift
broadcastdelay  0.008
keys            /etc/ntp/keys
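
For reference, the minpoll and maxpoll values in the configuration above are exponents of two, not seconds, which is why maxpoll 8 corresponds to the 256 second maximum mentioned earlier. A small sketch of the arithmetic follows; the helper name is made up. Note that ntpd builds enforce a lower bound on minpoll, so a value of 1 may be clamped to something larger; the exact limit depends on the version and should be checked in its documentation.

```python
def poll_interval_seconds(exponent):
    """Convert an ntp.conf minpoll/maxpoll exponent to seconds (2**exponent)."""
    return 2 ** exponent

# Values taken from the server lines above; the third server line sets
# neither option, so ntpd's defaults (minpoll 6, maxpoll 10) apply to it.
for name, exponent in (("minpoll 1", 1), ("maxpoll 8", 8),
                       ("default minpoll 6", 6), ("default maxpoll 10", 10)):
    print(f"{name:20s} -> {poll_interval_seconds(exponent)} s")
```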

The version of ntp in use varies a lot with machine age, operating
system, and so forth; strangely enough, most machines seem to have the
same problem. Similarly, the systems are running SUSE and RH 4/5 on
different kernel versions. My initial instinct was that it might be
network related.

NB: the tinker huffpuff 7200 was added to see whether it had any
effect. The minpoll and maxpoll values were added as they seem to
improve recovery from the problem; however, that only treats the
symptom.

As far as I've ascertained, the ntp servers are at no time unreachable
and do not appear to ever lose sync with their clock sources. Their
logs indicate that occasionally, every hour or so, they do change
between different time servers.

Any ideas are most welcome.


