[ntp:questions] Spike detection and clock runaway

A C agcarver+ntp at acarver.net
Thu Feb 23 21:14:56 UTC 2012

ntpd had been running fine for the past six days with no apparent 
problems.  The libc issues were not present (ntpq was polling 
regularly with no lockups of ntpd), and PPS seemed to be working with 
kernel discipline enabled (though I have an outstanding question about 
PPS and the PPM adjustments).  Offsets were within +/-5 ms.

However, very suddenly it went off track and I don't understand exactly 
what happened.  The log file is available at http://acarver.net/ntpd/

Near the bottom of that log (at the time stamp 23 Feb 12:35:36) spike 
detections started to show up.  I had no spikes through the rest of the 
log up to that point.  After the spike detection, ntpd started to step 
the clock around but it has never been able to recover from this.  It's 
currently still stuck:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
                 .PPS.            0 l  66m   16    0    0.000    0.000   0.000
                 .GPSD.           4 l   39  128   37    0.000  -2584.2 1237.15
                                  2 u   53   64    7   91.841  -222.24 1639.19
                                  4 u   49   64    7   98.229  -1431.9 842.750
                 .INIT.          16 u   35  256    0    0.000    0.000   0.000
                                  2 u   15   64   17   78.384  -1576.1
*                                 2 u   28   64   17   85.010  -1209.3

The loopstats and clockstats files for that time period are also at the 
same link.  Clockstats on the PPS source shows the shift occurred very 
quickly (see file clockstats.20120223 near timestamp 55980 45286.524). 
It was completely fine up until that moment and then it fell apart. 
There were no missing pulses at that point in time, and according to the 
timestamps they were arriving very regularly.
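
For anyone who wants to line up the stats files with the log, here is a rough Python sketch. It converts the MJD/seconds-past-midnight pair used in the stats files to UTC, and scans loopstats-style lines for the first large change in offset between consecutive records. The function names and the 0.128 s threshold (ntpd's default step threshold) are my own choices for illustration, not anything taken from the files above, and the code assumes the usual loopstats layout of MJD, seconds, offset in seconds, then the remaining fields.

```python
from datetime import datetime, timedelta, timezone

# Modified Julian Date 0 is 1858-11-17 00:00 UTC.
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)


def mjd_to_utc(mjd, seconds):
    """Convert an MJD day number plus seconds past midnight to UTC."""
    return MJD_EPOCH + timedelta(days=mjd, seconds=seconds)


def first_jump(lines, threshold_s=0.128):
    """Return (time, previous_offset, offset) for the first record whose
    offset differs from the previous record by more than threshold_s,
    or None if no such record exists.

    Assumes each line starts with: MJD, seconds past midnight, offset (s).
    threshold_s is a hypothetical default chosen to match ntpd's default
    step threshold.
    """
    prev = None
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        mjd = int(fields[0])
        sec = float(fields[1])
        offset = float(fields[2])
        if prev is not None and abs(offset - prev) > threshold_s:
            return mjd_to_utc(mjd, sec), prev, offset
        prev = offset
    return None


if __name__ == "__main__":
    # The clockstats timestamp quoted above.
    print(mjd_to_utc(55980, 45286.524))
```

The clockstats timestamp 55980 45286.524 works out to 12:34:46 UTC on 23 Feb 2012, which is about 50 seconds before the first spike detection at 12:35:36, assuming the log timestamps are also in UTC. To scan a file, something like `first_jump(open("loopstats.20120223"))` should work if the file uses the standard layout (the file name here is a guess based on the clockstats naming).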
