[ntp:questions] ntpd losing sync
A C
agcarver+ntp at acarver.net
Sat Feb 4 08:33:07 UTC 2012
OK, I thought this was a one-off problem, but ntpd has lost sync again,
about four days after a restart, and it never regains sync. It starts
with what seems to be the system clock drifting away from the PPS lock.
The oscillations from the corrections then become too great and the
whole thing blows up.
Here's the current configuration for version 4.2.7p236:
server 0.us.pool.ntp.org minpoll 9 iburst
server 1.us.pool.ntp.org minpoll 9 iburst
server 0.north-america.pool.ntp.org minpoll 9 iburst
server ntp1.gatech.edu prefer minpoll 9
server rolex.usg.edu minpoll 9
server 127.127.22.0 minpoll 2 maxpoll 4
fudge 127.127.22.0 time1 +0.000 flag2 1 flag3 1 refid PPS
server 127.127.28.0 minpoll 7 noselect
fudge 127.127.28.0 time1 -0.6 refid GPSD
The peer list, about a day after the initial system upset:
     remote           refid      st t when poll reach    delay   offset   jitter
==============================================================================
x127.127.22.0    .PPS.            0 l    -   16   377    0.000  -465.49  355.933
 127.127.28.0    .GPSD.           0 l    -  128   377    0.000  -208986  2833.87
 207.7.148.214   216.218.254.202  2 u    -  512   377  1045.07  -209713  11784.0
 72.14.179.211   127.67.113.92    2 u    -  512   377  1029.80  -201710  6559.37
 173.255.224.22  128.4.1.1        2 u  245  512   377  919.628  -202629  7684.05
 130.207.165.28  130.207.244.240  2 u    -  512   377  994.543  -204125  7778.28
 131.144.4.10    65.212.71.102    2 u   23  512   377  1000.21  -203648  7687.63
Note that the PPS offset is swinging wildly, which is not visible in
this static snapshot.
ntpq associations:
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1  4560  912a    yes   yes none falsetick    sys_peer   2
  2  4561  9014    yes   yes none    reject   reachable   1
  3  4562  9014    yes   yes none    reject   reachable   1
  4  4563  9034    yes   yes none    reject   reachable   3
  5  4564  9014    yes   yes none    reject   reachable   1
  6  4565  904a    yes   yes none    reject    sys_peer   4
  7  4566  9014    yes   yes none    reject   reachable   1
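For anyone reading the status column by hand, here is a minimal Python sketch of how I understand the peer status word to be packed: five status bits, a three-bit selection code, a four-bit event count and a four-bit last-event code. The field layout and the name tables are taken from my reading of the ntpq documentation and should be treated as assumptions. `decode_peer_status` is a made-up helper, not part of any NTP tool, and it only decodes the fields that appear in the table above.

```python
# Assumed layout of the 16-bit peer status word (per the ntpq docs):
#   bits 15..11  status flags (0x80 = configured, 0x10 = reachable in the high byte)
#   bits 10..8   selection code
#   bits  7..4   event count
#   bits  3..0   last event code
SELECT = ["sel_reject", "sel_falsetick", "sel_excess", "sel_outlier",
          "sel_candidate", "sel_backup", "sel_sys.peer", "sel_pps.peer"]

EVENT = {1: "mobilize", 2: "demobilize", 3: "unreachable", 4: "reachable",
         5: "restart", 6: "no_reply", 7: "rate_exceeded", 8: "access_denied",
         9: "leap_armed", 10: "sys_peer", 11: "clock_event", 12: "bad_auth",
         13: "popcorn", 14: "interleave_mode", 15: "interleave_error"}


def decode_peer_status(word):
    """Split a hex status word such as '912a' into its fields."""
    value = int(word, 16)
    high = value >> 8  # status flags plus selection code
    return {
        "conf": bool(high & 0x80),
        "reach": bool(high & 0x10),
        "select": SELECT[high & 0x07],
        "count": (value >> 4) & 0x0F,
        "event": EVENT.get(value & 0x0F, "unknown"),
    }


for word in ("912a", "9014", "9034", "904a"):
    print(word, decode_peer_status(word))
```

Under those assumptions, 912a decodes to conf, reach, sel_falsetick, 2 events, sys_peer, which agrees with what rv reports for association 4560.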
rv 4560 (first sys_peer):
associd=4560 status=912a conf, reach, sel_falsetick, 2 events, sys_peer,
srcadr=PPS(0), srcport=123, dstadr=127.0.0.1, dstport=123, leap=00,
stratum=0, precision=-20, rootdelay=0.000, rootdisp=0.000, refid=PPS,
reftime=d2d76400.c9b870fd Sat, Feb 4 2012 8:00:00.787,
rec=d2d76401.ffffffff Sat, Feb 4 2012 8:00:02.000, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=4, ppoll=4, headway=0, flash=00 ok,
keyid=0, offset=259.524, delay=0.000, dispersion=4.956, jitter=444.467,
filtdelay= 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00,
filtoffset= 259.52 344.53 419.52 474.51 -430.48 -335.49 -265.48 -185.49,
filtdisp= 4.74 4.98 5.22 5.47 5.70 5.94 6.18 6.42
rv 4565 (second sys_peer):
associd=4565 status=904a conf, reach, sel_reject, 4 events, sys_peer,
srcadr=ntp1.gatech.edu, srcport=123, dstadr=10.0.0.21, dstport=123,
leap=00, stratum=2, precision=-20, rootdelay=0.565, rootdisp=24.597,
refid=130.207.244.240,
reftime=d2d7609d.0646422f Sat, Feb 4 2012 7:45:33.024,
rec=d2d76271.00c7dd3a Sat, Feb 4 2012 7:53:21.003, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=9, ppoll=9, headway=46,
flash=400 peer_dist, keyid=0, offset=-204125.520, delay=994.543,
dispersion=16.941, jitter=7778.280,
filtdelay= 997.29 999.05 994.54 996.13 994.70 994.38 977.68 995.78,
filtoffset= -209351 -206700 -204125 -201435 -198758 -196080 -193475 -190882,
filtdisp= 0.08 8.07 15.83 23.94 32.01 40.08 47.91 55.76
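The filtoffset row for ntp1.gatech.edu gives a rough idea of how fast the system clock is moving relative to that server. Below is a small Python sketch of the arithmetic. It assumes the eight samples are listed newest first (filtdisp grows from left to right, which suggests this) and that they are spaced one poll interval of 2^9 = 512 s apart (hpoll=9). Both are assumptions, and `mean_rate_ppm` is a hypothetical helper.

```python
# filtoffset samples for association 4565, in milliseconds, as pasted above.
FILTOFFSET_MS = [-209351, -206700, -204125, -201435,
                 -198758, -196080, -193475, -190882]
POLL_S = 2 ** 9  # assumed spacing between samples, from hpoll=9


def mean_rate_ppm(offsets_newest_first_ms, interval_s):
    """Average rate of change of the offset across the filter window."""
    newest = offsets_newest_first_ms[0]
    oldest = offsets_newest_first_ms[-1]
    span_s = interval_s * (len(offsets_newest_first_ms) - 1)
    ms_per_s = (newest - oldest) / span_s
    return ms_per_s * 1000.0  # 1 ms/s is 1000 ppm


print(round(mean_rate_ppm(FILTOFFSET_MS, POLL_S)))
```

With those assumptions the offset is changing by roughly 2.6 s per poll interval, or on the order of -5150 ppm.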
I can provide graphs of the offset, dispersion and skew for any of the
peers if anyone wants them. The physical GPS itself has been ticking
just fine, no apparent issues with its signal to the machine. As far as
I can tell from the peers files there is simply a sudden shift away from
a nominal few microseconds of offset for the reported PPS. The offset
then swings wildly (like a PID loop in oscillation) until I restart ntpd
and the system clock is stabilized.
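To pin down when the shift begins, the peerstats files can be scanned for the first PPS sample whose offset leaves the normal range. Here is a minimal Python sketch. It assumes the usual peerstats record layout (MJD, seconds past midnight, address, status word, then offset, delay, dispersion and jitter in seconds). `first_excursion`, the 1 ms limit and the sample records are all made up for illustration and are not taken from my logs.

```python
def first_excursion(lines, address, limit_s):
    """Return (mjd, seconds, offset_s) for the first record from `address`
    whose absolute offset exceeds limit_s, or None if there is none."""
    for line in lines:
        fields = line.split()
        if len(fields) < 5 or fields[2] != address:
            continue  # skip short lines and other peers
        offset_s = float(fields[4])
        if abs(offset_s) > limit_s:
            return int(fields[0]), float(fields[1]), offset_s
    return None


# Invented records in peerstats layout, for illustration only.
SAMPLE = [
    "55961 100.000 127.127.22.0 973a 0.000002 0.000000 0.000238 0.000004",
    "55961 116.000 127.127.22.0 973a -0.000003 0.000000 0.000238 0.000005",
    "55961 120.000 127.127.28.0 9014 -0.012000 0.000000 0.001000 0.002000",
    "55961 132.000 127.127.22.0 912a 0.004120 0.000000 0.000240 0.002900",
]

print(first_excursion(SAMPLE, "127.127.22.0", 0.001))
```

Against a real file, the lines would come from something like `open(path)` on the peerstats file for the day in question. The timestamp returned is then the place to start looking in the other logs.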
The system sits quietly in a corner of the room. It has no duties other
than to run ntpd and gpsd. Whatever monitoring I do is run on other
systems (ntpd is polled remotely with ntpq on another system, gpsd
status is queried remotely by another system and compiled there). The
oscillations happen after a few days but no obvious cron jobs are
running at the times that they start. If there's something I can do to
instrument ntpd further I can do that and see if I catch the problem.