[ntp:questions] NTP not syncing

unruh unruh at invalid.ca
Thu Dec 5 19:14:30 UTC 2013


On 2013-12-05, mike cook <michael.cook at sfr.fr> wrote:
>> 
>> 
>> The problem for ntp is that ntp takes a long time to recover from a bad
>> drift value. 
>> 
>
> This seems to have been an issue since I started using ntp, more than 10 years ago. I am surprised that it is not fixed.

Because David Mills has no interest at all in fixing it. He has a model
for the operation of ntpd, and that model does not include rapid
correction of errors. It is designed for long term stability first and
foremost.

>
> A simple test on Linux with a modern version of ntp. Here is the normal state of this R-PI:
>
> Thu Dec  5 09:34:13 CET 2013
> mike at raspberrypi ~ $ sudo cat /var/lib/ntp/ntp.drift
> -36.772
> mike at raspberrypi ~ $ ls -l /var/lib/ntp/ntp.drift
> -rw-r--r-- 1 root root 8 Dec  5 08:51 /var/lib/ntp/ntp.drift
> mike at raspberrypi ~ $ ntpq  -pn |grep \*
> mintc=3, offset=-0.169517, frequency=-37.191, sys_jitter=0.333279,
>
> The offset with this server is fairly stable at 1-300 microseconds, sometimes better.
>
> So now stop ntpd, stick a silly value in the drift file, and restart.
>
> root at raspberrypi:/home/mike# echo "-256.666" > /var/lib/ntp/ntp.drift
> root at raspberrypi:/home/mike# cat /var/lib/ntp/ntp.drift
> -256.666
> root at raspberrypi:/home/mike# /etc/init.d/ntp start
> Starting NTP server: ntpd.
> root at raspberrypi:/home/mike# ntpq -c rv
> associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
> version="ntpd 4.2.7p319 at 1.2483 Tue May 28 11:26:22 UTC 2013 (2)",
> processor="armv6l", system="Linux/3.2.27-pps", leap=00, stratum=2,
> precision=-19, rootdelay=14.258, rootdisp=202.121, refid=145.238.203.14,
> reftime=d64aba43.dfdbb690  Thu, Dec  5 2013  9:39:31.874,
> clock=d64aba54.494fd9c7  Thu, Dec  5 2013  9:39:48.286, peer=2675, tc=6,
> mintc=3, offset=3.357234, frequency=-256.666, sys_jitter=1.622350,
> clk_jitter=2.342, clk_wander=0.000
>
> So we have picked up the drift and are using it as is, no verification.
>
> root at raspberrypi:/home/mike# ntpq -pn
> ...
> *145.238.203.14  .TS-3.           1 u   44   64    1   14.258    3.357   1.622
> ...
> So iburst got us a reasonable starting point. Now let's see how it evolves:
>
> root at raspberrypi:/home/mike# while true; do date; ntpq -pn |grep \*;ntpq -c rv |grep frequency; ls -l /var/lib/ntp/ntp.drift;cat /var/lib/ntp/ntp.drift; sleep 60; done
> Thu Dec  5 09:46:00 CET 2013
> *145.238.203.14  .TS-3.           1 u   62   64   77   14.258    3.357  11.413
> mintc=3, offset=16.974005, frequency=-256.666, sys_jitter=11.412631,
> -rw-r--r-- 1 root root 9 Dec  5 09:38 /var/lib/ntp/ntp.drift
> -256.666
>
> Three samples later:
>
> Thu Dec  5 09:49:01 CET 2013
> *145.238.203.14  .TS-3.           1 u   39   64  377   14.270   13.392  25.459               the offset multiplies by three
> mintc=3, offset=16.974005, frequency=-256.666, sys_jitter=25.458613,
> -rw-r--r-- 1 root root 9 Dec  5 09:38 /var/lib/ntp/ntp.drift
> -256.666
> Thu Dec  5 09:50:02 CET 2013
> *145.238.203.14  .TS-3.           1 u   32   64  377   14.272   64.913  38.415               then more than 20 times
> mintc=3, offset=64.912586, frequency=-224.970, sys_jitter=38.415064,
> -rw-r--r-- 1 root root 9 Dec  5 09:38 /var/lib/ntp/ntp.drift
> -256.666
> Thu Dec  5 09:51:02 CET 2013
> *145.238.203.14  .TS-3.           1 u   25   64  377   14.272   64.913  34.083
> mintc=3, offset=64.912586, frequency=-224.970, sys_jitter=34.083058,
> -rw-r--r-- 1 root root 9 Dec  5 09:38 /var/lib/ntp/ntp.drift
> -256.666
> Thu Dec  5 09:52:02 CET 2013
> *145.238.203.14  .TS-3.           1 u   19   64  377   14.242   78.513  37.945               and it gets worse  - note that we still think this is a good source
> mintc=3, offset=78.512782, frequency=-214.937, sys_jitter=37.944744,
> -rw-r--r-- 1 root root 9 Dec  5 09:38 /var/lib/ntp/ntp.drift
> -256.666
> Thu Dec  5 09:53:03 CET 2013
> *145.238.203.14  .TS-3.           1 u   10   64  377   14.242   78.513  30.074
> mintc=3, offset=78.512782, frequency=-214.937, sys_jitter=30.073729,
> -rw-r--r-- 1 root root 9 Dec  5 09:38 /var/lib/ntp/ntp.drift
> -256.666
>
> Our worst state is at 10:03, 30 minutes after startup. The real-time frequency value is decreasing, but this is not reflected in the file. This is an issue, because an admin blindly restarting ntp after noticing crappy offsets will hit the same wall again.
>
> The file gets updated after 1 hour, at
>
> Thu Dec  5 10:39:21 CET 2013
> *145.238.203.14  .TS-3.           1 u   25   64  377   12.834   37.836   8.350
> mintc=3, offset=37.835862, frequency=-88.963, sys_jitter=8.349705,
> -rw-r--r-- 1 root root 8 Dec  5 10:39 /var/lib/ntp/ntp.drift
> -88.963
>
> The rate of convergence is getting quicker, but we don't get back to a good state until nearly 3 hours in:
>
> Thu Dec  5 12:20:02 CET 2013
> *145.238.203.14  .TS-3.           1 u   28   64  377   12.979    0.287   0.693
> mintc=3, offset=0.287015, frequency=-38.180, sys_jitter=0.693195,
> -rw-r--r-- 1 root root 8 Dec  5 11:39 /var/lib/ntp/ntp.drift
> -41.923
>
> And the "normal" drift is reached around 4 hours after the restart.
>
> Thu Dec  5 13:30:30 CET 2013
> *145.238.203.14  .TS-3.           1 u   60   64  377   12.882    0.134   0.058
> mintc=3, offset=0.134499, frequency=-37.307, sys_jitter=0.057602,
> -rw-r--r-- 1 root root 8 Dec  5 12:39 /var/lib/ntp/ntp.drift
> -37.928
>
> I am sure that a much faster convergence could be achieved with a little thought, even if it meant a little ringing.
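
As a rough sanity check on the log above (a back-of-the-envelope sketch,
not anything ntpd computes itself): the drift file was changed from the
healthy value of about -37.191 ppm to -256.666 ppm, so the clock starts
out with roughly 219 ppm of frequency error. ntpq reports frequency in
ppm and offset in milliseconds. The helper name offset_growth_ms is made
up for illustration.

```python
def offset_growth_ms(freq_error_ppm, seconds):
    """Offset (in ms) accumulated by an uncorrected frequency error.

    1 ppm means the clock gains or loses 1 microsecond per second.
    """
    return abs(freq_error_ppm) * 1e-6 * seconds * 1e3


# Values taken from the quoted log.
good_ppm = -37.191    # frequency reported in the normal state
bad_ppm = -256.666    # silly value written to ntp.drift

error_ppm = bad_ppm - good_ppm
per_poll = offset_growth_ms(error_ppm, 64)     # one 64 s poll interval
per_hour = offset_growth_ms(error_ppm, 3600)   # if left uncorrected

print(f"frequency error: {error_ppm:.3f} ppm")
print(f"offset growth per 64 s poll: {per_poll:.2f} ms")
print(f"offset growth per hour, uncorrected: {per_hour:.1f} ms")
```

That works out to about 14 ms per 64 s poll, which is in the same range
as the jump from 3.357 ms to 16.974 ms seen in the first minutes of the
log.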

Yes, chrony does it. But it uses a very different philosophy from ntpd.
David has said time and again that he is completely uninterested in
fixing the "issue" of ntpd's slow convergence, but also that he
retains the right to decide how ntpd should behave. The problem is not
the ringing (that would make it worse: the most rapid convergence for a
simple feedback loop is at critical damping, which ntpd roughly tries to
achieve). But because it has no memory, such a circuit is limited in
what it can do. Chrony has a memory. It uses the last 3-64 measurements
to estimate what the correct time is, and then tries to get there
quickly and stably. All ntpd knows is the current offset, and it has no
idea whether a non-zero value is because the local clock is off or
because noise has made the remote measurement bad. It cannot examine the
past history to decide between the two possibilities, and thus must
tread carefully, because it is stupid to rapidly chase errors. And
remember that the only tool ntpd has is to change the rate. Now, you or
I might well look at, say, the last 5 offsets, see the trend and the
noise in that trend, and say: "Hey, the rate of my clock is way off."
But that is not what ntpd does.
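
To make the "look at the last 5 offsets" idea concrete, here is a
minimal sketch of an ordinary least-squares line fit over a handful of
(time, offset) samples. It is only an illustration of the general idea
of using measurement history, not chrony's actual algorithm and not
anything in ntpd. The function names, the sample data, and the
factor-of-3 threshold are all assumptions made up for this example.

```python
def fit_trend(times_s, offsets_ms):
    """Least-squares line through (time, offset) samples.

    Returns (slope_ppm, rms_ms): the fitted rate error in ppm and the
    RMS scatter of the samples around the fitted line in ms.
    """
    n = len(times_s)
    if n < 3 or n != len(offsets_ms):
        raise ValueError("need at least 3 matching samples")
    t_mean = sum(times_s) / n
    o_mean = sum(offsets_ms) / n
    sxx = sum((t - t_mean) ** 2 for t in times_s)
    if sxx == 0:
        raise ValueError("all samples have the same timestamp")
    sxy = sum((t - t_mean) * (o - o_mean)
              for t, o in zip(times_s, offsets_ms))
    slope = sxy / sxx                      # ms of offset per second
    intercept = o_mean - slope * t_mean
    sq = sum((o - (intercept + slope * t)) ** 2
             for t, o in zip(times_s, offsets_ms))
    rms_ms = (sq / (n - 2)) ** 0.5         # noise around the trend
    return slope * 1e3, rms_ms             # 1 ms/s is 1000 ppm


def looks_like_rate_error(times_s, offsets_ms, factor=3.0):
    """True if the drift over the window clearly exceeds the noise."""
    slope_ppm, rms_ms = fit_trend(times_s, offsets_ms)
    span_s = max(times_s) - min(times_s)
    drift_ms = abs(slope_ppm) * 1e-3 * span_s
    return drift_ms > factor * rms_ms


# Made-up samples, one per 64 s poll.
times = [0, 64, 128, 192, 256]
drifting = [3.0, 15.8, 28.6, 41.4, 54.2]   # steady 12.8 ms per poll
noisy = [0.3, -0.2, 0.1, -0.3, 0.1]        # scatter around zero

print(fit_trend(times, drifting), looks_like_rate_error(times, drifting))
print(fit_trend(times, noisy), looks_like_rate_error(times, noisy))
```

With the first data set the fit reports a rate error of about 200 ppm
with almost no scatter, so the trend is obvious. With the second, the
fitted slope is small compared to the scatter, so the non-zero offsets
look like measurement noise rather than a clock running at the wrong
rate.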

   


