[ntp:questions] Speed of ntp convergence

Unruh unruh-spam at physics.ubc.ca
Sun Nov 2 17:15:06 UTC 2008


"David J Taylor" <david-taylor at blueyonder.neither-this-part.nor-this-bit.co.uk> writes:

>Hal Murray wrote:
>>>> Try switching it off, changing the value in the drift file by say
>>>> 50PPM and
>>>> then switching it on again, and see how long it takes to recover
>>>> from that.
>>
>>> Why would I do that?  The drift values rarely change by more than
>>> five, certainly not by 50.  If you are seeing a change of 50, then
>>> perhaps that is part of your problem?
>>
>> A big step like that makes it easy to see how the system responds.
>> At least if it's a linear system. :)

>Yes, I appreciate that, Hal, but it doesn't emulate the situation here 
>very well, which I understood to be slow convergence after a routine 
>start.  It sounds as if the OP may have an incorrect drift file - it's 
>worth checking that it /is/ being updated.



The drift file read 10. The actual drift was 250 PPM (determined after the
system had settled down). The drift file never changed, even after a day of
running; ntp does not seem to be rewriting the drift file. Now that is a
problem (although with the apparent Linux bug in the timing routines, where it
miscalibrates the clock on bootup, the drift is NOT stable over reboots
anyway, so the existence of a drift file is irrelevant). However, the question is
about the behaviour of ntp. ntp should NOT be taking 10 hours to get over a
wrong value in the drift file. With a GPS refclock the system should be able
to figure out the drift to better than 1 PPM from two successive readings
of the clock (16 sec), or at most 3 readings (32 sec), NOT 10 hours.
It is a design flaw in ntp. A design flaw which Mills absolutely refuses to
fix, saying ntp works as designed. Clocks jump in frequency -- for example, if
the machine suddenly gets used a lot, the temperature can jump by 20-30C,
which can drive the rate out by a few PPM. With a response time of hours,
the ntp response to that change in temperature takes hours, apparently even
if the poll interval is 4 (16 sec).
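To make the arithmetic concrete, here is a minimal sketch (not ntp code, and
the sample values are hypothetical) of how two successive offset readings
determine the frequency error:

```python
# Sketch: the frequency error implied by two successive offset
# measurements taken `interval` seconds apart. If the offset grows
# between readings, the local clock is running fast by that rate.

def drift_ppm(offset1, offset2, interval):
    """offsets in seconds, interval in seconds; returns drift in PPM."""
    return (offset2 - offset1) / interval * 1e6

# Two refclock readings 16 s apart, with the offset growing by 4 ms:
# 0.004 / 16 = 250e-6, i.e. the 250 PPM drift mentioned above.
print(drift_ppm(0.000, 0.004, 16))  # -> 250 PPM (up to FP rounding)
```

The point is that a single polling interval already pins down the rate to
far better than 1 PPM when the measurement noise is small, as it is with a
GPS refclock.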
Mills has based his feedback loop on the Markovian theory of simple
feedback loops of engineering control theory. In most systems from the 20th
century, memory was in very short supply. The only memory was in the
parameters of the system itself ( voltage across a capacitor, current in an
inductor, speed of the motor) and the control theory worked well with that. Each
bit of memory (each additional capacitor, inductor, governor) cost a lot. In a
digital computer, memory is virtually infinite in supply. It costs
essentially nothing to remember. Thus you can use your data values from
almost as far back as you wish.  Of course they can become irrelevant
because the physical situation changes.

The ntp control theory uses only the clock offset and rate as its "memory":
the current measurement alters only those two parameters and is then
forgotten, except in its effect on those two parameters. (In fact the
control comes ONLY through control of the drift rate.) Chrony remembers up
to 64 previous measurements -- correcting them for changes in offset and
frequency of the clock -- and throws them away only when it becomes clear
that the parameters of the system have changed (the error is no longer
dominated by random noise but by some consistent change) and those old
values are useless for predicting the future behaviour of the clock. This
means that after three measurements it can get a very good estimate of both
the frequency and phase offset of the clock, and correct them, refining the
estimates as more data comes in. At present its key limitation is that it
does not do refclocks at all. (It also only runs on Linux.)
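The chrony-style idea described above can be sketched as a least-squares
line fit over a window of (time, offset) samples: the slope of the line is
the frequency error, the intercept the phase offset. This is only an
illustration of the principle with made-up numbers, not chrony's actual
algorithm, which also weights samples and tests when to discard them:

```python
# Sketch: estimate phase offset and frequency error from a window of
# past measurements by ordinary least squares.

def fit_offset_and_freq(samples):
    """samples: list of (t, offset) pairs in seconds.
    Returns (offset at first sample, frequency error in PPM)."""
    n = len(samples)
    t0 = samples[0][0]
    xs = [t - t0 for t, _ in samples]
    ys = [off for _, off in samples]
    mx = sum(xs) / n
    my = sum(ys) / n
    # Slope of the regression line = rate at which the offset grows.
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return intercept, slope * 1e6

# Three measurements 16 s apart under a steady 250 PPM drift already
# recover the rate, as claimed above for chrony:
off, freq = fit_offset_and_freq([(0, 0.000), (16, 0.004), (32, 0.008)])
print(off, freq)  # approximately 0.0 and 250.0
```

With more samples the fit averages down the measurement noise, and the fit
residuals tell you when the clock's behaviour has changed and the old
samples should be dropped.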

All of this is largely irrelevant if what you want is millisecond accuracy
of your clock. ntp is great for that. Or if your computer is on all the
time and ntp has converged -- my measurements indicate chrony is only about
2-3 times better than ntp in that situation, caused I think primarily by
the temperature-induced rate fluctuations of the cpu. That means the
difference matters only if you are concerned with usec timing -- say, the
difference between 10 usec accuracy and 5 usec accuracy -- which for almost
all of us is irrelevant.

ntp also has the advantage of obeying the KISS principle (Keep It Simple,
Stupid), in that direct control of the rate of the clock is far simpler
than keeping a memory, updating and correcting the memory, deciding when to
forget and when to remember, ... And the more complex the system, the
greater the possibility of error. (Although with the clock filter,
huff-and-puff, server selection, ... ntp is getting pretty complex as well.)
