[ntp:questions] Speed of ntp convergence

Unruh unruh-spam at physics.ubc.ca
Sun Nov 2 19:39:10 UTC 2008


"Richard B. Gilbert" <rgilbert88 at comcast.net> writes:

>Unruh wrote:
>> "David J Taylor" <david-taylor at blueyonder.neither-this-part.nor-this-bit.co.uk> writes:
>> 
>>> Hal Murray wrote:
>>>>>> Try switching it off, changing the value in the drift file by say
>>>>>> 50PPM and
>>>>>> then switching it on again, and see how long it takes to recover
>>>>>> from that.
>>>>> Why would I do that?  The drift values rarely change by more than
>>>>> five, certainly not by 50.  If you are seeing a change of 50, then
>>>>> perhaps that is part of your problem?
>>>> A big step like that makes it easy to see how the system responds.
>>>> At least if it's a linear system. :)
>> 
>>> Yes, I appreciate that, Hal, but it doesn't emulate the situation here 
>>> very well, which I understood to be slow convergence after a routine 
>>> start.  It sounds as if the OP may have an incorrect drift file - it's 
>>> worth checking that it /is/ being updated.
>> 
>> 
>> 
>> The drift file read 10. The actual drift was 250 (determined after the
>> system had settled down). The drift file never changed even after a day of
>> running. ntp does not seem to be rewriting the drift file. Now that is a
>> problem (although with the apparent Linux bug in the timing routines where it
>> miscalibrates the clock on bootup, the drift is NOT stable over reboots
>> anyway, so the existence of a drift file is irrelevant). However, the question is
>> about the behaviour of ntp. ntp should NOT be taking 10 hours to get over a
>> wrong value in the drift file. 

>That's easy to fix!  If the drift file is not correct, remove it before 
>starting ntpd.

Of course. However, I have no idea it is incorrect until after ntp has
started up and shown me it was incorrect. 

>How do you tell if it's incorrect?  Since ntpd is supposed to 
>update/rewrite the drift file every sixty minutes, a drift file more 
>than sixty minutes old is suspect!

I think my problem was that the permissions on /etc/ntp/drift were
incorrect (owned by root rather than by ntp). But that makes no
difference to how ntp behaves. ntp should do the "right thing" even if the
drift file is wrong. It should take a bit longer, but not 10 hours longer. 
And with the current apparent bug in Linux where the system time is
miscalibrated, it would seem that the drift file on Linux is ALWAYS wrong.
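
Both symptoms mentioned above (a drift file that has not been rewritten for more than an hour, and a file owned by the wrong user) can be checked with a few lines of Python. This is only a sketch: the function names, the 3600 second threshold and the expected owner "ntp" are illustrative assumptions taken from this thread, not anything ntpd itself provides, and depending on how ntpd was built the directory may need to be writable too.

```python
import os
import pwd
import time


def owner_name(uid):
    """Return the user name for a uid, or the number as text if unknown."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def drift_file_report(path, expected_owner="ntp", max_age_s=3600, now=None):
    """Return a list of human-readable problems found with the drift file.

    expected_owner and max_age_s are assumptions for illustration:
    the thread mentions user "ntp" and an hourly rewrite.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return [f"{path} does not exist"]

    problems = []
    now = time.time() if now is None else now
    age_s = now - st.st_mtime
    if age_s > max_age_s:
        problems.append(f"{path} was last written {age_s / 60:.0f} min ago")

    owner = owner_name(st.st_uid)
    if owner != expected_owner:
        problems.append(f"{path} is owned by {owner}, expected {expected_owner}")
    return problems
```

Calling drift_file_report("/etc/ntp/drift") returns an empty list when the file looks healthy, and one line per problem otherwise.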



>> With GPS refclock the system should be able
>> to figure out the drift to better than 1 PPM from two successive readings
>> of the clock (16 sec) or at most 3 readings (32 sec), NOT 10 hours. 
>> It is a design flaw in ntp. A design flaw which Mills absolutely refuses to
>> fix, saying ntp works as designed.  Clocks jump in frequency-- for example if the machine
>> suddenly gets used a lot, the temp can jump by 20-30C which can drive the
>> rate out by a few PPM. With a response time of hours, that means that the
>> ntp response to that change in temperature takes hours, apparently even if
>> the poll interval is 4 (16 sec). 
>> Mills has based his feedback loop on the Markovian theory of simple
>> feedback loops of engineering control theory. In most systems from the 20th
>> century, memory was in very short supply. The only memory was in the
>> parameters of the system itself ( voltage across a capacitor, current in an
>> inductor, speed of the motor) and the control theory worked well with that. Each
>> bit of memory (each additional capacitor, inductor, governor) cost a lot. In a
>> digital computer, memory is virtually infinite in supply. It costs
>> essentially nothing to remember. Thus you can use your data values from
>> almost as far back as you wish.  Of course they can become irrelevant
>> because the physical situation changes.
>> 
>> The ntp control theory uses only the clock offset and rate as its "memory",
>> (the current measurement alters only those two parameters and then is forgotten
>> except in its effect on those two parameters. In fact the control comes
>> ONLY through control of the drift rate). Chrony remembers up to 64 previous
>> measurements-- correcting them for changes in offset and frequency of the
>> clock-- and throws them away only when it becomes clear that the parameters
>> of the system have changed (error no longer dominated by random noise, but
>> by some consistent change) and those old values are useless for the
>> prediction of the future behaviour of the clock. This means that it can
>> after three measurements get a very good estimate of both the frequency and
>> phase offset of the clock, and correct them, refining them as more data
>> comes in. At present its key limitation is that it does not do refclocks at
>> all. (It also only runs on Linux).
>> 
>> All of this is largely irrelevant if what you want is millisecond accuracy
>> of your clock. ntp is great for that. Or if your computer is on all the
>> time and ntp has converged-- my measurements indicate chrony is only about
>> 2-3 times better than ntp in that situation, caused I think primarily by
>> the temperature induced rate fluctuations of the cpu. That matters only if
>> you are concerned with usec timing: it is the difference between, say, a 10 usec
>> accuracy and a 5 usec accuracy, which for almost all of us is irrelevant. 
>> 
>> ntp also has the advantage of obeying the KISS principle (Keep it simple,
>> stupid) in that direct control of the rate of the clock is far simpler than
>> keeping a memory, updating and correcting the memory, trying to figure out
>> when to forget and when to remember,... And the more complex, the greater
>> the possibility of error. (although with clock-filter, huff-and-puff,
>> server selection,... ntp is getting pretty complex as well.)
>> 
>>   
>> 
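
The idea quoted above, keeping several past measurements and fitting both phase and frequency to them, can be illustrated with an ordinary least-squares straight line. This is a minimal sketch and not chrony's actual code: the function name and the sample data are made up, and the bookkeeping described above (correcting stored samples after each clock adjustment, deciding when to discard old ones) is left out.

```python
def fit_offset_and_rate(times_s, offsets_s):
    """Least-squares straight line through (time, offset) samples.

    Returns (offset at the last sample in seconds, rate error in PPM).
    """
    n = len(times_s)
    if n < 2:
        raise ValueError("need at least two samples")
    t_mean = sum(times_s) / n
    o_mean = sum(offsets_s) / n
    sxx = sum((t - t_mean) ** 2 for t in times_s)
    if sxx == 0:
        raise ValueError("samples must span a nonzero time interval")
    sxy = sum((t - t_mean) * (o - o_mean) for t, o in zip(times_s, offsets_s))
    slope = sxy / sxx  # seconds of offset gained per second of elapsed time
    offset_now = o_mean + slope * (times_s[-1] - t_mean)
    return offset_now, slope * 1e6


# Three readings 16 s apart from a clock whose rate is wrong by 240 PPM
# (the gap between a drift file reading 10 and an actual drift of 250).
# The data are noise-free and invented purely for illustration.
times = [0.0, 16.0, 32.0]
offsets = [0.001 + 240e-6 * t for t in times]
offset_now, rate_ppm = fit_offset_and_rate(times, offsets)
print(f"offset {offset_now * 1e3:.3f} ms, rate error {rate_ppm:.1f} PPM")
```

With real data the precision of the rate is set by the measurement noise: two readings 16 s apart, each uncertain by e microseconds, can give a rate that is wrong by up to 2e/16 PPM, so a sub-PPM estimate from two readings needs offsets good to a few microseconds.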

>You are at liberty to write your own version of ntpd using your 
>preferred algorithm!  Dave designed ntpd to cope with the, usually 
>horrible, behavior of the internet.  This is not necessarily the best 
>design for all circumstances.  It is, however, what we have to work with 
>  unless we are willing and able to "roll our own"!  I am not!

I have other things I am supposed to do with my time, and I am not a good
enough programmer (i.e., it would take me 10 times longer to program than it
would a good programmer) to spend the time. But the nice thing about open
source is that jobs can be partitioned. One person can test (I am
reasonably good at that) and another can code. 
I disagree that the design is optimal for "the usual horrible behaviour of
the internet". Many of the design decisions occurred before the internet.
His example of the horrible Malaysia link is from long ago. I suspect he
would have trouble finding such a horrible link now even in Malaysia
(although a link to the moon might well be just as bad).
That ntp is good I do not dispute. That ntp is optimal I do dispute. Many
many people have complained about the behaviour of ntp on startup. There is
no excuse for it from the point of view of principle (although perhaps the
KISS principle might apply). It does not HAVE to be that bad in order for
ntp to work well. 

But this is going off track. I have a situation in which my chosen clock
is a refclock with poll interval 4. The time scale of ntp is supposed to be
something like 16 times the poll interval, which would be 256 seconds. It
is an hour, which is about 14 times longer. What I am asking is why is the time
interval so long when from what I have seen of ntp it should be much
shorter. Have I misunderstood the design (e.g., the time scale is 16 times
the highest possible poll interval, which would be 4 hr, which
is not right either), or is there some bug in ntp as designed?
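
For reference, the arithmetic behind these figures can be written out as below. It is only a sketch: the "16 times the poll interval" rule is the understanding stated above rather than a confirmed property of ntp, the helper name is made up, and poll 10 (1024 s) is assumed to be the upper limit.

```python
def poll_seconds(poll_exponent):
    """ntp poll values are the base-2 logarithm of the interval in seconds."""
    return 2 ** poll_exponent


expected_s = 16 * poll_seconds(4)          # 16 polls of 16 s each
observed_s = 3600                          # roughly one hour, as observed
ratio = observed_s / expected_s            # how much slower than expected
max_hours = 16 * poll_seconds(10) / 3600   # same rule at poll 10 (assumed limit)

print(f"expected {expected_s} s, observed/expected = {ratio:.1f}")
print(f"16 polls at poll 10 = {max_hours:.2f} hours")
```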



