[ntp:questions] very slow convergence of ntp to correct time.

Unruh unruh-spam at physics.ubc.ca
Sun Jan 20 17:50:41 UTC 2008


david at ex.djwhome.demon.co.uk.invalid (David Woolley) writes:

>In article <Hiqkj.8953$yQ1.2617 at edtnps89>,
>Unruh <unruh-spam at physics.ubc.ca> wrote:

>> but the offsets are still over 100 times worse than I was getting with
>> chrony. (Yes, I know, one suggestion is-- go back to chrony-- but the

>Note that chrony seems not to have been updated for several years and its
>knocking copy on ntpd refers to an obsolete version and isn't, I think,
>entirely true even for that.  It has never been supported on this newsgroup,
>and I'm not aware that its author has ever contributed here.

Actually not true. The latest version 1.23 has just been released, but it
is true that the support has become somewhat slow of late. 

>It's based on NTP version 3, but looks as though it has a completely
>different local clock discipline algorithm.  The local clock discipline
>algorithm is an appendix, and appears to be non-normative in NTPv3,
>but there isn't enough information, in the documentation, to establish
>whether the rest of it is compliant with the RFC.  It does look to be
>rather more than the typical SNTP client, as it seems to support multiple
>servers and seems to maintain a frequency correction.

It certainly does implement NTP, but it uses its own clock discipline
algorithm, which is why it converges so fast to a well-disciplined clock
(minutes rather than hours or days). In general it does a great job
acting either as an ntp client or server. It does not support refclocks,
however. The question I am trying to understand is whether the clock
discipline algorithm is responsible for the oscillations in the rate which
you can see (on the order of .2PPM with a period of about 1.5 hr), which is
why I decided to try running ntp on one of the clients. And that was when I
discovered the really bad initial behaviour of ntp if there was no drift
file initially. ntp really should not take a clock which had an initial
accuracy of .01usec, and drive it away from lock to an accuracy of 60ms and
then take hours to correct that error, never actually getting back the
original accuracy without a restart of ntp.



>I get the impression that it uses a statistics based, linear regression,
>approach, rather than the engineering based, phase locked loop one in the
>reference implementation, but again the documentation is not explicit
>on that.

?? ntp also uses linear regression to estimate the drift. That is then fed
back into the frequency locked loop and the phase locked loop.
chrony uses two mechanisms to correct errors: a fast slew (adjtimex tickval)
to eliminate offset errors, and a freq adjust to eliminate drift errors.
This is at least in part why it can adjust initially so rapidly. The
question is whether some non-linearity or whatever in the clock algorithm
is causing the oscillations (effectively narrow band ringing in the
algorithm), but the time scale seems wrong. The maxpoll is 7, which is about
2 min, and the typical number of data points retained in the adjustment
algorithm is about 10-20, which would be of order 1/2-1 hr, while the
oscillations are of order 1.5 hr.
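
To make the time scale argument concrete, here is a minimal sketch in Python.
It is illustrative only and is not taken from chrony or ntpd: the helper
names, the 16-sample window, the 1 ms starting offset and the synthetic
0.2 PPM drift are all assumptions for the demo. It works out the span covered
by 10-20 samples at maxpoll 7, and shows how a plain least-squares line
through (time, offset) pairs yields a drift estimate.

```python
# Illustrative only: NOT chrony's or ntpd's actual code.
POLL_EXPONENT = 7                      # maxpoll 7, as in the post
poll_interval_s = 2 ** POLL_EXPONENT   # 128 s, i.e. "about 2 min"


def window_span_minutes(n_samples, interval_s=poll_interval_s):
    """Approximate time covered by n_samples polls spaced interval_s apart."""
    return n_samples * interval_s / 60.0


def fit_drift(times_s, offsets_s):
    """Least-squares line through (time, offset) pairs.

    Returns (slope, intercept). The slope is the frequency error as a
    dimensionless ratio (seconds of offset gained per second of time).
    """
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_o = sum(offsets_s) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    sxy = sum((t - mean_t) * (o - mean_o) for t, o in zip(times_s, offsets_s))
    slope = sxy / sxx
    return slope, mean_o - slope * mean_t


# Synthetic data (an assumption): 16 polls, 0.2 PPM drift, 1 ms initial offset.
times = [i * poll_interval_s for i in range(16)]
offsets = [1e-3 + 0.2e-6 * t for t in times]
slope, intercept = fit_drift(times, offsets)

print(f"poll interval: {poll_interval_s} s")
print(f"10-20 samples span roughly {window_span_minutes(10):.0f}-"
      f"{window_span_minutes(20):.0f} min")
print(f"fitted drift: {slope * 1e6:.3f} PPM, "
      f"offset at t=0: {intercept * 1e3:.3f} ms")
```

On these assumptions the regression window covers roughly 20-45 minutes,
which is still well short of the 1.5 hr oscillation period.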



>Its main claim to usefulness is for systems that only get occasional
>and irregular NTP readings, particularly systems that are on dialup,
>or which are primarily updated by wristwatch and eye.

It is also clear that another claim to usefulness is its very rapid
convergence to lock. One of the main reasons I started using it was that it
also disciplines the rtc, estimating its offset and drift continually.



>> I would assume that ntp is giving these samples with long round trip very low weight, or even
>> eliminating them.

>Note: if these spikes are positive, they may be the result of lost ticks.

Don't think so. I think they are 5-10ms transmission delays. The delays
disappear if I run at maxpoll 7 rather than 10, so I suspect the router is
forgetting the addresses and taking its own sweet time about finding them
if the time between transmissions is many minutes.
chrony has a nice feature of being able to send an echo datagram to the
other machine if you want (before the ntp packet), to wake up the routers
along the way.
 

>Pop corn spikes of less than 128ms are not ignored in the default
>configuration.  If, as I suspect, you only have one time source, they
>will get full weight (for multiple sources, I think delay may be used
>to weight the contribution between different sources).

I thought an elimination algorithm was used to get rid of the outliers in
the ntp algorithm (median filter).
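
As an illustration of what I mean by median-based elimination, here is a
minimal Python sketch. It is a generic outlier filter, not taken from the
ntpd source, and the function name, the threshold k=3 and the sample offsets
are all assumptions made up for the example. It drops samples that lie more
than k median absolute deviations from the median.

```python
from statistics import median


def reject_outliers(offsets_ms, k=3.0):
    """Keep samples within k median absolute deviations of the median.

    Generic sketch only: this is NOT ntpd's actual filter.
    """
    med = median(offsets_ms)
    mad = median([abs(x - med) for x in offsets_ms])
    if mad == 0:
        # No measurable spread, so there is nothing to judge outliers against.
        return list(offsets_ms)
    return [x for x in offsets_ms if abs(x - med) <= k * mad]


# Made-up offsets in ms: mostly tens of microseconds, plus two spikes of the
# 5-10 ms size discussed above.
samples = [0.02, -0.01, 0.03, 7.5, 0.00, -0.02, 0.01, 9.8]
kept = reject_outliers(samples)
print(kept)
```

With these made-up numbers the two large spikes are discarded and the six
small offsets are kept.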


>There are two possible approaches to such excursions.  It might be possible
>to reduce the 128ms to whatever your 95 to 98 percentile figure is.  However,
>I suspect that this will seriously compromise the ability to get initial lock
>and to recover from major disturbances.

>You could also use the huff and puff filter, if they are the result of 
>asymmetric delays.  I'm not sure how well that works on short timescales and
>whether it assumes a particular sense for the asymmetry.



