[ntp:questions] very slow convergence of ntp to correct time.

David Woolley david at ex.djwhome.demon.co.uk.invalid
Sun Jan 20 20:08:54 UTC 2008


In article <RRLkj.8804$vp3.6129 at edtnps90>,
Unruh <unruh-spam at physics.ubc.ca> wrote:
> david at ex.djwhome.demon.co.uk.invalid (David Woolley) writes:

> >Note that chrony seems not to have been updated for several years and its

> Actually not true. The latest version 1.23 has just been released, but it
> is true that the support has become somewhat slow of late. 

There appeared to be no change significant enough to require new
documentation, and it didn't acknowledge NTPv4 in the documentation.

> It certainly does implement NTP but does use its own clock discipline
> algorithm. Which is why it converges so fast to having a well disciplined

Do you mean that it will work with NTP servers or that it actually complies
with the normative parts of the NTP (v3) specification?  Most versions of
W32Time do the former, but not the latter.

> clock.(minutes rather than hours or days). In general it does a great job
> acting either as an ntp client or server. It does not support refclocks

I certainly have reservations about the initial lock up of ntpd, but
it has got a lot better since the version referenced in the chrony 
documentation.  The basic problem is that, if it starts close to
the correct time and with a saved frequency, it assumes that the offsets it
sees are random errors, and applies the algorithm that gives very
good performance when locked up, not one designed to get within the
actual noise levels.

> file initially. ntp really should not take a clock which had an initial
> accuracy of .01usec, and drive it away from lock to an accuracy of 60ms and

That should only happen if it starts up believing that the frequency error
differs greatly from the true value, i.e. you do a cold start with a 
drift file present.  It is optimizing by not re-calibrating the frequency,
because it believes it already has a correct value.

> then take hours to correct that error, never actually getting back the
> original without a restart of ntp.

I find that weird, as the control loop should converge to zero, although
it will take a long time to do so, because it is correcting at rates
matched to the rates at which new errors are introduced.  I would have
thought that such a problem would be so obvious that it would already have
been fixed.

One thought.  If you are using Linux with a HZ value other than 100, and 
the kernel discipline, the kernel discipline code will violate the 
design assumptions, because the people who implemented HZ=250 and HZ=1000
didn't update the ntpd support code.

> ?? ntp also used linear regression to estimate the drift. That is then fed

ntpd uses infinite impulse response filters which make use of z and possibly
z^2 terms.  A linear regression approach assumes the use of a finite impulse
response filter using relatively high order z terms (I have a feeling that the
FIR is non-linear).  The overall response is, of course, IIR, because
the complete system is a feedback loop.
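
To make the FIR/IIR distinction concrete, here is a minimal Python sketch.
It is not taken from ntpd or chrony.  It contrasts a first-order IIR
average, where each output depends on the previous output, with a
least-squares slope over a fixed window, which only ever sees a finite
history.  The function names, the gain of 0.125 and the sample data are
illustrative assumptions.

```python
def iir_smooth(samples, gain=0.125):
    """First-order IIR filter: y[n] = y[n-1] + gain * (x[n] - y[n-1])."""
    out = []
    y = samples[0]  # seed the filter state with the first sample
    for x in samples:
        y += gain * (x - y)
        out.append(y)
    return out


def regression_slope(times, offsets):
    """Least-squares slope of offset against time over a finite window."""
    n = len(times)
    mean_t = sum(times) / n
    mean_o = sum(offsets) / n
    num = sum((t - mean_t) * (o - mean_o) for t, o in zip(times, offsets))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


if __name__ == "__main__":
    # Hypothetical data: a clock drifting at 10 ppm, sampled every 128 s.
    times = [128.0 * i for i in range(16)]
    offsets = [10e-6 * t for t in times]
    print("regression drift estimate:", regression_slope(times, offsets))
    print("IIR output at last sample:", iir_smooth(offsets)[-1])
```

On noise-free data the regression recovers the drift rate directly from
the window, while the IIR output still carries weight from every earlier
sample.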

> back into the frequency locked loop and the phase locked loop. 
> chrony uses two mechanisms to correct errors, a fast slew (adjtimex tickval

ntpd maintains separate phase and frequency correction values, and, I seem
to remember, decays the phase correction if there are no updates.
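
As a rough illustration of a loop that keeps separate phase and frequency
corrections, the Python sketch below simulates a generic discrete
proportional-plus-integral loop.  It is not ntpd's actual discipline code:
the gains kp and ki, the poll interval and the function name are all
assumptions chosen only so that the loop is stable, and the decay of the
phase correction between updates is not modelled.

```python
def run_loop(offset, freq_error, poll=64.0, kp=0.5, ki=0.1, steps=200):
    """Simulate a simple two-term loop.

    offset      initial clock offset in seconds
    freq_error  true frequency error of the clock (seconds per second)
    Returns the final offset and the learned frequency correction.
    """
    freq_corr = 0.0
    for _ in range(steps):
        measured = offset
        # Between polls the clock drifts by the uncorrected frequency
        # error; the phase term removes a fraction kp of the measured offset.
        offset += (freq_error - freq_corr) * poll - kp * measured
        # The frequency term integrates the measured offset.
        freq_corr += ki * measured / poll
    return offset, freq_corr


if __name__ == "__main__":
    # Hypothetical start: 60 ms out, with a 50 ppm frequency error.
    final_offset, learned = run_loop(0.06, 50e-6)
    print("final offset (s):", final_offset)
    print("learned frequency correction:", learned)
```

With these assumed gains the offset decays towards zero and the frequency
correction settles on the true frequency error.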

> ) to eliminate offset errors and freq adjust to eliminate drift errors.

That assumes that you can measure the two values separately, which is not
really true.

As noted above I do have some reservations about the use of IIR filters
in out-of-lock conditions, but I also pointed out that the chrony author
appears not to have contributed to this newsgroup to argue the case against
the ntpd approach, to the extent that regulars here have never heard of 
chrony, even though it has been around for 9 or so years.  In locked 
conditions, the ntpd algorithm should be better.

> This is at least in part why it can adjust initially so rapidly. The
> question is whether some non-linearity or whatever in the clock algorithm
> is causing the oscillations (effectively narrow band ringing in the
> algorithm), but the time scale seems wrong. The maxpoll is 7 which is about

Note that servers tend to be dimensioned assuming the average maxpoll is
higher, and would consider something locked at maxpoll 6 or less as hostile.

> 2 min, the typical number of data points retained is about 10-20 in the
> adjustment algorithm, which would be of order 1/2-1 hr, while the
> oscillations are of order 1.5 hr. 

That does seem to suggest a shorter natural oscillation period (for the
offset control loop, even shorter than you suggest), although
note that the cutoff frequency of the filter is too high to provide
optimum stability when locked.  With ntpd, if you force maxpoll too
low, it will oversample, but the FIR filter here doesn't allow that.
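
The time scales being compared can be checked with a little arithmetic.
The Python sketch below assumes only that the poll interval is 2^maxpoll
seconds; the helper names are made up for illustration.

```python
def window_minutes(maxpoll, samples):
    """Time spanned by `samples` polls at an interval of 2**maxpoll seconds."""
    return (2 ** maxpoll) * samples / 60.0


def polls_in(hours, maxpoll):
    """Number of polls that fit into the given number of hours."""
    return hours * 3600.0 / (2 ** maxpoll)


if __name__ == "__main__":
    print("poll interval (s):", 2 ** 7)
    print("10 samples span (min):", window_minutes(7, 10))
    print("20 samples span (min):", window_minutes(7, 20))
    print("polls in 1.5 hr:", polls_in(1.5, 7))
```

At maxpoll 7 the interval is 128 s, so 10 to 20 samples span roughly 21
to 43 minutes, and a 1.5 hour oscillation covers about 42 polls.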

> chrony has a nice feature of being able to send an
> echo datagram to the other machine if you want (before the ntp packet), to

I think that would be considered abusive by most ntp server operators
(especially those in Australia who pay for bandwidth used).

> wake up the routers along the way.
 
> I thought an elimination algorithm was used to get rid of the outliers in
> the ntp algorithm (median filter).

I did correct this, but NTPv4 only uses median filters for reference
clocks.  Normal sources use the sample with the lowest overall 
error bounds, which may have a similar effect.
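
The Python sketch below shows why the two approaches can reject the same
outlier.  It is a simplification, not the NTPv4 clock filter itself: it
uses round-trip delay as a stand-in for the overall error bound, and the
sample values and function names are invented for illustration.

```python
from statistics import median


def median_offset(samples):
    """Median-filter style: take the median of the offsets."""
    return median(offset for offset, _delay in samples)


def min_delay_offset(samples):
    """Pick the offset of the sample with the smallest round-trip delay."""
    offset, _delay = min(samples, key=lambda s: s[1])
    return offset


if __name__ == "__main__":
    # Hypothetical (offset, delay) pairs in seconds; the third one was
    # delayed by congestion and carries a large offset error.
    samples = [
        (0.0010, 0.020),
        (0.0012, 0.021),
        (0.0500, 0.120),
        (0.0009, 0.019),
        (0.0011, 0.022),
    ]
    print("median offset:", median_offset(samples))
    print("minimum-delay offset:", min_delay_offset(samples))
```

Both selections ignore the 50 ms outlier here, although they do not pick
the same sample.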



