[ntp:questions] very slow convergence of ntp to correct time.

Unruh unruh-spam at physics.ubc.ca
Sun Jan 20 22:49:09 UTC 2008

david at ex.djwhome.demon.co.uk.invalid (David Woolley) writes:

>In article <RRLkj.8804$vp3.6129 at edtnps90>,
>Unruh <unruh-spam at physics.ubc.ca> wrote:
>> david at ex.djwhome.demon.co.uk.invalid (David Woolley) writes:

>> >Note that chrony seems not to have been updated for several years and its

>> Actually not true. The latest version 1.23 has just been released, but it
>> is true that the support has become somewhat slow of late. 

>There appeared to be no change significant enough to require new
>documentation, and it didn't acknowledge NTPv4 in the documentation.

No, it was a bug fix release.

>> It certainly does implement ntp but does use its own clock discipline
>> algorithm, which is why it converges so fast to having a well disciplined

>Do you mean that it will work with NTP servers or that it actually complies
>with the normative parts of the NTP (v3) specification?  Most versions of
>W32Time do the former, but not the latter.

It will work with ntp servers. I have no idea what "complies with the
normative parts of the NTP (v3) specification" means. It is a completely
rewritten program that implements ntp in terms of the communication over
the net, but implements its own clock discipline routines. 

>> clock (minutes rather than hours or days). In general it does a great job
>> acting either as an ntp client or server. It does not support refclocks

>I certainly have reservations about the initial lock up of ntpd, but
>it has got a lot better since the version referenced in the chrony 
>documentation.  The basic problem it has, is that, if it starts close to
>the correct time and with a saved frequency, it assumes that offsets it
>sees are random errors, and applies the algorithm that gives very
>good performance when locked up, not one designed to get within the
>actual noise levels.

The ntp daemon I am running is 4.2.4 and I have the experimental evidence
on that web page. I.e., I am not referring to anything in the chrony
documentation. I am running experiments.

>> file initially. ntp really should not take a clock which had an initial
>> accuracy of .01usec, and drive it away from lock to an accuracy of 60ms and

>That should only happen if it starts up believing that the frequency error
>differs greatly from the true value, i.e. you do a cold start with a 
>drift file present.  It is optimizing by not re-calibrating the frequency,
>because it believes it already has a correct value.

Nope. Cold start with no drift file present. I looked before and during
the run.

>> then take hours to correct that error, never actually getting back the
>> orginal without a restart of ntp.

>I find that weird, as the control loop should converge to zero, although
>it will take a long time to do so, because it is correcting at rates
>matched to the rates at which new errors are introduced.  I would have
>thought that such a problem was so obvious that any problem would have been

>One thought.  If you are using Linux with a HZ value other than 100, and 
>the kernel discipline, the kernel discipline code will violate the 
>design assumptions, because the people who implemented HZ=250 and HZ=1000
>didn't update the ntpd support code.

>> ?? ntp also used linear regression to estimate the drift. That is then fed

>ntpd uses infinite impulse response filters which make use of z and possibly
>z^2 terms.  A linear regression approach assumes the use of a finite impulse
>response filter using relatively high order z terms (I have a feeling that the
>FIR is non-linear).  The overall response is, of course, IIR, because
>the complete system is a feedback loop.

>> back into the frequency locked loop and the phase locked loop. 
>> chrony uses two mechanism to correct errors, a fast slew (adjtimex tickval

>ntpd maintains separate phase and frequency correction values, and, I seem
>to remember, decays the phase correction if there are no updates.

You mean that it estimates the drift rate of the clock and the offset, and
then alters the frequency of the system clock so as to drive both to zero. 
Chrony does as well, but uses a much stronger offset (phase) correction
than drift (essentially using the tick size in adjtimex to drive the offset
to zero, and the frequency to bring the drift to zero). That means it has to
keep careful track of the prior measurements to compensate them for the
changes in offset and frequency as well.
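As a toy illustration of that division of labour (my own sketch with made-up constants, not chrony's actual code), the two correction channels might be modelled as:

```python
# Toy model: remove the measured offset quickly with a temporary slew
# (like changing the adjtimex tick size) and fold the measured drift
# into a permanent frequency correction.  Constants are illustrative.
class ToyDiscipline:
    def __init__(self):
        self.freq = 0.0   # permanent frequency correction (s/s)
        self.slew = 0.0   # temporary slew rate for phase correction

    def update(self, offset, drift):
        # cancel the drift by adjusting the clock frequency
        self.freq -= drift
        # schedule a fast slew that removes the whole offset in ~10 s
        self.slew = -offset / 10.0

d = ToyDiscipline()
d.update(offset=0.060, drift=50e-6)   # 60 ms fast, clock running 50 ppm fast
```

The point of the sketch is only the separation: the slew term is transient and strong, the frequency term is permanent and gentle.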

>> ) to eliminate offset errors and freq adjust to eliminate drift errors.

>That assumes that you can measure the two values separately, which is not
>really true.

That is where linear regression is used by both. The drift error is the
slope and the offset error is the intercept. 
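The regression idea can be sketched in a few lines (illustrative only; the variable names are mine, not chrony's or ntpd's):

```python
# Estimate clock drift (slope) and offset (intercept) from a series of
# (local_time, measured_offset) samples via ordinary least squares.
def estimate_drift_and_offset(times, offsets):
    n = len(times)
    mean_t = sum(times) / n
    mean_o = sum(offsets) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(times, offsets))
    var = sum((t - mean_t) ** 2 for t in times)
    drift = cov / var                  # slope: frequency error (s/s)
    offset = mean_o - drift * mean_t   # intercept: phase error at t = 0
    return drift, offset

# Example: a clock gaining 50 ppm with a 10 ms initial offset
times = [0, 64, 128, 192, 256]
offsets = [0.010 + 50e-6 * t for t in times]
drift, offset = estimate_drift_and_offset(times, offsets)
```

With noise-free samples the fit recovers the 50 ppm slope and the 10 ms intercept exactly; with real measurements the regression averages the noise down.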

>As noted above I do have some reservations about the use of IIR filters
>in out of lock conditions, but I also pointed out that the chrony author
>appears not to have contributed to this newsgroup to argue the case against
>the ntpd approach, to the extent that regulars here have never heard of 
>chrony, even though it has been around for 9 or so years.  In locked 
>conditions, the ntpd algorithm should be better.

Well, that is of course a question which I am trying to use my experiments
to answer.

>> This is at least in part why it can adjust initially so rapidly. The
>> question is whether some non-linearity or whatever in the clock algorithm
>> is causing the oscillations (effectively narrow band ringing in the
>> algorithm), but the time scale seems wrong. The maxpoll is 7 which is about

>Note that servers tend to be dimensioned assuming the average maxpoll is
>higher, and would consider something locked at maxpoll 6 or less as hostile.

Sure. But since it is my server, I am willing to live with that. The
problem I ran into was severe transmission delays if I used maxpoll 10. 

But this is to some extent irrelevant. The first question is to decide on
the best algorithm, and then to decide with the additional constraint of
not loading the net with queries. Note that the speed both of the network
and of the computers means that what was excessive 10 years ago is now
completely and utterly negligible. I.e., one ntp datagram per minute
on a gigabit net is a very different proposition than on a dial-up or T1
line.
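For scale, the load is easy to put numbers on (assuming the usual 48-byte NTP message in an IPv4/UDP packet; the sizes are approximate):

```python
# Rough load of one NTP exchange per minute on a gigabit link.
packet_bytes = 48 + 28                 # NTP message + IPv4/UDP headers
bytes_per_second = packet_bytes / 60   # one request per minute
gigabit_bytes_per_second = 1e9 / 8
fraction = bytes_per_second / gigabit_bytes_per_second
# fraction works out to roughly 1e-8 of the link capacity
```

About a hundred-millionth of the link, which is the "utterly negligible" point above in numbers.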


>> 2 min, the typical number of data points retained is about 10-20 in the
>> adjustment algorithm, which would be of order 1/2-1 hr, while the
>> oscillations are of order 1.5 hr. 

>That does seem to suggest a shorter natural oscillation period (for the
>offset control loop, even shorter than you suggest), although
>note that the cutoff frequency of the filter is too high to provide
>optimum stability when locked.  With ntpd, if you force maxpoll too
>low, it will oversample, but the FIR filter here doesn't allow that.

Sorry, don't understand. Yes, if the cause is some feedback loop
instability I would expect a shorter time period for it. On the other hand,
it is such a non-linear system that all kinds of things could happen.
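The kind of ringing being discussed can be reproduced with a trivial discrete feedback loop (purely illustrative; this is not ntpd's or chrony's actual loop):

```python
# A discrete-time correction loop: each step we observe the offset and
# subtract gain * offset.  With gain in (0, 1) the offset decays
# monotonically; with gain between 1 and 2 it overshoots and oscillates.
def simulate(gain, steps=20, offset=1.0):
    history = []
    for _ in range(steps):
        history.append(offset)
        offset -= gain * offset
    return history

stable = simulate(0.5)    # decays smoothly toward zero
ringing = simulate(1.8)   # alternates sign: overshoot on every step
```

Whether the real oscillation comes from an effect like this, or from the non-linearities, is exactly what the experiments should distinguish.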

>> chrony has a nice feature of being able to send an
>> echo datagram to the other machine if you want (before the ntp packet), to

>I think that would be considered abusive by most ntp server operators
>(especially those in Australia who pay for bandwidth used).

Oh, come on. That is about 1000 bytes per day extra per user. Since they
probably handle 10^10 bytes or more per day in other traffic, it would
require a HUGE userbase to make a difference.

>> wake up the routers along the way. 
>> I thought an elimination algorithm was used to get rid of the outliers in
>> the ntp algorithm (median filter).

>I did correct this, but NTPv4 only uses median filters for reference
>clocks.  Normal sources use the sample with the lowest overall 
>error bounds, which may have a similar effect.
