[ntp:questions] NTP vs chrony comparison (Was: oscillations in ntp clock synchronization)
David Woolley
david at ex.djwhome.demon.co.uk.invalid
Tue Jan 22 23:08:56 UTC 2008
David L. Mills wrote:
> As for "offset should be much larger than the error", be careful here.
> By error I assume you mean what ntpq rv shows as jitter. The best case
No. By error I meant a measurement that neither ntpd nor chrony can
actually make, namely the difference between the user's concept of
perfect time and the actual time in the software clock in the client. If
you could actually measure it, you would probably characterize it by its
root mean square.
What actually happens is this: say you have a server that you define as
perfect time; you desperately want a measure of how accurate your client
is compared with the server's internal time. People seize on offset as
a measure of that, but if the loop is well locked (which I think
amounts to jitter and RMS offset being essentially the same), offset is
almost entirely made up of measurement error. In reality, the client's
software clock may well be in almost perfect synchronization with the
server's, and it should certainly have an RMS difference that is much less
than the one deduced from offset/jitter. (Systematic errors may result in a
systematic offset, so one is really talking about a jitter-like measure
relative to the unavailable perfect time.)
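To illustrate why offset mostly reflects measurement error, here is a toy
simulation. It assumes, purely for illustration, that the true clock error
and the measurement noise are independent Gaussians with made-up sizes
(10 us and 200 us); in a real locked loop the clock error is correlated
over time, so this only shows the arithmetic, not ntpd or chrony behaviour.

```python
import math
import random


def rms(values):
    """Root mean square of a non-empty sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def simulate(n=10_000, clock_err_sd=10e-6, meas_noise_sd=200e-6, seed=1):
    """Toy model: measured offset = true clock error + measurement noise.

    Both terms are independent zero-mean Gaussians; the standard deviations
    (in seconds) are illustrative assumptions, not ntpd or chrony figures.
    Returns (RMS of true clock error, RMS of measured offset).
    """
    rng = random.Random(seed)
    true_err = [rng.gauss(0.0, clock_err_sd) for _ in range(n)]
    measured = [e + rng.gauss(0.0, meas_noise_sd) for e in true_err]
    return rms(true_err), rms(measured)


true_rms, offset_rms = simulate()
print(f"true clock error RMS: {true_rms * 1e6:.1f} us")
print(f"measured offset RMS:  {offset_rms * 1e6:.1f} us")
```

Because independent terms add in quadrature, the expected measured RMS is
sqrt(10^2 + 200^2), about 200 us, which is barely distinguishable from the
noise alone and says almost nothing about the 10 us clock error.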
The measurement cannot be made using ntpd or chrony alone, because if
they could measure the true error, they could correct for it.
> is when offset is indeed less than jitter; if the error is much larger
> than error, this suggests the frequency has surged and the time
I think you meant the first error to be offset and the second one to be
jitter. I would consider this case to be one where the loop was not
properly locked.
> constant/poll interval needs to be reduced. Watch the poll interval
> behavior in the loopstats data.
I think you really need to address two issues to put this thread to
rest:
- the use of linear regression algorithms on finite histories, as an
alternative to the ntpd algorithm (i.e. the statistician's/scientist's
approach, versus the engineer's);
- the handling of cases where it is obvious to a human that the time
is wrong, but ntpd will take 3000+s to fully correct.
chrony uses linear regression (modified least squares), and it seems to
be getting a reputation for recovering from transients much better than
ntpd. Unruh believes that this is a consequence of the algorithm that
it uses, which means that least-squares techniques are beginning to
be seen as the way to go for time synchronization. I know you
disagree, but you have to convince people of that when chrony seems to
behave much better in the kinds of transients seen in real uses of ntpd.
I wonder if what is really needed is to use linear regression to gain
and regain lock and to use the current ntpd algorithm when you are
reasonably convinced that the loop is locked. At the moment, you do a
two-point linear regression on a cold start or after a step, although
two-point least-squares fits are rather trivial, as they always have zero
residuals if the points are distinct!
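To make the remark about two-point fits concrete, here is a plain ordinary
least-squares fit of offset against time, in which the intercept plays the
role of the phase error and the slope the frequency error. This is a generic
sketch with a hypothetical function name and made-up sample values, not code
from ntpd or chrony.

```python
def fit_line(ts, offsets):
    """Ordinary least-squares fit of offset = phase + freq * t.

    ts are sample times in seconds and offsets are measured clock offsets
    in seconds. Returns (phase, freq, residuals); freq is in s/s.
    """
    n = len(ts)
    if n < 2:
        raise ValueError("need at least two samples")
    t_mean = sum(ts) / n
    y_mean = sum(offsets) / n
    stt = sum((t - t_mean) ** 2 for t in ts)
    if stt == 0:
        raise ValueError("sample times must not all be equal")
    sty = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, offsets))
    freq = sty / stt
    phase = y_mean - freq * t_mean
    residuals = [y - (phase + freq * t) for t, y in zip(ts, offsets)]
    return phase, freq, residuals


# Two distinct points: the line passes through both, so the residuals
# vanish (up to rounding) and reveal nothing about measurement noise.
_, freq2, res2 = fit_line([0.0, 64.0], [0.010, 0.0106])

# A third, slightly inconsistent point leaves non-zero residuals that
# could be used to estimate the noise.
_, freq3, res3 = fit_line([0.0, 64.0, 128.0], [0.010, 0.0107, 0.0110])

print(freq2, max(abs(r) for r in res2))
print(freq3, max(abs(r) for r in res3))
```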
My understanding of chrony, based on high-level documents and a quick
skim of the code, is that:
- it is not NTP compliant because it doesn't seem to implement
normative parts of the NTPv3 specification, like the intersection
algorithm (but many people don't distinguish between SNTP and NTP
because they use the same wire formats);
- the way it works is to maintain a finite history of measurements
and use linear regression (least squares modified to give less
weight to outliers) to calculate a phase and frequency error.
It applies the phase correction as a fast slew (which is seen as
an advantage, because only a fixed frequency correction is left if the
server goes away) and applies the frequency correction continuously.
Once it has applied a correction, it adjusts the historic measurements
to account for its current time and frequency scales.
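The regress-then-re-reference cycle described in the last list item can be
sketched as follows. The weighting here is a generic robust regression that
I have swapped in (iteratively reweighted least squares with Huber weights),
not chrony's actual scheme; the function names and sample data are
hypothetical, and the phase correction is treated as if applied instantly
rather than slewed.

```python
import statistics


def weighted_fit(ts, ys, ws):
    """Weighted least-squares fit of y = a + b * t; returns (a, b)."""
    sw = sum(ws)
    t_mean = sum(w * t for w, t in zip(ws, ts)) / sw
    y_mean = sum(w * y for w, y in zip(ws, ys)) / sw
    stt = sum(w * (t - t_mean) ** 2 for w, t in zip(ws, ts))
    sty = sum(w * (t - t_mean) * (y - y_mean) for w, t, y in zip(ws, ts, ys))
    b = sty / stt
    return y_mean - b * t_mean, b


def robust_fit(ts, ys, k=1.5, iterations=20):
    """Huber-weighted IRLS: samples with large residuals get less weight."""
    ws = [1.0] * len(ts)
    a, b = weighted_fit(ts, ys, ws)
    for _ in range(iterations):
        residuals = [y - (a + b * t) for t, y in zip(ts, ys)]
        # Robust scale: median absolute deviation scaled to a Gaussian sigma.
        med = statistics.median(residuals)
        s = 1.4826 * statistics.median([abs(r - med) for r in residuals])
        if s == 0.0:
            break
        ws = [1.0 if abs(r) <= k * s else k * s / abs(r) for r in residuals]
        a, b = weighted_fit(ts, ys, ws)
    return a, b


def rereference(ts, ys, t_now, dphase, dfreq):
    """Rewrite stored offsets (local minus reference, seconds) after the
    local clock was moved back by dphase at t_now and slowed by dfreq s/s."""
    return [y - dphase - dfreq * (t - t_now) for t, y in zip(ts, ys)]


# Hypothetical history: 16 samples 16 s apart, true phase 2 ms, true
# frequency error 5 ppm, small deterministic noise and one 50 ms outlier.
ts = [16.0 * i for i in range(16)]
ys = [0.002 + 5e-6 * t + 1e-5 * ((i * 7) % 5 - 2) for i, t in enumerate(ts)]
ys[10] += 0.05

a_ols, b_ols = weighted_fit(ts, ys, [1.0] * len(ts))
a, b = robust_fit(ts, ys)

t_now = ts[-1]
phase_now = a + b * t_now  # estimated offset at the latest sample
history = rereference(ts, ys, t_now, phase_now, b)
print(f"plain slope {b_ols * 1e6:.2f} ppm, robust slope {b * 1e6:.2f} ppm")
```

With these numbers the single 50 ms sample pulls the plain fit's slope to
roughly 28 ppm, while the Huber-weighted fit stays close to 5 ppm. After
re-referencing, the stored history is just the residuals of the robust fit,
so refitting it gives approximately zero phase and frequency, which is the
consistency the adjustment of historic measurements is meant to preserve.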
I think there is more to it than this, e.g. adjusting sample rates
and the number of retained samples.
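One generic way to choose how many samples to retain, offered here as an
assumption rather than a description of chrony's code, is a runs test on the
signs of the regression residuals: if the residuals change sign too rarely,
a straight line no longer fits the whole history, so the oldest samples are
dropped. The policy parameters (`min_samples`, `step`, `z_crit`) and the
data are hypothetical.

```python
import math


def ols_residuals(ts, ys):
    """Residuals of an ordinary least-squares line fitted to (ts, ys)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    stt = sum((t - t_mean) ** 2 for t in ts)
    b = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) / stt
    a = y_mean - b * t_mean
    return [y - (a + b * t) for t, y in zip(ts, ys)]


def runs_z(residuals):
    """Wald-Wolfowitz runs test on the signs of the residuals.

    Returns a z-score; a strongly negative value means too few sign changes,
    i.e. the residuals wander systematically.
    """
    signs = [r >= 0.0 for r in residuals]
    n1 = sum(signs)
    n2 = len(signs) - n1
    if n1 == 0 or n2 == 0:
        return -math.inf
    n = n1 + n2
    runs = 1 + sum(1 for p, q in zip(signs, signs[1:]) if p != q)
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1))
    if var <= 0.0:
        return 0.0
    return (runs - mean) / math.sqrt(var)


def choose_history(ts, ys, min_samples=8, step=4, z_crit=2.0):
    """Return the index of the oldest sample worth keeping.

    Hypothetical policy: drop the oldest `step` samples at a time until a
    straight-line fit of the rest passes the runs test, never keeping fewer
    than `min_samples`.
    """
    start = 0
    while len(ts) - start > min_samples:
        if runs_z(ols_residuals(ts[start:], ys[start:])) >= -z_crit:
            break
        start = min(start + step, len(ts) - min_samples)
    return start


# Hypothetical history: 32 samples 16 s apart. The oldest 16 show no
# frequency error; then the error jumps to 10 ppm (say, a temperature
# change). Alternating +/-1 us stands in for measurement noise.
ts = [16.0 * i for i in range(32)]
ys = [(1e-6 if i % 2 else -1e-6) + (1e-5 * (t - ts[16]) if i >= 16 else 0.0)
      for i, t in enumerate(ts)]

start = choose_history(ts, ys)
print(f"keep the newest {len(ts) - start} of {len(ts)} samples")
```

The appeal of this kind of test is that it needs no assumed noise level: it
only asks whether the residuals look like random scatter around the line.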
Because it is significantly different in principle from ntpd, it is not
entirely clear that ntpd concepts like loop time constants are explicit
in the chrony model, although they might be implicit in things like the
period over which samples are currently being retained.
A problem that Unruh is having is that some of the answers he is getting
seem to represent blind faith in ntpd without any knowledge of
alternative approaches.