[ntp:questions] NTP vs chrony comparison (Was: oscillations in ntp clock synchronization)

David Woolley david at ex.djwhome.demon.co.uk.invalid
Tue Jan 22 23:08:56 UTC 2008


David L. Mills wrote:

 > As for "offset should be much larger than the error", be careful here.
 > By error I assume you mean what ntpq rv shows as jitter. The best case

No. By error I meant a measurement that neither ntpd nor chrony can 
actually make, namely the difference between the user's concept of 
perfect time and the actual time in the software clock in the client. If 
you could actually measure it, you would probably characterize it by the 
root mean square of this.

What actually happens is that, given a server that you define as 
perfect time, you desperately want a measure of how accurate your client 
is compared with the server's internal time.  People seize on offset as 
a measure of that, but, if the loop is well locked, which I think 
amounts to jitter and RMS offset being essentially the same, offset is 
almost entirely made up of measurement error.  In reality the client's 
software clock may well be in almost perfect synchronization with the 
server's and certainly should have an RMS difference that is much less 
than that deduced from offset/jitter.  (Systematic errors may result in a 
systematic offset, so one is really talking about a jitter-like measure, 
relative to the, unavailable, perfect time.)

The measurement cannot be made using ntpd or chrony alone, because if 
they could measure the true error, they could correct for it.
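
To illustrate the point, here is a toy simulation (my own sketch, nothing 
taken from ntpd or chrony).  It assumes the true client error and the 
measurement noise are independent and Gaussian, and the names and the 
values true_sigma and noise_sigma are invented purely for illustration. 
The reported offset is modelled as true error plus measurement noise, so 
its RMS is dominated by the noise when the loop is well locked.

```python
import math
import random


def rms(values):
    """Root mean square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def simulate(n=2000, true_sigma=20e-6, noise_sigma=500e-6, seed=1):
    """Return (RMS true error, RMS reported offset) in seconds.

    true_sigma and noise_sigma are assumed values, not measurements.
    """
    rng = random.Random(seed)
    # The quantity the user cares about, which neither daemon can observe.
    true_err = [rng.gauss(0.0, true_sigma) for _ in range(n)]
    # What the daemon reports: true error plus network/measurement noise.
    measured = [e + rng.gauss(0.0, noise_sigma) for e in true_err]
    return rms(true_err), rms(measured)


true_rms, offset_rms = simulate()
print(f"RMS true error:      {true_rms * 1e6:8.1f} us")
print(f"RMS reported offset: {offset_rms * 1e6:8.1f} us")
```

In this model the reported offset says almost nothing about how close the 
client really is to the server.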

 > is when offset is indeed less than jitter; if the error is much larger
 > than error, this suggests the frequency has surged and the time

I think you meant the first error to be offset and the second one to be 
jitter.  I would consider this case to be one where the loop was not 
properly locked.

 > constant/poll interval needs to be reduced. Watch the poll interval
 > behavior in the loopstats data.

I think you really need to address two issues to put this thread to
rest:

- the use of linear regression algorithms on finite histories, as an
   alternative to the ntpd algorithm (i.e. the statistician's/scientist's
   approach, versus the engineer's);

- the handling of cases where it is obvious to a human that the time
   is wrong, but ntpd will take 3000+s to fully correct.

chrony uses linear regression (modified least squares) and it seems to 
be getting a reputation for recovering from transients much better than 
ntpd.  Unruh believes that this is the consequence of the algorithm that 
it uses, which means that least squares type techniques are beginning to 
be seen as the way to go for time synchronization.  I know you 
disagree, but you have to convince people of that when chrony seems to 
behave much better in the transients seen in real uses of ntpd.

I wonder if what is really needed is to use linear regression to gain 
and regain lock and to use the current ntpd algorithm when you are 
reasonably convinced that the loop is locked.  At the moment, you do a 
two point linear regression on a cold start, or after a step, although 
two point least squares fits are rather trivial as they always have zero 
variance if the points are distinct!
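
The two point case can be checked with a few lines of Python.  This is a 
plain ordinary least squares fit written for illustration only (the 
function name and the sample values are made up); it is not how ntpd 
computes its initial frequency estimate.

```python
def least_squares_line(points):
    """Ordinary least squares fit of x = intercept + slope * t.

    points is a list of (t, x) pairs with at least two distinct t values.
    """
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    stt = sum((t - mean_t) ** 2 for t, _ in points)
    stx = sum((t - mean_t) * (x - mean_x) for t, x in points)
    slope = stx / stt  # fails if all t are equal, hence "distinct"
    intercept = mean_x - slope * mean_t
    residuals = [x - (intercept + slope * t) for t, x in points]
    return slope, intercept, residuals


# Two offset measurements (seconds) taken 64 s apart: hypothetical values.
slope, intercept, residuals = least_squares_line([(0.0, 0.010), (64.0, 0.013)])
print(slope, intercept, residuals)
```

The line passes through both points, so the residuals vanish (up to 
rounding) and the fit gives no indication of how noisy the two 
measurements were.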


My understanding of chrony, based on high-level documents and a quick 
skim of the code, is that:

- it is not NTP compliant because it doesn't seem to implement
   normative parts of the NTPv3 specification, like the intersection
   algorithm (but many people don't distinguish between SNTP and NTP
   because they use the same wire formats);

- the way it works is to maintain a finite history of measurements
   and to use linear regression (least squares modified to give less
   weight to outliers) and to calculate a phase and frequency error.

   It applies the phase correction as a fast slew (which is seen as an
   advantage, because only a fixed frequency correction is left if the
   server goes away) and the frequency correction continuously.

   Once it has applied a correction, it adjusts the historic measurements
   to account for its current time and frequency scales.

   I think there is more to it than this, e.g. adjusting sample rates
   and the number of retained samples.
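
The scheme as I have described it can be sketched roughly as follows. 
This is emphatically not chrony's code: the re-weighting rule, the 
function names and the sample data are my own stand-ins, and it ignores 
the adjustment of sample rates and history length.  Times are in seconds 
relative to "now" (so t <= 0) and offsets are in seconds.

```python
def weighted_fit(samples, weights):
    """Weighted least squares fit of offset = phase + freq * t."""
    w_sum = sum(weights)
    mean_t = sum(w * t for w, (t, _) in zip(weights, samples)) / w_sum
    mean_x = sum(w * x for w, (_, x) in zip(weights, samples)) / w_sum
    stt = sum(w * (t - mean_t) ** 2 for w, (t, _) in zip(weights, samples))
    stx = sum(w * (t - mean_t) * (x - mean_x)
              for w, (t, x) in zip(weights, samples))
    freq = stx / stt
    phase = mean_x - freq * mean_t  # fitted offset at t = 0, i.e. now
    return phase, freq


def robust_fit(samples, rounds=5):
    """Iteratively re-weighted fit that gives less weight to outliers.

    The weighting rule is an assumption chosen for simplicity.
    """
    weights = [1.0] * len(samples)
    for _ in range(rounds):
        phase, freq = weighted_fit(samples, weights)
        resid = [abs(x - (phase + freq * t)) for t, x in samples]
        # Scale residuals by their median (upper median for even counts).
        scale = sorted(resid)[len(resid) // 2] or 1e-9
        weights = [1.0 / (1.0 + (r / (3.0 * scale)) ** 2) for r in resid]
    return weighted_fit(samples, weights)


def rebase_history(samples, phase, freq):
    """Re-express stored offsets on the corrected time and frequency scale.

    Simplification: only the offsets are adjusted, not the timestamps.
    """
    return [(t, x - (phase + freq * t)) for t, x in samples]


# Hypothetical history: 5 ms phase error now, 2 ppm frequency error,
# eight samples 64 s apart, one of them a 50 ms outlier.
history = [(-64.0 * i, 5e-3 + 2e-6 * (-64.0 * i)) for i in range(8)]
history[3] = (history[3][0], history[3][1] + 50e-3)

phase, freq = robust_fit(history)
adjusted = rebase_history(history, phase, freq)
print(phase, freq)
```

After the rebase the well-behaved samples sit close to zero and only the 
outlier stands out, which is what lets the next fit start from the 
corrected time and frequency scales.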

Because it is significantly different in principle from ntpd, it is not 
entirely clear that ntpd concepts like loop time constants are explicit 
in the chrony model, although they might be implicit in things like the 
period over which samples are currently being retained.

A problem that Unruh is having is that some of the answers he is getting 
seem to represent blind faith in ntpd without any knowledge of 
alternative approaches.



