[ntp:questions] Red Hat vote for chrony

William Unruh unruh at invalid.ca
Fri Dec 5 16:53:30 UTC 2014


On 2014-12-05, Charles Swiger <cswiger at mac.com> wrote:
> On Dec 5, 2014, at 3:42 AM, William Unruh <unruh at invalid.ca> wrote:
>> On 2014-12-05, Charles Swiger <cswiger at mac.com> wrote:
>>> On Dec 4, 2014, at 7:00 PM, William Unruh <unruh at invalid.ca> wrote:
>>> [ ... ]
>>>> Actually Miroslav Lichvar IS an expert. He is the chrony maintainer, has
>>>> done a lot of testing comparing chrony to ntpd ( which showed that
>>>> chrony controlled the clock a factor of 2 to 20 times better than ntpd
>>>> did), and is with Redhat. 
>>> 
>>> The data I've seen for chrony suggests it handles broken clocks such as
>>> commonly found in VMs better than ntpd does.  The tradeoff is that
>>> chrony prioritizes chasing the reference time over first trying to ensure
>>> that the local clock frequency is stable, whereas ntpd really wants
>>> to make sure that the local clock counts 3600 seconds in each hour of
>>> wall-clock time and then worries about slewing the local time to match
>>> up with the reference time.
>> 
>> Nope. ntp changes the rate of the local clock to correct offsets. That
>> is all it does. It does not make the rate correct, and then the offset.
>
> If you don't know what the rate of the local clock is, how can you figure
> out the proper slewing rate?

ntpd does not care what the rate is. If it sees an offset, it alters the
rate to decrease that offset. It keeps doing this until the offset is
gone. It is designed so that the overshoot is very small (critical
damping). 
>
> One can obviously keep slewing the clock towards the reference time even
> without having a good idea of the intrinsic drift, but such an approach
> will tend to overshoot the ideal frequency adjustment and "ring" or
> oscillate back and forth.

Not if it is properly designed (critical damping). Mills did a good job
of designing a feedback system. As he says it is standard engineering
practice.

>
>> It simply alters the rate at any time so as to decrease the offset, and
>> it does this measurement by measurement. It has no memory. 
>
> This is obviously false.  What do you think /etc/ntp.drift is?

It is the offset from the standard rate of the clock. That memory is
never used except on bootup. ntpd has to know how much to alter the
drift.

>
> Furthermore, I distinctly recall you complaining that ntpd's clock filter
> "throws away 7 out of 8 polls"; even though you are mis-interpreting the
> situation, you at least acknowledged that ntpd is keeping track of prior
> data.

But it does not use them. It looks at the delays, and chooses the
shortest delay from the last 8 offset measurements. The scheme is a
simple feedback system, which "remembers" the current drift and
measures the current offset. That by "current" it means "the offset with
the lowest delay of the last 8 measurements" does not alter the design.
It is a Markovian system. It does not remember what the drift or the
offset was 15 measurements ago, for example. 

If you really want to understand ntpd, read the documentation or Mills's
book. Don't try picking apart a clearly very short description.
    delta r_i = alpha (remotetime_i - localtime_i)
is the equation used, where delta r_i is the change in the rate of the
clock, remotetime_i is the time of the remote clock as measured by the
ntp exchange procedure, and localtime_i is the time on the local clock
when that remote time was measured. alpha is a constant chosen to make
the feedback loop close to critically damped.

In order to make the system resistant to varying asymmetric travel
times, they introduced another step. The "i" in the above is not every
measurement, but the "best of 8" (i.e. the one with the shortest delay,
under the assumption that that will be the one with the least asymmetry
in the travel path). But it also means that 7 out of 8 measurements
(87.5%) are simply discarded, never used to determine the clock offsets
or delays or anything. So ntpd "loads" the network with 8 times as many
queries as it actually uses. If there are large variations in the delays
on the two paths, this is probably as good as you can do. If there are
not, this is a very wasteful procedure and means you throw away data
that you could have used to better control your clock.

Chrony remembers up to 64 past offsets and rates, and uses those in a
linear regression to get the best estimate of the current offset and
rate. It changes how long a memory it uses by estimating whether or not
a linear fit is a good approximation to that data series, and throws
away old measurements until it does feel that a linear fit is a good
estimate. It does not try to do asymmetric path corrections, which might
be a drawback of chrony in some situations (my measurements indicate
it is not, at least on the systems I have looked at, but there may be
situations in which it might help).
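
A rough sketch of that idea in Python, under stated assumptions. It is
not chrony's code: the function names are invented, and the "is a line
a good fit" test used here (worst residual within a tolerance) is a
stand-in I chose for illustration, not chrony's actual criterion.

```python
def fit_line(times, offsets):
    """Ordinary least-squares fit: offset ~ intercept + rate * time."""
    n = len(times)
    t_mean = sum(times) / n
    o_mean = sum(offsets) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (o - o_mean) for t, o in zip(times, offsets))
    rate = sxy / sxx
    return o_mean - rate * t_mean, rate


def estimate(times, offsets, tolerance, max_samples=64, min_samples=3):
    """Drop the oldest samples until a straight line fits what is left.

    Returns (offset at the newest sample, rate, number of samples used).
    """
    times = list(times)[-max_samples:]
    offsets = list(offsets)[-max_samples:]
    start = 0
    while True:
        t, o = times[start:], offsets[start:]
        intercept, rate = fit_line(t, o)
        worst = max(abs(oi - (intercept + rate * ti)) for ti, oi in zip(t, o))
        if worst <= tolerance or len(t) <= min_samples:
            return intercept + rate * t[-1], rate, len(t)
        start += 1  # forget the oldest measurement and try again


if __name__ == "__main__":
    # The clock runs true for a while, then starts drifting at 1 unit/step.
    ts = list(range(10))
    offs = [0, 0, 0, 0, 0, 1, 2, 3, 4, 5]
    print(estimate(ts, offs, tolerance=1e-9))
```

In the example the rate changes partway through the series, so the older
samples are thrown away and the fit is made on the recent, linear part.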

Because of the clock selection algorithm, and the conservative choice
for alpha, ntpd responds very slowly to changes in the rates due to
temperature changes, for example. Chrony is much faster, and I believe
this is one of the reasons for the better control that chrony gives.

>
> Or did someone else write Message-id: <Hh0tu.8666$tR7.4952 at fx22.iad>?
>
>> Chrony uses the last N measurements to make the best estimate of the
>> rate and the offset that it can. It sets the rate to the best estimate
>> of the true rate, and then adds a rate correction to decrease the
>> offset.
>
> Yes, that's a reasonable approach.
>
>>> It's informative to note that the chrony docs (section 5.3.4) recommend
>>> using minpoll=2 and maxpoll=4!  With those settings chrony will send 225
>> 
>> that is for refclocks. 
>> 
>>> polls per hour, versus 3.5 polls per hour for ntpd with its maxpoll=10.
>>> Assuming arguendo the claim of "a factor of 20 times better" is true, I
>>> still don't care to pay the price of a factor of 64 times more network polls.
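
For reference, the arithmetic behind those figures can be checked with a
short Python sketch. It assumes the usual NTP convention that a poll
value p means an interval of 2**p seconds; the function name is made up.

```python
def polls_per_hour(poll_exponent):
    """Polls per hour when the poll interval is 2**poll_exponent seconds."""
    return 3600 / 2 ** poll_exponent


if __name__ == "__main__":
    fast = polls_per_hour(4)    # maxpoll 4  -> one poll every 16 s
    slow = polls_per_hour(10)   # maxpoll 10 -> one poll every 1024 s
    print(fast, slow, fast / slow)
```
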
>> 
>> You may set the minpoll and maxpoll to whatever you want. But chrony
>> does not advocate a maxpoll of 4 over the network. Read again. 
>
> I suggest following your own advice before trying to correct others.
>
> http://chrony.tuxfamily.org/manual.html#How-can-I-improve-the-accuracy-of-the-system-clock-with-NTP-sources_003f
>
> "5.3.4 How can I improve the accuracy of the system clock with NTP sources?
>
> Select NTP servers that are well synchronised, stable and close to your network. It's better to use more than one server, three or four is usually recommended as the minimum, so chronyd can detect falsetickers and combine measurements from multiple sources.
> [ ... ]
> The optimal polling interval depends on many factors, this includes the ratio between the wander of the clock and the network jitter (sometimes expressed in NTP documents as the Allan intercept), the temperature sensitivity of the crystal oscillator and the maximum rate of change of the temperature. An example of the directive for a server located in the same LAN could be
>  	
> server ntp.local minpoll 2 maxpoll 4 polltarget 30"
>
> The docs aren't talking about reference clocks here, they are talking about
> polling another machine over the LAN.
>
>>> Furthermore-- unfortunately-- I have yet to see data on the accuracy of
>>> chrony measured against high-quality TCXO or Rb/Cs reference clocks,
>>> such as the PRS-10 that PHK used:
>>> 
>>> http://www.thinksrs.com/products/PRS10.htm
>>> 
>>> ...the current version of which claims to have a +/- 10 ns accuracy for
>>> the PPS signal.  Instead, most of the data I've seen provided for chrony
>>> has involved comparing local clock timestamps to the reference timesource
>>> or to some other network timesource, without detailed information as to the
>>> accuracy of those references.
>> 
>> Nope. I have done measurements on the net where I compared the net to a
>> gps PPS source. The computer PPS has an accuracy of about 1microsec and
>> that can be compared to the network time.
>
> About a microsecond is two orders of magnitude worse than ~10 nanoseconds.
> As I said before, I'd be happy to review data for chrony taken against that
> quality of reference.
>
>> I get an increase of about 2-3
>> times better than ntp. Lichvar got something like 20 times better using
>> a PPS against a local high accuracy time source. The main reason seems
>> to be that chrony is far more agile-- it catches temperature drifts
>> much more quickly than ntpd does, for the same poll.
>
> More agile is almost always less stable.  I'd rather my timekeeping software
> figure out the average intrinsic drift averaged over long time intervals
> such that it keeps an average frequency correction rather than chasing
> short-lived drifts due to thermals.  But then, I also make sure that my
> timeservers are running in temperature-controlled environments so that
> such daily drifts you mention are minimized.
>
>> Remember that ntpd
>> throws out 85% of the measurements it makes, in order to try (poorly) to
>> compensate for network up-down inequalities. Sometimes, if the network
>> is very variably asymmetric that can improve results. Usually it simply
>> throws away valuable measurements.
>
> Ah, I thought this claim would reappear.  Note that you're contradicting
> your earlier claim that ntpd doesn't remember anything before the last poll.
>
>> ntpd is also designed to act very
>> slowly to changes in rate. It is a design philosophy Mills defends
>> strongly, with the mantra of stability.
>
> Agreed.
>
>> Chrony, because of its long term
>> memory, recognizes and responds to rate variations much more quickly,
>> with no sign of instability. It would be good to have a detailed
>> analysis of the chrony algorithm to see if there are corner cases where
>> chrony does worse by going unstable. ntpd is simple to analyse (if one
>> ignores the extreme non-linearity of the "jump" if the offset is greater
>> than 128ms.)
>
> Also agreed.  It will be interesting to see whether chrony starts including
> more sanity checking after being exposed to a wider range of use-cases
> from Red Hat's adoption.
>
>> If you would like to compare chrony vs ntp with your PRS10, please do
>> so. Otherwise look at some of the numbers I have on
>> www.theory.physics.ubc.ca/chrony
>
> Lots of RRD graphs; no signs of measurements taken against a sub-microsecond
> reference, though.
>
>>> Of course you're not going to see much delta between the local clock and the
>>> reference that you're polling every 16 seconds.  Without measuring the
>>> local clock against some other clock or oscillator which is known to be
>>> accurate to sub-microsecond levels, one doesn't have the data needed to draw
>>> conclusions about the actual timekeeping precision.
>> 
>> Actually not true.
>
> What isn't true?
>
>> How do you think the standards of the various
>> countries determine the accuracy of their clocks. They have no better
>> time standard to compare them with. And yet they confidently will quote
>> accuracy figures for their clocks. Study that.
>
> For almost all of human history, the sun or the "fixed celestial heavens"
> have provided the most accurate time reference available.  Even today,
> we add (or subtract, in theory) leap seconds in order to keep UTC and UT1
> aligned to better than a second courtesy of IERS.
>
> Yes, the USNO, CERN, and so forth now do have sufficiently high quality
> atomic clocks which have better timekeeping precision than celestial
> observations.
>
> Such a point is orthogonal to the notion of how to measure a local clock,
> unless of course one is using those high-quality atomic clocks as the
> reference to measure your local clock against.
>
> Regards,
