[ntp:questions] Extracting ntpq like information programmatically
unruh at invalid.ca
Tue Apr 2 16:28:50 UTC 2013
On 2013-04-02, David Woolley <david at ex.djwhome.demon.invalid> wrote:
> unruh wrote:
>> needs to remember only the last couple of measurements. It is a Markovian
> Which is a statistics term. The biggest difference between chrony and
> ntpd, is that chrony seems to have been designed using statistics
> algorithms (linear filters), and ntpd using engineering ones (digital
> filters and PI controllers). There is a culture clash at work here.
Agreed. (except both use digital filters and linear filters.)
chrony tries to use history to determine the best policy, and to throw
away the history if it becomes a bad predictor of current behaviour.
ntpd uses only the latest measurement, with some heuristics as to whether
or not the latest measurement is "good", to determine future behaviour.
Often, although not always, history is a better predictor of future
behaviour.
Both have highly non-linear behaviour at times: ntpd when it jumps the
time because it has deviated too badly, when it truncates the PPM
of the clock correction, and when it throws away measurements because
the delay is longer than some other measurement's delay;
chrony when it changes the time period over
which it calculates the output (changes the taps in the FIR).
Note that chrony also uses some IIR in its algorithm (the correction
applied is some fraction of the correction implied by the FIR so it has
an infinite response component.)
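
To make the FIR/IIR distinction concrete, here is a minimal sketch in
Python. It is not chrony's or ntpd's code: the function names, the 4-tap
window and the fraction 0.5 are all assumptions chosen for illustration.
The first function is a plain moving average (a FIR), the second removes
only a fraction of the remaining error at each step, which is what gives
the infinite tail.

```python
from collections import deque


def fir_impulse_response(taps, steps):
    """Output of a moving average over `taps` inputs, fed one unit impulse."""
    window = deque([0.0] * taps, maxlen=taps)
    out = []
    for k in range(steps):
        window.append(1.0 if k == 0 else 0.0)
        out.append(sum(window) / taps)
    return out


def partial_correction_response(fraction, steps):
    """Residual error when only `fraction` of it is removed at each step."""
    residual = 1.0
    out = []
    for _ in range(steps):
        residual -= fraction * residual
        out.append(residual)
    return out


# Hypothetical parameters: 4 taps, half of the implied correction applied.
fir = fir_impulse_response(taps=4, steps=8)
iir = partial_correction_response(fraction=0.5, steps=8)
```

The moving average forgets the impulse completely once it has left the
window, while the fractional correction shrinks the residual
geometrically and never reaches exactly zero.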
> ntpd is definitely an infinite impulse response system. I think chrony
> is a finite impulse response one. FIR ones will fully accommodate a real
> change in a finite time, but will also do so for a persistent error.
Everything will do so for a persistent error, since there is no way of
differentiating an error from a change in time.
Their treatment of short term errors is very different. ntpd for example
uses delay as a proxy for "error" and throws away perfectly good
measurements (80% of them). Chrony on the other hand uses all
measurements equally. Should one for example have chrony use the delay
(or some function of the delay) as a weighting function for the least
squares fit? Can one use the delay-offset relation to improve the
fitting procedure? (See www.theory.physics.ubc.ca/chrony/chrony.html or
David Mills' book to see that large offsets are often strongly
correlated with the delay.) Unfortunately for any simple scheme, that
correlation can be (randomly) positive or negative. Is there a better
algorithm for chrony which could use those correlations to improve the
behaviour? chrony can do it, since it keeps a history, and can thus find
those coefficients. Roughly,
offset = s + x + theta*(delay/2)
where s is the true offset, x is a random variable roughly gaussian
distributed about zero, theta is a random variable with a dominant value of 1 or
-1 but can sometimes have intermediate values, and whose distribution
need not be unbiased, and delay is a random variable with a very
non-gaussian distribution (long tails).
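
For anyone who wants to experiment with that rough model, a small
generator of synthetic (offset, delay) pairs might look like the sketch
below. Everything numeric in it is an assumption made for illustration:
the Pareto shape for the long-tailed delay, the minimum delay, the
gaussian width, and the probabilities used for theta. The function name
simulate_samples is hypothetical.

```python
import random


def simulate_samples(n, s=0.0, sigma=1e-4, min_delay=0.02,
                     p_plus=0.5, p_mixed=0.1, seed=0):
    """Draw n (offset, delay) pairs from offset = s + x + theta*(delay/2)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # x: roughly gaussian noise about zero (width is an assumption).
        x = rng.gauss(0.0, sigma)
        # delay: Pareto variates are >= 1 and heavy-tailed (assumed shape 3).
        delay = min_delay * rng.paretovariate(3.0)
        # theta: mostly +1 or -1, occasionally an intermediate value.
        if rng.random() < p_mixed:
            theta = rng.uniform(-1.0, 1.0)
        else:
            theta = 1.0 if rng.random() < p_plus else -1.0
        samples.append((s + x + theta * delay / 2.0, delay))
    return samples


samples = simulate_samples(200, seed=1)
```

Setting p_plus away from 0.5 gives the biased theta distribution
mentioned above, which is the case where a plain mean of the offsets
goes wrong.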
Is there some way of being able to use that information to get a better
estimate of s than just the mean (or median) of lots of measurements? Looking at the
scatter plots, it would seem that I could do so by eye, so I should be
able to develop an algorithm as well.
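
One candidate, sketched here only as a starting point, is the weighted
least squares fit asked about above: fit offset against time, but weight
each sample by some decreasing function of its delay. The weight
(min_delay/delay)**power is purely an assumption, as is the function
name; this is not what chrony does. The fit needs at least two distinct
times and positive delays.

```python
def weighted_line_fit(times, offsets, delays, power=2.0):
    """Fit offset ~ intercept + slope*t, weighting by (min_delay/delay)**power."""
    d_min = min(delays)
    w = [(d_min / d) ** power for d in delays]
    sw = sum(w)
    # Weighted means of time and offset.
    t_bar = sum(wi * t for wi, t in zip(w, times)) / sw
    o_bar = sum(wi * o for wi, o in zip(w, offsets)) / sw
    # Weighted variance of time and covariance of time with offset.
    stt = sum(wi * (t - t_bar) ** 2 for wi, t in zip(w, times))
    sto = sum(wi * (t - t_bar) * (o - o_bar)
              for wi, t, o in zip(w, times, offsets))
    slope = sto / stt
    intercept = o_bar - slope * t_bar
    return intercept, slope


# Made-up data: the true offset is zero, the last sample is a long-delay outlier.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
offsets = [0.0, 0.0, 0.0, 0.0, 10.0]
delays = [1.0, 1.0, 1.0, 1.0, 10.0]
_, slope_unweighted = weighted_line_fit(times, offsets, delays, power=0.0)
_, slope_weighted = weighted_line_fit(times, offsets, delays, power=2.0)
```

With power=0 this reduces to the ordinary fit, and the single long-delay
sample drags the slope far from zero; with power=2 its influence is
small but not discarded. It does nothing about the sign of theta, so it
only tames the tails rather than using the correlation.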
ntpd's approach is to only keep the measurement with the smallest delay
of the past 8 measurements. That is very profligate of measurements, and
effectively increases the sampling period by a factor of 8. Surely if
all of the delays are say within a factor of 1.5 of the shortest delay,
they all convey useful information and could be used.
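
The two selection rules can be put side by side in a few lines. This is
a sketch of the idea only, not ntpd's clock filter code; the function
names and the sample values are invented, and the factor 1.5 is the one
suggested above.

```python
def min_delay_sample(window):
    """Keep only the (offset, delay) sample with the smallest delay."""
    return min(window, key=lambda sample: sample[1])


def samples_within_ratio(window, ratio=1.5):
    """Keep every sample whose delay is within `ratio` of the shortest delay."""
    d_min = min(delay for _, delay in window)
    return [(offset, delay) for offset, delay in window
            if delay <= ratio * d_min]


# Eight made-up (offset, delay) measurements, in seconds.
window = [
    (0.001, 0.020), (0.002, 0.025), (0.010, 0.080), (-0.001, 0.022),
    (0.000, 0.021), (0.004, 0.031), (0.001, 0.024), (0.003, 0.029),
]
best = min_delay_sample(window)
kept = samples_within_ratio(window, ratio=1.5)
```

On this window the first rule keeps one sample of eight, the second
keeps six and drops only the two with clearly inflated delay.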
chrony essentially just fits everything with a least squares fit, but
that gives the tails far too much power. (You can tell it to throw away
all measurements whose delay is larger than some number times the shortest
delay over the history, which does lessen that problem.) But again,
is that the best use of the data? Throwing away data always feels
wrong, especially if that data is so precious-- with poll of 10 one is
only collecting data thrice every hour. It means that ntpd is
insensitive to changes on a timescale of less than about 3 hours and is
one of the key reasons why ntpd is so slow to adapt to changes (eg
If one is going to throw away data, what is the "best" tradeoff? That
will probably depend on exactly what you want out of the data.
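
For reference, the sampling arithmetic behind the poll 10 figures above
can be checked in a few lines, given that the poll value is the base-2
logarithm of the interval in seconds. The function name is hypothetical
and the factor of 8 is the one-in-eight selection described earlier.

```python
def sampling_figures(poll, kept_one_in=8):
    """Return (interval in s, samples per hour, hours per kept sample)."""
    interval_s = 2 ** poll
    samples_per_hour = 3600 / interval_s
    hours_per_kept_sample = kept_one_in * interval_s / 3600
    return interval_s, samples_per_hour, hours_per_kept_sample


interval_s, per_hour, hours_per_kept = sampling_figures(10)
```

At poll 10 that is one measurement every 1024 s, about 3.5 per hour, and
a bit over two hours per kept sample if only one in eight is used, which
is the same order as the figure quoted above.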