[ntp:questions] ARRGH!!! I woke up to a 50 SECOND clock error.

unruh unruh at invalid.ca
Fri Mar 16 23:55:32 UTC 2012


On 2012-03-16, Charles Elliott <elliott.ch at verizon.net> wrote:
> On the subject of accuracy, has anyone ever really looked at NTPD's offset
> filtering mechanism?  What it does now is sort the last (about 50) offsets
> from smallest to largest and then prunes the smallest or largest, depending
> on which is further away from the average, until there are only N (I forget
> what N is) offset observations left.

That is for refclocks, and the count is usually about 16 (poll 4, i.e. a
16-second interval sampled once per second), not 50. N is about 60% of
the total.
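
A rough sketch of that kind of trimming, written from the description
above rather than copied from ntpd's source: sort the samples, repeatedly
drop whichever end lies further from the middle of what remains, stop
when about 60% are left, and average the survivors. The function name,
the keep_fraction parameter and the sample values are made up for
illustration.

```python
def trimmed_offset(samples, keep_fraction=0.6):
    """Average the samples left after pruning the ends furthest from the median.

    keep_fraction is an assumed knob standing in for the "about 60%" figure.
    """
    if not samples:
        raise ValueError("need at least one sample")
    s = sorted(samples)
    keep = max(1, round(len(s) * keep_fraction))
    lo, hi = 0, len(s)                      # current window is s[lo:hi]
    while hi - lo > keep:
        middle = s[(lo + hi) // 2]          # (upper) median of the window
        if s[hi - 1] - middle > middle - s[lo]:
            hi -= 1                         # high end is the worse outlier
        else:
            lo += 1                         # low end is the worse outlier
    window = s[lo:hi]
    return sum(window) / len(window)


# 15 well-behaved offsets (seconds) plus one large spike.
samples = [0.001 * i for i in range(-7, 8)] + [0.5]
print("plain mean:    ", sum(samples) / len(samples))
print("trimmed offset:", trimmed_offset(samples))
```

The spike drags the plain mean well away from zero, while the trimmed
result stays within the cluster of good samples.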

>
> There may be at least two problems with this filtering mechanism.  First,
> there is no apparent theory behind it; I have never seen such a crude filter

The theory is that there are two noise mechanisms, one approximately
gaussian with small standard deviation and one much broader but rarer.
I.e., occasionally you will get "popcorn" spikes. The median is the
optimal estimator if you want to minimize the sum of |y-ybar|, just as
the mean is the optimal estimator for the sum of (y-ybar)^2. The
|y-ybar| measure is less sensitive to large deviations.
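
A small numerical check of that claim, using made-up offsets with one
spike: scan a grid of candidate estimates and see which one minimizes
each cost. The data and the grid are assumptions chosen so the answer
lands exactly on a grid point.

```python
from statistics import mean, median


def abs_cost(data, c):
    """Sum of absolute deviations from candidate c."""
    return sum(abs(y - c) for y in data)


def sq_cost(data, c):
    """Sum of squared deviations from candidate c."""
    return sum((y - c) ** 2 for y in data)


offsets = [1.0, 2.0, 3.0, 4.0, 100.0]            # last value is a spike
candidates = [i / 10 for i in range(0, 1001)]    # 0.0, 0.1, ..., 100.0

best_abs = min(candidates, key=lambda c: abs_cost(offsets, c))
best_sq = min(candidates, key=lambda c: sq_cost(offsets, c))

print("minimizes sum |y-c|:  ", best_abs, " median:", median(offsets))
print("minimizes sum (y-c)^2:", best_sq, " mean:  ", mean(offsets))
```

The absolute-deviation minimizer sits at the median and ignores how far
out the spike is; the squared-deviation minimizer sits at the mean and
gets pulled toward it.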

> that does not take into account any information inherent in the data.  On
> the other hand, what I don't know about filters would fill all 24 volumes of
> an encyclopedia. 

Sure it does. See above. 

>
> Second, we know that each offset observation should have arrived about one
> second after the previous one, yet NTPD does not take advantage of that
> knowledge.  There are filters, such as the Kalman filter that uses a
> Bayesian estimation approach to predict the next observation and adjusts it
> according to the prediction when it arrives, that do take advantage of
> previous observations.  Demonstrations of the Kalman filter on the Internet
> show almost spectacular results.  I used a Kalman filter in my clock
> simulation program and the results seemed pretty good.  However, there are
> numerical analysis considerations to programming a Kalman filter as the sums
> and products of observations can become large in a program that runs
> infinitely long.  Also, choosing the parameters of a Kalman filter is
> apparently a black art.

Recall that ntpd was designed to work both on GPS PPS input and on clock
settings sent over a bush telegraph. Those have very different noise
structures.
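
For reference, the predict-then-correct idea in the quoted paragraph
looks like this in its simplest scalar form. This is only a sketch under
strong assumptions (a single slowly drifting offset, fixed noise
variances); the names and numbers are invented for illustration, it is
not something ntpd does, and picking meas_var and process_var is exactly
the tuning problem mentioned in the quote.

```python
def kalman_1d(measurements, meas_var, process_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a slowly drifting value.

    x is the current estimate and p its variance. The state is assumed to
    stay put between samples, with process_var of added uncertainty.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + process_var          # predict: same value, less certain
        k = p / (p + meas_var)       # gain: how much to trust the new sample
        x = x + k * (z - x)          # correct the prediction toward z
        p = (1.0 - k) * p            # uncertainty shrinks after the update
        estimates.append(x)
    return estimates


# Made-up offsets (ms) with one spike in the middle.
offsets_ms = [2.1, 1.9, 2.0, 2.2, 40.0, 2.0, 1.8, 2.1]
for z, est in zip(offsets_ms, kalman_1d(offsets_ms, meas_var=4.0, process_var=0.01)):
    print(f"measured {z:6.1f}  estimate {est:6.2f}")
```

The recursive form keeps only x and p between samples, so nothing
accumulates as the program runs. A plain filter like this still gets
pulled by the spike, though, which is the case the median-style pruning
is meant to handle.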

>
> Would it be worth it to recruit an electrical or systems engineer who
> claimed to know something about filtering data to take a serious look at
> NTPD's data filtering approach?  There has to be some reason that there is a

David Mills claims to know about filtering data. Not that I always agree
with him, but he is not stupid. 

> significant negative correlation between delay and offset in NTPD.  There

There is no such correlation in general. If there is one on your
system, then it means that one direction, probably the return trip
(depending on how you define offset), is being slowed down by something
in the chain.
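
To see how a slow return trip shows up, here is a sketch using the
standard four-timestamp calculation with offset defined as server minus
client. The function name and the delay values are made up; the true
offset is set to zero so that any reported offset is pure asymmetry
error.

```python
def ntp_exchange(true_offset, d_out, d_back, t1=0.0, server_hold=0.0):
    """Simulate one client/server exchange and return (offset, delay).

    t1/t4 are read on the client clock, t2/t3 on the server clock, which
    runs true_offset ahead of the client. All times are in seconds.
    """
    t2 = t1 + d_out + true_offset        # request arrives at the server
    t3 = t2 + server_hold                # reply leaves the server
    t4 = t3 - true_offset + d_back       # reply arrives back at the client
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Outbound delay fixed at 10 ms; only the return path gets slower.
for d_back in (0.010, 0.020, 0.050, 0.100):
    offset, delay = ntp_exchange(0.0, 0.010, d_back)
    print(f"delay {delay * 1e3:6.1f} ms   offset {offset * 1e3:6.1f} ms")
```

With that sign convention the reported offset is the true offset plus
(d_out - d_back)/2, so extra delay on the return path pushes the offset
negative as the round-trip delay grows. Extra delay on the outbound path
would push it positive instead.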

> also has to be a reason that my GPS clock (BU-353, which, when it is working
> well, only has offset ±6 ms from zero) has a difference between about 0 and
> 47 ms from an NTP server on another computer that gets its time from 8 NTP
> stratum 2 servers over the Internet and has remarkably consistent offsets
> ±3.5 ms from zero.  The difference between the GPS clock and the average of
> the stratum 2 servers appears to be a function of the time of day; it is
> large during the mid-part of the day, when the Internet is busy and the
> delay is large and quite variable between servers, and small late in the day
> (right now it is -0.626; 6:55 PM EST), when the delay is smaller and pretty
> uniform for all stratum 2 servers. 

Yup. You would expect heavily congested networks to have more error than
lightly congested ones.
It also sounds like you have asymmetric delays. Note that most ISPs
deliver very different rates for up vs. down, and that may well come with
asymmetric delays (e.g. 600 Kb/s vs. 30 Mb/s for my cable access).

>
> Charles Elliott
>
>> -----Original Message-----
>> From: questions-bounces+elliott.ch=verizon.net at lists.ntp.org
>> [mailto:questions-bounces+elliott.ch=verizon.net at lists.ntp.org] On
>> Behalf Of Chris Albertson
>> Sent: Thursday, March 15, 2012 5:22 PM
>> To: unruh
>> Cc: questions at lists.ntp.org
>> Subject: Re: [ntp:questions] ARRGH!!! I woke up to a 50 SECOND clock
>> error.
>> 
>> On Thu, Mar 15, 2012 at 2:09 PM, unruh <unruh at invalid.ca> wrote:
>> 
>> > Unfortunately it is not that simple. That rate changes by significant
>> > amounts. Thus the rate you get after a week may be very different
>> than
>> > the rate you get after an hour. That, I submit, is the chief obstacle
>> > to having an accurate clock. And that change in rate does not fit
>> with
>> > the "Allan variance" assumptions (the noise source is not of the type
>> > assumed)
>> 
>> You are right about that.  I was going to add in a bit about how to
>> pick the best time to look at the clock tower.  But left it out because
>> the point I was making was only that these things are not NTP
>> specific.   Details after that did not contribute to the main point.
>> 
>> 
>> Chris Albertson
>> Redondo Beach, California


