[ntp:questions] ARRGH!!! I woke up to a 50 SECOND clock error.

Charles Elliott elliott.ch at verizon.net
Sat Mar 17 18:14:37 UTC 2012


I did not express myself well.  Permit me to explain.  

1. The variable delays are caused by buffering at the Internet routers.  See
CACM Staff. (2012). BufferBloat: what's wrong with the internet? Commun.
ACM, 55(2), 40-47. doi: 10.1145/2076450.2076464.  "A discussion with Vint
Cerf, Van Jacobson, Nick Weaver, and Jim Gettys."  Instead of rejecting
packets that a router has no bandwidth to transmit, routers now buffer them.
As memory has become cheaper, the buffers have in many cases become large.
This results in packet delays that are highly variable and can exceed
several seconds.

	The delays are asymmetric; that is, they are always positive.
NTPD's filtering algorithm always rejects outliers.  So when a packet is
received that has been buffered for 500 to 1000 ms, NTPD may associate it
with an offset that is much closer to the mean.  If this is true, then NTPD
has to be biased in its estimation of round-trip delay.  If a series of
buffered packets is received, then eventually the mean will
increase until the offset is no longer an outlier.  But if the delay caused
by buffering is sporadic, as Unruh suggests, then NTPD will never compute
the offset correctly as all the outliers will have been rejected.
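
A minimal Python sketch of how one-sided buffering delay turns into offset
error.  This is an illustration, not NTPD code.  It assumes the usual
four-timestamp exchange (t0 client send, t1 server receive, t2 server send,
t3 client receive); the function names, the 1 ms server hold time, and all
the numbers are made up for the example.

```python
def ntp_offset_delay(t0, t1, t2, t3):
    """Return (offset, delay) for one client/server exchange.

    t0: client send, t1: server receive, t2: server send, t3: client receive.
    t0 and t3 are read from the client clock, t1 and t2 from the server clock.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


def simulate(true_offset, out_delay, back_delay, server_hold=0.001):
    """Build the four timestamps for a server clock that is ahead of the
    client clock by true_offset seconds, then compute offset and delay."""
    t0 = 100.0                            # arbitrary client send time
    t1 = t0 + out_delay + true_offset     # arrival, read on the server clock
    t2 = t1 + server_hold                 # server reply time
    t3 = t2 - true_offset + back_delay    # arrival, read on the client clock
    return ntp_offset_delay(t0, t1, t2, t3)


# Symmetric path: 20 ms each way, true offset 5 ms.
symmetric = simulate(0.005, 0.020, 0.020)

# Same path, but the reply sits in a router buffer for an extra 500 ms.
buffered_return = simulate(0.005, 0.020, 0.520)
```

In the symmetric case the computed offset equals the true 5 ms.  In the
buffered case the measured delay rises by 500 ms and the computed offset is
wrong by half of that, 250 ms, in the negative direction.  Buffering on the
outbound path instead would push the error positive by the same amount.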

	Furthermore, there are two bits of information associated with the
computation of offset: the offset = ((t1 - t0) + (t2 - t3))/2 itself and the
time at which it is computed; successive computations must be separated by
some power of 2 seconds.  As far as I can tell NTPD just throws away the fact
that the interval between offset computations must be near some power of 2.
Let the offset be y and the time of its computation be x.  Then is not
y = f(x)?  That is, is not the
offset at least partially a function of the time it was computed?  Would
regression analysis be a more appropriate filtering algorithm?
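
A minimal sketch of the regression idea in Python.  It is not what NTPD
does; it is an ordinary least-squares line fit of offset (y) against the
time the offset was computed (x).  The 64-second poll interval, the 10 ppm
drift and the 2 ms starting offset are hypothetical values chosen only to
exercise the code.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all sample times are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx             # slope: offset change per second
    a = mean_y - b * mean_x   # intercept: fitted offset at x = 0
    return a, b


# Hypothetical samples: one offset every 64 s (2**6), a clock drifting at
# 10 ppm, starting 2 ms off.  Real samples would also carry network noise.
times = [64.0 * k for k in range(8)]
offsets = [0.002 + 10e-6 * t for t in times]

intercept, slope = fit_line(times, offsets)
predicted_next = intercept + slope * (64.0 * 8)   # extrapolate to next poll
```

The slope is an estimate of the frequency error and the intercept an
estimate of the offset at x = 0.  A plain least-squares fit is itself
sensitive to outliers, so it would still need some protection against the
buffered packets discussed above.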

	Moreover, in industry feedback control is often the mechanism used
to correct for random noise, which buffering delay may be.

2. It is true that ADSL is asymmetric.  For example, several tests show that
I see [2.57, 2.88] Mbps down and [0.560, 0.720] Mbps up.  But consider that
I have to be within a few thousand feet of the telco central office, which
is in Center City Philadelphia, and all the NTP servers are in State
College, PA (152 mi), Northern (Ramsey) New Jersey (95 mi), or New York City
(86 mi).  Thus the distance that the packet travels at 0.7 Mbps is
relatively tiny compared to its total distance.  I question whether the ADSL
asymmetry accounts for the large negative correlation between delay and
offset that NTPD produces.
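
A rough back-of-the-envelope check in Python of how much offset error the
ADSL rate asymmetry alone could cause.  It assumes a 76-byte packet (48-byte
NTP payload plus 8-byte UDP and 20-byte IPv4 headers) and ignores link-layer
overhead and any queueing; it uses the slowest measured rates from the
ranges above.  The names are made up for the example.

```python
# Assumption: 48-byte NTP payload + 8-byte UDP header + 20-byte IPv4 header,
# with no ATM/PPPoE or other link-layer overhead counted.
NTP_PACKET_BYTES = 76


def serialization_delay(num_bytes, bits_per_second):
    """Seconds needed to clock num_bytes onto a link of the given rate."""
    return num_bytes * 8 / bits_per_second


up = serialization_delay(NTP_PACKET_BYTES, 0.560e6)    # slowest upstream
down = serialization_delay(NTP_PACKET_BYTES, 2.57e6)   # slowest downstream

# The offset error from a path asymmetry is half the one-way difference.
bias = (up - down) / 2
```

Under these assumptions the upstream serialization time is about 1.1 ms, the
downstream about 0.24 ms, and the resulting offset bias roughly 0.4 ms.  That
covers only the time to put the packet on the ADSL link, not time spent
waiting in a buffer behind other traffic.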

3. Ron is correct about the BU-353 GPS device: After a cold restart, the
BU-353 is right in synch with the average of 8 stratum 2 servers, and then
the offset steadily increases until the BU-353 loses synch with the
satellites after about a day.  Another cold restart starts the cycle again.
This may make comparisons between the BU-353 and Internet time invalid.  The
question is, what would happen if one did a warm restart instead of a cold
one?


Charles Elliott

> -----Original Message-----
> From: questions-bounces+elliott.ch=verizon.net at lists.ntp.org
> [mailto:questions-bounces+elliott.ch=verizon.net at lists.ntp.org] On
> Behalf Of unruh
> Sent: Friday, March 16, 2012 7:56 PM
> To: questions at lists.ntp.org
> Subject: Re: [ntp:questions] ARRGH!!! I woke up to a 50 SECOND clock
> error.
> 
> On 2012-03-16, Charles Elliott <elliott.ch at verizon.net> wrote:
> > On the subject of accuracy, has anyone ever really looked at NTPD's
> > offset filtering mechanism?  What it does now is sort the last (about
> > 50) offsets from smallest to largest and then prunes the smallest or
> > largest, depending on which is further away from the average, until
> > there are only N (I forget what N is) offset observations left.
> 
> That is for refclocks, and it is usually about 16 (poll 4, and once per
> second). N is about 60% of the total.
> 
> >
> > There may be at least two problems with this filtering mechanism.
> > First, there is no apparent theory behind it; I have never seen such a
> > crude filter
> 
> The theory is that there are two noise mechanisms, one approximately
> gaussian with small standard deviation and one much broader but rarer.
> I.e., occasionally you will get "popcorn" spikes. The median is the
> optimal estimator if you want to minimize |y-ybar|, just as the mean is
> the optimal estimator for (y-ybar)^2. |y-ybar| is less sensitive to
> large deviations.
> 
> > that does not take into account any information inherent in the data.
> > On the other hand, what I don't know about filters would fill all 24
> > volumes of an encyclopedia.
> 
> Sure it does. See above.
> 
> >
> > Second, we know that each offset observation should have arrived about
> > one second after the previous one, yet NTPD does not take advantage of
> > that knowledge.  There are filters, such as the Kalman filter that
> > uses a Bayesian estimation approach to predict the next observation
> > and adjusts it according to the prediction when it arrives, that do
> > take advantage of previous observations.  Demonstrations of the Kalman
> > filter on the Internet show almost spectacular results.  I used a
> > Kalman filter in my clock simulation program and the results seemed
> > pretty good.  However, there are numerical analysis considerations to
> > programming a Kalman filter as the sums and products of observations
> > can become large in a program that runs infinitely long.  Also,
> > choosing the parameters of a Kalman filter is apparently a black art.
> 
> Recall that ntpd was designed to work on GPS PPS input, and clock
> settings over a bush telegraph. Very different noise structures.
> 
> >
> > Would it be worth it to recruit an electrical or systems engineer who
> > claimed to know something about filtering data to take a serious look
> > at NTPD's data filtering approach?  There has to be some reason that
> > there is a
> 
> David Mills claims to know about filtering data. Not that I always
> agree with him, but he is not stupid.
> 
> > significant negative correlation between delay and offset in NTPD.
> > There
> 
> ???? There is no such correlation in general. If there is on your
> system, then it means that the return (?)  trip is the one that is
> being slowed down by something in the chain. (depending on how you
> define offset).
> 
> > also has to be a reason that my GPS clock (BU-353, which, when it is
> > working well, only has offset ±6 ms from zero) has a difference
> > between about 0 and 47 ms from an NTP server on another computer that
> > gets its time from 8 NTP stratum 2 servers over the Internet and has
> > remarkably consistent offsets ±3.5 ms from zero.  The difference
> > between the GPS clock and the average of the stratum 2 servers appears
> > to be a function of the time of day; it is large during the mid-part of
> > the day, when the Internet is busy and the delay is large and quite
> > variable between servers, and small late in the day (right now it is
> > -0.626; 6:55 PM EST), when the delay is smaller and pretty uniform for
> > all stratum 2 servers.
> 
> Yup. You would expect heavily congested networks to have more error
> than lightly congested ones.
> And it sounds like you have asymmetric delays. Note that most ISPs
> deliver very different rates for up vs down, and that may well come
> with asymmetric delays. (eg 600Kb/s, vs 30Mb/s for my cable access)
> 
> >
> > Charles Elliott
> >
> >> -----Original Message-----
> >> From: questions-bounces+elliott.ch=verizon.net at lists.ntp.org
> >> [mailto:questions-bounces+elliott.ch=verizon.net at lists.ntp.org] On
> >> Behalf Of Chris Albertson
> >> Sent: Thursday, March 15, 2012 5:22 PM
> >> To: unruh
> >> Cc: questions at lists.ntp.org
> >> Subject: Re: [ntp:questions] ARRGH!!! I woke up to a 50 SECOND clock
> >> error.
> >>
> >> On Thu, Mar 15, 2012 at 2:09 PM, unruh <unruh at invalid.ca> wrote:
> >>
> >> > Unfortunately it is not that simple. That rate changes by
> >> > significant amounts. Thus the rate you get after a week may be very
> >> > different than the rate you get after an hour. That, I submit, is
> >> > the chief obstacle to having an accurate clock. And that change in
> >> > rate does not fit with the "Allan variance" assumptions (the noise
> >> > source is not of the type assumed)
> >>
> >> You are right about that.  I was going to add in a bit about how to
> >> pick the best time to look at the clock tower.  But left it out
> >> because the point I was making was only that these things are not NTP
> >> specific.   Details after that did not contribute to the main point.
> >>
> >>
> >> Chris Albertson
> >> Redondo Beach, California
> >> _______________________________________________
> >> questions mailing list
> >> questions at lists.ntp.org
> >> http://lists.ntp.org/listinfo/questions
> 


