[ntp:hackers] What to do when the offset is WAYTOOBIG

Judah Levine jlevine at boulder.nist.gov
Thu Apr 19 12:33:06 PDT 2007


>I am watching five clocks. Three of them say 1200, two say 1300, and
>my clock says 1400. Since the majority of clocks I watch say 1200, I
>conclude the real time is 1220, but that is beyond my panic limit of
>one hour.

     I would have looked at this differently. I would have evaluated
the error of my clock from the times of the remote clocks I was
monitoring, using the sigma of my clock as a metric: the average
prediction/correction error over some previous time interval. Since
the sigma of a typical system is on the order of milliseconds, I
would have concluded that something is really broken here, because
the prediction errors are hours, not milliseconds. I would not have
concluded that the time was 1220, because I trust my local clock to
be within some small multiple of its historical prediction error.
That might not be correct, but it is my first-order working
hypothesis. Based on the evidence at hand, I have no way of deciding
who is right, except that something is clearly broken. So I mark my
clock unhealthy and do not adjust it. If the problem really is in the
remote clocks, then this strategy is optimal. If the problem is in my
clock, then I have limited the damage by telling my customers not to
use it. (Marking the clock unhealthy triggers a pager alarm in the
NIST servers, but that is outside the scope of NTP.)
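The sigma check described above can be sketched in Python. This is a
minimal illustration under stated assumptions, not the NIST server
code: sigma is taken as the mean absolute prediction error over a
window of past corrections, offsets are in seconds (remote minus
local), and the tolerance multiple `K_SIGMA = 5` is a hypothetical
choice.

```python
from statistics import mean

# Hypothetical tolerance: trust an offset only up to K_SIGMA historical sigmas.
K_SIGMA = 5.0


def historical_sigma(prediction_errors):
    """Mean absolute prediction/correction error (seconds) over past intervals."""
    return mean(abs(e) for e in prediction_errors)


def evaluate_offset(offset, prediction_errors, k=K_SIGMA):
    """Return (healthy, adjustment) for a measured offset (remote minus local).

    If the offset is consistent with the local clock's own history, steer by
    it. Otherwise mark the clock unhealthy and leave it alone, because we
    cannot tell whether the fault is local or remote.
    """
    sigma = historical_sigma(prediction_errors)
    if abs(offset) <= k * sigma:
        return True, offset
    return False, 0.0


# Past prediction errors of a few milliseconds (sigma = 2 ms).
history = [0.002, -0.003, 0.001, -0.002]
print(evaluate_offset(0.004, history))    # -> (True, 0.004)
# Quoted scenario: local clock reads 1400, remote clocks say 1200 (two hours).
print(evaluate_offset(-7200.0, history))  # -> (False, 0.0)
```

The two-hour offset fails by six orders of magnitude, so the clock is
flagged rather than stepped toward either 1200 or 1220.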
     Since my strategy uses the historical prediction error of the
local clock to evaluate the responses of the remote systems, I only
need to query a single external server. I accept its response if its
time difference is within some reasonable multiple of my historical
sigma. My system would query a second server if this test failed, but
that might not help here, since none of the queries would pass the
test. The fact that a number of external servers agreed would not by
itself override my sigma test. As I mentioned above, this situation
would trigger an alarm.
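The query policy in this paragraph might look like the following
sketch. It assumes each server is modeled as a hypothetical
zero-argument callable that performs one query and returns the
measured offset in seconds, that `K_SIGMA` is an illustrative
tolerance, and that at most two servers are consulted. Servers are
queried lazily, so a second query happens only when the first response
fails the test.

```python
from statistics import mean
from typing import Callable, Optional, Sequence, Tuple

K_SIGMA = 5.0    # hypothetical tolerance multiple
MAX_QUERIES = 2  # ask a second server only if the first fails the test


def choose_offset(
    servers: Sequence[Callable[[], float]],
    prediction_errors: Sequence[float],
    k: float = K_SIGMA,
    max_queries: int = MAX_QUERIES,
) -> Tuple[bool, Optional[float]]:
    """Accept the first queried offset that passes the sigma test.

    Returns (True, offset) on success. Returns (False, None) when no queried
    server passes; agreement among servers does not override the test, so the
    caller should mark the clock unhealthy and raise an alarm.
    """
    limit = k * mean(abs(e) for e in prediction_errors)
    for query in servers[:max_queries]:
        offset = query()  # one network query, made only when needed
        if abs(offset) <= limit:
            return True, offset
    return False, None


history = [0.002, -0.003, 0.001, -0.002]  # sigma = 2 ms
# Quoted scenario: local clock reads 1400; three servers say 1200, two say 1300.
servers = [lambda: -7200.0] * 3 + [lambda: -3600.0] * 2
print(choose_offset(servers, history))  # -> (False, None)
```

Because acceptance is anchored to the local clock's own history rather
than to a vote, the three servers that agree on 1200 carry no extra
weight.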
     The weakness of my algorithm shows up when the servers disagree
by something on the order of my prediction sigma. That case is sticky
because I can't say for sure whether the difference is a glitch or a
genuine event that my clock should conform to. Depending on the
details, I can follow the wrong pied piper here.
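A small illustration of that weakness, under the same hypothetical
assumptions (mean absolute prediction error as sigma, illustrative
`K_SIGMA`): when two servers disagree by about two sigma, both pass the
test, so the result depends only on which one is queried first. The
labels `glitch` and `real_step` are for the reader only; the client
itself has no way to tell them apart.

```python
from statistics import mean

K_SIGMA = 5.0  # hypothetical tolerance multiple


def first_passing(offsets, prediction_errors, k=K_SIGMA):
    """Return the first offset (seconds) that passes the sigma test, or None."""
    limit = k * mean(abs(e) for e in prediction_errors)
    return next((o for o in offsets if abs(o) <= limit), None)


history = [0.002, -0.003, 0.001, -0.002]  # sigma = 2 ms, so the limit is 10 ms
glitch, real_step = 0.003, -0.001         # 4 ms apart: both well inside the limit

print(first_passing([glitch, real_step], history))  # -> 0.003 (follows the glitch)
print(first_passing([real_step, glitch], history))  # -> -0.001
```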

Judah Levine
Time and Frequency Division
NIST Boulder
