[ntp:hackers] What to do when the offset is WAYTOOBIG

David L. Mills mills at udel.edu
Wed Apr 18 07:52:36 PDT 2007


Brian,

There is a very simple explanation for this. Apparently, when a number 
of servers are available, some with sane offsets and others with insane 
ones, the mitigation algorithms happen to find an insane one first, 
before considering the others. In NTPv4 the configurable tos minsane 
variable defaults to 1, as per popular preference and in order to speed 
initial acquisition. It really should be set to something larger, say 3, 
when the number of available servers is that value or greater. This way 
the insane servers can be mitigated before any attempt to set the clock. 
The tos minsane variable is of course available only in NTPv4.
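
As a rough illustration of why a larger minsane helps, here is a minimal 
sketch in C. It is not ntpd's selection code: the function name 
candidate_offset, the MAX_SERVERS limit and the use of a plain median in 
place of the real selection and clustering steps are all assumptions 
made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SERVERS 16  /* hypothetical limit for this sketch */

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/*
 * Returns 1 and stores a candidate offset in *out once at least minsane
 * servers have reported; returns 0 (no clock update yet) otherwise.
 * The median is only a stand-in for ntpd's real mitigation algorithms.
 */
int candidate_offset(const double *offsets, size_t n, size_t minsane,
                     double *out)
{
    double sorted[MAX_SERVERS];

    assert(minsane >= 1 && n <= MAX_SERVERS && out != NULL);
    if (n < minsane)
        return 0;               /* too few servers: wait */
    memcpy(sorted, offsets, n * sizeof sorted[0]);
    qsort(sorted, n, sizeof sorted[0], cmp_double);
    *out = sorted[n / 2];       /* upper median when n is even */
    return 1;
}
```

With minsane set to 1, a single insane server that happens to answer 
first supplies the candidate offset. With minsane set to 3, nothing is 
selected until three servers have reported, and two sane servers outvote 
the insane one.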

Your users should be queried on how to interpret the collective time on 
any particular client under conditions where some servers are sane and 
others not. Assuming the client mitigation algorithms have sifted the 
truechimers from the ratty population and the client still considers the 
survivors insane, the natural conclusion is that the client clock must 
be set by means other than NTP. In NTPv4, paranoic users can set the 
configurable tinker panic variable to other than ten minutes, or even 
disable it for the first update or for all eternity. The tinker panic 
variable is of course available only in NTPv4.
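
The three settings just described (a threshold, no check on the first 
update, no check at all) can be sketched as follows. This is an 
illustration only, not the code in local_clock: the names panic_check, 
clock_action, first_update and allow_first are hypothetical, and the 
sketch assumes that a threshold of 0 means the check is disabled.

```c
#include <assert.h>

/* Hypothetical names; ntpd's local_clock() is structured differently. */
enum clock_action {
    CLOCK_UPDATE,   /* offset acceptable: go on to discipline the clock */
    CLOCK_PANIC     /* offset beyond the panic threshold */
};

/*
 * offset_s:      measured clock offset in seconds (either sign).
 * panic_s:       panic threshold in seconds; 0 is assumed to disable it.
 * first_update:  nonzero for the first update since startup.
 * allow_first:   nonzero to exempt that first update from the check.
 */
enum clock_action panic_check(double offset_s, double panic_s,
                              int first_update, int allow_first)
{
    double magnitude = offset_s < 0 ? -offset_s : offset_s;

    assert(panic_s >= 0);
    if (panic_s == 0)                   /* disabled for all eternity */
        return CLOCK_UPDATE;
    if (first_update && allow_first)    /* disabled for the first update */
        return CLOCK_UPDATE;
    return magnitude > panic_s ? CLOCK_PANIC : CLOCK_UPDATE;
}
```

What the caller does on CLOCK_PANIC is left open here, since that choice 
(exit, or ignore the update and wait) is the question under discussion.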

The philosophical basis of this design is very carefully considered in 
the book. However, the simple characterization of the panic threshold is 
that if exceeded, it will not get better no matter how long you wait.

Dave

Brian Utterback wrote:

> I have had users complain because the xntpd that ships with Solaris
> exits when the offset is too big, even though there were servers that
> did not have offsets that were too big. In other words, it was
> counterintuitive to them that the election would go to a candidate that
> was ineligible to serve. It just makes sense that the elimination should be
> before the election when the candidates are filing (or this year, it
> seems the first step is to form an exploratory committee). Wouldn't the
> best solution be to add a test in peer_unfit?
>
> On the other hand, another simple solution is to treat the panic exit
> in local_clock the same way as spikes or popcorn filtees, simply return
> 0 and let the code ignore the update.
>
> Both of these allow ntpd to "wait for better times". The first is more
> intuitive but might have an effect on the detection of falsetickers. The
> second is probably safer, but will leave the system free running in more
> cases, since it will not fall back on the other servers.
>
> Judah Levine wrote:
>
>> Hello,
>>     I agree that having the software exit on this condition is probably
>> the wrong thing to do. The NIST time servers use my own LOCKCLOCK
>> algorithm for synchronizing the clock, and that algorithm partitions
>> this condition into 3 possibilities:
>>
>>      1. A failure of the network or the remote host or the
>>         measurement process.
>>      2. A time step of the local clock
>>      3. A frequency step of the local clock
>>
>> The first action is to immediately initiate a query to another time
>> source, if one is available. Unlike the standard NTP, my algorithm
>> queries only one time source on each calibration cycle and evaluates
>> the response based on the size of the correction that it implies for
>> the local clock. This checks possibility 1. Possibilities 2 and 3 are
>> distinguished by delaying for a short period of time and initiating a
>> second query. I can talk about the details if anyone is interested, or
>> you can find this stuff in my papers. However, the point is that the
>> software never exits.
>>
>> If the program is unable to decide what to do then it sets itself to
>> unhealthy and waits for better times. (This is also a difference with
>> the standard version of NTP. My understanding is that the standard
>> version will never set itself to "clock unsynchronized" once it is up
>> and running.)
>>
>> Best wishes,
>>
>> Judah Levine
>> NIST and University of Colorado
>>
>> Judah Levine
>> Time and Frequency Division
>> NIST Boulder


