[ntp:questions] Re: tinker step 0 (always slew) and kernel time discipline

Richard B. Gilbert rgilbert88 at comcast.net
Fri Sep 22 18:53:51 UTC 2006

Joe Harvell wrote:

> David L. Mills wrote:
> <snip>
>> 5. If for some reason the server(s) are not reachable at startup and 
>> the applications must start, then I would assume the applications 
>> would fail, since the time is not synchronized. If the applications 
>> use the NTP system primitives, the synchronization condition is 
>> readily apparent in the return code. Since they can't run anyway, 
>> there is no harm in stepping the clock, no matter what the initial 
>> offset. Forcing a slew in this case would seem highly undesirable, 
>> unless the application can tolerate large differences between clocks 
>> and, in that case, using ntpd is probably a poor choice in the first 
>> place.
> I agree that the condition of no time servers reachable on startup is 
> the most common case where a large offset will eventually be observed.  
> I agree that the application should detect this and fail before starting 
> up.  I am concerned about clock and network failure scenarios that cause 
> an NTP client to see two different NTP servers with very different times.
> This actually happened in a testbed for our application. NTP stats show 
> that over the course of 22 days, the offsets of two configured NTP 
> servers (both ours) serving one of our NTP clients started diverging up 
> to a maximum distance of 800 seconds.  During this time, our NTP client 
> stepped its clock forward 940 times and backwards 803 times, with 
> increasing magnitudes up to ~400 seconds.  The problem went away when 
> someone "added an IP address to the configuration of one of the NTP 
> servers."  (I am still trying to determine exactly what happened).  The 
> ntp.conf files of the NTP client, the stats, and a nice graph of the 
> offsets are found at http://dingo.dogpad.net/ntpProblem/.
> I concede that only having 2 NTP servers for our host made this problem 
> more likely to occur.  But considering the mayhem caused by jerking the 
> clock back and forth every 15 minutes for 22 days, I think it is worth 
> investigating whether to eliminate stepping altogether.
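
For reference, the return-code check mentioned in point 5 of the quoted text can be sketched roughly as follows. This is only a minimal illustration, assuming a Linux/glibc system where ntp_adjtime() is declared in <sys/timex.h>; the helper names clock_is_synchronized() and query_clock_state() are made up for the example and are not part of any NTP API.

```c
#include <sys/timex.h>

/* Hypothetical helper: interpret the value returned by ntp_adjtime().
 * TIME_ERROR means the kernel reports the clock as unsynchronized;
 * -1 means the call itself failed. Any other state is treated as usable. */
int clock_is_synchronized(int state)
{
    return state != -1 && state != TIME_ERROR;
}

/* Hypothetical helper: read-only query of the kernel clock state.
 * modes = 0 requests the current status without adjusting anything,
 * so no special privileges should be needed. */
int query_clock_state(void)
{
    struct timex tx = {0};
    tx.modes = 0;
    return ntp_adjtime(&tx);
}
```

An application could call clock_is_synchronized(query_clock_state()) at startup and refuse to run when it returns 0. Note that this only reports whether the kernel considers the clock synchronized; it would not by itself reveal two servers that disagree with each other.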

Why didn't anyone notice the problem for 22 days?  If, indeed, it caused 
mayhem, why was it allowed to continue for so long?
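
One way such a problem could be noticed sooner is to tally clock steps from the ntpd log. The sketch below is only an illustration under an assumption: that each step is logged on a line containing "time reset", followed by a signed offset in seconds. That format should be checked against the logs of the ntpd version in use. The names step_tally and tally_step_line() are made up for the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical running totals of clock steps seen in a log. */
struct step_tally {
    int forward;     /* steps with a positive offset */
    int backward;    /* steps with a negative offset */
    double max_abs;  /* largest step magnitude, in seconds */
};

/* Examine one log line. Returns 1 and updates the tally if the line
 * matches the ASSUMED pattern "time reset <signed offset>",
 * otherwise returns 0 and leaves the tally unchanged. */
int tally_step_line(const char *line, struct step_tally *t)
{
    const char *p = strstr(line, "time reset ");
    double offset;

    if (p == NULL || sscanf(p, "time reset %lf", &offset) != 1)
        return 0;

    if (offset >= 0)
        t->forward++;
    else
        t->backward++;

    double magnitude = offset < 0 ? -offset : offset;
    if (magnitude > t->max_abs)
        t->max_abs = magnitude;
    return 1;
}
```

Feeding each log line to tally_step_line() and raising an alarm when either counter grows past a handful of steps per day would flag the kind of back-and-forth stepping described above well before 22 days had passed.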
