[ntp:questions] Re: tinker step 0 (always slew) and kernel time discipline

Richard B. Gilbert rgilbert88 at comcast.net
Fri Sep 22 18:53:51 UTC 2006


Joe Harvell wrote:

> David L. Mills wrote:
> <snip>
> 
>>
>> 5. If for some reason the server(s) are not reachable at startup and 
>> the applications must start, then I would assume the applications 
>> would fail, since the time is not synchronized. If the applications 
>> use the NTP system primitives, the synchronization condition is 
>> readily apparent in the return code. Since they can't run anyway, 
>> there is no harm in stepping the clock, no matter what the initial 
>> offset. Forcing a slew in this case would seem highly undesirable, 
>> unless the application can tolerate large differences between clocks 
>> and, in that case, using ntpd is probably a poor choice in the first 
>> place.
>>
> 
> I agree that the condition of no time servers reachable on startup is 
> the most common case where a large offset will eventually be observed.  
> I agree that the application should detect this and fail before starting 
> up.  I am concerned about clock and network failure scenarios that cause 
> an NTP client to see two different NTP servers with very different times.
> 
> This actually happened in a testbed for our application. NTP stats show 
> that over the course of 22 days, the offsets of two configured NTP 
> servers (both ours) serving one of our NTP clients started diverging up 
> to a maximum distance of 800 seconds.  During this time, our NTP client 
> stepped its clock forward 940 times and backwards 803 times, with 
> increasing magnitudes up to ~400 seconds.  The problem went away when 
> someone "added an IP address to the configuration of one of the NTP 
> servers."  (I am still trying to determine exactly what happened).  The 
> ntp.conf files of the NTP client, the stats, and a nice graph of the 
> offsets are available at http://dingo.dogpad.net/ntpProblem/.
> 
> I concede that only having 2 NTP servers for our host made this problem 
> more likely to occur.  But considering the mayhem caused by jerking the 
> clock back and forth every 15 minutes for 22 days, I think it is worth 
> investigating whether to eliminate stepping altogether.
> 

Why didn't anyone notice the problem for 22 days?  If, indeed, it caused 
mayhem, why was it allowed to continue for so long?
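
On the return-code check mentioned in point 5 of the quoted text, here is a 
minimal sketch of how an application might refuse to start while the clock 
is unsynchronized. It assumes Linux with glibc, where ntp_adjtime() from 
<sys/timex.h> returns the clock state and reports TIME_ERROR when the clock 
is not synchronized. The two helper names are hypothetical, chosen for 
illustration only; they are not part of ntpd or any library.

```c
#include <stdio.h>
#include <sys/timex.h>   /* ntp_adjtime(), struct timex, TIME_ERROR */

/* Hypothetical helper: interpret the value returned by ntp_adjtime().
 * -1 means the call itself failed; TIME_ERROR means the kernel reports
 * the clock as unsynchronized. Any other state (TIME_OK, or one of the
 * leap-second states) is treated here as "synchronized". */
int clock_state_is_synchronized(int state)
{
    return state != -1 && state != TIME_ERROR;
}

/* Hypothetical helper: query the kernel clock state without changing it.
 * With tx.modes == 0, ntp_adjtime() only reads the current values. */
int system_clock_is_synchronized(void)
{
    struct timex tx = {0};
    int state = ntp_adjtime(&tx);

    if (state == -1) {
        perror("ntp_adjtime");
        return 0;
    }
    return clock_state_is_synchronized(state);
}
```

An application could call system_clock_is_synchronized() once at startup 
and exit with an error if it returns 0. Note that this only reports whether 
the kernel considers the clock synchronized; it would not by itself detect 
the case described above, where two servers disagree with each other.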