[ntp:questions] Re: tinker step 0 (always slew) and kernel time discipline
Richard B. Gilbert
rgilbert88 at comcast.net
Fri Sep 22 18:53:51 UTC 2006
Joe Harvell wrote:
> David L. Mills wrote:
> <snip>
>
>>
>> 5. If for some reason the server(s) are not reachable at startup and
>> the applications must start, then I would assume the applications
>> would fail, since the time is not synchronized. If the applications
>> use the NTP system primitives, the synchronization condition is
>> readily apparent in the return code. Since they can't run anyway,
>> there is no harm in stepping the clock, no matter what the initial
>> offset. Forcing a slew in this case would seem highly undesirable,
>> unless the application can tolerate large differences between clocks
>> and, in that case, using ntpd is probably a poor choice in the first
>> place.
>>
>
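For anyone wanting to see what "readily apparent in the return code" looks like in practice, here is a minimal sketch, not a definitive implementation. It assumes Linux with glibc, where adjtimex() returns the same clock-state codes as ntp_adjtime(). The helper names (query_clock_state, clock_is_synchronized) and the 512-byte buffer size are my own assumptions for illustration; the state constants are the ones in <sys/timex.h>.

```python
import ctypes

# Clock-state return codes as defined in <sys/timex.h>.
TIME_OK, TIME_INS, TIME_DEL, TIME_OOP, TIME_WAIT, TIME_ERROR = range(6)


def query_clock_state():
    """Ask the kernel for the clock state (Linux/glibc only, read-only)."""
    libc = ctypes.CDLL(None, use_errno=True)
    # Assumption: 512 zeroed bytes is larger than struct timex on this
    # platform. A zeroed 'modes' field makes the call a pure query.
    buf = ctypes.create_string_buffer(512)
    state = libc.adjtimex(buf)
    if state < 0:
        raise OSError(ctypes.get_errno(), "adjtimex failed")
    return state


def clock_is_synchronized(state):
    """True unless the kernel reports TIME_ERROR (clock not synchronized)."""
    return state != TIME_ERROR
```

An application could call clock_is_synchronized(query_clock_state()) at startup and refuse to run if it returns False, which is the behaviour described in point 5 above.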
> I agree that the condition of no time servers reachable on startup is
> the most common case where a large offset will eventually be observed.
> I agree that the application should detect this and fail before starting
> up. I am concerned about clock and network failure scenarios that cause
> an NTP client to see two different NTP servers with very different times.
>
> This actually happened in a testbed for our application. NTP stats show
> that over the course of 22 days, the offsets of two configured NTP
> servers (both ours) serving one of our NTP clients started diverging up
> to a maximum distance of 800 seconds. During this time, our NTP client
> stepped its clock forward 940 times and backwards 803 times, with
> increasing magnitudes up to ~400 seconds. The problem went away when
> someone "added an IP address to the configuration of one of the NTP
> servers." (I am still trying to determine exactly what happened). The
> ntp.conf files of the NTP client, the stats, and a nice graph of the
> offsets are found at http://dingo.dogpad.net/ntpProblem/.
>
> I concede that only having 2 NTP servers for our host made this problem
> more likely to occur. But considering the mayhem caused by jerking the
> clock back and forth every 15 minutes for 22 days, I think it is worth
> investigating whether to eliminate stepping altogether.
>
Why didn't anyone notice the problem for 22 days? If, indeed, it caused
mayhem, why was it allowed to continue for so long?
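
On the question of eliminating stepping altogether, it may help to put a number on what slewing a ~400 second offset would cost. The sketch below is back-of-the-envelope arithmetic only, assuming ntpd's default step threshold of 128 ms and its maximum slew rate of 500 ppm. The function names are hypothetical, and the model deliberately ignores the stepout interval and everything else the real clock discipline does.

```python
STEP_THRESHOLD_S = 0.128  # ntpd default step threshold, in seconds
MAX_SLEW_PPM = 500        # ntpd maximum slew rate, in parts per million


def slew_duration_s(offset_s, max_slew_ppm=MAX_SLEW_PPM):
    """Seconds needed to remove offset_s when slewing at the maximum rate."""
    return abs(offset_s) * 1_000_000 / max_slew_ppm


def would_step(offset_s, step_threshold_s=STEP_THRESHOLD_S):
    """Simplified decision: step when the offset exceeds the threshold.

    A threshold of 0 models 'tinker step 0' (never step, always slew).
    """
    return step_threshold_s > 0 and abs(offset_s) > step_threshold_s


if __name__ == "__main__":
    seconds = slew_duration_s(400)
    print(f"{seconds:.0f} s, about {seconds / 86400:.2f} days")
```

Under those assumptions a 400 second offset takes 800,000 seconds, a little over nine days, to slew away, and that is only if the offset stops moving in the meantime.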