[ntp:questions] Re: 2 NTP Servers with diverging clocks and how to avoid stepping backwards in time (repost)
Richard B. Gilbert
rgilbert88 at comcast.net
Wed Sep 20 00:22:25 UTC 2006
Joseph Harvell wrote:
> Richard B. Gilbert wrote:
>>Joseph Harvell wrote:
>>>I am doing post-mortem analysis on an NTP related problem in which one
>>>host running ntp-4.1.2 gets in a state where it seems to be making large
>>>step corrections to its local clock.
>>>How can I avoid the large clock stepping in this scenario? Is it
>>>related to the "prefer" keyword used for 192.168.0.1?
>>>Can I safely use "tinker step 0" along with "kernel disable" to prevent
>>>step corrections altogether?
>>Safely?? Probably not!!!! Far better to fix the problem, whatever it is!
> Yes, I agree I need to fix the reachability problem. I think
> configuring more servers is definitely a good idea.
> The reason I ask about the "prefer" keyword is that I think it has the
> effect that if the prefer server survives the clustering algorithm, its
> clock alone will be used to correct the local clock; whereas if no
> server is a prefer server, the clocks of all survivors of the clustering
> algorithm will be used for clock corrections. Note the bands in the
> graph that suggest the local clock was repeatedly stepped back and forth
> between the two servers' clocks. What I am looking for is to see how
> much the "prefer" keyword contributes to the frequency and magnitude
> of step corrections in this scenario.
The prefer keyword, as I understand it, tells ntpd to choose this server
if it is possible to do so; e.g. the server is responding, it is
synchronized, the numbers look okay, etc.  If the "prefer" server's
numbers look really bad (high jitter, large synchronization distance,
etc.), I believe the prefer keyword is ignored.
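For example, a minimal configuration along these lines (the addresses are
just the ones from your post) would mark one server as preferred while
keeping a second as a backup:

    server 192.168.0.1 prefer   # used alone if it survives the selection
                                # and clustering algorithms
    server 192.168.0.2          # consulted when the prefer server is
                                # rejected or unreachable

With only two servers, though, ntpd cannot tell which one is wrong when
they disagree, which is part of the problem you are seeing.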
> Also, I recognize that there are failures in which the local host can
> end up with only one server reachable, and that this can flip flop
> between two servers with clocks that are between 128ms and 1024s apart.
> So in this scenario, the local ntpd will step the clock back and forth
> unless I use tinker step 0.
> My application is more sensitive to stepping than it is to the time
> being correct. So I would really like someone to explain to me why NOT
> to use tinker step 0. The February post I was referring to suggested it
> could maybe be done safely along with 'disable kernel'.
I have no personal experience with such a procedure.  It seems to me
that this is bending ntpd all out of shape to prevent something that
shouldn't be happening in the first place.  A properly configured ntpd
with a properly functioning local clock (e.g. frequency error within the
500 PPM tolerance) should never NEED to step except, possibly, during
startup.
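If you do decide to experiment with it anyway, the combination being
discussed would look something like this in ntp.conf (untested by me;
a sketch, not a recommendation):

    disable kernel    # keep the clock discipline in the ntpd daemon,
                      # not in the kernel
    tinker step 0     # step threshold of zero: always slew, never step

Bear in mind that slewing is rate-limited (on the order of 500 PPM), so
a large offset that would otherwise be stepped can take a very long time
to amortize.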
Try four, five, or seven servers (protecting against one, two, or three
falsetickers, respectively).  Four are probably sufficient for most
purposes.
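A four-server configuration might look like the following (the pool
hostnames are just the public NTP pool; substitute servers appropriate
for your network):

    server 0.pool.ntp.org iburst
    server 1.pool.ntp.org iburst
    server 2.pool.ntp.org iburst
    server 3.pool.ntp.org iburst

The iburst keyword speeds up the initial synchronization at startup.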
If your servers are "in house" and serving their unsynchronized local
clocks, that is a very poor idea, and it is the only way I can imagine
two servers drifting more than 128 milliseconds apart.  If, for some
reason, you can't connect to the internet, invest $85 each in one or
more Garmin GPS18-LVC timing receivers and use them to synchronize
either your NTP servers or your application server.
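For reference, a GPS18-LVC is usually hooked to ntpd through the type 20
(generic NMEA) reference clock driver; a sketch, assuming the receiver
is wired to the serial port that /dev/gps0 points at (fudge values must
be calibrated for your own wiring):

    # /dev/gps0 -> serial port carrying the GPS18-LVC's NMEA sentences
    server 127.127.20.0 minpoll 4   # type 20: generic NMEA driver, unit 0
    fudge 127.127.20.0 flag1 1      # use the PPS signal, if it is wired up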