[ntp:questions] NTP.log interpretation

William Unruh unruh at invalid.ca
Fri Apr 18 18:31:49 UTC 2014


On 2014-04-18, GregL <greg.leibfried at gmail.com> wrote:
>> On Fri, Apr 18, 2014 at 09:01:09AM -0500, GregL wrote:
>> > >   What you should do is to add more servers to the config.
>> >
>> > What about the idea of going to only one entry, but that entry is served by
>> > a DNS load balancer to choose one of two internal time servers to check.
>> >  Each of those, is configured to point at a pool of time servers (4 each).
>>
>> Well, that will prevent the client from detecting it's getting wrong
>> time. Is that what you want?
>>
>>
> I'm wrestling with that very question.  With 100+ systems, we have a far
> greater problem if some systems are *off* and others are not.
>
>> From the log it seems that at least one server is completely wrong,
>> the offset between the two servers is around 3 seconds! I'd suggest to
>> fix that first.
>>
>>
> Yes, clearly the root of the most recent problem was a faulty configuration
> that allowed our internal time servers to grow to nearly 50 seconds apart
> at some point....and that wreaked havoc in many many areas.

What was causing that? Clearly one, or both, are not getting their time
from proper servers themselves. In your post there seemed to be a hint
that one of your servers was getting its time from the other. That is
a bad idea. It is no better than having just one server.

>
> That is fixed, and our two internal time servers *should* be correct.

>
> Now, I'm just planning on making changes to the ntp.conf, like adding the
> "-x" parameter.  I'm hoping that that will prevent huge time resets
> backwards in time...should that ever be even possible again.

ntpd will reset the time if it is off by more than 128 ms. Those highly
non-linear jumps are one of the "features" of ntpd. If you do not want
them, run chrony, for example. It will change the time smoothly. It will,
however, at times also slew the time much faster than 500 PPM to get the
time back on track.
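
To put rough numbers on the step-versus-slew trade-off, here is a minimal
sketch of the arithmetic. It assumes the commonly documented ntpd values
(128 ms default step threshold, 600 s threshold with -x, 500 PPM maximum
slew rate); the function names are made up for illustration and are not
part of ntpd or chrony.

```python
# Illustrative arithmetic only; constants are assumed from ntpd's documentation.
MAX_SLEW_PPM = 500.0        # maximum slew rate in parts per million
STEP_THRESHOLD_S = 0.128    # default step threshold (128 ms)
STEP_THRESHOLD_X_S = 600.0  # step threshold when ntpd is started with -x


def slew_seconds(offset_s, rate_ppm=MAX_SLEW_PPM):
    """Seconds needed to remove offset_s when slewing at rate_ppm."""
    return abs(offset_s) * 1e6 / rate_ppm


def action(offset_s, threshold_s=STEP_THRESHOLD_S):
    """Return 'step' if the offset exceeds the threshold, otherwise 'slew'."""
    return "step" if abs(offset_s) > threshold_s else "slew"


if __name__ == "__main__":
    # Offsets mentioned in this thread: 3 s between servers, ~50 s at worst.
    for offset in (0.1, 3.0, 50.0):
        hours = slew_seconds(offset) / 3600.0
        print(offset, action(offset), action(offset, STEP_THRESHOLD_X_S),
              f"{hours:.2f} h to slew at 500 PPM")
```

Under these assumptions, a 50 second error that is slewed rather than
stepped takes on the order of a day to remove, so -x trades the backward
jump for a long period during which the clock is still wrong.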
>
> But, was the "sychronization lost" message *because* ntp saw the time
> difference so great on peer servers...and chose one to synch to...resulting
> in the time reset message?

And since there are only two, it had no idea which one to choose so it
chose randomly. 
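
The point about two servers can be illustrated with a toy majority check.
This is a simplified sketch and not ntpd's actual selection algorithm: it
treats each server as an interval (offset plus or minus an error bound)
and looks for the largest group of overlapping intervals. The server
names, offsets, and error bounds below are hypothetical.

```python
def largest_agreeing_group(sources):
    """sources: list of (name, offset_s, error_s).

    Returns the names in the largest set of sources whose intervals
    [offset - error, offset + error] share a common point.
    """
    events = []
    for name, offset, err in sources:
        events.append((offset - err, 0, name))  # interval opens
        events.append((offset + err, 1, name))  # interval closes
    events.sort()  # opens sort before closes at the same point

    active, best = set(), set()
    for _, kind, name in events:
        if kind == 0:
            active.add(name)
            if len(active) > len(best):
                best = set(active)
        else:
            active.discard(name)
    return best


def majority(sources):
    """Return the sorted agreeing majority, or None if there is none."""
    group = largest_agreeing_group(sources)
    return sorted(group) if len(group) > len(sources) / 2 else None


if __name__ == "__main__":
    # Two servers 3 s apart: neither can outvote the other.
    two = [("a", 0.0, 0.05), ("b", 3.0, 0.05)]
    # Four servers, one of them 3 s off: the other three outvote it.
    four = [("a", 0.0, 0.05), ("b", 0.01, 0.05),
            ("c", -0.02, 0.05), ("d", 3.0, 0.05)]
    print(majority(two))
    print(majority(four))
```

With two disagreeing servers no majority exists, so a client has nothing
to base a choice on; with four, the one bad server is identified and
ignored.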


