[ntp:questions] NTP.log interpretation

William Unruh unruh at invalid.ca
Fri Apr 18 20:12:50 UTC 2014

On 2014-04-18, GregL <greg.leibfried at gmail.com> wrote:
>> > Yes, clearly the root of the most recent problem was a faulty
>> > configuration that allowed our internal time servers to grow to nearly
>> > 50 seconds apart at some point....and that wreaked havoc in many many
>> > areas.
>> What was causing that? Clearly one, or both, are not getting their time
>> from proper servers themselves. In your post there seemed to be a hint
>> that one of your servers was getting its time from the other. That is
>> a bad idea. It is no better than having just one server.
> Yes.  From what I understand, one of the servers that serves as a time
> server was rebuilt in January and the ntpd configuration was not put
> back on.  It was an oversight.  Because of other services that run there,
> that server *should* have kept in sync with the other server, but that sync
> didn't appear to happen either.

Having two servers, one of which gets its time from the other, is pretty
useless. It is equivalent at the best of times to having only one
server, and at the worst to having none (as you discovered).

You should always try to make sure that your sources of time really are
independent. That is a problem with the pool: you can get two or three
servers, all of which get their time from the same stratum 1 server.
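
One rough way to spot this is to look at the refid column of "ntpq -pn".
For a source at stratum 2 or higher, the refid identifies the upstream
server it is synced to, so two sources showing the same refid share an
upstream, and a source whose refid is the address of another of your
sources is simply chained off it. Below is a minimal sketch in Python
that does this check on captured ntpq output. The function names and the
sample addresses are made up for illustration, and it assumes the usual
ten-column ntpq peer listing.

```python
from collections import defaultdict

def parse_peers(ntpq_output):
    """Return a list of (remote, refid, stratum) from `ntpq -pn` text."""
    peers = []
    for line in ntpq_output.splitlines():
        # Column 0 is the tally code (' ', '*', '+', '-', ...); drop it.
        fields = line[1:].split()
        # Skip blank lines, the column header, and the ===== separator.
        if len(fields) < 3 or fields[0] == "remote" or fields[0].startswith("="):
            continue
        peers.append((fields[0], fields[1], int(fields[2])))
    return peers

def shared_upstreams(peers):
    """Map refid -> remotes, for refids used by more than one source.

    Stratum 1 sources are ignored: their refid names a clock type such
    as .GPS., so equal refids there do not mean a shared upstream.
    """
    by_refid = defaultdict(list)
    for remote, refid, stratum in peers:
        if stratum >= 2:
            by_refid[refid].append(remote)
    return {refid: remotes for refid, remotes in by_refid.items() if len(remotes) > 1}

def chained_sources(peers):
    """Return (remote, upstream) pairs where upstream is another listed source."""
    remotes = {remote for remote, _, _ in peers}
    return [(remote, refid) for remote, refid, _ in peers if refid in remotes]

# Hypothetical sample output (documentation-range addresses, not real servers).
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.10      203.0.113.1      2 u   33   64  377    0.512    0.021   0.010
+192.0.2.20      203.0.113.1      2 u   12   64  377    0.634   -0.104   0.022
+192.0.2.30      192.0.2.10       3 u   40   64  377    0.701    0.310   0.045
"""

if __name__ == "__main__":
    peers = parse_peers(SAMPLE)
    print("shared upstream:", shared_upstreams(peers))
    print("chained:", chained_sources(peers))
```

In the sample, the first two sources share one upstream and the third
takes its time from the first, so the three lines amount to a single
independent source. A matching refid only shows the immediate upstream;
sources with different refids can still meet further up the chain.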

If you can do it, a better solution would be to have, say, one server with
a GPS PPS clock source, and the other(s) taking their time from the outside
NTP pool.

> Clearly a bad situation.  That is corrected now, with both internal time
> servers independently configured to go to an external pool of NTP servers.
> That is more of the "correct the problem" fix;  as a matter of looking at
> the big picture, we are just trying to determine any other changes we
> should make.   Building more dedicated time servers that aren't rebooted
> weekly is one thing I will lobby for ;-)
> I'm certainly learning more ;-)
> --Greg

