[ntp:questions] Re: High NTP drift values, time resets and hwclock command

Sergio Ferruchi holger.koch at mails.de
Thu Jul 27 07:26:53 UTC 2006


First of all, thank you all for answering my questions.

Danny Mayer Part

> In your case you are using private addresses
> so you really don't need to bother with restrict statements in the first
> place unless you cannot trust other people in your local network in
> which case you have a social/HR problem rather than a technical one.

OK, but I don't think I have an access permission problem. The clients
can access the servers. I just want to block access from servers or
clients which should not access them.

> Never do this. You need to leave it to NTP to control the clock. It
> knows better what it's doing. It may very well be the main cause of your
> problems. Who advised you to do this?

The problem was that some clients were never shut down, for some reason.
That's why the hardware clock was never updated, and at every startup a
totally wrong timestamp was set.


Richard B. Gilbert

> OTOH setting it every ten minutes seems like overkill; if it gains or
> loses a significant amount of time in ten minutes, it's going to be
> REALLY wrong after the power has been off for an hour or two.

OK, I will reconsider the interval.

Steve Kostecke  Part

> I'd be a bit concerned about a 25ms offset to a time server in the same
> rack.

OK, that was after I added some more external servers, and now the
offset is much smaller (1 ms). It seems to be stable now.
The drift value is now around 70 PPM.
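
To put the drift figure in perspective, here is a small sketch that
converts a frequency error in PPM into accumulated time error. It assumes
the error is constant and uncorrected, which is a simplification; the
function names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def drift_seconds(ppm: float, interval_s: float) -> float:
    """Time error accumulated over interval_s by a clock that runs
    ppm parts per million fast or slow (constant, uncorrected)."""
    return ppm * 1e-6 * interval_s


def drift_seconds_per_day(ppm: float) -> float:
    """Same as drift_seconds(), for one full day."""
    return drift_seconds(ppm, SECONDS_PER_DAY)


# 70 PPM is the value reported here; 500 PPM is the value mentioned
# later in this message.
for ppm in (70, 500):
    per_day = drift_seconds_per_day(ppm)
    per_10_min_ms = drift_seconds(ppm, 600) * 1000
    print(f"{ppm} PPM: {per_day:.3f} s/day, {per_10_min_ms:.1f} ms per 10 min")
```

At 70 PPM an undisciplined clock would be off by roughly 6 seconds per
day, or about 42 ms over a ten-minute interval.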

>
> > Jul 26 12:16:41 sb1-1 ntpd[10225]: time reset +0.481624 s
> > Jul 26 12:18:00 sb1-1 ntpd[10225]: synchronized to 192.168.130.172,
> > stratum 3
> > Jul 26 12:36:16 sb1-1 ntpd[10225]: time reset -0.197015 s
> > Jul 26 12:37:35 sb1-1 ntpd[10225]: synchronized to 192.168.130.172,
> > stratum 3
> > Jul 26 13:07:42 sb1-1 ntpd[10225]: time reset +0.263151 s
> > Jul 26 13:09:02 sb1-1 ntpd[10225]: synchronized to 192.168.130.172,
> > stratum 3
>
> This means that 'sb1-1' has drifted more than the default step threshold
> (128ms). Does this occur only on the "clients" or the "servers"?

It appears on the servers and on the clients, but I assume the reason
was the artificially high ntp.drift value from the NTP server's
measurement.
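
For checking this across several hosts, a minimal sketch that pulls the
"time reset" offsets out of syslog lines and compares them with the
128 ms step threshold quoted above. The regular expression assumes the
log format shown in the quoted lines; other ntpd versions may log
differently.

```python
import re

STEP_THRESHOLD_S = 0.128  # default step threshold, as quoted above

# Matches lines such as: "... ntpd[10225]: time reset +0.481624 s"
RESET_RE = re.compile(r"ntpd\[\d+\]: time reset ([+-]\d+\.\d+) s")


def time_resets(lines):
    """Return the step offsets (seconds) found in ntpd log lines."""
    offsets = []
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            offsets.append(float(match.group(1)))
    return offsets


# Sample input: the log lines quoted above (wrapped lines rejoined).
LOG = """\
Jul 26 12:16:41 sb1-1 ntpd[10225]: time reset +0.481624 s
Jul 26 12:18:00 sb1-1 ntpd[10225]: synchronized to 192.168.130.172, stratum 3
Jul 26 12:36:16 sb1-1 ntpd[10225]: time reset -0.197015 s
Jul 26 12:37:35 sb1-1 ntpd[10225]: synchronized to 192.168.130.172, stratum 3
Jul 26 13:07:42 sb1-1 ntpd[10225]: time reset +0.263151 s
""".splitlines()

offsets = time_resets(LOG)
for off in offsets:
    ratio = abs(off) / STEP_THRESHOLD_S
    print(f"step of {off:+.6f} s ({ratio:.1f}x the step threshold)")
```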

> What OS / (kernel) version are you running? Does the hardware have any
> sort of power-management, variable processor speed, etc. ?

It is a 2.6.10 kernel.

I am currently requesting information on the power management
configuration of the blades.
Can I check it with a command?

> Some people feel that daemons have no business writing to /etc and
> should use /var. But this is not a problem.

In principle you are right. But it's a configuration file which is read
when ntpd is started; maybe that is why it is often written to /etc.

>
> > broadcastdelay  0.008
>
> Unneeded but not a problem.

Ok I will delete it.

>
> > tinker panic 0
>
> This command modifies the ntpd panic threshold (which is normally 1024
> seconds). Setting this to 0 disables the panic sanity check and a clock
> offset of any value will be accepted.
>
> Why do you feel you need this?

The problem was that when ntpdate fails, the server would not be able to
adapt to big offsets. But you are right; I will consider using the -g
option.
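
A simplified model of how the step and panic thresholds and the -g
option interact, as I understand them from this thread. This is only a
sketch, not ntpd's actual code: the function and parameter names are
invented, the real daemon has more states than this, and the 1024 s
default is simply the value quoted above (some ntpd documentation lists
1000 s), so it is left as a parameter.

```python
def clock_action(offset_s, step_threshold_s=0.128, panic_threshold_s=1024.0,
                 allow_first_big_step=False, first_update=True):
    """Return 'slew', 'step' or 'exit' for a measured clock offset.

    Simplified illustration only:
      - offsets within the step threshold are slewed gradually
      - larger offsets are stepped ("time reset" in the log)
      - offsets beyond the panic threshold make the daemon give up,
        unless the check is disabled (panic 0) or -g permits one
        large correction on the first update
    """
    magnitude = abs(offset_s)
    if magnitude <= step_threshold_s:
        return "slew"
    if panic_threshold_s == 0 or magnitude <= panic_threshold_s:
        return "step"
    if allow_first_big_step and first_update:
        return "step"  # -g: tolerated once, at startup
    return "exit"


print(clock_action(0.05))                             # small offset
print(clock_action(0.48))                             # like the resets above
print(clock_action(5000))                             # beyond panic threshold
print(clock_action(5000, panic_threshold_s=0))        # tinker panic 0
print(clock_action(5000, allow_first_big_step=True))  # started with -g
```

The difference that matters here is that "tinker panic 0" accepts any
offset at any time, while -g only tolerates one large correction at
startup.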


> > server  127.127.1.0
> > fudge   127.127.1.0 stratum 11
>
> You've correctly fudged the LocalCLK to a reasonable stratum. You may
> wish to fudge the LocalCLK on the two (front) servers to different
> strata (i.e. one to 11 and the other to 12) so that the clients will
> follow one of the (front) servers.

Actually, that's the case. I used stratum 11 for one server and 12 for
the other. What I observed was that this sometimes did not make the
clients synchronize to the lower stratum. Maybe the filter algorithm
chose the other one for other reasons. Unfortunately I did not really
understand how exactly the filter mechanism works.
Is there a simple description that I can understand? ;-)
I will try 11 and 13 instead. Maybe the difference between 11 and 12 is
too small.

I think the main problem was caused by the local clock configuration,
but I need it for fault tolerance in case of temporary NTP server
outages. It ensures that the clients and servers follow the same time.
What sometimes makes it unstable is when only one external NTP server
is responding. Then I think the local clocks on both servers decide
not to trust the one external NTP server. Is this possible, or can this
only happen if the one external server goes insane?
I also wonder why in such cases the drift value is 500 PPM. Is it
possible that the combination of local clock + peer + only one external
server available is the main reason for the problem?
How can a drift value be determined that wrongly, and is it possible
that the drift value is not corrected automatically when more than one
server is available again?

Again, many thanks for your contributions.



