[ntp:questions] Too high steps in time reset

David Woolley david at ex.djwhome.demon.co.uk.invalid
Tue Apr 22 07:32:29 UTC 2008


massimo.musso at gmail.com wrote:
 
> the first server (172.31.1.90) is a dcf77 stratum 11 in my LAN,
> synchronizing itself 2-3 times per day

I was going to say that this server never has valid time, but in fact it 
does.  Even so, it is never going to be used as the server of record, 
because the local clock will always win.  More later.

> Apr 22 07:22:26 gecssrv1 ntpd[20177]: time reset +9.470501 s

Positive steps on Red Hat are usually the result of lost clock 
interrupts.  I think that is in the known issues documents mentioned in 
another thread.

To diagnose any other problem, it is more or less essential that you 
provide the output of the ntpq peers command.

> Apr 22 07:26:44 gecssrv1 ntpd[20177]: synchronized to 193.204.114.232,
> stratum 1
> Apr 22 07:29:59 gecssrv1 ntpd[20177]: synchronized to LOCAL(0),
> stratum 10

Synchronizing to LOCAL should be considered a fault condition, 
equivalent to a total loss of synchronisation.  LOCAL should be an 
active choice, not something configured by default, but if you use it, 
you should ensure that you have enough real servers to outvote it.  The 
DCF server is useless at stratum 11, because LOCAL at stratum 10 will 
always be preferred to it.  Even at a better stratum it would be of 
questionable value, because of the large root dispersions it will 
accumulate between updates (these are not fundamental limitations of DCF 
as a clock source).

Many people would say that you need at least four independent sources of 
true time, and I would suggest that those four need to be over and above 
the number of LOCAL clock sources (direct and indirect) that you have.

> You can see that at 7:22 I've got a time reset +9.4... that's HUGE. It
> happens often.

Does that correlate with some heavy disk based job? (Backup?)
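
One way to check for such a correlation is to pull the step events out 
of syslog and compare their timestamps with the job schedule.  A minimal 
sketch, assuming log lines in the same format as the one quoted above; 
the function name time_resets and the year argument are my own choices 
for illustration (syslog lines carry no year), and month names are 
parsed in the current locale:

```python
import re
from datetime import datetime

# Matches lines such as:
#   Apr 22 07:22:26 gecssrv1 ntpd[20177]: time reset +9.470501 s
RESET_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+ntpd\[\d+\]:\s+"
    r"time reset (?P<step>[+-]\d+\.\d+) s"
)


def time_resets(lines, year):
    """Yield (timestamp, step_seconds) for each ntpd 'time reset' line.

    `year` has to be supplied by the caller because the syslog
    timestamp does not include one.
    """
    for line in lines:
        m = RESET_RE.match(line)
        if m:
            stamp = datetime.strptime(
                f"{year} {m.group('stamp')}", "%Y %b %d %H:%M:%S"
            )
            yield stamp, float(m.group("step"))


# Example using two lines of the kind shown in this thread.
sample = [
    "Apr 22 07:22:26 gecssrv1 ntpd[20177]: time reset +9.470501 s",
    "Apr 22 07:29:59 gecssrv1 ntpd[20177]: synchronized to LOCAL(0), stratum 10",
]
for stamp, step in time_resets(sample, 2008):
    print(stamp, f"{step:+.6f}")
```

The resulting list of timestamps can then be lined up against cron 
entries or backup logs by eye.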

> The dcf77 synchronized at 5:40.

The switch to the stratum 1 server, at about that time, may be because 
the error band on DCF has collapsed, so that its intersection with the 
stratum 1 band now excludes the local clock value, thus outvoting the 
local clock.  When DCF has run for a long time with no update, its error 
bounds will increase and any local clock value within them will be 
acceptable, even if that conflicts with the stratum 1 server.
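
The effect of the error bounds can be illustrated with a toy model.  
This is only a sketch of the interval-overlap idea, not ntpd's actual 
selection code, and every offset and bound below is an invented number 
chosen for illustration, not a measurement from your system:

```python
def interval(offset, bound):
    """Error interval (low, high) for a source: offset +/- bound, in seconds."""
    return (offset - bound, offset + bound)


def overlaps(a, b):
    """True if the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def supporters(name, sources):
    """Names of the other sources whose interval overlaps that of `name`."""
    mine = sources[name]
    return sorted(
        other
        for other, iv in sources.items()
        if other != name and overlaps(mine, iv)
    )


# Hypothetical values: the local clock believes it is exact, while the
# stratum 1 server says the true time is about half a second away.
local = interval(0.0, 0.01)
stratum1 = interval(0.5, 0.05)

# DCF shortly after an update: narrow error band.
fresh = {"LOCAL": local, "stratum1": stratum1, "DCF": interval(0.5, 0.02)}
# DCF after hours without an update: the band has grown to +/- 1 s.
stale = {"LOCAL": local, "stratum1": stratum1, "DCF": interval(0.5, 1.0)}

print(supporters("LOCAL", fresh))     # []       DCF sides with stratum 1 only
print(supporters("stratum1", fresh))  # ['DCF']
print(supporters("LOCAL", stale))     # ['DCF']  DCF can no longer arbitrate
print(supporters("stratum1", stale))  # ['DCF']
```

With the narrow band, DCF and the stratum 1 server agree with each other 
and not with the local clock.  With the wide band, DCF is consistent 
with both, so it gives no grounds for rejecting the local clock.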

Also note that any time reset events indicate a problem that should be 
investigated.  Again see the other thread.



