[ntp:questions] Is this normal behavior?

David Woolley david at djwhome.demon.co.uk
Mon Dec 11 07:39:05 UTC 2006


In article <2E8AE992B157C0409B18D0225D0B476304C57680 at XCH-VN01.sph.ad.jhsph.edu>,
kzembowe at jhuccp.org (Zembower, Kevin) wrote:

> Dec  8 09:03:06 cn2 ntpd[16955]: time reset +3.120367 s
> Dec  8 09:23:33 cn2 ntpd[16955]: time reset +3.503628 s

You have a serious problem with your machine running slow.  On Linux this
is often due to lost clock interrupts as a result of using a higher HZ
figure in the kernel than the disk driver can support.   It could also
mean a broken motherboard clock, the effects of power management, a wrong
value having been calculated for the CPU frequency, etc.  The fact that
you report high but intermediate offsets tends to rule out the possibility
that you have conflicting clock synchronisation software.

> *ntp1.usno.navy. .USNO.           1 u   60   64  177    8.567  827.174
> 551.616

Do you meet the rules of engagement for using a stratum one server
(this one tends to be overloaded, and not particularly good as a
result)?   In any case, note that the offset has already reached
827ms.

> +trane.wu-wien.a 195.13.1.153     3 u   57   64  177  125.292  841.188
> 548.251
> +221-15-178-69.g 140.142.16.34    2 u   50   64  177  107.300  1212.00
> 395.490

These two servers are too far away to be useful, given that you can
achieve single figure delays to other servers.

> I notice the problem here, and if I run 'watch ntpq -p.' Seldom is my
> reachability 377, and it frequently and inexplicitly drops to 1 as I'm

This is because the offset becomes unacceptably high, and a step
is initiated, before the reachability register gets to 377.  Whenever
the clock is stepped (which is never desirable, after the initial
synchronisation) the states of the servers are discarded and ntpd
starts over (but with updated frequency and offset estimates).
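
To illustrate the point about the states being discarded: the reach
column in ntpq -p is an 8-bit shift register printed in octal, with one
bit per poll and the newest poll in the lowest bit.  A minimal sketch of
decoding it follows; the function name is made up for illustration and is
not part of any ntp tool.

```python
def decode_reach(reach_octal: str) -> list[bool]:
    """Decode the ntpq 'reach' column into eight poll results, oldest first.

    Hypothetical helper for illustration only.  The column is an 8-bit
    shift register shown in octal; a set bit means the poll was answered.
    """
    value = int(reach_octal, 8) & 0xFF
    return [bool((value >> bit) & 1) for bit in range(7, -1, -1)]


for reach in ("1", "177", "377"):
    polls = decode_reach(reach)
    print(reach, "->", sum(polls), "of the last 8 polls answered")
```

A value of 177 is seven set bits with the oldest one clear, which is what
you would see seven polls after the state was last thrown away; 377 needs
eight good polls in a row, and 1 is the first good poll after a restart.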

> Is this normal behavior for NTP, to frequently lose the ability to reach
> a timeserver? If not, how can I troubleshoot it further?

What's probably happening here is that each server is rejected
in turn.  Server hopping does happen, but not like this.

> These time resets seem rather large to me. Is this normal, too?

This is the fundamental symptom.
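
As a rough worked example of how large this is, the two log lines quoted
at the top can be turned into a rate.  This is only a sketch: it assumes
the whole of the second step built up since the first one, it ignores
whatever ntpd slewed out in between (so the real loss is, if anything,
larger), and the year is taken from the date of this thread because
syslog lines do not carry one.

```python
from datetime import datetime


def implied_loss_ppm(step_seconds: float, first: str, second: str,
                     year: int = 2006) -> float:
    """Estimate clock loss in ppm from one step and the gap before it.

    Assumption: the step equals the time lost since the previous reset.
    The timestamps are in syslog form, e.g. 'Dec  8 09:03:06'.
    """
    fmt = "%Y %b %d %H:%M:%S"
    t1 = datetime.strptime(f"{year} {first}", fmt)
    t2 = datetime.strptime(f"{year} {second}", fmt)
    interval = (t2 - t1).total_seconds()
    return step_seconds / interval * 1e6


ppm = implied_loss_ppm(3.503628, "Dec  8 09:03:06", "Dec  8 09:23:33")
print(f"about {ppm:.0f} ppm")
```

The interval is 1227 seconds, so this comes out at something over
2800ppm, several times the 500ppm that ntpd can correct by adjusting
the frequency.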

> Are there any other diagnostics that I could run to help identify any
> problem?

Check if the rate of loss correlates with any form of system activity
(particularly IDE disks).

Disable any power management features.

Make sure that HZ=100 or rebuild the kernel to make it so.

Check the clock behaviour running MS-DOS or the oldest available Windows
(basically to avoid all device activity and use quite large ticks).  If it
loses at more than 450ppm, get it working in that environment before
running the normal system.  (Actually, you can correct pure frequency
errors of more than this, but a good machine should be within about
20ppm and the worst I've seen is about 300ppm, so this large an error
probably indicates a system that is too unreliable for the job.)

Check the frequency correction.  If it is not at the 500ppm end stop,
it may indicate that your time loss is intermittent.
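
One way to make that check, sketched under assumptions: ntpd's drift file
holds the current frequency correction in ppm as its first field, but its
location is set by the driftfile line in ntp.conf and varies between
systems, so the path is left to the caller.  The function names and the
1ppm margin are made up for illustration.

```python
from pathlib import Path

MAX_CORRECTION_PPM = 500.0  # the end stop for ntpd's frequency correction


def read_drift_ppm(path: str) -> float:
    """Read the frequency correction (ppm) from a drift file.

    Assumption: the caller supplies the path named by 'driftfile' in
    ntp.conf; the first whitespace-separated field is the value in ppm.
    """
    return float(Path(path).read_text().split()[0])


def at_end_stop(frequency_ppm: float, margin_ppm: float = 1.0) -> bool:
    """True if the correction is pinned at (or within a margin of) 500ppm."""
    return abs(frequency_ppm) >= MAX_CORRECTION_PPM - margin_ppm
```

If at_end_stop(read_drift_ppm(...)) is false while the clock is still
being stepped by seconds, the loss is probably happening in bursts rather
than as a steady frequency error.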

If you meet the conditions for using stratum one public servers, it would
probably be a good idea to dedicate a machine to being the site
stratum two server.  This can be relatively low specification (well,
actually very low) which means that it is much less likely to suffer from
the more technical causes of this sort of problem.

Read the recent thread that concluded that a power management related
parameter can sometimes avoid a problem.



