[ntp:questions] Is this normal behavior?
kzembowe at jhuccp.org
Mon Dec 11 21:52:54 UTC 2006
David, Richard and Harlan, thank you so much for your helpful comments.
I am running Linux, Debian stable (sarge) on this host. It's a Dell
PowerEdge 2450 running dual 600MHz Pentium III processors, with 512M
RAM. Based on this, I don't think the Known Hardware Issues apply to my
I tried to disable ACPI and APIC by including 'append="noapic,acpi=off"'
in the correct lilo stanza, but this didn't help; time resets still
occurred every 10-15 minutes.
I'm trying now to disable ACPI and APIC, and set HZ to 100 in a custom
kernel. I use the kernels from the Debian stable distribution, and have
never built a custom kernel, but the directions I've found seem do-able.
I was able to completely remove ACPI and APIC (don't include in kernel),
but I can't find where the HZ setting is applied. I'm making the changes
through the 'make menuconfig' program in Debian. Can anyone give me a
hint where to find this setting? Sorry, I know this isn't a kernel list,
but I'm hoping someone here remembers this off the top of their head. Is
there any way to check the value of HZ before I go to the trouble of
rebuilding the kernel?
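
One possible way to estimate HZ on a running kernel, without rebuilding anything, is to count timer interrupts over a known interval. The sketch below assumes that the IRQ 0 line of /proc/interrupts counts the periodic timer tick, which should hold for kernels of this era but not for newer tickless kernels. The function names are illustrative only.

```python
import time


def timer_interrupts(proc_interrupts_text):
    """Sum the per-CPU counts on the IRQ 0 (timer) line of /proc/interrupts."""
    for line in proc_interrupts_text.splitlines():
        label, _, rest = line.partition(":")
        if label.strip() == "0":
            total = 0
            for field in rest.split():
                if not field.isdigit():
                    break  # reached the controller and device name columns
                total += int(field)
            return total
    raise ValueError("no IRQ 0 line found")


def estimate_hz(before, after, seconds):
    """Timer interrupts per second between two samples."""
    return (after - before) / seconds


def measure_hz(interval=10, path="/proc/interrupts"):
    """Sample the timer count twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        first = timer_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = timer_interrupts(f.read())
    return estimate_hz(first, second, interval)
```

Calling measure_hz() on the host should return a figure close to a round value such as 100 or 1000. The interval is timed by the same kernel being measured, so treat the result as an estimate rather than a precise reading.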
Thank you also for your suggestions to clean up my ntp.conf and
timeservers. I certainly do not need to be synced to tock.usno.navy.mil,
and I think that when I put the pool servers in, there was just one pool
and no regional ones. I hope I've cleaned it up sufficiently:
cn2:~# ntpq -p
 remote          refid           st t when poll reach    delay   offset
+trust.httpdnet. 22.214.171.124   3 u   39   64    77   33.281  47914.3
+tock.jrc.us     126.96.36.199    2 u   28   64    77   17.224  47384.9
*lumberjack.ald. 188.8.131.52     2 u   36   64    77   32.721  47921.3
 LOCAL(0)        LOCAL(0)        13 l   34   64    77    0.000    0.000
Thanks, again, for all your help and suggestions.
From: questions-bounces+kzembowe=jhuccp.org at lists.ntp.isc.org
[mailto:questions-bounces+kzembowe=jhuccp.org at lists.ntp.isc.org] On
Behalf Of David Woolley
Sent: Monday, December 11, 2006 2:39 AM
To: questions at lists.ntp.isc.org
Subject: Re: [ntp:questions] Is this normal behavior?
<2E8AE992B157C0409B18D0225D0B476304C57680 at XCH-VN01.sph.ad.jhsph.edu>,
kzembowe at jhuccp.org (Zembower, Kevin) wrote:
> Dec 8 09:03:06 cn2 ntpd: time reset +3.120367 s
> Dec 8 09:23:33 cn2 ntpd: time reset +3.503628 s
You have a serious problem with your machine running slow. On Linux
this is often due to lost clock interrupts as a result of using a higher HZ
figure in the kernel than the disk driver can support. It could also
mean a broken motherboard clock, the effects of power management, an
incorrect value having been calculated for the CPU frequency, etc. The fact
that you report high but intermediate offsets tends to rule out the
possibility that you have conflicting clock synchronisation software.
> *ntp1.usno.navy. .USNO. 1 u 60 64 177 8.567 827.174
Do you meet the rules of engagement conditions for using a stratum
one server (although this one tends to be overloaded and not
good as a result)? In any case, note that the offset has already
> +trane.wu-wien.a 184.108.40.206 3 u 57 64 177 125.292 841.188
> +221-15-178-69.g 220.127.116.11 2 u 50 64 177 107.300 1212.00
These two servers are too far away to be useful, given that you can
achieve single figure delays to other servers.
> I notice the problem here, and if I run 'watch ntpq -p.' Seldom is my
> reachability 377, and it frequently and inexplicitly drops to 1 as I'm
This is because the offset becomes unacceptably high, and a step
is initiated, before it gets to that point. Whenever the clock is
stepped (which is never desirable, after the initial synchronisation)
the states of the servers are discarded and ntpd starts over (but with
updated frequency and offset estimates).
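
For reference, the reach column in ntpq -p is an 8-bit shift register printed in octal: each poll shifts it left by one, and the low bit is set when a reply arrives. A value of 377 means the last eight polls all succeeded, while 1 means only one poll has been answered since the register was cleared, which is what appears just after a step. A minimal sketch of decoding it (the helper names are illustrative only):

```python
def decode_reach(reach_octal):
    """Return the last eight poll results, oldest first, as booleans."""
    value = int(reach_octal, 8)
    if not 0 <= value <= 0o377:
        raise ValueError("reach must fit in 8 bits")
    return [bool((value >> bit) & 1) for bit in range(7, -1, -1)]


def reach_bits(reach_octal):
    """Render the register as a bit string, oldest poll on the left."""
    return "".join("1" if ok else "0" for ok in decode_reach(reach_octal))


def successful_polls(reach_octal):
    """Count how many of the last eight polls were answered."""
    return sum(decode_reach(reach_octal))
```

With the values seen in this thread, reach_bits("77") gives six set bits and reach_bits("177") gives seven, so neither listing had yet completed eight consecutive polls since the previous restart.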
> Is this normal behavior for NTP, to frequently lose the ability to
> a timeserver? If not, how can I troubleshoot it further?
What's probably happening here is that each server is rejected
in turn. Server hopping does happen, but not like this.
> These time resets seem rather large to me. Is this normal, too?
This is the fundamental symptom.
> Are there any other diagnostics that I could run to help identify any
Check if the rate of loss correlates with any form of system activity
(particularly IDE disks).
Disable any power management features.
Make sure that HZ=100 or rebuild the kernel to make it so.
Check the clock behaviour running MS-DOS or the oldest available Windows
(basically to avoid all device activity and use quite large ticks). If it
loses at more than 450ppm, get it working in that environment before
running the normal system (actually, you can correct pure frequency
errors of more than this, but a good machine should be within about
20ppm and the worst I've seen is about 300ppm, so this large an error
probably indicates a system that is too unreliable for the job).
Check the frequency correction. If it is not on the 500ppm end stop,
it may indicate that your time loss is intermittent.
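
To put the figures in perspective, the loss rate implied by the quoted log lines can be worked out directly. This is a rough sketch that assumes the two quoted resets (09:03:06 and 09:23:33) were consecutive and ignores whatever ntpd managed to slew in between, so it probably understates the true rate. The helper names are illustrative only.

```python
from datetime import datetime

SLEW_LIMIT_PPM = 500  # the frequency correction end stop mentioned above


def seconds_between(start, end, fmt="%H:%M:%S"):
    """Elapsed seconds between two same-day timestamps."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


def loss_ppm(step_seconds, interval_seconds):
    """Clock error accumulated over an interval, in parts per million."""
    return step_seconds / interval_seconds * 1e6


interval = seconds_between("09:03:06", "09:23:33")  # 1227 seconds
rate = loss_ppm(3.503628, interval)
print(f"interval: {interval:.0f} s, loss: {rate:.0f} ppm, "
      f"beyond end stop: {rate > SLEW_LIMIT_PPM}")
```

Under those assumptions the clock is losing roughly 2,850 ppm, several times more than a 500ppm frequency correction can absorb, which would explain why ntpd keeps stepping.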
If you meet the conditions for using stratum one public servers, it
would probably be a good idea to dedicate a machine to being the site
stratum two server. This can be relatively low specification (well,
actually very low) which means that it is much less likely to suffer from
the more technical causes of this sort of problem.
Read the recent thread that concluded that a power management related
parameter can sometimes avoid a problem.