[ntp:questions] Is this normal behavior?

Zembower, Kevin kzembowe at jhuccp.org
Mon Dec 11 21:52:54 UTC 2006


David, Richard and Harlan, thank you so much for your helpful comments.
I am running Linux, Debian stable (sarge) on this host. It's a Dell
PowerEdge 2450 running dual 600MHz Pentium III processors, with 512M
RAM. Based on this, I don't think the Known Hardware Issues apply to my
problem.

I tried to disable ACPI and APIC by including 'append="noapic,acpi=off"'
in the correct lilo stanza, but this didn't help; time resets still
occurred every 10-15 minutes.

I'm trying now to disable ACPI and APIC, and set HZ to 100 in a custom
kernel. I use the kernels from the Debian stable distribution, and have
never built a custom kernel, but the directions I've found seem do-able.
I was able to completely remove ACPI and APIC (don't include in kernel),
but I can't find where the HZ setting is applied. I'm making the changes
through the 'make menuconfig' program in Debian. Can anyone give me a
hint where to find this setting? Sorry, I know this isn't a kernel list,
but I'm hoping someone here remembers this off the top of their head. Is
there any way to check the value of HZ before I go to the trouble of
rebuilding the kernel?
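
One rough way to check HZ on a running system is to count timer interrupts over a fixed interval. The sketch below assumes a kernel of this era, where IRQ 0 in /proc/interrupts fires once per tick; the function names are illustrative only, and on tickless kernels the result would not be meaningful.

```python
import time

def timer_ticks(interrupts_text):
    """Sum the per-CPU counts on the IRQ 0 (timer) line of /proc/interrupts."""
    for line in interrupts_text.splitlines():
        label, _, rest = line.partition(":")
        if label.strip() == "0":
            total = 0
            for field in rest.split():
                if not field.isdigit():
                    break  # counts end where the controller/device names begin
                total += int(field)
            return total
    raise ValueError("no IRQ 0 line found")

def estimate_hz(interval=10.0, path="/proc/interrupts"):
    """Approximate HZ as timer interrupts per second over `interval` seconds."""
    with open(path) as f:
        before = timer_ticks(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = timer_ticks(f.read())
    return (after - before) / interval
```

Calling estimate_hz() on the host should give a figure near 100, 250 or 1000 if the assumption holds. The sleep is timed by the same kernel, so this shows the configured tick rate, not whether ticks are being lost.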

Thank you also for your suggestions to clean up my ntp.conf and
timeservers. I certainly do not need to be synced to tock.usno.navy.mil,
and I think that when I put the pool servers in, there was just one pool
and no regional ones. I hope I've cleaned it up sufficiently:
cn2:~# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+trust.httpdnet. 130.126.24.44    3 u   39   64   77   33.281  47914.3 447.913
+tock.jrc.us     207.168.62.76    2 u   28   64   77   17.224  47384.9 459.249
*lumberjack.ald. 192.77.171.2     2 u   36   64   77   32.721  47921.3 447.575
 LOCAL(0)        LOCAL(0)        13 l   34   64   77    0.000    0.000   0.002
cn2:~#

Thanks, again, for all your help and suggestions.

-Kevin

-----Original Message-----
From: questions-bounces+kzembowe=jhuccp.org at lists.ntp.isc.org
[mailto:questions-bounces+kzembowe=jhuccp.org at lists.ntp.isc.org] On
Behalf Of David Woolley
Sent: Monday, December 11, 2006 2:39 AM
To: questions at lists.ntp.isc.org
Subject: Re: [ntp:questions] Is this normal behavior?

In article
<2E8AE992B157C0409B18D0225D0B476304C57680 at XCH-VN01.sph.ad.jhsph.edu>,
kzembowe at jhuccp.org (Zembower, Kevin) wrote:

> Dec  8 09:03:06 cn2 ntpd[16955]: time reset +3.120367 s
> Dec  8 09:23:33 cn2 ntpd[16955]: time reset +3.503628 s

You have a serious problem with your machine running slow.  On Linux this
is often due to lost clock interrupts as a result of using a higher HZ
figure in the kernel than the disk driver can support.  It could also
mean a broken motherboard clock, the effects of power management, a wrong
value having been calculated for the CPU frequency, etc.  The fact that
you report high but intermediate offsets tends to rule out the possibility
that you have conflicting clock synchronisation software.
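
To put a rough number on "running slow", the two resets quoted above can be turned into a loss rate. This is only a sketch under two assumptions: that no other reset happened between the two log lines, and that the clock was correct immediately after the first step. The function name is made up for illustration.

```python
from datetime import datetime

def loss_rate_ppm(first_line, second_line):
    """Rate at which the clock fell behind between two 'time reset' log lines."""
    def parse(line):
        fields = line.split()
        # Syslog stamps carry no year; any fixed year works for a difference.
        stamp = datetime.strptime("2006 " + " ".join(fields[:3]),
                                  "%Y %b %d %H:%M:%S")
        step = float(fields[-2])  # signed step size in seconds
        return stamp, step

    t1, _ = parse(first_line)
    t2, step2 = parse(second_line)
    elapsed = (t2 - t1).total_seconds()
    # The second step is the error accumulated since the first step.
    return step2 / elapsed * 1e6

a = "Dec  8 09:03:06 cn2 ntpd[16955]: time reset +3.120367 s"
b = "Dec  8 09:23:33 cn2 ntpd[16955]: time reset +3.503628 s"
print("approximate loss rate: %.0f ppm" % loss_rate_ppm(a, b))
```

The interval is 1227 seconds, so the result comes out at a few thousand ppm. Since ntpd would also have been applying its own frequency correction during that time, the underlying error may be larger still.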

> *ntp1.usno.navy. .USNO.           1 u   60   64  177    8.567  827.174 551.616

Do you meet the rules of engagement conditions for using a stratum
one server (although this one tends to be overloaded and not particularly
good as a result)?  In any case, note that the offset has already reached
827ms.

> +trane.wu-wien.a 195.13.1.153     3 u   57   64  177  125.292  841.188 548.251
> +221-15-178-69.g 140.142.16.34    2 u   50   64  177  107.300  1212.00 395.490

These two servers are too far away to be useful, given that you can
achieve single figure delays to other servers.

> I notice the problem here, and if I run 'watch ntpq -p'. Seldom is my
> reachability 377, and it frequently and inexplicably drops to 1 as I'm

This is because the offset becomes unacceptably high, and a step
is initiated, before it gets to that point.  Whenever the clock is
stepped (which is never desirable, after the initial synchronisation)
the states of the servers are discarded and ntpd starts over (but with
updated frequency and offset estimates).
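
For reference, the reach column in ntpq -p is an 8-bit shift register printed in octal, with the most recent poll in the lowest bit. A value of 1 therefore means only one poll has been recorded since the state was discarded. A small sketch to decode it (the function name is just for illustration):

```python
def decode_reach(reach_octal):
    """Return the 8-poll history as a bit string and the number of replies.

    The rightmost bit is the most recent poll; 1 means a reply was received.
    """
    value = int(reach_octal, 8)
    bits = format(value, "08b")
    return bits, bits.count("1")

# Values that appear in this thread.
for reach in ("377", "177", "77", "1"):
    bits, good = decode_reach(reach)
    print("%3s -> %s (%d of the last 8 polls answered)" % (reach, bits, good))
```

Read this way, 177 and 77 are what a register looks like while it is still filling up after a restart, not evidence of dropped packets.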

> Is this normal behavior for NTP, to frequently lose the ability to reach
> a timeserver? If not, how can I troubleshoot it further?

What's probably happening here is that each server is rejected
in turn.  Server hopping does happen, but not like this.

> These time resets seem rather large to me. Is this normal, too?

This is the fundamental symptom.

> Are there any other diagnostics that I could run to help identify any
> problem?

Check if the rate of loss correlates with any form of system activity
(particularly IDE disks).

Disable any power management features.

Make sure that HZ=100 or rebuild the kernel to make it so.

Check the clock behaviour running MS-DOS or the oldest available Windows
(basically to avoid all device activity and use quite large ticks). If it
loses at more than 450ppm, get it working in that environment before
running the normal system. (Actually, you can correct pure frequency
errors of more than this, but a good machine should be within about
20ppm and the worst I've seen is about 300ppm, so this large an error
probably indicates a system that is too unreliable for the job.)

Check the frequency correction.  If it is not at the 500ppm end stop,
it may indicate that your time loss is intermittent.
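
One place to read the frequency correction is ntpd's drift file, which holds the value in ppm. The sketch below assumes the file's first field is that number; the path in the usage note is the usual Debian default and may differ on your system, and the function name and margin are illustrative.

```python
MAX_FREQ_PPM = 500.0  # ntpd's frequency correction limit, as mentioned above

def at_end_stop(drift_text, margin=1.0):
    """Return (frequency in ppm, True if it is pinned at the +/-500ppm limit)."""
    freq = float(drift_text.split()[0])
    return freq, abs(freq) >= MAX_FREQ_PPM - margin
```

For example, at_end_stop(open("/var/lib/ntp/ntp.drift").read()) would report the stored value. The file is only rewritten periodically, so it can lag behind what ntpd is currently applying.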

If you meet the conditions for using stratum one public servers, it would
probably be a good idea to dedicate a machine to being the site
stratum two server.  This can be relatively low specification (well,
actually very low) which means that it is much less likely to suffer from
the more technical causes of this sort of problem.

Read the recent thread that concluded that a power management related
parameter can sometimes avoid a problem.



