[ntp:questions] Is this normal behavior?

Per Hedeland per at hedeland.org
Sat Mar 3 10:26:55 UTC 2007


In article <45E8AAC2.2050905 at comcast.net> "Richard B. gilbert"
<rgilbert88 at comcast.net> writes:
>Zembower, Kevin wrote:
>> I notice the problem here, and if I run 'watch ntpq -p'. Seldom is my
>> reachability 377, and it frequently and inexplicably drops to 1 as I'm
>> watching it. Ten minutes ago reachability was 177 as shown above.
>> Watching it in the last minute, it dropped from 377 to 1 on all sources.
[snip]
>> 
>> Is this normal behavior for NTP, to frequently lose the ability to reach
>> a timeserver? If not, how can I troubleshoot it further?
>
>
>Hell no!  You seem to have a serious network problem of some sort.  Can 
>you ping these servers?  At a time when the problem is manifesting itself?

Uh, you're suggesting that his network is so bad that it takes back
packets that have already been received? Obviously this has nothing to
do with network connectivity; the explanation is below.

>> Dec  8 09:49:21 cn2 ntpd[16955]: time reset +4.512453 s
>> Dec  8 09:53:41 cn2 ntpd[16955]: synchronized to LOCAL(0), stratum 13
>> Dec  8 09:54:39 cn2 ntpd[16955]: synchronized to 128.220.2.7, stratum 2
>> Dec  8 09:54:44 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
>> Dec  8 10:01:11 cn2 ntpd[16955]: synchronized to 67.128.71.75, stratum 2
>> Dec  8 10:02:14 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
>> Dec  8 10:09:40 cn2 ntpd[16955]: synchronized to 67.128.71.75, stratum 2
>> Dec  8 10:10:51 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
>> Dec  8 10:10:51 cn2 ntpd[16955]: time reset +3.950081 s
>> Dec  8 10:15:08 cn2 ntpd[16955]: synchronized to LOCAL(0), stratum 13
>> Dec  8 10:16:14 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
>> 
>> These time resets seem rather large to me. Is this normal, too?
>
>That's not normal either.

Moreover, it's the reason for the 'reach' register dropping to 1 - every
time ntpd does a reset, it effectively starts over with its
calculations, intentionally resetting all previous state including
'reach'.
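
To make the 377 -> 1 transition concrete: 'reach' is an 8-bit shift
register that ntpq prints in octal, so 377 means the last eight polls
were all answered and 1 means only the most recent one has been counted.
The following is a simplified sketch of that bookkeeping, not ntpd's
actual code; the name update_reach is made up for illustration.

```python
def update_reach(reach: int, got_reply: bool) -> int:
    """Shift the 8-bit register left one poll and record the newest result in the low bit."""
    return ((reach << 1) | int(got_reply)) & 0xFF


# Eight answered polls in a row, starting from a cleared register.
reach = 0
history = []
for _ in range(8):
    reach = update_reach(reach, True)
    history.append(format(reach, "o"))  # octal, as ntpq displays it

# A time reset clears the register; the next answered poll then shows as 1.
after_reset = format(update_reach(0, True), "o")

print(" ".join(history))  # 1 3 7 17 37 77 177 377
print(after_reset)        # 1
```

So a sudden 1 on every source at once points at the register being
cleared, not at eight replies going missing.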

And of course the resets reveal the real problem here - your (Kevin's)
clock is drifting like crazy. The fact that the resets are large is one
problem indicator, but in normal operation there shouldn't be any resets
at all, and the thing to take special note of is the relation between
the size of the resets and the interval between them.

In this case, the last reset was about +4 seconds, 21.5 minutes (1290
seconds) after the previous one, which means that your clock is slow by
4/1290, or about 3100 ppm (parts per million). That is way beyond the
500 ppm limit within which ntpd can operate without resets. You may have
a hardware problem, but it's more likely that you're losing clock
interrupts.
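
For anyone who wants to redo that arithmetic from their own log, here is
a minimal sketch. The function name drift_ppm is made up for
illustration, and it assumes both "time reset" lines fall on the same
day and that the clock ran freely between them.

```python
from datetime import datetime


def drift_ppm(prev_reset: str, last_reset: str, step_seconds: float) -> float:
    """Estimate clock drift in ppm from two reset times (HH:MM:SS) and the last step size."""
    fmt = "%H:%M:%S"
    interval = (
        datetime.strptime(last_reset, fmt) - datetime.strptime(prev_reset, fmt)
    ).total_seconds()
    return step_seconds / interval * 1e6


# Values taken from the two "time reset" lines in the log quoted above.
ppm = drift_ppm("09:49:21", "10:10:51", 3.950081)
print(f"{ppm:.0f} ppm")
```

With the exact step of 3.950081 s instead of the rounded 4 s, this comes
out slightly above 3060 ppm, still about six times the 500 ppm limit.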

A reportedly common case of this is running Linux with a high "clock
rate" a.k.a. "hz", like 1000, which seems to be popular in recent
distributions. Dropping it down to a more traditional 100, or at least
250, is said to help. Another cause may be high disk I/O activity on an
OS where disk drivers lock out clock interrupts for long periods -
making sure the disks use DMA (if supported by driver and HW, of
course) can help in this case.

--Per Hedeland
per at hedeland.org
