[ntp:questions] ntpd wedged again

Dave Hart hart at ntp.org
Mon Feb 13 08:49:18 UTC 2012


On Mon, Feb 13, 2012 at 01:37, A C <agcarver+ntp at acarver.net> wrote:
> On 2/12/2012 16:38, David Lord wrote:
>>
>> A C wrote:
>>>
>>> I'm not exactly sure what happened but I may have caught the system in
>>> the act of trying to run away. Below is the last two lines from the
>>> peers log for one of the peers and below that is the output of ntpq
>>> showing what happened to that peer immediately after. Note the offset.
>>> This somewhat matches what I had been seeing previously where my
>>> system would be reporting sane numbers and then very suddenly go haywire.
>>>
>>> 55969 77832.471 67.23.181.241 9324 0.006359155 0.081262776 0.001634487 0.008837966
>>> 55969 77897.557 67.23.181.241 9324 399326.864663492 0.000122070 0.001346708 399326.861631493
>>>
>>>
>>>      remote           refid      st t when poll reach   delay   offset  jitter
>>> ==============================================================================
>>>  127.127.22.0    .PPS.            0 l    5   16  377    0.000    0.095   0.122
>>>  127.127.28.0    .GPSD.           4 l   36  128  377    0.000    6.399  10.748
>>> -67.223.228.71   18.26.4.105      2 u    4   64  377   95.580   -3.863  14.410
>>> +74.118.152.85   69.36.224.15     2 u   20   64  377   41.318    0.492   3.692
>>>  67.23.181.241   128.4.1.1        2 u   53   64  377    0.122  3993268  3993268
>>> +130.207.165.28  130.207.244.240  2 u    7   64  377   77.872    2.837   0.328
>>> *131.144.4.10    130.207.244.240  2 u    1   64  377   84.182    1.434  10.710
>>>
>>>
>>> After a couple minutes things went back to normal as if nothing had
>>> happened:
>>>
>>> 55969 78417.519 67.23.181.241 9024 0.005904355 0.081140840 0.005271221 0.000546629
>>> 55969 78545.470 67.23.181.241 9324 0.006413290 0.081305361 0.002089938 0.011478833
>>>
>>>
>>>      remote           refid      st t when poll reach   delay   offset  jitter
>>> ==============================================================================
>>>  127.127.22.0    .PPS.            0 l   15   16  377    0.000    0.092   0.122
>>>  127.127.28.0    .GPSD.           4 l   94  128  377    0.000   11.432   8.820
>>> -67.223.228.71   18.26.4.105      2 u   58   64  377   95.580   -3.863   8.613
>>> +74.118.152.85   69.36.224.15     2 u    9   64  377   41.034    0.488   3.691
>>> -67.23.181.241   128.4.1.1        2 u   44   64  377   81.141    5.904   0.547
>>> +130.207.165.28  130.207.244.240  2 u   62   64  377   78.036    3.062   0.628
>>> *131.144.4.10    130.207.244.240  2 u   59   64  377   84.182    1.434  10.717
>>>
>>>
>>> It's the first time that I've actually been able to see this happen
>>> while staring at the monitor.  I'm not certain, but I suspect that
>>> the last few times this happened, when the system did run away, the
>>> peer that experienced the strange behavior was also the selected
>>> system peer.  I can't explain the behavior, though, because
>>> everything else looks fine on my system, and the remote server also
>>> seems to be fine given that its offset was normal just before the
>>> update.
>>
>>
>> What remote server settings do you have?
>>
>> The distances from your sources also don't fit with
>> having a short poll interval unless you know for
>> certain the routes are symmetric and with constant
>> delay.
>>
>> Apart from that other sites have their own problems
>> so when any of my selected sources goes bad it is
>> usually replaced with a better source.
>>
>> Ntpd usually handles the bad sources fairly well.
>>
>
> I posted the entire configuration in my previous email.  Other than iburst I
> have no settings for the servers: no minpoll or maxpoll.  I'm not selecting
> the 64-second poll interval; ntpd is.

You can force the remote sources to poll less frequently using minpoll
on their server lines.  I make no promises that is a wise thing to do,
though.  I presume there's a good reason ntpd does not raise the
polling interval on peers when the system polling interval is held low
by the refclock.
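For reference (the hostname here is a placeholder, not one of your
servers), minpoll and maxpoll are given as powers of two in seconds on
the server line in ntp.conf:

```
# Hypothetical ntp.conf fragment.  minpoll/maxpoll are log2 seconds:
# minpoll 9 keeps the poll interval at or above 2^9 = 512 s, rather
# than letting it fall to the default floor of 2^6 = 64 s.
server 0.pool.ntp.org iburst minpoll 9 maxpoll 12
```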

> In this case ntpd successfully deselected the errant source and then
> accepted it again after the strange behavior was noted.  But I contend that
> this may have happened with the system peer, and ntpd may not have been so
> graceful about the sudden jump.

That makes sense.  I am looking forward to the day you're able to test
with the libc math fix, optimistically hoping it will resolve this
issue as well.

> PS:  I get a fuzz report once every two hours, is this what is supposed to
> be in the code or is it supposed to report fuzz when it is detected?

If everything were optimal, you'd only see mention of fuzz at startup,
if at all.  Fuzz isn't mentioned when the fuzz threshold equals the
precision reported at startup; it is mentioned only when the threshold
is lower.  That happens when the minimum time to read the system clock
is less than the observed minimum nonzero delta between successive
readings, in which case the fuzz reported is that minimum time to read
the clock.  I'm similarly hoping the libc patch will eliminate the
cryptic fuzz "reports", which are basically barking that the clock
appeared to run backwards.  Of course, once you have the libc patch you
will be at liberty to hop off the ntp-dev bleeding edge with the new
fuzz code and back onto the well-worn 4.2.6 ntp-stable path, but I hope
you'll wait a bit first to help me understand whether those barks are a
sign of buggy new fuzz code or a buggy libc.
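To make the comparison concrete, here is a rough sketch, in Python and
purely illustrative (it is not ntpd's code), of the two quantities
being compared: the smallest nonzero delta between successive clock
readings, and the minimum time needed to read the clock once.

```python
import time

def measure_clock_granularity(samples=100_000):
    """Sketch of the two quantities the fuzz logic compares:
    - min_delta: smallest nonzero difference between successive
      readings of the system clock
    - min_read: rough minimum time taken to read the clock once
    """
    readings = [time.time() for _ in range(samples)]
    deltas = [b - a for a, b in zip(readings, readings[1:]) if b > a]
    min_delta = min(deltas) if deltas else float("inf")

    min_read = float("inf")
    for _ in range(1000):
        t0 = time.perf_counter()
        time.time()                      # one clock read
        t1 = time.perf_counter()
        min_read = min(min_read, t1 - t0)

    # The analogue of the case described above: when min_read is less
    # than min_delta, the fuzz reported is the minimum time to read the
    # clock rather than the precision observed at startup.
    return min_delta, min_read
```

The point of keeping the fuzz below the smaller of the two bounds is
that successive fuzzed timestamps should never appear to step
backwards, which is the condition those log messages are barking about.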

Cheers,
Dave Hart

