[ntp:questions] ntpd wedged again
A C
agcarver+ntp at acarver.net
Mon Feb 13 21:49:45 UTC 2012
On 2/13/2012 00:49, Dave Hart wrote:
> You can force the remote sources to poll less frequently using minpoll
> on their server lines. I make no promises that is a wise thing to do,
> though. I presume there's a good reason ntpd does not raise the
> polling interval on peers when the system polling interval is held low
> by the refclock.
I'm not sure it's a good idea either, but I would really like to
understand why a refclock clamps the polling interval at such a low
value when nearly every bit of documentation says we should be kind to
NTP servers and make sure the polling period is allowed to reach 1024
seconds.
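For reference, the poll values given on server lines are exponents of two
in seconds, so the 1024 figure corresponds to a poll exponent of 10. A
quick sketch of the conversion (the function name is mine, and the three
exponents are only illustrative values, not my actual settings):

```python
def poll_seconds(exponent: int) -> int:
    """Convert an ntpd poll exponent (as used by minpoll/maxpoll) to seconds."""
    return 1 << exponent  # 2 ** exponent


# Illustrative exponents only: a short interval, and ntpd's documented
# default minpoll (6) and maxpoll (10).
for exp in (4, 6, 10):
    print(f"poll exponent {exp:2d} -> {poll_seconds(exp):4d} s")
```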
>> In this case ntpd successfully deselected the errant source and then
>> accepted it again after the strange behavior was noted. But I contend that
>> this may have happened with the system peer and ntpd may not have been so
>> graceful about the sudden jump.
>
> That makes sense. I am looking forward to the day you're able to test
> with the libc math fix, optimistically hoping it will resolve this
> issue as well.
The flaw in libc has actually been found and I'm in the (very slow)
process of downloading all the source code to apply a patch and fix it.
In the meantime I have now been running for over 24 hours without strange
behavior. I did get PPS working again by adding a 'prefer' to one of
the internet servers. My misunderstanding was that PPS would work even
without a 'prefer' as long as the system could vote on a system peer and
it was reasonably close (a few ms). Apparently that's not how it works.
This is without kernel discipline, though (flag 3 is zero), so I'm
waiting to turn that on. I'm going to give it a week in this configuration.
The stability is good enough that I can watch the PPM correction drift
by a few ppb due to temperature swings in the room. I get a shift of
about 2 ppb (the PPM number changes by 0.002) for each degree Fahrenheit
of room swing.
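To put that number in perspective, here is the arithmetic as a small
sketch. The 0.002 PPM per degree is the figure I quoted above; the names
are mine, and the per-day figure is only what a 2 ppb frequency error
would add up to if it were left uncorrected, which works out to roughly
172.8 microseconds per day:

```python
PPB_PER_PPM = 1000
SECONDS_PER_DAY = 86_400


def ppm_to_ppb(ppm: float) -> float:
    """Convert a frequency offset from parts per million to parts per billion."""
    return ppm * PPB_PER_PPM


def uncorrected_drift_us_per_day(ppb: float) -> float:
    """Time error in microseconds accumulated over one day at a constant ppb offset."""
    return ppb * 1e-9 * SECONDS_PER_DAY * 1e6


shift_ppm_per_deg_f = 0.002  # observed change in the PPM number per degree F
shift_ppb = ppm_to_ppb(shift_ppm_per_deg_f)
print(f"{shift_ppb:.1f} ppb per degree F")
print(f"{uncorrected_drift_us_per_day(shift_ppb):.1f} us/day if left uncorrected")
```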
>
>> PS: I get a fuzz report once every two hours, is this what is supposed to
>> be in the code or is it supposed to report fuzz when it is detected?
>
> If everything were optimal, you'd only see mention of fuzz at startup,
> if at all. It's not mentioned if the fuzz threshold is the precision
> reported at startup, only if it's lower, which would happen if the
> minimum time to read the system clock is less than the observed
> minimum nonzero delta between successive readings, in which case the
> fuzz reported is that minimum time to read the clock. I'm similarly
> hoping the libc patch will eliminate the cryptic fuzz "reports" which
> are basically barking that the clock appeared to run backwards. Of
> course, once you have the libc patch you will be at liberty to hop off
> the ntp-dev bleeding edge with the new fuzz code and back to the
> well-worn 4.2.6 ntp-stable path, but I hope you'll wait a bit first to
> help me understand if those barks are a sign of buggy new fuzz code or
> a buggy libc.
I'll let you know when I finally get libc fixed but it's pretty much in
a constant state of fuzz. I see a fuzz report about once per hour on
average.
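As I read Dave's description above, the rule for whether fuzz gets
mentioned at startup can be sketched like this. It is only my paraphrase
with hypothetical names, not the actual ntpd code, and the numbers in the
usage lines are made up:

```python
def fuzz_report(min_read_time: float, min_nonzero_delta: float):
    """Return (fuzz, mentioned) following the rule as described in this thread.

    min_read_time      -- minimum time taken to read the system clock (seconds)
    min_nonzero_delta  -- minimum nonzero difference between successive
                          readings, taken here as the reported precision
    """
    precision = min_nonzero_delta
    if min_read_time < precision:
        # Fuzz is lower than the precision, so it is mentioned, and the
        # value reported is the minimum time to read the clock.
        return min_read_time, True
    # Otherwise the fuzz threshold is just the precision and nothing
    # extra is mentioned.
    return precision, False


# Made-up values: reading the clock is faster than its smallest step...
print(fuzz_report(0.5e-6, 1.0e-6))
# ...and reading the clock is slower than its smallest step.
print(fuzz_report(2.0e-6, 1.0e-6))
```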