[ntp:questions] ntpd wedged again

A C agcarver+ntp at acarver.net
Mon Feb 13 21:49:45 UTC 2012


On 2/13/2012 00:49, Dave Hart wrote:
> You can force the remote sources to poll less frequently using minpoll
> on their server lines.  I make no promises that is a wise thing to do,
> though.  I presume there's a good reason ntpd does not raise the
> polling interval on peers when the system polling interval is held low
> by the refclock.

I'm not sure it's a good idea either, but I would really like to 
understand why a refclock clamps the polling interval at such a low 
value when nearly every bit of documentation says we should be kind to 
NTP servers and make sure the polling interval is allowed to reach 
1024 seconds.
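
For anyone following along: minpoll and maxpoll in ntp.conf are given as 
power-of-two exponents in seconds, so the 1024 figure corresponds to an 
exponent of 10.  A minimal sketch of the conversion (the function names 
are my own for illustration, not anything from ntpd):

```python
def poll_exponent(seconds: int) -> int:
    """Return the minpoll/maxpoll exponent for a poll interval in seconds.

    ntp.conf expresses poll intervals as log2(seconds), so only exact
    powers of two are representable.
    """
    if seconds <= 0 or seconds & (seconds - 1):
        raise ValueError("poll interval must be a positive power of two")
    return seconds.bit_length() - 1


def poll_seconds(exponent: int) -> int:
    """Return the poll interval in seconds for a given exponent."""
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    return 1 << exponent


if __name__ == "__main__":
    print(poll_exponent(1024))  # exponent to use for a 1024 s interval
    print(poll_seconds(6))      # interval in seconds for exponent 6
```

So "let the polling period reach 1024" means letting the exponent climb 
to 10, and an exponent of 6 is a 64-second interval.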


>> In this case ntpd successfully deselected the errant source and then
>> accepted it again after the strange behavior was noted.  But I contend that
>> this may have happened with the system peer and ntpd may not have been so
>> graceful about the sudden jump.
>
> That makes sense.  I am looking forward to the day you're able to test
> with the libc math fix, optimistically hoping it will resolve this
> issue as well.

The flaw in libc has actually been found and I'm in the (very slow) 
process of downloading all the source code to apply a patch and fix it.

In the meantime I have been running for over 24 hours without any strange 
behavior.  I did get PPS working again by adding a 'prefer' to one of 
the internet servers.  My misunderstanding was that PPS would work even 
without a 'prefer' as long as the system could vote on a system peer and 
that peer was reasonably close (a few ms).  Apparently that's not how it 
works.  This is without kernel discipline, though (flag3 is zero), so I'm 
waiting to turn that on.  I'm going to give it a week in this configuration.

The stability is good enough that I can watch the PPM correction drift 
by a few ppb due to temperature swings in the room.  I get a shift of 
about 2 ppb (the PPM number changes by 0.002) for each degree Fahrenheit 
of change in room temperature.
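
To put that number in perspective, here is a rough back-of-the-envelope 
sketch of the unit conversion.  The helper names are made up for 
illustration, and the per-day figure assumes the offset were left 
completely uncorrected, which of course ntpd does not do:

```python
SECONDS_PER_DAY = 86_400


def ppm_to_ppb(ppm: float) -> float:
    """Convert a frequency offset from parts per million to parts per billion."""
    return ppm * 1000.0


def drift_us_per_day(ppb: float) -> float:
    """Time error in microseconds accumulated over one day at a constant
    frequency offset of `ppb` parts per billion, if left uncorrected."""
    seconds_of_error = ppb * 1e-9 * SECONDS_PER_DAY
    return seconds_of_error * 1e6


if __name__ == "__main__":
    shift_ppb = ppm_to_ppb(0.002)  # the 0.002 change seen in the PPM number
    print(f"{shift_ppb:.1f} ppb per degree F")
    print(f"{drift_us_per_day(shift_ppb):.1f} us/day if uncorrected")
```

A 2 ppb shift works out to roughly 173 microseconds per day of 
accumulated error if nothing were correcting it.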

>
>> PS:  I get a fuzz report once every two hours, is this what is supposed to
>> be in the code or is it supposed to report fuzz when it is detected?
>
> If everything were optimal, you'd only see mention of fuzz at startup,
> if at all.  It's not mentioned if the fuzz threshold is the precision
> reported at startup, only if it's lower, which would happen if the
> minimum time to read the system clock is less than the observed
> minimum nonzero delta between successive readings, in which case the
> fuzz reported is that minimum time to read the clock.  I'm similarly
> hoping the libc patch will eliminate the cryptic fuzz "reports" which
> are basically barking that the clock appeared to run backwards.  Of
> course, once you have the libc patch you will be at liberty to hop off
> the ntp-dev bleeding edge with the new fuzz code and back to the
> well-worn 4.2.6 ntp-stable path, but I hope you'll wait a bit first to
> help me understand if those barks are a sign of buggy new fuzz code or
> a buggy libc.

I'll let you know when I finally get libc fixed, but it's pretty much in 
a constant state of fuzz.  I see a fuzz report about once per hour on 
average.


