[ntp:questions] ntpd wedged again

David Lord snews at lordynet.org
Mon Feb 13 23:44:23 UTC 2012


A C wrote:
> On 2/13/2012 01:53, David Lord wrote:
>> A C wrote:
>>> On 2/12/2012 16:38, David Lord wrote:
>>>> A C wrote:
>>>>> I'm not exactly sure what happened but I may have caught the system in
>>>>> the act of trying to run away. Below is the last two lines from the
>>>>> peers log for one of the peers and below that is the output of ntpq
>>>>> showing what happened to that peer immediately after. Note the offset.
>>>>> This somewhat matches what I had been seeing previously where my
>>>>> system would be reporting sane numbers and then very suddenly go
>>>>> haywire.
>>>>>
>>>>> 55969 77832.471 67.23.181.241 9324 0.006359155 0.081262776 0.001634487
>>>>> 0.008837966
>>>>> 55969 77897.557 67.23.181.241 9324 399326.864663492 0.000122070
>>>>> 0.001346708 399326.861631493
>>>>>
>>>>>
>>>>> remote refid st t when poll reach delay offset jitter
>>>>>
>>>>> ============================================================================== 
>>>>>
>>>>>
>>>>>
>>>>> 127.127.22.0 .PPS. 0 l 5 16 377 0.000 0.095 0.122
>>>>> 127.127.28.0 .GPSD. 4 l 36 128 377 0.000 6.399 10.748
>>>>> -67.223.228.71 18.26.4.105 2 u 4 64 377 95.580 -3.863 14.410
>>>>> +74.118.152.85 69.36.224.15 2 u 20 64 377 41.318 0.492 3.692
>>>>> 67.23.181.241 128.4.1.1 2 u 53 64 377 0.122 3993268 3993268
>>>>> +130.207.165.28 130.207.244.240 2 u 7 64 377 77.872 2.837 0.328
>>>>> *131.144.4.10 130.207.244.240 2 u 1 64 377 84.182 1.434 10.710
>>>>>
>>>>>
>>>>> After a couple minutes things went back to normal as if nothing had
>>>>> happened:
>>>>>
>>>>> 55969 78417.519 67.23.181.241 9024 0.005904355 0.081140840 0.005271221
>>>>> 0.000546629
>>>>> 55969 78545.470 67.23.181.241 9324 0.006413290 0.081305361 0.002089938
>>>>> 0.011478833
>>>>>
>>>>>
>>>>> remote refid st t when poll reach delay offset jitter
>>>>> ============================================================================== 
>>>>>
>>>>>
>>>>>
>>>>> 127.127.22.0 .PPS. 0 l 15 16 377 0.000 0.092 0.122
>>>>> 127.127.28.0 .GPSD. 4 l 94 128 377 0.000 11.432 8.820
>>>>> -67.223.228.71 18.26.4.105 2 u 58 64 377 95.580 -3.863 8.613
>>>>> +74.118.152.85 69.36.224.15 2 u 9 64 377 41.034 0.488 3.691
>>>>> -67.23.181.241 128.4.1.1 2 u 44 64 377 81.141 5.904 0.547
>>>>> +130.207.165.28 130.207.244.240 2 u 62 64 377 78.036 3.062 0.628
>>>>> *131.144.4.10 130.207.244.240 2 u 59 64 377 84.182 1.434 10.717
>>>>>
>>>>>
>>>>> It's the fist time that I've actually been able to see this happen
>>>>> while staring at the monitor. I'm not certain but I suspect that the
>>>>> last few times this happened where the system did run away the peer
>>>>> that experienced the strange behavior happened to also be the selected
>>>>> system peer. I can't explain the behavior though because everything
>>>>> looks fine on my system otherwise and the remote server also seems to
>>>>> be fine given that its offset was normal just before the update.
>>>>
>>>> What remote server settings do you have?
>>>>
>>>> The distances from your sources also don't fit with
>>>> having a short poll interval unless you know for
>>>> certain the routes are symmetric and with constant
>>>> delay.
>>>>
>>>> Apart from that other sites have their own problems
>>>> so when any of my selected sources goes bad it is
>>>> usually replaced with a better source.
>>>>
>>>> Ntpd usually handles the bad sources fairly well.
>>>>
>>>
>>> I posted the entire configuration in my previous email. Other than
>>> iburst I have no settings for the servers. No min or max poll. I'm not
>>> selecting the 64 second poll interval, ntpd is.
>>
>> That's normal for ntpd starting up and if there is no good
>> drift file it can take many hours to establish. As the drift
>> value is adjusted the poll intervals normally increase and
>> finer adjustments are made. With a stable temperature I see
>> poll intervals of 1024.
> 
> 
> It has been over 24 hours now of ntpd running (hooray, new record) but 
> the poll times are still at 64.  The refclock seems to be clamping it 
> but I don't know why.  The PPM adjustment has been fairly stable for the 
> last 12 hours.  It hovers around -78.40 and drifts up and down by a few 
> pp-billion as the room temperature changes.
> 

Recent ntpd is supposed to handle that level of frequency
offset but most of my pcs have had the frequency offset
adjusted to be < 10 ppm which is done when I build a kernel
with "options PPS_SYNC" and "options TIMER_FREQ=119????".


David


>>>
>>> In this case ntpd successfully deselected the errant source and then
>>> accepted it again after the strange behavior was noted. But I contend
>>> that this may have happened with the system peer and ntpd may nto have
>>> been so graceful about the sudden jump.
>>
>> I use 'tos minsane 3' in my ntp.conf and have occasionally
>> seen my stratum jump to 16 when there has been a problem
>> with my internet connection.
> 
> A stratum change I can see if something has gone wrong but this is not 
> just a stratum change but a complete skewing of the clock by many 
> seconds.  Initially it appears the skew is just an error internal to 
> ntpd where it only *thinks* the clock is off by many seconds.  However, 
> my suspicion is that if this error happens to fall upon the clock that 
> has been christened the system peer then it causes havoc and really 
> *does* skew the system clock to try and compensate for a perceived 
> drift.  Unfortunately that sends the algorithm off the rails and the 
> entire clock ends up slammed in one direction ticking way off as is 
> evidenced by restarting ntpd shortly after the fatal flaw and observing 
> it forcing a clock step of many seconds.
> 
>>
>>>
>>> PS: I get a fuzz report once every two hours, is this what is supposed
>>> to be in the code or is it supposed to report fuzz when it is detected?
>>
>> Sorry I've never had a fuzz report. If you are running
>> ntp-dev it may have code that gives extra information.
>>
> 
> Yeah, Dave hart already described the fuzz code to me.  I was more 
> curious if it was a cyclic thing or if it was reported on an as detected 
> basis.



More information about the questions mailing list