[ntp:questions] Popcorn on prefer takes out system clock

A C agcarver+ntp at acarver.net
Wed Mar 7 00:15:21 UTC 2012


On 3/6/2012 15:46, Dave Hart wrote:
> On Tue, Mar 6, 2012 at 19:01, A C<agcarver+ntp at acarver.net>  wrote:
>> Last night I had a large popcorn event coming from a server marked "prefer"
>> (for the benefit of the ATOM refclock) upset my system clock (ntpd ver
>> 4.2.7p259).  It knocked the clock out by about two seconds by slewing the
>> clock correction very quickly (almost a step response, actually) from which
>> it never recovered.  I had to restart ntpd to allow a clock step to bring
>> the clock back in line.
>>
>> I switched away from that server to another but happened to spot this in the
>> log today from the remote server that used to be the prefer:
>>
>> 130.207.165.28 944d 8d popcorn 2147483647.997598 s
>
> I assume you mean in the local ntpd log on your sparc box referring to
> the IP address that used to be marked prefer.  I realize not everyone
> is a programmer, but the number above jumped out at me as suspiciously
> round in hex and binary.  Sure enough, round it to whole seconds and
> you have 2^31 or 0x80000000.  Incredible coincidence?
>
> Also note when ntpd detects and logs a popcorn spike, it also ignores
> it.  Further, if ntpd subsequently did not ignore a similar apparent
> offset, it should have exceeded the panic threshold and terminated
> ntpd (default 1000s) or if the panic threshold were disabled it should
> have exceeded the step threshold (default 0.128s) and resulted in a
> 68-year clock step.  In no case should it have triggered slewing.

It did look a lot like a very large binary number, but I'm not dismissing
anything at the moment.  In any case, whether or not it should have
triggered slewing, the reality is that it did cause a slew.  The
frequency offset was stable at about -77.8 PPM when the event occurred,
and within two loop cycles (16 seconds) it was at -36 PPM.  The loop
couldn't recover from that; the clock was simply out of control.
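
For reference, the arithmetic behind Dave's observation can be checked
with a short Python sketch.  It only restates numbers from this thread:
the logged offset and the default thresholds as Dave gives them (1000 s
panic, 0.128 s step).  The function names are made up for illustration
and are not ntpd's; the "slew" branch for small offsets is my assumption
about normal behaviour, not something read from the ntpd source.

```python
# Offset copied from the popcorn log line quoted above.
LOGGED_OFFSET_S = 2147483647.997598

# Defaults as stated in Dave's reply; not read from ntpd itself.
PANIC_THRESHOLD_S = 1000.0
STEP_THRESHOLD_S = 0.128

SECONDS_PER_YEAR = 365.25 * 86400


def describe_offset(offset_s):
    """Return (nearest whole second in hex, distance from 2**31 in s, span in years)."""
    nearest = round(offset_s)
    return hex(nearest), offset_s - 2**31, offset_s / SECONDS_PER_YEAR


def expected_action(offset_s, panic_enabled=True):
    """Hypothetical helper: the outcome Dave describes for an offset that is
    NOT discarded as a popcorn spike."""
    magnitude = abs(offset_s)
    if panic_enabled and magnitude > PANIC_THRESHOLD_S:
        return "exit"   # ntpd terminates
    if magnitude > STEP_THRESHOLD_S:
        return "step"   # clock is stepped, not slewed
    return "slew"       # assumption: small offsets are amortized gradually


if __name__ == "__main__":
    print(describe_offset(LOGGED_OFFSET_S))
    print(expected_action(LOGGED_OFFSET_S))
    print(expected_action(LOGGED_OFFSET_S, panic_enabled=False))
```

The logged value sits a few milliseconds below 2^31 seconds, which is
roughly 68 years, and by the rules Dave lists it lands in the "exit" or
"step" branch, never in "slew".  That is what makes the observed
frequency excursion so odd.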

>
>> The system was quite stable up until the moment this popcorn appeared and
>> then it fell apart quickly.  It would appear that the control loop does not
>> gracefully handle a very gross error from the system peer.  I would have
>> expected something like this to be ignored and the server rejected in favor
>> of the other configured servers but it doesn't seem to be the case.
>
> http://www.eecis.udel.edu/~mills/ntp/html/prefer.html
>
> Prefer is a big hammer.  It is not solely a means to label the seconds
> for the ATOM/PPS driver.  Expecting a prefer peer to be ignored in
> favor of non-prefer peers is demonstrating you don't understand what
> prefer does.  If the cluster algorithm decides to remove a peer and
> that peer is marked prefer, the cluster algorithm is terminated and
> the peer remains.  If there is a prefer peer standing at the start of
> the combine algorithm, the combine algorithm is not used and the
> prefer peer becomes the tentative system peer, typically then replaced
> by the PPS if the clock offset is less than 0.4s.
>
> I know you feel forced to use the separate PPS driver and hence
> prefer, but I wonder if that's really so.  The NMEA driver does
> support using different underlying devices for serial vs. PPS.  For
> 127.127.20.0, the serial device is opened via /dev/gps0, while the PPS
> is first attempted on /dev/gpspps0, falling back on /dev/gps0 if
> gpspps0 is unavailable.  You have the PPS on a separate serial port
> because your serial drivers don't support multiple opens by a single
> app.  That suggests to me you could configure the symlinks gps0 and
> gpspps0 to point to those two serial ports and happily use the
> integrated PPS support of the NMEA driver, removing the need to mark a
> prefer peer.

I will try NMEA and see whether I can go without a preferred peer and
still have PPS.  However, that would eventually rule out my use of SiRF
binary, so it's not exactly a viable long-term solution.  I may just put
prefer on the SHM clock, which doesn't seem to experience popcorn.  Its
offset does bounce around, but by no more than about 50 ms, so it might
be sufficient.  The last time I tried NMEA it never detected the PPS,
but I never tried pointing the two devices at the two different ports.
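
As a note on the device lookup Dave describes for 127.127.20.0, here is
a minimal Python sketch of the fallback order only.  The function name
and the existence check are hypothetical; the real driver opens the
devices rather than testing for them, and the actual serial port names
the symlinks would point to are not given in this thread.

```python
import os


def pick_nmea_devices(unit=0, exists=os.path.exists):
    """Return (serial_device, pps_device) for an NMEA refclock unit.

    Mirrors the order described in the quoted reply: serial data comes
    from /dev/gps<unit>; PPS is tried on /dev/gpspps<unit> first and
    falls back to /dev/gps<unit> if that is unavailable.  The 'exists'
    argument is a stand-in for the driver's real open attempt.
    """
    serial = f"/dev/gps{unit}"
    pps_candidate = f"/dev/gpspps{unit}"
    pps = pps_candidate if exists(pps_candidate) else serial
    return serial, pps


if __name__ == "__main__":
    # Unit 0 corresponds to the last octet of 127.127.20.0.
    print(pick_nmea_devices(0))
```

With both symlinks present the two roles resolve to different devices,
which is the arrangement suggested for a setup where PPS arrives on a
separate serial port.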

