[ntp:questions] Popcorn on prefer takes out system clock
hart at ntp.org
Tue Mar 6 23:46:36 UTC 2012
On Tue, Mar 6, 2012 at 19:01, A C <agcarver+ntp at acarver.net> wrote:
> Last night I had a large popcorn event coming from a server marked "prefer"
> (for the benefit of the ATOM refclock) upset my system clock (ntpd ver
> 4.2.7p259). It knocked the clock out by about two seconds by slewing the
> clock correction very quickly (almost a step response, actually) from which
> it never recovered. I had to restart ntpd to allow a clock step to bring
> the clock back in line.
> I switched away from that server to another but happened to spot this in the
> log today from the remote server that used to be the prefer:
> 220.127.116.11 944d 8d popcorn 2147483647.997598 s
I assume you mean in the local ntpd log on your sparc box referring to
the IP address that used to be marked prefer. I realize not everyone
is a programmer, but the number above jumped out at me as suspiciously
round in hex and binary. Sure enough, round it to whole seconds and
you have 2^31 or 0x80000000. Incredible coincidence?
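For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The only input is the offset copied from the log line quoted above; nothing here comes from ntpd itself.

```python
# Offset in seconds, copied from the popcorn log line quoted above.
offset = 2147483647.997598

# Round to whole seconds and compare against 2^31.
nearest = round(offset)
print(nearest == 2**31)   # True
print(hex(nearest))       # 0x80000000

# Express the same interval in Julian years (365.25 days of 86400 s).
years = nearest / (365.25 * 86400)
print(years)              # a little over 68
```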
Also note that when ntpd detects and logs a popcorn spike, it also
ignores that sample. Further, if ntpd subsequently did not ignore a
similar apparent offset, it should have exceeded the panic threshold
(default 1000s) and terminated ntpd. If the panic threshold were
disabled, it should have exceeded the step threshold (default 0.128s)
and resulted in a 68-year clock step. In no case should it have
triggered slewing.
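The reasoning above can be written down as a rough decision sketch. This is not ntpd's code: expected_action is a hypothetical helper, and it deliberately ignores the clock state machine and any timers. The two constants are the defaults quoted above.

```python
PANIC_THRESHOLD = 1000.0  # seconds, default quoted above
STEP_THRESHOLD = 0.128    # seconds, default quoted above

def expected_action(offset, panic_enabled=True):
    """Hypothetical helper: the outcome expected for an offset that
    was NOT already discarded as a popcorn spike."""
    magnitude = abs(offset)
    if panic_enabled and magnitude > PANIC_THRESHOLD:
        return "exit"   # ntpd terminates
    if magnitude > STEP_THRESHOLD:
        return "step"   # clock is stepped
    return "slew"       # only small offsets are slewed

print(expected_action(2147483647.997598))                       # exit
print(expected_action(2147483647.997598, panic_enabled=False))  # step
print(expected_action(0.005))                                   # slew
```

An offset near 2^31 seconds lands in the first or second branch, never the third, which is why the slewing reported above is puzzling.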
> The system was quite stable up until the moment this popcorn appeared and
> then it fell apart quickly. It would appear that the control loop does not
> gracefully handle a very gross error from the system peer. I would have
> expected something like this to be ignored and the server rejected in favor
> of the other configured servers but it doesn't seem to be the case.
Prefer is a big hammer. It is not solely a means to label the seconds
for the ATOM/PPS driver. Expecting a prefer peer to be ignored in
favor of non-prefer peers is demonstrating you don't understand what
prefer does. If the cluster algorithm decides to remove a peer and
that peer is marked prefer, the cluster algorithm is terminated and
the peer remains. If there is a prefer peer standing at the start of
the combine algorithm, the combine algorithm is not used and the
prefer peer becomes the tentative system peer, typically then replaced
by the PPS if the clock offset is less than 0.4s.
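To make the effect of prefer concrete, here is a toy model of the two behaviors just described. It is a sketch under loud assumptions, not ntpd's implementation: the Peer class, the single "jitter" badness score, the 1/jitter weighting, the survivor count of 3 and all the numbers are invented for illustration, and the PPS replacement step is left out.

```python
from dataclasses import dataclass

@dataclass
class Peer:
    name: str
    offset: float         # seconds
    jitter: float         # toy badness score, larger is worse
    prefer: bool = False

def cluster(peers, min_survivors=3):
    """Toy pruning loop: drop the worst peer until min_survivors
    remain, but stop as soon as the worst peer is the prefer peer."""
    survivors = list(peers)
    while len(survivors) > min_survivors:
        worst = max(survivors, key=lambda p: p.jitter)
        if worst.prefer:
            break         # pruning ends, the prefer peer remains
        survivors.remove(worst)
    return survivors

def tentative_offset(survivors):
    """A prefer peer bypasses combining; otherwise return an
    average weighted by 1/jitter (an assumed weighting)."""
    for p in survivors:
        if p.prefer:
            return p.offset
    total = sum(1.0 / p.jitter for p in survivors)
    return sum(p.offset / p.jitter for p in survivors) / total

def make_peers(prefer_bad):
    # Three good peers plus one that is off by two seconds.
    return [
        Peer("a", 0.001, 0.002),
        Peer("b", -0.002, 0.003),
        Peer("c", 0.000, 0.001),
        Peer("bad", 2.0, 5.0, prefer=prefer_bad),
    ]

# Without prefer, the bad peer is pruned and the result stays small.
print(tentative_offset(cluster(make_peers(False))))
# With prefer, the bad peer survives and dictates the offset.
print(tentative_offset(cluster(make_peers(True))))
```

In this toy model the same bad peer is harmless when unmarked and decisive when marked prefer, which is the "big hammer" in miniature.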
I know you feel forced to use the separate PPS driver and hence
prefer, but I wonder if that's really so. The NMEA driver does
support using different underlying devices for serial vs. PPS. For
127.127.20.0, the serial device is opened via /dev/gps0, while the PPS
is first attempted on /dev/gpspps0, falling back on /dev/gps0 if
gpspps0 is unavailable. You have the PPS on a separate serial port
because your serial drivers don't support multiple opens by a single
app. That suggests to me you could configure the symlinks gps0 and
gpspps0 to point to those two serial ports and happily use the
integrated PPS support of the NMEA driver, removing the need to mark a