[ntp:questions] Clock jumps when refclock used

Dave Hart hart at ntp.org
Sun Apr 15 06:33:02 UTC 2012

On Sat, Apr 14, 2012 at 07:43, A C <agcarver+ntp at acarver.net> wrote:
> I did notice this in the current Internet only configuration, one Internet
> clock went crazy just now:
> peerstats:
> 56031 27012.604  9374  99542.105309349  0.000122070
> 0.015549461  99542.097097054
> ntpq line:
> -  2 u  190 1024  377    0.122  9954210
> 9954209

I think it's safe to say we know a response did not arrive 27+ hours
after a poll, particularly as NTP ignores responses which don't
contain a the most recent poll's timestamp.

I regret beating this drum again, but I suspect your floating-point
hardware and/or software is at fault.  The SPARC architecture has
partial hardware floating-point support, the rest is done via software
provided by the system (OS and/or compiler).  It would be interesting
to enable rawstats and find the corresponding rawstats entry when an
insane peerstats offset is observed for a network source, in hopes
that sheds more light on where the calculation runs off the rails.

> There was a fuzz shortly afterwards (the above was at 07:30:12) :
> 14 Apr 07:30:48 : ts_min 1334388647.362400444 ts_prev 1334388647.362295452
> ts 1334388648.362303446
> 14 Apr 07:30:48 : sys_fuzz 104992 nsec, this fuzz 0.000053869, prior
> 0.000074835

The system clock was observed at startup to take 105 usec to read,
which is not coincidentally the difference ts_min - ts_prev.  The time
to read is the minimum nonzero difference between 12 successive clock
readings.  As ts > ts_min by 999 msec, the underlying OS clock is
behaving correctly.  ntpd's fuzz for this clock reading is randomly
chosen from 0-105 usec and is 54 usec.  The fuzz was 75 usec for the
prior clock reading.

> 14 Apr 07:30:48 : get_systime prev result 0xd333a628.5d03feac is 0.000984891
> later than 0xd333a628.5cc372f1

And this is why the fuzz message was logged -- somehow the sum of
ts_prev and previous fuzz converted to NTP's timescale appears to be
nearly a millisecond later than the sum of ts and current fuzz.  In
short, the output of get_systime() failed to increase over time as it
should (except once at NTP era rollover every 136 years).

Notice the seconds part of ts is one greater than ts_prev.  Yet the
seconds part of the previous result is the same as the seconds part of
the latest result.  Please provide more examples so I can see if this
is a pattern.

I'm suspicious first of the conversion by ntpd between Unix and NTP
timescales, and secondarily of the floating point hardware+software.

> Note the delay, it seems rather curious that the delay is exactly the
> computed jitter of my system.  That's the same value used in the jitter
> field of my PPS source:
> peerstats:
> 56031 27118.381    9024  0.004945328  0.000000000 0.000353156
>  0.000122070
> ntpq:
>    .PPS.            0 l    9   16  377    0.000    5.019 0.122
> Coincidence or did something accidentally jumble some memory addresses?

Your system took 105 usec to read the clock.using get_systime() before
fuzzing was enabled.  ntpd converts this to the nearest larger power
of two to determine the precision, which yields 2**-13 seconds or
0.000122.  ntpd knows it cannot believe differences less than 122 usec
between readings of your clock and so forces the delay and jitter
values to be no less than that value.  This is entirely by design and
can be seen in a more exaggerated form running ntpd on Windows Vista/7
without interpolation with a 1 msec system clock -- the delays and
jitter are 0.977 msec or higher.  If ntpd used clock differences less
than 105 usec, it would be giving unwarranted credence to the
normally-distributed random noise referred to as fuzz whjch is
intentionally added to local clock readings.

Dave Hart

More information about the questions mailing list