[ntp:questions] Clock jumps when refclock used

A C agcarver+ntp at acarver.net
Sun Apr 15 21:16:33 UTC 2012


On 4/14/2012 23:33, Dave Hart wrote:
> On Sat, Apr 14, 2012 at 07:43, A C<agcarver+ntp at acarver.net>  wrote:
>> I did notice this in the current Internet only configuration, one Internet
>> clock went crazy just now:
>>
>> peerstats:
>> 56031 27012.604 130.207.165.28  9374  99542.105309349  0.000122070
>> 0.015549461  99542.097097054
>>
>> ntpq line:
>> -130.207.165.28  130.207.244.240  2 u  190 1024  377    0.122  9954210
>> 9954209
>
> I think it's safe to say we know a response did not arrive 27+ hours
> after a poll, particularly as NTP ignores responses which don't
> contain a the most recent poll's timestamp.
>
> I regret beating this drum again, but I suspect your floating-point
> hardware and/or software is at fault.  The SPARC architecture has
> partial hardware floating-point support, the rest is done via software
> provided by the system (OS and/or compiler).  It would be interesting
> to enable rawstats and find the corresponding rawstats entry when an
> insane peerstats offset is observed for a network source, in hopes
> that sheds more light on where the calculation runs off the rails.

I've gone ahead and enabled rawstats.  I may need to adjust some 
settings depending on how much data gets written.  As for the FP 
support, that's still over in the hands of the NetBSD crew as they try 
to find some actual hardware instead of emulators.
>
>> There was a fuzz shortly afterwards (the above was at 07:30:12) :
>> 14 Apr 07:30:48 : ts_min 1334388647.362400444 ts_prev 1334388647.362295452
>> ts 1334388648.362303446
>> 14 Apr 07:30:48 : sys_fuzz 104992 nsec, this fuzz 0.000053869, prior
>> 0.000074835
>
> The system clock was observed at startup to take 105 usec to read,
> which is not coincidentally the difference ts_min - ts_prev.  The time
> to read is the minimum nonzero difference between 12 successive clock
> readings.  As ts>  ts_min by 999 msec, the underlying OS clock is
> behaving correctly.  ntpd's fuzz for this clock reading is randomly
> chosen from 0-105 usec and is 54 usec.  The fuzz was 75 usec for the
> prior clock reading.
>
>> 14 Apr 07:30:48 : get_systime prev result 0xd333a628.5d03feac is 0.000984891
>> later than 0xd333a628.5cc372f1
>
> And this is why the fuzz message was logged -- somehow the sum of
> ts_prev and previous fuzz converted to NTP's timescale appears to be
> nearly a millisecond later than the sum of ts and current fuzz.  In
> short, the output of get_systime() failed to increase over time as it
> should (except once at NTP era rollover every 136 years).
>
> Notice the seconds part of ts is one greater than ts_prev.  Yet the
> seconds part of the previous result is the same as the seconds part of
> the latest result.  Please provide more examples so I can see if this
> is a pattern.
>
> I'm suspicious first of the conversion by ntpd between Unix and NTP
> timescales, and secondarily of the floating point hardware+software.

I have a couple more recent examples of fuzz messages (still using the 
Internet only configuration):

14 Apr 08:59:56 ntpd[11057]: ts_min 1334393996.509276921 ts_prev 
1334393996.509171929 ts 1334393996.518274219
14 Apr 08:59:56 ntpd[11057]: sys_fuzz 104992 nsec, this fuzz 
0.000076449, prior 0.000007028
14 Apr 08:59:56 ntpd[11057]: get_systime prev result 0xd333bb0c.84e97365 
is 0.000836508 later than 0xd333bb0c.84b2a11e

14 Apr 22:01:03 ntpd[11057]: ts_min 1334440863.347747498 ts_prev 
1334440863.347642506 ts 1334440863.355634890
14 Apr 22:01:03 ntpd[11057]: sys_fuzz 104992 nsec, this fuzz 
0.000102068, prior 0.000103523
14 Apr 22:01:03 ntpd[11057]: get_systime prev result 0xd334721f.5b4e8d0b 
is 0.000930384 later than 0xd334721f.5b1193c9

15 Apr 02:37:23 ntpd[11057]: ts_min 1334457443.842685740 ts_prev 
1334457443.842580748 ts 1334457443.849148243
15 Apr 02:37:23 ntpd[11057]: sys_fuzz 104992 nsec, this fuzz 
0.000000000, prior 0.000089013
15 Apr 02:37:23 ntpd[11057]: get_systime prev result 0xd334b2e3.d9a0e30a 
is 0.000962946 later than 0xd334b2e3.d961c77d

15 Apr 07:14:29 ntpd[11057]: ts_min 1334474069.426351662 ts_prev 
1334474069.426246670 ts 1334474069.429352428 
                                                  x12
15 Apr 07:14:29 ntpd[11057]: sys_fuzz 104992 nsec, this fuzz 
0.000017473, prior 0.000099510
15 Apr 07:14:29 ntpd[11057]: get_systime prev result 0xd334f3d5.6e2babc7 
is 0.000983966 later than 0xd334f3d5.6deb2f91



>
>> Note the delay, it seems rather curious that the delay is exactly the
>> computed jitter of my system.  That's the same value used in the jitter
>> field of my PPS source:
>>
>> peerstats:
>> 56031 27118.381 127.127.22.0    9024  0.004945328  0.000000000 0.000353156
>>   0.000122070
>> ntpq:
>> 127.127.22.0    .PPS.            0 l    9   16  377    0.000    5.019 0.122
>>
>> Coincidence or did something accidentally jumble some memory addresses?
>
> Your system took 105 usec to read the clock.using get_systime() before
> fuzzing was enabled.  ntpd converts this to the nearest larger power
> of two to determine the precision, which yields 2**-13 seconds or
> 0.000122.  ntpd knows it cannot believe differences less than 122 usec
> between readings of your clock and so forces the delay and jitter
> values to be no less than that value.  This is entirely by design and
> can be seen in a more exaggerated form running ntpd on Windows Vista/7
> without interpolation with a 1 msec system clock -- the delays and
> jitter are 0.977 msec or higher.  If ntpd used clock differences less
> than 105 usec, it would be giving unwarranted credence to the
> normally-distributed random noise referred to as fuzz whjch is
> intentionally added to local clock readings.

That seems reasonable.  I had to ask just in case there was something 
else like an errant memory map that may have overwritten something.  At 
least it's one less thing to worry about.


More information about the questions mailing list