[ntp:questions] Clock jumps when refclock used

A C agcarver+ntp at acarver.net
Mon Apr 16 03:08:00 UTC 2012


On 4/15/2012 14:16, A C wrote:
> On 4/14/2012 23:33, Dave Hart wrote:
>> On Sat, Apr 14, 2012 at 07:43, A C<agcarver+ntp at acarver.net> wrote:
>>> I did notice this in the current Internet only configuration, one
>>> Internet
>>> clock went crazy just now:
>>>
>>> peerstats:
>>> 56031 27012.604 130.207.165.28 9374 99542.105309349 0.000122070
>>> 0.015549461 99542.097097054
>>>
>>> ntpq line:
>>> -130.207.165.28 130.207.244.240 2 u 190 1024 377 0.122 9954210
>>> 9954209
>>
>> I think it's safe to say we know a response did not arrive 27+ hours
>> after a poll, particularly as NTP ignores responses which don't
>> contain a the most recent poll's timestamp.
>>
>> I regret beating this drum again, but I suspect your floating-point
>> hardware and/or software is at fault. The SPARC architecture has
>> partial hardware floating-point support, the rest is done via software
>> provided by the system (OS and/or compiler). It would be interesting
>> to enable rawstats and find the corresponding rawstats entry when an
>> insane peerstats offset is observed for a network source, in hopes
>> that sheds more light on where the calculation runs off the rails.
>
> I've gone ahead and enabled rawstats. I may need to adjust some settings
> depending on how much data gets written. As for the FP support, that's
> still over in the hands of the NetBSD crew as they try to find some
> actual hardware instead of emulators.
>>
>>> There was a fuzz shortly afterwards (the above was at 07:30:12) :
>>> 14 Apr 07:30:48 : ts_min 1334388647.362400444 ts_prev
>>> 1334388647.362295452
>>> ts 1334388648.362303446
>>> 14 Apr 07:30:48 : sys_fuzz 104992 nsec, this fuzz 0.000053869, prior
>>> 0.000074835
>>
>> The system clock was observed at startup to take 105 usec to read,
>> which is not coincidentally the difference ts_min - ts_prev. The time
>> to read is the minimum nonzero difference between 12 successive clock
>> readings. As ts> ts_min by 999 msec, the underlying OS clock is
>> behaving correctly. ntpd's fuzz for this clock reading is randomly
>> chosen from 0-105 usec and is 54 usec. The fuzz was 75 usec for the
>> prior clock reading.
>>
>>> 14 Apr 07:30:48 : get_systime prev result 0xd333a628.5d03feac is
>>> 0.000984891
>>> later than 0xd333a628.5cc372f1
>>
>> And this is why the fuzz message was logged -- somehow the sum of
>> ts_prev and previous fuzz converted to NTP's timescale appears to be
>> nearly a millisecond later than the sum of ts and current fuzz. In
>> short, the output of get_systime() failed to increase over time as it
>> should (except once at NTP era rollover every 136 years).
>>
>> Notice the seconds part of ts is one greater than ts_prev. Yet the
>> seconds part of the previous result is the same as the seconds part of
>> the latest result. Please provide more examples so I can see if this
>> is a pattern.
>>
>> I'm suspicious first of the conversion by ntpd between Unix and NTP
>> timescales, and secondarily of the floating point hardware+software.
>
> I have a couple more recent examples of fuzz messages (still using the
> Internet only configuration):
>
> 14 Apr 08:59:56 ntpd[11057]: ts_min 1334393996.509276921 ts_prev
> 1334393996.509171929 ts 1334393996.518274219
> 14 Apr 08:59:56 ntpd[11057]: sys_fuzz 104992 nsec, this fuzz
> 0.000076449, prior 0.000007028
> 14 Apr 08:59:56 ntpd[11057]: get_systime prev result 0xd333bb0c.84e97365
> is 0.000836508 later than 0xd333bb0c.84b2a11e
>
> 14 Apr 22:01:03 ntpd[11057]: ts_min 1334440863.347747498 ts_prev
> 1334440863.347642506 ts 1334440863.355634890
> 14 Apr 22:01:03 ntpd[11057]: sys_fuzz 104992 nsec, this fuzz
> 0.000102068, prior 0.000103523
> 14 Apr 22:01:03 ntpd[11057]: get_systime prev result 0xd334721f.5b4e8d0b
> is 0.000930384 later than 0xd334721f.5b1193c9
>
> 15 Apr 02:37:23 ntpd[11057]: ts_min 1334457443.842685740 ts_prev
> 1334457443.842580748 ts 1334457443.849148243
> 15 Apr 02:37:23 ntpd[11057]: sys_fuzz 104992 nsec, this fuzz
> 0.000000000, prior 0.000089013
> 15 Apr 02:37:23 ntpd[11057]: get_systime prev result 0xd334b2e3.d9a0e30a
> is 0.000962946 later than 0xd334b2e3.d961c77d
>
> 15 Apr 07:14:29 ntpd[11057]: ts_min 1334474069.426351662 ts_prev
> 1334474069.426246670 ts 1334474069.429352428 x12
> 15 Apr 07:14:29 ntpd[11057]: sys_fuzz 104992 nsec, this fuzz
> 0.000017473, prior 0.000099510
> 15 Apr 07:14:29 ntpd[11057]: get_systime prev result 0xd334f3d5.6e2babc7
> is 0.000983966 later than 0xd334f3d5.6deb2f91
>

More observations.  Using a mindist of 0.001 reduces the number of fuzz 
events.  Using the default mindist increases them.  With a mindist of 
0.001 I had maybe six fuzz events in three to four days of running. 
With a default mindist, there were four in the span of about four hours.

Also, using a default mindist makes the system more prone to just 
jumping randomly.  I did not have a clock jump with the mindist set to 
0.001 for the many days that it was running.  However, today after I 
restarted to use a default mindist, my clock went crazy.  The servers 
were continually reachable so it wasn't a loss of the source time.

Here is a cut from the log file after a restart (nothing unusual between 
the fuzz messages and then the sudden loss of peer):

15 Apr 21:45:24 ntpd[24857]: 69.64.72.238 903a 8a sys_peer
15 Apr 21:54:23 ntpd[24857]: 64.73.32.134 941a 8a sys_peer
15 Apr 22:20:36 ntpd[24857]: 69.64.72.238 941a 8a sys_peer
15 Apr 22:22:44 ntpd[24857]: 64.73.32.134 941a 8a sys_peer
15 Apr 23:13:48 ntpd[24857]: 69.64.72.238 941a 8a sys_peer
15 Apr 23:18:54 ntpd[24857]: 64.73.32.134 941a 8a sys_peer
15 Apr 23:22:29 ntpd[24857]: 69.64.72.238 941a 8a sys_peer
15 Apr 23:27:39 ntpd[24857]: 64.73.32.134 941a 8a sys_peer
15 Apr 23:44:20 ntpd[24857]: 69.64.72.238 941a 8a sys_peer
15 Apr 23:57:14 ntpd[24857]: ts_min 1334534233.596689790 ts_prev 
1334534233.596587790 ts 1334534234.596689369
15 Apr 23:57:14 ntpd[24857]: sys_fuzz 102000 nsec, this fuzz 
0.000016991, prior 0.000021793
15 Apr 23:57:14 ntpd[24857]: get_systime prev result 0xd335deda.99020561 
is 0.000980728 later than 0xd335deda.98c1bf7d
16 Apr 00:07:28 ntpd[24857]: 64.73.32.134 941a 8a sys_peer
16 Apr 00:12:13 ntpd[24857]: ts_min 1334535133.590337837 ts_prev 
1334535133.590235837 ts 1334535133.598099224
16 Apr 00:12:13 ntpd[24857]: sys_fuzz 102000 nsec, this fuzz 
0.000009281, prior 0.000093495
16 Apr 00:12:13 ntpd[24857]: get_systime prev result 0xd335e25d.995ff87f 
is 0.001012142 later than 0xd335e25d.991da394
16 Apr 00:15:17 ntpd[24857]: 69.64.72.238 941a 8a sys_peer
16 Apr 00:41:41 ntpd[24857]: ts_min 1334536901.580667199 ts_prev 
1334536901.580565199 ts 1334536901.589564506
16 Apr 00:41:41 ntpd[24857]: sys_fuzz 102000 nsec, this fuzz 
0.000035873, prior 0.000035873
16 Apr 00:41:41 ntpd[24857]: get_systime prev result 0xd335e945.972cdcf8 
is 0.000927929 later than 0xd335e945.96f00ce7
16 Apr 01:29:40 ntpd[24857]: 64.73.32.134 941a 8a sys_peer
16 Apr 01:38:29 ntpd[24857]: 69.64.72.238 931a 8a sys_peer
16 Apr 02:31:32 ntpd[24857]: 64.73.32.134 941a 8a sys_peer
16 Apr 02:33:56 ntpd[24857]: ts_min 1334543636.677775148 ts_prev 
1334543636.677673148 ts 1334543636.716269137
16 Apr 02:33:56 ntpd[24857]: sys_fuzz 102000 nsec, this fuzz 
0.000032985, prior 0.000042556
16 Apr 02:33:56 ntpd[24857]: get_systime prev result 0xd3360394.b79be6da 
is 0.000920500 later than 0xd3360394.b75f936d
16 Apr 02:37:08 ntpd[24857]: 69.64.72.238 931a 8a sys_peer
16 Apr 02:44:28 ntpd[24857]: 169.229.70.201 943a 8a sys_peer
16 Apr 02:44:52 ntpd[24857]: 131.144.4.10 943a 8a sys_peer
16 Apr 02:49:24 ntpd[24857]: 0.0.0.0 0618 08 no_sys_peer

And the billboard showing a sudden jump:

      remote           refid      st t when poll reach   delay   offset 
  jitter
==============================================================================
  127.127.22.0    .PPS.            0 l   17   16  377    0.000  117.993 
349.039
  127.127.28.0    .GPSD.           4 l   64   64  377    0.000  5743.60 
1358.75
  64.73.32.134    198.30.92.2      2 u   50  256  377   98.170  2049.41 
2348.06
  69.64.72.238    192.12.19.20     2 u  136  256  377   34.772  2954.24 
2479.46
  169.229.70.201  169.229.128.214  3 u  109  256  377   39.510    4.712 
3031.14
  130.207.165.28  130.207.244.240  2 u  153  256  377   78.719  2898.68 
2446.16
  131.144.4.10    130.207.244.240  2 u  150  256  377   96.624  4230.27 
3366.28


And the rawstats for that period (the failure occured near 5300 seconds 
according to loopstats):
56033  3776.720 131.144.4.10 10.0.0.21 3543526976.616098773 
3543526976.666466991 3543526976.666475683 3543526976.715181284
56033  3804.705 130.207.165.28 10.0.0.21 3543527004.621563453 
3543527004.671234784 3543527004.671340001 3543527004.701126296
56033  3815.655 69.64.72.238 10.0.0.21 3543527015.617126897 
3543527015.641268945 3543527015.641303950 3543527015.652367134
56033  3822.661 169.229.70.201 10.0.0.21 3543527022.617239533 
3543527022.641105287 3543527022.641130916 3543527022.658522693
56033  3875.737 64.73.32.134 10.0.0.21 3543527075.632610282 
3543527075.685791015 3543527075.685797214 3543527075.732734028
56033  4043.728 131.144.4.10 10.0.0.21 3543527243.622408912 
3543527243.672989899 3543527243.672998768 3543527243.722792465
56033  4065.710 130.207.165.28 10.0.0.21 3543527265.624932415 
3543527265.674964448 3543527265.674985000 3543527265.704652744
56033  4082.664 69.64.72.238 10.0.0.21 3543527282.625458937 
3543527282.650498993 3543527282.650556750 3543527282.661561033
56033  4086.670 169.229.70.201 10.0.0.21 3543527286.627386697 
3543527286.651229629 3543527286.651254578 3543527286.667456566
56033  4143.831 64.73.32.134 10.0.0.21 3543527343.729351073 
3543527343.782157898 3543527343.782166958 3543527343.828430167
56033  4313.721 131.144.4.10 10.0.0.21 3543527513.618713353 
3543527513.669086876 3543527513.669093683 3543527513.716819415
56033  4322.714 130.207.165.28 10.0.0.21 3543527522.618985199 
3543527522.669045604 3543527522.669123000 3543527522.708528623
56033  4349.784 69.64.72.238 10.0.0.21 3543527549.619607663 
3543527549.644078983 3543527549.644125563 3543527549.654836655
56033  4356.790 169.229.70.201 10.0.0.21 3543527556.621701566 
3543527556.645499899 3543527556.645524443 3543527556.662417271
56033  4413.784 64.73.32.134 10.0.0.21 3543527613.622919917 
3543527613.675914764 3543527613.675920009 3543527613.722257737
56033  4573.728 131.144.4.10 10.0.0.21 3543527773.624744968 
3543527773.674838952 3543527773.674847590 3543527773.723649592
56033  4579.721 130.207.165.28 10.0.0.21 3543527779.624917504 
3543527779.674851508 3543527779.674910001 3543527779.716369887
56033  4609.666 69.64.72.238 10.0.0.21 3543527809.617774420 
3543527809.642369952 3543527809.642416724 3543527809.653286044
56033  4617.736 169.229.70.201 10.0.0.21 3543527817.692559861 
3543527817.717021268 3543527817.717054452 3543527817.733617831
56033  4675.723 64.73.32.134 10.0.0.21 3543527875.619116392 
3543527875.672195911 3543527875.672204971 3543527875.718554171
56033  4842.717 130.207.165.28 10.0.0.21 3543528042.613866202 
3543528042.664567478 3543528042.664636000 3543528042.712606772
56033  4843.847 131.144.4.10 10.0.0.21 3543528043.746258122 
3543528043.797066916 3543528043.797074691 3543528043.844428678
56033  4865.662 69.64.72.238 10.0.0.21 3543528065.621857109 
3543528065.646423973 3543528065.646473510 3543528065.656902470
56033  4887.732 169.229.70.201 10.0.0.21 3543528087.688084734 
3543528087.711833352 3543528087.711857558 3543528087.729050815
56033  4940.850 64.73.32.134 10.0.0.21 3543528140.749133491 
3543528140.802308082 3543528140.802317142 3543528140.848041462
56033  5103.818 130.207.165.28 10.0.0.21 3543528303.726550910 
3543528303.776969509 3543528303.777069000 3543528303.815351965
56033  5113.839 131.144.4.10 10.0.0.21 3543528313.739879159 
3543528313.789827799 3543528313.789835945 3543528313.837027228
56033  5132.778 69.64.72.238 10.0.0.21 3543528332.738628258 
3543528332.763489954 3543528332.763526240 3543528332.775530854
56033  5152.783 169.229.70.201 10.0.0.21 3543528352.739929283 
3543528352.764031522 3543528352.764062804 3543528352.780930586
56033  5209.835 64.73.32.134 10.0.0.21 3543528409.734505651 
3543528409.787201881 3543528409.787210941 3543528409.832910031
56033  5368.802 130.207.165.28 10.0.0.21 3543528568.720955152 
3543528568.771177725 3543528568.771267000 3543528568.799973991
56033  5378.830 131.144.4.10 10.0.0.21 3543528578.730018656 
3543528578.780499937 3543528578.780506681 3543528578.827933375
56033  5389.767 69.64.72.238 10.0.0.21 3543528589.729599959 
3543528589.754235978 3543528589.754279960 3543528589.764874756
56033  5413.804 169.229.70.201 10.0.0.21 3543528613.729711420 
3543528613.754012296 3543528613.754037440 3543528613.770926058
56033  5466.827 64.73.32.134 10.0.0.21 3543528666.726202194 
3543528666.779060840 3543528666.779068947 3543528666.824509809
56033  5630.793 130.207.165.28 10.0.0.21 3543528830.712245896 
3543528830.762411466 3543528830.762476000 3543528830.791145106
56033  5641.812 131.144.4.10 10.0.0.21 3543528841.712401803 
3543528841.762792605 3543528841.762801532 3543528841.810044259
56033  5651.748 69.64.72.238 10.0.0.21 3543528851.710921753 
3543528851.735873997 3543528851.735917805 3543528851.746139890
56033  5682.764 169.229.70.201 10.0.0.21 3543528882.717152889 
3543528882.741148280 3543528882.741192559 3543528882.758752645
56033  5730.820 64.73.32.134 10.0.0.21 3543528930.717479181 
3543528930.770674228 3543528930.770682812 3543528930.817574254
56033  5898.806 130.207.165.28 10.0.0.21 3543529098.724438124 
3543529098.775677922 3543529098.775736000 3543529098.804353613
56033  5906.892 131.144.4.10 10.0.0.21 3543529106.703017155 
3543529106.753669543 3543529106.753677689 3543529106.888191595
56033  5909.715 69.64.72.238 10.0.0.21 3543529109.677735136 
3543529109.702483980 3543529109.702531365 3543529109.712689258
56033  5950.709 169.229.70.201 10.0.0.21 3543529150.663124281 
3543529150.687463138 3543529150.687487770 3543529150.706711280
56033  5999.781 64.73.32.134 10.0.0.21 3543529199.680090148 
3543529199.733179092 3543529199.733187198 3543529199.778814577
56033  6165.790 130.207.165.28 10.0.0.21 3543529365.708299256 
3543529365.759323549 3543529365.759406001 3543529365.787546903
56033  6175.729 69.64.72.238 10.0.0.21 3543529375.691632500 
3543529375.716311955 3543529375.716358342 3543529375.726881862
56033  6177.851 131.144.4.10 10.0.0.21 3543529377.659393473 
3543529377.709800580 3543529377.709808993 3543529377.847564287


The matching loopstats:
56033   448.820  0.004279939 -77.514  0.000499632  0.009453 8
56033   917.792  0.004486748 -77.506  0.000473048  0.009266 8
56033  2229.733  0.004547174 -77.484  0.000444596  0.011699 8
56033  2496.726  0.004727311 -77.479  0.000420730  0.011069 8
56033  4609.799  0.004830118 -77.442  0.000395916  0.016642 8
56033  4865.842  0.005117423 -77.438  0.000384023  0.015663 8
56033  5380.827  0.002828855 -77.432  0.000885287  0.014776 8
56033  5909.823  0.005407926 -77.421  0.001231754  0.014326 8
56033  7232.773  0.005187351 -77.396  0.001154836  0.016164 8
56033  9092.658  0.003010063 -77.375  0.001326467  0.016823 8
56033  9428.842  0.005667080 -77.368  0.001556291  0.015935 8
56033  9868.748  0.003383352 -77.362  0.001664696  0.015034 8
56033  9892.704  0.002820953 -77.362  0.001569824  0.014063 8


More information about the questions mailing list