[ntp:questions] Number of Stratum 1 & Stratum 2 Peers

Mike Cook michael.cook at sfr.fr
Fri Dec 12 09:28:52 UTC 2014


To close this digression: I ran the test in which the leap second is propagated by only one of three servers. Bill’s hypothesis is confirmed, with a couple of details that I would like to share, as this might well be a real-life case.

a) To start off, in my test all three servers feeding my one client are sync’d to the same time. One of them has a leap file modified for the test. As UTC is defined WITH leap seconds, this is the ONLY server actually serving UTC, even though all three are in sync. It correctly advertises the upcoming leap.
b) When the leap occurs, the server with the leap file correctly inserts it, as does the client. The client’s ntpd correctly detects the step and, after a few polls, flags the UTC server as a falseticker, because the majority consistently disagree with the now updated clock.
Thu Jan  1 01:06:05 CET 2015
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.168.1.15    .GPS1.           1 u   42   64  377    0.495  999.894 534.506
+192.168.1.17    .GPS1.           1 u   39   64  377    0.564  999.899 654.645
x192.168.1.18    .GPS1.           1 u   66   64  377    0.575   -0.066   0.029
Now we have the full story: the « good » clock has been declared a falseticker because it is not part of the majority. But the story doesn't end there. A bit later the client's clock, which at that point is on UTC with the leap second applied, gets stepped forward 1 s to agree with the majority. This is expected, but we now have a client whose time is wrong.
Thu Jan  1 01:11:27 CET 2015
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 192.168.1.15    .GPS1.           1 u   14   16    3    0.488   -0.039   0.038
*192.168.1.17    .GPS1.           1 u   22   64    1    0.516   -0.044   0.031
 192.168.1.18    .GPS1.           1 u   14   16    3    0.566  -999.99   0.052
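
The 1 s correction shown above was applied as a step rather than a slew, while the 255600 s offset in the earlier test (quoted further down) made ntpd give up. A minimal Python sketch of that decision follows. It assumes ntpd's usual defaults of 128 ms for the step threshold and 1000 s for the panic threshold (the latter appears in the panic_stop log line), it ignores the stepout timer and the -g option, and the function name is hypothetical rather than anything from the ntpd source.

```python
# Illustrative sketch only: it mirrors ntpd's documented default thresholds,
# not its actual source code, and ignores the stepout timer and the -g option.
STEP_THRESHOLD_S = 0.128     # assumed default step threshold (128 ms)
PANIC_THRESHOLD_S = 1000.0   # panic limit quoted in the panic_stop log message


def classify_offset(offset_s: float) -> str:
    """Return what a default-configured ntpd would be expected to do."""
    magnitude = abs(offset_s)
    if magnitude > PANIC_THRESHOLD_S:
        return "panic"   # refuse to touch the clock and exit
    if magnitude > STEP_THRESHOLD_S:
        return "step"    # jump the clock once the offset has persisted
    return "slew"        # correct gradually


if __name__ == "__main__":
    # Offsets in seconds, taken from the listings in this thread.
    for label, offset_s in [
        ("in sync", -0.000039),
        ("missed leap second", 0.999894),
        ("servers restarted with wrong date", -255600.0),
    ]:
        print(f"{label}: {classify_offset(offset_s)}")
```

With these assumed thresholds, a missed leap second always lands in the "step" band, which is why the client ends up following the majority instead of stopping.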
Thu Jan 1 01:12:31 CET 2015
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
Final status, with the UTC server once again declared a falseticker:
Thu Jan  1 01:15:38 CET 2015
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+192.168.1.15    .GPS1.           1 u   46   64   77    0.488   -0.039   0.047
*192.168.1.17    .GPS1.           1 u   17   64   37    0.520   -0.054   0.032
x192.168.1.18    .GPS1.           1 u   47   64   77    0.575  -999.99   0.053
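
Why the one correct server loses the vote can be illustrated with a much simplified majority-overlap check in Python. This is not ntpd's selection code, which works on correctness intervals derived from root distance. Here `margin_ms` is a hypothetical fixed stand-in for that interval half-width, and the function name is made up for the example.

```python
def find_falsetickers(offsets_ms, margin_ms=10.0):
    """Return the servers whose interval misses the majority overlap.

    offsets_ms maps a server name to its measured offset in milliseconds.
    margin_ms is an assumed, fixed interval half-width (ntpd uses the
    per-server root distance instead).
    """
    intervals = {
        name: (off - margin_ms, off + margin_ms)
        for name, off in offsets_ms.items()
    }
    edges = []
    for lo, hi in intervals.values():
        edges.append((lo, +1))   # interval opens
        edges.append((hi, -1))   # interval closes
    # Sort by position; on ties, process openings before closings.
    edges.sort(key=lambda edge: (edge[0], -edge[1]))

    depth = best_depth = 0
    best_point = None
    for point, delta in edges:
        depth += delta
        if depth > best_depth:
            best_depth, best_point = depth, point

    if 2 * best_depth <= len(intervals):
        return sorted(intervals)          # no majority: trust nobody
    return sorted(
        name for name, (lo, hi) in intervals.items()
        if not lo <= best_point <= hi
    )


if __name__ == "__main__":
    # Offsets (ms) as seen by the client just after it inserted the leap.
    after_leap = {
        "192.168.1.15": 999.894,
        "192.168.1.17": 999.899,
        "192.168.1.18": -0.066,
    }
    print(find_falsetickers(after_leap))
```

The check has no notion of which server is right, only of which servers agree, so the lone server on UTC is the one reported.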
This test was meant to verify a worst-case scenario, but it shows that administrators preparing for a leap need to make sure that a majority of their servers will insert the leap and propagate the announcement. This is not always easy to check, as query commands are routinely blocked by some internet servers.
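
Even where ntpq queries are blocked, the leap indicator (LI) travels in the first byte of every ordinary NTP reply, so a plain client-mode request is enough to see whether a server announces the leap. The Python sketch below decodes that byte following the NTPv4 header layout (2 bits LI, 3 bits version, 3 bits mode). The function names are hypothetical, and `query_leap_indicator` is an untested convenience wrapper that assumes the server answers client-mode packets on UDP port 123.

```python
import socket

LEAP_MEANING = {
    0: "no warning",
    1: "last minute of the day has 61 seconds",
    2: "last minute of the day has 59 seconds",
    3: "unknown (clock unsynchronized)",
}


def decode_first_byte(packet: bytes):
    """Split the first header byte into (leap indicator, version, mode)."""
    first = packet[0]
    return first >> 6, (first >> 3) & 0x7, first & 0x7


def query_leap_indicator(host: str, timeout: float = 2.0) -> int:
    """Send one client-mode request and return the LI field of the reply."""
    request = bytes([0x23]) + bytes(47)   # LI=0, VN=4, mode=3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (host, 123))
        reply, _ = sock.recvfrom(512)
    return decode_first_byte(reply)[0]


if __name__ == "__main__":
    # Offline demonstration on two hand-built first bytes (server mode, v4).
    for first_byte in (0x24, 0x64):
        li, version, mode = decode_first_byte(bytes([first_byte]))
        print(f"0x{first_byte:02x}: LI={li} ({LEAP_MEANING[li]}), "
              f"version={version}, mode={mode}")
```

Running `query_leap_indicator` against each configured server in the days before a leap would show how many of them actually announce it.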
Note:
There is a possible bug, or an RFE is needed somewhere, as the clock variable tai is not correctly set on the client.
On the server that has the leap file we see the correct update from 35 to 36:
mike@raspB4 ~ $ ntpq -c "rv 0 tai"
tai=36
But on the client, which has no leap file (and probably because of this), tai has been set to 1. So I think what is happening is that the server's notion of tai is not propagated to clients.
mike@cubieez2:~$ ntpq -c "rv 0 tai"
tai=1
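
A small Python sketch for comparing the tai variable across hosts is given here. It only parses text of the form printed by the ntpq commands above; how that text is collected from each host is left open, and the function names are hypothetical.

```python
import re


def parse_tai(rv_output: str):
    """Extract the integer after 'tai=' from ntpq 'rv 0 tai' output, or None."""
    match = re.search(r"\btai=(-?\d+)", rv_output)
    return int(match.group(1)) if match else None


def tai_disagreements(outputs_by_host, reference_host):
    """Return {host: value} for hosts whose tai differs from the reference."""
    reference = parse_tai(outputs_by_host[reference_host])
    return {
        host: parse_tai(text)
        for host, text in outputs_by_host.items()
        if host != reference_host and parse_tai(text) != reference
    }


if __name__ == "__main__":
    # Values copied from the two ntpq runs in this message.
    outputs = {"raspB4": "tai=36", "cubieez2": "tai=1"}
    print(tai_disagreements(outputs, reference_host="raspB4"))
```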
A leap second will most likely be declared for the end of June 30, 2015 or, at the latest, December 31, 2015, so we still have a bit of time to clean up the installed base.



> On 9 Dec 2014, at 14:20, Mike Cook <michael.cook at sfr.fr> wrote:
> 
> <snip>
>> 
>> 
>>> 
>>>> Three are fine, as long as only one dies or goes nuts.
>>> 
>>> Again, define "goes nuts". You don't seem to like the term 
>>> "falseticker", so how do you define "goes nuts"? If one "goes nuts" or 
>>> even goes offline, if the remaining two do not agree then it is like 
>>> having no server at all.
>> 
>> No, it is like having two, with one being out. 
>> falseticker is a term with a very specific internal definition. Thus a
>> server whose time is right on UTC could be a falseticker, because the
>> other two servers were both exactly 3 days out, with tiny jitter estimates. 
>> I would say then that you had two servers going nuts, and one good, even
>> though ntpd would say there were two good and one false ticker.
> 
> In fact this does not happen. I just tested the hypothesis.
> What happens depends on how the two wayward servers get their exaggerated offset:
> a) someone or something resets the date:
>   result: ntpd on both those servers exits because of the panic_stop limit.
> 
> So in this case the client has only one reference left and continues using it. That server is not flagged as a falseticker.
> That is normal.
> 
> b) someone restarts ntpd on the servers with the wrong date. Here the servers' ntpd has no way of knowing that its time is bad, and so continues serving normally.
>   On the client, the running ntpd immediately sees a huge offset and huge jitter.
> 
> Tue Dec  9 13:15:04 CET 2014
>    remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> *192.168.1.15    .GPS1.           1 u  320   64  360    0.549    0.040   0.037
> +192.168.1.16    .GPS2.           1 u   37   64  377    0.606    0.006   0.028
> +192.168.1.17    .GPS1.           1 u  309   64  360    0.576    0.027   0.025
> Tue Dec  9 13:16:08 CET 2014
>    remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> 192.168.1.15    .GPS1.           1 u   55   64  341    0.565    0.042 9660780
> *192.168.1.16    .GPS2.           1 u   37   64  377    0.606    0.006   0.024
> 192.168.1.17    .GPS1.           1 u   42   64  341    0.579    0.041 9660773
> 
> After 5 minutes the client is unable to resolve this, declares all clocks falsetickers and then panics. I did not have ntpd in debug mode here, but it is reasonable to assume that it panics because the selected clock is too far out and hits the panic limit.
> 
> Tue Dec  9 13:23:37 CET 2014
>    remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> 192.168.1.15    .GPS1.           1 u   45   64  377    0.596  -255600 155.539
> *192.168.1.16    .GPS2.           1 u   25   64  377    0.614    0.024   0.008
> 192.168.1.17    .GPS1.           1 u   30   64  377    0.583  -255600  52.806
> Tue Dec  9 13:24:41 CET 2014
>    remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> x192.168.1.15    .GPS1.           1 u   43   64  377    0.596  -255600 179.609
> x192.168.1.16    .GPS2.           1 u   23   64  377    0.614    0.024   0.008
> x192.168.1.17    .GPS1.           1 u   27   64  377    0.618  -255599   6.009
> /usr/local/bin/ntpq: read: Connection refused
> Tue Dec  9 13:25:45 CET 2014
> /usr/local/bin/ntpq: read: Connection refused
> 
> This is exactly what happens if the client is restarted.
> 
> clock_filter: n 1 off -255599.997967 del 0.000662 dsp 7.937502 jit 0.000002
> select: endpoint -1 -255600.000806
> select: endpoint  1 -255599.995128
> select: survivor 192.168.1.17 0.002839
> select: combine offset -255599.997967134 jitter 0.000000000
> event at 1 192.168.1.17 903a 8a sys_peer
> clock_update: at 1 sample 1 associd 18641
> event at 1 0.0.0.0 c617 07 panic_stop -255600 s; set clock manually within 1000 s.
> event at 1 0.0.0.0 c61d 0d kern kernel time sync disabled
> 
> So ntpd does NOT continue in your test case. Your case may fare better if the time difference is less than the panic limit, say if the two servers do not insert a leap second but the « correct » one does. I'll try that for my own satisfaction if I can figure out how to do it.
>> 
>> 
> 
>>> 
>>> 
>>> Brian Utterback
>> 
>> _______________________________________________
>> questions mailing list
>> questions at lists.ntp.org
>> http://lists.ntp.org/listinfo/questions

