[ntp:questions] Number of Stratum 1 & Stratum 2 Peers

Mike Cook mike.cook at orange.fr
Tue Dec 9 13:11:58 UTC 2014


>> <snip>
>>> Three are fine, as long as only one dies or goes nuts.
>> 
>> Again, define "goes nuts". You don't seem to like the term 
>> "falseticker", so how do you define "goes nuts"? If one "goes nuts" or 
>> even goes offline, if the remaining two do not agree then it is like 
>> having no server at all.
> 
> No, it is like having two, with one being out. 
> falseticker is a term with a very specific internal definition. Thus a
> server whose time is right on UTC could be a falseticker, because the
> other two servers were both exactly 3 days out, with tiny jitter estimates. 
> I would say then that you had two servers going nuts, and one good, even
> though ntpd would say there were two good and one false ticker. 

In fact this does not happen. I just tested the hypothesis.
What happens depends on how the two wayward servers get their exaggerated offset:
a) Someone or something resets the date on the servers:
    result: ntpd on both of those servers exits because the offset exceeds the panic limit (panic_stop).

  So in this case the client has only one reference and continues using that. It is not flagged as a falseticker.
  That is normal.
   
b) Someone restarts ntpd on the servers with the wrong date. Here the servers' ntpd has no way of knowing that it has bad time and so continues serving normally.
    On the client, the running ntpd immediately sees a huge offset and huge jitter.

Tue Dec  9 13:15:04 CET 2014
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.168.1.15    .GPS1.           1 u  320   64  360    0.549    0.040   0.037
+192.168.1.16    .GPS2.           1 u   37   64  377    0.606    0.006   0.028
+192.168.1.17    .GPS1.           1 u  309   64  360    0.576    0.027   0.025
Tue Dec  9 13:16:08 CET 2014
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 192.168.1.15    .GPS1.           1 u   55   64  341    0.565    0.042 9660780
*192.168.1.16    .GPS2.           1 u   37   64  377    0.606    0.006   0.024
 192.168.1.17    .GPS1.           1 u   42   64  341    0.579    0.041 9660773
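
For anyone who wants to watch the tally codes change from a script, here is a minimal Python sketch that splits a peer row of the billboard above into fields. The names Peer and parse_peer_line are mine, not part of any NTP tool, and the sketch assumes the tally code always sits in the first column, as it does in the output shown here.

```python
from typing import NamedTuple

class Peer(NamedTuple):
    tally: str      # first column: '*', '+', 'x' or ' ' (blank)
    remote: str
    refid: str
    stratum: int
    offset: float   # value exactly as printed by ntpq
    jitter: float   # value exactly as printed by ntpq

def parse_peer_line(line: str) -> Peer:
    """Parse one peer row (not the header or the '====' rule)."""
    tally, rest = line[0], line[1:].split()
    # rest: remote refid st t when poll reach delay offset jitter
    return Peer(tally, rest[0], rest[1], int(rest[2]),
                float(rest[8]), float(rest[9]))

# The 13:16:08 rows from the billboard above.
rows = [
    " 192.168.1.15    .GPS1.           1 u   55   64  341    0.565    0.042 9660780",
    "*192.168.1.16    .GPS2.           1 u   37   64  377    0.606    0.006   0.024",
    " 192.168.1.17    .GPS1.           1 u   42   64  341    0.579    0.041 9660773",
]
peers = [parse_peer_line(r) for r in rows]
selected = [p.remote for p in peers if p.tally == "*"]
print(selected)
```

Only 192.168.1.16 still carries the '*' tally at that point; the other two have lost their '+' and show the huge jitter.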

After 5 mins the client is unable to resolve this, declares all clocks falsetickers and then panics. I did not have ntpd in debug mode here, but it is reasonable to assume that it panics because the selected clock is too far out and hits the panic limit.

Tue Dec  9 13:23:37 CET 2014
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 192.168.1.15    .GPS1.           1 u   45   64  377    0.596  -255600 155.539
*192.168.1.16    .GPS2.           1 u   25   64  377    0.614    0.024   0.008
 192.168.1.17    .GPS1.           1 u   30   64  377    0.583  -255600  52.806
Tue Dec  9 13:24:41 CET 2014
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
x192.168.1.15    .GPS1.           1 u   43   64  377    0.596  -255600 179.609
x192.168.1.16    .GPS2.           1 u   23   64  377    0.614    0.024   0.008
x192.168.1.17    .GPS1.           1 u   27   64  377    0.618  -255599   6.009
/usr/local/bin/ntpq: read: Connection refused
Tue Dec  9 13:25:45 CET 2014
/usr/local/bin/ntpq: read: Connection refused

This is exactly what happens if the client is restarted.

clock_filter: n 1 off -255599.997967 del 0.000662 dsp 7.937502 jit 0.000002
select: endpoint -1 -255600.000806
select: endpoint  1 -255599.995128
select: survivor 192.168.1.17 0.002839
select: combine offset -255599.997967134 jitter 0.000000000
event at 1 192.168.1.17 903a 8a sys_peer
clock_update: at 1 sample 1 associd 18641
event at 1 0.0.0.0 c617 07 panic_stop -255600 s; set clock manually within 1000 s.
event at 1 0.0.0.0 c61d 0d kern kernel time sync disabled
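
The numbers in that log can be checked by hand. Below is a small Python sketch of the arithmetic, assuming the value printed on the "survivor" line (0.002839) is the half-width of the interval around the offset, which is how the two "endpoint" lines work out here. The function names are mine; the 1000 s threshold is the one quoted in the panic_stop message.

```python
PANIC_THRESHOLD_S = 1000.0  # from "set clock manually within 1000 s" in the log

def endpoints(offset_s: float, half_width_s: float) -> tuple[float, float]:
    """Low and high edge of the interval centred on the offset."""
    return offset_s - half_width_s, offset_s + half_width_s

def exceeds_panic(offset_s: float, threshold_s: float = PANIC_THRESHOLD_S) -> bool:
    """True when the offset is too large for ntpd to correct on its own."""
    return abs(offset_s) > threshold_s

offset = -255599.997967      # "off" on the clock_filter line
half_width = 0.002839        # value on the "select: survivor" line (assumed half-width)

low, high = endpoints(offset, half_width)
print(f"{low:.6f} {high:.6f}")   # compare with the two "select: endpoint" lines
print(exceeds_panic(offset))     # the offset is about 71 hours, far over the limit
```

So the single surviving peer is selected, the combined offset is the same -255600 s, and that is what trips panic_stop.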

So ntpd does NOT continue in your test case. Your case may fare better if the time difference is less than the panic limit, say if the two servers do not insert a leap second but the "correct" one does. I'll try that for my own satisfaction if I can figure out how to do it.
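
To make the leap second case concrete, here is a rough Python sketch of a majority vote over offset intervals. It is a deliberately simplified stand-in, not ntpd's actual selection code, which applies further checks. The server names, the 1 s offset and the 5 ms half-widths are all hypothetical values picked for illustration.

```python
def largest_agreeing_group(candidates: dict[str, tuple[float, float]]) -> list[str]:
    """Return the names of the biggest set of servers whose intervals
    [offset - half_width, offset + half_width] share a common point.

    The maximum overlap of closed intervals is always reached at one of
    the interval edges, so testing every edge is enough.
    """
    best: list[str] = []
    for off, hw in candidates.values():
        for point in (off - hw, off + hw):
            group = [name for name, (o, h) in candidates.items()
                     if o - h <= point <= o + h]
            if len(group) > len(best):
                best = group
    return best

# Hypothetical: A inserted the leap second, B and C did not and sit 1 s away.
candidates = {
    "A": (0.0, 0.005),
    "B": (1.0, 0.005),
    "C": (1.0, 0.005),
}
group = largest_agreeing_group(candidates)
is_majority = len(group) > len(candidates) / 2
print(group, is_majority)
```

Under these assumptions the two servers that are 1 s off form the majority and the correct one is the odd one out, and a 1 s offset is nowhere near the 1000 s panic limit. Whether the real ntpd behaves that way is exactly what the test would have to show.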

Mike

> 
>> 
>> 
>> Brian Utterback
> 


