[ntp:questions] NTP sync problems
David Woolley
david at ex.djwhome.demon.co.uk.invalid
Tue Jul 1 20:57:22 UTC 2008
martin.tengklint at spray.se wrote:
>
> The topology looks like this:
>
> Ext.NTP Server A
>         |
>         |
> Ext.NTP Server B                        Ext.NTP Server C
>         |                                       |
>         |---------------------------------------|
>                             |
>                             |
>                       NTP Server D
>                             |
>                             |
>                       NTP Client E
>
> The problem is that my NTP client E rejected its selected NTP server
> D, which led to not syncing, leading to offset drifting on NTP Client
> E. I think I have located the lack of sync to a too large "root
> dispersion" value sent from the NTP server D. Its value is 1991 as
> seen below:
>
> # ntpq -c"rv 51316"
> status=9014 reach, conf, 1 event, event_reach,
> srcadr=cliente, srcport=123, dstadr=169.254.5.34, dstport=123,
> leap=00, stratum=2, precision=-16, rootdelay=1.785,
> rootdispersion=1991.028, refid=10.112.1.14, reach=377, unreach=0,
Yup. rootdispersion is high enough for rejection.
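To put a number on that, here is a rough sketch of the root-distance test. It assumes the formula and constants published in RFC 5905 (MINDISP of 0.01 s, MAXDIST of 1 s); the ntpd version in use here may differ in detail, and the function and parameter names are my own. The inputs are copied from the rv 51316 output.

```python
# Sketch of the RFC 5905 root-distance test; all values in milliseconds.
# Constants are the RFC 5905 values, assumed here; ntpd versions may differ.
MINDISP_MS = 10.0     # RFC 5905 MINDISP (0.01 s)
MAXDIST_MS = 1000.0   # RFC 5905 MAXDIST (1 s)


def root_distance_ms(rootdelay, delay, rootdisp, disp, jitter, aging=0.0):
    """Root distance; `aging` stands in for PHI * time since the last update."""
    return max(MINDISP_MS, rootdelay + delay) / 2 + rootdisp + disp + aging + jitter


# Values copied from the "rv 51316" output on client E.
dist = root_distance_ms(rootdelay=1.785, delay=0.567,
                        rootdisp=1991.028, disp=0.956, jitter=37.305)
acceptable = dist < MAXDIST_MS
print(f"root distance = {dist:.3f} ms, acceptable = {acceptable}")
```

The root dispersion alone is already about twice the limit, so the other terms hardly matter.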
> hmode=3, pmode=4, hpoll=6, ppoll=6, flash=00 ok, keyid=0,
> offset=3466396.411, delay=0.567, dispersion=0.956, jitter=37.305,
> reftime=cc0328d1.feabf9bf Wed, Jun 18 2008 9:25:21.994,
> org=cc0329cb.5b962c81 Wed, Jun 18 2008 9:29:31.357,
> rec=cc031c40.f62d86e1 Wed, Jun 18 2008 8:31:44.961,
> xmt=cc031c40.f5f9b77c Wed, Jun 18 2008 8:31:44.960,
> filtdelay= 0.57 0.53 0.57 0.52 0.56 0.68 0.52
> 1.11,
> filtoffset= 3466396 3466359 3466320 3466282 3466235 3466198 3466160
> 3466123,
This exceeds the panic threshold, so, unless this is the first time and
you have -g, NTP will abort if it accepts this offset.
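As a quick sanity check, the sketch below compares the quoted filtoffset samples with ntpd's default panic threshold of 1000 s. It assumes the threshold has not been changed with a tinker setting; the helper name is made up for illustration.

```python
# ntpd's default panic threshold, assumed not to have been changed via "tinker panic".
PANIC_THRESHOLD_S = 1000.0

# filtoffset samples from the "rv 51316" output, in milliseconds.
filtoffset_ms = [3466396, 3466359, 3466320, 3466282,
                 3466235, 3466198, 3466160, 3466123]


def exceeds_panic(offset_ms, threshold_s=PANIC_THRESHOLD_S):
    """Return True if the absolute offset is beyond the panic threshold."""
    return abs(offset_ms) / 1000.0 > threshold_s


over = [exceeds_panic(o) for o in filtoffset_ms]
print(f"largest offset: {max(filtoffset_ms) / 1000.0:.1f} s, "
      f"all over threshold: {all(over)}")
```

Every sample is close to 3466 s, well over three times the threshold.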
> filtdisp= 0.03 0.98 1.95 2.93 3.92 4.86 5.81
> 6.77
>
> Upon looking at ntpq -c "as" command on the Client E, the server is in
> condition reject, most likely due to the high root dispersion.
> Correct?
>
> # ntpq -c"as"
>
> ind assID status conf reach auth condition last_event cnt
> ===========================================================
> 1 51316 9014 yes yes none reject reachable 1
>
> The problem exists when having the NTP server D to sync with an
> external NTP server C (stratum 1) having its own system clock as
> reference.
>
> On NTP Server D:
>
> # ntpq -c "as"
> ind assID status conf reach auth condition last_event cnt
> ===========================================================
> 1 62852 9414 yes yes none candidat reachable 1
> 2 62853 9614 yes yes none sys.peer reachable 1
>
> Upon looking in more detail at the two associations above:
>
> # ntpq -c "rv 62853"
> status=9614 reach, conf, sel_sys.peer, 1 event, event_reach,
> srcadr=10.112.1.14, srcport=123, dstadr=10.112.2.90, dstport=123,
> leap=00, stratum=1, precision=-17, rootdelay=0.000,
> rootdispersion=10.284, refid=LCL, reach=377, unreach=0, hmode=3,
> pmode=4, hpoll=10, ppoll=10, flash=00 ok, keyid=0, offset=-1128.193,
> delay=1.226, dispersion=14.849, jitter=224.514,
> reftime=cc12fb96.0a522000 Mon, Jun 30 2008 9:28:38.040,
> org=cc12fbad.30179000 Mon, Jun 30 2008 9:29:01.187,
> rec=cc12fbae.5110fdd4 Mon, Jun 30 2008 9:29:02.316,
> xmt=cc12fbae.50bd8b10 Mon, Jun 30 2008 9:29:02.315,
> filtdelay= 1.23 1.40 1.68 1.50 1.19 1.28 1.10 1.27,
> filtoffset= -1128.1 -903.68 -1144.7 -1133.5 -814.17 -1125.2 -1125.2
> -921.92,
> filtdisp= 0.04 15.38 30.73 46.10 61.46 76.82 92.21 107.59
>
> # ntpq -c "rv 62852"
> status=9414 reach, conf, sel_candidat, 1 event, event_reach,
> srcadr=10.112.1.13, srcport=123, dstadr=10.112.2.90, dstport=123,
> leap=00, stratum=2, precision=-17, rootdelay=6.454,
> rootdispersion=15.533, refid=10.109.1.164, reach=377, unreach=0,
> hmode=3, pmode=4, hpoll=10, ppoll=10, flash=00 ok, keyid=0,
> offset=1147.347, delay=1.298, dispersion=14.874, jitter=0.641,
> reftime=cc12f9fa.ed579000 Mon, Jun 30 2008 9:21:46.927,
> org=cc12fbd3.785bc000 Mon, Jun 30 2008 9:29:39.470,
> rec=cc12fbd2.52cdc1fb Mon, Jun 30 2008 9:29:38.323,
> xmt=cc12fbd2.52726f6f Mon, Jun 30 2008 9:29:38.322,
> filtdelay= 1.30 1.15 1.47 1.24 1.29 2.20 1.54 1.45,
> filtoffset= 1147.35 1147.99 1371.63 1132.04 1143.24 1460.54 1150.79
> 1150.61,
Note that the two servers differ by more than two seconds. I'm not sure
why they aren't both rejected as false tickers. (In systems with LCL
clocks, it is important to be able to outvote the local clock with
enough real clocks, and one is far too few to do that!)
I think rv 0 on D would be instructive, but it looks to me as though D
is either rejecting both C and B, or it is trying to jump between them
and the resulting huge jitter is causing the root dispersion to go
through the roof. (Rather than jumping, it may be using one and
rejecting the other in its popcorn filter.)
> filtdisp= 0.04 15.41 30.79 46.18 61.57 76.91 92.26 107.63
>
> ...I can see that the one selected (NTP server C, i.e. AssId: 62853)
> has a ref.id of LCL (meaning it is syncing to its local system clock?)
LCL is the local clock driver, which means that any reference clock the
server actually has is broken.
Both are selected. The one with the lowest stratum gets to donate its
stratum and quality data, but they are both survivors, and both will be
used to calculate the time.
I would consider a server claiming to sync to LCL and having stratum 1
to be badly misconfigured. Undisciplined local clocks should always
have the highest stratum that just works, so that they are last choice
and don't propagate too far. The default for LCL is maybe OK if the
machine is accurately synchronised by some non-NTP means and steps are
taken to disable NTP if that source fails. Going lower than the default
really is a bad idea, and the fact that it is lower than your non-LCL
server is why you have the anomaly here.
> while the other one, the candidate (NTP server B, stratum 2) is having
> NTP server A as ref.id, meaning it syncs to NTP server A.
>
> Again, when having NTP server D to primarily sync with NTP server C,
> the "root dispersion" apparently gets too high, while having the NTP
> server D to sync with NTP server B is fixing the problem.
>
> My question is why the root dispersion becomes too high upon syncing
> to an external server having its own local system clock as reference
> (i.e. NTP server C)?
Because C and B are not getting times traceable to the same source, and
there are no further servers X and Y, synchronised to the same source as
B, to outvote C.
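The outvoting idea can be sketched as a simple interval vote. This is a deliberately simplified stand-in for NTP's intersection algorithm, not what ntpd actually runs. The offsets for B and C come from the rv output on D; the interval half-widths are rough illustrative numbers, and X and Y are hypothetical servers that agree with B.

```python
def majority_clique(sources):
    """sources maps name -> (offset_ms, half_width_ms).

    Returns the sorted names of the largest group whose intervals share a
    common point, provided that group is a strict majority; else [].
    """
    events = []
    for name, (offset, half_width) in sources.items():
        events.append((offset - half_width, 0, name))  # interval start
        events.append((offset + half_width, 1, name))  # interval end
    events.sort()  # at equal positions, starts (0) sort before ends (1)

    active, best = set(), set()
    for _, kind, name in events:
        if kind == 0:
            active.add(name)
            if len(active) > len(best):
                best = set(active)
        else:
            active.discard(name)
    return sorted(best) if len(best) > len(sources) / 2 else []


# Offsets from "rv 62852" (B) and "rv 62853" (C); half-widths are assumed.
today = {"B": (1147.347, 35.0), "C": (-1128.193, 250.0)}
# X and Y are hypothetical extra servers traceable to the same source as B.
with_more = dict(today, X=(1150.0, 30.0), Y=(1140.0, 30.0))

print(majority_clique(today))
print(majority_clique(with_more))
```

With only B and C the intervals do not overlap, so neither side has a majority. Adding two servers that agree with B lets the three of them outvote C.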