[ntp:questions] NTP sync problems

martin.tengklint at spray.se
Wed Jul 2 08:07:17 UTC 2008


On Jul 1, 10:57 pm, David Woolley
<da... at ex.djwhome.demon.co.uk.invalid> wrote:
> martin.tengkl... at spray.se wrote:
>
> > The topology looks like this:
>
> > Ext.NTP Server A
> >          |
> >          |
> > Ext.NTP Server B           Ext.NTP Server C
> >          |                                       |
> >          |---------------------------------------|
> >                               |
> >                               |
> >                      NTP Server D
> >                               |
> >                               |
> >                       NTP Client E
>
> > The problem is that my NTP client E rejected its selected NTP server
> > D, which led to loss of sync and to the offset drifting on NTP Client
> > E. I think I have traced the lack of sync to a too-large "root
> > dispersion" value sent by NTP server D. Its value is 1991 ms, as
> > seen below:
>
> > # ntpq -c"rv 51316"
> > status=9014 reach, conf, 1 event, event_reach,
> > srcadr=cliente, srcport=123, dstadr=169.254.5.34, dstport=123,
> > leap=00, stratum=2, precision=-16, rootdelay=1.785,
> > rootdispersion=1991.028, refid=10.112.1.14, reach=377, unreach=0,
>
> Yup.  rootdispersion is high enough for rejection.
>
> > hmode=3, pmode=4, hpoll=6, ppoll=6, flash=00 ok, keyid=0,
> > offset=3466396.411, delay=0.567, dispersion=0.956, jitter=37.305,
> > reftime=cc0328d1.feabf9bf  Wed, Jun 18 2008  9:25:21.994,
> > org=cc0329cb.5b962c81  Wed, Jun 18 2008  9:29:31.357,
> > rec=cc031c40.f62d86e1  Wed, Jun 18 2008  8:31:44.961,
> > xmt=cc031c40.f5f9b77c  Wed, Jun 18 2008  8:31:44.960,
> > filtdelay=     0.57    0.53    0.57    0.52    0.56    0.68    0.52    1.11,
> > filtoffset= 3466396 3466359 3466320 3466282 3466235 3466198 3466160 3466123,
>
> This exceeds the panic threshold, so, unless this is the first time and you
> have -g, NTP will abort if it accepts this offset.
>
> > filtdisp=      0.03    0.98    1.95    2.93    3.92    4.86    5.81    6.77
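The panic check Woolley mentions can be sketched as below. This is a minimal illustration, not ntpd's code: it assumes ntpd's documented default thresholds (128 ms for stepping, 1000 s for panic) and that -g lifts the panic check only for the first clock update. It ignores ntpd's stepout interval, and the helper name `classify_offset` is hypothetical. Offsets are in milliseconds, as ntpq reports them.

```python
# Hedged sketch of ntpd-style offset handling under the documented defaults:
# small offsets are slewed, larger ones are stepped, and anything beyond the
# panic threshold makes ntpd exit unless -g applies to the first update.
STEP_THRESHOLD_MS = 128.0           # default step threshold (128 ms)
PANIC_THRESHOLD_MS = 1_000_000.0    # default panic threshold (1000 s)


def classify_offset(offset_ms: float, first_update: bool, g_flag: bool) -> str:
    """Return 'slew', 'step' or 'panic' for a measured offset (hypothetical helper)."""
    magnitude = abs(offset_ms)
    if magnitude > PANIC_THRESHOLD_MS:
        # -g only allows an arbitrarily large correction once, at startup.
        if first_update and g_flag:
            return "step"
        return "panic"
    if magnitude > STEP_THRESHOLD_MS:
        return "step"   # simplification: ntpd also waits out a stepout interval
    return "slew"


if __name__ == "__main__":
    offset = 3466396.411  # offset of D as seen from E, in ms (about 58 minutes)
    print(classify_offset(offset, first_update=False, g_flag=True))  # panic
    print(classify_offset(offset, first_update=True, g_flag=True))   # step
    print(classify_offset(-12.5, first_update=False, g_flag=False))  # slew
```

An offset of roughly 3466 s is well past the 1000 s limit, so only a first update with -g would let it through.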
>
> > Looking at the output of ntpq -c "as" on Client E, the server is in
> > condition reject, most likely due to the high root dispersion.
> > Correct?
>
> > # ntpq -c"as"
>
> > ind assID status  conf reach auth condition  last_event cnt
> > ===========================================================
> >   1 51316  9014   yes   yes  none    reject   reachable  1
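Whether E rejects D can be approximated with the root distance (synchronization distance), which NTP compares against a maximum-distance threshold. The sketch below follows the root-distance formula of RFC 5905 but drops the term for dispersion growth since the last update. It assumes RFC 5905's MINDISP of 10 ms and MAXDIST of 1 s; the threshold an actual ntpd applies depends on its version and configuration. Inputs are the values for D from the `rv 51316` output above, in ms; the function names are hypothetical.

```python
# Hedged sketch: RFC 5905 root distance without the PHI * elapsed-time term.
MINDISP_MS = 10.0     # RFC 5905 MINDISP (0.01 s)
MAX_DIST_MS = 1000.0  # RFC 5905 MAXDIST (1 s); ntpd versions may differ


def root_distance_ms(rootdelay, rootdisp, delay, disp, jitter):
    """Approximate root distance (lambda) of a peer; all values in ms."""
    return (max(MINDISP_MS, rootdelay + delay) / 2.0
            + rootdisp + disp + jitter)


def is_rejected(distance_ms: float) -> bool:
    """A peer whose root distance exceeds the threshold is unfit for selection."""
    return distance_ms > MAX_DIST_MS


if __name__ == "__main__":
    # Values for server D taken from "ntpq -c 'rv 51316'" on client E.
    lam = root_distance_ms(rootdelay=1.785, rootdisp=1991.028,
                           delay=0.567, disp=0.956, jitter=37.305)
    print(f"root distance = {lam:.1f} ms, rejected = {is_rejected(lam)}")
    # prints: root distance = 2034.3 ms, rejected = True
```

The root dispersion term alone (about 1991 ms) is already above the 1 s threshold, which is consistent with the "reject" condition shown by `ntpq -c as`.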
>
> > The problem occurs when NTP server D syncs with an external NTP
> > server C (stratum 1) that uses its own system clock as its
> > reference.
>
> > On NTP Server D:
>
> > # ntpq -c "as"
> > ind assID status conf reach auth condition last_event cnt
> > ===========================================================
> > 1 62852 9414 yes yes none candidat reachable 1
> > 2 62853 9614 yes yes none sys.peer reachable 1
>
> > Looking in more detail at the two associations above:
>
> >  # ntpq -c "rv 62853"
> > status=9614 reach, conf, sel_sys.peer, 1 event, event_reach,
> > srcadr=10.112.1.14, srcport=123, dstadr=10.112.2.90, dstport=123,
> > leap=00, stratum=1, precision=-17, rootdelay=0.000,
> > rootdispersion=10.284, refid=LCL, reach=377, unreach=0, hmode=3,
> > pmode=4, hpoll=10, ppoll=10, flash=00 ok, keyid=0, offset=-1128.193,
> > delay=1.226, dispersion=14.849, jitter=224.514,
> > reftime=cc12fb96.0a522000 Mon, Jun 30 2008 9:28:38.040,
> > org=cc12fbad.30179000 Mon, Jun 30 2008 9:29:01.187,
> > rec=cc12fbae.5110fdd4 Mon, Jun 30 2008 9:29:02.316,
> > xmt=cc12fbae.50bd8b10 Mon, Jun 30 2008 9:29:02.315,
> > filtdelay= 1.23 1.40 1.68 1.50 1.19 1.28 1.10 1.27,
> > filtoffset= -1128.1 -903.68 -1144.7 -1133.5 -814.17 -1125.2 -1125.2 -921.92,
> > filtdisp= 0.04 15.38 30.73 46.10 61.46 76.82 92.21 107.59
>
> > # ntpq -c "rv 62852"
> > status=9414 reach, conf, sel_candidat, 1 event, event_reach,
> > srcadr=10.112.1.13, srcport=123, dstadr=10.112.2.90, dstport=123,
> > leap=00, stratum=2, precision=-17, rootdelay=6.454,
> > rootdispersion=15.533, refid=10.109.1.164, reach=377, unreach=0,
> > hmode=3, pmode=4, hpoll=10, ppoll=10, flash=00 ok, keyid=0,
> > offset=1147.347, delay=1.298, dispersion=14.874, jitter=0.641,
> > reftime=cc12f9fa.ed579000 Mon, Jun 30 2008 9:21:46.927,
> > org=cc12fbd3.785bc000 Mon, Jun 30 2008 9:29:39.470,
> > rec=cc12fbd2.52cdc1fb Mon, Jun 30 2008 9:29:38.323,
> > xmt=cc12fbd2.52726f6f Mon, Jun 30 2008 9:29:38.322,
> > filtdelay= 1.30 1.15 1.47 1.24 1.29 2.20 1.54 1.45,
> > filtoffset= 1147.35 1147.99 1371.63 1132.04 1143.24 1460.54 1150.79 1150.61,
>
> Note that the two servers differ by more than two seconds.  I'm not sure
> why they aren't both rejected as falsetickers.  (In systems with LCL
> clocks, it is important to be able to outvote the local clock with
> enough real clocks, and one is far too few to do that!)
>
> I think rv 0 on D would be instructive, but it looks to me as though D
> is either rejecting both C and B, or it is trying to jump between them
> and the resulting huge jitter is causing the root dispersion to go
> through the roof.  (Rather than jumping, it may be using one and
> rejecting the other in its popcorn filter.)
>
> > filtdisp= 0.04 15.41 30.79 46.18 61.57 76.91 92.26 107.63
>
> > ...I can see that the one selected (NTP server C, i.e. AssId: 62853)
> > has a ref.id of LCL (meaning it is syncing to its local system clock?)
>
> LCL is local clock, which means that any reference clock it actually has
> is broken.
>
> Both are selected.  The one with the lowest stratum gets to donate its
> stratum and quality data, but they are both survivors, and both will be
> used to calculate the time.
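To illustrate how both survivors contribute, here is a sketch modelled on the clock-select and clock-combine steps described in RFC 5905, not on ntpd 4.2's exact code. Survivors are ranked by the merit factor stratum × MAXDIST + λ, so the best-ranked one (the system peer) supplies the stratum and root values, while the combined offset weights every survivor by 1/λ. The root distances λ use the simplified formula (no elapsed-time term) with the `rv 62853` (C) and `rv 62852` (B) values above; the `Survivor` type and function names are hypothetical.

```python
from dataclasses import dataclass

MINDISP_MS = 10.0    # RFC 5905 MINDISP (0.01 s), in ms
MAXDIST_MS = 1000.0  # RFC 5905 MAXDIST (1 s), in ms


@dataclass
class Survivor:
    """Hypothetical container for one association's ntpq values (ms)."""
    name: str
    stratum: int
    offset: float
    rootdelay: float
    rootdisp: float
    delay: float
    disp: float
    jitter: float

    def root_distance(self) -> float:
        # Simplified RFC 5905 root distance (elapsed-time term omitted).
        return (max(MINDISP_MS, self.rootdelay + self.delay) / 2.0
                + self.rootdisp + self.disp + self.jitter)


def system_peer(survivors):
    """Pick the survivor with the lowest merit factor stratum * MAXDIST + lambda."""
    return min(survivors, key=lambda s: s.stratum * MAXDIST_MS + s.root_distance())


def combined_offset(survivors) -> float:
    """Average the survivors' offsets, weighting each by 1 / root distance."""
    weights = [1.0 / s.root_distance() for s in survivors]
    return sum(w * s.offset for w, s in zip(weights, survivors)) / sum(weights)


if __name__ == "__main__":
    c = Survivor("C", 1, -1128.193, 0.000, 10.284, 1.226, 14.849, 224.514)
    b = Survivor("B", 2, 1147.347, 6.454, 15.533, 1.298, 14.874, 0.641)
    print("system peer:", system_peer([c, b]).name)
    print(f"combined offset: {combined_offset([c, b]):.1f} ms")
```

In this simplified model the stratum term dominates the ranking, so C becomes the system peer even though B has the much smaller root distance and therefore pulls the weighted offset towards its own value.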
>
> I would consider a server claiming to sync to LCL and having stratum 1
> to be badly misconfigured.  Undisciplined local clocks should always
> have the highest stratum that just works, so that they are last choice
> and don't propagate too far.  The default for LCL is maybe OK if the
> machine is accurately synchronised by some non-NTP means and steps are
> taken to disable NTP if that source fails.  Going lower than the default
> really is a bad idea, and the fact that it is lower than your non-LCL
> server is why you have the anomaly here.
>
> > while the other one, the candidate (NTP server B, stratum 2), has
> > NTP server A as its refid, meaning it syncs to NTP server A.
>
> > Again, when NTP server D primarily syncs with NTP server C, the
> > "root dispersion" apparently gets too high, whereas having NTP
> > server D sync with NTP server B fixes the problem.
>
> > My question is: why does the root dispersion become too high when
> > syncing to an external server that uses its own local system clock
> > as reference (i.e. NTP server C)?
>
> Because C and B are not getting times traceable to the same source and
> there isn't an X and Y synchronised to the same source as B, to outvote C.
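The outvoting argument can be illustrated with a simplified version of the interval-intersection (Marzullo-style) step that NTP uses to separate truechimers from falsetickers: each source contributes the interval offset ± λ, and a consensus exists only if more than half of the intervals share a common point. This is a sketch, not ntpd's algorithm, which is more elaborate. The offsets and approximate root distances for C and B come from the output above; X and Y are hypothetical servers synchronised to the same source as B, with made-up values chosen only for illustration.

```python
def max_overlap(intervals):
    """Return (count, point): the most intervals sharing one point, and that point."""
    events = []
    for low, high in intervals:
        events.append((low, 0))   # 0 = opens; sorts before a close at the same value
        events.append((high, 1))  # 1 = closes
    events.sort()
    best, best_point, current = 0, None, 0
    for value, kind in events:
        if kind == 0:
            current += 1
            if current > best:
                best, best_point = current, value
        else:
            current -= 1
    return best, best_point


def falsetickers(sources):
    """sources maps name -> (offset_ms, root_distance_ms).

    Returns the sorted names outside the majority, or None if no majority exists.
    """
    intervals = {n: (o - d, o + d) for n, (o, d) in sources.items()}
    count, point = max_overlap(list(intervals.values()))
    if count <= len(sources) / 2:
        return None  # no majority: truechimers cannot be told from falsetickers
    return sorted(n for n, (lo, hi) in intervals.items() if not lo <= point <= hi)


if __name__ == "__main__":
    two = {"C": (-1128.193, 254.647), "B": (1147.347, 36.048)}
    print(falsetickers(two))  # None: C and B are disjoint, nobody has a majority
    # Hypothetical X and Y agreeing with B let the majority outvote C.
    four = dict(two, X=(1150.0, 40.0), Y=(1145.0, 50.0))
    print(falsetickers(four))  # ['C']
```

With only C and B there is no majority, so neither can be identified as the falseticker; adding two sources that agree with B is enough to expose C.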

Ok, thanks for the quick reply!

Just to clarify further, please correct me if I'm wrong:

Because B and C are not getting their time from the same traceable
source, NTP on D has difficulty choosing between these two time
sources (as seen, B and C differ by more than 2 seconds). They are both
survivors and both are used in the time calculation, because there are
no other sources to outvote C.

The one with the lowest stratum (i.e. C) gets to donate its quality
data, including a huge jitter, which sends the root dispersion
through the roof. A high root dispersion value then causes NTP on E
to reject NTP on D.

Correct?
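The jitter part of this summary can be put in rough numbers. The sketch assumes, purely for illustration, that D advertises a root dispersion equal to its system peer's root dispersion plus that peer's dispersion and jitter. ntpd's actual update includes further terms, so this model does not reproduce the 1991 ms that E saw; it only compares how much C's jitter inflates the figure relative to B. The function name is hypothetical and the inputs are the ms values from the `rv 62853` (C) and `rv 62852` (B) output.

```python
def advertised_rootdisp_ms(peer_rootdisp: float, peer_disp: float, peer_jitter: float) -> float:
    """Hypothetical, simplified model of the root dispersion a server passes on (ms)."""
    return peer_rootdisp + peer_disp + peer_jitter


if __name__ == "__main__":
    via_c = advertised_rootdisp_ms(10.284, 14.849, 224.514)  # C as system peer
    via_b = advertised_rootdisp_ms(15.533, 14.874, 0.641)    # B as system peer
    print(f"via C: {via_c:.1f} ms, via B: {via_b:.1f} ms")
```

Even in this reduced form, C's 224 ms jitter makes the advertised figure roughly eight times larger than with B as the system peer.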

BR,
Martin



