[ntp:questions] NTP sync problems

martin.tengklint at spray.se martin.tengklint at spray.se
Wed Jul 2 09:06:26 UTC 2008


On Jul 2, 10:07 am, martin.tengkl... at spray.se wrote:
> On Jul 1, 10:57 pm, David Woolley <da... at ex.djwhome.demon.co.uk.invalid> wrote:
> > martin.tengkl... at spray.se wrote:
>
> > > The topology looks like this:
>
> > > Ext.NTP Server A
> > >          |
> > >          |
> > > Ext.NTP Server B      Ext.NTP Server C
> > >          |                     |
> > >          |---------------------|
> > >                     |
> > >                     |
> > >               NTP Server D
> > >                     |
> > >                     |
> > >               NTP Client E
>
> > > The problem is that my NTP client E rejected its selected NTP server
> > > D, which led to loss of sync and a drifting offset on NTP Client E.
> > > I think I have traced the lack of sync to an excessively large "root
> > > dispersion" value sent from NTP server D. Its value is 1991, as
> > > seen below:
>
> > > # ntpq -c"rv 51316"
> > > status=9014 reach, conf, 1 event, event_reach,
> > > srcadr=cliente, srcport=123, dstadr=169.254.5.34, dstport=123,
> > > leap=00, stratum=2, precision=-16, rootdelay=1.785,
> > > rootdispersion=1991.028, refid=10.112.1.14, reach=377, unreach=0,
>
> > Yup.  rootdispersion is high enough for rejection.
>
> > > hmode=3, pmode=4, hpoll=6, ppoll=6, flash=00 ok, keyid=0,
> > > offset=3466396.411, delay=0.567, dispersion=0.956, jitter=37.305,
> > > reftime=cc0328d1.feabf9bf  Wed, Jun 18 2008  9:25:21.994,
> > > org=cc0329cb.5b962c81  Wed, Jun 18 2008  9:29:31.357,
> > > rec=cc031c40.f62d86e1  Wed, Jun 18 2008  8:31:44.961,
> > > xmt=cc031c40.f5f9b77c  Wed, Jun 18 2008  8:31:44.960,
> > > filtdelay=     0.57    0.53    0.57    0.52    0.56    0.68    0.52    1.11,
> > > filtoffset= 3466396 3466359 3466320 3466282 3466235 3466198 3466160 3466123,
>
> > This exceeds the panic threshold, so, unless this is the first time and
> > you have -g, NTP will abort if it accepts this offset.
>
> > > filtdisp=      0.03    0.98    1.95    2.93    3.92    4.86    5.81    6.77
>
> > > Looking at the ntpq -c "as" output on Client E, the server is in
> > > condition "reject", most likely due to the high root dispersion.
> > > Correct?
>
> > > # ntpq -c"as"
>
> > > ind assID status  conf reach auth condition  last_event cnt
> > > ===========================================================
> > >   1 51316  9014   yes   yes  none    reject   reachable  1
>
> > > The problem occurs when NTP server D syncs with an external NTP
> > > server C (stratum 1) that uses its own system clock as
> > > reference.
>
> > > On NTP Server D:
>
> > > # ntpq -c "as"
> > > ind assID status conf reach auth condition last_event cnt
> > > ===========================================================
> > > 1 62852 9414 yes yes none candidat reachable 1
> > > 2 62853 9614 yes yes none sys.peer reachable 1
>
> > > Upon looking in more detail at the two associations above:
>
> > >  # ntpq -c "rv 62853"
> > > status=9614 reach, conf, sel_sys.peer, 1 event, event_reach,
> > > srcadr=10.112.1.14, srcport=123, dstadr=10.112.2.90, dstport=123,
> > > leap=00, stratum=1, precision=-17, rootdelay=0.000,
> > > rootdispersion=10.284, refid=LCL, reach=377, unreach=0, hmode=3,
> > > pmode=4, hpoll=10, ppoll=10, flash=00 ok, keyid=0, offset=-1128.193,
> > > delay=1.226, dispersion=14.849, jitter=224.514,
> > > reftime=cc12fb96.0a522000 Mon, Jun 30 2008 9:28:38.040,
> > > org=cc12fbad.30179000 Mon, Jun 30 2008 9:29:01.187,
> > > rec=cc12fbae.5110fdd4 Mon, Jun 30 2008 9:29:02.316,
> > > xmt=cc12fbae.50bd8b10 Mon, Jun 30 2008 9:29:02.315,
> > > filtdelay= 1.23 1.40 1.68 1.50 1.19 1.28 1.10 1.27,
> > > filtoffset= -1128.1 -903.68 -1144.7 -1133.5 -814.17 -1125.2 -1125.2 -921.92,
> > > filtdisp= 0.04 15.38 30.73 46.10 61.46 76.82 92.21 107.59
>
> > > # ntpq -c "rv 62852"
> > > status=9414 reach, conf, sel_candidat, 1 event, event_reach,
> > > srcadr=10.112.1.13, srcport=123, dstadr=10.112.2.90, dstport=123,
> > > leap=00, stratum=2, precision=-17, rootdelay=6.454,
> > > rootdispersion=15.533, refid=10.109.1.164, reach=377, unreach=0,
> > > hmode=3, pmode=4, hpoll=10, ppoll=10, flash=00 ok, keyid=0,
> > > offset=1147.347, delay=1.298, dispersion=14.874, jitter=0.641,
> > > reftime=cc12f9fa.ed579000 Mon, Jun 30 2008 9:21:46.927,
> > > org=cc12fbd3.785bc000 Mon, Jun 30 2008 9:29:39.470,
> > > rec=cc12fbd2.52cdc1fb Mon, Jun 30 2008 9:29:38.323,
> > > xmt=cc12fbd2.52726f6f Mon, Jun 30 2008 9:29:38.322,
> > > filtdelay= 1.30 1.15 1.47 1.24 1.29 2.20 1.54 1.45,
> > > filtoffset= 1147.35 1147.99 1371.63 1132.04 1143.24 1460.54 1150.79 1150.61,
>
> > Note that the two servers differ by more than two seconds.  I'm not sure
> > why they aren't both rejected as false tickers (in systems with LCL
> > clocks, it is important to be able to outvote the local clock with
> > enough real clocks, and one is far too few to do that!).
>
> > I think rv 0 on D would be instructive, but it looks to me as though D
> > is either rejecting both C and B, or it is trying to jump between them
> > and the resulting huge jitter is causing the root dispersion to go
> > through the roof.  (Rather than jumping, it may be using one and
> > rejecting the other in its popcorn filter.)
>
> > > filtdisp= 0.04 15.41 30.79 46.18 61.57 76.91 92.26 107.63
>
> > > ...I can see that the one selected (NTP server C, i.e. AssId: 62853)
> > > has a ref.id of LCL (meaning it is syncing to its local system clock?)
>
> > LCL is local clock, which means that any reference clock it actually has
> > is broken.
>
> > Both are selected.  The one with the lowest stratum gets to donate its
> > stratum and quality data, but they are both survivors, and both will be
> > used to calculate the time.
>
> > I would consider a server claiming to sync to LCL and having stratum 1
> > to be badly misconfigured.  Undisciplined local clocks should always
> > have the highest stratum that just works, so that they are last choice
> > and don't propagate too far.  The default for LCL is maybe OK if the
> > machine is accurately synchronised by some non-NTP means and steps are
> > taken to disable NTP if that source fails.  Going lower than the default
> > really is a bad idea, and the fact that it is lower than your non-LCL
> > server is why you have the anomaly here.
>
> > > while the other one, the candidate (NTP server B, stratum 2), has
> > > NTP server A as ref.id, meaning it syncs to NTP server A.
>
> > > Again, when NTP server D syncs primarily with NTP server C, the
> > > "root dispersion" apparently gets too high, while having NTP
> > > server D sync with NTP server B fixes the problem.
>
> > > My question is why the root dispersion becomes too high upon syncing
> > > to an external server having its own local system clock as reference
> > > (i.e. NTP server C)?
>
> > Because C and B are not getting times traceable to the same source and
> > there isn't an X and Y synchronised to the same source as B, to outvote C.
>
> Ok, thanks for the quick reply!
>
> Just to clarify even more. Please correct me if I'm wrong:
>
> Because B and C are not getting their times traceable to the same
> source, NTP on D has difficulty choosing between these two time
> sources (as seen, B and C differ by more than 2 seconds). They are both
> survivors and both are used in the time calculation, because there is
> nothing to outvote C.
>
> The one with the lowest stratum (i.e. C) gets to donate its quality
> data, including a huge jitter, which sends the root dispersion
> through the roof. And a high root dispersion value makes NTP on E
> reject NTP on D.
>
> Correct?
>
> BR,
> Martin

Additional question: As seen in the logs, server B has quite low
jitter while server C has huge jitter.
Why is that? Is it because of a shaky local clock on server C, or is it
because server C lacks a reliable source?
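
To put the two associations side by side, here is a minimal Python sketch
that pulls the numeric variables out of the rv output quoted above
(abridged here). The helper name parse_rv is made up for illustration; it
is not part of ntpq, and the regular expression only covers the plain
numeric name=value pairs seen in this output.

```python
import re

# Abridged copies of the `rv` output quoted above (server D's view of C and B).
RV_C = """stratum=1, precision=-17, rootdelay=0.000,
rootdispersion=10.284, refid=LCL, reach=377, unreach=0, hmode=3,
pmode=4, hpoll=10, ppoll=10, flash=00 ok, keyid=0, offset=-1128.193,
delay=1.226, dispersion=14.849, jitter=224.514,"""

RV_B = """stratum=2, precision=-17, rootdelay=6.454,
rootdispersion=15.533, refid=10.109.1.164, reach=377, unreach=0,
hmode=3, pmode=4, hpoll=10, ppoll=10, flash=00 ok, keyid=0,
offset=1147.347, delay=1.298, dispersion=14.874, jitter=0.641,"""


def parse_rv(text):
    """Hypothetical helper: return the purely numeric name=value pairs as floats.

    Non-numeric values such as refid=LCL or refid=10.109.1.164 are skipped,
    because the lookahead requires the number to end at a comma, whitespace,
    or the end of the text.
    """
    pairs = re.findall(r"(\w+)=(-?\d+(?:\.\d+)?)(?=,|\s|$)", text)
    return {name: float(value) for name, value in pairs}


c = parse_rv(RV_C)
b = parse_rv(RV_B)

# Offsets are in milliseconds, as printed by ntpq.
disagreement_ms = b["offset"] - c["offset"]

print(f"C: offset {c['offset']} ms, jitter {c['jitter']} ms")
print(f"B: offset {b['offset']} ms, jitter {b['jitter']} ms")
print(f"B and C disagree by {disagreement_ms:.3f} ms")
```

With the values quoted above, B and C disagree by about 2275.5 ms, and
C's reported jitter is 224.514 ms against 0.641 ms for B.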

Thanks in advance!

BR,
Martin
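
As a rough check of the rejection on Client E, the sketch below adds up a
simplified root distance from the values in the rv output above. The
formula used here (half of rootdelay plus delay, plus rootdispersion plus
dispersion) leaves out the terms ntpd adds for jitter and for time elapsed
since the last update, and the 1000 ms threshold is an assumption about
the ntpd default, not something confirmed in this thread.

```python
# Assumed rejection threshold in milliseconds (not confirmed in this thread).
MAX_DISTANCE_MS = 1000.0


def root_distance_ms(rootdelay, rootdispersion, delay, dispersion):
    """Simplified root distance in ms; omits ntpd's jitter and ageing terms."""
    return (rootdelay + delay) / 2.0 + rootdispersion + dispersion


# Values copied from the rv output above, all in milliseconds.
E_VIEW_OF_D = dict(rootdelay=1.785, rootdispersion=1991.028,
                   delay=0.567, dispersion=0.956)
D_VIEW_OF_B = dict(rootdelay=6.454, rootdispersion=15.533,
                   delay=1.298, dispersion=14.874)

dist_d = root_distance_ms(**E_VIEW_OF_D)
dist_b = root_distance_ms(**D_VIEW_OF_B)

for label, dist in (("E's view of D", dist_d), ("D's view of B", dist_b)):
    verdict = "too far" if dist > MAX_DISTANCE_MS else "acceptable"
    print(f"{label}: {dist:.3f} ms ({verdict})")
```

Under these assumptions D, as seen from E, comes out at roughly 1993 ms,
well over the assumed limit, while B, as seen from D, stays near 34 ms.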



