[ntp:questions] Voting of the remote peer

Dave Hart davehart at gmail.com
Sat Nov 12 23:52:54 UTC 2011


On Sat, Nov 12, 2011 at 21:11, A C <agcarver+ntp at acarver.net> wrote:
> Still working out the issues with the crashing but this is a new question so
> a new thread.

It's actually hanging in C runtime code.  Crashing, to me, means
abnormal termination; this is hanging due to infinite looping
(consuming much CPU).

> Here is the peer list from ntpq.  My question is, what has determined that
> the remote peer 130.207.165.28 is the best one to sync against even though
> it has the worst offset and jitter.  It consistently stays this way no
> matter how long ntpd is running.

[DH: condensed peers billboard to avoid breaking rows into two lines]

    remote       st t when poll reach  delay   offset  jitter
==============================================================
-69.36.227.90    2 u  404  512  377   35.574   -0.367   0.449
-169.229.70.95   2 u  463  512  377   39.506   -1.586   0.376
+208.87.221.228  2 u  401  512  377   37.755    2.173   0.249
*130.207.165.28  2 u  414  512  317   82.198    4.112   0.637
-131.144.4.10    2 u  363  512  377   84.720    1.632   0.589
o127.127.22.1    0 l   11   16  377    0.000    0.006   0.061
+127.127.28.0    0 l   42  128  377    0.000    4.229  10.549
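The reach column in the billboard above is an 8-bit shift register printed in octal. ntpd shifts it left by one bit at each poll and sets the low bit when a reply arrives. The Python sketch below decodes it; the function names are hypothetical, written only for this illustration. It shows that the 317 reported for the system peer 130.207.165.28 means two of its last eight polls went unanswered, while 377 means all eight were answered.

```python
def decode_reach(reach_octal: str) -> str:
    """Return the last eight poll results as bits, oldest poll first.

    ntpq prints reach in octal. ntpd shifts the register left at each poll
    and sets bit 0 when a reply arrives, so the rightmost bit is the newest.
    """
    value = int(reach_octal, 8) & 0xFF  # keep only the 8-bit register
    return format(value, "08b")


def polls_answered(reach_octal: str) -> int:
    """Count how many of the last eight polls received a reply."""
    return decode_reach(reach_octal).count("1")
```

Reading decode_reach("317") = "11001111" oldest-first, the gaps were the fifth and sixth most recent polls. The latest four all succeeded, so polls_answered("317") is 6.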

ntpd's method of determining the system offset (which drives the clock
discipline) is not easy to summarize in a few words.  The best
explanation can be found in Dr. Mills' book, Network Time
Synchronization: the Network Time Protocol on Earth and in Space,
Second Edition. [1]  The price [2] is nothing to sneeze at, but then
technical books don't sell in volume like Harry Potter.  Many of the
details so beautifully rendered in the book are also available in the
NTPv4 protocol specification, a much uglier plain-text document
published as RFC 5905. [3]  Since finishing the book's 2nd edition,
Dr. Mills has spent quite a bit of time enhancing the distribution
HTML documentation with more details previously found only in the
books and RFCs.  The "How NTP Works" page [4], not present in the
4.2.6 documentation, is a good starting point.
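As a small taste of what those documents cover, RFC 5905 describes a combine step that produces the system offset. It averages the offsets of the sources surviving selection and clustering, weighting each by the reciprocal of its root distance. The sketch below shows only that weighting. The Survivor type and combine_offset() are hypothetical names, the survivors are assumed to be already known, and the root distances are made up because the billboard does not display them. Everything else ntpd does (selection, clustering, jitter estimation) is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Survivor:
    """A source that survived selection and clustering (illustrative type)."""
    name: str
    offset_ms: float         # peer offset, as in ntpq's "offset" column
    root_distance_ms: float  # hypothetical value; the billboard does not show it


def combine_offset(survivors: list[Survivor]) -> float:
    """Return the mean survivor offset weighted by 1 / root distance."""
    if not survivors:
        raise ValueError("need at least one survivor")
    weights = [1.0 / s.root_distance_ms for s in survivors]
    weighted_sum = sum(w * s.offset_ms for w, s in zip(weights, survivors))
    return weighted_sum / sum(weights)
```

For example, combine_offset([Survivor("peer-a", 4.0, 40.0), Survivor("peer-b", 1.0, 20.0)]) gives approximately 2.0 ms. peer-b's root distance is half of peer-a's, so it carries twice the weight.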

The software has continued to evolve since 4.2.6, which I know you are
using.  One of my favorite changes since 4.2.6 is that higher stratum
no longer causes a clustering bias against the source, which makes
sense as the rest of the algorithms already do a fine job of weighing
the relative quality of sources based on error budgets and mutual
agreement.  Specifically, the lambda_p clustering metric referenced in
RFC 5905 consists of lambda (root distance of the peer) plus peer
stratum times MAXDIST (~1 second).  In my experience, this was a
practical annoyance: lower-stratum servers reached over paths with
higher delay and jitter, and thus higher root distance, tended to be
used in preference to higher-stratum sources with lower root distance.
To avoid the bias, I tended to configure remote sources all of the
same stratum.
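To make the bias concrete, the Python sketch below ranks two hypothetical sources twice. The first ranking uses the lambda_p metric as described above: root distance plus stratum times MAXDIST, with MAXDIST taken as 1 second. The second uses root distance alone, where a lower metric ranks first. metric_no_stratum() is only an assumed stand-in for the post-4.2.6 behavior, not ntpd's actual code. The source names and numbers are made up.

```python
from __future__ import annotations

MAXDIST = 1.0  # seconds; the "~1 second" selection threshold from RFC 5905

# (name, root distance in seconds, stratum); all values are hypothetical
SOURCES = [
    ("far-stratum-1", 0.150, 1),   # lower stratum, long and jittery path
    ("near-stratum-2", 0.030, 2),  # higher stratum, short and quiet path
]


def metric_rfc5905(root_distance: float, stratum: int) -> float:
    """lambda_p as described in the text: root distance + stratum * MAXDIST."""
    return root_distance + stratum * MAXDIST


def metric_no_stratum(root_distance: float, stratum: int) -> float:
    """Assumed later behavior: stratum adds no bias, only root distance counts."""
    return root_distance


def rank(sources, metric) -> list[str]:
    """Return source names ordered best (lowest metric) first."""
    ordered = sorted(sources, key=lambda src: metric(src[1], src[2]))
    return [name for name, _, _ in ordered]
```

With the stratum term, the distant stratum-1 source ranks first (1.150 vs. 2.030). Without it, the nearby stratum-2 source ranks first (0.030 vs. 0.150).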

This clustering algorithm refinement also helps with another of my
favorite improvements in 4.2.7, the re-implemented "pool" directive
for automatic server discovery.  In 4.2.6 this simply spun up a few
normal server associations based on a DNS round-robin such as provided
by the NTP pool project. [5]  In 4.2.7, pool acts very much like
manycastclient: it automatically spins up preemptible associations
(using the DNS round robin rather than manycastclient's multicast to
discover servers) until a desired number of sources (tos maxclock) is
reached.  Later, it discards (preempts) sources that have not been
contributing to the time solution over a number of poll cycles, so
that more (and hopefully different) sources are spun up to replace
them.  As a result, one can get from the NTP pool (or any other DNS
name that resolves to multiple distinct NTP servers' IP addresses) the
same clever automatic selection and refinement of sources that was
previously, in most cases, effectively available only on a single
network, as multicast IP is typically not transported on the Internet
and often not transported across subnets within a single site.
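The spin-up and preemption cycle described above can be modeled in a few lines. This is a toy Python sketch, not ntpd's implementation. The PoolSketch class, the max_idle_polls threshold, and the fixed queue of DNS answers are all hypothetical simplifications, and maxclock merely plays the role of tos maxclock.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class PoolSketch:
    """Toy model of pool-style preemptible associations (hypothetical)."""
    maxclock: int            # target number of sources, playing tos maxclock
    max_idle_polls: int      # hypothetical preemption threshold, in poll cycles
    dns_answers: deque[str]  # addresses from the DNS round robin, not yet tried
    idle: dict[str, int] = field(default_factory=dict)  # address -> idle polls

    def fill(self) -> None:
        """Spin up associations until maxclock is reached or DNS runs dry."""
        while len(self.idle) < self.maxclock and self.dns_answers:
            addr = self.dns_answers.popleft()
            if addr not in self.idle:  # skip duplicates already in use
                self.idle[addr] = 0

    def poll_cycle(self, contributing: set[str]) -> list[str]:
        """Age idle associations, preempt stale ones, then refill.

        Returns the addresses preempted during this cycle.
        """
        preempted = []
        for addr in list(self.idle):
            if addr in contributing:
                self.idle[addr] = 0  # used in the time solution: reset
            else:
                self.idle[addr] += 1
                if self.idle[addr] >= self.max_idle_polls:
                    del self.idle[addr]
                    preempted.append(addr)
        self.fill()
        return preempted
```

Consider maxclock=3 and max_idle_polls=2, with the documentation addresses 192.0.2.1 through 192.0.2.4 in the queue. fill() starts the first three. If only .1 and .2 contribute for two consecutive cycles, .3 is preempted on the second cycle and .4 takes its place.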

You should now have enough reading material to keep you busy while you
track down the dtoa() hangs on that SPARCstation IPX.

[1] http://www.eecis.udel.edu/~mills/book.html
[2] http://www.amazon.com/dp/1439814635
[3] http://tools.ietf.org/html/rfc5905
[4] http://www.eecis.udel.edu/~mills/ntp/html/warp.html
[5] http://www.pool.ntp.org/

Cheers,
Dave Hart

