[ntp:questions] How to measure the quality of NTP server

Brian Inglis Brian.Inglis at SystematicSw.ab.ca
Wed Feb 17 18:14:10 UTC 2016


On 2016-02-16 06:54, Natalie Abravanel wrote:
> Before synching to a server, I would like to define some criteria for a good NTP source.

Before doing that, you should decide on your timing accuracy requirements:
for microseconds (us), you need a local (GPS) reference clock; for ms, you need
a reference clock on your LAN; for tens of ms, you need sources nearby on your
WAN; for less stringent needs, you can use your country pool, CC.pool.ntp.org.

If you want reliability, you probably need 5 sources in different locations.
Most problems are caused by your or their ISPs' networks.

Run ntpd against possible sources and log peerstats.
Note that your associations output (below) shows only one source configured;
tc=10 is the loop time constant (log2 s), not a count of sources.

For each source, average the value of the second character (hex digit) of the
peerstats status field, which holds the select code, shown as the tally code
in the first character of each line in ntpq -p.
(Strictly, only the bottom three bits are the select code, but the
top bit will never be set if you do not use broadcast modes.)
Count each 5 (backup - tally #) as 3.5, as it should rank
between 4 (candidate - tally +) and 3 (outlier - tally -).

Those averages will give you a measure of usefulness of those peers
to your server.
Near 6 (sys.peer - tally *) means it was nearly always selected
as the most accurate timing source.
Near 4 (candidate - tally +) means it was nearly always
a candidate source whose time was included in the solution.
Near 3 or lower means it was almost never used.
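
The averaging described above could be sketched in Python roughly as follows.
This assumes the usual peerstats field order (day, second, address, status,
offset, delay, dispersion, jitter) and a four-hex-digit status; the function
name and the sample lines are made up for illustration only.

```python
from collections import defaultdict


def average_scores(lines):
    """Average the adjusted select code per peer from peerstats lines."""
    totals = defaultdict(lambda: [0.0, 0])  # peer -> [sum of scores, count]
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or len(fields[3]) < 2:
            continue  # skip blank or truncated lines
        peer, status = fields[2], fields[3]
        code = int(status[1], 16) & 0x7  # bottom three bits of 2nd hex digit
        score = 3.5 if code == 5 else float(code)  # backup ranks below candidate
        totals[peer][0] += score
        totals[peer][1] += 1
    return {peer: total / count for peer, (total, count) in totals.items()}


# Invented sample data, not taken from a real server.
SAMPLE = [
    "57434 56295.832 192.0.2.1 961a 0.000584 0.020000 0.003000 0.000472",
    "57434 57319.832 192.0.2.1 961a -0.000416 0.022000 0.003000 0.000528",
    "57434 58343.832 192.0.2.1 941a 0.000100 0.021000 0.003000 0.000500",
    "57434 56300.100 192.0.2.2 941a 0.012000 0.183660 0.063268 0.004000",
    "57434 57324.100 192.0.2.2 951a 0.016000 0.190000 0.060000 0.006000",
    "57434 58348.100 192.0.2.2 931a 0.014000 0.185000 0.061000 0.005000",
]

for peer, score in sorted(average_scores(SAMPLE).items()):
    print(f"{peer}: {score:.2f}")
```

In practice you would pass in the lines of your own peerstats files, e.g. an
open file object, instead of the sample list.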

You could also plot those (adjusted) status values by peer against time,
to see which are good candidates for your sources.
You could also convert those statuses to tally codes for plotting by peer
against time, to get a visual indication of what ntpq -p would have shown.
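
As a rough sketch of the tally conversion, the select code can index into the
tally characters ntpq -p uses (space, x, ., -, +, #, *, o for codes 0 to 7).
The helper names and the list of statuses here are hypothetical.

```python
# Tally characters indexed by select code 0..7, as shown by ntpq -p.
TALLY = " x.-+#*o"


def tally(status_hex):
    """Return the ntpq -p tally character for a peerstats status word."""
    return TALLY[int(status_hex[1], 16) & 0x7]


def tally_timeline(statuses):
    """Join the tally characters of successive statuses into one string."""
    return "".join(tally(status) for status in statuses)


# Invented sequence of statuses for one peer, oldest first.
print(tally_timeline(["941a", "961a", "961a", "951a", "931a"]))  # +**#-
```

Printing one such string per peer gives a crude text plot of how each source
was treated over time.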

You can then select the peerstats values for each source with the highest
averages, and plot those against time to see which are most consistent
and have the best stats.

Lowest delay and lowest offset with highest status and lowest variation
are your best bets: delay < 100 ms normally provides better time, as
the network path is shorter; offset and jitter should be in single
digit ms or lower.
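
One way to compare sources on those numbers is sketched below. It assumes
peerstats records offset, delay, and jitter in seconds in fields 5, 6, and 8,
treats "single digit ms" as under 10 ms, and uses invented sample lines; the
function and key names are only for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize(lines):
    """Per-peer mean |offset|, delay, and jitter in ms, plus offset spread."""
    rows = defaultdict(list)
    for line in lines:
        f = line.split()
        if len(f) < 8:
            continue  # skip blank or truncated lines
        # Convert seconds to milliseconds: (offset, delay, jitter).
        rows[f[2]].append((float(f[4]) * 1e3, float(f[5]) * 1e3, float(f[7]) * 1e3))
    summary = {}
    for peer, samples in rows.items():
        offsets = [s[0] for s in samples]
        abs_offset = mean(abs(o) for o in offsets)
        delay = mean(s[1] for s in samples)
        jitter = mean(s[2] for s in samples)
        summary[peer] = {
            "mean_abs_offset_ms": abs_offset,
            "offset_spread_ms": pstdev(offsets),  # variation of the offset
            "mean_delay_ms": delay,
            "mean_jitter_ms": jitter,
            # Rules of thumb from the text: delay < 100 ms, others < 10 ms.
            "within_limits": delay < 100 and abs_offset < 10 and jitter < 10,
        }
    return summary


# Invented sample data, not taken from a real server.
SAMPLE = [
    "57434 56295.832 192.0.2.1 961a 0.000584 0.020000 0.003000 0.000472",
    "57434 57319.832 192.0.2.1 961a -0.000416 0.022000 0.003000 0.000528",
    "57434 56300.100 192.0.2.2 941a 0.012000 0.183660 0.063268 0.004000",
    "57434 57324.100 192.0.2.2 931a 0.016000 0.190000 0.060000 0.006000",
]

# List peers from lowest to highest mean delay.
for peer, stats in sorted(summarize(SAMPLE).items(), key=lambda kv: kv[1]["mean_delay_ms"]):
    print(peer, {name: round(value, 3) for name, value in stats.items()})
```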

> If I query ntp server:
>
> ntpq> associations
>
> ind assid status  conf reach auth condition  last_event cnt
> ===========================================================
>    1 36430  961a   yes   yes  none  sys.peer    sys_peer  1
> ntpq> rv
> associd=0 status=0618 leap_none, sync_ntp, 1 event, no_sys_peer,
> version="ntpd 4.2.6p5 at 1.2349-o Mon Jan 25 14:08:27 UTC 2016 (1)",
> processor="x86_64", system="Linux/2.6.32-431.el6.x86_64", leap=00,
> stratum=3, precision=-24, rootdelay=183.660, rootdisp=63.268,
> refid=109.226.40.40,
> reftime=da6da105.8ad1ef23  Tue, Feb 16 2016 15:22:13.542,
> clock=da6da4db.d53c285b  Tue, Feb 16 2016 15:38:35.832, peer=36430,
> tc=10, mintc=3, offset=0.584, frequency=-17.995, sys_jitter=0.000,
> clk_jitter=0.472, clk_wander=0.106
> ntpq>

Read the online docs at docs.ntp.org for definitions of these system variables.

> *         What should be decent values for : precision, rootdelay, rootdisp, frequency, sys_jitter, clk_jitter, clk_wander?
> I don't want to start syncing with an inaccurate source (and I am limited to syncing with only one server, which, as I understand it, is usually less recommended)
> I am trying to find some benchmarks for a good quality ntp server.

The ntpq values are mainly times in ms (0.001 ms = 1 us) or log2 s:

precision is the latter, representing the overhead of reading your system
time, and should be -20 (2^-20 s ~ 1 us) or lower (-24 is about 1/16 us);

rootdelay is the sum of delays from the stratum 0 timing root to your
system and should be < 100 ms;

rootdisp is the sum of error bounds from the stratum 0 timing root to
your system and should be single digit ms or lower (your values are high,
but you are synced to a stratum 2 server which may be worse than your system:
perhaps a network router?);

sys_jitter is the error bound on offset and should be single digit ms or lower;

frequency is the natural drift of your system clock crystal in PPM (us/s) and
closer to zero is better, but you cannot do anything about that,
unless you want to run ntpd tests on several systems and pick the best;

clk_jitter and clk_wander are error bounds on your crystal offset and drift
and should be less than 1 ms and 1 PPM respectively.
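
To make the units concrete, here is a small sketch of the conversions: log2 s
to seconds for precision, ms to us, and PPM to seconds of drift per day if
the clock were left uncorrected. The helper names are made up.

```python
def log2_seconds(exponent):
    """Convert a log2 s value such as precision to seconds."""
    return 2.0 ** exponent


def ms_to_us(milliseconds):
    """Convert milliseconds to microseconds."""
    return milliseconds * 1000.0


def drift_seconds_per_day(ppm):
    """Seconds gained or lost per day at a given frequency error in PPM."""
    return ppm * 1e-6 * 86400.0


for exponent in (-20, -24):
    print(f"precision={exponent}: {log2_seconds(exponent) * 1e6:.4f} us")
print(f"offset=0.584 ms: about {ms_to_us(0.584):.0f} us")
print(f"frequency=-17.995 PPM: {drift_seconds_per_day(-17.995):.3f} s/day")
```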

You did not mention offset and that should always be less than 128 ms
on all systems in sync; less than 1 ms is good.
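
As an illustration only, the rules of thumb above could be checked against
pasted rv output along these lines. The thresholds are the ones given in this
message (with "single digit ms" taken as under 10 ms); the function names are
hypothetical, and the parser only picks up plain numeric name=value pairs.

```python
import re

# Each check returns True when the value is acceptable.
CHECKS = {
    "precision": lambda v: v <= -20,
    "rootdelay": lambda v: v < 100.0,
    "rootdisp": lambda v: v < 10.0,
    "sys_jitter": lambda v: v < 10.0,
    "clk_jitter": lambda v: v < 1.0,
    "clk_wander": lambda v: v < 1.0,
    "offset": lambda v: abs(v) < 128.0,
}


def parse_rv(text):
    """Extract name=value pairs whose value is a plain decimal number."""
    pairs = re.findall(r"(\w+)=(-?\d+(?:\.\d+)?)(?=[,\s]|$)", text)
    return {name: float(value) for name, value in pairs}


def failing(values):
    """Return the names of variables that miss their threshold."""
    return [name for name, ok in CHECKS.items()
            if name in values and not ok(values[name])]


# Values copied from the rv output quoted earlier in this message.
RV = """stratum=3, precision=-24, rootdelay=183.660, rootdisp=63.268,
tc=10, mintc=3, offset=0.584, frequency=-17.995, sys_jitter=0.000,
clk_jitter=0.472, clk_wander=0.106"""

print(failing(parse_rv(RV)))  # ['rootdelay', 'rootdisp']
```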
  
> *         The association rv=0, does it reflect the system clock? (I am a bit confused...) I know that other associations, if they exist, reflect the state of a specific refid

I think of rv 0 system variables as instantaneous loopstats, with some other
system stats, and rv associd peer variables as instantaneous peerstats for
remote sources.

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

