[ntp:questions] making sense of stats offset values [or trying to...]
david at ex.djwhome.demon.co.uk.invalid
Tue Apr 28 06:57:26 UTC 2009
Bruce Lilly wrote:
> Running ntp
> version="ntpd 4.2.4p5 at 1.1541-o Mon Jan 19 15:18:44 UTC 2009 (1)"
> as reported by ntpq, on opensuse 11.1 Linux, if that matters.
> I'm trying to make sense of the time offset numbers reported in
> loopstats and peerstats files and by ntptrace.
> The documentation is unclear on a few points, and ntptrace appears to
> be broken:
> 1. peerstats:
> the sign is unspecified in the documentation, but has been described
> here as such that adding
For statistical purposes, the sign shouldn't matter. The only time it
might matter is if you were trying to retrospectively correct the time
> the offset to the local clock should give time equivalent to the
> remote peer; i.e. a positive offset
> means that the local clock is early compared to the remote clock,
> and a negative offset means
In normal operation, the offset should tell you more about random
measurement errors than about the peer. There are strong arguments
that this is often not the case in real life, but that represents a
failure of ntpd to work well in the real world. If a clock discipline
can be sure that a certain proportion of the offset represents a real
local clock error, it should be attempting to remove that offset
promptly. ntpd's position is that it is more likely to represent
> that the local clock is late.
> Is that correct? If so, a clarification to the description in the
> "monopt" documentation might be
> helpful to others.
> 2. loopstats:
> the distributed documentation is totally unclear. I have found a
> Sun Microsystems document that
> describes the offset as "how much time (in seconds) the clock will
> be adjusted by in the loop cycle".
> a. Awkward wording notwithstanding, is that correct?
Definitely not correct. It is the input to a combining and weighting
process, which in turn feeds a low-pass filter whose time constant is
much longer than the poll interval, so only a small part of any
particular offset measurement gets applied in any one interval.
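
To illustrate the point about the time constant, here is a minimal sketch
of a first-order low-pass correction. It is not ntpd's actual loop filter;
the function name, the 64 s poll interval, the 1024 s time constant and the
10 ms starting error are all assumptions chosen for illustration.

```python
def simulate_discipline(initial_error, poll_interval, time_constant, polls):
    """Apply a fixed fraction of the remaining error at each poll.

    Hypothetical model: gain = poll_interval / time_constant, so a time
    constant much longer than the poll interval gives a small gain.
    Returns a list of (correction_applied, remaining_error) per poll.
    """
    gain = poll_interval / time_constant
    error = initial_error
    history = []
    for _ in range(polls):
        correction = gain * error  # only a small part of the offset
        error -= correction
        history.append((correction, error))
    return history


# Assumed values: 10 ms initial error, 64 s polls, 1024 s time constant.
history = simulate_discipline(0.010, 64, 1024, 3)
for correction, remaining in history:
    print(f"applied {correction * 1e3:.4f} ms, remaining {remaining * 1e3:.4f} ms")
```

With these assumed numbers the gain is 1/16, so the first poll removes only
about 0.6 ms of the 10 ms error.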
> b. is the adjustment intended to remove the entire offset between
> the local clock and the best-guess
No. As above.
> estimate of UTC, i.e. can the loopstats offset field be
> interpreted as the offset between the local
> clock and the best-guess estimate of UTC? Or something else?
Yes, but with the qualification that the best guess has an error band
which is comparable with the offset.
> c. what about the sign in this case?
> 3. ntptrace output:
> The man page (oddly enough, with version in the lower left as
> 4.1.1b-r5) gives an example:
I believe ntptrace is unsupported.
> % ntptrace localhost: stratum 4, offset 0.0019529, synch distance
> server2ozo.com: stratum 2, offset 0.0124263, synch distance
> usndh.edu: stratum 1, offset 0.0019298, synch distance 0.011993,
> [let's ignore the missing stratum 3 and the disappearing refid
> and text
> On each line, the fields are (left to right): the host name,
> the host stratum, the time offset between
> that host and the local host (as measured by ntptrace ; this is
> why it is not always zero for "localhost ")...
It is (simplifying slightly) the time offset between the local clock
when the response is received and the local clock on the server, the
actual return propagation time ago, plus half the round trip time.
Either time may have large reading errors, due to clock resolution, e.g.
W32time has a reading error that can exceed 10 ms.
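
The description above corresponds to the standard NTP on-wire calculation
from four timestamps. A minimal sketch follows; the function name and the
example numbers are assumptions, and whether a particular ntptrace version
performs this measurement itself is not confirmed here.

```python
def offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation.

    t1: request sent (client clock)    t2: request received (server clock)
    t3: reply sent (server clock)      t4: reply received (client clock)
    A positive offset means the server clock is ahead of the client clock.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Assumed scenario: server 5 ms ahead, 10 ms each way, 1 ms server processing.
offset, delay = offset_and_delay(t1=0.000, t2=0.015, t3=0.016, t4=0.021)
print(f"offset {offset * 1e3:.3f} ms, round-trip delay {delay * 1e3:.3f} ms")
```

Any reading error in one of the four timestamps goes straight into the
result, which is why poor clock resolution matters so much.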
> This is completely baffling:
> a. what does it mean for the local host to have a time offset from
It means that it takes a finite time for IP messages to propagate from
one process to another through the networking layer, and for process
scheduling to switch between processes. On a machine with poor clock
resolution, there could be a large measurement difference for a small
> b. are the offset values cached or determined from cached data [if
> I run ntptrace twice a couple of
> seconds apart, I get offset values identical from one run to
> the next down to the last reported digit,
> while the synchronization distances vary significantly]?
> c. is it intended that the offset reported by ntptrace bear no
> resemblance to that reported by ntpq -p
> and in peerstats?:
They would generally be larger, because ntpq offsets represent the
lowest delay values from the last eight polls, spread over from several
minutes to over an hour, whereas ntptrace represents a one off,
ntptrace is also probably running at normal priority and without any
memory locked into physical memory.
ntptrace offsets are in seconds, whereas ntpq offsets are in milliseconds.
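
The "lowest delay from the last eight polls" selection can be sketched
roughly as below. This is a simplified illustration, not ntpd's clock filter
code; the class name, method names and sample values are assumptions.

```python
from collections import deque


class MinDelayFilter:
    """Keep the last `size` (offset, delay) samples; report the offset
    of the sample that had the lowest round-trip delay."""

    def __init__(self, size=8):
        self.samples = deque(maxlen=size)  # oldest sample drops off

    def add(self, offset, delay):
        self.samples.append((offset, delay))

    def best_offset(self):
        # The lowest-delay sample is assumed to be the least disturbed
        # by queueing, so its offset is the one reported.
        return min(self.samples, key=lambda s: s[1])[0]


# Hypothetical samples in seconds: (offset, delay)
f = MinDelayFilter()
for off, dly in [(0.0021, 0.0290), (0.0003, 0.0029), (0.0015, 0.0120)]:
    f.add(off, dly)
print(f.best_offset())
```

A single one-off measurement has no such selection step, so it can land
anywhere among the samples the filter would have rejected.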
> # ntpq -p
> remote refid st t when poll reach delay
> offset jitter
> *megatron.blilly 22.214.171.124 2 u 27 64 377
> 2.927 0.296 0.122
> # ntptrace
> megatron.blilly.net: stratum 2, offset 0.002120, synch
> distance 0.024161
> Note that ntpq reports an offset of 0.296 milliseconds from the
> local host to its system peer, while
> ntptrace reports an order of magnitude larger offset!
2.120 milli-seconds is not an order of magnitude different. I think you
are expecting milliseconds to have three zeros after the decimal
point; they don't!
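
The arithmetic, using the two figures quoted above (variable names are
just for illustration):

```python
import math

ntptrace_offset_s = 0.002120  # ntptrace reports seconds
ntpq_offset_ms = 0.296        # ntpq -p reports milliseconds

# Convert to a common unit before comparing.
ntptrace_offset_ms = ntptrace_offset_s * 1000.0
ratio = ntptrace_offset_ms / ntpq_offset_ms
orders = math.log10(ratio)

print(f"ntptrace: {ntptrace_offset_ms:.3f} ms, ntpq: {ntpq_offset_ms:.3f} ms")
print(f"ratio {ratio:.2f}, i.e. {orders:.2f} orders of magnitude")
```

The ratio comes out at roughly 7, which is under one order of magnitude.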
> Should I really believe what ntptrace says, viz. that the
> local host is offset from a remote
> stratum 1 server by a mere 3 microseconds in spite of orders
> of magnitude larger values of
> jitter (and that from a program that says the local host is
> offset from itself by hundreds of
An instantaneous offset reading can be anywhere within the error band
that jitter is trying to estimate, so you can believe it as much as any
other reading that is within the error band.
> Ultimately I'm trying to do a couple of things:
> 1. determine if the loopstats offset value can be correlated to
> something informative about the
> system time of the local host, such as an estimate of the local
> clock offset from UTC.
> 2. determine the best-guess estimate of the offset of a given peer
> from UTC.
If you operate ntpd in the environment for which Dave Mills designed it,
the best guess, in real time, should always be zero. Some people,
including myself, believe that there are real life cases, e.g. for
start-up and temperature induced frequency transients, where this
assumption breaks down.