[ntp:questions] Accuracy of NTP - Advice Needed

Dennis Ferguson dennis.c.ferguson at gmail.com
Sat Dec 24 05:43:06 UTC 2011


On 23 Dec, 2011, at 22:47 , Paul Sobey wrote:
>>> I appreciate these may appear to be silly questions with obvious answers
>>> - I am grateful in advance for your patience, and any research sources
>>> you may direct me to.
>> 
>> The best (and probably only possible) solution that does give you single-digit us is to route a PPS signal to each and every server, then use the network for approximate (~100 us) timing, with the PPS doing the last two orders of magnitude.
> 
> Our problem will be that running coax around many sites to lots of machines, many of which don't have serial ports (think blades), is both highly time consuming and maintenance intensive. If we have to do it then we will but I'd like a clear idea as to the whys before I start down that particular path.
> 
> In particular at this stage I'm trying to understand more about the theoretical accuracies obtainable under ideal conditions, and most important, how to independently verify the results of any tweaks we might apply. Say I have coalescence turned on on a NIC and I disable it - I'd like to be able to determine the effect, if any, of that change. Is it possible for ntpd (or ptpd) to accurately determine its own accuracy, if that makes sense? If not, what techniques might I use to measure it independently?

If you really want to do this, with either NTP (the protocol, maybe not ntpd the
implementation) or PTP, then I think the place you need to start is, unfortunately,
with your operating system kernels.

I have a board which implements a clock which can be synchronized to the 10 MHz
and 1 PPS outputs from a GPS receiver.  The board's clock resolution is about
3 ns (i.e. a 320 MHz internal clock) and the PIO interface to the board is designed
so that it should be possible to transfer time from the board clock to the computer's
clock with no more than +/- 10 ns or so of ambiguity (call it +/- 20 ns to be safe).
When I first used this with a stock NetBSD kernel (whose clock code I think was copied
from FreeBSD at some point) I was a little bit surprised to find that, despite the
low tens of nanoseconds of accuracy the hardware was capable of, sampling the card
against system timestamps gave me a result which jittered by on the order of several
microseconds.  After looking at why this was, I found that the jitter was in fact
coming from the system clock itself and was caused by the way clock adjustments are
applied at clock interrupt time (I believe some of the complaints about "interrupt
latency" of the serial PPS driver are in fact seeing this system clock jitter and
blaming it on something else; my very brief measurement of that driver found that
while the fixed interrupt latency is hundreds of nanoseconds it is also relatively
constant, with outliers which are fairly easy to filter).  Needless to say, if you
can't get your system clock stable to better than microseconds you are unlikely to
be able to synchronize it to a network source at that level.  I fixed this by
replacing the clock code, instead computing the time as a linear function of the
value of the underlying counter, and getting rid of the clock interrupt discrete
adjustments altogether (except when the NTP adjustment interface is in use, though
that's a whole other story), so now my system clock doesn't jitter.
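
A minimal sketch of the "linear function of the counter" idea, in Python rather
than kernel C, and not the actual NetBSD code. The class name, field names and
the nanosecond units are assumptions made for illustration. Time is read as
base_time + (counter - base_counter) * rate, and an adjustment just re-bases
the line at the current counter value, so nothing is nudged at clock interrupt
time.

```python
from dataclasses import dataclass


@dataclass
class LinearClock:
    """Hypothetical model: system time as a linear function of a free-running counter."""
    base_counter: int    # counter value at the last adjustment
    base_time_ns: float  # system time (ns) that corresponds to base_counter
    ns_per_tick: float   # current rate: ns of system time per counter tick

    def read(self, counter: int) -> float:
        # Pure function of the counter; no per-interrupt corrections involved.
        return self.base_time_ns + (counter - self.base_counter) * self.ns_per_tick

    def adjust_rate(self, counter: int, new_ns_per_tick: float) -> None:
        # Re-base at the current counter so time stays continuous,
        # then advance at the new rate from this point on.
        self.base_time_ns = self.read(counter)
        self.base_counter = counter
        self.ns_per_tick = new_ns_per_tick

    def step(self, counter: int, offset_ns: float) -> None:
        # Apply an exact time offset at a known counter value.
        self.base_time_ns = self.read(counter) + offset_ns
        self.base_counter = counter
```

Because each adjustment takes effect at a known counter value and by a known
amount, the caller can know precisely what was done to the clock.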

The second operating system issue that's useful to address, whether the data is
coming from NTP or PTP, is the clock adjustment system call interface.  In particular,
there are huge advantages to be gained by having a system call interface which
allows you to make both clock frequency (i.e. rate of clock advance) and time offset
adjustments, and which makes the adjustments you tell it to with great precision
(or at least, tells you precisely what it did).  The reason this is advantageous
would require a long explanation, but the summary is that it allows you to treat
the clock control process as solely a measurement process, rather than a feedback
control process, and this makes it possible to begin to look at a broader variety
of filtering procedures for incoming data to try to maximize the signal while
minimizing the noise, without the additional burden of having to consider the
stability (in the control system sense) of the adjustment process.  The adjustments
can be done open-loop.
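
To illustrate what "measurement rather than feedback control" can look like,
here is a rough sketch, not my daemon's actual filter. It assumes offset samples
taken as (local time, server minus local) pairs and fits a straight line to them
by least squares. The slope is the estimated frequency error and the fitted
value at the last sample is the current offset. The function name and the sign
convention are assumptions for the example.

```python
def fit_offset_and_rate(samples):
    """Least-squares line through (local_time_s, offset_s) samples.

    Returns (offset_at_last_sample_s, frequency_error), where frequency_error
    is the slope in seconds of offset per second of local time.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_o = sum(o for _, o in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0.0:
        raise ValueError("samples must span a non-zero time interval")
    cov = sum((t - mean_t) * (o - mean_o) for t, o in samples)
    rate = cov / var_t
    t_last = samples[-1][0]
    # Evaluate the fitted line at the time of the most recent sample.
    offset_now = mean_o + rate * (t_last - mean_t)
    return offset_now, rate
```

If the kernel applies exactly the rate and offset it is given, the two numbers
can be handed over as they are, and the next batch of samples measures whatever
residual is left. No loop gain has to be chosen to keep things stable.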

I believe that the operating system work described above, plus maybe some work
on your ethernet card drivers, is necessary to achieve what you want with either
NTP or PTP.  With my own implementation of the NTP daemon I can generally
keep a client machine within 10 us of a server (measured with one of the cards
mentioned above in each machine) separated by (I think) 4 gigabit ethernet switches,
I think with one 10 Gbps circuit in there, carrying company network traffic,
with a 16 second polling interval.
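
For reference, the standard NTP on-wire calculation shows why switches and
driver timestamping matter at this level. Each exchange yields four timestamps
and the offset estimate assumes the two directions have equal delay, so any
asymmetry shows up as offset error of up to half the round-trip delay. The
function below is just that textbook arithmetic; the name and the integer
microsecond units are choices made for the example.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    Returns (offset, delay): offset is server minus client, delay is the
    round-trip network delay with the server's processing time removed.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay
```

As an example in microseconds: a client 50 us behind the server, 100 us of
delay each way and 20 us of server processing gives t1=1000, t2=1150, t3=1170,
t4=1220, which works out to an offset of 50 and a delay of 200.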

Note that I haven't tested this with ntpd yet, mostly because I don't like the
way I had to jam support for the NTP system call interface into an otherwise
very clean kernel time implementation but haven't yet had the time to try converting
ntpd to use the native adjustment interface.  I would note, however, that ntpd
probably has some additional burden that it bears which makes this harder.  In
particular, while ntpd operates by essentially making a series of frequency
adjustments to the system clock to bring it into synchronization, it also
makes the assumption that the frequency adjustments it is asking the kernel
to make may not be accurately implemented by the kernel.  This is the
fundamental reason it implements the control process as a PLL/FLL; it assumes
it needs to correct not only the errors the underlying hardware clock is
making but also the additional errors caused by unpredictably inaccurate
implementation of the adjustments it is telling the kernel to make.  This
is why, even though it is possible to determine the system clock's
frequency error as accurately as it is possible to know it in 10 or 15
minutes (according to the Allan variance typical of the process), ntpd
can take hours and hours to work through a large frequency error.  When ntpd
does get to a point where it has integrated out this correction it should
track better (assuming the changes in system clock frequency are small),
but I haven't been able to test whether it does as well as a more straightforward
procedure that takes advantage of the fact that the operating system
makes accurate adjustments.
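
If you want to check the averaging time for your own hardware, one approach is
to log clock offsets at a fixed interval and compute the Allan deviation over a
range of averaging times. The sketch below uses the usual overlapping estimator
on phase (time offset) samples. The function and parameter names are
assumptions for the example, and it expects evenly spaced samples.

```python
import math


def overlapping_adev(phase, tau0, m):
    """Overlapping Allan deviation at averaging time tau = m * tau0.

    phase: time-offset samples in seconds, taken every tau0 seconds
    tau0:  sampling interval in seconds
    m:     averaging factor (positive integer)
    """
    n = len(phase)
    if m < 1 or n < 2 * m + 1:
        raise ValueError("need at least 2*m + 1 samples and m >= 1")
    tau = m * tau0
    total = 0.0
    for i in range(n - 2 * m):
        # Second difference of the phase over a span of 2*tau.
        d = phase[i + 2 * m] - 2.0 * phase[i + m] + phase[i]
        total += d * d
    return math.sqrt(total / (2.0 * tau * tau * (n - 2 * m)))
```

Computing this for increasing m and looking for where the curve stops falling
gives a rough idea of how long it is worth averaging before the oscillator's
own wander dominates the measurement noise.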

In any case, if you want fine timing I think you need to work on your
operating systems first.  That is the low-hanging fruit.

Dennis Ferguson


