[ntp:questions] Idea to improve ntpd accuracy

Charles Elliott elliott.ch at comcast.net
Sat Feb 27 13:23:56 UTC 2016

Here are two other ideas you might want to consider to improve accuracy:
Feed forward (PID) control on system temperature and significant changes in
system load.  Let me describe the situation.  Right now I have only 3
computers on a home LAN.  Two of them, one of which is my main system for
email and dictating research notes, process BOINC/Einstein at Home work
units, and the third serves only as an external-facing NTP client that
distributes time to the other two.  Due to environmental pressures and the
need to conserve energy, the two computers processing Einstein at Home WUs
stop processing at midnight and resume at 5 AM automatically.  Ambient
temperature affects all 3 computers
equally, though its effect is most noticeable on the external-facing NTP

When the two computers shed the load at midnight, the offsets immediately
decrease to between -1 and -2 ms; then, both machines spend all night
recovering to zero offset at about 5 AM, whereupon the offsets jump up to
between +0.5 and +1 ms when the load is resumed.  At about 8 AM both
machines are back to zero offset.  

All during the night, the ambient temperature is falling because
BOINC/Einstein at Home consumes between 450 and 500 watts of GPU and CPU power
on each machine on which it runs.  Consequently, the frequency offset (in
ppm) declines all night, reaching a minimum at 5 AM, and then increases
steadily on all three machines until it reaches a steady state at about 10
AM.  It hovers at a fairly steady state until midnight, whereupon the fall
and rise pattern is repeated.

PID control is proportional + integral + derivative.  Normally, a control
signal is the sum of three terms: one proportional to an error signal, such
as offset error; one proportional to the accumulated sum of (offset) errors
(integral); and one proportional to the rate of change in (offset) error
(derivative, roughly jitter). However, for temperature, there is no
concept of error; it is what it is.  For control as a function of
temperature (T), there clearly is a direct functional relationship between
ppm and T, such as ppm = Kp T.  Jitter lags both temperature and system load
changes by about 2 hours, so I am not sure derivative control is wise or
even possible.  Integral control is what yanks the elevator up to floor
level when the brakes are slipping.  It can be hard to get right because of
the yo-yo effect.  Obviously, more thought has to be put into all this.  But
I have been watching these relationships for several weeks with NTP Plotter,
and they are obvious and fairly consistent.   
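
To make the three terms concrete, here is a minimal Python sketch of a
textbook PID update next to the feed-forward relationship ppm = Kp T
described above.  The names PID and feed_forward_ppm and every gain value are
hypothetical, chosen only for illustration; this is not how ntpd's clock
discipline is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Textbook PID controller; the gains are illustrative assumptions."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, error: float, dt: float) -> float:
        """Return the control output for one error sample taken dt seconds apart."""
        if dt <= 0:
            raise ValueError("dt must be positive")
        # Integral term: accumulated error over time.
        self.integral += error * dt
        # Derivative term: rate of change of the error (zero on the first sample).
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def feed_forward_ppm(temp: float, kp_temp: float) -> float:
    """Feed-forward term ppm = Kp * T: no error signal, just the measured temperature."""
    return kp_temp * temp


if __name__ == "__main__":
    pid = PID(kp=2.0, ki=0.5, kd=1.0)       # hypothetical gains
    for offset_error in (1.0, 0.5, 0.25):   # made-up offset errors
        print(pid.update(offset_error, dt=1.0))
    print(feed_forward_ppm(temp=40.0, kp_temp=0.1))  # made-up T and Kp
```

The feed-forward term needs no history, which is why it can act as soon as
the temperature moves, whereas the integral and derivative terms only react
after an offset error has appeared.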

What started me off on this is that CPUID (www.cpuid.com), maker of CPU-z,
HardwareMonitor, and HardwareMonitorPro, the latter two of which do an
excellent job of monitoring all manner of system and peripheral
temperatures, is offering a "System Monitoring SDK" for evaluation.  The
idea would be to record the relationships between various temperature
measurements and ppm and then try to vary the ppm adjustment with
temperature to see if any improvement in offsets results.  However, CPUID
never replied to my email.
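
One way the recorded temperature/ppm pairs might be turned into a
coefficient is an ordinary least-squares line fit.  The sketch below is
Python and assumes the simplest case: one temperature sensor and a model of
the form ppm = Kp T + b.  The intercept b is my addition to the ppm = Kp T
form; the function name fit_line and the sample readings are invented for
illustration and are not measurements.

```python
def fit_line(temps, ppms):
    """Least-squares fit of ppm = slope * T + intercept; returns (slope, intercept)."""
    n = len(temps)
    if n < 2 or n != len(ppms):
        raise ValueError("need at least two paired samples")
    mean_t = sum(temps) / n
    mean_p = sum(ppms) / n
    # Spread of the temperatures and their co-variation with ppm.
    sxx = sum((t - mean_t) ** 2 for t in temps)
    if sxx == 0:
        raise ValueError("temperatures must not all be identical")
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in zip(temps, ppms))
    slope = sxy / sxx
    intercept = mean_p - slope * mean_t
    return slope, intercept


if __name__ == "__main__":
    # Invented readings: temperature in degrees C, frequency offset in ppm.
    temps = [30.0, 35.0, 40.0, 45.0]
    ppms = [7.0, 8.0, 9.0, 10.0]
    kp, b = fit_line(temps, ppms)
    print("Kp =", kp, "ppm per degree, b =", b, "ppm")
```

Because the offsets lag temperature and load changes by hours, a fit like
this would probably need the samples time-shifted first; that step is left
out here.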

Charles Elliott

-----Original Message-----
From: questions
[mailto:questions-bounces+elliott.ch=comcast.net at lists.ntp.org] On Behalf Of
Sent: Thursday, February 25, 2016 4:52 PM
To: questions at lists.ntp.org
Subject: [ntp:questions] Idea to improve ntpd accuracy

This may or may not be worthwhile, but I thought I'd throw it out there and
see what happens:

Recent work testing some microsecond-accurate NTP servers led me to an idea
that could improve accuracy of measurements made by ntpd. These NTP servers
have hardware timestamps on receive, but that's not possible on transmit
without a custom NIC. I've seen this issue discussed before.

The next best thing is to generate the transmit timestamp based on a guess
as to how long it takes the NIC to get on the wire and send the packet. That
works pretty well as long as there's no other network traffic. In this
situation, it is possible to make use of microsecond accuracy in an NTP

Now, add some typical network traffic and the time it takes the NIC to get
on the wire becomes unpredictable to the tune of 200us or more (for
100 base-T Ethernet). The server's microsecond accuracy is largely lost in
the noise.
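
A rough check on that figure, as a Python sketch: the time one maximum-size
Ethernet frame occupies the wire.  It assumes an untagged 1518-byte frame
plus 8 bytes of preamble/SFD and a 12-byte interframe gap, and it ignores
switch queuing and driver latency.  The function name wire_time_us is
hypothetical.

```python
PREAMBLE_SFD_BYTES = 8      # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum idle time between frames, in byte times
MAX_FRAME_BYTES = 1518      # untagged maximum frame, header and FCS included


def wire_time_us(frame_bytes: int, link_mbps: float) -> float:
    """Microseconds one frame occupies the wire, overhead included."""
    if link_mbps <= 0:
        raise ValueError("link speed must be positive")
    bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    # bits divided by Mbit/s gives microseconds directly.
    return bits / link_mbps


if __name__ == "__main__":
    for mbps in (100, 1000):
        print(mbps, "Mbit/s:", wire_time_us(MAX_FRAME_BYTES, mbps), "us per full frame")
```

At 100 Mbit/s this comes to about 123 us per full-size frame, so a reply
stuck behind one or two such frames waits roughly 120 to 250 us.  At
1 Gbit/s the same frame takes about 12 us.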

The NIC generates an interrupt after the packet is sent which can result in
a fairly accurate trailing hardstamp. The problem is...the packet is already
gone and has the wrong transmit timestamp.

Here's my idea:

What if the poll response packet contained a flag or indication of some sort
which means "this is an approximate transmit timestamp"? That packet would
then be immediately followed by a second response packet with a more
accurate transmit time. The second packet could be otherwise identical to
the first, or it could be a new flavor of packet that contained only the
transmit time (that would save on network bandwidth).

The ntpd process would need to use the receive time of the first packet (the
one with an approximate tx timestamp) and merge in the following accurate tx
timestamp before performing the normal processing associated with a poll
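
The merge step might look something like the Python sketch below.  The
offset and delay formulas are the standard NTP on-wire calculations over the
four timestamps; everything else is hypothetical: the Sample type, the
merge_follow_up helper, and the example numbers, which are invented.  No such
follow-up packet exists in NTP today.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Sample:
    """The four NTP timestamps for one poll, here in microseconds."""
    t1: float  # client transmit (origin)
    t2: float  # server receive
    t3: float  # server transmit
    t4: float  # client receive


def offset_and_delay(s: Sample):
    """Standard NTP clock offset and round-trip delay."""
    offset = ((s.t2 - s.t1) + (s.t3 - s.t4)) / 2
    delay = (s.t4 - s.t1) - (s.t3 - s.t2)
    return offset, delay


def merge_follow_up(first: Sample, accurate_t3: float) -> Sample:
    """Keep t1, t2 and t4 from the first reply; take t3 from the follow-up (hypothetical)."""
    return replace(first, t3=accurate_t3)


if __name__ == "__main__":
    # Invented case: true offset 10000 us, 100 us each way, and the server's
    # guessed transmit stamp is 200 us earlier than the real one.
    first = Sample(t1=0.0, t2=10100.0, t3=10200.0, t4=500.0)
    print("approximate:", offset_and_delay(first))
    print("merged:     ", offset_and_delay(merge_follow_up(first, 10400.0)))
```

In this made-up case the 200 us error in the guessed transmit stamp shows up
as a 100 us error in the computed offset, and it disappears once the
follow-up value is merged in.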

Here are the pros and cons I can think of:

Pros:

* Possible accuracy improvement of 1-2 orders of magnitude. I know ntpd
already does some work to try and filter out network delay variation so the
improvement might not be a full 2 orders of magnitude.
* Could potentially be made backwards compatible with ntp 3/4

Cons:

* Increased network traffic
* Improvement to that level of accuracy might not be of interest to anyone
* Could be a fair bit of work for at least a couple of folks
* I may have (or probably have) missed some stuff regarding network behavior
that would reduce the level of improvement that could be realized.
* Perhaps this is less of an issue on G-bit Ethernet?

Wondering if anyone thinks this idea is worth pursuing further...?

questions mailing list
questions at lists.ntp.org
