[ntp:hackers] 1 msec jitter on serial PPS tamed

Dave Hart davehart at davehart.com
Mon Feb 23 08:08:25 UTC 2009


After beating my head on it for almost two weeks, I finally
eliminated the +/- 500 usec jitter on my Windows ntpd with
serial PPS.  You can see the difference in this loopstats
file using your favorite graphing tool:

http://davehart.net/ntp/refclock/loopstats.20090223

or just take a look at this zoomed-in view of it from
Meinberg's NTP monitor:

http://davehart.net/ntp/refclock/loopstats-20090223-zoom.gif

Before about 0430 I was running with the old stuff.  The
difference afterwards is remarkable.

C:\Users\davehart>ntpq -c "rv 7261" ntp.davehart.net
status=9614 reach, conf, sel_sys.peer, 1 event, event_reach,
srcadr=GPS_NMEA(1), srcport=123, dstadr=127.0.0.1,
dstport=123, leap=00,
stratum=0, precision=-20, rootdelay=0.000,
rootdispersion=0.000,
refid=GPPS, reach=377, unreach=0, hmode=3, pmode=4, hpoll=4,
ppoll=10,
flash=00 ok, keyid=0, ttl=64, offset=0.021, delay=0.000,
dispersion=0.258, jitter=0.005,
reftime=cd4cd732.fe45eed4  Mon, Feb 23 2009  8:05:06.993,
org=cd4cd732.fe45eed4  Mon, Feb 23 2009  8:05:06.993,
rec=cd4cd733.b9d6b62a  Mon, Feb 23 2009  8:05:07.725,
xmt=cd4cd732.ce92d5d1  Mon, Feb 23 2009  8:05:06.806,
filtdelay=     0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00,
filtoffset=    0.02    0.02    0.01    0.02    0.02    0.01    0.02    0.02,
filtdisp=      0.00    0.27    0.53    0.80    1.02    1.29    1.55    1.77

I'm sure it'll nudge a little closer to 0 offset before it
finishes settling down.

I saw evidence a couple of days ago that the "serial PPS"
jitter that mysteriously appeared on Patch Tuesday was in
fact interpolation error.  Specifically I noticed it went
away during a relatively steep slew only to reappear as the
slew faded away.  Since then I've been furiously stewing and
hacking on ntpd's Windows interpolation code trying
different approaches.  It can be satisfying to debug a new
hunk of code implementing a new design, but less so when,
after polishing, it becomes clear it's mostly headed for the
bit recycler without even a glimpse from another
programmer's eyes.

I won't bore you right now with any of the failed
approaches.  What I hit on that is paying off is very much
along the lines of NTP's minimum delay clock filter.
Details are likely to change, but right now it's using a
48-deep history of clock/counter pairs sampled 25 times per
second, or about 2 seconds of correlations.  When
interpolating current time from a current performance
counter, calculate the interpolated time for each of the 48
samples, and use the highest (latest) one.  This is based on
the observation that error introduced by a stale system time
relative to its paired counter sample is negative.
Therefore the sample pair that results in the highest
interpolated time is closest to correct.

Each of these 48 calculations is basically

time = baseline_time + ((counter - baseline_counter) * 1e7 / freq)

where all those variables are 64-bit except freq, the amount
the performance counter increments each second.  Undoubtedly
there is an equivalent and more efficient way to determine
which of the 48 baselines to use.  1e7, or 10 million, is the
number of units in the Windows timescale per second.
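
As a rough illustration only, the selection described above might be
sketched in C as follows.  This is not ntpd's actual code: the names
baseline_t, interp_one, interp_best and TICKS_PER_SEC are made up for
the example, freq is widened to 64 bits for simplicity, and the
history is assumed to be a plain array that something else fills in
at the sampling rate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Windows time units (100 ns) per second: the 1e7 in the formula. */
#define TICKS_PER_SEC INT64_C(10000000)

/* Hypothetical pairing of a system time with the performance
 * counter value read alongside it. */
typedef struct {
    int64_t time;     /* system time, in 100 ns units */
    int64_t counter;  /* performance counter reading */
} baseline_t;

/* Interpolate the current time from a single baseline pair. */
static int64_t interp_one(const baseline_t *b, int64_t counter,
                          int64_t freq)
{
    return b->time + (counter - b->counter) * TICKS_PER_SEC / freq;
}

/* Interpolate from every baseline in the history and keep the
 * highest (latest) result, on the assumption that a stale system
 * time can only pull an estimate low. */
static int64_t interp_best(const baseline_t *hist, size_t n,
                           int64_t counter, int64_t freq)
{
    assert(n > 0 && freq > 0);

    int64_t best = interp_one(&hist[0], counter, freq);
    for (size_t i = 1; i < n; i++) {
        int64_t t = interp_one(&hist[i], counter, freq);
        if (t > best)
            best = t;
    }
    return best;
}
```

With the figures from this post, hist would hold 48 entries.  Note
that in exact arithmetic each result is
(baseline_time - baseline_counter * 1e7 / freq) plus a term that is
the same for all baselines, so the ranking does not depend on the
current counter; that may be one route to the cheaper selection
mentioned above, though integer truncation could change which of two
near-equal baselines wins.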

I need to spend some more time playing with it to understand
what frequency of sampling and what depth of history is
going to work best across a wide variety of hardware.  I'm
thrilled to finally nail this jitter bug.

Cheers,
Dave Hart


