[ntp:questions] Taming the pinball machine
David L. Mills
mills at udel.edu
Thu Nov 6 04:21:22 UTC 2003
If your technical occulometer glazes over after a paragraph or two of
wicked technical jargon, perhaps you should skip this message. I'm
putting it in a public place to respond to previous reports and address
the widest nerd audience.
There have been several recent reports that certain configuration
scenarios experience somewhat naughty behavior when a valid
synchronization source is found after a holdover during which only the
local clock driver winds the clock. The result is a nasty frequency
surge that takes some time to dissipate. After a couple of days of
simulation and analysis I tracked down the cause and fixed it, with the
happy byproduct that the behavior with very long poll intervals and high
network jitter is much improved.
There were three minor design changes:
1. A bug was found and fixed in the code that adjusts the poll interval
during the interval just before setting the clock for the first time.
The conditions under which this can occur are arcane and unlikely for
2. The clock discipline state machine was modified to improve the
frequency estimate when started for the first time (no ntp.drift file
yet created) and when a very large time step is suspected of being
actually due to a large frequency step.
3. Frequency management at very long poll intervals, with very large
network jitter, and when using burst modes was improved by calling the
clock discipline only once after each burst.
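As a rough illustration of the third change, here is a minimal Python
sketch of "one discipline call per burst". It is not the ntpd code. The
names discipline_once_per_burst and Sample are hypothetical, and picking
the lowest-delay sample of the burst is an assumption made for the
example; the real clock filter is more elaborate.

```python
from typing import Callable, List, Tuple

# Hypothetical sample type: (clock offset in seconds, round-trip delay in seconds)
Sample = Tuple[float, float]


def discipline_once_per_burst(
    burst: List[Sample], discipline: Callable[[float], None]
) -> float:
    """Pick one sample from a burst and call the discipline exactly once.

    Assumption for this sketch: the sample with the smallest round-trip
    delay is taken as the most trustworthy one in the burst.
    """
    if not burst:
        raise ValueError("empty burst")
    best_offset, _best_delay = min(burst, key=lambda sample: sample[1])
    discipline(best_offset)  # a single update, not one per packet
    return best_offset
```

The point of the design is that a burst of noisy packets yields one
frequency update rather than several, so a single wild sample has less
chance to kick the frequency around.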
To give some idea of the conditions under which these improvements are
mildly dramatic, consider a poll interval much above 1024 s and a jitter
standard deviation of 50 ms. That jitter regime is some 100 times worse
than you might see on a quiet LAN, with many peaks above 100 ms and
some over one second. Simulation runs with jitter of 300 ms confirm the
thing is stable, even if clock steps occur every few hours.
There were a few useful lessons learned during the exercise:
1. Don't use the kernel time discipline if you intend to let the poll
interval creep much over the default maximum 10 (1024 s). The current
kernel implementation doesn't have the fancy algorithms so useful at
longer poll intervals.
2. Really DO use burst mode (both iburst and burst keywords) if you
intend to operate much above the default maximum.
3. Verify whether or not the kernel is hallucinating by using a disable
ntp configuration command and letting the daemon run for a few days while
watching the loopstats. The frequency should be a good clean straight
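For watching the loopstats, a small script can save some squinting. The
Python sketch below fits a least-squares straight line to the logged
frequency and reports the worst deviation from it. It assumes the usual
loopstats layout, with the modified Julian day in the first field,
seconds past midnight in the second, and frequency in PPM in the fourth;
check that against your own files. The function names are hypothetical.

```python
from typing import List, Tuple


def parse_loopstats(text: str) -> List[Tuple[float, float]]:
    """Return (elapsed_seconds, frequency_ppm) pairs from loopstats text.

    Assumed field layout: MJD, seconds past midnight, offset (s),
    frequency (PPM), then further fields that are ignored here.
    """
    points = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        mjd, sec, freq = int(fields[0]), float(fields[1]), float(fields[3])
        points.append((mjd * 86400 + sec, freq))
    if points:
        t0 = points[0][0]
        points = [(t - t0, f) for t, f in points]
    return points


def line_fit(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit a straight line; return (slope in PPM/s, worst residual in PPM)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_t = sum(t for t, _ in points) / n
    mean_f = sum(f for _, f in points) / n
    stt = sum((t - mean_t) ** 2 for t, _ in points)
    if stt == 0:
        raise ValueError("all samples share one timestamp")
    slope = sum((t - mean_t) * (f - mean_f) for t, f in points) / stt
    intercept = mean_f - slope * mean_t
    worst = max(abs(f - (intercept + slope * t)) for t, f in points)
    return slope, worst
```

Feed parse_loopstats the contents of a loopstats file and pass the
result to line_fit. A small worst residual means the frequency trace
hugs a line; what counts as "small" depends on your oscillator and is
left to the reader.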
With the revised code running in simulation, the results are really
impressive. With a wild frequency step of 400 PPM and 50 ms jitter and
maxpoll set to the maximum 17 (about 36 hours), the poll interval climbs
to 17 in about a week. There are no clock steps and the residual time
offsets are in the 30-50-ms range. In other tests the frequency was
measured accurately and with no annoying surges. Happy chime.
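For anyone checking the arithmetic, poll values are exponents of two in
seconds. The Python sketch below converts them and also shows how much
time error an uncorrected frequency error piles up over one poll
interval. The function names are made up for the illustration.

```python
def poll_seconds(exponent: int) -> int:
    """Poll interval in seconds for a given poll exponent (2**exponent)."""
    return 2 ** exponent


def drift_after(freq_error_ppm: float, interval_s: float) -> float:
    """Time error in seconds accumulated by an uncorrected frequency error."""
    return freq_error_ppm * 1e-6 * interval_s


if __name__ == "__main__":
    for exponent in (10, 17):
        seconds = poll_seconds(exponent)
        print(exponent, seconds, "s,", round(seconds / 3600, 1), "h")
    print(drift_after(400, poll_seconds(10)), "s drift at 400 PPM over 1024 s")
```

So poll 10 is 1024 s, poll 17 is 131072 s (about 36.4 hours), and a
400 PPM error left alone for 1024 s amounts to roughly 0.41 s, which is
why getting the frequency estimate right matters so much at long poll
intervals.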
I recommend that only folks that fit the profile above try the test
version at ftp.udel.edu:pub/ntp/software/ntp-4.2.0.gz. This is only a
temporary spot under an assumed name until the crew rolls a suitably
versioned model to the web.