[ntp:hackers] NTP code and documentation wiggles
David L. Mills
mills at udel.edu
Thu Jan 10 19:13:09 UTC 2008
Guys,
1. The documentation cleanup project continues. The howtos and reference
clock pages have been cleared of broken links, HTML syntax errors and
clearly evident errors. The Windows howto was a challenge, but I gave up
on the Solaris howto. Some pages have been reformatted to the style used
elsewhere in the collection. Some of the options pages have been cleaned
up. The monitoring options page was a mess; it took me a day to clear
the weeds, not to mention the broken and misleading prose.
2. Brian pointed out a problem with the initial frequency training
scheme and the Solaris kernel. Close inspection revealed other ways the
training could be improved. The result is a reliable estimate of initial
frequency well within 1 PPM with no more residual offset error than if
the frequency file were present and accurate.
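For illustration only: the simplest way to estimate frequency without a frequency file is to measure the clock offset twice, some interval apart, and divide the change in offset by the interval. This is not the training scheme in ntpd. The function name initial_freq_ppm and the sign convention (offset = server time minus local time) are assumptions made for this sketch.

```c
#include <math.h>

/*
 * Hypothetical helper, not ntpd code: estimate the frequency correction
 * (in PPM) from two clock offset samples taken `interval` seconds apart.
 * Offsets are (server time - local time) in seconds, so a positive
 * result means the local clock runs slow and must be sped up.
 */
double initial_freq_ppm(double offset1, double offset2, double interval)
{
    if (interval <= 0.0)
        return NAN;               /* no meaningful estimate */
    return (offset2 - offset1) / interval * 1e6;
}
```

For example, an offset that grows from 1.0 ms to 1.9 ms over 900 s yields about 1 PPM. Any error in the two offset measurements is divided by the interval, which is why a longer training interval gives a tighter estimate.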
3. The rate grooming and KoD code works too well. While it is not
possible for a compatible client to overrun a compatible server, it is
possible to cheat by restarting a client shortly after completing an
iburst, which did result in a KoD. This turns out to be a little
irritating during testing, so I modified the handling of a received KoD
response to set the headway to the maximum rather than disable the association. There may
still be wiggle room in this area.
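A minimal sketch of the change in KoD handling, assuming a simplified association record. The names struct assoc, HEADWAY_MAX (and its value of 1024 s), on_kod_disable, on_kod_backoff and may_send are hypothetical and not taken from ntpd.

```c
#include <stdbool.h>

/* Hypothetical, simplified association state; not ntpd's struct peer. */
struct assoc {
    int  headway;      /* minimum seconds between packets to this server */
    bool enabled;      /* is the association still active? */
};

#define HEADWAY_MAX 1024   /* assumed ceiling, in seconds */

/* Old behavior: a received KoD disabled the association. */
void on_kod_disable(struct assoc *a)
{
    a->enabled = false;
}

/* New behavior: back off to the maximum headway but keep running. */
void on_kod_backoff(struct assoc *a)
{
    a->headway = HEADWAY_MAX;
}

/* Rate grooming: may a packet be sent now, given the last send time? */
bool may_send(const struct assoc *a, long now, long last_sent)
{
    return a->enabled && now - last_sent >= a->headway;
}
```

The design point is that a backed-off association keeps running, just more slowly, whereas a disabled association stays dead until it is restarted.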
4. Stale certificates and leapsecond values can be a problem with
Autokey. The common case is when a trusted host key or certificate is
refreshed and eventually all dependent servers and clients need to
refresh their certificate trails. As there is no revocation capability
in Autokey, a certificate is valid until its lifetime expires.
Therefore, both the old and new certificates, as well as all
certificates signed by the old and new host keys, remain valid until
they expire.
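To illustrate why old and new certificates overlap, here is a sketch of a lifetime-only validity test. The names struct cert and cert_valid are hypothetical, not Autokey's data structures.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified certificate record; not Autokey's own. */
struct cert {
    const char *subject;
    time_t      first;   /* not valid before this time */
    time_t      last;    /* not valid after this time */
};

/* With no revocation, the lifetime is the only test of validity. */
bool cert_valid(const struct cert *c, time_t now)
{
    return now >= c->first && now <= c->last;
}
```

With an old certificate valid from 0 to 1000 and its replacement valid from 500 to 2000, both pass the test at time 700. Only after the old one expires at 1000 does the replacement stand alone.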
The only way to make sure media values are current is to restart the
association. The current code does this once per week for each
association; this updates the certificate cache and leapseconds values
if the downstratum servers have refreshed media.
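The weekly refresh amounts to a simple age test on each association. The names refresh_due and REFRESH_INTERVAL are hypothetical and used only for this sketch.

```c
#include <stdbool.h>
#include <time.h>

#define REFRESH_INTERVAL (7L * 24 * 3600)  /* one week, in seconds */

/*
 * Hypothetical: restart an association once its last restart is a week
 * old, forcing the certificate trail and leapsecond values to be refetched.
 */
bool refresh_due(time_t now, time_t last_restart)
{
    return now - last_restart >= REFRESH_INTERVAL;
}
```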
Still thinking about the leapsecond issue. This turns out to be much
harder than you might imagine. It is the only issue that requires an
absolute epoch. Everywhere else the result is only a nudge one way or
another relative to the current epoch. The number of seconds to the absolute epoch
shifts one way or another as the clock is nudged and stepped, especially
near the leap epoch. The current code does a good job of consensus,
voting, etc., but arming and executing the leap requires that the local
clock be stable within at least a few seconds before the leap.
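One way to picture the arming condition: the distance to the absolute leap epoch is computed from the local clock corrected by its current offset, so it is only trustworthy when that offset is small. The thresholds ARM_WINDOW (one day) and MAX_OFFSET (two seconds) and the function names are assumptions, not values or routines from the current code.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical thresholds; the current code may use different values. */
#define ARM_WINDOW 86400.0   /* arm no earlier than one day before the leap */
#define MAX_OFFSET 2.0       /* seconds; local clock must be this close */

/* Seconds to the leap, using local time corrected by (server - local). */
double secs_to_leap(double leap_epoch, double local_now, double offset)
{
    return leap_epoch - (local_now + offset);
}

/* Arm only if the leap was voted in, is near, and the clock is close. */
bool arm_leap(bool voted, double leap_epoch, double local_now, double offset)
{
    double s = secs_to_leap(leap_epoch, local_now, offset);

    return voted && fabs(offset) <= MAX_OFFSET
        && s > 0.0 && s <= ARM_WINDOW;
}
```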
There is also a question of what to do when kernel assist is not available
at the time of the leap. The current code simply sets the clock back one
second on the assumption that the kernel clock reading routine will not
step back for any cause unless the step exceeds one second, which is what
my kernel code does. However, it is doubtful that other kernels have
adopted that approach. Note that it is impractical to use adjtime(), as
its slew rate would continue for over a half hour.
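The half-hour figure follows from the slew rate. Assuming the common 500 PPM maximum adjtime() slew rate, amortizing a one-second correction takes 1 s / 0.0005 = 2000 s, about 33 minutes. The sketch below computes that. It also shows a clock reader that absorbs a backward step of up to one second by holding its last value. This is a hypothetical model of the behavior described above, not any real kernel's code, and slew_duration and read_clock are illustrative names.

```c
/* Assumed maximum adjtime() slew rate: 500 PPM (0.5 ms per second). */
#define SLEW_RATE 500e-6

/* Seconds needed to amortize a correction of `step` seconds by slewing. */
double slew_duration(double step)
{
    return step / SLEW_RATE;
}

/*
 * Hypothetical clock reader that never returns an earlier time than it
 * returned before unless the backward step exceeds one second. A
 * one-second step back at the leap therefore shows up as the clock
 * holding still until real time catches up, not running backward.
 */
double read_clock(double raw, double *last)
{
    if (raw < *last && *last - raw <= 1.0)
        return *last;          /* small backward step absorbed */
    *last = raw;
    return raw;
}
```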
Dave