[ntp:questions] Re: ntpd PLL and clock overshoot

David Woolley david at djwhome.demon.co.uk
Tue Oct 10 07:00:11 UTC 2006

In article <V4WdnThd8_TCzbfYnZ2dnUVZ_omdnZ2d at comcast.com>,
Richard B. Gilbert <rgilbert88 at comcast.net> wrote:

> I can't confirm the 100 percent but the current version doesn't work too 
> well with my GPS reference clock at startup!  I had something like a 90 
> millisecond offset when I started ntpd.  Over the next few minutes it 
> corrected that offset but didn't stop, or even slow down, when it hit 
> the zero line.  It kept right on going until it had a -9 millisecond 

That's only a 10% overshoot (9 ms on a 90 ms initial offset), which is
only twice the design target, so it is a different problem from the 100%
case.

The problem you are seeing is that (ignoring its ability to modify the
loop time constants) ntpd uses a simple mechanism, of the analogue
process-controller type, to control the phase, based on measured phase
errors.

Such processes don't have any prior knowledge of the amount of noise in
the phase error signal, whereas a human does.  The human realises that,
for example, 89.9 ms of the initial 90 ms is the start-up transient,
whereas the ntpd control loop assumes it could all be a random excursion
and the actual clock may be correct.  (Note that some instances of ntpd
may be operating in contexts where all of the 90 ms is phase noise.)

Such linear control can either overshoot and converge quickly, or can
be overdamped or critically damped, but take longer to converge in the
first place.
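
To make that trade-off concrete, here is a minimal sketch of a generic
damped second-order loop, x'' + 2*zeta*omega*x' + omega^2*x = 0, started
from a 90 ms phase error.  This is NOT ntpd's actual loop filter; the
names simulate, summarise, zeta and omega, and all the numeric values,
are assumptions chosen purely for illustration.

```python
def simulate(zeta, omega=0.01, x0=0.090, dt=0.1, t_end=3000.0):
    """Integrate x'' + 2*zeta*omega*x' + omega**2*x = 0 from x(0)=x0, x'(0)=0.

    x is the phase error in seconds.  Generic textbook model, not ntpd's loop.
    """
    x, v, t = x0, 0.0, 0.0
    trace = [(t, x)]
    while t < t_end:
        a = -2.0 * zeta * omega * v - omega * omega * x
        v += a * dt          # semi-implicit Euler: update the rate first,
        x += v * dt          # then the phase error, using the new rate
        t += dt
        trace.append((t, x))
    return trace


def summarise(zeta, band_fraction=0.02):
    """Return (overshoot as a fraction of x0, time the error first enters the band)."""
    trace = simulate(zeta)
    x0 = trace[0][1]
    overshoot = max(0.0, -min(x for _, x in trace)) / x0
    band = band_fraction * x0
    entry = next((t for t, x in trace if abs(x) <= band), None)
    return overshoot, entry


for zeta in (0.5, 1.0, 2.0):
    overshoot, entry = summarise(zeta)
    print(f"zeta={zeta}: overshoot={overshoot:.1%}, first within band at t={entry:.0f}s")
```

With the same omega, the underdamped case (zeta = 0.5) gets near zero
soonest but swings through it (analytically the overshoot is
exp(-pi*zeta/sqrt(1 - zeta^2)), about 16%), while the critically damped
and overdamped cases never cross zero but take longer to get there.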

My feeling is that there is scope for ntpd to learn the likely phase
noise and to use a fast, dead-beat way of getting into the noise band
before applying the linear control loop.  Once the systematic errors
have been removed, the Gaussian noise assumptions that underlie the
analysis of the behaviour of the current algorithm may well apply and
it may then be the best algorithm for maintaining lock.
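
A rough sketch of that two-stage idea, to show what "learn the noise,
then step into the band" might look like.  Everything here is an
assumption made for illustration: the helper names noise_band and
acquire, the use of the median absolute deviation as the noise estimate,
and the factor k = 3.  None of it is taken from ntpd.

```python
import statistics


def noise_band(recent_offsets, k=3.0):
    """Hypothetical noise estimate: k robust standard deviations of the
    recent offsets about their median (1.4826 * MAD approximates sigma
    for Gaussian noise)."""
    med = statistics.median(recent_offsets)
    mad = statistics.median(abs(o - med) for o in recent_offsets)
    return k * 1.4826 * mad


def acquire(offset, band):
    """Choose how to handle one measured offset (seconds)."""
    if abs(offset) > band:
        # Far outside the learned noise: treat it as systematic error and
        # remove it in one go (dead beat) rather than slewing through it.
        return ("step", offset)
    # Inside the noise band: hand over to the ordinary linear loop.
    return ("linear", offset)


samples = [0.0901, 0.0899, 0.0900, 0.0902, 0.0898]   # ~90 ms with small jitter
band = noise_band(samples)
print(acquire(statistics.median(samples), band))      # large offset: step
print(acquire(-0.0002, band))                         # residual jitter: linear
```

The point of the design is that the jitter about the 90 ms offset is
small even though the offset itself is large, so the spread of the
samples, not their mean, is what sets the band.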

I think there may well be a good case for using Nick McClaren's
statistics-based least-squares fit, at least during initial acquisition,
rather than the linear controller that is currently used.
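
For readers who have not seen the idea, the following is an ordinary
least-squares straight-line fit of measured offset against time, which
gives a phase and a frequency estimate together.  It is only a generic
sketch and not a reconstruction of the proposal mentioned above; the
function name and the sample data are made up for illustration.

```python
def fit_phase_and_frequency(times, offsets):
    """Fit offset(t) = phase + freq * t by ordinary least squares.

    times are in seconds, offsets in seconds; freq comes out as a
    dimensionless rate (multiply by 1e6 for ppm).
    """
    n = len(times)
    if n < 2 or n != len(offsets):
        raise ValueError("need at least two (time, offset) pairs")
    t_mean = sum(times) / n
    o_mean = sum(offsets) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0.0:
        raise ValueError("all samples share the same time")
    sxy = sum((t - t_mean) * (o - o_mean) for t, o in zip(times, offsets))
    freq = sxy / sxx
    phase = o_mean - freq * t_mean
    return phase, freq


# Illustrative data: a 90 ms offset that grows at 50 ppm, polled every 16 s.
times = [0.0, 16.0, 32.0, 48.0, 64.0]
offsets = [0.090 + 50e-6 * t for t in times]
phase, freq = fit_phase_and_frequency(times, offsets)
print(f"phase = {phase * 1e3:.3f} ms, frequency = {freq * 1e6:.3f} ppm")
```

Unlike the feedback loop, a fit like this uses all the samples at once,
so both the initial phase error and the frequency error can be removed
in a single correction once enough points are in hand.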

Note, it may be necessary to ensure that time is not served before the
error is within the noise region, as that may cause downstream servers
to do their initial acquisition based on the initial catch up of their
server, rather than the true time, and might cause instabilities in the
network, taken as a whole.
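
One possible shape for such a gate, as a sketch only: refuse to serve
until several consecutive offsets have fallen inside the noise band.
The class name ServeGate, the band value and the count of consecutive
samples are all hypothetical; ntpd's real rules for declaring itself
synchronised are not modelled here.

```python
class ServeGate:
    """Hold off serving time until the offset has stayed within the band."""

    def __init__(self, band, required=4):
        self.band = band            # noise band in seconds (assumed known)
        self.required = required    # consecutive in-band samples needed
        self.in_band = 0

    def update(self, offset):
        """Feed one measured offset; return True once it is safe to serve."""
        if abs(offset) <= self.band:
            self.in_band += 1
        else:
            self.in_band = 0        # any excursion restarts the count
        return self.in_band >= self.required


gate = ServeGate(band=0.001, required=3)
for offset in [0.090, 0.040, 0.0005, -0.0003, 0.0002]:
    print(offset, gate.update(offset))
```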

A related problem is that ntpd has no built-in knowledge that crystals
only vary by 1 or 2 ppm with temperature, so, when presented with
transients, it can end up believing it needs a long-term frequency
correction of 500 ppm.  My feeling is that there should be a coarse
control loop that can cope with long-term changes (including sudden ones
like changing motherboards) and a fine control loop with only a limited
control range (although maybe the whole range can be used for the phase
correction).
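
As a sketch of how a coarse/fine split might look for the frequency
term: the fine correction is clamped to +/-2 ppm around a coarse
estimate, and the coarse estimate only moves when the fine loop has been
pinned at its limit for several updates in a row.  The class name, the
saturation count and the recentring rule are assumptions for
illustration, and phase correction is left out entirely.

```python
class TwoStageFrequency:
    """Coarse frequency estimate plus a fine correction of limited range."""

    def __init__(self, coarse_ppm=0.0, fine_limit_ppm=2.0, saturation_count=8):
        self.coarse_ppm = coarse_ppm
        self.fine_limit_ppm = fine_limit_ppm
        self.saturation_count = saturation_count
        self.saturated = 0

    def update(self, wanted_ppm):
        """wanted_ppm is the total correction the linear loop asks for;
        the return value is the correction actually applied."""
        fine = wanted_ppm - self.coarse_ppm
        limit = self.fine_limit_ppm
        clamped = max(-limit, min(limit, fine))
        if clamped != fine:
            self.saturated += 1      # fine loop is pinned at its limit
        else:
            self.saturated = 0
        if self.saturated >= self.saturation_count:
            # Persistent saturation: assume a real long-term change and
            # recentre the coarse estimate on what is being asked for.
            self.coarse_ppm = wanted_ppm
            self.saturated = 0
            return self.coarse_ppm
        return self.coarse_ppm + clamped


loop = TwoStageFrequency(coarse_ppm=10.0, saturation_count=3)
for wanted in [10.5, 500.0, 11.0, 40.0, 40.0, 40.0]:
    print(wanted, loop.update(wanted))
```

A single transient asking for 500 ppm moves the applied correction by at
most 2 ppm, while a sustained demand eventually shifts the coarse value.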

A real-life example of this is a CD drive, where fast, fine tracking
is applied to the read head itself, using voice coils, and longer-term
corrections are applied using the head positioner.
