[ntp:questions] Re: NTP stepping issue

David Woolley david at djwhome.demon.co.uk
Tue Oct 12 20:58:20 UTC 2004

In article <mailman.23.1097522252.72027.questions at lists.ntp.isc.org>,
Robert Rati <Robert.Rati at motorola.com> wrote:

> tinker panic 0

I assume that this is a typo and you meant "tinker step 0".
What I suspect this does is disable the 128 ms step check completely,
with the result that the normal control loop continues to be used,
but:

> The system repeatedly threw out
> frequency error 512 PPM exceeds tolerance 500 PPM

the phase error fed into the control loop will be very large, and the
proportional part of the feedback function will demand a large frequency
offset, which will hit the end stop.  The integral part will try to
increase the slew rate as well.  ntpd uses a proportional-integral control
loop (linear, to a first approximation), with phase error as the process
variable and frequency as the controlled parameter.  At least, that is
what RFC 1305 effectively says, although I'm having trouble finding the
integral part in the (oldish) version of the code I have to hand.
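As a rough illustration of the two mechanisms above, here is a minimal sketch, assuming a textbook PI loop rather than ntpd's actual code. The names `update` and `clamp`, the gains `kp` and `ki`, and the 64 s update interval are hypothetical; only the 128 ms threshold and the 500 PPM end stop come from the discussion.

```python
# Hypothetical sketch, not ntpd source: the step-or-slew decision followed by
# a textbook PI loop whose frequency output is clamped at +/-500 PPM.
# kp, ki and dt are assumed illustrative values, not ntpd constants.

STEP_THRESHOLD = 0.128   # seconds; "tinker step 0" would set this to 0
MAX_FREQ = 500e-6        # end stop: 500 PPM, in seconds per second


def clamp(value, limit):
    """Limit value to the range [-limit, +limit]."""
    return max(-limit, min(limit, value))


def update(offset, integral, step_threshold=STEP_THRESHOLD,
           kp=1e-3, ki=1e-6, dt=64.0):
    """Process one offset sample (reference minus local clock, in seconds).

    Returns (action, frequency, integral).  action is "step" when the offset
    exceeds an enabled (non-zero) step threshold, otherwise "slew".
    """
    if step_threshold > 0 and abs(offset) > step_threshold:
        # The clock would be stepped by `offset`; no frequency command.
        return "step", None, integral
    integral += ki * offset * dt                    # integral term accumulates
    freq = clamp(kp * offset + integral, MAX_FREQ)  # proportional + integral
    return "slew", freq, integral
```

With the threshold left at 128 ms, a 30 s offset is stepped; with `step_threshold=0` the same offset goes through the loop, the commanded frequency sits at the 500 PPM end stop, and the integral term keeps growing on every update while it is pinned there.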

> but the time was slowing being slewed correctly.  Unfortunately, it 
> never stopped being slewed.  The client slowly slewed to the time 
> provided by the server, and then right past it.  It went from being 30 

If it is behaving as described above, then, being a basically linear
system, it will perform a damped oscillation.  Presumably to improve
convergence speed, the loop is underdamped, so it will overshoot.  In your
case it won't be completely linear, because the frequency is hitting the end stop.
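The overshoot can be reproduced with a small simulation, assuming the clamped PI loop described above. The gains, the 64 s update interval and the function name `simulate` are hypothetical. In this sketch the overshoot comes from one plausible mechanism, integrator windup: the integral term keeps accumulating while the frequency is pinned at the end stop, so it is still large and positive when the offset reaches zero.

```python
# Hypothetical simulation of a clamped PI clock loop starting from a large
# offset.  Gains, update interval and step count are illustrative assumptions.

MAX_FREQ = 500e-6   # end stop: 500 PPM
KP = 1e-3           # proportional gain (per second), assumed
KI = 1e-6           # integral gain (per second squared), assumed
DT = 64.0           # seconds between updates, assumed


def simulate(initial_offset, steps):
    """Return the offset (reference minus local, seconds) after each update."""
    offset = initial_offset
    integral = 0.0
    history = []
    for _ in range(steps):
        integral += KI * offset * DT
        freq = max(-MAX_FREQ, min(MAX_FREQ, KP * offset + integral))
        # A positive frequency correction makes the local clock run fast,
        # which reduces a positive offset.
        offset -= freq * DT
        history.append(offset)
    return history


trace = simulate(30.0, 2000)
print(f"most negative offset: {min(trace):.1f} s")
```

Starting 30 s behind, the simulated clock slews at the full 500 PPM, passes zero and keeps going, which matches the "slewed right past it" symptom.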

> seconds behind to being 2 minutes fast and counting.  Is this an 
> expected behavior?  It would seem to me that once the NTP daemon on the 
> client reached the time being served by the server, that the daemon 
> would stop slewing and be in sync.

What you seem to want is a step mode that starts a slew, continues it
until enough time has passed to clear the error that existed at the
start of the slew (or possibly until the sign of the error changes),
and then reverts to the frequency in use just before the big correction
started.
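A minimal sketch of that kind of mode, assuming a fixed maximum slew rate of 500 PPM applied on top of the existing frequency correction. `SlewPlan`, `plan_slew` and `MAX_SLEW` are hypothetical names; this is not an existing ntpd feature. It times the slew from the initial error rather than re-measuring the offset.

```python
from dataclasses import dataclass

MAX_SLEW = 500e-6  # assumed maximum slew rate: 500 PPM


@dataclass
class SlewPlan:
    """Hypothetical plan for clearing an offset with a fixed-rate slew."""
    rate: float          # extra frequency applied during the slew (s/s)
    duration: float      # how long to apply it (s)
    restore_freq: float  # frequency to revert to afterwards (s/s)


def plan_slew(offset, current_freq, max_slew=MAX_SLEW):
    """Slew just long enough to clear `offset`, then revert to current_freq.

    offset is reference minus local clock, in seconds.
    """
    if offset == 0:
        return SlewPlan(0.0, 0.0, current_freq)
    rate = max_slew if offset > 0 else -max_slew
    duration = abs(offset) / max_slew
    return SlewPlan(rate, duration, current_freq)


plan = plan_slew(30.0, 12e-6)
print(f"slew at {plan.rate * 1e6:+.0f} PPM for {plan.duration / 3600:.1f} h, "
      f"then return to {plan.restore_freq * 1e6:.0f} PPM")
```

At 500 PPM a 30 s error takes 60000 s (about 16.7 hours) to clear. Stopping when the measured offset changes sign, the other variant mentioned above, would need a feedback check instead of a timer.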

> Also, is the ntp daemon supposed to be able to handle a time difference 
> of 30 seconds without stepping (ie only slowly slewing to correct the 
> time difference)?

Any large correction is an extreme error condition, under which normal
operation has broken down and the presumption is that the world is
falling to pieces.  I think you can safely assume that ntpd is not designed
to cope with phase errors large enough to drive the frequency to the end
stop.  Assuming an accurate crystal, that means errors of rather less than
1/2000th (500 PPM) of the time to the first zero crossing: in the linear
region the slew accelerates and then slows again, so its peak rate is
higher than its average.  RFC 1305 suggests 39 minutes to the first zero
crossing for a 100 ms error.  Assuming that behaviour is linear, 1/2000 of
39 minutes is just over a second (about 1.17 s), so I would suggest that it
is probably undesirable to set the clock out by more than about half a
second, and certainly by no more than one second.
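The arithmetic behind that limit can be checked with a few lines, assuming, as above, a linear loop in which the time to the first zero crossing does not depend on the size of the error. Only the 39 minute and 500 PPM figures come from the text; `average_slew` and `T_CROSS` are illustrative names.

```python
MAX_FREQ = 500e-6      # 500 PPM end stop, i.e. 1/2000
T_CROSS = 39 * 60.0    # seconds to first zero crossing (RFC 1305, 100 ms error)


def average_slew(error_s, t_cross=T_CROSS):
    """Average slew rate needed to remove error_s by the first zero crossing."""
    return error_s / t_cross


print(f"100 ms error: {average_slew(0.1) * 1e6:.1f} PPM average")

# In a linear loop the average rate scales with the error, so it reaches the
# end stop when error = T_CROSS * MAX_FREQ.  The peak rate is higher than the
# average, so the practical limit is lower still.
limit = T_CROSS * MAX_FREQ
print(f"average reaches the end stop at about {limit:.2f} s")
```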

Errors bigger than this will cause the maximum root distance to be
exceeded if all the upstream servers are working correctly.  You should be
able to ignore the variations between different reference clocks, which
means that the dynamic range is adequate for valid use.  Obviously, if you
don't have real reference clocks, you will be advertising false accuracy
information about clocks that could move by more than this, but that isn't
a scenario ntpd is designed for.

> In addition to that, can the ntp daemon handle the time being changed on 
> the local system (ie via the date command) while the daemon is running?

There is no requirement to be able to support that, as it will never happen
on a system that is capable of operating NTP properly.  In practice, it will
be perceived as a serious problem with the upstream time sources.
