[ntp:questions] Server offset included in served time?

David Woolley david at ex.djwhome.demon.co.uk.invalid
Mon Sep 15 20:41:46 UTC 2008


Martin Burnicki wrote:

> 
> But what about the behaviour shortly after startup? The NTP daemon tries to
> determine the initial time offset from its upstream sources. Unless that
> initial offset exceeds the 128 ms limit it starts to slew its system time
> *very* slowly until the frequency drift has been compensated and the
> estimated time offset has been minimized.

I've had some thoughts about this.  As I see it, the problems are:

- ntpd doesn't have any persistent history of jitter, so it has to 
start by assuming that the jitter is of the same order of magnitude as 
the offset (what people looking at offset often forget is that they 
have the benefit of hindsight).

- ntpd is already at the shortest permitted time constant, and going 
lower would require faster polling, compromising the level of 
oversampling, or shortening the initial best-measurement filter.  It is 
this lower bound on the time constant that means that ntpd can get into 
a position where it should know that the time is wrong, but cannot fix 
it quickly.

- the step limit is fixed at configuration time (the decision it 
controls is sketched below).
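
For reference, the decision that limit controls amounts to roughly the 
following (a minimal sketch in C; the function name and the bare 
comparison are mine, not the actual ntp_loopfilter.c logic):

#include <math.h>

#define STEP_THRESHOLD 0.128    /* seconds; fixed when ntpd is configured */

/* Return nonzero if the measured offset should be stepped rather than
 * slewed.  Anything at or below the threshold is slewed, which is why
 * a sub-128 ms offset can persist for quite a while at startup. */
static int should_step(double offset_s)
{
    return fabs(offset_s) > STEP_THRESHOLD;
}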

One could deal with the first by making the smoothed jitter 
persistent.  That way ntpd can detect whether its offsets exceed 
reasonable jitter for the system before it has accumulated enough 
measurements in the current session to estimate the jitter directly.
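
A minimal sketch of what that persistence might look like, by analogy 
with the existing drift file (the file name, format, and function names 
are my invention):

#include <stdio.h>

#define JITTER_FILE "/var/lib/ntp/ntp.jitter"  /* hypothetical path */

/* Record the exponentially smoothed jitter (seconds) as it is updated,
 * much as ntpd already records the frequency in the drift file. */
static int save_jitter(double jitter_s)
{
    FILE *fp = fopen(JITTER_FILE, "w");

    if (fp == NULL)
        return -1;
    fprintf(fp, "%.9f\n", jitter_s);
    return fclose(fp);
}

/* Recover the previous session's jitter at startup, so that early
 * offsets can be judged against it rather than assumed comparable. */
static int load_jitter(double *jitter_s)
{
    FILE *fp = fopen(JITTER_FILE, "r");

    if (fp == NULL)
        return -1;  /* no history: fall back to current behaviour */
    if (fscanf(fp, "%lf", jitter_s) != 1) {
        fclose(fp);
        return -1;
    }
    return fclose(fp);
}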

Once one knows that offsets are high compared with jitter, one can 
address the time constant issue.  Normally jitter << offset would tend 
to force the time constant down, but it has nowhere further to go. 
Maybe what is needed is to allow the degree of oversampling to be 
compromised until one first begins to get offsets of the same order as 
the jitter.  Maybe also use fewer than eight filter slots.
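
Something along these lines, say, where the factor of four and the 
slot counts are illustrative guesses rather than anything from the 
ntpd sources:

#include <math.h>

/* Choose the clock filter depth: while the offset still dwarfs the
 * (persisted) jitter, use a short best-of filter so measurements take
 * effect sooner; revert to the normal eight slots once offsets are of
 * the same order as the jitter. */
static int filter_slots(double offset_s, double jitter_s)
{
    if (fabs(offset_s) > 4.0 * jitter_s)
        return 2;   /* startup: accept reduced oversampling */
    return 8;       /* normal clock filter depth */
}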

This may compromise the stability of downstream systems, so it may be 
necessary to stay in an alarm state until this stage of the process is 
complete.  That may be a problem for people who want a whole network to 
power up at the same time, and quickly.

If there were also a persistent record of a high percentile figure for 
the offset, one could use that to set the step threshold during the 
startup phase, maybe reverting to the standard value later to give 
better tolerance of major network problems.
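
As a sketch, with the percentile, the headroom factor, and the names 
all assumptions on my part:

#define DEFAULT_STEP_THRESHOLD 0.128    /* seconds; the usual fixed limit */

/* offset_p99_s: persisted high-percentile |offset| from earlier sessions.
 * in_startup:   nonzero until the loop has settled. */
static double step_threshold(double offset_p99_s, int in_startup)
{
    if (in_startup && offset_p99_s > 0.0) {
        /* During startup, step anything clearly outside historical
         * experience; afterwards revert to the standard 128 ms so that
         * major network problems are still tolerated. */
        double t = 2.0 * offset_p99_s;
        return (t < DEFAULT_STEP_THRESHOLD) ? t : DEFAULT_STEP_THRESHOLD;
    }
    return DEFAULT_STEP_THRESHOLD;
}
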
> 
> While the system time is being slewed it may be e.g. 120 ms off, and when
> the daemon sends the system time to its clients then it will serve a time
> which is 120 ms off.

To some extent, the fact that systems are already experiencing this 
suggests, to me, that one might not need to alarm the time during a 
temporary phase of short loop time constants at startup.



