[ntp:questions] Re: ntpd PLL and clock overshoot

Richard B. Gilbert rgilbert88 at comcast.net
Sat Oct 14 18:34:50 UTC 2006


David Woolley wrote:

> In article <egj4uk$a18$1 at scrotar.nss.udel.edu>,
> user at domain.invalid (probably David Mills with an IT department that is 
> overzealous about preventing spam) wrote:
> 
> 
>>The modern NTP feedback loop is much more intricate than you report. It 
>>is represented as a hybrid phase/frequency feedback loop with a 
> 
> 
> There may be various finesses, but it is still the essentially analogue
> nature of the process that causes people to complain about overshoots
> and runaway frequency excursions.
> 
> 
>>state-machine driven initial frequency measurement. Details are in the 
> 
> 
> As I understand it, the initial frequency measurement is only applied
> when cold started (no ntp.drift). Moreover, the perceived problem being
> reported here is about the initial phase correction.  It is normal
> to have to make phase corrections many times the mean phase error
> on a restart, even though it isn't normal to have to do a significant
> frequency correction.
> 
> 
>>There are lots of nasty little approximations in the PLL/FLL code due to 
>>imprecise measurement of some time intervals. While the design target for 
>>overshoot is 5-6 percent, I would not be surprised if in some cases it 
>>is 10 percent.
> 
> 
> I think the problem here is that a human trying to manually control the
> effective frequency might have overshot by only 0.1%.  They would have
> slewed the phase in at the maximum acceptable rate and then made a
> step change in frequency at the moment they crossed a measured phase
> error of zero, stepping by minus the average rate of phase change 
> during the slew in.  Only then would they start operating anything 
> like the current algorithm.
> 
> What they are seeing is 10% of the original error after about an hour,
> when they know that they could have achieved 0.1% in under 10 minutes,
> assuming a 500ppm slew rate limit.  (They'd probably need some automation
> to time the transition accurately enough to get to 100 microseconds, as
> assumed here.)
> 
> The best way of implementing this is probably to provide the system with
> memory about the likely phase measurement noise, but a simpler approach
> of detecting the first zero crossing would probably work quite well.
> 
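
The figures quoted above can be checked with a little arithmetic. The
sketch below is only an illustration, not anything taken from ntpd: the
function names are made up, and the 100 ms initial offset is inferred
from the statement that 100 microseconds corresponds to 0.1%. It
computes how long a fixed-rate slew takes, how accurately the end of the
slew has to be timed, and where a series of measured offsets first
crosses zero.

```python
def slew_seconds(offset_s, slew_ppm=500.0):
    """Seconds needed to remove offset_s by slewing at a fixed slew_ppm."""
    return abs(offset_s) / (slew_ppm * 1e-6)


def timing_window_seconds(residual_s, slew_ppm=500.0):
    """How closely the end of the slew must be timed to leave at most residual_s."""
    return abs(residual_s) / (slew_ppm * 1e-6)


def first_zero_crossing(offsets):
    """Index of the first sample that is zero or has changed sign relative
    to the first sample; None if the offset never crosses zero."""
    if not offsets:
        return None
    started_positive = offsets[0] > 0
    for i, off in enumerate(offsets):
        if off == 0 or (off > 0) != started_positive:
            return i
    return None


if __name__ == "__main__":
    initial_offset = 0.100   # assumed: 100 us is 0.1% of this
    residual = 100e-6        # the 100 microsecond target from the post
    print("slew time (s):", slew_seconds(initial_offset))
    print("timing window (s):", timing_window_seconds(residual))
    # Hypothetical measured offsets in seconds, one per poll.
    print("zero crossing at sample:",
          first_zero_crossing([0.100, 0.050, 0.010, -0.002, -0.001]))
```

Under these assumptions the slew takes roughly 200 seconds, comfortably
under 10 minutes, and the transition has to be timed to within about
0.2 seconds, which is consistent with the remark that some automation
would be needed. Real offset measurements are noisy, so a plain
sign-change test like this one could trigger early.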

I believe that Dave Mills has already explained that the problem is due 
to changes in the adjtime() routine in both Sun Solaris and Linux. 
This being the case, the choices would seem to be:
a. Live with it.
b. Get Sun and the Linux developers to back out the change to adjtime() 
that broke ntpd.
c. Provide a custom adjtime() for each platform affected.  I suspect 
that the routine in question runs in kernel mode and may be part of the 
kernel so that this may be easier said than done!



