[ntp:questions] Re: ntpd PLL and clock overshoot

David L. Mills mills at udel.edu
Sat Oct 14 16:10:19 UTC 2006


You are the victim of faulty engineering intuition. See Chapter 4 in The
Book, and the graphs therein showing the response of the current
algorithm to initial phase/frequency excursions, compiled both in
simulation and in practice. If your experience is markedly different from
these data, then suspect something in the operating system, in 
particular unexpected behavior in the adjtime() system call.
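
A minimal sketch of why intuition misleads here. It assumes a generic discrete-time type-II loop, not ntpd's actual code: each update removes a fraction kp of the phase error and integrates a fraction ki of it into a frequency correction. The names simulate_loop and overshoot_percent, and the gain values, are hypothetical. The frequency correction is minus ki times the running sum of the phase error, and it must settle back to zero when there is no intrinsic frequency error. That sum must therefore return to zero as well, so after a pure phase step the error has to swing past zero. The gains only set how far.

```python
def simulate_loop(phase0, drift, kp, ki, steps):
    """Simulate a generic discrete-time type-II phase/frequency loop.

    phase0: initial phase error (seconds)
    drift:  intrinsic frequency error, as seconds gained per update
    kp:     hypothetical phase gain, fraction of the error removed per update
    ki:     hypothetical frequency gain, fraction of the error integrated
            into the frequency correction per update
    Returns the phase error after each update.
    """
    phase = phase0
    freq_corr = 0.0  # accumulated frequency correction, seconds per update
    history = []
    for _ in range(steps):
        # Frequency path: integrate the measured phase error.
        freq_corr -= ki * phase
        # Phase path: the clock drifts at its intrinsic rate plus the
        # frequency correction, and a fraction kp of the error is removed.
        phase += drift + freq_corr - kp * phase
        history.append(phase)
    return history


def overshoot_percent(history, phase0):
    """Largest excursion past zero, as a percentage of the initial error."""
    if phase0 == 0:
        raise ValueError("phase0 must be nonzero")
    worst = min(history) if phase0 > 0 else max(history)
    return max(0.0, -worst / phase0 * 100.0)


if __name__ == "__main__":
    for ki in (0.0025, 0.001):
        errors = simulate_loop(0.1, 0.0, kp=0.1, ki=ki, steps=1000)
        print(f"kp=0.1 ki={ki}: overshoot {overshoot_percent(errors, 0.1):.1f}%")
```

With kp = 0.1, the ki = 0.0025 case overshoots by roughly 14% and the more heavily damped ki = 0.001 case by roughly 7%. Neither figure is meant to reproduce the overshoot target of ntpd's own design. With a nonzero drift, the same loop still drives the phase error to zero, because the frequency correction settles at minus the drift.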

Your scenario where the operator slings the frequency as the response 
crosses zero is equivalent to a frequency-lock model which disregards 
the initial phase error. This is in fact the model for the initial 
frequency estimate when the frequency file has not yet been created. 
This is most important when the initial poll interval is very long, as 
it must be with the telephone modem driver.
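
The frequency-lock idea can be sketched as follows. It assumes two offset measurements of the local clock against a reference, taken some interval apart, and an assumed sign convention in which a positive, growing offset means the local clock runs fast. The function name is hypothetical and this is not ntpd's initial-frequency state machine. It only shows why the initial phase error drops out: the estimate depends solely on the change in offset.

```python
def frequency_lock_estimate(t1, offset1, t2, offset2):
    """Estimate a clock's frequency error from two offset measurements.

    t1, t2:           measurement times (seconds), t2 later than t1
    offset1, offset2: measured clock offsets at those times (seconds)
    Returns the frequency error as a dimensionless ratio; multiply by 1e6
    for ppm. Only the change in offset is used, so any constant initial
    phase error cancels out.
    """
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    return (offset2 - offset1) / (t2 - t1)


if __name__ == "__main__":
    est = frequency_lock_estimate(0.0, 0.250, 900.0, 0.295)
    # Same clock, but with an extra constant 1 s phase error in both samples.
    shifted = frequency_lock_estimate(0.0, 1.250, 900.0, 1.295)
    print(f"{est * 1e6:.1f} ppm, shifted {shifted * 1e6:.1f} ppm")
```

For example, offsets of 250 ms and 295 ms measured 900 s apart give 45 ms / 900 s = 50 ppm. Adding the same constant to both offsets leaves that result unchanged.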

With all of the machines here, including FreeBSD, Solaris, HP-UX, SunOS
and Tru64, the loop response in steady state is as I reported
earlier. The results with Linux are highly suspect, as at least in some 
cases the timer interrupt frequency has been changed significantly 
without compensation in the kernel parameters. I have recommended
avoiding Linux in any case involving precision timekeeping.


David Woolley wrote:
> In article <egj4uk$a18$1 at scrotar.nss.udel.edu>,
> user at domain.invalid (probably David Mills with an IT department that is 
> overzealous about preventing spam) wrote:
>>The modern NTP feedback loop is much more intricate than you report. It 
>>is represented as a hybrid phase/frequency feedback loop with a 
> There may be various finesses, but it is still the essentially analogue
> nature of the process that causes people to complain about overshoots
> and runaway frequency excursions.
>>state-machine driven initial frequency measurement. Details are in the 
> As I understand it, the initial frequency measurement is only applied
> when cold started (no ntp.drift). Moreover, the perceived problem being
> reported here is about the initial phase correction.  It is normal
> to have to make phase corrections many times the mean phase error
> on a restart, even though it isn't normal to have to do a significant
> frequency correction.
>>There are lots of nasty little approximations in the PLL/FLL code due to 
>>imprecise measurement of some time intervals. While the design target for
>>overshoot is 5-6 percent, I would not be surprised if in some cases it 
>>is 10 percent.
> I think the problem here is that a human trying to manually control the
> effective frequency might have overshot by only 0.1%.  They would have
> slewed the phase in at the maximum acceptable rate and then made a
> step change in frequency at the moment they crossed a measured phase
> error of zero, stepping by minus the average rate of phase change 
> during the slew in.  Only then would they start operating anything 
> like the current algorithm.
> What they are seeing is 10% of the original error after about an hour,
> when they know that they could have achieved 0.1% in under 10 minutes,
> assuming a 500ppm slew rate limit.  (They'd probably need some automation
> to time the transition accurately enough to get to 100 microseconds, as
> assumed here.)
> The best way of implementing this is probably to provide the system with
> memory about the likely phase measurement noise, but a simpler approach
> of detecting the first zero crossing would probably work quite well.
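
The figures in the quoted message can be checked directly. If 100 microseconds is 0.1% of the original error, the initial error is 100 ms, and slewing that out at 500 ppm takes 0.1 s / 500e-6 = 200 s, comfortably under 10 minutes. The "slew, then step the frequency at the first zero crossing" scheme can be simulated as below. Everything here is a hypothetical illustration: the function name, the 50 ppm intrinsic error, the measurement interval dt, and the sign convention (positive phase means the local clock is ahead). The controller logic uses only the measured phase; freq_err is used solely to simulate the clock.

```python
def slew_then_lock(phase0, freq_err, max_slew=500e-6, dt=1.0, max_steps=1_000_000):
    """Simulate a hypothetical 'slew, then step frequency at zero crossing' scheme.

    phase0:   initial phase error (s); positive means the clock is ahead
    freq_err: intrinsic frequency error (s/s), unknown to the controller
    max_slew: maximum correction rate (s/s); 500 ppm as in the post
    dt:       interval between phase measurements (s)
    Returns (elapsed_time, residual_phase, new_correction).
    """
    if phase0 == 0:
        raise ValueError("phase0 must be nonzero")
    # Slew toward zero at the maximum permitted rate.
    correction = -max_slew if phase0 > 0 else max_slew
    phase = phase0
    for step in range(1, max_steps + 1):
        previous = phase
        # Simulated clock: measured phase moves at intrinsic rate plus correction.
        phase += (freq_err + correction) * dt
        if (previous > 0) != (phase > 0):  # first zero crossing detected
            elapsed = step * dt
            # Average observed rate of phase change during the slew-in.
            avg_rate = (phase - phase0) / elapsed
            # Step the correction by minus that rate: the slew term cancels
            # and what remains offsets the intrinsic frequency error.
            correction -= avg_rate
            return elapsed, phase, correction
    raise RuntimeError("no zero crossing; |freq_err| may exceed max_slew")


if __name__ == "__main__":
    for dt in (1.0, 0.1):
        t, resid, corr = slew_then_lock(0.1, 50e-6, dt=dt)
        print(f"dt={dt} s: crossing at {t:.1f} s, residual {resid * 1e6:.0f} us, "
              f"correction {corr * 1e6:.1f} ppm")
```

Because the phase changes at a constant rate during the slew-in, the measured average rate equals the intrinsic error plus the slew. Stepping the correction by minus that rate therefore leaves exactly minus the intrinsic error. The residual phase is whatever was overshot within the last measurement interval: a few hundred microseconds with 1 s between measurements, and under 100 microseconds with 0.1 s. This matches the point about needing automation to time the transition. The sketch ignores measurement noise, which is what the suggested memory of phase-measurement noise would have to handle.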
