[ntp:hackers] Does ntpd need to whine more ?

David L. Mills mills at udel.edu
Mon Oct 3 15:41:53 UTC 2005


Poul-Henning Kamp wrote:

> In message <43413DB6.308 at udel.edu>, "David L. Mills" writes:
>
>> There is a fundamental misunderstanding here.
>
>
> Agreed, but we may not agree what the misunderstanding is.
>
>> There is a fundamental misunderstanding here. The clock discipline is in
>> fact a flywheel which is nudged at each poll update to correct the time
>> and update the frequency estimate. If you stop nudging it for a while it
>> may accumulate error, but not much. How long should you wait before
>> declaring unsynchronized?
>
>
> I don't think it is unreasonable to expect people to have a plain
> XO (unless they tell NTPD otherwise) and therefore few systems
> actually have a recoverable offset after one day on the island.
>
> And expecting to recapture with a poll of 1024 after free-wheeling
> for a day is waaaay more optimistic than 25 cent XO's deserve.
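
The "day on the island" point can be checked with simple arithmetic. A free-running clock with a residual frequency error of f ppm drifts f * 1e-6 * t seconds in t seconds. The sketch below assumes hypothetical residual errors of 0.5, 1 and 2 ppm that stay constant (no aging, no temperature swings, which are usually worse). It compares the result with ntpd's default 128 ms step threshold.

```python
STEP_THRESHOLD_S = 0.128  # ntpd's default step threshold (128 ms)

def holdover_error_s(residual_ppm, elapsed_s):
    """Offset accumulated by a free-running clock with a constant
    residual frequency error (simplified: no aging, no temperature)."""
    return residual_ppm * 1e-6 * elapsed_s

def needs_step(residual_ppm, elapsed_s):
    """True if the accumulated offset exceeds the step threshold."""
    return abs(holdover_error_s(residual_ppm, elapsed_s)) > STEP_THRESHOLD_S

if __name__ == "__main__":
    day = 86400
    for ppm in (0.5, 1.0, 2.0):  # hypothetical residual frequency errors
        err_ms = holdover_error_s(ppm, day) * 1000
        print(f"{ppm} ppm for one day: {err_ms:.1f} ms, step: {needs_step(ppm, day)}")
```

Under these assumptions, a 1 ppm residual error stays just under the threshold after one day (86.4 ms). A 2 ppm error (172.8 ms) already forces a step.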
>
> I would say that once the shift register runs dry, we should
> reduce the poll interval (if minpoll allows) for every empty shift
> register we see.
>
> That way you should have a scenario like this (times in seconds):
>
> 0 poll = 1024 shift=11111111
> 1024 poll = 1024 shift=11111110
> 2048 poll = 1024 shift=11111100
> 3072 poll = 1024 shift=11111000
> 4096 poll = 1024 shift=11110000
> 5120 poll = 1024 shift=11100000
> 6144 poll = 1024 shift=11000000
> 7168 poll = 1024 shift=10000000
> 8192 poll = 1024 shift=00000000, reduce poll, start timer 512 * 8
> 12288 poll = 512 shift=00000000, reduce poll, start timer 256 * 8
> 14336 poll = 256 shift=00000000, reduce poll, start timer 128 * 8
> 15360 poll = 128 shift=00000000, reduce poll, start timer 64 * 8
> 15872 poll = 64 shift=00000000 at minpoll, do nothing
>
> That way we are back to a 64 s poll interval after 4h24m, and that
> sounds very compatible with typical XO performance.
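
The schedule above can be reproduced with a short simulation. This is a sketch of the proposal as read from the table, not code from ntpd. It rests on several assumptions:

* an 8-bit reach register that shifts in a 0 for every unanswered poll
* a starting interval of 1024 s and a minpoll of 64 s
* "start timer N * 8" meaning: wait eight of the new, halved intervals before the next check

The function name is hypothetical.

```python
def backoff_schedule(start_poll=1024, minpoll=64, bits=8):
    """Return (time_s, poll_s, reach) rows for a server that stops
    answering right after t = 0 (hypothetical model of the proposal)."""
    mask = (1 << bits) - 1
    t, poll, reach = 0, start_poll, mask
    rows = [(t, poll, reach)]
    # Keep polling at the current interval until the register is empty.
    while reach:
        t += poll
        reach = (reach << 1) & mask  # shift in a 0 for the missed reply
        rows.append((t, poll, reach))
    # Then halve the interval and wait `bits` new intervals, down to minpoll.
    while poll > minpoll:
        poll //= 2
        t += poll * bits
        rows.append((t, poll, reach))
    return rows

if __name__ == "__main__":
    rows = backoff_schedule()
    for t, poll, reach in rows:
        print(f"{t:6d} poll = {poll:4d} shift={reach:08b}")
    h, rem = divmod(rows[-1][0], 3600)
    print(f"back at minpoll after {h}h{rem // 60}m{rem % 60}s")
```

The last row lands at 15872 s, which is 4h24m32s and matches the figure quoted above.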
>
> In general the majority of NTPD synchronized machines suffer from
> diurnal wobble, so even 12 hours wouldn't be unreasonable.
>
>> The clock discipline algorithm is very good at estimating the optimum
>> time constant.
>
>
> Actually it isn't.
>
> It is far too eager to wander up to 1024, and due to the time delay
> of the shift register it takes ages to find out it has gone too
> far, so it usually ends up stepping the clock to get back in sync.
>
> The worst-case situation is actually incredibly common: you wander
> up to 1024, and temperature changes, so your offset grows. The
> shift register fills up with monotonically increasing offsets and
> we get a systematic delay of [3...4] x 1024 seconds before the PLL
> ever hears about the existence of the offset.
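
Here is one simple way to see where a delay of three to four poll intervals can come from. NTP's clock filter keeps the last eight samples and prefers the one with the lowest round-trip delay. If delays are independent and identically distributed, the minimum is equally likely to sit in any of the eight stages. The selected sample is then on average 3.5 polls old, about 3600 s at a 1024 s poll.

The Monte Carlo sketch below assumes exactly that simplified model, with uniform delays between 10 and 50 ms (hypothetical values). It ignores the other rules ntpd applies to the filter output.

```python
import random

STAGES = 8  # depth of the filter register, as in NTP's clock filter

def mean_selected_age(trials=20000, seed=1):
    """Average age, in polls, of the minimum-delay sample in a full
    8-stage register when network delays are i.i.d. (simplified model)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # index 0 is the newest sample, index STAGES - 1 the oldest
        delays = [rng.uniform(0.010, 0.050) for _ in range(STAGES)]
        age = min(range(STAGES), key=delays.__getitem__)
        total += age
    return total / trials

if __name__ == "__main__":
    age = mean_selected_age()
    print(f"mean age {age:.2f} polls = {age * 1024:.0f} s at poll 1024")
```

With a steadily growing offset, this means the offset handed to the PLL is, on average, the one from roughly an hour earlier.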
>
> Iburst mode is certainly a big improvement, but it is not widely
> used yet.
>
>> You are invited to concoct
>> counterexamples, but I will believe them only if confirmed by actual
>> scenarios in vivo or better yet in simulation.
>
>
> My first and primary beef is that we do not whine loudly when we
> have lost reachability, no matter how long this has been going on.
>
> Can't we at least agree that after being unreachable for N hours
> we should syslog something rather severe?
>
> I'd propose 24 for N, but even 168 (one week) will improve on the current
> situation where people have no inkling that their system has
> wandered off into the sunset.
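
The proposed warning can be expressed as a small state machine:

1. Remember when the last reply arrived.
2. Raise one severe message once the silence exceeds N hours.
3. Re-arm only after a reply comes back.

This is a hypothetical sketch, not ntpd code. It uses Python's standard `logging` module at CRITICAL level as a stand-in for syslog, and the class and logger names are made up.

```python
import logging

log = logging.getLogger("unreach-sketch")  # hypothetical logger name

class UnreachableAlarm:
    """Warn once when a source has been silent for threshold_h hours."""

    def __init__(self, threshold_h=24, start_s=0):
        self.threshold_s = threshold_h * 3600
        self.last_reply_s = start_s  # time of the most recent reply
        self.alarmed = False         # suppress repeats until a reply arrives

    def update(self, now_s, got_reply):
        """Feed one poll result; return True if the alarm fired just now."""
        if got_reply:
            self.last_reply_s = now_s
            self.alarmed = False
            return False
        silent_s = now_s - self.last_reply_s
        if not self.alarmed and silent_s >= self.threshold_s:
            self.alarmed = True
            log.critical("source unreachable for %.1f hours", silent_s / 3600)
            return True
        return False

if __name__ == "__main__":
    alarm = UnreachableAlarm(threshold_h=24)
    fired = [t for t in range(1024, 200 * 1024 + 1, 1024) if alarm.update(t, False)]
    print(fired)  # prints [87040]
```

With polls every 1024 s and N = 24, the alarm fires once, at 87040 s. It then stays quiet until the source answers again. Choosing N = 168 only changes the threshold.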
>
>> The local clock is a terrible idea, unless its only purpose is to
>> wrangle a herd to a common timescale in response to a loss of outside
>> synchronization.
>
>
> Agreed. I believe some OS bogusly ships with a stratum 11 localclock
> and whoever decided that should be forced to polish the hands of
> Big Ben until he or it wears out.
>
> But in this case, localclock only obscures the problem, it is not
> the basic problem.
>
