[ntp:hackers] Does ntpd need to whine more ?

Danny Mayer mayer at ntp.isc.org
Mon Oct 3 19:13:13 UTC 2005


Poul-Henning Kamp wrote:
> In message <43413DB6.308 at udel.edu>, "David L. Mills" writes:
> 
> 
>>There is a fundamental misunderstanding here.
> 
> 
> Agreed, but we may not agree what the misunderstanding is.
> 
> 
>>There is a fundamental misunderstanding here. The clock discipline is in 
>>fact a flywheel which is nudged at each poll update to correct the time 
>>and update the frequency estimate. If you stop nudging it for awhile it 
>>may accumulate error, but not much. How long should you wait before 
>>declaring unsynchronized?
> 
> 
> I don't think it is unreasonable to expect people to have a plain
> XO (unless they tell NTPD otherwise) and therefore few systems
> actually have a recoverable offset after one day on the island.
> 
> And expecting to recapture with a poll of 1024 after free-wheeling
> for a day is waaaay more optimistic than 25 cent XO's deserve.
> 
> I would say that once the shift register runs dry, we should
> reduce the poll rate (if minpoll allows) for every empty shift
> register we see:
> 
> That way you should have a scenario like:
> 
> 0	poll = 1024 shift=11111111
> 1024	poll = 1024 shift=11111110
> 2048	poll = 1024 shift=11111100
> 3072	poll = 1024 shift=11111000
> 4096	poll = 1024 shift=11110000
> 5120	poll = 1024 shift=11100000
> 6144	poll = 1024 shift=11000000
> 7168	poll = 1024 shift=10000000
> 8192	poll = 1024 shift=00000000, reduce poll, start timer 512 * 8
> 12288	poll = 512  shift=00000000, reduce poll, start timer 256 * 8
> 14336	poll = 256  shift=00000000, reduce poll, start timer 128 * 8
> 15360	poll = 128  shift=00000000, reduce poll, start timer 64 * 8
> 15872	poll = 64   shift=00000000  at minpoll, do nothing
> 
> That way we are back to 64s poll rate after 4h24m and that sounds
> very compatible with typical XO performance.
> 

I disagree with this, unless I misunderstood what you are suggesting.
The poll interval should not change as long as the server does not
respond to NTP packets. When the first reply does come back, you
should look at how long the server went without responding and then
decide by how much, if at all, to change the poll interval. If it has
never responded, we are presumably at minpoll; otherwise the system is
probably stable anyway and doesn't need to change the poll frequency
much to assure itself that it has good statistics for that server. We
have far too many misimplementations of NTP which actually increase
their poll frequency when they don't get a response.
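
For reference, the schedule quoted above can be reproduced with a small
simulation. This is only a sketch of the quoted proposal, not ntpd code
and not the behaviour argued for in this reply. The function name
backoff_schedule and its parameters are made up for illustration; the
8-bit register and the 1024 s / 64 s limits are taken from the quoted
table.

```python
def backoff_schedule(poll=1024, minpoll=64, register_bits=8):
    """Return (time_s, poll_s, shift_register) rows for the quoted proposal.

    Hypothetical helper for illustration only; not part of ntpd.
    """
    rows = []
    t = 0
    mask = (1 << register_bits) - 1
    reg = mask  # all ones: every recent poll was answered

    # Phase 1: the server stops answering, so each poll shifts in a zero.
    while True:
        rows.append((t, poll, format(reg, f"0{register_bits}b")))
        if reg == 0:
            break
        t += poll
        reg = (reg << 1) & mask

    # Phase 2: the register is empty. Halve the poll interval, wait for
    # one more empty register (register_bits polls), repeat until minpoll.
    while poll > minpoll:
        poll //= 2
        t += poll * register_bits
        rows.append((t, poll, format(reg, f"0{register_bits}b")))
    return rows


if __name__ == "__main__":
    for time_s, poll_s, reg in backoff_schedule():
        print(f"{time_s}\tpoll = {poll_s:<4} shift={reg}")
```

The last row comes out at 15872 s, which is the 4h24m mentioned in the
quote.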

The other related issue is KOD packets. A KOD packet should cause the 
client to immediately back off to maxpoll, and if there are 3 KODs in 
succession it should unceremoniously drop that server as a source of 
chimes. Kind of like "3 death chimes and you're out".
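
A minimal sketch of that KOD rule, assuming a hypothetical per-server
Association object (the class, its fields and the on_packet method are
invented for illustration and do not correspond to ntpd's actual data
structures). It also assumes that any non-KOD reply resets the count,
which is one reading of "in succession".

```python
class Association:
    """Hypothetical per-server state; not ntpd's real structure."""

    def __init__(self, minpoll=64, maxpoll=1024, kod_limit=3):
        self.poll = minpoll          # current poll interval in seconds
        self.maxpoll = maxpoll
        self.kod_limit = kod_limit   # consecutive KODs before dropping
        self.kod_streak = 0
        self.dropped = False

    def on_packet(self, is_kod):
        """Update state for one received packet."""
        if self.dropped:
            return
        if not is_kod:
            # Assumption: a normal reply breaks the run of KODs.
            self.kod_streak = 0
            return
        self.kod_streak += 1
        self.poll = self.maxpoll     # back off immediately
        if self.kod_streak >= self.kod_limit:
            self.dropped = True      # stop using this server
```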

> My first and primary beef is that we do not whine loudly when we
> have lost reachability, no matter how long this has been going on.
> 
> Can't we at least agree that after being unreachable for N hours
> we should syslog something rather severe ?
> 
> I'd propose 24 for N, but even 168 will improve on the current
> situation where people have no inkling that their system has
> wandered off into the sunset.
> 
Yes, we should log this, but the main question is how often. We don't 
want to do it too often, otherwise we clog up the syslog with a lot of 
unnecessary messages.
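
One possible answer, purely as a sketch: warn once after N hours of
unreachability and then double the interval between repeats, so that a
long outage produces only a handful of messages. The class name
UnreachableWarner, the doubling policy and the default of N = 24 hours
(the value proposed in the quote) are all assumptions for illustration;
nothing here is existing ntpd behaviour.

```python
class UnreachableWarner:
    """Decide when to log an 'unreachable' warning (hypothetical helper)."""

    def __init__(self, first_after_s=24 * 3600):
        self.first_after_s = first_after_s
        self.next_due_s = first_after_s

    def should_warn(self, unreachable_for_s):
        """Return True if a warning should be logged now."""
        if unreachable_for_s < self.next_due_s:
            return False
        # Push the next warning out past the current outage length,
        # doubling each time, so repeats get progressively rarer.
        while self.next_due_s <= unreachable_for_s:
            self.next_due_s *= 2
        return True

    def reset(self):
        """Call when the server becomes reachable again."""
        self.next_due_s = self.first_after_s
```

Checked once an hour, this would warn at 24, 48, 96 and 192 hours into
an outage and stay quiet in between.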

Danny

