[ntp:questions] Thoughts on huff and puff

David Woolley david at ex.djwhome.demon.co.uk.invalid
Sat Oct 11 13:56:08 UTC 2008


David L. Mills wrote:
> 
> The huff-'n-puff scheme was never intended to be universally applicable. 
> It is intended for the poor bloke with an overloaded DSL line to an ISP 
> and very little else. It could be further engineered as you propose and 

Where I work falls into that category (2 Mb/s SDL, 1:1 contention, with 
delays varying between 5 ms and 100 ms).  As it happens, we don't really 
need good time; the NTP system is only really used for CVS and, more 
recently, for IP phones.  The IT department's time infrastructure 
uses w32time; it currently has a measured error of over a 
second and is reporting a root dispersion of over 10 seconds (I'm not sure 
whether this is a w32time artifact, or is because w32time has lost synch but 
doesn't alarm for high root distance).  They are quite happy with this 
crude time keeping!  (Real ntpd is only used for the IP phones because they 
do apply the root dispersion test.)

> others are welcome to do that. You should understand that would be a 
> difficult and complex project.
> 
> The local clock driver (and modem driver) is not used unless all 
> outside connectivity is lost and even in that case the orphan mode is 

That part was generalising the issue; I wouldn't configure a local clock 
in those circumstances; in fact I would very rarely consider configuring 
one, so most of my knowledge of what can go wrong if you do configure 
one comes from here.  However, people who package ntpd almost always do 
configure the local clock in their sample configurations, and most 
people will not remove it, so mitigation of Murphy's law requires that 
you assume that a local clock probably is configured.

I accept that samples from the local clock will only be used under 
exceptional circumstances, although I note several cases, reported here, 
where systems seem to have locked onto the local clock in spite of 
having valid external sources (one last week, for example).  However, 
once a system locks onto the local clock, the minimum delay will be 
poisoned for the whole huffpuff history period.
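
To illustrate the poisoning, here is a minimal sketch in Python.  It
assumes the minimum delay is tracked as a plain sliding-window minimum
over the most recent samples; the function name, the window length and
the delay values are all made up for illustration, and this is not
ntpd's code.

```python
from collections import deque


def windowed_min_delay(delays, window):
    """Return the running minimum over the last `window` delay samples."""
    history = deque(maxlen=window)  # oldest sample drops out automatically
    minima = []
    for d in delays:
        history.append(d)
        minima.append(min(history))
    return minima


# Hypothetical delays in seconds: network samples of 5 ms or more, with a
# single zero-delay sample (as a local clock would report) in the middle.
samples = [0.005, 0.020, 0.0, 0.100, 0.006, 0.050, 0.005]
print(windowed_min_delay(samples, window=4))
```

Once the zero-delay sample enters the window, the reported minimum stays
at zero until that sample ages out, so every network sample taken in
the meantime is judged against a floor it can never reach.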

> preferable. Using a radio reference clock with an overloaded DSL backup 
> is not a good idea. If the reference clock fails, the server continues 
> to be a good source for many hours until the distance threshold is 
> exceeded. Even after that orphan mode would be preferable over a highly 
> congested DSL link.

If there are reasons why orphan is better in the degenerate case, they 
need to be in the end user documentation, as that documentation 
currently only indicates benefits where there are multiple orphan 
candidates.

The realistic case is where there is an internal cross feed.

> 
> You claim that a method to designate which inbound/outbound link 
> congestion is preset. The h-f scheme is expressly designed to determine 
> that and adapt accordingly, especially when the congestion surge 

I wasn't saying it was preset.  I was actually suggesting that in many 
cases, presetting it would work more reliably.  The sign detection 
assumes that the local clock is more or less right, and, therefore, 
that, when the minimum delay is exceeded, the absolute value of the offset 
needs to be reduced.  However, if the system has just started, and is 
really 120ms out, that may be the wrong choice.
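
A worked example of that failure, as a minimal sketch in Python.  It
assumes the rule is exactly as described above (move the offset towards
zero by half the excess delay) and uses the usual on-wire arithmetic,
where unequal path delays bias the measured offset by half their
difference.  The function names and all the numbers are hypothetical;
this is not taken from the ntpd source.

```python
def measure(true_offset, out_delay, back_delay):
    """Offset and round-trip delay as seen by the on-wire calculation."""
    offset = true_offset + (out_delay - back_delay) / 2.0
    return offset, out_delay + back_delay


def huffpuff_correct(offset, delay, min_delay):
    """Reduce the absolute offset by half the delay in excess of the minimum."""
    excess = max(delay - min_delay, 0.0) / 2.0
    return offset - excess if offset > 0 else offset + excess


MIN_DELAY = 0.005  # 5 ms uncongested round trip (assumed)

# Return path congested: 2.5 ms out, 97.5 ms back, 100 ms round trip.
for true_offset in (0.001, 0.120):
    offset, delay = measure(true_offset, 0.0025, 0.0975)
    corrected = huffpuff_correct(offset, delay, MIN_DELAY)
    print(f"true {true_offset:.4f}  measured {offset:.4f}  corrected {corrected:.4f}")
```

With the clock 1 ms out, the correction recovers the true offset.  With
the clock 120 ms out, the measured offset is about 72 ms and still
positive, so the rule subtracts and ends up at about 25 ms, which is
further from the truth than the uncorrected sample.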

> switches from one direction to the other. If you examine the mathematics 
> carefully, you will discover the sign determination is necessary in 
> order to determine which limb of the scattergram is congested. See my 
> book for further discussion and especially the experiments with Malaysia.
> 
> Your comment that NTP handles startup and temperature changes badly may 
> very well be the case. But, you present only anecdotal evidence, no 

Startup transients are so obvious that almost everyone sees them.  I 
haven't done the fine measurements needed to look into temperature 
transients, but the arguments for them convince me.  You even told 
someone, last week, that NTP was unsuitable for their application, 
because it was unable to handle startup transients adequately.

> simulation, no statistical analysis and no quantitative comparison with 
> alternative methods. I have no problem with alternative methods as long 
> as they are justified by analysis, statistical justification and proof 
> by experiment or simulation.
> 
> Dave
> 
> David Woolley wrote:
>> I had cause to look at tinker huffpuff recently and a number of things 
>> concern me.
>>
>> 1) It is applied globally, and that seems to include reference clocks, 
>> including the local clock (which you can expect to find on most real 
>> world configurations, even though it is often inappropriate for them). 
>> That means that the presence of a reference clock as a reference, or 
>> the use of another source on the same LAN may artificially depress the 
>> estimate of the minimum delay.
>>
>> Ideally it should be done per association, and if that is too 
>> expensive, one should be able to opt servers into the mechanism, 
>> which one would, probably, only then do for one's LAN servers.  It 
>> should not be applied to reference clocks in general and certainly 
>> should not be applied to the local clock.
>>
>> 2) Its method for determining the sign of the correction is 
>> oversimplistic.  It would probably work if the actual clock error was 
>> small, but, as we've seen discussed recently, ntpd handles real world 
>> startup and temperature change transients poorly, which could result 
>> in huff and puff trying to increase the error.
>>
>> In many cases where huffpuff would be useful, one knows that the 
>> asymmetry is overwhelmingly in one direction and there needs to be a 
>> way of conveying that information.
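
The two suggestions in the quoted message (a minimum delay kept per
association, and a way to state the direction of the asymmetry) might
look like the following minimal sketch in Python.  The class name, the
`direction` parameter and the sample values are hypothetical; nothing
like this exists in ntpd as far as this thread indicates.

```python
from collections import deque


class AssociationHuffPuff:
    """Per-association delay history with an optional preset direction.

    direction=+1: congestion on the outbound path (offset reads too high)
    direction=-1: congestion on the return path (offset reads too low)
    direction=None: infer the sign from the measured offset
    """

    def __init__(self, window, direction=None):
        self.history = deque(maxlen=window)
        self.direction = direction

    def correct(self, offset, delay):
        self.history.append(delay)
        excess = (delay - min(self.history)) / 2.0
        if self.direction is None:
            sign = 1 if offset > 0 else -1
        else:
            sign = self.direction
        return offset - sign * excess


# Only associations that are opted in get a filter; a local clock or a
# LAN peer is simply absent and cannot affect this minimum.
filters = {"wan-server": AssociationHuffPuff(window=8, direction=-1)}

wan = filters["wan-server"]
wan.correct(0.120, 0.005)          # uncongested sample sets the minimum
print(wan.correct(0.0725, 0.100))  # congested return path, clock 120 ms out
```

With the direction preset to the return path, the second sample is
corrected back to roughly 120 ms even though the measured offset is
positive, which is the case where inferring the sign goes wrong.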



