[ntp:questions] Thoughts on huff and puff

David L. Mills mills at udel.edu
Sat Oct 11 23:34:37 UTC 2008


David,

See the Association Management page, Orphan Mode section. It would be 
odd if the casual reader did not come away from that section with the 
impression that orphan mode was always preferred over the local clock 
driver.

It is unfortunate that what you report, including the local clock driver,
seems to be common practice in packaged systems. I consider that a bad
practice, but then my engineering principles might not coincide with
their on-the-ground reality.

I understand some folks, including you, disapprove of certain aspects of 
ntpd behavior. The engineered behavior is consistent with the behavior 
of linear feedback control systems, both in theory and implementation. 
These systems have stored internal state that is continuously updated in 
operation. Unless the state at startup is completely consistent with the 
actual condition of the system, there will be an initial transient. The 
same is true of aircraft autopilots, your car and any other feedback 
control system. If you start ntpd with exactly the right frequency and 
exactly the right offset, there will be no initial transient; otherwise, 
there will be one.
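To make the startup-transient argument concrete, here is a minimal Python sketch of a generic noise-free type II loop (a proportional phase correction plus an integrated frequency correction). It is not ntpd's clock discipline: the gains `kp` and `ki`, the drift value and the function name `simulate_loop` are illustrative assumptions. Started in its steady state the loop never moves; started 100 ms off, it swings through zero before settling.

```python
def simulate_loop(osc_error, init_offset, init_freq_corr,
                  steps=200, kp=0.1, ki=0.005):
    """Noise-free type II loop: proportional phase plus integral frequency.

    osc_error:      uncorrected oscillator drift per update interval (s)
    init_offset:    clock offset at startup (s)
    init_freq_corr: frequency correction the loop starts with (s/interval)
    Returns the clock offset seen at each update.
    """
    offset, freq_corr = init_offset, init_freq_corr
    history = []
    for _ in range(steps):
        history.append(offset)
        # Integral path: steer the frequency correction against the error.
        next_freq = freq_corr - ki * offset
        # The clock drifts, receives the frequency correction, and has a
        # fraction kp of its offset removed (proportional path).
        offset = offset + (osc_error + freq_corr) - kp * offset
        freq_corr = next_freq
    return history


drift = 1e-5
# Startup state matches steady state: the correction exactly cancels drift.
clean = simulate_loop(drift, 0.0, -drift)
# Right frequency, but the clock starts 100 ms off.
kicked = simulate_loop(drift, 0.1, -drift)
print(max(abs(x) for x in clean))               # 0.0
print(min(kicked) < 0, abs(kicked[-1]) < 1e-3)  # True True
```

With these gains the loop is slightly underdamped, so the kicked run overshoots before decaying; different gains change the shape of the transient but not the fact that one occurs.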

Dave

David Woolley wrote:
> David L. Mills wrote:
> 
>>
>> The huff-'n-puff scheme was never intended to be universally 
>> applicable. It is intended for the poor bloke with an overloaded DSL 
>> line to an ISP and very little else. It could be further engineered as 
>> you propose and 
> 
> 
> Where I work falls into that category (2 Mb/s SDSL, 1:1 contention, with 
> delays varying between 5 ms and 100 ms).  As it happens, we don't really 
> need good time; the NTP system is only really used for CVS and, more 
> recently, for IP phones.  The IT department's time infrastructure 
> currently uses w32time, has a measured error of over a second and 
> is reporting a root dispersion of over 10 seconds (I'm not sure 
> if this is a w32time artifact, or is because w32time has lost sync but 
> doesn't alarm for high root distance).  They are quite happy with this 
> crude timekeeping!  (Real ntpd is only used for IP phones because they 
> do apply the root dispersion test.)
> 
>> others are welcome to do that. You should understand that would be a 
>> difficult and complex project.
>>
>> The local clock driver (and modem driver) is not used unless all 
>> outside connectivity is lost and even in that case the orphan mode is 
> 
> 
> That part was generalising the issue; I wouldn't configure a local clock 
> in those circumstances; in fact I would very rarely consider configuring 
> one, so most of my knowledge of what can go wrong if you do configure 
> one comes from here.  However, people who package ntpd almost always do 
> configure the local clock in their sample configurations, and most 
> people will not remove it, so mitigation of Murphy's law requires that 
> you assume that a local clock probably is configured.
> 
> I accept that samples from the local clock will only be used under 
> exceptional circumstances, although I note several cases, reported here, 
> where systems seem to have locked onto the local clock in spite of 
> having valid external sources (one last week, for example).  However, 
> once a system locks onto the local clock, the minimum delay will be 
> poisoned for the whole huffpuff history period.
> 
>> preferable. Using a radio reference clock with an overloaded DSL 
>> backup is not a good idea. If the reference clock fails, the server 
>> continues to be a good source for many hours until the distance 
>> threshold is exceeded. Even after that, orphan mode would be preferable 
>> to a highly congested DSL link.
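The poisoning effect described above can be sketched with a sliding-window minimum in Python. This is an illustrative model, not ntpd's code: the window is assumed to be a ring of slots (eight here, e.g. two hours of 15-minute slots), each holding the smallest delay seen in its interval, and the class name `MinDelayWindow` is hypothetical. A single zero-delay sample from a local clock holds the window minimum at zero until its slot is recycled, inflating every huff-'n-puff correction computed in the meantime.

```python
class MinDelayWindow:
    """Sliding minimum of round-trip delay over a ring of time slots.

    Each slot keeps the smallest delay seen during one slot interval; the
    window minimum is the smallest slot value.  Slot count and length are
    illustrative assumptions.
    """

    def __init__(self, slots=8):
        self.slots = [float("inf")] * slots
        self.current = 0

    def add_sample(self, delay):
        """Record one delay sample (s) in the current slot."""
        if delay < self.slots[self.current]:
            self.slots[self.current] = delay

    def advance(self):
        """Start a new slot, discarding the oldest one."""
        self.current = (self.current + 1) % len(self.slots)
        self.slots[self.current] = float("inf")

    def minimum(self):
        return min(self.slots)


window = MinDelayWindow(slots=8)
mins = []
for slot in range(12):
    window.add_sample(0.040)    # typical congested DSL sample
    window.add_sample(0.005)    # best-case DSL sample in this slot
    if slot == 2:
        window.add_sample(0.0)  # one local-clock sample: zero delay
    mins.append(window.minimum())
    window.advance()
print(mins)
```

In this run the minimum is 5 ms for the first two slots, zero for the eight slots that still contain the local-clock sample, and 5 ms again once that slot has been overwritten.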
> 
> 
> If there are reasons why orphan is better in the degenerate case, they 
> need to be in the end user documentation, as that documentation 
> currently only indicates benefits where there are multiple orphan 
> candidates.
> 
> The realistic case is where there is an internal cross feed.
> 
>>
>> You claim that the congested direction, inbound or outbound, is 
>> preset. The huff-'n-puff scheme is expressly designed to determine 
>> that and adapt accordingly, especially when the congestion surge 
> 
> I wasn't saying it was preset.  I was actually suggesting that in many 
> cases, presetting it would work more reliably.  The sign detection 
> assumes that the local clock is more or less right, and, therefore, 
> that, when the minimum delay is exceeded, the absolute values of the 
> offsets need to be reduced.  However, if the system has just started, and is 
> really 120ms out, that may be the wrong choice.
> 
>> switches from one direction to the other. If you examine the 
>> mathematics carefully, you will discover the sign determination is 
>> necessary in order to determine which limb of the scattergram is 
>> congested. See my book for further discussion and especially the 
>> experiments with Malaysia.
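The sign question can be checked numerically. The Python sketch below assumes the usual NTP on-wire definitions (offset is half the sum of the two one-way timestamp differences, delay is the round-trip time), a symmetric 2.5 ms base path with extra queueing on one leg, and a correction rule that pulls the offset toward zero by half the excess delay. The function names are hypothetical, and the rule paraphrases the scheme as discussed in this thread rather than reproducing ntpd's source. With a small clock error the rule removes the asymmetry; with the clock 120 ms out, as in the startup case above, it picks the wrong sign and doubles the error.

```python
def measured_offset(clock_error, base, q_out, q_in):
    """On-wire offset and delay for one NTP exchange.

    clock_error: client clock minus true time (s)
    base:        uncongested one-way delay, same both ways (s)
    q_out, q_in: extra queueing on the client->server / server->client legs
    Offset is positive when the server is ahead of the client.
    """
    d_out, d_in = base + q_out, base + q_in
    offset = -clock_error + (d_out - d_in) / 2
    delay = d_out + d_in
    return offset, delay


def huffpuff_correct(offset, delay, min_delay):
    """Sign heuristic: assume the excess delay pushed the offset away from
    zero, and pull it back by half the excess."""
    excess = (delay - min_delay) / 2
    return offset - excess if offset > 0 else offset + excess


def errors(clock_error, q_out, q_in, base=0.0025):
    """Absolute offset error (s) before and after the correction."""
    true_offset = -clock_error
    offset, delay = measured_offset(clock_error, base, q_out, q_in)
    fixed = huffpuff_correct(offset, delay, 2 * base)
    return abs(offset - true_offset), abs(fixed - true_offset)


# 80 ms of inbound queueing in both cases.
small = errors(0.002, 0.0, 0.080)    # clock 2 ms ahead
large = errors(-0.120, 0.0, 0.080)   # clock 120 ms behind, e.g. at startup
print(small)   # roughly (0.04, 0.0): the heuristic removes the asymmetry
print(large)   # roughly (0.04, 0.08): wrong sign, the error doubles
```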
>>
>> Your comment that NTP handles startup and temperature changes badly 
>> may very well be the case. But you present only anecdotal evidence, no 
> 
> 
> Startup transients are so obvious that almost everyone sees them.  I 
> haven't done the fine measurements needed to look into temperature 
> transients, but the arguments for them convince me.  You even told 
> someone, last week, that NTP was unsuitable for their application, 
> because it was unable to handle startup transients adequately.
> 
>> simulation, no statistical analysis and no quantitative comparison 
>> with alternative methods. I have no problem with alternative methods 
>> as long as they are justified by analysis, statistical evidence and 
>> proof by experiment or simulation.
>>
>> Dave
>>
>> David Woolley wrote:
>>
>>> I had cause to look at tinker huffpuff recently and a number of 
>>> things concern me.
>>>
>>> 1) It is applied globally, and that seems to cover reference 
>>> clocks, including the local clock (which you can expect to find on 
>>> most real world configurations, even though it is often inappropriate 
>>> for them). That means that a reference clock, or another source on 
>>> the same LAN, may artificially depress the estimate of the minimum 
>>> delay.
>>>
>>> Ideally it should be done per association, and if that is too 
>>> expensive, one should be able to opt servers into the mechanism, 
>>> which one would probably only do for one's LAN servers.  It 
>>> should not be applied to reference clocks in general and certainly 
>>> should not be applied to the local clock.
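Per-association tracking with an opt-in, as proposed in point 1, might look like the following Python sketch. The `Association` class, its `huffpuff` opt-in flag and the 7200-second window are hypothetical; they illustrate the idea and are not ntpd configuration options. Samples from reference clocks, including the local clock, are ignored, and each server keeps its own minimum-delay history.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Association:
    """One server's huff-'n-puff state (hypothetical structure)."""
    name: str
    is_refclock: bool = False
    huffpuff: bool = False      # hypothetical per-server opt-in flag
    window: float = 7200.0      # seconds of delay history to keep
    samples: deque = field(default_factory=deque)

    def record(self, now, delay):
        """Store a (time, delay) sample if this association takes part."""
        if self.is_refclock or not self.huffpuff:
            return  # reference clocks, including the local clock, never count
        self.samples.append((now, delay))
        # Drop samples that have aged out of this association's window.
        while self.samples and self.samples[0][0] <= now - self.window:
            self.samples.popleft()

    def min_delay(self):
        """Minimum delay over the window, or None with no samples."""
        return min((d for _, d in self.samples), default=None)


lan = Association("lan-server", huffpuff=True)
clock = Association("LOCAL(0)", is_refclock=True, huffpuff=True)
for t, d in [(0, 0.030), (600, 0.012), (1200, 0.050)]:
    lan.record(t, d)
clock.record(0, 0.0)
print(lan.min_delay())    # 0.012
print(clock.min_delay())  # None
```

Because each association keeps its own history, a low-delay LAN peer or a reference clock cannot drag down the minimum used for a congested WAN server.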
>>>
>>> 2) Its method for determining the sign of the correction is 
>>> simplistic.  It would probably work if the actual clock error was 
>>> small, but, as we've seen discussed recently, ntpd handles real world 
>>> startup and temperature change transients poorly, which could result 
>>> in huff and puff trying to increase the error.
>>>
>>> In many cases where huffpuff would be useful, one knows that the 
>>> asymmetry is overwhelmingly in one direction and there needs to be a 
>>> way of conveying that information.
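If the congested direction is known in advance, as point 2 suggests, the correction no longer has to guess a sign. The Python sketch below adds a hypothetical `congested_leg` setting of the kind proposed here (it is not taken from ntpd) and falls back to the sign-of-offset rule when it is unset. Offsets use the convention that a positive value means the server is ahead of the client, so outbound queueing inflates the measured offset and inbound queueing deflates it.

```python
def huffpuff_preset(offset, delay, min_delay, congested_leg=None):
    """Huff-'n-puff correction with an optional preset congested leg.

    congested_leg: "out" (client to server), "in" (server to client), or
    None to fall back to the sign-of-offset heuristic.  The setting is
    hypothetical.
    """
    excess = (delay - min_delay) / 2
    if congested_leg == "out":
        return offset - excess  # outbound queueing inflates the offset
    if congested_leg == "in":
        return offset + excess  # inbound queueing deflates the offset
    return offset - excess if offset > 0 else offset + excess


# Clock really 120 ms behind: true offset +0.120 s.  Inbound queueing of
# 80 ms gives a measured offset of 0.080 s and a delay of 0.085 s.
preset = huffpuff_preset(0.080, 0.085, 0.005, congested_leg="in")
guessed = huffpuff_preset(0.080, 0.085, 0.005)
print(preset, guessed)   # about 0.12 and 0.04
```

With the direction preset, the corrected offset recovers the true 120 ms; the sign heuristic, seeing a positive offset, subtracts instead and lands 80 ms away.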



