[ntp:questions] Thoughts on huff and puff

Unruh unruh-spam at physics.ubc.ca
Sun Oct 12 05:59:11 UTC 2008


"David L. Mills" <mills at udel.edu> writes:

>David,

>See the Association Management page, Orphan Mode section. It would be 
>odd if the casual reader did not come away from that section with the 
>impression that orphan mode was always preferred over the local clock 
>driver.

>What you report, which seems to be common practice in packaged systems 
>(including the local clock driver), is unfortunate. I consider that a bad 
>practice, but then my engineering principles might not coincide with 
>their on-ground reality.

I agree that it is both unfortunate and incomprehensible. 


>I understand some folks, including you, disapprove of certain aspects of 
>ntpd behavior. The engineered behavior is consistent with the behavior 
>of linear feedback control systems, both in theory and implementation. 
>These systems have stored internal state that is continuously updated in 
>operation. In order to avoid initial transients the state at startup 
>must be completely consistent with this state; otherwise, there will be 
>an initial transient. Same is true of aircraft autopilots, your car and 
>any other feedback control system. If you start ntpd with exactly the 
>right frequency and exactly the right offset, there will be no initial 
>transient. Otherwise, there will be a transient.

Fortunately there are other feedback control systems than your simple
second-order system, systems with much faster convergence (and yes, I will
point you to chrony again). Your system is essentially a Markovian system,
and systems with memory can be just as stable and converge much faster. 
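
As a rough illustration of the transient argument quoted above, here is a
minimal sketch of a toy discrete-time proportional-integral loop. It is not
ntpd's clock discipline and not chrony's algorithm; the function name
simulate, the gains kp and ki, and the drift value are all assumptions picked
only to make the point visible. It shows that a loop started with stored
state matching the true drift has no transient, while one started with a
wrong frequency has to go through one.

```python
def simulate(drift, freq_init, offset_init, steps=500, kp=0.2, ki=0.01):
    """Run a toy PI loop and return the offset after each step.

    drift: true frequency error of the clock, in seconds per step.
    freq_init, offset_init: the loop's stored state at startup.
    kp, ki: hypothetical gains, chosen only so the toy loop is stable.
    """
    offset = offset_init
    freq = freq_init
    history = []
    for _ in range(steps):
        freq += ki * offset          # integral term: learn the frequency error
        adjust = kp * offset + freq  # total correction applied this step
        offset += drift - adjust     # clock drifts, loop pulls it back
        history.append(offset)
    return history


drift = 1e-5  # assumed drift, seconds per step
warm = simulate(drift, freq_init=drift, offset_init=0.0)  # consistent state
cold = simulate(drift, freq_init=0.0, offset_init=0.0)    # wrong frequency

print("warm start peak offset:", max(abs(x) for x in warm))
print("cold start peak offset:", max(abs(x) for x in cold))
print("cold start final offset:", abs(cold[-1]))
```

How quickly the cold start settles depends entirely on the assumed gains, so
this says nothing about which real implementation converges faster.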


>Dave

>David Woolley wrote:
>> David L. Mills wrote:
>> 
>>>
>>> The huff-'n-puff scheme was never intended to be universally 
>>> applicable. It is intended for the poor bloke with an overloaded DSL 
>>> line to an ISP and very little else. It could be further engineered as 
>>> you propose and 
>> 
>> 
>> Where I work falls into that category (2 Mb/s SDL (1:1 contention), with 
>> delays varying between 5ms and 100ms).  As it happens, we don't really 
>> need good time; the NTP system is only really used for CVS and, more 
>> recently for IP phones.  The IT department's time infrastructure 
>> currently uses w32time and currently has a measured error  of over a 
>> second and is reporting a root dispersion of over 10 seconds (not sure 
>> if this is a w32time artifact, or is because w32time has lost synch, but 
>> doesn't alarm for high root distance).  They are quite happy with this 
>> crude time keeping!  (Real ntpd is only used for IP phones because they 
>> do apply the root dispersion test.)
>> 
>>> others are welcome to do that. You should understand that would be a 
>>> difficult and complex project.
>>>
>>> The local clock driver (and modem driver) is not used unless all 
>>> outside connectivity is lost and even in that case the orphan mode is 
>> 
>> 
>> That part was generalising the issue; I wouldn't configure a local clock 
>> in those circumstances; in fact I would very rarely consider configuring 
>> one, so most of my knowledge of what can go wrong if you do configure 
>> one comes from here.  However, people who package ntpd almost always do 
>> configure the local clock in their sample configurations, and most 
>> people will not remove it, so mitigation of Murphy's law requires that 
>> you assume that a local clock probably is configured.
>> 
>> I accept that samples from the local clock will only be used under 
>> exceptional circumstances, although note several cases, reported here, 
>> where systems seem to have locked onto the local clock in spite of 
>> having valid external sources (one last week, for example).  However, 
>> once a system locks onto the local clock, the minimum delay will be 
>> poisoned for the whole huffpuff history period.
>> 
>>> preferable. Using a radio reference clock with an overloaded DSL 
>>> backup is not a good idea. If the reference clock fails, the server 
>>> continues to be a good source for many hours until the distance 
>>> threshold is exceeded. Even after that orphan mode would be preferable 
>>> over a highly congested DSL link.
>> 
>> 
>> If there are reasons why orphan is better in the degenerate case, they 
>> need to be in the end user documentation, as that documentation 
>> currently only indicates benefits where there are multiple orphan 
>> candidates.
>> 
>> The realistic case is where there is an internal cross feed.
>> 
>>>
>>> You claim that a method to designate which link, inbound or outbound, 
>>> is congested is preset. The huff-'n-puff scheme is expressly designed 
>>> to determine that and adapt accordingly, especially when the congestion surge 
>> 
>> 
>> I wasn't saying it was preset.  I was actually suggesting that in many 
>> cases, presetting it would work more reliably.  The sign detection 
>> assumes that the local clock is more or less right, and, therefore, 
>> that, when the minimum delay is exceeded, the absolute value of the offset 
>> needs to be reduced.  However, if the system has just started, and is 
>> really 120ms out, that may be the wrong choice.
>> 
>>> switches from one direction to the other. If you examine the 
>>> mathematics carefully, you will discover the sign determination is 
>>> necessary in order to determine which limb of the scattergram is 
>>> congested. See my book for further discussion and especially the 
>>> experiments with Malaysia.
>>>
>>> Your comment that NTP handles startup and temperature changes badly 
>>> may very well be the case. But, you present only anecdotal evidence, no 
>> 
>> 
>> Start up transients are so obvious that almost everyone sees them.  I 
>> haven't done the fine measurements needed to look into temperature 
>> transients, but the arguments for them convince me.  You even told 
>> someone, last week, that NTP was unsuitable for their application, 
>> because it was unable to handle startup transients adequately.
>> 
>>> simulation, no statistical analysis and no quantitative comparison 
>>> with alternative methods. I have no problem with alternative methods 
>>> as long as they are justified by analysis, statistical justification 
>>> and proof by experiment or simulation.
>>>
>>> Dave
>>>
>>> David Woolley wrote:
>>>
>>>> I had cause to look at tinker huffpuff recently and a number of 
>>>> things concern me.
>>>>
>>>> 1) It is applied globally, and that seems to include reference 
>>>> clocks, including the local clock (which you can expect to find on 
>>>> most real world configurations, even though it is often inappropriate 
>>>> for them). That means that the presence of a reference clock as a 
>>>> reference, or the use of another source on the same LAN may 
>>>> artificially depress the estimate of the minimum delay.
>>>>
>>>> Ideally it should be done per association, and if that is too 
>>>> expensive, one should be able to opt servers into the mechanism, 
>>>> which one would, probably, only then do for one's LAN servers.  It 
>>>> should not be applied to reference clocks in general and certainly 
>>>> should not be applied to the local clock.
>>>>
>>>> 2) Its method for determining the sign of the correction is 
>>>> oversimplistic.  It would probably work if the actual clock error was 
>>>> small, but, as we've seen discussed recently, ntpd handles real world 
>>>> startup and temperature change transients poorly, which could result 
>>>> in huff and puff trying to increase the error.
>>>>
>>>> In many cases where huffpuff would be useful, one knows that the 
>>>> asymmetry is overwhelmingly in one direction and there needs to be a 
>>>> way of conveying that information.
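
The two concerns in the quoted message (a single global minimum delay, and a
sign rule that assumes the clock is nearly right) can be sketched as below.
This is a toy model built from the description in this thread, not ntpd's
source; the class name HuffPuffSketch, the window parameter and all sample
values are assumptions for illustration only.

```python
from collections import deque


class HuffPuffSketch:
    """Toy huff-n'-puff style correction with one shared delay history."""

    def __init__(self, window):
        # Hypothetical parameter: how many recent samples to remember.
        self.delays = deque(maxlen=window)

    def correct(self, offset, delay):
        """Return the offset after removing half of the excess delay."""
        self.delays.append(delay)
        min_delay = min(self.delays)
        excess = (delay - min_delay) / 2.0
        # Sign rule as described in the thread: shrink the absolute offset.
        if offset > 0:
            return offset - excess
        return offset + excess


# Case A: clock is right; 40 ms of extra one-way delay shows up as an
# apparent offset of +20 ms, which the sign rule removes.
a = HuffPuffSketch(window=8)
a.correct(0.0, 0.010)                    # quiet sample sets the minimum
case_a = a.correct(0.020, 0.050)

# Case B: clock is really 120 ms out; the same congestion gives an
# apparent offset of -100 ms, and the sign rule moves it the wrong way.
b = HuffPuffSketch(window=8)
b.correct(-0.120, 0.010)
case_b = b.correct(-0.100, 0.050)

# Case C: a zero-delay sample (for example from a local clock) enters the
# shared history, so an uncongested 10 ms sample is "corrected" anyway.
c = HuffPuffSketch(window=8)
c.correct(0.0, 0.0)
case_c = c.correct(0.004, 0.010)

print(case_a, case_b, case_c)
```

Keeping one HuffPuffSketch per association instead of a shared one would
avoid Case C in this toy model; Case B would need the direction of asymmetry
to be supplied from outside, as suggested in the quoted message.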



