[ntp:questions] Thoughts on huff and puff
Unruh
unruh-spam at physics.ubc.ca
Sun Oct 12 05:59:11 UTC 2008
"David L. Mills" <mills at udel.edu> writes:
>David,
>See the Association Management page, Orphan Mode section. It would be
>odd if the casual reader did not come away from that section with the
>impression that orphan mode was always preferred over the local clock
>driver.
>What you report, which seems to be common practice in packaged systems
>(including the local clock driver), is unfortunate. I consider that a bad
>practice, but then my engineering principles might not coincide with
>their on-ground reality.
I agree that it is both unfortunate and incomprehensible.
>I understand some folks, including you, disapprove of certain aspects of
>ntpd behavior. The engineered behavior is consistent with the behavior
>of linear feedback control systems, both in theory and implementation.
>These systems have stored internal state that is continuously updated in
>operation. In order to avoid initial transients the state at startup
>must be completely consistent with this state; otherwise, there will be
>an initial transient. Same is true of aircraft autopilots, your car and
>any other feedback control system. If you start ntpd with exactly the
>right frequency and exactly the right offset, there will be no initial
>transient. Otherwise, there will be a transient.
Fortunately, there are other feedback control systems than your simple
second order system: systems with much faster convergence (and yes, I will
point you to chrony again). Your system is essentially a Markovian system,
and systems with memory can be just as stable, and converge much faster.
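
To make the startup transient under discussion concrete, here is a minimal
sketch in Python of a generic second order (proportional plus integral)
clock loop. It is not ntpd's loop filter and not chrony's algorithm; the
function name simulate and the gains kp and ki are made up for
illustration, and the update interval is taken as one second.

```python
def simulate(true_freq_error, initial_freq_estimate, initial_offset,
             steps=200, kp=0.3, ki=0.02):
    """Return the clock offset seen at each update of a toy PI loop.

    All values are in seconds (offset) or seconds per second (frequency).
    The gains are illustrative assumptions, not ntpd's values.
    """
    offset = initial_offset            # current clock error
    freq_est = initial_freq_estimate   # stored state: frequency correction
    history = []
    for _ in range(steps):
        history.append(offset)
        # Integral term: the stored frequency estimate learns from the offset.
        freq_est += ki * offset
        # The clock drifts by the uncorrected frequency error, and the
        # proportional term removes part of the offset seen this round.
        offset += (true_freq_error - freq_est) - kp * offset
    return history


# Stored state consistent with reality: 50 ppm drift, already known.
warm = simulate(true_freq_error=50e-6, initial_freq_estimate=50e-6,
                initial_offset=0.0)

# Stored state inconsistent with reality: the loop starts knowing nothing.
cold = simulate(true_freq_error=50e-6, initial_freq_estimate=0.0,
                initial_offset=0.0)

print("warm start peak offset:", max(abs(x) for x in warm))
print("cold start peak offset:", max(abs(x) for x in cold))
print("cold start final offset:", abs(cold[-1]))
```

With consistent initial state the offset never leaves zero; with a wrong
initial frequency the offset grows before the loop pulls it back. How
quickly it comes back depends entirely on the assumed gains.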
>Dave
>David Woolley wrote:
>> David L. Mills wrote:
>>
>>>
>>> The huff-'n-puff scheme was never intended to be universally
>>> applicable. It is intended for the poor bloke with an overloaded DSL
>>> line to an ISP and very little else. It could be further engineered as
>>> you propose and
>>
>>
>> Where I work falls into that category (2 Mb/s SDL (1:1 contention), with
>> delays varying between 5ms and 100ms). As it happens, we don't really
>> need good time; the NTP system is only really used for CVS and, more
>> recently for IP phones. The IT department's time infrastructure
>> currently uses w32time and currently has a measured error of over a
>> second and is reporting a root dispersion of over 10 seconds (not sure
>> if this is a w32time artifact, or is because w32time has lost synch, but
>> doesn't alarm for high root distance). They are quite happy with this
>> crude time keeping! (Real ntpd is only used for IP phones because they
>> do apply the root dispersion test.)
>>
>>> others are welcome to do that. You should understand that would be a
>>> difficult and complex project.
>>>
>>> The local clock driver (and modem driver) is not used unless all
>>> outside connectivity is lost and even in that case the orphan mode is
>>
>>
>> That part was generalising the issue; I wouldn't configure a local clock
>> in those circumstances; in fact I would very rarely consider configuring
>> one, so most of my knowledge of what can go wrong if you do configure
>> one comes from here. However, people who package ntpd almost always do
>> configure the local clock in their sample configurations, and most
>> people will not remove it, so mitigation of Murphy's law requires that
>> you assume that a local clock probably is configured.
>>
>> I accept that samples from the local clock will only be used under
>> exceptional circumstances, although note several cases, reported here,
>> where systems seem to have locked onto the local clock in spite of
>> having valid external sources (one last week, for example). However,
>> once a system locks onto the local clock, the minimum delay will be
>> poisoned for the whole huffpuff history period.
>>
>>> preferable. Using a radio reference clock with an overloaded DSL
>>> backup is not a good idea. If the reference clock fails, the server
>>> continues to be a good source for many hours until the distance
>>> threshold is exceeded. Even after that orphan mode would be preferable
>>> over a highly congested DSL link.
>>
>>
>> If there are reasons why orphan is better in the degenerate case, they
>> need to be in the end user documentation, as that documentation
>> currently only indicates benefits where there are multiple orphan
>> candidates.
>>
>> The realistic case is where there is an internal cross feed.
>>
>>>
>>> You claim that a method to designate which inbound/outbound link
>>> congestion is preset. The h-f scheme is expressly designed to determine
>>> that and adapt accordingly, especially when the congestion surge
>>
>>
>> I wasn't saying it was preset. I was actually suggesting that in many
>> cases, presetting it would work more reliably. The sign detection
>> assumes that the local clock is more or less right, and, therefore,
>> that, when the minimum delay is exceeded, the absolute value of offsets
>> need to be reduced. However, if the system has just started, and is
>> really 120ms out, that may be the wrong choice.
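
The sign problem described above can be sketched in a few lines of Python.
This is an illustration of the idea only, not ntpd's code: the helper names
measured_offset and huffpuff_correct are hypothetical, and the correction is
assumed to be half the delay in excess of the minimum, applied so as to
shrink the magnitude of the offset.

```python
def measured_offset(true_offset, extra_out=0.0, extra_back=0.0):
    """Offset an NTP exchange would report when queueing delay is
    asymmetric: half the difference of the one-way excess delays
    leaks into the estimate."""
    return true_offset + (extra_out - extra_back) / 2.0


def huffpuff_correct(offset, delay, min_delay):
    """Assumed heuristic: pull the offset toward zero by half the delay
    in excess of the minimum, choosing the direction from the sign of
    the offset itself."""
    excess = max(0.0, delay - min_delay) / 2.0
    if offset > 0:
        return offset - excess
    return offset + excess


MIN_DELAY = 0.005      # 5 ms uncongested round trip (illustrative)
CONGESTION = 0.080     # 80 ms extra queueing on the return path only
delay = MIN_DELAY + CONGESTION

# Case 1: the clock is actually right. The heuristic undoes the damage.
raw_small = measured_offset(0.0, extra_back=CONGESTION)
fixed_small = huffpuff_correct(raw_small, delay, MIN_DELAY)

# Case 2: the clock has just started and is really 120 ms out.
raw_large = measured_offset(0.120, extra_back=CONGESTION)
fixed_large = huffpuff_correct(raw_large, delay, MIN_DELAY)

print("true 0 ms:   raw %+.3f  corrected %+.3f" % (raw_small, fixed_small))
print("true 120 ms: raw %+.3f  corrected %+.3f" % (raw_large, fixed_large))
```

In the second case the measured offset is positive only because the true
error is large, so the sign test picks the wrong direction and the corrected
value ends up further from 120 ms than the raw one was.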
>>
>>> switches from one direction to the other. If you examine the
>>> mathematics carefully, you will discover the sign determination is
>>> necessary in order to determine which limb of the scattergram is
>>> congested. See my book for further discussion and especially the
>>> experiments with Malaysia.
>>>
>>> Your comment that NTP handles startup and temperature changes badly
>>> may very well be the case. But, you present only anecdotal evidence, no
>>
>>
>> Start up transients are so obvious that almost everyone sees them. I
>> haven't done the fine measurements needed to look into temperature
>> transients, but the arguments for them convince me. You even told
>> someone, last week, that NTP was unsuitable for their application,
>> because it was unable to handle startup transients adequately.
>>
>>> simulation, no statistical analysis and no quantitative comparison
>>> with alternative methods. I have no problem with alternative methods
>>> as long as they are justified by analysis, statistical justification
>>> and proof by experiment or simulation.
>>>
>>> Dave
>>>
>>> David Woolley wrote:
>>>
>>>> I had cause to look at tinker huffpuff recently and a number of
>>>> things concern me.
>>>>
>>>> 1) It is applied globally, and that seems to include reference
>>>> clocks, including the local clock (which you can expect to find on
>>>> most real world configurations, even though it is often inappropriate
>>>> for them). That means that the presence of a reference clock as a
>>>> reference, or the use of another source on the same LAN may
>>>> artificially depress the estimate of the minimum delay.
>>>>
>>>> Ideally it should be done per association, and if that is too
>>>> expensive, one should be able to opt servers into the mechanism,
>>>> which one would, probably, only then do for one's LAN servers. It
>>>> should not be applied to reference clocks in general and certainly
>>>> should not be applied to the local clock.
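
A small Python sketch of the difference between one global minimum delay and
a per-association minimum. The sample data, the source labels, and the
function names are invented for illustration, and the local clock is assumed
to report zero delay; this is not how ntpd stores its huffpuff history.

```python
# (source, round-trip delay in seconds) in arrival order; invented data.
samples = [
    ("wan", 0.030),
    ("wan", 0.095),
    ("local", 0.0),    # local clock driver, assumed to report zero delay
    ("wan", 0.060),
]


def global_min(samples):
    """One minimum shared by every source, as the post describes."""
    return min(delay for _, delay in samples)


def per_source_min(samples):
    """A separate minimum for each source (the per-association idea)."""
    mins = {}
    for source, delay in samples:
        mins[source] = min(delay, mins.get(source, delay))
    return mins


latest_source, latest_delay = samples[-1]

# Excess delay attributed to congestion for the latest WAN sample.
excess_global = latest_delay - global_min(samples)
excess_per_source = latest_delay - per_source_min(samples)[latest_source]

print("excess using global minimum:     %.3f s" % excess_global)
print("excess using per-source minimum: %.3f s" % excess_per_source)
```

With the shared minimum, the zero-delay local clock sample makes the whole
WAN round trip look like congestion; with a per-source minimum only the
delay above the WAN path's own best case is counted.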
>>>>
>>>> 2) Its method for determining the sign of the correction is
>>>> oversimplistic. It would probably work if the actual clock error was
>>>> small, but, as we've seen discussed recently, ntpd handles real world
>>>> startup and temperature change transients poorly, which could result
>>>> in huff and puff trying to increase the error.
>>>>
>>>> In many cases where huffpuff would be useful, one knows that the
>>>> asymmetry is overwhelmingly in one direction and there needs to be a
>>>> way of conveying that information.