[ntp:questions] strange behaviour of ntp peerstats entries.

Unruh unruh-spam at physics.ubc.ca
Tue Jan 29 04:33:47 UTC 2008


"David L. Mills" <mills at udel.edu> writes:

>Unruh,

>It would seem self evident from the equations that minimizing the delay 
>variance truly does minimize the offset variance. Further evidence of 
>that is in the raw versus filtered offset graphs in the architecture 
>briefings. If nothing else, the filter reduces the variance by some 10 
>dB. More to the point, emphasis added, the wedge scattergrams show just 

I guess then I am confused, because my data does not support that. While the
delay variance IS reduced, the offset variance is not. The correlation
between delay and offset IS reduced by a factor of 10, but the clock
variance is not reduced at all.

Here are the results from one day, gathered from one clock. (I had ntp
print out not only peer->offset and peer->delay, as record_peer_stats
does, but also p_offset and p_del, the offset and delay calculated for
each packet.) I also threw out the outliers: for some reason the system
would suddenly see packets with a 4 ms round trip rather than 160 usec,
and these "popcorn" spikes are clearly bad. The standard deviations
calculated from the peer-> values and from the per-packet values were:

.00005995   p_offset std dev (delay spikes greater than .0003 eliminated)
.00006017   peer->offset std dev
.000007337  p_delay std dev (spikes greater than .0003 removed)
.000005489  peer->delay std dev

(Note that if those popcorn spikes had not been removed, the std dev of the
p_offset and p_delay would have been much larger).

I.e., it makes no difference at all to the offset std dev, but a significant
one to the delay. (Yes, the precision I quote the numbers to is far greater
than the accuracy.)
This is with 83% of the data thrown away in the peer-> case.

Note that this is one machine on one day, etc. and well after the startup
transients had disappeared.
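
For concreteness, the comparison above amounts to something like the
following sketch (just an illustration, not the actual analysis script;
the function and variable names are mine, and the 0.0003 s cutoff is the
one mentioned above):

import statistics

# Per-packet values (p_offset, p_delay) versus the filtered values that
# record_peer_stats writes (peer->offset, peer->delay).  The "popcorn"
# round trips above the cutoff are dropped before computing the std dev.
POPCORN_CUTOFF = 0.0003   # seconds; the 0.3 ms threshold used above

def stddevs(offsets, delays, cutoff=POPCORN_CUTOFF):
    """Return (offset std dev, delay std dev) after dropping spikes."""
    kept = [(o, d) for o, d in zip(offsets, delays) if d <= cutoff]
    offs = [o for o, _ in kept]
    dels = [d for _, d in kept]
    return statistics.pstdev(offs), statistics.pstdev(dels)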

 
>how good the filter can be. It selects points near the apex of the 
>wedge, the others don't matter. You might argue the particular clock 
>filter algorithm could be improved, but the mission in any case is to 
>select the points at or near the apex.
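
As I read clock_filter, that apex selection amounts to something like this
caricature (my names, not ntpd's; the real code also weighs dispersion and
sample age, which I have left out):

def clock_filter_pick(samples):
    """samples: list of (offset, delay) pairs, newest last."""
    window = samples[-8:]                   # the 8-stage shift register
    return min(window, key=lambda s: s[1])  # sample with the smallest delay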


>While the authors might not have realized it, the filter method you 
>describe is identical to Cristian's Probabilistic Clock Synchronization 
>(PCS) method described in the literature some years back. The idea is 

I have no idea if Curnoe knew that. The majority of his code was written 10
years ago, not recently. He uses only the inverse of the delay as the
weights, I believe, with a user-adjustable parameter to throw away delays
that are too large.
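
My understanding of that scheme, written out as a sketch (not chrony's
actual code; the cutoff ratio and the power are parameters I have assumed
here for illustration):

def weighted_offset(samples, max_ratio=1.5, power=1.0):
    """samples: list of (offset, delay) pairs.  Drop samples whose delay
    exceeds max_ratio * the minimum delay, then weight by 1/delay**power."""
    dmin = min(d for _, d in samples)
    kept = [(o, d) for o, d in samples if d <= max_ratio * dmin]
    weights = [1.0 / d ** power for _, d in kept]
    return sum(w * o for (o, _), w in zip(kept, weights)) / sum(weights)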

>to discard the outlier delays beyond a decreasing threshold. In other 
>words, the tighter the threshold, the more outliers are tossed out, so 
>you strike a balance. I argued then and now that it is better to select 
>the best from among the samples rather than to selectively discard the 
>outliers.



>There may be merit in an argument that says the points along the limbs 
>of the wedge are being ignored. In principle, these points can be found 
>using a selective filter that searches for an offset/delay ratio of 0.5, 
>which in fact is what the huff-n'-puff filter does. To do this 
>effectively you need to know the baseline propagation delay, which is 
>also what the huff-n'-puff filter does. Experiments doing this with 
>symmetric delays, as against the asymmetric delays the huff-n'-puff 
>filter was designed for, were inconclusive.

But from what I see of the code, the huff-n'-puff filter runs after 80% of
the samples have already been discarded by the clock_filter.
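
For reference, the huff-n'-puff idea as I understand the description above:
keep a running minimum of the delay as the baseline, assume any excess
delay accrued on one leg of the round trip, and so correct the offset by
half the excess (a sketch only, not the ntpd implementation, which also
has to maintain the baseline over a long window):

def huffpuff_correct(offset, delay, baseline_delay):
    """Correct an offset assuming the delay above the baseline minimum
    is asymmetric, i.e. all on one leg of the round trip."""
    excess = delay - baseline_delay
    # the sign of the offset indicates which leg is presumed congested
    return offset - excess / 2.0 if offset > 0 else offset + excess / 2.0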

If data were cheap (and I think that in most cases today it is), then
throwing away 80% is fine; there is lots more out there. But this
profligacy with the data sits uncomfortably with the competing claim that
the data are precious -- you should never use a maxpoll less than 7, you
should bother the ntp servers as little as possible. That makes the data
precious: you cannot simply go out and collect all you want. Throwing it
away then seems a bad idea to me.


>Dave

>Unruh wrote:

>snip

>> Oh yes, popcorn suppression is important. I agree. But the filter goes well
>> beyond that. My reaction is that on the one hand people keep saying how
>> important net load is, and that one does not want to use poll intervals
>> that are much smaller than 8 or 10, and on the other hand, 80-90% of the
>> data collected is thrown away. Reminds me of the story of Saul, king of the
>> Israelites, whose army was besieged, and he mentioned that he was thirsty.
>> A few of his soldiers risked everything to get through the enemy lines and
>> bring him water. He was so impressed that he poured it all out on the
>> ground, in tribute to their courage. I have always found that story an
>> incredible insult to their bravery instead.
>> 
>> The procedure does drastically reduce the variance of the delay, but does
>> not do much for the variance of the offset, which is of course what is
>> important. Just to bring up chrony again: it uses both a suppression, where
>> round trips greater than, say, 1.5 times the minimum are discarded, and a
>> weighting of the data by some power of the inverse of the delay.

>snip



