[ntp:questions] How should an NTP server fail?

David L. Mills mills at udel.edu
Wed Jun 9 23:41:38 UTC 2010


The number of samples in the clock filter has nothing to do with the 
selection process, nor whether the peer is the system peer or not. The 
selection algorithm doesn't even know how many samples are in the 
filter, only that the filter candidate that is used has the least delay. The 
selection metric includes that and the dispersion at the measurement 
time, plus the dispersion increment since then. When two or more servers 
are configured at substantially the same delay, the client may 
occasionally hop from one to the other depending on these factors, 
although there is an anti-hop scheme that discourages this unless there 
is a substantial difference.
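
As a rough illustration of the paragraph above, here is a minimal sketch 
in Python, not the ntpd code. It assumes the metric is half the 
round-trip delay plus the dispersion at measurement time plus a 
dispersion increment of 15 PPM times the sample age, which follows the 
root distance formulation in the NTPv4 specification. The names 
Candidate, metric, choose and the hop_margin value are hypothetical, and 
the real anti-hop scheme differs in detail.

```python
from dataclasses import dataclass
from typing import List, Optional

PHI = 15e-6  # assumed dispersion growth rate: 15 PPM, in seconds per second


@dataclass
class Candidate:
    name: str
    delay: float       # round-trip delay of the selected filter sample (s)
    dispersion: float  # dispersion at the measurement time (s)
    age: float         # seconds elapsed since that measurement


def metric(c: Candidate) -> float:
    """Selection metric: note it never looks at the number of filter samples."""
    return c.delay / 2 + c.dispersion + PHI * c.age


def choose(current: Optional[Candidate],
           candidates: List[Candidate],
           hop_margin: float = 0.001) -> Candidate:
    """Pick the lowest-metric candidate, but keep the current peer unless
    the best candidate wins by more than hop_margin (hypothetical value)."""
    best = min(candidates, key=metric)
    if current is not None and best.name != current.name:
        if metric(current) - metric(best) < hop_margin:
            return current  # difference is not substantial, do not hop
    return best
```

With two servers at 20 ms and 19.5 ms delay and otherwise equal 
statistics, choose() stays with the current peer; with a server at 10 ms 
it hops.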

What bugs me when these issues appear on the bugs list is that the 
issue is not a bug but a design issue, which should not be so narrowly 
confined. I often get two or more messages about the same issue from 
different folks and I wind up replying to each one separately. The 
strongest advice I can give is to view the architecture briefing on the 
NTP project page before submitting reports like that.


blu wrote:

>On Jun 5, 4:11 pm, David Mills <mi... at udel.edu> wrote:
>>This issue is widely misunderstood; yours is the second such message to
>>me today. So, please spread the word.
>>When a server loses all sources it does not necessarily become
>>unsuitable for downstream clients. Ordinarily, it inherits error
>>statistics from upstream servers and provides them to downstream
>>clients. Servers and clients use these statistics to calculate the
>>maximum error statistic which represents the maximum clock error
>>relative to the primary reference clock. See the error budget called out
>>in the specification. Once determined, the maximum error increases at a
>>rate (15 PPM) determined as the maximum disciplined clock frequency
>>error of the server clock. This increase continues indefinitely or until
>>the sources are again found.
>Since you have requested that this be discussed on the newsgroup rather
>than in bug 1554, I am replying here.
>In bug 1554, the reporter claims that what you describe above is what
>he sees happen if the clock filter contains 4 to 7 samples. However,
>he says that if the clock filter is full with 8 samples, then the
>system peer is unselected and a no_sys_peer event is posted. This is
>in contradiction of what you keep describing as the correct behavior,
>but then you keep saying that the reported behavior is correct. Since
>the behavior he is reporting is not the same correct behavior as you
>keep describing it, we have continued to treat this as a bug.
>So, you need to either confirm that this change in behavior at 8
>samples is correct and amend your description, or confirm that your
>description is correct and admit that the reported behavior is a bug.
>Or deny the reported behavior happens (I tend to favor this at this
>point. I suspect user error right now.) In either of these last two
>cases, we should probably still discuss this in the bug report.
>Brian Utterback
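
The error growth described in the quoted message (maximum error 
increasing at 15 PPM once all sources are lost) can be worked through 
with a small sketch. This is only arithmetic under that stated rate, not 
ntpd code; the function names and the 1.0 s limit in the usage note are 
hypothetical and chosen for illustration.

```python
PHI = 15e-6  # 15 PPM growth rate from the message above, in seconds per second


def max_error(error_at_loss: float, seconds_since_loss: float) -> float:
    """Maximum error after the sources were lost, growing linearly at PHI."""
    return error_at_loss + PHI * seconds_since_loss


def seconds_until(error_at_loss: float, limit: float) -> float:
    """Time for the maximum error to reach a client's limit (hypothetical
    parameter); zero if the limit is already exceeded."""
    return max(0.0, (limit - error_at_loss) / PHI)
```

For example, a server that had 10 ms of maximum error when it lost its 
sources would advertise about 64 ms an hour later, and a server starting 
at 50 ms would take roughly 17.6 hours to reach an assumed 1.0 s limit. 
Until then it remains usable by downstream clients, just with a growing 
error bound.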
