[ntp:questions] How should an NTP server fail?

David L. Mills mills at udel.edu
Thu Jun 10 03:48:36 UTC 2010


On closer examination of the code, the scenario I suggested in my
previous message is not possible. In other words, there is no possibility
that the reach register is zero while the tally code is other than a space.
The reason is that the select algorithm, which determines the system
peer and lights the tally codes, is called not only when a new update is
received from any server, but also after four poll intervals in which no
samples have been received from a server. This not only makes the
indicated dispersion increase rapidly, which would greatly reduce that
source's chances of becoming the system peer if other sources were present,
but also prevents a race condition between the time a poll is sent and the
time the next update is received.
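The mechanism described above can be illustrated with a toy model. This is a hypothetical simplification, not ntpd code: the `Source` class, `poll()`, `select()`, resetting dispersion on each fresh sample, and comparing dispersion (rather than the full root distance) against the threshold are all assumptions made for illustration. The reach register is modeled as the usual 8-bit shift register. Rerunning selection on every update and after four missed polls follows the description above, the 1.5 s threshold is the default cited later in the thread, and PHI = 15 PPM is the RFC 5905 frequency tolerance.

```python
PHI = 15e-6              # dispersion growth rate in s/s (RFC 5905 tolerance)
SELECT_THRESHOLD = 1.5   # seconds; default select threshold cited in the thread


class Source:
    """Hypothetical, simplified model of one upstream association."""

    def __init__(self, name):
        self.name = name
        self.reach = 0          # 8-bit reachability shift register
        self.misses = 0         # consecutive polls with no sample
        self.dispersion = 0.0   # seconds (simplified stand-in for max error)
        self.tally = " "

    def poll(self, got_sample, interval):
        """Record one poll; return True if the select step should run."""
        # Shift left once per poll, keep 8 bits, set bit 0 if a sample arrived.
        self.reach = ((self.reach << 1) & 0xFF) | int(got_sample)
        if got_sample:
            self.misses = 0
            self.dispersion = 0.0          # simplification: fresh sample resets it
            return True                    # new update: rerun select
        self.misses += 1
        self.dispersion += PHI * interval  # grows while no samples arrive
        return self.misses >= 4            # timeout: rerun select anyway


def select(sources):
    """Assign tally codes: ' ' rejected, '+' candidate, '*' system peer."""
    viable = [s for s in sources
              if s.reach != 0 and s.dispersion <= SELECT_THRESHOLD]
    for s in sources:
        s.tally = " "
    for s in viable:
        s.tally = "+"
    if viable:
        # Simplification: lowest dispersion wins the system-peer slot.
        min(viable, key=lambda s: s.dispersion).tally = "*"
```

Feeding one good poll followed by eight missed polls leaves `reach == 0`. Because the timeout path keeps rerunning `select()`, the tally is blank at that point instead of being left stale as `*`.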

The sysadmins of the world have had almost thirty years to develop uses
for the monitoring facilities first designed by Dennis Ferguson circa
1983, with only minor changes since then. When I implemented the tally
codes circa 1992, the intent was that the sysadmin would need only the pe
command and the tally codes to assess general health, and the rv
command only as a diagnostic aid.
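As a rough illustration of the health check described here (reading only the pe/peers listing and its tally codes), the sketch below parses text laid out like `ntpq -p` output and reports whether any association is marked as the system peer. The function names and the sample output are made up (the sample uses RFC 5737 documentation addresses). The tally-character meanings follow the ntpq documentation. The sketch does not invoke ntpq and does not inspect the rv variables, which are what is needed to diagnose a failure to select.

```python
# Tally characters shown in the first column of the ntpq peers listing.
TALLY = {
    " ": "reject",
    "x": "falseticker",
    ".": "excess",
    "-": "outlier",
    "+": "candidate",
    "#": "backup",
    "*": "system peer",
    "o": "PPS peer",
}

# Made-up example in the ntpq -p layout (documentation addresses only).
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.10      .GPS.            1 u   33   64  377    0.512    0.031   0.010
+192.0.2.11      192.0.2.1        2 u   12   64  377    1.204   -0.120   0.045
 192.0.2.12      .INIT.          16 u    -   64    0    0.000    0.000   0.000
"""


def parse_peers(text):
    """Map each remote in a peers listing to the meaning of its tally code."""
    result = {}
    for line in text.splitlines():
        # Skip blank lines, the column header, and the ===== separator.
        if not line.strip() or line.lstrip().startswith(("remote", "=")):
            continue
        code, fields = line[0], line[1:].split()
        result[fields[0]] = TALLY.get(code, "unknown")
    return result


def synchronized(text):
    """True if some association is marked as system peer or PPS peer."""
    return any(v in ("system peer", "PPS peer")
               for v in parse_peers(text).values())
```

For the sample, `parse_peers(SAMPLE)` maps 192.0.2.10 to "system peer", 192.0.2.11 to "candidate" and 192.0.2.12 to "reject", so `synchronized(SAMPLE)` is true.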


David Woolley wrote:

> David L. Mills wrote:
>> Miroslav,
>> You might be confusing the server role with the client role. The 
>> server has one or more upstream sources and downstream clients. The 
>> tally code for each source is displayed by the pe command separately 
>> at the server and the client. Each time an update is received from a 
>> source at either the server or the client the tally codes for all 
>> sources are redetermined. If a source is considered invalid or
>> unreachable, or its maximum error statistic exceeds the select
>> threshold, the tally indicator surely will be blank. If a source is
>> marked as the system peer, it surely is valid and reachable.
> This is not the behaviour that the person who started the thread is 
> complaining about.  He is complaining that the system peer and 
> selected markers are not cleared on the server when it loses 
> reachability to the respective upstream servers.  My previous article 
> was on the basis that you were not challenging that aspect of his report.
> In the real world, most administrators judge whether a server is 
> synchronized by doing ntpq peers and looking for these flags, not by 
> doing a client request and looking at the error statistics.  In fact, 
> relatively few people realise that you need to use rv on the 
> associations to properly diagnose a failure to select.
>> In the case you present, the server has lost all sources but remains
>> a viable choice even beyond that, as long as the maximum error does
>> not exceed the select threshold. The user can set this to whatever
>> value is appropriate, with a default of 1.5 s. The point I emphasize is
>> that the server, even if it has lost all sources, remains conformant 
>> to the formal specification. Thus, the time provider does not judge 
>> the quality which the receiver requires; this is specified by the 
>> receiver.
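To put the 1.5 s default select threshold from the quoted text in perspective, here is a back-of-the-envelope sketch. It assumes that once a server loses its last source, its maximum error grows only at the RFC 5905 frequency tolerance PHI = 15 PPM, and it ignores root delay, jitter, and the client's own contribution. Under those assumptions, a server starting from zero stays under 1.5 s for 1.5 / 15e-6 = 100,000 s, a little under 28 hours. The function name is hypothetical.

```python
PHI = 15e-6  # s/s; RFC 5905 frequency tolerance, assumed growth rate here


def hours_until_unselectable(root_distance, threshold=1.5, phi=PHI):
    """Hours until `root_distance` (seconds) grows past `threshold` at `phi`.

    Simplified: assumes linear growth at `phi` after the server has lost
    all of its sources; ignores root delay, jitter and client-side terms.
    """
    if root_distance >= threshold:
        return 0.0  # already past the threshold: not selectable now
    return (threshold - root_distance) / phi / 3600.0
```

For example, `hours_until_unselectable(0.0)` comes to roughly 27.8 hours, and a server already at 0.5 s has about 18.5 hours left. Lowering the threshold shortens that window proportionally, which is the receiver-side choice the quoted text describes.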
