[ntp:questions] How should an NTP server fail?

David L. Mills mills at udel.edu
Wed Jun 9 20:34:06 UTC 2010


You might be confusing the server role with the client role. The server 
has one or more upstream sources and downstream clients. The tally code 
for each source is displayed by the pe command, separately at the server 
and at the client. Each time an update is received from a source, at 
either the server or the client, the tally codes for all sources are 
redetermined. If a source is considered invalid or unreachable, or if 
its maximum error statistic exceeds the select threshold, the tally 
indicator surely will be blank. If a source is marked as the system 
peer, it surely is valid and reachable.
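
As a rough illustration of the rule above, here is a minimal Python 
sketch. It is not the ntpd code: the Source class, the tally() function 
and the choice of the lowest-error source as system peer are assumptions 
made for the example, and the real selection goes through the 
intersection and clustering algorithms and has more tally codes than the 
three shown.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical, simplified view of one upstream source."""
    name: str
    valid: bool        # passed the sanity checks
    reach: int         # 8-bit reachability shift register; 0 means unreachable
    max_error: float   # maximum error statistic, in seconds


def tally(sources, select_threshold=1.5):
    """Recompute the tally code of every source, as done on each update.

    ' ' = not usable, '*' = system peer, '+' = usable candidate.
    """
    usable = [
        s for s in sources
        if s.valid and s.reach != 0 and s.max_error < select_threshold
    ]
    # Assumption for this sketch: the usable source with the smallest
    # maximum error becomes the system peer.
    sys_peer = min(usable, key=lambda s: s.max_error, default=None)
    usable_names = {s.name for s in usable}

    codes = {}
    for s in sources:
        if s.name not in usable_names:
            codes[s.name] = " "
        elif s is sys_peer:
            codes[s.name] = "*"
        else:
            codes[s.name] = "+"
    return codes
```

Note that tally() is recomputed over all sources at once, so a source 
can only carry the system peer mark if it passed the validity, 
reachability and threshold tests in that same pass.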

In the case you present, the server has lost all sources but remains a 
viable choice even beyond that, as long as the maximum error does not 
exceed the select threshold. The user can set this threshold to whatever 
value is appropriate; the default is 1.5 s. The point I emphasize is 
that the server, even if it has lost all sources, remains conformant to 
the formal specification. Thus, the time provider does not judge the 
quality that the receiver requires; this is specified by the receiver.
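
To make the receiver-side decision concrete, here is a minimal Python 
sketch of how long a client would keep accepting a server that has lost 
all its sources. The function names are made up for the example. The 
distance formula is a simplification (half the root delay plus the root 
dispersion), and the dispersion growth rate PHI of 15 ppm is the 
frequency tolerance given in RFC 5905, assumed here to apply unchanged.

```python
PHI = 15e-6      # assumed dispersion growth rate, seconds per second
MAXDIST = 1.5    # select threshold in seconds (the default cited above)


def max_error(root_delay, root_dispersion, elapsed):
    """Simplified maximum error, `elapsed` seconds after the server's
    last update from any of its own sources."""
    return root_delay / 2 + root_dispersion + PHI * elapsed


def is_selectable(root_delay, root_dispersion, elapsed, maxdist=MAXDIST):
    """The receiver, not the server, decides whether the error is acceptable."""
    return max_error(root_delay, root_dispersion, elapsed) < maxdist


def seconds_until_rejected(root_delay, root_dispersion, maxdist=MAXDIST):
    """Time for the growing dispersion to push the error to the threshold."""
    margin = maxdist - (root_delay / 2 + root_dispersion)
    return max(0.0, margin / PHI)


if __name__ == "__main__":
    hours = seconds_until_rejected(0.020, 0.010) / 3600
    print(f"server remains selectable for about {hours:.1f} hours")
```

Under these assumptions a server with 20 ms root delay and 10 ms root 
dispersion stays selectable for roughly a day after losing its sources. 
A client that needs tighter bounds would lower the threshold on its own 
side.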


Miroslav Lichvar wrote:

>On Wed, Jun 09, 2010 at 07:41:28PM +0100, David L. Mills wrote:
>>When a server loses all sources, its own indicators reveal that.
>>However, the only way downstream clients see this is increasing
>>dispersion. Depending on other available sources a client has no way to
>>know (or care) about that other than increasing maximum error. If no
>>other sources are available, a client may well cling to that server, as
>>by design it <continues> to provide service within the maximum error
>Continuing discussion from https://bugs.ntp.org/show_bug.cgi?id=1554
>When a server loses connectivity to a source, why is it allowed 
>for the source to stay marked as system peer?
>Normally in such situations the server is unmarked which generates
>no_sys_peer event if it was the only source. But sometimes it stays
>selected which means the event is unreliable and the operator has to
>use something else for monitoring, probably track the reachable status
>for each peer, or is there something better?
