[ntp:questions] NTP client with 4 servers lost sync

Brian Inglis Brian.Inglis at SystematicSw.ab.ca
Wed Jul 1 20:52:58 UTC 2015


On 2015-07-01 10:18, Nuno Pereira wrote:
> Following last night's leap second, we had some issues with our NTP servers,
> especially in clients with 4 servers configured, but not in clients with 1
> source configured.

> We have 2 types of configuration (besides the one in the NTP server):

> Config 1 (clients with access to the external network):
> *	2 NTP servers in the LAN, configured with "iburst prefer";
> *	2 external NTP servers, configured with "iburst".

> Config 2 (clients without access to the external network):
> *	1 NTP server in the LAN, configured with "iburst prefer" or "iburst"
> (in this case, using "prefer" or not makes no difference).

> The 2 external servers configured had problems with the leap second, showing
> a one-second offset after it happened, while the LAN servers had no issues
> (they had a leap file, and reported leap_armed within the 24 hours before the
> event).
>
> This led to something like this being reported by "ntpq -p" (I don't have
> the actual output):

>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> xlan_server_1    160.45.10.8      2 u 1013 1024  377    1.019   -0.483   0.687
> xlan_server_2    160.45.10.8      2 u  922 1024  377    1.042   -0.499   0.665
> xext_server_1    194.117.9.137    2 u  384 1024  377    3.360 1002.688   0.790
> xext_server_2    194.117.9.139    2 u  388 1024  377    3.360 1001.582   0.833

> That is, all 4 were considered falsetickers.

> Meanwhile, in the clients with no access to the external network, which
> have only 1 server to sync to (lan_server_1), things worked with no
> problem.

> From what I've read on this list and in the docs, the best configuration is to
> have 4 servers, and that is the default shipped with CentOS and Debian
> servers, but this incident again raised the even-number-of-servers issue that
> can arise with just 2.

> How can 4 be worse than 1?
>
> Do I have to move to a 5-server configuration in order to avoid this? Or go
> for 4 servers in the LAN?
>
> I'm having difficulty convincing my colleagues that we must configure 4
> servers (they think that is excessive, and that the best is to have just
> one), and now I have this issue.

See the select and prefer doc pages.
To get sync, you need a majority clique, with more truechimers than falsetickers,
so with two of each, there is no majority, and none are considered reliable.
That is why pool servers are recommended as backup where external access is
available, in case some local sources go down or become falsetickers.
At least three sources, internal or external, are preferable, to allow a majority
clique even if one source goes down or becomes a falseticker; use more if you need
to allow for possible network issues.
  
Also note that prefer only means that the source, if it is a survivor, will be used
for system offset and jitter stats, rather than the combine algorithm output.
With more than one surviving preferred source, implementation details decide
which wins.
It is intended mainly for use with local device drivers, and to mark the
source that provides seconds numbering for PPS sources.

You may want to consider adding all LAN sources to all clients, adding enough LAN
sources to give an odd number, adding pool servers as backup to the external
servers, and dropping prefer from the LAN sources to allow the combine algorithm
to compute stats.
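
As a rough guide to why an odd number helps, the strict-majority rule means
that n sources can outvote at most (n - 1) // 2 falsetickers. The short Python
sketch below just tabulates that arithmetic; the function name
max_falsetickers is made up for the example.

```python
def max_falsetickers(n_sources):
    """Largest number of bad sources a strict majority of good ones can outvote."""
    if n_sources < 1:
        return 0
    return (n_sources - 1) // 2


for n in range(1, 8):
    print(f"{n} sources: tolerates {max_falsetickers(n)} falseticker(s)")
```

Four sources tolerate no more falsetickers than three, which is the situation
in the original report: two bad sources out of four cannot be outvoted, while
two bad out of five can. Note that a single source tolerates none either; it
is simply followed whether it is right or wrong.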

-- 
Take care. Thanks, Brian Inglis

