[ntp:questions] NTP client with 4 servers lost sync

Nuno Pereira nuno.pereira at g9telecom.pt
Thu Jul 2 19:40:58 UTC 2015


> -----Original message-----
> From: questions [mailto:questions-bounces+nuno.pereira=g9telecom.pt at lists.ntp.org] On behalf of Brian Inglis
> Sent: Wednesday, 1 July 2015 21:53
> To: questions at lists.ntp.org
> Subject: Re: [ntp:questions] NTP client with 4 servers lost sync
> 
> On 2015-07-01 10:18, Nuno Pereira wrote:
> > Following last night's leap second, we had some issues with our NTP servers,
> > especially in clients with 4 servers configured, but not in clients with 1
> > source configured.
> 
> > We have 2 types of configuration (besides the one in the NTP server):
> 
> > Config 1 (clients with access to the external network):
> > *	2 NTP servers in the LAN, configured with "iburst prefer";
> > *	2 external NTP servers, configured with "iburst".
> 
> > Config 2 (clients without access to the external network):
> > *	1 NTP server in the LAN, configured with "iburst prefer" or "iburst"
> > (in this case, using "prefer" or not makes no difference).
> 
> > The 2 external servers configured had problems with the leap second, showing
> > a one-second offset after it happened, while the LAN servers had no issues
> > (they had a leap file, and reported leap_armed within the 24 hours before
> > the event).
> >
> > This led to something like this being reported by "ntpq -p" (I don't have
> > a capture of the actual output):
> 
> >      remote           refid      st t when poll reach   delay   offset  jitter
> > ==============================================================================
> > xlan_server_1    160.45.10.8      2 u 1013 1024  377    1.019   -0.483   0.687
> > xlan_server_2    160.45.10.8      2 u  922 1024  377    1.042   -0.499   0.665
> > xext_server_1    194.117.9.137    2 u  384 1024  377    3.360 1002.688   0.790
> > xext_server_2    194.117.9.139    2 u  388 1024  377    3.360 1001.582   0.833
> 
> > I mean, all 4 were considered false tickers.
> 
> > Meanwhile, in the clients with no access to the external network, which
> > have only 1 server to sync to (lan_server_1), things worked with no
> > problem.
> 
> > From what I've read in this list and in the docs, the best configuration is
> > to have 4 servers, and that is what CentOS and Debian ship by default, but
> > this incident brought back the even-number-of-servers problem that can
> > arise with just 2.
> 
> > How can 4 be worse than 1?
> >
> > Do I have to go to a 5-server configuration in order to avoid this? Or go
> > for 4 servers in the LAN?
> >
> > I'm having difficulty convincing my colleagues that we must configure 4
> > servers (they think that is excessive); they think it is best to have just
> > one, and now I have this issue.
> 
> See the select and prefer doc pages.
> To get sync, you need a majority clique, with more truechimers than
> falsetickers, so with two of each, you don't get a majority, and none are
> considered reliable.
> That is why pool servers are recommended as backup with external access, in
> case some local sources go down or false.
> At least three sources internal or external are preferable to allow a majority
> clique even if one source goes down or false; more if you need to allow for
> possible network issues.
> 
> Also note that prefer means only that source, if it is a survivor, will be used
> for system offset and jitter stats, rather than the combine algorithm output.
> With more than one surviving preferred source, implementation details decide
> which wins.
> It is intended for use mainly with local device drivers, as well as to mark a
> source to provide seconds numbering for PPS sources.

The use of prefer was based on this idea: "I want to use only the local
sources, unless they all fail, in which case I have to use an external one".
From what I have read now, I see that prefer was a bad choice.

The only option I could find to accomplish that idea (use the external sources
only if all the local ones fail) was "noselect", but that fails too: sources
marked noselect are never used for synchronization, so the external ones would
not be available when the local ones fail.
Am I right?

> You may want to consider adding all LAN sources to all clients, add enough LAN
> sources to provide an odd number, add pool servers as backup to external
> servers, and drop prefer from LAN sources to allow the combine algorithm to
> compute stats.
There are only 2 LAN sources in total (in reality just one for the moment, as
they are the same host).
From my experience, the pool servers, if taken directly from pool.ntp.org, are
very unstable and not trustworthy, so I chose 2 fixed, reasonably good external
servers as backup. But they failed at the leap second.


But how can you explain that a client with just one source did better than a
client with 4 sources?
This is not only about the leap second: for some months now, some clients with
just one source have been having fewer problems than those with the 4-source
configuration.

4 is not odd, I know, but does that mean I have to go for a 5-source
configuration, since a 3-source configuration can drop to 2 available sources
if one of them fails?
And as we prefer to use local sources, in order to have better accuracy between
all of our clients, in that case we would need to have 5 NTP servers! It's a
little insane, in our opinion.
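
For reference, the majority rule described in the quoted reply can be sketched
roughly as below. This is only a simplified model and not ntpd's actual
selection code: the function name select_truechimers and the fixed half-width
of 10 ms around each offset are assumptions made for illustration (as far as I
understand, the real algorithm derives each source's interval from its own
measured distance and applies further steps afterwards). It only shows why 2
against 2 leaves no majority while a single source is trivially accepted.

```python
def select_truechimers(offsets_ms, half_width_ms):
    """Simplified majority test (hypothetical helper, not ntpd code).

    Each source gets the interval [offset - half_width, offset + half_width].
    Returns the indices of the sources in the largest overlapping group,
    or [] if that group is not a strict majority of all sources.
    """
    n = len(offsets_ms)
    # Sweep events: kind 0 = interval start, kind 1 = interval end.
    # At equal positions, starts sort before ends, so touching intervals overlap.
    events = []
    for off in offsets_ms:
        events.append((off - half_width_ms, 0))
        events.append((off + half_width_ms, 1))
    events.sort()

    best_count, best_point, count = 0, None, 0
    for point, kind in events:
        if kind == 0:
            count += 1
            if count > best_count:
                best_count, best_point = count, point
        else:
            count -= 1

    # Require strictly more than half of the sources to agree.
    if best_count * 2 <= n:
        return []
    return [i for i, off in enumerate(offsets_ms)
            if off - half_width_ms <= best_point <= off + half_width_ms]


# Offsets taken from the ntpq output above; 10 ms half-width is an assumption.
four = [-0.483, -0.499, 1002.688, 1001.582]
print(select_truechimers(four, 10.0))      # [] : 2 vs 2, no majority
print(select_truechimers(four[:1], 10.0))  # [0] : a single source always wins
print(select_truechimers(four[:3], 10.0))  # [0, 1] : 2 of 3 outvote the bad one
```

In this model, 3 sources tolerate one bad source and 5 tolerate two, which is
the arithmetic behind my question about 3 versus 5 sources.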

> --
> Take care. Thanks, Brian Inglis

Nuno Pereira
G9Telecom

