[ntp:questions] NTP over redundant peer links, undetected loops

Stefan Schimanski usenet at 1stein.org
Tue Feb 10 09:28:20 UTC 2009


Hi!

We are trying to implement an NTP installation over a redundant
network, connecting the stratum 2 servers to the stratum 3 clients.
The precise situation is the following (compare with
http://1stein.org/download/network.png):

3 networks, 192.168.3.0, 192.168.4.0, 192.168.5.0
ATOM1, ATOM2 - stratum 1 servers in network 3
GW1, GW2 - stratum 2 servers in all networks, i.e. 3, 4, 5
CLIENT1...CLIENTn - stratum 3 clients in networks 4 and 5

Our goal is that GW1 and GW2 are always synchronized, even
- if network 3 goes down,
- or if one of networks 4 or 5 goes down,
- or, in the worst case, if networks 3 and 4 (or 3 and 5) go down and
only one link is left between GW1, GW2 and the clients.
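The failure cases above can be enumerated mechanically. The sketch below is a toy model, not part of the original setup: it assumes every client sits on both networks 4 and 5, and the names `MEMBERSHIP`, `SCENARIOS`, `surviving_links` and `check` are hypothetical. For each scenario it lists which networks still connect GW1 to GW2 and a client to the gateways.

```python
# Which networks still connect each pair of hosts when some networks fail?
# Membership follows the post; ASSUMPTION: every client is on both 4 and 5.
MEMBERSHIP = {
    "GW1": {3, 4, 5},
    "GW2": {3, 4, 5},
    "CLIENT": {4, 5},
}

# Failure cases listed above (the empty set is normal operation).
SCENARIOS = [set(), {3}, {4}, {5}, {3, 4}, {3, 5}]


def surviving_links(a, b, down):
    """Networks that both hosts are attached to and that are still up."""
    return sorted((MEMBERSHIP[a] & MEMBERSHIP[b]) - down)


def check(down):
    # GW1 and GW2 have identical membership, so one client check covers both.
    return {
        "GW1-GW2": surviving_links("GW1", "GW2", down),
        "CLIENT-GW": surviving_links("CLIENT", "GW1", down),
    }


if __name__ == "__main__":
    for down in SCENARIOS:
        print(sorted(down), check(down))
```

Every scenario leaves at least one shared network between the gateways, but which one survives depends on the failure, which is why the gateways are meant to peer over both network 4 and network 5.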

We have configured the hosts in the following way:

GW1 - with two IPs GW1_4, GW1_5
server 127.127.1.0
fudge 127.127.1.0 stratum 5
server ATOM1
server ATOM2
peer GW2_4
peer GW2_5

GW2 - with two IPs GW2_4, GW2_5
server 127.127.1.0
fudge 127.127.1.0 stratum 10
server ATOM1
server ATOM2
peer GW1_4
peer GW1_5

CLIENT1 ... CLIENTn
server GW1
server GW2

The problem:

If network 3 goes down, GW1 and GW2 select each other as their
reference clock, one over network 4 and the other over network 5. The
loop detection does not catch this. The stratum of both references
climbs poll by poll until it reaches 16. Then one of GW1/GW2, say GW1,
switches to the LOCAL(0) source. After GW1's new stratum propagates to
GW2 and back to GW1 (as stratum 7), GW1 switches back to GW2, even
though the local clock's stratum (5) is lower. Then the cycle starts
again, with the stratum climbing through propagation.
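The climb described above can be illustrated with a toy model. It is only a sketch under simplifying assumptions: the gateways poll strictly in turn, each takes the other as its sole source once the ATOM servers are unreachable, and ntpd's selection, clustering and combining algorithms as well as the LOCAL(0) driver are ignored. The function name `climb` is hypothetical.

```python
MAX_STRATUM = 16  # in NTP, stratum 16 means "unsynchronized"


def climb(start=2):
    """Toy model of the loop: with the ATOM servers unreachable, GW1 and GW2
    take each other as sole source, so on each poll the polling gateway
    sets its stratum to the other's stratum + 1 (capped at 16)."""
    strata = {"GW1": start, "GW2": start}
    history = []
    me, other = "GW1", "GW2"
    while True:
        strata[me] = min(strata[other] + 1, MAX_STRATUM)
        history.append((me, strata[me]))
        if strata[me] == MAX_STRATUM:
            return history
        me, other = other, me  # the other gateway polls next


if __name__ == "__main__":
    for who, stratum in climb():
        print(who, stratum)
```

Because each gateway only ever learns its stratum from the other, nothing in the loop can lower it again; the stratum rises by one per poll until it hits 16, much like count-to-infinity in distance-vector routing.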

Solution 1: By removing one peer connection we can eliminate the
possible loop, and it starts working, but only at the cost of losing
some of the redundancy across networks 4 and 5, which we do not want.

Solution 2: We can also remove all peer statements and put "server
GW1_4" and "server GW1_5" in GW2's config. But then we are lost if
ATOM1 and ATOM2 are out of sync with each other: GW1 may sync to ATOM1
and GW2 to ATOM2, leaving GW1 and GW2 out of sync. That _must not_
happen.

Is there a way to tell xntpd that GW1_4 and GW1_5 (and likewise GW2_4
and GW2_5) belong to the same host, so that the loop detection works?
Can one force it to use a common refid instead of the IP address?
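Why the multi-homed setup escapes loop detection can be sketched with a simplified model of a refid-based loop test. For an IPv4 sync source, the refid a server advertises is that source's address. The sketch ASSUMES the test compares a peer's refid only with the local address on the network the peer is reached through; real ntpd versions differ in the details. The addresses and all names (`Association`, `loop_per_interface`, `loop_per_host`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical addresses for the four gateway interfaces (not from the post).
GW1_4, GW1_5 = "192.168.4.1", "192.168.5.1"
GW2_4, GW2_5 = "192.168.4.2", "192.168.5.2"


@dataclass
class Association:
    remote: str          # peer address this association polls
    local: str           # our own address on the same network
    remote_refid: str    # refid the peer advertises: address of its sync source
    remote_stratum: int


def loop_per_interface(a: Association) -> bool:
    """ASSUMED simplified test: flag a loop only if the peer's refid equals
    the local address on the network this association uses."""
    return a.remote_stratum > 1 and a.remote_refid == a.local


def loop_per_host(a: Association, own_addrs: set[str]) -> bool:
    """Hypothetical alternative: compare against every address of this host,
    i.e. treat GW1_4 and GW1_5 as one identity (a common refid)."""
    return a.remote_stratum > 1 and a.remote_refid in own_addrs


# GW1's view after network 3 fails: GW2 has chosen GW1 via network 5,
# so GW2 advertises refid GW1_5 to both of GW1's associations.
gw1_assocs = [
    Association(remote=GW2_4, local=GW1_4, remote_refid=GW1_5, remote_stratum=3),
    Association(remote=GW2_5, local=GW1_5, remote_refid=GW1_5, remote_stratum=3),
]

if __name__ == "__main__":
    for a in gw1_assocs:
        print(a.remote, loop_per_interface(a), loop_per_host(a, {GW1_4, GW1_5}))
```

Under this model the association over network 5 is flagged but the one over network 4 is not, so GW1 can still select GW2 via network 4 and the loop survives. The per-host variant, which is what a common refid would amount to, flags both associations.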

Regards
  Stefan Schimanski


