[ntp:questions] NTP over redundant peer links, undetected loops
usenet at 1stein.org
Tue Feb 10 09:28:20 UTC 2009
We are trying to implement an NTP installation over a redundant
network, connecting the stratum 2 servers to the stratum 3 clients.
The precise situation is the following (compare with
3 networks, 192.168.3.0, 192.168.4.0, 192.168.5.0
ATOM1, ATOM2 - stratum 1 servers in network 3
GW1, GW2 - stratum 2 servers in all networks, i.e. 3, 4, 5
CLIENT1...CLIENTn - stratum 3 clients in network 4 and 5
Our goal is that GW1 and GW2 are always synchronized, even
- if network 3 goes down,
- or if one of networks 4 or 5 goes down,
- or, in the worst case, if networks 3 and 4 (or 5) go down and
only one link is left between GW1, GW2 and the clients.
We have configured the hosts in the following way:
GW1 - with two IPs GW1_4, GW1_5
fudge stratum 5
GW2 - with two IPs GW2_4, GW2_5
fudge stratum 10
CLIENT1 ... CLIENTn
If network 3 goes down, GW1 and GW2 select each other as their
reference clock, one over network 4, one over network 5. The loop
detection does not work here. The stratum of both references goes up
poll by poll until it reaches 16. Then one of GW1/GW2, say GW1,
switches to the LOCAL(0) source. Once the new stratum of GW1 has
propagated to GW2 and back to GW1 (as stratum 7), GW1 switches back to
GW2, even though the local clock's stratum is lower. Then the game
starts again, with the stratum climbing by propagation.
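
The stratum climb we observe can be reproduced with a toy model. This
is only a sketch, not ntpd's actual selection algorithm: it assumes
each gateway simply adopts the other's stratum plus one at every poll,
and it ignores the LOCAL(0) fallback. The function name simulate and
the starting strata are our own choices for illustration.

```python
MAX_STRATUM = 16  # NTP treats stratum 16 as "unsynchronized"


def simulate(polls, start_gw1=2, start_gw2=2):
    """Toy model: each gateway takes the other's stratum plus one.

    Returns a list of (gw1_stratum, gw2_stratum) tuples, one per poll.
    This is an illustrative assumption, not ntpd's real clock selection.
    """
    gw1, gw2 = start_gw1, start_gw2
    history = []
    for _ in range(polls):
        # GW1 syncs to GW2, then GW2 syncs to the updated GW1.
        gw1 = min(gw2 + 1, MAX_STRATUM)
        gw2 = min(gw1 + 1, MAX_STRATUM)
        history.append((gw1, gw2))
    return history


history = simulate(8)
for poll, (gw1, gw2) in enumerate(history, start=1):
    print(f"poll {poll}: GW1 stratum {gw1}, GW2 stratum {gw2}")
```

Without a working loop check, nothing in this model stops the two
strata from chasing each other up to 16.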
Solution 1: By removing one peer connection, we can eliminate the
possible loop and it starts working, but obviously at the cost of
losing some of the redundancy in networks 4 and 5, which we do not want.
Solution 2: We can also remove all peer statements and put "server
GW1_4" and "server GW1_5" in GW2's config. But then we are lost if
ATOM1 and ATOM2 are out of sync, because then it can happen that GW1
syncs to ATOM1 and GW2 to ATOM2, such that GW1 and GW2 are out of
sync. But the latter _must not_ happen.
Is there a way to tell xntpd to treat the IPs GW1_4, GW1_5 (and
likewise GW2_4, GW2_5) as the same host, so that the loop detection
works? Can one force the use of a common refid instead of the IP?
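
To illustrate why we suspect the refid is the problem, here is a
minimal sketch. It assumes that the loop check only compares a peer's
refid with the local address of the interface used for that one
association. That is our understanding, not something we have
confirmed for every version. The addresses and the function name
loop_detected are hypothetical.

```python
# Hypothetical addresses for GW1's two interfaces (networks 4 and 5).
GW1_4 = "192.168.4.1"
GW1_5 = "192.168.5.1"


def loop_detected(peer_refid, local_addr_for_association):
    """Assumed check: the peer is rejected as a loop only if its refid
    equals our own address on the interface used to reach that peer."""
    return peer_refid == local_addr_for_association


# GW2 synchronizes to GW1 over network 4, so it advertises refid GW1_4.
gw2_refid = GW1_4

# GW1 talks to GW2 over network 4: refid matches, loop is caught.
same_link = loop_detected(gw2_refid, GW1_4)

# GW1 talks to GW2 over network 5: refid is GW1_4, local address is
# GW1_5, so the comparison fails and the loop goes unnoticed.
other_link = loop_detected(gw2_refid, GW1_5)

print("loop caught on same link:", same_link)
print("loop caught on other link:", other_link)
```

A refid common to both interfaces would make the second comparison
succeed as well, which is what we are asking for.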