[ntp:questions] Re: Problem with good synchronization.
David L. Mills
mills at udel.edu
Sun Oct 17 22:12:11 UTC 2004
My last message needs a more careful analysis, which follows.
Consider two bulls angus1 and angus2, each configured for their
low-stratum sources and with the local clock driver at default stratum
5. Each is configured with the other in symmetric-active mode. Normally,
timing flows from the sources for angus1 and angus2 to their respective
cow jerseys. If all sources for angus1 fail, it restores via
symmetric-active mode to angus2. Same thing happens if angus2 loses all
sources and restores via angus1. Nothing out of the ordinary here and
the local clock drivers are not involved.
The interesting case is when all sources for both angus1 and angus2
fail. Without loss of generality, assume that angus1 sources fail first,
then angus2 sources fail. Before angus2 sources fail, angus1 has
restored via angus2. When angus2 sources fail, angus2 will not restore
via angus1 because that would result in a timing loop. So, it falls back
to the local clock driver at stratum 5 and continues until at least one
source resumes life.
This is all very well, but can the order of events be crafted that
results in angus1 and angus2 disbelieving each other because of a race
and then continuing with a fractured subnet? Assume all sources die at
the same time and that each one discovers at the same poll interval that
all sources have failed and each peer restores via the other, forming a
temporary timing loop.
If at any time a timing loop exists for whatever reason between angus1
and angus2, the following applies. Without loss of generality, assume
the angus1 poll arrives at angus2 before the angus2 poll arrives at
angus1. Angus2 sees that a loop would form and so falls back to its
local clock driver. When the next poll from angus2 arrives at angus1,
the loop is broken and angus1 abandons the local clock driver and
continues with restoral via angus2.
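
As a rough illustration of the two cases above, here is a toy Python model. It is only a sketch under simplifying assumptions, not how ntpd actually selects sources: each bull remembers a single thing it follows, and the only rule is that a bull refuses to follow a peer that is already following it. The names Bull, on_poll_from, LOCAL and EXTERNAL are made up for this example.

```python
from dataclasses import dataclass

LOCAL = "LOCAL"        # stand-in for the local clock driver (assumption)
EXTERNAL = "EXTERNAL"  # stand-in for any low-stratum source (assumption)


@dataclass
class Bull:
    name: str
    external_ok: bool = True
    source: str = EXTERNAL  # what this bull currently follows

    def on_poll_from(self, peer: "Bull") -> None:
        """Reselect a source after hearing what the peer currently follows."""
        if self.external_ok:
            self.source = EXTERNAL
        elif peer.source != self.name:
            # The peer is not following us, so following it forms no loop.
            self.source = peer.name
        else:
            # The peer follows us: selecting it would close a timing loop.
            self.source = LOCAL


def sequential_failure():
    """angus1 loses its sources first, then angus2 loses its own."""
    a1, a2 = Bull("angus1"), Bull("angus2")
    a1.external_ok = False
    a1.on_poll_from(a2)      # angus1 restores via angus2
    a2.external_ok = False
    a2.on_poll_from(a1)      # angus2 sees the would-be loop
    a1.on_poll_from(a2)      # angus1 keeps following angus2
    return a1.source, a2.source


def simultaneous_failure():
    """Both lose their sources at once and briefly follow each other."""
    a1 = Bull("angus1", external_ok=False, source="angus2")
    a2 = Bull("angus2", external_ok=False, source="angus1")
    a2.on_poll_from(a1)      # the angus1 poll arrives at angus2 first
    a1.on_poll_from(a2)      # then the angus2 poll arrives at angus1
    return a1.source, a2.source


if __name__ == "__main__":
    print(sequential_failure())
    print(simultaneous_failure())
```

In this model both orderings end in the same state: angus2 on its local clock driver and angus1 following angus2. Swapping the poll order in simultaneous_failure simply swaps the roles.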
Note this doesn't work with three or more bulls, since the timing loops
can't reliably be recognized. However, eventually the bulls would count
to infinity and start over, probably leading to an indefinite cycle.
David L. Mills wrote:
> By default, the local clock driver will not become active unless all
> other sources fail. Catch two bulls at the same stratum and run
> symmetric mode between them. The cows follow either bull or both. Now,
> if either bull loses its cowboy, it follows the other. That was my
> intent and I just verified it works. If both cowboys fail, the bulls
> will agree to disagree.
> Harlan Stenn wrote:
>> I've never seen that behavior (but you have more experience).
>> I have tried the mechanism you describe, and in that case I have had the
>> primary external timesource disappear and then the bulls head off in
>> their own directions, with different cows following different bulls.
>> In article <ckq6bo$ad1$1 at dewey.udel.edu>,
>> David L. Mills <mills at udel.edu> wrote:
>>> The problem I observed was that, when three strata bulls and a number of
>>> cows were involved, they developed a deadly embrace where the cows saw the
>>> stratum oscillate first to one bull and then another. This caused
>>> frequent timeouts as the cows waited for a stratum lower than theirs
>>> to come back. The bottom line is not to be so adventurous, keep the
>>> bulls at one stratum and let the cows eat equal stratum grass.
>>> Harlan Stenn wrote:
>>>> Please say more - I have been doing it this way for years.
>>>> When "all is well" *everybody* syncs to the main server, and then
>>>> everybody except the main server runs at S+1.
>>>> If the main server fails folks slowly sniff around and the #2 box soon
>>>> becomes the "lead dog".
>>>> On rare occasions I see a clique form, but it is usually short-lived.
>>>> I have never seen a case where the different cliques have
>>>> different time.