[ntp:questions] Re: Problem with good synchronization.

David L. Mills mills at udel.edu
Sun Oct 17 22:12:11 UTC 2004


Folks,

My last message needs a more careful analysis. What follows is meant to
convince the skeptics.

Consider two bulls, angus1 and angus2, each configured for its own
low-stratum sources and with the local clock driver at the default
stratum 5. Each is configured with the other in symmetric-active mode.
Normally, timing flows from the sources for angus1 and angus2 to their
respective jersey cows. If all sources for angus1 fail, it restores via
angus2 in symmetric-active mode. The same thing happens if angus2 loses
all sources and restores via angus1. Nothing out of the ordinary here,
and the local clock drivers are not involved.

The interesting case is when all sources for both angus1 and angus2
fail. Without loss of generality, assume that the angus1 sources fail
first, then the angus2 sources fail. Before the angus2 sources fail,
angus1 has restored via angus2. When the angus2 sources fail, angus2
will not restore via angus1, because that would result in a timing loop.
So it falls back to the local clock driver at stratum 5 and continues
there until at least one source comes back to life.
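
For the skeptics, here is a toy Python model of the sequence above. It is
a sketch under stated assumptions, not ntpd's selection code: each bull
picks the lowest-stratum candidate, rejects a peer whose reference ID
names the bull itself (a two-party loop check), and uses the local clock
at stratum 5 only when nothing else is available. The names select,
state, source and poll, the "EXT" reference ID placeholder, and the
stratum-1 external sources are all hypothetical.

```python
from typing import Dict, Optional, Tuple

LOCAL_STRATUM = 5        # local clock driver default stratum, per the post
MAXSTRAT = 16            # NTP's "unsynchronized" stratum
State = Tuple[int, str]  # (stratum, reference ID); a toy representation


def select(me: str, source: Optional[int], peers: Dict[str, State]) -> State:
    """Toy selection: lowest-stratum candidate, local clock only as last resort."""
    candidates = []
    if source is not None:                      # an external source is up
        candidates.append((source + 1, "EXT"))  # "EXT" is a placeholder refid
    for name, (stratum, refid) in peers.items():
        # Two-party loop check: ignore a peer that is synchronized to me.
        if refid != me and stratum + 1 < MAXSTRAT:
            candidates.append((stratum + 1, name))
    # Ties fall back to name order here; real NTP uses other metrics.
    return min(candidates) if candidates else (LOCAL_STRATUM, "LOCAL")


state: Dict[str, State] = {"angus1": (MAXSTRAT, "INIT"), "angus2": (MAXSTRAT, "INIT")}
source: Dict[str, Optional[int]] = {"angus1": 1, "angus2": 1}  # assumed stratum-1 sources


def poll(host: str) -> None:
    """Re-run selection for one bull using the other bull's current state."""
    peers = {n: s for n, s in state.items() if n != host}
    state[host] = select(host, source[host], peers)


poll("angus1")
poll("angus2")
print("normal:", state)       # both bulls follow their own sources

source["angus1"] = None       # angus1 loses all sources first
poll("angus1")
poll("angus2")
print("angus1 down:", state)  # angus1 restores via angus2

source["angus2"] = None       # then angus2 loses all sources
poll("angus2")
poll("angus1")
print("both down:", state)    # angus2 on LOCAL, angus1 still via angus2
```

In the last step angus2 rejects angus1 because angus1's reference ID is
angus2, so it drops to its local clock at stratum 5, while angus1 keeps
following angus2 at stratum 6.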

This is all very well, but can the order of events be crafted so that
angus1 and angus2 disbelieve each other because of a race and then
continue with a fractured subnet? Assume all sources die at the same
time, that each peer discovers in the same poll interval that all its
sources have failed, and that each peer restores via the other, forming
a temporary timing loop.

If at any time a timing loop exists for whatever reason between angus1 
and angus2, the following applies. Without loss of generality, assume 
the angus1 poll arrives at angus2 before the angus2 poll arrives at 
angus1. Angus2 sees that a loop would form and so falls back to its 
local clock driver. When the next poll from angus2 arrives at angus1, 
the loop is broken and angus1 abandons the local clock driver and 
continues with restoral via angus2.
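
The race can be sketched with the same toy rules (hypothetical, not
ntpd's code): lowest-stratum selection, rejection of a peer whose
reference ID names this bull, and the local clock at stratum 5 only as a
last resort. "A poll arrives" is modeled, as an assumption, by the
receiver re-running selection against the sender's current state. Both
bulls first decide from the same stale snapshot, which forms the loop;
then angus2 processes angus1's poll before angus1 processes angus2's.

```python
from typing import Dict, Optional, Tuple

LOCAL_STRATUM = 5        # local clock driver default stratum, per the post
MAXSTRAT = 16            # NTP's "unsynchronized" stratum
State = Tuple[int, str]  # (stratum, reference ID); a toy representation


def select(me: str, source: Optional[int], peers: Dict[str, State]) -> State:
    """Toy selection: lowest-stratum candidate, local clock only as last resort."""
    candidates = []
    if source is not None:
        candidates.append((source + 1, "EXT"))  # "EXT" is a placeholder refid
    for name, (stratum, refid) in peers.items():
        if refid != me and stratum + 1 < MAXSTRAT:  # two-party loop check
            candidates.append((stratum + 1, name))
    return min(candidates) if candidates else (LOCAL_STRATUM, "LOCAL")


def view(host: str) -> Dict[str, State]:
    """What this bull sees of the other bull(s)."""
    return {n: s for n, s in state.items() if n != host}


state: Dict[str, State] = {"angus1": (2, "EXT"), "angus2": (2, "EXT")}
source: Dict[str, Optional[int]] = {"angus1": None, "angus2": None}  # all die at once

# Same poll interval: both bulls decide from the same (stale) snapshot.
snapshot = {h: select(h, source[h], view(h)) for h in state}
state.update(snapshot)
formed = dict(state)
print("loop:", formed)  # each bull now follows the other

# angus1's poll reaches angus2 first: angus2 sees the loop, drops to LOCAL.
state["angus2"] = select("angus2", source["angus2"], view("angus2"))
# angus2's next poll reaches angus1: no loop now, angus1 stays via angus2.
state["angus1"] = select("angus1", source["angus1"], view("angus1"))
print("resolved:", state)
```

Reversing the order of the last two steps gives the mirror-image result,
with angus1 on its local clock and angus2 following it.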

Note that this does not work with three or more bulls, since the timing
loops cannot be reliably recognized. Instead, the bulls would eventually
count to infinity and start over, probably leading to an indefinite cycle.
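
A toy model of the three-bull case, using the same hypothetical rules as
above (reject a peer whose reference ID names this bull, pick the lowest
stratum, use the local clock only as a last resort). It assumes angus3 is
a third bull, all external sources are down, and a loop angus1 -> angus2
-> angus3 -> angus1 has already formed; how it forms is not modeled. Each
bull only sees that its chosen peer follows some other bull, so the
two-party check never fires and the strata climb every round until a
candidate would reach stratum 16.

```python
from typing import Dict, List, Optional, Tuple

LOCAL_STRATUM = 5        # local clock driver default stratum, per the post
MAXSTRAT = 16            # NTP's "unsynchronized" stratum
State = Tuple[int, str]  # (stratum, reference ID); a toy representation


def select(me: str, source: Optional[int], peers: Dict[str, State]) -> State:
    """Toy selection: lowest-stratum candidate, local clock only as last resort."""
    candidates = []
    if source is not None:
        candidates.append((source + 1, "EXT"))
    for name, (stratum, refid) in peers.items():
        # Only detects a peer that follows *me*; longer loops slip through.
        if refid != me and stratum + 1 < MAXSTRAT:
            candidates.append((stratum + 1, name))
    return min(candidates) if candidates else (LOCAL_STRATUM, "LOCAL")


# Assumed starting point: a three-way timing loop, all sources dead.
state: Dict[str, State] = {
    "angus1": (3, "angus2"),
    "angus2": (3, "angus3"),
    "angus3": (3, "angus1"),
}
history: List[Dict[str, State]] = []

for _ in range(10):  # ten poll rounds, fixed polling order
    for host in ("angus1", "angus2", "angus3"):
        peers = {n: s for n, s in state.items() if n != host}
        state[host] = select(host, None, peers)
    history.append(dict(state))

for rnd, snap in enumerate(history, 1):
    # Highest stratum per round climbs 5, 6, 8, ... 15, then resets.
    print(rnd, max(s for s, _ in snap.values()), snap)
```

With this fixed polling order the sketch settles once angus1 restarts
from its local clock; it does not model the timing that could re-form
the loop and repeat the cycle.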

Dave

David L. Mills wrote:
> Harlan,
> 
> By default, the local clock driver will not become active unless all 
> other sources fail. Catch two bulls at the same stratum and run 
> symmetric mode between them. The cows follow either bull or both. Now, 
> if either bull loses its cowboy, it follows the other. That was my 
> intent and I just verified it works. If both cowboys fail, the bulls 
> will agree to disagree.
> 
> Dave
> 
> Harlan Stenn wrote:
> 
>> Dave,
>>
>> I've never seen that behavior (but you have more experience).
>>
>> I have tried the mechanism you describe, and in that case I have had the
>> primary external time source disappear and then the bulls head off in
>> their own directions, with different cows following different bulls.
>>
>> H
>> -- 
>> In article <ckq6bo$ad1$1 at dewey.udel.edu>,
>> David L. Mills <mills at udel.edu> wrote:
>>
>>> Harlan,
>>>
>>> The problem I observed, when three strata of bulls and a number of
>>> cows were involved, was a deadly embrace in which the cows saw the
>>> stratum oscillate first to one bull and then to another. This caused
>>> frequent timeouts as the cows waited for a stratum lower than theirs
>>> to come back. The bottom line: don't be so adventurous. Keep the
>>> bulls at one stratum and let the cows eat equal-stratum grass.
>>>
>>> Dave
>>>
>>> Harlan Stenn wrote:
>>>
>>>> Dave,
>>>>
>>>> Please say more - I have been doing it this way for years.
>>>>
>>>> When "all is well" *everybody* syncs to the main server, and then
>>>> everybody except the main server runs at S+1.
>>>>
>>>> If the main server fails folks slowly sniff around and the #2 box soon
>>>> becomes the "lead dog".
>>>>
>>>> On rare occasions I see a clique form, but it is usually short-lived,
>>>> and I have never seen a case where the different cliques have
>>>> "significantly" different time.
>>>>
>>>> H
>>>
>>>
>>
>>
> 



