[ntp:questions] problem with a failsafe setup, hardware clocks and the correctness test

Eike Middell eike at middell.net
Mon Aug 13 06:59:45 UTC 2007

Hi again,
maybe I should clarify my problem with this situation. 

In the documentation the local clock driver is presented as a useful
reference for the case that all other references fail. (see:
http://www.ee.udel.edu/~mills/ntp/html/drivers/driver1.html second

But what I'm observing is that, although my two hardware clocks in
the network are given high stratum numbers and the GPS clock is running
flawlessly, the hardware clocks can displace the GPS clock through
the backdoor of the correctness test! This contradicts the basic idea
that the stratum level is the measure of the quality of a clock
(e.g. as stated in the paragraph cited above) and that the hierarchy
defined this way dictates the clock selection.

So what I would like to know is whether anybody has experienced a
similar situation. If this is regarded as normal behaviour, are there
other pitfalls where the hierarchy given by stratum levels is ignored?

As mentioned in my previous post, I noticed that the hardware clocks
always report an offset of exactly zero. Since this value goes into the
calculation of the correctness interval, I would also like to know the
reason for this.
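
To make the question concrete, the correctness test as described in the
quoted post below can be sketched roughly as follows. This is only a
simplified Marzullo-style intersection under stated assumptions, not
ntpd's actual code: the function name classify_sources, the source
names and all numbers are hypothetical, and the interval half-width is
taken as a plain input (as far as I understand, ntpd derives it from the
root synchronisation distance rather than from jitter alone).

```python
from typing import List, Tuple

# (name, offset in seconds, interval half-width in seconds); hypothetical layout
Source = Tuple[str, float, float]


def classify_sources(sources: List[Source]) -> Tuple[List[str], List[str]]:
    """Return (accepted, rejected) source names.

    Simplified sketch: find the region of the time axis covered by the
    largest number of correctness intervals, require that a strict
    majority of sources cover it, and reject every source whose
    interval does not cover that region.
    """
    names = [name for name, _, _ in sources]
    if not sources:
        return [], []

    # One edge per interval endpoint. The middle field makes lower
    # endpoints sort before upper endpoints at equal times, so intervals
    # that merely touch still count as overlapping.
    edges = []
    for _, offset, half_width in sources:
        edges.append((offset - half_width, 0, +1))
        edges.append((offset + half_width, 1, -1))
    edges.sort()

    best = 0
    count = 0
    lo = hi = 0.0
    for i, (t, _, step) in enumerate(edges):
        count += step
        if step == +1 and count > best:
            best = count
            lo = t
            hi = edges[i + 1][0]  # the next edge closes the best region

    # Without a strict majority there is no agreed region: reject all.
    if 2 * best <= len(sources):
        return [], names

    accepted, rejected = [], []
    for name, offset, half_width in sources:
        covers = offset - half_width <= lo and offset + half_width >= hi
        (accepted if covers else rejected).append(name)
    return accepted, rejected


if __name__ == "__main__":
    # Made-up values: a local clock reporting offset zero with a narrow
    # interval, a peer that agrees with it, and a GPS clock further away.
    example = [
        ("gps", 0.010, 0.002),
        ("host_a", 0.0005, 0.003),
        ("local", 0.0, 0.001),
    ]
    print(classify_sources(example))
```

Note that the stratum does not appear anywhere in this sketch: a source
is kept or rejected purely on the basis of where its interval lies. With
the made-up values above, the two sources clustered around zero form the
majority and the "gps" source is the one that gets rejected.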

Many thanks in advance,

On Wed, Aug 01, 2007 at 04:46:50PM +0200, Eike Middell wrote:
> Hi list,
> I have set up a time synchronisation system in a small network which
> should be reasonably failsafe. On hand I have a Meinberg GPS 169 PCI card
> and a slow and often saturated satellite link to the internet.
> The idea is to have one host (A) at the site which does not use the
> GPS time but only synchronizes via the internet. In the case that the
> host with the GPS card (B) fails, host A will become the one with the
> lowest stratum number and will become the time source for all other
> hosts. If both methods fail, hosts A and B have their hardware clocks
> as reference and can provide this time. During normal operation host B
> is the reference for all other hosts and I can monitor the performance
> of both synchronisation methods via the stats that ntpd on B gives for
> its refclock A.
>   refclock1, refclock2,...
>         \       /
>          \     /
>      internet via sat                GPS
>             |                         |
>    hwclock  |                         |   hwclock
>          \  |                         |  /
>          Host A  <-A-is-ref-for-B-  Host B
>        (stratum 2)                (stratum 1)
>               \                   /
>                \                 /
>                 `- other hosts -´
> This setup worked satisfactorily. Recently I noticed some behaviour of
> the system that I cannot explain. The small set of reference clocks
> given to host A was reduced to only one host and the hardware clock, as
> all other peers became unreachable. In this situation, and given the
> unreliable internet connection, host A switches occasionally between
> this one last host and its hardware clock.
> Each time this occurred on host A, the GPS card on host B
> failed to pass the correctness test, which led to host B also switching
> to its only other reference, the hardware clock!
> So what is the reason for ntpd running on host B to reject the GPS
> clock? Actually I don't understand the correctness test in detail. It
> tries to find on the time axis a region of overlapping correctness
> intervals, which are calculated from the offset and jitter of the
> reference clocks. Reference clocks which don't cover this region are
> rejected. Therefore I could imagine that a possible cause is related to
> the fact that the peerstats on A & B always show an offset of zero for
> the hardware clocks, with a jitter that is smaller than that of the
> GPS card on B.
> Does someone have a better idea?
> Thanks in advance,
> Eike
