[ntp:questions] problem with a failsafe setup, hardware clocks and the correctness test

Eike Middell eike at middell.net
Wed Aug 1 14:46:50 UTC 2007

Hi list,
I have set up a time synchronisation system in a small network which
should be reasonably failsafe. I have a Meinberg GPS 169 PCI card at hand
and a slow and often saturated satellite link to the internet.
The idea is to have one host (A) at the site which does not use the
GPS time but synchronises only via the internet. If the host
with the GPS card (B) fails, host A will become the one with the
lowest stratum number and will become the time source for all other
hosts. If both methods fail, hosts A and B have their hardware clocks
as reference and can provide this time. During normal operation host B
is the reference for all other hosts, and I can monitor the performance
of both synchronisation methods via the stats that ntpd on B gives for
its refclock A.

  refclock1, refclock2,...
        \       /
         \     /
     internet via sat                GPS
            |                         |
   hwclock  |                         |   hwclock
         \  |                         |  /
         Host A  <-A-is-ref-for-B-  Host B
       (stratum 2)                (stratum 1)
              \                   /
               \                 /
                `- other hosts -´

This setup worked satisfactorily. Recently I noticed some behaviour of
the system that I cannot explain. The small set of reference clocks
given to host A was reduced to only one host and the hardware clock, as
all other peers became unreachable. In this situation, and given the
unreliable internet connection, host A occasionally switches between this
one last host and its hardware clock.

Each time this occurred on host A, the GPS card on host B
failed to pass the correctness test, which led to host B also switching
to its only other reference, the hardware clock!

So why does ntpd running on host B reject the GPS clock? I don't
understand the correctness test in detail. As far as I can tell, it
tries to find a region on the time axis where the correctness intervals
overlap; these intervals are calculated from the offset and jitter of
the reference clocks. Reference clocks whose intervals don't cover this
region are rejected.
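
To make the description above concrete, here is a minimal Python sketch
of such an interval intersection test. It is a simplification in the
spirit of the selection step described in RFC 5905, not ntpd's actual
code. The function name select_truechimers and the tuple layout
(name, offset, half_width) are hypothetical. The half-width is left as
a plain input here; ntpd itself derives it from the root synchronisation
distance rather than from jitter alone.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: (name, offset in seconds, half-width of the interval)
Source = Tuple[str, float, float]


def select_truechimers(
    sources: List[Source],
) -> Optional[Tuple[float, float, List[str]]]:
    """Return (low, high, survivor names), or None if no majority agrees."""
    m = len(sources)
    # Each source contributes a lower endpoint (-1) and an upper endpoint (+1).
    edges = []
    for _, offset, half_width in sources:
        edges.append((offset - half_width, -1))
        edges.append((offset + half_width, +1))
    edges.sort()

    allow = 0  # number of sources we tolerate as falsetickers
    while 2 * allow < m:  # falsetickers must stay a strict minority
        need = m - allow
        low = high = None
        # Scan upwards: the first point covered by `need` intervals.
        count = 0
        for value, kind in edges:
            count -= kind
            if count >= need:
                low = value
                break
        # Scan downwards: the last point covered by `need` intervals.
        count = 0
        for value, kind in reversed(edges):
            count += kind
            if count >= need:
                high = value
                break
        if low is not None and high is not None and high > low:
            # Keep every source whose interval overlaps [low, high].
            names = [
                name
                for name, offset, half_width in sources
                if offset + half_width >= low and offset - half_width <= high
            ]
            return low, high, names
        allow += 1
    return None


if __name__ == "__main__":
    # Three sources, one far off: the outlier is rejected.
    print(select_truechimers([("a", 0.0, 1.0), ("b", 0.5, 1.0), ("c", 10.0, 1.0)]))
    # Two sources with disjoint intervals: no majority, so None.
    print(select_truechimers([("hwclock", 0.0, 0.001), ("gps", 0.005, 0.002)]))
```

In this sketch, two sources whose intervals do not overlap (for example
a hardware clock with a very narrow interval around zero and a GPS clock
a few milliseconds away) yield None, because neither can form a
majority. Whether ntpd behaves the same way in that case is not
something the sketch can confirm.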

Therefore I could imagine that a possible cause is related to the fact
that the peerstats on A and B always show an offset of zero for the
hwclocks, with a jitter that is smaller than that of the GPS card on B.

Does anyone have a better idea?

Thanks in advance,
