[ntp:questions] Re: clock not synching with low stratum time server

David Woolley david at djwhome.demon.co.uk
Mon Jun 20 20:53:09 UTC 2005


In article <1119259899.825087.5260 at g49g2000cwa.googlegroups.com>,
sbalaji79 at gmail.com wrote:

> Following are the information that you have requested:-

You haven't provided corresponding information for the other cla 
machines, but I would speculate that they are configured the same
except that they don't have any true source of time at all (or
are failing in the same way).

What you need to do is:

1) Replace peering with server/client relationships, leading away from
   the machine with the real source of time.  If you retain peering,
   you must eliminate the local clocks.

2) Preferably remove all local clocks.  The machines will free run using
   the last correction data if you don't have a local clock driver.

3) If you keep the local clocks, make sure that the machine with real
   time sources has several (e.g. four) good ones, so that they will
   outvote bogus local clocks in the falseticker algorithm, and remove
   the local clocks from the other machines.

4) If you cannot remove them from the other machines, stagger their
   strata by at least two and use server/client relationships, with the
   client having the local clock stratum that is two higher.


> 1) Hardware is alpha and Os is Tru64

If Tru64 has chosen to call ntpd xntpd, they have made a mistake that
is going to cause a lot of support problems.  It is generally reckoned
that xntpd was the wrong name to use.

> XNTPDC -P output ( every 900 seconds)

The preferred format is ntpq output.

> *LOCAL(0)        127.0.0.1       12   64  377 0.00000  0.000000 0.00189

This has been selected because there are no overlapping error 
intervals, so all but one source gets eliminated before considering
strata.

> =LOCAL(0)        127.0.0.1       12   64  377 0.00000  0.000000 0.00191
> *cla2astr        10.0.0.1        13 1024  376 0.00615  0.002557 0.02705
> +cla3astr        10.0.0.1        14 1024  377 0.00049 -0.000086 0.01656

cla2 gets selected because its offset is close enough to the local
clock that the error bounds overlap.  cla3's error bounds also overlap.
The local clock is now ruled out because of the rule that says that
it is the source of last resort during clustering.  timeclr is 
ruled out in the falseticker check because it doesn't overlap with the
other three.
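
As a rough illustration of the overlap test described above, here is a
minimal sketch of a Marzullo-style interval intersection.  It is not
ntpd's actual selection code.  It assumes the last column of the
xntpdc output can stand in for each source's error bound, and the
offset and error bound for timeclr are made up, since the original
post does not show that line.

```python
def best_overlap(sources):
    """Return (count, lo, hi) for the span covered by the most sources.

    sources maps a name to (offset_seconds, error_bound_seconds).
    """
    edges = []
    for offset, err in sources.values():
        edges.append((offset - err, -1))  # interval start
        edges.append((offset + err, +1))  # interval end
    # Starts (-1) sort before ends (+1) at the same position, so
    # intervals that merely touch still count as overlapping.
    edges.sort()
    best = count = 0
    best_lo = best_hi = None
    for i, (pos, kind) in enumerate(edges):
        count -= kind
        if count > best:
            best = count
            best_lo = pos
            best_hi = edges[i + 1][0]  # a start is always followed by an edge
    return best, best_lo, best_hi


def survivors(sources):
    """Names of the sources whose interval contains the best span."""
    _, lo, hi = best_overlap(sources)
    return sorted(name for name, (off, err) in sources.items()
                  if off - err <= lo and off + err >= hi)


peers = {
    "LOCAL(0)": (0.000000, 0.00191),
    "cla2astr": (0.002557, 0.02705),
    "cla3astr": (-0.000086, 0.01656),
    "timeclr": (0.5, 0.01),  # hypothetical values, not from the post
}
print(survivors(peers))
```

With these numbers the three mutually consistent sources survive and
the outlier is discarded, even though the outlier is the only one
that would be tracking real time.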

I guess that cla2 is using its local clock as source, and that is
why it shows stratum 13.

> Drift file has got the following value : 1.470(after 3 days).

That's remarkably low.  I suspect it is seeing its own local clock 
reflected back from the other machines.

> I used local clock as a fall back reference just in case server
> timeclripfa
> goes down.

In most cases, just letting the machines free run is enough.

> Other local peer servers ( cla2astr,cla3astr) also has approximately
> same kind of stats as this one(cla2astr) i.e 130s drift after three
> days.

The hardware needs fixing.  A 130s drift over three days is about 500ppm.
If the local clock is running with only 1.47ppm correction, the true
correction required (just) exceeds the capture range of ntpd.  You need
to find out why you have a 500ppm frequency error and fix that before
you do anything else; otherwise your clock will get continually stepped
(assuming you allow that; if you don't, it will run away by the excess
of the required correction over 500ppm).
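
The arithmetic behind that estimate can be checked with a short
sketch.  It assumes the quoted "130s drift after three days" means
130 seconds accumulated over exactly three days; the function and
constant names are only illustrative.

```python
SECONDS_PER_DAY = 86_400
MAX_FREQ_PPM = 500.0  # frequency correction limit of ntpd discussed above


def drift_ppm(offset_seconds, elapsed_days):
    """Average frequency error, in parts per million, over the interval."""
    return offset_seconds / (elapsed_days * SECONDS_PER_DAY) * 1e6


observed = drift_ppm(130.0, 3.0)
excess = observed - MAX_FREQ_PPM
print(f"observed error: {observed:.1f} ppm")
print(f"excess over the 500 ppm limit: {excess:.1f} ppm")
```

Under that assumption the error comes out slightly above 500ppm,
which is why the required correction only just exceeds what ntpd can
apply by slewing.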
 


