[ntp:questions] Isolated Network Drift Problem

David Woolley david at ex.djwhome.demon.co.uk.invalid
Fri Nov 21 08:10:47 UTC 2008


Cal Webster wrote:
> Our NTP servers are slowly losing time. All are in nearly perfect sync
> but collectively drift backwards over time. Is there a way to apply a
> bias to the drift calculations?

Yes: edit the value in ntp.drift on the one machine with the local clock 
configured.
> 
> We had to disconnect from the Internet several months ago. Since then we
> have had serious drift problems. Shortly after the disconnect I
> discovered that we were predictably losing 10 minutes every 15 days. I
> tried several things but not until I zeroed out the
> "driftfile" (/var/lib/ntp/drift) 10 days ago [Mon Nov 10 18:10:00 2008]
> did this large drift abate.

Drift is about 463 ppm (500 ppm is ntpd's limit of correctable drift, 
when no phase noise is present).  Something is seriously broken.  I 
suspect that you have a lost timer interrupt problem and that ntpd was 
papering over the cracks.  That has to be fixed at source.  If the loss 
of 10 minutes per 15 days was consistent from when you started 
free-running, that is the only thing I can think of.  If it ramped up, 
another problem might be your misuse of local clock drivers.
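
For anyone who wants to check the arithmetic, here is a minimal Python 
sketch of the conversion from "time lost over an interval" to ppm.  The 
function name drift_ppm is my own, purely for illustration; the 500 ppm 
figure is the ntpd limit mentioned above.

```python
# Convert an observed clock loss into a frequency error in parts per million.
NTPD_MAX_PPM = 500.0  # ntpd's limit of correctable drift, as stated above


def drift_ppm(seconds_lost: float, elapsed_seconds: float) -> float:
    """Return the frequency error implied by losing seconds_lost over elapsed_seconds."""
    return seconds_lost / elapsed_seconds * 1e6


# The reported loss: 10 minutes in 15 days.
reported = drift_ppm(10 * 60, 15 * 86400)
print(f"{reported:.0f} ppm, {NTPD_MAX_PPM - reported:.0f} ppm below the limit")
```

That leaves very little headroom, which is why I say ntpd was only just 
managing to hide the underlying fault.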
> 
> Although it is much improved, we are still steadily losing time. Three
> days after I zeroed the drift file [Thu Nov 13 15:04:00 EST 2008] we
> were 32 seconds behind. Today, 10 days later [Thu Nov 20 09:05:00 2008]
> we are 1 min 54 secs behind. This works out to roughly 12 secs per day -
> not bad I guess but still requires regular monitoring.
> 

138 ppm is still way too high; temperature only tends to produce 
variations in the single figures.  Whilst you will get some benefit by 
setting the drift file to 138, with the opposite sign from before, the 
instability you report indicates that you have a more serious problem to fix.
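
As a rough check on how steady the loss is, the two readings you quote 
can be turned into per-interval rates.  This is only a sketch: it 
assumes all three timestamps are in the same time zone (only one of them 
is marked EST), and the helper name ppm is mine.

```python
from datetime import datetime


def ppm(seconds_lost, start, end):
    """Frequency error in ppm for a loss of seconds_lost between two datetimes."""
    return seconds_lost / (end - start).total_seconds() * 1e6


# Timestamps and offsets as reported in the quoted message.
zeroed = datetime(2008, 11, 10, 18, 10)  # drift file zeroed
first = datetime(2008, 11, 13, 15, 4)    # 32 seconds behind
second = datetime(2008, 11, 20, 9, 5)    # 1 min 54 secs (114 seconds) behind

overall = ppm(114, zeroed, second)
early = ppm(32, zeroed, first)
late = ppm(114 - 32, first, second)

for label, value in (("overall", overall), ("early", early), ("late", late)):
    print(f"{label}: {value:.1f} ppm")
```

The early and late intervals come out roughly 12 ppm apart, which is 
more than temperature alone would normally explain.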

Before all the recent clock hacks in Linux, when using just the CTC 
interrupts, 30 seconds a year was a reasonable target for an air 
conditioned computer room and a reasonably stable processing load.

> server 127.127.1.0
> fudge   127.127.1.0 stratum 5

If you have a time island, there should be exactly one master server 
with a relatively low stratum local clock, although stratum 5 is 
dangerously low.  Your target should be that you end up with some 
clients at stratum 14 or 15.

Any pure clients should not have a local clock.  That is universally 
true, not just for time islands.  For the remaining machines, you should 
either specify a clear hierarchy, with steps of two in the local clock 
stratum between each one, or use orphan mode, which I think will work 
provided the master server, with the local clock, never goes down for 
more than a few hours at a time.  (There is circumstantial evidence, in 
a recent thread, that root dispersion will diverge on orphan mode 
servers until they get rejected for excessive root distance.)
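
To illustrate the "steps of two" hierarchy, here is a small Python 
sketch that prints the local clock lines for each server in order, from 
the master downwards.  The host names, the function name, and the 
starting stratum of 10 are all assumptions made for the example, not 
recommendations for your site; each lower server would also need server 
lines pointing at the machines above it, which are not shown.

```python
LOCAL_CLOCK = "127.127.1.0"  # local clock driver address, as in the quoted config


def local_clock_lines(hosts, top_stratum, step=2, max_stratum=15):
    """Map each host (master first) to its ntp.conf local clock lines."""
    config = {}
    for level, host in enumerate(hosts):
        stratum = top_stratum + level * step
        if stratum > max_stratum:
            raise ValueError(f"{host}: stratum {stratum} exceeds {max_stratum}")
        config[host] = [
            f"server {LOCAL_CLOCK}",
            f"fudge   {LOCAL_CLOCK} stratum {stratum}",
        ]
    return config


# Hypothetical host names; top_stratum=10 is an illustrative choice only.
ladder = local_clock_lines(["master", "backup1", "backup2"], top_stratum=10)
for host, lines in ladder.items():
    print(f"# {host}")
    print("\n".join(lines))
```

The check against 15 is there because the ladder runs out quickly: with 
steps of two you only get a handful of levels before hitting the top of 
the usable stratum range.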


> 
> 
> [root@axl /]# cat /etc/adjtime
> ------------------------------------
> 44.508790 1226358437 0.000000
> 1226358437
> LOCAL

You should not use this and ntpd at the same time (actually, if you are 
careful, you may be able to use it for correcting the time across a 
period in which the machine is powered down, but doing so requires 
special considerations).



