[ntp:questions] Isolated Network Drift Problem
david at ex.djwhome.demon.co.uk.invalid
Fri Nov 21 08:10:47 UTC 2008
Cal Webster wrote:
> Our NTP servers are slowly losing time. All are in nearly perfect sync
> but collectively drift backwards over time. Is there a way to apply a
> bias to the drift calculations?
Yes: set the value in ntp.drift on the one machine with the local clock
configured.
> We had to disconnect from the Internet several months ago. Since then we
> have had serious drift problems. Shortly after the disconnect I
> discovered that we were predictably losing 10 minutes every 15 days. I
> tried several things but not until I zeroed out the
> "driftfile" (/var/lib/ntp/drift) 10 days ago [Mon Nov 10 18:10:00 2008]
> did this large drift abate.
That is a drift of about 463 ppm (500 ppm is ntpd's limit of correctable
drift, when no phase noise is present). Something is seriously broken. I
suspect that you have a lost timer interrupt problem and that ntpd was
papering over the cracks. That has to be fixed at source. If the loss of
10 minutes per 15 days was consistent from when you started free-running,
that is the only thing I can think of. If it ramped up, another problem
might be your misuse of local clock drivers.
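
For reference, the drift figure follows directly from the reported loss.
A minimal sketch of the arithmetic, assuming the loss was steady over the
whole period (the function and constant names are illustrative only, not
part of any NTP tool):

```python
SECONDS_PER_DAY = 86_400
NTPD_LIMIT_PPM = 500  # ntpd's limit of correctable drift, as stated above


def drift_ppm(error_seconds, elapsed_days):
    """Average frequency error in parts per million."""
    return error_seconds / (elapsed_days * SECONDS_PER_DAY) * 1_000_000


# Reported loss: 10 minutes every 15 days.
observed = drift_ppm(10 * 60, 15)
headroom = NTPD_LIMIT_PPM - observed

print(f"observed drift: {observed:.1f} ppm")
print(f"headroom below the 500 ppm limit: {headroom:.1f} ppm")
```

The result sits only a few tens of ppm under the limit, which is why a
clock this far out leaves ntpd almost no room to correct anything else.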
> Although it is much improved, we are still steadily losing time. Three
> days after I zeroed the drift file [Thu Nov 13 15:04:00 EST 2008] we
> were 32 seconds behind. Today, 10 days later [Thu Nov 20 09:05:00 2008]
> we are 1 min 54 secs behind. This works out to roughly 12 secs per day -
> not bad I guess but still requires regular monitoring.
138 ppm is still way too high; temperature only tends to produce
variations in the single figures. Whilst you will get some benefit by
setting the drift file to 138, with the opposite sign from before, the
instability you report indicates that you have a more serious problem to
fix.
Before all the recent clock hacks in Linux, when using just the CTC
interrupts, 30 seconds a year was a reasonable target for an air
conditioned computer room and a reasonably stable processing load.
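
To put the two figures side by side, here is a rough sketch that converts
both the reported 12 seconds per day and the 30 seconds per year target to
ppm. It assumes a 365-day year and a constant rate; the variable names are
illustrative only:

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365  # assumption: leap years ignored

# Reported rate after zeroing the drift file: roughly 12 seconds per day.
current_ppm = 12 / SECONDS_PER_DAY * 1_000_000

# Target for a well-behaved free-running clock: 30 seconds per year.
target_ppm = 30 / (DAYS_PER_YEAR * SECONDS_PER_DAY) * 1_000_000

ratio = current_ppm / target_ppm

print(f"current: {current_ppm:.1f} ppm")
print(f"target:  {target_ppm:.2f} ppm")
print(f"current rate is about {ratio:.0f} times the target")
```

The target works out to a little under 1 ppm, so the present rate is more
than two orders of magnitude worse than a healthy machine should manage.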
> server 127.127.1.0
> fudge 127.127.1.0 stratum 5
If you have a time island, there should be exactly one master server
with a relatively low stratum local clock, although stratum 5 is
dangerously low. Your target should be that you end up with some
clients at stratum 14 or 15.
Any pure clients should not have a local clock. That is universally
true, not just for time islands. For the remaining machines, you should
either specify a clear hierarchy, with steps of two in the local
clock stratum between each one, or use orphan mode, which I think will
work provided the master server, with the local clock, never goes down
for more than a few hours at a time. (There is circumstantial evidence, in
a recent thread, that root dispersion will diverge on orphan mode
servers until they get rejected for excessive root distance.)
> [root@axl /]# cat /etc/adjtime
> 44.508790 1226358437 0.000000
You should not use this and ntpd at the same time (actually, if you are
careful, you may be able to use it for correcting the time across a
period in which the machine is powered down, but doing so requires