[ntp:questions] Re: Clock drift problems

Christopher Browne cbbrowne at acm.org
Mon Jan 19 05:20:09 UTC 2004


Oops! allancady at yahoo.com (Allan Cady) was seen spray-painting on a wall:
> Disclaimer:  I'm asking this question about a Linux box, which I use
> only as a client of a couple of web-based applications and as a file
> server.  I know almost nothing about Linux, or Unix.  I'm a Windows
> guy.  So please go easy on me!

[Grumble, grumble, let's get out the pots of boiling oil...  :-).]

> The problem is, on this Linux machine (Red Hat, I don't know what
> version), the real-time clock has gotten 7 hours slow.  The guys
> here who administer this machine seem to be stumped about what to do
> about it.  (This boggles my mind.)  There are two problems: how to
> get it back where it belongs, and how to prevent it from getting out
> of sync in the future.

> The explanation I'm given about why they can't just "set the clock"
> is that there are applications that would wig out if the clock all
> of the sudden changes by 7 hours.  I can understand that, but things
> like NTP are supposed to be able to deal with this kind of thing by
> adjusting the clock slowly over a period of time.  I don't know the
> details, and I certainly don't know how to set this up on Linux.
> And I don't know if it's capable of handling such a gross
> correction.

NTP is NOT capable of coping with such a gross correction over time;
it "gives up" if it finds things more than 1000 seconds off.  And even
if it did go with small incremental adjustments, at the usual maximum
slew rate of 500 ppm it would take well over a YEAR to adjust by 7
hours.
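
As a rough sanity check on how long a pure slew of 7 hours would take,
here is a small Python sketch of the arithmetic.  The 500 ppm figure
is my assumption for ntpd's usual maximum slew rate, and the function
name is made up for illustration; the 1000-second limit is the one
mentioned above.

```python
# Back-of-the-envelope estimate, not a model of ntpd's actual behaviour.
MAX_SLEW_PPM = 500        # assumption: ntpd's usual maximum slew rate
PANIC_THRESHOLD_S = 1000  # offset beyond which ntpd gives up (see above)


def slew_duration_seconds(offset_s, slew_ppm=MAX_SLEW_PPM):
    """Wall-clock seconds needed to slew away offset_s at slew_ppm."""
    return offset_s / (slew_ppm * 1e-6)


offset_s = 7 * 3600  # the clock is 7 hours slow
days = slew_duration_seconds(offset_s) / 86400

print(f"offset: {offset_s} s (over the panic threshold: "
      f"{offset_s > PANIC_THRESHOLD_S})")
print(f"slewing at {MAX_SLEW_PPM} ppm would take about {days:.0f} days")
```

Under those assumptions it comes out at roughly 583 days, which is why
a one-time step of the clock is the only practical way out.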

There is going to have to be some form of "outage" on the machine, as
a result.  The simplest answer may be to get appropriate NTP hosts
into /etc/ntp.conf and /etc/ntp/step-tickers (the latter is needed in
order to get the initial sync that overcomes the "off by 7h" problem),
and see about rebooting the system.

Ideally, that shouldn't be necessary; if they shut down things like
database applications, that may suffice to prevent apps from "wigging
out."  Shut down the "at risk" applications and services, restart ntpd
(on Red Hat the command is "/etc/init.d/ntpd restart"), and restart
the other apps.

> The explanation I'm given about why the clock is losing time so
> badly in the first place (about 15 minutes a week), is that it
> happens when we do our weekly backups to DVD-ROM; something is
> locking out the hardware interrupt that makes the clock work.  Is
> this "normal"?  They claim it's nothing to do with Linux, that it
> would happen with Windows too.  I've never seen anything like this
> happen on Windows... DOS maybe, but that was 15 years ago.  This is
> a Dell PowerEdge 1600 machine, less than a year old.

Yeah, this is something of a "known issue."  When the system bus gets
taken over by DMA, that certainly can keep the clock's timer
interrupts from being serviced.  Various Unixes have suffered from
similar things over the years, although it is usually just that the
clock gets jittery, not that it outright dies.
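
To put the reported "15 minutes a week" in perspective, here is a
small Python sketch that turns it into an average drift rate.  The 500
ppm limit is my assumption for the most that ntpd normally corrects by
slewing, the function names are made up for illustration, and the loss
is treated as if it were spread evenly over the week, even though it
reportedly happens in a burst during the backup.

```python
# Rough arithmetic on the figures quoted in this thread.
SECONDS_PER_WEEK = 7 * 24 * 3600
NTPD_MAX_SLEW_PPM = 500  # assumption: ntpd's usual slew/frequency limit


def drift_ppm(lost_seconds, interval_seconds):
    """Average drift in parts per million over the interval."""
    return lost_seconds / interval_seconds * 1e6


def weeks_to_accumulate(total_offset_s, lost_per_week_s):
    """Weeks needed to build up total_offset_s at lost_per_week_s."""
    return total_offset_s / lost_per_week_s


lost_per_week_s = 15 * 60  # "about 15 minutes a week"
rate = drift_ppm(lost_per_week_s, SECONDS_PER_WEEK)
weeks = weeks_to_accumulate(7 * 3600, lost_per_week_s)

print(f"average drift: {rate:.0f} ppm "
      f"({rate / NTPD_MAX_SLEW_PPM:.1f}x the assumed slew limit)")
print(f"weeks to fall 7 hours behind: {weeks:.0f}")
```

That works out to roughly 1488 ppm on average, about three times the
assumed limit, and about 28 weeks to get 7 hours behind, so the lost
interrupts are worth chasing at the source rather than leaving NTP to
absorb them.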

> Given my ignorance of Linux, it's hard for me to ask specific "how do
> you do this" questions... for starters, I'm mostly looking for a
> general opinion of whether this problem is really as confounding as my
> buddies think it is.  It may get to questions of "how can we configure
> NTP to do the gross correction without breaking applications", and "is
> there any way to fix the system so that the clock doesn't drift when
> we're doing backups", but to start with, I'd just like to know if I
> can believe the guys who are telling me there's nothing we can do
> about it.  Or if maybe I can point them somewhere... tell them, "read
> the man on ...".
>
> Or maybe we should be asking Dell for help with this?  

You'd have to find someone at Dell who knows about hardware clocks as
well as NTP, which probably isn't anyone you'll be able to speak with
:-(.

Configuring and running NTP is the right answer.  The thorny bit will
be finding an opportunity to get the system outage.

Of course, if any of the applications care what time it is, they're
already broken with the clock 7 hours off, and so they _need_ the
outage, like it or not...

You should probably visit comp.protocols.time.ntp; there may be
further insights there...
-- 
output = ("aa454" "@" "freenet.carleton.ca")
http://www.ntlug.org/~cbbrowne/ntp.html
:FATAL ERROR -- VECTOR OUT OF HILBERT SPACE


