[ntp:questions] Quasi-On_topic: System (kernel) time jumps 3600 seconds at random times. Stumped.

elickd at one.net elickd at one.net
Wed May 18 14:41:47 UTC 2005


I know this really isn't on topic for the NTP newsgroup, but I believe
the people who frequent this group have a better understanding of
system time on Unix boxes than anyone else I've encountered thus far.

I've been battling a problem for months now where the system time on
random SCO 5.0.2/5.0.5/5.0.7 machines jumps + or - 3600 seconds from
true GMT.

Here are the clues I've been working with:

1) Cron runs ">cat -s /dev/clock >/dev/null 2>&1 || exit
0;/etc/setclock `date +\%m\%d\%H\%M\%y`" every day at 1 AM and 3 AM. As
best as I can tell, the above command is a very ugly way to check for
the existence of /dev/clock and then set the RTC to the system time,
corrected for DST when applicable.


2)  Cron runs a script every morning at 3:15 AM that, based on the
machine name, checks once a week that the system time isn't off by
more than 5 minutes using "ntpdate" (with logic to error out of the
script if it is) and then sets the system time with a second call,
"ntpdate -s ${SERVER} >/dev/null 2>&1", where ${SERVER} is the name of
a single NTP server that's been verified alive, out of a pool of
several.  ***I didn't write this and disagree with the whole
philosophy of setting system time only once a week using "ntpdate", but
this is the way it is right now.***

2.5) Our timeserver "chain" isn't particularly stable.

3) At 7 a.m., cron runs a small script that checks the system time
against our time servers, logs the difference and sends the results to
a master server for proactive monitoring purposes.

4) There's anecdotal evidence that said 1 hour errors increase in
frequency following either DST change, but not on the exact day.

5) The 3600 second jumps seem to occur more frequently on the day (but
not at the exact moment) that system time is updated by the script
described in #2. Sometimes the error is caught by our support dept.
after the 7 a.m. log (#3) and never shows up in ANY logs.  An example is
a location that noticed their time was off by an hour around 10 p.m.
(their time) and had things corrected by support shortly thereafter.

6) Sometimes the RTC agrees with the incorrect system clock; other
times it shows the correct time for that zone, leaving a 3600 second
error between itself and system time.

7) The TZ variable is verified correct (for their location) on every
machine I've had a problem with.
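
For reference, the offset check described in #2 and #3 boils down to
something like the sketch below. This is NOT the production script: the
function names parse_offset and offset_exceeds are made up for
illustration, it assumes "ntpdate -q" prints its usual
"server ..., stratum N, offset N, delay N" line, and it assumes an awk
that accepts -v (nawk rather than the old awk).

```shell
#!/bin/sh
# Illustrative sketch only; function names and limits are assumptions.

# Read `ntpdate -q` output on stdin and print the offset (in seconds,
# possibly negative or fractional) from the first "server ..." line.
parse_offset() {
    sed -n 's/^server .*, offset \([-+0-9.]*\),.*/\1/p' | sed 1q
}

# Exit 0 when |offset| is greater than the limit, non-zero otherwise.
# awk does the floating-point comparison that plain sh cannot.
# $1 = offset in seconds, $2 = limit in seconds.
offset_exceeds() {
    awk -v off="$1" -v lim="$2" \
        'BEGIN { if (off < 0) off = -off; exit !(off > lim) }'
}
```

A call would look roughly like
off=`ntpdate -q ${SERVER} 2>/dev/null | parse_offset`, followed by
"offset_exceeds $off 300" to apply the 5 minute limit.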

For a while, I had the idea that the "system" that calculates the DST
time change has a bug in it, but the GMT time the kernel keeps is being
"whacked", not just what "date" reports.

The ntp daemon does a wonderful job of keeping trouble systems
in check, but currently I am not in a position to implement it on the
2500-odd machines I'm responsible for.

Is there any simple way I could log when the system time is adjusted by
an hour (+/- 120 seconds or so) to determine what is causing my
problems?  Or is there a simple way I can detect what processes are
attempting to adjust system time?
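
To make the first question concrete, here is a rough sketch of the kind
of logging I have in mind, under several assumptions: it would run from
cron every few minutes, the offset would come from a query against one
of our time servers (for example "ntpdate -q"), the function names and
the log file are hypothetical, and awk must accept -v. It only narrows
down WHEN a jump happened; it does not identify the process responsible.

```shell
#!/bin/sh
# Illustrative sketch only; names and the log path are assumptions.

# Exit 0 when the absolute offset lies within 3600 +/- 120 seconds.
# $1 = offset in seconds (may be negative or fractional).
is_hour_jump() {
    awk -v off="$1" \
        'BEGIN { if (off < 0) off = -off; exit !(off >= 3480 && off <= 3720) }'
}

# Append one timestamped line to the log when the offset looks like a
# one-hour jump; do nothing otherwise.
# $1 = offset in seconds, $2 = log file.
log_if_hour_jump() {
    if is_hour_jump "$1"; then
        echo "`date` offset=$1 TZ=${TZ:-unset}" >> "$2"
    fi
}
```

The shorter the cron interval, the tighter the window in which the jump
can be pinned down and compared against the cron jobs in #1 through #3.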

Right now my back is against the wall; any ideas are welcome,

Doug



