[ntp:questions] Re: NTP with linux kernel 2.6.9 vs. kernel 2.4.24
dedourek at unb.ca
Sun Jan 23 03:04:01 UTC 2005
You didn't mention the hardware that you were using. I was just
looking at the 2.6 kernel source in the time keeping area the
other day. There actually appear to be quite a few changes
since I looked at it a while back (not sure if that was 2.2 or 2.4).
I am NOT a kernel expert.
However, as I read the 2.6 code, it seems to me:
-- HZ has been taken out of the .config, so it is not possible
to just change HZ in the .config and recompile
-- it looks like the "actual HZ" used by the timer interrupt
has been fixed at 1000 in the ".h" file (for the Intel i386
-- the kernel "lies" to "user space" (i.e. applications) that
the HZ is 100 (called "user HZ") so that applications think
that the standard "jiffy" is 10 ms.
-- the timer interrupt does some "fancy footwork" using either
the TSC (a simple clock pulse counter built into later
pentium chips) or a thing called something like "HPET" timer
(doing this from memory, so I may have that acronym wrong)
to deduce when timer interrupts are lost and correct for this
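To make the last point concrete, here is a minimal sketch of how a free-running counter can be used to notice lost timer interrupts. This is NOT the kernel's code; it only illustrates the idea as I read it. The class name TickClock, its parameters, and the assumption that the counter runs at a fixed, known rate (counter_hz) are all hypothetical.

```python
class TickClock:
    """Toy model of a tick-driven clock that uses a free-running
    counter (something like a TSC) to catch lost timer interrupts.
    All names here are illustrative, not taken from the kernel."""

    def __init__(self, hz, counter_hz, start_count=0):
        self.tick_ns = 1_000_000_000 // hz        # nanoseconds per tick
        self.counts_per_tick = counter_hz // hz   # counter increments per tick
        self.last_count = start_count             # counter value at last accounted tick
        self.time_ns = 0                          # accumulated system time

    def on_timer_interrupt(self, now_count):
        """Account for every tick that elapsed since the last call.

        Returns the number of ticks that were lost, i.e. ticks that
        elapsed without the handler having run for them.
        """
        elapsed = (now_count - self.last_count) // self.counts_per_tick
        # Advance by whole ticks only, so a partial tick carries over.
        self.last_count += elapsed * self.counts_per_tick
        self.time_ns += elapsed * self.tick_ns
        return max(elapsed - 1, 0)
```

With hz=1000 and a counter assumed to run at 1 GHz, a handler call that arrives three ticks after the previous one advances the time by 3 ms instead of 1 ms and reports two lost ticks. Without the counter, the clock would simply have fallen 2 ms behind.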
I will add to this some speculation:
-- Originally, almost all Unixes and Linuxes on PC style hardware
ran the clock interrupt at 100 Hz, or once every 10 ms. Note
that various I/O driver code must lock off interrupts occasionally
to avoid "race conditions" in messing with data structures that
are modified both in interrupt code and in non-interrupt code.
An interrupt occurring while interrupts are locked out is
stored in a one-bit "register". If two interrupts should occur
while interrupts are locked out then when interrupts are again
enabled, the interrupt service routine is only executed once.
This means that one increment of the system time will not be made.
-- A lost clock interrupt (without the fancy footwork of the 2.6
kernel) would cause time to fall behind by 10 ms. (100 Hz clock)
or by 1 ms. (1000 Hz clock).
-- If the interrupts are locked out for more than 10 ms. (100 Hz
clock) or for more than 1 ms. (1000 Hz clock) the time would
-- The time the interrupts are locked out depends critically on
the type of other hardware on this system (e.g. whether DMA
is available for certain devices, and which drivers are
-- The time interrupts are locked out depends on the speed of
the CPU; the same code takes longer to run on slower CPUs
-- Older CPUs may not have the timers to correct for lost interrupts;
it DOESN'T look like the 2.6 kernel sets the clock to 100 Hz
in the absence of a TSC or something to use to avoid lost
-- The 2.6 kernel seems to have more support for the newer CPUs
that allow the speed to be stepped up and down depending on
computation needs, temperature, or who knows what else. I
haven't been able to follow what happens in this case, e.g.
whether the TSC becomes an unreliable indicator for lost
Perhaps someone who works on the time keeping code in the kernel
monitors this list and can comment on the above. (Then again,
based on my experience, maybe not.) Back in the RedHat 7.x series,
someone at RedHat stepped up the HZ from 100 to 1000 and timekeeping
on my old machine was rotten until I recompiled the kernel for
100 Hz. Of course, I don't believe the kernel in those days
attempted to detect lost interrupts. Now it appears this is
happening again, but Linux is supporting only the newer machines
that have the TSC timer and/or are fast enough that interrupts
aren't locked out for too long.
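As a rough worked example of the arithmetic in my speculation above, the following sketch converts a rate of uncorrected lost ticks into a frequency error in PPM, so it can be compared with the 500 PPM tolerance in the ntpd log quoted below. The function names are made up for illustration, and I am not claiming that lost ticks are the cause of the error reported there.

```python
def drift_ppm(lost_ticks_per_second, hz):
    """Frequency error (in PPM) of a clock that silently loses the
    given number of ticks per second at a tick rate of hz."""
    # Each lost tick costs 1/hz seconds; PPM is microseconds per second.
    return lost_ticks_per_second * 1_000_000 / hz


def seconds_between_lost_ticks(ppm, hz):
    """How often one tick must be lost to produce the given PPM error."""
    return 1_000_000 / (ppm * hz)


if __name__ == "__main__":
    for hz in (100, 1000):
        interval = seconds_between_lost_ticks(500, hz)
        print(f"HZ={hz}: one lost tick every {interval:g} s gives 500 PPM")
```

So at 1000 Hz a single uncorrected lost tick every 2 seconds already amounts to 500 PPM, while at 100 Hz the same error takes one lost tick every 20 seconds.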
Goetz Lichtwald wrote:
> I got through this mailing list and asked google for several
> times and still, I found no answer or hint where to look at.
> So, the basic problem is that I have a couple of linux boxes
> that need to be synchronized via NTP. Those boxes have all the
> very same hardware and configuration.
> Running the kernel 2.4.24 NTP works fine. For various reasons
> there is the need to run -- at least a few of them -- with
> kernel 2.6.9 and here it happens. NTP complains. The Log states:
> ntpd: frequency error -512 PPM exceeds tolerance 500 PPM
> Using "adjtimex" does not alleviate the problem at all. The
> kernel 2.6.9 was configured via "make oldconfig" from kernel
> Does anybody has a hint where to look at or even a solution to
> the problem. For better debugging, some further facts about the
> ntp 4.2.0a-11
> os Debian Sarge with kernel 2.6.9 resp. 2.4.24
> Any help is welcome!
> - goetz