[ntp:questions] Re: NTP with linux kernel 2.6.9 vs. kernel 2.4.24

John DeDourek dedourek at unb.ca
Sun Jan 23 03:04:01 UTC 2005


You didn't mention the hardware that you were using.  I was just
looking at the 2.6 kernel source in the timekeeping area the
other day.  There actually appear to be quite a few changes
since I last looked at it (I'm not sure whether that was 2.2 or 2.4).

I am NOT a kernel hacker expert.

However, as I read the 2.6 code, it seems to me:
-- HZ has been taken out of the .config, so it is not possible
    to just change HZ in the .config and recompile
-- it looks like the "actual HZ" used by the timer interrupt
    has been fixed at 1000 in the ".h" file (for the Intel i386
    architecture anyway)
-- the kernel "lies" to "user space" (i.e. applications), reporting
    HZ as 100 (called "user HZ", USER_HZ in the source) so that
    applications think that the standard "jiffy" is 10 ms.
-- the timer interrupt does some "fancy footwork" using either
    the TSC (Time Stamp Counter, a CPU cycle counter built into
    Pentium and later chips) or the "HPET" (High Precision Event
    Timer) to deduce when timer interrupts have been lost and to
    correct for this
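
To make the last two points concrete, here is a minimal sketch in
Python of the general idea; it is NOT the kernel's actual code.  It
assumes HZ = 1000 and USER_HZ = 100 as described above, a TSC running
at 1 GHz (so 1,000,000 cycles per 1 ms tick), and that HZ is a whole
multiple of USER_HZ.  The function names are made up for illustration.

```python
# Hypothetical illustration: names and numbers are assumptions,
# not taken from the kernel source.
HZ = 1000        # internal tick rate (the i386 value described above)
USER_HZ = 100    # tick rate reported to applications


def jiffies_to_user_ticks(jiffies, hz=HZ, user_hz=USER_HZ):
    """Scale kernel jiffies to user-space ticks.

    Assumes hz is a whole multiple of user_hz (1000 / 100 = 10).
    """
    return jiffies // (hz // user_hz)


def on_timer_interrupt(state, tsc_now, cycles_per_tick):
    """Account for one timer interrupt, using the TSC to spot lost ticks.

    state["last_tsc"] is the TSC value at the last tick boundary already
    accounted for.  If more than one tick period has elapsed since then,
    earlier interrupts were lost, so jiffies is advanced by the whole
    number of elapsed periods instead of by exactly 1.
    """
    elapsed = (tsc_now - state["last_tsc"]) // cycles_per_tick
    ticks = max(elapsed, 1)          # this interrupt counts at least once
    state["jiffies"] += ticks
    state["last_tsc"] += ticks * cycles_per_tick
    return ticks


if __name__ == "__main__":
    CYCLES_PER_TICK = 1_000_000      # assumed 1 GHz TSC at HZ = 1000
    state = {"jiffies": 0, "last_tsc": 0}
    # Interrupts arrive slightly late; the one due at 3 ms is lost.
    for tsc in (1_000_050, 2_000_040, 4_000_030):
        on_timer_interrupt(state, tsc, CYCLES_PER_TICK)
    print(state["jiffies"])              # prints 4, not 3
    print(jiffies_to_user_ticks(1234))   # prints 123
```

With one interrupt lost, simply counting interrupts would give 3 jiffies
after 4 ms; the TSC check recovers the missing one.  The sketch ignores
the harder cases, such as a TSC whose rate changes when the CPU speed is
stepped (see the last point below).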

I will add to this some speculation:
-- Originally, almost all Unix and Linux systems on PC-style hardware
    ran the clock interrupt at 100 Hz, or once every 10 ms.  Note
    that various I/O driver code must lock out interrupts occasionally
    to avoid "race conditions" when manipulating data structures that
    are modified both in interrupt code and in non-interrupt code.
    An interrupt occurring while interrupts are locked out is
    stored in a one-bit "register".  If two interrupts occur
    while interrupts are locked out, then when interrupts are
    re-enabled the interrupt service routine is executed only once.
    This means that one increment of the system time is never made.
-- A lost clock interrupt (without the fancy footwork of the 2.6
    kernel) would cause time to fall behind by 10 ms. (100 Hz clock)
    or by 1 ms. (1000 Hz clock).
-- If interrupts are locked out for more than 10 ms. (100 Hz
    clock) or for more than 1 ms. (1000 Hz clock), the time would
    fall behind.
-- The time the interrupts are locked out depends critically on
    the type of other hardware on this system (e.g. whether DMA
    is available for certain devices, and which drivers are
    being used)
-- The time interrupts are locked out depends on the speed of
    the CPU; the same code takes longer to run on slower CPUs
-- Older CPUs may not have the timers needed to correct for lost
    interrupts; it DOESN'T look like the 2.6 kernel falls back to a
    100 Hz clock in the absence of a TSC or something else it can
    use to detect lost interrupts.
-- The 2.6 kernel seems to have more support for the newer CPUs
    that allow the speed to be stepped up and down depending on
    computation needs, temperature, or who knows what else.  I
    haven't been able to follow what happens in this case, e.g.
    whether the TSC becomes an unreliable indicator for lost
    interrupts.
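
The lost-interrupt arithmetic above can be checked with a toy model.
This is a minimal Python sketch, assuming a single one-bit pending flag,
ticks at exact multiples of the period, and a made-up workload that
locks out interrupts for 2 ms once per second.  None of these numbers
come from a real system, and the function names are hypothetical.

```python
# Hypothetical model: the masked windows and rates are made-up inputs.


def serviced_ticks(total_ticks, period_us, masked_windows):
    """Count timer interrupts that actually get serviced.

    Tick k fires at k * period_us.  masked_windows is a sorted list of
    non-overlapping (start_us, end_us) intervals, end exclusive, during
    which interrupts are locked out.  A tick that fires while masked only
    sets a one-bit "pending" flag, so several ticks inside one window are
    serviced just once when the window closes.
    """
    serviced = 0
    pending = False
    w = 0
    for k in range(total_ticks):
        t = k * period_us
        # Close every window that has ended by now; a pending tick runs once.
        while w < len(masked_windows) and masked_windows[w][1] <= t:
            if pending:
                serviced += 1
                pending = False
            w += 1
        if w < len(masked_windows) and masked_windows[w][0] <= t:
            pending = True       # a second tick here is simply lost
        else:
            serviced += 1
    if pending:                  # a window was still open after the last tick
        serviced += 1
    return serviced


def drift_ppm(total_ticks, serviced):
    """Clock rate error in parts per million caused by lost ticks."""
    return (serviced - total_ticks) * 1e6 / total_ticks


if __name__ == "__main__":
    seconds = 60
    # A 2 ms interrupts-off window starting 0.5 ms into every second.
    windows = [(i * 1_000_000 + 500, i * 1_000_000 + 2_500)
               for i in range(seconds)]
    for hz in (100, 1000):
        total = hz * seconds
        got = serviced_ticks(total, 1_000_000 // hz, windows)
        print(hz, total - got, drift_ppm(total, got))
    # prints:
    # 100 0 0.0
    # 1000 60 -1000.0
```

With these inputs the 100 Hz clock loses nothing, because a 2 ms window
can never contain two ticks that are 10 ms apart, while the 1000 Hz
clock loses one tick per second, a rate error of 1000 PPM (negative
because the clock runs slow), twice the 500 PPM tolerance in the ntpd
message quoted below.  For scale, 512 PPM is about 512 microseconds per
second, which on a 1000 Hz clock would be roughly one lost tick every
two seconds, if lost ticks were the only cause.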

Perhaps someone who works on the timekeeping code in the kernel
monitors this list and can comment on the above.  (Then again,
based on my experience, maybe not.)  Back in the Red Hat 7.x series,
someone at Red Hat stepped up HZ from 100 to 1000, and timekeeping
on my old machine was rotten until I recompiled the kernel for
100 Hz.  Of course, I don't believe the kernel in those days
attempted to detect lost interrupts.  Now it appears this is
happening again, but Linux is supporting only the newer machines
that have the TSC timer and/or are fast enough that interrupts
aren't locked out for too long.

Goetz Lichtwald wrote:
> Hi,
> 
> 
> I went through this mailing list and asked Google several
> times, and still found no answer or hint where to look.
> 
> So, the basic problem is that I have a couple of linux boxes
> that need to be synchronized via NTP. Those boxes have all the
> very same hardware and configuration. 
> 
> Running kernel 2.4.24, NTP works fine. For various reasons
> at least a few of them need to run kernel 2.6.9, and that is
> where the problem appears. NTP complains. The log states:
> 
> ntpd[22804]: frequency error -512 PPM exceeds tolerance 500 PPM
> 
> Using "adjtimex" does not alleviate the problem at all.  The 
> kernel 2.6.9 was configured via "make oldconfig" from kernel 
> 2.4.24.
> 
> Does anybody have a hint where to look, or even a solution to
> the problem? For better debugging, here are some further facts
> about the environment:
> ntp  4.2.0a-11
> OS   Debian Sarge with kernel 2.6.9 or 2.4.24
> 
> 
> Any help is welcome!
> 
>  - goetz



