[ntp:questions] Windows - Seven Days Later [Warning: long post]

John DeDourek dedourek at unb.ca
Fri Oct 15 01:47:00 UTC 2004



David J Taylor wrote:
> Brian Inglis wrote:
> []
> 
>>>As I said, it's not /just/ Windows that does this, though.
>>
>>Which other OSes experience these same problems?
> 
> 
> For example, I have seen reports in this group of Linux with non-DMA disks 
> and too high a clock interrupt frequency.
> 
> 
>>If it is common, it may be a hardware related issue.
>>Have you checked the clock to see what its accuracy is without any
>>time syncing?
>>Have you checked your hardware config to see if say, the network card
>>IRQ is being shared with other cards, and could have an impact on
>>interrupts?
>>Have you shut off all other network services to see if one of them
>>could be having an impact on NTP?
>>A Windows hardware group may offer better advice on these issues.
> 
> 
> I'm not having this problem but that looks like a useful checklist for the 
> OP.
> 
> David 
> 
> 

I was one of the people having trouble with a Red Hat Linux box.  To
try to state the problem precisely (most of you know this already):
Most PC operating systems keep time by using an interrupt service
routine to increment the OS time variable.  If you miss an interrupt,
the system time falls behind by one "clock tick" (OS dependent,
typical value 10 ms).  ntp will then speed up the clock for a while
until it catches up, then slow it back down to normal speed.
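
To put a rough number on the catch-up time: ntpd limits how fast it
will slew the clock, and the figure usually quoted is 500 parts per
million (500 us of correction per second).  Taking that as an
assumption, a lower bound on the recovery time is a single division.
The function name and the numbers below are mine, for illustration
only; the real clock discipline loop converges more gradually than
this.

```python
def min_catch_up_seconds(lost_ticks, tick_us=10_000, max_slew_ppm=500):
    """Lower bound on the time needed to slew away lost clock ticks.

    Assumes a constant maximum slew rate; 1 ppm is 1 us of
    correction per second of elapsed time.
    """
    offset_us = lost_ticks * tick_us
    return offset_us / max_slew_ppm


# One lost 10 ms tick needs at least 20 s of slewing at 500 ppm.
print(min_catch_up_seconds(1))
```

So even a single lost tick takes many seconds to work off, which is
why repeated losses show up as a visible sawtooth in the offset.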

Other things use interrupts and their associated service routines,
e.g. to do network I/O, disk I/O, etc.  The associated service
routines are in the drivers for those devices.  For technical
reasons beyond what I want to include here, a driver
must occasionally lock out all interrupts for "short" periods.

When the clock interrupt occurs while interrupts are locked out,
the clock interrupt routine can't run and update the clock.  The
interrupt is stored in a one-bit counter and the service routine
runs once when interrupts are enabled again.  IF TWO INTERRUPTS OCCUR
WHILE INTERRUPTS ARE LOCKED OUT, the counter stays at one
(because the hardware designers don't put more than one bit in
those things).  Thus the interrupt service routine only increments
the time by one tick, but the clock "ticked twice" and you
are behind by one tick.  Not a lot of hope for fixing that
without a hardware redesign with more bits in the counter.
(I lie a little here; search
for rumblings about consulting the CPU cycle counter.)
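
The one-bit behaviour can be sketched as a toy simulation.  This is
not how any real kernel is written; the function name, the
microsecond units and the lockout windows are all made up for
illustration.  It only shows that a second tick arriving inside the
same lockout window disappears.

```python
def count_ticks(tick_us, lockouts, duration_us):
    """Return (generated, counted) clock ticks over duration_us.

    lockouts is a list of (start_us, end_us) windows, assumed
    non-overlapping, during which interrupts are disabled.
    """
    generated = 0
    counted = 0
    latched = set()  # windows whose one-bit pending flag is already set
    for t in range(tick_us, duration_us + 1, tick_us):
        generated += 1
        window = next((w for w in lockouts if w[0] <= t < w[1]), None)
        if window is None:
            counted += 1         # interrupts enabled: serviced right away
        elif window not in latched:
            latched.add(window)  # flag set; serviced once when window ends
            counted += 1
        # else: flag was already set, so this tick is silently lost
    return generated, counted


# A 23 ms lockout spanning the ticks at 10 ms and 20 ms loses one tick.
print(count_ticks(10_000, [(5_000, 28_000)], 100_000))
```

For simplicity the sketch counts a latched tick as soon as it is
latched, even though the service routine really runs only when the
window ends.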

Back to the OS.  The problem is drivers, e.g. the network
driver.  If it locks interrupts out "too long" you lose.
A driver has a certain number of instructions to execute; it
will take longer to run those on a slow machine.

Also, for a while Red Hat in its "wisdom" ratcheted the
tick time down from 10 ms to around 2.5 ms if I recall.
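
Why the shorter tick hurts can be seen with a little arithmetic.  A
lockout can contain at most ceil(lockout / tick) ticks, and the
one-bit pending flag rescues exactly one of them.  The 6 ms lockout
below is a hypothetical figure chosen for illustration, not a
measurement from any driver, and the function name is my own.

```python
def worst_case_lost_ticks(lockout_us, tick_us):
    """Most ticks that one lockout of lockout_us can lose.

    At most ceil(lockout / tick) ticks can fall inside the lockout;
    the one-bit pending flag saves exactly one of them.
    """
    ticks_inside = -(-lockout_us // tick_us)  # integer ceiling division
    return max(0, ticks_inside - 1)


# Hypothetical 6 ms lockout: harmless at 10 ms ticks, costly at 2.5 ms.
for tick_us in (10_000, 2_500):
    print(tick_us, worst_case_lost_ticks(6_000, tick_us))
```

With these numbers the 10 ms tick loses nothing, while the 2.5 ms
tick can lose two ticks (5 ms) in a single lockout.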

Short tick, slow machine, I lost big time.  Rebuilt
kernel and put tick back to 10 ms and things were better.
Got a faster machine and the other drivers ran quicker
(while the interrupts were locked out) and things ran
better.  Still occasional glitches.  What I need to
do is find a student who wants to instrument Linux to
find the bad drivers and suggest patches to fix
the problem.  It is possible with Linux because we can
fiddle with the code.

This is where Windows loses.  All you can
do is tell Microsoft, or the people supplying the
various binary driver disks that someone is locking
interrupts out for more than a tick and hope they have
incentive to fix it.  Can't hope to fix that here.

ntp does its best when other things in the OS (drivers)
cause lost clock interrupts, but the time does wobble
as the OS problem pulls the time back and ntp pushes it
forward.



