[ntp:questions] Clock skew changes drastically between reboots
devnull at localhost.com
Wed Apr 11 09:35:41 UTC 2007
Hal Murray wrote:
> Spoon wrote:
>> I've noticed something I find very strange on the systems I have to work
>> with. Every time I reboot the computer, the clock skew of the local
>> clock changes, sometimes by what seems to be a huge amount.
>> For example, I boot the computer, let ntpd run for 12 hours, and the
>> value recorded in the drift file is 35 ppm. I reboot the computer, let
>> ntpd run for 12 hours, and I get 5 ppm...
> I'm chasing the same glitch.
> I've seen it on two systems, both i386 running Linux 2.6 kernel.
> I think I've tracked it to tsc_init, which calls calculate_cpu_khz.
> Both are in ./arch/i386/kernel/tsc.c
> tsc_init prints a line like this:
> kernel: Detected 2793.226 MHz processor.
> The problem is that calculate_cpu_khz doesn't return the
> same answer. I hacked the code to call/print it 10 times
> and I get things like this:
> kernel: Detected 2793.287 MHz processor.
> kernel: Detected 2793.225 MHz processor.
> kernel: Detected 2793.228 MHz processor.
> kernel: Detected 2793.304 MHz processor.
> kernel: Detected 2793.242 MHz processor.
> kernel: Detected 2793.192 MHz processor.
> kernel: Detected 2793.334 MHz processor.
> kernel: Detected 2793.203 MHz processor.
> kernel: Detected 2793.292 MHz processor.
> kernel: Detected 2793.237 MHz processor.
> That's a spread of about 50 ppm, which matches what I saw
> before I started looking for this glitch.
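For anyone who wants to check the 50 ppm figure, here is a minimal sketch in C that computes the peak-to-peak spread of the ten readings quoted above, relative to their mean. The names readings_mhz and spread_ppm are made up for illustration; nothing here comes from the kernel source.

```c
#include <assert.h>
#include <stddef.h>

/* The ten "Detected ... MHz processor" values quoted above, in MHz. */
const double readings_mhz[] = {
    2793.287, 2793.225, 2793.228, 2793.304, 2793.242,
    2793.192, 2793.334, 2793.203, 2793.292, 2793.237,
};

/* Peak-to-peak spread of n readings, relative to their mean, in ppm.
 * Hypothetical helper, not part of any kernel or ntpd API. */
double spread_ppm(const double *v, size_t n)
{
    assert(n > 0);
    double min = v[0], max = v[0], sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] < min) min = v[i];
        if (v[i] > max) max = v[i];
        sum += v[i];
    }
    return (max - min) / (sum / (double)n) * 1e6;
}
```

Calling spread_ppm(readings_mhz, 10) gives just under 51 ppm (0.142 MHz out of roughly 2793.25 MHz), which agrees with the "about 50 ppm" estimate.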
I believe you've nailed the problem.
I patched my kernel with:
--- tsc.c 2007-04-11 10:04:50.000000000 +0200
+++ tsc.c 2007-04-11 10:13:13.000000000 +0200
@@ -123,6 +123,7 @@
unsigned long flags;
+ printk("DEBUG: INSIDE calculate_cpu_khz()\n");
/* run 3 times to ensure the cache is warm */
@@ -187,7 +188,7 @@
if (!cpu_has_tsc || tsc_disable)
- cpu_khz = calculate_cpu_khz();
+ cpu_khz = 1266700;
tsc_khz = cpu_khz;
I tested the new kernel on two identical systems.
The frequency offset computed by NTP is now very consistent, within 1-2
ppm each time. This dispersion could easily be attributed to temperature
variation, I think.
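To put these numbers in perspective, here is a rough sketch in C of how a calibration error turns into a clock rate error. It assumes a simplified model in which the kernel derives elapsed time by dividing the TSC cycle count by the calibrated frequency; the real timekeeping code is more involved. The function names are hypothetical.

```c
#include <assert.h>

/* Rate error, in ppm, of a clock that counts cycles of an oscillator
 * running at true_khz while dividing by assumed_khz.
 * Positive means the clock runs fast. Simplified model only. */
double calib_error_ppm(double assumed_khz, double true_khz)
{
    assert(assumed_khz > 0.0);
    return (true_khz / assumed_khz - 1.0) * 1e6;
}

/* Time error accumulated per day for a given rate error in ppm,
 * if nothing corrects it. */
double ppm_to_seconds_per_day(double ppm)
{
    return ppm * 86400.0 / 1e6;
}
```

Under that model, the 30 ppm difference between the 35 ppm and 5 ppm drift values mentioned at the top of the thread corresponds to about 2.6 seconds per day, while a 1 ppm difference is under 0.1 seconds per day.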
Sometime next week, I'll try to understand *why* the calibration in
Linux is not repeatable. I've been told to look into SMI (System
Management Interrupts) and SMM (System Management Mode).
Keep me posted if you get other interesting results.