[ntp:questions] Clock skew changes drastically between reboots

Spoon devnull at localhost.com
Wed Apr 11 09:35:41 UTC 2007


Hal Murray wrote:

> Spoon wrote:
>
>> I've noticed something I find very strange on the systems I have to work
>> with. Every time I reboot the computer, the clock skew of the local
>> clock changes, sometimes by what seems to be a huge amount.
>> 
>> For example, I boot the computer, let ntpd run for 12 hours, and the
>> value recorded in the drift file is 35 ppm. I reboot the computer, let
>> ntpd run for 12 hours, and I get 5 ppm...
> 
> I'm chasing the same glitch.
> 
> I've seen it on two systems, both i386 running Linux 2.6 kernel.
> 
> I think I've tracked it to tsc_init which calls calculate_cpu_khz
> both are in ./arch/i386/kernel/tsc.c
> tsc_init prints a line like this:
>   kernel: Detected 2793.226 MHz processor.
> 
> The problem is that calculate_cpu_khz doesn't return the
> same answer.  I hacked the code to call/print it 10 times
> and I get things like this:
>  kernel: Detected 2793.287 MHz processor.
>  kernel: Detected 2793.225 MHz processor.
>  kernel: Detected 2793.228 MHz processor.
>  kernel: Detected 2793.304 MHz processor.
>  kernel: Detected 2793.242 MHz processor.
>  kernel: Detected 2793.192 MHz processor.
>  kernel: Detected 2793.334 MHz processor.
>  kernel: Detected 2793.203 MHz processor.
>  kernel: Detected 2793.292 MHz processor.
>  kernel: Detected 2793.237 MHz processor.
> 
> That's a spread of about 50 ppm which matches what I've seen
> before I started looking for this glitch.

I believe you've nailed the problem.
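
As a sanity check on the 50 ppm figure, here is a minimal sketch in plain 
userspace C (not kernel code) that computes the relative spread of your 
ten readings. The names spread_ppm, hal_mhz and hal_count are made up 
for illustration.

```c
#include <stddef.h>

/* Relative spread of a set of frequency readings, in parts per million:
 * (max - min) / mean * 1e6. Returns 0.0 for an empty set. */
double spread_ppm(const double *mhz, size_t n)
{
    if (n == 0)
        return 0.0;

    double min = mhz[0], max = mhz[0], sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (mhz[i] < min)
            min = mhz[i];
        if (mhz[i] > max)
            max = mhz[i];
        sum += mhz[i];
    }
    return (max - min) / (sum / (double)n) * 1e6;
}

/* The ten "Detected ... MHz processor" values quoted above. */
const double hal_mhz[] = {
    2793.287, 2793.225, 2793.228, 2793.304, 2793.242,
    2793.192, 2793.334, 2793.203, 2793.292, 2793.237,
};
const size_t hal_count = sizeof hal_mhz / sizeof hal_mhz[0];
```

Calling spread_ppm(hal_mhz, hal_count) gives roughly 51 ppm, which agrees 
with your estimate.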

I patched my kernel with:
--- tsc.c	2007-04-11 10:04:50.000000000 +0200
+++ tsc.c	2007-04-11 10:13:13.000000000 +0200
@@ -123,6 +123,7 @@
  	int i;
  	unsigned long flags;

+	printk("DEBUG: INSIDE calculate_cpu_khz()\n");
  	local_irq_save(flags);

  	/* run 3 times to ensure the cache is warm */
@@ -187,7 +188,7 @@
  	if (!cpu_has_tsc || tsc_disable)
  		goto out_no_tsc;

-	cpu_khz = calculate_cpu_khz();
+	cpu_khz = 1266700;
  	tsc_khz = cpu_khz;

  	if (!cpu_khz)

I tested the new kernel on two identical systems.

The frequency offset computed by ntpd is now very consistent: it lands 
within 1-2 ppm of the same value after each reboot. That residual 
variation could easily be attributed to temperature changes, I think.
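
To put the numbers in perspective, here is a rough sketch of how a 
calibration error would show up as a frequency offset. It assumes the 
TSC is the clock source, so that the system clock scales with the ratio 
of the true counter rate to the rate the kernel measured at boot. The 
function names are hypothetical; this is userspace arithmetic, not 
kernel code.

```c
/* Apparent clock frequency error, in ppm, when the kernel believes the
 * counter ticks at est_khz while it really ticks at true_khz.
 * Positive means the system clock runs fast. */
double freq_error_ppm(double true_khz, double est_khz)
{
    return (true_khz / est_khz - 1.0) * 1e6;
}

/* Seconds gained or lost per day by a clock that is off by the given
 * number of ppm and is left uncorrected. */
double drift_seconds_per_day(double ppm)
{
    return ppm * 1e-6 * 86400.0;
}
```

With the two extremes of Hal's readings, freq_error_ppm(2793334.0, 
2793192.0) comes out at roughly 51 ppm. The 30 ppm difference between my 
two drift file values (35 ppm and 5 ppm) would correspond to about 2.6 
seconds per day if left uncorrected.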

Sometime next week, I'll try to understand *why* the calibration in 
Linux gives a different result on each boot. I've been told to look into 
SMI (System Management Interrupts) and SMM (System Management Mode).

Keep me posted if you get other interesting results.

Regards.



