[ntp:questions] NTP vs chrony comparison (Was: oscillations in ntp clock synchronization)
kaukasoina708n8s6l3g0 at sci.fi
Fri Jan 25 10:05:29 UTC 2008
Hal Murray <hal-usenet at ip-64-139-1-69.sjc.megapath.net> wrote:
>>On recent Linux kernels, I think the drift file is always bad after reboot.
>>HZ=100, no dynamic ticks aka tickless system (CONFIG_NO_HZ not set). I think
>>I even tried with a kernel command line option lpj= but it didn't help.
>>If the system is rebooted, ntpd stabilizes to a new different drift value.
>That's a bug in the TSC calibration code.
>grep your /var/log/messages* for "Detected". You will find things like this:
> Jan 4 11:21:49 shuksan kernel: Detected 2793.137 MHz processor.
> Jan 4 21:30:43 shuksan kernel: Detected 2793.209 MHz processor.
> Jan 22 09:32:20 shuksan kernel: Detected 2793.139 MHz processor.
Yes, you are right. I had been looking at some other lines that change on
every boot, with different values even for the two separate hyper-threading
"cores" of the same P4 processor:
Jan 9 08:30:06 elektroni kernel: Calibrating delay using timer specific routine.. 6388.50 BogoMIPS (lpj=31942516)
Jan 9 08:30:06 elektroni kernel: Calibrating delay using timer specific routine.. 6384.21 BogoMIPS (lpj=31921075)
Jan 9 08:30:06 elektroni kernel: Total of 2 processors activated (12772.71 BogoMIPS).
Jan 9 08:46:16 elektroni kernel: Calibrating delay using timer specific routine.. 6388.46 BogoMIPS (lpj=31942340)
Jan 9 08:46:16 elektroni kernel: Calibrating delay using timer specific routine.. 6384.19 BogoMIPS (lpj=31920985)
Jan 9 08:46:16 elektroni kernel: Total of 2 processors activated (12772.66 BogoMIPS).
I had already disabled this calibration and given lpj a constant value, but
when I also forced the processor MHz calibration value (actually cpu_khz
in tsc.c) to a constant, the problem vanished.
I did some tests to see how ntpd behaves in different cases. 32-bit Linux
kernel 220.127.116.11 without any patches and ntpd 4.2.4p4 without any patches.
ntpd gets time from the internet (WAN).
1. Without an initial drift file, with the time set to a correct value by
ntpdate before starting ntpd. The first frequency drift value in
ntp-loopstats is 89 ppm, and it grows to 92 ppm before starting to fall
again. The time offset in ntp-loopstats immediately grows to +77 ms, then
starts to fall but overshoots badly to -112 ms, and ntpd finally steps
("time reset -0.135405") four hours after it was started. The frequency is
still 91 ppm. The time offset then moves from zero to -20 ms (freq. 88 ppm)
before it starts to go in the right direction again. It takes over 7 hours
to get the offset to -10 ms and 16 hours to get it to -1 ms. I reboot after
34 hours, when the frequency drift value is 72 ppm and the offset is about
0.1 ms.
2. After reboot, the Linux kernel thinks the processor clock is 3192.182 MHz
instead of 3192.210 MHz as before the reboot. So, while we have a drift
file, it's not quite correct. Again, the time is set to a correct value with
ntpdate before starting ntpd. The frequency falls from the initial 72 ppm
and the time offset grows (negative). After 40 minutes the time offset
peaks at -9 ms, and after that it starts going in the right direction.
After nine hours, the offset is better than -1 ms. The frequency finally
settles at 64 ppm. (About 8 ppm lower than in test 1. The calibrated
processor frequency was 9 ppm lower than in test 1. So, the connection is
clear.)
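The link between the two figures can be checked with a little arithmetic. A
minimal sketch in Python, using only the two calibrated frequencies quoted
above; the function name is made up for illustration:

```python
def ppm_difference(before_mhz: float, after_mhz: float) -> float:
    """Relative change from before_mhz to after_mhz, in parts per million."""
    return (after_mhz - before_mhz) / before_mhz * 1e6


# Calibrated processor clock in test 1 and in test 2 (after the reboot).
change = ppm_difference(3192.210, 3192.182)
print(f"calibration changed by {change:.1f} ppm")
```

The result is a little under -9 ppm, which is close to the drop in the
settled drift value from 72 ppm to 64 ppm.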
3. If I force cpu_khz in the Linux kernel to a constant value, the problem
goes away. (Just for fun, I lowered the frequency given by the kernel
calibration routine by the frequency offset given by the ntp drift file and
put the result into the cpu_khz variable. So now the drift value stabilizes
very near zero.) Now the absolute value of the time offset always stays
below 1 ms, even after reboot. (BTW, in linux-2.6.24 the variable has moved.
It's now in the file arch/x86/kernel/tsc_32.c.)
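The "just for fun" correction in test 3 is a simple scaling. A rough sketch
of the arithmetic in Python, assuming the drift file value is in ppm and that
a positive drift means the calibrated frequency is too high. The function
name is hypothetical and the example inputs are only the figures quoted in
tests 1 and 2, not necessarily the values that were actually patched in:

```python
def corrected_cpu_khz(calibrated_khz: int, drift_ppm: float) -> int:
    """Lower the calibrated TSC frequency by drift_ppm parts per million.

    Assumption: a positive ntpd drift value means the kernel's idea of the
    processor frequency is too high by that many ppm.
    """
    return round(calibrated_khz * (1.0 - drift_ppm / 1e6))


# Illustrative inputs: 3192.210 MHz calibration and a 72 ppm drift value.
print(corrected_cpu_khz(3192210, 72.0))
```

With these inputs the value drops by roughly 230 kHz. The result would then
be hard-coded in place of the calibrated cpu_khz.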
4. Even if I allow the Linux kernel to calibrate cpu_khz itself, I can also
get good results by calibrating the drift value before starting ntpd, with
a script I sent to this thread earlier; no previous drift file is needed.
Basically, it stepped the time with ntpdate, slept 100 seconds and stepped
the time again with ntpdate. From the time adjustment, the script calculated
the drift value and put it into the drift file. Again, the time offset
always stays below 1 ms.
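The calculation behind test 4 can be sketched as follows. This is not the
script from earlier in the thread, only an illustration of the arithmetic in
Python. It assumes the offset reported by the second ntpdate step is already
available in seconds, and that the sign convention matches what ntpd expects
in the drift file (positive when the local clock runs slow); the function
name and the example offset are made up:

```python
def drift_ppm(second_step_offset_s: float, interval_s: float) -> float:
    """Frequency error in ppm implied by the offset gained over interval_s."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return second_step_offset_s / interval_s * 1e6


# Example: the second step corrects +7.2 ms after a 100 second sleep.
estimate = drift_ppm(0.0072, 100.0)
print(f"{estimate:.3f}")  # value to be written to the drift file, in ppm
```

Over a 100 second interval every millisecond of offset corresponds to
10 ppm, so the quality of the estimate depends directly on how accurately
the two ntpdate steps measure the time.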