[ntp:questions] high precision tracking: trying to understand sudden jumps

Unruh unruh-spam at physics.ubc.ca
Tue Apr 1 01:09:55 UTC 2008


starlight at binnacle.cx writes:

>Here are URLs for those two sample graphs:

>http://binnacle.cx/file/ntp_hickups_linux.gif
>http://binnacle.cx/file/ntp_hickups_win.gif

>David Woolley wrote:
>>
>>> The clients are a rag-tag assembly of diverse systems including
>>> a Centos 4.5 Linux i686, Linux x86_64, Sun Ultra 10, Sun Ultra 80,
>>> IBM RS/6000 44p, Windows 2003 X64, and a Windows XP laptop.
>>
>>How are you interpolating the 16ms ticks on the Windows system?
>>How are you disabling power management on the laptop?

>The generic version of 'ntpd' has some sophisticated code that
>handles interpolation.  See the source.  Power management is
>disabled on the laptop using the standard control panel option.
>Don't really care that much about this machine anyway.

>>> It generally is working well, with the systems tracking anywhere
>>> from +/- 100 microseconds to +/- 500 microseconds most of the
>>> time.
>>
>>How are you measuring the difference from true time?  In principle, if
>>ntpd can measure it, it will correct it.

>Using 'ntpd' 'loopstats'.  It does, check out the graphs.
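
One way to pick the jumps out of the 'loopstats' files without eyeballing
graphs is sketched below. It assumes the usual ntpd loopstats layout (MJD day,
seconds past midnight, offset in seconds, then frequency and other fields) and
flags any sample whose offset moves by a millisecond or more from the previous
one. The function names, the 1 ms threshold and the sample lines are made up
for illustration; they are not taken from the poster's data.

```python
def parse_loopstats(lines):
    """Yield (mjd, seconds_past_midnight, offset_seconds) per loopstats line.

    Assumes whitespace-separated fields with the clock offset in column 3.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        yield int(fields[0]), float(fields[1]), float(fields[2])


def find_jumps(samples, threshold_s=0.001):
    """Return (mjd, seconds, step) for each offset step >= threshold_s."""
    jumps = []
    prev_offset = None
    for mjd, sec, offset in samples:
        if prev_offset is not None:
            step = offset - prev_offset
            if abs(step) >= threshold_s:
                jumps.append((mjd, sec, step))
        prev_offset = offset
    return jumps


# Hypothetical sample data, for illustration only.
SAMPLE = [
    "54556 100.000 0.000150 12.345 0.000040 0.001000 6",
    "54556 164.000 0.000200 12.346 0.000041 0.001000 6",
    "54556 228.000 0.002500 12.350 0.000800 0.002000 6",
    "54556 292.000 0.002300 12.351 0.000700 0.002000 6",
]

if __name__ == "__main__":
    for mjd, sec, step in find_jumps(parse_loopstats(SAMPLE)):
        print(f"day {mjd} second {sec:.0f}: offset stepped by {step * 1e3:+.3f} ms")
```

The sign of each step also answers the "which direction" question asked
further down.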

>Maybe I'll turn on 'peerstats' too, but I really doubt a
>stand-alone good quality switch would be causing random delays.
>Pings are consistently 400 microseconds and 'ntpq -p' reports 800
>microsecond roundtrip delays.  I've never heard of a switch
>causing a 5ms delay.

>>> 

>>> However once or twice a day, all the systems experience a
>>> random, uncorrelated time shift of from one to several
>>> milliseconds.  Had an issue where a UPS voltage correction shift

>>
>>In which direction is the slip?  Backward only slips against true time
>>(these might appear as forward slips if the real error is in the server)
>>are typically due to lost clock interrupts.  If that is the case it
>>implies you are using a tick rate of other than 100Hz.  Please note that
>>the Linux kernel code is broken for clock frequencies other than 100Hz
>>and the use of 1000Hz significantly increases the likelihood of a lost
>>interrupt.

>Perhaps that's a problem.  The RHEL/Centos stock kernel seems to
>have a 1000Hz clock interrupt.  At least 'vmstat' shows 1000
>ints/sec on an idle system.
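
The arithmetic behind the lost-interrupt theory is simple enough to sketch.
Under the simplifying assumption that the clock advances by exactly 1/HZ
seconds per timer interrupt and that a lost interrupt is never made up, each
lost tick leaves the clock that much behind true time. The function name is
made up for illustration.

```python
def lost_tick_slip(lost_ticks, hz):
    """Clock error in seconds after losing `lost_ticks` timer interrupts.

    Simplified model: one tick advances the clock by 1/hz seconds, so a
    lost tick puts the clock 1/hz behind true time (negative result).
    """
    if hz <= 0:
        raise ValueError("hz must be positive")
    return -lost_ticks / hz


if __name__ == "__main__":
    for hz in (100, 1000):
        for lost in (1, 3):
            slip_ms = lost_tick_slip(lost, hz) * 1e3
            print(f"HZ={hz:4d}, {lost} lost tick(s): {slip_ms:+.1f} ms")
```

With a 1000Hz tick, one to a few lost interrupts gives a backward slip of one
to a few milliseconds, which is the size of jump being reported. This only
applies if the slips are in fact backward.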

>>The normal source of lost interrupts is disk drivers using programmed
>>transfers.

>Think it's all DMA.  Remember this is a really diverse bunch
>of machines and OSs.  The RS/6000 is working the best.

>These jumps aren't killing me.  Just want to figure out if they
>can be eliminated.  If we needed super accurate time we'd
>probably have to make use of PTP (Precision Time Protocol).

No idea what that is. If you had wanted super precision you would have put
a GPS onto each machine, I hope.

From the Wikipedia entry on PTP it looks absolutely no different from ntp.
I have no idea what the point of it is.

I highly doubt that you will get better time with PTP. Now with chrony, my
measurements indicate that with the typical drift wander on my machines,
chrony gives 2-3 times better variance than does ntp. It uses exactly
the same exchange protocol as ntp but a different clock discipline
algorithm.
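
For reference, the exchange in question boils down to four timestamps per
poll, and the standard NTP on-wire calculation turns them into an offset and
a roundtrip delay. A minimal sketch follows. The function name and the example
numbers are made up for illustration, and this is only the raw measurement,
not the clock discipline of either ntpd or chrony.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation from one request/reply exchange.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)

    Returns (offset, delay) in seconds, where offset is the estimated
    server-minus-client clock difference and delay is the roundtrip
    network delay. The offset estimate assumes a symmetric path.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


if __name__ == "__main__":
    # Hypothetical exchange: client 10 ms behind the server, 400 us each
    # way on the wire, 100 us of processing time in the server.
    offset, delay = ntp_offset_delay(100.0000, 100.0104, 100.0105, 100.0009)
    print(f"offset = {offset * 1e3:.3f} ms, delay = {delay * 1e6:.0f} us")
```

Any asymmetry between the outbound and return paths goes straight into the
offset as an error of half the asymmetry, whichever discipline algorithm then
consumes the measurement.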



>Still très expensive.



