[ntp:questions] high precision tracking: trying to understand sudden jumps

Unruh unruh-spam at physics.ubc.ca
Sun Mar 30 18:57:32 UTC 2008


starlight at binnacle.cx writes:

>Hello,

>I'm trying to configure a small network for high precision time. 
>Recently acquired an Endrun CDMA time server that runs like 
>a dream, tracking CDMA time to about +/- 5 microseconds.

No idea what CDMA time is, but that does not matter. 
Do you have peerstats running on the various machines so you can look at
the raw offsets and particularly the round-trip times? It may be that your
network is suddenly delaying packets in one direction by milliseconds, for
half an hour, say.
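
To illustrate why a one-way delay matters, here is a rough sketch using the
standard NTP on-wire formulas for offset and round-trip delay. The helper
exchange() and the delay figures are made up for illustration; they are not
measurements from your network. An extra 2 ms in one direction only shows up
as 2 ms more round-trip delay and a 1 ms apparent offset.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation from the four timestamps.

    t1: client send, t2: server receive, t3: server send, t4: client receive.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def exchange(true_offset, out_delay, back_delay, server_hold=0.0):
    """Hypothetical helper: simulate one client/server exchange.

    true_offset is (server clock - client clock); all values in seconds.
    """
    t1 = 0.0                                         # client clock at send
    t2 = t1 + out_delay + true_offset                # server clock at receive
    t3 = t2 + server_hold                            # server clock at reply
    t4 = t1 + out_delay + server_hold + back_delay   # client clock at receive
    return ntp_offset_and_delay(t1, t2, t3, t4)


if __name__ == "__main__":
    # Illustrative numbers only: 100 us each way, then 2 ms extra outbound.
    cases = [("symmetric", 100e-6, 100e-6),
             ("2 ms extra outbound", 2100e-6, 100e-6)]
    for label, out, back in cases:
        offset, delay = exchange(0.0, out, back)
        print(f"{label}: offset {offset * 1e6:.0f} us, "
              f"delay {delay * 1e6:.0f} us")
```

So a jump in the offset that comes together with a jump in the round-trip
time of about twice the size points at the network, not at the clocks.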


>The clients are a rag-tag assembly of diverse systems including 
>a Centos 4.5 Linux i686, Linux x86_64, Sun Ultra 10, Sun Ultra 80, 
>IBM RS/6000 44p, Windows 2003 X64, and a Windows XP laptop.

>All are configured to prefer the Endrun clock and poll it on a 
>16 second interval.  All are attached to a single SMC gigabit 
>Ethernet switch with only the Endrun and two Sun systems running 
>at a lower speed of 100 MBPS.  Close to zero network traffic
>and system loads.

Maybe that ethernet switch suffers a nervous breakdown (too little to do?)
once a day. 



>All systems are running 'ntpd' 4.2.4p4.  Compiled NTP native 
>64-bit for the Windows X64 system.  [A #ifdef tweak to 
>'intptr_t' and 'uintptr_t' is required, will provide patch if 
>desired].

>It generally is working well, with the systems tracking anywhere 
>from +/- 100 microseconds to +/- 500 microseconds most of the 
>time.

Should be within 10s of usec, not hundreds.



>However once or twice a day, all the systems experience a 
>random, uncorrelated time shift of from one to several 
>milliseconds.  Had an issue where a UPS voltage correction shift 
>and cheap power supply on the Windows X64 box appeared to be a
>problem, but that was fixed by configuring the UPS to consider 
>110V nominal instead of 120V.

>Does anyone have any ideas about what could be causing these 
>random time jumps and what might be done to eliminate them?

>Something I'm planning to try is to make sure that 'mlock' is 
>configured in the daemons--presently 'autoconf' has left it 
>disabled for some reason.  However I don't believe page
>faults are the culprit.  All the daemons are running at 
>the highest real-time priority in the respective systems.

>The above configuration is a controlled lab setup.  The next 
>target is a stack of eight DELL 1950 servers in a production 
>data center running Windows 2003 R2 and slaved to a newer Endrun 
>time server.  Don't have useful data from these systems yet 

I would have just used a cheap GPS receiver rather than paying $700 for one
of these, but it's your money.

Ah, just looked at their web page. Would I really believe that the CDMA
cell phone network would care if their time signal were accurate to usec? 
There is no time path correction. But you should see that on your server
connected to the device. 

Anyway, look at the peerstats file, especially the round-trip times and the
offsets. The ntp clock filter tries to compensate for large variations in
these but can only do so much.
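
Something along these lines would pull the spikes out of a peerstats file.
It is only a sketch. I am assuming the usual field order (MJD, seconds past
midnight, peer address, status word, offset, delay, dispersion, jitter, all
in seconds). The sample lines, the function names and the factor of 3 are
made up for illustration. The last function is a much simplified stand-in
for the clock filter (keep the last 8 samples, use the one with the smallest
delay), not what ntpd actually runs.

```python
from statistics import median

# Made-up sample lines, not real measurements.
SAMPLE = """\
54555 68252.000 192.168.1.10 9614 0.000012 0.000210 0.000950 0.000008
54555 68268.000 192.168.1.10 9614 0.000015 0.000205 0.000940 0.000007
54555 68284.000 192.168.1.10 9614 0.001050 0.002310 0.000960 0.000410
54555 68300.000 192.168.1.10 9614 0.000011 0.000215 0.000950 0.000009
54555 68316.000 192.168.1.10 9614 0.000014 0.000208 0.000945 0.000008
""".splitlines()


def parse_peerstats(lines):
    """Yield (mjd, seconds, peer, offset, delay) for each well-formed line."""
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        try:
            yield (int(fields[0]), float(fields[1]), fields[2],
                   float(fields[4]), float(fields[5]))
        except ValueError:
            continue  # skip lines whose numeric fields do not parse


def flag_delay_spikes(samples, factor=3.0):
    """Return samples whose delay exceeds factor times the median delay."""
    samples = list(samples)
    if not samples:
        return []
    typical = median(s[4] for s in samples)
    return [s for s in samples if s[4] > factor * typical]


def min_delay_filter(samples, window=8):
    """For each new sample, pick the lowest-delay one of the last `window`."""
    samples = list(samples)
    picked = []
    for i in range(len(samples)):
        recent = samples[max(0, i - window + 1):i + 1]
        picked.append(min(recent, key=lambda s: s[4]))
    return picked


if __name__ == "__main__":
    data = list(parse_peerstats(SAMPLE))
    for mjd, sec, peer, offset, delay in flag_delay_spikes(data):
        print(f"{mjd} {sec:.0f} {peer}: offset {offset * 1e6:.0f} us, "
              f"delay {delay * 1e6:.0f} us")
```

To run it on real data, pass the lines of your own peerstats file to
parse_peerstats() in place of SAMPLE. If the flagged delay spikes line up
with the offset jumps in loopstats, the network is the place to look.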




>because the network jitter is outrageous.  Working with the 
>network admin to hopefully have the NTP traffic to and from the 
>Endrun clock bypass level 3 switch/router rule checking.  They 
>have large, complex router ACL rulesets I suspect as the cause
>of the jitter.

Sounds a bit weird. On an ADSL link from home through the telco to the university, I get
better than 1ms time accuracy. 

>Attached are fairly representative graphs of the offset and 
>frequency for two of the lab servers.

Netnews is text only. Post the info on a web page where anyone can look at
it. 



>Thanks
>P.S. Resent without graphs as the list mailer says
>they're not allowed.  Happy to send them or the raw
>'loopstats' to anyone interested.

Just post them.



