[ntp:questions] very slow convergence of ntp to correct time.

Unruh unruh-spam at physics.ubc.ca
Sun Jan 20 01:12:22 UTC 2008


"Richard B. Gilbert" <rgilbert88 at comcast.net> writes:

>Unruh wrote:
>> Unruh <unruh-spam at physics.ubc.ca> writes:
>> 
>> 
>>>In trying to figure out whether it is chrony or the machine which is
>>>causing the oscillations in the rate on my systems, I decided to switch one
>>>to ntp. That system was within 10usec of the "correct" time (i.e. the time
>>>as designated by a GPS driven computer within about 150usec roundtrip
>>>network distance). ntp started up and the offset steadily climbed until it
>>>was 60msec (not usec) offset. At that point ntp finally started to do
>>>something and
>>>gradually brought the offset down. Now, three hours later, it is still
>>>14msec (1000 times worse than it started out as) offset and has settled
>>>down there. 
>>>loopstats says that the timescale is now a 9 (which I guess is 500 sec,
>>>roughly 10 min) while the clock still sits there 14 msec away from the correct
>>>time with little evidence of the offset decreasing.
>>>While I can understand the desire to have
>>>long integration constants, surely the convergence to the correct time
>>>(or rather, say, +- 10usec, not 10ms from correct time) should be a bit
>>>faster than this.
>> 
>> 
>> 
>>>Is there some problem in ntp?
>>>ntp-4.2.0-31.2mdv2007.0
>> 
>> 
>> 
>> 
>> Just an update. ntp has now been running for 15 hrs, and the offsets are
>> still in the few ms range (all positive). The frequency has converged but
>> the offsets have not. ntp has gone to max time (level 10) for the querying,
>> but the offsets are still over 100 times worse than I was getting with
>> chrony. (Yes, I know, one suggestion is-- go back to chrony-- but the
>> reason I am carrying out this test is to see if those rate oscillations I
>> was seeing with chrony are eliminated with ntp, with comparable accuracy.)
>> 
>> 
>> Since the client and server are only about 50usec apart (round trip time is
>> about 120usec typically), ntp should be able to do much better than the 3-4ms
>> control of the client's clock that I am seeing, since the server has typical
>> offsets of 3 or 4 usec.
>> 
>> I am also getting weird behaviour of the round trip time. Both machines are
>> lightly loaded, so it is not a problem internal to the machines (client and
>> server) and both are connected through a single Gigabit switch to each
>> other. While the typical round trip time is .12 ms, overnight (network
>> traffic low on a Fri night) I have gotten about 10 cases where the round
>> trip time is 5-10ms (Ie, 40-80 times longer than typical.) I have long
>> suspected that there is a problem in the switches but other suggestions are
>> welcome.
>> 
>> I would assume that ntp is giving these samples with long round trip very low weight, or even
>> eliminating them.
>> 
>> 

>Could you take a moment to explain what "chrony" is/does?  One gathers 
>that it's some sort of a clock managing/correcting program but one I've 
>never heard of before!

Sorry. Chrony is an ntp server/client written by Richard Curnow, mainly for
Linux: chrony.sunsite.dk


>Did you have an existing "drift" file when you started ntpd?  A drift 

No. Clearly that was a problem (the drift of the clock is about -25PPM). But
even with no drift file, the system should not be allowing things to go 60ms
out. The offset dropped to -40ms, ntp applied something to try to haul it
back, which overshot to +60ms, and then gradually over the course of a day
came back to an offset around 4ms, and stayed there for many hours. I
decided to make it more similar to chrony and put in a maxpoll of 7.
After restarting ntp with the new maxpoll (and now with the drift file left
over from the last run), the offset rapidly dropped to about .4ms, and then over
the next 5 hours came down to somewhere around 20usec, and I began to see
some negative offsets. I.e., it took a day for the system to come back to the
accuracy I had had, AND it required a restart of ntp. Had I left it without
restarting, I am sure it would still have an offset around 4ms. I.e., getting
rid of that last bit of offset could well have taken weeks.
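
For scale, here is a rough sketch of the arithmetic behind those numbers. It assumes the roughly -25PPM drift quoted above and that the poll values are base-2 exponents of seconds; the function names are purely illustrative and not part of ntp.

```python
def poll_interval_seconds(poll_exponent):
    """Convert a poll exponent (e.g. maxpoll 7) to seconds: 2**exponent."""
    return 2 ** poll_exponent


def uncorrected_offset_ms(drift_ppm, elapsed_seconds):
    """Offset accumulated by a free-running clock, in milliseconds.

    drift_ppm * 1e-6 * elapsed_seconds gives seconds of error;
    multiplying by 1000 for milliseconds simplifies to the line below.
    """
    return drift_ppm * elapsed_seconds / 1000.0


if __name__ == "__main__":
    for exponent in (7, 9, 10):
        print("poll", exponent, "->", poll_interval_seconds(exponent), "s")
    # Drift left uncorrected for one hour at -25 PPM.
    print("offset after 1 h:", uncorrected_offset_ms(-25.0, 3600), "ms")
```

A clock drifting at 25PPM with no correction gains or loses about 90ms per hour, so an excursion of tens of ms shortly after a start with no drift file is at least the right order of magnitude.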


>file that's far from the current drift value can mess things up!  If the 
>drift file is stale, it's best to delete it.

>Ntpd generally needs about thirty minutes to beat your clock into 
>submission.  Once synchronized to a stable source it should stay 
>synchronized.  Note that the "stable source" part is important!  A 
>stable source is something like a GPS timing receiver either locally 

The source is a GPS PPS source, with offsets fluctuating around +- 4usec.
It is .12ms round trip away from the machine (and about 10 feet, but the
Gigabit switch which does the routing is about 30m in cable length away from
both).

>connected or connected to a server nearby in net space.  Ntpd tends to 
>"clock hop" among internet servers if that's all it has to work with; 

Single clock. The machine had that gps pps stratum 0 as the preferred
source. It had another stratum 1 source (about 5ms round trip away) as a
secondary source, but any time I checked, the nearby one was the synchronized
source, and the logs never show synchronization with the other source.

>since internet servers seldom agree with each other exactly (most likely 
>the network's fault) what you get is ntpd's best guess as to the correct 
>time.

>Posting your ntpq -p "banner" might tell us something useful.  Just note 
>that ntpq -p is not much use until ntpd has been running for at least 
>thirty minutes.


This is now 24 hours.
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+ntp.ubc.ca      140.142.16.34    2 u  194  256  371    0.673    4.834   1.677
*string.physics. .PPS.            1 u   24  128  377    0.168    0.031   1.827

Note that the accuracy this morning ( 8 hours ago) was 4ms offset and it
had been stuck there for at least 8 hours. It was restarting ntp that
finally got it to get rid of that final bit of offset.
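
As an aside on reading the banner above: the reach column is an 8-bit shift register printed in octal, one bit per recent poll. A minimal sketch of decoding it (the helper name is hypothetical, not an ntpq facility):

```python
def decode_reach(reach_octal):
    """Decode an ntpq 'reach' value into 8 booleans, most recent poll first.

    The register is shifted left on every poll, so the least significant
    bit corresponds to the most recent poll.
    """
    register = int(reach_octal, 8) & 0xFF
    return [bool((register >> shift) & 1) for shift in range(8)]


if __name__ == "__main__":
    for name, reach in (("ntp.ubc.ca", "371"), ("string.physics.", "377")):
        polls = decode_reach(reach)
        print(name, "answered", sum(polls), "of the last 8 polls:", polls)
```

By this reading, 377 means all of the last eight polls were answered, while 371 means two of the last eight went unanswered.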

The graphs of the offset, the round-trip time and the rate since it started are on
www.theory.physics.ubc.ca/chrony/chrony.html
and the computer I am referring to is flory (second last one).
string, the last one, is the PPS server for flory.

I restarted ntp at UTC day number 19.815, reducing the maxpoll from 10 to 7.

The other earlier graphs are all of machines running chrony, with their
rate, offset and round trip records. Note that flory's roundtrip record has
occasional very large outliers (5-10ms), but these are only about 20% of
the events, so the clock filter should get rid of them. After I
reduced the maxpoll to 7, these long delays ceased.
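
To illustrate why I expect the clock filter to discard those outliers, here is a simplified sketch of the minimum-delay idea: keep the last eight samples and use the offset from the one with the shortest round trip. This is only the general idea; ntpd's real filter also takes sample age and dispersion into account, and the class name and sample values below are made up for illustration.

```python
from collections import deque


class MinDelayFilter:
    """Keep the most recent `size` (delay, offset) samples, pick lowest delay."""

    def __init__(self, size=8):
        self.samples = deque(maxlen=size)  # oldest sample drops off automatically

    def add(self, delay_ms, offset_ms):
        """Store a new sample and return the current lowest-delay sample."""
        self.samples.append((delay_ms, offset_ms))
        return min(self.samples, key=lambda sample: sample[0])


if __name__ == "__main__":
    clock_filter = MinDelayFilter()
    # Hypothetical samples: mostly ~0.12 ms round trips plus one 7.5 ms outlier.
    for delay, offset in [(0.12, 0.010), (0.13, 0.012), (7.5, 3.6), (0.12, 0.011)]:
        best = clock_filter.add(delay, offset)
        print("sample delay", delay, "-> using offset", best[1])
```

With only about one sample in five delayed, there is nearly always a short round trip sample among the last eight, so the long ones should never be selected.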

The reason I converted flory from chrony to ntp was to see if it behaved
better than chrony, which shows oscillations in the rate and, on some of the
machines (including flory), also large (1ppm) steps in the rate.







