[ntp:questions] Re: offset is correlated with delay

Richard B. Gilbert rgilbert88 at comcast.net
Sun Jan 18 13:41:43 UTC 2004

The delay bounds the error; that is, we know that the server's timestamps 
occurred between the time the packet left our system and the time it 
returned. There are four different timestamps in there: two from our 
clock and two from the server's clock.  Those four timestamps are used 
to determine both the delay and the offset.  See RFC1305 page 100 et 
seq. for the math.
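
For anyone who wants to play with the numbers, here is a rough sketch of 
that arithmetic in Python.  The two formulas are the standard NTP ones; 
the function names and the little simulate() helper are made up for 
illustration and are not anything taken from ntpd itself.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation.

    t1: request leaves us      (our clock)
    t2: request reaches server (server's clock)
    t3: reply leaves server    (server's clock)
    t4: reply reaches us       (our clock)
    """
    # Round trip minus the time the server held the packet.
    delay = (t4 - t1) - (t3 - t2)
    # Estimated (server clock - our clock), assuming a symmetric path.
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    return offset, delay


def simulate(true_offset, out_delay, server_hold, back_delay, t1=0.0):
    """Hypothetical helper: build the four timestamps for a known setup.

    true_offset is (server clock - our clock); all values in seconds.
    """
    t2 = t1 + out_delay + true_offset
    t3 = t2 + server_hold
    t4 = t1 + out_delay + server_hold + back_delay
    return ntp_offset_and_delay(t1, t2, t3, t4)


if __name__ == "__main__":
    # Symmetric path, server sits on the packet for half a second.
    print(simulate(0.010, 0.020, 0.5, 0.020))
    # Asymmetric path, clocks actually in agreement, no server hold.
    print(simulate(0.0, 0.010, 0.0, 0.190))
```

In the first case the half second spent inside the server drops out, 
since t3 - t2 is subtracted, and the offset comes back as the true 10 ms.  
In the second case the clocks agree but the offset comes out at about 
-90 ms, which is half the difference between the outbound and return 
times and is still within half of the 200 ms delay.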

If I haven't made a stupid mistake somewhere, the transit delays are at 
least 5.4usec per mile (that is the speed of light in a vacuum; signals 
in copper or fiber travel slower), plus the packet size in bits divided 
by the bit rate, plus the delays introduced by every network router 
along the way.   I 
did a traceroute to one of the servers I was using at the time and found 
there were something like seventeen such devices between me and the 
server!  Each one has to decode enough of the packet to find out where 
it's going and then figure out the next step in getting it there.  This 
takes a little time and these times add up.
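
To put rough numbers on that, here is a back-of-the-envelope sketch in 
Python.  It just adds up the three terms described above.  The distance, 
packet size, link speed and per-router time in the example are invented 
for illustration, not measurements, and the packet is charged for only 
one trip onto the wire, which is a simplification.

```python
USEC_PER_MILE = 5.4  # propagation figure quoted above


def one_way_delay_usec(miles, packet_bits, bits_per_sec,
                       hops=0, per_hop_usec=0.0):
    """Estimate a one-way transit delay in microseconds."""
    propagation = miles * USEC_PER_MILE
    # Time to clock the packet onto the wire, converted to microseconds.
    serialization = packet_bits / bits_per_sec * 1e6
    # Assumed fixed cost per router; real routers vary with load.
    routing = hops * per_hop_usec
    return propagation + serialization + routing


if __name__ == "__main__":
    # Hypothetical path: 1000 miles, a 90-byte packet, a 1 megabit/second
    # link, and seventeen routers at an assumed 100 usec apiece.
    total = one_way_delay_usec(1000, 90 * 8, 1e6, hops=17, per_hop_usec=100.0)
    print("one-way delay: about %.1f ms" % (total / 1000.0))
```

With those made-up inputs the total comes to roughly 7.8 ms one way, of 
which 5.4 ms is distance, 0.7 ms is getting the bits onto the wire and 
1.7 ms is the routers.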

The distance between you and the server is not necessarily even close to 
the distance shown on a map.  Did you ever fly from New York to Los 
Angeles direct and return with a stop in Dallas-Fort Worth?  Network 
routers will use whatever path is "best" at the moment!  The "best" path 
is not necessarily the shortest.

The network delays usually dominate according to page 101 of RFC1305.  
Remember that the average RISC processor (Alpha, SPARC, etc.) can swamp 
a 100 megabit/second Ethernet!  Most WAN connections handle far less 
than that.  T1 is about 1.5 megabits/second and T3 is 45 megabits/second.  Cable 
modem and ADSL handle less than that.  I suspect that "server overload"  
really means "server network overload".

Would anybody running an "overloaded" server care to comment?  Is the 
machine really working up a sweat or is network congestion the problem?

Andrew Schulman wrote:

>When I run 'ntpq -c peers', I've noticed that offset is correlated with
>delay.  Here's a recent set of output:
># ntpq -nc peers
> remote          refid     st t when poll reach   delay   offset  jitter
>+x.x.x.x        x.x.x.x     3 u   9h  36h  377   13.893  -13.756   6.267
>+x.x.x.x        x.x.x.x     2 u   8h  36h  377  201.608   81.170  90.862
>*x.x.x.x        x.x.x.x     2 u   8h  36h  377   23.361  -13.657   5.336
>(server addresses obscured).  This is typical of what I see, in that the
>server with the large delay also shows a large positive offset.
>If I understand correctly (the docs don't ever say, as far as I can find),
>"delay" is the round-trip time (in ms) of the query to the server, and
>"offset" is the estimated difference between my clock and the server's (or
>vice versa-- whichever).
>Here's my hypothesis about how this might happen:
>Delay is the sum of three smaller delays:
>query delay = travel time of my NTP query packet to the server
>server delay = time spent waiting for the server to answer once it's
>received the query
>return delay = travel time back from the server.
>Large delays are usually (according to my hypothesis) caused by server
>overload, so that they mostly consist of server delay.
>Now to estimate the true time based on data from the server, ntpd adds the
>time in the server's response to half of the total delay, which is its best
>available estimate of the return delay.  But half the total delay equals
>(assuming that query delay and return delay are about the same) the return
>delay plus half of the server delay, which is too large.  Hence the offset
>is too large, by about half of the server delay.  
>If my hypothesis is true, then the increase in offset should be about half
>of the increase in delay (server delay).  And that does seem to be the
>case-- it's about right in the example above.
>I know that the real method of estimation is more complicated than this,
>that there's a statistical model involved.  Right now I'm just trying to
>get the basic idea right.  But another possibility is that the statistical
>model introduces this correlation.
>What do you think?  Am I on the right track here?  Is this a well-known
>problem?  Feel free to set me straight if I've completely missed the
>boat :)
