[ntp:questions] Sub-millisecond NTP synchronization for local network

Jeremy Leibs leibs at willowgarage.com
Wed Dec 3 21:39:46 UTC 2008


Willow Garage is designing a robotic research platform and a completely
open-source robotic software framework.  We are attempting to use NTP to
keep the clocks within our system synchronized.  Unfortunately, we are
having an extremely difficult time finding an appropriate configuration.  We
are looking for someone to help us work out the correct NTP configuration
for our use case, or to determine whether NTP is even capable of doing what
we want.

Our configuration consists of 4 machines connected by a local gigabit
network on a mobile robotic base.  These machines are frequently powered
down or restarted.  In order to use the robot, the clocks on these machines
must be synchronized with one another to within 1 millisecond.  Ping times
between machines on this local network vary between 100 microseconds and
1 ms, depending on how heavily the sensor data streams saturate the network.

The 4 machines are connected to the rest of the world through a wireless
link.  The delay on the wireless link is much more variable: in the range of
2 ms to 300 ms, depending on the quality of the link and the amount of data
being sent over it.  We care much less about synchronization between the
robot and the outside world, though it would be nice to avoid unbounded
drift.  Synchronization in the range of tens of milliseconds would be
acceptable.
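As a rough illustration of what these numbers imply: NTP computes its
offset estimate assuming the outbound and return legs of an exchange take
equal time, so path asymmetry can bias a single measurement by up to half
the round-trip delay.  The short Python sketch below applies that bound to
the ranges quoted above, assuming the quoted ping and delay figures are
round-trip times.

```python
def asymmetry_error_bound_ms(rtt_ms: float) -> float:
    """Worst-case offset error (ms) caused by path asymmetry.

    NTP assumes both legs of an exchange take equal time.  If the whole
    round trip were spent on one leg, the estimate would be off by half
    the round-trip delay.
    """
    return rtt_ms / 2.0


# Round-trip ranges quoted in the post, in ms (assumed to be round trips).
links = {
    "local gigabit LAN": (0.1, 1.0),
    "wireless uplink": (2.0, 300.0),
}

for name, (best, worst) in links.items():
    print(f"{name}: error bound {asymmetry_error_bound_ms(best):.2f} ms "
          f"to {asymmetry_error_bound_ms(worst):.2f} ms")
```

On the LAN the bound stays at or below 0.5 ms, while at the worst wireless
delay it reaches 150 ms, well beyond the tens of milliseconds that would be
acceptable.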

Our present configuration consists of 1 machine that syncs to an external
server over the wireless link and acts as a local server for the robot.  The
remaining 3 machines sync to this local server.
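To make the topology concrete, here is a minimal Python sketch that writes
out ntp.conf text for it.  It is an assumption about how such a setup might
be expressed, not the configuration actually in use: the host names are
hypothetical placeholders, the drift file path assumes a Debian/Ubuntu
layout, `iburst` is a standard ntpd server option added for illustration,
and the optional undisciplined local clock fallback (driver 127.127.1.0) is
an extra not described in the post.

```python
# Hypothetical host names; substitute the real ones.
EXTERNAL_SERVER = "ntp.example.org"
LOCAL_SERVER = "robot-c1"
CLIENTS = ["robot-c2", "robot-c3", "robot-c4"]

# Assumed drift file location (Debian/Ubuntu convention).
DRIFTFILE = "/var/lib/ntp/ntp.drift"


def server_conf(upstream: str, local_clock_fallback: bool = False) -> str:
    """Return ntp.conf text for the robot's local server."""
    lines = [
        f"driftfile {DRIFTFILE}",
        # iburst sends a burst of packets while the server is unreachable
        # (including at startup), which speeds up the first synchronization.
        f"server {upstream} iburst",
    ]
    if local_clock_fallback:
        # Assumption: serve the undisciplined local clock at a high stratum
        # so the clients keep a common reference while the link is down.
        lines += ["server 127.127.1.0", "fudge 127.127.1.0 stratum 10"]
    return "\n".join(lines) + "\n"


def client_conf(local_server: str) -> str:
    """Return ntp.conf text for each of the three client machines."""
    return "\n".join([
        f"driftfile {DRIFTFILE}",
        f"server {local_server} iburst",
    ]) + "\n"


if __name__ == "__main__":
    print(f"# {LOCAL_SERVER}")
    print(server_conf(EXTERNAL_SERVER, local_clock_fallback=True))
    for host in CLIENTS:
        print(f"# {host}")
        print(client_conf(LOCAL_SERVER))
```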

Under "stable" operating conditions, this configuration seems to work well
and eventually converges to within our sub-millisecond criterion.  However,
we have 2 large problems.

1) When operating conditions change suddenly, the clocks drift apart
dramatically, and the system sometimes becomes unstable.  A pathological
case we have seen is when the wireless link is near saturation for an
extended period, such as while multi-gigabyte log files are copied over it
for several hours.  Once the transfer completes and the link frees up again,
the delay across the wireless link plummets and the local server immediately
diverges from the external server by around 30 ms.  After this initial
divergence, the local server no longer qualifies as a good source of time,
and the remaining 3 machines start drifting apart in independent directions.
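One mechanism that could produce a jump of this kind, offered as a
hypothesis rather than a diagnosis, is asymmetric queuing: while the robot
uploads logs, its outbound NTP packets may wait in a long queue while the
replies come back quickly.  The Python sketch below simulates a single
client/server exchange using NTP's standard on-wire offset and delay
formulas (RFC 5905).  The one-way delays are illustrative values chosen for
the example, not measurements from the robot.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """NTP on-wire calculation (RFC 5905).

    t1: client transmit (client clock)   t2: server receive (server clock)
    t3: server transmit (server clock)   t4: client receive (client clock)
    Returns (offset, delay); offset is server clock minus client clock.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def exchange(true_offset, d_out, d_back, t1=0.0, processing=0.0):
    """Simulate one exchange with one-way delays d_out (client to server)
    and d_back (server to client); all values in ms."""
    t2 = t1 + d_out + true_offset   # arrival, read on the server clock
    t3 = t2 + processing            # reply leaves the server
    t4 = t3 - true_offset + d_back  # arrival, read on the client clock
    return ntp_offset_delay(t1, t2, t3, t4)


if __name__ == "__main__":
    # Illustrative one-way delays; the clocks actually agree (offset 0).
    saturated = exchange(true_offset=0.0, d_out=65.0, d_back=5.0)
    idle = exchange(true_offset=0.0, d_out=3.0, d_back=3.0)
    print(f"saturated: offset {saturated[0]:.1f} ms, delay {saturated[1]:.1f} ms")
    print(f"idle:      offset {idle[0]:.1f} ms, delay {idle[1]:.1f} ms")
```

With these numbers the measured offset moves from 30 ms to 0 ms when the
queue drains, even though the true offset never changes, because the bias
equals half the difference between the two one-way delays.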

2) When the system is in a non-converged state, such as after diverging as
in case 1 or on boot, the time it takes for the system to converge is
unacceptably long.  If I disable NTP and run ntpdate on each of the client
machines, I can synchronize them to within 1 ms, but as soon as I start NTP
again, all of the clocks begin to diverge, often taking hours to return to
steady state.
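For a sense of scale, assuming the error that remains after ntpdate is
dominated by a constant frequency (drift) error: a clock that is off by
f ppm gains or loses f microseconds per second, so a 1 ms budget is used up
in 1000/f seconds.  ntpd also slews the clock at no more than 500 ppm, which
puts a floor on how quickly an offset like the 30 ms in case 1 can be
removed without stepping.  The Python sketch below does that arithmetic;
the ppm values are illustrative, not measured on the robot.

```python
def seconds_to_exceed(limit_us: float, freq_error_ppm: float) -> float:
    """Seconds until a perfectly set clock drifts past limit_us, given a
    constant frequency error in parts per million (1 ppm = 1 us/s)."""
    if freq_error_ppm == 0:
        return float("inf")
    return limit_us / abs(freq_error_ppm)


def min_slew_seconds(offset_ms: float, max_slew_ppm: float = 500.0) -> float:
    """Lower bound on the time needed to remove offset_ms by slewing
    at max_slew_ppm (500 ppm is ntpd's maximum slew rate)."""
    return offset_ms * 1000.0 / max_slew_ppm


for ppm in (1, 10, 50, 100):
    print(f"{ppm:>3} ppm error: 1 ms budget used up after "
          f"{seconds_to_exceed(1000.0, ppm):.0f} s")

print(f"slewing away 30 ms at 500 ppm takes at least "
      f"{min_slew_seconds(30.0):.0f} s")
```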

We are looking for a way to configure the system to be robust to sudden
changes in otherwise stable network latency, and also for a way to get the
local system to converge to sub-ms offsets within minutes instead of hours.

Does anyone have suggestions for best practices in configuring an NTP
network for these conditions?

Thanks,
--Jeremy Leibs


