[ntp:questions] NTP seems unsuitable for this application... what do you think?

John Seal sealj at indy.raytheon.com
Wed Dec 1 21:48:35 UTC 2004


We have a networked system with the following characteristics:

- Not on the internet.
- Several dozen hosts, mostly Solaris 2.6 and Solaris 9.
- Normally OFF, turned ON only during use for 8-10 hours at a time.
- Hosts booted in random order as needed, sometimes not all of them.
- Clock batteries cannot be easily replaced, so they're often dead.
- Time (but not date) is available, ultimately from GPS, I think.

The last two points bear a little elaboration.  It is very difficult to 
replace the clock batteries in this system, so many of them are dead at 
any given time.  Some hosts boot with approximately the correct time; 
some with the time they were last shut down, and some at the epoch.  Two 
of the hosts are special in that they have a connection to a source of 
GPS data, and there is a program that reads the time (but not the date) 
and keeps the local processor clock synced to it.  The GPS connection is 
not any standard connection that could be used directly by NTP.

As part of the login process, the user is offered a date/time window 
which can be used to accept or modify the current date and time.  So, if 
one of the special GPS hosts comes up at the epoch, for example, then 
GPS will set the right time but the date will be 1/1/1970.  When the 
user logs in, the date and time will be set to whatever "wristwatch 
time" the user enters, on both the local host and the one special GPS 
host.  We assume the user will enter the correct date; if the time is a 
little off, GPS will quickly correct that, and the one special GPS host 
will now have the correct date and time.
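
The date/time handling described above can be sketched roughly as 
follows.  This is only an illustration under stated assumptions, not the 
actual program on the GPS host: the function name 
combine_date_and_gps_time and the 12-hour midnight rule are 
hypothetical, and time zones are ignored.

```python
from datetime import datetime, time, timedelta


def combine_date_and_gps_time(user_dt: datetime, gps_time: time) -> datetime:
    """Keep the user-entered date, but take the time of day from GPS.

    Hypothetical helper: if the GPS time of day is more than 12 hours
    away from the wristwatch time, assume the two readings straddle
    midnight and shift the result by one day.
    """
    candidate = datetime.combine(user_dt.date(), gps_time)
    delta = candidate - user_dt
    if delta > timedelta(hours=12):
        candidate -= timedelta(days=1)
    elif delta < -timedelta(hours=12):
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    # Wristwatch is a few minutes slow; GPS supplies the time of day.
    wristwatch = datetime(2004, 12, 1, 21, 45, 0)
    print(combine_date_and_gps_time(wristwatch, time(21, 48, 35)))
```

The point of the sketch is that the date comes only from the user, so a 
wrong date entered at login is never corrected by GPS.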

The local host where the user logged in then does a one-time sync to the 
one special GPS host, so now the local date and time are correct as 
well.  No additional syncing takes place; the processor clocks drift 
from that point on.  This happens for each host that logs in.

Bottom line: there is a complex boot/login dance using rdate and custom 
programs that ensures that hosts start out synced to the one special GPS 
host at the beginning, but then they are free to drift until the system 
is shut down.

There is another special GPS host, but its local clock is not normally 
kept synced to GPS time.  It is for backup use, and the GPS sync 
functionality must be manually started when required.

We thought NTP might help.  It was already installed, and was fairly 
easy to configure, but didn't really work like we expected.  I mainly 
referred to the Sun BluePrints publication "Using NTP to Control and 
Synchronize System Clocks - Part II: Basic NTP Administration and
Architecture" by David Deeths and Glenn Brunette.

We configured both of the special GPS hosts as servers, with their local 
processor clocks as the reference (127.127.1.0), as peers of each other, 
and using authentication keys.  Even though ntptrace and ntpq showed 
pretty much what I expected, I consistently saw a difference of 3-5 
seconds between their clocks.  The clients were configured to use both servers, but to 
prefer the main one.  Interestingly, it didn't seem to matter whether 
the clients were configured to use authentication keys or not.

I say ntpq showed "pretty much" what I expected, because sometimes it 
showed a peer as unreachable even though it should have been reachable, 
or a peer's clock as "insane" when it seemed reasonable.  The clients did 
switch as servers were stopped and started, but it took a while, on 
the order of 5 minutes.  That was one of our concerns, that NTP seemed 
to take a long time to do anything, and I couldn't find firm answers to 
how it handled large initial differences, when it stepped vs. slewed 
("ntpdate -b" excepted), and how long a slewing correction took.
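
As a rough aid to the step vs. slew question, the sketch below does the 
arithmetic using the defaults documented for the reference ntpd 4 
implementation: a 128 ms step threshold, a 1000 s panic threshold, and a 
maximum slew rate of 500 ppm.  The xntpd shipped with Solaris may behave 
differently, so treat the constants as assumptions; the function names 
are made up for illustration.

```python
# Assumed defaults from the reference ntpd 4 documentation.
STEP_THRESHOLD_S = 0.128    # offsets above this are stepped, not slewed
PANIC_THRESHOLD_S = 1000.0  # offsets above this make ntpd give up (unless -g)
MAX_SLEW_PPM = 500.0        # fastest rate at which the clock is slewed


def describe_correction(offset_s: float) -> str:
    """Classify how an offset of this size would be handled by default."""
    magnitude = abs(offset_s)
    if magnitude > PANIC_THRESHOLD_S:
        return "panic"
    if magnitude > STEP_THRESHOLD_S:
        return "step"
    return "slew"


def min_slew_seconds(offset_s: float) -> float:
    """Lower bound on the time needed to slew away an offset at 500 ppm."""
    return abs(offset_s) * 1_000_000 / MAX_SLEW_PPM


if __name__ == "__main__":
    for offset in (0.05, 4.0, 3600.0):
        print(offset, describe_correction(offset), min_slew_seconds(offset))
```

Under these assumptions, a 4-second offset corrected purely by slewing 
would need at least 8000 seconds, which is over two hours.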

So, here are a few questions I have:

- Why weren't the two peer servers synced closer than a few seconds?

- If a server started out at the epoch, then changed to the right time 
on the wrong date, then changed to the right date and time, how would 
NTP react?  In other words...

- Starting from a situation with the servers and clients in sync, would 
large time changes on the server be propagated to the clients?

- Why didn't it seem to make any difference whether the clients used 
authentication keys or not?

Our decision, as of this morning, is that NTP really isn't suitable for a 
system like this: one that's not ON for long periods of time, is not on 
the internet, has hosts that boot with wildly different local times, and 
lacks a direct connection to GPS.  What do you think?


