[ntp:questions] NTP seems unsuitable for this application... what do you think?
sealj at indy.raytheon.com
Wed Dec 1 21:48:35 UTC 2004
We have a networked system with the following characteristics:
- Not on the internet.
- Several dozen hosts, mostly Solaris 2.6 and Solaris 9.
- Normally OFF, turned ON only during use for 8-10 hours at a time.
- Hosts booted in random order as needed, sometimes not all of them.
- Clock batteries cannot be easily replaced, so they're often dead.
- Time (but not date) is available, ultimately from GPS, I think.
The last two points bear a little elaboration. It is very difficult to
replace the clock batteries in this system, so many of them are dead at
any given time. Some hosts boot with approximately the correct time,
some with the time at which they were last shut down, and some at the epoch. Two
of the hosts are special in that they have a connection to a source of
GPS data, and there is a program that reads the time (but not the date)
and keeps the local processor clock synced to it. The GPS connection is
not any standard connection that could be used directly by NTP.
As part of the login process, the user is offered a date/time window
which can be used to accept or modify the current date and time. So, if
one of the special GPS hosts comes up at the epoch, for example, then
GPS will set the right time but the date will be 1/1/1970. When the
user logs in, the date and time will be set to whatever "wristwatch
time" the user enters, on both the local host and the one special GPS
host. We assume the user will enter the correct date; if the time is a
little off, GPS will quickly correct that, and the one special GPS host
will now have the correct date and time.
The local host where the user logged in then does a one-time sync to the
one special GPS host, so now the local date and time are correct as
well. No additional syncing takes place; the processor clocks drift
from that point on. This happens for each host that logs in.
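
The date/time handling described above can be sketched roughly as
follows. This is an illustration only, not the actual login code: the
function name `resolve_clock` is made up, and the handling of entries
near midnight is an assumption (the real programs may do something
different or nothing at all).

```python
from datetime import datetime, time, timedelta


def resolve_clock(user_entry: datetime, gps_time_of_day: time) -> datetime:
    """Combine the user's "wristwatch" date with the GPS time of day.

    GPS supplies only the time of day, so the date has to come from the
    user. Assumption: if the two times fall on opposite sides of
    midnight, pick the calendar day that puts the result closest to
    what the user entered.
    """
    candidates = [
        datetime.combine(user_entry.date() + timedelta(days=d), gps_time_of_day)
        for d in (-1, 0, 1)
    ]
    return min(candidates, key=lambda c: abs(c - user_entry))


if __name__ == "__main__":
    # User's watch is about three minutes slow; GPS corrects the time only.
    print(resolve_clock(datetime(2004, 12, 1, 21, 45, 0), time(21, 48, 35)))
```

The user's entry fixes the date, and the GPS time of day then replaces
whatever time the user typed.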
Bottom line: there is a complex boot/login dance using rdate and custom
programs that ensures that hosts start out synced to the one special GPS
host at the beginning, but then they are free to drift until the system
is shut down.
There is another special GPS host, but its local clock is not normally
kept synced to GPS time. It is for backup use, and the GPS sync
functionality must be manually started when required.
We thought NTP might help. It was already installed, and was fairly
easy to configure, but didn't really work like we expected. I mainly
referred to the Sun Blueprint publication "Using NTP to Control and
Synchronize System Clocks - Part II: Basic NTP Administration and
Architecture" by David Deeths and Glenn Brunnette.
We configured both of the special GPS hosts as servers, with their local
processor clocks as the reference (127.127.1.0), as peers of each other,
and using authentication keys. Even though ntptrace and ntpq showed
pretty much what I expected, I consistently saw 3-5 seconds difference
in their times. The clients were configured to use both servers, but to
prefer the main one. Interestingly, it didn't seem to matter whether
the clients were configured to use authentication keys or not.
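
For concreteness, the configuration described above corresponds roughly
to the ntp.conf lines produced by the sketch below. The host names
`gps1` and `gps2`, key ID 1, and the keys file path are placeholders
chosen for the example, not the values from our system, and the real
files may contain other directives.

```python
LOCAL_CLOCK = "127.127.1.0"  # local processor clock as the reference


def server_conf(other_server: str, key_id: int = 1,
                keys_file: str = "/etc/inet/ntp.keys") -> str:
    """ntp.conf text for one GPS host: local clock plus an authenticated peer."""
    lines = [
        f"server {LOCAL_CLOCK}",
        f"peer {other_server} key {key_id}",
        f"keys {keys_file}",        # placeholder path (assumption)
        f"trustedkey {key_id}",
    ]
    return "\n".join(lines) + "\n"


def client_conf(main_server: str, backup_server: str, key_id: int = 1,
                keys_file: str = "/etc/inet/ntp.keys") -> str:
    """ntp.conf text for a client: both servers, preferring the main one."""
    lines = [
        f"server {main_server} key {key_id} prefer",
        f"server {backup_server} key {key_id}",
        f"keys {keys_file}",        # placeholder path (assumption)
        f"trustedkey {key_id}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(server_conf("gps2"))           # would go on the hypothetical gps1
    print(client_conf("gps1", "gps2"))   # would go on every client
```

The unauthenticated client variant is the same with the `key`, `keys`,
and `trustedkey` parts left out.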
I say ntpq showed "pretty much" what I expected, because sometimes it
showed a peer as unreachable even though it should have been reachable,
or a peer's clock as "insane" when it seemed reasonable. The clients did
switch as servers were stopped and started, but it took a while, on
the order of 5 minutes. That was one of our concerns, that NTP seemed
to take a long time to do anything, and I couldn't find firm answers to
how it handled large initial differences, when it stepped vs. slewed
("ntpdate -b" excepted), and how long a slewing correction took.
So, here are a few questions I have:
- Why weren't the two peer servers synced closer than a few seconds?
- If a server started out at the epoch, then changed to the right time
on the wrong date, then changed to the right date and time, how would
NTP react? In other words...
- Starting from a situation with the servers and clients in sync, would
large time changes on the server be propagated to the clients?
- Why didn't it seem to make any difference whether the clients used
authentication keys or not?
Our decision, as of this morning, is that NTP really isn't suitable for a
system like this that's not ON for long periods of time, is not on the
internet, has hosts that boot with wildly different local times, and
lacks a direct connection to GPS. What do you think?