[ntp:questions] Re: NTP seems unsuitable for this application... what do you think?
jdc at smof.fiawol.org
Fri Dec 3 14:20:01 UTC 2004
In article <N0rrd.7$%55.0 at dfw-service2.ext.ray.com>,
John Seal <sealj at indy.raytheon.com> wrote:
>We have a networked system with the following characteristics:
>- Not on the internet.
>- Several dozen hosts, mostly Solaris 2.6 and Solaris 9.
>- Normally OFF, turned ON only during use for 8-10 hours at a time.
>- Hosts booted in random order as needed, sometimes not all of them.
>- Clock batteries cannot be easily replaced, so they're often dead.
>- Time (but not date) is available, ultimately from GPS, I think.
>The last two points bear a little elaboration. It is very difficult to
>replace the clock batteries in this system, so many of them are dead at
>any given time. Some hosts boot with approximately the correct time;
>some with the time they were last shutdown, and some at the epoch. Two
>of the hosts are special in that they have a connection to a source of
>GPS data, and there is a program that reads the time (but not the date)
>and keeps the local processor clock synced to it. The GPS connection is
>not any standard connection that could be used directly by NTP.
>As part of the login process, the user is offered a date/time window
>which can be used to accept or modify the current date and time. So, if
>one of the special GPS hosts comes up at the epoch, for example, then
>GPS will set the right time but the date will be 1/1/1970. When the
>user logs in, the date and time will be set to whatever "wristwatch
>time" the user enters, on both the local host and the one special GPS
>host. We assume the user will enter the correct date; if the time is a
>little off, GPS will quickly correct that, and the one special GPS host
>will now have the correct date and time.
>The local host where the user logged in then does a one-time sync to the
>one special GPS host, so now the local date and time are correct as
>well. No additional syncing takes place; the processor clocks drift
>from that point on. This happens for each host that logs in.
>Bottom line: there is a complex boot/login dance using rdate and custom
>programs that ensures that hosts start out synced to the one special GPS
>host at the beginning, but then they are free to drift until the system
>is shut down.
>There is another special GPS host, but its local clock is not normally
>kept synced to GPS time. It is for backup use, and the GPS sync
>functionality must be manually started when required.
>We thought NTP might help. It was already installed, and was fairly
>easy to configure, but didn't really work like we expected. I mainly
>referred to the Sun Blueprint publication "Using NTP to Control and
>Synchronize System Clocks - Part II: Basic NTP Administration and
>Architecture" by David Deeths and Glenn Brunnette.
>We configured both of the special GPS hosts as servers, with their local
>processor clocks as the reference (127.127.1.0), as peers of each other,
>and using authentication keys. Even though ntptrace and ntpq showed
>pretty much what I expected, I consistently saw 3-5 seconds difference
>in their times. The clients were configured to use both servers, but to
>prefer the main one. Interestingly, it didn't seem to matter whether
>the clients were configured to use authentication keys or not.
>I say ntpq showed "pretty much" what I expected, because sometimes it
>showed a peer as unreachable even though it should have been reachable, or a
>peer's clock as "insane" when it seemed reasonable. The clients did
>switch as servers were stopped and started, but it took a while, like on
>the order of 5 minutes. That was one of our concerns, that NTP seemed
>to take a long time to do anything, and I couldn't find firm answers to
>how it handled large initial differences, when it stepped vs. slewed
>("ntpdate -b" excepted), and how long a slewing correction took.
>So, here are a few questions I have:
>- Why weren't the two peer servers synced closer than a few seconds?
>- If a server started out at the epoch, then changed to the right time
>on the wrong date, then changed to the right date and time, how would
>NTP react? In other words...
>- Starting from a situation with the servers and clients in sync, would
>large time changes on the server be propagated to the clients?
>- Why didn't it seem to make any difference whether the clients used
>authentication keys or not?
>Our decision, as of this morning, is that NTP really isn't suitable to a
>system like this that's not ON for long periods of time, not on the
>internet, has hosts that boot with wildly different local times, and
>lacks direct connection to a GPS. What do you think?
Looking at all the postings in this thread, I'd suggest you do the following.
1. Set up your "primary" GPS host to stand alone. Don't bother to peer it
with anything else.
2. Set up your "backup" GPS host to sync to the primary host. Fudge the
backup GPS host's local clock (127.127.1.0) to stratum 4. This will cause it
to prefer the primary GPS host as its time source and fall back to the local
clock if the primary is down.
3. Have all your client hosts sync to both the primary and backup GPS hosts.
Don't bother setting a prefer for any of them. Let them select the one
they want. You may want the boot sequence to issue an ntpdate command pointing
to both the primary and backup GPS hosts, as well as to as many clients as you
can reasonably use. The effect of the ntpdate command will be to set the
clock to something reasonably close to the correct time if anything listed
in the ntpdate command is currently synchronized.
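For concreteness, here is one way the three steps could look as ntp.conf lines, generated by a small Python helper. This is a sketch under assumptions: the host names are placeholders, the authentication keys from the original setup are left out, and the thread doesn't say what stratum the primary's local clock should run at, so it is left as a parameter. Per the NOTE, it should rank better (numerically lower) than the backup's 4.

```python
# Hypothetical sketch: build ntp.conf fragments for the three roles above.
# Host names passed in (e.g. "gps-primary", "gps-backup") are placeholders.

LOCAL_CLOCK = "127.127.1.0"  # ntpd's undisciplined local clock driver, as in the original setup


def primary_conf(local_stratum: int) -> str:
    """Step 1: stand alone on the GPS-disciplined local clock, with no peers.

    The stratum is an assumption: the thread only implies it must rank
    better (numerically lower) than the backup's fudged value of 4.
    """
    return f"server {LOCAL_CLOCK}\nfudge {LOCAL_CLOCK} stratum {local_stratum}\n"


def backup_conf(primary: str) -> str:
    """Step 2: sync to the primary, falling back to the local clock at stratum 4."""
    return (
        f"server {primary}\n"
        f"server {LOCAL_CLOCK}\n"
        f"fudge {LOCAL_CLOCK} stratum 4\n"
    )


def client_conf(primary: str, backup: str) -> str:
    """Step 3: list both GPS hosts with no "prefer" keyword; ntpd chooses."""
    return f"server {primary}\nserver {backup}\n"


if __name__ == "__main__":
    print(backup_conf("gps-primary"), end="")
```

Generating the fragments from one place keeps the backup's fudge and the clients' server lists consistent across several dozen hosts.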
NOTE: The key is fudging the backup GPS host to a worse (numerically higher)
stratum than the primary GPS host. Without this fudging, the backup GPS host
will consider itself to be as good as the primary and, without a third clock
to examine, will never attempt to synchronize to the primary.
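The reasoning in the NOTE can be illustrated with a toy ranking rule in Python. It is deliberately far simpler than ntpd's real selection and clustering algorithms and only models the tie problem: the lowest stratum wins, and on a tie the host keeps its current source because there is no third clock to break the tie. The stratum numbers in the usage lines are illustrative, not taken from the thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Source:
    name: str
    stratum: int     # smaller number = closer to the reference = better
    reachable: bool


def pick_source(current: str, candidates: List[Source]) -> Optional[str]:
    """Toy rule (NOT ntpd's algorithm): lowest stratum wins; on a tie, keep the current source."""
    usable = [s for s in candidates if s.reachable]
    if not usable:
        return None
    best = min(s.stratum for s in usable)
    tied = [s.name for s in usable if s.stratum == best]
    if current in tied:
        return current  # no third clock to break the tie, so stay put
    return tied[0]


if __name__ == "__main__":
    local = Source("LOCAL", 4, True)  # backup's own clock, fudged to stratum 4
    print(pick_source("LOCAL", [local, Source("gps-primary", 3, True)]))   # gps-primary
    print(pick_source("LOCAL", [local, Source("gps-primary", 3, False)]))  # LOCAL
    print(pick_source("LOCAL", [local, Source("gps-primary", 4, True)]))   # LOCAL (tie)
```

The last case stands in for the unfudged setup: both clocks look equally good, so the backup never moves to the primary.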
The ntpdate command on boot may point to any or all of the computers
on your network, be they GPS hosts or other clients. The main
disadvantage of using a large number of computers is the long delay as
each unresponsive one times out. The advantage is that if any of the
computers on your network is already up and synchronized, then the newly
booted computer will also come up with close to the correct time.
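What that boot-time check amounts to can be sketched in Python. This is a simplified, SNTP-style client, not ntpdate: it asks each host once, in order, ignores network delay, and stops at the first host whose reply says it is synchronized (leap indicator not 3, stratum 1 to 15). It only reads the time and does not set the clock. The sequential loop also shows the cost described above: each host that doesn't answer adds up to `timeout` seconds.

```python
import socket
import struct
from typing import Callable, Iterable, Optional, Tuple

NTP_PORT = 123
NTP_TO_UNIX = 2208988800  # seconds from 1900-01-01 (NTP epoch) to 1970-01-01 (Unix epoch)


def parse_reply(data: bytes) -> Optional[float]:
    """Return the server's transmit time in Unix seconds, or None if it is not synchronized."""
    if len(data) < 48:
        return None
    leap = data[0] >> 6  # top two bits: leap indicator; 3 means "clock not synchronized"
    stratum = data[1]
    if leap == 3 or stratum == 0 or stratum >= 16:  # 0 = unspecified, 16 = unsynchronized
        return None
    secs, frac = struct.unpack("!II", data[40:48])  # transmit timestamp
    return secs - NTP_TO_UNIX + frac / 2**32


def query(host: str, timeout: float = 2.0) -> Optional[float]:
    """Send one client-mode request; a silent or unreachable host costs up to `timeout` seconds."""
    request = b"\x1b" + bytes(47)  # LI=0, version 3, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(request, (host, NTP_PORT))
            data, _ = sock.recvfrom(512)
        except OSError:  # timeouts, unreachable hosts, unknown host names
            return None
    return parse_reply(data)


def first_synchronized(
    hosts: Iterable[str],
    query_fn: Callable[[str], Optional[float]] = query,
) -> Optional[Tuple[str, float]]:
    """Try hosts in order and return (host, time) for the first one that is synchronized."""
    for host in hosts:
        t = query_fn(host)
        if t is not None:
            return host, t
    return None
```

For example, `first_synchronized(["gps-primary", "gps-backup", "client-01"])` (placeholder names) returns the first answering, synchronized host and its time, or `None` if nothing on the list is up and synchronized.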