[ntp:questions] Proposed NTP solution for a network
Richard B. Gilbert
rgilbert88 at comcast.net
Tue Mar 3 12:42:39 UTC 2009
> Below is a description of the environment, and my thoughts on, a
> resilient and precise NTP configuration. All comments, suggestions, etc.
> are welcome, indeed requested. I am not a software type, rather networks
> and hardware, so please consider that with comments and questions.
> Here goes:
> Three locations: A, B, & C. Locations A and B are datacenters, C is a
> business office with back-office processing and long-term storage.
> A and B are within 10-15 miles of each other near NYC, and C is about
> 1200 miles from A and B.
> All three sites are interconnected in a mesh IP network with dual OC-3
> connections from each site -- the network is highly resilient although
> perhaps not as fast as we might like. A and B additionally have a GigE
> connection between them for host-host communication, database
> updates/backups, command & control, etc.
> Locations A and B have a large community of Suse 10.x Enterprise
> servers, each with very stringent requirements to have time be very
> closely "in sync" with each other at that site, as well as at the other
> site. Absolute accuracy (i.e. "true time") is not as important as
> "precision" (that is, all the hosts should be within a few 10s of
> microseconds, but they could be as much as a few hundred
> microseconds off of UTC).
> Stepping for time adjustment during prime hours (0700 - 2000) would be
> very very bad for the transaction record (transactions are very time
> sensitive). Less sensitive between 2000-0700.
> Each client at A and B has multiple GigE connections to the LANs.
> The timestamps on transactions should be traceable (i.e. we may need to
> provide to regulators information on the source, accuracy, and precision
> of the timestamp of any transaction).
> Each of A, B, and C have a dedicated NTP appliance (same make and model,
> with differing manufacturing dates -- I have since learned that maybe we
> should mix up the make/model, but "one thing at a time"), with
> integrated GPS receiver and antenna on the roof. Each site also has
> access to the Internet.
> Note that each NTP appliance can output PPS, but the hosts have no
> method to receive the PPS (blade servers in an enclosure, and all
> available expansion slots on each individual blade are in use). In
> addition, there is no provision on the enclosure to accept a PPS or
> other time source for distribution to the individual blades using a
> backplane mechanism.
A very poor configuration if accuracy is wanted. Typically, one "edge",
leading or trailing, of the PPS output is within 50 to 100 nanoseconds
of the "top of the second"! The serial output tells you the time value
of the PPS "edge".
"Precision" tells you "how fine you can slice it"; e.g. tens of
milliseconds, milliseconds, hundreds of microseconds, etc. Accuracy is
the difference between your clock and the master clock at the National
Institute of Standards and Technology.
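To make the distinction concrete, here is a rough Python sketch. The function name, the numbers, and the "round down to the tick" model are all made up for illustration and are not taken from any real clock or from ntpd. It models one clock that slices time finely but is far off, and another that is set correctly but slices coarsely.

```python
def read_clock(true_time_us, offset_us, resolution_us):
    """Model a clock reading in microseconds (hypothetical model).

    offset_us     -- fixed error against the reference (this is the accuracy)
    resolution_us -- size of one clock tick (this is the precision)
    """
    raw = true_time_us + offset_us
    # A clock can only report whole ticks, so round down to the tick size.
    return (raw // resolution_us) * resolution_us


true_time_us = 1_234_567  # made-up "true" time

# Fine-grained (1 us ticks) but 5 ms off: precise, not accurate.
fine_but_off = read_clock(true_time_us, offset_us=5_000, resolution_us=1)

# Set correctly but only 10 ms ticks: accurate, not precise.
coarse_but_right = read_clock(true_time_us, offset_us=0, resolution_us=10_000)

print(fine_but_off - true_time_us)      # error of the fine-grained clock
print(coarse_but_right - true_time_us)  # error of the coarse clock
```

Neither clock reads the true time: the first because it is set wrong, the second because it cannot express anything finer than 10 ms.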
Using the serial output alone introduces some uncertainty in the time
value. Read the instructions for your appliance CAREFULLY.
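A small sketch of why the PPS edge matters, in Python. The function names and all the numbers are hypothetical, and the assumption that the serial-only error comes from guessing the message delay is mine; your appliance's documentation is the authority on what its serial output really means.

```python
US_PER_SECOND = 1_000_000


def offset_with_pps(local_at_edge_us, serial_second):
    """Correction for the local clock, in microseconds.

    local_at_edge_us -- local clock reading captured at the PPS edge
    serial_second    -- whole-second time value the serial output gives for that edge
    """
    return serial_second * US_PER_SECOND - local_at_edge_us


def offset_serial_only(local_at_arrival_us, serial_second, assumed_delay_us):
    """Same correction without PPS: the edge has to be estimated from the
    arrival time of the serial message minus a guessed delay."""
    estimated_edge_us = local_at_arrival_us - assumed_delay_us
    return serial_second * US_PER_SECOND - estimated_edge_us


second = 1_236_084_159                      # made-up time value from the serial line
edge_us = second * US_PER_SECOND + 250      # local clock is really 250 us fast
arrival_us = edge_us + 30_000               # serial message shows up 30 ms later

print(offset_with_pps(edge_us, second))                # correction using the edge
print(offset_serial_only(arrival_us, second, 28_000))  # correction with a 28 ms guess
```

With the edge, the correction is the true 250 us. With the serial message alone, a guess that is wrong by 2 ms puts the whole 2 ms into the answer.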
> Current configuration has all the A and B clients synchronizing with the
> NTP appliance at B. The NTP appliance at A has suffered an antenna
> fault, which is being repaired, but even after it is back on-line, the
> software group wants all hosts to sync to a single NTP appliance. The
> NTP appliance at C is new and not yet integrated to the solution -- part
> of the reason for this message.
A bad idea! When that appliance fails, as it inevitably will, you will
be in a world of hurt!
Let's imagine ten or twenty years from now; your "appliance" has just
emitted a cloud of evil smelling black smoke and ceased operation.
What time is it? You'd better hope your wristwatch is accurate!
> From reading this newsgroup, the wiki
> (http://www.ntp.org/ntpfaq/NTP-a-faq.htm) and of course
> http://www.ntp.org/, this is what I think the hardware configuration
> should be:
> 1. Reference clocks: GPS receivers in the NTP appliance are Stratum 0.
> 2. Stratum 1 level: Each Appliance has an output at Stratum 1 via the
> Ethernet connection. Each appliance should be a peer to the other
> appliances (symmetric active/passive) as discussed at
> http://www.eecis.udel.edu/~mills/ntp/html/assoc.html#symact. This would
> enable the appliance to lose the reference source and still be useful to
> the Stratum 2 servers that are clients of these appliances.
Suppose you lose your "reference source" or your connection to it?
> 3. Stratum 2: One server at each of the three locations, each
> referencing each of the three NTP appliances. Each would also peer with
> the other two servers. This will enable the datacenters to keep the
> local hosts synchronized even if the other sites are unreachable (the
> servers at A can continue to process transactions even without
> connectivity to B and C, for example).
> 4. Clients: All clients at location A would sync to the local server
> (prefer) and to the server at location B. All clients at B would sync to
> the A server (prefer) and to the local server at B. All clients at
> location C would sync to their local server (prefer) and to the server
> at location A. Thus each client would have a choice of two Stratum 2
> servers, each of which is trusted and peered with one another. In
> addition, this makes the clients at A and B likely, although not
> guaranteed, to use the same server for their time.
> Several questions:
> A. Is the above architecture fitting with best practices? Suggestions
> for improvement? It seems to fit with Section 22.214.171.124 at
> B. I'm unclear where, or if, "orphan" mode should be used on the
> servers. Should it be configured at all? What will be the advantage
> either way? Oh, some more research
> (http://support.ntp.org/bin/view/Support/OrphanMode) shows that orphan
> mode is not available in the version we are running.
> ntpdc> version
> ntpdc 4.2.0a at 1.1196-r Thu Jun 29 17:48:04 UTC 2006 (1)
> Is the use of orphan mode advantageous enough to update the NTPd on 200+
Orphan mode is for the situation where you lose your external source(s),
or where you never had such a source. Some shops are not allowed to
connect to the internet and have no GPS, WWV, or WWVB receiver. . . .
> C. This configuration cannot get past the "survivor" problem where, with
> three servers, if one fails then the other two cannot find a majority
> (see http://www.ntp.org/ntpfaq/NTP-s-algo-real.htm, section 5.3.2). So
> that leads to either trusting an Internet host, adding another receiver,
> or using a source at an interconnected sister-company in Europe. So,
> should the servers also have a trusted Internet-based time source? The
> nature of our business makes the Internet inherently un-trusted for a
> number of reasons, and having traceable time sources is one of them.
Outside time sources can use cryptographic authentication; the otherwise
unencrypted packet carries a cryptographic signature (a keyed digest)
that assures you that it could have been sent only by a holder of the keys.
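The idea can be sketched in a few lines of Python. This is a generic keyed-digest example using HMAC-SHA256 from the standard library; it is NOT the NTP wire format or the algorithm ntpd uses, and the key and packet bytes are invented for the illustration.

```python
import hashlib
import hmac


def sign(key: bytes, packet: bytes) -> bytes:
    """Compute a keyed digest over the (unencrypted) packet contents."""
    return hmac.new(key, packet, hashlib.sha256).digest()


def verify(key: bytes, packet: bytes, mac: bytes) -> bool:
    """True only if the digest was made with the same key over the same bytes."""
    return hmac.compare_digest(sign(key, packet), mac)


shared_key = b"example-shared-secret"    # hypothetical key known to both ends
packet = b"time packet contents"         # stand-in for the real packet bytes
mac = sign(shared_key, packet)

print(verify(shared_key, packet, mac))            # genuine packet
print(verify(shared_key, packet + b"!", mac))     # altered in transit
print(verify(b"someone-else", packet, mac))       # sender without the key
```

The packet itself stays readable by anyone; only the ability to produce a digest that verifies is restricted to holders of the key.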
> D. Does it make sense, because the time precision is so important, to
> use servers for the Stratum 2 level that are un-encumbered by other
> processes? Or should one of the existing 8-core blades be sufficient,
> perhaps with using processor affinity for the NTP process?
NTP is not terribly demanding! You could run it perfectly well on an
old 486/33 if the last one hadn't been consigned to a museum years ago!
> E. The "precision" requirement leads me to think that I need all clients
> at a site to be receiving time from the _same_ server, whether that is
> the local server or not. How to ensure this requirement is met?
Accuracy!!! Things equal to the same thing are equal to each other. In
principle, all the atomic clocks at NIST and at national standards
laboratories around the world agree on the time to within a few
nanoseconds or maybe better. You can't hope to get that accuracy over
the internet. You can, however, keep a small herd of servers marching
to that drumbeat even if it differs by five or ten milliseconds from the
"One True Time"!
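A quick Python illustration of "things equal to the same thing are equal to each other". All the numbers are invented: one server that is 7 ms off the true time, and four hosts that each track that server to within a few microseconds.

```python
from itertools import combinations


def worst_pairwise_gap(offsets_us):
    """Largest disagreement between any two hosts, in microseconds."""
    return max(abs(a - b) for a, b in combinations(offsets_us, 2))


server_error_us = 7_000             # hypothetical: server is 7 ms off true time
host_jitter_us = [12, -8, 3, -15]   # hypothetical: how well each host tracks the server

# Each host's offset from true time is the server's error plus its own jitter.
host_offsets_us = [server_error_us + j for j in host_jitter_us]

print(worst_pairwise_gap(host_offsets_us))      # disagreement among the hosts
print(max(abs(o) for o in host_offsets_us))     # worst offset from true time
```

The server's 7 ms error is common to every host, so it cancels when the hosts are compared with each other; only the per-host jitter is left.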