[ntp:questions] Proposed NTP solution for a network

Jason bmwjason at bmwlt.com
Tue Mar 3 05:27:11 UTC 2009


Below is a description of our environment and my thoughts on a 
resilient and precise NTP configuration for it. All comments, 
suggestions, etc. are welcome, indeed requested. I am not a software 
type, rather networks and hardware, so please keep that in mind with 
comments and questions.

Here goes:

Three locations: A, B, & C. Locations A and B are datacenters, C is a 
business office with back-office processing and long-term storage.

A and B are within 10-15 miles of each other near NYC, and C is about 
1200 miles from A and B.

All three sites are interconnected in a mesh IP network with dual OC-3 
connections from each site -- the network is highly resilient although 
perhaps not as fast as we might like. A and B additionally have a GigE 
connection between them for host-host communication, database 
updates/backups, command & control, etc.

Locations A and B have a large community of SUSE 10.x Enterprise 
servers, each with a very stringent requirement to be closely "in sync" 
with the other servers at that site, as well as with those at the other 
site. Absolute accuracy (i.e. "true time") is not as important as 
"precision" (that is, all the hosts should be within a few tens of 
microseconds of each other, but they could be as much as a few hundred 
microseconds off of UTC).

Stepping the clock during prime hours (0700 - 2000) would be very bad 
for the transaction record (transactions are very time sensitive). 
Stepping is less of a concern between 2000 and 0700.
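
As a rough illustration of the alternative to stepping, the sketch 
below estimates how long a pure slew would take to remove a given 
offset. It assumes ntpd's documented maximum slew rate of 500 ppm and a 
constant-rate correction, which is a simplification of what the clock 
discipline actually does. The function name and the sample offsets are 
made up for the example.

```python
# Assumption: ntpd slews at no more than 500 parts per million.
MAX_SLEW_PPM = 500.0


def slew_seconds(offset_seconds, slew_ppm=MAX_SLEW_PPM):
    """Wall-clock seconds needed to slew away an offset at a constant rate."""
    return abs(offset_seconds) / (slew_ppm * 1e-6)


if __name__ == "__main__":
    # Illustrative offsets only: 100 us, 1 ms, and 128 ms.
    for offset in (100e-6, 1e-3, 0.128):
        print(f"{offset * 1e3:8.3f} ms offset -> "
              f"about {slew_seconds(offset):.1f} s of slewing")
```

Under these assumptions an offset of a few hundred microseconds is 
slewed away in under a second, while an offset of 128 ms would need 
roughly four minutes.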

Each client at A and B has multiple GigE connections to the LANs.

The timestamps on transactions should be traceable (i.e. we may need to 
provide to regulators information on the source, accuracy, and precision 
of the timestamp of any transaction).

Each of A, B, and C has a dedicated NTP appliance (same make and model, 
with differing manufacturing dates -- I have since learned that maybe we 
should mix up the make/model, but "one thing at a time"), with an 
integrated GPS receiver and an antenna on the roof. Each site also has 
access to the Internet.

Note that each NTP appliance can output PPS, but the hosts have no 
method to receive the PPS (blade servers in an enclosure, and all 
available expansion slots on each individual blade are in use). In 
addition, there is no provision on the enclosure to accept a PPS or 
other time source for distribution to the individual blades using a 
backplane mechanism.

Current configuration has all the A and B clients synchronizing with the 
NTP appliance at B. The NTP appliance at A has suffered an antenna 
fault, which is being repaired, but even after it is back on-line, the 
software group wants all hosts to sync to a single NTP appliance. The 
NTP appliance at C is new and not yet integrated into the solution -- 
part of the reason for this message.

From reading this newsgroup, the wiki 
(http://www.ntp.org/ntpfaq/NTP-a-faq.htm) and of course 
http://www.ntp.org/, this is what I think the hardware configuration 
should be:

1. Reference clocks: GPS receivers in the NTP appliance are Stratum 0.

2. Stratum 1 level: Each appliance has an output at Stratum 1 via the 
Ethernet connection. Each appliance should be a peer to the other 
appliances (symmetric active/passive) as discussed at 
http://www.eecis.udel.edu/~mills/ntp/html/assoc.html#symact. This would 
enable an appliance to lose its reference source and still be useful to 
the Stratum 2 servers that are clients of these appliances.

3. Stratum 2: One server at each of the three locations, each 
referencing each of the three NTP appliances. Each would also peer with 
the other two servers. This will enable the datacenters to keep the 
local hosts synchronized even if the other sites are unreachable (the 
servers at A can continue to process transactions even without 
connectivity to B and C, for example).

4. Clients: All clients at location A would sync to the local server 
(prefer) and to the server at location B. All clients at B would sync to 
the A server (prefer) and to the local server at B. All clients at 
location C would sync to their local server (prefer) and to the server 
at location A. Thus each client would have a choice of two Stratum 2 
servers, each of which is trusted and peered with the other. In 
addition, this makes the clients at A and B likely, although not 
guaranteed, to use the same server for their time.
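
The client plan in item 4 can be sketched as a small script that emits 
the two "server" lines for a client at each site, which may be handy 
with 200+ hosts. This is only a sketch: the hostnames are hypothetical 
placeholders, and the only ntp.conf syntax it relies on is the "server" 
command with the "prefer" option.

```python
# Hypothetical hostnames for the three Stratum 2 servers (placeholders).
STRATUM2 = {
    "A": "ntp2-a.example.com",
    "B": "ntp2-b.example.com",
    "C": "ntp2-c.example.com",
}

# (preferred site, second site) for clients at each location, per item 4.
CLIENT_PLAN = {
    "A": ("A", "B"),
    "B": ("A", "B"),
    "C": ("C", "A"),
}


def client_conf(site):
    """Return the ntp.conf server lines for a client at the given site."""
    preferred, second = CLIENT_PLAN[site]
    lines = [
        f"server {STRATUM2[preferred]} prefer",
        f"server {STRATUM2[second]}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for site in sorted(CLIENT_PLAN):
        print(f"# clients at location {site}")
        print(client_conf(site))
```

Clients at A and B get identical lines, which is what makes them likely 
to settle on the same server.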

Several questions:

A. Does the above architecture fit with best practices? Any suggestions 
for improvement? It seems to fit with Section 6.2.1.3 at 
http://www.ntp.org/ntpfaq/NTP-s-config-adv.htm.

B. I'm unclear where, or if, "orphan" mode should be used on the 
servers. Should it be configured at all? What will be the advantage 
either way? Oh, some more research 
(http://support.ntp.org/bin/view/Support/OrphanMode) shows that orphan 
mode is not available in the version we are running.
(
ntpdc> version
ntpdc 4.2.0a@1.1196-r Thu Jun 29 17:48:04 UTC 2006 (1)
ntpdc>
)
Is the use of orphan mode advantageous enough to justify upgrading ntpd 
on 200+ hosts?

C. This configuration cannot get past the "survivor" problem where, with 
three servers, if one fails then the other two cannot find a majority 
(see http://www.ntp.org/ntpfaq/NTP-s-algo-real.htm, section 5.3.2). So 
that leads to either trusting an Internet host, adding another receiver, 
or using a source at an interconnected sister-company in Europe. So, 
should the servers also have a trusted Internet-based time source? The 
nature of our business makes the Internet inherently untrusted for a 
number of reasons, and the need for traceable time sources is one of 
them. Recommendations?
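
To make the majority concern concrete, here is a deliberately 
simplified sketch. It treats each source as an interval of offset plus 
or minus an error bound and asks whether more than half of the 
intervals share a common point. This is not ntpd's actual selection 
code, and the offsets and bounds are invented for illustration.

```python
def max_agreeing(sources):
    """sources: list of (offset, error_bound) in seconds.

    Return the largest number of intervals [offset - bound, offset + bound]
    that share at least one common point.
    """
    events = []
    for offset, bound in sources:
        events.append((offset - bound, 0))  # open; sorts before a close at the same point
        events.append((offset + bound, 1))  # close
    best = current = 0
    for _, kind in sorted(events):
        if kind == 0:
            current += 1
            best = max(best, current)
        else:
            current -= 1
    return best


def has_majority(sources):
    """True if more than half of the sources agree with one another."""
    return max_agreeing(sources) > len(sources) / 2


if __name__ == "__main__":
    # Made-up values: two sources agree, one is half a second off.
    three = [(0.0, 0.001), (0.0002, 0.001), (0.5, 0.001)]
    # One source lost, and the two survivors disagree.
    two = [(0.0, 0.001), (0.5, 0.001)]
    print("three sources, one bad:", has_majority(three))
    print("two sources, disagreeing:", has_majority(two))
```

With three sources the bad one is outvoted; with two disagreeing 
sources there is nothing to say which one is wrong.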

D. Does it make sense, because the time precision is so important, to 
use servers for the Stratum 2 level that are unencumbered by other 
processes? Or should one of the existing 8-core blades be sufficient, 
perhaps using processor affinity for the NTP process?

E. The "precision" requirement leads me to think that I need all clients 
at a site to be receiving time from the _same_ server, whether that is 
the local server or not. How can I ensure this requirement is met?

F. I'm sure I'm forgetting some questions, and perhaps need more 
education about this, please help me to understand.

G. What have I missed, or gotten confused?

Oh, and I have Dr. Mills's book on order; it should arrive in a few days.

Thanks,

Jason.



