[ntp:questions] question regarding NTP configuration for clusters, and "cluster time" stability

Unruh unruh-spam at physics.ubc.ca
Mon Sep 14 21:42:08 UTC 2009


"rotordyn at yahoo.com" <rotordyn1 at gmail.com> writes:

a) Get one or more Garmin 18xLVC GPS receivers and set them up on a few
of your nodes. Their time will then be within a few usec of UTC. Use
those nodes as the servers to the rest of your network.

You do need a view of the sky.
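As an illustration of option (a), here is a minimal sketch that emits the ntp.conf lines for a node with a receiver attached. It assumes ntpd's NMEA reference-clock driver (type 20, pseudo-address 127.127.20.<unit>, reading /dev/gps<unit>) and that ntpd and the kernel were built with PPS (PPSAPI) support; the helper name gps_node_conf is hypothetical.

```python
def gps_node_conf(unit: int = 0, use_pps: bool = True) -> str:
    """Return ntp.conf lines for a node with a Garmin 18x LVC attached.

    Assumes ntpd's NMEA driver (type 20), which reads /dev/gps<unit>
    and is addressed as the pseudo-IP 127.127.20.<unit>.
    """
    addr = f"127.127.20.{unit}"
    # Poll the local refclock every 16 s and prefer it over network sources.
    lines = [f"server {addr} minpoll 4 maxpoll 4 prefer"]
    if use_pps:
        # flag1 1 asks the NMEA driver to use the receiver's PPS output
        # (assumes PPSAPI support in both ntpd and the kernel).
        lines.append(f"fudge {addr} flag1 1")
    return "\n".join(lines) + "\n"


print(gps_node_conf(), end="")
```

The remaining nodes would then simply list the GPS-equipped nodes in ordinary `server` lines.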

b) Set up one node to be the master and have it sync to the outside
world. One node can never disagree with itself. On the other hand, one
node could die (e.g. due to someone tripping over it and pulling the
plug).

c) Have an external program keep track of the consistency of your
servers, and send you a warning if they disagree with each other by
more than a few ms.
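A minimal sketch of the check in (c), assuming the monitoring host already has each server's offset relative to its own clock in milliseconds (for example from the offset column of `ntpq -p`; collecting and parsing those numbers is not shown). Because every offset is measured from the same host, the monitor's own clock error cancels when the offsets are differenced. The function names and the 5 ms default threshold are hypothetical.

```python
from __future__ import annotations


def offset_spread_ms(offsets_ms: dict[str, float]) -> float:
    """Largest disagreement between any two servers, in ms."""
    if len(offsets_ms) < 2:
        return 0.0
    values = offsets_ms.values()
    # max - min equals the largest pairwise difference.
    return max(values) - min(values)


def disagreement_warning(offsets_ms: dict[str, float],
                         threshold_ms: float = 5.0) -> str | None:
    """Return a warning string if the servers disagree too much, else None."""
    spread = offset_spread_ms(offsets_ms)
    if spread <= threshold_ms:
        return None
    fastest = max(offsets_ms, key=offsets_ms.get)
    slowest = min(offsets_ms, key=offsets_ms.get)
    return (f"WARNING: servers disagree by {spread:.1f} ms "
            f"({fastest} vs {slowest}; threshold {threshold_ms} ms)")


print(disagreement_warning({"gw1": 0.4, "gw2": -0.3, "gw3": 0.1}))  # None
print(disagreement_warning({"gw1": 0.4, "gw2": 12.0}))
```

Run from cron or a loop, the second call's message is what would be mailed to an operator.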



>I have a question that seems somewhat similar to one that was just
>asked, but there are a couple of differences, so I figured I'd ask
>mine as well. Apologies for the long post, but I'm trying to skip the
>"more info please" phase. :-)

>I have a product that consists of a cluster of Linux nodes, ranging in
>size from 4 to over 100 nodes. To date, we've used the version of NTP
>included in the OS (SLES 10) to maintain internal time synchronization
>in the cluster, but without associations to any external NTP servers
>or any hardware-based time sources. While this has worked
>satisfactorily, it does allow a gradual drift from UTC over time, so
>we'd like to extend the product to eliminate this.

>In terms of requirements, this means we must still maintain a stable
>internal "cluster time" with sub-second tolerance. This should be
>trivial for NTP, as that is a rather loose tolerance compared to many
>others I've seen discussed. The requirement to match true UTC is even
>looser, as all we're trying to do is use an external reference to
>stop what can be a perpetual drift. Just to give it a number, though,
>let's say we'd like to be within 60 seconds of UTC.

>The topology of our cluster has two tiers. All of the nodes are
>interconnected over a private network, and some subset of the nodes
>also have external connections to the LAN where it is deployed. The
>subset is always at least 2 nodes, and can be as high as 25% of the
>total number of nodes.

>Before we extended the product to allow the use of one or more NTP
>servers external to the cluster, the nodes with external connections
>were configured as peers of one another and as servers to the internal
>cluster, with all other nodes pure clients.

>After adding support for external NTP servers, we kept roughly the
>same configuration: the nodes with external connections were still
>servers to the internal network and peers of each other, but now they
>were also clients of one or more external servers. I understand that
>requiring three or more external servers would be better, and we can
>do that, but we still have to keep the internal cluster time stable
>even if only a reduced set of servers (including the empty set) is
>reachable.

>Our configuration did not work: we were able to cause instability in
>the internal cluster time by perturbing the external server, and we
>have to guarantee stability even with bad inputs.

>What happened was that some (but not all) of the externally connected
>nodes deemed the external server a falseticker and stopped believing
>it, while the other externally connected nodes did not. As a result,
>the time on the members of this group diverged. It is this divergence
>that I mean when I speak of a lack of stability.
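Why the gateways can disagree can be seen with a simplified, brute-force version of Marzullo's interval-intersection idea, which ntpd's selection step builds on (ntpd's real algorithm has more stages and differs in detail). Each source is treated as an interval of offset plus or minus its root distance; the largest group of overlapping intervals are "truechimers", the rest "falsetickers", and, roughly as in ntpd, nothing is trusted without a strict majority. The function name and the offsets below are hypothetical.

```python
from __future__ import annotations


def select_truechimers(sources: dict[str, tuple[float, float]]) -> set[str]:
    """sources maps name -> (offset_s, root_distance_s)."""
    intervals = {n: (off - dist, off + dist)
                 for n, (off, dist) in sources.items()}
    best_point, best_count = None, 0
    # The point covered by the most intervals can always be taken at
    # some interval's lower edge, so only those edges are tried.
    for lo, _ in intervals.values():
        count = sum(1 for a, b in intervals.values() if a <= lo <= b)
        if count > best_count:
            best_point, best_count = lo, count
    # Without a strict majority in agreement, nothing is selected.
    if best_count * 2 <= len(intervals):
        return set()
    return {n for n, (a, b) in intervals.items() if a <= best_point <= b}


# Gateway A can still reach its peers B and C, so external server X,
# perturbed to +3 s, is outvoted and rejected as a falseticker.
print(select_truechimers({"X": (3.0, 0.02),
                          "B": (0.001, 0.05),
                          "C": (-0.002, 0.05)}))
# Gateway D has lost its peers and sees only X, so it follows X.
print(select_truechimers({"X": (3.0, 0.02)}))
```

Two gateways with different sets of reachable sources reach opposite verdicts on the same external server, which is exactly the divergence described above.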

>So before I go into configuration details: is there a known "best
>way" to handle the sort of requirements I described? It sounds like
>orphan mode might provide the functionality I'm looking for, but I
>figured that in parallel with empirical experimentation, I'd pursue
>the analytical approach and ask people who know more than me. :)
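For the orphan-mode question, here is a minimal sketch of how the two-tier layout described above might be rendered into per-node ntp.conf text. It assumes an ntpd recent enough to support the `tos orphan` directive (worth checking against the version bundled with the OS). The helper name, host names, and the orphan stratum of 10 are hypothetical choices; the stratum only needs to sit well above that of the external servers.

```python
from __future__ import annotations

ORPHAN_STRATUM = 10  # assumption: any stratum well above the external servers'


def ntp_conf(node: str, gateways: list[str], external: list[str]) -> str:
    """Return ntp.conf lines for one cluster node (hypothetical layout)."""
    lines: list[str] = []
    if node in gateways:
        # Gateway: client of the external servers, symmetric peer of the
        # other gateways, and orphan-capable server for the cluster.
        lines += [f"server {s} iburst" for s in external]
        lines += [f"peer {g}" for g in gateways if g != node]
        lines.append(f"tos orphan {ORPHAN_STRATUM}")
    else:
        # Internal node: plain client of every gateway.
        lines += [f"server {g} iburst" for g in gateways]
    return "\n".join(lines) + "\n"


print(ntp_conf("gw1", ["gw1", "gw2"], ["ntp.example.com"]), end="")
print(ntp_conf("node7", ["gw1", "gw2"], ["ntp.example.com"]), end="")
```

The intent of the design is that when no external server is reachable, the peered gateways keep serving a common time among themselves at the orphan stratum instead of each drifting on its own.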

>thanks,
>Tim



