[ntp:questions] NTP in a Linux cluster

Lorcan lorcan.hamill at gmail.com
Tue Sep 8 12:45:19 UTC 2009


Many thanks to everyone who replied!  I'm slowly learning
my way around this stuff, and your help is much appreciated.

In my original posting, I should probably have provided a little more
information about the environment in which our systems are typically
deployed.  Our customers are telcos, which has some interesting
implications.

For one thing, they take the physical security of their systems very
seriously.  The cluster nodes will quite possibly be buried in a
bunker somewhere, with little hope of picking up a GPS signal.
Also, I'm hoping to avoid extra hardware - not really on cost grounds,
but because I would hope we could retrofit the new solution to
existing sites; tweaking config is one thing, installing new
hardware, cabling etc. is quite another.

On the plus side, our systems are typically hooked into a large
corporate LAN that will already have an in-house NTP server
(maybe more than one) that we can slave off.  We already do
that, in fact, but with less-than-optimal config, I think...

So, I'm coming to the idea that what I'm really missing may be
appropriate use of the "peer" option in /etc/ntp.conf.

Could anyone explain exactly how peers (as distinct from servers)
are used?  If two clients sync off the same server, and manage to
get to within (say) 10 ms of the server, I would expect that they
could be as much as 20 ms apart (if one is at +10 ms, and one
at -10 ms).  Would making them peers mean that they should
remain within 10 ms of each other?

I'm leaning towards having a common ntp.conf file on all nodes.
On a four-node cluster, with a single NTP server, it would look like
this:

==== start

driftfile /var/lib/ntp/drift

server the_ntp_server  minpoll 4 maxpoll 6

peer node1
peer node2
peer node3
peer node4

==== end

Is that a reasonable starting point?

Can the "peer" entries have minpoll and maxpoll?  If so, is there
any reason not to set those to low values (4? 6?)?
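From the ntp.conf documentation, peer lines appear to accept the same
minpoll/maxpoll options as server lines, so presumably the peer section
above could be written as:

==== start

peer node1  minpoll 4 maxpoll 6
peer node2  minpoll 4 maxpoll 6
peer node3  minpoll 4 maxpoll 6
peer node4  minpoll 4 maxpoll 6

==== end

(That's just my reading of the docs, though - corrections welcome.)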

(The nodes have multiple NICs; the inter-node traffic is carried
on a separate "internal" LAN, which has low latency and should
have bandwidth to spare.)

Also, I'll use the "-x" command-line flag to enforce "slewing only"
behaviour.  (If the clock on a single node is way out of sync
for some reason, we can shut it down and "jump" the clock when
restarting it.)
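The plan is just to add "-x" to the daemon's startup options rather
than run ntpd by hand; on our Red Hat-style systems that would look
something like this (the path and variable name vary by distro -
Debian-style systems use NTPD_OPTS in /etc/default/ntp):

==== start

# /etc/sysconfig/ntpd
OPTIONS="-x"

==== end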

If there are multiple NTP servers available on the LAN, is it possible
that our particular requirements (inter-node synchronisation being
more important than accurate timekeeping) mean that we're better
off just using one of them?  Or should we use several of them,
and use the "prefer" option to encourage all nodes to attach more
significance to the same one?
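If the "prefer" route is the right one, I imagine the server section
of the common ntp.conf would become something like this (server names
here are placeholders, of course):

==== start

server ntp_server_a  prefer minpoll 4 maxpoll 6
server ntp_server_b  minpoll 4 maxpoll 6
server ntp_server_c  minpoll 4 maxpoll 6

==== end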

I'm worried that with multiple servers, different nodes may end up
more influenced by different servers.  I take Dave's point about
the weighted average of all survivors, but I wonder what happens
when new nodes are added, or nodes are rebooted at different
times, etc.  Is it possible that different nodes, which have been
monitoring the same servers but over different time periods, would
draw different conclusions about the servers, and therefore drift
apart?

As before, any insights would be welcome.  I'm about to start
testing some of these ideas on a test cluster; I'll post any
interesting results that I get.

Regards,
Lorcan Hamill
