[ntp:questions] Configuring a cluster: ntpd choosing local clock over server with lower stratum (and cycling between them)
dromedaryl at yahoo.com
Thu Apr 5 00:28:58 UTC 2007
I'm trying to configure NTP for a cluster. The cluster has a "master"
node and all nodes know which node is master. My NTP configuration is
to have the master node the only node that contacts an external NTP
server, with the other nodes using the master as their time server.
The master node's ntp.conf:
server 10.54.141.76 # A time server on another subnet
server 127.127.1.1
fudge 127.127.1.1 stratum 10
driftfile /etc/ntp.drift
The non-master nodes' ntp.conf:
server 192.168.140.40 # The master node
server 127.127.1.1
fudge 127.127.1.1 stratum 15
driftfile /etc/ntp.drift
So the master node uses an external time server and can also use
its internal clock as a stratum 10 time server.
The non-master nodes use the master node as their time server and can
use their internal clock as a stratum 15 time server.
By setting the master node's system clock to stratum 10, and the
non-master nodes' system clocks to stratum 15, I would expect the
master node always to be a lower stratum time server than the other
nodes, whether or not the master node is able to maintain a connection
to its external time server.
This worked as expected most of the time. I used a simple two node
cluster for testing, with both nodes in the same subnet, and on the
same switch.
However, on occasion during the first few hours ntpd was running on
the nodes, the non-master node would use its system clock as the
system peer even though it had a higher stratum level than the master
node. Over the course of an hour the non-master node would switch
between the master node and its own system clock as the system peer.
Here's some output from ntpq run on the non-master node (node-2):
chil43-2# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*node-1          10.54.141.76     3 u   92  128  377    0.089  472.785 109.277
 LOCAL(1)        LOCAL(1)        15 l   18   64  377    0.000    0.000   0.002
... And a few minutes later ...
chil43-2# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 node-1          10.54.141.76     3 u   75  256   17    0.085  368.154  35.584
*LOCAL(1)        LOCAL(1)        15 l   11   64   77    0.000    0.000   0.002
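For reading the reach column in the two listings: ntpq prints reach as an 8-bit shift register in octal, with the most recent poll in the lowest bit. The following is only a small sketch for decoding the values seen above; the function name decode_reach is made up for illustration and is not part of any NTP tool.

```python
def decode_reach(reach_octal):
    """Decode ntpq's octal reach value into the last eight poll results.

    Returns a list of booleans, most recent poll first.
    """
    value = int(reach_octal, 8) & 0xFF
    return [bool((value >> i) & 1) for i in range(8)]


if __name__ == "__main__":
    # Reach values taken from the two ntpq listings above.
    for label, reach in [("first listing", "377"),
                         ("node-1, later", "17"),
                         ("LOCAL(1), later", "77")]:
        history = decode_reach(reach)
        bits = "".join("1" if ok else "0" for ok in history)
        print(label, reach, bits, sum(history))
    # first listing 377 11111111 8
    # node-1, later 17 11110000 4
    # LOCAL(1), later 77 11111100 6
```

Read this way, 377 means all of the last eight polls were answered, while 17 and 77 mean only the last four and the last six polls, respectively, are recorded as answered.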
Eventually, things would settle down and the master node would remain
the system peer for the non-master node continuously.
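To record when the system peer flips, the ntpq -p output can be scanned for the line tagged with "*", which is how ntpq marks the current system peer. This is a rough sketch only: the helper name system_peer is hypothetical, and the sample text is copied from the second listing above rather than read from a live ntpq.

```python
# Sample copied from the second "ntpq -p" listing in this post.
SAMPLE = """\
remote refid st t when poll reach delay offset jitter
======================================================================
 node-1 10.54.141.76 3 u 75 256 17 0.085 368.154 35.584
*LOCAL(1) LOCAL(1) 15 l 11 64 77 0.000 0.000 0.002
"""


def system_peer(ntpq_output):
    """Return (remote, stratum) for the peer tagged '*', or None if absent."""
    for line in ntpq_output.splitlines()[2:]:  # skip the two header lines
        if line.startswith("*"):
            fields = line[1:].split()
            # Columns: remote, refid, st, t, when, poll, reach, ...
            return fields[0], int(fields[2])
    return None


if __name__ == "__main__":
    print(system_peer(SAMPLE))  # ('LOCAL(1)', 15)
```

Run periodically against real ntpq -p output (for example from cron), this would give a timeline of which source the non-master node had selected.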
A few questions:
1) Why would ntpd choose to use its higher stratum system clock rather
than a lower stratum server and why would ntpd cycle between them?
2) In the non-master node's ntp.conf, should I just remove the listing
of its system clock as a server? Is it completely unnecessary as the
non-master node is only a client?
3) In general, is the way I'm setting up the ntp.confs for the nodes
on the cluster reasonable?
(Note: there's been no master node change during the testing. node-1
has always remained master. And I have scripts that will reconfigure
ntp.conf on each node if the master changes.)
Many thanks.
DD