[ntp:questions] Configuring a cluster: ntpd choosing local clock over server with lower stratum (and cycling between them)
Tom Smith
smith at cag.zko.hp.com
Thu Apr 5 02:18:32 UTC 2007
dromedaryl at yahoo.com wrote:
> I'm trying to configure NTP for a cluster. The cluster has a "master"
> node and all nodes know which node is master. My NTP configuration is
> to have the master node the only node that contacts an external NTP
> server, with the other nodes using the master as their time server.
>
> The master node's ntp.conf:
>
> server 10.54.141.76 # A time server on another subnet
> server 127.127.1.1
> fudge 127.127.1.1 stratum 10
> driftfile /etc/ntp.drift
>
> The non-master nodes' ntp.conf:
>
> server 192.168.140.40 # The master node
> server 127.127.1.1
> fudge 127.127.1.1 stratum 15
> driftfile /etc/ntp.drift
>
> So, the master node uses an external time server, and can also use
> its internal clock as a stratum 10 time server.
>
> The non-master nodes use the master node as their time server and can
> use their internal clock as a stratum 15 time server.
>
> By setting the master node's system clock to stratum 10, and the non-
> master nodes' system clocks to stratum 15, I would expect that the
> master node would always be a lower stratum time server than the other
> nodes no matter if the master node is able to maintain a connection to
> its external time server or not.
>
> This worked as expected most of the time. I used a simple two node
> cluster for testing, with both nodes in the same subnet, and on the
> same switch.
>
> However, on occasion during the first few hours ntpd was running on
> the nodes, the non-master node would use its system clock as the
> system peer even though it had a higher stratum level than the master
> node. Over the course of an hour the non-master node would switch
> between using the master node and its system clock as the system peer.
>
> Here's some output from ntpq run on the non-master node (node-2):
>
> chil43-2# ntpq -p
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> *node-1          10.54.141.76     3 u   92  128  377    0.089  472.785 109.277
>  LOCAL(1)        LOCAL(1)        15 l   18   64  377    0.000    0.000   0.002
>
> ... And a few minutes later ...
>
> chil43-2# ntpq -p
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
>  node-1          10.54.141.76     3 u   75  256   17    0.085  368.154  35.584
> *LOCAL(1)        LOCAL(1)        15 l   11   64   77    0.000    0.000   0.002
>
> Eventually, things would settle down and the master node would remain
> the system peer for the non-master node continuously.
>
> A few questions:
> 1) Why would ntpd choose to use its higher stratum system clock rather
> than a lower stratum server and why would ntpd cycle between them?
Because stratum is a relatively low-priority criterion in ntpd's choice of
system peer. Note, for example, the far lower jitter on your local clock
(0.002 vs. 109.277 in your first ntpq output), the smaller delay, the
smaller offset, etc.
>
> 2) In the non-master node's ntp.conf, should I just remove the listing
> of its system clock as a server? Is it completely unnecessary as the
> non-master node is only a client?
Yes.
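A minimal sketch of what the non-master nodes' ntp.conf might look like
with the local clock dropped, reusing the master address and driftfile
path from your own configuration:

```
server 192.168.140.40   # The master node
driftfile /etc/ntp.drift
```

With no local clock listed, the node has no second candidate to flip to
while the master's offset and jitter are still settling.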
>
> 3) In general, is the way I'm setting up the ntp.confs for the nodes
> on the cluster reasonable?
No.
>
> (Note: there's been no master node change during the testing. node-1
> has always remained master. And I have scripts that will reconfigure
> ntp.conf on each node if the master changes.)
>
> Many thanks.
>
> DD
>
Your problem occurs because you have only 2 servers and one of them is
phony. How would ntpd know which one to believe - a correct local clock
and a misbehaving remote one or a correct remote one and a wrong local
one?
Either get rid of the local undisciplined clock or add at least
2 more real ones (which is recommended practice anyway).
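As a rough sketch of the second option, a non-master node that keeps its
local clock might list the master plus two more real servers. The two
192.0.2.x addresses below are hypothetical placeholders, not real time
servers; substitute whatever additional servers the node can actually
reach:

```
server 192.168.140.40   # The master node
server 192.0.2.10       # hypothetical second real server (placeholder address)
server 192.0.2.11       # hypothetical third real server (placeholder address)
server 127.127.1.1      # local clock, kept only as a fallback
fudge 127.127.1.1 stratum 15
driftfile /etc/ntp.drift
```

With three real servers the node can compare them against each other
instead of weighing one real server against its own undisciplined clock.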
-Tom