[ntp:questions] Configuring a cluster: ntpd choosing local clock over server with lower stratum (and cycling between them)

dromedaryl at yahoo.com
Thu Apr 5 00:28:58 UTC 2007


I'm trying to configure NTP for a cluster. The cluster has a "master"
node and all nodes know which node is master. My plan is for the
master node to be the only node that contacts an external NTP server,
with the other nodes using the master as their time server.

The master node's ntp.conf:

    server 10.54.141.76    # A time server on another subnet
    server 127.127.1.1
    fudge  127.127.1.1 stratum 10
    driftfile /etc/ntp.drift

The non-master nodes' ntp.conf:

    server 192.168.140.40   # The master node
    server 127.127.1.1
    fudge  127.127.1.1 stratum 15
    driftfile /etc/ntp.drift

So the master node uses an external time server and can also use
its internal clock as a stratum 10 time server.

The non-master nodes use the master node as their time server and can
use their internal clocks as stratum 15 time servers.

By setting the master node's system clock to stratum 10 and the
non-master nodes' system clocks to stratum 15, I would expect the
master node to always be a lower-stratum time server than the other
nodes, regardless of whether the master node is able to maintain a
connection to its external time server.

This worked as expected most of the time. I used a simple two node
cluster for testing, with both nodes in the same subnet, and on the
same switch.

However, on occasion during the first few hours that ntpd was running
on the nodes, the non-master node would select its system clock as the
system peer even though that clock had a higher stratum than the master
node. Over the course of an hour the non-master node would switch back
and forth between the master node and its system clock as the system
peer.

Here's some output from ntpq run on the non-master node (node-2):

chil43-2# ntpq -p
remote        refid      st t when poll reach   delay   offset  jitter
======================================================================
*node-1    10.54.141.76   3 u   92  128  377    0.089  472.785 109.277
 LOCAL(1)  LOCAL(1)      15 l   18   64  377    0.000    0.000   0.002

... And a few minutes later ...

chil43-2# ntpq -p
remote       refid       st t when poll reach   delay   offset  jitter
======================================================================
 node-1    10.54.141.76   3 u   75  256   17    0.085  368.154  35.584
*LOCAL(1)  LOCAL(1)      15 l   11   64   77    0.000    0.000   0.002

Eventually, things would settle down and the master node would remain
the system peer for the non-master node continuously.

A few questions:
1) Why would ntpd choose to use its higher stratum system clock rather
than a lower stratum server and why would ntpd cycle between them?

2) In the non-master node's ntp.conf, should I just remove the listing
of its system clock as a server? Is it completely unnecessary as the
non-master node is only a client?

3) In general, is the way I'm setting up the ntp.confs for the nodes
on the cluster reasonable?

(Note: there's been no master node change during the testing. node-1
has always remained master. And I have scripts that will reconfigure
ntp.conf on each node if the master changes.)
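
The reconfiguration scripts themselves are not shown here. As a rough
illustration only, the two ntp.conf variants listed earlier could be
generated from a node's role along these lines. The function name
render_ntp_conf and its parameters are hypothetical, and writing the
file and restarting ntpd are deliberately left out.

```python
def render_ntp_conf(is_master, time_server, local_stratum=None):
    """Return ntp.conf text for a master or non-master node.

    time_server is the external server for the master, or the master's
    address for every other node. The local clock stratum defaults to
    the values used in this post: 10 on the master, 15 elsewhere.
    """
    if local_stratum is None:
        local_stratum = 10 if is_master else 15
    if is_master:
        comment = "A time server on another subnet"
    else:
        comment = "The master node"
    lines = [
        f"server {time_server}    # {comment}",
        "server 127.127.1.1",
        f"fudge  127.127.1.1 stratum {local_stratum}",
        "driftfile /etc/ntp.drift",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Addresses taken from the configurations listed earlier.
    print(render_ntp_conf(True, "10.54.141.76"))
    print(render_ntp_conf(False, "192.168.140.40"))
```

Keeping both variants in one function makes it harder for the master
and non-master files to drift apart when the master role moves.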

Many thanks.

DD



