[ntp:questions] Configuring a cluster: ntpd choosing local clock over server with lower stratum (and cycling between them)

Tom Smith smith at cag.zko.hp.com
Thu Apr 5 02:18:32 UTC 2007


dromedaryl at yahoo.com wrote:
> I'm trying to configure NTP for a cluster. The cluster has a "master"
> node and all nodes know which node is master. My NTP configuration is
> to have the master node the only node that contacts an external NTP
> server, with the other nodes using the master as their time server.
> 
> The master node's ntp.conf:
> 
>     server 10.54.141.76    # A time server on another subnet
>     server 127.127.1.1
>     fudge  127.127.1.1 stratum 10
>     driftfile /etc/ntp.drift
> 
> The non-master nodes' ntp.conf:
> 
>     server 192.168.140.40   # The master node
>     server 127.127.1.1
>     fudge  127.127.1.1 stratum 15
>     driftfile /etc/ntp.drift
> 
> So, the master node uses an external time server, and can also use
> its internal clock as a stratum 10 time server.
> 
> The non-master nodes use the master node as their time server and can
> use their internal clock as a stratum 15 time server.
> 
> By setting the master node's system clock to stratum 10, and the non-
> master nodes' system clocks to stratum 15, I would expect that the
> master node would always be a lower stratum time server than the other
> nodes no matter if the master node is able to maintain a connection to
> its external time server or not.
> 
> This worked as expected most of the time. I used a simple two node
> cluster for testing, with both nodes in the same subnet, and on the
> same switch.
> 
> However, on occasion during the first few hours ntpd was running on
> the nodes, the non-master node would use its system clock as the
> system peer even though it had a higher stratum level than the master
> node. Over the course of an hour the non-master node would switch
> between using the master node or its system clock as the system peer.
> 
> Here's some output from ntpq run on the non-master node (node-2):
> 
> chil43-2# ntpq -p
> remote        refid      st t when poll reach   delay   offset  jitter
> ======================================================================
> *node-1    10.54.141.76   3 u   92  128  377    0.089  472.785 109.277
>  LOCAL(1)  LOCAL(1)      15 l   18   64  377    0.000    0.000   0.002
> 
> ... And a few minutes later ...
> 
> chil43-2# ntpq -p
> remote       refid       st t when poll reach   delay   offset  jitter
> ======================================================================
>  node-1    10.54.141.76   3 u   75  256   17    0.085  368.154  35.584
> *LOCAL(1)  LOCAL(1)      15 l   11   64   77    0.000    0.000   0.002
> 
> Eventually, things would settle down and the master node would remain
> the system peer for the non-master node continuously.
> 
> A few questions:
> 1) Why would ntpd choose to use its higher stratum system clock rather
> than a lower stratum server and why would ntpd cycle between them?

Because stratum is a relatively low priority criterion. Note, for example,
the far lower jitter on your local clock, the smaller delay, smaller offset,
etc.

> 
> 2) In the non-master node's ntp.conf, should I just remove the listing
> of its system clock as a server? Is it completely unnecessary as the
> non-master node is only a client?

Yes.
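
As a rough sketch, the non-master nodes' ntp.conf with the local clock
entry dropped would look something like this. It is just the file quoted
above minus the two 127.127.1.1 lines; nothing else is changed or assumed:

```
# Non-master node: sync only to the master, no local clock fallback
server 192.168.140.40   # The master node
driftfile /etc/ntp.drift
```

With a single server listed, there is no second candidate for ntpd to
switch to on these nodes.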

> 
> 3) In general, is the way I'm setting up the ntp.confs for the nodes
> on the cluster reasonable?

No.

> 
> (Note: there's been no master node change during the testing. node-1
> has always remained master. And I have scripts that will reconfigure
> ntp.conf on each node if the master changes.)
> 
> Many thanks.
> 
> DD
> 

Your problem occurs because you have only 2 servers and one of them is
phoney. How would ntpd know which one to believe - a correct local clock
and a misbehaving remote one, or a correct remote one and a wrong local
one?

Either get rid of the local undisciplined clock or add at least
2 more real ones (which is recommended practice anyway).
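
For the second option, a sketch of what the master node's ntp.conf might
look like is below. This assumes the extra servers go on the master, since
that is the only node contacting outside servers in your setup. The two
192.0.2.x addresses are placeholders from the range reserved for
documentation, not real time servers; substitute servers you actually
have access to:

```
# Master node: three real servers, so a bad one can be outvoted
server 10.54.141.76    # The existing time server on another subnet
server 192.0.2.10      # Placeholder: a second real time server
server 192.0.2.11      # Placeholder: a third real time server
server 127.127.1.1     # Local clock, kept only as a last resort
fudge  127.127.1.1 stratum 10
driftfile /etc/ntp.drift
```

With several independent real servers, ntpd has enough sources to compare
against each other instead of having to pick between one remote server and
its own undisciplined clock.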

-Tom



