[ntp:questions] Why false tickers one day, and not the next day?

dromedaryl at yahoo.com
Mon Dec 17 23:52:25 UTC 2007


I'm trying to determine why ntpd marks all the servers it contacts as
false tickers one day, and the next day everything is okay. The
explanation is at the top of this posting, with the output of various
"ntpq -p" calls below.

The setup is a two-node FreeBSD cluster, node-A and node-B. They are
on the same subnet and switch.

Node-A's ntp.conf:
    server 210.173.160.27  # external NTP server
    server node-B iburst   # other node
    server 127.127.1.1
    fudge  127.127.1.1 stratum 9
    driftfile /etc/ntp.drift

Node-B's ntp.conf:
    server 210.173.160.27  # external NTP server
    server node-A iburst   # other node
    server 127.127.1.1
    fudge  127.127.1.1 stratum 11
    driftfile /etc/ntp.drift

The idea is that as the cluster expands, node-A and node-B will be
time servers for the new nodes. Node-A's local clock has a lower
stratum value than node-B's so in the case that the cluster loses
connection to the external server, node-A is the preferred chimer for
the cluster. If node-A loses its connection to the external server
(but not to node-B), node-A will use node-B as its server, and vice
versa.
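
As an illustration of the stratum arrangement described above, here is a
minimal Python sketch. It assumes only the usual NTP rule that a server
advertises the stratum of its selected source plus one; it is not ntpd's
actual selection code, and the function name is made up for this example.
The result agrees with the "st" column in the ntpq output further down,
where node-A shows up as stratum 10 and node-B as stratum 12 while each
is synced to its own LOCAL(1) clock.

```python
# Sketch: stratum a node advertises while synced to its fudged local clock.
# Assumption: advertised stratum = stratum of the selected source + 1.

def advertised_stratum(source_stratum: int) -> int:
    """Stratum that peers see for a node synced to a source at source_stratum."""
    return source_stratum + 1

# Fudged LOCAL(1) strata taken from the two ntp.conf files above.
local_clock_stratum = {"node-A": 9, "node-B": 11}

for node, stratum in local_clock_stratum.items():
    seen = advertised_stratum(stratum)
    print(f"{node}: LOCAL(1) at stratum {stratum}, seen by peers as stratum {seen}")
```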

What's happening is that things go as expected for a short time with
node-A and node-B using the external time server as their system peer,
and using each other as candidate peers.

But within a few minutes, the external time server gets marked as a
false ticker by both nodes, and both nodes mark each other as false
tickers.

There is nothing logged by ntpd.

The nodes' drifts are high:
# cat /etc/ntp.drift
node-A: 499.206
node-B: 497.070
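
To put those drift values in perspective, the short Python sketch below
converts a drift in parts per million into seconds gained or lost per
day. It is plain arithmetic on the numbers from this post; the function
name is invented for the example. For context, 500 ppm is commonly
documented as the largest frequency correction ntpd will apply, so
values this close to 500 suggest the correction is at its limit.

```python
# Sketch: convert a drift-file value (ppm) into clock error per day.
SECONDS_PER_DAY = 86_400

def drift_seconds_per_day(ppm: float) -> float:
    """Seconds of error accumulated per day at the given frequency offset."""
    return ppm * 1e-6 * SECONDS_PER_DAY

# Drift values reported above, plus the "under 30.0" seen on the good day.
for label, ppm in {"node-A": 499.206, "node-B": 497.070, "good day": 30.0}.items():
    print(f"{label}: {ppm} ppm is about {drift_seconds_per_day(ppm):.1f} s/day")
```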

The nodes and the external time server are in Asia. I have an
identically set up cluster in North America using the same Asian time
server, and that cluster has no problem keeping the Asian server as a
peer, despite having a delay of about 120 msecs, nearly 100 times
higher than the Asian cluster's delay to the time server.

The next day, after restarting ntpd on the nodes and resetting
the time on all nodes with ntpdate, everything worked as
expected: the time synced properly, there were no false tickers, and
the nodes' drifts were under 30.0. No network changes were made.

Any idea on what's going on here? What would cause all the servers to
be marked as false tickers, and then be fine the next day? Is there
a way to configure ntpd so this won't happen?

Here's the output of a number of sequential calls to "ntpq -p":

Just after starting up ntpd:

node-A:      remote           refid      st t when poll reach   delay   offset  jitter
node-A: ==============================================================================
node-A:  210.173.160.27  210.173.176.251  2 u  112  256   17    1.447   -2.810  78.540
node-A:  node-B          .INIT.          16 u  273  512    0    0.000    0.000 4000.00
node-A: *LOCAL(1)        LOCAL(1)         9 l   33   64   77    0.000    0.000   0.002
node-B: ==============================================================================
node-B:  210.173.160.27  210.173.176.251  2 u  101  256   17    1.358   -4.869  76.916
node-B:  node-A          .INIT.          16 u  265  512    0    0.000    0.000 4000.00
node-B: *LOCAL(1)        LOCAL(1)        11 l   36   64   77    0.000    0.000   0.002
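
A note on reading the "reach" column in the output above: it is an 8-bit
shift register printed in octal, with one bit per poll and the most
recent poll in the lowest bit. The Python sketch below decodes it; the
helper name is made up for this example.

```python
# Sketch: decode the octal "reach" value printed by "ntpq -p".

def decode_reach(reach_octal: str) -> list:
    """Return the last 8 poll results, most recent first (True = reply received)."""
    value = int(reach_octal, 8)
    return [bool((value >> bit) & 1) for bit in range(8)]

# Reach values that appear in the ntpq output in this post.
for reach in ("0", "1", "17", "77", "377"):
    answered = sum(decode_reach(reach))
    print(f"reach {reach:>3}: {answered} of the last 8 polls answered")
```

So a reach of 17 means the last four polls were answered, and 377 means
all of the last eight were.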

A few minutes later, node-A has the external server as its system peer
and node-B as a candidate peer. But node-B marks the external server
as a false ticker and uses node-A as its system peer:

node-A:      remote           refid      st t when poll reach   delay   offset  jitter
node-A: ==============================================================================
node-A: *210.173.160.27  210.173.176.251  2 u  290  512   37    1.447   -2.810 137.124
node-A: +node-B          LOCAL(1)        12 u  181 1024    1    0.109  -12.652   0.128
node-A:  LOCAL(1)        LOCAL(1)         9 l   14   64  377    0.000    0.000   0.002
node-B: ==============================================================================
node-B: x210.173.160.27  210.173.176.251  2 u  278  512   37    1.358   -4.869 133.904
node-B: *node-A          LOCAL(1)        10 u  172 1024    1    0.113   12.824   0.099
node-B:  LOCAL(1)        LOCAL(1)        11 l   16   64  377    0.000    0.000   0.002

A few minutes later. This looks great as it's what's expected:

node-A:      remote           refid      st t when poll reach   delay   offset  jitter
node-A: ==============================================================================
node-A: *210.173.160.27  210.173.176.251  2 u  492  512   37    1.447   -2.810 137.124
node-A: +node-B          LOCAL(1)        12 u  383 1024    1    0.109  -12.652   0.128
node-A:  LOCAL(1)        LOCAL(1)         9 l   24   64  377    0.000    0.000   0.002
node-B: ==============================================================================
node-B: *210.173.160.27  210.173.176.251  2 u  480  512   37    1.358   -4.869 133.904
node-B: +node-A          LOCAL(1)        10 u  374 1024    1    0.113   12.824   0.099
node-B:  LOCAL(1)        LOCAL(1)        11 l   24   64  377    0.000    0.000   0.002

A few minutes later everything becomes a false ticker. The offset to
the external server has increased dramatically:

node-A:      remote           refid      st t when poll reach   delay   offset  jitter
node-A: ==============================================================================
node-A: x210.173.160.27  210.173.176.251  2 u   28   64  377    1.455  -572.95 354.508
node-A: xnode-B          LOCAL(1)        12 u  624 1024    1    0.109  -12.652   0.128
node-A: *LOCAL(1)        LOCAL(1)         9 l    2   64  377    0.000    0.000   0.002
node-B: ==============================================================================
node-B: x210.173.160.27  210.173.176.251  2 u   15   64  377    1.529  -561.00 345.641
node-B: xnode-A          LOCAL(1)        10 u  614 1024    1    0.113   12.824   0.099
node-B: *LOCAL(1)        LOCAL(1)        11 l    9   64  377    0.000    0.000   0.002

It also appears that when node-B polls the external server and decides
to mark it as a false ticker, it also changes node-A from a candidate
to a false ticker, despite not polling it.

node-A:      remote           refid      st t when poll reach   delay   offset  jitter
node-A: ==============================================================================
node-A: x210.173.160.27  210.173.176.251  2 u   32   64  377    1.455  -572.95 134.136
node-A: xnode-B          LOCAL(1)        12 u   52   64    3    0.109  -12.652   4.786
node-A: *LOCAL(1)        LOCAL(1)         9 l   66   64  377    0.000    0.000   0.002
node-B: ==============================================================================
node-B: x210.173.160.27  210.173.176.251  2 u   14   64  377    1.529  -561.00 130.750
node-B: xnode-A          LOCAL(1)        10 u   41   64    3    0.113   12.824   4.833
node-B: *LOCAL(1)        LOCAL(1)        11 l   10   64  377    0.000    0.000   0.002
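
For anyone who wants to track these state changes over time rather than
eyeballing them, here is a rough Python sketch that parses rows like the
ones above and reports the tally code in the first column. It assumes
each row is on a single unwrapped line with the "node-X: " prefix used
in this post (that prefix is from my cluster tooling, not from ntpq
itself), and the function and dictionary names are made up for the
example. The sample rows are node-A's rows from the last listing.

```python
# Sketch: classify "ntpq -p" rows by the tally code in the first column.
TALLY_MEANING = {
    " ": "rejected",
    "x": "falseticker",
    ".": "excess",
    "-": "outlier",
    "+": "candidate",
    "#": "selected (backup)",
    "*": "system peer",
    "o": "PPS peer",
}

def parse_row(line: str) -> dict:
    """Parse one unwrapped 'host: <ntpq -p row>' line into a dict."""
    host, _, row = line.partition(": ")
    tally, fields = row[0], row[1:].split()
    remote, refid, st, kind, when, poll, reach, delay, offset, jitter = fields
    return {
        "host": host,
        "remote": remote,
        "status": TALLY_MEANING.get(tally, "unknown"),
        "stratum": int(st),
        "offset_ms": float(offset),
        "jitter_ms": float(jitter),
    }

SAMPLE = [
    "node-A: x210.173.160.27  210.173.176.251  2 u   32   64  377    1.455  -572.95 134.136",
    "node-A: xnode-B          LOCAL(1)        12 u   52   64    3    0.109  -12.652   4.786",
    "node-A: *LOCAL(1)        LOCAL(1)         9 l   66   64  377    0.000    0.000   0.002",
]

for parsed in map(parse_row, SAMPLE):
    print(f"{parsed['host']} sees {parsed['remote']} as {parsed['status']} "
          f"(offset {parsed['offset_ms']} ms)")
```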

Thanks for any help.

DD



