[ntp:questions] Why false tickers one day, and not the next day?
dromedaryl at yahoo.com
Mon Dec 17 23:52:25 UTC 2007
I'm trying to determine why ntpd marks all the servers it contacts as
false tickers one day, while the next day everything is okay. The
explanation is at the top of this posting, with the output of several
"ntpq -p" runs below.
The setup is a two node FreeBSD cluster, node-A and node-B. They are
on the same subnet and switch.
Node-A's ntp.conf:
server 210.173.160.27 # external NTP server
server node-B iburst # other node
server 127.127.1.1
fudge 127.127.1.1 stratum 9
driftfile /etc/ntp.drift
Node-B's ntp.conf:
server 210.173.160.27 # external NTP server
server node-A iburst # other node
server 127.127.1.1
fudge 127.127.1.1 stratum 11
driftfile /etc/ntp.drift
The idea is that as the cluster expands, node-A and node-B will be
time servers for the new nodes. Node-A's local clock has a lower
stratum value than node-B's, so if the cluster loses its connection to
the external server, node-A is the preferred chimer for the cluster.
If node-A loses its connection to the external server (but not to
node-B), node-A will use node-B as its server, and vice versa.
What's happening is that things go as expected for a short time with
node-A and node-B using the external time server as their system peer,
and using each other as candidate peers.
But within a few minutes, the external time server gets marked as a
false ticker by both nodes, and both nodes mark each other as false
tickers.
There is nothing logged by ntpd.
The nodes' drifts are high:
# cat /etc/ntp.drift
node-A: 499.206
node-B: 497.070
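To put those drift values in perspective, they can be converted to
clock error per day. A rough Python sketch, assuming the numbers in
ntp.drift are in parts per million (the unit ntpd uses for its drift
file); ppm_to_seconds_per_day is just an illustrative name:

```python
# Convert an ntpd drift-file value (parts per million) to clock error per day.
SECONDS_PER_DAY = 86_400


def ppm_to_seconds_per_day(ppm: float) -> float:
    """Seconds gained or lost per day by a clock whose rate is off by `ppm`."""
    return ppm * 1e-6 * SECONDS_PER_DAY


# Drift values quoted above, plus the "under 30.0" figure seen on the good day.
for label, ppm in {"node-A": 499.206, "node-B": 497.070, "good day": 30.0}.items():
    print(f"{label}: {ppm} PPM is about {ppm_to_seconds_per_day(ppm):.1f} s/day")
```

That works out to roughly 43 seconds per day for each node, against
about 2.6 seconds per day at 30 PPM. For reference, ntpd caps its
frequency correction at 500 PPM, so both values sit just under that
ceiling.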
The nodes and the external time server are in Asia. I have an
identically set up cluster in North America using the same Asian time
server, and that cluster has no problem keeping the Asian server as a
peer, despite having a delay of about 120 msecs, nearly 100 times
higher than the Asian cluster's delay to the time server.
The next day, after restarting ntpd on the nodes and resetting
the time on all nodes with ntpdate, everything worked as
expected: the time synced properly, there were no false tickers, and
the nodes' drifts were under 30.0. No network changes were made.
Any idea on what's going on here? What would cause all the servers to
be marked as false tickers, and then be fine the next day? Is there
a way to configure ntpd so this won't happen?
Here's the output of a number of sequential calls to "ntpq -p":
Just after starting up ntpd:
node-A:  remote          refid           st t when poll reach    delay   offset  jitter
node-A: ==============================================================================
node-A:  210.173.160.27  210.173.176.251  2 u  112  256    17    1.447   -2.810  78.540
node-A:  node-B          .INIT.          16 u  273  512     0    0.000    0.000 4000.00
node-A: *LOCAL(1)        LOCAL(1)         9 l   33   64    77    0.000    0.000   0.002
node-B: ==============================================================================
node-B:  210.173.160.27  210.173.176.251  2 u  101  256    17    1.358   -4.869  76.916
node-B:  node-A          .INIT.          16 u  265  512     0    0.000    0.000 4000.00
node-B: *LOCAL(1)        LOCAL(1)        11 l   36   64    77    0.000    0.000   0.002
A few minutes later, node-A has the external server as its system peer
and node-B as a candidate peer. But node-B marks the external server
as a false ticker and is using node-A as its system peer:
node-A:  remote          refid           st t when poll reach    delay   offset  jitter
node-A: ==============================================================================
node-A: *210.173.160.27  210.173.176.251  2 u  290  512    37    1.447   -2.810 137.124
node-A: +node-B          LOCAL(1)        12 u  181 1024     1    0.109  -12.652   0.128
node-A:  LOCAL(1)        LOCAL(1)         9 l   14   64   377    0.000    0.000   0.002
node-B: ==============================================================================
node-B: x210.173.160.27  210.173.176.251  2 u  278  512    37    1.358   -4.869 133.904
node-B: *node-A          LOCAL(1)        10 u  172 1024     1    0.113   12.824   0.099
node-B:  LOCAL(1)        LOCAL(1)        11 l   16   64   377    0.000    0.000   0.002
A few minutes later. This looks great as it's what's expected:
node-A:  remote          refid           st t when poll reach    delay   offset  jitter
node-A: ==============================================================================
node-A: *210.173.160.27  210.173.176.251  2 u  492  512    37    1.447   -2.810 137.124
node-A: +node-B          LOCAL(1)        12 u  383 1024     1    0.109  -12.652   0.128
node-A:  LOCAL(1)        LOCAL(1)         9 l   24   64   377    0.000    0.000   0.002
node-B: ==============================================================================
node-B: *210.173.160.27  210.173.176.251  2 u  480  512    37    1.358   -4.869 133.904
node-B: +node-A          LOCAL(1)        10 u  374 1024     1    0.113   12.824   0.099
node-B:  LOCAL(1)        LOCAL(1)        11 l   24   64   377    0.000    0.000   0.002
A few minutes later everything becomes a false ticker. The offset to
the external server has increased dramatically:
node-A:  remote          refid           st t when poll reach    delay   offset  jitter
node-A: ==============================================================================
node-A: x210.173.160.27  210.173.176.251  2 u   28   64   377    1.455  -572.95 354.508
node-A: xnode-B          LOCAL(1)        12 u  624 1024     1    0.109  -12.652   0.128
node-A: *LOCAL(1)        LOCAL(1)         9 l    2   64   377    0.000    0.000   0.002
node-B: ==============================================================================
node-B: x210.173.160.27  210.173.176.251  2 u   15   64   377    1.529  -561.00 345.641
node-B: xnode-A          LOCAL(1)        10 u  614 1024     1    0.113   12.824   0.099
node-B: *LOCAL(1)        LOCAL(1)        11 l    9   64   377    0.000    0.000   0.002
It also appears that when node-B polled the external server and decided
to mark it as a false ticker, it also changed node-A from a candidate
to a false ticker, despite not having polled it.
node-A:  remote          refid           st t when poll reach    delay   offset  jitter
node-A: ==============================================================================
node-A: x210.173.160.27  210.173.176.251  2 u   32   64   377    1.455  -572.95 134.136
node-A: xnode-B          LOCAL(1)        12 u   52   64     3    0.109  -12.652   4.786
node-A: *LOCAL(1)        LOCAL(1)         9 l   66   64   377    0.000    0.000   0.002
node-B: ==============================================================================
node-B: x210.173.160.27  210.173.176.251  2 u   14   64   377    1.529  -561.00 130.750
node-B: xnode-A          LOCAL(1)        10 u   41   64     3    0.113   12.824   4.833
node-B: *LOCAL(1)        LOCAL(1)        11 l   10   64   377    0.000    0.000   0.002
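For anyone skimming the listings, the character in front of each
remote name is ntpq's tally code. A minimal Python sketch for decoding
it, assuming the raw "ntpq -p" peer line is available with the tally in
its first column (the "node-A:" style prefixes in the listings are not
part of ntpq's output and are left off the sample lines here). TALLY
and classify are names made up for this illustration:

```python
# Tally characters that `ntpq -p` prints in the first column of a peer line.
TALLY = {
    " ": "reject",
    "x": "falseticker",
    ".": "excess",
    "-": "outlier",
    "+": "candidate",
    "#": "selected",
    "*": "sys.peer",
    "o": "pps.peer",
}


def classify(line: str) -> tuple[str, str]:
    """Return (remote, status) for one raw peer line of `ntpq -p` output.

    The first character is taken as the tally code; the remote name is the
    first whitespace-separated field after it.
    """
    tally, rest = line[:1], line[1:]
    fields = rest.split()
    remote = fields[0] if fields else ""
    return remote, TALLY.get(tally, "unknown")


# Sample lines copied from the node-A listing where everything became a
# false ticker, with the node prefix removed.
sample = [
    "x210.173.160.27  210.173.176.251  2 u   28   64   377    1.455  -572.95 354.508",
    "xnode-B          LOCAL(1)        12 u  624 1024     1    0.109  -12.652   0.128",
    "*LOCAL(1)        LOCAL(1)         9 l    2   64   377    0.000    0.000   0.002",
]
for line in sample:
    print(classify(line))
```

Reading the tally from a fixed first column matters here: a remote
whose name itself starts with "x" or "o" would otherwise be
indistinguishable from a flagged peer once whitespace is collapsed.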
Thanks for any help.
DD