[ntp:questions] Isolated Network Drift Problem

Steve Kostecke kostecke at ntp.org
Fri Nov 21 16:04:01 UTC 2008


On 2008-11-21, David Woolley <david at ex.djwhome.demon.co.uk.invalid> wrote:

> Any pure clients should not have a local clock.  That is universally 
> true, not just for time islands.  For the remaining machines, you should 
> either specify a clear hierarchy, with steps of two in the local 
> clock stratum between each one, or, I think orphan mode will work, 
> providing the master server, with the local clock, never goes down for 
> more than a few hours at a time.

Unfortunately I can no longer find any mention of Orphan Mode in the NTP-Dev
Documentation at http://www.eecis.udel.edu/~mills/ntp/html/index.html.

But the Distribution Documentation for 4.2.4 (see
http://doc.ntp.org/4.2.4/manyopt.html#orphan) states:

| "Sometimes it is necessary to operate an NTP subnet in isolation,
| because a local reference clock is unavailable or connectivity to the
| Internet is not provided. In such cases it may be necessary that the
| subnet servers and clients remain synchronized to a common timescale,
| not necessarily the UTC timescale. Previously, this function was
| provided by the local clock driver, which could be configured for a
| server that could be reached, directly or indirectly from all other
| servers and clients in the subnet.
| 
| There are many disadvantages using the local clock driver: multiple
| source redundancy is not possible and the subnet is vulnerable to
| single-point failures. Orphan mode is intended to replace the need for
| the local clock driver. It operates in subnet configurations in all
| modes, including broadcast, and multiple servers and clients and handles
| seamless switching as primary sources fail and recover."

I've just run a test between a 4.2.5p145 Orphan Server configured with
just 'tos orphan 6' as a time source and a 4.2.5p20 Orphan Client. The
client _WAS_ successfully able to sync to the server.

So the claim in the documentation seems to be correct.

However, ntpdate rejects the Orphan Server as unsuitable even though
it sees the server operating at stratum 6.

The new gsoc_sntp code, on the other hand, accepts this Orphan Server.

It seems to me that the real problem here is a bug in ntpdate.

| "Orphan parents show offset zero, root delay zero and reference ID
| 127.0.0.1, which of course is the Unix loopback address. Orphan children
| show the mitigated offset of their servers, root delay randomized over a
| moderate range and reference ID of their system peer. An important
| distinction is that the entire subnet operates at the same orphan
| stratum and that the order of preference is the root delay, not the
| stratum and root distance as usual."

> (There is circumstantial evidence, in a recent thread, that root
> dispersion will diverge on orphan mode servers until they get rejected
> for excessive root distance.)

The "evidence" is misleading. I have _actually_ observed the behavior
of an ntpd functioning as an "Orphan Server" when the real time sources
become unreachable.

This is the test system while it is synced to a time source (in this case a
remote time server):

| $ ntpq -crv
| associd=0 status=0645 leap_none, sync_ntp, 4 events, clock_sync,
| version="ntpd 4.2.5p145 at 1.1791-o Thu Nov 20 13:43:00 UTC 2008 (1)",
| processor="i586", system="Linux/2.6.18-k6", leap=00, stratum=3,
| precision=-19, rootdelay=1.818, rootdisp=478.928, refid=192.168.19.3,
| reftime=ccd144c5.9a13218a  Fri, Nov 21 2008  9:31:33.601,
| clock=ccd14511.7df12fd9  Fri, Nov 21 2008  9:32:49.491, peer=10331,
| tc=7, mintc=3, offset=-0.113, frequency=148.703, sys_jitter=0.049,
| clk_jitter=0.171, clk_wander=0.028
| 
| $ ntpq -np
|      remote        refid   st t when poll reach delay offset jitter
| ===================================================================
| *192.168.19.3 149.35.47.206 2 u   76  128  377  0.536 -0.113  0.049
| 
| $ ntpdate -q localhost
| server 127.0.0.1, stratum 3, offset -0.000010, delay 0.02565
| 21 Nov 09:33:40 ntpdate[12529]: adjust time server 127.0.0.1 \
|    offset -0.000010 sec

This is the test system shortly after the remote time server was
unreachable for 8 consecutive polls:

| $ ntpq -crv
| associd=0 status=0058 leap_none, sync_unspec, 5 events, no_sys_peer,
| version="ntpd 4.2.5p145 at 1.1791-o Thu Nov 20 13:43:00 UTC 2008 (1)",
| processor="i586", system="Linux/2.6.18-k6", leap=00, stratum=6,
| precision=-19, rootdelay=0.000, rootdisp=5.000, refid=127.0.0.1,
| reftime=ccd144c5.9a13218a  Fri, Nov 21 2008  9:31:33.601,
| clock=ccd14a2d.e6484982  Fri, Nov 21 2008  9:54:37.899, peer=0, tc=7,
| mintc=3, offset=-0.113, frequency=148.703, sys_jitter=0.023,
| clk_jitter=0.171, clk_wander=0.028
| 
| $ ntpq -np
|      remote        refid   st t when poll reach delay offset jitter
| ===================================================================
|  192.168.19.3 149.35.47.206 2 u 1250  128    0  0.588 -0.096 0.002
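
The status word changes from 0645 to 0058 between the two readouts above.
As a rough sketch, assuming the system status word layout described in
RFC 1305 Appendix B (2-bit leap indicator, 6-bit clock source, 4-bit event
count, 4-bit event code), the fields can be unpacked as follows. The
function names and lookup tables are made up for illustration, and the
tables list only the codes that ntpq printed above.

```python
# Hypothetical lookup tables: only the values seen in the readouts above.
LEAP = {0: "leap_none"}
SOURCES = {0: "sync_unspec", 6: "sync_ntp"}
EVENTS = {5: "clock_sync", 8: "no_sys_peer"}


def decode_system_status(word):
    """Split a hex system status word into its four bit fields.

    Assumed layout (RFC 1305 Appendix B): leap indicator in bits 15-14,
    clock source in bits 13-8, event count in bits 7-4, event code in
    bits 3-0.
    """
    value = int(word, 16)
    return {
        "leap": (value >> 14) & 0x3,
        "source": (value >> 8) & 0x3F,
        "event_count": (value >> 4) & 0xF,
        "event_code": value & 0xF,
    }


def describe(word):
    """Render the fields in roughly the style ntpq uses on the status line."""
    f = decode_system_status(word)
    return "{}, {}, {} events, {}".format(
        LEAP.get(f["leap"], "leap=%d" % f["leap"]),
        SOURCES.get(f["source"], "source=%d" % f["source"]),
        f["event_count"],
        EVENTS.get(f["event_code"], "event=%d" % f["event_code"]),
    )


print(describe("0645"))  # leap_none, sync_ntp, 4 events, clock_sync
print(describe("0058"))  # leap_none, sync_unspec, 5 events, no_sys_peer
```

Under that assumed layout, the clock source field drops from 6 to 0 once
the remote server is lost, which matches the sync_ntp and sync_unspec
labels in the two readouts.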

This is the test system after the remote time server had been
unreachable for ~ 45 minutes:

| $ ntpq -crv
| associd=0 status=0058 leap_none, sync_unspec, 5 events, no_sys_peer,
| version="ntpd 4.2.5p145 at 1.1791-o Thu Nov 20 13:43:00 UTC 2008 (1)",
| processor="i586", system="Linux/2.6.18-k6", leap=00, stratum=6,
| precision=-19, rootdelay=0.000, rootdisp=5.000, refid=127.0.0.1,
| reftime=ccd144c5.9a13218a  Fri, Nov 21 2008  9:31:33.601,
| clock=ccd14fe9.75b2c454  Fri, Nov 21 2008 10:19:05.459, peer=0, tc=7,
| mintc=3, offset=-0.113, frequency=148.703, sys_jitter=0.023,
| clk_jitter=0.171, clk_wander=0.028
| 
| $ ntpq -np
|      remote        refid   st t when poll reach delay offset jitter
| ===================================================================
|  192.168.19.3 149.35.47.206 2 u  45m  128    0  0.588 -0.096 0.000
| 
| $ ntpdate -q localhost
| server 127.0.0.1, stratum 6, offset -0.000015, delay 0.02567
| 21 Nov 10:19:05 ntpdate[18184]: adjust time server 127.0.0.1 \
|     offset -0.000015 sec

As you can see, the root delay and root dispersion are unchanged between
the last two readouts (rootdelay=0.000, rootdisp=5.000).
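
The same comparison can be done mechanically. The following is only a
sketch: parse_rv and ntp_seconds are hypothetical helper names, the regular
expression is a simplistic guess at the name=value format shown above, and
the input strings are excerpts copied from the two orphan-mode readouts.

```python
import re


def parse_rv(text):
    """Collect name=value pairs from `ntpq -crv` style output into a dict."""
    return dict(re.findall(r'(\w+)=("[^"]*"|[^,\s]+)', text))


def ntp_seconds(stamp):
    """Convert a hex NTP timestamp such as 'ccd144c5.9a13218a' to seconds."""
    whole, frac = stamp.split(".")
    return int(whole, 16) + int(frac, 16) / 2**32


# Excerpt from the readout taken shortly after the server became unreachable.
early = parse_rv("""
stratum=6, precision=-19, rootdelay=0.000, rootdisp=5.000, refid=127.0.0.1,
reftime=ccd144c5.9a13218a  Fri, Nov 21 2008  9:31:33.601,
clock=ccd14a2d.e6484982  Fri, Nov 21 2008  9:54:37.899, peer=0, tc=7,
""")

# Excerpt from the readout taken after ~ 45 minutes.
late = parse_rv("""
stratum=6, precision=-19, rootdelay=0.000, rootdisp=5.000, refid=127.0.0.1,
reftime=ccd144c5.9a13218a  Fri, Nov 21 2008  9:31:33.601,
clock=ccd14fe9.75b2c454  Fri, Nov 21 2008 10:19:05.459, peer=0, tc=7,
""")

for name in ("stratum", "rootdelay", "rootdisp", "refid", "reftime"):
    verdict = "unchanged" if early[name] == late[name] else "CHANGED"
    print(f"{name:10} {early[name]:>18} {late[name]:>18}  {verdict}")

# Time elapsed between the last clock update (reftime) and the late readout.
holdover = ntp_seconds(late["clock"]) - ntp_seconds(late["reftime"])
print(f"seconds since reftime: {holdover:.0f}")
```

All five variables compare as unchanged, and the gap between reftime and
clock in the later readout comes to about 2852 seconds, i.e. roughly 47
minutes without an update.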

The system I used for this test is a production server so I'm
not inclined to let it free-wheel for longer than an hour or so.

-- 
Steve Kostecke <kostecke at ntp.org>
NTP Public Services Project - http://support.ntp.org/
