[ntp:questions] Re: NTP seems unsuitable for this application... what do you think?

John Howells John.Howells at marconi.com
Fri Dec 3 12:29:00 UTC 2004



David Woolley wrote:
> 
> In article <%eFrd.4$5V2.2 at dfw-service2.ext.ray.com>,
> John Seal <sealj at indy.raytheon.com> wrote:
> 
> > "if two NTP servers are synchronized to each other as peers, what
> > actually happens is the clocks decide among themselves which is the
> > better source of time, and both clocks attempt to synchronize to that".
> 
> I think that there is a lot of urban folklore about peering.  If you
> really want the two to negotiate, you need timed, not ntpd.  This came
> with SunOS and may be the other protocol your documentation refers to.
> timed doesn't provide a correct time, only a mutually agreed one.
> 
> With ntpd, in this case, both machines will consider their local clocks
> to be perfect as they have the lowest stratum number, are preferred,
> and have a zero offset.  The other machine, after a small amount of drift
> will also be outside the very small estimated error tolerance of the local
> clock.  The local clocks also never fail, even if they have never been
> given a good time.
> 
> As I understand it, the effect of peering is to take the other machine into
> consideration when trying to decide which clocks are true-chimers, but you
> haven't got any tie breakers and the perceived error band is very small.
> (Even then, I think they will only take into account machines that are not
> at higher stratum numbers.)  The peer is also available if it becomes the
> server nearest the root that survives the true chimer test.
> 
> One other thing to watch out for is that people tend to answer based
> on an assumption that you are only slightly at variance from a normal
> configuration, but you are actually a very long way from a normal
> configuration.  To have something like a normal system, the GPS would
> have to be operating a proper reference clock driver, and servicing it
> all the time.  That driver would have to refuse to give out time unless
> there was a GPS signal.

FWIW, this is how two Solaris 8 systems, with no active time control mechanism
other than NTP, behave with the following ntp.conf files:

Machine A
>>
restrict default nomodify notrap
restrict 127.0.0.1
server 127.127.1.0
fudge 127.127.1.0 stratum 10
peer 192.168.64.112
driftfile /etc/inet/drift.file
<<

Machine B
>>
restrict default nomodify notrap
restrict 127.0.0.1
server 127.127.1.0
fudge 127.127.1.0 stratum 10
peer 192.168.64.111
driftfile /etc/inet/drift.file
<<
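The two files mirror each other: apart from the `peer` line they are identical. As a small illustrative sketch, one template can produce both. The `render_ntp_conf` helper and `TEMPLATE` name are hypothetical and not part of any NTP tooling. The sketch assumes Machine A is 192.168.64.111 and Machine B is 192.168.64.112, which is inferred from each file naming the other as its peer.

```python
# Hypothetical helper: renders the ntp.conf used in this test.
# The directives are copied verbatim from the two files above; only the
# peer address differs between Machine A and Machine B.
TEMPLATE = """\
restrict default nomodify notrap
restrict 127.0.0.1
server 127.127.1.0
fudge 127.127.1.0 stratum 10
peer {peer}
driftfile /etc/inet/drift.file
"""


def render_ntp_conf(peer_addr: str) -> str:
    """Return ntp.conf text that uses the local clock driver (127.127.1.0,
    fudged to stratum 10) and symmetrically peers with peer_addr."""
    return TEMPLATE.format(peer=peer_addr)


# Assumed addresses: A = .111, B = .112 (each file names the other as peer).
conf_a = render_ntp_conf("192.168.64.112")
conf_b = render_ntp_conf("192.168.64.111")

if __name__ == "__main__":
    print(conf_a, end="")
```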

Machine A is started first, and eventually starts to use its own H/W clock, with
"ntpq -p" output of:

     remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
*LOCAL(0)        LOCAL(0)        10 l    1   64   37     0.00    0.000  885.01
 192.168.64.112  0.0.0.0         16 u   20   64    0     0.00    0.000 16000.0

Machine B is started about 30 seconds after Machine A, with its clock
deliberately set roughly 80 seconds ahead of Machine A. Machine B first uses
its own H/W clock, as in:

Friday December  3 11:19:14 GMT 2004
     remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
 LOCAL(0)        LOCAL(0)        10 l   62   64   17     0.00    0.000 1885.01
 192.168.64.111  LOCAL(0)        11 u   40   64    1     0.60  -79754. 15875.3
Friday December  3 11:19:17 GMT 2004
     remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
*LOCAL(0)        LOCAL(0)        10 l    1   64   37     0.00    0.000  885.01
 192.168.64.111  LOCAL(0)        11 u   43   64    1     0.60  -79754. 15875.3

and then switches to reporting the peer as its sync source when the reach gets
to 37, while still showing the 80-second offset, as in:

Friday December  3 11:22:48 GMT 2004
     remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
*LOCAL(0)        LOCAL(0)        10 l   20   64  377     0.00    0.000   10.01
 192.168.64.111  LOCAL(0)        11 u   62   64   36     0.63  -79754. 1875.37
Friday December  3 11:22:51 GMT 2004
     remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
 LOCAL(0)        LOCAL(0)        10 l   23   64  377     0.00    0.000   10.01
*192.168.64.111  LOCAL(0)        11 u    1   64   37     0.64  -79754.  875.35
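The `reach` column that this switch-over lines up with is an 8-bit shift register printed in octal: each poll shifts it left one place, and the low bit is set if that poll got a valid reply. A minimal sketch of decoding it follows; `decode_reach` is a hypothetical name. It shows that 37 means the last five polls were answered, 377 means all eight, and 376 or 36 means the most recent poll went unanswered.

```python
def decode_reach(reach_octal: str) -> list[bool]:
    """Decode ntpq's reach field, an 8-bit shift register shown in octal.

    Returns one flag per poll, most recent first: True if that poll
    got a usable reply, False if it did not.
    """
    value = int(reach_octal, 8)
    if not 0 <= value <= 0o377:
        raise ValueError(f"reach out of range: {reach_octal!r}")
    # Bit 0 is the most recent poll; higher bits are progressively older polls.
    return [bool((value >> bit) & 1) for bit in range(8)]


if __name__ == "__main__":
    for r in ("1", "17", "37", "77", "376", "377"):
        flags = decode_reach(r)
        print(f"reach {r:>3}: {sum(flags)}/8 answered, latest ok={flags[0]}")
```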

but after about 15 minutes it changes to report no sync source:

Friday December  3 11:37:45 GMT 2004
     remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
 LOCAL(0)        LOCAL(0)        10 l   21   64  377     0.00    0.000   10.01
*192.168.64.111  LOCAL(0)        11 u   63   64  376     0.61  -79754.    0.40
Friday December  3 11:36:28 GMT 2004
     remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
 LOCAL(0)        LOCAL(0)        10 -    -   64    0     0.00    0.000 16000.0
 192.168.64.111  LOCAL(0)        11 -    3   64    0     0.64  -79754. 16000.0

and although these two outputs were taken only nominally 3 seconds apart, the
clock has stepped back to remove the 80-second offset, so the two machines are
now in sync. After roughly another two minutes, Machine B reports:

Friday December  3 11:38:34 GMT 2004
     remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
 LOCAL(0)        LOCAL(0)        10 l    6   64    1     0.00    0.000 15885.0
 192.168.64.111  LOCAL(0)        11 u   64   64    2     0.64  -79754. 16000.0
Friday December  3 11:38:37 GMT 2004
     remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
 LOCAL(0)        LOCAL(0)        10 l    9   64    1     0.00    0.000 15885.0
 192.168.64.111  LOCAL(0)        11 u    3   64    3     0.63   -6.820 15875.0

where "ntpq -p" is also reporting the reduced offset, and finally reports:

Friday December  3 11:42:41 GMT 2004
     remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
 LOCAL(0)        LOCAL(0)        10 l   61   64   17     0.00    0.000 1885.01
 192.168.64.111  LOCAL(0)        11 u   55   64   37     0.60   -6.821 1875.09
Friday December  3 11:42:44 GMT 2004
     remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
*LOCAL(0)        LOCAL(0)        10 l    1   64   37     0.00    0.000  885.01
 192.168.64.111  LOCAL(0)        11 u   59   64   37     0.60   -6.821 1875.09
Friday December  3 11:42:48 GMT 2004
     remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
*LOCAL(0)        LOCAL(0)        10 l    4   64   37     0.00    0.000  885.01
 192.168.64.111  LOCAL(0)        11 u   62   64   76     0.60   -6.821 1875.09
Friday December  3 11:42:51 GMT 2004
     remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
 LOCAL(0)        LOCAL(0)        10 l    7   64   37     0.00    0.000  885.01
*192.168.64.111  LOCAL(0)        11 u    1   64   77     1.16   -7.109  875.35

after which it continues to report its peer as the sync source, and the offset
quickly falls to about 0.25 ms.
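The `delay`, `offset` and `disp` values in these listings are all in milliseconds. So -79754. is about 79.75 seconds, matching the deliberate 80-second skew, and -7.109 is about 7 ms. A rough parser for one row of this Solaris 8 `ntpq -p` layout makes the units and the `*` sync-source marker explicit. It assumes exactly the ten columns shown above, with `disp` last; newer ntpq releases print different columns, such as `jitter` in place of `disp`. `PeerLine` and `parse_ntpq_line` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PeerLine:
    tally: str        # first character: '*' marks the current sync source
    remote: str
    refid: str
    stratum: int
    reach: str        # octal shift register, kept as text
    delay_ms: float
    offset_ms: float
    disp_ms: float

    @property
    def offset_s(self) -> float:
        """Offset converted from milliseconds to seconds."""
        return self.offset_ms / 1000.0


def parse_ntpq_line(line: str) -> PeerLine:
    """Parse one data row of the 10-column ntpq -p layout shown in this post."""
    tally, rest = line[0], line[1:]
    remote, refid, st, _type, _when, _poll, reach, delay, offset, disp = rest.split()
    return PeerLine(tally, remote, refid, int(st), reach,
                    float(delay), float(offset), float(disp))


if __name__ == "__main__":
    before = parse_ntpq_line(
        " 192.168.64.111  LOCAL(0)        11 u   40   64    1     0.60  -79754. 15875.3")
    after = parse_ntpq_line(
        "*192.168.64.111  LOCAL(0)        11 u    1   64   77     1.16   -7.109  875.35")
    print(before.tally == "*", before.offset_s)  # False -79.754
    print(after.tally == "*", after.offset_s)
```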

FWIW, another system using these as an NTP source shows:

     remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
*192.168.64.111  127.127.1.0     11 u  116  128  377     0.47   -0.083    0.05
+192.168.64.112  192.168.64.111  12 u   28  128  377     0.50   -0.241    0.05

at the end of this sequence.
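The strata the third system sees follow from the usual rule that a server advertises one more than the stratum of whatever it is synchronized to. 192.168.64.111 (Machine A) is locked to its local clock, fudged to stratum 10, so it advertises 11. 192.168.64.112 (Machine B) is locked to Machine A, so it advertises 12. A small sketch of that arithmetic follows. The `SYNC_SOURCE` table is a hypothetical model read off the refid columns above. The sketch also refuses a timing loop, in which two hosts each take the other as their source.

```python
# Hypothetical model of who is synchronized to whom at the end of the test,
# read off the refid column of the third system's ntpq -p output.
LOCAL_CLOCK = "127.127.1.0"
LOCAL_CLOCK_STRATUM = 10  # from "fudge 127.127.1.0 stratum 10"

SYNC_SOURCE = {
    "192.168.64.111": LOCAL_CLOCK,        # Machine A: its own local clock
    "192.168.64.112": "192.168.64.111",   # Machine B: its peer, Machine A
}


def advertised_stratum(host: str, sources: dict[str, str] = SYNC_SOURCE) -> int:
    """Stratum a host advertises: its sync source's stratum plus one,
    followed back along the chain until the local clock is reached."""
    seen = set()
    hops = 0
    node = host
    while node != LOCAL_CLOCK:
        if node in seen:
            raise ValueError(f"timing loop through {node}")
        seen.add(node)
        node = sources[node]
        hops += 1
    return LOCAL_CLOCK_STRATUM + hops


if __name__ == "__main__":
    for h in SYNC_SOURCE:
        print(h, advertised_stratum(h))
```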

I don't know how this affects the discussion, or whether this is normal NTP
behaviour or something unique to the Solaris 8 implementation, but it does
appear to be behaving as described in the original quote from the Sun document,
that the second machine waits for a while and then metaphorically shrugs its
shoulders and decides to go with the time from its peer, as in

> > "if two NTP servers are synchronized to each other as peers, what
> > actually happens is the clocks decide among themselves which is the
> > better source of time, and both clocks attempt to synchronize to that".

Clearly the behaviour would be different if either or both had an external
source at a better stratum than 10 (that is, a lower stratum number), as that
would then be used in preference to either the local clock or the peer. But I
have no idea what would happen if each had separate external sources and those
sources differed significantly in the time they offered. I guess that each
would regard its peer as a crazy falseticker, but I have no evidence to back up
this hunch.
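That hunch can be explored with the intersection idea behind the "true-chimer" test mentioned earlier in the thread. Treat each source as an interval (its offset plus or minus an error bound), and accept only sources that fall in a region where more than half of them agree. The sketch below is the textbook form of Marzullo's algorithm with a simple majority rule. It is not ntpd's actual selection code, which builds its intervals from root distance and applies further clustering and combining steps. The offsets in the example are made up. With only two sources that disagree, no majority exists, so neither can be singled out as the falseticker. This is the missing tie-breaker the quoted message refers to.

```python
def marzullo(intervals):
    """Textbook Marzullo: return (count, lo, hi), the largest number of
    intervals that overlap and the first region where that many do."""
    edges = []
    for lo, hi in intervals:
        edges.append((lo, -1))  # -1: an interval starts here
        edges.append((hi, +1))  # +1: an interval ends here
    # Starts sort before ends at the same point, so touching intervals overlap.
    edges.sort()
    best, count = 0, 0
    best_lo = best_hi = None
    for i, (x, kind) in enumerate(edges):
        count -= kind
        if count > best:
            best, best_lo, best_hi = count, x, edges[i + 1][0]
    return best, best_lo, best_hi


def truechimers(sources):
    """Names of sources consistent with a strict-majority intersection,
    or [] if no more than half of the sources agree on any region."""
    count, lo, hi = marzullo(sources.values())
    if 2 * count <= len(sources):
        return []
    return [name for name, (a, b) in sources.items() if a <= lo and hi <= b]


if __name__ == "__main__":
    # Hypothetical offsets in seconds, as (offset - bound, offset + bound).
    two = {"own_external": (-0.01, 0.01), "peer": (4.99, 5.01)}
    print(truechimers(two))    # [] : no majority, so no tie-breaker
    three = dict(two, third_source=(-0.02, 0.005))
    print(truechimers(three))  # ['own_external', 'third_source']
```

A third source that agrees with either side gives that side a majority, and the disagreeing source is then the one left out.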

As you said, there does seem to be a lot of urban folklore about peering, and
(though it may be my failing) I was never able to work out what should happen
from the documentation I could find. But this test indicates that if two Solaris
8 systems have no external NTP source and are set to peer, they will soon come
into step and then stay there.

John Howells


