[ntp:questions] Isolated Network Drift Problem

Cal Webster cwebster at ec.rr.com
Mon Nov 24 20:27:52 UTC 2008


Gentlemen,

If there is any way I can help shed light on this spirited discussion
please let me know. I have 4 NTP servers running in isolation (versions
4.2.0.a.20040617-6.el4, ntp-4.2.4p2-6.fc8, and ntp-4.1.2-0.rc1.2) and
I'm willing to play around with them as long as I don't end up choking
my client machines to death.


-------- Anyway, here's where I'm at with the original problem --------

I found that adjtimex(8) is not available for RHEL (or CentOS) 3 or 4.
It was available in all Red Hat distros prior to and since. I may be
able to use one from a Fedora version based on gcc-3.4 (looks like fc6)
or try building from a RHEL 5 SRPM.

Although ntptime(1) is not a replacement for adjtimex(8), it's available
for all our NTP servers and it looks like I can set parameters such as
frequency offset, clock offset, estimated error and others.

What's the best way to determine which of our NTP servers provides the
best local clock?

I've changed my server ntp.conf files so that one machine (jato) is
designated as a server with its undisciplined clock set at stratum 5.
The other three are peers to each other, each pointing to the one
stratum 5 server. The peers still have the undisciplined clock
configured but at stratum 8. I guess I'll see how this goes.

Before restarting the ntp daemons I zeroed out the drift files then set
the system times to the exact second showing on
"http://wwp.greenwichmeantime.com/time-zone/usa/eastern-time/". I then
set the hardware clocks to match system time with "hwclock --systohc".
Although the network has no Internet connection, I have a KVM to an
Internet connected machine I can use. I started the master server daemon
first, followed by the others. As I understand it, all NTP servers will
now use the master server unless it becomes unreachable, in which case
they'll fall back to orphan mode until the master is reachable again. So
far, it looks like they've all selected the master server.

If I notice any drift after several days I'll try setting a frequency
offset on the master server using "ntptime -f (ppm)". How do I calculate
the offset in ppm from my observed seconds per day?
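
For reference, the conversion itself is plain arithmetic: a drift of N
seconds per day is N / 86400 of a second per second, and multiplying by
one million gives parts per million. Below is a minimal sketch, assuming
the drift is measured over an 86400-second day. The function names are
illustrative only and are not part of ntptime or any NTP tool, and the
sign that corrects a fast versus a slow clock should be checked against
the ntptime documentation rather than taken from this sketch.

```python
SECONDS_PER_DAY = 86400  # assumes a plain 24-hour day, no leap second


def drift_to_ppm(seconds_per_day):
    """Convert observed drift in seconds per day to parts per million."""
    return seconds_per_day / SECONDS_PER_DAY * 1_000_000


def ppm_to_drift(ppm):
    """Convert a frequency offset in ppm back to seconds per day."""
    return ppm * SECONDS_PER_DAY / 1_000_000


if __name__ == "__main__":
    # A clock observed to drift 1 second per day
    print(f"{drift_to_ppm(1.0):.3f} ppm")        # 11.574 ppm
    # The 160.918 ppm frequency reported by one of the servers
    print(f"{ppm_to_drift(160.918):.3f} s/day")  # 13.903 s/day
```

By the same arithmetic, the 512 ppm tolerance reported by ntptime works
out to roughly 44 seconds per day.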

Here is the output of ntptime -r for each of the servers (master first)
shortly after they were reinitialized. I don't understand where it's
getting the estimated error values so soon after being initialized.
Should I have removed the existing /etc/adjtime files first?

[root@jato ~]# ntptime -r
ntp_gettime() returns code 0 (OK)
  time ccd58790.2b171000  Mon, Nov 24 2008 15:05:36.168, (.168321),
  maximum error 43628 us, estimated error 0 us
ntptime=ccd58790.2b171000 unixtime=492b0910.168321 Mon Nov 24 15:05:36 2008
 
ntp_adjtime() returns code 0 (OK)
  modes 0x0 (),
  offset 0.000 us, frequency 0.000 ppm, interval 4 s,
  maximum error 43628 us, estimated error 0 us,
  status 0x1 (PLL),
  time constant 6, precision 1.000 us, tolerance 512 ppm,
  pps frequency 0.000 ppm, stability 512.000 ppm, jitter 200.000 us,
  intervals 0, jitter exceeded 0, stability exceeded 0, errors 0.


[root@pegasus etc]# ntptime -r
ntp_gettime() returns code 0 (OK)
  time ccd58790.83905000  Mon, Nov 24 2008 15:05:36.513, (.513921),
  maximum error 139493 us, estimated error 86735 us
ntptime=ccd58790.83905000 unixtime=492b0910.513921 Mon Nov 24 15:05:36 2008
 
ntp_adjtime() returns code 0 (OK)
  modes 0x0 (),
  offset -12033.000 us, frequency 160.918 ppm, interval 4 s,
  maximum error 139493 us, estimated error 86735 us,
  status 0x1 (PLL),
  time constant 2, precision 1.000 us, tolerance 512 ppm,
  pps frequency 0.000 ppm, stability 512.000 ppm, jitter 200.000 us,
  intervals 0, jitter exceeded 0, stability exceeded 0, errors 0.


[root@fluid root]# ntptime -r
ntp_gettime() returns code 0 (OK)
  time ccd58790.b0ea6000  Mon, Nov 24 2008 15:05:36.691, (.691076),
  maximum error 128778 us, estimated error 94045 us
ntptime=ccd58790.b0ea6000 unixtime=492b0910.691076 Mon Nov 24 15:05:36 2008
 
ntp_adjtime() returns code 0 (OK)
  modes 0x0 (),
  offset 101585.000 us, frequency -167.834 ppm, interval 4 s,
  maximum error 128778 us, estimated error 94045 us,
  status 0x1 (PLL),
  time constant 3, precision 1.000 us, tolerance 512 ppm,
  pps frequency 0.000 ppm, stability 512.000 ppm, jitter 200.000 us,
  intervals 0, jitter exceeded 0, stability exceeded 0, errors 0.


[root@axl ~]# ntptime -r
ntp_gettime() returns code 0 (OK)
  time ccd58791.092d0000  Mon, Nov 24 2008 15:05:37.035, (.035843),
  maximum error 563641 us, estimated error 1944 us
ntptime=ccd58791.92d0000 unixtime=492b0911.035843 Mon Nov 24 15:05:37 2008
 
ntp_adjtime() returns code 0 (OK)
  modes 0x0 (),
  offset 4435.000 us, frequency 1.367 ppm, interval 1 s,
  maximum error 563641 us, estimated error 1944 us,
  status 0x1 (PLL),
  time constant 6, precision 1.000 us, tolerance 512 ppm,



Thanks much!

./Cal



On Mon, 2008-11-24 at 18:31 +0000, Steve Kostecke wrote:
> On 2008-11-22, David Woolley <david at djwhome.demon.co.uk> wrote:
> 
> > Steve Kostecke wrote:
> >
> >> On 2008-11-22, David Woolley <david at djwhome.demon.co.uk> wrote:
> >>
> >>> In that case, you need to come up with
> 
> [snip]
> 
> >>> In particular, you need to ...
> 
> [snip]
> 
> >> You are entitled to ask if I can provide explanations for reported
> >> ntp behavior. But telling me that I "need to" do something at your
> >> whim is simply going too far.
> >
> > If you don't do so,
> 
> There's that tone again.
> 
> > there is evidence pointing in both directions. As such, I will have to
> > continue warning people that there is reasonable doubt about whether
> > orphan mode works for time islands. (It may turn out that it works for
> > some ntpd versions, but not others, for example.)
> 
> Based on my _actual_ _tests_ (which, again, I have yet to see you deign
> to conduct) ...
> 
> Version 4.2.5p145 works as a stand alone Orphan Server. The refid is set
> to 127.0.0.1 and the stratum to the orphan level shortly after start up
> and the rootdispersion (now named rootdisp) is stable at a fixed value.
> 
> Version 4.2.5p20, on the other hand, does not work as a stand alone
> Orphan Server. This version of NTP leaves the refid set to .INIT. and
> the stratum at 16. And the rootdispersion increases as ntpd runs.
> 
> According to the ChangeLogs, changes to the Orphan Mode code were
> incorporated to 4.2.4p5 and 4.2.5p124 on 2008/08/17. However 4.2.4p5
> shows the same incorrect behavior.
> 
> Several hours of testing have shown that only releases after (and
> including) 4.2.5p101 are usable as stand alone Orphan Servers (e.g. in a
> time island).
> 



