[ntp:questions] Isolated Network Drift Problem

Richard B. Gilbert rgilbert88 at comcast.net
Tue Nov 25 14:43:48 UTC 2008


Cal Webster wrote:
> Gentlemen,
> 
> If there is any way I can help shed light on this spirited discussion
> please let me know. I have 4 NTP servers running in isolation (versions
> 4.2.0.a.20040617-6.el4, ntp-4.2.4p2-6.fc8, and ntp-4.1.2-0.rc1.2 ) and
> I'm willing to play around with them as long as I don't end up choking
> my client machines to death.
> 
> 
> -------- Anyway, here's where I'm at with the original problem --------
> 
> I found that adjtimex(8) is not available for RHEL (or CentOS) 3 or 4.
> It was available in all Red Hat distros prior to and since. I may be
> able to use one from a Fedora version based on gcc-3.4 (looks like fc6)
> or try building from a RHEL 5 SRPM.
> 
> Although ntptime(1) is not a replacement for adjtimex(8), it's available
> for all our NTP servers and it looks like I can set parameters such as
> frequency offset, clock offset, estimated error and others.
> 
> What's the best way to determine which of our NTP servers provides the
> best local clock?

Note the difference between each clock under test and the correct time.
Wait seven to ten days and compare the clocks again.  The clock whose
offset has changed the least is the "best clock".  Note that this is not
foolproof; you might find that the "best" clock varies wildly from hour
to hour and is only the best "on average".
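
As for turning an observed drift in seconds per day into ppm: a day has
86400 seconds, so the frequency error is the change in offset divided by
the elapsed time, times one million.  A minimal sketch in Python follows;
the function name and the sample readings are made up for illustration,
and offsets are taken as (clock under test minus reference), so a
positive result means the clock runs fast.

```python
SECONDS_PER_DAY = 86_400


def drift_ppm(offset_start_s, offset_end_s, elapsed_days):
    """Average frequency error in parts per million.

    Offsets are (clock under test - reference), in seconds.
    A positive result means the clock under test runs fast.
    """
    elapsed_s = elapsed_days * SECONDS_PER_DAY
    return (offset_end_s - offset_start_s) / elapsed_s * 1e6


# Hypothetical readings: offset went from 0.0 s to 2.5 s over 7 days.
print(f"{drift_ppm(0.0, 2.5, 7):.3f} ppm")
```

A drift of one second per day works out to roughly 11.6 ppm.  Check the
sign convention of whatever tool you feed the number to before applying
it as a correction.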

A better test would be to set up a server with a GPS receiver and
compare the offset of all the local clocks under test every thirty
minutes over a period of several days.  But if you are going to do
that, you might as well make it permanent and use it as your master clock!
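
With offsets logged every thirty minutes, you can separate steady drift
from hour-to-hour wander.  The sketch below (Python, with a hypothetical
function name and invented sample data) fits a straight line to the
offsets by least squares; the slope is the average drift and the spread
of the residuals shows how much the clock wanders around that line.  It
assumes evenly spaced samples and at least two readings.

```python
from statistics import pstdev


def summarize_offsets(offsets_s, interval_s=1800):
    """Return (drift in ppm, wander in seconds) for evenly spaced offsets.

    offsets_s  -- offsets (clock under test - reference), in seconds
    interval_s -- spacing between samples; 1800 s is thirty minutes
    """
    n = len(offsets_s)
    times = [i * interval_s for i in range(n)]
    t_mean = sum(times) / n
    o_mean = sum(offsets_s) / n
    # Least-squares slope: seconds of offset gained per second elapsed.
    slope = (
        sum((t - t_mean) * (o - o_mean) for t, o in zip(times, offsets_s))
        / sum((t - t_mean) ** 2 for t in times)
    )
    # Residuals: what is left after removing the steady drift.
    residuals = [
        o - (o_mean + slope * (t - t_mean)) for t, o in zip(times, offsets_s)
    ]
    return slope * 1e6, pstdev(residuals)


# Invented data: both clocks gain 5.4 ms over 90 minutes, but "b" wanders.
samples = {
    "a": [0.0, 0.0018, 0.0036, 0.0054],
    "b": [0.0, 0.0100, -0.0050, 0.0054],
}
for name, offsets in samples.items():
    ppm, wander = summarize_offsets(offsets)
    print(f"{name}: drift {ppm:.2f} ppm, wander {wander * 1000:.2f} ms")
```

A clock with a large but steady drift is easier to correct than one with
a small average drift and a lot of wander, so look at both numbers.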

Consider that the Garmin GPS18LVC has a pulse-per-second output and
costs less than $100 US.  If you can site an antenna with a good view of
the sky, you can have a stratum 1 server of your very own and have the
time accurate to within a millisecond or less.  Note that while the GPS
is accurate to 50 ns or better, the process of getting the time into your
computer may introduce several hundred microseconds of uncertainty.



> 
> I've changed my server ntp.conf files so that one machine (jato) is
> designated as a server with its undisciplined clock set at stratum 5.
> The other three are peers to each other, each pointing to the one
> stratum 5 server. The peers still have the undisciplined clock
> configured but at stratum 8. I guess I'll see how this goes.
> 
> Before restarting the ntp daemons I zeroed out the drift files then set
> the system times to the exact second showing on
> "http://wwp.greenwichmeantime.com/time-zone/usa/eastern-time/". I then
> set the hardware clocks to match system time with "hwclock --systohc".
> Although the network has no Internet connection, I have a KVM to an
> Internet connected machine I can use. I started the master server daemon
> first, followed by the others. As I understand it, all NTP servers will
> now use the master server unless it becomes unreachable when they'll
> fall back to orphan mode until the master is again reachable. So far, it
> looks like they've all selected the master server.
> 
> If I notice any drift after several days I'll try setting a frequency
> offset on the master server using "ntptime -f (ppm)". How do I calculate
> the offset in ppm from my observed seconds per day?
> 
> Here is the output of ntptime -r for each of the servers (master first)
> shortly after they were reinitialized. I don't understand where it's
> getting the estimated error values so soon after being initialized.
> Should I have removed the existing /etc/adjtime files first?
> 
> [root at jato ~]# ntptime -r
> ntp_gettime() returns code 0 (OK)
>   time ccd58790.2b171000  Mon, Nov 24 2008 15:05:36.168, (.168321),
>   maximum error 43628 us, estimated error 0 us
> ntptime=ccd58790.2b171000 unixtime=492b0910.168321 Mon Nov 24 15:05:36
> 2008
>  
> ntp_adjtime() returns code 0 (OK)
>   modes 0x0 (),
>   offset 0.000 us, frequency 0.000 ppm, interval 4 s,
>   maximum error 43628 us, estimated error 0 us,
>   status 0x1 (PLL),
>   time constant 6, precision 1.000 us, tolerance 512 ppm,
>   pps frequency 0.000 ppm, stability 512.000 ppm, jitter 200.000 us,
>   intervals 0, jitter exceeded 0, stability exceeded 0, errors 0.
> 
> 
> [root at pegasus etc]# ntptime -r
> ntp_gettime() returns code 0 (OK)
>   time ccd58790.83905000  Mon, Nov 24 2008 15:05:36.513, (.513921),
>   maximum error 139493 us, estimated error 86735 us
> ntptime=ccd58790.83905000 unixtime=492b0910.513921 Mon Nov 24 15:05:36
> 2008
>  
> ntp_adjtime() returns code 0 (OK)
>   modes 0x0 (),
>   offset -12033.000 us, frequency 160.918 ppm, interval 4 s,
>   maximum error 139493 us, estimated error 86735 us,
>   status 0x1 (PLL),
>   time constant 2, precision 1.000 us, tolerance 512 ppm,
>   pps frequency 0.000 ppm, stability 512.000 ppm, jitter 200.000 us,
>   intervals 0, jitter exceeded 0, stability exceeded 0, errors 0.
> 
> 
> [root at fluid root]# ntptime -r
> ntp_gettime() returns code 0 (OK)
>   time ccd58790.b0ea6000  Mon, Nov 24 2008 15:05:36.691, (.691076),
>   maximum error 128778 us, estimated error 94045 us
> ntptime=ccd58790.b0ea6000 unixtime=492b0910.691076 Mon Nov 24 15:05:36
> 2008
>  
> ntp_adjtime() returns code 0 (OK)
>   modes 0x0 (),
>   offset 101585.000 us, frequency -167.834 ppm, interval 4 s,
>   maximum error 128778 us, estimated error 94045 us,
>   status 0x1 (PLL),
>   time constant 3, precision 1.000 us, tolerance 512 ppm,
>   pps frequency 0.000 ppm, stability 512.000 ppm, jitter 200.000 us,
>   intervals 0, jitter exceeded 0, stability exceeded 0, errors 0.
> 
> 
> [root at axl ~]# ntptime -r
> ntp_gettime() returns code 0 (OK)
>   time ccd58791.092d0000  Mon, Nov 24 2008 15:05:37.035, (.035843),
>   maximum error 563641 us, estimated error 1944 us
> ntptime=ccd58791.92d0000 unixtime=492b0911.035843 Mon Nov 24 15:05:37
> 2008
>  
> ntp_adjtime() returns code 0 (OK)
>   modes 0x0 (),
>   offset 4435.000 us, frequency 1.367 ppm, interval 1 s,
>   maximum error 563641 us, estimated error 1944 us,
>   status 0x1 (PLL),
>   time constant 6, precision 1.000 us, tolerance 512 ppm,
> 
> 
> 
> Thanks much!
> 
> ./Cal
> 
> 
> 
> On Mon, 2008-11-24 at 18:31 +0000, Steve Kostecke wrote:
>> On 2008-11-22, David Woolley <david at djwhome.demon.co.uk> wrote:
>>
>>> Steve Kostecke wrote:
>>>
>>>> On 2008-11-22, David Woolley <david at djwhome.demon.co.uk> wrote:
>>>>
>>>>> In that case, you need to come up with
>> [snip]
>>
>>>>> In particular, you need to ...
>> [snip]
>>
>>>> You are entitled to ask if I can provide explanations for reported
>>>> ntp behavior. But telling me that I "need to" do something at your
>>>> whim is simply going too far.
>>> If you don't do so,
>> There's that tone again.
>>
>>> there is evidence pointing in both directions. As such, I will have to
>>> continue warning people that there is reasonable doubt about whether
>>> orphan mode works for time islands. (It may turn out that it works for
>>> some ntpd versions, but not others, for example.)
>> Based on my _actual_ _tests_ (which, again, I have yet to see you deign
>> to conduct) ...
>>
>> Version 4.2.5p145 works as a stand alone Orphan Server. The refid is set
>> to 127.0.0.1 and the stratum to the orphan level shortly after start up
>> and the rootdispersion (now named rootdisp) is stable at a fixed value.
>>
>> Version 4.2.5p20, on the other hand, does not work as a stand alone
>> Orphan Server. This version of NTP leaves the refid set to .INIT. and
>> the stratum at 16. And the rootdispersion increases as ntpd runs.
>>
>> According to the ChangeLogs, changes to the Orphan Mode code were
>> incorporated to 4.2.4p5 and 4.2.5p124 on 2008/08/17. However 4.2.4p5
>> shows the same incorrect behavior.
>>
>> Several hours of testing have shown that only releases after (and
>> including) 4.2.5p101 are usable as stand alone Orphan Servers (e.g. in a
>> time island).
>>



