[ntp:questions] Isolated Network Drift Problem

Cal Webster cwebster at ec.rr.com
Fri Nov 21 22:25:54 UTC 2008


Thanks much to you and the others who have provided useful information.
I'll digest all this and try something else Monday. 

Couple of notes:

"adjtimex" is not available on our systems. They're all Red Hat/Fedora
or derived from them. I do have "hwclock" but I don't think it will do
what was suggested.

If I set a "bias" value in the drift file, won't NTP change it anyway?
After I zeroed it out, it has been changing over time.

I'd like to stay away from a hierarchy with a single point of failure.

Best Regards,

./Cal


On Fri, 2008-11-21 at 16:06 +0000, Unruh wrote:
> David Woolley <david at ex.djwhome.demon.co.uk.invalid> writes:
> 
> >Cal Webster wrote:
> >> Our NTP servers are slowly losing time. All are in nearly perfect sync
> >> but collectively drift backwards over time. Is there a way to apply a
> >> bias to the drift calculations?
> 
> >ntp.drift on the one machine with the local clock configured.
> >> 
> >> We had to disconnect from the Internet several months ago. Since then we
> >> have had serious drift problems. Shortly after the disconnect I
> >> discovered that we were predictably losing 10 minutes every 15 days. I
> >> tried several things but not until I zeroed out the
> >> "driftfile" (/var/lib/ntp/drift) 10 days ago [Mon Nov 10 18:10:00 2008]
> >> did this large drift abate.
> 
> >Drift > 463ppm (500ppm is ntpd's limit of correctable drift, when no 
> >phase noise is present).  Something is seriously broken.  I suspect that 
> >you have a lost timer interrupts problem and ntpd was papering over the 
> >cracks.  That has to be fixed at source.  If the 10/15 minutes a day was 
> >consistent from when you started free-running, that is the only thing I 
> >can think of.  If it ramped up, another problem might be your misuse of 
> >local clock drivers.
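
The 463 ppm figure above follows directly from the reported loss of 10 minutes every 15 days. A minimal sketch of that arithmetic in Python (the function and constant names are illustrative only; the 500 ppm limit is the value quoted in the post, not something the code measures):

```python
SECONDS_PER_DAY = 86_400

# ntpd's correctable drift limit, as quoted in the post above.
NTPD_LIMIT_PPM = 500.0


def drift_ppm(seconds_lost: float, elapsed_days: float) -> float:
    """Average frequency error in parts per million."""
    return seconds_lost / (elapsed_days * SECONDS_PER_DAY) * 1e6


# 10 minutes lost every 15 days, as reported by the original poster.
reported = drift_ppm(seconds_lost=10 * 60, elapsed_days=15)
headroom = NTPD_LIMIT_PPM - reported

print(f"drift: {reported:.1f} ppm, headroom to limit: {headroom:.1f} ppm")
```

The same loss can be read as 10/15 of a minute, or 40 seconds, per day.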
> >> 
> >> Although it is much improved, we are still steadily losing time. Three
> >> days after I zeroed the drift file [Thu Nov 13 15:04:00 EST 2008] we
> >> were 32 seconds behind. Today, 10 days later [Thu Nov 20 09:05:00 2008]
> >> we are 1 min 54 secs behind. This works out to roughly 12 secs per day -
> >> not bad I guess but still requires regular monitoring.
> >> 
> 
> >138 ppm is still way too high; temperature only tends to produce 
> >variations in the single figures.  Whilst you will get some benefit by 
> >setting the drift file to 138, with the opposite sign from before, the 
> >instability you report indicates that you have a more serious problem to fix.
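
The two offsets reported since the drift file was zeroed can be turned into rates the same way. This is only a sketch: it assumes all three timestamps are in the same time zone (only one of them is marked EST in the post), and the variable names are made up for illustration.

```python
from datetime import datetime

# Moment the drift file was zeroed, per the original post.
zeroed = datetime(2008, 11, 10, 18, 10)

# (time of observation, seconds behind) as reported in the post.
observations = [
    (datetime(2008, 11, 13, 15, 4), 32.0),
    (datetime(2008, 11, 20, 9, 5), 114.0),
]


def rate_ppm(seconds_behind: float, start: datetime, end: datetime) -> float:
    """Average frequency error between start and end, in ppm."""
    return seconds_behind / (end - start).total_seconds() * 1e6


first_ppm = rate_ppm(observations[0][1], zeroed, observations[0][0])
overall_ppm = rate_ppm(observations[1][1], zeroed, observations[1][0])
overall_s_per_day = overall_ppm * 86_400 / 1e6

print(f"first interval: {first_ppm:.1f} ppm")
print(f"overall: {overall_ppm:.1f} ppm ({overall_s_per_day:.1f} s/day)")
```

Under those assumptions the overall rate comes out near 137 ppm, a little under 12 seconds per day, which agrees with the rounded figures quoted above. The first three days alone give a somewhat lower rate (about 129 ppm), so the loss is not perfectly steady.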
> 
> >Before all the recent clock hacks in Linux, when using just the CTC 
> >interrupts, 30 seconds a year was a reasonable target for an air 
> >conditioned computer room and a reasonably stable processing load.
> 
> That was corrected by ntp. 138PPM is not that far off the norm. Especially
> since we have no idea what his adjtimex corrections are. 
> 
> Run
> adjtimex -p
> The two key items are frequency and tick.
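
For reading those two values: as I understand the adjtimex(8) man page, on a kernel with USER_HZ=100 the nominal tick is 10000 microseconds, one unit of tick is worth about 100 ppm, and a frequency value of 65536 is worth about 1 ppm. A rough sketch of combining them into one ppm figure, under those assumptions (the sample tick and frequency values are hypothetical, not taken from any machine in this thread):

```python
# Assumes USER_HZ=100, i.e. 100 ticks per second of 10000 microseconds each.
NOMINAL_TICK_US = 10_000

# adjtimex reports frequency in units where 65536 corresponds to 1 ppm.
FREQ_UNITS_PER_PPM = 65_536


def total_correction_ppm(tick: int, frequency: int) -> float:
    """Combine adjtimex 'tick' and 'frequency' into a single ppm figure."""
    tick_ppm = (tick - NOMINAL_TICK_US) / NOMINAL_TICK_US * 1e6
    freq_ppm = frequency / FREQ_UNITS_PER_PPM
    return tick_ppm + freq_ppm


# Hypothetical readings: one tick unit above nominal plus 38 ppm of frequency.
example = total_correction_ppm(tick=10_001, frequency=38 * 65_536)
print(f"kernel is applying about {example:.1f} ppm of correction")
```

A positive result means the kernel is being told to run the clock faster than nominal by that amount.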
> 
> Note that if you use the tsc clock in Linux, that drift rate will fluctuate
> with each reboot.
> 
>  
> 
> >> server 127.127.1.0
> >> fudge   127.127.1.0 stratum 5
> 
> >If you have a time island, there should be exactly one master server 
> >with a relatively low stratum local clock, although stratum 5 is 
> >dangerously low.  Your target should be that you end up with some 
> >clients at stratum 14 or 15.
> 
> ??? Why would they be that high? The clients are surely all getting their
> time from that one master, and their stratum should be one higher. Also who
> cares what stratum he declares his master to be. If he really never goes to
> the net, he could make it stratum 1 for all ntp cares. 
> 
> 
> 
> >Any pure clients should not have a local clock.  That is universally 
> >true, not just for time islands.  For the remaining machines, you should 
> >either specify a clear hierarchy, with steps of two in the local 
> >clock stratum between each one, or, I think orphan mode will work, 
> >providing the master server, with the local clock, never goes down for 
> >more than a few hours at a time.  (There is circumstantial evidence, in 
> >a recent thread, that root dispersion will diverge on orphan mode 
> >servers until they get rejected for excessive root distance.)
> 
> 
> >> 
> >> 
> >> [root at axl /]# cat /etc/adjtime
> >> ------------------------------------
> >> 44.508790 1226358437 0.000000
> >> 1226358437
> >> LOCAL
> 
> >You should not use this and ntpd at the same time (actually, if you are 
> >careful, you may be able to use it for correcting the time across a 
> >period in which the machine is powered down, but doing so requires 
> >special considerations
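
For reference, the /etc/adjtime contents quoted above can be decoded as follows. This sketch assumes the layout described in hwclock(8): the first line holds the drift rate in seconds per day, the time of the last adjustment as seconds since the epoch, and a zero; the second line holds the time of the last calibration; the third says whether the hardware clock keeps UTC or LOCAL time. The row of dashes in the quoted output is taken to be decoration and is left out. The function name is illustrative only.

```python
from datetime import datetime, timezone

# File contents as quoted in the post, without the row of dashes.
ADJTIME_TEXT = """\
44.508790 1226358437 0.000000
1226358437
LOCAL
"""


def parse_adjtime(text: str) -> dict:
    """Decode /etc/adjtime, assuming the layout described in hwclock(8)."""
    lines = text.strip().splitlines()
    drift_s_per_day, last_adjust, _unused = lines[0].split()
    drift = float(drift_s_per_day)
    return {
        "drift_s_per_day": drift,
        "drift_ppm": drift / 86_400 * 1e6,
        "last_adjust_utc": datetime.fromtimestamp(int(last_adjust), tz=timezone.utc),
        "last_calibration_utc": datetime.fromtimestamp(int(lines[1]), tz=timezone.utc),
        "clock_mode": lines[2].strip(),
    }


info = parse_adjtime(ADJTIME_TEXT)
print(f"hwclock drift: {info['drift_s_per_day']} s/day ({info['drift_ppm']:.1f} ppm)")
print(f"last adjusted: {info['last_adjust_utc'].isoformat()}")
```

Read this way, the file records a hardware clock drift of about 44.5 seconds per day, roughly 515 ppm.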
> 



