[ntp:questions] Isolated Network Drift Problem

Unruh unruh-spam at physics.ubc.ca
Fri Nov 21 16:06:14 UTC 2008


David Woolley <david at ex.djwhome.demon.co.uk.invalid> writes:

>Cal Webster wrote:
>> Our NTP servers are slowly losing time. All are in nearly perfect sync
>> but collectively drift backwards over time. Is there a way to apply a
>> bias to the drift calculations?

>Yes: adjust ntp.drift on the one machine that has the local clock configured.
>> 
>> We had to disconnect from the Internet several months ago. Since then we
>> have had serious drift problems. Shortly after the disconnect I
>> discovered that we were predictably losing 10 minutes every 15 days. I
>> tried several things but not until I zeroed out the
>> "driftfile" (/var/lib/ntp/drift) 10 days ago [Mon Nov 10 18:10:00 2008]
>> did this large drift abate.

>Drift > 463 ppm (500 ppm is ntpd's limit of correctable drift, when no
>phase noise is present).  Something is seriously broken.  I suspect that
>you have a lost timer interrupt problem and ntpd was papering over the
>cracks.  That has to be fixed at source.  If the loss of 10 minutes per
>15 days was consistent from when you started free-running, that is the
>only thing I can think of.  If it ramped up, another problem might be
>your misuse of local clock drivers.
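The 463 ppm figure is just the time lost divided by the elapsed interval. A minimal Python sketch of that arithmetic, with illustrative function names (not part of any NTP tool) and the 500 ppm bound taken from the paragraph above:

```python
SECONDS_PER_DAY = 86400
NTPD_MAX_PPM = 500.0  # ntpd's correctable frequency limit, as stated above


def drift_ppm(seconds_lost: float, interval_days: float) -> float:
    """Fractional frequency error in parts per million."""
    return seconds_lost / (interval_days * SECONDS_PER_DAY) * 1e6


def within_ntpd_range(ppm: float) -> bool:
    """True if the magnitude lies inside ntpd's +/-500 ppm correction range."""
    return abs(ppm) <= NTPD_MAX_PPM


if __name__ == "__main__":
    # 10 minutes lost every 15 days, as reported in the original post.
    ppm = drift_ppm(10 * 60, 15)
    print(f"{ppm:.0f} ppm, correctable: {within_ntpd_range(ppm)}")
```

For 600 s over 15 days this gives about 463 ppm: inside the bound, but with little margin once phase noise is added.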
>> 
>> Although it is much improved, we are still steadily losing time. Three
>> days after I zeroed the drift file [Thu Nov 13 15:04:00 EST 2008] we
>> were 32 seconds behind. Today, 10 days after zeroing it [Thu Nov 20 09:05:00 2008],
>> we are 1 min 54 secs behind. This works out to roughly 12 secs per day;
>> not bad, I guess, but it still requires regular monitoring.
>> 

>138 ppm is still way too high; temperature only tends to produce
>variations in the single-digit ppm range.  Whilst you will get some benefit by
>setting the drift file to 138, with the opposite sign from before, the
>instability you report indicates that you have a more serious problem to fix.
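The rate can also be estimated by fitting a straight line through timestamped offset readings, which averages out reading error as more readings are added. A Python sketch using the two readings from the report; it assumes both timestamps are in the same time zone (only the first is marked EST) and that a negative offset means the clock is behind. `fitted_drift_ppm` is a hypothetical helper:

```python
from datetime import datetime


def fitted_drift_ppm(observations):
    """Least-squares slope of clock offset versus elapsed time, in ppm.

    observations: list of (datetime, offset_seconds) pairs; a negative
    offset means the clock is behind true time (assumed convention).
    """
    t0 = observations[0][0]
    xs = [(t - t0).total_seconds() for t, _ in observations]
    ys = [offset for _, offset in observations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("need readings at two or more distinct times")
    return sxy / sxx * 1e6  # seconds per second -> parts per million


if __name__ == "__main__":
    readings = [
        (datetime(2008, 11, 13, 15, 4), -32.0),   # 32 s behind
        (datetime(2008, 11, 20, 9, 5), -114.0),   # 1 min 54 s behind
    ]
    ppm = fitted_drift_ppm(readings)
    print(f"{ppm:.1f} ppm, {ppm * 86400 / 1e6:.1f} s/day")
```

These two readings give about -141 ppm (roughly 12 s/day slow), close to the 138 ppm quoted above; the two figures cover slightly different intervals.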

>Before all the recent clock hacks in Linux, when using just the CTC 
>interrupts, 30 seconds a year was a reasonable target for an air 
>conditioned computer room and a reasonably stable processing load.

That was corrected by ntp. 138 ppm is not that far off the norm, especially
since we have no idea what his adjtimex corrections are.

Run

    adjtimex -p

The two key items in its output are frequency and tick.
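The two values can be folded into one rate figure. A Python sketch, assuming the `name: value` layout that `adjtimex -p` prints, a kernel with USER_HZ=100 (nominal tick of 10000 us, so one microsecond of tick is 100 ppm), and the adjtimex(8) convention that frequency is in units of 2^-16 ppm. The sample values are illustrative, not taken from the poster's machine:

```python
USER_HZ = 100                            # assumed kernel tick rate
NOMINAL_TICK_US = 1_000_000 // USER_HZ   # 10000 us per tick
PPM_PER_TICK_US = 1e6 / NOMINAL_TICK_US  # 100 ppm per microsecond of tick
FREQ_SCALE = 65536                       # frequency is in units of 2**-16 ppm


def parse_adjtimex(text):
    """Return {name: value} from the 'name: value' lines of adjtimex -p."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def kernel_correction_ppm(fields):
    """Total rate correction in ppm (assumed: positive speeds the clock up)."""
    tick = int(fields["tick"])
    freq = int(fields["frequency"])
    return (tick - NOMINAL_TICK_US) * PPM_PER_TICK_US + freq / FREQ_SCALE


# Hypothetical excerpt of adjtimex -p output.
SAMPLE = """\
    frequency: 6553600
         tick: 10001
"""

if __name__ == "__main__":
    print(f"{kernel_correction_ppm(parse_adjtimex(SAMPLE)):.1f} ppm")
```

A non-nominal tick shifts the baseline that ntpd's frequency correction is measured against, which is why both fields matter when judging the drift file value.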

Note that if you use the TSC clocksource in Linux, the drift rate will fluctuate
with each reboot, because the TSC frequency is recalibrated at boot.

 

>> server 127.127.1.0
>> fudge   127.127.1.0 stratum 5

>If you have a time island, there should be exactly one master server 
>with a relatively low stratum local clock, although stratum 5 is 
>dangerously low.  Your target should be that you end up with some 
>clients at stratum 14 or 15.

??? Why would they be that high? The clients are surely all getting their
time from that one master, so their stratum should be one higher than its. Also, who
cares what stratum he declares his master to be? If he really never goes to
the net, he could make it stratum 1 for all ntp cares.
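The stratum rule being argued about here is simple: each machine's stratum is one more than that of the server it follows, and stratum 16 means unsynchronized. A Python sketch over a hypothetical time island (host names invented for illustration), taking the stratum the master advertises as an input rather than assuming how a fudged local clock maps to it:

```python
UNSYNCHRONIZED = 16  # NTP stratum value meaning "not synchronized"


def strata(master, master_stratum, upstream):
    """Compute each host's stratum from the server it currently follows.

    upstream maps host -> its chosen server. Chains that would exceed
    stratum 15 are reported as unsynchronized (16).
    """
    result = {master: master_stratum}

    def resolve(host, seen=()):
        if host in result:
            return result[host]
        if host in seen:
            raise ValueError(f"synchronization loop at {host}")
        parent = resolve(upstream[host], seen + (host,))
        result[host] = min(parent + 1, UNSYNCHRONIZED)
        return result[host]

    for host in upstream:
        resolve(host)
    return result


if __name__ == "__main__":
    # Hypothetical island: one master, two clients, one client-of-client.
    island = {"client-a": "master", "client-b": "master", "leaf": "client-a"}
    for host, st in sorted(strata("master", 6, island).items()):
        print(host, st)
```

In a flat island like this the clients sit one level below the master; reaching 14 or 15 would need either a deep chain or a high master stratum.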



>Any pure clients should not have a local clock.  That is universally
>true, not just for time islands.  For the remaining machines, you should
>either specify a clear hierarchy, with steps of two in the local
>clock stratum between each one, or, I think, orphan mode will work,
>provided the master server, with the local clock, never goes down for
>more than a few hours at a time.  (There is circumstantial evidence, in
>a recent thread, that root dispersion will diverge on orphan mode
>servers until they get rejected for excessive root distance.)


>> 
>> 
>> [root at axl /]# cat /etc/adjtime
>> ------------------------------------
>> 44.508790 1226358437 0.000000
>> 1226358437
>> LOCAL

>You should not use hwclock's /etc/adjtime drift correction and ntpd at the same time (actually, if you are
>careful, you may be able to use it for correcting the time across a
>period in which the machine is powered down, but doing so requires
>special considerations).
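For reference, hwclock(8) documents /etc/adjtime as three lines: the systematic RTC drift in seconds per day, the time of the last adjustment and a zero kept for compatibility; then the time of the last calibration; then UTC or LOCAL. A Python sketch that reads the file quoted above and expresses the stored drift in ppm. Note that this describes the RTC, a different oscillator from the system clock that ntpd disciplines; `parse_adjtime` is a hypothetical helper:

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def parse_adjtime(text):
    """Parse hwclock's /etc/adjtime (three-line format per hwclock(8))."""
    lines = text.strip().splitlines()
    drift_s_per_day, last_adjust, _zero = lines[0].split()
    return {
        "drift_s_per_day": float(drift_s_per_day),
        "last_adjust": datetime.fromtimestamp(int(last_adjust), tz=timezone.utc),
        "last_calibration": datetime.fromtimestamp(int(lines[1]), tz=timezone.utc),
        "rtc_mode": lines[2].strip(),  # "UTC" or "LOCAL"
    }


# Contents of /etc/adjtime as quoted in the thread.
ADJTIME = """\
44.508790 1226358437 0.000000
1226358437
LOCAL
"""

if __name__ == "__main__":
    info = parse_adjtime(ADJTIME)
    ppm = info["drift_s_per_day"] / SECONDS_PER_DAY * 1e6
    print(f"RTC drift factor: {ppm:.0f} ppm, last calibrated {info['last_calibration']}")
```

With the quoted values, the timestamp decodes to 2008-11-10 23:07:17 UTC (18:07 EST, the same evening the drift file was zeroed) and the drift factor to roughly 515 ppm.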



