[ntp:questions] losing time fast
Ron Frazier (NTP)
timekeepingntplist at techstarship.com
Tue Jul 10 13:09:29 UTC 2012
unruh <unruh at invalid.ca> wrote:
>On 2012-07-09, Ron Frazier (NTP) <timekeepingntplist at techstarship.com>
>wrote:
>>
>> unruh <unruh at invalid.ca> wrote:
>>
>>>On 2012-07-09, Dave Hart <hart at ntp.org> wrote:
>>>> On Mon, Jul 9, 2012 at 18:14 UTC, Fritz Wuehler wrote:
>>>>> I noticed the clock on my main desktop was off by 28 minutes today and it
>>>>> increased to 45 minutes. I resync'd with ntpdate manually and it has drifted
>>>>> behind again about 7 minutes in the last few hours.
>>>>>
>>>>> I am using ntp version 4.2.4p7 which was installed with Slackware on Linux
>>>>> kernel 2.6.29.6. Until today the clock on this system has always matched the
>>>>> clocks of the other machines on my network. The system has been running for
>>>>> several years essentially unchanged.
>>>>>
>>>>> The only thing that changed (that I know of) is I added a new machine to my
>>>>> network recently. Its clock matches all the other clocks. I don't see any
>>>>> unusual messages from ntpd in my log or messages files on the system with
>>>>> the problem. One system has problems, all others appear to be fine and have
>>>>> synchronized clocks.
>>>>>
>>>>> I realize this isn't much information but I don't know what to look for. Can
>>>>> anyone tell me how to troubleshoot this? Thank you.
>>>>
>>>> Stop ntpd using whatever means is normal for your OS. Find the path
>>>> to the drift file (which preserves an estimate of your system's clock
>>>> rate error):
>>>>
>>>> $ fgrep drift /etc/ntp.conf
>>>> driftfile /var/lib/ntp/drift
>>>>
>>>> Then remove it and restart ntpd. It will synchronize once then spend
>>>> 1024 seconds (17m) measuring the clock rate error. With any luck it
>>>> will be an accurate-enough estimate that ntpd will then converge on
>>>> its own.
>>>>
>>>
>>>If he is losing minutes per hour, this is hopeless. That is 16000PPM.
>>>ntp cannot correct that. Something is very very wrong.
>>>
>>>> Note to regulars: I'm going to have sporadic internet access for the
>>>> rest of July, so I won't be as responsive. Your help is welcome.
>>>>
>>>
>>
>> I'll admit to not following this thread too closely. I've been looking
>> at some of the posts. I will also admit to not being an NTP expert.
>> However, I remember once when I was getting a clock error of a few
>> minutes per hour. I think (but wouldn't bet my life on it) that it may
>> have happened after I copied the NTP directory to another system and
>> the drift file was wrong. Therefore, the NTP program was running the
>> system software clock at the wrong rate and the clock was rapidly
>> getting too far out to correct.
>
>If he is really losing minutes per day, then there is no drift file
>that could do that.
>But certainly your suggestions are worth trying.
>
Hi Unruh,
I'm glad you like the suggestions. You may be right about the drift file. I don't know for sure what caused the runaway clock I experienced in the past. However, it's my understanding that NTP sets the computer clock frequency to what it finds in the drift file at startup. If the drift file is way off, for whatever reason, then the initial clock frequency will be way off. As far as I know, it won't write another drift file for an hour. So, during that hour, particularly if the polling interval is long, the clock can be running so fast or slow that it will drift so far out that NTP might just give up on it.
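For a rough sense of scale, here is a small Python sketch that converts a frequency error in PPM into the offset it would build up over an interval, and back again. This is just my own arithmetic, not anything ntpd itself runs. The function names and the sample values are made up for illustration, and it assumes the frequency error stays constant over the interval.

```python
def accumulated_offset_seconds(freq_error_ppm, interval_seconds):
    """Offset (in seconds) built up over an interval by a constant frequency error."""
    # 1 PPM means the clock gains or loses 1 microsecond per second of real time.
    return freq_error_ppm * 1e-6 * interval_seconds


def frequency_error_ppm(offset_seconds, interval_seconds):
    """Constant frequency error (in PPM) that would produce the given offset."""
    return offset_seconds / interval_seconds * 1e6


if __name__ == "__main__":
    one_hour = 3600
    # Sample drift values only; substitute whatever your own drift file contains.
    for ppm in (6.234, 106.234, 306.234):
        offset = accumulated_offset_seconds(ppm, one_hour)
        print(f"{ppm:8.3f} PPM for one hour -> {offset:.3f} s")
    # Going the other way: what frequency error loses one minute every hour?
    print(f"1 minute per hour -> {frequency_error_ppm(60, one_hour):.0f} PPM")
```

Running it with your own numbers should give a feel for how large a frequency error has to be before it adds up to minutes rather than seconds.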
Here's an experiment someone could try if they were inclined to. Warning, doing this will probably really irritate the experimenter, but also provide a learning experience. Find a PC which is running NTP, which no other PCs or critical users or applications depend on for time, and which you don't care if the clock is horribly wrong for a while. Make sure it's using a drift file, and preferably producing clockstats and loopstats error files for tracking. A Windows system may be more illustrative than Linux, but I'm not sure. Warning, doing the following will probably make the PC's clock run massively fast, necessitating a massive step change backwards later to fix it.
Stop NTPD.
Go find the drift file and make a backup copy of it.
Edit the drift file with elevated privileges if needed.
Say it says 6.234.
Add a 10, 20, or 30 in front of it, so it's 106.234 or 206.234 or 306.234 then save it.
If possible, add a local clock source in ntp.conf on the lan or using gps that you can poll very often. Set that source as noselect and polling every 8 or 16 seconds.
Set minpoll for other sources to 12 (4096 seconds, a little over 1 hour).
Restart NTPD.
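A side note on the poll values in the steps above: minpoll and maxpoll are powers of two in seconds, not seconds themselves. Here's a tiny Python sketch of the conversion. The function name is just something I made up for the example.

```python
def poll_interval_seconds(poll_exponent):
    """Convert an ntp.conf minpoll/maxpoll exponent to an interval in seconds."""
    # The configured value is log2 of the interval, so 6 means 2**6 = 64 seconds.
    return 2 ** poll_exponent


if __name__ == "__main__":
    for exponent in (3, 4, 6, 10, 12):
        seconds = poll_interval_seconds(exponent)
        print(f"poll {exponent:2d} = {seconds:5d} s ({seconds / 60:.1f} min)")
```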
If my thinking, and my memory of my prior experience, is correct, your clock will now be running massively fast. (I assume you could use negative numbers if you want the clock to run slower.) Monitor NTPQ and observe the offset to your local reference clock. You should see the system clock rapidly diverging from the reference clock. You may see NTP give up entirely trying to fix it and never correct it. You can then have loads of "fun" trying to fix the problem. If you knew what the drift file normally is, you could run ntpdate a few times to get the time close and restore the drift file manually to the correct value. However, it's more interesting to try to fix it by letting NTP restore everything to its normal setting.
When I had my runaway clock, something had originally kicked the clock about 30 seconds off. I don't know if that was related to the drift file. But, I also think I may have copied a drift file from another system as part of an NTP setup procedure. So, the system could never correct itself. The correct value for the drift file was very different for the two systems. I actually had to stop NTP a number of times, tweak the drift file by hand, restart NTP, and monitor NTPQ to see if the PC clock was still drifting away from a reference clock. Then, I'd tweak the drift file again and restart again. Each time, I got closer to the proper drift file value and hence to the proper clock frequency. After doing this a few times, I got the drift file close enough to the correct value for that system so NTP could finish things on its own.
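The hand-tweaking could be made less hit-and-miss with a bit of arithmetic. Here's a rough Python sketch of how I'd estimate the next drift value from two offset readings taken some time apart. It's my own illustration, not something NTP provides. It assumes the offsets are in milliseconds (as ntpq shows them), that offset means reference time minus local time, and that a larger drift file value makes the local clock run faster. If any of those sign conventions are backwards on your system, flip the sign of the correction.

```python
def suggest_drift_ppm(current_drift_ppm, offset_start_ms, offset_end_ms, elapsed_seconds):
    """Estimate a better drift file value from two offset readings.

    ASSUMPTIONS (check these on your own system):
      - offsets are reference minus local, in milliseconds
      - a larger drift value makes the local clock run faster
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # Rate of change of the offset in ms per second; 1 ms/s equals 1000 PPM.
    residual_ppm = (offset_end_ms - offset_start_ms) / elapsed_seconds * 1000.0
    # A growing offset means the local clock is falling behind the reference,
    # so the drift value should go up by the same amount (and vice versa).
    return current_drift_ppm + residual_ppm


if __name__ == "__main__":
    # Hypothetical readings: the offset went from 0 ms to -60 ms in 10 minutes
    # while the drift file said 106.234.
    print(f"try a drift value near {suggest_drift_ppm(106.234, 0.0, -60.0, 600):.3f}")
```

The longer the interval between the two readings, the less the estimate is thrown off by network jitter in any single reading.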
This experience was not fun, and it was painful, but I did learn some things. It was very frustrating to have the system's clock running so horribly wrong that I couldn't calibrate it. I should point out that we're not talking about the physical real time clock here, the chip. We're talking about the software clock, running with the frequency correction factors, good or bad, that have been applied to it, as I understand it.
I hope this info may help others experiencing wild clocks.
>>
>> I would recommend the following:
>>
>> a) Stop ntpd (as mentioned above)
>> b) check the config file to make sure it's configured to use a drift
>>    file, and find the location of it
>> c1) set both minpoll and maxpoll to 6 (64 seconds) if polling the
>>    internet
>> c2) set both minpoll and maxpoll to 4 (16 seconds) or 3 (8 seconds)
>>    if polling the LAN or a GPS
>> d) delete the drift file (as mentioned above)
>> e) find the startup script for ntpd, which might be in the
>>    /etc/init.d or similar folder, is probably named NTP, and see what
>>    parameters it uses, and make a backup of it
>> f) edit the startup script with elevated privileges (i.e. sudo, if
>>    applicable)
>> g) insert the parameter (which I cannot remember the letter of) which
>>    allows ntpd to step the time at first
>> h) save the startup script
>> i) sync to a national time standard server for his country 3 times in
>>    quick succession with ntpdate set to make a step change (In the USA, I
>>    would use NIST.)
>
>Why a national time server? No need for that kind of accuracy.
>
I was just thinking that we want to get the system clock as close as possible to "true" time. That way, NTPD has the maximum chance of being able to regain control of the system. Whatever server is used, make sure it's not an outlier that will be rejected later by the selection algorithm versus the other sources. That would cause your initial time setting to be further out and possibly cause a convergence failure when combined with the clock running at the wrong speed.
>
>> j) start ntpd back up
>> k) let it run several hours
>> l) this should set a valid drift file and rein in the clock speed
>>    fairly rapidly
>> m) stop ntpd
>> n) reset the startup script to the way it was unless you want to
>>    leave the step command in there
>> o) reset the config file for the original minpoll and maxpoll
>> p) restart ntpd
>>
>> Hopefully after a few more hours of running, the clock will be
>> stable. You can even put the stop ntpd, ntpdate, ntpdate, ntpdate,
>> start ntpd sequence in your own script and run that for greater speed
>> and accuracy of the ntpdate sequences and minimal delay restarting
>> ntpd.
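Here's a rough sketch of that kind of wrapper script in Python, in case it helps. The /etc/init.d/ntp path is only a guess at the startup script location (it differs between distributions), and time.nist.gov is just the example server I'd use in the USA. The -b option tells ntpdate to step the clock rather than slew it. By default the script only prints the commands. Pass --run (as root) to actually execute them.

```python
import subprocess
import sys


def build_resync_commands(server, stop_cmd, start_cmd, syncs=3):
    """Return the stop / ntpdate x N / start sequence as a list of argv lists."""
    commands = [list(stop_cmd)]
    # ntpdate -b forces a step change instead of a gradual slew.
    commands += [["ntpdate", "-b", server] for _ in range(syncs)]
    commands.append(list(start_cmd))
    return commands


def run_commands(commands, dry_run=True):
    """Print each command, and execute it unless dry_run is set."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # check=True stops the sequence at the first command that fails.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    cmds = build_resync_commands(
        "time.nist.gov",                        # example server only
        stop_cmd=["/etc/init.d/ntp", "stop"],   # ASSUMED path, adjust for your OS
        start_cmd=["/etc/init.d/ntp", "start"],
    )
    run_commands(cmds, dry_run="--run" not in sys.argv)
```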
>
--
Ron Frazier
timekeepingdude AT techstarship.com