[ntp:questions] Re: ntpd, boot time, and hot plugging

David L. Mills mills at udel.edu
Fri Feb 4 02:44:50 UTC 2005


Hangups such as you describe are exactly what the simulator is designed to 
reveal, and it has revealed them from time to time as folks learn new 
ways to misconfigure and warp the hardware and as new operating system 
violations of the Principle of Least Astonishment occur. I try to keep 
ahead of those as they pop up, but I do confirm that any combination of 
broken frequency file and initial clock error does damp out eventually, 
although sometimes with behavior like a pinball machine. Emphasis added: 
I can confirm that all the atrocities found so far damp out only in the 
latest (development) version.

There are a great many divergent views on what to expect of the NTP 
model, some of them contradictory and unworkable for a specific combination 
of ntpd and j-random operating system. Solaris 2.7 comes to mind, where 
the kernel determined the CPU clock frequency in error, far beyond the 
tolerance of ntpd. I've had to simulate all of these things, including a 
system with a clock resolution of one second (sic), and it works. It 
could be that your systems suffer from one or another of these ills, or 
simply that you are using an older version that predates the recent 
taming. It could also be that your operating system has the same 
ill-mannered behavior as current Solaris adjtime(). With large time 
adjustments, this turns ntpd into a megawatt oscillator.
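The quoted message that follows reports drift files stuck at +-500.000, which suggests a quick sanity check before trusting a frequency file. The Python sketch below is hypothetical and not part of the ntpd distribution. It assumes that the drift file holds a single frequency value in PPM as its first token and that 500 PPM is the clamp limit (the +-500 PPM figure used in this thread); the function names are invented for illustration. It also converts a frequency error into seconds gained or lost per day, which shows why a pinned value matters: 500 PPM is 500e-6 * 86400 = 43.2 seconds per day.

```python
"""Hypothetical helper: flag an ntpd drift file pinned at the +-500 PPM limit."""
from pathlib import Path

# Assumed clamp limit, taken from the +-500 PPM figure discussed in this thread.
MAX_FREQ_PPM = 500.0


def read_drift_ppm(path):
    # Assumption: the drift file's first whitespace-separated token is the
    # frequency offset in parts per million.
    tokens = Path(path).read_text().split()
    return float(tokens[0])


def is_pinned(freq_ppm, limit=MAX_FREQ_PPM, tol=1e-3):
    # True when the stored frequency sits at (or beyond) the clamp limit,
    # which usually means the value is an artifact of a large offset being
    # worked off, not a real measurement of the crystal.
    return abs(freq_ppm) >= limit - tol


def drift_seconds_per_day(freq_ppm):
    # A frequency error of f PPM accumulates f * 1e-6 * 86400 seconds per day.
    return freq_ppm * 1e-6 * 86400
```

For example, `is_pinned(read_drift_ppm(path))` is true for a file containing `500.000` and false for one containing `-12.345`. The sketch only reports; deleting a flagged file and letting ntpd measure the intrinsic frequency again is the remedy described in the quoted message.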


Tom Smith wrote:

> David L. Mills wrote:
>> I get nervous about nonquantitative statements, since they might start 
>> urban legends. A "decent" frequency file is one created when first 
>> starting ntpd without the file and letting it determine the intrinsic 
>> frequency error. This takes about fifteen minutes. However, the 
>> frequency file itself is written only after the first hour and at 
>> hourly intervals after that. The discipline should be stable even if 
>> the frequency file is present and intentionally set as much as +-500 
>> PPM in error, and even with a large initial time offset. This has 
>> been confirmed by simulation; however, the simulations assume the 
>> adjtime() system call operates as in original Unix model; the Solaris 
>> adjtime() is a killer when large offsets are involved.
> A physicist I worked with early in my career taught me a very
> useful law. "Different things vary."
> I couldn't tell you how many ntp.drift files I've encountered
> with a value of +-500.000. It's a lot. There are many ways this
> can occur, but all of them involve ntpd starting up against a large
> offset with its reference clocks and/or shutting down while it is
> working one off, the latter usually because of an NTP misconfiguration,
> but also sometimes because of thunderstorms in July. Others have
> also observed how this happens on mobile systems that get booted
> and shut down a lot.
> Once a system is in this state, it depends on the specifics of the OS
> how long or even if that system will "settle". I can assure you that
> for some systems, if not most, this is most assuredly not 15 minutes,
> might be days, or might, for all practical purposes, be never. These
> are the systems on which the ordinary non-NTP-expert system
> manager or field support team will go through several rounds of
> battery or crystal or motherboard or system replacement before
> anyone tells them to just delete the drift file and start over.
> ________________________________________________________________________
> Tom Smith                       smith at alum.mit.edu,smith at cag.lkg.hp.com
> Hewlett-Packard Company                          Tel: +1 (603) 884-6329
> 110 Spit Brook Road ZKO1-3/H42                   FAX: +1 (603) 884-6484
> Nashua, New Hampshire 03062-2698, USA           Mobile: +1 978 397 3411
