[ntp:questions] Re: ntpd, boot time, and hot plugging
David L. Mills
mills at udel.edu
Thu Feb 3 19:50:06 UTC 2005
It is true that ntpd is specifically engineered for Internet badlands
where popcorn spikes, evil masqueraders, misbehaving clocks and other
vermin might poison your DNS cache. A client using the pool servers
scheme really needs this ammunition.
The current version tries hard to strike a compromise between verifiable
assertions and acquisition speed. However, with the twinkle I described
earlier you can engineer any compromise you wish, including setting the
clock on the first response received or, in principle, once the first two
responses have arrived from at least two servers, and so on.
In very many simulation runs here I found it hard to get into a true
lockup condition where the daemon did not recover from large initial time
or frequency offsets, even with the -x option. However, there are some
things the simulator can't pick up, like a large frequency offset
pre-programmed in the kernel. The recommended repair procedure, should
that somehow happen, is to run "ntptime -f 0" to kill the kernel offset
and then remove the ntp.drift file. Upon restart ntpd measures the
intrinsic frequency offset over about fifteen minutes, sets the clock
and resumes normal operation. I would think this a good way to determine
if a motherboard is or is not acceptable. I've seen lots of motherboards
and found most of them within 100 PPM and all of them within 500 PPM.
Even if over 500 PPM the clock is still disciplined but the offset
cannot be forced to zero.
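
To put those PPM figures in perspective, here is a minimal sketch of the
arithmetic. It is illustrative only and not taken from the ntpd source; the
function names are hypothetical, and the clamp at 500 PPM simply models the
limit mentioned above.

```python
# Illustrative arithmetic only; names are hypothetical, not ntpd code.
SECONDS_PER_DAY = 86_400
MAX_FREQ_PPM = 500.0  # correction limit quoted in the post


def drift_seconds_per_day(ppm):
    """Time error accumulated in one day by a clock that is `ppm` PPM off."""
    return ppm * SECONDS_PER_DAY / 1_000_000


def residual_ppm(intrinsic_ppm, limit_ppm=MAX_FREQ_PPM):
    """Frequency error left over when the correction is clamped to +/- limit_ppm."""
    correction = max(-limit_ppm, min(limit_ppm, intrinsic_ppm))
    return intrinsic_ppm - correction


if __name__ == "__main__":
    for ppm in (100.0, 500.0, 600.0):
        print(ppm, drift_seconds_per_day(ppm), residual_ppm(ppm))
```

Under this simple model an uncorrected 100 PPM error amounts to roughly
8.6 seconds per day, and a clock that is intrinsically 600 PPM off keeps a
100 PPM residual, which is why the offset cannot be driven to zero.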
So, best advice is to run ntpd with -g and "tos maxdist 16" in the
configuration file. I assume the version with this command will soon
appear as a snapshot. Note that the only thing this does is admit
servers to the selection algorithm no matter what the synchronization
distance is. Ordinarily the distance starts from 16 and reduces by half
for each response received. Other than this criterion, the algorithms
operate without change. You can do your own security analysis.
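
As a rough sketch of why "tos maxdist 16" admits a server on its first
sample, the following models only the halving described above. It is a
simplification and not the actual ntpd computation; the function name is
hypothetical, and the 1.5 threshold used in the example is an assumed value
chosen for illustration, not a documented default.

```python
# Simplified model: the distance starts at 16 and halves with each response.
# Hypothetical helper, not part of ntpd.
def responses_until_acceptable(maxdist, initial_distance=16.0):
    """Count responses needed before the modeled distance drops below maxdist."""
    if maxdist <= 0:
        raise ValueError("maxdist must be positive")
    distance = initial_distance
    responses = 0
    while True:
        responses += 1
        distance /= 2  # each response halves the distance
        if distance < maxdist:
            return responses


if __name__ == "__main__":
    print(responses_until_acceptable(16.0))  # threshold from "tos maxdist 16"
    print(responses_until_acceptable(1.5))   # assumed tighter threshold
```

In this model a threshold of 16 is met after a single response, while the
assumed threshold of 1.5 takes four, in line with the three/four rounds
mentioned in the quoted message below.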
Tom Smith wrote:
> David L. Mills wrote:
>> This is the single most persistent issue in the engineering design of
>> NTP. There must be tradeoffs between security, robustness, accuracy
>> and initial delay. In the current design compromise, a server is
>> acceptable only after three/four rounds of messages and the ensemble
>> time is acceptable with at least one of possibly several acceptable
>> servers. With IBURST mode, this takes 6-8 seconds.
>> For better robustness use "tos minclock N", where at least N
>> (default 1) servers must be acceptable to set the clock. Tonight I put
>> in a "tos maxdist M", where M is the distance threshold below which
>> the server is acceptable. Set "tos maxdist 16" and the first sample
>> received from any server will set the clock lickety-split. Of course,
>> essentially all the mitigation algorithms using multiple-sample
>> redundancy and multiple-server diversity are systematically defeated.
>> You might as well use SNTP.
> I know the subject has been workstations, but let's talk for a moment
> about this religion as it concerns servers - like the ones that run
> telephone companies, stock exchanges, and banks inside heavily
> defended firewalls. It's the same issue, it's just that the stakes
> are higher. The issue is how quickly can you get these
> systems back up at boot. 15-30 seconds is a long time to wait.
> Too long.
> We're not talking about one-shot sampling for maintaining the time,
> so comparisons to SNTP are not helpful. We're talking about speed of
> acquisition of an initial "good enough" time, keeping in mind that the
> perfect is often the enemy of the good.
> You might argue that if boot time is critical, just let the server come
> up with whatever random time it comes up with and let ntpd fix
> it up later. Give it a "-g" so it doesn't complain. A lot of folks
> have tried this in the past inadvertently (and continue to do so)
> by neglecting to put ntpdate into their boot sequence ahead of ntpd.
> I've fixed a lot of systems whose drift files were pinned
> at 500 ppm and whose systems ran perpetually fast or slow as
> a result. We've also spent a lot of money fruitlessly replacing
> motherboards on those systems. Turning a large initial offset over
> to ntpd is decidedly NOT a Good Idea.
> The reason why so many of your constituency keep bringing this
> subject up is that they know that ntpd needs a good (not perfect)
> estimate of the time before it starts and that critical systems
> can't wait for perfection to get that estimate.
> Tom Smith smith at alum.mit.edu,smith at cag.lkg.hp.com
> Hewlett-Packard Company Tel: +1 (603) 884-6329
> 110 Spit Brook Road ZKO1-3/H42 FAX: +1 (603) 884-6484
> Nashua, New Hampshire 03062-2698, USA Mobile: +1 978 397 3411