[ntp:questions] Flash 400 on all peers; can't get ntpd to be happy

Chuck Swiger cswiger at mac.com
Tue Mar 8 23:26:34 UTC 2011


On Mar 8, 2011, at 1:18 PM, Steve Kostecke wrote:
> On 2011-03-08, Chuck Swiger <cswiger at mac.com> wrote:
>> Seriously, each physical machine only has one RTC and crystal
>> oscillator. It's useful to run one instance of ntpd in the Dom0 (or
>> host ESX) context where it can actually work and keep this real
>> hardware clock in sync.
> 
> NTP disciplines the system (i.e. kernel) clock, not the hardware clock
> on the mother board.

That's right, although it's reasonably common for platforms to periodically write the system clock time back to the hardware clock, variously called the RTC/TOD/TOY clock, which lives in the BIOS/EFI/firmware and keeps time when the system is off.

The kernel/system clock is typically based on a timer source like ACPI or HPET, which in turn uses a crystal oscillator running at some fairly rapid rate (the HPET counter runs at 10 MHz or more, for example), rather than the ~32kHz frequency of a classic RTC.  The timer generates interrupts at kern.hz (or a multiple, perhaps, if you're running a separate profile or stats clock for profiling or process usage), which invoke the scheduler and call hardclock or equivalent.
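
To give a feel for the gap in resolution, here is a minimal Python sketch of the arithmetic. The 32.768 kHz RTC rate and the 10 MHz HPET floor are the usual figures; kern.hz=1000 is only an assumed value for illustration, since it varies by system.

```python
def tick_period_us(freq_hz):
    """Length of one tick, in microseconds, for a source running at freq_hz."""
    return 1_000_000 / freq_hz

RTC_HZ = 32_768        # classic RTC crystal
HPET_HZ = 10_000_000   # HPET counter, taken at its 10 MHz floor (assumption)
KERN_HZ = 1000         # assumed kern.hz; check your own system

rtc_us = tick_period_us(RTC_HZ)
hpet_us = tick_period_us(HPET_HZ)
hpet_ticks_per_hardclock = HPET_HZ // KERN_HZ

print(f"RTC tick:  {rtc_us:.2f} us")
print(f"HPET tick: {hpet_us:.2f} us")
print(f"HPET ticks per hardclock interrupt: {hpet_ticks_per_hardclock}")
```

Under those assumptions one RTC tick is about 30.5 microseconds against 0.1 microseconds for the HPET counter.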

Anyway, there isn't a separate RTC *or* timer crystal driving ACPI/HPET/etc for each VM.

>> Running ntpd's in the other DomUs/guest VMs is almost entirely
>> pointless; it might be useful only if Dom0->DomU time is busted,
>> and even in that case, ntpd is unlikely to ever obtain good time
>> synchronization running in a DomU.
> 
> That's debatable.

Evidently.  :-)

> I have a Debian 6.0 system running as a VMWare guest. ntpd on this
> system has no problem disciplining the clock.

OK.  Does it do any better than using VMWare's "tools.syncTime = true"?

> Recent peer billboard snapshot:
> 
> steve at www:/var/log/ntpstats$ ntpq -p
> remote       refid  st t when poll reach   delay   offset jitter
> ================================================================
> +ntp.my.isp  .GPS.   1 u   34 1024  377   60.665    1.623 1.617
> -enob...     .PPS.   1 u 1041 1024  377   39.552   -8.220 2.120
> *emit...     .PPS.   1 u  184 1024  377   27.404    3.936 1.347
> +yamo...     [snip]  2 u  768 1024  377   33.565   -1.757 2.256
> -3snd...     [snip]  2 u  102 1024  377   26.294    7.261 1.179

Your jitter values are well over an order of magnitude worse than those from ntpd running on a non-virtualized machine, and your offsets are nearly an order of magnitude worse:

% ntpq -p -c rv
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-ntp.pbx.org     192.5.41.40      2 u  119  256  377   22.076    0.946   0.027
*bonehed.lcs.mit .PPS.            1 u  183  256  377   23.741   -0.079   0.027
+hickory.cc.colu 128.59.39.48     2 u  138  256  377   22.427   -0.210   0.049
+time1.apple.com 17.107.131.11    2 u  168  256  377   55.828    0.315   0.202
[ ... ]
associd=0 status=0694 leap_none, sync_ntp, 9 events, freq_mode,
version="ntpd 4.2.4p5-a Wed Feb 16 17:12:20 EST 2011 (1)",
processor="i386", system="FreeBSD/7.4-PRERELEASE", leap=00, stratum=2,
precision=-19, rootdelay=23.741, rootdispersion=25.764, peer=5314,
refid=18.26.4.105,
reftime=d1212f3d.75251aea  Tue, Mar  8 2011 17:42:05.457, poll=8,
clock=d1213495.8f71f337  Tue, Mar  8 2011 18:04:53.560, state=4,
offset=-0.079, frequency=19.348, jitter=0.167, noise=0.032,
stability=0.001, tai=0
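
For a rough check of that comparison, here is a small Python sketch that averages the offset and jitter columns of the two billboards above. The peer rows are copied from this thread; the function and variable names are made up for illustration, and the second listing is truncated, so only the four visible peers are counted.

```python
from statistics import mean

VM_GUEST = """\
+ntp.my.isp  .GPS.   1 u   34 1024  377   60.665    1.623 1.617
-enob...     .PPS.   1 u 1041 1024  377   39.552   -8.220 2.120
*emit...     .PPS.   1 u  184 1024  377   27.404    3.936 1.347
+yamo...     [snip]  2 u  768 1024  377   33.565   -1.757 2.256
-3snd...     [snip]  2 u  102 1024  377   26.294    7.261 1.179
"""

PHYSICAL = """\
-ntp.pbx.org     192.5.41.40      2 u  119  256  377   22.076    0.946   0.027
*bonehed.lcs.mit .PPS.            1 u  183  256  377   23.741   -0.079   0.027
+hickory.cc.colu 128.59.39.48     2 u  138  256  377   22.427   -0.210   0.049
+time1.apple.com 17.107.131.11    2 u  168  256  377   55.828    0.315   0.202
"""

def peer_stats(billboard):
    """Return (mean |offset|, mean jitter) in ms from ntpq -p peer rows."""
    offsets, jitters = [], []
    for row in billboard.splitlines():
        fields = row.split()
        if len(fields) < 10:
            continue  # skip blank or truncated rows
        # The last three columns are delay, offset and jitter.
        offsets.append(abs(float(fields[-2])))
        jitters.append(float(fields[-1]))
    return mean(offsets), mean(jitters)

vm_offset, vm_jitter = peer_stats(VM_GUEST)
hw_offset, hw_jitter = peer_stats(PHYSICAL)
jitter_ratio = vm_jitter / hw_jitter
offset_ratio = vm_offset / hw_offset

print(f"jitter: VM {vm_jitter:.3f} ms vs {hw_jitter:.3f} ms ({jitter_ratio:.0f}x)")
print(f"offset: VM {vm_offset:.3f} ms vs {hw_offset:.3f} ms ({offset_ratio:.0f}x)")
```

On the rows shown, the mean jitter in the VM comes out around 22 times higher and the mean absolute offset around 12 times higher. A single snapshot with a handful of peers is a thin sample, so treat the ratios as indicative only.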

For all of that, your VM is doing pretty well running ntpd compared to others I've seen.  I'd imagine the host running the VM isn't especially busy; if it were, I wouldn't be surprised if ntpd couldn't manage to discipline the clock without "tinker panic 0".

Seriously, even VMware documents this; for example, see http://kb.vmware.com/kb/1006427:

"The configuration directive tinker panic 0 instructs NTP not to give up if it sees a large jump in time. This is important for coping with large time drifts and also resuming virtual machines from their suspended state.
 
Note: The directive tinker panic 0 must be at the top of the ntp.conf file.
 
It is also important not to use the local clock as a time source, often referred to as the Undisciplined Local Clock. NTP has a tendency to fall back to this in preference to the remote servers when there is a large amount of time drift."
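
As a sketch of how one might check an ntp.conf against those two recommendations, here is a hypothetical Python helper (check_ntp_conf is not part of ntpd or of VMware's tools). It assumes "at the top" means the first non-comment directive, and it flags server/fudge lines in 127.127.1.x, the address range of ntpd's local clock driver. The pool.ntp.org servers in the sample are placeholders, not taken from this thread.

```python
def check_ntp_conf(text):
    """Return a list of problems found in ntp.conf text (empty if none)."""
    # Keep only real directives: drop blank lines and '#' comments.
    directives = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            directives.append(line.split())

    problems = []
    if not directives or directives[0] != ["tinker", "panic", "0"]:
        problems.append("'tinker panic 0' is not the first directive")
    for words in directives:
        # 127.127.1.x selects the (undisciplined) local clock driver.
        if (words[0] in ("server", "fudge") and len(words) > 1
                and words[1].startswith("127.127.1.")):
            problems.append("local clock configured: " + " ".join(words))
    return problems

GOOD = """\
# sample guest config (placeholder servers)
tinker panic 0
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
"""

BAD = """\
server 0.pool.ntp.org
server 127.127.1.0
fudge 127.127.1.0 stratum 10
tinker panic 0
"""

good_problems = check_ntp_conf(GOOD)
bad_problems = check_ntp_conf(BAD)
for problem in bad_problems:
    print(problem)
```

The first sample passes cleanly; the second is flagged both for the misplaced tinker line and for the two local clock lines.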

>> You are better off running ntpdate (or sntp) periodically via cron in
>> the DomUs.
> 
> Perhaps in certain cases, but not across the board.

I'd be happy to review counterexamples to my generalization....

Regards,
-- 
-Chuck

PS: I just updated this system two weeks ago, but it's still running the system-provided /usr/sbin/ntpd.  At least this thread has reminded me to switch to the 4.2.6p2 in /usr/local.  :-)



