[ntp:questions] ntpd wedged again

Dave Hart davehart_gmail_exchange_tee at davehart.net
Mon Feb 13 12:06:20 UTC 2012

On Sat, Feb 11, 2012 at 22:16, Chuck Swiger <cswiger at mac.com> wrote:
> On Feb 11, 2012, at 11:58 AM, Dave Hart wrote:
>> On Sat, Feb 11, 2012 at 17:17, Chuck Swiger <cswiger at mac.com> wrote:
>>>> Have you tried to time the minimum clock reading time with RDTSC
>>>> or GetPerformance* counter calls?
>>>> I wrote a tiny test program on my Win7-64 laptop, it got:
>>>> Reading the system clock 10000000 times, minimum reading time =
>>>> 24 clock cycles,
>>>> minimum OS step = 0 ticks, maximum OS step = 10000 ticks
>>>> The clock frequency is 2.7 GHz or so, the FileTime ticks should be
>>>> 100 ns each, so my OS clock is making 1 ms steps, while the clock
>>>> reading time is down to less than 10 ns!
>>> Well, the code above is not reading a clock; you're reading the
>>> stored value of what time it was when the kernel's scheduler last
>>> updated that value.  When the OS scheduler ticks, it reads a clock,
>>> then trims down the precision to the scheduler quantum (ie, 1ms
>>> for HZ=1000), and stores in a "commpage", which is mapped RO
>>> into userland processes.
>> Terje's code is reading the only clock available on Windows.
> Terje's GetSystemTimeAsFileTime() was exactly what I described
> above: a userland function which looks up a pre-stored value
> which gets updated periodically by the kernel, but is not actually
> calling a real clock which will return the time as seen when the
> clock is read *right* *then*.
>> It may not be what you think of as reading a clock based on your
>> understanding of other operating systems, but Windows isn't
>> necessarily the same as other operating systems.
> A clock isn't a stored value on a memory page, even if that value
> gets periodically updated within the system scheduler on per HZ
> interrupt, via the Windows multimedia timer, or whatever.
> A clock is an oscillator and a counter.  (Go read VMWare's
> "Timekeeping-In-VirtualMachines.pdf" or PHK's
> "timecounter.pdf" for considerably more detailed description
> and examples if this is unclear.)

By your definition, NTP was developed and used for quite a few years
on operating systems which lacked a clock.  I have to say, as
impressed as I have always been with NTP and Dr. Mills, I'm even more
impressed to know he spent a decade doing the impossible.

PHK's timecounter.pdf [1] (circa 2002-2004 [1][2]) says "We can
therefore with good fidelity define 'a clock' to be the combination of
an oscillator and a counting mechanism."

Nearly every computer system which provided time-of-day to
applications _ever_ meets that definition.  The classic tick-based
software clock I described meets the definition.  The oscillator was
in some cases simply the 50/60 Hz AC supply; much more commonly in the
home/personal computer era it was a quartz crystal paired with a
hardware counter circuit configured as a divider, feeding an interrupt
line triggered 10 to 100 times per second.  The counting mechanism was the
tick interrupt handler, which incremented the software clock by the
appropriate amount, 1/10th to 1/100th of a second (plus or minus a
smidge from adjtime use, where present).
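
A minimal sketch of the tick-based software clock described above,
assuming a toy model in Python.  The class and attribute names are
hypothetical; tick() stands in for the tick interrupt handler and read()
for the application-facing time-of-day call, which touches no hardware.

```python
from fractions import Fraction


class TickClock:
    """Toy model of a tick-based software clock (all names hypothetical)."""

    def __init__(self, hz=100, start=0):
        self.tick_len = Fraction(1, hz)   # nominal seconds added per tick
        self.now = Fraction(start)        # the stored "current time"
        self.slew = Fraction(0)           # per-tick trim, adjtime-style

    def tick(self):
        # Stand-in for the tick interrupt handler: the oscillator/divider
        # fired, so advance the stored time by one tick plus any trim.
        self.now += self.tick_len + self.slew

    def read(self):
        # Reading touches no hardware; it returns the stored value.
        return self.now


clock = TickClock(hz=100)
before = clock.read()
same = clock.read()          # no tick in between, so the value is unchanged
for _ in range(100):         # one second's worth of 100 Hz ticks
    clock.tick()
print(before == same, clock.read())   # True 1
```

The oscillator is outside the model; whatever calls tick() plays that
role, and the stored value plus the handler form the counting mechanism.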

As for the VMware paper [3], it looks quite informative and detailed, but
I seriously doubt there is anything in there that says anything
remotely like "a clock is never implemented in software, and is always
a (possibly virtualized) hardware counter interrogated at the time of
reading."  Perhaps there's something particular you'd like to draw
attention to in that voluminous paper to bolster such a claim?

>> 15 years ago, most POSIX-style OSes used a simple tick-based system
>> clock like Windows that was very fast to read, though typically not
>> as fast as Windows'
>> because the current time wasn't mapped into unprivileged memory of
>> each process, so the time to read the clock was dominated by the
>> system call overhead of transitioning to and from kernel mode/code,
>> probably a couple of orders of magnitude more expensive than actually
>> reading the stored current clock value in the kernel.
> [ ...followed by a long disagreement based on your assessment of
>     my experience with "POSIX-style OSes"... ]
> The use of a "commpage" (that's a Mac term, the Linux equivalent
> appears to be "vsyscall page")

Or simply "shared memory", one or more pages mapped into more than one
logical address space simultaneously.

> to store a low-resolution approximation to "now" was used in pre-OSX
> MacOS and in Linux, but it isn't being used under FreeBSD.
> And "Windows Services for Unix" claims to be POSIX-compliant or at
> least "POSIXy" for NT/Win2k/XP/etc, so the distinction you're drawing
> just doesn't appear to make sense.

Windows SFU (ahem) is not Windows.  NT is a microkernel design (in the
Mach sense, not the NTP sense) with multiple OS personalities built on
top of it.  My discussion relates only to the Windows subsystem of NT,
which sometimes has sported OS/2 1.x as well as POSIX/SFU subsystems
which operate side-by-side with the Windows subsystem, not on top of
it.  The only thing special about the Windows subsystem compared to
others is it owns the console/GUI and other subsystem I/O is funneled
through it.

The only distinction I'm trying to draw is simply that you are wrong
to claim a software clock that does not involve interrogating hardware
is not a clock.

> Look, I admire the notion of quibbling over details, but not when it
> is used to obscure the central point rather than help resolve it.

I'm not attempting to obscure any point, central or not.  I'm saying
you're wrong about GetSystemTimeAsFileTime not reading a clock.  That
is the only way a Windows API program can read the Windows clock, and
Terje was perfectly correct to use it.
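
For readers who want to repeat the kind of measurement quoted at the top
of this thread, here is a rough sketch in Python rather than Terje's
actual program.  clock_steps() is a hypothetical helper that reads a
clock back to back and records the smallest and largest step seen.
Interpreter overhead is far too high to measure a read cost of a few
nanoseconds, so only the step sizes are meaningful here.

```python
import time


def clock_steps(read, samples):
    """Return (min_step, max_step) seen between back-to-back reads."""
    prev = read()
    lo = hi = None
    for _ in range(samples):
        cur = read()
        step = cur - prev
        lo = step if lo is None else min(lo, step)
        hi = step if hi is None else max(hi, step)
        prev = cur
    return lo, hi


# time.time_ns() is the wall clock in nanoseconds; which OS call backs it
# depends on the platform and Python version.
lo, hi = clock_steps(time.time_ns, 200_000)
print(f"min step = {lo} ns, max step = {hi} ns")

# Arithmetic from the quoted figures: 10000 FILETIME ticks of 100 ns each,
# and 24 cycles at roughly 2.7 GHz.
print(10000 * 100 / 1e6, "ms per OS step")
print(round(24 / 2.7, 1), "ns per read")
```

A minimum step of 0 means two consecutive reads returned the same stored
value, which is what a tick-based clock produces between ticks.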

>>> Also please note that you can't just call rdtsc by itself and get
>>> good results.  At the very least, you want to put a serializing
>>> instruction like cpuid first, and secondly, you really want to call
>>> rdtsc three times, and verify that t1 <= t2 <= t3, otherwise you
>>> have a chance of getting bogus time results due to your thread
>>> switching to another processor core between calls, CPU power-
>>> state changes, chipset bugs, interference from SMC interrupts,
>>> and PHK knows what else.  :-)
>> Not on modern AMDs, or any Intel, as far as my admittedly sub-PHK
>> understanding goes.  AMD really screwed the pooch by allowing the TSC
>> to vary between processors and vary with power state, causing all
>> sorts of headaches for all sorts of software.  Even on buggy systems,
>> reading TSC once is enough if you've locked the thread to a single
>> logical processor.
> See http://en.wikipedia.org/wiki/Time_Stamp_Counter and references.

Thanks for the pointer.  I learned some Intel processors I didn't have
experience with also suffer power-state-related TSC issues.  I also
note it's a Wikipedia article, which means the reliability as an
authority is questionable at best.  Some articles have thriving
collaboration and put other encyclopedic articles to shame.  Others
suffer from inaccuracies and self-contradiction due to a
less-than-optimal level of interest in editing encyclopedias among
those expert in the topic.  This article is a mixed bag in that
regard.  Much of the summary at the top is solidly on-target, but some
of it is not.

For example, the paragraph mentioning clock_gettime(CLOCK_MONOTONIC)
and QueryPerformanceCounter is good stuff, pointing out they provide
similar capabilities without the AMD and to a much lesser extent Intel
fast-and-loose fallout.  On the other hand, the second paragraph is
outdated and overgeneralizes, contradicted by the information later in
the same article pointing out newer processors have TSCs that don't
suffer divergence across logical processors or power-saving-induced
rate changes.  For those targeting only recent systems, or only
non-power-saving Intel systems, TSC is both stable and often much
faster than either QPC or CLOCK_MONOTONIC thanks to avoiding system
call overhead.
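
As one way to see which OS facility sits behind a portable clock API,
the sketch below asks Python's time module to describe its own clocks.
The implementation names it prints vary by platform and Python version,
so nothing here should be read as a statement about any particular
system, and none of these calls reads the TSC directly.

```python
import time

# Report the backing implementation and properties of each clock.
for name in ("time", "monotonic", "perf_counter"):
    info = time.get_clock_info(name)
    print(f"{name:13s} impl={info.implementation} "
          f"resolution={info.resolution} "
          f"monotonic={info.monotonic} adjustable={info.adjustable}")
```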

If your goal in reading the TSC is to timestamp some event that just
occurred and calculate a seconds-denominated timestamp from it, using a
serializing instruction first is counterproductive.  If your goal is
to read TSC before and after some sequence of instructions to later
subtract the two TSC values to measure the duration of the sequence,
using a serializing instruction is likely wise.  That is, "good
results" may or may not mandate serializing before reading TSC,
depending on the context.
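
The two uses distinguished above differ in their arithmetic as well.
The following is only a sketch under stated assumptions: Python cannot
issue RDTSC or a serializing instruction, so time.perf_counter_ns()
stands in for the counter, and calibrate(), counter_to_unix_ns() and
duration_ns() are hypothetical helpers, not part of any real API.

```python
import time

# Stand-in for the TSC: perf_counter_ns() is a free-running counter in ns.
# Python cannot issue RDTSC or a serializing instruction, so this only
# shows the two kinds of arithmetic, not the instruction-level details.
read_counter = time.perf_counter_ns
COUNTS_PER_SEC = 1_000_000_000


def calibrate():
    """Pair one counter reading with one wall-clock reading (hypothetical)."""
    return read_counter(), time.time_ns()


def counter_to_unix_ns(count, base_count, base_unix_ns):
    # Use 1: turn a single counter reading into a seconds-based timestamp.
    elapsed = (count - base_count) * 1_000_000_000 // COUNTS_PER_SEC
    return base_unix_ns + elapsed


def duration_ns(start_count, end_count):
    # Use 2: subtract two readings taken around the code being measured.
    return (end_count - start_count) * 1_000_000_000 // COUNTS_PER_SEC


base_count, base_unix_ns = calibrate()
event = read_counter()                 # use 1: read as soon as possible
stamp = counter_to_unix_ns(event, base_count, base_unix_ns)

start = read_counter()                 # use 2: bracket the measured work
sum(range(100_000))
end = read_counter()
print(stamp, duration_ns(start, end))
```

With a real TSC, COUNTS_PER_SEC would be the calibrated counter
frequency, and the question of serializing applies to the reads marked
"use 2" rather than the one marked "use 1".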

[1] http://phk.freebsd.dk/pubs/timecounter.pdf
[2] http://phk.freebsd.dk/pubs/
[3] http://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf

Clock me over the head,
Dave Hart
