[ntp:questions] ntpd wedged again

Terje Mathisen "terje.mathisen at tmsw.no" at ntp.org
Sat Feb 11 08:11:38 UTC 2012

Dave Hart wrote:
> On Fri, Feb 10, 2012 at 03:07, A C<agcarver+ntp at acarver.net>  wrote:
>> So far so good, it's running with reasonable offsets and jitter (PPS not yet
>> enabled).
>> But this is new to me in the logs:
>> 10 Feb 01:41:37 ntpd[242]: ts_min 1328838097.185664676 ts_prev
>> 1328838097.185550676 ts 1328838097.189745353
>> 10 Feb 01:41:37 ntpd[242]: sys_fuzz 114000 nsec, this fuzz 0.000099410
>> 10 Feb 01:41:37 ntpd[242]: get_systime prev result 0xd2def451.30cccb9b is
>> 0.000780166 later than 0xd2def451.3099aa99
> This is from recently-introduced code in ntp-dev that is reporting a
> hoped-to-be-impossible situation of reading the clock using ntpd
> routine get_systime() where the fuzzed result is earlier than the
> previous get_systime()'s result.  These messages are not really
> intended to be scrutable by those not familiar with the internals of
> that routine, and my hope was they would not be seen at all.
> Briefly, ntpd "fuzzes" each timestamp's bits below sys_fuzz.  The
> display is inconveniently mixed between decimal and hexadecimal
> representations, as those were the forms at hand and I didn't want to
> try to convert them and risk bugs in such conversion confusing
> results.  Until recently, the fuzzing was below the reported
> precision, or minimum increment observed between subsequent clock
> readings.  That should be no problem on typically high-precision unix
> system clocks, but led to predictable "time went backwards" issues
> with low-precision clocks such as Windows ntpd using the native system
> clock ticking every 0.5 or 1.0 msec.
> The new code measures not only the minimum increment between
> subsequent readings, but also the minimum time to read the clock,
> which is not the same with a low-precision system clock.  If ntpd
> reads the system clock at the same value 100 times in a row, then it
> increments by 1 msec, the code infers the time to read the clock is 1
> msec /  100 or 10 usec, and sets sys_fuzz to 10 usec.  Later when
> reading the system clock and converting to NTP 32:32 fixed-point
> representation, a random value up to 10 usec will be added to the
> value from the OS, where previously up to 1000 usec / 1 msec would be
> added.
> At the same time fuzzing approach was changed, paranoid checks to be
> sure both the OS clock and the converted and fuzzed 32:32 is always
> advancing as well as the fuzzed conversion always increasing.  So
> either those paranoid checks are wrong, or the code to measure the
> fuzz at startup is wrong, leading to the logically impossible result
> of the clock appearing to move backward relative to the prior reading.
> In this specific case, the minimum time to read the clock was measured
> at ntpd startup to be 114 usec, so each raw OS clock reading is

OK, that's the problem right there: That value is obviously wrong!

The Win* OS clock is so dead simple to read that I would expect the 
minimum reading time to be well below a us!

Even though the clock is maintained by the HAL, and therefore impossible 
to fix properly, even with a regular driver, the read interface should 
be effectively a usermode function call with a single load/store operation:

I.e. GetSystemTimeAsFTime(&LARGE_INTEGER) could look like this:

int GetSystemTimeAsFTime(&LARGE_INTEGER tstamp)
     if (!LARGE_INTEGER) return 0;
     *tstamp = _kSystemClockCurrentValue; // ??? Some kernel RO value
     return 1;

The variable is named something else of course, but since Win* doesn't 
even try to report an interpolated clock value, it can in principle be 
almost as simple as that.

Have you tried to time the minimum clock reading time with RDTSC or 
GetPerformance* counter calls?

I wrote a tiny test program on my Win7-64 laptop, it got:

Reading the system clock 10000000 times, minimum reading time = 24 clock 
  minimum OS step = 0 ticks, maximum OS step = 10000 ticks

The clock frequency is 2.7 GHz or so, the FileTime ticks should be 100 
ns each, so my OS clock is making 1 ms steps, while the clock reading 
time is down to less than 10 ns!

Here's the code btw:

#include "stdafx.h"

typedef unsigned __int64 uint64_t;

uint64_t rdtsc(void)
	__asm {

typedef struct _clok {
	uint64_t os;
	uint64_t tsc;
} clok;

#define SAMPLES 10000000

clok samples[SAMPLES];

int _tmain(int argc, _TCHAR* argv[])
	for (int i = 0; i < SAMPLES; i++) {
		uint64_t t = rdtsc();
		GetSystemTimeAsFileTime((LPFILETIME) &(samples[i].os));
		samples[i].tsc = t;
	uint64_t minrt = samples[1].tsc - samples[0].tsc;
	uint64_t minft = samples[1].os - samples[0].os;
	uint64_t maxft = minft;
	for (int i = 2; i < SAMPLES; i++) {
		uint64_t rt = samples[i].tsc - samples[i-1].tsc;
		uint64_t ft = samples[i].os - samples[i-1].os;
		if (rt < minrt) minrt = rt;
		if (ft < minft) minft = ft;
		if (ft > maxft) maxft = ft;
	printf("Reading the system clock %d times, minimum reading time = %llu 
clock cycles,\nminimum OS step = %llu ticks, maximum OS step = %llu 
ticks\n", SAMPLES, minrt, minft, maxft);
	return 0;

> expected to be no less than 114 usec later than the prior.  The first
> line mentioning three ts_ variable names is displaying the raw clock
> readings from the OS, which are ordered as expected and not less than
> 114 usec apart, so far so good.  The second line shows the measured
> minimum time to read the clock which sets the fuzz threshold, and the
> randomly selected fuzz value for this reading, which is a random
> number between 0 and sys_fuzz (114 usec).  The last line shows the
> resulting fuzzed clock reading is 780 usec earlier than the prior
> fuzzed reading, which should not be possible.

I agree, that should not have been possible, it sounds like Win* ntpd 
clock interpolation code has to be involved as well, i.e. that a 1 ms 
clock slip got into the timestamp as well!
> This code is green and could well be wrong.  I encourage others seeing
> such messages in syslog or a configured ntpd log file to share a few
> examples here or in a bug report.
> Those who followed the introduction of a high-precision clock into
> FreeBSD many years ago might recall similar OS-level warnings
> reporting that time appeared to run backwards, seen for example when
> using processor cycle counters on SMP systems where the counters are
> not synchronized across processors.  I hope these messages will be
> tamed by a bug fix and the new paranoid checks can continue to be
> used, with more comprehensible warning text when the checks fail.
Good plan!

- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

More information about the questions mailing list