[ntp:questions] NTP not syncing

Bruce Evans bde at besplex.bde.org
Fri Dec 6 15:29:47 UTC 2013


In article <99gou.413035$uR1.286796 at fx19.iad>, unruh  <unruh at invalid.ca> wrote:
>On 2013-12-06, Bruce Evans <bde at besplex.bde.org> wrote:
>> ...
>> To handle the calibration varying across reboots, under FreeBSD I just
>> blow away the system calibration using a sysctl in an etc/rc file.
>> FreeBSD never had large variance in TSC calibration across reboots,
>> but I found the ones that it has annoying.  Most versions have a jitter
>> of only a couple of parts per million (ppm), but some have a fixed
>> error of about 10 ppm due to a sloppy calibration algorithm.  When
>> switching to a test or reference version with worse or just different
>> calibration, ntpd takes noticeably longer to sync, and syncing messes
>> up the driftfile for switching back.
>
>Do you know how the system calibrates the TSC on bootup? 

Standard FreeBSD uses the following rather too simple method:

	tsc1 = rdtsc();		/* read TSC before the delay */
	DELAY(1000000);		/* busy-wait ~1 second on the i8254 */
	tsc2 = rdtsc();		/* read TSC after the delay */
	tsc_freq = tsc2 - tsc1;	/* cycles in ~1 second, i.e., Hz */

Here rdtsc() reads the TSC (on the current core), and DELAY(1 million)
counts down the i8254 timer by 1193182 cycles, which should take about
1 second.  The inaccuracies in this are:
- any difference between the i8254's nominal frequency of 1193182 Hz and
  its actual frequency.  Typically 20 parts per million.
- any setup overheads in DELAY() that are not accounted for.  Typically
  5 microseconds = another 5 parts per million.
- any setup overheads in DELAY() that are accounted for, but incorrectly.
  DELAY() still has some code that tries to compensate for old 486 systems,
  but this makes little difference now.
- any jitter in the setup overheads in DELAY().  Typically 2 microseconds.
  (It normally takes about 5 microseconds to read the i8254, and you can't
  control the timing of when the read starts.)
- any jitter from bus contention.  This may be large (more than 100
  microseconds), but usually isn't at boot time.
- any jitter from being interrupted near the beginning or end of DELAY().
  This may be huge (milliseconds), but usually isn't at boot time.
  Interrupts and bus contention both increase the DELAY() time by an
  amount that is not measured by the above.
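
As a rough illustration (not FreeBSD code; the figures are just the
"typical" values from the list above), the fixed overheads turn into
parts per million in inverse proportion to the calibration interval:

	/* Error-budget sketch using the "typical" figures quoted above. */
	#include <stdio.h>

	int main(void)
	{
		double interval_us = 1e6;	/* DELAY(1000000): ~1 second */
		double i8254_ppm = 20.0;	/* nominal vs. actual i8254 freq */
		double setup_us = 5.0;		/* unaccounted DELAY() overhead */
		double jitter_us = 2.0;		/* jitter in reading the i8254 */

		/* N microseconds of overhead over a 1-second interval is
		   N ppm; a longer interval shrinks it proportionally. */
		double setup_ppm = setup_us * 1e6 / interval_us;
		double jitter_ppm = jitter_us * 1e6 / interval_us;

		printf("~%.0f ppm of offset and jitter on top of the fixed "
		    "%.0f ppm i8254 frequency error\n",
		    setup_ppm + jitter_ppm, i8254_ppm);
		return (0);
	}

Waiting 10 seconds instead of 1 divides the overhead and jitter terms
by 10, but leaves the i8254's own frequency error untouched.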

I use many experimental variations of the standard method.  The
simplest one is:

	DELAY(1);		/* warm up caches */
	tscval[0] = rdtsc();
	DELAY(1000);		/* setup overhead + 0.001 s */
	tscval[1] = rdtsc();
	DELAY(1001000);		/* setup overhead + 1.001 s */
	tscval[2] = rdtsc();
	/* (overhead + 1.001 s) - (overhead + 0.001 s) = exactly 1 s. */
	tsc_freq = tscval[2] - tscval[1] - (tscval[1] - tscval[0]);

The first DELAY() warms up caches.  The second one is used to compensate
for setup overheads (assuming low jitter and that DELAY() is not too smart,
so that the overhead is fixed).  The third one works as before.  This
method works well enough in practice.  (So does the previous one, but it
gives an unnecessary error of 5-10 ppm which ntpd has to compensate for.)
For 10 times more accuracy, it is easy to wait for 10 seconds instead of 1,
but that is too long when you reboot often.  Even 1 second of busy-waiting
is too long, even at boot time (do that for lots of devices and you get
boots taking minutes).

More sophisticated versions measure the actual delay time and compensate:

	tsc_freq = (tscval[1] - tscval[0]) * 1000000 / actual_delay_in_usec;

(This is the first version, but with a scale factor that is not
necessarily 1.)
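
A user-space sketch of this measured-delay idea (not FreeBSD's code:
clock_gettime() stands in for reading the i8254, nanosleep() stands
in for DELAY(), and __rdtsc() is the x86 gcc/clang intrinsic):

	#include <stdint.h>
	#include <stdio.h>
	#include <time.h>
	#include <x86intrin.h>

	/* Microseconds between two timespecs. */
	static int64_t
	elapsed_usec(struct timespec a, struct timespec b)
	{
		return ((int64_t)(b.tv_sec - a.tv_sec) * 1000000 +
		    (b.tv_nsec - a.tv_nsec) / 1000);
	}

	int main(void)
	{
		struct timespec t0, t1, nap = { 1, 0 };	/* ask for ~1 s */
		uint64_t tsc0, tsc1, tsc_freq;
		int64_t actual_delay_in_usec;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		tsc0 = __rdtsc();
		nanosleep(&nap, NULL);		/* need not be exact ... */
		tsc1 = __rdtsc();
		clock_gettime(CLOCK_MONOTONIC, &t1);	/* ... it is measured */

		actual_delay_in_usec = elapsed_usec(t0, t1);
		/* Scale by the measured interval: cycles * 1e6 / usec = Hz. */
		tsc_freq = (tsc1 - tsc0) * 1000000 / actual_delay_in_usec;
		printf("tsc_freq = %ju Hz\n", (uintmax_t)tsc_freq);
		return (0);
	}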

Even more sophisticated versions determine error bounds for the actual
delay time, discard samples whose bounds would be higher than
necessary, and run just long enough to reach a specified error bound
on the final frequency.  It is amusing to use the TSC (disciplined by
ntpd) to calibrate itself (raw).  On a 2-year-old system, it takes 13
seconds to calibrate to the specified error bound of 1 cycle (1 part
in 2.67 billion).  At that resolution you can see ntpd micro-adjusting
the clock, and the TSC varying with temperature, because ntpd takes
much longer than 13 seconds to notice those changes.
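
A crude user-space sketch of that last idea (again not the real code:
clock_gettime() stands in for the ntpd-disciplined clock, and the
one-cycle-per-endpoint error bound is invented for illustration; the
real accounting is more careful):

	#include <stdint.h>
	#include <stdio.h>
	#include <time.h>
	#include <x86intrin.h>

	/* The ntpd-disciplined system clock, in seconds. */
	static double
	now(void)
	{
		struct timespec ts;

		clock_gettime(CLOCK_REALTIME, &ts);
		return (ts.tv_sec + ts.tv_nsec * 1e-9);
	}

	int main(void)
	{
		double target = 1.0 / 2.67e9;	/* ~1 cycle at 2.67 GHz */
		double t0 = now(), t, freq, bound;
		uint64_t c0 = __rdtsc(), c;
		struct timespec nap = { 1, 0 };

		for (;;) {
			nanosleep(&nap, NULL);	/* longer run, tighter bound */
			t = now();
			c = __rdtsc();
			freq = (c - c0) / (t - t0);
			/* Assume ~1 cycle of uncertainty per endpoint. */
			bound = 2.0 / (double)(c - c0);
			printf("%.0f Hz, relative bound %.3g\n", freq, bound);
			if (bound < target)	/* stop at the target bound */
				return (0);
		}
	}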

Bruce


