[ntp:questions] how do I lock in average frequency correction
Dennis Ferguson
dennis.c.ferguson at gmail.com
Thu Feb 16 19:47:04 UTC 2012
On 15 Feb, 2012, at 22:53 , Terje Mathisen wrote:
> Dennis Ferguson wrote:
>> I have had very good results with the following procedure. If `tc'
>> is the 64 bit output of a counter which is continuously incremented
>> at the rate of some oscillator then the time can be computed as follows:
>>
>> t = r * (tc << s) + c
>
> This is a reasonable way to get a nice clock...
>>
>> The value of `s' is selected so that a second's worth of advance of
>> tc causes a change to bit 32 of (tc << s); that is, `s' is a constant
>> pre-computed from the nominal frequency of tc's counter and an assumption
>> about how far from the actual frequency of the oscillator that might
>
> The main/only problem with this assumption is that it breaks as soon as your tick counter (like the one you get from RDTSC) runs at more than 4.2 GHz! (Maybe you intended to say that the shift was such that the top bit of a 64-bit value would be set?)
No, the requirement is just that s be selected so that in one second (tc << s) advances
by more than 0x100000000ull (which ensures that bit 32 in (tc << s) will change at least
once every second). For frequencies higher than 2^32 Hz `s' is 0, for frequencies
lower than that but higher than 2^31 Hz it is 1, for frequencies lower than that
but higher than 2^30 Hz it is 2, and so on. At very high frequencies (above 8 GHz) `r'
begins to lose resolution, one bit for each doubling, but 64 bits is enough that
this is of no practical importance.
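The shift selection and the clock read described above can be sketched as
follows. This is a minimal illustration with invented names (pick_shift,
mul_hi64, clock_read), not the actual implementation; it also uses the
compiler's 128-bit support for the high-half multiply, which a real kernel
might do with inline assembly instead:

```c
#include <stdint.h>

/* Pick s so that one second of counter advance, shifted left by s,
 * exceeds 2^32, i.e. bit 32 of (tc << s) changes at least once a
 * second.  A real implementation would derate the nominal frequency
 * by the assumed maximum oscillator error before doing this. */
static unsigned pick_shift(uint64_t nominal_hz)
{
    unsigned s = 0;
    while (s < 32 && (nominal_hz << s) <= (1ull << 32))
        s++;
    return s;
}

/* High-order 64 bits of the 128-bit product r * x. */
static uint64_t mul_hi64(uint64_t r, uint64_t x)
{
    return (uint64_t)(((unsigned __int128)r * x) >> 64);
}

/* t = r * (tc << s) + c, with t in NTP fixed point
 * (32-bit seconds, 32-bit fraction; 2^32 == one second). */
static uint64_t clock_read(uint64_t tc, unsigned s, uint64_t r, uint64_t c)
{
    return mul_hi64(r, tc << s) + c;
}
```

For example, a counter at exactly 2^33 Hz gets s = 0 and r = 2^63, and one
second's worth of ticks then advances the result by exactly 2^32.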
> It also breaks if/when the tick counter rolls over, doesn't it?
>
> With a full 64-bit counter that happens slightly less often than the NTPD clock rolls over, as long as the increment rate is less than 2^32 ticks/second, but if you are limited to (say) 40 bits, which is the resolution of most of the non-TSC counters in an x86 cpu, you do need a dependable way to handle rollovers.
My implementation distinguishes between a "time count", which is
always 64 bits long, and a "tick count", which carries the number of
bits (<= 64) which are convenient to sample from the hardware with
the instruction set. "Tick counts" are converted to 64 bit "time
counts" by maintaining the high order bits to extend the "tick count"
to 64 bits in software (this does require periodic maintenance, but
doesn't incorporate interrupt scheduling jitter into the times). Time
counts are used for the computation above. Time count (i.e. 64 bit)
roll-overs, both of tc and of (tc << s) are handled by the implementation.
The implementation assumes in another spot that the system will be rebooted
no less frequently than once every 60 years, however, so if tick counters
increment at no more than 8 GHz and reset to zero across reboots I expect
the rollover-handling code to never actually run. I'm not proud of the 60 year thing,
but I wanted to stick to a 64 bit format for internal time computations
until there is compiler support for doing 128 bit integer math; the code
gets too ugly if even just basic shifts and adds require function calls.
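The tick-count-to-time-count extension described above can be sketched like
this. The names (TICK_BITS, time_count) and the 40-bit width are illustrative;
the one requirement is that this maintenance runs at least once per wrap of
the short hardware counter, so the wrap is detected by comparison rather than
by an interrupt:

```c
#include <stdint.h>

#define TICK_BITS 40
#define TICK_MASK ((1ull << TICK_BITS) - 1)

static uint64_t tc_hi;   /* software-maintained high-order bits */
static uint64_t tc_last; /* last raw hardware sample seen */

/* Extend a TICK_BITS-wide hardware "tick count" to a 64-bit
 * "time count".  Must be called at least once per hardware wrap. */
static uint64_t time_count(uint64_t raw)
{
    raw &= TICK_MASK;
    if (raw < tc_last)               /* hardware counter wrapped */
        tc_hi += 1ull << TICK_BITS;
    tc_last = raw;
    return tc_hi | raw;
}
```

Because the wrap is caught by observing the counter go backwards, no work is
done in an interrupt handler and no interrupt scheduling jitter leaks into
the times, as the text above notes.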
> Finally, all of this does of course require either a single, very stable frequency for said counter, or some driver support that will know about any changes and modify the r and c values at the same time as the frequency changes.
The rate at which the counter is incremented needs to have a fixed relationship to
the output of the oscillator from which its timing is derived; there are theoretical
arguments I like to make (related to the ntpd PLL, or any feedback controller, being
an implementation choice but not a requirement to guarantee the basic stability of
clock adjustments) which depend on this somewhat. On x86 machines I prefer the TSC if
it works like this (it does on recent processors, and very old ones, but there is a
vintage of processors where it doesn't), but pick one of the other counters on machines
where it doesn't.
>> be. The multiply (r * (tc << s)), where `r' is a 64 bit value, returns
>> the high order 64 bits of the 128 bit product. In my case, `t' (and
>> hence 'c') was picked to be the time of day computed in the NTP fixed
>> point format, i.e. a 32 bit integer seconds part and a 32 bit fractional
>> part, so given a tc from the counter the above converts it directly to
>> a time of day. The computation itself takes nanoseconds, or less, on
>> a modern processor.
>
> It does take nanoseconds even on the very fastest (IBM Power7/8?) available cpus: The 64x64->128 multiplication requires about 7 clock cycles, while the shift and add will each require at least one cycle.
I think it was a Power CPU I computed the 1 ns for, though I could have been
mistaken. Note that it takes two Power instructions to do a 64x64->128 bit
multiplication, but the 64x64-><high order 64 bits> multiply it needs only takes
one of those instructions. I vaguely remember this taking only 2 or 3 cycles,
but that could be wrong.
The x86-64 instruction set does have the distinction that all the required math
can be done with inline instructions (it has a 128/64=64 bit divide instruction,
an operation which is sometimes needed for computing adjustments; it must be an
all-microcode instruction, though, since it is only about 30% faster than calling
the 50 line C function I use to do the same math with lower precision operations).
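One place the 128/64 divide comes up is computing `r' itself: with t in NTP
fixed point, one second of counter advance, (freq << s) ticks, must map to
2^32, so r = 2^96 / (freq << s). A sketch using the compiler's 128-bit
support (the name rate_for is invented; note the quotient only fits in 64
bits when freq << s exceeds 2^32, which the choice of s guarantees):

```c
#include <stdint.h>

/* r = 2^96 / (freq_hz << s), so that
 * (r * (freq_hz << s)) >> 64 == 2^32, i.e. one second. */
static uint64_t rate_for(uint64_t freq_hz, unsigned s)
{
    unsigned __int128 num = (unsigned __int128)1 << 96;
    return (uint64_t)(num / ((unsigned __int128)freq_hz << s));
}
```

Since r is a truncated quotient, one second of ticks may read back as
2^32 - 1 rather than 2^32; the residual is a rate error below the 10^-19
precision figure quoted above.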
> Add the time to read the current tick counter (which cannot be less than a single cycle, but much more likely to stay in the 10-30 range since this isn't a resource you would locate in the critical path of the cpu pipeline) and you get anywhere from 2 to 10 ns:
>
> Still pretty fast. :-)
>
> I really don't like the fact that you seem to disregard the need for extended time range handling, i.e. even if you produce NTP timestamps, they should have a full 64-bit seconds counter so that the full counter will never roll over.
Note that the epoch of the 32-bit seconds UTC timestamp is not necessarily 1-1-1970,
it can be more recent and can be advanced across a reboot. It is adjusted to 1-1-1970
when producing POSIX time stamps, which on most systems have a longer seconds value.
The "reboot every 60 years" thing is related to this handling. This is less than
perfect, but I decided this handling was good enough for practical purposes and it
wasn't worth fixing properly until compilers could handle the longer integer math.
>> Clearly the adjustment interface operates to produce values of `r' and
>> `c' that minimize the error in the result. Phase adjustments are implemented
>> by adding an offset to 'c', and hence have a precision of 233 picoseconds.
>> Rate (i.e. frequency) adjustments are implemented by modifying the value
>> of 'r', with a corresponding change to 'c' to ensure phase continuity at
>> the time of the change, and have a precision of better than 10^-19. Slews,
>> a la adjtime(), can be implemented by making a rate change for a fixed
>> interval, at the end of which `r' is returned to its previous value with
>> 'c' precisely offset. The only time `r' and `c' change is when the time
>> synchronization software makes a system call to adjust the clock; there
>> is no need to do anything in an interrupt service routine (the latter
>
> Except for any interrupt which causes frequency throttling.
I don't use counters whose rate is affected by this. The implementation
supports administrative selection of the counter to use as the system clock
time source on x86 hardware to avoid having to use one that is. If the TSC
on the processor you are running on works right then it would be foolish not
to use it, but if it doesn't then you pick another one which does.
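The phase and rate adjustments quoted above reduce to a few lines of
arithmetic on the (s, r, c) triple. A sketch with invented names
(clock_state, step_phase, set_rate); the key point is that a rate change
recomputes c so the time read at the moment of the change is unchanged:

```c
#include <stdint.h>

struct clock_state {
    unsigned s;
    uint64_t r;
    uint64_t c;
};

static uint64_t mul_hi64(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

static uint64_t read_clock(const struct clock_state *cs, uint64_t tc)
{
    return mul_hi64(cs->r, tc << cs->s) + cs->c;
}

/* Phase step: offset c; resolution is one NTP fraction bit,
 * 2^-32 s, about 233 ps. */
static void step_phase(struct clock_state *cs, int64_t ntp_offset)
{
    cs->c += (uint64_t)ntp_offset;
}

/* Rate change: replace r, adjusting c so the time computed at
 * tc_now is continuous across the change. */
static void set_rate(struct clock_state *cs, uint64_t tc_now, uint64_t new_r)
{
    uint64_t now = read_clock(cs, tc_now);
    cs->r = new_r;
    cs->c = now - mul_hi64(new_r, tc_now << cs->s);
}
```

A slew a la adjtime() then falls out as described: call set_rate() with a
slightly different r, and call it again with the old r when the desired
offset has accumulated.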
>> assumes that the system call interface is not ntp_adjtime() or adjtimex();
>> while that code can be supported as well there are good arguments for
>> why you can expect better results, even from ntpd, if you don't include
>> that code in the kernel, though that's a different topic). If your TSC
>> counter is reliable then making the sets of (s, r, c) constants available
>> to user space on a shared page allows user space processes to obtain
>> accurate time without system calls. There are no "inconvenient" clock
>> rates and the clock precision and jitter is the same as the precision
>> and jitter of the underlying hardware counter.
>>
>> If you want a "tickless" kernel the best way to do that might be to
>> start with a "tickless" system clock. If you have a continuously
>> incrementing counter there is nothing that an interrupt service
>> routine needs to do.
>
> I do like your approach. :-)
Thanks. This was arrived at when I acquired hardware which allowed the
system clock to be measured against an external reference with an
accuracy in the very low 10's of nanoseconds and found the clocks in
the kernels I had were jittering by many 100's of nanoseconds all by
themselves.
There is an adjustment system call interface which goes along with
this which addresses the issues which might otherwise be argued to
theoretically require the use of a feedback control discipline (like
the ntpd PLL) to ensure clock adjustment stability. As a side effect
it provides enough information for you to compute the Allan deviation
from data collected while the clock is simultaneously being adjusted.
I need to finish documenting this some day.
Dennis Ferguson