[ntp:questions] 500ppm - is it too small?

Unruh unruh-spam at physics.ubc.ca
Mon Aug 17 22:25:26 UTC 2009


"Richard B. Gilbert" <rgilbert88 at comcast.net> writes:

>Unruh wrote:
>> Harlan Stenn <stenn at ntp.org> writes:
>> 
>>>>>> In article <o48hm.41192$PH1.17076 at edtnps82>, Unruh <unruh-spam at physics.ubc.ca> writes:
>> 
>>> Unruh> So I have 9 clocks.  The rates are -190, 19, -106, -67, -200, -219,
>>> Unruh> -115, -140, 221
>> 
>>> Unruh> On reboot, the latter changed from 221 to 215 (Which took ntp about 6
>>> Unruh> hours to recover from)
>> 
>>> Unruh> The clock scaling in linux seems to have suffered a real problem in the
>>> Unruh> past year or two, so that the rate from one reboot to the next can
>>> Unruh> change by 50PPM, which then takes ntp a long time to recover from.
>> 
>>> Unruh> two years ago those same clocks, running earlier kernels, had rates
>>> Unruh> of 5 -17 45 27 23 100 101 -10 8 -39 39 25.
>> 
>>> Unruh> It will not take much more degradation for the clocks to surpass the
>>> Unruh> 500PPM limit. And this is not due to any change in the hardware. It
>>> Unruh> seems to be kernel software and the scaling calibration being
>>> Unruh> performed at bootup.
>> 
>>> So it looks like the problem is in the way your kernel is deciding the value
>>> of 'tick' at boot time.
>> 
>> Agreed. 
>> 
>>> Why is it better to 'fix' this problem in ntpd instead of either fixing the
>>> kernel boot calibration or finding a way to override the kernel calibration
>>> routine's "wrong choice"?
>> 
>> Clearly it would be better if the kernel were fixed. But that is
>> something neither I nor 99.9% of the users of Linux are able to do.
>> ntp's purpose is precisely to "fix" problems in the rate of clocks.
>> This is a problem in the rate of clocks. 
>> 
>> One could argue that all computers should have temperature controlled
>> clocks trimmed to 1PPM or better. But they do not. One could argue that
>> the linux kernel writers should not have broken the calibration in the
>> newer kernels, but they did. ntp is a way of helping users to fix those
>> hardware/software problems. 
>> 
>>> We really would prefer to keep bloat out of ntpd, and this problem, to me,
>>> should really be fixed closer to its source.
>> 
>> Having a 1000 or 5000PPM limit would not "bloat" ntp. You may be right
>> that it is a major task to fix the limit, and that IS an argument for
>> not doing it. 
>> 

>I have yet to see an argument sufficient to justify increasing the 500 
>PPM limit.  The benefit is small, the risks are high. . . .

That may be true, or it may be that it is a trivial change. My key worry
would be that on Linux the kernel itself imposes a 512PPM limit in the
adjtimex call. (The call also has a tick adjustment, which gives far
greater latitude, 100000PPM, but needs more complex coding, as you have to
adjust both the tick and the rate in that call.)  Furthermore, the
non-kernel route, in which the rate is adjusted via the one-second timer
interrupt, is more difficult. 
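
To make the "adjust both" point concrete, here is a rough sketch in Python
of how a rate beyond 512PPM might be split between the coarse tick value
and the fine rate value. It assumes USER_HZ=100 (a nominal tick of 10000
microseconds, so one microsecond of tick is 100PPM) and a rate expressed in
units of 2^-16 PPM, as in struct timex. The name split_rate and the
constants are mine, for illustration only; nothing here calls adjtimex, and
this is not what ntpd does.

```python
# Sketch only: split a desired clock rate correction into a coarse tick
# value and a fine frequency value. Assumes USER_HZ = 100; all names here
# are hypothetical and nothing is passed to the kernel.

NOMINAL_TICK_US = 10000                        # microseconds per tick at USER_HZ = 100
PPM_PER_TICK_US = 1_000_000 / NOMINAL_TICK_US  # 1 us of tick = 100 PPM
FREQ_SCALE = 65536                             # frequency is in units of 2**-16 PPM
MAX_TICK_DELTA_US = NOMINAL_TICK_US // 10      # tick may move about 10%, i.e. 100000 PPM


def split_rate(rate_ppm):
    """Return (tick_us, freq_scaled) that together represent rate_ppm."""
    # Let the tick absorb the nearest whole multiple of 100 PPM ...
    tick_delta = round(rate_ppm / PPM_PER_TICK_US)
    if abs(tick_delta) > MAX_TICK_DELTA_US:
        raise ValueError("rate is outside the range the tick can absorb")
    # ... and leave the remainder (at most 50 PPM either way) to the fine rate.
    residual_ppm = rate_ppm - tick_delta * PPM_PER_TICK_US
    return NOMINAL_TICK_US + tick_delta, round(residual_ppm * FREQ_SCALE)


if __name__ == "__main__":
    for rate in (221, -219, 750):
        tick, freq = split_rate(rate)
        print(rate, "PPM -> tick", tick, "us, residual", freq / FREQ_SCALE, "PPM")
```

The residual always lands within 50PPM of zero, well inside the 512PPM
limit, which is the attraction. The cost is that two values have to be
kept consistent instead of one.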

If there is also some interaction between the algorithm and the maximum
rate, that would introduce further worries. I saw nothing like that on my
perusal of the code, but it is very easy to miss things. 



>Further, finding and changing the value of the limit in the code, while 
>tiresome, is hardly rocket science.  Making the code work properly after 
>the change could very well be rocket science.

Not rocket science, but fiddly and, especially if you are changing someone
else's code, prone to oversights. Thus it would be best if Mills did it,
but the chances of that are probably not good. 

This is one of the disadvantages of the ntp design. Two things are
lumped into one: the rate correction and the offset correction. It
would be far better if those were separated, so that the rate correction
was something different from the offset correction. But the simple
Markovian design of ntp mixes them together. Thus, if the clock has a
450PPM drift rate, only +50PPM is left available to correct negative
offset errors (and -950PPM for positive offset errors). The algorithm
knows nothing about rate except as reflected in offset.  
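
The arithmetic is simple enough to sketch in Python. The 500PPM figure is
the limit under discussion; the function names and the 10ms example offset
are mine, chosen only for illustration, and the slew time is a lower bound
from the available rate alone, not a model of how ntpd actually converges.

```python
# Sketch only: slew headroom left once the drift has used up part of the
# limit. Names are hypothetical; this is plain arithmetic, not ntpd code.

LIMIT_PPM = 500.0


def headroom(drift_ppm, limit_ppm=LIMIT_PPM):
    """Return (most positive, most negative) extra rate still available."""
    return limit_ppm - drift_ppm, -limit_ppm - drift_ppm


def min_seconds_to_slew(offset_s, available_ppm):
    """Lower bound on the time needed to remove offset_s at available_ppm."""
    if available_ppm == 0:
        raise ValueError("no headroom left: the offset can never be removed")
    return abs(offset_s) / (abs(available_ppm) * 1e-6)


if __name__ == "__main__":
    up, down = headroom(450.0)
    print("headroom at 450 PPM drift:", up, down)
    print("10 ms at the small headroom:", min_seconds_to_slew(0.010, up), "s")
    print("10 ms at the large headroom:", min_seconds_to_slew(0.010, down), "s")
```

With a 450PPM drift, a 10ms offset in the unlucky direction needs roughly
200 seconds at the very least, against roughly 10 seconds in the other
direction.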



