[ntp:questions] Google and leap seconds

unruh unruh at wormhole.physics.ubc.ca
Thu Sep 22 05:28:30 UTC 2011


On 2011-09-22, Harlan Stenn <stenn at ntp.org> wrote:
> Bill wrote:
>> On 2011-09-21, Harlan Stenn <stenn at ntp.org> wrote:
>> > Bill wrote:
>> >> On 2011-09-21, Harlan Stenn <stenn at ntp.org> wrote:
>> >> > Bill wrote:
>> >> >> Some operating systems (eg Linux) have the ability to do a much faster
>> >> >> than 500PPM rate change (100000PPM in the case of Linux) but ntp does
>> >> >> not make use of that. 
>> >> >
>> >> > ... because it would violate the assumptions and rules about the loop
>> >> > behavior (I'm painting with a wide brush here).  If you want a system
>> >> 
>> >> ? But that 500PPM was an arbitrary and random value chosen X years ago. 
>> >
>> > "Arbitrary and random" sounds like Spin to me.
>> >
>> > It was several sigmas bigger than the expected worst-case "acceptable"
>> > clock drift.
>> 
>> And it could have been 1000 PPM or 2000 AFAIK, and it would not have
>> made any difference.
>
> At some levels, sure.  But a step is an indication that something is
> *wrong* and it is a perfectly reasonable and normal design decision to
> state that the clock freq is within some given range, and once it is
> there one should *only* see a step if there is a problem.

500PPM is an indication that something is wrong. I have seen steps when
the clock rate was badly calibrated at bootup and the old rate was
100PPM out from the current one (older Linux kernels).
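
For concreteness, the faster-than-500PPM capability I mentioned is the
adjtimex(2) tick adjustment, which the kernel accepts up to roughly
+/-10% of the nominal tick (hence the 100000PPM figure). A rough,
untested sketch, just to show the mechanism:

#include <stdio.h>
#include <sys/timex.h>

int main(void)
{
    struct timex tx = { 0 };

    /* modes = 0: just read the current kernel values. */
    if (adjtimex(&tx) == -1) { perror("adjtimex"); return 1; }
    printf("tick = %ld us, freq = %ld (scaled PPM)\n", tx.tick, tx.freq);

    /* Lengthen each 10000 us tick by 10 us: roughly a +1000PPM slew,
       twice ntpd's cap.  Needs CAP_SYS_TIME, and the kernel rejects
       anything outside about +/-10% of the nominal tick. */
    tx.modes = ADJ_TICK;
    tx.tick += 10;
    if (adjtimex(&tx) == -1) { perror("adjtimex"); return 1; }
    return 0;
}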

>
> And if it takes 2000 seconds to add in a slewed leap-second at 500ppm,
> it would take 1000 seconds at 1000ppm, and 500 seconds at 2000ppm.
> Arbitrary numbers again, or are there specific constraints you are
> trying to hit by changing that value?

Quicker convergence. 


>
> The POSIX folks made a choice about leap seconds, and that choice was
> clearly sub-optimal, and now we're dealing with that friction.

Oh, I agree with that. 

>
>>> What are the effects of a different (presumably larger) value going to
>>> have on the behavior of the loop?
>> 
>> Why should it have any effect? Steps do not appear to have any effect. 
>
> Statements like that cause me to wonder about you.
>
> I suck at filter/oscillator theory and even I know that one cannot make
> arbitrary changes to such a core value and expect the framework to
> simply continue to work.

500PPM is not a core value. And I perhaps do not suck as much.

>
>>> What will the effect be on other systems that "associate" with these
>>> boxes that have the faster rate?
>>>
>>>>> that is designed to work with a faster slew rate that's fine, but then
>>>>> you also have to consider legacy and interoperability issues.
>>>> 
>>>> sure. But what legacy issues do you have in mind? 
>>>
>>> machines that operate at the 500ppm limit that are now talking to
>>> machines that operate at this new higher rate.
>> 
>> Clearly if the machine is at the end of the chain, it should not
>> matter.
>
> It could trigger steps where they would not have happened before.  This
> could cause time islands and oscillations.

?
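
If the concern is just the arithmetic, it can be made concrete: an
upstream server slewing a 1 s leap correction at 2000PPM takes 500 s to
do it; a downstream machine still capped at 500PPM falls behind at up
to 1500PPM, so its offset peaks near 0.75 s, well past the 0.128 s step
threshold, and it steps. Whether that actually cascades into time
islands or oscillation I do not know.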

>
> NTP is robust.  What you are currently advocating seems to fly in the
> face of robust behavior.
>

That is the position of never changing anything.
Note that, as far as I can see, chrony is also stable and robust, and it
allows rate adjustments much higher than 500PPM.
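
For instance, chrony exposes the limit as a configuration directive. A
minimal chrony.conf sketch (maxslewrate and makestep are documented
chrony directives; the values here are only illustrations):

# Allow slews up to 5000PPM (chrony's own default is far higher still).
maxslewrate 5000
# Step only in the first three updates after boot; slew after that.
makestep 1.0 3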


> What is your goal?

To make ntpd more robust and able to deal with bad situations. 


>
>> If it operates as a server, the 500PPM system will fall behind until
>> the rate drops and then will catch up. Or it will have an offset
>> greater than .128 s and step.
>
> But you are imposing your policy on others, apparently with no
> justification for this change, and with no regard for the effects it
> will have on their "systems".

The justification is not to have jumps in time. I think they are much
worse than higher rates.
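
Within the current design, the closest ntpd itself gets to that policy
is the step threshold. A hedged ntp.conf sketch (tinker step is a real
ntpd option; the values are only illustrations):

# Raise the step threshold from the default 0.128 s to 1 s ...
tinker step 1
# ... or set it to zero so ntpd never steps and always slews
# (the -x command-line option is similar, raising the threshold to 600 s):
# tinker step 0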


>
>>> Applications that "knew" that if a time correction was going to
>>> happen, it would be at the 500ppm rate - this goes to data
>>> correlation and timestamp sequencing.
>> 
>> But ntpd is allowed to step, so it has to assume that an infinite rate is
>> possible. 
>
> So what?  Steps SHOULD NOT HAPPEN.  If they do, something is WRONG.

Rate updates of 500PPM should not happen. But they do. 


>
> Find the problem and fix it.

Sure. E.g., get a new kernel, or fix the kernel? 

>
> And when ntpd steps, it changes its "state" to accommodate that rate change.
>
>>> That's just 2 legacy issues off the top of my head.
>>>
>>>> > It also goes to timestamp correlation issues.
>>>> 
>>>> It would seem to me that a step (infinite PPM) would be far far worse.
>>>> ntpd does NOT limit the slew rate to 500PPM. It limits it to infinite
>>>> PPM. 
>>>
>>> And when it does it logs the event, so it is still traceable and
>>> auditable.  It's also an indication of "something is wrong, as this
>>> should not have happened."
>> 
>> But your questions were about what happens to other systems. A drift of
>> 500PPM for an hour is surely just as bad. 
>
> Why do you say 1 hour?  Why do you maintain it "is surely just as bad?"

I am hypothesising an extreme situation. 

>
> Exactly what problem(s) are you solving?

Trying to make ntpd better. 

I am not claiming that rate adjustments higher than 500PPM are
completely innocuous; I do not know. I do know that the claim that
500PPM is somehow sacrosanct and anything else will break ntpd is
almost certainly bogus. 

To get back to leap seconds, there is certainly a problem with the
current situation. Hasler's suggestion that system time be kept in
consecutive seconds and that reported time be corrected for leap
seconds, i.e., that system time be TAI while reported time is UTC,
makes sense to me and would alleviate much of the problem. The system
really should not be throwing away seconds or adding them in and then
letting ntpd try to fudge its way around that problem. 
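
A minimal sketch of what that reporting layer could look like (my own
illustration, not Hasler's code; the table entries are made-up
placeholders for the published leap-second list):

#include <stdio.h>
#include <stddef.h>

/* The kernel counts seconds continuously (TAI-like); only the reporting
   layer applies the leap-second table to produce UTC labels. */
struct leap { long long effective; int tai_minus_utc; };
static const struct leap table[] = {
    { 1000, 34 },   /* from continuous second 1000 on, TAI-UTC = 34 */
    { 2000, 35 },   /* a later leap second raises it to 35 */
};

static long long report_utc(long long continuous)
{
    int off = 0;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (continuous >= table[i].effective)
            off = table[i].tai_minus_utc;
    return continuous - off;    /* what applications get to see */
}

int main(void)
{
    /* The continuous count never jumps; only the UTC label absorbs the
       leap (here both 1999 and 2000 report as 1965). */
    printf("%lld -> %lld\n", 1999LL, report_utc(1999));
    printf("%lld -> %lld\n", 2000LL, report_utc(2000));
    printf("%lld -> %lld\n", 2001LL, report_utc(2001));
    return 0;
}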

 

>
> H



