[ntp:hackers] smearing the leap second

Harlan Stenn stenn at ntp.org
Mon Jun 22 10:39:52 UTC 2015


Terje Mathisen writes:
> Harlan Stenn wrote:
> > I find myself somewhere in the middle between Brian and Terje.
> >
> > Please see http://support.ntp.org/bin/view/Dev/LeapSmearForNTPv4
> 
> I've added a bunch of comments to this page!

Thanks, I'll probably read them now even though it's 0330 here and I
need to fall asleep.

>> I'm still thinking it's better to put the entire smear correction in two
>> places:
>>
>> - in the timestamp
>> - in the bottom 27 bits of the refid
> 
> About 20 bits is sufficient.

Cool.

>> I *think* we want to add the current smear correction to the root
>> dispersion field.
>>
>> If we see that the top 5 bits of the refid are 11110 we'll know we're
>> getting smeared time - that should be enough to prevent the next link
>> of the chain from adding a 2nd smear correction.
> 
> Right, but we do need to propagate the smearing status, which probably 
> means that a non-S1 smearing server must lie about its stratum.

If we use my refid scheme we don't need to lie about the stratum.  I
think.
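
A rough sketch of that refid layout, to make the bit positions concrete.
The 11110 marker in the top 5 bits and the 27-bit field below it come
from this thread.  How the correction is encoded inside those 27 bits
is NOT settled here; the unsigned-microseconds encoding and all
function names below are assumptions for illustration only.  (One
second is 1,000,000 us, which fits in 20 bits, since 2**20 is
1,048,576.)

```python
SMEAR_PREFIX = 0b11110             # the 11110xxx marker discussed in this thread
FIELD_BITS = 27                    # low bits of the 32-bit refid left for the correction
FIELD_MASK = (1 << FIELD_BITS) - 1


def is_smear_refid(refid: int) -> bool:
    """True when the top 5 bits of a 32-bit refid carry the smear marker."""
    return ((refid >> FIELD_BITS) & 0b11111) == SMEAR_PREFIX


def encode_smear_refid(correction_s: float) -> int:
    """Pack a smear correction into a refid.

    ASSUMPTION: the magnitude is stored as unsigned microseconds.
    The thread does not define the encoding of the 27-bit field.
    """
    micros = round(abs(correction_s) * 1_000_000)
    if micros > FIELD_MASK:
        raise ValueError("correction does not fit in 27 bits")
    return (SMEAR_PREFIX << FIELD_BITS) | micros


def decode_smear_refid(refid: int) -> float:
    """Recover the correction magnitude in seconds from a smear refid."""
    if not is_smear_refid(refid):
        raise ValueError("refid does not carry the smear marker")
    return (refid & FIELD_MASK) / 1_000_000
```

A downstream server could call is_smear_refid() on the upstream refid
and skip its own smear when it returns True, which is the "don't add a
2nd smear correction" check described above.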

>> The refid loop detection is only used by immediate neighbors.  I'm not
>> seeing that this will be a problem with either Terje's idea to use SMRx
>> or my idea to use 11110xxx.
>>
>> I'm leaving out some details above (reporting no_leap on these packets,
>> etc).
>>
>> Will clients have a sufficiently short poll interval to track this?
> 
> Yes: The maximum poll for a standard client is 1024 seconds, so it takes 
> a little over an hour to get 4 new updates, right?
> 
> During that time, the smearing server will have adjusted the offset by a 
> maximum of about 1.6/smearing_period (the 1.6 is to adjust for the 
> gradual start and stop), so with 20 hours we're talking about 1.6/72000 
> which is about 22 ppm. 22 ppm over a period of 4096 seconds corresponds 
> to 91 ms, which is safely within the 128 ms limit for the normal control 
> loop.
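
The arithmetic above, spelled out as a small sketch.  The 1.6 shape
factor, the 20-hour smear, the 4 x 1024 s window and the 128 ms limit
are the figures from the quoted paragraph; the function names are just
illustrative.

```python
def peak_slew_ppm(smear_period_s: float, shape_factor: float = 1.6,
                  leap_s: float = 1.0) -> float:
    """Peak slew rate in ppm when leap_s seconds are smeared over the period.

    shape_factor accounts for the gradual start and stop: the peak rate
    is that many times the average rate.
    """
    return shape_factor * leap_s / smear_period_s * 1e6


def accumulated_offset_ms(rate_ppm: float, interval_s: float) -> float:
    """Offset in milliseconds built up at rate_ppm over interval_s seconds."""
    return rate_ppm * 1e-6 * interval_s * 1e3


rate = peak_slew_ppm(20 * 3600)                  # 20-hour smear, about 22 ppm
window = 4 * 1024                                # four polls at the 1024 s maximum
offset = accumulated_offset_ms(rate, window)     # about 91 ms
within_limit = offset < 128.0                    # 128 ms limit quoted above
```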

Good, and I just had another thought.

If we do this, then updated clients could also notice that the refid is
one used for a leap smear in progress and could crank down the poll
interval to 64 seconds or so, if that would help.
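
A minimal sketch of what that client-side check might look like.  This
is hypothetical, not existing ntpd code: the function name is made up,
and it assumes the 11110 marker in the top 5 bits of the refid.  NTP
expresses poll intervals as a power-of-two exponent, so 64 seconds is
exponent 6 and 1024 seconds is exponent 10.

```python
SMEAR_PREFIX = 0b11110     # assumed smear marker in the top 5 bits of the refid
SMEAR_POLL_EXP = 6         # 2**6 = 64 seconds


def poll_exponent_for(refid: int, current_exp: int) -> int:
    """Clamp the poll exponent to 64 s while the server reports a smear refid.

    Hypothetical helper: a client already polling faster than 64 s is
    left alone, and a non-smear refid leaves the exponent unchanged.
    """
    smeared = ((refid >> 27) & 0b11111) == SMEAR_PREFIX
    return min(current_exp, SMEAR_POLL_EXP) if smeared else current_exp
```

At roughly 22 ppm, four polls at 64 s span 256 s, which is under 6 ms
of accumulated smear between filter refreshes.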

> The slew/smear will in fact start before we have 4 new samples, so we 
> will never see that 91 ms offset. However, if you have manually 
> configured your client(s) to poll even more rarely, i.e. poll 15 for a 
> single sample every 32768 seconds or 9 hours, then your client clock 
> will see the smearing as a step function.
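
To see the step-function effect numerically, here is a small
simulation.  It ASSUMES a raised-cosine smear profile; the thread only
says the smear has a gradual start and stop.  (A raised cosine has a
peak rate of pi/2, about 1.57, times the average, which is close to
the 1.6 figure used above.)  Function names are illustrative.

```python
import math


def smear_offset(t: float, period: float) -> float:
    """Fraction of the leap second applied t seconds into the smear.

    ASSUMPTION: raised-cosine profile, clamped to [0, period].
    """
    t = min(max(t, 0.0), period)
    return 0.5 * (1.0 - math.cos(math.pi * t / period))


def max_jump(period: float, poll: float) -> float:
    """Largest change in smear offset between two consecutive polls.

    Samples start at t = 0 and run one poll past the end of the smear
    so the final partial interval is included.
    """
    steps = int(period // poll) + 1
    samples = [smear_offset(k * poll, period) for k in range(steps + 1)]
    return max(b - a for a, b in zip(samples, samples[1:]))


STEP_LIMIT_S = 0.128                       # the 128 ms limit mentioned above
slow = max_jump(20 * 3600, 32768)          # poll 15: roughly half a second per sample
normal = max_jump(20 * 3600, 1024)         # poll 10: a couple of tens of milliseconds
```

With a 20-hour smear, the poll 15 client sees jumps well above the
128 ms limit between consecutive samples, while the poll 10 client
stays comfortably below it.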

I'm OK with this; we should document it, and if somebody makes that
choice then they must have a reason for it.  My favorite "between the
lines" Bible quote is "Blessed are those who get what they deserve."

And I think there's a curse in Chinese along the lines of "May you get
what you wish for."

> For normal clients, the up to 22 ppm slew rate will be similar to the 
> frequency excursions many servers go through every day as the AC units 
> turn on/off or cpu-intensive processes turn on/off.

OK.

>> I don't think we have anywhere near enough time to test and
>> understand what will go right/wrong with these approaches.  But with
>> some luck we'll get some interesting data.
>>
>> And if there's enough interest and support for handling this we'll be
>> able to do a proper job after this event.  Remember, if leap seconds
>> are abolished there will(!) be a 5-year clock before they are
>> actually stopped.  The odds are good we'll have 1 or 2 more leap
>> seconds before they are ignored.
> 
> Right.
>
>> I'm planning to release 4.2.8p3 no later than the 25th, and I don't
>> really think that many people will be installing it.  I suspect a
>> bunch will, but many folks will "choose the devil you know".
>
> Absolutely. Even here where I work (the largest IT vendor/hosting 
> company in Norway) most server operators insist that only the default 
> ntpd included with the OS can be used, even if it is 20 years old!

This is also why we're publishing the number of bugfixes on the
ReleaseTimeline page on the wiki, and why I'm not inclined to backport
certain patches to EOL'd releases of NTP.  It's also why I want to get
NTF successful (i.e., start getting paid and having enough $ to pay staff)
so we can get the Certification and Compliance Program going, and find a
way to start selling traceable/auditable timestamps that come with a
warranty.  At that point the server operators will have a means to
charge for these timestamps too and part of that process will be that
folks are running up-to-date software.

H

