[ntp:hackers] smearing the leap second

Martin Burnicki martin.burnicki at burnicki.net
Fri Jun 19 20:13:39 UTC 2015


Hal Murray wrote:
> 
> martin.burnicki at burnicki.net said:
>> The main point is just that the time is correct again at the beginning of
>> the new UTC day. Otherwise it doesn't matter much, IMO. 
> 
> How did you decide to smear from leap-delta to leap rather than leap-delta/2 
> to leap+delta/2?
> 
> If the smear straddles the leap the worst offset is 1/2 second rather than a 
> whole second.

Agreed.

However, let me give you some details about the reasons why I started
the leap smear implementation.

As you know, in ntpd 4.2.4, if kernel support was not available or not
used, ntpd didn't care about the leap second at all. So if ntpd was run
with -x, and thus kernel support wasn't used, ntpd saw a sudden 1 s
offset after the leap second and would normally have stepped the time by
-1 s a few minutes later. However, due to -x, ntpd did *not* step the
time but started slewing it over a long period.

I would have considered this behaviour a bug, mainly since I bet it was
accidental. However, as we learned in the discussion in bug 2745, this
behaviour was very much appreciated because the time was indeed never
stepped back, even though the start of the slewing was somewhat
undefined and depended on the poll interval. I mean, the system time was
off by 1 second for several minutes before slewing even started.

In ntpd 4.2.6 Dave Mills added some code which made ntpd step the
time at UTC midnight to insert a leap second if kernel support was not
used. Unfortunately this also happened if ntpd was started with -x, so
the folks who expected that the time was *never* stepped when ntpd was
run with -x found this wasn't true anymore, and from the discussion in
bug 2745 I learned that there were even some folks who patched ntpd to
get the 4.2.4 behaviour back.

In 4.2.8 the leap second code was rewritten by Juergen Perlinger, who
did a great job, IMO. However, the resulting code still showed the
behaviour of 4.2.6, i.e. ntpd with -x would still step the time.

I have to admit that I became aware of the full problem only from the
discussion in the bug mentioned above. Fortunately, Juergen has fixed
this in the current ntpd code, but this fix is only available with a
certain patch level of ntpd 4.2.8.

In the meantime we (Meinberg) had a number of requests from customers
for a way to get through the leap second without the time being
stepped.

Of course we could have told the customers to check the version of ntpd
installed on each server in the company: if it's still 4.2.4, be sure to
start the client ntpd with -x; if it's 4.2.6 or 4.2.8, it won't work
anyway unless you have a patched version instead of the original
one. So you'd need to upgrade to the current -stable code to be able
to run ntpd with -x and get the desired result, and you'd still have to
check, update, and configure every single machine in the company.

Google's leap smear approach is a very efficient solution for this. You
just have to make sure that the company's NTP servers support leap
smearing, and configure those few servers accordingly. If the smear
interval is long enough for NTP clients to follow the smeared time, it
doesn't matter at all which version of ntpd is installed on a client
machine: it just works, and it even works around kernel bugs triggered
by the leap second.
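
To put "long enough" into numbers: smearing 1 s linearly over an
interval T makes the served time run slow by 1/T, a frequency offset the
clients have to track. ntpd's documented maximum slew rate is 500 PPM,
so the theoretical lower bound for T is 2000 s. The sketch below assumes
a linear ramp; the 24 h interval is only an example value, and the
function names are hypothetical.

```python
NTPD_MAX_PPM = 500.0  # ntpd's documented maximum slew rate (500 PPM)


def smear_rate_ppm(interval_s: float, leap_s: float = 1.0) -> float:
    """Frequency offset in ppm that clients must follow for a linear smear."""
    if interval_s <= 0:
        raise ValueError("smear interval must be positive")
    return leap_s / interval_s * 1e6


def min_interval_s(max_ppm: float = NTPD_MAX_PPM, leap_s: float = 1.0) -> float:
    """Shortest linear smear interval whose rate stays within max_ppm."""
    return leap_s / (max_ppm * 1e-6)


if __name__ == "__main__":
    print(f"24 h smear: {smear_rate_ppm(86400):.3f} ppm")  # 11.574 ppm
    print(f"500 ppm limit: {min_interval_s():.0f} s")       # 2000 s
```

In practice clients also keep correcting their own oscillator drift, so a
usable interval should be far longer than this bound. If a cosine-shaped
ramp is used instead of a linear one, its peak rate is pi/2 times the
average rate, which shrinks the margin further.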

Since all clients follow the same smeared time, the time *difference*
between the clients during the smear interval stays much smaller than
with the -x approach, where each client starts slewing at a different
time depending on its poll interval.

Juergen's leap second code determines the point in system time when the
leap second is to be inserted. Given a particular smear interval, it's
easy to determine the start point of the smearing, and the smearing is
finished when the next UTC day begins. The maximum error doesn't exceed
what you'd get with the old smearing caused by -x in ntpd 4.2.4, so if
users could accept the old behaviour, they should also accept the
smearing on the server side.
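
A minimal sketch of that computation for the old approach (smear ends
when the new UTC day begins), assuming a linear ramp. `elapsed` is a
hypothetical monotonic count of SI seconds that equals POSIX time up to
the leap and keeps advancing through 23:59:60; a real server would derive
this from its own leap second state. The 24 h interval is an example
only.

```python
import calendar
from typing import Tuple


def smear_window_end_at_leap(leap_posix: int, interval_s: float) -> Tuple[float, float]:
    """Old approach: the smear ends exactly at the leap (new UTC day)."""
    return float(leap_posix - interval_s), float(leap_posix)


def absorbed_fraction(elapsed: float, start: float, end: float) -> float:
    """Fraction of the inserted second already smeared away (linear ramp)."""
    if elapsed <= start:
        return 0.0
    if elapsed >= end:
        return 1.0
    return (elapsed - start) / (end - start)


def smeared_time(elapsed: float, start: float, end: float) -> float:
    """Time a smearing server would put into its replies (POSIX scale)."""
    return elapsed - absorbed_fraction(elapsed, start, end)


if __name__ == "__main__":
    # Start of 2015-07-01, right after the 2015-06-30 leap second.
    leap = calendar.timegm((2015, 7, 1, 0, 0, 0))
    start, end = smear_window_end_at_leap(leap, 86400)  # example: 24 h smear
    print(start, end)  # 1435622400.0 1435708800.0
    print(absorbed_fraction(start + 43200, start, end))  # 0.5
```

Once the ramp reaches 1, the served time already lags by the full second,
so during 23:59:60 it reads 23:59:59 and reaches 00:00:00 together with
UTC.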

When I started to think about implementing leap smear support in ntpd, I
hadn't noticed that Google had changed their smearing strategy. I agree
that the advantage of Google's new approach is that the maximum time
error is only 0.5 s instead of 1 s.

What I've currently implemented is the old approach, but it should be
easy to change this to the new approach, mainly by computing the start
and end time of the smear interval differently, and possibly adjusting
the formula used to compute the current smear offset.
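
The difference between the two windows can be checked numerically. This
is a sketch, assuming a linear ramp and an `elapsed` value that counts SI
seconds monotonically and equals POSIX time before the leap (both
assumptions for illustration; the function names are hypothetical). Only
the window computation changes between the two strategies.

```python
def absorbed_fraction(elapsed: float, start: float, end: float) -> float:
    """Fraction of the inserted second smeared away so far (linear ramp)."""
    if elapsed <= start:
        return 0.0
    if elapsed >= end:
        return 1.0
    return (elapsed - start) / (end - start)


def smear_window(leap: float, interval: float, centered: bool):
    """Old approach ends the smear at the leap; the centered one straddles it."""
    if centered:
        return leap - interval / 2, leap + interval / 2
    return leap - interval, leap


def smear_error(elapsed: float, leap: float, interval: float, centered: bool) -> float:
    """Smeared time minus true UTC, both on the POSIX scale, in seconds."""
    start, end = smear_window(leap, interval, centered)
    smeared = elapsed - absorbed_fraction(elapsed, start, end)
    utc = elapsed if elapsed < leap else elapsed - 1.0  # leap second inserted
    return smeared - utc


def max_abs_error(leap: float, interval: float, centered: bool, step: float = 1.0) -> float:
    """Largest |error| sampled across the window, skipping 23:59:60 itself."""
    start, end = smear_window(leap, interval, centered)
    worst = 0.0
    t = float(start)
    while t <= end + 1:
        if not (leap <= t < leap + 1):  # 23:59:60 has no POSIX label
            worst = max(worst, abs(smear_error(t, leap, interval, centered)))
        t += step
    return worst


if __name__ == "__main__":
    leap = 1435708800  # 2015-07-01 00:00:00 UTC, example leap
    print("ends at leap:", max_abs_error(leap, 7200, centered=False))
    print("centered:    ", max_abs_error(leap, 7200, centered=True))
```

Sampling once per second lands just below the true bounds: the error
approaches 1 s right before the leap with the old window, but never
exceeds 0.5 s with the centered one, where it is -0.5 s just before the
leap and +0.5 s just after it.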

My main concern was to find a way to implement this without affecting
the local system time, which might have caused problems in many places
in the code. Since I now only modify the timestamps in outgoing reply
packets to clients, I can be pretty sure that the basic functionality of
ntpd is not accidentally broken.
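
The idea of touching only the outgoing packet can be sketched as a pure
function on a copy of the reply. This is a simplified model, not ntpd's
packet code: the `Reply` fields and helper names are hypothetical, and
which server-generated timestamps get shifted (here reference, receive,
and transmit) is an assumption. The origin timestamp, however, must echo
the client's own transmit time unchanged, or the client cannot match the
reply to its request.

```python
from dataclasses import dataclass, replace

# Seconds from 1900-01-01 (NTP era 0) to 1970-01-01 (Unix epoch).
NTP_UNIX_DELTA = 2208988800


def to_ntp64(unix_seconds: float) -> int:
    """Encode Unix time as a 64-bit NTP timestamp (32.32 fixed point)."""
    ntp = unix_seconds + NTP_UNIX_DELTA
    secs = int(ntp)
    frac = int((ntp - secs) * (1 << 32))
    # Seconds field wraps at the NTP era boundary (2036).
    return ((secs & 0xFFFFFFFF) << 32) | frac


@dataclass(frozen=True)
class Reply:
    """Hypothetical, simplified view of the timestamps in an NTP reply."""
    reference: float  # when the server's clock was last set
    origin: float     # client's transmit time, echoed back unchanged
    receive: float    # when the request arrived at the server
    transmit: float   # when the reply leaves the server


def smear_reply(reply: Reply, smear_offset: float) -> Reply:
    """Return a copy with the server-generated timestamps shifted.

    The server's system clock is never touched; only the outgoing copy
    changes. The origin timestamp stays exactly as the client sent it.
    """
    return replace(
        reply,
        reference=reply.reference + smear_offset,
        receive=reply.receive + smear_offset,
        transmit=reply.transmit + smear_offset,
    )


if __name__ == "__main__":
    r = Reply(reference=100.0, origin=200.0, receive=300.0, transmit=300.5)
    s = smear_reply(r, -0.25)  # e.g. a quarter of the leap second absorbed
    print(hex(to_ntp64(s.transmit)))
```

Because `smear_reply` returns a new object, the server's own view of the
reply (and of its clock) stays unsmeared, which mirrors the goal of not
disturbing ntpd's internal timekeeping.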

So if most of you prefer to smear half of the leap second before and
half after the leap second time, I'm also happy with this and can change
the code accordingly.

Martin


