[Pool] Off by a second leap second failures

Miroslav Lichvar mlichvar at redhat.com
Wed Jul 15 12:29:36 UTC 2015


On Mon, Jul 13, 2015 at 01:06:51AM -0700, Ask Bjørn Hansen wrote:
> > On Jul 10, 2015, at 8:13, Svavar Kjarrval <svavar at kjarrval.is> wrote:
> > 
> > Are the server admins notified of the problem and advised on general
> > solutions? Even if they're kicked out of the pool, they'll probably
> > continue to serve bad time for others.
> 
> When a server score goes below 10 (if I remember right) the system sends off an email to tell the operator that something is amiss. Depending on what’s wrong this is usually within 90-120 minutes of the server going sour. If the server continues to be down then every few weeks the system will send you a reminder before (eventually) marking the server deleted and notifying the operator about that.

What about servers that serve good time and have a good score, but are
still announcing a leap second? Their clients could insert the leap
second again at the end of the month, or on any day, depending on
their implementation.

In my limited monitoring I see that about 2% of the servers still have
leap=01. Interestingly, in the Czech Republic it's almost a third of
the 41 servers.
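
For anyone who wants to check their own server, here is a minimal
sketch in Python of how the leap field can be read from a reply. It
relies only on the standard NTP header layout, where the top two bits
of the first byte are the leap indicator. The names leap_indicator and
query_leap are made up for this example, and the host passed to
query_leap is whatever server you want to test.

```python
import socket

# First byte of an NTP packet: LI (2 bits) | VN (3 bits) | Mode (3 bits).
# 0x23 = LI 0, version 4, mode 3 (client); the rest of the 48 bytes is zero.
CLIENT_REQUEST = bytes([0x23]) + bytes(47)

LEAP_MEANING = {
    0: "no warning",
    1: "insert leap second (leap=01)",
    2: "delete leap second (leap=10)",
    3: "unsynchronized (leap=11)",
}


def leap_indicator(packet: bytes) -> int:
    """Return the 2-bit leap indicator from a raw NTP packet."""
    if len(packet) < 48:
        raise ValueError("NTP packet must be at least 48 bytes")
    return packet[0] >> 6


def query_leap(host: str, timeout: float = 2.0) -> int:
    """Send one client request to host:123 and return the leap indicator."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(CLIENT_REQUEST, (host, 123))
        reply, _ = sock.recvfrom(512)
    return leap_indicator(reply)
```

A server that is still announcing the leap would make
LEAP_MEANING[query_leap(host)] report the leap=01 entry.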

About half of them seem to be running ntp-4.2.6 (they don't respond to
requests from source ports below 123) and probably hit ntp bug #2246,
which Martin pointed out earlier. Restarting ntpd should fix that.

I suspect the rest are running 4.2.4 or older, which has the design
flaw that the leap status is set on any day when any of the server's
sources is announcing the leap. If two or more such servers are
polling each other, the status will be passed around in a loop and
they will stay stuck with it after the insertion. To fix this, some of
these servers will probably need to be reconfigured to not use any
servers currently announcing the leap second, or at least be shut down
until the infection in the loop they are part of clears up.
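
The loop can be illustrated with a toy model in Python. This is not
ntpd's actual source selection, only the rule described above: a
server announces the leap whenever any of its sources announced it in
the previous polling round. The server names (gps, a, b) and the
helpers step and run are hypothetical.

```python
def step(flags, sources):
    """One polling round: a server with sources announces the leap if any
    of them did in the previous round; a server without sources (a
    reference clock here) keeps its own flag."""
    return {
        name: any(flags[src] for src in srcs) if srcs else flags[name]
        for name, srcs in sources.items()
    }


def run(flags, sources, rounds=10):
    """Apply the polling rule for a number of rounds."""
    for _ in range(rounds):
        flags = step(flags, sources)
    return flags


# After the insertion: the reference no longer announces the leap,
# but a and b still do.
start = {"gps": False, "a": True, "b": True}

# a and b poll each other, so the flag circulates forever.
looped = {"gps": [], "a": ["gps", "b"], "b": ["gps", "a"]}
print(run(start, looped))   # a and b stay True

# Reconfiguring a to stop using b breaks the loop and both clear.
fixed = {"gps": [], "a": ["gps"], "b": ["gps", "a"]}
print(run(start, fixed))    # a and b become False
```

In this model, breaking a single link in the loop is enough for the
flag to drain out after a couple of rounds.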

-- 
Miroslav Lichvar

