[ntp:hackers] Release fixing all issues

Harlan Stenn stenn at ntp.org
Thu Jan 21 11:06:28 UTC 2016


Martin Burnicki writes:
> Miroslav Lichvar wrote:
> > On Thu, Jan 21, 2016 at 03:22:54AM +0000, Harlan Stenn wrote:
> >>  https://bugzilla.redhat.com/show_bug.cgi?id=CVE-2015-5300
> >>
> >> and practically INSTANTLY see why that fix is inadequate.
> >>
> >> I suspect the key issue is knowing what -g is supposed to do.
> >
> > Right.
> >
> >> -g should allow the first *correction* to exceed the panic gate.
> >
> > That doesn't seem to be how -g is described. Quoting ntpd.html:
> >
> >   Normally, ntpd exits with a message to the system log if the offset
> >   exceeds the panic threshold, which is 1000 s by default. This option
> >   allows the time to be set to any value without restriction; however,
> >   this can happen only once.
> >
> > See the difference?
> >=20
> > The trouble with enabling the panic threshold right after the first
> > clock update, at least in 4.2.6, is that it doesn't work well with the
> > LOCAL driver. If the initial offset is larger than 1000 seconds and
> > the first update comes from the driver (e.g. it's specified in the
> > config before other sources or the network comes up a couple of
> > seconds after ntpd starts), the second update (from a real source) will
> > cause ntpd to exit.
> 
> Correct. This has also been discussed in Bug 988, a long time ago:
> http://bugs.ntp.org/show_bug.cgi?id=988

Which goes to show how This Is Not Easy, and there are many different
aspects and use-cases.

What is a feature in some situations is a bug in some others.

> > Allowing a large step in first two updates instead of one doesn't help
> > the attacker much, but breaking a valid use case would be a problem
> > for a security update. I know you are trying to deprecate the LOCAL
> > driver. There are people who still use it, for good or bad reasons.
> >
> > Anyway, this applies to 4.2.6 and older. In 4.2.8 it doesn't really
> > matter which patch is applied. A large step would still be allowed only
> > on the first update as there is no case in the loopfilter for the FSET
> > state when offset is smaller than 0.128 second and allow_panic is set
> > to false in the default case.
> 
> As far as I can see the situation in 4.2.8 is a little bit better than
> in 4.2.6.
> 
> If I remember correctly then in 4.2.6 ntpd sync'ed to the local clock
> very quickly, and thus the allow_panic flag was cleared before any
> upstream source was accepted, even if the external source was configured
> with iburst. So the only workaround was to run ntpdate before starting
> ntpd, just like in the good old days. :-(
> 
> In 4.2.8 ntpd syncs to the local clock only slowly, so if an upstream
> source is available there's a good chance ntpd synchronizes to the
> external source first, and steps the time due to -g before ntpd has a
> chance to sync to the local clock.
> 
> This may still fail, though, if an external source is not reachable when
> ntpd starts, e.g. because the network connection is down. I haven't
> tried this, but I'd expect that ntpd would still synchronize to the
> local clock, then clear the allow_panic flag, and terminate itself if
> there's a large time offset when a real time source becomes reachable.
> 
> (Please note this refers to the behavior observed before any patches for
> https://bugzilla.redhat.com/show_bug.cgi?id=CVE-2015-5300 had been
> applied.)
> 
> So IMO a good and easy solution would have been just to check the clock
> type, and never clear allow_panic if the system peer is only the local
> clock.
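
To make the failure case and the proposed check concrete, here is a toy
model of the panic gate. It is a sketch only and not ntpd source: the
class name, the keep_for_local switch and the return strings are
invented for illustration, and the only number taken from the docs is
the 1000 s default panic threshold.

```python
PANIC_THRESHOLD = 1000.0  # seconds; default quoted from ntpd.html


class PanicGate:
    """Toy model of the one-time large step allowed by -g (not ntpd code)."""

    def __init__(self, keep_for_local=False):
        self.allow_panic = True               # as if ntpd was started with -g
        self.keep_for_local = keep_for_local  # hypothetical switch for the proposal

    def update(self, offset, from_local):
        """Handle one clock update; return 'ok', 'step' or 'exit'."""
        if abs(offset) > PANIC_THRESHOLD:
            if not self.allow_panic:
                return "exit"                 # panic threshold exceeded, no allowance left
            result = "step"
        else:
            result = "ok"
        # Proposed check: an update from the LOCAL clock does not use up
        # the one-time allowance. Without it, any first update does.
        if not (self.keep_for_local and from_local):
            self.allow_panic = False
        return result


def run(gate):
    """LOCAL driver syncs first (offset 0), then a real source 5000 s away."""
    return [gate.update(0.0, True), gate.update(5000.0, False)]


print(run(PanicGate()))                     # flag cleared by the first update
print(run(PanicGate(keep_for_local=True)))  # flag kept while only LOCAL is the peer
```

In this model the first run ends in 'exit' on the second update, and the
second run ends in 'step'. The real loopfilter has more states than
this, so treat it as a picture of the argument and nothing more.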

OK, so where is the benefit to syncing to a system that is sync'd to a
LOCAL refclock?

The best example I can think of is that a group of machines are
synchronized together, even if the overall time in that "island" is
wrong.

I'll point out that this is EXACTLY one of the situations that NTF's
General Timestamp API is designed to address.

Backing up a bit, in the situation with the LOCAL refclock we're about
to start covering similar territory to what ntpdate went thru.  There
first was a time when ntpdate was a "clone" of ntpd, and it set the time
once, using similar algorithms to ntpd based on the number of servers it
could contact.  It suffered from very bad bitrot, and its users fell
into two camps: one that wanted the time set as quickly as possible,
and another that wanted the time set once, as *well* as possible.

The answer to these was to deprecate ntpdate and give folks enough
mechanism choices to implement their local policy choices.

One group can use sntp, which sets the time as quickly as possible.

The other group can use ntpd -q, which sets the time once, as well as
possible.

Somebody could write ntpdate as a script to do these.  The trick is that
there is no single way to give folks both behaviors.  The behaviors are
mutually exclusive.
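
A rough sketch of what such a wrapper could look like, assuming the
sntp -S (step) and ntpd -q / -g options behave as in the 4.2.8 man
pages; check them against the installed version before relying on
this. The function name, the policy labels and the placeholder server
are made up for illustration.

```python
def time_set_command(policy, server="pool.ntp.org"):
    """Build the argv for a one-shot time set.

    policy is 'fast' (set the time as quickly as possible) or
    'best' (set the time once, as well as possible).
    The default server is only a placeholder.
    """
    if policy == "fast":
        # sntp asks the given server and steps the clock right away.
        return ["sntp", "-S", server]
    if policy == "best":
        # ntpd -q runs the normal algorithms, sets the clock once and
        # quits; -g lets that one correction exceed the panic threshold.
        # The servers come from ntp.conf, so 'server' is not used here.
        return ["ntpd", "-q", "-g"]
    raise ValueError("policy must be 'fast' or 'best'")


print(" ".join(time_set_command("fast", "ntp.example.org")))
print(" ".join(time_set_command("best")))
```

The caller has to name the policy explicitly, so the wrapper only
supplies the mechanism and leaves the choice to the local admin.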

One of the goals is to "have the old stuff keep working" but there are
new choices now and it's not clear how to give folks distinct exclusive
choices without making any changes to the current setup.

If the new default gives "Group A" the behavior they want, then by
definition "Group B" will get the undesired behavior *unless they make a
change to their old setup*, and the whole point here is to support old
behavior without change.

Therefore, I submit that, given that we keep finding bugs and things to
improve, the only way to make forward progress is to give folks newer
mechanisms that let them implement the policy choices that are right
for them.

If that means that a vendor makes a choice as to what they think works
best for their customers, great.

If that means that when installing the new software the users get
informed (somehow) that there are new options that can be configured,
that's great too.

What other choices are there, really?

H

