[ntp:hackers] Release fixing all issues

Martin Burnicki martin.burnicki at meinberg.de
Thu Jan 21 12:27:35 UTC 2016


Harlan Stenn wrote:
> Martin Burnicki writes:
>> Miroslav Lichvar wrote:
[...]
>> Correct. This has also been discussed in Bug 988, long time ago:
>> http://bugs.ntp.org/show_bug.cgi?id=988
> 
> Which goes to show how This Is Not Easy, and there are many different
> aspects and use-cases.
> 
> What is a feature in some situations is a bug in some others.

Why should it be a bug that you prevent the local clock from clearing
the allow_panic flag if you *know* the local clock always returns offset 0?

In our discussion we need to distinguish here

1.) whether it makes sense to use a feature like the local clock, or not

2.) whether the feature should work as expected, and be fixed if it doesn't

The latter especially if it's easy to fix.

[...]

>> So IMO a good and easy solution would have been just to check the clock
>> type, and never clear allow_panic if the system peer is only the local
>> clock.

What I wrote above belongs to 2.)
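
A minimal sketch of the check I have in mind, written as a small Python
model and not as actual ntpd code. The names Peer, REFCLK_LOCALCLOCK and
next_allow_panic are made up for illustration; only the allow_panic flag
itself and the local clock being driver type 1 (127.127.1.x) come from
ntpd.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical constant for this sketch: the local clock is driver type 1.
REFCLK_LOCALCLOCK = 1


@dataclass
class Peer:
    """Hypothetical, much simplified stand-in for a peer/association."""
    is_refclock: bool
    refclock_type: int = 0


def next_allow_panic(allow_panic: bool, sys_peer: Optional[Peer]) -> bool:
    """Return the value of allow_panic after a clock update.

    If the system peer is only the local clock, its offset of 0 says
    nothing about the real time error, so the flag is left untouched.
    Any other system peer clears the flag, as a real update would.
    """
    if (
        sys_peer is not None
        and sys_peer.is_refclock
        and sys_peer.refclock_type == REFCLK_LOCALCLOCK
    ):
        return allow_panic
    return False
```

With this, a daemon started with -g keeps its one-time permission for a
large correction until a real time source has been selected.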

> OK, so where is the benefit to syncing to a system that is sync'd to a
> LOCAL refclock?

The question above belongs to 1.).

The local clock is easy to use, folks are familiar with it, and
for many use cases it works as expected.

For example, if system time is disciplined by a different program, and
ntpd just has to make the disciplined time available on the network. We
are using this approach under Windows, if one of our PCI cards is
installed, and the time adjustment service from our driver package
adjusts the system time.

Also, if you just want a quick setup for a time island, e.g. for leap
second tests where you run the test machines on completely different
times than the current time.

And finally in simple corporate networks, with only the requirements for
which the local clock has been designed. Even if you run ntpd on a pure
NTP client it should not hurt if you have the local clock configured.

But of course, if you are setting up a large corporate time
synchronization network with a higher level of redundancy it's better to
use orphan mode instead. But folks have to investigate more to set it up
so that it also works as expected.

IMO this is similar to e.g. an LDAP server setup: I can either install
a single server, which is easier, or configure a group of servers
with redundancy and data replication.

You should let users/admins make the decision which way to go.

> The best example I can think of is that a group of machines are
> synchronized together, even if the overall time in that "island" is
> wrong.

Yes.

> I'll point out that this is EXACTLY one of the situations that NTF's
> General Timestamp API is designed to address.

I have to admit that I don't know how this is supposed to work.

If you just want to compare time stamps inside the time islands then you
can simply do it in a similar way as if all times are close to UTC or
whatever.

If you want to compare time stamps from a time island to time stamps
outside the time island you have to know the offset at some point, so
that the GTSA can deal with it.
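
To illustrate just the arithmetic, here is a minimal sketch in Python.
It is not the GTSA and does not use any of its API, which I don't know.
The function name and the example numbers are made up, and the offset is
assumed to have been measured once as "island time minus reference time".

```python
def to_reference(island_ts: float, island_minus_ref: float) -> float:
    """Map a time stamp from the island's time scale to the reference
    time scale, given a previously measured offset in seconds."""
    return island_ts - island_minus_ref


# Hypothetical example: the island runs exactly one hour ahead.
offset = 3600.0
a = 1000003600.0  # event A, island time
b = 1000003610.0  # event B, island time

# Inside the island the difference needs no offset at all.
inside_delta = b - a

# Comparing with an outside time stamp requires the offset.
a_ref = to_reference(a, offset)
```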

Regarding our main topic, I don't see where it makes a practical
difference if the time island runs controlled by a local clock or by
orphan mode.

After many years working with ntpd I still really don't understand why
the local clock should generally be bad.

> Backing up a bit, in the situation with the LOCAL refclock we're about
> to start covering similar territory to what ntpdate went thru.

Indeed, there's a similar situation here where we have to distinguish
between point 1.) and 2.).

> There
> first was a time when ntpdate was a "clone" of ntpd, and it set the time
> once, using similar algorithms to ntpd based on the number of servers it
> could contact.  It suffered from very bad bitrot, and also its users
> fell in to 2 camps - one that wanted the time set as quickly as
> possible, and another group that wanted the time set once, as *well* as
> possible.

AFAIK the quick/slow update could be controlled by just indicating the
number of polls on the command line.

IMO the real problem was that ntpdate was based on a *copy* of the ntpd
code which was current at that time. After this only ntpd was enhanced,
but ntpdate was mainly stuck with the old code.

The situation would be *much* better today if the code had been
separated so that the common algorithms were in shared source files which
could be used by both ntpd and ntpdate, so ntpdate would automatically
have inherited the benefits of ntpd.

Again, after many years working with ntpd and ntpdate I still really
don't understand why ntpdate is generally bad.

Even though you don't need it anymore in many cases because we have -g
for ntpd today, it's still a good debugging tool which can help to find
out why time synchronization doesn't work as expected, e.g. due to a
firewall.

I *know* sntp can also do most of this stuff, but AFAIK this doesn't work
under Windows, and many of our customers are running Windows clients.
Personally, I prefer Linux, BTW. ;-)

> The answer to these was to deprecate ntpdate and give folks enough
> mechanism choices to implement their local policy choices.
> 
> One group can use sntp, which sets the time as quickly as possible.

Or ntpdate -p1 ;-)

> The other group can use ntpd -q, which sets the time once, as well as
> possible.
> 
> Somebody could write ntpdate as a script to do these.  The trick is that
> there is no single way to give folks both behaviors.  The behaviors are
> mutually exclusive.

Agreed. We should offer both ways, as we do now.
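
Such a wrapper could look roughly like the following Python sketch. It
only builds the command line and does not run anything. The function
name and the mode names "quick" and "best" are made up. The -S (step)
option for sntp is an assumption based on the sntp shipped with 4.2.8
and should be checked against the local man page. In "best" mode the
servers are assumed to come from ntp.conf.

```python
from typing import List


def build_command(mode: str, servers: List[str]) -> List[str]:
    """Return the command for one of the two mutually exclusive policies.

    "quick": set the time as quickly as possible, from a single server.
    "best":  set the time once, as well as possible, then exit.
    """
    if mode == "quick":
        if not servers:
            raise ValueError("quick mode needs at least one server")
        # Assumed sntp option: -S allows stepping the clock.
        return ["sntp", "-S", servers[0]]
    if mode == "best":
        # -g permits one large initial correction, -q exits after the
        # clock has been set; servers are read from ntp.conf here.
        return ["ntpd", "-g", "-q"]
    raise ValueError(f"unknown mode: {mode!r}")
```

The caller has to pick one of the two modes explicitly, which reflects
that there is no single default giving both behaviors.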

> One of the goals is to "have the old stuff keep working" but there are
> new choices now and it's not clear how to give folks distinct exclusive
> choices without making any changes to the current setup.
> 
> If the new default gives "Group A" the behavior they want, then by
> definition "Group B" will get the undesired behavior *unless they make a
> change to their old setup*, and the whole point here is to support old
> behavior without change.
> 
> Therefore, I submit that the only useful way to go is to acknowledge
> that given we are finding bugs and things to improve, the only way to
> make forward progress is to give folks newer mechanisms to implement
> better policy choices that let folks make the choices that are right for
> them.
> 
> If that means that a vendor makes a choice as to what they think works
> best for their customers, great.
> 
> If that means that when installing the new software the user gets
> informed (somehow) that there are new options that can be configured,
> that's great too.

I agree. However, on the one hand there is a bug regarding the local clock
which never really hurt anybody (undocumented flag 1) but was fixed
quickly, while the fix introduces another bug, and on the other hand
there is a bug which is really annoying but whose fix was refused,
even though it would have been easy to do.

> What other choices are there, really?

Fix tiny bugs and let users decide what to do.

Martin
-- 
Martin Burnicki

Senior Software Engineer

MEINBERG Funkuhren GmbH & Co. KG
Email: martin.burnicki at meinberg.de
Phone: +49 (0)5281 9309-14
Fax: +49 (0)5281 9309-30

Lange Wand 9, 31812 Bad Pyrmont, Germany
Amtsgericht Hannover 17HRA 100322
Geschäftsführer/Managing Directors: Günter Meinberg, Werner Meinberg,
Andre Hartmann, Heiko Gerstung
Web: http://www.meinberg.de

