[ntp:questions] question regarding NTP configuration for clusters, and "cluster time" stability

rotordyn@yahoo.com rotordyn1 at gmail.com
Tue Sep 29 19:31:45 UTC 2009


On Sep 29, 2:37 pm, Harlan Stenn <st... at ntp.org> wrote:

> You've read assoc.html#orphan, right?

Yes. And I think I understand how it would work in common scenarios,
but I have to account for the corner cases as well. In particular, the
documentation at that link says:

    "If no UTC sources are available to any core server, one of them
    can provide a simulated UTC source for all other hosts in the
    subnet. However, only one core server can simulate the UTC
    source and all direct dependents, called orphan children, must
    select the same one, called the orphan parent."

So I'm not sure what happens if some core servers lose access
to their UTC sources, while the remainder do not. I had hoped
that one core server switching to orphan mode would somehow
trigger the others, but I don't see that it does in the code.

> If you "choose unwisely" and select a poor master to take over in your
> failure case, you'll see a time jump when you regain internet access.

True. I think I can survive that, as long as all the nodes stay close
enough to each other. If it has to, the cluster can signal that it is
too far from its external UTC reference to resynchronize. Then
the internal cluster time would just continue to drift until there was
a service action to fix it. What would be worse is for nodes in the
cluster to diverge from each other.

> If correct time is really important, why not run an inexpensive S1 device
> locally?

Oddly enough, "correct time" isn't all that important, at least not to
the accuracy often discussed in the context of NTP. And I think that's
the root of my issues: A consistent "cluster time" among the nodes
is a much higher priority than accurately following UTC. Prioritizing
a common cluster time absolutely over UTC in a redundant fashion
seems to be difficult, since the peers that provide redundancy can
diverge from each other if given different inputs. (Different in that
one could lose its external UTC reference while another does not.)
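
To put a rough number on how quickly two such peers could drift apart,
here is a minimal Python sketch. It assumes one peer keeps tracking UTC
while the other free-runs with a constant frequency error. The 50 ppm
figure and the function name are illustrative assumptions, not
measurements from this cluster.

```python
def seconds_until_divergence(max_spread_s: float, relative_drift_ppm: float) -> float:
    """Return how long two clocks take to differ by max_spread_s seconds.

    relative_drift_ppm is the assumed constant frequency difference
    between the two clocks, in parts per million (hypothetical input).
    """
    if relative_drift_ppm <= 0:
        raise ValueError("relative_drift_ppm must be positive")
    # 1 ppm of frequency error accumulates 1 microsecond of offset per second.
    return max_spread_s * 1_000_000 / relative_drift_ppm


if __name__ == "__main__":
    t = seconds_until_divergence(1.0, 50.0)
    print(f"{t:.0f} s (about {t / 3600:.1f} h) until the peers differ by 1 s")
```

Real oscillators do not drift at a constant rate, so this is only an
order-of-magnitude estimate of how long a split could go unnoticed.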

The current implementation reflects this, since we use NTP internally
with no outside references. But that means that over time we can
drift pretty far away from UTC, and my goal is just to limit that. I
think I've said it already, but roughly speaking I need the nodes to
agree with each other to less than a second, while agreement with UTC
to within a minute or even an hour is enough.
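
The two tolerances above can be written down as a simple check. This is
only a sketch: the function name, the dictionary of per-node offsets
from UTC (in seconds), and the way those offsets would be obtained are
all assumptions for illustration. The thresholds are the rough figures
from this message (under a second between nodes, up to an hour from
UTC).

```python
def check_cluster_time(node_offsets_s, max_spread_s=1.0, max_utc_error_s=3600.0):
    """Evaluate the two requirements: tight agreement among the nodes,
    loose agreement with UTC.

    node_offsets_s maps a node name to its offset from UTC in seconds
    (hypothetical input; how it is measured is outside this sketch).
    """
    if not node_offsets_s:
        raise ValueError("need at least one node offset")
    offsets = list(node_offsets_s.values())
    spread = max(offsets) - min(offsets)          # worst node-to-node disagreement
    worst_utc_error = max(abs(o) for o in offsets)  # worst distance from UTC
    return {
        "spread_s": spread,
        "worst_utc_error_s": worst_utc_error,
        "nodes_agree": spread < max_spread_s,
        "utc_ok": worst_utc_error <= max_utc_error_s,
    }


if __name__ == "__main__":
    # Every node is about 42 s off UTC, yet they agree with each other closely.
    print(check_cluster_time({"node-a": 42.0, "node-b": 42.5, "node-c": 41.75}))
```

The point of keeping the two results separate is that a cluster can be
far from UTC and still healthy by the first, more important criterion.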

> If it's *really* important, you can build a very high quality S1 server
> with Rb and GPS and/or modem for under US$2k.

Adding hardware isn't an option. This product exists in the field.

As always, thanks for the guidance. And if anyone with the right
level of experience wants a consulting gig, send me an email. :)

tim
