[ntp:hackers] Fwd: waiting for ntpd "first fix" with orphan mode

Todd Glassey tglassey at glassey.com
Fri Apr 23 13:37:09 UTC 2010


On 4/22/2010 8:20 PM, David Mills wrote:
> Dave,
> 
> I think you might need two parameters:
> 
> o minwait - set the minimum time to wait after the minsane threshold is
> reached until the clock can be set. This takes the place of the ntp-wait
> script and is much better, as it gives the mitigation algorithms more
> time to settle.
> 
> o orphwait - set the minimum time to wait from the time all sources have
> been lost until switching to orphan mode.  Remember, under ordinary
> circumstances when all sources are lost, the client, acting as a server
> for dependent clients, continues to operate at the same stratum but with
> increasing dispersion until the synchronization distance seen by the
> clients exceeds the distance threshold, which usually takes about a day.
> The orphwait could shorten that time by switching to orphan mode after
> orphwait.
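
A minimal sketch of how the two proposed timers might gate behavior, in
Python rather than ntpd's C. The names minwait and orphwait come from the
proposal above; the class, its methods, and the timestamps are hypothetical
and are not taken from ntpd.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WaitTimers:
    """Hypothetical holder for the two proposed delays, in seconds."""
    minwait: float   # delay after minsane is reached before the clock may be set
    orphwait: float  # delay after all sources are lost before orphan mode

    def may_set_clock(self, now: float, minsane_reached_at: Optional[float]) -> bool:
        # The clock may be set only once minsane has been reached
        # and minwait seconds have passed since then.
        if minsane_reached_at is None:
            return False
        return now - minsane_reached_at >= self.minwait

    def may_enter_orphan(self, now: float, sources_lost_at: Optional[float]) -> bool:
        # Orphan mode is allowed only once all sources have been lost
        # for at least orphwait seconds.
        if sources_lost_at is None:
            return False
        return now - sources_lost_at >= self.orphwait
```

With orphwait set well below a day, the switch to orphan mode would happen
long before the clients' synchronization distance exceeds the threshold.
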
> 
> The orphan case is rather tricky, but not if you put it in context.
> Orphan mode is most useful for core servers normally operating at the
> same stratum and at relatively low stratum. Normally all servers run NTP
> with each other, either using symmetric modes or sharing the same
> Ethernet, as described in the documentation. Whether the servers are
> synchronized to external sources or operating as orphan parents, the
> orphan children will see the same scenarios as in the previous
> paragraph, so orphan is not necessary.

Orphan mode also needs to trigger an alarm process notification, one would
think.

> 
> Ordinarily, it is good practice for the core servers for a potentially
> isolated subnet to run NTP with each other in symmetric modes, so each
> can back up the other in case all sources are lost. Thus, orphan mode
> kicks in only if all sources for all servers are lost. In such a case
> and depending on initial conditions, one of the core servers will become
> the orphan parent and the other core servers will operate at orphan
> stratum plus one and thus become orphan children. The only case where an
> election algorithm is required is when two or more core servers become
> orphan but do not synchronize with each other.
> 
> However, let's take a closer look at the case involving a client with
> some number of servers upon first startup. The intent of the minwait is
> to avoid a step should one or another server be quite different from
> another. However, if only a single server A is available, there is no
> reason to wait. If two servers A and B are available, the issue is
> whether their correctness intervals overlap; i.e., whether the
> intersection interval is nonempty. If so, there is no reason to wait. If
> not, there can be no majority clique and the client would never
> synchronize. Now consider three servers A, B and X where A intersect B is
> nonempty and X intersects neither A nor B. The operator should set
> minsane = 2, in which case the order the servers are found makes no
> difference and synchronization will be delayed until both A and B are
> found and a majority clique exists. In none of these cases is minwait
> necessary. Other cases can be generalized, but are not considered here.
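
The interval cases above can be sketched as follows. This is a simplified
stand-in for the selection logic, not ntpd's actual algorithm: each server
is reduced to a closed correctness interval (lo, hi), and a majority clique
is taken to mean that more than half of the intervals share a common point.
The function names and the sample numbers are illustrative assumptions.

```python
def overlaps(a, b):
    """True if closed intervals a=(lo, hi) and b=(lo, hi) share a point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


def max_agreeing(intervals):
    """Largest number of intervals that contain a common point."""
    events = []
    for lo, hi in intervals:
        events.append((lo, 0))  # 0 = interval opens; sorts before a close
        events.append((hi, 1))  # 1 = interval closes
    events.sort()
    best = count = 0
    for _, kind in events:
        if kind == 0:
            count += 1
            best = max(best, count)
        else:
            count -= 1
    return best


def has_majority_clique(intervals):
    """True if more than half of the intervals intersect at some point."""
    return max_agreeing(intervals) > len(intervals) / 2


# Hypothetical correctness intervals, in seconds of offset.
A = (-1.0, 1.0)
B = (0.0, 2.0)
X = (10.0, 12.0)
```

Here has_majority_clique([A]) and has_majority_clique([A, B]) hold, so
there is nothing to wait for; has_majority_clique([A, X]) does not hold,
so the client would never synchronize; and has_majority_clique([A, B, X])
holds only once both A and B have been found, which is what minsane = 2
enforces.
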
> 
> There might be other cases where minwait might be useful, as in the case
> where a relatively large number of servers are first found (pool) and
> not yet winnowed down to minclock. This is a tuning parameter, not an
> anti-step measure.
> 
> Dave
> 
> Dave Hart wrote:
> 
>> I am looking forward to any comments you might have on this email from
>> a couple of weeks ago to hackers.  I have a branch with (among other
>> ntpdate-elimination goodies) the --wait-sync functionality I've
>> discussed a few times previously that brought up this issue of
>> defining first sync for clients configured with orphan mode.
>>
>> Please feel free to add hackers@ to the recipients of your response.
>>
>> Thanks,
>> Dave Hart
>>
>>
>> ---------- Forwarded message ----------
>> From: Dave Hart <davehart at gmail.com>
>> Date: Fri, Apr 9, 2010 at 3:43 PM
>> Subject: waiting for ntpd "first fix" with orphan mode
>> To: hackers at lists.ntp.org
>> Cc: Dave Hart <hart at ntp.org>, Steve Kostecke <kostecke at ntp.org>
>>
>>
>> Suppose you want to wait to start a database until after ntpd first
>> synchronizes the clock, to minimize the likelihood of the clock
>> stepping after the database starts.  Today, the best way to do that is
>> with scripts/ntp-wait from the distribution, with ntpd -gq or ntpdate
>> run before starting the daemon ntpd as other options.
>>
>> That preferred approach breaks down with machines configured to use
>> orphan mode as a backup, when orphan mode is not being used.  An ntpd
>> configured with "tos orphan" behaves differently than without at
>> startup -- orphan participants immediately come up as orphan parents,
>> self-synchronized and showing leap=00, before any sources become
>> reachable.  Using ntp-wait with such a configuration will never wait
>> substantially.
>>
>> If no external sources are available to any of the orphan
>> participants, they self-organize and a single ntpd becomes the orphan
>> parent, self-synched, while the other participants sync to it.  Even
>> in that case, with more than two hosts the odds are against any given
>> ntpd that starts up as an orphan mode parent remaining so once it has
>> reachability to the other orphan mode ntpds.
>>
>> Perhaps the cleanest solution would be to prohibit orphan parent
>> operation for some brief startup delay (in seconds or poll intervals).
>> That may be undesirable in an intentionally orphan-only configuration
>> (where orphan mode is not simply a backup mechanism), but could be
>> minimized or disabled in that case with an appropriate knob (tos
>> orphandelay?  tos orphanparentdelay?), which presumably would default
>> to an interval long enough that ntpd configured with orphan mode
>> backup would behave like one without orphan mode -- come up leap=11
>> unsynchronized until enough sources become usable, then transition to
>> leap != 11.  In other words, so that one could have more confidence
>> that after ntp-wait is satisfied, the clock will not shortly
>> thereafter be stepped.
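
A rough sketch of the startup behavior being proposed, under stated
assumptions: the function name, the orphan_delay parameter (standing in for
the suggested "tos orphandelay" knob) and its 300-second default are all
hypothetical, and the logic is reduced to the leap bits alone.

```python
LEAP_NONE = 0b00    # synchronized, no leap warning
LEAP_UNSYNC = 0b11  # clock not synchronized


def startup_leap(uptime, usable_sources, minsane=1,
                 orphan_enabled=True, orphan_delay=300.0):
    """Leap bits a daemon would report under the proposed startup delay."""
    if usable_sources >= minsane:
        # Enough real sources: synchronized in the usual way.
        return LEAP_NONE
    if orphan_enabled and uptime >= orphan_delay:
        # Delay expired with no sources: allowed to act as orphan parent.
        return LEAP_NONE
    # Otherwise behave like an ntpd without orphan mode: unsynchronized.
    return LEAP_UNSYNC
```

Setting orphan_delay to 0 reproduces the current behavior described above,
where an orphan participant reports leap=00 immediately at startup.
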
>>
>> I've been discussing this in the context of the ntp-wait script.  In
>> fact, my interest in resolving the question involves an alternative
>> meant to solve the same problem as ntp-wait with a slightly different
>> interface, a new ntpd option "--wait-sync 300" that would fork off a
>> daemon child process, and then wait for either 300 seconds or until
>> ntpd's first synch, and exit with failure (nonzero) if the 300 seconds
>> timed out.  In either case, the daemon continues on.
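
The waiting side of "--wait-sync 300" might look roughly like the sketch
below. It models only the parent's wait loop, not the fork or the daemon
itself; wait_sync, is_synced and the poll interval are hypothetical names
chosen for illustration, and the real option would live in ntpd's C code.

```python
import time


def wait_sync(is_synced, timeout, poll=1.0,
              clock=time.monotonic, sleep=time.sleep):
    """Wait for first sync; return 0 on success, 1 if the timeout expires.

    is_synced is a caller-supplied callable reporting whether the daemon
    has synchronized. clock and sleep are injectable so the loop can be
    exercised without real delays.
    """
    deadline = clock() + timeout
    while True:
        if is_synced():
            return 0  # first sync seen: exit status success
        if clock() >= deadline:
            return 1  # timed out: nonzero exit status
        # Sleep for one poll interval, but never past the deadline.
        sleep(min(poll, deadline - clock()))
```

A caller would pass the result to sys.exit(), so that a startup script can
decide whether to launch the database based on the exit status.
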
>>
>> Your thoughts?
>>
>> Dave Hart
>>  
>>
> 
> 
> 
> 


