[ntp:hackers] [Bug 1378] Unnecessary resetting of peers during interface update

David Mills mills at udel.edu
Mon Nov 30 06:32:38 UTC 2009


Frank,

This is getting far too long. Three test hosts, mort, macabre and 
howland, are used to test Autokey and friends. Mort and macabre are 
configured in symmetric mode with each other and also as broadcast 
servers. Howland is configured as a broadcast client. Restarting any of 
these machines rolls a new private value and the associations in the 
other machines return a crypto-NAK that restarts the association. I have 
verified that any combination of failures/restarts results in glorious 
joy. However, for reasons of household power and cooling expense, I 
normally shut those machines down unless somebody asks.

Dave

Frank Kardel wrote:

> Danny Mayer wrote:
>
>> Frank Kardel wrote:
>>   ...
>> I remember the discussion with Dave. However, if neither remote nor
>> local address has changed you don't need to reset the association.
>>   
>
> Actually we should (possibly must). If we determine that the peer in 
> question now (at re-scan time) would get a different local address we 
> need to reset all cryptographic information and restart the dance (-> 
> peer_clear() will do that for us).
> Why?
>    1) If ntpd had been (re-)started at this time it would have chosen 
> the new local address (even if the interface list is the same). This 
> enforces the invariant that a longer running ntpd and a freshly 
> started one only disagree for a short time about local address 
> selection. It also implements the principle of least 
> astonishment with respect to local address selection.
>    2) we cannot infer why routing decides to take a different route to 
> our peer/server in the association
>    3) we should take the best available information. It could even be 
> that the other local address is not reachable any more.
>    4) As we already discussed, we only need to reset associations that 
> use cryptographic material, but we must reset these because of 1) - 3), 
> and we cannot just change the local address as it is part of the 
> cryptographic material.
>
>> ...
>>
>>> Would it be possible to get a log showing the actions at -D5 ?
>>>     
>>
>>
>> Not right now unless I still have the binary and I think that there was
>> an incompatible change that prevented it from working with mort which is
>> the broadcasting server.
>>   
>
> Too bad. But howland isn't representative at all if I understood Dave 
> correctly. So I'd like to dismiss any observations on howland unless 
> Dave says howland is authoritative again.
>
>> ...
>> Why would we do that for only one type of association?
>>
>>   
>
> We only need to do that for associations with cryptographic 
> information - all others can easily cope with the switch of the local 
> address and keep their statistics which is what you were driving at 
> when trying to avoid 'unnecessary resetting'.
>
>> ...
>>
>>>> (ie via a different address)
>>>>       
>>>
>>> How would you do that? ntpd should stick to the mechanism used at
>>> configuration time and repeat that when necessary. I'd rather not add
>>> another network detection mechanism there that would differ in behavior
>>> from a plain new address configuration at the time of address change
>>> detection.
>>>     
>>
>>
>> Agreed, so if nothing changed why reset the association in case the
>> routing might have changed?
>>
>>   
>
> See the three points above: 'Consistency', 'possible 
> unreachability/better path', 'best currently available information'.
>
>> ...
>>
>>> I would imagine that is not trivial to do: another stage of deferring
>>> local address changes until a peer is deemed unreachable. That might
>>> take some time for the reach register to clear. I could imagine that
>>> people might ask why it takes so long to follow a ppp dynamic address. I
>>> don't see any advantage in delaying the change, especially if we don't
>>> reset non-cryptographic associations.
>>>     
>>
>>
>> Agreed but you only need to do that if the address changed.
>>
>>   
>
> We only need to do that when the 'local address selection' for an 
> association has changed. That is slightly different from the address 
> list determined by an interface scan. It can very well happen that an 
> association is formed via one or another local address. We need to 
> cope with that and keep that in line with the algorithm that picks the 
> local address at configuration time.
>
>>> ...
>>> Could you give any precise hints on what is not handled correctly
>>> (except for too aggressive peer_clearing()), preferably with logfile
>>> evidence? I'd rather work from facts than from fiction.
>>>
>>>     
>>
>>
>> Me too, but the logs I got aren't useful because the timestamping of the
>> debug output when sent to syslog got dropped when in debug mode. That
>> was an annoying and unexpected change.
>>
>>   
>
> So it is not possible to see what address changes were experienced ? 
> The time stamps wouldn't matter too much if we still have the event 
> sequence.
>
>>> The existence of many examples/cases does not mean that code needs 
>>> to be
>>> overly complicated.
>>>
>>> There are not so many exceptions to consider. peer_clearing() can be
>>> reduced to cryptographic associations. The other exceptions are cast
>>> reception and initial volley (calibration and autokey). The effects of
>>> these are located in findpeer(), set_peerdstadr() and
>>> peer_refresh_interface().
>>>     
>>
>>
>> No, you really need to consider all types, especially as a dynamically
>> allocated client may have moved sites and is now no longer getting
>> broadcast packets from server A and is now getting them from server B
>> instead, and server A's broadcast packets are no longer being received on
>> the new LAN or wireless segment.
>>   
>
> Well, the ntpd algorithms already take care of that by dropping server 
> A as it is not sending any more and picking up server B by dynamically 
> configuring it. I have already seen it work nicely with multicast 
> servers.
>
> It may very well be that we fail to do an 
> io_unsetbclient()/io_setbclient() after a true interface change. We 
> seem to run the io_setbclient() code only at configuration time. This 
> seems incomplete to me when we discover a new broadcast capable 
> network. Thus we should do:
> if (sys_bclient == 0)
>  {
>     io_unsetbclient();
>  }
> else
>  {
>     io_setbclient();
>  }
> right before the UNBLOCKIO() call in ntp_io.c:interface_update() to 
> pick up new broadcast addresses and enable them for broadcast reception.
> This seems to be a leftover/miss from pre-dynamic interfaces time and 
> is very likely the reason that you see new broadcast servers on new 
> networks not being picked up.
> There seems to be a bit more cruft even. 
> ntp_io.c:get_broadcastclient_flag() is not called anywhere. Its 
> underlying variable is not managed sensibly with the
> state of the interfaces. In short, we need to clean up the broadcast 
> client enabling/disabling code to make it work correctly.
>
> Issues I see there:
>    1) not invoked on newly detected networks (will miss new broadcasters)
>    2) stale code (get_broadcastclient_flag())
>    3) possibly incomplete shutdown of broadcast sockets in 
> io_unsetbclient() (does not match logic in io_setbclient()) - needs 
> verification with respect to broadcast server mode interaction
>
> => Say 'hello' to bug 1402
>
> Frank
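
The reset rule argued for in points 1) - 4) of the quoted message can be sketched roughly as follows. This is only an illustration under stated assumptions, not ntpd code: struct assoc, rescan_decide() and rescan_apply() are hypothetical names, the address is kept as a plain string, and clearing the reach field merely stands in for what peer_clear() would do in ntpd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified association record; NOT ntpd's struct peer. */
struct assoc {
    char     local_addr[46]; /* local address currently selected           */
    bool     uses_crypto;    /* local address is part of the crypto state  */
    unsigned reach;          /* stand-in for statistics worth preserving   */
};

enum rescan_action { RESCAN_KEEP, RESCAN_REBIND, RESCAN_RESET };

/*
 * Decide what an interface re-scan should do to one association.
 * new_local is the address the configuration-time selection would pick now.
 */
enum rescan_action rescan_decide(const struct assoc *a, const char *new_local)
{
    assert(a != NULL && new_local != NULL);

    if (strcmp(a->local_addr, new_local) == 0)
        return RESCAN_KEEP;      /* selection unchanged: leave it alone    */

    /* Selection changed: only cryptographic associations need a restart. */
    return a->uses_crypto ? RESCAN_RESET : RESCAN_REBIND;
}

/* Apply the decision; returns the action that was taken. */
enum rescan_action rescan_apply(struct assoc *a, const char *new_local)
{
    enum rescan_action act = rescan_decide(a, new_local);

    if (act == RESCAN_KEEP)
        return act;

    /* Both REBIND and RESET switch to the newly selected local address. */
    snprintf(a->local_addr, sizeof a->local_addr, "%s", new_local);

    if (act == RESCAN_RESET)
        a->reach = 0;            /* assumption: stands in for peer_clear() */

    return act;
}
```

Under these assumptions, calling rescan_apply() on a non-cryptographic association whose selected address changed only swaps the address and keeps its statistics, while an association that uses cryptographic material is restarted. An association whose selection did not change is left untouched even if the interface list itself differs.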



