[ntp:hackers] [Bug 1378] Unnecessary resetting of peers during interface update

Danny Mayer mayer at ntp.org
Sun Nov 15 20:42:09 UTC 2009


Dave,

I agree with most of what you said. The concern has been almost
exclusively with static servers that use broadcast/multicast client
mode and Autokey. There, clearing the association when no local address
has changed forces an authentication reset and discards all history,
regardless of the information already collected about the remote
server. Part of the problem with howland was that it was associating
the multicast address with the wildcard address, which is wrong since
you cannot use the wildcard address for authentication. The
system would get jerked around every 5 minutes and never be able to
synchronize. While we have corrected the wildcard address issue, the
problem of changing multicast associations even when no local address
has changed remained. I changed the code to do nothing if there were no
local address changes. Frank objected to that because you might be able
to take advantage of a new route if the routing changed. My biggest
objection is that by resetting the association you lose all of the work
done to amortize the jitter, delay, etc. of the association, and I don't
consider different routing important enough to worry about unless the
remote server becomes unreachable. One way of addressing this is to
check whether a different route is available (i.e. via a different
address) and reset the association only if the local address to be used
has changed and the remote server is not reachable. For broadcast and
multicast addresses the other part of this
is whether the interface to be used could have changed. I think all of
this is much more complicated than the logic that Frank currently has in
place, since it is also not clear that the multicast interface socket
needs to be changed in the face of a local address change. From the look
of the code, it does not try to change the multicast interfaces if the
local addresses changed.
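
The rule suggested above can be sketched roughly as follows. This is
illustrative Python, not ntpd code: the names Association, reach and
should_reset are hypothetical, and the sketch assumes an 8-bit
reachability shift register in which zero means no recent replies.

```python
from dataclasses import dataclass


@dataclass
class Association:
    """Hypothetical stand-in for a peer association (not ntpd's struct peer)."""
    remote: str   # address of the remote server
    local: str    # local address the association is currently bound to
    reach: int    # assumed 8-bit reachability register; 0 = unreachable


def should_reset(assoc: Association, routed_local: str) -> bool:
    """Decide whether an interface update should clear the association.

    routed_local is the local address the routing table would pick now.
    Reset only when that address differs from the bound one AND the
    remote server is currently unreachable; otherwise keep the history.
    """
    local_changed = routed_local != assoc.local
    unreachable = (assoc.reach & 0xFF) == 0
    return local_changed and unreachable
```

Under this rule a routing change alone never discards the accumulated
filter history; the association is cleared only when the server has
also stopped answering.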

I hope this is clearer.

Danny

David Mills wrote:
> Danny,
> 
> I found this discussion really hard to follow, so I will confine my
> comments only to the reset association issue. The local and remote
> addresses are part of the security parameters. If they change, the
> association must be reconstructed; therefore, if Autokey is in use when
> the local address changes, the associations must be remobilized. This
> requires considerable resources and should be avoided unless strictly
> necessary. In principle, it is not necessary to clear the association
> if security parameters are not instantiated. Clearing the association
> throws away all accumulated statistics and dynamic state. The clock
> filter, select, cluster, combine and mitigation algorithms all take a
> hit. The poll interval gets reset and a hole of at least four packets is
> created in the data stream.
> 
> There might have been a temporary occasion when the associations were
> cleared every five minutes, but that is not the case now.
> 
> Dave
> 
> Danny Mayer wrote:
> 
>> I am redirecting the discussion to hackers and copying Dave on this.
>> While you claim that nothing happens if nothing has changed, I beg to
>> differ especially as I see this in the wild. I fail to see why it is
>> essential to reset peers just because routing has changed as long as the
>> client is still receiving the packets. The underlying code takes care of
>> dealing with the delays and jitter caused by the existing routing,
>> but yanking the association causes a lot of problems especially when
>> dealing with authenticating peers.
>>
>> What you appear to be saying here is that the routing change is more
>> important than keeping the association which I think is fundamentally
>> wrong. If you had lost the connection that would be something different
>> but that has not been the case here in the two situations that I have
>> looked at. The routing changes that you write about seem to be only
>> about sending packets and not about receiving packets. Since broadcast
>> is only about receiving packets (outside of the autokey dance) this
>> should not normally apply to broadcast associations.
>>
>> Note that the changes I have made cause the refresh to happen if a local
>> address has been added or removed but that will not happen on statically
>> allocated addresses.
>>
>> Dave, can you weigh in here on your opinion on this?
>>
>> Danny
>>
>> Frank Kardel via the NTP Bugzilla wrote:
>>
>>> http://bugs.ntp.org/1378
>>>
>>>
>>>
>>> ----------------------------------------------------------------------------
>>>
>>> Additional Comments From kardel at ntp.org (Frank Kardel)
>>> Submitted on 2009-11-14 08:19
>>>
>>> (In reply to comment #5)
>>>
>>>> The reason for the change is that when you reset the peer, you lose
>>>> all of the
>>>> history and on top of that the authentication needs to be restarted
>>>> from the
>>>> beginning.
>>>>
>>> Danny, I know that - I wrote the code. Resetting the peer is necessary
>>> if you have
>>> crypto keys associated with the peer as the local address changed and
>>> the
>>> address is part of the crypto protocol.
>>>
>>>
>>>> By default this is happening every 5 minutes.
>>>>
>>> No - if nothing changes nothing will happen - even if the update
>>> routine is
>>> called repeatedly. If you are observing changes please send a log
>>> file - The
>>> code takes care not to update when no change happens - see
>>> ntp_peer.c:peer_refresh_interface().
>>>
>>>
>>>> Both Dave Mills and
>>>> Steve have seen this, Dave on howland (hence the name of this
>>>> repository).
>>>>
>>> Please provide a log; there must be some other reason for this to happen.
>>>
>>>
>>>> The
>>>> Reach never reaches maximum because it keeps getting reset by the
>>>> code. This
>>>> gets really bad with authenticated peers. Just because the routing
>>>> changed
>>>> (maybe) there is no reason to reset the connection.
>>>>
>>> When the local address changes you must at least reset the crypto
>>> info - I had
>>> that in the code but Dave recommended to reset the entire peer. Now I
>>> would like
>>> to see the actual reason why the code thinks the local address
>>> has changed.
>>> Note: M/BCLNTs shouldn't be affected as they are ignored in the update.
>>>
>>>
>>>> Are you expecting it to now
>>>> use a different interface because of that? Don't forget that with
>>>> broadcast and
>>>> multicast we open separate sockets to handle the incoming packets.
>>>>
>>> Danny, have you ever looked into ntp_peer.c:peer_refresh_interface()/
>>> set_peerdstadr()?
>>>
>>>
>>>> The interface list has not changed and if the routing table has
>>>> changed why do
>>>> you believe that the peer needs to use a different interface? It's
>>>> not as if the
>>>> socket specifically set up for the task is no longer receiving the
>>>> packets.
>>>>
>>> It is not about reception, it is about sending. On a multihomed system
>>> a peer is
>>> configured with the address of the peer - the code determines at
>>> configuration
>>> time the respective local address to use; this decision is based on
>>> the current
>>> routing tables. It can happen that the routing tables change and the
>>> outcome of
>>> this decision will be different (choosing a different local address).
>>> In case of
>>> such a change we will update the local address in order to send from
>>> the new
>>> local address.
>>> This holds up the *invariant* that a long running daemon uses the
>>> same network
>>> configuration as a freshly started daemon - that is the whole idea of
>>> the update
>>> code.
>>> Changing the local address when routing sees fit is necessary as
>>> the network
>>> path to the 'old' local address may have become unusable.
>>>
>>>
>>>> Additionally I was seeing on Steve's server the same peer show up
>>>> up to 4-5 times
>>>> in the ntpq -p list.
>>>>
>>> This is interesting and important - I think this is the hint we need
>>> to solve
>>> your problem.
>>>
>>>
>>>> It took a while for it to reduce the list of identical
>>>> peers. So from the look of it, the peers are probably also not
>>>> getting removed
>>>> properly before being reassigned to a new interface.
>>>>
>>> Danny - it does not work that way. Peers are not added or removed
>>> because of
>>> interface updates. The local address update code will only exchange
>>> the peer's
>>> binding to the local address (and thus the socket used for sending).
>>>
>>> Important part:
>>> What you are probably seeing are the associations formed by a
>>> (m/bcast-)server
>>> that is using changing source addresses. I have seen and tested this
>>> by having a
>>> WLAN connected mcast server. This server uses as local address for
>>> the mcast
>>> packets the OS preferred interface (it can use only one address so it
>>> must pick
>>> one - it will in many implementations be the address of the interface
>>> where the
>>> mcast packets are sent out initially before being replicated by mcast
>>> routing).
>>> Every time the sender changes the local address new associations will
>>> be formed
>>> in the clients - this works correctly including authentication.
>>>
>>> Example: Any time my WLAN address (default route) changes on the
>>> sender the
>>> peers follow the new association - btw. the old association was
>>> unusable at the
>>> time anyhow - it will slowly time out. As my WLAN address changes
>>> once a day so
>>> will my peers form a new association once a day. So whenever the
>>> sender changes
>>> its address for the BCAST/MCAST packets sent the receivers/client
>>> will form new
>>> associations - this is just the normal protocol. What is unusual here
>>> is that
>>> the sender seems to change its local addresses often.
>>>
>>> Now it would be interesting to find out why the sender seemingly
>>> picks new
>>> addresses - maybe the OS picks random interfaces for MCAST
>>> destinations - we
>>> would have to look into that then.
>>>
>>> I have the suspicion that it is the sender that causes trouble in this
>>> environment. It may be that something lets the sender change the
>>> (observed)
>>> local address for M/BCAST transmission very often. What are the
>>> characteristics
>>> of the M/BCAST server (OS, ntpd version)?
>>>
>>>
>>>> The current fix is to deal with the initial issue.
>>>>
>>> It will break the interface update code as it breaks its invariant.
>>> This is not a fix. It seems to be based on an incomplete analysis and
>>> probably
>>> only attempts to cure symptoms.
>>>
>>>
>>>> There are other things that
>>>> need to be reviewed but I'm happy with the current fix to prevent
>>>> the kind of
>>>> churn that we have been seeing.
>>>> It certainly cannot remain the way it was.
>>>>
>>> It has been working in many environments correctly - that 'fix' will
>>> break more
>>> than it will do good.
>>>
>>> I still have not seen any logs that allow me to verify your reasoning
>>> for the fix.
>>>
>>>
>>>> I
>>>> don't understand your objection so you really need to lay out what
>>>> will happen
>>>> if you don't reset them as opposed to what happens when you do.
>>>>
>>> Again: for M/BCAST NOTHING is reset ! - you know where to look (try
>>> ntp_peer.c:peer_refresh_interface()/set_peerdstadr()).
>>> Important/Conclusion:
>>> If you are seeing many associations formed for a bcast/mcast address
>>> you must
>>> not look at the receiver - you must analyze the sender.
>>>
>>> Please provide ideally logs from sender and receiver where this
>>> problem appears.
>>>
>>> Frank
>>>
>>>
>>
>>
> 
> 
> 



