[ntp:hackers] [Bug 1378] Unnecessary resetting of peers during interface update

Danny Mayer mayer at ntp.org
Sat Nov 14 17:42:56 UTC 2009


I am redirecting the discussion to hackers and copying Dave on this.
While you claim that nothing happens if nothing has changed, I beg to
differ especially as I see this in the wild. I fail to see why it is
essential to reset peers just because routing has changed as long as the
client is still receiving the packets. The underlying code takes care of
dealing with the delays and jitter caused by the existing routing,
but yanking the association causes a lot of problems especially when
dealing with authenticating peers.

What you appear to be saying here is that the routing change is more
important than keeping the association which I think is fundamentally
wrong. If you had lost the connection that would be something different
but that has not been the case here in the two situations that I have
looked at. The routing changes that you write about seem to be only
about sending packets and not about receiving packets. Since broadcast
is only about receiving packets (outside of the autokey dance) this
should not normally apply to broadcast associations.

Note that the changes I have made cause the refresh to happen if a local
address has been added or removed but that will not happen on statically
allocated addresses.
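
As a rough illustration of that rule (refresh only when the set of local
addresses has actually changed), here is a minimal sketch. It is not the
ntpd code; the function names and the use of plain address strings are
assumptions made for the example.

```python
def address_changes(previous, current):
    """Return (added, removed) local addresses between two interface scans."""
    prev, cur = set(previous), set(current)
    return sorted(cur - prev), sorted(prev - cur)


def needs_refresh(previous, current):
    """True only if a local address was added or removed since the last scan."""
    added, removed = address_changes(previous, current)
    return bool(added or removed)
```

With statically allocated addresses the two scans are identical, so
needs_refresh() stays False no matter how often the scan runs.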

Dave, can you weigh in with your opinion on this?

Danny

Frank Kardel via the NTP Bugzilla wrote:
> http://bugs.ntp.org/1378
> 
> 
> 
> ----------------------------------------------------------------------------
> Additional Comments From kardel at ntp.org (Frank Kardel)
> Submitted on 2009-11-14 08:19
> 
> (In reply to comment #5)
>> The reason for the change is that when you reset the peer, you lose all of the
>> history and on top of that the authentication needs to be restarted from the
>> beginning.
> Danny, I know that - I wrote the code. Resetting the peer is necessary if you have
> crypto keys associated with the peer as the local address changed and the
> address is part of the crypto protocol.
> 
>> By default this is happening every 5 minutes.
> No - if nothing changes nothing will happen - even if the update routine is
> called repeatedly. If you are observing changes please send a log file - The
> code takes care not to update when no change happens - see
> ntp_peer.c:peer_refresh_interface().
> 
>> Both Dave Mills and
>> Steve have seen this, Dave on howland (hence the name of this repository).
> Please provide a log - there must be some other reason for this to happen.
> 
>> The
>> Reach never reaches maximum because it keeps getting reset by the code. This
>> gets really bad with authenticated peers. Just because the routing changed
>> (maybe) there is no reason to reset the connection.
> 
> When the local address changes you must at least reset the crypto info - I had
> that in the code but Dave recommended to reset the entire peer. Now I would like
> to see the actual reason why the code thinks the local address has changed.
> Note: M/BCLNTs shouldn't be affected as they are ignored in the update.
>  
>> Are you expecting it to now
>> use a different interface because of that? Don't forget that with broadcast and
>> multicast we open separate sockets to handle the incoming packets.
> Danny, have you ever looked into ntp_peer.c:peer_refresh_interface()/
> set_peerdstadr()?
> 
>> The interface list has not changed and if the routing table has changed why do
>> you believe that the peer needs to use a different interface? It's not as if the
>> socket specifically set up for the task is no longer receiving the packets.
> It is not about reception, it is about sending. On a multihomed system a peer is
> configured with the address of the peer. The code determines at configuration
> time the respective local address to use; this decision is based on the current
> routing tables. It can happen that the routing tables change and the outcome of
> this decision will be different (choosing a different local address). In case of
> such a change we will update the local address in order to send from the new
> local address.
> This upholds the *invariant* that a long running daemon uses the same network
> configuration as a freshly started daemon - that is the whole idea of the update
> code.
> Changing the local address when routing sees it fit is necessary as the network
> path to the 'old' local address may have become unusable.
> 
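The source-address selection described in the quoted paragraph can be
sketched as follows. This is only an illustration, not the code in
ntp_peer.c: it assumes IPv4, relies on the common technique of connecting a
UDP socket (which sends no packet) and reading back the address the OS chose,
and the names local_address_for() and refresh_binding() are hypothetical.

```python
import socket


def local_address_for(peer_addr, port=123):
    """Ask the OS which local IPv4 address it would use to send to peer_addr."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # connect() on a UDP socket sends nothing; it only makes the kernel
        # consult its routing table and fix the socket's local address.
        s.connect((peer_addr, port))
        return s.getsockname()[0]


def refresh_binding(bound_addr, peer_addr):
    """Return (chosen_addr, changed) for a peer currently bound to bound_addr."""
    chosen = local_address_for(peer_addr)
    return chosen, chosen != bound_addr
```

If the routing tables have not changed, refresh_binding() reports
changed == False and nothing needs to be touched; only a different outcome
would call for rebinding the peer.
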
>> Additionally I was seeing on Steve's server the same peer show up up to 4-5 times
>> in the ntpq -p list.
> This is interesting and important - I think this is the hint we need to solve
> your problem.
> 
>> It took a while for it to reduce the list of identical
>> peers. So from the look of it, the peers are probably also not getting removed
>> properly before being reassigned to a new interface.
> Danny - it does not work that way. Peers are not added or removed because of
> interface updates. The local address update code will only exchange the peer's
> binding to the local address (and thus the socket used for sending).
> 
> Important part:
> What you are probably seeing are the associations formed by a (m/bcast-)server
> that is using changing source addresses. I have seen and tested this by having a
> WLAN connected mcast server. This server uses as local address for the mcast
> packets the OS preferred interface (it can use only one address so it must pick
> one - it will in many implementations be the address of the interface where the
> mcast packets are sent out initially before being replicated by mcast routing).
> Every time the sender changes the local address new associations will be formed
> in the clients - this works correctly including authentication.
>  
> Example: Any time my WLAN address (default route) changes on the sender, the
> peers follow the new association - btw, the old association was unusable at the
> time anyhow - it will slowly time out. As my WLAN address changes once a day, so
> will my peers form a new association once a day. So whenever the sender changes
> its address for the BCAST/MCAST packets sent, the receivers/clients will form new
> associations - this is just the normal protocol. What is unusual here is that
> the sender seems to change its local addresses often.
> 
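A toy model of the receiver behaviour described in the quoted example: each
previously unseen source address gets its own association, and stale ones
linger until they time out. The dictionary layout, the function names and the
timeout value are assumptions for illustration and do not reflect ntpd's
actual data structures.

```python
def receive_broadcast(associations, source_addr, now):
    """Record a broadcast packet, creating an association for an unseen source."""
    assoc = associations.get(source_addr)
    if assoc is None:
        assoc = {"first_seen": now, "last_seen": now, "packets": 0}
        associations[source_addr] = assoc
    assoc["last_seen"] = now
    assoc["packets"] += 1
    return assoc


def expire(associations, now, timeout):
    """Drop associations that have been silent for longer than timeout."""
    stale = [a for a, s in associations.items() if now - s["last_seen"] > timeout]
    for addr in stale:
        del associations[addr]
```

A sender that changes its source address between packets leaves several
entries in the table until the old ones expire, which would look like the
same server appearing more than once in a peer listing.
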
> Now it would be interesting to find out why the sender seemingly picks new
> addresses - maybe the OS picks random interfaces for MCAST destinations - we
> would have to look into that then.
> 
> I have the suspicion that it is the sender that causes trouble in this
> environment. It may be that something lets the sender change the (observed)
> local address for M/BCAST transmission very often. What are the characteristics
> of the M/BCAST server (OS, ntpd version)?
> 
>> The current fix is to deal with the initial issue.
> It will break the interface update code as it breaks its invariant.
> This is not a fix. It seems to be based on an incomplete analysis and probably
> only attempts to cure symptoms.
> 
>> There are other things that
>> need to be reviewed but I'm happy with the current fix to prevent the kind of
>> churn that we have been seeing.
>> It certainly cannot remain the way it was.
> It has been working in many environments correctly - that 'fix' will break more
> than it will do good.
> 
> I still have not seen any logs that allow me to verify your reasoning for the fix.
> 
>> I
>> don't understand your objection so you really need to lay out what will happen
>> if you don't reset them as opposed to what happens when you do.
> Again: for M/BCAST NOTHING is reset! - you know where to look (try
> ntp_peer.c:peer_refresh_interface()/set_peerdstadr()). 
> 
> Important/Conclusion:
> If you are seeing many associations formed for a bcast/mcast address you must
> not look at the receiver - you must analyze the sender.
> 
> Please provide ideally logs from sender and receiver where this problem appears.
> 
> Frank
> 



