[Babel-users] Bucket full, dropping packet

Sat Dec 12 03:43:58 UTC 2015

Hi!

On Fri, Dec 11, 2015 at 6:02 PM, Juliusz Chroboczek
<jch at pps.univ-paris-diderot.fr> wrote:
> The amount of state that a Babel node maintains is proportional to v*r,
> where v is the number of neighbours and r the number of routes.  Your
> network is somewhat unusual in that it has some very central nodes -- 75
> neighbours max, I believe --, which is something that Babel doesn't like
> very much.

Oh, sadly not 75 max. The number of neighbours of one node is
potentially very large, because all nodes with an Internet uplink
connect to a VPN server. So all those nodes are then neighbours of
that VPN server node. Currently this is, for example, 140 nodes on one
VPN server. The good thing is that the VPN servers have more memory,
so probably the amount of state is not a problem? Or should it be a
problem? How much memory would that take, approximately?

Are there other constants we would have to adapt for such a large
state? Can't babeld grow those constants automatically?

We had issues with OLSRv1 as well because of this topology. But in
that case it was because of MTU problems: the packets announcing
peers were large and ran into fragmentation problems.

> The protocol should be able to deal with that (75 * 500 is
> less than 40000), but the implementation will likely need some tuning.
> I'm hoping that you can help me do the tuning.

So the numbers are currently 140 peers per VPN node, and they will go
up in the future as the network grows. How can we tune it to this size
for now? Can we make the tuning automatic?
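
To put rough numbers on this, here is a small back-of-the-envelope
sketch using the v * r rule quoted above. The 500 routes come from the
75 * 500 example in the quote; the bytes-per-entry value is purely my
assumption (a placeholder, not measured from babeld), and the helper
names are made up for illustration. With 140 neighbours the product is
140 * 500 = 70000, which is already above the 40000 figure mentioned
in the quote, whatever limit that refers to.

```python
# Rough estimate of state on a hub node, using the v * r rule of thumb
# (v = number of neighbours, r = number of routes).

def route_entries(neighbours: int, routes: int) -> int:
    """Upper bound on per-neighbour route entries: v * r."""
    return neighbours * routes


def estimated_bytes(neighbours: int, routes: int, bytes_per_entry: int) -> int:
    """Memory estimate for a given (assumed) size of one entry."""
    return route_entries(neighbours, routes) * bytes_per_entry


ROUTES = 500            # from the quoted "75 * 500" example
BYTES_PER_ENTRY = 100   # ASSUMPTION: placeholder, not a babeld measurement

# 300 is a hypothetical future size, just to see how the estimate grows.
for v in (75, 140, 300):
    entries = route_entries(v, ROUTES)
    mib = estimated_bytes(v, ROUTES, BYTES_PER_ENTRY) / 2**20
    print(f"{v:4d} neighbours: {entries:7d} entries, ~{mib:.1f} MiB")
```

If the real per-entry size is known, replacing BYTES_PER_ENTRY with it
would answer the memory question directly; the growth in v is linear
either way.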

> You are the largest Babel network right now.  I'm very excited about your
> deployment, and I'm looking forward to tuning the babeld implementation to
> work well enough for your needs.

Aaaa. I didn't want our network to be that one. It is always hard to
be first. Ah, well.

> Which is a reasonable thing to do in order to solve your short-term
> issues.  I hope that you'll remain open to working with me to get babeld
> to scale to your needs -- I assure you that it can be done, but I need
> profiling data in order to do that.

Ah, don't worry, just ignore my childish rants. :-)

But I am thinking that we do need some faster way to debug these
issues than this ping-pong over the mailing list, with what we should
run and then report back and so on. What about you getting one cheap
TP-Link, going to https://nodes.wlan-si.net/, registering a node and
deploying it? It would be great to have a node in Paris. :-) But it
would also allow you to log directly into the network and maybe see
things from there much quicker? This would not solve everything (if
you have ideas for general things we should be monitoring in and
across the network, we can also try adding them to nodewatcher), but
you might spot some issues quickly.

Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m