[Babel-users] Ability to work with massive number of routes? (global full-table)

Christof Schulze christof.schulze at gmx.net
Tue Sep 25 08:16:49 BST 2018


>> Hi Babel community,

>> As I mentioned in another thread, I am curious about whether Babeld
>> can be adapted to work with global full-table.
>
>No.
>
>There is one long standing issue with merging from the kernel table
>that would benefit from a qsort.
>
>But you are going to
>
>A) run out of bandwidth - 785k routes  = ~14,000 babel packets. I
>think that rounds to 1280 bytes/packet,
>    so - and babel will want to announce these every 4 seconds - so
>call it 44mbits/sec? (feel free to check my math, it's friday). That's
>well above what I've ever seen wifi mcast in particular, achieve. And
>that's *per router*.
>
>To get there, the announcement interval would have to be increased up
>to at least a typical bgp interval (2 minutes) and even then...
>
>B) you run out of cpu - babeld uses linked lists, and tries to recalc
>bellman-ford every 4 seconds also. There's a need for a faster, safer
>kernel interface.
I do not see a reason why we could not change the data structures to 
consume less CPU under the given scenario.
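
On (A), since the math was offered up for checking: a rough
back-of-the-envelope sketch in Python. The 56 routes per packet is my
assumption, derived from the quoted 785k routes / ~14,000 packets. The
function name is made up for illustration, and IP/UDP header overhead
and retransmissions are ignored.

```python
import math

def announce_bandwidth_bps(routes, routes_per_packet, packet_bytes, interval_s):
    """Bits per second needed to announce every route once per interval."""
    packets = math.ceil(routes / routes_per_packet)
    return packets * packet_bytes * 8 / interval_s

full_table = 785_000   # routes, from the original post
per_packet = 56        # assumption: 785k routes / ~14,000 packets
packet_size = 1280     # bytes per packet, as quoted above

# 4 s is the update interval quoted above; 120 s is the "typical bgp interval".
for interval in (4, 120):
    bps = announce_bandwidth_bps(full_table, per_packet, packet_size, interval)
    print(f"{interval:>3} s interval: {bps / 1e6:.1f} Mbit/s")
```

With these assumptions I get roughly 36 Mbit/s at a 4 second interval
rather than 44, and a bit over 1 Mbit/s at 2 minutes. Either way the
conclusion stands: far more than wifi multicast will carry.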

>my rtod tests showed babeld typically falling over for any one of
>these four reasons in well under 4k routes on low end mips and arm
>hardware. Even the low end apu2 eats a whole cpu with about that many
>(ipv6) routes.
What are the other two reasons?

>I made a few sloppy computational improvements and so on while
>developing the rtod test. Tried to upstream a few, but my then-current
>employer wasn't happy with me working under anything but the apache
>license and didn't care, and I ran out of time and energy and have to
>admit I was hacking far more than programming -
>
>I think making some version of babel (be it bird or frr ) scale well
>to at least 64k routes would be a very good idea, 
I agree.

>and once things now entering it like unicast, and crypto, are stable,
>it would be a GREAT thing to have a version that did that, but I fear
>it will involve parallelizing hellos and Bellman-Ford and per interface
>threads, changes to the protocol to adapt the interval to the bandwidth
>and cpu available, tcp friendly rate control (or swapping routes via
>tcp), etc, etc.
I agree: as routers become more powerful, and even low-end devices with
multiple CPU cores are emerging, there might be a benefit. On another
note, just parallelizing any algorithm (not specific to babeld) will
only get you so far. The algorithm and data structures should be
optimized first.
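
To illustrate what I mean by optimizing first, here is a minimal sketch
of the kernel-table merge mentioned above. It is Python, not babeld's
actual C code; the (prefix, metric) route shape and both function names
are hypothetical, and it assumes each prefix appears at most once per
input. The first variant scans the table for every kernel route, the way
a plain list lookup would. The second sorts both sides once and merges
in a single pass.

```python
from typing import List, Tuple

Route = Tuple[str, int]  # hypothetical shape: (prefix, metric)

def merge_naive(table: List[Route], kernel: List[Route]) -> List[Route]:
    """O(n*m): walk the whole table for every kernel route."""
    merged = list(table)
    for prefix, metric in kernel:
        for i, (known, _) in enumerate(merged):
            if known == prefix:
                merged[i] = (prefix, metric)  # kernel entry wins
                break
        else:
            merged.append((prefix, metric))
    return sorted(merged)

def merge_sorted(table: List[Route], kernel: List[Route]) -> List[Route]:
    """O((n+m) log(n+m)): sort once, then a single linear merge pass."""
    a, b = sorted(table), sorted(kernel)
    out: List[Route] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i][0] < b[j][0]:
            out.append(a[i])
            i += 1
        elif a[i][0] > b[j][0]:
            out.append(b[j])
            j += 1
        else:
            out.append(b[j])  # same prefix: kernel entry wins
            i += 1
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out

table = [("10.0.0.0/8", 1), ("192.168.0.0/16", 2)]
kernel = [("172.16.0.0/12", 5), ("10.0.0.0/8", 3)]
print(merge_sorted(table, kernel))
```

Both give the same result. Only the second stays cheap as the tables
grow, and no extra cores are needed for that.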

>
>and a whole suite of other cool things that nobody has time, energy,
>or sufficient programmers for. And it wouldn't be babeld anymore.
>
>Bird's version of babel should perform mildly better, as it has
>tighter code (xor rather than memcmp in one case I tried to upstream),
>and a few other better algorithms overall, but I suspect few besides
>me and john ( http://the-edge.taht.net/post/gilmores_list/ ) care
>enough about city-scale routing to get anywhere.
You are forgetting the Freifunk communities in Germany. This is exactly
what they do: build city-wide wifi mesh networks, currently mostly with
segmented batman. Now that my patchset for babeld integration has been
merged into gluon (the framework most communities use to build their
networks), babeld is available to a wider audience for their meshes. To
get an impression of the size of the community,
https://www.freifunk-karte.de/ might be an interesting start.
Agreed, the development community is much smaller. Still, I can see a
dozen or so people contributing to gluon. It certainly could be worse.

>
>I should probably try to extract more patches from my misguided
>efforts, like this:
>
>https://github.com/dtaht/rabeld/commit/b74b4a6f9b532717ee93346963efd894e94615b3
>
>and I had a bpf filter that helped a lot, and I sunk time into
>enabling sse and neon ins...
>
>but I was mostly hoping the unicast/crypto/etc stuff would land in one
>piece I could do all up testing on before tackling the scaling
>problems, on someone else's time. I ended up deciding that I wanted to
>rewrite it all from scratch, hit licensing and employer problems...
>and time.....
I would appreciate that. We are just starting another test network for a 
city-wide mesh which will be based on babeld. Links that do not have 
wifi connectivity will be using wireguard as vpn. There is a significant 
speed improvement of that tech stack over batman+fastd. Let's see how 
much pull this gets. In any case: 64K routes should be working on 
current cheap routers in a network like that.

>> One of my environments uses BGP full-table from 3 upstream ISPs (each
>> with 785k routes currently).
>>   +----------+  +----------+  +----------+
>>   |Customer A|  |Customer B|  |Customer C|
>>   +----+-----+  +----+-----+  +----+-----+

>>   +----+----+    +---+---+   +-----+-----+
>>   |Edge Asia|----|Edge US|---|Edge Europe|
>>   ++-------++    +---+---+   +-----------+

>> +--+--+ +--+--+   +--+--+
>> |ISP A| |ISP B|   |ISP C|
>> +-----+ +-----+   +-----+
>> Babeld would simply refuse to run on this environment, blocking the
>> whole network without converging, with 100% CPU utilization.
>
>Don't do that. It hurts when you do that. :) rtod is a way to get to
>overload more gently.
What would have to be done to get confirmation of Juliusz' theory wrt 
the source of the load?

Christof

-- 
()  ascii ribbon campaign - against html e-mail
/\  against proprietary attachments
