[Babel-users] Ability to work with massive number of routes? (global full-table)

Dave Taht dave.taht at gmail.com
Sat Sep 22 01:42:16 BST 2018


On Fri, Sep 21, 2018 at 4:36 PM StarBrilliant <coder at poorlab.com> wrote:
>
> Hi Babel community,
>
> As I mentioned in another thread, I am curious about whether Babeld
> can be adapted to work with global full-table.

No.

There is one long-standing issue with merging from the kernel table
that would benefit from a qsort.
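
A minimal sketch of the sort-then-merge idea, not babeld's actual code:
it assumes routes are plain tuples with no duplicates, and the names
(`diff_sorted`, `old`, `new`) are hypothetical. Sorting both tables once
and walking them in a single pass costs O(n log n), instead of searching
one list for every entry of the other.

```python
def diff_sorted(old, new):
    """Return (added, removed) between two route lists via sort + merge."""
    old, new = sorted(old), sorted(new)
    added, removed = [], []
    i = j = 0
    # Walk both sorted lists in lockstep, like the merge step of mergesort.
    while i < len(old) and j < len(new):
        if old[i] == new[j]:
            i += 1
            j += 1
        elif old[i] < new[j]:
            removed.append(old[i])   # present before, gone now
            i += 1
        else:
            added.append(new[j])     # not present before
            j += 1
    removed.extend(old[i:])
    added.extend(new[j:])
    return added, removed


# Hypothetical routes: (prefix, prefix length, next hop)
daemon_view = [("10.0.0.0", 8, "a"), ("192.168.0.0", 16, "b")]
kernel_view = [("192.168.0.0", 16, "b"), ("172.16.0.0", 12, "c")]
print(diff_sorted(daemon_view, kernel_view))
```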

But you are going to

A) run out of bandwidth - 785k routes = ~14,000 babel packets. I
think that rounds to 1280 bytes/packet, and babel will want to
announce these every 4 seconds, so call it 44mbits/sec? (feel free to
check my math, it's friday). That's well above what I've ever seen
wifi mcast, in particular, achieve. And that's *per router*.

To get there, the announcement interval would have to be increased up
to at least a typical bgp interval (2 minutes) and even then...
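
A quick check of the estimate in (A), using only the figures quoted
there (14,000 packets, 1280 bytes each, one full announcement per
interval). The 1500-byte case is an added assumption, to show what a
full Ethernet-size frame would do to the number; the function name is
hypothetical.

```python
def announce_rate_mbps(packets, bytes_per_packet, interval_s):
    """Average rate in Mbit/s when `packets` are sent every `interval_s` seconds."""
    return packets * bytes_per_packet * 8 / interval_s / 1e6


PACKETS = 14_000     # from the message: ~785k routes
PAYLOAD = 1280       # bytes per packet, from the message
FULL_FRAME = 1500    # assumption: full-size frame instead of 1280 bytes

print(announce_rate_mbps(PACKETS, PAYLOAD, 4))      # every 4 seconds
print(announce_rate_mbps(PACKETS, FULL_FRAME, 4))   # every 4 seconds, 1500-byte frames
print(announce_rate_mbps(PACKETS, PAYLOAD, 120))    # every 2 minutes
```

With the quoted inputs this comes to roughly 36 Mbit/s at a 4 second
interval (about 42 Mbit/s with 1500-byte frames), and a little over
1 Mbit/s at a 2 minute interval, still per router.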

B) you run out of cpu - babeld uses linked lists, and tries to recalc
bellman-ford every 4 seconds also. There's a need for a faster, safer
kernel interface.

my rtod tests showed babeld typically falling over for any one of
these four reasons in well under 4k routes on low end mips and arm
hardware. Even the low end apu2 eats a whole cpu with about that many
(ipv6) routes.

I made a few sloppy computational improvements and so on while
developing the rtod test. I tried to upstream a few, but my
then-current employer wasn't happy with me working under anything but
the apache license and didn't care, and I ran out of time and energy
and have to admit I was hacking far more than programming.

I think making some version of babel (be it bird or frr) scale well
to at least 64k routes would be a very good idea, and once the things
now entering it, like unicast and crypto, are stable, it would be a
GREAT thing to have a version that did that. But I fear it will
involve parallelizing hellos and bellman-ford, per-interface threads,
changes to the protocol to adapt the interval to the bandwidth and cpu
available, tcp-friendly rate control (or swapping routes via tcp),
etc, etc.

and a whole suite of other cool things that nobody has time, energy,
or sufficient programmers for. And it wouldn't be babeld anymore.

Bird's version of babel should perform mildly better, as it has
tighter code (xor rather than memcmp in one case I tried to upstream),
and a few other better algorithms overall, but I suspect few besides
me and john ( http://the-edge.taht.net/post/gilmores_list/ ) care
enough about city-scale routing to get anywhere.
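
To illustrate the xor-versus-memcmp remark: a sketch of comparing two
addresses under a prefix length with a single XOR and shift. This only
shows the idea; it is not Bird's or babeld's code, the function name is
hypothetical, and in C the same trick would operate on machine words
rather than Python's arbitrary-size integers.

```python
import ipaddress


def same_prefix_xor(a, b, plen):
    """True if IPv6 addresses a and b agree in their first `plen` bits."""
    # XOR leaves a 1 bit wherever the two addresses differ.
    diff = int(ipaddress.IPv6Address(a)) ^ int(ipaddress.IPv6Address(b))
    # Drop the host bits; anything left is a difference inside the prefix.
    return (diff >> (128 - plen)) == 0


print(same_prefix_xor("2001:db8::1", "2001:db8::2", 64))
print(same_prefix_xor("2001:db8::1", "2001:db9::1", 32))
```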

I should probably try to extract more patches from my misguided
efforts, like this:

https://github.com/dtaht/rabeld/commit/b74b4a6f9b532717ee93346963efd894e94615b3

and I had a bpf filter that helped a lot, and I sunk time into
enabling sse and neon ins...

but I was mostly hoping the unicast/crypto/etc stuff would land in one
piece I could do all-up testing on before tackling the scaling
problems, on someone else's time. I ended up deciding that I wanted to
rewrite it all from scratch, hit licensing and employer problems...
and time.....

> One of my environments uses BGP full-table from 3 upstream ISPs (each
> with 785k routes currently).
>   +----------+  +----------+  +----------+
>   |Customer A|  |Customer B|  |Customer C|
>   +----+-----+  +----+-----+  +----+-----+
>        |             |             |
>   +----+----+    +---+---+   +-----+-----+
>   |Edge Asia|----|Edge US|---|Edge Europe|
>   ++-------++    +---+---+   +-----------+
>    |       |         |
> +--+--+ +--+--+   +--+--+
> |ISP A| |ISP B|   |ISP C|
> +-----+ +-----+   +-----+
> Babeld would simply refuse to run on this environment, blocking the
> whole network without converging, with 100% CPU utilization.

Don't do that. It hurts when you do that. :) rtod is a way to get to
overload more gently.

In that case I'd just export a default gw with source specific routing.
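
A rough sketch of what that could look like as a forwarding decision:
each edge router announces only a default route, qualified by a source
prefix, so the table holds a handful of entries instead of 785k per
upstream. Everything here is assumed for illustration (the
documentation prefixes, the router names, the `lookup` helper), and it
assumes the destination-first ordering, where the most specific
destination wins and the source prefix breaks ties.

```python
import ipaddress

NET = ipaddress.ip_network

# (destination prefix, source prefix, next hop): hypothetical entries
ROUTES = [
    (NET("::/0"), NET("2001:db8:a::/48"), "edge-asia"),
    (NET("::/0"), NET("2001:db8:b::/48"), "edge-us"),
    (NET("::/0"), NET("2001:db8:c::/48"), "edge-europe"),
]


def lookup(routes, src, dst):
    """Pick a next hop for a packet, matching on destination and source."""
    src = ipaddress.ip_address(src)
    dst = ipaddress.ip_address(dst)
    best_key, best_hop = None, None
    for dst_net, src_net, hop in routes:
        if dst in dst_net and src in src_net:
            # Longest destination prefix first, then longest source prefix.
            key = (dst_net.prefixlen, src_net.prefixlen)
            if best_key is None or key > best_key:
                best_key, best_hop = key, hop
    return best_hop


print(lookup(ROUTES, "2001:db8:b::10", "2001:db8:ffff::1"))
```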

> I understand Babel is an IGP, thus not designed to be as capable as
> EGPs. However I see Babel possesses many great features that other
> routing protocols lack:
> - RTT measurement (good for tunnels)
> - Packet loss measurement (good for wireless bridges)
> - SADR support (so I don't have to bother with complicated MPLS to use
>   policy-based routing)
> These features above are exactly the reason why I love Babel.
>
> And here are the things Babel currently lacks:
> - Capability of handling 785k × 3 routes
> - An extension to preserve AS Path
>
> I am optimistic about using Babel for full-table routing. And if
> Babel becomes successful enough, we might see Babel running between
> border gateways in the future. But I currently cannot figure out where
> the bottleneck is. Is it because Babel runs on multicast? Or is it
> because Babeld uses some inefficient data structures or algorithms?

yes.

> Does anyone have ideas? Please discuss with me.




-- 

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619


