[Babel-users] Ability to work with massive number of routes? (global full-table)

Dave Taht dave.taht at gmail.com
Tue Sep 25 14:39:15 BST 2018


On Tue, Sep 25, 2018 at 12:30 AM Christof Schulze
<christof.schulze at gmx.net> wrote:
>
> >> Hi Babel community,
>
> >> As I mentioned in another thread, I am curious about whether Babeld
> >> can be adapted to work with global full-table.
> >
> >No.
> >
> >There is one long standing issue with merging from the kernel table
> >that would benefit from a qsort.
> >
> >But you are going to
> >
> >A) run out of bandwidth - 785k routes  = ~14,000 babel packets. I
> >think that rounds to 1280 bytes/packet,
> >    so - and babel will want to announce these every 4 seconds - so
> >call it 44mbits/sec? (feel free to check my math, it's friday). That's
> >well above what I've ever seen wifi mcast in particular, achieve. And
> >that's *per router*.
> >
> >To get there, the announcement interval would have to be increased up
> >to at least a typical bgp interval (2 minutes) and even then...
> >
> >B) you run out of cpu - babeld uses linked lists, and tries to recalc
> >bellman-ford every 4 seconds also. There's a need for a faster, safer
> >kernel interface.
> I do not see a reason why we could not change the data structures to
> consume less CPU under the given scenario.

Well, I took a stab at some of that. In particular, babeld has to compare
a lot of bytes, and when I profiled it on these loads, it was totally
bottlenecked on memcmp.

So I attempted to add SSE2 and ARM NEON support to make it easier to
xor 128 bits at a time, and nearly eliminated memcmp in favor of inline
xor in general, only to find that the real cause of the bottleneck on
that version of the benchmark was a dumb sort routine...
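
A rough illustration of the xor idea, in Python rather than C with
SSE2/NEON intrinsics. This is not babeld's code; the function names are
made up for the example, and the only point is that a 128-bit address
can be compared with one xor instead of a byte-by-byte loop:

```python
# Illustration only: treat each 16-byte address as one 128-bit integer
# and compare with a single XOR. All names here are hypothetical.

def addr_to_int(addr: bytes) -> int:
    """Pack a 16-byte address into a single 128-bit integer."""
    if len(addr) != 16:
        raise ValueError("expected a 16-byte address")
    return int.from_bytes(addr, "big")


def addrs_equal(a: int, b: int) -> bool:
    """Two addresses are equal exactly when their XOR is zero."""
    return (a ^ b) == 0


def prefix_match(a: int, b: int, plen: int) -> bool:
    """True if the first plen bits of a and b agree."""
    if not 0 <= plen <= 128:
        raise ValueError("prefix length must be in 0..128")
    # Shift away the host bits; any surviving set bit is a mismatch.
    return ((a ^ b) >> (128 - plen)) == 0
```

Converting once when a route is stored, and comparing integers after
that, is what keeps the per-comparison cost down in this sketch.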

Juliusz then rightly pointed out that a better algorithm for merging
routes would be good... I'd rather index them... I'm also of the
opinion we should announce routes over an increasingly larger interval
based on the available bandwidth and number of speakers.
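
One possible reading of "a larger interval based on the available
bandwidth and number of speakers", as a sketch. Nothing here comes from
babeld or the protocol: the formula, the 10% budget, the 4 second floor
and the function name are all assumptions. The 23 bytes per route is
simply what 14,000 packets of 1280 bytes for 785k routes works out to.

```python
def announce_interval(routes, speakers, link_bps,
                      bytes_per_route=23.0, budget=0.10, floor_s=4.0):
    """Seconds between full announcements so that all speakers together
    use at most `budget` of the link. Every default is an assumption."""
    if routes < 0 or speakers < 1 or link_bps <= 0 or not 0 < budget <= 1:
        raise ValueError("invalid parameters")
    # Bits sent on the shared medium if every speaker announces every route once.
    bits_per_round = routes * bytes_per_route * 8 * speakers
    # Never announce more often than the floor, even on a fast link.
    return max(floor_s, bits_per_round / (link_bps * budget))
```

With these assumptions, 64k routes and 32 speakers on a 10 Mbit/s link
come out at a bit over six minutes between full announcements, while a
small network stays at the 4 second floor.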

And, setting a goal of 64k routes, I thought that switching to a
normalized table structure internally would be useful. Instead of
storing the nexthop as a full address, store a 16-bit index pointing
into an array of those nexthops. And so on.
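
A minimal sketch of that normalization, in Python for brevity. The
class and method names are hypothetical, and a real implementation
would live in babeld's C structures; the sketch only shows each route
carrying a 16-bit index instead of a full 16-byte nexthop address:

```python
from array import array


class NexthopTable:
    """Interns nexthop addresses and hands out small integer indexes."""
    MAX = 1 << 16  # a 16-bit index can name at most 65536 nexthops

    def __init__(self):
        self._addrs = []   # index -> 16-byte address
        self._index = {}   # 16-byte address -> index

    def intern(self, addr: bytes) -> int:
        idx = self._index.get(addr)
        if idx is None:
            if len(self._addrs) >= self.MAX:
                raise OverflowError("nexthop table full")
            idx = len(self._addrs)
            self._addrs.append(addr)
            self._index[addr] = idx
        return idx

    def lookup(self, idx: int) -> bytes:
        return self._addrs[idx]


class RouteTable:
    """Routes store an index into the nexthop table, not the address."""

    def __init__(self):
        self.nexthops = NexthopTable()
        self.prefixes = []        # one (address, plen) pair per route
        self.nh_idx = array("H")  # one unsigned short index per route

    def add(self, prefix: bytes, plen: int, nexthop: bytes) -> None:
        self.prefixes.append((prefix, plen))
        self.nh_idx.append(self.nexthops.intern(nexthop))

    def nexthop_of(self, i: int) -> bytes:
        return self.nexthops.lookup(self.nh_idx[i])
```

Since a mesh node usually has far fewer neighbours than routes, most
routes end up sharing a handful of nexthop entries.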

In the end I decided that making wifi into a suitable network substrate
again, with good latency under load and vastly improved multistation
support, was the most worthwhile thing to do before tackling mesh again.

The first version of that is done and has been shipping for ath9k,
ath10k, and mt76 since OpenWrt 17.01. Ath10k with the ct firmware does
adhoc. Toke and the other make-wifi-fast folk are busy trying to make it
generic (now for iwl also), fq_codel's the default now on OSX... so
wifi is looking vastly improved.

There are still quite a few things on the make-wifi-fast roadmap left
to tackle. I haven't looked at this document since the last funding
round collapsed.

https://docs.google.com/document/d/1Se36svYE1Uzpppe1HWnEyat_sAGghB3kE285LElJBW4/edit

Then 802.11s started to work...

There were all sorts of other issues: getting into a dogfight with either
odhcpd or network manager has always been a problem, and address
assignment for ipv6 is still a problem.

We could write a roadmap for "better wifi mesh networking", but it starts with
"more bodies". If we could just siphon off 1/10000 of the effort going into 5G,
we could get somewhere. My own network is decrepit, wlan-slovenia has
mostly switched to 4G uplinks, sudomesh died (I think), and I didn't
know freifunk was still alive. Is guifi still alive?


>
> >my rtod tests showed babeld typically falling over for any one of
> >these four reasons in well under 4k routes on low end mips and arm
> >hardware. Even the low end apu2 eats a whole cpu with about that many
> >(ipv6) routes.
> What are the other two reasons?

* walking linked lists
* running out of bandwidth
* having to process a ton of packets per router
* the lack of a faster, safer kernel interface
* recalculating bellman-ford
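
The bandwidth item can be checked with the figures quoted earlier in
this thread (about 14,000 packets of 1280 bytes for 785k routes, sent
every 4 seconds). A quick sketch; it ignores link-layer overhead and
multicast rate limits, and the function name is made up:

```python
def announce_mbps(packets, bytes_per_packet, interval_s):
    """Average load, in Mbit/s, of one router's periodic announcements."""
    return packets * bytes_per_packet * 8 / interval_s / 1e6


# Full table every 4 seconds, using the numbers from the thread.
full_dump_4s = announce_mbps(14_000, 1280, 4)

# Same table at a BGP-like 2 minute interval.
full_dump_120s = announce_mbps(14_000, 1280, 120)
```

For these inputs the 4 second case comes to about 36 Mbit/s per router,
the same ballpark as the 44 Mbit/s estimate, and the 2 minute case to
about 1.2 Mbit/s.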

I note, incidentally, that my memory of that experiment was getting
fuzzy. I'd had 8+ devices on that network, and also regularly had
odhcpd or network manager duking it out with babeld for cpu due to all
the messages on the netlink bus. I just did the same experiment with
only 5 devices in that scenario and kept the local network running ok
(with both bird and babeld) on up to 16k routes, crashing a 64k box
locally... and crashing 3 nanostation/picostation devices a hop away
that ran out of memory or cpu, or both. (Thankfully procd restarted two
of 'em; the third was so old it didn't, and I just had to climb on the
roof at 3 in the morning to reboot it.)

One of the huge advantages of vlans or 802.11s is the number of routes
is *bounded* to a fairly low figure. A general purpose routing
protocol has to somehow sanely start rejecting routes at some bound,
for some definition of sane.
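
One very simple "definition of sane", as a sketch: keep updating
prefixes that are already known, and refuse new prefixes once the table
is full. This policy, the class name and the counter are all
assumptions for illustration; babeld does not necessarily do anything
like this.

```python
class BoundedRoutes:
    """Route table with a hard cap on the number of distinct prefixes."""

    def __init__(self, max_routes: int):
        if max_routes < 1:
            raise ValueError("max_routes must be at least 1")
        self.max_routes = max_routes
        self.routes = {}    # (prefix, plen) -> metric
        self.rejected = 0   # how many new prefixes were refused

    def update(self, prefix: bytes, plen: int, metric: int) -> bool:
        """Install or refresh a route; return False if it was rejected."""
        key = (prefix, plen)
        if key not in self.routes and len(self.routes) >= self.max_routes:
            # Table is full: refuse the new prefix, keep existing state.
            self.rejected += 1
            return False
        self.routes[key] = metric
        return True

    def retract(self, prefix: bytes, plen: int) -> None:
        """Drop a route, freeing a slot for a future new prefix."""
        self.routes.pop((prefix, plen), None)
```

Keying the table by prefix also gives the constant-time lookup that
indexing routes, rather than walking a linked list, is after.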

>
> >I made a few sloppy computational improvements and so on while
> >developing the rtod test. Tried to upstream a few; my then-current
> >employer wasn't happy with me working under anything but the apache
> >license and didn't care, and I ran out of time and energy and have to
> >admit I was hacking far more than programming -
> >
> >I think making some version of babel (be it bird or frr ) scale well
> >to at least 64k routes would be a very good idea,
> I agree.

It's a nice defining goal for a project. 64k routes with 32 stations or bust!

> >and once things now entering it like unicast, and crypto, are stable,
> >it would be a GREAT thing to have a version that did that, but I fear
> >it will involve parallelizing hellos and bellman ford and per interface
> >threads, changes to the protocol to adapt the interval to the bandwidth
> >and cpu available, tcp friendly rate control (or swapping routes via
> >tcp), etc, etc.
> I agree, as routers become more powerful and even low-end devices are
> emerging that feature multiple CPU cores, there might be a benefit. On
> another note - just parallelizing any algorithm (not specific to babeld)
> will only get you so far. The algorithm/data structures should be
> optimized first.

Heh. One of my targets for that attempt a few years back was a 1024 core
parallella chip. When you have cores to burn like that you can come up with
all sorts of crazy ideas. The 1024 core version never shipped.

>
> >
> >and a whole suite of other cool things that nobody has time, energy,
> >or sufficient programmers for. And it wouldn't be babeld anymore.
> >
> >Bird's version of babel should perform mildly better, as it has
> >tighter code (xor rather than memcmp in one case I tried to upstream),
> >and a few other better algorithms overall, but I suspect few besides
> >me and john ( http://the-edge.taht.net/post/gilmores_list/ ) care
> >enough about city-scale routing to get anywhere.
> You are forgetting the Freifunk Communities in Germany. This is what
> they do: building city-wide wifi mesh networks. Currently mostly with
> segmented batman. Now that my patchset for babeld integration has been
> merged in gluon (the framework which most communities use to build their
> networks) the babeld technology is available to a wider audience for
> their meshes. To get an impression on the size of the community,
> https://www.freifunk-karte.de/ might be an interesting start.
> Agreed, the development community is much smaller - still I can see
> dozen or so people contributing to gluon. It certainly could be worse.

Yep. Personally I'd really like to get my hands on some 5G stuff,
get my own base station, and make private 5G possible, but I know
that isn't going to happen...
> >
> >I should probably try to extract more patches from my misguided
> >efforts, like this:
> >
> >https://github.com/dtaht/rabeld/commit/b74b4a6f9b532717ee93346963efd894e94615b3
> >
> >and I had a bpf filter that helped a lot, and I sunk time into
> >enabling sse and neon ins...

The bpf filter was helpful in lowering the noise and cpu usage from odhcpd.

> >
> >but I was mostly hoping the unicast/crypto/etc stuff would land in one
> >piece I could do all up testing on before tackling the scaling
> >problems, on someone else's time. I ended up deciding that I wanted to
> >rewrite it all from scratch, hit licensing and employer problems...
> >and time.....
> I would appreciate that. We are just starting another test network for a
> city-wide mesh which will be based on babeld. Links that do not have

Cool. Enable ecn. :)

> wifi connectivity will be using wireguard as vpn. There is a significant
> speed improvement of that tech stack over batman+fastd. Let's see how

Wireguard is nice. It can, however, chew up memory with its default
queue size.

> much pull this gets. In any case: 64K routes should be working on
> current cheap routers in a network like that.

Can you define a "current cheap router"? For example, I am sad to
report the uap-lite-mesh APs have really terrible range compared to
nanostations and picostations, and only have 8k of flash. Finding a
decent outdoor mesh-capable dual-radio AP has been on my mind for
years. Arm multicores are cheap, but I haven't seen any...

> >> One of my environments uses BGP full-table from 3 upstream ISPs (each
> >> with 785k routes currently).
> >>   +----------+  +----------+  +----------+
> >>   |Customer A|  |Customer B|  |Customer C|
> >>   +----+-----+  +----+-----+  +----+-----+
>
> >>   +----+----+    +---+---+   +-----+-----+
> >>   |Edge Asia|----|Edge US|---|Edge Europe|
> >>   ++-------++    +---+---+   +-----------+
>
> >> +--+--+ +--+--+   +--+--+
> >> |ISP A| |ISP B|   |ISP C|
> >> +-----+ +-----+   +-----+
> >> Babeld would simply refuse to run on this environment, blocking the
> >> whole network without converging, with 100% CPU utilization.
> >
> >Don't do that. It hurts when you do that. :) rtod is a way to get to
> >overload more gently.
> What would have to be done to get confirmation of Juliusz' theory wrt
> the source of the load?
>
> Christof
>
> --
> ()  ascii ribbon campaign - against html e-mail
> /\  against proprietary attachments
>
> _______________________________________________
> Babel-users mailing list
> Babel-users at alioth-lists.debian.net
> https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/babel-users



--

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619


