[Babel-users] babeld slashes kernel route manipulation performance by 17000%

Toke Høiland-Jørgensen toke at toke.dk
Wed Apr 13 23:12:36 BST 2022


dxld at darkboxed.org writes:

> Hi Toke,
>
> So this is definitely a kernel bug. I've managed to reproduce it with only
> iproute2 commands. The problem seems to be dumping the whole FIB while lots
> of individual route modifications are taking place.

Ah! Excellent work! :)

How about submitting this report to netdev and asking for advice there?
From a quick glance at the kernel FIB code, this does not look like it's
an easy fix (if it can be fixed at all), but we should really get
someone who is an expert in the kernel routing code (which I'm not,
sadly) to weigh in. You could add an explicit Cc to David Ahern
<dsahern at kernel.org> when submitting, and please keep me in Cc as
well. Or, if you'd prefer, I can submit the report on your behalf?

As for why you're seeing this in particular when Babel is running: now
that we know the route dump is the culprit, it's quite obvious. While
Babel listens for new route notifications from the kernel, it doesn't
actually use the contents of those notifications; instead, it just sets
a flag (see kernel_route_notify() in babeld.c) and does a full dump
whenever it gets a notification. This obviously interacts really badly
with lots of routes being inserted at the same time, as it basically
sends Babel into a loop of doing nothing but route dumps.
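A toy model may make the feedback loop more concrete. The Python below
is not babeld's code: the class, method, and parameter names are
hypothetical, and it simply assumes that each notification only sets a
flag, which the main loop answers with a full dump whose cost is
proportional to the size of the kernel table.

```python
"""Toy model (not babeld's code) of flag-triggered full route dumps.

Assumption: a netlink route notification only sets a flag, and the main
loop responds to a set flag by re-reading the entire kernel table.
"""


class FlagAndDumpDaemon:
    def __init__(self) -> None:
        self.routes_changed = False  # hypothetical "something changed" flag
        self.dumps = 0               # number of full table dumps performed
        self.entries_read = 0        # routes read across all dumps

    def on_route_notification(self) -> None:
        # The notification's contents are ignored; only the flag is set.
        self.routes_changed = True

    def main_loop_iteration(self, fib_size: int) -> None:
        if self.routes_changed:
            self.routes_changed = False
            self.dumps += 1
            self.entries_read += fib_size  # a full dump reads every route


def simulate(ticks: int, inserts_per_tick: int) -> FlagAndDumpDaemon:
    """Another process inserts routes while the daemon's loop runs."""
    daemon = FlagAndDumpDaemon()
    fib_size = 0
    for _ in range(ticks):
        for _ in range(inserts_per_tick):
            fib_size += 1                    # kernel table grows by one route
            daemon.on_route_notification()   # ...and the daemon is notified
        daemon.main_loop_iteration(fib_size)
    return daemon


if __name__ == "__main__":
    d = simulate(ticks=1000, inserts_per_tick=10)
    print(d.dumps, d.entries_read)
```

In this model, 10,000 inserts arriving in batches of 10 trigger 1,000
dumps that read 5,005,000 routes in total, so the dump work grows
quadratically with the number of inserts. Those figures come from the
toy model, not from a measurement.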

Bird does things a bit differently: it will directly update its internal
routing table from the netlink notification messages, and only does a
full dump at intervals (by default once every minute, but it can be
configured to run entirely without dumps).
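For contrast, here's a similarly hypothetical Python sketch of the
Bird-style approach: each notification carries the changed route and is
applied straight to an internal table, with a full resync only every
`scan_interval` seconds (60 by default here, mirroring the once-a-minute
default mentioned above, or never when it is None). None of these names
come from Bird's source; the "RTM_NEWROUTE"/"RTM_DELROUTE" strings just
stand in for the netlink message types.

```python
"""Toy model (not Bird's code) of incremental updates plus periodic resync.

Assumption: each notification carries the changed route and is applied
directly to an internal table; a full dump happens only every
`scan_interval` seconds, or never if it is None.
"""
from typing import Dict, Optional, Tuple


class IncrementalDaemon:
    def __init__(self, scan_interval: Optional[float] = 60.0) -> None:
        self.table: Dict[str, str] = {}  # prefix -> next hop
        self.scan_interval = scan_interval
        self.last_scan = 0.0
        self.dumps = 0

    def on_route_notification(self, msg_type: str, prefix: str,
                              nexthop: str = "") -> None:
        # Apply the change carried by the message; no dump is needed.
        if msg_type == "RTM_NEWROUTE":
            self.table[prefix] = nexthop
        elif msg_type == "RTM_DELROUTE":
            self.table.pop(prefix, None)

    def tick(self, now: float, kernel_fib: Dict[str, str]) -> None:
        # Periodic full resync, to catch anything the notifications missed.
        if (self.scan_interval is not None
                and now - self.last_scan >= self.scan_interval):
            self.table = dict(kernel_fib)
            self.last_scan = now
            self.dumps += 1


def simulate(seconds: int, inserts_per_second: int,
             scan_interval: Optional[float]) -> Tuple[IncrementalDaemon, Dict[str, str]]:
    """Insert routes into a simulated kernel table, notifying the daemon."""
    daemon = IncrementalDaemon(scan_interval)
    kernel_fib: Dict[str, str] = {}
    n = 0
    for second in range(1, seconds + 1):
        for _ in range(inserts_per_second):
            prefix, nexthop = f"fd00::{n:x}/128", "fe80::1"
            kernel_fib[prefix] = nexthop
            daemon.on_route_notification("RTM_NEWROUTE", prefix, nexthop)
            n += 1
        daemon.tick(float(second), kernel_fib)
    return daemon, kernel_fib


if __name__ == "__main__":
    d, fib = simulate(seconds=100, inserts_per_second=100, scan_interval=60.0)
    print(d.dumps, len(d.table), d.table == fib)
```

Inserting 10,000 routes over 100 simulated seconds costs one full dump
with the default interval and none with the interval disabled, and in
both cases the internal table ends up matching the kernel's. The
periodic dump is there as a safety net: if a notification were ever
lost (e.g. on a netlink socket buffer overrun), the table would
otherwise stay stale.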

AFAICT the babeld code will require quite a bit of surgery to change
this behaviour, to the point where I think it may be simpler to
implement the RTT extension in Bird (but I'm obviously biased here)... :)

-Toke


