[Babel-users] babeld slashes kernel route manipulation performance by 17000%

Toke Høiland-Jørgensen toke at toke.dk
Thu Apr 14 19:58:53 BST 2022


Daniel Gröber <dxld at darkboxed.org> writes:

> Hi Toke,
>
> On Thu, Apr 14, 2022 at 12:12:36AM +0200, Toke Høiland-Jørgensen wrote:
>> How about submitting this report to netdev and asking for advice there?
>> From a quick glance at the kernel fib code, this does not look like it's
>> an easy fix (if it can be fixed at all), but we should really get
>> someone who is an expert in the kernel routing code (which I'm not,
>> sadly) to weigh in. You could add an explicit Cc to David Ahern
>> <dsahern at kernel.org> when submitting, and please keep me in Cc as
>> well. Or if you'd prefer, I can submit the report on your behalf?
>
> I'll try to get around to that but no promises :)

Alright; I don't think you have to do a longer writeup than what you
already did. Your reproducer should be enough to show the problem :)

> Do you know David? I don't like just CCing people I don't know at random.

Yeah, I do. He's also one of the maintainers of the routing code, so
definitely the right person to Cc on this (explicitly Cc'ing maintainers
makes sure they see your email as not everyone follows netdev
rigorously).

>> As for why you're seeing this in particular when Babel is running, now
>> that we know the route dump is the culprit, it's quite obvious: While
>> Babel listens for new route notifications from the kernel, it doesn't
>> actually use those notifications directly; instead, it just sets a flag
>> (see kernel_route_notify() in babeld.c), and does a full dump whenever
>> it gets a notification. Which obviously interacts really badly with lots
>> of routes being inserted at the same time, as that will basically send
>> Babel into a loop of doing nothing but route dumps.
>
> I saw that too and I was poking at the babeld code for a while before
> settling on the iproute2 reproducer, also compared it quite closely with
> bird and I can't say I really see a difference in what they do other than
> netlink buffer sizing.
>
> Both will periodically dump the whole table, so if I had two instances of
> bird running concurrently I could experience the same problem, as it seems
> to be the recvmsg call that's blocking forever in the kernel while the
> table churn is going on. So it's not even related to babeld doing a
> quadratic number of dumps or anything.
>
> What is also interesting is that babeld already seems to correctly filter
> the notifications by table id so all my route churn never actually sets the
> kernel_routes_changed flag (see parse_kernel_route_rta import_tables check
> at the bottom).

Ah, okay, that's interesting. Playing around with your examples, on my
laptop the performance goes from ~90k/s to ~1k/s when doing just a
single 'ip -6 route show table 1337'. The dump itself takes between 5 and 10
seconds, so with the 30-sec interval in babeld I guess the periodic dump
can coincide with the update at random.
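
To put rough numbers on that, here is a back-of-the-envelope sketch in
Python. It only uses the figures above; the function name is made up for
illustration, and it assumes the 30-second interval is measured from the
start of one dump to the start of the next, which I have not verified.

```python
# Rough estimate of how often a periodic dump is in flight.
# The model (interval measured start-to-start) is an assumption.

def dump_busy_fraction(dump_seconds: float, interval_seconds: float) -> float:
    """Fraction of wall-clock time during which a dump is in progress."""
    return min(1.0, dump_seconds / interval_seconds)

INTERVAL = 30.0                  # periodic dump interval in seconds
FAST, SLOW = 90_000, 1_000       # approximate updates/s without/with a dump

# Slowdown factor while a dump is running.
slowdown = FAST / SLOW

for dump_time in (5.0, 10.0):
    busy = dump_busy_fraction(dump_time, INTERVAL)
    print(f"dump={dump_time:.0f}s busy={busy:.0%} slowdown={slowdown:.0f}x")
```

So under these assumptions a dump would be in flight somewhere between a
sixth and a third of the time, which is often enough for a bulk update to
run into one.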

Side note: why is bird replacing all the routes in the first place? :)

>> Bird does things a bit differently: it will directly update its internal
>> routing table from the netlink notification messages, and only does a
>> full dump at intervals (by default once every minute, but it can be
>> configured to run entirely without dumps).
>
> Right but the important part is that it does very much still do the dumps
> :)
>
> Also I wonder how netlink buffer overruns are dealt with when there isn't a
> periodic dump? Wouldn't it still have to do a full dump to resync if that
> happens?

I don't think there's any mechanism to ensure that; you have to
explicitly turn off the periodic dumps, though, so I guess it's a matter
of "don't do that, then" if you do so and your system can't process
notifications quickly enough :)
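
For what it's worth, netlink(7) says the application is expected to notice
dropped messages via ENOBUFS from recvmsg() and resynchronise. A toy
sketch of that general pattern in Python is below. This is not Bird's or
babeld's code; the class and method names are hypothetical, and a bounded
queue stands in for the socket buffer.

```python
from collections import deque

class RouteMirror:
    """Toy user-space mirror of a kernel routing table (illustration only)."""

    def __init__(self, queue_limit):
        self.routes = {}              # prefix -> next hop
        self.pending = deque()        # stands in for the socket receive buffer
        self.queue_limit = queue_limit
        self.needs_resync = False

    def notify(self, op, prefix, nexthop=None):
        """Queue one notification; drop it and flag an overrun if full."""
        if len(self.pending) >= self.queue_limit:
            self.needs_resync = True  # analogous to seeing ENOBUFS
            return
        self.pending.append((op, prefix, nexthop))

    def process(self, full_dump):
        """Apply queued notifications, or do a full dump after an overrun."""
        if self.needs_resync:
            self.pending.clear()      # superseded by the fresh dump
            self.routes = dict(full_dump())
            self.needs_resync = False
            return
        while self.pending:
            op, prefix, nexthop = self.pending.popleft()
            if op == "add":
                self.routes[prefix] = nexthop
            else:
                self.routes.pop(prefix, None)

# Usage: five updates arrive before the mirror gets to run, but only two fit.
kernel = {}
mirror = RouteMirror(queue_limit=2)
for i in range(5):
    prefix = f"2001:db8:{i}::/48"
    kernel[prefix] = "fe80::1"
    mirror.notify("add", prefix, "fe80::1")
mirror.process(lambda: kernel.items())
```

The point of the sketch is just that the incremental path needs some
fallback to a full dump, whether periodic or triggered by the overrun.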

>> AFAICT the babeld code will require quite a bit of surgery to change
>> this behaviour; to the point where I think it may be simpler to
>> implement the RTT extension in Bird (but I'm obviously biased here)... :)
>
> In order to scale the number of native babel routes further you're
> probably right but that's not necessary for my use-case anyway. If
> this kernel bug goes away babeld would still work fine IMO.
>
> I'm currently working on babel ECMP support in bird though maybe I'll
> have a stab at RTT after that.

On the subject of ECMP and Babel, you may want to read this thread:
https://mailarchive.ietf.org/arch/msg/babel/i4tqsRIL3DS9e22GJ0QuoMef-P0/ 

I.e., it's not just a matter of writing the code; we'll also need to
define the semantics in the spec. Just so you know what you're getting
yourself into ;)

-Toke