[Babel-users] babeld slashes kernel route manipulation performance by 17000%

Toke Høiland-Jørgensen toke at toke.dk
Sat Apr 16 18:46:15 BST 2022


Daniel Gröber <dxld at darkboxed.org> writes:

> Hi Toke,
>
> On Fri, Apr 15, 2022 at 12:48:05AM +0200, Toke Høiland-Jørgensen wrote:
>> Poked a bit more into the kernel fib code; the more I'm looking at it,
>> the more I'm convinced that the contention between add and dump is a
>> fundamental feature of the way the routing table is implemented, so I'm
>> not so sure it's simply a "bug" that can be "fixed" :(
>
> Hmm, so do you think we should still send a report to netdev then?

Hmm, no, probably no need...

>> > That is a good question, bird should really be able to see that the route
>> > is already installed and just don't bother. I see this del/add behaviour
>> > even when bgp is otherwise nice and converged though so I assumed bird is
>> > just like this.
>> 
>> Hmm, that's odd. What's your "background radiation" (i.e. route updates
>> per second when Bird is running normally and babeld isn't started? I
>> just checked my own router (which also imports a full v6 table), and
>> that churns less than one route per second. So if you're seeing a lot of
>> churn, maybe it's something in your config that could be fixed?
>
> No, no, it's also just a couple routes per second here too.
>
>> Alternatively, an option could be to improve Bird's performance when
>> replacing routes; for one thing, there's this comment in bird's
>> netlink.c:
>
> Right if they don't quite trust the kernel that could explain the add/del
> behaviour then.

Yeah, but I believe this is outdated; i.e., on newer kernels replace
should work just fine for IPv6 as well.

>> I've been meaning to look into adding nexthop support to Bird anyway, so
>> this could be a nice occasion to bump that up my list. Don't take that
>> as a promise, though... :P
>
> I'm not sure how nexthop objects would help with this problem specifically?
> If bird doesn't trust the kernel even if it can update the nexthop directly
> it can't necessarily trust the other route attributes are right either and
> so would still have to replace the FIB entry.

The idea would be to use the nexthop support as a marker for the kernel
doing the right thing on replace; and it'll be a performance improvement
in itself to use nexthop objects when updating routes. Especially for
ECMP, but also for single-path routes.

>> > As I said before it always triggers when I (re)start babeld but I can't see
>> > anything obvious in the log even with debug on as to why. Particularily I
>> > don't see any bgp state events so the sessions should be fine but for some
>> > reason it decides to churn everything anyway.
>> 
>> Well, the trigger when starting babeld would be the initial route dump,
>> I suppose: If you have lots of route churn happening in the background,
>> the drop in insert performance caused by the dump would be your trigger,
>> no?
>
> To clarify: when I start babeld bird is not yet churning just doing
> background level updates but the act of starting babeld seems to somehow
> make bird start churning routes soon after. I don't think the route dump
> alone should/could make bird do anything to it's route otherwise a iproute2
> dump would likely also do it.
>
> I can tell bird is starting to churn because its CPU usage goes up to 100%
> (most in the kernel). It's pretty mystifying how this could be connected to
> be sure, I'll have to do more testing.

Hmm, yeah, this seems odd; I don't really see how starting Babel itself
should affect this. So maybe there's a bug in Bird causing the churn?
Would definitely be worth looking into in more detail!

>> One thing I noticed when playing around with your reproducer example,
>> which may be something we could apply to the babeld case: If I run 'ip
>> -6 route show table 1337' I get the slowdown, but if I just run a
>> regular 'ip -6 route show', I do not. This seems to be because iproute2
>> is adding the table to the route dump request, which will make the
>> kernel dump only the requested table. And since the lock that's being
>> contended is per table, that should nicely get rid of the contention. A
>> patch to do this is included below (only compile-tested, so no idea if
>> it'll actually work :)).
>
> Ah this is excellent, thanks! I was wondering if the kernel keeps the
> tables in separate datastructures or not.
>
> Your patch seems to be against a different babeld branch than what I have
> (can't see any CHANGE_RULE stuff here) but removing that bit it applies
> fine.

Ah, I forgot to 'git pull' before patching. I'll rebase and fix the
other bug you noticed, then send a proper patch for babeld, also adding
the kernel feature detection.

> I just tested it and it does indeed seem to work, however I think we
> also need to make bird use table specific dumps since I'm still seeing
> the slowdown and it doesn't seem to set rtm_table in
> nl_request_dump_route either. I'll get on that.

Cool! I commented on that patch on the Bird list; feature detection is a
bit more complicated there, unfortunately, since it's a compile-time
switch...

-Toke



More information about the Babel-users mailing list