[Babel-users] babeld slashes kernel route manipulation performance by 17000%

Daniel Gröber dxld at darkboxed.org
Sat Apr 16 12:38:02 BST 2022


Hi Toke,

On Fri, Apr 15, 2022 at 12:48:05AM +0200, Toke Høiland-Jørgensen wrote:
> Poked a bit more into the kernel fib code; the more I'm looking at it,
> the more I'm convinced that the contention between add and dump is a
> fundamental feature of the way the routing table is implemented, so I'm
> not so sure it's simply a "bug" that can be "fixed" :(

Hmm, so do you think we should still send a report to netdev then?

> > That is a good question, bird should really be able to see that the route
> > is already installed and just not bother. I see this del/add behaviour
> > even when bgp is otherwise nice and converged, though, so I assumed bird is
> > just like this.
> 
> Hmm, that's odd. What's your "background radiation" (i.e. route updates
> per second when Bird is running normally and babeld isn't started)? I
> just checked my own router (which also imports a full v6 table), and
> that churns less than one route per second. So if you're seeing a lot of
> churn, maybe it's something in your config that could be fixed?

No, no, it's just a couple of routes per second here too.
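To put an actual number on that background rate, one could listen for IPv6 route notifications and count them per second. Below is a minimal sketch in C that assumes the legacy RTMGRP_IPV6_ROUTE multicast group is enough for this purpose. The helpers count_route_updates() and monitor_ipv6_routes() are hypothetical and are not part of bird or babeld. For a quick interactive look, `ip -6 monitor route` shows the same notifications.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: count RTM_NEWROUTE / RTM_DELROUTE messages in one
 * recv() buffer. len is a signed int because NLMSG_NEXT subtracts from it,
 * which keeps NLMSG_OK's bounds check safe on a short final message. */
void count_route_updates(void *buf, int len, unsigned *added, unsigned *deleted)
{
    struct nlmsghdr *nh;

    for (nh = buf; NLMSG_OK(nh, len); nh = NLMSG_NEXT(nh, len)) {
        if (nh->nlmsg_type == RTM_NEWROUTE)
            (*added)++;
        else if (nh->nlmsg_type == RTM_DELROUTE)
            (*deleted)++;
    }
}

/* Hypothetical helper: subscribe to IPv6 route notifications and print
 * roughly per-second counts for the given number of seconds. */
int monitor_ipv6_routes(int seconds)
{
    struct sockaddr_nl sa = { .nl_family = AF_NETLINK,
                              .nl_groups = RTMGRP_IPV6_ROUTE };
    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
    static char buf[65536];
    unsigned added = 0, deleted = 0;
    time_t last;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0)
        return -1;
    /* Time out recv() so that quiet seconds are still reported. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
        bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    last = time(NULL);
    while (seconds > 0) {
        /* Errors (timeouts, or ENOBUFS when notifications are dropped
         * under heavy churn) are simply skipped in this sketch. */
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n > 0)
            count_route_updates(buf, (int)n, &added, &deleted);
        if (time(NULL) != last) {
            printf("%u added, %u deleted\n", added, deleted);
            added = deleted = 0;
            last = time(NULL);
            seconds--;
        }
    }
    close(fd);
    return 0;
}
```

Calling monitor_ipv6_routes(60) with babeld stopped and then again with it running would show whether the rate stays at a couple of updates per second or jumps once the churn starts.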

> Alternatively, an option could be to improve Bird's performance when
> replacing routes; for one thing, there's this comment in bird's
> netlink.c:

Right, if they don't quite trust the kernel, that could explain the add/del
behaviour.

> I've been meaning to look into adding nexthop support to Bird anyway, so
> this could be a nice occasion to bump that up my list. Don't take that
> as a promise, though... :P

I'm not sure how nexthop objects would help with this problem specifically.
If bird doesn't trust the kernel, then even if it can update the nexthop
directly, it can't necessarily trust that the other route attributes are
right either, so it would still have to replace the FIB entry.

> > As I said before, it always triggers when I (re)start babeld, but I can't see
> > anything obvious in the log as to why, even with debug on. Particularly, I
> > don't see any bgp state events, so the sessions should be fine, but for some
> > reason it decides to churn everything anyway.
> 
> Well, the trigger when starting babeld would be the initial route dump,
> I suppose: If you have lots of route churn happening in the background,
> the drop in insert performance caused by the dump would be your trigger,
> no?

To clarify: when I start babeld, bird is not yet churning, just doing
background-level updates, but the act of starting babeld somehow seems to
make bird start churning routes soon after. I don't think the route dump
alone should (or could) make bird do anything to its routes; otherwise an
iproute2 dump would likely trigger it too.

I can tell bird is starting to churn because its CPU usage goes up to 100%
(mostly in the kernel). It's pretty mystifying how this could be connected,
to be sure; I'll have to do more testing.

> One thing I noticed when playing around with your reproducer example,
> which may be something we could apply to the babeld case: If I run 'ip
> -6 route show table 1337' I get the slowdown, but if I just run a
> regular 'ip -6 route show', I do not. This seems to be because iproute2
> is adding the table to the route dump request, which will make the
> kernel dump only the requested table. And since the lock that's being
> contended is per table, that should nicely get rid of the contention. A
> patch to do this is included below (only compile-tested, so no idea if
> it'll actually work :)).

Ah, this is excellent, thanks! I was wondering whether the kernel keeps the
tables in separate data structures or not.

Your patch seems to be against a different babeld branch than the one I have
(I can't see any CHANGE_RULE stuff here), but after removing that bit it
applies fine.

I just tested it and it does indeed seem to work. However, I think we also
need to make bird use table-specific dumps, since I'm still seeing the
slowdown and bird doesn't seem to set rtm_table in nl_request_dump_route
either. I'll get on that.
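For reference, here is roughly what a table-restricted dump request looks like at the netlink level. This is a minimal sketch in C. It is not Toke's patch and not bird's code, and the helpers build_table_dump_req() and send_table_dump() are hypothetical. Two points rest on my reading of the kernel's dump path. First, the table filter only seems to be honoured on sockets with NETLINK_GET_STRICT_CHK enabled (Linux 4.20+), which iproute2 turns on. Second, table IDs above 255, such as 1337, don't fit in the 8-bit rtm_table field, so the sketch also passes the ID in an RTA_TABLE attribute.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_GET_STRICT_CHK
#define NETLINK_GET_STRICT_CHK 12 /* added in Linux 4.20 */
#endif

/* Room for the header, an rtmsg and one 32-bit RTA_TABLE attribute. */
#define DUMP_REQ_SIZE (NLMSG_SPACE(sizeof(struct rtmsg)) + RTA_SPACE(sizeof(uint32_t)))

/* Hypothetical helper: build an RTM_GETROUTE dump request restricted to one
 * routing table. buf must be 4-byte aligned and hold DUMP_REQ_SIZE bytes.
 * Returns the total message length. */
size_t build_table_dump_req(void *buf, unsigned char family,
                            uint32_t table, uint32_t seq)
{
    struct nlmsghdr *nh = buf;
    struct rtmsg *rtm;
    struct rtattr *rta;

    memset(buf, 0, DUMP_REQ_SIZE);
    nh->nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg));
    nh->nlmsg_type = RTM_GETROUTE;
    nh->nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    nh->nlmsg_seq = seq;

    rtm = NLMSG_DATA(nh);
    rtm->rtm_family = family;
    /* rtm_table is only 8 bits wide; larger IDs rely on RTA_TABLE alone. */
    rtm->rtm_table = table < 256 ? (unsigned char)table : RT_TABLE_UNSPEC;

    rta = (struct rtattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
    rta->rta_type = RTA_TABLE;
    rta->rta_len = RTA_LENGTH(sizeof(uint32_t));
    memcpy(RTA_DATA(rta), &table, sizeof(table));
    nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + RTA_ALIGN(rta->rta_len);
    return nh->nlmsg_len;
}

/* Hypothetical helper: open an rtnetlink socket with strict dump checking
 * and send the request. Returns the fd, from which the caller reads the
 * RTM_NEWROUTE replies up to NLMSG_DONE, or -1 on error. */
int send_table_dump(unsigned char family, uint32_t table)
{
    uint32_t buf[DUMP_REQ_SIZE / sizeof(uint32_t)];
    struct sockaddr_nl kernel = { .nl_family = AF_NETLINK };
    int one = 1;
    size_t len;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0)
        return -1;
    /* Assumption: without this option the kernel ignores the table filter
     * and walks every table, just as it does for an unfiltered dump. */
    if (setsockopt(fd, SOL_NETLINK, NETLINK_GET_STRICT_CHK, &one, sizeof(one)) < 0) {
        close(fd);
        return -1;
    }
    len = build_table_dump_req(buf, family, table, 1);
    if (sendto(fd, buf, len, 0, (struct sockaddr *)&kernel, sizeof(kernel)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Only the request changes. The reply loop stays the same as for a full dump, which should make this a fairly small change on both the babeld and the bird side.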

--Daniel
