[Babel-users] babeld slashes kernel route manipulation performance by 17000%

Toke Høiland-Jørgensen toke at toke.dk
Thu Feb 24 22:36:06 GMT 2022


Daniel Gröber <dxld at darkboxed.org> writes:

> Hi Toke and Juliusz,
>
> On Wed, Feb 23, 2022 at 09:43:29PM +0100, Toke Høiland-Jørgensen wrote:
>> Probably because babeld subscribes to netlink notifications for all new
>> routes, and only filters them on the table ID fairly late,
>> specifically here:
>> 
>> https://github.com/jech/babeld/blob/master/kernel_netlink.c#L1175
>
> Thanks for the pointer. I figured it would be something like that, but I'm
> still surprised babeld can (seemingly) block the kernel from processing
> other netlink messages. I haven't had the time to review the code
> properly yet.
>
> I would have expected the kernel to just drop events when babeld falls
> behind with processing.

Yeah, I find this a bit surprising as well. What kernel version are you
seeing this on, and what does the CPU usage show while it's ongoing
(just starting 'top' and sorting by CPU usage should show you which
process(es) are using the most CPU time).

>> So babeld will process and parse all route entries even if it won't
>> export them.
>
> Right, so I wonder if there is a way to let the kernel do the filtering
> before passing events to babeld. Perhaps just making babeld faster at
> processing route updates would be a better solution, though. Maybe I'll
> try my hand at some profiling when I get a chance.

See my other reply to Juliusz; from a quick look, I can't see any
obvious low-hanging fruit. But as per the above, I do agree that it's a
bit strange that the whole system slows down like this, so figuring out
why may be worthwhile.
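To make "filters on the table fairly late" concrete, here is a minimal Python sketch of reading the routing table of one rtnetlink route message straight from the raw bytes, so that a route from a table that isn't exported can be dropped before any other per-route work. It is not babeld's C code from kernel_netlink.c; the constants come from the Linux UAPI headers, while `route_table`, `wanted` and `build_route_msg` are hypothetical names for illustration. It assumes one message per buffer and native byte order (which netlink uses).

```python
import struct
from typing import Optional, Set

# Values from <linux/netlink.h> and <linux/rtnetlink.h>.
NLMSG_HDRLEN = 16        # struct nlmsghdr: len, type, flags, seq, pid
RTMSG_LEN = 12           # struct rtmsg: 8 one-byte fields + 32-bit flags
RTM_NEWROUTE = 24
RTM_DELROUTE = 25
RTA_DST = 1
RTA_TABLE = 15
RT_TABLE_COMPAT = 252    # value the kernel puts in rtm_table for ids > 255


def _align4(n: int) -> int:
    return (n + 3) & ~3


def route_table(msg: bytes) -> Optional[int]:
    """Return the table id of one route message, or None for other types.

    Only the fixed headers and the attribute headers are read; no
    attribute payload other than RTA_TABLE is decoded.
    """
    nl_len, nl_type = struct.unpack_from("=IH", msg, 0)
    if nl_type not in (RTM_NEWROUTE, RTM_DELROUTE):
        return None
    table = msg[NLMSG_HDRLEN + 4]          # rtm_table, only 8 bits wide
    end = min(nl_len, len(msg))
    off = NLMSG_HDRLEN + RTMSG_LEN
    while off + 4 <= end:
        rta_len, rta_type = struct.unpack_from("=HH", msg, off)
        if rta_len < 4:
            break                           # malformed attribute, stop
        if rta_type == RTA_TABLE and rta_len >= 8:
            # RTA_TABLE carries the full 32-bit table id.
            (table,) = struct.unpack_from("=I", msg, off + 4)
        off += _align4(rta_len)
    return table


def wanted(msg: bytes, export_tables: Set[int]) -> bool:
    """Early filter: decide before doing any further per-route work."""
    table = route_table(msg)
    return table is not None and table in export_tables


def build_route_msg(table: int, nl_type: int = RTM_NEWROUTE) -> bytes:
    """Hypothetical test helper: an IPv6 route with RTA_DST and RTA_TABLE."""
    rtm = struct.pack("=8BI", 10, 128, 0, 0,
                      table if table < 256 else RT_TABLE_COMPAT, 0, 0, 1, 0)
    attrs = struct.pack("=HH", 20, RTA_DST) + bytes(16)
    attrs += struct.pack("=HHI", 8, RTA_TABLE, table)
    body = rtm + attrs
    header = struct.pack("=IHHII", NLMSG_HDRLEN + len(body), nl_type, 0, 0, 0)
    return header + body
```

With this, `wanted(build_route_msg(254), {254})` is true while a route in table 1000 is rejected. Even with such a check up front, a socket subscribed to the route multicast groups still receives every notification, so this only trims per-message work in user space; it does not make the kernel send less.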

>> implementation in Bird as well; that has no issues with running
>> concurrently with a full BGP table. It is even possible to run babel and
>> BGP in the same Bird instance, but I split mine out to two instances
>> (one for BGP, one for Babel) because I had issues with the
>> single-threaded nature of Bird causing Babel to miss hello updates while
>> processing a large BGP update.
>
> I am aware of the babel support in bird, but in my setup the whole
> point of using babel is the RTT-based metric, which bird doesn't
> seem to support yet.

Ah, right, yeah, it doesn't. But good to know there's demand for this;
that's a motivation for implementing it :)

-Toke
