[Babel-users] babeld slashes kernel route manipulation performance by 17000%

Dave Taht dave.taht at gmail.com
Thu Feb 24 00:36:18 GMT 2022


Ages ago I attempted a string of optimizations for babeld which sped
it up a lot but introduced new bugs along the way, and Juliusz
preferred the relative cleanliness of the babeld code compared to:

using eBPF to filter routes in the kernel (big speedup, but it was buggy)
inlining qsort (2x speedup), and leveraging SSE/NEON for comparisons
using uthashes to manage scheduling updates (or one day a timer wheel)
switching to a struct for the common params (someone else's branch)
threads for managing I/O and bailing out

Some of that work made it back into the mainline. Some things (like
unicast updates) sort of fell out of that.
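
As a rough illustration of the first item in the list above (discarding
uninteresting route notifications as early as possible), here is a minimal
user-space sketch in Python. It is not the eBPF code from rabeld and not
babeld's actual netlink handling; it only shows the idea of checking the
routing table in the fixed-size rtmsg header before doing any attribute
parsing. The function names and the make_route_msg helper are hypothetical.
The header layouts and the RTM_NEWROUTE/RTM_DELROUTE values follow
<linux/netlink.h> and <linux/rtnetlink.h>. The sketch assumes table IDs
below 256 and ignores the RTA_TABLE attribute used for larger IDs.

```python
import struct

# Fixed-size headers, native byte order, no padding.
NLMSG_HDR = struct.Struct("=IHHII")   # len, type, flags, seq, pid (16 bytes)
RTMSG = struct.Struct("=BBBBBBBBI")   # family, dst_len, src_len, tos,
                                      # table, protocol, scope, type, flags
RTM_NEWROUTE = 24
RTM_DELROUTE = 25


def nlmsg_align(length):
    """Round up to the 4-byte netlink alignment boundary."""
    return (length + 3) & ~3


def route_messages_for_table(buf, wanted_table):
    """Yield (msg_type, payload) for route messages in wanted_table only.

    Messages for other tables are skipped after reading just the two
    fixed headers, so their attributes are never parsed.
    """
    offset = 0
    while offset + NLMSG_HDR.size <= len(buf):
        msg_len, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(buf, offset)
        if msg_len < NLMSG_HDR.size or offset + msg_len > len(buf):
            break  # truncated or malformed message: stop walking the buffer
        body = offset + NLMSG_HDR.size
        is_route = msg_type in (RTM_NEWROUTE, RTM_DELROUTE)
        if is_route and msg_len >= NLMSG_HDR.size + RTMSG.size:
            table = RTMSG.unpack_from(buf, body)[4]
            if table == wanted_table:
                yield msg_type, bytes(buf[body:offset + msg_len])
        offset += nlmsg_align(msg_len)


def make_route_msg(msg_type, table):
    """Hypothetical helper: build a bare route message with no attributes."""
    body = RTMSG.pack(2, 24, 0, 0, table, 0, 0, 1, 0)  # AF_INET, /24, unicast
    return NLMSG_HDR.pack(NLMSG_HDR.size + len(body), msg_type, 0, 0, 0) + body


if __name__ == "__main__":
    buf = (make_route_msg(RTM_NEWROUTE, 254)
           + make_route_msg(RTM_NEWROUTE, 10)
           + make_route_msg(RTM_DELROUTE, 10))
    for msg_type, payload in route_messages_for_table(buf, 10):
        print(msg_type, len(payload))
```

A filter attached to the socket in the kernel would avoid even copying the
unwanted messages to user space; the sketch above only avoids parsing them.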

In actuality I was also experimenting with a custom processor and
trying to find optimizations that could make it into hardware of some
form or another, and it was a mess. I am not proud of those few weeks
of hacking/flailing.

I had a goal of 64k routes, and ultimately wanted to be working on
optimizing updates (the protocol supports longer-lasting
announcements) in response to not being able to meet compute
deadlines. I fell short (with still-buggy code) at about 30k routes
on the very limited hardware I was using.

Anyway, that tree, https://github.com/dtaht/rabeld, had the semi-broken
eBPF code for filtering kernel updates better. I've also long held out
hope that some new kernel support for switching routes faster could be
leveraged, and I've long wished that someone else would stress-test
the bird version, not just of babel but of multiple routing
protocols, using tools like rtod, here:

https://github.com/dtaht/rtod

I like bird's codebase a lot.

On Wed, Feb 23, 2022 at 7:13 PM Daniel Gröber <dxld at darkboxed.org> wrote:
>
> Hi Toke and Juliusz,
>
> On Wed, Feb 23, 2022 at 09:43:29PM +0100, Toke Høiland-Jørgensen wrote:
> > Probably because babeld subscribes to netlink notifications for all new
> > routes, and only filters them on the table name fairly late,
> > specifically here:
> >
> > https://github.com/jech/babeld/blob/master/kernel_netlink.c#L1175
>
> Thanks for the pointer. I figured it would be something like that, but I'm
> still surprised babeld should be able to (seemingly) block the kernel from
> processing other netlink messages. I haven't had the time to properly
> review the code yet, though.
>
> I would have expected the kernel to just drop events when babeld falls
> behind with processing.
>
> > So babeld will process and parse all route entries even if it won't
> > export them.
>
> Right, so I wonder if there is a way to let the kernel do the filtering
> before passing events to babeld. Perhaps just making babeld faster at
> processing route updates would be a better solution, though. Maybe I'll try
> my hand at some profiling when I get a chance.
>
> > implementation in Bird as well; that has no issues with running
> > concurrently with a full BGP table. It is even possible to run babel and
> > BGP in the same Bird instance, but I split mine out to two instances
> > (one for BGP, one for Babel) because I had issues with the
> > single-threaded nature of Bird causing Babel to miss hello updates while
> > processing a large BGP update.
>
> I am aware of the babel support in bird, but in my setup the whole point of
> using babel is for the RTT metric support which bird doesn't seem to
> support yet.
>
> I had a look at FRR too since it supposedly does support RTT but according
> to the babel homepage using it is discouraged. I was wondering if that is
> still correct actually?
>
> On Thu, Feb 24, 2022 at 12:55:01AM +0100, Juliusz Chroboczek wrote:
> > > I run Bird in a similar setup as yours, BTW, but using the Babel
> > > implementation in Bird
> >
> > Just to clarify: there are two major implementations of Babel:
> >
> >   - babeld, which is a research project, and was written over the years by
> >     myself and a number of students, most of whom only stayed during an
> >     internship before moving on;
>
> > While I find babeld more convenient than BIRD, since it requires little
> > configuration in many common cases, I recommend that people use BIRD in
> > preference to babeld in production deployments.
>
> Yeah, like I said above, I am using babeld because of the RTT metric
> support; otherwise I would have preferred bird :)
>
> --Daniel



-- 
I tried to build a better future, a few times:
https://wayforward.archive.org/?site=https%3A%2F%2Fwww.icei.org

Dave Täht CEO, TekLibre, LLC


