[Babel-users] Higher CPU usage since 1.9.x leads to instability on slow devices

Dave Taht dave.taht at gmail.com
Tue Jun 2 06:03:08 BST 2020


You can easily introduce all sorts of issues in any routing daemon
using the rtod tool ( https://github.com/dtaht/rtod ).

I've certainly hit this one. I really wish more folk tested meshy
routing daemons with 4k or more routes.

One cure is to apply to this component the same qsort technique that
was added elsewhere in babel. You can look at the
git diff from each addition of qsort to grok what to change. Big help.
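
For anyone who hasn't read those diffs, the general shape of the idea is
roughly the following: keep the table sorted under one total order, then
look entries up by binary search instead of a linear scan. This is only a
minimal sketch under my own assumptions, not babeld's actual code; the
struct and function names (route_key, route_table_sort, route_table_find)
are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical route key: 16-byte prefix plus prefix length.
   This is NOT babeld's real route structure. */
struct route_key {
    unsigned char prefix[16];
    unsigned char plen;
};

/* Total order on keys: shorter prefix length first, then prefix bytes. */
static int
route_key_compare(const void *a, const void *b)
{
    const struct route_key *ka = a, *kb = b;
    if(ka->plen != kb->plen)
        return ka->plen < kb->plen ? -1 : 1;
    return memcmp(ka->prefix, kb->prefix, 16);
}

/* Sort the whole table once, O(n log n). */
static void
route_table_sort(struct route_key *table, size_t n)
{
    assert(table != NULL || n == 0);
    if(n > 1)
        qsort(table, n, sizeof(table[0]), route_key_compare);
}

/* O(log n) lookup in a sorted table.
   Returns the index of the matching entry, or -1 if absent. */
static long
route_table_find(const struct route_key *table, size_t n,
                 const struct route_key *key)
{
    const struct route_key *hit;
    if(n == 0)
        return -1;
    hit = bsearch(key, table, n, sizeof(table[0]), route_key_compare);
    return hit ? (long)(hit - table) : -1;
}
```

With a few thousand routes, each lookup then costs a dozen or so
comparisons rather than a walk over the whole table. A real daemon would
more likely insert in sorted position than re-sort after every change.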

(I note I also then saw some benefit in using an inline qsort, which,
while ugly and using a macro, lets you stick the base compared (ipv6)
value in registers, which helped a lot. I have some code for this
lying around somewhere)
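
As an illustration only (this is not the code mentioned above), a
macro-generated sort might look something like the sketch below. The macro
stamps out a type-specific quicksort, so the comparison is a plain inline
function the compiler can fold into the loop, and the pivot is a local copy
it is free to keep in registers. All names here (DEFINE_INLINE_SORT,
v6_key, v6_key_less, v6_sort) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generates a quicksort (Hoare partition) specialised for one type.
   less(a, b) must return nonzero iff *a sorts strictly before *b.
   Sketch only: no median-of-three, so recursion depth can reach O(n)
   on adversarial input. */
#define DEFINE_INLINE_SORT(name, type, less)                        \
static void name(type *a, long lo, long hi)                         \
{                                                                   \
    while(lo < hi) {                                                \
        type pivot = a[lo + (hi - lo) / 2]; /* local copy */        \
        long i = lo - 1, j = hi + 1;                                \
        for(;;) {                                                   \
            do i++; while(less(&a[i], &pivot));                     \
            do j--; while(less(&pivot, &a[j]));                     \
            if(i >= j) break;                                       \
            type tmp = a[i]; a[i] = a[j]; a[j] = tmp;               \
        }                                                           \
        name(a, lo, j); /* recurse on the left part */              \
        lo = j + 1;     /* iterate on the right part */             \
    }                                                               \
}

/* Hypothetical key type: a bare IPv6 address. */
struct v6_key {
    unsigned char addr[16];
};

static inline int
v6_key_less(const struct v6_key *a, const struct v6_key *b)
{
    return memcmp(a->addr, b->addr, 16) < 0;
}

DEFINE_INLINE_SORT(v6_sort_range, struct v6_key, v6_key_less)

/* Convenience wrapper taking an element count. */
static void
v6_sort(struct v6_key *a, size_t n)
{
    assert(a != NULL || n == 0);
    if(n > 1)
        v6_sort_range(a, 0, (long)n - 1);
}
```

The gain over plain qsort(), if any, comes from dropping the indirect call
per comparison; whether that matters on a given device is something to
measure rather than assume.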

Another thought (resend.c has issues too, which are not easily resolved
by qsort) was to switch to using hashing throughout. I gave
uthash (my most commonly used C hashing lib) a shot on resend.c, which
worked pretty well, but I felt the overhead was too high
and started to work with the less loved but lighter hhash instead,
then dug a hole for myself in wanting to use "route tags" instead of
full-blown ip and ipv6 addresses everywhere, then timerwheels, then gave up.
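
To make the hashing idea concrete, here is a very small fixed-size
open-addressing table keyed on prefix and prefix length. It is a sketch
under my own assumptions and uses neither the uthash nor the hhash API;
the names (prefix_key, prefix_table, table_put, table_get), the FNV-1a
hash, and the TABLE_SIZE value are all illustrative choices, and deletion
is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 1024 /* hypothetical capacity; must be a power of two */

struct prefix_key {
    unsigned char prefix[16];
    unsigned char plen;
};

struct slot {
    struct prefix_key key;
    int value;  /* e.g. an index into some other array */
    int used;
};

/* Zero-initialise before use (static storage or memset). */
struct prefix_table {
    struct slot slots[TABLE_SIZE];
    size_t count;
};

/* 32-bit FNV-1a over the prefix bytes and the prefix length. */
static uint32_t
hash_key(const struct prefix_key *k)
{
    uint32_t h = 2166136261u;
    int i;
    for(i = 0; i < 16; i++) {
        h ^= k->prefix[i];
        h *= 16777619u;
    }
    h ^= k->plen;
    h *= 16777619u;
    return h;
}

static int
key_equal(const struct prefix_key *a, const struct prefix_key *b)
{
    return a->plen == b->plen && memcmp(a->prefix, b->prefix, 16) == 0;
}

/* Insert or update. Returns 0 on success, -1 if the table is full. */
static int
table_put(struct prefix_table *t, const struct prefix_key *k, int value)
{
    uint32_t i = hash_key(k) & (TABLE_SIZE - 1);
    size_t probes;
    assert((TABLE_SIZE & (TABLE_SIZE - 1)) == 0);
    for(probes = 0; probes < TABLE_SIZE; probes++) {
        struct slot *s = &t->slots[i];
        if(!s->used) {
            s->key = *k; s->value = value; s->used = 1;
            t->count++;
            return 0;
        }
        if(key_equal(&s->key, k)) {
            s->value = value;
            return 0;
        }
        i = (i + 1) & (TABLE_SIZE - 1); /* linear probing */
    }
    return -1;
}

/* Lookup. Returns 1 and stores the value if found, 0 otherwise. */
static int
table_get(const struct prefix_table *t, const struct prefix_key *k,
          int *value_out)
{
    uint32_t i = hash_key(k) & (TABLE_SIZE - 1);
    size_t probes;
    for(probes = 0; probes < TABLE_SIZE; probes++) {
        const struct slot *s = &t->slots[i];
        if(!s->used)
            return 0;
        if(key_equal(&s->key, k)) {
            *value_out = s->value;
            return 1;
        }
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return 0;
}
```

Lookups are O(1) on average as long as the table stays well under full.
The per-entry cost is one slot plus the hash computation, which is the
sort of overhead to weigh against a general-purpose hashing library.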

There's new support for switching nexthops in modern kernels worth leveraging.

Threads might help...

I have a couple days a year to muck with babel, tops. I was hoping
someone would be inspired to do a rust version, because that's the
only thing I think could be competitive with the C version, easier to
extend, and could perhaps attract funding.

The go version stalled out, at least in part, because at the time the
kernel netlink interface for go sucked. Go is finally getting smaller
shared libs, but I figure the GC will suck far worse than the C version
does... and has anyone ever played with librcu??

I'd set a goal for myself a few years back of 64k routes (city scale
networking: http://the-edge.taht.net/post/gilmores_list/ ), and then
started running into congestion control issues...

There's an awful lot we could do to make babeld awesome, just ENOTIME.

On Sun, May 31, 2020 at 5:04 PM Fabian Bläse <fabian at blaese.de> wrote:
>
> Hi Johannes,
>
> thanks again for the analysis!
> I made the mistake of not inspecting kinstall_route closer. For some reason I thought that this only does the actual netlink communication.
>
> Your guess actually could explain the behaviour I've seen very well. Installing routes takes a very long time, but only if all routes are installed already.
> Therefore it is relatively easy for the node to initially connect to the network, because there are fewer routes to compare against when they are received and installed for the first time.
>
> As I've already said it might be possible that I've mixed up babeld version numbers, so I analyzed versions with a known issue.
> So it is very possible that this issue does not originate from babeld-1.9.x, but our network just got too big at a very unfortunate time.
>
> Regards,
> Fabian



-- 
"For a successful technology, reality must take precedence over public
relations, for Mother Nature cannot be fooled" - Richard Feynman

dave at taht.net <Dave Täht> CTO, TekLibre, LLC Tel: 1-831-435-0729


