[Babel-users] Higher CPU usage since 1.9.x leads to instability on slow devices

Dave Taht dave.taht at gmail.com
Tue Jun 2 06:11:54 BST 2020


Lastly, after cracking 1k routes myself, I went and rigorously
renumbered and partitioned my network, and put in a ton of covering
routes, which was a PITA. But I went from 1k routes all over to a max
of 69. Down to one packet.

root@ricko:~# ip route | wc -l
42
root@ricko:~# ip -6 route | wc -l
27

(most of my boxes prior to this exported 10-12 routes each, now it's 2)
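
For anyone unsure what I mean by a covering route, here is a minimal
sketch using Python's standard ipaddress module. The function name
covering_route and the four /24s are made up for illustration; only
the 172.23.48.0 range comes from my config further down. It just
widens the first prefix until it contains all the others.

```python
import ipaddress

def covering_route(prefixes):
    """Return the smallest single prefix that contains every given prefix.

    Assumes all prefixes are the same address family (all IPv4 or all IPv6).
    """
    nets = [ipaddress.ip_network(p) for p in prefixes]
    cover = nets[0]
    # Widen by one bit at a time until every prefix fits inside.
    while not all(n.subnet_of(cover) for n in nets):
        cover = cover.supernet()
    return cover

# Hypothetical per-box routes that one covering route can replace.
lab = ["172.23.48.0/24", "172.23.49.0/24", "172.23.50.0/24", "172.23.51.0/24"]
print(covering_route(lab))  # 172.23.48.0/22
```

Note the result can include address space you never assigned, which
is exactly why the renumbering had to come first.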

However, using covering routes can make things worse: I perpetually
end up adding a weak node in my lab that can reach a wifi AP outside
of it, the lab routes escape, and then I fix it and go offline for
2+ minutes while they expire.

I'm not satisfied with my solutions for managing default routes that
go up or down, or the covering routes I install in openwrt.

/etc/config/babeld.conf says: (you can also do this in /etc/config/babeld)

redistribute proto 50
redistribute local deny

and /etc/config/network has, as an example:

config 'route' 'myroutes'
        option 'interface' 'lan'
        option 'target' '172.23.48.0'
        option 'netmask' '255.255.252.0'
        option 'type' 'unreachable'
        option 'proto' '50'
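
As a quick sanity check of what that stanza covers, here is a small
sketch with Python's standard ipaddress module. The helper name
is_covered and the test prefixes are hypothetical; the target and
netmask are the ones above.

```python
import ipaddress

# The target/netmask pair from the stanza above, parsed as one prefix.
cover = ipaddress.ip_network("172.23.48.0/255.255.252.0")
print(cover)                # 172.23.48.0/22
print(cover.num_addresses)  # 1024

def is_covered(prefix, cover=cover):
    """True if prefix lies entirely inside the covering route."""
    return ipaddress.ip_network(prefix).subnet_of(cover)

print(is_covered("172.23.50.0/24"))  # True
print(is_covered("172.23.52.0/24"))  # False
```

Because the route is of type unreachable, traffic for any address in
the /22 that has no more specific route is rejected locally rather
than following the default route.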

In the hope this helps some.

On Mon, Jun 1, 2020 at 10:03 PM Dave Taht <dave.taht at gmail.com> wrote:
>
> you can easily introduce all sorts of issues in any routing daemon
> using the rtod tool. ( https://github.com/dtaht/rtod )
>
> I've certainly hit this one. I really wish more folk tested meshy
> routing daemons with 4k or more routes.
>
> One cure is to basically leverage the same qsort technique added
> elsewhere in babel in this component. You can look at the
> git diff from each addition of qsort to grok what to change. Big help.
>
> (I note I also then saw some benefit in using an inline qsort, which
> while ugly and using a macro, let you stick the base compared (ipv6)
> value in registers, which helped a lot. I have some code for this
> lying around somewhere)
>
> Another thought (resend.c has issues too, which are not easily resolved
> by qsort) was to switch to using hashing throughout. I gave
> uthash (my most commonly used C hashing lib) a shot on resend.c, which
> worked pretty well, but I felt the overhead was too high
> and started to work with the less loved but lighter hhash instead,
> then dug a hole for myself in wanting to use "route tags" instead of
> full blown ip and ipv6 addresses everywhere, then timerwheels, then gave up.
>
> There's new support for switching nexthops in modern kernels worth leveraging.
>
> Threads might help...
>
> I have a couple days a year to muck with babel, tops. I was hoping
> someone would be inspired to do a rust version, because that's the
> only thing I think could be competitive with the C version, easier to
> extend, and could perhaps attract funding.
>
> The go version stalled out, at least in part, because at the time the
> kernel netlink interface for go sucked. Go is finally getting smaller
> shared libs but I figure the GC will suck far worse than the C version
> does... and has anyone ever played with librcu??
>
> I'd set a goal for myself a few years back of 64k routes (city scale
> networking: http://the-edge.taht.net/post/gilmores_list/ ), and then
> started running into congestion control issues...
>
> There's an awful lot we could do to make babeld awesome, just ENOTIME.
>
> On Sun, May 31, 2020 at 5:04 PM Fabian Bläse <fabian at blaese.de> wrote:
> >
> > Hi Johannes,
> >
> > thanks again for the analysis!
> > I made the mistake of not inspecting kinstall_route more closely. For some reason I thought that it only does the actual netlink communication.
> >
> > Your guess actually could explain the behaviour I've seen very well. Installing routes takes a very long time, but only if all routes are installed already.
> > Therefore it is relatively easy for the node to initially connect to the network, because there are fewer routes to compare against when they are received and installed for the first time.
> >
> > As I've already said it might be possible that I've mixed up babeld version numbers, so I analyzed versions with a known issue.
> > So it is very possible that this issue does not originate from babeld-1.9.x, but our network just got too big at a very unfortunate time.
> >
> > Regards,
> > Fabian
> >
>
>
>
> --
> "For a successful technology, reality must take precedence over public
> relations, for Mother Nature cannot be fooled" - Richard Feynman
>
> dave at taht.net <Dave Täht> CTO, TekLibre, LLC Tel: 1-831-435-0729





