[Babel-users] [PATCH] Fix ifup bug in send_multicast

Dave Taht dave at taht.net
Thu Nov 29 06:55:02 GMT 2018


Christof Schulze <christof.schulze at gmx.net> writes:

> On Mon, Nov 26, 2018 at 03:41:49AM -0800, Dave Taht wrote:
>>My followup just created a FOR_ALL_INTERFACES_UP macro which led to
>>even less code. :)
> \o/
>
>
>>Elsewhere I've been dithering. The proof of concept for uthash in
>>resend, was quite satisfying but far too much unneeded overhead (56
>>bytes per entry on 64 bits!! + the malloc overhead!), so I have been
>>looking over khash and so forth (
>>https://github.com/attractivechaos/klib )
>>
>>the benchmarks were impressive, but like all benchmarks, flawed -
>>testing an integer hash, where we need 34 bytes of "some hash"
>>(jenkins? spooky?), and x86 only, where I care mostly about mips, and
>>I care about startup time for a hash a lot....- I figure reworking
>>those benchmarks to (say) import a BGP route table, and then try to
>>project how much SADR and p2p routes are used, and a real
>>lookup/insert ratio in a benchmark would be more useful. But I get
>>14MOPs/sec on the basic benchmark on my x86 box on a million
>>integers... lookups take 10ns....
>>
>>Babel has an ordered "slot" concept in it...
>>
>>Elsewhere I was poking into the evolution of the kernel's
>>timerwheel thing ( https://lwn.net/Articles/646950/ ) which makes the
>>valid point that most babel "timeouts" are really recurring events....
> Indeed they are. Do we really have to handle so many events that the
> complexity of using a timerwheel is justified?

Don't know!!

We hit a brick wall at 20k routes on the edgerouter X for a variety of
reasons, earlier than that on weaker hardware, and at about 30k on even
fairly high-end x86 hardware, in a fairly small configuration in my lab.
There are other parameters, like "number of speakers", to factor in to
truly get a grip on messages/sec and other variables.

I'm exploring various reasons for why that happens. It's pps, it's
lookups, it's logic, it's life in the big city.

After falling in love with the datum API and its possibilities...

I'm actually fiddling with the idea of taking the "datum" idea one step
further and using an integer tag for routes of all sorts, against a
master table. You still have to hash on input from packets and the
kernel but after that... 32 bits, no waiting, no slinging around 34
bytes at a time to get anywhere.

If you encode into the tag IS_SS, IS_MARTIAN, IS_DEFAULT, IS_V4, and
perhaps more dynamic attributes like IS_UNREACH, IS_BEST, IS_FILTERED -
and accept that you can no longer represent 2 billion routes but, say,
2 million... a whole bunch more deep data compares go away.
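A minimal sketch of what such a tag could look like, assuming a 32-bit
value whose low 21 bits index the master table (2^21 is about 2 million
entries) and whose upper bits carry the attribute flags. The bit
positions, the TAG_ prefix and the helper names are all hypothetical;
nothing like this exists in babeld today.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: low 21 bits = index into a master route table
 * (2^21 = 2097152 entries), upper bits = attribute flags. */
#define TAG_INDEX_BITS  21u
#define TAG_INDEX_MASK  ((UINT32_C(1) << TAG_INDEX_BITS) - 1)

#define TAG_IS_SS       (UINT32_C(1) << 21)  /* source-specific */
#define TAG_IS_MARTIAN  (UINT32_C(1) << 22)
#define TAG_IS_DEFAULT  (UINT32_C(1) << 23)
#define TAG_IS_V4       (UINT32_C(1) << 24)
#define TAG_IS_UNREACH  (UINT32_C(1) << 25)  /* dynamic attributes */
#define TAG_IS_BEST     (UINT32_C(1) << 26)
#define TAG_IS_FILTERED (UINT32_C(1) << 27)

typedef uint32_t route_tag;

/* Build a tag from a table index and a set of flags. */
static inline route_tag tag_make(uint32_t index, uint32_t flags)
{
    assert(index <= TAG_INDEX_MASK);        /* index must fit in 21 bits */
    assert((flags & TAG_INDEX_MASK) == 0);  /* flags live above the index */
    return index | flags;
}

/* Recover the master-table index. */
static inline uint32_t tag_index(route_tag t)
{
    return t & TAG_INDEX_MASK;
}

/* True when every flag in `flags` is set: one AND and one compare,
 * no 34-byte prefix comparison. */
static inline int tag_has(route_tag t, uint32_t flags)
{
    return (t & flags) == flags;
}

static inline route_tag tag_set(route_tag t, uint32_t flags)
{
    return t | flags;
}

static inline route_tag tag_clear(route_tag t, uint32_t flags)
{
    return t & ~flags;
}
```

With this layout a check such as "is this a v4 route that is currently
best" is tag_has(t, TAG_IS_V4 | TAG_IS_BEST). One caveat of the sketch:
once dynamic flags live in the tag, two tags for the same route can
differ, so equality of routes has to compare tag_index() rather than the
raw tag.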

A whole lot of things that currently require a search, vanish. Routes
become very abstract. Lots of stuff stays in icache.

Perhaps MPLS or bier become easier.

...

and I've kind of got a really long todo list of things accumulating that
bug me, like finally fixing wifi channel detection.

I'd really like to deploy hmac, which has gradually bubbled up to first
on my list to solidify.

I'd really like a gnarly test suite, a virtual environment as good as or
better than the RPL one, some fuzzing stuff too.

Far more than I could ever do by myself! - I've had now, what? 6 weeks to
hack on babel for the first time in 2 years... and no doubt something
will pop up to stop me in my tracks, soon.

But high on my personal desires is to see the next bigger babel
deployment - which yours sounds like it will be - succeed beyond your
wildest dreams. And to see homenet finally get off the launch pad.

I'd really like to turn a couple of these emails and deployment
experience into a best practices document.

My strongest recommendation remains - carry the minimum number of routes
possible and keep the churn to the edges! :)

...


That said, the timerwheel idea in part is driven by a desire to enable
various forms of pacing updates automagically so as to not overwhelm the
system - both internal cpu and route reception elsewhere.
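To make the "recurring events" point concrete, here is a minimal
single-level timer wheel sketch in which every timer re-arms itself when
it fires. It is far simpler than the kernel's hierarchical wheel, and
the names (WHEEL_SLOTS, wheel_add, wheel_tick) are made up for
illustration. Pacing is not shown; it could be layered on top by capping
how many timers a single tick is allowed to fire.

```c
#include <assert.h>
#include <stddef.h>

#define WHEEL_SLOTS 64u  /* hypothetical size: one slot per tick */

struct wheel_timer {
    struct wheel_timer *next;  /* singly linked list within a slot */
    unsigned interval;         /* ticks between firings, 1..WHEEL_SLOTS */
    unsigned fired;            /* how many times this timer has fired */
};

struct wheel {
    struct wheel_timer *slot[WHEEL_SLOTS];
    unsigned now;              /* current slot, always < WHEEL_SLOTS */
};

/* Arm a timer to fire `interval` ticks from now: O(1), no sorting. */
static void wheel_add(struct wheel *w, struct wheel_timer *t)
{
    assert(t->interval >= 1 && t->interval <= WHEEL_SLOTS);
    unsigned s = (w->now + t->interval) % WHEEL_SLOTS;
    t->next = w->slot[s];
    w->slot[s] = t;
}

/* Advance one tick, fire everything in the new slot, and re-arm each
 * timer since the events are recurring. Returns the number fired. */
static unsigned wheel_tick(struct wheel *w)
{
    w->now = (w->now + 1) % WHEEL_SLOTS;

    /* Detach the list first so re-arming cannot loop back into it. */
    struct wheel_timer *t = w->slot[w->now];
    w->slot[w->now] = NULL;

    unsigned n = 0;
    while (t != NULL) {
        struct wheel_timer *next = t->next;
        t->fired++;        /* stand-in for "send hello", "send update" */
        wheel_add(w, t);   /* recurring: schedule the next firing */
        n++;
        t = next;
    }
    return n;
}
```

Insertion and expiry are both constant time per timer, which is the
attraction over a sorted list or a scan of every interface and route on
each pass through the main loop.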

I'm a really long way from actually implementing anything; my first
objective was to do as many "dumb user" things as possible - like
injecting the current standard bogon lists and BGP route table - and see
what broke, and why. Still dealing with the fallout from that!

...

As one example of generally improving openwrt router behaviors, I've got
most of an eBPF filter written that looks for RTM_NEWROUTE and
RTM_DELROUTE events with a protocol of X and drops the ones you don't
want.
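The predicate such a filter needs is small. The sketch below is plain
userspace C rather than eBPF, and it is not the filter described above:
it only shows the check against one netlink message, using the kernel's
own names (the add event is RTM_NEWROUTE in linux/rtnetlink.h). The
function name and the unwanted_proto parameter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Return true when `nh` is a route add/delete notification whose
 * rtm_protocol equals `unwanted_proto`, i.e. one a listener that does
 * not care about those routes could skip. `buflen` is the number of
 * valid bytes starting at `nh`. */
static bool route_msg_unwanted(const struct nlmsghdr *nh, size_t buflen,
                               unsigned char unwanted_proto)
{
    assert(nh != NULL);

    /* Header plus a full struct rtmsg must be present. */
    size_t need = NLMSG_LENGTH(sizeof(struct rtmsg));
    if (buflen < need || nh->nlmsg_len < need)
        return false;

    if (nh->nlmsg_type != RTM_NEWROUTE && nh->nlmsg_type != RTM_DELROUTE)
        return false;

    /* The rtmsg payload starts right after the aligned header. */
    const struct rtmsg *rtm =
        (const struct rtmsg *)((const char *)nh + NLMSG_HDRLEN);
    return rtm->rtm_protocol == unwanted_proto;
}
```

A daemon reading from an rtnetlink socket could call this on each
message and skip the ones that return true, but that still costs a
wakeup and a copy per message; doing the same test in a socket filter
is what keeps the unwanted messages from being delivered at all.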

This would reduce the enormous amount of CPU that dnsmasq and odhcpd eat
if they just never saw 20k routes they didn't actually care about in the
first place. By a lot. Maybe my bogon list injection would become
invisible. Etc. In my copious spare time....



