[Babel-users] basic bird//babeld interop for ss

Dave Taht dave at taht.net
Sun Oct 28 15:39:47 GMT 2018


Juliusz Chroboczek <jch at irif.fr> writes:

>> * I hit the nlogn branch with the same stuff... it's cpu barely ticks
>> over, thousands of routes get distributed... it gets knocked off the
>> network... and all end up unreachable, after everybody else runs out of
>> some resource or another....
>
> Dave, I'm planning to merge nlogn into master, so if you have time to
> debug that, that'd be helpful.

Sorry to have been unclear.

So far the nlogn patch is a vast improvement on getting xroutes in.

It appears to be correct from eyeballing the route distribution
and doing things like ip -6 route show proto babel | grep that:addr | wc -l
elsewhere.
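
A minimal sketch of that kind of spot check, wrapped so it can be rerun
on each box. The helper names count_matching and count_babel_routes are
made up for illustration, and it assumes iproute2 knows the "babel"
protocol name, as the one-liner above already does:

```shell
#!/bin/sh
# Count route lines on stdin that contain the given prefix string.
# -F treats the prefix as a fixed string, so ':' and '.' are literal.
# grep -c still prints 0 when nothing matches, but exits 1; "|| true"
# keeps the exit status clean for callers running under "set -e".
count_matching() {
    grep -cF -- "$1" || true
}

# Count the babel routes installed on this box that match a prefix.
# Hypothetical wrapper: it only pipes the ip command used above into
# the counter.
count_babel_routes() {
    ip -6 route show proto babel | count_matching "$1"
}
```

Running count_babel_routes that:addr on each router and comparing the
numbers is the same eyeball check, just easier to repeat.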

Ship it.

...

The confusing part stems from what I had been doing before to induce
carnage while bypassing the xroute import issue: injecting smaller
piles of routes into several other boxes, rather than tons into one.

The cpu bottleneck then shifts over to the various resend routines (I'd
sent you a gprof of that at some point).

Once you get over ~8k routes, you end up compute bound again on every
architecture: late hellos ensue, metrics climb, stuff goes unreachable,
routes get retracted, and you probably hit other cpu bottlenecks while
clearing things out. (The injector is a 12 core xeon, and it too goes
unreachable, because it's in resend or it's got late hellos from
elsewhere.)

In other words, I can blow up the network by injecting
16k routes from one now very fast nlogn xroute capable box,
where before it took 8 boxes injecting 1k each. :)

anyway - the babel.conf over there is:

default enable-timestamps true
interface enp7s0
interface enp6s0
interface enp4s0
redistribute proto 50 allow

And I used to just ssh into various boxes and run this to do them in.
It was unclear whether the xroute or the resend routines would do them
in...

#!/bin/sh

for i in $(seq 1 32) # really! don't go above 2-4 unless you are crazy
do
	./rtod -r 2000 -H test$i # -H generates a different ipv6 prefix
	sleep 13
done

Now it's just resend.
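
For a feel of how big that flood is before unleashing it, here is a
dry-run sketch that only prints the rtod invocations and adds up the
load. It assumes -r is the number of routes per run (the script above
does not spell that out), and plan_flood with its three arguments is a
hypothetical helper, not part of rtod:

```shell
#!/bin/sh
# Dry run: print the rtod commands instead of executing them.
# Arguments (all hypothetical knobs): number of runs, routes per run
# (assumed to be what -r means), and the pause between runs in seconds.
plan_flood() {
    batches=${1:-2}
    routes=${2:-2000}
    pause=${3:-13}
    i=1
    while [ "$i" -le "$batches" ]; do
        echo "./rtod -r $routes -H test$i"
        i=$((i + 1))
    done
    # Totals, so the scale is obvious before anything is injected.
    echo "total routes: $((batches * routes))"
    echo "pause time: $((batches * pause))s"
}
```

Under that assumption, plan_flood 32 2000 13 works out to 64000 routes
spread over 416 seconds of pauses, well past the ~8k mark where things
start to fall over.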

The good thing to note is *no crashes*. After the route flood clears,
all the lost routes to "normal" stuff reappear, and everything returns
to normal. There's a mips (campusgw) box doing heavy filtering, a bird
box now, and a couple of boxes with this patch (arm, mips, x86 apu2).

I'm going to add a second bird box at some point, but I have trees
to climb. 

Things are pretty good in the sub 2k route range, across
all arches.

My goal in life today was to get the last 6 routers up, and since nlogn
has the latest rfcbis stuff in it (?) and source specific seems to be
working (when not under stress), I may finally go retire the last
cerowrt box also.

(note I did send a small patch for a segvio at one point,
 "check for interface buffer on a multihop request")

And ya should take the "late hello" patch, as late hellos are the first
sign of trouble.


