[Babel-users] 64000 route bust

Dave Taht dave.taht at gmail.com
Fri Oct 26 23:00:22 BST 2018


I should stress that my real goal was to be able to safely carry about
4k routes, and everything else was just gravy... 16000 was really
encouraging....

So I tried with the nlogn branch to get to 64000 routes and went bust.

Ran into various system limits. I get back an ENOBUFS error:

netlink_read: recvmsg(): No buffer space available

which I tried to work around by adding ENOBUFS to the list of errnos
to recheck for, increasing the relevant setsockopts for buffer space,
and stuff like this...

--- a/xroute.c
+++ b/xroute.c
@@ -343,7 +343,7 @@ check_xroutes(int send_updates)
     struct filter_result filter_result;
     int numroutes, numaddresses;
     static int maxroutes = 8;
-    const int maxmaxroutes = 16 * 1024;
+    const int maxmaxroutes = 128 * 1024;

This also calls calloc, which I'm not certain is actually needed for
this particular operation...

Netlink, I believe, also supports recvmmsg, but the API is no fun.
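
For anyone following along, the buffer-space workaround amounts to roughly the following. This is only a sketch, not the actual patch: the helper names recv_error_is_transient and grow_rcvbuf are made up for illustration, and the real code in the tree differs. It relies on the fact that "No buffer space available" is the strerror text for ENOBUFS, which a netlink socket returns when the kernel had to drop messages because the receive buffer overflowed.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical helper: errors after which a netlink read is worth
   retrying.  ENOBUFS means the kernel dropped messages because the
   socket's receive buffer overflowed; the socket is still usable, but
   the caller has lost events and should resync (re-dump the routes). */
static int
recv_error_is_transient(int err)
{
    return err == EAGAIN || err == EINTR || err == ENOBUFS;
}

/* Hypothetical helper: request a larger receive buffer and report what
   the kernel actually granted.  Linux caps the request at
   net.core.rmem_max unless SO_RCVBUFFORCE is used with CAP_NET_ADMIN.
   Returns the effective size in bytes, or -1 on error. */
static int
grow_rcvbuf(int fd, int bytes)
{
    int granted = 0;
    socklen_t len = sizeof(granted);

    assert(fd >= 0 && bytes > 0);
    if(setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if(getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) < 0)
        return -1;
    return granted;
}
```

Checking the granted size matters because a request above rmem_max is silently clamped, so a bigger setsockopt value alone may change nothing.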

I did get it as high as this:
root@spaceheater:~/git/rtod# ip -6 route | wc -l
41797

but never got so far as hitting 64k. Usually after a minute or so I
can get to 16000. I figure I might be hitting this limit?
message.h:#define MAX_BUFFERED_UPDATES 200

Or I'm dropping packets elsewhere, etc, etc.

Anyway, with 64k routes injected and the patches noted above, I ended
up with this gprof output:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 53.34     10.46    10.46                             check_xroutes
 18.82     14.15     3.69   135477     0.03     0.04  netlink_read.constprop.4
  5.48     15.23     1.08 111741232     0.00     0.00  filter_route
  4.03     16.02     0.79 99407751     0.00     0.00  redistribute_filter
  3.44     16.69     0.68  7367954     0.00     0.00  find_installed_route
  2.14     17.11     0.42     2071     0.20     0.86  flushupdates

All the routers spew these and essentially drop offline:

Late hello: bufferbloated neighbor fe80::e091:f5ff:febe:a353
Late hello: bufferbloated neighbor fe80::230:18ff:fec9:de9c

I inlined qsort both times just for the heck of it, and my lab is now
running at least 5 slightly different versions of babel that I should
go unify.  :)

fun!

How's the hmac branch doin?

Anyway, thoughts: it would be good if the hello message was
prioritized somehow. My thinking was that with a move to unicast for
route transfers, listening and sending on the mcast socket might see
less head-of-line (HOL) blocking. Once upon a time in the rabeld branch
I'd put checks in core loops to make sure the hello went out on time
too....
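
To make the "checks in core loops" idea concrete, here is a rough sketch. It is not the rabeld code; every name in it (hello_timer, hello_check, process_routes) is hypothetical, the send is a counter, and time is simulated so the behaviour can be checked without waiting. In real code the clock would come from something like clock_gettime(CLOCK_MONOTONIC).

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical, for illustration only. */
struct hello_timer {
    long next_due_us;   /* absolute deadline of the next hello */
    long interval_us;   /* hello interval */
    int sent;           /* number of hellos sent so far */
};

/* Send a hello if the deadline has passed.  Returns 1 if one was sent. */
static int
hello_check(struct hello_timer *t, long now_us)
{
    if(now_us < t->next_due_us)
        return 0;
    t->sent++;          /* stand-in for actually sending the hello */
    t->next_due_us = now_us + t->interval_us;
    return 1;
}

/* Walk n routes, each costing cost_us of simulated time, and poll the
   hello deadline once every `stride` routes.  A hello is then at most
   stride * cost_us late, instead of waiting for the whole pass. */
static void
process_routes(size_t n, size_t stride, long cost_us,
               struct hello_timer *t, long *now_us)
{
    size_t i;

    assert(stride > 0);
    for(i = 0; i < n; i++) {
        *now_us += cost_us;     /* ... per-route work would go here ... */
        if((i + 1) % stride == 0)
            hello_check(t, *now_us);
    }
}
```

With 64000 routes at an assumed 100 us each, a pass takes 6.4 s of simulated time. Polling every 256 routes gets a hello due at 4 s out about 19 ms late, where a loop with no check would not send it until the pass finished.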

Another thought was to randomize the starting point for an update, or
qsort backwards every other time, or, with the hello message, ensure
that the most important routes always go out (like to self, default,
etc).
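
The rotating-start and backwards-every-other-time ideas could look something like this. Again just a sketch under my own assumptions: rotated_order is a made-up name, and it only computes the order in which an already-sorted route array would be visited. Nothing like it exists in the tree today.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill order[0..n-1] with the indices of a sorted
   route array, visiting every entry exactly once.  The walk begins at
   `start` and wraps around; it goes backwards when `reverse` is nonzero.
   If an update pass is cut short, a different `start` (or direction)
   each time means it is not always the same tail of the table that
   misses out. */
static void
rotated_order(size_t n, size_t start, int reverse, size_t *order)
{
    size_t k;

    assert(n == 0 || start < n);
    for(k = 0; k < n; k++)
        order[k] = reverse ? (start + n - k) % n : (start + k) % n;
}
```

A caller would pick start with something like random() % n per update and flip reverse on alternate passes. For n = 5 and start = 3, the forward order is 3 4 0 1 2 and the reverse order is 3 2 1 0 4.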

From what I can tell, though, the nlogn branch is correct, but some more
*limited* tests of correctness seem valuable going forward. Hope to be
done with my basic campus deployment by Sunday.

-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740


