[Babel-users] 64k routes, bird, babel rfc, rtod, etc

Dave Taht dave.taht at gmail.com
Thu Nov 8 21:15:53 GMT 2018


On Thu, Nov 8, 2018 at 12:23 PM Juliusz Chroboczek <jch at irif.fr> wrote:
>
> > Would you like me to try merging this against head?
>
> No need, it should be an easy merge.  Please continue testing, you're
> being very helpful.

If you want packet caps of any of this insanity let me know. I did
four tests, detailed below.

I want to be clear that what I just tested was my nlogn-uthash-merge
branch with your latest stuff (that merge is pushed out now). However,
these tests do not really touch the resend code; it's other code that
runs. Compiled with -O2 -pg on gcc 7 (except on the older ubuntu
boxes). I also have an osx box (on the boat).  I am primarily
using unicast and enjoying my switches blinking madly...

... being as I was always the kid that brought the lithium to the
swimming hole, I tried 64k routes from one box... (instead of my usual
2 or 6)

things fall apart at about 32k in this case.

(I'll try just the nlogn branch and see what happens. It's lunchtime
though, don't expect a fast turnaround)

1) unicast, 64k routes, my uthash + you
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 28.75      5.06     5.06    42690     0.12     0.16  netlink_read
 19.15      8.43     3.37                             kernel_route_compare
  8.44      9.92     1.49                             check_xroutes
  8.18     11.36     1.44 193771004     0.00     0.00  do_filter
  6.68     12.53     1.18 178893184     0.00     0.00  filter_route
  5.57     13.51     0.98 238581203     0.00     0.00  xroute_compare
  4.09     14.23     0.72 21816688     0.00     0.00  really_buffer_update
  3.21     14.80     0.57 357610981     0.00     0.00  martian_prefix
  2.53     15.24     0.45  4593628     0.00     0.00  find_xroute_slot
  1.99     15.59     0.35 171937780     0.00     0.00  redistribute_filter
  1.51     15.86     0.27 27374912     0.00     0.00  roughly
  1.48     16.12     0.26  9159708     0.00     0.00  find_route_slot
  0.88     16.27     0.16                             compare_buffered_updates
  0.80     16.41     0.14  4504797     0.00     0.00  really_send_update
  0.74     16.54     0.13 21816807     0.00     0.00  output_filter
  0.71     16.67     0.13        2    62.50    62.50  wait_for_fd
  0.68     16.79     0.12 26357212     0.00     0.00  timeval_minus_msec
  0.57     16.89     0.10        1   100.00   100.00  getint

2) I inlined QSORT and the relevant xroute and route match routines
(the compiler thinks they are too big to inline, and so misses that the
caller-data part of the compare is static... which is the whole point
of inlining qsort) and I get to... wait for it... 61785 routes, with
~5k being unreachable, before it all falls apart. It's not just CPU
but I/O here... lemme try mcast

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 15.24      0.39     0.39       94     4.15    22.16  flushupdates
 13.28      0.73     0.34  2714259     0.00     0.00  find_xroute_slot
  8.20      0.94     0.21  5370044     0.00     0.00  find_route_slot
  7.81      1.14     0.20 44478662     0.00     0.00  check_xroutes
  7.62      1.34     0.20 13823548     0.00     0.00  roughly
  7.42      1.53     0.19  2634733     0.00     0.00  really_buffer_update
  7.42      1.72     0.19 10676052     0.00     0.00  send_multihop_request
  5.08      1.85     0.13 10883489     0.00     0.00  network_prefix
  4.49      1.96     0.12 55686887     0.00     0.00  find_xroute
  2.73      2.03     0.07      479     0.15     0.29  netlink_read
  2.73      2.10     0.07      437     0.16     0.20  parse_packet
  2.15      2.16     0.06 13353980     0.00     0.00  timeval_minus_msec
  2.15      2.21     0.06  3085315     0.00     0.00  filter_route
  1.95      2.26     0.05  2714259     0.00     0.00  add_xroute
  1.56      2.30     0.04 10780559     0.00     0.00  flushbuf
  1.56      2.34     0.04        1    40.00    40.00  getword
  1.17      2.37     0.03        3    10.00    10.00  do_filter
  0.98      2.40     0.03 10780555     0.00     0.00  jitter
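
To illustrate the inlining point in (2): a minimal sketch, not the
babeld code. libc qsort() reaches its comparator through a function
pointer, so the compiler generally cannot inline the compare into the
sort loop. A type-specific sort that calls a static inline comparator
directly gives it that chance. Everything named here (struct entry,
entry_compare, entry_sort) is hypothetical, and the heapsort is just a
short stand-in for whatever sort actually gets inlined.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a route entry; not babeld's real struct. */
struct entry {
    unsigned char prefix[16];
    unsigned char plen;
};

/* Static inline comparator: visible to the compiler at every call site. */
static inline int
entry_compare(const struct entry *a, const struct entry *b)
{
    int rc = memcmp(a->prefix, b->prefix, 16);
    if(rc != 0)
        return rc;
    return (int)a->plen - (int)b->plen;
}

/* Adapter for libc qsort(), which only ever sees a function pointer. */
static int
entry_compare_generic(const void *a, const void *b)
{
    return entry_compare(a, b);
}

static inline void
entry_swap(struct entry *a, struct entry *b)
{
    struct entry tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Restore the max-heap property for the subtree rooted at start. */
static void
sift_down(struct entry *v, size_t start, size_t end)
{
    size_t root = start;
    while(2 * root + 1 < end) {
        size_t child = 2 * root + 1;
        if(child + 1 < end && entry_compare(&v[child], &v[child + 1]) < 0)
            child++;
        if(entry_compare(&v[root], &v[child]) >= 0)
            return;
        entry_swap(&v[root], &v[child]);
        root = child;
    }
}

/* Type-specific heapsort: every compare is a direct, inlinable call. */
static void
entry_sort(struct entry *v, size_t n)
{
    size_t i;
    assert(v != NULL || n == 0);
    if(n < 2)
        return;
    for(i = n / 2; i-- > 0;)
        sift_down(v, i, n);
    for(i = n - 1; i > 0; i--) {
        entry_swap(&v[0], &v[i]);
        sift_down(v, 0, i);
    }
}
```

entry_sort(v, n) and qsort(v, n, sizeof(v[0]), entry_compare_generic)
produce the same ordering; only the second pays for an indirect call
on every comparison.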

3) mcast instead, it appears to "hold it together" longer...

4) killed -pg (64k routes or BUST!) :) compiled with -O3

nope. But I'll try 6 boxes....

...

Just like the Sith, there are always two bugs, and in these cases I'm
looking at the uthash branch on my weak i3 NUC.

In terms of caps, evaluating the routes sent... I could really use
some automation to pull apart the caps in context with the test.

In particular, I would like to see when wildcard and other forms of
expired requests are getting sent. Got anything?
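
As a starting point for that kind of automation, here is a minimal
sketch that walks the TLVs of one Babel packet and counts the requests
in it. It assumes the UDP payload has already been extracted from the
capture (reading the pcap file itself is not shown), uses the packet
header and TLV layout from RFC 6126 (magic 42, version 2, Route Request
is type 9 with AE 0 meaning wildcard, Seqno Request is type 10), and
does not attempt to decide whether a request is "expired". The names
count_requests and struct request_counts are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define BABEL_MAGIC        42
#define BABEL_VERSION      2
#define TLV_PAD1           0
#define TLV_ROUTE_REQUEST  9
#define TLV_SEQNO_REQUEST  10

/* Hypothetical result record for one packet. */
struct request_counts {
    unsigned wildcard;   /* Route Requests with AE == 0 */
    unsigned route;      /* Route Requests for a specific prefix */
    unsigned seqno;      /* Seqno Requests */
};

/* Count request TLVs in one Babel packet (a UDP payload).
   Returns 0 on success, -1 if the packet is malformed or truncated. */
static int
count_requests(const unsigned char *pkt, size_t len,
               struct request_counts *out)
{
    size_t end, i = 4;

    assert(out != NULL);
    out->wildcard = out->route = out->seqno = 0;

    if(len < 4 || pkt[0] != BABEL_MAGIC || pkt[1] != BABEL_VERSION)
        return -1;
    end = 4 + (((size_t)pkt[2] << 8) | pkt[3]);   /* header + body length */
    if(end > len)
        return -1;

    while(i < end) {
        unsigned type = pkt[i];
        size_t tlvlen;

        if(type == TLV_PAD1) {        /* Pad1 is a single byte, no length */
            i++;
            continue;
        }
        if(i + 2 > end)
            return -1;
        tlvlen = pkt[i + 1];
        if(i + 2 + tlvlen > end)
            return -1;

        if(type == TLV_ROUTE_REQUEST) {
            if(tlvlen < 2)            /* needs at least AE and Plen */
                return -1;
            if(pkt[i + 2] == 0)
                out->wildcard++;
            else
                out->route++;
        } else if(type == TLV_SEQNO_REQUEST) {
            out->seqno++;
        }
        i += 2 + tlvlen;
    }
    return 0;
}
```

Running this over every Babel packet in a cap and printing the counts
next to the packet timestamps would at least show when the wildcard
requests go out relative to each test.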

>
> -- Juliusz



-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740


