[Babel-users] PARTIALLY SOLVED: default route weirdness

Dave Taht dave.taht at gmail.com
Wed Oct 3 20:53:11 BST 2018


On Wed, Oct 3, 2018 at 12:20 PM Dave Taht <dave.taht at gmail.com> wrote:
>
> OK. Normally throughout the rest of my original network of 3+ years
> back, I routed everything.
>
> The new install is ethernet bridged to the wifi. Without specifying
> the wired parameter for the interface, *eventually* - not on the
> initial route exchange - the txcost goes up to 256, which changes the
> ref_metric.
>
> Because everything elsewhere on the LAN is routed, this sucks the default
> route over to my old (routed) 1.7.1 instance.
>
> In /etc/config/babeld, setting
>
> interface br-lan
> option wired true
>
> fixes it for the new bridged boxes.
>
> add neighbour 2390e30 address fe80::f6f2:6dff:feb6:a01c if eno1 reach
> ffff ureach 0000 rxcost 96 txcost 96 rtt 2.419 rttcost 0 cost 96
>
> add route 23922f0 prefix 0.0.0.0/0 from 0.0.0.0/0 installed yes id
> f6:f2:6d:ff:fe:b6:a0:1d metric 96 refmetric 0 via
> fe80::f6f2:6dff:feb6:a01c if eno1
> add route 23930d0 prefix 0.0.0.0/0 from 0.0.0.0/0 installed no id
> f6:f2:6d:ff:fe:b6:a0:1d metric 192 refmetric 96 via
> fe80::46d9:e7ff:fe93:822e if eno1
> add route 2393140 prefix 0.0.0.0/0 from 0.0.0.0/0 installed no id
> f6:f2:6d:ff:fe:b6:a0:1d metric 192 refmetric 96 via
> fe80::230:18ff:fec9:de9c if eno1
>
> The sad thing is that I'd encountered this problem before, and normal users who
> will probably default bridge their default gw ethernet and wifi
> together won't see it either. Just me, with my ancient 6+ year old
> network.
>
> Now, I'm still puzzled as to why the ref_metric got way bigger than
> 256 on 1.8.3. I'm also a bit puzzled by the various treatments of
> refmetric in the code (a short > INFINITY, and so on). What I'm
> currently running is 1.7.1, bird, and a version of 1.8.3 with
> refmetric universally promoted to an unsigned int. And it's lunchtime.
> 2 days later.
>
> And I don't know why the 1.7.1 couch gateway is falling off entirely.
> Here it is present:
>
> d@dancer:~/git/babeld-refmetric$ echo dump | nc ::1 33123 | grep 'prefix 0.0.0.0'
> add route 23922f0 prefix 0.0.0.0/0 from 0.0.0.0/0 installed yes id
> f6:f2:6d:ff:fe:b6:a0:1d metric 96 refmetric 0 via
> fe80::f6f2:6dff:feb6:a01c if eno1
> add route 23930d0 prefix 0.0.0.0/0 from 0.0.0.0/0 installed no id
> f6:f2:6d:ff:fe:b6:a0:1d metric 192 refmetric 96 via
> fe80::46d9:e7ff:fe93:822e if eno1
> add route 2393140 prefix 0.0.0.0/0 from 0.0.0.0/0 installed no id
> f6:f2:6d:ff:fe:b6:a0:1d metric 192 refmetric 96 via
> fe80::230:18ff:fec9:de9c if eno1
> add route 239d960 prefix 0.0.0.0/0 from 0.0.0.0/0 installed no id
> a2:21:b7:ff:fe:ac:e4:55 metric 416 refmetric 320 via
> fe80::e091:f5ff:febe:a353 if eno1
> d@dancer:~/git/babeld-refmetric$
>
> And here it is not:
>
> d@dancer:~/git/babeld-refmetric$ echo dump | nc ::1 33123 | grep 'prefix 0.0.0.0'
> add route 23922f0 prefix 0.0.0.0/0 from 0.0.0.0/0 installed yes id
> f6:f2:6d:ff:fe:b6:a0:1d metric 96 refmetric 0 via
> fe80::f6f2:6dff:feb6:a01c if eno1
> add route 23930d0 prefix 0.0.0.0/0 from 0.0.0.0/0 installed no id
> f6:f2:6d:ff:fe:b6:a0:1d metric 192 refmetric 96 via
> fe80::46d9:e7ff:fe93:822e if eno1
> add route 2393140 prefix 0.0.0.0/0 from 0.0.0.0/0 installed no id
> f6:f2:6d:ff:fe:b6:a0:1d metric 192 refmetric 96 via
> fe80::230:18ff:fec9:de9c if eno1
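
In every route line quoted above, metric minus refmetric comes out to 96,
which matches the "cost 96" on the neighbour line. A minimal sketch that
checks this, assuming the route lines are re-joined onto single lines (the
mail client wrapped them). The regex and the parse_routes helper are my own
illustration, not part of babeld:

```python
import re

# Route lines copied from the dump quoted above, re-joined onto single lines.
DUMP = """\
add route 23922f0 prefix 0.0.0.0/0 from 0.0.0.0/0 installed yes id f6:f2:6d:ff:fe:b6:a0:1d metric 96 refmetric 0 via fe80::f6f2:6dff:feb6:a01c if eno1
add route 23930d0 prefix 0.0.0.0/0 from 0.0.0.0/0 installed no id f6:f2:6d:ff:fe:b6:a0:1d metric 192 refmetric 96 via fe80::46d9:e7ff:fe93:822e if eno1
add route 2393140 prefix 0.0.0.0/0 from 0.0.0.0/0 installed no id f6:f2:6d:ff:fe:b6:a0:1d metric 192 refmetric 96 via fe80::230:18ff:fec9:de9c if eno1
add route 239d960 prefix 0.0.0.0/0 from 0.0.0.0/0 installed no id a2:21:b7:ff:fe:ac:e4:55 metric 416 refmetric 320 via fe80::e091:f5ff:febe:a353 if eno1
"""

# Hypothetical pattern for the "add route" lines as they appear in this mail.
ROUTE_RE = re.compile(
    r"add route (?P<handle>\S+) prefix (?P<prefix>\S+) from (?P<src>\S+) "
    r"installed (?P<installed>yes|no) id (?P<id>\S+) "
    r"metric (?P<metric>\d+) refmetric (?P<refmetric>\d+) "
    r"via (?P<via>\S+) if (?P<ifname>\S+)"
)


def parse_routes(text):
    """Turn 'add route' lines into dicts, adding link_cost = metric - refmetric."""
    routes = []
    for line in text.splitlines():
        m = ROUTE_RE.match(line.strip())
        if m is None:
            continue  # not a route line
        r = m.groupdict()
        r["metric"] = int(r["metric"])
        r["refmetric"] = int(r["refmetric"])
        r["installed"] = r["installed"] == "yes"
        r["link_cost"] = r["metric"] - r["refmetric"]
        routes.append(r)
    return routes


routes = parse_routes(DUMP)
for r in sorted(routes, key=lambda r: r["metric"]):
    mark = "*" if r["installed"] else " "
    print(f"{mark} via {r['via']:<26} metric {r['metric']:>4} "
          f"= refmetric {r['refmetric']:>4} + cost {r['link_cost']}")
```

The route marked with * is the installed one. The same helper could be fed
the live output of "echo dump | nc ::1 33123" to watch how the difference
moves when the wired option is missing.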

OK, I think I figured this one out. I think Juliusz put in an
optimization to only consider 3 routes (somewhere in the code?).
As I had 3+ default routes available from 3+ babel speakers, the
routers re-announcing the local default route did so with a better
metric than the backup default route, so the backup would have been
crowded out.

This meant that when I took the labgw down, it retracted its routes, then
the other stable speakers heard that and retracted their routes,
and then they had to wait for a reannouncement of the default route
from the couch gw in order to find another default route to use,
so I'd have more than a few seconds of interrupted connectivity.
(Similarly, with the metric inflated by not having txcost right,
I had the opposite problem when I took the couch box down.)
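
To make the theory concrete, here is a toy model of it. This is only a
sketch of the hypothesis, not babeld's code: the cap of 3, the Route class
and the retain/after_retraction helpers are all made up for illustration,
and I have not confirmed that babeld caps anything. The metrics are the
ones from the dump above; the speaker names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    source: str     # router originating the default route (placeholder name)
    neighbour: str  # neighbour that announced it to us (placeholder name)
    metric: int


def retain(routes, limit=None):
    """Keep the `limit` best routes by metric; limit=None keeps them all."""
    ranked = sorted(routes, key=lambda r: r.metric)
    return ranked if limit is None else ranked[:limit]


def after_retraction(table, source):
    """Routes left in the table once `source` retracts its default route."""
    return [r for r in table if r.source != source]


# Default routes heard on this box, metrics taken from the dump above.
heard = [
    Route("labgw", "labgw", 96),
    Route("labgw", "speaker-a", 192),
    Route("labgw", "speaker-b", 192),
    Route("couch", "couch", 416),
]

for limit in (3, None):
    table = retain(heard, limit)
    left = after_retraction(table, "labgw")
    best = min(left, key=lambda r: r.metric, default=None)
    label = "all" if limit is None else limit
    print(f"keep {label}: fallback after labgw retracts -> {best}")
```

With a cap of 3 the couch route is never stored, so nothing is left to fall
back on until it is heard again. With everything retained, the couch route
is there the moment the labgw routes go away.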

Assuming this theory is correct... ??

In the case of defaultish routes I think retaining all of them is
probably a good idea.

Still... other routes tend to be pretty specific, so
multiple speakers would encounter this problem less. I don't mind
chewing up more compute to keep more routes in memory
either, I guess, in the general case.

Or perhaps there's some sort of fuzzy characteristic like "Hey, I got
a lot of routes with good metrics from this speaker", and
"a couple lousy ones, let's still keep those around because they
probably point somewhere usefully different".

>
> On Wed, Oct 3, 2018 at 7:06 AM Dave Taht <dave.taht at gmail.com> wrote:
> >
> > So I reverted those two boxes (couch and labgw) to 1.7.1 (they are arm
> > and mips respectively). No matter in which order I restart the
> > daemons, the refmetric is set properly to 0, and the metric falls to
> > 256 after a couple hellos, making the lab always be the correct
> > default gw.
> >
> > So I am assuming there is a change (bug?) in a calculation on the
> > refmetric since 1.7.1. The only puzzling thing
> > about this set of events is that with the *same exact babel configuration
> > for 1.8.3 and 1.7.1* I get a metric 96 refmetric 0
> > if I only have one major speaker (the labgw), and the weird refmetric
> > inflation I noted in the prior email,
> >
> > ... and a consistent metric 256 refmetric 0... with two 1.7.1 speakers
> > (watching with an 1.8.3 speaker, going to
> > kill that next)
> >
> >  echo dump | nc ::1 33123 | grep 'prefix 0.0.0.0'
> > add route 93d990 prefix 0.0.0.0/0 from 0.0.0.0/0 installed yes id
> > f6:f2:6d:ff:fe:b6:a0:1d metric 256 refmetric 0 via
> > fe80::f6f2:6dff:feb6:a01c if eno1
> > add route 937f60 prefix 0.0.0.0/0 from 0.0.0.0/0 installed no id
> > f6:f2:6d:ff:fe:b6:a0:1d metric 352 refmetric 256 via
> > fe80::230:18ff:fec9:de9c if eno1
> > add route 938e60 prefix 0.0.0.0/0 from 0.0.0.0/0 installed no id
> > f6:f2:6d:ff:fe:b6:a0:1d metric 352 refmetric 256 via
> > fe80::46d9:e7ff:fe93:822e if eno1
> > add route 941a70 prefix 0.0.0.0/0 from 0.0.0.0/0 installed no id
> > a2:21:b7:ff:fe:ac:e4:55 metric 320 refmetric 224 via
> > fe80::e091:f5ff:febe
> >
> > I kill the couch babel:
> >
> > d@dancer:~/git$ echo dump | nc ::1 33123 | grep 'prefix 0.0.0.0'
> > add route 93d1f0 prefix 0.0.0.0/0 from 0.0.0.0/0 installed yes id
> > f6:f2:6d:ff:fe:b6:a0:1d metric 256 refmetric 0 via
> > fe80::f6f2:6dff:feb6:a01c if eno1
> > add route 937f60 prefix 0.0.0.0/0 from 0.0.0.0/0 installed no id
> > f6:f2:6d:ff:fe:b6:a0:1d metric 352 refmetric 256 via
> > fe80::230:18ff:fec9:de9c if eno1
> > add route 938e60 prefix 0.0.0.0/0 from 0.0.0.0/0 installed no id
> > f6:f2:6d:ff:fe:b6:a0:1d metric 352 refmetric 256 via
> > fe80::46d9:e7ff:fe93:822e if eno1
> >
> > Restart the labgw - this result totally makes sense to me, for a couple hellos,
> >
> > d@dancer:~/git$ echo dump | nc ::1 33123 | grep 'prefix 0.0.0.0'
> > add route 93d1f0 prefix 0.0.0.0/0 from 0.0.0.0/0 installed yes id
> > f6:f2:6d:ff:fe:b6:a0:1d metric 511 refmetric 0 via
> > fe80::f6f2:6dff:feb6:a01c if eno1
> > add route 937f60 prefix 0.0.0.0/0 from 0.0.0.0/0 installed no id
> > f6:f2:6d:ff:fe:b6:a0:1d metric 703 refmetric 607 via
> > fe80::230:18ff:fec9:de9c if eno1
> > add route 938e60 prefix 0.0.0.0/0 from 0.0.0.0/0 installed no id
> > f6:f2:6d:ff:fe:b6:a0:1d metric 607 refmetric 511 via
> > fe80::46d9:e7ff:fe93:822e if eno1
> >
> > Which then evolves back to this:
> >
> > d@dancer:~/git$ echo dump | nc ::1 33123 | grep 'prefix 0.0.0.0'
> > add route 93d1f0 prefix 0.0.0.0/0 from 0.0.0.0/0 installed yes id
> > f6:f2:6d:ff:fe:b6:a0:1d metric 256 refmetric 0 via
> > fe80::f6f2:6dff:feb6:a01c if eno1
> > add route 937f60 prefix 0.0.0.0/0 from 0.0.0.0/0 installed no id
> > f6:f2:6d:ff:fe:b6:a0:1d metric 369 refmetric 273 via
> > fe80::230:18ff:fec9:de9c if eno1
> > add route 938e60 prefix 0.0.0.0/0 from 0.0.0.0/0 installed no id
> > f6:f2:6d:ff:fe:b6:a0:1d metric 352 refmetric 256 via
> > fe80::46d9:e7ff:fe93:822e if eno1
>
>
>
> --
>
> Dave Täht
> CEO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-669-226-2619



-- 

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619


