[Babel-users] ECMP on endpoints [was: babeld slashes...]

Sat Apr 16 19:28:14 BST 2022

Daniel Gröber <dxld at darkboxed.org> writes:

> Hi Toke,
>
> On Fri, Apr 15, 2022 at 10:03:11PM +0200, Toke Høiland-Jørgensen wrote:
>> I basically do this for my laptop, sans the MAC authentication (but I
>> really ought to get that rolled out as well). Works pretty seamlessly:
>> When I plug my laptop into the dock traffic shifts to the wired
>> interface, and when it's anywhere else it goes over wireguard. I don't
>> bother with Babel on the WiFi network, the wg tunnels go to the same
>> router in the building anyway, so there's no noticeable difference...
>
> In my case the tunnels terminate in hosted VMs doing the BGP announcements
> so it's better not to route via those. I also like the slightly better
> performance due to the larger MTU and who knows maybe I'll deploy jumbo
> frames in my physical network some time.
>
> But the main reason I want to go via the physical network is so I can tell
> when it's broken but my device's tunnels still work. Dogfooding 'ya know :)

Right, makes sense.

>> > All I have to do is run one wg tunnel per edge router to my clients
>> > (which I already do) and then have babel install a default
>> > route/nexthop for each tunnel (the bit I'm working on). Together with
>> > RTT metrics and CECMP this could even kick out edge routers where the
>> > underlay network path is performing too poorly fully automatically :)
>> 
>> How do you define "too poorly"? I guess that's the crux of the issue:
>> you could just install all feasible routes as ECMP paths, but that would
>> potentially give you wildly varying performance for each flow, which I
>> would imagine would be a pretty lousy user experience. So what would you
>> do instead?
>
> Well I already have varying performance but right now it's sticky so
> per-flow varying is strictly an improvement :)

Are you sure? If it's everything that goes down the drain it's at least
obvious and straight-forward to debug...

> One of the problems I'm trying to solve (using the RTT metric bit)
> is the underlay network path having high latency. One of my hosters
> had pesky problems with their path getting congested during prime time
> and the carrier AS responsible for the link (DTAG) refusing to do
> anything about it. However cost wise this is the cheapest host per
> unit of traffic so it's still useful to have around most of the time.

Given that this is IPv6-only am I right in guessing that this is all an
elaborate scheme to get around the fact that your ISP won't supply you
with v6 connectivity? :)

> In that case this router would just not get selected as nexthop, or kicked
> out of the "close enough cost" nexthop set when it's having
> congestion/latency issues.

Yeah, the babel RTT metric should help weed out paths with lots of
bufferbloat. For ECMP I guess you'd need some kind of threshold of when
alternatives are good enough to be added as alternate paths? Getting
that right sounds kinda fiddly; and will likely be specific to a
particular (set of) paths, so really hard to define good values for at
the protocol level...
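
To make the threshold idea concrete, here is a rough sketch in Python. It is
not how babeld selects routes; the function name, the `tolerance` parameter
and the interface names are all made up for illustration. It keeps every next
hop whose metric is within a fixed factor of the best one:

```python
def select_nexthops(metrics, tolerance=1.25):
    """Return the next hops whose metric is "close enough" to the best one.

    metrics:   dict mapping a next-hop name to its metric (lower is better).
    tolerance: hypothetical knob; 1.25 means "at most 25% worse than best".
    """
    if not metrics:
        return []
    best = min(metrics.values())
    # Keep every next hop within the tolerance band; sort for stable output.
    return sorted(nh for nh, m in metrics.items() if m <= best * tolerance)


# Made-up metrics for three tunnels; wg-c is far worse than the other two.
normal = {"wg-a": 100, "wg-b": 110, "wg-c": 400}
# Same tunnels while wg-a's underlay path is congested.
congested = {"wg-a": 500, "wg-b": 110, "wg-c": 400}

print(select_nexthops(normal))
print(select_nexthops(congested))
```

The fiddly part is exactly the `tolerance` value: a factor that works for one
set of paths can be far too loose or too strict for another.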

BTW, have you looked at source-specific routing at all? With this, you'd
assign each endpoint multiple addresses, each tied to a prefix specific
to a particular upstream; and then install source-specific default
routes, so the packets go to the right upstream. With this you can
handle multi-path (or route selection in general) entirely at the
application layer simply by picking different source addresses...
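
As a sketch of what "picking different source addresses" could look like from
an application, assuming source-specific default routes are already installed:
the upstream names and prefixes below are invented (taken from the
2001:db8::/32 documentation range), and the helper names are hypothetical. The
only real mechanism used is binding the socket to a chosen source address
before connecting:

```python
import ipaddress
import socket

# Hypothetical mapping of upstream name -> prefix delegated by that upstream.
UPSTREAM_PREFIXES = {
    "hoster-a": ipaddress.ip_network("2001:db8:a::/48"),
    "hoster-b": ipaddress.ip_network("2001:db8:b::/48"),
}


def source_for(upstream, local_addrs):
    """Return the first local address that lies inside the upstream's prefix."""
    prefix = UPSTREAM_PREFIXES[upstream]
    for addr in local_addrs:
        if ipaddress.ip_address(addr) in prefix:
            return addr
    raise LookupError(f"no local address in {prefix} for {upstream}")


def connect_via(upstream, local_addrs, host, port):
    """Open a TCP connection whose source address selects the upstream."""
    src = source_for(upstream, local_addrs)
    sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    try:
        # Binding to the source address (port 0 = any) is what lets the
        # source-specific route steer the packets to the matching upstream.
        sock.bind((src, 0))
        sock.connect((host, port))
    except OSError:
        sock.close()
        raise
    return sock
```

For example, `source_for("hoster-b", ["2001:db8:a::10", "2001:db8:b::10"])`
picks the second address, and a connection made with `connect_via` from that
address would follow the source-specific default route for that prefix.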

> The other problem is shaky connections from the edge routers to the rest of
> the v6 internet but that's a separate and much more difficult problem that
> babel can't really help with much.
>
> I'm looking into how to solve it in my case, though. Probably some kind of
> flow monitoring system that triggers automatic ping/traceroute via all
> upstreams, reroutes locally based on that info and does AS prepending too
> for the ingress side.
>
> Haven't quite figured that one out yet :)

Couldn't the edge routers themselves monitor this and simply retract/modify
their announcements if they notice they have connectivity issues?

-Toke