[Babel-users] Babel with VPP

Daniel Gröber dxld at darkboxed.org
Mon Mar 11 11:51:33 GMT 2024


Hi Pim,

On Sun, Mar 10, 2024 at 10:40:00PM +0100, Pim van Pelt wrote:
> AS8298 has a ring from Zurich, Frankfurt, Amsterdam, Lille, Paris and
> Geneva. Our links are carrier ethernet from a telco. Sometimes, when the
> underlying links fail, the underlay MPLS network will recover and route
> around the failed telco link, and I'll see latency on my own service go from
> say 6ms ZRH-FRA to 40ms ZRH-FRA; I thought this would be an excellent use
> case for Babel.
> 1) Is there any advice you could offer for rtt cost/min/max/decay values
> when using Bird2 ?

The defaults should be fine, honestly. While I have recommended changing
them in the past, I fear I don't (yet) fully understand the impact of that
change on stability, so it's best to leave them as is for now.
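
For a rough feel of what those knobs do, here is a minimal sketch of the
usual RTT penalty shape: nothing is added below "rtt min", the full "rtt
cost" is added above "rtt max", and the penalty grows linearly in between.
The numbers (10 ms, 120 ms, 96) are assumptions for illustration, not
values checked against any particular BIRD release, the function name is
made up, and the smoothing controlled by "rtt decay" is left out entirely.

```python
def rtt_penalty(rtt_ms, rtt_min_ms=10.0, rtt_max_ms=120.0, max_penalty=96):
    """Extra link cost for a (smoothed) RTT sample, in Babel metric units.

    All default values here are illustrative assumptions; check the
    documentation of your implementation for the real defaults.
    """
    if rtt_ms <= rtt_min_ms:
        # Fast enough: no penalty at all.
        return 0
    if rtt_ms >= rtt_max_ms:
        # Slow enough: the penalty saturates at its maximum.
        return max_penalty
    # In between: linear interpolation, truncated to an integer.
    span = rtt_max_ms - rtt_min_ms
    return int(max_penalty * (rtt_ms - rtt_min_ms) / span)


if __name__ == "__main__":
    # 6 ms and 40 ms are the ZRH-FRA figures from the quoted mail.
    for rtt in (6, 40, 200):
        print(f"rtt={rtt} ms -> penalty {rtt_penalty(rtt)}")
```

Under these assumed values the 6 ms path gets no penalty and the 40 ms
detour gets 26, so whether the detour loses to another path depends on how
that compares with the base cost of the alternatives.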

> 2) any words of wisdom before I move from an OSPF v2/v3 IGP to Babel in a
> running network ? Does anybody on this list have operational experience to
> share ?

I run babel as the IGP in my experimental network (AS212704), and I just
want to make sure you're aware of some of the quirks and challenges of
using babel outside its original design domain (wireless/mesh networks).
I've used both BIRD and babeld in my network. Right now I'm (grudgingly)
running babeld.

1) BIRD's proto/babel doesn't have good policy controls right now. If you
need any sort of control over your IGP announcements for TE or what have
you, things might get tricky. I do have a patch ready to begin fixing that,
but unfortunately it's in limbo until BIRD v3 shakes out or we find some
funding/motivation to push a port to v3 forward.

Babeld does have (most of) the knobs I think you'd need, but it's just not
suitable for 24/7 operation outside of toy networks without major rework
(sorry Juliusz!).

For illustration: until one of my patches, restarting babeld on a node
would cause packet loss, since it retracted the kernel routes without
waiting for the rest of the network to be notified that the node was going
away: https://github.com/jech/babeld/pull/102. That nobody noticed this
before me tells you what kind of environment babeld has been
deployed/tested in so far.

This wouldn't be a problem if we had hot reload, of course, but babeld is
just not built for that. Without hot reload there's also no good way to put
a node into maintenance mode without disrupting internal traffic. On top of
that, the filters are not expressive enough to make policy writing easy,
and there are quirks I haven't yet had the energy to write up or fix.

2) When a prefix is no longer reachable, babel will insert an unreachable
route for it until some timeout expires. I don't recall the details
offhand, but I'm sure Juliusz will jump in here ;)

I'm not sure this is really a problem as such; it was just jarring when I
saw this unexpected behaviour in my network, and it's something to be aware
of.

My conclusion: if it ain't a BIRD, I ain't flyin'.

While BIRD's babel implementation is just dandy from a quality perspective
AFAICT, it needs a bit more effort put in to get it ready for serious
network operations.

> By means of introduction, I'm Pim and in my spare time I work on fd.io
> <http://fd.io>'s Vector Packet Processor.

I've had my eyes on your VPP work for a while now. Very exciting especially
with babel support on the horizon :)

--Daniel


