[Babel-users] Anybody else seeing disruption when restarting babeld?

Daniel Gröber dxld at darkboxed.org
Sat Feb 25 00:01:15 GMT 2023


Hi Juliusz,

On Sat, Feb 25, 2023 at 12:10:26AM +0100, Juliusz Chroboczek wrote:
> > I can't say I agree with the "their problem" mentality. The way I see it
> > during graceful shutdown we're still responsible for in-flight traffic
> > anyway.
> 
> What I mean is that after our neighbours receive our retraction, they'll no
> longer be sending traffic to us, whether they have a feasible route or not.
> 
> If you're really keen on avoiding disruptions, you should first increase
> the metric to something very large (say, 2^15), then wait a couple of
> seconds, then send a retraction, then wait 200ms.
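The sequence suggested here can be sketched roughly as follows. This is a minimal Python illustration, not babeld code: `send_update` and `sleep` are hypothetical hooks standing in for however the daemon announces a prefix and waits, and the 2-second drain time is only one reading of "a couple of seconds".

```python
import time
from typing import Callable, Iterable

RETRACTION_METRIC = 0xFFFF  # infinite metric: an update carrying it is a retraction
DRAIN_METRIC = 1 << 15      # "very large" but finite metric (2^15, as suggested)


def graceful_shutdown(
    prefixes: Iterable[str],
    send_update: Callable[[str, int], None],  # hypothetical hook: announce prefix with metric
    sleep: Callable[[float], None] = time.sleep,
    drain_wait: float = 2.0,    # "a couple of seconds" (assumed value)
    retract_wait: float = 0.2,  # the 200 ms wait after the retraction
) -> None:
    """Raise the metric, wait, retract, wait: the order suggested above."""
    prefixes = list(prefixes)
    # 1. Announce a huge but finite metric first (presumably so neighbours
    #    can move to other routes before anything is withdrawn).
    for prefix in prefixes:
        send_update(prefix, DRAIN_METRIC)
    sleep(drain_wait)
    # 2. Withdraw the routes entirely.
    for prefix in prefixes:
        send_update(prefix, RETRACTION_METRIC)
    # 3. Stay up briefly so packets already heading our way are still forwarded.
    sleep(retract_wait)


if __name__ == "__main__":
    # Record the calls instead of sending packets or really sleeping.
    events = []
    graceful_shutdown(
        ["2001:db8::/64"],
        send_update=lambda p, m: events.append((p, m)),
        sleep=lambda s: events.append(("sleep", s)),
    )
    print(events)
    # [('2001:db8::/64', 32768), ('sleep', 2.0), ('2001:db8::/64', 65535), ('sleep', 0.2)]
```

Passing `sleep` in as a parameter only serves to let the sequence be exercised without real delays.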

Could you go into a bit more detail as to why that would be better? I think
I get the gist: we want to avoid other nodes installing an unreachable
route in response to our retraction while they do the seqno request dance,
right? I just don't see how the high (but finite) metric would prevent
this.

AFAIK the RFC only requires that nodes start sending seqno requests once
the last feasible route is already gone, which, now that I think of it, is
pretty bad from my "no disruptions" viewpoint.

This also makes me wonder, looking at
https://www.rfc-editor.org/rfc/rfc8966.html#section-3.8.2.1: would it be
permissible for a node to always send seqno requests whenever any of its
routes is unfeasible, in order to keep as many feasible routes as possible?
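To make the two policies concrete, here is a simplified Python model; it is not babeld's code, and `Route`, `seqno_lt`, `is_feasible` and `sources_to_request` are illustrative names. It follows the feasibility condition of RFC 8966 (Section 3.5.1): an update is feasible if it is a retraction or if its (seqno, advertised metric) pair is strictly better than the feasibility distance (FD) recorded for its source, with seqnos compared modulo 2^16. The default branch models the trigger as summarised above (request only once no feasible route is left); `aggressive=True` models the always-request variant being asked about. As modelling assumptions, a missing FD counts as feasible, and FDs are keyed by router-id alone because the model covers a single prefix.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

INFINITY = 0xFFFF  # metric carried by a retraction


def seqno_lt(a: int, b: int) -> bool:
    """a < b for 16-bit sequence numbers, compared modulo 2^16."""
    return 0 < ((b - a) & 0xFFFF) < 0x8000


@dataclass
class Route:
    router_id: str          # originator of the route
    seqno: int              # seqno carried by the last update
    advertised_metric: int  # metric as advertised by the neighbour


def is_feasible(route: Route, fd: Optional[Tuple[int, int]]) -> bool:
    """Retraction, no FD recorded (assumption), or strictly better than the FD."""
    if route.advertised_metric == INFINITY:
        return True
    if fd is None:
        return True
    fd_seqno, fd_metric = fd
    # (s, m) beats (s', m') if s > s', or s == s' and m < m'.
    return seqno_lt(fd_seqno, route.seqno) or (
        route.seqno == fd_seqno and route.advertised_metric < fd_metric
    )


def sources_to_request(
    routes: List[Route],
    fds: Dict[str, Tuple[int, int]],  # FD per router-id, for one prefix only
    aggressive: bool = False,
) -> List[str]:
    """Router-ids whose unfeasible routes would prompt a seqno request."""
    live = [r for r in routes if r.advertised_metric != INFINITY]
    unfeasible = [r for r in live if not is_feasible(r, fds.get(r.router_id))]
    if not aggressive and len(unfeasible) < len(live):
        # Some feasible route is still available: the default trigger stays quiet.
        return []
    return sorted({r.router_id for r in unfeasible})


if __name__ == "__main__":
    fds = {"A": (10, 128), "B": (5, 100)}
    routes = [Route("A", 10, 96), Route("B", 5, 200)]
    print(sources_to_request(routes, fds))                   # []
    print(sources_to_request(routes, fds, aggressive=True))  # ['B']
    print(sources_to_request(routes[1:], fds))               # ['B']
```

With these numbers the route via A is still feasible, so the default policy sends nothing, while the aggressive one would already ask about B. It only becomes a request under the default policy once A's route is gone.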

> But I think that's too much hassle, I like your current approach better.

I hadn't considered this problem. So my current fix really only takes care
of in-flight packets, not of the blackhole that can be created as soon as
we send the retraction to a neighbour that has no other feasible routes.

I would prefer to invest more time in fixing that too, if it's even
possible without a protocol extension. Is it?

> > In my mind it doesn't matter if babeld takes 500ms or 15sec to shut down if
> > that buys me a rock-solid network.
> 
> I think the default should be 300ms or so.

Works for me.

Thanks,
--Daniel


