<!DOCTYPE html>

<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    Hi colleagues,<br>

    <br>

    Juliusz, feel free to use my site as a back-reference for your

    research group. I will probably write one more article once the

    in-flight changes have been merged and I roll out Babel with VPP in

    production (fingers crossed!). I'll update the list once I do.<br>

    <br>

    Thank you very much for engaging with me! I'll collate a few answers

    in one reply. <br>

    <div class="moz-cite-prefix"><br>

      On 11.03.24 12:02, Juliusz Chroboczek wrote:<br>

    </div>

    <blockquote type="cite" cite="mid:87h6hdxc8l.wl-jch@irif.fr">

      <pre class="moz-quote-pre" wrap="">I find it interesting that you find Babel useful on a carrier network.

Since I've been working under the impression that IS-IS is absolutely

perfect (TM) in carrier networks, we have no experience whatsoever with

that case.  </pre>

    </blockquote>

    Perhaps I should clarify. My network AS8298 is built using

    point-to-point carrier Ethernet links provided by AS25091. They have

    physical links (DWDM, dark fiber, etc.) and use OSPF and LDP to

    signal Ethernet pseudowires for me. If an inter-city link fails,

    their OSPF reroutes around it, and my traffic goes via a different

    path.<br>

    <br>

    The problem is this: AS8298's IGP typically doesn't notice the

    reroute. We run BFD with a reasonably lax timeout of 3.0s, and the

    underlying network converges quickly. Also, a backbone link failure

    at AS25091 typically has no effect on my Ethernet link to their

    routers. In other words, my IGP stays up and fully converged, but one

    link all of a sudden goes from 5ms (Frankfurt-Zurich) to 35ms

    (Frankfurt-Amsterdam-Paris-Geneva-Zurich).<br>

    <br>

    I think Babel, with its RTT-based cost, will address this issue for

    me at AS8298 by moving traffic away from the now high-latency link.<br>

    <br>

    <div class="moz-cite-prefix">On 11.03.24 12:51, Daniel Gröber wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:20240311115133.47yvhbtik5jxwful@House.clients.dxld.at">

      <blockquote type="cite">

        <pre class="moz-quote-pre" wrap="">1) Is there any advice you could offer for rtt cost/min/max/decay values

when using Bird2?

</pre>

      </blockquote>

      <pre class="moz-quote-pre" wrap="">The defaults should be fine, honestly. While I have recommended changing

them in the past, I fear I don't (yet) fully understand the impact of that

change on stability, so it's best to leave them as is for now.</pre>

    </blockquote>
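    To illustrate what the <code>rtt decay</code> knob does, here is a minimal Python sketch of an exponentially weighted moving average of RTT samples. It assumes, as BIRD's documentation describes it, that the decay is the weight given to each new sample in units of 1/256 (default 42). The 5ms-to-35ms sample series is hypothetical and simply mirrors the Frankfurt-Zurich reroute described above.<br>

```python
# Minimal sketch: EWMA smoothing of RTT samples for Babel's RTT-based cost.
# Assumption: "rtt decay" is the weight of each new sample in units of 1/256
# (BIRD's default is 42); higher values forget old samples faster.


def smooth_rtt(samples_ms, decay=42):
    """Return the smoothed RTT (ms) after each sample in samples_ms."""
    if not 1 <= decay <= 256:
        raise ValueError("decay must be between 1 and 256")
    if not samples_ms:
        return []
    smoothed = samples_ms[0]  # seed the average with the first sample
    history = []
    for sample in samples_ms:
        smoothed = (decay * sample + (256 - decay) * smoothed) / 256
        history.append(smoothed)
    return history


# Hypothetical series: 5 ms Frankfurt-Zurich, then the carrier reroutes to 35 ms.
samples = [5.0] * 5 + [35.0] * 20
history = smooth_rtt(samples)
for i in (4, 5, 10, 24):
    print(f"after sample {i + 1:2d}: smoothed RTT {history[i]:5.1f} ms")
```

    With the default of 42, the first 35ms sample only moves the average to about 9.9ms, and it takes around 20 samples to get within 1ms of the new latency, so the cost changes gradually rather than on a single noisy probe.<br>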

    If I go this route, I think I will need to know the normally expected

    city-to-city latency (over my AS25091-provided point-to-point

    Ethernet VPWS) and the alternate latency (when AS25091 needs to

    reroute), and force a higher cost in the latter case. I realize that

    a goal ought to be minimizing changes to the costs and topology,

    because these routers are in the DFZ: on top of the IGP they carry a

    full table of roughly 950K IPv4 / 200K IPv6 prefixes. One cool thing

    is that VPP consumes a full table in about 7 seconds, including

    programming the FIB.<br>

    <br>

    <blockquote type="cite"

      cite="mid:20240311115133.47yvhbtik5jxwful@House.clients.dxld.at">

      <pre class="moz-quote-pre" wrap="">1) Bird's proto/babel doesn't have good policy controls right now. If you

need any sort of control over your IGP announcements for TE or what have

you, things might get tricky. I do have a patch ready to begin fixing that,

but unfortunately it's in limbo until BIRD v3 shakes out or we find some

funding/motivation to push a port to v3 forward.</pre>

    </blockquote>

    Understood. Luckily, I don't currently do traffic engineering with

    OSPF, and I'm OK leaving that out for now.

    <blockquote type="cite"

      cite="mid:20240311115133.47yvhbtik5jxwful@House.clients.dxld.at">

      <pre class="moz-quote-pre" wrap="">Babeld does have (most of) the knobs I think you'd need, but it's just not

suitable for 24/7 operation outside of toy networks without major rework

(sorry Juliusz!).</pre>

    </blockquote>

    I expect I will only be using Bird2 at AS8298. It is a production

    network, after all :)

    <blockquote type="cite"

      cite="mid:20240311115133.47yvhbtik5jxwful@House.clients.dxld.at">

      <pre class="moz-quote-pre" wrap="">2) When a prefix is no longer reachable, Babel will insert an unreachable

route for it until some timeout expires. I don't recall the details

offhand, but I'm sure Juliusz will jump in here ;)</pre>

    </blockquote>

    That's acceptable for VPP: it picks up unreachable (and blackhole)

    routes and programs them correctly in the FIB. Thank you for

    pointing it out, though!<br>

    <br>

    <div class="moz-cite-prefix">On 11.03.24 23:09, Juliusz Chroboczek

      wrote:<br>

    </div>

    <blockquote type="cite" cite="mid:87cys0xvya.wl-jch@irif.fr">

      <blockquote type="cite">

        <pre class="moz-quote-pre" wrap="">1) Is there any advice you could offer for rtt cost/min/max/decay values when

using Bird2?

</pre>

      </blockquote>

      <pre class="moz-quote-pre" wrap="">

1. RTT-MIN 

In the ideal case, your network consists of a number of interconnected

clusters.  For example, if you have routers in Berlin, Paris and Warsaw,

then each of the cities constitutes a cluster.  Within each cluster, it

doesn't make sense to choose routes based on RTT, since small RTT values

tend to be noisy and cause instability.</pre>

    </blockquote>

    Agreed. Within the metro, reroutes are "free of charge" latency-wise.

    <blockquote type="cite" cite="mid:87cys0xvya.wl-jch@irif.fr">

      <pre class="moz-quote-pre" wrap="">Rtt-min should be a value that is more than the intra-cluster latency but

less than the inter-cluster latency.  For example, if latency within each

cluster is on the order of 5ms, and the inter-cluster latency is 20ms,

then the default value of 10ms is fine.</pre>

    </blockquote>
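    A minimal Python sketch of the RTT penalty described above. It assumes the piecewise-linear shape used by the Babel RTT extension and babeld (no penalty at or below rtt-min, the full penalty at or above rtt-max, a linear ramp in between) and the defaults mentioned in this thread (10ms, 120ms and a maximum penalty of 96), and evaluates Juliusz's 5ms / 20ms example.<br>

```python
# Minimal sketch of Babel's RTT-based cost penalty (piecewise linear).
# Assumed defaults: rtt-min 10 ms, rtt-max 120 ms, max penalty ("rtt cost") 96.


def rtt_penalty(rtt_ms, rtt_min_ms=10.0, rtt_max_ms=120.0, max_penalty=96):
    """Extra cost added to a link for a given (smoothed) RTT in ms."""
    if rtt_max_ms <= rtt_min_ms:
        raise ValueError("rtt-max must be larger than rtt-min")
    if rtt_ms <= rtt_min_ms:
        return 0.0  # at or below rtt-min: RTT is ignored
    if rtt_ms >= rtt_max_ms:
        return float(max_penalty)  # at or above rtt-max: full penalty
    # In between: linear ramp from 0 up to max_penalty.
    return max_penalty * (rtt_ms - rtt_min_ms) / (rtt_max_ms - rtt_min_ms)


for label, rtt in [("intra-cluster", 5.0), ("inter-cluster", 20.0)]:
    print(f"{label}: {rtt:4.1f} ms -> penalty {rtt_penalty(rtt):4.1f}")
```

    With these defaults the 5ms intra-cluster hop gets no penalty, while a 20ms inter-cluster hop gets a small one (96 * 10 / 110, roughly 8.7).<br>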

    I think this is what I'm looking for. If the ZRH-FRA latency is

    normally 5ms, but a link failure drives it up to 30ms, I can set

    rtt-min well above 5ms (to account for jitter and variance) but

    below 30ms, so that the cost rises only when strictly necessary.

    <blockquote type="cite" cite="mid:87cys0xvya.wl-jch@irif.fr">

      <pre class="moz-quote-pre" wrap="">Large values of rtt-min improve stability in the presence of bufferbloat.

2. RTT-MAX

Symmetrically to rtt-min, rtt-max is the value above which links are

considered bad.  It should be slightly larger than the largest RTT in your

network.  Set it as small as possible in your network, since it has

a dramatic effect on stability in the presence of bufferbloat.

The default is 120ms, which is very conservative, but already has

a big effect on improving stability in bufferbloated networks.</pre>

    </blockquote>

    I think that for almost all pairs of router adjacencies, 120ms will

    rarely, if ever, be reached. To confirm, though: I am free to set

    different values of rtt-min and rtt-max per interface, right? I have

    a router in California with a normal latency of 150ms, so there

    rtt-min would be 170ms and rtt-max might be 250ms or even higher.
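    To see why per-interface values matter, a minimal Python sketch comparing the default profile (10ms / 120ms / 96) with hypothetical per-interface profiles. The ZRH-FRA values (rtt-min 15ms, rtt-max 30ms) are illustrative assumptions; the California values are the ones suggested above. This only models the penalty arithmetic (same piecewise-linear shape as babeld), not BIRD's configuration syntax.<br>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RttProfile:
    """Hypothetical per-interface RTT settings (all values in ms)."""
    rtt_min: float
    rtt_max: float
    max_penalty: int = 96

    def penalty(self, rtt: float) -> float:
        # Piecewise-linear ramp: 0 up to rtt_min, max_penalty at or above
        # rtt_max, linear in between.
        if rtt <= self.rtt_min:
            return 0.0
        if rtt >= self.rtt_max:
            return float(self.max_penalty)
        return self.max_penalty * (rtt - self.rtt_min) / (self.rtt_max - self.rtt_min)


DEFAULT = RttProfile(rtt_min=10, rtt_max=120)
# Hypothetical profiles: ZRH-FRA values are illustrative only; the
# California values follow the mail (rtt-min 170 ms, rtt-max 250 ms).
PROFILES = {
    "zrh-fra": RttProfile(rtt_min=15, rtt_max=30),
    "california": RttProfile(rtt_min=170, rtt_max=250),
}

for iface, rtt in [("zrh-fra", 5.0), ("zrh-fra", 35.0), ("california", 150.0)]:
    tuned = PROFILES[iface].penalty(rtt)
    default = DEFAULT.penalty(rtt)
    print(f"{iface:10s} {rtt:6.1f} ms  default={default:5.1f}  tuned={tuned:5.1f}")
```

    Under these assumptions, the 35ms detour earns only about 22 of the 96 penalty points with the defaults but the full 96 with the tighter ZRH-FRA profile, while the 150ms California link carries the full default penalty even on a good day and none with its own profile.<br>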

    <blockquote type="cite" cite="mid:87cys0xvya.wl-jch@irif.fr">

      <pre class="moz-quote-pre" wrap="">3. MAX-RTT-PENALTY (rtt cost in BIRD)

This is the maximum cost penalty that will be applied to high-RTT links.

The default (96) is rather conservative; it will cause one high-RTT link

to be equivalent to two low-RTT links. </pre>

    </blockquote>
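    The "two low-RTT links" equivalence follows from simple addition. A minimal sketch, assuming each wired link has a base cost of 96 (the default rxcost for wired interfaces in BIRD and babeld) and that the full RTT penalty of 96 is added to that base cost once the RTT reaches rtt-max.<br>

```python
# Minimal sketch: Babel metrics are additive along a path.
# Assumptions: base cost 96 per wired link; full RTT penalty 96 (the default).
BASE_COST = 96
MAX_RTT_PENALTY = 96


def path_cost(links):
    """Total cost of a path; each link is a (base_cost, rtt_penalty) pair."""
    return sum(base + penalty for base, penalty in links)


one_high_rtt_link = path_cost([(BASE_COST, MAX_RTT_PENALTY)])
two_low_rtt_links = path_cost([(BASE_COST, 0), (BASE_COST, 0)])
print(one_high_rtt_link, two_low_rtt_links)  # prints: 192 192
```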

    Perhaps you can confirm my understanding. Consider a ring of routers

    ZRH-FRA-AMS-LIL-PAR-GVA-(back to ZRH): traffic avoiding any given

    link here has to go around the other direction. So say ZRH-FRA fails

    (i.e. AS25091 reroutes it): the link that used to cost 96 would now

    have to cost more than 5x96 = 480 for the path ZRH-GVA-PAR-LIL-AMS-FRA

    to win. So I think my max penalty cost should be 500, so that the

    ZRH-FRA link loses out to the alternate path at 480. Is that how you

    see it as well?<br>

    <br>
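    The ring arithmetic can be checked with a minimal Python sketch. It assumes every healthy link costs 96, that Babel's additive metric simply sums link costs, and it ignores route-selection hysteresis. Note that the penalty is added on top of the degraded link's own base cost of 96, and that it only reaches its maximum once the smoothed RTT is at or above rtt-max; the rtt-min of 10ms is the default, while the rtt-max of 30ms below is a hypothetical value.<br>

```python
# Minimal sketch of the ring arithmetic (all link costs assumed equal).
BASE_COST = 96  # assumed cost of every healthy wired link


def rtt_penalty(rtt, rtt_min, rtt_max, max_penalty):
    """Piecewise-linear RTT penalty (same shape as the Babel RTT extension)."""
    if rtt <= rtt_min:
        return 0.0
    if rtt >= rtt_max:
        return float(max_penalty)
    return max_penalty * (rtt - rtt_min) / (rtt_max - rtt_min)


def min_penalty_to_reroute(ring_size, base=BASE_COST):
    """Smallest integer penalty that makes the long way round strictly
    cheaper than one degraded link between two adjacent ring members."""
    detour = (ring_size - 1) * base
    # Need base + penalty > detour, i.e. penalty >= detour - base + 1.
    return detour - base + 1


ring = ["ZRH", "FRA", "AMS", "LIL", "PAR", "GVA"]
detour = (len(ring) - 1) * BASE_COST
print("detour ZRH-GVA-PAR-LIL-AMS-FRA:", detour)
print("minimum penalty:", min_penalty_to_reroute(len(ring)))

# With a max penalty of 500, what does a 35 ms degraded link actually cost?
for rtt_max in (120.0, 30.0):
    direct = BASE_COST + rtt_penalty(35.0, 10.0, rtt_max, 500)
    print(f"rtt-max {rtt_max:5.1f} ms: direct {direct:6.1f} vs detour {detour}")
```

    Under these assumptions a penalty of 385 or more suffices in principle, so 500 leaves some margin. With the default rtt-max of 120ms, however, a 35ms detour earns less than a quarter of that penalty (roughly 114), which would not be enough to move the traffic.<br>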

    <div class="moz-cite-prefix">On 11.03.24 13:26, Dave Taht wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAA93jw4uX13bO2UBo3S2iHKumw3OAJKFh+hu-QkTvJJx1tgNyg@mail.gmail.com">

      <pre class="moz-quote-pre" wrap="">He also did a nice writeup of an inexpensive 32x100Gbit switch

recently, running... debian.

<a class="moz-txt-link-freetext" href="https://www.linkedin.com/pulse/debian-mellanox-100g-switch-pim-van-pelt-3pivf/">https://www.linkedin.com/pulse/debian-mellanox-100g-switch-pim-van-pelt-3pivf/</a>

</pre>

    </blockquote>

    Thank you for the plug, Dave :) If you're not on LinkedIn, that

    article (and others) is primarily published on:<br>

    <a class="moz-txt-link-freetext" href="https://ipng.ch/s/articles/2023/11/11/mellanox-sn2700.html">https://ipng.ch/s/articles/2023/11/11/mellanox-sn2700.html</a><br>

    <br>

    That particular one was fun for me, because I really just read their

    own docs and published my findings after playing around with the

    switch, Debian and Switchdev, yet they were enamored enough to give

    me a call and have me meet their silicon, Ethernet and Switchdev

    teams :-)<br>

    <br>

    And if you're still reading, an update from me:<br>

    - I had a conversation with the VPP developers about accepting two

    patches to VPP. One of them allows the original premise of my article

    (IPv4-less transit networks, using only /32 loopback addresses) to

    work.<br>

    - The other allows point-to-point links over Ethernet to reply to

    ARP, which is what most platforms I am familiar with already do. It

    sparked a bit of discussion, but I'm hopeful that it can be merged.<br>

    - I also toyed a bit more with IPv4-less OSPFv2 (not quite there yet

    with VPP), but since that's probably more a topic for bird-users,

    I'll spare you the gory details.<br>

    <br>

    <br>

    Regards,<br>

    Pim<br>

    <br>

    <br>

    --

    <pre class="moz-signature" cols="72">Pim van Pelt <a class="moz-txt-link-rfc2396E" href="mailto:pim@ipng.ch">&lt;pim@ipng.ch&gt;</a>

PBVP1-RIPE - <a class="moz-txt-link-freetext" href="https://ipng.ch/">https://ipng.ch/</a></pre>

  </body>

</html>