[Babel-users] Some idea on cost calculation based on hop count, loss and RTT

Sat Apr 21 00:15:57 BST 2018

Hi there,

I am a Babel user running it over both Wi-Fi and a tunneled mesh VPN.
I want to share some ideas on cost calculation based on hop count,
loss and RTT.

=== The Problem ===

I would like to find a feasible low-RTT route across my network,
because the direct route is usually not the best. My configuration is:
> interface zt0 type tunnel link-quality true rxcost 16 hello-interval 10 rtt-min 16 rtt-max 1024 max-rtt-penalty 1008

Even after tweaking these parameters, Babel still often chooses a
route with low latency but high loss. Such a link is impossible with
Wi-Fi but very common with tunnels.

From reading the code, I understand that Babel uses a formula like this:
> cost = hop/(1-loss)^2 + RTT, (Constants omitted. Please correct me if I am mistaken.)
where "hop" is related to the sum of (txcost * rxcost) and "loss" is
the one-way packet loss rate.

With this formula, my configuration asks Babel to choose the route
with the lowest RTT, adding a hop if that saves 16ms, and to accept a
route that is 1.73ms slower to avoid 5% packet loss, or 3.75ms slower
to avoid 10%. Packet loss is clearly underweighted by Babel. To
compensate for this, I have to increase "rxcost" and "rtt-min", but
that would give me routes with fewer hops but higher RTT, which is
not what I want.
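
The 1.73ms and 3.75ms figures can be reproduced with a short Python
sketch of the simplified formula above. This is not babeld's actual
code: the function names are made up for illustration, and the RTT
penalty is assumed to grow linearly from 0 at rtt-min to
max-rtt-penalty at rtt-max. With the configuration shown, that slope
is 1 cost unit per millisecond, so extra cost and extra milliseconds
are interchangeable.

```python
# Illustrative sketch of the simplified formula in this message,
# cost = hop/(1-loss)^2 + RTT. Not babeld's implementation.

RXCOST = 16              # from "rxcost 16"
RTT_MIN = 16.0           # ms, from "rtt-min 16"
RTT_MAX = 1024.0         # ms, from "rtt-max 1024"
MAX_RTT_PENALTY = 1008   # from "max-rtt-penalty 1008"


def hop_cost(loss, base=RXCOST):
    """Loss-dependent part of one hop: base / (1 - loss)^2."""
    return base / (1.0 - loss) ** 2


def rtt_penalty(rtt_ms):
    """RTT-dependent part; assumed linear between rtt-min and rtt-max."""
    if rtt_ms <= RTT_MIN:
        return 0.0
    if rtt_ms >= RTT_MAX:
        return float(MAX_RTT_PENALTY)
    return MAX_RTT_PENALTY * (rtt_ms - RTT_MIN) / (RTT_MAX - RTT_MIN)


def extra_cost_from_loss(loss):
    """How much a lossy hop costs beyond a lossless one."""
    return hop_cost(loss) - hop_cost(0.0)


# Cost units per millisecond of RTT inside the linear range.
COST_PER_MS = MAX_RTT_PENALTY / (RTT_MAX - RTT_MIN)

for loss in (0.05, 0.10):
    ms = extra_cost_from_loss(loss) / COST_PER_MS
    print(f"loss {loss:.0%}: worth about {ms:.2f} ms of extra RTT")
```

In this sketch, a hop with 10% loss is penalized less than 4ms of
additional RTT would be, which is why low-latency lossy tunnels keep
winning.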

=== My Model ===

Therefore I want to propose a new model for cost calculation, suitable
for long-range tunneled networks. We usually want to maximize TCP
throughput while keeping the hop count from growing too high. Mathis
et al. gave a formula that estimates the theoretical maximum TCP
throughput:
> throughput <= 1/(rtt * sqrt(loss)), (Again constants omitted.)

We want the cost to be linear in RTT and hop count, so we define it as
the inverse of the theoretical throughput:
> my_new_cost = (hop+RTT) * sqrt(loss),
where loss = 1 - (alpha+beta) / 2 or loss = 1 - sqrt(alpha*beta), with
alpha and beta being the delivery ratios in the two directions of the
link.

Now the derivative of cost with respect to loss decreases as loss
grows. I think this better reflects reality: adding 1% of packet loss
to a link with 5% loss makes things much worse than adding it to a
link that already has 50% loss.
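
A small Python sketch makes the diminishing effect of loss concrete.
It implements the proposed formula literally, with constants omitted
as in the text. The function names and the sample values (hop = 16,
RTT = 50) are hypothetical and chosen only for illustration.

```python
import math


def loss_arithmetic(alpha, beta):
    """loss = 1 - (alpha + beta) / 2"""
    return 1.0 - (alpha + beta) / 2.0


def loss_geometric(alpha, beta):
    """loss = 1 - sqrt(alpha * beta)"""
    return 1.0 - math.sqrt(alpha * beta)


def proposed_cost(hop, rtt_ms, loss):
    """my_new_cost = (hop + RTT) * sqrt(loss), constants omitted."""
    return (hop + rtt_ms) * math.sqrt(loss)


def marginal_cost(hop, rtt_ms, loss, delta=0.01):
    """Cost increase caused by adding `delta` loss to a link."""
    return proposed_cost(hop, rtt_ms, loss + delta) - proposed_cost(hop, rtt_ms, loss)


HOP, RTT = 16, 50  # hypothetical sample values

at_5 = marginal_cost(HOP, RTT, 0.05)
at_50 = marginal_cost(HOP, RTT, 0.50)
print(f"+1% loss on a 5% link:  cost +{at_5:.3f}")
print(f"+1% loss on a 50% link: cost +{at_50:.3f}")
print(f"ratio: {at_5 / at_50:.2f}")
```

Under this model the extra 1% costs about three times as much on the
5% link as on the 50% link, independent of the hop and RTT values,
since they only scale both numbers by the same factor.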

=== Issues ===

There are several issues regarding my model.

First, no real experiments have been done with this formula. It needs
more consideration.
Second, cross-version compatibility would be a problem. Fortunately,
Babel is still a draft, so we can discuss it without breaking
things.
Third, "loss" is in the denominator of the throughput formula, so we
need to set a lower bound on it (e.g. 100%/32 or 100%/64) so that the
throughput estimate does not overflow.
Last, no research shows whether Mathis et al.'s formula works on Wi-Fi.
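
The lower bound from the third issue could be sketched in Python as
follows. The function names are hypothetical, and the floor of 1/32 is
simply the first of the two example values given above. Besides
keeping the throughput estimate finite, the floor also keeps the
proposed cost from collapsing to zero on a lossless link, where hop
count and RTT would otherwise stop mattering.

```python
import math

LOSS_FLOOR = 1.0 / 32  # "100%/32"; 1.0 / 64 is the other suggested value


def clamped_loss(loss, floor=LOSS_FLOOR):
    """Keep the loss estimate within [floor, 1]."""
    return min(1.0, max(floor, loss))


def estimated_throughput(rtt_ms, loss):
    """Mathis-style bound 1 / (rtt * sqrt(loss)), constants omitted."""
    return 1.0 / (rtt_ms * math.sqrt(clamped_loss(loss)))


def proposed_cost(hop, rtt_ms, loss):
    """(hop + RTT) * sqrt(loss), using the clamped loss."""
    return (hop + rtt_ms) * math.sqrt(clamped_loss(loss))


# A lossless link no longer divides by zero or gets a zero cost.
print(estimated_throughput(50, 0.0))
print(proposed_cost(16, 50, 0.0))
```

One consequence of this choice is that all links with loss below the
floor are treated alike, so the floor also sets the smallest loss rate
the metric can distinguish.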

Thank you for reading. I hope we can discuss this topic further.