[Babel-users] latency in WLAN-SI

Dave Taht dave.taht at gmail.com
Tue Dec 15 10:16:26 UTC 2015


I tend to fork convos, sorry.

On Tue, Dec 15, 2015 at 12:33 AM, Mitar <mmitar at gmail.com> wrote:
> Hi!
>
> On Mon, Dec 14, 2015 at 11:39 AM, Juliusz Chroboczek
> <jch at pps.univ-paris-diderot.fr> wrote:
>> I'd like more evidence that this is needed.  Estimating packet loss is
>> very slow (since we're computing a metric from what is just a discrete,
>> one-bit signal), so it slows down convergence.  Hopefully we can get away
>> without it.
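To illustrate the point about slowness: each Hello contributes only a single bit of loss information, so any smoothed estimate needs many samples before it settles near the true loss rate. A toy sketch (not babeld's actual estimator) using an exponential moving average:

```python
import random

def update_loss_estimate(estimate, received, alpha=0.1):
    """Fold one Hello outcome (received: bool) into a smoothed loss estimate.
    Each Hello is a one-bit sample, so convergence is inherently slow."""
    sample = 0.0 if received else 1.0
    return (1 - alpha) * estimate + alpha * sample

# With a true loss rate of 20%, watch how many Hellos the estimate needs
# before it hovers anywhere near 0.2:
random.seed(42)
estimate = 0.0
for _ in range(100):
    estimate = update_loss_estimate(estimate, random.random() >= 0.2)
```

With alpha = 0.1 the estimate's time constant is on the order of ten Hello intervals, which is exactly the convergence cost being discussed.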
>
> Hm, isn't computation on WiFi links exactly the same?
>
>> Does your RTT increase at the same time as packet loss?  If so, we could
>> probably do without packet loss.
>
> Not really. At least on the fiber links (which are most of our VPN
> links) it does not.
>
>> (Recall that the goal is not to have an accurate model of the real
>> world -- the goal is to have traffic flow according to optimal paths.  If
>> the traffic is going where you want it to go, there's no need to add more
>> complexity to babeld.)
>
> Currently it seems that the routes over VPN are dropped while we would
> prefer them to stay up, even if there is a slight packet loss.
>
>>> Why have you disabled packet-loss metric on VPN links?
>>
>> Because it's an experimental feature, that hasn't had enough real-world
>> deployment.  It works beautifully in our tests, it works beautifully in
>> Nexedi's network, and if it works as well in your network, I'll enable it
>> by default.
>
> Hm, are we talking here about packet loss or delay-based routing
> (RTT)? I understand that the RTT metric is experimental, but I was
> talking about packet loss. Why is that not enabled? Or am I missing
> something?
>
>> First, it slows down reaction to link failures.  If you're on an Ethernet,
>> and you lose two packets in a row, you can be pretty sure the link is
>> down
>
> Or we have a very short buffer. ;-)

You certainly know how to tweak me!

I'd hope that your fiber nodes were fast enough to never need stuff
like fq_codel and sqm-scripts configured, but that would need testing
and confirmation.

I note that recently a change to fq_codel's default behavior landed in
OpenWrt, changing the quantum from 300 to 1514. This really should not
affect anything but really slow links (say, < 4 Mbit/s), and fq_codel was
never the right thing for anything but p2p wifi anyway.

>
> But yes, maybe our recent instability in VPN links is due more to
> the routing problems we have than to real link instability.
>
> But we do have VPN links which go between countries. We have observed
> really crazy stuff on for example links between Croatia and Slovenia.
> Sometimes extra 100 ms appears on the link, because they have some
> issues at Internet exchanges, for example (so delay is added at the
> Internet exchange).

Do you have something like smokeping configured? I would love to see
data on not just the vpn issues but on the latencies across the mesh.
The latency-under-load data Toke showed at Battlemesh for how Linux
wifi behaves under load these days does also apply in the real world,
but perhaps less... or much more, depending on the quality of the
link. Yes, I have seen tens of seconds of delay...

In lieu of deploying smokeping everywhere I've been thinking of adding
a lightweight latency test to flent, where we just test lightweight
udp and icmp continuously with, say, a 1 sec or even 60 sec period, to
dozens or hundreds of hosts, for hours at a time, all the time...
Smokeping's basic plot is good, but flent can do a better job of
aggregating data across more variables. Another approach would be to
embed timestamps in the needed overhead traffic (be that babel or
other) and get everything synced up via ntp...
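The probe I have in mind really is tiny -- something like this rough Python sketch (not flent code; the local echo responder here just stands in for a remote host, and a real deployment would run on the 1 s or 60 s period described above):

```python
import socket
import struct
import threading
import time

def echo_server(sock):
    """Reflect each datagram back to its sender (stand-in for a remote responder)."""
    while True:
        data, addr = sock.recvfrom(64)
        sock.sendto(data, addr)

def probe(target, count=3, period=0.01):
    """Send `count` timestamped datagrams to `target`; each echoed reply
    yields one RTT sample, measured against a monotonic clock."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(1.0)
    rtts = []
    for seq in range(count):
        s.sendto(struct.pack("!Id", seq, time.monotonic()), target)
        data, _ = s.recvfrom(64)
        _, sent = struct.unpack("!Id", data)
        rtts.append(time.monotonic() - sent)
        time.sleep(period)
    s.close()
    return rtts

# Demo against a local echo responder on an ephemeral port:
srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
srv.bind(("127.0.0.1", 0))
threading.Thread(target=echo_server, args=(srv,), daemon=True).start()
samples = probe(srv.getsockname())
```

Twelve bytes of payload per probe, so even hundreds of targets at a 1 s period cost next to nothing in bandwidth; the aggregation across hosts is where flent would add value over smokeping.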

As best as I recall your vpn was pure udp, no crypto, no retries.... I
see in your (nice!) web interface that you track if a node is
reachable, but not an observed rtt.

>
> I do not think that VPN links should be seen as Ethernet. For Ethernet
> I agree that if you lose two packets you probably have issues. But
> for VPN you have stuff in between, from bad ISPs to MTU issues which
> make some packets get lost (especially while PMTU discovery is in
> progress).

Could you clarify the behavior of your vpn? VPNs over TCP will never
lose packets. VPNs that do crypto tend to bottleneck on the crypto
step and drop packets on the read side of the socket. Nearly anything
using the tun/tap interfaces tends to be slow; the recent Foo over UDP
stuff corrects some of that:
https://www.netdev01.org/docs/herbert-UDP-Encapsulation-Linux.pdf

>> Second, the link quality estimator uses ETX, which is optimised for
>> multicast Hellos over WiFi links (it's quadratic in loss rate).
>> A different formula should be used for lossy wired links and for unicast
>> wireless tunnels.  (But then, perhaps ETX works well enough on tunnels --
>> I have no idea.)
>
> We have been using ETX with OLSRv1 on tunnels without visible issues.
>
> What do you use for ETX? ETX = 1 / (d_f x d_r) is for unicast (as
> described in the paper "A High-Throughput Path Metric for Multi-Hop
> Wireless Routing"). To my knowledge, for multicast you should use
> ETX = 1 / d_f (as described in the paper "High-Throughput Multicast
> Routing Metrics in Wireless Mesh Networks").
>
> So we know ETX equations for both unicast and multicast. Maybe Babel
> should support both?
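For reference, the two ETX variants Mitar cites are easy to write down. A sketch in Python, where d_f and d_r are the forward and reverse delivery ratios measured over some window (the 90% figures below are made-up examples):

```python
def etx_unicast(d_f, d_r):
    """De Couto et al. unicast ETX: expected transmissions for a data
    frame (delivered with probability d_f) plus its link-layer ack
    (delivered with probability d_r)."""
    return 1.0 / (d_f * d_r)

def etx_multicast(d_f):
    """Multicast variant: no link-layer acks, so only the forward
    delivery ratio matters."""
    return 1.0 / d_f

# With 90% delivery each way:
# etx_unicast(0.9, 0.9)  ~ 1.23 expected transmissions
# etx_multicast(0.9)     ~ 1.11 expected transmissions
```

The quadratic-in-loss behavior Juliusz mentions is visible in the unicast formula: the loss rates of both directions multiply in the denominator.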
>
>
> Mitar
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m
>


