[Babel-users] Babel in Dave's network [was: babels bug with uninitialized data...]

Dave Taht dave.taht at gmail.com
Fri Jun 27 22:18:11 UTC 2014


On Fri, Jun 27, 2014 at 2:43 PM, Juliusz Chroboczek
<jch at pps.univ-paris-diderot.fr> wrote:
>>>> I turn it off (it has a noisy fan) and for sane values of "boom", the
>>>> whole network switches over to going through the wan or adhoc ports.
>
>>> You appear to be running with a very high hello interval,
>
>> It usually seems much faster than that... but I'll go measure.
>
>> I am using the default hello intervals.
>
> Sorry, I got confused.  I've looked at the dump again, and you are running
> with the default interval (4s) -- boom should be around 10s in your case.
>
>> Should I tighten that in this case?
>
> The Babel protocol is able to deal with hello intervals as low as 10ms.
> However, we've never tested the actual implementation with sub-second
> hellos, so you'll likely run into some bugs (and I'm looking forward to
> your reports).
>
> Expect reconvergence to happen after 2 hellos in the absence of packet
> loss.  (That means 2.5 hello intervals on average, since we start counting
> at a random point in the cycle.)
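
(For reference, the arithmetic above can be sketched as follows. This is
only a back-of-the-envelope helper, not babeld code: the 2.5 factor is
taken straight from the message above, and the function name and the list
of intervals are hypothetical choices made for illustration.)

```python
# Factor quoted in the message above: about 2.5 hello intervals on average.
AVERAGE_INTERVALS_TO_RECONVERGE = 2.5


def expected_reconvergence_s(hello_interval_s: float) -> float:
    """Average reconvergence delay in seconds, assuming no packet loss."""
    return AVERAGE_INTERVALS_TO_RECONVERGE * hello_interval_s


# 4 s is the default mentioned above; 10 ms is the protocol's stated lower bound.
for interval in (4.0, 1.0, 0.01):
    delay = expected_reconvergence_s(interval)
    print(f"hello interval {interval:g} s -> about {delay:g} s")
```

With the default 4 s interval this gives the roughly 10 s "boom" mentioned
earlier.
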
>
>> multicast throughout the network is set to 9mbits/sec.
>
> [...]
>
>> And: if I'm saturating the network, or using an artificial bottleneck
>> or wifi sometimes it does very briefly (far less than 4 sec) lose a
>> route through that during a rrul test.
>
> Babel doesn't like to lose three updates in a row.  When it does, it sends
> a (unicast) request for one extra update.  If that update is lost too, it
> drops the route.  Oscar Wilde famously quipped that losing one packet is
> a tragedy, while losing both is carelessness -- but unfortunately WiFi
> multicasts are extremely unreliable, especially if you raise the multicast
> throughput from the official value.  Losing all four updates at 9 Mbit/s
> is not at all unlikely.
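
(A rough way to see how quickly this becomes likely, as a sketch only. It
assumes every update is lost independently with the same probability,
which real WiFi does not guarantee since losses tend to come in bursts,
and it does not distinguish the extra requested update from the three
periodic ones. The loss rates below are hypothetical, not measurements.)

```python
def p_all_updates_lost(p_loss: float, updates_in_a_row: int = 4) -> float:
    """Probability that `updates_in_a_row` consecutive updates are all lost.

    Assumes independent, identically distributed losses (a simplification).
    """
    if not 0.0 <= p_loss <= 1.0:
        raise ValueError("p_loss must be a probability between 0 and 1")
    return p_loss ** updates_in_a_row


# Hypothetical per-update loss rates, chosen only for illustration.
for p in (0.1, 0.3, 0.5):
    print(f"per-update loss {p:.0%} -> route dropped with p = {p_all_updates_lost(p):.4f}")
```

Under this model the drop probability grows with the fourth power of the
per-update loss rate, so a link that loses half its multicasts drops the
route about 6% of the time it is tested this way.
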
>
> Babel is able to recover reasonably fast by switching to an alternate
> route -- the delay you're seeing is probably the time needed to make an
> end-to-end exchange to ensure that the alternate route is loop-free when
> it is unfeasible.
>
>> I have long been tempted to change the current delete/add logic for
>> add/delete for route updates now that I have some trust in modern
>> kernels.
>
> That will only help in the case where the alternate route was already
> feasible, and this case is already fast enough (on the order of
> milliseconds).  It's the unfeasible case that's bothering you.

So far as I know there are several alternate feasible routes at the
time this happens, but it may well be due to the wifi mac temporarily
losing association or something else causing it.

I will sort through the data set to show it. But it is presently 48
hours of full-throughput packet data and rrul results, so I think
finding it will require me to watch for it happening, stop the test,
and then look over the wifi monitor and a local set of captures. Next
week.

In the meantime I do find the basic idea of the attached patch
intuitively appealing, if only there was a way to tell if a kernel was
multipath capable or not. Should add more error logging I suspect...
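
To illustrate why the ordering matters, here is a toy model. It is not
the contents of the attached patch and makes no kernel calls: the
function name and next-hop labels are made up, and it simply assumes the
table can hold two next hops for one prefix at the same time, which is
the capability question raised above.

```python
def swap_next_hop(next_hops: set, old: str, new: str, add_first: bool) -> bool:
    """Replace `old` with `new` in place; return True if the set was ever empty."""
    if add_first:
        steps = [("add", new), ("del", old)]
    else:
        steps = [("del", old), ("add", new)]
    saw_gap = False
    for op, hop in steps:
        if op == "add":
            next_hops.add(hop)
        else:
            next_hops.discard(hop)
        if not next_hops:
            saw_gap = True  # a packet arriving now would find no route
    return saw_gap


for add_first in (False, True):
    hops = {"via-wifi"}  # hypothetical next-hop labels
    gap = swap_next_hop(hops, "via-wifi", "via-wan", add_first)
    order = "add/delete" if add_first else "delete/add"
    print(f"{order}: gap={gap}, final={sorted(hops)}")
```

In this model delete/add passes through a moment with no route at all,
while add/delete never does.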

>
> Let's consider the actual issue, rather than trying to work around the
> symptoms.  I can see three ways to avoid losing four updates in a row when
> running over WiFi:
>
>   - don't raise the multicast rate;

Frankly, in light of the number of devices being piled onto the wifi
bands by the internet of things, raising the multicast rate where you
can is a good idea, not just for things like babel, but also for ND,
mDNS, etc.

Sadly I have found several devices that can't handle raising the
default at all, and it will have to wait for, say, 802.11ak to arrive
before this can be fixed.

Thankfully, all the mesh networks I know of are running at higher rates
now for their backbones.

>   - implement reliable updates (supported by the protocol, but not by the
>     implementation);

I didn't know that.

>   - implement unicast updates (supported by the protocol, but not by the
>     implementation).

Unicast updates have the innate advantage of piggybacking on an
aggregate txop, and for mostly p2p links in particular they seem a win.

> Note that raising the multicast rate is needed in order to get the link
> quality estimator to work.  As you well know, I'm planning a better link
> quality estimator for some future version, so the first solution above
> might become practical at some point.

I remain excited about and interested in using rtts on small timescales
on fq_codel'd wifi, 10gige, 1gige, and 100mbit networks as part of a
core metric.

... but low on time, as always.

Unicast wifi, with the often huge number of retries in it today for
bad links, and lousy queuing, already has rtts in babel-rtt's current
estimator range.

> -- Juliusz



-- 
Dave Täht

NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-babel-do-faster-route-swaps.patch
Type: text/x-patch
Size: 2155 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/babel-users/attachments/20140627/c5c5334d/attachment.bin>

