[Babel-users] a cautionary note on setting up new babel nodes

Dave Taht dave.taht at gmail.com
Sat Jun 27 05:04:53 UTC 2015


On Fri, Jun 26, 2015 at 5:02 PM, Juliusz Chroboczek
<jch at pps.univ-paris-diderot.fr> wrote:
>> ...to discover that I was offering the shortest path to the exit nodes,
>> and thus had bypassed the two existing ~50mbit links into lab links that
>> were located indoors and going through a thousand+ meters of trees... that
>> was barely doing a megabit with 800+ms of delay.
>
>
> What was the issue?  No multicast loss on the link?  I find that difficult
> to believe.

When you gradually engineer a wireless network to the point where it
can do 50mbit or more, packet loss at the lowest rates becomes slight,
particularly when your stations have no interference (clear channels)
and your APs output a watt (picostations) or are rated for 50km
(nanostations), even going through heavy foliage. I had long
suspected that most of my packet loss had become congestive in nature.

Most of the network is engineered not to find a best route
interactively, but to fall back to other nodes when one fails (hangs
or is shut off). For example, the longest path is always to the 2GHz
radios (which are also used as APs), usually going through a
nanostation M5 over ethernet to the next "hop" nanostations, always
on different channels, with the 2GHz radios having a higher rxcost
configured and diversity on, and *no* backbone per se to the main exit
points over 2GHz (until this accident; I had only added ethernet to
the lab a few weeks before). So they (usually) just serve as a backup
to the 5GHz directional radios, and only carry traffic to stations and
leaf nodes.

In other news, the nodes I am replacing (with exactly the same
hardware, recycled) are 3+ years old. With every improvement in
software, they got faster and more reliable. When I started I was
lucky to get 5mbit end to end (7 hops worst case, no diversity, and
bridged); that went to 12 (diversity with a bad link), to 20+ (when
aggregation started to work in adhoc), to the latest build, where I
am seeing as much as 70mbit over 1000 meters of foliage... (and I
won't know until I deploy how well that worst link will do...)

It turned out I wasn't even running the final fq_codel on one critical node:

root@pool-pole2-omni:~# uname -a
Linux pool-pole2-omni 3.3.8 #1 Wed Sep 19 09:48:30 PDT 2012 mips GNU/Linux

tc -s qdisc show dev wlan0 # 6 hours of traffic
...
qdisc nfq_codel 30: parent 1:3 limit 1000p flows 1024 quantum 1000 target 5.0ms interval 100.0ms ecn
 Sent 2265181234 bytes 1601340 pkt (dropped 3685, overlimits 0 requeues 85466)
 backlog 0b 0p requeues 85466
  maxpacket 0 drop_overlimit 0 new_flow_count 22621 ecn_mark 29
  new_flows_len 0 old_flows_len 0

Still, I take great joy from seeing at least some drops and marks, on
wifi, and a ton of FQ (requeues). It needs MORE! :)
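
To put the counters above in proportion, here is a minimal Python sketch
that pulls the numbers out of the tc output and expresses drops, ECN
marks, and requeues as a percentage of packets sent. The function names
are made up for illustration, and the regular expressions assume the
output layout shown above; other tc versions may format it differently.

```python
import re

# Counters copied from the nfq_codel output quoted above.
TC_OUTPUT = """\
qdisc nfq_codel 30: parent 1:3 limit 1000p flows 1024 quantum 1000 target 5.0ms interval 100.0ms ecn
 Sent 2265181234 bytes 1601340 pkt (dropped 3685, overlimits 0 requeues 85466)
 backlog 0b 0p requeues 85466
  maxpacket 0 drop_overlimit 0 new_flow_count 22621 ecn_mark 29
  new_flows_len 0 old_flows_len 0
"""


def parse_qdisc_stats(text):
    """Extract the basic counters from one qdisc stanza (hypothetical helper)."""
    sent = re.search(
        r"Sent (\d+) bytes (\d+) pkt \(dropped (\d+), overlimits (\d+) requeues (\d+)\)",
        text,
    )
    marks = re.search(r"ecn_mark (\d+)", text)
    if sent is None or marks is None:
        raise ValueError("unexpected tc output layout")
    return {
        "bytes": int(sent.group(1)),
        "packets": int(sent.group(2)),
        "dropped": int(sent.group(3)),
        "requeues": int(sent.group(5)),
        "ecn_mark": int(marks.group(1)),
    }


def percent_of_sent(stats, key):
    """Return a counter as a percentage of packets sent."""
    return 100.0 * stats[key] / stats["packets"]


stats = parse_qdisc_stats(TC_OUTPUT)
for key in ("dropped", "ecn_mark", "requeues"):
    print(f"{key}: {stats[key]} ({percent_of_sent(stats, key):.3f}% of sent packets)")
```

Note that tc counts drops separately from sent packets, so these
percentages are only a rough gauge of how often the qdisc intervened.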

>
>> (channel diversity not working did not help either)
>
>
> Not sure about that -- if computing the channel number fails, babeld will
> assume that the link interferes with all other links, so the misconfigured
> link should be disfavoured.
>
> I'm sure you already know that, but until I get around to rewriting the
> channel determination code for recent "we don't break userspace" Linux, you
> can work around it with manual configuration:

Yes, I will manually configure for this deployment. It would be
important to get this right for Battlemesh.

I will also check to see if it is working elsewhere on older systems.

(In Linux's defense, nl80211 has been the future API for the last 4
years, and the other one has been deprecated.)

>     interface wlan42 channel 42
>
>> After that experience, I decided that I would make the firmware for
>> unconfigured nodes export a 512 metric,
>
>
> Good plan, although I'd make it 32000.  This way it'll be easy to spot an
> unconfigured node in BabelWeb.

Yes.
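
As a rough illustration of why a large exported metric keeps an
unconfigured node from attracting traffic: Babel metrics are additive
along a path, and the lowest total wins. The sketch below is a toy model
only, not babeld's actual selection code. The path names and link costs
are hypothetical (256 is used as a nominal per-hop wireless cost), and
the penalty stands in for whatever extra metric the unconfigured
firmware would export.

```python
INFINITY = 0xFFFF  # Babel's "unreachable" metric


def path_metric(link_costs, penalty=0):
    """Sum per-hop costs plus an optional exported penalty, capped at infinity."""
    return min(sum(link_costs) + penalty, INFINITY)


def best_path(paths):
    """Return the name of the path with the lowest total metric."""
    return min(paths, key=paths.get)


def choose(backbone_hops, shortcut_penalty):
    """Compare a multi-hop backbone against a one-hop shortcut (hypothetical)."""
    paths = {
        "backbone": path_metric([256] * backbone_hops),
        "shortcut": path_metric([256], penalty=shortcut_penalty),
    }
    return best_path(paths)


# No penalty: the one-hop shortcut through the new node wins.
print(choose(backbone_hops=2, shortcut_penalty=0))
# A 512 penalty protects a 2-hop backbone, but not a 4-hop one.
print(choose(backbone_hops=2, shortcut_penalty=512))
print(choose(backbone_hops=4, shortcut_penalty=512))
# A 32000 penalty keeps the shortcut as a last resort in both cases.
print(choose(backbone_hops=4, shortcut_penalty=32000))
```

In this model the route through the unconfigured node remains usable
either way, since both penalties stay well below infinity.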

>
> Note that if you do that, you should apply it at the metric level, not at
> the

I will kill the rxcost thing universally on the entire deployment
this week, although rxcost is mildly simpler to set in the current
OpenWrt config file. I think there is no way to set a "default"
exported metric?

>
> -- Juliusz



-- 
Dave Täht
worldwide bufferbloat report:
http://www.dslreports.com/speedtest/results/bufferbloat
And:
What will it take to vastly improve wifi for everyone?
https://plus.google.com/u/0/explore/makewififast


