[Babel-users] Setting ECN on Babel packets [was: Route selection...]

Dave Taht dave.taht at gmail.com
Sat Jul 7 22:29:26 UTC 2012


On Sat, Jul 7, 2012 at 3:52 PM, Juliusz Chroboczek <jch at pps.jussieu.fr> wrote:
>> I will be changing the hardware wifi VO queue in cerowrt 3.3.8-10 to
>> be sane (2 packets rather than 128).
>
> That's probably going to kill your throughput for high-RTT links, unless
> you also fix the lack of backpressure in Linux.

As I explained in Brussels, there are multiple queues in play here.

The wifi hardware has a copy of the aggregates it is attempting to
ship, one or more per hardware queue.

Multicast usually ends up in a different bit of the hardware than VO.

Linux presently has two or more queues for udp and three or more for
tcp. The lowest level is embedded in the driver itself, and this is
the queue I was suggesting reducing to saner values. The default of
128 packets per driver queue is treble what is needed for a single
aggregate, even if all streams were perfectly ordered. And since
"sane disorder" is really what you want from packets (e.g. fair
queuing)...

All that the 4*128-packet queues in the present ath9k driver do, in
most cases, is add latency. Tons of it. And once a packet is stuck in
the driver, there's no getting it out.
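
To put a rough number on that (back of the envelope, with assumed
values): 128 full-size packets is roughly 1.5 Mbits of data; at an
effective rate of 20 Mbit/s that is ~77 ms of standing queue in a
single hardware queue, and at the 6 Mbit/s basic rate it is over
250 ms - per hop, before even counting the qdisc above it.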

"TCP's reaction time is quadratic to the amount of overbuffering." -
van jacobson

Codel's effects are nearly unnoticeable at driver buffer sizes of about 64 packets.

I'm well aware that reducing the driver queue length for the VI, BE,
and BK queues reduces the possibilities for aggregation under Linux,
especially at the moment. We are evolving towards a plan to revise the
Linux wireless stack to have one (and only one!) fair-queued and AQM'd
aggregate accumulation per destination station, rather than the single
FIFO which may or may not aggregate.

http://www.bufferbloat.net/projects/cerowrt/wiki/Fq_Codel_on_Wireless
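
To make the shape of that plan concrete, here's a toy sketch in Python
(my own names and simplifications, not the actual mac80211 work): keep
a separate queue per destination station and round-robin between
stations when pulling packets for the next aggregate, instead of
draining one big FIFO.

    from collections import deque

    class PerStationQueues:
        """One queue per destination station, serviced round-robin, so a
        single slow or greedy station cannot monopolize the driver FIFO.
        In the real plan each per-station queue would also run its own AQM."""

        def __init__(self):
            self.queues = {}     # station address -> deque of packets
            self.ring = deque()  # stations currently holding packets

        def enqueue(self, station, pkt):
            q = self.queues.setdefault(station, deque())
            if not q:
                self.ring.append(station)
            q.append(pkt)

        def next_aggregate(self, max_frames=32):
            """Pull up to max_frames packets for one station (so they can
            still be aggregated together), then rotate to the next station."""
            if not self.ring:
                return None, []
            station = self.ring.popleft()
            q = self.queues[station]
            batch = [q.popleft() for _ in range(min(max_frames, len(q)))]
            if q:
                self.ring.append(station)
            return station, batch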

In the interim it seems sane to reduce all extra forms of latency,
work with the fq_codel implementation at the layer we have it at, and
measure what gets better and/or breaks.

> It might break Babel on
> networks with more than 120 nodes (the maximum number of packets that
> Babel will send in a burst is roughly floor(n/60) where n is the number
> of routes).

The software "qdisc" queue above the driver queues (where fq_codel
resides), in cerowrt is presently set for a limit of 100-600 packets
depending on the queue type. This is based on some measurements (not
enough!) and the fact that I ran out of memory on the box at
fq_codel's default 10000 packets under certain conditions...

Judging from the context of your statement here, I think you were
reading into my mail the idea of reducing all buffering between the
software and the device to 2.

That would be crazy. However, the present situation in wireless, with
1000s of unmanaged buffers by default, is also crazy, and is
contributing to the near-collapse of large wifi networks worldwide.

No... the whole point of codel is to have sufficient buffering to
absorb bursts ("good queue"), but to evolve towards a minimum queue
length over time, by measuring the delay between a packet entering
the queue and exiting it, and starting to discard when that delay
stays above a target.

http://queue.acm.org/detail.cfm?id=2209336
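
As a very rough illustration of that control law, here's a toy sketch
(Python, with my own names and heavy simplifications - nothing like
the real sch_codel/fq_codel code):

    import time
    from collections import deque

    TARGET = 0.005     # 5 ms of acceptable standing queue delay
    INTERVAL = 0.100   # delay must persist this long before we drop

    class CodelishQueue:
        """Toy sojourn-time AQM in the spirit of codel (much simplified)."""

        def __init__(self):
            self.q = deque()
            self.drop_after = None  # when sustained delay will trigger drops

        def enqueue(self, pkt):
            # Timestamp on entry: codel measures time spent in the queue,
            # not queue length in packets or bytes.
            self.q.append((time.monotonic(), pkt))

        def dequeue(self):
            while self.q:
                enq_time, pkt = self.q.popleft()
                sojourn = time.monotonic() - enq_time
                if sojourn < TARGET:
                    self.drop_after = None      # "good queue": bursts drain
                    return pkt
                if self.drop_after is None:
                    # First time above target: arm a timer, keep delivering.
                    self.drop_after = time.monotonic() + INTERVAL
                    return pkt
                if time.monotonic() < self.drop_after:
                    return pkt
                # Delay stayed above TARGET for a whole INTERVAL: drop this
                # packet and look at the next one. (Real codel also speeds
                # up its drop rate with each successive drop; omitted here.)
                continue
            return None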

I appreciate your estimate as to babel's overhead! However, I need to
note that a buffer of 400 packets translates to roughly 24000 routes
(at your estimate of ~60 routes per packet). Secondly, as I've also
noted, multicast packets are (sometimes) accumulated and only
broadcast on wifi some time after a CAB, and usually at 6 Mbit/s or
less.

It is my hope that with sane AQM strategies, meshes of much larger
size than what we see today can be effectively constructed.

>> the same test also reduced throughput by about 40% (from >10Mbit to
>> ~6.5Mbit).
>
> Please try the same on a high-RTT path.

Planned. I have set up a 12-node mesh at a campground here (it would
be cool to have a battlemesh in California one day), but I was sane
about the design, in that it is multichannel, mostly ethernet between
radios, and signal strength is universally good.

Before I finished getting it up yesterday, several prototype locations
were bad - and that kicked in all sorts of terrible things, including
3-hop paths with RTTs in excess of 2 seconds at the default (128)
hardware queue size. That's probably not the kind of high-RTT path you
were thinking of!?

Now that I have the good locations up, well, I can get anywhere in
about 4.4 ms - and under heavy load (thanks to fq_codel) usually well
under 25 ms. Now that the connections are good, I'll be doing some CDF
plots of ping latency under load at hardware queue sizes of 128 vs 3
in this setup. I suspect these will be interesting.

I certainly plan to add the bad nodes back in, but not on the backbone.

>
>> with *your permission* I'd like to be marking babel packets in cerowrt
>> as CS6 + ECN capable.
>
> I tried to explain to you in Brussels (over a year ago) why that's
> a horrible kludge.  You're applying AQM to locally-originated data in
> order to compensate for the lack of backpressure, and then falsely
> asserting ECN in order to bypass the AQM policy.

I understood your reasoning then and I understand it now, but I only
partially agree with it.

However, there is no "separate, magic queue for routing and
system-critical packets", aside from the proposed move (in cerowrt at
least) of CS6 into the underutilized VI qdisc queue. Where would you
apply backpressure without a signaling mechanism of some sort?

There are several interesting things that could be done.

Have a syscall that told you how many packets are outstanding in the
driver? And then what would you do?

One of the few negative aspects of fair queueing is that a stream of
packets destined for the same destination will be interspersed with
as many other streams as possible.

Now this is ok, up to a point, due to the aforementioned separate
multicast queue's delays in accumulating what will be sent. But I do
note that sending all multicast traffic in a burst will affect all
other traffic, and it might make sense for a routing protocol like
babel to intentionally smear a routing table update across its update
window, rather than temporarily slow the network down so hugely.
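
For example (a purely hypothetical sketch, not something babeld does
today): instead of emitting floor(n/60) update packets back to back,
a daemon could pace them across its update interval.

    import time

    def send_update_smeared(packets, update_interval, send_fn):
        """Spread a full table update over the update interval instead of
        bursting it, leaving airtime for other traffic in between."""
        if not packets:
            return
        gap = update_interval / len(packets)
        for pkt in packets:
            send_fn(pkt)
            time.sleep(gap)  # a real daemon would use its event loop timers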

(I have tried to be very careful about my language in describing the
behavior of each queue here)

Another approach is to have some sort of congestion indicator that is
different from mere reachability, e.g. ECN, which when used end-to-end
is very helpful given all the work a wifi network has had to do to get
a packet from point A to point Z. As for points A to B - link-local
multicast -

I was not suggesting back in Brussels that babel ignore ECN; I was
suggesting that ECN be leveraged (somehow) in making routing decisions
in case of congestion. I think we do agree that better metrics for
mesh networks are needed.

But, since babel shares a queue with other traffic, a CE mark applied
by an AQM to babel's ECT(0) packets would be an early warning sign
(pretty unreliable, given how sparse babel traffic is compared to
other flows). ECN-enabling babel packets distinguishes packet loss due
to unreachability from loss due to congestion, and should lead to a
more reliable mesh network. **IMHO**
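
Concretely, the marking I'm talking about amounts to setting both the
DSCP and the ECT(0) codepoint on the routing daemon's UDP sockets.
A minimal sketch (Python rather than babeld's C, and assuming - as I
believe is the case - that Linux only strips the ECN bits from IP_TOS
on TCP sockets, not on datagram sockets):

    import socket

    CS6  = 48 << 2     # DSCP CS6 in the top six bits of the TOS byte -> 0xc0
    ECT0 = 0x02        # ECT(0) codepoint in the bottom two ECN bits
    TOS  = CS6 | ECT0  # 0xc2

    # Babel speaks UDP over IPv6 link-local; the traffic class carries the
    # same DSCP + ECN bits as the IPv4 TOS byte.
    s6 = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    IPV6_TCLASS = getattr(socket, "IPV6_TCLASS", 67)   # 67 on Linux
    s6.setsockopt(socket.IPPROTO_IPV6, IPV6_TCLASS, TOS)

    # IPv4 equivalent, for completeness.
    s4 = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s4.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS)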

I can think of several other, more silent metrics (like leveraging
information from deep within minstrel's and mac80211's state regarding
transmission rates - or from within codel as to its drop state) that
would probably be more useful as a base for metrics.



> -- Juliusz



-- 
Dave Täht
http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-6 is out
with fq_codel!"


