[Babel-users] Babel MAC auth fails due to packet reordering

Toke Høiland-Jørgensen toke at toke.dk
Wed May 4 14:12:01 BST 2022


Daniel Gröber <dxld at darkboxed.org> writes:

> Hi Toke,
>
> On Wed, May 04, 2022 at 01:02:02PM +0200, Toke Høiland-Jørgensen wrote:
>> Daniel, you say you noticed this when you "turned on fq_codel", but also
>> that this is only happening over wireless links. Do you mean that you
>> explicitly enabled fq_codel on the WiFi links (as opposed to using the
>> built-in FQ-CoDel implementation in the WiFi stack), or is there an
>> Ethernet hop involved? And how are the interface(s) in question
>> configured (station/AP/mesh?) and which WiFi driver is used?
>
> Right, so my router is a separate box, attached to an OpenWrt AP
> (ubnt,unifiac-pro) via a switch. So there is an ethernet hop involved. The
> AP is in normal infrastructure mode.
>
> What I mean by fq_codel enablement is just setting `sysctl
> net.core.default_qdisc=fq_codel`
> on the router. The AP already had this set by default.
>
> Given that the problem doesn't happen over ethernet (see below) that's
> probably a red herring though.

Yeah, very much doubt it has anything to do with fq_codel on the
ethernet link :)

>> Also, you mention the other side is running bird; does the reordering
>> only happen with babeld as the sender?
>
> babeld doesn't log auth failures AFAICT but the neighbour cost stays at
> infinity there too so I assume it's having the same problem. I'm happy to
> debug that too but I think we should take it one problem at a time :)

Well, by just including traffic in both directions in the dump, it should
be possible to detect this. However, Bird doesn't do unicast updates, so
if turning those off helps, it probably doesn't affect Bird...
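
A rough sketch of that check, assuming the (sender, index, packet counter)
values have already been pulled out of the dump in arrival order. The
extraction step is not shown, and the function and tuple layout are made up
for illustration. It only flags packets whose counter is not strictly
greater than the last one accepted from the same sender and index, which is
a simplification of what a Babel-MAC receiver does:

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (sender address, index, packet counter),
# listed in the order the packets appear in the capture.
Packet = Tuple[str, bytes, int]


def find_reordered(packets: List[Packet]) -> List[Tuple[int, str, int, int]]:
    """Return (position, sender, pc, last_accepted_pc) for every packet
    whose counter does not exceed the last accepted one for that sender
    and index."""
    last: Dict[Tuple[str, bytes], int] = {}
    rejected = []
    for pos, (sender, index, pc) in enumerate(packets):
        key = (sender, index)
        prev = last.get(key)
        if prev is not None and pc <= prev:
            # Arrived after a packet with a higher counter: would be dropped.
            rejected.append((pos, sender, pc, prev))
        else:
            last[key] = pc
    return rejected
```

Running it separately on what each side receives would show which sender's
packets are arriving out of order.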

>> I agree with Juliusz that 200ms seems a bit on the high side for such a
>> delay, but if the channel is suffering a lot of congestion (not
>> necessarily from the same station), I suppose it *could* take that long
>> for the scheduling to get around to servicing the multicast queue *and*
>> getting an airtime slot (multicast also runs at a very low bit rate, so
>> the packet will take up more airtime, which will make it more prone to
>> interference and thus retransmissions, causing further delay).
>
> Where are you getting the 200ms number from exactly? I don't think my link
> is congested; certainly nothing was going on in my network at the time, and
> none of the neighbours around here are using the 5GHz band, so I'd be
> surprised if that was it.
>
> I did just try it over ethernet and that isn't showing the same problem,
> hmm. So the wireless link is somehow to blame still.
>
>> Of course the diffserv hypothesis is quite easy to test: just disable
>> diffserv and see if the problem goes away. I don't think babeld has a
>> configuration knob for this, but you could clear it with an iptables
>> rule like:
>> 
>> ip6tables -t mangle -A OUTPUT -o wlan0 -p udp --dport 6696 -j DSCP --set-dscp 0
>
> I tried that with the output device adjusted to my ethernet device, didn't
> change anything.

Did the remarking work? If you could include a dump of the babel
traffic taken on the access point ethernet device, we can check both
that, and confirm that the reordering really happens at the WiFi hop...
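
For checking the remarking by hand, the DSCP value sits in the top six bits
of the IPv6 Traffic Class field, which straddles the first two bytes of the
header. A minimal sketch, assuming you have the raw IPv6 header bytes of a
captured packet (the function name is made up for illustration):

```python
def ipv6_dscp(header: bytes) -> int:
    """Extract the DSCP value from the start of a raw IPv6 header."""
    if len(header) < 2:
        raise ValueError("need at least the first two header bytes")
    if header[0] >> 4 != 6:
        raise ValueError("not an IPv6 header")
    # Traffic Class: low nibble of byte 0 followed by high nibble of byte 1.
    traffic_class = ((header[0] & 0x0F) << 4) | (header[1] >> 4)
    # DSCP is the upper six bits; the lower two bits are ECN.
    return traffic_class >> 2
```

If the rule took effect, this should return 0 for the babel packets in the
dump.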

-Toke


