[Babel-users] Race condition in babeld?

Juliusz Chroboczek jch at pps.jussieu.fr
Sun Jul 1 15:15:49 UTC 2012


Hi,

The author of the attached bug report wishes to remain anonymous.  I'll
puzzle over it later; in the meantime, I'm sharing it with the list (with
his permission).

-- Juliusz

I am a bit new to babel, but I am having an odd problem that looks like
a race condition in babel.  I am running babel-1.3.0 under Linux.

When I am at the edge of wifi range, even with a 2-link (point-to-point)
mesh, I sometimes permanently lose connectivity from Node A to Node B.
When I monitor the kernel routing table on Node A, I see that the host
route is permanently banned (!H).  If I manually delete the route, I
instantly regain connectivity.

In trying to determine the trigger for the problem, I have discovered
that the following timing causes it.  It looks like the internal babel
route status and the kernel routing table can get out of sync when Node
B regains connectivity while Node A is in the process of banning the
route.

I have three scenarios: two work and one does not.

Scenario 1 (expected, good behavior):
  T0: Node A loses its wifi connection to Node B.
  T1 (~8 seconds later): Node A's host route gets marked as banned.
  T2 (~8 seconds later): Node A's host route gets deleted from the
      routing table.
  T3 (arbitrary time later): Node B comes back into range.
  T4 (~1-2 seconds later): The route is added and traffic passes.


Scenario 2 (expected, good behavior):
  T0: Node A loses its wifi connection to Node B.
  T1 (~8 seconds later): Node A's host route gets marked as banned.
  T2 (~5 seconds later): Node B comes back into range.
  T3 (~1-2 seconds later): The route is added and traffic passes.

Scenario 3 (unexpected, bad behavior):
  T0: Node A loses its wifi connection to Node B.
  T1 (~8 seconds later): Node A's host route gets marked as banned.
  T2: Node B comes back into range.
  Result: the banned host route never gets deleted.