[Babel-users] Higher CPU usage since 1.9.x leads to instability on slow devices
fabian at blaese.de
Sun May 31 12:13:58 BST 2020
Sorry for the late reply.
On 15.05.20 01:18, Juliusz Chroboczek wrote:
>> Ever since we upgraded to babeld 1.9.x, CPU usage is a lot higher
>> than with 1.8.x. Especially slower devices like embedded MIPS routers
>> are having trouble keeping up, which leads to route instability due to
>> late Hellos.
> That sounds pretty bad.
> In principle, 1.9 should use less CPU than 1.8, since some algorithms have
> been reworked to be in n·logn instead of n². On the other hand, 1.9 uses
> more memory, since there is now a per-neighbour unicast buffer instead of
> a single buffer for all neighbours; this shouldn't matter much in practice,
> except if you have hundreds of neighbours.
I've tried analysing this in more detail using the debug output.
It looks like the actual routing algorithms are not the problem, but the communication with netlink is.
If the network becomes unstable somewhere upstream, a lot of unreachable routes are sent through the downstream network (one for each route from the upstream network that has just been lost).
However, the netlink interface seems to be relatively slow, especially on the device we are having a lot of trouble with (TP Link WDR 3600, Atheros AR9344, 74Kc MIPS 560 MHz).
Installing all these unreachable routes takes so long that the relatively small socket buffer for Babel messages overflows, because it is not read while route updates are being sent to netlink. That leads to lost Babel messages.
This puts babeld into a state it is unlikely to recover from, because changing the state of all the routes in the kernel (reachable, unreachable) always takes so long that new Babel messages are lost again.
The issue can probably only be fixed if route updates are not sent to netlink synchronously.
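
As a rough illustration of what "not synchronously" could mean here, the following is a minimal sketch in C. It is not babeld's actual code: the structure, the function names, the queue capacity and the apply_stub() callback (which stands in for the real netlink call) are all hypothetical. The idea is only that kernel route changes are queued and then applied in small, bounded batches, so that the event loop can get back to reading the Babel socket between batches.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 1024  /* hypothetical upper bound on pending updates */

/* Hypothetical pending kernel route change (not babeld's real structure). */
struct pending_update {
    int route_id;       /* stand-in for prefix and next hop */
    int unreachable;    /* 1 = install as unreachable, 0 = reachable */
};

/* Fixed-size ring buffer of updates that still have to reach the kernel. */
struct update_queue {
    struct pending_update items[QUEUE_CAP];
    size_t head;        /* index of the oldest pending update */
    size_t count;       /* number of pending updates */
};

void queue_init(struct update_queue *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
}

/* Queue an update instead of applying it at once.
   Returns 0 on success, -1 if the queue is full. */
int queue_push(struct update_queue *q, struct pending_update u)
{
    assert(q != NULL);
    if(q->count == QUEUE_CAP)
        return -1;
    q->items[(q->head + q->count) % QUEUE_CAP] = u;
    q->count++;
    return 0;
}

/* Apply at most `budget` queued updates through `apply`, oldest first.
   Stops early if `apply` fails, leaving that update queued for a retry.
   Returns the number of updates actually applied. */
size_t queue_flush_some(struct update_queue *q, size_t budget,
                        int (*apply)(const struct pending_update *))
{
    size_t done = 0;
    assert(q != NULL && apply != NULL);
    while(done < budget && q->count > 0) {
        if(apply(&q->items[q->head]) < 0)
            break;
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        done++;
    }
    return done;
}

/* Stand-in for the real kernel call; here it only counts invocations. */
static int applied_total = 0;

int apply_stub(const struct pending_update *u)
{
    (void)u;
    applied_total++;
    return 0;
}
```

In an event loop, one would call queue_flush_some() with a small budget after each read of the protocol socket, so that a burst of unreachable routes is spread over several iterations instead of blocking the reader. Whether this fits babeld's design, and what a sensible budget would be on a slow MIPS device, is an open question.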
I'm not really sure why this only occurs with babeld >= 1.9.0.
It looks like I got a little confused with version numbers, so I might have tested with versions that still had the IPv4 xroute issue.
> Perhaps you could provide us with a CPU profile?
I don't really know what you mean.