[Debian-ha-maintainers] Bug#1034280: False positive "down" reports from IPv6addr "monitor" action

Wed Apr 12 07:31:59 BST 2023

Package: resource-agents
Version: 1:4.12.0-1
Severity: important

Hi,

forwarding this from
https://github.com/ClusterLabs/resource-agents/issues/1855:

The "monitor" action of IPv6addr sends an ICMPv6 echo request
to the given local address:

| resource-agents/heartbeat/IPv6addr.c
| Line 630 in d6b9548
|   ret = sendto(icmp_sock, (char *)outpack, sizeof(outpack), 0,

and then expects to receive the corresponding echo reply
immediately, without any delay, because it uses MSG_DONTWAIT:

| resource-agents/heartbeat/IPv6addr.c
| Line 647 in d6b9548
|   ret = recvmsg(icmp_sock, &msg, MSG_DONTWAIT);

This works fine most of the time, but occasionally, under heavy
network load, the echo reply is not yet available when recvmsg is
called. The call then fails with EAGAIN, leading to a false
positive down event ("not running") on the resource:

| Mar 28 18:53:59 sp1 pacemaker-schedulerd[41843]:  warning: Unexpected result (not running) was recorded for monitor of p_vip_neth0_v6_1 on sp1 at Mar 28 00:02:38 2023
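
For illustration only: one way to avoid this race is to wait until
the socket is readable before receiving. The following is a minimal
sketch under that assumption, not the change made in the upstream
pull request. recv_with_timeout is a hypothetical helper that does
not exist in IPv6addr.c, and the timeout is whatever the caller
considers reasonable.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of IPv6addr.c): wait up to timeout_ms
 * milliseconds for fd to become readable, then receive without blocking.
 * Returns the number of bytes received, or -1 with errno set
 * (ETIMEDOUT if nothing arrived within the timeout).
 */
ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    /* Simplification: an interrupted poll() restarts the full timeout. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;              /* poll() itself failed, errno is set */
    if (rc == 0) {
        errno = ETIMEDOUT;      /* no data arrived in time */
        return -1;
    }

    /* Data (or an error condition) is pending, so this does not block. */
    return recv(fd, buf, len, MSG_DONTWAIT);
}
```

With a helper along these lines, a reply that arrives a few
milliseconds late is still received, and only a real timeout would be
treated as a failure.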

That problem is fixed by
https://github.com/ClusterLabs/resource-agents/pull/1858, which has
since been merged upstream but is not yet part of any release.

Would it be possible to get this fix into the resource-agents
version in bullseye and/or bookworm (both are affected)?
I could provide the necessary changes as an MR on salsa, if that
would help.

regards
-mika-