[Debian-ha-maintainers] Partial migration bug in 2.1.5, solved in 2.1.6

Yoann CONGAL yoann.congal at smile.fr
Mon Jun 26 09:41:48 BST 2023


On Sat, Jun 24, 2023 at 00:17, Valentin Vidic <vvidic at debian.org> wrote:
> On Fri, Jun 23, 2023 at 02:28:06PM +0200, Florent Carli wrote:
> > I encountered a regression with pacemaker when upgrading from debian
> > bullseye to bookworm. First I contacted pacemaker maintainers and I
> > explained the problematic behavior:
> > https://bugs.clusterlabs.org/show_bug.cgi?id=5521
> >
> > But then I tried with pacemaker 2.1.6 (compiled from source) and I could
> > not reproduce.
> >
> > Pacemaker maintainers confirmed that: "We did a bunch of other refactoring
> > related to migrations (including partial migrations) in preparation for
> > that fix:
> > https://github.com/ClusterLabs/pacemaker/commits/main?after=44647f62c012f4305bf5d2e03cfde89356d831bd+34&branch=main&path%5B%5D=lib&path%5B%5D=pengine&path%5B%5D=unpack.c&qualified_name=refs%2Fheads%2Fmain
> > "
> >
> > So I'm now turning to you, hoping you will consider upgrading pacemaker
> > package in debian stable to 2.1.6.
>
> Hi Florent,
>
> Thanks for the report. Updating the version in the stable distribution
> probably won't be possible. If there is a way to simply reproduce the
> problem, we might be able to find the fix in 2.1.6 and include that
> patch in a stable update instead.

Hi,

(I work with Florent on this)

According to one of the pacemaker authors, the precise fix seems to be
https://github.com/ClusterLabs/pacemaker/commit/ad9fd9548f02a38885fa36765af9742a5a88576e
However, it builds on a series of refactoring commits, so this patch does
not apply to the Debian 2.1.5 branch :(

Reproducing this bug should be simple:
* 3 node cluster (node1, node2, node3)
* a VirtualDomain resource (debian1)
* debian1 has a location constraint preferring node2
* Gracefully shut down the pacemaker service on node2 (debian1 moves away,
  e.g. to node1)
* Restart the pacemaker service on node2
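
The steps above could be scripted roughly as follows. This is only a
sketch: the message does not say which tools were used, so it assumes the
pcs shell and systemd, and the domain XML path is a placeholder. The
function name and the "any" host label are made up for illustration.

```python
# Sketch only: assumes pcs and systemd are in use; the original report
# does not name the tooling. The XML path is a placeholder.

def reproduction_steps(resource="debian1", preferred="node2",
                       domain_xml="/path/to/debian1.xml"):
    """Return (host, command) pairs in the order of the steps above."""
    return [
        # VirtualDomain resource that is allowed to live-migrate
        ("any", f"pcs resource create {resource} ocf:heartbeat:VirtualDomain "
                f"config={domain_xml} meta allow-migrate=true"),
        # Location constraint preferring the chosen node
        ("any", f"pcs constraint location {resource} prefers {preferred}"),
        # Graceful shutdown: the resource should move to another node
        (preferred, "systemctl stop pacemaker"),
        ("any", "crm_mon -1"),
        # Bring the preferred node back: the resource should migrate back
        (preferred, "systemctl start pacemaker"),
        ("any", "crm_mon -1"),
    ]


if __name__ == "__main__":
    for host, cmd in reproduction_steps():
        print(f"[{host}] {cmd}")
```

The script only prints the commands; run them by hand on the indicated
node and compare the two crm_mon outputs.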

Here, debian1 should cleanly migrate back to node2, but this migration fails:
* No migrate_to task is generated on node1
* The migrate_from task on node2 times out
* debian1 gets restarted by a "forced recovery"
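
To make the symptom concrete, here is a tiny sketch of the pattern to look
for: a migrate_from with no matching migrate_to. The (task, node) pair
format and the function name are hypothetical; this is not pacemaker's
log or transition format, just the list of tasks written down by hand.

```python
def migrate_from_without_migrate_to(actions):
    """True if a migrate_from was scheduled but no migrate_to was.

    actions: iterable of (task, node) pairs, e.g. ("migrate_from", "node2").
    The pair format is an assumption made for this example.
    """
    tasks = {task for task, _node in actions}
    return "migrate_from" in tasks and "migrate_to" not in tasks


# Behaviour described above on 2.1.5: only the receiving side is scheduled.
print(migrate_from_without_migrate_to([("migrate_from", "node2")]))

# Expected behaviour: both halves of the migration are scheduled.
print(migrate_from_without_migrate_to(
    [("migrate_to", "node1"), ("migrate_from", "node2")]))
```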

Thanks!
--
Yoann Congal
Smile ECS - Tech expert


