[Debian-ha-maintainers] Bug#1073592: pacemaker: Incorrect non-spacing of attrd_updater output
Christopher Irving
c.irving at unsw.edu.au
Tue Jun 18 05:51:58 BST 2024
Package: pacemaker
Version: 2.1.8~rc1-1
Severity: normal
X-Debbugs-Cc: c.irving at unsw.edu.au
Dear Maintainer,
I'm running a pacemaker cluster on several Debian trixie hosts, including 4
postgresql instances managed by pgsqlms (a resource agent from the
resource-agents-paf package) and running on guest nodes.
Pacemaker and pacemaker-remote are all version 2.1.8~rc1-1 and
resource-agents-paf is 2.3.0-2.
When I recently upgraded the nodes, including guest nodes, to 2.1.8~rc1-1, I
found that pacemaker would still start the pgsqlms resources in standby mode as
expected, but would no longer promote one to master, which it should do
automatically. The pgsqlms pre-promote notify operation would print a message
of the form "INFO: Current node TL#LSN: 1#721420288" to the pacemaker logs,
which normally accompanies successfully setting the attribute. However, the
notify would then time out rather than succeed, and the subsequent attempted
promotion would log "Can not find LSN Location".
Investigating, I found that the _get_priv_attr function on line 298 of pgsqlms
uses the regular expression
$ans =~ m/^name=".*" host=".*" value="(.*)"$/;
to extract the value of an attribute from attrd_updater's output. This regex
expects that the name, host, and value components of the output will be space
separated. However, when I manually performed a similar attrd_updater query,
I got output of the form
name="lsn_location-pgtest"host="pgtest"value=""
which is not space-separated.
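
To illustrate the mismatch, here is a minimal Python sketch (the pattern ports
unchanged from Perl). It applies the regex quoted above to a space-separated
line and to a line without spaces. The sample lines, the names STRICT,
TOLERANT and extract_value, and the whitespace-tolerant variant are
assumptions made for illustration. They are not taken from pgsqlms or
pacemaker, and the tolerant pattern is only one possible workaround, not a
proposed patch.

```python
import re

# The pattern quoted above from pgsqlms: literal single spaces between fields.
STRICT = re.compile(r'^name=".*" host=".*" value="(.*)"$')

# Hypothetical variant that accepts zero or more whitespace between fields.
TOLERANT = re.compile(r'^name=".*"\s*host=".*"\s*value="(.*)"$')


def extract_value(pattern, line):
    """Return the captured attribute value, or None if the line does not match."""
    match = pattern.match(line)
    return match.group(1) if match else None


# Illustrative sample lines (assumed, modelled on the output shown above).
spaced = 'name="lsn_location-pgtest" host="pgtest" value="1#721420288"'
unspaced = 'name="lsn_location-pgtest"host="pgtest"value="1#721420288"'

if __name__ == "__main__":
    for label, line in (("spaced", spaced), ("unspaced", unspaced)):
        print(label, "strict:", extract_value(STRICT, line))
        print(label, "tolerant:", extract_value(TOLERANT, line))
```

With the strict pattern, the line without spaces does not match at all, so a
caller gets no value back even though the attribute is set.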
I believe the promotion failed because, although setting the attribute did in
fact succeed, the pgsqlms resource agent could not extract the attribute
value, since attrd_updater was responding in an unexpected format.
I've checked pacemaker's upstream source code, and it seems that the lack of
spacing is an anomaly specific to 2.1.8-rc1; the spaces come back in 2.1.8-rc2.
I think the responsible function in pacemaker's source code is
"attribute_default", in the file lib/pacemaker/pcmk_output.c.
-- System Information:
Debian Release: trixie/sid
APT prefers stable-security
APT policy: (500, 'stable-security'), (500, 'unstable'), (500, 'testing'), (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 6.8.12-amd64 (SMP w/8 CPU threads; PREEMPT)
Locale: LANG=en_AU.UTF8, LC_CTYPE=en_AU.UTF8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages resource-agents-paf depends on:
pn corosync <none>
pn pacemaker | pacemaker-remote <none>
ii perl 5.38.2-5
pn resource-agents <none>
pn resource-agents | resource-agents-base <none>
resource-agents-paf recommends no packages.
resource-agents-paf suggests no packages.