[Debian-ha-maintainers] Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654
Louis Sautier
sautier.louis at gmail.com
Thu Nov 12 10:07:33 GMT 2020
Package: pacemaker
Version: 1.1.16-1+deb9u1
Severity: grave
X-Debbugs-CC: apo at debian.org
Hi,
I am running corosync 2.4.2-3+deb9u1 with pacemaker, and the last run of
unattended-upgrades broke the cluster (downgrading pacemaker to 1.1.16-1
fixed it immediately).
The logs contain a lot of warnings that seem to point to a permission
problem, such as "Rejecting IPC request 'lrmd_rsc_info' from
unprivileged client crmd". I am not using ACLs, so the patch should not
affect my system.
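For reference, "using ACLs" would mean having an <acls> section in the
CIB and enable-acl=true in the cluster options; as far as I can tell we
have neither. An ACL definition would look roughly like the following
(illustrative only, with made-up role and user names, not taken from our
CIB):

<acls>
  <acl_role id="read-only">
    <acl_permission id="read-only-cib" kind="read" xpath="/cib"/>
  </acl_role>
  <acl_target id="someuser">
    <role id="read-only"/>
  </acl_target>
</acls>

Nothing of the sort exists in our configuration.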
Here is an excerpt from the logs after the upgrade:
Nov 12 06:26:05 cluster-1 crmd[20868]: notice: State transition
S_PENDING -> S_NOT_DC
Nov 12 06:26:05 cluster-1 crmd[20868]: notice: State transition
S_NOT_DC -> S_PENDING
Nov 12 06:26:05 cluster-1 attrd[20866]: notice: Defaulting to uname -n
for the local corosync node name
Nov 12 06:26:05 cluster-1 crmd[20868]: notice: State transition
S_PENDING -> S_NOT_DC
Nov 12 06:26:06 cluster-1 lrmd[20865]: warning: Rejecting IPC request
'lrmd_rsc_info' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 lrmd[20865]: warning: Rejecting IPC request
'lrmd_rsc_info' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 lrmd[20865]: warning: Rejecting IPC request
'lrmd_rsc_register' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 lrmd[20865]: warning: Rejecting IPC request
'lrmd_rsc_info' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 crmd[20868]: error: Could not add resource
service to LRM cluster-1
Nov 12 06:26:06 cluster-1 crmd[20868]: error: Invalid resource
definition for service
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input
<create_request_adv origin="te_rsc_command" t="crmd" version="3.0.11"
subt="request" reference="lrm_invoke-tengine-xxx-29"
crm_task="lrm_invoke" crm_sys_to="lrmd" crm_sys_from="tengine"
crm_host_to="cluster-1" src="cluster-2" acl_target="hacluster"
crm_user="hacluster">
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input <crm_xml>
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input <rsc_op
id="5" operation="monitor" operation_key="service:1_monitor_0"
on_node="cluster-1" on_node_uuid="xxx" transition-key="xxx">
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input
<primitive id="service" long-id="service:1" class="systemd" type="service"/>
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input
<attributes CRM_meta_clone="1" CRM_meta_clone_max="2"
CRM_meta_clone_node_max="1" CRM_meta_globally_unique="false"
CRM_meta_notify="false" CRM_meta_op_target_rc="7"
CRM_meta_timeout="15000" crm_feature_set="3.0.11"/>
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input </rsc_op>
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input </crm_xml>
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input
</create_request_adv>
Nov 12 06:26:06 cluster-1 lrmd[20865]: warning: Rejecting IPC request
'lrmd_rsc_info' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: Resource service no
longer exists in the lrmd
Nov 12 06:26:06 cluster-1 crmd[20868]: error: Result of probe
operation for service on cluster-1: Error
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: Input I_FAIL received
in state S_NOT_DC from get_lrm_resource
Nov 12 06:26:06 cluster-1 crmd[20868]: notice: State transition
S_NOT_DC -> S_RECOVERY
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: Fast-tracking shutdown
in response to errors
Nov 12 06:26:06 cluster-1 crmd[20868]: error: Input I_TERMINATE
received in state S_RECOVERY from do_recover
Nov 12 06:26:06 cluster-1 crmd[20868]: notice: Disconnected from the LRM
Nov 12 06:26:06 cluster-1 crmd[20868]: notice: Disconnected from Corosync
Nov 12 06:26:06 cluster-1 crmd[20868]: error: Could not recover from
internal error
Nov 12 06:26:06 cluster-1 pacemakerd[20857]: error: The crmd process
(20868) exited: Generic Pacemaker error (201)
Nov 12 06:26:06 cluster-1 pacemakerd[20857]: notice: Respawning failed
child process: crmd
My corosync.conf is quite standard:
totem {
    version: 2
    cluster_name: debian
    token: 0
    token_retransmits_before_loss_const: 10
    clear_node_high_bit: yes
    crypto_cipher: aes256
    crypto_hash: sha256
    interface {
        ringnumber: 0
        bindnetaddr: xxx
        mcastaddr: yyy
        mcastport: 5405
        ttl: 1
    }
}
logging {
    fileline: off
    to_stderr: yes
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    syslog_facility: daemon
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
quorum {
    provider: corosync_votequorum
    expected_votes: 2
}
So is my crm configuration:
node xxx: cluster-1 \
    attributes standby=off
node xxx: cluster-2 \
    attributes standby=off
primitive service systemd:service \
    meta failure-timeout=30 \
    op monitor interval=5 on-fail=restart timeout=15s
primitive vip-1 IPaddr2 \
    params ip=xxx cidr_netmask=32 \
    op monitor interval=10s
primitive vip-2 IPaddr2 \
    params ip=xxx cidr_netmask=32 \
    op monitor interval=10s
clone clone_service service
colocation service_vip-1 inf: vip-1 clone_service
colocation service_vip-2 inf: vip-2 clone_service
order kot_before_vip-1 inf: clone_service vip-1
order kot_before_vip-2 inf: clone_service vip-2
location prefer-cluster1-vip-1 vip-1 1: cluster-1
location prefer-cluster2-vip-2 vip-2 1: cluster-2
property cib-bootstrap-options: \
    have-watchdog=false \
    dc-version=1.1.16-94ff4df \
    cluster-infrastructure=corosync \
    cluster-name=debian \
    stonith-enabled=false \
    no-quorum-policy=ignore \
    cluster-recheck-interval=1m \
    last-lrm-refresh=1605159600
rsc_defaults rsc-options: \
    failure-timeout=5m \
    migration-threshold=1
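Note also that enable-acl is not set in cib-bootstrap-options above, so
ACLs should be disabled entirely. If they were enabled, the CIB would
contain an nvpair along these lines (again illustrative, not present on
our cluster; the id is just the conventional naming):

<nvpair id="cib-bootstrap-options-enable-acl" name="enable-acl" value="true"/>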