[Debian-ha-maintainers] Bug#986325: corosync: crash with compression enabled

Ferenc Wágner wferi at debian.org
Sat Apr 3 09:04:00 BST 2021


Package: corosync
Version: 3.1.0-3
Severity: normal
Tags: patch upstream
Forwarded: https://github.com/corosync/corosync/issues/630

As reported by Lukey3332:

Sometimes corosync crashes at startup, but only if compression is enabled.

Distribution: Debian Bullseye

Corosync version:

Corosync Cluster Engine, version '3.1.0'
Copyright (c) 2006-2018 Red Hat, Inc.

Kronosnet version:

Package: libknet1
Source: kronosnet
Version: 1.20-4

Here is a backtrace:

#0  __GI___pthread_mutex_lock (mutex=mutex@entry=0x555558619958) at ../nptl/pthread_mutex_lock.c:67
#1  0x00007ffff7e6c728 in pmtud_reschedule (knet_h=0x555555577320 <_logsys_log_printf>, knet_h@entry=0x555558619958) at threads_common.c:42
#2  get_global_wrlock (knet_h=knet_h@entry=0x555555577320 <_logsys_log_printf>) at threads_common.c:61
#3  0x00007ffff7e64316 in knet_handle_compress (knet_h=0x555555577320 <_logsys_log_printf>, knet_handle_compress_cfg=0x7fffffffcea0) at compress.c:503
#4  0x000055555559ae8f in totemknet_configure_compression (knet_context=knet_context@entry=0x55555574d900, totem_config=totem_config@entry=0x7fffffffd310) at totemknet.c:1565
#5  0x000055555559c104 in totemknet_initialize (poll_handle=0x5555556ddd30, knet_context=0x55555574d900, totem_config=0x7fffffffd310, stats=<optimized out>, context=0x5555556eed20,
    deliver_fn=0x55555558c810 <main_deliver_fn>, iface_change_fn=0x55555558d9c0 <main_iface_change_fn>, mtu_changed=0x55555558c290 <totempg_mtu_changed>, target_set_completed=0x55555558ddd0 <target_set_completed>)
    at totemknet.c:1149
#6  0x0000555555588550 in totemnet_initialize (loop_pt=loop_pt@entry=0x5555556ddd30, net_context=net_context@entry=0x5555557026f8, totem_config=totem_config@entry=0x7fffffffd310, stats=0x555555702728,
    context=context@entry=0x5555556eed20, deliver_fn=deliver_fn@entry=0x55555558c810 <main_deliver_fn>, iface_change_fn=0x55555558d9c0 <main_iface_change_fn>, mtu_changed=0x55555558c290 <totempg_mtu_changed>,
    target_set_completed=0x55555558ddd0 <target_set_completed>) at totemnet.c:343
#7  0x000055555559541a in totemsrp_initialize (poll_handle=poll_handle@entry=0x5555556ddd30, srp_context=srp_context@entry=0x5555556760f0 <totemsrp_context>, totem_config=totem_config@entry=0x7fffffffd310,
    stats=stats@entry=0x5555556760c0 <totempg_stats>, deliver_fn=deliver_fn@entry=0x5555555970b0 <totempg_deliver_fn>, confchg_fn=confchg_fn@entry=0x5555555965a0 <totempg_confchg_fn>,
    waiting_trans_ack_cb_fn=0x555555596560 <totempg_waiting_trans_ack_cb>) at totemsrp.c:981
#8  0x0000555555597c28 in totempg_initialize (poll_handle=0x5555556ddd30, totem_config=totem_config@entry=0x7fffffffd310) at totempg.c:824
#9  0x000055555555e0de in main (argc=-11504, argv=<optimized out>, envp=<optimized out>) at main.c:1526
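
One detail in the backtrace stands out: in frames #1 to #3 gdb prints knet_h as an address that resolves to the symbol _logsys_log_printf. For a callback argument such as deliver_fn that is expected, but for a handle it may indicate that the wrong pointer was passed down (gdb values in optimized builds are not always reliable, so treat this as a hint only). A minimal Python sketch for spotting such arguments in a saved backtrace follows; the helper name symbol_valued_args and the shortened sample frames are illustrative assumptions, not part of corosync, kronosnet or gdb.

```python
import re

# "name=0xADDR <symbol>": how gdb prints an argument whose address
# resolves to a known symbol.
SYMBOLIC_ARG = re.compile(r"(\w+)=(0x[0-9a-fA-F]+) <([^>]+)>")
# Start of a frame: "#N  [0xADDR in ]function ("
FRAME_START = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\w+) \(")


def symbol_valued_args(backtrace, arg_names):
    """Return (frame_no, function, arg, symbol) for every argument in
    arg_names whose value gdb resolved to a symbol.

    Hypothetical helper for illustration only.
    """
    hits = []
    frame = None
    for line in backtrace.splitlines():
        match = FRAME_START.match(line)
        if match:
            frame = (int(match.group(1)), match.group(2))
        if frame is None:
            continue  # text before the first frame
        for name, _addr, symbol in SYMBOLIC_ARG.findall(line):
            if name in arg_names:
                hits.append((frame[0], frame[1], name, symbol))
    return hits


# Frames #3 and #4 from the report, with frame #4 shortened.
SAMPLE = """\
#3  0x00007ffff7e64316 in knet_handle_compress (knet_h=0x555555577320 <_logsys_log_printf>, knet_handle_compress_cfg=0x7fffffffcea0) at compress.c:503
#4  0x000055555559ae8f in totemknet_configure_compression (knet_context=0x55555574d900, totem_config=0x7fffffffd310) at totemknet.c:1565
"""

for hit in symbol_valued_args(SAMPLE, {"knet_h"}):
    print(hit)
```

The argument names to check are passed in explicitly so that legitimate function-pointer parameters (deliver_fn, iface_change_fn and so on) are not flagged.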

corosync.conf:

# Please read the corosync.conf.5 manual page
totem {
	version: 2

	cluster_name: tele-clu

	key: <snip>
	crypto_cipher: aes256
	crypto_hash: sha256

	knet_compression_model: zlib
	knet_compression_level: 6

	link_mode: passive

	interface {
		linknumber: 0
		knet_link_priority: 1
	}

	interface {
		linknumber: 1
		knet_link_priority: 0
	}
	token: 5000
}

logging {
	# Log the source file and line where messages are being
	# generated. When in doubt, leave off. Potentially useful for
	# debugging.
	fileline: off
	# Log to standard error. When in doubt, set to yes. Useful when
	# running in the foreground (when invoking "corosync -f")
	to_stderr: yes
	# Log to a log file. When set to "no", the "logfile" option
	# must not be set.
	to_logfile: yes
	logfile: /var/log/corosync/corosync.log
	# Log to the system log daemon. When in doubt, set to yes.
	to_syslog: yes
	# Log debug messages (very verbose). When in doubt, leave off.
	debug: off
	# Log messages with time stamps. When in doubt, set to hires (or on)
	#timestamp: hires
	logger_subsys {
		subsys: QUORUM
		debug: off
	}
}

quorum {
	# Enable and configure quorum subsystem (default: off)
	# see also corosync.conf.5 and votequorum.5
	provider: corosync_votequorum
}

nodelist {

	node {
		# Hostname of the node
		name: tele-clu-01
		# Cluster membership node identifier
		nodeid: 1

		ring0_addr: 192.168.233.1
		ring1_addr: 192.168.178.241
	}
	node {
		# Hostname of the node
		name: tele-clu-02
		# Cluster membership node identifier
		nodeid: 2

		ring0_addr: 192.168.233.2
		ring1_addr: 192.168.178.242
	}
	node {
		# Hostname of the node
		name: tele-clu-03
		# Cluster membership node identifier
		nodeid: 3

		ring0_addr: 192.168.233.6
		ring1_addr: 192.168.178.243
	}
}
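
Since the crash is only seen with compression enabled, it can help to check which nodes have it turned on. The following is a rough Python sketch, not an official tool: parse_corosync_conf and compression_enabled are hypothetical helpers, the parser only understands the simple "key: value" and "section { ... }" layout used above, and it assumes that an unset knet_compression_model means "none", as I understand corosync.conf(5) to document.

```python
def parse_corosync_conf(text):
    """Parse corosync.conf-style text into nested dicts.

    Sections are stored as lists of dicts because a section name
    (e.g. "interface" or "node") can repeat. Hypothetical helper,
    far less strict than corosync's own parser.
    """
    root = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("{"):
            child = {}
            stack[-1].setdefault(line[:-1].strip(), []).append(child)
            stack.append(child)
        elif line == "}":
            stack.pop()
        elif ":" in line:
            key, value = line.split(":", 1)
            stack[-1][key.strip()] = value.strip()
    return root


def compression_enabled(conf):
    """True if any totem section sets a compression model other than none."""
    for totem in conf.get("totem", []):
        # Assumption: a missing key is equivalent to "none".
        if totem.get("knet_compression_model", "none").lower() != "none":
            return True
    return False


SAMPLE = """\
totem {
	version: 2
	knet_compression_model: zlib
	knet_compression_level: 6
	interface {
		linknumber: 0
	}
}
"""

print(compression_enabled(parse_corosync_conf(SAMPLE)))
```

On a real node the text would be read from the configuration file instead of the SAMPLE string.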

------------------------------
As commented by fabbione:

It turns out the issue is in corosync's configuration handling when setting up compression.
The fix for master is here: https://github.com/corosync/corosync/pull/631
The same patch applies to 3.1.1.

A backport to 3.1.0 is attached to the upstream issue.


