[Debian-ha-maintainers] Bug#986325: corosync: crash with compression enabled
Ferenc Wágner
wferi at debian.org
Sat Apr 3 09:04:00 BST 2021
Package: corosync
Version: 3.1.0-3
Severity: normal
Tags: patch upstream
Forwarded: https://github.com/corosync/corosync/issues/630
As reported by Lukey3332:
Sometimes corosync crashes at startup, but only if compression is enabled.
Distribution: Debian Bullseye
Corosync version:
Corosync Cluster Engine, version '3.1.0'
Copyright (c) 2006-2018 Red Hat, Inc.
Kronosnet version:
Package: libknet1
Source: kronosnet
Version: 1.20-4
Here is a backtrace:
#0 __GI___pthread_mutex_lock (mutex=mutex@entry=0x555558619958) at ../nptl/pthread_mutex_lock.c:67
#1 0x00007ffff7e6c728 in pmtud_reschedule (knet_h=0x555555577320 <_logsys_log_printf>, knet_h@entry=0x555558619958) at threads_common.c:42
#2 get_global_wrlock (knet_h=knet_h@entry=0x555555577320 <_logsys_log_printf>) at threads_common.c:61
#3 0x00007ffff7e64316 in knet_handle_compress (knet_h=0x555555577320 <_logsys_log_printf>, knet_handle_compress_cfg=0x7fffffffcea0) at compress.c:503
#4 0x000055555559ae8f in totemknet_configure_compression (knet_context=knet_context@entry=0x55555574d900, totem_config=totem_config@entry=0x7fffffffd310) at totemknet.c:1565
#5 0x000055555559c104 in totemknet_initialize (poll_handle=0x5555556ddd30, knet_context=0x55555574d900, totem_config=0x7fffffffd310, stats=<optimized out>, context=0x5555556eed20,
    deliver_fn=0x55555558c810 <main_deliver_fn>, iface_change_fn=0x55555558d9c0 <main_iface_change_fn>, mtu_changed=0x55555558c290 <totempg_mtu_changed>, target_set_completed=0x55555558ddd0 <target_set_completed>)
    at totemknet.c:1149
#6 0x0000555555588550 in totemnet_initialize (loop_pt=loop_pt@entry=0x5555556ddd30, net_context=net_context@entry=0x5555557026f8, totem_config=totem_config@entry=0x7fffffffd310, stats=0x555555702728,
    context=context@entry=0x5555556eed20, deliver_fn=deliver_fn@entry=0x55555558c810 <main_deliver_fn>, iface_change_fn=0x55555558d9c0 <main_iface_change_fn>, mtu_changed=0x55555558c290 <totempg_mtu_changed>,
    target_set_completed=0x55555558ddd0 <target_set_completed>) at totemnet.c:343
#7 0x000055555559541a in totemsrp_initialize (poll_handle=poll_handle@entry=0x5555556ddd30, srp_context=srp_context@entry=0x5555556760f0 <totemsrp_context>, totem_config=totem_config@entry=0x7fffffffd310,
    stats=stats@entry=0x5555556760c0 <totempg_stats>, deliver_fn=deliver_fn@entry=0x5555555970b0 <totempg_deliver_fn>, confchg_fn=confchg_fn@entry=0x5555555965a0 <totempg_confchg_fn>,
    waiting_trans_ack_cb_fn=0x555555596560 <totempg_waiting_trans_ack_cb>) at totemsrp.c:981
#8 0x0000555555597c28 in totempg_initialize (poll_handle=0x5555556ddd30, totem_config=totem_config@entry=0x7fffffffd310) at totempg.c:824
#9 0x000055555555e0de in main (argc=-11504, argv=<optimized out>, envp=<optimized out>) at main.c:1526
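One detail worth noting in the backtrace above: in frames #2 and #3 gdb annotates the knet_h argument with the symbol _logsys_log_printf, which is not what one would expect a knet handle to point at. The following is a minimal Python sketch for pulling such symbol-annotated arguments out of a gdb backtrace so they are easier to spot. The function name symbol_valued_args and the SAMPLE excerpt are illustrative only, and the regular expression assumes gdb's usual `name=0xADDR <symbol>` rendering. It also accepts ` at entry` in place of `@entry`, on the assumption that a mail archive may have rewritten the `@` sign.

```python
import re

# Matches gdb argument renderings such as
#   knet_h=0x555555577320 <_logsys_log_printf>
#   knet_h@entry=0x555555577320 <_logsys_log_printf>
# and the archive-mangled form "knet_h at entry=0x... <...>".
ARG_WITH_SYMBOL = re.compile(
    r"(?P<name>\w+)(?:(?:@| at )entry)?="
    r"(?P<addr>0x[0-9a-fA-F]+) <(?P<symbol>[^>]+)>"
)
FRAME_START = re.compile(r"^#(?P<num>\d+)\s")


def symbol_valued_args(backtrace):
    """Yield (frame, arg_name, address, symbol) for every argument
    that gdb annotated with a symbol name."""
    frame = None
    for line in backtrace.splitlines():
        start = FRAME_START.match(line.strip())
        if start:
            frame = int(start.group("num"))
        for m in ARG_WITH_SYMBOL.finditer(line):
            yield frame, m.group("name"), m.group("addr"), m.group("symbol")


# Two frames copied from the backtrace in this report.
SAMPLE = """\
#2 get_global_wrlock (knet_h=knet_h@entry=0x555555577320 <_logsys_log_printf>) at threads_common.c:61
#3 0x00007ffff7e64316 in knet_handle_compress (knet_h=0x555555577320 <_logsys_log_printf>, knet_handle_compress_cfg=0x7fffffffcea0) at compress.c:503
"""

for frame, name, addr, symbol in symbol_valued_args(SAMPLE):
    print(f"frame #{frame}: {name} = {addr} resolves to {symbol}")
```

Callback arguments such as deliver_fn legitimately resolve to function symbols, so the output still needs a human to judge which entries are suspicious.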
corosync.conf:
# Please read the corosync.conf.5 manual page
totem {
    version: 2
    cluster_name: tele-clu
    key: <snip>
    crypto_cipher: aes256
    crypto_hash: sha256
    knet_compression_model: zlib
    knet_compression_level: 6
    link_mode: passive
    interface {
        linknumber: 0
        knet_link_priority: 1
    }
    interface {
        linknumber: 1
        knet_link_priority: 0
    }
    token: 5000
}
logging {
    # Log the source file and line where messages are being
    # generated. When in doubt, leave off. Potentially useful for
    # debugging.
    fileline: off
    # Log to standard error. When in doubt, set to yes. Useful when
    # running in the foreground (when invoking "corosync -f")
    to_stderr: yes
    # Log to a log file. When set to "no", the "logfile" option
    # must not be set.
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    # Log to the system log daemon. When in doubt, set to yes.
    to_syslog: yes
    # Log debug messages (very verbose). When in doubt, leave off.
    debug: off
    # Log messages with time stamps. When in doubt, set to hires (or on)
    #timestamp: hires
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
quorum {
    # Enable and configure quorum subsystem (default: off)
    # see also corosync.conf.5 and votequorum.5
    provider: corosync_votequorum
}
nodelist {
    node {
        # Hostname of the node
        name: tele-clu-01
        # Cluster membership node identifier
        nodeid: 1
        ring0_addr: 192.168.233.1
        ring1_addr: 192.168.178.241
    }
    node {
        # Hostname of the node
        name: tele-clu-02
        # Cluster membership node identifier
        nodeid: 2
        ring0_addr: 192.168.233.2
        ring1_addr: 192.168.178.242
    }
    node {
        # Hostname of the node
        name: tele-clu-03
        # Cluster membership node identifier
        nodeid: 3
        ring0_addr: 192.168.233.6
        ring1_addr: 192.168.178.243
    }
}
------------------------------
As commented by fabbione:
It turns out the issue is in corosync's configuration handling when setting up compression.
Fix for master is here: https://github.com/corosync/corosync/pull/631
The same patch applies to 3.1.1.
A backport to 3.1.0 is attached to the upstream issue.
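
Since the crash was only reported with compression enabled, it can be useful to check whether a given corosync.conf actually turns knet compression on. Below is a minimal Python sketch of such a check, not an official tool. The function names are made up for illustration, the parser handles only the simple `section { key: value }` layout seen in the configuration above, and it assumes that an unset knet_compression_model means "none", as documented in corosync.conf(5).

```python
def knet_compression_model(conf_text):
    """Return the value of totem.knet_compression_model,
    or "none" if the option is not set (assumed default)."""
    stack = []      # names of the sections currently open
    model = "none"
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        if line.endswith("{"):
            stack.append(line[:-1].strip())
        elif line == "}":
            if stack:
                stack.pop()
        elif ":" in line and stack == ["totem"]:
            # Only look at options directly inside totem { ... },
            # not inside nested interface { ... } blocks.
            key, _, value = line.partition(":")
            if key.strip() == "knet_compression_model":
                model = value.strip()
    return model


def compression_enabled(conf_text):
    """True if the configuration selects any compression model."""
    return knet_compression_model(conf_text).lower() != "none"
```

To use it, read the configuration file (commonly /etc/corosync/corosync.conf) into a string and pass it to compression_enabled(). For the configuration in this report it would return True, because knet_compression_model is set to zlib.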