[Debian-ha-maintainers] Bug#962454: Link failures after upgrade to +deb10u1
Alberto Gonzalez Iniesta
agi at inittab.org
Mon Jun 8 11:29:35 BST 2020
Source: corosync
Version: 3.0.1-2+deb10u1
Severity: important
Hi,
Some weeks ago I upgraded corosync (3.0.1-2 -> 3.0.1-2+deb10u1) and
started to notice these messages on my nodes (a two-node cluster):
Jun 2 01:10:13 patty corosync[2346]: [KNET ] link: host: 2 link: 0 is down
Jun 2 01:10:13 patty corosync[2346]: [KNET ] host: host: 2 (passive) best link: 1 (pri: 1)
Jun 2 01:10:14 patty corosync[2346]: [KNET ] rx: host: 2 link: 0 is up
Jun 2 01:10:14 patty corosync[2346]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Jun 3 03:11:07 patty corosync[2346]: [KNET ] link: host: 2 link: 1 is down
Jun 3 03:11:07 patty corosync[2346]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Jun 3 03:11:08 patty corosync[2346]: [KNET ] rx: host: 2 link: 1 is up
Jun 3 03:11:08 patty corosync[2346]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Notice that the failure happens on both links. One of the links is a
cross-over cable. The other uses a bond with two interfaces.
These errors are more common on one of the nodes than on the other.
Sometimes they match (both nodes log the link failure), but most of the
time only one node complains:
Jun 4 01:16:23 selma corosync[52890]: [KNET ] link: host: 1 link: 0 is down
Jun 4 01:16:23 selma corosync[52890]: [KNET ] host: host: 1 (passive) best link: 1 (pri: 1)
Jun 4 01:16:24 selma corosync[52890]: [KNET ] rx: host: 1 link: 0 is up
Jun 4 01:16:24 selma corosync[52890]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Jun 4 01:16:55 patty corosync[2346]: [KNET ] link: host: 2 link: 0 is down
Jun 4 01:16:55 patty corosync[2346]: [KNET ] host: host: 2 (passive) best link: 1 (pri: 1)
Jun 4 01:16:56 patty corosync[2346]: [KNET ] rx: host: 2 link: 0 is up
Jun 4 01:16:56 patty corosync[2346]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
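To put numbers on how often each link drops and which node reports it, the
"is down" events can be tallied from syslog. The following is only a rough
sketch: the regular expression assumes the syslog line format shown above, and
the names DOWN_RE and count_link_down are illustrative, not part of corosync
or kronosnet.

```python
import re
from collections import Counter

# Matches lines such as:
#   Jun 2 01:10:13 patty corosync[2346]: [KNET ] link: host: 2 link: 0 is down
# Assumption: classic syslog timestamps and the "[KNET ]" tag as logged above.
DOWN_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<node>\S+)\s+corosync\[\d+\]:\s+"
    r"\[KNET\s*\]\s+link: host: (?P<peer>\d+) link: (?P<link>\d+) is down"
)


def count_link_down(lines):
    """Return a Counter keyed by (reporting node, peer host id, link number)."""
    counts = Counter()
    for line in lines:
        match = DOWN_RE.match(line.strip())
        if match:
            key = (match["node"], int(match["peer"]), int(match["link"]))
            counts[key] += 1
    return counts


# A few lines copied from the logs in this report.
sample = [
    "Jun 2 01:10:13 patty corosync[2346]: [KNET ] link: host: 2 link: 0 is down",
    "Jun 2 01:10:14 patty corosync[2346]: [KNET ] rx: host: 2 link: 0 is up",
    "Jun 3 03:11:07 patty corosync[2346]: [KNET ] link: host: 2 link: 1 is down",
    "Jun 4 01:16:23 selma corosync[52890]: [KNET ] link: host: 1 link: 0 is down",
    "Jun 4 01:16:55 patty corosync[2346]: [KNET ] link: host: 2 link: 0 is down",
]

for (node, peer, link), n in sorted(count_link_down(sample).items()):
    print(f"{node}: link {link} to host {peer} went down {n} time(s)")
```

Feeding it the full syslog from both nodes would show whether one link or one
node accounts for most of the drops.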
Here's my config:
totem {
    version: 2
    cluster_name: web
    crypto_cipher: none
    crypto_hash: none
    interface {
        linknumber: 0
    }
    interface {
        linknumber: 1
    }
}
logging {
    fileline: off
    to_stderr: yes
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    debug: off
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}
nodelist {
    node {
        name: patty
        nodeid: 1
        ring0_addr: 192.168.144.1
        ring1_addr: 10.10.1.5
    }
    node {
        name: selma
        nodeid: 2
        ring0_addr: 192.168.144.2
        ring1_addr: 10.10.1.6
    }
}
Any help is appreciated. Thanks,
Alberto
-- System Information:
Debian Release: bullseye/sid
APT prefers unstable
APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 5.6.0-1-amd64 (SMP w/4 CPU cores)
Kernel taint flags: TAINT_FIRMWARE_WORKAROUND
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE= (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)