[Debian-ha-maintainers] drbd8-utils: dual primary under pacemaker leads to split brain on resource stop
bd at bc-bd.org
Tue Jan 8 10:01:40 UTC 2013
Subject: drbd8-utils: dual primary under pacemaker leads to split brain on resource stop
Package: drbd8-utils
Version: 2:8.3.11-3~bpo60+1
Severity: important
Every time a Pacemaker-managed dual-primary DRBD device is stopped, e.g.
through crm resource stop $DEVICE, DRBD ends up in a split-brain state.
I see the following on the machine's console:
  block drbd0: meta connection shut down by peer.
  block drbd0: Sending state for detaching disk failed
Stopping both sides of the drbd device by hand does not result in a split
brain.
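One way to tell a clean stop from the split-brain state described above is to look at /proc/drbd on both nodes after crm resource stop. The following is a minimal Python sketch, assuming the DRBD 8.3.x /proc/drbd layout with cs:, ro: and ds: fields on each device line. The function names and both sample snapshots are hypothetical, not output captured on the affected machines. StandAlone is only a hint: the DRBD user's guide notes that after a split brain at least one node is left StandAlone, but a manual disconnect produces the same state, so check the kernel log before acting.

```python
import re

# Device lines in DRBD 8.3's /proc/drbd look like (assumed format):
#  0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
STATUS_RE = re.compile(
    r"^[ \t]*(?P<minor>\d+):[ \t]+cs:(?P<cs>\S+)[ \t]+ro:(?P<ro>\S+)[ \t]+ds:(?P<ds>\S+)",
    re.MULTILINE,
)


def parse_proc_drbd(text):
    """Map each minor number to its connection state, roles and disk states."""
    return {
        int(m.group("minor")): {
            "cs": m.group("cs"),
            "ro": m.group("ro"),
            "ds": m.group("ds"),
        }
        for m in STATUS_RE.finditer(text)
    }


def possible_split_brain(status):
    """Heuristic only: an unresolved split brain leaves at least one node
    StandAlone, but 'drbdadm disconnect' by an admin does the same."""
    return status["cs"] == "StandAlone"


# Hypothetical /proc/drbd snapshots, not output taken from this report.
HEALTHY = " 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----\n"
AFTER_STOP = (
    " 0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----\n"
    "    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0\n"
)

for label, text in (("healthy", HEALTHY), ("after stop", AFTER_STOP)):
    for minor, st in parse_proc_drbd(text).items():
        flag = " (possible split brain)" if possible_split_brain(st) else ""
        print(f"{label}: drbd{minor} cs={st['cs']} ro={st['ro']}{flag}")
```

On a real node the same functions would be fed the contents of /proc/drbd instead of the sample strings; the statistics line (ns:, nr:, ...) is skipped because it does not start with a minor number.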
Adding
  sleep 1
to the LINBIT drbd resource agent on one or both nodes fixes this.
I can reproduce both the bug and the fix on two 2-node clusters, both 64-bit.
Resource:
resource debian7 {
    disk {
        fencing resource-only;
    }
    net {
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        sndbuf-size 0;
        max-buffers 8000;
        max-epoch-size 8000;
        allow-two-primaries;
    }
    syncer {
        rate 45M;
    }
    on debian8 {
        device /dev/drbd0;
        disk /dev/sys/debian7;
        address 192.168.1.8:7791;
        meta-disk internal;
    }
    on debian9 {
        device /dev/drbd0;
        disk /dev/sys/debian7;
        address 192.168.1.9:7791;
        meta-disk internal;
    }
}
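When comparing this resource between the two nodes, it can help to read its net and disk options programmatically and confirm that allow-two-primaries and the after-sb policies are what each node actually has. The following is a minimal Python sketch, assuming only the subset of drbd.conf syntax used in this report ("option value;" statements, brace-delimited sections, '#' comments and double-quoted strings). It is not drbdadm's own parser, and the helper names tokenize, parse, section and options are hypothetical; drbdadm dump remains the authoritative view of the effective configuration.

```python
import re

# Tokens: '#' comments (dropped later), double-quoted strings, braces or
# semicolons, and bare words such as option names and values.
TOKEN_RE = re.compile(r'#[^\n]*|"[^"]*"|[{};]|[^\s{};"#]+')


def tokenize(text):
    """Split drbd.conf-style text into tokens, discarding comments."""
    return [t for t in TOKEN_RE.findall(text) if not t.startswith("#")]


def parse(tokens, pos=0):
    """Return (items, next_pos); each item is (words, body), where body is
    None for 'option value;' and a nested item list for 'name args { ... }'."""
    items, words = [], []
    while pos < len(tokens):
        tok = tokens[pos]
        pos += 1
        if tok == ";":
            items.append((words, None))
            words = []
        elif tok == "{":
            body, pos = parse(tokens, pos)
            items.append((words, body))
            words = []
        elif tok == "}":
            return items, pos
        else:
            words.append(tok)
    return items, pos


def section(items, *header):
    """Return the body of the first section whose header words match."""
    for words, body in items:
        if body is not None and words == list(header):
            return body
    raise KeyError(" ".join(header))


def options(body):
    """Map option name to its arguments; flags like allow-two-primaries get []."""
    return {words[0]: words[1:] for words, sub in body if sub is None and words}


# Excerpt of the resource from this report (syncer and 'on' sections omitted).
RESOURCE = """
resource debian7 {
    disk { fencing resource-only; }
    net {
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        allow-two-primaries;
    }
}
"""

items, _ = parse(tokenize(RESOURCE))
res = section(items, "resource", "debian7")
net = options(section(res, "net"))
print("dual primary allowed:", "allow-two-primaries" in net)
print("after-sb-2pri:", net["after-sb-2pri"])
print("fencing:", options(section(res, "disk"))["fencing"])
```

Running the same code on each node's copy of the file and comparing the resulting dictionaries is a quick way to spot a setting that differs between nodes.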
-- System Information:
Debian Release: 6.0.6
APT prefers stable
APT policy: (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 3.2.0-0.bpo.4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages drbd8-utils depends on:
ii debconf [debconf-2.0] 1.5.36.1 Debian configuration management sy
ii libc6 2.11.3-4 Embedded GNU C Library: Shared lib
ii pacemaker 1.1.7-1~bpo60+1 HA cluster resource manager
ii corosync 1.4.2-1~bpo60+1 Standards-based cluster framework (daemon and
drbd8-utils recommends no packages.
Versions of packages drbd8-utils suggests:
pn heartbeat <none> (no description available)
-- Configuration Files:
/etc/drbd.d/global_common.conf changed:
global {
    usage-count no;
    # minor-count dialog-refresh disable-ip-verification
}
common {
    protocol C;
    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
        # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
    }
    startup {
        # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
    }
    disk {
        fencing resource-only;
        # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
        # no-disk-drain no-md-flushes max-bio-bvecs
    }
    net {
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
        # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
        # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
    }
    syncer {
        # rate after al-extents use-rle cpu-mask verify-alg csums-alg
    }
}
-- no debconf information