[Debian-ha-maintainers] Bug#939395: ocfs2-tools - FS can't mount at boot on drbd device
Ervin Hegedüs
airween at gmail.com
Wed Sep 4 14:28:51 BST 2019
Package: ocfs2-tools
Version: 1.8.5-7
Severity: normal
Hi,
I have two up-to-date Debian systems, with drbd and ocfs2. There
is only one drbd device, which configured as Primary/Primary.
Everything works as well, except one thing: the system can't
mount the drbd device at boot, I have to do it by hand.
Relevant configs:
# cat /etc/drbd.d/r0.res
resource r0 {
meta-disk internal;
device /dev/drbd0;
syncer {
verify-alg sha1;
}
net {
allow-two-primaries;
}
startup {
become-primary-on both;
}
on t2app1 {
disk /dev/vg-t2app1/lvdrbd0;
address 192.168.72.21:7789;
}
on t2app2 {
disk /dev/vg-t2app2/lvdrbd0;
address 192.168.72.22:7789;
}
}
# cat /etc/ocfs2/cluster.conf
cluster:
node_count = 2
name = data
node:
ip_port = 7777
ip_address = 192.168.72.21
number = 1
name = t2app1
cluster = data
node:
ip_port = 7777
ip_address = 192.168.72.22
number = 2
name = t2app2
cluster = data
# grep drbd /etc/fstab
/dev/drbd0 /drbd ocfs2 _netdev,defaults 0 0
With these settings (total equals...) on Debian 9, there the
mount works as well, without any interruption.
Additional infos:
Status of services after reboot:
# systemctl status drbd.mount
● drbd.mount - /drbd
Loaded: loaded (/etc/fstab; generated)
Active: failed (Result: exit-code) since Wed 2019-09-04 15:13:49 CEST; 16s ago
Where: /drbd
What: /dev/drbd0
Docs: man:fstab(5)
man:systemd-fstab-generator(8)
Sep t 04 15:13:49 t2app1 systemd[1]: Mounting /drbd...
Sep t 04 15:13:49 t2app1 mount[720]: mount.ocfs2: I/O error on channel while opening device /dev/drbd0
Sep t 04 15:13:49 t2app1 systemd[1]: drbd.mount: Mount process exited, code=exited, status=1/FAILURE
Sep t 04 15:13:49 t2app1 systemd[1]: drbd.mount: Failed with result 'exit-code'.
Sep t 04 15:13:49 t2app1 systemd[1]: Failed to mount /drbd.
# systemctl status drbd.service
● drbd.service - LSB: Control DRBD resources.
Loaded: loaded (/etc/init.d/drbd; generated)
Active: active (exited) since Wed 2019-09-04 15:13:53 CEST; 41s ago
Docs: man:systemd-sysv-generator(8)
Process: 686 ExecStart=/etc/init.d/drbd start (code=exited, status=0/SUCCESS)
Sep t 04 15:13:49 t2app1 systemd[1]: Starting LSB: Control DRBD resources....
Sep t 04 15:13:49 t2app1 drbd[686]: Starting DRBD resources:[
Sep t 04 15:13:49 t2app1 drbd[686]: create res: r0
Sep t 04 15:13:49 t2app1 drbd[686]: prepare disk: r0
Sep t 04 15:13:49 t2app1 drbd[686]: adjust disk: r0
Sep t 04 15:13:49 t2app1 drbd[686]: adjust net: r0
Sep t 04 15:13:49 t2app1 drbd[686]: ]
Sep t 04 15:13:53 t2app1 drbd[686]: WARN: stdin/stdout is not a TTY; using /dev/console.
Sep t 04 15:13:53 t2app1 systemd[1]: Started LSB: Control DRBD resources..
# systemctl status ocfs2.service
● ocfs2.service - Mount ocfs2 Filesystems
Loaded: loaded (/lib/systemd/system/ocfs2.service; enabled; vendor preset: enabled)
Active: active (exited) since Wed 2019-09-04 15:13:49 CEST; 1min 40s ago
Docs: man:ocfs2(7)
man:mount.ocfs2(8)
Process: 796 ExecStart=/usr/lib/ocfs2-tools/ocfs2 start (code=exited, status=0/SUCCESS)
Main PID: 796 (code=exited, status=0/SUCCESS)
Sep t 04 15:13:49 t2app1 systemd[1]: Starting Mount ocfs2 Filesystems...
Sep t 04 15:13:49 t2app1 ocfs2[796]: Starting Oracle Cluster File System (OCFS2) mount.ocfs2: I/O error on channel while opening device /dev/drbd0
Sep t 04 15:13:49 t2app1 ocfs2[796]: Failed
Sep t 04 15:13:49 t2app1 systemd[1]: Started Mount ocfs2 Filesystems.
# grep drbd /var/log/syslog
Sep 4 15:13:29 t2app1 systemd[1]: dev-drbd0.device: Dependency Before=network-online.target ignored (.device units cannot be delayed)
Sep 4 15:13:29 t2app1 systemd[1]: dev-drbd0.device: Dependency Before=network.target ignored (.device units cannot be delayed)
Sep 4 15:13:31 t2app1 systemd[1]: Unmounting /drbd...
Sep 4 15:13:31 t2app1 systemd[1021]: drbd.mount: Succeeded.
Sep 4 15:13:49 t2app1 systemd-modules-load[373]: Inserted module 'drbd'
Sep 4 15:13:49 t2app1 kernel: [ 3.422172] drbd: initialized. Version: 8.4.10 (api:1/proto:86-101)
Sep 4 15:13:49 t2app1 kernel: [ 3.422173] drbd: srcversion: 9B4D87C5E865DF526864868
Sep 4 15:13:49 t2app1 kernel: [ 3.422174] drbd: registered as block device major 147
Sep 4 15:13:49 t2app1 drbd[686]: Starting DRBD resources:[
Sep 4 15:13:49 t2app1 drbd[686]: create res: r0
Sep 4 15:13:49 t2app1 drbd[686]: prepare disk: r0
Sep 4 15:13:49 t2app1 systemd[1]: Found device /dev/drbd0.
Sep 4 15:13:49 t2app1 systemd[1]: Mounting /drbd...
Sep 4 15:13:49 t2app1 mount[720]: mount.ocfs2: I/O error on channel while opening device /dev/drbd0
Sep 4 15:13:49 t2app1 systemd[1]: drbd.mount: Mount process exited, code=exited, status=1/FAILURE
Sep 4 15:13:49 t2app1 systemd[1]: drbd.mount: Failed with result 'exit-code'.
Sep 4 15:13:49 t2app1 systemd[1]: Failed to mount /drbd.
Sep 4 15:13:49 t2app1 kernel: [ 3.964487] drbd r0: Starting worker thread (from drbdsetup-84 [727])
Sep 4 15:13:49 t2app1 kernel: [ 3.964926] block drbd0: disk( Diskless -> Attaching )
Sep 4 15:13:49 t2app1 kernel: [ 3.964984] drbd r0: Method to ensure write ordering: flush
Sep 4 15:13:49 t2app1 kernel: [ 3.964986] block drbd0: max BIO size = 1048576
Sep 4 15:13:49 t2app1 kernel: [ 3.964988] block drbd0: drbd_bm_resize called with capacity == 20970808
Sep 4 15:13:49 t2app1 kernel: [ 3.965029] block drbd0: resync bitmap: bits=2621351 words=40959 pages=80
Sep 4 15:13:49 t2app1 kernel: [ 3.965030] block drbd0: size = 10 GB (10485404 KB)
Sep 4 15:13:49 t2app1 kernel: [ 3.967075] block drbd0: recounting of set bits took additional 0 jiffies
Sep 4 15:13:49 t2app1 kernel: [ 3.967077] block drbd0: 8192 KB (2048 bits) marked out-of-sync by on disk bit-map.
Sep 4 15:13:49 t2app1 kernel: [ 3.967080] block drbd0: disk( Attaching -> UpToDate )
Sep 4 15:13:49 t2app1 kernel: [ 3.967082] block drbd0: attached to UUIDs 0756A08452088D5B:0000000000000000:724560592EBD27FB:724460592EBD27FB
Sep 4 15:13:49 t2app1 drbd[686]: adjust disk: r0
Sep 4 15:13:49 t2app1 kernel: [ 3.969313] drbd r0: conn( StandAlone -> Unconnected )
Sep 4 15:13:49 t2app1 kernel: [ 3.969324] drbd r0: Starting receiver thread (from drbd_w_r0 [730])
Sep 4 15:13:49 t2app1 kernel: [ 3.969443] drbd r0: receiver (re)started
Sep 4 15:13:49 t2app1 kernel: [ 3.969450] drbd r0: conn( Unconnected -> WFConnection )
Sep 4 15:13:49 t2app1 drbd[686]: adjust net: r0
Sep 4 15:13:49 t2app1 drbd[686]: ]
Sep 4 15:13:49 t2app1 ocfs2[796]: Starting Oracle Cluster File System (OCFS2) mount.ocfs2: I/O error on channel while opening device /dev/drbd0
Sep 4 15:13:53 t2app1 kernel: [ 8.202810] drbd r0: Handshake successful: Agreed network protocol version 101
Sep 4 15:13:53 t2app1 kernel: [ 8.202812] drbd r0: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
Sep 4 15:13:53 t2app1 kernel: [ 8.202852] drbd r0: conn( WFConnection -> WFReportParams )
Sep 4 15:13:53 t2app1 kernel: [ 8.202853] drbd r0: Starting ack_recv thread (from drbd_r_r0 [741])
Sep 4 15:13:53 t2app1 kernel: [ 8.238525] block drbd0: drbd_sync_handshake:
Sep 4 15:13:53 t2app1 kernel: [ 8.238527] block drbd0: self 0756A08452088D5A:0000000000000000:724560592EBD27FB:724460592EBD27FB bits:2048 flags:0
Sep 4 15:13:53 t2app1 kernel: [ 8.238529] block drbd0: peer 2F26127377330A65:0756A08452088D5B:724560592EBD27FB:724460592EBD27FB bits:1 flags:0
Sep 4 15:13:53 t2app1 kernel: [ 8.238530] block drbd0: uuid_compare()=-1 by rule 50
Sep 4 15:13:53 t2app1 kernel: [ 8.238532] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate )
Sep 4 15:13:53 t2app1 kernel: [ 8.246935] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 25(1), total 25; compression: 100.0%
Sep 4 15:13:53 t2app1 kernel: [ 8.246982] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 32(1), total 32; compression: 100.0%
Sep 4 15:13:53 t2app1 kernel: [ 8.246984] block drbd0: conn( WFBitMapT -> WFSyncUUID )
Sep 4 15:13:53 t2app1 kernel: [ 8.251561] block drbd0: updated sync uuid 0757A08452088D5A:0000000000000000:724560592EBD27FB:724460592EBD27FB
Sep 4 15:13:53 t2app1 kernel: [ 8.251920] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
Sep 4 15:13:53 t2app1 kernel: [ 8.252922] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Sep 4 15:13:53 t2app1 kernel: [ 8.252926] block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
Sep 4 15:13:53 t2app1 kernel: [ 8.252935] block drbd0: Began resync as SyncTarget (will sync 8192 KB [2048 bits set]).
Sep 4 15:13:53 t2app1 kernel: [ 8.253390] block drbd0: role( Secondary -> Primary )
Sep 4 15:13:53 t2app1 drbd[686]: WARN: stdin/stdout is not a TTY; using /dev/console.
Sep 4 15:13:58 t2app1 kernel: [ 12.624923] block drbd0: Resync done (total 4 sec; paused 0 sec; 2048 K/sec)
Sep 4 15:13:58 t2app1 kernel: [ 12.624926] block drbd0: updated UUIDs 2F26127377330A65:0000000000000000:0757A08452088D5B:0756A08452088D5B
Sep 4 15:13:58 t2app1 kernel: [ 12.624930] block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Sep 4 15:13:58 t2app1 kernel: [ 12.625225] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
Sep 4 15:13:58 t2app1 kernel: [ 12.626743] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
# mount | grep drbd
[EMPTY]
Workaround 1:
Add some delay to ocfs2.service start:
# diff -ruN ocfs2.service /lib/systemd/system/ocfs2.service
--- ocfs2.service 2019-09-04 14:43:55.613155935 +0200
+++ /lib/systemd/system/ocfs2.service 2019-09-04
15:21:34.502068371 +0200
@@ -7,6 +7,7 @@
[Service]
Type=oneshot
RemainAfterExit=yes
+ExecStartPre=/bin/sleep 3
ExecStart=/usr/lib/ocfs2-tools/ocfs2 start
ExecStop=/usr/lib/ocfs2-tools/ocfs2 stop
ExecReload=/usr/lib/ocfs2-tools/ocfs2 restart
Workaround 2:
create an rc.local systemd unit, and place a 'mount -a' command
at there:
# cat <<EOF >> /etc/systemd/system/rc-local.service
[Unit]
Description=/etc/rc.local
ConditionPathExists=/etc/rc.local
[Service]
Type=forking
ExecStart=/etc/rc.local start
TimeoutSec=0
StandardOutput=tty
RemainAfterExit=yes
SysVStartPriority=99
[Install]
WantedBy=multi-user.target
EOF
cat <<EOF >> /etc/rc.local
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the
# execution
# bits.
#
# By default this script does nothing.
/bin/sleep 3 && mount -a
exit 0
EOF
# chmod +x /etc/rc.local
# systemctl enable rc-local
# reboot
Hope this helps,
a.
More information about the Debian-ha-maintainers
mailing list