[Debian-ha-maintainers] Bug#939395: ocfs2-tools - FS can't mount at boot on drbd device

Ervin Hegedüs airween at gmail.com
Wed Sep 4 14:28:51 BST 2019


Package: ocfs2-tools
Version: 1.8.5-7
Severity: normal

Hi,

I have two up-to-date Debian systems, with drbd and ocfs2. There
is only one drbd device, which configured as Primary/Primary.

Everything works as well, except one thing: the system can't
mount the drbd device at boot, I have to do it by hand.

Relevant configs:

# cat /etc/drbd.d/r0.res 
resource r0 {
  meta-disk internal;
  device /dev/drbd0;
  syncer {
    verify-alg sha1;
  }
  net {
    allow-two-primaries;
  }
  startup {
    become-primary-on both;
  }
  on t2app1 {
    disk /dev/vg-t2app1/lvdrbd0;
    address 192.168.72.21:7789;
  }
  on t2app2 {
    disk /dev/vg-t2app2/lvdrbd0;
    address 192.168.72.22:7789;
  }
}

# cat /etc/ocfs2/cluster.conf 
cluster:
    node_count = 2
    name = data

node:
    ip_port = 7777
    ip_address = 192.168.72.21
    number = 1
    name = t2app1
    cluster = data

node:
    ip_port = 7777
    ip_address = 192.168.72.22
    number = 2
    name = t2app2
    cluster = data

# grep drbd /etc/fstab 
/dev/drbd0	/drbd	ocfs2	_netdev,defaults	0	0


With these settings (total equals...) on Debian 9, there the
mount works as well, without any interruption.



Additional infos:


Status of services after reboot:

# systemctl status drbd.mount
● drbd.mount - /drbd
   Loaded: loaded (/etc/fstab; generated)
   Active: failed (Result: exit-code) since Wed 2019-09-04 15:13:49 CEST; 16s ago
    Where: /drbd
     What: /dev/drbd0
     Docs: man:fstab(5)
           man:systemd-fstab-generator(8)

Sep t 04 15:13:49 t2app1 systemd[1]: Mounting /drbd...
Sep t 04 15:13:49 t2app1 mount[720]: mount.ocfs2: I/O error on channel while opening device /dev/drbd0
Sep t 04 15:13:49 t2app1 systemd[1]: drbd.mount: Mount process exited, code=exited, status=1/FAILURE
Sep t 04 15:13:49 t2app1 systemd[1]: drbd.mount: Failed with result 'exit-code'.
Sep t 04 15:13:49 t2app1 systemd[1]: Failed to mount /drbd.

# systemctl status drbd.service
● drbd.service - LSB: Control DRBD resources.
   Loaded: loaded (/etc/init.d/drbd; generated)
   Active: active (exited) since Wed 2019-09-04 15:13:53 CEST; 41s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 686 ExecStart=/etc/init.d/drbd start (code=exited, status=0/SUCCESS)

Sep t 04 15:13:49 t2app1 systemd[1]: Starting LSB: Control DRBD resources....
Sep t 04 15:13:49 t2app1 drbd[686]: Starting DRBD resources:[
Sep t 04 15:13:49 t2app1 drbd[686]:      create res: r0
Sep t 04 15:13:49 t2app1 drbd[686]:    prepare disk: r0
Sep t 04 15:13:49 t2app1 drbd[686]:     adjust disk: r0
Sep t 04 15:13:49 t2app1 drbd[686]:      adjust net: r0
Sep t 04 15:13:49 t2app1 drbd[686]: ]
Sep t 04 15:13:53 t2app1 drbd[686]: WARN: stdin/stdout is not a TTY; using /dev/console.
Sep t 04 15:13:53 t2app1 systemd[1]: Started LSB: Control DRBD resources..

# systemctl status ocfs2.service
● ocfs2.service - Mount ocfs2 Filesystems
   Loaded: loaded (/lib/systemd/system/ocfs2.service; enabled; vendor preset: enabled)
   Active: active (exited) since Wed 2019-09-04 15:13:49 CEST; 1min 40s ago
     Docs: man:ocfs2(7)
           man:mount.ocfs2(8)
  Process: 796 ExecStart=/usr/lib/ocfs2-tools/ocfs2 start (code=exited, status=0/SUCCESS)
 Main PID: 796 (code=exited, status=0/SUCCESS)

Sep t 04 15:13:49 t2app1 systemd[1]: Starting Mount ocfs2 Filesystems...
Sep t 04 15:13:49 t2app1 ocfs2[796]: Starting Oracle Cluster File System (OCFS2) mount.ocfs2: I/O error on channel while opening device /dev/drbd0
Sep t 04 15:13:49 t2app1 ocfs2[796]: Failed
Sep t 04 15:13:49 t2app1 systemd[1]: Started Mount ocfs2 Filesystems.

# grep drbd /var/log/syslog
Sep  4 15:13:29 t2app1 systemd[1]: dev-drbd0.device: Dependency Before=network-online.target ignored (.device units cannot be delayed)
Sep  4 15:13:29 t2app1 systemd[1]: dev-drbd0.device: Dependency Before=network.target ignored (.device units cannot be delayed)
Sep  4 15:13:31 t2app1 systemd[1]: Unmounting /drbd...
Sep  4 15:13:31 t2app1 systemd[1021]: drbd.mount: Succeeded.
Sep  4 15:13:49 t2app1 systemd-modules-load[373]: Inserted module 'drbd'
Sep  4 15:13:49 t2app1 kernel: [    3.422172] drbd: initialized.  Version: 8.4.10 (api:1/proto:86-101)
Sep  4 15:13:49 t2app1 kernel: [    3.422173] drbd: srcversion: 9B4D87C5E865DF526864868 
Sep  4 15:13:49 t2app1 kernel: [    3.422174] drbd: registered as block device major 147
Sep  4 15:13:49 t2app1 drbd[686]: Starting DRBD resources:[
Sep  4 15:13:49 t2app1 drbd[686]:      create res: r0
Sep  4 15:13:49 t2app1 drbd[686]:    prepare disk: r0
Sep  4 15:13:49 t2app1 systemd[1]: Found device /dev/drbd0.
Sep  4 15:13:49 t2app1 systemd[1]: Mounting /drbd...
Sep  4 15:13:49 t2app1 mount[720]: mount.ocfs2: I/O error on channel while opening device /dev/drbd0
Sep  4 15:13:49 t2app1 systemd[1]: drbd.mount: Mount process exited, code=exited, status=1/FAILURE
Sep  4 15:13:49 t2app1 systemd[1]: drbd.mount: Failed with result 'exit-code'.
Sep  4 15:13:49 t2app1 systemd[1]: Failed to mount /drbd.
Sep  4 15:13:49 t2app1 kernel: [    3.964487] drbd r0: Starting worker thread (from drbdsetup-84 [727])
Sep  4 15:13:49 t2app1 kernel: [    3.964926] block drbd0: disk( Diskless -> Attaching ) 
Sep  4 15:13:49 t2app1 kernel: [    3.964984] drbd r0: Method to ensure write ordering: flush
Sep  4 15:13:49 t2app1 kernel: [    3.964986] block drbd0: max BIO size = 1048576
Sep  4 15:13:49 t2app1 kernel: [    3.964988] block drbd0: drbd_bm_resize called with capacity == 20970808
Sep  4 15:13:49 t2app1 kernel: [    3.965029] block drbd0: resync bitmap: bits=2621351 words=40959 pages=80
Sep  4 15:13:49 t2app1 kernel: [    3.965030] block drbd0: size = 10 GB (10485404 KB)
Sep  4 15:13:49 t2app1 kernel: [    3.967075] block drbd0: recounting of set bits took additional 0 jiffies
Sep  4 15:13:49 t2app1 kernel: [    3.967077] block drbd0: 8192 KB (2048 bits) marked out-of-sync by on disk bit-map.
Sep  4 15:13:49 t2app1 kernel: [    3.967080] block drbd0: disk( Attaching -> UpToDate ) 
Sep  4 15:13:49 t2app1 kernel: [    3.967082] block drbd0: attached to UUIDs 0756A08452088D5B:0000000000000000:724560592EBD27FB:724460592EBD27FB
Sep  4 15:13:49 t2app1 drbd[686]:     adjust disk: r0
Sep  4 15:13:49 t2app1 kernel: [    3.969313] drbd r0: conn( StandAlone -> Unconnected ) 
Sep  4 15:13:49 t2app1 kernel: [    3.969324] drbd r0: Starting receiver thread (from drbd_w_r0 [730])
Sep  4 15:13:49 t2app1 kernel: [    3.969443] drbd r0: receiver (re)started
Sep  4 15:13:49 t2app1 kernel: [    3.969450] drbd r0: conn( Unconnected -> WFConnection ) 
Sep  4 15:13:49 t2app1 drbd[686]:      adjust net: r0
Sep  4 15:13:49 t2app1 drbd[686]: ]
Sep  4 15:13:49 t2app1 ocfs2[796]: Starting Oracle Cluster File System (OCFS2) mount.ocfs2: I/O error on channel while opening device /dev/drbd0
Sep  4 15:13:53 t2app1 kernel: [    8.202810] drbd r0: Handshake successful: Agreed network protocol version 101
Sep  4 15:13:53 t2app1 kernel: [    8.202812] drbd r0: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
Sep  4 15:13:53 t2app1 kernel: [    8.202852] drbd r0: conn( WFConnection -> WFReportParams ) 
Sep  4 15:13:53 t2app1 kernel: [    8.202853] drbd r0: Starting ack_recv thread (from drbd_r_r0 [741])
Sep  4 15:13:53 t2app1 kernel: [    8.238525] block drbd0: drbd_sync_handshake:
Sep  4 15:13:53 t2app1 kernel: [    8.238527] block drbd0: self 0756A08452088D5A:0000000000000000:724560592EBD27FB:724460592EBD27FB bits:2048 flags:0
Sep  4 15:13:53 t2app1 kernel: [    8.238529] block drbd0: peer 2F26127377330A65:0756A08452088D5B:724560592EBD27FB:724460592EBD27FB bits:1 flags:0
Sep  4 15:13:53 t2app1 kernel: [    8.238530] block drbd0: uuid_compare()=-1 by rule 50
Sep  4 15:13:53 t2app1 kernel: [    8.238532] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate ) 
Sep  4 15:13:53 t2app1 kernel: [    8.246935] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 25(1), total 25; compression: 100.0%
Sep  4 15:13:53 t2app1 kernel: [    8.246982] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 32(1), total 32; compression: 100.0%
Sep  4 15:13:53 t2app1 kernel: [    8.246984] block drbd0: conn( WFBitMapT -> WFSyncUUID ) 
Sep  4 15:13:53 t2app1 kernel: [    8.251561] block drbd0: updated sync uuid 0757A08452088D5A:0000000000000000:724560592EBD27FB:724460592EBD27FB
Sep  4 15:13:53 t2app1 kernel: [    8.251920] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
Sep  4 15:13:53 t2app1 kernel: [    8.252922] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Sep  4 15:13:53 t2app1 kernel: [    8.252926] block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent ) 
Sep  4 15:13:53 t2app1 kernel: [    8.252935] block drbd0: Began resync as SyncTarget (will sync 8192 KB [2048 bits set]).
Sep  4 15:13:53 t2app1 kernel: [    8.253390] block drbd0: role( Secondary -> Primary ) 
Sep  4 15:13:53 t2app1 drbd[686]: WARN: stdin/stdout is not a TTY; using /dev/console.
Sep  4 15:13:58 t2app1 kernel: [   12.624923] block drbd0: Resync done (total 4 sec; paused 0 sec; 2048 K/sec)
Sep  4 15:13:58 t2app1 kernel: [   12.624926] block drbd0: updated UUIDs 2F26127377330A65:0000000000000000:0757A08452088D5B:0756A08452088D5B
Sep  4 15:13:58 t2app1 kernel: [   12.624930] block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate ) 
Sep  4 15:13:58 t2app1 kernel: [   12.625225] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
Sep  4 15:13:58 t2app1 kernel: [   12.626743] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)

# mount | grep drbd
[EMPTY]




Workaround 1:

Add some delay to ocfs2.service start:

# diff -ruN ocfs2.service /lib/systemd/system/ocfs2.service
--- ocfs2.service	2019-09-04 14:43:55.613155935 +0200
+++ /lib/systemd/system/ocfs2.service	2019-09-04
15:21:34.502068371 +0200
@@ -7,6 +7,7 @@
 [Service]
 Type=oneshot
 RemainAfterExit=yes
+ExecStartPre=/bin/sleep 3
 ExecStart=/usr/lib/ocfs2-tools/ocfs2 start
 ExecStop=/usr/lib/ocfs2-tools/ocfs2 stop
 ExecReload=/usr/lib/ocfs2-tools/ocfs2 restart


Workaround 2:

create an rc.local systemd unit, and place a 'mount -a' command
at there:

# cat <<EOF >> /etc/systemd/system/rc-local.service
[Unit]
Description=/etc/rc.local
ConditionPathExists=/etc/rc.local
 
[Service]
Type=forking
ExecStart=/etc/rc.local start
TimeoutSec=0
StandardOutput=tty
RemainAfterExit=yes
SysVStartPriority=99
 
[Install]
WantedBy=multi-user.target
EOF


cat <<EOF >> /etc/rc.local
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the
# execution
# bits.
#
# By default this script does nothing.

/bin/sleep 3 && mount -a
 
exit 0
EOF

# chmod +x /etc/rc.local
# systemctl enable rc-local
# reboot


Hope this helps,


a.



More information about the Debian-ha-maintainers mailing list