Bug#782793: systemd: ext4 filesystem on lvm on raid causes boot to enter emergency shell
Rick Thomas
rbthomas at pobox.com
Wed Apr 22 11:10:28 BST 2015
Hi Michael,
Thanks very much for helping me with this. (continued following quoted material)
On Apr 21, 2015, at 11:17 AM, Michael Biebl <biebl at debian.org> wrote:
> control: tags -1 moreinfo unreproducible
>
> Am 18.04.2015 um 02:02 schrieb Rick Thomas:
>>
>> On Apr 17, 2015, at 3:44 PM, Michael Biebl <biebl at debian.org> wrote:
>>>
>>> Thanks for the data.
>>> Looks like an lvm issue to me:
>>>
>>> root at cube:~# lvscan
>>> inactive '/dev/vg1/backup' [87.29 GiB] inherit
>>>
>>> and as a result, /dev/disk/by-label/BACKUP is missing.
>>
>> Yes, that’s true, of course. But the question is, what is keeping lvm from activating the volume?
>>
>> It works fine for a logical volume on a single physical disk. And /proc/mdstat shows that the RAID device, /dev/md127, _is_ active. Or, at least it is when we get to emergency mode… I don’t know if it’s active when the fsck times out, of course… If you know how to figure that out from the systemd journal I attached to the original bug report, or any other way that I can try, I’d appreciate any assistance you can give!
>
> fwiw, I tried to reproduce the problem in a VM with two additional disks
> attached and a setup like the following:
>
> ext4 on RAID1 (via mdadm)
> ext4 on LVM on RAID1 (mdadm)
> ext4 on LVM
> ext4 on dos partition.
>
> All partitions were correctly mounted during boot without any issues.
>
>
> Is this a fresh jessie installation or an upgraded system?
> Do you have any custom udev rules in /lib/udev/rules.d or /etc/udev/rules.d?
>
> If it's an upgraded system and you have the sysvinit package installed,
> you can try booting with sysvinit temporarily via
> init=/lib/sysvinit/init on the kernel command line.
>
> Does that work?
My physical setup is this: the hardware is a quad-core armv7 Cubox i4pro (https://wiki.debian.org/ArmHardFloatPort/CuBox-i).
With some help from Karsten Merker, I got a plain-vanilla, unmodified Jessie installed on it to use for experimenting. I wanted experience with the Cubox hardware and with using Jessie in a “real life” situation.
The boot (and system residency: root, swap, /home, /var, the works) is on a 32GB microSD card.
To that I’ve added an eSATA 1TB hard disk (currently configured as a single filesystem using LVM) and a 7-port USB 2.0 hub with a 32GB USB flash stick in each of 5 ports. Those 5 devices are configured as a software (md) RAID6 array (I wanted to get some experience with RAID6), providing about 90GB of usable space, configured with LVM as a single logical volume.
It’s the RAID6 array (or rather the lv on it) that is having the problem.
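For reference, that layout can be reproduced with something like the following (the device names are illustrative; the volume group, logical volume, and label match the lvscan output quoted above):
###################################
# Illustrative: /dev/sd[b-f] stand in for the 5 USB flash sticks.
mdadm --create /dev/md127 --level=6 --raid-devices=5 /dev/sd[b-f]
pvcreate /dev/md127
vgcreate vg1 /dev/md127
lvcreate -l 100%FREE -n backup vg1
mkfs.ext4 -L BACKUP /dev/vg1/backup
###################################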
I’ve managed to make it work using a cron script that runs at reboot time. The crontab entry is:
@reboot bash -x bin/mount-backup
and the mount-backup script looks like this:
###################################
logger -s -t mount-backup 'Mounting the /backup filesystem'
(
    let count=10        # don't try uselessly forever if it fails
    # If the device doesn't exist, take remedial action.
    while [ ! -h /dev/disk/by-label/BACKUP ]
    do
        let count=$count-1
        [ $count -lt 1 ] && exit 1
        sleep 10            # give things some time to settle
        cat /proc/mdstat    # show some debugging information
        # see if the raid has assembled and can be used
        /sbin/vgchange -ay
    done
    # If the fsck isn't perfect, quit and wait for human intervention
    /sbin/fsck -nf /backup && /bin/mount /backup
) | logger -s -t mount-backup
###################################
This works. Interestingly, without the sleep loop the vgchange fails.
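A cleaner refinement might be to let udev tell the script when device events have been processed, instead of sleeping a fixed interval. Something like this in place of the sleep (untested; udevadm settle only waits for events already queued, so it may still need the retry loop around it):
###################################
# Untested idea: wait (up to 30s) for udev to finish processing
# queued device events, then retry activating the volume group.
/sbin/udevadm settle --timeout=30
/sbin/vgchange -ay
###################################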
Now, you say that a VM with two virtual disks configured as RAID1 with a logical volume works fresh out of the box. This makes me wonder if it’s some kind of timing problem… It takes a few seconds for the freshly rebooted system to find the USB flash sticks and assemble the array, so some timeout in systemd is presumably being triggered on my setup, while your setup has no such physical constraint: everything is available immediately.
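If that guess is right, one directly testable tweak is to lengthen systemd’s per-device timeout for that one mount in /etc/fstab (the x-systemd.device-timeout option documented in systemd.mount(5)). Assuming the filesystem is mounted by label, something like:

  LABEL=BACKUP  /backup  ext4  defaults,x-systemd.device-timeout=180s  0  2

Adding nofail as well should keep a still-missing array from dropping the boot into the emergency shell at all.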
That’s just a guess… But fortunately, it’s a testable guess!
My setup is (at the moment) just for experimentation and learning, with no actual useful work on it, so I can re-install it at will or make whatever changes are needed to track this down.
Your suggestion about trying sysvinit will be a good place to start. If that works with my workaround script disabled, the next experiment will be to try systemd with rootdelay=10. I will also try the VM setup, just to see if I can replicate your result. After that I’m not sure; any suggestions will be appreciated!
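To see whether the array is actually assembled by the time the fsck times out, I can also grep the journal from a failing boot, e.g.:

  journalctl -b | grep -iE 'md127|vg1|backup'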
I’ll get back to you when I’ve made those tests. Real-life(TM) will probably prevent me from doing that before the weekend.
Enjoy! And Thanks for all the help!
Rick