Bug#743512: grub-pc: grub-probe fails to locate md device (no such disk) when grub-installing
Adam Allred
adam at gtisc.gatech.edu
Thu Apr 3 15:06:16 UTC 2014
Package: grub-pc
Version: 1.99-27+deb7u2
Severity: normal
Dear Maintainer,
I have a Debian Wheezy amd64 server consisting of a Supermicro X8DTi-F motherboard (BIOS version 2.1), a Supermicro 846E16-R900B chassis with a BPN-SAS2-846EL1 backplane, an LSI 9211-8i HBA with P14 IT firmware, and two md RAID arrays. The first was a RAID10 array of 6 x 300GB Intel 320 SSDs (md0) using msdos partition tables; the second is a RAID10 of 18 Seagate ST4000DM000 HDDs (md1) using GPT.
I need to increase the size of md0, so I began replacing each of the 300GB Intel 320s with a Samsung 840 EVO 1TB drive. The following is my standard process for each drive:
root at server:/home/adam# cat /proc/mdstat
Personalities : [raid10]
md1 : active raid10 sdn2[19] sdj2[0] sdx2[17] sdv2[16] sdw2[15] sdu2[14] sdt2[13] sds2[12] sdm2[11] sdo2[9] sdr2[18] sdp2[7] sdq2[6] sdg2[5] sdl2[4] sdk2[3] sdh2[2] sdi2[1]
35161966080 blocks super 1.2 512K chunks 2 near-copies [18/18] [UUUUUUUUUUUUUUUUUU]
md0 : active raid10 sde1[10] sdd1[9] sdc1[8] sdb1[6] sda1[7] sdf1[5]
878710272 blocks super 1.2 512K chunks 2 near-copies [6/6] [UUUUUU]
unused devices: <none>
root at server:/home/adam# mdadm --fail /dev/md0 /dev/sdf1
mdadm: set /dev/sdf1 faulty in /dev/md0
root at server:/home/adam# mdadm --remove /dev/md0 /dev/sdf1
mdadm: hot removed /dev/sdf1 from /dev/md0
root at server:/home/adam# cat /proc/mdstat
Personalities : [raid10]
md1 : active raid10 sdn2[19] sdj2[0] sdx2[17] sdv2[16] sdw2[15] sdu2[14] sdt2[13] sds2[12] sdm2[11] sdo2[9] sdr2[18] sdp2[7] sdq2[6] sdg2[5] sdl2[4] sdk2[3] sdh2[2] sdi2[1]
35161966080 blocks super 1.2 512K chunks 2 near-copies [18/18] [UUUUUUUUUUUUUUUUUU]
md0 : active raid10 sde1[10] sdd1[9] sdc1[8] sdb1[6] sda1[7]
878710272 blocks super 1.2 512K chunks 2 near-copies [6/5] [UUUUU_]
unused devices: <none>
<physically replace the drive>
root at server:/home/adam# dd if=/dev/sde of=/dev/sdf bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.00303809 s, 169 kB/s
root at server:/home/adam# echo w | fdisk /dev/sdf
Command (m for help): The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
root at server:/home/adam# mdadm --add /dev/md0 /dev/sdf1
mdadm: added /dev/sdf1
root at server:/home/adam# cat /proc/mdstat
Personalities : [raid10]
md1 : active raid10 sdn2[19] sdj2[0] sdx2[17] sdv2[16] sdw2[15] sdu2[14] sdt2[13] sds2[12] sdm2[11] sdo2[9] sdr2[18] sdp2[7] sdq2[6] sdg2[5] sdl2[4] sdk2[3] sdh2[2] sdi2[1]
35161966080 blocks super 1.2 512K chunks 2 near-copies [18/18] [UUUUUUUUUUUUUUUUUU]
md0 : active raid10 sdf1[11] sde1[10] sdd1[9] sdc1[8] sdb1[6] sda1[7]
878710272 blocks super 1.2 512K chunks 2 near-copies [6/5] [UUUUU_]
[>....................] recovery = 0.0% (60160/292903424) finish=243.3min speed=20053K/sec
unused devices: <none>
root at server:/home/adam# grub-install --recheck /dev/sdf
At this point something went wrong, but only for the last four SSDs replaced (sd[c-f]):
root at server:/home/adam# grub-install --recheck /dev/sdf
/usr/sbin/grub-probe: error: no such disk.
Auto-detection of a filesystem of /dev/md0 failed.
Try with --recheck.
If the problem persists please report this together with the output of "/usr/sbin/grub-probe --device-map="/boot/grub/device.map" --target=fs -v /boot/grub" to <bug-grub at gnu.org>
The output of grub-probe is as follows:
root at server:/home/adam# /usr/sbin/grub-probe --device-map="/boot/grub/device.map" --target=fs -v /boot/grub
/usr/sbin/grub-probe: info: Scanning for dmraid_nv RAID devices on disk hd0.
/usr/sbin/grub-probe: info: the size of hd0 is 1953525168.
/usr/sbin/grub-probe: info: the size of hd0 is 1953525168.
/usr/sbin/grub-probe: info: Scanning for dmraid_nv RAID devices on disk hd1.
/usr/sbin/grub-probe: info: the size of hd1 is 1953525168.
....
....
....
/usr/sbin/grub-probe: info: the size of hd23 is 7814037168.
/usr/sbin/grub-probe: info: no LVM signature found.
/usr/sbin/grub-probe: info: opening mduuid/48fdedd049d45f9a3963420a24b16940.
/usr/sbin/grub-probe: error: no such disk.
The mduuid is correct:
root at server:/home/adam# mdadm --examine --scan
ARRAY /dev/md/1 metadata=1.2 UUID=532ef032:cd182e40:575dfcd5:d6c2aa73 name=server:1
ARRAY /dev/md/0 metadata=1.2 UUID=48fdedd0:49d45f9a:3963420a:24b16940 name=server:0
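The comparison behind "the mduuid is correct" can be sketched in shell: grub-probe prints the array UUID as one run of hex digits, while mdadm separates it into four colon-delimited groups. The helper name normalize_md_uuid below is hypothetical and only illustrates the check; it is not part of mdadm or GRUB.

```shell
# Hypothetical helper (not part of mdadm or GRUB): strip the colons from an
# mdadm-style UUID so it can be compared with the mduuid/... name that
# grub-probe -v prints.
normalize_md_uuid() {
    printf '%s\n' "$1" | tr -d ':'
}

mdadm_uuid="48fdedd0:49d45f9a:3963420a:24b16940"   # from mdadm --examine --scan
grub_uuid="48fdedd049d45f9a3963420a24b16940"       # from the grub-probe output

if [ "$(normalize_md_uuid "$mdadm_uuid")" = "$grub_uuid" ]; then
    echo "UUIDs match"
else
    echo "UUIDs differ"
fi
```

With the two values from this report, the check reports a match, so grub-probe is looking for the right array and simply fails to open it.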
What's really mind-boggling is that if I do this:
root at server:/home/adam# mdadm --examine /dev/sdf1
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x2
Array UUID : 48fdedd0:49d45f9a:3963420a:24b16940
Name : server:0 (local to host server)
Creation Time : Thu Nov 29 21:55:02 2012
Raid Level : raid10
Raid Devices : 6
Avail Dev Size : 585807872 (279.33 GiB 299.93 GB)
Array Size : 878710272 (838.00 GiB 899.80 GB)
Used Dev Size : 585806848 (279.33 GiB 299.93 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Recovery Offset : 329516928 sectors
State : active
Device UUID : ae156010:d7785167:59f3471c:8a17568d
Update Time : Thu Apr 3 14:57:29 2014
Checksum : 31857985 - correct
Events : 135343
Layout : near=2
Chunk Size : 512K
Device Role : Active device 5
Array State : AAAAAA ('A' == active, '.' == missing)
Then run grub-install again:
root at server:/home/adam# grub-install --recheck /dev/sdf
Installation finished. No error reported.
Then it completes with no problem. This consistently worked for all four of the drives which exhibited this behavior. I have, on rare occasions, encountered the same problem in the past; a reboot of the server also "fixed" it.
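The workaround can be wrapped in a small shell function, shown here only as a sketch. The function name grub_install_with_examine and the RUN variable are hypothetical; the two commands are exactly the ones used above. Why running mdadm --examine first makes grub-probe succeed is not known, so this reproduces the observed sequence rather than a fix.

```shell
# Sketch of the workaround: examine the new md member first, then install
# GRUB on the whole disk. RUN is a hypothetical hook: set RUN=echo to print
# the commands instead of executing them.
grub_install_with_examine() {
    disk=$1    # whole disk, e.g. /dev/sdf
    part=$2    # md member partition on that disk, e.g. /dev/sdf1

    # Stop if the partition cannot be examined.
    ${RUN:-} mdadm --examine "$part" || return 1

    ${RUN:-} grub-install --recheck "$disk"
}
```

For the drive in this report it would be called as root with /dev/sdf and /dev/sdf1 as arguments; prefixing the call with RUN=echo shows the two commands without touching the disks.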
I checked for bug reports upstream to see if this was corrected in a newer version, but I did not see anything that pertained to my issue. Given I have a simple and effective workaround, it's not a big problem; it's just weird.