Bug#610184: Some debugging of superfluous RAID member
Anthony DeRobertis
aderobertis at metrics.net
Mon Jan 17 22:01:06 UTC 2011
First off, DEB_BUILD_OPTIONS nostrip,noopt does not work right:
grub-probe is never built with -g -O0. I hacked up ./debian/rules a
little to force it (added CFLAGS='-g -O0' to confflags). I also removed
most of the packages from debian/control so I wouldn't have to wait
through N flavors of grub building.
I'm comparing grub-probe --target=device / on two machines with very
similar RAID configs: both have a RAID1 /boot on 4 disks with no spares,
and RAID10 for everything else (via LVM), also on 4 disks with no spares.
The difference seems to be that on the machine with the errors,
grub_mdraid_detect on hd0 returns GRUB_ERR_NONE, whereas on the other
machine it returns GRUB_ERR_OUT_OF_RANGE. This is because, on the
machine with the errors, it finds the version 1.0 superblock that
actually belongs to /dev/sda2 when looking for one belonging to /dev/sda.
I suppose the alignment just doesn't work out on the other machine.
"Gond" is the machine that shows errors. "Zia" is the one that does not.
Zia:
root@Zia:/home/anthony# sfdisk -d /dev/sda
# partition table of /dev/sda
unit: sectors
/dev/sda1 : start= 63, size= 256977, Id=fd, bootable
/dev/sda2 : start= 257040, size= 257040, Id= c
/dev/sda3 : start= 514080, size=1953005985, Id=fd
/dev/sda4 : start= 0, size= 0, Id= 0
(gdb) p sector
$18 = 1953525152
(gdb) n
401 if (grub_disk_read (disk, sector, 0, sizeof (struct grub_raid_super_1x),
(gdb) n
405 if (sb_1x.magic == SB_MAGIC)
(gdb) p sb_1x
$19 = {magic = 0, major_version = 0, feature_map = 0, pad0 = 0,
set_uuid = '\000' <repeats 15 times>, set_name = '\000' <repeats 31 times>, ctime = 0,
level = 0, layout = 0, size = 0, chunksize = 0, raid_disks = 0, bitmap_offset = 0,
new_level = 0, reshape_position = 0, delta_disks = 0, new_layout = 0, new_chunk = 0,
pad1 = "\000\000\000", data_offset = 0, data_size = 0, super_offset = 0, recovery_offset = 0,
dev_number = 0, cnt_corrected_read = 0, device_uuid = '\000' <repeats 15 times>,
devflags = 0 '\000', pad2 = "\000\000\000\000\000\000", utime = 0, events = 0,
resync_offset = 0, sb_csum = 0, max_dev = 0, pad3 = '\000' <repeats 31 times>,
dev_roles = 0x7fffffffcc90}
(gdb) p *disk
$20 = {name = 0x671090 "hd0", dev = 0x6425c0, total_sectors = 1953525168, has_partitions = 1,
id = 0, partition = 0x0, read_hook = 0, data = 0x6763b0}
And Gond:
root@Gond:~# sfdisk -d /dev/sda
# partition table of /dev/sda
unit: sectors
/dev/sda1 : start= 2048, size= 262144, Id=fd, bootable
/dev/sda2 : start= 264192, size=976508976, Id=fd
/dev/sda3 : start= 0, size= 0, Id= 0
/dev/sda4 : start= 0, size= 0, Id= 0
(gdb) p sector
$10 = 976773152
(gdb) n
401 if (grub_disk_read (disk, sector, 0, sizeof (struct grub_raid_super_1x),
(gdb) n
405 if (sb_1x.magic == SB_MAGIC)
(gdb) p sb_1x
$11 = {magic = 2838187772, major_version = 1, feature_map = 1, pad0 = 0,
set_uuid = "\265\363\344\263yw\303H\342\"\203\373w\235n\217",
set_name = "Gond:1", '\000' <repeats 25 times>, ctime = 1279225379, level = 10, layout = 258,
size = 976507904, chunksize = 1024, raid_disks = 4, bitmap_offset = 4294967288, new_level = 0,
reshape_position = 0, delta_disks = 0, new_layout = 0, new_chunk = 0, pad1 = "\000\000\000",
data_offset = 0, data_size = 976508704, super_offset = 976508960, recovery_offset = 0,
dev_number = 0, cnt_corrected_read = 0, device_uuid = "\361\344Wp+\006\245\346(?\307\027[Lj.",
devflags = 0 '\000', pad2 = "\000\000\000\000\000\000", utime = 1295300868, events = 36368,
resync_offset = 18446744073709551615, sb_csum = 2056702893, max_dev = 384,
pad3 = '\000' <repeats 31 times>, dev_roles = 0x7fffffffd840}
(gdb) p *disk
$13 = {name = 0x671090 "hd0", dev = 0x6425c0, total_sectors = 976773168, has_partitions = 1,
id = 0, partition = 0x0, read_hook = 0, data = 0x6710b0}
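To sanity-check the alignment theory, here is a small Python sketch (a
model, not grub code). It assumes the usual version 1.0 placement rule:
the superblock sits 8 KiB before the end of the device, rounded down to a
4 KiB boundary. That rule is an assumption on my part, but it reproduces
both `sector` values gdb printed above. The helper names are made up for
illustration, and whether Zia's sda3 actually carries a 1.0 superblock is
also assumed.

```python
# All quantities are in 512-byte sectors.

def sb_1_0_sector(total_sectors):
    """Sector where a v1.0 superblock is looked for on a device of this size
    (assumed rule: 8 KiB from the end, rounded down to a 4 KiB boundary)."""
    return (total_sectors - 8 * 2) & ~(4 * 2 - 1)

def absolute_sb_sector(part_start, part_size):
    """Same location for a partition, expressed as a whole-disk sector."""
    return part_start + sb_1_0_sector(part_size)

# Gond: sda2 runs to the very end of the disk (264192 + 976508976 == 976773168).
gond_disk = sb_1_0_sector(976773168)
gond_sda2 = absolute_sb_sector(264192, 976508976)

# Zia: sda3 ends before the end of the disk.
zia_disk = sb_1_0_sector(1953525168)
zia_sda3 = absolute_sb_sector(514080, 1953005985)

print("Gond: disk probe", gond_disk, "sda2 superblock", gond_sda2)
print("Zia:  disk probe", zia_disk, "sda3 superblock", zia_sda3)
```

Under these assumptions the two locations coincide on Gond (both are
sector 976773152) and differ on Zia (1953525152 versus 1953520048), which
would explain why only Gond sees a superblock when probing the whole disk.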
So this bug has probably been present for a while, but maybe it didn't
matter until the recent error checks were added?
There must be a way around this, since mdadm -E is not confused:
root@Gond:~# mdadm -E /dev/sda
mdadm: No md superblock detected on /dev/sda.
So it can be fixed. One thing that quickly jumps out at me: have
grub_mdraid_detect check that sector == sb_1x.super_offset, and ignore
the superblock if it does not match.
I can work up a patch to do that if you'd like.
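A rough model of that check, in Python rather than as an actual patch.
The function name is hypothetical, and it assumes that `sector` is
relative to the device being probed and that SB_MAGIC is 0xa92b4efc
(which is what the magic value 2838187772 in the Gond dump works out to).
The numbers are taken from the Gond session above.

```python
SB_MAGIC = 0xA92B4EFC  # assumed value; equals 2838187772 from the gdb dump

def superblock_belongs_here(magic, super_offset, probed_sector):
    """Accept a v1.0 superblock only if it says it lives at the sector
    we actually read it from (hypothetical helper, not grub's API)."""
    return magic == SB_MAGIC and super_offset == probed_sector

# Probing the whole disk (hd0) on Gond: read from sector 976773152,
# but the superblock records super_offset = 976508960, so reject it.
whole_disk_ok = superblock_belongs_here(2838187772, 976508960, 976773152)

# Probing sda2 itself: the probe sector relative to the partition would be
# (976508976 - 16) & ~7 == 976508960, which matches super_offset.
partition_ok = superblock_belongs_here(2838187772, 976508960, 976508960)

# Zia's whole-disk probe read all zeroes, so the magic test already fails.
zia_ok = superblock_belongs_here(0, 0, 1953525152)

print(whole_disk_ok, partition_ok, zia_ok)
```

With a check like this, the whole-disk probe on Gond would no longer
claim the superblock, while a probe of the partition would still
accept it.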