Bug#478238: grub-probe: fails to find drive for /dev/sda10
Török Edwin
edwintorok at gmail.com
Sun May 11 11:18:58 UTC 2008
[sending to grub-devel@ as requested]
Robert Millan wrote:
> On Sun, May 04, 2008 at 05:01:32PM +0300, Török Edwin wrote:
>
>>>> Device     Boot     Start        End     Blocks   Id  System
>>>> /dev/sda1   *           1       1275   10241406    7  HPFS/NTFS
>>>> /dev/sda2            1276       2248    7815622+  a6  OpenBSD
>>>> /dev/sda3            2249       5289   24426832+   f  W95 Ext'd (LBA)
>>>> /dev/sda4            6080       7296    9775552+  bf  Solaris
>>>> /dev/sda5            2249       2371     987966   82  Linux swap / Solaris
>>>> /dev/sda6            2372       3587    9767488+  83  Linux
>>>> /dev/sda7            3588       3600     104391   83  Linux
>>>> /dev/sda8            3601       4863   10145016   8e  Linux LVM
>>>> /dev/sda9            4864       5228    2931831   a6  OpenBSD
>>>> /dev/sda10           5229       5289     489951   83  Linux
>>>>
>> [...]
>> grub> ls (hd0,10)
>> error: unknown device
>> grub> ls (hd0,11)
>> error: unknown device
>> grub>
>>
>
> I tried reproducing your setup, but I can't hit the same bug. This starts to
> look really nasty. Just spotted this:
>
> /build/buildd/grub2-1.96+20080426/partmap/pc.c:141: partition 0: flag 0x80, type 0x7, start 0x3f, len 0x1388afc
> [...]
> /build/buildd/grub2-1.96+20080426/partmap/pc.c:141: partition 0: flag 0x0, type 0x82, start 0x2270f07, len 0x1e267c
>
> for which I can't find any explanation other than memory corruption. Also,
> due to a missing fflush() call the output is somewhat scrambled, which makes
> it harder to track (I fixed this already in upstream).
>
> Could you:
>
> - Apply the attached patch & run grub-probe again (this time output
> will be a bit more readable)
>
There was no patch attached; however, I did a 'cvs diff -u -D2008-04-30'
and applied that patch.
I found what the problem is, and it also explains why you couldn't
reproduce the problem.
/dev/sda9 is not a valid OpenBSD partition, and at partmap/pc.c:176 the
iteration fails with the error: invalid disk label magic 0x%x.
If I replace that return with a continue, it works.
The problem is that grub2 stops looking for more partitions as soon as
it encounters the invalid partition.
grub 0.97 was working perfectly, and I never noticed that the partition
had the wrong type!
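
To make the return-versus-continue difference concrete, here is a minimal
sketch. It is not the code in partmap/pc.c: struct part, reaches_partition()
and the skip_invalid flag are hypothetical names invented for illustration.
The only values taken from real life are the OpenBSD type byte 0xa6 from the
fdisk listing and the BSD disklabel magic number.

```c
#include <stddef.h>
#include <stdint.h>

#define DISKLABEL_MAGIC 0x82564557u /* BSD disklabel magic number */
#define TYPE_OPENBSD    0xa6        /* MBR type byte, as in the fdisk listing */

/* Hypothetical, simplified partition record; not GRUB's real structure. */
struct part
{
  uint8_t type;         /* MBR partition type */
  uint32_t label_magic; /* magic found where a disklabel is expected */
};

/* Walk the table looking for entry `wanted' (0-based).
   Returns 1 if the walk reaches it, 0 otherwise.
   skip_invalid == 0: give up at the first bad disklabel (the "return").
   skip_invalid != 0: ignore the bad entry and move on (the "continue").  */
int
reaches_partition (const struct part *table, size_t count,
                   size_t wanted, int skip_invalid)
{
  size_t i;

  for (i = 0; i < count; i++)
    {
      if (table[i].type == TYPE_OPENBSD
          && table[i].label_magic != DISKLABEL_MAGIC)
        {
          if (skip_invalid)
            continue;   /* bad label: skip only this entry */
          return 0;     /* bad label: abort the whole scan */
        }

      if (i == wanted)
        return 1;
    }

  return 0;
}
```

With a table whose second-to-last entry has type 0xa6 but no valid magic,
asking for the last entry fails when skip_invalid is 0 and succeeds when it
is 1, which is the behaviour I saw with sda9 and sda10.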
Also, if I change the partition type to 83 (as it should be), an unpatched
grub-probe can find that /boot is on /dev/sda10:
# grub-probe -t device /boot
/dev/sda10
I think grub2 should handle errors more gracefully: possibly mark the
partition as invalid, and keep going.
grub-probe was looking for /dev/sda10, and it shouldn't be affected by
/dev/sda9 being corrupted/invalid.
Think of it this way: if a partition gets corrupted, that shouldn't
prevent booting, as long as the boot and root partitions are still ok.
Compare what grub-emu says when sda9 has the wrong type:
grub> ls (hd0,10)
error: unknown device
And this is what it says when sda9 has the correct type:
grub> ls (hd0,10)
Partition hd0,10: Filesystem type ext2, Label debian_BOOT
> - Send it to grub-devel at gnu.org
>
Done
> ?
>
> Maybe someone there has an idea, but if it's memory corruption and we can't
> reproduce it, tracing the problem remotely isn't going to work very well.
>
It wasn't memory corruption; however, I ran valgrind and it showed
some leaks, plus a call to stat() with a NULL parameter.
The attached patch fixes some of the valgrind warnings. Some leaks still
remain; I attached the new valgrind logs.
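
For illustration only (this is not the contents of grub2.patch), the kind of
guard that avoids handing a NULL path to stat() could look like the sketch
below. checked_stat() is a hypothetical wrapper name, and returning EINVAL
for a NULL argument is my own assumption about a reasonable error code.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical wrapper, not a GRUB function: refuse a NULL path or
   NULL buffer instead of passing it on to stat().  */
int
checked_stat (const char *path, struct stat *st)
{
  if (path == NULL || st == NULL)
    {
      errno = EINVAL;   /* assumed error code for a missing argument */
      return -1;
    }

  return stat (path, st);
}
```

A caller can then treat a missing path like any other stat() failure.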
P.S.: grub2 seems to work now; I am able to boot with it using the
text-mode menu. The default graphics mode doesn't work; I will open a
separate bug about that.
Best regards,
--Edwin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: grub2.patch
Type: text/x-diff
Size: 1032 bytes
Desc: not available
Url : http://lists.alioth.debian.org/pipermail/pkg-grub-devel/attachments/20080511/7658484b/attachment.patch
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: vallog
Url: http://lists.alioth.debian.org/pipermail/pkg-grub-devel/attachments/20080511/7658484b/attachment.txt