Bug#239111: Grub is shockingly bad code

Steve McIntyre steve at einval.com
Sun Jan 11 17:16:03 UTC 2009


On Sun, Jan 11, 2009 at 12:26:48PM +0100, Robert Millan wrote:
>severity 239111 important
>clone 239111 -1
>retitle -1 should refuse to install on XFS unless embedding can be used
>reassign -1 grub2
>thanks
>
>On Sun, Jan 04, 2009 at 02:25:28AM +0000, Steve McIntyre wrote:
>> 
>> After several hours of working through the source, I give up. It's a
>> total mess and I'd much rather see grub simply removed from Debian
>> altogether than fix this bug and allow it to live on. Highlights:
>> 
>>  * gratuitous use of nested functions
>>  * horrific interfaces that don't pass enough information around
>>    internally, leading to:
>>  * functions passing internal state around via umarked global
>>    variables, relying on side effects
>>  * significantly obfuscated code
>>  * the core bug as described by Rob: accessing a block device
>>    underneath an active filesystem and assuming that the results will
>>    be safe and/or correct.
>> 
>> I *know* that grub is a bootloader, so it's always going to end up
>> having some nasty lowlevel code somewhere. But that's no excuse for
>> the code quality I've just seen. After this experience, I'm about to
>> remove grub from all my systems. Come back lilo, all is forgiven.
>
>You're right.  Code in GRUB Legacy is such a mess that we would have to
>rewrite it from scratch in order to get things right.
>
>Oh, wait, we already did.  It's called GRUB 2.  Though, we're still fond of
>the nested functions.  :-)

Yes, I've looked at the grub2 source and I'm much happier. It doesn't
look like a novice had written it, which is a major improvement. I'm
still curious WTH anybody would think the nested functions are a good
thing, though...

>As for accessing a block device underneath an active filesystem, we don't
>do this anymore, except when we have no other choice.

Yay.

>For XFS, this means that as long as you install GRUB to your MBR (and
>implicitly, post-MBR area), you'll never hit this problem.  That is,
>as long as your core.img fits in 32 kiB, which we are very careful to
>ensure it will always do.
>
>This reminds me, when it doesn't, in the particular case of XFS we need to
>abort install and tell the user why.  I'm cloning this bug for GRUB 2 with
>that purpose.
>
>Robert (et al):  Sorry that you spent so much effort debugging GRUB Legacy;
>we (upstream) have abandoned this codebase long ago.
>
>As for the bug severity, this bug has been around for a while, and we never
>considered it to be Release Critical.  D-I avoids using GRUB Legacy when user
>selected XFS, and rightly so.  As I said, with GRUB 2 it's going to be
>different (if someone wants to know more about the technical details on how
>we changed our approach, feel free to contact me).

The thing I'm more bothered about right now is what will happen to
people with existing systems who upgrade. That's why Rob raised this
bug to grave: his remote system locked up as part of an upgrade. What
do you plan to do to stop this happening again?

-- 
Steve McIntyre, Cambridge, UK.                                steve at einval.com
"This dress doesn't reverse." -- Alden Spiess






More information about the Pkg-grub-devel mailing list