Bug#891434: grub-efi: System fails to boot after "No space left on device" on EFI variable storage

Colin Watson cjwatson at debian.org
Mon Oct 29 00:06:13 GMT 2018


On Sun, Feb 25, 2018 at 04:13:13PM +0100, Ralf Jung wrote:
> earlier today I did a system update, which completed successfully (as in, dpkg
> didn't stop due to an error).  I then rebooted my machine.  This left Linux
> unable to boot; only the Windows entry was left in the boot menu.  After some
> hours of debugging, the problem turned out to be that writing an EFI variable
> fails with "No space left on device".  I did a firmware update (from
> Windows), to no avail.  In the end I booted into a live system, deleted some of
> the "dump-type0-*" variables, rebooted, and then ran "grub-install" from a
> chroot to fix the situation.
> 
> I'm not exactly sure what went wrong here, but clearly the system shouldn't be
> put into an unbootable state ever.  I see two bugs here:
> 
> * First, it looks like something is filling up the EFI variable space.  I've
>   added an `ls -lah` of the efivars folder below.  This is after I deleted
>   roughly 20-30 "dump-type0-*" variables.  Is this the kernel dumping
>   information (about crashes or so)?  If yes, it seems to do so without ever
>   cleaning up or taking free space into account, which I'd consider a serious
>   bug.  Should I report this against the kernel?  I don't even know what creates
>   those EFI variables.

Those are created by the efi_pstore_write function in the kernel.
Beyond that I'm not really familiar with what's going on - you should
ask Debian's kernel folks if you need to pursue this.
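For anyone wanting to see how much of the variable store these take up
before deleting anything, here is a minimal read-only sketch.  It
assumes efivarfs is mounted at its usual location,
/sys/firmware/efi/efivars, where each variable appears as a file named
"Name-GUID".  efi-pstore uses the "dump-type0-*" variables to persist
kernel log dumps (for example, on an oops or panic).  The function name
pstore_dump_vars is purely illustrative, and the sketch only lists
variables; it never deletes any.

```python
from pathlib import Path

# Assumed mount point of efivarfs; adjust if it is mounted elsewhere.
EFIVARS_DIR = "/sys/firmware/efi/efivars"


def pstore_dump_vars(efivars_dir=EFIVARS_DIR, prefix="dump-type0-"):
    """Return a sorted list of (filename, size_in_bytes) for pstore dumps.

    Hypothetical helper: it only reads directory entries, so it is safe
    to run on a live system.  Sizes are whatever the filesystem reports.
    """
    root = Path(efivars_dir)
    if not root.is_dir():
        # No efivarfs here (e.g. a BIOS-booted system): nothing to report.
        return []
    found = []
    for entry in root.iterdir():
        if entry.name.startswith(prefix) and entry.is_file():
            found.append((entry.name, entry.stat().st_size))
    return sorted(found)


if __name__ == "__main__":
    dumps = pstore_dump_vars()
    total = sum(size for _, size in dumps)
    print(f"{len(dumps)} pstore dump variables, {total} bytes reported")
```

A large count here would be consistent with the kind of exhaustion
described above, though how much space the firmware actually has free
is not visible from this listing.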

> * Second, does grub-install really have to delete and create EFI variables even
>   when nothing changed?  It seems to me that writing an EFI variable is only
>   necessary when initially installing GRUB.  Even if writing is necessary, a
>   check could be done *before* deleting the boot entry whether it will be
>   possible to write it again later.  Right now, it seems that grub will happily
>   delete the debian boot entry and then fail to create it again -- and this
>   doesn't even make the system update fail.

Fixing this does seem like it would be a good idea for general
robustness against dodgy firmware (this is not the first iteration of
problems along these lines).  It would take some development work, but
hopefully not too much.

Things that GRUB can't do, as far as I can tell:

 * I don't think there's a way for GRUB to check whether it will be
   possible to recreate a boot entry later; as I understand it, that
   depends on various low-level details, including firmware-specific
   quirks.
   
 * Even detecting that nothing changed would require cooperation from
   efibootmgr, since the encoding of the EFI variable is an
   implementation detail there (so we can't just read it out and
   compare), and efibootmgr doesn't expose a way for GRUB to say "set
   this configuration, but only if it's different from what's already
   there".

However, I think GRUB can at least manage to delete all but one entry
from the same distributor rather than all of them, and if it finds one
remaining entry then it can modify that rather than writing a brand new
variable.  As I understand it, that would probably be enough to fix this
bug?
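To make the proposed logic concrete, here is a rough sketch of the
decision step only.  It is not how grub-install is structured today.
BootEntry and plan_update are hypothetical names, entries are assumed to
be identified by a label equal to the distributor name, and the actual
variable writes (via efibootmgr or otherwise) are left out.  Keeping the
lowest-numbered entry is an arbitrary choice for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BootEntry:
    number: int  # the XXXX in BootXXXX
    label: str   # description shown in the firmware boot menu


def plan_update(entries, distributor):
    """Plan actions that leave exactly one entry labelled `distributor`.

    Returns a list of ("delete", n), ("modify", n) or ("create", None).
    """
    matching = sorted(
        (e for e in entries if e.label == distributor),
        key=lambda e: e.number,
    )
    if not matching:
        # Nothing to reuse, so a brand-new variable has to be written.
        return [("create", None)]
    keep, *extra = matching
    # Remove duplicates first; this can only free variable space.
    actions = [("delete", e.number) for e in extra]
    # Rewrite the surviving entry in place rather than deleting it and
    # creating a new one, so it is never deliberately removed.
    actions.append(("modify", keep.number))
    return actions
```

For example, with entries 0 and 3 labelled "debian" and entry 1
labelled "Windows Boot Manager", plan_update(entries, "debian") gives
[("delete", 3), ("modify", 0)].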

-- 
Colin Watson                                       [cjwatson at debian.org]



More information about the Pkg-grub-devel mailing list