Bug#966575: Symbol `grub_calloc' not found: AWS instance

Wed Aug 12 14:55:22 BST 2020

I've been affected by this issue on an AWS EC2 instance. 

The particular issue with AWS is that device names
depend on the instance type: on newer hardware disks
appear as NVMe devices, and on older hardware as
/dev/xvd? or /dev/sd?.  The Debian cloud images have
unattended upgrades enabled, and I guess that the grub
update was installed while the instance was running on
hardware with NVMe disks, whereas the system had
originally been installed on older hardware.  My fstab
refers to the disks using UUIDs; I believe that some
distributions install symlinks in /dev to avoid
problems like this, but Debian doesn't seem to.
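
One way to test that guess is to compare the device that grub-pc
has recorded in debconf with the devices that actually exist.  The
following is only a sketch and is not taken from this bug log: it
assumes the debconf key grub-pc/install_devices and the usual
"debconf-show" output format, and the two helper names are made up
for illustration.

```shell
# Read `debconf-show grub-pc` output on stdin and print each device
# listed under grub-pc/install_devices, one per line.
# (Helper name is hypothetical.)
list_install_devices() {
    sed -n 's|^[* ]*grub-pc/install_devices: *||p' |
        tr ',' '\n' |
        sed 's/^ *//; s/ *$//; /^$/d'
}

# Read device paths on stdin and report whether each one exists on
# the running system.  (Helper name is hypothetical.)
check_devices() {
    while IFS= read -r dev; do
        if [ -e "$dev" ]; then
            echo "ok: $dev"
        else
            echo "missing: $dev"
        fi
    done
}

# Usage, as root on the affected system:
#   debconf-show grub-pc | list_install_devices | check_devices
```

A "missing" line for something like /dev/xvda on an instance whose
disks now appear as NVMe devices would be consistent with the
explanation above, though it does not prove it.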

Rescue is not too difficult once you know how: detach 
the borked instance's root volume, attach it to another 
(temporary) instance, repair, and move it back.  To 
make it appear as the root volume when moved back you 
need to give exactly the same device name as is shown 
as "Root Device Name" in the image's AMI details; it 
took me a long time to work out that I needed to enter 
"xvda" rather than "/dev/xvda" here (YMMV).

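The volume shuffle described above can also be done from the
AWS CLI instead of the console.  This is a rough sketch under
assumptions: all IDs shown are placeholders, the rescue device name
/dev/sdf is an arbitrary choice, and the helper names are made up.
It prints the commands by default so they can be reviewed before
anything is run.

```shell
# Print commands by default; set DRY_RUN=0 to really execute them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Detach a volume and attach it to another instance.
#   $1 = volume id, $2 = target instance id, $3 = device name
# (Helper name is hypothetical.)
move_volume() {
    run aws ec2 detach-volume --volume-id "$1"
    run aws ec2 wait volume-available --volume-ids "$1"
    run aws ec2 attach-volume --volume-id "$1" --instance-id "$2" --device "$3"
}

# Usage (placeholder IDs; the broken instance must be stopped first):
#   run aws ec2 stop-instances --instance-ids i-BROKEN
#   run aws ec2 wait instance-stopped --instance-ids i-BROKEN
#   move_volume vol-ROOT i-RESCUE /dev/sdf   # to the rescue instance
#   ... repair ...
#   move_volume vol-ROOT i-BROKEN xvda       # back, using the AMI's
#                                            # "Root Device Name"
```
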
To actually repair it I followed the advice in this bug
to bind-mount /dev, /proc and /sys and chroot into the
volume.  I then tried Colin's advice in message 184, but
without success:

# dpkg-reconfigure grub-pc
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.19.0-10-amd64
Found initrd image: /boot/initrd.img-4.19.0-10-amd64
Found linux image: /boot/vmlinuz-4.19.0-9-amd64
Found initrd image: /boot/initrd.img-4.19.0-9-amd64
Found linux image: /boot/vmlinuz-4.9.0-5-amd64
Found initrd image: /boot/initrd.img-4.9.0-5-amd64
  WARNING: Device /dev/nvme0n1 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/nvme0n1p1 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/nvme0n1p14 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/nvme0n1p15 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/nvme1n1 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/nvme1n1p1 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/nvme0n1 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/nvme0n1p1 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/nvme0n1p14 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/nvme0n1p15 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/nvme1n1 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/nvme1n1p1 not initialized in udev database even after waiting 10000000 microseconds.
Found Debian GNU/Linux 10 (buster) on /dev/nvme0n1p1
done

During that run there were a couple of curses dialogs asking
about kernel command lines, for which I accepted the defaults.
Note that /dev/nvme0n1p1 is the rescue system's root device, not
the one that needs repairing.  This didn't fix the problem.

So I tried again with grub-install:

# grub-install /dev/nvme1n1
Installing for i386-pc platform.
Installation finished. No error reported.

(Note nvme1n1, not nvme1 or nvme1n1p1.)  This has worked, inasmuch
as the system now works again.  I take it that I should now run
dpkg-reconfigure grub-pc from within the restarted system (though
that will not prevent future breakage if I move to hardware with
different device names, right?).
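
For anyone else in the same position, the repair that worked can be
put together roughly as follows.  This is a sketch rather than a
tested recipe: it assumes the broken volume's root filesystem is
/dev/nvme1n1p1 and that /mnt is free on the rescue instance, and the
helper names are made up.  It prints the commands by default.

```shell
# Print commands by default; set DRY_RUN=0 to really execute them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Mount the broken root, bind-mount /dev, /proc and /sys into it,
# reinstall grub to the whole disk from inside a chroot, then unmount.
#   $1 = whole disk (e.g. /dev/nvme1n1, not a partition)
#   $2 = root partition on that disk (assumed, e.g. /dev/nvme1n1p1)
#   $3 = mount point (defaults to /mnt)
# (Helper name is hypothetical.)
repair_grub() {
    disk=$1
    rootpart=$2
    mnt=${3:-/mnt}
    run mount "$rootpart" "$mnt"
    for fs in dev proc sys; do
        run mount --bind "/$fs" "$mnt/$fs"
    done
    run chroot "$mnt" grub-install "$disk"
    for fs in sys proc dev; do
        run umount "$mnt/$fs"
    done
    run umount "$mnt"
}

# Usage, as root on the rescue instance:
#   repair_grub /dev/nvme1n1 /dev/nvme1n1p1          # review the commands
#   DRY_RUN=0 repair_grub /dev/nvme1n1 /dev/nvme1n1p1
```

The device names are the ones from my rescue session and will differ
elsewhere; check with lsblk before running anything for real.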

I hope a fix is planned for this; cloud images can have quite long 
uptimes so there may still be a lot of undiscovered affected systems.

Regards, Phil.