[Pkg-libvirt-maintainers] Bug#1111337: libvirt: After upgrading to trixie, some kvm guests never successfully boot
Andrea Bolognani
eof at kiyuko.org
Sun Aug 17 14:22:46 BST 2025
On Sun, Aug 17, 2025 at 01:04:13AM +0000, Kenneth J. Pronovici wrote:
> Source: libvirt
> Version: 11.3.0-3
> Severity: important
>
> I have a hypervisor (sol) with 4 guests (mercury, venus, mars, and
> jupiter). The hypervisor and first three guests run Debian. The 4th
> guest runs Home Assistant. The Debian guests have been running in
> roughly their current configuration since 2016 (wheezy) and have been
> upgraded since then for each Debian release.
>
> I have been working to upgrade all of the Debian guests to trixie. I
> followed the same process as usual: I upgraded each guest one at a time,
> rebooted, and tested. Once all guests were working, I upgraded the
> hypervisor.
This is a very sensible way to go about upgrading a hypervisor host,
well done :)
> After rebooting the hypervisor, none of the guests were
> running, and neither was the virtual network. I started the network
> manually, but I haven't managed to get any of the guests working.
>
> The initial problem with the guests was that their machine type
> pc-i440fx-2.1 was no longer supported. I adjusted this to
> pc-i440fx-2.4, and virsh accepted that change. Now, I can start the
> guests, and they report as running, but they never do anything useful.
The 2.1 machine types have been dropped from QEMU, so switching to
newer ones is the right thing to do. That said, I would probably pick
the most recent machine type (10.0) instead of the oldest one (2.4),
as that would give you the most runway. The 2.4 machine types are
almost certainly going to be absent from the QEMU version that will
end up in forky, and adopting them today pretty much guarantees that
you'll be forced to go through this process again in two years' time.
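
To make the machine type change concrete, here is a minimal Python sketch
that rewrites the machine attribute in a domain XML document. The XML
below is a made-up, heavily trimmed example, and the name pc-i440fx-10.0
is an assumption based on the 10.0 version mentioned above; confirm what
your QEMU actually offers with "qemu-system-x86_64 -machine help" first.

```python
import xml.etree.ElementTree as ET


def set_machine_type(domain_xml: str, new_machine: str) -> str:
    """Return the domain XML with <os><type machine=...> set to new_machine."""
    root = ET.fromstring(domain_xml)
    os_type = root.find("./os/type")
    if os_type is None:
        raise ValueError("domain XML has no <os><type> element")
    os_type.set("machine", new_machine)
    return ET.tostring(root, encoding="unicode")


# Hypothetical, trimmed-down domain XML for illustration only.
EXAMPLE = """<domain type='kvm'>
  <name>mercury</name>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.1'>hvm</type>
  </os>
</domain>"""

# Assumed target machine type; check that your QEMU build provides it.
updated = set_machine_type(EXAMPLE, "pc-i440fx-10.0")
print(updated)
```

In practice "virsh edit" does the same job interactively and validates
the result, so the sketch is mostly useful when several guests need the
same change.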
> As far as I can tell, nothing is ever even written to the console. The
> guests definitely never come up far enough to respond to a ping or
> accept an SSH connection. If I try to cleanly shut down a guest, it
> never stops. I have to use destroy instead of shutdown. When I attempt
> to start a guest, I get a new kvm process that immediately uses 100% of
> a CPU and stays like that for at least 30 minutes (which is as long as
> I've been willing to wait).
>
> Oddly, the jupiter guest (Home Assistant) boots without problems and
> seems to be working fine. So, I assume there must be something
> wrong/different about the Debian guests that conflicts with the new
> libvirt version in trixie, but can't figure out what that might be.
>
> I compared the current state of each guest against my pre-upgrade backup
> of /etc/libvirt, and none of the changes are surprising. Besides
> the change to the machine name, there are some minor changes to some XML
> stanzas which libvirt seems to make on its own. (I removed them and they
> came back.) All of the other changes within /etc/libvirt came as
> part of the upgrade to trixie.
>
> The main difference I see is that jupiter uses <os firmware='efi'>, and
> its only disk is a .qcow2 file. The other guests do not use EFI and
> their disks are raw block devices instead.
They're not using EFI "indirectly" through a legacy configuration
with a <loader> element either, right? And I assume you've confirmed
that the raw block devices are still present at the expected paths.
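
Both questions can be answered from the domain XML itself. The following
Python sketch reports whether the guest sets firmware='efi', whether it
has a <loader> element at all, and whether each disk source path exists
on the host. The sample XML and the /dev/vg0/mercury path are invented
for illustration; feed it your real configuration instead.

```python
import os
import xml.etree.ElementTree as ET


def firmware_and_disks(domain_xml: str) -> dict:
    """Summarize firmware hints and disk source paths from a domain XML."""
    root = ET.fromstring(domain_xml)
    os_el = root.find("./os")
    firmware_attr_efi = os_el is not None and os_el.get("firmware") == "efi"
    # A <loader> element does not prove EFI by itself, but it is worth a look.
    has_loader = root.find("./os/loader") is not None
    disks = []
    for disk in root.findall("./devices/disk"):
        source = disk.find("source")
        if source is None:
            continue
        # Block-backed disks use dev=, file-backed disks use file=.
        path = source.get("dev") or source.get("file")
        if path:
            disks.append((path, os.path.exists(path)))
    return {
        "firmware_attr_efi": firmware_attr_efi,
        "has_loader": has_loader,
        "disks": disks,
    }


# Hypothetical, trimmed-down domain XML for illustration only.
EXAMPLE = """<domain type='kvm'>
  <name>mercury</name>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.4'>hvm</type>
  </os>
  <devices>
    <disk type='block' device='disk'>
      <source dev='/dev/vg0/mercury'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>"""

report = firmware_and_disks(EXAMPLE)
print(report)
```

Each entry in "disks" is a (path, exists) pair, so a False in the second
position points at a block device that is no longer where the
configuration expects it.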
> I hesitate to attach configuration here, because I'm not sure what will
> be useful to you. Let me know what questions you have, and I can
> attach configuration or run other diagnostics.
If libvirt is able to start the guests at all, that rules out most of
the common failure modes (e.g. failing to access the disk files or
rejecting an invalid configuration). It seems more likely that the
issue would be on the firmware/guest OS side, though of course it
could also be that libvirt is not setting things up in a way that
allows the firmware/guest OS to do anything useful.
You could start by collecting some debug logs[1] and checking that
there is nothing suspicious in the journal, such as AppArmor denials.
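
For the AppArmor part, a rough Python sketch of the filtering step: it
keeps only the lines carrying the apparmor="DENIED" marker that the
kernel audit messages use. The two sample lines are fabricated for
illustration (host name, PIDs, profile UUID and paths are all made up).

```python
def apparmor_denials(log_text: str) -> list[str]:
    """Return the log lines that record an AppArmor denial."""
    return [line for line in log_text.splitlines() if 'apparmor="DENIED"' in line]


# Fabricated sample log lines; real ones will differ in every detail.
SAMPLE = (
    'Aug 17 09:00:01 sol kernel: audit: type=1400 audit(1755421201.123:42): '
    'apparmor="DENIED" operation="open" '
    'profile="libvirt-00000000-0000-0000-0000-000000000000" '
    'name="/dev/example" pid=1234 comm="qemu-system-x86" '
    'requested_mask="r" denied_mask="r"\n'
    'Aug 17 09:00:02 sol kernel: audit: type=1400 audit(1755421202.456:43): '
    'apparmor="STATUS" operation="profile_replace" profile="unconfined" '
    'name="libvirtd" pid=1200 comm="apparmor_parser"\n'
)

denials = apparmor_denials(SAMPLE)
for line in denials:
    print(line)
```

The input could be, for instance, the saved output of "journalctl -k -b"
taken right after a failed guest start.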
Another useful attempt would be to eschew the existing configuration
for one of the guests and import it from scratch using virt-install.
Assuming the needs are not too specific, that should result in a
working configuration, which we could then compare with the previous
one to try and identify the issue.
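
The comparison step could be sketched in Python as a plain unified diff
of the two XML documents. Both snippets below are hypothetical and
heavily trimmed, and the file labels old.xml and new.xml are just
placeholders.

```python
import difflib


def config_diff(old_xml: str, new_xml: str) -> str:
    """Return a unified diff between two domain XML documents."""
    return "".join(
        difflib.unified_diff(
            old_xml.splitlines(keepends=True),
            new_xml.splitlines(keepends=True),
            fromfile="old.xml",
            tofile="new.xml",
        )
    )


# Hypothetical configurations for illustration only.
OLD = """<domain type='kvm'>
  <name>mercury</name>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.1'>hvm</type>
  </os>
</domain>
"""

NEW = """<domain type='kvm'>
  <name>mercury</name>
  <os>
    <type arch='x86_64' machine='pc-i440fx-10.0'>hvm</type>
  </os>
</domain>
"""

diff = config_diff(OLD, NEW)
print(diff)
```

Feeding it the output of "virsh dumpxml" for the old and the newly
created guest keeps the formatting consistent, which makes the real
differences easier to spot.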
[1] https://libvirt.org/kbase/debuglogs.html
--
Andrea Bolognani <eof at kiyuko.org>
Resistance is futile, you will be garbage collected.