[Pkg-xen-devel] Bug#988333: linux-image-5.10.0-6-amd64: VGA Intel IGD Passthrough to Debian Xen HVM DomUs not working, but Windows Xen HVMs do work

Chuck Zmudzinski brchuckz at netscape.net
Tue Oct 26 15:46:00 BST 2021


On 10/26/2021 10:06 AM, Chuck Zmudzinski wrote:
> On 10/25/2021 4:45 PM, Chuck Zmudzinski wrote:
>> On 10/23/2021 11:11 AM, Hans van Kranenburg wrote:
>>> Hi!
>>>
>>>> On 5/10/2021 1:33 PM, Chuck Zmudzinski wrote:
>>>>> [...] with buster and bullseye running as the Dom0, I can only get 
>>>>> the VGA/Passthrough feature to work with Windows Xen HVMs. I would 
>>>>> expect both Windows and Linux HVMs to work comparably well.
>>>
>>> A possible time-saver that I can recommend is to send a post to the
>>> upstream xen-users list [0] about this already. Like "Hi all, I'm
>>> starting a HVM Linux domU with Linux 5.10.70 on a Xen 4.14.3 system with
>>> also 5.10.70 dom0 kernel, with this and this domU config file. It fails
>>> to start, this is the xl -vvv create output, and this error (the irq
>>> stuff) appears in the dom0 kernel log.". Try to keep it simple and not
>>> too long initially, without the surrounding stories, to increase chance
>>> of it being fully read.
>>
>> I can do this soon - I have some more interesting tests to share
>> here and with the Xen developers upstream.
>
> I will need to think a little about how to present this bug to
> the Xen upstream developers in a short and simple enough way
> for them to be likely to read it initially. For now, I will report here
> some results from the journal log entries of both Bullseye dom0
> and Bullseye domU for two different configurations. These logs
> are not generated with the -vvv option, but they do provide
> quite a bit of interesting information and are already
> somewhat overwhelming, even without the -vvv option. So
> I will hold off for now on making the logs even more verbose
> with -vvv.
>
> The intention of this message is to provide detailed logs for a
> detailed analysis of the problem, not to describe the problem
> in simple terms.
>
> A few days ago I ran two tests, and I have four different log
> files attached from those tests. In both tests, the Bullseye
> HVM was configured for PCI/IGD passthrough using the
> domain config file and preparation for passthrough in dom0
> described in the earlier message #31:
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=988333#31
>
> The two tests were:
>
> 1. Bullseye dom0, Debian 11.1 / Bullseye HVM domU, Debian 11.1
>
> This first test essentially confirmed that the updated versions
> of the packages for both Bullseye dom0 and Bullseye domU
> since the original report five months ago do not fix the
> problem. In this test case, I am using all the official packages
> of Debian 11.1 (Bullseye).
>
> It is important to note that the device model used in
> this test is upstream qemu as officially packaged for
> Bullseye. By default, Xen on Debian uses the
> qemu-system-i386 binary from the qemu-system-x86
> package, and Bullseye currently ships qemu version
> 5.2+dfsg-11+deb11u1 as the default device model.
>
> I attached two log files from this test:
> qemu-upstream-hvm.txt and qemu-upstream-dom0.txt.
> They are the logged journal entries for the Bullseye HVM
> and Bullseye dom0 domains, respectively. They are fairly
> complete logs, showing the kernel version running in both
> the dom0 and the HVM, the kernel command line for both
> the dom0 and the domU, the command that was used to
> create the HVM domain, etc.
>
> One might recall that in the original report I said it was
> difficult to capture logs from the domU. This time I was able
> to capture the log by waiting a few minutes before shutting
> the domU down. I also discovered, in contrast to what I said in
> the earlier report, that it is possible to gracefully shut down
> the domU using xl shutdown <dom id>, provided one waits long
> enough before trying to shut it down. The shutdown itself takes
> a few minutes instead of the normal few seconds because of the
> problems caused by this configuration. By waiting for the
> graceful shutdown instead of using xl destroy <dom id>, I was
> able to view the log of the attempted boot in the domU on a
> subsequent normal boot (without PCI passthrough) using
> journalctl, and capture some useful Call Traces.
>
> For this first test, although the shutdown succeeds, the domain
> never gets to the point where one can log in, either at the
> terminal or remotely via ssh. The boot messages were displayed
> on the passed through video device, but only very slowly: it
> took almost two minutes before the boot messages started to
> appear, and it took a couple of minutes after issuing the
> xl shutdown command in dom0 before the passed through video
> device indicated that the HVM domain had shut down and
> powered off.
>
> The second test:
>
> 2. Same as the first test, except using the qemu traditional device
> model instead of the qemu upstream device model, which on Debian
> comes from the qemu-system-x86 package.
>
> I also attached two log files from this test:
> qemu-traditional-hvm.txt and qemu-traditional-dom0.txt,
> and these also are fairly complete logs showing the kernel
> version in use, etc.
>
> Since Debian does not provide the traditional device model,
> I had to build it from xenbits.xen.org:
>
> https://xenbits.xen.org/gitweb/?p=qemu-xen-traditional.git;a=shortlog;h=refs/heads/stable-4.14
>
> I also had to build a modified hvmloader with rombios support
> as required by the traditional qemu device model, and that can
> be done fairly easily with a slight modification to the build of the
> xen-utils-4.14 binary package for amd64.
>
> This device model and rombios are invoked by uncommenting the
> device_model_version = 'qemu-xen-traditional' line in the domain
> configuration file, after installing the updated hvmloader file
> with rombios support and the qemu-dm binary as
> /usr/lib/xen-4.14/boot/hvmloader and
> /usr/lib/xen-4.14/bin/qemu-dm respectively.
>
> I accomplished this by creating a binary Debian package
> called xen-qemu-tradtional-4.14 which installs these two files
> and diverts the official hvmloader binary in xen-utils-4.14 to
> hvmloader.norombios.
>
> I verified my build of qemu-xen-traditional is correct
> enough to successfully pass through the PCI devices,
> including the Intel IGD, to a Windows 10 HVM using the
> traditional qemu device model and a Bullseye Xen dom0.
>
> In this configuration, the Bullseye HVM booted quickly and
> I was able to log in remotely to it via ssh. This result shows
> the crash is not nearly as catastrophic as in the
> case when the upstream qemu device model is used.
> However, there is no output on the display, and there is still
> a crash and call trace with this test, but *only* in the domU.
> In this test, it was the i915 kernel module that crashed
> in the domU, and there is some useful information in
> the attached qemu-traditional-hvm.txt log file that should
> help diagnose the problem. This is in contrast to the first
> test with the upstream device model, where a call trace of a
> crash appears in the journal of *both* the domU and the dom0.
> Another key difference between the two tests is that in the
> first test with the upstream qemu device model, the crash
> indicates a failure by the Xen hypervisor and/or Linux kernel
> to handle an IRQ, rather than the failure in the i915 kernel
> module that occurs in the second test with the traditional
> qemu device model.
>
> It is not surprising that the behavior of the HVM domU depends
> not only on the hypervisor version but also on the qemu
> device model version, because the virtual firmware seen by
> the domU depends on both the hypervisor and the device
> model running in dom0 to support the HVM domU, and also
> on the different bios versions used: rombios for the traditional
> device model and seabios for the upstream device model. With
> so many different components involved, it takes a while to
> narrow down the problem.
>
> The logs contain some explanatory comments and are redacted
> to try to remove private data such as MAC addresses and
> IP addresses.
>
> All the best,
>
> Chuck


Two more clarifications that may be needed in order to
repeat the two tests described in the previous message:

1) To use the traditional device model, it is also necessary to
comment out the device_model_version = 'qemu-xen' line in
the domain configuration file in addition to uncommenting the
device_model_version = 'qemu-xen-traditional' line.

2) In both tests, the command line for the hypervisor,
Debian version 4.14.3-1~deb11u1, was:

Command line: placeholder dom0_mem=3G,max:3G smt=false pv-l1tf=false iommu=1

My system uses UEFI booting of grub without secure boot, and
it boots the xen-4.14-amd64.gz hypervisor file from the xen
hypervisor package.
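
As a sketch of clarification 1) above, the two device_model_version
lines in the domain configuration file would look roughly like this
when the traditional device model is selected. Only these two lines
are taken from this message; the rest of the configuration is the
one described in message #31 and is not reproduced here.

```
# Upstream qemu device model (the Debian default): commented out
# device_model_version = 'qemu-xen'

# Traditional qemu device model: uncommented
device_model_version = 'qemu-xen-traditional'
```

Swapping which of the two lines is commented out switches back to
the upstream device model used in the first test.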

All the best,

Chuck


