Bug#1017944: grub-xen-host: 2.06-3 crashes PV guests in early boot
Thomas Keppler
winfr34k at gmail.com
Mon Aug 22 18:11:08 BST 2022
Package: grub-xen-host
Version: 2.06-3
When used as `kernel` for a PV guest, the VM fails to boot immediately:
# xl dmesg
root@hypervisor:~# xl dmesg -c
(XEN) printk: 3 messages suppressed.
(XEN) d163v0 Unhandled page fault fault/trap [#14, ec=0003]
(XEN) Pagetable walk from 000000000062702c:
(XEN) L4[0x000] = 000000081a023067 0000000000000623
(XEN) L3[0x000] = 000000081a024067 0000000000000624
(XEN) L2[0x003] = 000000081a028067 0000000000000628
(XEN) L1[0x027] = 001000081a027025 0000000000000627
(XEN) domain_crash_sync called from entry.S: fault at ffff82d04030330e x86_64/entry.S#create_bounce_frame+0x135/0x157
(XEN) Domain 163 (vcpu#0) crashed on cpu#2:
(XEN) ----[ Xen-4.14.5 x86_64 debug=n Not tainted ]----
(XEN) CPU: 2
(XEN) RIP: e033:[<000000000041f42f>]
(XEN) RFLAGS: 0000000000000282 EM: 1 CONTEXT: pv guest (d163v0)
(XEN) rax: 0000000000000080 rbx: 0000000000000000 rcx: 0000000000000000
(XEN) rdx: 0000000000000000 rsi: 000000000061f000 rdi: 0000000000000000
(XEN) rbp: 0000000000000000 rsp: 0000000000629fc0 r8: 0000000000000000
(XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
(XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 0000000000162660
(XEN) cr3: 000000081a022000 cr2: 000000000062702c
(XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
(XEN) Guest stack trace from rsp=0000000000629fc0:
(XEN) 0000000000000000 0000000000000000 0000000000000003 000000000041f42f
(XEN) 000000010000e030 0000000000010082 000000000062a000 000000000000e02b
(XEN) 0000000000000080 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000080 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
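For reference, the faulting address in cr2 (000000000062702c) can be split into the standard x86-64 4-level paging indices. The short Python sketch below assumes ordinary 4-level paging with 4 KiB pages. The function name `walk_indices` is hypothetical and is used only for illustration. It reproduces the L4/L3/L2/L1 indices shown in the pagetable walk above, so the walk itself appears consistent with the faulting address.

```python
# Hypothetical helper: split an x86-64 virtual address into 4-level paging
# indices (9 bits per level) plus the 12-bit offset within a 4 KiB page.

def walk_indices(vaddr: int) -> dict:
    """Return the L4/L3/L2/L1 table indices and page offset for vaddr."""
    return {
        "L4": (vaddr >> 39) & 0x1FF,
        "L3": (vaddr >> 30) & 0x1FF,
        "L2": (vaddr >> 21) & 0x1FF,
        "L1": (vaddr >> 12) & 0x1FF,
        "offset": vaddr & 0xFFF,
    }


if __name__ == "__main__":
    cr2 = 0x000000000062702C  # faulting address reported in cr2
    idx = walk_indices(cr2)
    for level in ("L4", "L3", "L2", "L1"):
        print(f"{level}[0x{idx[level]:03x}]")
    print(f"offset 0x{idx['offset']:03x}")
```

Running it prints L4[0x000], L3[0x000], L2[0x003] and L1[0x027], matching the log, followed by an in-page offset of 0x02c.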
Probably, the important bit here is:
(XEN) d163v0 Unhandled page fault fault/trap [#14, ec=0003]
in conjunction with:
(XEN) RIP: e033:[<000000000041f42f>]
which I couldn't resolve to anything meaningful with `addr2line`, though I may simply be using it incorrectly.
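The error code can be read against the architectural x86 page-fault error-code bits, and the L1 entry from the walk (001000081a027025) can be read against the standard page-table-entry flag bits. The Python sketch below handles only the low error-code bits and a few common PTE flags. The helper names `decode_error_code` and `decode_pte` are hypothetical.

```python
# Hypothetical helpers: decode the low bits of an x86 page-fault error code
# and the common flag bits of a 64-bit page-table entry.

def decode_error_code(ec: int) -> list:
    """Decode the low bits of an x86 page-fault error code (#PF, vector 14)."""
    parts = [
        "protection violation (page present)" if ec & 0x1 else "page not present",
        "write" if ec & 0x2 else "read",
        "user-mode access" if ec & 0x4 else "supervisor-mode access",
    ]
    if ec & 0x8:
        parts.append("reserved bit set in a paging entry")
    if ec & 0x10:
        parts.append("instruction fetch")
    return parts


def decode_pte(pte: int) -> dict:
    """Decode common flag bits and the frame number of a 64-bit x86 PTE."""
    return {
        "present": bool(pte & (1 << 0)),
        "writable": bool(pte & (1 << 1)),
        "user": bool(pte & (1 << 2)),
        "accessed": bool(pte & (1 << 5)),
        "dirty": bool(pte & (1 << 6)),
        "nx": bool(pte & (1 << 63)),
        # Bits 12..51 hold the frame number (a machine frame for PV guests).
        "frame": (pte >> 12) & ((1 << 40) - 1),
    }


if __name__ == "__main__":
    print("ec=0003:", ", ".join(decode_error_code(0x0003)))
    # L1[0x027] entry from the pagetable walk in the log
    print("L1 PTE:", decode_pte(0x001000081A027025))
```

Under these assumptions, ec=0003 decodes as a write to a present page. The L1 entry decodes as present but not writable. Taken together, this looks like a write to a read-only mapping. This is only a reading of the raw bits, however. For PV guests, Xen validates guest page tables and keeps some pages (such as page-table pages) read-only, so the decode does not by itself identify which code in the GRUB image performed the write.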
Booting "regularly" with the linux/ramdisk/extra arguments works as expected, as does booting through grub-xen-host 2.04-20.
Interestingly, grub-xen-bin 2.06-3 _inside_ the DomU doesn't cause any issues (even in combination with 2.04-20 on the host side); the crash only occurs when 2.06-3 is used as the stub from the Dom0 side.
This behavior is reproducible on Debian 11.4 with kernel 5.10.0-17-amd64 and Xen 4.14.5+24-g87d90d511c-1 on an Intel Xeon E3-1225 v3-based machine. My DomUs also run Debian with broadly the same package versions. However, this doesn't seem relevant to this bug, because an "empty" guest produces the same result.
More information about the Pkg-grub-devel mailing list