Bug#799480: grub-xen-host: XEN domU crash when PV grub chainloads 32-bit domU grub

Andreas Sundstrom sunkan+debian.bugs at zappa.cx
Sat Sep 19 16:49:33 UTC 2015


Package: grub-xen-host
Version: 2.02~beta2-22
Severity: important

Dear Maintainer,

Using a 64-bit dom0 and a 32-bit domU, PV (para-virtualized) grub sometimes
fails when chainloading the domU's grub. 64-bit domUs seem to work 100%
of the time.

My understanding of the process:
* dom0 launches the domU with a grub binary loaded from dom0's disk.
* That grub reads its config file from the memdisk and then looks for a
  grub binary in the domU filesystem.
* If a grub binary is found in the domU, it chainloads (multiboot) that
  binary, and the domU's grub reads grub.cfg and continues booting.
* If no grub binary is found in the domU, the dom0-supplied grub reads
  grub.cfg itself and continues booting.
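The four steps above can be modelled as a small decision function. This is a minimal Python sketch of the flow as described, not grub's actual code: the class and function names are hypothetical, and the list of paths probed in the domU filesystem is passed in as a parameter because the real paths come from the memdisk config shipped by grub-xen-host and are not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BootOutcome:
    """Which grub ends up reading the domU's grub.cfg (hypothetical model)."""
    reader: str
    chainloaded: bool


def pv_boot_flow(domu_files: set[str], probe_paths: list[str]) -> BootOutcome:
    """Model of the PV boot flow described in this report.

    domu_files:  paths present in the domU filesystem.
    probe_paths: paths the dom0-supplied grub looks for; the real list is
                 defined by the memdisk config and is NOT reproduced here.
    """
    # Steps 1-2: dom0 has started its grub, which has read the memdisk config.
    for path in probe_paths:
        if path in domu_files:
            # Step 3: multiboot-chainload the domU's own grub, which then reads
            # grub.cfg (the step that intermittently crashes for 32-bit domUs).
            return BootOutcome(reader=f"domU grub ({path})", chainloaded=True)
    # Step 4: nothing found, so the dom0-supplied grub reads grub.cfg directly.
    return BootOutcome(reader="dom0 grub", chainloaded=False)


if __name__ == "__main__":
    probes = ["/boot/xen/example-grub.elf"]  # hypothetical file name
    print(pv_boot_flow({"/boot/xen/example-grub.elf"}, probes))
    print(pv_boot_flow(set(), probes))  # e.g. /boot/xen renamed in the domU
```

With no matching file in the domU (for example after renaming /boot/xen), the model always takes the last branch, so the chainload in step 3 is never attempted.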

It fails at step 3 in my list (the chainload), but since it sometimes
works, could the cause be something like a race condition?

A workaround is to either not install /boot/xen in the domU or rename
it, so that the first grub, loaded from dom0's disk, does not find the
grub binary in the domU filesystem and therefore reads grub.cfg and
boots by itself. The drawback, of course, is that the two grub versions
can't differ too much, since grub.cfg is generated by one setup (the
domU's) and read/parsed at boot time by another (dom0's grub).

At this point I am not sure whether this is a problem in XEN or in
grub. However, I compiled the legacy pvgrub, which uses some Mini-OS
from XEN (I don't really know much more about it), and when that legacy
pvgrub chainloads the domU grub it seems to work 100% of the time. The
legacy pvgrub is not a real alternative, though, as it's not packaged
for Debian.

When it fails, "xl create vm -c" outputs this:
Parsing config from /etc/xen/vm
libxl: error: libxl_dom.c:35:libxl__domain_type: unable to get domain type for domid=16
Unable to attach console
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: console child [0] exited with error status 1

And "xl dmesg" shows errors like this:
(XEN) traps.c:2514:d15 Domain attempted WRMSR 00000000c0010201 from 0x0000000000000000 to 0x000000000000ffff.
(XEN) d16:v0: unhandled page fault (ec=0010)
(XEN) Pagetable walk from 0000000000000000:
(XEN)  L4[0x000] = 0000000200256027 000000000000049c
(XEN)  L3[0x000] = 0000000200255027 000000000000049d
(XEN)  L2[0x000] = 0000000200251023 00000000000004a1
(XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN) domain_crash_sync called from entry.S: fault at ffff82d08021feb0 compat_create_bounce_frame+0xc6/0xde
(XEN) Domain 16 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.4.1  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e019:[<0000000000000000>]
(XEN) RFLAGS: 0000000000000246   EM: 1   CONTEXT: pv guest
(XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: 0000000000000000
(XEN) rdx: 0000000000000000   rsi: 0000000000499000   rdi: 0000000000800000
(XEN) rbp: 000000000000000a   rsp: 00000000005a5ff0   r8:  0000000000000000
(XEN) r9:  0000000000000000   r10: ffff83023e9b9000   r11: ffff83023e9b9000
(XEN) r12: 0000033f3d335bfb   r13: ffff82d080300800   r14: ffff82d0802ea940
(XEN) r15: ffff83005e819000   cr0: 000000008005003b   cr4: 00000000000506f0
(XEN) cr3: 0000000200b7a000   cr2: 0000000000000000
(XEN) ds: e021   es: e021   fs: e021   gs: e021   ss: e021   cs: e019
(XEN) Guest stack trace from esp=005a5ff0:
(XEN)   00000010 00000000 0001e019 00010046 0016b38b 0016b38a 0016b389 0016b388
(XEN)   0016b387 0016b386 0016b385 0016b384 0016b383 0016b382 0016b381 0016b380
(XEN)   0016b37f 0016b37e 0016b37d 0016b37c 0016b37b 0016b37a 0016b379 0016b378
(XEN)   0016b377 0016b376 0016b375 0016b374 0016b373 0016b372 0016b371 0016b370
(XEN)   0016b36f 0016b36e 0016b36d 0016b36c 0016b36b 0016b36a 0016b369 0016b368
(XEN)   0016b367 0016b366 0016b365 0016b364 0016b363 0016b362 0016b361 0016b360
(XEN)   0016b35f 0016b35e 0016b35d 0016b35c 0016b35b 0016b35a 0016b359 0016b358
(XEN)   0016b357 0016b356 0016b355 0016b354 0016b353 0016b352 0016b351 0016b350
(XEN)   0016b34f 0016b34e 0016b34d 0016b34c 0016b34b 0016b34a 0016b349 0016b348
(XEN)   0016b347 0016b346 0016b345 0016b344 0016b343 0016b342 0016b341 0016b340
(XEN)   0016b33f 0016b33e 0016b33d 0016b33c 0016b33b 0016b33a 0016b339 0016b338
(XEN)   0016b337 0016b336 0016b335 0016b334 0016b333 0016b332 0016b331 0016b330
(XEN)   0016b32f 0016b32e 0016b32d 0016b32c 0016b32b 0016b32a 0016b329 0016b328
(XEN)   0016b327 0016b326 0016b325 0016b324 0016b323 0016b322 0016b321 0016b320
(XEN)   0016b31f 0016b31e 0016b31d 0016b31c 0016b31b 0016b31a 0016b319 0016b318
(XEN)   0016b317 0016b316 0016b315 0016b314 0016b313 0016b312 0016b311 0016b310
(XEN)   0016b30f 0016b30e 0016b30d 0016b30c 0016b30b 0016b30a 0016b309 0016b308
(XEN)   0016b307 0016b306 0016b305 0016b304 0016b303 0016b302 0016b301 0016b300
(XEN)   0016b2ff 0016b2fe 0016b2fd 0016b2fc 0016b2fb 0016b2fa 0016b2f9 0016b2f8
(XEN)   0016b2f7 0016b2f6 0016b2f5 0016b2f4 0016b2f3 0016b2f2 0016b2f1 0016b2f0
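The page-fault error code in the log (ec=0010) can be decoded bit by bit. The sketch below uses the bit layout of the x86 page-fault error code as documented in the Intel SDM; only the low five bits are modelled, and the helper names are hypothetical. It is an aid for reading the log, not part of Xen or grub.

```python
# Low bits of the x86 page-fault error code (Intel SDM Vol. 3A, page-fault exception).
# Each entry: bit -> (flag name, meaning if set, meaning if clear or None).
PF_BITS = {
    0: ("P", "protection violation", "page not present"),
    1: ("W/R", "write access", "read access"),
    2: ("U/S", "user-mode access", "supervisor-mode access"),
    3: ("RSVD", "reserved bit set in a paging entry", None),
    4: ("I/D", "instruction fetch", None),
}


def decode_pf_error_code(ec: int) -> list[str]:
    """Return a readable description of a page-fault error code."""
    out = []
    for bit, (name, if_set, if_clear) in PF_BITS.items():
        if ec & (1 << bit):
            out.append(f"{name}: {if_set}")
        elif if_clear is not None:
            out.append(f"{name}: {if_clear}")
    return out


def selector_rpl(selector: int) -> int:
    """Requested privilege level: the low two bits of a segment selector."""
    return selector & 0x3


if __name__ == "__main__":
    print(decode_pf_error_code(0x0010))  # ec=0010 from "xl dmesg"
    print(selector_rpl(0xE019))          # cs: e019 from "xl dmesg"
```

Read this way, ec=0010 describes a supervisor-mode instruction fetch from a non-present page, and the cs selector e019 has RPL 1, the ring a 32-bit PV guest kernel runs in on 64-bit Xen. Together with RIP and cr2 both being 0 and L1[0x000] being empty, this suggests the vCPU jumped to address 0. That is an interpretation of the log, not a confirmed diagnosis.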

If the machine boots, an easy way to find out which grub you are in is
to hit 'c' and type 'ls'; only the grub from dom0 knows about
(memdisk). So when trying to replicate the issue (and the domU actually
starts), you can hit 'c', type 'ls' (check for memdisk), then type
'halt' and relaunch the domU. Usually I can't launch more than 4-5
times in a row before it fails; often it fails on my first try.
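Because the failure is intermittent, a handful of clean boots says little on its own; this also applies to the "100% of the time" observations for 64-bit domUs and for legacy pvgrub. The sketch below assumes each launch fails independently with a fixed probability p. That is a simplifying assumption, since a real race need not behave this way. Under a geometric model the mean number of successes before the first failure is (1 - p) / p, and n consecutive successes occur by chance with probability (1 - p)^n. The function names are hypothetical.

```python
def p_fail_from_mean_run(mean_successes: float) -> float:
    """Geometric model: mean successes before first failure = (1 - p) / p."""
    if mean_successes < 0:
        raise ValueError("mean_successes must be >= 0")
    return 1 / (mean_successes + 1)


def launches_needed(p_fail: float, alpha: float = 0.05) -> int:
    """Smallest n such that (1 - p_fail) ** n <= alpha.

    If a setup still failed with probability p_fail per launch, n clean
    launches in a row would happen by chance with probability <= alpha.
    Assumes independent launches with a constant p_fail.
    """
    if not (0 < p_fail < 1 and 0 < alpha < 1):
        raise ValueError("p_fail and alpha must be in (0, 1)")
    n, p_all_ok = 0, 1.0
    while p_all_ok > alpha:
        p_all_ok *= 1 - p_fail  # probability that n + 1 launches all succeed
        n += 1
    return n


if __name__ == "__main__":
    p = p_fail_from_mean_run(4)  # "can't launch more than 4-5 times"
    print(p, launches_needed(p))
```

With about 4 clean launches before a failure (p = 0.2), `launches_needed(p_fail_from_mean_run(4))` returns 14. Under these assumptions, a configuration would need roughly 14 clean launches in a row before the bug could be ruled out at the 5% level.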

For information, I have reproduced this on two different machines with
AMD desktop processors; I am not sure whether Intel would be any
different. I'm fairly sure I tested grub from unstable at some point
with the same result, but I can test again if that is likely to help.

The package installed on the domU side is "grub-xen".

I don't know how to debug grub further on my own. By printing text from
grub I established that it is the chainload that fails: I see no output
from the domU grub (except, of course, when it works as it should). I
can help with further testing if needed.

-- System Information:
Debian Release: 8.2
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'proposed-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.16.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=sv_SE.UTF-8, LC_CTYPE=sv_SE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages grub-xen-host depends on:
ii  grub-xen-bin  2.02~beta2-22

grub-xen-host recommends no packages.

grub-xen-host suggests no packages.

-- no debconf information

/Andreas


