[Pkg-xen-devel] Bug#944247: xen domU crashes under high i/o load if you use qcow2 images

Hans van Kranenburg hans at knorrie.org
Thu Nov 26 16:39:47 GMT 2020


tags 944247 + moreinfo
severity 944247 normal
thanks

Hi Mario,

On 11/6/19 4:46 PM, mario wrote:
> Source: xen
> Severity: important
> 
> Dear Maintainer,
> 
> we have updated our server from debian oldstable (which unfortunately wasn't running stable after the last update, bug reported) to debian buster.
> 
> unfortunately xen doesn't work reliably there either:
> 
> the virtual server crashes every 1-2 week with i/o problems and sometimes also takes other domU instances with it.
> we use qcow2 images.
> 
> the harddisk of the domU is simply no longer accessible for the linux kernel, no logfiles are available. in the xl console the following last lines can be read, login not possible:
> 
> [ 1450.976415] INFO: task nginx:376 blocked for more than 120 seconds.
> [ 1450.976423] Not tainted 4.9.0-9-amd64 #1 Debian 4.9.168-1+deb9u5
> [ 1450.976428] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1450.976469] INFO: task nginx:377 blocked for more than 120 seconds.
> [ 1450.976474] Not tainted 4.9.0-9-amd64 #1 Debian 4.9.168-1+deb9u5
> [ 1450.976479] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1450.976624] INFO: task nginx:378 blocked for more than 120 seconds.
> 
> the process varies:
> [1523692.508073] INFO: task jbd2/xvda2-8:159 blocked for more than 120 seconds
> [1523692.508084] Not tainted [...]
> 
> all hard disk accesses fail as if the i/o system is completely dead.
> only "xl destroy <domid>" and recreate will help

This report is now a year old, and unfortunately it has not received a
reply. There may be several reasons for that; one is probably that
nobody else reading the list uses the same storage configuration and
also runs into the same problem.

> you can easily reproduce this with the tool stress "stress -c 8 -i 8 -d 8".
> it takes a maximum of 10 minutes until the vm crashes.
> 
> in our experience, as a workaround you can convert all images to raw. after our tests, the error will no longer occur. 
> but since we need the snapshot functions of qcow2 images, this is not a permanent solution.
> 
> does anyone else have problems with qcow2 images and xen under buster?
> maybe this also concerns qemu?
> 
> [...]

To be honest, I do not know.

Have you been able to find out more about the problem in the last
year? Have you tried to narrow it down by testing other combinations
of software, with and without Xen? For example, you could boot the
host into plain Linux without the hypervisor, mount the qcow2 image
somewhere and run the same load test, to see whether the problem also
happens when Xen is eliminated from the equation.
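
A rough sketch of what such a test could look like, assuming qemu-nbd
(from qemu-utils) and the nbd kernel module are available on the host.
The image path, NBD device and mount point are placeholders, not values
from your report, and the domU must be shut down while the image is
attached on the host. By default the script only prints the commands;
set DRY_RUN=0 and run it as root to actually execute them.

```shell
#!/bin/sh
# Sketch only: load-test a qcow2 image on the bare host, without Xen.
# IMAGE, NBD_DEV, NBD_PART and MNT are placeholders (assumptions).
IMAGE=${IMAGE:-/path/to/domu-disk.qcow2}
NBD_DEV=${NBD_DEV:-/dev/nbd0}
# If the image contains a partition table, point this at e.g. /dev/nbd0p1.
NBD_PART=${NBD_PART:-$NBD_DEV}
MNT=${MNT:-/mnt/qcow2-test}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

qcow2_load_test() {
    # Attach the qcow2 image as a block device on the host.
    run modprobe nbd max_part=8 || return 1
    run qemu-nbd --connect="$NBD_DEV" "$IMAGE" || return 1
    run mkdir -p "$MNT" || return 1
    run mount "$NBD_PART" "$MNT" || return 1

    # stress writes its temporary files into the current directory,
    # so change into the mounted filesystem first. The 600s timeout
    # matches the "maximum of 10 minutes" from the report.
    run cd "$MNT" || return 1
    run stress -c 8 -i 8 -d 8 --timeout 600s

    # Clean up again, whether or not stress succeeded.
    run cd /
    run umount "$MNT"
    run qemu-nbd --disconnect "$NBD_DEV"
}

qcow2_load_test
```

Note that this only exercises the qcow2 code in qemu and the storage
underneath it. If it survives the load, that would hint at the Xen side
of things, but it would not prove anything yet.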

In its current state, the bug report is not really actionable for
anyone other than yourself. As the Debian Xen team, we unfortunately do
not have the bandwidth to set up a test server with the same
configuration as yours and hammer on it until the same problem occurs.

Thanks,
Hans
