[Pkg-xen-devel] Bug#880554: Bug#880554: xen domu freezes with kernel linux-image-4.9.0-4-amd64

Hans van Kranenburg hans at knorrie.org
Mon Feb 26 23:40:41 UTC 2018


On 02/26/2018 07:35 PM, Hans van Kranenburg wrote:
> On 02/26/2018 03:52 PM, Ian Jackson wrote:
>> Christian Schwamborn writes ("Re: Bug#880554: xen domu freezes with
>> kernel linux-image-4.9.0-4-amd64"):
>>> I can try, but the only system I can really test this on is a
>>> production system, as it reliably shows this issue (and I don't want
>>> to crash it on purpose on a regular basis). Since I set
>>> gnttab_max_frames to a higher value it runs smoothly. If you're
>>> confident this will work I can try this in the evening, when all
>>> users have logged off.
>>
>> Thanks.  I understand your reluctance.  I don't want to mislead you.
>> I think the odds of it working are probably ~75%.
>>
>> Unless you want to tolerate that risk, it might be better for us to
>> try to come up with a better way to test it.
>
> I can try this.
>
> I can run a Xen 4.8 dom0 with a 4.9 kernel domU, and I already have
> xen-diag built for it (so the patch in this bug report is confirmed to
> build ok; we should include it for stretch, it's really useful).
>
> I think it's mainly a matter of getting a domU running with various
> combinations of domU kernel, number of disks and vcpus, and then
> looking at the output of xen-diag.

Ok, I spent some time trying things.

Xen: 4.8.3+comet2+shim4.10.0+comet3-1+deb9u4.1
dom0 kernel 4.9.65-3+deb9u2
domU (PV) kernel 4.9.82-1+deb9u2

Observation so far: nr_frames increases as soon as a combination of
disk and vcpu has actually seen disk activity, and it never decreases
again afterwards.

I ended up with a 64-vcpu domU with 10 additional 1GiB disks (xvdc,
xvdd, etc.).
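
For reference, a sketch of the relevant domU config lines (the disk
paths are made up, the actual storage backend doesn't matter here):

vcpus = 64
disk = [
    'phy:/dev/vg0/testdomu,xvda,w',
    'phy:/dev/vg0/testdomu-c,xvdc,w',
    'phy:/dev/vg0/testdomu-d,xvdd,w',
    # ...and so on, through:
    'phy:/dev/vg0/testdomu-l,xvdl,w',
]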

I created an ext4 filesystem on each of the disks and mounted them.
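
Roughly like this (a sketch, assuming the /mnt/xvdc etc. mount points
that the fio jobs below use):

-# for i in c d e f g h i j k l; do
     mkfs.ext4 -q /dev/xvd$i
     mkdir -p /mnt/xvd$i
     mount /dev/xvd$i /mnt/xvd$i
   done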

I used fio to throw some I/O at the disks, trying to hit as many
combinations of vcpu and disk as possible.

[things]
rw=randwrite
; rwmixread only applies to rw=randrw, it's ignored for randwrite
rwmixread=75
size=8M
directory=/mnt/xvdBLAH
ioengine=libaio
; direct I/O, bypassing the domU page cache
direct=1
iodepth=16
; one fio job per vcpu
numjobs=64

with BLAH replaced by c, d, e, f etc...
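
The ten job files can be generated from a template, e.g. like this
(assuming the job file above is saved as fio-template):

-# for i in c d e f g h i j k l; do
     sed "s/BLAH/$i/" fio-template > fio-xvd$i
   done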

-# rm */things*; for i in c d e f g h i j k l; do fio fio-xvd$i; done

-# while true; do /usr/lib/xen-4.8/bin/xen-diag gnttab_query_size 2;
sleep 10; done
domid=2: nr_frames=6, max_nr_frames=128
domid=2: nr_frames=7, max_nr_frames=128
domid=2: nr_frames=7, max_nr_frames=128
domid=2: nr_frames=10, max_nr_frames=128
domid=2: nr_frames=10, max_nr_frames=128
domid=2: nr_frames=11, max_nr_frames=128
domid=2: nr_frames=13, max_nr_frames=128
domid=2: nr_frames=14, max_nr_frames=128
domid=2: nr_frames=15, max_nr_frames=128
domid=2: nr_frames=16, max_nr_frames=128
domid=2: nr_frames=18, max_nr_frames=128
domid=2: nr_frames=18, max_nr_frames=128
domid=2: nr_frames=19, max_nr_frames=128
domid=2: nr_frames=21, max_nr_frames=128
domid=2: nr_frames=21, max_nr_frames=128
domid=2: nr_frames=23, max_nr_frames=128
domid=2: nr_frames=24, max_nr_frames=128
domid=2: nr_frames=24, max_nr_frames=128
domid=2: nr_frames=24, max_nr_frames=128
domid=2: nr_frames=24, max_nr_frames=128

So I can push it up to about 24 when doing this.
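
To put that number in perspective: a v1 grant table entry is 8 bytes,
so one 4 KiB grant table frame holds 512 of them, which means:

  24 frames * 512 entries/frame = 12288 grant entries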

-# grep . /sys/module/xen_blkback/parameters/*
/sys/module/xen_blkback/parameters/log_stats:0
/sys/module/xen_blkback/parameters/max_buffer_pages:1024
/sys/module/xen_blkback/parameters/max_persistent_grants:1056
/sys/module/xen_blkback/parameters/max_queues:4
/sys/module/xen_blkback/parameters/max_ring_page_order:4

Now, I rebooted my test dom0 and put the modprobe file in place.
(Note: the filename has to end in .conf!)
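
Since the file itself isn't quoted in this mail: it's something along
these lines (matching the parameter values that show up below):

-# cat /etc/modprobe.d/xen-blkback.conf
options xen-blkback max_queues=1 max_ring_page_order=0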

-# grep . /sys/module/xen_blkback/parameters/*
/sys/module/xen_blkback/parameters/log_stats:0
/sys/module/xen_blkback/parameters/max_buffer_pages:1024
/sys/module/xen_blkback/parameters/max_persistent_grants:1056
/sys/module/xen_blkback/parameters/max_queues:1
/sys/module/xen_blkback/parameters/max_ring_page_order:0

After doing the same tests, the result again ends up at exactly 24.
So the modprobe settings don't seem to make any difference for grant
usage.
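
For completeness, the frontend driver has matching module parameters
that could be checked inside the domU in the same way:

-# grep . /sys/module/xen_blkfront/parameters/*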

Inside the domU, xvda indeed shows only a single queue (0) now:

-# tree /sys/block/xvda/mq
/sys/block/xvda/mq
└── 0
    ├── active
    ├── cpu0
    │   ├── completed
    │   ├── dispatched
    │   ├── merged
    │   └── rq_list
    ├── cpu1
    │   ├── completed
    │   ├── dispatched
    │   ├── merged
    │   └── rq_list
   [...]
    ├── cpu63
    │   ├── completed
    │   ├── dispatched
    │   ├── merged
    │   └── rq_list
   [...]
    ├── cpu_list
    ├── dispatched
    ├── io_poll
    ├── pending
    ├── queued
    ├── run
    └── tags

65 directories, 264 files

Mwooop mwooop mwoop mwooooo (failure trombone).

This obviously didn't involve any network traffic yet. And it's all
stretch kernels etc., which are reported to be problematic already.

But the main thing I wanted to test is whether the change would result
in a much lower total number of grants, and that's not the case.

So, does anyone have a better idea, or should we just add some clear
documentation for the gnttab_max_frames setting in the grub config
example?
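
Something like this, with 256 as an example value
(GRUB_CMDLINE_XEN_DEFAULT is the variable that grub's 20_linux_xen
script reads):

# Allow domUs to use more grant table frames, see #880554:
GRUB_CMDLINE_XEN_DEFAULT="$GRUB_CMDLINE_XEN_DEFAULT gnttab_max_frames=256"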

Hans


