[Pkg-xen-devel] Test report xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2

Hans van Kranenburg hans at knorrie.org
Tue Oct 9 23:52:19 BST 2018


I'm just dumping everything I've got in here; after initial feedback we
can see how to organize todos around it.

tl;dr:
* Does not upgrade cleanly from 4.8 packages, so we have to prevent this
from entering testing until we fix that.
* Live migration is broken, explodes with memory allocation errors.

---- >8 ----

1. Build packages

* I have built salsa/master using pbuilder targeting sid. Great success...
* I have built packages for stretch-backports by adding a changelog
entry and building with pbuilder targeting stretch. Great success.

---- >8 ----

2. Put the packages in a repository

I use reprepro for our own package repos at work. I have a small repo
named 'xen' on http://packages.knorrie.org/ that I use for testing xen.

When adding the result with reprepro include, this happens:

No section specified for 'xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1.dsc' in '/home/knorrie/pbuilder/result/4.11-stretch-backports/xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.changes'!

commit e996c09e2f "debian/: Completely rework the packaging" drops the
Section line for the source package. Is this intentional? I'd like to be
able to put packages in reprepro.

I used reprepro -S misc as a workaround to override the sections.

---- >8 ----

3. i386 and amd64 packages?

After adding the new packages, I see that my reprepro has content left
for i386. E.g.:

-$ reprepro ls xen-utils-common
xen-utils-common | 4.11.1~pre.20180911.5acdd26fdc+dfsg-1~exp1~bpo9+1 | stretch-backports | i386
xen-utils-common |      4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1 | stretch-backports | amd64
xen-utils-common |                        4.10.1~pre+4.0f92968bcf-1~ |          unstable | i386
xen-utils-common |             4.11.1~pre.20180911.5acdd26fdc+dfsg-2 |          unstable | amd64

Why is this? Were the i386 things built before and not any more? I never
really noticed these. Is this a problem? How does the Debian archive
deal with this?

---- >8 ----

4. Install the packages.

At first I did an upgrade from the previous 4.11 packages to the new ones,
and ran into a problem. So later I downgraded to 4.8 from stretch and
then redid the upgrade test. There it also occurs:

-# apt-get dist-upgrade
[...]
Unpacking xenstore-utils (4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1) over (4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10) ...
dpkg: error processing archive /tmp/apt-dpkg-install-WhZg6K/11-xenstore-utils_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.deb (--unpack):
 trying to overwrite '/usr/share/man/man1/xenstore-chmod.1.gz', which is also in package xen-utils-common 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
[...]
Errors were encountered while processing:
 /tmp/apt-dpkg-install-WhZg6K/11-xenstore-utils_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

If I simply run it again:

-# apt-get dist-upgrade
Preparing to unpack .../xenstore-utils_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.deb ...
Unpacking xenstore-utils (4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1) over (4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10) ...
Setting up xenstore-utils (4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1) ...

So it seems a file has moved to another package, and the order in which
they are upgraded matters.
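
For reference, the usual Debian remedy when a file moves from one binary
package to another is a versioned Breaks plus Replaces on the old owner in
the receiving package (see the Debian Policy section on overwriting files
in other packages). Below is a minimal Python sketch, not part of the
packaging, that pulls the conflicting file and its old owner out of the
dpkg message quoted above, so the needed relation can be read off. The
function name and the regex are assumptions based only on the error text
in this mail.

```python
import re

# Pattern modelled on the dpkg error quoted above; the wording in other
# dpkg versions or locales may differ (assumption).
OVERWRITE_RE = re.compile(
    r"trying to overwrite '(?P<path>[^']+)', "
    r"which is also in package (?P<pkg>\S+) (?P<version>\S+)"
)


def find_file_conflicts(dpkg_output):
    """Return (path, owning_package, owning_version) tuples found in the log."""
    # dpkg output is often line-wrapped, so collapse all whitespace first.
    flat = " ".join(dpkg_output.split())
    return [(m["path"], m["pkg"], m["version"]) for m in OVERWRITE_RE.finditer(flat)]


SAMPLE = """\
 trying to overwrite '/usr/share/man/man1/xenstore-chmod.1.gz', which is
also in package xen-utils-common 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
"""

for path, pkg, version in find_file_conflicts(SAMPLE):
    print(f"{path} is still owned by {pkg} ({version})")
```

Here that points at xenstore-utils needing a relation against the old
xen-utils-common; the exact version bound would have to come from the
packaging history, which this mail does not show.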

In the end I still have xen-hypervisor-4.8-amd64 and libxen-4.8, all
other packages are 4.11-blah.

---- >8 ----

5. Try to still use 4.8

-# xen create -c blaat.bofh.dpl.mendix.net
Parsing config from blaat.bofh.dpl.mendix.net
libxl: info: libxl_create.c:105:libxl__domain_build_info_setdefault: qemu-xen is unavailable, using qemu-xen-traditional instead: No such file or directory
xenconsole: Could not read tty from store: Success

There's no xenconsoled process any more now.
/usr/lib/xen-4.8/bin/xenstored is still running however.

-# /etc/init.d/xen restart
[ ok ] Restarting xen (via systemctl): xen.service.

Now I have a xenconsoled process again.

After this, I can still start/stop a domU with this, so that's good.

The only 4.8 (real package version) things I still have are libxen-4.8,
xen-hypervisor-4.8-amd64 and xen-utils-4.8, so looks good.

Also, I have seen xenconsoled randomly disappear with all the previous
4.11 packages already. From syslog it seems it has something to do with
systemd, which is shutting it down during some nightly action.

---- >8 ----

5. Reboot into 4.11

Ah, 4.8 again. Grub config was not updated.

-# update-grub

Reboot again...

---- >8 ----

6. Now really reboot into 4.11

Yay.

---- >8 ----

7. Live migrate a domU to it.

At least it keeps running, but this is quite weird:

dmesg:

[ 3666.838699] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 3666.840734] OOM killer disabled.
[ 3666.840738] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 3666.842265] suspending xenstore...
[ 3666.856559] xen:grant_table: Grant tables using version 1 layout
[18443294892.646187] OOM killer enabled.
[18443294892.646200] Restarting tasks ... done.
[18443294892.684093] Setting capacity to 41943040

or with -T:

[Wed Oct 10 00:34:54 2018] Freezing user space processes ... (elapsed 0.001 seconds) done.
[Wed Oct 10 00:34:54 2018] OOM killer disabled.
[Wed Oct 10 00:34:54 2018] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[Wed Oct 10 00:34:54 2018] suspending xenstore...
[Wed Oct 10 00:34:54 2018] xen:grant_table: Grant tables using version 1 layout
[Tue Mar 22 00:02:00 2603] OOM killer enabled.
[Tue Mar 22 00:02:00 2603] Restarting tasks ... done.
[Tue Mar 22 00:02:00 2603] Setting capacity to 41943040

2603?
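
A quick sanity check on those numbers, as a Python sketch. It only uses
the two dmesg timestamps quoted above. The idea that the clock value
wrapped around as an unsigned 64-bit nanosecond counter is a guess, not
something the logs confirm.

```python
from datetime import datetime, timedelta

before = 3666.856559          # last timestamp before the jump (seconds since boot)
after = 18443294892.646187    # first timestamp after the jump
wall_before = datetime(2018, 10, 10, 0, 34, 54)  # dmesg -T value for `before`

# dmesg -T adds the uptime stamp to the boot time, so redo that by hand.
boot = wall_before - timedelta(seconds=before)
wall_after = boot + timedelta(seconds=after)
print(wall_after.date())  # a date in March 2603

# Guess: a negative nanosecond value printed as an unsigned 64-bit number.
wrap_seconds = 2**64 / 1e9
behind = wrap_seconds - after
print(f"{behind:.0f} s short of 2**64 ns, about {behind / 86400:.1f} days")
```

So the year 2603 is just what dmesg -T makes of the huge uptime value,
and that value sits roughly 40 days below 2**64 nanoseconds, which would
fit a negative time offset being applied after the migration.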

Ok, I can confirm that this also happens with the previous 4.11
packages. Also, I lose the TCP connection to the domU while live
migrating. All processes are still running, but my ssh session hangs
completely.

Sigh, not more live migrate problems please.

---- >8 ----

8. Live migrate it away again

(manual reproduction with debug options):

-# xl -vvv migrate -C /etc/xen/guests/blaat.bofh.dpl.mendix.net -s "" blaat.bofh.dpl.mendix.net "socat - TCP:10.140.221.7:8002"
Saving to migration stream new xl format (info 0x3/0x0/1254)
libxl: debug: libxl_domain.c:492:libxl_domain_suspend: Domain 1:ao 0x56303d91b050: create: how=(nil) callback=(nil) poller=0x56303d91ab50
libxl: debug: libxl.c:719:libxl__fd_flags_modify_save: fnctl F_GETFL flags for fd 13 are 0x1
libxl: debug: libxl.c:727:libxl__fd_flags_modify_save: fnctl F_SETFL of fd 13 to 0x1
libxl: debug: libxl_domain.c:520:libxl_domain_suspend: Domain 1:ao 0x56303d91b050: inprogress: poller=0x56303d91ab50, flags=i
libxl-save-helper: debug: starting save: Success
xc: detail: fd 13, dom 1, flags 1, hvm 0
xc: info: Saving domain 1, type x86 PV
xc: detail: 64 bits, 4 levels
xc: detail: max_mfn 0x1b1ffff
xc: detail: p2m list from 0xffffc90000000000 to 0xffffc90000ffffff, root at 0xd9408f
xc: detail: max_pfn 0x1fffff, p2m_frames 4096
xencall: error: alloc_pages: mmap failed: Invalid argument
xc: error: Unable to allocate memory for dirty bitmaps, batch pfns and deferred pages: Internal error
xc: error: Save failed (12 = Cannot allocate memory): Internal error
libxl-save-helper: debug: complete r=-1: Cannot allocate memory
libxl: error: libxl_stream_write.c:350:libxl__xc_domain_save_done: Domain 1:saving domain: domain did not respond to suspend request: Cannot allocate memory
libxl: debug: libxl.c:746:libxl__fd_flags_restore: fnctl F_SETFL of fd 13 to 0x1
libxl: debug: libxl_event.c:1869:libxl__ao_complete: ao 0x56303d91b050: complete, rc=-8
libxl: debug: libxl_event.c:1838:libxl__ao__destroy: ao 0x56303d91b050: destroy
migration sender: libxl_domain_suspend failed (rc=-8)
Migration failed, failed to suspend at sender.
xencall:buffer: debug: total allocations:20 total releases:20
xencall:buffer: debug: current allocations:0 maximum allocations:2
xencall:buffer: debug: cache current size:2
xencall:buffer: debug: cache hits:14 misses:2 toobig:4
xencall:buffer: debug: total allocations:0 total releases:0
xencall:buffer: debug: current allocations:0 maximum allocations:0
xencall:buffer: debug: cache current size:0
xencall:buffer: debug: cache hits:0 misses:0 toobig:0

That's not good, and a show stopper for me to do anything with it beyond
this first test machine.
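
When reading the output above, note that the first error line comes from
xencall (mmap failing with "Invalid argument"); the later "Cannot allocate
memory" and "did not respond to suspend request" messages appear to be
consequences of that. A small Python sketch, assuming a Linux host, that
lists the error lines of such a log in order and decodes the numeric
errno. The helper name is made up for illustration.

```python
import errno
import os


def error_lines(log_text):
    """Return the log lines containing 'error:' in their original order."""
    return [line.strip() for line in log_text.splitlines() if "error:" in line]


SAMPLE = """\
xc: detail: max_pfn 0x1fffff, p2m_frames 4096
xencall: error: alloc_pages: mmap failed: Invalid argument
xc: error: Save failed (12 = Cannot allocate memory): Internal error
"""

errors = error_lines(SAMPLE)
print("first error:", errors[0])

# Decode the number from "Save failed (12 = ...)"; values are as on Linux.
print(errno.errorcode[12], "->", os.strerror(12))
```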

---- >8 ----

9. by/domain info

-# xl info
[...]
cc_compiler            : gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
cc_compile_by          :
cc_compile_domain      :
cc_compile_date        : Sat Oct  6 00:24:06 UTC 2018
[...]

Previously the email address of the most recent debian/changelog entry
appeared here. Apparently this is gone.

-# xl dmesg
[...]
(XEN) Xen version 4.11.1-pre (Debian ) (@) (gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516) debug=n  Sat Oct  6 00:24:06 UTC 2018
[...]

Maybe it makes sense to hardcode the team list email address here
instead.

---- >8 ----

10. xl/xen tab completion

-# xl .
./             .bash_aliases  .bashrc        .profile       .vim/
../            .bash_history  .lesshst       .ssh/          .vimrc

-# xen .
./             .bash_aliases  .bashrc        .profile       .vim/
../            .bash_history  .lesshst       .ssh/          .vimrc

xl and xen now tab-complete filenames in the local directory.

---- >8 ----

So far my initial test report.

Hans


