[Pkg-xen-devel] Test report xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2

Ian Jackson ijackson at chiark.greenend.org.uk
Wed Oct 10 15:42:26 BST 2018


Hans van Kranenburg writes ("Test report xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2"):
> tl;dr:
> * Does not upgrade cleanly from 4.8 packages, so we have to prevent this
> from entering testing until we fix that.

I suggest we take the approach of fixing the bugs in git and then
uploading a new version as soon as what we have uploaded passes NEW.

> * Live migration is broken, explodes with memory allocation errors.

WFM, I'm afraid.

> ---- >8 ----
> 
> 2. Put the packages in a repository
> 
> I use reprepro for our own package repos at work. I have a small repo
> named 'xen' on http://packages.knorrie.org/ that I use for testing xen.
> 
> When adding the result with reprepro include, this happens:
> 
> No section specified for
> 'xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1.dsc' in
> '/home/knorrie/pbuilder/result/4.11-stretch-backports/xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.changes'!
> 
> commit e996c09e2f "debian/: Completely rework the packaging" drops the
> Section line for the source package. Is this intentional? I'd like to be
> able to put packages in reprepro.
> 
> I used reprepro -S misc as workaround to override the sections.

Hrm.  Mostly I deleted the Section from the .dsc because I wanted to
spot if I didn't explicitly set the Section in one of the .debs.  I
trusted lintian (which does not complain about this) too much - I see
that Section is Recommended by policy 5.2 for the source stanza.

I have added `Section: admin' in my working tree.
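
Since lintian stays silent here, a small check of the source stanza can catch this before upload. What follows is only a minimal sketch in Python, not part of the actual packaging: the function names, the sample control text and the maintainer address are all made up for illustration, and the parser handles just the simple `Field: value' layout.

```python
def source_stanza_fields(control_text):
    """Return the fields of the first stanza of a debian/control-style text.

    Field names are lowercased because control field names are
    case-insensitive. Hypothetical helper, for illustration only.
    """
    fields = {}
    current = None
    for line in control_text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first (source) stanza
            continue
        if line.startswith("#"):
            continue  # comment line
        if line[0] in " \t":
            # continuation line: append to the field being read
            if current is not None:
                fields[current] += "\n" + line.strip()
            continue
        name, _, value = line.partition(":")
        current = name.strip().lower()
        fields[current] = value.strip()
    return fields


def missing_recommended(control_text, recommended=("Section",)):
    """List the recommended fields that the source stanza lacks."""
    fields = source_stanza_fields(control_text)
    return [name for name in recommended if name.lower() not in fields]


# Made-up sample: the source stanza has no Section, the binary stanza does.
SAMPLE = """\
Source: xen
Maintainer: Example Maintainer <maint@example.org>

Package: xen-utils-common
Section: admin
Architecture: all
"""

print(missing_recommended(SAMPLE))
```

Only the first stanza is inspected, so a Section set on a binary package does not hide a missing one in the source stanza.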

> 3. i386 and amd64 packages?
> 
> After adding the new packages, I see that my reprepro has content left
> for i386. E.g.:
> 
> -$ reprepro ls xen-utils-common
> xen-utils-common | 4.11.1~pre.20180911.5acdd26fdc+dfsg-1~exp1~bpo9+1 |
> stretch-backports | i386
> xen-utils-common |      4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1 |
> stretch-backports | amd64
> xen-utils-common |                        4.10.1~pre+4.0f92968bcf-1~ |
>        unstable | i386
> xen-utils-common |             4.11.1~pre.20180911.5acdd26fdc+dfsg-2 |
>        unstable | amd64
> 
> Why is this? Were the i386 things built before and not any more? I never
> really noticed these. Is this a problem? How does the Debian archive
> deal with this?

The package should build fine for i386 as well as amd64.  I assume you
must have done an i386 build in the past.

> ---- >8 ----
> 
> 4. Install the packages.
> 
> At first I did an upgrade from previous 4.11 package to the new ones,
> and ran into a problem. So later I did downgrade to 4.8 from stretch and
> then redid the upgrade test. There it also occurs:
> 
> -# apt-get dist-upgrade
> [...]
> Unpacking xenstore-utils (4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1)
> over (4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10) ...
> dpkg: error processing archive
> /tmp/apt-dpkg-install-WhZg6K/11-xenstore-utils_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.deb
> (--unpack):
>  trying to overwrite '/usr/share/man/man1/xenstore-chmod.1.gz', which is
> also in package xen-utils-common 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
> [...]
> Errors were encountered while processing:
>  /tmp/apt-dpkg-install-WhZg6K/11-xenstore-utils_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.deb
> E: Sub-process /usr/bin/dpkg returned an error code (1)
> 
> If I simply run it again:
> 
> -# apt-get dist-upgrade
> Preparing to unpack
> .../xenstore-utils_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.deb
> ...
> Unpacking xenstore-utils (4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1)
> over (4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10) ...
> Setting up xenstore-utils (4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1) ...
> 
> So it seems a file has moved to another package, and the order in which
> they are upgraded matters.

This is a missing Replaces.  I have fixed that in my working tree too.
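
To illustrate the class of bug: a file that changes owner between two package versions needs a Replaces (normally paired with a Breaks) on the older version of the package that used to ship it. The Python sketch below is an assumption-laden illustration, not the tooling used here. The function name and the two content maps are hypothetical, and the only path in them is the one from the dpkg error above.

```python
def moved_files(old_contents, new_contents):
    """Find paths whose owning package changed between two releases.

    Both arguments map package name -> set of paths.
    Returns a sorted list of (path, old_package, new_package).
    """
    old_owner = {
        path: pkg for pkg, paths in old_contents.items() for path in paths
    }
    moves = []
    for new_pkg, paths in new_contents.items():
        for path in paths:
            old_pkg = old_owner.get(path)
            if old_pkg is not None and old_pkg != new_pkg:
                moves.append((path, old_pkg, new_pkg))
    return sorted(moves)


# Hypothetical content maps, reduced to the one file from the error message.
MANPAGE = "/usr/share/man/man1/xenstore-chmod.1.gz"
OLD = {"xen-utils-common": {MANPAGE}, "xenstore-utils": set()}
NEW = {"xen-utils-common": set(), "xenstore-utils": {MANPAGE}}

for path, old_pkg, new_pkg in moved_files(OLD, NEW):
    print(f"{new_pkg} takes over {path} from {old_pkg}: "
          f"{new_pkg} needs Replaces (and usually Breaks) on older {old_pkg}")
```

The version bound for the relationship is deliberately left out, since it depends on the first version in which the file moved.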

> In the end I still have xen-hypervisor-4.8-amd64 and libxen-4.8, all
> other packages are 4.11-blah.

Right.

> ---- >8 ----
> 
> 5. Try to still use 4.8
> 
> -# xen create -c blaat.bofh.dpl.mendix.net
> Parsing config from blaat.bofh.dpl.mendix.net
> libxl: info: libxl_create.c:105:libxl__domain_build_info_setdefault:
> qemu-xen is unavailable, using qemu-xen-traditional instead: No such
> file or directory
> xenconsole: Could not read tty from store: Success
> 
> There's no xenconsoled process any more now.

Can you investigate why this happens ?  It sounds like upgrading the
packages somehow stopped the old xenconsoled but didn't start a new
one.

> Also, I have seen xenconsoled randomly disappear with all the previous
> 4.11 packages already. From syslog it seems it has something to do with
> systemd, which is shutting it down during some nightly action.

Oh.  systemd.  I have been testing with sysvinit.

> ---- >8 ----
> 
> 6. Reboot into 4.11
> 
> Ah, 4.8 again. Grub config was not updated.

I encountered that too.  I thought I had fixed that.
xen-hypervisor-F-V.postinst.vsn-in turns into ...
... oh wait it is missing the .vsn-in in the filename.

Fixed in my working tree.

> 7. Live migrate a domU to it.
> 
> At least it keeps running, but this is quite weird:
>
> dmesg:
> 
> [ 3666.838699] Freezing user space processes ... (elapsed 0.001 seconds)
> done.
> [ 3666.840734] OOM killer disabled.
> [ 3666.840738] Freezing remaining freezable tasks ... (elapsed 0.001
> seconds) done.
> [ 3666.842265] suspending xenstore...
> [ 3666.856559] xen:grant_table: Grant tables using version 1 layout
> [18443294892.646187] OOM killer enabled.
> [18443294892.646200] Restarting tasks ... done.
> [18443294892.684093] Setting capacity to 41943040

I think during early resume the timestamps may be wrong ?
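
One arithmetic observation about those numbers, offered as a guess and not a diagnosis: the post-resume value sits just below 2^64 nanoseconds, which is what a small negative nanosecond count looks like when it is printed as an unsigned 64-bit number. The Python sketch below works that out. The interpretation (a wrapped negative value) and the helper name are assumptions, and only the timestamp itself comes from the dmesg output above.

```python
from decimal import Decimal

WRAP_NS = 2 ** 64  # assumed: the clock is a 64-bit nanosecond counter


def wrapped_offset_seconds(dmesg_seconds):
    """Reinterpret a dmesg timestamp as a wrapped negative offset.

    dmesg_seconds is the timestamp string as printed, in seconds.
    Returns the signed offset in seconds as a Decimal.
    """
    ns = int(Decimal(dmesg_seconds) * 1_000_000_000)
    return Decimal(ns - WRAP_NS) / 1_000_000_000


offset = wrapped_offset_seconds("18443294892.646187")
print("offset in seconds:", offset)
print("offset in days:   ", offset / 86400)
```

If the assumption holds, the offset is a bit under forty days in the negative direction. This says nothing about where such an offset would come from.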

> Ok, I can confirm that this also happens with the previous 4.11
> packages. Also, I lose the tcp connection to the domU while live
> migrating. Any process is still active, but my ssh session hangs totally.
> 
> Sigh, not more live migrate problems please.
> 
> ---- >8 ----
> 
> 8. Live migrate it away again

Is that from 4.11 to 4.8 ?  That's not necessarily expected to work.

On my test machine (stretch) I can localhost migrate both PV and HVM
guests.  The VM stays up.  My ssh session to it (tested with HVM only,
but no doubt PV works too) survives.

> (manual reproduction with debug options):
> 
> -# xl -vvv migrate -C /etc/xen/guests/blaat.bofh.dpl.mendix.net -s ""
> blaat.bofh.dpl.mendix.net "socat - TCP:10.140.221.7:8002"
> Saving to migration stream new xl format (info 0x3/0x0/1254)
> libxl: debug: libxl_domain.c:492:libxl_domain_suspend: Domain 1:ao
> 0x56303d91b050: create: how=(nil) callback=(nil) poller=0x56303d91ab50
> libxl: debug: libxl.c:719:libxl__fd_flags_modify_save: fnctl F_GETFL
> flags for fd 13 are 0x1
> libxl: debug: libxl.c:727:libxl__fd_flags_modify_save: fnctl F_SETFL of
> fd 13 to 0x1
> libxl: debug: libxl_domain.c:520:libxl_domain_suspend: Domain 1:ao
> 0x56303d91b050: inprogress: poller=0x56303d91ab50, flags=i
> libxl-save-helper: debug: starting save: Success
...
> xencall: error: alloc_pages: mmap failed: Invalid argument
> xc: error: Unable to allocate memory for dirty bitmaps, batch pfns and
> deferred pages: Internal error

I'm afraid IDK what this means.

> So far my initial test report.

Thanks.

Ian.

-- 
Ian Jackson <ijackson at chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


