[Pkg-xen-devel] Test report xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2

Hans van Kranenburg hans at knorrie.org
Wed Oct 10 15:56:13 BST 2018


On 10/10/2018 04:42 PM, Ian Jackson wrote:
> Hans van Kranenburg writes ("Test report xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2"):
>> tl;dr:
>> * Does not upgrade cleanly from 4.8 packages, so we have to prevent this
>> from entering testing until we fix that.
> 
> I suggest we take the approach of fixing the bugs in git and then
> uploading a new version as soon as what we have uploaded passes NEW.
> 
>> * Live migration is broken, explodes with memory allocation errors.
> 
> WFM, I'm afraid.
> 
>> ---- >8 ----
>>
>> 2. Put the packages in a repository
>>
>> I use reprepro for our own package repos at work. I have a small repo
>> named 'xen' on http://packages.knorrie.org/ that I use for testing xen.
>>
>> When adding the result with reprepro include, this happens:
>>
>> No section specified for
>> 'xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1.dsc' in
>> '/home/knorrie/pbuilder/result/4.11-stretch-backports/xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.changes'!
>>
>> commit e996c09e2f "debian/: Completely rework the packaging" drops the
>> Section line for the source package. Is this intentional? I'd like to be
>> able to put packages in reprepro.
>>
>> I used reprepro -S misc as workaround to override the sections.
> 
> Hrm.  Mostly I deleted the Section from the .dsc because I wanted to
> spot if I didn't explicitly set the Section in one of the .debs.  I
> trusted lintian (which does not complain about this) too much - I see
> that Section is Recommended by policy 5.2 for the source stanza.
> 
> I have added `Section: admin' in my working tree.
> 
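Since lintian stays silent here, a small local check can catch a source stanza without a Section before reprepro does. The following is only a minimal sketch: it assumes debian/control is plain deb822 text, treats field names as case-sensitive for simplicity, and the function names and the example stanza are made up for illustration.

```python
def source_stanza(control_text):
    """Parse the first (source) stanza of a deb822-style control file."""
    fields = {}
    current = None
    for line in control_text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first stanza
            continue
        if line.startswith("#"):
            continue  # comment line
        if line[0] in " \t":
            # continuation line belongs to the previous field
            if current is not None:
                fields[current] += "\n" + line.strip()
            continue
        name, _, value = line.partition(":")
        current = name.strip()
        fields[current] = value.strip()
    return fields


def missing_fields(control_text, wanted=("Section",)):
    """Return the wanted fields that the source stanza does not set."""
    stanza = source_stanza(control_text)
    return [name for name in wanted if name not in stanza]


# Hypothetical control file: Section is set on the binary package only.
EXAMPLE = """\
Source: xen
Maintainer: Example Maintainer <maint@example.org>

Package: xen-utils-common
Section: admin
"""

if __name__ == "__main__":
    print(missing_fields(EXAMPLE))
```

Only the first stanza is inspected, so a Section on a binary package does not hide a missing one on the source package.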
>> 3. i386 and amd64 packages?
>>
>> After adding the new packages, I see that my reprepro has content left
>> for i386. E.g.:
>>
>> -$ reprepro ls xen-utils-common
>> xen-utils-common | 4.11.1~pre.20180911.5acdd26fdc+dfsg-1~exp1~bpo9+1 |
>> stretch-backports | i386
>> xen-utils-common |      4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1 |
>> stretch-backports | amd64
>> xen-utils-common |                        4.10.1~pre+4.0f92968bcf-1~ |
>>        unstable | i386
>> xen-utils-common |             4.11.1~pre.20180911.5acdd26fdc+dfsg-2 |
>>        unstable | amd64
>>
>> Why is this? Were the i386 things built before and not any more? I never
>> really noticed these. Is this a problem? How does the Debian archive
>> deal with this?
> 
> The package should build fine for i386 as well as amd64.  I assume you
> must have done an i386 build in the past.

Nope, these are remainders of the output of the previous packaging. I've
never explicitly done something about i386. But, it's not important,
let's not spend time on th

>> ---- >8 ----
>>
>> 4. Install the packages.
>>
>> At first I did an upgrade from the previous 4.11 packages to the new
>> ones, and ran into a problem. So later I downgraded to 4.8 from stretch
>> and then redid the upgrade test. There it also occurs:
>>
>> -# apt-get dist-upgrade
>> [...]
>> Unpacking xenstore-utils (4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1)
>> over (4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10) ...
>> dpkg: error processing archive
>> /tmp/apt-dpkg-install-WhZg6K/11-xenstore-utils_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.deb
>> (--unpack):
>>  trying to overwrite '/usr/share/man/man1/xenstore-chmod.1.gz', which is
>> also in package xen-utils-common 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
>> [...]
>> Errors were encountered while processing:
>>  /tmp/apt-dpkg-install-WhZg6K/11-xenstore-utils_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.deb
>> E: Sub-process /usr/bin/dpkg returned an error code (1)
>>
>> If I simply run it again:
>>
>> -# apt-get dist-upgrade
>> Preparing to unpack
>> .../xenstore-utils_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.deb
>> ...
>> Unpacking xenstore-utils (4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1)
>> over (4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10) ...
>> Setting up xenstore-utils (4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1) ...
>>
>> So it seems a file has moved to another package, and the order in which
>> they are upgraded matters.
> 
> This is a missing Replaces.  I have fixed that in my working tree too.
> 
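The kind of overlap dpkg complains about above can be found ahead of time by comparing which package owns each path before and after the upgrade. This is a rough sketch under assumptions: the file lists are passed in as plain dictionaries (in practice they might come from `dpkg-deb -c` or `dpkg -L`), the function name is invented, and only the man page path is taken from the report; the other path is a placeholder.

```python
def files_needing_replaces(old, new):
    """Map (new_pkg, old_pkg) to the paths that moved between them.

    old and new map package name -> set of paths shipped by that package.
    Every pair returned is a candidate for a Replaces (and usually Breaks)
    declaration in new_pkg against the older versions of old_pkg.
    """
    owner_before = {path: pkg for pkg, paths in old.items() for path in paths}
    needed = {}
    for new_pkg, paths in new.items():
        for path in paths:
            old_pkg = owner_before.get(path)
            if old_pkg is not None and old_pkg != new_pkg:
                needed.setdefault((new_pkg, old_pkg), []).append(path)
    return {pair: sorted(paths) for pair, paths in needed.items()}


MANPAGE = "/usr/share/man/man1/xenstore-chmod.1.gz"  # path from the dpkg error

# Package contents before and after; "/placeholder/unchanged" is illustrative.
OLD = {"xen-utils-common": {MANPAGE, "/placeholder/unchanged"}}
NEW = {
    "xen-utils-common": {"/placeholder/unchanged"},
    "xenstore-utils": {MANPAGE},
}

if __name__ == "__main__":
    for (new_pkg, old_pkg), paths in files_needing_replaces(OLD, NEW).items():
        print(f"{new_pkg} takes over from {old_pkg}: {paths}")
```

The version bound to put in the actual Replaces line is not derived here; it depends on which upload first moved the file.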
>> In the end I still have xen-hypervisor-4.8-amd64 and libxen-4.8, all
>> other packages are 4.11-blah.
> 
> Right.
> 
>> ---- >8 ----
>>
>> 5. Try to still use 4.8
>>
>> -# xen create -c blaat.bofh.dpl.mendix.net
>> Parsing config from blaat.bofh.dpl.mendix.net
>> libxl: info: libxl_create.c:105:libxl__domain_build_info_setdefault:
>> qemu-xen is unavailable, using qemu-xen-traditional instead: No such
>> file or directory
>> xenconsole: Could not read tty from store: Success
>>
>> There's no xenconsoled process any more now.
> 
> Can you investigate why this happens ?  It sounds like upgrading the
> packages somehow stopped the old xenconsoled but didn't start a new
> one.

Yes, I have to investigate. I suspect it's not a problem that has been
introduced now.

>> Also, I have seen xenconsoled randomly disappear with all the previous
>> 4.11 packages already. From syslog it seems it has something to do with
>> systemd, which is shutting it down during some nightly action.
> 
> Oh.  systemd.  I have been testing with sysvinit.
> 
>> ---- >8 ----
>>
>> 6. Reboot into 4.11
>>
>> Ah, 4.8 again. Grub config was not updated.
> 
> I encountered that too.  I thought I had fixed that.
> xen-hypervisor-F-V.postinst.vsn-in turns into ...
> ... oh wait it is missing the .vsn-in in the filename.
> 
> Fixed in my working tree.
> 
>> 7. Live migrate a domU to it.
>>
>> At least it keeps running, but this is quite weird:
>>
>> dmesg:
>>
>> [ 3666.838699] Freezing user space processes ... (elapsed 0.001 seconds)
>> done.
>> [ 3666.840734] OOM killer disabled.
>> [ 3666.840738] Freezing remaining freezable tasks ... (elapsed 0.001
>> seconds) done.
>> [ 3666.842265] suspending xenstore...
>> [ 3666.856559] xen:grant_table: Grant tables using version 1 layout
>> [18443294892.646187] OOM killer enabled.
>> [18443294892.646200] Restarting tasks ... done.
>> [18443294892.684093] Setting capacity to 41943040
> 
> I think during early resume the timestamps may be wrong ?

I just triggered a new log line:

[18446422056.096266] OOM killer enabled.
[18446422056.096276] Restarting tasks ... done.
[18446422056.169628] Setting capacity to 41943040
[18446479746.168280] EXT4-fs (xvdb): mounted filesystem with ordered
data mode. Opts: (null)

-$ date
Wed Oct 10 16:51:41 CEST 2018
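One observation about these numbers, offered as a possible reading rather than a diagnosis: 2^64 nanoseconds is about 18446744073.7 seconds, and the timestamps after the resume sit just below that. Reinterpreting them as signed 64-bit nanosecond counts gives small negative values, which would fit a clock offset that went negative and is printed as unsigned. The sketch below only does that arithmetic; the function name is made up, and whether the kernel actually stores the value this way is an assumption.

```python
from decimal import Decimal

NS_PER_S = 10**9
WRAP = 2**64  # number of distinct values in an unsigned 64-bit counter


def as_signed_seconds(stamp):
    """Reinterpret a dmesg timestamp (seconds, as a string) as signed 64-bit ns."""
    ns = int(Decimal(stamp) * NS_PER_S)
    if ns >= WRAP // 2:
        ns -= WRAP  # two's-complement view of the same bit pattern
    return Decimal(ns) / NS_PER_S


if __name__ == "__main__":
    # Timestamps copied from the dmesg output in this mail.
    for stamp in ("3666.856559", "18443294892.646187", "18446422056.096266"):
        seconds = as_signed_seconds(stamp)
        print(stamp, "->", seconds, "s =", round(float(seconds) / 86400, 2), "days")
```

Under that reading, the first post-resume line corresponds to roughly minus 40 days, while the pre-suspend timestamp is unchanged.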

>> Ok, I can confirm that this also happens with the previous 4.11
>> packages. Also, I lose the tcp connection to the domU while live
>> migrating. All processes are still active, but my ssh session hangs totally.
>>
>> Sigh, not more live migrate problems please.
>>
>> ---- >8 ----
>>
>> 8. Live migrate it away again
> 
> Is that from 4.11 to 4.8 ?  That's not necessarily expected to work.

No, 4.11 to 4.11.

The exact same failure was reproduced in 100% of the cases where I tried
to live migrate the domU away.

An attempt to live migrate to the same machine also fails.

dom0:

Linux omega 4.18.0-0.bpo.1-amd64 #1 SMP Debian 4.18.6-1~bpo9+1
(2018-09-13) x86_64 GNU/Linux

domU:

Linux blaat 4.18.0-0.bpo.1-amd64 #1 SMP Debian 4.18.6-1~bpo9+1
(2018-09-13) x86_64 GNU/Linux

> On my test machine (stretch) I can localhost migrate both PV and HVM
> guests.  The VM stays up.  My ssh session to it (tested with HVM only,
> but no doubt PV works too) survives.
> 
>> (manual reproduction with debug options):
>>
>> -# xl -vvv migrate -C /etc/xen/guests/blaat.bofh.dpl.mendix.net -s ""
>> blaat.bofh.dpl.mendix.net "socat - TCP:10.140.221.7:8002"
>> Saving to migration stream new xl format (info 0x3/0x0/1254)
>> libxl: debug: libxl_domain.c:492:libxl_domain_suspend: Domain 1:ao
>> 0x56303d91b050: create: how=(nil) callback=(nil) poller=0x56303d91ab50
>> libxl: debug: libxl.c:719:libxl__fd_flags_modify_save: fnctl F_GETFL
>> flags for fd 13 are 0x1
>> libxl: debug: libxl.c:727:libxl__fd_flags_modify_save: fnctl F_SETFL of
>> fd 13 to 0x1
>> libxl: debug: libxl_domain.c:520:libxl_domain_suspend: Domain 1:ao
>> 0x56303d91b050: inprogress: poller=0x56303d91ab50, flags=i
>> libxl-save-helper: debug: starting save: Success
> ...
>> xencall: error: alloc_pages: mmap failed: Invalid argument
>> xc: error: Unable to allocate memory for dirty bitmaps, batch pfns and
>> deferred pages: Internal error
> 
> I'm afraid IDK what this means.

D:

> 
>> So far my initial test report.
> 
> Thanks.
> 
> Ian.
> 



