[Pkg-libvirt-maintainers] Bug#719675: Live migration of KVM guests fails if it takes more than 30 seconds (large memory guests)

Christian Balzer chibi at gol.com
Wed Aug 14 07:49:42 UTC 2013


Package: libvirt0
Version: 0.9.12-11+deb7u1
Severity: important

Hello,

When doing a live migration using Pacemaker (the OCF VirtualDomain RA) on
a cluster with DRBD (active/active) backing storage, everything works fine
with recently started KVM guests (small memory footprint of about 200MB at
most).

After inflating one guest to 2GB memory usage (memtester comes in handy
for that), the migration failed after 30 seconds, having managed to migrate
only about 400MB in that time over the direct, dedicated GbE link between my
test cluster host nodes.
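
To put those numbers in perspective, here is a rough back-of-the-envelope
sketch of the observed transfer rate and what it implies for the whole guest.
It assumes decimal megabytes and a single pass over guest memory (pages
dirtied during the migration would only add to this); the variable and
function names are purely illustrative.

```python
# Figures taken from the report above; decimal megabytes are an assumption.
migrated_mb = 400   # amount transferred before the migration failed
elapsed_s = 30      # seconds until the migration was aborted
guest_mb = 2000     # approximate guest memory usage (2GB)


def throughput_mb_per_s(mb, seconds):
    """Average transfer rate over the given interval."""
    return mb / seconds


def min_transfer_time_s(total_mb, rate_mb_per_s):
    """Lower bound: one full pass over memory, ignoring re-sent dirty pages."""
    return total_mb / rate_mb_per_s


rate = throughput_mb_per_s(migrated_mb, elapsed_s)
estimate = min_transfer_time_s(guest_mb, rate)

print(f"observed rate: {rate:.1f} MB/s")
print(f"single pass over {guest_mb} MB: at least {estimate:.0f} s")
```

At that rate a 2GB guest needs several times longer than the 30 seconds
after which the migration was cut off.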

libvirtd.log on the migration target node (the migration started at
07:24:51):
---
2013-08-13 07:24:51.807+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
2013-08-13 07:24:51.886+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
2013-08-13 07:24:51.888+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
2013-08-13 07:24:51.948+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
2013-08-13 07:24:51.948+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
2013-08-13 07:25:21.217+0000: 31950: warning : virKeepAliveTimer:182 : No response from client 0x1948280 after 5 keepalive messages in 30 seconds
2013-08-13 07:25:31.224+0000: 31950: warning : qemuProcessKill:3813 : Timed out waiting after SIGTERM to process 15926, sending SIGKILL
---
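
The timing in that log can be checked with a small sketch like the one
below, which parses the leading timestamps of three of the lines quoted
above (messages shortened with "..." for brevity). The last part assumes that
keepalive_interval and keepalive_count in libvirtd.conf are both at 5 and
that the connection is dropped after keepalive_interval * (keepalive_count + 1)
seconds of silence; that is my reading of the "5 keepalive messages in 30
seconds" warning, not something I have verified on this system.

```python
from datetime import datetime


def parse_ts(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.mmm+0000' stamp of a log line."""
    stamp = line.split(": ", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f%z")


# Lines copied from the log excerpt above, messages abbreviated.
first_warning = "2013-08-13 07:24:51.807+0000: 31953: warning : ..."
keepalive = "2013-08-13 07:25:21.217+0000: 31950: warning : virKeepAliveTimer:182 : ..."
sigkill = "2013-08-13 07:25:31.224+0000: 31950: warning : qemuProcessKill:3813 : ..."

until_keepalive = (parse_ts(keepalive) - parse_ts(first_warning)).total_seconds()
until_sigkill = (parse_ts(sigkill) - parse_ts(keepalive)).total_seconds()

print(f"first warning -> keepalive timeout: {until_keepalive:.3f} s")
print(f"keepalive timeout -> SIGKILL:       {until_sigkill:.3f} s")

# Assumed settings, not read from this system's libvirtd.conf.
keepalive_interval = 5
keepalive_count = 5
print("assumed keepalive limit:", keepalive_interval * (keepalive_count + 1), "s")
```

The keepalive warning shows up just under 30 seconds after the first
migration-related message, and the half-migrated QEMU process is killed
about 10 seconds after that.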

Below is the only thing I could find that is somewhat related to this.
Unfortunately, it was cured by the miracle that is the next version upgrade,
without the root cause ever being found:
https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=816451

I will install Sid on another test cluster tomorrow and am betting that it
will work just fine there. 
Since Testing is still at the same level as Wheezy I'm also betting that
we won't see anything in wheezy-backports anytime soon.
I'd really rather not create a production cluster based on Jessie or do
those rather complex backports myself...


Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi at gol.com   	Global OnLine Japan/Fusion Communications
http://www.gol.com/


