[Pkg-libvirt-maintainers] Bug#719675: Bug#719675: Live migration of KVM guests fails if it takes more than 30 seconds (large memory guests)

Guido Günther agx at sigxcpu.org
Wed Aug 14 19:50:22 UTC 2013


On Wed, Aug 14, 2013 at 04:49:42PM +0900, Christian Balzer wrote:
> 
> Package: libvirt0
> Version: 0.9.12-11+deb7u1
> Severity: important
> 
> Hello,
> 
> when doing a live migration using Pacemaker (the OCF VirtualDomain RA) on
> a cluster with DRBD (active/active) backing storage, everything works fine
> with recently started KVM guests (small memory footprint of about 200MB at
> most).
> 
> After inflating one guest to 2GB memory usage (memtester comes in handy
> for that), the migration failed after 30 seconds, having managed to migrate
> about 400MB in that time over the direct, dedicated GbE link between my
> test cluster host nodes.
> 
> libvirtd.log on the migration target node (the migration started at
> 07:24:51):
> ---
> 2013-08-13 07:24:51.807+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
> 2013-08-13 07:24:51.886+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
> 2013-08-13 07:24:51.888+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
> 2013-08-13 07:24:51.948+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
> 2013-08-13 07:24:51.948+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
> 2013-08-13 07:25:21.217+0000: 31950: warning : virKeepAliveTimer:182 : No response from client 0x1948280 after 5 keepalive messages in 30 seconds
> 2013-08-13 07:25:31.224+0000: 31950: warning : qemuProcessKill:3813 : Timed out waiting after SIGTERM to process 15926, sending SIGKILL

This looks more like your client isn't replying to libvirtd's keepalive
messages.
What are you using to migrate VMs?
 -- Guido
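The 30-second cutoff in the log lines up with libvirtd's keepalive settings. The libvirtd.conf comments describe a client connection as being closed approximately keepalive_interval * (keepalive_count + 1) seconds after the last message received from the client; assuming the stock defaults of keepalive_interval = 5 and keepalive_count = 5, that is 5 * 6 = 30 seconds, consistent with "5 keepalive messages in 30 seconds". A minimal Python sketch of that arithmetic follows; the helper names are hypothetical (not libvirt API), and the transfer estimate simply extrapolates the reported 400MB in 30 seconds.

```python
from typing import Optional


def keepalive_timeout(interval_s: int = 5, count: int = 5) -> Optional[int]:
    """Approximate seconds of client silence before libvirtd drops a client.

    Hypothetical helper following the approximation in the libvirtd.conf
    comments: keepalive_interval * (keepalive_count + 1). Defaults assume
    the stock settings (interval 5, count 5). A non-positive interval means
    libvirtd sends no keepalive requests, so it returns None.
    """
    if interval_s <= 0:
        return None
    return interval_s * (count + 1)


def transfer_time_s(total_mb: float, observed_mb: float, observed_s: float) -> float:
    """Time to copy total_mb at the rate observed so far.

    Hypothetical helper. This is a lower bound for pre-copy live migration,
    since pages the guest dirties during the copy have to be sent again.
    """
    rate_mb_s = observed_mb / observed_s
    return total_mb / rate_mb_s


if __name__ == "__main__":
    timeout = keepalive_timeout()            # assumed defaults: 5 * (5 + 1)
    needed = transfer_time_s(2048, 400, 30)  # 2GB guest at ~13.3 MB/s
    print(f"keepalive timeout: {timeout} s")
    print(f"estimated migration time: {needed:.0f} s")
```

At roughly 13 MB/s, a 2GB guest would need about two and a half minutes even before any dirtied pages are resent, so a client that does not answer keepalives while the migration runs would be dropped long before it could finish.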

> ---
> 
> Below is the only report I could find that is somewhat related to this.
> Unfortunately, it was cured by the miracle that is the next version upgrade,
> without the root cause ever being found:
> https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=816451
> 
> I will install Sid on another test cluster tomorrow and am betting that it
> will work just fine there. 
> Since Testing is still at the same level as Wheezy, I'm also betting that
> we won't see anything in wheezy-backports anytime soon.
> I'd really rather not create a production cluster based on Jessie or do
> those rather complex backports myself...
> 
> 
> Regards,
> 
> Christian
> -- 
> Christian Balzer        Network/Systems Engineer                
> chibi at gol.com   	Global OnLine Japan/Fusion Communications
> http://www.gol.com/


