[Pkg-libvirt-maintainers] Bug#719675: Live migration of KVM guests fails if it takes more than 30 seconds (large memory guests)
Guido Günther
agx at sigxcpu.org
Wed Aug 14 19:50:22 UTC 2013
On Wed, Aug 14, 2013 at 04:49:42PM +0900, Christian Balzer wrote:
>
> Package: libvirt0
> Version: 0.9.12-11+deb7u1
> Severity: important
>
> Hello,
>
> When doing a live migration using Pacemaker (the OCF VirtualDomain RA) on
> a cluster with DRBD (active/active) backing storage, everything works fine
> for recently started KVM guests (memory footprint of about 200MB at
> most).
>
> After inflating one guest to 2GB memory usage (memtester comes in handy
> for that) the migration failed after 30 seconds, having managed to migrate
> about 400MB in that time over the direct, dedicated GbE link between my
> test cluster host nodes.
>
> libvirtd.log on the migration target node (the migration started at
> 07:24:51):
> ---
> 2013-08-13 07:24:51.807+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
> 2013-08-13 07:24:51.886+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
> 2013-08-13 07:24:51.888+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
> 2013-08-13 07:24:51.948+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
> 2013-08-13 07:24:51.948+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
> 2013-08-13 07:25:21.217+0000: 31950: warning : virKeepAliveTimer:182 : No response from client 0x1948280 after 5 keepalive messages in 30 seconds
> 2013-08-13 07:25:31.224+0000: 31950: warning : qemuProcessKill:3813 : Timed out waiting after SIGTERM to process 15926, sending SIGKILL
This looks more like you're not replying via the keepalive protocol.
What are you using to migrate VMs?
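
For reference, the 30 seconds in the virKeepAliveTimer warning would match the keepalive settings in libvirtd.conf if they are at their defaults (keepalive_interval = 5, keepalive_count = 5; that these defaults are in effect on the reporter's nodes is an assumption). A rough sketch of the arithmetic, with a hypothetical helper name:

```python
def keepalive_timeout(interval_s: int, count: int) -> int:
    """Approximate seconds of silence before libvirtd drops a connection.

    Follows the rule described in libvirtd.conf: the connection is closed
    roughly keepalive_interval * (keepalive_count + 1) seconds after the
    last message received from the peer.
    """
    return interval_s * (count + 1)


# Assumed default values; check /etc/libvirt/libvirtd.conf on the hosts.
timeout = keepalive_timeout(interval_s=5, count=5)
print(timeout)
```

With those values the result is 30, which lines up with the gap between the migration start (07:24:51) and the keepalive warning (07:25:21) in the log above. So the limit is probably not a migration timeout as such but the client side of the connection not answering keepalives while the migration runs.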
-- Guido
> ---
>
> Below is the only thing I could find which is somewhat related to this,
> unfortunately it was cured by the miracle that is the next version upgrade
> without the root cause being found:
> https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=816451
>
> I will install Sid on another test cluster tomorrow and am betting that it
> will work just fine there.
> Since Testing is still at the same level as Wheezy I'm also betting that
> we won't see anything in wheezy-backports anytime soon.
> I'd really rather not create a production cluster based on Jessie or do
> those rather complex backports myself...
>
>
> Regards,
>
> Christian
> --
> Christian Balzer Network/Systems Engineer
> chibi at gol.com Global OnLine Japan/Fusion Communications
> http://www.gol.com/
>