[Pkg-libvirt-maintainers] Bug#998090: libvirt-daemon-system: Please defer starting libvirtd.socket until libvirtd.service dependencies can be met

Ron ron at debian.org
Sat Oct 30 08:09:45 BST 2021


Package: libvirt-daemon-system
Version: 7.0.0-3
Severity: important

Hi,

Systemd has a class of boot-time races which can result in deadlock,
and I learned more about them than I ever wanted to know when Buster
to Bullseye upgrades started leaving me with machines that were off
the network when they were rebooted ...  The reason for that is a bit
of a tangle of otherwise unrelated packages, and there are many ways
this *could* happen, but the root of it in my particular case was the
libvirt package switching to socket activation instead of letting the
daemon create its own socket when it is ready to respond to requests
on it.

The race occurs because the .socket unit creates the libvirt control
socket very early in the boot, before even network-pre.target is
reached, and so long before the libvirtd.service dependencies are
satisfied and the daemon itself can be started to handle requests.

The deadlock in my case occurs when a udev rule for a device already
attached at boot tries to assign that device to a VM.

Prior to Bullseye, what would occur is:

 The udev rule calls a small script on device hot/cold plug which
 checks a config file, and if the device is allocated to a VM, then
 calls virsh to attach it to that VM.

 This 'immediately' either succeeds, fails because the desired VM
 is not actually running (yet), or fails because libvirtd is not
 running and virsh did not find its socket present.

 If either of the failure cases occurs, the calling script fails
 gracefully, and a QEMU hook will later handle attaching the device
 if/when libvirtd and the desired VM are actually started.  (A rough
 sketch of such a helper follows below.)
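
To make the moving parts concrete, here is a rough sketch of that kind
of helper.  The rule match, script path, config file and VM lookup are
all hypothetical placeholders for illustration, not the actual files
in use:

    #!/bin/sh
    # Sketch of the udev-invoked helper described above.  A (hypothetical)
    # rule along these lines would call it on device hot/cold plug:
    #   ACTION=="add", SUBSYSTEM=="usb", RUN+="/usr/local/sbin/attach-to-vm %k"

    DEV="$1"
    CONF=/etc/attach-to-vm.conf

    # If the device is not allocated to a VM in the config, do nothing.
    VM=$(awk -v d="$DEV" '$1 == d { print $2 }' "$CONF" 2>/dev/null)
    [ -n "$VM" ] || exit 0

    # Pre-Bullseye this either succeeded or failed 'immediately': the VM
    # may not be running yet, or libvirtd's socket may not exist so virsh
    # cannot connect.  Either failure is harmless here, a QEMU hook
    # attaches the device later when the VM actually starts.
    virsh attach-device "$VM" "/etc/attach-to-vm/$DEV.xml" || exit 0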


But in Bullseye there's a three-way race: if the zombie socket is
created before the udev rule runs, virsh connects to it but then hangs
indefinitely, waiting for libvirtd.service to become able to start and
respond to the request.
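
A blunt mitigation at the helper level (not a fix for the underlying
race) would be to bound the virsh call with coreutils timeout, so a
socket with no daemon behind it can never wedge the udev worker
indefinitely.  A sketch, reusing the hypothetical helper above:

    # Replace the plain virsh call with a bounded one, so the udev worker
    # (and anything waiting on 'udevadm settle') is released after a few
    # seconds even if the socket exists but libvirtd.service cannot start.
    timeout 10 virsh attach-device "$VM" "/etc/attach-to-vm/$DEV.xml" || exit 0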

The deadlock in this specific case then happens when ifupdown-pre.service
(but it could be any of many other things) calls udevadm settle to give
the initial network devices a chance to be fully set up and available
before networking.service brings them up.

That settle call in turn hangs waiting for the (otherwise unrelated)
udev rule above to complete, which won't happen until libvirtd is
started, which won't happen until the udev rule returns (or udevadm
settle times out) and network.target (among others) can be reached.

Everything stops for two minutes until the systemd "bug solver" of
arbitrary timeouts starts killing things, and the machine finishes
booting without any network devices.
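
For anyone trying to catch this in the act, the stalled jobs are
visible from a console (if one is up) while the boot is hung; the
exact output will obviously vary per system:

    # List the jobs systemd has queued but cannot complete yet (the
    # libvirtd, networking and network.target jobs should all show up
    # as waiting while the deadlock holds).
    systemctl list-jobs

    # Show what the ifupdown-pre step is currently doing (it should be
    # sitting in its 'udevadm settle' call).
    systemctl status ifupdown-pre.service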


The latter can be avoided (in most cases at least) with a tweak to the
networking.service dependencies (the bug I've reported at
https://bugs.debian.org/998088 has more of the gory details of this
problem from the perspective of ifupdown's entanglement in it).

But this specific incarnation of it could be avoided completely if the
libvirtd.socket unit declared the same ordering dependencies as
libvirtd.service does, so that anything calling virsh, at any time,
could reasonably expect an answer in finite time instead of blocking
indefinitely, waiting for a service that systemd already knows does
not yet have even the basic preconditions needed to be eligible to
start (but it creates the socket for it anyway).
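
As a sketch of what that could look like, a local drop-in is enough to
try the idea.  The After= lines below are illustrative only - the list
that should be mirrored is whatever the libvirtd.service actually
shipped on the system declares:

    # Hypothetical drop-in mirroring libvirtd.service's ordering onto
    # libvirtd.socket, so the socket is not created until the point
    # where systemd could at least try to start the daemon behind it.
    mkdir -p /etc/systemd/system/libvirtd.socket.d
    printf '%s\n' \
        '[Unit]' \
        'After=network.target' \
        'After=dbus.service' \
        'After=apparmor.service' \
        > /etc/systemd/system/libvirtd.socket.d/ordering.conf
    systemctl daemon-reload

The two ordering lists can then be compared with
'systemctl show -p After libvirtd.socket libvirtd.service' to check
nothing was missed.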

Unless systemd gets smarter about this, there may always be a race,
with the possibility of circular deadlocks, whenever creating the
socket and answering requests on it are not atomic with starting the
service behind it.  So it may actually be better to just go back to
letting the daemon create and manage the socket itself (as its
"activation" signal to users of that socket).  But we can at least
narrow the window for losing that race significantly if we defer
creation of the socket until at least the point where systemd thinks
it can attempt to start the daemon (though still with no guarantee
that the attempt will succeed).
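
For completeness, a sketch of that fallback.  This assumes libvirtd
still creates and manages its own UNIX sockets when no activation
sockets are handed to it, which is worth confirming against the
current libvirtd(8) behaviour before relying on it:

    # Take the socket units out of the picture and let the (enabled)
    # service manage its sockets itself.  Assumption: libvirtd falls
    # back to creating its own sockets when it is not socket-activated.
    systemctl mask libvirtd.socket libvirtd-ro.socket libvirtd-admin.socket
    systemctl enable libvirtd.service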


I hope I haven't missed anything that makes this make sense in the
context of libvirt ...  trying to look at and describe this from four
entirely independent points of view, each of which doesn't directly
care about any of the others, is a bit of a hall of mirrors with small
parts of the problem stuck to each of them!

  Cheers,
  Ron


-- System Information:
Debian Release: 11.1
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-9-amd64 (SMP w/12 CPU threads)
Locale: LANG=en_AU.utf8, LC_CTYPE=en_AU.utf8 (charmap=UTF-8), LANGUAGE=en_AU:en
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages libvirt-daemon-system depends on:
ii  adduser                         3.118
ii  debconf [debconf-2.0]           1.5.77
ii  gettext-base                    0.21-4
ii  iptables                        1.8.7-1
ii  libvirt-clients                 7.0.0-3
ii  libvirt-daemon                  7.0.0-3
ii  libvirt-daemon-config-network   7.0.0-3
ii  libvirt-daemon-config-nwfilter  7.0.0-3
ii  libvirt-daemon-system-systemd   7.0.0-3
ii  logrotate                       3.18.0-2
ii  policykit-1                     0.105-31


