[Pkg-libvirt-maintainers] Bug#1120389: libvirt-daemon: libvirt disables detect_zeroes, resulting in denial of service during blockcommit/snapshot-delete

zrm debbug at trustiosity.com
Sat Nov 8 17:21:09 GMT 2025


Package: libvirt-daemon
Version: 11.3.0-3
Severity: important
Tags: security
X-Debbugs-Cc: debbug at trustiosity.com, Debian Security Team <team at security.debian.org>

Context: quick review of what happens if you 'virsh snapshot-delete' an external disk snapshot when the virtual disk has the properties discard='unmap' and detect_zeroes='off':

# # We're going to make this disk image small so we don't have a Bad Day
# qemu-img create -f qcow2 /var/lib/libvirt/images/trogdor.qcow2 5G
# # [put a base install of Debian or similar into this disk image]
# grep -A3 "device='disk'" trogdor.xml
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' discard='unmap' detect_zeroes='off'/>
      <source file='/var/lib/libvirt/images/trogdor.qcow2'/>
      <target dev='vda' bus='virtio'/>
# virsh define trogdor.xml
Domain 'trogdor' defined from trogdor.xml
# virsh start trogdor
Domain 'trogdor' started
# # Make a few snapshots and have the guest TRIM the filesystem in the second one
# N=1; virsh snapshot-create-as --domain trogdor snap$N --diskspec vda,file=/var/lib/libvirt/images/trogdor-snap${N}.qcow2 --disk-only
Domain snapshot snap1 created
# N=2; virsh snapshot-create-as --domain trogdor snap$N --diskspec vda,file=/var/lib/libvirt/images/trogdor-snap${N}.qcow2 --disk-only
Domain snapshot snap2 created
# virsh domfstrim trogdor
# N=3; virsh snapshot-create-as --domain trogdor snap$N --diskspec vda,file=/var/lib/libvirt/images/trogdor-snap${N}.qcow2 --disk-only
Domain snapshot snap3 created
# # Now delete the second one
# du -h /var/lib/libvirt/images/trogdor*.qcow2
1.4G    /var/lib/libvirt/images/trogdor.qcow2
324K    /var/lib/libvirt/images/trogdor-snap1.qcow2
2.1M    /var/lib/libvirt/images/trogdor-snap2.qcow2
456K    /var/lib/libvirt/images/trogdor-snap3.qcow2
# virsh snapshot-delete trogdor snap2
Domain snapshot snap2 deleted
# du -h /var/lib/libvirt/images/trogdor*.qcow2
1.4G    /var/lib/libvirt/images/trogdor.qcow2
3.0G    /var/lib/libvirt/images/trogdor-snap1.qcow2
456K    /var/lib/libvirt/images/trogdor-snap3.qcow2

Oops. The blockcommit implicit in deleting the snapshot turned all the TRIM'd zeroes into actual zeros because detect_zeroes is off. Which is probably a bug in itself; it would make sense to have discard='unmap' set detect_zeroes='unmap' during blockcommit/blockpull even if it doesn't the rest of the time. This is obviously a much bigger problem when VMs have 3TB of free space instead of 3GB. We must never do that. Whereas this is what happens (or should happen) if you do the same thing with detect_zeroes='unmap':

# grep -A3 "device='disk'" trogdor.xml
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' discard='unmap' detect_zeroes='unmap'/>
      <source file='/var/lib/libvirt/images/trogdor.qcow2'/>
      <target dev='vda' bus='virtio'/>
# virsh define trogdor.xml
Domain 'trogdor' defined from trogdor.xml
# virsh start trogdor
Domain 'trogdor' started
# # Make a few snapshots and have the guest TRIM the filesystem in the second one
# N=1; virsh snapshot-create-as --domain trogdor snap$N --diskspec vda,file=/var/lib/libvirt/images/trogdor-snap${N}.qcow2 --disk-only
Domain snapshot snap1 created
# # IT said try turning it off and on again
# virsh shutdown trogdor
Domain 'trogdor' is being shutdown
# # wait for it to shut down
# virsh start trogdor
Domain 'trogdor' started
# N=2; virsh snapshot-create-as --domain trogdor snap$N --diskspec vda,file=/var/lib/libvirt/images/trogdor-snap${N}.qcow2 --disk-only
Domain snapshot snap2 created
# virsh domfstrim trogdor
# N=3; virsh snapshot-create-as --domain trogdor snap$N --diskspec vda,file=/var/lib/libvirt/images/trogdor-snap${N}.qcow2 --disk-only
Domain snapshot snap3 created
# du -h /var/lib/libvirt/images/trogdor*.qcow2
1.4G    /var/lib/libvirt/images/trogdor.qcow2
5.4M    /var/lib/libvirt/images/trogdor-snap1.qcow2
3.9M    /var/lib/libvirt/images/trogdor-snap2.qcow2
200K    /var/lib/libvirt/images/trogdor-snap3.qcow2
# virsh snapshot-delete trogdor snap2
Domain snapshot snap2 deleted
# du -h /var/lib/libvirt/images/trogdor*.qcow2
1.4G    /var/lib/libvirt/images/trogdor.qcow2
5.8M    /var/lib/libvirt/images/trogdor-snap1.qcow2
1.6M    /var/lib/libvirt/images/trogdor-snap3.qcow2

Now the merged snapshot is a much more appropriate size. Notice the VM shutdown and start between creating the first snapshot and the second. That's to work around the bug, because the bug is that detect_zeroes isn't enabled when it ought to be. It's related to these lines in qemuDomainPrepareDiskSourceData() in src/qemu/qemu_domain.c:

    /* transfer properties valid only for the top level image */
    if (src == disk->src || src == disk->src->dataFileStore)
        src->detect_zeroes = disk->detect_zeroes;

The conditional here might have been crafted from a place of love, but it ain't right. The function is meant to copy properties from 'disk' to 'src'. What the detect_zeroes property does is check if data being written is all zeroes and then if it is, uses unmap instead of actually writing the zeroes. Typically writes only go to the top-level image so if none of the others ever get written to then they don't need detect_zeroes to be enabled, right? That's the theory?

There are three problems with this. The first is that it's apparently fruitless. Enabling detect_zeroes is somewhat computationally expensive so the theory might be to turn it off for read-only images where it isn't needed, but if they're actually read-only then they shouldn't be getting any writes anyway and then there shouldn't be any computation to be saved by turning it off.

The second is that the backing files aren't always read-only, and in particular live blockcommit writes to them and very much wants detect_zeroes='unmap'. This is where you get the denial of service, because it happens during live snapshot deletion. A user could create arbitrarily many snapshots of trivial size and TRIM them without exceeding a storage quota and the disk usage would only increase by a factor of a thousand when deleting, which renders adversarial the operation that should normally be expected to free up space. Your storage gets filled with zeroes even when you set detect_zeroes='unmap' and your VMs crash because your host filesystem is completely full.

The third is that the check has a TOCTOU fail which is triggered during live snapshot creation. It seems to do what it claims to during guest boot but during snapshot creation that function is called against a new snapshot which is about to be the top level image but 'disk->src' isn't set to 'src' until qemuSnapshotDiskUpdateSource() which is after the call to qemuDomainPrepareDiskSourceData(). With the result that detect_zeroes for what is about to be the top-level image is left as its default value, which is 'off'. So detect_zeroes gets erroneously disabled for the live top-level image whenever you create a live snapshot. It also stays enabled for what was the top-level image when the guest booted. This is why snapshot-delete "worked" after shutting down the guest so it could boot with the top-level snapshot as the one we would ultimately merge the deleted one into -- detect_zeroes at that point was still 'unmap' for snap1 instead of the then-top level snap3.

All of this seems to be sorted by removing the conditional in qemuDomainPrepareDiskSourceData() so that 'src->detect_zeroes = disk->detect_zeroes;' happens unconditionally. That and updating a few tests in tests/qemublocktestdata/xml2json to expect detect_zeroes to be present in the relevant JSON causes detect_zeroes='unmap' to work as expected with blockcommit/snapshot-delete.

If for some reason that change isn't desirable, the most irksome of the trouble could be eliminated by having qemuBlockCommit(), and therefore snapshot-delete, temporarily set detect_zeroes='unmap' on the commit target destination image whenever discard='unmap' is set. This would be good to do even if the conditional in qemuDomainPrepareDiskSourceData() *is* removed, because blockcommit causes disk images to explode with zeroes whenever discard='unmap' and detect_zeroes isn't, and that isn't desirable even if you don't want to constantly pay the cost of enabling detect_zeroes on the live top image. Or to put it another way, you shouldn't have to enable it on the live top image if you only want it during blockcommit/snapshot-delete, but you will want it during blockcommit whenever discard is enabled. 

-- System Information:
Debian Release: 13.1
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.12.48+deb13-amd64 (SMP w/4 CPU threads; PREEMPT)
Kernel taint flags: TAINT_FIRMWARE_WORKAROUND
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libvirt-daemon depends on:
ii  libc6                  2.41-12
ii  libgcc-s1              14.2.0-19
ii  libglib2.0-0t64        2.84.4-3~deb13u1
ii  libtirpc3t64           1.3.6+ds-1
ii  libvirt-common         11.3.0-3
ii  libvirt-daemon-common  11.3.0-3
ii  libvirt-daemon-log     11.3.0-3
ii  libvirt0               11.3.0-3
ii  libxml2                2.12.7+dfsg+really2.9.14-2.1+deb13u1
ii  logrotate              3.22.0-1

Versions of packages libvirt-daemon recommends:
ii  libvirt-daemon-driver-interface        11.3.0-3
ii  libvirt-daemon-driver-network          11.3.0-3
ii  libvirt-daemon-driver-nodedev          11.3.0-3
ii  libvirt-daemon-driver-nwfilter         11.3.0-3
ii  libvirt-daemon-driver-qemu             11.3.0-3
ii  libvirt-daemon-driver-secret           11.3.0-3
ii  libvirt-daemon-driver-storage          11.3.0-3
ii  libvirt-daemon-driver-storage-disk     11.3.0-3
ii  libvirt-daemon-driver-storage-iscsi    11.3.0-3
ii  libvirt-daemon-driver-storage-logical  11.3.0-3
ii  libvirt-daemon-driver-storage-mpath    11.3.0-3
ii  libvirt-daemon-driver-storage-scsi     11.3.0-3
ii  libvirt-daemon-lock                    11.3.0-3
ii  libvirt-daemon-plugin-lockd            11.3.0-3

Versions of packages libvirt-daemon suggests:
ii  libvirt-daemon-driver-lxc                   11.3.0-3
pn  libvirt-daemon-driver-storage-gluster       <none>
pn  libvirt-daemon-driver-storage-iscsi-direct  <none>
pn  libvirt-daemon-driver-storage-rbd           <none>
pn  libvirt-daemon-driver-storage-zfs           <none>
ii  libvirt-daemon-driver-vbox                  11.3.0-3
ii  libvirt-daemon-driver-xen                   11.3.0-3
ii  libvirt-daemon-system                       11.3.0-3

-- no debconf information



More information about the Pkg-libvirt-maintainers mailing list