[Pkg-zfsonlinux-devel] Bug#1086617: bookworm-pu: package zfs-linux/2.1.11-1+deb12u1

Shengqi Chen harry-chen at outlook.com
Sat Nov 2 08:04:31 GMT 2024


Package: release.debian.org
Severity: normal
Tags: bookworm
User: release.debian.org at packages.debian.org
Usertags: pu
X-Debbugs-Cc: zfs-linux at packages.debian.org, aron at debian.org
Control: affects -1 + src:zfs-linux
Control: block 1063497 by -1
Control: block 1069125 by -1

[ Reason ]

zfs in bookworm (2.1.11) suffers from CVE-2023-49298 (silent data
corruption due to a faulty dnode dirtiness check) and CVE-2013-20001
(an NFS sharing security problem in sharenfs host address parsing).
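
For context, the corruption in CVE-2023-49298 surfaces through
lseek(2) hole detection: in a narrow race window, a file that was
just written can be reported as containing no data, so sparse-aware
tools (e.g. cp from coreutils >= 9.2) copy it as one big hole. A
minimal sketch of the symptom (hypothetical test program and path,
not the upstream reproducer):

    /* Sketch only: write to a fresh file, then immediately probe it
     * with SEEK_DATA. On an affected module there is a small window
     * where lseek() fails with ENXIO ("no data found") even though
     * data was just written. */
    #define _GNU_SOURCE     /* for SEEK_DATA */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* /tank/f is a hypothetical file on a ZFS dataset */
        int fd = open("/tank/f", O_CREAT | O_RDWR | O_TRUNC, 0644);
        if (fd < 0)
            return 1;
        (void) write(fd, "data", 4);        /* dnode becomes dirty */
        if (lseek(fd, 0, SEEK_DATA) < 0 && errno == ENXIO)
            fprintf(stderr, "bug: file reported as one big hole\n");
        close(fd);
        return 0;
    }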

[ Impact ]

Users of zfs-linux in bookworm are still affected by these CVEs,
although the risks are not grave.

[ Tests ]

These patches have been in unstable / bookworm-backports for a long
time. No problems have been reported against them.

[ Risks ]

Risks are minimal. I only cherry-picked upstream fixes; there are no
functional changes.

[ Checklist ]
  [x] *all* changes are documented in the d/changelog
  [x] I reviewed all changes and I approve them
  [x] attach debdiff against the package in (old)stable
  [x] the issue is verified as fixed in unstable

[ Changes ]

  * dch: typo fix
  * New symbols for libzfs4linux and libzpool5linux
    (missing in last upload)
  * d/patches: cherry-pick upstream fixes for stability issues
    + fix dnode dirty test (Closes: #1056752, #1063497, CVE-2023-49298;
      see the condensed sketch after this list)
    + fix sharenfs IPv6 address parsing (Closes: CVE-2013-20001)
    + and fixes for a NULL pointer dereference on concurrent send,
      KM_NOSLEEP allocation, ZIL write sizing, spare vdev detach, a
      TRIM/allocation race, and an initramfs mount regression
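
For ease of review, the crux of the dnode fix is small. A condensed
view of the resulting dnode_is_dirty() (the two-part check is
verbatim from patch 0028; the surrounding function body is
reconstructed from the patch's context lines):

    /* A dnode must be treated as dirty if either the dnode itself is
     * on a dirty list (dn_dirty_link) or it still carries
     * uncommitted dirty records (dn_dirty_records); checking only
     * one of the two can mis-report a hole mid-commit. */
    boolean_t
    dnode_is_dirty(dnode_t *dn)
    {
        mutex_enter(&dn->dn_mtx);
        for (int i = 0; i < TXG_SIZE; i++) {
            if (multilist_link_active(&dn->dn_dirty_link[i]) ||
                !list_is_empty(&dn->dn_dirty_records[i])) {
                mutex_exit(&dn->dn_mtx);
                return (B_TRUE);
            }
        }
        mutex_exit(&dn->dn_mtx);
        return (B_FALSE);
    }

Likewise, the sharenfs fix makes bracketed IPv6 literals usable in
exports, e.g. (from the man page example updated by the patch):
zfs set sharenfs='rw=@123.123.0.0/16:[::1],root=neo' tank/home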

[ Other info ]

This request is similar to #1042730, but drops many non-essential
patches. The remaining patches contain roughly 100 LOC of actual code
changes; most of the rest of the debdiff is commit messages.

-- 
Thanks,
Shengqi Chen
-------------- next part --------------
diff -Nru zfs-linux-2.1.11/debian/changelog zfs-linux-2.1.11/debian/changelog
--- zfs-linux-2.1.11/debian/changelog	2023-04-23 17:29:38.000000000 +0800
+++ zfs-linux-2.1.11/debian/changelog	2024-11-02 15:34:23.000000000 +0800
@@ -1,3 +1,14 @@
+zfs-linux (2.1.11-1+deb12u1) UNRELEASED; urgency=medium
+
+  * dch: typo fix
+  * New symbols for libzfs4linux and libzpool5linux
+  * d/patches: cherry-pick upstream fixes for stability issues
+    + fix dnode dirty test (Closes: #1056752, #1063497, CVE-2023-49298)
+    + fix sharenfs IPv6 address parsing (Closes: CVE-2013-20001)
+    + and some fixes related to NULL pointer, memory allocation, etc.
+
+ -- Shengqi Chen <harry-chen at outlook.com>  Sat, 02 Nov 2024 15:34:23 +0800
+
 zfs-linux (2.1.11-1) unstable; urgency=medium
 
   [ Mo Zhou ]
@@ -5,7 +16,7 @@
 
   [ Aron Xu ]
   * New upstream stable point release version 2.1.11
-  * Drop patches that are alreay in upstream stable release
+  * Drop patches that are already in upstream stable release
 
  -- Aron Xu <aron at debian.org>  Sun, 23 Apr 2023 17:29:38 +0800
 
diff -Nru zfs-linux-2.1.11/debian/libzfs4linux.symbols zfs-linux-2.1.11/debian/libzfs4linux.symbols
--- zfs-linux-2.1.11/debian/libzfs4linux.symbols	2023-04-17 12:44:44.000000000 +0800
+++ zfs-linux-2.1.11/debian/libzfs4linux.symbols	2024-11-02 15:27:19.000000000 +0800
@@ -102,6 +102,7 @@
  snapshot_namecheck at Base 2.0
  spa_feature_table at Base 0.8.2
  unshare_one at Base 2.0
+ use_color at Base 2.1.11
  zcmd_alloc_dst_nvlist at Base 0.8.2
  zcmd_expand_dst_nvlist at Base 0.8.2
  zcmd_free_nvlists at Base 0.8.2
@@ -386,6 +387,7 @@
  zpool_vdev_path_to_guid at Base 0.8.2
  zpool_vdev_remove at Base 0.8.2
  zpool_vdev_remove_cancel at Base 0.8.2
+ zpool_vdev_remove_wanted at Base 2.1.11
  zpool_vdev_split at Base 0.8.2
  zpool_wait at Base 2.0
  zpool_wait_status at Base 2.0
@@ -678,6 +680,8 @@
  zfs_niceraw at Base 2.0
  zfs_nicetime at Base 2.0
  zfs_resolve_shortname at Base 2.0
+ zfs_setproctitle at Base 2.1.11
+ zfs_setproctitle_init at Base 2.1.11
  zfs_strcmp_pathname at Base 2.0
  zfs_strip_partition at Base 2.0
  zfs_strip_path at Base 2.0
diff -Nru zfs-linux-2.1.11/debian/libzpool5linux.symbols zfs-linux-2.1.11/debian/libzpool5linux.symbols
--- zfs-linux-2.1.11/debian/libzpool5linux.symbols	2023-04-17 15:26:55.000000000 +0800
+++ zfs-linux-2.1.11/debian/libzpool5linux.symbols	2024-11-02 15:27:19.000000000 +0800
@@ -685,6 +685,7 @@
  dnode_special_close at Base 0.8.2
  dnode_special_open at Base 0.8.2
  dnode_stats at Base 0.8.2
+ dnode_sums at Base 2.1.11
  dnode_sync at Base 0.8.2
  dnode_try_claim at Base 0.8.2
  dnode_verify at Base 2.0
@@ -2095,6 +2096,7 @@
  vdev_checkpoint_sm_object at Base 0.8.2
  vdev_children_are_offline at Base 0.8.2
  vdev_clear at Base 0.8.2
+ vdev_clear_kobj_evt at Base 2.1.11
  vdev_clear_resilver_deferred at Base 0.8.3
  vdev_clear_stats at Base 0.8.2
  vdev_close at Base 0.8.2
@@ -2227,6 +2229,7 @@
  vdev_open at Base 0.8.2
  vdev_open_children at Base 0.8.2
  vdev_open_children_subset at Base 2.1
+ vdev_post_kobj_evt at Base 2.1.11
  vdev_probe at Base 0.8.2
  vdev_propagate_state at Base 0.8.2
  vdev_psize_to_asize at Base 0.8.2
@@ -2277,6 +2280,7 @@
  vdev_removal_max_span at Base 0.8.2
  vdev_remove_child at Base 0.8.2
  vdev_remove_parent at Base 0.8.2
+ vdev_remove_wanted at Base 2.1.11
  vdev_reopen at Base 0.8.2
  vdev_replace_in_progress at Base 2.0
  vdev_replacing_ops at Base 0.8.2
diff -Nru zfs-linux-2.1.11/debian/patches/0013-Fix-Detach-spare-vdev-in-case-if-resilvering-does-no.patch zfs-linux-2.1.11/debian/patches/0013-Fix-Detach-spare-vdev-in-case-if-resilvering-does-no.patch
--- zfs-linux-2.1.11/debian/patches/0013-Fix-Detach-spare-vdev-in-case-if-resilvering-does-no.patch	1970-01-01 08:00:00.000000000 +0800
+++ zfs-linux-2.1.11/debian/patches/0013-Fix-Detach-spare-vdev-in-case-if-resilvering-does-no.patch	2024-11-02 15:27:19.000000000 +0800
@@ -0,0 +1,91 @@
+From a68dfdb88c88fe970343e49b48bfd3bb4cef99d2 Mon Sep 17 00:00:00 2001
+From: Ameer Hamza <106930537+ixhamza at users.noreply.github.com>
+Date: Wed, 19 Apr 2023 21:04:32 +0500
+Subject: [PATCH] Fix "Detach spare vdev in case if resilvering does not
+ happen"
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Spare vdev should detach from the pool when a disk is reinserted.
+However, spare detachment depends on the completion of resilvering,
+and if resilver does not schedule, the spare vdev?keeps attached to
+the pool until the next resilvering. When a zfs pool contains
+several disks (25+ mirror), resilvering does not always happen when
+a disk is?reinserted. In this patch, spare vdev is manually detached
+from the pool when resilvering does not occur and it has been tested
+on both Linux and FreeBSD.
+
+Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
+Reviewed-by: Alexander Motin <mav at FreeBSD.org>
+Signed-off-by: Ameer Hamza <ahamza at ixsystems.com>
+Closes #14722
+---
+ include/sys/spa.h |  1 +
+ module/zfs/spa.c  |  5 +++--
+ module/zfs/vdev.c | 12 +++++++++++-
+ 3 files changed, 15 insertions(+), 3 deletions(-)
+
+diff --git a/include/sys/spa.h b/include/sys/spa.h
+index fedadab45..07e09d1ec 100644
+--- a/include/sys/spa.h
++++ b/include/sys/spa.h
+@@ -785,6 +785,7 @@ extern int bpobj_enqueue_free_cb(void *arg, const blkptr_t *bp, dmu_tx_t *tx);
+ #define	SPA_ASYNC_L2CACHE_REBUILD		0x800
+ #define	SPA_ASYNC_L2CACHE_TRIM			0x1000
+ #define	SPA_ASYNC_REBUILD_DONE			0x2000
++#define	SPA_ASYNC_DETACH_SPARE			0x4000
+ 
+ /* device manipulation */
+ extern int spa_vdev_add(spa_t *spa, nvlist_t *nvroot);
+diff --git a/module/zfs/spa.c b/module/zfs/spa.c
+index 1ed79eed3..8bc51f777 100644
+--- a/module/zfs/spa.c
++++ b/module/zfs/spa.c
+@@ -6987,7 +6987,7 @@ spa_vdev_attach(spa_t *spa, uint64_t guid, nvlist_t *nvroot, int replacing,
+  * Detach a device from a mirror or replacing vdev.
+  *
+  * If 'replace_done' is specified, only detach if the parent
+- * is a replacing vdev.
++ * is a replacing or a spare vdev.
+  */
+ int
+ spa_vdev_detach(spa_t *spa, uint64_t guid, uint64_t pguid, int replace_done)
+@@ -8210,7 +8210,8 @@ spa_async_thread(void *arg)
+ 	 * If any devices are done replacing, detach them.
+ 	 */
+ 	if (tasks & SPA_ASYNC_RESILVER_DONE ||
+-	    tasks & SPA_ASYNC_REBUILD_DONE) {
++	    tasks & SPA_ASYNC_REBUILD_DONE ||
++	    tasks & SPA_ASYNC_DETACH_SPARE) {
+ 		spa_vdev_resilver_done(spa);
+ 	}
+ 
+diff --git a/module/zfs/vdev.c b/module/zfs/vdev.c
+index 4b9d7e7c0..ee0c1d862 100644
+--- a/module/zfs/vdev.c
++++ b/module/zfs/vdev.c
+@@ -4085,9 +4085,19 @@ vdev_online(spa_t *spa, uint64_t guid, uint64_t flags, vdev_state_t *newstate)
+ 
+ 	if (wasoffline ||
+ 	    (oldstate < VDEV_STATE_DEGRADED &&
+-	    vd->vdev_state >= VDEV_STATE_DEGRADED))
++	    vd->vdev_state >= VDEV_STATE_DEGRADED)) {
+ 		spa_event_notify(spa, vd, NULL, ESC_ZFS_VDEV_ONLINE);
+ 
++		/*
++		 * Asynchronously detach spare vdev if resilver or
++		 * rebuild is not required
++		 */
++		if (vd->vdev_unspare &&
++		    !dsl_scan_resilvering(spa->spa_dsl_pool) &&
++		    !dsl_scan_resilver_scheduled(spa->spa_dsl_pool) &&
++		    !vdev_rebuild_active(tvd))
++			spa_async_request(spa, SPA_ASYNC_DETACH_SPARE);
++	}
+ 	return (spa_vdev_state_exit(spa, vd, 0));
+ }
+ 
+-- 
+2.39.2
+
diff -Nru zfs-linux-2.1.11/debian/patches/0020-Fix-NULL-pointer-dereference-when-doing-concurrent-s.patch zfs-linux-2.1.11/debian/patches/0020-Fix-NULL-pointer-dereference-when-doing-concurrent-s.patch
--- zfs-linux-2.1.11/debian/patches/0020-Fix-NULL-pointer-dereference-when-doing-concurrent-s.patch	1970-01-01 08:00:00.000000000 +0800
+++ zfs-linux-2.1.11/debian/patches/0020-Fix-NULL-pointer-dereference-when-doing-concurrent-s.patch	2024-11-02 15:27:19.000000000 +0800
@@ -0,0 +1,65 @@
+From 671b1af1bc4b20ddd939c2ede22748bd027d30be Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Lu=C3=ADs=20Henriques?=
+ <73643340+lumigch at users.noreply.github.com>
+Date: Tue, 30 May 2023 23:15:24 +0100
+Subject: [PATCH] Fix NULL pointer dereference when doing concurrent 'send'
+ operations
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+A NULL pointer will occur when doing a 'zfs send -S' on a dataset that
+is still being received.  The problem is that the new 'send' will
+rightfully fail to own the datasets (i.e. dsl_dataset_own_force() will
+fail), but then dmu_send() will still do the dsl_dataset_disown().
+
+Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
+Signed-off-by: Luís Henriques <henrix at camandro.org>
+Closes #14903
+Closes #14890
+---
+ module/zfs/dmu_send.c | 8 ++++++--
+ 1 file changed, 6 insertions(+), 2 deletions(-)
+
+diff --git a/module/zfs/dmu_send.c b/module/zfs/dmu_send.c
+index cd9ecc07f..0dd1ec210 100644
+--- a/module/zfs/dmu_send.c
++++ b/module/zfs/dmu_send.c
+@@ -2797,6 +2797,7 @@ dmu_send(const char *tosnap, const char *fromsnap, boolean_t embedok,
+ 			}
+ 
+ 			if (err == 0) {
++				owned = B_TRUE;
+ 				err = zap_lookup(dspp.dp->dp_meta_objset,
+ 				    dspp.to_ds->ds_object,
+ 				    DS_FIELD_RESUME_TOGUID, 8, 1,
+@@ -2810,21 +2811,24 @@ dmu_send(const char *tosnap, const char *fromsnap, boolean_t embedok,
+ 				    sizeof (dspp.saved_toname),
+ 				    dspp.saved_toname);
+ 			}
+-			if (err != 0)
++			/* Only disown if there was an error in the lookups */
++			if (owned && (err != 0))
+ 				dsl_dataset_disown(dspp.to_ds, dsflags, FTAG);
+ 
+ 			kmem_strfree(name);
+ 		} else {
+ 			err = dsl_dataset_own(dspp.dp, tosnap, dsflags,
+ 			    FTAG, &dspp.to_ds);
++			if (err == 0)
++				owned = B_TRUE;
+ 		}
+-		owned = B_TRUE;
+ 	} else {
+ 		err = dsl_dataset_hold_flags(dspp.dp, tosnap, dsflags, FTAG,
+ 		    &dspp.to_ds);
+ 	}
+ 
+ 	if (err != 0) {
++		/* Note: dsl dataset is not owned at this point */
+ 		dsl_pool_rele(dspp.dp, FTAG);
+ 		return (err);
+ 	}
+-- 
+2.39.2
+
diff -Nru zfs-linux-2.1.11/debian/patches/0021-Revert-initramfs-use-mount.zfs-instead-of-mount.patch zfs-linux-2.1.11/debian/patches/0021-Revert-initramfs-use-mount.zfs-instead-of-mount.patch
--- zfs-linux-2.1.11/debian/patches/0021-Revert-initramfs-use-mount.zfs-instead-of-mount.patch	1970-01-01 08:00:00.000000000 +0800
+++ zfs-linux-2.1.11/debian/patches/0021-Revert-initramfs-use-mount.zfs-instead-of-mount.patch	2024-11-02 15:34:23.000000000 +0800
@@ -0,0 +1,45 @@
+From 93a99c6daae6e8c126ead2bcf331e5772c966cc7 Mon Sep 17 00:00:00 2001
+From: Rich Ercolani <214141+rincebrain at users.noreply.github.com>
+Date: Wed, 31 May 2023 19:58:41 -0400
+Subject: [PATCH] Revert "initramfs: use `mount.zfs` instead of `mount`"
+
+This broke mounting of snapshots on / for users.
+
+See https://github.com/openzfs/zfs/issues/9461#issuecomment-1376162949 for more context.
+
+Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
+Signed-off-by: Rich Ercolani <rincebrain at gmail.com>
+Closes #14908
+---
+ contrib/initramfs/scripts/zfs | 6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+--- a/contrib/initramfs/scripts/zfs
++++ b/contrib/initramfs/scripts/zfs
+@@ -342,7 +342,7 @@
+ 
+ 	# Need the _original_ datasets mountpoint!
+ 	mountpoint=$(get_fs_value "$fs" mountpoint)
+-	ZFS_CMD="mount.zfs -o zfsutil"
++	ZFS_CMD="mount -o zfsutil -t zfs"
+ 	if [ "$mountpoint" = "legacy" ] || [ "$mountpoint" = "none" ]; then
+ 		# Can't use the mountpoint property. Might be one of our
+ 		# clones. Check the 'org.zol:mountpoint' property set in
+@@ -359,7 +359,7 @@
+ 			fi
+ 			# Don't use mount.zfs -o zfsutils for legacy mountpoint
+ 			if [ "$mountpoint" = "legacy" ]; then
+-				ZFS_CMD="mount.zfs"
++				ZFS_CMD="mount -t zfs"
+ 			fi
+ 			# Last hail-mary: Hope 'rootmnt' is set!
+ 			mountpoint=""
+@@ -930,7 +930,7 @@
+ 		echo "       not specified on the kernel command line."
+ 		echo ""
+ 		echo "Manually mount the root filesystem on $rootmnt and then exit."
+-		echo "Hint: Try:  mount.zfs -o zfsutil ${ZFS_RPOOL-rpool}/ROOT/system $rootmnt"
++		echo "Hint: Try:  mount -o zfsutil -t zfs ${ZFS_RPOOL-rpool}/ROOT/system $rootmnt"
+ 		shell
+ 	fi
+ 
diff -Nru zfs-linux-2.1.11/debian/patches/0022-zil-Don-t-expect-zio_shrink-to-succeed.patch zfs-linux-2.1.11/debian/patches/0022-zil-Don-t-expect-zio_shrink-to-succeed.patch
--- zfs-linux-2.1.11/debian/patches/0022-zil-Don-t-expect-zio_shrink-to-succeed.patch	1970-01-01 08:00:00.000000000 +0800
+++ zfs-linux-2.1.11/debian/patches/0022-zil-Don-t-expect-zio_shrink-to-succeed.patch	2024-11-02 15:27:19.000000000 +0800
@@ -0,0 +1,31 @@
+From b01a8cc2c0fe6ee4af05bb0b1911afcbd39da64b Mon Sep 17 00:00:00 2001
+From: Alexander Motin <mav at FreeBSD.org>
+Date: Thu, 11 May 2023 17:27:12 -0400
+Subject: [PATCH] zil: Don't expect zio_shrink() to succeed.
+
+At least for RAIDZ zio_shrink() does not reduce zio size, but reduced
+wsz in that case likely results in writing uninitialized memory.
+
+Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
+Signed-off-by:  Alexander Motin <mav at FreeBSD.org>
+Sponsored by:   iXsystems, Inc.
+Closes #14853
+---
+ module/zfs/zil.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/module/zfs/zil.c b/module/zfs/zil.c
+index 0456d3801..cca061040 100644
+--- a/module/zfs/zil.c
++++ b/module/zfs/zil.c
+@@ -1593,6 +1593,7 @@ zil_lwb_write_issue(zilog_t *zilog, lwb_t *lwb)
+ 		wsz = P2ROUNDUP_TYPED(lwb->lwb_nused, ZIL_MIN_BLKSZ, uint64_t);
+ 		ASSERT3U(wsz, <=, lwb->lwb_sz);
+ 		zio_shrink(lwb->lwb_write_zio, wsz);
++		wsz = lwb->lwb_write_zio->io_size;
+ 
+ 	} else {
+ 		wsz = lwb->lwb_sz;
+-- 
+2.39.2
+
diff -Nru zfs-linux-2.1.11/debian/patches/0027-Linux-Never-sleep-in-kmem_cache_alloc-.-KM_NOSLEEP-1.patch zfs-linux-2.1.11/debian/patches/0027-Linux-Never-sleep-in-kmem_cache_alloc-.-KM_NOSLEEP-1.patch
--- zfs-linux-2.1.11/debian/patches/0027-Linux-Never-sleep-in-kmem_cache_alloc-.-KM_NOSLEEP-1.patch	1970-01-01 08:00:00.000000000 +0800
+++ zfs-linux-2.1.11/debian/patches/0027-Linux-Never-sleep-in-kmem_cache_alloc-.-KM_NOSLEEP-1.patch	2024-11-02 15:34:23.000000000 +0800
@@ -0,0 +1,48 @@
+From 837e426c1f302e580a18a213fd216322f480caf8 Mon Sep 17 00:00:00 2001
+From: Brian Behlendorf <behlendorf1 at llnl.gov>
+Date: Wed, 7 Jun 2023 10:43:43 -0700
+Subject: [PATCH] Linux: Never sleep in kmem_cache_alloc(..., KM_NOSLEEP)
+ (#14926)
+
+When a kmem cache is exhausted and needs to be expanded a new
+slab is allocated.  KM_SLEEP callers can block and wait for the
+allocation, but KM_NOSLEEP callers were incorrectly allowed to
+block as well.
+
+Resolve this by attempting an emergency allocation as a best
+effort.  This may fail but that's fine since any KM_NOSLEEP
+consumer is required to handle an allocation failure.
+
+Signed-off-by: Brian Behlendorf <behlendorf1 at llnl.gov>
+Reviewed-by: Adam Moss <c at yotes.com>
+Reviewed-by: Brian Atkinson <batkinson at lanl.gov>
+Reviewed-by: Richard Yao <richard.yao at alumni.stonybrook.edu>
+Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
+---
+ module/os/linux/spl/spl-kmem-cache.c | 12 +++++++++++-
+ 1 file changed, 11 insertions(+), 1 deletion(-)
+
+--- a/module/os/linux/spl/spl-kmem-cache.c
++++ b/module/os/linux/spl/spl-kmem-cache.c
+@@ -1017,10 +1017,20 @@
+ 	ASSERT0(flags & ~KM_PUBLIC_MASK);
+ 	ASSERT(skc->skc_magic == SKC_MAGIC);
+ 	ASSERT((skc->skc_flags & KMC_SLAB) == 0);
+-	might_sleep();
++
+ 	*obj = NULL;
+ 
+ 	/*
++	 * Since we can't sleep attempt an emergency allocation to satisfy
+	 * the request.  The only alternative is to fail the allocation but
+	 * it's preferable to try.  The use of KM_NOSLEEP is expected to be rare.
++	 */
++	if (flags & KM_NOSLEEP)
++		return (spl_emergency_alloc(skc, flags, obj));
++
++	might_sleep();
++
++	/*
+ 	 * Before allocating a new slab wait for any reaping to complete and
+ 	 * then return so the local magazine can be rechecked for new objects.
+ 	 */
diff -Nru zfs-linux-2.1.11/debian/patches/0028-dnode_is_dirty-check-dnode-and-its-data-for-dirtines.patch zfs-linux-2.1.11/debian/patches/0028-dnode_is_dirty-check-dnode-and-its-data-for-dirtines.patch
--- zfs-linux-2.1.11/debian/patches/0028-dnode_is_dirty-check-dnode-and-its-data-for-dirtines.patch	1970-01-01 08:00:00.000000000 +0800
+++ zfs-linux-2.1.11/debian/patches/0028-dnode_is_dirty-check-dnode-and-its-data-for-dirtines.patch	2024-11-02 15:27:19.000000000 +0800
@@ -0,0 +1,93 @@
+From 77b0c6f0403b2b7d145bf6c244b6acbc757ccdc9 Mon Sep 17 00:00:00 2001
+From: Rob N <robn at despairlabs.com>
+Date: Wed, 29 Nov 2023 04:16:49 +1100
+Subject: [PATCH] dnode_is_dirty: check dnode and its data for dirtiness
+
+Over its history the dirty dnode test has been changed between
+checking for a dnode being on `os_dirty_dnodes` (`dn_dirty_link`) and
+`dn_dirty_record`.
+
+  de198f2d9 Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency
+  2531ce372 Revert "Report holes when there are only metadata changes"
+  ec4f9b8f3 Report holes when there are only metadata changes
+  454365bba Fix dirty check in dmu_offset_next()
+  66aca2473 SEEK_HOLE should not block on txg_wait_synced()
+
+Also illumos/illumos-gate at c543ec060d illumos/illumos-gate at 2bcf0248e9
+
+It turns out both are actually required.
+
+In the case of appending data to a newly created file, the dnode proper
+is dirtied (at least to change the blocksize) and dirty records are
+added.  Thus, a single logical operation is represented by separate
+dirty indicators, and must not be separated.
+
+The incorrect dirty check becomes a problem when the first block of a
+file is being appended to while another process is calling lseek to skip
+holes. There is a small window where the dnode part is undirtied while
+there are still dirty records. In this case, `lseek(fd, 0, SEEK_DATA)`
+would not know that the file is dirty, and would go to
+`dnode_next_offset()`. Since the object has no data blocks yet, it
+returns `ESRCH`, indicating no data found, which results in `ENXIO`
+being returned to `lseek()`'s caller.
+
+Since coreutils 9.2, `cp` performs sparse copies by default, that is, it
+uses `SEEK_DATA` and `SEEK_HOLE` against the source file and attempts to
+replicate the holes in the target. When it hits the bug, its initial
+search for data fails, and it goes on to call `fallocate()` to create a
+hole over the entire destination file.
+
+This has come up more recently as users upgrade their systems, getting
+OpenZFS 2.2 as well as a newer coreutils. However, this problem has been
+reproduced against 2.1, as well as on FreeBSD 13 and 14.
+
+This change simply updates the dirty check to check both types of dirty.
+If there's anything dirty at all, we immediately go to the "wait for
+sync" stage. It doesn't really matter after that; both changes are on
+disk, so the dirty fields should be correct.
+
+Sponsored-by: Klara, Inc.
+Sponsored-by: Wasabi Technology, Inc.
+Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
+Reviewed-by: Alexander Motin <mav at FreeBSD.org>
+Reviewed-by: Rich Ercolani <rincebrain at gmail.com>
+Signed-off-by: Rob Norris <rob.norris at klarasystems.com>
+Closes #15571
+Closes #15526
+---
+ module/zfs/dnode.c | 12 ++++++++++--
+ 1 file changed, 10 insertions(+), 2 deletions(-)
+
+diff --git a/module/zfs/dnode.c b/module/zfs/dnode.c
+index a9aaa4d21..efebc443a 100644
+--- a/module/zfs/dnode.c
++++ b/module/zfs/dnode.c
+@@ -1773,7 +1773,14 @@ dnode_try_claim(objset_t *os, uint64_t object, int slots)
+ }
+ 
+ /*
+- * Checks if the dnode contains any uncommitted dirty records.
++ * Checks if the dnode itself is dirty, or is carrying any uncommitted records.
++ * It is important to check both conditions, as some operations (eg appending
++ * to a file) can dirty both as a single logical unit, but they are not synced
++ * out atomically, so checking one and not the other can result in an object
++ * appearing to be clean mid-way through a commit.
++ *
++ * Do not change this lightly! If you get it wrong, dmu_offset_next() can
++ * detect a hole where there is really data, leading to silent corruption.
+  */
+ boolean_t
+ dnode_is_dirty(dnode_t *dn)
+@@ -1781,7 +1788,8 @@ dnode_is_dirty(dnode_t *dn)
+ 	mutex_enter(&dn->dn_mtx);
+ 
+ 	for (int i = 0; i < TXG_SIZE; i++) {
+-		if (multilist_link_active(&dn->dn_dirty_link[i])) {
++		if (multilist_link_active(&dn->dn_dirty_link[i]) ||
++		    !list_is_empty(&dn->dn_dirty_records[i])) {
+ 			mutex_exit(&dn->dn_mtx);
+ 			return (B_TRUE);
+ 		}
+-- 
+2.39.2
+
diff -Nru zfs-linux-2.1.11/debian/patches/0029-Zpool-can-start-allocating-from-metaslab-before-TRIM.patch zfs-linux-2.1.11/debian/patches/0029-Zpool-can-start-allocating-from-metaslab-before-TRIM.patch
--- zfs-linux-2.1.11/debian/patches/0029-Zpool-can-start-allocating-from-metaslab-before-TRIM.patch	1970-01-01 08:00:00.000000000 +0800
+++ zfs-linux-2.1.11/debian/patches/0029-Zpool-can-start-allocating-from-metaslab-before-TRIM.patch	2024-11-02 15:27:19.000000000 +0800
@@ -0,0 +1,104 @@
+From 1ca531971f176ae7b8ca440e836985ae1d7fa0ec Mon Sep 17 00:00:00 2001
+From: Jason King <jasonbking at users.noreply.github.com>
+Date: Thu, 12 Oct 2023 13:01:54 -0500
+Subject: [PATCH] Zpool can start allocating from metaslab before TRIMs have
+ completed
+
+When doing a manual TRIM on a zpool, the metaslab being TRIMmed is
+potentially re-enabled before all queued TRIM zios for that metaslab
+have completed. Since TRIM zios have the lowest priority, it is
+possible to get into a situation where allocations occur from the
+just re-enabled metaslab and cut ahead of queued TRIMs to the same
+metaslab.  If the ranges overlap, this will cause corruption.
+
+We were able to trigger this pretty consistently with a small single
+top-level vdev zpool (i.e. small number of metaslabs) with heavy
+parallel write activity while performing a manual TRIM against a
+somewhat 'slow' device (so TRIMs took a bit of time to complete).
+With the patch, we've not been able to recreate it since. It was on
+illumos, but inspection of the OpenZFS trim code looks like the
+relevant pieces are largely unchanged and so it appears it would be
+vulnerable to the same issue.
+
+Reviewed-by: Igor Kozhukhov <igor at dilos.org>
+Reviewed-by: Alexander Motin <mav at FreeBSD.org>
+Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
+Signed-off-by: Jason King <jking at racktopsystems.com>
+Illumos-issue: https://www.illumos.org/issues/15939
+Closes #15395
+---
+ module/zfs/vdev_trim.c | 28 +++++++++++++++++++---------
+ 1 file changed, 19 insertions(+), 9 deletions(-)
+
+diff --git a/module/zfs/vdev_trim.c b/module/zfs/vdev_trim.c
+index 92daed48f..c0ce2ac28 100644
+--- a/module/zfs/vdev_trim.c
++++ b/module/zfs/vdev_trim.c
+@@ -23,6 +23,7 @@
+  * Copyright (c) 2016 by Delphix. All rights reserved.
+  * Copyright (c) 2019 by Lawrence Livermore National Security, LLC.
+  * Copyright (c) 2021 Hewlett Packard Enterprise Development LP
++ * Copyright 2023 RackTop Systems, Inc.
+  */
+ 
+ #include <sys/spa.h>
+@@ -572,6 +573,7 @@ vdev_trim_ranges(trim_args_t *ta)
+ 	uint64_t extent_bytes_max = ta->trim_extent_bytes_max;
+ 	uint64_t extent_bytes_min = ta->trim_extent_bytes_min;
+ 	spa_t *spa = vd->vdev_spa;
++	int error = 0;
+ 
+ 	ta->trim_start_time = gethrtime();
+ 	ta->trim_bytes_done = 0;
+@@ -591,19 +593,32 @@ vdev_trim_ranges(trim_args_t *ta)
+ 		uint64_t writes_required = ((size - 1) / extent_bytes_max) + 1;
+ 
+ 		for (uint64_t w = 0; w < writes_required; w++) {
+-			int error;
+-
+ 			error = vdev_trim_range(ta, VDEV_LABEL_START_SIZE +
+ 			    rs_get_start(rs, ta->trim_tree) +
+ 			    (w *extent_bytes_max), MIN(size -
+ 			    (w * extent_bytes_max), extent_bytes_max));
+ 			if (error != 0) {
+-				return (error);
++				goto done;
+ 			}
+ 		}
+ 	}
+ 
+-	return (0);
++done:
++	/*
++	 * Make sure all TRIMs for this metaslab have completed before
++	 * returning. TRIM zios have lower priority over regular or syncing
++	 * zios, so all TRIM zios for this metaslab must complete before the
++	 * metaslab is re-enabled. Otherwise it's possible write zios to
++	 * this metaslab could cut ahead of still queued TRIM zios for this
++	 * metaslab causing corruption if the ranges overlap.
++	 */
++	mutex_enter(&vd->vdev_trim_io_lock);
++	while (vd->vdev_trim_inflight[0] > 0) {
++		cv_wait(&vd->vdev_trim_io_cv, &vd->vdev_trim_io_lock);
++	}
++	mutex_exit(&vd->vdev_trim_io_lock);
++
++	return (error);
+ }
+ 
+ static void
+@@ -922,11 +937,6 @@ vdev_trim_thread(void *arg)
+ 	}
+ 
+ 	spa_config_exit(spa, SCL_CONFIG, FTAG);
+-	mutex_enter(&vd->vdev_trim_io_lock);
+-	while (vd->vdev_trim_inflight[0] > 0) {
+-		cv_wait(&vd->vdev_trim_io_cv, &vd->vdev_trim_io_lock);
+-	}
+-	mutex_exit(&vd->vdev_trim_io_lock);
+ 
+ 	range_tree_destroy(ta.trim_tree);
+ 
+-- 
+2.39.2
+
diff -Nru zfs-linux-2.1.11/debian/patches/0030-libshare-nfs-pass-through-ipv6-addresses-in-bracket.patch zfs-linux-2.1.11/debian/patches/0030-libshare-nfs-pass-through-ipv6-addresses-in-bracket.patch
--- zfs-linux-2.1.11/debian/patches/0030-libshare-nfs-pass-through-ipv6-addresses-in-bracket.patch	1970-01-01 08:00:00.000000000 +0800
+++ zfs-linux-2.1.11/debian/patches/0030-libshare-nfs-pass-through-ipv6-addresses-in-bracket.patch	2024-11-02 15:34:23.000000000 +0800
@@ -0,0 +1,94 @@
+From 6cb5e1e7591da20af3a15793e022345a73e40fb7 Mon Sep 17 00:00:00 2001
+From: felixdoerre <felixdoerre at users.noreply.github.com>
+Date: Wed, 20 Oct 2021 19:40:00 +0200
+Subject: [PATCH] libshare: nfs: pass through ipv6 addresses in bracket
+ notation
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Recognize when the host part of a sharenfs attribute is an IPv6
+literal and pass that through without modification.
+
+Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
+Signed-off-by: Felix D?rre <felix at dogcraft.de>
+Closes: #11171
+Closes #11939
+Closes: #1894
+---
+--- a/lib/libshare/os/linux/nfs.c
++++ b/lib/libshare/os/linux/nfs.c
+@@ -180,8 +180,9 @@
+ {
+ 	int error;
+ 	const char *access;
+-	char *host_dup, *host, *next;
++	char *host_dup, *host, *next, *v6Literal;
+ 	nfs_host_cookie_t *udata = (nfs_host_cookie_t *)pcookie;
++	int cidr_len;
+ 
+ #ifdef DEBUG
+ 	fprintf(stderr, "foreach_nfs_host_cb: key=%s, value=%s\n", opt, value);
+@@ -204,10 +205,46 @@
+ 		host = host_dup;
+ 
+ 		do {
+-			next = strchr(host, ':');
+-			if (next != NULL) {
+-				*next = '\0';
+-				next++;
++			if (*host == '[') {
++				host++;
++				v6Literal = strchr(host, ']');
++				if (v6Literal == NULL) {
++					free(host_dup);
++					return (SA_SYNTAX_ERR);
++				}
++				if (v6Literal[1] == '\0') {
++					*v6Literal = '\0';
++					next = NULL;
++				} else if (v6Literal[1] == '/') {
++					next = strchr(v6Literal + 2, ':');
++					if (next == NULL) {
++						cidr_len =
++						    strlen(v6Literal + 1);
++						memmove(v6Literal,
++						    v6Literal + 1,
++						    cidr_len);
++						v6Literal[cidr_len] = '\0';
++					} else {
++						cidr_len = next - v6Literal - 1;
++						memmove(v6Literal,
++						    v6Literal + 1,
++						    cidr_len);
++						v6Literal[cidr_len] = '\0';
++						next++;
++					}
++				} else if (v6Literal[1] == ':') {
++					*v6Literal = '\0';
++					next = v6Literal + 2;
++				} else {
++					free(host_dup);
++					return (SA_SYNTAX_ERR);
++				}
++			} else {
++				next = strchr(host, ':');
++				if (next != NULL) {
++					*next = '\0';
++					next++;
++				}
+ 			}
+ 
+ 			error = udata->callback(udata->filename,
+--- a/man/man8/zfs.8
++++ b/man/man8/zfs.8
+@@ -545,7 +545,7 @@
+ on the
+ .Ar tank/home
+ file system:
+-.Dl # Nm zfs Cm set Sy sharenfs Ns = Ns ' Ns Ar rw Ns =@123.123.0.0/16,root= Ns Ar neo Ns ' tank/home
++.Dl # Nm zfs Cm set Sy sharenfs Ns = Ns ' Ns Ar rw Ns =@123.123.0.0/16:[::1],root= Ns Ar neo Ns ' tank/home
+ .Pp
+ If you are using DNS for host name resolution,
+ specify the fully-qualified hostname.
+
diff -Nru zfs-linux-2.1.11/debian/patches/series zfs-linux-2.1.11/debian/patches/series
--- zfs-linux-2.1.11/debian/patches/series	2023-04-19 13:37:42.000000000 +0800
+++ zfs-linux-2.1.11/debian/patches/series	2024-11-02 15:34:23.000000000 +0800
@@ -27,3 +27,12 @@
 0006-rootdelay-on-zfs-should-be-adaptive.patch
 0009-zdb-zero-pad-checksum-output.patch
 0010-zdb-zero-pad-checksum-output-follow-up.patch
+# 2.1.11+deb12u1
+0013-Fix-Detach-spare-vdev-in-case-if-resilvering-does-no.patch
+0020-Fix-NULL-pointer-dereference-when-doing-concurrent-s.patch
+0021-Revert-initramfs-use-mount.zfs-instead-of-mount.patch
+0022-zil-Don-t-expect-zio_shrink-to-succeed.patch
+0027-Linux-Never-sleep-in-kmem_cache_alloc-.-KM_NOSLEEP-1.patch
+0028-dnode_is_dirty-check-dnode-and-its-data-for-dirtines.patch
+0029-Zpool-can-start-allocating-from-metaslab-before-TRIM.patch
+0030-libshare-nfs-pass-through-ipv6-addresses-in-bracket.patch

