[Pkg-zfsonlinux-devel] Bug#946610: zfs-dkms: livelock between ZFS evict and writeback threads

Heitor Alves de Siqueira halves at canonical.com
Wed Dec 11 20:05:29 GMT 2019

Source: zfs-linux
Severity: normal

Dear Maintainer,

For certain ZFS workloads, we start seeing hung task timeouts in the kernel logs due to zil_commit() stalling. This is due to zfs_zget() not detecting whether a znode has been marked for deletion before attempting to access it, causing a constant "retry loop" in zfs_get_data() if that znode has been unlinked already. An example of the stack traces follows:

[72742.051703] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[72742.070429] mysqld          D    0  5713   2881 0x00000320
[72742.073220] Call Trace:
[72742.075305]  __schedule+0x24e/0x880
[72742.090436]  schedule+0x2c/0x80
[72742.090438]  schedule_preempt_disabled+0xe/0x10
[72742.090441]  __mutex_lock.isra.5+0x276/0x4e0
[72742.090547]  ? dmu_tx_destroy+0x105/0x130 [zfs]
[72742.090555]  __mutex_lock_slowpath+0x13/0x20
[72742.115374]  ? __mutex_lock_slowpath+0x13/0x20
[72742.132266]  mutex_lock+0x2f/0x40
[72742.134207]  zil_commit_impl+0x1b0/0x1b30 [zfs]
[72742.150428]  ? spl_kmem_alloc+0x115/0x180 [spl]
[72742.152622]  ? mutex_lock+0x12/0x40
[72742.154819]  ? zfs_refcount_add_many+0x9a/0x100 [zfs]
[72742.171450]  zil_commit+0xde/0x150 [zfs]
[72742.173687]  zfs_fsync+0x77/0xe0 [zfs]
[72742.175044]  zpl_fsync+0x80/0x110 [zfs]
[72742.191690]  vfs_fsync_range+0x51/0xb0
[72742.193876]  do_fsync+0x3d/0x70
[72742.195126]  SyS_fsync+0x10/0x20
[72742.211059]  do_syscall_64+0x73/0x130
[72742.214078]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

It's possible to hit this issue due to a race between the ZFS evict and writeback threads. If the z_iput task is trying to evict a znode that's currently sitting in the writeback thread, both will "livelock" each other and stall the ZIO pipeline, causing other ZFS operations (such as zil_commit) to hang indefinitely.

This has been documented and fixed upstream in PR#9583 [0]. We need to pull two fixes from upstream: the first one fixes the zfs_zget() issue in the writeback thread, while the second fixes a regression on O_TMPFILE descriptors caused by the first one.

Upstream patches:
 - Break out of zfs_zget early if unlinked znode (41e1aa2a06f8) [1]
 - Check for unlinked znodes after igrab() (0c46813805f4) [2]

[0] https://github.com/zfsonlinux/zfs/pull/9583
[1] https://github.com/zfsonlinux/zfs/commit/41e1aa2a06f8
[2] https://github.com/zfsonlinux/zfs/commit/0c46813805f4

More information about the Pkg-zfsonlinux-devel mailing list