[Pkg-zfsonlinux-devel] Bug#874541: zfs-dkms: Importing pool fails and hangs forever (INFO: task l2arc_feed:796 blocked for more than...)
root
sdelafond at gmail.com
Thu Sep 7 07:05:34 UTC 2017
Package: zfs-dkms
Version: 0.6.5.11-1
Severity: important
Control: tag -1 + upstream
Control: forward -1 https://github.com/zfsonlinux/zfs/issues/6609
My system crashed hard this afternoon; I didn't get a stack trace at
all. Upon reboot, "zpool import storage" hangs forever, with the load
gradually climbing to 3. In dmesg I see a bunch of messages like the
following:
,----
| [ 1571.829354] INFO: task l2arc_feed:796 blocked for more than 120 seconds.
| [ 1571.836183] Tainted: P IO 4.12.0-1-amd64 #1 Debian 4.12.6-1
| [ 1571.843434] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
| [ 1571.851395] l2arc_feed D 0 796 2 0x00000000
| [ 1571.851399] Call Trace:
| [ 1571.851406] ? __schedule+0x3c5/0x850
| [ 1571.851409] ? try_to_del_timer_sync+0x4d/0x80
| [ 1571.851414] ? schedule+0x32/0x80
| [ 1571.851416] ? schedule_preempt_disabled+0xa/0x10
| [ 1571.851418] ? __mutex_lock.isra.4+0x2c3/0x4f0
| [ 1571.851467] ? l2arc_feed_thread+0x1bf/0xd80 [zfs]
| [ 1571.851501] ? l2arc_feed_thread+0x1bf/0xd80 [zfs]
| [ 1571.851505] ? dequeue_entity+0xe4/0x460
| [ 1571.851508] ? pick_next_task_fair+0x133/0x540
| [ 1571.851512] ? __switch_to+0x234/0x430
| [ 1571.851521] ? thread_generic_wrapper+0x62/0x80 [spl]
| [ 1571.851556] ? l2arc_evict+0x370/0x370 [zfs]
| [ 1571.851563] ? thread_generic_wrapper+0x6d/0x80 [spl]
| [ 1571.851568] ? kthread+0xfc/0x130
| [ 1571.851574] ? __thread_exit+0x20/0x20 [spl]
| [ 1571.851577] ? kthread_create_on_node+0x70/0x70
| [ 1571.851580] ? ret_from_fork+0x25/0x30
| [ 1571.851601] INFO: task zpool:12432 blocked for more than 120 seconds.
| [ 1571.858210] Tainted: P IO 4.12.0-1-amd64 #1 Debian 4.12.6-1
| [ 1571.865511] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
| [ 1571.873470] zpool D 0 12432 4470 0x00000000
| [ 1571.873472] Call Trace:
| [ 1571.873476] ? __schedule+0x3c5/0x850
| [ 1571.873533] ? zil_claim+0x250/0x250 [zfs]
| [ 1571.873538] ? schedule+0x32/0x80
| [ 1571.873549] ? taskq_wait+0x7e/0xd0 [spl]
| [ 1571.873556] ? remove_wait_queue+0x60/0x60
| [ 1571.873601] ? dmu_objset_find_dp+0x113/0x1b0 [zfs]
| [ 1571.873656] ? spa_load+0x18b1/0x1c80 [zfs]
| [ 1571.873662] ? zpool_get_rewind_policy+0x74/0x1a0 [zcommon]
| [ 1571.873716] ? spa_load_best+0x5a/0x280 [zfs]
| [ 1571.873770] ? spa_import+0x1cb/0x710 [zfs]
| [ 1571.873825] ? get_nvlist+0xc6/0xf0 [zfs]
| [ 1571.873880] ? zfs_ioc_pool_import+0xf5/0x120 [zfs]
| [ 1571.873936] ? zfsdev_ioctl+0x462/0x4f0 [zfs]
| [ 1571.873939] ? do_vfs_ioctl+0x9f/0x600
| [ 1571.873943] ? SyS_ioctl+0x74/0x80
| [ 1571.873946] ? system_call_fast_compare_end+0xc/0x97
| [ 1571.873989] INFO: task dmu_objset_find:13497 blocked for more than 120 seconds.
| [ 1571.881430] Tainted: P IO 4.12.0-1-amd64 #1 Debian 4.12.6-1
| [ 1571.888634] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
| [ 1571.896594] dmu_objset_find D 0 13497 2 0x00000000
| [ 1571.896596] Call Trace:
| [ 1571.896600] ? __schedule+0x3c5/0x850
| [ 1571.896602] ? schedule+0x32/0x80
| [ 1571.896611] ? cv_wait_common+0x115/0x130 [spl]
| [ 1571.896617] ? remove_wait_queue+0x60/0x60
| [ 1571.896662] ? dmu_buf_hold_array_by_dnode+0x1bc/0x4a0 [zfs]
| [ 1571.896705] ? dmu_read+0x9f/0x1a0 [zfs]
| [ 1571.896762] ? space_map_load+0x1c5/0x520 [zfs]
| [ 1571.896814] ? metaslab_load+0x2e/0xc0 [zfs]
| [ 1571.896865] ? metaslab_activate+0x82/0xb0 [zfs]
| [ 1571.896916] ? metaslab_claim+0x192/0x480 [zfs]
| [ 1571.896971] ? zio_dva_claim+0x1a/0x30 [zfs]
| [ 1571.897024] ? zio_wait+0x72/0x140 [zfs]
| [ 1571.897078] ? zil_parse+0x1ec/0x910 [zfs]
| [ 1571.897132] ? zil_replaying+0x60/0x60 [zfs]
| [ 1571.897194] ? zil_claim_log_block+0xa0/0xa0 [zfs]
| [ 1571.897254] ? zil_check_log_chain+0xe0/0x190 [zfs]
| [ 1571.897299] ? dmu_objset_find_dp_impl+0x18a/0x3c0 [zfs]
| [ 1571.897343] ? dmu_objset_find_dp_cb+0x25/0x40 [zfs]
| [ 1571.897354] ? taskq_thread+0x27e/0x4b0 [spl]
| [ 1571.897361] ? wake_up_q+0x70/0x70
| [ 1571.897367] ? kthread+0xfc/0x130
| [ 1571.897374] ? taskq_thread_spawn+0x50/0x50 [spl]
| [ 1571.897376] ? kthread_create_on_node+0x70/0x70
| [ 1571.897379] ? ret_from_fork+0x25/0x30
`----
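If more detail on the blocked tasks would help, I can gather it roughly
as follows (assuming SysRq is enabled in the stock Debian kernel, which
I haven't double-checked):
,----
| # Dump the stacks of all blocked (uninterruptible) tasks to dmesg:
| echo w > /proc/sysrq-trigger
|
| # Or inspect a single task, e.g. the hung zpool process (PID 12432 above):
| cat /proc/12432/stack
`----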
If I reboot and don't try to import, "zpool import" reports my pool as
follows:
,----
| pool: storage
| id: 12616089183921587906
| state: ONLINE
| action: The pool can be imported using its name or numeric identifier.
| config:
|
| storage ONLINE
| mirror-0 ONLINE
| sdb ONLINE
| sdd ONLINE
| cache
| sdc2
| sda2
| logs
| mirror-1 ONLINE
| sda3 ONLINE
| sdc1 ONLINE
`----
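For reference, the output above is from a plain "zpool import"; the -d
variant below would scan stable device names instead, on the off chance
that /dev/sdX naming is part of the problem (just an untested guess on
my part):
,----
| # List importable pools without actually importing anything:
| zpool import
|
| # Same scan, but against stable device names instead of /dev/sd*:
| zpool import -d /dev/disk/by-id
`----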
I wasn't sure whether cache partitions from a not-yet-imported pool were
supposed to show as ONLINE as well, so I tried simply removing
/dev/sd[ac]2, but that didn't change the symptoms when importing.
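(By "removing" I mean taking the SSD cache partitions out of the picture
before the import; the usual way of dropping cache devices, sketched
below, needs the pool to already be imported writable, which is exactly
what I can't get to:)
,----
| # Normal way to drop L2ARC cache devices from an *imported*, writable
| # pool -- not applicable while the import itself hangs:
| zpool remove storage sdc2 sda2
`----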
zdb -l does see labels on each of the partitions making up my pool, in
case it's worth mentioning.
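Concretely, that check was along the lines of the following (one device
shown as an example; same for the other members):
,----
| # Dump the ZFS labels of one pool member, e.g. the log partition sda3;
| # every member of the pool shows labels here.
| zdb -l /dev/sda3
`----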
Maybe more importantly, I can import the pool fine if I pass -o
readonly=on. All the filesystems then get mounted, and it appears I can
read my data just fine, without any spurious logs in dmesg.
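For completeness, the read-only import that works is simply:
,----
| # Read-only import succeeds, and all filesystems mount and are readable:
| zpool import -o readonly=on storage
`----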
When the pool is imported read-only, zpool status storage shows:
,----
| pool: storage
| state: ONLINE
| scan: scrub repaired 0 in 9h23m with 0 errors on Sun Aug 13 09:47:43 2017
| config:
|
| NAME STATE READ WRITE CKSUM
| storage ONLINE 0 0 0
| mirror-0 ONLINE 0 0 0
| sdb ONLINE 0 0 0
| sdd ONLINE 0 0 0
| logs
| mirror-1 ONLINE 0 0 0
| sda3 ONLINE 0 0 0
| sdc1 ONLINE 0 0 0
| cache
| sdc2 ONLINE 0 0 0
| sda2 ONLINE 0 0 0
|
| errors: No known data errors
`----
sdb and sdd are two identical 3 TB SATA drives, while sda and sdc are
SSDs, with sda1 being /. SMART doesn't report any problems with any of
them.
Short of replacing all four drives, what are my options here? If one
drive is indeed going bad and causing the issue, how can I identify it?
More generally, what would cause ZFS to behave like this?
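For reference, my SMART check so far was roughly the following; I
haven't run long self-tests yet, which might be a reasonable next step:
,----
| # Quick SMART health + attribute dump for each pool member (smartmontools):
| for d in sda sdb sdc sdd; do smartctl -H -A /dev/$d; done
|
| # A long self-test per drive would be more thorough (takes hours):
| # smartctl -t long /dev/sdb
`----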
(Note: I'm tagging this upstream because I don't see how it could be a
purely Debian-specific problem, but feel free to revert that.)
-- System Information:
Debian Release: stretch/sid
APT prefers unstable
APT policy: (500, 'unstable'), (500, 'stable'), (500, 'oldstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386, armhf
Kernel: Linux 4.12.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Init: systemd (via /run/systemd/system)
Versions of packages zfs-dkms depends on:
ii debconf 1.5.63
ii dkms 2.3-3
ii lsb-release 9.20160629
ii spl-dkms 0.6.5.11-1
Versions of packages zfs-dkms recommends:
ii zfs-zed 0.6.5.11-1
ii zfsutils-linux 0.6.5.11-1
zfs-dkms suggests no packages.
-- debconf information:
zfs-dkms/stop-build-for-32bit-kernel: true
* zfs-dkms/note-incompatible-licenses:
zfs-dkms/stop-build-for-unknown-kernel: true