[Pkg-zfsonlinux-devel] Bug#874541: zfs-dkms: Importing pool fails and hangs forever (INFO: task l2arc_feed:796 blocked for more than...)

root <sdelafond@gmail.com>
Thu Sep 7 07:05:34 UTC 2017


Package: zfs-dkms
Version: 0.6.5.11-1
Severity: important
Control: tag -1 + upstream
Control: forward -1 https://github.com/zfsonlinux/zfs/issues/6609

My system crashed hard this afternoon; I didn't get a stack trace at
all. Upon reboot, "zpool import storage" hangs forever, with the load
gradually climbing to 3. In dmesg I see a bunch of the following:

,----
| [ 1571.829354] INFO: task l2arc_feed:796 blocked for more than 120 seconds.
| [ 1571.836183]       Tainted: P          IO    4.12.0-1-amd64 #1 Debian 4.12.6-1
| [ 1571.843434] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
| [ 1571.851395] l2arc_feed      D    0   796      2 0x00000000
| [ 1571.851399] Call Trace:
| [ 1571.851406]  ? __schedule+0x3c5/0x850
| [ 1571.851409]  ? try_to_del_timer_sync+0x4d/0x80
| [ 1571.851414]  ? schedule+0x32/0x80
| [ 1571.851416]  ? schedule_preempt_disabled+0xa/0x10
| [ 1571.851418]  ? __mutex_lock.isra.4+0x2c3/0x4f0
| [ 1571.851467]  ? l2arc_feed_thread+0x1bf/0xd80 [zfs]
| [ 1571.851501]  ? l2arc_feed_thread+0x1bf/0xd80 [zfs]
| [ 1571.851505]  ? dequeue_entity+0xe4/0x460
| [ 1571.851508]  ? pick_next_task_fair+0x133/0x540
| [ 1571.851512]  ? __switch_to+0x234/0x430
| [ 1571.851521]  ? thread_generic_wrapper+0x62/0x80 [spl]
| [ 1571.851556]  ? l2arc_evict+0x370/0x370 [zfs]
| [ 1571.851563]  ? thread_generic_wrapper+0x6d/0x80 [spl]
| [ 1571.851568]  ? kthread+0xfc/0x130
| [ 1571.851574]  ? __thread_exit+0x20/0x20 [spl]
| [ 1571.851577]  ? kthread_create_on_node+0x70/0x70
| [ 1571.851580]  ? ret_from_fork+0x25/0x30
| [ 1571.851601] INFO: task zpool:12432 blocked for more than 120 seconds.
| [ 1571.858210]       Tainted: P          IO    4.12.0-1-amd64 #1 Debian 4.12.6-1
| [ 1571.865511] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
| [ 1571.873470] zpool           D    0 12432   4470 0x00000000
| [ 1571.873472] Call Trace:
| [ 1571.873476]  ? __schedule+0x3c5/0x850
| [ 1571.873533]  ? zil_claim+0x250/0x250 [zfs]
| [ 1571.873538]  ? schedule+0x32/0x80
| [ 1571.873549]  ? taskq_wait+0x7e/0xd0 [spl]
| [ 1571.873556]  ? remove_wait_queue+0x60/0x60
| [ 1571.873601]  ? dmu_objset_find_dp+0x113/0x1b0 [zfs]
| [ 1571.873656]  ? spa_load+0x18b1/0x1c80 [zfs]
| [ 1571.873662]  ? zpool_get_rewind_policy+0x74/0x1a0 [zcommon]
| [ 1571.873716]  ? spa_load_best+0x5a/0x280 [zfs]
| [ 1571.873770]  ? spa_import+0x1cb/0x710 [zfs]
| [ 1571.873825]  ? get_nvlist+0xc6/0xf0 [zfs]
| [ 1571.873880]  ? zfs_ioc_pool_import+0xf5/0x120 [zfs]
| [ 1571.873936]  ? zfsdev_ioctl+0x462/0x4f0 [zfs]
| [ 1571.873939]  ? do_vfs_ioctl+0x9f/0x600
| [ 1571.873943]  ? SyS_ioctl+0x74/0x80
| [ 1571.873946]  ? system_call_fast_compare_end+0xc/0x97
| [ 1571.873989] INFO: task dmu_objset_find:13497 blocked for more than 120 seconds.
| [ 1571.881430]       Tainted: P          IO    4.12.0-1-amd64 #1 Debian 4.12.6-1
| [ 1571.888634] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
| [ 1571.896594] dmu_objset_find D    0 13497      2 0x00000000
| [ 1571.896596] Call Trace:
| [ 1571.896600]  ? __schedule+0x3c5/0x850
| [ 1571.896602]  ? schedule+0x32/0x80
| [ 1571.896611]  ? cv_wait_common+0x115/0x130 [spl]
| [ 1571.896617]  ? remove_wait_queue+0x60/0x60
| [ 1571.896662]  ? dmu_buf_hold_array_by_dnode+0x1bc/0x4a0 [zfs]
| [ 1571.896705]  ? dmu_read+0x9f/0x1a0 [zfs]
| [ 1571.896762]  ? space_map_load+0x1c5/0x520 [zfs]
| [ 1571.896814]  ? metaslab_load+0x2e/0xc0 [zfs]
| [ 1571.896865]  ? metaslab_activate+0x82/0xb0 [zfs]
| [ 1571.896916]  ? metaslab_claim+0x192/0x480 [zfs]
| [ 1571.896971]  ? zio_dva_claim+0x1a/0x30 [zfs]
| [ 1571.897024]  ? zio_wait+0x72/0x140 [zfs]
| [ 1571.897078]  ? zil_parse+0x1ec/0x910 [zfs]
| [ 1571.897132]  ? zil_replaying+0x60/0x60 [zfs]
| [ 1571.897194]  ? zil_claim_log_block+0xa0/0xa0 [zfs]
| [ 1571.897254]  ? zil_check_log_chain+0xe0/0x190 [zfs]
| [ 1571.897299]  ? dmu_objset_find_dp_impl+0x18a/0x3c0 [zfs]
| [ 1571.897343]  ? dmu_objset_find_dp_cb+0x25/0x40 [zfs]
| [ 1571.897354]  ? taskq_thread+0x27e/0x4b0 [spl]
| [ 1571.897361]  ? wake_up_q+0x70/0x70
| [ 1571.897367]  ? kthread+0xfc/0x130
| [ 1571.897374]  ? taskq_thread_spawn+0x50/0x50 [spl]
| [ 1571.897376]  ? kthread_create_on_node+0x70/0x70
| [ 1571.897379]  ? ret_from_fork+0x25/0x30
`----
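
If more detail would help, I can also dump the kernel stacks of the
stuck tasks on demand instead of waiting for the 120-second hung-task
detector; roughly like this (assuming SysRq is enabled on this kernel,
and with an illustrative pgrep pattern):

,----
| # SysRq-w: dump all blocked (D-state) tasks to the kernel log
| echo w > /proc/sysrq-trigger
|
| # or read the kernel stack of the hung zpool process directly
| cat /proc/$(pgrep -x zpool)/stack
`----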

If I reboot and don't try to import, "zpool import" reports my pool as follows:

,----
|    pool: storage
|      id: 12616089183921587906
|   state: ONLINE
|  action: The pool can be imported using its name or numeric identifier.
|  config:
|
|         storage     ONLINE
|           mirror-0  ONLINE
|             sdb     ONLINE
|             sdd     ONLINE
|         cache
|           sdc2
|           sda2
|         logs
|           mirror-1  ONLINE
|             sda3    ONLINE
|             sdc1    ONLINE
`----

I wasn't sure whether cache partitions from a not-yet-imported pool
were supposed to show as ONLINE as well, so I tried simply removing
/dev/sd[ac]2, but that didn't change the symptoms on import.

In case it's worth mentioning, "zdb -l" does see labels on each of the
partitions making up my pool.
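
For reference, the check was along these lines (assuming the mirror
disks were given to ZFS whole, their labels live on the first
partition, hence sdb1/sdd1; adjust paths as needed):

,----
| for dev in /dev/sdb1 /dev/sdd1 /dev/sda2 /dev/sda3 /dev/sdc1 /dev/sdc2; do
|     echo "== $dev =="
|     zdb -l "$dev"    # prints the four on-disk labels of the vdev
| done
`----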

Maybe more importantly, I can import the pool fine if I pass -o
readonly=on. All the filesystems then get mounted, and it appears I can
read my data just fine, without any spurious logs in dmesg.
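
Concretely, the import that works is the following (the export
afterwards is just to release the pool before retrying a normal
import, per zpool(8)):

,----
| zpool import -o readonly=on storage
| # filesystems mount, data reads fine, dmesg stays quiet
| zpool export storage
`----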

When the pool is imported read-only, "zpool status storage" shows:

,----
|   pool: storage
|  state: ONLINE
|   scan: scrub repaired 0 in 9h23m with 0 errors on Sun Aug 13 09:47:43 2017
| config:
|
|         NAME        STATE     READ WRITE CKSUM
|         storage     ONLINE       0     0     0
|           mirror-0  ONLINE       0     0     0
|             sdb     ONLINE       0     0     0
|             sdd     ONLINE       0     0     0
|         logs
|           mirror-1  ONLINE       0     0     0
|             sda3    ONLINE       0     0     0
|             sdc1    ONLINE       0     0     0
|         cache
|           sdc2      ONLINE       0     0     0
|           sda2      ONLINE       0     0     0
|
| errors: No known data errors
`----

sdb and sdd are two identical 3 TB SATA drives, while sda and sdc are
SSDs, with sda1 being /. SMART doesn't see any problems with any of
them.
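
For completeness, the SMART check was roughly:

,----
| for d in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
|     smartctl -H "$d"    # overall health self-assessment
|     smartctl -A "$d"    # attributes: reallocated/pending sectors, etc.
| done
`----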

Short of replacing all four drives, what are my options here? If one
drive is truly going bad and causing the issue, how can I identify it?
More generally, what would cause ZFS to act like this?
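
One thing I'm tempted to try, given that the zpool stack above goes
through spa_load_best/zpool_get_rewind_policy, is a recovery-mode
(rewind) import, but I'd rather ask before discarding any
transactions; from zpool(8), untested on my side:

,----
| # dry run: report whether a rewind would succeed and what would be lost
| zpool import -Fn storage
|
| # actual recovery import: roll back to the last consistent txg
| zpool import -F storage
`----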

(Note: I'm tagging this upstream because I don't see how it could be a
pure Debian problem, but feel free to revert that.)


-- System Information:
Debian Release: stretch/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'stable'), (500, 'oldstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386, armhf

Kernel: Linux 4.12.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Init: systemd (via /run/systemd/system)

Versions of packages zfs-dkms depends on:
ii  debconf      1.5.63
ii  dkms         2.3-3
ii  lsb-release  9.20160629
ii  spl-dkms     0.6.5.11-1

Versions of packages zfs-dkms recommends:
ii  zfs-zed         0.6.5.11-1
ii  zfsutils-linux  0.6.5.11-1

zfs-dkms suggests no packages.

-- debconf information:
  zfs-dkms/stop-build-for-32bit-kernel: true
* zfs-dkms/note-incompatible-licenses:
  zfs-dkms/stop-build-for-unknown-kernel: true


