[Parted-maintainers] Bug#762236: parted: [kfreebsd] use of kern.geom.debugflags is unsafe

Steven Chamberlain steven at pyro.eu.org
Fri Sep 19 20:21:36 UTC 2014


Package: parted
Version: 3.2-5
Severity: grave

Hi,

During d-i testing I found that ZFS pools sometimes suffer I/O errors
and go offline;  this causes partman-zfs to hang.

I suspect that in non-interactive situations, using parted to merely
view the partition table could crash active, mounted filesystems on that
disk.

This took *many* hours to figure out but simplifies to:

  * parted's interface is designed so that a disk is 'opened' once,
    various get/set operations can happen, changes may be committed
    and then the fd is closed;  therefore, disks are always opened
    with O_RDWR mode in case a set operation might need to make changes

  * opening a disk device O_RDWR and then closing it on FreeBSD causes
    its partition tables to be rescanned (so, a 'commit' is implied -
    this also seems to be the cause of the major slowness of partman in
    kfreebsd d-i, because it does this about a hundred times in total)

  * the CAM layer handles this as a MEDIACHANGE event;  it will DESTROY
    the /dev devices for the disk partitions, and when scanning is
    complete, CREATE them again (perhaps similar to udev?)

  * an active ZFS pool may be doing things asynchronously and try to
    read or write one of these devices when it disappears;  I/O fails
    and the disk will be marked as faulty;  if there is no redundancy,
    the entire pool will be offlined;  I/O on these devices will hang
    until the fault is cleared and the pool is manually onlined

  * parted shouldn't be *able* to open a disk with O_RDWR if there
    are mounted filesystems;  it uses kern.geom.debugflags "foot
    shooting" mode in order to do this unsafely

  * in the d-i partman use-case, "foot shooting" mode doesn't seem to
    be needed, because the disks are not active at the partitioning
    step so they can be opened O_RDWR anyway;  the partition table is
    finalised when partman-zfs configuration starts;  it is already
    assumed you cannot go back and make changes after this
    (partman-zfs is modelled on partman-lvm;  you cannot set up LVM or
    a RAID then expect to repartition the component disks)

  * in non-interactive situations, an administrator could modify
    kern.geom.debugflags before using parted on an active disk, if
    they still really want to carry out risky operations

So what I'm suggesting is that we just don't use kern.geom.debugflags
to set "foot shooting" mode in parted any more - unless someone can
give a really good reason why we do need to (and then help me find
another solution to all of the above ;).
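
For reference, "foot shooting" mode is bit 0x10 of kern.geom.debugflags
(see geom(4)).  Below is a minimal C sketch of checking whether that bit
is currently set.  The helper names are hypothetical and this is not
parted's actual code;  it assumes the sysctl value is int-sized and only
queries it when built for a FreeBSD kernel.

```c
#include <stddef.h>

#if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Bit 0x10 of kern.geom.debugflags: "allow foot shooting" in geom(4). */
#define GEOM_FOOT_SHOOTING 0x10

/* Hypothetical helper: returns 1 if the foot-shooting bit is set. */
int foot_shooting_enabled(int flags)
{
    return (flags & GEOM_FOOT_SHOOTING) != 0;
}

/*
 * Hypothetical helper: reads kern.geom.debugflags into *out.
 * Returns 0 on success, -1 if the sysctl cannot be read or the
 * code was not built for a FreeBSD kernel.
 */
int read_geom_debugflags(int *out)
{
#if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
    int flags = 0;
    size_t len = sizeof(flags);

    if (sysctlbyname("kern.geom.debugflags", &flags, &len, NULL, 0) != 0)
        return -1;
    *out = flags;
    return 0;
#else
    (void)out;  /* nothing to query on other kernels */
    return -1;
#endif
}
```

With the proposed change, parted would only ever read this value, if it
looked at it at all;  setting the bit would be left to the administrator.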

I did test this in a jessie d-i install onto ZFS.  I found only two
problems:

  * when opening O_RDWR fails, it falls back to O_RDONLY but calls
    this an 'exception' with 'warn' severity that we would need to
    inhibit (or d-i will be interrupted by an Ignore/Cancel dialog
    when partman next looks at the partition table)

  * if some partman component does try to make changes after
    partman-zfs, that will now fail with an Ignore/Cancel dialog, but
    I'd consider it a bug if something does try to do this;
    update.d/swap was the only example I found, and that is in fact due
    to a bug which I'll file a separate report for
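
To illustrate the first point, here is a rough C sketch of an open that
falls back from O_RDWR to O_RDONLY silently and just records the
outcome, instead of raising a warning at open time.  The function name
and the read_only out-parameter are hypothetical;  this is not parted's
real code path, only the shape of the behaviour being suggested.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to open the device read-write;  if that
 * fails for any reason, retry read-only without reporting anything.
 * On success returns the fd and sets *read_only to 0 or 1.
 * On failure returns -1 and leaves *read_only untouched.
 */
int open_disk_quietly(const char *path, int *read_only)
{
    int fd = open(path, O_RDWR);
    if (fd >= 0) {
        *read_only = 0;
        return fd;
    }

    /* Read-write open refused: fall back to read-only, silently. */
    fd = open(path, O_RDONLY);
    if (fd >= 0)
        *read_only = 1;
    return fd;
}
```

A caller could then check read_only only at the point where it actually
wants to commit changes, so merely viewing the partition table would
never produce an Ignore/Cancel dialog.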

Thanks for reading!

-- System Information:
Debian Release: jessie/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'stable')
Architecture: kfreebsd-amd64 (x86_64)

Kernel: kFreeBSD 9.0-2-amd64-xenhvm-ipsec
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash


