[Parted-maintainers] Bug#762236: parted: [kfreebsd] use of kern.geom.debugflags is unsafe
Steven Chamberlain
steven at pyro.eu.org
Fri Sep 19 20:21:36 UTC 2014
Package: parted
Version: 3.2-5
Severity: grave
Hi,
I found during d-i testing that ZFS pools sometimes suffer I/O errors
and go offline; this causes partman-zfs to hang.
I suspect that in non-interactive situations, merely using parted to
view the partition table could crash active, mounted filesystems on that
disk.
This took *many* hours to figure out but simplifies to:
* parted's interface is designed so that a disk is 'opened' once,
various get/set operations can happen, changes may be committed
and then the fd is closed; therefore, disks are always opened
with O_RDWR mode in case a set operation might need to make changes
* opening O_RDWR and closing a disk device on FreeBSD causes its
partition tables to be rescanned (so, a 'commit' is implied -
this also seems to be the cause of major slowness of partman in
kfreebsd d-i, because it does this 'about' a hundred times in total)
* the CAM layer handles this as a MEDIACHANGE event, it will DESTROY
/dev devices for the disk partitions, and when scanning is complete,
CREATE them again (perhaps similar to udev?)
* an active ZFS pool may be doing things asynchronously and try to
read or write one of these devices when it disappears; I/O fails
and the disk will be marked as faulty; if there is no redundancy,
the entire pool will be offlined; I/O on these devices will hang
until the fault is cleared and the pool is manually onlined
* parted shouldn't be *able* to open a disk with O_RDWR if there
are mounted filesystems; it uses kern.geom.debugflags "foot
shooting" mode in order to do this unsafely
* in the d-i partman use-case, "foot shooting" mode doesn't seem to
be needed, because the disks are not active at the partitioning
step so they can be opened O_RDWR anyway; the partition table is
finalised when partman-zfs configuration starts; it is already
assumed you cannot go back and make changes after this
(partman-zfs is modelled on partman-lvm; you cannot set up LVM or
a RAID then expect to repartition the component disks)
* in non-interactive situations, an administrator could modify
kern.geom.debugflags before using parted on an active disk, if
they still really want to carry out risky operations
So what I'm suggesting is that we just don't use kern.geom.debugflags
to set "foot shooting" mode in parted any more - unless someone can
give a really good reason why we do need to (and then help me find
another solution to all of the above ;).
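To check whether "foot shooting" mode is currently enabled on a system,
something like the following rough sketch could be used. It is written in
Python for brevity and is not parted's actual C code. The 0x10 bit is the one
geom(4) documents as "allow foot shooting"; the helper names are made up for
illustration, and read_debugflags() assumes sysctl(8) is available, so it is
only meaningful on (k)FreeBSD.

```python
import subprocess

# geom(4) documents bit 0x10 of kern.geom.debugflags as "allow foot shooting".
FOOT_SHOOTING = 0x10


def foot_shooting_enabled(flags: int) -> bool:
    """Return True if the foot-shooting bit is set in a debugflags value."""
    return bool(flags & FOOT_SHOOTING)


def read_debugflags() -> int:
    """Read kern.geom.debugflags via sysctl(8); only works on (k)FreeBSD."""
    out = subprocess.run(
        ["sysctl", "-n", "kern.geom.debugflags"],
        check=True, capture_output=True, text=True,
    ).stdout
    return int(out.strip())
```

An administrator who really does want to carry out risky operations on an
active disk could set the bit by hand first, e.g. with
"sysctl kern.geom.debugflags=16", rather than having parted set it implicitly.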
I did test this in a jessie d-i install onto ZFS. I found only two
problems:
* when opening O_RDWR fails, it falls back to O_RDONLY but reports
this as an 'exception' with 'warn' severity, which we would need to
inhibit (otherwise d-i will be interrupted by an Ignore/Cancel dialog
when partman next looks at the partition table)
* if some partman component does try to make changes after
partman-zfs, that will now fail with an Ignore/Cancel dialog, but
I'd consider it a bug if something does try to do this;
update.d/swap was the only example I found, and that is in fact due
to a bug which I'll file a separate report for
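The quiet fallback described in the first point might look roughly like the
sketch below. This is a Python illustration under stated assumptions, not
parted's implementation: open_disk() and its 'opener' parameter are
hypothetical names, and the set of errno values treated as "write access
refused" is a guess that would need checking against what the kernel actually
returns.

```python
import errno
import os


def open_disk(path, opener=os.open):
    """Open path read-write, quietly falling back to read-only.

    Returns (fd, read_only). 'opener' is a hypothetical hook for testing.
    """
    try:
        return opener(path, os.O_RDWR), False
    except OSError as exc:
        # Assumed errnos for a refused write open; anything else is re-raised.
        if exc.errno not in (errno.EPERM, errno.EACCES, errno.EROFS):
            raise
    # No warning is raised here; the caller can inspect read_only instead.
    return opener(path, os.O_RDONLY), True
```

With this shape, a caller that only views the partition table never sees a
dialog, while a caller that wants to commit changes can check read_only and
report the failure at that point.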
Thanks for reading!
-- System Information:
Debian Release: jessie/sid
APT prefers unstable
APT policy: (500, 'unstable'), (500, 'stable')
Architecture: kfreebsd-amd64 (x86_64)
Kernel: kFreeBSD 9.0-2-amd64-xenhvm-ipsec
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash