[Pkg-systemd-maintainers] Bug#739593: Bug#739593: closed by Michael Stapelberg <stapelberg at debian.org> (Re: Bug#739593: systemd makes / shared by default)

Lennart Poettering lennart at poettering.net
Thu Feb 27 18:50:39 GMT 2014


On Thu, 27.02.14 10:36, Michael Stapelberg (stapelberg at debian.org) wrote:

> >> The bit I was missing here is that I can run "mount --make-rprivate /"
> >> *inside* the CLONE_NEWNS namespace, so that it doesn't modify the
> >> system's global state, but just what I am seeing. (Does anyone
> >> actually understand these semantics?!)
> >
> > I think I had to read sharedsubtree.txt about a dozen times before I
> > understood it, so you're not the only one left wanting better
> > documentation. :)
>
> Lennart, we are considering disabling the code in systemd which makes /
> shared by default so that we follow the kernel default.

Hmm? Why would you do that?

> I’d be interested in your comments on that, especially in the context of
> this bugreport (see http://bugs.debian.org/739593 for full history).

If you open your own mount namespace and don't want propagation, then
turn off propagation, by remounting the root dir inside the namespace
with MS_REC|MS_SLAVE or suchlike.
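
A minimal sketch of that opt-out, assuming a Linux host, a caller with CAP_SYS_ADMIN, and glibc reachable through ctypes. The function name is hypothetical; the flag values are copied from the kernel headers. With MS_SLAVE, mounts made on the host still show up in the new namespace, but nothing done in the namespace propagates back.

```python
import ctypes
import os

# Flag values from the Linux headers (<sched.h>, <sys/mount.h>).
CLONE_NEWNS = 0x00020000
MS_REC = 0x4000
MS_SLAVE = 1 << 19


def detach_from_host_propagation():
    """Unshare the mount namespace, then mark / recursively as slave.

    Hypothetical helper for illustration. Needs CAP_SYS_ADMIN and
    raises OSError if either call fails.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mount.argtypes = [
        ctypes.c_char_p,  # source (ignored for a propagation change)
        ctypes.c_char_p,  # target
        ctypes.c_char_p,  # filesystem type (ignored here)
        ctypes.c_ulong,   # mount flags
        ctypes.c_void_p,  # data (ignored here)
    ]

    # Step 1: get a private copy of the mount table for this process.
    if libc.unshare(CLONE_NEWNS) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))

    # Step 2: stop mount events from this namespace reaching the host.
    if libc.mount(None, b"/", None, MS_REC | MS_SLAVE, None) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

Calling detach_from_host_propagation() early in a program, before it mounts anything, only affects that process and its children; the host's own propagation settings stay as they were.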

We turned the default from PRIVATE to SHARED on request of the container
and security guys, since they want that if you mount something from the
host into a subdir of the container, it should just appear there,
because that's what most people would most likely expect. Or, if you use
something like pam_namespace to give users a private /tmp, they should
otherwise see all the mounts popping up/removed as normal.

The kernel default for this is unlikely to change since they argue that
it breaks compatibility, which I kinda agree with. In systemd however, we
thought we'd better pick saner defaults.

I'd strongly recommend not to patch this in Debian. First of all, you'd
break a lot of stuff when using containers, where suddenly mounts on the
host wouldn't propagate anymore to containers, or where using
pam_namespace for /tmp could not work anymore, which would certainly be
confusing. But more importantly you don't actually "fix" anything. You
just switch defaults, and with the new default your specific case might
start working, but for everybody else who changed the default, things
would still be broken. And since disassociation is a one-way street, if
you globally disassociate everything you can never reassociate things...

Or to explain this differently:

a) With the default of MS_SHARED for the root dir like systemd sets it up,
   you enable propagation to containers, and those who don't want the
   propagation can opt-out of it for their specific namespace.

   Advantage: you cover all use cases with the default setting. All
   programs will work with either MS_SHARED or MS_PRIVATE set for /.

   Disadvantage: you might need to patch a package or two to properly
   disassociate their namespace from the host by remounting the root dir
   inside of the namespace with MS_REC|MS_SLAVE as described above.

b) If you patch systemd to go back to MS_PRIVATE for the root dir, you
   disable propagation to containers, and nobody can opt-in to it anymore
   for their specific namespace. 

   Advantage: you don't have to patch those few programs which
   currently assume the root dir is MS_PRIVATE and don't disassociate
   things.

   Disadvantage: the apps are still broken for those who switch to
   MS_SHARED for /. You hence only cover the use cases where people do
   not disassociate. You break the use case where people want the
   propagation to take place.
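
To tell which of the two cases a given system is in, the propagation state of a mount can be read from /proc/self/mountinfo. The following is a rough sketch, assuming the mountinfo layout in which optional tags such as shared:N or master:N sit between the sixth field and the lone "-" separator. The function name is made up for illustration and the sample lines in the usage note are invented.

```python
def propagation_of(mountinfo_text, mount_point="/"):
    """Classify the propagation type of mount_point.

    Returns 'shared', 'slave', 'shared+slave', 'unbindable' or
    'private', or None if the mount point is not listed. If several
    mounts are stacked on the same path, the last listed one wins.
    """
    result = None
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # Field 5 (index 4) is the mount point; optional tags start at
        # index 6 and run up to the "-" separator.
        if len(fields) < 7 or "-" not in fields[6:]:
            continue
        if fields[4] != mount_point:
            continue
        tags = fields[6:fields.index("-", 6)]
        shared = any(t.startswith("shared:") for t in tags)
        slave = any(t.startswith("master:") for t in tags)
        if shared and slave:
            result = "shared+slave"
        elif shared:
            result = "shared"
        elif slave:
            result = "slave"
        elif "unbindable" in tags:
            result = "unbindable"
        else:
            # No propagation tag at all means the mount is private.
            result = "private"
    return result
```

For the running system one would pass in open("/proc/self/mountinfo").read(). A line such as "25 1 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw" is classified as shared, and the same line without the shared:1 tag as private.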

TL;DR: fix the individual processes locally to disassociate their
namespaces. Don't tape over it by making all of them disassociate by
default, breaking those which do not want to disassociate. Because after
disassociation there is no way back.

Oh, and of course, in Fedora and RHEL we'll stick to the MS_SHARED
defaults. Sooner or later we'll patch through all software that assumes
that MS_PRIVATE was the default... Hence, sooner or later we'll fix all
these things for you anyway...

Hope this makes some sense...

Lennart

-- 
Lennart Poettering, Red Hat



