[pkg-go] Bug#1078205: autopkgtest-virt-podman: document how to give systemd CAP_SYS_ADMIN

Sat Aug 10 17:58:20 BST 2024

Control: reassign -1 autopkgtest
Control: retitle -1 autopkgtest-virt-podman: document how to give systemd CAP_SYS_ADMIN
Control: severity -1 wishlist
Control: forwarded -1 https://salsa.debian.org/ci-team/autopkgtest/-/merge_requests/396
Control: tags -1 + help

On Sat, 10 Aug 2024 at 11:24:51 -0400, Reinhard Tartler wrote:
> I personally find that wording a bit too strong. How about something like
> this:
> 
...
> >     However, this also introduces an additional attack surface in the
> >     kernel if malicious code tried to escape the container sandbox.

Are you thinking here of malicious code in a systemd service inside the
container trying to escape from systemd's sandboxing to be privileged
within the podman container, or are you thinking about malicious code
inside the podman container (possibly as unconfined root) escaping
from the podman container to harm the host system?

In the autopkgtest use-case, I think in general we trust (or distrust!)
everything inside the podman container equally: they're all coming from
the same apt source(s), and can execute arbitrary code as container root
via their maintainer scripts. The only reason that systemd's sandboxing
of system services matters to us is that one of the things we ideally
want to test is that the maintainer of the package under test didn't
configure overly-strict systemd sandboxing that breaks their service's
intended functionality.

> > (It might also be appropriate to add a shorthand form for that, to avoid
> > needing to use the "pass arbitrary options to podman-run" mechanism,
> > but that would need some more design to choose a suitable name for
> > that option. --trust-root-in-testbed, perhaps, if my understanding of
> > the impact of CAP_SYS_ADMIN is correct.)
> 
> I'd love to see such a shortcut, but it is not obvious to me how to name it.
> Your suggestion seems too strong to me, because there are typically still
> other security features in play, such as seccomp, selinux or apparmor.

Yes, hence my question about how dangerous it is to allow CAP_SYS_ADMIN.

One way to phrase it is: are the other security mechanisms that podman
uses (seccomp and LSMs) meant to be sufficiently strong that, if container
root can escape to the host (even with CAP_SYS_ADMIN), that would justify
a CVE in either podman or the kernel? If yes, then denying CAP_SYS_ADMIN
is just hardening, rather than being security-critical in its own right.

> Re-reading through https://github.com/systemd/systemd/issues/29860 clarifies
> that systemd has a number of additional security hardening features, such as
> `DynamicUser=`, but also things like `PrivateDevices=`, `ProtectHome=`,
> `ProtectSystem=`, `MountFlags=`, `PrivateTmp=`, `ReadWriteDirectories=`,
> `ReadOnlyDirectories=`, and `InaccessibleDirectories=`.

Yes. Some of these are orthogonal to CAP_SYS_ADMIN; some of them need
CAP_SYS_ADMIN to be effective, but are automatically disabled (with
a warning) in its absence; and some need CAP_SYS_ADMIN, and service
startup fails in its absence. It's the inconsistency between those
last two categories that initially led me to think that this could be
a systemd bug.
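
To tell which of those situations a given testbed is in, it can help to
check whether CAP_SYS_ADMIN is actually in a process's effective set.
The following is only a minimal sketch, not something autopkgtest or
systemd provides: the function name has_cap is made up for illustration,
and it assumes the usual Linux /proc/<pid>/status format, where CapEff is
a hexadecimal bitmask and CAP_SYS_ADMIN is capability number 21.

```python
CAP_SYS_ADMIN = 21  # capability number from <linux/capability.h>


def has_cap(status_text, cap_bit=CAP_SYS_ADMIN):
    """Return True if cap_bit is set in the CapEff line of the given
    /proc/<pid>/status contents (has_cap is an illustrative name)."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The field is a hexadecimal bitmask without a 0x prefix.
            mask = int(line.split()[1], 16)
            return bool(mask & (1 << cap_bit))
    raise ValueError("no CapEff line found")
```

Run as container root inside the testbed, something like
has_cap(open("/proc/self/status").read()) would then report whether the
capability is available before any service-level sandboxing is applied.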

> > If nothing is going to be done about this in systemd, and nothing can be
> > done about it in podman, then it'll probably have to end up as a
> > documentation improvement in autopkgtest-virt-podman(1).
> 
> I tend to agree.

Reassigning to autopkgtest(-virt-podman) for that.

> I personally would be comfortable running containers
> that have systemd inside with CAP_SYS_ADMIN because that is closer to
> how systemd runs on a real system. Also, podman provides other additional
> security features, such as seccomp and apparmor/selinux.

Thanks, that's a useful data point.

    smcv