[pkg-go] systemd: can't start polkitd in a podman container without CAP_SYS_ADMIN

Thu Aug 8 12:28:33 BST 2024

Control: reassign -1 podman 5.0.3+ds1-5

On Thu, 08 Aug 2024 at 11:33:53 +0100, Luca Boccassi wrote:
> On Thu, 8 Aug 2024 09:20:34 +0100 Simon McVittie <smcv at debian.org>
> wrote:
> > Forwarded: https://github.com/systemd/systemd/issues/29860
> 
> The linked issue already says it all, this is an issue in podman.

My reading of the linked issue was that the clearest it got was "it's
unclear whether this is supported or unsupported", but perhaps I missed
something there.

Context for podman maintainers and debian-ci@:

autopkgtest-{build,virt}-podman has a mode where it runs a podman container
with systemd (or sysvinit) as its init system, instead of having the
"payload" command just run from a shell or a minimal init. This is
necessary if we want to test that system services like polkitd work
correctly when run by the init system, and makes autopkgtest-virt-podman
a viable replacement for autopkgtest-virt-lxc.

When running a container that has systemd as its init system, systemd
strongly recommends, but does not require, retaining CAP_SYS_ADMIN
(as per <https://systemd.io/CONTAINER_INTERFACE/>).

Some systemd hardening features (for example ProtectHostname=yes)
cannot work if CAP_SYS_ADMIN is not in the capability bounding set, in
which case systemd automatically disables those features. This weakens
the security boundary between the container's init and the container's
system services, but the services still start.

Other systemd hardening features (for example MemoryDenyWriteExecute=yes)
cannot work if CAP_SYS_ADMIN is not in the capability bounding set, and
systemd does not automatically disable them: instead, the service just
fails to start. polkitd is an example of a package that uses several of
these features.
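
To check which situation a given container is in, the capability bounding
set can be read from the CapBnd line of /proc/<pid>/status, which is a
hexadecimal bitmask; CAP_SYS_ADMIN is capability number 21. A minimal
sketch, where the helper name capbnd_has is my own invention and the
sample masks in the usage note are only illustrative:

```python
CAP_SYS_ADMIN = 21  # capability number from <linux/capability.h>


def capbnd_has(status_text, cap=CAP_SYS_ADMIN):
    """Return True if `cap` is set in the CapBnd line of a
    /proc/<pid>/status dump passed in as a string."""
    for line in status_text.splitlines():
        if line.startswith("CapBnd:"):
            # The field is a hex bitmask with one bit per capability.
            mask = int(line.split()[1], 16)
            return bool(mask & (1 << cap))
    raise ValueError("no CapBnd line found")
```

Inside the container, passing the contents of /proc/1/status to
capbnd_has() should say whether the container's init has CAP_SYS_ADMIN
in its bounding set. For example, a mask of 00000000a80425fb does not
include bit 21, while 000001ffffffffff does.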

It is possible to run systemd + polkitd inside a podman container by
running it as "podman run ... --cap-add=CAP_SYS_ADMIN", but I am unsure
whether this undermines or defeats podman's security model. A question
for the podman maintainers: what is the security impact of adding that
option? I'm guessing that the answer is one of these:

- maybe it's a sandbox escape that allows the container payload to execute
  arbitrary code as the user running podman, and this is by design, so
  reporting it as a security vulnerability would be wontfix'd?

- or maybe it weakens podman's security hardening, but is not meant to
  result in a sandbox escape because other factors (e.g. use of userns
  and/or seccomp) are meant to protect the host system, so if there was
  a sandbox escape then it would be treated as a vulnerability?

- or is there a better way to run a container that has a full systemd init
  system under podman?

My goal here is to have a way that we can run a container with systemd
init on an ordinary Debian system, certainly without introducing a
root security hole on the host system if the uid 0 of the container is
malicious or compromised, and ideally also without allowing access to the
user account that is running podman (similar to the security properties
that we'd expect from running the potentially malicious/compromised
payload inside qemu/kvm).

If the answer is something like "you can use --cap-add=CAP_SYS_ADMIN,
it isn't a security hole" then we can reassign this to autopkgtest,
as a request to make `autopkgtest-virt-podman --init` automatically add
that option.
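
As a rough illustration of what that request would amount to, here is a
sketch of building the podman command line so that the capability is only
added in init mode. This is not how autopkgtest-virt-podman is actually
implemented: the function name, the use of --rm, and /sbin/init as the
init path are all assumptions made for the example.

```python
def podman_run_argv(image, init=False, init_path="/sbin/init"):
    """Build a hypothetical `podman run` argv for a test container.

    When `init` is true, CAP_SYS_ADMIN is added and the container's
    init system is run instead of the image's default command.
    """
    argv = ["podman", "run", "--rm"]
    if init:
        # Only init-mode containers need the extra capability.
        argv.append("--cap-add=CAP_SYS_ADMIN")
    argv.append(image)
    if init:
        argv.append(init_path)
    return argv
```

The resulting list could be passed to subprocess.run(); containers that
do not run an init system would keep podman's default capability set.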

Thanks,
    smcv