Bug#1034392: tomcat9: jstack/jcmd broken for non-root users with tomcat9+jdk11 or greater

Per Lundberg per.lundberg at hibox.tv
Fri Apr 14 07:31:19 BST 2023


Package: tomcat9
Version: 9.0.43-2~deb11u6
Severity: normal
X-Debbugs-Cc: sebastian.lovdahl at hibox.tv

Hi,

We noticed while rolling out JDK 17 support for our in-house application
that the following command is "broken" (moral-martin is an LXD container
in my examples below, PID 4108 is the tomcat9 java process):

    root@moral-martin:~# lsb_release -a
    No LSB modules are available.
    Distributor ID:	Debian
    Description:	Debian GNU/Linux 11 (bullseye)
    Release:	11
    Codename:	bullseye

    root@moral-martin:~# sudo -u tomcat jstack 4108
    4108: Unable to open socket file /proc/4108/root/tmp/.java_pid4108: target process 4108 doesn't respond within 10500ms or HotSpot VM not loaded

...when all of the following conditions are met:

* tomcat9 is running from systemd, _and_
* the JDK is of version 11 or greater, _and_
* the systemd unit (/lib/systemd/system/tomcat9.service) sets
  AmbientCapabilities=CAP_NET_BIND_SERVICE (which is done by the Debian
  package)

We have spent a significant amount of time debugging this and I'll try
to do my best to summarize our findings here:

The problem is that the way jstack and similar tools work has changed
between JDK 8 and JDK 11. In JDK 8, they simply use /tmp to communicate
with the target process:
https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/master/jdk/src/solaris/classes/sun/tools/attach/LinuxVirtualMachine.java#L40-L45
and https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/master/jdk/src/solaris/classes/sun/tools/attach/LinuxVirtualMachine.java#L293

In newer JDK versions (JDK 17 as an example), the code has been made
"smarter" to support mount namespaces:
https://github.com/openjdk/jdk17u/blob/master/src/jdk.attach/linux/classes/sun/tools/attach/VirtualMachineImpl.java#L299-L302
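
As a rough illustration of the difference, the sketch below computes where
each generation of the attach tooling would look for the socket file. It is
not the JDK's code: the function name and the `use_proc_root` flag are made
up for this example, and the path layout is taken from the error message
shown earlier (`/proc/4108/root/tmp/.java_pid4108`).

```python
from pathlib import PurePosixPath


def attach_socket_path(pid: int, use_proc_root: bool) -> str:
    """Return the path where an attach tool would look for the JVM socket.

    Hypothetical helper for illustration only, not part of any JDK API.
    """
    name = f".java_pid{pid}"
    if use_proc_root:
        # Newer JDK style: go through the target's own root directory,
        # so a target in a different mount namespace can still be reached.
        return str(PurePosixPath("/proc") / str(pid) / "root" / "tmp" / name)
    # JDK 8 style: assume caller and target share the same /tmp.
    return str(PurePosixPath("/tmp") / name)


if __name__ == "__main__":
    print(attach_socket_path(4108, use_proc_root=False))
    print(attach_socket_path(4108, use_proc_root=True))
```

The second form only works if the caller is allowed to traverse
/proc/<pid>/root.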

_However_... bear with me, this is where it gets interesting: this
presumes that the calling process can access /proc/<pid>/root/tmp. When
AmbientCapabilities=CAP_NET_BIND_SERVICE is set in the systemd unit,
this is not the case:

    root@moral-martin:~# sudo -u tomcat ls -l /proc/4108/root
    ls: cannot read symbolic link '/proc/4108/root': Permission denied
    lrwxrwxrwx 1 tomcat tomcat 0 Apr 13 12:55 /proc/4108/root
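
The same check can be scripted, for example from monitoring code that wants
to know in advance whether an attach is likely to fail. This is a minimal
sketch; the function name is hypothetical, and it only tests whether the
symlink can be resolved by the current user, nothing more.

```python
import os


def can_read_proc_root(pid: int) -> bool:
    """Return True if the current user can resolve /proc/<pid>/root."""
    try:
        os.readlink(f"/proc/{pid}/root")
        return True
    except OSError:
        # Covers "Permission denied" (as in the ls output above), a PID
        # that does not exist, and systems without a /proc filesystem.
        return False


if __name__ == "__main__":
    print(can_read_proc_root(os.getpid()))
```

Run as the tomcat user against the Tomcat PID, a `False` here would match
the "Permission denied" seen with `ls`.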

We have tested this and concluded that:

1. This happens whenever _any_ capability is set in the systemd unit; it's
   not limited to CAP_NET_BIND_SERVICE. (Note: I haven't tested adding
   all possible capabilities yet; I believed I had but when writing this
   bug report I realize that my attempt at setting all of them didn't
   actually list all of them in `getpcaps pid`; will test this a bit
   more and see if it makes any difference)

2. When you remove AmbientCapabilities or set it to AmbientCapabilities=
   (empty string), jstack works correctly.
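
For reference, the empty assignment from point 2 could be applied without
editing the packaged unit by using a systemd drop-in. This is only a sketch:
the drop-in file name is an assumption, and clearing the capability means
Tomcat can no longer bind to ports below 1024 as a non-root user.

```ini
# /etc/systemd/system/tomcat9.service.d/override.conf (hypothetical file name)
[Service]
# An empty assignment resets the ambient capability set
# configured by the packaged unit.
AmbientCapabilities=
```

After adding the file, `systemctl daemon-reload` followed by
`systemctl restart tomcat9` is needed for it to take effect.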

I honestly don't know if tomcat9 is the correct package to report this
to; it can also be seen as a bug in the JDK. (We will work with the JDK
maintainers to get this reported to them as well.) Feel free to reassign
the bug report to another package.

With JDK 8, this works correctly. Some of our tooling/monitoring is
dependent on being able to connect to Tomcat (running on JDK 8 or 17) at
runtime. That's why this is important for us.

Workaround

What we have seen is that on JDK 17, running `jstack` as root works; this
will connect to the target process correctly. However, this does _not_
work on JDK 8 and doesn't seem to work properly on JDK 11 either (I
think this has been fixed upstream in JDK for more recent JDK versions,
which is why it behaves differently on JDK 17).

Our application supports both JDK 8 and 17 for now, and running `jstack`
as root *does not* work on JDK 8. Hence, having to run it as root with
our JDK 17-based installations (only) makes things unnecessarily
complex.

Conclusions

It puzzles me why setting the ambient capabilities for a process breaks
this. It's uncertain whether this is a "feature" of the kernel or of
something else. We have tried to find more details about this by studying
the systemd and dbus code to a certain extent, but have not yet been able
to find anything. If anyone reading this knows the prctl and cap_set_proc
semantics by heart, your help would be greatly appreciated.
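
One lead that may be worth checking (an assumption, not verified in this
report): proc(5) says that access to /proc/<pid>/root is governed by a
ptrace access mode check, and ptrace(2) documents that this check is denied
when the caller's capabilities are not a superset of the target's permitted
capabilities, unless the caller has CAP_SYS_PTRACE. The sketch below compares
the `CapPrm` masks from two /proc/<pid>/status files; the function names are
hypothetical and the sample values in the demo are made up for illustration.

```python
import re


def parse_caps(status_text: str) -> dict:
    """Extract the Cap* hex masks from the text of /proc/<pid>/status."""
    pattern = r"^(Cap\w+):\s*([0-9a-fA-F]+)\s*$"
    return {
        m.group(1): int(m.group(2), 16)
        for m in re.finditer(pattern, status_text, re.M)
    }


def caller_covers_target(caller: dict, target: dict) -> bool:
    """True if the caller's permitted set is a superset of the target's."""
    return (target["CapPrm"] & ~caller["CapPrm"]) == 0


if __name__ == "__main__":
    # Illustrative values only: bit 10 (0x400) is CAP_NET_BIND_SERVICE.
    target = parse_caps("CapPrm:\t0000000000000400\n")
    caller = parse_caps("CapPrm:\t0000000000000000\n")
    print(caller_covers_target(caller, target))
```

To use it on a live system, feed it the contents of /proc/<pid>/status for
the Tomcat process and for the shell that runs `jstack`.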

Best regards,
Per

-- System Information:
Debian Release: bookworm/sid
  APT prefers testing-security
  APT policy: (500, 'testing-security'), (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 6.1.0-6-amd64 (SMP w/20 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages tomcat9 depends on:
ii  lsb-base                    11.6
ii  systemd [systemd-tmpfiles]  252.6-1
ii  sysvinit-utils [lsb-base]   3.06-2
pn  tomcat9-common              <none>
ii  ucf                         3.0043+nmu1

Versions of packages tomcat9 recommends:
ii  libtcnative-1  1.2.35-1

Versions of packages tomcat9 suggests:
pn  tomcat9-admin     <none>
pn  tomcat9-docs      <none>
pn  tomcat9-examples  <none>
pn  tomcat9-user      <none>


