Bug#896152: systemd/dbus hanging on stretch updates

Christoph Berg christoph.berg at credativ.de
Fri Apr 20 12:05:41 BST 2018


Re: Simon McVittie 2018-04-20 <20180420100827.GA27046 at espresso.pseudorandom.co.uk>
> Is it possible to take a snapshot (maybe as a VM) of a
> similarly-configured machine, and reproduce this repeatedly by upgrading
> from the snapshot?

There are no machines left to try it on, unfortunately.

> > There's nothing in the logs which looks like the cause, but maybe I
> > haven't seen it among the plethora of messages. (I can provide logs if
> > it helps.)
> 
> Searching the syslog or systemd Journal for messages involving dbus
> (or systemd, or other things that use D-Bus) might be informative?

Mar 18 09:02:09 pgdgbuild systemd[1]: session-c56894.scope: Failed to add PIDs to scope's control group: No such process
Mar 18 09:02:09 pgdgbuild systemd[1]: Failed to start Session c56894 of user root.
Mar 18 09:02:09 pgdgbuild systemd[1]: session-c56894.scope: Unit entered failed state.
... and this message block has been repeating since then.

Apr 20 09:37:03 pgdgbuild systemd[1]: Failed to start Session c130815 of user root.
Apr 20 09:37:03 pgdgbuild systemd[1]: session-c130815.scope: Unit entered failed state.
Apr 20 09:37:09 pgdgbuild systemd[1]: approx.socket: Failed to queue service startup job (Maybe the service file is missing or not a template unit?): Argument list too long
Apr 20 09:37:09 pgdgbuild systemd[1]: approx.socket: Unit entered failed state.
Apr 20 09:38:22 pgdgbuild systemd[1]: Failed to set up mount unit: Argument list too long
Apr 20 09:38:26 pgdgbuild systemd[1]: Failed to set up mount unit: Argument list too long
Apr 20 09:38:26 pgdgbuild systemd[1]: Failed to set up mount unit: Argument list too long
Apr 20 09:38:26 pgdgbuild systemd[1]: Failed to set up mount unit: Argument list too long
Apr 20 09:38:45 pgdgbuild systemd[1]: Failed to set up mount unit: Argument list too long
Apr 20 09:39:23 pgdgbuild systemd[1]: Reloading.
Apr 20 10:25:20 pgdgbuild systemd[1]: Failed to reload: Argument list too long
Apr 20 10:25:22 pgdgbuild systemd[1]: Failed to send queued message: Transport endpoint is not connected
Apr 20 10:25:22 pgdgbuild systemd[1]: Starting Daily apt download activities...
Apr 20 10:25:23 pgdgbuild systemd[1]: Started Daily apt download activities.
Apr 20 10:25:23 pgdgbuild systemd[1]: apt-daily.timer: Adding 11h 25min 42.050196s random time.
Apr 20 10:25:23 pgdgbuild systemd[1]: apt-daily.timer: Adding 3h 1min 156.224ms random time.
Apr 20 10:43:41 pgdgbuild systemd[1]: Reexecuting.
Apr 20 10:43:55 pgdgbuild systemd[1]: systemd 232 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN)
Apr 20 10:43:55 pgdgbuild systemd[1]: Detected virtualization vmware.
Apr 20 10:43:55 pgdgbuild systemd[1]: Detected architecture x86-64.
Apr 20 11:24:09 pgdgbuild systemd-modules-load[256]: Inserted module 'loop'
Apr 20 11:24:09 pgdgbuild systemd[1]: Starting Flush Journal to Persistent Storage...
Apr 20 11:24:09 pgdgbuild systemd[1]: Started Flush Journal to Persistent Storage.
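
For anyone sifting through a similar log, here is a rough sketch for counting
the two recurring failure patterns from the excerpt above. The function name
is hypothetical, and it assumes syslog-style lines on stdin. "Argument list
too long" is the error text for E2BIG, hence the label.

```shell
#!/bin/sh
# Hypothetical helper: count the two failure patterns seen in the excerpt.
# Reads syslog-formatted lines on stdin and prints one summary line.
summarize_systemd_failures() {
    awk '
        /systemd\[1\]: .*Argument list too long/ { e2big++ }
        /systemd\[1\]: Failed to start Session/  { sess++ }
        END { printf "e2big=%d failed_sessions=%d\n", e2big, sess }
    '
}

# Usage (the log path is an assumption and depends on the local setup):
#   summarize_systemd_failures < /var/log/syslog
#   journalctl -b _PID=1 | summarize_systemd_failures
```

This only counts lines logged by PID 1. It says nothing about the cause, but
it makes it easier to see when the messages started to pile up.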

> Is the dbus-daemon responsive (before starting the upgrade) in the
> sense that you can connect to it? One easy way to try this is to run

I believe it (systemctl) was, but I can't say for sure.

> "dbus-monitor --system". If you see "dbus-monitor: unable to enable
> new-style monitoring: org.freedesktop.DBus.Error.AccessDenied"
> as non-root, or if you see it logging a NameAcquired message, then
> everything is working as it should; use Ctrl+C to exit.

> You might be able to get some useful information from commands like these
> (as root):
> 
> ls -l /proc/$(pgrep -f "dbus-daemon --system")/fd
> cat /proc/$(pgrep -f "dbus-daemon --system")/status
> dbus-send --system --dest=org.freedesktop.DBus --print-reply /org/freedesktop/DBus org.freedesktop.DBus.Debug.Stats.GetStats
> dbus-send --system --dest=org.freedesktop.DBus --print-reply /org/freedesktop/DBus org.freedesktop.DBus.Debug.Stats.GetConnectionStats string:org.freedesktop.systemd1

I'll try that next time, thanks for the commands.
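
A possible wrapper around the /proc part of those commands, as a minimal
sketch. The function name is made up for illustration, and it assumes Linux
with /proc mounted. It takes a single PID so that a pgrep pattern matching
more than one process does not end up in the path.

```shell
#!/bin/sh
# Hypothetical helper: print the open-fd count and a few status fields
# for one PID. Linux-only; reads /proc directly.
inspect_pid() {
    pid=$1
    if [ ! -d "/proc/$pid" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # Reading another user's fd directory needs root; count is 0 if unreadable.
    echo "fds=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')"
    grep -E '^(Name|State|PPid|Threads):' "/proc/$pid/status"
}

# Usage (as root). pgrep -o selects the oldest matching process:
#   inspect_pid "$(pgrep -o -f 'dbus-daemon --system')"
```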

> > xymon      360  0.0  0.0      0     0 ?        Z    10:42   0:00 [sh] <defunct>
> > sshd       470  0.0  0.0      0     0 ?        Z    10:44   0:00 [sshd] <defunct>
> 
> I wonder whether these zombies are relevant?

On the other system where I just saw this problem, systemd didn't
reap the dbus zombie after I had killed it, so this is probably the
same systemd (?) issue.
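
To check whether unreaped zombies are children of PID 1, something like the
following could help. It is only a sketch. The function name is hypothetical,
and it expects headerless ps output with the columns pid, ppid, stat and comm
on stdin.

```shell
#!/bin/sh
# Hypothetical helper: print "PID PPID COMMAND" for every process whose
# state starts with Z (zombie). A PPID of 1 means init/systemd is the
# parent that should have reaped it.
list_zombies() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# Usage (separate -o options keep the empty headers unambiguous):
#   ps -e -o pid= -o ppid= -o stat= -o comm= | list_zombies
```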

Sorry for the diffuse report - I know too little about dbus and the
ecosystem around it.

Christoph
-- 
Senior Consultant, Tel.: +49 2166 9901 187
credativ GmbH, HRB Mönchengladbach 12080, VAT ID number: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970  87C6 4C5A 6BAB 12D2 A7AE




More information about the Pkg-systemd-maintainers mailing list