[Pkg-sssd-devel] Bug#932080: sssd fails to start: #886483 raises from the dead

Sam Morris sam at robots.org.uk
Mon Jul 15 14:47:19 BST 2019


Control: tag -1 + patch upstream
Control: forward -1 

On Sun, 14 Jul 2019 20:19:40 +0100 Juha Jäykkä <juhaj at iki.fi> wrote:
> Second failure mode is triggered by trying the obvious: commenting out
> the whole "service" line from sssd.conf. However, now sssd fails both
> from command line and from systemd because
> 
> "sssd: SSSD couldn't load the configuration database [22]: Invalid argument."
> 
> There does not seem to be any way of disabling all non-socket services
> but if at least one non-socket service is active, systemd will time out
> trying to load the corresponding socket.

This is because 'services' is mandatory if sssd is built without
HAVE_SYSTEMD. A typo in src/external/systemd.m4 prevents HAVE_SYSTEMD
being set, which removes the code allowing 'services' to be optional.
It also removes the code that notifies systemd that sssd is ready,
which is why systemd kills it after 90 seconds and marks the unit as
failed.
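
For anyone unfamiliar with the readiness notification that goes missing
here, the following is a minimal sketch of the protocol, assuming the
documented sd_notify behaviour (a "READY=1" datagram sent to the socket
named in $NOTIFY_SOCKET). It is not sssd's code; the function name
notify_ready is hypothetical and only illustrates what a notify-type
service is expected to do once it has finished starting.

```python
import os
import socket


def notify_ready(state: bytes = b"READY=1") -> bool:
    """Send a readiness datagram to the socket named in $NOTIFY_SOCKET.

    Returns False when the variable is unset (the process was not started
    by a service manager expecting a notification), True once the
    datagram has been sent.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading '@' denotes a Linux abstract-namespace socket.
    if path.startswith("@"):
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state, path)
    return True
```

If a service never sends this message, the service manager keeps waiting
until its start timeout expires, which matches the 90-second failure
described above.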

If you run 'journalctl -u sssd-nss.service', you should see the
following error:

    Invalid option --socket-activated: unknown option

This is the reason that the socket-activated services that _aren't_
explicitly listed in 'services' fail to start.

I have fixed this in
<https://salsa.debian.org/sssd-team/sssd/merge_requests/5>.

I also removed 'services' from sssd.conf to allow nss and pam to be
socket-activated by default (each responder's socket unit checks
sssd.conf and fails to start if that responder is listed in 'services').
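
The rule in the previous paragraph can be sketched as follows. This is
only an illustration of the logic, not the check that the socket units
actually run; the function names listed_services and
socket_unit_may_start are hypothetical, and the sketch assumes sssd.conf
can be read as an ordinary INI file with a comma-separated 'services'
option in the [sssd] section.

```python
import configparser


def listed_services(conf_text: str) -> set:
    """Return the responders named in the 'services' option of [sssd]."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    # A missing section or option is treated as an empty list.
    raw = parser.get("sssd", "services", fallback="")
    return {name.strip() for name in raw.split(",") if name.strip()}


def socket_unit_may_start(conf_text: str, responder: str) -> bool:
    """A responder may be socket-activated only if it is NOT listed."""
    return responder not in listed_services(conf_text)
```

With "services = nss, pam" present, socket_unit_may_start(conf, "nss")
is False; with the line removed it is True, which is the situation the
change to the packaged sssd.conf is meant to produce.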

While debugging this I made a few observations:

Duplicate sssd_pac process: I've noticed that sssd always starts and
babysits an sssd_pac process even though it's not configured to do so.
systemd starts its own instance under sssd-pac.service; I presume
there's a race here over which instance is invoked when I finally log
in. Both processes hold open the database and the log file. I don't
know if there's potential for corruption here, but I don't like it!
According to the add_implicit_services function, this happens because I
have a domain using the IPA provider; I guess this code needs to be
updated to take into account the possibility of a socket-activated pac
responder process. I'll raise this upstream.

sssd resilience: without sssd running, it's impossible to log into the
system or use sudo to fix it. Shouldn't sssd use Restart=always to
ensure that it comes back if it crashes?

dbus crash: while upgrading sssd to version 2, and later while
installing my patched package with dpkg, dbus-daemon logged the message
"unable to reload configuration: (null)" and then segfaulted ("segfault
at 0 ip 000055a507dfd700 sp 00007ffeed3d3638 error 6 in
dbus-daemon[55a507de7000+22000]" according to the kernel). Sadly
systemd-coredump didn't record a core file. Did dbus-daemon perhaps try
to reload the config while dpkg was still writing it out?

sssd_pac crash: sssd_pac dies after about five minutes with the message
"The last reference on a connection was dropped without closing the
connection. This is a bug in an application. [...] Most likely, the
application called unref() too many times and removed a reference
belonging to libdbus, since this is a shared connection." If this
persists after a reboot I'll file a separate bug for it.

-- 
Sam Morris <sam at robots.org.uk>


