[Pkg-sssd-devel] Bug#946847: sssd_be: Busy loops on flaky LDAP, SIGTERM from watchdog not processed
Dominik George
natureshadow at debian.org
Mon Dec 16 14:12:20 GMT 2019
Package: sssd
Version: 2.2.2-1+b1
Severity: important
Tags: upstream
In a setup with sssd using a remote slapd for NSS and a somewhat flaky
network in between, sssd_be sometimes gets stuck in a busy loop, using
100% CPU time on one core.
Debugging showed that sssd has a watchdog meant to clean up in such cases,
but sssd_be installs a SIGTERM handler that prevents the SIGTERM sent to
the process group from being processed correctly, so sssd_be does not exit.
src/util/util_watchdog.c:
64  /* the watchdog is purposefully *not* handled by the tevent
65   * signal handler as it is meant to check if the daemon is
66   * still processing the event queue itself. A stuck process
67   * may not handle the event queue at all and thus not handle
68   * signals either */
69  static void watchdog_handler(int sig)
70  {
71
72      watchdog_detect_timeshift();
73
74      /* if a pre-defined number of ticks passed by kills itself */
75      if (__sync_add_and_fetch(&watchdog_ctx.ticks, 1) > WATCHDOG_MAX_TICKS) {
76          if (getpid() == getpgrp()) {
77              kill(-getpgrp(), SIGTERM);
78          } else {
79              _exit(1);
80          }
81      }
82  }
(NB. It seems what is described in the comment was not all that successful ;)
The signal handler is installed in src/providers/data_provider_be.c:
static void be_process_finalize(struct tevent_context *ev,
                                struct tevent_signal *se,
                                int signum,
                                int count,
                                void *siginfo,
                                void *private_data)
{
    struct be_ctx *be_ctx;

    be_ctx = talloc_get_type(private_data, struct be_ctx);
    talloc_free(be_ctx);
    orderly_shutdown(0);
}

static errno_t be_process_install_sigterm_handler(struct be_ctx *be_ctx)
{
    struct tevent_signal *sige;

    BlockSignals(false, SIGTERM);

    sige = tevent_add_signal(be_ctx->ev, be_ctx, SIGTERM, SA_SIGINFO,
                             be_process_finalize, be_ctx);
    if (sige == NULL) {
        DEBUG(SSSDBG_CRIT_FAILURE, "tevent_add_signal failed.\n");
        return ENOMEM;
    }

    return EOK;
}
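Setting a breakpoint on be_process_finalize showed that this function is
never reached, probably because libtevent only dispatches signal callbacks
from its event loop, which the busy-looping sssd_be never gets back to
(exactly the situation the watchdog comment above describes).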
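To illustrate why the callback is never reached, here is a self-contained toy model of deferred signal dispatch. It is NOT tevent's code; it assumes, as the watchdog comment suggests, that the low-level SIGTERM handler only records the signal and that the registered callback (be_process_finalize in sssd_be) runs later from the event loop. record_sigterm(), finalize_callback(), install_deferred_sigterm() and loop_once() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Toy model of deferred signal dispatch (NOT tevent's real code):
 * the async handler only records that SIGTERM arrived; the registered
 * callback runs later, when the event loop gets around to it. */
static volatile sig_atomic_t sigterm_pending = 0;
static int finalize_calls = 0;

/* Async-signal context: just set a flag and return. */
static void record_sigterm(int sig)
{
    (void)sig;
    sigterm_pending = 1;
}

/* Hypothetical stand-in for be_process_finalize(). */
static void finalize_callback(void)
{
    finalize_calls++;
}

/* Roughly mirrors be_process_install_sigterm_handler(): unblock SIGTERM
 * (cf. BlockSignals(false, SIGTERM)) and install the recording handler. */
static int install_deferred_sigterm(void)
{
    struct sigaction sa;
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGTERM);
    if (sigprocmask(SIG_UNBLOCK, &set, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = record_sigterm;
    return sigaction(SIGTERM, &sa, NULL);
}

/* One pass of a toy event loop: only here does the callback run.
 * Several SIGTERMs arriving between passes are coalesced into one call. */
static void loop_once(void)
{
    if (sigterm_pending) {
        sigterm_pending = 0;
        finalize_callback();
    }
}
```

A process stuck in a busy loop never calls loop_once(), so a SIGTERM merely sets sigterm_pending and the process keeps running, which matches the breakpoint never being hit.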
Two proposals to make the watchdog's SIGTERM effective anyway:
a) Reset the SIGTERM handler to the default before calling kill() on the
   process group in line 77 (e.g. signal(SIGTERM, SIG_DFL);)
b) Move the _exit() call in line 79 out of the else branch so that it is
   called unconditionally, in case kill() fails to terminate the process itself
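To compare the two proposals, here is a self-contained sketch (not a patch against sssd) that reproduces only the expiry branch of watchdog_handler() (lines 76-80) in a forked child that leads its own process group. Assumptions: the tevent SIGTERM handler is modelled by a handler that catches SIGTERM and returns, and the busy loop carrying on is modelled by the child exiting with a marker code. watchdog_expired(), simulate_watchdog(), the enums and CHILD_SURVIVED are hypothetical names for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum fix { FIX_NONE, FIX_A_RESET_HANDLER, FIX_B_UNCONDITIONAL_EXIT };
enum outcome { OUTCOME_KILLED_BY_SIGTERM, OUTCOME_EXITED,
               OUTCOME_SURVIVED, OUTCOME_ERROR };

#define CHILD_SURVIVED 42   /* arbitrary exit code meaning "still running" */

/* Stand-in for tevent's handler: catches SIGTERM without terminating
 * (the real one records it for the event loop). */
static void swallow_sigterm(int sig)
{
    (void)sig;
}

/* Expiry branch of watchdog_handler(), optionally with proposal a) or b). */
static void watchdog_expired(enum fix fix)
{
    if (fix == FIX_A_RESET_HANDLER) {
        signal(SIGTERM, SIG_DFL);           /* proposal a) */
    }
    if (getpid() == getpgrp()) {
        kill(-getpgrp(), SIGTERM);          /* line 77 */
        if (fix != FIX_B_UNCONDITIONAL_EXIT) {
            return;                         /* original: nothing else happens */
        }
    }
    _exit(1);                               /* line 79; unconditional with b) */
}

/* Runs the scenario in a child that leads its own process group, so
 * kill(-getpgrp(), ...) cannot reach the caller. */
static enum outcome simulate_watchdog(enum fix fix)
{
    int status;
    pid_t r;
    pid_t pid = fork();

    if (pid < 0)
        return OUTCOME_ERROR;
    if (pid == 0) {
        struct sigaction sa;
        sigset_t set;

        memset(&sa, 0, sizeof sa);
        sigemptyset(&sa.sa_mask);
        sa.sa_handler = swallow_sigterm;
        sigemptyset(&set);
        sigaddset(&set, SIGTERM);
        if (setpgid(0, 0) != 0 ||
            sigprocmask(SIG_UNBLOCK, &set, NULL) != 0 ||
            sigaction(SIGTERM, &sa, NULL) != 0)
            _exit(2);
        watchdog_expired(fix);
        _exit(CHILD_SURVIVED);   /* stands in for the busy loop carrying on */
    }
    while ((r = waitpid(pid, &status, 0)) == -1 && errno == EINTR)
        ;
    if (r != pid)
        return OUTCOME_ERROR;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM)
        return OUTCOME_KILLED_BY_SIGTERM;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 1)
        return OUTCOME_EXITED;
    if (WIFEXITED(status) && WEXITSTATUS(status) == CHILD_SURVIVED)
        return OUTCOME_SURVIVED;
    return OUTCOME_ERROR;
}
```

In this model the unpatched branch leaves the child running, a) lets the default SIGTERM action terminate it, and b) ends it via _exit(1) even though the SIGTERM was caught. The model relies on POSIX delivering an unblocked self-directed signal before kill() returns in a single-threaded process.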
We tested solution a) in gdb and it caused sssd_be to exit cleanly and
restart, as it should.
Cheers,
Nik
Analysis was sponsored by Teckids e.V. and tarent solutions GmbH.
-- System Information:
Debian Release: bullseye/sid
APT prefers testing-debug
APT policy: (500, 'testing-debug'), (500, 'testing')
Architecture: amd64 (x86_64)
Kernel: Linux 5.3.0-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8), LANGUAGE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages sssd depends on:
ii python3-sss 2.2.2-1+b1
ii sssd-ad 2.2.2-1+b1
ii sssd-common 2.2.2-1+b1
ii sssd-ipa 2.2.2-1+b1
ii sssd-krb5 2.2.2-1+b1
ii sssd-ldap 2.2.2-1+b1
ii sssd-proxy 2.2.2-1+b1
sssd recommends no packages.
sssd suggests no packages.
-- no debconf information