Bug#840056: shibboleth-sp2-utils: upgrade attempt of shibboleth-sp2-utils gets hung at restart of shibd service

Tue Oct 11 10:22:42 UTC 2016

"S. Banerian" <banerian at u.washington.edu> writes:

> On 10/09/2016 05:25 PM, Ferenc Wágner wrote:
>
>> "S. Banerian" <banerian at u.washington.edu> writes:
>> 
>>> On 10/07/2016 02:04 PM, Ferenc Wágner wrote:
>>>
>>>> Could you please make sure shibd isn't running
>>>> then show me the output of
>>>>
>>>> # sudo -u _shibd strace shibd -f -F
> [...]
> after some 12 hours of trying to start, failing, it finally started,
> created shibd.sock, and under a test, worked.

Was this the result of a single invocation of the above, or are you
referring to systemd repeatedly restarting it until it eventually succeeded?

>> Can you provide a full GDB backtrace (after installing
>> shibboleth-sp2-utils-dbgsym; please yell if you need precise
>> instructions).
>
> does not appear to be in stretch. so i need the instructions.

It is in a separate archive; see
https://wiki.debian.org/AutomaticDebugPackages.  But let's rule out the
simple timeout problem first.
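
For later, should the timeout turn out not to be the cause: once the
-dbgsym package is installed as described on that wiki page, a backtrace
of a hung shibd could be captured roughly as follows.  This is only a
sketch; it assumes gdb is installed, that it is run as root, and that the
process of interest is the oldest one named shibd.  The output file name
is arbitrary.

```shell
# Attach to the oldest running shibd, dump full backtraces of all
# threads, then detach. Nothing happens if no shibd process exists.
pid=$(pgrep -o -x shibd) &&
    gdb -p "$pid" -batch -ex 'thread apply all bt full' >shibd-backtrace.txt 2>&1
```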

>>> Note: prior to the upgrade, shibboleth was working.
>> 
>> Which version of shibboleth was working for you?
>
> the version just prior to this one 2.6.0+dfsg1-3+b1 on stretch.

Do you mean 2.5.6+dfsg1-2?  Your dpkg or apt logs should reveal which
version you upgraded from.
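
One possible way to dig this out of the dpkg logs is sketched below.  It
assumes the usual dpkg.log line layout (date, time, action, package, old
version, new version); the function name shib_upgrades is made up for
this example.

```shell
# Read dpkg log lines on stdin; print "date time package old -> new"
# for every upgrade of a package whose name starts with "shibboleth".
shib_upgrades() {
    awk '$3 == "upgrade" && $4 ~ /^shibboleth/ { print $1, $2, $4, $5, "->", $6 }'
}

# Rotated logs are gzip-compressed; zcat -f passes plain files through.
zcat -f /var/log/dpkg.log* 2>/dev/null | shib_upgrades
```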

>> Can you share your shibboleth2.xml?
>
> I'm a bit reluctant to provide some of the information in the
> RequestMapper sections.

If configuring a longer timeout (below) does not help, please check if
you can reproduce the issue without the sensitive parts.

> When I force a restart, systemctl restart shibd.service I get the issue
> as before, where
>
> \_ /bin/systemd-tty-ask-password-agent --watch
>
> stays there for a looong time, and is not returning, systemctl says it
> is started, but journalctl -xe gives:
>
> Oct 10 14:00:35 epics systemd[1]: shibd.service: Killing process 30980
> (shibd) with signal SIGKILL.
> Oct 10 14:00:35 epics systemd[1]: shibd.service: Main process exited,
> code=killed, status=9/KILL
> Oct 10 14:00:35 epics systemd[1]: Failed to start Shibboleth Service
> Provider Daemon.
> -- Subject: Unit shibd.service has failed

This really does not make much sense together...  And I can't see any
systemd-tty-ask-password-agent processes at all for some reason.

> there is a shibd -f -F process running, but no shibd.sock file

Are you sure that process isn't from some manual start attempt?  Also,
if you start an instance manually while systemd is still periodically
trying to restart shibd in the background, the socket may get lost.

So, first of all, tell systemd to stop shibd and wait for it:

# systemctl stop shibd

Then you should see something like:

# systemctl status shibd
[...]
   Active: inactive (dead) [...]
[...]
 Main PID: 360 (code=exited, status=0/SUCCESS)
[...]
Oct 11 11:34:39 elm systemd[1]: Stopped Shibboleth Service Provider Daemon.
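
To double-check that no stray instance (for example from an earlier
manual start) is left over at this point, something like the following
could be used.  A sketch only, assuming pgrep from procps is available:

```shell
# List any shibd processes still alive, with PID and full command line.
# No output (exit status 1) means none are left.
pgrep -a -x shibd

# The PID systemd itself tracks for the unit; 0 means no main process.
systemctl show -p MainPID shibd
```

Anything pgrep lists here is not under systemd's control and should be
stopped before going on.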

Then start it manually:

# date; sudo -u _shibd /usr/sbin/shibd -f -F

Meanwhile check /var/log/shibboleth/shibd.log for progress; the
timestamps should tell you where time was spent.
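
If the log is long, a small helper can point at the slow steps by
printing the number of seconds between consecutive entries.  This is a
rough sketch: it assumes GNU date and that each entry starts with a
"YYYY-MM-DD HH:MM:SS" timestamp (adjust if your logger configuration
differs); lines that do not start that way are skipped.  The name
log_gaps is invented for this example.

```shell
# Read log lines on stdin; for each timestamped line after the first,
# print the seconds elapsed since the previous timestamped line,
# followed by the line itself.
# Usage: log_gaps </var/log/shibboleth/shibd.log | sort -n | tail -5
log_gaps() {
    prev=
    while read -r day clock rest; do
        # Skip lines whose first two fields are not a parseable timestamp.
        now=$(date -u -d "$day $clock" +%s 2>/dev/null) || continue
        if [ -n "$prev" ]; then
            printf '%d %s %s %s\n' "$((now - prev))" "$day" "$clock" "$rest"
        fi
        prev=$now
    done
}
```

Sorting the output numerically, as in the usage comment, puts the
entries that followed the longest pauses last.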

> I'm not convinced that systemd is behaving well.

Maybe it is; it may just be that the default start timeout (90s) is too
short for your metadata setup.  Try setting a longer one, like:

# mkdir -p /etc/systemd/system/shibd.service.d
# printf '[Service]\nTimeoutStartSec=5min\n' >/etc/systemd/system/shibd.service.d/timeout.conf
# systemctl daemon-reload
# systemctl cat shibd
[you should see the result at the end of the output]

Make sure to Ctrl-C your manually started shibd process if it's still
running before starting the systemd shibd service.
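
Besides reading the unit files, the effective value can be queried from
systemd directly.  A minimal check, assuming the drop-in was picked up
by the daemon-reload:

```shell
# Print the start timeout systemd will actually apply to the unit.
systemctl show -p TimeoutStartUSec shibd
```

With the drop-in active I would expect this to report 5min instead of
the 90 second default.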

> with the attempt to perform
> systemctl restart shibd.service
> I'm now seeing the CPU at 100% and memory (but not yet swap) near 100% also.
> and no shibd.sock.

Yes, the startup phase of shibd can consume lots of resources (Dynamic
MetadataProvider can help with this).  And the default timeout changed
from 5min to 1.5min in this upgrade, which might cause your problems.
-- 
Feri