[Pkg-swan-devel] Bug#781209: postinst execution order bug confuses systemd
Faidon Liambotis
paravoid at debian.org
Thu Mar 26 02:29:02 UTC 2015
Package: strongswan-starter
Version: 5.2.1-5
Severity: grave
strongswan-starter currently ships:
- /etc/init.d/ipsec
- /lib/systemd/system/strongswan.service
The latter contains Alias=ipsec.service and also calls the ipsec
binary with --nofork as an (implicit) Type=simple unit. This is all a
bit confusing at first but pretty sane in general, and the strongswan
rename is a nice move (and also consistent with Ubuntu).
The package's postinst, however, is buggy: it does not use
dh_installinit but calls invoke-rc.d ipsec manually. That would have been
fine, but invoke-rc.d ipsec is called *before* the
dh_systemd_enable/deb-systemd-helper bits.
This means that "invoke-rc.d ipsec start" runs before the systemd unit
is properly installed, which in turn confuses the hell out of systemd
(as, among other things, it expects a Type=simple unit), as evidenced
by the following commands run in sequence:
# apt-get install strongswan
[...]
# systemctl status strongswan
● strongswan.service - strongSwan IPsec IKEv1/IKEv2 daemon using ipsec.conf
Loaded: loaded (/lib/systemd/system/strongswan.service; enabled)
Active: active (running) since Thu 2015-03-26 00:50:42 UTC; 6min ago
CGroup: /system.slice/ipsec.service
├─5150 /usr/lib/ipsec/starter --daemon charon
└─5151 /usr/lib/ipsec/charon --use-syslog
[note how starter has been called without --nofork and there is a CGroup called
"ipsec.service", despite the unit being called "strongswan.service"]
# systemctl restart strongswan
# systemctl status strongswan
● strongswan.service - strongSwan IPsec IKEv1/IKEv2 daemon using ipsec.conf
Loaded: loaded (/lib/systemd/system/strongswan.service; enabled)
Active: inactive (dead) since Thu 2015-03-26 01:00:59 UTC; 2s ago
Process: 5783 ExecStart=/usr/sbin/ipsec start --nofork (code=exited, status=0/SUCCESS)
Main PID: 5783 (code=exited, status=0/SUCCESS)
Mar 26 01:00:59 curium systemd[1]: Started strongSwan IPsec IKEv1/IKEv2 daemon using ipsec.conf.
Mar 26 01:00:59 curium ipsec_starter[5783]: Starting strongSwan 5.2.1 IPsec [starter]...
Mar 26 01:00:59 curium ipsec_starter[5783]: charon is already running (/var/run/charon.pid exists) -- skipping daemon start
Mar 26 01:00:59 curium ipsec[5783]: Starting strongSwan 5.2.1 IPsec [starter]...
Mar 26 01:00:59 curium ipsec[5783]: charon is already running (/var/run/charon.pid exists) -- skipping daemon start
Mar 26 01:00:59 curium ipsec[5783]: starter is already running (/var/run/starter.charon.pid exists) -- no fork done
[note the inactive/dead after a restart!]
# ps aux |grep ipsec
root 5150 0.0 0.0 17144 968 ? Ss 00:50 0:00 /usr/lib/ipsec/starter --daemon charon
root 5151 0.0 0.0 1275680 5416 ? Ssl 00:50 0:00 /usr/lib/ipsec/charon --use-syslog
Those are lingering/orphan processes, unmanaged by systemd. This won't
happen every time: it's a race, but a reproducible one. I've managed to
recreate it 5 times here already on two different servers. 19 times out
of 20, no process will stay behind; ipsec won't be running at all, which
is also a bug.
The remaining 1 time, though, the service stays out of systemd's control
and remains unmanageable; systemd thinks it's dead but it really is
running. This a) is confusing to the sysadmin, b) means that reloads
will fail, c) means that a package removal won't actually stop the
daemons, and d) means that tools such as puppet will try to restart it
again and again, failing every time.
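For anyone wanting to check whether a machine has ended up in this
state, here is a minimal sketch. It assumes the pid file paths seen in
the log above and must run as root so that kill -0 can signal the
daemon; check_unmanaged is a hypothetical helper name, not part of any
package.

```shell
#!/bin/sh
# Sketch only: flag a daemon that is alive according to its pid file
# while systemd reports the unit as not active.
# check_unmanaged is a hypothetical helper, not shipped by strongswan.
check_unmanaged() {
    unit=$1
    pidfile=$2

    # Without a readable pid file there is nothing to compare against.
    if [ ! -r "$pidfile" ]; then
        echo "no pid file"
        return 0
    fi

    pid=$(cat "$pidfile")

    # kill -0 sends no signal; it only tests that the process exists
    # (and that we are allowed to signal it, hence the root assumption).
    if kill -0 "$pid" 2>/dev/null && ! systemctl is-active --quiet "$unit"; then
        echo "unmanaged: pid $pid alive, $unit not active"
        return 1
    fi

    echo "consistent"
}
```

Calling it as "check_unmanaged strongswan.service /var/run/charon.pid"
after the restart shown above should report the mismatch, if the race
was hit.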
More importantly, though, it triggers a secondary bug in systemd itself.
Continuing right from the execution path above:
# ipsec stop
Stopping strongSwan IPsec...
# grep systemd /var/log/syslog | tail -3
Mar 26 01:02:15 curium systemd[1]: Assertion 'path' failed at ../src/shared/cgroup-util.c:913, function cg_is_empty_recursive(). Aborting.
Mar 26 01:02:15 curium systemd[1]: Caught <ABRT>, dumped core as pid 6916.
Mar 26 01:02:15 curium systemd[1]: Freezing execution.
# systemctl status
^C
At that point, the system barely works; systemctl etc. are not
responding.
I'll be filing the latter separately against systemd. However,
strongswan's postinst is buggy nevertheless and creates a situation
uncommon enough to trigger this cascaded failure.
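As a rough illustration of the postinst ordering problem, a check along
these lines could flag a maintainer script whose first invoke-rc.d call
comes before its first deb-systemd-helper call. It is only a sketch:
check_postinst_order is a hypothetical helper name, and the matching is
purely textual (it skips comment lines but does not parse shell).

```shell
#!/bin/sh
# Sketch only: report whether a maintainer script calls invoke-rc.d
# before its first deb-systemd-helper call.
# check_postinst_order is a hypothetical helper, not a Debian tool.
check_postinst_order() {
    awk '
        /^[ \t]*#/ { next }                          # ignore comment lines
        /invoke-rc\.d/ && !rc        { rc = NR }     # first invoke-rc.d call
        /deb-systemd-helper/ && !dsh { dsh = NR }    # first helper call
        END {
            if (rc && dsh && rc < dsh) {
                printf "BAD: invoke-rc.d (line %d) precedes deb-systemd-helper (line %d)\n", rc, dsh
                exit 1
            }
            print "OK"
        }
    ' "$1"
}
```

On an installed system, the script to inspect would normally be
/var/lib/dpkg/info/strongswan-starter.postinst.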
Regards,
Faidon