Bug#766943: systemd: server no longer gets networking after switching to systemd

Christoph Anton Mitterer calestyo at scientia.net
Mon Oct 27 05:56:19 GMT 2014


affects 766943 ifupdown
stop

Hey.

Some more information:


I finally got into the machine again and installed a cron script that
manually brings the network up and restarts ssh every 15 minutes whenever
it can't ping google.com, so testing should get a lot easier from now on.
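
For illustration, a watchdog of this kind could look roughly like the
sketch below. It is not the script actually running on the server. The
ping options, the unit name ssh.service, the --run switch and the
install path in the cron line are all assumptions made for the example.

```sh
#!/bin/sh
# Sketch of a connectivity watchdog (NOT the script used on the server).
# Assumed, not taken from the mail: the ping options, the unit name
# ssh.service, the --run switch and the install path in the cron line.
#
# Hypothetical /etc/cron.d entry, every 15 minutes:
#   */15 * * * * root /usr/local/sbin/net-watchdog --run

PROBE_HOST="${PROBE_HOST:-google.com}"   # host pinged to test connectivity
IFACE="${IFACE:-eth0}"                   # interface to bring up on failure

# Succeeds if at least one of three echo requests is answered.
network_ok() {
    ping -c 3 -W 5 "$PROBE_HOST" >/dev/null 2>&1
}

# Bring the interface up and restart ssh so that sshd can bind again.
recover() {
    ifup "$IFACE"
    systemctl restart ssh.service
}

# Run the recovery steps only when the connectivity probe fails.
check_and_recover() {
    if ! network_ok; then
        recover
    fi
}

# Only act when called with --run, so the file can be sourced for testing.
if [ "${1:-}" = "--run" ]; then
    check_and_recover
fi
```

Keeping the probe and the recovery in separate functions makes it easy
to try the logic with stubbed commands before trusting it on a remote
box.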


I also have some additional information, namely on all my systems I use
a custom /etc/systemd/system/netfilter-persistent.service (attached)
since the original one is IMHO not guaranteed to run early enough.
However, this file is unlikely to be connected to this issue, since
removing it (and using the original one again) doesn't change anything.


I did, however, find what is likely to be the issue:
After changing "allow-auto eth0" to "allow-hotplug eth0" in
/etc/network/interfaces, the system came up with networking every time
(and I did about 5-8 reboots across all my tests, including the one with
netfilter-persistent above, just to make sure that it's no coincidence).

For verification I switched back to "allow-auto eth0" and, over several
tests, it again usually didn't get networking.
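
To make the change concrete, here is a sketch of the one line that
differs between the two setups. The rest of the eth0 stanza (the iface
line and its options) is not shown in this mail and is deliberately left
out; the comments summarize interfaces(5) and are not taken from the
server's file.

```
# /etc/network/interfaces (only the line that was toggled)

# Variant 1: interface is brought up by "ifup -a" at boot.
# "allow-auto" is a synonym for "auto".
#allow-auto eth0

# Variant 2: interface is brought up when a hotplug event for the
# device is seen, i.e. independently of the boot-time "ifup -a".
allow-hotplug eth0
```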



Using allow-hotplug, however, leads to failures like these:
# systemctl -l status apache2.service 
● apache2.service - LSB: Start/stop apache2 web server
   Loaded: loaded (/etc/init.d/apache2)
   Active: failed (Result: exit-code) since Mon 2014-10-27 06:52:28 CET; 2min 15s ago
  Process: 2837 ExecStart=/etc/init.d/apache2 start (code=exited, status=1/FAILURE)

Oct 27 06:52:27 kronecker systemd[2837]: Executing: /etc/init.d/apache2 start
Oct 27 06:52:28 kronecker apache2[2837]: Starting web server: apache2(99)Cannot assign requested address: make_sock: could not bind to address [2a01:4f8:a0:4024::c:0]:80
Oct 27 06:52:28 kronecker apache2[2837]: no listening sockets available, shutting down
Oct 27 06:52:28 kronecker apache2[2837]: Unable to open logs
Oct 27 06:52:28 kronecker apache2[2837]: Action 'start' failed.
Oct 27 06:52:28 kronecker apache2[2837]: The Apache error log may have more information.
Oct 27 06:52:28 kronecker apache2[2837]: failed!
Oct 27 06:52:28 kronecker systemd[1]: apache2.service: control process exited, code=exited status=1
Oct 27 06:52:28 kronecker systemd[1]: Failed to start LSB: Start/stop apache2 web server.
Oct 27 06:52:28 kronecker systemd[1]: Unit apache2.service entered failed state.



# systemctl -l status bind9.service 
● bind9.service - BIND Domain Name Server
   Loaded: loaded (/lib/systemd/system/bind9.service; enabled)
  Drop-In: /run/systemd/generator/bind9.service.d
           └─50-insserv.conf-$named.conf
   Active: failed (Result: exit-code) since Mon 2014-10-27 06:52:23 CET; 3min 13s ago
     Docs: man:named(8)
  Process: 2161 ExecStop=/usr/sbin/rndc stop (code=exited, status=1/FAILURE)
  Process: 1937 ExecStart=/usr/sbin/named -f -u bind (code=exited, status=1/FAILURE)
 Main PID: 1937 (code=exited, status=1/FAILURE)

Oct 27 06:52:23 kronecker systemd[1]: Forked /usr/sbin/rndc as 2161
Oct 27 06:52:23 kronecker systemd[1]: bind9.service changed running -> stop
Oct 27 06:52:23 kronecker systemd[2161]: Executing: /usr/sbin/rndc stop
Oct 27 06:52:23 kronecker rndc[2161]: rndc: couldn't get address for 'localhost.': not found
Oct 27 06:52:23 kronecker systemd[1]: Child 2161 belongs to bind9.service
Oct 27 06:52:23 kronecker systemd[1]: bind9.service: control process exited, code=exited status=1
Oct 27 06:52:23 kronecker systemd[1]: bind9.service got final SIGCHLD for state stop
Oct 27 06:52:23 kronecker systemd[1]: bind9.service changed stop -> failed
Oct 27 06:52:23 kronecker systemd[1]: Unit bind9.service entered failed state.
Oct 27 06:52:23 kronecker systemd[1]: bind9.service: cgroup is empty



# systemctl -l status sks.service 
● sks.service - (null)
   Loaded: loaded (/etc/init.d/sks)
   Active: active (running) since Mon 2014-10-27 06:52:22 CET; 4min 21s ago
  Process: 1991 ExecStart=/etc/init.d/sks start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/sks.service
           ├─2043 /usr/sbin/sks db
           └─2046 /usr/sbin/sks recon

Oct 27 06:52:22 kronecker systemd[1]: sks.service changed dead -> start
Oct 27 06:52:22 kronecker systemd[1991]: Executing: /etc/init.d/sks start
Oct 27 06:52:22 kronecker sks[1991]: Starting sks daemons: sksdb.. sksrecon.. done.
Oct 27 06:52:22 kronecker systemd[1]: Child 1991 belongs to sks.service
Oct 27 06:52:22 kronecker systemd[1]: sks.service: control process exited, code=exited status=0
Oct 27 06:52:22 kronecker systemd[1]: sks.service got final SIGCHLD for state start
Oct 27 06:52:22 kronecker systemd[1]: sks.service changed start -> running
Oct 27 06:52:22 kronecker systemd[1]: Job sks.service/start finished, result=done
Oct 27 06:52:22 kronecker systemd[1]: Started (null).
Oct 27 06:52:23 kronecker sks[1991]: 2014-10-27 06:52:23 Failed to listen on 2a01:4f8:a0:4024::2:2:11370: Failure("Failure while binding socket.  Probably another socket bound to this address")





So it seems all this is strongly connected to bugs #727073 and #766291,
which I reported against ifupdown some time ago.

The former (#727073) is basically the problem that, when allow-hotplug
is used, several services (apache, bind and sks, for example) can't bind
to at least IPv4 addresses (back then my tests were with sysvinit as
init).
Andrew basically said that things are as they are, but it seems that
this issue, together with systemd, is really a showstopper.

The second (#766291) occurs with DHCP in a VM using systemd as init:
there, no interface gets configured when allow-auto is used.


So it would be great if the systemd and ifupdown experts could have a
closer look at what to do.
Right now one is in the unfortunate situation that either no networking
comes up at all, or it does come up but most services don't start
because they fail to bind.

Both of which, as I've said, are showstoppers for production servers.
For that reason I'd also suggest raising the severity of these
two/three bugs to grave or higher, so that people with apt-listbugs get
notified. After all, the combination of these issues basically causes
full system breakage, since people might not even be able to connect to
their systems anymore without larger efforts.
I just won't change it myself, since I'm too tired of these
severity-up-and-down wars.


Cheers,
Chris.
-------------- next part --------------
[Unit]
Description=netfilter persistent configuration
DefaultDependencies=no
Before=network.target network-pre.target sysinit.target
Requires=systemd-modules-load.service
After=systemd-modules-load.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/netfilter-persistent start
ExecStop=/usr/sbin/netfilter-persistent stop

[Install]
WantedBy=sysinit.target
RequiredBy=network-pre.target