[Pkg-systemd-maintainers] Bug#719945: Bug#719945: systemd: Hangs during shutdown (likely NFS-related)

Sam Morris sam at robots.org.uk
Tue Jan 28 13:28:23 GMT 2014


On Sun, Jan 26, 2014 at 10:35:29AM +0100, Michael Stapelberg wrote:
> control: tag -1 + pending
> 
> Hi Sam,
> 
> Sam Morris <sam at robots.org.uk> writes:
> > I rebuilt with the attached patch and it does the trick. I think it's
> > also the fix applied to fix
> > <https://bugzilla.redhat.com/show_bug.cgi?id=999061>.
> Thanks, I merged it:
> http://anonscm.debian.org/gitweb/?p=pkg-systemd/systemd.git;a=commitdiff;h=cf19e2b
> 
> -- 
> Best regards,
> Michael

Hm. It seems that the problem isn't fixed after all. I was fooled
because I was able to reboot a few times without the problem happening,
but I've now reproduced it with the patch applied.

I've attached a debug log of the shutdown process. You can see
ifup@eth0.service being stopped on line 8596, but home.mount isn't
unmounted until line 9612.

On line 9588 you can see that nfs-common.service is stopped before the
NFS unmount operation completes (line 9654). I'm not an NFS expert, but I
think this service should only be stopped after all NFS filesystems are
unmounted, so that the NFS server is informed that any locks held on
those filesystems are being released (and probably other things,
depending on the NFS version). This is also the ordering used in
/etc/rc6.d.

On line 9617, you can see that the NFS mount is being unmounted with a
simple '/bin/umount /home' which fails since there are still user
processes running with files open. In order to avoid potential data loss
I get the feeling that something should be killing these processes off
politely before the filesystem rug is yanked away from underneath them,
but I think that's a bug for another time. When booting with sysvinit,
/etc/init.d/umountnfs.sh uses the -f and -l options when running umount,
which at least ensures that the filesystem will be unmounted even if the
network is down. From my log it appears that systemd doesn't even start
umountnfs.service during the shutdown process. If systemd is intended to
take over its job, then the correct options should be used (-f and -l,
on any kernel version supported by systemd), and umountnfs.service
should be masked. If not, then umountnfs.service should be started
during the shutdown process. Unless you have another suggestion, I'll
give this a go and see how it works out.
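For illustration, here is a rough Python sketch of the forced, lazy
unmount behaviour described above. It is not the actual umountnfs.sh
script; the function names are hypothetical, it only looks at the "nfs"
and "nfs4" filesystem types, and it unmounts in reverse mount order so
that nested mounts go before their parents.

```python
import subprocess

# Filesystem types treated as NFS in this sketch (assumption: NFSv2/3
# appear as "nfs" and NFSv4 as "nfs4" in /proc/mounts).
NFS_TYPES = {"nfs", "nfs4"}


def nfs_mountpoints(mounts_text):
    """Return NFS mount points from /proc/mounts-style text, newest first.

    Reversing the mount order means nested mounts (e.g. /home/sub) are
    unmounted before their parents (/home). Octal escapes such as \\040
    for spaces are not decoded in this sketch.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in NFS_TYPES:
            points.append(fields[1])
    return points[::-1]


def umount_commands(mounts_text):
    # -f forces the unmount even if the NFS server is unreachable;
    # -l detaches the filesystem now and cleans up once it is no longer busy.
    return [["umount", "-f", "-l", mp] for mp in nfs_mountpoints(mounts_text)]


def unmount_nfs(mounts_path="/proc/mounts", dry_run=True):
    """Hypothetical helper: print (dry run) or run the umount commands."""
    with open(mounts_path) as f:
        commands = umount_commands(f.read())
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Keep going even if one unmount fails.
            subprocess.run(cmd, check=False)
    return commands
```

Calling unmount_nfs() only prints the commands it would run;
unmount_nfs(dry_run=False) actually runs them and needs root.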

FYI, here's a summary of how NFS mounting during boot, and unmounting
during shutdown, is handled in Debian.

By default, d-i configures network interfaces as follows:

	allow-hotplug eth0
	iface eth0 inet dhcp

This causes NFS mounts to be activated by ifup, via
/etc/network/if-up.d/mountnfs, during hotplug time, but only if all
other 'auto' interfaces have previously been brought up.

The user can also configure their network interface with 'auto' instead
of 'allow-hotplug'. In this case, NFS mounts are still mounted when ifup
for the final 'auto' interface is run, but this will instead happen
during the start of networking.service.

There is also a variable in /etc/default/rcS, ASYNCMOUNTNFS. By default
it is unset, which is equivalent to 'yes'. If it is set to 'no', then NFS
mounts are not activated as above; instead they are activated by
mountnfs.service. This service is masked in the Debian systemd package,
so I think we can say that ASYNCMOUNTNFS=no is not currently supported
with our systemd setup.
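The boot-time decision described in the last three paragraphs can be
summarised as a small Python model. This is a hypothetical sketch of the
logic as described here, not code taken from the Debian scripts; the
function name and return strings are made up for illustration.

```python
def nfs_mount_activator(asyncmountnfs, auto_ifaces, up_ifaces):
    """Hypothetical model: which mechanism activates NFS mounts at boot.

    asyncmountnfs: value of ASYNCMOUNTNFS from /etc/default/rcS, or None
                   if it is unset.
    auto_ifaces:   interfaces marked 'auto' in /etc/network/interfaces.
    up_ifaces:     interfaces that ifup has brought up so far.
    """
    # An unset variable behaves like 'yes'.
    if (asyncmountnfs or "yes") == "no":
        # Masked in the Debian systemd package, so unsupported there.
        return "mountnfs.service"
    # Otherwise /etc/network/if-up.d/mountnfs does the mounting, but only
    # once every 'auto' interface has been brought up.
    if set(auto_ifaces) <= set(up_ifaces):
        return "if-up.d/mountnfs"
    return None  # keep waiting for the remaining 'auto' interfaces
```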

Under sysvinit, unmounting at shutdown is handled by
/etc/init.d/umountnfs.sh, which runs before nfs-common, and then
rpcbind, are stopped. As noted above, umountnfs.service is not started
during shutdown under systemd.

Interfaces can also be configured with NetworkManager, which adds
another axis to the configuration space. Simple configuration of a wired
network interface should still work, but I think some work has to be
done (currently by the admin) to enable
NetworkManager-wait-online.service in order to get systemd to delay
activating the NFS mounts until NM determines that a network connection
is available.

Incidentally, NetworkManager-wait-online.service looks wrong to me; I
think it should declare Wants= and Before= on network-online.target,
since that is the name of the target documented in systemd.special(7).
However, I don't think it's actually broken with its current settings:
they just result in network.target itself being delayed until
NetworkManager-wait-online.service has started, and since the .mount
units generated by systemd-fstab-generator are After= both network.target
and network-online.target, the mounts will still be activated at the
right time. If NetworkManager-wait-online.service were changed to use
network-online.target instead, could we then enable
NetworkManager-wait-online.service by default without delaying the
startup of any services that aren't ordered After= that target (i.e.,
none in the default install)?
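To make the ordering argument concrete, here is a small Python model
(hypothetical, not systemd code) that follows After= edges transitively.
A Before=Y line in unit X is recorded as X appearing in Y's After= set,
since systemd treats the two as inverses. The model ignores Wants= and
Requires= and assumes every unit is part of the same boot transaction;
example.service is a made-up ordinary service ordered only
After=network.target.

```python
def must_start_before(after, unit):
    """Return every unit transitively ordered before `unit`.

    `after` maps a unit name to the set of units it is ordered After=.
    """
    seen = set()
    stack = list(after.get(unit, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(after.get(dep, ()))
    return seen


NM_WAIT = "NetworkManager-wait-online.service"

# Current setting: the wait-online service is ordered Before=network.target.
current = {
    "network.target": {NM_WAIT},
    "home.mount": {"network.target", "network-online.target"},
    "example.service": {"network.target"},  # hypothetical ordinary service
}

# Suggested setting: Before=network-online.target instead.
proposed = {
    "network-online.target": {NM_WAIT},
    "home.mount": {"network.target", "network-online.target"},
    "example.service": {"network.target"},
}

for name, graph in (("current", current), ("proposed", proposed)):
    print(name,
          "home.mount waits:", NM_WAIT in must_start_before(graph, "home.mount"),
          "example.service waits:",
          NM_WAIT in must_start_before(graph, "example.service"))
```

In both graphs home.mount waits for the NetworkManager service, but only
in the current one does example.service wait as well, which is the
difference the question above is about.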

As for shutting down, NetworkManager should only be stopped after remote
filesystems are unmounted. I'm not sure if this is the case already.
I've no idea how to deal with horrible cases such as when the user
reboots the system while they have mounted an NFS share via a VPN
connection that will be killed when they log out.

Anyway, there's also the question of whether the infrastructure for
dealing with NFS should be distribution-specific at all, or whether it
should all be moved into systemd. I'd love for this to be the case in
the long term, but today it's just wishful thinking. For now I'd just
like to replicate enough of the classic Debian shutdown logic to
eliminate the hang during shutdown.

Regards,

-- 

Sam Morris <https://robots.org.uk/>
3412 EA18 1277 354B 991B  C869 B219 7FDB 5EA0 1078