[Pkg-systemd-maintainers] Bug#719945: systemd: Hangs during shutdown (likely NFS-related)

Michael Biebl biebl at debian.org
Tue Jan 28 16:56:24 GMT 2014


On 28.01.2014 14:28, Sam Morris wrote:
> On Sun, Jan 26, 2014 at 10:35:29AM +0100, Michael Stapelberg wrote:
>> control: tag -1 + pending
>>
>> Hi Sam,
>>
>> Sam Morris <sam at robots.org.uk> writes:
>>> I rebuilt with the attached patch and it does the trick. I think it's
>>> also the fix applied to fix
>>> <https://bugzilla.redhat.com/show_bug.cgi?id=999061>.
>> Thanks, I merged it:
>> http://anonscm.debian.org/gitweb/?p=pkg-systemd/systemd.git;a=commitdiff;h=cf19e2b
>>
>> -- 
>> Best regards,
>> Michael
> 
> Hm. It seems that the problem isn't fixed after all. I was fooled
> because I was able to reboot a few times without the problem happening,
> but I've now reproduced it with the patch applied.
> 
> I've attached a debug log of the shutdown process. You can see
> ifup@eth0.service being stopped on line 8596, but home.mount isn't
> unmounted until line 9612.
> 
> On line 9588 you can see that nfs-common.service is stopped before the
> NFS unmount operation completes (line 9654). I'm not an NFS expert but I
> think this service should only be stopped after all NFS filesystems are
> unmounted, so that the NFS server is informed of any locks being
> released on the filesystem (and probably other things on different NFS
> versions). This is the ordering in /etc/rc6.d as well.

Can you attach the output of
systemctl show nfs-common.service ifup@eth0.service your-nfs-mount.mount
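
For comparing the units, it can help to reduce that output to just the ordering properties. The following is a minimal sketch, assuming `systemctl show` prints one KEY=VALUE block per unit with a blank line between blocks. `parse_show`, `ordering_summary`, and the `SAMPLE` text are hypothetical and made up for illustration; the values are not taken from the reporter's system.

```python
# Sketch: pull the ordering-related properties out of `systemctl show` text.
# SAMPLE is invented for illustration; real output has many more properties.

LIST_KEYS = ("After", "Before", "Conflicts")


def parse_show(text):
    """Return {unit Id: {property: value}} from KEY=VALUE blocks."""
    units = {}
    for block in text.strip().split("\n\n"):
        props = {}
        for line in block.splitlines():
            key, sep, value = line.partition("=")
            if sep:  # skip lines that are not KEY=VALUE
                props[key] = value
        if "Id" in props:
            units[props["Id"]] = props
    return units


def ordering_summary(units):
    """Keep only After=/Before=/Conflicts=, split into lists of unit names."""
    return {
        name: {key: props.get(key, "").split() for key in LIST_KEYS}
        for name, props in units.items()
    }


SAMPLE = """\
Id=nfs-common.service
After=rpcbind.service
Before=shutdown.target
Conflicts=shutdown.target
DefaultDependencies=yes

Id=home.mount
After=network.target network-online.target
Before=umount.target remote-fs.target
Conflicts=umount.target
DefaultDependencies=yes
"""

if __name__ == "__main__":
    for unit, deps in ordering_summary(parse_show(SAMPLE)).items():
        print(unit, deps)
```

If the mount unit does not list nfs-common.service in After=, nothing forces the mount to be unmounted before that service is stopped.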

> On line 9617, you can see that the NFS mount is being unmounted with a
> simple '/bin/umount /home' which fails since there are still user
> processes running with files open. In order to avoid potential data loss
> I get the feeling that something should be killing these processes off
> politely before the filesystem rug is yanked away from underneath them,
> but I think that's a bug for another time. When booting with sysvinit,
> /etc/init.d/umountnfs.sh uses the -f and -l options when running umount,
> which at least ensures that the filesystem will be unmounted even if the
> network is down. From my log it appears that systemd doesn't even start
> this service during the shutdown process. If it's intended that systemd
> takes over its job, then the correct options should be used (-f and -l,
> on any kernel version supported by systemd), and the service should be
> masked. If not, then umountnfs.service should be started during the
> shutdown process. Unless you have another suggestion, I'll give this a
> go and see how it works out.
> 
> FYI, here's a summary of how NFS mounting during boot, and unmounting
> during shutdown, is handled in Debian.
> 
> By default, d-i configures network interfaces as follows:
> 
> 	allow-hotplug eth0
> 	iface eth0 inet dhcp
> 
> This causes NFS mounts to be activated by ifup, via
> /etc/network/if-up.d/mountnfs, during hotplug time, but only if all
> other 'auto' interfaces have previously been brought up.
> 
> The user can also configure their network interface with 'auto' instead
> of 'allow-hotplug'. In this case, NFS mounts are still mounted when ifup
> for the final 'auto' interface is run, but this will instead happen
> during the start of networking.service.
> 
> There's also an /etc/default/rcS variable,
> ASYNCMOUNTNFS. By default this is unset, corresponding to 'yes'. If set
> to 'no', then NFS mounts are not activated as above; instead they are
> activated by mountnfs.service. This service is masked in the Debian
> systemd package, so I think we can say that ASYNCMOUNTNFS=no is not
> currently supported with our systemd setup.
> 
> Under sysvinit, unmounting at shutdown is handled by
> /etc/init.d/umountnfs.sh, which runs before nfs-common, and then
> rpcbind, are stopped. As noted above, umountnfs.service is not started
> during shutdown under systemd.

This is all a great mess under sysvinit.
umountnfs.service is blacklisted as mounts (also remote ones) are
directly handled by systemd.

> Interfaces can also be configured with NetworkManager, which adds
> another axis to the configuration space. Simple configuration of a wired
> network interface should still work, but I think some work has to be
> done (currently by the admin) to enable
> NetworkManager-wait-online.service in order to get systemd to delay
> activating the NFS mounts until NM determines that a network connection
> is available.
> 
> Incidentally, NetworkManager-wait-online.service looks wrong to me; I
> think it should declare Wants= and Before= on network-online.target,
> since that is the name of the target documented in systemd.special(7);
> however I think that it's not actually broken with its current
> settings--they will just result in network.target itself being delayed
> until NetworkManager-wait-online.service starts up, and since the .mount
> units generated by systemd-fstab-generator are After= both network.target
> and network-online.target, the mounts will still be activated at the
> right time. If NetworkManager-wait-online.service were changed to use
> network-online.target instead, then could we enable
> NetworkManager-wait-online.service by default without delaying the
> startup of any services that don't run After= that target, i.e., none in
> the default install?

NM-wait-online is only really relevant for boot. It's a service which
blocks (by default for up to 30 seconds) and waits until a network
connection is established. And yeah, I think NM in unstable is currently
broken in that regard. The introduction of network-online.target is
something more recent. IIRC this should be fixed in the experimental
version of NM.
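
The "block until online or give up after a timeout" behaviour can be sketched roughly as follows. This is only an illustration of the pattern, not NetworkManager's implementation; `wait_online` and the `is_online` callback are hypothetical names, and the 30-second default is the figure mentioned above.

```python
# Sketch of a bounded wait: poll a check until it succeeds or a deadline passes.
# Not NetworkManager code; is_online is a caller-supplied, hypothetical check.
import time


def wait_online(is_online, timeout=30.0, interval=0.5,
                clock=time.monotonic, sleep=time.sleep):
    """Return True once is_online() succeeds, False after `timeout` seconds."""
    deadline = clock() + timeout
    while True:
        if is_online():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # gave up: boot continues without a connection
        sleep(min(interval, remaining))


if __name__ == "__main__":
    # A check that succeeds immediately, so this returns straight away.
    print(wait_online(lambda: True))
```

Units ordered after such a service are delayed by at most the timeout, which is why it only matters at boot.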

> As for shutting down, NetworkManager should only be stopped after remote
> filesystems are unmounted. I'm not sure if this is the case already.
> I've no idea how to deal with horrible cases such as when the user
> reboots the system while they have mounted an NFS share via a VPN
> connection that will be killed when they log out.

Since /usr could be on NFS, this is going to be tricky. That said, I
don't think NM has a problem here since it no longer shuts down the
interfaces when NM is stopped (at least ethernet devices).

As for ifup@.service: it might be a problem that we use
DefaultDependencies=yes (the default).
We probably need to use DefaultDependencies=no and tweak the dependencies.
We will probably also need native .service files for nfs-common and
rpcbind so we can ensure the correct ordering.
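
To illustrate why the ordering matters: systemd stops units in the inverse of their start-up ordering, so a mount that is ordered After= nfs-common and the interface is unmounted before either is stopped. The sketch below models that with a plain topological sort. The `AFTER` table is a hypothetical set of dependencies chosen for illustration, not the contents of any shipped unit file, and `stop_order` is an invented helper (it needs Python 3.9 or later for `graphlib`).

```python
# Sketch: derive a shutdown order as the reverse of a valid start order.
# AFTER is hypothetical; it models the ordering we would want to declare.
from graphlib import TopologicalSorter


def stop_order(after):
    """`after` maps a unit to the units it is ordered After=.

    Returns the units in the order they would be stopped, i.e. the
    reverse of a start order that satisfies every After= constraint.
    """
    sorter = TopologicalSorter()
    for unit, predecessors in after.items():
        sorter.add(unit, *predecessors)
    start = list(sorter.static_order())
    return list(reversed(start))


AFTER = {
    "rpcbind.service": ["ifup@eth0.service"],
    "nfs-common.service": ["rpcbind.service"],
    "home.mount": ["nfs-common.service", "ifup@eth0.service"],
}

if __name__ == "__main__":
    for unit in stop_order(AFTER):
        print(unit)
```

With these dependencies home.mount is stopped first and ifup@eth0.service last; drop the After= lines from the mount and nothing prevents the interface from going down while /home is still mounted.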

Michael

-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?
