Bug#835810: systemd nfs mount timeout leaves mount unit in active state, preventing later automounts
Tero Marttila
tero.marttila at paivola.fi
Sun Aug 28 14:08:51 BST 2016
Package: systemd
Version: 215-17+deb8u4
In a scenario where the NFS server is not yet up when mounting an NFS
share at boot, the systemd *.mount unit times out and gets stuck in a
somewhat odd "active (mounted)" state, even though it is not actually
mounted. This appears to prevent any systemd *.automount unit from being
able to later mount it when the NFS server does come back online.
This means that in our scenario where the NFS server machine is slower
to boot up than the NFS client machines, we must perform manual
intervention on each of the client machines to mount the missing NFS
shares after e.g. a site-wide power failure. I had hoped that using
systemd.automount would handle this scenario better, having processes
attempting to use the NFS mount wait or return IO errors until the NFS
server eventually comes online and gets mounted.
> ● srv-mail.mount - /srv/mail
> Loaded: loaded (/etc/fstab)
> Active: active (mounted) (Result: timeout) since Sun 2016-08-28 15:03:44 EEST; 1min 40s ago
> Where: /srv/mail
> What: nfs.test.pvl:mail
> Docs: man:fstab(5)
> man:systemd-fstab-generator(8)
> Process: 426 ExecMount=/bin/mount -n nfs.test.pvl:mail /srv/mail -t nfs4 -o defaults,auto,nodev,nosuid,noatime,x-systemd.automount (code=killed, signal=TERM)
>
> Aug 28 15:02:14 mail-mta-test systemd[426]: Executing: /bin/mount -n nfs.test.pvl:mail /srv/mail -t nfs4 -o defaults,auto,nodev,nosuid,noatime,x-systemd.automount
> Aug 28 15:03:44 mail-mta-test systemd[1]: srv-mail.mount mounting timed out. Stopping.
> Aug 28 15:03:44 mail-mta-test systemd[1]: srv-mail.mount changed mounting -> mounting-sigterm
> Aug 28 15:03:44 mail-mta-test systemd[1]: Child 426 belongs to srv-mail.mount
> Aug 28 15:03:44 mail-mta-test systemd[1]: srv-mail.mount mount process exited, code=killed status=15
> Aug 28 15:03:44 mail-mta-test systemd[1]: srv-mail.mount changed mounting-sigterm -> mounted
> Aug 28 15:03:44 mail-mta-test systemd[1]: Job srv-mail.mount/start finished, result=done
> Aug 28 15:03:44 mail-mta-test systemd[1]: Mounted /srv/mail.
I am reporting this as a bug since it appears that manually *stopping*
the *.mount unit to place it into an inactive state is enough to allow
the automounter to successfully mount it.
Annotated events below, full journalctl output at:
https://gist.github.com/SpComb/560ce0c4e7d625dfea354769f48af758#file-journalctl-systemd-mount
-- Tero Marttila
Initial mount during bootup:
> Aug 28 15:02:14 mail-mta-test systemd[1]: Mounting /srv/mail...
> Aug 28 15:02:14 mail-mta-test systemd[1]: About to execute: /bin/mount -n nfs.test.pvl:mail /srv/mail -t nfs4 -o defaults,auto,nodev,nosuid,noatime,x-systemd.automount
> Aug 28 15:02:14 mail-mta-test systemd[1]: Forked /bin/mount as 426
> Aug 28 15:02:14 mail-mta-test systemd[1]: srv-mail.mount changed dead -> mounting
NFS server is down, initial mount times out:
> Aug 28 15:03:44 mail-mta-test systemd[1]: srv-mail.mount mounting timed out. Stopping.
> Aug 28 15:03:44 mail-mta-test systemd[1]: srv-mail.mount changed mounting -> mounting-sigterm
> Aug 28 15:03:44 mail-mta-test systemd[1]: Received SIGCHLD from PID 426 (mount).
> Aug 28 15:03:44 mail-mta-test systemd[1]: Child 426 (mount) died (code=killed, status=15/TERM)
> Aug 28 15:03:44 mail-mta-test systemd[1]: Child 426 belongs to srv-mail.mount
> Aug 28 15:03:44 mail-mta-test systemd[1]: srv-mail.mount mount process exited, code=killed status=15
> Aug 28 15:03:44 mail-mta-test systemd[1]: srv-mail.mount changed mounting-sigterm -> mounted
> Aug 28 15:03:44 mail-mta-test systemd[1]: Job srv-mail.mount/start finished, result=done
> Aug 28 15:03:44 mail-mta-test systemd[1]: Mounted /srv/mail.
> Aug 28 15:03:44 mail-mta-test systemd[1]: Child 428 (mount.nfs4) died (code=killed, status=15/TERM)
Attempt to SSH in while NFS server is still down. Triggers first
automount attempt, but does not actually attempt to mount anything:
> Aug 28 15:06:23 mail-mta-test systemd[1]: Got automount request for /srv/mail, triggered by 753 (sshd)
> Aug 28 15:06:23 mail-mta-test systemd[1]: Trying to enqueue job srv-mail.mount/start/replace
> Aug 28 15:06:23 mail-mta-test systemd[1]: Installed new job srv-mail.mount/start as 671
> Aug 28 15:06:23 mail-mta-test systemd[1]: Enqueued job srv-mail.mount/start as 671
> Aug 28 15:06:23 mail-mta-test systemd[1]: srv-mail.automount changed waiting -> running
> Aug 28 15:06:23 mail-mta-test systemd[1]: Job srv-mail.mount/start finished, result=done
> Aug 28 15:06:23 mail-mta-test systemd[1]: Mounted /srv/mail.
Restore the NFS server.
Attempt to SSH in again with sshd pid=801, no automount triggered yet?
Manually stop the .mount unit, which immediately triggers a successful
automount from the previous sshd pid=801 login attempt:
> Aug 28 15:07:27 mail-mta-test systemd[1]: Got message type=method_call sender=n/a destination=org.freedesktop.systemd1 object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=StopUnit cookie=1 reply_cookie=0 error=n/a
> Aug 28 15:07:27 mail-mta-test systemd[1]: Trying to enqueue job srv-mail.mount/stop/replace
> Aug 28 15:07:27 mail-mta-test systemd[1]: Installed new job srv-mail.mount/stop as 684
> Aug 28 15:07:27 mail-mta-test systemd[1]: Enqueued job srv-mail.mount/stop as 684
> Aug 28 15:07:27 mail-mta-test systemd[1]: Sent message type=method_return sender=n/a destination=n/a object=n/a interface=n/a member=n/a cookie=1 reply_cookie=1 error=n/a
> Aug 28 15:07:27 mail-mta-test systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobNew cookie=2 reply_cookie=0 error=n/a
> Aug 28 15:07:27 mail-mta-test systemd[1]: Unmounting /srv/mail...
> Aug 28 15:07:27 mail-mta-test systemd[1]: About to execute: /bin/umount -n /srv/mail
> Aug 28 15:07:27 mail-mta-test systemd[1]: Forked /bin/umount as 818
> Aug 28 15:07:27 mail-mta-test systemd[1]: Sending failure: No such device
> Aug 28 15:07:27 mail-mta-test systemd[1]: srv-mail.mount changed mounted -> unmounting
> Aug 28 15:07:27 mail-mta-test systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1/unit/srv_2dmail_2emount interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=3 reply_cookie=0 error=n/a
> Aug 28 15:07:27 mail-mta-test systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1/unit/srv_2dmail_2emount interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=4 reply_cookie=0 error=n/a
> Aug 28 15:07:27 mail-mta-test systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1/job/684 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=5 reply_cookie=0 error=n/a
> Aug 28 15:07:27 mail-mta-test systemd[1]: Got automount request for /srv/mail, triggered by 801 (sshd)
> Aug 28 15:07:27 mail-mta-test systemd[1]: Trying to enqueue job srv-mail.mount/start/replace
> Aug 28 15:07:27 mail-mta-test systemd[1]: Job srv-mail.mount/stop finished, result=canceled
> Aug 28 15:07:27 mail-mta-test systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=6 reply_cookie=0 error=n/a
> Aug 28 15:07:27 mail-mta-test systemd[1]: Installed new job srv-mail.mount/start as 685
> Aug 28 15:07:27 mail-mta-test systemd[1]: Enqueued job srv-mail.mount/start as 685
> Aug 28 15:07:27 mail-mta-test systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1/unit/srv_2dmail_2eautomount interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=7 reply_cookie=0 error=n/a
> Aug 28 15:07:27 mail-mta-test systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1/unit/srv_2dmail_2eautomount interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=8 reply_cookie=0 error=n/a
> Aug 28 15:07:27 mail-mta-test systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobNew cookie=9 reply_cookie=0 error=n/a
> Aug 28 15:07:27 mail-mta-test systemd[1]: Mounting /srv/mail...
> Aug 28 15:07:27 mail-mta-test systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1/unit/srv_2dmail_2emount interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=10 reply_cookie=0 error=n/a
> Aug 28 15:07:27 mail-mta-test systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1/unit/srv_2dmail_2emount interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=11 reply_cookie=0 error=n/a
> Aug 28 15:07:27 mail-mta-test systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1/job/685 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=12 reply_cookie=0 error=n/a
> Aug 28 15:07:27 mail-mta-test systemd[1]: Got message type=method_call sender=n/a destination=org.freedesktop.systemd1 object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=GetUnit cookie=2 reply_cookie=0 error=n/a
> Aug 28 15:07:27 mail-mta-test systemd[1]: Sent message type=method_return sender=n/a destination=n/a object=n/a interface=n/a member=n/a cookie=13 reply_cookie=2 error=n/a
> Aug 28 15:07:27 mail-mta-test systemd[1]: Got message type=method_call sender=n/a destination=org.freedesktop.systemd1 object=/org/freedesktop/systemd1/unit/srv_2dmail_2emount interface=org.freedesktop.DBus.Properties member=Get cookie=3 reply_cookie=0 error=n/a
> Aug 28 15:07:27 mail-mta-test systemd[1]: Sent message type=method_return sender=n/a destination=n/a object=n/a interface=n/a member=n/a cookie=14 reply_cookie=3 error=n/a
> Aug 28 15:07:27 mail-mta-test systemd[818]: Executing: /bin/umount -n /srv/mail
> Aug 28 15:07:27 mail-mta-test systemd[1]: Got disconnect on private connection.
> Aug 28 15:07:27 mail-mta-test systemd[1]: Received SIGCHLD from PID 818 (umount).
> Aug 28 15:07:27 mail-mta-test systemd[1]: Child 818 (umount) died (code=exited, status=32/n/a)
> Aug 28 15:07:27 mail-mta-test systemd[1]: Child 818 belongs to srv-mail.mount
> Aug 28 15:07:27 mail-mta-test systemd[1]: srv-mail.mount mount process exited, code=exited status=32
> Aug 28 15:07:27 mail-mta-test systemd[1]: Sending failure: No such device
> Aug 28 15:07:27 mail-mta-test systemd[1]: srv-mail.mount changed unmounting -> failed
> Aug 28 15:07:27 mail-mta-test systemd[1]: Unit srv-mail.mount entered failed state.
> Aug 28 15:07:27 mail-mta-test systemd[1]: Mounting /srv/mail...
> Aug 28 15:07:27 mail-mta-test systemd[1]: About to execute: /bin/mount -n nfs.test.pvl:mail /srv/mail -t nfs4 -o defaults,auto,nodev,nosuid,noatime,x-systemd.automount
> Aug 28 15:07:27 mail-mta-test systemd[1]: Forked /bin/mount as 821
> Aug 28 15:07:27 mail-mta-test systemd[1]: srv-mail.mount changed failed -> mounting
> Aug 28 15:07:27 mail-mta-test systemd[1]: Received SIGCHLD from PID 802 (sshd).
> Aug 28 15:07:27 mail-mta-test systemd[1]: Child 802 (sshd) died (code=exited, status=255/n/a)
> Aug 28 15:07:27 mail-mta-test systemd[1]: Child 754 (sshd) died (code=exited, status=255/n/a)
> Aug 28 15:07:27 mail-mta-test systemd[1]: Accepted new private connection.
> Aug 28 15:07:27 mail-mta-test systemd[1]: Got message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1/agent interface=org.freedesktop.systemd1.Agent member=Released cookie=1 reply_cookie=0 error=n/a
> Aug 28 15:07:27 mail-mta-test systemd[1]: Got disconnect on private connection.
> Aug 28 15:07:27 mail-mta-test systemd[821]: Executing: /bin/mount -n nfs.test.pvl:mail /srv/mail -t nfs4 -o defaults,auto,nodev,nosuid,noatime,x-systemd.autom
> Aug 28 15:07:27 mail-mta-test sshd[801]: Connection closed by 2001:2060:41:a64::1d9 [preauth]
> Aug 28 15:07:27 mail-mta-test sshd[753]: Connection closed by 2001:2060:41:a64::1d9 [preauth]
> Aug 28 15:07:27 mail-mta-test umount[818]: umount: /srv/mail: target is busy
> Aug 28 15:07:27 mail-mta-test umount[818]: (In some cases useful info about processes that
> Aug 28 15:07:27 mail-mta-test umount[818]: use the device is found by lsof(8) or fuser(1).)
> Aug 28 15:07:27 mail-mta-test systemd[1]: srv-mail.mount changed mounting -> mounting-done
> Aug 28 15:07:27 mail-mta-test systemd[1]: Job srv-mail.mount/start finished, result=done
> Aug 28 15:07:27 mail-mta-test systemd[1]: Mounted /srv/mail.
> Aug 28 15:07:27 mail-mta-test systemd[1]: Received SIGCHLD from PID 821 (mount).
> Aug 28 15:07:27 mail-mta-test systemd[1]: Child 821 (mount) died (code=exited, status=0/SUCCESS)
> Aug 28 15:07:27 mail-mta-test systemd[1]: Child 821 belongs to srv-mail.mount
> Aug 28 15:07:27 mail-mta-test systemd[1]: srv-mail.mount mount process exited, code=exited status=0
> Aug 28 15:07:27 mail-mta-test systemd[1]: srv-mail.mount changed mounting-done -> mounted
The NFS mount is now actually automounted.
More information about the Pkg-systemd-maintainers
mailing list