jessie: help debugging NFS shares not mounted at boot, double mounts with mount -a, and @reboot cronjobs

Sandro Tosi morph at debian.org
Wed Feb 24 12:29:05 GMT 2016


Hey Felipe, thanks a lot for your help!

On Tue, Feb 23, 2016 at 10:16 PM, Felipe Sateler <fsateler at debian.org>
wrote:
>
> On 23 February 2016 at 12:12, Sandro Tosi <morph at debian.org> wrote:
> > On Tue, Feb 23, 2016 at 9:19 AM, Sandro Tosi <morph at debian.org> wrote:
> >> quick update: we had a couple of (real) nfs issues and
> >> misconfiguration (meeh) that made the script fail even if it shouldnt
> >> have, so no news yet; the reboot loop just restarted and will
> >> periodically check it and report back if something comes up.
> >
> > so well, we just needed to wait a bit :)
> >
> > here it is:
> >
> > # journalctl -alb | grep -nE "cron|NFSSERVER"
> > 1012:Feb 23 06:50:42 SERVER systemd[1]: Installed new job
> > mnt-NFSSERVER_VOL.mount/start as 99
> > 1014:Feb 23 06:50:42 SERVER systemd[1]: Installed new job
cron.service/start
> > as 101
> > 1714:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER_VOL...
> > 1716:Feb 23 06:50:43 SERVER systemd[1]: About to execute: /bin/mount -n
> > XXX.YYY.32.75:/vol/VOL /mnt/NFSSERVER_VOL -t nfs -o
> > ro,intr,nolock,tcp,rdirplus,noatime,_netdev
> > 1718:Feb 23 06:50:43 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount changed
dead
> > -> mounting
> > 1720:Feb 23 06:50:43 SERVER systemd[574]: Executing: /bin/mount -n
> > XXX.YYY.32.75:/vol/VOL /mnt/NFSSERVER_VOL -t nfs -o
> > ro,intr,nolock,tcp,rdirplus,noatime,_netdev
> > 1905:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount mounting
> > timed out. Stopping.
> > 1906:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount changed
> > mounting -> mounting-sigterm
> > 1915:Feb 23 06:52:13 SERVER systemd[1]: Child 574 belongs to
> > mnt-NFSSERVER_VOL.mount
> > 1916:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount mount
> > process exited, code=killed status=15
> > 1917:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount changed
> > mounting-sigterm -> mounted
> > 1918:Feb 23 06:52:13 SERVER systemd[1]: Job
mnt-NFSSERVER_VOL.mount/start
> > finished, result=done
> > 1919:Feb 23 06:52:13 SERVER systemd[1]: Mounted /mnt/NFSSERVER_VOL.
> > 2025:Feb 23 06:52:13 SERVER systemd[1]: About to execute:
/usr/sbin/cron -f
> > $EXTRA_OPTS
> > 2026:Feb 23 06:52:13 SERVER systemd[1]: Forked /usr/sbin/cron as 786
> > 2027:Feb 23 06:52:13 SERVER systemd[1]: cron.service changed dead ->
running
> > 2028:Feb 23 06:52:13 SERVER systemd[1]: Job cron.service/start finished,
> > result=done
> > 2029:Feb 23 06:52:13 SERVER systemd[786]: Executing: /usr/sbin/cron -f
> > 2038:Feb 23 06:52:13 SERVER cron[786]: (CRON) INFO (pidfile fd = 3)
> > 2128:Feb 23 06:52:13 SERVER cron[786]: (CRON) INFO (Running @reboot
jobs)
> > 2300:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount changed
> > mounted -> failed
> > 2301:Feb 23 06:52:13 SERVER systemd[1]: Failed to destroy cgroup
> > /system.slice/mnt-NFSSERVER_VOL.mount: Device or resource busy
> > 2302:Feb 23 06:52:13 SERVER systemd[1]: Unit mnt-NFSSERVER_VOL.mount
entered
> > failed state.
> > 2303:Feb 23 06:52:13 SERVER systemd[1]: Sent message type=signal
sender=n/a
> > destination=n/a
> > object=/org/freedesktop/systemd1/unit/mnt_2dNFSSERVER_5fVOL_2emount
> > interface=org.freedesktop.DBus.Properties member=PropertiesChanged
cookie=30
> > reply_cookie=0 error=n/a
> > 2304:Feb 23 06:52:13 SERVER systemd[1]: Sent message type=signal
sender=n/a
> > destination=n/a
> > object=/org/freedesktop/systemd1/unit/mnt_2dNFSSERVER_5fVOL_2emount
> > interface=org.freedesktop.DBus.Properties member=PropertiesChanged
cookie=31
> > reply_cookie=0 error=n/a
> >
> > so 1m30s passed and the mount didnt come up, which is (one of) the
original
> > issue (usually, running mount -t nfs -a will bring it up, even just
right
> > after the failure at boot, so it seems like it's not retried?). i check
in
> > the journalctl output around those lines, there was additional msg
relevant
> > to this.
>
> Is the network up at this point?

i think so (and the other 11 NFS mounts were correctly mounted at this
point anyway): here's some logs of the network and the failing mount:

# journalctl -alb | grep -inE "network|NFSSERVER"
653:Feb 23 06:50:42 SERVER kernel: igb: Intel(R) Gigabit Ethernet Network
Driver - version 5.2.15-k
656:Feb 23 06:50:42 SERVER kernel: igb 0000:01:00.0: Intel(R) Gigabit
Ethernet Network Connection
661:Feb 23 06:50:42 SERVER kernel: igb 0000:01:00.1: Intel(R) Gigabit
Ethernet Network Connection
678:Feb 23 06:50:42 SERVER kernel: drop_monitor: Initializing network drop
monitor service
962:Feb 23 06:50:42 SERVER systemd[1]: Installed new job
networking.service/start as 46
963:Feb 23 06:50:42 SERVER systemd[1]: Installed new job
network.target/start as 47
969:Feb 23 06:50:42 SERVER systemd[1]: Installed new job
network-online.target/start as 53
1012:Feb 23 06:50:42 SERVER systemd[1]: Installed new job
mnt-NFSSERVER_VOL.mount/start as 99
1548:Feb 23 06:50:42 SERVER systemd[1]: Starting LSB: Raise network
interfaces....
1550:Feb 23 06:50:42 SERVER systemd[1]: About to execute:
/etc/init.d/networking start
1551:Feb 23 06:50:42 SERVER systemd[1]: Forked /etc/init.d/networking as 407
1552:Feb 23 06:50:42 SERVER systemd[1]: networking.service changed dead ->
start
1553:Feb 23 06:50:42 SERVER systemd[407]: Executing: /etc/init.d/networking
start
1591:Feb 23 06:50:43 SERVER networking[407]: Configuring network
interfaces...Cannot get EEE settings: Operation not supported
1648:Feb 23 06:50:43 SERVER networking[407]: done.
1649:Feb 23 06:50:43 SERVER systemd[1]: Received SIGCHLD from PID 407
(networking).
1650:Feb 23 06:50:43 SERVER systemd[1]: Child 407 (networking) died
(code=exited, status=0/SUCCESS)
1651:Feb 23 06:50:43 SERVER systemd[1]: Child 407 belongs to
networking.service
1652:Feb 23 06:50:43 SERVER systemd[1]: networking.service: control process
exited, code=exited status=0
1653:Feb 23 06:50:43 SERVER systemd[1]: networking.service got final
SIGCHLD for state start
1654:Feb 23 06:50:43 SERVER systemd[1]: networking.service changed start ->
running
1655:Feb 23 06:50:43 SERVER systemd[1]: Job networking.service/start
finished, result=done
1656:Feb 23 06:50:43 SERVER systemd[1]: Started LSB: Raise network
interfaces..
1657:Feb 23 06:50:43 SERVER systemd[1]: Starting Network.
1658:Feb 23 06:50:43 SERVER systemd[1]: network.target changed dead ->
active
1659:Feb 23 06:50:43 SERVER systemd[1]: Job network.target/start finished,
result=done
1660:Feb 23 06:50:43 SERVER systemd[1]: Reached target Network.
1661:Feb 23 06:50:43 SERVER systemd[1]: Starting Network is Online.
1662:Feb 23 06:50:43 SERVER systemd[1]: network-online.target changed dead
-> active
1663:Feb 23 06:50:43 SERVER systemd[1]: Job network-online.target/start
finished, result=done
1664:Feb 23 06:50:43 SERVER systemd[1]: Reached target Network is Online.
1714:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER_VOL...
1716:Feb 23 06:50:43 SERVER systemd[1]: About to execute: /bin/mount -n
XXX.YYY.32.75:/vol/VOL /mnt/NFSSERVER_VOL -t nfs -o
ro,intr,nolock,tcp,rdirplus,noatime,_netdev
1718:Feb 23 06:50:43 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount changed
dead -> mounting
1720:Feb 23 06:50:43 SERVER systemd[574]: Executing: /bin/mount -n
XXX.YYY.32.75:/vol/VOL /mnt/NFSSERVER_VOL -t nfs -o
ro,intr,nolock,tcp,rdirplus,noatime,_netdev
1877:Feb 23 06:50:58 SERVER systemd[1]: Child 489 belongs to
networking.service
1879:Feb 23 06:50:58 SERVER systemd[1]: Child 501 belongs to
networking.service
1891:Feb 23 06:51:19 SERVER systemd[1]: Child 539 belongs to
networking.service
1892:Feb 23 06:51:19 SERVER systemd[1]: networking.service: cgroup is empty
1893:Feb 23 06:51:19 SERVER systemd[1]: networking.service changed running
-> exited
1905:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount mounting
timed out. Stopping.
1906:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount changed
mounting -> mounting-sigterm
1915:Feb 23 06:52:13 SERVER systemd[1]: Child 574 belongs to
mnt-NFSSERVER_VOL.mount
1916:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount mount
process exited, code=killed status=15
1917:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount changed
mounting-sigterm -> mounted
1918:Feb 23 06:52:13 SERVER systemd[1]: Job mnt-NFSSERVER_VOL.mount/start
finished, result=done
1919:Feb 23 06:52:13 SERVER systemd[1]: Mounted /mnt/NFSSERVER_VOL.
2300:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount changed
mounted -> failed
2301:Feb 23 06:52:13 SERVER systemd[1]: Failed to destroy cgroup
/system.slice/mnt-NFSSERVER_VOL.mount: Device or resource busy
2302:Feb 23 06:52:13 SERVER systemd[1]: Unit mnt-NFSSERVER_VOL.mount
entered failed state.
2303:Feb 23 06:52:13 SERVER systemd[1]: Sent message type=signal sender=n/a
destination=n/a
object=/org/freedesktop/systemd1/unit/mnt_2dNFSSERVER_5fVOL_2emount
interface=org.freedesktop.DBus.Properties member=PropertiesChanged
cookie=30 reply_cookie=0 error=n/a
2304:Feb 23 06:52:13 SERVER systemd[1]: Sent message type=signal sender=n/a
destination=n/a
object=/org/freedesktop/systemd1/unit/mnt_2dNFSSERVER_5fVOL_2emount
interface=org.freedesktop.DBus.Properties member=PropertiesChanged
cookie=31 reply_cookie=0 error=n/a
2449:Feb 23 06:52:13 SERVER systemd[1]: Starting Host and Network Name
Lookups.
2452:Feb 23 06:52:13 SERVER systemd[1]: Reached target Host and Network
Name Lookups.

in particular

1664:Feb 23 06:50:43 SERVER systemd[1]: Reached target Network is Online.
1714:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER_VOL...

so the net online target is reached (if we believe systemd :) from man
systemd.special for network-online.target, "What precisely this requires is
left to the implementation of the network managing service.", how can we
know how this is implemented, and so what is meant here as
'online'?) before attempting the mount.

I checked at what point the /mnt/NFSSERVER_VOL is executed, and it was the
very last one NFS mount attempted (dunno if it's useful, i was expecting to
be the first, so with a possibly network still unstable), here below what I
called before /mnt/NFSSERVER_VOL is /mnt/NFSSERVER1_VOL (as i included also
the other NFS mounts):

1665:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER12_VOL...
1669:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER7_VOL...
1674:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER10...
1679:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER7_VOL...
1684:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER8...
1689:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER7_VOL...
1694:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER5...
1699:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER4...
1704:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER3...
1709:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER2...
1714:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER1_VOL...
1757:Feb 23 06:50:46 SERVER systemd[1]: mnt-NFSSERVER12_VOL.mount changed
mounting -> mounting-done
1759:Feb 23 06:50:46 SERVER systemd[1]: Mounted /mnt/NFSSERVER12_VOL.
1777:Feb 23 06:50:47 SERVER systemd[1]: mnt-NFSSERVER8.mount changed
mounting -> mounting-done
1779:Feb 23 06:50:47 SERVER systemd[1]: Mounted /mnt/NFSSERVER8.
1780:Feb 23 06:50:47 SERVER systemd[1]: mnt-NFSSERVER7_VOL.mount changed
mounting -> mounting-done
1782:Feb 23 06:50:47 SERVER systemd[1]: Mounted /mnt/NFSSERVER7_VOL.
1783:Feb 23 06:50:47 SERVER systemd[1]: mnt-NFSSERVER3.mount changed
mounting -> mounting-done
1785:Feb 23 06:50:47 SERVER systemd[1]: Mounted /mnt/NFSSERVER3.
1786:Feb 23 06:50:47 SERVER systemd[1]: mnt-NFSSERVER5.mount changed
mounting -> mounting-done
1788:Feb 23 06:50:47 SERVER systemd[1]: Mounted /mnt/NFSSERVER5.
1789:Feb 23 06:50:47 SERVER systemd[1]: mnt-NFSSERVER7_VOL.mount changed
mounting -> mounting-done
1791:Feb 23 06:50:47 SERVER systemd[1]: Mounted /mnt/NFSSERVER7_VOL.
1792:Feb 23 06:50:47 SERVER systemd[1]: mnt-NFSSERVER7_VOL.mount changed
mounting -> mounting-done
1794:Feb 23 06:50:47 SERVER systemd[1]: Mounted /mnt/NFSSERVER7_VOL.
1795:Feb 23 06:50:47 SERVER systemd[1]: mnt-NFSSERVER10.mount changed
mounting -> mounting-done
1797:Feb 23 06:50:47 SERVER systemd[1]: Mounted /mnt/NFSSERVER10.
1830:Feb 23 06:50:47 SERVER systemd[1]: mnt-NFSSERVER4.mount changed
mounting -> mounted
1832:Feb 23 06:50:47 SERVER systemd[1]: Mounted /mnt/NFSSERVER4.
1836:Feb 23 06:50:47 SERVER systemd[1]: mnt-NFSSERVER2.mount changed
mounting -> mounted
1838:Feb 23 06:50:47 SERVER systemd[1]: Mounted /mnt/NFSSERVER2.
1906:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount changed
mounting -> mounting-sigterm
1919:Feb 23 06:52:13 SERVER systemd[1]: Mounted /mnt/NFSSERVER1_VOL.

but it is the first one defined in /etc/fstab:

XXX.YYY.32.75:/vol/vol3    /mnt/NFSSERVER1_VOL      nfs
ro,intr,nolock,tcp,rdirplus,noatime,_netdev
XXX.YYY.26.132:/ifs    /mnt/NFSSERVER2    nfs
rw,intr,tcp,rdirplus,noatime,_netdev
XXX.YYY.20.176:/ifs    /mnt/NFSSERVER3      nfs
ro,tcp,rdirplus,noatime,_netdev
XXX.YYY.16.226:/ifs    /mnt/NFSSERVER4     nfs
rw,intr,tcp,rdirplus,noatime,_netdev
XXX.YYY.21.22:/ifs/scratcho    /mnt/NFSSERVER5     nfs
rw,intr,tcp,rdirplus,noatime,_netdev
XXX.YYY.27.74:/vol/vol5    /mnt/NFSSERVER7_VOL     nfs
rw,intr,tcp,rdirplus,noatime,_netdev
XXX.YYY.32.111:/buildcache     /mnt/NFSSERVER8   nfs
ro,soft,intr,tcp,rdirplus,noatime,_netdev
XXX.YYY.27.74:/vol/vol1    /mnt/NFSSERVER7_VOL     nfs
ro,soft,intr,tcp,rdirplus,noatime,_netdev
XXX.YYY.20.176:/ifs    /mnt/NFSSERVER10     nfs
ro,nocto,actimeo=600,tcp,rdirplus,noatime,_netdev
XXX.YYY.27.74:/vol/vol6    /mnt/NFSSERVER7_VOL     nfs
ro,soft,intr,tcp,rdirplus,noatime,_netdev
XXX.YYY.20.14:/vol/vol3 /mnt/NFSSERVER12_VOL nfs
ro,intr,nolock,tcp,rdirplus,noatime,_netdev

something else that sounds fishy is:

Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER1_VOL...
Feb 23 06:50:43 SERVER systemd[1]: About to execute: /bin/mount -n
XXX.YYY.32.75:/vol/vol3 /mnt/NFSSERVER1_VOL -t nfs -o
ro,intr,nolock,tcp,rdirplus,noatime,_netdev
Feb 23 06:50:43 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount changed dead ->
mounting
Feb 23 06:50:43 SERVER systemd[574]: Executing: /bin/mount -n
XXX.YYY.32.75:/vol/vol3 /mnt/NFSSERVER1_VOL -t nfs -o
ro,intr,nolock,tcp,rdirplus,noatime,_netdev
Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount mounting timed
out. Stopping.
Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount changed
mounting -> mounting-sigterm
Feb 23 06:52:13 SERVER systemd[1]: Child 574 belongs to
mnt-NFSSERVER1_VOL.mount
Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount mount process
exited, code=killed status=15
Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount changed
mounting-sigterm -> mounted
Feb 23 06:52:13 SERVER systemd[1]: Job mnt-NFSSERVER1_VOL.mount/start
finished, result=done
Feb 23 06:52:13 SERVER systemd[1]: Mounted /mnt/NFSSERVER1_VOL.
Feb 23 06:52:13 SERVER systemd[1]: Starting Remote File Systems.
Feb 23 06:52:13 SERVER systemd[1]: remote-fs.target changed dead -> active
Feb 23 06:52:13 SERVER systemd[1]: Job remote-fs.target/start finished,
result=done
Feb 23 06:52:13 SERVER systemd[1]: Reached target Remote File Systems.
Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount changed mounted
-> failed

so it is actually marked as done/mounted (?!?): "mnt-NFSSERVER1_VOL.mount
changed mounting-sigterm -> mounted", which triggers "Job
remote-fs.target/start finished, result=done" - but that seems wrong, it
was sigterm'ed, why systemd thinks this is actually completed correctly? at
the end it doesnt, as it's marked as failed, but that "mounting-sigterm ->
mounted" transition appears to be a mistake


>
> >
> > also note like cron.service is started, even if we configured:
> >
> > # grep remote-fs /etc/systemd/system/cron.service
> > Requires=remote-fs.target
> > After=remote-fs.target
>
> What does `systemctl show remote-fs.target` say?

Id=remote-fs.target
Names=remote-fs.target
Requires=mnt-NFSSERVER12_VOL.mount mnt-NFSSERVER7_vol6.mount
mnt-NFSSERVER32.mount mnt-NFSSERVER9_vol1.mount mnt-NFSSERVER8.mount
mnt-NFSSERVER11_vol5.mount mnt-NFSSERVER5.mount mnt-NFSSERVER4.mount
mnt-gonzo.mount mnt-NFSSERVER2.mount mnt-NFSSERVER1_vol3.mount
Wants=remote-fs-pre.target
RequiredBy=cron.service
WantedBy=multi-user.target
Conflicts=shutdown.target
Before=console-setup.service kbd.service systemd-journal-flush.service
systemd-user-sessions.service mcelog.service ntp.service
ipmidetectd.service kexec-load.service nagios-nrpe-server.service
nutcracker.service lsyncd.service irqbalance.service ipwatchd.service
ganglia-monitor.service unbound.service bmc-watchdog.service
setup_hugetlbfs.service sysstat.service postfix.service cron.service
After=remote-fs-pre.target local-fs.target mnt-NFSSERVER1_VOL.mount
mnt-NFSSERVER2.mount mnt-NFSSERVER3.mount mnt-NFSSERVER4.mount
mnt-NFSSERVER5.mount mnt-NFSSERVER7_vol5.mount mnt-NFSSERVER8.mount
mnt-NFSSERVER9_vol1.mount mnt-NFSSERVER10.mount mnt-NFSSERVER11_vol6.mount
mnt-NFSSERVER12_vol3.mount
Documentation=man:systemd.special(7)
Description=Remote File Systems
LoadState=loaded
ActiveState=active
SubState=active
FragmentPath=/lib/systemd/system/remote-fs.target
DropInPaths=/run/systemd/generator/remote-fs.target.d/50-insserv.conf.conf
UnitFileState=enabled
InactiveExitTimestamp=Tue 2016-02-23 06:52:13 EST
InactiveExitTimestampMonotonic=104052746
ActiveEnterTimestamp=Tue 2016-02-23 06:52:13 EST
ActiveEnterTimestampMonotonic=104052746
ActiveExitTimestampMonotonic=0
InactiveEnterTimestampMonotonic=0
CanStart=yes
CanStop=yes
CanReload=no
CanIsolate=no
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=no
OnFailureJobMode=replace
IgnoreOnIsolate=no
IgnoreOnSnapshot=no
NeedDaemonReload=no
JobTimeoutUSec=0
ConditionResult=yes
ConditionTimestamp=Tue 2016-02-23 06:52:13 EST
ConditionTimestampMonotonic=104052734
Transient=no

>
> >
> > checking the status of that target:
> >
> > # systemctl status remote-fs.target
> > ● remote-fs.target - Remote File Systems
> >    Loaded: loaded (/lib/systemd/system/remote-fs.target; enabled)
> >   Drop-In: /run/systemd/generator/remote-fs.target.d
> >            └─50-insserv.conf.conf
> >    Active: active since Tue 2016-02-23 06:52:13 EST; 3h 15min ago
> >      Docs: man:systemd.special(7)
> >
> > Feb 23 06:52:13 SERVER systemd[1]: Starting Remote File Systems.
> > Feb 23 06:52:13 SERVER systemd[1]: Job remote-fs.target/start finished,
> > result=done
> > Feb 23 06:52:13 SERVER systemd[1]: Reached target Remote File Systems.
> >
> > so at the same time when mnt-NFSSERVER_VOL.mount is marked as failed,
the
> > remote-fs.target is marked as loaded successfully (which seems the wrong
> > status to me), and in fact the only failed unit is:
> >
> > # systemctl --failed
> >   UNIT                    LOAD   ACTIVE SUB    DESCRIPTION
> > ● mnt-NFSSERVER_VOL.mount loaded failed failed /mnt/NFSSERVER_VOL
> >
> > 1 loaded units listed. Pass --all to see loaded but inactive units, too.
>
> Looks like remote-fs.target does not Require its mounts, only Wants.

from the 'show' command above it seems it actually Requires the mount,
where did you find the info about only 'Want'ing the mounts?

>
> >
> >
> > can I provide more logs/info? do you see anything wrong on this
> > configuration that we might want to change?
>
> It seems strange that the mount fails.

yeah I agree it's strange, but there might be situations when the
particular nfs server is overloaded, and/or the network path is a bit
congested, and the mount might take a while to come up; we consider a slow
mount acceptable (we dont reboot machines that often, and servers are
already slow beasts to start, a few secs more is not an issue), we just
feel like if something goes a bit out of the usual, the mount is never
retried and it's just left it there, dying

> Any logs which might shed some
> light on why the mount command fails (especially network status)?

sadly i cant find anything relevant: dmesg has nothing, journalctl contains
nothing helpful/related, ad so do syslog or messages - any other place you
want me to look?

> Does
> the mount complete successfully if started via systemd post-boot ?
> that is, `systemctl start mnt-NFSSERVER_VOL.mount`.

it usually does, up to the point that we setup a "mount -t nfs -a" as an
@reboot cronjob just to get the mounts up

on this specific machine. I'm running that command now, so 24hrs after when
that happened, and it completes just fine (but i have no doubts it would
have succeeded even just after the boot failure):

# systemctl --failed --no-legend
mnt-NFSSERVER_VOL.mount loaded failed failed /mnt/NFSSERVER_VOL
# systemctl start mnt-NFSSERVER_VOL.mount

Feb 24 07:24:21 SERVER systemd[1]: Trying to enqueue job
mnt-NFSSERVER1_VOL.mount/start/replace
Feb 24 07:24:21 SERVER systemd[1]: Installed new job
mnt-NFSSERVER1_VOL.mount/start as 565
Feb 24 07:24:21 SERVER systemd[1]: Enqueued job
mnt-NFSSERVER1_VOL.mount/start as 565
Feb 24 07:24:21 SERVER systemd[1]: Mounting /mnt/NFSSERVER1_VOL...
Feb 24 07:24:21 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount: Directory
/mnt/majestix_vol3 to mount over is not empty, mounting anyway.
Feb 24 07:24:21 SERVER systemd[1]: About to execute: /bin/mount -n
XXX.YYY.32.75:/vol/vol3 /mnt/NFSSERVER1_VOL -t nfs -o
ro,intr,nolock,tcp,rdirplus,noatime,_netdev
Feb 24 07:24:21 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount changed failed
-> mounting
Feb 24 07:24:21 SERVER systemd[1]: Sent message type=signal sender=n/a
destination=n/a
object=/org/freedesktop/systemd1/unit/mnt_2dNFSSERVER1_5fvol3_2emount
interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=3
reply_cookie=0 error=n/a
Feb 24 07:24:21 SERVER systemd[1]: Sent message type=signal sender=n/a
destination=n/a
object=/org/freedesktop/systemd1/unit/mnt_2dNFSSERVER1_5fvol3_2emount
interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=4
reply_cookie=0 error=n/a
Feb 24 07:24:21 SERVER systemd[1]: Got message type=method_call sender=n/a
destination=org.freedesktop.systemd1
object=/org/freedesktop/systemd1/unit/mnt_2dNFSSERVER1_5fvol3_2emount
interface=org.freedesktop.DBus.Properties member=Get cookie=3
reply_cookie=0 error=n/a
Feb 24 07:24:21 SERVER systemd[27575]: Executing: /bin/mount -n
XXX.YYY.32.75:/vol/vol3 /mnt/NFSSERVER1_VOL -t nfs -o
ro,intr,nolock,tcp,rdirplus,noatime,_netdev
Feb 24 07:24:21 SERVER systemd[1]: Child 27575 belongs to
mnt-NFSSERVER1_VOL.mount
Feb 24 07:24:21 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount mount process
exited, code=exited status=0
Feb 24 07:24:21 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount changed
mounting -> mounted
Feb 24 07:24:21 SERVER systemd[1]: Job mnt-NFSSERVER1_VOL.mount/start
finished, result=done
Feb 24 07:24:21 SERVER systemd[1]: Mounted /mnt/NFSSERVER1_VOL.

# systemctl status | head
● SERVER
    State: running
     Jobs: 0 queued
   Failed: 0 units


thanks a ton!!

-- 
Sandro "morph" Tosi
My website: http://sandrotosi.me/
Me at Debian: http://wiki.debian.org/SandroTosi
G+: https://plus.google.com/u/0/+SandroTosi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://alioth-lists.debian.net/pipermail/pkg-systemd-maintainers/attachments/20160224/500da87a/attachment-0002.html>


More information about the Pkg-systemd-maintainers mailing list