jessie: help debugging NFS shares not mounted at boot, double mounts with mount -a, and @reboot cronjobs

Sandro Tosi morph at debian.org
Fri Feb 12 17:32:50 GMT 2016


Sorry the tests are going slowly, but here is another interesting thing I found: I have an @reboot cronjob that checks whether the number of mounts on the system is what it is supposed to be, and grabs some system information. Despite:

# grep remote /etc/systemd/system/cron.service
Requires=remote-fs.target
After=remote-fs.target
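
(For reference, and only an assumption about this setup since the grep points at a full unit file under /etc/systemd/system/: a unit file there replaces the packaged one entirely. The same two lines could instead live in a drop-in, which keeps the packaged cron.service intact and only adds the ordering; the path below is hypothetical.)

```ini
# Hypothetical drop-in: /etc/systemd/system/cron.service.d/remote-fs.conf
# Adds the ordering without replacing the packaged cron.service unit.
[Unit]
Requires=remote-fs.target
After=remote-fs.target
```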

the cronjob was started while 'mount' still did not show all the NFS mounts (the missing one became available at a later stage, but not by the time the cronjob ran); from the boot log:

1005:Feb 10 16:44:38 SERVER systemd[1]: Installed new job mnt-NFSSERVER.mount/start as 97
1707:Feb 10 16:44:40 SERVER systemd[1]: Mounting /mnt/NFSSERVER...
1709:Feb 10 16:44:40 SERVER systemd[1]: About to execute: /bin/mount -n XXX.YYY.16.226:/VOL /mnt/NFSSERVER -t nfs -o rw,intr,tcp,bg,rdirplus,noatime,_netdev
1711:Feb 10 16:44:40 SERVER systemd[1]: mnt-NFSSERVER.mount changed dead -> mounting
1713:Feb 10 16:44:40 SERVER systemd[583]: Executing: /bin/mount -n XXX.YYY.16.226:/VOL /mnt/NFSSERVER -t nfs -o rw,intr,tcp,bg,rdirplus,noatime,_netdev
1815:Feb 10 16:44:43 SERVER systemd[1]: Child 583 belongs to mnt-NFSSERVER.mount
1816:Feb 10 16:44:43 SERVER systemd[1]: mnt-NFSSERVER.mount mount process exited, code=exited status=0
1817:Feb 10 16:44:43 SERVER systemd[1]: mnt-NFSSERVER.mount changed mounting -> mounted
1818:Feb 10 16:44:43 SERVER systemd[1]: Job mnt-NFSSERVER.mount/start finished, result=done
1819:Feb 10 16:44:43 SERVER systemd[1]: Mounted /mnt/NFSSERVER.
2106:Feb 10 16:44:43 SERVER systemd[1]: mnt-NFSSERVER.mount changed mounted -> dead
2107:Feb 10 16:44:43 SERVER systemd[1]: Failed to destroy cgroup /system.slice/mnt-NFSSERVER.mount: Device or resource busy
2632:Feb 10 16:44:54 SERVER systemd[1]: mnt-NFSSERVER.mount changed dead -> mounted

2312:Feb 10 16:44:43 SERVER CRON[876]: (root) CMD (/root/jessie_mount_test.sh)

So the cronjob is started before the final "mnt-NFSSERVER.mount changed dead -> mounted", which happened 11 seconds later (!). Did I define the dependency between cron and remote-fs incorrectly? Is the remote-fs target reached too early?
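
One way to make the @reboot job itself tolerant of this race, sketched as a hypothetical wrapper (this is not what jessie_mount_test.sh currently does): poll /proc/mounts for the expected mountpoint, with a timeout, before running the real check.

```shell
#!/bin/sh
# Hypothetical wrapper for the @reboot job: wait (up to a timeout, in
# seconds) until a path appears as a mountpoint in /proc/mounts.
wait_for_mount() {
    path=$1
    timeout=${2:-60}
    while [ "$timeout" -gt 0 ]; do
        # field 2 of /proc/mounts is the mount point
        if awk -v m="$path" '$2 == m { found = 1 } END { exit !found }' /proc/mounts; then
            return 0
        fi
        sleep 1
        timeout=$((timeout - 1))
    done
    return 1
}

# e.g. wait up to two minutes before running the real check:
# wait_for_mount /mnt/NFSSERVER 120 && exec /root/jessie_mount_test.sh
```

This only papers over the ordering question, but it would at least stop the check from firing during the window between remote-fs.target being reached and the backgrounded mount actually appearing.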

This is a slightly different case (a mountpoint becomes available later than the others, rather than being completely missing), so let me know if this is not interesting, in which case I can discard these results (and reboot the machine to try to replicate the original issue), or if there is value in debugging it further (as said, the machine is still up from when this happened, so I can inspect it further if needed).
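
A side note on an alternative worth sketching (a hypothetical unit, not something already in this setup): instead of an @reboot cron entry ordered after remote-fs.target, a oneshot service with RequiresMountsFor= is ordered after the specific mount unit itself. Caveat: this may still fire early if the bg mount option lets mount.nfs exit successfully before the share is actually mounted, which the mounted -> dead -> mounted transitions in the log above hint at.

```ini
# Hypothetical unit: /etc/systemd/system/jessie-mount-test.service
[Unit]
Description=Check expected mounts after boot
# Pulls in and orders this service after mnt-NFSSERVER.mount
RequiresMountsFor=/mnt/NFSSERVER

[Service]
Type=oneshot
ExecStart=/root/jessie_mount_test.sh

[Install]
WantedBy=multi-user.target
```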

Thanks!



On Wed, Feb 10, 2016 at 5:48 PM, Michael Biebl <biebl at debian.org> wrote:

> On 10.02.2016 at 18:37, Sandro Tosi wrote:
> > Disabling ifplugd didn't change the situation, and there are still missing mount points
> >
> > On Tue, Feb 9, 2016 at 9:21 PM, Michael Biebl <biebl at debian.org> wrote:
> >> On 09.02.2016 at 22:11, Sandro Tosi wrote:
> >>>> Another idea: maybe it's related to name resolution. How is that configured? Does it help if you use IP addresses in /etc/fstab?
> >>>
> >>> # cat /etc/resolv.conf
> >>> search OUR-DOMAIN.com
> >>> nameserver 127.0.0.1
> >>> nameserver XXX.YYY.32.33
> >>> nameserver XXX.YYY.32.22
> >>> options no_tld_query
> >>>
> >>> on localhost we have unbound as dns cache with this config
> >>>
> >>> # cat /etc/unbound/unbound.conf
> >>> server:
> >>>        val-permissive-mode: yes
> >>>        local-zone: "10.in-addr.arpa" nodefault
> >>> forward-zone:
> >>>        name: .
> >>>        forward-addr: XXX.YYY.32.33
> >>>        forward-addr: XXX.YYY.32.22
> >>> remote-control:
> >>>        control-enable: yes
> >>>
> >>> the NFS storage appliance we are using is configured so that the same domain name resolves to multiple IP addresses, and it automatically balances connections between clients by handing out different IP addresses, so we cannot change that.
> >>
> >> For testing purposes, it should be possible to configure one client to
> >> use a fixed IP address in /etc/fstab.
> >
> > oh yes, totally. I just tried that (with ifplugd still disabled) and...
> >
> >> If the mount then doesn't fail,
> >> you have narrowed down the problem then at least.
> >
> > ... sadly, now all the NFS shares fail to mount at first:
> >
> > Feb 10 12:08:27 SERVER kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
> > Feb 10 12:08:27 SERVER kernel: FS-Cache: Netfs 'nfs' registered for caching
> > Feb 10 12:08:27 SERVER kernel: NFS: Registering the id_resolver key type
> > Feb 10 12:08:27 SERVER kernel: Installing knfsd (copyright (C) 1996 okir at monad.swb.de).
> > Feb 10 12:08:30 SERVER kernel: igb 0000:01:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> > Feb 10 12:08:30 SERVER mount[576]: mount to NFS server 'XXX.YYY.21.22' failed: No route to host, retrying
> > Feb 10 12:08:30 SERVER mount[567]: mount to NFS server 'XXX.YYY.27.74' failed: No route to host, retrying
> > Feb 10 12:08:30 SERVER mount[578]: mount to NFS server 'XXX.YYY.16.226' failed: No route to host, retrying
> > Feb 10 12:08:30 SERVER mount[582]: mount to NFS server 'XXX.YYY.26.132' failed: No route to host, retrying
> > Feb 10 12:08:30 SERVER mount[574]: mount to NFS server 'XXX.YYY.36.210' failed: No route to host, retrying
> > Feb 10 12:08:30 SERVER mount[572]: mount to NFS server 'XXX.YYY.27.74' failed: No route to host, retrying
> > Feb 10 12:08:30 SERVER mount[583]: mount to NFS server 'XXX.YYY.32.75' failed: No route to host, retrying
> > Feb 10 12:08:30 SERVER mount[569]: mount to NFS server 'XXX.YYY.32.111' failed: No route to host, retrying
> > Feb 10 12:08:30 SERVER mount[564]: mount to NFS server 'XXX.YYY.20.176' failed: No route to host, retrying
> > Feb 10 12:08:30 SERVER mount[580]: mount to NFS server 'XXX.YYY.20.176' failed: No route to host, retrying
> > Feb 10 12:08:30 SERVER mount[561]: mount.nfs: backgrounding "XXX.YYY.20.176:/VOL"
> > Feb 10 12:08:30 SERVER mount[562]: mount.nfs: backgrounding "XXX.YYY.27.74:/VOL"
> > Feb 10 12:08:30 SERVER mount[563]: mount.nfs: backgrounding "XXX.YYY.32.111:/VOL"
> > Feb 10 12:08:30 SERVER mount[565]: mount.nfs: backgrounding "XXX.YYY.27.74:/VOL"
> > Feb 10 12:08:30 SERVER mount[568]: mount.nfs: backgrounding "XXX.YYY.36.210:/VOL"
> > Feb 10 12:08:30 SERVER mount[573]: mount.nfs: backgrounding "XXX.YYY.21.22:/VOL"
> > Feb 10 12:08:30 SERVER mount[575]: mount.nfs: backgrounding "XXX.YYY.16.226:/VOL"
> > Feb 10 12:08:30 SERVER mount[579]: mount.nfs: backgrounding "XXX.YYY.26.132:/VOL"
> > Feb 10 12:08:30 SERVER mount[581]: mount.nfs: backgrounding "XXX.YYY.32.75:/VOL"
> > Feb 10 12:08:30 SERVER mount[577]: mount.nfs: backgrounding "XXX.YYY.20.176:/VOL"
> > Feb 10 12:08:30 SERVER nfs-common[612]: Starting NFS common utilities: statd idmapd.
> >
> > but just above all these failures, eth0 is marked as UP.
> >
> > In the critical-chain output I now no longer see the remote-fs target (so I'm not sure when it is started relative to the networking target); is that normal?
>
> Attach the output of systemctl status <failing-mount>.mount, systemd-analyze dump, and journalctl -alb (with debugging enabled).
>
>
>
> --
> Why is it that all of the instruments seeking intelligent life in the
> universe are pointed away from Earth?
>
>


-- 
Sandro "morph" Tosi
My website: http://sandrotosi.me/
Me at Debian: http://wiki.debian.org/SandroTosi
G+: https://plus.google.com/u/0/+SandroTosi


More information about the Pkg-systemd-maintainers mailing list