jessie: help debugging NFS shares not mounted at boot, double mounts with mount -a, and @reboot cronjobs

Sandro Tosi morph at debian.org
Thu Feb 18 17:05:17 GMT 2016


On Thu, Feb 18, 2016 at 4:49 PM, Felipe Sateler <fsateler at debian.org> wrote:
> On 18 February 2016 at 13:41, Sandro Tosi <morph at debian.org> wrote:
>> On Thu, Feb 18, 2016 at 4:11 PM, Felipe Sateler <fsateler at debian.org> wrote:
>>> On 10 February 2016 at 14:37, Sandro Tosi <morph at debian.org> wrote:
>>>> Disabling ifplugd didn't change the situation, and there are still missing
>>>> mount points
>>>>
>>>>
>>>> On Tue, Feb 9, 2016 at 9:21 PM, Michael Biebl <biebl at debian.org> wrote:
>>>>> On 09.02.2016 at 22:11, Sandro Tosi wrote:
>>>>>>> Another idea: maybe it's related to name resolution. How is that
>>>>>>> configured? Does it help if you use IP addresses in /etc/fstab?
>>>>>>
>>>>>> # cat /etc/resolv.conf
>>>>>> search OUR-DOMAIN.com
>>>>>> nameserver 127.0.0.1
>>>>>> nameserver XXX.YYY.32.33
>>>>>> nameserver XXX.YYY.32.22
>>>>>> options no_tld_query
>>>>>>
>>>>>> on localhost we have unbound as a DNS cache with this config
>>>>>>
>>>>>> # cat /etc/unbound/unbound.conf
>>>>>> server:
>>>>>>        val-permissive-mode: yes
>>>>>>        local-zone: "10.in-addr.arpa" nodefault
>>>>>> forward-zone:
>>>>>>        name: .
>>>>>>        forward-addr: XXX.YYY.32.33
>>>>>>        forward-addr: XXX.YYY.32.22
>>>>>> remote-control:
>>>>>>        control-enable: yes
>>>>>>
>>>>>> the NFS storage appliance we are using is configured so that
>>>>>> multiple IP addresses resolve to the same domain name, and it
>>>>>> automatically balances connections between clients by handing out
>>>>>> different IP addresses, so we cannot change that.
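>>>>>>
>>>>>> (to illustrate, resolving the appliance name returns several A
>>>>>> records, roughly along these lines -- the hostname here is a
>>>>>> made-up placeholder, the addresses are the same ones that show up
>>>>>> in the mount logs further down:
>>>>>>
>>>>>> $ host STORAGE.OUR-DOMAIN.com
>>>>>> STORAGE.OUR-DOMAIN.com has address XXX.YYY.20.176
>>>>>> STORAGE.OUR-DOMAIN.com has address XXX.YYY.27.74
>>>>>> STORAGE.OUR-DOMAIN.com has address XXX.YYY.32.111
>>>>>>
>>>>>> and which address a given client ends up using varies)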
>>>>>
>>>>> For testing purposes, it should be possible to configure one client to
>>>>> use a fixed IP address in /etc/fstab.
>>>>
>>>> oh yes, totally. I just tried that (with ifplugd still disabled) and...
>>>>
>>>>> If the mount then doesn't fail, you have at least narrowed down
>>>>> the problem.
>>>>
>>>> ... sadly now all the nfs shares fail to mount at first:
>>>>
>>>> Feb 10 12:08:27 SERVER kernel: RPC: Registered tcp NFSv4.1 backchannel
>>>> transport module.
>>>> Feb 10 12:08:27 SERVER kernel: FS-Cache: Netfs 'nfs' registered for caching
>>>> Feb 10 12:08:27 SERVER kernel: NFS: Registering the id_resolver key type
>>>> Feb 10 12:08:27 SERVER kernel: Installing knfsd (copyright (C) 1996
>>>> okir at monad.swb.de).
>>>> Feb 10 12:08:30 SERVER kernel: igb 0000:01:00.0 eth0: igb: eth0 NIC Link is
>>>> Up 1000 Mbps Full Duplex, Flow Control: RX
>>>> Feb 10 12:08:30 SERVER mount[576]: mount to NFS server 'XXX.YYY.21.22'
>>>> failed: No route to host, retrying
>>>> Feb 10 12:08:30 SERVER mount[567]: mount to NFS server 'XXX.YYY.27.74'
>>>> failed: No route to host, retrying
>>>> Feb 10 12:08:30 SERVER mount[578]: mount to NFS server 'XXX.YYY.16.226'
>>>> failed: No route to host, retrying
>>>> Feb 10 12:08:30 SERVER mount[582]: mount to NFS server 'XXX.YYY.26.132'
>>>> failed: No route to host, retrying
>>>> Feb 10 12:08:30 SERVER mount[574]: mount to NFS server 'XXX.YYY.36.210'
>>>> failed: No route to host, retrying
>>>> Feb 10 12:08:30 SERVER mount[572]: mount to NFS server 'XXX.YYY.27.74'
>>>> failed: No route to host, retrying
>>>> Feb 10 12:08:30 SERVER mount[583]: mount to NFS server 'XXX.YYY.32.75'
>>>> failed: No route to host, retrying
>>>> Feb 10 12:08:30 SERVER mount[569]: mount to NFS server 'XXX.YYY.32.111'
>>>> failed: No route to host, retrying
>>>> Feb 10 12:08:30 SERVER mount[564]: mount to NFS server 'XXX.YYY.20.176'
>>>> failed: No route to host, retrying
>>>> Feb 10 12:08:30 SERVER mount[580]: mount to NFS server 'XXX.YYY.20.176'
>>>> failed: No route to host, retrying
>>>> Feb 10 12:08:30 SERVER mount[561]: mount.nfs: backgrounding
>>>> "XXX.YYY.20.176:/VOL"
>>>> Feb 10 12:08:30 SERVER mount[562]: mount.nfs: backgrounding
>>>> "XXX.YYY.27.74:/VOL"
>>>> Feb 10 12:08:30 SERVER mount[563]: mount.nfs: backgrounding
>>>> "XXX.YYY.32.111:/VOL"
>>>> Feb 10 12:08:30 SERVER mount[565]: mount.nfs: backgrounding
>>>> "XXX.YYY.27.74:/VOL"
>>>> Feb 10 12:08:30 SERVER mount[568]: mount.nfs: backgrounding
>>>> "XXX.YYY.36.210:/VOL"
>>>> Feb 10 12:08:30 SERVER mount[573]: mount.nfs: backgrounding
>>>> "XXX.YYY.21.22:/VOL"
>>>> Feb 10 12:08:30 SERVER mount[575]: mount.nfs: backgrounding
>>>> "XXX.YYY.16.226:/VOL"
>>>> Feb 10 12:08:30 SERVER mount[579]: mount.nfs: backgrounding
>>>> "XXX.YYY.26.132:/VOL"
>>>> Feb 10 12:08:30 SERVER mount[581]: mount.nfs: backgrounding
>>>> "XXX.YYY.32.75:/VOL"
>>>> Feb 10 12:08:30 SERVER mount[577]: mount.nfs: backgrounding
>>>> "XXX.YYY.20.176:/VOL"
>>>> Feb 10 12:08:30 SERVER nfs-common[612]: Starting NFS common utilities: statd
>>>> idmapd.
>>>>
>>>> but just above all these failures, the eth0 is marked as UP.
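>>>>
>>>> (for reference, the fstab entries under test now point straight at
>>>> the IPs; roughly, with the mount point and the remaining options
>>>> elided:
>>>>
>>>> XXX.YYY.20.176:/VOL  /MOUNTPOINT  nfs  bg,...  0  0
>>>>
>>>> the "bg" option, which the "backgrounding" lines above suggest is
>>>> in use, is what turns the initial failures into background retries)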
>>>
>>> Could the networking script be exiting too early?
>>
>> which networking script in particular are you referring to? we are
>> configuring our network in /etc/network/interfaces
>
> That would be networking.service (i.e., /etc/init.d/networking).
>
> Are there more lines corresponding to the pids of the failed mounts
> (the numbers between [])?

I'm afraid I stupidly didn't save the logs for that machine state, and
in the attempt to replicate it, I ended up in the situation described
in the email from Feb 12th (with one mount coming up later than the
others but cron being started anyway).

Let me know if you prefer to investigate this latest state (the
machine is still in that state and has not been touched since, and it
seems at least somewhat relevant to the situation at hand), or whether
you want me to keep rebooting the node until we manage to reproduce
the same situation as above.
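
(if useful, once the machine is back in the failing state I can pull
the relevant bits out with something along these lines -- the pids are
simply the ones from the log above, adjust as needed:

# journalctl -b -u networking.service
# grep -E 'mount\[(561|562|563|564|565|567|568|569|572|573|574|575|576|577|578|579|580|581|582|583)\]' /var/log/syslog

the first to see when networking.service actually finished, the second
to catch any later messages from the backgrounded mounts)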

>>
>>> Do you have more
>>> interfaces in these machines? Are all of them configured as auto or
>>> static?
>>
>> on this particular machine there is a single eth0 interface configured as auto
>
> So this is not the same setup as the previous one you posted? I'm
> getting a bit confused...

yes, this has always been the same setup; my question about multiple
NICs came up because we have indeed seen this behavior on machines
with multiple interfaces, so we were wondering whether that could make
the issue more likely to happen, but it was more of a curiosity.

the machine I am providing logs from is always the same one, with the
exact same configuration unless otherwise specified (e.g. disabling
services as requested).
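
(for completeness, the eth0 stanza in /etc/network/interfaces is along
these lines -- the addressing method and values here are just
placeholders, the relevant part being the "auto" line:

auto eth0
iface eth0 inet static
        address XXX.YYY.20.50
        netmask 255.255.255.0
        gateway XXX.YYY.20.1

happy to paste the real one, suitably redacted, if that helps)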

Thanks for the help!

-- 
Sandro "morph" Tosi
My website: http://sandrotosi.me/
Me at Debian: http://wiki.debian.org/SandroTosi
G+: https://plus.google.com/u/0/+SandroTosi



