jessie: help debugging NFS shares not mounted at boot, double mounts with mount -a, and @reboot cronjobs

Sandro Tosi morph at debian.org
Fri Mar 4 21:13:05 GMT 2016


On Wed, Mar 2, 2016 at 3:38 PM, Felipe Sateler <fsateler at debian.org> wrote:
> On 2 March 2016 at 11:46, Sandro Tosi <morph at debian.org> wrote:
>> Thanks Felipe for assisting me with this, and sorry for the late reply!
>>
>> On Thu, Feb 25, 2016 at 7:19 PM, Felipe Sateler <fsateler at debian.org> wrote:
>
>>> Maybe the timeout is just too short. Maybe adding
>>> x-systemd.device-timeout=90s helps?
>>
>>
>> i think that's already 90s by default? At least we see the mount fail after
>> 90s, so maybe we should set it to a lower or higher value? I'm unsure whether
>> that would trigger a "retry" of the mount, because as soon as I see the
>> machine online, I can log in and issue a mount -t nfs -a, and all the missing
>> mountpoints (still pending in systemd) are promptly mounted. So it's as if
>> they were "frozen" and a retry would just make them succeed.
>
> I think that mount itself will not retry without being given the bg
> option, and that option causes systemd to think the mounts are ready
> too early.

yeah indeed
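
For concreteness, a retrying NFS entry with an explicit timeout might look like the following in /etc/fstab (server name, export path, and mountpoint are placeholders, not taken from the actual setup); note that bg makes mount return early and retry in the background, which is exactly the interaction with systemd's readiness notion discussed above:

```
# hypothetical /etc/fstab entry; names are placeholders
# bg: retry in background; _netdev: wait for network
nfsserver:/vol  /mnt/NFSSERVER_VOL  nfs  bg,_netdev,x-systemd.device-timeout=90s  0  0
```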

> So yeah, I think trying with a larger timeout might be useful (if the
> server/network is overloaded). Another option would be to specify
> ordering relations so that they do not happen simultaneously:
>
> edit /etc/systemd/system/mnt-NFSSERVER_VOL.mount.d/local.conf and write:
>
> [Unit]
> After=mnt-NFSSERVER_VOL2.mount
>
> It may be worthwhile to try adding a bunch of these (one for each
> mount), so that they are mounted in order, to see if that changes
> anything.

mh, yeah, I can probably try that, even if I am not sure we want to
actually do it at scale, as it would be a nightmare to automate and
keep updated as mountpoints get added and removed; but for testing
it's fine
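
The at-scale concern could in principle be addressed by generating the drop-ins from fstab itself. A hedged sketch (the helper name and file layout are mine, not from the thread; real deployments should use systemd-escape rather than the crude path mangling below):

```shell
# gen_nfs_ordering FSTAB DROPIN_ROOT
# Hypothetical sketch: read NFS entries from an fstab-style file and emit
# After= drop-ins so the mounts start one after another instead of all at once.
gen_nfs_ordering() {
    fstab="$1"; root="$2"; prev=""
    awk '$1 !~ /^#/ && $3 == "nfs" { print $2 }' "$fstab" | while read -r mp; do
        # crude stand-in for: systemd-escape --path "$mp"
        # e.g. /mnt/vol1 -> mnt-vol1.mount
        unit="$(printf '%s' "$mp" | sed 's|^/||; s|/|-|g').mount"
        if [ -n "$prev" ]; then
            mkdir -p "$root/$unit.d"
            printf '[Unit]\nAfter=%s\n' "$prev" > "$root/$unit.d/ordering.conf"
        fi
        prev="$unit"
    done
}
```

Running it against /etc/fstab with DROPIN_ROOT set to /etc/systemd/system (followed by systemctl daemon-reload) would keep the ordering drop-ins in sync as mountpoints come and go.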

>>> A completely different alternative is to set up the NFS mounts as
>>> automounts instead of real mounts (i.e., set the x-systemd.automount
>>> option). This would have the additional benefit of removing the need
>>> to specify a dependency on remote-fs.target.
>>
>>
>> i don't think we want that: we prefer (for various reasons) to have all the
>> mountpoints, services and processes running at machine boot, and already
>> there when we want to use them
>
> OK.
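
(For reference, the automount variant would look something like the following hypothetical fstab entry, with all names placeholders; the filesystem is then mounted lazily on first access rather than at boot, which is precisely the behavior declined above:)

```
# hypothetical /etc/fstab entry; names are placeholders
nfsserver:/vol  /mnt/NFSSERVER_VOL  nfs  noauto,x-systemd.automount,_netdev  0  0
```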
>
>>> I have to confess I don't have much more ideas on where to look...
>>
>>
>> OK, at least we are not alone in not understanding what's going on :)
>>
>> but as you can imagine, this is a rather nasty issue and, while we are
>> moving forward with the adoption of jessie, it is making a lot of people
>> uncomfortable and skeptical (I just want to express the feelings we have,
>> not any complaint about the quality of systemd or Debian :) )
>>
>> would you think it might be viable to contact systemd upstream about this?
>> Jessie runs 215 while upstream has released 229, so the risk of a "get the
>> latest version and report back" reply is high, and that's not something
>> easily doable, I guess (?).
>
> Well, if you have a test machine showing the problem that can be
> upgraded to stretch, it would be great to test it.

even if I could upgrade to stretch, it won't solve our problem, as we
cannot run stretch in production, and too many other components are
different, so it would be a somewhat pointless test. I was thinking more
of backporting systemd 229-2 (which is about to reach testing): how hard
would that be? Would you prefer to maintain a proper backport or just a
one-off (even done by me, even just for testing purposes, so not
uploaded to bpo or similar)?

> Indeed, bug reports are only accepted for the last couple of versions.
> On the mailing list you may get better responses.

I will try there and cc this list, thanks!

>> would you prefer to start this discussion yourself, as you might
>> have a well established relationship with systemd upstream? i can trigger
>> the discussion myself as well, and copying the systemd debiam maint ml, as
>> you prefer
>
> I think it's best if you ask there, as there will likely be questions
> asked. It would be great also to have an as-full-as-possible log of a
> problematic boot. It is hard to follow "anonymized" and filtered logs,
> especially when the anonymization rules change on different mails ;)

that was indeed poor on my side (even though the changes were kinda
minor); apologies for that.
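
One way to keep the anonymization consistent across mails is to pipe every log through a single fixed mapping. A minimal sketch, with made-up hostnames and addresses standing in for the real ones:

```shell
# anonymize_log: hypothetical sketch applying one fixed hostname/IP mapping,
# so the same real name always becomes the same placeholder in every log.
anonymize_log() {
    sed -e 's/nfsserver01\.example\.com/NFSSERVER/g' \
        -e 's/10\.0\.0\.5/IP_NFSSERVER/g'
}
```

Keeping the mapping in one shared file avoids the rules drifting between mails.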

> Could you attach one? Or mail me privately if you prefer, and I can
> take a look to see if there is anything else that may be fishy.

we are extremely protective of what we do on our systems, so I am not
able to provide a pristine boot log. I am attaching an anonymized
version of the first minutes of boot; hope that's good enough.

thanks a ton for your help!

-- 
Sandro "morph" Tosi
My website: http://sandrotosi.me/
Me at Debian: http://wiki.debian.org/SandroTosi
G+: https://plus.google.com/u/0/+SandroTosi
-------------- next part --------------
A non-text attachment was scrubbed...
Name: boot.log.gz
Type: application/x-gzip
Size: 32205 bytes
Desc: not available
URL: <http://alioth-lists.debian.net/pipermail/pkg-systemd-maintainers/attachments/20160304/d093c229/attachment-0002.bin>
