Bug#931267: times out and drops into useless emergency shell with fsck still ongoing

Steve McIntyre steve at einval.com
Sun Jun 30 23:29:10 BST 2019


Control: retitle -1 starts up emergency shell with fsck running on the same console

Hi Michael, and thanks for the very quick responses!

I've analyzed my own logs some more and realised I've made a mistake
here when trying to work out the problem.

There's a local filesystem service (a FUSE-based transcoding
filesystem) which looks like it failed and that caused the problem,
not a fsck timeout. I need to see what happened there, and this is
nothing to do with you.

*However*, I still have the problem of fsck and the emergency shell
fighting over the console. :-/ That makes the problem(s) difficult to
debug.

On Sun, Jun 30, 2019 at 07:12:33AM +0200, Michael Biebl wrote:
>Control: tags -1 + moreinfo
>
>Hi Steve!
>
>Am 30.06.19 um 01:46 schrieb Steve McIntyre:
>
>> [ Ignore the system info etc. below - I'm running reportbug on a
>>   different system. ]
>> 
>> I've just dist-upgraded my headless home firewall/server from Stretch
>> to Buster. I did the usual task of config file merging, and then
>> rebooted the machine. It didn't come up on the network again
>> afterwards.
>> 
>> After rummaging around to connect a serial cable, there was no
>> interaction on the console so I rebooted again. Now I see that the
>> machine is running a fsck on the multiple large filesystems (Debian
>> mirror, video/audio data etc.) Fine - the machine had not been
>> rebooted in a long time, so the fsck was overdue. Then I see that
>> systemd has decided to time out startup of services and drop me into
>> an Emergency shell. With fsck going on and writing to the console, I
>> cannot usefully interact with the shell. The fsck completed
>> successfully, but I had a headless machine that still needed
>> interaction to make it work again.
>> 
>> Several points/questions:
>> 
>>  * Why on earth do things have a short timeout when fsck is still
>>    running? It's normal for fsck on a large fs to take a long time,
>>    and this should not break bootup. Especially not on a headless box.
>
>This is odd. /lib/systemd/system/systemd-fsckd@.service uses
>"TimeoutSec=0", so it should not timeout.
>
>I quickly gave this a test. See
>https://people.debian.org/~biebl/boot.mp4

ACK!

>I artificially modified systemd-fsckd@.service to sleep for 200s to
>simulate a long fsck. I chose this deliberately to be > 180s, which is the
>default systemd timeout for services.
>As you can see, after 30s the eye-of-cylon animation kicks in and after
>300s the boot proceeds and successfully completes.
>This is even a more elaborate setup with LVM and stuff.

Nod.

>So I guess we need more information to debug this. Do you remember which
>job timed out? Do you have any logs from this failed boot which could
>give a clue which service triggered the start of the emergency shell?

See attached emergency-boot.log.gz - that's journalctl output from the
system from boot up until the emergency shell finished. "Jun 29
16:21:25" is the timestamp to look for, in particular:

Jun 29 16:21:25 jack systemd[1]: share-music-ogg.mount: Mount process finished, but there is no mount.
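
For anyone wanting to pick lines like that out of the attached log, here is a rough Python sketch. It assumes the log is in journalctl's default "short" output format (timestamp, host, identifier[pid], message); the names LINE_RE, parse_line and systemd_messages_at are made up for this illustration and are not part of any systemd tooling.

```python
import re

# Assumed layout of a journalctl "short" format line, e.g.
# "Jun 29 16:21:25 jack systemd[1]: share-music-ogg.mount: ..."
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) "   # month, day, time
    r"(?P<host>\S+) "                               # hostname
    r"(?P<ident>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: "  # identifier, optional [pid]
    r"(?P<message>.*)$"                             # rest of the line
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line.rstrip("\n"))
    return m.groupdict() if m else None


def systemd_messages_at(lines, stamp):
    """Yield messages logged by systemd (PID 1) at exactly the given timestamp."""
    for line in lines:
        rec = parse_line(line)
        if (rec and rec["ident"] == "systemd"
                and rec["pid"] == "1" and rec["stamp"] == stamp):
            yield rec["message"]


# Possible use against the attachment (the file name is taken from this mail):
#   import gzip
#   with gzip.open("emergency-boot.log.gz", "rt", errors="replace") as log:
#       for msg in systemd_messages_at(log, "Jun 29 16:21:25"):
#           print(msg)
```

Matching on the exact timestamp string is deliberately crude; it is only meant to narrow the log down to what PID 1 reported at the moment things went wrong.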

>The default output goes to /dev/console, i.e. the currently active one. I
>think what you are seeing here is another manifestation of
>https://github.com/systemd/systemd/issues/1840

Ish, I guess.

>>  * I haven't tried to reproduce this, but the initial interaction on
>>    the console seemed to show a hung machine. I had no useful
>>    interaction there. Is the Emergency shell setup meant to prompt
>>    again on password failure if I just hit <enter> several times?
>
>It should, yes.

OK. That didn't seem to be happening, but as I didn't have the console
cable attached from the start I can't be sure.

-- 
Steve McIntyre, Cambridge, UK.                                steve at einval.com
"Every time you use Tcl, God kills a kitten." -- Malcolm Ray
-------------- next part --------------
A non-text attachment was scrubbed...
Name: emergency-boot.log.gz
Type: application/gzip
Size: 15876 bytes
Desc: not available
URL: <http://alioth-lists.debian.net/pipermail/pkg-systemd-maintainers/attachments/20190630/415b985e/attachment-0001.gz>

