Bug#793116: systemd-journald exiting with SIGUSR1

Daniel Povey dpovey at gmail.com
Fri Oct 21 19:45:46 BST 2016


I just want to follow up on this that I believe I have found the reason for
this bug and I have a solution which I am testing out.

In the output of `systemctl status systemd-journald` it says:
   Active: failed (Result: start-limit)
and this is related to the systemd mechanism to stop cycles where services
keep restarting.  It's related to the configs
StartLimitIntervalSec=   [defaults to DefaultStartLimitIntervalSec=10 sec
by default]
StartLimitBurst=  [defaults to DefaultStartLimitBurst=5 by default]

so that if a service is restarted 5 times within 10 seconds, it's not
restarted any more to avoid wasting resources.
What seems to happen is that if systemd-journal-flush.service sends the
signal SIGUSR1 to systemd-journald and systemd-journald wasn't ready to
handle it for some reason (e.g. had just been started, or there was some
hangup), then systemd-journald is restarted, and
systemd-journal-flush.service immediately restarts also.  Again
systemd-journald won't be ready to handle the signal as it hasn't had time
to run the signal handler, and the cycle repeats 5 times until the
systemd-journald service hits the StartLimitBurst and dies.  I verified
using `lastcomm` (we have process accounting enabled) that this did in fact
happen.

The way I am trying to fix it is to introduce a restart delay of 10 seconds
into the systemd-journal-flush service.  See the config file below; the
only new line that differs from the defaults in
 /lib/systemd/system/systemd-journal-flush.service is the line
RestartSec=10.

Dan



======
cat /etc/systemd/system/systemd-journal-flush.service
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Trigger Flushing of Journal to Persistent Storage
Documentation=man:systemd-journald.service(8) man:journald.conf(5)
DefaultDependencies=no
Requires=systemd-journald.service
After=systemd-journald.service local-fs.target remote-fs.target
Before=systemd-user-sessions.service
# Don't re-start this service too often, or we can get into a cycle where
# if this service was run just after systemd-journald was started, it
# fails to handle the signal, causing that service to be restarted
# and this servie to be immediately restarted also, until we hit
# the limit of StartLimitBurst=5 restarts and systemd-journald
# dies.
RestartSec=10

[Service]
ExecStart=/bin/systemctl kill --kill-who=main --signal=SIGUSR1
systemd-journald.service
Type=oneshot


On Sat, Oct 15, 2016 at 9:48 PM, Daniel Povey <dpovey at gmail.com> wrote:

> Another observation about this bug, which might be helpful.
>
> If the signal is sent to systemd-journald via
> /bin/systemctl kill --kill-who=main --signal=SIGUSR1
> systemd-journald.service
> then messages like the following show up in the kernel messages from
> `dmesg -T`, like:
>
> [Sat Oct 15 21:02:35 2016] systemd-journald[26517]: Received request to
> flush runtime journal from PID 1
>
> but they don't show up in the output of 'journalctl -r'.
> In  /etc/systemd/journald.conf, it says:
>
> #MaxLevelStore=debug
> #MaxLevelSyslog=debug
>
> so I would have thought the same messages would go to both places.
> I don't know if I'm misunderstanding something here..
>
> Dan
>
>
>
>
>
> On Sat, Oct 15, 2016 at 9:07 PM, Daniel Povey <dpovey at gmail.com> wrote:
>
>> BTW, I attach the output from `systemd-analyze dump`, as dump.txt.
>> It would be great if the debian people could help us look into this.
>> Lennart has a policy that his team will only look into bug reports in the
>> latest two versions of systemd, and obviously we are well behind that.
>>
>> Dan
>>
>>
>> On Sat, Oct 15, 2016 at 8:53 PM, Daniel Povey <dpovey at gmail.com> wrote:
>>
>>>
>>> I just want to report that we are suffering from this bug, and it is
>>> quite frequent.
>>> This is with version  215-17+deb8u5 .
>>>
>>> root at a12:~# systemctl status systemd-journald
>>>
>>> *** systemd-journald.service - Journal Service
>>>
>>>    Loaded: loaded (/lib/systemd/system/systemd-journald.service; static)
>>>
>>>    Active: *failed* (Result: start-limit) since Sat 2016-10-15 12:01:57
>>> EDT; 8h ago
>>>
>>>      Docs: man:systemd-journald.service(8)
>>>
>>>            man:journald.conf(5)
>>>
>>>   Process: 51561 ExecStart=/lib/systemd/systemd-journald *(code=killed,
>>> signal=USR1)*
>>>
>>>  Main PID: 51561 (code=killed, signal=USR1)
>>>
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://alioth-lists.debian.net/pipermail/pkg-systemd-maintainers/attachments/20161021/f457896f/attachment-0002.html>


More information about the Pkg-systemd-maintainers mailing list