Bug#793116: systemd-journald exiting with SIGUSR1
Daniel Povey
dpovey at gmail.com
Fri Oct 28 17:52:27 BST 2016
To follow up: this did not fix the problem.
On Fri, Oct 21, 2016 at 2:45 PM, Daniel Povey <dpovey at gmail.com> wrote:
> I just want to follow up on this that I believe I have found the reason
> for this bug and I have a solution which I am testing out.
>
> In the output of `systemctl status systemd-journald` it says:
> Active: failed (Result: start-limit)
> and this is related to the systemd mechanism to stop cycles where services
> keep restarting. It's related to the configs
> StartLimitIntervalSec= [defaults to DefaultStartLimitIntervalSec=10 sec
> by default]
> StartLimitBurst= [defaults to DefaultStartLimitBurst=5 by default]
>
> so that if a service is restarted 5 times within 10 seconds, it's not
> restarted any more to avoid wasting resources.
> What seems to happen is that if systemd-journal-flush.service sends the
> signal SIGUSR1 to systemd-journald and systemd-journald wasn't ready to
> handle it for some reason (e.g. had just been started, or there was some
> hangup), then systemd-journald is restarted, and
> systemd-journal-flush.service immediately restarts also. Again
> systemd-journald won't be ready to handle the signal as it hasn't had time
> to run the signal handler, and the cycle repeats 5 times until the
> systemd-journald service hits the StartLimitBurst and dies. I verified
> using `lastcomm` (we have process accounting enabled) that this did in fact
> happen.
>
> The way I am trying to fix it is to introduce a restart delay of 10
> seconds into the systemd-journal-flush service. See the config file below;
> the only new line that differs from the defaults in
> /lib/systemd/system/systemd-journal-flush.service is the line
> RestartSec=10.
>
> Dan
>
>
>
> ======
> cat /etc/systemd/system/systemd-journal-flush.service
> # This file is part of systemd.
> #
> # systemd is free software; you can redistribute it and/or modify it
> # under the terms of the GNU Lesser General Public License as published by
> # the Free Software Foundation; either version 2.1 of the License, or
> # (at your option) any later version.
>
> [Unit]
> Description=Trigger Flushing of Journal to Persistent Storage
> Documentation=man:systemd-journald.service(8) man:journald.conf(5)
> DefaultDependencies=no
> Requires=systemd-journald.service
> After=systemd-journald.service local-fs.target remote-fs.target
> Before=systemd-user-sessions.service
> # Don't re-start this service too often, or we can get into a cycle where
> # if this service was run just after systemd-journald was started, it
> # fails to handle the signal, causing that service to be restarted
> # and this servie to be immediately restarted also, until we hit
> # the limit of StartLimitBurst=5 restarts and systemd-journald
> # dies.
> RestartSec=10
>
> [Service]
> ExecStart=/bin/systemctl kill --kill-who=main --signal=SIGUSR1
> systemd-journald.service
> Type=oneshot
>
>
> On Sat, Oct 15, 2016 at 9:48 PM, Daniel Povey <dpovey at gmail.com> wrote:
>
>> Another observation about this bug, which might be helpful.
>>
>> If the signal is sent to systemd-journald via
>> /bin/systemctl kill --kill-who=main --signal=SIGUSR1
>> systemd-journald.service
>> then messages like the following show up in the kernel messages from
>> `dmesg -T`, like:
>>
>> [Sat Oct 15 21:02:35 2016] systemd-journald[26517]: Received request to
>> flush runtime journal from PID 1
>>
>> but they don't show up in the output of 'journalctl -r'.
>> In /etc/systemd/journald.conf, it says:
>>
>> #MaxLevelStore=debug
>> #MaxLevelSyslog=debug
>>
>> so I would have thought the same messages would go to both places.
>> I don't know if I'm misunderstanding something here..
>>
>> Dan
>>
>>
>>
>>
>>
>> On Sat, Oct 15, 2016 at 9:07 PM, Daniel Povey <dpovey at gmail.com> wrote:
>>
>>> BTW, I attach the output from `systemd-analyze dump`, as dump.txt.
>>> It would be great if the debian people could help us look into this.
>>> Lennart has a policy that his team will only look into bug reports in
>>> the latest two versions of systemd, and obviously we are well behind that.
>>>
>>> Dan
>>>
>>>
>>> On Sat, Oct 15, 2016 at 8:53 PM, Daniel Povey <dpovey at gmail.com> wrote:
>>>
>>>>
>>>> I just want to report that we are suffering from this bug, and it is
>>>> quite frequent.
>>>> This is with version 215-17+deb8u5 .
>>>>
>>>> root at a12:~# systemctl status systemd-journald
>>>>
>>>> *** systemd-journald.service - Journal Service
>>>>
>>>> Loaded: loaded (/lib/systemd/system/systemd-journald.service;
>>>> static)
>>>>
>>>> Active: *failed* (Result: start-limit) since Sat 2016-10-15
>>>> 12:01:57 EDT; 8h ago
>>>>
>>>> Docs: man:systemd-journald.service(8)
>>>>
>>>> man:journald.conf(5)
>>>>
>>>> Process: 51561 ExecStart=/lib/systemd/systemd-journald *(code=killed,
>>>> signal=USR1)*
>>>>
>>>> Main PID: 51561 (code=killed, signal=USR1)
>>>>
>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://alioth-lists.debian.net/pipermail/pkg-systemd-maintainers/attachments/20161028/edf5a73e/attachment-0002.html>
More information about the Pkg-systemd-maintainers
mailing list