Bug#762700: systemd: journald fails to forward most boot messages to syslog

Wed Feb 25 21:18:30 GMT 2015

Control: tags -1 - moreinfo

Am 2015-02-25 21:27, schrieb Michael Biebl:
> Am 25.02.2015 um 18:59 schrieb Christian Seiler:
>> Control: severity -1 serious
>>
>> Thoughts?
>
> Those missing syslog messages (only) happen when systemd flushes the
> initial boot kernel/messages to the syslog socket, right?

Not exactly, journald flushing is only for journald's own storage.
It always writes to the /run/systemd/journal/syslog socket and lets
the kernel buffer everything. There's never an explicit flush step.
At some point once syslog has started, it just starts consuming the
buffer. (Consisting of only 11 messages, as discussed.)

> Once the syslog daemon is running, do you still get messages about
> syslog messages being dropped?

The original reporter of the bug noticed this during normal
operations. Also, just trying some trivial things, I could
reproduce it with for example:

for i in $(seq 1 1000); do systemd-cat ls -l /usr/bin & done

Warning: this will produce millions of log messages and take a while.
(I overdid that, you can probably just do seq 1 10 or something.
journald will rotate this quite a few times during operation...)

Now this was really extreme, but there is software out there that
does log heavily (also depending on the log level set by the admin),
and just having a queue of 11 is really, really short in such
situations. Yes, at some point there has to be a cutoff, but I
don't think it should be 11.

> If the former, I don't think it's a serious bug. After all, with 
> classic
> syslog, you didn't even capture that data from early boot in the 
> first
> place.

The problem is that now syslog is started in parallel with all
other services, and at least here (both inside a VM and also on a
bare-metal machine) it consistently takes longer than most other
services to actually start. On a typical Wheezy system, syslog
is ordered before most services.

So for example, take the following log messages:

cron[679]: (CRON) INFO (pidfile fd = 3)
mcelog[685]: Starting Machine Check Exceptions decoder: mcelog.
cron[679]: (CRON) INFO (Running @reboot jobs)
dbus[694]: [system] Successfully activated service 
'org.freedesktop.systemd1'
smartd[681]: smartd 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] 
(local build)
smartd[681]: Copyright (C) 2002-14, Bruce Allen, Christian Franke, 
www.smartmontools.org
smartd[681]: Opened configuration file /etc/smartd.conf
smartd[681]: Drive: DEVICESCAN, implied '-a' Directive on line 21 of 
file /etc/smartd.conf
smartd[681]: Configuration file /etc/smartd.conf was parsed, found 
DEVICESCAN, scanning devices
smartd[681]: .... more messages about hard drives used ...
acpid: starting up with netlink and the input layer
acpid: 1 rule loaded
acpid: waiting for events: event logging is off
CRON[1027]: (logcheck) CMD (   if [ -x /usr/sbin/logcheck ]; then nice 
-n10 /usr/sbin/logcheck -R; fi)
exim4[686]: Starting MTA: exim4.

These services were all ordered after $syslog on Wheezy, and thus
their startup is logged. But syslog takes longer to start than
those services (at least on the machine I looked at, since it's a
race, it probably depends on a lot of factors), and thus these
messages are only present in /var/log/syslog on Jessie w/ systemd if
I apply the fix discussed here.

($syslog translates to the socket on systemd systems, not to the
syslog service itself.)

I did get ntpd's messages, probably because ntpdate is run before
ntpd is started, and ntpdate takes a while to sync the current time,
so that's the first message I typically see in syslog on a Jessie
system.

The paradoxical thing is that if you just increase the max qlen a
small bit, those messages will be the last ones in, because it's
FIFO. So in order to get all the messages that were previously
available at boot, the max number of datagrams queued has to be
relatively high... (# of messages at boot before syslog can
reasonably be expected to have started, which is probably at the
very least 100, if not more)

> It would certainly be good to fix this issue, but I don't think it
> qualifies as RC.

Even if you think my mistreatment of systemd-cat is just too
extreme, maybe, but other stuff that worked previously at boot is
in fact broken now.

Christian