Bug#762700: systemd: journald fails to forward most boot messages to syslog

Wed Feb 25 17:59:59 GMT 2015

Control: severity -1 serious

(justification for severity: breaks unrelated software, in this case
all syslog daemons on a default installation; could possibly be
considered grave or even criticial - data loss affecting nearly all
users)

This bug is actually much worse than it seems. In fact, on a default
installation, most boot messages won't get logged via syslog. On a
relatively minimal VM, I consistently get the journal message:
    Forwarding to syslog missed 15 messages
(The exact number varies a bit.) On a bare metal machine with a bit
more installed (but still no GUI, for example), I get ~60 messages
missed.

More messages might get missed during the time the system is running,
but that depends on how many messages are generated, so if it's only a
few at a time, this won't be a problem.

The reason is the following: for datagram UNIX sockets, there are two
limits that actually make a difference: SO_SNDBUF and the system-wide
sysctl net.unix.max_dgram_qlen. For SOCK_DGRAM sockets, SO_SNDBUF
relates to the max size of a single dgram, whereas the sysctl relates
to how many datagrams may be queued at all. SO_RCVBUF doesn't have any
effect on UNIX sockets.

Some observations:

  - setting ReceiveBuffer=8M is a no-op here, could easily be dropped
  - SendBuffer=8M will increase the max size of a single log message
    that may be sent via this socket (8M is probably at bit much)
  - net.unix.max_dgram_qlen defaults to 10 (!)
  - any time a socket is created within the kernel, the current value
    of net.unix.max_dgram_qlen is copied into the socket's data
    structure within the kernel, so net.unix.max_dgram_qlen has to be
    set to the chosen value BEFORE the socket is created
  - net.unix.max_dgram_qlen applies to all UNIX dgram sockets in the
    system
  - funnily enough, the kernel can't properly count an in reality 11
    (instead of 10) datagrams can be queued (> instead of >= in the
    code)
  - syslog daemons read kernel messages directly from /proc/kmsg (and
    journal doesn't forward them), so they are not affected by this
    issue

This has the following consequences:

  - at boot, exactly 11 messages will be queued during the time the
    system boots but the syslog daemon is not started yet

  - unless you have a *really* minimal system, the number of messages
    generated while booting up is typically larger than 11, at least
    20-25 on even a relatively minimal VM, on a typical bare-metal
    system with more stuff installed it's even higher

  - all messages after those 11 all go to the journal, but are dropped
    from the syslog forwarding, so syslog won't get any more messages
    until it's finished starting and has consumed the first 11 messages
    in the queue

  - if you have a high message load, since journald might try to
    forward the messages to syslog in a relatively tight loop, the
    limit of 11 is easily surpassed even during runtime

Steps to reproduce:

1. boot a typical Debian Jessie installation (systemd, any syslog
    daemon, journal forwarding to syslog enabled)
2. run the following command:
      journalctl --quiet MESSAGE_ID=0027229ca0644181a76c4e92458afa2e
    This will print a message such as:
      systemd-journal[132]: Forwarding to syslog missed 35 messages.
3. run journalctl | grep -v kernel: | less to see the boot messages
4. check /var/log/syslog for other messages, and quite a few are
    missing (note that by default, some do go to /var/log/auth.log
    etc., but there will be some that don't appear at all in any log
    that syslog writes)

Solution:

  - first of all, increasing SendBuffer in the socket unit won't
    really help, unless you have *reall* long log lines (default max
    line len in the kernel is $((2048-32)), which corresponds to a
    SendBuffer of 1024 (for some reason the value is doubled, see
    man:unix(7)). So perhaps the default SendBuffer is sufficient or
    maybe increase to 16K or so, but 8M is most likely more than
    excessive.

  - increase net.unix.max_dgram_qlen to some higher value.

      This is the tricky part: max_dgram_qlen has to be set before the
      syslog socket is created, so using the sysctl.d/ support is not
      an option. The only easy way I found was to create service for
      this that runs at the very beginning:

      setup-dgram-qlen.service:
      [Unit]
      DefaultDependencies=no
      Before=syslog.socket

      [Service]
      Type=oneshot
      ExecStart=/sbin/sysctl -w net.unix.max_dgram_qlen=512

      (should be activated statically, by a Wants= dependency from
      syslog.socket)
      (possibly add StandardOutput=null)
      (or hack it into systemd bootup sequence after mounting /proc,
      but using this unit was much simpler and can later also be
      customized / masked / ... by the admin)

      Note that this is now set for all sockets, and I don't know the
      precise ramifications for that. (OTOH, if somebody really wanted
      to DOS using this, they could easily just create 1e6 sockets in
      the fs in a couple of processes, listen on them and then queue
      stuff there, so I don't see how setting this limit higher will
      hurt much, but I don't really have an overview here.)

      Also: I don't know what a reasonable number here is. With 512 it
      works, but is that better than 128? Or better than 1024? I don't
      really know...

Thoughts?

Christian