Bug#777735: systemd: Disk corruption on reboot initiated through systemd
Peter Colberg
peter at colberg.org
Thu Feb 12 05:22:02 GMT 2015
On Thu, Feb 12, 2015 at 04:28:38AM +0100, Michael Biebl wrote:
> Just to be clear here: Is this only happening once after the
> dist-upgrade from wheezy (sysvinit) to jessie (systemd-sysv) on reboot
> or everytime you reboot your jessie system?
It happens everytime I reboot from within the VM.
> What makes you think this is a disk corruption? The log message you
> provided shows that the disks are clean?
Yes, this is not disk corruption. The fsck at bootup is clean.
This is more acurately described as data loss. I see it in the form of
log data produced shortly before the reboot (that is after reboot is
issued and before the machine is reset) not being written to disk.
Instead, the log files contain variably-sized holes of nil bytes.
> Which other file? What file system is this?
So far I noticed corruption in /var/log/syslog and /var/log/daemon.log
which are written by rsyslog, and in /var/log/prosody/prosody.log
which is written by prosody.
Each VM has attached a single virtio disk with a GPT partition label
that contains a swap partition, an ext4 filesystem for /, and further
ext4 filesystems for /home and /srv.
> Can you share such a (gzipped) qemu-kvm image or provide instructions,
> how we can reproduce the problem?
Sorry, I cannot share the machine images since they contain private
data. However, I have now disabled any non-essential services on one
machine. Please see the file 'ps_aux' that shows all processes running
when logged in via ssh. I can still reproduce the corruption with this
mimimal configuration.
> Since systemd-logind reacts on external ACPI power button presses with
> the same command (systemctl poweroff), I'm suprised that this is
> supposed to make a difference.
I have narrowed down the issue significantly: corruption only occurs if
qemu-kvm emulates a watchdog device. I am using libvirt, please see
the file 'himeji.xml' for the machine setup.
When I remove the watchdog device from the VM, the corruption
disappears and /var/log/syslog contains the expected last and
first messages before and after the reboot:
Feb 11 23:41:19 himeji rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="532" x-info="http://www.rsyslog.com"] exiting on signal 15.
Feb 11 23:41:57 himeji rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="518" x-info="http://www.rsyslog.com"] start
> Can you follow the steps at [1] and provide a verbose debug log from
> boot and poweroff by logging to the serial console.
>
>
> [1] http://freedesktop.org/wiki/Software/systemd/Debugging/
I captured a reboot of the VM with and without a watchdog device. Please note
that reboots in a VM with watchdog device were working fine with sysvinit.
Does systemd always use /dev/watchdog? /etc/systemd/system.conf was left
unmodified, in particular it does not define RuntimeWatchdogSec.
Regards,
Peter
-------------- next part --------------
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.6 0.1 28460 4548 ? Ss 23:45 0:01 /sbin/init
root 2 0.0 0.0 0 0 ? S 23:45 0:00 [kthreadd]
root 3 0.3 0.0 0 0 ? S 23:45 0:00 [ksoftirqd/0]
root 4 0.0 0.0 0 0 ? S 23:45 0:00 [kworker/0:0]
root 5 0.0 0.0 0 0 ? S< 23:45 0:00 [kworker/0:0H]
root 6 0.0 0.0 0 0 ? S 23:45 0:00 [kworker/u8:0]
root 7 0.3 0.0 0 0 ? S 23:45 0:00 [rcu_sched]
root 8 0.0 0.0 0 0 ? S 23:45 0:00 [rcu_bh]
root 9 0.0 0.0 0 0 ? S 23:45 0:00 [migration/0]
root 10 0.0 0.0 0 0 ? S 23:45 0:00 [watchdog/0]
root 11 0.0 0.0 0 0 ? S 23:45 0:00 [watchdog/1]
root 12 0.0 0.0 0 0 ? S 23:45 0:00 [migration/1]
root 13 0.4 0.0 0 0 ? S 23:45 0:00 [ksoftirqd/1]
root 14 0.0 0.0 0 0 ? S 23:45 0:00 [kworker/1:0]
root 15 0.0 0.0 0 0 ? S< 23:45 0:00 [kworker/1:0H]
root 16 0.0 0.0 0 0 ? S 23:45 0:00 [watchdog/2]
root 17 0.0 0.0 0 0 ? S 23:45 0:00 [migration/2]
root 18 0.3 0.0 0 0 ? S 23:45 0:00 [ksoftirqd/2]
root 19 0.0 0.0 0 0 ? S 23:45 0:00 [kworker/2:0]
root 20 0.0 0.0 0 0 ? S< 23:45 0:00 [kworker/2:0H]
root 21 0.0 0.0 0 0 ? S 23:45 0:00 [watchdog/3]
root 22 0.0 0.0 0 0 ? S 23:45 0:00 [migration/3]
root 23 0.3 0.0 0 0 ? S 23:45 0:00 [ksoftirqd/3]
root 24 0.0 0.0 0 0 ? S 23:45 0:00 [kworker/3:0]
root 25 0.0 0.0 0 0 ? S< 23:45 0:00 [kworker/3:0H]
root 26 0.0 0.0 0 0 ? S< 23:45 0:00 [khelper]
root 27 0.0 0.0 0 0 ? S 23:45 0:00 [kdevtmpfs]
root 28 0.0 0.0 0 0 ? S< 23:45 0:00 [netns]
root 29 0.0 0.0 0 0 ? S 23:45 0:00 [khungtaskd]
root 30 0.0 0.0 0 0 ? S< 23:45 0:00 [writeback]
root 31 0.0 0.0 0 0 ? SN 23:45 0:00 [ksmd]
root 32 0.0 0.0 0 0 ? SN 23:45 0:00 [khugepaged]
root 33 0.0 0.0 0 0 ? S< 23:45 0:00 [crypto]
root 34 0.0 0.0 0 0 ? S< 23:45 0:00 [kintegrityd]
root 35 0.0 0.0 0 0 ? S< 23:45 0:00 [bioset]
root 36 0.0 0.0 0 0 ? S< 23:45 0:00 [kblockd]
root 37 0.0 0.0 0 0 ? S 23:45 0:00 [kworker/0:1]
root 38 0.0 0.0 0 0 ? S 23:45 0:00 [kworker/1:1]
root 39 0.0 0.0 0 0 ? S 23:45 0:00 [kworker/2:1]
root 40 0.0 0.0 0 0 ? S 23:45 0:00 [kworker/3:1]
root 41 0.0 0.0 0 0 ? S 23:45 0:00 [kswapd0]
root 42 0.0 0.0 0 0 ? S 23:45 0:00 [fsnotify_mark]
root 48 0.0 0.0 0 0 ? S< 23:45 0:00 [kthrotld]
root 49 0.0 0.0 0 0 ? S< 23:45 0:00 [ipv6_addrconf]
root 50 0.0 0.0 0 0 ? S< 23:45 0:00 [deferwq]
root 51 0.0 0.0 0 0 ? S 23:45 0:00 [kworker/u8:1]
root 90 0.0 0.0 0 0 ? S< 23:45 0:00 [ata_sff]
root 92 0.0 0.0 0 0 ? S 23:45 0:00 [scsi_eh_0]
root 93 0.0 0.0 0 0 ? S< 23:45 0:00 [scsi_tmf_0]
root 94 0.0 0.0 0 0 ? S 23:45 0:00 [scsi_eh_1]
root 95 0.0 0.0 0 0 ? S< 23:45 0:00 [scsi_tmf_1]
root 96 0.0 0.0 0 0 ? S 23:45 0:00 [kworker/u8:2]
root 97 0.0 0.0 0 0 ? S 23:45 0:00 [kworker/u8:3]
root 109 0.0 0.0 0 0 ? S< 23:45 0:00 [kworker/3:1H]
root 110 0.0 0.0 0 0 ? S 23:45 0:00 [jbd2/vda3-8]
root 111 0.0 0.0 0 0 ? S< 23:45 0:00 [ext4-rsv-conver]
root 139 0.0 0.0 0 0 ? S 23:45 0:00 [kworker/0:2]
root 147 0.0 0.0 32956 2980 ? Ss 23:45 0:00 /lib/systemd/systemd-journald
root 148 0.0 0.0 0 0 ? S 23:45 0:00 [kauditd]
root 160 0.0 0.0 40984 3444 ? Ss 23:45 0:00 /lib/systemd/systemd-udevd
root 196 0.0 0.0 0 0 ? S< 23:45 0:00 [kpsmoused]
root 232 0.0 0.0 0 0 ? S 23:45 0:00 [kworker/2:2]
root 240 0.0 0.0 0 0 ? S< 23:45 0:00 [kworker/0:1H]
root 241 0.0 0.0 0 0 ? S< 23:45 0:00 [kworker/2:1H]
root 247 0.0 0.0 0 0 ? S 23:45 0:00 [jbd2/vda4-8]
root 248 0.0 0.0 0 0 ? S< 23:45 0:00 [ext4-rsv-conver]
root 250 0.0 0.0 0 0 ? S 23:45 0:00 [jbd2/vda5-8]
root 251 0.0 0.0 0 0 ? S< 23:45 0:00 [ext4-rsv-conver]
root 269 0.0 0.0 0 0 ? S 23:45 0:00 [kworker/3:2]
root 530 0.0 0.1 55160 5144 ? Ss 23:45 0:00 /usr/sbin/sshd -D
root 532 0.0 0.0 28260 2920 ? Ss 23:45 0:00 /lib/systemd/systemd-logind
message+ 533 0.0 0.0 42112 3348 ? Ss 23:45 0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root 535 0.0 0.0 262876 3600 ? Ssl 23:45 0:00 /usr/sbin/rsyslogd -n
root 538 0.0 0.0 12656 1824 ttyS0 Ss+ 23:45 0:00 /sbin/agetty --keep-baud 115200 38400 9600 ttyS0 vt102
root 545 0.0 0.1 91156 6156 ? Ss 23:47 0:00 sshd: root at pts/0
root 547 0.0 0.0 27220 3488 ? Ss 23:47 0:00 /lib/systemd/systemd --user
root 548 0.0 0.0 49740 1500 ? S 23:47 0:00 (sd-pam)
root 550 0.0 0.1 23716 5632 pts/0 Ss 23:47 0:00 -bash
root 566 0.0 0.0 17812 2320 pts/0 R+ 23:48 0:00 ps aux
-------------- next part --------------
A non-text attachment was scrubbed...
Name: himeji.xml
Type: application/xml
Size: 1985 bytes
Desc: not available
URL: <http://alioth-lists.debian.net/pipermail/pkg-systemd-maintainers/attachments/20150212/14712ea6/attachment-0002.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: himeji_reboot_with_watchdog.gz
Type: application/gzip
Size: 20959 bytes
Desc: not available
URL: <http://alioth-lists.debian.net/pipermail/pkg-systemd-maintainers/attachments/20150212/14712ea6/attachment.gz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: himeji_reboot_without_watchdog.gz
Type: application/gzip
Size: 20924 bytes
Desc: not available
URL: <http://alioth-lists.debian.net/pipermail/pkg-systemd-maintainers/attachments/20150212/14712ea6/attachment-0001.gz>
More information about the Pkg-systemd-maintainers
mailing list