[debian-mysql] Bug#1031656: MariaDB autopkgtest upstream test suite fails on disk / data corruption on ppc64el

Otto Kekäläinen otto at debian.org
Sun Feb 19 21:59:05 GMT 2023


Package: mariadb-10.6,mariadb
Version: mariadb-10.6/1:10.6.11-2,mariadb/1:10.11.1-1

## Summary

MariaDB 10.6 and 10.11 have had random failures in different
autopkgtest 'upstream' tests starting from Feb 7th onward. The only
commonality is time, so it might be that the Debian autopkgtest runner
hosts have had a hardware issue, kernel issue or simply overload since
Feb 7th.


## Description

The autopkgtest 'upstream' in package src:mariadb-10.6 (and also
src:mariadb) started failing on various tests with the server randomly
shutting down due to on-disk data corruption and server log error
message:

2023-02-08 10:41:15 0 [Note] /usr/sbin/mariadbd: ready for connections.
Version: '10.6.11-MariaDB-2-log'  socket:
'/tmp/tmp.rj1KIL2yar/tmp/1/mysqld.1.sock'  port: 16000  Debian n/a
2023-02-08 10:41:18 4 [ERROR] InnoDB: Database page corruption on disk
or a failed read of file './ibdata1' page [page id: space=0, page
number=334]. You m
2023-02-08 10:41:18 4 [Note] InnoDB: Page dump (16384 bytes):
2023-02-08 10:41:18 4 [Note] InnoDB:
000000000000014effffffffffffffff00000000002ed5f50002000000000000
2023-02-08 10:41:18 4 [Note] InnoDB:
00000000000000000680083affffffff0000ffffffff0000000205c600000000
2023-02-08 10:41:18 4 [Note] InnoDB:
0000014406f2000000010000014e002c0000014e002c00000000000000450000
2023-02-08 10:41:18 4 [Note] InnoDB:
0000000000460001011000000000000000000000030e00000000014e0330ffff
2023-02-08 10:41:18 4 [Note] InnoDB:
ffff000000000000000000000000000000000000000000000000000000000000
2023-02-08 10:41:18 4 [Note] InnoDB:
0000000000000000000000000000000000000000000000000000000000000000

I downloaded manually most of the logs in February from
https://ci.debian.net/packages/m/mariadb-10.6/testing/ppc64el/
https://ci.debian.net/packages/m/mariadb/testing/ppc64el/

By grepping for 'InnoDB: Database page corruption on disk' and the
date, time and host I was able to construct this timeline that shows
that MariaDB 10.6.11-2 (uploaded on Jan 14th) started erroring on this
from Feb 8th onward, proving this is definitely a host hardware or
kernel upgrade issue:

mariadb 1:10.6.11-2 d.gz:305 data inconsistency errors
mariadb 1:10.6.11-2 d.gz:autopkgtest [10:35:18]: starting date and
time: 2023-02-08 10:35:18+0000
mariadb 1:10.6.11-2 d.gz:host ci-worker-ppc64el-02;

mariadb 1:10.6.11-2 c.gz:202 data inconsistency errors
mariadb 1:10.6.11-2 c.gz:autopkgtest [23:30:16]: starting date and
time: 2023-02-15 23:30:16+0000
mariadb 1:10.6.11-2 c.gz:host ci-worker-ppc64el-01;

mariadb 1:10.6.11-2 b.gz:203 data inconsistency errors
mariadb 1:10.6.11-2 b.gz:autopkgtest [15:55:32]: starting date and
time: 2023-02-17 15:55:32+0000
mariadb 1:10.6.11-2 b.gz:host ci-worker-ppc64el-01;

mariadb 1:10.6.11-2 a.gz:408 data inconsistency errors
mariadb 1:10.6.11-2 a.gz:autopkgtest [04:56:40]: starting date and
time: 2023-02-18 04:56:40+0000
mariadb 1:10.6.11-2 a.gz:host ci-worker-ppc64el-02;




For MariaDB 10.11 series there was no issues with this symptoms in
January or early February:

mariadb 1:10.11.1-1 a.gz:0 data inconsistency errors
mariadb 1:10.11.1-1 a.gz:autopkgtest [09:54:29]: starting date and
time: 2023-01-16 09:54:29+0000
mariadb 1:10.11.1-1 a.gz:host ci-worker-ppc64el-04;

mariadb 1:10.11.1-1 b.gz:0 data inconsistency errors
mariadb 1:10.11.1-1 b.gz:autopkgtest [16:38:18]: starting date and
time: 2023-01-20 16:38:18+0000
mariadb 1:10.11.1-1 b.gz:host ci-worker-ppc64el-01;

mariadb 1:10.11.1-2 a.gz:0 data inconsistency errors
mariadb 1:10.11.1-2 a.gz:autopkgtest [23:16:23]: starting date and
time: 2023-02-03 23:16:23+0000
mariadb 1:10.11.1-2 a.gz:host ci-worker-ppc64el-04;


Errors started on Feb 6th, with the exception of random passing ones
on Feb 7th and Feb 11th:

mariadb 1:10.11.1-3 a.gz:345 data inconsistency errors
mariadb 1:10.11.1-3 a.gz:autopkgtest [10:00:31]: starting date and
time: 2023-02-06 10:00:31+0000
mariadb 1:10.11.1-3 a.gz:host ci-worker-ppc64el-02;

mariadb 1:10.11.1-3 b.gz:0 data inconsistency errors
mariadb 1:10.11.1-3 b.gz:autopkgtest [06:52:36]: starting date and
time: 2023-02-07 06:52:36+0000
mariadb 1:10.11.1-3 b.gz:host ci-worker-ppc64el-01;

mariadb 1:10.11.1-4 a.gz:658 data inconsistency errors
mariadb 1:10.11.1-4 a.gz:autopkgtest [12:32:51]: starting date and
time: 2023-02-10 12:32:51+0000
mariadb 1:10.11.1-4 a.gz:host ci-worker-ppc64el-03;

mariadb 1:10.11.1-4 b.gz:406 data inconsistency errors
mariadb 1:10.11.1-4 b.gz:autopkgtest [17:36:39]: starting date and
time: 2023-02-10 17:36:39+0000
mariadb 1:10.11.1-4 b.gz:host ci-worker-ppc64el-04;

mariadb 1:10.11.1-4 c.gz:248 data inconsistency errors
mariadb 1:10.11.1-4 c.gz:autopkgtest [19:19:57]: starting date and
time: 2023-02-10 19:19:57+0000
mariadb 1:10.11.1-4 c.gz:host ci-worker-ppc64el-04;

mariadb 1:10.11.1-4 d.gz:0 data inconsistency errors
mariadb 1:10.11.1-4 d.gz:autopkgtest [19:05:34]: starting date and
time: 2023-02-11 19:05:34+0000
mariadb 1:10.11.1-4 d.gz:host ci-worker-ppc64el-04;

mariadb 1:10.11.2-1 a.gz:305 data inconsistency errors
mariadb 1:10.11.2-1 a.gz:autopkgtest [09:50:34]: starting date and
time: 2023-02-17 09:50:34+0000
mariadb 1:10.11.2-1 a.gz:host ci-worker-ppc64el-01;

mariadb 1:10.11.2-1 b.gz:303 data inconsistency errors
mariadb 1:10.11.2-1 b.gz:autopkgtest [01:00:30]: starting date and
time: 2023-02-18 01:00:30+0000
mariadb 1:10.11.2-1 b.gz:host ci-worker-ppc64el-04;

mariadb 1:10.11.2-1 c.gz:101 data inconsistency errors
mariadb 1:10.11.2-1 c.gz:autopkgtest [09:33:55]: starting date and
time: 2023-02-18 09:33:55+0000
mariadb 1:10.11.2-1 c.gz:host ci-worker-ppc64el-04;

mariadb 1:10.11.2-1 d.gz:507 data inconsistency errors
mariadb 1:10.11.2-1 d.gz:autopkgtest [19:29:58]: starting date and
time: 2023-02-18 19:29:58+0000
mariadb 1:10.11.2-1 d.gz:host ci-worker-ppc64el-04;

mariadb 1:10.11.2-1 e.gz:520 data inconsistency errors
mariadb 1:10.11.2-1 e.gz:autopkgtest [00:14:11]: starting date and
time: 2023-02-19 00:14:11+0000
mariadb 1:10.11.2-1 e.gz:host ci-worker-ppc64el-04;

mariadb 1:10.11.2-1 f.gz:303 data inconsistency errors
mariadb 1:10.11.2-1 f.gz:autopkgtest [05:10:00]: starting date and
time: 2023-02-19 05:10:00+0000
mariadb 1:10.11.2-1 f.gz:host ci-worker-ppc64el-02;

mariadb 1:10.11.2-1 g.gz:202 data inconsistency errors
mariadb 1:10.11.2-1 g.gz:autopkgtest [05:09:32]: starting date and
time: 2023-02-19 05:09:32+0000
mariadb 1:10.11.2-1 g.gz:host ci-worker-ppc64el-01;

mariadb 1:10.11.2-1 h.gz:406 data inconsistency errors
mariadb 1:10.11.2-1 h.gz:autopkgtest [17:44:33]: starting date and
time: 2023-02-19 17:44:33+0000
mariadb 1:10.11.2-1 h.gz:host ci-worker-ppc64el-01;

Failures happen in various different tests although the
index_merge_innodb, innodb_ext_key and xa_prepared_binlog_off have
several occurrences:

$ zgrep -F '[ fail ]' *.gz | sort -k 3.6
mariadb 1:10.11.2-1 e.gz:main.bootstrap_innodb 'innodb'           w2 [ fail ]
mariadb 1:10.11.2-1 a.gz:main.ctype_utf8mb4_innodb 'innodb'       w1 [ fail ]
mariadb 1:10.11.2-1 g.gz:main.ctype_utf8mb4_innodb 'innodb'       w5 [ fail ]
mariadb 1:10.11.2-1 a.gz:main.index_merge_innodb 'innodb'         w2 [ fail ]
mariadb 1:10.11.2-1 e.gz:main.index_merge_innodb 'innodb'         w2 [ fail ]
mariadb 1:10.11.2-1 d.gz:main.index_merge_innodb 'innodb'         w3 [ fail ]
mariadb 1:10.11.2-1 f.gz:main.index_merge_innodb 'innodb'         w3 [ fail ]
mariadb 1:10.11.2-1 h.gz:main.index_merge_innodb 'innodb'         w5 [ fail ]
mariadb 1:10.11.1-4 a.gz:main.innodb_ext_key 'covering,innodb,off' w1 [ fail ]
mariadb 1:10.6.11-2 b.gz:main.innodb_ext_key 'covering,innodb,on' w3 [ fail ]
mariadb 1:10.6.11-2 a.gz:main.innodb_ext_key 'innodb,off,unoptimized'
w2 [ fail ]
mariadb 1:10.11.1-4 b.gz:main.innodb_ext_key 'innodb,off,unoptimized'
w4 [ fail ]
mariadb 1:10.11.2-1 e.gz:main.innodb_ext_key 'innodb,on,unoptimized' w1 [ fail ]
mariadb 1:10.11.2-1 d.gz:main.innodb_ext_key 'innodb,on,unoptimized' w5 [ fail ]
mariadb 1:10.11.2-1 e.gz:main.keyread 'innodb'                    w2 [ fail ]
mariadb 1:10.11.1-4 c.gz:main.order_by_innodb 'innodb'            w4 [ fail ]
mariadb 1:10.11.1-4 a.gz:main.order_by_innodb 'innodb'            w5 [ fail ]
mariadb 1:10.6.11-2 a.gz:main.partition_explicit_prune 'innodb'   w1 [ fail ]
mariadb 1:10.11.2-1 b.gz:main.range_mrr_icp 'innodb'              w5 [ fail ]
mariadb 1:10.6.11-2 c.gz:main.range_vs_index_merge_innodb 'innodb' w3 [ fail ]
mariadb 1:10.6.11-2 d.gz:main.rowid_filter_innodb 'innodb'        w1 [ fail ]
mariadb 1:10.11.2-1 h.gz:main.rowid_filter_innodb 'innodb'        w4 [ fail ]
mariadb 1:10.11.1-4 b.gz:main.rowid_filter_innodb 'innodb'        w5 [ fail ]
mariadb 1:10.11.1-3 a.gz:main.update_use_source 'innodb'          w3 [ fail ]
mariadb 1:10.6.11-2 d.gz:main.xa_prepared_binlog_off 'innodb'     w1 [ fail ]
mariadb 1:10.11.2-1 b.gz:main.xa_prepared_binlog_off 'innodb'     w2 [ fail ]
mariadb 1:10.11.1-3 a.gz:main.xa_prepared_binlog_off 'innodb'     w4 [ fail ]
mariadb 1:10.11.2-1 c.gz:main.xa_prepared_binlog_off 'innodb'     w5 [ fail ]

They do however randomly pass as well, even on a retry on the same run:

mariadb 1:10.11.1-1 b.gz:main.index_merge_innodb 'innodb'         w3 [
pass ]   6194
mariadb 1:10.11.1-2 a.gz:main.index_merge_innodb 'innodb'         w1 [
pass ]   8445
mariadb 1:10.11.1-3 a.gz:main.index_merge_innodb 'innodb'         w4 [
pass ]   8226
mariadb 1:10.11.1-3 b.gz:main.index_merge_innodb 'innodb'         w5 [
pass ]  10499
mariadb 1:10.11.1-4 a.gz:main.index_merge_innodb 'innodb'         w3 [
pass ]   6620
mariadb 1:10.11.1-4 b.gz:main.index_merge_innodb 'innodb'         w3 [
pass ]  10248
mariadb 1:10.11.1-4 c.gz:main.index_merge_innodb 'innodb'         w4 [
pass ]   9350
mariadb 1:10.11.1-4 d.gz:main.index_merge_innodb 'innodb'         w3 [
pass ]  10274
mariadb 1:10.11.2-1 a.gz:main.index_merge_innodb 'innodb'         w2 [ fail ]
mariadb 1:10.11.2-1 b.gz:main.index_merge_innodb 'innodb'         w2 [
pass ]   6310
mariadb 1:10.11.2-1 c.gz:main.index_merge_innodb 'innodb'         w5 [
pass ]   5689
mariadb 1:10.11.2-1 d.gz:main.index_merge_innodb 'innodb'         w3 [ fail ]
mariadb 1:10.11.2-1 e.gz:main.index_merge_innodb 'innodb'         w2 [ fail ]
mariadb 1:10.11.2-1 e.gz:main.index_merge_innodb 'innodb'         w2 [
retry-pass ]  13856
mariadb 1:10.11.2-1 e.gz:main.index_merge_innodb 'innodb'         w2 [
retry-pass ]  21997
mariadb 1:10.11.2-1 f.gz:main.index_merge_innodb 'innodb'         w3 [ fail ]
mariadb 1:10.11.2-1 f.gz:main.index_merge_innodb 'innodb'         w3 [
retry-fail ]
mariadb 1:10.11.2-1 f.gz:main.index_merge_innodb 'innodb'         w3 [
retry-pass ]   7697
mariadb 1:10.11.2-1 g.gz:main.index_merge_innodb 'innodb'         w5 [
pass ]   7287
mariadb 1:10.11.2-1 h.gz:main.index_merge_innodb 'innodb'         w5 [ fail ]
mariadb 1:10.11.2-1 h.gz:main.index_merge_innodb 'innodb'         w5 [
retry-pass ]   6086
mariadb 1:10.11.2-1 h.gz:main.index_merge_innodb 'innodb'         w5 [
retry-pass ]   6118


## Conclusion

Since this kind of errors have not been seen on any upstream
builbot.mariadb.org runs, or on Ubuntu Launchpad.net runs for MariaDB
10.11 (or other versions), it looks very much like an issue in
autopkgtests hosts in Debian ppc64el and not like a regression in
MariaDB itself.

It could potentially also simply be due to overload on the ppc64el
hosts now when there are so many builds and tests running.



More information about the pkg-mysql-maint mailing list