Bug#986527: Patches for flaky build and cython unavailability

Ahzo ahzo at tutanota.com
Sat Jul 31 20:47:42 BST 2021


Control: tags -1 patch

Hi,

the main problem making the sagemath test suite flaky is that it randomly aborts with 'Too many open files'.
As a result, only a small part of the test suite actually runs when the build is heavily parallelized.
This becomes visible when reporting not only the number of failed tests but also the number of tests run, which fluctuates significantly.

The problem occurs because every worker that has finished but has not yet been logged holds an open fd (a pipe used to read the output of the child actually doing the tests).
Thus, while following a long-running worker, i.e. logging its messages while it is still running, so many finished workers can accumulate that the open files limit (ulimit -n) is reached.
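
To illustrate the mechanism in isolation, here is a minimal sketch (not the sagemath code) that opens a number of pipes, closes only their write ends as if the children had exited, and counts the open fds of the process. The helper names count_open_fds and simulate_finished_workers are made up for this example, and counting via /proc/self/fd or /dev/fd is an assumption that holds on Linux and similar systems.

```python
import os
import resource


def count_open_fds():
    """Count this process's open file descriptors via /proc/self/fd or /dev/fd."""
    for path in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(path):
            return len(os.listdir(path))
    raise RuntimeError("no fd directory available on this platform")


def simulate_finished_workers(n):
    """Open n pipes and close only the write ends, as if n children had exited.

    Returns the read ends, which stay open until someone closes them.
    """
    read_ends = []
    for _ in range(n):
        r, w = os.pipe()
        os.close(w)          # the "child" is done writing
        read_ends.append(r)  # but the parent has not read and logged it yet
    return read_ends


# The soft limit is what 'ulimit -n' reports in the shell.
soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)

before = count_open_fds()
pending = simulate_finished_workers(50)
during = count_open_fds()
for fd in pending:
    os.close(fd)
after = count_open_fds()

print(f"limit={soft_limit} before={before} during={during} after={after}")
```

Each unlogged "worker" costs exactly one fd here, so the count grows linearly until the read ends are closed.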

However, there should be no open pipe per finished worker, as the test suite calls 'os.close(self.rmessages)' before waiting to log the messages.
So this seems to be caused by something that Python does behind the scenes.
Removing the single line 'finished.append(w)' in src/sage/doctest/forker.py prevents the increase in open fds, though at the cost of hardly logging any test output.

This problem can be avoided by simply logging every finished test, but never following a running one.
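
The difference between the two policies can be sketched with a small simulation. This is not the attached patch and does not use the sagemath doctest code; the functions make_finished_worker, log_finished and run are hypothetical, and the "workers" are plain pipes whose write end is already closed.

```python
import os


def make_finished_worker(name):
    """Return (name, read_fd) for a simulated worker whose child already exited."""
    r, w = os.pipe()
    os.write(w, f"{name}: ok\n".encode())
    os.close(w)
    return name, r


def log_finished(worker, log):
    """Read a finished worker's remaining output, record it, and release its fd."""
    _name, fd = worker
    chunks = []
    while True:
        data = os.read(fd, 4096)
        if not data:  # EOF, because the write end is closed
            break
        chunks.append(data)
    os.close(fd)
    log.append(b"".join(chunks).decode())


def run(num_workers, drain_immediately):
    """Return (peak number of unlogged finished workers, collected log)."""
    log, finished, peak = [], [], 0
    for i in range(num_workers):
        finished.append(make_finished_worker(f"test{i}"))
        peak = max(peak, len(finished))
        if drain_immediately:
            # Policy "log every finished test": release fds right away.
            while finished:
                log_finished(finished.pop(0), log)
        # Otherwise we are "following" a running worker and finished
        # ones pile up, each holding one open fd.
    while finished:
        log_finished(finished.pop(0), log)
    return peak, log


peak_follow, log_follow = run(20, drain_immediately=False)
peak_drain, log_drain = run(20, drain_immediately=True)
print(f"peak unlogged workers: follow={peak_follow} drain={peak_drain}")
```

Both policies produce the same log in this toy setup; only the peak number of simultaneously open fds differs.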

With only the 0001-Report-the-number-of-total-tests-run.patch, the result is something like:
Success: 5 of 71435 tests failed, up to 200 failures are tolerated

Adding the dt-Do-not-follow-a-running-worker.patch, the result becomes:
Success: 194 of 361139 tests failed, up to 200 failures are tolerated

These 194 failures are pretty close to the threshold of 200, so it is not particularly surprising that the build can fail in some environments.
Slightly exceeding this threshold triggered the build failure in this bug and also the one in bug #983931.

Increasing the threshold to 300 should make that rather unlikely, though.
And considering that there are more than 360 thousand tests, fewer than 300 failures means more than 99.9 % of the tests succeeded.
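
The arithmetic can be checked with a few lines of Python. The numbers are the ones quoted in this mail; the function name pass_rate is just chosen for this example.

```python
def pass_rate(failed, total):
    """Percentage of tests that did not fail."""
    return 100.0 * (total - failed) / total


TOTAL_FULL_RUN = 361139     # tests run with the dt- patch applied
TOTAL_PARTIAL_RUN = 71435   # tests run without it

# Pass rate if the proposed threshold of 300 failures were just reached.
rate_at_threshold = pass_rate(300, TOTAL_FULL_RUN)

# Share of the test suite that actually ran in the partial run.
coverage = TOTAL_PARTIAL_RUN / TOTAL_FULL_RUN

print(f"pass rate at 300 failures: {rate_at_threshold:.3f} %")
print(f"share of tests run without the fix: {coverage:.1%}")
```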

The "cython: not found" issue is trivial to fix and important, because otherwise 'sage --cython' does not work and there is no '--cython3' option (unlike e.g. the '--ipython3' option).

After adding the 0002-Tolerate-up-to-300-failing-tests.patch and the u2-Adapt-to-python2-removal.patch the test result is:
Success: 189 of 361139 tests failed, up to 300 failures are tolerated

It would also be a good idea to include a backport of commit 5cf493ca51 ("Avoid libgmp's new lazy allocation") in the next sagemath upload, as that fixes a severe memory leak (see bug #964848).

As to the crashes, I can't reproduce any crash when testing interfaces/singular.py:
sage -t --long --random-seed=0 src/sage/interfaces/singular.py
    [404 tests, 3.87 s]

This crash does not always happen in the reproducible builds either, e.g. the following log shows this test first crashing and then passing:
https://tests.reproducible-builds.org/debian/rbuild/bullseye/amd64/sagemath_9.2-2.rbuild.log.gz
[...]
sage -t --long --random-seed=0 src/sage/interfaces/singular.py
    Killed due to segmentation fault
[...]
sage -t --long --random-seed=0 src/sage/interfaces/singular.py
    [404 tests, 21.06 s]
[...]

However, a number of other crashes happen during every test run, but only one of them causes a test failure:
sage -t --long --random-seed=0 src/sage/interfaces/tests.py
**********************************************************************
File "src/sage/interfaces/tests.py", line 34, in sage.interfaces.tests
Failed example:
    subprocess.call("echo syntax error | ecl", **kwds) in (0, 255)
Expected:
    True
Got:
    False
**********************************************************************

Similar crashes sometimes also occur when testing interfaces/lisp.py, but without causing the test to fail.
This is a problem in ecl, which crashes when both stdout and stderr are full, see bug #710953.
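
For context on why a crash makes the doctest above fail: it only accepts return codes 0 and 255, and a process killed by a signal yields neither. The following sketch shows how such return codes can be interpreted on POSIX systems. The helper describe_returncode is hypothetical, and the child here is a Python process killing itself with SIGTERM, not ecl; which code the ecl crash actually produces is not stated in this mail.

```python
import signal
import subprocess
import sys


def describe_returncode(rc):
    """Describe a return code as reported by subprocess.call on POSIX."""
    if rc < 0:
        # subprocess reports death by signal N as -N
        return f"killed by {signal.Signals(-rc).name}"
    if rc > 128:
        # a shell reports a child killed by signal N as 128 + N
        try:
            name = signal.Signals(rc - 128).name
        except ValueError:
            return f"exited with status {rc}"
        return f"exited with status {rc} (shell convention for {name})"
    return f"exited with status {rc}"


# A child that terminates itself with a signal instead of exiting normally.
child_code = "import os, signal; os.kill(os.getpid(), signal.SIGTERM)"
rc = subprocess.call([sys.executable, "-c", child_code])

print(rc, describe_returncode(rc))
print("accepted by the doctest check:", rc in (0, 255))
```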

Then there is a crash in nauty-gentourng triggered by src/sage/graphs/digraph_generators.py.
For details see bug #991750.

There are also two SIGABRT crashes in mwrank triggered by src/sage/interfaces/mwrank.py.
These seem to be intentional due to invalid input.

Finally, there are some python crashes (5 SIGQUIT, 1 SIGABRT, 1 SIGSEGV) that are all caused intentionally by the test suite.

So none of these crashes are problems in sagemath itself.

Regards,
Ahzo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Report-the-number-of-total-tests-run.patch
Type: text/x-patch
Size: 2928 bytes
Desc: not available
URL: <http://alioth-lists.debian.net/pipermail/debian-science-maintainers/attachments/20210731/b943575d/attachment-0012.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Tolerate-up-to-300-failing-tests.patch
Type: text/x-patch
Size: 758 bytes
Desc: not available
URL: <http://alioth-lists.debian.net/pipermail/debian-science-maintainers/attachments/20210731/b943575d/attachment-0013.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dt-Do-not-follow-a-running-worker.patch
Type: text/x-patch
Size: 3708 bytes
Desc: not available
URL: <http://alioth-lists.debian.net/pipermail/debian-science-maintainers/attachments/20210731/b943575d/attachment-0014.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: u2-Adapt-to-python2-removal.patch
Type: text/x-patch
Size: 1941 bytes
Desc: not available
URL: <http://alioth-lists.debian.net/pipermail/debian-science-maintainers/attachments/20210731/b943575d/attachment-0015.bin>

