Bug#959139: numpy breaks scikit-learn arm64 autopkgtest: assert_uniform_grid(Y, try_name)

Paul Gevers elbrus at debian.org
Wed Apr 29 20:55:48 BST 2020


Source: numpy, scikit-learn
Control: found -1 numpy/1:1.18.3-1
Control: found -1 scikit-learn/0.22.2.post1+dfsg-5
Severity: serious
Tags: sid bullseye
X-Debbugs-CC: debian-ci at lists.debian.org
User: debian-ci at lists.debian.org
Usertags: breaks needs-update

Dear maintainer(s),

With a recent upload of numpy the autopkgtest of scikit-learn fails in
testing on arm64 when that autopkgtest is run with the binary packages
of numpy from unstable. It passes when run with only packages from
testing. In tabular form:

                       pass            fail
numpy                  from testing    1:1.18.3-1
scikit-learn           from testing    0.22.2.post1+dfsg-5
all others             from testing    from testing

I copied some of the output at the bottom of this report.

Currently this regression is blocking the migration of numpy to testing
[1]. Due to the nature of this issue, I filed this bug report against
both packages. Can you please investigate the situation and reassign the
bug to the right package?

More information about this bug and the reason for filing it can be found on
https://wiki.debian.org/ContinuousIntegration/RegressionEmailInformation

Paul

[1] https://qa.debian.org/excuses.php?package=numpy

https://ci.debian.net/data/autopkgtest/testing/arm64/s/scikit-learn/5194679/log.gz

=================================== FAILURES ===================================
________________________ test_uniform_grid[barnes_hut] _________________________

method = 'barnes_hut'

    @pytest.mark.parametrize('method', ['barnes_hut', 'exact'])
    def test_uniform_grid(method):
        """Make sure that TSNE can approximately recover a uniform 2D grid

        Due to ties in distances between point in X_2d_grid, this test is platform
        dependent for ``method='barnes_hut'`` due to numerical imprecision.

        Also, t-SNE is not assured to converge to the right solution because bad
        initialization can lead to convergence to bad local minimum (the
        optimization problem is non-convex). To avoid breaking the test too often,
        we re-run t-SNE from the final point when the convergence is not good
        enough.
        """
        seeds = [0, 1, 2]
        n_iter = 500
        for seed in seeds:
            tsne = TSNE(n_components=2, init='random', random_state=seed,
                        perplexity=20, n_iter=n_iter, method=method)
            Y = tsne.fit_transform(X_2d_grid)

            try_name = "{}_{}".format(method, seed)
            try:
>               assert_uniform_grid(Y, try_name)

/usr/lib/python3/dist-packages/sklearn/manifold/tests/test_t_sne.py:784:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_ _ _ _

Y = array([[ 52.326397  , -15.92225   ],
       [ 46.679527  , -20.175953  ],
       [ 40.870537  , -24.181147  ],
       ...[-35.291374  ,  22.122814  ],
       [-42.2738    ,  18.793724  ],
       [-48.922283  ,  15.606232  ]], dtype=float32)
try_name = 'barnes_hut_1'

    def assert_uniform_grid(Y, try_name=None):
        # Ensure that the resulting embedding leads to approximately
        # uniformly spaced points: the distance to the closest neighbors
        # should be non-zero and approximately constant.
        nn = NearestNeighbors(n_neighbors=1).fit(Y)
        dist_to_nn = nn.kneighbors(return_distance=True)[0].ravel()
        assert dist_to_nn.min() > 0.1

        smallest_to_mean = dist_to_nn.min() / np.mean(dist_to_nn)
        largest_to_mean = dist_to_nn.max() / np.mean(dist_to_nn)

        assert smallest_to_mean > .5, try_name
>       assert largest_to_mean < 2, try_name
E       AssertionError: barnes_hut_1
E       assert 6.67359409617653 < 2

/usr/lib/python3/dist-packages/sklearn/manifold/tests/test_t_sne.py:807:
AssertionError

During handling of the above exception, another exception occurred:

method = 'barnes_hut'

    @pytest.mark.parametrize('method', ['barnes_hut', 'exact'])
    def test_uniform_grid(method):
        """Make sure that TSNE can approximately recover a uniform 2D grid

        Due to ties in distances between point in X_2d_grid, this test is platform
        dependent for ``method='barnes_hut'`` due to numerical imprecision.

        Also, t-SNE is not assured to converge to the right solution because bad
        initialization can lead to convergence to bad local minimum (the
        optimization problem is non-convex). To avoid breaking the test too often,
        we re-run t-SNE from the final point when the convergence is not good
        enough.
        """
        seeds = [0, 1, 2]
        n_iter = 500
        for seed in seeds:
            tsne = TSNE(n_components=2, init='random', random_state=seed,
                        perplexity=20, n_iter=n_iter, method=method)
            Y = tsne.fit_transform(X_2d_grid)

            try_name = "{}_{}".format(method, seed)
            try:
                assert_uniform_grid(Y, try_name)
            except AssertionError:
                # If the test fails a first time, re-run with init=Y to see if
                # this was caused by a bad initialization. Note that this will
                # also run an early_exaggeration step.
                try_name += ":rerun"
                tsne.init = Y
                Y = tsne.fit_transform(X_2d_grid)
>               assert_uniform_grid(Y, try_name)

/usr/lib/python3/dist-packages/sklearn/manifold/tests/test_t_sne.py:792:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_ _ _ _

Y = array([[-18.169476  ,   6.0802336 ],
       [-18.278513  ,   2.8822129 ],
       [-18.671782  ,  -0.4646889 ],
       ...[ 22.550077  ,  19.698557  ],
       [ 21.399723  ,  22.933178  ],
       [ 16.22136   ,  28.22955   ]], dtype=float32)
try_name = 'barnes_hut_1:rerun'

    def assert_uniform_grid(Y, try_name=None):
        # Ensure that the resulting embedding leads to approximately
        # uniformly spaced points: the distance to the closest neighbors
        # should be non-zero and approximately constant.
        nn = NearestNeighbors(n_neighbors=1).fit(Y)
        dist_to_nn = nn.kneighbors(return_distance=True)[0].ravel()
        assert dist_to_nn.min() > 0.1

        smallest_to_mean = dist_to_nn.min() / np.mean(dist_to_nn)
        largest_to_mean = dist_to_nn.max() / np.mean(dist_to_nn)

        assert smallest_to_mean > .5, try_name
>       assert largest_to_mean < 2, try_name
E       AssertionError: barnes_hut_1:rerun
E       assert 2.145051767903112 < 2

/usr/lib/python3/dist-packages/sklearn/manifold/tests/test_t_sne.py:807:
AssertionError
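
For readers who want to see what the failing assertion measures, here is a
minimal, dependency-free sketch of the same ratios. It is not scikit-learn's
code: the helper name nn_distance_ratios and the two sample point sets are
made up for illustration, and a brute-force search stands in for
NearestNeighbors. It only mirrors the arithmetic shown in the log above:
nearest-neighbour distance per point, then min and max divided by the mean.

```python
import math


def nn_distance_ratios(points):
    """Return (min distance, min/mean, max/mean) of nearest-neighbour distances.

    Hypothetical helper: brute-force O(n^2) search, used here only so the
    sketch needs nothing beyond the standard library.
    """
    dist_to_nn = []
    for i, p in enumerate(points):
        # Distance from p to its closest other point.
        dist_to_nn.append(
            min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        )
    mean = sum(dist_to_nn) / len(dist_to_nn)
    return min(dist_to_nn), min(dist_to_nn) / mean, max(dist_to_nn) / mean


# A perfect 10x10 grid: every point has a neighbour at distance 1.
grid = [(float(x), float(y)) for x in range(10) for y in range(10)]

# The same grid with one point pushed far away (illustrative data only).
distorted = [(-20.0, -20.0)] + grid[1:]

for name, pts in (("grid", grid), ("distorted", distorted)):
    smallest, smallest_to_mean, largest_to_mean = nn_distance_ratios(pts)
    # The test in the log requires: smallest > 0.1,
    # smallest_to_mean > .5 and largest_to_mean < 2.
    passes = smallest > 0.1 and smallest_to_mean > 0.5 and largest_to_mean < 2
    print(name, smallest_to_mean, largest_to_mean, "pass" if passes else "fail")
```

A single isolated point is enough to push largest_to_mean past the threshold
of 2, which is the condition that failed in both the first run (about 6.67)
and the rerun (about 2.15) on arm64.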
