[Debian-science-sagemath] Buster todo list

Ximin Luo infinity0 at debian.org
Wed Jan 23 16:29:00 GMT 2019


Tobias Hansen:
> On 1/23/19 5:06 PM, Ximin Luo wrote:
>> Tobias Hansen:
>>> On 1/23/19 8:38 AM, Ximin Luo wrote:
>>>> François Bissey:
>>>>> You may be interested in the discussion I had with other people at
>>>>> https://trac.sagemath.org/ticket/26596#comment:21
>>>>> and
>>>>> https://github.com/cschwan/sage-on-gentoo/commit/1a5fefec74c222ee5d0673bb439c6d5a3b0c6e1e#commitcomment-30999803
>>>>>
>>>>> My summary:
>>>>> There is a subtle difference in behaviour between xerbla as shipped by openblas (as part of blas) and netlib's lapack function of the same name (shipped as part of lapack). If you ship an integrated openblas with lapack included, you get the base blas function from openblas (that's the one vanilla sage sees). If you have a separate lapack based on netlib, you load netlib's xerbla first. If you preload openblas, the problem may go away.
>>>>>
>>>> Thanks for the hint. Our problem was slightly similar but different.
>>>>
>>>> It seems that in Debian, since we install both atlas and lapack, the "system" libblas.so and liblapack.so by default point to the atlas implementations. Once I run "sudo update-alternatives" to make the following changes:
>>>>
>>>> update-alternatives: using /usr/lib/x86_64-linux-gnu/blas/libblas.so.3 to provide /usr/lib/x86_64-linux-gnu/libblas.so.3 (libblas.so.3-x86_64-linux-gnu) in manual mode
>>>> update-alternatives: using /usr/lib/x86_64-linux-gnu/blas/libblas.so to provide /usr/lib/x86_64-linux-gnu/libblas.so (libblas.so-x86_64-linux-gnu) in manual mode
>>>> update-alternatives: using /usr/lib/x86_64-linux-gnu/lapack/liblapack.so to provide /usr/lib/x86_64-linux-gnu/liblapack.so (liblapack.so-x86_64-linux-gnu) in manual mode
>>>> update-alternatives: using /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3 to provide /usr/lib/x86_64-linux-gnu/liblapack.so.3 (liblapack.so.3-x86_64-linux-gnu) in manual mode
>>>>
>>>> then matrix_double_dense.pyx passes again. All of those source files come from the lapack package.
>>>>
>>>> (Actually, narrowing it down further, only libblas.so.3 was required to be not-atlas, the other files could come from atlas. But it's cleaner and probably safer to switch them all at once.)
>>>>
>>>> So it seems that atlas is the culprit here: there is something wrong with its zgees or xerbla implementation. Tobias, do you know why we are using both atlas and lapack at the same time? Can we just avoid atlas completely?
>>> We used to use openblas on the architectures that it supports and atlas on the others. We changed to atlas in 8.3-1~exp1 to "work around crashes with openblas 0.3.2" (from changelog).
>>>
>>> I don't know why we depend on lapack. I think that dependency has been there a long time. Feel free to change it. Maybe we should revert the change from 8.3-1~exp1 and remove lapack?
>>>
>> lapack is the solution here, and atlas is the problematic one.
>>
>> I used update-alternatives above to work around the issue, but this requires root, so we can't do it in the Debian build. A different workaround is:
> 
> That's why I proposed trying to go back from atlas to openblas. We could also try lapack. If we depend on just one of them, update-alternatives should create the links automatically.
> 

Even if we only use lapack, though, the same problem will occur if a user installs atlas on their own machine, and I'm not sure we want to declare sagemath Conflicts: libatlas3-base. The problem might also change in a future version, and perhaps OpenBLAS would then be the problematic one; we don't want to have to keep switching libraries just to deal with this.

Ideally, we should understand what the problem is on a deeper level. A few questions to drive this search:

1. Does atlas have a bug (i.e. is its behaviour against the BLAS specification)? If so we could fix it.

2. Is matrix_double_dense/is_unitary *supposed* to only directly rely on scipy? In this case lapack is *supposed* to be used (via scipy) and we somehow have to make this specific usage ignore atlas.

3. Is the BLAS specification under-specified for xerbla/zgees, or whatever the problem is here, so that different implementations can legitimately behave in different ways that Sage cannot predict?
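
To help separate these questions, it might be useful to have a reference check that never touches BLAS or LAPACK at all. The following is only a sketch in plain Python and is not what Sage does: it tests the textbook definition (U^H U = I within a tolerance), and the name is_unitary_naive is made up for illustration. If this agrees with the lapack-backed result but not the atlas-backed one, that would point towards questions 1 or 3 rather than 2.

```python
def is_unitary_naive(rows, tol=1e-12):
    """Return True if the square matrix `rows` satisfies U^H U = I within tol.

    `rows` is a list of rows of real or complex numbers. Hypothetical helper,
    pure Python on purpose so that no BLAS/LAPACK code is involved.
    """
    n = len(rows)
    # A non-square matrix cannot be unitary.
    if any(len(row) != n for row in rows):
        return False
    for i in range(n):
        for j in range(n):
            # (U^H U)[i][j] = sum_k conj(U[k][i]) * U[k][j]
            entry = sum(
                complex(rows[k][i]).conjugate() * rows[k][j] for k in range(n)
            )
            expected = 1.0 if i == j else 0.0
            if abs(entry - expected) > tol:
                return False
    return True
```

This is O(n^3) in interpreted Python, so it is only sensible for the small matrices used in the doctests.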

It would be good if a sage dev could describe what they think might be happening. The problematic code section seems to be:

src/sage/matrix/matrix_double_dense.pyx:
    def is_unitary(self, tol=1e-12, algorithm='orthonormal'):
        if algorithm == 'orthonormal':
            # this branch

At some point it calls into libblas.so. All of the xerbla.f implementations I've seen contain a STOP statement, but it is only being hit when we link against the ATLAS libblas.so.
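
To confirm which libblas.so actually ends up in the process, without root, one option is to look at the memory map after the relevant imports. This is a rough, Linux-only sketch: it reads /proc/self/maps and keeps shared objects whose file name starts with "lib" and mentions blas, lapack or atlas. The function names and the name filter are my own assumptions, not anything from Sage or Debian tooling.

```python
import re

# Assumed filter: library file names that look BLAS/LAPACK related.
_BLAS_PATTERN = re.compile(r"blas|lapack|atlas", re.IGNORECASE)


def blas_libraries_in_maps(maps_text):
    """Return the sorted, de-duplicated paths of BLAS-like shared objects
    found in text formatted like /proc/<pid>/maps."""
    paths = set()
    for line in maps_text.splitlines():
        # Columns: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        name = path.rsplit("/", 1)[-1]
        # Keep real libraries (lib*.so*), skip Python extension modules.
        if name.startswith("lib") and ".so" in name and _BLAS_PATTERN.search(name):
            paths.add(path)
    return sorted(paths)


def loaded_blas_libraries():
    """Inspect the current process (Linux only)."""
    with open("/proc/self/maps") as maps_file:
        return blas_libraries_in_maps(maps_file.read())
```

Calling loaded_blas_libraries() right after importing whatever the failing doctest imports should show whether a path under an atlas directory is mapped, or the one from the lapack package.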

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


