<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, May 23, 2020 at 1:49 AM Drew Parsons <<a href="mailto:dparsons@debian.org">dparsons@debian.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 2020-05-23 14:18, Jed Brown wrote:<br>

> Drew Parsons <<a href="mailto:dparsons@debian.org" target="_blank">dparsons@debian.org</a>> writes:<br>

> <br>

>> Hi, the Debian project is discussing whether we should start providing a<br>

>> 64 bit build of PETSc (which means we'd have to upgrade our entire<br>

>> computational library stack, starting from BLAS and going through MPI,<br>

>> MUMPS, etc).<br>

> <br>

> You don't need to change BLAS or MPI.<br>

<br>

I see, the PETSc API allows for PetscBLASInt and PetscMPIInt distinct <br>

from PetscInt. That gives us more flexibility. (In any case, the Debian <br>

BLAS maintainer is already providing blas64 packages. We've started <br>

discussions about MPI).<br>

<br>

But what about MUMPS? Would MUMPS need to be built with 64 bit support <br>

to work with 64-bit PETSc?<br>

(the MUMPS docs indicate that its 64 bit support needs 64-bit versions <br>

of BLAS, SCOTCH, METIS and MPI).<br></blockquote>The MUMPS manual calls that configuration full 64-bit. Because of the same memory bandwidth concern, MUMPS also supports a selective 64-bit mode, which uses int64_t only for selected variables, so it can still be used with 32-bit BLAS, MPI, etc. We have supported selective 64-bit MUMPS since petsc-3.13.0.<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

<br>

>> A default PETSc build uses 32 bit addressing to index vectors and<br>

>> matrices.  64 bit addressing can be switched on by configuring with<br>

>> --with-64-bit-indices=1, allowing much larger systems to be handled.<br>

>> <br>

>> My question for petsc-maint is, is there a reason why 64 bit indexing is<br>

>> not already activated by default on 64-bit systems?  Certainly C<br>

>> pointers and type int would already be 64 bit on these systems.<br>

> <br>

> Umm, x86-64 Linux is LP64, so int is 32-bit.  ILP64 is relatively exotic<br>

> these days.<br>

<br>

<br>

oh ok. I had assumed int was 64 bit on x86-64. Thanks for the <br>

correction.<br>
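<br>
To make the LP64 point concrete, here is a minimal sketch that classifies the data model from the widths of int, long and pointers. The macro BITS_OF and the function data_model_name are hypothetical names introduced only for this illustration.<br>

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Width of a type in bits (hypothetical helper macro for this sketch). */
#define BITS_OF(T) (sizeof(T) * CHAR_BIT)

/* Name the C data model from the widths of int, long and pointers.
 * LP64:  32-bit int, 64-bit long and pointers (x86-64 Linux).
 * ILP64: int, long and pointers are all 64-bit.
 * LLP64: 32-bit int and long, 64-bit pointers.
 * ILP32: int, long and pointers are all 32-bit. */
static const char *data_model_name(void) {
    const size_t i = BITS_OF(int);
    const size_t l = BITS_OF(long);
    const size_t p = BITS_OF(void *);

    if (i == 32 && l == 64 && p == 64) return "LP64";
    if (i == 64 && l == 64 && p == 64) return "ILP64";
    if (i == 32 && l == 32 && p == 64) return "LLP64";
    if (i == 32 && l == 32 && p == 32) return "ILP32";
    return "other";
}
```

Printing data_model_name() from a small main() shows which model a given compiler and target use; on an LP64 system, an index type that must exceed 2^31 - 1 has to be requested explicitly (for example int64_t) rather than assumed from int.<br>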

<br>

<br>

>> Is it a question of performance?  Is 32 bit indexing executed faster (in<br>

>> the sense of 2 operations per clock cycle), such that 64-bit addressing<br>

>> is accompanied with a drop in performance?<br>

> <br>

> Sparse iterative solvers are entirely limited by memory bandwidth;<br>

> sizeof(double) + sizeof(int64_t) = 16 incurs a performance hit relative<br>

> to 12 for int32_t.  It has nothing to do with clock cycles for<br>

> instructions, just memory bandwidth (and usage, but that is less often<br>

> an issue).<br>

> <br>

>> In that case we'd only want to use 64-bit PETSc if the system being<br>

>> modelled is large enough to actually need it. Or is there a different<br>

>> reason that 64 bit indexing is not switched on by default?<br>

> <br>

> It's just about performance, as above.<br>
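<br>
The 16 versus 12 byte figures above can be reproduced with a short sketch. It assumes a CSR-like layout in which each stored nonzero carries one double value plus one column index, and it ignores row offsets and vector traffic. The function names are hypothetical and are not part of PETSc.<br>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes streamed from memory per stored nonzero: one double value
 * plus one column index of the given size. */
static size_t bytes_per_nonzero(size_t index_size) {
    return sizeof(double) + index_size;
}

/* Ratio of per-nonzero traffic with 64-bit indices to that with
 * 32-bit indices. For a bandwidth-bound kernel this approximates the
 * slowdown of the matrix part of a sparse matrix-vector product. */
static double index_traffic_ratio(void) {
    return (double)bytes_per_nonzero(sizeof(int64_t)) /
           (double)bytes_per_nonzero(sizeof(int32_t));
}
```

With an 8-byte double this gives 12 bytes per nonzero for int32_t and 16 for int64_t, a ratio of 4/3, so roughly a third more memory traffic for the same arithmetic.<br>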

<br>

<br>

Thanks Jed.  That's good justification for us to keep our current 32-bit <br>

build then, and provide a separate 64-bit build alongside it.<br>

<br>

<br>

>  There are two situations in<br>

> which 64-bit is needed.  Historically (supercomputing with thinner<br>

> nodes), it has been that you're solving problems with more than 2B dofs.<br>

> In today's age of fat nodes, it also happens that a matrix on a single<br>

> MPI rank has more than 2B nonzeros.  This is especially common when<br>

> using direct solvers.  We'd like to address the latter case by only<br>

> promoting the row offsets (thereby avoiding the memory hit of promoting<br>

> column indices):<br>

> <br>

> <a href="https://gitlab.com/petsc/petsc/-/issues/333" rel="noreferrer" target="_blank">https://gitlab.com/petsc/petsc/-/issues/333</a><br>
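<br>
As a rough illustration of promoting only the row offsets, the sketch below stores 64-bit row offsets next to 32-bit column indices. The csr_mixed struct and both functions are hypothetical; this is not PETSc's matrix layout, nor the design proposed in the linked issue.<br>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CSR matrix: row offsets are 64-bit so the local nonzero
 * count may exceed 2^31 - 1, while column indices stay 32-bit because
 * the number of columns still fits in int32_t. */
typedef struct {
    int32_t nrows;
    const int64_t *row_ptr; /* nrows + 1 offsets into col_idx and val */
    const int32_t *col_idx; /* one column index per nonzero */
    const double *val;      /* one value per nonzero */
} csr_mixed;

/* y = A * x. The inner counter k is 64-bit to match row_ptr. */
static void csr_mixed_spmv(const csr_mixed *A, const double *x, double *y) {
    for (int32_t i = 0; i < A->nrows; i++) {
        double sum = 0.0;
        for (int64_t k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++) {
            sum += A->val[k] * x[A->col_idx[k]];
        }
        y[i] = sum;
    }
}

/* Bytes of index storage: 8 per row offset, 4 per column index. */
static size_t csr_mixed_index_bytes(int32_t nrows, int64_t nnz) {
    return ((size_t)nrows + 1) * sizeof(int64_t) +
           (size_t)nnz * sizeof(int32_t);
}
```

Since there are usually far fewer rows than nonzeros, widening row_ptr adds little to the index storage, while the per-nonzero cost stays at 12 bytes instead of 16.<br>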

<br>

An interesting extra challenge.<br>

<br>

<br>

> I wonder if you are aware of any static analysis tools that can<br>

> flag implicit conversions of this sort:<br>

> <br>

> int64_t n = ...;<br>

> for (int32_t i = 0; i < n; i++) {<br>

>   ...<br>

> }<br>

> <br>

> There is -fsanitize=signed-integer-overflow (which generates a runtime<br>

> error message), but that requires data to cause overflow at every<br>

> possible location.<br>
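<br>
This does not answer the static analysis question, but as a sketch of what the hazard looks like in code: in the loop above, i is widened to 64 bits for the comparison, so nothing is narrowed and the counter simply overflows once n exceeds INT32_MAX. One common way to avoid it is to give the counter the width of its bound and to make any remaining narrowing explicit and checked. The helpers checked_narrow_i32 and sum_first_n are hypothetical names for this illustration.<br>

```c
#include <assert.h>
#include <stdint.h>

/* Narrow a 64-bit value to 32 bits only if it fits.
 * Returns 0 on success and -1 if the value is out of range,
 * in which case *out is left untouched. */
static int checked_narrow_i32(int64_t wide, int32_t *out) {
    if (wide < INT32_MIN || wide > INT32_MAX) {
        return -1;
    }
    *out = (int32_t)wide;
    return 0;
}

/* Sum the first n entries of v. The counter has the same width as the
 * bound, so it cannot overflow before reaching n. */
static double sum_first_n(const double *v, int64_t n) {
    double sum = 0.0;
    for (int64_t i = 0; i < n; i++) {
        sum += v[i];
    }
    return sum;
}
```

Code that must pass a count to a 32-bit interface would call checked_narrow_i32 at that boundary and handle the -1 case, instead of relying on an implicit conversion.<br>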

<br>

I'll ask the Debian gcc team and the Science team if they have ideas <br>

about this.<br>

<br>

Drew<br>

</blockquote></div></div>