Bug#953116: [petsc-maint] 32 bit vs 64 bit PETSc

Satish Balay balay at mcs.anl.gov
Sat May 23 16:54:23 BST 2020


On Sat, 23 May 2020, Junchao Zhang wrote:

> On Sat, May 23, 2020 at 1:49 AM Drew Parsons <dparsons at debian.org> wrote:
> 
> > On 2020-05-23 14:18, Jed Brown wrote:
> > > Drew Parsons <dparsons at debian.org> writes:
> > >
> >> Hi, the Debian project is discussing whether we should start
> >> providing a 64 bit build of PETSc (which means we'd have to upgrade
> >> our entire computational library stack, starting from BLAS and going
> >> through MPI, MUMPS, etc).
> > >
> > > You don't need to change BLAS or MPI.
> >
> > I see, the PETSc API allows for PetscBLASInt and PetscMPIInt distinct
> > from PetscInt. That gives us more flexibility. (In any case, the Debian
> > BLAS maintainer is already providing blas64 packages. We've started
> > discussions about MPI).
> >
> > But what about MUMPS? Would MUMPS need to be built with 64 bit support
> > to work with 64-bit PETSc?
> > (the MUMPS docs indicate that its 64 bit support needs 64-bit versions
> > of BLAS, SCOTCH, METIS and MPI).
> >
> In MUMPS's manual, that is called full 64-bit. Out of the same memory
> bandwidth concern, MUMPS also supports selective 64-bit, in the sense
> that it only uses int64_t for selected variables. One can still use it
> with 32-bit BLAS, MPI, etc. We support selective 64-bit MUMPS starting
> from petsc-3.13.0.

If I remember correctly, the 'full 64-bit' mode relies on the Fortran compiler option '-i8' (basically equivalent to ILP64), and this mode only works
with ILP64 MPI, BLAS, etc. from Intel MPI/MKL.

We haven't tried using MUMPS in this mode with PETSc.

Satish

> 
> 
> >
> >
> > >> A default PETSc build uses 32 bit addressing to index vectors and
> > >> matrices.  64 bit addressing can be switched on by configuring with
> > >> --with-64-bit-indices=1, allowing much larger systems to be handled.
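A side-by-side setup of the two builds discussed here might look like the following; `--with-64-bit-indices` is the configure switch named above, but the `PETSC_ARCH` names are made up for illustration:

```shell
# Two independent builds from one PETSc source tree (arch names illustrative).
./configure PETSC_ARCH=arch-int32 --with-64-bit-indices=0
make PETSC_ARCH=arch-int32 all

./configure PETSC_ARCH=arch-int64 --with-64-bit-indices=1
make PETSC_ARCH=arch-int64 all
```

Selecting `PETSC_ARCH` at link time then picks the 32- or 64-bit index build.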
> > >>
> > >> My question for petsc-maint is: is there a reason why 64 bit
> > >> indexing is not already activated by default on 64-bit systems?
> > >> Certainly C pointers and type int would already be 64 bit on these
> > >> systems.
> > >
> > > Umm, x86-64 Linux is LP64, so int is 32-bit.  ILP64 is relatively
> > > exotic these days.
> >
> >
> > oh ok. I had assumed int was 64 bit on x86-64. Thanks for the
> > correction.
> >
> >
> > >> Is it a question of performance?  Is 32 bit indexing executed
> > >> faster (in the sense of 2 operations per clock cycle), such that
> > >> 64-bit addressing is accompanied with a drop in performance?
> > >
> > > Sparse iterative solvers are entirely limited by memory bandwidth;
> > > sizeof(double) + sizeof(int64_t) = 16 incurs a performance hit relative
> > > to 12 for int32_t.  It has nothing to do with clock cycles for
> > > instructions, just memory bandwidth (and usage, but that is less often
> > > an issue).
> > >
> > >> In that case we'd only want to use 64-bit PETSc if the system being
> > >> modelled is large enough to actually need it. Or is there a different
> > >> reason that 64 bit indexing is not switched on by default?
> > >
> > > It's just about performance, as above.
> >
> >
> > Thanks Jed.  That's good justification for us to keep our current 32-bit
> > built then, and provide a separate 64-bit build alongside it.
> >
> >
> > > There are two situations in which 64-bit is needed.  Historically
> > > (supercomputing with thinner nodes), it has been that you're solving
> > > problems with more than 2B dofs.  In today's age of fat nodes, it
> > > also happens that a matrix on a single MPI rank has more than 2B
> > > nonzeros.  This is especially common when using direct solvers.
> > > We'd like to address the latter case by only promoting the row
> > > offsets (thereby avoiding the memory hit of promoting column
> > > indices):
> > >
> > > https://gitlab.com/petsc/petsc/-/issues/333
> >
> > An interesting extra challenge.
> >
> >
> > > I wonder if you are aware of any static analysis tools that can
> > > flag implicit conversions of this sort:
> > >
> > > int64_t n = ...;
> > > for (int32_t i=0; i<n; i++) {
> > >   ...
> > > }
> > >
> > > There is -fsanitize=signed-integer-overflow (which generates a runtime
> > > error message), but that requires data to cause overflow at every
> > > possible location.
> >
> > I'll ask the Debian gcc team and the Science team if they have ideas
> > about this.
> >
> > Drew
> >
> 



More information about the debian-science-maintainers mailing list