Bug#953116: [petsc-maint] 32 bit vs 64 bit PETSc

Jed Brown jed at jedbrown.org
Sat May 23 07:18:43 BST 2020


Drew Parsons <dparsons at debian.org> writes:

> Hi, the Debian project is discussing whether we should start providing a 
> 64 bit build of PETSc (which means we'd have to upgrade our entire 
> computational library stack, starting from BLAS and going through MPI, 
> MUMPS, etc).

You don't need to change BLAS or MPI.  PETSc's BLAS/LAPACK interface
uses a separate 32-bit PetscBLASInt regardless of
--with-64-bit-indices (values are range-checked when converted), and
MPI counts are plain int either way.

> A default PETSc build uses 32 bit addressing to index vectors and 
> matrices.  64 bit addressing can be switched on by configuring with 
> --with-64-bit-indices=1, allowing much larger systems to be handled.
>
> My question for petsc-maint is, is there a reason why 64 bit indexing is 
> not already activated by default on 64-bit systems?  Certainly C 
> pointers and type int would already be 64 bit on these systems.

Umm, x86-64 Linux is LP64 (long and pointers are 64-bit; int is
32-bit), so int is not 64-bit on these systems.  ILP64, in which int is
also 64-bit, is relatively exotic these days.
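
For reference, a minimal check of the data model (the sizes in the
comments assume LP64, e.g. x86-64 Linux):

#include <stdio.h>

int main(void)
{
  printf("int:   %zu bytes\n", sizeof(int));    /* 4 under LP64 */
  printf("long:  %zu bytes\n", sizeof(long));   /* 8 under LP64 */
  printf("void*: %zu bytes\n", sizeof(void *)); /* 8 under LP64 */
  return 0;
}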

> Is it a question of performance?  Is 32 bit indexing executed faster (in 
> the sense of 2 operations per clock cycle), such that 64-bit addressing 
> is accompanied with a drop in performance? 

Sparse iterative solvers are entirely limited by memory bandwidth: with
64-bit indices, each stored nonzero costs sizeof(double) +
sizeof(int64_t) = 16 bytes, versus sizeof(double) + sizeof(int32_t) =
12 bytes with 32-bit indices.  It has nothing to do with clock cycles
for instructions, just memory bandwidth (and total memory usage, though
that is less often an issue).
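
Concretely, the traffic per stored nonzero of a CSR/AIJ matrix works
out as follows (an illustrative calculation, not PETSc code):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
  /* Each stored nonzero carries a scalar plus a column index; this is
     what a bandwidth-limited sparse matrix-vector product must read. */
  printf("int32 indices: %zu bytes/nonzero\n",
         sizeof(double) + sizeof(int32_t)); /* 8 + 4 = 12 */
  printf("int64 indices: %zu bytes/nonzero\n",
         sizeof(double) + sizeof(int64_t)); /* 8 + 8 = 16, ~33% more */
  return 0;
}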

> In that case we'd only want to use 64-bit PETSc if the system being
> modelled is large enough to actually need it. Or is there a different
> reason that 64 bit indexing is not switched on by default?

It's just about performance, as above.  There are two situations in
which 64-bit indices are needed.  Historically (supercomputing with
thinner nodes), it has been that you're solving problems with more than
2^31 (about 2 billion) dofs globally.  In today's age of fat nodes, it
also happens that a matrix on a single MPI rank has more than 2^31
nonzeros; this is especially common when using direct solvers.  We'd
like to address the latter case by promoting only the row offsets,
thereby avoiding the memory hit of promoting the column indices (see
the sketch after the link):

https://gitlab.com/petsc/petsc/-/issues/333
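
A rough sketch of what that mixed-width layout could look like (the
struct and field names here are hypothetical, not PETSc's actual AIJ
internals):

#include <stdint.h>

/* Hypothetical mixed-width CSR (AIJ) storage: 64-bit row offsets let a
   single rank hold more than 2^31 nonzeros, while column indices stay
   32-bit (fine as long as the local column dimension fits in int32_t),
   keeping the cost at 12 bytes per nonzero instead of 16. */
typedef struct {
  int64_t *rowptr; /* n+1 offsets into colidx/values; may exceed 2^31 */
  int32_t *colidx; /* one 32-bit column index per stored nonzero */
  double  *values; /* one scalar per stored nonzero */
  int32_t  n;      /* number of local rows */
} MixedCSR;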


I wonder if you are aware of any static analysis tools that can
flag implicit conversions of this sort:

int64_t n = ...;                   /* may exceed INT32_MAX */
for (int32_t i=0; i<n; i++) {      /* i is widened to int64_t for the
                                      comparison, so if n > INT32_MAX
                                      the increment eventually overflows
                                      i (undefined behavior) and i never
                                      reaches n */
  ...
}

There is -fsanitize=signed-integer-overflow (which generates a runtime
error message), but that requires test inputs that actually trigger the
overflow at every possible location.
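
For what it's worth, -Wconversion (GCC and Clang) flags implicit
narrowing assignments at compile time, though I don't think it helps
with the loop pattern above, since i is only ever widened there:

#include <stdint.h>

/* Compile with: cc -Wconversion -c demo.c */
void demo(int64_t n)
{
  int32_t m = n;                   /* -Wconversion warns: conversion
                                      from int64_t may change value */
  (void)m;
  for (int32_t i = 0; i < n; i++) { /* silent: i is widened for the
                                       comparison, so there is no
                                       narrowing conversion; the bug is
                                       overflow of i */
  }
}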


