[Debian-med-packaging] Bug#995224: Bug#995224: relion-cuda: FTBFS with cub 1.14

Sascha Steinbiss satta at debian.org
Sat Feb 19 19:13:10 GMT 2022


Hi all,

greetings from the Debian Med Sprint 2021!

[...]
> /usr/bin/nvcc -M -D__CUDACC__ /build/relion-cuda-3.1.0/src/acc/cuda/cuda_projector_plan.cu -o /build/relion-cuda-3.1.0/build/src/apps/CMakeFiles/relion_gpu_util.dir/__/acc/cuda/relion_gpu_util_generated_cuda_projector_plan.cu.o.NVCC-depend -ccbin /usr/bin/cc -m64 -DINSTALL_LIBRARY_DIR=/usr/lib/ -DSOURCE_DIR=/build/relion-cuda-3.1.0/src/ -DACC_CUDA=2 -DACC_CPU=1 -DCUDA -DALLOW_CTF_IN_SGD -DHAVE_SINCOS -DHAVE_TIFF -Xcompiler ,\"-g\",\"-O2\",\"-ffile-prefix-map=/build/relion-cuda-3.1.0=.\",\"-fstack-protector-strong\",\"-Wformat\",\"-Werror=format-security\",\"-O3\",\"-DNDEBUG\" -arch=sm_35 -D__INTEL_COMPILER --default-stream per-thread --disable-warnings -DNVCC -I/usr/include -I/usr/lib/x86_64-linux-gnu/openmpi/include -I/usr/lib/x86_64-linux-gnu/openmpi/include/openmpi -I/build/relion-cuda-3.1.0 -I/usr/lib/fltk -I/usr/include/x86_64-linux-gnu
> nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
> In file included from /usr/include/thrust/system/cuda/config.h:33,
>                  from /usr/include/thrust/system/cuda/detail/execution_policy.h:35,
>                  from /usr/include/thrust/iterator/detail/device_system_tag.h:23,
>                  from /usr/include/thrust/iterator/detail/iterator_facade_category.h:22,
>                  from /usr/include/thrust/iterator/iterator_facade.h:37,
>                  from /build/relion-cuda-3.1.0/src/acc/cuda/cub/device/../iterator/arg_index_input_iterator.cuh:48,
>                  from /build/relion-cuda-3.1.0/src/acc/cuda/cub/device/device_reduce.cuh:41,
>                  from /build/relion-cuda-3.1.0/src/acc/cuda/cuda_utils_cub.cuh:16,
>                  from /build/relion-cuda-3.1.0/src/acc/cuda/cuda_projector_plan.cu:10:
> /usr/include/cub/util_namespace.cuh:46:2: error: #error CUB requires a definition of CUB_NS_QUALIFIER when CUB_NS_PREFIX/POSTFIX are defined.
>    46 | #error CUB requires a definition of CUB_NS_QUALIFIER when CUB_NS_PREFIX/POSTFIX are defined.
>       |  ^~~~~
> CMake Error at relion_gpu_util_generated_cuda_projector_plan.cu.o.Release.cmake:220 (message):
>   Error generating
>   /build/relion-cuda-3.1.0/build/src/apps/CMakeFiles/relion_gpu_util.dir/__/acc/cuda/./relion_gpu_util_generated_cuda_projector_plan.cu.o
> 
> 
> make[4]: *** [src/apps/CMakeFiles/relion_gpu_util.dir/build.make:1439: src/apps/CMakeFiles/relion_gpu_util.dir/__/acc/cuda/relion_gpu_util_generated_cuda_projector_plan.cu.o] Error 1
> make[4]: Leaving directory '/build/relion-cuda-3.1.0/build'
> 
> 
> This seems to be the Breaking Change described in
> https://github.com/NVIDIA/cub/releases/tag/1.14.0:
> 
>     #350: When the CUB_NS_[PRE|POST]FIX macros are set, CUB_NS_QUALIFIER
>     must also be defined to the fully qualified CUB namespace (e.g.
>     #define CUB_NS_QUALIFIER ::foo::cub). Note that this is handled
>     automatically when using the new [THRUST_]CUB_WRAPPED_NAMESPACE mechanism.
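
To make the quoted breaking change concrete, here is a minimal, self-contained sketch that mocks the mechanism without the real CUB headers. It assumes the consumer wraps CUB in a namespace `foo` (the name comes from the release-note example); the function `mock_answer()` is hypothetical, and the `#error` guard is only an approximation of the check in /usr/include/cub/util_namespace.cuh, not its verbatim text.

```cpp
// Sketch only: mimics how CUB 1.14 expects the namespace macros to be set.
// Wrap CUB in ::foo::cub (namespace name taken from the release-note example).
#define CUB_NS_PREFIX namespace foo {
#define CUB_NS_POSTFIX }
// Required since CUB 1.14 (#350) whenever PREFIX/POSTFIX are set;
// it must name the fully qualified CUB namespace.
#define CUB_NS_QUALIFIER ::foo::cub

// Approximation of the guard that produced the error in the report above.
#if (defined(CUB_NS_PREFIX) || defined(CUB_NS_POSTFIX)) && !defined(CUB_NS_QUALIFIER)
#error CUB requires a definition of CUB_NS_QUALIFIER when CUB_NS_PREFIX/POSTFIX are defined.
#endif

// Hypothetical stand-in for the body of a CUB header.
CUB_NS_PREFIX
namespace cub {
constexpr int mock_answer() { return 42; }
}  // namespace cub
CUB_NS_POSTFIX

// Code that refers to CUB through the qualifier now reaches ::foo::cub.
constexpr int call_through_qualifier() { return CUB_NS_QUALIFIER::mock_answer(); }
```

Deleting the `CUB_NS_QUALIFIER` line makes the mock guard fire with the same message as in the log. Per the quoted release note, the `[THRUST_]CUB_WRAPPED_NAMESPACE` mechanism sets the qualifier automatically, so it may be the simpler route if it fits relion's build.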

I updated the relion code to the latest upstream version (3.1.3) and
rebuilt it in the hope that this would change something. It did; now I get:

[...]
/usr/bin/nvcc -M -D__CUDACC__ /build/relion-cuda-3.1.3/src/acc/cuda/cuda_projector_plan.cu -o /build/relion-cuda-3.1.3/build/src/apps/CMakeFiles/relion_gpu_util.dir/__/acc/cuda/relion_gpu_util_generated_cuda_projector_plan.cu.o.NVCC-depend -ccbin /usr/bin/cc -m64 -DINSTALL_LIBRARY_DIR=/usr/lib/ -DSOURCE_DIR=/build/relion-cuda-3.1.3/src/ -DACC_CUDA=2 -DACC_CPU=1 -DCUDA -DALLOW_CTF_IN_SGD -DHAVE_SINCOS -DHAVE_TIFF -Xcompiler ,\"-g\",\"-O2\",\"-ffile-prefix-map=/build/relion-cuda-3.1.3=.\",\"-fstack-protector-strong\",\"-Wformat\",\"-Werror=format-security\",\"-O3\",\"-DNDEBUG\" -arch=sm_35 -D__INTEL_COMPILER --default-stream per-thread --disable-warnings -DNVCC -I/usr/include -I/usr/lib/x86_64-linux-gnu/openmpi/include -I/usr/lib/x86_64-linux-gnu/openmpi/include/openmpi -I/build/relion-cuda-3.1.3 -I/usr/lib/fltk -I/usr/include/x86_64-linux-gnu
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
In file included from /usr/include/thrust/system/cuda/detail/execution_policy.h:35,
                 from /usr/include/thrust/iterator/detail/device_system_tag.h:23,
                 from /usr/include/thrust/iterator/detail/iterator_facade_category.h:22,
                 from /usr/include/thrust/iterator/iterator_facade.h:37,
                 from /build/relion-cuda-3.1.3/src/acc/cuda/cub/device/../iterator/arg_index_input_iterator.cuh:48,
                 from /build/relion-cuda-3.1.3/src/acc/cuda/cub/device/device_reduce.cuh:41,
                 from /build/relion-cuda-3.1.3/src/acc/cuda/cuda_utils_cub.cuh:18,
                 from /build/relion-cuda-3.1.3/src/acc/cuda/cuda_projector_plan.cu:10:
/usr/include/thrust/system/cuda/config.h:79:2: error: #error The version of CUB in your include path is not compatible with this release of Thrust. CUB is now included in the CUDA Toolkit, so you no longer need to use your own checkout of CUB. Define THRUST_IGNORE_CUB_VERSION_CHECK to ignore this.
   79 | #error The version of CUB in your include path is not compatible with this release of Thrust. CUB is now included in the CUDA Toolkit, so you no longer need to use your own checkout of CUB. Define THRUST_IGNORE_CUB_VERSION_CHECK to ignore this.
      |  ^~~~~
CMake Error at relion_gpu_util_generated_cuda_projector_plan.cu.o.Release.cmake:220 (message):
  Error generating
  /build/relion-cuda-3.1.3/build/src/apps/CMakeFiles/relion_gpu_util.dir/__/acc/cuda/./relion_gpu_util_generated_cuda_projector_plan.cu.o


which seems to indicate that Thrust also has a problem. I then tried to
patch the relion headers to use Debian's cub instead of the embedded one:

diff --git a/src/acc/cuda/cuda_utils_cub.cuh b/src/acc/cuda/cuda_utils_cub.cuh
index 3e32fb86..4b9efd25 100644
--- a/src/acc/cuda/cuda_utils_cub.cuh
+++ b/src/acc/cuda/cuda_utils_cub.cuh
@@ -14,10 +14,10 @@
 #endif

 #define CUB_NS_QUALIFIER ::cub # for compatibility with CUDA 11.5
-#include "src/acc/cuda/cub/device/device_radix_sort.cuh"
-#include "src/acc/cuda/cub/device/device_reduce.cuh"
-#include "src/acc/cuda/cub/device/device_scan.cuh"
-#include "src/acc/cuda/cub/device/device_select.cuh"
+#include "/usr/include/cub/device/device_radix_sort.cuh"
+#include "/usr/include/cub/device/device_reduce.cuh"
+#include "/usr/include/cub/device/device_scan.cuh"
+#include "/usr/include/cub/device/device_select.cuh"

 namespace CudaKernels
 {

which did not change anything.
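
For reference, the second error comes from a version guard in Thrust's config.h. Presumably Thrust compares its own `THRUST_VERSION` with the `CUB_VERSION` of whichever CUB headers get included and stops unless `THRUST_IGNORE_CUB_VERSION_CHECK` is defined, as the error message says. The sketch below mocks that guard with hypothetical `MOCK_*` macros and made-up version numbers; the `major*100000 + minor*100 + patch` encoding is an assumption based on Thrust's and CUB's version headers.

```cpp
// Sketch only: mimics the guard behind the config.h:79 #error.
// Version numbers are hypothetical and do not reflect Debian's packages.
// Assumed encoding: major * 100000 + minor * 100 + patch.
#define MOCK_THRUST_VERSION 101500  // "1.15.0"
#define MOCK_CUB_VERSION 100800     // "1.8.0", e.g. an older bundled CUB copy

// Stand-in for THRUST_IGNORE_CUB_VERSION_CHECK. It is defined here so the
// sketch compiles; deleting this line reproduces the #error.
#define MOCK_IGNORE_CUB_VERSION_CHECK

#ifndef MOCK_IGNORE_CUB_VERSION_CHECK
#if MOCK_THRUST_VERSION != MOCK_CUB_VERSION
#error The version of CUB in your include path is not compatible with this release of Thrust.
#endif
#endif

// Helpers to decode the assumed version encoding.
constexpr int version_major(int v) { return v / 100000; }
constexpr int version_minor(int v) { return v / 100 % 1000; }
constexpr int version_patch(int v) { return v % 100; }
```

The include stack in the log shows the guard being reached through relion's bundled CUB copy under src/acc/cuda/cub/. Defining the real override macro would only skip the comparison; it does not make the two versions compatible.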

I must admit I am a bit out of ideas, since I am not really familiar
with this ecosystem. Roland, maybe you know more and could shed some
light on this? Could this now be an issue in thrust rather than in relion-cuda?

Thanks
Sascha


P.S. I pushed my version update to salsa, to the usual
`debian-contrib/master` branch.


