Bug#1033724: clang-16 fails to detect amdgpu arch
Cordell Bloor
cgmb at slerp.xyz
Fri Mar 31 05:15:27 BST 2023
Package: clang-16
Version: 1:16.0.0-1~exp5
Severity: normal
X-Debbugs-Cc: cgmb at slerp.xyz, debian-ai at lists.debian.org
Dear Maintainer,
Thank you for packaging amdgpu-arch as part of clang-tools-16! I see
that it can correctly detect the architecture of my installed GPU when
libhsa-runtime-dev is installed. Should libhsa-runtime-dev be
added to the list of Suggested packages?
In any case, I would expect clang++-16 to use amdgpu-arch to
autodetect the default GPU build target for HIP programs, but that
doesn't seem to be happening. I have an RX 5700 XT installed in my
workstation, and when I run amdgpu-arch-16, I see:
$ amdgpu-arch-16 2> /dev/null
gfx1010
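As a cross-check, the HIP runtime itself reports the target string for
each device via hipDeviceProp_t::gcnArchName, so it should agree with
what amdgpu-arch-16 prints. A minimal check (just a throwaway program
of my own, built the same way as the example below) would look
something like this:

#include <stdio.h>
#include <hip/hip_runtime.h>

int main() {
  int count = 0;
  if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
    fprintf(stderr, "no HIP devices found\n");
    return 1;
  }
  for (int i = 0; i < count; ++i) {
    hipDeviceProp_t prop;
    if (hipGetDeviceProperties(&prop, i) == hipSuccess) {
      /* gcnArchName holds the target, e.g. "gfx1010", possibly with
         feature suffixes such as ":xnack-" */
      printf("device %d: %s\n", i, prop.gcnArchName);
    }
  }
  return 0;
}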
However, when I build a sample program, it is not built for the right
architecture unless I specify one manually. For example,
main.hip:
#include <stdio.h>
#include <stdlib.h>
#include <hip/hip_runtime.h>

#define CHECK_HIP(expr) do {                    \
  hipError_t result = (expr);                   \
  if (result != hipSuccess) {                   \
    fprintf(stderr, "%s:%d: %s (%d)\n",         \
            __FILE__, __LINE__,                 \
            hipGetErrorString(result), result); \
    exit(EXIT_FAILURE);                         \
  }                                             \
} while(0)

__global__ void sq_arr(float *arr, int n) {
  int tid = blockDim.x*blockIdx.x + threadIdx.x;
  if (tid < n) {
    arr[tid] = arr[tid] * arr[tid];
  }
}

int main() {
  enum { N = 5 };
  float hArr[N] = { 1, 2, 3, 4, 5 };
  float *dArr;
  CHECK_HIP(hipMalloc(&dArr, sizeof(float) * N));
  CHECK_HIP(hipMemcpy(dArr, hArr, sizeof(float) * N, hipMemcpyHostToDevice));
  sq_arr<<<dim3(1), dim3(32,1,1), 0, 0>>>(dArr, N);
  CHECK_HIP(hipMemcpy(hArr, dArr, sizeof(float) * N, hipMemcpyDeviceToHost));
  for (int i = 0; i < N; ++i) {
    printf("%f\n", hArr[i]);
  }
  CHECK_HIP(hipFree(dArr));
  return 0;
}
build command:
clang++-16 --rocm-device-lib-path=/usr/lib/x86_64-linux-gnu/amdgcn/bitcode -x hip main.hip -lamdhip64
output:
$ ./a.out
KFD does not support xnack mode query.
ROCr must assume xnack is disabled.
"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
Aborted (core dumped)
If I add -v to the build command, I can see that the compiler is
building for gfx906, which I presume is the default build architecture.
I'm not sure exactly why the architecture detection is failing. Could
it be caused by the rename from amdgpu-arch to amdgpu-arch-16? Or
perhaps it's because, in addition to printing my GPU architecture to
stdout, amdgpu-arch also prints this message to stderr?
KFD does not support xnack mode query.
ROCr must assume xnack is disabled.
Whatever the cause, I get my expected output when I explicitly
specify the architecture:
clang++-16 --rocm-device-lib-path=/usr/lib/x86_64-linux-gnu/amdgcn/bitcode --offload-arch=gfx1010 -x hip main.hip -lamdhip64
expected output:
# ./a.out
KFD does not support xnack mode query.
ROCr must assume xnack is disabled.
1.000000
4.000000
9.000000
16.000000
25.000000
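In the meantime, a workaround on a single-GPU system like mine is
presumably to splice the tool's output in by hand, e.g.:

clang++-16 --rocm-device-lib-path=/usr/lib/x86_64-linux-gnu/amdgcn/bitcode --offload-arch="$(amdgpu-arch-16 2> /dev/null)" -x hip main.hip -lamdhip64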
The logic within clang for using amdgpu-arch lives in the AMDGPU
toolchain driver. The answer to what is happening can probably be found
in the short function that deals with amdgpu-arch [1].
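For reference, my rough mental model of that function (a simplified
sketch of the approach only, not the actual LLVM code) is that the
driver looks up a helper program by its literal name and parses its
stdout, something like:

// Simplified sketch, not the LLVM driver code: look up a tool named
// "amdgpu-arch", run it, and treat the first line of stdout as the
// offload target. A rename to amdgpu-arch-16 would plausibly break
// this kind of lookup; stderr chatter would not, since only stdout
// is read here.
#include <stdio.h>
#include <string>

static std::string detect_gpu_arch(const char *tool = "amdgpu-arch") {
  FILE *p = popen(tool, "r");  // the shell searches PATH for the literal name
  if (!p)
    return "";
  char line[64] = "";
  if (!fgets(line, sizeof(line), p)) {
    pclose(p);
    return "";
  }
  pclose(p);
  std::string arch(line);
  while (!arch.empty() && (arch.back() == '\n' || arch.back() == '\r'))
    arch.pop_back();
  return arch;  // e.g. "gfx1010"
}

int main() {
  std::string arch = detect_gpu_arch();
  printf("detected: %s\n", arch.empty() ? "(nothing)" : arch.c_str());
}

If the real function behaves anything like this sketch, the stderr
messages should be harmless and the rename looks like the more likely
culprit, but I haven't stepped through the driver to confirm.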
P.S. I guess maybe the bitcode works across versions after all? I
wouldn't trust it without running a large test suite on a variety of
hardware platforms, but it's nice that it at least seems to work for
this particular example on this particular architecture. If it does
turn out to be compatible, we won't need to patch clang as was done
for LLVM 15; LLVM 16 now supports configuration files for specifying
default command-line options [2].
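For example, if the bitcode really is compatible, something along these
lines might be all that's needed (the filename and location are
hypothetical; I haven't checked where the Debian packages would look
for configuration files, though --config can be used to test one
explicitly):

# clang++-16.cfg (hypothetical name/location)
# default flags so that a plain `clang++-16 -x hip` build finds the
# Debian-packaged device bitcode
--rocm-device-lib-path=/usr/lib/x86_64-linux-gnu/amdgcn/bitcode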
[1]: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.0/clang/lib/Driver/ToolChains/AMDGPU.cpp#L751-L774
[2]: https://clang.llvm.org/docs/UsersManual.html#configuration-files
-- System Information:
Debian Release: 12.0
APT prefers unstable
APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Kernel: Linux 6.1.0-5-amd64 (SMP w/32 CPU threads; PREEMPT)
Kernel taint flags: TAINT_WARN
Locale: LANG=C, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect
Versions of packages clang-16 depends on:
ii binutils 2.40-2
ii libc6 2.36-8
ii libc6-dev 2.36-8
ii libclang-common-16-dev 1:16.0.0-1~exp5
ii libclang-cpp16 1:16.0.0-1~exp5
ii libclang1-16 1:16.0.0-1~exp5
ii libgcc-12-dev 12.2.0-14
ii libgcc-s1 12.2.0-14
ii libllvm16 1:16.0.0-1~exp5
ii libobjc-12-dev 12.2.0-14
ii libstdc++-12-dev 12.2.0-14
ii libstdc++6 12.2.0-14
ii llvm-16-linker-tools 1:16.0.0-1~exp5
Versions of packages clang-16 recommends:
ii llvm-16-dev 1:16.0.0-1~exp5
ii python3 3.11.2-1
Versions of packages clang-16 suggests:
pn clang-16-doc <none>
pn wasi-libc <none>
-- no debconf information