fail to link cuda example with clang++-9 under Ubuntu 18.04 - cuda

I am trying to follow the example in
https://llvm.org/docs/CompileCudaWithLLVM.html#invoking-clang
I use Ubuntu 18.04.3 LTS, clang version 9.0.0-2
The device I have is (snippet from the output of deviceQuery):
Detected 1 CUDA Capable device(s)
Device 0: "Quadro P520"
CUDA Driver Version / Runtime Version 10.2 / 10.2
CUDA Capability Major/Minor version number: 6.1
I ran the command:
clang++-9 --verbose --cuda-path=/usr/local/cuda-10.2 axpy.cu -o axpy --cuda-gpu-arch=sm_61 -L/usr/local/cuda-10.2 -lcudart_static -ldl -lrt -pthread
And the output is:
clang version 9.0.0-2~ubuntu18.04.1 (tags/RELEASE_900/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/i686-linux-gnu/8
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/7
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/7.4.0
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.4.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/7.4.0
Candidate multilib: .;#m64
Selected multilib: .;#m64
Found CUDA installation: /usr/local/cuda-10.2, version unknown
clang: error: cannot find libdevice for sm_61. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
As far as I can tell, libdevice is right where it should be:
~>ls /usr/local/cuda-10.2/nvvm/libdevice/
libdevice.10.bc
What am I doing wrong ?
Added Nov 2020:
Following #ArtemB comment, I tried running it with clang++-10, which throws a warning, but compiles and runs just fine.

Short answer: The version of cuda my driver supports (10.2) is too current for my clang (9.0.0).
Here is the top of the output of nvidia-smi on my machine:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
So my driver indeed supports cuda-10.2. However, it seems this version is not supported by clang 9.0.0. Indeed when running the above command with the extra flag -nocudalib , one gets the following response (only showing the last lines):
In file included from <built-in>:1:
/usr/lib/llvm-9/lib/clang/9.0.0/include/__clang_cuda_runtime_wrapper.h:52:2: error: "Unsupported CUDA version!"
#error "Unsupported CUDA version!"
^
axpy.cu:23:7: error: use of undeclared identifier cudaConfigureCall
axpy<<<1, kDataLen>>>(a, device_x, device_y);
^
2 errors generated when compiling for sm_61.
When inspecting the offending file (the clang cuda runtime wrapper), one sees the following in lines 48-53:
#include "cuda.h"
#if !defined(CUDA_VERSION)
#error "cuda.h did not define CUDA_VERSION"
#elif CUDA_VERSION < 7000 || CUDA_VERSION > 10010
#error "Unsupported CUDA version!"
#endif

Until recently clang was rather particular about CUDA versions. I've relaxed it a bit lately, so clang-10 is more lenient and will attempt to use a newer CUDA version at a feature parity with the latest supported CUDA version (currently 10.1). It will also issue a warning. It does work with CUDA-11.0 well enough to compile Tensorflow.
CUDA-11.1 (and I believe 11.0 update1 on windows) have dropped the version.txt file from the distribution and that will break CUDA compilation with the currently released clang versions, again. This should be fixed in clang-11.0.1 when it's released (version match with CUDA is purely coincidental).

Related

Undefined reference to `cublasCreate_v2’ in ‘/tmp/tmpxft_0000120b_0000000-10_my_program”

I have tried to compile a code using CUDA 9.0 toolkit on NVIDIA Tesla P100 graphic card (Ubuntu version 16.04) and CUBLAS library is used in the code. For compilation, I have used the following command to compile “my_program.cu”
nvcc -std=c++11 -L/usr/local/cuda-9.0/lib64 my_program.cu -o mu_program.o -lcublas
But, I have got the following error:
nvlink error: Undefined reference to 'cublasCreate_v2’in '/tmp/tmpxft_0000120b_0000000-10_my_program’
As I have already linked the library path in the compilation command, why do I still get the error. Please help me to solve this error.
It seems fairly evident that you are trying to use the CUBLAS library in device code. This is different than ordinary host usage and requires special compilation/linking steps. You need to:
compile for the correct device architecture (must be cc3.5 or higher)
use relocatable device code linking
link in the cublas device library (in addition to the cublas host library)
link in the CUDA device runtime library
Use a CUDA toolkit prior to CUDA 10.0
The following additions to your compile command line should get you there:
nvcc -std=c++11 my_program.cu -o my_program.o -lcublas -arch=sm_60 -rdc=true -lcublas_device -lcudadevrt
The above assumes you are actually using a proper install of CUDA 9.0. The CUBLAS device library was deprecated and is now removed from newer CUDA toolkits (see here).

Using CUDA 8.0 with GCC 6.x - bad function overloading complaint

I'm trying to build some CUDA code using GCC 6.2.1, the default compiler of my distribution (Note: not a GCC version officially supported by CUDA, so you can call this experimental). This is code which builds fine with GCC 4.9.3 and both CUDA versions 7.5 and 8.0.
Well, if I build the following (close-to) minimal example:
#include <tuple>
int main() { return 0; }
with the command-line
nvcc -std=c++11 -Wno-deprecated-gpu-targets -o main main.cu
I get the following errors:
/usr/local/cuda/bin/../targets/x86_64-linux/include/math_functions.h(8897): error: cannot overload functions distinguished by return type alone
/usr/local/cuda/bin/../targets/x86_64-linux/include/math_functions.h(8901): error: cannot overload functions distinguished by return type alone
2 errors detected in the compilation of "/tmp/tmpxft_000071fe_00000000-9_b.cpp1.ii".
Why is that? How can I correct/circumvent this?
TL;DR: Forget about it. Only use CUDA 8.x with GCC 5.x , and CUDA 9 or later with GCC 6.x
It seems other people have seen this issue with GCC 6.1.x and the suggestion is to add the following flags to nvcc: -Xcompiler -D__CORRECT_ISO_CPP11_MATH_H_PROTO (yes, two successive flags; see nvcc --help for details). (But I can't report complete success since other issues pop up instead.)
But remember that GCC 5.4.x is the latest supported version, and that probably has a good reason, so it's somewhat of a wild goose chase to force GCC 6.x onto it - especially when CUDA 9 is available now.

How to Install the CUDA Driver for TensorFlow (installing from source)

I'm trying to build TensorFlow from source and run it with GPU support. To install the toolkit I use the runfile, to install the driver I used the Additional Drivers Tool, since I did not get Ubuntu to boot into Text mode as specified in the CUDA documentation and stop lightdm and start lightdm does not work either, it gives me (also with sudo):
Name com.ubuntu.Upstart does not exist
So far I could build a release from the TensorFlow repository. However, when I'm trying to run the example as specified in the how-to
bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu
the GPU apparently cannot be found:
jonas#jonas-Aspire-V5-591G:~/Documents/repos/tensoflow_fork$ bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: jonas-Aspire-V5-591G
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: jonas-Aspire-V5-591G
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: 352.63.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:356] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 352.63 Sat Nov 7 21:25:42 PST 2015 GCC version: gcc version
4.9.2 (Ubuntu 4.9.2-10ubuntu13) """
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 352.63.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:293] kernel version seems to match DSO: 352.63.0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.
F tensorflow/cc/tutorials/example_trainer.cc:125] Check failed: ::tensorflow::Status::OK() == (session->Run({{"x", x}}, {"y:0", "y_normalized:0"}, {}, &outputs)) (OK vs. Invalid argument: Cannot assign a device to node 'y': Could not satisfy explicit device specification '/gpu:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
[[Node: y = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/gpu:0"](Const, x)]])
Aborted
I'm using a clean Ubuntu 15.04 installation on an Acer Notebook with the GTX950M.
Can anybody tell me how to properly install the driver?
Can you run deviceQuery (comes with cuda installation)? Can you see nvidia present in lspci/lsmod/nvidia-smi?
lsmod |grep nvidia
dmesg | grep -i nvidia
lspci | grep -i nvidia
nvidia-smi
You can reload nvidia module and look for error messages
modprobe -r nvidia
dmesg | tail
sudo dmesg | grep NVRM
Related issue https://github.com/tensorflow/tensorflow/issues/601

What is the minimum compute capability for CUDA compilation supported by LLVM compiler?

A CUDA source file can be compiled into PTX format using LLVM compiler with the command clang -Xclang -I$LIBCLC/include/generic -I$LIBCLC/include/ptx -Dcl_clang_storage_class_specifiers -O3 cudaFile.cu -S -o ptxOutputFile.ptx --cuda-gpu-arch=sm_XX
Where sm_XX can be replaced as sm_20, sm_30. For compute capability 1.0, when sm_XX was replaced with sm_10, it gives the error fatal error: cannot open file '/tmp/shared-25f2f5.s': No such file or directory
1 error generated.
So it seems the LLVM has a minimum compute capability of 2.0. Is this assumption correct?
It should be correct. As from CUDA 7.0, both the toolkit and driver support for sm_1x has stopped. If sm_20 works, it has to be the minimum.
CUDA Toolkit and CUDA Driver Support for Tesla Architecture
The CUDA Toolkit and CUDA Driver no longer supports the sm_10, sm_11, sm_12, and sm_13 architectures. As a consequence, CU_TARGET_COMPUTE_1x enum values have been removed from the CUDA headers.
http://developer.download.nvidia.com/compute/cuda/7_0/Prod/doc/CUDA_Toolkit_Release_Notes.pdf

nvcc -arch sm_52 gives error "Value 'sm_52' is not defined for option 'gpu-architecture'"

I updated my cuda toolkit from 5.5 to 6.5. Then following command
nvcc -arch=sm_52
starts to give me an error
nvcc fatal : Value 'sm_52' is not defined for option 'gpu-architecture'
Is this a bug ? or nvcc 6.5 does not support Maxwell virtual architecture.
CUDA Toolkit 6.5 was released before sm_52 architecture came into production.
After the arrival of sm_52 architecture, an update to CUDA 6.5 was released which enabled nvcc to generate code for sm_52.
Make sure you download the newer version of CUDA Toolkit 6.5.
P.S: I would rather use the latest version of toolkit (currently 7.0).