I installed CUDA 7.0 as described here on Ubuntu 14.04. Looking at the matrix multiplication example: if I start the prebuilt executable matrixMul, it runs, but if I try to compile it myself I get an error about libraries, i.e.:
user@Mars:~/Documenti/Bello/NVIDIA_CUDA-7.0_Samples/0_Simple/matrixMul$ nvcc matrixMul.cu
matrixMul.cu:36:30: fatal error: helper_functions.h: No such file or directory
#include <helper_functions.h>
^
compilation terminated.
The problem was caused by compiling the sample with nvcc directly, without the correct compiler options, rather than with the supplied makefile. Using the makefile allowed the compilation to succeed.
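If you do want to invoke nvcc by hand, the key option the makefile adds is the include path for the samples' shared headers; a minimal sketch, assuming the default samples directory layout:
nvcc -I../../common/inc matrixMul.cu -o matrixMul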
Related
I have tried to compile code using the CUDA 9.0 toolkit on an NVIDIA Tesla P100 graphics card (Ubuntu 16.04), and the CUBLAS library is used in the code. For compilation, I have used the following command to compile "my_program.cu":
nvcc -std=c++11 -L/usr/local/cuda-9.0/lib64 my_program.cu -o my_program.o -lcublas
But, I have got the following error:
nvlink error: Undefined reference to 'cublasCreate_v2' in '/tmp/tmpxft_0000120b_0000000-10_my_program'
As I have already linked the library path in the compilation command, why do I still get the error? Please help me solve it.
It seems fairly evident that you are trying to use the CUBLAS library in device code. This is different from ordinary host usage and requires special compilation/linking steps. You need to:
compile for the correct device architecture (must be cc3.5 or higher)
use relocatable device code linking
link in the cublas device library (in addition to the cublas host library)
link in the CUDA device runtime library
use a CUDA toolkit prior to CUDA 10.0
The following additions to your compile command line should get you there:
nvcc -std=c++11 my_program.cu -o my_program.o -lcublas -arch=sm_60 -rdc=true -lcublas_device -lcudadevrt
The above assumes you are actually using a proper install of CUDA 9.0. The CUBLAS device library was deprecated and is now removed from newer CUDA toolkits (see here).
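For context, the pattern that requires all of these flags is calling CUBLAS from inside a kernel; a minimal sketch (the kernel name and buffers are hypothetical, not taken from the question):

#include <cublas_v2.h>

__global__ void gemm_from_device(int n, const double *A, const double *B, double *C)
{
    // device-side handle: this call is what pulls in cublas_device and cudadevrt
    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) return;
    const double alpha = 1.0, beta = 0.0;
    // device-side GEMM, legal only on cc3.5+ with relocatable device code
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, A, n, B, n, &beta, C, n);
    cublasDestroy(handle);
}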
I am trying to follow the example in
https://llvm.org/docs/CompileCudaWithLLVM.html#invoking-clang
I use Ubuntu 18.04.3 LTS, clang version 9.0.0-2
The device I have is (snippet from the output of deviceQuery):
Detected 1 CUDA Capable device(s)
Device 0: "Quadro P520"
CUDA Driver Version / Runtime Version 10.2 / 10.2
CUDA Capability Major/Minor version number: 6.1
I ran the command:
clang++-9 --verbose --cuda-path=/usr/local/cuda-10.2 axpy.cu -o axpy --cuda-gpu-arch=sm_61 -L/usr/local/cuda-10.2 -lcudart_static -ldl -lrt -pthread
And the output is:
clang version 9.0.0-2~ubuntu18.04.1 (tags/RELEASE_900/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/i686-linux-gnu/8
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/7
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/7.4.0
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.4.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/7.4.0
Candidate multilib: .;#m64
Selected multilib: .;#m64
Found CUDA installation: /usr/local/cuda-10.2, version unknown
clang: error: cannot find libdevice for sm_61. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
As far as I can tell, libdevice is right where it should be:
~>ls /usr/local/cuda-10.2/nvvm/libdevice/
libdevice.10.bc
What am I doing wrong?
Added Nov 2020:
Following @ArtemB's comment, I tried running it with clang++-10, which throws a warning but compiles and runs just fine.
Short answer: the CUDA version my driver supports (10.2) is too new for my clang (9.0.0).
Here is the top of the output of nvidia-smi on my machine:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
So my driver indeed supports CUDA 10.2. However, it seems this version is not supported by clang 9.0.0. Indeed, when running the above command with the extra flag -nocudalib, one gets the following response (only showing the last lines):
In file included from <built-in>:1:
/usr/lib/llvm-9/lib/clang/9.0.0/include/__clang_cuda_runtime_wrapper.h:52:2: error: "Unsupported CUDA version!"
#error "Unsupported CUDA version!"
^
axpy.cu:23:7: error: use of undeclared identifier 'cudaConfigureCall'
axpy<<<1, kDataLen>>>(a, device_x, device_y);
^
2 errors generated when compiling for sm_61.
When inspecting the offending file (the clang cuda runtime wrapper), one sees the following in lines 48-53:
#include "cuda.h"
#if !defined(CUDA_VERSION)
#error "cuda.h did not define CUDA_VERSION"
#elif CUDA_VERSION < 7000 || CUDA_VERSION > 10010
#error "Unsupported CUDA version!"
#endif
Until recently clang was rather particular about CUDA versions. I've relaxed it a bit lately, so clang-10 is more lenient and will attempt to use a newer CUDA version at feature parity with the latest supported CUDA version (currently 10.1). It will also issue a warning. It does work with CUDA-11.0 well enough to compile TensorFlow.
CUDA-11.1 (and, I believe, 11.0 update 1 on Windows) has dropped the version.txt file from the distribution, and that will break CUDA compilation with the currently released clang versions, again. This should be fixed in clang-11.0.1 when it's released (the version match with CUDA is purely coincidental).
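Given that, the practical workaround on this setup is the one noted above: compile with clang++-10, which accepts CUDA 10.2 with a warning. A sketch of the same invocation (pointing -L at the lib64 subdirectory is my assumption; adjust for your install):
clang++-10 --cuda-path=/usr/local/cuda-10.2 axpy.cu -o axpy --cuda-gpu-arch=sm_61 -L/usr/local/cuda-10.2/lib64 -lcudart_static -ldl -lrt -pthread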
I'm writing a simple program for the fast Fourier transform with the cuFFT CUDA library. My source file works fine with Visual Studio on Windows 7, but with Eclipse Nsight on Ubuntu 14.04 it does not work!
I've installed the NVIDIA 346.72 driver and CUDA toolkit 7.0, and my video hardware is a GeForce 410M. When I build my source code I get the following message:
16:56:24 **** Incremental Build of configuration Debug for project cufft_double ****
make all
Building target: cufft_double
Invoking: NVCC Linker
/usr/local/cuda-7.0/bin/nvcc --cudart static -L/usr/local/cuda-7.0/lib64 --relocatable-device-code=false -gencode arch=compute_20,code=compute_20 -gencode arch=compute_20,code=sm_20 -m64 -link -o "cufft_double" ./cufft_double.o
./cufft_double.o: In function `main':
/home/marco/cuda-workspace/cufft_double/Debug/../cufft_double.cu:79: undefined reference to `cufftPlan1d'
/home/marco/cuda-workspace/cufft_double/Debug/../cufft_double.cu:85: undefined reference to `cufftExecZ2Z'
/home/marco/cuda-workspace/cufft_double/Debug/../cufft_double.cu:108: undefined reference to `cufftDestroy'
/home/marco/cuda-workspace/cufft_double/Debug/../cufft_double.cu:111: undefined reference to `cufftPlan1d'
/home/marco/cuda-workspace/cufft_double/Debug/../cufft_double.cu:117: undefined reference to `cufftExecZ2Z'
/home/marco/cuda-workspace/cufft_double/Debug/../cufft_double.cu:136: undefined reference to `cufftDestroy'
collect2: error: ld returned 1 exit status
make: *** [cufft_double] Error 1
16:56:27 Build Finished (took 2s.792ms)
I tried to set the library path, but in the Preferences window I read "no CUDA-compatible devices detected".
Please help me!
Best regards,
Marco
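The undefined references to cufftPlan1d, cufftExecZ2Z, and cufftDestroy mean the cuFFT library itself is not on the link line; adding -lcufft to the linker command should fix the build stage. A sketch of the corrected link step, assuming the Nsight-generated command shown above:
/usr/local/cuda-7.0/bin/nvcc --cudart static -L/usr/local/cuda-7.0/lib64 --relocatable-device-code=false -gencode arch=compute_20,code=compute_20 -gencode arch=compute_20,code=sm_20 -m64 -link -o "cufft_double" ./cufft_double.o -lcufft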
Now I can build the source code, but my program does not work!
I read this error:
modprobe: ERROR: could not insert 'nvidia_331_uvm': Invalid argument
and I receive a message programmed by me when "cudaGetLastError() != cudaSuccess" after "cudaMalloc".
To be precise, I read "Cuda error: allocation failed" for this snippet of code:
cudaMalloc((void**)&out_device, sizeof(cufftDoubleComplex)*NX*BATCH);
if (cudaGetLastError() != cudaSuccess) {
    printf("Cuda error: allocation failed\n");
    return 0;
}
Run these commands in sequence:
sudo apt-get remove --purge nvidia-*
sudo apt-get install cuda-drivers
sudo apt-get install nvidia-nsight
Restart the machine, open Nsight, and check in the Preferences whether the driver is now detected.
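After the reboot, a quick way to confirm from a terminal that the driver and the UVM module are healthy (standard commands, not from the original thread):
nvidia-smi            # should report the driver version and list the GeForce 410M
lsmod | grep nvidia   # the nvidia and nvidia_uvm modules should both be loaded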
In the samples provided with CUDA 6.0, I'm running the following compile command with error output:
foo@foo:/usr/local/cuda-6.0/samples/0_Simple/cdpSimpleQuicksort$ nvcc --cubin -I../../common/inc cdpSimpleQuicksort.cu
nvcc warning : The 'compute_10' and 'sm_10' architectures are deprecated, and may be removed in a future release.
cdpSimpleQuicksort.cu(105): error: calling a __global__ function("cdp_simple_quicksort") from a __global__ function("cdp_simple_quicksort") is only allowed on the compute_35 architecture or above
cdpSimpleQuicksort.cu(114): error: calling a __global__ function("cdp_simple_quicksort") from a __global__ function("cdp_simple_quicksort") is only allowed on the compute_35 architecture or above
2 errors detected in the compilation of "/tmp/tmpxft_0000241a_00000000-6_cdpSimpleQuicksort.cpp1.ii".
I then altered the command to this, with a new failure:
foo@foo:/usr/local/cuda-6.0/samples/0_Simple/cdpSimpleQuicksort$ nvcc --cubin -I../../common/inc -gencode arch=compute_35,code=sm_35 cdpSimpleQuicksort.cu
cdpSimpleQuicksort.cu(105): error: kernel launch from __device__ or __global__ functions requires separate compilation mode
cdpSimpleQuicksort.cu(114): error: kernel launch from __device__ or __global__ functions requires separate compilation mode
2 errors detected in the compilation of "/tmp/tmpxft_000024f3_00000000-6_cdpSimpleQuicksort.cpp1.ii".
Does this have anything to do with the fact that the machine I'm on is only compute capability 2.1 and the build tools are blocking me? What's the resolution? I'm not finding anything in the documentation that clearly handles this error.
I looked at this question, and that one... a link to the documentation is simply not helping. I need to know how I have to modify the compile command.
Look at the makefile that comes with the cdpSimpleQuicksort project. It shows some additional switches that are needed to compile it, due to CUDA dynamic parallelism (which is essentially what the second set of errors is about). Go back and study that makefile, and see if you can figure out how to combine some of the compile commands there with --cubin.
The Reader's Digest version is that this should compile without error:
nvcc --cubin -rdc=true -I../../common/inc -arch=sm_35 cdpSimpleQuicksort.cu
Having said all that, you should be able to compile for whatever kind of target you want, but you won't be able to run CDP code on a cc2.1 architecture.
See the cdp documentation, and here.
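For reference, the construct that triggers both sets of errors is a kernel launching another kernel; a minimal sketch (with hypothetical kernel names) that needs exactly the -rdc=true -arch=sm_35 combination above:

__global__ void child(int *data) { }

__global__ void parent(int *data)
{
    // device-side launch: legal only on compute_35+ with relocatable device code
    child<<<1, 32>>>(data);
}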
For a computer with Titan GPU (compute_35,sm_35), I compiled some code using this line in CMakeLists.txt:
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-gencode arch=compute_35,code=sm_35)
The code compiles and also runs fine.
I wanted to check what compilation problems this code would cause for a friend who uses a GTS 450 (compute_20,sm_21). So, I changed the above line to:
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-gencode arch=compute_20,code=sm_21)
The code compiles without any errors on my computer with the Titan. But when I run it (again on my Titan computer), it fails after a thrust::copy call with the following error:
$ ./foobar
terminate called after throwing an instance of 'thrust::system::system_error'
what(): invalid device function
"foobar" terminated by signal SIGABRT (Abort)
Google says the above error is caused by a GPU architecture mismatch.
The strangest part is that with the above line (arch=compute_20,code=sm_21), the code compiles and runs without error on my friend's computer with GTS 450! Except for the GPU, her Ubuntu 12.04, gcc and CUDA SDK 5.5 versions are the same as mine.
Is this the real cause of the error? Why can't the Titan run compute_20 code? Isn't a CUDA GPU supposed to be backwards compatible with PTX or SASS code? Even if it isn't, why can't the driver JIT-compile the compute_20 PTX to sm_35 SASS?
If you specify:
-gencode arch=compute_20,code=compute_20
your code should run (via JIT) on either GPU.
According to the nvcc manual, JIT is directly enabled when you specify a virtual architecture for the code switch. You can make multiple specifications in a single command:
-arch=compute_20 -code=compute_20,sm_21,sm_35
(note this is in lieu of specifying -gencode ...)
which would allow JIT from the compute_20 PTX, plus non-JIT execution directly on cc2.1 or cc3.5 devices.
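Alternatively, a sketch of a single fat-binary build that would run natively on both her GTS 450 and your Titan, with compute_20 PTX retained as a JIT fallback for newer devices (foobar.cu stands in for your actual source file):
nvcc -gencode arch=compute_20,code=sm_21 \
     -gencode arch=compute_35,code=sm_35 \
     -gencode arch=compute_20,code=compute_20 \
     foobar.cu -o foobar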