Why is nvlink warning me about lack of sm_20 (compute capability 2.0) object code? - cuda

I'm working with CUDA 6.5 on a machine with a GTX Titan card (compute capability 3.5). I'm building my code with just -gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_35,code=sm_35 - and when I link my binary, nvlink says:
nvlink warning : SM Arch ('sm_20') not found in '/local/eyalroz/src/foo/CMakeFiles/tester.dir/src/./tester_generated_main.cu.o'
Why is it warning me about that? Do I need sm_20 for something I'm not aware of? If it's merely about the lack of lower compute capability support, why not sm_10 as well? (Also, how do I turn off the warning, if it's gratuitous?)

The issue was identified in CUDA 6.5 and, I believe, rectified in CUDA 7.5. Using the latest version of CUDA should make those warnings go away.

Just ignore it
I'm on the CUDA 8.0 RC and have the same issue:
nvlink warning : SM Arch ('sm_20') not found in 'cudainfo.o'
Compile command:
/usr/local/cuda/bin/nvcc -g -O2 -Iyes/include -Iyes/include -I. -gencode arch=compute_35,code=sm_35 -rdc=true --ptxas-options=-v -I./compat/jansson -o cudainfo.o -c cudainfo.cu
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Wed_May__4_21:01:56_CDT_2016
Cuda compilation tools, release 8.0, V8.0.26
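If you want to confirm which SM architectures an object file actually embeds, cuobjdump can list them. A minimal sketch, assuming a trivial dummy.cu compiled with the same -gencode flags as in the question (file names and the exact output lines are illustrative):
// dummy.cu - trivial translation unit for inspecting embedded architectures
__global__ void dummy() {}
// nvcc -gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_35,code=sm_35 -c dummy.cu
// cuobjdump --list-elf dummy.o
// should list one cubin per requested SM architecture, something like:
// ELF file    1: dummy.1.sm_30.cubin
// ELF file    2: dummy.2.sm_35.cubin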

Related

Undefined reference to 'cublasCreate_v2' in '/tmp/tmpxft_0000120b_0000000-10_my_program'

I have tried to compile code that uses the CUBLAS library with the CUDA 9.0 toolkit on an NVIDIA Tesla P100 graphics card (Ubuntu 16.04). I used the following command to compile "my_program.cu":
nvcc -std=c++11 -L/usr/local/cuda-9.0/lib64 my_program.cu -o my_program.o -lcublas
But, I have got the following error:
nvlink error : Undefined reference to 'cublasCreate_v2' in '/tmp/tmpxft_0000120b_0000000-10_my_program'
Since I have already linked the library in the compilation command, why do I still get the error? Please help me solve it.
It seems fairly evident that you are trying to use the CUBLAS library in device code. This is different from ordinary host usage and requires special compilation/linking steps. You need to:
compile for the correct device architecture (must be cc3.5 or higher)
use relocatable device code linking
link in the cublas device library (in addition to the cublas host library)
link in the CUDA device runtime library
Use a CUDA toolkit prior to CUDA 10.0
The following additions to your compile command line should get you there:
nvcc -std=c++11 my_program.cu -o my_program.o -lcublas -arch=sm_60 -rdc=true -lcublas_device -lcudadevrt
The above assumes you are actually using a proper install of CUDA 9.0. The CUBLAS device library was deprecated and is now removed from newer CUDA toolkits (see here).
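For reference, a minimal sketch of the device-side CUBLAS pattern this answer describes. The kernel body is illustrative; it assumes a CUDA 9.x toolkit, where cublas_device still ships:
#include <cublas_v2.h>

// Illustrative kernel that creates a CUBLAS handle on the device and
// issues a GEMM from device code (requires -rdc=true and cc 3.5 or higher).
__global__ void device_gemm(int n, const float *A, const float *B, float *C)
{
    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS)
        return;
    const float alpha = 1.0f;
    const float beta  = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, A, n, B, n, &beta, C, n);
    cublasDestroy(handle);
}

// Build as in the answer:
// nvcc -std=c++11 my_program.cu -o my_program.o -lcublas -arch=sm_60 -rdc=true -lcublas_device -lcudadevrt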

Segmentation fault when compiling Darknet for GPU

I want to compile the Darknet framework for machine learning on my PC with GPU support. However, when I call make, I get a segmentation fault:
nvcc -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=[sm_50,compute_50] -gencode arch=compute_52,code=[sm_52,compute_52] -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv` -DGPU -I/usr/local/cuda/include/ --compiler-options "-Wall -Wno-unused-result -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU" -c ./src/convolutional_kernels.cu -o obj/convolutional_kernels.o
Segmentation fault (core dumped)
Makefile:92: recipe for target 'obj/convolutional_kernels.o' failed
make: *** [obj/convolutional_kernels.o] Error 139
nvidia-smi gives me following information:
NVIDIA-SMI 418.87.01 Driver Version: 418.87.01 CUDA Version: 10.1
When I do nvcc --version I get:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
The CUDA version 10.1 reported by the driver is not the same as version 9.1 of the CUDA compilation tools. Could this be the problem? nvcc was installed via apt install nvidia-cuda-toolkit.
Just going to post my solution here because I figured out the actual reason for this. It happens because make runs a different nvcc binary than the one Darknet actually wants. At least for me, which nvcc gave /usr/bin/nvcc, while the nvcc you want is located in /usr/local/cuda-11.1/bin (the version number may obviously differ). So all you need to do is prepend (important!) that directory to your PATH variable:
export PATH=/usr/local/cuda-11.1/bin${PATH:+:${PATH}}
To make the change permanent, add that line to ~/.bashrc.
Source:https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions
I recommend you follow the link because there are a couple more mandatory post-installation steps that I also did not follow.
I solved it as follows. After installing CUDA, the actual nvcc binary is at /usr/local/cuda/bin/nvcc; creating a symbolic link to this binary in /usr/bin/ fixed the build.
Another approach is to edit the Makefile and set the correct nvcc. In my case, on line 24 I replaced
NVCC=nvcc
with
NVCC=/usr/local/cuda-11.0/bin/nvcc
Note that the CUDA version may vary.
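Whichever fix you choose, a quick way to confirm the right nvcc is now being picked up is to build and run a trivial kernel with it. A minimal sketch; the file name, architecture, and paths are illustrative:
// probe.cu - minimal end-to-end check of the toolchain
#include <cstdio>

__global__ void probe()
{
    printf("hello from the GPU\n");
}

int main()
{
    probe<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}

// which nvcc                          # should now report /usr/local/cuda-11.1/bin/nvcc
// nvcc -arch=sm_52 probe.cu -o probe && ./probe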

What is the minimum compute capability for CUDA compilation supported by LLVM compiler?

A CUDA source file can be compiled into PTX format using the LLVM compiler with the command:
clang -Xclang -I$LIBCLC/include/generic -I$LIBCLC/include/ptx -Dcl_clang_storage_class_specifiers -O3 cudaFile.cu -S -o ptxOutputFile.ptx --cuda-gpu-arch=sm_XX
where sm_XX can be replaced with sm_20, sm_30, and so on. For compute capability 1.0, replacing sm_XX with sm_10 gives the error:
fatal error: cannot open file '/tmp/shared-25f2f5.s': No such file or directory
1 error generated.
So it seems LLVM has a minimum compute capability of 2.0. Is this assumption correct?
It should be correct. As of CUDA 7.0, both toolkit and driver support for sm_1x has been dropped. If sm_20 works, it has to be the minimum.
CUDA Toolkit and CUDA Driver Support for Tesla Architecture
The CUDA Toolkit and CUDA Driver no longer supports the sm_10, sm_11, sm_12, and sm_13 architectures. As a consequence, CU_TARGET_COMPUTE_1x enum values have been removed from the CUDA headers.
http://developer.download.nvidia.com/compute/cuda/7_0/Prod/doc/CUDA_Toolkit_Release_Notes.pdf
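For experimenting, a minimal kernel along the lines of the LLVM documentation's example can be fed to the clang command above (names are illustrative):
// axpy.cu - minimal kernel to exercise clang's PTX backend
__global__ void axpy(float a, float *x, float *y)
{
    int i = threadIdx.x;
    y[i] = a * x[i] + y[i];
}

// With --cuda-gpu-arch=sm_20 the command should produce PTX;
// with --cuda-gpu-arch=sm_10 it fails as described above.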

nvcc -arch sm_52 gives error "Value 'sm_52' is not defined for option 'gpu-architecture'"

I updated my CUDA toolkit from 5.5 to 6.5. Then the following command
nvcc -arch=sm_52
started giving me an error:
nvcc fatal : Value 'sm_52' is not defined for option 'gpu-architecture'
Is this a bug, or does nvcc 6.5 not support the Maxwell virtual architecture?
CUDA Toolkit 6.5 was released before the sm_52 architecture came into production.
After the arrival of the sm_52 architecture, an updated CUDA 6.5 was released that enabled nvcc to generate code for sm_52.
Make sure you download the newer version of the CUDA Toolkit 6.5.
P.S.: I would rather use the latest version of the toolkit (currently 7.0).
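A quick way to check whether an installed nvcc accepts a given architecture is to compile an empty kernel for it. A minimal sketch; the file name is illustrative:
// probe.cu - empty kernel, just to exercise the -arch flag
__global__ void noop() {}

// nvcc -arch=sm_52 -c probe.cu -o probe.o
// succeeds on the updated CUDA 6.5 release, fails with the original one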

nvcc fatal : Unknown option 'fmad'

Compiling John the Ripper with CUDA on Ubuntu 12.04 x86:
nvcc -c -Xptxas -v -fmad=true -arch sm_10 cuda_common.cu -o ../cuda_common.o
nvcc fatal : Unknown option 'fmad'
make[1]: *** [cuda_common.o] Error 255
make: *** [linux-x86-cuda] Error 2
Try removing the fmad option or increasing your compute capability target.
-fmad=true is not a valid option for your targeted compute capability (1.0). fmad (fused multiply-add) became available with compute capability 2.0 (Fermi).
From the nvcc help:
--fmad=true and --fmad=false enables and disables the contraction respectively. This switch is supported only when the --gpu-architecture option is set with compute_20, sm_20, or higher. For other architecture classes, the contraction is always enabled.
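To see what the switch controls, here is a small illustrative kernel in which the a*b + c pattern is eligible for contraction into a single fused multiply-add:
// fmad_demo.cu - the multiply-add below may be contracted into an FMA
__global__ void fmad_demo(const float *a, const float *b, float *c)
{
    int i = threadIdx.x;
    c[i] = a[i] * b[i] + c[i];  // typically contracted unless --fmad=false
}

// nvcc -arch=sm_20 --fmad=true  -ptx fmad_demo.cu   # default: typically fma.rn in the PTX
// nvcc -arch=sm_20 --fmad=false -ptx fmad_demo.cu   # separate multiply and add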