nvcc unable to compile - cuda

I have installed the CUDA toolkit on my PC, but something seems broken. nvcc is unable to compile anything, not even a simple hello-world like this:
#include <stdio.h>

int main(int argc, char** argv) {
    printf("Hello, world!\n");
    return 0;
}
The output is:
$ nvcc hello.cu
/usr/include/c++/4.8.0/cstdlib(178): error: identifier "__int128" is undefined
/usr/include/c++/4.8.0/cstdlib(179): error: identifier "__int128" is undefined
2 errors detected in the compilation of "/tmp/tmpxft_000011a2_00000000-6_hello.cpp1.ii".
The verbose output is:
$ nvcc --verbose hello.cu
#$ _SPACE_=
#$ _CUDART_=cudart
#$ _HERE_=/opt/cuda/bin
#$ _THERE_=/opt/cuda/bin
#$ _TARGET_SIZE_=64
#$ TOP=/opt/cuda/bin/..
#$ LD_LIBRARY_PATH=/opt/cuda/bin/../lib:
#$ PATH=/opt/cuda/bin/../open64/bin:/opt/cuda/bin/../nvvm:/opt/cuda/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/android-sdk/platform-tools:/opt/android-sdk/tools:/opt/android-studio/bin:/opt/cuda/bin:/extra/usr/bin:/opt/java/bin:/opt/java/db/bin:/opt/java/jre/bin:/usr/bin/core_perl:/usr/lib/smlnj/bin:.:.
#$ INCLUDES="-I/opt/cuda/bin/../include"
#$ LIBRARIES= "-L/opt/cuda/bin/../lib64" -lcudart
#$ CUDAFE_FLAGS=
#$ OPENCC_FLAGS=
#$ PTXAS_FLAGS=
#$ gcc -D__CUDA_ARCH__=100 -E -x c++ -DCUDA_FLOAT_MATH_FUNCTIONS -DCUDA_NO_SM_11_ATOMIC_INTRINSICS -DCUDA_NO_SM_12_ATOMIC_INTRINSICS -DCUDA_NO_SM_13_DOUBLE_INTRINSICS -D__CUDACC__ -D__NVCC__ "-I/opt/cuda/bin/../include" -include "cuda_runtime.h" -m64 -o "/tmp/tmpxft_000011e1_00000000-6_hello.cpp1.ii" "hello.cu"
#$ cudafe --m64 --gnu_version=40800 -tused --no_remove_unneeded_entities --gen_c_file_name "/tmp/tmpxft_000011e1_00000000-3_hello.cudafe1.c" --stub_file_name "/tmp/tmpxft_000011e1_00000000-3_hello.cudafe1.stub.c" --gen_device_file_name "/tmp/tmpxft_000011e1_00000000-3_hello.cudafe1.gpu" --nv_arch "compute_10" --gen_module_id_file --module_id_file_name "/tmp/tmpxft_000011e1_00000000-2_hello.module_id" --include_file_name "tmpxft_000011e1_00000000-1_hello.fatbin.c" "/tmp/tmpxft_000011e1_00000000-6_hello.cpp1.ii"
/usr/include/c++/4.8.0/cstdlib(178): error: identifier "__int128" is undefined
/usr/include/c++/4.8.0/cstdlib(179): error: identifier "__int128" is undefined
2 errors detected in the compilation of "/tmp/tmpxft_000011e1_00000000-6_hello.cpp1.ii".
# --error 0x2 --
Some info about my system:
The OS is Chakra Linux (an Arch-based distro) 64 bit. Currently I'm using the gcc-multilib version:
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /chakra/lib32-testing/gcc-multilib/src/gcc-4.8-20130411/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://chakra-project.org/bugs --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --enable-gnu-unique-object --enable-linker-build-id --enable-cloog-backend=isl --disable-cloog-version-check --enable-lto --enable-gold --enable-ld=default --enable-plugin --with-plugin-ld=ld.gold --with-linker-hash-style=gnu --disable-install-libiberty --enable-multilib --disable-libssp --disable-werror --enable-checking=release
Thread model: posix
gcc version 4.8.0 20130411 (prerelease) (GCC)
Info about CUDA toolkit:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2012 NVIDIA Corporation
Built on Fri_Sep_21_17:28:58_PDT_2012
Cuda compilation tools, release 5.0, V0.2.1221
Last but not least, this is the problematic header file: http://pastebin.com/WtUckrYv
Thank you in advance.

CUDA 5.0 is not compatible with gcc 4.8.0.
For a better understanding of compatible distros and gcc versions, refer to the release notes.
You may have better luck with the CUDA 5.5 RC, but that particular distro, as well as the gcc version you mention, are still not listed in the release notes under Linux support.
The best experience will most likely be found by switching to a supported distro.
Normally, nvcc in CUDA 5 will automatically include the file /usr/local/cuda/include/host_config.h, which has various macros to check for correct/acceptable compiler versions, and one of those macros should have thrown an error in your case, but it seems it did not:
#if defined(__GNUC__)
#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 6)
#error -- unsupported GNU version! gcc 4.7 and up are not supported!
#endif /* __GNUC__> 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 6) */
#endif /* __GNUC__ */
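A common workaround on rolling distros (not an officially supported configuration) is to keep an older gcc alongside the system compiler and point nvcc at it with -ccbin; the gcc 4.6 path below is an assumption for illustration:
# Hypothetical path: tell nvcc to use an older, supported host compiler
# instead of the default gcc 4.8
nvcc -ccbin=/usr/bin/gcc-4.6 hello.cu -o hello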

In my case the error appeared only when I tried to compile using LLVM as the back-end, i.e.:
nvcc -ccbin=/usr/bin/clang -arch=sm_20 -m64 main.cu
So after looking into the above-mentioned header I added:
-D__STRICT_ANSI__
So the final compilation command looked like:
nvcc -ccbin=/usr/bin/clang -arch=sm_20 -m64 main.cu -D__STRICT_ANSI__
I looked at the clang dev forums and it appears to be a known issue, but I didn't dig deeper.
Hope this helps.

If you are having trouble with gcc-4.7+ even with cuda-5.5, check whether you also have an older version of nvidia-cuda-toolkit installed. I purged all the nvidia-cuda-* packages and it went fine with cuda-5.5 (I had gcc-4.8.1).

This is a solution that at least worked for me; I have CUDA 5.0 and gcc 4.8 installed.
You just have to add:
#undef _GLIBCXX_USE_INT128
just before line #129:
#if !defined(__STRICT_ANSI__) && defined(_GLIBCXX_USE_INT128)
in file /usr/include/c++/4.8.1/cstdlib
By doing this, I could compile my code again.
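For reference, the patched region of the header would then look roughly like this (the __int128 abs overload shown is reconstructed from the error messages, so your copy of the file may differ slightly):
#undef _GLIBCXX_USE_INT128  // added: hide the __int128 overloads from nvcc/cudafe
#if !defined(__STRICT_ANSI__) && defined(_GLIBCXX_USE_INT128)
inline __int128
abs(__int128 __x) { return __x >= 0 ? __x : -__x; }
#endif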

Related

Apple Clang 13 C++20 Module Support Missing

According to official documentation, Clang 13 supports C++20 modules through the use of the -fmodules command-line parameter.
I cannot even get a basic module to compile using Clang 13 (macOS Monterey) on either Intel or M1 based Macs.
Assuming the following text contents of file module.cpp:
export module a;

export int f(int a, int b) {
    return a + b;
}
Running the following:
$ clang++ --version
Apple clang version 13.0.0 (clang-1300.0.29.3)
Target: x86_64-apple-darwin21.1.0
$ clang++ -std=c++20 -stdlib=libc++ -fmodules -fbuiltin-module-map -c module.cpp
module.cpp:1:8: error: expected template
export module a;
^
module.cpp:1:8: error: unknown type name 'module'
module.cpp:3:8: error: expected template
export int f(int a, int b) {
^
3 errors generated.
Tested with identical results on an ARM M1 chip:
$ clang++ --version
Apple clang version 13.0.0 (clang-1300.0.29.3)
Target: arm64-apple-darwin21.1.0
Is there another way to get modules working OR is there some undocumented C++20 modules limitation with Apple Clang 13?
Note: compiling with the experimental -fmodules-ts flag works.
Cause
Thanks for the comments - the errors above are an indication of a Clang version that was built without module support. This is the version Xcode comes with, i.e. the one installed by running xcode-select --install in a terminal.
Solution
As suggested, the solution is to install Clang through Homebrew, which is done as follows (tested on macOS Monterey):
brew install llvm
Clang gets installed to /opt/homebrew/opt/llvm/bin/clang++. Confirm the running version as shown below:
% /opt/homebrew/opt/llvm/bin/clang++ --version
Homebrew clang version 13.0.0
Target: arm64-apple-darwin21.1.0
Thread model: posix
InstalledDir: /opt/homebrew/opt/llvm/bin
This is a different build from the Xcode system-wide default version:
% clang++ --version
Apple clang version 13.0.0 (clang-1300.0.29.3)
Target: arm64-apple-darwin21.1.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
Working Example
Steps to see a working example, loosely based on the repo posted by @alexpanter:
main.cpp
import <iostream>;
import mathlib;

using namespace std;

int main() {
    cout << "Modules, baby!" << endl;
    cout << "2 plus 3 makes " << add(2, 3) << " says module 'mathlib'" << endl;
}
mathlib.cpp
export module mathlib;

export int add(int a, int b)
{
    return a + b;
}
Build by running the following in a terminal in the same directory as the files above:
/opt/homebrew/opt/llvm/bin/clang++ -std=c++20 -c -Xclang -emit-module-interface mathlib.cpp -o mathlib.pcm
/opt/homebrew/opt/llvm/bin/clang++ -std=c++20 -fmodules -c -fprebuilt-module-path=. main.cpp -o main.o
/opt/homebrew/opt/llvm/bin/clang++ -std=c++2a -fmodules -o main main.o *.pcm
Test module-based executable:
./main
Expected output:
Modules, baby!
2 plus 3 makes 5 says module 'mathlib'

Segmentation fault when compiling Darknet for GPU

I want to compile the Darknet framework for machine learning on my PC with GPU support. However, when I call make I get a segmentation fault:
nvcc -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=[sm_50,compute_50] -gencode arch=compute_52,code=[sm_52,compute_52] -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv` -DGPU -I/usr/local/cuda/include/ --compiler-options "-Wall -Wno-unused-result -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU" -c ./src/convolutional_kernels.cu -o obj/convolutional_kernels.o
Segmentation fault (core dumped)
Makefile:92: recipe for target 'obj/convolutional_kernels.o' failed
make: *** [obj/convolutional_kernels.o] Error 139
nvidia-smi gives me following information:
NVIDIA-SMI 418.87.01 Driver Version: 418.87.01 CUDA Version: 10.1
When I do nvcc --version I get:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
The CUDA version 10.1 is not the same as version 9.1 of the CUDA compilation tools. Could this be the problem? nvcc was installed via apt install nvidia-cuda-toolkit.
Just going to post my solution here because I figured out the actual reason for this. It happens because make runs a different nvcc binary than the one darknet wants. At least for me, which nvcc gave /usr/bin/nvcc; the nvcc you actually want is located in /usr/local/cuda-11.1/bin (the version number may obviously differ). So all you need to do is prepend (important!) that directory to your PATH variable:
export PATH=/usr/local/cuda-11.1/bin${PATH:+:${PATH}}
To make this persistent, append the same line to ~/.bashrc:
echo 'export PATH=/usr/local/cuda-11.1/bin${PATH:+:${PATH}}' >> ~/.bashrc
Source: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions
I recommend following that link, because there are a couple more mandatory post-installation steps that I had also skipped.
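One of those steps, for example, is making the CUDA libraries visible to the dynamic linker (again, adjust the version number to your installation):
export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}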
I solved the problem. After installing CUDA, the actual nvcc binary is at /usr/local/cuda/bin/nvcc; creating a symbolic link to it in /usr/bin/ fixed the build.
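A minimal sketch of that link (assuming nothing else on your system owns /usr/bin/nvcc):
sudo ln -s /usr/local/cuda/bin/nvcc /usr/bin/nvcc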
Another approach is to edit the Makefile and set the correct nvcc.
In my case:
replace line 24
NVCC=nvcc
with
NVCC=/usr/local/cuda-11.0/bin/nvcc
Note that the CUDA version may vary.

CUDA code fails on Pascal cards (GTX 1080)

I tried running an executable which uses separable compilation on a GTX 1080 today (compute capability 6.1, which is not directly supported by CUDA 7.5), and wasn't able to run it, as the first CUDA call fails. I have traced it down to cuBLAS, as this simple program (which doesn't even use cuBLAS)
#include <cuda_runtime_api.h>
#include <cstdio>
__global__ void foo()
{
}

int main(int, char**)
{
    void* data = nullptr;
    auto err = cudaMalloc(&data, 256);
    printf("%s\n", cudaGetErrorString(err));
    return 0;
}
fails (outputs "unknown error") if built using
nvcc -dc --gpu-architecture=compute_52 -m64 main.cu -o main.dc.obj
nvcc -dlink --gpu-architecture=compute_52 -m64 -lcublas_device main.dc.obj -o main.obj
link /SUBSYSTEM:CONSOLE /LIBPATH:"%CUDA_PATH%\lib\x64" main.obj main.dc.obj cudart_static.lib cudadevrt.lib cublas_device.lib
And works (outputs "no error") if built using
nvcc -dc --gpu-architecture=compute_52 -m64 main.cu -o main.dc.obj
nvcc -dlink --gpu-architecture=compute_52 -m64 main.dc.obj -o main.obj
link /SUBSYSTEM:CONSOLE /LIBPATH:"%CUDA_PATH%\lib\x64" main.obj main.dc.obj cudart_static.lib cudadevrt.lib
Even if built using the CUDA 8 release candidate, and compute_61 instead, it still fails as long as cublas_device.lib is linked.
Analysis of the simpleDevLibCublas example shows that it is built for a set of real architectures (sm_xx), not for virtual architectures (compute_xx); therefore, the example in CUDA 7.5 does not run on newer cards. Furthermore, the same example in CUDA 8RC includes only one additional architecture, sm_60, which is used only by the P100; however, that example does run on 6.1 cards such as the GTX 1080 as well. Support for the sm_61 architecture is not included in cuBLAS even in CUDA 8RC.
Therefore, the program will work if built using --gpu-architecture=sm_60 even when linking cublas_device, but will not work with --gpu-architecture=compute_60, --gpu-architecture=sm_61, --gpu-architecture=compute_61, or any --gpu-architecture=compute_xx for that matter.
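For illustration, the failing build above should therefore work on a GTX 1080 once the real architecture sm_60 is targeted (same hypothetical file names as before):
nvcc -dc --gpu-architecture=sm_60 -m64 main.cu -o main.dc.obj
nvcc -dlink --gpu-architecture=sm_60 -m64 -lcublas_device main.dc.obj -o main.obj
link /SUBSYSTEM:CONSOLE /LIBPATH:"%CUDA_PATH%\lib\x64" main.obj main.dc.obj cudart_static.lib cudadevrt.lib cublas_device.lib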

Launch kernel from another kernel in CUDA [duplicate]

I am doing dynamic parallelism programming using CUDA 5.5 and an NVIDIA GeForce GTX 780, whose compute capability is 3.5. I am calling a kernel function inside a kernel function, but it is giving me an error:
error : calling a __global__ function("kernel_6") from a __global__ function("kernel_5") is only allowed on the compute_35 architecture or above
What am I doing wrong?
You can do something like this
nvcc -arch=sm_35 -rdc=true simple1.cu -o simple1 -lcudadevrt
or
If you have two files, simple1.cu and test.c, then you can do something like the following. This is called separate compilation.
nvcc -arch=sm_35 -dc simple1.cu
nvcc -arch=sm_35 -dlink simple1.o -o link.o -lcudadevrt
g++ -c test.c
g++ link.o simple1.o test.o -o simple -L/usr/local/cuda/lib64/ -lcudart
The same is explained in the CUDA programming guide.
From Visual Studio 2010:
1) View -> Property Pages
2) Configuration Properties -> CUDA C/C++ -> Common -> Generate Relocatable Device Code -> Yes (-rdc=true)
3) Configuration Properties -> CUDA C/C++ -> Device -> Code Generation -> compute_35,sm_35
4) Configuration Properties -> Linker -> Input -> Additional Dependencies -> cudadevrt.lib
You need to let nvcc generate CC 3.5 code for your device. This can be done by adding this option to nvcc command line.
-gencode arch=compute_35,code=sm_35
You may refer to the CUDA samples on dynamic parallelism for more detail. They contain both command-line options and project settings for all supported OSes.
http://docs.nvidia.com/cuda/cuda-samples/index.html#simple-quicksort--cuda-dynamic-parallelism-
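For a concrete picture, here is a minimal dynamic-parallelism sketch (hypothetical kernels, loosely mirroring the kernel_5/kernel_6 pair from the error message), to be built with the sm_35 flags shown above:
#include <cstdio>

__global__ void child_kernel()   // launched from the device
{
    printf("child: thread %d\n", threadIdx.x);
}

__global__ void parent_kernel()  // launches child_kernel from the GPU
{
    child_kernel<<<1, 4>>>();    // requires compute capability 3.5+ and -rdc=true
}

int main()
{
    parent_kernel<<<1, 1>>>();   // host-side launch of the parent
    cudaDeviceSynchronize();     // wait for the parent and all child grids
    return 0;
}
Compile with, e.g.: nvcc -arch=sm_35 -rdc=true dp.cu -o dp -lcudadevrt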
