I installed CUDA 5 recently and found that my existing Thrust-based code no longer compiles. The error only happens when I switch to the OMP or TBB backend.
So I did an experiment using monte_carlo.cpp from the Thrust examples.
When I use the CUDA 5.0 include path, I get this error:
g++ -O2 -o monte_carlo monte_carlo.cpp
-DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP -fopenmp -I /usr/local/cuda-5.0/include/
/tmp/ccFsJtAs.o: In function `main':
monte_carlo.cpp:(.text+0xa0): undefined reference to `float thrust::detail::backend::cuda::reduce_n<thrust::transform_iterator<..., float, thrust::use_default>, long, float, thrust::plus<float> >(thrust::transform_iterator<..., float, thrust::use_default>, long, float, thrust::plus<float>)'
But if I change to CUDA 4.1 using
g++ -O2 -o monte_carlo monte_carlo.cpp
-DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP -fopenmp -I /usr/local/cuda-4.1/include/
There is no error.
My platform is Ubuntu 10.04 with g++ 4.4.3.
I hope someone can help, thanks!
Edit
The OMP problem is solved by changing the position of -fopenmp, as @Robert pointed out, but I still fail to compile using TBB:
g++ -O2 -o monte_carlo monte_carlo.cpp -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_TBB -ltbb -I /usr/local/cuda/include/
/tmp/ccxSmcnJ.o: In function `main':
monte_carlo.cpp:(.text+0xa0): undefined reference to `float thrust::detail::backend::cuda::reduce_n<thrust::transform_iterator<..., float, thrust::use_default>, long, float, thrust::plus<float> >(thrust::transform_iterator<..., float, thrust::use_default>, long, float, thrust::plus<float>)'
collect2: ld returned 1 exit status
But the compilation succeeds if I use
g++ -O2 -o monte_carlo monte_carlo.cpp -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_TBB -ltbb -I /usr/local/cuda-4.1/include/
The OpenMP compilation failure would appear to have been caused by incorrectly specified compilation arguments. Compiling using
g++ -O2 -o monte_carlo monte_carlo.cpp -fopenmp -DTHRUST_DEVICE_BACKEND=THRUST_DEVICE_BACKEND_OMP -lgomp -I/usr/local/cuda/include
(i.e. specifying OpenMP code generation before any pre-processor directives) allowed for correct compilation using the Thrust OpenMP backend.
The reported TBB backend compilation error would appear to have been caused by attempting to use the TBB backend with Thrust 1.5.3, which has no TBB support.
[This answer has been assembled from question edits and comments to get the question off the unanswered list for the CUDA tag]
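For reference, here is a minimal sketch of a backend-agnostic Thrust program along the lines of the Monte Carlo example (the file name pi.cpp and the functor are made up; this is not the shipped monte_carlo.cpp). The device backend is selected purely at compile time via -DTHRUST_DEVICE_SYSTEM:

// pi.cpp -- minimal sketch of a backend-agnostic Thrust program.
// Build for the OMP backend (note -fopenmp comes before the define):
//   g++ -O2 -fopenmp -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP \
//       -I /usr/local/cuda/include pi.cpp -o pi
#include <thrust/iterator/counting_iterator.h>
#include <thrust/transform_reduce.h>
#include <thrust/functional.h>
#include <cstdio>

struct square
{
  __host__ __device__
  float operator()(float x) const { return x * x; }
};

int main()
{
  // Sum of squares of 0..999, computed by whichever backend was selected.
  thrust::counting_iterator<int> first(0), last(1000);
  float sum = thrust::transform_reduce(first, last, square(), 0.0f, thrust::plus<float>());
  std::printf("sum = %f\n", sum);
  return 0;
}

Rebuilding the same source with -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_TBB -ltbb targets TBB instead, but only on a Thrust version that actually ships the TBB backend (1.6 or later).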
Here, Robert Crovella said that cuBLAS routines can be called from device code. Although I am using dynamic parallelism and compiling for compute capability 3.5, I cannot manage to call cuBLAS routines from a device function. I always get the error "calling a host function from a device/global function is not allowed". My code contains device functions which call cuBLAS routines like cublasAlloc, cublasGetVector, cublasSetVector and cublasDgemm.
My compilation and linking commands:
nvcc -arch=sm_35 -I. -I/usr/local/cuda/include -c -O3 -dc GPUutil.cu -o ./build/GPUutil.o
nvcc -arch=sm_35 -I. -I/usr/local/cuda/include -c -O3 -dc DivideParalelo.cu -o ./build/DivideParalelo.o
nvcc -arch=sm_35 -I. -I/usr/local/cuda/include -dlink ./build/io.o ./build/GPUutil.o ./build/DivideParalelo.o -lcudadevrt -o ./build/link.o
icc -Wwrite-strings ./build/GPUutil.o ./build/DivideParalelo.o ./build/link.o -lcudadevrt -L/usr/local/cuda/lib64 -L~/Intel/composer_xe_2015.0.090/mkl/lib/intel64 -L~/Intel/composer_xe_2015.0.090/mkl/../compiler/lib/intel64 -Wl,--start-group ~/Intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_intel_lp64.a ~/Intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_sequential.a ~/Intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_core.a ~/Intel/composer_xe_2015.0.090/mkl/../compiler/lib/intel64/libiomp5.a -Wl,--end-group -lpthread -lm -lcublas -lcudart -o DivideParalelo
Here you can find all the details about the cuBLAS device API, such as:
Starting with release 5.0, the CUDA Toolkit now provides a static cuBLAS Library cublas_device.a that contains device routines with the same API as the regular cuBLAS Library. Those routines use internally the Dynamic Parallelism feature to launch kernel from within and thus is only available for device with compute capability at least equal to 3.5.
In order to use those library routines from the device the user must include the header file “cublas_v2.h” corresponding to the new cuBLAS API and link against the static cuBLAS library cublas_device.a.
If you still experience issues even after reading through the documentation and applying all of the steps described there, then ask for additional assistance.
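As a concrete illustration, here is a hedged sketch of a kernel that calls cublasDgemm through the device-side cuBLAS API; the kernel name and build lines are illustrative, not the asker's actual code. Note also that, as far as I can tell, legacy host-side helpers such as cublasAlloc, cublasSetVector and cublasGetVector are not provided by cublas_device.a, which would explain the "calling a host function from a device/global function" error; from device code you use the cublas_v2.h compute routines plus handle creation and destruction.

// dgemm_dev.cu -- hedged sketch: calling cuBLAS from device code (CC >= 3.5).
// Build roughly along the lines of the CUDA simpleDevLibCUBLAS sample, e.g.:
//   nvcc -arch=sm_35 -dc dgemm_dev.cu -o dgemm_dev.o
//   nvcc -arch=sm_35 dgemm_dev.o <other objects> -o app -lcublas_device -lcudadevrt
#include <cublas_v2.h>

__global__ void dgemm_from_device(int n, const double *A, const double *B, double *C)
{
    // One thread issues the library call; the device-side cuBLAS routine
    // launches its own child kernels via dynamic parallelism.
    if (threadIdx.x == 0 && blockIdx.x == 0)
    {
        cublasHandle_t handle;
        const double one = 1.0, zero = 0.0;
        if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS)
            return;
        cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    n, n, n, &one, A, n, B, n, &zero, C, n);
        cublasDestroy(handle);
    }
}

As the quoted documentation says, the application must then be device-linked (nvcc -dc plus a device link step) against cublas_device.a and the device runtime.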
I'm just trying to build an old project of mine using the CUDA 5.0 preview.
I get an error when linking, telling me that certain CUDA functions cannot be found. For example:
undefined reference to 'cudaMalloc'.
My linking command includes the following options for cuda :
-L/usr/local/cuda/lib64 -L/home/myhome/NVIDIA_CUDA_Samples/C/lib -L/home/myhome/NVIDIA_CUDA_Samples/C/common/lib/linux -lcudart
ls -lah /usr/local/cuda/lib64/ gives me 8 CUDA libraries, including libcudart.so.5.0.7, with symlinks that use only the .so file ending.
ls /home/myhome/NVIDIA_CUDA_Samples/C/lib/ gives me an empty directory, which is kind of strange?
ls /home/myhome/NVIDIA_CUDA_Samples/C/common/lib/linux/ gives me two directories: i686 and x86_64 both containing only libGLEW.a
I have no idea which way to look for a solution. Any help is appreciated!
EDIT:
Here is my complete linking command (TARGET_APPLICATION is my binary and x86_64/Objectfiles.o stands for all (23) object files including the object file compiled with nvcc):
/home/myhome/nullmpi-0.7/bin/mpicxx -CC=g++ -I. -I/home/myhome/nullmpi-0.7/src -I/usr/lib/openmpi/include -L/usr/local/cuda/lib64 -L/home/myhome/NVIDIA_CUDA_Samples/C/lib -L/home/myhome/NVIDIA_CUDA_Samples/C/common/lib/linux -lcudart -o TARGET_APPLICATION x86_64/Objectfiles.o /usr/lib/liblapack.so /usr/lib/libblas.so /home/myhome/nullmpi-0.7/lib/libnullpmpi.a -lm
I use nullmpi for compilation and linking (the project uses MPI and CUDA), which internally uses g++, as can be seen from -CC=g++; I wanted to keep this stuff out of the question.
The compilation command for my cuda object file:
/usr/local/cuda/bin/nvcc -c -arch=sm_21 -L/home/myhome/NVIDIA_CUDA_Samples/C/lib -O3 kernelwrapper.cu -o x86_64/kernelwrapper.RELEASE.2.o
echo $LD_LIBRARY_PATH results in:
/usr/local/cuda/lib64:/usr/local/cuda/lib:
echo $PATH results in:
otherOptions:/usr/local/cuda/bin:/home/myhome/nullmpi-0.7/bin
I'm building 64-bit; for the sake of completeness, the build host is Ubuntu 12.04 (64-bit).
Building the CUDA Samples works fine.
SOLUTION (thanks to talonmies for pointing me to it):
This is the correct linking command:
/home/myhome/nullmpi-0.7/bin/mpicxx -CC=g++ -I. -I/home/myhome/nullmpi-0.7/src -I/usr/lib/openmpi/include -L/usr/local/cuda/lib64 -L/home/myhome/NVIDIA_CUDA_Samples/C/lib -L/home/myhome/NVIDIA_CUDA_Samples/C/common/lib/linux -o TARGET_APPLICATION x86_64/Objectfiles.o /usr/lib/liblapack.so /usr/lib/libblas.so /home/myhome/nullmpi-0.7/lib/libnullpmpi.a -lcudart -lm
You have your linking statements in the incorrect order. It should be something more like this:
/home/myhome/nullmpi-0.7/bin/mpicxx -CC=g++ -I. -I/home/myhome/nullmpi-0.7/src \
-I/usr/lib/openmpi/include -L/usr/local/cuda/lib64 \
-L/home/myhome/NVIDIA_CUDA_Samples/C/lib \
-L/home/myhome/NVIDIA_CUDA_Samples/C/common/lib/linux \
-o TARGET_APPLICATION x86_64/Objectfiles.o \
/home/myhome/nullmpi-0.7/lib/libnullpmpi.a -llapack -lblas -lm -lcudart
The source of your problem is that you have specified the CUDA runtime library before the object files that contain dependencies on it. The linker simply discards libcudart.so from the linkage because there are no unresolved dependencies on it at the point when it is processed. The golden rule in POSIX-style compilation statements: linkage is parsed left-to-right, so list objects containing external dependencies first and the libraries satisfying those dependencies afterwards.
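To see the rule in isolation, here is a hedged minimal example; the file name use_cudart.cpp is made up and it is not part of the asker's project:

// use_cudart.cpp -- one translation unit with a dependency on libcudart.
// Compile:  g++ -I/usr/local/cuda/include -c use_cudart.cpp
// Wrong:    g++ -L/usr/local/cuda/lib64 -lcudart use_cudart.o -o demo
//           (with Ubuntu's default --as-needed linking, libcudart is scanned
//           before anything needs it, gets dropped, and you see
//           "undefined reference to `cudaMalloc'")
// Right:    g++ use_cudart.o -L/usr/local/cuda/lib64 -lcudart -o demo
#include <cuda_runtime.h>

int main()
{
    void *p = 0;
    cudaMalloc(&p, 1024);   // this call is the undefined reference in question
    cudaFree(p);
    return 0;
}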
Sage is supposed to be able to create compiled code using Cython. I have never been able to get this to work. The problem appears to be with my Sage installation, since compiling fails on the included example. I don't believe I did anything special during installation, but apparently I did something wrong. The Sage tutorial says:
In order to make your own compiled Sage code, give the file an .spyx extension (instead of .sage). If you are working with the command-line interface, you can attach and load compiled code exactly like with interpreted code (at the moment, attaching and loading Cython code is not supported with the notebook interface). The actual compilation is done “behind the scenes” without your having to do anything explicit. See $SAGE_ROOT/examples/programming/sagex/factorial.spyx for an example of a compiled implementation of the factorial function that directly uses the GMP C library. To try this out for yourself, cd to $SAGE_ROOT/examples/programming/sagex/, then do the following:
sage: load "factorial.spyx"
When I try I get the following message:
Compiling ./factorial.spyx...
Error compiling cython file:
Error compiling ./factorial.spyx:
running build
running build_ext
building '_home_oliver_Desktop_sage_4_7_1_linux_32bit_ubuntu_10_04_lts_i686_Linux_examples_programming_sagex_factorial_spyx_0' extension
creating build
creating build/temp.linux-i686-2.6
gcc -fno-strict-aliasing -g -O2 -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC -I/home/oliver/Desktop/sage-4.7.1-linux-32bit-ubuntu_10.04_lts-i686-Linux/local/include/csage/ -I/home/oliver/Desktop/sage-4.7.1-linux-32bit-ubuntu_10.04_lts-i686-Linux/local/include/ -I/home/oliver/Desktop/sage-4.7.1-linux-32bit-ubuntu_10.04_lts-i686-Linux/local/include/python2.6/ -I/home/oliver/Desktop/sage-4.7.1-linux-32bit-ubuntu_10.04_lts-i686-Linux/local/lib/python2.6/site-packages/numpy/core/include -I/home/oliver/Desktop/sage-4.7.1-linux-32bit-ubuntu_10.04_lts-i686-Linux/devel/sage/sage/ext/ -I/home/oliver/Desktop/sage-4.7.1-linux-32bit-ubuntu_10.04_lts-i686-Linux/devel/sage/ -I/home/oliver/Desktop/sage-4.7.1-linux-32bit-ubuntu_10.04_lts-i686-Linux/devel/sage/sage/gsl/ -I. -I/home/oliver/Desktop/sage-4.7.1-linux-32bit-ubuntu_10.04_lts-i686-Linux/local/include/python2.6 -c _home_oliver_Desktop_sage_4_7_1_linux_32bit_ubuntu_10_04_lts_i686_Linux_examples_programming_sagex_factorial_spyx_0.c -o build/temp.linux-i686-2.6/_home_oliver_Desktop_sage_4_7_1_linux_32bit_ubuntu_10_04_lts_i686_Linux_examples_programming_sagex_factorial_spyx_0.o -w -O2
creating build/lib.linux-i686-2.6
gcc -pthread -shared build/temp.linux-i686-2.6/_home_oliver_Desktop_sage_4_7_1_linux_32bit_ubuntu_10_04_lts_i686_Linux_examples_programming_sagex_factorial_spyx_0.o -L/home/oliver/Desktop/sage-4.7.1-linux-32bit-ubuntu_10.04_lts-i686-Linux/local//lib/ -L/home/wstein/build/sage-4.7.1/local/lib -lmpfr -lgmp -lgmpxx -lstdc++ -lpari -lm -lcurvesntl -lg0nntl -ljcntl -lrankntl -lgsl -lgslcblas -latlas -lntl -lcsage -lpython2.6 -o build/lib.linux-i686-2.6/_home_oliver_Desktop_sage_4_7_1_linux_32bit_ubuntu_10_04_lts_i686_Linux_examples_programming_sagex_factorial_spyx_0.so -L/home/oliver/Desktop/sage-4.7.1-linux-32bit-ubuntu_10.04_lts-i686-Linux/local//lib
/usr/bin/ld: cannot find -lstdc++
collect2: ld returned 1 exit status
error: command 'gcc' failed with exit status 1
Any suggestions? Thanks.
Per DSM's comment, I reinstalled g++-multilib, and now everything works fine.
[update] How can I output the warning messages from compiler.SourceModule(kernel_code)?
With the help of @flipchart I was able to pass the right parameters to NVCC through PyCUDA, but I still don't know where to access the compiler warnings.
[original question]
Using NVCC directly, one can use the compiler switch -Wall*. How would one achieve this in PyCUDA?
I tried mod = compiler.SourceModule(kernel_code,options=['-Wall']), but the error message states:
pytools.prefork.ExecError: error invoking 'nvcc --cubin -Wall -arch sm_11 -I/usr/local/lib/python2.6/dist-packages/pycuda-0.94.2-py2.6-linux-x86_64.egg/pycuda/../include/pycuda kernel.cu': status 255 invoking 'nvcc --cubin -Wall -arch sm_11 -I/usr/local/lib/python2.6/dist-packages/pycuda-0.94.2-py2.6-linux-x86_64.egg/pycuda/../include/pycuda kernel.cu': nvcc fatal : Unknown option 'Wall'
The underlying problem is that I spent a whole day debugging because I overlooked an implicit conversion from float to int.
*Warnings from system-header with "--compiler-options -Wall" since CUDA 3.0
The -Wall option is not an nvcc compiler option, but rather one that is passed on to the supporting compiler (g++ or cl.exe). You need to pass in the option --compiler-options -Wall to indicate to nvcc that the option is for the supporting compiler. In your Python code:
mod = compiler.SourceModule(kernel_code,options=['--compiler-options','-Wall'])
PyCUDA seems to want each option specified as a separate list item; otherwise it wraps the whole thing in quotes, which nvcc doesn't like.
I am installing the DBD-mysql-4.020 Perl module on Perl 5.14.2.
When running make, I encounter the error below:
cc -c -I/u01/app/appadmin/product/perl-5.14.2/lib/site_perl/5.14.2/x86_64-linux/auto/DBI -I/usr/local/mysql-standard-4.1.14-pc-linux-gnu-i686/include -mtune=pentiumpro -DDBD_MYSQL_INSERT_ID_IS_GOOD -g -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -DVERSION=\"4.020\" -DXS_VERSION=\"4.020\" -fPIC "-I/u01/app/appadmin/product/perl-5.14.2/lib/5.14.2/x86_64-linux/CORE" dbdimp.c
dbdimp.c:1: error: CPU you selected does not support x86-64 instruction set
make: *** [dbdimp.o] Error 1
I upgraded the compiler to gcc 4.4 and put a lot of effort into overcoming this. Your input on resolving this and installing the Perl module is greatly appreciated.
error: CPU you selected does not support x86-64 instruction set
Does this machine have a 64-bit CPU?
If so, have you checked whether you have 64-bit versions of Perl and MySQL? Or you can take the 'lazy' route and just try installing a 32-bit version of DBD-mysql and see what happens.