Undefined Symbol Error when using thrust::max_element - cuda

I am working on a CUDA C++ project that uses separable compilation, and I am having some trouble getting a thrust function to compile.
The project builds with no problem until the following function call is added.
thrust::device_ptr<float> max_int = thrust::max_element(
thrust::device_ptr<float>(dev_temp_intensity_buffer),
thrust::device_ptr<float>(dev_temp_intensity_buffer + INT_BUF_SIZE));
With that call in place, the build fails with this linker error:
Severity Code Description Project File Line Suppression State
Error LNK2019 unresolved external symbol __fatbinwrap_66_tmpxft_00006db0_00000000_18_cuda_device_runtime_compute_61_cpp1_ii_8b1a5d37 referenced in function __cudaRegisterLinkedBinary_66_tmpxft_00006db0_00000000_18_cuda_device_runtime_compute_61_cpp1_ii_8b1a5d37 visualize C:\Users\13\Google Drive\WireMeshOT Rafael\CUDA\simulator\build\src\visualize_intermediate_link.obj 1
The funny thing is that this other thrust function call compiles just fine:
thrust::exclusive_scan(thrust::device_ptr<unsigned int>(dev_ray_alive),
thrust::device_ptr<unsigned int>(dev_ray_alive + NRAYS),
thrust::device_ptr<unsigned int>(dev_scanned_alive_rays));
Obs1: dev_temp_intensity_buffer is a float device pointer, and I am including thrust/extrema.h and thrust/device_ptr.h.
Obs2: I am using CMake to configure the build. The relevant CMake code excerpts are shown below.
SET(CUDA_SEPARABLE_COMPILATION ON)
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -rdc=true -D_FORCE_INLINES)
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -arch=compute_52 -code=sm_52 -lcudart -lcudadevrt -lcuda)
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xptxas -v)
cuda_add_executable(
project
file1.cu
...)
target_link_libraries (project glut glew)

I finally figured it out!
The linking problem occurred because the cudadevrt library was not being linked. The catch is that adding -lcudadevrt to CUDA_NVCC_FLAGS alone was not enough!
The problem goes away when linking the CUDA runtime device library to the CMake target as shown below:
target_link_libraries(project glut glew ${CUDA_cudadevrt_LIBRARY})
Obs1: the CUDA_cudadevrt_LIBRARY variable is only available in CMake versions above 3.7.2, so adding the line cmake_minimum_required(VERSION 3.7.2) is a good idea.
Obs2: linking only to CUDA_LIBRARIES, as below, solves the issue only if you are using a CMake version above 3.7.2. In lower versions this variable exists but does not contain the cudadevrt library.
target_link_libraries(project glut glew ${CUDA_LIBRARIES})

Related

Compiling my CUDA program with libraries provided in toolkit

I wrote a simple CUDA C++ program simulating diffusion on a 2D matrix. I ran into trouble when I tried to use some of the libraries provided in the Toolkit. I would like to replace my extremely inefficient matrix transpose kernel with something from cuBLAS, and also implCU with cuSOLVER's implementation of solving linear systems. The trouble is that I don't know how to use these functions or how to compile code that calls them. It works with the Makefiles of the sample codes provided by Nvidia. If someone could help me, ideally by showing how these functions are supposed to be used when writing .cu files, I would be grateful.
Here is the code: http://pastebin.com/UKhJZQBz
I am on Ubuntu 16.04 and I have exported the PATH variables (so they include /usr/local/cuda-8.0/bin) as described in the official guide.
Here is the output from nvcc -I /usr/local/cuda-8.0/samples/common/inc/ difusion2d.cu:
/tmp/tmpxft_00001c09_00000000-16_difusion2d.o: In function `csr_mat_norminf(int, int, int, cusparseMatDescr*, double const*, int const*, int const*)':
undefined reference to `cusparseGetMatIndexBase'
/tmp/tmpxft_00001c09_00000000-16_difusion2d.o: In function `display_matrix(int, int, int, cusparseMatDescr*, double const*, int const*, int const*)':
undefined reference to `cusparseGetMatIndexBase'
/tmp/tmpxft_00001c09_00000000-16_difusion2d.o: In function `main':
undefined reference to `cusolverDnCreate'
undefined reference to `cublasCreate_v2'
undefined reference to `cusolverDnSetStream'
undefined reference to `cublasSetStream_v2'
collect2: error: ld returned 1 exit status
You must explicitly link the libraries that provide these symbols: cublas, cusolver, and (for cusparseGetMatIndexBase) cusparse. Something like
nvcc -I /usr/local/cuda-8.0/samples/common/inc \
-L/path/to/CUDA/libraries difusion2d.cu -lcublas -lcusolver -lcusparse
should work. Depending on your installation, the -L option to provide a search path to the libraries may or may not be necessary.

compiling cuda using cmake works only after calling make twice

I want to use cmake to compile CUDA with '-arch=sm_12', but cmake/make behave strangely.
I have the following CMakeLists.txt:
CMAKE_MINIMUM_REQUIRED(VERSION 2.8)
PROJECT(test)
FIND_PACKAGE(CUDA REQUIRED)
CUDA_ADD_EXECUTABLE(test prog.cu)
SET(CUDA_NVCC_FLAGS "-arch=sm_12")
SET(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} CACHE STRING "Forced" FORCE)
but 'cmake ../src && make' leads to an executable for sm_20.
The flag seems to be ignored.
EDIT: If I call 'make' again (without any modification to CMakeLists.txt), it uses the flag, but only if I force the flag into the cache (last line).
Am I doing anything wrong?
EDIT: After checking again: I have to call 'make' twice for it to work correctly. Has anybody seen this behaviour?
inJeans was right:
FindCUDA-docs https://cmake.org/cmake/help/v3.3/module/FindCUDA.html
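This is the essential information:
"Note that any of these flags can be changed multiple times in the same directory before calling CUDA_ADD_EXECUTABLE, CUDA_ADD_LIBRARY, CUDA_COMPILE, CUDA_COMPILE_PTX, CUDA_COMPILE_FATBIN, CUDA_COMPILE_CUBIN or CUDA_WRAP_SRCS:"
In other words, the flags are picked up at the point where CUDA_ADD_EXECUTABLE is called. In the CMakeLists.txt above, CUDA_NVCC_FLAGS is set after that call, so the first configure run does not see it; presumably the forced cache entry is what makes the flag visible once CMake runs again. Moving the SET(CUDA_NVCC_FLAGS ...) line above CUDA_ADD_EXECUTABLE avoids the problem.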

In different situations during creating shared libraries, what functions can be called with the lib?

I have read related tutorials regarding shared and static libraries, such as:
"Creating a shared and static library with the gnu compiler [gcc]"
"Static, Shared Dynamic and Loadable Linux Libraries"
However, unfortunately, all the examples they use have one function per .c file.
I have three questions:
(1) If I have one file with two or more functions, such as example1.c
void ctest11(int *i)
{ *i = 5; }
void ctest12(int *i)
{ *i = 5; }
After compiling example1.c to libexample1.so, can I call both ctest11 and ctest12 in it?
(2) If I have one file with two or more functions, one of which is a main function, such as example2.c
void ctest21(int *i)
{ *i = 5; }
void main(int *i)
{ *i = 5; }
After compiling example2.c to libexample2.so, is it the same as compiling a .c file with only the ctest21 function?
(3) If I have two files, example3.c and example4.c:
The function in example3.c uses the function in example4.c.
For example:
example3.c
void ctest31(int *i)
{ *i = ctest41(2,3); }
example4.c
int ctest41(int a, int b)
{ return a+b; }
When I compile example3.c and example4.c to libexample34.so, can I call both ctest31 and ctest41?
But if I run gcc on example3.c and example4.o to produce libexample3.so, I guess I can only call ctest31?
You should look inside (and build) some existing free software library, compile it, and study its code and building process.
In general, a shared object can be made from several C source files src1sh.c, src2sh.c, and so on. Very often, the compilation is driven by a builder program, usually GNU make.
First, you need to compile every source file of the shared object as position-independent code (PIC), e.g.
gcc -Wall -fPIC src1sh.c -c -o src1sh.pic.o
gcc -Wall -fPIC src2sh.c -c -o src2sh.pic.o
You probably want to add -g to the gcc flags for debugging purposes. Once your program and shared objects are bug free because you have debugged them with gdb and valgrind, pass -O2 to gcc to have them optimized.
Then you need to link all these PIC object files into a single shared object (a *.so file), like
gcc -shared src1sh.pic.o src2sh.pic.o -o shared.so
If your intent is to make a shared library, call it lib*.so, e.g. libfoo.so, and refer to it with the -lfoo flag in the gcc command that links a program using your shared library.
Notice that linking a shared object may also link other shared libraries, so you could do
gcc -shared src1sh.pic.o src2sh.pic.o -lsome -o shared.so
to link some libsome.so into your shared.so
You usually don't compile a shared object containing a main (remember that main is a very special function, described specifically in the C standard, and called from the startup code crt*.o linked by gcc into every program); this is nearly non-sense (like your libexample2.so). Your main is defined in your program (and you don't need PIC code for your program executable). If your program is made from source files src1pr.c and src2pr.c (which defines main) you first compile them as
gcc -Wall src1pr.c -c -o src1pr.o
gcc -Wall src2pr.c -c -o src2pr.o
and you link them all with e.g.
gcc src1pr.o src2pr.o -o prog -lshared
where -lshared refers to a shared library libshared.so (you probably want to compile and link your program files with -g for debugging information, and you may want to pass additional -I flags for include directories, and -L flags for library directories, e.g. -L. to search library in the current directory ...)
There is a way to dynamically link some shared object at runtime, notably for having plugins. You then want to use the dlopen & dlsym functions for that (and you usually want to link your main program with the -rdynamic flag).
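As a rough sketch of the runtime-loading approach just mentioned, the following C function opens a shared object with dlopen, looks up one function with dlsym, and calls it. The names call_from_shared_object and unary_fn are hypothetical, and the sketch assumes the looked-up function takes and returns a double. On older glibc versions the program must also be linked with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Assumed signature of the function we look up (hypothetical). */
typedef double (*unary_fn)(double);

/* Load `symbol` from the shared object at `path`, call it with x,
   and store the result in *out. Returns 0 on success, -1 on failure. */
int call_from_shared_object(const char *path, const char *symbol,
                            double x, double *out)
{
    assert(path != NULL && symbol != NULL && out != NULL);

    /* RTLD_NOW resolves all symbols immediately, so errors show up here. */
    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror(); /* clear any stale error before calling dlsym */
    void *sym = dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL || sym == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    /* Conversion idiom from the dlsym manual page: copy the object
       pointer into a function pointer without a direct cast. */
    unary_fn fn;
    *(void **)(&fn) = sym;
    *out = fn(x);

    dlclose(handle);
    return 0;
}
```

On a typical glibc-based Linux system, a call such as call_from_shared_object("libm.so.6", "cos", 0.0, &r) would look up cos in the math library; a plugin built with gcc -shared -fPIC is loaded the same way by passing its path.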
You can call (from your program) any visible function inside a shared object. You may want to play with the visibility function attribute to e.g. restrict the visibility of some function inside your shared object. You might perhaps want to later use the constructor attribute, for a function inside a shared object to be called early at initialization time (if it is a plugin, at its dlopen time).
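To make the visibility and constructor attributes concrete, here is a minimal sketch of a source file meant to be compiled into a shared object. All function names are hypothetical, and the attribute syntax is the GCC/Clang extension, not standard C. Several functions live in one file: the ones with default visibility are callable by programs linking the library, while the hidden one is only usable from inside the shared object.

```c
#include <assert.h>

/* Set by the constructor below; private to this file. */
static int plugin_ready = 0;

/* Runs automatically when the shared object is loaded (at dlopen
   time for a plugin), before any exported function is called. */
__attribute__((constructor))
static void plugin_init(void)
{
    plugin_ready = 1;
}

/* Hidden: usable inside the shared object, but not exported to
   programs that link against it. */
__attribute__((visibility("hidden")))
int internal_helper(int x)
{
    return 2 * x;
}

/* Exported: programs linking the shared object can call these. */
__attribute__((visibility("default")))
int plugin_is_ready(void)
{
    return plugin_ready;
}

__attribute__((visibility("default")))
int public_double(int x)
{
    assert(plugin_ready); /* the constructor must have run by now */
    return internal_helper(x);
}
```

Built with something like gcc -Wall -fPIC -shared, this exports plugin_is_ready and public_double. Passing -fvisibility=hidden to gcc would instead hide everything not explicitly marked default, which is a common choice when only a small part of the library is meant to be public.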
Read Program Library Howto and Levine's "Linkers and Loaders" book for more. Linux shared objects (and relocatable object *.o, and executable binaries) are in Executable & Linkable Format (ELF is an industry standard). Some further details are described in the Application Binary Interface (and notably the ABI supplement for your processor, e.g. AMD64 ABI supplement).
PS. You really want a builder like GNU make to combine all these steps, so read its documentation. You might want to pass -v to gcc to understand what it is doing...
Thanks for Basile's great explanation.
From what I understand, related to my questions:
(a) For my first question (1), there are multiple functions in one object file. I can call ctest11 and ctest12 in libexample1.so.
I may set the visibility of the functions in libexample1.so.
(b) For my third question (3), the first scenario is about creating a library from two object files. I can call any function in those files.
The second scenario is about creating a library and linking it with another library. I can call any function in the libraries, including the library being linked in.
(c) For my second question (2), I still do not understand the situation with a main function.
You said, "You usually don't compile a shared object containing a main; this is nearly non-sense (like your libexample2.so). "
I know it is non-sense. But suppose I do not want to change the program file and still want to compile it to a library: say I compile example2.c to example2.so and want to call the function ctest21.
Can I do that?
example2.c
void ctest21(int *i)
{ *i = 5; }
void main(int *i)
{ *i = 5; }
I compile it to a library.
gcc -fPIC -g -c -Wall example2.c
gcc -shared -o libexample2.so example2.o
I think I can call the ctest21 function in example2.o, but the main function is useless.
Is my understanding correct?

Undefined symbol (linking .so C and Cython Code)

Since I made some progress, I changed the title and made a second edit describing my new problem. You may choose to ignore Edit1
I have been trying to run python code from C code. And for this purpose I have been using Cython.
The semantics of my system are such that there is a binary (whose source I cannot access) that calls a C function defined in a file (whose source is accessible), and within this function I need to call Python functions, do some processing, and return the result to the binary.
To achieve this purpose, there are two approaches that I came across:
(1) http://docs.python.org/release/2.5.2/ext/callingPython.html ===> This approach suggests passing the Python callback function to the C side, so that the callback is called as necessary, but this doesn't work for me as I don't have access to the binary's source (which is used to run the entire system).
(2) https://stackoverflow.com/a/5721123/1126425 ==> I have tried this approach and I get this error when the cython function is called:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb47deb70 (LWP 2065)]
0x007fd38a in PySys_GetObject () from /usr/lib/libpython2.6.so.1.0
(3) http://www.linuxjournal.com/article/8497?page=0,0 ==> This is in fact the basis for cython's functionality, but again when I use the examples described there, I get errors similar to those in (2).
I have no idea how to resolve these errors. Any help would be much appreciated.
Thanks!!
Edit1:
here is a simple scenario that reflects situation:
external.c
#include <external.h>
int callback(int param1,int param2)//Function that the binary calls
{
/*SomeTasks*/
cython_function();//Function defined in the following .pyx file
/*SomeTasks*/
}
cython_file.pyx
cdef void cython_function():
    print "Do Nothing!"
I am linking the shared library file created by cython with the library generated by compiling the above C code and then that library is used by the binary...
Edit2:
The segmentation fault went away when I added Py_Initialize(); before calling cython_function(). But now I am getting an undefined symbol error: symbol lookup error: lib_c_code.so: undefined symbol: cython_function
Here lib_c_code.so is the shared library created out of the external.c file above. I have tried including the .h file created by the cython compiler in external.c but it still didn't work out.. Here is how I am compiling lib_c_code.so:
gcc -shared -dynlib -lm -W1 -o lib_c_code.so $(OBJDIR)/*.o -lc -lm -lpy_code
and the libpy_code.so is the shared object file that was created out of the cython_file.pyx file as:
cython cython_file.pyx -o cython_file.c
gcc $(IFLAGS) -I/usr/include/python2.6 -fPIC -shared cython_file.c -lpython2.6 -lm -o libpy_code.so
Also, I can see the symbol cython_function in the lib_c_code.so file when I do : nm -g lib_c_code.so..
Any ideas please?
I have to guess here that there's a callback registration function to which you can pass the function pointer, in which case you can simply forego the C file and define a cdef function directly in your Cython code, and pass that with the callback registration function. Use with gil in case you manipulate any Python objects in it.
cdef extern from "external.h":
    ctypedef int (*Cb_Func)(int param1, int param2)
    void register_callback(Cb_Func func)
cdef int my_callback(int param1,int param2) with gil:
    <implementation>
register_callback(my_callback)
This is also explained in the Cython user manual here: http://docs.cython.org/src/userguide/external_C_code.html
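The answer above guesses that the binary's interface offers a registration function. Purely as an illustration of that assumption, the C side of such an interface might look like the sketch below. Cb_Func and register_callback match the names used in the Cython snippet; invoke_callback and sample_callback are hypothetical stand-ins for what the binary and the Cython callback would do, and none of this is taken from the actual external.h.

```c
#include <assert.h>
#include <stddef.h>

/* Callback type, matching the ctypedef in the Cython snippet. */
typedef int (*Cb_Func)(int param1, int param2);

/* The currently registered callback; NULL until one is registered. */
static Cb_Func registered_cb = NULL;

/* Assumed registration entry point: stores the function pointer. */
void register_callback(Cb_Func func)
{
    assert(func != NULL);
    registered_cb = func;
}

/* Hypothetical stand-in for the binary: calls whatever was
   registered, or returns -1 if nothing has been registered yet. */
int invoke_callback(int param1, int param2)
{
    if (registered_cb == NULL) {
        return -1;
    }
    return registered_cb(param1, param2);
}

/* Hypothetical callback; in the answer's setup this role is played
   by the cdef function my_callback defined in Cython. */
int sample_callback(int param1, int param2)
{
    return param1 + param2;
}
```

With this shape, the caller never needs to know the callback's name at link time, which sidesteps the undefined symbol problem: only a function pointer crosses the boundary between the two libraries.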

Undefined symbols: __dyld_func_lookup for MacOSX 64-bit linking error in 10.6.7

I'm trying to compile a program that references __dyld_func_lookup.
Everything compiles fine, indicating include files were able to find the function reference. However, upon the final linking, I get this error:
g++ -o ../lib/macosx64/libcogmapapi.dylib -dynamiclib ./build/CogMapApi.o ./libs/cmlabs/macosx64/AIR.a ./libs/cmlabs/macosx64/CoreLibrary.a
Undefined symbols:
"__dyld_func_lookup", referenced from:
_reference in CoreLibrary.a(dlfcn_darwin.o)
_dllopen in CoreLibrary.a(dlfcn_darwin.o)
_dllopen in CoreLibrary.a(dlfcn_darwin.o)
_dllopen in CoreLibrary.a(dlfcn_darwin.o)
ld: symbol(s) not found
collect2: ld returned 1 exit status
I've tried using -undefined dynamic_lookup in the linker and it does link properly, but then at runtime I get the undefined symbol as I do actually reference __dyld_func_lookup in my code. I'm guessing I need to link to some dylib or framework, but everything I tried like System and libdl.dylib did not seem to work.
Any help would be most appreciated!
Linking dylib1.o worked for me in the same situation.