Add MySQL to my Makefile gives problems - mysql

I have a problem with my C project, built under Debian Jessie. I now need to work with MySQL, so I downloaded the library and tried to update my Makefile.
This is my Makefile right now:
CC = gcc
CFLAGS = -Wall -Wextra
LDFLAGS = -lbluetooth -lpthread -lmysqlclient
LFLAGS = -lm
INC = -I/usr/include/mysql
SOURCES = stb.c btscan.c and.c gima.c database.c
OBJECTS = $(SOURCES:.c=.o)
EXECUTABLE = stb
$(EXECUTABLE): $(OBJECTS)
	$(CC) $(CFLAGS) $(OBJECTS) -o $@ $(LFLAGS) $(LDFLAGS)
stb.o: btscan.h and.h gima.h database.h
btscan.o: btscan.h
and.o: and.h
gima.o: gima.h
database.o: database.h
clean:
	@rm -f *.o *.out stb
and these are the files where I want to use the library:
stb.c
#include "btscan.h"
#include "and.h"
#include "gima.h"
#include "database.h"
struct device* bt_devices;
struct device* ble_devices;
int main(void) {
//-------------------------------- DATABASE CONNECTION ----------------
//MYSQL *con = mysql_init(NULL);
....... }
and finally database.h
#ifndef _DATABASE_H
#define _DATABASE_H
#include <my_global.h>
#include <mysql.h>
#endif
When I run make, I receive "fatal error: my_global.h: No such file or directory". However, if I try MySQL in a single test file and compile it with
gcc -o test test.c -I/usr/include/mysql -lmysqlclient
it works. Where did I make a mistake?
Thanks in advance

If you had shown us the compiler output, or examined it yourself, you would immediately see that the command you run by hand is not the same as the one make runs: in particular, make is not adding the -I/usr/include/mysql flag to the command line.
That's because in your makefile you set:
INC = -I/usr/include/mysql
but nowhere in your makefile (as you've shown it) is the variable INC actually used, so setting it has no effect.
Since you're relying on GNU make's built-in rules to compile the .c files, you should set the standard variable those rules use for preprocessor flags:
CPPFLAGS = -I/usr/include/mysql
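For illustration, here is a minimal sketch of how the question's Makefile might look with that change applied. It assumes GNU make's built-in %.o: %.c rule, which expands $(CPPFLAGS) and $(CFLAGS) on every compile line. Collecting the libraries in LDLIBS and using $(EXECUTABLE) in the clean rule are choices of this sketch, not part of the original answer.

```makefile
CC       = gcc
CFLAGS   = -Wall -Wextra
# CPPFLAGS is used by the built-in %.o: %.c rule, so every
# object file is compiled with the MySQL include path.
CPPFLAGS = -I/usr/include/mysql
# Libraries go at the end of the link line (same libraries as in the question).
LDLIBS   = -lm -lbluetooth -lpthread -lmysqlclient

SOURCES    = stb.c btscan.c and.c gima.c database.c
OBJECTS    = $(SOURCES:.c=.o)
EXECUTABLE = stb

$(EXECUTABLE): $(OBJECTS)
	$(CC) $(CFLAGS) $(OBJECTS) -o $@ $(LDLIBS)

# Header dependencies only; there are no recipes here,
# so the built-in rule does the actual compiling.
stb.o: btscan.h and.h gima.h database.h
btscan.o: btscan.h
and.o: and.h
gima.o: gima.h
database.o: database.h

.PHONY: clean
clean:
	rm -f *.o *.out $(EXECUTABLE)
```

Running make -n prints the commands without executing them, which is a quick way to confirm that -I/usr/include/mysql now appears on each compile line.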

Related

Static linking with cublas

I want to link my program with the static version of cublas, but I get some undefined references. The command and error are
$ nvcc test.cu -o test --cudart=static -ldl -lpthread -lcurand_static -lcublas_static -lculibos
/home/mahmood/cuda_10.1.168/bin/../targets/x86_64-linux/lib/libcublas_static.a(cublas.o): In function `cublasCtxInit(cublasContext**)':
cublas.compute_75.cudafe1.cpp:(.text+0x34b): undefined reference to `cublasLtCtxInit'
cublas.compute_75.cudafe1.cpp:(.text+0x417): undefined reference to `init_gemm_select'
...
...
In fact, the library path is fine and the cublasLtCtxInit exists in the static library file.
$ ls -l /home/mahmood/cuda_10.1.168/lib64/libcublas_static.a
-rw-rw-r-- 1 mahmood mahmood 75127082 Jun 27 16:06 /home/mahmood/cuda_10.1.168/lib64/libcublas_static.a
$ grep cublasLtCtxInit ~/cuda_10.1.168/lib64/libcublas_static.a
Binary file /home/mahmood/cuda_10.1.168/lib64/libcublas_static.a matches
$ echo $LD_LIBRARY_PATH
/home/mahmood/cuda_10.1.168/lib64:
So, how can I fix that?
The correct static linking sequence with cublas can be found in the Makefile for the conjugateGradient CUDA sample code.
The needed switches for nvcc are:
-lcublas_static -lcublasLt_static -lculibos
example:
$ cat t1752.cu
#include <cublas_v2.h>
int main(){
cublasHandle_t h;
cublasCreate(&h);
}
$ nvcc t1752.cu -o t1752 -lcublas_static -lcublasLt_static -lculibos
$

Octave: how to specify arguments for mkoctfile

I am using Octave under Windows (native) and am trying to compile a C++ program into a mex file and link some libraries to it:
% compile for octave
cmd = sprintf("mex main.cpp -I\"%s\\Winnt\\Include\" -L\"%s\\Winnt\\lib_x64\\msc\" -lvisa64.lib", ...
getenv('VXIPNPPATH'), getenv('VXIPNPPATH'))
eval(cmd);
When run, the output of the command is:
>> mex main.cpp -I'C:\Program Files (x86)\IVI Foundation\VISA\\Winnt\Include' -L'C:\Program Files (x86)\IVI Foundation\VISA\\Winnt\lib_x64\msc' -lvisa64.lib
g++: error: Files: No such file or directory
g++: error: (x86)\IVI: No such file or directory
g++: error: Foundation\VISA\\Winnt\lib_x64\msc: No such file or directory
warning: mkoctfile: building exited with failure status
I also tried to run the string directly from the command line:
mex main.cpp -I'C:\Program Files (x86)\IVI Foundation\VISA\\Winnt\Include' -L'C:\Program Files (x86)\IVI Foundation\VISA\\Winnt\lib_x64\msc' -lvisa64.lib
with the same result.
While the -I argument appears to work well, why does the -L argument cause problems? What would be the right way to escape path names with spaces?
Double quotes don't work either.
EDIT
Based on the answers, I am using mex() in its functional form, but the result is still the same:
vxipath = getenv('VXIPNPPATH');
params={};
params{1} = sprintf('-I%s', fullfile(vxipath, 'Winnt', 'Include'));
params{2} = sprintf('-L%s', fullfile(vxipath, 'Winnt', 'lib_x64', 'msc'));
params{3} = sprintf('-lvisa64.lib');
% replace \ with /
for i1=1:length(params)
s = params{i1};
s(s=='\') = '/';
params{i1} = s;
end
params
mex("main.cpp", params{:});
Gives the output:
params =
{
[1,1] = -IC:/Program Files (x86)/IVI Foundation/VISA/Winnt/Include
[1,2] = -LC:/Program Files (x86)/IVI Foundation/VISA/Winnt/lib_x64/msc
[1,3] = -lvisa64.lib
}
g++: error: Files: No such file or directory
g++: error: (x86)/IVI: No such file or directory
g++: error: Foundation/VISA/Winnt/lib_x64/msc: No such file or directory
warning: mkoctfile: building exited with failure status
Which is the same result as before. Additional observations are:
'/' or '\' does not make a difference
if I omit all parameters, I get a missing-include-file-error: OK
if I omit the '-L' argument, I get a missing-lib-file-error: OK
if I add the '-L' argument, I get the error shown above: It appears that the -L argument behaves differently than the -I argument.
I also tried it directly from the bash shell with the corresponding command with the same result.
Replace backslashes with slashes and place each argument inside single quotes.
mex 'main.cpp' '-IC:/Program Files (x86)/IVI Foundation/VISA//Winnt/Include' '-LC:/Program Files (x86)/IVI Foundation/VISA//Winnt/lib_x64/msc' '-lvisa64.lib'
or
mex ('main.cpp', '-IC:/Program Files (x86)/IVI Foundation/VISA//Winnt/Include', '-LC:/Program Files (x86)/IVI Foundation/VISA//Winnt/lib_x64/msc', '-lvisa64.lib')
This doesn't answer how to fix it, since rahnema1 already did that, but I'll show you how to simplify your code.
Do not use eval. eval is evil.
Instead of evaluating a string function paramA paramB, call function directly with string input arguments. function paramA paramB is translated by the interpreter to a call function('paramA','paramB'). But it is a lot easier to generate the latter form, and you get to avoid eval to boot:
params = {};
params{1} = '-IC:/Program Files (x86)/IVI Foundation/VISA//Winnt/Include';
params{2} = '-LC:/Program Files (x86)/IVI Foundation/VISA//Winnt/lib_x64/msc';
params{3} = '-lvisa64.lib';
mex('main.cpp', params{:});
Properly generate paths using fullfile. It adds / or \ depending on which platform you're on, plus I find it easier to read:
include_path = fullfile(getenv('VXIPNPPATH'), 'Winnt', 'Include');
params{1} = ['-I', include_path];
mkoctfile does not escape the arguments properly if they contain spaces and it does not like backslashes in Octave's own paths.
It creates the following two commands:
g++ -c -I/release/mxe-octave-w64/usr/x86_64-w64-mingw32/include -IC:\Octave\OCTAVE~1.0\\mingw64\include\octave-5.1.0\octave\.. -IC:\Octave\OCTAVE~1.0\\mingw64\include\octave-5.1.0\octave -IC:\Octave\OCTAVE~1.0\\mingw64\include -fopenmp -g -O2 -I. "-IC:\Program Files (x86)\IVI Foundation\VISA\Winnt\Include" -DMEX_DEBUG main.cpp -o C:\Octave\OCTAVE~1.0\tmp/oct-u4r15I.o
g++ -IC:\Octave\OCTAVE~1.0\\mingw64\include\octave-5.1.0\octave\.. -IC:\Octave\OCTAVE~1.0\\mingw64\include\octave-5.1.0\octave -IC:\Octave\OCTAVE~1.0\\mingw64\include -fopenmp -g -O2 -shared -Wl,-rpath-link,/release/mxe-octave-w64/usr/x86_64-w64-mingw32/lib -L/release/mxe-octave-w64/usr/x86_64-w64-mingw32/lib -L/release/mxe-octave-w64/usr/x86_64-w64-mingw32/qt5/lib -Wl,--export-all-symbols -o main.mex C:\Octave\OCTAVE~1.0\tmp/oct-u4r15I.o -lvisa64.lib -LC:\Program Files (x86)\IVI Foundation\VISA\Winnt\lib_x64\msc -LC:\Octave\OCTAVE~1.0\\mingw64\lib\octave\5.1.0 -LC:\Octave\OCTAVE~1.0\\mingw64\lib -LC:\Octave\OCTAVE~1.0\\mingw64\lib\octave\5.1.0 -loctinterp -loctave -Wl,-rpath-link,/release/mxe-octave-w64/usr/x86_64-w64-mingw32/lib -L/release/mxe-octave-w64/usr/x86_64-w64-mingw32/lib -L/release/mxe-octave-w64/usr/x86_64-w64-mingw32/qt5/lib -Wl,--export-all-symbols
When I change it to the following:
replace \ with /
specify the library name without .lib extension
escape -LC:\Program Files... to "-LC:\Program Files..."
g++ -c -I/release/mxe-octave-w64/usr/x86_64-w64-mingw32/include -IC:/Octave/OCTAVE~1.0//mingw64/include/octave-5.1.0/octave/.. -IC:/Octave/OCTAVE~1.0//mingw64/include/octave-5.1.0/octave -IC:/Octave/OCTAVE~1.0//mingw64/include -fopenmp -g -O2 -I. "-IC:/Program Files (x86)/IVI Foundation/VISA/Winnt/Include" -DMEX_DEBUG main.cpp -o C:/Octave/OCTAVE~1.0/tmp/oct-u4r15I.o
g++ -IC:/Octave/OCTAVE~1.0//mingw64/include/octave-5.1.0/octave/.. -IC:/Octave/OCTAVE~1.0//mingw64/include/octave-5.1.0/octave -IC:/Octave/OCTAVE~1.0//mingw64/include -fopenmp -g -O2 -shared -Wl,-rpath-link,/release/mxe-octave-w64/usr/x86_64-w64-mingw32/lib -L/release/mxe-octave-w64/usr/x86_64-w64-mingw32/lib -L/release/mxe-octave-w64/usr/x86_64-w64-mingw32/qt5/lib -Wl,--export-all-symbols -o main.mex C:/Octave/OCTAVE~1.0/tmp/oct-u4r15I.o "-LC:/Program Files (x86)/IVI Foundation/VISA/Winnt/lib_x64/msc" -lvisa64 -LC:/Octave/OCTAVE~1.0//mingw64/lib/octave/5.1.0 -LC:/Octave/OCTAVE~1.0//mingw64/lib -LC:/Octave/OCTAVE~1.0//mingw64/lib/octave/5.1.0 -loctinterp -loctave -Wl,-rpath-link,/release/mxe-octave-w64/usr/x86_64-w64-mingw32/lib -L/release/mxe-octave-w64/usr/x86_64-w64-mingw32/lib -L/release/mxe-octave-w64/usr/x86_64-w64-mingw32/qt5/lib -Wl,--export-all-symbols
It will compile without error.

Dynamic Parallelism - separate compilation: undefined reference to __cudaRegisterLinkedBinary

Although I have followed Appendix C, "Compiling Dynamic Parallelism", of the CUDA Programming Guide and the solutions given here, I cannot solve my problem. After compiling and linking (make DivideParalelo) I get the following error:
./build/metodos.o: In function `__sti____cudaRegisterAll_42_tmpxft_00002599_00000000_6_metodos_cpp1_ii_32c9141e()':
tmpxft_00002599_00000000-3_metodos.cudafe1.cpp:(.text.startup+0x15): undefined reference to `__cudaRegisterLinkedBinary_42_tmpxft_00002599_00000000_6_metodos_cpp1_ii_32c9141e'
./build/GPUutil.o: In function `__sti____cudaRegisterAll_42_tmpxft_000025c0_00000000_6_GPUutil_cpp1_ii_f81fb8b5()':
tmpxft_000025c0_00000000-3_GPUutil.cudafe1.cpp:(.text.startup+0x15): undefined reference to `__cudaRegisterLinkedBinary_42_tmpxft_000025c0_00000000_6_GPUutil_cpp1_ii_f81fb8b5'
./build/PCA_Kernels.o: In function `__sti____cudaRegisterAll_46_tmpxft_000025e6_00000000_6_PCA_Kernels_cpp1_ii_8a59b72a()':
tmpxft_000025e6_00000000-3_PCA_Kernels.cudafe1.cpp:(.text.startup+0x15): undefined reference to `__cudaRegisterLinkedBinary_46_tmpxft_000025e6_00000000_6_PCA_Kernels_cpp1_ii_8a59b72a'
./build/DivideParalelo.o: In function `__sti____cudaRegisterAll_49_tmpxft_0000260c_00000000_6_DivideParalelo_cpp1_ii_16d0a16f()':
tmpxft_0000260c_00000000-3_DivideParalelo.cudafe1.cpp:(.text.startup+0x385): undefined reference to `__cudaRegisterLinkedBinary_49_tmpxft_0000260c_00000000_6_DivideParalelo_cpp1_ii_16d0a16f'
make: *** [DivideParalelo] Error 1
A simplified version of my code is listed below.
DivideParalelo.cu:
#include <stdio.h>
#include <string.h>
/*C includes*/
extern"C" {
#include"io.h"
#include"util.h"
}
/* CUDA includes*/
#include"cuda.h"
#include"cublas.h"
#include"metodos.h"
#define CUDA_CHECK_RETURN(value) {
/...
}
#define DIM 100
/*
* image
* num_bands
* columns initially is lines_samples, later the number of endmembers
*/
__global__ void Divide(double *image, int num_bands, int columns, int DIM_MIN, int numColsLastPiece, double *out, double *piece) {
int tid=threadIdx.x; //col
int bid=blockIdx.x; //row
for (int tile=0;tile<(columns -1)/ DIM_MIN +1;tile++) {
__shared__ double sh_piece[DIM];
//some code here...
__syncthreads();
}
int mat=HYSIME(piece,columns,num_bands);
}
}
int main(int argc,
char** argv) {
//load file (argv[1]) with the image into dMt
//...
//Allocate GPU memory:
double *devicedM, *deviceOut;
CUDA_CHECK_RETURN(cudaMalloc((void**)&devicedM, num_bands*lines_samples*sizeof(double)));
CUDA_CHECK_RETURN(cudaMalloc((void**)&deviceOut, num_bands*lines_samples*sizeof(double)));
//here the call to the kernel
}
metodos.cu:
extern "C"{
#include "util.h"
#include "io.h"
}
#include "cuda.h"
#include "cublas.h"
#include "PCA_Kernels.h"
#include "GPUutil.h"
#include <stdio.h>
__device__ __host__ int HYSIME(double *M, int lines_samples, int num_bands){
int N_END =0;
double *y;
double *w;
double *Rw;
y = (double*) malloc(lines_samples * num_bands * sizeof(double));
//changed to implement calloc in the device:
w = (double*) malloc(lines_samples * num_bands*sizeof(double));
memset (w,0,lines_samples * num_bands);
Rw = (double*) malloc(num_bands * num_bands* sizeof(double));
memset (Rw,0,num_bands * num_bands);
//some additional code here
estNoise(y, w, Rw, num_bands, lines_samples);//GPUutil.cu
return(N_END);
}
GPUutil.cu:
#include "cublas.h"
#include "cuda.h"
#include "cuda_runtime.h"
__device__ __host__ int destAdditiveNoise(double *r, double *w, double *Rw, int L, int N){
//the code
return (0);
}
__device__ __host__ int estNoise(double *y, double *w, double *Rw, int L, int N){
//the code
return (0);
}
__device__ __host__ int hysime(double *y, double *w, double *Rw, int L, int N){ //L is num_bands N is lines_samples
//the code
return(0);
}
Makefile:
MKL =1
#initial definitions (library paths et al.)
CUDA_PATH=/usr/local/cuda-6.5
MKLROOT=/home/emartel/intel/composer_xe_2015.0.090/mkl
BUILD_DIR=./build
####################
#includes
####################
#Cuda includes
CUDA_INCLUDE_DIR=-I. -I$(CUDA_PATH)/include
#-I$(SDK)/C/common/inc
#BLAS includes
BLAS_INCLUDE_DIR=-I. -I$(MKLROOT)/include
####################
#library search paths
####################
CUDA_LIB_DIR=-L$(CUDA_PATH)/lib64
#-L$(SDK)/C/lib -L$(SDK)/C/common/lib/linux
BLAS_LIB_DIR=-L$(MKLROOT)/lib/intel64 -L$(MKLROOT)/../compiler/lib/intel64
####################
#libraries
####################
CUDALIBS=-lcublas -lcudart
#-lcutil
#-lGL -lGLU
utilS= -lpthread -lm
####################
#other compilation flags
####################
CFLAGS= -Wwrite-strings
#-Wall
#-g
MKLFLAGS=-D __MKL
#sergio CUDAFLAGS= --gpu-architecture sm_30
#changed with sm_35
CUDAFLAGS= -arch=sm_35
LINKERFLAGS= -Wl,--start-group $(MKLROOT)/lib/intel64/libmkl_intel_lp64.a $(MKLROOT)/lib/intel64/libmkl_sequential.a $(MKLROOT)/lib/intel64/libmkl_core.a $(MKLROOT)/../compiler/lib/intel64/libiomp5.a -Wl,--end-group
####################
#utilities
####################
io.o : io.c
icc $(CFLAGS) -c -O3 io.c -o $(BUILD_DIR)/io.o
#BLAS and LAPACK wrapper
util.o : util.c
icc $(CFLAGS) $(MKLFLAGS) $(BLAS_INCLUDE_DIR) -c -O3 util.c -o $(BUILD_DIR)/util.o
#changed with rdec and -lcudadevrt:
metodos.o : metodos.cu
nvcc $(CUDAFLAGS) $(CUDA_INCLUDE_DIR) -c -O3 -rdc=true metodos.cu -lcudadevrt -o $(BUILD_DIR)/metodos.o
##################################
# PCA files
##################################
#changed with rdec and -lcudadevrt:
GPUutil.o: GPUutil.cu
nvcc $(CUDAFLAGS) $(CUDA_INCLUDE_DIR) -c -O3 -rdc=true GPUutil.cu -lcudadevrt -o $(BUILD_DIR)/GPUutil.o
#changed with rdec and -lcudadevrt:
PCA_Kernels.o: PCA_Kernels.cu
nvcc $(CUDAFLAGS) $(CUDA_INCLUDE_DIR) -c -O3 -rdc=true PCA_Kernels.cu -lcudadevrt -o $(BUILD_DIR)/PCA_Kernels.o
#changed with rdec and -lcudadevrt:
DivideParalelo.o: DivideParalelo.cu
nvcc $(CUDAFLAGS) $(CUDA_INCLUDE_DIR) -c -O3 -rdc=true DivideParalelo.cu -lcudadevrt -o $(BUILD_DIR)/DivideParalelo.o
#everything is already compiled, this is just a call to the linker
DivideParalelo: io.o util.o metodos.o GPUutil.o PCA_Kernels.o DivideParalelo.o
icc $(CFLAGS) $(BUILD_DIR)/io.o $(BUILD_DIR)/util.o $(BUILD_DIR)/metodos.o $(BUILD_DIR)/GPUutil.o $(BUILD_DIR)/PCA_Kernels.o $(BUILD_DIR)/DivideParalelo.o $(CUDA_LIB_DIR) $(BLAS_LIB_DIR) $(LINKERFLAGS) $(utilS) $(CUDALIBS) -o DivideParalelo
####################
#misc
####################
clean:
rm -rf $(BUILD_DIR)/*.o ./DivideParalelo
Any suggestion will be greatly appreciated. Perhaps I misunderstood the separate compilation for dynamic parallelism.
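I solved the problem by changing both the compilation and the linking of the .cu files: each one is now compiled with -dc (relocatable device code), and an extra nvcc -dlink step produces link.o, which is passed to icc together with -lcudadevrt in the final link.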
Makefile:
MKL =1
#initial definitions (library paths et al.)
CUDA_PATH=/usr/local/cuda-6.5
MKLROOT=/home/emartel/intel/composer_xe_2015.0.090/mkl
BUILD_DIR=./build
####################
#includes
####################
#Cuda includes
CUDA_INCLUDE_DIR=-I. -I$(CUDA_PATH)/include
#BLAS includes
BLAS_INCLUDE_DIR=-I. -I$(MKLROOT)/include
####################
#library search paths
####################
CUDA_LIB_DIR=-L$(CUDA_PATH)/lib64
BLAS_LIB_DIR=-L$(MKLROOT)/lib/intel64 -L$(MKLROOT)/../compiler/lib/intel64
####################
#libraries
####################
CUDALIBS=-lcublas -lcudart
utilS= -lpthread -lm
####################
#other compilation flags
####################
CFLAGS= -Wwrite-strings
MKLFLAGS=-D __MKL
CUDAFLAGS= -arch=sm_35
LINKERFLAGS= -Wl,--start-group $(MKLROOT)/lib/intel64/libmkl_intel_lp64.a $(MKLROOT)/lib/intel64/libmkl_sequential.a $(MKLROOT)/lib/intel64/libmkl_core.a $(MKLROOT)/../compiler/lib/intel64/libiomp5.a -Wl,--end-group
####################
#utilities
####################
io.o : io.c
icc $(CFLAGS) -c -O3 io.c -o $(BUILD_DIR)/io.o
#BLAS and LAPACK wrapper
util.o : util.c
icc $(CFLAGS) $(MKLFLAGS) $(BLAS_INCLUDE_DIR) -c -O3 util.c -o $(BUILD_DIR)/util.o
metodos.o : metodos.cu
nvcc $(CUDAFLAGS) $(CUDA_INCLUDE_DIR) -c -O3 -dc metodos.cu -o $(BUILD_DIR)/metodos.o
##################################
# PCA files
##################################
GPUutil.o: GPUutil.cu
nvcc $(CUDAFLAGS) $(CUDA_INCLUDE_DIR) -c -O3 -dc GPUutil.cu -o $(BUILD_DIR)/GPUutil.o
PCA_Kernels.o: PCA_Kernels.cu
nvcc $(CUDAFLAGS) $(CUDA_INCLUDE_DIR) -c -O3 -dc PCA_Kernels.cu -o $(BUILD_DIR)/PCA_Kernels.o
DivideParalelo.o: DivideParalelo.cu
nvcc $(CUDAFLAGS) $(CUDA_INCLUDE_DIR) -c -O3 -dc DivideParalelo.cu -o $(BUILD_DIR)/DivideParalelo.o
DivideParalelo: io.o util.o metodos.o GPUutil.o PCA_Kernels.o DivideParalelo.o
nvcc $(CUDAFLAGS) $(CUDA_INCLUDE_DIR) -dlink $(BUILD_DIR)/io.o $(BUILD_DIR)/util.o $(BUILD_DIR)/metodos.o $(BUILD_DIR)/GPUutil.o $(BUILD_DIR)/PCA_Kernels.o $(BUILD_DIR)/DivideParalelo.o -lcudadevrt -o $(BUILD_DIR)/link.o
icc $(CFLAGS) $(BUILD_DIR)/io.o $(BUILD_DIR)/util.o $(BUILD_DIR)/metodos.o $(BUILD_DIR)/GPUutil.o $(BUILD_DIR)/PCA_Kernels.o $(BUILD_DIR)/DivideParalelo.o $(BUILD_DIR)/link.o -lcudadevrt $(CUDA_LIB_DIR) $(BLAS_LIB_DIR) $(LINKERFLAGS) $(utilS) $(CUDALIBS) -o DivideParalelo -lcudart
####################
#misc
####################
clean:
rm -rf $(BUILD_DIR)/*.o ./DivideParalelo
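Stripped of the MKL and icc details, the pattern in the Makefile above can be sketched as follows. The file names a.cu and b.cu, the target app, the object name dlink.o, and the use of g++ as the host linker are placeholders chosen for this sketch; -arch=sm_35 is taken from the question.

```makefile
# Hypothetical names: a.cu, b.cu, app and dlink.o are placeholders.
NVCC      = nvcc
CXX       = g++
CUDA_PATH ?= /usr/local/cuda
ARCH      = -arch=sm_35

OBJS = a.o b.o

# Final host link: the ordinary objects plus the device-link object.
app: $(OBJS) dlink.o
	$(CXX) $(OBJS) dlink.o -L$(CUDA_PATH)/lib64 -lcudadevrt -lcudart -o $@

# Device-link step: resolves device-code references across all objects
# and against the device runtime needed for dynamic parallelism.
dlink.o: $(OBJS)
	$(NVCC) $(ARCH) -dlink $(OBJS) -lcudadevrt -o $@

# -dc is shorthand for -rdc=true -c (compile to relocatable device code).
%.o: %.cu
	$(NVCC) $(ARCH) -dc $< -o $@

.PHONY: clean
clean:
	rm -f $(OBJS) dlink.o app
```

The explicit -dlink step is needed only because the final link is done by a host compiler. If nvcc itself performs the final link, it runs the device link implicitly.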

CUDA separable compilation and CMake

I have a large library project that contains both cpp and cu source files. I'd like to compile it in a standalone shared object, but since I have some device functions I decided to split it in a shared object containing the majority of the functions and an archive file containing the device functions only. Here's part of the Makefile I wrote for it - all the (non-template) device functions have been put in the file device.cu:
Makefile
LIB_NAME = libexample.so
CUDA = /usr/local/cuda/include
CXX = g++
CXXFLAGS = -c -O2 -fPIC -Wall -I. -I./code -I./code/header -I$(CUDA)
SOURCES = src1.cpp src2.cpp src3.cpp
OBJECTS = $(SOURCES:.cpp=.o)
NVCC = nvcc
CU_SOURCES = src1_cu.cu src2_cu.cu src3_cu.cu device.cu
CU_OBJECTS = $(CU_SOURCES:.cu=.o)
BUILDDIR = .
VPATH = code/src/common_src code/src/src_CUDA
DEVICE_LINK = dev_link.o
GENCODE_FLAGS := -gencode arch=compute_20,code=sm_20
NVCCFLAGS = -x cu -O2 --compiler-options '-fPIC' $(GENCODE_FLAGS) -I. -I./code -I./code/header -I$(CUDA) -dc
NVCCLINK = --compiler-options '-fPIC' $(GENCODE_FLAGS) -dlink
all: lib/$(LIB_NAME)
lib/$(LIB_NAME): $(OBJECTS) $(CU_OBJECTS) $(DEVICE_LINK)
	$(CXX) -shared -Wl,-soname,libexample.so $^ -o $@
	ar rcs lib/device.a device.o
%.o: %.cpp
	$(CXX) $(CXXFLAGS) $< -o $@
%.o: %.cu
	$(NVCC) $(NVCCFLAGS) $< -o $@
$(DEVICE_LINK): $(CU_OBJECTS)
	$(NVCC) $(NVCCLINK) $^ -o $@
I decided to change my build system and I switched to CMake to produce both Makefiles and Visual Studio projects. It's obvious how to write a working CMakeLists.txt file without separable compilation, but I couldn't find a solution that works in my case (I read some proposed solutions here on S.O. but they don't seem to work for me!). Can you help me to write said CMakeLists.txt file? Here's what I did so far:
CMakeLists.txt
cmake_minimum_required(VERSION 2.8.10)
# Set Library & project name
set(LIB_NAME example)
project(${LIB_NAME})
message("LIBRARY ${LIB_NAME}")
enable_language(CXX)
# Check if CUDA is installed on this system
find_package(CUDA REQUIRED)
# Set source directories
set(COMMON_SRCS_DIR ${CMAKE_SOURCE_DIR}/code/src/common_src)
set(CUDA_SRCS_DIR ${CMAKE_SOURCE_DIR}/code/src/src_CUDA)
# Set source files
set(COMMON_SRCS ${COMMON_SRCS_DIR}/src1.cpp
${COMMON_SRCS_DIR}/src2.cpp
${COMMON_SRCS_DIR}/src3.cpp
)
# Set CUDA device library name
set(DEVICE_LIB "device")
# Set CUDA objects
cuda_compile(SRC1_CU_O ${CUDA_SRCS_DIR}/src1_cu.cu)
cuda_compile(SRC2_CU_O ${CUDA_SRCS_DIR}/src2_cu.cu)
cuda_compile(SRC3_CU_O ${CUDA_SRCS_DIR}/src3_cu.cu)
cuda_compile(DEVICE_CU_O ${CUDA_SRCS_DIR}/device.cu)
# Set header file directories
include_directories(${CMAKE_SOURCE_DIR})
include_directories(${CMAKE_SOURCE_DIR}/code)
include_directories(${CMAKE_SOURCE_DIR}/code/header)
include_directories(${CUDA_INCLUDE_DIRS})
# Get CUDA compute capability - contains CUDA_GENCODE define
include(CudaParams.cmake)
# Set include stuff
cuda_include_directories(${CMAKE_SOURCE_DIR})
cuda_include_directories(${CMAKE_SOURCE_DIR}/code)
cuda_include_directories(${CMAKE_SOURCE_DIR}/code/header)
cuda_include_directories(${CUDA_INCLUDE_DIRS})
set(CUDA_SEPARABLE_COMPILATION ON)
# Set compiler flags
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} ${CUDA_GENCODE} --compiler-options '-fPIC' -O2")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -march=native -O2")
# Generate main target
cuda_add_library(${LIB_NAME} SHARED ${COMMON_SRCS}
${SRC1_CU_O} ${SRC2_CU_O} ${SRC3_CU_O} ${DEVICE_CU_O})
cuda_add_library(${DEVICE_LIB} STATIC ${DEVICE_CU_O})
# Install instructions
INSTALL(TARGETS ${LIB_NAME}
LIBRARY DESTINATION ${CMAKE_SOURCE_DIR}/lib
ARCHIVE DESTINATION ${CMAKE_SOURCE_DIR}/lib
)
As you can see, there's no dev_link.o mention in the CMakeLists.txt file because I simply don't know where I could put it!

error: ‘blockIdx’ was not declared in this scope

I am trying to write a GPU program using CUDA. Below is my function:
__global__ static void
histogram_gpu(int * hist_out, unsigned char * img_in, int img_size, int nbr_bin){
int i;
const int bid = blockIdx.x;
const int tid = threadIdx.x;
// for ( i = 0; i < img_size; i ++){
// hist_out[img_in[i]] ++;
// }
for (i = bid*THREAD_NUM + tid; i < img_size; i += BLOCK_NUM*THREAD_NUM) {
hist_out[img_in[i]]++;
}
}
When I call this function from main, the following error occurs:
error: ‘blockIdx’ was not declared in this scope
I use CUDA 5.0 on my Mac, and below is the Makefile:
OSUPPER = $(shell uname -s 2>/dev/null | tr [:lower:] [:upper:])
OSLOWER = $(shell uname -s 2>/dev/null | tr [:upper:] [:lower:])
# Flags to detect 32-bit or 64-bit OS platform
OS_SIZE = $(shell uname -m | sed -e "s/i.86/32/" -e "s/x86_64/64/")
OS_ARCH = $(shell uname -m | sed -e "s/i386/i686/")
# These flags will override any settings
ifeq ($(i386),1)
OS_SIZE = 32
OS_ARCH = i686
endif
ifeq ($(x86_64),1)
OS_SIZE = 64
OS_ARCH = x86_64
endif
# Flags to detect either a Linux system (linux) or Mac OSX (darwin)
DARWIN = $(strip $(findstring DARWIN, $(OSUPPER)))
# Location of the CUDA Toolkit binaries and libraries
CUDA_PATH ?= /Developer/NVIDIA/CUDA-5.0
CUDA_INC_PATH ?= $(CUDA_PATH)/include
CUDA_BIN_PATH ?= $(CUDA_PATH)/bin
ifneq ($(DARWIN),)
CUDA_LIB_PATH ?= $(CUDA_PATH)/lib
else
ifeq ($(OS_SIZE),32)
CUDA_LIB_PATH ?= $(CUDA_PATH)/lib
else
CUDA_LIB_PATH ?= $(CUDA_PATH)/lib64
endif
endif
# Common binaries
NVCC ?= $(CUDA_BIN_PATH)/nvcc
GCC ?= g++
# Extra user flags
EXTRA_NVCCFLAGS ?=
EXTRA_LDFLAGS ?=
# CUDA code generation flags
GENCODE_SM10 := -gencode arch=compute_10,code=sm_10
GENCODE_SM20 := -gencode arch=compute_20,code=sm_20
GENCODE_SM30 := -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35
GENCODE_FLAGS := $(GENCODE_SM10) $(GENCODE_SM20) $(GENCODE_SM30)
# OS-specific build flags
# ifneq ($(DARWIN),)
# LDFLAGS := -Xlinker -rpath $(CUDA_LIB_PATH) -L$(CUDA_LIB_PATH) -lcudart -lcublas -lcuda -lcufft -ltlshook
# CCFLAGS := -arch $(OS_ARCH)
# else
# ifeq ($(OS_SIZE),32)
# LDFLAGS := -L$(CUDA_LIB_PATH) -lcudart
# CCFLAGS := -m32
# else
LDFLAGS := -L$(CUDA_LIB_PATH) -lcudart -lcublas -lcuda -lcufft -ltlshook
CCFLAGS := -m64
# endif
# endif
# OS-architecture specific flags
ifeq ($(OS_SIZE),32)
NVCCFLAGS := -m32
else
NVCCFLAGS := -m64
endif
# Debug build flags
ifeq ($(dbg),1)
CCFLAGS += -g
NVCCFLAGS += -g -G
TARGET := debug
else
TARGET := release
endif
# Common includes and paths for CUDA
INCLUDES := -I$(CUDA_INC_PATH) -I. -I.. -I../../common/inc
# Add source files here
EXECUTABLE := 5kk70-assignment-gpu
# Cuda source files (compiled with cudacc)
CUFILES :=
# C/C++ source files (compiled with gcc / c++)
CCFILES := main.cpp histogram-equalization.cu contrast-enhancement.cu
################################################################################
# Rules and targets
# All Phony Targets
.PHONY : everything clean
# Default starting position
everything : $(EXECUTABLE)
# Common includes and paths for CUDA
# INCLUDES := -I$(CUDA_INC_PATH) -I. -I.. -I$(CUDA_INC_PATH)/samples/common/inc/
# Clean OBJECTS
clean :
rm -f $(EXECUTABLE) $(OBJ)
$(EXECUTABLE) : $(CCFILES)
	$(NVCC) -o $@ $^ $(INCLUDES) $(LDFLAGS) $(EXTRA_LDFLAGS) $(GENCODE_FLAGS)
What's the problem with my code?
This error occurs when CUDA code lives in a file with a .cpp extension: nvcc decides how to treat each input by its suffix and hands .cpp files straight to the host compiler, which knows nothing about blockIdx or threadIdx. Rename the file to .cu and the error goes away.
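As a rough sketch of both options in Makefile form: the first rule covers files that carry the .cu suffix, and the second shows nvcc's -x cu option, which treats an input as CUDA source regardless of its name. The rule layout is an assumption of this sketch rather than the asker's build, and main.cpp stands in for whichever file contains the device code.

```makefile
# NVCC may need a full path on your system.
NVCC ?= nvcc

# Preferred: keep CUDA code in .cu files, which nvcc compiles as CUDA by default.
%.o: %.cu
	$(NVCC) -c $< -o $@

# Alternative when a file has to keep its .cpp name:
# -x cu tells nvcc to treat the input as CUDA source regardless of suffix.
main.o: main.cpp
	$(NVCC) -x cu -c $< -o $@
```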
In a Bazel build rule, try putting the .cu.cc file in hdrs rather than srcs.