I am trying to get CUDA code to work with Qt on Ubuntu 12.04
My cuda_interface.cu
// CUDA-C includes
#include <cuda.h>
extern "C"
void runCudaPart();
// Main cuda function
void runCudaPart() {
// all your cuda code here *smile*
}
My main.cpp
#include <QtCore/QCoreApplication>
extern "C"
void runCudaPart();
int main(int argc, char *argv[])
{
    runCudaPart();
}
My .pro file
#-------------------------------------------------
#
# Project created by QtCreator 2013-04-17T10:50:37
#
#-------------------------------------------------
QT += core
QT -= gui
TARGET = QtCuda
CONFIG += console
CONFIG -= app_bundle
TEMPLATE = app
SOURCES += main.cpp
# This makes the .cu files appear in your project
OTHER_FILES += ./cuda_interface.cu
# CUDA settings <-- may change depending on your system
CUDA_SOURCES += ./cuda_interface.cu
CUDA_SDK = "/usr/local/cuda-5.0/" # Path to cuda SDK install
CUDA_DIR = "/usr/local/cuda-5.0/" # Path to cuda toolkit install
SYSTEM_NAME = unix # Depending on your system either 'Win32', 'x64', or 'Win64'
SYSTEM_TYPE = 32 # '32' or '64', depending on your system
CUDA_ARCH = sm_21 # Type of CUDA architecture, for example 'compute_10', 'compute_11', 'sm_10'
NVCC_OPTIONS = --use_fast_math
# include paths
INCLUDEPATH += $$CUDA_DIR/include
# library directories
QMAKE_LIBDIR += $$CUDA_DIR/lib/
CUDA_OBJECTS_DIR = ./
# The following library conflicts with something in Cuda
#QMAKE_LFLAGS_RELEASE = /NODEFAULTLIB:msvcrt.lib
#QMAKE_LFLAGS_DEBUG = /NODEFAULTLIB:msvcrtd.lib
# Add the necessary libraries
CUDA_LIBS = libcuda libcudart
# The following makes sure all path names (which often include spaces) are put between quotation marks
CUDA_INC = $$join(INCLUDEPATH,'" -I"','-I"','"')
NVCC_LIBS = $$join(CUDA_LIBS,' -l','-l', '')
LIBS += $$join(CUDA_LIBS,'.so ', '', '.so')
# Configuration of the Cuda compiler
CONFIG(debug, debug|release) {
# Debug mode
cuda_d.input = CUDA_SOURCES
cuda_d.output = $$CUDA_OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.o
cuda_d.commands = $$CUDA_DIR/bin/nvcc -D_DEBUG $$NVCC_OPTIONS $$CUDA_INC $$NVCC_LIBS --machine $$SYSTEM_TYPE -arch=$$CUDA_ARCH -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
cuda_d.dependency_type = TYPE_C
QMAKE_EXTRA_COMPILERS += cuda_d
}
else {
# Release mode
cuda.input = CUDA_SOURCES
cuda.output = $$CUDA_OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.o
cuda.commands = $$CUDA_DIR/bin/nvcc $$NVCC_OPTIONS $$CUDA_INC $$NVCC_LIBS --machine $$SYSTEM_TYPE -arch=$$CUDA_ARCH -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
cuda.dependency_type = TYPE_C
QMAKE_EXTRA_COMPILERS += cuda
}
I am trying to adapt this .pro file from Compiling Cuda code in Qt Creator on Windows, which is a similar question but seeks a solution for Windows.
At the moment the compiler shows the following errors:
make: Entering directory `/home/swaroop/Work/ai-junkies/cuda/uc_davis/opencv2.x/QtCuda'
g++ -Wl,-O1 -o QtCuda cuda_interface_cuda.o main.o -L/usr/local/cuda-5.0//lib/ -L/usr/lib/i386-linux-gnu libcuda.so libcudart.so -lQtCore -lpthread
g++: error: libcuda.so: No such file or directory
g++: error: libcudart.so: No such file or directory
make: *** [QtCuda] Error 1
Please help me fix these problems.
I can finally run CUDA code with Qt Creator on Ubuntu 12.04. I assume that you can already run CUDA independently on your system; here is an excellent guide to setting up CUDA on Ubuntu 12.04:
http://sn0v.wordpress.com/2012/05/11/installing-cuda-on-ubuntu-12-04/
I started off with a Qt console application from Qt Creator.
Here is my main.cpp
#include <QtCore/QCoreApplication>
extern "C"
void runCudaPart();
int main(int argc, char *argv[])
{
    runCudaPart();
}
Here is cuda_interface.cu
// CUDA-C includes
#include <cuda.h>
#include <cuda_runtime.h>
#include <stdio.h>

extern "C"
// Adds two arrays
void runCudaPart();

__global__ void addAry( int * ary1, int * ary2 )
{
    int indx = threadIdx.x;
    ary1[ indx ] += ary2[ indx ];
}

// Main cuda function
void runCudaPart() {
    int ary1[32];
    int ary2[32];
    int res[32];
    for( int i=0 ; i<32 ; i++ )
    {
        ary1[i] = i;
        ary2[i] = 2*i;
        res[i] = 0;
    }

    int * d_ary1, *d_ary2;
    cudaMalloc((void**)&d_ary1, 32*sizeof(int));
    cudaMalloc((void**)&d_ary2, 32*sizeof(int));
    cudaMemcpy((void*)d_ary1, (void*)ary1, 32*sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy((void*)d_ary2, (void*)ary2, 32*sizeof(int), cudaMemcpyHostToDevice);

    addAry<<<1,32>>>(d_ary1,d_ary2);

    cudaMemcpy((void*)res, (void*)d_ary1, 32*sizeof(int), cudaMemcpyDeviceToHost);
    for( int i=0 ; i<32 ; i++ )
        printf( "result[%d] = %d\n", i, res[i]);

    cudaFree(d_ary1);
    cudaFree(d_ary2);
}
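As a side note (not part of the original working example): none of the CUDA runtime calls above are checked for errors. A minimal sketch of the same function with basic error checking added, where the checkCuda() helper and the *Checked names are my own, purely for illustration:
// Sketch only: same logic as runCudaPart() above, with basic error checking.
// The checkCuda() helper and the *Checked names are hypothetical additions.
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

static void checkCuda(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

__global__ void addAryChecked(int *ary1, int *ary2)
{
    int indx = threadIdx.x;
    ary1[indx] += ary2[indx];
}

void runCudaPartChecked()
{
    int ary1[32], ary2[32], res[32];
    for (int i = 0; i < 32; i++) { ary1[i] = i; ary2[i] = 2*i; res[i] = 0; }

    int *d_ary1, *d_ary2;
    checkCuda(cudaMalloc((void**)&d_ary1, 32*sizeof(int)), "cudaMalloc d_ary1");
    checkCuda(cudaMalloc((void**)&d_ary2, 32*sizeof(int)), "cudaMalloc d_ary2");
    checkCuda(cudaMemcpy(d_ary1, ary1, 32*sizeof(int), cudaMemcpyHostToDevice), "copy ary1");
    checkCuda(cudaMemcpy(d_ary2, ary2, 32*sizeof(int), cudaMemcpyHostToDevice), "copy ary2");

    addAryChecked<<<1,32>>>(d_ary1, d_ary2);
    checkCuda(cudaGetLastError(), "kernel launch");          // launch-time errors
    checkCuda(cudaDeviceSynchronize(), "kernel execution");  // execution-time errors

    checkCuda(cudaMemcpy(res, d_ary1, 32*sizeof(int), cudaMemcpyDeviceToHost), "copy result");
    for (int i = 0; i < 32; i++)
        printf("result[%d] = %d\n", i, res[i]);

    cudaFree(d_ary1);
    cudaFree(d_ary2);
}
This is optional, but it makes build or driver problems show up as clear messages instead of silently wrong results.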
Here is my .pro file.
#-------------------------------------------------
#
# Project created by QtCreator 2013-04-17T16:30:33
#
#-------------------------------------------------
QT += core
QT -= gui
TARGET = QtCuda
CONFIG += console
CONFIG -= app_bundle
TEMPLATE = app
SOURCES += main.cpp
# This makes the .cu files appear in your project
OTHER_FILES += ./cuda_interface.cu
# CUDA settings <-- may change depending on your system
CUDA_SOURCES += ./cuda_interface.cu
CUDA_SDK = "/usr/local/cuda-5.0/" # Path to cuda SDK install
CUDA_DIR = "/usr/local/cuda-5.0/" # Path to cuda toolkit install
# DO NOT EDIT BEYOND THIS UNLESS YOU KNOW WHAT YOU ARE DOING....
SYSTEM_NAME = unix # Depending on your system either 'Win32', 'x64', or 'Win64'
SYSTEM_TYPE = 32 # '32' or '64', depending on your system
CUDA_ARCH = sm_21 # Type of CUDA architecture, for example 'compute_10', 'compute_11', 'sm_10'
NVCC_OPTIONS = --use_fast_math
# include paths
INCLUDEPATH += $$CUDA_DIR/include
# library directories
QMAKE_LIBDIR += $$CUDA_DIR/lib/
CUDA_OBJECTS_DIR = ./
# Add the necessary libraries
CUDA_LIBS = -lcuda -lcudart
# The following makes sure all path names (which often include spaces) are put between quotation marks
CUDA_INC = $$join(INCLUDEPATH,'" -I"','-I"','"')
#LIBS += $$join(CUDA_LIBS,'.so ', '', '.so')
LIBS += $$CUDA_LIBS
# Configuration of the Cuda compiler
CONFIG(debug, debug|release) {
# Debug mode
cuda_d.input = CUDA_SOURCES
cuda_d.output = $$CUDA_OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.o
cuda_d.commands = $$CUDA_DIR/bin/nvcc -D_DEBUG $$NVCC_OPTIONS $$CUDA_INC $$NVCC_LIBS --machine $$SYSTEM_TYPE -arch=$$CUDA_ARCH -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
cuda_d.dependency_type = TYPE_C
QMAKE_EXTRA_COMPILERS += cuda_d
}
else {
# Release mode
cuda.input = CUDA_SOURCES
cuda.output = $$CUDA_OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.o
cuda.commands = $$CUDA_DIR/bin/nvcc $$NVCC_OPTIONS $$CUDA_INC $$NVCC_LIBS --machine $$SYSTEM_TYPE -arch=$$CUDA_ARCH -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
cuda.dependency_type = TYPE_C
QMAKE_EXTRA_COMPILERS += cuda
}
libcuda.so and libcudart.so are missing the -l flag in front of them in the g++ call (note that -l expects the library name without the lib prefix). You have an appropriate join command that adds -l for nvcc, so use the same logic for g++:
CUDA_LIBS = cuda cudart
LIBS += $$join(CUDA_LIBS,' -l','-l', '')
Or just change to this:
CUDA_LIBS = -lcuda -lcudart
LIBS += $$CUDA_LIBS
And get rid of the NVCC_LIBS variable.
I can't comment on mkuse's answer, but I wanted to add that I had to add -L/usr/local/cuda-6.5/lib64 to CUDA_LIBS:
# Add the necessary libraries
CUDA_LIBS = -lcuda -lcudart -L/usr/local/cuda-6.5/lib64
Otherwise I get the error "cannot find -lcudart", even though I can run CUDA independently. Just in case it helps someone.
EDIT: I realized that this is not necessary; I just had to check the QMAKE_LIBDIR path, since I have a 64-bit system.
I have recently installed CUDA on my Arch Linux machine through the system's package manager, and I have been trying to test whether or not it is working by running a simple vector addition program.
I simply copy-paste the code from this tutorial (both the version using one kernel and the one using more) into a file titled cuda_test.cu and run
> nvcc cuda_test.cu -o cuda_test
In either case, the program runs and I get no errors (both in the sense that the program doesn't crash and that its output reports no errors). But when I try to run the CUDA profiler on the program:
> sudo nvprof ./cuda_test
I get result:
==3201== NVPROF is profiling process 3201, command: ./cuda_test
Max error: 0
==3201== Profiling application: ./cuda_test
==3201== Profiling result:
No kernels were profiled.
No API activities were profiled.
==3201== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.
The latter warning is not my main problem or the topic of my question; my problem is the message saying that no kernels and no API activities were profiled.
Does this mean that the program was run entirely on my CPU, or is it an error in nvprof?
I have found a discussion about the same error here, but there the answer was that the wrong version of CUDA was installed, whereas in my case the installed version is the latest one available through the system's package manager (version 10.1.243-1).
Is there any way I can get nvprof to display the expected output?
Edit
Trying to adhere to the warning at the end does not solve the problem:
Adding a call to cudaProfilerStop() (or cuProfilerStop()), also adding cudaDeviceReset() at the end as suggested, including the appropriate header (cuda_profiler_api.h or cudaProfiler.h), and compiling with
> nvcc cuda_test.cu -o cuda_test -lcuda
yields a program which can still run, but which, when nvprof is run on it, returns:
==12558== NVPROF is profiling process 12558, command: ./cuda_test
Max error: 0
==12558== Profiling application: ./cuda_test
==12558== Profiling result:
No kernels were profiled.
No API activities were profiled.
==12558== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.
======== Error: Application received signal 139
This has not solved the original problem, and has in fact created a new error; the same happens when cudaProfilerStop() is used on its own or alongside cuProfilerStop() and cudaDeviceReset();
The code
The code is, as mentioned, copied from a tutorial to test whether CUDA is working, though I have also added the calls to cudaProfilerStop() and cudaDeviceReset(). For clarity, it is included here:
#include <iostream>
#include <math.h>
#include <cuda_profiler_api.h>

// Kernel function to add the elements of two arrays
__global__
void add(int n, float *x, float *y)
{
    int index = threadIdx.x;
    int stride = blockDim.x;
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}

int main(void)
{
    int N = 1<<20;
    float *x, *y;

    cudaProfilerStart();

    // Allocate Unified Memory – accessible from CPU or GPU
    cudaMallocManaged(&x, N*sizeof(float));
    cudaMallocManaged(&y, N*sizeof(float));

    // initialize x and y arrays on the host
    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    // Run kernel on 1M elements on the GPU
    add<<<1, 1>>>(N, x, y);

    // Wait for GPU to finish before accessing on host
    cudaDeviceSynchronize();

    // Check for errors (all values should be 3.0f)
    float maxError = 0.0f;
    for (int i = 0; i < N; i++)
        maxError = fmax(maxError, fabs(y[i]-3.0f));
    std::cout << "Max error: " << maxError << std::endl;

    // Free memory
    cudaFree(x);
    cudaFree(y);

    cudaDeviceReset();
    cudaProfilerStop();

    return 0;
}
This problem was apparently somewhat well known. After some searching I found this thread about the error code in the edited version; the solution, as discussed there, is to call nvprof with the flag --unified-memory-profiling off:
> sudo nvprof --unified-memory-profiling off ./cuda_test
This makes nvprof work as expected, even without the call to cudaProfilerStop().
You can solve the problem by using
sudo nvprof --unified-memory-profiling per-process-device <your program>
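As an aside (not part of either answer above): if you also want to confirm from inside the program that a kernel really ran on the GPU, rather than relying on the profiler output, you can check the CUDA error state around the launch. A minimal self-contained sketch, using a toy kernel of my own rather than the tutorial's add kernel:
// Illustrative sketch only: verify that a kernel launch actually succeeded.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void add_one(int *v) { v[0] += 1; }

int main()
{
    int *v = nullptr;
    cudaError_t err = cudaMallocManaged(&v, sizeof(int));
    if (err != cudaSuccess) {
        std::printf("cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    v[0] = 41;

    add_one<<<1, 1>>>(v);
    err = cudaGetLastError();        // reports launch-time errors (no device, bad config, ...)
    if (err != cudaSuccess) {
        std::printf("kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    err = cudaDeviceSynchronize();   // reports execution-time errors
    if (err != cudaSuccess) {
        std::printf("kernel execution failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("v[0] = %d (42 only if the GPU kernel ran)\n", v[0]);
    cudaFree(v);
    return 0;
}
If any of these checks fail, the host code keeps running but the device work never happened, which is exactly the situation the original question was worried about.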
I am new to log4cpp and the SWIG wrapper. I am trying to write an interface for simple logging using log4cpp.
I have installed log4cpp and SWIG on my Ubuntu machine.
log4cpp.cpp:
#include "log4cpp/Category.hh"
#include "log4cpp/Appender.hh"
#include "log4cpp/FileAppender.hh"
#include "log4cpp/OstreamAppender.hh"
#include "log4cpp/Layout.hh"
#include "log4cpp/BasicLayout.hh"
#include "log4cpp/Priority.hh"
#include "log4cpp.h"
void writeLog() {
    log4cpp::Appender *appender1 = new log4cpp::OstreamAppender("console", &std::cout);
    appender1->setLayout(new log4cpp::BasicLayout());

    log4cpp::Appender *appender2 = new log4cpp::FileAppender("default", "program.log");
    appender2->setLayout(new log4cpp::BasicLayout());

    log4cpp::Category& root = log4cpp::Category::getRoot();
    root.setPriority(log4cpp::Priority::WARN);
    root.addAppender(appender1);

    log4cpp::Category& sub1 = log4cpp::Category::getInstance(std::string("sub1"));
    sub1.addAppender(appender2);

    // use of functions for logging messages
    root.error("root error");
    root.info("root info");
    sub1.error("sub1 error");
    sub1.warn("sub1 warn");

    // printf-style for logging variables
    root.warn("%d + %d == %s ?", 1, 1, "two");

    // use of streams for logging messages
    root << log4cpp::Priority::ERROR << "Streamed root error";
    root << log4cpp::Priority::INFO << "Streamed root info";
    sub1 << log4cpp::Priority::ERROR << "Streamed sub1 error";
    sub1 << log4cpp::Priority::WARN << "Streamed sub1 warn";

    // or this way:
    root.errorStream() << "Another streamed error";
}
log4cpp.h:
void writeLog(void);
log4cpp.i:
%module log4cpp
%{
#include "log4cpp.h"
%}
%inline %{
extern void writeLog(void);
%}
I have done the following steps to generate the log4cpp.so file:
swig -tcl -c++ log4cpp.i
g++ -c -fPIC log4cpp.cpp log4cpp_wrap.cxx -I/usr/include/tcl8.5
g++ -shared log4cpp.o log4cpp_wrap.o -o log4cpp.so
It generates the log4cpp_wrap.cxx, log4cpp.o, log4cpp_wrap.o and log4cpp.so files without any warnings or errors.
But whenever I run the command below in tclsh,
load ./log4cpp.so
it generates an undefined symbol error:
% load ./log4cpp.so
couldn't load file "./log4cpp.so": ./log4cpp.so: undefined symbol: _ZN7log4cpp8Appender29AppenderMapStorageInitializerD1Ev
What should I do to remove this error?
You need to link your SWIG shared library against log4cpp, like you would with any other C++ application that uses this library. So when you call
g++ -shared log4cpp.o log4cpp_wrap.o -o log4cpp.so
it really needs to be something like this (but adapted to your real library location and search path):
g++ -shared log4cpp.o log4cpp_wrap.o -L/path/to/your/install/of/log4cpp -llog4cpp -o log4cpp.so
I'm using CMake as the build system for my code, which involves CUDA. I was thinking of automating the task of deciding which compute_XX and arch_XX values I need to pass to nvcc in order to compile for the GPU(s) on my current machine.
Is there a way to do this:
With the NVIDIA GPU deployment kit?
Without the NVIDIA GPU deployment kit?
Does CMake's FindCUDA help you in determining the values for these switches?
My strategy has been to compile and run a bash script that probes the card and returns the gencode for CMake. Inspiration came from the University of Chicago's SLURM scripts. To handle errors, multiple GPUs, or other circumstances, modify as necessary.
In your project folder create a file cudaComputeVersion.bash and ensure it is executable from the shell. Into this file put:
#!/bin/bash
# create a 'here document' that is code we compile and use to probe the card
cat << EOF > /tmp/cudaComputeVersion.cu
#include <stdio.h>
int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop,0);
    int v = prop.major * 10 + prop.minor;
    printf("-gencode arch=compute_%d,code=sm_%d\n",v,v);
}
EOF
# probe the card and cleanup
/usr/local/cuda/bin/nvcc /tmp/cudaComputeVersion.cu -o /tmp/cudaComputeVersion
/tmp/cudaComputeVersion
rm /tmp/cudaComputeVersion.cu
rm /tmp/cudaComputeVersion
And in your CMakeLists.txt put:
# at cmake-build-time, probe the card and set a cmake variable
execute_process(COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/cudaComputeVersion.bash OUTPUT_VARIABLE GENCODE)
# at project-compile-time, include the gencode into the compile options
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS}; "${GENCODE}")
# this makes CMake all chatty and allows you to see that GENCODE was set correctly
set(CMAKE_VERBOSE_MAKEFILE TRUE)
cheers
You can use the cuda_select_nvcc_arch_flags() macro in the FindCUDA module for this without any additional scripts when using CMake 3.7 or newer.
include(FindCUDA)
set(CUDA_ARCH_LIST Auto CACHE STRING
"List of CUDA architectures (e.g. Pascal, Volta, etc) or \
compute capability versions (6.1, 7.0, etc) to generate code for. \
Set to Auto for automatic detection (default)."
)
cuda_select_nvcc_arch_flags(CUDA_ARCH_FLAGS ${CUDA_ARCH_LIST})
list(APPEND CUDA_NVCC_FLAGS ${CUDA_ARCH_FLAGS})
The above sets CUDA_ARCH_FLAGS to -gencode arch=compute_61,code=sm_61 on my machine, for example.
The CUDA_ARCH_LIST cache variable can be configured by the user to generate code for specific compute capabilities instead of automatic detection.
Note: the FindCUDA module has been deprecated since CMake 3.10. However, no equivalent alternative to the cuda_select_nvcc_arch_flags() macro appears to be provided yet in the latest CMake release (v3.14). See this relevant issue at the CMake issue tracker for further details.
A slight improvement over @orthopteroid's answer, which pretty much ensures a unique temporary file is generated, and only requires one temporary file instead of two.
The following goes into scripts/get_cuda_sm.sh:
#!/bin/bash
#
# Prints the compute capability of the first CUDA device installed
# on the system, or alternatively the device whose index is the
# first command-line argument
device_index=${1:-0}
timestamp=$(date +%s.%N)
gcc_binary=$(which g++)
gcc_binary=${gcc_binary:-g++}
cuda_root=${CUDA_DIR:-/usr/local/cuda}
CUDA_INCLUDE_DIRS=${CUDA_INCLUDE_DIRS:-${cuda_root}/include}
CUDA_CUDART_LIBRARY=${CUDA_CUDART_LIBRARY:-${cuda_root}/lib64/libcudart.so}
generated_binary="/tmp/cuda-compute-version-helper-$$-$timestamp"
# create a 'here document' that is code we compile and use to probe the card
source_code="$(cat << EOF
#include <stdio.h>
#include <cuda_runtime_api.h>
int main()
{
    cudaDeviceProp prop;
    cudaError_t status;
    int device_count;
    status = cudaGetDeviceCount(&device_count);
    if (status != cudaSuccess) {
        fprintf(stderr,"cudaGetDeviceCount() failed: %s\n", cudaGetErrorString(status));
        return -1;
    }
    if (${device_index} >= device_count) {
        fprintf(stderr, "Specified device index %d exceeds the maximum (the device count on this system is %d)\n", ${device_index}, device_count);
        return -1;
    }
    status = cudaGetDeviceProperties(&prop, ${device_index});
    if (status != cudaSuccess) {
        fprintf(stderr,"cudaGetDeviceProperties() for device ${device_index} failed: %s\n", cudaGetErrorString(status));
        return -1;
    }
    int v = prop.major * 10 + prop.minor;
    printf("%d\\n", v);
}
EOF
)"
echo "$source_code" | $gcc_binary -x c++ -I"$CUDA_INCLUDE_DIRS" -o "$generated_binary" - -x none "$CUDA_CUDART_LIBRARY"
# probe the card and cleanup
$generated_binary
rm $generated_binary
and the following goes into CMakeLists.txt or a CMake module:
if (NOT CUDA_TARGET_COMPUTE_CAPABILITY)
    if("$ENV{CUDA_SM}" STREQUAL "")
        set(ENV{CUDA_INCLUDE_DIRS} "${CUDA_INCLUDE_DIRS}")
        set(ENV{CUDA_CUDART_LIBRARY} "${CUDA_CUDART_LIBRARY}")
        set(ENV{CMAKE_CXX_COMPILER} "${CMAKE_CXX_COMPILER}")
        execute_process(COMMAND
            bash -c "${CMAKE_CURRENT_SOURCE_DIR}/scripts/get_cuda_sm.sh"
            OUTPUT_VARIABLE CUDA_TARGET_COMPUTE_CAPABILITY_)
    else()
        set(CUDA_TARGET_COMPUTE_CAPABILITY_ $ENV{CUDA_SM})
    endif()
    set(CUDA_TARGET_COMPUTE_CAPABILITY "${CUDA_TARGET_COMPUTE_CAPABILITY_}"
        CACHE STRING "CUDA compute capability of the (first) CUDA device on \
the system, in XY format (like the X.Y format but no dot); see table \
of features and capabilities by capability X.Y value at \
https://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications")
    execute_process(COMMAND
        bash -c "echo -n $(echo ${CUDA_TARGET_COMPUTE_CAPABILITY})"
        OUTPUT_VARIABLE CUDA_TARGET_COMPUTE_CAPABILITY)
    execute_process(COMMAND
        bash -c "echo ${CUDA_TARGET_COMPUTE_CAPABILITY} | sed 's/^\\([0-9]\\)\\([0-9]\\)/\\1.\\2/;' | xargs echo -n"
        OUTPUT_VARIABLE FORMATTED_COMPUTE_CAPABILITY)
    message(STATUS
        "CUDA device-side code will assume compute capability \
${FORMATTED_COMPUTE_CAPABILITY}")
endif()
set(CUDA_GENCODE
    "arch=compute_${CUDA_TARGET_COMPUTE_CAPABILITY},code=compute_${CUDA_TARGET_COMPUTE_CAPABILITY}")
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -gencode ${CUDA_GENCODE})
The cuBLAS code below gives a core dump at the call to cublasSnrm2(handle,row,dy,incy,de). Could you give some advice?
main.cu
#include <iostream>
#include "cublas.h"
#include "cublas_v2.h"
#include "helper_cuda.h"
using namespace std;
int main(int argc,char *args[])
{
    float y[10] = {1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0};
    int dev = 0;
    checkCudaErrors(cudaSetDevice(dev));

    //cublas init
    cublasStatus stat;
    cublasInit();
    cublasHandle_t handle;
    stat = cublasCreate(&handle);
    if (stat != CUBLAS_STATUS_SUCCESS)
    {
        printf("cublas handle create failed!\n");
        cublasShutdown();
    }

    float *dy, *de, *e;
    int incy = 1, ONE = 1, row = 10;
    e = (float *)malloc(sizeof(float)*ONE);
    e[0] = 0.0f;
    checkCudaErrors(cudaMalloc(&dy,sizeof(float)*row));
    checkCudaErrors(cudaMalloc(&de,sizeof(float)*ONE));
    checkCudaErrors(cudaMemcpy(dy,y,row*sizeof(float),cudaMemcpyHostToDevice));
    checkCudaErrors(cudaMemcpy(de,e,ONE*sizeof(float),cudaMemcpyHostToDevice));

    stat = cublasSnrm2(handle,row,dy,incy,de);
    if (stat != CUBLAS_STATUS_SUCCESS)
    {
        printf("norm2 compute failed!\n");
        cublasShutdown();
    }

    checkCudaErrors(cudaMemcpy(e,de,ONE*sizeof(float),cudaMemcpyDeviceToHost));
    std::cout << e[0] << endl;
    return 0;
}
makefile is below:
NVIDIA = $(HOME)/NVIDIA_CUDA-5.0_Samples
CUDA = /usr/local/cuda-5.0
NVIDINCADD = -I$(NVIDIA)/common/inc
CUDAINCADD = -I$(CUDA)/include
CC = -L/usr/lib64/ -lstdc++
GCCOPT = -O2 -fno-rtti -fno-exceptions
INTELOPT = -O3 -fno-rtti -xW -restrict -fno-alias
DEB = -g
NVCC = -G
ARCH = -arch=sm_35
bcg:main.cu
	nvcc $(DEB) $(NVCC) $(ARCH) $(CC) -lm $(NVIDINCADD) $(CUDAINCADD) -lcublas -I./ -o $(@) $(<)
clean:
	rm -f bcg
	rm -f hyb
My OS is Red Hat Linux 6.2, the CUDA version is 5.0, and the GPU is a K20m.
The problem is here:
cublasSnrm2(handle,row,dy,incy,de);
By default, the last parameter is a host pointer. So either pass e to the snrm2 call rather than de or do this:
cublasSetPointerMode(handle,CUBLAS_POINTER_MODE_DEVICE);
stat = cublasSnrm2(handle,row,dy,incy,de);
The pointer mode needs to be set to device if you want to pass a device pointer to store the result.
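For completeness, here is a minimal self-contained sketch of the host-pointer alternative (passing a host variable instead of de). This is only an illustration with most error checking omitted, not the questioner's exact code:
// Sketch: cublasSnrm2 with the default CUBLAS_POINTER_MODE_HOST.
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main()
{
    const int n = 10;
    float y[n];
    for (int i = 0; i < n; ++i) y[i] = 1.0f;

    float *dy = NULL;
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dy, y, n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // With the default host pointer mode, the result argument is a host
    // pointer, so no extra cudaMemcpy of the result is needed.
    float result = 0.0f;
    cublasStatus_t stat = cublasSnrm2(handle, n, dy, 1, &result);
    if (stat != CUBLAS_STATUS_SUCCESS)
        std::printf("norm2 compute failed!\n");
    else
        std::printf("norm = %f\n", result);   // expect sqrt(10) ~ 3.162278

    cublasDestroy(handle);
    cudaFree(dy);
    return 0;
}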
Possible Duplicate:
How to make #include <mysql.h> work?
I need to connect C and MySQL.
This is my program:
#include <stdio.h>
#include <mysql.h>
#define host "localhost"
#define username "root"
#define password "viswa"
#define database "dbase"
MYSQL *conn;
int main()
{
    MYSQL_RES *res_set;
    MYSQL_ROW row;

    conn = mysql_init(NULL);
    if( conn == NULL )
    {
        printf("Failed to initate MySQL\n");
        return 1;
    }
    if( ! mysql_real_connect(conn,host,username,password,database,0,NULL,0) )
    {
        printf( "Error connecting to database: %s\n", mysql_error(conn));
        return 1;
    }

    unsigned int i;
    mysql_query(conn,"SELECT name, email, password FROM users");
    res_set = mysql_store_result(conn);
    unsigned int numrows = mysql_num_rows(res_set);
    unsigned int num_fields = mysql_num_fields(res_set);

    while ((row = mysql_fetch_row(res_set)) != NULL)
    {
        for(i = 0; i < num_fields; i++)
        {
            printf("%s\t", row[i] ? row[i] : "NULL");
        }
        printf("\n");
    }

    mysql_close(conn);
    return 0;
}
I got the error "unable to include mysql.h".
I am using Windows 7, Turbo C, and MySQL; I downloaded mysql-connector-c-noinstall-6.0.2-win32-vs2005, but I don't know how to include it.
Wrong syntax. The #include is a C preprocessor directive, not a statement (so it should not end with a semi-colon). You should use
#include <mysql.h>
and you may need instead to have
#include <mysql/mysql.h>
or to pass -I /some/dir options to your compiler (with /some/dir replaced by the directory containing the mysql.h header).
Likewise, your #define lines should very probably not end with a semi-colon; you may need
#define username "root"
#define password "viswa"
#define database "dbase"
I strongly suggest reading a good book on C programming. You may want to examine the preprocessed form of your source code; when using gcc you could invoke it as gcc -C -E yoursource.c to get the preprocessed form.
I also strongly recommend enabling warnings and debugging info (e.g. gcc -Wall -g for GCC). Find out how your specific compiler should be used. Learn also how to use your debugger (e.g. gdb). Study also existing C programs (notably free software).
You should learn how to configure your compiler to use extra include directories, and to link extra libraries.
N.B. With a Linux distribution, you'll just have to install the appropriate packages and perhaps use mysql_config inside your Makefile (of course you'll need appropriate compiler and linker flags), perhaps with lines like
CFLAGS += -g -Wall $(shell mysql_config --cflags)
LIBES += $(shell mysql_config --libs)
added to your Makefile.