Compiling CUDA kernels separately from host code with CMake

I heard that it is better to compile CUDA kernels separately from host code. How do I do that with CMake? I am an absolute beginner with CMake.
Thanks

As far as I know it is not possible with only one cmake invocation, unless you resort to some hacks. You could write two CMakeLists.txt files, one for the CUDA code and one for the host code. In the file for the host code you can add the CUDA part as a library. After that you can write a shell script that runs the build for each CMakeLists.txt.
I usually do this with make: one target compiles the CUDA code into an object file, another compiles the host code into an object file, and a third target depends on the other two and links the object files into an executable.
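For illustration, the commands such make targets boil down to look roughly like this (the file names are hypothetical, and the CUDA library path may differ on your system):
nvcc -c kernel.cu -o kernel.o          # device code compiled separately by nvcc
g++ -c main.cpp -o main.o              # host code compiled by the host compiler
g++ main.o kernel.o -o app -L/usr/local/cuda/lib64 -lcudart    # link against the CUDA runtime
In a CMake setup the same split falls out naturally if the CUDA part is built as its own library target that the host executable then links against, as suggested above.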

Related

How can I execute a C program on QEMU RISC-V and observe the output?

What is the best approach to run C code in QEMU for RISC-V and observe the output? I installed QEMU for RISC-V following this link. What should I do now?
https://risc-v-getting-started-guide.readthedocs.io/en/latest/linux-qemu.html
You probably want to use the static user-mode version of QEMU for most applications.
Then make sure to compile for RISC-V with the -static flag, and call qemu-riscv64-static [executable].
I highly recommend this; system mode is a massive pain to deal with if you don't need it
(have fun debugging the UART).
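A minimal sketch of that workflow (hello.c is a hypothetical file, and the cross-compiler name depends on your toolchain; this one is the Debian/Ubuntu package naming):
riscv64-linux-gnu-gcc -static -o hello hello.c    # statically linked RISC-V binary
qemu-riscv64-static ./hello                       # run it with the user-mode emulator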
You can use libriscv to run RISC-V programs: https://github.com/fwsGonzo/libriscv
Inside the emulator folder there are two ways to build the emulator: build.sh produces emulators that run programs with no instruction listing, while debug.sh produces a debugging variant that shows the state of the registers and the instructions all the way through the program.
Building QEMU from source is complete overkill.
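If you go the libriscv route, the build steps described above look roughly like this (treat the name of the produced emulator binary as build-dependent):
git clone https://github.com/fwsGonzo/libriscv
cd libriscv/emulator
./build.sh        # or ./debug.sh for the variant that prints registers and instructions
# then point the resulting emulator binary at your RISC-V ELF program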

How is the CUDA <<<...>>>() kernel launch syntax implemented?

CUDA kernels are launched with this syntax (at least in the runtime API):
mykernel<<<blocks, threads, shared_mem, stream>>>(args);
Is this implemented as a macro or is it special syntax that nvcc removes before handing host code off to gcc?
The nvcc preprocessing system eventually converts it into a sequence of CUDA runtime library calls before handing the code off to the host compiler for compilation. The exact sequence of calls may change depending on the CUDA version.
You can inspect the intermediate files using the --keep option to nvcc (and --verbose may help with understanding as well), and you can also see a trace of the API calls issued for a kernel launch using one of the profilers, e.g. nvprof --print-api-trace ...
---EDIT---
To make this answer more concise: nvcc directly modifies the host code to replace the <<<...>>> syntax before passing it off to the host compiler (https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#offline-compilation).
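A sketch of the inspection workflow mentioned above (mykernel.cu is a hypothetical file name, and the exact names of the generated files vary between CUDA versions):
nvcc --keep --verbose -o app mykernel.cu    # keeps the intermediate files, e.g. the generated .cudafe*.cpp/.stub.c host sources
nvprof --print-api-trace ./app              # traces the runtime API calls issued when the kernel actually launches
In the kept host sources you can see what the <<<...>>> launch was rewritten into.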

Can't make CUDA sample: Makefile:36: findcudalib.mk: No such file or directory

My problem is that I cannot compile a CUDA example. I believe I've got CUDA 4.0 installed correctly (I need the old version because I'm trying to run GPGPU-Sim). I downloaded an NVIDIA CUDA sample, namely conjugateGradient. If I cd to it and run
make
it doesn't work:
macair93278:7_CUDALibraries r8t$ cd conjugateGradient/
macair93278:conjugateGradient r8t$ ls
Makefile main.cpp
macair93278:conjugateGradient r8t$ make
Makefile:36: findcudalib.mk: No such file or directory
make: *** No rule to make target `findcudalib.mk'. Stop.
I've changed my path so that running
nvcc -V
doesn't produce an error, but gives me the version. So I think that's right.
Thanks for any help.
-bb
findcudalib.mk is missing because the individual sample you downloaded is not designed to be a complete, standalone sample. It requires a framework of other files, and probably other libraries, that need to be built around it.
To fix this, download the CUDA 4.0 SDK (GPU Computing SDK) from here.
Install that package. Once you have installed it, and assuming your CUDA install is otherwise intact, you should be able to change into the top-level samples directory and issue make. This will build all the samples. For convenience, you may wish to issue make -k instead, which keeps building past any individual samples that fail.
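Once the GPU Computing SDK is installed, the build step looks roughly like this (the path below is the default location the 4.0 installer suggests and may differ on your machine):
cd ~/NVIDIA_GPU_Computing_SDK/C
make -k          # -k keeps going, so one broken sample doesn't stop the whole build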

Does nvcc always work as gcc when there are no CUDA elements in the code?

I am on Ubuntu 12.04 LTS and have installed CUDA 5.5. I understand that without any CUDA/GPGPU elements in the code, nvcc behaves as a C/C++ compiler, much like gcc. However, is there any exception to this rule? If not, can I use nvcc as gcc for non-CUDA C/C++ code?
No, nvcc doesn't behave like a C/C++ compiler for host code. What it does is the following:
1) separate the device code from the host code into two separate files
2) compile the device code (with the CUDA toolchain: cudafe, ptxas, etc.)
3) invoke gcc for the host code
If no device code exists, nothing is done in steps 1) and 2). So nvcc is not really a compiler; it is a compiler driver that invokes the right compiler for each part in the right order. To answer your question: if you use nvcc to compile host code only, you are still using gcc.
One exception: it doesn't accept options to suppress warnings (-W*) directly; those have to be forwarded to the host compiler with -Xcompiler.
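A quick way to see this driver behavior for yourself (hello.cu and hello.c are hypothetical file names):
nvcc --dryrun -c hello.cu            # prints the tools nvcc would invoke (cudafe, ptxas, gcc, ...) without running them
nvcc -c hello.c                      # a plain C/C++ file is handed straight to the host compiler
nvcc -Xcompiler -Wall -c hello.cu    # host-compiler warning flags have to be forwarded explicitly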

CUDA and SConscript

I am running a CUDA project, but somehow I am not able to set the -arch=sm_20 flag in the
SConscript file, which was written by someone else. I need to use printf in a kernel for debugging, and I have little experience with SConscript/Python.
The specifics depend on the way you have SCons set up to work with CUDA. I use these scripts: http://github.com/BryanCatanzaro/cuda-scons
With this setup, all you need to do is invoke SCons with your preferred architecture:
scons arch=sm_20
nvcc will then be invoked with the -arch=sm_20 flag.
Details of your setup may be different, but if you look through your SCons script, you should see how to change this flag.
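If your project's SCons/CUDA integration differs from the scripts linked above, a dry run is a quick way to see exactly how nvcc is currently being invoked, and therefore which construction variable in the SConscript needs the extra flag:
scons -n | grep nvcc     # print the build commands without executing them, then look at the nvcc lines
Once you know which variable feeds nvcc its flags, adding -arch=sm_20 there is usually a one-line change.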