I am running a CUDA project, but I am not able to set the flag -arch=sm_20 in the
SConscript file, which was written by someone else. I need to use printf in a kernel for debugging, and I have little experience with SCons scripts or Python.
The specifics depend on the way you have SCons set up to work with CUDA. I use these scripts: http://github.com/BryanCatanzaro/cuda-scons
With this setup, all you need to do is invoke SCons with your preferred architecture:
scons arch=sm_20
And nvcc will be invoked with the -arch=sm_20 flag.
Details of your setup may be different, but if you look through your SCons script, you should see how to change this flag.
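If your SConscript does not already read an arch argument, something along these lines may help. This is only a sketch: nvcc_arch_flags is a hypothetical helper, and NVCCFLAGS is an assumed construction-variable name that depends on which CUDA tool your SCons setup uses. ARGUMENTS is the dictionary SCons fills from key=value pairs on the command line.

```python
import re


def nvcc_arch_flags(arguments, default_arch="sm_20"):
    """Build the nvcc -arch flag from SCons-style key=value arguments.

    `arguments` is any dict-like object, e.g. SCons' ARGUMENTS.
    This helper is hypothetical and not part of SCons or cuda-scons.
    """
    arch = arguments.get("arch", default_arch)
    # Accept names such as sm_20, sm_35 or compute_20; reject anything else.
    if not re.fullmatch(r"(sm|compute)_\d+[a-z]?", arch):
        raise ValueError("unexpected architecture name: %r" % arch)
    return ["-arch=" + arch]


# Inside an SConscript this might be used as follows
# (NVCCFLAGS is an assumption; check the name your CUDA tool defines):
#   env.Append(NVCCFLAGS=nvcc_arch_flags(ARGUMENTS))
```

With that in place, running scons arch=sm_20 would pass -arch=sm_20 through, and omitting the argument falls back to the default.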
I am working on an Ultra96 board (SoC: Xilinx Zynq UltraScale+ MPSoC ZU3EG A484), running:
DISTRIB_ID=petalinux
DISTRIB_RELEASE=2020.1
DISTRIB_CODENAME=zeus
DISTRIB_DESCRIPTION="PetaLinux 2020.1"
My kernel version is 5.4.0-xilinx-v2020.1.
I am not familiar with PetaLinux, so I need help recompiling the Xilinx kernel to enable hugepages.
Is there a guide that explains step by step how to recompile the kernel?
Thank you.
Run
petalinux-config -c kernel
which will take you to the kernel menuconfig, which you can then use to set hugepages.
Once you are happy with the changes and want to save them, you can run:
petalinux-build -c kernel -x finish
which will save the mods to project-spec/meta-user/recipes-kernel/devtool-fragment.cfg
which you can commit to git to keep the configuration changes.
See Configuring Project Components in https://www.xilinx.com/support/documentation/sw_manuals/xilinx2020_1/ug1144-petalinux-tools-reference-guide.pdf for more details
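After booting the rebuilt kernel, one way to check the result is to look for the hugepage fields in /proc/meminfo. The sketch below assumes the usual HugePages_* and Hugepagesize field names; hugepage_info and read_hugepage_info are hypothetical helper names, not part of PetaLinux.

```python
def hugepage_info(meminfo_text):
    """Return the hugepage-related fields found in /proc/meminfo text."""
    info = {}
    for line in meminfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Keep only the hugepage counters and the hugepage size line.
        if sep and (key.startswith("HugePages_") or key == "Hugepagesize"):
            info[key] = value.strip()
    return info


def read_hugepage_info(path="/proc/meminfo"):
    """Read the given meminfo file and extract its hugepage fields."""
    with open(path) as f:
        return hugepage_info(f.read())
```

Calling read_hugepage_info() on the board should return an empty dict if the kernel was built without hugepage support, and a dict with entries such as HugePages_Total otherwise.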
CUDA kernels are launched with this syntax (at least in the runtime API)
mykernel<<<blocks, threads, shared_mem, stream>>>(args);
Is this implemented as a macro or is it special syntax that nvcc removes before handing host code off to gcc?
The nvcc preprocessing system eventually converts it to a sequence of CUDA runtime library calls before handing the code off to the host code compiler for compilation. The exact sequence of calls may change depending on CUDA version.
You can inspect files using the --keep option to nvcc (and --verbose may help with understanding as well), and you can also see a trace of API calls issued for a kernel call using one of the profilers e.g. nvprof --print-api-trace ...
---EDIT---
Just to make this answer more concise, nvcc directly modifies the host code to replace the <<<...>>> syntax before passing it off to the host compiler (https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#offline-compilation)
I am on Ubuntu 12.04 LTS and have installed CUDA 5.5. I understand that without any CUDA/GPGPU elements in the code, nvcc behaves as a C/C++ compiler, much like gcc. Is there any exception to this rule? If not, can I use nvcc in place of gcc for non-CUDA C/C++ code?
No, nvcc doesn't behave like a C/C++ compiler for host code. What it does is the following:
1) separate device from host code into two separate files
2) compile device code (with nvcc, cudafe, ptxas)
3) invoke gcc for host code
If no device code exists, nothing is done in steps 1) and 2). So nvcc is not actually a compiler; it is a compiler driver that invokes the right compilers for each part in the right order. To answer your question: if you use nvcc to compile host code only, you are still using gcc.
nvcc doesn't accept gcc's options to suppress warnings (-W*).
I heard that it is better to compile CUDA kernels separately from host code. How do I do that with CMake? I am an absolute beginner with CMake.
Thanks
As far as I know, it is not possible with only one cmake command unless you resort to some hacks. You could write two CMakeLists.txt files, one for the CUDA code and one for the host code. In the file for the host code, you can add the CUDA code as a library. After that, you can write a shell script that runs the cmake command for each CMakeLists.txt.
I usually do this with make. There I have two targets, one for the CUDA code and one for the host code, each compiled into an object file. A third target depends on the other two and links the object files into an executable.
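To make the three steps concrete, here is a rough sketch of the same flow written as a small Python script rather than a Makefile. The file names (kernel.cu, main.cpp, app), the use of g++ as the host compiler, and the /usr/local/cuda/lib64 library path are all assumptions; adjust them to your project and installation.

```python
import os
import subprocess


def build_commands(cu_src="kernel.cu", cpp_src="main.cpp", exe="app",
                   cuda_lib_dir="/usr/local/cuda/lib64"):
    """Return the compile and link commands for a split CUDA/host build."""
    cu_obj = os.path.splitext(cu_src)[0] + ".o"
    cpp_obj = os.path.splitext(cpp_src)[0] + ".o"
    return [
        # Target 1: CUDA code compiled to an object file with nvcc.
        ["nvcc", "-c", cu_src, "-o", cu_obj],
        # Target 2: host code compiled to an object file with g++.
        ["g++", "-c", cpp_src, "-o", cpp_obj],
        # Target 3: link both objects against the CUDA runtime library.
        ["g++", cu_obj, cpp_obj, "-o", exe, "-L" + cuda_lib_dir, "-lcudart"],
    ]


def run_build(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling run_build(build_commands()) performs the build. A Makefile expresses the same thing with three targets and has the advantage of rebuilding only what changed.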
With a very simple hello world program, the breakpoint is not working.
I can't quote the exact message since it's not in English,
but it's something like 'the symbols of this document are not loaded'.
There is no CUDA code, just a single printf line in the main function.
The working environment is Windows 7 64-bit, VC++ 2008 SP1, CUDA Toolkit 3.1 64-bit.
Please give me some explanation on this. :)
So this is just a host application (i.e. nothing to do with CUDA) doing printf that you can't debug? Have you selected "Debug" as the configuration instead of "Release"?
Are you trying to use a Visual Studio breakpoint to stop in your CUDA device code (.cu)? If that is the case, then I'm pretty sure that you can't do that. NVIDIA has released Parallel NSIGHT, which should allow you to do debugging of CUDA device code (.cu), though I don't have much experience with it myself.
Did you compile with -g -G options as noted in the documentation?
NVCC, the NVIDIA CUDA compiler driver, provides a mechanism for generating the debugging information necessary for CUDA-GDB to work properly. The -g -G option pair must be passed to NVCC when an application is compiled for ease of debugging with CUDA-GDB; for example,
nvcc -g -G foo.cu -o foo
here: https://docs.nvidia.com/cuda/cuda-gdb/index.html