How to use clangd text-highlighting with CUDA on header files

I'm running VSCodium with the clangd extension, and I'd like to have proper CUDA highlighting (e.g. the __device__ and __host__ keywords are understood). I ran CMake to generate the compile_commands.json file, and it includes CUDA-specific flags (e.g. --cuda-gpu-arch=sm_52, --cuda-path=/usr/local/cuda). However, clangd still gives me squiggly underlines on CUDA-specific keywords. It seems like if clangd is just using the clang compiler to understand the source code, then clangd should work with CUDA (given that clang was able to compile the CUDA code).
So, is there any way to get clangd to work for CUDA? And if so, how do I do it via VSCodium?
Edit: As it turns out, the CUDA code highlighting works on the directly linked files (with a .cu extension), but it does not work on one of the included header files (with a .hpp extension). How do I get the text-highlighting to work on the header file?

Create a config file named ".clangd" in your project directory and specify the location of the CUDA headers. This works for me:
CompileFlags:
  Add:
    - -xc++
    - --cuda-path=/path_to_cuda_installation
    - --cuda-gpu-arch=sm_52
    - -I/path_to_cuda_installation/include
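If compile_commands.json is missing or out of date, it is worth regenerating it first. With CMake, that typically looks like this (a sketch, assuming an out-of-source build directory named build; clangd discovers the file by searching the parent directories of each source file, so symlinking it to the project root is a common convention):
cmake -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
ln -s build/compile_commands.json compile_commands.json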

Related

Can I use STL in a DriverKit driver?

Can I use, for example, std::vector in a DriverKit driver in macOS / Xcode?
DriverKit has some container classes like OSArray:
https://developer.apple.com/documentation/driverkit/osarray?language=objc
If I create a new "DriverKit driver" project and include <vector>, I get build errors. Those errors come from including <cstring>, and they are:
No member named 'strcpy' in the global namespace
No member named 'strcat' in the global namespace
As far as I can tell, you aren't supposed to. The headers you end up pulling in there aren't from the DriverKit SDK, they're the default ones that come with Xcode's compiler toolchain. They assume a normal macOS build environment, not the dext environment. The DriverKit SDK doesn't include C++ stdlib headers, and only contains some stripped-down headers for a subset of the C standard library. Hence the missing strcpy and strcat.
Linking against libc++ also fails, even the version included in the DriverKit SDK. I don't know why there is a version of that library included with the DriverKit SDK, but it's clearly not intended for being used in dexts.
There's presumably nothing in particular stopping you from including some other container library, or even directly including parts of an STL implementation. You may need to manually wire up memory allocation calls though.
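As a rough illustration of that last point, a container that depends on no stdlib headers at all can be quite small. This is only a sketch; FixedVector is a hypothetical name, not a DriverKit or stdlib type:
// Minimal fixed-capacity container with no stdlib dependencies (sketch only).
template <typename T, unsigned Capacity>
class FixedVector {
    T items_[Capacity];
    unsigned count_ = 0;
public:
    bool push_back(const T &value) {
        if (count_ >= Capacity)
            return false;            // full; no dynamic allocation, no growth
        items_[count_++] = value;
        return true;
    }
    T &operator[](unsigned i) { return items_[i]; }
    unsigned size() const { return count_; }
};
Anything that needs to grow dynamically would have to be wired up to whatever allocation calls the dext environment provides, as noted above.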

Is there a list of headers that can be used in a string to compile with NVRTC? [duplicate]

Specifically, my issue is that I have CUDA code that needs <curand_kernel.h> to run. This isn't included by default in NVRTC. Presumably then when creating the program context (i.e. the call to nvrtcCreateProgram), I have to send in the name of the file (curand_kernel.h) and also the source code of curand_kernel.h? I feel like I shouldn't have to do that.
It's hard to tell; I haven't managed to find an example from NVIDIA of someone needing standard CUDA files like this as a source, so I really don't understand what the syntax is. Some issues: curand_kernel.h also has includes... Do I have to do the same for each of these? I am not even sure the NVRTC compiler will run correctly on curand_kernel.h, because there are some language features it doesn't support, aren't there?
Next: if I've sent in the source code of a header file to nvrtcCreateProgram, do I still have to #include it in the code to be executed, and will it cause an error if I do so?
A link to example code that does this or something like it would be appreciated much more than a straightforward answer; I really haven't managed to find any.
You have to send the "filename" and the source of each header separately.
When the preprocessor does its thing, it'll use any #include filenames as a key to find the source for the header, based on the collection that you provide.
I suspect that, in this case, the compiler (driver) doesn't have file system access, so you have to give it the source in much the same way that you would for shader includes in OpenGL.
So:
1. Include your header's name when calling nvrtcCreateProgram. The compiler will, internally, generate the equivalent of a std::map<string,string> containing the source of each header indexed by the given name.
2. In your kernel source, use #include "foo.cuh" as usual.
3. The compiler will use foo.cuh as a key into its internal map (created when you called nvrtcCreateProgram) and will retrieve the header source from that collection.
4. Compilation proceeds as normal.
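A minimal sketch of those steps (the header name and both source strings are placeholders; nvrtcCreateProgram takes the header sources and their names as parallel arrays):
#include <nvrtc.h>
#include <stdio.h>

int main() {
    // Source of the header, supplied as a string -- placeholder content.
    const char *headerSrc  = "__device__ float twice(float x) { return 2.0f * x; }\n";
    const char *headerName = "foo.cuh";  // the name used in the #include below

    // Kernel source that includes the header by that name.
    const char *kernelSrc =
        "#include \"foo.cuh\"\n"
        "__global__ void k(float *out) { out[0] = twice(21.0f); }\n";

    nvrtcProgram prog;
    // numHeaders = 1; headers[] and includeNames[] are parallel arrays.
    nvrtcCreateProgram(&prog, kernelSrc, "kernel.cu", 1, &headerSrc, &headerName);

    const char *opts[] = { "--gpu-architecture=compute_52" };
    nvrtcResult res = nvrtcCompileProgram(prog, 1, opts);
    printf("compile: %s\n", nvrtcGetErrorString(res));

    nvrtcDestroyProgram(&prog);
    return 0;
}
Build it by linking against the NVRTC library, e.g. with -lnvrtc.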
One of the reasons that nvrtc provides only a "subset" of features is that the compiler plays in a somewhat sandboxed environment, without necessarily having all of the supporting tools and utilities lying around that you have with offline compilation. So, you have to manually handle a lot of the stuff that the normal nvcc + (gcc | MSVC | clang) combination provides.
A possible, but non-ideal, solution would be to preprocess the file that you need in your IDE, save the result, and then #include that. However, I bet there is a better way to do it. If you just want curand, consider diving into the library and extracting the part you need (blech), or using another GPU-friendly rand implementation. On older CUDA versions, I just generated a big array of random floats on the host, uploaded it to the GPU, and sampled it in the kernels.
This related link may be helpful.
You do not need to load curand_kernel.h yourself and add it to the include "aliases" mechanism.
Instead, you can simply add the CUDA include directory to your (set of) include paths, e.g. by adding --include-path=/usr/local/cuda/include to your NVRTC compiler options.
(I do this in my GPU-kernel-runner test harness, by default, to be on the safe side.)
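In code, that is just another option string passed to nvrtcCompileProgram (the path shown is the usual Linux default; adjust it to your installation):
const char *opts[] = { "--include-path=/usr/local/cuda/include" };
nvrtcCompileProgram(prog, 1, opts);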

CUDA 5.5 - Relocatable device code causes unresolved external symbol

I'm using CUDA 5.5 and my compiler is VC2012. My project consists of 2 .cu files. I need to use dynamic parallelism, so I have enabled the "Generate relocatable device code" option. As soon as this option is enabled I get the following linking error:
error LNK2001: unresolved external symbol __fatbinwrap_54_tmpxft_0000110c_00000000_8_cuda_device_runtime_cpp1_ii_5f6993ef
If I turn that option off the error disappears. The error does not depend on the contents of the files - if I comment out everything in them (#if 0 .... #endif so that they are essentially empty files) I still get the same error.
Update: I installed VC2010, and I still get the exact same error message!
Update 2: I got really tired of this whole thing, so I took a VC2010 dynamic parallelism sample project from the SDK samples (cdpSimplePrint), replaced the files in it, and compiled. The issue is gone. I tried to compare the settings between the two projects, but I couldn't find anything. I don't know what caused it, but frankly, at this stage I no longer care, as long as I have a workaround.
Check that you are linking with the cudadevrt library, see the "Compiling and Linking" section in the CUDA Dynamic Parallelism Programming Guide for more info.
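For reference, a typical command line for building with dynamic parallelism looks something like this (the file names are placeholders; sm_35 is the minimum architecture for dynamic parallelism):
nvcc -arch=sm_35 -rdc=true file1.cu file2.cu -o myApp -lcudadevrt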

How to use the cublas library

I'm a novice to CUDA. I want to use cublas functionality in CUDA programs and would like to know the procedure for using it. My program gives the error "cublas_v2.h file not present" when I try to run it on the GPU.
The cublas documentation is contained here, so you should familiarize yourself with it.
Try running a sample program. There are cublas examples among the CUDA samples that come with the CUDA 5 toolkit, or you can try the one here. Make sure you understand how to build and run that example.
If you're still having trouble with your own program, post a new question here with the specifics. For example, if your program is giving error output, paste in the complete error output that you are getting, and paste in the code from your program that is generating that error output (at least). Preferably paste in your whole program, or a simple version that will reproduce the error.
The error you're getting right now looks like it should be coming from the compiler not from when you run the program. If so you simply need to tell the compiler where to find that header file. If you look at a cuda sample like this one, you can see from the included makefile how to tell the compiler where to look for header files.
There can be multiple reasons why you might be struggling to run code that makes use of the cublas library.
The most important thing is to link your program against cublas with the -lcublas flag.
It should look like: nvcc example.cu -o example -lcublas
(Note that -lcublas is a linker flag, so don't combine it with -c, which skips the link step.)
Secondly, confirm that the cublas library is present on your system.
Since the error says "cublas_v2.h file not present", search for the file, e.g. with find /usr/local/cuda -name cublas_v2.h (whereis only searches binary and man-page paths, so it won't find headers). If the file is not there, you need to install the cublas library, which ships with the CUDA toolkit from NVIDIA's website.
Finally, confirm your CUDA installation path and LD_LIBRARY_PATH.
Your CUDA path should be /usr/local/cuda, and LD_LIBRARY_PATH should include /usr/local/cuda/lib64 or /usr/local/cuda/lib.
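To check the whole setup end to end, a minimal cublas program looks something like this (a sketch using cublasSaxpy; build with nvcc saxpy.cu -o saxpy -lcublas):
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main() {
    const int n = 4;
    float x[n] = {1, 2, 3, 4}, y[n] = {0, 0, 0, 0};
    float *dx, *dy;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, x, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y, n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);                         // initialize the cublas context
    const float alpha = 2.0f;
    cublasSaxpy(handle, n, &alpha, dx, 1, dy, 1);  // y = alpha*x + y

    cudaMemcpy(y, dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", y[0]);                   // expect 2.0

    cublasDestroy(handle);
    cudaFree(dx);
    cudaFree(dy);
    return 0;
}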

Building GPL C program with CUDA module

I am attempting to modify a GPL program written in C. My goal is to replace one method with a CUDA implementation, which means I need to compile with nvcc instead of gcc. I need help building the project - not implementing it (You don't need to know anything about CUDA C to help, I don't think).
This is my first time trying to change a C project of moderate complexity that involves a ./configure script and Makefile. Honestly, this is my first time doing anything in C in a long time, including anything involving gcc or g++, so I'm pretty lost.
I'm not super interested in learning configure and Makefiles - this is more of an experiment. I would like to see if the project implementation goes well before spending time creating a proper build script. (Not unwilling to learn as necessary, just trying to give an idea of the scope).
With that said, what are my options for building this project? I have a myriad of questions...
I tried adding "CC=nvcc" to the configure.in file after AC_PROG_CC. This appeared to work - output from running configure and make showed nvcc as the compiler. However, make failed to compile the source file with the CUDA kernel, not recognizing the CUDA-specific syntax. I don't know why; I was hoping this would just work.
Is it possible to compile a source file with nvcc, and then include it at the linking step in the make process for the main program? If so, how? (This question might not make sense - I'm really rusty at this)
What's the correct way to do this?
Is there a quick and dirty way I could use for testing purposes?
Is there some secret tool everyone uses to set up and understand these configure scripts and Makefiles? This is even worse than the Apache Ant scripts I'm used to. (Yeah, I'm out of my realm.)
You don't need to compile everything with nvcc. Your guess that you can compile just your CUDA code with nvcc and leave everything else to your host compiler (except linking) is correct. Here's the approach I would use to start.
Add one new header (e.g. myCudaImplementation.h) and one new source file (with a .cu extension, e.g. myCudaImplementation.cu). The source file contains your kernel implementation as well as a (host) C wrapper function that invokes the kernel with the appropriate execution configuration (aka <<<>>>) and arguments. The header file contains the prototype for the C wrapper function. Let's call that wrapper function runCudaImplementation().
I would also provide another host C function in the source file (with prototype in the header) that queries and configures the GPU devices present and returns true if it is successful, false if not. Let's call this function configureCudaDevice().
Now in your original C code, where you would normally call your CPU implementation you can do this.
// must include your new header
#include "myCudaImplementation.h"
// at app initialization
// store this variable somewhere you can access it later
bool deviceConfigured = configureCudaDevice();
...
// then later, at run time
if (deviceConfigured)
runCudaImplementation();
else
runCpuImplementation(); // run the original code
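For completeness, the .cu side might look roughly like this (a sketch; the kernel body, its arguments, and the device-selection policy are placeholders):
// myCudaImplementation.cu -- compiled with nvcc (sketch only)
#include <cuda_runtime.h>
#include "myCudaImplementation.h"

__global__ void myKernel(/* args */) {
    // ... device implementation of the replaced method ...
}

// extern "C" so the C host program can link against these functions
extern "C" bool configureCudaDevice() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0)
        return false;               // no usable CUDA device
    return cudaSetDevice(0) == cudaSuccess;
}

extern "C" void runCudaImplementation() {
    myKernel<<<1, 256>>>();         // execution configuration is a placeholder
    cudaDeviceSynchronize();
}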
Now, since you put all your CUDA code in a new .cu file, you only have to compile that file with nvcc. Everything else stays the same, except that you have to link in the object file that nvcc outputs. e.g.
nvcc -c -o myCudaImplementation.o myCudaImplementation.cu <other necessary arguments>
Then add myCudaImplementation.o to your link line, along with the CUDA runtime library (something like:)
g++ -o myApp myCudaImplementation.o -L/usr/local/cuda/lib64 -lcudart
Now, if you have a complex app to work with that uses configure and has a complex makefile already, it may be more involved than the above, but this is the general approach. Bottom line is you don't want to compile all of your source files with nvcc, just the .cu ones. Use your host compiler for everything else.
I'm not an expert with configure, so I can't really help there. You may be able to run configure to generate a makefile, and then edit that makefile -- it won't be a general solution, but it will get you started.
Note that in some cases you may also need to separate compilation of your .cu files from linking them. In this case you need to use NVCC's separate compilation and linking functionality, for which this blog post might be helpful.
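The command sequence for separate compilation looks roughly like this (the architecture flag and file names are placeholders):
nvcc -arch=sm_35 -dc myCudaImplementation.cu -o myCudaImplementation.o
nvcc -arch=sm_35 -dlink myCudaImplementation.o -o deviceLink.o
g++ -o myApp myCudaImplementation.o deviceLink.o -L/usr/local/cuda/lib64 -lcudart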