CLion code completion for CUDA missing some functions - cuda

I'm using CLion with Cuda toolkit on Windows 11 with MSVC compiler. It works and compiles fine, but the code completion is missing a lot of items like cudaMalloc and cudaFree. It does include some items though, like CudaMemAttachGlobal, see screenshot below.
I think it's because I haven't included any headers, but nvcc doesn't require explicit inclusion of headers, and the default CMake settings in CLion compiles and runs my .cu files just fine.
Is there anything extra I'm supposed to do to get CLion code completion to look at the entire available API from nvcc?
EDIT: The above description was with cuda toolkit on windows with MSVC. Now I tried it with cuda toolkit from the Nvidia installations instructions on Fedora 35, and the symptoms are exactly the same. The completion items are only macros, no actual functions. I looked through cuda_runtime_api.h and the signature for cudaFree is
extern __host__ __cudart_builtin__ cudaError_t CUDARTAPI cudaFree(void *devPtr);

Update:
It seems if I press Ctrl+Space, then the code completion menu works perfectly, and is able to complete both CudaMalloc and CudaFree, and anything else. If I don't press Ctrl+Space and just let it show the menu, it still shows the menu but only has macros in it.
Without ctrl+space:
With ctrl+space:
Original:
This seems to be a bug in either CLion or whatever subroutine (maybe CMake) it calls to get the code completions from header files; I tried this on Fedora Linux and observed the exact same behavior.
In contrast, VSCode has an NSight plugin that's developed by Nvidia, and that is able to code-complete functions like cudaMallocManaged and cudaFree with no problems.

Related

Can not profile a cuda code with nvprof when using CUPTI functions inside

I'm doing a simple experiment. Everyone may know about callback_metric sample code of CUPTI (located in CUPTI folder: /usr/local/cuda/extras/CUPTI/sample/callback_metric). It contains only a simple code for reading a metric when running a vectorAdd kernel. Everything works when I compile and run the code.
But when I run this code under nvprof command (nvprof ./callback_metric), I get an error message as:
Error: incompatible CUDA driver version
both nvprof and other CUPTI-based codes work fine separately.
The profilers are not intended to be used in this way with applications that make use of CUPTI.
This is documented in the profiler documentation:
Here are a couple of reasons why Visual Profiler may fail to gather metric or event information.
More than one tool is trying to access the GPU. To fix this issue please make sure only one tool is using the GPU at any given point. Tools include the CUDA command line profiler, Parallel NSight Analysis Tools and Graphics Tools, and applications that use either CUPTI or PerfKit API (NVPM) to read event values.

let Nsight start debugging after certain kernel function is executed

My CUDA program have too many kernel functions and if I open the CUDA debugging mode, I'll have to wait for an whole hour after the breakpoint in certain kernel function is triggered.
Is there any way for Nsight to start debugging after certain kernel functions, or only debug the certain kernel function?
I'm using Nsight with VS2012
In theory you can follow the instructions in the Nsight help file (either the online help or local help. At the time of writing the page is here).
In short:
In the Nsight Monitor options, CUDA » Use this Monitor for CUDA attach should be True.
Before starting your application, set an environment variable called NSIGHT_CUDA_DEBUGGER to 1.
Then in your CUDA kernel, you can add a breakpoint like this:
asm("brkpt;");
This will work similar to the __debugbreak() intrinsic or int 3 assembly instruction in host code. When hit you will get a dialog prompting you to attach the CUDA debugger.
In practice, at least for me it Just Doesn't Work™. Maybe you'll have more luck.

Several questions about CUDA and cuPrintf

I can compile successfully my code using cuPrintf by nvcc, but cannot compile it in Visual Studio 2012 environment. It says that "volatile char *" cannot be changed to "const void *" in "cudaMemcpyToSymbol" function.
cuPrintf seems doesn't work, there's no cuPrintf function executed in kernel code.
How to make nvcc export pdb file?
Is there any other convenient way to debug in kernel function? I have only one laptop.
1st , cuPrinft is deprecated (As far as I know it has never been released) you can print data from kernel using print command, but this is a very not recommended way of debugging your kernels.
2nd, You are compiling using CUDA nvcc compiler, there is no such thing pdb file in CUDA, Albeit watch the 'g' and 'G' flags, those may dramatically increase your running time.
3rd,
The best way to debug kernels is using visual Nsight

Visual Studio + Nsight : __syncthreads() undefined [duplicate]

At the moment CUDA already recognizes a key CUDA C/C++ function such as cudaMalloc, cudaFree, cudaEventCreate, etc.
It also recognizes certain types like dim3 and cudaEvent_t.
However, it doesn't recognize other functions and types such as the texture template, the __syncthreads functions, or the atomicCAS function.
Everything compiles just fine, but I'm tired of seeing red underlinings all over the place and I want to the see the example parameters displayed when you type in any recognizable function.
How do I get VS to catch these functions?
You could create a dummy #include file of the following form:
#pragma once
#ifdef __INTELLISENSE__
void __syncthreads();
...
#endif
This should hide the fake prototypes from the CUDA and Visual C++ compilers, but still make them visible to IntelliSense.
Source for __INTELLISENSE__ macro: http://blogs.msdn.com/b/vcblog/archive/2011/03/29/10146895.aspx
You need to add CUDA-specific keywords like __syncthreads to the usertype.dat file for visual studio. An example usertype.dat file is included with the NVIDIA CUDA SDK. You also need to make sure that visual studio recognizes .cu files as c/c++ files as described in this post:
Note however that where that post uses $(CUDA_INC_PATH), with recent versions of CUDA you should use $(CUDA_PATH)/include.
Also, I would recommend Visual Assist X -- not free, but worth the money -- to improve intellisense. It works well with CUDA if you follow these instructions:
http://www.wholetomato.com/forum/topic.asp?TOPIC_ID=5481
http://forums.nvidia.com/index.php?showtopic=53690

cuda with mingw - updated

We have been developing our code in linux, but would like to compile a windows executable. The old non-gpu version compiles just fine with mingw in windows, so I was hoping I'd be able to do the same with the CUDA version.
The strategy is to compile kernel code with nvcc in visual studio, and the rest with gcc in mingw.
So far, we easily compiled the .cu file (with the kernel and kernel launches) in visual studio. However, we still can't compile the c code in mingw. The c code contains cuda api calls such as cudaMalloc and cuda types such as cudaEvent_t, so we must include cuda.h and cuda_runtime.h. However, gcc gives warnings and errors for these headers, for example:
../include/host_defines.h:57:0: warning: "__cdecl" redefined
and
../include/vector_functions.h:127:14: error: 'short_4' has no member named 'x'
Any ideas on how we can include these headers and compile the c portion of the code?
If you are really desperate there might be a way. The nvcc is really just a frontend for a bunch of compilers. It invokes g++ a lot to strip comments, separate device and host code, handle name mangling, link stuff back together, etc. (use --verbose) to get the details.
My idea is as follows: You should be able to compile the host code with mingw while compiling the device code to a fatbin on a linux machine (as I guess the device binary is host-machine independent). Afterwards link both parts of the code back together with mingw or use the driver API to load the fatbin dynamically. Disclaimer: Did not test!
As far as I know, it is impossible to use CUDA without MSVC. So, you need MSVC to make nvcc work, and you can compile CPU code with mingw and link everything together.
According to http://forums.nvidia.com/index.php?showtopic=30743
"There are no current plans to support mingw."
You might want to take a look at how the cycles renderer handles this, look at https://developer.blender.org/diffusion/B/browse/master/extern/cuew/ and
https://developer.blender.org/diffusion/B/browse/master/intern/cycles/device/device_cuda.cpp
I know it's not an automagic trick but it might help you get started.