Visual Studio + Nsight : __syncthreads() undefined [duplicate] - cuda

At the moment, Visual Studio already recognizes key CUDA C/C++ functions such as cudaMalloc, cudaFree, cudaEventCreate, etc.
It also recognizes certain types like dim3 and cudaEvent_t.
However, it doesn't recognize other functions and types such as the texture template, the __syncthreads function, or the atomicCAS function.
Everything compiles just fine, but I'm tired of seeing red underlining all over the place, and I want to see the example parameters displayed when you type in any recognizable function.
How do I get VS to catch these functions?

You could create a dummy #include file of the following form:
#pragma once
#ifdef __INTELLISENSE__
void __syncthreads();
...
#endif
This should hide the fake prototypes from the CUDA and Visual C++ compilers, but still make them visible to IntelliSense.
Source for __INTELLISENSE__ macro: http://blogs.msdn.com/b/vcblog/archive/2011/03/29/10146895.aspx
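As a slightly fuller sketch of such a dummy header, the fragment below adds stand-in prototypes for atomicCAS next to __syncthreads. The atomicCAS overloads follow the signatures listed in the CUDA C Programming Guide, minus the __device__ qualifier. The file name and the helper seen_by_intellisense are hypothetical, added only to show that a real compiler skips the guarded block. In an actual header you would keep the #pragma once line shown above, and whether these stubs clash with declarations IntelliSense already picks up from the CUDA headers may depend on your toolkit version.

```cpp
// Sketch of an IntelliSense-only stub header (assumed file name: cuda_intellisense.h).
#ifdef __INTELLISENSE__
// Visible only to the IntelliSense parser; nvcc and cl.exe skip this block.
void __syncthreads();
int atomicCAS(int* address, int compare, int val);
unsigned int atomicCAS(unsigned int* address, unsigned int compare, unsigned int val);
unsigned long long atomicCAS(unsigned long long* address,
                             unsigned long long compare,
                             unsigned long long val);
#endif

// Hypothetical helper: reports whether the guarded block above was parsed.
constexpr bool seen_by_intellisense() {
#ifdef __INTELLISENSE__
    return true;
#else
    return false;
#endif
}
```

Because the prototypes sit behind __INTELLISENSE__, they cannot change what nvcc or the Visual C++ compiler generates; they only give the editor something to resolve.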

You need to add CUDA-specific keywords like __syncthreads to the usertype.dat file for Visual Studio. An example usertype.dat file is included with the NVIDIA CUDA SDK. You also need to make sure that Visual Studio recognizes .cu files as C/C++ files, as described in this post:
Note however that where that post uses $(CUDA_INC_PATH), with recent versions of CUDA you should use $(CUDA_PATH)/include.
Also, I would recommend Visual Assist X -- not free, but worth the money -- to improve IntelliSense. It works well with CUDA if you follow these instructions:
http://www.wholetomato.com/forum/topic.asp?TOPIC_ID=5481
http://forums.nvidia.com/index.php?showtopic=53690

Related

Can libcu++ be included in a regular C++ project?

I have tried using <cuda/std/chrono> and <cuda/std/array> inside my C++ project, not compiled with NVCC. The reason I am not making a CUDA project is that NVCC fails to compile some of my template code.
I am getting several errors about undefined constants such as the ones below:
error C2065: '_LInf': undeclared identifier
error C3615: constexpr function 'cuda::std::__4::__libcpp_numeric_limits<long double,true>::infinity' cannot result in a constant expression
Can libcu++ be used inside C++ projects or only CUDA projects?
I have already tried including and linking to my C++ project the headers and libraries that are automatically added to CUDA projects by Visual Studio.
Converting comments into an answer:
The system requirements page for libcu++ indicates that only NVIDIA's toolchains (so nvcc or the HPC SDK compilers) are supported.
More generally, libcu++ is intended to implement a subset of the C++ standard library in a transparent way, which allows it to be used identically in host and device code without any programmer intervention beyond including a header and respecting its namespace conventions. It stands to reason that this requires the NVIDIA toolchain to implement this magic.
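One way to respect that constraint in code shared between CUDA and plain C++ builds is to pick the namespace at compile time. The sketch below is an assumption-laden illustration, not something from the libcu++ documentation: it relies on nvcc defining __CUDACC__, and the alias name stdx and the function first_squares are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__CUDACC__)
  // Compiled by nvcc: use the libcu++ container.
  #include <cuda/std/array>
  namespace stdx = cuda::std;   // hypothetical alias name
#else
  // Compiled by an ordinary host compiler: fall back to the standard library.
  #include <array>
  namespace stdx = std;
#endif

// Hypothetical host function written once against the alias.
inline stdx::array<int, 4> first_squares() {
    stdx::array<int, 4> a{};
    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] = static_cast<int>(i * i);
    }
    return a;
}
```

With this arrangement the translation units that MSVC compiles on its own never include a <cuda/std/...> header, which sidesteps errors like the ones quoted in the question.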

PyCuda C++ kernel "error: this declaration may not have extern "C" linkage"

I tried using std::tuple in my kernel code, but received many "error: this declaration may not have extern "C" linkage" errors that pointed to utility and tuple.
The compiler complains about the include. The following reproduces it for me:
from pycuda.compiler import SourceModule
mod = SourceModule("""#include <tuple>""")
Do I need to do something special in my kernel code or in my Python code to specify I want to use the C++ compiler?
Cuda version: 11.8
PyCuda version: 2022.2.1
Do I need to do something special in my kernel code or in my Python code to specify I want to use the C++ compiler?
To be clear, you are using the C++ compiler. But PyCUDA automagically wraps the code you pass into a SourceModule instance in extern "C" unless you explicitly tell it not to:
Unless no_extern_c is True, the given source code is wrapped in extern "C" { … } to prevent C++ name mangling.
The underlying reason from a C++ perspective is that templated instances of types and functions can’t resolve with C linkage, thus the error.
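The linkage rule can be seen in ordinary host C++, without PyCUDA or a GPU. The following is a minimal sketch with hypothetical function names (sum_pair, sum_pair_int): the template and the <tuple> include stay outside any extern "C" block, and only a thin non-template entry point is given C linkage so that its symbol name is not mangled.

```cpp
#include <cassert>
#include <tuple>  // included outside any extern "C" block, so its templates keep C++ linkage

// A template must have C++ linkage; moving it inside extern "C" { ... }
// triggers the kind of error quoted in the question.
template <typename T>
T sum_pair(const std::tuple<T, T>& p) {
    return std::get<0>(p) + std::get<1>(p);
}

// Only the non-template entry point gets C linkage.
extern "C" int sum_pair_int(int a, int b) {
    return sum_pair(std::make_tuple(a, b));
}
```

This is only an analogy for the wrapping that SourceModule applies: the same layout (headers and templates outside, entry points marked extern "C" by hand) is what you would aim for once no_extern_c is set to True.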
However, even after you fix that problem, prepare to be disappointed. CUDA supports a lot of C++ language features, but it doesn't support the standard library and you can't use std::tuple within kernel code. NVIDIA does provide their own (very limited) reimplementation of the C++ standard library, and it does have a basic tuple type. That might work for you.

CLion code completion for CUDA missing some functions

I'm using CLion with the CUDA toolkit on Windows 11 with the MSVC compiler. It works and compiles fine, but the code completion is missing a lot of items like cudaMalloc and cudaFree. It does include some items though, like cudaMemAttachGlobal.
I think it's because I haven't included any headers, but nvcc doesn't require explicit inclusion of headers, and the default CMake settings in CLion compile and run my .cu files just fine.
Is there anything extra I'm supposed to do to get CLion code completion to look at the entire available API from nvcc?
EDIT: The above description was with the CUDA toolkit on Windows with MSVC. Now I tried it with the CUDA toolkit installed per the Nvidia installation instructions on Fedora 35, and the symptoms are exactly the same. The completion items are only macros, no actual functions. I looked through cuda_runtime_api.h and the signature for cudaFree is
extern __host__ __cudart_builtin__ cudaError_t CUDARTAPI cudaFree(void *devPtr);
Update:
It seems that if I press Ctrl+Space, the code completion menu works perfectly and is able to complete both cudaMalloc and cudaFree, and anything else. If I don't press Ctrl+Space and just let the menu appear on its own, it still shows up but only has macros in it.
Original:
This seems to be a bug in either CLion or whatever subroutine (maybe CMake) it calls to get the code completions from header files; I tried this on Fedora Linux and observed the exact same behavior.
In contrast, VSCode has an Nsight plugin that's developed by Nvidia, and that is able to code-complete functions like cudaMallocManaged and cudaFree with no problems.

calling cuda c kernel from fortran 90

I am planning to call a typical matrix multiply CUDA C kernel from a Fortran program. I am referring to the following link: http://www-irma.u-strasbg.fr/irmawiki/index.php/Call_CUDA_from_Fortran . I would be glad if any resources are available on this. I intend to avoid PGI CUDA Fortran as I do not have the compiler. In the link above I cannot make out what the CUDA.F90 file should be. I assume the last code given in the link is that of main.F90. Kindly help.
Perhaps you need to re-read the very first line of that page you linked to. Those instructions are relying on a set of external ISO C bindings for the CUDA API. That is where the CUDA.F90 file you are asking about comes from. You will need to download and build the FortCUDA bindings to use the instructions on that wiki page.
Edited to add that given your last question was about compilation in Nsight Visual Studio Edition, it would seem that you are running on a Windows platform. You should know that you can't use gcc to build CUDA applications on Windows platforms. The supplied CUDA libraries will only work with either the Microsoft toolchain or (possibly) Intel's compilers in certain cases.

Does CUDA use an interpreter or a compiler?

This is a bit of silly question, but I'm wondering if CUDA uses an interpreter or a compiler?
I'm wondering because I'm not quite sure how CUDA manages to get source code to run on two cards with different compute capabilities.
From Wikipedia:
Programmers use 'C for CUDA' (C with Nvidia extensions and certain restrictions), compiled through a PathScale Open64 C compiler.
So, your answer is: it uses a compiler.
And to touch on the reason it can run on multiple cards (source):
CUDA C/C++ provides an abstraction, it's a means for you to express how you want your program to execute. The compiler generates PTX code which is also not hardware specific. At runtime the PTX is compiled for a specific target GPU - this is the responsibility of the driver which is updated every time a new GPU is released.
The official documents, the CUDA C Programming Guide and The CUDA Compiler Driver (NVCC), explain all the details of the compilation process.
From the second document:
nvcc mimics the behavior of the GNU compiler gcc: it accepts a range
of conventional compiler options, such as for defining macros and
include/library paths, and for steering the compilation process.
This is not limited to CUDA: shaders in DirectX or OpenGL are also compiled to some kind of bytecode and converted to native code by the underlying driver.