Calling a CUDA C kernel from Fortran 90

I am planning to call a typical matrix-multiply CUDA C kernel from a Fortran program. I am referring to the following link: http://www-irma.u-strasbg.fr/irmawiki/index.php/Call_CUDA_from_Fortran . I would be glad if any resources are available on this. I intend to avoid PGI CUDA Fortran as I do not have that compiler. In the link above, I cannot make out what the CUDA.F90 file should be. I assume the last code given in the link is that of main.F90. Kindly help.

Perhaps you need to re-read the very first line of the page you linked to. Those instructions rely on a set of external ISO C bindings for the CUDA API; that is where the CUDA.F90 file you are asking about comes from. You will need to download and build the FortCUDA bindings to use the instructions on that wiki page.
Edited to add: given that your last question was about compilation in Nsight Visual Studio Edition, it would seem that you are running on a Windows platform. You should know that you can't use gcc to build CUDA applications on Windows. The supplied CUDA libraries will only work with the Microsoft toolchain or (possibly) Intel's compilers in certain cases.
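For reference, the C side of such a binding is just an unmangled wrapper around the kernel launch. Below is a minimal sketch (not the actual FortCUDA code; the wrapper name mmul_c and the launch configuration are illustrative) of a Fortran-callable matrix multiply in CUDA C:

// mmul.cu -- CUDA C side of a Fortran-callable matrix multiply (illustrative sketch).
// Compile with: nvcc -c mmul.cu
// extern "C" keeps the symbol unmangled, so a Fortran interface declared with
// ISO_C_BINDING, e.g. bind(C, name="mmul_c") with n passed by value, can call it.
#include <cuda_runtime.h>

__global__ void mmul_kernel(const float *a, const float *b, float *c, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k)
            sum += a[row * n + k] * b[k * n + col];
        c[row * n + col] = sum;
    }
}

extern "C" void mmul_c(const float *a, const float *b, float *c, int n)
{
    size_t bytes = (size_t)n * n * sizeof(float);
    float *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    mmul_kernel<<<grid, block>>>(da, db, dc, n);

    cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
}

// Note: Fortran stores arrays column-major while this kernel indexes row-major,
// so the result comes back transposed unless you account for that on one side.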

Related

Cannot profile CUDA code with nvprof when using CUPTI functions inside

I'm doing a simple experiment. You may know the callback_metric sample code from CUPTI (located in the CUPTI folder: /usr/local/cuda/extras/CUPTI/sample/callback_metric). It contains simple code for reading a metric while running a vectorAdd kernel. Everything works when I compile and run the code.
But when I run this code under the nvprof command (nvprof ./callback_metric), I get this error message:
Error: incompatible CUDA driver version
Both nvprof and other CUPTI-based programs work fine separately.
The profilers are not intended to be used in this way with applications that make use of CUPTI.
This is documented in the profiler documentation:
Here are a couple of reasons why Visual Profiler may fail to gather metric or event information.
More than one tool is trying to access the GPU. To fix this issue please make sure only one tool is using the GPU at any given point. Tools include the CUDA command line profiler, Parallel NSight Analysis Tools and Graphics Tools, and applications that use either CUPTI or PerfKit API (NVPM) to read event values.

Can I profile an OpenACC kernel at the C source-code level?

I'm trying to speed up my code with OpenACC using the PGI 15.7 compiler.
I want to profile my code at the C source level.
I'm using the 'nvvp' profiler from CUDA 7.0.
When I run nvvp, I can use the 'analysis tab' and can see which latencies are slowing my code down (data dependency, conditional branch, bandwidth, etc.).
But I couldn't get line-based analysis, only 'kernel'-level analysis
(e.g. the main_300_gpu kernel took 10 s).
So I have trouble knowing where I have to fix the code.
Is there any way to profile my code at the source level?
I'm using
PGI 15.7 (using pgcc)
CUDA 7.0
NVIDIA GTX 960
Ubuntu 14.04 LTS x86_64
[my nvvp reporting screenshots]
You can also try adding the flag "-ta=tesla:lineinfo" to have the compiler add source-code association for the profiler (it's the same flag as nvcc --lineinfo). Though as Bob points out, the code may have been heavily transformed, so the line information may not directly correspond back to your original source.
At the current time (and on CUDA 7.5 or higher, with a cc5.2 or higher GPU), the nvvp profiler can associate various kinds of sampled execution activity with CUDA C/C++ lines of source code.
However, at the present time, this capability does not extend to OpenACC C/C++ (or Fortran) lines of source code.
It should still be possible to associate the activity with the disassembly, however, and it may be possible to associate it with the intermediate C source files produced by the PGI nollvm option. Neither of these will bear much resemblance to your OpenACC source code, however.
Another option for profiling OpenACC codes using PGI tools is to set the PGI_ACC_TIME=1 environment variable before executing your code. This will enable a lightweight profiler built into the runtime to give some analysis of the execution characteristics of your OpenACC code, in particular those parts associated with accelerator regions. The output is annotated so you can refer back to lines of your source code.
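To make those two suggestions concrete, here is a minimal OpenACC loop in C (saxpy.c is a hypothetical file name, not from the question), with the compile and run steps from the answers above shown as comments:

/* saxpy.c -- minimal OpenACC example for trying the profiling advice above.
 * Compile (PGI):  pgcc -acc -ta=tesla:lineinfo -Minfo=accel saxpy.c -o saxpy
 * Run with the built-in profiler:  PGI_ACC_TIME=1 ./saxpy
 */
#include <stdio.h>

#define N (1 << 20)
static float x[N], y[N];

int main(void)
{
    int i;
    for (i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    /* The PGI_ACC_TIME output reports this region by routine name and
     * source line, which is what lets you refer back to your code. */
    #pragma acc parallel loop copyin(x) copy(y)
    for (i = 0; i < N; ++i)
        y[i] = 2.0f * x[i] + y[i];

    printf("y[0] = %f\n", y[0]);
    return 0;
}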

Does CUDA use an interpreter or a compiler?

This is a bit of a silly question, but I'm wondering: does CUDA use an interpreter or a compiler?
I'm wondering because I'm not quite sure how CUDA manages to get source code to run on two cards with different compute capabilities.
From Wikipedia:
Programmers use 'C for CUDA' (C with Nvidia extensions and certain restrictions), compiled through a PathScale Open64 C compiler.
So, your answer is: it uses a compiler.
And to touch on the reason it can run on multiple cards (source):
CUDA C/C++ provides an abstraction, it's a means for you to express how you want your program to execute. The compiler generates PTX code which is also not hardware specific. At runtime the PTX is compiled for a specific target GPU - this is the responsibility of the driver which is updated every time a new GPU is released.
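You can see that late binding from inside a program: the runtime reports each device's compute capability, which is what the driver targets when it JIT-compiles the embedded PTX. A small sketch using the standard runtime API:

// devquery.cu -- print each device's compute capability.
// Build: nvcc devquery.cu -o devquery
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        // The same executable can embed SASS for some architectures plus
        // PTX that the driver JIT-compiles for newer ones (see nvcc -gencode).
        printf("Device %d: %s, compute capability %d.%d\n",
               d, prop.name, prop.major, prop.minor);
    }
    return 0;
}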
These official documents CUDA C Programming Guide and The CUDA Compiler Driver (NVCC) explain all the details about the compilation process.
From the second document:
nvcc mimics the behavior of the GNU compiler gcc: it accepts a range of conventional compiler options, such as for defining macros and include/library paths, and for steering the compilation process.
This is not just limited to CUDA: shaders in DirectX or OpenGL are also compiled to some kind of byte code and converted to native code by the underlying driver.

CUDA with MinGW - updated

We have been developing our code in Linux, but would like to compile a Windows executable. The old non-GPU version compiles just fine with MinGW in Windows, so I was hoping I'd be able to do the same with the CUDA version.
The strategy is to compile the kernel code with nvcc in Visual Studio, and the rest with gcc in MinGW.
So far, we easily compiled the .cu file (with the kernel and kernel launches) in Visual Studio. However, we still can't compile the C code in MinGW. The C code contains CUDA API calls such as cudaMalloc and CUDA types such as cudaEvent_t, so we must include cuda.h and cuda_runtime.h. However, gcc gives warnings and errors for these headers, for example:
../include/host_defines.h:57:0: warning: "__cdecl" redefined
and
../include/vector_functions.h:127:14: error: 'short_4' has no member named 'x'
Any ideas on how we can include these headers and compile the c portion of the code?
If you are really desperate, there might be a way. nvcc is really just a frontend for a bunch of compilers. It invokes g++ a lot to strip comments, separate device and host code, handle name mangling, link stuff back together, etc. (use --verbose to get the details).
My idea is as follows: You should be able to compile the host code with mingw while compiling the device code to a fatbin on a linux machine (as I guess the device binary is host-machine independent). Afterwards link both parts of the code back together with mingw or use the driver API to load the fatbin dynamically. Disclaimer: Did not test!
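If you try the driver-API variant of that idea, the loading side would look roughly like this (an untested sketch, in the same spirit as the disclaimer above; kernels.fatbin and myKernel are placeholder names):

// host.c -- load a device binary at runtime with the CUDA driver API, so this
// file can be built by a compiler nvcc does not support (e.g. mingw gcc).
// Link against the driver library (-lcuda / cuda.lib), not cudart.
#include <stdio.h>
#include <cuda.h>

int main(void)
{
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CUfunction fn;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    // kernels.fatbin is produced separately, e.g. with nvcc --fatbin kernels.cu
    if (cuModuleLoad(&mod, "kernels.fatbin") != CUDA_SUCCESS) {
        fprintf(stderr, "failed to load module\n");
        return 1;
    }
    cuModuleGetFunction(&fn, mod, "myKernel");

    // myKernel is assumed to take no parameters here; real kernels pass
    // their arguments through the kernelParams array.
    cuLaunchKernel(fn, 1, 1, 1,  /* grid dims  */
                       1, 1, 1,  /* block dims */
                       0, NULL, NULL, NULL);
    cuCtxSynchronize();

    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    return 0;
}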
As far as I know, it is impossible to use CUDA on Windows without MSVC. So you need MSVC to make nvcc work, and you can compile the CPU code with MinGW and link everything together.
According to http://forums.nvidia.com/index.php?showtopic=30743
"There are no current plans to support mingw."
You might want to take a look at how the cycles renderer handles this, look at https://developer.blender.org/diffusion/B/browse/master/extern/cuew/ and
https://developer.blender.org/diffusion/B/browse/master/intern/cycles/device/device_cuda.cpp
I know it's not an automagic trick but it might help you get started.

I've got an Nvidia GPU, how can I code on it?

I've never really been into GPUs, not being a gamer, but I'm aware of their parallel abilities and wondered how I could get started programming on one. I recall reading (somewhere) that there is a CUDA C-style programming language. What IDE do I use, and is it relatively simple to execute code?
There are quick-start guides for getting the dev drivers and libraries set up on different platforms (win/mac/lin) here; there is also a link to the CUDA C Programming Guide.
http://developer.nvidia.com/object/nsight.html
Although all the CUDA stuff we do (fluid sims / particle sims etc.) is done on Linux, essentially with emacs and gcc.
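To give a feel for how simple it can be, here is roughly the smallest complete CUDA program (a vector add; file name and values are just for illustration):

// add.cu -- a minimal complete CUDA program: add two vectors on the GPU.
// Build and run:  nvcc add.cu -o add && ./add
#include <cstdio>

__global__ void add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1024;
    float ha[n], hb[n], hc[n];
    for (int i = 0; i < n; ++i) { ha[i] = i; hb[i] = 2.0f * i; }

    float *da, *db, *dc;
    cudaMalloc(&da, n * sizeof(float));
    cudaMalloc(&db, n * sizeof(float));
    cudaMalloc(&dc, n * sizeof(float));
    cudaMemcpy(da, ha, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, n * sizeof(float), cudaMemcpyHostToDevice);

    // One thread per element, 256 threads per block.
    add<<<(n + 255) / 256, 256>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("hc[10] = %f (expect 30)\n", hc[10]);

    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    return 0;
}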
Some suggestions:
(1) Download the CUDA SDK from Nvidia (http://developer.download.nvidia.com/compute/cuda/sdk/website/samples.html). It has an extensive set of application examples that have been previously developed, tested, and commented. Some useful examples to start with are matrixMul, histogram, and convolutionSeparable. For more complex, well-documented code, see the "nbody" example.
(2) If you are very good at C++ programming, then the Thrust C++ library for GPUs is another good place to start; see the sketch after this list. It has extensive STL-like support for doing operations on the GPU, and the overall programming effort is much lower for standard algorithms.
(3) Eclipse with the CUDA plugin is a good IDE to work with initially.
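As promised in point (2), here is a short sketch of what Thrust code looks like (a GPU sort in a few lines; the file name is just for illustration):

// thrust_sort.cu -- sort a large array on the GPU with Thrust.
// Build: nvcc thrust_sort.cu -o thrust_sort
#include <cstdio>
#include <cstdlib>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>

int main()
{
    thrust::host_vector<int> h(1 << 20);
    for (size_t i = 0; i < h.size(); ++i)
        h[i] = rand();

    thrust::device_vector<int> d = h;   // copy the data to the GPU
    thrust::sort(d.begin(), d.end());   // sorting runs on the device
    h = d;                              // copy the result back

    printf("first = %d, last = %d\n", h[0], h[h.size() - 1]);
    return 0;
}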
On Windows, Visual Studio. On Linux, Eclipse, Code::Blocks, and others, depending on which you feel more comfortable with.
The IDE, though, is the last thing. There are steps preceding it (installing the appropriate display driver and toolkit, and running the SDK samples). The manuals/links provided above are really helpful. There is also an Nvidia forum for CUDA development and many getting-started guides.