I've got an NVIDIA GPU; how can I code on it? - cuda

I've never really been into GPUs, not being a gamer, but I'm aware of their parallel ability and wondered how I could get started programming on one. I recall (somewhere) that there is a C-style CUDA programming language. What IDE do I use, and is it relatively simple to execute code?

There are quick-start guides for getting the dev drivers and libraries set up on different platforms (Windows/Mac/Linux) here; there is also a link to the CUDA C Programming Guide.

http://developer.nvidia.com/object/nsight.html
That said, all the CUDA work we do (fluid sims, particle sims, etc.) is done on Linux, essentially with Emacs and gcc.

Some suggestions:
(1) Download the CUDA SDK from NVIDIA (http://developer.download.nvidia.com/compute/cuda/sdk/website/samples.html). It has an extensive set of application examples that have been previously developed, tested and commented. Some useful examples to start with are matrixMul, histogram and convolutionSeparable. For more complex, well-documented code, see the "nbody" example.
(2) If you are very good at C++ programming, then the Thrust C++ library for the GPU is another good place to start. It provides extensive STL-like support for operations on the GPU, and the overall programming effort for standard algorithms is much lower.
(3) Eclipse with a CUDA plugin is a good IDE to start with.
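As a rough illustration of suggestion (2), here is a minimal sketch of a GPU reduction with Thrust. It assumes a CUDA Toolkit recent enough to ship Thrust (older toolkits needed a separate download) and a .cu file compiled with nvcc; sum_of_squares is a hypothetical helper name, not part of Thrust.

```cuda
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/sequence.h>
#include <thrust/transform_reduce.h>

// Element-wise operation applied on the GPU: x -> x * x.
struct square {
    __host__ __device__ float operator()(float x) const { return x * x; }
};

// Sum of squares of 1, 2, ..., n, computed on the GPU.
// (sum_of_squares is a hypothetical helper, not a Thrust function.)
float sum_of_squares(int n) {
    thrust::device_vector<float> v(n);            // storage lives in GPU memory
    thrust::sequence(v.begin(), v.end(), 1.0f);   // v = 1, 2, ..., n
    return thrust::transform_reduce(v.begin(), v.end(), square(),
                                    0.0f, thrust::plus<float>());
}
```

Calling sum_of_squares(4) should return 30 (1 + 4 + 9 + 16). The hand-written CUDA equivalent would need a kernel, shared memory and a second reduction pass, which is the kind of effort Thrust saves for standard algorithms.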

On Windows, Visual Studio. On Linux, Eclipse, Code::Blocks or others, depending on which you feel more comfortable with.
The IDE, though, is the last thing. There are steps preceding it (installing the appropriate display driver and the toolkit, running the SDK samples). The manuals and links provided above are really helpful. There is also an NVIDIA forum for CUDA development and many getting-started guides.
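Once the driver and toolkit are installed, a quick way to confirm the setup before worrying about an IDE is a few runtime API calls, similar in spirit to the SDK's deviceQuery sample. A minimal sketch, assuming the CUDA runtime headers and nvcc are available; list_cuda_devices is a hypothetical name.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print every CUDA device the runtime can see and return how many there are.
// Without a working driver, cudaGetDeviceCount returns an error code
// (for example cudaErrorNoDevice) instead of cudaSuccess.
int list_cuda_devices() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 0;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        std::printf("Device %d: %s, compute capability %d.%d, %zu MB global memory\n",
                    i, prop.name, prop.major, prop.minor,
                    prop.totalGlobalMem / (1024 * 1024));
    }
    return count;
}
```

Call it from main(); if it reports 0 devices or prints an error, fix the driver/toolkit installation before going further.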

Related

How do I develop a CUDA application on my ATI card, to be executed later on NVIDIA?

My computer has an ATI graphics card, but I need to code an algorithm I already have in CUDA, to accelerate the process. Is that even possible? If yes, does anyone have a link or tutorial that goes from setting up my IDE to coding a simple image-processing task or passing an image? I also considered OpenCL but I have not found any information on how to do anything with it.
This answer is mainly directed at this part of the question:
I also considered OpenCL but I have not found any information how to do anything with it.
Check this NVIDIA site:
http://developer.nvidia.com/nvidia-gpu-computing-documentation
Scroll down and you will find:
OpenCL Programming Guide
This is a detailed programming guide for OpenCL developers.
OpenCL Best Practices Guide
This is a manual to help developers obtain the best performance from OpenCL.
OpenCL Overview for the CUDA Architecture
This whitepaper summarizes the guidelines for how to choose the best implementations for NVIDIA GPUs.
OpenCL Implementation Notes
This document describes the "Implementation Defined" behavior for the NVIDIA OpenCL implementation as required by the OpenCL specification Version: 1.0. The implementation-defined behavior is referenced below in the order of its reference in the OpenCL specification and is grouped by the section number for the specification.
On AMD/ATI you have this site for a brief introduction:
http://www.amd.com/us/products/technologies/stream-technology/opencl/pages/opencl-intro.aspx
And for more resources check:
http://www.amd.com/us/products/technologies/stream-technology/Pages/training-resources.aspx
Unless CUDA is a requirement, you should reconsider OpenCL: you can use it on both platforms, and you say you have one and want to develop for the other.
You might also want to take a look at these:
http://blogs.nvidia.com/2011/06/cuda-now-available-for-multiple-x86-processors/
http://www.pgroup.com/resources/cuda-x86.htm
I haven't tried it myself, but the prospect of running CUDA code on x86 seems pretty attractive.

Using High Level Shader Language for computational algorithms

So, I heard that some people have figured out ways to run programs on the GPU using High Level Shader Language and I would like to start writing my own programs that run on the GPU rather than my CPU, but I have been unable to find anything on the subject.
Does anyone have any experience with writing programs for the GPU or know of any documentation on the subject?
Thanks.
For computation, CUDA and OpenCL are more suitable than shader languages. For CUDA, I highly recommend the book CUDA by Example. The book is aimed at absolute beginners to this area of programming.
The best way to start, I think, is to:
Have a CUDA Card from Nvidia
Download Driver + Toolkit + SDK
Build the examples
Read the Cuda Programming Guide
Start to recreate the cudaDeviceInfo example
Try to allocate memory on the GPU
Try to create a little kernel
From there you should be able to gain enough momentum to learn the rest.
Once you learn CUDA, OpenCL and the others are a breeze.
I am suggesting CUDA because it is the most widely supported and tested.
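The memory-allocation and little-kernel steps from the list above might look roughly like the following sketch. It assumes nvcc and a CUDA-capable card; square_on_gpu and square_kernel are hypothetical names, and 256 threads per block is just a common choice, not a requirement.

```cuda
#include <cuda_runtime.h>

// One thread per element. The bounds check matters because the grid is
// rounded up to a whole number of blocks.
__global__ void square_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * data[i];
}

// Hypothetical host helper: copy host[0..n) to the GPU, square it in place,
// copy it back. Returns false if any CUDA call reports an error. Assumes n > 0.
bool square_on_gpu(float *host, int n) {
    float *dev = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;  // allocate GPU memory
    cudaError_t err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;               // round up
        square_kernel<<<blocks, threads>>>(dev, n);             // launch the kernel
        err = cudaGetLastError();                               // launch errors
    }
    if (err == cudaSuccess)  // this cudaMemcpy waits for the kernel to finish first
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess;
}
```

For example, with float a[3] = {1, 2, 3}, square_on_gpu(a, 3) leaves {1, 4, 9} in a. The same pattern (allocate, copy in, launch, copy out, free) underlies most beginner CUDA programs.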

how to use my existing .cpp code with cuda

I have code in C++ and want to use it along with CUDA. Can anyone please help me? Should I provide my code? I actually tried, but I need some starting code to proceed with my own. I know how to write a simple square program (using CUDA and C++) for Windows (Visual Studio). Is that sufficient for what my program needs?
The following are both good places to start. CUDA by Example is a good tutorial that gets you up and running pretty fast. Programming Massively Parallel Processors includes more background, e.g. chapters on the history of GPU architecture, and generally more depth.
CUDA by Example: An Introduction to General-Purpose GPU Programming
Programming Massively Parallel Processors: A Hands-on Approach
These both talk about CUDA 3.x so you'll want to look at the new features in CUDA 4.x at some point.
Thrust is definitely worth a look if your problem maps onto it well (see comment above). It's an STL-like library of containers, iterators and algorithms that implements data-parallel algorithms on top of CUDA.
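To give a flavour of those STL-like containers, iterators and algorithms, here is a minimal sketch that sorts a std::vector on the GPU. It assumes Thrust as shipped with the CUDA Toolkit and a .cu file compiled by nvcc; gpu_sort is a hypothetical helper name.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <vector>

// Sort a copy of `input` on the GPU and return it as a std::vector.
std::vector<int> gpu_sort(const std::vector<int> &input) {
    thrust::device_vector<int> d(input.begin(), input.end());  // host -> device copy
    thrust::sort(d.begin(), d.end());                          // runs on the GPU
    std::vector<int> out(d.size());
    thrust::copy(d.begin(), d.end(), out.begin());             // device -> host copy
    return out;
}
```

gpu_sort({3, 1, 2}) returns {1, 2, 3}. The device_vector constructor and thrust::copy perform the host/device transfers that would otherwise be explicit cudaMemcpy calls.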
Here are two tutorials on getting started with CUDA and Visual C++ 2010:
http://www.ademiller.com/blogs/tech/2011/03/using-cuda-and-thrust-with-visual-studio-2010/
http://blog.cuvilib.com/2011/02/24/how-to-run-cuda-in-visual-studio-2010/
There's also a post on the NVIDIA forum:
http://forums.nvidia.com/index.php?showtopic=184539
Asking very general "how do I get started on ..." questions on Stack Overflow usually isn't the best approach. Typically the best reply you'll get is "go read a book or the manual". It's much better to ask specific questions here. Please don't create duplicate questions; it isn't helpful.
It's a non-trivial task to convert a program from straight C(++) to CUDA. As far as I know, it is possible to use C++ features within CUDA (especially with the announced CUDA 4.0), but I think it's easier to start with plain C constructs only (i.e. structs, pointers, elementary data types).
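As an illustration of starting with plain C constructs, here is a sketch that moves an array of a simple struct to the GPU and updates it with one thread per element. The Particle struct, the Euler-step update, step_kernel and step_particles are all hypothetical, chosen only to show structs and raw pointers crossing the host/device boundary; it assumes nvcc.

```cuda
#include <cuda_runtime.h>

// Plain-old-data struct: only floats, so it can be copied byte-wise.
struct Particle {
    float x, y;    // position
    float vx, vy;  // velocity
};

// One thread per particle: advance its position by one explicit Euler step.
__global__ void step_kernel(Particle *p, int n, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        p[i].x += p[i].vx * dt;
        p[i].y += p[i].vy * dt;
    }
}

// Host wrapper: copy the array over, run one step, copy it back. Assumes n > 0.
bool step_particles(Particle *host, int n, float dt) {
    Particle *dev = nullptr;
    size_t bytes = n * sizeof(Particle);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;
    cudaError_t err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        step_kernel<<<(n + 127) / 128, 128>>>(dev, n, dt);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess;
}
```

Because Particle contains only floats, cudaMemcpy copies it correctly; a struct holding pointers to host memory would need more care, since those pointers are meaningless on the device.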
Start by reading the CUDA programming guide and by examining the examples coming with the CUDA SDK or available here. I personally found the vector addition sample quite enlightening. It can be found over here.
I cannot tell you how to write the __global__ functions and __shared__ memory for your specific program, but after reading the introductory material you will have at least a rough idea of how to do it.
The problem is that, as far as I know, there is no generic recipe for transforming plain C(++) into code suitable for CUDA. But here are some cornerstones for you:
Central idea of CUDA: loops can be transformed so that each iteration is executed by its own thread, with many threads running in parallel on the GPU.
Therefore, ideally each iteration is independent of the others.
For optimal execution, the execution paths of the individual threads should be (almost) the same, i.e. the threads should all do almost the same work.
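For orientation, __global__ marks a kernel that is launched from the host, and __shared__ marks per-block memory that all threads of one block can see. Here is a minimal sketch using both, a per-block sum reduction. It assumes nvcc and a block size that is a power of two; block_sum and gpu_sum are hypothetical names, and error checks are omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK 256  // threads per block; must be a power of two for the loop below

// Each block sums BLOCK consecutive elements of `in` into partial[blockIdx.x].
__global__ void block_sum(const float *in, float *partial, int n) {
    __shared__ float buf[BLOCK];           // one copy per block
    int tid = threadIdx.x;
    int i = blockIdx.x * BLOCK + tid;
    buf[tid] = (i < n) ? in[i] : 0.0f;     // pad the last block with zeros
    __syncthreads();                       // all loads done before anyone reads
    for (int stride = BLOCK / 2; stride > 0; stride /= 2) {
        if (tid < stride) buf[tid] += buf[tid + stride];
        __syncthreads();                   // finish this level before the next
    }
    if (tid == 0) partial[blockIdx.x] = buf[0];
}

// Host wrapper: partial sums on the GPU, the short final sum on the CPU.
// Assumes n > 0.
float gpu_sum(const float *host, int n) {
    int blocks = (n + BLOCK - 1) / BLOCK;
    float *d_in = nullptr, *d_partial = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_partial, blocks * sizeof(float));
    cudaMemcpy(d_in, host, n * sizeof(float), cudaMemcpyHostToDevice);
    block_sum<<<blocks, BLOCK>>>(d_in, d_partial, n);
    std::vector<float> h_partial(blocks);
    cudaMemcpy(h_partial.data(), d_partial, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);
    float total = 0.0f;
    for (int b = 0; b < blocks; ++b) total += h_partial[b];
    cudaFree(d_in);
    cudaFree(d_partial);
    return total;
}
```

The __syncthreads() calls are what make the shared-memory version correct: at each level of the tree, threads read values written by other threads in the previous level.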
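These cornerstones can be seen side by side in a SAXPY loop (y = a*x + y) written once for the CPU and once as a kernel. This is a sketch assuming nvcc; the function names are hypothetical, and the "grid-stride loop" used in the kernel is one common way of mapping loop iterations onto threads, not the only one.

```cuda
#include <cuda_runtime.h>

// Serial version: iteration i reads x[i] and y[i] and writes only y[i], so the
// iterations are independent and may run in any order.
void saxpy_cpu(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
}

// GPU version: the loop body becomes the kernel body, and the loop index comes
// from the thread's position in the grid. The grid-stride loop lets a fixed
// grid cover any n. All threads execute the same instructions; only the loop
// exit test can differ between threads of a warp.
__global__ void saxpy_gpu(int n, float a, const float *x, float *y) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        y[i] = a * x[i] + y[i];
}

// Hypothetical host wrapper (error checks omitted for brevity).
void saxpy_on_gpu(int n, float a, const float *x, float *y) {
    float *dx = nullptr, *dy = nullptr;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y, bytes, cudaMemcpyHostToDevice);
    saxpy_gpu<<<32, 256>>>(n, a, dx, dy);  // 8192 threads; the stride loop covers the rest
    cudaMemcpy(y, dy, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);
}
```

With 32 blocks of 256 threads and n = 10000, each thread handles one or two iterations, and the GPU result should match the CPU loop.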
You can have multiple .cpp and .cu files in your project. Unless you want your .cu files to contain only device code, it should be fairly easy.
For each .cu file, write a header file declaring the host functions it contains. Then include that header file in other .cu or .cpp files, and the linker will do the rest. It is no different from having multiple plain C++ .cpp files in your project.
I assume you already have CUDA rule files for your Visual Studio.
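A sketch of the header/.cu/.cpp split described above. To keep it in one self-contained listing, the header's contents are shown inline at the top; in a real project they would live in their own file. All names (square.h, square.cu, square_array, square_array_kernel) are hypothetical, and error checks are omitted.

```cuda
// ---- square.h (hypothetical header) ----
// Only a plain C++ declaration, so ordinary .cpp files compiled by the host
// compiler can include it without seeing any CUDA syntax.
#ifndef SQUARE_H
#define SQUARE_H
void square_array(float *host_data, int n);
#endif

// ---- square.cu (compiled by nvcc) ----
// In a real project this file would #include "square.h" instead of
// repeating the declaration above.
#include <cuda_runtime.h>

__global__ void square_array_kernel(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= d[i];
}

// Host function: the only symbol other translation units need. The <<< >>>
// launch syntax stays inside the .cu file, so the .cpp side never needs nvcc.
// Assumes n > 0.
void square_array(float *host_data, int n) {
    float *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host_data, n * sizeof(float), cudaMemcpyHostToDevice);
    square_array_kernel<<<(n + 255) / 256, 256>>>(dev, n);
    cudaMemcpy(host_data, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
}
```

main.cpp then only needs #include "square.h" and a call such as square_array(a, n). On Linux, one way to build it is nvcc -c square.cu, g++ -c main.cpp, then nvcc main.o square.o -o app (nvcc links the CUDA runtime for you); in Visual Studio, the CUDA build rules play the role of the nvcc step.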

GPU Emulator for CUDA programming without the hardware [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
Question: Is there an emulator for a Geforce card that would allow me to program and test CUDA without having the actual hardware?
Info:
I'm looking to speed up a few simulations of mine in CUDA, but my problem is that I'm not always around my desktop for doing this development. I would like to do some work on my netbook instead, but my netbook doesn't have a GPU. Now as far as I know, you need a CUDA capable GPU to run CUDA. Is there a way to get around this? It would seem like the only way is a GPU emulator (which obviously would be painfully slow, but would work). But whatever way there is to do this I would like to hear.
I'm programming on Ubuntu 10.04 LTS.
For those who are seeking the answer in 2016 (and even 2017) ...
Disclaimer
I failed to emulate a GPU after all.
It might be possible to use gpuocelot if you satisfy its list of dependencies.
I tried to get an emulator working on BunsenLabs (Linux 3.16.0-4-686-pae #1 SMP Debian 3.16.7-ckt20-1+deb8u4 (2016-02-29) i686 GNU/Linux).
I'll tell you what I've learnt.
nvcc used to have a -deviceemu option back in CUDA Toolkit 3.0
I downloaded CUDA Toolkit 3.0, installed it and tried to run a simple program:
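#include <stdio.h>
// Each thread prints its thread index, its warp index and its block index.
// (Device-side printf needs a GPU of compute capability 2.0 or higher.)
__global__ void helloWorld() {
    printf("Hello world! I am %d (Warp %d) from %d.\n",
           (int)threadIdx.x, (int)(threadIdx.x / warpSize), (int)blockIdx.x);
}
int main() {
    int blocks, threads;
    if (scanf("%d%d", &blocks, &threads) != 2) return 1;  // e.g. input "2 64"
    helloWorld<<<blocks, threads>>>();
    cudaDeviceSynchronize();  // wait for the kernel so its printf output is flushed
    return 0;
}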
Note that in CUDA Toolkit 3.0, nvcc was located in /usr/local/cuda/bin/.
It turned out that I had difficulties compiling it:
NOTE: device emulation mode is deprecated in this release
and will be removed in a future release.
/usr/include/i386-linux-gnu/bits/byteswap.h(47): error: identifier "__builtin_bswap32" is undefined
/usr/include/i386-linux-gnu/bits/byteswap.h(111): error: identifier "__builtin_bswap64" is undefined
/home/user/Downloads/helloworld.cu(12): error: identifier "cudaDeviceSynchronize" is undefined
3 errors detected in the compilation of "/tmp/tmpxft_000011c2_00000000-4_helloworld.cpp1.ii".
I found on the Internet that if I used gcc-4.2 or a similarly ancient version instead of gcc-4.9.2, the errors might disappear. I gave up.
gpuocelot
The answer by Stringer has a link to a very old gpuocelot project website, so at first I thought that the project was abandoned in 2012 or so. Actually, it was abandoned a few years later.
Here are some up to date websites:
GitHub;
Project's website;
Installation guide.
I tried to install gpuocelot following the guide. I had several errors during installation though and I gave up again. gpuocelot is no longer supported and depends on a set of very specific versions of libraries and software.
You might try to follow this tutorial from July, 2015 but I don't guarantee it'll work. I've not tested it.
MCUDA
The MCUDA translation framework is a linux-based tool designed to effectively compile the CUDA programming model to a CPU architecture.
It might be useful. Here is a link to the website.
CUDA Waste
It is an emulator for Windows 7 and 8. I've not tried it, though. It doesn't seem to be developed anymore (the last commit is dated Jul 4, 2013).
Here's the link to the project's website: https://code.google.com/archive/p/cuda-waste/
CU2CL
Last update: 12.03.2017
As dashesy pointed out in the comments, CU2CL seems to be an interesting project. It appears to be able to translate CUDA code to OpenCL code, so if your GPU is capable of running OpenCL code, the CU2CL project might be of interest to you.
Links:
CU2CL homepage
CU2CL GitHub repository
This response may be too late, but it's worth noting anyway. GPU Ocelot (of which I am one of the core contributors) can be compiled without CUDA device drivers (libcuda.so) installed if you wish to use the Emulator or LLVM backends. I've demonstrated the emulator on systems without NVIDIA GPUs.
The emulator attempts to faithfully implement the PTX 1.4 and PTX 2.1 specifications which may include features older GPUs do not support. The LLVM translator strives for correct and efficient translation from PTX to x86 that will hopefully make CUDA an effective way of programming multicore CPUs as well as GPUs. -deviceemu has been a deprecated feature of CUDA for quite some time, but the LLVM translator has always been faster.
Additionally, several correctness checkers are built into the emulator to verify: aligned memory accesses, accesses to shared memory are properly synchronized, and global memory dereferencing accesses allocated regions of memory. We have also implemented a command-line interactive debugger inspired largely by gdb to single-step through CUDA kernels, set breakpoints and watchpoints, etc... These tools were specifically developed to expedite the debugging of CUDA programs; you may find them useful.
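To make the shared-memory synchronization check concrete, here is the kind of kernel it is aimed at: a single block reversing an array through shared memory. A sketch assuming nvcc; reverse_block and reverse_on_gpu are hypothetical names. Each thread reads an element written by a different thread, so deleting the __syncthreads() barrier introduces exactly the sort of shared-memory race such a checker is designed to report.

```cuda
#include <cuda_runtime.h>

#define N 64  // one block of N threads reverses N elements (two warps)

__global__ void reverse_block(float *d) {
    __shared__ float s[N];
    int t = threadIdx.x;
    s[t] = d[t];               // each thread writes its own slot
    __syncthreads();           // remove this line to create a shared-memory race
    d[t] = s[N - 1 - t];       // ...then reads a slot written by another thread
}

// Hypothetical host wrapper: reverses host[0..N) on the GPU.
bool reverse_on_gpu(float *host) {
    float *dev = nullptr;
    if (cudaMalloc(&dev, N * sizeof(float)) != cudaSuccess) return false;
    cudaError_t err = cudaMemcpy(dev, host, N * sizeof(float), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        reverse_block<<<1, N>>>(dev);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess;
}
```

With the barrier in place, reverse_on_gpu turns {0, 1, ..., 63} into {63, 62, ..., 0}.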
Sorry about the Linux-only aspect. We've started a Windows branch (as well as a Mac OS X port) but the engineering burden is already large enough to stress our research pursuits. If anyone has any time and interest, they may wish to help us provide support for Windows!
Hope this helps.
[1]: GPU Ocelot - https://code.google.com/archive/p/gpuocelot/
[2]: Ocelot Interactive Debugger - http://forums.nvidia.com/index.php?showtopic=174820
You can also check the gpuocelot project, which is a true emulator in the sense that PTX (the bytecode that CUDA code is compiled to) is emulated.
There's also an LLVM translator; it would be interesting to test whether it is faster than -deviceemu.
The CUDA toolkit had an emulator built in until the CUDA 3.0 release cycle. If you use one of these very old versions of CUDA, make sure to pass -deviceemu when compiling with nvcc.
https://github.com/hughperkins/cuda-on-cl lets you run NVIDIA® CUDA™ programs on OpenCL 1.2 GPUs (full disclosure: I'm the author)
Be careful when you're programming using -deviceemu as there are operations that nvcc will accept while in emulation mode but not when actually running on a GPU. This is mostly found with device-host interaction.
And as you mentioned, prepare for some slow execution.
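One way to catch the device/host mistakes that emulation mode let through is to check every runtime call when running on real hardware. A minimal sketch, assuming CUDA 4.0 or later (cudaDeviceSynchronize does not exist in 3.0, as the compile log above shows); CUDA_CHECK, add_one and add_one_on_gpu are hypothetical names, not part of the CUDA API.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with file/line information if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

__global__ void add_one(int *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1;
}

// Correct pattern: the kernel only ever receives device pointers. Passing
// `host` straight to add_one is the kind of host/device mix-up that a
// -deviceemu build (everything running on the CPU) could tolerate.
// Assumes n > 0.
void add_one_on_gpu(int *host, int n) {
    int *dev = nullptr;
    CUDA_CHECK(cudaMalloc(&dev, n * sizeof(int)));
    CUDA_CHECK(cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice));
    add_one<<<(n + 255) / 256, 256>>>(dev, n);
    CUDA_CHECK(cudaGetLastError());        // launch/configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran
    CUDA_CHECK(cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dev));
}
```

If the host pointer were passed to add_one directly, on most systems the cudaDeviceSynchronize() check would report an error (such as an illegal memory access) instead of the program silently appearing to work as it could under emulation.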
GPGPU-Sim is a GPU simulator that can run CUDA programs without using GPU.
I created a docker image with GPGPU-Sim installed for myself in case that is helpful.