A CUDA wrapper to execute OpenCL

I'm involved in a project where I have to do GPU programming; one of my constraints is that it must run on an NVIDIA device (thus in CUDA).
But I don't have access to a machine with an NVIDIA GPU.
So I would like to know whether there is a wrapper that would let me write CUDA code but execute it as OpenCL code, so that it works on an AMD GPU.
PS: gpuocelot would fit well IF I did not have to do this on a Windows system.

Is the "CUDA" constraint an actual one? Because GPU programming on NVIDIA hardware doesn't necessarily imply CUDA. You have other possible solutions such as:
OpenCL, which you already mentioned; it is quite complex and cumbersome to use, but it opens up plenty of possible back-ends.
Thrust, which lets you target NVIDIA GPUs with a CUDA back-end, or CPUs with OpenMP and TBB back-ends.
OpenACC with the PGI compiler, which (AFAIK) can target both NVIDIA and AMD GPUs.
If it were me, and if the code permits, I would try to develop using Thrust. But that's up to you.
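The appeal of Thrust is that an algorithm is written once against generic primitives and the back-end is chosen separately. The Python sketch below only imitates that design idea; it is not Thrust's API (Thrust is a C++ template library), and the back-end names `serial` and `threads` are hypothetical stand-ins for Thrust's CUDA, OpenMP and TBB systems.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical "back-ends": each knows how to apply a function to every element.
def serial_map(fn: Callable[[float], float], data: List[float]) -> List[float]:
    return [fn(x) for x in data]

def threaded_map(fn: Callable[[float], float], data: List[float]) -> List[float]:
    # pool.map preserves input order, so results line up with the serial version
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(fn, data))

BACKENDS = {"serial": serial_map, "threads": threaded_map}

def transform_reduce(data, unary_op, backend="serial"):
    """Algorithm written once; the back-end is selected by name."""
    mapped = BACKENDS[backend](unary_op, list(data))
    return sum(mapped)

values = [1.0, 2.0, 3.0, 4.0]
print(transform_reduce(values, lambda x: x * x, backend="serial"))   # 30.0
print(transform_reduce(values, lambda x: x * x, backend="threads"))  # 30.0
```

The calling code never changes when the back-end does, which is the property that makes this style attractive when the target hardware is uncertain.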

You could take a look at GPU Ocelot. According to its website:
Ocelot currently allows CUDA programs to be executed on NVIDIA GPUs, AMD GPUs, and x86-CPUs at full speed without recompilation.
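Any tool that maps CUDA onto OpenCL has to convert, among other things, the launch configuration: CUDA specifies a grid of blocks (`kernel<<<grid, block>>>`), while OpenCL's `clEnqueueNDRangeKernel` takes the total global work size plus the local (work-group) size. A minimal Python sketch of that arithmetic, assuming matching-length tuples per dimension; the function name is hypothetical and this is not how Ocelot itself is implemented.

```python
def cuda_launch_to_ndrange(grid, block):
    """Map CUDA (gridDim, blockDim) to OpenCL (global_work_size, local_work_size).

    In CUDA, gridDim counts blocks; in OpenCL, the global size counts
    work-items, so each dimension is multiplied by the block size.
    """
    if len(grid) != len(block):
        raise ValueError("grid and block must have the same number of dimensions")
    global_size = tuple(g * b for g, b in zip(grid, block))
    local_size = tuple(block)
    return global_size, local_size

# A 1-D launch of 120 blocks of 256 threads, i.e. kernel<<<120, 256>>>(...)
print(cuda_launch_to_ndrange((120,), (256,)))  # ((30720,), (256,))
```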


Reading Shared/Local Memory Store/Load bank conflicts hardware counters for OpenCL executable under Nvidia

It is possible to use nvprof to read the bank-conflict counters of a CUDA executable:
nvprof --events shared_st_bank_conflict,shared_ld_bank_conflict my_cuda_exe
However, it does not work for code that uses OpenCL rather than CUDA.
Is there any way to extract these counters outside nvprof from an OpenCL environment, maybe directly from the PTX?
Alternatively, is there any way to take the PTX assembly generated by the NVIDIA OpenCL compiler (obtained with clGetProgramInfo and CL_PROGRAM_BINARIES), load it as a CUDA kernel with cuModuleLoadDataEx, and thus be able to use nvprof?
Is there any CPU simulation back-end that lets me set parameters such as the bank size?
Additional option:
Use a converter from OpenCL to CUDA code that handles features missing from CUDA, such as vloadn/vstoren, float16, and various other accessors. A set of #defines works only for simple kernels. Is there any tool that provides this?
> Is there any way to extract these counters outside nvprof from OpenCL environment, maybe directly from ptx?
No. Nor is there a way to do this in CUDA, or for compute shaders in OpenGL, DirectX, or Vulkan.
> Alternatively is there any way to convert PTX assembly generated from nvidia OpenCL compiler using clGetProgramInfo with CL_PROGRAM_BINARIES to CUDA kernel and run it using cuModuleLoadDataEx and thus be able to use nvprof?
No. OpenCL PTX and CUDA PTX are not the same and can't be used interchangeably.
> Is there any simulation CPU backend that allows to set such parameters as bank size etc?
Not that I am aware of.
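No ready-made simulator is suggested here, but the quantity those counters measure can be modeled for a single access. The Python sketch below is a simplified model under stated assumptions: a warp of 32 threads, 32 banks of 4-byte words by default (both configurable), and threads that read the same word are served by a broadcast and do not conflict. It returns how many serialized passes one shared-memory access needs; real hardware counters may count events differently.

```python
from collections import defaultdict

def access_passes(byte_addresses, num_banks=32, bank_width=4):
    """Simplified model: serialized passes needed for one warp-wide access."""
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // bank_width
        words_per_bank[word % num_banks].add(word)
    # Distinct words in the same bank are serialized; identical words broadcast.
    return max(len(words) for words in words_per_bank.values())

warp = range(32)
print(access_passes([4 * i for i in warp]))       # 1: stride-1 floats, conflict-free
print(access_passes([4 * 2 * i for i in warp]))   # 2: stride 2, two-way conflict
print(access_passes([0 for _ in warp]))           # 1: everyone reads one word (broadcast)
print(access_passes([4 * 32 * i for i in warp]))  # 32: stride 32, all hit bank 0
```

Changing `num_banks` or `bank_width` lets you explore how the same access pattern would behave under a different (assumed) bank layout.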

CUDA driver version is insufficient for runtime version [duplicate]

I have a very simple Toshiba laptop with an i3 processor, and I do not have an expensive graphics card. In the display settings, I see Intel(HD) Graphics as the display adapter. I am planning to learn some CUDA programming, but I am not sure whether I can do that on my laptop, as it does not have an NVIDIA CUDA-enabled GPU.
In fact, I doubt I even have a GPU o_o
So I would appreciate it if someone could tell me whether I can do CUDA programming with my current configuration and, if possible, what "Intel(HD) Graphics" means.
At the present time, Intel graphics chips do not support CUDA. It is possible that, in the near future, these chips will support OpenCL (which is a standard that is very similar to CUDA), but this is not guaranteed and their current drivers do not support OpenCL either. (There is an Intel OpenCL SDK available, but, at the present time, it does not give you access to the GPU.)
The newest Intel processors (Sandy Bridge) have a GPU integrated on the same die as the CPU cores. Your processor may be a previous-generation version, in which case "Intel(HD) graphics" is an independent chip.
The Portland Group has a commercial product called CUDA x86: a hybrid compiler that compiles CUDA C/C++ code so that it can either run on the GPU or use SIMD on the CPU, fully automatically, without any intervention from the developer. Hope this helps.
Link: http://www.pgroup.com/products/pgiworkstation.htm
If you're interested in learning a language that supports massive parallelism, you had better go for OpenCL, since you don't have an NVIDIA GPU. You can run OpenCL on Intel CPUs, but at best you will learn to program SIMD units.
Optimization on CPUs and GPUs is different. I really don't think you can use an Intel card for GPGPU.
Intel HD Graphics is usually the on-CPU graphics chip in newer Core i3/i5/i7 processors.
As far as I know, it doesn't support CUDA (which is a proprietary NVIDIA technology), but OpenCL is supported by NVIDIA, ATI and Intel.
In 2020, ZLUDA was created, which provides the CUDA API on Intel GPUs. It is not production-ready yet, though.

Do all GPUs use the same architecture?

I have some experience with NVIDIA CUDA and am now thinking about learning OpenCL too. I would like to be able to run my programs on any GPU. My question is: does every GPU use the same architecture as NVIDIA (multiprocessors, SIMT structure, global memory, local memory, registers, caches, ...)?
Thank you very much!
Starting with your stated goal:
"I would like to be able to run my programs on any GPU."
Then yes, you should learn OpenCL.
In answer to your overall question, other GPU vendors do use different architectures than NVIDIA GPUs. In fact, GPU designs from a single vendor can vary quite a bit, depending on the model.
This is one reason that a given OpenCL code may perform quite differently (depending on your performance metric) from one GPU to the next. In fact, to achieve optimized performance on any GPU, an algorithm should be "profiled" by varying, for example, local memory size, to find the best algorithm settings for a given hardware design.
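Profiling by varying a parameter is essentially an empirical search. A minimal Python sketch, assuming a hypothetical `run_kernel(local_size)` callable that launches the kernel with a given work-group size; here a stand-in that merely sleeps is used so the snippet runs on its own. Wall-clock timing with `time.perf_counter` is a simplification; real OpenCL code would more likely time kernels with event profiling.

```python
import time

def autotune(run_kernel, candidates, repeats=3):
    """Time run_kernel for each candidate setting and return the fastest one."""
    best_setting, best_time = None, float("inf")
    for setting in candidates:
        run_kernel(setting)  # warm-up run, not timed
        start = time.perf_counter()
        for _ in range(repeats):
            run_kernel(setting)
        elapsed = (time.perf_counter() - start) / repeats
        if elapsed < best_time:
            best_setting, best_time = setting, elapsed
    return best_setting, best_time

# Hypothetical stand-in for a kernel launch: pretends 128 is the sweet spot.
def fake_kernel(local_size):
    time.sleep(0.005 * abs(local_size - 128) / 64)

best, seconds = autotune(fake_kernel, [32, 64, 128, 256])
print(best, seconds)
```

On real hardware the candidate list, and the winner, would differ from device to device, which is exactly why the search is done per device.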
But even with these hardware differences, the goal of OpenCL is to provide a level of core functionality that is supported by all devices (CPUs, GPUs, FPGAs, etc) and include "extensions" which allow vendors to expose unique hardware features. Although OpenCL cannot hide significant differences in hardware, it does guarantee portability. This makes it much easier for a developer to start with an OpenCL program tuned for one device and then develop a program optimized for another architecture.
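Extensions are discovered at run time: `clGetDeviceInfo` with `CL_DEVICE_EXTENSIONS` returns a space-separated string of extension names. The Python sketch below shows only the parsing and checking step; the example string is made up for illustration, not queried from a real device.

```python
def parse_extensions(ext_string):
    """Split a CL_DEVICE_EXTENSIONS-style string into a set of names."""
    return set(ext_string.split())

def has_extension(ext_string, name):
    # Exact token match: a plain substring search could match a prefix
    # of a longer extension name by accident.
    return name in parse_extensions(ext_string)

# Illustrative string only; a real one would come from clGetDeviceInfo.
example = "cl_khr_global_int32_base_atomics cl_khr_fp64 cl_khr_local_int32_base_atomics "
if has_extension(example, "cl_khr_fp64"):
    print("double precision available")
else:
    print("fall back to float")
```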
To complicate matters when identifying hardware differences, the terminology used by CUDA differs from that used by OpenCL; for example, the following terms are roughly equivalent in meaning:
CUDA               OpenCL
Thread             Work-item
Thread block       Work-group
Global memory      Global memory
Constant memory    Constant memory
Shared memory      Local memory
Local memory       Private memory
More comparisons and discussion can be found here.
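The thread/work-item correspondence can be made concrete with the index built-ins. In CUDA a thread's global index is usually computed as `blockIdx.x * blockDim.x + threadIdx.x`; in OpenCL, `get_global_id(0)` returns the same value, with `get_group_id(0)`, `get_local_size(0)` and `get_local_id(0)` playing the roles of `blockIdx.x`, `blockDim.x` and `threadIdx.x`. A Python sketch of that relationship, assuming no OpenCL global work offset:

```python
def global_ids(num_groups, local_size):
    """get_global_id(0) for every work-item, in (group, local) order.

    Same value as CUDA's blockIdx.x * blockDim.x + threadIdx.x,
    assuming no global work offset in OpenCL.
    """
    return [group_id * local_size + local_id
            for group_id in range(num_groups)
            for local_id in range(local_size)]

def split_global_id(global_id, local_size):
    """Recover (get_group_id(0), get_local_id(0)), i.e. (blockIdx.x, threadIdx.x)."""
    return divmod(global_id, local_size)

ids = global_ids(num_groups=4, local_size=64)
assert ids == list(range(256))      # every work-item gets a unique index
print(split_global_id(133, 64))     # (2, 5)
```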
You will find that the kinds of abstraction provided by OpenCL and CUDA are very similar. You can also usually count on your hardware having similar features: global memory, local memory, streaming multiprocessors, etc.
Switching from CUDA to OpenCL, you may be confused by the fact that many of the same concepts have different names (for example, what NVIDIA calls a "warp", AMD calls a "wavefront").
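The warp/wavefront width matters when choosing a work-group size: the hardware schedules whole warps or wavefronts, so a group that is not a multiple of the width leaves lanes idle. A small Python calculation, assuming a width of 32 for an NVIDIA warp and 64 for an AMD GCN wavefront (newer AMD RDNA parts can also run 32-wide wavefronts):

```python
import math

def lane_usage(work_group_size, simd_width):
    """Return (number of warps/wavefronts, idle lanes) for one work-group."""
    units = math.ceil(work_group_size / simd_width)
    idle = units * simd_width - work_group_size
    return units, idle

for size in (32, 48, 64, 96, 256):
    print(size, "warp32:", lane_usage(size, 32), "wave64:", lane_usage(size, 64))
```

Under these assumptions, work-group sizes that are multiples of 64 leave no idle lanes on either kind of hardware, which is one reason portable OpenCL code often picks such sizes.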

What hardware setup is required to use MPI with CUDA?

I am new to MPI. I want to use CUDA with MPI. I have three PCs, each with one GPU, which I want to use for some simple processing (matrix multiplication).
But I am not sure what hardware setup is required to use MPI with CUDA?
Please enlighten me.
Update
I am asking because many places mention clusters with InfiniBand. I do not have such a setup; I only have an ordinary office LAN.
Above all, the basic idea is to get a feel for how MPI and CUDA work together and to do small test runs, irrespective of performance.
One or more machines with CUDA-capable NVIDIA GPUs.
MPI and CUDA don't have anything to do with each other. You simply use CUDA within each MPI process.
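Because the two are independent, the MPI side of a matrix multiplication is just a data decomposition: each rank takes a block of rows of A, multiplies it by B (the local product is the part that would run on that machine's GPU), and the row blocks of C are gathered. The Python sketch below simulates the ranks in a loop so it runs without MPI or a GPU; in a real program each rank would be a separate process exchanging data with calls such as `MPI_Scatterv`/`MPI_Gatherv`, and `local_matmul` would be a hypothetical CUDA kernel launch.

```python
def row_ranges(n_rows, n_ranks):
    """Split n_rows into n_ranks contiguous blocks, as evenly as possible."""
    base, extra = divmod(n_rows, n_ranks)
    ranges, start = [], 0
    for rank in range(n_ranks):
        count = base + (1 if rank < extra else 0)
        ranges.append((start, start + count))
        start += count
    return ranges

def local_matmul(a_rows, b):
    """Stand-in for the per-rank GPU work: multiply a block of rows by B."""
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(len(b))) for j in range(cols)]
            for row in a_rows]

def distributed_matmul(a, b, n_ranks=3):
    """Simulate scatter -> local compute -> gather over n_ranks machines."""
    c = []
    for start, stop in row_ranges(len(a), n_ranks):
        c.extend(local_matmul(a[start:stop], b))  # one rank's result block
    return c

a = [[1, 2], [3, 4], [5, 6], [7, 8]]
b = [[1, 0], [0, 1]]
print(row_ranges(4, 3))          # [(0, 2), (2, 3), (3, 4)]
print(distributed_matmul(a, b))  # [[1, 2], [3, 4], [5, 6], [7, 8]]
```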
But, by way of a follow-up to the OP's original question, if I may:
I realize that @gpuguy's question was about hardware, but isn't it true that he must be running one of the operating systems the NVIDIA CUDA compiler supports (i.e., Linux, Windows, OS X)?
There is no open-source equivalent of CUDA, is there?

CUDA or something similar that is available for Intel graphics cards?

I want to learn GPGPU and CUDA programming, but I know that only NVIDIA cards support CUDA. My laptop has an Intel HD Graphics card, so I need to find out whether it is possible to do GPGPU or something similar with an Intel graphics card. Thanks for any information.
To develop in CUDA your options are:
Use an NVIDIA GPU: all NVIDIA server, desktop and laptop GPUs have supported CUDA since around 2006. Since your laptop does not have one, you could try using one remotely.
Use PGI CUDA x86; it is not free, but it does what you want.
Use gpuocelot to execute the PTX on the CPU; it is an open-source project still in development, so YMMV.
You cannot do GPGPU on Intel HD Graphics cards today, unless you do shader-based programming (which was common practice in the days before CUDA and OpenCL).
In my experience, the PGI x86 stuff seems to have fallen flat, and I'm not aware of anyone using it. Ocelot is another attempt at the same thing, but it is very research-oriented and not fully robust at this point.
The only OpenCL-compliant devices from Intel are the latest CPUs (Sandy Bridge and Ivy Bridge).
What CPU do you have in your system?
To start with, CUDA is NVIDIA-specific. Older CUDA toolkits included a device emulation mode (removed in later releases), so you could use it without a graphics card easily, though it was slow. A faster solution is the x86 implementation (PGI CUDA x86). Any of these will allow you to learn the basics of CUDA without using the GPU at all.
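At its simplest, what an emulator or CPU back-end does is run the kernel body once for every (block, thread) pair. The Python sketch below shows that idea for a 1-D vector addition; it is not how any particular product is implemented, it ignores `__syncthreads()`, shared memory and concurrency, and the `launch` helper is hypothetical.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Serially emulate kernel<<<grid_dim, block_dim>>>(*args) for a 1-D launch."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

def vector_add(block_idx, block_dim, thread_idx, a, b, c, n):
    i = block_idx * block_dim + thread_idx  # blockIdx.x * blockDim.x + threadIdx.x
    if i < n:                               # guard: the last block may be partial
        c[i] = a[i] + b[i]

n = 10
a = list(range(n))
b = [10 * x for x in range(n)]
c = [0] * n
threads = 4
blocks = (n + threads - 1) // threads       # round up, as typical CUDA launch code does
launch(vector_add, blocks, threads, a, b, c, n)
print(c)  # [0, 11, 22, 33, 44, 55, 66, 77, 88, 99]
```

Running threads one after another like this is why such emulation is slow, and why code relying on barriers between threads needs a more elaborate scheme than a plain loop.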
If you want to learn GPGPU in general, you still have the option of learning OpenCL, which is more widely supported (AMD, Intel, NVIDIA, etc.). For example, Intel has an OpenCL SDK (it targets the CPU, but I guess that is irrelevant for you).
After learning the basics of either CUDA or OpenCL, the other will be easy to learn. Neither the syntax nor the semantics are identical, but it is an easy step forward, as the concepts are the same.