CUDA compute capabilities from card name - cuda

Is there a way to discover CUDA sm_xx version by card name?
My specific problem is: I have a CUDA application which requires SM_12 or higher and I have a customer who has a Quadro Q5000. How can I discover whether that card has SM_12 or higher?

The following resources are reasonably accurate:
NVIDIA
wikipedia
Q5000 is a sm_20 device (compute capability 2.0)
Google is your friend.

Related

CUDA driver version is insufficient for runtime version [duplicate]

I have a very simple Toshiba Laptop with i3 processor. Also, I do not have any expensive graphics card. In the display settings, I see Intel(HD) Graphics as display adapter. I am planning to learn some cuda programming. But, I am not sure, if I can do that on my laptop as it does not have any nvidia's cuda enabled GPU.
In fact, I doubt, if I even have a GPU o_o
So, I would appreciate if someone can tell me if I can do CUDA programming with the current configuration and if possible also let me know what does Intel(HD) Graphics mean?
At the present time, Intel graphics chips do not support CUDA. It is possible that, in the nearest future, these chips will support OpenCL (which is a standard that is very similar to CUDA), but this is not guaranteed and their current drivers do not support OpenCL either. (There is an Intel OpenCL SDK available, but, at the present time, it does not give you access to the GPU.)
Newest Intel processors (Sandy Bridge) have a GPU integrated into the CPU core. Your processor may be a previous-generation version, in which case "Intel(HD) graphics" is an independent chip.
Portland group have a commercial product called CUDA x86, it is hybrid compiler which creates CUDA C/ C++ code which can either run on GPU or use SIMD on CPU, this is done fully automated without any intervention for the developer. Hope this helps.
Link: http://www.pgroup.com/products/pgiworkstation.htm
If you're interested in learning a language which supports massive parallelism better go for OpenCL since you don't have an NVIDIA GPU. You can run OpenCL on Intel CPUs, but at best you can learn to program SIMDs.
Optimization on CPU and GPU are different. I really don't think you can use Intel card for GPGPU.
Intel HD Graphics is usually the on-CPU graphics chip in newer Core i3/i5/i7 processors.
As far as I know it doesn't support CUDA (which is a proprietary NVidia technology), but OpenCL is supported by NVidia, ATi and Intel.
in 2020 ZLUDA was created which provides CUDA API for Intel GPUs. It is not production ready yet though.

A cuda wrapper to execute openCL

I'm involved in a project where I have to do gpu programming, one of my constraint is to do it on a nvidia device (thus in CUDA).
But I haven't access to a device equipped with nvidia gpu.
So I would like to know if there is any wrapper that exist which could allow me to write a CUDA code but executed as an openCL code to make it work on an amd gpu ?
ps : gpuocelot could fit well IF I would not have to do it on windows system.
Is the "CUDA" constraint an actual one? Because GPU programming on NVIDIA hardware doesn't necessarily imply CUDA. You have other possible solutions such as:
OpenCL which you mentioned already, which is quite complex and cumbersome to use, but which opens you up plenty of possible back-ends.
Thrust which permits you to target NVIDIA GPUs with a CUDA back-end, or CPUs with an OpenMP and a TBB back-end.
OpenACC with the PGI compiler which permits (AFAIK) to target both NVIDIA and AMD GPUs.
If it were me and the code permitting, I would try to develop using Thrust. But that's up to you.
You could take a look at GPU Ocelot. According to its website:
Ocelot currently allows CUDA programs to be executed on NVIDIA GPUs, AMD GPUs, and x86-CPUs at full speed without recompilation.

Nvidia Jetson TK1 Development Board - Cuda Compute Capability

I have quite impressed with this deployment kit. Instead of buying a new CUDA card, which might require new main board and etc, this card seems provide all in one.
At it's specs it says it has CUDA compute capability 3.2. AFAIK dynamic parallelism and more comes with cm_35, cuda compute capability 3.5. Does this card support Dynamic Parallelism and HyperQ features of Kepler architecture?
Does this card support Dynamic Parallelism and HyperQ features of Kepler architecture?
No.
Jetson has compute capability 3.2. Dynamic parallelism requires compute capability 3.5 or higher. From the documentation:
Dynamic Parallelism is only supported by devices of compute capability 3.5 and higher.
Hyper-Q also requires cc 3.5 or greater. We can deduce this from careful study of the simpleHyperQ sample code, excerpted:
// HyperQ is available in devices of Compute Capability 3.5 and higher
if (deviceProp.major < 3 || (deviceProp.major == 3 && deviceProp.minor < 5))

NVML Power readings with nvmlDeviceGetPowerUsage

I'm running an application using the NVML function nvmlDeviceGetPowerUsage().
The problem is that I always get the same number for different applications I'm running using on a TESLA M2050.
Any suggestions?
If you read the documentation, you'll discover that there are some qualifiers on whether this function is available:
For "GF11x" Tesla ™and Quadro ®products from the Fermi family.
• Requires NVML_INFOROM_POWER version 3.0 or higher.
For Tesla ™and Quadro ®products from the Kepler family.
• Does not require NVML_INFOROM_POWER object.
And:
It is only available if power management mode is supported. See nvmlDeviceGetPowerManagementMode.
I think you'll find that power management mode is not supported on the M2050, and if you run that nvmlDeviceGetPowerManagementMode API call on your M2050 device, you'll get confirmation of that.
The M2050 is niether a Kepler GPU nor is it a GF11x Fermi GPU. It is using the GF100 Fermi GPU, so it is not covered by this API capability (and the GetPowerManagementMode API call would confirm that.)

What does 'compute capability' mean w.r.t. CUDA?

I am new to CUDA programming and don't know much about it. Can you please tell me what does 'CUDA compute capability' mean? When I use the following code on my university server, it showed me the following result.
for (device = 0; device < deviceCount; ++device)
{
cudaDeviceProp deviceProp;
cudaGetDeviceProperties(&deviceProp, device);
printf("\nDevice %d has compute capability %d.%d.\n", device, deviceProp.major, deviceProp.minor);
}
RESULT:
Device 0 has compute capability 4199672.0.
Device 1 has compute capability 4199672.0.
Device 2 has compute capability 4199672.0.
.
.
cudaGetDeviceProperties returns two fields major and minor. Can you please tell me what is this 4199672.0. means?
The compute capability is the "feature set" (both hardware and software features) of the device. You may have heard the NVIDIA GPU architecture names "Tesla", "Fermi" or "Kepler". Each of those architectures have features that previous versions might not have.
In your CUDA toolkit installation folder on your hard drive, look for the file CUDA_C_Programming_Guide.pdf (or google it), and find Appendix F.1. It describes the differences in features between the different compute capabilities.
As #dialer mentioned, the compute capability is your CUDA device's set of computation-related features. As NVidia's CUDA API develops, the 'Compute Capability' number increases. At the time of writing, NVidia's newest GPUs are Compute Capability 3.5. You can get some details of what the differences mean by examining this table on Wikipedia.
As #aland suggests, your call probably failed, and what you're getting is the result of using an uninitialized variable. You should wrap your cudaGetDeviceProps() call with some kind of error checking; see
What is the canonical way to check for errors using the CUDA runtime API?
for a discussion of the options for doing this.