Nvidia Jetson TK1 Development Board - Cuda Compute Capability - cuda

I am quite impressed with this development kit. Instead of buying a new CUDA card, which might also require a new motherboard and so on, this board seems to provide everything in one package.
Its specs say it has CUDA compute capability 3.2. AFAIK, dynamic parallelism and other features arrive with sm_35, i.e. CUDA compute capability 3.5. Does this board support the Dynamic Parallelism and Hyper-Q features of the Kepler architecture?

Does this card support the Dynamic Parallelism and Hyper-Q features of the Kepler architecture?
No.
The Jetson TK1 has compute capability 3.2. Dynamic parallelism requires compute capability 3.5 or higher. From the documentation:
Dynamic Parallelism is only supported by devices of compute capability 3.5 and higher.
Hyper-Q also requires CC 3.5 or greater. We can deduce this from careful study of the simpleHyperQ sample code, excerpted:
// HyperQ is available in devices of Compute Capability 3.5 and higher
if (deviceProp.major < 3 || (deviceProp.major == 3 && deviceProp.minor < 5))
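A minimal sketch of performing the same check at runtime in your own code, mirroring the guard in the simpleHyperQ sample (it assumes device 0; the `ccAtLeast` helper is just an illustrative name):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// True if (major, minor) is at least (reqMajor, reqMinor).
static bool ccAtLeast(int major, int minor, int reqMajor, int reqMinor)
{
    return major > reqMajor || (major == reqMajor && minor >= reqMinor);
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no CUDA device found\n");
        return 1;
    }
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);

    // Dynamic parallelism and Hyper-Q both need CC 3.5 or higher; the
    // Jetson TK1 reports 3.2, so this would print "no" there.
    printf("Dynamic Parallelism / Hyper-Q: %s\n",
           ccAtLeast(prop.major, prop.minor, 3, 5) ? "yes" : "no");
    return 0;
}
```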

Related

How to use cuda stream priority on GTX970?

NVIDIA recently released CUDA 7.5 and announced that this interface can now be used on other video cards, not just Quadro and Tesla. But when I tested on my GTX 970, cudaDeviceGetStreamPriorityRange returns -1 and 0. The GTX 970 has compute capability 5.2, and configurable stream priority should be supported as of CUDA 6.0.
Is it possible to use cuda priority on GTX970?
Yes, it is possible, and the return values from the CUDA runtime API function cudaDeviceGetStreamPriorityRange of 0 (low priority) and -1 (high priority) are correct (refer to slide 70 of the linked presentation; this has not changed for Maxwell GeForce parts). Only 2 priority levels are offered in this case. (That could change in the future, for future GPUs or CUDA versions; that is why the runtime API function is provided.)
You may also be interested in reading the relevant documentation, or in running and studying the relevant CUDA StreamPriorities sample code.
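A minimal sketch of how the two priority levels might be used, assuming a device that supports stream priorities (lower numbers mean higher priority, so `greatest` here is -1 on the GTX 970):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Number of distinct priority levels in an inclusive [greatest, least] range
// where lower numbers are higher priority (illustrative helper).
static int numPriorityLevels(int least, int greatest)
{
    return least - greatest + 1;
}

int main()
{
    int least = 0, greatest = 0;
    cudaDeviceGetStreamPriorityRange(&least, &greatest);
    printf("least (low) priority: %d, greatest (high) priority: %d -> %d levels\n",
           least, greatest, numPriorityLevels(least, greatest));

    // Create one stream at each end of the range.
    cudaStream_t lowStream, highStream;
    cudaStreamCreateWithPriority(&lowStream,  cudaStreamNonBlocking, least);
    cudaStreamCreateWithPriority(&highStream, cudaStreamNonBlocking, greatest);

    // ... launch kernels into lowStream / highStream as usual ...

    cudaStreamDestroy(lowStream);
    cudaStreamDestroy(highStream);
    return 0;
}
```

Kernels launched into highStream can preempt work queued in lowStream at the block granularity, which is why only a small number of levels is needed in practice.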

CUDA occupancy table for Maxwell architecture and compute capability 5

I am trying to use the CUDA occupancy calculator for a Maxwell GPU with compute capability 5.x. Unfortunately, the compute-capability drop-down in the occupancy table only goes up to 3.5.
Do you know where I can find an updated version, or what I can do to calculate occupancy for compute capability 5?
Thanks
The CUDA Occupancy Calculator shipped with CUDA 7 is ready for CC 5.x.
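As an alternative to the spreadsheet, the runtime occupancy API (available since CUDA 6.5) can compute the same figure programmatically and already covers CC 5.x devices. A sketch, where `myKernel` is a hypothetical kernel standing in for your own:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void myKernel(float *out)   // placeholder kernel for illustration
{
    out[threadIdx.x] = (float)threadIdx.x;
}

// Fraction of an SM's thread capacity used (illustrative helper).
static float occupancyFrac(int blocksPerSM, int blockSize, int maxThreadsPerSM)
{
    return (float)(blocksPerSM * blockSize) / (float)maxThreadsPerSM;
}

int main()
{
    const int blockSize = 256;
    int blocksPerSM = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, myKernel,
                                                  blockSize, 0 /* dyn. smem */);

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no CUDA device found\n");
        return 1;
    }
    printf("active blocks per SM: %d, occupancy: %.0f%%\n", blocksPerSM,
           100.0f * occupancyFrac(blocksPerSM, blockSize,
                                  prop.maxThreadsPerMultiProcessor));
    return 0;
}
```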

CUDA compute capabilities from card name

Is there a way to discover CUDA sm_xx version by card name?
My specific problem is: I have a CUDA application which requires sm_12 or higher, and I have a customer who has a Quadro Q5000. How can I discover whether that card has sm_12 or higher?
The following resources are reasonably accurate:
NVIDIA
wikipedia
The Q5000 is an sm_20 device (compute capability 2.0).
Google is your friend.
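If you can run code on the customer's machine, you don't need the tables at all: cudaGetDeviceProperties reports the sm version directly (the deviceQuery CUDA sample does the same thing). A sketch, with `meetsSm` as an illustrative helper encoding the "sm_12 or higher" test:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// True if compute capability major.minor is at least the given sm_XY value,
// e.g. meetsSm(2, 0, 12) checks CC 2.0 against sm_12.
static bool meetsSm(int major, int minor, int req)
{
    return major * 10 + minor >= req;
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("no CUDA devices found\n");
        return 0;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s -> sm_%d%d%s\n", i, prop.name,
               prop.major, prop.minor,
               meetsSm(prop.major, prop.minor, 12) ? " (meets sm_12)" : "");
    }
    return 0;
}
```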

NVML Power readings with nvmlDeviceGetPowerUsage

I'm running an application using the NVML function nvmlDeviceGetPowerUsage().
The problem is that I always get the same number for different applications I run on a Tesla M2050.
Any suggestions?
If you read the documentation, you'll discover that there are some qualifiers on whether this function is available:
For "GF11x" Tesla™ and Quadro® products from the Fermi family:
• Requires NVML_INFOROM_POWER version 3.0 or higher.
For Tesla™ and Quadro® products from the Kepler family:
• Does not require the NVML_INFOROM_POWER object.
And:
It is only available if power management mode is supported. See nvmlDeviceGetPowerManagementMode.
I think you'll find that power management mode is not supported on the M2050, and if you run that nvmlDeviceGetPowerManagementMode API call on your M2050 device, you'll get confirmation of that.
The M2050 is neither a Kepler GPU nor a GF11x Fermi GPU. It uses the GF100 Fermi GPU, so it is not covered by this API capability (and the nvmlDeviceGetPowerManagementMode API call would confirm that).

What are the differences between CUDA compute capabilities?

What does compute capability 2.0 add over 1.3, 2.1 over 2.0, and 3.0 over 2.1?
The compute capabilities designate different architectures. In general, newer architectures run both CUDA programs and graphics faster than previous architectures. Note, though, that a high-end card from a previous generation may be faster than a lower-end card from the following generation.
From the CUDA C Programming Guide (v6.0):