I am trying to use the CUDA Occupancy Calculator for a Maxwell GPU with compute capability 5.0. Unfortunately, the compute capability drop-down in the occupancy spreadsheet only goes up to 3.5.
Does anyone know where I can find an updated version, or how I can calculate occupancy for compute capability 5.0?
Thanks
The CUDA Occupancy Calculator that ships with CUDA 7 supports CC 5.x.
Recently I have been reading the book 'Programming Massively Parallel Processors'. One of the exercises in chapter 3 asks me to determine which assignments to an SM are possible. The problem is as follows:
Indicate which of the following assignments per multiprocessor is possible:
8 blocks with 128 threads each on a device with compute capability 1.0.
8 blocks with 128 threads each on a device with compute capability 1.2.
8 blocks with 128 threads each on a device with compute capability 3.0.
16 blocks with 64 threads each on a device with compute capability 1.0.
16 blocks with 64 threads each on a device with compute capability 1.2.
16 blocks with 64 threads each on a device with compute capability 3.0.
From the most recent CUDA programming guide, I can only find the specification for compute capability 3.0, which allows up to 16 blocks and 2048 threads per SM, and up to 1024 threads per block. Unfortunately, I did not find any information about compute capability 1.0.
Can anyone tell me where to find the block specification for compute capability 1.0? Thank you very much.
See the CUDA page on Wikipedia; its "Compute capability (version)" section has a valid specification for all devices.
(The relevant per-SM limits from that table: 8 resident blocks and 768 resident threads for CC 1.0–1.1, 8 blocks and 1024 threads for CC 1.2–1.3, and 16 blocks and 2048 threads for CC 3.0.)
My computer has a GeForce GTX 960M, which NVIDIA says has 640 CUDA cores. However, when I call clGetDeviceInfo to query the number of compute units, it reports 5. It sounds like CUDA cores are somewhat different from what OpenCL considers compute units? Or maybe a group of CUDA cores forms an OpenCL compute unit? Can you explain this to me?
What is the relationship between NVIDIA GPUs' CUDA cores and OpenCL computing units?
Your GTX 960M is a Maxwell device with 5 Streaming Multiprocessors, each with 128 CUDA cores, for a total of 640 CUDA cores.
The NVIDIA Streaming Multiprocessor is equivalent to an OpenCL Compute Unit. The previously linked answer will also give you some useful information that may help with your kernel sizing question in the comments.
The CUDA architecture is a close match to the OpenCL architecture.
A CUDA device is built around a scalable array of multithreaded Streaming Multiprocessors (SMs). A multiprocessor corresponds to an OpenCL compute unit.
A multiprocessor executes a CUDA thread for each OpenCL work-item and a thread block for each OpenCL work-group. A kernel is executed over an OpenCL NDRange by a grid of thread blocks. As illustrated in Figure 2-1, each of the thread blocks that execute a kernel is therefore uniquely identified by its work-group ID, and each thread by its global ID or by a combination of its local ID and work-group ID.
Copied from OpenCL Programming Guide for the CUDA Architecture http://www.nvidia.com/content/cudazone/download/OpenCL/NVIDIA_OpenCL_ProgrammingGuide.pdf
I am quite impressed with this development kit. Instead of buying a new CUDA card, which might also require a new motherboard and so on, this board seems to provide everything in one.
Its specs say it has CUDA compute capability 3.2. AFAIK, dynamic parallelism and more come with sm_35, i.e. CUDA compute capability 3.5. Does this card support the Dynamic Parallelism and HyperQ features of the Kepler architecture?
Does this card support Dynamic Parallelism and HyperQ features of Kepler architecture?
No.
Jetson has compute capability 3.2. Dynamic parallelism requires compute capability 3.5 or higher. From the documentation:
Dynamic Parallelism is only supported by devices of compute capability 3.5 and higher.
Hyper-Q also requires cc 3.5 or greater. We can deduce this from careful study of the simpleHyperQ sample code, excerpted:
// HyperQ is available in devices of Compute Capability 3.5 and higher
if (deviceProp.major < 3 || (deviceProp.major == 3 && deviceProp.minor < 5))
Is there a way to discover CUDA sm_xx version by card name?
My specific problem is: I have a CUDA application which requires SM_12 or higher and I have a customer who has a Quadro Q5000. How can I discover whether that card has SM_12 or higher?
The following resources are reasonably accurate:
NVIDIA
Wikipedia
Q5000 is a sm_20 device (compute capability 2.0)
Google is your friend.
What is the maximum number of concurrent kernels possible on NVIDIA devices of compute capability 3.0? I hope it's not the same as for compute capability 2.0.
From the CUDA C Programming Guide, version 4.2, section 3.2.5.3 (Concurrent Kernel Execution):
The maximum number of kernel launches that a device can execute concurrently is sixteen.
Later versions of the guide tabulate this limit per compute capability: it is still 16 for CC 3.0; the increase to 32 concurrent kernels came with CC 3.5.