Does Tesla K80 GPU support int8 - deep-learning

Hi, I am trying to deploy int8-quantized DL models on a Tesla K80 GPU. My question is: does it support int8, and is it a good choice for this?

The Tesla K80 has no hardware support for int8 calculations.
int8 hardware support in CUDA GPUs was first introduced in the Pascal generation (via the DP4A instruction, compute capability 6.1).
Later, int8 support was added to the Tensor Core units introduced in the Turing generation.
Recent CUDA architecture generations, in chronological order, are:
Fermi, Kepler, Maxwell, Pascal, Volta, Turing, Ampere, Hopper/Ada Lovelace
The K80 is a member of the Kepler generation.
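A quick way to check whether a given device could benefit from int8 kernels is to test its compute capability against the 6.1 threshold where the DP4A int8 instruction appeared. This is a plain-Python sketch of that check, not an NVIDIA API; you would obtain the (major, minor) pair from your CUDA bindings of choice:

```python
def has_int8_dp4a(major: int, minor: int) -> bool:
    """int8 DP4A dot-product support arrived at compute capability 6.1 (Pascal)."""
    return (major, minor) >= (6, 1)

print(has_int8_dp4a(3, 7))  # Tesla K80 is Kepler, cc 3.7 -> False
print(has_int8_dp4a(6, 1))  # e.g. a Pascal-class cc 6.1 part -> True
```

Since the K80 fails this test, int8-quantized models would have to fall back to slower emulated paths at best.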

Related

Does the nVidia Titan V support GPUDirect?

I was wondering if someone might be able to help me figure out whether the new Titan V from NVIDIA supports GPUDirect. As far as I can tell, it seems limited to Tesla and Quadro cards.
Thank you for taking the time to read this.
GPUDirect Peer-to-Peer (P2P) is supported between any 2 "like" CUDA GPUs (of compute capability 2.0 or higher), if the system topology supports it, and subject to other requirements and restrictions. In a nutshell, the system topology requirement is that both GPUs participating must be enumerated under the same PCIE root complex. If in doubt, "like" means identical. Other combinations may be supported (e.g. 2 GPUs of the same compute capability) but this is not specified, or advertised as supported. If in doubt, try it out. Finally, these things must be "discoverable" by the GPU driver. If the GPU driver cannot ascertain these facts, and/or the system is not part of a whitelist maintained in the driver, then P2P support will not be possible.
Note that in general, P2P support may vary by GPU or GPU family. The ability to run P2P on one GPU type or GPU family does not necessarily indicate it will work on another GPU type or family, even in the same system/setup. The final determinant of GPU P2P support is querying the runtime via cudaDeviceCanAccessPeer. So the statement here "is supported" should not be construed to refer to a particular GPU type. P2P support can vary by system and other factors as well. No statements made here are a guarantee of P2P support for any particular GPU in any particular setup.
GPUDirect RDMA is only supported on Tesla and possibly some Quadro GPUs.
So, if you had a system that had 2 Titan V GPUs plugged into PCIE slots that were connected to the same root complex (usually, except in Skylake CPUs, it should be sufficient to say "connected to the same CPU socket"), and the system (i.e. core logic) was recognized by the GPU driver, I would expect P2P to work between those 2 GPUs.
I would not expect GPUDirect RDMA to work to a Titan V, under any circumstances.
YMMV. If in doubt, try it out, before making any large purchasing decisions.
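The checklist above can be condensed into a small decision function. This is a pure-Python illustration of the qualitative rules described (like GPUs, same PCIe root complex, chipset recognized by the driver), not an NVIDIA API; the authoritative answer always comes from cudaDeviceCanAccessPeer at runtime:

```python
def p2p_expected(gpu_a: str, gpu_b: str,
                 same_root_complex: bool,
                 chipset_whitelisted: bool) -> bool:
    """Mirror the rules above: 'like' GPUs (if in doubt, identical),
    both enumerated under the same PCIe root complex, and core logic
    the GPU driver recognizes."""
    like_gpus = (gpu_a == gpu_b)
    return like_gpus and same_root_complex and chipset_whitelisted

# Two Titan V cards on the same CPU socket, recognized core logic:
print(p2p_expected("Titan V", "Titan V", True, True))   # True
# Mixed GPUs are not specified/advertised as supported:
print(p2p_expected("Titan V", "GTX 1080", True, True))  # False
```

Note this only predicts whether P2P is plausible; a True here still has to be confirmed by the driver.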

What is the difference and relation among 'cuda' 'cudnn' 'cunn' and 'cutorch' in torch?

I see many torch codes use:
require 'cudnn'
require 'cunn'
require 'cutorch'
What are these packages used for? What is their relation to CUDA?
All 3 are used for CUDA GPU implementations for torch7.
cutorch is the CUDA backend for torch7, providing CUDA support in torch such as CudaTensor for tensors in GPU memory, plus some helpful features for interacting with the GPU.
cunn provides GPU counterparts of the modules in the nn library, transparently converting nn modules to CUDA versions. This makes it easy to move neural networks to the GPU and back via the :cuda() call!
cuDNN is a wrapper of NVIDIA's cuDNN library, which is an optimized library for CUDA containing various fast GPU implementations, such as for convolutional networks and RNN modules.
I'm not sure what cutorch is, but from my understanding:
CUDA: library to use GPUs.
cuDNN: library to do neural-net work on GPUs (it uses CUDA to talk to the GPUs).
source: https://www.quora.com/What-is-CUDA-and-cuDNN
Cuda is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.
And cuDNN is the CUDA Deep Neural Network library, a GPU-accelerated library of deep-learning primitives built on top of the underlying CUDA framework.

What is the relationship between NVIDIA GPUs' CUDA cores and OpenCL computing units?

My computer has a GeForce GTX 960M, which NVIDIA claims has 640 CUDA cores. However, when I run clGetDeviceInfo to query the number of compute units, it reports 5. It sounds like CUDA cores are somewhat different from what OpenCL considers compute units? Or maybe a group of CUDA cores forms an OpenCL compute unit? Can you explain this to me?
What is the relationship between NVIDIA GPUs' CUDA cores and OpenCL computing units?
Your GTX 960M is a Maxwell device with 5 Streaming Multiprocessors, each with 128 CUDA cores, for a total of 640 CUDA cores.
The NVIDIA Streaming Multiprocessor is equivalent to an OpenCL Compute Unit. The previously linked answer will also give you some useful information that may help with your kernel sizing question in the comments.
The CUDA architecture is a close match to the OpenCL architecture.
A CUDA device is built around a scalable array of multithreaded Streaming Multiprocessors (SMs). A multiprocessor corresponds to an OpenCL compute unit.
A multiprocessor executes a CUDA thread for each OpenCL work-item and a thread block for each OpenCL work-group. A kernel is executed over an OpenCL NDRange by a grid of thread blocks. As illustrated in Figure 2-1, each of the thread blocks that execute a kernel is therefore uniquely identified by its work-group ID, and each thread by its global ID or by a combination of its local ID and work-group ID.
Copied from the OpenCL Programming Guide for the CUDA Architecture: http://www.nvidia.com/content/cudazone/download/OpenCL/NVIDIA_OpenCL_ProgrammingGuide.pdf
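The SM-to-compute-unit correspondence makes the arithmetic easy to check. This sketch uses an illustrative table of CUDA cores per SM for a few common generations (these counts are not queryable through OpenCL; the Pascal value is for GP104-class parts, while GP100 uses 64):

```python
# Cores per SM differ by architecture generation (illustrative values).
CUDA_CORES_PER_SM = {
    "Kepler":  192,
    "Maxwell": 128,
    "Pascal":  128,  # GP104-class; GP100 uses 64
}

def total_cuda_cores(arch: str, num_compute_units: int) -> int:
    """An OpenCL compute unit corresponds to one SM, so the total is
    compute units * CUDA cores per SM for that generation."""
    return num_compute_units * CUDA_CORES_PER_SM[arch]

# GTX 960M: Maxwell, 5 compute units reported by clGetDeviceInfo
print(total_cuda_cores("Maxwell", 5))  # 640
```

So the 5 compute units and the advertised 640 CUDA cores describe the same hardware at two different granularities.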

Nvidia Jetson TK1 Development Board - Cuda Compute Capability

I am quite impressed with this development kit. Instead of buying a new CUDA card, which might require a new motherboard and so on, this board seems to provide everything in one package.
Its specs say it has CUDA compute capability 3.2. AFAIK dynamic parallelism and other features come with sm_35, CUDA compute capability 3.5. Does this card support the Dynamic Parallelism and Hyper-Q features of the Kepler architecture?
Does this card support Dynamic Parallelism and HyperQ features of Kepler architecture?
No.
The Jetson TK1 has compute capability 3.2. Dynamic parallelism requires compute capability 3.5 or higher. From the documentation:
Dynamic Parallelism is only supported by devices of compute capability 3.5 and higher.
Hyper-Q also requires cc 3.5 or greater. We can deduce this from careful study of the simpleHyperQ sample code, excerpted:
// HyperQ is available in devices of Compute Capability 3.5 and higher
if (deviceProp.major < 3 || (deviceProp.major == 3 && deviceProp.minor < 5))
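The same version test can be expressed as a small helper. This is a Python sketch of the check in the CUDA sample above, not an NVIDIA API; (major, minor) is the device's compute capability:

```python
def meets_compute_capability(major: int, minor: int,
                             req_major: int = 3, req_minor: int = 5) -> bool:
    """True if compute capability (major, minor) meets the minimum
    required for dynamic parallelism / Hyper-Q (cc 3.5 by default)."""
    return (major, minor) >= (req_major, req_minor)

print(meets_compute_capability(3, 2))  # Jetson TK1, cc 3.2 -> False
print(meets_compute_capability(3, 5))  # a cc 3.5 Kepler part -> True
```

Python's tuple comparison does exactly what the C condition in the sample does: compare the major version first, then the minor version on a tie.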

NVML Power readings with nvmlDeviceGetPowerUsage

I'm running an application using the NVML function nvmlDeviceGetPowerUsage().
The problem is that I always get the same number for different applications I run on a Tesla M2050.
Any suggestions?
If you read the documentation, you'll discover that there are some qualifiers on whether this function is available:
For "GF11x" Tesla and Quadro products from the Fermi family:
• Requires NVML_INFOROM_POWER version 3.0 or higher.
For Tesla and Quadro products from the Kepler family:
• Does not require the NVML_INFOROM_POWER object.
And:
It is only available if power management mode is supported. See nvmlDeviceGetPowerManagementMode.
I think you'll find that power management mode is not supported on the M2050, and if you run that nvmlDeviceGetPowerManagementMode API call on your M2050 device, you'll get confirmation of that.
The M2050 is neither a Kepler GPU nor a GF11x Fermi GPU. It uses the GF100 Fermi chip, so it is not covered by this API capability (and the nvmlDeviceGetPowerManagementMode call would confirm that).
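The documented qualifiers can be summarized in a small helper. This is a pure-Python illustration of the conditions quoted above, not an NVML call; the authoritative check is nvmlDeviceGetPowerManagementMode on the actual device (e.g. through the pynvml bindings):

```python
def power_usage_expected(family: str, chip: str,
                         inforom_power_version: float = 0.0) -> bool:
    """Mirror the documented qualifiers for nvmlDeviceGetPowerUsage:
    Kepler Tesla/Quadro parts qualify unconditionally; Fermi parts
    qualify only for GF11x chips with NVML_INFOROM_POWER >= 3.0."""
    if family == "Kepler":
        return True
    if family == "Fermi":
        return chip.startswith("GF11") and inforom_power_version >= 3.0
    return False

# Tesla M2050: Fermi family, GF100 chip -> readings not supported
print(power_usage_expected("Fermi", "GF100"))  # False
print(power_usage_expected("Kepler", "GK110"))  # True
```

This matches the behavior observed in the question: on an unsupported part the API does not report meaningful per-application power numbers.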