How to enable RT cores in a CUDA kernel?

From reading the Wikipedia page, I understand that GeForce RTX models have RT cores and Tensor cores.
My question is: which cores are used by my CUDA code? Do I have control over that?
I have a ray tracing kernel and want to benefit from the RT cores.
Are the RT cores used by default, or is there a flag to enable them?

Currently, the RT core functionality is not exposed via any CUDA API, and CUDA device code cannot access any RT core hardware.
If you have an application that you would like to use with RT cores, the way to do that is with OptiX. It's possible for an OptiX application to interoperate with CUDA.

Related

A CUDA wrapper to execute OpenCL

I'm involved in a project where I have to do GPU programming, and one of my constraints is that it must run on an NVIDIA device (thus in CUDA).
But I don't have access to a machine with an NVIDIA GPU.
So I would like to know whether any wrapper exists that would let me write CUDA code but have it executed as OpenCL code, so that it works on an AMD GPU.
PS: gpuocelot would fit well if I did not have to do this on a Windows system.
Is the "CUDA" constraint a real one? GPU programming on NVIDIA hardware doesn't necessarily imply CUDA. You have other options, such as:
OpenCL, which you already mentioned. It is fairly complex and cumbersome to use, but it gives you plenty of possible back-ends.
Thrust, which lets you target NVIDIA GPUs with a CUDA back-end, or CPUs with OpenMP and TBB back-ends.
OpenACC with the PGI compiler, which (AFAIK) can target both NVIDIA and AMD GPUs.
If it were me, and the code permitted, I would try to develop using Thrust. But that's up to you.
You could take a look at GPU Ocelot. According to its website:
Ocelot currently allows CUDA programs to be executed on NVIDIA GPUs, AMD GPUs, and x86-CPUs at full speed without recompilation.

How to enable/disable a specific graphic card?

I'm working on a Fujitsu machine. It has 2 GPUs installed: a Quadro 2000 and a Tesla C2075. The Quadro has 1 GB of RAM and the Tesla has 5 GB (I checked using the output of nvidia-smi -q). When I run nvidia-smi, the output shows 2 GPUs, but the Tesla's display is shown as off.
I'm running a memory-intensive program and would like to use the 5 GB of RAM available, but whenever I run a program, it seems to be using the Quadro GPU.
Is there some way to use a particular GPU out of the 2 in a program? Does the Tesla GPU being "disabled" mean its drivers are not installed?
You can control access to CUDA GPUs either using the environment or programmatically.
You can use the environment variable CUDA_VISIBLE_DEVICES to specify a list of one or more GPUs that will be visible to any application, as well as their order of visibility. For example, if nvidia-smi reports your Tesla GPU as GPU 1 (and your Quadro as GPU 0), then you can set CUDA_VISIBLE_DEVICES=1 so that only the Tesla can be used by CUDA code.
See my blog post on the subject.
To control which GPU your application uses programmatically, you should use the device management API of CUDA. Query the number of devices using cudaGetDeviceCount, then query each device's properties using cudaGetDeviceProperties, select the device that fits your application's criteria, and make it current with cudaSetDevice. You can also use cudaChooseDevice to select the device that most closely matches the device properties you specify.
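As a rough illustration of the programmatic route, here is a minimal sketch that picks the visible device with the most global memory (which would be the Tesla in the question's setup) and makes it current. The helper name pickDeviceWithMostMemory is made up for this example, and "most memory" is only one possible selection criterion.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the index of the visible CUDA device with the
// most global memory, or -1 if no device could be queried.
int pickDeviceWithMostMemory() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;

    int best = -1;
    size_t bestBytes = 0;
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess) continue;
        std::printf("device %d: %s, %zu MiB, compute capability %d.%d\n",
                    i, prop.name, prop.totalGlobalMem >> 20,
                    prop.major, prop.minor);
        if (best < 0 || prop.totalGlobalMem > bestBytes) {
            best = i;
            bestBytes = prop.totalGlobalMem;
        }
    }
    return best;
}

int main() {
    int dev = pickDeviceWithMostMemory();
    if (dev < 0) {
        std::fprintf(stderr, "no usable CUDA device found\n");
        return 1;
    }
    // Subsequent allocations and kernel launches on this host thread
    // will target the selected device.
    cudaError_t err = cudaSetDevice(dev);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaSetDevice: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("using device %d\n", dev);
    return 0;
}
```

The indices seen here are the ones left after CUDA_VISIBLE_DEVICES has been applied, and the CUDA runtime's default ordering is not guaranteed to match the numbering shown by nvidia-smi, so it is safer to select by property than by a hard-coded index.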

NVML Power readings with nvmlDeviceGetPowerUsage

I'm running an application using the NVML function nvmlDeviceGetPowerUsage().
The problem is that I always get the same number for the different applications I'm running on a Tesla M2050.
Any suggestions?
If you read the documentation, you'll discover that there are some qualifiers on whether this function is available:
For "GF11x" Tesla™ and Quadro® products from the Fermi family.
• Requires NVML_INFOROM_POWER version 3.0 or higher.
For Tesla™ and Quadro® products from the Kepler family.
• Does not require NVML_INFOROM_POWER object.
And:
It is only available if power management mode is supported. See nvmlDeviceGetPowerManagementMode.
I think you'll find that power management mode is not supported on the M2050, and if you run that nvmlDeviceGetPowerManagementMode API call on your M2050 device, you'll get confirmation of that.
The M2050 is neither a Kepler GPU nor a GF11x Fermi GPU. It uses the GF100 Fermi GPU, so it is not covered by this API capability (and the nvmlDeviceGetPowerManagementMode call would confirm that).
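To check this on a given machine, a small host-only sketch along these lines queries the power management mode first and only then reads the power draw. The helper name reportPower and the choice of device index 0 are assumptions for illustration; it should build with something like nvcc (or any C++ compiler) linked against -lnvidia-ml.

```cuda
#include <cstdio>
#include <nvml.h>

// Hypothetical helper: reports whether power readings are available for the
// NVML device at `index`, and prints the reading if so.
// Returns 0 if a power reading was obtained, 1 otherwise.
int reportPower(unsigned int index) {
    nvmlReturn_t rc = nvmlInit();
    if (rc != NVML_SUCCESS) {
        std::fprintf(stderr, "nvmlInit: %s\n", nvmlErrorString(rc));
        return 1;
    }

    int status = 1;
    nvmlDevice_t dev;
    rc = nvmlDeviceGetHandleByIndex(index, &dev);
    if (rc != NVML_SUCCESS) {
        std::fprintf(stderr, "get handle: %s\n", nvmlErrorString(rc));
    } else {
        nvmlEnableState_t mode;
        rc = nvmlDeviceGetPowerManagementMode(dev, &mode);
        if (rc != NVML_SUCCESS) {
            // On unsupported boards this is where the failure is expected to show up.
            std::printf("power management mode query failed: %s\n",
                        nvmlErrorString(rc));
        } else if (mode != NVML_FEATURE_ENABLED) {
            std::printf("power management is not enabled on this device\n");
        } else {
            unsigned int milliwatts = 0;
            rc = nvmlDeviceGetPowerUsage(dev, &milliwatts);
            if (rc == NVML_SUCCESS) {
                std::printf("power draw: %u mW\n", milliwatts);
                status = 0;
            } else {
                std::printf("power usage query failed: %s\n",
                            nvmlErrorString(rc));
            }
        }
    }

    nvmlShutdown();
    return status;
}

int main() {
    return reportPower(0);  // device index 0 is an assumption
}
```

Checking the return code of every NVML call matters here: a reading that never changes is a hint that the value should not be trusted, and the error string usually says why.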

CUDA How to launch a new kernel call in one kernel function?

I am new to CUDA programming. I have the following problem: I am trying to use CUDA to process a set of datasets in parallel, and for each dataset some matrix calculations need to be done.
My design is like this:
1. Launch N threads, one per dataset, as the datasets are independent of each other and the method to handle them is the same.
2. In each thread from step 1, call a new function that also works like a kernel, since it is a matrix calculation, e.g. launch M threads to handle the matrix calculation in parallel.
Does anyone know whether it is possible or not?
You can launch a kernel from a thread in another kernel if you use CUDA dynamic parallelism and your GPU supports it. CUDA dynamic parallelism requires a GPU of compute capability 3.5 or higher.
You can discover the compute capability of your device with the CUDA deviceQuery sample.
You can learn more about how to use CUDA dynamic parallelism from the corresponding section of the CUDA programming guide.
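A minimal sketch of the parent/child pattern described in the question might look like the following. The kernel names, the sizes N = 4 and M = 256, and the scale-by-two "matrix work" are placeholders chosen for illustration. Dynamic parallelism needs relocatable device code, so the build would be something like nvcc -rdc=true with an -arch flag matching your GPU (3.5 or higher) and -lcudadevrt.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Child kernel: M threads work on one dataset (a stand-in for the matrix math).
__global__ void childKernel(float* data, int m, float factor) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j < m) data[j] *= factor;
}

// Parent kernel: one thread per dataset; each thread launches its own child grid.
__global__ void parentKernel(float* data, int n, int m) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int threads = 128;
        int blocks = (m + threads - 1) / threads;
        childKernel<<<blocks, threads>>>(data + i * m, m, 2.0f);
    }
}

// Runs the demo and returns true if every element was doubled.
bool runDemo() {
    const int n = 4, m = 256;  // illustrative sizes
    std::vector<float> host(n * m, 1.0f);

    float* dev = nullptr;
    if (cudaMalloc(&dev, host.size() * sizeof(float)) != cudaSuccess) return false;
    cudaMemcpy(dev, host.data(), host.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    parentKernel<<<1, n>>>(dev, n, m);
    // The parent grid is not complete until its child grids have finished,
    // so one host-side synchronize covers both levels.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(dev);
        return false;
    }

    cudaMemcpy(host.data(), dev, host.size() * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (float v : host) {
        if (v != 2.0f) return false;
    }
    return true;
}

int main() {
    bool ok = runDemo();
    std::printf("%s\n", ok ? "all elements doubled" : "demo failed");
    return ok ? 0 : 1;
}
```

Whether this design is faster than a single flat launch over N x M threads depends on the workload; for small, uniform datasets a flat launch is often simpler and worth trying first.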

OpenCL and CUDA kernels on same GPU

I'm new to this technology. I have an application that consists of an OpenCL kernel and a CUDA kernel. I want to execute the OpenCL kernel and the CUDA kernel one after another on the same GPU (Tesla M2050). Is that possible?
If it is possible, do we need to take care of any memory management?
Thanks in advance.
Yes, it is possible to run OpenCL kernels and CUDA kernels from the same application. Each has its own scheduler. Memory management is taken care of by the GPU driver.