How to use cuda stream priority on GTX970? - cuda

Recently NVIDIA released CUDA 7.5 and announced that stream priorities should be usable on other video cards, not just Quadro and Tesla. But as I tested on my GTX 970, cudaDeviceGetStreamPriorityRange returns -1 and 0. The compute capability of the GTX 970 is 5.2, well above the 3.5 minimum, so it should support configurable stream priority.

Is it possible to use cuda priority on GTX970?
Yes, it is possible, and the return values from the CUDA runtime API function cudaDeviceGetStreamPriorityRange of 0 ("LOW PRIORITY") and -1 ("HIGH PRIORITY") are correct (refer to slide 70; this has not changed for Maxwell GeForce). Only two priority levels are offered in this case. (That could change in the future, for future GPUs or CUDA versions, which is why the runtime API function is provided.)
You may also be interested in reading the relevant documentation, or in running and studying the StreamPriorities CUDA sample code.
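The two levels the API reports can be fed straight into stream creation; a minimal host-side sketch using cudaDeviceGetStreamPriorityRange and cudaStreamCreateWithPriority (error checking omitted for brevity):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Query the supported range; numerically lower values mean higher priority.
    int leastPriority, greatestPriority;
    cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);
    printf("least (low) = %d, greatest (high) = %d\n",
           leastPriority, greatestPriority);   // e.g. 0 and -1 on a GTX 970

    // Create one stream at each end of the range.
    cudaStream_t lowStream, highStream;
    cudaStreamCreateWithPriority(&lowStream,  cudaStreamNonBlocking, leastPriority);
    cudaStreamCreateWithPriority(&highStream, cudaStreamNonBlocking, greatestPriority);

    // ... launch kernels into lowStream / highStream as usual ...

    cudaStreamDestroy(lowStream);
    cudaStreamDestroy(highStream);
    return 0;
}
```

Because the range is queried at runtime, the same code keeps working if a future GPU or CUDA version offers more than two levels.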

Related

NVML Power readings with nvmlDeviceGetPowerUsage

I'm running an application using the NVML function nvmlDeviceGetPowerUsage().
The problem is that I always get the same number for different applications I'm running on a Tesla M2050.
Any suggestions?
If you read the documentation, you'll discover that there are some qualifiers on whether this function is available:
For "GF11x" Tesla™ and Quadro® products from the Fermi family:
• Requires NVML_INFOROM_POWER version 3.0 or higher.
For Tesla™ and Quadro® products from the Kepler family:
• Does not require the NVML_INFOROM_POWER object.
And:
It is only available if power management mode is supported. See nvmlDeviceGetPowerManagementMode.
I think you'll find that power management mode is not supported on the M2050, and if you run that nvmlDeviceGetPowerManagementMode API call on your M2050 device, you'll get confirmation of that.
The M2050 is neither a Kepler GPU nor a GF11x Fermi GPU. It uses the GF100 Fermi GPU, so it is not covered by this API capability (and the nvmlDeviceGetPowerManagementMode call would confirm that).
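A quick way to confirm this is to check the power management mode before trying to read power; a minimal sketch against the NVML C API (link with -lnvidia-ml; error handling abbreviated):

```c
#include <stdio.h>
#include <nvml.h>

int main(void) {
    nvmlDevice_t dev;
    nvmlEnableState_t mode;
    unsigned int milliwatts;

    nvmlInit();
    nvmlDeviceGetHandleByIndex(0, &dev);

    // On a GF100-based board such as the M2050, this is expected to
    // report that power management is unsupported or disabled.
    if (nvmlDeviceGetPowerManagementMode(dev, &mode) == NVML_SUCCESS &&
        mode == NVML_FEATURE_ENABLED) {
        nvmlDeviceGetPowerUsage(dev, &milliwatts);   // reading is in milliwatts
        printf("power draw: %u mW\n", milliwatts);
    } else {
        printf("power readings not supported on this device\n");
    }

    nvmlShutdown();
    return 0;
}
```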

CUDA 5 compatible with CUDA 4

CUDA 5 was released recently and I have been using CUDA 4 until now. So I was wondering whether the code I wrote in CUDA 4 will still run, if I install CUDA 5?
Is it completely compatible or partially? Will open source projects like gpuocelot, that require CUDA 4, work with CUDA 5 too?
Thanks
There is not 100% compatibility between CUDA 4 and CUDA 5. To choose just one example: in CUDA 5 it is no longer permissible to use a character string to identify a device symbol, which was possible with certain API functions in CUDA 4. Instead, the symbol must be used directly.
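The symbol-addressing change can be sketched as follows (d_scale is a hypothetical symbol name chosen for illustration):

```cuda
#include <cuda_runtime.h>

__device__ float d_scale;   // device symbol (hypothetical example)

void set_scale(float h_scale) {
    // CUDA 4 accepted a string naming the symbol; CUDA 5 rejects this:
    //   cudaMemcpyToSymbol("d_scale", &h_scale, sizeof(float));

    // CUDA 5 and later: pass the symbol itself.
    cudaMemcpyToSymbol(d_scale, &h_scale, sizeof(float));
}
```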
It's also been pointed out that the structure of the sample codes has changed significantly, which may impact your code if you are using elements of the sample codes. However this isn't a true compatibility issue, in my opinion.
It's likely that the changes required to move code from CUDA 4 to CUDA 5 will be minor, if any.
Emulators often depend on unpublished characteristics of the CUDA runtime, and frequently only work with specific CUDA versions. Check the emulator of your choice for any statements made about required runtime.

OpenCL dynamic parallelism / GPU-spawned threads?

CUDA 5 has just been released, and with it the ability to spawn GPU threads from within another GPU (parent) thread, minimising the callouts between CPU and GPU that we've seen thus far.
What plans are there to support GPU-spawned threads in the OpenCL arena? As I cannot afford to opt for a closed standard (my user base is "everygamer"), I need to know when OpenCL is ready for prime time in this regard.
The OpenCL standard usually trails behind CUDA (except for the device-partitioning feature), and I guess this feature will be added to OpenCL within a year.
EDIT on Aug 8, 2013: This feature has been introduced in OpenCL 2.0.

When will OpenCL 1.2 for NVIDIA hardware be available?

I would have asked this question on the NVIDIA developer forum but since it's still down maybe someone here can tell me something.
Does anybody know if there is already OpenCL 1.2 support in NVIDIAs driver? If not, is it coming soon?
I don't have a GeForce 600 series card to check myself. According to Wikipedia there are already some cards that could support it though.
It somewhat seems like NVIDIA does not mention OpenCL a whole lot anymore and just focuses on CUDA C/C++ (see StreamComputing.eu). I guess it makes sense to them but I would like to see some more OpenCL love.
Thanks
NVidia's latest SDK (v4.2.9) does not support OpenCL 1.2 in either the header files or the library it provides. I considered that this might just be the SDK itself: as you point out, the GeForce 600 series appears to support it in hardware. Unfortunately I don't own any 600 series card, but the OpenCL64.dll supplied with the latest drivers (v306.23) does not export OpenCL 1.2 symbols. Further, I can find no trace of the new symbols (such as "clLinkProgram") as strings in the driver package. Although this does not rule out the possibility of bootstrapping 1.2 functionality in the driver via an ICD loader, there is no evidence of a 1.2 implementation there, and it would be undocumented and unsupported.
As to when OpenCL 1.2 will be officially supported by NVidia, unfortunately I don't know the answer to this, and would be equally keen to find out.
In the meantime you might consider an alternative OpenCL 1.2 implementation for development; for example the Intel SDK 2013 Beta (Intel CPU) or the AMD APP SDK v2.7 (AMD CPU, or AMD/ATI GPU).
As an aside: personally I am considering switching from NVidia GPUs to ATI for production purposes, based partly on AMD's investment in OpenCL and partly on arguments comparing "bang for buck" between NVidia and the latest AMD cards: NVIDIA vs AMD: GPGPU performance
The NVIDIA hotfix driver version 350.05 (April 2015) adds support for OpenCL 1.2.
With the 350.12 (also April 2015) release, NVidia has clarified the situation:
With this driver release NVIDIA has also posted a bit more information on their OpenCL 1.2 driver. The driver has not yet passed OpenCL conformance testing over at Khronos, but it is expected to do so. OpenCL 1.2 functionality will only be available on Kepler and Maxwell GPUs, with Fermi getting left behind.
It looks like the 700 series supports OpenCL 1.2
I'm still looking for which driver I'll need to get that working.
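Once a candidate driver is installed, the version it actually exposes can be checked at runtime; a minimal sketch against the OpenCL host API (link with -lOpenCL; error handling omitted):

```c
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platform;
    cl_device_id device;
    char version[256];

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    // Reports e.g. "OpenCL 1.1 CUDA" or "OpenCL 1.2 CUDA" depending on the driver.
    clGetDeviceInfo(device, CL_DEVICE_VERSION, sizeof(version), version, NULL);
    printf("%s\n", version);
    return 0;
}
```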

Dearth of CUDA 5 Dynamic Parallelism Examples

I've been googling around and have only been able to find a trivial example of the new dynamic parallelism in Compute Capability 3.0 in one of their Tech Briefs linked from here. I'm aware that the HPC-specific cards probably won't be available until this time next year (after the nat'l labs get theirs). And yes, I realize that the simple example they gave is enough to get you going, but the more the merrier.
Are there other examples I've missed?
To save you the trouble, here is the entire example given in the tech brief:
__global__ void ChildKernel(void* data){
    // Operate on data
}
__global__ void ParentKernel(void* data){
    ChildKernel<<<16, 1>>>(data);
}
// In host code
ParentKernel<<<256, 64>>>(data);
// Recursion is also supported
__global__ void RecursiveKernel(void* data){
    if(continueRecursion == true)
        RecursiveKernel<<<64, 16>>>(data);
}
EDIT:
The GTC talk New Features In the CUDA Programming Model focused mostly on the new Dynamic Parallelism in CUDA 5. The link has the video and slides. Still only toy examples, but a lot more detail than the tech brief above.
Here is what you need, the Dynamic parallelism programming guide. Full of details and examples: http://docs.nvidia.com/cuda/pdf/CUDA_Dynamic_Parallelism_Programming_Guide.pdf
Just to confirm: dynamic parallelism is only supported on GPUs with a compute capability of 3.5 or above.
I have a 3.0 GPU with CUDA 5.0 installed. I compiled the dynamic parallelism examples with
nvcc -arch=sm_30 test.cu
and received the compile error below:
test.cu(10): error: calling a global function("child_launch") from a global function("parent_launch") is only allowed on the compute_35 architecture or above.
GPU info
Device 0: "GeForce GT 640"
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 3.0
hope this helps
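For anyone who does have a compute capability 3.5 device, note that dynamic parallelism also requires relocatable device code and linking against the device runtime library; the usual nvcc invocation looks along these lines:

```shell
nvcc -arch=sm_35 -rdc=true test.cu -lcudadevrt
```

Without -rdc=true and -lcudadevrt, device-side kernel launches fail to compile or link even on supported hardware.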
I edited the question title to "...CUDA 5...", since Dynamic Parallelism is new in CUDA 5, not CUDA 4. We don't have any public examples available yet, because we don't have public hardware available that can run them. CUDA 5.0 will support dynamic parallelism but only on Compute Capability 3.5 and later (GK110, for example). These will be available later in the year.
We will release some examples with a CUDA 5 release candidate closer to the time the hardware is available.
I think compute capability 3.0 doesn't include dynamic parallelism. It will be included in the GK110 architecture (aka "Big Kepler"); I don't know what compute capability number it will be assigned (3.1, maybe?). Those cards won't be available until late this year (I'm waiting sooo much for those). As far as I know, 3.0 corresponds to the GK104 chips like the GTX 690 or the GT 640M for laptops.
Just wanted to check in with you all, given that the CUDA 5 RC was released recently. I looked in the SDK examples and wasn't able to find any dynamic parallelism there; someone correct me if I'm wrong. I searched for kernel launches within kernels by grepping for "<<<" and found nothing.