Can I debug CUDA on the device that drives the display output? - cuda

I develop on VS2012. I have 3 monitors connected to my pc with one GTX 960 graphic card.
I knew that it's impossible to debug CUDA on the same device that drives the display output. Maybe I'm reading it wrong, but when I go to NSight->Windows->System Info->Display Devices, I can see that the monitor uses my graphic card. Since I have only one graphic card and I can debug (as the image shows in CUDA WarpWatch1) I deduct that either I do can debug on the same device that drives the display output or it uses my built-in Intel HD Graphics but doesn't show it in the Display Device .

Despite what you have apparently read somewhere, CUDA (and NSight) has supported debugging on GPUs with the WDDM driver on active display GPUs for a number of years. You can see the exact matrix of supported hardware, drviers and debugging modes in the documentation here.
When CUDA was first introduced, debugging was limited to non-display cards. However, this limitation was removed on Windows and Linux using more recent hardware some time ago.

Related

Does nvidia gpu work less efficiently when it is the only gpu in PC?

I want to assemble a new computer mainly for CUDA applications. When it comes to CPU I have to choose between AMD and Intel.
Most of the AMD's processors don't have integrated gpu while Intel's processors do.
My question is:
If the nvidia gpu would be the only graphic processing unit in the whole PC (without integrated one),
would its efficiency for CUDA programs be worse as it has to produce some graphics on a desktop (while using for example Matlab)?
The anwer is yes, efficiency would be slightly lower due to the GPU doing display tasks, like moving the cursor around or scrolling a display in a .pdf browser.
however if you are aiming for a reasonably mid-to-high-end GPU, the loss of efficiency is marginal. If you have enough money, you will buy dedicated GPU, but if not, then just don't bother. It might be like 1% or less.
A bigger problem is that the display takes up RAM, that (a) becomes unavailable to CUDA applications and (b) the CUDA manual states that the display driver is allowed to dis-own the CUDA application from it's memory at any time without warning (!).
If you ask me if that does really happen (display driver taking over the CUDA app memory), then yes, I have experienced it, with the prime example being when you change the resolution of your display.
So definetely don't do any banking with GPUs or you might see your accounts being randomly infused with millions :-)
That's why 'proffesional' CUDA cards (the tesla variety) have no display outputs - just in case.

Dynamically detecting a CUDA enabled NVIDIA card and only then initializing the CUDA runtime: How to do?

I have an application which has an algorithm, accelerated with CUDA. There is also a standard CPU implementation of it. We plan to release this application for various platforms, so most of the time, there won't be a NVIDIA card to run the accelerated CUDA code. What I want is to first check whether the user's system has a CUDA enabled NVIDIA card and if it does, initializing the CUDA runtime after. If the system does not support CUDA, then I want to execute the CPU path. This question is very similar to mine, but I don't want to use any other libraries other than the plain CUDA runtime. OpenCL is an alternative, but there isn't enough time to implement an OpenCL version of the algorithm for the first release. Without any CUDA existence check, the program will surely crash since it can't find the needed .dll's for the CUDA runtime and we surely don't want that. So, I need advices on how to handle this initialization step.
Use the calls cudaGetDeviceCount and cudaGetDeviceProperties to find CUDA devices on the running system. First find out how many, then loop through all the available devices, and inspect the properties to decide which ones qualify. What I mean by "qualify" depends on your application. Do you want to require a certain compute capability? Or need a certain amount of memory? If there's more than one device, you might want to sort on some criteria, then set the device cudaSetDevice. If there are no devices, or none that are sufficient, then fall back on the CPU code path.
I'd also suggest having some mechanism to force CUDA mode off, in case some user's environment just doesn't work due to driver issues, or an old board, or something else. You can use a command-line option, or an environment variable, or whatever...
EDITING:
Regarding DLLs, you should package cudart[whatever].dll with your application. That will ensure that the program starts, and at least the CUDA query functions will operate.

Developing for CUDA on "cheap" GPUs

I develop algorithms in CUDA on my desktop which should later run on a server.
Is it okay to use a recent low end card (like compute capability 2.1) to get all the nice debug and profiling features and then put the code on the server with the high end card (with the same cc)? Would I just need to adjust the thread/mesh sizes, or does it change everything™.
Example: I would develop on a Quadro 600 and the server would use a Tesla C2075.
As long your kernel call and kernel itself is scalable you have no problem.
Check out this question:
CUDA development on different cards?
There are some issues, like memory bandwith being different (25.6 GiB/s on Quadro and 148 GiB/s on Tesla, according to your links), or different number of SMs (the driver could distribute blocks across SMs differently). However, in most cases such small diffrences are of minor importance.
If the server has more than one GPU installed then you need to change your code to run on Multi-GPU to fully leverage the power of the server. Although the same code will run fine but on a single card.
In case there's only one card on server; general rule of thumb is that you do not need to change any line of code to harness the power of the stronger GPU as the driver distributes the load among SMs automatically.

Can't run CUDA nor OpenCL on GeForce 540M

I have problem running samples provided by Nvidia in their GPU Computing SDK (there's a library of compiled sample codes).
For cuda I get message "No CUDA-capable device is detected", for OpenCL there's error from function that should find OpenCL capable units.
I have installed all three parts from Nvidia to develop with OpenCL - devdriver for win7 64bit v.301.27, cuda toolkit 4.2.9 and gpu computing sdk 4.2.9.
I think this might have to do with Optimus technology that reroutes output from Nvidia GPU to Intel to render things (this notebook has also Intel 3000HD accelerator), but in Nvidia control pannel I set to use high performance Nvidia GPU, set power profile to prefer maximum performance and for PhysX I changed from automatic selection to Nvidia processor again. Nothing has changed though, those samples won't run (not even those targeted for GF8000 cards).
I would like to play somewhat with OpenCL and see what it is capable of but without ability to test things it's useless. I have found some info about this on forums, but it was mostly about linux users where you need Bumblebee to access Nvidia GPU. There's no such problem on Windows however, drivers are better and so you can access it without dark magic (or I thought so until I found this problem).
My laptop has a GeForce 540M as well, in an Optimus configuration since my Sandy Bridge CPU also has Intel's integrated graphics. To run CUDA codes, I have to:
Install NVIDIA Driver
Go to NVIDIA Control Panel
Click 3D Settings -> Manage 3D Settings -> Global Settings
In the Preferred Graphics processor drop down, select "High-performance NVIDIA processor"
Apply the settings
Note that the instructions above apply the settings for all applications, so you don't have to worry about CUDA errors any more. But it will drain more battery.
Here is a video recap as well. Good luck!
Ok this has proven to be totally crazy solution. I was thinking if something isn't hooking between the hardware and application and only thing that came to my mind was AV software. I'm using Comodo with sandbox and Defense+ on and after turning them off I could run all those samples. What's more, only Defense+ needs to be turned off.
Now I just think about how much apps could have been blocked from accessing that GPU..
That's most likely because of the architecture of Optimus. So I'd suggest you to read
NVIDIA CUDA Developer Guide for NVIDIA Optimus Platforms, especially the section "Querying for a CUDA Device" which addresses this issue, I believe.

Use NVIDA card for CUDA, motherboard for video

I want use the motherboard as the primary display adapter and my NVIDIA graphics card as a dedicated CUDA processor. My first thought was to simply plug the monitor's VGA cable into the motherboard's VGA port and hope the BIOS was smart enough to use the on-board video as the display adapter when it booted. That didn't work. The BIOS must have detected the NVIDIA card and continued to use it as the display adapter. The next thing I looked for was a setting in the BIOS to tell it "don't use the the NVIDIA 560 as the display adapter, use the on-board video as the display adapter". I search through the BIOS and the Web, but either this cannot be done or I cannot figure out how to do it. The mobo is a BIOSTAR TH67+ LGA 1155. Windows 7 OS.
RESULTS SUMMARY (from answers provided below)
Enabling the Integrated Graphics Device (IGD) in the BIOS will allow the system to be driven from the on-board graphics even with the graphics card connected to the system bus. However, the graphics card cannot be used for CUDA processing. Windows will not enable graphics devices unless a monitor is attached to them. The normal driver stack cannot see them. Solution: use Linux, or attach a display to the graphics card but do not use it. The Tesla cards (GPGPU-only) are not recognized by Windows as graphics devices, so they don't suffer from this.
Also ,a newer BIOSTAR motherboard, the TZ68A+, supports the Virtu drivers which permit sophisticated simultaneous use of the graphics cards and on-board video.
Looking at the BIOS manual (.zip), the setting you probably want is Chipset -> North Bridge -> Initiate Graphics Adapter. Try setting it to IGD (Integrated Graphics Device).
I believe this will happen automatically as the native video won't support CUDA. After installing the SDK, if you run DeviceQuery, do you see more than one result?
I believe h67 allows coexistence of both integrated & dedicated GPU. Check out Lucid Virtu here http://www.lucidlogix.com/driverdownloads-virtu.html it allows switching GPUs on the fly. But I don't know if it affects CUDA device query.
I never tried it on my rig, because its x58, I just heard it from tomshardware. Try it out and let us know. Lucid Virtu is definitely worth a try, its free, and it can cut you electric bill.