CUDA samples cause machine to crash

I was planning to start using CUDA on a machine with Kubuntu 12.04 LTS and a Quadro card. I installed CUDA 5.5 using the .deb from here, and the installation seems to have gone fine. Then I built the CUDA samples, and again everything went fine.
When I run the samples in sequence, however, some of them corrupt my display and others crash my computer outright.
What causes the crash? How can I fix it?
I should mention that my NVIDIA card is the only display adapter in the machine, but that shouldn't make CUDA crash and burn.

The problem was that the X server was using the open-source nouveau driver, which is known to conflict with the way NVIDIA's proprietary driver accesses the card. When I restarted X (actually, I restarted the machine), the samples ran and worked properly.
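Before rerunning the samples, it can help to confirm which kernel driver is actually loaded. A minimal sketch, assuming a Linux system that lists loaded kernel modules in /proc/modules; the function names are hypothetical helpers, not part of any NVIDIA tool:

```python
def loaded_modules(proc_modules_text: str) -> set:
    """Return the module names listed in /proc/modules-style text.

    Each line of /proc/modules starts with the module name, followed by its
    size, use count, dependents, state and load address.
    """
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def nouveau_is_loaded(path: str = "/proc/modules") -> bool:
    """True if the open-source nouveau kernel module is currently loaded."""
    with open(path) as f:
        return "nouveau" in loaded_modules(f.read())
```

If nouveau_is_loaded() returns True, the X server is most likely still running on nouveau rather than on NVIDIA's driver.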

Not all the samples are runnable if you have just installed CUDA on a clean Ubuntu system. Some of them require additional libraries, and some require particular compute capability (CC) versions.
Check the CUDA Samples documentation for the samples that crashed for more information:
http://docs.nvidia.com/cuda/cuda-samples/index.html
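Whether a given sample can run on your card depends on the card's compute capability, which the deviceQuery sample reports. A minimal sketch, assuming deviceQuery's usual "CUDA Capability Major/Minor version number:" output line; the helper names are hypothetical, and the required capability is whatever the sample's documentation lists:

```python
import re

# deviceQuery prints one such line per GPU, e.g.
# "  CUDA Capability Major/Minor version number:    3.0"
_CC_LINE = re.compile(r"CUDA Capability Major/Minor version number:\s*(\d+)\.(\d+)")


def compute_capabilities(device_query_output: str) -> list:
    """Return a (major, minor) tuple for each GPU reported by deviceQuery."""
    return [(int(major), int(minor)) for major, minor in _CC_LINE.findall(device_query_output)]


def meets_requirement(cc: tuple, required: tuple) -> bool:
    """Tuples compare element by element, so (3, 5) >= (3, 0) but (2, 1) < (3, 0)."""
    return cc >= required
```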

Related

Installing CUDA as a non-root user with no GPU

I have a desktop without a GPU, on which I would like to develop code, and a machine on a cluster which has a GPU and CUDA installed, but where I really can't "touch" anything and on which I won't run an IDE etc. I don't have root on any of the machines, woe is me.
So, essentially, I want to be able to compile and build my CUDA code on my own GPU-less desktop machine, then just copy it and test it on the other machine.
Can this be done despite these two hindering factors (no GPU, no root)? I seem to recall the CUDA installer requiring the presence of a GPU, playing with the kernel, and doing other root-y stuff.
Notes:
I'll be using the standalone installer, not a package.
I'm on Fedora 22 with an x86_64 CPU.
Assuming you want to develop codes that use the CUDA runtime API, you can install the CUDA toolkit on a system that does not have a GPU. Using the runfile installer method, simply answer no when prompted to install the driver.
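As a sketch of the same toolkit-only choice made non-interactively: the runfile options used below (--silent, --toolkit, --toolkitpath) are the ones documented for the CUDA 7.x runfile installers, but they have changed between releases, so confirm them with the runfile's --help first. The runfile name and install prefix are placeholders:

```python
import shlex


def toolkit_only_install_cmd(runfile: str, prefix: str) -> list:
    """Build a silent runfile invocation that installs only the toolkit.

    In silent mode the installer only installs the components requested on
    the command line, so leaving out --driver keeps it away from the driver.
    """
    return ["sh", runfile, "--silent", "--toolkit", f"--toolkitpath={prefix}"]


# Placeholder runfile name and a user-writable prefix (both hypothetical).
cmd = toolkit_only_install_cmd("cuda_7.5.18_linux.run", "/home/me/cuda-7.5")
print(shlex.join(cmd))
# sh cuda_7.5.18_linux.run --silent --toolkit --toolkitpath=/home/me/cuda-7.5
```

Running it is then a matter of subprocess.run(cmd, check=True), or pasting the printed line into a shell.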
If you want to compile codes (successfully) that use the CUDA driver API, that process will require a libcuda.so on your machine. This file is installed by the driver installer. There are various methods to "force" the driver installer to run on a machine without a GPU. You can get started by extracting the driver runfile installer (or downloading it separately) and passing the --help command line switch to the installer to learn about some of the options.
These methods will not allow you to run those codes on a machine with no GPU of course. Furthermore, the process of moving a compiled binary from one machine to another, and expecting it to run correctly, is troublesome in my opinion. Therefore my suggestion would be to re-compile the code on a target machine. Otherwise getting a compiled binary to run from one machine to the next is a question that is not unique to CUDA, and is outside the scope of my answer.
If you have no intention of running the codes on the non-GPU machine, and are willing to recompile on the target machine, then you can probably develop driver API codes even without libcuda.so. Alternatively, the CUDA installer places a libcuda.so stub in /usr/local/cuda/lib64/stubs that you could try linking against purely for compilation-test purposes. If you don't link your driver API code against -lcuda at all, you'll get a link error, of course, but given the caveats above that should not matter.
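For the compile-and-link check against the stub, a minimal sketch that assembles the nvcc command line. The source file name is hypothetical, and /usr/local/cuda is only the default toolkit location, so substitute your own install prefix if you installed elsewhere:

```python
def stub_link_cmd(source: str, output: str, cuda_home: str = "/usr/local/cuda") -> list:
    """nvcc invocation that links a driver API program against the libcuda.so stub.

    The stub only satisfies the linker; it is not a driver, so the resulting
    binary is a build check on the GPU-less machine, not something to run there.
    """
    return ["nvcc", source, "-o", output, f"-L{cuda_home}/lib64/stubs", "-lcuda"]
```

For example, subprocess.run(stub_link_cmd("driver_api_app.cu", "driver_api_app"), check=True) would fail on genuine compile or link errors while still producing nothing you would ship to the GPU machine.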
Fedora 22 is not officially supported by CUDA 7.5 or prior. YMMV.
If you don't run the driver installer, you don't need to be a root user for any of this. Of course the install locations you pass to the installer must be those that your user privilege allows access to.
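Because the install prefix has to be somewhere your user can write, a quick pre-flight check can save a failed install. A sketch using only the Python standard library; the function name is hypothetical, and os.access reflects ordinary permission checks, so treat the result as a hint rather than a guarantee (network filesystems in particular can disagree):

```python
import os


def usable_install_prefix(path: str) -> bool:
    """True if the current user could create `path` or write into it.

    For a path that does not exist yet, checks its nearest existing ancestor,
    since that is the directory the installer would have to write into.
    """
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root without finding anything
            return False
        path = parent
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)
```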

Developing using CUDA on several computers, when only one has a GPU installed

I am a Java developer. To speed some of our algorithms, we have decided to try CUDA.
But the issue is that we currently have only one server with a GPU installed, and 3 developers have to work on it (transferring the files over ssh each time, then compiling and running them there). This is obviously a tedious process.
What I would like to know is: on my machine, which does not have a GPU, can I use Nsight to work on CUDA code, compiling and generating the files locally? The output could then be transferred to the server automatically to get the result.
If we can at least work on algorithm locally using NSight (or any other IDE) and not pure vim and then compile it to remove compile time errors, this would save quite some time.
On Linux you can do remote debugging using Nsight Eclipse Edition as documented here. This requires CUDA 5.5 or later. On Windows you need to start the Nsight Monitor on the server and then configure Nsight Visual Studio Edition to use the remote machine.
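Where Nsight's remote features are not an option, the manual round trip described in the question can at least be scripted. A minimal sketch, assuming rsync and key-based ssh access to the GPU server; the host name, paths and the make build command are hypothetical:

```python
import shlex


def remote_build_cmds(local_dir: str, host: str, remote_dir: str, build: str = "make") -> list:
    """Commands that mirror a local source tree to the GPU server and build it there.

    The trailing slash on the rsync source copies the directory's contents
    into remote_dir rather than nesting local_dir inside it.
    """
    src = local_dir.rstrip("/") + "/"
    return [
        ["rsync", "-az", src, f"{host}:{remote_dir}"],
        ["ssh", host, f"cd {shlex.quote(remote_dir)} && {build}"],
    ]
```

Each command can be run in turn with subprocess.run(cmd, check=True), so a failed copy stops the build step from running.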

CUDA 5.0 cuda-gdb on Linux needs dedicated GPU?

With a fresh CUDA 5.0 Linux install on CentOS 5.5, I am not able to use cuda-gdb. So I am wondering: do you still need a dedicated GPU for cuda-gdb on Linux? I tried it with the VESA device driver for X11, but got the same result. Profiling works and running the app works, but running cuda-gdb gives:
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x2aaaaaaab000
Any suggestions?
cuda-gdb still needs a GPU that is not used by the graphical environment (e.g. if you are running GNOME/KDE/etc., you need a system with several GPUs; not all of them need to be NVIDIA GPUs).
This particular message is unrelated to that problem, and you can ignore it. cuda-gdb will tell you if it fails because no GPU can be used for debugging.
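On a multi-GPU system, one way to make sure the program being debugged only targets the GPU that is not driving the display is the CUDA_VISIBLE_DEVICES environment variable, which limits the devices CUDA exposes to a process. A sketch; the helper name, the device index and the program name are hypothetical, and CUDA's device numbering may not match the order other tools show:

```python
import os
from typing import Mapping, Optional


def single_gpu_env(gpu_index: int, base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy of the environment in which CUDA sees only the GPU at gpu_index.

    Use the index of a GPU that is not running the X display.
    """
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env
```

For example: subprocess.run(["cuda-gdb", "./my_app"], env=single_gpu_env(1)).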

Debugger in CUDA 5

NVIDIA has released an extended Eclipse for CUDA 5, and there is also an Nsight plugin for VS2010. In VS2010 we can stop program execution at a breakpoint in a kernel, but how can this be achieved in Eclipse on Linux? I don't see any Nsight-specific keys to stop execution. I tried changing the perspective, but it debugs as a normal C/C++ application. I'm using a Tesla C2070 on an 8-core Intel Xeon machine with Linux.
I'm from Nsight Eclipse Edition team.
Our goal is specifically for the application to be debugged as a normal C/C++ application. This means that you can set breakpoints, use "run to line", etc. regardless of whether you debug host or device code.
Basically, the process is quite standard for Eclipse:
Create a project (you can also import existing executable)
Click debug button
The debugger will start and by default will break in the main function. Note that no device code has been launched on the device yet, so you will only see the host thread.
Set a breakpoint in the device code and hit Resume (note that the Breakpoints view toolbar also lets you set a breakpoint on any CUDA kernel launch).
The debugger will break when device code reaches the breakpoint. You can inspect your application state using the visual debugger UI.
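Nsight Eclipse Edition uses cuda-gdb underneath, so the same sequence can be reproduced from a terminal, which helps separate IDE problems from debugger problems. A sketch, assuming cuda-gdb is on PATH; "set cuda break_on_launch application" is a cuda-gdb setting that stops at kernel launches, and the helper, file and program names are hypothetical:

```python
def cuda_gdb_commands() -> str:
    """cuda-gdb command file text that mirrors the Eclipse steps above."""
    return "\n".join([
        "break main",                            # stop in main first (host thread only)
        "run",
        "set cuda break_on_launch application",  # then stop at each kernel the app launches
        "continue",
    ]) + "\n"


def cuda_gdb_argv(app: str, command_file: str) -> list:
    """gdb's -x option runs the commands in command_file after loading app."""
    return ["cuda-gdb", "-x", command_file, app]
```

Write cuda_gdb_commands() to a file such as cmds.gdb, then run cuda_gdb_argv("./my_app", "cmds.gdb") in a terminal.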
A couple of things changed, and I'm not sure which one solved the issue. I updated the drivers to the latest ones shipped with the 5.0 RC, and I chose to run a VNC server instead of the native X server. That way the CUDA card(s) are dedicated to my apps and debugging; it works like a charm and is now accessible from everywhere.
Eugene,
I just installed CUDA 5, and I wasn't able to break in any kernel code. It was a clean install of CentOS 5.5 with a fresh download of CUDA 5, and I am running on an ASUS G71x laptop which has a GTX 260M installed.
I thought maybe you still can't run the display and debug on one device, so I switched to a non-NVIDIA X display, but I still had the same issue: I can't stop in the kernel code.
Have you tried CUDA 5.0 RC1? It is available now, so you can download and try it. I have tried the Nsight in that release, and it works well for debugging.
Best regards!
The 304.43 NVIDIA driver does not let users other than root debug their CUDA applications.
That problem is not present in any past or future public releases. The CUDA documentation recommends using only drivers listed in the CUDA DevZone. The 304.43 driver is not one of them.
That may or may not be the issue you are hitting. But I thought it was worth mentioning.
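To check whether a machine is running that particular driver, the loaded version can be read from /proc/driver/nvidia/version. A minimal sketch, assuming the usual "NVRM version: ... Kernel Module <version> ..." first line of that file; the helper names are hypothetical:

```python
import re
from typing import Optional

# Driver release reported in the answer as restricting CUDA debugging to root.
ROOT_ONLY_DEBUG_DRIVERS = {"304.43"}


def nvidia_driver_version(version_text: str) -> Optional[str]:
    """Extract the version from /proc/driver/nvidia/version-style text."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", version_text)
    return match.group(1) if match else None


def debugging_restricted_to_root(version_text: str) -> bool:
    return nvidia_driver_version(version_text) in ROOT_ONLY_DEBUG_DRIVERS
```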

cuda with optimus just to access gpgpu

I have a Dell XPS L502 with the NVIDIA 525M graphics card. For now I am only interested in using the GPGPU capabilities of the card.
I installed Ubuntu 12.04 in a dual boot with the Windows 7 that came with the machine, and followed several procedures for installing the CUDA driver and developer kit from NVIDIA (with many re-installs of Ubuntu). In all cases the display drops to 640x480 resolution. As best I can determine, this has something to do with Optimus technology and Linux. I tried Bumblebee to no avail.
I really don't care about using the NVIDIA card to drive the display. Is there any way that I can just install the NVIDIA drivers so that a program can use the CUDA capabilities of the graphics card and I still get full resolution on the display?
I had a similar issue with my Alienware M11xR2 and posted the solution on the NVIDIA Forums. Unfortunately the forums are down at the moment, but essentially the process is as follows:
Install the NVIDIA drivers, but when prompted to modify your X11 config, select 'No'. This is because the NVIDIA card cannot be used as a display device.
Install the CUDA SDK and run one of the samples as root. I found this to be a necessary step. After this you should be able to execute further CUDA programs as a normal user.
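One plausible reason the root step matters: when X does not load the NVIDIA driver, the device nodes CUDA opens (/dev/nvidiactl plus one /dev/nvidiaN per GPU) may not exist until a privileged process creates them. A small sketch to check for them, assuming a single-GPU machine; the helper name is hypothetical:

```python
import os

# Nodes a CUDA program opens on a single-GPU system (one nvidiaN per GPU).
REQUIRED_NODES = ("nvidiactl", "nvidia0")


def missing_device_nodes(dev_dir: str = "/dev") -> list:
    """Return the NVIDIA device nodes that do not yet exist under dev_dir."""
    return [name for name in REQUIRED_NODES
            if not os.path.exists(os.path.join(dev_dir, name))]
```

If missing_device_nodes() returns a non-empty list, running one CUDA program as root, as in the step above, is one way to get the nodes created.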
Hope that helps.
The new CUDA 5 release comes with an installation guide, and there is just one file that installs the driver, toolkit and SDK (even NVIDIA Nsight). One thing that caught my attention is that there are also Optimus options in the installation process.
I also have an Alienware M14x, and I understand your problem, but I also wanted the drivers to work for me, so I didn't try too hard on that.
Maybe you could give that a try and comment with the rest of us.
Here you can look for the CUDA 5 release candidate: CUDA 5
and here is the installation guide (maybe give this a read first): CUDA 5 Starting Guide for Linux.