Nvidia CUDA Profiler's timeline contains many large gaps

I'm trying to profile my code using the Nvidia Visual Profiler, but I'm getting strange gaps in the timeline as shown below:
Note: both calls on the edges of the gaps are cudaMemcpyAsync (host-to-device).
I'm running on Ubuntu 14.04 with the latest version of CUDA (8.0.61) and the latest Nvidia display driver.
The Intel integrated graphics drives the display, not the Nvidia card, so the Nvidia GPU runs only my code and nothing else.
I've enabled CPU profiling as well to check these gaps, but nothing is shown!
Also, no debugging options are enabled (neither -G nor -g), and this is a release build.
My laptop's specs:
Intel Core i7 4720HQ
Nvidia GTX 960m
16GB DDR3 Ram
1 TB Hard Drive
Is there any way to trace what's happening in these empty time slots?
Thanks,

I'm afraid there is no automatic method, but you can add custom traces in your code to find out what's happening.
To do that you can use NVTX.
Follow the links for tutorials and documentation.
These profiling holes are probably due to data loading and memory allocations/initialisations done by the host between your kernel executions.
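To make those gaps self-explanatory, NVTX ranges can be pushed around the host-side work between kernel launches; the named ranges then show up as coloured bars on the NVTX rows of the Visual Profiler timeline. A minimal sketch (the function and buffer names are made up for illustration; it assumes the nvToolsExt header shipped with the CUDA toolkit and linking with -lnvToolsExt):

```cpp
#include <nvToolsExt.h>   // NVTX API, ships with the CUDA toolkit

void prepare_batch(float* h_buf, size_t n)
{
    // Mark the host-side work so it appears as a named range
    // in the profiler timeline instead of an empty gap.
    nvtxRangePushA("prepare_batch");
    for (size_t i = 0; i < n; ++i)
        h_buf[i] = static_cast<float>(i);  // stand-in for real data loading
    nvtxRangePop();
}
```

With this in place, the formerly empty slots in the timeline are labelled "prepare_batch", which tells you exactly which host code occupies them.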

Related

Getting Theano to use the GPU and all CPU cores (at the same time)

I managed to get Theano working with either GPU or multicore CPU on Ubuntu 14.04 by following this tutorial.
First I got multicore working (I could verify that in System Monitor).
Then, after adding the config below to .theanorc, I got GPU working:
[global]
device = gpu
floatX = float32
I verified it by running the test from the tutorial and checking the execution times, and also by the log message when running my program:
"Using gpu device 0: GeForce GT 525M"
But as soon as GPU started working I wouldn't see multicore in System Monitor anymore. It uses just one core at 100% like before.
How can I use both? Is it even possible?
You can't fully utilize both a multicore CPU and the GPU at the same time.
Maybe this can be improved in the future.
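One partial workaround worth trying: Theano has an openmp config flag that lets CPU-side element-wise ops use multiple cores, but any op that actually runs on the GPU will still not spread across CPU cores. A sketch of the .theanorc (the thread count below is an arbitrary example):

```ini
[global]
device = gpu
floatX = float32
openmp = True
```

and launch with, e.g., OMP_NUM_THREADS=4 python train.py (train.py standing in for your script). Whether this helps depends on how much of your graph stays on the CPU.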

CUDA samples cause machine to crash

I was planning on starting to use CUDA on a machine with Kubuntu 12.04 LTS and a Quadro card. I installed CUDA 5.5 using the .deb from here, and the installation seems to have gone fine. Then I built the CUDA samples, again everything went fine.
When I run the samples in sequence, however, some of them botch my display, and others simply crash my computer.
What causes the crash? How can I fix it?
I'll mention that my NVidia card is the only display adapter the machine has, but that shouldn't make CUDA crash and burn.
The problem was due to the X server using the FOSS nouveau drivers. These are known to conflict with NVidia's way of accessing the card. When I restarted X (actually, I restarted the machine), the samples did run and work properly.
Not all of the samples are runnable if you have just installed CUDA on a clean Ubuntu system. Some of them require additional libraries, and some require particular compute capability (CC) versions.
You can read the CUDA samples documentation for the crashed samples for more information:
http://docs.nvidia.com/cuda/cuda-samples/index.html

Can't debug CUDA: CUDA dynamic parallelism debugging is not supported in preemption mode

I have CUDA 5.5, latest drivers, Nsight studio 3.1 for VC2010 on Windows7 64bit.
The target machine has a headless Titan card, and another simple NVidia card, to which the monitor is connected.
I'm trying to debug my CUDA code which includes some dynamic parallelism. Whenever I click "Start CUDA Debugging" in VC, I get this error from Nsight Monitor: CUDA dynamic parallelism debugging is not supported in preemption mode. From what little I found regarding this issue, this is because I'm trying to debug CUDA on the same device that drives my screen. This however is not true, as I mentioned, I have a separate card to drive the screen.
I went even further with this, disconnected the monitor from the second card as well, rebooted, and set up remote debugging from a different machine. Same result.
Does anyone have an idea how to tackle this?
Right-click the Nsight Monitor tray icon and open "Options\CUDA\Debugger". By default, all GPUs except those in TCC mode are forced to use "Software Preemption".
Set "Desktop GPUs must use Software Preemption" and "Headless GPUs must use software preemption" to False. Also make sure that in your Visual Studio the setting "Nsight\Options\CUDA\Preemption Preference" is "Prefer no Software Preemption".

Disabled ECC support for Tesla C2070 and Ubuntu 12.04

I have a headless workstation running Ubuntu 12.04 server and recently installed new Tesla C2070 card, but when running the examples from the CUDA SDK, I get the following error:
NVIDIA_GPU_Computing_SDK/C/bin/linux/release% ./reduction
[reduction] starting...
Using Device 0: Tesla C2070
Reducing array of type int
16777216 elements
256 threads (max)
64 blocks
reduction.cpp(473) : cudaSafeCallNoSync() Runtime API error 39 : uncorrectable ECC error encountered.
Actually, this error occurs with all other examples except "deviceQuery".
I'm using kernel 3.2.0, nvidia driver 295.41 and Cuda 4.2.9.
After a lot of searching I found a suggestion to disable ECC support with:
nvidia-smi -g 0 --ecc-config=0
which worked. But the question is: how reliable will GPU computing be with ECC support disabled?
Any advice, suggestion or solution will be highly appreciated.
-Konstantin
I'm wondering if this may be some sort of compatibility issue, rather than a bad card. I'm suffering from the same problem with a Tesla C2075, same Ubuntu version. We contacted nVidia and they told us that double-bit ECC errors (as seen using nvidia-smi -q in linux) meant that the card was probably broken. We obtained a replacement, but it has exactly the same issues.
It seems unlikely that both the boards I have had are broken in the same way, so we're going to try it in another machine if we can find a suitable one.
I'll post anything interesting that we learn.
I'll echo what aland said and add my own experience.
I worked with a number of Fermi-equipped compute clusters and tested them with ECC both on and off. We did this to increase the amount of available memory and the speed of the computations, which was noticeable. nvidia-smi never reported any ECC errors for the cards with ECC on, nor did we ever encounter any runtime errors that were indicative of ECC-related problems.
If your card is detecting uncorrectable ECC problems, that indicates a flaw in the hardware, and turning ECC off is only masking the problem. The runtime is rightly warning you that something bad has gone wrong, and you can't depend on the results.
You can try running your calculations anyway and see what happens, but be prepared for anything going absolutely crazy for no real reason. A single bit flipped here or there can have enormous consequences for floating point math for example, and may flat out crash your kernel if an instruction gets corrupted.
If you can, I would try to get the card replaced rather than masking the symptoms.
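Before replacing the card, it is worth confirming what the counters actually say. The per-counter ECC report from nvidia-smi of that era looked roughly like this (GPU index 0 is an assumption here; check yours with nvidia-smi -L, and note that flag spellings may differ between driver versions):

```shell
# Detailed ECC status: volatile and aggregate single/double-bit counters for GPU 0
nvidia-smi -q -d ECC -i 0

# Re-enable ECC afterwards (takes effect after the next reboot or GPU reset)
nvidia-smi -i 0 --ecc-config=1
```

Persistent double-bit (uncorrectable) counts after a reset are the strong signal of a hardware problem; single-bit counts are corrected and less alarming.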
It turned out my case was the same as carthurs's. I also got my card replaced, but the error didn't go away. Only after setting the motherboard's onboard VGA as primary in the BIOS did it disappear. There should be a warning about this in the Tesla installation manual!
Thanks everybody for the help.
Once a GPU uncorrectable ECC error occurs, the GPU might be in an unstable state (e.g. data corruption could have occurred not only in user-allocated memory but also in memory regions necessary for GPU operation). To recover the GPU you need to either power-cycle/reboot your system or use GPU Reset from nvidia-smi:
nvidia-smi -h
...
-r --gpu-reset Trigger secondary bus reset of the GPU.
Can be used to reset GPU HW state in situations
that would otherwise require a machine reboot.
Typically useful if a double bit ECC error has
occurred.
--id= switch is mandatory for this switch
Type man nvidia-smi for more help on that topic
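Putting those two switches together, resetting GPU 0 (the index 0 is just an example; pick yours from nvidia-smi -L) would be:

```shell
# Secondary bus reset of GPU 0; requires that no processes are using the GPU
nvidia-smi -i 0 -r
```

The reset will fail if any process still holds the device, so stop your CUDA applications first.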

Debugger in CUDA 5

Nvidia has released an extended Eclipse for CUDA 5. They also have an Nsight plugin for VS2010. In VS2010 we can stop program execution at a breakpoint in a kernel, but how do I achieve this functionality in Eclipse on Linux? I don't see any Nsight-specific keys to stop execution. I tried changing the perspective, but it debugs as a normal C/C++ application. I'm using a Tesla C2070 on an 8-core Intel Xeon machine with Linux.
I'm from Nsight Eclipse Edition team.
Our goal is specifically for the application to be debugged as a normal C/C++ application. This means that you can set breakpoints, use "run to line", etc. regardless of whether you debug host or device code.
Basically, the process is quite standard for Eclipse:
Create a project (you can also import existing executable)
Click debug button
The debugger will run and by default will break in the main function. Note that no device code has been loaded onto the device yet, so you will only see the host thread.
Set a breakpoint in the device code and hit resume (note that the Breakpoints view toolbar also allows you to set a breakpoint on any CUDA kernel launch).
The debugger will break when device code reaches the breakpoint. You can inspect your application state using the visual debugger UI.
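If you want a trivial target to try those steps on, a toy kernel like this (entirely hypothetical, not from the poster's project) gives you a device-code line to break on:

```cuda
#include <cstdio>

__global__ void add_one(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1;   // set the device breakpoint on this line
}

int main()
{
    int h[4] = {0, 1, 2, 3};
    int* d;
    cudaMalloc(&d, sizeof(h));
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    add_one<<<1, 4>>>(d, 4);
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    printf("%d %d %d %d\n", h[0], h[1], h[2], h[3]);
    return 0;
}
```

Build it as a debug build (nvcc -G -g) so device breakpoints can bind, then step through add_one in the Nsight debugger.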
A couple of things, and I'm not sure which one solved the issue. I updated the drivers to the latest ones with RC 5.0, but I also chose to run a VNC server instead of the native X server. That way the CUDA card(s) are dedicated to my apps and debugging; it works like a charm and is now accessible from everywhere.
Eugene,
I just installed CUDA 5, and I wasn't able to break in any kernel code. It was a clean install of CentOS 5.5 with a fresh download of CUDA 5, and I am running on an Asus G71x laptop which has a GTX 260M installed.
I thought maybe you still can't run the display and debug on one device, so I switched to a non-Nvidia X display, but I still had the same issue: I can't stop in the kernel code.
Have you tried CUDA 5.0 RC1? It is available now; you can download and try it. I have tried Nsight in it, and it works well for debugging.
Best regards!
The 304.43 NVIDIA Driver does not let users other than root debug their CUDA application.
That problem is not present in any past or future public releases. The CUDA documentation recommends using only drivers listed in the CUDA DevZone. The 304.43 driver is not one of them.
That may or may not be the issue you are hitting. But I thought it was worth mentioning.