cuda-gdb exits with "[1] stopped" when it hits a kernel call - cuda

I'm pretty new to CUDA and flying a bit by the seat of my pants here...
I'm trying to debug my CUDA program on a remote machine I don't have admin rights on. I compile my program with nvcc -g -G and then try to debug it with cuda-gdb. However, as soon as cuda-gdb hits a call to a kernel (it doesn't even have to enter it, and this doesn't happen in host code), I get:
(cuda-gdb) run
Starting program: /path/to/my/binary/cuda_clustered_tree
[Thread debugging using libthread_db enabled]
[1]+ Stopped cuda-gdb cuda_clustered_tree
cuda-gdb then dumps me back to my terminal. If I try to run cuda-gdb again, I get
An instance of cuda-gdb (pid 4065) is already using device 0. If you believe
you are seeing this message in error, try deleting /tmp/cuda-dbg/cuda-gdb.lock.
The only way to recover is to kill -9 cuda-gdb and cuda_clustered_ (I assume the latter is part of my binary).
This machine has two GPUs, is running CUDA 4.1 (I believe -- there were a lot installed, but that's the one I set the PATH and LD_LIBRARY_PATH to), and compiles and runs deviceQuery and bandwidthTest fine.
I can provide more info if need be. I've searched everywhere I could find online and found no help with this.
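For reference, the build-and-debug sequence being described boils down to something like this (the .cu file name is a placeholder; the binary name comes from the log above):
nvcc -g -G cuda_clustered_tree.cu -o cuda_clustered_tree   # -g: host debug info, -G: device debug info
cuda-gdb ./cuda_clustered_tree
(cuda-gdb) run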

Figured it out! Turns out, cuda-gdb hates csh.
If you are running csh, it will cause cuda-gdb to exhibit the anomalous behavior above. Even when I ran bash from within csh and then ran cuda-gdb, I still saw the behavior. You need to start your shell as bash, and only bash.
On the machine, the default shell was csh, but I use bash. I wasn't allowed to change it directly, so I added 'exec /bin/bash --login' to my .login script.
So even though I was running bash, because it had been started by csh, cuda-gdb would exhibit the anomalous behavior above. Getting rid of the 'exec' command, so that I was running csh directly with nothing on top, still showed the behavior.
In the end, I had to get IT to change my shell to bash directly (after much patient troubleshooting on their part). Now it works as intended.
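For anyone stuck on a csh login shell, the workaround that was tried and the fix that finally worked look roughly like this (chsh is shown only as an illustration; as noted above, it may be locked down and require IT):
# ~/.login (csh) -- attempted workaround: replace the login csh with bash
# cuda-gdb still misbehaved, because the session was originally started by csh
exec /bin/bash --login
# eventual fix: change the login shell itself, where permitted
chsh -s /bin/bash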

Related

How do I use LLDB to debug a raw i386 MBR binary running in QEMU, on an Apple Silicon Mac?

I'm working on an i386 bootloader and I'm running it with QEMU on my Apple Silicon machine, and everything works just fine, except I can't debug it: GDB does not (yet?) work on AS and LLDB sternly refuses to load a raw binary. This starts up fine:
$ qemu-system-i386 -s -S -drive format=raw,file=boot.bin,media=disk,if=floppy -no-fd-bootchk
but this errors out:
$ lldb boot.bin
(lldb) target create "boot.bin"
error: '/Users/morpheu5/src/boots/cube/boot.bin' doesn't contain the architecture x86_64
and I also tried this, because well, it's supposed to be i386, not x86_64:
$ lldb --arch i386 boot.bin
(lldb) target create --arch=i386 "boot.bin"
error: '/Users/morpheu5/src/boots/cube/boot.bin' doesn't contain the architecture i386
but it didn't make much of a difference. The inline help is not greatly helpful and I am having zero success searching online.
Now, I have alternatives: Bochs has an internal debugger, but the text-based interface is a bit clunky and I can't even figure out how to pre-set certain breakpoints -- I like to break on 0x7c00, otherwise I have to step through the entire BIOS code -- and I can't even run the GUI debugger despite having configured it with display_library: sdl2, options=gui_debug. The other alternative is a Raspberry Pi, on which I could probably use gdb, but I haven't tried that yet, and it's a Zero so it's not even that powerful anyway -- not that I need much power, but I'd rather keep my workflow smooth...
It seems clear that lldb isn't recognizing the binary's format, so I'm wondering if there's a way of just asking it to disassemble the file as a 32-bit binary and roll with it as best it can. In the end, all I really need is a way of seeing what is in memory, in the registers, and on the stack.
Any ideas?
After a few weeks of experimentation, it doesn't look like lldb is a viable option, but Bochs' command-line debugger was somewhat useful. Shame I couldn't get the GUI to run on macOS.
brew install x86_64-elf-gdb
qemu-system-i386 -s -S result.bin
x86_64-elf-gdb -ex "target remote localhost:1234" -ex "set architecture i8086" -ex "set disassembly-flavor intel" ...
This works for me, but it doesn't use lldb.
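Building on the answer above, a minimal session that also pre-sets the 0x7c00 breakpoint mentioned in the question (file names follow the examples above; 0x7c00 assumes the standard BIOS boot-sector load address):
qemu-system-i386 -s -S -drive format=raw,file=boot.bin,media=disk,if=floppy -no-fd-bootchk &
x86_64-elf-gdb \
  -ex "target remote localhost:1234" \
  -ex "set architecture i8086" \
  -ex "set disassembly-flavor intel" \
  -ex "break *0x7c00" \
  -ex "continue"
From there, x/16i $pc, info registers, and x/8xw $sp give the memory, register, and stack views the question asks for; if the breakpoint is ever missed, hbreak *0x7c00 is the hardware-assisted variant.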

Debugging issues with CUDA files in CLion: the debugger does not stop at breakpoints

I've started a CUDA application in the new CLion 2020.1. Although I can compile and run it, I am not able to debug it, not even the host code. Specifically, the debugger does not stop at breakpoints, even though I am running the debug build. I don't encounter this issue with a regular C project in CLion 2020.1, and I don't receive any error message of any kind. Here is my CMakeLists.txt file:
# Setup the CUDA compiler
set(CMAKE_CUDA_COMPILER /usr/local/cuda-10.2/bin/nvcc)
# Setup the host compiler
set(CMAKE_CUDA_HOST_COMPILER /usr/bin/g++-8)
# CMAKE minimum required version
cmake_minimum_required(VERSION 3.16)
project(PageRank_GPU CUDA)
set(CMAKE_CUDA_STANDARD 14)
add_executable(PageRank_GPU main.cu graph.cu graph.cuh vertex.cuh error.cuh parser.cu parser.cuh)
set_target_properties(
PageRank_GPU
PROPERTIES
CUDA_SEPARABLE_COMPILATION ON)
Reporting that the issue disappeared after playing around with the project settings a bit. Specifically, under Build, Execution, Deployment -> Toolchains, I set the C and C++ compilers to gcc-8 and g++-8 respectively (even though I am specifying the compiler in the CMakeLists.txt file), and under CMake I set the toolchain to "Default" (the one I just modified) instead of "Use Default". After doing that, the debugger stops at breakpoints and I am able to step through my code. I don't understand what happened because, even after reverting the changes, I cannot make the problem reappear.
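If anyone hits this again, one way to rule out the IDE is to run the same Debug binary under cuda-gdb from a terminal; if breakpoints work there, the problem lies in the CLion toolchain/debugger settings rather than in the build itself. A minimal sketch, assuming CLion's default cmake-build-debug output directory and the target name from the CMakeLists.txt above:
cd cmake-build-debug
cuda-gdb ./PageRank_GPU
(cuda-gdb) break main
(cuda-gdb) run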

How to run Valgrind to find memory leaks on my embedded MIPSEL-Linux box?

How can I run valgrind on an embedded Linux box to find memory leaks in my main software?
In the rcS script, I am running it like this:
./main_app
How can I associate the ./main_app program with valgrind? The main_app process never terminates.
I want to log the data to a file continuously, and I want to be able to access that log file without terminating the main_app process. I can telnet to the box and reach the log file, but the problem is: until and unless the file handle is closed, how can I open the file? In other words, I don't quite understand which valgrind parameters control how memory leaks are logged to a file. Please help!
You can try to build it on your own for MIPS; here are the steps:
Download valgrind from http://valgrind.org/downloads/ - I used Valgrind 3.8.1
Unpack the valgrind archive and move into the valgrind folder
Execute:
./autogen.sh
./configure --host=mipsel-linux-gnu --prefix=/home/pub/valgrind CFLAGS="-mips32r2" CC=/opt/toolchains/mips-4.3/bin/mips-linux-gnu-gcc CXX=/opt/toolchains/mips-4.3/bin/mips-linux-gnu-c++
make -j6
make install
prefix - the folder where the compiled valgrind binaries will be installed;
CC and CXX - paths to the cross compilers;
CFLAGS - the "-mips32r2" and "-mplt" flags should be passed to the compiler if it is older than gcc (GCC) 4.5.1
On the target MIPS box, export the path to the valgrind lib folder:
export VALGRIND_LIB=/mnt/nfs/lib/valgrind
Now you can use it as usual; for the memory checking features, see http://valgrind.org/docs/manual/mc-manual.html
It works for me, good luck.
Valgrind mainly targets x86/x86-64, and a MIPS build may not be readily available for your toolchain. You can still track down your leak if you build your application for x86 and run it under valgrind there; it's unlikely the problem is specific to the target architecture.
The answer above describes how to build valgrind, but to get a full leak check, as opposed to just a list of memory errors, your program does have to terminate; I am guessing you never terminate yours.
Assuming your process is some kind of daemon, your best bet is to run it in a loop, monitor memory usage with something like top, and, when you see signs of excessive memory usage, send it a shutdown command somehow. If you then run valgrind with the following options, you will get a unique log for each run, including a dump of leaks at exit:
while true ; do
    valgrind --leak-check=yes --log-file=/tmp/log.%p.txt ./main_app
    sleep 1
done
The %p in the filename inserts the process id.
You can also specify --show-possibly-lost=no, which reduces the reported leaks to those valgrind is much more confident are genuinely lost.
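On the question's other point, reading the log without terminating main_app: valgrind writes error reports to the per-PID log file as they occur (the full leak summary only appears once the process exits), so the file can be watched from a telnet session while the program is still running. A sketch, assuming the --log-file pattern from the loop above:
ls -lt /tmp/log.*.txt        # the most recent file belongs to the current run
tail -f /tmp/log.<pid>.txt   # follow it while main_app keeps running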

How to start the debug version of a project in Nsight with the optirun command?

I've been writing a simple CUDA program (I'm a student, so I need to practice), and the thing is, I can compile it with nvcc from the terminal (using Kubuntu 12.04 LTS) and then execute it with optirun ./a.out (the hardware is a GeForce GT 525M on a Dell Inspiron), and everything works fine. The major problem is that I can't do anything from Nsight. When I try to start the debug version of the code, the message is "Launch failed! Binaries not found!". I think it's about running the command with optirun, but I'm not sure. Any similar experiences? Thanks in advance for helping, folks. :)
As this was the first post I found when searching for "nsight optirun", I just wanted to write down the steps I took to get it working for me.
Go to Run -> Debug Configurations -> Debugger
Find the textbox for CUDA GDB executable (in my case it was set to "${cuda_bin}/cuda-gdb")
Prepend "optirun --no-xorg", in my case it was then "optirun --no-xorg ${cuda_bin}/cuda-gdb"
The "--no-xorg" option might not be required or even counterproductive if you have an OpenGL window as it prevents any of that to appear. For my scientific code however it is required as it prevents me from running into kernel timeouts.
Happy bug hunting.
We tested Nsight on Optimus systems without optirun - see "Install the cuda toolkit" in the CUDA Toolkit Getting Started guide on using the CUDA toolkit on an Optimus system. We have not tried optirun with Nsight EE.
If you still need to use optirun for debugging, you can try making a shell script that uses optirun to start cuda-gdb and setting that shell script as the cuda-gdb executable in the debug configuration properties.
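A minimal sketch of such a wrapper script (the cuda-gdb path is an assumption; adjust it to your toolkit install):
#!/bin/sh
# optirun-cuda-gdb: forward everything to cuda-gdb, launched through optirun
exec optirun --no-xorg /usr/local/cuda/bin/cuda-gdb "$@"
Make it executable with chmod +x and point the "CUDA GDB executable" field of the debug configuration at it.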
The simplest thing to do is to run eclipse with optirun; that will also run your app properly.
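For example (assuming the Nsight Eclipse Edition launcher, nsight, is on your PATH):
optirun nsight &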