How do I capture console output from a remote NSight session? - cuda

I have a set of CUDA apps that both write to the console via cout. I have a host machine with VS and NSight plug-in and a target machine with NSight service. However, when I execute the console app, it actually runs on the target machine (literally pops up a console).
So here's the question: how can I get the console to show up on the host and only the GPU stuff to execute on the target? Is this even possible?
Thanks!

The short answer is that it is currently not possible. The application on the target is executed by the Nsight Monitor process but Nsight Monitor currently does not forward the output back to host.
Currently your only option is to take care of it your self by capturing the output of your application on the target and somehow display it on the host.
If this feature is important to you i suggest you file a feature request via your Nvidia developer account.

The CUDA application completely runs on the target machine, so the console or UI for the application will be seen on the target machine only. You can set breakpoints in the GPU code in the VS side (your host machine), and it should break there.
If you feel the application quits too quickly and is not launching the kernels as expected (and you are not hitting the breakpoints), it may be that you have not deployed all the required DLLs on the target machine (e.g. CUDART).

Related

What is the difference between qemu-sparc64 and qemu-system-sparc64?

I am trying to run some simple SPARC tests on bare metal QEMU. I am using qemu-sparc64 -g 1234 simple_example and seems to be working fine (I can connect gdb to localhost:1234, step through, etc) but was wondering what does qemu-system-sparc64 do ? I tried running it with the same cmd line switches but got some errors. Any help is appreciated, thank you.
For any QEMU architecture target, the qemu-system-foo binary runs a complete system emulation of the CPU and all other devices that make up a machine using that CPU type. It typically is used to run a guest OS kernel, like Linux; it can run other bare-metal guest code too.
The qemu-foo binary (sometimes also named qemu-foo-static if it has been statically linked) is QEMU's "user-mode" or "linux-user" emulation. This expects to run a single Linux userspace binary, and it translates all the system calls that process makes into direct host system calls.
If you're running qemu-sparc64 then you are not running your program in a bare-metal environment -- it's a proper Linux userspace process, even if you're not necessarily using all of the facilities that allows. If you want bare-metal then you need qemu-system-sparc64, but your program needs to actually be compiled to run correctly on the specific machine type that you tell QEMU to emulate (eg the Sun4u hardware, which is the default). Also, by default qemu-system-sparc64 will run the OpenBIOS firmware, so your bare-metal guest code needs to either run under that OpenBIOS environment, or else you need to tell QEMU not to run the BIOS (and then you get to deal with all the hardware setup that the BIOS would do for you).

How to set breakpoint on symbols in qemu user mode emulated process?

As qemu user mode emulation doesn't support ptrace system call, I am trying to debug a qemu user mode emulated process via qemu's gdbstub, and use another gdb instance to connect to it via target remote :1234.
This works fine for some basic command like si to single step instructions, but I cannot set breakpoint on the symbols in the main emulated executable such as main. Simply run break main will say the breakpoint is set to some raw un-relocated address (like 0x63a, but if I hit c in gdb client, the symbols for the main executable is never resolved to the real virtual address, and then is never hit.
Is this a general issue for debugging qemu user mode emulated process, and is there any way to set up the breakpoint correctly?

How to profile CUDA code on a headless node?

I'm working on a CUDA application I'd like to profile. Up to now all I've used is the command line profiler, nvprof, which just displayes the summarized statistics.
I thought about using the GUI profiler, NVVP. The problem is that the remote Linux node I'm running the application on doesn't have anything GUI (even X.org). Moreover, even if I managed to get some X11 stack on the remote node, keeping my own laptop alive for the whole time of the profiling would be, well, tedious.
I tried collecting all the needed information in the following way:
nvprof --analysis-metrics -o application.nvprof ./myapplication
Then I copy the output file onto my laptop and view it in NVVP. This has three problems, though.
First of all, I don't get any file transfer information when I load the output file into NVVP. It's not shown at all in the NVVP window.
Secondly, the call graph is completely distorted. The gaps between kernel launches are at least 100x bigger than the kernel durations, which makes any dependency and flow analysis impossible.
Lastly, my application uses a lot of the GPU memory. During the profiling the device gets out of memory, which is not the case during the standalone run.
How should I properly profile my CUDA application on a headless node?
NVVP supports headless nodes as a first-class citizen. Remote profiling is a major feature of NVVP.
The way this works is that NVVP runs on your local GUI-enabled host machine and invokes nvprof on the headless machine, generates the required files there, copies the files over, and opens them. All of this happens transparently and automatically. You can run further analyses from NVVP as usual and it will repeat these steps for you.
To use remote profiling, open NVVP, then File->New Session. Add a Connection instead of using Local, putting in details of the headless machine. Click on Manage... to point NVVP to the toolkit path on the remote machine. Once this one-time setup is done, enter the path to the executable and run as usual.
You can read about remote profiling in the relevant documentation.

Remote Debugging won't connect

I sort of make shift followed this guide on how to setup remote debugging. Since I am using Adobe Animate to compile my app I assume it has done the majority of the build steps already as I get a similar screen described.
I don't understand though. Here I have port forwarding up on my router so that it goes to my PC. I have TCP port 7935 up and open. Windows firewall on or off doesn't seem to make difference. Windows firewall even prompted me to allow or deny fdb after I ran it. I can't get my phone to connect via remote debugging. I want to be able to send this to my client who is having issue with the app so I can see what's going on under the hood instead of relying on a giant sum of try/catch statements and screenshots. Any help?
I tried a dummy domain and it seems to know that it can't connect to it. When I try mine or my IPv4 it doesn't let me connect. It just freezes up the app.
I don't know whether it works or not in Animate CC, but it works via Flash Builder. I'm using Android real device and I have Android SDK tools installed on my PC
Yes, I have followed that tuts from official Adobe docs, but that doesn't work
First: Simply connect your device to your PC
Actually , you can debug your app remotely as long as your device has been connected with your PC. This step, doesn't necessarily requires FDB.
In my case , all I need was things like
adb connect 192.168.xx.xx:port
this will connect your Android device with your PC on your default network .
Second, set debug setting over network
You've done it in Animate CC, with addition you might want to check "install application on the connected device'
Third, just debug as usual
You can get all those debugging stuff including traces

Sikuli with jenkins setup for continuous integration

I have my test writtern in Sikuli. If I RDP into my Jenkins machine and have an active session then all sikuli test pass.
However, for overnight run, my Jenkins machine do get locked. I want to understand if anyone has encountered and solved this issue before. Thanks!
Note: I cannot leave my Jenkins slave unlocked due to security reasons.
It's a known limitation of RDP.
Two optional solutions:
install VNC Server (like UltraVNC), and run it as Windows service (make sure it is launched during Windows logon).
OR
Create a batch file that disconnects Remote Desktop, and use it instead of closing the RDP session with the regular X button. The batch command is:
%windir%\system32\tscon.exe %SESSIONNAME% /dest:console