Building GPL C program with CUDA module

I am attempting to modify a GPL program written in C. My goal is to replace one method with a CUDA implementation, which means I need to compile with nvcc instead of gcc. I need help building the project, not implementing it (you don't need to know anything about CUDA C to help, I don't think).
This is my first time trying to change a C project of moderate complexity that uses a configure script and a Makefile. Honestly, this is my first time doing anything in C in a long time, including anything involving gcc or g++, so I'm pretty lost.
I'm not super interested in learning configure and Makefiles - this is more of an experiment. I would like to see if the project implementation goes well before spending time creating a proper build script. (Not unwilling to learn as necessary, just trying to give an idea of the scope).
With that said, what are my options for building this project? I have a myriad of questions...
I tried adding "CC=nvcc" to the configure.in file after AC_PROG_CC. This appeared to work: the output from running configure and make showed nvcc as the compiler. However, make failed to compile the source file with the CUDA kernel, not recognizing the CUDA-specific syntax. I don't know why; I was hoping this would just work.
Is it possible to compile a source file with nvcc, and then include it at the linking step in the make process for the main program? If so, how? (This question might not make sense - I'm really rusty at this)
What's the correct way to do this?
Is there a quick and dirty way I could use for testing purposes?
Is there some secret tool everyone uses to set up and understand these configure scripts and Makefiles? This is even worse than the Apache Ant scripts I'm used to. (Yeah, I'm out of my element.)

You don't need to compile everything with nvcc. Your guess that you can just compile your CUDA code with NVCC and leave everything else (except linking) is correct. Here's the approach I would use to start.
Add 1 new header (e.g. myCudaImplementation.h) and 1 new source file with a .cu extension (e.g. myCudaImplementation.cu). The source file contains your kernel implementation as well as a (host) C wrapper function that invokes the kernel with the appropriate execution configuration (aka <<<>>>) and arguments. The header file contains the prototype for the C wrapper function. Let's call that wrapper function runCudaImplementation().
I would also provide another host C function in the source file (with prototype in the header) that queries and configures the GPU devices present and returns true if it is successful, false if not. Let's call this function configureCudaDevice().
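Because nvcc compiles .cu files as C++, functions defined there get C++ (name-mangled) linkage, and C code calling them will fail at link time unless the header declares them extern "C". A minimal sketch of such a header, assuming the two function names above; the (void) signatures are placeholders for whatever arguments your wrapper needs, and HAVE_CUDA is a hypothetical macro you would define (e.g. -DHAVE_CUDA) only when the nvcc object is linked in, so a CPU-only build still compiles and simply reports that no device is configured.

```c
/* myCudaImplementation.h: sketch; HAVE_CUDA is a hypothetical macro */
#ifndef MY_CUDA_IMPLEMENTATION_H
#define MY_CUDA_IMPLEMENTATION_H

#include <stdbool.h>

#ifdef HAVE_CUDA /* define when myCudaImplementation.o is linked in */

#ifdef __cplusplus
extern "C" { /* nvcc compiles .cu as C++: give these C linkage for C callers */
#endif

bool configureCudaDevice(void);   /* defined in myCudaImplementation.cu */
void runCudaImplementation(void); /* host wrapper that launches the kernel */

#ifdef __cplusplus
}
#endif

#else /* CPU-only build: stubs so the C code still compiles and links */

static inline bool configureCudaDevice(void) { return false; }
static inline void runCudaImplementation(void) { }

#endif /* HAVE_CUDA */
#endif /* MY_CUDA_IMPLEMENTATION_H */
```

Include the same header in myCudaImplementation.cu as well, so the definitions there pick up the extern "C" declarations.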
Now, in your original C code, where you would normally call your CPU implementation, you can do this:
// must include your new header (and <stdbool.h> for bool in C)
#include <stdbool.h>
#include "myCudaImplementation.h"

// at app initialization
// store this variable somewhere you can access it later
bool deviceConfigured = configureCudaDevice();
...
// then later, at run time
if (deviceConfigured)
    runCudaImplementation();
else
    runCpuImplementation(); // run the original code
Now, since you put all your CUDA code in a new .cu file, you only have to compile that file with nvcc. Everything else stays the same, except that you have to link in the object file that nvcc outputs. For example:
nvcc -c -o myCudaImplementation.o myCudaImplementation.cu <other necessary arguments>
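Then add myCudaImplementation.o to your link line, together with the CUDA runtime library that the nvcc-generated object depends on (something like the following, assuming the default /usr/local/cuda install location):
g++ -o myApp <your other object files> myCudaImplementation.o -L/usr/local/cuda/lib64 -lcudart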
Now, if you have a complex app that uses configure and already has a complex makefile, it may be more involved than the above, but this is the general approach. The bottom line is that you don't want to compile all of your source files with nvcc, just the .cu ones. nvcc decides how to treat a file by its extension: .c files are handed to the host C compiler without CUDA processing (which is likely why your kernel syntax was rejected), while .cu files go through the CUDA front end. Use your host compiler for everything else.
I'm not an expert with configure, so I can't really help there. You may be able to run configure to generate a makefile and then edit that makefile. It won't be a general solution, but it will get you started.
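Note that in some cases (for example, when device code in one .cu file calls __device__ functions defined in another), you may also need to separate compilation of your .cu files from linking them. In that case you need nvcc's separate compilation and linking functionality (relocatable device code, e.g. nvcc -dc), for which this blog post might be helpful.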

Related

Is it possible to execute part of the decompiled code?

I am currently trying to solve a reversing challenge, where C code was compiled for a 32-bit Linux system.
To solve this challenge I am trying to use Ghidra, but I am facing a few issues. Here is a summary of what I have done up to this point:
I have two operating systems available to me: a 64-bit Linux system on my laptop and this 64-bit Windows 10 machine. Apparently the program was compiled with gcc without the -g option, which makes Ghidra fail to debug the program. Manually debugging it with gdb in the terminal is possible but terrible to use (at least for me).
So all I can do is look at the assembly code in Ghidra's CodeBrowser and the corresponding decompiled C code. From that I understood that some of the instructions are decrypted at runtime, and to analyse the code further, I want to be able to execute parts of the instructions to slowly but surely decrypt and understand the hidden parts of the program.
The only issue is that I do not know how to do that. I have noticed that Ghidra can run Java code, but all the example scripts provided with Ghidra that I looked at only let me patch hardcoded instructions into the program, not actually execute or evaluate them.
My specific issue is the following part of the program (the part marked in green):
Ghidra has all the knowledge it needs to execute this part; I just do not know how to make it do that. I could of course do it by hand, but that is boring and not really why I am doing these challenges. For the same reason, I am not looking for finished scripts that unpack this program for me, but for a way to carry out my own analysis.
To summarize my question: I am asking for a way to execute the green-marked decryption part of the target program in Ghidra without starting the debugger (since the Ghidra debugger keeps failing on me).
I think you are mixing up a few things here. You say:
the programm was compiled with gcc without a -g option making ghidra fail to debug the programm
The debug information added with -g makes it easier to analyse and debug a program, because it provides information that would otherwise have to be recovered by reverse engineering. It should not affect whether you can run the program under a debugger in the first place, and as you noted, running it with gdb in the terminal works. The Ghidra debugger basically just runs gdb in the background and connects to it to exchange information, so it should work as well.
You have a few options now:
1. Get the Ghidra Debugger to run with this binary
Whatever issue you are encountering with the Ghidra debugger is probably a valid question for https://reverseengineering.stackexchange.com/
From there you can pursue your initial plan of solving this via debugging.
2. Write a GhidraScript to reimplement the decryption
First, understand the basic idea of what you correctly recognized as some kind of decryption loop. Then you can use one of Ghidra's scripting options[0] to write a simple script that reimplements this decryption but writes the decrypted values directly to Ghidra's memory.
Any scripting language will include basic arithmetic operations like +, - and XOR, as well as loops, and the Ghidra API (FlatProgramAPI, which GhidraScript extends) provides the functions byte getByte(Address address) and void setByte(Address address, byte value). If you run into any issues or API questions while writing this script, that will also be a valid follow-up question for the RE Stack Exchange.
This approach has the advantage that you can then statically analyse the resulting data inside Ghidra again, e.g. disassemble the resulting code.
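The core of such a script is a read, decrypt, write-back loop. A minimal sketch in C of that pattern, assuming a hypothetical repeating-key XOR scheme (the real routine has to be read off the decompiled code) and using a plain byte array to stand in for the program's memory; in a GhidraScript, each array read would be a getByte(addr) call and each write a setByte(addr, value) call.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decryption routine: XOR `len` bytes starting at offset `start`
   with a repeating key, writing the result back in place.
   In a GhidraScript the read would be getByte(addr), the write setByte(addr, v). */
void decrypt_region(uint8_t *mem, size_t start, size_t len,
                    const uint8_t *key, size_t key_len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t encrypted = mem[start + i];                        /* read  */
        mem[start + i] = (uint8_t)(encrypted ^ key[i % key_len]);  /* write */
    }
}
```

Because XOR is its own inverse, running the loop a second time over the same region restores the original bytes, which is a cheap sanity check before you re-disassemble the patched area.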
[0] Ghidra natively supports Java and Python 2.7 (Jython) scripts and a rudimentary Python REPL, but there are other options, such as Jupyter, and Kotlin, Ruby and Clojure scripts.

Is there a list of headers that can be used in a string to compile with NVRTC? [duplicate]

Specifically, my issue is that I have CUDA code that needs <curand_kernel.h> to run. This isn't included by default in NVRTC. Presumably then when creating the program context (i.e. the call to nvrtcCreateProgram), I have to send in the name of the file (curand_kernel.h) and also the source code of curand_kernel.h? I feel like I shouldn't have to do that.
It's hard to tell; I haven't managed to find an example from NVIDIA of someone needing standard CUDA headers like this as a source, so I really don't understand what the syntax is. Some issues: curand_kernel.h also has includes of its own. Do I have to do the same for each of these? I am not even sure the NVRTC compiler will run correctly on curand_kernel.h, because there are some language features it doesn't support, aren't there?
Next: if I've sent the source code of a header file to nvrtcCreateProgram, do I still have to #include it in the code to be compiled, or will it cause an error if I do so?
A link to example code that does this or something like it would be appreciated much more than a straightforward answer; I really haven't managed to find any.
You have to send the "filename" and the source of each header separately.
When the preprocessor does its thing, it'll use any #include filenames as a key to find the source for the header, based on the collection that you provide.
I suspect that, in this case, the compiler (driver) doesn't have file system access, so you have to give it the source in much the same way that you would for shader includes in OpenGL.
So:
Pass your header's name and source when calling nvrtcCreateProgram (its includeNames and headers arguments). The compiler will, internally, build the equivalent of a std::map<string,string> containing the source of each header, indexed by the given name.
In your kernel source, use #include "foo.cuh" as usual.
The compiler will use foo.cuh as an index or key into its internal map (created when you called nvrtcCreateProgram), and will retrieve the header source from that collection.
Compilation proceeds as normal.
One of the reasons that nvrtc provides only a "subset" of features is that the compiler plays in a somewhat sandboxed environment, without necessarily having all of the supporting tools and utilities lying around that you have with offline compilation. So, you have to manually handle a lot of the stuff that the normal nvcc + (gcc | MSVC | clang) combination provides.
A possible, but non-ideal, solution would be to preprocess the file that you need in your IDE, save the result, and then #include that. However, I bet there is a better way to do it. If you just want curand, consider diving into the library and extracting the part you need (blech), or using another GPU-friendly random number implementation. On older CUDA versions, I just generated a big array of random floats on the host, uploaded it to the GPU, and sampled it in the kernels.
This related link may be helpful.
You do not need to load curand_kernel.h yourself and add it to the include "aliases" mechanism.
Instead, you can simply add the CUDA include directory to your (set of) include paths, e.g. by adding --include-path=/usr/local/cuda/include to your NVRTC compiler options.
(I do this in my GPU-kernel-runner test harness, by default, to be on the safe side.)

How to use Chisel C++ emulator

How do I use the C++ files generated by the Chisel compiler? The documentation is not clear on this; is there any other source you can point me to? I am really clueless here, especially since I don't know C++.
Say, for example, for a simple adder circuit Adder.scala, I get the following files related to the emulator:
Adder.cpp, Adder.h, Adder-emulator.cpp, emul_api.h, emulator.h and sim_api.h.
These I can compile by running
g++ Adder.cpp Adder-emulator.cpp
This generates the output a.out. Running it in the terminal generates three more files, and I have no clue what they are:
00003710.cmd, 00003710.in and 00003710.out.
The C++ code is used to build an emulation of your design. You need to also define a tester that will drive the emulation, using poke() to set signal values, and peek() or expect() to read them.
You should not be compiling the C++ yourself. If you pass the --genHarness and --test options to Chisel, it will compile the C++ code, build the emulation and run your tester to drive it.
Have a look at the chisel-tutorial code for examples of this process.

When does a file need to be compiled exactly?

This is probably a dumb question, but how do you know when something needs to go through a "build" or be compiled? My understanding is that when you code, you are writing source files, which then, in order to be understood by the computer, need to be compiled into executable files. But then, how come I can just write an index.html file and open it in my browser for it to work? When did the compiling happen in this situation? How does this compare to the compiling that happens when you use build tools like npm?
Compiling means translating source code into machine code that can be stored and later executed by the machine. Browsers do not do that as a separate step: they parse HTML and render it, and they parse and execute JavaScript directly without producing a stored executable. (Although, as mentioned in the comments, JavaScript engines often compile code on the fly into machine code for faster execution.)

How to use the cuBLAS library

I'm a novice to CUDA. I want to use cuBLAS functionality in my CUDA programs and would like to know the procedure for using it. My program gives the error "cublas_v2.h file not present" when I try to run it on the GPU.
The cuBLAS documentation is here, so you should familiarize yourself with it.
Try to run a sample program. There are samples in the CUDA samples that come with the CUDA 5 toolkit, or you can try the one here. Make sure you understand how to build and run that example.
If you're still having trouble with your own program, post a new question here with the specifics. For example, if your program is giving error output, paste in the complete error output that you are getting, and paste in the code from your program that is generating that error output (at least). Preferably paste in your whole program, or a simple version that will reproduce the error.
The error you're getting right now looks like it is coming from the compiler, not from running the program. If so, you simply need to tell the compiler where to find that header file (e.g. with -I/usr/local/cuda/include, assuming a default install). If you look at a CUDA sample like this one, you can see from the included makefile how to tell the compiler where to look for header files.
There are several possible reasons why you may be struggling to run code that uses the cuBLAS library.
The most important thing is to link your program with the -lcublas flag.
It should look like nvcc example.cu -o example -lcublas (without -c, which only compiles to an object file and skips the link step where -lcublas is used).
Secondly, confirm that you have the cuBLAS library on your system.
As the error says "cublas_v2.h file not present", search for the header, e.g. with find / -name cublas_v2.h 2>/dev/null (whereis only searches standard binary, source and man-page locations),
or search manually for the file. If it is not there, you need to install
the CUDA Toolkit, which includes cuBLAS, from NVIDIA's website.
Confirm your CUDA installation path and LD_LIBRARY_PATH.
Your CUDA path is typically /usr/local/cuda,
and LD_LIBRARY_PATH should include /usr/local/cuda/lib64 (or
/usr/local/cuda/lib on 32-bit systems).