How to use the cuBLAS library - CUDA

I'm a novice to CUDA. I want to use cuBLAS functionality in my CUDA programs and would like to know the procedure for using it, because my program gives the error *"cublas_v2.h file not present"* when I try to run it on the GPU.

The cuBLAS documentation is contained here, so you should familiarize yourself with it.
Try running a sample program. There are samples among the CUDA samples that come with the CUDA 5 toolkit, or you can try the one here. Make sure you understand how to build and run that example.
If you're still having trouble with your own program, post a new question here with the specifics. For example, if your program is producing error output, paste in the complete error output you are getting, and paste in (at least) the code from your program that is generating that error output. Preferably paste in your whole program, or a simplified version that reproduces the error.
The error you're getting right now looks like it is coming from the compiler, not from running the program. If so, you simply need to tell the compiler where to find that header file. If you look at a CUDA sample like this one, you can see from the included makefile how to tell the compiler where to look for header files.
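For instance, assuming a standard Linux install with the headers under /usr/local/cuda/include (adjust the path to match your installation), the include path and cuBLAS library can be passed like this:
nvcc -I/usr/local/cuda/include example.cu -o example -lcublas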

There are several things that could be preventing you from building and running code that uses the cuBLAS library.
The most important one is to link your code against cuBLAS with the -lcublas flag.
It should look like: nvcc example.cu -o example -lcublas
Secondly, confirm that the cuBLAS library is actually present on your system.
Since the error says "cublas_v2.h file not present", try doing "whereis cublas_v2.h"
or search manually for the file. If it is not there, you need to install
the CUDA Toolkit (which includes cuBLAS) from NVIDIA's website.
Finally, confirm your CUDA installation path and LD_LIBRARY_PATH.
Your CUDA path is typically /usr/local/cuda,
and LD_LIBRARY_PATH should include /usr/local/cuda/lib64 (or
/usr/local/cuda/lib on 32-bit systems).
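To verify the whole toolchain, a minimal cuBLAS program such as the sketch below should compile and run with nvcc example.cu -o example -lcublas. (The program is purely illustrative; it just scales a small vector on the GPU with cublasSscal.)

#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 4;
    float hx[n] = {1.0f, 2.0f, 3.0f, 4.0f};

    // copy the vector to the device
    float *dx;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);

    // create a cuBLAS context and scale the vector in place: x = 2 * x
    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 2.0f;
    cublasSscal(handle, n, &alpha, dx, 1);

    // copy the result back and print it
    cudaMemcpy(hx, dx, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        printf("%f\n", hx[i]);

    cublasDestroy(handle);
    cudaFree(dx);
    return 0;
}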


Is it possible to execute part of the decompiled code?

I am currently trying to solve a reversing challenge, where C code was compiled for a 32-bit Linux system.
To solve this challenge I am trying to make use of Ghidra, but I am faced with a few issues. A bit of a summary of what I have done up to this point:
I have two OSes available to me: a 64-bit Linux system on my laptop and this 64-bit Windows 10 machine. Apparently the program was compiled with gcc without the -g option, making Ghidra fail to debug the program. Manually debugging it with gdb in the terminal is possible but terrible to use (at least for me).
So all I can do is look at the assembly code in the CodeBrowser of Ghidra and its respective decompiled C code. With that I got to understand that some of the instructions are decrypted during the runtime of the program, and in order to further analyse the code, I want to be able to execute parts of the instructions to slowly but surely decrypt and understand the hidden parts of the program.
That being said, the only issue here is that I do not know how I can do that. I have noticed that Ghidra has the ability to run Java code, but all the examples provided by Ghidra that I looked at only allow me to patch hardcoded instructions into the program, not to actually execute/evaluate them.
My specific issue at hand is the following part of the program (the green-marked part):
Ghidra has all the knowledge it needs to execute this part; I just do not know how to make it do that. I could of course do it by hand, but that is just boring and not really why I am doing these challenges, which is the same reason I am not looking for finished scripts that unpack this program for me, but for a way to execute parts of it as part of my analysis.
Finally, to summarize my question: I am asking for a way to execute the green-marked decrypting part of the targeted program in Ghidra without starting the debugger (since the Ghidra debugger keeps failing on me).
I think you are mixing up a few things here. You say:
the programm was compiled with gcc without a -g option making ghidra fail to debug the programm
The debug information added with -g makes it easier to analyze and debug a program, because you have information that would otherwise have to be recovered by reverse engineering. It should not have any influence on whether you can run the program under a debugger in the first place, and as you noted, running it with gdb in the terminal works. The Ghidra debugger basically just runs gdb in the background and attaches to it to exchange information, so it should work.
You have a few options now:
1. Get the Ghidra Debugger to run with this binary
Whatever issue you are encountering with the Ghidra debugger is probably a valid question for https://reverseengineering.stackexchange.com/
From then on you can pursue your initial plan to solve this via debugging.
2. Write a GhidraScript to reimplement the decryption
Understand the basic idea of what you correctly recognized as some kind of decryption loop. Then you can use one of Ghidra's scripting options[0] to write a simple script that reimplements this decryption, but writes the decrypted values to the Ghidra memory directly (a rough sketch of this idea follows the footnote below).
Any scripting language will obviously include basic arithmetic operations like + and -, XOR, and loops, and the Ghidra API provides the functions byte getByte(Address address) and setByte(Address address, byte value). If you encounter any issues or API questions while writing this script, that will also be a valid follow-up question for the RE Stack Exchange.
This approach has the advantage that you can then statically analyse the resulting data inside Ghidra again, e.g. disassemble the resulting code.
[0] Ghidra natively supports Java-based scripts and Python 2.7 scripts (via Jython), plus a rudimentary Python REPL, but there are other options such as Jupyter integration and Kotlin, Ruby, or Clojure scripts.
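To make the reimplementation idea concrete, here is a minimal standalone C++ sketch, under the purely hypothetical assumption that the loop XORs each byte with a fixed one-byte key. In an actual GhidraScript the array accesses would become getByte()/setByte() calls on the relevant Address range, and the key and loop bounds would come from your analysis of the decompiled loop:

#include <cstdio>
#include <vector>

int main() {
    // Stand-in for the encrypted bytes you would read out of the program
    // in Ghidra (the values here are made up for the example).
    std::vector<unsigned char> buf = {0x32, 0x3f, 0x36, 0x36, 0x35};
    const unsigned char key = 0x5a;  // hypothetical key recovered by analysis

    // The reimplemented decryption loop: in a GhidraScript you would read
    // with getByte(addr), transform, then write back with setByte(addr, b).
    for (unsigned char &b : buf)
        b ^= key;

    for (unsigned char b : buf)
        printf("%02x ", b);  // inspect the decrypted bytes
    printf("\n");
    return 0;
}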

How to use clangd text-highlighting with CUDA on header files

I'm running VSCodium with the clangd extension, and I'd like to have proper CUDA highlighting (e.g. __device__ and __host__ keywords are understood). I ran CMake to generate the compile_commands.json file, and it includes CUDA-specific keywords (e.g. --cuda-gpu-arch=sm_52, --cuda-path=/usr/local/cuda). However, clangd still gives me the squiggle underline on CUDA-specific keywords. It seems like if clangd is just using the clang compiler to understand the source code, then clangd should work with CUDA (given that clang was able to compile the CUDA code).
So, is there any way to get clangd to work for CUDA? And if so, how do I do it via VSCodium?
Edit: As it turns out, the CUDA code highlighting works on the directly linked files (with a .cu extension), but it does not work on one of the included header files (with a .hpp extension). How do I get the text-highlighting to work on the header file?
Create a config file ".clangd" in your project directory and specify the location of the CUDA headers. This works for me:
CompileFlags:
  Add:
    - -xc++
    - --cuda-path=/path_to_cuda_installation
    - --cuda-gpu-arch=sm_52
    - -I/path_to_cuda_installation/include

How to use Chisel C++ emulator

How do I use the C++ files generated by the Chisel compiler? The documentation is not clear on this; is there any other source that explains it? I am really clueless here, especially since I don't know C++.
Say for example for a simple adder circuit Adder.scala I will get the following files related to the emulator:
Adder.cpp, Adder.h, Adder-emulator.cpp, emul_api.h, emulator.h and sim_api.h.
which I can compile by running
g++ Adder.cpp Adder-emulator.cpp
This generates the output a.out; running it in the terminal generates three more files that I have no clue about:
00003710.cmd, 00003710.in and 00003710.out.
The C++ code is used to build an emulation of your design. You also need to define a tester that will drive the emulation, using poke() to set signal values and peek() or expect() to read them.
You should not be compiling the C++ yourself. If you pass the --genHarness and --test options to Chisel, it will compile the C++ code, build the emulation and run your tester to drive it.
Have a look at the chisel-tutorial code for examples of this process.

CUDA 5.5 - Relocatable device code causes unresolved external symbol

I'm using CUDA 5.5 and my compiler is VC2012. My project consists of two .cu files. I need to use dynamic parallelism, so I have enabled the "Generate relocatable device code" option. As soon as this option is enabled I get the following linking error:
error LNK2001: unresolved external symbol __fatbinwrap_54_tmpxft_0000110c_00000000_8_cuda_device_runtime_cpp1_ii_5f6993ef
If I turn that option off the error disappears. The error does not depend on the contents of the files - if I comment out everything in them (#if 0 .... #endif so that they are essentially empty files) I still get the same error.
Update: I installed VC2010, and I still get the exact same error message!
Update 2: I got really tired of this whole thing, so I took a dynamic parallelism sample project for VC2010 from the SDK samples (cdpSimplePrint), replaced the files in it, and compiled. The issue is gone. I tried to compare the settings between the two projects, but I couldn't find anything. I don't know what caused it, but frankly, at this stage I no longer care, as long as I have a workaround.
Check that you are linking with the cudadevrt library, see the "Compiling and Linking" section in the CUDA Dynamic Parallelism Programming Guide for more info.
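In Visual Studio this typically means adding cudadevrt.lib to the linker's additional dependencies alongside enabling relocatable device code. For reference, the command-line equivalent is something like the following (file names are placeholders; dynamic parallelism requires compute capability 3.5 or higher):
nvcc -arch=sm_35 -rdc=true file1.cu file2.cu -o myApp -lcudadevrt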

Building GPL C program with CUDA module

I am attempting to modify a GPL program written in C. My goal is to replace one method with a CUDA implementation, which means I need to compile with nvcc instead of gcc. I need help building the project - not implementing it (You don't need to know anything about CUDA C to help, I don't think).
This is my first time trying to change a C project of moderate complexity that involves a configure script and a Makefile. Honestly, this is my first time doing anything in C in a long time, including anything involving gcc or g++, so I'm pretty lost.
I'm not super interested in learning configure and Makefiles - this is more of an experiment. I would like to see if the project implementation goes well before spending time creating a proper build script. (Not unwilling to learn as necessary, just trying to give an idea of the scope).
With that said, what are my options for building this project? I have a myriad of questions...
I tried adding "CC=nvcc" to the configure.in file after AC_PROG_CC. This appeared to work - output from running configure and make showed nvcc as the compiler. However, make failed to compile the source file with the CUDA kernel, not recognizing the CUDA-specific syntax. I don't know why; I was hoping this would just work.
Is it possible to compile a source file with nvcc, and then include it at the linking step in the make process for the main program? If so, how? (This question might not make sense - I'm really rusty at this)
What's the correct way to do this?
Is there a quick and dirty way I could use for testing purposes?
Is there some secret tool everyone uses to set up and understand these configure scripts and Makefiles? This is even worse than the Apache Ant scripts I'm used to. (Yeah, I'm out of my realm.)
You don't need to compile everything with nvcc. Your guess that you can compile just your CUDA code with nvcc and leave everything else alone (except the link step) is correct. Here's the approach I would use to start.
Add one new header file (e.g. myCudaImplementation.h) and one new source file (with a .cu extension, e.g. myCudaImplementation.cu). The source file contains your kernel implementation as well as a (host) C wrapper function that invokes the kernel with the appropriate execution configuration (aka <<<>>>) and arguments. The header file contains the prototype for the C wrapper function. Let's call that wrapper function runCudaImplementation().
I would also provide another host C function in the source file (with prototype in the header) that queries and configures the GPU devices present and returns true if it is successful, false if not. Let's call this function configureCudaDevice().
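A minimal sketch of what those two files might look like is below; the kernel body and names are illustrative, and the extern "C" wrappers assume the rest of the program is compiled as C:

// myCudaImplementation.h
#ifndef __cplusplus
#include <stdbool.h>   /* for bool when included from C */
#endif
#ifdef __cplusplus
extern "C" {
#endif
bool configureCudaDevice(void);
void runCudaImplementation(void);
#ifdef __cplusplus
}
#endif

// myCudaImplementation.cu
#include <cuda_runtime.h>
#include "myCudaImplementation.h"

// placeholder kernel; the real one implements the method you are replacing
__global__ void myKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

extern "C" bool configureCudaDevice(void) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0)
        return false;                        // no usable GPU: use CPU path
    return cudaSetDevice(0) == cudaSuccess;  // select the first device
}

extern "C" void runCudaImplementation(void) {
    const int n = 1024;
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    myKernel<<<(n + 255) / 256, 256>>>(d, n);  // execution configuration
    cudaDeviceSynchronize();
    cudaFree(d);
}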
Now in your original C code, where you would normally call your CPU implementation, you can do this.
// must include your new header
#include "myCudaImplementation.h"
// at app initialization
// store this variable somewhere you can access it later
bool deviceConfigured = configureCudaDevice();
...
// then later, at run time
if (deviceConfigured)
runCudaImplementation();
else
runCpuImplementation(); // run the original code
Now, since you put all your CUDA code in a new .cu file, you only have to compile that file with nvcc. Everything else stays the same, except that you have to link in the object file that nvcc outputs. e.g.
nvcc -c -o myCudaImplementation.o myCudaImplementation.cu <other necessary arguments>
Then add myCudaImplementation.o, along with the CUDA runtime library, to your link line (something like:)
g++ -o myApp myCudaImplementation.o <your other object files> -L/usr/local/cuda/lib64 -lcudart
Now, if you have a complex app to work with that uses configure and has a complex makefile already, it may be more involved than the above, but this is the general approach. Bottom line is you don't want to compile all of your source files with nvcc, just the .cu ones. Use your host compiler for everything else.
I'm no expert with configure, so I can't really help there. You may be able to run configure to generate a makefile and then edit that makefile - it won't be a general solution, but it will get you started.
Note that in some cases you may also need to separate compilation of your .cu files from linking them. In this case you need to use NVCC's separate compilation and linking functionality, for which this blog post might be helpful.
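In that case the steps look roughly like this (names and paths are illustrative):
nvcc -rdc=true -c myCudaImplementation.cu -o myCudaImplementation.o
nvcc -dlink myCudaImplementation.o -o deviceLink.o -lcudadevrt
g++ -o myApp <your other object files> myCudaImplementation.o deviceLink.o -L/usr/local/cuda/lib64 -lcudart -lcudadevrt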