How do I get a PTX file to execute - cuda

I know how to generate a .ptx file from a .cu and how to generate a .cubin file from a .ptx. But I don't know how to get the final executable.
More specifically, I have a sample.cu file, which is compiled to sample.ptx. I then use nvcc to compile sample.ptx to sample.cubin. However, this .cubin file cannot be executed directly without host code. How can I link the .cubin file with my original .cu file to produce the final executable?

You should be able to run PTX code directly from the CUDA Driver API with cuModuleLoadDataEx. There is an example here at page 5.
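To make the Driver API route a bit more concrete, here is a minimal host-side sketch. It rests on several assumptions that are not in the question: the PTX is in a file named sample.ptx, the kernel was declared extern "C" under the hypothetical name my_kernel, it takes no arguments, and the program is linked against the driver library (for example with -lcuda). Treat it as a starting point rather than a reference implementation.

```cpp
#include <cuda.h>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>

// Abort with a readable message if a Driver API call fails.
static void check(CUresult r, const char* what) {
    if (r != CUDA_SUCCESS) {
        const char* msg = nullptr;
        cuGetErrorString(r, &msg);
        std::fprintf(stderr, "%s failed: %s\n", what, msg ? msg : "unknown error");
        std::exit(1);
    }
}

int main() {
    // Read the PTX text (assumed file name: sample.ptx).
    std::ifstream in("sample.ptx");
    if (!in) {
        std::fprintf(stderr, "cannot open sample.ptx\n");
        return 1;
    }
    std::stringstream ss;
    ss << in.rdbuf();
    const std::string ptx = ss.str();  // c_str() gives the NUL-terminated image

    // Initialize the driver and make the primary context of device 0 current.
    check(cuInit(0), "cuInit");
    CUdevice dev;
    check(cuDeviceGet(&dev, 0), "cuDeviceGet");
    CUcontext ctx;
    check(cuDevicePrimaryCtxRetain(&ctx, dev), "cuDevicePrimaryCtxRetain");
    check(cuCtxSetCurrent(ctx), "cuCtxSetCurrent");

    // JIT-compile the PTX and load it as a module (no JIT options passed).
    CUmodule mod;
    check(cuModuleLoadDataEx(&mod, ptx.c_str(), 0, nullptr, nullptr),
          "cuModuleLoadDataEx");

    // Look up the kernel by name. "my_kernel" is a hypothetical name and
    // only matches if the kernel was declared extern "C".
    CUfunction fn;
    check(cuModuleGetFunction(&fn, mod, "my_kernel"), "cuModuleGetFunction");

    // Launch one block of one thread on the default stream. The kernel is
    // assumed to take no parameters, so kernelParams is null.
    check(cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, nullptr, nullptr, nullptr),
          "cuLaunchKernel");
    check(cuCtxSynchronize(), "cuCtxSynchronize");

    check(cuModuleUnload(mod), "cuModuleUnload");
    check(cuDevicePrimaryCtxRelease(dev), "cuDevicePrimaryCtxRelease");
    return 0;
}
```

The same call also accepts a cubin image in memory, and cuModuleLoad takes a file name instead of a pointer, so the sample.cubin from the question could be loaded the same way.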

As of CUDA 4.0, you can write inline PTX in CUDA kernels.

Related

How do I check, programmatically, which targets are available in a cubin?

Suppose I have a cubin file, or perhaps to make it easier, a cubin file I loaded into memory (so that I have a void* to the data).
Using the CUDA Driver API for modules, I can try loading the data into a module within the current context; and this would fail if compiled code is not available for a relevant target (and there's no PTX which could be JITed instead). But - what I actually want to do is check which targets have code in the module data (or module file)?
Non-programmatically, I know I can invoke:
cuobjdump my.fatbin
and get a listing of what's in there. But I want to do it from within my application's code.
Is this possible?
You could call cuobjdump from within your program and parse its output.
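#include <cstdlib>
#include <string>

// Empty kernel, present only so that the executable contains device code
__global__
void kernel(){
}

int main(int argc, char** argv){
    // Build the command "cuobjdump <this executable> > out.txt"
    std::string command{};
    command += "cuobjdump ";
    command += argv[0];
    command += " > out.txt";
    // Run it; the listing ends up in out.txt, ready to be read back and parsed
    int sysret = system(command.c_str());
    kernel<<<1,1>>>();
    cudaDeviceSynchronize();
    return sysret;
}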
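The parsing half is not shown above, so here is a rough sketch of it. It assumes the listing consists of blocks introduced by a line such as "Fatbin elf code:" or "Fatbin ptx code:", each followed by an "arch = sm_XX" line; that layout is an assumption about cuobjdump's text output and should be checked against the version you have installed. FatbinEntry and parse_cuobjdump are hypothetical names.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// One block of the listing (hypothetical helper type).
struct FatbinEntry {
    std::string kind;  // "elf" or "ptx", taken from the "Fatbin <kind> code:" line
    std::string arch;  // for example "sm_70"
};

// Scan cuobjdump-style text and collect a (kind, arch) pair per block.
std::vector<FatbinEntry> parse_cuobjdump(const std::string& text) {
    std::vector<FatbinEntry> entries;
    std::istringstream in(text);
    std::string line, kind;
    const std::string head = "Fatbin ", tail = " code:", key = "arch = ";
    while (std::getline(in, line)) {
        // Drop trailing whitespace such as '\r'.
        while (!line.empty() &&
               std::isspace(static_cast<unsigned char>(line.back())))
            line.pop_back();
        if (line.size() > head.size() + tail.size() &&
            line.compare(0, head.size(), head) == 0 &&
            line.compare(line.size() - tail.size(), tail.size(), tail) == 0) {
            // Remember which kind of block the following lines belong to.
            kind = line.substr(head.size(),
                               line.size() - head.size() - tail.size());
        } else if (line.size() > key.size() &&
                   line.compare(0, key.size(), key) == 0) {
            entries.push_back({kind, line.substr(key.size())});
        }
    }
    return entries;
}
```

To use it, read out.txt into a std::string and pass it to parse_cuobjdump; entries whose kind is "ptx" are the ones that could still be JIT-compiled for a newer device.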
You may be able to do this using an ELF parser.
It seems that cubin files are actually slightly-non-standard ELF files. Specifically, they have an .nv_fatbin ELF section, containing regions with compiled code for different targets; see this analysis.
If you used an ELF library, and made it accept some invalid/different magic numbers / version numbers, it would probably parse the cubin file in a way you could then easily extract your meta-data of interest from, including the target architecture for each region.
See also how a cubin file is generated, using an ELF library, here (but note that's Python code with a lot of magic numbers and somewhat difficult to follow).
Now it's just the "simple matter of coding", adapting an ELF parser to collect and expose what you need. The remaining question is whether all of the relevant information is in the ELF meta-data or whether you also need to do further parsing of the contents.
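As a first step in that direction, the sketch below reads a few fields from a 64-bit little-endian ELF header held in memory, using only the standard ELF64 layout. ElfSummary and summarize_elf64 are hypothetical names. As far as I know, elf.h lists EM_CUDA as machine value 190, so that is what you would expect to see for a cubin; how the target architecture is encoded in e_flags (or elsewhere) is not covered here and would have to be verified separately.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// The few header fields this sketch extracts (hypothetical helper type).
struct ElfSummary {
    std::uint16_t machine;   // e_machine
    std::uint32_t flags;     // e_flags
    std::uint16_t sections;  // e_shnum
};

// Read a little-endian unsigned integer of type T starting at p[offset].
template <typename T>
static T read_le(const unsigned char* p, std::size_t offset) {
    T v = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        v = static_cast<T>(v | (static_cast<T>(p[offset + i]) << (8 * i)));
    return v;
}

// Returns false unless `data` starts with a little-endian ELF64 header.
bool summarize_elf64(const void* data, std::size_t size, ElfSummary& out) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    if (size < 64) return false;                        // ELF64 header is 64 bytes
    if (std::memcmp(p, "\x7f" "ELF", 4) != 0) return false;  // magic number
    if (p[4] != 2 || p[5] != 1) return false;           // ELFCLASS64, ELFDATA2LSB
    out.machine  = read_le<std::uint16_t>(p, 18);       // e_machine at offset 18
    out.flags    = read_le<std::uint32_t>(p, 48);       // e_flags at offset 48
    out.sections = read_le<std::uint16_t>(p, 60);       // e_shnum at offset 60
    return true;
}
```

With the void* from the question, a call like summarize_elf64(ptr, length, summary) tells you whether the data looks like an ELF image at all, before any attempt to walk its sections.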

Inject a preprocessor definition into the Eclipse parser for a certain file type?

I'm using Eclipse CDT (actually, NVIDIA's Nsight, but the same goes for both) to edit some source files. Now, some of these are meant for use both with nvcc and with regular host-side compilers, and have some instances of:
#ifdef __CUDACC__
something
#else
some other thing
#endif
I want to get the __CUDACC__ part when the preprocessor is reaching a file while parsing a .cuh or .cu, but not when reaching it while parsing a .h or .cpp (or .c). Now, I know I can inject a preprocessor define through the project settings (using the "built-in compiler" command line), but I was wondering whether it's possible to make that conditional on the extension of the file being parsed originally (i.e. the file being edited in the IDE).
How are you configuring the project's include paths and defined macros?
If you use the Build Output Parser, can you arrange for the build system to include -D __CUDACC__ in the compiler commands for .cu files, but not for the compiler commands for .cpp files?
CDT allows for each file in the project to have its own settings, and the Build Output Parser will assign each file that has a compilation command in the build output its own settings based on the flags that appear in the command, so things should just work.

How does Cocos2d-x keep code (C++ file) safe from decompilation?

I just decompiled an APK using dex2jar and JD-GUI to access the source code, but I could not find any .h or .cpp files in the classes.dex folder. I want to know how Cocos2d-x takes care of that issue, and where all these files are stored.
The C++ files are compiled into a dynamic library in lib/armeabi/, with a name like libxxx.so.

cython extensions using cuda

I have a conv net implementation as a C++ class. The class is built on top of a template library ( mshadow ) that generates CUDA code, so it takes the form of a header file. Consequently, it can only be used in files compiled using nvcc. I am now trying to wrap this class in Python for easier loading and saving of parameters, data, etc.
How do I go about wrapping the C++ class using Cython? I looked at npcuda-example, which demonstrates how to write a wrapper pyx file around a C++ class. Unfortunately, in this example, the pyx file compiles to a cpp file. This will not work for me because I need to include the class header in the pyx file and compile it using nvcc.
I believe I could use the setup.py from npcuda-example if there were some way to force the wrapper pyx file to compile to a cu file so that nvcc would be called when distutils tried to compile the extension.
Any ideas?
In npcuda-example, wrapper.pyx is combined with the *.cu code by declaring
cdef extern from "src/manager.hh"
I guess that is exactly what you want?

CImg library in Cuda

I am working on some CUDA C code in VS2008 on Windows 7. I have a matrix of floats which is to be displayed as an image. I saved it as a .bin file, loaded it in a separate .cpp file, and successfully formed the image using the CImg library. However, when I try to add similar code to the .cu file, compilation gives the strange error shown below:
error: identifier "_ZN12cimg_library4cimg9superset2IfffE4typeE" is undefined
The code snippet I tried adding to the .cu file is given below:
#include <CImg.h>
using namespace cimg_library;
....host code.....continues...
CImg<float> img1(448,448);
for (int nn=0;nn<200704;nn++)
    img1[nn] = dR[nn]; // dR is obtained after cudaMemcpy DtoH
img1.display();
I can't find much help on the forums about this, or about using CImg with CUDA in general.
Is there any way I can use CImg with CUDA?
Thanks
My suggestion is to move the code that uses CImg to a .cpp file. The code in the .cpp file would then invoke the host/device code in the .cu file. The code in the .cu file then returns a pointer or reference to the matrix of floats back to the code in the .cpp file.
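A rough sketch of that split is shown below, folded into one listing so it can be built without CUDA or CImg. All file and function names (matrix_api.h, display.cpp, kernels.cu, compute_matrix, fetch_matrix) are hypothetical, and the body standing in for the .cu file is a plain loop; in a real project it would launch the kernel and copy the result back with cudaMemcpy.

```cpp
#include <cstddef>
#include <vector>

// ---- matrix_api.h (hypothetical): plain C++ only, safe to include anywhere ----
// Fills `out` with rows*cols floats computed on the device.
void compute_matrix(float* out, int rows, int cols);

// ---- display.cpp (hypothetical): built by the host compiler, never by nvcc ----
// #include <CImg.h> would go here, and only here.
std::vector<float> fetch_matrix(int rows, int cols) {
    std::vector<float> host(static_cast<std::size_t>(rows) * cols);
    compute_matrix(host.data(), rows, cols);
    // With CImg available, the data could then be shown, for example:
    //   cimg_library::CImg<float> img(host.data(), cols, rows);
    //   img.display();
    return host;
}

// ---- kernels.cu (hypothetical): built by nvcc ----
// Stand-in body so the sketch builds without CUDA. The real version would
// launch a kernel and cudaMemcpy the device result into `out`.
void compute_matrix(float* out, int rows, int cols) {
    for (int i = 0; i < rows * cols; ++i)
        out[i] = static_cast<float>(i);
}
```

The point of the header is that it mentions neither CUDA nor CImg types, so nvcc never has to parse CImg.h and the host compiler never sees kernel syntax.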
NVIDIA's nvcc is a compiler driver. It invokes a C/C++ compiler to compile files with a .c or .cpp file name. However, a .cu file has special meaning to nvcc: it does some parsing of its own to look for kernel functions and certain #pragmas. I'm not an expert, but I know there is a copy of the manual floating around. Here is a link to an older copy of the manual.