Suppose I have a cubin file, or perhaps to make it easier, a cubin file I loaded into memory (so that I have a void* to the data).
Using the CUDA Driver API for modules, I can try loading the data into a module within the current context; this would fail if no compiled code is available for a relevant target (and there's no PTX which could be JITed instead). But what I actually want to do is check which targets have code in the module data (or module file).
Non-programmatically, I know I can invoke:
cuobjdump my.fatbin
and get a listing of what's in there. But I want to do it from within my application's code.
Is this possible?
You could call cuobjdump from within your program and parse its output.
#include <cstdlib>
#include <string>

__global__
void kernel(){
}

int main(int argc, char** argv){
    // Run cuobjdump on this very executable and redirect the listing to a file.
    std::string command{};
    command += "cuobjdump ";
    command += argv[0];
    command += " > out.txt";
    int sysret = system(command.c_str());
    kernel<<<1,1>>>();
    cudaDeviceSynchronize();
    return sysret;
}
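Instead of redirecting to out.txt, the listing could be captured through a pipe and scanned in-process. The following is only a minimal sketch: it assumes a POSIX system (popen/pclose) and it assumes that cuobjdump prints one line of the form "arch = sm_XX" per embedded image, which should be checked against the output of the toolkit version actually in use. The names run_and_capture and extract_archs are hypothetical helpers, not part of any CUDA API.

```cpp
#include <stdio.h>    // popen/pclose are POSIX, not standard C++
#include <cstddef>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: run a shell command and return everything it
// wrote to standard output.
std::string run_and_capture(const std::string& command) {
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe) throw std::runtime_error("popen failed");
    std::string output;
    char buffer[4096];
    std::size_t n;
    while ((n = fread(buffer, 1, sizeof buffer, pipe)) > 0) {
        output.append(buffer, n);
    }
    pclose(pipe);
    return output;
}

// Hypothetical helper: collect the values of all lines that start with
// "arch = " (assumed cuobjdump listing format, e.g. "arch = sm_70").
std::set<std::string> extract_archs(const std::string& text) {
    const std::string key = "arch = ";
    std::set<std::string> archs;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.compare(0, key.size(), key) != 0) continue;
        std::string value = line.substr(key.size());
        // Drop trailing spaces or a carriage return.
        while (!value.empty() && (value.back() == ' ' || value.back() == '\r')) {
            value.pop_back();
        }
        archs.insert(value);
    }
    return archs;
}
```

It would be used along the lines of extract_archs(run_and_capture("cuobjdump my.fatbin")). Keeping the parsing separate from the process launch makes it easy to adjust if the listing format differs.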
You may be able to do this using an ELF parser.
It seems that cubin files are actually slightly-non-standard ELF files. Specifically, they have an .nv_fatbin ELF section, containing regions with compiled code for different targets; see this analysis.
If you used an ELF library, and made it accept some invalid/different magic numbers / version numbers, it would probably parse the cubin file in a way you could then easily extract your meta-data of interest from, including the target architecture for each region.
See also how a cubin file is generated, using an ELF library, here (but note that's Python code with a lot of magic numbers and somewhat difficult to follow).
Now it's just the "simple matter of coding", adapting an ELF parser to collect and expose what you need. The remaining question is whether all of the relevant information is in the ELF meta-data or whether you also need to do further parsing of the contents.
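As a starting point for that coding, here is a minimal sketch that reads only the ELF header of an in-memory image, without any ELF library. It assumes a plain 64-bit little-endian cubin and a little-endian host; a fatbin container would need its own wrapper unpacked first, which this sketch does not attempt. EM_CUDA (190) is the registered ELF machine number for CUDA. The idea that the SM version sits in the low 8 bits of e_flags is an assumption based on reports about older cubin ABI versions; newer toolkits may lay the flags out differently. The names CubinHeaderInfo, read_elf64_header and guess_sm_from_flags are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

constexpr std::uint16_t kEmCuda = 190;  // EM_CUDA in the ELF machine registry

struct CubinHeaderInfo {
    std::uint16_t machine;  // e_machine
    std::uint32_t flags;    // e_flags (target-specific)
};

// Read e_machine and e_flags from an ELF64 little-endian header in memory.
// Returns nothing if the buffer does not look like such a header.
std::optional<CubinHeaderInfo> read_elf64_header(const void* data, std::size_t size) {
    const auto* bytes = static_cast<const unsigned char*>(data);
    if (size < 64) return std::nullopt;                         // ELF64 header size
    if (std::memcmp(bytes, "\x7f" "ELF", 4) != 0) return std::nullopt;
    if (bytes[4] != 2 || bytes[5] != 1) return std::nullopt;    // ELFCLASS64, ELFDATA2LSB
    CubinHeaderInfo info{};
    std::memcpy(&info.machine, bytes + 18, sizeof info.machine);  // e_machine at offset 18
    std::memcpy(&info.flags, bytes + 48, sizeof info.flags);      // e_flags at offset 48
    return info;
}

// ASSUMPTION: older cubin ABIs are reported to keep the SM version
// (e.g. 70 for sm_70) in the low 8 bits of e_flags. Verify before relying on it.
unsigned guess_sm_from_flags(std::uint32_t flags) {
    return flags & 0xFFu;
}
```

A caller would first check that machine equals kEmCuda and only then interpret the flags. Whether the header alone is enough, or the section contents must be parsed too, remains the open question mentioned above.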
I have a cython source file in which I would like to import the local python module.
This cython source file is translated using cython (python3 syntax) into a c++ source, which in turn is compiled into a library, and then used from the main C++ program.
When the main program is executed, the import of the local python module fails because the location of that module is not known to the executing code. I tried using python3's local import features in my pyx file, but to no avail.
The only working solution I came up with (and the most obvious one) is to update python's module search path using sys.path.append. The problem is that I have to hardcode this path, which is ugly.
I tried to find out whether it is possible to retrieve, within cython code, the location of the source file (I could derive an absolute path from it), but without success. The usual pythonic ways to do that fail: for instance, __file__ evaluates to built-in, and retrieving the absolute path at runtime gives the path where the executable is being run.
Sidenote: one of the searches I did was by querying GitHub search engine for occurrences of sys.path.append in cython files. Interestingly, all results either have paths hardcoded or they are not related to the location of the cython source file within the file system.
So my question is: is it possible, within cython code, to reliably retrieve the location of its source file?
Disclaimer: I could imagine instrumenting the build system to pass a preprocessor variable, set to the path in question, while building the C++ file derived from the cython one, and then accessing it within the code, but this looks like overkill.
Example:
bulba.py
def fn():
    print('blah')
bulbulator.pyx
# tag: cpp
# tag: py3only
import sys
sys.path.append('/absolute_path_to_folder_with_bulba_py') # <-- this is the key part. I'd like to replace the hardcoded path with something better
from bulba import fn
fn()
bulbulator.pyx is translated into cpp with:
cython -3 --cplus bulbulator.pyx
lib_wrapper.cpp (this library, and executable which links against it, have a location different than that of py/pyx source code and its translated c++ part)
// standard headers used below
#include <cstdio>
#include <cstdlib>
// import headers generated by cython
#include "bulbulator_api.h"
#include "bulbulator.h"

// global initialization of the cythonized part
__attribute__((constructor))
static void
__library_init()
{
    if (int err = PyImport_AppendInittab("bulbulator", PyInit_bulbulator); err != 0)
    {
        std::fprintf(stderr, "PyImport_AppendInittab(bulbulator) failed with status code=%d\n", err);
        std::exit(1);
    }
    Py_Initialize();
    if (import_bulbulator() == -1) // <-- here it fails if I comment out sys.path.append, because bulbulator needs to know the location of bulba.py
    {
        PyErr_Print();
    }
}
I would rather put bulba.py next to the executable, but it is also possible to bake an absolute path into the Cython extension, for example using a compile-time environment variable (see cython --help for more details); let's call it ADDITIONAL_SYS_PATH:
import sys
sys.path.append(ADDITIONAL_SYS_PATH)
from bulba import fn
fn()
And now running Cython via:
cython -3 --cplus -E ADDITIONAL_SYS_PATH="the path to dir" bulbulator.pyx
will set the right value to ADDITIONAL_SYS_PATH.
Cython saves the name of the pyx-file in the resulting cpp (for example for run-time error reporting), but this name doesn't include the full path, so we need to provide the path manually.
I'm using Eclipse CDT (actually, NVIDIA's Nsight, but the same goes for both) to edit some source files. Now, some of these are meant for use both with nvcc and with regular host-side compilers, and have some instances of:
#ifdef __CUDACC__
something
#else
some other thing
#endif
I want the __CUDACC__ branch to be taken when the preprocessor reaches such a file while parsing a .cuh or .cu, but not when it reaches it while parsing a .h or .cpp (or .c). Now, I know I can inject a preprocessor define through the project settings (using the "built-in compiler" command line), but I was wondering whether it's possible to make that conditional on the extension of the file being parsed originally (i.e. the file being edited in the IDE).
How are you configuring the project's include paths and defined macros?
If you use the Build Output Parser, can you arrange for the build system to include -D __CUDACC__ in the compiler commands for .cu files, but not for the compiler commands for .cpp files?
CDT allows for each file in the project to have its own settings, and the Build Output Parser will assign each file that has a compilation command in the build output its own settings based on the flags that appear in the command, so things should just work.
I have a conv net implementation as a C++ class. The class is built on top of a template library (mshadow) that generates CUDA code, so it takes the form of a header file. Consequently, it can only be used in files compiled using nvcc. I am now trying to wrap this class in Python for easier loading and saving of parameters, data, etc.
How do I go about wrapping the C++ class using Cython? I looked at npcuda-example, which demonstrates how to write a wrapper pyx file around a C++ class. Unfortunately, in this example, the pyx file compiles to a cpp file. This will not work for me because I need to include the class header in the pyx file and compile it using nvcc.
I believe I could use the setup.py from npcuda-example if there were some way to force the wrapper pyx file to compile to a cu file so that nvcc would be called when distutils tried to compile the extension.
Any ideas?
In the npcuda-example, wrapper.pyx pulls in the code from the *.cu file by declaring
cdef extern from "src/manager.hh"
I guess it is exactly what you want?
I am working on some CUDA C code in VS2008 on Windows 7. I have a matrix of floats which is to be displayed as an image. I saved it as a .bin file, loaded it in a separate .cpp file, and successfully formed the image using the CImg library. However, when I try to add similar code to the .cu file, compilation fails with a strange error, shown below:
error: identifier "_ZN12cimg_library4cimg9superset2IfffE4typeE" is undefined
The code snippet I tried adding to the .cu file is given below:
#include <CImg.h>
using namespace cimg_library;
....host code.....continues...
CImg<float> img1(448,448);
for (int nn=0;nn<200704;nn++)
    img1[nn] = dR[nn]; // dR is obtained after cudaMemcpy DtoH
img1.display();
I can't find much help on the forums regarding this, or regarding the use of CImg with CUDA in general.
Is there any way I can use CImg with CUDA?
Thanks
My suggestion is to move the code that uses CImg to a .cpp file. The code in the .cpp file would then invoke the host/device code in the .cu file. The code in the .cu file then returns a pointer or reference to the matrix of floats back to the code in the .cpp file.
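A minimal sketch of that split, assuming a single entry point between the two files. The file names and the functions compute_matrix and fetch_matrix are hypothetical; the body of compute_matrix here is only a plain C++ stand-in so the sketch builds without CUDA, and the real one would launch the kernel and copy the result back with cudaMemcpy. The point is that only plain C++ types cross the boundary, so nvcc never has to parse CImg.h.

```cpp
#include <cstddef>
#include <vector>

// --- matrix_compute.h (included by both files) --------------------------
// Plain C++ declaration: no CUDA and no CImg types appear here.
void compute_matrix(float* out, std::size_t count);

// --- matrix_compute.cu (would be compiled by nvcc) ----------------------
// Stand-in body for illustration only. The real version would run the
// kernel and cudaMemcpy the device result into `out`.
void compute_matrix(float* out, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = static_cast<float>(i % 256);
    }
}

// --- display.cpp (host compiler only; the one file including CImg.h) ----
std::vector<float> fetch_matrix(std::size_t width, std::size_t height) {
    std::vector<float> host(width * height);
    compute_matrix(host.data(), host.size());
    // With CImg available, the image could be built from this buffer, e.g.
    //   cimg_library::CImg<float> img(host.data(), width, height);
    //   img.display();
    return host;
}
```

For the 448x448 case from the question, fetch_matrix(448, 448) returns the 200704 floats that the .cpp side would then hand to CImg.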
NVIDIA's nvcc is a compiler driver. It invokes a C/C++ compiler to compile files with a .c or .cpp extension. A .cu file, however, has special meaning to nvcc: it does some parsing and what-not to look for kernel functions and certain #pragmas. I'm not an expert, but I know there is a copy of the manual floating around. Here is a link to an older copy of the manual.
I know how to generate a .ptx file from a .cu and how to generate a .cubin file from a .ptx. But I don't know how to get the final executable.
More specifically, I have a sample.cu file, which is compiled to sample.ptx. I then use nvcc to compile sample.ptx to sample.cubin. However, this .cubin file cannot be directly executed without host code. How can I link .cubin file to my original .cu file to produce the final executable?
You should be able to run PTX code directly from the CUDA Driver API with cuModuleLoadDataEx. There is an example here, on page 5.
As of CUDA 4.0, you can write inline PTX in CUDA kernels.