CImg library in Cuda - cuda

I am working on CUDA C code in VS2008 on Windows 7. I have a matrix of floats that is to be displayed as an image. I saved it as a .bin file, loaded it in a separate .cpp file, and successfully formed the image using the CImg library. However, when I try to add similar code to a .cu file, compilation fails with a strange error, shown below:
error: identifier "_ZN12cimg_library4cimg9superset2IfffE4typeE" is undefined
The code snippet I tried adding to the .cu file is:
#include <CImg.h>
using namespace cimg_library;
....host code.....continues...
CImg<float> img1(448,448);
for (int nn=0;nn<200704;nn++)
img1[nn] = dR[nn]; // dR is obtained after a device-to-host cudaMemcpy
img1.display();
I can't find much help on the forums about this error or about using CImg with CUDA.
Is there any way I can use CImg with CUDA?
Thanks

My suggestion is to move the code that uses CImg to a .cpp file. The code in the .cpp file would then invoke the host/device code in the .cu file. The code in the .cu file then returns a pointer or reference to the matrix of floats back to the code in the .cpp file.
Nvidia's nvcc is a compiler driver. It invokes a C/C++ compiler to compile files with a .c or .cpp extension. However, a .cu file has special meaning to nvcc: it does some extra parsing to look for kernel functions and certain #pragmas. I'm not an expert, but I know there is a copy of the manual floating around. Here is a link to an older copy of the manual.
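A minimal sketch of that split, under some assumptions: `compute_matrix` and `build_image_buffer` are hypothetical names, and the body of `compute_matrix` below is only a CPU stand-in so the sketch compiles with a plain C++ compiler. In a real project that definition would live in the .cu file, launch the kernel and copy the result back with cudaMemcpy. The CImg calls are shown as comments because only the .cpp file would include CImg.h.

```cpp
#include <cstddef>
#include <vector>

// Declaration that would sit in a shared header (hypothetical name).
// It mentions no CUDA or CImg types, so nvcc and the host compiler both accept it.
void compute_matrix(float* out, std::size_t count);

// CPU stand-in for the definition that would live in the .cu file, where it
// would launch the kernel and cudaMemcpy the result into `out`.
void compute_matrix(float* out, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = static_cast<float>(i % 448);
    }
}

// This part would live in the .cpp file, the only file that includes CImg.h.
std::vector<float> build_image_buffer(int width, int height) {
    std::vector<float> pixels(static_cast<std::size_t>(width) * height);
    compute_matrix(pixels.data(), pixels.size());
    // With CImg available, the buffer could then be displayed, for example:
    //   cimg_library::CImg<float> img(pixels.data(), width, height);
    //   img.display();
    return pixels;
}
```

The point of the design is that the header shared between the two files contains only plain C++ declarations, so nvcc never has to parse CImg.h.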

Related

How do I check, programmatically, which targets are available in a cubin?

Suppose I have a cubin file, or perhaps to make it easier, a cubin file I loaded into memory (so that I have a void* to the data).
Using the CUDA Driver API for modules, I can try loading the data into a module within the current context; and this would fail if compiled code is not available for a relevant target (and there's no PTX which could be JITed instead). But - what I actually want to do is check which targets have code in the module data (or module file)?
Non-programmatically, I know I can invoke:
cuobjdump my.fatbin
and get a listing of what's in there. But I want to do it from within my application's code.
Is this possible?
You could call cuobjdump from within your program and parse its output.
#include <cstdlib>
#include <string>
__global__
void kernel(){
}
int main(int argc, char** argv){
std::string command{};
command += "cuobjdump ";
command += argv[0];
command += " > out.txt";
int sysret = system(command.c_str());
kernel<<<1,1>>>();
cudaDeviceSynchronize();
return sysret;
}
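The parsing half could look roughly like the sketch below. It assumes the listing contains one line of the form `arch = sm_XX` per embedded image, which is how cuobjdump output usually looks but is not guaranteed across toolkit versions. `parse_arch_lines` is a hypothetical helper name; it takes the text that was redirected to out.txt and returns the distinct target names.

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <string>

// Collects the distinct "arch = ..." values from cuobjdump-style text.
// Assumption: each embedded image is listed with a line of that form.
std::set<std::string> parse_arch_lines(const std::string& listing) {
    std::set<std::string> targets;
    std::istringstream lines(listing);
    std::string line;
    const std::string key = "arch = ";
    while (std::getline(lines, line)) {
        const std::size_t pos = line.find(key);
        if (pos == std::string::npos) continue;
        std::string value = line.substr(pos + key.size());
        // Trim trailing whitespace such as '\r' or spaces.
        const std::size_t end = value.find_last_not_of(" \t\r");
        if (end != std::string::npos) targets.insert(value.substr(0, end + 1));
    }
    return targets;
}
```

Reading out.txt into a std::string and passing it to this function would give a set such as {"sm_70", "sm_80"} for a binary built for those two targets.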
You may be able to do this using an ELF parser.
It seems that cubin files are actually slightly-non-standard ELF files. Specifically, they have an .nv_fatbin ELF section, containing regions with compiled code for different targets; see this analysis.
If you used an ELF library, and made it accept some invalid/different magic numbers / version numbers, it would probably parse the cubin file in a way you could then easily extract your meta-data of interest from, including the target architecture for each region.
See also how a cubin file is generated, using an ELF library, here (but note that's Python code with a lot of magic numbers and somewhat difficult to follow).
Now it's just the "simple matter of coding", adapting an ELF parser to collect and expose what you need. The remaining question is whether all of the relevant information is in the ELF meta-data or whether you also need to do further parsing of the contents.
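As a starting point, here is a minimal sketch that inspects only the top-level ELF header of data already loaded into memory. It assumes a 64-bit little-endian ELF layout and relies on the e_machine value 190, which is the ELF machine number registered for NVIDIA CUDA. `inspect_cubin_header` and `CubinHeaderInfo` are hypothetical names. The raw e_flags word is returned undecoded, because how the target architecture is packed into it differs between toolkit versions; it does not walk sections or fatbin regions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// ELF e_machine value registered for NVIDIA CUDA.
constexpr std::uint16_t kEmCuda = 190;

struct CubinHeaderInfo {
    bool is_elf;          // starts with the ELF magic bytes
    bool is_cuda;         // e_machine == 190
    std::uint32_t flags;  // raw e_flags, left undecoded on purpose
};

// Inspects the first 64 bytes of an in-memory image.
// Assumption: 64-bit, little-endian ELF header layout.
CubinHeaderInfo inspect_cubin_header(const void* data, std::size_t size) {
    CubinHeaderInfo info{false, false, 0};
    const unsigned char* b = static_cast<const unsigned char*>(data);
    if (size < 64 || std::memcmp(b, "\x7f" "ELF", 4) != 0) return info;
    info.is_elf = true;
    // EI_CLASS == 2 means 64-bit, EI_DATA == 1 means little-endian.
    if (b[4] != 2 || b[5] != 1) return info;
    // e_machine is at offset 18, e_flags at offset 48 in an ELF64 header.
    const std::uint16_t machine =
        static_cast<std::uint16_t>(b[18] | (b[19] << 8));
    info.flags = static_cast<std::uint32_t>(b[48]) |
                 (static_cast<std::uint32_t>(b[49]) << 8) |
                 (static_cast<std::uint32_t>(b[50]) << 16) |
                 (static_cast<std::uint32_t>(b[51]) << 24);
    info.is_cuda = (machine == kEmCuda);
    return info;
}
```

Enumerating per-target regions would still need the section-level parsing described above.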

Inject a preprocessor definition into the Eclipse parser for a certain file type?

I'm using Eclipse CDT (actually, nVIDIA's nSight, but the same goes for both) to edit some source files. Now, some of these are meant for use both with nvcc and with regular host-side compilers, and have some instances of:
#ifdef __CUDACC__
something
#else
some other thing
#endif
I want to get the __CUDACC__ part when the preprocessor is reaching a file while parsing a .cuh or .cu, but not when reaching it while parsing a .h or .cpp (or .c). Now, I know I can inject a preprocessor define through the project settings (using the "built-in compiler" command line), but I was wondering whether it's possible to make that conditional on the extension of the file being parsed originally (i.e. the file being edited in the IDE).
How are you configuring the project's include paths and defined macros?
If you use the Build Output Parser, can you arrange for the build system to include -D __CUDACC__ in the compiler commands for .cu files, but not for the compiler commands for .cpp files?
CDT allows for each file in the project to have its own settings, and the Build Output Parser will assign each file that has a compilation command in the build output its own settings based on the flags that appear in the command, so things should just work.

cython extensions using cuda

I have a conv net implementation as a C++ class. The class is built on top of a template library (mshadow) that generates CUDA code, so it takes the form of a header file. Consequently, it can only be used in files compiled using nvcc. I am now trying to wrap this class in Python for easier loading and saving of parameters, data, etc.
How do I go about wrapping the C++ class using Cython? I looked at npcuda-example, which demonstrates how to write a wrapper pyx file around a C++ class. Unfortunately, in this example, the pyx file compiles to a cpp file. This will not work for me because I need to include the class header in the pyx file and compile it using nvcc.
I believe I could use the setup.py from npcuda-example if there were some way to force the wrapper pyx file to compile to a cu file so that nvcc would be called when distutils tried to compile the extension.
Any ideas?
In the npcuda-example, wrapper.pyx pulls in the code from the *.cu files by declaring:
cdef extern from "src/manager.hh"
I guess it is exactly what you want?

SWIG TCL Static Linking

I am trying to use SWIG to generate wrappers for some of my C++ function calls.
Also, I am trying to build my own Tcl shell, so I need to statically link the generated SWIG libraries. I have my own main function with a Tcl_AppInit call where I do some prior setup.
To do this what function should I include in my program's Tcl_AppInit call? I found that SWIG_init is not the right function. I even tried Cell_Init where cell is the name of the class in my code, but that doesn't help either.
How do I statically link SWIG object files with my own main function and Tcl_AppInit call?
Currently I use the following command to link my executable:
g++ -o bin/icde src/core/*.o src/read/*.o src/swig/*.o src/icde/*.o -ltk -ltcl
I get the following error:
src/icde/main.o: In function `AppInit(Tcl_Interp*)':
main.cpp:(.text+0xa9): undefined reference to `Cell_Init(Tcl_Interp*)'
collect2: ld returned 1 exit status
I checked whether the src/swig/cell.o file contains the Cell_Init function using objdump:
~> objdump -d src/swig/cell.o | grep Cell_Init
00006461 <Cell_Init>:
646c: 75 0a jne 6478 <Cell_Init+0x17>
I am not sure if I am doing something wrong while linking.
------------------- UPDATE ----------------------------
I found that including the swig/swig.cxx file directly in the main file which calls the Tcl_AppInit function resolves the linking issue. Is there a reason for this?
Isn't it possible to create and separately link the swig file and the file with the main function?
In general, with SWIG you'll end up with a bunch of generated source files that you compile. The normal thing you do then is package them up into a shared library (with appropriate bound dependencies on other shared libraries) that can be imported into a Tcl runtime with the load command.
But you don't want that this time. Instead, you want the object files that you would use to make that shared lib, and you want to include them in the instructions to build an executable along with the object file that holds your main and Tcl_AppInit. You also need to make sure that when linking your main executable that you make it dependent on those external shared libraries; executable building requires that you satisfy all dependencies and make all symbols be bound to their definitions. (You can use a static library to make this easier: it combines a bunch of object files into one file. There's very little difference to just using the object files from it though; in particular, static libraries aren't bound to their dependencies.)
Finally, you do want to include a call to Cell_Init in your Tcl_AppInit. That's the right place to put it (well, as long as you're not arranging for the package to be loaded into sub-interpreters). If it was failing before, that was because you'd got your linking wrong. (Tip: linkers work best when objects and libraries on the link line only depend on things later on the link line. Getting the link order right is a bit of a black art when you've got a complex build!)
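One thing worth checking, given the error text in the question: the linker is looking for `Cell_Init(Tcl_Interp*)`, which is how a C++-mangled name is reported, while objdump shows a plain `Cell_Init` symbol in cell.o. That pattern usually means main.cpp declares the function without C linkage. The sketch below shows the declaration with `extern "C"`. To keep it compilable on its own, `Tcl_Interp` is forward-declared instead of including tcl.h, and a stand-in definition of `Cell_Init` replaces the SWIG-generated one; both are assumptions for illustration only.

```cpp
// Forward declaration standing in for #include <tcl.h>.
struct Tcl_Interp;

// Declaration as it would appear in main.cpp. extern "C" makes the compiler
// reference the unmangled symbol Cell_Init, matching what objdump shows.
extern "C" int Cell_Init(Tcl_Interp* interp);

// Stand-in for the definition in the SWIG-generated wrapper file.
// It returns 0, the value of TCL_OK.
extern "C" int Cell_Init(Tcl_Interp*) { return 0; }

// Application init hook: register the SWIG module with the interpreter.
int AppInit(Tcl_Interp* interp) {
    // ... other setup would go here ...
    return Cell_Init(interp);
}
```

This would also be consistent with the update in the question: once the generated .cxx file is included directly, its own `extern "C"` declaration is visible in main.cpp.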

How do I get a PTX file to execute

I know how to generate a .ptx file from a .cu and how to generate a .cubin file from a .ptx. But I don't know how to get the final executable.
More specifically, I have a sample.cu file, which is compiled to sample.ptx. I then use nvcc to compile sample.ptx to sample.cubin. However, this .cubin file cannot be directly executed without host code. How can I link .cubin file to my original .cu file to produce the final executable?
You should be able to run PTX code directly from the CUDA driver API with cuModuleLoadDataEx. There is an example here, on page 5.
As of CUDA 4.0, you can write inline PTX in CUDA kernels.