cuda nvprof for mex file - cuda

My program consists of a Matlab file (.m) and a mex-cuda file (.cu). It starts in Matlab, and the Matlab file then calls the mex-cuda file.
I want to profile the performance of the mex-cuda file using the 'nvprof' command-line tool. I know that for a regular CUDA program, we simply use (in Unix):
$ nvprof file.out
However, the mex-cuda file is now called from Matlab, not from the shell. Is there a way to use the 'nvprof' command line for this mex-cuda file?

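If you run matlab from the command line, try
nvprof matlab < myfile.m
Leave the command unquoted. With quotes, as in nvprof "matlab < myfile.m", the shell hands the whole string to nvprof as a single program name and the input redirection never takes place.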

This blog explains how to profile CUDA kernels from MATLAB.
http://meerkat.gr/matlab-profile-cuda.html
It should work with CUDA-MEX as well.

Related

How to check if Caffe is using my GPU?

Same as this question but for Caffe. I want a command I can put in my Python script to check whether the GPU is being utilized.
I have checked nvidia-smi while my model is running and I see that python is recognized as a process, but Usage is N/A.
I also tried running the caffe.set_mode_cpu() command, thinking that the times would be very different, but the times with and without the command were the same.
I would suggest using gpustat. You can query the GPU and check whether your process is in the list returned by the command.
It is simple and not too elegant, but it works.
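The "is my process in the list" check can be sketched from inside the Python script itself. This is only a minimal sketch, and it swaps gpustat for nvidia-smi's query interface (the --query-compute-apps and --format options), assuming nvidia-smi is on the PATH. The function names parse_compute_pids and process_uses_gpu are made up for illustration.

```python
import os
import subprocess


def parse_compute_pids(csv_text):
    """Extract PIDs from 'nvidia-smi --query-compute-apps=pid' CSV output."""
    pids = set()
    for line in csv_text.splitlines():
        field = line.strip()
        if field.isdigit():  # skips blank lines and any non-numeric text
            pids.add(int(field))
    return pids


def process_uses_gpu(pid=None):
    """Return True if `pid` (default: this process) is listed as a GPU compute process."""
    if pid is None:
        pid = os.getpid()
    # Assumption: nvidia-smi is installed and on the PATH.
    output = subprocess.check_output(
        ["nvidia-smi", "--query-compute-apps=pid", "--format=csv,noheader"],
        universal_newlines=True,
    )
    return pid in parse_compute_pids(output)
```

Calling process_uses_gpu() after the first forward pass should tell you whether the current Python process holds a compute context on any GPU. It does not say how heavily the GPU is being used.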

How could I scan text file line by line using cuda c programming

I need to scan a text file and use the strings for parallel processing. I need to know whether file reading and writing is possible in CUDA C programming. If so, which keywords can be used? I used "File" but it gives undefined.
I am using Visual Studio 2013.
You cannot access the host filesystem from __global__ or __device__ functions. What you would want to do is read and parse the file with fopen, fgets, etc in a host function, copy the data to the device and then do your parallel processing on it.

How can I read the PTX?

I am working with compute capability 3.5, CUDA 5 and VS 2010 (and obviously Windows).
I am interested in reading the compiled code to better understand the implications of my C code changes.
What configuration do I need in VS to compile the code for readability (is setting the compilation to PTX enough?)?
What tool do I need to reverse engineer the generated PTX to be able to read it?
In general, to create a ptx version of a particular .cu file, the command is:
nvcc -ptx mycode.cu
which will generate a mycode.ptx file containing the ptx code corresponding to the file you used. It's probably instructive to use the -src-in-ptx option as well:
nvcc -ptx -src-in-ptx mycode.cu
which will intersperse the lines of source code with the lines of ptx they correspond to.
To comprehend ptx, start with the documentation.
Note that the compiler may generate ptx code that doesn't correspond to the source code very well, or is otherwise confusing, due to optimizations. You may wish (perhaps to gain insight) to compile some test cases using the -G switch as well, to see how the non-optimized version compares.
Since the windows environment may vary from machine to machine, I think it's easier if you just look at the path your particular version of msvc++ is using to invoke nvcc (look at the console output from one of your projects when you compile it) and prepend the commands I give above with that path. I'm not sure there's much utility in trying to build this directly into Visual Studio, unless you have a specific need to compile from ptx to an executable. There are also a few sample codes that have to do with ptx in some fashion.
Also note, for completeness, that ptx is not actually what's executed by the device (but it is generally pretty close). It is an intermediate code that can be retargeted to devices within a family by nvcc or by a portion of the compiler that also lives in the GPU driver. To see the actual code executed by the device, we use the executable instead of the source code, and the tool to extract the machine assembly code is:
cuobjdump -sass mycode.exe
Similar caveats about prepending an appropriate path, if needed. I would start with the ptx. I think for what you want to do, it's enough.

how to extract ptx from cuda exe and some related cuda compiler questions

1) I want to extract ptx code from a CUDA exe and use that kernel code in another program.
Is there a way to identify the kernel ptx code in an exe? I know the kernels are arbitrarily laid out in the exe file's data section.
I learnt that in Mac executables the ptx kernels start with .version and end with a null string. Is there something like that for Windows exe (PE) files? I guess I need to parse the exe file, gather ptx statements one at a time and group them together as kernels, but I don't know how I would go about it. Some help would get me started. I also find a .nvFatBi section in the CUDA exe. What is that supposed to be?
2) I also learnt that there are global constructors which register the cubin with the CUDA runtime. I don't understand this part completely. Does the function cudaRegisterFatBinary come into play here? If so, how can I use this ptx to supply the pointer to cudaRegisterFatBinary? I understand I have to compile the ptx to a cubin file. Is that possible programmatically? In short, I want to emulate nvcc itself in some sense.
Try: cuobjdump --dump-ptx [executable-name]

Fortran: Hardcode some code in dependency on an environment variable

Hey there,
if the env var "XYZ" is set WHILE compiling, then I want the part:
write (STDOUT,*) "Compiled with XYZ"
here one more function call bla()
to be compiled into the binary. If not, then not.
Any way to do it?
Thanks a lot!
You can't check environment variables while compiling, but you can pass options to the compiler -- termed preprocessing. This isn't heavily documented, but works with at least gfortran and intel ifort. On the compile line use, or not, "-DMYOPTION" (or whatever option name you select). Then in the code:
#ifdef MYOPTION
Fortran source code
#else
Fortran source code
#endif
Apparently the preprocessor lines must start in the first column.
If you use filetype "F90" the preprocessor will be automatically invoked, otherwise you can use a compiler option to invoke this step.
Maybe this will answer your need? If not, you could use a command script to check the environment variable and use different compile commands depending on its value, to make the preprocessor method respond to an environment variable.
Of course, you can check environment variables at run-time with the intrinsic get_environment_variable. Simply using if statements to respond to a value might be easier.
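The command-script idea can be sketched as a small build helper. This is only an illustration, not a complete build system: it assumes gfortran with its -cpp preprocessing flag, the variable name XYZ from the question and the macro name MYOPTION from above. The function name gfortran_command and the file names are hypothetical.

```python
import os


def gfortran_command(source, output, environ=None):
    """Build a gfortran command line that defines MYOPTION only when XYZ is set."""
    environ = os.environ if environ is None else environ
    cmd = ["gfortran", "-cpp"]  # -cpp turns on preprocessing regardless of file extension
    if "XYZ" in environ:
        cmd.append("-DMYOPTION")  # enables the '#ifdef MYOPTION' branch
    cmd += [source, "-o", output]
    return cmd
```

The returned list can be passed to subprocess.check_call, for example subprocess.check_call(gfortran_command("main.f90", "main")), so that the environment at build time decides which branch of the #ifdef ends up in the binary.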
As part of the Fortran 2008 standard, there are intrinsic functions for finding the compiler options and the compiler version a program was built with: compiler_options and compiler_version. Fortran compilers are slowly coming up to date with the new standard; gfortran has them, but it doesn't look like ifort 12 does yet:
program compilerinfo
use iso_fortran_env
implicit none
print *,'This program was compiled with ', compiler_version()
print *,'with flags ', compiler_options()
end program compilerinfo
and running gives
$ ./compilerinfo
This program was compiled with GCC version 4.6.0
with flags -mtune=generic -march=x86-64
Only the very newest Fortran compilers provide such a feature.
The nearest mechanism would be to write a program which obtains the environment variable and writes a Fortran subroutine containing the information needed. Add to the project build:
1. Run the program to grab the environment variable and write the subroutine.
2. Compile the subroutine.
3. Link the object into the rest of the project.
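The generator program from the first step could look roughly like the following. Treat it as a sketch under assumptions: the subroutine name report_build_env, the output file build_env.f90 and the message text are all invented for illustration, and only the variable name XYZ comes from the question.

```python
import os


def render_subroutine(value):
    """Return Fortran source for a subroutine that reports the build-time value of XYZ."""
    if value is None:
        message = "Compiled without XYZ"
    else:
        message = "Compiled with XYZ=" + value
    # A double quote inside a Fortran string literal is written as two double quotes.
    literal = '"' + message.replace('"', '""') + '"'
    return (
        "subroutine report_build_env()\n"
        "  implicit none\n"
        "  write (*,*) " + literal + "\n"
        "end subroutine report_build_env\n"
    )


def write_subroutine(path="build_env.f90"):
    """Snapshot XYZ from the current environment into a Fortran source file."""
    with open(path, "w") as handle:
        handle.write(render_subroutine(os.environ.get("XYZ")))
```

Running write_subroutine() before the compile step produces a source file that is then compiled and linked like any other, and the main program simply calls report_build_env.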
Edited to reflect Fortran 2008+ compilers