I want to convert CUDA code to LLVM bitcode so I can instrument it. I have tried GPU Ocelot, which compiles PTX into CPU-executable code; however, I couldn't get LLVM bitcode out of it, so I can't instrument anything. There has been activity around getting CUDA supported in LLVM. Can anyone suggest a robust way to convert CUDA to workable LLVM bitcode? Thanks.
NVIDIA's nvcc actually uses LLVM IR as one of its intermediate steps. They may have modified it somewhat; I haven't seen the details. They explain it at:
https://developer.nvidia.com/cuda-llvm-compiler
You should now be able to use Clang to compile CUDA (mixed-mode) to LLVM IR. Check this page out. Note that this support is still experimental; feel free to report bugs to the LLVM community.
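For orientation, the Clang invocation looks roughly like this; the file name kernel.cu and the sm_50 target are just placeholders for your own source and GPU:

```shell
# Emit textual LLVM IR for the device-side code only
# (assumes a CUDA toolkit installation that Clang can find)
clang++ -x cuda --cuda-device-only --cuda-gpu-arch=sm_50 \
        -emit-llvm -S kernel.cu -o kernel.ll

# Or emit LLVM bitcode instead of textual IR
clang++ -x cuda --cuda-device-only --cuda-gpu-arch=sm_50 \
        -emit-llvm -c kernel.cu -o kernel.bc
```

The resulting .bc file can then be fed to opt for instrumentation passes. Dropping --cuda-device-only gives you the host-side compilation instead.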
Related
With cuobjdump, SASS can be generated from a cubin file using
cuobjdump -sass <input file>, but is there any way to convert the SASS back to a cubin?
There are no "assemblers" provided as part of the official NVIDIA CUDA toolchain. The NVIDIA toolchain can take CUDA C/C++, or PTX, and convert it to a cubin or other executable format.
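For reference, the forward direction through the official toolchain looks like this (file names here are hypothetical, and the architecture is just an example):

```shell
# CUDA C++ -> cubin for a specific architecture
nvcc -cubin -arch=sm_50 kernel.cu -o kernel.cubin

# PTX -> cubin (ptxas is the PTX assembler shipped with the toolkit)
ptxas -arch=sm_50 kernel.ptx -o kernel.cubin

# cubin -> SASS (disassembly only; there is no official reverse step)
cuobjdump -sass kernel.cubin
```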
However, there are some community-developed assemblers:
Perhaps the most recent one at this time (and probably the only one worth considering) is maxas.
There is also an older one, asfermi, developed during the Fermi generation of CUDA GPUs. I don't think it has been updated or maintained since.
I would like to add that, depending on the architecture (Maxwell, Kepler, etc.), you can use a community-developed assembler/disassembler to convert the SASS back to a cubin. Here are some:
Maxas: https://github.com/NervanaSystems/maxas
KeplerAs: https://github.com/PAA-NCIC/PPoPP2017_artifact/tree/master/KeplerAs
I'm writing a single-header library that executes a CUDA kernel. I was wondering if there is a way to get around the <<<>>> syntax, or to get C source output from nvcc?
You can avoid the host language extensions by using the CUDA driver API instead. It is a little more verbose and requires a little more boilerplate code to manage the context, but it is not too difficult.
Conventionally, you would compile to PTX or a binary payload to load at runtime; however, NVIDIA now also ships an experimental JIT CUDA C compiler library, libNVVM, which you could try if you want to JIT from source.
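A minimal sketch of the driver-API launch path, assuming a PTX file kernel.ptx containing a kernel named my_kernel taking one pointer argument (all names here are hypothetical):

```c
#include <cuda.h>   /* CUDA driver API */
#include <stdlib.h>

int main(void) {
    CUdevice dev; CUcontext ctx; CUmodule mod; CUfunction fn;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    /* Load PTX produced offline, e.g. by: nvcc -ptx kernel.cu */
    cuModuleLoad(&mod, "kernel.ptx");
    cuModuleGetFunction(&fn, mod, "my_kernel");

    CUdeviceptr buf;
    cuMemAlloc(&buf, 256 * sizeof(int));

    /* Equivalent of my_kernel<<<1, 256>>>(buf) */
    void *args[] = { &buf };
    cuLaunchKernel(fn, 1, 1, 1,    /* grid dims  */
                       256, 1, 1,  /* block dims */
                       0, NULL,    /* shared mem, stream */
                       args, NULL);
    cuCtxSynchronize();

    cuMemFree(buf);
    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    return 0;
}
```

Error checking on the CUresult return codes is omitted for brevity; in real code you should check every call.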
NVCC has its own IR called NVVM IR, which is based on a subset of LLVM IR. I have read that libnvvm can be used to perform optimizations on NVVM IR, but I am not able to find any tutorial or beginner's guide for using libnvvm.
Can anybody share some material regarding this?
Basically, how do I write an optimization pass? Or even a simple pretty printer using libnvvm?
The NVVM IR specification is here
The libnvvm API documentation is here
The CUDA LLVM compiler SDK is available here, including sample apps, demonstrating how to use libnvvm.
The NVVM IR verifier sample should give you a good framework for a simple pretty printer.
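To give a feel for the API, the basic libnvvm flow used by those samples looks roughly like this; a sketch only, assuming ir points at NVVM IR text:

```c
#include <nvvm.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Compile an NVVM IR module (text in `ir`) to PTX.
 * Returns a malloc'd PTX string, or NULL on failure. */
char *compile_to_ptx(const char *ir) {
    nvvmProgram prog;
    nvvmCreateProgram(&prog);
    nvvmAddModuleToProgram(prog, ir, strlen(ir), "module");

    /* Options are passed as strings, e.g. the optimization level */
    const char *opts[] = { "-opt=3" };
    if (nvvmCompileProgram(prog, 1, opts) != NVVM_SUCCESS) {
        size_t logSize;
        nvvmGetProgramLogSize(prog, &logSize);
        char *log = malloc(logSize);
        nvvmGetProgramLog(prog, log);
        fprintf(stderr, "%s\n", log);
        free(log);
        nvvmDestroyProgram(&prog);
        return NULL;
    }

    size_t ptxSize;
    nvvmGetCompiledResultSize(prog, &ptxSize);
    char *ptx = malloc(ptxSize);
    nvvmGetCompiledResult(prog, ptx);
    nvvmDestroyProgram(&prog);
    return ptx;
}
```

Note that libnvvm consumes and produces whole modules; for writing custom optimization passes you would work on the IR itself (e.g. with standard LLVM tooling) before handing it to libnvvm.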
This is a bit of a silly question, but I'm wondering whether CUDA uses an interpreter or a compiler.
I'm wondering because I'm not quite sure how CUDA manages to get source code to run on two cards with different compute capabilities.
From Wikipedia:
Programmers use 'C for CUDA' (C with Nvidia extensions and certain restrictions), compiled through a PathScale Open64 C compiler.
So, your answer is: it uses a compiler.
And to touch on the reason it can run on multiple cards (source):
CUDA C/C++ provides an abstraction; it's a means for you to express how you want your program to execute. The compiler generates PTX code, which is also not hardware-specific. At runtime the PTX is compiled for the specific target GPU; this is the responsibility of the driver, which is updated every time a new GPU is released.
These official documents CUDA C Programming Guide and The CUDA Compiler Driver (NVCC) explain all the details about the compilation process.
From the second document:
nvcc mimics the behavior of the GNU compiler gcc: it accepts a range of conventional compiler options, such as for defining macros and include/library paths, and for steering the compilation process.
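You can see the multi-card story directly on the nvcc command line: embed SASS for the GPUs you know about plus PTX for the driver to JIT-compile on anything newer (the architectures below are just examples):

```shell
# Embed sm_50 SASS plus compute_50 PTX in one fat binary; the driver
# JIT-compiles the PTX at runtime for GPUs with no embedded SASS.
nvcc -gencode arch=compute_50,code=sm_50 \
     -gencode arch=compute_50,code=compute_50 \
     app.cu -o app

# Inspect what actually ended up in the binary
cuobjdump -ptx app
cuobjdump -sass app
```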
This is not limited to CUDA: shaders in DirectX or OpenGL are also compiled to some kind of bytecode and converted to native code by the underlying driver.
I've never really been into GPUs, not being a gamer, but I'm aware of their parallel ability and wondered how I could get started programming on one. I recall there is a CUDA C-style programming language. What IDE do I use, and is it relatively simple to execute code?
There are quick-start guides for getting the dev drivers and libraries set up on different platforms (Windows/Mac/Linux) here; there is also a link to the CUDA C programming guide.
http://developer.nvidia.com/object/nsight.html
All the CUDA work we do (fluid sims, particle sims, etc.) is done on Linux, though, essentially with Emacs and gcc.
Some suggestions:
(1) Download the CUDA SDK from NVIDIA (http://developer.download.nvidia.com/compute/cuda/sdk/website/samples.html). It has an extensive set of application examples that have been developed, tested, and commented. Some useful examples to start with are matrixMul, histogram, and convolutionSeparable. For more complex, well-documented code, see the "nbody" example.
(2) If you are comfortable with C++ programming, the Thrust library for GPUs is another good place to start. It has extensive STL-like support for operations on the GPU, and the overall programming effort is much lower for standard algorithms.
(3) Eclipse with the CUDA plugin is a good IDE to work with initially.
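To illustrate point (2), a minimal Thrust sketch (compiled with nvcc; note that no kernel launch syntax ever appears):

```cpp
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <cstdio>

int main() {
    // Data lives on the GPU; Thrust handles transfers and kernel launches
    thrust::device_vector<int> d(4);
    d[0] = 3; d[1] = 1; d[2] = 4; d[3] = 1;

    thrust::sort(d.begin(), d.end());               // GPU sort
    int sum = thrust::reduce(d.begin(), d.end());   // GPU reduction

    std::printf("sum = %d\n", sum);                 // prints "sum = 9"
    return 0;
}
```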
On Windows, Visual Studio. On Linux: Eclipse, Code::Blocks, and others, depending on which you feel more comfortable with.
The IDE, though, is the last thing. There are steps preceding it (installing the appropriate display driver and toolkit, and running the SDK samples). The manuals/links provided above are really helpful. There is also an NVIDIA forum for CUDA development and many getting-started guides.