Using libnvvm for code optimization - cuda

NVCC has its own IR called NVVM IR, which is a subset of LLVM IR. I have read that libnvvm can be used to perform optimizations on NVVM IR, but I cannot find any tutorial or beginner's guide for using libnvvm.
Can anybody share some material regarding this?
Basically, how do I write an optimization pass, or even a simple pretty printer, using libnvvm?

The NVVM IR specification is here
The libnvvm API documentation is here
The CUDA LLVM compiler SDK is available here, including sample apps demonstrating how to use libnvvm.
The NVVM IR verifier sample should give you a good framework for a simple pretty printer.
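As a concrete starting point, here is a minimal sketch of driving libnvvm from C++, loosely following the SDK samples: it loads an NVVM IR module, verifies it, and compiles it to PTX. Error handling is trimmed, and the IR buffer and module name are placeholders, so treat this as an outline rather than a definitive implementation.

    // Minimal libnvvm driver: verify an NVVM IR module, then compile it to PTX.
    // Sketch only; production code should check every nvvmResult.
    #include <nvvm.h>
    #include <cstdio>
    #include <string>
    #include <vector>

    int main() {
        // Placeholder: in practice, read NVVM IR text or bitcode from disk.
        std::string ir = "...";

        nvvmProgram prog;
        nvvmCreateProgram(&prog);
        nvvmAddModuleToProgram(prog, ir.data(), ir.size(), "mymodule");

        // Verify that the module obeys the NVVM IR rules and restrictions.
        if (nvvmVerifyProgram(prog, 0, nullptr) != NVVM_SUCCESS) {
            size_t logSize;
            nvvmGetProgramLogSize(prog, &logSize);
            std::vector<char> log(logSize);
            nvvmGetProgramLog(prog, log.data());
            std::fprintf(stderr, "verification failed:\n%s\n", log.data());
            nvvmDestroyProgram(&prog);
            return 1;
        }

        // Compile to PTX (options such as an -arch string can be passed here).
        nvvmCompileProgram(prog, 0, nullptr);
        size_t ptxSize;
        nvvmGetCompiledResultSize(prog, &ptxSize);
        std::vector<char> ptx(ptxSize);
        nvvmGetCompiledResult(prog, ptx.data());
        std::fputs(ptx.data(), stdout);  // print the generated PTX

        nvvmDestroyProgram(&prog);
        return 0;
    }

Note that libnvvm itself does not expose hooks for writing your own optimization passes; custom passes are normally written against the open LLVM libraries before the module is handed to libnvvm.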

Related

How can I get NVVM IR (LLVM IR) from .cu - file and how to compile NVVM IR to binary?

I have a CUDA C/C++ program for CUDA 7.5. As the documentation says, the libNVVM library is an optimizing compiler library that generates PTX from NVVM IR.
I can get PTX by using: nvcc -ptx <file>.cu -o <file>.ptx
But how can I get NVVM IR (LLVM IR) from <file>.cu?
And how can I compile NVVM IR (LLVM IR), or optimized IR, for the target architecture?
Do I need third-party libraries or programs for this, such as libcuda.lang, ...?
http://on-demand.gputechconf.com/gtc/2013/presentations/S3185-Building-GPU-Compilers-libNVVM.pdf
http://on-demand.gputechconf.com/gtc/2012/presentations/S0235-Compiling-CUDA-and-Other-Languages-for-GPUs.pdf
From the NVVM IR specification (http://docs.nvidia.com/cuda/nvvm-ir-spec/index.html):

The NVVM compiler (which is based on LLVM) generates PTX code from NVVM IR.

NVVM IR and NVVM compilers are mostly agnostic about the source language being used. The PTX codegen part of a NVVM compiler needs to know the source language because of the difference in DCI (driver/compiler interface).

Technically speaking, NVVM IR is LLVM IR with a set of rules, restrictions, and conventions, plus a set of supported intrinsic functions. A program specified in NVVM IR is always a legal LLVM program. A legal LLVM program may not be a legal NVVM program.
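To make the "LLVM IR plus conventions" point concrete, here is a minimal illustrative module; the nvvm.annotations metadata marking @kernel as a kernel entry point is exactly the kind of convention the specification describes. The metadata syntax varies between NVVM/LLVM versions, so treat this as a sketch:

    ; Minimal NVVM IR sketch (newer-style metadata syntax; older NVVM
    ; versions require the `metadata` keyword before metadata values).
    target triple = "nvptx64-nvidia-cuda"

    define void @kernel(float* %out) {
      store float 1.0, float* %out
      ret void
    }

    ; NVVM convention: mark @kernel as a kernel entry point.
    !nvvm.annotations = !{!0}
    !0 = !{void (float*)* @kernel, !"kernel", i32 1}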
The very short answer is that you cannot do this. NVIDIA's parser is proprietary and closed source, and they have not exposed the IR code generator in a way that supports what you are asking for.
That said, you are not the first person to wonder about this, and you might be able to find some useful, but completely unofficial and unsupported information here.
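For the second half of the question, compiling IR down to a binary for a target architecture, the supported path is the one the quote above describes: feed NVVM IR to libnvvm to get PTX, then assemble the PTX with ptxas from the CUDA toolkit. For example (the file names and the sm_52 architecture are placeholders):

    ptxas -arch=sm_52 kernel.ptx -o kernel.cubin

Alternatively, you can hand the PTX to the driver API (cuModuleLoadData), which JIT-compiles it for whatever GPU is present.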

Check if SIMD machine code is generated for LLVM IR

I have a C++ program that uses the LLVM libraries to generate an LLVM IR module and it compiles and executes it.
The code uses vector types and I want to check if it translates to SIMD instructions correctly on my architecture.
How do I find this out? Is there a way to see the assembly code that is generated from this IR?
You're probably looking for some combination of -emit-llvm which outputs IR instead of native assembly, and -S which outputs assembly instead of object files.
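Since the question says the module is generated programmatically rather than by clang, one common approach (a sketch; module.ll and vec.cpp are placeholder names) is to write the module out as textual IR and lower it with llc, or to use clang directly when starting from C++ source:

    # Lower an IR module to native assembly for the host CPU
    llc -O2 -mcpu=native module.ll -o module.s

    # Or, from C++ source: emit IR, then assembly
    clang++ -O3 -S -emit-llvm vec.cpp -o vec.ll
    clang++ -O3 -S vec.cpp -o vec.s

Searching the resulting .s file for vector registers (e.g. xmm/ymm on x86) is a quick way to see whether SIMD instructions were actually emitted.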

cuda to llvm bitcode conversion

I want to convert CUDA code to LLVM bitcode so I can instrument it. I have tried gpuocelot, which compiles PTX into CPU-executable code; nevertheless, I couldn't get LLVM bitcode from it, so I can't instrument it. There have been efforts to get CUDA supported in LLVM. Can anyone provide a robust way to convert CUDA to workable LLVM bitcode? Thanks.
NVIDIA's nvcc actually uses LLVM IR as one of its intermediate steps. They might have modified it a little; I haven't seen the details. They have explained it at:
https://developer.nvidia.com/cuda-llvm-compiler
You should be able to use Clang to compile CUDA (mixed-mode) to LLVM IR now. Check this page out. Note that this support is still experimental. Feel free to report bugs to the LLVM community.
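As a sketch of what that invocation looks like (vecadd.cu and sm_35 are placeholders; the flags are from clang's CUDA support and may change while the feature is experimental):

    # Emit device-side LLVM IR only, one module per GPU architecture
    clang++ -x cuda --cuda-device-only --cuda-gpu-arch=sm_35 -S -emit-llvm vecadd.cu -o vecadd.ll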

start cuda (use of libraries or APIs)

I want to start CUDA in C++, and I am familiar with C++, Qt, and C#.
But I want to know whether it is better to use the CUDA libraries (at a high level) or the CUDA APIs (at a lower level).
Is it better to start from the runtime API and not use the CUDA driver API?
(I am starting with "CUDA by Example" for the parallel programming concepts.)
Since you are familiar with C/C++, you should use the higher-level API, CUDA C (also called C for CUDA), which is more convenient and easier to write because it consists of a minimal set of extensions to the C language plus a runtime library.
The lower-level API, the CUDA driver API, exposes lower-level concepts: it requires more code and is harder to program and debug, but it offers finer control and is language-independent, since it works with binary or assembly (PTX) code.
See Chapter 3 of the CUDA Programming Guide for more details.
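To make the difference concrete, here is a hedged sketch of the same kernel launch through both APIs (the kernel, sizes, and PTX file name are made up for illustration, and error checks are omitted):

    // Sketch: the same launch via the runtime API and the driver API.
    #include <cuda.h>  // driver API

    // Runtime API: <<<...>>> hides context, module, and argument setup.
    extern "C" __global__ void scale(float *data, float factor) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        data[i] *= factor;
    }
    // Launch from host code compiled by nvcc:
    //   scale<<<blocks, threads>>>(d_data, 2.0f);

    // Driver API: everything above becomes explicit.
    void launch_with_driver_api(CUdeviceptr d_data, float factor,
                                unsigned blocks, unsigned threads) {
        CUdevice dev; CUcontext ctx; CUmodule mod; CUfunction fn;
        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);
        cuModuleLoad(&mod, "scale.ptx");        // PTX built separately
        cuModuleGetFunction(&fn, mod, "scale"); // extern "C" keeps the name
        void *args[] = { &d_data, &factor };
        cuLaunchKernel(fn, blocks, 1, 1, threads, 1, 1,
                       0 /*sharedMem*/, nullptr /*stream*/, args, nullptr);
        cuCtxSynchronize();
    }

The driver path is what you would use from a language without nvcc support, since it only needs the PTX or cubin binary at runtime.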

Does CUDA use an interpreter or a compiler?

This is a bit of silly question, but I'm wondering if CUDA uses an interpreter or a compiler?
I'm wondering because I'm not quite sure how CUDA manages to get source code to run on two cards with different compute capabilities.
From Wikipedia:
Programmers use 'C for CUDA' (C with Nvidia extensions and certain restrictions), compiled through a PathScale Open64 C compiler.
So, your answer is: it uses a compiler.
And to touch on the reason it can run on multiple cards (source):
CUDA C/C++ provides an abstraction, it's a means for you to express how you want your program to execute. The compiler generates PTX code which is also not hardware specific. At runtime the PTX is compiled for a specific target GPU - this is the responsibility of the driver which is updated every time a new GPU is released.
These official documents CUDA C Programming Guide and The CUDA Compiler Driver (NVCC) explain all the details about the compilation process.
From the second document:
nvcc mimics the behavior of the GNU compiler gcc: it accepts a range of conventional compiler options, such as for defining macros and include/library paths, and for steering the compilation process.
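As a concrete illustration of that process (the architectures chosen here are arbitrary examples), nvcc can embed native SASS for specific GPUs plus PTX that the driver JIT-compiles for GPUs released later:

    nvcc -gencode arch=compute_35,code=sm_35 \
         -gencode arch=compute_52,code=sm_52 \
         -gencode arch=compute_52,code=compute_52 \
         kernel.cu -o app

The code=compute_52 entry keeps PTX in the fat binary, which is what lets the same executable run on cards the compiler has never seen.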
This is not limited to CUDA: shaders in DirectX or OpenGL are also compiled to some kind of bytecode and converted to native code by the underlying driver.