Is there a list of headers that can be used in a string to compile with NVRTC? [duplicate] - cuda

Specifically, my issue is that I have CUDA code that needs <curand_kernel.h> to run. This isn't included by default in NVRTC. Presumably then when creating the program context (i.e. the call to nvrtcCreateProgram), I have to send in the name of the file (curand_kernel.h) and also the source code of curand_kernel.h? I feel like I shouldn't have to do that.
It's hard to tell; I haven't managed to find an example from NVIDIA of someone needing standard CUDA files like this as a source, so I really don't understand what the syntax is. Some issues: curand_kernel.h also has includes... Do I have to do the same for each of these? I am not even sure the NVRTC compiler will run correctly on curand_kernel.h; there are some language features it doesn't support, aren't there?
Next: if I've sent in the source code of a header file to nvrtcCreateProgram, do I still have to #include it in the code to be compiled, and will it cause an error if I do so?
A link to example code that does this or something like it would be appreciated much more than a straightforward answer; I really haven't managed to find any.

You have to send the "filename" and the source of each header separately.
When the preprocessor does its thing, it'll use any #include filenames as a key to find the source for the header, based on the collection that you provide.
I suspect that, in this case, the compiler (driver) doesn't have file system access, so you have to give it the source in much the same way that you would for shader includes in OpenGL.
So:
Pass each header's name and its source when calling nvrtcCreateProgram. The compiler will, internally, build the equivalent of a std::map<std::string, std::string> containing the source of each header, indexed by the given name.
In your kernel source, use #include "foo.cuh" as usual.
The compiler will use foo.cuh as a key into its internal map (created when you called nvrtcCreateProgram) and will retrieve the header source from that collection.
Compilation proceeds as normal.
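A minimal sketch of that call sequence, assuming a trivial header named foo.cuh (the kernel body is a placeholder and error checking is omitted for brevity):

#include <nvrtc.h>

// Kernel source that includes a header by name.
const char* kernelSrc =
    "#include \"foo.cuh\"\n"
    "extern \"C\" __global__ void kernel(float* out) {\n"
    "    out[threadIdx.x] = foo();\n"
    "}\n";

// In-memory source text for that header.
const char* fooHeader = "__device__ float foo() { return 42.0f; }\n";

void buildProgram() {
    const char* headerSources[] = { fooHeader };  // source of each header
    const char* headerNames[]   = { "foo.cuh" };  // keys matched against #include

    // The last two arguments populate the internal name -> source map.
    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, kernelSrc, "kernel.cu",
                       1, headerSources, headerNames);
    nvrtcCompileProgram(prog, 0, nullptr);
    // ... fetch the compile log / PTX here, then ...
    nvrtcDestroyProgram(&prog);
}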
One of the reasons that nvrtc provides only a "subset" of features is that the compiler plays in a somewhat sandboxed environment, without necessarily having all of the supporting tools and utilities lying around that you have with offline compilation. So, you have to manually handle a lot of the stuff that the normal nvcc + (gcc | MSVC | clang) combination provides.
A possible, but non-ideal, solution would be to preprocess the file that you need in your IDE, save the result, and then #include that. However, I bet there is a better way to do it. If you just want curand, consider diving into the library and extracting the part you need (blech) or using another GPU-friendly rand implementation. On older CUDA versions, I just generated a big array of random floats on the host, uploaded it to the GPU, and sampled it in the kernels.
This related link may be helpful.

You do not need to load curand_kernel.h yourself and add it to the include "aliases" mechanism.
Instead, you can simply add the CUDA include directory to your (set of) include paths, e.g. by adding --include-path=/usr/local/cuda/include to your NVRTC compiler options.
(I do this in my GPU-kernel-runner test harness, by default, to be on the safe side.)
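In terms of the NVRTC calls from the sketch above, that amounts to something like this (the toolkit path is an assumption; adjust it for your installation):

// Assuming the CUDA toolkit is installed under /usr/local/cuda.
const char* opts[] = { "--include-path=/usr/local/cuda/include" };
nvrtcCompileProgram(prog, 1, opts);
// The kernel string can now #include <curand_kernel.h> directly.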

Related

Minify ceylon-sdk and ceylon-language when compiling to javascript

For an in-browser application written in ceylon-js, it would be desirable to reduce the size of the ceylon.language-1.2.0.js file to only what is actually needed.
This question was already answered:
How to use ceylon js (also with google closure compiler)
But the given solution involves manually editing the JavaScript code resulting from compilation. This is not desirable, since a compiler should produce code that doesn't have to be edited manually after compilation (abstraction).
And it is not clear to me whether the Google Closure Compiler can cope with the Ceylon flavour of JavaScript in a reliable way.
Would it instead be a solution to copy the ceylon.language source (written in Ceylon) into the project and import only those parts of ceylon.language that the project actually requires? Then compile to JavaScript, and leave ceylon.language-1.2.0.js out of the client / in-browser application entirely.
Now my questions:
What parts are needed in the simplest browser application? I'm thinking of something like Array(String) and the like.
Does that solution have a chance of working absolutely reliably?
Will there be a better solution coming from the authors of Ceylon that makes this attempt obsolete?
The compilation of the language module to JS is a tricky process, because of the native stuff involved and because there are a couple of declarations that have to be in a certain order for things to work.
Minification is still pending; we are going to do it, but it's not the highest priority right now, and we have to determine the best way to solve this problem. One option that has been discussed is to have a version of the language module without any metamodel info, for example.

Code translation process

I'm going to give a presentation about programming languages in our class, talking about the basics. It's going to be a brief one, around 5-10 minutes. The audience has no knowledge of this subject.
One of the things I'm going to talk about is low-level and high-level languages, and machine code. To simplify and visualize the difference, I created this image.
But this is just a guess. I'm not sure if this is correct. Probably not. Could you enlighten me on how this process works without going into too much detail?
I'm not sure if this is the right place to ask this question. If not, I'll move it to somewhere else. Guide me. Also, about the title and the tags, you can correct them.
What happens largely depends on your environment, so there is no one answer. A general high-level view, considering you're starting with what appears to be the C language and assuming it's a standard environment (not something such as a Java virtual machine), is that:
A compiler converts C to assembly
An assembler converts assembly to object code (what you show as "low-level language")
A linker gathers one or more files of object code and attempts to fill out their needs with the contents of libraries it knows about. This output is still object code, but step 2's object code was for a specific file's instructions only. This object code is in a format appropriate for step 4.
A loader reads the program into memory, potentially satisfying dynamic links that are required to run the program. It takes operating system specific steps to create a process that will execute the program.
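To make that concrete, here is a trivial C program with the usual toolchain command for each stage as a comment (command names are typical Linux/gcc defaults and may differ on your system):

// hello.c
// 1. gcc -S hello.c -o hello.s   (compiler: C -> assembly)
// 2. as hello.s -o hello.o       (assembler: assembly -> object code)
// 3. gcc hello.o -o hello        (linker: object code + libc -> executable)
// 4. ./hello                     (loader: the OS maps the program into memory)
#include <stdio.h>

int main(void) {
    printf("hello\n");
    return 0;
}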

Advantages of a VM

The majority of languages I have come across utilise a VM, or virtual machine. Languages such as Java (the JVM), Python, Ruby, PHP (the HHVM), etc.
Then there are languages such as C, C++, Haskell, etc. which compile directly to native.
My question is, what is the advantage of using a VM (outside of OS-independence)? Isn't using a VM just creating an extra interpretation step, by going [source code -> bytecode -> native] instead of just [source code -> native]?
Why use a VM when you can compile directly?
EDIT
My understanding is that Python, Ruby, et al. use something akin to a VM, if not exactly fitting under such a definition, where scripts are compiled to an intermediate representation (for Python, e.g. .pyc files).
EDIT 2
Yep. Looked it up. Python, Ruby, and PHP all use intermediate representations, which are not necessarily stored in separate files but executed by the VM directly. See this question: Java "Virtual Machine" vs. Python "Interpreter" parlance?
" Even though Python uses a virtual machine under the covers, from a
user's perspective, one can ignore this detail most of the time. "
An advantage of a VM is that it is much easier to modify parts of the code at runtime, a capability called reflection. It enables some elegant capabilities. For example, you can ask the user which function/class they want to call, and call the function/class by its string name. In Java programs (and maybe some other VM-based languages) users can add an additional library to the program at runtime, and the library can be run immediately!
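As a rough C++ analogue of that call-by-string-name idea, here is a hand-built registry; reflection in VM-based languages provides this lookup for free, without the manual bookkeeping (all names below are made up for illustration):

#include <functional>
#include <iostream>
#include <map>
#include <string>

// A manual registry mapping names to callables; a VM's reflection
// facilities would supply this mapping automatically.
std::map<std::string, std::function<void()>> registry = {
    { "greet",   []{ std::cout << "hello\n"; } },
    { "version", []{ std::cout << "1.0\n"; } },
};

int main() {
    std::string name;
    std::cin >> name;                       // the user picks a function by name
    auto it = registry.find(name);
    if (it != registry.end()) it->second(); // dispatch by string name
    else std::cout << "unknown function\n";
}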
Another advantage is the ability to use advanced garbage collection, because the bytecode's structure is easier to analyze.
Let me note that a virtual machine does not always interpret the code, and therefore it is not always slower than machine code. For example, Java has a component named HotSpot which searches for code blocks that are frequently called and replaces their bytecode with native code (machine code). For instance, if a loop runs, say, 100+ times, HotSpot converts it to machine code, so that on subsequent calls it runs natively! This ensures that just the bottlenecks of your code run natively, while the rest still enjoys the advantages above.
P.S. It is not impossible to compile the code directly to native code. Many VM-based languages have compiler versions (e.g. there is a compiler for PHP: http://www.phpcompiler.org). However, remember that you are disabling some of the above features by compiling the whole program to native code.
P.S. The [source code -> bytecode] step is not a problem; it happens once and does not affect execution time. I presumed you were asking why they do not execute machine code directly when that is possible.
Python, Ruby, and PHP do not utilize VMs. They are, however, interpreted.
To answer your actual question: Java utilizes a VM in order to add some distance between the operating system/hardware and the code being executed. The goal there was security and hardiness (hardiness meaning a lower likelihood of code having an adverse effect on other processes in the system).
All the languages you listed are interpreted, so I think what you actually meant to ask about was the difference between interpreted and compiled languages. Interpreted languages are cross-platform. That is the main advantage. You need not compile them for each different set of hardware or operating system they operate on; instead they will simply work everywhere.
The advantage of a compiled language, traditionally, is speed and efficiency.
Because a VM allows the same set of instructions to be run on many different operating systems (provided they have the interpreter).
Let's take Java as an example. Java gets compiled into bytecode, which is basically a set of operations for a computer to follow. However, not all processors understand the same set of instructions the same way - meaning, what one set of native instructions means on computer A could be something different on computer B.
As a result, a VM is run, with one specific to each computer. This way, the Java bytecode that is written is standardized, and only the interpreter has to do the work of converting it to machine language.
OS independence is a big part of it but you also get abstractions from other things like CPUs... the same Java code can execute on ARM, x86, whatever without modification so long as there is a JVM in place.

Building GPL C program with CUDA module

I am attempting to modify a GPL program written in C. My goal is to replace one method with a CUDA implementation, which means I need to compile with nvcc instead of gcc. I need help building the project - not implementing it (You don't need to know anything about CUDA C to help, I don't think).
This is my first time trying to change a C project of moderate complexity that involves a configure script and Makefile. Honestly, this is my first time doing anything in C in a long time, including anything involving gcc or g++, so I'm pretty lost.
I'm not super interested in learning configure and Makefiles - this is more of an experiment. I would like to see if the project implementation goes well before spending time creating a proper build script. (Not unwilling to learn as necessary, just trying to give an idea of the scope).
With that said, what are my options for building this project? I have a myriad of questions...
I tried adding "CC=nvcc" to the configure.in file after AC_PROG_CC. This appeared to work - output from running configure and make showed nvcc as the compiler. However, make failed to compile the source file with the CUDA kernel, not recognizing the CUDA-specific syntax. I don't know why; I was hoping this would just work.
Is it possible to compile a source file with nvcc, and then include it at the linking step in the make process for the main program? If so, how? (This question might not make sense - I'm really rusty at this)
What's the correct way to do this?
Is there a quick and dirty way I could use for testing purposes?
Is there some secret tool everyone uses to set up and understand these configure scripts and Makefiles? This is even worse than the Apache Ant scripts I'm used to (yeah, I'm out of my realm).
You don't need to compile everything with nvcc. Your guess that you can compile just your CUDA code with nvcc and leave everything else unchanged (except linking) is correct. Here's the approach I would use to start.
Add one new header (e.g. myCudaImplementation.h) and one new source file (with a .cu extension, e.g. myCudaImplementation.cu). The source file contains your kernel implementation as well as a (host) C wrapper function that invokes the kernel with the appropriate execution configuration (aka <<<>>>) and arguments. The header file contains the prototype for the C wrapper function. Let's call that wrapper function runCudaImplementation().
I would also provide another host C function in the source file (with prototype in the header) that queries and configures the GPU devices present and returns true if it is successful, false if not. Let's call this function configureCudaDevice().
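A minimal sketch of what those two files might contain (the kernel body and launch configuration are placeholders, not part of the original answer):

// myCudaImplementation.h
#include <stdbool.h>               // for bool in C callers
#ifdef __cplusplus
extern "C" {
#endif
bool configureCudaDevice(void);    // query/configure a GPU; true on success
void runCudaImplementation(void);  // launch the CUDA kernel
#ifdef __cplusplus
}
#endif

// myCudaImplementation.cu
#include <cuda_runtime.h>
#include "myCudaImplementation.h"

__global__ void myKernel() { /* ... CUDA implementation ... */ }

extern "C" bool configureCudaDevice(void) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) return false;
    return cudaSetDevice(0) == cudaSuccess;
}

extern "C" void runCudaImplementation(void) {
    myKernel<<<1, 256>>>();        // placeholder execution configuration
    cudaDeviceSynchronize();
}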
Now in your original C code, where you would normally call your CPU implementation you can do this.
// must include your new header
#include "myCudaImplementation.h"
// at app initialization
// store this variable somewhere you can access it later
bool deviceConfigured = configureCudaDevice();
...
// then later, at run time
if (deviceConfigured)
runCudaImplementation();
else
runCpuImplementation(); // run the original code
Now, since you put all your CUDA code in a new .cu file, you only have to compile that file with nvcc. Everything else stays the same, except that you have to link in the object file that nvcc outputs. e.g.
nvcc -c -o myCudaImplementation.o myCudaImplementation.cu <other necessary arguments>
Then add myCudaImplementation.o to your link line (something like:)
g++ -o myApp <your other object files> myCudaImplementation.o -L/usr/local/cuda/lib64 -lcudart
Now, if you have a complex app to work with that uses configure and has a complex makefile already, it may be more involved than the above, but this is the general approach. Bottom line is you don't want to compile all of your source files with nvcc, just the .cu ones. Use your host compiler for everything else.
I'm not expert with configure so can't really help there. You may be able to run configure to generate a makefile, and then edit that makefile -- it won't be a general solution, but it will get you started.
Note that in some cases you may also need to separate compilation of your .cu files from linking them. In this case you need to use NVCC's separate compilation and linking functionality, for which this blog post might be helpful.
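For reference, the separate-compilation variant looks roughly like this (the architecture flag, library path, and file names are assumptions to adjust for your project):

nvcc -dc -arch=sm_50 -o myCudaImplementation.o myCudaImplementation.cu
nvcc -dlink -arch=sm_50 -o deviceLink.o myCudaImplementation.o
g++ -o myApp main.o myCudaImplementation.o deviceLink.o -L/usr/local/cuda/lib64 -lcudart -lcudadevrt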

#include vs. performance

(I framed this question in terms of a C++ program because I faced the issue while coding a C++ program, but the actual question is language-agnostic.)
I had to copy chars from one char* buffer to another buffer. So, instead of doing a #include <cstring> for strcpy, I wrote a small snippet of code myself to do the same.
Here are the thoughts that I could think of then:
Standard library functions are generally the fastest implementations of a given operation that one can find.
But it would be unwise to include a big header file if you're going to use only a very minor part of it (that's what I think happens).
I want to know how right I was in doing so, and where the boundary lies past which one should stop writing one's own snippets and go back to using the standard headers.
The first rule of thumb is "don't reinvent the wheel". And remember that your wheel will probably be worse :-) (there are very good programmers writing the wheels provided by your compiler).
But yes, if I had to include the whole boost library for a single function, I would try to directly copy it from the library :-)
I'll add that the question is marked as "language-agnostic", so we can't simply speak about the difference between C/C++ headers and C/C++ libraries. If we speak of a generic language, the inclusion of an external library COULD have side effects, even BIG side effects. For example, it could slow down the startup of your program considerably even if the library isn't used (because it has static initializers that need to be called at startup, or it references flocks of other dll/dynamic libraries that need to be loaded). And it wouldn't be the first time the startup of a program breaks because of the static startup of one of its dependencies :-)
So in the end, "it depends". I would say that if you only have to copy up to one file (let's say 250-500 lines) from a BSD-licensed source, there isn't any big problem; for anything bigger, linking to the library is probably necessary.
#include-ing a header file should have no effect on runtime performance. It may, of course, slow down your compiler.
A decent linker should only pull in the pieces that it actually needs.
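To make the trade-off concrete, here is a hand-rolled copy next to the standard call (a sketch; both assume a NUL-terminated source and a large-enough destination):

#include <cstring>

// Hand-rolled equivalent of strcpy.
void myCopy(char* dst, const char* src) {
    while ((*dst++ = *src++) != '\0') { }
}

// The standard call does the same job, is typically better optimized,
// and its intent is immediately obvious to a reviewer:
// strcpy(dst, src);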
When you are talking about performance, what do you want to optimize? Compile time, size of the binary/object, or speed of execution?
Using a standard library has the big advantage of improving code maintainability and readability. If someone else has to review or modify your code, it is much better to use the "standard" way.
It is much easier to spot a bug in a call to memcpy() or strncpy() than in a call to MyMemCpy() or MyStringCpy().