#include v/s performance - language-agnostic

(I defined this program in terms of a C++ program because I faced this thing while coding a C++ program, but the actual question is language-agnostic).
I had to copy chars from one char* buffer to another buffer. So, instead of doing a #include <cstring> for strcpy, I wrote a small snippet of code myself to do the same.
Here are the thoughts that I could think of then:
Standard library functions are generally the fastest implementations that one can find of a certain code.
But it would be unwise to include a big header file if you're going to use only a very minor part of it (that's what I think happens).
I want to know how right I was in doing so, and what can be defined as a bound of coding own snippets after which one should revert to using headers.

The first rule of thumb is "don't reinvent the wheel". And remember that your wheel will probably be worse :-) (there are very good programmers that write the wheel provided by your compiler).
But yes, if I had to include the whole boost library for a single function, I would try to directly copy it from the library :-)
I'll add that the question is marked as "language-agnostic", so we can't simply speak about the difference between C/C++ headers and C/C++ libraries. If we speak of a generic language, the inclusion of an external library COULD have side-effects, even BIG side-effects. For example it could slow down very much the startup of your program even if it isn't used (because it has static initializers that need to be called at startup, or it references flocks of other dll/dynamic libraries that need to be loaded). And it wouldn't be the first time there is an error in the startup of a program caused by the static startup of one of its dependancies :-)
So in the end "it depends". I would say that if only have to copy up to one file (let's say 250-500 lines) from a BSD source there isn't any big problem, for anything bigger linking to the library is probably necessary.

#include-ing a header file should have no effect on runtime performance. It may, of course, slow down your compiler.
A decent linker should only pull in the pieces that it actually needs.

When you are talking about performance what do you want to optimize ? Compile time, size of binary/object, speed of execution ?
Using a standard library has the big advantage to improve code Maintainability and Readability. If someone else has to review or modify your code, it is much better to use the "standard" way.
It is much easier to spot a bug when calling memcpy() or strncpy() than when calling MyMemCpy() or MyStringCpy()

Related

Is there a list of headers that can be used in an string to compile with NVRTC? [duplicate]

Specifically, my issue is that I have CUDA code that needs <curand_kernel.h> to run. This isn't included by default in NVRTC. Presumably then when creating the program context (i.e. the call to nvrtcCreateProgram), I have to send in the name of the file (curand_kernel.h) and also the source code of curand_kernel.h? I feel like I shouldn't have to do that.
It's hard to tell; I haven't managed to find an example from NVIDIA of someone needing standard CUDA files like this as a source, so I really don't understand what the syntax is. Some issues: curand_kernel.h also has includes... Do I have to do the same for each of these? I am not even sure the NVRTC compiler will even run correctly on curand_kernel.h, because there are some language features it doesn't support, aren't there?
Next: if you've sent in the source code of a header file to nvrtcCreateProgram, do I still have to #include it in the code to be executed / will it cause an error if I do so?
A link to example code that does this or something like it would be appreciated much more than a straightforward answer; I really haven't managed to find any.
You have to send the "filename" and the source of each header separately.
When the preprocessor does its thing, it'll use any #include filenames as a key to find the source for the header, based on the collection that you provide.
I suspect that, in this case, the compiler (driver) doesn't have file system access, so you have to give it the source in much the same way that you would for shader includes in OpenGL.
So:
Include your header's name when calling nvrtcCreateProgram. The compiler will, internally, generate the equivalent of a std::map<string,string> containing the source of each header indexed by the given name.
In your kernel source, use #include "foo.cuh" as usual.
The compiler will use foo.cuh as an index or key into its internal map (created when you called nvrtcCreateProgram), and will retrieve the header source from that collection
Compilation proceeds as normal.
One of the reasons that nvrtc provides only a "subset" of features is that the compiler plays in a somewhat sandboxed environment, without necessarily having all of the supporting tools and utilities lying around that you have with offline compilation. So, you have to manually handle a lot of the stuff that the normal nvcc + (gcc | MSVC| clang) combination provides.
A possible, but non-ideal, solution would be to preprocess the file that you need in your IDE, save the result and then #include that. However, I bet there is a better way to do that. if you just want curand, consider diving into the library and extracting the part you need (blech) or using another GPU-friendly rand implementation. On older CUDA versions, I just generated a big array of random floats on the host, uploaded it to the GPU, and sampled it in the kernels.
This related link may be helpful.
You do not need to load curand_kernel.h yourself and add it to the include "aliases" mechanism.
Instead, you can simply add the CUDA include directory to your (set of) include paths, e.g. by adding --include-path=/usr/local/cuda/include to your NVRTC compiler options.
(I do this in my GPU-kernel-runner test harness, by default, to be on the safe side.)

Code translation process

I'm going to do a presentation about programming languages in our class, gonna talk about the basics. It's going to be a brief one, around 5-10 minutes. The audience has no knowledge in this subject.
One of the things I'm going to talk about is low-level and high-level languages, and machine code. To simplify and visualize the difference I created this image.
But this is just a guess. I'm not sure if this is correct. Probably not. Could you enlighten me on how this process works without going into too much detail?
I'm not sure if this is the right place to ask this question. If not, I'll move it to somewhere else. Guide me. Also, about the title and the tags, you can correct them.
What happens largely depends on your environment, so there is no one answer. A general high level view, considering you're starting with what appears to be the C language and assuming its a standard environment (not something such as a Java virtual machine) is that:
A compiler converts C to assembly
An assembler converts assembly to object code (what you show as "low-level language")
A linker gathers one or more file of object code and attempts to fill out its needs with the content of libraries it knows about. This output is still object code, but step 3's object code was for a specific file's instructions only. This object code is in a format appropriate for step 4.
A loader reads the program into memory, potentially satisfying dynamic links that are required to run the program. It takes operating system specific steps to create a process that will execute the program.

What features of interpreted languages can a compiled one not have?

Interpreted languages are usually more high-level and therefore have features as dynamic typing (including creating new variables dynamically without declaration), the infamous eval and many many other features that make a programmer's life easier - but why can't compiled languages have these as well?
I don't mean languages like Java that run on a VM, but those that compile to binary like C(++).
I'm not going to make a list now but if you are going to ask which features I mean, please look into what PHP, Python, Ruby etc. have to offer.
Which common features of interpreted languages can't/don't/do exist in compiled languages? Why?
Whether source code is compiled - to native binaries, some kind of intermediate language (Java Bytecode/IL) - or interpreted is absolutely no trait of the language. It's just a question of the implementation.
You can actually have both compilers and interpreters for the same language like
Haskell: GHC <-> GHCI
C: gcc <-> ch
VB6: VS IDE <-> VB6 compiler
Certain language features like eval or dynamic typing may suggest a distinction between so called "dynamic languages" and static ones, but how this is run can never be the primary question.
Initially, one of the largest benefits of interpreted languages was debugging. That way you can get incredibly accurate and detailed information when looking for the reason a program isn't working. However, most compilers have become advanced enough that that is not too big of a deal any more.
The other main benefit (in my opinion anyway), is that with interpreted languages, you don't have to wait for eternity for your project to compile to test it out.
You couldn't plausibly do eval, for example, for reasons I'd have thought were pretty obvious: exactly how would you implement it? Make the runtime contain a full copy of the compiler? Every time you wanted to evaluate a string (keeping in mind that each time it could be different!) you'd save the string to a file, run the compiler on it to make a DLL/shared-lib, then load that DLL/shared-lib and call your code? You can't see why this might be a wee bit impractical? ;)
You can find this kind of thing in dynamic languages all over the place that you can't do with static code short of basically running an interpreter, in effect, behind the scenes.
Continuing on from Dario - I think you are really asking why a compiled program can't evaluate statements at runtime (e.g. eval). Here's some reasons I can think of:
The full compiler would have to be distributed with the program (or be part of the program)
For an eval function to have access to type information and symbols (such as variable names and function names) in the environment it was used the original program would have to be compiled with those symbols accessible (compiled languages usually remove these symbols at compile time).
Edit: As noted neither of these reasons make it impossible for a language/compiler to be able to evaluate code at runtime, but they are definitely things that need to be taken into consideration when developing a compiler or when designing a language.
Maybe the question is not about interpreted/compiled languages (compile is ambiguous anyway) but about languages that do/don't carry their own compiler around with them? For instance we've said C++ could do eval with a handy compiler floating around in the app, and reflection presumably is similar in some ways.

Is it bad practice to use the system() function when library functions could be used instead? Why?

Say there is some functionality needed for an application under development which could be achieved by making a system call to either a command line program or utilizing a library. Assuming efficiency is not an issue, is it bad practice to simply make a system call to a program instead of utilizing a library? What are the disadvantages of doing this?
To make things more concrete, an example of this scenario would be an application which needs to download a file from a web server, either the cURL program or the libcURL library could be used for this.
Unless you are writing code for only one OS, there is no way of knowing if your system call will even work. What happens when there is a system update or an OS upgrade?
Never use a system call if there is a library to do the same function.
I prefer libraries because of the dependency issue, namely the executable might not be there when you call it, but the library will be (assuming external library references get taken care of when the process starts on your platform). In other words, using libraries would seem to guarantee a more stable, predictable outcome in more environments than system calls would.
There are several factors to take into account. One key one is the reliability of whether the external program will be present on all systems where your software is installed. If there is a possibility that it will be missing, then maybe it is better to do it inside your program.
Weighing against that, you might consider that the extra code loaded into your program is prohibitive - you don't need the code bloat for such a seldom-used part of your application.
The system() function is convenient, but dangerous, not least because it invokes a shell, usually. You may be better off calling the program more directly - on Unix, via the fork() and exec() system calls. [Note that a system call is very different from calling the system() function, incidentally!] OTOH, you may need to worry about ensuring all open file descriptors in your program are closed - especially if your program is some sort of daemon running on behalf of other users; that is less of a problem if your are not using special privileges, but it is still a good idea not to give the invoked program access to anything you did not intend. You may need to look at the fcntl() system call and the FD_CLOEXEC flag.
Generally, it is easier to keep control of things if you build the functionality into your program, but it is not a trivial decision.
Security is one concern. A malicious cURL could cause havoc in your program. It depends if this is a personal program where coding speed is your main focus, or a commercial application where things like security play a factor.
System calls are much harder to make safely.
All sorts of funny characters need to be correctly encoded to pass arguments in, and the types of encoding may vary by platform or even version of the command. So making a system call that contains any user data at all requires a lot of sanity-checking and it's easy to make a mistake.
Yeah, as mentioned above, keep in mind the difference between system calls (like fcntl() and open()) and system() calls. :)
In the early stages of prototyping a c program, I often make external calls to programs like grep and sed for manipulation of files using popen(). It's not safe, it's not secure, and it's certainly not portable. But it can allow you to get going quickly. That's valuable to me. It lets me focus on the really important core of the program, usually the reason I used c in the first place.
In high level languages, you'd better have a pretty good reason. :)
Instead of doing either, I'd Unix it up and build a script framework around your app, using the command line arguments and stdin.
Other's have mentioned good points (reliability, security, safety, portability, etc) - but I'll throw out another. Performance. Generally it is many times faster to call a library function or even spawn a new thread then it is to start an entire new process (and then you still have to correctly check/verify it's execution and parse it's output!)

Does generated code need to be human readable?

I'm working on a tool that will generate the source code for an interface and a couple classes implementing that interface. My output isn't particularly complicated, so it's not going to be hard to make the output conform to our normal code formatting standards.
But this got me thinking: how human-readable does auto-generated code need to be? When should extra effort be expended to make sure the generated code is easily read and understood by a human?
In my case, the classes I'm generating are essentially just containers for some data related to another part of the build with methods to get the data. No one should ever need to look at the code for the classes themselves, they just need to call the various getters the classes provide. So, it's probably not too important if the code is "clean", well formatted and easily read by a human.
However, what happens if you're generating code that has more than a small amount of simple logic in it?
I think it's just as important for generated code to be readable and follow normal coding styles. At some point, someone is either going to need to debug the code or otherwise see what is happening "behind the scenes".
Yes!, absolutely!; I can even throw in a story for you to explain why it is important that a human can easily read the auto generated code...
I once got the opportunity to work on a new project. Now, one of the first things you need to do when you start writing code is to create some sort of connection and data representation to and from the database. But instead of just writing this code by hand, we had someone who had developed his own code generator to automatically build base classes from a database schema. It was really neat, the tedious job of writing all this code was now out of our hands... The only problem was, the generated code was far from readable for a normal human.
Of course we didn't about that, because hey, it just saved us a lot of work.
But after a while things started to go wrong, data was incorrectly read from the user input (or so we thought), corruptions occurred inside the database while we where only reading. Strange.. because reading doesn't change any data (again, so we thought)...
Like any good developer we started to question our own code, but after days of searching.. even rewriting code, we could not find anything... and then it dawned on us, the auto generated code was broken!
So now an even bigger task awaited us, checking auto generated code that no sane person could understand in a reasonable amount of time... I'm talking about non indented, really bad style code with unpronounceable variable and function names... It turned out that it would even be faster to rewrite the code ourselves, instead of trying to figure out how the code actually worked.
Eventually the developer who wrote the code generator remade it later on, so it now produces readable code, in case something went wrong like before.
Here is a link I just found about the topic at hand; I was acctually looking for a link to one of the chapters from the "pragmatic programmer" book to point out why we looked in our code first.
I think that depends on how the generated code will be used. If the code is not meant to be read by humans, i.e. it's regenerated whenever something changes, I don't think it has to be readable. However, if you are using code generation as an intermediate step in "normal" programming, the generated could should have the same readability as the rest of your source code.
In fact, making the generated code "unreadable" can be an advantage, because it will discourage people from "hacking" generated code, and rather implement their changes in the code-generator instead—which is very useful whenever you need to regenerate the code for whatever reason and not lose the changes your colleague did because he thought the generated code was "finished".
Yes it does.
Firstly, you might need to debug it -- you will be making it easy on yourself.
Secondly it should adhere to any coding conventions you use in your shop because someday the code might need to be changed by hand and thus become human code. This scenario typically ensues when your code generation tool does not cover one specific thing you need and it is not deemed worthwhile modifying the tool just for that purpose.
Look up active code generation vs. passive code generation. With respect to passive code generation, absolutely yes, always. With regards to active code generation, when the code achieves the goal of being transparent, which is acting exactly like a documented API, then no.
I would say that it is imperative that the code is human readable, unless your code-gen tool has an excellent debugger you (or unfortunate co-worker) will probably by the one waist deep in the code trying to track that oh so elusive bug in the system. My own excursion into 'code from UML' left a bitter tast in my mouth as I could not get to grips with the supposedly 'fancy' debugging process.
The whole point of generated code is to do something "complex" that is easier defined in some higher level language. Due to it being generated, the actual maintenance of this generated code should be within the subroutine that generates the code, not the generated code.
Therefor, human readability should have a lower priority; things like runtime speed or functionality are far more important. This is particularly the case when you look at tools like bison and flex, which use the generated code to pre-generate speedy lookup tables to do pattern matching, which would simply be insane to manually maintain.
You will kill yourself if you have to debug your own generated code. Don't start thinking you won't. Keep in mind that when you trust your code to generate code then you've already introduced two errors into the system - You've inserted yourself twice.
There is absolutely NO reason NOT to make it human parseable, so why in the world would you want to do so?
-Adam
One more aspect of the problem which was not mentioned is that the generated code should also be "version control-friendly" (as far as it is feasible).
I found it useful many times to double-check diffs in generated code vs the source code.
That way you could even occasionally find bugs in tools which generate code.
It's quite possible that somebody in the future will want to go through and see what your code does. So making it somewhat understandable is a good thing.
You also might want to include at the top of each generated file a comment saying how and why this file was generated and what it's purpose is.
Generally, if you're generating code that needs to be human-modified later, it needs to be as human-readable as possible. However, even if it's code that will be generated and never touched again, it still needs to be readable enough that you (as the developer writing the code generator) can debug the generator - if your generator spits out bad code, it may be hard to track down if it's difficult to understand.
I would think it's worth it to take the extra time to make it human readable just to make it easier to debug.
Generated code should be readable, (format etc can usually be handled by a half decent IDE). At some stage in the codes lifetime it is going to be viewed by someone and they will want to make sense of it.
I think for data containers or objects with very straightforward workings, human readability is not very important.
However, as soon as a developer may have to read the code to understand how something happens, it needs to be readable. What if the logic has a bug? How will anybody ever discover it if no one is able to read and understand the code? I would go so far as generating comments for the more complicated logic sections, to express the intent, so it's easier to determine if there really is a bug.
Logic should always be readable. If someone else is going to read the code, try to put yourself in their place and see if you would fully understand the code in high (and low?) level without reading that particular piece of code.
I wouldn't spend too much time with code that never would be read, but if it's not too much time i would go through the generated code. If not, at least make comment to cover the loss of readability.
If this code is likely to be debugged, then you should seriously consider to generate it in a human readable format.
There are different types of generated code, but the most simple types would be:
Generated code that is not meant to be seen by the developer. e.g., xml-ish code that defines layouts (think .frm files, or the horrible files generated by SSIS)
Generated code that is meant to be a basis for a class that will be later customized by your developer, e.g., code is generated to reduce typing tedium
If you're making the latter, you definitely want your code to be human readable.
Classes and interfaces, no matter how "off limits" to developers you think they should be, would almost certainly fall under generated code type number 2. They will be hit by the debugger at one point of another -- applying code formatting is the least you can do the ease that debugging process when the compiler hits those generated classes
Like virtually everybody else here, I say make it readable. It costs nothing extra in your generation process and you (or your successor) will appreciate it when they go digging.
For a real world example - look at anything Visual Studio generates. Well formatted, with comments and everything.
Generated code is code, and there's no reason any code shouldn't be readable and nicely formatted. This is cheap especially in generated code: you don't need to apply formatting yourself, the generator does it for you everytime! :)
As a secondary option in case you're really that lazy, how about piping the code through a beautifier utility of your choice before writing it to disk to ensure at least some level of consistency. Nevertheless, almost all good programmers I know format their code rather pedantically and there's a good reason for it: there's no write-only code.
Absolutely yes for tons of good reasons already said above. And one more is that if your code need to be checked by an assesor (for safety and dependability issues), it is pretty better if the code is human redeable. If not, the assessor will refuse to assess it and your project will be refected by authorities. The only solution is then to assess... the code generator (that's usually much more difficult ;))
It depends on whether the code will only be read by a compiler or also by a human. In addition, it matters whether the code is supposed to be super-fast or whether readability is important. When in doubt, put in the extra effort to generate readable code.
I think the answer is: it depends.
*It depends upon whether you need to configure and store the generated code as an artefact. For example, people very rarely keep or configure the object code output from a c-compiler, because they know they can reproduce it from the source every time. I think there may be a similar analogy here.
*It depends upon whether you need to certify the code to some standard, e.g. Misra-C or DO178.
*It depends upon whether the source will be generated via your tool every time the code is compiled, or if it will you be stored for inclusion in a build at a later time.
Personally, if all you want to do is build the code, compile it into an executable and then throw the intermediate code away, then I can't see any point in making it too pretty.