Code translation process - binary

I'm going to do a presentation about programming languages in our class, and I'm going to talk about the basics. It will be a brief one, around 5-10 minutes. The audience has no prior knowledge of this subject.
One of the things I'm going to talk about is low-level and high-level languages, and machine code. To simplify and visualize the difference I created this image.
But this is just a guess. I'm not sure if this is correct. Probably not. Could you enlighten me on how this process works without going into too much detail?
I'm not sure if this is the right place to ask this question. If not, I'll move it to somewhere else. Guide me. Also, about the title and the tags, you can correct them.

What happens largely depends on your environment, so there is no single answer. A general high-level view, considering you're starting with what appears to be the C language and assuming it's a standard environment (not something such as a Java virtual machine), is that:
1) A compiler converts C to assembly.
2) An assembler converts assembly to object code (what you show as "low-level language").
3) A linker gathers one or more files of object code and attempts to satisfy their unresolved references with the contents of the libraries it knows about. Its output is still object code, but unlike the per-file object code from step 2, it is in a format appropriate for step 4.
4) A loader reads the program into memory, potentially satisfying dynamic links that are required to run the program, and takes operating-system-specific steps to create a process that will execute it.
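For example, with a typical GCC toolchain on Linux (an assumption; other environments use different tools, but the stages are the same), the four steps can be run by hand on a tiny program:

```cpp
// hello.cpp -- a minimal program for walking through the four steps above.
// The commands below assume a GCC toolchain on Linux; file names are examples.
//
//   g++ -S hello.cpp -o hello.s   (step 1: compile to assembly)
//   as  hello.s -o hello.o        (step 2: assemble to object code)
//   g++ hello.o -o hello          (step 3: link against the runtime libraries)
//   ./hello                       (step 4: the OS loader maps the program into memory and runs it)
#include <cstdio>

int main() {
    std::puts("Hello from machine code!");
    return 0;
}
```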


#include v/s performance

(I framed this question in terms of a C++ program because I ran into it while coding a C++ program, but the actual question is language-agnostic.)
I had to copy chars from one char* buffer to another buffer. So, instead of doing a #include <cstring> for strcpy, I wrote a small snippet of code myself to do the same.
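Roughly something like this - a sketch, not my exact code, with invented buffer names - comparing the hand-rolled loop to the library call:

```cpp
// A hand-rolled copy loop versus the standard library call.
// Assumes dst points to a buffer large enough to hold src, including the '\0'.
#include <cstring>

void copy_by_hand(char* dst, const char* src) {
    while ((*dst++ = *src++) != '\0') {
        // copies byte by byte, including the terminating '\0'
    }
}

void copy_with_library(char* dst, const char* src) {
    std::strcpy(dst, src);  // the same effect, in one call
}

int main() {
    char buffer[16];
    copy_by_hand(buffer, "hello");
    copy_with_library(buffer, "hello");
    return 0;
}
```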
Here are the thoughts I had at the time:
Standard library functions are generally the fastest implementations that one can find of a certain code.
But it would be unwise to include a big header file if you're only going to use a very minor part of it (that's what I think happens).
I want to know how right I was in doing so, and where the line lies between writing your own snippets and falling back on the standard headers.
The first rule of thumb is "don't reinvent the wheel". And remember that your wheel will probably be worse :-) (there are very good programmers writing the wheel that ships with your compiler).
But yes, if I had to include the whole boost library for a single function, I would try to directly copy it from the library :-)
I'll add that the question is marked as "language-agnostic", so we can't simply speak about the difference between C/C++ headers and C/C++ libraries. If we speak of a generic language, the inclusion of an external library COULD have side effects, even BIG side effects. For example, it could slow down your program's startup considerably even if the library isn't used (because it has static initializers that need to be called at startup, or it references flocks of other dll/dynamic libraries that need to be loaded). And it wouldn't be the first time the startup of a program fails because of the static startup of one of its dependencies :-)
So in the end "it depends". I would say that if you only have to copy up to one file (let's say 250-500 lines) from a BSD-licensed source there isn't any big problem; for anything bigger, linking to the library is probably necessary.
#include-ing a header file should have no effect on runtime performance. It may, of course, slow down your compiler.
A decent linker should only pull in the pieces that it actually needs.
When you are talking about performance, what do you want to optimize? Compile time, size of the binary/object, or speed of execution?
Using a standard library has the big advantage of improving code maintainability and readability. If someone else has to review or modify your code, it is much better to use the "standard" way.
It is much easier to spot a bug when calling memcpy() or strncpy() than when calling MyMemCpy() or MyStringCpy().

runnable pseudocode?

I am attempting to determine prior art for the following idea:
1) user types in some code in a language called (insert_name_here);
2) user chooses a destination language from a list of well-known output candidates (javascript, ruby, perl, python);
3) the processor translates insert_name_here into runnable code in destination language;
4) the processor then runs the code using the relevant system call based on the chosen language.
The reason this works is that there is a pre-established one-to-one mapping between all language constructs of insert_name_here and those of every supported destination language.
(Disclaimer: This obviously does not produce "elegant" code that is well-tailored to the destination language. It simply does a rudimentary translation that is runnable. The purpose is to allow developers to get a quick-and-dirty implementation of algorithms in several different languages for those cases where they do not feel like re-inventing the wheel, but are required for whatever reason to work with a specific language on a specific project.)
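For concreteness, here is a toy sketch of what such a construct-mapping table might look like (the construct names and per-language templates are invented for the example):

```cpp
// A toy 1:1 construct-mapping table: one source construct maps to exactly one
// template per destination language. A real translator would need far more.
#include <iostream>
#include <map>
#include <string>

int main() {
    std::map<std::string, std::map<std::string, std::string>> mapping = {
        {"print", {{"javascript", "console.log({0});"},
                   {"ruby",       "puts {0}"},
                   {"perl",       "print {0}, \"\\n\";"},
                   {"python",     "print({0})"}}},
    };

    // Look up how the source construct "print" is rendered in each target.
    for (const auto& [lang, tmpl] : mapping["print"])
        std::cout << lang << ": " << tmpl << "\n";
    return 0;
}
```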
Does this already exist?
The .NET CLR is designed such that C++.NET, C#.NET, and VB.NET all compile to the same intermediate language (CIL), and you can "decompile" that CIL back into any one of those languages.
So yes, I would say it already exists though not exactly as you describe.
There are converters available for different languages. The problem you are going to have is dealing with libraries. While mapping between language statements might be easy, finding mappings between library functions will be very difficult.
I'm not really sure how useful that type of code generator would be. Why would you want to write something in one language and then immediately convert it to something else? I can see the rationale for 4th Gen languages that convert diagrams or models into code but I don't really see the point of your effort.
Yes, a program that transforms a program from one representation to another does exist. It's called a "compiler".
And as to your question whether that is always possible: as long as your target language is at least as powerful as the source language, then it is possible. So, if your target language is Turing-complete, then it is always possible, because there can be no language that is more powerful than a Turing-complete language.
However, there does not need to be a dumb 1:1 mapping.
For example: the Microsoft Volta compiler, which compiles CIL bytecode to JavaScript source code, has a problem: .NET has threads, JavaScript doesn't. But you can implement threads with continuations. Well, JavaScript doesn't have continuations either, but you can implement continuations with exceptions. So, Volta transforms the CIL to CPS and then implements CPS with exceptions. (Newer versions of JavaScript have semi-coroutines in the form of generators; those could also be used, but Volta is intended to work across a wide range of JavaScript versions, including, obviously, JScript in Internet Explorer.)
This seems a little bizarre. If you're using the term "prior art" in its most common form, you're discussing a potentially patentable idea. If that is the case, you have:
1/ Published the idea, starting the clock running on patent filing - I'm assuming, perhaps incorrectly, that you're based in the U.S. Other jurisdictions may have other rules.
2/ Told the entire planet your idea, which means it's pretty much useless to try and patent it, unless you act very fast.
If you're not thinking about patenting this and were just using the term "prior art" in a layperson's sense, I apologize. I work for a company that takes patents very seriously, and it's drilled into us, in great detail, what we're allowed to do with information before filing.
Having said that, patentable ideas must be novel, useful and non-obvious. I would think that your idea would not pass on the third of these since you're describing a language translator which would have the prior art of the many pascal-to-c and fortran-to-c converters out there.
The one glimmer of hope would be the ability of your idea to generate one of multiple output languages (which p2c and f2c don't do) but I think even that would be covered by the likes of cross compilers (such as gcc) which turn source into one of many different object languages.
IBM has a product called Visual Age Generator in which you code in one (proprietary) language and it's converted into COBOL/C/Java/others to run on different target platforms from PCs to the big honkin' System z mainframes, so there's your first problem (thinking about patenting an idea that IBM, the biggest patenter in the world, is already using).
Tons of them. p2c, f2c, and the original implementations of C++ and Objective-C strike me immediately. Beyond that, it's kind of hard to distinguish what you're describing from any compiler, especially for us old guys whose compilers generated ASM code as an intermediate representation anyway.

What's the difference between a "script" and an "application"?

I'm referring to distinctions such as in this answer:
...bash isn't for writing applications it's for, well, scripting. So sure, your application might have some housekeeping scripts but don't go writing critical-business-logic.sh because another language is probably better for stuff like that.
As a programmer who's worked in many languages, this seems to be C, Java and other compiled-language snobbery. I'm not looking for reinforcement of my opinion or hand-wavy answers. Rather, I genuinely want to know what technical differences are being referred to.
(And I use C in my day job, so I'm not just being defensive.)
Traditionally a program is compiled and a script is interpreted, but that is not really important anymore. You can generate a compiled version of most scripts if you really want to, and other 'compiled' languages like Java are in fact interpreted (at the byte code level).
A more modern definition might be that a program is intended to be used by a customer (perhaps an internal one) and thus should include documentation and support, while a script is primarily intended for the use of the author.
The web is an interesting counter example. We all enjoy looking things up with the Google search engine. The bulk of the code that goes into creating the 'database' it references is used only by its authors and maintainers. Does that make it a script?
I would say that an application tends to be used interactively, where a script would run its course, suitable for batch work. I don't think it's a concrete distinction.
Usually, it is "script" versus "program".
I am with you that this distinction is mostly "compiled language snobbery", or to quote Larry Wall and take the other side of the fence, "a script is what the actors have, a programme is given to the audience".
This is an interesting topic, and I don't think there are very good guidelines for differentiating a "script" and an "application."
Let's take a look at some Wikipedia articles to get a feel of the distinction.
Script (Wikipedia -> Scripting language):
A scripting language, script language or extension language, is a programming language that controls a software application. "Scripts" are often treated as distinct from "programs", which execute independently from any other application. At the same time they are distinct from the core code of the application, which is usually written in a different language, and by being accessible to the end user they enable the behavior of the application to be adapted to the user's needs.
Application (Wikipedia -> Application software -> Terminology)
In computer science, an application is a computer program designed to help people perform a certain type of work. An application thus differs from an operating system (which runs a computer), a utility (which performs maintenance or general-purpose chores), and a programming language (with which computer programs are created). Depending on the work for which it was designed, an application can manipulate text, numbers, graphics, or a combination of these elements.
Reading the above entries seems to suggest that the distinction is that a script is "hosted" by another piece of software, while an application is not. I suppose that can be argued, such as shell scripts controlling the behavior of the shell, and perl scripts controlling the behavior of the interpreter to perform desired operations. (I feel this may be a little bit of a stretch, so I may not completely agree with it.)
When it comes down to it, it is in my opinion that the colloquial distinction can be made in terms of the scale of the program. Scripts are generally smaller in scale when compared to applications.
Also, in terms of purpose, a script generally performs tasks that need to be taken care of, say, for example, build scripts that produce multiple release versions of a certain piece of software. On the other hand, applications are geared toward providing functionality that is more refined and aimed at an end user. For example, Notepad or Firefox.
John Ousterhout (the inventor of TCL) has a good article at http://www.tcl.tk/doc/scripting.html where he proposes a distinction between system programming languages (for implementing building blocks, emphasis on correctness, type safety) vs scripting languages (for combining building blocks, emphasis on responsiveness to changing environments and requirements, easy conversion in and out of textual representations). If you go with that categorisation system, then 99% of programmers are doing jobs that are more appropriate to scripting languages than to system programming languages.
A script tends to be a series of commands that starts, runs, and terminates. It often requires no/little human interaction. An application is a "program"... it often requires human interaction, it tends to be larger.
Script to me implies line-by-line interpretation of the code. You can open a script and view its programmer-readable contents. An application implies a stand-alone compiled executable.
It's often just a semantic argument, or even a way of denigrating certain programming languages. As far as I'm concerned, a "script" is a type of program, and the exact definition is somewhat vague and varies with context.
I might use the term "script" to mean a program that primarily executes linearly, rather than with lots of conditional logic or subroutines, much like a "script" in Hollywood is a linear sequence of instructions for an actor to execute. I might use it to mean a program that is written in a language embedded inside a larger program, for the purpose of driving that program. For example, automating tasks under the old Mac OS with AppleScript, or driving a program that exposes itself in some way with an embedded TCL interface.
But in all those cases, a script is a type of program.
The term "scripting language" has been used for dynamically interpreted (sometimes compiled) languages, usually these have a lot of common features such as very high level instructions, built in hashes and arbitrary-length lists and other high level data structures, etc. But those languages are capable of very large, complicated, modular, well-designed programs, so if you think of a "script" as something other than a program, that term might confuse you.
See also Is it a Perl program or a Perl script? in perlfaq1.
A script generally runs as part of a larger application inside a scripting engine
eg. JavaScript -> Browser
This is in contrast to both traditional static typed compiled languages and to dynamic languages, where the code is intended to form the main part of the application.
An application is a collection of scripts geared toward a common set of problems.
A script is a bit of code for performing one fairly specific task.
IMO, the difference has nothing whatsoever to do with the language that's used. It's possible to write a complex application with bash, and it's possible to write a simple script with C++.
Personally, I think the separation is a step back from the actual implementation.
In my estimation, an application is planned. It has multiple goals, it has multiple deliverables. There are tasks set aside at design time in advance of coding that the application must meet.
A script, however, is just thrown together as needed, and little planning is involved.
Lack of proper planning does not however downgrade you to a script. Possibly, it makes your application a poorly organized collection of poorly planned scripts.
Furthermore, an application can contain scripts that, in aggregate, comprise the whole. But a script can only reference an application.
Taking perl as an example, you can write perl scripts or perl applications.
A script would imply a single file or a single namespace. (e.g. updateFile.pl).
An application would be something made up of a collection of files or namespaces/classes (e.g. an OO-designed perl application with many .pm module files).
An application is big and will be used over and over by people and maybe sold to a customer.
A script starts out small, stays small if you're lucky, is rarely sold to a customer, and might either be run automatically or fall into disuse.
What about:
Script:
A script is a text file (or collection of text files) of programming statements written in a language that allows individual statements to be interpreted into machine-executable code immediately before each is executed, and that is intended to be run that way.
Application:
An application is any computer program whose primary functionality involves providing service to a human Actor.
A script-based program written in a scripting language can therefore, theoretically, have its textual statements altered while the script is being executed (at great risk, of course). The analogous situation for compiled programs is flipping bits in memory.
Any takers? :)
First of all, I would like to make it crystal clear that a script is a program. In other words, a script is a set of instructions.
Program:
A set of instructions which is going to be compiled is known as a Program.
Script:
A set of instructions which is going to be interpreted is known as a Script.
@Jeff's answer is good. My favorite explanation is:
Many (most?) scripting languages are interpreted, and few compiled languages are considered to be scripting languages, but the question of compiled vs. interpreted is only loosely connected to the question of "scripting" vs. "serious" languages.
A lot of the problem here is that "scripting" is a pretty vague designation -- it means a language that's convenient for writing scripts in, as opposed to writing "full-blown programs" (or applications). But how does one distinguish a complex script from a simple application? That's an essentially unanswerable question.
Generally, a script is a series of commands applied to some set of data, possibly in a user-defined order... but then, one could stretch that description to apply to Photoshop, which is clearly a major application. Scripts are generally smaller than applications, do some well-defined thing and are "simpler" to use, and typically can be decomposed into a clear series of sub-operations, but all of these things are subjective.
Referenced from here.
I think it doesn't matter at all whether code is compiled or interpreted.
The true difference is in the core logic of the code:
If the code provides new functionality that is not implemented in other programs in the system, it's a program. It can even be manipulated by a script.
If the code MAINLY orchestrates the actions of other programs, and the total result is MAINLY the result of the work of the programs it manipulates, it's a script. Literally, a script of actions for some programs.
Actually the difference between a script (or a scripting language) and an application is that a script doesn't need to be compiled into machine language. You run the source of the script with an interpreter. An application's source is compiled into machine code so that you can run it as a stand-alone executable.
I would say a script is usually a set of commands or instructions written in plain text that are executed by a hosting application (browser, command interpreter or shell,...).
It does not mean it's not powerful, or not compiled in some way when it's actually executed. But a script cannot do anything by itself; it's just plain text.
By nature it can be a fragment only, needing to be combined to build a program or an application, but extended and fully developed scripts or set of scripts can be considered programs or applications when executed by the host, just like a bunch of source files can become an application once compiled.
A scripting language doesn't have a standard library or platform (or not much of one). It's small and light, designed to be embedded into a larger application. Bash and Javascript are great examples of scripting languages because they rely absolutely on other programs for their functionality.
Using this definition, a script is code designed to drive a larger application (suite). A piece of JavaScript might call on Firefox to open windows or manipulate the DOM. A Bash script executes existing programs or other scripts and connects them together with pipes.
You also ask why not scripting languages, so:
Are there even any unit-testing tools for scripting languages? That seems a very important tool for "real" applications that is completely missing. And there are rarely any real library bindings for scripting languages.
Most of the time, scripts could be replaced with a real, light language like Python or Ruby anyway.

What's the difference between data and code?

To take an example, consider a set of discounts available to a supermarket shopper.
We could define these rules as data in some standard fashion (lists of qualifying items, applicable dates, coupon codes) and write generic code to handle these. Or, we could write each as a chunk of code, which checks for the appropriate things given the customer's shopping list and returns any applicable discounts.
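For instance, the two approaches might look roughly like this (a sketch only; the rule fields, item names and percentages are invented):

```cpp
// Two sketches of the same supermarket discount rule.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// Option 1: the rule is data, interpreted by a generic engine.
struct DiscountRule {
    std::vector<std::string> qualifying_items;  // e.g. {"cheese", "wine"}
    double percent_off;                         // e.g. 0.10 for 10% off
};

double apply_rules(const std::vector<DiscountRule>& rules,
                   const std::vector<std::string>& basket) {
    double best = 0.0;
    for (const auto& rule : rules)
        for (const auto& item : basket)
            if (std::find(rule.qualifying_items.begin(),
                          rule.qualifying_items.end(),
                          item) != rule.qualifying_items.end())
                best = std::max(best, rule.percent_off);
    return best;
}

// Option 2: each rule is a chunk of code.
double cheese_discount(const std::vector<std::string>& basket) {
    for (const auto& item : basket)
        if (item == "cheese") return 0.10;  // 10% off if the basket contains cheese
    return 0.0;
}

int main() {
    std::vector<DiscountRule> rules = {{{"cheese", "wine"}, 0.10}};
    std::vector<std::string> basket = {"bread", "cheese"};
    std::cout << apply_rules(rules, basket) << " vs " << cheese_discount(basket) << "\n";
}
```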
You could reasonably store the rules as objects, serialised into Blobs or stored in code files, so that each rule could choose its own division between data and code, to allow for future rules that wouldn't fit the type of generic processor considered above.
It's often easy to criticise code that mixes data in, via if statements that check for 6 different things that should be in a file or a database, but is there a rule that helps in the edge cases?
Or is this the point of Object Oriented design, to stop us worrying about the line between data and code?
To clarify, the underlying question is this: How would you code the above example? Is there a rule of thumb that made you decide what is data and what is code?
(Note: I know, code can be compiled, but in a world of dynamic languages and JIT compilation, even that is a blurry concept.)
Fundamentally, there is of course no difference between data and code, but for real software infrastructures, there can be a big difference. Apart from obvious things like, as you mentioned, compilation, the biggest issue is this:
Most sufficiently large projects are designed to produce "releases" that are one big bundle, produced in 3-month (or longer) cycles, tested extensively and not changed afterwards except in tightly controlled ways. "Code" most definitely cannot be changed, so anything that does need to be changed has to be factored out and made "configuration data", so that changing it becomes palatable to those whose job it is to ensure that a release works.
Of course, in most cases bad configuration data can break a release just as thoroughly as bad code, so the whole thing is largely an illusion - in reality it doesn't matter whether it's code or "configuration data" that changes, what matters is that the interface between the main system and the parts that change is narrow and well-defined enough to give you a good chance that the person who does the change understands all consequences of what he's doing.
This is already harder than most people think when it's really just a few strings and numbers that are configured (I've personally witnessed a production mainframe system crash because it had one boolean value set differently from another system it was talking to). When your "configuration data" contains complex logic, it's almost impossible to achieve. But the situation isn't going to be any better just because you use a badly-designed ad hoc "rules configuration" language instead of "real" code.
This is a rather philosophical question (which I like) so I'll answer it in a philosophical way: with nothing much to back it up. ;)
Data is the part of a system that can change. Code defines behavior; the way in which data can change into new data.
To put it more accurately: Data can be described by two components: a description of what the datum is supposed to represent (for instance, a variable with a name and a type) and a value.
The value of the variable can change according to rules defined in code. The description does not change, of course, because if it does, we have a whole new piece of information.
The code itself does not change, unless requirements (what we expect of the system) change.
To a compiler (or a VM), code is actually the data on which it performs its operations. However, the to-be-compiled code does not specify behavior for the compiler; the compiler's own code does that.
It all depends on the requirement. If the data is lookup-like and changes frequently, you don't really want to do it in code; but things like the day of the week should not change for the next 200 years or so, so code that.
You might consider changing your topic, as the first thing I thought of when I saw it was the age-old LISP discussion of code vs data. Luckily, in Scheme code and data look the same, but that's about it; you can never accidentally mix code with data, as is very possible in LISP with unhygienic macros.
Data is information that is processed by instructions called Code. I'm not sure I feel there's a blurring in OOD; there are still properties (Data) and methods (Code). OO theory encapsulates both into a gestalt entity called a Class, but they are still discrete within the Class.
How flexible you want to make your code is a matter of choice. Including constant values (which is what you are doing by using if statements as described above) is inflexible without re-processing your source, whereas using dynamically sourced data is more flexible. Is either approach wrong? I would say it really depends on the circumstances. As Leppie said, there are certain 'data' points that are invariant, like the days of the week, that can be hard coded, but even there it may be advantageous to do it dynamically in certain circumstances.
In Lisp, your code is data, and your data is code.
In Prolog, clauses are terms, and terms are clauses.
The important note is that you want to separate out the part of your code that will execute the same every time, (i.e. applying a discount) from the part of your code which could change (i.e. the products to be discounted, or the % of the discount, etc.)
This is simply for safety. If a discount changes, you won't have to re-write your discount code, you'll only need to go into your discounts repository (DB, or app file, or xml file, or however you choose to implement it) and make a small change to a number.
Also, if the discount code is separated into an XML file, then you can give the entire application to a manager, and with sufficient instructions, they won't need to pester you whenever they want to change the discount rates.
When you mix in data and code, you are exponentially increasing the odds of breaking when anything changes. So, as leppie said, you need to extract the constantly changing parts, and put them in a separate place.
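A minimal sketch of that separation (the file name and format are invented for the example):

```cpp
// The changing part (the discount rate) lives in a plain file that a manager
// can edit; the stable part (applying the discount) stays in code.
#include <fstream>
#include <iostream>

int main() {
    double rate = 0.0;
    std::ifstream cfg("discount_rate.txt");  // e.g. a file containing just: 0.15
    if (!(cfg >> rate)) {
        std::cerr << "could not read discount rate\n";
        return 1;
    }
    std::cout << "applying a discount of " << rate * 100 << "%\n";
    return 0;
}
```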
Huge difference. Data is a given to the system, while code is a part of the system.
Wrong data is merely senseless: our code (the handler) is good, and what you put in is what you get out; it is not the system's fault that you meant something else. But if the code is bad, the system is bad.
As an example, consider some JSON, a bad parser.js written by me, and, let's say, a good V8. For my system the bad parser.js is code, and my system works wrongly. But for Google's system my bad parser is just data, which says nothing about the quality of V8.
The question is very practical, not sophistic.
Systems engineering (https://en.wikipedia.org/wiki/Systems_engineering) tries to give a good answer to it (and to make money doing so).
Data is information. It's not about where you decide to put it, be it a db, config file, config through code or inside the classes.
The same happens for behaviors / code. It's not about where you decide to put it or how you choose to represent it.
The line between data and code (program) is blurry. It's ultimately just a question of terminology - for example, you could say that data is everything that is not code. But, as you wrote, they can be happily mixed together (although usually it's better to keep them separate).
Code is any data which can be executed. Now since all data is used as input to some program at some point of time, it can be said that this data is executed by a program! Thus your program acts as a virtual machine for your data. Hence in theory there is no difference between data and code!
In the end what matters is software engineering/development considerations like performance, efficiency etc. For example data driven programs may not be as efficient as programs which have hard coded (and hence fragile) conditional statements. Hence I choose to define code as any data which can be efficiently executed and all else being plain data.
It's a tradeoff between flexibility and efficiency. Executable data (like XML rules) offers more flexibility (sometimes) while the same data/rules when coded as part of the application will run more efficiently but changing it frequently becomes cumbersome. In other words executable data is easy to deploy but is inefficient and vice-versa. So ultimately the decision rests with you - the software designer.
Please correct me if I'm wrong.
The relationship between code and data is as follows:
Code, once compiled into a program, processes data during execution.
A program can extract data, transform data, load data, generate data...
Also, a program can extract code, transform code, load code, and generate code too...
Hence code that has not been compiled or interpreted is useless, while data is always worth something; but compiled code can do all of the above.
For example: a source control system processes source code; here the thing being processed is itself code. Backup scripts process files; here the things being processed are data, and so on...
I would say that the distinction between data, code and configuration is something to be made within the context of a particular component. Sometimes it's obvious, sometimes less so.
For example, to a compiler, the source code it consumes and the object code it creates are both data - and should be separated from the compiler's own code.
In your case you seem to be describing the option of a particularly powerful configuration file, which can contain code. Much as, for example, the GIMP lets you 'configure' plugins using Scheme. As the developer of the component that reads this configuration, you would think of it as data. When working at a different level -- writing the configuration -- you would think of it as code.
This is a very powerful way of designing.
Applying this to the underlying question ("How would you code the above example?"), one option might be to adopt or design a high level Domain Specific Language (DSL) for specifying rules. At startup, or when first required, the server reads the rule and executes it.
Provide an admin interface allowing the administrator to
test a new rule file
replace the current configuration with that from a new rule file
... all of which would happen at runtime.
A DSL might be something as simple as a table parser or an XML parser, or it could be something as sophisticated as a scripting language. From C, it's easy to embed Python or Lua. From Java it's easy to embed Groovy or Clojure.
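As a sketch of that embedding idea (using the CPython embedding API from C++; the rule text is invented, and you'd need to compile and link against libpython):

```cpp
// The host application reads a "rule" written in an embedded scripting
// language and executes it at runtime, without recompiling the host.
#include <Python.h>

int main() {
    Py_Initialize();
    const char* rule =
        "basket = ['cheese', 'bread']\n"
        "discount = 0.10 if 'cheese' in basket else 0.0\n"
        "print('discount:', discount)\n";
    PyRun_SimpleString(rule);  // run the rule inside the host process
    Py_FinalizeEx();
    return 0;
}
```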
You could switch in compiled code at runtime, with clever linking or classloader tricks. This seems more difficult and less valuable than the embedded DSL option, in my opinion.
The best practical answer to this question I found is this:
Any class that needs to be serialized, now or in any foreseeable future, is data.
Everything else is code.
That's why, for example, Java's HashMap is data - although it has a lot of code, API methods and a specific implementation (i.e., it might look like code at first glance).

Does generated code need to be human readable?

I'm working on a tool that will generate the source code for an interface and a couple classes implementing that interface. My output isn't particularly complicated, so it's not going to be hard to make the output conform to our normal code formatting standards.
But this got me thinking: how human-readable does auto-generated code need to be? When should extra effort be expended to make sure the generated code is easily read and understood by a human?
In my case, the classes I'm generating are essentially just containers for some data related to another part of the build, with methods to get the data. No one should ever need to look at the code for the classes themselves; they just need to call the various getters the classes provide. So, it's probably not too important if the code is "clean", well formatted and easily read by a human.
However, what happens if you're generating code that has more than a small amount of simple logic in it?
I think it's just as important for generated code to be readable and follow normal coding styles. At some point, someone is either going to need to debug the code or otherwise see what is happening "behind the scenes".
Yes! Absolutely! I can even throw in a story to explain why it is important that a human can easily read auto-generated code...
I once got the opportunity to work on a new project. Now, one of the first things you need to do when you start writing code is to create some sort of connection and data representation to and from the database. But instead of just writing this code by hand, we had someone who had developed his own code generator to automatically build base classes from a database schema. It was really neat, the tedious job of writing all this code was now out of our hands... The only problem was, the generated code was far from readable for a normal human.
Of course we didn't care about that, because hey, it had just saved us a lot of work.
But after a while things started to go wrong: data was incorrectly read from the user input (or so we thought), and corruption occurred inside the database while we were only reading. Strange... because reading doesn't change any data (again, so we thought)...
Like any good developers we started to question our own code, but after days of searching... even rewriting code, we could not find anything... and then it dawned on us: the auto-generated code was broken!
So now an even bigger task awaited us: checking auto-generated code that no sane person could understand in a reasonable amount of time... I'm talking about non-indented, really bad-style code with unpronounceable variable and function names... It turned out that it would even be faster to rewrite the code ourselves, instead of trying to figure out how the existing code actually worked.
Eventually the developer who wrote the code generator remade it so that it now produces readable code, in case something like this goes wrong again.
Here is a link I just found about the topic at hand; I was actually looking for a link to one of the chapters from the "Pragmatic Programmer" book to point out why we looked in our own code first.
I think that depends on how the generated code will be used. If the code is not meant to be read by humans, i.e. it's regenerated whenever something changes, I don't think it has to be readable. However, if you are using code generation as an intermediate step in "normal" programming, the generated code should have the same readability as the rest of your source code.
In fact, making the generated code "unreadable" can be an advantage, because it will discourage people from "hacking" generated code, and rather implement their changes in the code-generator instead—which is very useful whenever you need to regenerate the code for whatever reason and not lose the changes your colleague did because he thought the generated code was "finished".
Yes it does.
Firstly, you might need to debug it -- you will be making it easy on yourself.
Secondly it should adhere to any coding conventions you use in your shop because someday the code might need to be changed by hand and thus become human code. This scenario typically ensues when your code generation tool does not cover one specific thing you need and it is not deemed worthwhile modifying the tool just for that purpose.
Look up active code generation vs. passive code generation. With respect to passive code generation, absolutely yes, always. With regards to active code generation, when the code achieves the goal of being transparent, which is acting exactly like a documented API, then no.
I would say that it is imperative that the code is human readable; unless your code-gen tool has an excellent debugger, you (or an unfortunate co-worker) will probably be the one waist-deep in the code trying to track down that oh-so-elusive bug in the system. My own excursion into 'code from UML' left a bitter taste in my mouth, as I could not get to grips with the supposedly 'fancy' debugging process.
The whole point of generated code is to do something "complex" that is easier defined in some higher level language. Due to it being generated, the actual maintenance of this generated code should be within the subroutine that generates the code, not the generated code.
Therefore, human readability should have a lower priority; things like runtime speed or functionality are far more important. This is particularly the case when you look at tools like bison and flex, which use the generated code to pre-generate speedy lookup tables for pattern matching, which would simply be insane to maintain by hand.
You will kill yourself if you have to debug your own generated code. Don't start thinking you won't. Keep in mind that when you trust your code to generate code then you've already introduced two errors into the system - You've inserted yourself twice.
There is absolutely NO reason NOT to make it human parseable, so why in the world would you make it otherwise?
-Adam
One more aspect of the problem which was not mentioned is that the generated code should also be "version control-friendly" (as far as it is feasible).
I found it useful many times to double-check diffs in generated code vs the source code.
That way you could even occasionally find bugs in tools which generate code.
It's quite possible that somebody in the future will want to go through and see what your code does. So making it somewhat understandable is a good thing.
You also might want to include at the top of each generated file a comment saying how and why the file was generated and what its purpose is.
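Something along these lines, for example (the tool and file names are invented):

```cpp
// ---------------------------------------------------------------------------
// GENERATED FILE -- DO NOT EDIT BY HAND.
// Generated by: interface_gen (hypothetical tool name)
// Source:       widgets.schema
// Purpose:      read-only accessors for widget build metadata
// Regenerate:   interface_gen --schema widgets.schema --out src/generated/
// ---------------------------------------------------------------------------
```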
Generally, if you're generating code that needs to be human-modified later, it needs to be as human-readable as possible. However, even if it's code that will be generated and never touched again, it still needs to be readable enough that you (as the developer writing the code generator) can debug the generator - if your generator spits out bad code, it may be hard to track down if it's difficult to understand.
I would think it's worth it to take the extra time to make it human readable just to make it easier to debug.
Generated code should be readable (formatting etc. can usually be handled by a half-decent IDE). At some stage in the code's lifetime it is going to be viewed by someone, and they will want to make sense of it.
I think for data containers or objects with very straightforward workings, human readability is not very important.
However, as soon as a developer may have to read the code to understand how something happens, it needs to be readable. What if the logic has a bug? How will anybody ever discover it if no one is able to read and understand the code? I would go so far as generating comments for the more complicated logic sections, to express the intent, so it's easier to determine if there really is a bug.
Logic should always be readable. If someone else is going to read the code, try to put yourself in their place and see if you would fully understand the code in high (and low?) level without reading that particular piece of code.
I wouldn't spend too much time on code that will never be read, but if it doesn't take too much time I would go through the generated code. If not, at least add comments to cover the loss of readability.
If this code is likely to be debugged, then you should seriously consider generating it in a human-readable format.
There are different types of generated code, but the most simple types would be:
Generated code that is not meant to be seen by the developer. e.g., xml-ish code that defines layouts (think .frm files, or the horrible files generated by SSIS)
Generated code that is meant to be a basis for a class that will be later customized by your developer, e.g., code is generated to reduce typing tedium
If you're making the latter, you definitely want your code to be human readable.
Classes and interfaces, no matter how "off limits" to developers you think they should be, would almost certainly fall under generated code type number 2. They will be hit by the debugger at one point or another - applying code formatting is the least you can do to ease that debugging process when the debugger steps into those generated classes.
Like virtually everybody else here, I say make it readable. It costs nothing extra in your generation process and you (or your successor) will appreciate it when they go digging.
For a real world example - look at anything Visual Studio generates. Well formatted, with comments and everything.
Generated code is code, and there's no reason any code shouldn't be readable and nicely formatted. This is especially cheap in generated code: you don't need to apply the formatting yourself, the generator does it for you every time! :)
As a secondary option in case you're really that lazy, how about piping the code through a beautifier utility of your choice before writing it to disk to ensure at least some level of consistency. Nevertheless, almost all good programmers I know format their code rather pedantically and there's a good reason for it: there's no write-only code.
Absolutely yes, for tons of good reasons already given above. And one more: if your code needs to be checked by an assessor (for safety and dependability issues), it is much better if the code is human readable. If not, the assessor will refuse to assess it and your project will be rejected by the authorities. The only solution is then to assess... the code generator (which is usually much more difficult ;))
It depends on whether the code will only be read by a compiler or also by a human. In addition, it matters whether the code is supposed to be super-fast or whether readability is important. When in doubt, put in the extra effort to generate readable code.
I think the answer is: it depends.
*It depends upon whether you need to configure and store the generated code as an artefact. For example, people very rarely keep or configure the object code output from a C compiler, because they know they can reproduce it from the source every time. I think there may be a similar analogy here.
*It depends upon whether you need to certify the code to some standard, e.g. MISRA-C or DO-178.
*It depends upon whether the source will be generated via your tool every time the code is compiled, or whether it will be stored for inclusion in a build at a later time.
Personally, if all you want to do is build the code, compile it into an executable and then throw the intermediate code away, then I can't see any point in making it too pretty.