Related
What is the key difference between natural languages (such as English and French) and programming languages like C++ and Perl?
I am familiar with the ambiguity problem, but can't it be solved using an interactive compiler or using a subset of the natural language using a strict grammar but all the time still retaining the essence of the language?
Another issue is context. But lawyers have ways to solve this issue. (This question is not about reducing the programming complexity, it's simply about concise reasons and roadblock in using natural languages for instructing computer.)
Is there any other significant problem besides these two? Or do these two have greater consequences than I mentioned above? Is the interactive solution and lawyers language technically not feasible for programming?
This is an extremely interesting question and in short, yes, there are some very good reasons why we don't use English to write programs.
It's been said before that the greatest gift that computer science has given us is not the ability to talk to computers but now that formal languages exist for describing algorithms we now have even better tools for communicating these ideas to other people. Even if a computer is not involved. Indeed the best software engineers see their jobs primarily as writing software that is readable to other people so as to make maintenance and addition of new features as easy as possible. This is not possible in a language as big and as free form as any natural, spoken language.
Ambiguity
One reason is that of ambiguity. Have you ever looked a menu in a restaurant and seen that with your burger you can get "Coleslaw and fries or salad"? What does this mean? Can I get both coleslaw and fries or the other option is a salad alone? Or do I always get coleslaw and I have to chose between the fries or a salad? English is full of these things.
I used to teach a class on this and an example I liked to use to explain ambiguity was as follows. I asked the students to write a one paragraph story ending with the sentence "Tom asked Chris if he could help him". About half the time the stories written indicated that the student interpreted the sentence as Tom asking for assistance from Chris. The other half of the time people thought Tom was offering to lend Chris a hand.
If you think about it, there are a lot of people who do write programs in English. They're called product managers and the compiler they use is software engineers. The problem here is that a software engineer has to inject a lot of his own understanding of the problem to understand what the description really means. And trust me, there is a lot of back and forth. Even on very simple business requirements I must clarify ambiguities.
Context
I would not agree that lawyers have ways to solve the context problem. We continually have ongoing arguments in courts among, in some cases, some of the most educated people in the country, about the meanings of various laws. Sometimes this involves arguing about the context in which the laws were written long ago. Sometimes in involves applying it to a new, previously non-existent context like the Internet. The fact that we have thousands of lawyers working on disambiguating these issues is proof that it cannot be handled by a simple computer program like a compiler. It's just too hard of a problem.
Conciseness
Another issue is just the ability to be concise. Mathematics long ago invented notations for many different concepts in the maths because it's just easier to read if there's a special syntax that is concise and has a well defined meaning. A mathematician knows what it means when I say "f(x) = 3x+1". It means the same thing as "There is a function called f and it has one argument. The value of applying f to a number is the number that is one more than three times the number given." But the former is a lot easier to read once you've learned the syntax. The same is true for programming languages. Programming languages are specialized to describe computations.
Implementation
Creators of programming languages deliberately create very small languages. These are, in fact, subsets of English also with some extra syntax. The idea of understanding all of English in all of its free form ways and, worse yet, increasing vocabulary is a job for Natural Language Processing (NPL). A very hard job. If you want to be able to assign unambiguous meaning to a program and have the program's behavior never change, you need a well defined syntax and semantics.
The take-away point here is that English is a very big, very flexible language with no formal specification. Programming languages need to have a well-defined syntax and semantics in order for algorithms to have unambiguous and unchanging meaning. Someone could, in fact, write a formal syntax for a subset of English and give it unambiguous meaning. But this would be a huge, huge job.
Its been done
Check out BabelBuster. The idea here was to take C and convert it to and from a very small subset of very rigorous English such that one could write a program in C and then convert it to English. During the DeCSS DVD decryption arguments, the MPAA was trying to get programs that could decrypt their DVDs declared illegal. BabelBuster fought back with a very interesting idea. Create a way to convert English, which is protected under the freedom of speech into working code in C and thus make the point that C code is also just a language which should be protected as such. Therefor one should be able to publish code that cracks DeCSS. It's an interesting piece of work relevant to your question regardless of which side you're on.
The problem with BabelBuster is that you need to write your program in a very, very limited subset of English. But it is possible to do this.
Conclusion
English, like all natural languages, allows us to describe a computation or algorithm but the language is verbose, offers many ways to say the same thing, is dependent on the context of the speaker, and not formally specified. If your goal is to describe computations, you should take English, chose a minimal workable subset in which you can say everything you need. Formally specify what each word in this subset will mean. Then create a few special notations to make it concise to say the things you say just like mathematics did. If you do this, you do this you'll end up with a typical programming language, or something like it.
There are three principal reasons.
First, as Gabe says - people have figured out through trial and error that programming in things that are close to English sentences only forces programmers to type more useless cruft. (And yes, COBOL was explicitly designed to read more "naturally".)
To a programmer,
windows++
is more readable than
You should now increment the number of windows by one.
For example, Tetris is a rather easy game to code. I would be terribly surprised if you managed to make an English explanation that is detailed enough for a computer (remember, computers are dumb, so you have to spell it all out) in less pages than a short novel.
The second reason is that the range of things a computer knows how to do is rather small, so the number of language constructs that are needed for that is also limited. In contrast, natural languages need to be able to express the entirety of human experience, which does require many language constructs to pull off. For example, "According to his wife, John would have caught the fish yesterday if it hadn't rained" is not expressible in C - and does not need to be.
And third is, indeed, ambiguity, as you yourself note. There are a lot of places where a software error is simply not permissible. People do enough bugs in unambiguous languages; allowing ambiguity would be a disaster waiting to happen. And on the same subject, we are still unable to parse human language sufficiently well - state of the art parsers still have unacceptably high error rates.
It is possible to automatically translate structured English into code, as long as a restricted subset of the English language is used.
As a proof-of-concept, I have developed an programming language called EngScript, which translates English sentences sentences into Python source code.
Arithmetic operations can be written in plain English:
#print{3 to the power of 2}
#print{3 raised to the power of 2}
#Both of these statements print "9".
print{3 plus (the sum of 1 and 2)}
#This prints "5".
Variables can be initialized in plain English, too:
let x be (x plus 1)
if (x is not equal to 7) :
print x
I was wondering if there are some nice overviews of the majority language features existing in the current programming languages?
I am asking this because when looking at the programming features of some programming languages, such as D Language Feature Comparison Table, I feel like to understand how these features are related to and different from each other, in somehow organized and systematic way instead of a list, and also possibly with examples from specific languages but independent of a specific language.
Thanks for your reply and any nice reference (book or link) are also appreciated.
Added: I am looking for some short and concise overview, maybe just a reply is enough, not necessarily a monograph (but don't mind if you recommend it here). Details can be pursued if I feel necessary.
Does Comparison of programming languages or Comparison of programming languages (syntax) help? Both are from Wikipedia.
I greatly enjoyed Douglas Crockford's recent lecture series, particularly the talk which covered the history of programming languages. I'd like to learn about this subject in more detail.
Consider this question language agnostic. I'm not interested in books that teach programming. I'm interested in books which discuss decisions made during the design of one or more languages.
Following three are IMO the must-read books for any programming langauges junky :)
Project Oberon by Niklaus Wirth
Language Implementation Patterns by Terence Parr
Programming Language Pragmatics by Michael Scott
Every 15 years, the ACM puts on a History of Programming Languages conference (affectionately known as HoPL). The proceedings are of exceptionally high quality, and are available, unfortunately only behind the ACM paywall. (However, if you access them from a university, college or school IP address, you should be able to access them.)
For HoPL-III (2007), Guido van Rossum wanted to submit a paper about Python, but he wasn't able to meet the review requirements in time, so he published it in form of a blog instead.
Several presenters also published their papers for free, in addition to the official conference proceedings. Also, several presenters gave the same talk again, at a different venue. For example, Guy L. Steele, Jr. and Richard P. "Dick" Gabriel repeated their "50 in 50" talk (which, as you can imagine if you've ever seen a talk by Guy Steele or Dick Gabriel, is not really a talk, more like multimedia performance art crossed with poetry slam meets Broadway), which presents 50 programming languages in 50 words each.
As #Missing Faktor mentioned above, not only Project Oberon, but all of Niklaus Wirth's languages are tremendously well documented: Algol-60, Algol-X, Algol-W, Pascal, Modula-2, and Oberon.
Structure and Interpretation of Computer Programs. I have a print copy, but it's now available online for free:
http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-4.html#%_toc_start
The Design and Evolution of C++
http://www2.research.att.com/~bs/dne.html
Programming Language Essentials
Rationale for the Design of the Ada Programming Language:
http://www.amazon.com/Rationale-Design-Programming-Language-Companion/dp/0521392675
Although the book discusses the original version of the language, it still makes interesting reading. For each design decision, rationale and discussion is included, both from the point view the programmer and compiler implementer.
"Architecture of Concurrent Programs", by the late Per Brinch Hansen, includes a good overview of the design and rationale for his Concurrent Pascal language, which added monitors (and other things) to his Sequential Pascal, a proper subset of Pascal.
The big thing missing from Sequential Pascal is pointers. However, given the restrictions intended to be placed on Sequential Pascal programs, everything you can do with a pointer you can also do with an array index, and in a more secure way, "secure" in the sense that it is impossible (and checked by the compiler!) to do illegal things.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
After competing in and following this year's Google Code Jam competition, I couldn't help but notice the incredible number of [successful] contestants that used C/C++ and Java. The distribution of languages used throughout the competition can be seen here.
After programming in C/C++ for several years, I recently fell in love with Python for its readable/straightforward nature. More recently, I learned functional languages like OCaml, Scheme, and even logic languages like Prolog. These languages certainly have their merits and, in my opinion, can be applied more easily than C++ and Java for certain situations. For example, Scheme's use of call/cc simplifies backtracking (a tool required to answer several problems) and Prolog's logic specification, although inefficient due to its brute-force nature, can drastically simplify (and even automatically solve) certain problems that are difficult to wrap one's brain around.
It is clear that a competition contestant should use the tools that are best suited for the challenge. Even x86 assembly is Turing complete - that doesn't justify solving problems with it. In this case, why are the contestants that use less common languages like Scheme/Lisp, Prolog, and even Python significantly less successful than contestants that use C/C++ and Java? Worded differently, why don't successful contestants use languages that, although may be less mainstream, are arguably better tools for the job?
There are several motivations for my question. Most importantly, I would like to become a better programmer - both in the practical aspect and the competition aspect. After being introduced to such beautiful paradigms like functional and logic programming, it is discouraging to see so many people discard them in favor of C/C++ and Java. It even makes me question my admiration for said paradigms, worrying that I cannot be successful as a Lisp/Scheme/Prolog programmer in a programming competition.
Great question! As someone who has dabbled in programming contests a bit myself, I may have something to say.
[Let's get the standard disclaimer out of the way: contest programming is only loosely related to "programming in the real world", and while it tests algorithmic and problem-solving skills and the ability to come up with fast bug-free working code under time pressure, it does not necessarily correlate with being able to build large software projects, write maintainable code, etc (beyond the fact that well-structured programs are easier to debug).]
Now for some answers:
C++/Java are more common than other languages in the real world as well, so you'd expect to see a higher proportion anywhere. (But it's even higher in the contest population.)
Many of these participants are students, or got into contests as students, and C++/Java are more common "first languages" that students learn. (Undergrad students these days may start with Scheme, Haskell, Python, etc., but high-schoolers (often self-taught) less often.) In fact, many of the Eastern European participants still use Pascal, and are more amazing with it than the rest of us will ever be with any language.
The school- and college-level contests usually use these languages. The International Olympiad in Informatics (IOI) allows only C, C++ and Pascal (or maybe it allows Java now; I haven't kept up), and the ACM Intercollegiate Programming Contest (ACM ICPC) allows only C, C++ and Java. TopCoder allows C++, Java, C# and VB (really :p); and recently, Python. So you could say the "contest ecosystem" has more C++/Java programmers in it. Google Code Jam and IPSC are among the few contests that allow code in any language, actually.
Now the question is, in GCJ where the contestants are free to choose a language, why wouldn't they choose Python or Scheme? The most relevant factor is that these languages are slow. Sure, for most real-world programming they are easily fast enough, but for the tight loops that are often involved in getting a program to run under the n-second limit for all test cases, these languages don't cut it for any of the algorithmically more involved problems. (A problem designed to accept O(n log n) solutions but not Θ(n2) solutions for C/C++ frequently rules out even optimal O(n log n) solutions in slower languages. Even Java used to be given a handicap at USACO; I'm not sure this is still the case.)
Another factor is the libraries: C++ and Java have better libraries for frequently useful algorithms and data structures (e.g. red-black trees, C++'s next_permutation), while Python's libraries (good enough for the real world) are less useful here, and Prolog and Scheme... I don't know about their libraries. This is a relatively minor factor, because these programmers can write their own code when necessary. :-)
General-purpose multi-paradigm languages are more useful for just getting things done within the time constraints of the contest, than languages that force a philosophy or way of doing things on you. This is why Prolog will always remain unpopular, for instance. (General philosophy: some languages are "enabling" languages that let you do anything including shooting yourself in the foot, some are "directing" that force you to do things the right way.) This is also why C++ is three times more popular than Java in the general contest participants, and much more popular among the top contestants. Since code doesn't have to be read by anyone else, it's ok and even useful to have loop macros like FOR(i,n) (less code to type, and more importantly less chance of making a bug when in a hurry). Nothing against Java, there are a few top programmers who use Java too. :-)
Finally, although many of these top programmers may have C++/Java/Pascal as their "first language", they are not good because of their language, so you don't have to despair about that. Many of these same programmers have won contests like the ICFP contest even with intentionally using crazy languages like shell scripts, m4 (used in autoconf), and assembly (the team named "You Can't Spell Awesome Without ASM").
I liked Jerry Coffin's idea of plotting contestants of the Google AI contest, so I took all of the results and plotted them (calculated mean, standard deviation, and then graphed the normal distribution curves in Excel).
With Lua and JS, got this:
Without (there were few contestants, so maybe the results are skewed):
It looks like Java participants did markedly worse than the rest, while Go, Common Lisp, and C are on the better end.
Why we all speak English and not Esperanto? Well, it just happened so. Even though English is inconsistent and bloated and Esperanto is intentionally designed as 'better tool'.
Thus, one reason is a tradition. In most schools programming is still taught in C/C++, Java, Pascal or even Basic. And participate in those contests mostly students, which choose language they know better.
Also, you can notice that most algorithmic books feature psedudocode in style of Pascal or Ada, and very very rarely - Lisp. I don't know why, perhaps also a tradition. Or perhaps it's just not so good for the algorithms.
Another reason would be speed. Although it's not a problem for Google Code Jam, in almost all contests 2x speed gap is a difference between 'Accepted' and 'Time Limit' verdicts.
In other words, if optimal algorithm in C++ runs 10 times faster than in Ruby, it may mean that sub-optimal algorithm in C++ will still be faster than a good one in Ruby. And contest authors usually don't want to allow O(n^2) submissions, if O(n*logn) can be achieved.
First, I'd question your premise [edit: or what I take to be a premise -- that contestants using C++ and Java fare about equally well]. For example, here's what languages were used for the entries that came in the first 100 places and the last 100 places in Google's recent AI contest:
Contestants using C++ and Java did not seem to be anywhere close to equally successful in that contest. Contestants using Python didn't seem to fare particularly well either, though there were considerably fewer of them, weakening any conclusion in that regard.
Second, of course, an awful lot of the explanation (as others have pointed out) is undoubtedly just the number of people who are familiar with each language. There are probably more people taking a course in Java right now than the total number of people who've ever written any Lisp, Scheme or Prolog.
Edit: I think a third possibility is simply versatility. To pick an extreme example, Prolog is very well suited to a few problems, but equally poorly suited to many others. Few people can (or at least do) learn more than one or two languages well enough to use them in a contest, so most people who are interested in such things are likely to choose languages that can work reasonably well for almost anything, rather than attempting to learn a specialized language for every problem that might be chosen.
In nearly all Google Code Jam rounds, more of the higher-performing contestants code in C++.
Below are the language stats from Google Code Jam 2012 Round 1A, 1B, and 1C (listed top to bottom).
The number of contestants in each round are 3,686, 3,281, and 3,189 respectively.
fun question, probably should be community wiki.
Look at number of finalists by countries: http://www.go-hero.net/jam/10/regions. notice number of people from East Europe and Russia. those places have very strong C++ communities, as well as Java, for number of reasons.
look at number languages in qualifiers: http://www.go-hero.net/jam/10/languages/0 and finals: http://www.go-hero.net/jam/10/languages/6. C++ starts out less than half and has 75 percent in finals. either good programmers prefer C++ or C++ makes the programmers. Probably by the time you master C++, other things become trivial.
You are free to draw your own conclusions though.
First of all, as you have pointed C++ and Java are mainstream languages. These automatically means that people who start doing programming competitions will be introduced to them first - by the way who learns Lisp as a first language:) I also participate regularly in such competitions - I use C++ to compete, although my favorite language is Java. It is just I want to practice another language apart from Java - also C++ is a little bit less verbose and runs faster which is important for programming competitions.
Now to my point - people become experts first in mainstream languages. To participate in programming competitions you must have quite a good grasp of the language you are using. You don't have time to search on the internet for stupid things - like forgot a construct. It is just that speed is an important factor there. To use Lisp in a competition, you must be fond of it. I don't think there are such many people out there. Correct me if I am wrong. And honestly the pros you have mentioned like simplifies backtracking: In whatever language backtracking is easy - declare a method and just call it again for every possible outcome. It couldn't be simpler. I haven't felt till now that the language I am using is trying to trip up my feet for programming competitions.
OMG ... People are all going through the Stats and Figures !!
Lets not forget the basics.. These are the only two languages (mostly) which are taught to people in college/schools...!
That might answer the heavy rush!
A vital reason might be that every contests don't support languages like python or prolog. Specially ACM ICPC World Finals support C/C++ and Java. And TopCoder also supports only C++, Java, C#, VB, and now Python. It is natural for the contestants that they will choose one language that is available in every contest. Another reason might be execution speed. And yes, another reason is these are the languages that most of the people learn first.
Big libraries were a selling point for Java in ACM ICPC. It's handy to be able to realize you want some random data structure or algorithm and just pull it out of the standard libraries.
Keep in mind that C++ is not only the majority among all contestants, but as the rounds progress, its percentage just keeps and keeps improving.
I'd say it is true that most of the participants are students (However, since it is an open tournament with chances to a job interview with google, then you have to consider that many who participate are graduated). But the latest rounds are only for people with ton of experience. They are not just students who just learned to code in C++ / Java.
Of course, the student argument also works against languages like LISP and OcaML or ProLog. That is languages, that are used a lot in AI areas but in the mainstream world students are the most likely to be learning and use them.
Big contests other than google's support few languages, but that still wouldn't explain why Pascal or .net are not near the level of Java (As they tend to be equally supported in the major contest events).
A lot of the best coders in these contests know a lot of languages. But they still prefer to use C++ during the rounds it must be for a bigger reason than "learned C++" first.
I would argue against the claim that languages other than C++ or Java are better tools for the job. If direct data says that the finalists are more likely to use C++ and Java it is a direct contradiction to that claim.
Google AI competition data does not actually contradict any premise regarding the code jam. It actually does show that top coders are able to use languages like Common Lisp when it is truly the better tool for the job. If we want to use this data to assume that CLISP is a great tool for AI competitions, then we should also assume that C++ is a great tool for algorithm competitions like GCJ.
Many programming languages share generic and even fairly universal features. For example, if you compared Java, VB6, .NET, PHP, Python, then you would find common functions such as control structures, numeric and string manipulation, etc.
What has been done to define these features at a meta-language (or language-agnostic) level?
UML offers a descriptive reference of software in every aspect, but the real-world focus seems to be data processes. Is UML relevant?
I'm not asking "Why we don't have a single language that replaces the current plethora." We need many different tools (at least in this eon).
I'm not asking that all languages fit a template -- assembly vs. compiled languages are different enough to make that unfeasible (and some folks call HTML a language, though I wouldn't). Any attempt would start with a properly narrow scope. In line with this, I wouldn't expect the model to cover even a small selection with full validity.
I would expect however that such a model could be used to transpose from one language to another (with limited goals -- think jist translation).
There have been many attempts at this, but none have been very successful. The earliest I'm aware of is UNCOL more than 50 years ago.
You've given a list of languages that have a lot in common because they're pretty similar -- they're all procedural languages with common roots and some OO extensions thrown in, so that's not too suprising. If you start looking at different languages like LISP, haskell, erlang, prolog, or even SQL you start seeing very different things.
What you're describing sounds like the formal semantics of programming languages. There are a variety of approaches and each will give a way to formally specify the meaning of a program in some programming language. In some cases, this specification is essentially a translation into another language such as lambda calculus, or compilation for a formally specified abstract machine such as SECD.
There is so much work here it's hard to pick a specific reference. But I hope I've given you some useful keywords to continue your search.
UML is typically used to define algorithms/code in simpler terms before moving on to real code.
To answer what I am guessing to be your question, there is already a defined set of required parts of languages while,for,if,else... Will this ever be set as a standard, or made into a base library that is used by all languages: no, this is because the different developers of languages like to do it themselves.
I think the closest you can get to this without loss of generality is a Turing machine, which is not very useful for practical purposes. But if you allow Turing machine languages to be "labeled" and reused, you could build up the concepts you need, working from low- to high-level.
I think that MOF is the universal language.
You can for example create UML diagrams from MOF via a UML metamodel. If you save this metamodel information into xmi then you can save what ever information you need and even more than in any language. XMI semantic is so rich that there is no limit to its use. If you map UML to xmi on the top of a metamodel live synchronize with MOF then this is for me the universal language.
The author of Pattern Calculus seems to propose such a universal model. I expect that it will turn out to be just as useful as previous attempts to define a universal model, that is to say, good in parts but not the last word.