How would I code a complex formula parser manually? - language-agnostic

Hm, this is language - agnostic, I would prefer doing it in C# or F#, but I'm more interested this time in the question "how would that work anyway".
What I want to accomplish ist:
a) I want to LEARN it - it's about my ego this time, it's for a fun project where I want to show myself that I'm a really good at this stuff
b) I know a tiny little bit about EBNF (although I don't know yet, how operator precedence works in EBNF - Irony.NET does it right, I checked the examples, but this is a bit ominous to me)
c) My parser should be able to take this: 5 * (3 + (2 - 9 * (5 / 7)) + 9) for example and give me the right results
d) To be quite frankly, this seems to be the biggest problem in writing a compiler or even an interpreter for me. I would have no problem generating even 64 bit assembler code (I CAN write assembler manually), but the formula parser...
e) Another thought: even simple computers (like my old Sharp 1246S with only about 2kB of RAM) can do that... it can't be THAT hard, right? And even very, very old programming languages have formula evaluation... BASIC is from 1964 and they already could calculate the kind of formula I presented as an example
f) A few ideas, a few inspirations would be really enough - I just have no clue how to do operator precedence and the parentheses - I DO, however, know that it involves an AST and that many people use a stack
So, what do you think?

You should go learn about Recursive Descent parsers.
Check out a Code Golf exercise in doing just this, 10 different ways:
Code Golf: Mathematical expression evaluator (that respects PEMDAS)
Several of these "golf" solutions are recursive descent parsers just coded in different ways.
You'll find that doing just expression parsing is by far the easiest thing in a compiler. Parsing the rest of the language is harder, but understanding how the code elements interact and how to generate good code is far more difficult.
You may also be interested in how to express a parser using BNF, and how to do something with that BNF. Here's
an example of how to parse and manipulate algebra symbolically with an explicit BNF and an implicit AST as a foundation. This isn't what compilers traditionally do, but the machinery that does is founded deeply in compiler technology.

For a stack-based parser implemented in PHP that uses Djikstra's shunting yard algorithm to convert infix to postfix notation, and with support for functions with varying number of arguments, you can look at the source for the PHPExcel calculation engine

Traditionally formula processors on computers use POSTFIX notation. They use a stack, pop 2 items as operands, pop the third item as the operator, and push the result.
What you want is an INFIX to POSTFIX notation converter which is really quite simple. Once you're in postfix processing is the simplest thing you'll ever do.

If you want to go for an existing solution I can recommend a working, PSR-0 compatible implementation of the shunting yard algorithm: https://github.com/andig/php-shunting-yard/tree/dev.

Related

How to translate from a language to another?

I don't want an automated solution.
When you have to translate a program from a language to another, what do you do? You prefer to rewrite it from the beginning or copy and paste it and change only what need to be changed?
What's the best choice?
It depends on
the goals (quick hack for one time use? long-lived production project for work?)
the resources I have (how many man-hours? Test suit and/or functional spec for old code? familiarity with both languages?)
most importantly, differences between the languages. Both the conceptual (OO? functional? reflection? control structures?) as well as available libraries.
Please note that this bullet is not as trivial as it seems - this depends in large part on how idiomatic the original program is - as an example, some people write very "C-like" Perl code (e.g. using C control flow and very C++ like OO design) which can be trivially copied to C or C++, and some people write incredibly intricate idiomatic Perl using functional programming, closures and reflective capabilties; which can't be obviously translated into C/C++.
Also, quality of the original code. E.g. good program will have separated business logic, usually expressed in standard configuration and control flow that's easier to directly clone.
E.g. translating from PHP to Perl for a hack job, you can often start out with copying, since many PHP constructs can be 1-to-1 mapped onto equivalent Perl constructs (just take your pick of Perl templating web library). The resulting code won't be GOOD Perl but will be Good Enough for some purposes.
On the other hand, translating, say, LISP code to Java, you're better off just translating the original code into a functionality specification and re-write from scratch. Your example of Python and JavaScript is probably in the same box.
Usually you have two languages that share at least some concepts (e.g. both have OO, and some imperative control structures) and thus you end up with some combination of the two approaches - parts of the code can be "thoughtlessly" translated, parts need to be re-written from scratch.
The more of the second (complete rewrite) approach, the better quality idiomatic and powerful code you end up with.
Normally, a rewrite using the original for inspiration and direction is the best choice.
The way you may do something in one language might very well not be the right way to do it in another.
When it comes to copy-paste, it is rare that you can just do that - languages are different and follow different syntax rules.
Of course, this all depends on the source and destination languages.
With your comment - javascript and python, I would say a rewrite is the best option.
That probably mostly depends on the similarity between the two languages and the meaning of "translation".
For instance, translating a bit of C89 code to C++ might not be so hard considering it should compile out of the box when you copy-paste (C++ is a compatible super-set to C89). I would hardly consider that a "translation", though.
On the other hand, translating Java to Haskell would certainly require a complete rewrite as the language paradygms, even worse than the syntax, are completely different.
Consider Wikipedia's list of programming languages. I'm too lazy to count how many of them are on this list, but let's assume there are 100.
If you want to translate one of them into another, that means that there are at least 100*99 = 9900 possible combinations for translation.
And that's an awful lot. Since most languages are unique, translation is very, very dependent on source and destination language.
Consider this Pascal to C converter. The author states it took him one and a half year to make a good translator for these particular languages. Obviously, this isn't a trivial task.
Depending on your ambitions, you might spend either one day, or many years translating a program from language A to language B.
How long this takes depends on your skill, size of your source code, complexity of languages A and B and their similarity.
As you can see, this isn't a trivial task and is highly dependent on your situation.

Convert chinese characters to hanyu pinyin

How to convert from chinese characters to hanyu pinyin?
E.g.
你 --> Nǐ
马 --> Mǎ
More Info:
Either accents or numerical forms of hanyu pinyin are acceptable, the numerical form being my preference.
A Java library is preferred, however, a library in another language that can be put in a wrapper is also OK.
I would like anyone who has personally used such a library before to recommend or comment on it, in terms of its quality/ reliabilitty.
The problem of converting hanzi to pinyin is a fairly difficult one. There are many hanzi characters which have multiple pinyin representations, depending on context. Compare 长大 (pinyin: zhang da) to 长城 (pinyin: chang cheng). For this reason, single-character conversion is often actually useless, unless you have a system that outputs multiple possibilities. There is also the issue of word segmentation, which can affect the pinyin representation as well. Though perhaps you already knew this, I thought it was important to say this.
That said, the Adso Package contains both a segmenter and a probabilistic pinyin annotator, based on the excellent Adso library. It takes a while to get used to though, and may be much larger than you are looking for (I have found in the past that it was a bit too bulky for my needs). Additionally, there doesn't appear to be a public API anywhere, and its C++ ...
For a recent project, because I was working with place names, I simply used the Google Translate API (specifically, the unofficial java port, which, for common nouns at least, usually does a good job of translating to pinyin. The problem is commonly-used alternative transliteration systems, such as "HongKong" for what should be "XiangGang". Given all of this, Google Translate is pretty limited, but it offers a start. I hadn't heard of pinyin4j before, but after playing with it just now, I have found that it is less than optimal--while it outputs a list of potential candidate pinyin romanizations it makes no attempt to statistically determine their likelihood. There is a method to return a single representation, but it will soon be phased out, as it currently only returns the first romanization, not the most likely. Where the program seems to do well is with conversion between romanizations and general configurability.
In short then, the answer may be either any one of these, depending on what you need. Idiosyncratic proper nouns? Google Translate. In need of statistics? Adso. Willing to accept candidate lists without context information? Pinyin4j.
In Python try
from cjklib.characterlookup import CharacterLookup
cjk = CharacterLookup('C')
cjk.getReadingForCharacter(u'北', 'Pinyin')
You would get
['běi', 'bèi']
Disclaimer: I'm the author of that library.
For Java, I'd try the pinyin4j library
As mentioned in other answers the conversion is fuzzy and even google translate apparently gets a certain percentage of character combinations wrong.
A reasonable result which will not be 100% accurate can be achieved with open-source libraries available for some programming languages.
The simplest code to do the conversion with python with the pypinyin library (to install it use pip3 install pypinyin):
from pypinyin import pinyin
def to_pinyin(chin):
return ' '.join([seg[0] for seg in pinyin(chin)])
print(to_pinyin('好久不见'))
# OUTPUT: hǎo jiǔ bú jiàn
NOTE: The pinyin method from the module returns a list of possible candidate segments, and the to_pinyin method takes the first variant whenever more than one conversion is available. For tricky corner cases this is likely to produce incorrect results, but generally you'll probably get at least a ~90..95% success rate.
There are a few other python libraries for pinyin conversion but in my tests they proved to have a higher error rate than pypinyin. Also, they don't appear to be actively maintained.
If you need better accuracy then you'll need a more complex approach that will rely on bigger datasets and possibly some machine learning.

Programming Concepts That Were "Automated" By Modern Languages [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
Weird question but here it is. What are the programming concepts that were "automated" by modern languages? What I mean are the concepts you had to manually do before. Here is an example: I have just read that in C, you manually do garbage collection; with "modern" languages however, the compiler/interpreter/language itself takes care of it. Do you know of any other, or there aren't any more?
Optimizations.
For the longest time, it was necessary to do this by hand. Now most compilers can do it infinitely better than any human ever could.
Note: This is not to say that hand-optimizations aren't still done, as pointed out in the comments. I'm merely saying that a number of optimizations that used to be done by hand are automatic now.
I think writing machine code deserves a mention.
Data collections
Hash tables, linked lists, resizable arrays, etc
All these had to be done by hand before.
Iteration over a collection:
foreach (string item in items)
{
// Do item
}
Database access, look at the ActiveRecord pattern in Ruby.
The evil goto.
Memory management, anybody? I know it's more efficient to allocate and deallocate your own memory explicitly, but it also leads to Buffer Overruns when it's not done correctly, and it's so time consuming - a number of modern languages will allocate and garbage collect automatically.
Event System
Before you had to implement the Observer Pattern all by yourself. Today ( at least in .NET ) you can simply use "delegates" and "events"
Line Numbers
No more BASIC, no more Card Decks!
First in list, extension method. Facilitates fluent programming
Exceptions, compartmentalization of what is the program trying to do (try block) and what it will do if something fail (catch block), makes for saner programming. Whereas before, you should be always in alert to intersperse error handling in between your program statements
Properties, makes the language very component-centric, very modern. But sadly that would make Java not modern.
Lambda, functions that captures variables. whereas before, we only have function pointer. This also precludes the need for nested function(Pascal has nested function)
Convenient looping on collection, i.e. foreach, whereas before you have to make a design pattern for obj->MoveNext, obj->Eof
goto-less programming using modern construct like break, continue, return. Whereas before, I remember in Turbo Pascal 5, there's no break continue, VB6 has Exit Do/For(analogous to break), but it doesn't have continue
C#-wise, I love the differentiation of out and ref, so the compiler can catch programming errors earlier
Reflection and attributes, making programs/components able to discover each other functionalities, and invoke them during runtime. Whereas in C language before (I don't know in modern C, been a long time not using C), this things are inconceivable
Remote method invocations, like WebServices, Remoting, WCF, RMI. Gone are the days of low-level TCP/IP plumbing for communication between computers
Declarations
In single-pass-compiler languages like C and C++, declarations had to precede usage of a function. More modern languages use multi-pass compilers and don't need declarations anymore. Unfortunately, C and C++ were defined in such a poor way that they can't deprecate declarations now that multi-pass compilers are feasible.
Pattern matching and match expressions
In modern languages you can use pattern matching which is more powerful than a switch statement, imbricated if statements of ternary operations:
E.g. this F# expression returns a string depending the value of myList:
match myList with
| [] -> "empty list"
| 2::3::tl -> "list starts with elements 2 and 3"
| _ -> "other kind of list"
in C# you would have to write such equivalent expression that is less readable/maintanable:
(myList.Count == 0) ? "empty list" :
(myList[0] == 2 && myList[1] == 3 ? "list starts with elements 2 and 3" :
"other kind of list")
If you go back to assembly you could find many more, like the concept of classes, which you could mimic to a certain extent, were quite difficult to achieve... or even having multiple statements in a single line, like saying "int c = 5 + 10 / 20;" is actually many different "instructions" put into a single line.
Pretty much anything you can think of beyond simple assembly (concepts such as scope, inheritance & polymorphism, overloading, "operators" anything beyond your basic assembly are things that have been automated by modern languages, compilers and interpreters.)
Some languages support Dynamic Typing, like Python! That is one of the best things ever (for certain fields of applications).
Functions.
It used to be that you needed to manually put all the arguments to stack, jump to piece of code where you defined your function, then manually take care of its return values. Now you just write a function!
Programming itself
With some modern IDE (e.g. Xcode/Interface Builder) a text editor or an address book is just a couple of clicks away.
Also built-in functions for sorting(such as bubble sort,quick sort,....).
Especially in Python 'containers' are a powerful data structures that in also in other high level and modern programming languages require a couple of lines to implement.You can find many examples of this kind in Python description.
Multithreading
Native support for multithreading in the language (like in java) makes it more "natural" than adding it as an external lib (e.g. posix thread layer in C). It helps in understanding the concept because there are many examples, documentation, and tools available.
Good String Types make you have to worry less about messing up your string code,
Most Languages other then c and occasionally c++ (depending on how c like they are being) have much nicer strings then c style arrays of char with a '\0' at the end (easier to work with a lot less worry about buffer overflows,ect.). C strings suck.
I probably have not worked with c strings enough to give such a harsh (ok not that harsh but I'd like to be harsher) but after reading this (especially the parts about saner seeming pascal string arrays which used the zeroth element to mark the length of the string), and a bunch of flamewars over which strcpy/strcat is better to use (the older strn* first security enhancement efforts, the openbsd strl*, or the microsoft str*_s) I just have come to really dislike them.
Type inference
In languages like F# or Haskell, the compiler infers types, making programming task much easier:
Java: float length = ComputeLength(new Vector3f(x,y,z));
F#: let length = ComputeLength(new Vector3f(x,y,z))
Both program are equivalent and statically type. But F# compiler knows for instance that ComputeLength function returns a float so it automagically deduces the type of length, etc..
A whole bunch of the Design Patterns. Many of the design patterns, such as Observer (as KroaX mentioned), Command, Strategy, etc. exist to model first-class functions, and so many more languages have begun to support them natively. Some of the patterns related to other concepts, such as Lock and Iterator, have also been built into languages.
dynamic library
dynamic libraries automatically share common code between processes, saving RAM and speed up starting time.
plugins
a very clean and efficient way to extend functionality.
Data Binding. Sure cuts down on wiring to manually shuffle data in and out of UI elements.
OS shell Scripting, bash/sh/or even worse batch scripts can to a large extent be replaced with python/perl/ruby(especially for long scripts, less so at least currently for some of the core os stuff).
You can have most of the same ability throw out a few lines of script to glue stuff together while still working in a "real" programming language.
Context Switching.
Most modern programming languages use the native threading model instead of cooperative threads. Cooperative threads had to actively yield control to let the next thread work, with native threads this is handled by the operating system.
As Example (pseudo code):
volatile bool run = true;
void thread1()
{
while(run)
{
doHeavyWork();
yield();
}
}
void thread2()
{
run = false;
}
On a cooperative system thread2 would never run without the yield() in the while loop.
Variable Typing
Ruby, Python and AS will let you declare variables without a type if that's what you want. Let me worry about whether this particular object's implementation of Print() is the right one, is what I say.
How about built-in debuggers?
How many here remember "The good old days" when he had to add print-lines all over the program to figure out what was happening?
Stupidity
That's a thing I've gotten lot of help from modern programming languages. Some programming languages are a mess to start with so you don't need to spend effort to shuffle things around without reason. Good modern libraries enforce stupidity through forcing the programmer inside their framework and writing redundant code. Java especially helps me enormously in stupidity by forcing me inside OOPS box.
Auto Type Conversion.
This is something that I don`t even realize that language is doing to me, except when I got errors for wrong type conversion.
So, in modern languages you can:
Dim Test as integer = "12"
and everthing should work fine...
Try to do something like that in a C compiler for embedded programming for example. You have to manually convert all type conversions!!! That is a lot of work.

What general purpose language should I learn next?

I'm currently participating in a programming contest (http://contest.github.com), which has as goal, to create a recommendation engine. I started coding in ruby, but soon realised it wasn't fast enough for the algorithms I had in mind. So I switched to C, which is the only non-scripting language I know. It was fast, of course, but I cringed every time I had to write a for loop, to go through the elements of an array (which was very often).
That's when it dawned: I wish I knew a fast, yet high-level language, to program all these intensive computations with ease!
So I looked at my options, but there are a lot of options these days! Here the best candidates I've found over the months, with something which bothers me about each of them (that hopefully you can clear up):
Clojure: I'm not sure I want to get into the whole lisp thing, I like my syntax and cruft. I could be convinced, though.
Haskell: Too academic? I don't really care for pure functional, I just want something which works. But it has nice syntax, and I don't mind static typing.
Scala: Weird language. I tried it out but it feels messy/inconsistent to me.
OCaml: Also wondering if this is too academic? The poor concurrency support also bothers me.
Arc: Paul Graham's lisp, too obscure, and again, I'm not sure I want to learn a lisp. But I trust this man!
Any advice? I really like the functional languages, for their ability to manipulate lists with ease, but I'm open to other options too. I'd like something about as fast as Java..
The kind of things I want to be able to do with lists are like (ruby):
([1, 2, 3, 4] - [2, 3]).map {|i| i * 2 } # which results in [2, 8]
I would also prefer an open-source language.
Thanks
Out of the languages that you've listed, neither Haskell nor Arc match your "fast" requirement - both are slower than Java. Your idea that Haskell is faster than Java and approaches C is most likely coming from one well-known flawed test that tried to measure performance by implementing sort. One thing that they've missed is that Haskell is lazy, and thus you need to use the results of the sort for it to actually perform that; and they measured performance simply by remembering current time, "calling" the sort function, and checking the time delta. C version of the test faithfully performed the sort, Haskell version simply returned a thunk for lazy evaluation which was never called.
In practice, there are a number of reasons why Haskell cannot be that fast even in theory. First, because of pervasive lazy evaluation, it often cannot pass around raw values, and has to generate thunks for expressions - the optimizer can trim down on those in trivial cases, but not for more complicated ones. Second, polymorphic Haskell functions are implemented as runtime-polymorphic, and not like C++ templates where every new type parameter instantiates a new version of code that is optimally compiled. Obviously, this necessitates extra boxing/unboxing. In the end, Haskell will struggle to beat any decent VM (such as HotSpot JVM, or CLR in .NET 2.0+), much less C/C++.
Now that's settled in, let's move on to the rest. Scala uses JVM as a backend, and thus is not going to be any faster than Java - and if you use higher-level abstractions, it will most likely be slower somewhat, but probably in the same ballpark. Clojure also runs on JVM, but it's also dynamically typed, and that carries an unavoidable performance penalty (I heard it does clever tricks to mitigate that to some extent, but some of it really is unavoidable no matter what).
That leaves OCaml, and out of your list, it is the only language that had actually been conclusively shown to reach the performance of C/C++ compilers on valid tests. It should be noted however that this would not be typical of idiomatic OCaml code - for example, its polymorphism is also runtime, similar to Haskell, and that carries the appropriate penalty; also, its OOP system is structural rather than nominal, which precludes an optimal vtable-based implementation; so that is going to be slower than C++, too (I'd expect perf penalty close to that of Objective-C dispatch compared to C++ dispatch, but I don't have any numbers to back that up). So you can beat C++ in OCaml if you steer away from certain language features, but unfortunately, it's those features that make OCaml so attractive in the first place.
My advice would be this: if you really need speed, go with C++. It can be fairly high-level if you use high-level libraries such as STL and Boost. It doesn't have some high-level language abstractions you might be used to, but libraries can compensate for that - sometimes fully, sometimes in part. For example, you don't have to write a for-loop to iterate over an array - you can use std::for_each, std::copy_if, std::transform, std::accumulate and similar algorithms (which are mostly analogous to map, filter, fold, and similar traditional FP primitives), and also Boost.Lambda to cut down on boilerplace.
Why not simple Java or C#? Should be faster then Ruby, more high level then C and have a huge userbase.
Your criticism of pretty much everything seems to be that it's "weird" or "too academic." But what does that mean? It's the sort of vague criticism that you can throw at any unfamiliar language that isn't totally mainstream (i.e., not C, C++, Objective-C, Java, Ruby, Python or PHP). There's nothing about all those languages that's inherently good for academia and bad for anything else. Try to break down your analysis a little further: Specifically, what is it that troubles you about those languages? You might find that your brain is just instinctively pushing away something unfamiliar. If that's the case, learning one of those languages might be a good way to expand your mind.
Alternatively: It sounds like you're looking for a functional language, so you might look at F#. It's a first-class CLR language created by Microsoft, so it doesn't carry any "academic" mental baggage, and it's very similar to OCaml.
newLISP is fast, small, integrates extremely easily with C, and it has quite a few statistical functions built-in.
Haskell is my current preference as a performant, high-level language. I've also heard very good things about OCaml, but haven't personally used it much.
Scala and Clojure will have similar performance to Java -- slow, slow, slow! Sure, they'll be faster than Ruby, but what isn't?
Arc is a set of macros for MzScheme, and is not particularly fast. If you want a performant LISP, try Common LISP -- it can be compiled to machine code.
How about Delphi / FreePascal? They're native code & fast. I do a lot of real-time graphics & processing with them. They dont require that you work 'low level', but you can if you need to. Plus you can embed assembler if needed for extra performance. FreePascal is cross platform if you want to stay off Windows.
D might fit the bill? Compiles to machine code but allows for programming using higher-level concepts.
Python can be made to run fast, especially using the NumPy package. Relevant links below:
http://www.scipy.org/PerformancePython
Cython and numpy speed
You seem uncomfortable with any language that doesn't look like one you already use. That's going to limit you, so I'd suggest one you won't be comfortable with if you're interested in expanding your horizons. I'm not saying you'll want to continue with any particular language (I have a definite preference never to touch Tcl again), but you should try it sometime.
There are nice fast implementations of Common Lisp, and that's an easy language to write functional programs in. Besides, if you can get along with it, you'll find a lot of neat things you can do with it.
Computation? Fortran. Beats the pants off of anything else.
If you don't mind .NET...
F# - based on O'Caml, multiparadigm language with full access to .NET Framework. Included officially in .NET FW 4.0
Nemerle - see F# and add to that a POWERFUL metaprogramming capabilities.
After your update:
If you want to manipulate lists easily you should go with Common Lisp. It is only 2 times slower that C in average (and actually faster in some things), it is great for list processing and it is multi-paradigm (imperative, functional and OO) - so you don't have to stick to functional-only programming. SBCL is a good Common Lisp to try first, IMO.
And don't get bothered by strange "lispy" things like parentheses. It is not only quite stupid to judge language by its syntax, rather than semantics, but also parentheses are one of the greatest strengths of LISP, because they eliminate differences between data and expressions and you can manipulate language itself to make it fit your needs.
Don't listen to people who advice C++/C#/Java. Java functional part is non-existant. C++ functional part is terrible. C# delegates makes me sick because of their complexity. They are not REAL multi-paradigm imperative/functional languages, they are imperative/OO languages that have some small functional bits, you can't do real functional programming in them.
C++ or alternatively C# and mono.
Honestly, to accomplish much in the world of software engineering, you will likely have to wrap your head around these languages you find distasteful. Java, C, C++, C#, etc. are likely to come up in a career that involves programming.
Looks like you've done some interesting work. I encourage you to push your technical skills harder. It will be worth the effort.
Alternatively, Python might be good, given your interests. You might find Smalltalk interesting, or even ATS.
For some ideas, look at the Language Shootout and analysis by Oscar Boykin. You have already discovered this, but comparing Ruby to C we see that Ruby is between 14 and 600 times slower (several tests are more than 100 times slower). He also points out that Python is faster than Ruby. The benchmarks for all languages is interesting.
Also interesting are benchmarks from Dan Corlan.
You might consider python; it supports writing modules in C or C++, so you can get it working in a high-level language, profile it, rework the algorithms, and if it still isn't fast enough, translate the hotspots to C or C++ for speed.
Consider Tcl, combined with C. Do the really compute-intensive stuff in C since that's what you know how to do, then use Tcl as the glue to combine the high level code with your C-based code.
I make this recommendation not because Tcl is necessarily the best language for the job (there really is no "best" for something like this) but because you'll learn a lot about the concept of combining the strengths of two different languages. It's an important technique that could serve you well in your career whether it's Tcl/C, Lua/C, Groovy/Java, Python/C, etc.
Python with pyrex or psyco may be a better fit? Probably not as fast as C, but you can see some significant speedups from regular Python.
If you want something that's "about as fast as Java," the obvious solution is JRuby.
If you install Netbeans (use the download button under the Ruby column), JRuby is the default interpreter. It doesn't get much easier!
If your problem is C's clunky loops, I'd suggest looking at Ada. It allows you to loop through a whole array with a simple statement like so:
for I in array_name'range loop
--'// Code goes here
end loop;
For AI projects, I'd also suggest you look into using Clips, which is a freely-available inference engine.
Rather than OCAML, you might consider F# -- it's source compatible with OCAML (or you can use a lighter weight syntax) and it supports actor-style concurrency with what it calls asynchronous workflows (which are really an almost-monad for applying asynchronous execution).
Not that -- as Scala shows -- you need to have actor style concurrency baked into the language, if you build it into a library. The rest is just syntactic sugar.
Learn C++ and familiarize yourself with its standard library. It won't be that hard to learn as you already 'speak' C, but keep in mind that C++ is not just a better C, it's another language with its own concepts and methods.
Why not Erlang?
It's not too much like the languages you already know, so you can learn new concepts
It has some interesting capabilities for multiprocessing
It's not out of academia. Erlang was a commercial language first.
There are at least two significant open source applications written in it: CouchDB and Wings3d
I believe going through C C++ and Java or .net then moving on from here to any one way java or .net , because c is more machine oriented and C++ and java will give you hands on with object oriented learning, then later on switching to python (to really appreciate the clean code than in C C++ and JAVA ).

Why don't I see pipe operators in most high-level languages?

In Unix shell programming the pipe operator is an extremely powerful tool. With a small set of core utilities, a systems language (like C) and a scripting language (like Python) you can construct extremely compact and powerful shell scripts, that are automatically parallelized by the operating system.
Obviously this is a very powerful programming paradigm, but I haven't seen pipes as first class abstractions in any language other than a shell script. The code needed to replicate the functionality of scripts using pipes seems to always be quite complex.
So my question is why don't I see something similar to Unix pipes in modern high-level languages like C#, Java, etc.? Are there languages (other than shell scripts) which do support first class pipes? Isn't it a convenient and safe way to express concurrent algorithms?
Just in case someone brings it up, I looked at the F# pipe-forward operator (forward pipe operator), and it looks more like a function application operator. It applies a function to data, rather than connecting two streams together, as far as I can tell, but I am open to corrections.
Postscript: While doing some research on implementing coroutines, I realize that there are certain parallels. In a blog post Martin Wolf describes a similar problem to mine but in terms of coroutines instead of pipes.
Haha! Thanks to my Google-fu, I have found an SO answer that may interest you. Basically, the answer is going against the "don't overload operators unless you really have to" argument by overloading the bitwise-OR operator to provide shell-like piping, resulting in Python code like this:
for i in xrange(2,100) | sieve(2) | sieve(3) | sieve(5) | sieve(7):
print i
What it does, conceptually, is pipe the list of numbers from 2 to 99 (xrange(2, 100)) through a sieve function that removes multiples of a given number (first 2, then 3, then 5, then 7). This is the start of a prime-number generator, though generating prime numbers this way is a rather bad idea. But we can do more:
for i in xrange(2,100) | strify() | startswith(5):
print i
This generates the range, then converts all of them from numbers to strings, and then filters out anything that doesn't start with 5.
The post shows a basic parent class that allows you to overload two methods, map and filter, to describe the behavior of your pipe. So strify() uses the map method to convert everything to a string, while sieve() uses the filter method to weed out things that aren't multiples of the number.
It's quite clever, though perhaps that means it's not very Pythonic, but it demonstrates what you are after and a technique to get it that can probably be applied easily to other languages.
You can do pipelining type parallelism quite easily in Erlang. Below is a shameless copy/paste from my blogpost of Jan 2008.
Also, Glasgow Parallel Haskell allows for parallel function composition, which amounts to the same thing, giving you implicit parallelisation.
You already think in terms of
pipelines - how about "gzcat
foo.tar.gz | tar xf -"? You may not
have known it, but the shell is
running the unzip and untar in
parallel - the stdin read in tar just
blocks until data is sent to stdout by
gzcat.
Well a lot of tasks can be expressed
in terms of pipelines, and if you can
do that then getting some level of
parallelisation is simple with David
King's helper code (even across erlang
nodes, ie. machines):
pipeline:run([pipeline:generator(BigList),
{filter,fun some_filter/1},
{map,fun_some_map/1},
{generic,fun some_complex_function/2},
fun some_more_complicated_function/1,
fun pipeline:collect/1]).
So basically what he's doing here is
making a list of the steps - each step
being implemented in a fun that
accepts as input whatever the previous
step outputs (the funs can even be
defined inline of course). Go check
out David's blog entry for the
code and more detailed explanation.
magrittr package provides something similar to F#'s pipe-forward operator in R:
rnorm(100) %>% abs %>% mean
Combined with dplyr package, it brings a neat data manipulation tool:
iris %>%
filter(Species == "virginica") %>%
select(-Species) %>%
colMeans
You can find something like pipes in C# and Java, for example, where you take a connection stream and put it inside the constructor of another connection stream.
So, you have in Java:
new BufferedReader(new InputStreamReader(System.in));
You may want to look up chaining input streams or output streams.
Thanks to all of the great answers and comments, here is a summary of what I learned:
It turns out that there is an entire paradigm related to what I am interested in called Flow-based programming. A good example of a language designed specially for flow-based programming is Hartmann pipelines. Hartamnn pipelines generalize the idea of streams and pipes used in Unix and other OS's, to allows for multiple input and output streams (rather than just a single input stream, and two output streams). Erlang contains powerful abstractions that make it easy to express concurrent processes in a manner which resembles pipes. Java provides PipedInputStream and PipedOutputStream which can be used with threads to achieve the same kind of abstractions in a more verbose manner.
I think the most fundamental reason is because C# and Java tend to be used to build more monolithic systems. Culturally, it's just not common to even want to do pipe-like things -- you just make your application implement the necessary functionality. The notion of building a multitude of simple tools and then gluing them together in arbitrary ways just isn't common in those contexts.
If you look at some of the scripting languages, like Python and Ruby, there are some pretty good tools for doing pipe-like things from within those scripts. Check out the Python subprocess module, for example, which allows you to do things like:
proc = subprocess.Popen('cat -',
shell=True,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,)
stdout_value = proc.communicate('through stdin to stdout')[0]
print '\tpass through:', stdout_value
Are you looking at the F# |> operator? I think you actually want the >> operator.
Usually you just don't need it and programs run faster without it.
Basically piping is consumer/producer pattern. And it's not that hard to write those consumers and producers because they don't share much data.
Piping for Python : pypes
Mozart-OZ can do pipes using ports and threads.
Objective-C has the NSPipe class. I use it quite frequently.
I've had a lot of fun building pipeline functions in Python. I have a library I wrote, I put the contents and a sample run here. The best fit me for has been XML processing, described in this Wikipedia article.
You can do pipe like operations in Java by chaining/filtering/transforming iterators.
You can use Google's Guava Iterators.
I will say even with the very helpful guava library and static imports its still ends up being lots of Java code.
In Scala its quite easy to make your own pipe operator.
Streaming libraries based on coroutines have existed in Haskell for quite some time now. Two popular examples are conduit and pipes.
Both libraries are well-written and well-documented, and are relatively mature. The Yesod web framework is based on conduit, and it's pretty damn fast. Yesod is competitive with Node on performance, even beating it in a few places.
Interestingly, all of the these libraries are single-threaded by default. This is because the single motivating use case for pipelines is servers, which are I/O bound.
Since R added pipe operator today, it's worth to mention Julialang has pipe all a long:
help?> |>
search: |>
|>(x, f)
Applies a function to the preceding argument. This allows for easy function chaining.
Examples
≡≡≡≡≡≡≡≡≡≡
julia> [1:5;] |> x->x.^2 |> sum |> inv
0.01818181818181818
if you're still interested in an answer...
you can look at factor, or the older joy and forth for the concatenative paradigm.
in arguments and out arguments are implicit, dumped to a stack. then the next word (function) takes that data and does something with it.
the syntax is postfix.
"123" print
where print takes one argument, whatever is in the stack.
You can use my library in python: github.com/sspipe/sspipe
In Mathematica, you can use //
for example
f[g[h[x,parm1],parm2]]
quite a mess.
could be written as
x // h[#, parm1]& // g[#, parm2]& // f
the # and & is lambda in Mathematica
In js, there seems to be pipe operator |> soon.
https://github.com/tc39/proposal-pipeline-operator