From postfix to expression-tree algorithm (with function call) - function

After reviewing all related entries on Stack Overflow and googling a lot, I cannot find the algorithm I need.
Having expressions similar to this one (in postfix notation):
"This is a string" 1 2 * 4.2 ceil mid "is i" ==
I need an algorithm that transforms it into an Expression Tree.
I've found many places that explain how this transformation can be done, but they only use operators (+, -, *, &&, ||, etc.). I cannot find how to do it when functions (with zero or more arguments) are involved, like in the example above.
Just for clarity, the postfix expression above will look as follows when using infix notation:
mid( "This is a string", 1*2, ceil( 4.2 ) ) == "is i"
A general algorithm in pseudo-code or Java or JavaScript or C would be very much appreciated (please keep this in mind: from postfix to expression-tree).
Thanks a million in advance.
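One possible approach, sketched here in Python rather than the languages requested, is the usual stack algorithm extended with an arity table. It assumes the arity of every operator and function is known in advance. The table ARITY and the names Node, tokenize, postfix_to_tree and show are hypothetical and only serve the illustration. A zero-argument function simply becomes a leaf, while a function with a variable number of arguments would need an explicit argument count in the postfix stream.

```python
import re
from dataclasses import dataclass, field

# Hypothetical arity table: how many operands each operator or function consumes.
ARITY = {"*": 2, "==": 2, "ceil": 1, "mid": 3}


@dataclass
class Node:
    value: str
    children: list = field(default_factory=list)


def tokenize(text):
    # A double-quoted string is one token; everything else is split on whitespace.
    return re.findall(r'"[^"]*"|\S+', text)


def postfix_to_tree(text):
    stack = []
    for token in tokenize(text):
        n = ARITY.get(token, 0)  # tokens not in the table are operands (leaves)
        if len(stack) < n:
            raise ValueError(f"not enough operands for {token!r}")
        # The top n stack entries are the arguments, already in left-to-right order.
        args = stack[len(stack) - n:]
        del stack[len(stack) - n:]
        stack.append(Node(token, args))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


def show(node):
    # Render the tree in prefix "call" form, only to make the result inspectable.
    if not node.children:
        return node.value
    return f"{node.value}({', '.join(show(c) for c in node.children)})"


tree = postfix_to_tree('"This is a string" 1 2 * 4.2 ceil mid "is i" ==')
print(show(tree))
```

The only difference from the operator-only algorithm is that the number of popped operands comes from the table instead of being fixed at two.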


Why bitwise operators are popular in sport programming

I'm a newbie on a sport programming service and found that winning solutions often use bitwise operators.
Here is an example.
Write a function which finds the difference between two arrays (assume that they differ by one element).
The solutions are:
let s = x => ~eval(x.join`+`);
let findDiff = (a, b) => s(b) - s(a);
and
let findDiff = (a, b) => eval(a.concat(b).join`^`);
I would like to know:
The explanation of those two examples (bitwise part).
What's the advantage of using bitwise operators on decimal numbers?
Is it faster to use bitwise operations rather than normal ones?
Update:
I didn't fully understand why my question is marked as a duplicate of ~~ vs parseInt. It's good to know why this operator replaces parseInt, and it is probably helpful for sport programming. But it doesn't answer my question.
Code golf isn't focused on bitwise operators; it is about code length.
Bitwise operators are not necessarily faster, but they are generally fast enough. They are concise (and usually harder to read, but that's a side effect).
~~ is a shorter (and usually more performant) alternative to parseInt, with a considerable number of caveats. In regular (non-golf) code it should be used only if it provides behaviour that is more desirable than parseInt's, or in a performance-sensitive context.
~a is roughly equal to parseInt(a) * -1 - 1, that is, -a - 1 for a 32-bit integer a. It can be used as a shorter alternative to ~~a in this particular example, s(b) - s(a), because the * -1 - 1 part cancels out on subtraction (although the sign flip must be taken into account).
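The two golfed solutions can be unpacked as follows, here as a Python sketch rather than the original JavaScript. The function names find_diff_sum and find_diff_xor are hypothetical. One difference to keep in mind: JavaScript bitwise operators truncate their operands to 32-bit integers, whereas Python integers are unbounded, so the sketch only mirrors the arithmetic, not the truncation.

```python
from functools import reduce
from operator import xor


def s(values):
    # ~n is -n - 1, so this is -(sum of values) - 1.
    return ~sum(values)


def find_diff_sum(a, b):
    # (-sum(b) - 1) - (-sum(a) - 1) == sum(a) - sum(b): the "- 1" parts cancel.
    return s(b) - s(a)


def find_diff_xor(a, b):
    # x ^ x == 0 and x ^ 0 == x, so every element present in both arrays
    # cancels out and only the unpaired element survives.
    return reduce(xor, a + b, 0)


print(find_diff_sum([1, 2, 3, 4], [1, 2, 4]))
print(find_diff_xor([1, 2, 3, 4], [1, 2, 4]))
```

The sum-based version returns a signed result that depends on which array holds the extra element, while the XOR version returns the unpaired element itself regardless of argument order.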

Checking understanding of: "Variable" v.s. "Value", and "function" vs "abstraction"

(This question is a follow-up of this one while studying Haskell.)
I used to find the distinction between "variable" and "value" confusing. Therefore I read the wiki page on lambda calculus as well as the previous answer above, and came up with the interpretations below.
May I confirm whether these are correct? I just want to double-check, because these concepts are quite basic but essential to functional programming. Any advice is welcome.
Premises from wiki:
Lambda Calculus syntax
exp → ID
    | (exp)
    | λ ID.exp    // abstraction
    | exp exp     // application
(Notation: "<=>" equivalent to)
Interpretations:
"value": it is the actual data or instructions stored in computer.
"variable": it is a way of locating the data, a value-replacing reference, but not itself the set of data or instructions stored in computer.
"abstraction" <=> "function" ∈ syntactic form. (https://stackoverflow.com/a/25329157/3701346)
"application": it takes an input of "abstraction", and an input of "lambda expression", results in a "lambda expression".
"abstraction" is called "abstraction" because in usual function definition, we abbreviate the (commonly longer) function body into a much shorter form, i.e. a function identifier followed by a list of formal parameters. (Though lambda abstractions are anonymous functions, other functions usually do have name.)
"variable" <=> "symbol" <=> "reference"
a "variable" is associated with a "value" via a process called "binding".
"constant" ∈ "variable"
"literal" ∈ "value"
"formal parameter" ∈ "variable"
"actual parameter"(argument) ∈ "value"
A "variable" can have a "value" of "data"
=> e.g. variable "a" has a value of 3
A "variable" can also have a "value" of "a set of instructions"
=> e.g. an operator "+" is a variable
"value": it is the actual data or instructions stored in computer.
You're trying to think of it very concretely in terms of the machine, which I'm afraid may confuse you. It's better to think of it in terms of math: a value is just a thing that never changes, like the number 42, the letter 'H', or the sequence of letters that constitutes "Hello world".
Another way to think of it is in terms of mental models. We invent mental models in order to reason indirectly about the world; by reasoning about the mental models, we make predictions about things in the real world. We write computer programs to help us work with these mental models reliably and in large volumes.
Values are then things in the mental model. The bits and bytes are just encodings of the model into the computer's architecture.
"variable": it is a way of locating the data, a value-replacing reference, but not itself the set of data or instructions stored in computer.
A variable is just a name that stands for a value in a certain scope of the program. Every time a variable is evaluated, its value needs to be looked up in an environment. There are several implementations of this concept in computer terms:
A stack frame in an eager language is an implementation of an environment for looking up the values of local variables, on each invocation of a routine.
A linker provides environments for looking up global-scope names when a program is compiled or loaded into memory.
"abstraction" <=> "function" ∈ syntactic form.
Abstraction and function are not equivalent. In the lambda calculus, "abstraction" is a type of syntactic expression, but a function is a value.
One analogy that's not too shabby is names and descriptions vs. things. Names and descriptions are part of language, while things are part of the world. You could say that the meaning of a name or description is the thing that it names or describes.
Languages contain both simple names for things (e.g., 12 is a name for the number twelve) and more complex descriptions of things (5 + 7 is a description of the number twelve). A lambda abstraction is a description of a function; e.g., the expression \x -> x + 7 is a description of the function that adds seven to its argument.
The trick is that when descriptions get very complex, it's not easy to figure out what thing they're describing. If I give you 12345 + 67890, you need to do some amount of work to figure out what number I just described. Computers are machines that do this work way faster and more reliably than we can do it.
"application": it takes an input of "abstraction", and an input of "lambda expression", results in a "lambda expression".
An application is just an expression with two subexpressions, which describes a value by this means:
1. The first subexpression stands for a function.
2. The second subexpression stands for some value.
3. The application as a whole stands for the value that results from applying the function in (1) to the value from (2).
In formal semantics (and don't be scared of that word) we often use the double brackets ⟦∙⟧ to stand for "the meaning of"; e.g. ⟦dog⟧ = "the meaning of dog." Using that notation:
⟦e1 e2⟧ = ⟦e1⟧(⟦e2⟧)
where e1 and e2 are any two expressions or terms (any variable, abstraction or application).
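The equation above, together with the earlier point that a variable is looked up in an environment, can be made concrete with a small interpreter. This is only a sketch in Python, evaluated eagerly, and the names Var, Lam, App and meaning are hypothetical. The environment is a plain dictionary from variable names to values.

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str


@dataclass
class Lam:
    param: str
    body: object


@dataclass
class App:
    fn: object
    arg: object


def meaning(expr, env):
    """Map an expression (syntax) to the value it stands for in env."""
    if isinstance(expr, Var):
        # A variable is just a name: its value is looked up in the environment.
        return env[expr.name]
    if isinstance(expr, Lam):
        # An abstraction describes a function: the value is a Python callable.
        return lambda v: meaning(expr.body, {**env, expr.param: v})
    if isinstance(expr, App):
        # [[e1 e2]] = [[e1]]([[e2]])
        return meaning(expr.fn, env)(meaning(expr.arg, env))
    raise TypeError(f"not an expression: {expr!r}")


# Assumed primitives supplied by the environment, not part of the calculus.
env = {"add": lambda x: lambda y: x + y, "five": 5, "seven": 7}

# (\x -> add x seven) five
term = App(Lam("x", App(App(Var("add"), Var("x")), Var("seven"))), Var("five"))
print(meaning(term, env))
```

Note that the abstraction Lam(...) is a piece of syntax, while meaning(Lam(...), env) is a function value, which is the distinction drawn above.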
"abstraction" is called "abstraction" because in usual function definition, we abbreviate the (commonly longer) function body into a much shorter form, i.e. a function identifier followed by a list of formal parameters. (Though lambda abstractions are anonymous functions, other functions usually do have name.)
To tell you the truth, I've never stopped to think whether the term "abstraction" is a good term for this or why it was picked. Generally, with math, it doesn't pay to ask questions like that unless the terms have been very badly picked and mislead people.
"constant" ∈ "variable"
"literal" ∈ "value"
The lambda calculus, in and of itself, doesn't have the concepts of "constant" or "literal." But one way to define these would be:
A literal is an expression that, because of the rules of the language, always has the same value no matter where it occurs.
A constant, in a purely functional language, is a variable at the topmost scope of a program. Every (non-shadowed) use of that variable will always have the same value in the program.
"formal parameter" ∈ "variable"
"actual parameter"(argument) ∈ "value"
A formal parameter is one kind of use of a variable. In any expression of the form λv.e (where v is a variable and e is an expression), v is a formal parameter.
An argument is any expression (not value!) that occurs as the second subexpression of an application.
A "variable" can have a "value" of "data" => e.g. variable "a" has a value of 3
All expressions have values, not just variables. For example, 5 + 7 is an application, and it has the value of twelve.
A "variable" can also have a "value" of "a set of instructions" => e.g. an operator "+" is a variable
The value of + is a function—it's the function that adds its arguments. The set of instructions is an implementation of that function.
Think of a function as an abstract table that says, for each combination of argument values, what the result is. The way the instructions come in is this:
For a lot of functions we cannot literally implement them as a table. In the case of addition it's because the table would be infinitely large.
Even for functions where we can enumerate the cases, we want to implement them much more briefly and efficiently.
But the way you check whether a function implementation is correct is, in some sense, to check that in every case it does the same thing the "infinite table" would do. Two sets of instructions that both check out in this way are really two different implementations of the same function.
The word "abstraction" is used because we can't "look inside" a function and see what's going on for the most part so it's "abstract" (contrast with "concrete"). Application is the process of applying a function to an argument. This means that its body is run, but with the thing that's being applied to it replacing the argument name (avoiding any capture). Hopefully this example will explain better than I can (in Haskell syntax. \ represents lambda):
(\x -> x + x) 5 <=> 5 + 5
Here we are applying the lambda expression on the left to the value 5 on the right. We get 5 + 5 as our result (which then may be further reduced to 10).
A "reference" might refer to something somewhat different in the context of Haskell (IORefs and STRefs), but, internally, all bindings ("variables") in Haskell have a layer of indirection like references in other languages (actually, they have even more indirection than that in a way because of the non-strict evaluation).
This mostly looks okay except for the reference issue I mentioned above.
In Haskell, there isn't really a distinction between a variable and a constant.
A "literal" usually is specifically a constructor for a value. For example, 20 constructs the number 20, but a function application (\x -> 2 * x) 10 wouldn't be considered a literal for 20 because it has an extra step before you get the value.
Right, not all variables are parameters. A parameter is something that is passed to a function. The xs in the lambda expressions above are examples of parameters. A non-example would be something like let a = 15 in a * a. a is a "variable" but not a parameter. Actually, I would call a a "binding" here because it can never change or take on a different value (vary).
The formal parameter vs actual parameter part looks about right.
That looks okay.
I would say that a variable can be a function instead. In functional programming, we typically think in terms of functions and function applications instead of lists of instructions.
I'd like to point out also that you might get in trouble by thinking of functions as just syntactic forms. You can create new functions by applying certain kinds of higher order functions without using one of the syntactic forms to construct a function directly. A simple example of this is function composition, (.) in Haskell:
(f . g) x = f (g x) -- Definition of (.)
(* 10) . (+ 1) <=> \x -> ((* 10) ((+ 1) x)) <=> \x -> 10 * (x + 1)
Writing it as (* 10) . (+ 1) doesn't directly use the lambda syntax or the function definition syntax to create the new function.

Expression Trees: Alternatives or Alternate Evaluation Methods

I'm not even sure if this is the right place to ask a question like this.
As part of my MSc thesis, I am working on parallel algorithms. To put it simply, part of what I am doing is evaluating thousands of expression trees in parallel (expressions like sin(exp (x + y) * cos (z))). What I am doing right now is converting these expression trees to prefix/postfix expressions and evaluating them using conventional methods (stack, recursion, etc.). These are the basic techniques that we've all been taught in data structures and introductory computer science courses.
I'm wondering if there is anything else that can be used instead of expression trees for dealing with expressions. I know that compilers rely heavily on expression trees in the parsing phase, so I'm assuming there are no alternatives to expression trees (or else someone would have used them in a compiler).
Are there any alternative evaluation methods for such expressions (rather than stacks and recursion)? Something more "parallel" friendly? Parsing such an expression with a stack is sequential and will create a bottleneck in parallel systems. (Exotic/weird/theoretical methods, if any, are also acceptable for my work.)
I think that evaluating expression trees is parallelizable, as long as you don't convert them to prefix or postfix form.
For example, the tree for the expression you gave would look like this:
        sin
         |
         *
        / \
     exp   cos
      |     |
      +     z
     / \
    x   y
When you encounter the *, you could evaluate the exp subexpression on one thread and the cos subexpression on another one. (You could use a future here to make the code simpler, assuming your programming language supports them.)
Although if your expressions really are as simple as this one and you have thousands of them, then I don't see any reason why you would need to evaluate a single expression in parallel. Parallelizing on the expressions themselves should be more than enough (e.g. with 1000 expressions and 2 cores, evaluate 500 on one core and the rest on the other core).
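Parallelizing across expressions rather than inside one can be sketched as follows in Python. The tuple encoding of trees and the names FUNCS, OPS, evaluate and evaluate_all are assumptions made for the illustration. A thread pool is used only to keep the sketch short: in CPython, threads do not run pure-Python code on several cores at once, so a process pool or another runtime would be needed for real CPU parallelism.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Assumed encoding: a tree is a number, a variable name (str),
# or a tuple (head, child, child, ...).
FUNCS = {"sin": math.sin, "cos": math.cos, "exp": math.exp}
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def evaluate(tree, env):
    """Evaluate one tree recursively; each tree is independent of the others."""
    if isinstance(tree, str):
        return env[tree]
    if isinstance(tree, (int, float)):
        return tree
    head, *children = tree
    values = [evaluate(child, env) for child in children]
    if head in FUNCS:
        return FUNCS[head](*values)
    return OPS[head](*values)


def evaluate_all(trees, env, workers=2):
    # Each worker takes whole expressions, so no tree is split across workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: evaluate(t, env), trees))


tree = ("sin", ("*", ("exp", ("+", "x", "y")), ("cos", "z")))
env = {"x": 0.0, "y": 0.0, "z": 0.0}
print(evaluate_all([tree] * 4, env))
```

Because the expressions share no state, the results come back in input order and no synchronization beyond the pool itself is required.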

Clojure - test for equality of function expression?

Suppose I have the following clojure functions:
(defn a [x] (* x x))
(def b (fn [x] (* x x)))
(def c (eval (read-string "(defn d [x] (* x x))")))
Is there a way to test for equality of the function expressions, so that some equivalent of
(eqls a b)
returns true?
It depends on precisely what you mean by "equality of the function expression".
These functions are going to end up as bytecode, so I could for example dump the bytecode corresponding to each function to a byte[] and then compare the two bytecode arrays.
However, there are many different ways of writing semantically equivalent methods that wouldn't have the same representation in bytecode.
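The same effect can be observed outside the JVM. The sketch below is a Python analogue, not Clojure: it compares the raw bytecode of CPython code objects through the co_code attribute, which is a CPython implementation detail. The helper name same_bytecode is hypothetical.

```python
def a(x):
    return x * x


b = lambda x: x * x


def c(x):
    # Semantically the same as a for numbers, but written differently.
    return x ** 2


def same_bytecode(f, g):
    # Compares only the instruction bytes, not names, constants or line info.
    return f.__code__.co_code == g.__code__.co_code


print(same_bytecode(a, b))
print(same_bytecode(a, c))
```

Here a and b compile to the same instructions, while c computes the same results for numbers through different instructions, so a bytecode comparison reports them as different.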
In general, it's impossible to tell what a piece of code does without running it. So it's impossible to tell whether two bits of code are equivalent without running both of them, on all possible inputs.
This is at least as bad, computationally speaking, as the halting problem, and possibly worse.
The halting problem is undecidable as it is, so the general-case answer here is definitely no (and not just for Clojure but for every programming language).
I agree with the above answers regarding Clojure not having a built-in ability to determine the equivalence of two functions, and that it has been proven that you cannot test programs functionally (also known as black box testing) to determine equality due to the halting problem (unless the input set is finite and defined).
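The finite-input exception can be sketched like this, in Python rather than Clojure. The helper equal_on is hypothetical, and it assumes both functions terminate on every element of the domain. Agreement on the chosen domain says nothing about inputs outside it.

```python
def equal_on(domain, f, g):
    """Return True if f and g agree on every element of a finite domain."""
    return all(f(x) == g(x) for x in domain)


def a(x):
    return x * x


def b(x):
    return x ** 2


def d(x):
    return x * abs(x)


small_ints = range(-100, 101)
print(equal_on(small_ints, a, b))
print(equal_on(small_ints, a, d))
print(equal_on(range(0, 101), a, d))
```

The last call shows the limitation: a and d agree on the non-negative integers tested but differ as soon as negative inputs are included.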
I would like to point out that it is possible to algebraically determine the equivalence of two functions, even if they have different forms (different byte code).
The method for proving the equivalence algebraically was developed in the 1930s by Alonzo Church and is known as beta reduction in lambda calculus. This method is certainly applicable to the simple forms in your question (which would also yield the same byte code) and also for more complex forms that would yield different byte codes.
I cannot add to the excellent answers by others, but would like to offer another viewpoint that helped me. If you are e.g. testing that the correct function is returned from your own function, instead of comparing the function object you might get away with just returning the function as a 'symbol.
I know this probably is not what the author asked for but for simple cases it might do.

Pattern matching with associative and commutative operators

Pattern matching (as found in e.g. Prolog, the ML family languages and various expert system shells) normally operates by matching a query against data element by element in strict order.
In domains like automated theorem proving, however, there is a requirement to take into account that some operators are associative and commutative. Suppose we have data
A or B or C
and query
C or $X
Going by surface syntax this doesn't match, but logically it should match with $X bound to A or B because or is associative and commutative.
Is there any existing system, in any language, that does this sort of thing?
Associative-Commutative pattern matching has been around since 1981 and earlier, and is still a hot topic today.
There are lots of systems that implement this idea and make it useful; it means you can avoid writing complicated pattern matches when associativity or commutativity could be used to make the pattern match. Yes, it can be expensive, but it is better for the pattern matcher to do this automatically than for you to do it badly by hand.
You can see an example in a rewrite system for algebra and simple calculus implemented using our program transformation system. In this example, the symbolic language to be processed is defined by grammar rules, and those rules that have A-C properties are marked. Rewrites on trees produced by parsing the symbolic language are automatically extended to match.
The Maude term rewriter implements associative and commutative pattern matching.
http://maude.cs.uiuc.edu/
I've never encountered such a thing, and I just had a more detailed look.
There is a sound computational reason for not implementing this by default - one has to essentially generate all combinations of the input before pattern matching, or you have to generate the full cross-product worth of match clauses.
I suspect that the usual way to implement this would be to simply write both patterns (in the binary case), i.e., have patterns for both C or $X and $X or C.
Depending on the underlying organisation of data (it's usually tuples), this pattern matching would involve rearranging the order of tuple elements, which would be weird (particularly in a strongly typed environment!). If it's lists instead, then you're on even shakier ground.
Incidentally, I suspect that the operation you fundamentally want is disjoint union patterns on sets, e.g.:
foo (Or ({C} disjointUnion {X})) = ...
The only programming environment I've seen that deals with sets in any detail would be Isabelle/HOL, and I'm still not sure that you can construct pattern matches over them.
EDIT: It looks like Isabelle's function functionality (rather than fun) will let you define complex non-constructor patterns, except then you have to prove that they are used consistently, and you can't use the code generator anymore.
EDIT 2: The way I implemented similar functionality over n commutative, associative and transitive operators was this:
My terms were of the form A | B | C | D, while queries were of the form B | C | $X, where $X was permitted to match zero or more things. I pre-sorted these using lexicographic ordering, so that variables always occurred in the last position.
First, you construct all pairwise matches, ignoring variables for now, and recording those that match according to your rules.
{ (B,B), (C,C) }
If you treat this as a bipartite graph, then you are essentially doing a perfect marriage problem. There exist fast algorithms for finding these.
Assuming you find one, then you gather up everything that does not appear on the left-hand side of your relation (in this example, A and D), and you stuff them into the variable $X, and your match is complete. Obviously you can fail at any stage here, but this will mostly happen if there is no variable free on the RHS, or if there exists a constructor on the LHS that is not matched by anything (preventing you from finding a perfect match).
Sorry if this is a bit muddled. It's been a while since I wrote this code, but I hope this helps you, even a little bit!
For the record, this might not be a good approach in all cases. I had very complex notions of 'match' on subterms (i.e., not simple equality), and so building sets or anything would not have worked. Maybe that'll work in your case though and you can compute disjoint unions directly.
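For the simple case mentioned last, where subterms match by plain equality, the scheme from EDIT 2 reduces to multiset subtraction and can be sketched as follows in Python. The function name ac_match and the list encoding of flattened terms are assumptions. With a richer notion of 'match' on subterms, the subtraction step would have to be replaced by a bipartite matching as described above.

```python
from collections import Counter


def ac_match(term, pattern, var="$X"):
    """Match the flattened operands of one associative-commutative operator.

    term and pattern are lists of operands, e.g. A | B | C -> ["A", "B", "C"].
    Returns a binding dict on success and None on failure.
    """
    constants = Counter(p for p in pattern if p != var)
    available = Counter(term)
    # Every non-variable operand of the pattern must occur in the term.
    if constants - available:
        return None
    # Whatever is left over after removing the matched operands.
    rest = sorted((available - constants).elements())
    if var in pattern:
        # The variable absorbs zero or more leftover operands.
        return {var: rest}
    # Without a variable, nothing may be left over.
    return {} if not rest else None


print(ac_match(["A", "B", "C"], ["C", "$X"]))
print(ac_match(["A", "B", "C", "D"], ["B", "C", "$X"]))
print(ac_match(["A", "B"], ["C", "$X"]))
```

Using Counter rather than set keeps repeated operands distinct, so a term such as A | A | B still needs two A operands accounted for.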