I already posted a question about function equality. It quickly concluded that general function equality is an incredibly hard problem and might be mathematically disprovable.
I would like to stub up a function
function equal(f, g, domain) {
}
f & g are halting functions that take one argument. Their argument is an natural number. These functions will return a boolean.
If no domain is passed then you may assume the domain defaults to all natural numbers.
The structure of domain is whatever is most convenient for the equal function.
Another important fact is that f & g are deterministic. and will consistantly return the same boolean m for f(n).
You may assume that f and g always return and don't throw any exceptions or crash due to errors as long as their input is within the domain
The question is language-agnostic and Asking for an implementation of the equal function. i'm not certain whether SO is the right place for this anymore.
f & g have no side-effects. and the domain does not have to be finite.
It's still not possible.
You could test both functions for some finite number of inputs and check them for equality on those inputs. If they are unequal for any input then the two functions are not identical. If they are equal in every case you tested then there is a reasonable chance that they are the same function, but you can't be completely certain.
In general it is infeasible to test every possible input unless the domain is small. If the domain is a 32 bit integer and your function is quite fast to evaluate then it might be feasible to check every possible input.
I believe the following to be the best you can do without doing static analysis on the source code:
function equal(f, g, domain) {
var n;
for (n in domain) {
if (f(domain[n]) != g(domain[n])) return false;
}
return true;
}
Note that this assumes the domain to be finite.
If the domain is not finite, Rice's theorem prevents such an algorithm from existing:
If we let f and g be the implementations and F and G be the mathematical functions these implementations calculate the values of, then it's Rice's theorem says that it's impossible to determine if f calculates G or g calculates F, as these are non-trivial properties of the implementations.
For further detail, see my answer to the previous question.
Depending on your use-case, you might be able to do some assumptions about f & g . Maybe in your case, they apply under specific conditions what might make it solvable.
In other cases, the only thing what I might recommend is fuzzy testing , on Abstract Syntax Tree or other representation.
Related
Are there any rules that you follow to determine the order of function arguments? For example, float pow(float x, float exponent) vs float pow(float exponent, float x). For concreteness, C++ could be used, but the question is valid for all programming languages.
My main concern is from the usability point of view, not runtime performance.
Edit:
Some possible bases for ordering could be:
Inputs versus Output
The way a "formula" is usually written, i.e., arguments from left-to-write.
Specificity to the argument to the context of the function, i.e., whether it is a "general" argument, e.g., a singleton object of the system, or specific.
In the example you cite, I think the order was decided on the basis of the mathematical notation xexponent, in which the base is written before the exponent and becomes the left parameter.
I'm not aware of any really sound general principle other than to try to imagine what your users will expect and/or easily remember. People aren't even wholly agreed whether you should write (source, destination) or (destination, source) when copying (compare std::copy with std::memcpy), although I'm pretty sure that the former is now much more common.
There are a whole lot of general conventions, though, followed to different extents by different people:
if the function is considered primarly to act upon a particular object, put it first
parameters that are considered to "configure" the operation of the function come after parameters that are considered the main subject of the function.
out-params come last (but I suspect some people follow the reverse)
To some extent it doesn't really matter -- namely the extent to which your users have IDEs that tell them the parameter order as they type the function name.
In C++ one can overload the == and != operators for user types, but the language doesn't care how you do it. You can overload both to return true no matter what, so !(a==b) and (a!=b) don't necessarily have to evaluate to the same thing. How many other languages have a situation where ¬(a = b) and (a ≠ b) can be different? Is it a common thing?
It's not just an issue of overloads, but of strange corner cases even for primitive types. NaN in C and C++ doesn't compare equal to anything, including NaN. It is true that NaN != NaN in C, but maybe there are similar cases in other languages that cause ¬(a = b) and (a ≠ b) to be different?
Guy L. Steele famously said
...the ability to define your own operator functions means that a simple statement such as x=a+b; in an inner loop might involve the sending of e-mail to Afghanistan.
And as corsiKa says, just because you can do it, doesn't make it a good idea.
I know for a fact that Python and Ruby can and Java and PHP can not. (In Java == determines if two objects are the same thing in memory, not just semantically equivalent values. In PHP...never mind.) I'd also imagine that Lisp and JS can and C can not, but that's a bit more speculative.
It's nothing unusual to be able to overload operators. It is very rare for !(a==b) and (a!=b) to have different results though.
Suppose I have the following clojure functions:
(defn a [x] (* x x))
(def b (fn [x] (* x x)))
(def c (eval (read-string "(defn d [x] (* x x))")))
Is there a way to test for the equality of the function expression - some equivalent of
(eqls a b)
returns true?
It depends on precisely what you mean by "equality of the function expression".
These functions are going to end up as bytecode, so I could for example dump the bytecode corresponding to each function to a byte[] and then compare the two bytecode arrays.
However, there are many different ways of writing semantically equivalent methods, that wouldn't have the same representation in bytecode.
In general, it's impossible to tell what a piece of code does without running it. So it's impossible to tell whether two bits of code are equivalent without running both of them, on all possible inputs.
This is at least as bad, computationally speaking, as the halting problem, and possibly worse.
The halting problem is undecidable as it is, so the general-case answer here is definitely no (and not just for Clojure but for every programming language).
I agree with the above answers in regards to Clojure not having a built in ability to determine the equivalence of two functions and that it has been proven that you can not test programs functionally (also known as black box testing) to determine equality due to the halting problem (unless the input set is finite and defined).
I would like to point out that it is possible to algebraically determine the equivalence of two functions, even if they have different forms (different byte code).
The method for proving the equivalence algebraically was developed in the 1930's by Alonzo Church and is know as beta reduction in Lambda Calculus. This method is certainly applicable to the simple forms in your question (which would also yield the same byte code) and also for more complex forms that would yield different byte codes.
I cannot add to the excellent answers by others, but would like to offer another viewpoint that helped me. If you are e.g. testing that the correct function is returned from your own function, instead of comparing the function object you might get away with just returning the function as a 'symbol.
I know this probably is not what the author asked for but for simple cases it might do.
Imagine a simple (made up) language where functions look like:
function f(a, b) = c + 42
where c = a * b
(Say it's a subset of Lisp that includes 'defun' and 'let'.)
Also imagine that it includes immutable objects that look like:
struct s(a, b, c = a * b)
Again analogizing to Lisp (this time a superset), say a struct definition like that would generate functions for:
make-s(a, b)
s-a(s)
s-b(s)
s-c(s)
Now, given the simple set up, it seems clear that there is a lot of similarity between what happens behind the scenes when you either call 'f' or 'make-s'. Once 'a' and 'b' are supplied at call/instantiate time, there is enough information to compute 'c'.
You could think of instantiating a struct as being like a calling a function, and then storing the resulting symbolic environment for later use when the generated accessor functions are called. Or you could think of a evaluting a function as being like creating a hidden struct and then using it as the symbolic environment with which to evaluate the final result expression.
Is my toy model so oversimplified that it's useless? Or is it actually a helpful way to think about how real languages work? Are there any real languages/implementations that someone without a CS background but with an interest in programming languages (i.e. me) should learn more about in order to explore this concept?
Thanks.
EDIT: Thanks for the answers so far. To elaborate a little, I guess what I'm wondering is if there are any real languages where it's the case that people learning the language are told e.g. "you should think of objects as being essentially closures". Or if there are any real language implementations where it's the case that instantiating an object and calling a function actually share some common (non-trivial, i.e. not just library calls) code or data structures.
Does the analogy I'm making, which I know others have made before, go any deeper than mere analogy in any real situations?
You can't get much purer than lambda calculus: http://en.wikipedia.org/wiki/Lambda_calculus. Lambda calculus is in fact so pure, it only has functions!
A standard way of implementing a pair in lambda calculus is like so:
pair = fn a: fn b: fn x: x a b
first = fn a: fn b: a
second = fn a: fn b: b
So pair a b, what you might call a "struct", is actually a function (fn x: x a b). But it's a special type of function called a closure. A closure is essentially a function (fn x: x a b) plus values for all of the "free" variables (in this case, a and b).
So yes, instantiating a "struct" is like calling a function, but more importantly, the actual "struct" itself is like a special type of function (a closure).
If you think about how you would implement a lambda calculus interpreter, you can see the symmetry from the other side: you could implement a closure as an expression plus a struct containing the values of all the free variables.
Sorry if this is all obvious and you just wanted some real world example...
Both f and make-s are functions, but the resemblance doesn't go much further. Applying f calls the function and executes its code; applying make-s creates a structure.
In most language implementations and modelizations, make-s is a different kind of object from f: f is a closure, whereas make-s is a constructor (in the functional languages and logic meaning, which is close to the object oriented languages meaning).
If you like to think in an object-oriented way, both f and make-s have an apply method, but they have completely different implementations of this method.
If you like to think in terms of the underlying logic, f and make-s have a type build on the samme type constructor (the function type constructor), but they are constructed in different ways and have different destruction rules (function application vs. constructor application).
If you'd like to understand that last paragraph, I recommend Types and Programming Languages by Benjamin C. Pierce. Structures are discussed in §11.8.
Is my toy model so oversimplified that it's useless?
Essentially, yes. Your simplified model basically boils down to saying that each of these operations involves performing a computation and putting the result somewhere. But that is so general, it covers anything that a computer does. If you didn't perform a computation, you wouldn't be doing anything useful. If you didn't put the result somewhere, you would have done work for nothing as you have no way to get the result. So anything useful you do with a computer, from adding two registers together, to fetching a web page, could be modeled as performing a computation and putting the result somewhere that it can be accessed later.
There is a relationship between objects and closures. http://people.csail.mit.edu/gregs/ll1-discuss-archive-html/msg03277.html
The following creates what some might call a function, and others might call an object:
Taken from SICP ( http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-21.html )
(define (make-account balance)
(define (withdraw amount)
(if (>= balance amount)
(begin (set! balance (- balance amount))
balance)
"Insufficient funds"))
(define (deposit amount)
(set! balance (+ balance amount))
balance)
(define (dispatch m)
(cond ((eq? m 'withdraw) withdraw)
((eq? m 'deposit) deposit)
(else (error "Unknown request -- MAKE-ACCOUNT"
m))))
dispatch)
I have code similar to:
if conditionA(x, y, z) then doA()
else if conditionB(x, y, z) then doB()
...
else if conditionZ(x, y, z) then doZ()
else throw ShouldNeverHappenException
I would like to validate two things (using static analysis):
If all conditions conditionA, conditionB, ..., conditionZ are mutually exclusive (i.e. it is not possible that two or more conditions are true in the same time).
All possible cases are covered, i.e. "else throw" statement will never be called.
Could you recommend me a tool and/or a way I could (easily) do this?
I would appreciate more detailed informations than "use Prolog" or "use Mathematica"... ;-)
UPDATE:
Let assume that conditionA, conditionB, ..., conditionZ are (pure) functions and x, y, z have "primitive" types.
The item 1. that you want to do is a stylistic issue. The program makes sense even if the conditions are not exclusive. Personally, as an author of static analysis tools, I think that users get enough false alarms without trying to force style on them (and since another programmer would write overlapping conditions on purpose, to that other programmer what you ask would be a false alarm). This said, there are tools that are configurable: for one of those, you could write a rule stating that the cases have to be exclusive when this construct is encountered. And as suggested by Jeffrey, you can wrap your code in a context in which you compute a boolean condition that is true iff there is no overlap, and check that condition instead.
The item 2. is not a style issue: you want to know if the exception can be raised.
The problem is difficult in theory and in practice, so tools usually give up at least one of correctness (never fail to warn if there is an issue) or completeness (never warn for a non-issue). If the types of the variables were unbounded integers, computability theory would state that an analyzer cannot be both correct and complete and terminate for all input programs. But enough with the theory. Some tools give up both correctness and completeness, and that doesn't mean they are not useful either.
An example of tool that is correct is Frama-C's value analysis: if it says that a statement (such as the last case in the sequence of elseifs) is unreachable, you know that it is unreachable. It is not complete, so when it doesn't say that the last statement is unreachable, you don't know.
An example of tool that is complete is Cute: it uses the so-called concolic approach to generate test cases automatically, aiming for structural coverage (that is, it will more or less heuristically try to generate tests that activate the last case once all the others have been taken). Because it generates test cases (each a single, definite input vector on which the code is actually executed), it never warns for a non-problem. This is what it means to be complete. But it may fail to find the test case that causes the last statement to be reached even though there is one: it is not correct.
This appears to be isomorphic to solving a 3-sat equation, which is NP-hard. It is unlikely a static analyzer would attempt to cover this domain, unfortunately.
In the general case this is—as #Michael Donohue ponts out—an NP-hard problem.
But if you have only a reasonable number of conditions to check, you could just write a program that checks all of them.
for (int x = lowestX; x <= highestX; x++)
for (int y ...)
for (int z ...)
{
int conditionsMet = 0;
if conditionA(x, y, z) then conditionsMet++;
if conditionB(x, y, z) then conditionsMet++;
...
if conditionZ(x, y, z) then conditionsMet++;
if (conditionsMet != 1)
PrintInBlinkingRed("Found an exception!", x, y, z)
}
Assuming your conditions are boolean expression (and/or/not) over boolean-valued predicates X,Y,Z, your question is easily solved with a symbolic boolean evaluation engine.
The question about whether they cover all cases is answered by taking a disjunctiton of all the conditions and asking if is a tautology. Wang's algorithm does this just fine.
The question about whether they intersect is answered pairwise; for formulas a and b,
symbolically construct a & b == false and apply Wang's tautology test again.
We've used the DMS Software Reengineering Toolkit to carry out similar boolean value computations (partial evaluations) over preprocessor conditionals in C. DMS provides the ability to parse source code (important if you intend to do this across a large code base and/or repeatedly as you modify your program over time), extract the program fragments, symbolically compose them, and then apply rewriting rules (to carry out boolean simplifications or algorithms such as Wang's).