Static analysis of multiple if statements (conditions) - language-agnostic

I have code similar to:
if conditionA(x, y, z) then doA()
else if conditionB(x, y, z) then doB()
...
else if conditionZ(x, y, z) then doZ()
else throw ShouldNeverHappenException
I would like to validate two things (using static analysis):
If all conditions conditionA, conditionB, ..., conditionZ are mutually exclusive (i.e. it is not possible for two or more conditions to be true at the same time).
All possible cases are covered, i.e. "else throw" statement will never be called.
Could you recommend a tool and/or a way to do this (easily)?
I would appreciate more detailed information than "use Prolog" or "use Mathematica"... ;-)
UPDATE:
Let's assume that conditionA, conditionB, ..., conditionZ are (pure) functions and x, y, z have "primitive" types.

Item 1 is a stylistic issue: the program makes sense even if the conditions are not exclusive. Personally, as an author of static analysis tools, I think users get enough false alarms without tools also trying to force style on them (and since another programmer might write overlapping conditions on purpose, what you are asking for would be a false alarm to that programmer). That said, some tools are configurable: with one of those, you could write a rule stating that the cases have to be exclusive whenever this construct is encountered. And, as suggested by Jeffrey, you can wrap your code in a context in which you compute a boolean condition that is true iff there is no overlap, and check that condition instead.
Item 2 is not a style issue: you want to know whether the exception can be raised.
The problem is difficult in theory and in practice, so tools usually give up at least one of correctness (never failing to warn when there is an issue) or completeness (never warning about a non-issue). If the types of the variables were unbounded integers, computability theory would say that an analyzer cannot be correct, complete, and terminating for all input programs. But enough theory. Some tools give up both correctness and completeness, and that doesn't mean they are not useful.
An example of a correct tool is Frama-C's value analysis: if it says that a statement (such as the last case in the sequence of else-ifs) is unreachable, you know that it is unreachable. It is not complete, so when it doesn't say that the last statement is unreachable, you don't know.
An example of a complete tool is Cute: it uses the so-called concolic approach to generate test cases automatically, aiming for structural coverage (that is, it will more or less heuristically try to generate tests that activate the last case once all the others have been taken). Because it generates test cases (each a single, definite input vector on which the code is actually executed), it never warns about a non-problem. This is what it means to be complete. But it may fail to find the test case that reaches the last statement even though one exists: it is not correct.

This appears to be isomorphic to solving a 3-SAT instance, which is NP-hard. It is unlikely a static analyzer would attempt to cover this domain, unfortunately.

In the general case this is, as Michael Donohue points out, an NP-hard problem.
But if you have only a reasonable number of conditions to check, you could just write a program that checks all of them.
for (int x = lowestX; x <= highestX; x++)
  for (int y ...)
    for (int z ...)
    {
      int conditionsMet = 0;
      if (conditionA(x, y, z)) conditionsMet++;
      if (conditionB(x, y, z)) conditionsMet++;
      ...
      if (conditionZ(x, y, z)) conditionsMet++;
      if (conditionsMet != 1)
        PrintInBlinkingRed("Found an exception!", x, y, z);
    }

Assuming your conditions are boolean expressions (and/or/not) over boolean-valued predicates X, Y, Z, your question is easily solved with a symbolic boolean evaluation engine.
The question of whether they cover all cases is answered by taking the disjunction of all the conditions and asking whether it is a tautology. Wang's algorithm does this just fine.
The question of whether they intersect is answered pairwise: for formulas a and b, symbolically construct (a & b) == false and apply Wang's tautology test again.
We've used the DMS Software Reengineering Toolkit to carry out similar boolean value computations (partial evaluations) over preprocessor conditionals in C. DMS provides the ability to parse source code (important if you intend to do this across a large code base and/or repeatedly as you modify your program over time), extract the program fragments, symbolically compose them, and then apply rewriting rules (to carry out boolean simplifications or algorithms such as Wang's).
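For the boolean-predicate setting this answer assumes, even a brute-force truth-table check makes both properties concrete (enumeration over the 2^3 assignments, in place of Wang's algorithm). Here is a minimal Python sketch; the three condition functions are hypothetical placeholders, not anything from the question.

from itertools import product

# Hypothetical placeholder conditions over the boolean predicates X, Y, Z.
def condition_a(X, Y, Z): return X and not Y
def condition_b(X, Y, Z): return Y and not Z
def condition_c(X, Y, Z): return Z or (not X and not Y)

conditions = [condition_a, condition_b, condition_c]

for X, Y, Z in product([False, True], repeat=3):
    hits = [c(X, Y, Z) for c in conditions]
    if not any(hits):          # coverage: the disjunction must be a tautology
        print("uncovered case:", X, Y, Z)
    if sum(hits) > 1:          # exclusivity: at most one condition may hold
        print("overlapping conditions at:", X, Y, Z)

With these particular placeholders every assignment is covered, but the script reports an overlap at X=True, Y=False, Z=True, which is exactly the kind of finding the question is after.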

Do we input only 1s for minterms and 0s for maxterms?

This has been bugging me for a long time.
Suppose I have a boolean function F of three inputs, defined by the following truth table (the output is 1 only on lines 3 and 8):
x y z | F
0 0 0 | 0
0 0 1 | 0
0 1 0 | 1
0 1 1 | 0
1 0 0 | 0
1 0 1 | 0
1 1 0 | 0
1 1 1 | 1
Now, it can be expressed in its SOP form as:
F = bar(X)·Y·bar(Z) + X·Y·Z
But I fail to understand why we always complement the 0s to express them as 1. Is it assumed that the inputs X, Y and Z will always be 1?
What is the practical application of that? All the YouTube videos I watched on this topic show how to express a function in SOP form or as a sum of minterms, but none of them explain why we need this in the first place. Why do we need minterms at all?
As of now, I believe that we design circuits to yield and take only 1s, and that's where minterms come in handy. But I couldn't get any confirmation of this anywhere, so I am not sure I am right.
Maxterms are even more confusing. Do we design circuits that would yield and take only 0s? Is that the purpose of maxterms?
Why do we need minterms in the first place?
We do not need minterms; we need a way to solve a logic design problem, i.e. given a truth table, find a logic circuit able to reproduce this truth table.
Obviously, this requires a methodology. Minterms and sum-of-products are one means of doing that; maxterms and product-of-sums are another. In either case, you get an algebraic representation of your truth table, and you can either implement it directly or apply standard theorems of boolean algebra to find an equivalent, but simpler, representation.
But these are not the only tools. For instance, with Karnaugh maps, you rewrite your truth table according to some rules and can simultaneously find an algebraic representation and reduce its complexity, without considering minterms. Their main drawback is that they become unworkable as the number of inputs rises, so they cannot be considered a general way to solve the problem of logic design.
It happens that minterms (and maxterms) do not have this drawback and can be used to solve any such problem: given a truth table, we can directly convert it into an equation with ANDs, ORs and NOTs. Minterms happen to be somewhat more intuitive to human beings than maxterms, but that is just a matter of taste (or of fewer parentheses); they are actually equivalent.
But I fail to understand why we always complement the 0s to express them as 1. Is it assumed that the inputs X, Y and Z will always be 1?
Assume we have a truth table with only one output at 1, for instance line 3 of your table: the output is 1 when x=0, y=1 and z=0, and 0 on every other line. Can we express that in boolean logic? With the SOP methodology, we look for a solution that is an "and" of the inputs or of their complements. The solution is obviously "x must be false and y must be true and z must be false", i.e. "(not x) must be true and y must be true and (not z) must be true", hence the minterm /x.y./z. So complementing the inputs that are 0 and leaving unchanged the ones that are 1 is the way to find the equation that is true exactly when xyz = 010.
If I have another table with only one output at 1 (for instance line 8 of your table), we find similarly that this TT can be implemented with x.y.z.
Now if I have a TT with two lines at 1, we can use the property of OR gates and OR the two previous circuits: when the output of the first one is 1 it forces the result to 1, and ditto for the second. We directly get the solution for your table: /x.y./z + x.y.z.
This can be extended to any number of ones in the TT and gives a systematic way to find an equation equivalent to a truth table.
So just think of minterms and maxterms as a tool to translate a TT into equations. What is important is the truth table (that describes the behaviour of what you want to do) and the equations (that give you a way to realize it).
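As a purely illustrative Python sketch of the construction described above (using the two-minterm truth table from the question): build one AND-term per line whose output is 1, complementing the inputs that are 0, OR the terms together, and check the result against the table.

from itertools import product

# Truth table from the question: F = 1 only on lines 3 (x,y,z = 0,1,0) and 8 (1,1,1).
truth_table = {(x, y, z): int((x, y, z) in [(0, 1, 0), (1, 1, 1)])
               for x, y, z in product([0, 1], repeat=3)}

def minterm(row):
    # AND-term that is 1 exactly on this row: complement the inputs that are 0.
    return lambda x, y, z: all(v if bit else 1 - v
                               for v, bit in zip((x, y, z), row))

minterms = [minterm(row) for row, out in truth_table.items() if out == 1]

def F(x, y, z):
    # Sum of products: OR together the minterms of the rows whose output is 1.
    return int(any(m(x, y, z) for m in minterms))

assert all(F(*row) == out for row, out in truth_table.items())
print("the SOP form /x.y./z + x.y.z reproduces the truth table on all 8 lines")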

In CUDA, is there a way to ensure the consistence of FP maths in the same program?

Is there a way to ensure that:
if a==b then devfun(a)==devfun(b);
where devfun() is a device function that involves some floating point math ops (e.g. polynomials) and returns floating point results, and a and b are floating point variables.
I don't care about cross-implementation consistency (e.g. different compilers/OSes/driver versions or different compiler options); I only care about the following: within the same build/program, at runtime, is it guaranteed on each call that the results returned by devfun() are consistent, in the sense that as long as a==b, devfun(a)==devfun(b)?
I am talking about SM 2.0+ hardware and CUDA 5.0+, just in case it is relevant.
Let's assume that your numbers a and b are properly normalized IEEE-754 floating point numbers and that neither a nor b is a NaN value. Let's also assume a and b are both 32-bit, or else both 64-bit (IEEE-754 floating point representations).
In that case, I believe the (ISO C/C++, or CUDA C/C++) floating point test for equality (==) will return TRUE when the two numbers a and b are bitwise identical (and FALSE otherwise).
Under the TRUE case, with one exception, I believe it is safe to assume that devfun(a) == devfun(b) without any additional conditions except the obvious ones: there is no difference in the behavior of devfun on either side of the == operation, that is, it's the same code, compiled in the same way, executed under the same conditions (e.g. other variables that may be taking part in devfun, same GPU type, etc.), just as you've indicated in your question: "same build/program".
The one exception is if the result of devfun(a) is NaN, since (IEEE-754) NaN != NaN.
It would be interesting (to me) if you think you have a piece of code that disproves this assertion.
Perhaps floating point ninjas will come along and correct me.
Perhaps also I would be remiss if I did not say something about the hazards of floating point comparisons. If you're not familiar with this (most folks would never recommend performing a test a==b on two floating point numbers) you can find many questions about it on SO.
For the same reasons that floating point equality comparison (==) in general is unwise, I think relying on the above assertion, even if it's true, is unwise. Let me give you one example.
Suppose you compile code for architecture sm_20. Now you run the code on an sm_21 device. This one simple variation could result in a JIT-compile at runtime. Now you are no longer running the same code, and all bets are off.
So, again, even if the above is true, I think it's unwise for you to rely on such a statement:
if a==b, then devfun(a) == devfun(b)
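The comparison hazard is easy to see even outside CUDA. A small illustration in plain Python (IEEE-754 double precision, the same arithmetic as a C/C++ double): mathematically equal expressions need not be bitwise equal once rounding and association order come into play, which is why "same code, compiled the same way" is doing all the work in the answer above.

# Rounded decimal fractions: 0.1 + 0.2 is not the double closest to 0.3.
print(0.1 + 0.2 == 0.3)      # False
print(0.1 + 0.2)             # 0.30000000000000004

# Reassociation changes the result, so a recompile (e.g. a JIT) that reorders
# or fuses operations can legally produce different bits for "the same" math.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # False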

Clojure - test for equality of function expression?

Suppose I have the following clojure functions:
(defn a [x] (* x x))
(def b (fn [x] (* x x)))
(def c (eval (read-string "(defn d [x] (* x x))")))
Is there a way to test for the equality of the function expression - some equivalent of
(eqls a b)
returns true?
It depends on precisely what you mean by "equality of the function expression".
These functions are going to end up as bytecode, so I could for example dump the bytecode corresponding to each function to a byte[] and then compare the two bytecode arrays.
However, there are many different ways of writing semantically equivalent methods, that wouldn't have the same representation in bytecode.
In general, it's impossible to tell what a piece of code does without running it. So it's impossible to tell whether two bits of code are equivalent without running both of them, on all possible inputs.
This is at least as bad, computationally speaking, as the halting problem, and possibly worse.
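To make the bytecode comparison above concrete, here is a rough analogue in Python rather than Clojure (the point carries over to JVM bytecode): textually identical definitions compile to the same bytecode, while a semantically equivalent rewrite generally does not, so bytecode comparison only catches near-syntactic duplicates.

def a(x): return x * x
b = lambda x: x * x          # same expression, different definition form
def c(x): return x ** 2      # the same mathematical function, different ops

print(a.__code__.co_code == b.__code__.co_code)   # usually True on CPython
print(a.__code__.co_code == c.__code__.co_code)   # False: '*' vs '**'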
The halting problem is undecidable as it is, so the general-case answer here is definitely no (and not just for Clojure but for every programming language).
I agree with the above answers that Clojure has no built-in ability to determine the equivalence of two functions, and that, due to the halting problem, you cannot determine equality by testing programs functionally (also known as black box testing) unless the input set is finite and defined.
I would like to point out that it is possible to algebraically determine the equivalence of two functions, even if they have different forms (different byte code).
The method for proving equivalence algebraically was developed in the 1930s by Alonzo Church and is known as beta reduction in the Lambda Calculus. This method is certainly applicable to the simple forms in your question (which would also yield the same byte code) and also to more complex forms that would yield different byte codes.
I cannot add to the excellent answers by others, but would like to offer another viewpoint that helped me. If you are e.g. testing that the correct function is returned from your own function, instead of comparing the function object you might get away with just returning the function as a 'symbol.
I know this probably is not what the author asked for but for simple cases it might do.

Function equality on restricted functions

I already posted a question about function equality. It quickly concluded that general function equality is an incredibly hard problem and might even be provably impossible.
I would like to stub up a function
function equal(f, g, domain) {
}
f & g are halting functions that take one argument. Their argument is a natural number. These functions return a boolean.
If no domain is passed then you may assume the domain defaults to all natural numbers.
The structure of domain is whatever is most convenient for the equal function.
Another important fact is that f & g are deterministic and will consistently return the same boolean m for a given input n.
You may assume that f and g always return and don't throw any exceptions or crash due to errors, as long as their input is within the domain.
The question is language-agnostic and asks for an implementation of the equal function. I'm not certain whether SO is the right place for this anymore.
f & g have no side effects, and the domain does not have to be finite.
It's still not possible.
You could test both functions for some finite number of inputs and check them for equality on those inputs. If they are unequal for any input then the two functions are not identical. If they are equal in every case you tested then there is a reasonable chance that they are the same function, but you can't be completely certain.
In general it is infeasible to test every possible input unless the domain is small. If the domain is a 32 bit integer and your function is quite fast to evaluate then it might be feasible to check every possible input.
I believe the following to be the best you can do without doing static analysis on the source code:
function equal(f, g, domain) {
  // Compare f and g pointwise on every value in the (finite) domain.
  var n;
  for (n in domain) {
    if (f(domain[n]) !== g(domain[n])) return false;
  }
  return true;
}
Note that this assumes the domain to be finite.
If the domain is not finite, Rice's theorem prevents such an algorithm from existing:
If we let f and g be the implementations and F and G be the mathematical functions these implementations calculate, then Rice's theorem says that it is impossible to determine whether f calculates G or g calculates F, as these are non-trivial properties of the implementations.
For further detail, see my answer to the previous question.
Depending on your use case, you might be able to make some assumptions about f & g. Maybe in your case they satisfy specific conditions that make the problem solvable.
In other cases, the only thing I might recommend is fuzz testing on the Abstract Syntax Tree or some other representation of the functions.

Repeated application of functions

Reading this question got me thinking: For a given function f, how can we know that a loop of this form:
while (x > 2)
    x = f(x)
will stop for any value x? Is there some simple criterion?
(The fact that f(x) < x for x > 2 doesn't seem to help, since the sequence may converge without ever dropping to 2 or below.)
Specifically, can we prove this for sqrt and for log?
For these functions, a proof that ceil(f(x)) < x for x > 2 would suffice. You could do one iteration to arrive at an integer, and then proceed by simple induction.
For the general case, probably the best idea is to use well-founded induction to prove this property. However, as Moron pointed out in the comments, this could be impossible in the general case and the right ordering is, in many cases, quite hard to find.
Edit, in reply to Amnon's comment:
If you wanted to use well-founded induction, you would have to define another strict order that is well-founded. In the case of the functions you mentioned, this is not hard: you can take x << y if and only if ceil(x) < ceil(y), where << is a symbol for this new order. This order is of course well-founded on numbers greater than 2, and both sqrt and log are decreasing with respect to it, so you can apply well-founded induction.
Of course, in general case such an order is much more difficult to find. This is also related, in some way, to total correctness assertions in Hoare logic, where you need to guarantee similar obligations on each loop construct.
There's a general theorem for when the sequence of iterations will converge. (A convergent sequence may not stop in a finite number of steps, but it is getting closer to a target. You can get as close to the target as you like by going far enough out in the sequence.)
The sequence x, f(x), f(f(x)), ... will converge if f is a contraction mapping. That is, there exists a positive constant k < 1 such that for all x and y, |f(x) - f(y)| <= k |x-y|.
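As a quick sanity check (not a proof) of that condition, take f = sqrt restricted to [1, ∞): its derivative 1/(2·sqrt(x)) never exceeds 1/2 there, so k = 1/2 works. A small Python sketch under that assumption:

import math, random

k = 0.5   # assumed contraction constant for sqrt on [1, inf)
for _ in range(100000):
    x, y = random.uniform(1, 1e6), random.uniform(1, 1e6)
    # |sqrt(x) - sqrt(y)| = |x - y| / (sqrt(x) + sqrt(y)) <= 0.5 * |x - y| here;
    # the tiny slack absorbs rounding in the sampled values.
    assert abs(math.sqrt(x) - math.sqrt(y)) <= k * abs(x - y) + 1e-9
print("no violation of |f(x) - f(y)| <= k*|x - y| found in 100000 samples")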
(The fact that f(x) < x for x > 2 doesn't seem to help, since the sequence may converge without ever dropping to 2 or below.)
If we're talking about floats here, that's not true. If, for all x > n, f(x) is strictly less than x, then x will reach n at some point (because there are only finitely many floating point values between any two numbers).
Of course this means you need to prove that f(x) is actually less than x in floating point arithmetic (proving it is less than x mathematically does not suffice, because f(x) = x may still hold with floats when the difference is too small to be representable).
There is no general algorithm to determine whether a function f and a variable x will end or not in that loop. The Halting problem is reducible to that problem.
For sqrt and log, we can safely do that because we happen to know the mathematical properties of those functions: sqrt approaches 1, and log eventually goes negative. So the loop condition x > 2 has to become false at some point.
Hope that helps.
In the general case, all that can be said is that the loop will terminate when it encounters an x_i <= 2. That doesn't mean that the sequence converges, nor does it even mean that the sequence is bounded below 2; it only means that the sequence contains a value that is not greater than 2.
That said, any sequence containing a subsequence that converges to a value strictly less than two will (eventually) halt. That is the case for the sequence x_{i+1} = sqrt(x_i), since it converges to 1. In the case of y_{i+1} = log(y_i), the sequence will contain a value less than 2 before log becomes undefined over the reals (it is well defined on the extended complex plane C*, but there I don't think it will in general converge, except at any stable points that may exist, i.e. where z = log(z)). Ultimately, what this means is that you need to perform some upfront analysis on the sequence to better understand its behavior.
The standard test for convergence of a sequence x_i to a point z is: given ε > 0, there is an n such that for all i > n, |x_i - z| < ε.
As an aside, consider the Mandelbrot set M. The test for whether a particular point c in C belongs to M is whether the sequence z_{i+1} = z_i^2 + c is unbounded, which occurs whenever some |z_i| > 2. Some of those sequences converge (such as for c = 0), but many do not (such as for c = -1).
Sure. For all positive numbers x, the following inequality holds:
log(x) <= x - 1
(this is a pretty basic result from real analysis; it suffices to observe that the second derivative of log is always negative for all positive x, so the function is concave down, and that x-1 is tangent to the function at x = 1). From this it follows essentially immediately that your while loop must terminate within the first ceil(x) - 2 steps -- though in actuality it terminates much, much faster than that.
A similar argument will establish your result for f(x) = sqrt(x); specifically, you can use the fact that:
sqrt(x) <= x/(2 sqrt(2)) + 1/sqrt(2)
for all positive x.
If you're asking whether this result holds for actual programs, instead of mathematically, the answer is a little bit more nuanced, but not much. Basically, many languages don't actually have hard accuracy requirements for the log function, so if your particular language implementation had an absolutely terrible math library this property might fail to hold. That said, it would need to be a really, really terrible library; this property will hold for any reasonable implementation of log.
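A quick empirical check of the above (a Python sketch relying only on math.log and math.sqrt): run the loop from a large starting value and count the iterations until x <= 2.

import math

def steps_until_leq_2(f, x):
    # Runs 'while x > 2: x = f(x)' and counts the iterations.
    steps = 0
    while x > 2:
        x = f(x)
        steps += 1
    return steps

print("log  from 1e6:", steps_until_leq_2(math.log, 1e6), "iterations")    # 3
print("sqrt from 1e6:", steps_until_leq_2(math.sqrt, 1e6), "iterations")   # 5

Both terminate after a handful of iterations, far fewer than the ceil(x) - 2 bound derived above.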
I suggest reading this Wikipedia entry, which provides useful pointers. Without additional knowledge about f, nothing can be said.