Should we compare floating point numbers for equality against a *relative* error? - language-agnostic

So far I've seen many posts dealing with equality of floating point numbers. The standard answer to a question like "how should we decide if x and y are equal?" is
abs(x - y) < epsilon
where epsilon is a fixed, small constant. This is because the "operands" x and y are often the results of some computation where a rounding error is involved, hence the standard equality operator == is not what we mean, and what we should really ask is whether x and y are close, not equal.
Now, I feel that if x is "almost equal" to y, then also x*10^20 should be "almost equal" to y*10^20, in the sense that the relative error should be the same (but "relative" to what?). But with these big numbers, the above test would fail, i.e. that solution does not "scale".
How would you deal with this issue? Should we rescale the numbers or rescale epsilon? How?
(Or is my intuition wrong?)
Here is a related question, but I don't like its accepted answer: the reinterpret_cast trick seems a bit tricky to me, and I don't understand what's going on. Please try to provide a simple test.

It all depends on the specific problem domain. Yes, using relative error will be more correct in the general case, but it can be significantly less efficient since it involves an extra floating-point division. If you know the approximate scale of the numbers in your problem, using an absolute error is acceptable.
This page outlines a number of techniques for comparing floats. It also goes over a number of important issues, such as those with subnormals, infinities, and NaNs. It's a great read, I highly recommend reading it all the way through.
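To make the distinction concrete, here is a minimal sketch of the two approaches in C++ (the function names and the choice to measure relative to y are mine, not from the answer or the linked page):

#include <cmath>

// Absolute tolerance: fine when the rough scale of the values is known up front.
bool absolutelyEqual(double x, double y, double absEps) {
    return std::fabs(x - y) <= absEps;
}

// Relative tolerance: the error is measured against the magnitude of one of the
// inputs, so the test keeps working for very large (or very small) values.
bool relativelyEqual(double x, double y, double relEps) {
    return std::fabs(x - y) <= relEps * std::fabs(y);   // "relative" to y here
}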

As an alternative solution, why not just round or truncate the numbers and then make a straight comparison? By setting the number of significant digits in advance, you can be certain of the accuracy within that bound.
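A rough sketch of what that could look like (C++; the function name, the use of the larger magnitude to fix the scale, and the zero handling are my illustrative choices, not part of the answer):

#include <algorithm>
#include <cmath>

// Round both values to sigDigits significant digits (relative to the larger
// magnitude) and compare the rounded results directly.
bool equalToSigDigits(double x, double y, int sigDigits) {
    double m = std::max(std::fabs(x), std::fabs(y));
    if (m == 0.0) return true;   // both values are exactly zero
    double scale = std::pow(10.0, std::floor(std::log10(m)) - (sigDigits - 1));
    return std::round(x / scale) == std::round(y / scale);
}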

The problem is that with very big numbers, comparing to epsilon will fail.
Perhaps a better (but slower) solution would be to use division, for example:
div(max(a, b), min(a, b)) < eps + 1
Now the 'error' will be relative.
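Written out as a function, the ratio test above might look like this (a sketch that assumes a and b are both strictly positive; zero and mixed signs would need separate handling):

#include <algorithm>

bool ratioEqual(double a, double b, double eps) {
    double hi = std::max(a, b);
    double lo = std::min(a, b);
    return hi / lo < 1.0 + eps;
}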

Using relative error is at least not as bad as using absolute errors, but it has subtle problems for values near zero due to rounding issues. A far from perfect, but somewhat robust algorithm combines absolute and relative error approaches:
#include <algorithm>
#include <cmath>

bool approxEqual(float a, float b, float absEps, float relEps) {
    // Absolute error check, needed when comparing numbers near zero.
    float diff = std::fabs(a - b);
    if (diff <= absEps) {
        return true;
    }
    // Symmetric relative error check without division.
    return diff <= relEps * std::max(std::fabs(a), std::fabs(b));
}
I adapted this code from Bruce Dawson's excellent article Comparing Floating Point Numbers, 2012 Edition, a required read for anyone doing floating-point comparisons -- an amazingly complex topic with many pitfalls.

Most of the time when code compares values, it is doing so to answer some sort of question. For example:
If I know what a function returned when given a value of X, can I assume it will return the same thing if given Y?
If I have a slow but accurate method of computing a function, am willing to accept some inaccuracy in exchange for speed, and want to test a candidate function that seems to fit the bill, are the outputs from that candidate close enough to those of the known-accurate method to be considered "correct"?
To answer the first question, code should ideally do a bit-wise comparison on the value, though unless a language supports the new comparison operations added to IEEE 754 in 2008, that may be less efficient than ideal. To answer the second question, one should define what degree of accuracy is required and test against that.
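For illustration, one way to do such a bit-wise comparison in C++ (a sketch assuming 64-bit IEEE-754 doubles; the function name is mine):

#include <cstdint>
#include <cstring>

// Two values count as "the same input" only if their bit patterns match,
// so +0.0 and -0.0 differ, while two NaNs with identical payloads compare equal.
bool bitwiseEqual(double x, double y) {
    std::uint64_t bx, by;
    std::memcpy(&bx, &x, sizeof bx);
    std::memcpy(&by, &y, sizeof by);
    return bx == by;
}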
I don't think there's much merit in a general-purpose method which regards as equal things which are close, since different applications will have differing requirements for both absolute and relative tolerance, based upon what exact questions the tests are supposed to answer.

Related

In CUDA, is there a way to ensure the consistency of FP maths in the same program?

Is there a way to ensure that:
if a==b then devfun(a)==devfun(b);
where devfun() is a device function that involves some floating-point maths ops (e.g. polynomials) and returns floating-point results, and a and b are floating-point variables.
I don't care about cross-implementation consistency (e.g. different compilers, different OSes, different driver versions, or different compiler options). I only care about the following: within the same build/program, at runtime, can it be ensured that the results returned by devfun() are consistent across calls, in the sense that as long as a==b, devfun(a)==devfun(b)?
I am talking about SM2.0+ hardware and CUDA 5.0+, just in case being relevant.
Let's assume that your numbers a and b represent properly normalized IEEE-754 floating point numbers and that neither a nor b is a NaN value. Let's also assume a and b are both 32-bit, or else a and b are both 64-bit (IEEE-754 floating point representations).
In that case, I believe the (ISO C/C++, or CUDA C/C++) floating point test for equality (==) will return TRUE when the two numbers a and b are bitwise identical (and FALSE otherwise).
In the TRUE case, with one exception, I believe it is safe to assume that devfun(a) == devfun(b) without any additional conditions except the obvious ones: there is no difference in the behavior of devfun on either side of the == operation, that is, it's the same code, compiled in the same way, executed under the same conditions (e.g. other variables that may be taking part in devfun, same GPU type, etc.), just as you've indicated in your question: "same build/program".
The one exception is if the result of devfun(a) is NaN, since (IEEE-754) NaN != NaN.
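A tiny illustration of that exception (plain C++ host code; sqrt(-1.0) merely stands in for a devfun() call that happens to produce a NaN):

#include <cmath>
#include <cstdio>

int main() {
    // IEEE-754 defines NaN != NaN, so the equality fails even though
    // the inputs that produced it were bitwise identical.
    double r = std::sqrt(-1.0);
    std::printf("%d\n", r == r);          // prints 0
    std::printf("%d\n", std::isnan(r));   // prints 1
}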
It would be interesting (to me) if you think you have a piece of code that disproves this assertion.
Perhaps floating point ninjas will come along and correct me.
Perhaps also I would be remiss if I did not say something about the hazards of floating point comparisons. If you're not familiar with this (most folks would never recommend performing a test a==b on two floating point numbers) you can find many questions about it on SO.
For the same reasons that floating point equality comparison (==) in general is unwise, I think relying on the above assertion, even if it's true, is unwise. Let me give you one example.
Suppose you compile code for architecture sm_20. Now you run the code on an sm_21 device. This one simple variation could result in a JIT-compile at runtime. Now you are no longer running the same code, and all bets are off.
So, again, even if the above is true, I think it's unwise for you to rely on such a statement:
if a==b, then devfun(a) == devfun(b)

numerical issues causing the difference in outputs of two programs?

I have two programs that theoretically should return exactly the same output. However, this does not happen. The issue is that the two programs handle very small numbers (doubles), on the order of 1e-100 or so. I suspect that there could be some numerical issues related to that which lead to the two outputs being different, even though they should theoretically be the same.
Does it indeed make sense that handling numbers on the order of 1e-100 causes such problems? I don't mind the difference in output if I can safely assume that the source is numerical issues. Does anyone have a good source/reference that talks about issues that come up with the stability of algorithms when they handle numbers of such order?
Thanks.
Does anyone have a good source/reference that talks about issues that come up with the stability of algorithms when they handle numbers of such order?
The first reference that comes to mind is What Every Computer Scientist Should Know About Floating-Point Arithmetic. It covers floating-point maths in general.
As far as numerical stability is concerned, the best references probably depend on the numerical algorithm in question. Two wide-ranging works that come to mind are:
Numerical Recipes by Press et al;
Matrix Computations by Golub and Van Loan.
It is not necessarily the small numbers that are causing the problem.
How do you check whether the outputs are the "exact same"?
I would check equality with tolerance. You may consider the floating point numbers x and y equal if either fabs(x-y) < 1.0e-6 or fabs(x-y) < fabs(x)*1.0e-6 holds.
Usually, there is a HUGE difference between the two algorithms if there are numerical issues. Often, a small change in the input may result in extreme changes in the output, if the algorithm suffers from numerical issues.
What makes you think that there are "numerical issues"?
If possible, change your algorithm to use Kahan Summation (aka compensated summation). From Wikipedia:
function KahanSum(input)
    var sum = 0.0
    var c = 0.0                // A running compensation for lost low-order bits.
    for i = 1 to input.length do
        y = input[i] - c       // So far, so good: c is zero.
        t = sum + y            // Alas, sum is big, y small, so low-order digits of y are lost.
        c = (t - sum) - y      // (t - sum) recovers the high-order part of y; subtracting y recovers -(low part of y)
        sum = t                // Algebraically, c should always be zero. Beware eagerly optimising compilers!
                               // Next time around, the lost low part will be added to y in a fresh attempt.
    return sum
This works by keeping a second running total of the cumulative error, similar to the Bresenham line drawing algorithm. The end result is that you get precision that is nearly double the data type's advertised precision.
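For concreteness, a minimal C++ translation of the pseudocode above (my wording of it; compile without aggressive math optimizations such as -ffast-math, or the compensation may be optimized away):

#include <vector>

double kahanSum(const std::vector<double>& input) {
    double sum = 0.0;
    double c = 0.0;           // running compensation for lost low-order bits
    for (double x : input) {
        double y = x - c;     // apply the correction carried over from the last step
        double t = sum + y;   // big sum + small y: the low-order digits of y are lost here
        c = (t - sum) - y;    // (t - sum) recovers the high-order part of y
        sum = t;
    }
    return sum;
}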
Another technique I use is to sort my numbers from small to large (by magnitude, ignoring sign) and add or subtract the small numbers first, then the larger ones. This has the virtue that if you add and subtract the same value multiple times, such numbers may cancel exactly and can be removed from the list.
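A sketch of that idea (C++; the function name and the copy-and-sort approach are my illustrative choices):

#include <algorithm>
#include <cmath>
#include <vector>

// Sort by magnitude (ignoring sign) and add the small terms first,
// so they are not swamped by the large ones.
double sortedSum(std::vector<double> values) {
    std::sort(values.begin(), values.end(),
              [](double a, double b) { return std::fabs(a) < std::fabs(b); });
    double sum = 0.0;
    for (double v : values) sum += v;
    return sum;
}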

When to use sentinel values?

I recently had to use a GPS location API where each location object had among other things two properties altitude and verticalAccuracy. A negative verticalAccuracy signifies that altitude is invalid, whereas normally a smaller but positive value of verticalAccuracy actually means that altitude is more precise (since it's the vertical distance that it may be off by - I'll leave the discussion as to why this measure is called verticalAccuracy and not verticalInaccuracy for some other time).
This got me thinking: When is it a good idea to use sentinel values like this API does and when would it be better to explicitly make a separate hasValidAltitude property? Are there other options?
Sometimes, sentinel answers aren't really possible; maybe every value of the return type is a legitimate result, leaving nothing free to serve as a sentinel. This isn't the case with altitude, unless you allow negative altitudes (maybe in the future, there will be underwater cities). For instance, maybe we're talking about the intersection between lines (not a great example, since floating-point types have a few built-in sentinels like +INF and NaN) or the precise integer quotient (without rounding, this is not guaranteed to exist... 7 and 3, for instance... here, the remainder after division can be viewed as either a sentinel or an "exact integer quotient exists" property). More generally, any reliable sentinel can be trivially used to construct a property-based mechanism.
Based on this, I'd recommend avoiding sentinels wherever this is possible and makes sense. My reasoning is that they are an internal implementation detail of the module, and should be encapsulated behind an information-hiding interface.
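For what it's worth, a sketch of what the non-sentinel alternative could look like in C++ (the Location struct and its field names are mine, purely for illustration):

#include <optional>

struct Location {
    double latitude;
    double longitude;
    // No sentinel: an absent altitude is represented explicitly, and
    // verticalAccuracy is only meaningful when altitude is present.
    std::optional<double> altitude;
    std::optional<double> verticalAccuracy;
};

bool hasValidAltitude(const Location& loc) {
    return loc.altitude.has_value();
}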

Can coordinates of constructable points be represented exactly?

I'd like to write a program that lets users draw points, lines, and circles as though with a straightedge and compass. Then I want to be able to answer the question, "are these three points collinear?" To answer correctly, I need to avoid rounding error when calculating the points.
Is this possible? How can I represent the points in memory?
(I looked into some unusual numeric libraries, but I didn't find anything that claimed to offer both exact arithmetic and exact comparisons that are guaranteed to terminate.)
Yes.
I highly recommend Introduction to constructions, which is a good basic guide.
Basically you need to be able to compute with constructible numbers - numbers that are either rational, or of the form a + b sqrt(c) where a, b, c were constructed previously (see page 6 of that PDF). This could be done with an algebraic data type (e.g. data C = Rational Integer Integer | Root C C C in Haskell, where Root a b c = a + b sqrt(c)). However, I don't know how to perform tests with that representation.
Two possible approaches are:
Constructible numbers are a subset of algebraic numbers, so you can use algebraic numbers.
All algebraic numbers can be represented by polynomials of which they are roots. The operations are computable, so if you represent a number a with polynomial p and b with polynomial q (p(a) = q(b) = 0), then it is possible to find a polynomial r such that r(a+b) = 0. This is done in some CASes, such as Mathematica. See also: Computational algebraic number theory - chapter 4.
Use Tarski's test and represent numbers by formulas. It is slow (doubly exponential or so), but it works :) For example, to represent sqrt(2), use the formula x^2 - 2 = 0 && x > 0. You can write equations for lines that way, check whether points are collinear, etc. See A suite of logic programs, including Tarski's test.
If you move to general computable numbers, then equality, collinearity, etc. become undecidable.
I think the only way this would be possible is if you used a symbolic representation, as opposed to trying to represent coordinate values directly -- so you would have to avoid trying to coerce values like sqrt(2) into some numerical format. You will be dealing with irrational numbers that are not finitely representable in binary, decimal, or any other positional notation.
To expand on Jim Lewis's answer slightly, if you want to operate on points that are constructible from the integers with exact arithmetic, you will need to be able to operate on representations of the form:
a + b sqrt(c)
where a, b, and c are either rational numbers, or representations in the form given above. Wikipedia has a pretty decent article on the subject of what points are constructible.
Answering the question of exact equality (as necessary to establish colinearity) with such representations is a rather tricky problem.
If you try to compare co-ordinates for your points, then you have a problem. Leaving aside co-linearity for a moment, how about just working out whether two points are the same or not?
Supposing that one point has given co-ordinates, and the other is a compass-and-straightedge construction starting from certain other co-ordinates, you want to determine with certainty whether they're the same point or not. Either answer is a theorem of Euclidean geometry; it's not something you can just measure. You can prove they aren't the same by spotting some difference in their co-ordinates (for example, by computing decimal places of each until you encounter a difference). But in general, proving they are the same cannot be done by approximate methods. Compute as many decimal places as you like of expansions of 1/sqrt(2) and sqrt(2)/2, and you can prove they're very close together, but you won't ever prove they're equal. That takes algebra (or geometry).
Similarly, to show that three points are co-linear you will need theorem-proving software. Represent the points A, B, C by their constructions, and attempt to prove the theorem "A, B and C are colinear". This is very hard - your program will prove some theorems but not others. Much easier is to ask the user for a proof that they are co-linear, and then verify (or refute) that proof, but that's probably not what you want.
In general, constructible points may have an arbitrarily complex symbolic form, so you must use a symbolic representation to work with them exactly. As Stephen Canon noted above, you often need numbers of the form a + b*sqrt(c), where a and b are rational and c is an integer. Numbers of this form make up a set that is closed under arithmetic operations. I have written some C++ classes (see rational_radical1.h) to work with these numbers if that is all you need.
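To make the single-radicand case concrete, here is a toy sketch in C++ (it is not the rational_radical1.h classes mentioned above: it uses plain integer coefficients instead of rationals and a hard-coded radicand, just to keep the example short):

// Numbers of the form a + b*sqrt(C) for one fixed non-square integer C.
// With exact coefficients, equality reduces to coefficient equality,
// because sqrt(C) is irrational.
struct QuadNum {
    static constexpr long long C = 2;   // the fixed radicand
    long long a, b;                     // value = a + b*sqrt(C)

    QuadNum operator+(QuadNum o) const { return {a + o.a, b + o.b}; }
    QuadNum operator*(QuadNum o) const {
        // (a + b*sqrt(C)) * (a' + b'*sqrt(C)) = (a*a' + b*b'*C) + (a*b' + a'*b)*sqrt(C)
        return {a * o.a + b * o.b * C, a * o.b + b * o.a};
    }
    bool operator==(QuadNum o) const { return a == o.a && b == o.b; }
};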
It is also possible to construct numbers which are sums of any number of terms of rational multiples of radicals. When dealing with more than a single radicand, the numbers are no longer closed under multiplication and division, so you will need to store them as variable length rational coefficient arrays. The time complexity of operations will then be quadratic in the number of terms.
To go even further, you can construct the square root of any given number, so you could potentially have nested square roots. Here, the representations must be tree-like structures to deal with root hierarchy. While difficult to implement, there is nothing in principle preventing you from working with these representations. I'm not sure just what additional numbers can be constructed, but beyond a certain point, your symbolic representation will be expressive enough to handle very large classes of numbers.
Addendum
Found this Google Books link.
If the grid axes are integer-valued, then the answer is fairly straightforward: the points are either exactly collinear or they are not.
Typically, however, one works with real numbers (well, floating point) and then draws the rounded values on the screen, which does exist in integer space. In this case you have no choice but to pick a tolerance and use it to determine collinearity. Keep it small and the users will never know the difference.
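A sketch of such a tolerance-based test (the struct, the function name, and the eps value are illustrative choices of mine, not prescribed by the answer):

#include <cmath>

struct Pt { double x, y; };

// Treat the points as collinear when the cross product of AB and AC is (nearly) zero.
bool roughlyCollinear(Pt a, Pt b, Pt c, double eps = 1e-9) {
    double cross = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
    return std::fabs(cross) <= eps;
}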
You seem to be asking, in effect, "Can the normal mathematics (integer or floating point) used by computers be made to represent real numbers perfectly, with no rounding errors?" And, of course, the answer to that is "No." If you want theoretical correctness, then you will be stuck with the much harder problem of symbolic manipulation and coding up the equivalent of the inferences that are done in geometry. (In short, I'm agreeing with Steve Jessop, above.)
Some thoughts in the hope that they might help.
The sort of constructions you're talking about will require multiplication and division, which means that to preserve exactness you'll have to use rational numbers, which are generally easy to implement on top of a suitable sort of big integer (i.e., of unbounded magnitude). (Common Lisp has these built-in, and there have to be other languages.)
Now, you need to represent square roots of arbitrary numbers, and these have to be mixed in.
Therefore, a number is one of: a rational number, a rational number multiplied by a square root of a rational number (or, alternately, just the square root of a rational), or a sum of numbers. In order to prove anything, you're going to have to get these numbers into some sort of canonical form, which for all I can figure offhand may be annoying and computationally expensive.
This of course means that the users will be restricted to rational points and cannot use arbitrary rotations, but that's probably not important.
I would recommend not trying to make it perfectly exact.
The first reason for this is what you are asking here, the rounding error and all that stuff that comes with floating point calculations.
The second one is that you have to round your input as the mouse and screen work with integers. So, initially all user input would be integers, and your output would be integers.
Besides, from a usability point of view, it's easier to click in the neighborhood of another point (on a line, for example) and have the interface consider that you are clicking on the point itself.

Upper limit of bugs in a given program

Is there an upper limit to the number of bugs contained in a given program? If the number of instructions is known, could one say the program cannot contain more than 'n' bugs? For example, how many bugs could the following function contain?
double calcInterest(double amount) {
return -O.07 / amount;
}
A parser would count four terms in the function, and I could count these errors:
wrong number syntax
wrong interest rate (business requirements error)
wrong calculation (should be multiply)
Potential divide by zero
Clearly the number of bugs is not infinite given a finite number of instructions. Alternatively, one could say the function accepts 2^64 inputs, and of those, how many produce the correct output. However, is there any way to formally prove an upper limit?
If bug is "a requirement not met by the program", then there is no limit on the number of bugs (per line or otherwise), since there is no limit on the number of requirements.
print "hello world"
Might contain a million bugs. It doesn't create a pink elephant. I leave it to the reader to come up with 999999 other requirements not satisfied by this program.
The number of instructions has nothing to do with whether the program does what the user wants it to do. I mean, look at how poorly GCC does balancing my checkbook. Buggy as all get out, downright useless!
This would all depend on how you define a 'bug'.
If you define a program as a function from some input to some output, and a specification as a definition of that function, and a bug as any difference in output from the specification on a given input, then yes, you can conceivably have countably infinite bugs - however this is a somewhat useless definition of a bug.
The upper limit is the number of states your program can be in. Since this number is finite on real machines you could number the states from 1 to n. For each state you could label if this state is a bug or not. So yes, but even a small program having 16 bytes of memory has 2^128 states and the problem of analyzing all the different states is intractable.
There is a theoretical upper limit for bugs, but for all but the most trivial programs it is very nearly impossible to calculate, although engines such as Pex do give it the old college try.
Law of programming:
"If You will find all compile-time bugs, then n logical ones are still hidden, waiting to surprise You at run-time."
Depends on how you count bugs, which leads me to say "nope, no limit." I don't know about you, but I can easily write several bugs in the same line of code. For instance, how many bugs are in this Java code? :-P
public int addTwoNumbers(int x, String y)
{{
z == x + y;
return y;
}
As few as one, if the bug is significant enough.