Bitwise comparison for 16 bitstrings - binary

I have 16 unrelated binary strings (of the same length), e.g. 100000001010, 010100010010, and so on, and I need to compute a bitstring in which position x is 1 if position x is 1 in at least 2 of the 16 bitstrings.
Initially, I tried using bitwise XOR, and this works great as long as an even number of strings contain a 1, but when an odd number of strings contain a 1, the answer is inverted.
A simple example (with 3 strings) would be:
A: 10101010
B: 01010111
C: 11011011
f(A, B, C) = answer
Expected answer: 11011011
Answer I'm getting right now: 11011001
I know I'm wrong somewhere, but I'm at a loss as to how to proceed. Help much appreciated.

You can do something like

unsigned once = x[0], twice = 0;  // bits seen at least once / at least twice
for (int i = 1; i < 16; ++i) {
    twice |= once & x[i];  // bits already seen before that appear again
    once  |= x[i];         // accumulate bits seen so far
}

After the loop, twice has a 1 in exactly those positions that are set in at least two of the 16 values.

For the three-string case, the majority function
(A AND B) OR (A AND C) OR (B AND C)
has a 1 exactly where at least two of A, B, C have a 1. This is higher complexity than the XOR you had originally, but it gives the correct result.
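As a minimal sketch (the variable names and the printf check are mine, not from the answers above), here is that three-input majority expression applied to the example from the question:

#include <stdio.h>

int main(void) {
    unsigned a = 0xAA; /* 10101010 */
    unsigned b = 0x57; /* 01010111 */
    unsigned c = 0xDB; /* 11011011 */

    /* majority of three: a 1 wherever at least two inputs have a 1 */
    unsigned maj = (a & b) | (a & c) | (b & c);

    printf("%02X\n", maj); /* prints DB, i.e. 11011011, as expected */
    return 0;
}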

How to look at a certain bit in C programming?

I'm having trouble trying to find a function to look at a certain bit. If, for example, I had a binary number of 1111 1111 1111 1011, and I wanted to look at just the most significant bit (the bit all the way to the left, in this case 1), what function could I use to look at that bit?
The program is to test whether a binary number is positive or negative. I started off by using the hex number 0x0005, and then using a two's complement function to make it negative. But now I need a way to check whether the most significant bit is 1 or 0 and to return a value based on that. The integer n would be set to 1 or 0 depending on whether the number is negative or positive. My code is as follows:
#include <msp430.h>

signed long x = 0x0005;
int y, i, n;

void main(void)
{
    y = ~x;     // one's complement of x
    i = y + 1;  // plus one: two's complement, so i == -x
}
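A minimal sketch of the missing sign test (the helper name sign_bit is mine, and it assumes a 16-bit int, as on the MSP430):

/* Returns 1 if v is negative (bit 15 set), 0 otherwise. */
int sign_bit(int v)
{
    return ((unsigned int)v >> 15) & 1;  /* unsigned shift avoids implementation-defined behavior */
}

In main, after i = y + 1; you could then set n = sign_bit(i); which yields 1 here, since i == -5. (Portably, n = (i < 0); works regardless of the integer width.)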
There are two main ways I have done something like this in the past. The first is a bit mask, which you would use if you are always checking the exact same bit(s). For example:

#define MASK 0x80000000
// A return value of 0 means the bit wasn't set; nonzero means it was.
// You can check as many bits as you want with this call.
int ApplyMask(int number) {
    return number & MASK;
}
Second is a bit shift, then a mask (for checking an arbitrary bit):

// Returns 1 if the bit at bitIndex is set, 0 otherwise.
int CheckBit(int number, int bitIndex) {
    return (number & (1u << bitIndex)) != 0;
}
One or the other of these should do what you are looking for. Best of luck!
// Needs <assert.h> and <stdbool.h>.
bool isSetBit(signed long number, int bit)
{
    assert((bit >= 0) && (bit < (int)(sizeof(signed long) * 8)));
    return (number & (1UL << bit)) != 0;  // unsigned constant avoids undefined behavior on the sign bit
}

To check the sign bit:

if (isSetBit(y, sizeof(y) * 8 - 1))
    ...

Exponent Binary Numbers

Could someone tell me the logic behind exponentiating binary numbers? For example, I want to compute 110^10 (all in binary), but I don't know the logic behind it. If someone could supply me with that, it'd be a great help. (And I want it done in pure binary, with no conversions and no looping multiplication, just logic...)
peenut is correct in that exponentiation doesn't care what base you're representing your numbers in, and I don't know what you mean by "just logic," but here's a stab at it.
A quick search over at Wikipedia reveals the exponentiation-by-squaring algorithm. The basic idea is to square your base, store the result, then square the result and repeat. This gives you the factors of your answer (the base raised to successive powers of two), which you can then multiply together. I think of it as a "binary search"-flavored exponentiation algorithm, since you can skip a lot of intermediate steps by squaring and storing.
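A minimal sketch of that square-and-store idea in C (the function name pow_by_squaring and the unsigned long long types are my choices, not from the answer):

#include <stdio.h>

/* Computes base^exp by squaring the base repeatedly and multiplying in
   the factor base^(2^k) for each set bit k of the exponent. */
unsigned long long pow_by_squaring(unsigned long long base, unsigned exp)
{
    unsigned long long result = 1;
    unsigned long long factor = base;  /* base^(2^0), then base^(2^1), ... */
    while (exp > 0) {
        if (exp & 1)          /* this bit of the exponent is set */
            result *= factor;
        factor *= factor;     /* square: base^(2^k) -> base^(2^(k+1)) */
        exp >>= 1;
    }
    return result;
}

int main(void) {
    printf("%llu\n", pow_by_squaring(6, 2));  /* prints 36, i.e. 110^10 in binary */
    return 0;
}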
Binary exponents are very easy. They are simply additions and shifts only.
The number 110 is where you start.
Work backwards through the bits of 10, starting with the low bit (it's 0): a zero means "do not add it in."
Now you shift left, so 110 becomes 1100.
Now you work on the next bit of the 10 (i.e. 1): it's a one, so this means "add this to the result." The result is 0 so far, because we didn't already add anything, so the result is now 1100.
There are no more bits to do, so the answer is 1100.
If you were doing 110^110 - you would have one more to do - so - you again shift and get 11000 now.
The last bit is again a one, so now you add:
1100 +
11000 =
100100
110^10=1100 i.e. 6^2=12
110^110=100100 i.e. 6^6=36
Exponentiation is an operation that is independent of the textual representation of a number (e.g. base 2, binary; base 10, decimal).
Maybe you meant to ask about the binary XOR (exclusive or) operation, which ^ often denotes?
Unfortunately the easiest way for your computer to handle simple exponents is your "looping multiplication" (the naïve approach), which is the most rudimentary (and literal) way of handling it. As @user1561358 commented, it is NOT just binary adds and shifts; that is multiplication. To raise 6^6 (110^110 in binary) the naïve approach has you multiplying the base n times (as below):
110
x 110
--------------
100100 = 36
x 110
--------------
11011000 = 216
x 110
--------------
10100010000 = 1296
x 110
--------------
1111001100000 = 7776
x 110
--------------
01011011001000000 = 46656
A simple recursive implementation of binary exponentiation (exponentiation by squaring) is elegant enough for most applications:

long long binpow(long long a, long long b) {
    if (b == 0)
        return 1;
    long long res = binpow(a, b / 2);  // a^(b/2)
    if (b % 2)
        return res * res * a;          // odd exponent
    else
        return res * res;              // even exponent
}
For larger or arbitrary exponents you can dramatically reduce the number of calculations by applying Horner's Method, explained in great detail in this video specifically calculating binary exponents.
In essence, you are just multiplying in the factors for the non-zero exponent bits. Let's look at 110^110 in binary (or 6^6):
110^110 breaks down into the following exponents:
There is no "1" bit set, so 6^1 won't be multiplied in, but we do have the two and four bits to calculate:
6^10 (binary) = 6^2 = 36
6^100 (binary) = 6^4 = 1296
So, 6^6 = 36 x 1296 = 46656
The above code can be modified only slightly to check for non-zero exponent bits with a while loop:
long long binpow(long long a, long long b) {
    long long res = 1;
    while (b > 0) {
        if (b & 1)
            res = res * a;  // multiply in this power-of-two factor
        a = a * a;          // square for the next bit
        b >>= 1;
    }
    return res;
}
To really see the advantage of this, let's try the binary exponentiation of 111^100000000 in binary, which is 7^256.
The naïve approach would require us to make 256 multiplication iterations!
Instead, all the exponent bits except the one for 2^8 (= 256) are zero, so they are skipped in the while loop; only 8 squarings (a = a * a) are needed to reach a^256:
7^256 = (a 719-digit binary number beginning with 11001101011...)
      = 2213595400046048155450188615474945937162517050260073069916366390524704974007989996848003433837940380782794455262312607598867363425940560014856027866381946458951205837379116473663246733509680721264246243189632348313601

Addition as binary operations

I'm adding a pair of unsigned 32-bit binary integers (including overflow). The addition is expressed symbolically rather than actually computed, so there's no need for an efficient algorithm, but since each component is manually specified in terms of individual bits, I need one with a compact representation. Any suggestions?
Edit: In terms of boolean operators. So I'm thinking that carry = a & b; sum = a ^ b; for the first bit, but the other 31?
Oh, and subtraction!
You cannot perform addition with a single application of simple boolean operators; you need an adder. (Of course the adder itself can be built from boolean operators.)
The adder adds two bits plus a carry, and passes the carry out to the next bit.
Pseudocode:
carry = 0
for i = 31 to 0
    sum = a[i] + b[i] + carry
    result[i] = sum & 1
    carry = sum >> 1
next i
Here is an implementation using the macro language of VEDIT text editor.
The two numbers to be added are given as ASCII strings, one on each line.
The results are inserted on the third line.
Reg_Empty(10)                // result as ASCII string
#0 = 0                       // carry bit
for (#9=31; #9>=0; #9--) {
    #1 = CC(#9)-'0'          // a bit from first number
    #2 = CC(#9+34)-'0'       // a bit from second number
    #3 = #0+#1+#2            // add with carry
    #4 = #3 & 1              // resulting bit
    #0 = #3 >> 1             // new carry
    Num_Str(#4, 11, LEFT)    // convert bit to ASCII
    Reg_Set(10, #11, INSERT) // insert bit at start of string
}
Line(2)
Reg_Ins(10) IN
Return
Example input and output:
00010011011111110101000111100001
00110110111010101100101101110111
01001010011010100001110101011000
Edit:
Here is pseudocode where the adder has been implemented with boolean operations:
carry = 0
for i = 31 to 0
    sum[i] = a[i] ^ b[i] ^ carry
    carry = (a[i] & b[i]) | (a[i] & carry) | (b[i] & carry)
next i
Perhaps you can begin by stating addition for two 1-bit numbers, with overflow (=carry):
A | B | SUM | CARRY
===================
0 0 0 0
0 1 1 0
1 0 1 0
1 1 0 1
To generalize this further, you need a "full adder" which also takes a carry as an input, from the preceding stage. Then you can express the 32-bit addition as a chain of 32 such full adders (with the first stage's carry input tied to 0).
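As a sketch of that chain in C (the function name add32 and the loop-over-bits formulation are mine; the per-bit logic is exactly the full adder described above):

#include <stdint.h>
#include <stdio.h>

/* 32-bit addition built only from boolean operators: a ripple-carry
   chain of 32 full adders. The final carry out is discarded, which
   gives unsigned wrap-around (overflow) semantics. */
uint32_t add32(uint32_t a, uint32_t b)
{
    uint32_t result = 0, carry = 0;
    for (int i = 0; i < 32; i++) {
        uint32_t ai = (a >> i) & 1u;
        uint32_t bi = (b >> i) & 1u;
        uint32_t sum = ai ^ bi ^ carry;                   /* full-adder sum bit   */
        carry = (ai & bi) | (ai & carry) | (bi & carry);  /* full-adder carry out */
        result |= sum << i;
    }
    return result;
}

int main(void) {
    printf("%u\n", (unsigned)add32(3000000000u, 2000000000u));  /* 705032704: wraps mod 2^32 */
    return 0;
}

Subtraction, as asked about in the edit, then falls out of the two's complement: a - b == add32(a, add32(~b, 1)).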
Regarding the data structure part, there are 4 common ways to represent these numbers:
1) Bit Array
A bit array is an array data structure that compactly stores individual bits.
They are also known as bitmap, bitset or bitstring.
2) Bit Field
A bit field is a common idiom used in computer programming to compactly store multiple logical values as a short series of bits where each of the single bits can be addressed separately.
3) Bit Plane
A bit plane of a digital discrete signal (such as image or sound) is a set of bits corresponding to a given bit position in each of the binary numbers representing the signal.
4) Bit Board
A bitboard or bit field is a format that stuffs a whole group of related boolean variables into the same integer, typically representing positions on a board game.
Regarding implementation, you can check that at each step we have the following:

S = a xor b xor c

where S is the sum bit of the current bits a and b, and c is the input carry. The output carry is:

Cout = (a & b) xor (c & (a xor b))

How should I do floating point comparison?

I'm currently writing some code where I have something along the lines of:
double a = SomeCalculation1();
double b = SomeCalculation2();
if (a < b)
    DoSomething2();
else if (a > b)
    DoSomething3();
And then in other places I may need to do equality:
double a = SomeCalculation3();
double b = SomeCalculation4();
if (a != 0.0)
    DoSomethingUseful(1 / a);
if (b == 0.0)
    return 0; // or something else here
In short, I have lots of floating point math going on and I need to do various comparisons for conditions. I can't convert it to integer math because such a thing is meaningless in this context.
I've read before that floating point comparisons can be unreliable, since you can have things like this going on:
double a = 1.0 / 3.0;
double b = a + a + a;
if ((3 * a) != b)
    Console.WriteLine("Oh no!");
In short, I'd like to know: How can I reliably compare floating point numbers (less than, greater than, equality)?
The number range I am using is roughly from 10E-14 to 10E6, so I do need to work with small numbers as well as large.
I've tagged this as language agnostic because I'm interested in how I can accomplish this no matter what language I'm using.
TL;DR
Use the following function instead of the currently accepted solution to avoid some undesirable results in certain limit cases, while being potentially more efficient.
Know the expected imprecision of your numbers and feed it accordingly into the comparison function.
// needs <algorithm>, <cassert>, <cfloat>, <cmath>, <limits>
bool nearly_equal(
    float a, float b,
    float epsilon = 128 * FLT_EPSILON, float abs_th = FLT_MIN)
    // those defaults are arbitrary and could be removed
{
    assert(std::numeric_limits<float>::epsilon() <= epsilon);
    assert(epsilon < 1.f);

    if (a == b) return true;

    auto diff = std::abs(a - b);
    auto norm = std::min(std::abs(a) + std::abs(b),
                         std::numeric_limits<float>::max());
    // or even faster: std::min(std::abs(a + b), std::numeric_limits<float>::max());
    // keeping this commented out until I update figures below
    return diff < std::max(abs_th, epsilon * norm);
}
Graphics, please?
When comparing floating point numbers, there are two "modes".
The first one is the relative mode, where the difference between x and y is considered relative to their amplitude |x| + |y|. When plotted in 2D, it gives the following profile, where green means equality of x and y. (I took an epsilon of 0.5 for illustration purposes.)
The relative mode is what is used for "normal" or "large enough" floating point values. (More on that later.)
The second one is the absolute mode, where we simply compare their difference to a fixed number. It gives the following profile (again with an epsilon of 0.5 and an abs_th of 1 for illustration).
This absolute mode of comparison is what is used for "tiny" floating point values.
Now the question is: how do we stitch those two response patterns together?
In Michael Borgwardt's answer, the switch is based on the value of diff, which should be below abs_th (Float.MIN_NORMAL in his answer). This switch zone is shown as hatched in the graph below.
Because abs_th * epsilon is smaller than abs_th, the green patches do not stick together, which in turn gives the solution a bad property: we can find triplets of numbers such that x < y1 < y2 and yet x == y2 but x != y1.
Take this striking example:
x = 4.9303807e-32
y1 = 4.930381e-32
y2 = 4.9309825e-32
We have x < y1 < y2, and in fact y2 - x is more than 2000 times larger than y1 - x. And yet with the current solution,
nearlyEqual(x, y1, 1e-4) == False
nearlyEqual(x, y2, 1e-4) == True
By contrast, in the solution proposed above, the switch zone is based on the value of |x| + |y|, which is represented by the hatched square below. It ensures that both zones connect gracefully.
Also, the code above has no branching, which could be more efficient. Operations such as max and abs, which a priori need branching, often have dedicated assembly instructions. For this reason, I think this approach is superior to the alternative of fixing Michael's nearlyEqual by changing the switch from diff < abs_th to diff < eps * abs_th, which would then produce essentially the same response pattern.
Where to switch between relative and absolute comparison?
The switch between those modes is made around abs_th, which is taken as FLT_MIN in the accepted answer. This choice means that the representation limits of float32 are what bound the precision of our floating point numbers.
This does not always make sense. For example, if the numbers you compare are the results of a subtraction, perhaps something in the range of FLT_EPSILON makes more sense. If they are square roots of subtracted numbers, the numerical imprecision could be even higher.
It is rather obvious when you consider comparing a floating point value with 0. Here, any relative comparison will fail, because |x - 0| / (|x| + 0) = 1. So the comparison needs to switch to absolute mode when x is on the order of the imprecision of your computation, and rarely is it as low as FLT_MIN.
This is the reason for the introduction of the abs_th parameter above.
Also, by not multiplying abs_th by epsilon, the interpretation of this parameter is simple: it corresponds to the level of numerical precision that we expect on those numbers.
Mathematical ramblings
(kept here mostly for my own pleasure)
More generally I assume that a well-behaved floating point comparison operator =~ should have some basic properties.
The following are rather obvious:
self-equality: a =~ a
symmetry: a =~ b implies b =~ a
invariance by opposition: a =~ b implies -a =~ -b
(We don't have "a =~ b and b =~ c implies a =~ c"; =~ is not an equivalence relation.)
I would add the following properties, which are more specific to floating point comparisons:
if a < b < c, then a =~ c implies a =~ b (closer values should also be equal)
if a, b, m >= 0 then a =~ b implies a + m =~ b + m (larger values with the same difference should also be equal)
if 0 <= λ < 1 then a =~ b implies λa =~ λb (perhaps less obvious to argue for).
Those properties already put strong constraints on possible near-equality functions. The function proposed above satisfies them. Perhaps one or several otherwise obvious properties are missing.
When one thinks of =~ as a family of equality relations =~[Ɛ,t] parameterized by Ɛ and abs_th, one could also add:
if Ɛ1 < Ɛ2 then a =~[Ɛ1,t] b implies a =~[Ɛ2,t] b (equality for a given tolerance implies equality at a higher tolerance)
if t1 < t2 then a =~[Ɛ,t1] b implies a =~[Ɛ,t2] b (equality for a given imprecision implies equality at a higher imprecision)
The proposed solution also satisfies these.
Comparing for greater/smaller is not really a problem unless you're working right at the edge of the float/double precision limit.
For a "fuzzy equals" comparison, this (Java code, should be easy to adapt) is what I came up with for The Floating-Point Guide after a lot of work and taking into account lots of criticism:
public static boolean nearlyEqual(float a, float b, float epsilon) {
    final float absA = Math.abs(a);
    final float absB = Math.abs(b);
    final float diff = Math.abs(a - b);

    if (a == b) { // shortcut, handles infinities
        return true;
    } else if (a == 0 || b == 0 || diff < Float.MIN_NORMAL) {
        // a or b is zero, or both are extremely close to it;
        // relative error is less meaningful here
        return diff < (epsilon * Float.MIN_NORMAL);
    } else { // use relative error
        return diff / (absA + absB) < epsilon;
    }
}
It comes with a test suite. You should immediately dismiss any solution that doesn't, because it is virtually guaranteed to fail in some edge cases like having one value 0, two very small values opposite of zero, or infinities.
An alternative (see link above for more details) is to convert the floats' bit patterns to integers and accept everything within a fixed integer distance.
In any case, there probably isn't any solution that is perfect for all applications. Ideally, you'd develop/adapt your own with a test suite covering your actual use cases.
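A minimal sketch of that bit-pattern approach in C (the function names, the memcpy-based type punning, and the sign remapping are my choices; see the linked guide for the full treatment):

#include <math.h>
#include <stdint.h>
#include <string.h>

/* Distance between two floats in units in the last place (ULPs). */
int64_t ulp_distance(float a, float b)
{
    int32_t ia, ib;
    memcpy(&ia, &a, sizeof ia);  /* well-defined way to get the bit pattern */
    memcpy(&ib, &b, sizeof ib);

    /* Remap negative floats so the integer order matches the float order
       (adjacent floats then differ by exactly 1, and -0.0 maps to +0.0). */
    if (ia < 0) ia = INT32_MIN - ia;
    if (ib < 0) ib = INT32_MIN - ib;

    int64_t d = (int64_t)ia - (int64_t)ib;
    return d < 0 ? -d : d;
}

/* Accept everything within a fixed integer distance, rejecting NaNs. */
int nearly_equal_ulps(float a, float b, int64_t max_ulps)
{
    if (isnan(a) || isnan(b)) return 0;
    return ulp_distance(a, b) <= max_ulps;
}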
I had the problem of comparing floating point numbers for A < B and A > B.
Here is what seems to work:
if ((A - B < Epsilon) && (fabs(A - B) > Epsilon))
{
    printf("A is less than B");
}
if ((A - B > Epsilon) && (fabs(A - B) > Epsilon))
{
    printf("A is greater than B");
}
The fabs (absolute value) takes care of the case where they are essentially equal.
We have to choose a tolerance level to compare float numbers. For example:

const double TOLERANCE = 0.00001;
if (Math.Abs(f1 - f2) < TOLERANCE)
    Console.WriteLine("Oh yes!");
One note: your example is rather funny.

double a = 1.0 / 3.0;
double b = a + a + a;
if (a != b)
    Console.WriteLine("Oh no!");

Some math here:

a = 1/3
b = 1/3 + 1/3 + 1/3 = 1
1/3 != 1

Oh, yes...
Did you mean

if (b != 1)
    Console.WriteLine("Oh no!")
An idea I had for floating point comparison in Swift:
infix operator ~= {}

func ~= (a: Float, b: Float) -> Bool {
    return fabsf(a - b) < Float(FLT_EPSILON)
}

func ~= (a: CGFloat, b: CGFloat) -> Bool {
    return fabs(a - b) < CGFloat(FLT_EPSILON)
}

func ~= (a: Double, b: Double) -> Bool {
    return fabs(a - b) < Double(FLT_EPSILON)
}
Adaptation to PHP from Michael Borgwardt & bosonix's answer:

class Comparison
{
    const MIN_NORMAL = 1.17549435E-38; // from the Java specs

    // from http://floating-point-gui.de/errors/comparison/
    public function nearlyEqual($a, $b, $epsilon = 0.000001)
    {
        $absA = abs($a);
        $absB = abs($b);
        $diff = abs($a - $b);

        if ($a == $b) {
            return true;
        } elseif ($a == 0 || $b == 0 || $diff < self::MIN_NORMAL) {
            return $diff < ($epsilon * self::MIN_NORMAL);
        } else {
            return $diff / ($absA + $absB) < $epsilon;
        }
    }
}
You should ask yourself why you are comparing the numbers. If you know the purpose of the comparison then you should also know the required accuracy of your numbers. That is different in each situation and each application context. But in pretty much all practical cases there is a required absolute accuracy. It is only very seldom that a relative accuracy is applicable.
To give an example: if your goal is to draw a graph on the screen, then you likely want floating point values to compare equal if they map to the same pixel on the screen. If the size of your screen is 1000 pixels, and your numbers are in the 1e6 range, then you likely will want 100 to compare equal to 200.
Given the required absolute accuracy, then the algorithm becomes:
public static ComparisonResult compare(float a, float b, float accuracy)
{
    if (isnan(a) || isnan(b))   // if NaN needs to be supported
        return UNORDERED;
    if (a == b)                 // short-cut, also takes care of infinities
        return EQUAL;
    if (abs(a - b) < accuracy)  // comparison wrt. the accuracy
        return EQUAL;
    if (a < b)                  // larger / smaller
        return SMALLER;
    else
        return LARGER;
}
The standard advice is to use some small "epsilon" value (chosen depending on your application, probably), and consider floats that are within epsilon of each other to be equal. e.g. something like
#define EPSILON 0.00000001
if ((a - b) < EPSILON && (b - a) < EPSILON) {
    printf("a and b are about equal\n");
}
A more complete answer is complicated, because floating point error is extremely subtle and confusing to reason about. If you really care about equality in any precise sense, you're probably seeking a solution that doesn't involve floating point.
I tried writing an equality function with the above comments in mind. Here's what I came up with:
Edit: Change from Math.Max(a, b) to Math.Max(Math.Abs(a), Math.Abs(b))
static bool fpEqual(double a, double b)
{
    double diff = Math.Abs(a - b);
    // Note: Double.Epsilon is the smallest positive (subnormal) double,
    // about 4.9e-324, so this scaled threshold is extremely tight.
    double epsilon = Math.Max(Math.Abs(a), Math.Abs(b)) * Double.Epsilon;
    return (diff < epsilon);
}
Thoughts? I still need to work out a greater than, and a less than as well.
I came up with a simple approach to adjusting the size of epsilon to the size of the numbers being compared. So, instead of using:
iif(abs(a - b) < 1e-6, "equal", "not")
if a and b can be large, I changed that to:
iif(abs(a - b) < (10 ^ -abs(7 - log(a))), "equal", "not")
I suppose that doesn't satisfy all the theoretical issues discussed in the other answers, but it has the advantage of being one line of code, so it can be used in an Excel formula or an Access query without needing a VBA function.
I did a search to see if others have used this method and I didn't find anything. I tested it in my application and it seems to be working well. So it seems to be a method that is adequate for contexts that don't require the complexity of the other answers. But I wonder if it has a problem I haven't thought of since no one else seems to be using it.
If there's a reason the test with the log is not valid for simple comparisons of numbers of various sizes, please say why in a comment.
You need to take into account that the truncation error is relative. Two numbers are about equal if their difference is about as large as their ulp (unit in the last place).
However, if you do floating point calculations, your error potential goes up with every operation (be especially careful with subtractions!), so your error tolerance needs to increase accordingly.
The best way to compare doubles for equality/inequality is by taking the absolute value of their difference and comparing it to a small enough (depending on your context) value.
double eps = 0.000000001; // for instance
double a = someCalc1();
double b = someCalc2();
double diff = Math.abs(a - b);
if (diff < eps) {
    // equal
}