How to determine if the square root of a number is an integer? - octave

isinteger(sqrt(3))
0
isinteger(sqrt(4))
0
Both calls return zero. The expected results are:
isinteger(sqrt(3))
0
isinteger(sqrt(4))
1

isinteger checks the variable's type: integer is a storage class, not a property of the numeric value. E.g., isinteger(2.0) returns 0 because 2.0 is stored as a double.
Try:
mod(sqrt(x),1) == 0
However, you may still have issues with this due to numerical precision.

You may also do
y = sqrt(4);
y==round(y)
or to take round-off error into account with a (2*eps) relative tolerance
abs(y-round(y)) <= 2*eps*y
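The same idea as a minimal C++ sketch (has_integer_sqrt is a hypothetical helper name; C++ doubles and DBL_EPSILON behave the same way as Octave's doubles and eps):
#include <cfloat>   // DBL_EPSILON
#include <cmath>    // sqrt, round, fabs
#include <cstdio>

// true if sqrt(x) is an integer, with a 2*eps relative tolerance
// mirroring the Octave expression above
bool has_integer_sqrt(double x) {
    double y = std::sqrt(x);
    return std::fabs(y - std::round(y)) <= 2.0 * DBL_EPSILON * y;
}

int main() {
    std::printf("%d\n", has_integer_sqrt(3.0));  // 0
    std::printf("%d\n", has_integer_sqrt(4.0));  // 1
}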

Others have touched on this, but you need to be careful that floating point effects are taken into account for your application. Limited precision can give unexpected results. Take this example:
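A sketch of such a case in C++ (the specific value, the double just above 4 obtained with nextafter, is an assumed illustration; Octave's sqrt behaves the same way on doubles):
#include <cmath>
#include <cstdio>

int main() {
    double x = std::nextafter(4.0, INFINITY);  // 4 + 2^-50: very close to 4, but not an integer
    double y = std::sqrt(x);                   // the correctly rounded square root is exactly 2.0
    std::printf("%d\n", y == 2.0);             // 1: y is exactly an integer
    std::printf("%d\n", y * y == x);           // 0: squaring y does not give back x
    std::printf("%d\n", std::floor(x) == x);   // 0: x itself is not an integer
}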
Here you start with a non-integer value that is very close to 4 (x). The square root of this number in double precision is exactly 2 (y), but squaring y does not give back the original x. So the calculated square root y is exactly an integer, but that isn't really indicative of the situation, since the original x isn't an integer. The actual square root of x isn't an integer, even though the floating point calculation of sqrt(x) is exactly an integer.
What if we also checked to see if the original x is an integer? Well, take this example:
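Again a C++ sketch (the particular constant 2^54 + 4 is an assumed illustration, chosen so that the spacing between adjacent doubles around x is 4):
#include <cmath>
#include <cstdio>

int main() {
    double x = 18014398509481988.0;           // 2^54 + 4, exactly representable
    double y = std::sqrt(x);                  // rounds to exactly 2^27 = 134217728.0
    std::printf("%d\n", std::floor(x) == x);  // 1: x is an integer
    std::printf("%d\n", std::floor(y) == y);  // 1: y is an integer
    std::printf("%d\n", y * y == x);          // 0: y*y is 2^54, which is 4 less than x
}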
Here the original x is so large that every floating point number near x is an integer, so even x + eps(x) is an integer. The calculated square root in double precision is also an integer (y). But even though both are integers, y*y does not equal x. Again we have the situation where the actual square root of x isn't an integer, but the floating point calculated value of sqrt(x) is exactly an integer.
So, bottom line is this can be a bit trickier than you might have anticipated. We don't know your application, but you might want to check that both x and y are integers and that y*y == x is true before convincing yourself that the square root of x is an integer. And even then, there might be cases where all these checks pass but still there is a discrepancy that floating point effects simply didn't uncover.

Related

Incorrect data from MariaDB POLYGON SELECT

Server: MariaDB 10.4.17
INSERTing a POLYGON with 14 digits to the right of the decimal point, then SELECTing the same data, returns a POLYGON with 15 digits to the right of the decimal point, which is more data than actually exists, and the excess precision is incorrect.
INSERTing a 0-padded POLYGON with 15 digits to the right of the decimal point, then SELECTing the same data, returns a POLYGON with 15 digits to the right of the decimal point, however the SELECTed data is incorrect in the last digit and is not the 0 used for right-padding.
Because the table data is incorrect, the various Geometry functions like ST_Contains() produce incorrect results. This appears to be some sort of floating point type of error, but I'm not sure how to work around it.
Is there any way to make MariaDB save, use and return the same data it was given?
Example:
INSERT INTO `Area`
(`Name`, `Coords`)
VALUES ('Test ', GeomFromText('POLYGON((
-76.123527198020080 43.010597920077250,
-76.128263410842290 43.016193091211520,
-76.130763247573610 43.033194256815040,
-76.140676208063910 43.033514863935440,
-76.13626333248750 43.008550330099250,
-76.123527198020080 43.010597920077250))'));
SELECT Coords FROM `Area` WHERE `Name` = 'Test';
POLYGON ((
-76.123527198020085 43.010597920077252,
-76.128263410842294 43.01619309121152,
-76.130763247573611 43.033194256815037,
-76.140676208063908 43.033514863935437,
-76.136263332487502 43.008550330099247,
-76.123527198020085 43.010597920077252
))
Edit:
As per @Michael-Entin, the floating point error was a dead end and could not be responsible for the size of the errors I was getting.
Update:
The problem was "me". I had accidentally used MBRContains() in one of the queries instead of ST_Contains().
MBRContains uses the "Minimum Bounding Rectangle" that will contain the polygon, not the actual POLYGON coordinates.
Using MBRContains had caused the area to be significantly larger than expected, and appeared to be a processing error, which it was not.
ST_Contains() is slower but respects all the POLYGON edges and yields correct results.
Thanks to @Michael-Entin for noticing that the floating point error couldn't account for the magnitude of the error I was experiencing. This information pointed me in the right direction.
I think the precision you have is reaching the limit of 64-bit floating point, and what you get is really the nearest floating point value representable by the CPU.
The code below prints the input value without any modification, and then the adjacent double floating point values obtained by decrementing and incrementing it by the smallest possible amount:
#include <cmath>    // nextafter, INFINITY
#include <iomanip>  // setprecision
#include <iostream>
using namespace std;
int main() {
    const double f = -76.123527198020080;
    // print f, then the nearest representable doubles below and above it
    cout << setprecision(17) << f << endl
         << nextafter(f, -INFINITY) << endl
         << nextafter(f, INFINITY) << endl;
}
The results I get
-76.123527198020085
-76.123527198020099
-76.123527198020071
As you see, -76.123527198020085 is the nearest value to your coordinate -76.123527198020080, and its closest possible neighbors are -76.123527198020099 (even further away) and -76.123527198020071 (also slightly further away, but in the other direction).
So I don't think there is any way to keep the precision you want. Nor should there be a practical reason to keep such precision (the difference is less than a micron, i.e. 1e-6 of a meter).
What you should be looking at is how exactly ST_Contains does not meet your expectations. Geometry libraries usually do snapping with a tolerance distance slightly larger than the numeric precision of the coordinates, and this should ideally make sure that such minor differences in input values don't affect the outcome of such a function.
Most floating point hardware will be in base 2.
If we decompose the absolute value of -76.128263410842290 in base 2, it is:
64 (2^6) + 8 (2^3) + 4 (2^2) + 0.125 (2^-3) + ...
So we can write this number in base two as a sequence of bits: 1001100.001...
Unfortunately, in base 2 this number requires an infinite sequence of such bits.
The sequence begins with:
1001100.001000001101010111011110111100101101011101001110111000...
But floats have limited precision: the significand has only 53 bits in IEEE double precision, including the bits BEFORE the fraction separator.
That means that the least significant bit (the unit of least precision) represents 2^-46...
1001100.001000001101010111011110111100101101011101001110111000...
1001100.00100000110101011101111011110010110101110101
Notice that the floating point value has been rounded up (to the nearest float).
Let's multiply 2^-46 by an appropriate power of five, 5^46/5^46: it equals 5^46/10^46.
It means that its DECIMAL representation ends exactly 46 places after the DECIMAL point, or a bit less if the trailing bits of the float significand are zero (not the case here, the trailing bit is 1).
So potentially, the fraction part of those floating point numbers has about 46 digits, not the 14 or 15 you seem to assume.
If we turn this floating point value back to decimal, we indeed get:
-76.12826341084229397893068380653858184814453125
-76.128263410842290
See, it's indeed slightly greater in magnitude than your initial input here, because the float was rounded upward.
If you ask to print 15 decimal places AFTER the fraction separator, you get a rounded result.
-76.128263410842294
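You can reproduce both printouts with a short C++ sketch (assuming a printf implementation, such as glibc's, that performs exact decimal conversion for %f):
#include <cstdio>

int main() {
    const double f = -76.128263410842290;  // stored as the nearest representable double
    std::printf("%.46f\n", f);             // the full decimal expansion of the stored value
    std::printf("%.15f\n", f);             // rounded to 15 places: -76.128263410842294
}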
In this float number, the last bit 2^-46 has the decimal value
0.0000000000000142108547152020037174224853515625
where 142108547152020037174224853515625 is 5^46, you can do the math.
The neighbouring floating point values differ only in this last bit (we can add or subtract it):
1001100.00100000110101011101111011110010110101110100
1001100.00100000110101011101111011110010110101110101
1001100.00100000110101011101111011110010110101110110
It means that the immediate floating point neighbours are about +/- 1.42e-14 away...
This means that you cannot trust the 14th digit after the decimal point: double precision does not have that resolution!
It is no surprise that the nearest float sometimes falls up to 7e-15 away from your specified input (half the resolution, thanks to the round-to-nearest rule).
Remember, float precision is RELATIVE: if we consume bits to the left of the fraction separator, we reduce the precision of the fractional part (the point literally floats).
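If you want to inspect this encoding yourself, here is a small C++ sketch (it assumes IEEE-754 doubles and simply reinterprets the 64-bit pattern):
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    double d = -76.128263410842290;
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);                  // raw IEEE-754 encoding
    int sign = (int)(bits >> 63);
    int biased_exponent = (int)((bits >> 52) & 0x7FF);    // bias is 1023
    std::uint64_t fraction = bits & 0xFFFFFFFFFFFFFull;   // 52 stored fraction bits
    // for this value: sign=1, biased exponent=1029 (i.e. 2^6),
    // so the unit in the last place is 2^(6-52) = 2^-46
    std::printf("sign=%d exponent=%d fraction=%013llx\n",
                sign, biased_exponent, (unsigned long long)fraction);
}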
This is very basic knowledge scientists should acquire before using floating point.
I hope those examples help as a very restricted introduction.

How to get the i-th bit of an integer without bit shifting?

Let's say I have a 16-bit integer x and I want to find out whether the i-th bit of x is 0 or 1. I'm not able to use bit shifts, but I can use a predefined array twoToThe of length 16 where twoToThe[j] holds 2 to the power of j. I think I can accomplish what I'm looking for using bitwise boolean operations, but I'm not sure how to go about it. Any suggestions?
Got it. You can check whether x & twoToThe[i] is equal to 0. If it is, then the i-th bit of x must be 0. If it's anything other than 0, then the i-th bit of x is 1.
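A minimal sketch of that check in C++ (the twoToThe table and the getBit name just mirror the setup described in the question):
#include <cstdio>

// precomputed powers of two, as in the question's twoToThe array
static const int twoToThe[16] = {
    1, 2, 4, 8, 16, 32, 64, 128,
    256, 512, 1024, 2048, 4096, 8192, 16384, 32768
};

// returns 1 if bit i of x is set, 0 otherwise -- no shifting involved
int getBit(int x, int i) {
    return (x & twoToThe[i]) != 0;
}

int main() {
    std::printf("%d\n", getBit(10, 1));  // 10 is binary 1010, so bit 1 is 1
    std::printf("%d\n", getBit(10, 2));  // ...and bit 2 is 0
}
Plain int is used here for simplicity; with a true 16-bit signed type the last entry (32768) would be the sign bit.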

How to know when a float variable is going to stop increasing by 0.001?

I want to know how to determine at which value a float (or double) variable will stop increasing when I keep adding 0.001 to it.
If we talk about the binary representation of the float value: 1 bit for the sign, 8 exponent bits and 23 bits for the mantissa. We know that when we reach a certain high value (32768) and then add a very small value (0.001), due to the excess-127 representation of the exponent, the addition result will be:
32768 + 0 = 32768
According to that, the variable will keep the same value even though we are adding 0.001.
The following code never breaks out of the loop.
float max = 100000;
float delta = 0.001F;
float time = 0;
while (time < max)
{
    time += delta;
    if (time == max)
        break;
}
Can someone help me determine an equation to know when a variable is going to stop increasing? (Regardless of whether it is a float or a double; the idea is to have a floating point variable.)
Your addition will become idempotent (that is, the result will not change) once time gets large enough that half of its ULP (unit in the last place) exceeds the size of your delta.
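A concrete check of that threshold in C++ for float and a 0.001f delta (assuming the default round-to-nearest mode, the value gets stuck at 32768):
#include <cstdio>

int main() {
    const float delta = 0.001f;

    float a = 16384.0f;  // spacing between floats here is ~0.00195; half of that is below delta
    a += delta;          // so the sum still moves to a different float
    std::printf("%d\n", a != 16384.0f);  // 1

    float b = 32768.0f;  // spacing doubles to ~0.0039; delta is now below half an ULP
    b += delta;          // so the sum rounds straight back to 32768
    std::printf("%d\n", b == 32768.0f);  // 1 -- b is stuck
}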
Your time variable is by default greater than the max variable.
It's REALLY simple:
time is never going to equal max if time starts off at 215100 and max is 100000, as long as you are adding some positive number to time. Also, comparing floats for exact equality is problematic due to floating point imprecision.
To answer your question for an equation:
addition will fail completely if
(log a)/(log 2) < (log b)/(log 2) - c
i.e. log2(a) < log2(b) - c, where
a is the small float you want to add
b is the large float you want to add it to
c is the length of the mantissa (23 for float)

Microsoft Access - Decimal Scale stuck at 0

I have a calculated field in my table called C. It's the result of A - B = C. A and B are number fields (Single, Fixed). I am having trouble setting up C as a calculated Decimal field.
The precision / decimal places seem to work perfectly; I can modify them freely. But no matter what I do to "Scale", it always seems to return to "0". I need it to be 2, since all the data in my reports is rounding off at the wrong places, giving me whole numbers.
As you can see, "Scale" = 0; no matter what I set this number to, it always reverts to "0". Why is that?
You can't change the scale in a calculated field, because it takes its values and settings from the calculation.
So the scale of 0 should not matter. If the resulting number needs decimal places, it will (should) have them. The setting is IGNORED.
I mean, if the calculation is:
2 x 3 = 6
Then you get 6.
If you have 4 / 3 = 1.3333
Then, in your case you get:
1.33333333333333
And you WILL get the above EVEN if the scale = 0. So the scale setting is NOT used nor available in a calculated field.
You are certainly free to round, or format the above result. And in fact you could (should) consider using the round() function in the actual calculation. So use something like:
Round([Field1] / [Field2],4)
And you thus get:
1.3333

For any finite floating point value, is it guaranteed that x - x == 0?

Floating point values are inexact, which is why we should rarely use strict numerical equality in comparisons. For example, in Java this prints false (as seen on ideone.com):
System.out.println(.1 + .2 == .3);
// false
Usually the correct way to compare results of floating point calculations is to see if the absolute difference against some expected value is less than some tolerated epsilon.
System.out.println(Math.abs(.1 + .2 - .3) < .00000000000001);
// true
The question is about whether or not some operations can yield an exact result. We know that for any non-finite floating point value x (i.e. either NaN or an infinity), x - x is ALWAYS NaN.
But if x is finite, is any of this guaranteed?
x * -1 == -x
x - x == 0
(In particular I'm most interested in Java behavior, but discussions for other languages are also welcome.)
For what it's worth, I think (and I may be wrong here) the answer is YES! I think it boils down to whether or not, for any finite IEEE-754 floating point value, its additive inverse is always computable exactly. Since e.g. float and double have one dedicated bit just for the sign, this seems to be the case: finding the additive inverse only requires flipping the sign bit (i.e. the significand is left intact).
Related questions
Correct Way to Obtain The Most Negative Double
How many double numbers are there between 0.0 and 1.0?
Both equalities are guaranteed with IEEE 754 floating-point, because the results of both x-x and x * -1 are representable exactly as floating-point numbers of the same precision as x. In this case, regardless of the rounding mode, the exact values have to be returned by a compliant implementation.
EDIT: Comparing to the .1 + .2 example.
You can't add .1 and .2 in IEEE 754 because you can't represent them exactly to pass to +. Addition, subtraction, multiplication, division and square root return the unique floating-point value which, depending on the rounding mode, is immediately below, immediately above, or nearest (with a rule to handle ties) to the result of the operation on the same arguments in R. Consequently, when the result (in R) happens to be representable as a floating-point number, this number is automatically the result, regardless of the rounding mode.
The fact that your compiler lets you write 0.1 as shorthand for a different, representable number without a warning is orthogonal to the definition of these operations. When you write - (0.1) for instance, the - is exact: it returns exactly the opposite of its argument. On the other hand, its argument is not 0.1, but the double that your compiler uses in its place.
In short, another part of the reason why the operation x * (-1) is exact is that -1 can be represented as a double.
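A small C++ illustration of that distinction (Java doubles behave the same way):
#include <cstdio>

int main() {
    // 0.1, 0.2 and 0.3 are not representable, so the rounded inputs and the
    // rounded sum do not line up:
    std::printf("%d\n", 0.1 + 0.2 == 0.3);    // 0
    // 0.5, 0.25 and 0.75 are exact binary fractions; the exact result is
    // representable, so it is returned regardless of the rounding mode:
    std::printf("%d\n", 0.5 + 0.25 == 0.75);  // 1
}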
Although x - x may give you -0 rather than true 0, -0 compares as equal to 0, so you will be safe with your assumption that any finite number minus itself will compare equal to zero.
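And a quick C++ check of both identities plus the signed-zero case (again, the same holds in Java):
#include <cstdio>

int main() {
    double x = 0.1 + 0.2;                 // some finite value
    std::printf("%d\n", x - x == 0.0);    // 1: the exact difference 0 is representable
    std::printf("%d\n", x * -1.0 == -x);  // 1: negation only flips the sign bit
    std::printf("%d\n", -0.0 == 0.0);     // 1: negative zero compares equal to zero
}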
See Is there a floating point value of x, for which x-x == 0 is false? for more details.