How to know when a float variable is going to stop increasing by 0.001? - binary

I want to know how to determine at which value a float (or double) variable is going to stop increasing if I keep increasing it by 0.001.
If we talk about the binary representation of the float value: 1 sign bit, 8 exponent bits and 23 mantissa bits. We know that when we reach a certain high value (32768) and then add a very small value (0.001), due to the excess-127 representation of the exponent, the result of the addition will be:
32768 + 0 = 32768
According to that, the variable will keep the same value even though we are adding 0.001.
The following code never breaks out of the loop.
float max =100000;
float delta=0.001F;
float time = 0;
while (time < max)
{
time += delta;
if(time == max)
break;
}
Can someone help me determine an equation to know when a variable is going to stop increasing? (Regardless of whether it is a float or a double; the idea is to handle any floating-point variable.)

Your addition will become idempotent (that is, the result will not change) once time gets large enough that half of its ULP (unit in the last place) is greater than your delta. Below that point the sum still rounds to a neighbouring value, so the variable keeps moving.

Your time variable is by default greater than max variable.

It's REALLY simple:
time is never gonna equal max if time starts off with 215100 and max is 100000 as long as you are adding some positive number to time. Also, comparing floats is kinda problematic due to float-imprecision.
To answer your question for an equation:
with round-to-nearest, the addition will fail completely (a + b == b) if
(log a)/(log 2) < [(log b)/(log 2)] - (c + 1)
Where
a is the small float you want to add
b is the large float you want to add it to
c is the length of the mantissa (23 for float, 52 for double)
[x] means x rounded down to an integer
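As a rough sketch of that rule (Python used only for illustration; to_f32 and stall_threshold are hypothetical helper names, not library functions), the following computes the first power of two at which adding delta no longer changes the value under round-to-nearest, and checks it against an emulated 32-bit float. It assumes delta is positive and not itself an exact power of two.

```python
import math
import struct

def to_f32(x):
    """Round a Python float (a double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def stall_threshold(delta, fraction_bits):
    """Smallest power of two whose half-ULP is larger than delta.

    fraction_bits is the stored mantissa length: 23 for float, 52 for double.
    At 2**e the ULP is 2**(e - fraction_bits), so we look for the smallest e
    with 2**(e - fraction_bits - 1) > delta.
    """
    _, exp = math.frexp(delta)  # delta = m * 2**exp with 0.5 <= m < 1
    return 2.0 ** (exp + fraction_bits + 1)

delta = to_f32(0.001)
print(stall_threshold(delta, 23))          # threshold for a 32-bit float
print(stall_threshold(0.001, 52))          # threshold for a 64-bit double
print(to_f32(32768.0 + delta) == 32768.0)  # does the float addition stall at 32768?
```

The same helper works for any delta; only fraction_bits changes between float and double.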

Related

How to determine if the square root of a number is integer?

isinteger(sqrt(3))
0
isinteger(sqrt(4))
0
Both answers give zero. The answers must be:
isinteger(sqrt(3))
0
isinteger(sqrt(4))
1
isinteger checks for type. integer is a type of variable, not a property of a number. e.g. isinteger(2.0) returns 0.
Try:
mod(sqrt(x),1) == 0
However, you may still have issues with this due to numerical precision.
You may do as well
y = sqrt(4);
y==round(y)
or to take round-off error into account with a (2*eps) relative tolerance
abs(y-round(y)) <= 2*eps*y
Others have touched on this, but you need to be careful that floating point effects are taken into account for your application. Limited precision issues can give unexpected results. E.g., take this example:
Here you start with a non-integer value that is very close to 4 (x). The square root of this number in double precision is exactly 2 (y), but squaring this number does not equal the original x. So the calculated square root y is exactly an integer, but it really isn't indicative of the situation since the original x isn't an integer. The actual square root of x isn't an integer even though the floating point calculation of sqrt(x) is exactly an integer.
What if we also checked to see if the original x is an integer? Well, take this example:
Here the original x is so large that every floating point number near x is an integer, so the x+eps(x) is an integer. The calculated square root in double precision is also an integer (y). But even though both are integers, y*y does not equal x. Again we have the situation where the actual square root of x isn't an integer, but the floating point calculated value of sqrt(x) is exactly an integer.
So, bottom line is this can be a bit trickier than you might have anticipated. We don't know your application, but you might want to check that both x and y are integers and that y*y == x is true before convincing yourself that the square root of x is an integer. And even then, there might be cases where all these checks pass but still there is a discrepancy that floating point effects simply didn't uncover.
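The two situations described above can be reproduced with a short sketch. The original snippets are not included in this text, so the values below are my own choice (the next double above 4 and the next double above 2^60), and sqrt_is_integer is a hypothetical helper that applies the three checks suggested above. Python is used here only for illustration; it uses the same IEEE double arithmetic.

```python
import math

def sqrt_is_integer(x):
    """Apply the checks suggested above: x is an integer, y is an integer, y*y == x."""
    x = float(x)
    y = math.sqrt(x)
    return x.is_integer() and y.is_integer() and y * y == x

# Case 1: a non-integer just above 4 (assumed example value).
x1 = 4.0 + 2.0 ** -50
y1 = math.sqrt(x1)
print(x1.is_integer(), y1, y1 * y1 == x1)

# Case 2: an integer so large that all nearby doubles are integers (assumed example value).
x2 = 2.0 ** 60 + 256.0
y2 = math.sqrt(x2)
print(x2.is_integer(), y2.is_integer(), y2 * y2 == x2)

print(sqrt_is_integer(4), sqrt_is_integer(3), sqrt_is_integer(x1), sqrt_is_integer(x2))
```

For values that are known to be integers, exact integer arithmetic (for example math.isqrt in Python) sidesteps the rounding question entirely.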

Incorrect data from MariaDB POLYGON SELECT

Server: MariaDB 10.4.17
INSERTing a POLYGON with 14 digits to the right of the decimal point, then SELECTing the same data, returns a POLYGON with 15 digits to the right of the decimal point, which is more data than actually exists, and the excess precision is incorrect.
INSERTing a 0-padded POLYGON with 15 digits to the right of the decimal point, then SELECTing the same data, returns a POLYGON with 15 digits to the right of the decimal point, however the SELECTed data is incorrect in the last digit and is not the 0 used for right-padding.
Because the table data is incorrect, the various Geometry functions like ST_Contains() produce incorrect results. This appears to be some sort of floating point type of error, but I'm not sure how to work around it.
Is there any way to make MariaDB save, use and return the same data it was given?
Example:
INSERT INTO `Area`
(`Name`, `Coords`)
VALUES ('Test ', GeomFromText('POLYGON((
-76.123527198020080 43.010597920077250,
-76.128263410842290 43.016193091211520,
-76.130763247573610 43.033194256815040,
-76.140676208063910 43.033514863935440,
-76.13626333248750 43.008550330099250,
-76.123527198020080 43.010597920077250))'));
SELECT Coords FROM `Area` WHERE `Name` = 'Test';
POLYGON ((
-76.123527198020085 43.010597920077252,
-76.128263410842294 43.01619309121152,
-76.130763247573611 43.033194256815037,
-76.140676208063908 43.033514863935437,
-76.136263332487502 43.008550330099247,
-76.123527198020085 43.010597920077252
))
Edit:
As per #Michael-Entin the floating point error was a dead end and could not be responsible for the size of the errors I was getting.
Update:
The problem was "me". I had accidentally used MBRContains() in one of the queries instead of ST_Contains().
MBRContains uses the "Minimum Bounding Rectangle" that will contain the polygon, not the actual POLYGON coordinates.
Using MBRContains had caused the area to be significantly larger than expected, and appeared to be a processing error, which it was not.
ST_Contains() is slower but respects all the POLYGON edges and yields correct results.
Thanks to #Michael-Entin for noticing that the floating point error couldn't account for the magnitude of the error I was experiencing. This information pointed me in the right direction.
I think the precision you have is reaching the limit of the 64-bit floating point, and what you get is really the nearest floating point value representable by CPU.
The code below prints the input value without any modification, and then the very next double floating point values decremented and incremented by smallest possible amounts:
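#include <cmath>
#include <iomanip>
#include <iostream>
using namespace std;

int main() {
    const double f = -76.123527198020080;
    // Print the stored value, then its nearest neighbours below and above.
    cout << setprecision(17) << f << endl
         << nextafter(f, -INFINITY) << endl
         << nextafter(f, INFINITY) << endl;
}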
The results I get
-76.123527198020085
-76.123527198020099
-76.123527198020071
As you see, -76.123527198020085 is the nearest value to your coordinate -76.123527198020080, and its closest possible neighbors are -76.123527198020099 (even further), and -76.123527198020071 (also slightly further, but to a different direction).
So I don't think there is any way to keep the precision you want. Nor should there be a practical reason to keep such precision (the difference is less than a micron, i.e. 1e-6 of a meter).
What you should be looking at is how exactly ST_Contains does not meet your expectations. The geometric libraries usually do snapping with tolerance distance that is slightly higher than the numeric precision of coordinates, and this should ideally make sure such minor differences in input values don't affect the outcome of such function.
Most floating point hardware will be in base 2.
If we try and decompose the absolute value of -76.128263410842290 in base 2 it's:
64 (2^6) + 8 (2^3) + 4 (2^2) + 0.125 (2^-3) + ...
Somehow we can note this number in base two with a sequence of bits 1001100.001...
Bad luck, in base 2, this number would require an infinite sequence of such bits.
The sequence begins with:
1001100.001000001101010111011110111100101101011101001110111000...
But floats have limited precision: the significand has only 53 bits in IEEE double precision, including the bits BEFORE the fraction separator.
That means that the least significant bit (the unit of least precision) represents 2^-46...
1001100.001000001101010111011110111100101101011101001110111000...
1001100.0010000011010101110111101111001011010111010100
Notice that the floating point value has been rounded up (to the nearest float), and that its last two fraction bits happen to be zero.
Let's multiply 2^-46 by the appropriate power of five, 5^46/5^46: it is 5^46/10^46.
It means that the DECIMAL representation ends at most 46 places after the DECIMAL point, or a bit less if the trailing bits of the float significand are zero (which is the case here: the two trailing zero bits shorten it to 44 places).
So potentially, the fraction part of those floating point numbers has about 46 digits, not 14 or 15 as you seem to assume.
If we turn this floating point value back to decimal, we indeed get:
-76.12826341084229397893068380653858184814453125
-76.128263410842290
See, its magnitude is indeed slightly greater than your initial input here, because the float was rounded up.
If you ask to print 15 decimal places AFTER the fraction separator, you get a rounded result.
-76.128263410842294
In this float number, the last bit 2^-46 has the decimal value
0.0000000000000142108547152020037174224853515625
where 142108547152020037174224853515625 is 5^46, you can do the math.
The immediate floating point neighbours differ in this last bit (we can add or subtract it). Written with all 46 fraction bits:
1001100.0010000011010101110111101111001011010111010011
1001100.0010000011010101110111101111001011010111010100
1001100.0010000011010101110111101111001011010111010101
It means that the immediate floating point neighbours are about +/- 1.42 x 10^-14 away.
This means that you cannot trust the 14th digits after the fraction, double precision does not have such resolution!
No surprise that the nearest float sometimes falls up to 7 x 10^-15 off your specified input (half the resolution, thanks to the round-to-nearest rule).
Remember, float precision is RELATIVE, if we consume bits left of fraction separator, we reduce the precision of the fraction part (the point is floating literally).
This is very basic knowledge scientists should acquire before using floating point.
I hope those examples help as a very restricted introduction.
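If you want to check these numbers yourself, here is a small sketch in Python (chosen only for illustration; its float is the same IEEE double). It prints the exact decimal expansion of the stored value, the size of its last bit, and the distance from the value as typed. math.ulp needs Python 3.9 or later; the variable names are mine.

```python
from decimal import Decimal
from fractions import Fraction
import math

x = -76.128263410842290   # the coordinate as typed in the question
exact = Decimal(x)        # exact decimal expansion of the stored double
ulp = math.ulp(x)         # spacing between neighbouring doubles at this magnitude

# Distance between the stored double and the decimal literal, computed exactly.
error = abs(Fraction(x) - Fraction("-76.128263410842290"))

print(exact)
print(Decimal(ulp))
print(float(error))
print(format(x, ".15f"))  # 15 places after the separator, as in the SELECT output
```

Fraction and Decimal are used so that the comparison itself does not introduce any further rounding.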

how many digits in FLOAT?

I've looked all over and can't find this answer.
How many actual digits are there for a MySQL FLOAT?
I know (think?) that it truncates what's in excess of the FLOAT's 4 byte limit, but what exactly is that?
From the manual (emphasis mine):
For FLOAT, the SQL standard permits an optional specification of the
precision (but not the range of the exponent) in bits following the
keyword FLOAT in parentheses. MySQL also supports this optional
precision specification, but the precision value is used only to
determine storage size. A precision from 0 to 23 results in a 4-byte
single-precision FLOAT column. A precision from 24 to 53 results in an
8-byte double-precision DOUBLE column.
So up to 23 bits of precision for the mantissa can be stored in a FLOAT, which is equivalent to about 7 decimal digits because 2^23 ~ 10^7 (8,388,608 vs 10,000,000). I tested it here. You can see that 12 decimal digits are returned, of which only the first 7 are really accurate.
for those of you who think that MySQL treats floats the same as, for example JAVA, I got some SHOCKING news: MySQL degrades the available accuracy which is possible to a float, in order to hide from you decimal places which might be incorrect! Check this out:
JAVA:
public static void main(String[] args) {
long i = 16777225;
DecimalFormat myFormatter = new DecimalFormat("##,###,###");
float iAsFloat = Float.parseFloat("" + i);
System.out.println("long i = " + i + " becomes " + myFormatter.format(iAsFloat));
}
the output is
long i = 16777225 becomes 16,777,224
So far, so normal. Our example integer is just above 2^24 = 16777216. Due to the 23 bit mantissa, between 2^23 and 2^24, a float can hold every integer. Then from 2^24 to 2^25, it can hold only even numbers, from 2^25 to 2^26 only numbers divisible by 4 and so on (also in the other direction: from 2^22 to 2^23, it can hold all multiples of 0.5). As long as the exponent isn't out of range, that's the rule of what a float can store.
16777225 is odd, so the "float version" is one off, because in that range (from 2^24 to 2^25) the "step size" of the float is 2.
And now, what does MySQL make of it.
Here is the fiddle, in case you don't believe me (I wouldn't)
http://www.sqlfiddle.com/#!2/a42e79/1
CREATE TABLE IF NOT EXISTS `test` (
`test` float NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `test`(`test`) VALUES (16777225)
SELECT * FROM `test`
result:
16777200
the result is off by 25 rather than 1, but has the "advantage" of being divisible by 100. Thanks a lot.
I think I understand the "philosophy" behind this utter nonsense, but I can't say I approve. Here is the "reason":
They don't want you to see the decimal places which could be wrong, which they accomplish by rounding the "actual" float value (as it is in JAVA and according to the industry standard) to some suitable power of ten.
In the example, if we leave it as it is, the last digit is wrong, without being a zero, and we can't have that.
Then, if we round to multiples of ten, the correct value would become 16777230, while the "actual" float would be rounded to 16777220. Now, the 7th digit is wrong (it wasn't wrong before, but now it is.) And it's not zero. We can't have that. Better round to multiples of 100. Now both the correct value and the "actual" float value round to 16777200. So you see only the 6 correct digits. You don't want to know the "24" at the end, telling you (since the step size is 2 in that range) that your original number must have been between 16777223 and 16777225. No, you don't want to know that; those 2 numbers differ in the 7th digit after rounding to the 7th digit, so you can't know the "proper" 7th digit, and hence you want to stop at the 6th digit. Anyway, that's what they think you want at MySQL.
So their goal is to round to some number of decimal digits (namely, 6), such that those 6 digits are always "correct", in that you'd have gotten the same 6 digits if you'd rounded the original exact number (before converting it to a float) to 6 digits. And since log_base10(2^23) = 6.92, rounded down to 6, I can see why they think that this will always work. Tragically, not even that is true.
example:
i = 33554450
the number is between 2^25 and 2^26, so the "float version" (that is the JAVA float version, not the MySQL float version) of it is the closest multiple of 4 (on a tie, the one whose significand is even, which here is the smaller one), so that is
i_as_float = 33554448
i rounded to 6 significant digits (i.e. to multiples of 100, since it's an 8 digit number) gives 33554500.
i_as_float rounded to 6 significant digits gives 33554400
Oops! those differ at the 6th digit! But don't tell the MySQL people. They might just start "improving" 16777200 towards 16777000.
UPDATE
other databases don't do it like that.
fiddle: http://www.sqlfiddle.com/#!15/d9144/1
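For anyone who wants to replay the arithmetic without a database, here is a rough sketch in Python. It only emulates the two steps discussed above (rounding to a 32-bit float, then rounding to multiples of 100); it makes no claim about how MySQL implements its display logic. to_f32 and round_to_hundreds are hypothetical helper names.

```python
import struct

def to_f32(x):
    """Round a number to the nearest IEEE single-precision value (ties to even)."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]

def round_to_hundreds(n):
    """Round a non-negative integer to the nearest multiple of 100, halves up."""
    return (n + 50) // 100 * 100

for i in (16777225, 33554450):
    as_float = int(to_f32(i))
    # Columns: original, float version, original rounded, float version rounded.
    print(i, as_float, round_to_hundreds(i), round_to_hundreds(as_float))
```

In the second row the last two columns disagree, which is the counterexample from the answer above.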

flash: Math.pow calculates wrong answers for larger numbers

why is this so?
when i try out:
Math.pow(2,58)=288230376151711740
while in fact, it is 288230376151711744
or
Math.pow(2,57)=144115188075855870
while it really equals 144115188075855872
it just throws that number without any warning.
i would understand if it stopped going above some number in case of maximum value reached. however, this seems to calculate the first n digits correctly and then go wrong at the very end of the digits only
You've run out of Number type display precision. The trick is that with powers of 2 the value actually stored in the variable is exact, but when you trace it the engine cuts the displayed value off after about 16 to 17 significant digits: it divides by 10 repeatedly while printing, and the leftovers eventually hit "machine zero" compared to the original value taken without its exponent part. This is done so that the noise generated by imprecise floating-point division is not displayed. You can work around this by moving to big integers or to floating-point types that store more bits than a double precision number.
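The "stored exactly, displayed shortened" point can be checked in any language with IEEE doubles. A minimal sketch in Python (used only for illustration; Flash's Number is the same 64-bit double format):

```python
x = 2.0 ** 58

# Converting to an arbitrary-precision integer shows every stored digit.
print(int(x))

# The default text form keeps only as many digits as are needed to
# identify this double uniquely.
print(repr(x))

# The shortened value from the question still parses back to the same double.
print(float("288230376151711740") == x)
```

So nothing was lost in the computation; only the printed form is abbreviated.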

What does "step" mean in stepSimulation and what do its parameters mean in Bullet Physics?

What does the term "STEP" mean in Bullet Physics?
What does the function stepSimulation() and its parameters mean?
I have read the documentation but i could not get hold of anything.
Any valid explanation would be of great help.
I know I'm late, but I thought the accepted answer was only marginally better than the documentation's description.
timeStep: The amount of seconds, not milliseconds, passed since the last call to stepSimulation.
maxSubSteps: Should generally stay at one so Bullet interpolates current values on its own. A value of zero implies a variable tick rate, meaning Bullet advances the simulation exactly timeStep seconds instead of interpolating. This feature is buggy and not recommended. A value greater than one must always satisfy the equation timeStep < maxSubSteps * fixedTimeStep or you're losing time in the simulation.
fixedTimeStep: Inversely proportional to the simulation's resolution. Resolution increases as this value decreases. Keep in mind that a higher resolution means it takes more CPU.
btDynamicsWorld::stepSimulation(
btScalar timeStep,
int maxSubSteps=1,
btScalar fixedTimeStep=btScalar(1.)/btScalar(60.));
timeStep - time passed after last simulation.
Internally the simulation advances in constant internal steps of fixedTimeStep.
fixedTimeStep ~~~ 0.01666666 = 1/60
If timeStep is 0.1 then it covers 6 (timeStep / fixedTimeStep) internal steps, provided maxSubSteps allows that many.
To make movement smoother, Bullet interpolates the final step results according to the remainder of the division (timeStep / fixedTimeStep).
timeStep - the amount of time in seconds to step the simulation by. Typically you're going to be passing it the time since you last called it.
maxSubSteps - the maximum number of steps that Bullet is allowed to take each time you call it.
fixedTimeStep - regulates the resolution of the simulation. If your balls penetrate your walls instead of colliding with them, try decreasing it.
Here I would like to address the issue in Proxy's answer about a special meaning of the value 1 for maxSubSteps. There is only one special value, that is 0, and you most likely don't want to use it because then the simulation will run with a non-constant time step. All other values are treated the same. Let's have a look at the actual code:
if (maxSubSteps)
{
m_localTime += timeStep;
...
if (m_localTime >= fixedTimeStep)
{
numSimulationSubSteps = int(m_localTime / fixedTimeStep);
m_localTime -= numSimulationSubSteps * fixedTimeStep;
}
}
...
if (numSimulationSubSteps)
{
//clamp the number of substeps, to prevent simulation grinding spiralling down to a halt
int clampedSimulationSteps = (numSimulationSubSteps > maxSubSteps) ? maxSubSteps : numSimulationSubSteps;
...
for (int i = 0; i < clampedSimulationSteps; i++)
{
internalSingleStepSimulation(fixedTimeStep);
synchronizeMotionStates();
}
}
So, there is nothing special about maxSubSteps equal to 1. You should really abide by the formula timeStep < maxSubSteps * fixedTimeStep if you don't want to lose time.
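To see the lost time in numbers, here is a small model of the bookkeeping in the excerpt above, written in Python for illustration. It is not Bullet code: step_simulation is a hypothetical stand-in that only mirrors the accumulate, divide and clamp logic shown, ignores the elided parts, and does not model the maxSubSteps == 0 path. The step sizes are powers of two so the arithmetic is exact.

```python
def step_simulation(local_time, time_step, max_sub_steps, fixed_time_step):
    """Model of the substep accounting; returns (new_local_time, steps_run, lost_seconds)."""
    if max_sub_steps < 1:
        raise ValueError("this sketch only models max_sub_steps >= 1")
    num_sub_steps = 0
    local_time += time_step
    if local_time >= fixed_time_step:
        num_sub_steps = int(local_time / fixed_time_step)
        local_time -= num_sub_steps * fixed_time_step
    # Clamp as in the excerpt: steps beyond max_sub_steps are never simulated.
    clamped = min(num_sub_steps, max_sub_steps)
    lost = (num_sub_steps - clamped) * fixed_time_step
    return local_time, clamped, lost

fixed = 1.0 / 64.0   # assumed fixedTimeStep for the example
frame = 0.125        # assumed timeStep: exactly 8 fixed steps

print(step_simulation(0.0, frame, 1, fixed))    # timeStep >= maxSubSteps * fixedTimeStep
print(step_simulation(0.0, frame, 10, fixed))   # timeStep <  maxSubSteps * fixedTimeStep
```

In the first call most of the frame's time is dropped by the clamp; in the second call all of it is simulated.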