MySQL round in query, wrong result

I have a question about a query that I'm running on a MySQL Server (v5.5.50-0+deb8u1).
SELECT 12 - (SELECT qty FROM Table WHERE id = 5213) AS Amount
so the Amount value is 12 - 8.5500000000000007 = 3.4499999999999993
But if I run the query:
SELECT qty FROM Table WHERE id = 5213
it returns 8.55, which is the correct number stored in the record, so I was expecting the first query to return 3.45.
The "qty" column in the table "Table" is a DOUBLE.
How is this possible? How can I get the right answer from the query?
Thanks in advance.

Well, that's just the way floating-point numbers are.
Floating-point numbers sometimes cause confusion because they are
approximate and not stored as exact values. A floating-point value as
written in an SQL statement may not be the same as the value
represented internally.
This statement holds true for many programming languages as well. Some numbers don't even have an exact representation. Here's something from the Python manual:
The problem is easier to understand at first in base 10. Consider the
fraction 1/3. You can approximate that as a base 10 fraction:
0.3 or, better,
0.33 or, better,
0.333 and so on. No matter how many digits you’re willing to write down, the result will never be exactly 1/3, but will be an
increasingly better approximation of 1/3.
In the same way, no matter how many base 2 digits you’re willing to
use, the decimal value 0.1 cannot be represented exactly as a base 2
fraction. In base 2, 1/10 is the infinitely repeating fraction
0.0001100110011001100110011001100110011...
So, in short, comparing floats for exact equality (float1 = float2) is generally a bad idea, but everyone keeps forgetting it.
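The point about 0.1 can be checked outside the database. The following is a minimal Python sketch (Python is used here only because its float is the same IEEE 754 double as a MySQL DOUBLE; the variable names are illustrative):

```python
from fractions import Fraction
import math

# Exact rational value of the double that is nearest to 0.1.
stored_tenth = Fraction(0.1)
print(stored_tenth == Fraction(1, 10))  # False: 0.1 is only approximated

# Exact equality between computed floats is fragile ...
total = 0.1 + 0.2
print(total == 0.3)  # False

# ... so compare within a tolerance instead.
print(math.isclose(total, 0.3, rel_tol=1e-9))  # True
```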

You can define the 'qty' column as DECIMAL(10,2), which stores 8.55 exactly.
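As a rough illustration of why the column type matters, here is a Python sketch that mirrors the subtraction from the question. The float stands in for the DOUBLE column and decimal.Decimal stands in for a DECIMAL(10,2) column; this is an analogy, not MySQL's own arithmetic.

```python
from decimal import Decimal

qty_double = 8.55              # like a DOUBLE: the nearest binary value to 8.55
qty_decimal = Decimal("8.55")  # like a DECIMAL(10,2): exactly 8.55

amount_double = 12 - qty_double
amount_decimal = Decimal("12") - qty_decimal

print(amount_double == 3.45)              # False: the binary error shows up
print(amount_decimal == Decimal("3.45"))  # True
print(Decimal(qty_double))                # the exact value actually stored for 8.55
```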

Related

When to use float vs decimal

I'm building this API, and the database will store values that represent one of the following:
percentage
average
rate
I honestly have no idea how to represent something whose range is between 0 and 100% as a number. Should it be
0.00 - 1.00
0.00 - 100.00
any other alternative that I don't know
Is there a clear choice for that? A standard way of representing in a database something that goes from 0 to 100 percent? Going further, what's the correct type for it, float or decimal?
Thank you.
I'll take the opposite stance.
FLOAT is for approximate numbers, such as percentages, averages, etc. You should do formatting as you display the values, either in app code or using the FORMAT() function of MySQL.
Don't ever test float_value = 1.3; there are many reasons why that will fail.
DECIMAL should be used for monetary values. DECIMAL avoids a second rounding when a value needs to be rounded to dollars/cents/euros/etc. Accountants don't like fractions of cents.
MySQL's implementation of DECIMAL allows 65 significant digits; FLOAT gives about 7 and DOUBLE about 16. 7 is usually more than enough for sensors and scientific computations.
As for "percentage" -- Sometimes I have used TINYINT UNSIGNED when I want to consume only 1 byte of storage and don't need much precision; sometimes I have used FLOAT (4 bytes). There is no datatype tuned specifically for percentage. (Note also, that DECIMAL(2,0) cannot hold the value 100, so technically you would need DECIMAL(3,0).)
Or sometimes I have used a FLOAT that held a value between 0 and 1. But then I would need to make sure to multiply by 100 before displaying the "percentage".
More
All three of "percentage, average, rate" smell like floats, so that would be my first choice.
One criterion for deciding on datatype... How many copies of the value will exist?
If you have a billion-row table with a column for a percentage, consider that TINYINT would take 1 byte (1GB total), but FLOAT would take 4 bytes (4GB total). OTOH, most applications do not have that many rows, so this may not be relevant.
As a 'general' rule, "exact" values should use some form of INT or DECIMAL. Inexact things (scientific calculations, square roots, division, etc) should use FLOAT (or DOUBLE).
Furthermore, the formatting of the output should usually be left to the application front end. That is, even though an "average" may compute to "14.6666666...", the display should show something like "14.7"; this is friendlier to humans. Meanwhile, you have the underlying value to later decide that "15" or "14.667" is preferable output formatting.
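A small sketch of leaving the formatting to the application layer, assuming a Python front end (the value 44 / 3 is just a hypothetical average that computes to 14.666...):

```python
average = 44 / 3  # hypothetical stored average, 14.666...

# The stored value stays untouched; only the display changes.
one_decimal = format(average, ".1f")     # "14.7"
whole = format(average, ".0f")           # "15"
three_decimals = format(average, ".3f")  # "14.667"

# A fraction stored in the 0..1 range can be shown as a percentage.
as_percent = format(0.855, ".1%")        # "85.5%"

print(one_decimal, whole, three_decimals, as_percent)
```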
The range "0.00 - 100.00" could be done either with FLOAT and use output formatting or with DECIMAL(5,2) (3 bytes) with the pre-determination that you will always want the indicated precision.
I would generally recommend against using float. Floating-point numbers are represented in base 2, which causes some (exact) decimal numbers to be rounded in operations or comparisons, because they simply cannot be stored accurately in base 2. This may lead to surprising behavior.
Consider the following example:
create table t (num float);
insert into t values(1.3);
select * from t;
| num |
| --: |
| 1.3 |
select * from t where num = 1.3;
| num |
| --: |
The comparison with 1.3 fails because the value stored in base 2 is not exactly 1.3. This is tricky.
In comparison, decimals provide an exact representation of finite numbers within their range. If you change float to decimal(2, 1) in the above example, you do get the expected results.
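The failed comparison can be reproduced outside MySQL. This Python sketch assumes the usual explanation: the FLOAT column keeps a 32-bit value, while the literal in the WHERE clause is compared at higher precision. The helper name to_float32 is made up for the example.

```python
import struct
from decimal import Decimal

def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float, like a FLOAT column."""
    return struct.unpack("f", struct.pack("f", x))[0]

stored = to_float32(1.3)

print(stored == 1.3)                     # False: the 32-bit value differs from the 64-bit one
print(abs(stored - 1.3) < 1e-6)          # True: but it is very close
print(Decimal("1.3") == Decimal("1.3"))  # True: a decimal type keeps 1.3 exactly
```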
I recommend using decimal(5,2) if you're going to store it in the same way you'll display it since decimal is for preserving the exact precision. (See https://dev.mysql.com/doc/refman/8.0/en/fixed-point-types.html)
Because floating-point values are approximate and not stored as exact values, attempts to treat them as exact in comparisons may lead to problems. They are also subject to platform or implementation dependencies.
(https://dev.mysql.com/doc/refman/8.0/en/floating-point-types.html)
A floating-point value as written in an SQL statement may not be the same as the value represented internally.
For DECIMAL columns, MySQL performs operations with a precision of 65 decimal digits, which should solve most common inaccuracy problems.
https://dev.mysql.com/doc/refman/8.0/en/problems-with-float.html
Decimal :
For financial applications it is better to use Decimal types, because they give you a high level of accuracy and make it easy to avoid rounding errors.
Double :
Double types are probably the most commonly used data type for real values, except when handling money.
Float :
It is used mostly in graphics libraries because of their very high demand for processing power; it is also used in situations that can tolerate rounding errors.
Reference: http://net-informations.com/q/faq/float.html
The difference between float and decimal is the precision. Decimal can represent any number within the precision of the decimal format with 100% accuracy, whereas float cannot accurately represent all numbers.
Use decimal for e.g. finance-related values and float for e.g. graphics-related values.
mysql> create table numbers (a decimal(10,2), b float);
mysql> insert into numbers values (100, 100);
mysql> select @a := (a/3), @b := (b/3), @a + @a + @a, @b + @b + @b from numbers \G
*************************** 1. row ***************************
@a := (a/3): 33.333333333
@b := (b/3): 33.333333333333
@a + @a + @a: 99.999999999000000000000000000000
@b + @b + @b: 100
The decimal did exactly what it is supposed to do in this case: it
truncated the rest, thus losing the 1/3 part.
So for sums, the decimal is better, but for divisions, the float is
better, up to some point, of course. I mean, using DECIMAL will not give
you "fail-proof arithmetic" by any means.
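The same effect can be sketched in Python. Here decimal.Decimal is cut to nine fractional digits (an assumption chosen to match the digits shown in the transcript above) and a Python float stands in for the floating-point column, so this is an analogy rather than MySQL's exact behaviour.

```python
from decimal import Decimal

# Assumed scale: nine fractional digits, as in the transcript above.
NINE_PLACES = Decimal("0.000000001")

a_third = (Decimal("100.00") / 3).quantize(NINE_PLACES)
decimal_sum = a_third + a_third + a_third

b_third = 100.0 / 3
float_sum = b_third + b_third + b_third

print(a_third)      # the fixed-point third, cut off at nine places
print(decimal_sum)  # the cut-off part is lost for good
print(float_sum)    # the float error is far smaller and rounds away here
```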
I hope this will help.
In tsql:
Float: 0.0 is stored as 0, and you don't need to specify the number of digits after the decimal point, e.g. you don't need to write Float(4,2).
Decimal: 0.0 is stored as 0.0, and you can specify the precision, e.g. decimal(4,2). I would suggest 0.00-1.00; that way you can calculate with the percentage without multiplying by 100, and when you report it, format the column as a percent, as MS Excel and other platforms do (0.5 -> 50%).

MySQL weird rounding off results

I spotted some rounding bug in MySQL. Here is my query:
SELECT /*debugz*/ ROUND((SUM(grade)/2),0) AS grade, SUM(grade) FROM entry.computed_grade a WHERE a.stud_id='7901159' AND a.sy='2014' AND a.term=01 AND a.terms=01 AND a.catalog_no='Christian Life Formation';
and the result is this:
grade sum(grade)
------ ------------
92 185
The grade result should be 93, not 92 because 185/2 = 92.5
Try this
SELECT CEIL(SUM(grade)/2) AS grade, SUM(grade) FROM entry.computed_grade a WHERE ((a.stud_id='7901159') AND (a.sy='2014') AND (a.term=01) AND (a.terms=01) AND (a.catalog_no='Christian Life Formation'));
Try to use CEIL instead of ROUND. Note that CEIL takes a single argument and always rounds up,
e.g. CEIL(1.45) = 2
You should check the rounding behavior article for MySQL. I believe this is the reason for your problem:
For approximate-value numbers, the result depends on the C library. On
many systems, this means that ROUND() uses the “round to nearest even”
rule: A value with any fractional part is rounded to the nearest even
integer.
By the way, round-half-to-even is the IEEE standard rounding mode for floating point, so you might want to stay with it.
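For comparison, Python's built-in round() also rounds halves to the nearest even integer, so it reproduces the 92 from the question. The sketch below is only an analogy for what the C library does underneath MySQL's ROUND() on approximate values; it also shows why a ceiling is not a substitute and how an exact decimal type rounds half up.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

half = 185 / 2  # 92.5, which is exactly representable in binary

banker = round(half)           # round-half-to-even gives 92
ceiling = math.ceil(92.1)      # a ceiling turns 92.1 into 93 as well
exact = (Decimal(185) / 2).quantize(Decimal("1"), rounding=ROUND_HALF_UP)

print(banker, ceiling, exact)  # 92 93 93
```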
Do not "patch" this problem by tweaking the query. Actually fix your database. If you are not storing the "grade" column as the DECIMAL data type, and are instead using FLOAT or DOUBLE, your design is inherently broken.
Because floating-point values are approximate and not stored as exact values, attempts to treat them as exact in comparisons may lead to problems. 
http://dev.mysql.com/doc/refman/5.6/en/floating-point-types.html
This is not a bug in MySQL. It is an inherent limitation in industry-standard floating point number storage. Use DECIMAL columns to store meaningful, precise numbers, and the other two types only when low storage space or a wide range of allowable values are more important than precision.

Storing statistical data, do I need DECIMAL, FLOAT or DOUBLE?

I am creating for fun, but I still want to approach it seriously, a site which hosts various tests. With these tests I hope to collect statistical data.
Some of the data will include the percentage of the completeness of the tests as they are timed. I can easily compute the percentage of the tests but I would like true data to be returned as I store the various different values concerning the tests on completion.
Most of the values are, in PHP floats, so my question is, if I want true statistical data should I store them in MYSQL as FLOAT, DOUBLE or DECIMAL.
I would like to utilize MYSQL'S functions such as AVG() and LOG10() as well as TRUNCATE(). For MYSQL to return true data based off of my values that I insert, what should I use as the database column choice.
I ask because some numbers may or may not be floats such as, 10, 10.89, 99.09, or simply 0.
But I would like true and valid statistical data to be returned.
Can I rely on floating point math for this?
EDIT
I know this is a generic question, and I apologise extensively, but for non mathematicians like myself, also I am not a MYSQL expert, I would like an opinion of an expert in this field.
I have done my research but I still feel I have a clouded judgement on the matter. Again I apologise if my question is off topic or not suitable for this site.
This link does a good job of explaining what you are looking for. Here is what is says:
All these three Types, can be specified by the following Parameters (size, d). Where size is the total size of the String, and d represents precision. E.g To store a Number like 1234.567, you will set the Datatype to DOUBLE(7, 3) where 7 is the total number of digits and 3 is the number of digits to follow the decimal point.
FLOAT and DOUBLE both represent floating point numbers. A FLOAT is for single-precision, while a DOUBLE is for double-precision numbers. A precision from 0 to 23 results in a 4-byte single-precision FLOAT column. A precision from 24 to 53 results in an 8-byte double-precision DOUBLE column. FLOAT is accurate to approximately 7 decimal places, and DOUBLE up to 14.
Decimal’s declaration and functioning is similar to Double. But there is one big difference between floating point values and decimal (numeric) values. We use DECIMAL data type to store exact numeric values, where we do not want precision but exact and accurate values. A Decimal type can store a Maximum of 65 Digits, with 30 digits after decimal point.
So, for the most accurate and precise value, Decimal would be the best option.
Unless you are storing decimal data (i.e. currency), you should use a standard floating point type (FLOAT or DOUBLE). DECIMAL is a fixed point type, so can overflow when computing things like SUM, and will be ridiculously inaccurate for LOG10.
There is nothing "less precise" about binary floating point types, in fact, they will be much more accurate (and faster) for your needs. Go with DOUBLE.
Decimal : Fixed-Point Types (Exact Value). Use it when you care about exact precision like money.
Example: salary DECIMAL(8,2), 8 is the total number of digits, 2 is the number of decimal places. salary will be in the range of -999999.99 to 999999.99
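The range follows directly from the two numbers in the declaration. A tiny Python sketch of that arithmetic (the helper decimal_range is hypothetical and not part of MySQL):

```python
from decimal import Decimal
from typing import Tuple

def decimal_range(m: int, d: int) -> Tuple[Decimal, Decimal]:
    """Return the (min, max) values that fit m total digits with d after the point."""
    largest = Decimal(10 ** m - 1).scaleb(-d)  # e.g. 99999999 shifted by two places
    return -largest, largest

print(decimal_range(8, 2))  # bounds for DECIMAL(8,2)
print(decimal_range(5, 2))  # bounds for DECIMAL(5,2)
```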
Float, Double : Floating-Point Types (Approximate Value). Float uses 4 bytes to represent value, Double uses 8 bytes to represent value.
Example: percentage FLOAT(5,2), same as the type decimal, 5 is total digits and 2 is the decimal places. percentage will store values between -999.99 to 999.99.
Note that they are approximate value, in this case:
Value like 1 / 3.0 = 0.3333333... will be stored as 0.33 (2 decimal place)
Value like 33.009 will be stored as 33.01 (rounding to 2 decimal place)
Put simply, float and double are not as precise as decimal. Decimal is recommended for money-related numbers (currency and salary).
Another point worth noting: do NOT compare float numbers using "=" or "<>", because float numbers are not precise.
Linger: The website you mention and quote has IMO some imprecise info that made me confused. In the docs I read that when you declare a float or a double, the decimal point is in fact NOT included in the number. So it is not the number of chars in a string but all digits used.
Compare the docs:
"DOUBLE PRECISION(M,D).. Here, “(M,D)” means than values can be stored with up to M digits in total, of which D digits may be after the decimal point. For example, a column defined as FLOAT(7,4) will look like -999.9999 when displayed"
http://dev.mysql.com/doc/refman/5.1/en/floating-point-types.html
Also, the nomenclature is misleading: according to the docs, M is 'precision' and D is 'scale', whereas the website takes 'scale' for 'precision'.
I thought this would be useful in case somebody like me was trying to get the picture.
Correct me if I'm wrong, hope I haven't read some outdated docs:)
Float and Double are Floating point data types, which means that the numbers they store can be precise up to a certain number of digits only.
For example for a table with a column of float type if you store 7.6543219 it will be stored as 7.65432.
Similarly the Double data type approximates values but it has more precision than Float.
When creating a table with a column of Decimal data type, you specify the total number of digits and number of digits after decimal to store, and if the number you store is within the range you specified it will be stored exactly.
When you want to store exact values, Decimal is the way to go, it is what is known as a fixed data type.
Simply use FLOAT. And do not tack on '(m,n)'. Do display numbers to a suitable precision with formatting options. Do not expect to get correct answers with "="; for example, float_col = 0.12 will always return FALSE.
For display purposes, use formatting to round the results as needed.
Percentages, averages, etc. are all rounded (at least in some cases), so any choice you make will sometimes have issues.
Use DECIMAL(m,n) for currency; use ...INT for whole numbers; use DOUBLE for scientific stuff that needs more than 7 digits of precision; use FLOAT for everything else.
Transcendentals (such as the LOG10 that you mentioned) will do their work in DOUBLE; they will essentially never be exact. It is OK to feed it a FLOAT arg and store the result in FLOAT.
This Answer applies not just to MySQL, but to essentially any database or programming language. (The details may vary.)
PS: (m,n) on FLOAT and DOUBLE has been deprecated (as of MySQL 8.0.17). It only added extra rounding and other things that were essentially of no benefit.

Get rows product (multiplication)

SO,
The problem
I have an issue with rows multiplication. In SQL, there is a SUM() function which calculates sum for some field for set of rows. I want to get multiplication, i.e. for table
+------+
| data |
+------+
| 2 |
| -1 |
| 3 |
+------+
that will be 2*(-1)*3 = -6 as a result. I'm using DOUBLE data type for storing my data values.
My approach
From school math it is known that log(A x B) = log(A) + log(B), so that could be used to create the desired expression, like:
SELECT
IF(COUNT(IF(SIGN(`col`)=0,1,NULL)),0,
IF(COUNT(IF(SIGN(`col`)<0,1,NULL))%2,-1,1)
*
EXP(SUM(LN(ABS(`col`))))) as product
FROM `test`;
Here you see a weakness of this method: since log(X) is undefined when X<=0, I need to count negative signs before calculating the whole expression. Sample data and a query for this are given in this fiddle.
Another weakness is that we need to find out whether there is a 0 among the column values. (Since this is a sample, in the real situation I'm going to select the product for some subset of table rows with some condition(s), i.e. I cannot simply remove the 0s from my table, because a zero product is a valid and expected result for some row subsets.)
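The approach described above (track zeros and signs separately, sum the logarithms of the absolute values) can be sketched in Python as follows. The function name log_product is made up for the example, and it mirrors the idea of the query rather than MySQL's execution.

```python
import math
from typing import Iterable

def log_product(values: Iterable[float]) -> float:
    """Product of values computed as sign * exp(sum(log(|v|)))."""
    sign = 1
    log_sum = 0.0
    for v in values:
        if v == 0:
            return 0.0        # any zero makes the whole product zero
        if v < 0:
            sign = -sign      # an odd count of negatives flips the sign
        log_sum += math.log(abs(v))
    return sign * math.exp(log_sum)

print(log_product([2, -1, 3]))  # approximately -6, up to floating-point error
```

Because the result goes through log and exp, it is approximate even when every input is an integer, so compare it with a tolerance rather than with "=".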
Specifics
And now, finally, the main part of my question: how to handle the situation where we have an expression like X*Y*Z with X<MAXF and Y<MAXF, but X*Y>MAXF and X*Y*Z<MAXF, so that a data type overflow is possible (here MAXF is the limit of the MySQL DOUBLE data type). The sample is here. The query above works well, but can I always be sure that it will handle this properly? I.e. maybe there is another case with an overflow issue, where some sub-product causes an overflow but the entire product is OK (without overflow).
Or maybe there is another way to find the product of rows? Also, the table may contain millions of records (mainly -1.1<X<=1.1, but probably also values such as 100 or 1000, i.e. high enough to overflow DOUBLE if multiplied a certain number of times, given the issue I've described above). Maybe calculating via log will be slow?
I guess this would work, as long as no value is zero...
SELECT IF(MOD(SUM(data < 0),2)=1
, EXP(SUM(LOG(ABS(data))))*-1
, EXP(SUM(LOG(ABS(data)))))
x
FROM my_table;
If you need this type of calculations often, I suggest you store the signs and the logarithms in separate columns.
The signs can be stored as 1 (for positives), -1 (for negatives) and 0 (for zero.)
The logarithm can be assigned for zero as 0 (or any other value) but it should not be used in calculations.
Then the calculation would be:
SELECT
CASE WHEN EXISTS (SELECT 1 FROM test WHERE <condition> AND datasign = 0)
THEN 0
ELSE (SELECT 1-2*(SUM(datasign=-1)%2) FROM test WHERE <condition>)
END AS resultsign,
CASE WHEN EXISTS (SELECT 1 FROM test WHERE <condition> AND datasign = 0)
THEN -1 -- undefined log for result 0
ELSE (SELECT SUM(datalog) FROM test WHERE <condition> AND datasign <> 0)
END AS resultlog
;
This way, you have no overflow problems. You can check the resultlog if it exceeds some limits or just try to calculate resultdata = resultsign * EXP(resultlog) and see if an error is thrown.
This question is a remarkable one in the sea of low quality ones. Thank you, even reading it was a pleasure.
Precision
The exp(log(a)+log(b)) idea is a good one in itself. However, after reading "What Every Computer Scientist Should Know About Floating-Point Arithmetic", make sure you use DECIMAL or NUMERIC data types to be sure you are using Precision Math, or else your values will be surprisingly inaccurate. For a couple of million rows, errors can add up very quickly! DECIMAL (as per the MySQL doc) has a maximum of 65 digits precision, while for example 64bit IEEE754 floating point values have only up to 16 digits (log10(2^52) = 15.65) precision!
Overflow
As per the relevant part of the MySQL doc:
Integer overflow results in silent wraparound.
DECIMAL overflow results in a truncated result and a warning.
Floating-point overflow produces a NULL result. Overflow for some operations can result in +INF, -INF, or NaN.
So you can detect floating point overflow if it would ever happen.
Sadly, if a series of operations would result in a correct value, fitting into the data type used, but at least one subresult in the process of calculations would not, then you won't get the correct value at the end.
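The intermediate-overflow case is easy to reproduce with IEEE 754 doubles. In this Python sketch the three values are chosen for illustration only. Python reports the overflow as inf, whereas the MySQL documentation quoted above says floating-point overflow produces NULL. Summing logarithms avoids the oversized intermediate result.

```python
import math

# X * Y overflows a double, yet X * Y * Z would fit comfortably.
values = [1e200, 1e200, 1e-300]

naive = 1.0
for v in values:
    naive *= v  # 1e200 * 1e200 exceeds the double range and becomes inf

via_logs = math.exp(sum(math.log(v) for v in values))

print(naive)     # inf: the overflow in the sub-product is never undone
print(via_logs)  # close to 1e100
```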
Performance
Premature optimization is the root of all evil. Try it, and if it is slow, take the appropriate actions. Doing this might not be lightning quick, but still might be quicker than getting all the results, and doing it on the application server. Only measurements can decide which gets to be quicker...

Intriguing sql query (not in select)

I have the following sql query in mysql:
SELECT *
FROM _t_test
WHERE pret NOT
IN ( 2.6700, 2.6560, 1.8200 )
I would expect the rows with the value 1.8200 not to be shown, yet I still get them.
Am I missing something?
The field "pret" is double(16,4).
This is a rounding error. A double is not an exact value, so 1.8200 isn't represented exactly, so the values are not exactly the same.
For MYSQL floating points, see http://dev.mysql.com/doc/refman/5.0/en/problems-with-float.html
The correct way to do floating-point number comparison is to first
decide on an acceptable tolerance for differences between the numbers
and then do the comparison against the tolerance value. For example,
if we agree that floating-point numbers should be regarded the same if
they are same within a precision of one in ten thousand (0.0001), the
comparison should be written to find differences larger than the
tolerance value
See http://en.wikipedia.org/wiki/Double_precision_floating-point_format
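A minimal Python sketch of the tolerance-based comparison quoted above, applied to a NOT IN style check. The function name and the sample values are illustrative; the tolerance 0.0001 is the one from the quoted documentation.

```python
from typing import Iterable

def not_in_with_tolerance(value: float, excluded: Iterable[float],
                          tolerance: float = 0.0001) -> bool:
    """True only if value differs from every excluded number by more than tolerance."""
    return all(abs(value - e) > tolerance for e in excluded)

excluded = [2.6700, 2.6560, 1.8200]

print(not_in_with_tolerance(1.8200000001, excluded))  # False: treated as equal to 1.82
print(not_in_with_tolerance(1.8300, excluded))        # True: genuinely different
```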