Why does MySQL add extra digits to floats? - mysql

I have a table of prices.
Each price is a FLOAT with two digits after the dot.
For some reason, when I use the price in an IF expression, the result is the same float with many additional digits:
mysql> select price, IF(1, price,0) as my_price from tbl_prices limit 10;
+-------+------------------+
| price | my_price         |
+-------+------------------+
| 79.95 | 79.9499969482422 |
| 99.95 | 99.9499969482422 |
| 89.95 | 89.9499969482422 |
| 89.95 | 89.9499969482422 |
| 79.95 | 79.9499969482422 |
| 89.95 | 89.9499969482422 |
| 89.95 | 89.9499969482422 |
| 79.95 | 79.9499969482422 |
| 79.95 | 79.9499969482422 |
| 69.95 | 69.9499969482422 |
+-------+------------------+
10 rows in set (0.00 sec)
As you can see, price looks good; however, the result of the IF expression, which returns the same price, contains garbage.
Does anybody know the reason for this garbage, and how I can get rid of it (without using ROUND)?
Thanks in advance!

Just don't. A float is not an exact value. Use a DECIMAL column for values such as prices.

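Because .95 cannot be represented exactly in binary floating point. The value you see is the closest one a FLOAT can store, which is why FLOAT is an approximate type. Selecting the column directly shows a rounded form of that value; the IF() expression is evaluated as a DOUBLE, which exposes the extra digits.
If you want exact decimal places, use DECIMAL.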
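The effect can be reproduced outside MySQL. This is only a rough Python sketch, not MySQL's actual code path: it assumes a FLOAT column holds an IEEE 754 single-precision value and that the expression result is a double. The helper name `to_float32` is made up for this illustration.

```python
import struct

def to_float32(value: float) -> float:
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

# What a 4-byte float actually holds for 79.95, widened back to a double.
stored = to_float32(79.95)

print(f"{stored:.15g}")  # 79.9499969482422  (15 significant digits, like my_price)
print(f"{stored:.6g}")   # 79.95             (6 significant digits, like price)
```

The stored number is the same in both prints; only the number of digits shown differs.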

First, you should understand how floating point works.
Fortunately, MySQL provides the DECIMAL data type, which stores values with an exact, fixed number of decimal places, for example:
DECIMAL(10, 2)
This stores numbers with up to 10 digits in total, 2 of which are to the right of the decimal point, for example:
12345678.12
1.03
and so on.
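To make the meaning of the two numbers concrete, here is a small Python sketch that checks whether a value fits DECIMAL(precision, scale) without rounding. The function `fits_decimal` is hypothetical, written only for this illustration; it is not how MySQL validates input, and it ignores special values such as NaN.

```python
from decimal import Decimal

def fits_decimal(value: str, precision: int, scale: int) -> bool:
    """Return True if value fits DECIMAL(precision, scale) without rounding."""
    _sign, digits, exponent = Decimal(value).as_tuple()
    frac_digits = max(0, -exponent)             # digits after the decimal point
    int_digits = max(0, len(digits) + exponent) # digits before the decimal point
    return frac_digits <= scale and int_digits <= precision - scale

print(fits_decimal("12345678.12", 10, 2))   # True: 8 integer digits + 2 decimals
print(fits_decimal("1.03", 10, 2))          # True
print(fits_decimal("123456789.12", 10, 2))  # False: 9 integer digits is too many
print(fits_decimal("1.035", 10, 2))         # False: 3 decimals would need rounding
```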

Related

How can I migrate from "float" to "points" in MySQL?

I'm looking for a faster way to calculate Euclidean distances in SQL.
Problem I want to solve
The following "Euclidean distance calculation" is slow.
SELECT
id,
sqrt(
power(f1 - (-0.09077361), 2) +
power(f2 - (0.10373443), 2) +
...
...
power(f127 - (0.0778369), 2) +
power(f128 - (0.00951046), 2)
) as distance
FROM
face_feature
ORDER BY
distance
LIMIT
1
;
What I want to know
Can you share how to migrate from "float" to "points"?
I received the following advice, but I don't understand how.
Switch to POINTs and a SPATIAL index. It may be possible to make your task orders of magnitude faster.
MySQL
mysql> SHOW VARIABLES LIKE '%version%';
+--------------------------+------------------------------+
| Variable_name            | Value                        |
+--------------------------+------------------------------+
| version                  | 8.0.29                       |
| version_comment          | MySQL Community Server - GPL |
| version_compile_machine  | x86_64                       |
| version_compile_os       | Linux                        |
+--------------------------+------------------------------+
Table
mysql> desc face_feature;
+-------+------------+------+-----+---------+----------------+
| Field | Type       | Null | Key | Default | Extra          |
+-------+------------+------+-----+---------+----------------+
| id    | int        | NO   | PRI | NULL    | auto_increment |
| f1    | float(9,8) | NO   |     | NULL    |                |
| f2    | float(9,8) | NO   |     | NULL    |                |
..
| f127  | float(9,8) | NO   |     | NULL    |                |
| f128  | float(9,8) | NO   |     | NULL    |                |
+-------+------------+------+-----+---------+----------------+
Data
mysql> SELECT count(*) FROM face_feature;
+----------+
| count(*) |
+----------+
| 100003 |
+----------+
mysql> SELECT * FROM face_feature LIMIT 1\G;
id: 1
f1: -0.07603023
f2: 0.13605964
...
f127: 0.09608927
f128: 0.00082345
Reference (My other question)
How can I make "euclidean distance calculation" faster in MySQL?
Don't use FLOAT(M,N); it adds an extra rounding step that only hurts various operations.
With FLOAT(9,8), numbers near 1.0 will lose some precision, because any FLOAT has only 24 bits of precision (about 7 significant decimal digits).
(m,n) on FLOAT and DOUBLE has been deprecated (as useless and misleading) in newer versions of MySQL.
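To see what 24 bits of precision means for values like these, here is a small Python sketch. It assumes FLOAT behaves like an IEEE 754 single-precision number; `to_float32` is a helper invented for this illustration, not a MySQL function.

```python
import struct

def to_float32(value: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

# Near 1.0, neighbouring 32-bit floats are about 6e-8 apart,
# so the 8th decimal place cannot be stored.
print(to_float32(0.99999999))            # 1.0

# Smaller magnitudes are spaced more finely, so this sample value
# from the table still prints correctly to 8 decimals.
print(f"{to_float32(-0.07603023):.8f}")  # -0.07603023
```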
There are helper functions to convert numeric strings to POINT values. Internally, a POINT contains two DOUBLEs. Hence the original DECIMAL(9,8) loses only a round-from-decimal-to-binary at the 53rd significant bit.
But the real question is about using SPATIAL indexing when the universe has 128 dimensions. I don't think it will work. (I have not even heard of using SPATIAL for 3 dimensions, though it should be practical.)

Is it possible to use a CASE statement + ROUND() in the database?

I have a table called 'test' and want to calculate a value based on the code. Most codes should be shown with 5 decimals, except that codes containing jp should have 3 decimals and codes containing xua should have 2 decimals.
create table test(
id int, ymd date,
code varchar(10),
price int
);
insert into test(id, ymd, code, price) values
(1, '2019-01-01', 'auus', 75125),
(2, '2019-01-02', 'nzus', 68541),
(3, '2019-01-03', 'xuaus', 131485),
(4, '2019-01-04', 'aujp', 77852),
(5, '2019-01-05', 'usjp', 110852),
(6, '2019-01-06', 'xuaus', 131091);
The query I run is:
select id, ymd, code, price,
case
when code like '%xua%' then round(price/100,2)
when code like '%jp%' then round(price/1000,3)
else round(price/100000,5)
end as t
from test;
Ideal result:
id ymd code price t
1 2019-01-01 auus 75125 0.75125
2 2019-01-02 nzus 68541 0.68541
3 2019-01-03 xuaus 131485 1314.85
4 2019-01-04 aujp 77852 77.852
5 2019-01-05 usjp 110852 110.852
6 2019-01-06 xuaus 131091 1310.91
Interestingly, the SQL above works well with MySQL, but I am using MariaDB and can't get the same results as in MySQL. I have spent 2 days trying to fix the problem and still don't know why. Please help.
mysql> select id, ymd, code, price,
-> case
-> when code like '%xua%' then round(price/100,2)
-> when code like '%jp%' then round(price/1000,3)
-> else round(price/100000,5)
-> end as t
-> from test ;
+------+------------+-------+--------+------------+
| id   | ymd        | code  | price  | t          |
+------+------------+-------+--------+------------+
|    1 | 2019-01-01 | auus  |  75125 |    0.75125 |
|    2 | 2019-01-02 | nzus  |  68541 |    0.68541 |
|    3 | 2019-01-03 | xuaus | 131485 | 1314.85000 |
|    4 | 2019-01-04 | aujp  |  77852 |   77.85200 |
|    5 | 2019-01-05 | usjp  | 110852 |  110.85200 |
|    6 | 2019-01-06 | xuaus | 131091 | 1310.91000 |
+------+------------+-------+--------+------------+
6 rows in set (0.04 sec)
Using FORMAT instead of ROUND:
mysql> select id, ymd, code, price,
case when code like '%xua%' then format(price/100,2)
when code like '%jp%' then format(price/1000,3)
else format(price/100000,5) end as t from test;
+------+------------+-------+--------+----------+
| id   | ymd        | code  | price  | t        |
+------+------------+-------+--------+----------+
|    1 | 2019-01-01 | auus  |  75125 | 0.75125  |
|    2 | 2019-01-02 | nzus  |  68541 | 0.68541  |
|    3 | 2019-01-03 | xuaus | 131485 | 1,314.85 |
|    4 | 2019-01-04 | aujp  |  77852 | 77.852   |
|    5 | 2019-01-05 | usjp  | 110852 | 110.852  |
|    6 | 2019-01-06 | xuaus | 131091 | 1,310.91 |
+------+------------+-------+--------+----------+
6 rows in set (0.00 sec)
mysql> select @@version;
+----------------------------------------+
| @@version                              |
+----------------------------------------+
| 10.3.11-MariaDB-1:10.3.11+maria~bionic |
+----------------------------------------+
1 row in set (0.00 sec)
Note that it includes a "thousands-separator" when appropriate. See the 3rd argument to FORMAT() or the Locale setting to change that.
I suspect it is the display process that is formatting the output differently. By switching from ROUND to FORMAT, I managed to get MariaDB's output to be nearly the same as MySQL's. The remaining difference is the added commas ("thousands separators"), which may show as '.' for some Locales.
In contrast, for MySQL 5.6.22:
+------+------------+-------+--------+---------+
| id   | ymd        | code  | price  | t       |
+------+------------+-------+--------+---------+
|    1 | 2019-01-01 | auus  |  75125 | 0.75125 |
|    2 | 2019-01-02 | nzus  |  68541 | 0.68541 |
|    3 | 2019-01-03 | xuaus | 131485 | 1314.85 |
|    4 | 2019-01-04 | aujp  |  77852 |  77.852 |
|    5 | 2019-01-05 | usjp  | 110852 | 110.852 |
|    6 | 2019-01-06 | xuaus | 131091 | 1310.91 |
+------+------------+-------+--------+---------+
The numeric values are the same, but the display is different. The difference seems to come from the command-line tool mysql, not from ROUND itself. Note that t is right-justified, implying that the values are seen as numeric.
If this offends someone enough, file a bug report with MariaDB.
77.75200000000001 -- This is representative of some intermediate computation using DOUBLE instead of all DECIMAL. MySQL (and MariaDB) do a reasonably good job of second-guessing where the numbers are headed, and usually they get away with whatever is done.
In DOUBLE, 77.75200000000001 is not exactly equal to the DECIMAL 77.752 because one is binary, one is decimal. For this reason, I often recommend not using FLOAT or DOUBLE for "money".
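The binary-versus-decimal difference is easy to see in a quick Python sketch. This only illustrates the general point (Python's float is a 64-bit double and its Decimal is an exact decimal type); it does not reproduce what MySQL or MariaDB do internally.

```python
from decimal import Decimal

as_double = 77.752              # held as a binary double
as_decimal = Decimal("77.752")  # held as exact decimal digits

# Decimal(float) converts the double's exact binary value, digit for digit,
# so it prints a long expansion rather than exactly 77.752.
print(Decimal(as_double))

print(Decimal(as_double) == as_decimal)                  # False
print(Decimal("77752") / Decimal("1000") == as_decimal)  # True
```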
Assuming your real goal is to represent a monetary value as 77.7520000000000000000000000..., that is exactly '77.752', and, assuming you need at most 5 decimal places for the various values, I recommend you do this:
t DECIMAL(m, 5)
where m is a suitably large number for any values you may eventually have. For the numbers given, (9,5) will suffice, but I suspect you should do more like DECIMAL(14,5) to allow for a billion dollars/euros/yen/etc.
What I don't know is where in the processing DOUBLE crept in.
Latest 'advice'
Use DECIMAL(14,5) for all monetary values in your system, not INT.
14,5 lets you get up to a billion 'dollars'; change that as needed for your expected max value.
Ignore my comments about FORMAT(); it seems to be too confusing.
Get rid of the CASE clause, at least for that particular usage.
Most arithmetic among DECIMAL values will be exact, and not encounter 77.75200000000001. If it crops up again, start a new Question and include all the steps, datatypes, etc, involved in the computation.
The above notes refer to storing and computing. For displaying, please specify the requirements:
Plan A: 5 decimal places is OK.
Plan B: need to round to 3 or 2 decimals for some values.
Plan C: You have application code, not in SQL, that can deal with the issue.
Plan D:...

sql float zero padding

I am wondering whether there is a way in SQL to zero-pad a float number after the decimal point. For example, if I have the following table:
.--------------------.-------.
| name | grade |
.--------------------.-------.
| courseE | 5 |
| courseG | 4 |
| courseB | 2.5 |
| courseC | 2.5 |
| courseF | 1.25 |
| courseD | 0 |
I want to convert the grade field to a zero-padded number, so the result would look like this:
.--------------------.-------.
| name | grade |
.--------------------.-------.
| courseE | 5.000 |
| courseG | 4.000 |
| courseB | 2.500 |
| courseC | 2.500 |
| courseF | 1.250 |
| courseD | 0.000 |
I have tried to convert the grade field to float using CAST(expr AS FLOAT), but it did not work. Why?
Thanks in advance.
Your error starts at FLOAT: it is not a valid target type for CAST() (at least before MySQL 8.0.17, which added it). Instead, use DECIMAL, which also lets you set the total number of digits and the number of decimal digits:
SELECT name, CAST(grade AS DECIMAL(4, 3)) FROM t;
About the format: the first number is the total number of digits the value can hold, and the second is how many of those digits come after the decimal point. To be more precise, this is not a display format but a data-type definition (DECIMAL is a fixed-point data type in MySQL).
select convert(decimal(10, 3), @number)
The number "3" is the number of decimals you want after the ".".
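For comparison, the same zero padding can be sketched in Python, in case the formatting ends up in application code rather than in SQL. The grades are the sample values from the question; `quantize` with a 0.001 template is only similar in spirit to a cast to DECIMAL(4, 3), not an exact equivalent.

```python
from decimal import Decimal

grades = ["5", "4", "2.5", "2.5", "1.25", "0"]

# Quantize each grade to exactly three decimal places.
padded = [str(Decimal(g).quantize(Decimal("0.001"))) for g in grades]
print(padded)  # ['5.000', '4.000', '2.500', '2.500', '1.250', '0.000']
```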

Floating Point Types comparisons

I have inserted different values of pi (see below):
3.14
3.1415
3.14159
3.14159265359
I do not see any difference in how the different floating point types handle the same values.
Code:
mysql> select * from test_types;
+---------+---------+---------+----------+
| flo     | dub     | deci    | noomeric |
+---------+---------+---------+----------+
| 3.14000 | 3.14000 | 3.14000 |  3.14000 |
| 3.14150 | 3.14150 | 3.14150 |  3.14150 |
| 3.14159 | 3.14159 | 3.14159 |  3.14159 |
| 3.14150 | 3.14150 | 3.14150 |  3.14150 |
| 3.14159 | 3.14159 | 3.14159 |  3.14159 |
+---------+---------+---------+----------+
5 rows in set (0.00 sec)
mysql> describe test_types;
+----------+---------------+------+-----+---------+-------+
| Field    | Type          | Null | Key | Default | Extra |
+----------+---------------+------+-----+---------+-------+
| flo      | float(10,5)   | YES  |     | NULL    |       |
| noomeric | decimal(10,5) | YES  |     | NULL    |       |
| deci     | decimal(10,5) | YES  |     | NULL    |       |
| dub      | double(10,5)  | YES  |     | NULL    |       |
+----------+---------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
I can see here that when creating the table the field with numeric type used DECIMAL (see describe command table).
Does anybody know an example showing differences between FLOAT, DECIMAL and DOUBLE please?
FLOAT and DOUBLE are meant for very small values or very large values.
Essentially they are the same thing, except for storage size and therefore precision (FLOAT uses 4 bytes, DOUBLE 8 bytes; see Data Type Storage Requirements).
The main thing about them is that they are approximate (see the quote below from the Oracle website):
Because floating-point values are approximate and not stored as exact
values, attempts to treat them as exact in comparisons may lead to
problems. They are also subject to platform or implementation
dependencies.
DECIMAL allows for an exact representation but the reason your DECIMAL column did not work for PI very well is because you allowed for only 5 decimal places but then you fed it 11 decimal places.
The best way to store the value of PI accurate to 11 decimal places is something like DECIMAL(12,11).
For an actual example for values being treated differently when stored as DECIMAL as opposed to same value being stored and used as a FLOAT see below:
-- DEC is a reserved word in MySQL, so the column name is quoted with backticks.
CREATE TABLE decimal_vs_float_test
( `dec` DECIMAL(12,11)
, fl FLOAT
);
INSERT INTO decimal_vs_float_test VALUES
( 3.947947949 , 3.947947949 )
,( 3.777777777 , 3.777777777 )
,( 3.555555555 , 3.555555555 )
,( 3.333333333 , 3.333333333 )
,( 3.111111111 , 3.111111111 )
;
-- Compare the two columns, which were given the same literal values.
SELECT * FROM decimal_vs_float_test WHERE fl = `dec`;
Now you can see the values for a DECIMAL or a FLOAT treated differently.
Hope that helps.
Additionally, FLOAT and DOUBLE are floating binary point types, whereas DECIMAL in MySQL is a fixed-point decimal type.
See this answer for more exact details on what that means, the differences in how the types are encoded, and when it is best to use which type (it's meant for C#, where decimal is a floating decimal point type, but it's still interesting).

need explanation for this MySQL query

I just came across this database query and wonder what exactly it does. Please clarify.
select * from tablename order by priority='High' DESC, priority='Medium' DESC, priority='Low' DESC;
Looks like it'll order the priority by High, Medium then Low.
Because if the order by clause was just priority DESC then it would do it alphabetical, which would give
Medium
Low
High
It basically lists all fields from the table "tablename" and ordered by priority High, Medium, Low.
So High appears first in the list, then Medium, and then finally Low
i.e.
* High
* High
* High
* Medium
* Medium
* Low
Where * is the rest of the fields in the table
Others have already explained what it does (High comes first, then Medium, then Low). I'll just add a few words about WHY that is so.
The reason is that the result of a comparison in MySQL is an integer - 1 if it's true, 0 if it's false. And you can sort by integers, so this construct works. I'm not sure this would fly on other RDBMS though.
Added: OK, a more detailed explanation. First of all, let's start with how ORDER BY works.
ORDER BY takes a comma-separated list of arguments which it evaluates for every row. Then it sorts by these arguments. So, for example, let's take the classical example:
SELECT * from MyTable ORDER BY a, b, c desc
What ORDER BY does in this case, is that it gets the full result set in memory somewhere, and for every row it evaluates the values of a, b and c. Then it sorts it all using some standard sorting algorithm (such as quicksort). When it needs to compare two rows to find out which one comes first, it first compares the values of a for both rows; if those are equal, it compares the values of b; and, if those are equal too, it finally compares the values of c. Pretty simple, right? It's what you would do too.
OK, now let's consider something trickier. Take this:
SELECT * from MyTable ORDER BY a+b, c-d
This is basically the same thing, except that before all the sorting, ORDER BY takes every row and calculates a+b and c-d and stores the results in invisible columns that it creates just for sorting. Then it just compares those values like in the previous case. In essence, ORDER BY creates a table like this:
+-------------------+-----+-----+-----+-----+-------+-------+
| Some columns here | A   | B   | C   | D   | A+B   | C-D   |
+-------------------+-----+-----+-----+-----+-------+-------+
|                   | 1   | 2   | 3   | 4   | 3     | -1    |
|                   | 8   | 7   | 6   | 5   | 15    | 1     |
|                   | ... | ... | ... | ... | ...   | ...   |
+-------------------+-----+-----+-----+-----+-------+-------+
And then sorts the whole thing by the last two columns, which it discards afterwards. You don't even see them in your result set.
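The "invisible columns" idea is the same as the decorate, sort, undecorate pattern in ordinary code. Here is a rough Python sketch of that mental model, using the two sample rows from the table above; it says nothing about how MySQL really implements ORDER BY.

```python
rows = [
    {"a": 8, "b": 7, "c": 6, "d": 5},
    {"a": 1, "b": 2, "c": 3, "d": 4},
]

# 1. Decorate: compute the sort expressions once per row ("invisible columns").
decorated = [((r["a"] + r["b"], r["c"] - r["d"]), r) for r in rows]

# 2. Sort by the computed values only.
decorated.sort(key=lambda pair: pair[0])

# 3. Undecorate: drop the helper values and keep the original rows.
result = [r for _, r in decorated]
print(result)  # the row with a+b = 3 now comes before the row with a+b = 15
```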
OK, something even weirder:
SELECT * from MyTable ORDER BY CASE WHEN a=b THEN c ELSE D END
Again - before sorting is performed, ORDER BY will go through each row, calculate the value of the expression CASE WHEN a=b THEN c ELSE D END and store it in an invisible column. This expression will always evaluate to some value, or you get an exception. Then it just sorts by that column, which contains simple values, not a fancy formula.
+-------------------+-----+-----+-----+-----+-----------------------------------+
| Some columns here | A   | B   | C   | D   | CASE WHEN a=b THEN c ELSE D END   |
+-------------------+-----+-----+-----+-----+-----------------------------------+
|                   | 1   | 2   | 3   | 4   | 4                                 |
|                   | 3   | 3   | 6   | 5   | 6                                 |
|                   | ... | ... | ... | ... | ...                               |
+-------------------+-----+-----+-----+-----+-----------------------------------+
Hopefully you are now comfortable with this part. If not, re-read it or ask for more examples.
Next thing is the boolean expressions. Or rather the boolean type, which for MySQL happens to be an integer. In other words SELECT 2>3 will return 0 and SELECT 2<3 will return 1. That's just it. The boolean type is an integer. And you can do integer stuff with it too. Like SELECT (2<3)+5 will return 6.
OK, now let's put all this together. Let's take your query:
select * from tablename order by priority='High' DESC, priority='Medium' DESC, priority='Low' DESC;
What happens is that ORDER BY sees a table like this:
+-------------------+----------+-----------------+-------------------+----------------+
| Some columns here | priority | priority='High' | priority='Medium' | priority='Low' |
+-------------------+----------+-----------------+-------------------+----------------+
|                   | Low      | 0               | 0                 | 1              |
|                   | High     | 1               | 0                 | 0              |
|                   | Medium   | 0               | 1                 | 0              |
|                   | Low      | 0               | 0                 | 1              |
|                   | High     | 1               | 0                 | 0              |
|                   | Low      | 0               | 0                 | 1              |
|                   | Medium   | 0               | 1                 | 0              |
|                   | High     | 1               | 0                 | 0              |
|                   | Medium   | 0               | 1                 | 0              |
|                   | Low      | 0               | 0                 | 1              |
+-------------------+----------+-----------------+-------------------+----------------+
And it then sorts by the last three invisible columns, which are discarded later.
Does it make sense now?
(P.S. In reality, of course, there are no invisible columns and the whole thing is made much trickier to get good speed, using indexes if possible and other stuff. However it is much easier to understand the process like this. It's not wrong either.)
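If it helps, the whole trick can be mimicked in Python, where booleans are also integers (True == 1, False == 0). This is only an analogy for the query in the question, using the ten priority values from the table above; `sort_key` is a name made up for the example.

```python
priorities = ["Low", "High", "Medium", "Low", "High",
              "Low", "Medium", "High", "Medium", "Low"]

def sort_key(p):
    # One boolean per ORDER BY term; each compares as 1 (true) or 0 (false).
    return (p == "High", p == "Medium", p == "Low")

# reverse=True plays the role of DESC on every term.
ordered = sorted(priorities, key=sort_key, reverse=True)
print(ordered)  # three High, then three Medium, then four Low

# A plain descending sort on the text would be alphabetical instead.
print(sorted(set(priorities), reverse=True))  # ['Medium', 'Low', 'High']
```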