Get rows product (multiplication) - mysql

SO,
The problem
I have an issue with multiplying values across rows. SQL has a SUM() function, which calculates the sum of a field over a set of rows. I want the product instead, i.e. for the table
+------+
| data |
+------+
|    2 |
|   -1 |
|    3 |
+------+
the result should be 2*(-1)*3 = -6. I'm using the DOUBLE data type to store my values.
My approach
From school math it is known that log(A x B) = log(A) + log(B), so that can be used to create the desired expression like:
SELECT
  IF(COUNT(IF(SIGN(`col`)=0,1,NULL)),0,
    IF(COUNT(IF(SIGN(`col`)<0,1,NULL))%2,-1,1)
    *
    EXP(SUM(LN(ABS(`col`))))) as product
FROM `test`;
Here you can see a weakness of this method: since log(X) is undefined when X<=0, I need to count the negative signs before calculating the whole expression. Sample data and a query for this are given in this fiddle.
Another weakness is that we need to check whether there is a 0 among the column values. (This is only a sample; in the real situation I'm going to select the product for a subset of table rows matching some condition(s), so I cannot simply remove the zeros from my table, because a zero product is a valid and expected result for some subsets of rows.)
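To make the logic of the query easier to follow, here is a minimal Python sketch of the same idea (zero check, sign parity, sum of logarithms of absolute values). The function name product_via_logs is made up for illustration, and Python floats are used because they are IEEE 754 doubles like MySQL's DOUBLE.

```python
import math

def product_via_logs(values):
    """Multiply values by summing the logarithms of their absolute values."""
    values = list(values)
    # A single zero makes the whole product zero; log(0) is undefined.
    if any(v == 0 for v in values):
        return 0.0
    # The sign of the product depends only on the parity of the negatives.
    negatives = sum(1 for v in values if v < 0)
    sign = -1.0 if negatives % 2 else 1.0
    # fsum keeps the rounding error of the long sum small.
    log_sum = math.fsum(math.log(abs(v)) for v in values)
    return sign * math.exp(log_sum)

print(product_via_logs([2, -1, 3]))  # close to -6, up to rounding
print(product_via_logs([2, 0, 3]))
```

The result is only approximately equal to the exact product, because log and exp both round.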
Specifics
And now, finally, the main part of my question: how do I handle an expression like X*Y*Z where X < MAXF and Y < MAXF, but X*Y > MAXF and X*Y*Z < MAXF, so that an intermediate result can overflow the data type (here MAXF is the limit of the MySQL DOUBLE type)? The sample is here. The query above works well, but can I always be sure that it will handle this properly? I.e. maybe there is another overflow case where some sub-product overflows although the entire product is fine.
Or maybe there is another way to compute a product over rows? Also, the table may contain millions of records (mainly -1.1<X<=1.1, but probably also values such as 100 or 1000, i.e. high enough to overflow DOUBLE if enough of them are multiplied together, given the issue I've described above). Might calculating via log be slow?

I guess this would work...
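SELECT IF(MOD(SUM(data < 0),2)=1 -- SUM counts the negatives; COUNT would count every non-NULL row
, EXP(SUM(LOG(ABS(data))))*-1 -- LOG() of a negative value is NULL, so take ABS() first
, EXP(SUM(LOG(ABS(data)))))
x
FROM my_table; -- note: rows where data = 0 are not handled here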

If you need this type of calculation often, I suggest you store the signs and the logarithms in separate columns.
The signs can be stored as 1 (for positives), -1 (for negatives) and 0 (for zero).
For zero, the logarithm can be stored as 0 (or any other value), but it should not be used in calculations.
Then the calculation would be:
SELECT
  CASE WHEN EXISTS (SELECT 1 FROM test WHERE <condition> AND datasign = 0)
    THEN 0
    ELSE (SELECT 1-2*(SUM(datasign=-1)%2) FROM test WHERE <condition>)
  END AS resultsign,
  CASE WHEN EXISTS (SELECT 1 FROM test WHERE <condition> AND datasign = 0)
    THEN -1 -- undefined log for result 0
    ELSE (SELECT SUM(datalog) FROM test WHERE <condition> AND datasign <> 0)
  END AS resultlog
;
This way, you have no overflow problems while summing. You can check whether resultlog exceeds some limit, or just try to calculate resultdata = resultsign * EXP(resultlog) and see if an error is thrown.
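The same bookkeeping can be sketched outside the database. The Python below is only an illustration under assumptions: combine and to_value are hypothetical helper names, the rows are (datasign, datalog) pairs as in the columns described above, and the limit is taken from the largest finite double.

```python
import math
import sys

# Natural log of the largest finite double, roughly 709.78.
LOG_MAX = math.log(sys.float_info.max)

def combine(rows):
    """rows: iterable of (datasign, datalog) pairs from the precomputed columns."""
    rows = list(rows)
    if any(sign == 0 for sign, _ in rows):
        return 0, None  # the product is zero, its logarithm is undefined
    negatives = sum(1 for sign, _ in rows if sign == -1)
    result_sign = 1 - 2 * (negatives % 2)
    result_log = math.fsum(log for _, log in rows)
    return result_sign, result_log

def to_value(result_sign, result_log):
    """Turn (sign, log) back into a number, refusing values that cannot fit."""
    if result_sign == 0:
        return 0.0
    if result_log > LOG_MAX:
        raise OverflowError("product does not fit in a double")
    return result_sign * math.exp(result_log)

rows = [(1, math.log(2)), (-1, math.log(1)), (1, math.log(3))]
print(to_value(*combine(rows)))  # close to -6, up to rounding
```

Keeping the sign and the logarithm separate means the overflow check is a plain comparison on the summed logarithm, done before EXP is ever applied.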

This question is a remarkable one in the sea of low quality ones. Thank you, even reading it was a pleasure.
Precision
The exp(log(a)+log(b)) idea is a good one in itself. However, after reading "What Every Computer Scientist Should Know About Floating-Point Arithmetic", make sure you use DECIMAL or NUMERIC data types to be sure you are using Precision Math, or else your values will be surprisingly inaccurate. For a couple of million rows, errors can add up very quickly! DECIMAL (as per the MySQL doc) has a maximum of 65 digits of precision, while 64-bit IEEE 754 floating point values have only about 15 to 16 significant decimal digits (53-bit significand, log10(2^53) = 15.95)! Keep in mind, though, that LN() and EXP() return approximate (DOUBLE) values whatever the argument type, so DECIMAL only helps where the arithmetic itself stays exact.
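The difference in precision can be seen with a small Python sketch. Python's float is an IEEE 754 double, and the decimal module is used here only as a stand-in for exact DECIMAL arithmetic; the 65-digit setting mirrors the MySQL limit mentioned above and is otherwise an arbitrary choice.

```python
import sys
from decimal import Decimal, getcontext

# Properties of the double type on this platform.
print(sys.float_info.mant_dig)  # significand bits
print(sys.float_info.dig)       # decimal digits that always survive a round trip

# Exact decimal arithmetic with up to 65 significant digits.
getcontext().prec = 65
decimal_product = Decimal("1.1") * Decimal("1.1")

# The same multiplication in binary floating point is only approximate.
float_product = 1.1 * 1.1

print(decimal_product)
print(float_product)
```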
Overflow
As per the relevant part of the MySQL doc:
Integer overflow results in silent wraparound.
DECIMAL overflow results in a truncated result and a warning.
Floating-point overflow produces a NULL result. Overflow for some operations can result in +INF, -INF, or NaN.
So you can detect floating point overflow if it would ever happen.
Sadly, if a series of operations would result in a correct value, fitting into the data type used, but at least one subresult in the process of calculations would not, then you won't get the correct value at the end.
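A small Python sketch shows the effect with made-up values. One difference to keep in mind: Python's float multiplication yields inf on overflow, while MySQL, per the doc quoted above, yields NULL.

```python
import math

# Each factor fits in a double, and so does the full product (about 1e100).
x, y, z = 1e200, 1e200, 1e-300

direct = x * y * z      # x * y overflows to inf, and inf * z stays inf
reordered = x * z * y   # same factors, no intermediate overflow
via_logs = math.exp(math.log(x) + math.log(y) + math.log(z))

print(direct)
print(reordered)
print(via_logs)
```

The sum of logarithms stays far below the overflow threshold of about 709.78, which is why the log-based query is not affected by the order of the rows.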
Performance
Premature optimization is the root of all evil. Try it, and if it is slow, take the appropriate actions. Doing this might not be lightning quick, but still might be quicker than getting all the results, and doing it on the application server. Only measurements can decide which gets to be quicker...

Related

MySQL round in query, wrong result

I have a question about a query that I'm running on a MySQL Server (v5.5.50-0+deb8u1).
SELECT 12 - (SELECT qty FROM Table WHERE id = 5213) AS Amount
so the Amount value is 12 - 8.5500000000000007 = 3.4499999999999993
But if I run the query:
SELECT qty FROM Table WHERE id = 5213
it returns 8.55, which is the correct number written in the record, so I was expecting the first query to return 3.45.
The "qty" column in the table "Table" is a DOUBLE.
How is this possible? How can I get the right answer from the query?
Thanks in advance
Well that's just the way floating numbers are.
Floating-point numbers sometimes cause confusion because they are
approximate and not stored as exact values. A floating-point value as
written in an SQL statement may not be the same as the value
represented internally.
This statement holds true for many programming languages as well. Some numbers don't even have an exact representation. Here's something from the Python manual:
The problem is easier to understand at first in base 10. Consider the
fraction 1/3. You can approximate that as a base 10 fraction:
0.3
or, better,
0.33
or, better,
0.333
and so on. No matter how many digits you’re willing to write down, the
result will never be exactly 1/3, but will be an increasingly better
approximation of 1/3.
In the same way, no matter how many base 2 digits you’re willing to
use, the decimal value 0.1 cannot be represented exactly as a base 2
fraction. In base 2, 1/10 is the infinitely repeating fraction
0.0001100110011001100110011...
So, in short, a float1 = float2 type of comparison is generally a bad idea, but everyone keeps forgetting it.
You can define 'qty' column as decimal(10,2)
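The subtraction from the question can be reproduced in Python, whose float is the same IEEE 754 double as MySQL's DOUBLE. This is just a sketch; the decimal module stands in for a decimal(10,2) column.

```python
from decimal import Decimal

qty = 8.55                 # stored as the nearest double, slightly above 8.55
amount = 12 - qty          # slightly below 3.45

# Exact decimal arithmetic gives the expected result.
exact_amount = Decimal("12") - Decimal("8.55")

# Rounding the double result to two places also recovers 3.45 for display.
rounded = round(amount, 2)

print(amount)
print(exact_amount)
print(rounded)
```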

How to know the number of positions on the right of the decimal point in a float?

I'm preparing some mapping sheets for migrating an existing MySQL database to a new Oracle one. Some of the columns are defined as float, and I would like to know the maximum number of digits after the decimal point among the values in such a column. This would help me restrict the data type instead of declaring it as a plain NUMBER.
Is there an easy way to do this in MySQL? I've tried a regular expression, but it does not match all values: I've found a value like 7.34397493274, yet the following regex does not retrieve it:
SELECT column
from `db`.`table`
where column REGEXP '^-?[0-9]+\.[0-9]{7,}$' =1;
Thanks
You are going down the wrong track. There is no convenient answer to "how many digits are to the right of the decimal point in a floating point number". There is an answer for the "precision" of a floating point number: for a single-precision FLOAT it is 24 bits (23 of them stored explicitly), or roughly 7 significant decimal digits. The relationship between the precision and the number of digits to the right of the decimal point depends on the scale factor (the exponent).
You might want to review the documentation entitled Problems with Floating Point Numbers.
More concretely, the problem is that a particular number might be represented as:
1.200000000001
or
1.199999999997
(I'm not saying these are actual representations, just examples.) What value would you give for the numbers to the right of the decimal point? By representing the values as floats, the database has lost this information.
Instead, you have several options:
Just use NUMBER, which is generally a reasonable type.
Use BINARY_FLOAT, which would be the same type.
Understand the application to figure out how many decimal points are actually needed.
Play games with the representation, looking for strings of zeros and nines (say four in a row) and assume they are not significant.
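To see why the stored value no longer carries the original number of decimals, here is a Python sketch that rounds the example value to single precision. Both as_float32 and decimals_shown are hypothetical helpers, and counting digits in repr() is just one possible convention.

```python
import struct

def as_float32(value):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

def decimals_shown(value):
    """Count the digits after the point in the shortest round-trip text form."""
    text = repr(value)
    # Assumes plain notation; values printed with an exponent are not handled.
    return len(text.split(".")[1]) if "." in text else 0

original = 7.34397493274
stored = as_float32(original)

print(original, decimals_shown(original))
print(stored, decimals_shown(stored))
```

The stored value differs from the original after about seven significant digits, so the count obtained from it says more about the representation than about the data.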
If you are looking to find the length after the decimal point, then in MySQL you can use the substring_index and length functions together:
mysql> select length(substring_index('7.34397493274','.',-1)) as len;
+-----+
| len |
+-----+
| 11 |
+-----+
1 row in set (0.00 sec)

Give a unique 6 or 9 digit number to each row

Is it possible to assign a unique 6 or 9 digit number to each new row using only MySQL?
Example :
id1 : 928524
id2 : 124952
id3 : 485920
...
...
P.S : I can do that with php's rand() function, but I want a better way.
MySQL can assign unique continuous keys by itself. If you don't want to use rand(), maybe this is what you meant?
I suggest you manually set the ID of the first row to 100000, then tell the database to auto increment. Next row should then be 100001, then 100002 and so on. Each unique.
Don't know why you would ever want to do this, but you will have to use PHP's rand() function and check whether the number is already in the database. If it is, start from the beginning again; if it's not, use it for the id.
Essentially you want a cryptographic hash that's guaranteed not to have a collision for your range of inputs. Nobody seems to know the collision behavior of MD5, so here's an algorithm that's guaranteed not to have any: Choose two large numbers M and N that have no common divisors-- they can be two very large primes, or 2**64 and 3**50, or whatever. You will be generating numbers in the range 0..M-1. Use the following hashing function:
H(k) = k*N (mod M)
Basic number theory guarantees that the sequence has no collisions for k in the range 0..M-1: because N and M share no common divisor, multiplication by N is a one-to-one mapping modulo M. So as long as the IDs in your table are less than M, you can just hash them with this function and you'll have distinct hashes. If you use unsigned 64-bit integer arithmetic, you can let M = 2**64. N can then be any odd number (I'd choose something large enough to ensure that k*N > M), and you get the modulo operation for free as arithmetic overflow!
I wrote the following in comments but I'd better repeat it here: This is not a good way to implement access protection. But it does prevent people from slurping all your content, if M is sufficiently large.
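For the 6-digit case, the scheme could look like the Python sketch below. The concrete numbers are my own assumptions: M = 10**6 so results fit in six digits, and N = 3**18, which shares no divisor with M. The names scramble and format_id are made up for illustration.

```python
from math import gcd

M = 10**6          # results fall in 0..999999, i.e. at most six digits
N = 387_420_489    # 3**18, coprime to M = 2**6 * 5**6

def scramble(k):
    """H(k) = k*N (mod M); distinct for distinct k in 0..M-1."""
    return (k * N) % M

def format_id(k):
    """Zero-pad so every ID is shown with exactly six digits."""
    return f"{scramble(k):06d}"

print(gcd(N, M))
print([format_id(k) for k in range(1, 4)])
```

Here k would be the ordinary auto-increment key, so the database still guarantees uniqueness and the scrambled number is derived from it.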

The correct way to manipulate doubles on MYSQL (precision)?

I have a 1-cent-auction website that increases the bid by 1 cent on every bid.
The current_bid field is a DOUBLE in MySQL that represents the bid in dollars, and I need to avoid cases like 0.2 + 0.1 = 0.299999999
(not sure if that's the exact result, but you get the idea).
I have had lots of such cases, with other numbers too, because of precision.
Now, here is my code (I am hoping it's correct and efficient; otherwise, I am open to your ideas):
UPDATE `auctions` SET
`current_bid` = ROUND(ROUND(`current_bid` * 100) + 1)/100
...
Is it too late to switch to DECIMAL? Floating point column types provide approximate values; it's a well known fact and it's by design.
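The difference between the two column types can be sketched in Python, with the decimal module standing in for a DECIMAL column and float for DOUBLE. The function names are hypothetical and the bid counts are arbitrary.

```python
from decimal import Decimal

CENT = Decimal("0.01")

def place_bids_decimal(start, count):
    """Add one cent per bid using exact decimal arithmetic."""
    bid = Decimal(start)
    for _ in range(count):
        bid += CENT
    return bid

def place_bids_float(start, count):
    """Add one cent per bid using binary doubles; rounding errors can accumulate."""
    bid = float(start)
    for _ in range(count):
        bid += 0.01
    return bid

print(place_bids_decimal("0.00", 100))
print(place_bids_float("0.00", 100))
print(0.1 + 0.2 == 0.3)
```

With exact decimal arithmetic no ROUND() workaround is needed, because 0.01 is representable exactly.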

Intriguing sql query (not in select)

I have the following sql query in mysql:
SELECT *
FROM _t_test
WHERE pret NOT
IN ( 2.6700, 2.6560, 1.8200 )
I would expect the rows with the value 1.8200 not to be shown, yet I still get them.
Am I missing something?
The field "pret" is double(16,4).
This is a rounding error. A double is not an exact value: 1.8200 cannot be represented exactly, so the stored value and the literal in the query are not necessarily the same.
For MYSQL floating points, see http://dev.mysql.com/doc/refman/5.0/en/problems-with-float.html
The correct way to do floating-point number comparison is to first
decide on an acceptable tolerance for differences between the numbers
and then do the comparison against the tolerance value. For example,
if we agree that floating-point numbers should be regarded the same if
they are same within a precision of one in ten thousand (0.0001), the
comparison should be written to find differences larger than the
tolerance value
See http://en.wikipedia.org/wiki/Double_precision_floating-point_format
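The tolerance-based comparison from the quote can be sketched in Python as follows. The helper name not_in_with_tolerance is made up, the tolerance of 0.0001 is the one from the quote, and the list holds the values from the question.

```python
TOLERANCE = 0.0001

def not_in_with_tolerance(value, excluded, tol=TOLERANCE):
    """True if value differs from every excluded number by more than tol."""
    return all(abs(value - e) > tol for e in excluded)

excluded = [2.6700, 2.6560, 1.8200]

# A value that is 1.82 up to a tiny representation error is still excluded.
print(not_in_with_tolerance(1.82 + 1e-12, excluded))

# A clearly different value passes the filter.
print(not_in_with_tolerance(1.9, excluded))

# Exact comparison, by contrast, can miss values that should be equal.
print((0.1 + 0.2) in [0.3])
```

In SQL terms this corresponds to testing ABS(pret - 1.82) > 0.0001 for each listed value instead of using NOT IN.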