Is there a similar function like mt_rand for MYSQL? I've searched everywhere but can't seem to find one.
DELIMITER $$
CREATE FUNCTION `my_rand`(`arg_min` INT, `arg_max` INT) RETURNS int(11)
BEGIN
RETURN ROUND( arg_min + RAND( ) * (arg_max-arg_min) );
END
invocation:
SELECT my_rand(10,15)
will output one integer number between 10 and 15.
As a sidenote... if you wanted the benefits of mt_rand, intending 'Marsenne Twister', then I fear you should look for an UDF implementation (maybe you should check this Statistics for mySQL).
If instead you look for a function that simply spits out a random integer between min and max, this should do the trick.
also, if we used FLOOR as a rounding function, it would have been impossible to get 15 as a result of my example.
this is because, from the docs:
RAND() Returns a random floating-point value v in the range 0 <= v <
1.0.
Reading immediately after RAND, you can find ROUND in the docs... it states that ROUND has pretty implementation-dependant behaviour for rounding, so its behaviour may change (and has changed) between mysql versions. This is probably the reason why the docs suggest rounding RAND with FLOOR instead of ROUND. So, basically it seems debatable which choice is best... "FLOOR or ROUND, make your choice..."
Finally... take a look at this article. It may seem scary but at least it makes it clear that the question was 'difficult', indeed.
Order by RAND() by Jan Kneschke
Related
SO,
The problem
I have an issue with rows multiplication. In SQL, there is a SUM() function which calculates sum for some field for set of rows. I want to get multiplication, i.e. for table
+------+
| data |
+------+
| 2 |
| -1 |
| 3 |
+------+
that will be 2*(-1)*3 = -6 as a result. I'm using DOUBLE data type for storing my data values.
My approach
From school math it is known that log(A x B) = log(A) + log(B) - so that could be used to created desired expression like:
SELECT
IF(COUNT(IF(SIGN(`col`)=0,1,NULL)),0,
IF(COUNT(IF(SIGN(`col`)<0,1,NULL))%2,-1,1)
*
EXP(SUM(LN(ABS(`col`))))) as product
FROM `test`;
-here you see weakness of this method - since log(X) is undefined when X<=0 - I need to count negative signs before calculating whole expression. Sample data and query for this is given in this fiddle.
Another weakness is that we need to find if there is 0 among column values (Since it is a sample, in real situation I'm going to select product for some subset of table rows with some condition(s) - i.e. I can not simply remove 0-s from my table, because result zero product is a valid and expected result for some rows subsets)
Specifics
And now, finally, my question main part: how to handle situation when we have expression like: X*Y*Z and here X < MAXF, Y<MAXF, but X*Y>MAXF and X*Y*Z<MAXF - so we have possible data type overflow (here MAXF is limit for double MySQL data type). The sample is here. Query above works well, but can I always be sure that it will handle that properly? I.e. may be there is another case with overflow issue when some sub-products causing overflow, but entire product is ok (without overflow).
Or may be there is another way to find rows product? Also, in table there possibly be millions of records (-1.1<X<=1.1 mainly, but probably with values such as 100 or 1000 - i.e. high enough to overflow DOUBLE if multiplied with certain quantity if we have an issue that I've described above) - may be calculating via log will be slow?
I guess this would work...
SELECT IF(MOD(COUNT(data < 0),2)=1
, EXP(SUM(LOG(data)))*-1
, EXP(SUM(LOG(data))))
x
FROM my_table;
If you need this type of calculations often, I suggest you store the signs and the logarithms in separate columns.
The signs can be stored as 1 (for positives), -1 (for negatives) and 0 (for zero.)
The logarithm can be assigned for zero as 0 (or any other value) but it should not be used in calculations.
Then the calculation would be:
SELECT
CASE WHEN EXISTS (SELECT 1 FROM test WHERE <condition> AND datasign = 0)
THEN 0
ELSE (SELECT 1-2*(SUM(datasign=-1)%2) FROM test WHERE <condition>)
END AS resultsign,
CASE WHEN EXISTS (SELECT 1 FROM test WHERE <condition> AND datasign = 0)
THEN -1 -- undefined log for result 0
ELSE (SELECT SUM(datalog) FROM test WHERE <condition> AND datasign <> 0)
END AS resultlog
;
This way, you have no overflow problems. You can check the resultlog if it exceeds some limits or just try to calculate resultdata = resultsign * EXP(resultlog) and see if an error is thrown.
This question is a remarkable one in the sea of low quality ones. Thank you, even reading it was a pleasure.
Precision
The exp(log(a)+log(b)) idea is a good one in itself. However, after reading "What Every Computer Scientist Should Know About Floating-Point Arithmetic", make sure you use DECIMAL or NUMERIC data types to be sure you are using Precision Math, or else your values will be surprisingly inaccurate. For a couple of million rows, errors can add up very quickly! DECIMAL (as per the MySQL doc) has a maximum of 65 digits precision, while for example 64bit IEEE754 floating point values have only up to 16 digits (log10(2^52) = 15.65) precision!
Overflow
As per the relevant part of the MySQL doc:
Integer overflow results in silent wraparound.
DECIMAL overflow results in a truncated result and a warning.
Floating-point overflow produces a NULL result. Overflow for some operations can result in +INF, -INF, or NaN.
So you can detect floating point overflow if it would ever happen.
Sadly, if a series of operations would result in a correct value, fitting into the data type used, but at least one subresult in the process of calculations would not, then you won't get the correct value at the end.
Performance
Premature optimization is the root of all evil. Try it, and if it is slow, take the appropriate actions. Doing this might not be lightning quick, but still might be quicker than getting all the results, and doing it on the application server. Only measurements can decide which gets to be quicker...
I have a query
SELECT MAX(CAST(user_name as SIGNED)) as max_id FROM (`users`)
it returns
2.01303045556E+12
but actually the maximum value is 2013030455555
Anybody know how it happens??
That is correct.
2.01303045556E+12 actually IS 2013030455555.
x E+12 means x*10 ^ 12
2*10^12=2000000000000 (2 followed by 12 zeros).
This is expotential (usually floating point) number representation. See Scientific notation at wikipedia (scroll down to "E notation").
To get rid of it you may cast that data to decimal or integer, instead of float. Maybe there are better methods, but I dont know them.
Example:
-- example for 16 digits
SELECT MAX(CAST(user_name as DECIMAL(16,0)) as max_id FROM (`users`)
Another solution: change format of the number in SQL or maybe PHP if you are using it.
First at all I have a very concrete question, but maybe an alternative approach to my problem (second part) could also help me.
Is there a way to address a character in a string via its index in mysql. (i.e. in PHP $var[2] will give you the 3rd charater)?
The obvious way is SUBSTRING(var, 3,1 ) but since my strings are 1024 character long I assume this is not the fastest solution. As displayed in the code sample using substring to retrieve the tail of the string also gain no performance difference. Is there maybe a way to iterate over a string? (Shift the first element?)
CREATE FUNCTION hashDiff( hash1 TEXT(1024), hash2 TEXT(1024), threshold INT)
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE diff, x, b1, b2 INT;
SET diff =0;
SET x = 0;
WHILE (x<1024 AND diff<threshold) DO
SET b1 = ASCII(hash1); --uses first character only!!
SET b2 = ASCII(hash2);
SET hash1=SUBSTRING(hash1, 2 );
SET hash2=SUBSTRING(hash2, 2 );
SET diff=diff+ ((b1-b2)*(b1-b2));
SET x=x+1;
END WHILE;
RETURN diff;
END
If you not already read it from the code, I try to write a stored procedure to calculate the difference or distance between to hashes. The difference is the sum of the character-wise square distances (i.e. hashDiff(AA,AC)=(65-65)²+(65-67)²=4). The first major performance boost could be achieved by introducing a threshold to cancel the calculation if the hashes are already to different. But since mysql is not my "every day" language, I stuck at this point in finding other optimizations. For completeness two sample hashes:
YAAAAAAYAAAYAAVAAQAARAOAAOAQASAQAMAKAKAJIAJAJIAHAHIAKJAIIAHHAHIIAIHGAGFFAGGFEAFEEEEAEDDDDDAEEEEDEEEFAFFFFFFEFFFEFFFFFGFEEFFEEEFFFJEFFEEEEEEELFFFFEEFJEEEEDIEEEEEIEEEEHEEEJEEFKFEFKGGFNHGOIIJTJKYONYNMTGHNHHQISJJQIKWLXJJSMYRQWJOGKDDFCCBBAAAAAAAAAAAAAAAAAAAAAAAYAAAAAAYAAAYAAWAARAASASAAQARAUAYAYATAOALKAJAJIAIAHHAHGAGFAFFAEFFAEFFAFFFAEEFFAFEEEDADEDDDDADDDCDDDDDAEEEFEEEEDDDEEEDEDDEEEFEFFGGFMFHGFFFGFFFLGHGGHGGNHHGGGOHGHGHMGGFGMFFFMFGFLFFFMGFFMGGMGGGNGGMGGLGGLGGMGGLEIEEHDCGCGCDGDGDCGDFCECCECECECECFCECFCFCFCFCFCGCJGYCYAAAAAAYAAAYAAUAATAAUAUAAUARARAQAPAPASARRAPARQAPAQQAQQAQSAKMATKKAIIHAIHGAGGGGAGHHGGAGGFGFFAFFGEFFFFFAFFGFGGGFFFEEFGFFGGFGGHIJJLKLWLKJJIJJJKJRLJKLKKKUKLLKKUMMKJIQIIIISKJJWKLLXMLMYMLNYMMYMLLWJIQIINFGKFFKEEIDHEDHDDFCECCFDECCFCFDGCDGCGCGEGCDCECECFDFCGDGCIEKEOAYNFBREUXKPQMMQTKTMMNJLPPVYYYTOUOPOLLJKKJJJIJIMJJJLIJJLLJIIHHIHHHIGHIHIHJHHHJHHIHGHGHFGHGFFEFEEEFEFEFFGGHIHIHGHGHHIIIIHIIJMNLONKLKKKKKKKMLKKLONMKOOOMLOPONMNMKKLLKKLMNKLMMMNMOPPOORPORSSVRTSSRTRRTSSTTXSTQRPONOKKLKLJMKJJIJIIHHHIIIJHIJIJJIJIKJIMWMYYDAAAAAAAAAAA
AAAAAAAAAAABAABAACAACACAACADADAEADADADADDAEAEEAEAFEAEEAEFAFGAGGGAGGGAHHHAHIIIAIHIJHAIIHIHHAJIHIJIJKJAJJJIKJJJJKKJKJKKLKLKLLMMMNNMYOOOOOOPOONYOONONNPYNOOOPYOOPPPYNONNYMLLWLLKUJIISHIHOGGMFGFLFFMGGLFGLGFLFFKFKFFLEEKFLEFJFKFGNGNHLFHJFIEGDIEKGOIRFGBBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABABACACDACADDACADDADDAEDAFEAEEFAFFFAFFGAFGGGAIGHHIAHIHHHHAHIHIIJJIIAIIIIJIJKJIIIIJJHIIHIIIIJIIIIRJJJJKJJJJLVKLLKLLKXLMMKMXMLLLMWMMMMYMNLYMNNYNNMYMMNYMLYLMLXKJRIHPHIMGGMFEJEJEEIEEHDGCDFCFDCFCECECCEBEBECFDGCFDNGLDBAAAAAAAAAAAAAAAAAAAAAABAAAAABAABABAACACACACACACACADDADAEEAFAFGAFGAHGAGGAGGHAGGIAIHJAJJJJAJKKKKAMLMNNNANOMMNNMMNAONMNOOOMOOPOMNOMMNPOOPPPPRQQYPPRPPPPPNOYLLMMMMLYLMLMLYLMLMMYLNNMYNLLWMLKXLLLUKIKQIIQGHHPFHNGFLFFLGFJEEJEIDDIDCHDFCDGCFCCFCECECCECFCGDGDHDHDIFIDEBBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAABBBBBCBCCCCDCCCCCCCCDDDDEDEEEEFFDEGGHGHHHGHHHHHHIIJJJJJIJJJJJJIKJJKLKKMMNMMMMMMMNNNNNNLNNONPONNNOOOOPQQQRSSSSSSUTSTUUUVWVVXUYXWVXVXWYVYWYVYYUWVUTTSSPQPQOPOPONONOMONOOONNNMMNLJJKJIIJHHGGGFHFGFFFFEEEDDEEEEFGGIGJLRNEAAAAAAAAAAAAA
Any help or hint would be appreciated.
The only way you would be able to use an array of sorts would be to use a temporary tables and cursors/resultsets.
The problem is you will still need to iterate over the strings and use substring to break them apart. To my knowledge there is no 'wordwrap' or 'explode' function to chop up the string.
I'm building a 'find my nearest' script whereby my client has provided me with a list of their locations. After some research, I determined that the way to do this was to geocode the address/postcode given by the user, and use the Haversine formula to calculate the distance.
Formula wise, I got the answer I was looking for from this question (kudos to you guys). So I won't repeat the lengthy query/formula here.
What i'd like to have been able to do though, as an example - is something like:
SELECT address, haversine(#myLat,#myLong,db_lat,db_long,'MILES') .....
This would be just easier to remember, easier to read later, and more re-usable by copying the function into future projects without having to relearn / re-integrate the big formula. Additionally, the last argument could help with being able to return distances in different units.
Is it possible to create a user MySQL function / procedure to do this, and how would I go about it? (I assume this is what they are for, but i've never needed to use them!)
Would it offer any speed difference (either way) over the long version?
Yes, you can create a stored function for this purpose. Something like this:
DELIMITER //
DROP FUNCTION IF EXISTS Haversine //
CREATE FUNCTION Haversine
( myLat FLOAT
, myLong FLOAT
, db_lat FLOAT
, db_long FLOAT
, unit VARCHAR(20)
)
RETURNS FLOAT
DETERMINISTIC
BEGIN
DECLARE haver FLOAT ;
IF unit = 'MILES' --- calculations
SET haver = ... --- calculations
RETURN haver ;
END //
DELIMITER ;
I don't think it offers any speed gains but it's good for all the other reasons you mention: Readability, reusability, ease of maintenance (imagine you find an error after 2 years and you have to edit the code in a (few) hundred places).
I am attempting to get a random bearing, from 0 to 359.9.
SET bearing = FLOOR((RAND() * 359.9));
I may call the procedure that runs this request within the same while loop, immediately one after the next. Unfortunately, the randomization seems to be anything but unique. e.g.
Results
358.07
359.15
357.85
I understand how randomization works, and I know because of my quick calls to the same function, the ticks used to generate the random number are very close to one another.
In any other situation, I would wait a few milliseconds in between calls or reinit my Random object (such as in C#), which would greatly vary my randomness. However, I don't want to wait in this situation.
How can I increase randomness without waiting?
I understand how randomization works, and I know because of my quick calls to the same function, the ticks used to generate the random number are very close to one another.
That's not quite right. Where folks get into trouble is when they re-seed a random number generator repeatedly with the current time, and because they do it very quickly the time is the same and they end up re-seeding the RNG with the same seed. This results in the RNG spitting out the same sequence of numbers each time it is re-seeded.
Importantly, by "the same" I mean exactly the same. An RNG is either going to return an identical sequence or a completely different one. A "close" seed won't result in a "similar" sequence. You will either get an identical sequence or a totally different one.
The correct solution to this is not to stagger your re-seeds, but actually to stop re-seeding the RNG. You only need to seed an RNG once.
Anyways, that is neither here nor there. MySQL's RAND() function does not require explicit seeding. When you call RAND() without arguments the seeding is taken care of for you meaning you can call it repeatedly without issue. There's no time-based limitation with how often you can call it.
Actually your SQL looks fine as is. There's something missing from your post, in fact. Since you're calling FLOOR() the result you get should always be an integer. There's no way you'll get a fractional result from that assignment. You should see integral results like this:
187
274
89
345
That's what I got from running SELECT FLOOR(RAND() * 359.9) repeatedly.
Also, for what it's worth RAND() will never return 1.0. Its range is 0 ≤ RAND() < 1.0. You are safe using 360 vs. 359.9:
SET bearing = FLOOR(RAND() * 360);