I would like to know whether there is a way to select a randomly generated number between 100 and 500 as part of a SELECT query.
Eg: SELECT name, address, random_number FROM users
I don't have to store this number in the database; it is only used for display purposes.
I tried something like this, but I can't get it to work:
SELECT name, address, FLOOR(RAND() * 500) AS random_number FROM users
I hope someone can help me out.
Thank you
This should give what you want:
FLOOR(RAND() * 401) + 100
Generically, FLOOR(RAND() * (<max> - <min> + 1)) + <min> generates a number between <min> and <max> inclusive.
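To see why the `+ 1` matters, here is a small Python sketch of the same arithmetic. Python is used only for illustration; `scaled_int` is a made-up helper name, and `r` stands in for the value RAND() returns (0 <= r < 1).

```python
import math

def scaled_int(r, lo, hi):
    """Mirror FLOOR(RAND() * (hi - lo + 1)) + lo for a given r in [0, 1)."""
    return math.floor(r * (hi - lo + 1)) + lo

# Try the smallest possible r and one very close to the upper limit.
print(scaled_int(0.0, 100, 500))       # 100
print(scaled_int(0.999999, 100, 500))  # 500
```

Without the `+ 1` the multiplier would be 400, and the largest result would be 499.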
Update
This full statement should work:
SELECT name, address, FLOOR(RAND() * 401) + 100 AS `random_number`
FROM users
RAND produces a number 0 <= v < 1.0 (see the documentation), so another way to reach both the lower bound (100 in this case) and the upper bound (500 in this case) is to use ROUND. Note that with ROUND the two end values come up only half as often as the values in between.
So to produce the range you need:
SELECT name, address, ROUND(100.0 + 400.0 * RAND()) AS random_number
FROM users
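As a rough check of how ROUND distributes the results, the sketch below computes the exact probability of each outcome of ROUND(100.0 + 400.0 * RAND()), assuming RAND() is uniform on [0, 1). It is plain Python rather than SQL, and `round_probability` is a hypothetical helper name.

```python
from fractions import Fraction

def round_probability(k, lo=100, hi=500):
    """Probability that ROUND(lo + (hi - lo) * r) == k for uniform r in [0, 1)."""
    span = hi - lo
    # ROUND maps x to k when k - 0.5 <= x < k + 0.5; clip that window to [lo, hi).
    left = max(Fraction(k) - Fraction(1, 2), Fraction(lo))
    right = min(Fraction(k) + Fraction(1, 2), Fraction(hi))
    return (right - left) / span

print(round_probability(100))  # 1/800
print(round_probability(300))  # 1/400
print(round_probability(500))  # 1/800
```

The end values 100 and 500 each get half the probability of an interior value, which is why the FLOOR form is usually preferred when every integer should be equally likely.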
In addition to this answer, you can create a function like
CREATE FUNCTION myrandom(
pmin INTEGER,
pmax INTEGER
)
RETURNS INTEGER(11)
NOT DETERMINISTIC
NO SQL
SQL SECURITY DEFINER
BEGIN
-- + 1 so that pmax itself can be returned
RETURN FLOOR(pmin + RAND() * (pmax - pmin + 1));
END;
and call it like
SELECT myrandom(100,300);
This gives you a random number between 100 and 300, inclusive.
You could create a random number using FLOOR(RAND() * n) AS randnum (n is an integer). However, if the same random number must not be repeated, you will have to store the numbers already used, for example in a temporary table, so that you can check new ones against it with WHERE randnum NOT IN (SELECT * FROM temptable)...
Both of these work nicely:
select round(<maxNumber>*rand())
FLOOR(RAND() * (<max> - <min> + 1)) + <min> // generates a number between <min> and <max> inclusive.
This is the correct formula to find integers from i to j where i <= R <= j:
FLOOR(min + RAND() * (max - min + 1))
Related
I have a table which contains thousands of rows and I would like to calculate the 90th percentile for one of the fields, called 'round'.
For example, select the value of round which is at the 90th percentile.
I don't see a straightforward way to do this in MySQL.
Can somebody provide some suggestions as to how I may start this sort of calculation?
Thank you!
First, let's assume that you have a table with a value column. You want to get the row with the 95th percentile value. In other words, you are looking for a value that is bigger than 95 percent of all values.
Here is a simple answer:
SELECT * FROM
(SELECT t.*, @row_num := @row_num + 1 AS row_num FROM YOUR_TABLE t,
(SELECT @row_num := 0) counter ORDER BY YOUR_VALUE_COLUMN)
temp WHERE temp.row_num = ROUND(0.95 * @row_num);
Compare solutions:
Number of seconds it took on my server to get the 99th percentile of 1.3 million rows:
LIMIT x,y with index and no where: 0.01 seconds
LIMIT x,y with no where: 0.7 seconds
LIMIT x,y with where: 2.3 seconds
Full scan with no where: 1.6 seconds
Full scan with where: 5.7 seconds
Fastest solution for large tables using LIMIT x,y:
Get the count of values: SELECT COUNT(*) AS cnt FROM t
Get the nth value, where n = (cnt - 1) * (1 - 0.95): SELECT k FROM t ORDER BY k DESC LIMIT n,1
This solution requires two queries, because MySQL does not support variables in the LIMIT clause except inside stored procedures (so it can be optimized with a stored procedure). The overhead of the additional query is usually very low.
This solution can be further optimized if you add an index on the k column and do not use complex WHERE clauses (about 0.01 seconds for a table with 1 million rows, because no sorting is needed).
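The offset arithmetic can be sketched in Python on an in-memory list, as a stand-in for the two SQL queries. The function names and the sample data are made up for illustration. Python's `round` handles exact .5 ties differently from PHP's, so treat the rounding as an approximation of the PHP code in this answer.

```python
def percentile_offset(cnt, percentile):
    """Offset n for ORDER BY k DESC LIMIT n,1, i.e. (cnt - 1) * (1 - percentile/100), rounded."""
    return abs(round((cnt - 1) * (100 - percentile) / 100.0))

def percentile_value(values, percentile):
    ordered = sorted(values, reverse=True)                       # ORDER BY k DESC
    return ordered[percentile_offset(len(ordered), percentile)]  # LIMIT n,1

values = list(range(1, 101))  # sample data: 1..100
print(percentile_offset(len(values), 95))  # 5
print(percentile_value(values, 95))        # 95
```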
Implementation example in PHP (can calculate percentile not only of columns, but also of expressions):
function get_percentile($table, $where, $expr, $percentile) {
    if ($where) $subq = "WHERE $where";
    else $subq = "";
    $r = query("SELECT COUNT(*) AS cnt FROM $table $subq");
    $w = mysql_fetch_assoc($r);
    $num = abs(round(($w['cnt'] - 1) * (100 - $percentile) / 100.0));
    $q = "SELECT ($expr) AS prcres FROM $table $subq ORDER BY ($expr) DESC LIMIT $num,1";
    $r = query($q);
    if (!mysql_num_rows($r)) return null;
    $w = mysql_fetch_assoc($r);
    return $w['prcres'];
}
// Usage example
$time = get_percentile(
    "state", // table
    "service='Time' AND cnt>0 AND total>0", // some filter
    "total/cnt", // expression to evaluate
    80); // percentile
The SQL standard supports the PERCENTILE_DISC and PERCENTILE_CONT inverse distribution functions for precisely this job. Implementations are available in at least Oracle, PostgreSQL, SQL Server, Teradata. Unfortunately not in MySQL. But you can emulate PERCENTILE_DISC in MySQL 8 as follows:
SELECT DISTINCT first_value(my_column) OVER (
ORDER BY CASE WHEN p <= 0.9 THEN p END DESC /* NULLS LAST */
) x
FROM (
SELECT
my_column,
percent_rank() OVER (ORDER BY my_column) p
FROM my_table
) t;
This calculates the PERCENT_RANK for each row given your my_column ordering, and then finds the last row for which the percent rank is less or equal to the 0.9 percentile.
This only works on MySQL 8+, which has window function support.
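To make the selection rule concrete, here is a small Python sketch of the same logic on a plain list: compute each value's percent rank, then keep the largest value whose rank does not exceed the requested fraction. `percentile_disc_emulated` is a hypothetical name, and the sketch only mirrors the query's logic; it is not how MySQL evaluates it.

```python
def percentile_disc_emulated(values, fraction):
    """Largest value whose PERCENT_RANK is <= fraction."""
    ordered = sorted(values)
    n = len(ordered)
    best = None
    for v in ordered:
        # PERCENT_RANK = (rank - 1) / (rows - 1); tied values share the lowest rank.
        rank = ordered.index(v) + 1
        p = (rank - 1) / (n - 1) if n > 1 else 0.0
        if p <= fraction:
            best = v
    return best

print(percentile_disc_emulated(list(range(1, 12)), 0.9))  # 10
```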
I was trying to solve this for quite some time and then I found the following answer. Honestly brilliant. Also quite fast even for big tables (the table where I used it contained approx 5 mil records and needed a couple of seconds).
SELECT
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(field_name ORDER BY
field_name SEPARATOR ','), ',', 95/100 * COUNT(*) + 1), ',', -1) AS DECIMAL)
AS `95th Per`
FROM table_name;
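Just replace table_name and field_name with your table's and column's names. Note that the result of GROUP_CONCAT is truncated at group_concat_max_len (1024 bytes by default), so that setting usually has to be raised for large tables.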
For further information check Roland Bouman's original post
In MySQL 8 there is the NTILE window function you can use:
SELECT SomeTable.ID, SomeTable.Round
FROM SomeTable
JOIN (
SELECT ID, NTILE(100) OVER w AS Percentile
FROM SomeTable
WINDOW w AS (ORDER BY Round)
) AS SomeTablePercentile ON SomeTable.ID = SomeTablePercentile.ID
WHERE Percentile = 90
LIMIT 1
https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html#function_ntile
http://www.artfulsoftware.com/infotree/queries.php#68
SELECT
a.film_id ,
ROUND( 100.0 * ( SELECT COUNT(*) FROM film AS b WHERE b.length <= a.length ) / total.cnt, 1 )
AS percentile
FROM film a
CROSS JOIN (
SELECT COUNT(*) AS cnt
FROM film
) AS total
ORDER BY percentile DESC;
This can be slow for very large tables, because the correlated subquery is evaluated once per row.
As per Tony_Pets' answer, but as I noted on a similar question, I had to change the calculation slightly. For example, for the 90th percentile I used "90/100 * COUNT(*) + 0.5" instead of "90/100 * COUNT(*) + 1". Sometimes it was skipping two values past the percentile point in the ordered list, instead of picking the next higher value for the percentile. This may be due to the way rounding to an integer works in MySQL.
For example:
.... SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(fieldValue ORDER BY fieldValue SEPARATOR ','), ',', 90/100 * COUNT(*) + 0.5), ',', -1) as 90thPercentile ....
The most common definition of a percentile is a number where a certain percentage of scores fall below that number. You might know that you scored 67 out of 90 on a test. But that figure has no real meaning unless you know what percentile you fall into. If you know that your score is in the 95th percentile, that means you scored better than 95% of people who took the test.
This solution also works with the older MySQL 5.7.
SELECT *, @row_num as numRows, 100 - (row_num * 100/(@row_num + 1)) as percentile
FROM (
select *, @row_num := @row_num + 1 AS row_num
from (
SELECT t.subject, pt.score, p.name
FROM test t, person_test pt, person p, (
SELECT @row_num := 0
) counter
where t.id=pt.test_id
and p.id=pt.person_id
ORDER BY score desc
) temp
) temp2
-- optional: filter on a minimal percentile (uncomment below)
-- having percentile >= 80
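The percentile expression in that query, 100 - (row_num * 100 / (numRows + 1)), can be tried out on a small list in Python. The scores below are invented sample data and `percentile_ranks` is a hypothetical helper; the sketch only reproduces the arithmetic, not the user-variable mechanics.

```python
def percentile_ranks(scores):
    """Pair each score with 100 - (row_num * 100 / (n + 1)), best score first."""
    ordered = sorted(scores, reverse=True)  # ORDER BY score DESC
    n = len(ordered)
    return [(s, 100 - (i * 100 / (n + 1))) for i, s in enumerate(ordered, start=1)]

for score, pct in percentile_ranks([60, 90, 70, 80]):
    print(score, pct)
```

With four rows the best score lands at 80.0 and the worst at 20.0, so the formula never reports exactly 0 or 100.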
An alternative solution that works in MySQL 8: generate a histogram of your data:
ANALYZE TABLE my_table UPDATE HISTOGRAM ON my_column WITH 100 BUCKETS;
And then just select the 95th record from information_schema.column_statistics:
SELECT v,c FROM information_schema.column_statistics, JSON_TABLE(histogram->'$.buckets',
'$[*]' COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist
WHERE column_name='my_column' LIMIT 95,1
And voila! You will still need to decide whether you take the lower or upper limit of the percentile, or perhaps take an average - but that is a small task now. Most importantly - this is very quick, once the histogram object is built.
Credit for this solution: lefred's blog.
Given an example query like
SELECT IF(
ABS(sum(grades) / count(*) * 100) < 1,
ROUND(sum(grades) / count(*) * 100, 2),
IF(ABS(sum(grades) / count(*) * 100) < 10,
ROUND(sum(grades) / count(*) * 100, 1),
ROUND(sum(grades) / count(*) * 100, 0)
)
)
from important_table
group by timestamp
Is it possible to avoid calculating the sum three times for every group, and instead store sum(grades) / count(*) * 100 in a variable beforehand, still grouped by timestamp, and use that in the query? The current query looks a bit dirty and repetitive to me.
Is that a real optimization possibility? Or does MySQL cache it anyway when executing the query?
I wouldn't worry about recalculating the values. The SQL optimizer can decide to calculate them only once if it wants (although I don't think the MySQL optimizer does this).
However, assuming that grades is not NULL, you can simplify your expressions by using avg():
select (case when ABS(avg(grades) * 100) < 1
then ROUND(avg(grades) * 100, 2)
when ABS(avg(grades) * 100) < 10
then ROUND(avg(grades) * 100, 1)
else ROUND(avg(grades) * 100, 0)
end)
from important_table
group by timestamp;
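For comparison, this is what "compute once, then pick the precision" looks like when the rounding is done in application code. It is a Python sketch with a hypothetical function name; `Decimal` with ROUND_HALF_UP is used because Python's built-in `round` breaks ties differently from MySQL's ROUND.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_by_magnitude(pct):
    """2 decimals below 1, 1 decimal below 10, otherwise none (mirrors the CASE expression)."""
    pct = Decimal(str(pct))  # the percentage is computed once and reused
    if abs(pct) < 1:
        quantum = Decimal("0.01")
    elif abs(pct) < 10:
        quantum = Decimal("0.1")
    else:
        quantum = Decimal("1")
    return pct.quantize(quantum, rounding=ROUND_HALF_UP)

print(round_by_magnitude(0.456))  # 0.46
print(round_by_magnitude(4.56))   # 4.6
print(round_by_magnitude(45.6))   # 46
```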
I need some help converting the MySQL query below to a DB2 query:
SELECT FROM_UNIXTIME(CEILING((UNIX_TIMESTAMP(count_datetime))/300)*300) AS t,
sum(count_web) as web,
sum(count_mobile) as mobile,
sum(count_total) as total
from clicks_user_count GROUP BY t
ORDER BY `t` DESC
CREATE OR REPLACE FUNCTION FROM_UNIXTIME (P_UTS BIGINT)
RETURNS TIMESTAMP
CONTAINS SQL
DETERMINISTIC
NO EXTERNAL ACTION
RETURN TIMESTAMP('1970-01-01-00.00.00') + CURRENT TIMEZONE + P_UTS SECONDS;
I think this is the equivalent of your query:
SELECT
timestamp(date(count_datetime)) + (midnight_seconds(count_datetime) / 300 * 300) seconds AS t,
sum(count_web) as web,
sum(count_mobile) as mobile,
sum(count_total) as total
from clicks_user_count
GROUP BY timestamp(date(count_datetime)) + (midnight_seconds(count_datetime) / 300 * 300) seconds
order by t
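One thing worth checking when comparing the two queries: the MySQL version rounds each timestamp up to the next 5-minute boundary with CEILING, while the DB2 expression relies on integer division, which (assuming midnight_seconds returns an integer) rounds down. The Python sketch below shows the difference; the helper names and the sample timestamp are made up, and time zone handling is ignored.

```python
from datetime import datetime, timedelta

BUCKET = 300  # bucket size in seconds (5 minutes)

def _split(ts):
    """Return (midnight of ts, whole seconds elapsed since that midnight)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight, int((ts - midnight).total_seconds())

def bucket_floor(ts):
    """Round down, like midnight_seconds / 300 * 300 with integer division."""
    midnight, secs = _split(ts)
    return midnight + timedelta(seconds=secs // BUCKET * BUCKET)

def bucket_ceil(ts):
    """Round up, like CEILING(seconds / 300) * 300."""
    midnight, secs = _split(ts)
    return midnight + timedelta(seconds=-(-secs // BUCKET) * BUCKET)

ts = datetime(2024, 1, 1, 12, 3, 20)
print(bucket_floor(ts))  # 2024-01-01 12:00:00
print(bucket_ceil(ts))   # 2024-01-01 12:05:00
```

The grouping is the same either way; only the label attached to each 5-minute bucket shifts by one interval.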
The SQL below, written with help from nfgl, produces the same result in DB2. Thank you.
SELECT
timestamp(date(count_datetime)) + (midnight_seconds(count_datetime) / 300 * 300) seconds AS t,
sum(count_web) as web,
sum(count_mobile) as mobile,
sum(count_total) as total
from clicks_user_count
GROUP BY timestamp(date(count_datetime)) + (midnight_seconds(count_datetime) / 300 * 300) seconds
order by t
I'm not good at SQL, but I can write and understand common SQL queries. While scouring the net, I found it hard to find a fitting way to do this.
I have the following query:
SELECT COUNT(`BetID`),
FORMAT(SUM(`BetAmount`),0),
FORMAT(SUM(`Payout`),0),
ROUND((SUM(`BetAmount`) / COUNT(`BetID`)),2),
ROUND((((SUM(`BetAmount`) + SUM(`Payout`)) / SUM(`Payout`)) * 100),2)
FROM `betdb`
I would like to subtract the result of
FORMAT(SUM(`BetAmount`),0)
and
FORMAT(SUM(`Payout`),0)
Any other ideas to execute subtraction in this mysql query?
If you want the numbers rounded before subtracting them (which seems to be the case when you want to subtract the formatted numbers), you'll need to round them first to the same precision as the formatting, then subtract, and lastly format the result:
SELECT COUNT(`BetID`),
FORMAT(SUM(`BetAmount`),0),
FORMAT(SUM(`Payout`),0),
FORMAT(ROUND(SUM(`BetAmount`),0) - ROUND(SUM(`Payout`),0),0) diff,
ROUND((SUM(`BetAmount`) / COUNT(`BetID`)),2),
ROUND((((SUM(`BetAmount`) + SUM(`Payout`)) / SUM(`Payout`)) * 100),2)
FROM `betdb`
A simple SQLfiddle to test with.
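Whether you round before or after subtracting can change the result by one. A quick Python check with invented totals (`r0` is a hypothetical helper that rounds to 0 decimals, half up):

```python
from decimal import Decimal, ROUND_HALF_UP

def r0(x):
    """Round to 0 decimal places, half up."""
    return Decimal(x).quantize(Decimal("1"), rounding=ROUND_HALF_UP)

bet_total = Decimal("1000.5")    # made-up SUM(BetAmount)
payout_total = Decimal("400.4")  # made-up SUM(Payout)

print(r0(bet_total) - r0(payout_total))  # 601 (round first, then subtract)
print(r0(bet_total - payout_total))      # 600 (subtract first, then round)
```

The first form matches what a reader sees when subtracting the two displayed totals; the second is closer to the true difference.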
Use FORMAT((SUM(BetAmount) - SUM(Payout)),0)
Try this:
SELECT COUNT(`BetID`),
FORMAT(SUM(`BetAmount`),0),
FORMAT(SUM(`Payout`),0),
FORMAT((SUM(`BetAmount`) - SUM(`Payout`)),0),
ROUND((SUM(`BetAmount`) / COUNT(`BetID`)),2),
ROUND((((SUM(`BetAmount`) + SUM(`Payout`)) / SUM(`Payout`)) * 100),2)
FROM `betdb`
You could also use a derived table (subquery) so that each sum is calculated only once. Subtract the raw sums rather than the FORMAT() output, because FORMAT() returns a string with thousands separators:
SELECT t.BetCount,
FORMAT(t.BetSum, 0) AS BetTotal,
FORMAT(t.PayoutSum, 0) AS PayoutTotal,
FORMAT(t.BetSum - t.PayoutSum, 0) AS Difference,
ROUND(t.BetSum / t.BetCount, 2),
ROUND(((t.BetSum + t.PayoutSum) / t.PayoutSum) * 100, 2)
FROM (
SELECT
COUNT(`BetID`) AS BetCount,
SUM(`BetAmount`) AS BetSum,
SUM(`Payout`) AS PayoutSum
FROM `betdb`
) AS t
I'm using the following SQL code to find all POIs closest to the given coordinates, but I want to find a specific POI instead of all of them. When I try to use a WHERE clause I get an error, and this is where I'm currently stuck, since I only use one table for the coordinates of all POIs.
SET @orig_lat=55.4058;
SET @orig_lon=13.7907;
SET @dist=10;
SELECT
*,
3956 * 2 * ASIN(SQRT(POWER(SIN((@orig_lat -abs(latitude)) * pi()/180 / 2), 2)
+ COS(@orig_lat * pi()/180 ) * COS(abs(latitude) * pi()/180)
* POWER(SIN((@orig_lon - longitude) * pi()/180 / 2), 2) )) as distance
FROM geo_kulplex.sweden_bobo
HAVING distance < @dist
ORDER BY distance limit 10;
The problem is that you cannot reference an aliased column (distance in this case) in a select or where clause. For example, you can't do this:
select a, b, a + b as NewCol, NewCol + 1 as AnotherCol from table
where NewCol = 2
This will fail in both: the select statement when trying to process NewCol + 1 and also in the where statement when trying to process NewCol = 2.
There are two ways to solve this:
1) Replace the reference by the calculated value itself. Example:
select a, b, a + b as NewCol, a + b + 1 as AnotherCol from table
where a + b = 2
2) Use an outer select statement:
select a, b, NewCol, NewCol + 1 as AnotherCol from (
select a, b, a + b as NewCol from table
) as S
where NewCol = 2
Now, given your HUGE and not very human-friendly calculated column :) I think you should go for the last option to improve readability:
SET @orig_lat=55.4058;
SET @orig_lon=13.7907;
SET @dist=10;
SELECT * FROM (
SELECT
*,
3956 * 2 * ASIN(SQRT(POWER(SIN((@orig_lat -abs(latitude)) * pi()/180 / 2), 2)
+ COS(@orig_lat * pi()/180 ) * COS(abs(latitude) * pi()/180)
* POWER(SIN((@orig_lon - longitude) * pi()/180 / 2), 2) )) as distance
FROM geo_kulplex.sweden_bobo
) AS S
WHERE distance < @dist
ORDER BY distance limit 10;
Edit: As @Kaii mentioned below, this will result in a full table scan. Depending on the amount of data you will be processing, you might want to avoid that and go for the first option, which should perform faster.
The reason why you can't use your alias in the WHERE clause is the order in which MySQL executes things:
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
When your WHERE clause is executed, the value for your column alias has not been calculated yet. This is a good thing, because calculating it first would waste a lot of performance. Imagine many (1,000,000) rows -- to use your calculation in the WHERE clause, each of those 1,000,000 rows would first have to be fetched and calculated so the WHERE condition can compare the calculation results to your expectation.
You can do this explicitly by either
using HAVING (that's why HAVING has a different name than WHERE: it is a different thing)
using a subquery as illustrated by @MostyMostacho (effectively the same, with some overhead)
putting the complex calculation in the WHERE clause (effectively the same performance as HAVING)
All of these perform almost equally badly: each row is fetched first, the distance is calculated, and only then are the rows filtered by distance before the result is sent to the client.
You can gain much (!) better performance by combining a simple WHERE clause that approximates the distance (pre-filtering the rows to fetch) with the more precise haversine formula in a HAVING clause:
find rows that could match the @dist = 10 condition using a WHERE clause based on simple X and Y distance (bounding box) -- this is a cheap operation.
filter those results using the haversine distance formula in a HAVING clause -- this is an expensive operation.
Look at this query to understand what I mean:
SET @orig_lat=55.4058;
SET @orig_lon=13.7907;
SET @dist=10;
SELECT
*,
3956 * 2 * ASIN(SQRT(POWER(SIN((@orig_lat -abs(latitude)) * pi()/180 / 2), 2)
+ COS(@orig_lat * pi()/180 ) * COS(abs(latitude) * pi()/180)
* POWER(SIN((@orig_lon - longitude) * pi()/180 / 2), 2) )) as distance
FROM geo_kulplex.sweden_bobo
/* WHERE clause to pre-filter by distance approximation .. filter results
later with the precise haversine calculation. can use indexes. */
WHERE
/* i'm unsure about geo stuff ... i don't think you want a
distance of 10° here, please adjust this properly!! */
latitude BETWEEN (@orig_lat - @dist) AND (@orig_lat + @dist)
AND longitude BETWEEN (@orig_lon - @dist) AND (@orig_lon + @dist)
/* HAVING clause to filter the result using the more precise haversine distance */
HAVING distance < @dist
ORDER BY distance limit 10;
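On the open question in the query's comment about degrees versus miles: one way to size the bounding box is to convert the radius in miles into latitude and longitude deltas in degrees. The Python sketch below does that, assuming a spherical earth with the same 3956-mile radius used in the query; `bounding_box` is a hypothetical helper, and the formula breaks down close to the poles.

```python
import math

EARTH_RADIUS_MILES = 3956  # same constant as in the SQL query

def bounding_box(lat, lon, dist_miles):
    """Return (min_lat, max_lat, min_lon, max_lon) in degrees around a point."""
    angular = dist_miles / EARTH_RADIUS_MILES  # radius as an angle in radians
    dlat = math.degrees(angular)
    # Meridians converge away from the equator, so the longitude window is wider.
    dlon = math.degrees(math.asin(math.sin(angular) / math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon

print(bounding_box(55.4058, 13.7907, 10))
```

For a 10-mile radius at this latitude, the window is roughly 0.14 degrees of latitude and 0.26 degrees of longitude on each side, far smaller than the 10 degrees the placeholder condition would use.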
For those who are interested in the constant:
3956 is the radius of the earth in miles, so the resulting distance is measured in miles
6371 is the radius of the earth in kilometers, so use this constant to measure distance in kilometers
Find more information in the wiki about the Haversine formula
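For reference, the same haversine calculation written out in Python, with the radius as a parameter so the unit can be switched between miles and kilometers. This is a sketch under the spherical-earth assumption; the function name and the sample coordinates are illustrative only.

```python
import math

def haversine(lat1, lon1, lat2, lon2, radius=3956):
    """Great-circle distance between two points given in degrees.

    radius=3956 gives miles, radius=6371 gives kilometers.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi1 - phi2
    dlambda = math.radians(lon1 - lon2)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))

# One degree of latitude along a meridian, in miles and in kilometers.
print(haversine(0, 0, 1, 0))
print(haversine(0, 0, 1, 0, radius=6371))
```

One degree of latitude comes out at about 69 miles or about 111 kilometers, which is a handy sanity check for any distance query built on this formula.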