Select nth percentile from MySQL

Select nth percentile from MySQL - mysql

I have a simple table of data, and I'd like to select the row that's at about the 40th percentile from the query.
I can do this right now by first querying to find the number of rows and then running another query that sorts and selects the nth row:
select count(*) as `total` from mydata;
which may return something like 93, 93*0.4 = 37
select * from mydata order by `field` asc limit 37,1;
Can I combine these two queries into a single query?

This will give you approximately the 40th percentile, it returns the row where 40% of rows are less than it. It sorts rows by how far they are from the 40th percentile, since no row may fall exactly on the 40th percentile.
SELECT m1.field, m1.otherfield, count(m2.field)
FROM mydata m1 INNER JOIN mydata m2 ON m2.field<m1.field
GROUP BY
m1.field,m1.otherfield
ORDER BY
ABS(0.4-(count(m2.field)/(select count(*) from mydata)))
LIMIT 1

As an exercise in futility (your current solition would probably be faster and prefered), if the table is MYISAM (or you can live with the approximation of InnoDB):
SET #row =0;
SELECT x.*
FROM information_schema.tables
JOIN (
SELECT #row := #row+1 as 'row',mydata.*
FROM mydata
ORDER BY field ASC
) x
ON x.row = round(information_schema.tables.table_rows * 0.4)
WHERE information_schema.tables.table_schema = database()
AND information_schema.tables.table_name = 'mydata';

There's also this solution, which uses a monster string made by GROUP_CONCAT. I had to up the max on the output like so to get it to work:
SET SESSION group_concat_max_len = 1000000;
MySql wizards out there: feel free to comment on the relative performance of the methods.

Related

SQL subquery with reference to parent table gives "Unknown column 'test_results.id' in 'where clause'"

First off, take a look at diagram (this is an application for testing students knowledge)
I already have working application, which calculates score (in percents), but to sort by score, it is required to select all the records (of current test). And it drastically slows down app (~ 10 seconds of waiting). So I decided to move that logic into single sql query.
Now, my SQL query looks like this:
select test_results.*,
(
select test_result_total_score * 100 / test_result_total_max_score
from (
select (select sum(question_score)
from (
select question_total_right_answers = question_total_options as question_score
from (
select (
select count(*)
from answers
inner join answer_options on answer_options.id = answers.answer_option_id
where answers.asked_question_id = asked_questions.id
and answers.is_chosen = answer_options.is_right
) as question_total_right_answers,
(
select count(*)
from answers
left join answer_options on answer_options.id = answers.answer_option_id
where answers.asked_question_id = asked_questions.id
) as question_total_options
from asked_questions
where asked_questions.test_result_id = test_results.id
) as rigt_per_question
) as questions_scores) as test_result_total_score,
(select count(*)
from asked_questions
where asked_questions.test_result_id = test_results.id) as test_result_total_max_score
) as right_per_test_result
) as result_in_percents
from test_results
where test_results.id between 1 and 200;
Here is what it should do: for each asked question collect how many answer options there are (question_total_options) and how many answers user selected right (question_total_right_answers) - the very nested subqueries.
Then for each of this results calculate score (this is basically 1 if user selected all right options and 0 if at least one option is selected not right).
After that, we sum scores of all that questions (test_result_total_score - how many questions user answered right). Also, we calculate how many questions there are in test result (test_result_total_max_score).
With that information we can calculate percentage of right answered questions (test_result_total_score * 100 / test_result_total_max_score)
And the error is on lines 23 and 28:
where asked_questions.test_result_id = test_results.id
where asked_questions.test_result_id = test_results.id) as test_result_total_max_score
It says: [42S22][1054] Unknown column 'test_results.id' in 'where clause'
I have tried using variable #test_result_id like this:
select test_results.*,
#test_result_id := test_results.id,
( ... )
where asked_questions.test_result_id = #test_result_id
where asked_questions.test_result_id = #test_result_id) as test_result_total_max_score
And it evaluates, but in wrong way (probably because order of evaluation select values is undefined). BTW, all result_in_percents correspond to very first result.

For those facing similar problem, it seems that there is no simple solution.
First off, you can try rewrite your subqueries with joins as I did (see below). But when you would like to perform group operations on grouped results, you are really unhappy person). A "dirty" solution might be create function to overcome barrier of nesting subqueries.
create function test_result_in_percents(test_result_id bigint unsigned)
returns float
begin
return (
select sum(tmp.question_right) * 100 / count(*)
from (select sum(answers.is_chosen = answer_options.is_right) = count(*) as question_right
, asked_questions.test_result_id as test_result_id
from answers
inner join answer_options on answer_options.id = answers.answer_option_id
inner join asked_questions on asked_questions.id = answers.asked_question_id
where asked_questions.test_result_id = test_result_id
group by answers.asked_question_id
) as tmp
group by test_result_id
);
end;
And then, just use this function:
select (test_result_in_percents(test_results.id)) as `result_percents`
from `test_results`
where `test_results`.`test_id` = 181
and `test_results`.`test_id` is not null
order by `test_results`.`id` desc;

MySQL limit work around

I need to limit records based on percentage but MYSQL does not allow that. I need 10 percent User Id of (count(User Id)/max(Total_Users_bynow)
My code is as follows:
select * from flavia.TableforThe_top_10percent_of_the_user where `User Id` in (select distinct(`User Id`) from flavia.TableforThe_top_10percent_of_the_user group by `User Id` having count(distinct(`User Id`)) <= round((count(`User Id`)/max(Total_Users_bynow))*0.1)*count(`User Id`));
Kindly help.

Consider splitting your problem in pieces. You can use user variables to get what you need. Quoting from this question's answers:
You don't have to solve every problem in a single query.
So... let's get this done. I'll not put your full query, but some examples:
-- Step 1. Get the total of the rows of your dataset
set #nrows = (select count(*) from (select ...) as a);
-- --------------------------------------^^^^^^^^^^
-- The full original query (or, if possible a simple version of it) goes here
-- Step 2. Calculate how many rows you want to retreive
-- You may use "round()", "ceiling()" or "floor()", whichever fits your needs
set #limrows = round(#nrows * 0.1);
-- Step 3. Run your query:
select ...
limit #limrows;
After checking, I found this post which says that my above approach won't work. There's, however, an alternative:
-- Step 1. Get the total of the rows of your dataset
set #nrows = (select count(*) from (select ...) as a);
-- --------------------------------------^^^^^^^^^^
-- The full original query (or, if possible a simple version of it) goes here
-- Step 2. Calculate how many rows you want to retreive
-- You may use "round()", "ceiling()" or "floor()", whichever fits your needs
set #limrows = round(#nrows * 0.1);
-- Step 3. (UPDATED) Run your query.
-- You'll need to add a "rownumber" column to make this work.
select *
from (select #rownum := #rownum+1 as rownumber
, ... -- The rest of your columns
from (select #rownum := 0) as init
, ... -- The rest of your FROM definition
order by ... -- Be sure to order your data
) as a
where rownumber <= #limrows
Hope this helps (I think it will work without a quirk this time)

merging SQL statements and how can it affect processing time

Let's assume I have the following tables:
items table
item_id|view_count
item_views table
view_id|item_id|ip_address|last_view
What I would like to do is:
If last view of item with given item_id by given ip_address was 1+ hour ago I would like to increment view_count of item in items table. And as a result get the view count of item. How I will do it normally:
q = SELECT count(*) FROM item_views WHERE item_id='item_id' AND ip_address='some_ip' AND last_view < current_time-60*60
if(q==1) then q = UPDATE items SET view_count = view_count+1 WHERE item_id='item_id'
//and finally get view_count of item
q = SELECT view_count FROM items WHERE item_id='item_id'
Here I used 3 SQL queries. How can I merge it into one SQL query? And how can it affect the processing time? Will it be faster or slower than previous method?

I don't think your logic is correct for what you describe that you want. The query:
SELECT count(*)
FROM item_views
WHERE item_id='item_id' AND
ip_address='some_ip' AND
last_view < current_time-60*60
is counting the number of views longer ago than your time frame. I think you want:
last_view > current_time-60*60
and then have if q = 0 on the next line.
MySQL is pretty good with the performance of not exists, so the following should work well:
update items
set view_count = view_count+1
WHERE item_id='item_id' and
not exists (select 1
from item_views
where item_id='item_id' AND
ip_address='some_ip' AND
last_view > current_time-60*60
)
It will work much better with an index on item_views(item_id, ip_address, last_view) and an index on item(item_id).
In MySQL scripting, you could then write:
. . .
set view_count = (#q := view_count+1)
. . .
This would also give you the variable you are looking for.

update target
set target.view_count = target.view_count + 1
from items target
inner join (
select item_id
from item_views
where item_id = 'item_id'
and ip_address = 'some_ip'
and last_view < current_time - 60*60
) ref
on ref.item_id = target.item_id;
You can only combine the update statement with the condition using a join as in the above example; but you'll still need a separate select statement.
It may be slower on very large set and/or unindexed table.

What is the best way to pick a random row from a table in MySQL? [duplicate]

What is a fast way to select a random row from a large mysql table?
I'm working in php, but I'm interested in any solution even if it's in another language.

Grab all the id's, pick a random one from it, and retrieve the full row.
If you know the id's are sequential without holes, you can just grab the max and calculate a random id.
If there are holes here and there but mostly sequential values, and you don't care about a slightly skewed randomness, grab the max value, calculate an id, and select the first row with an id equal to or above the one you calculated. The reason for the skewing is that id's following such holes will have a higher chance of being picked than ones that follow another id.
If you order by random, you're going to have a terrible table-scan on your hands, and the word quick doesn't apply to such a solution.
Don't do that, nor should you order by a GUID, it has the same problem.

I knew there had to be a way to do it in a single query in a fast way. And here it is:
A fast way without involvement of external code, kudos to
http://jan.kneschke.de/projects/mysql/order-by-rand/
SELECT name
FROM random AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1;

MediaWiki uses an interesting trick (for Wikipedia's Special:Random feature): the table with the articles has an extra column with a random number (generated when the article is created). To get a random article, generate a random number and get the article with the next larger or smaller (don't recall which) value in the random number column. With an index, this can be very fast. (And MediaWiki is written in PHP and developed for MySQL.)
This approach can cause a problem if the resulting numbers are badly distributed; IIRC, this has been fixed on MediaWiki, so if you decide to do it this way you should take a look at the code to see how it's currently done (probably they periodically regenerate the random number column).

Here's a solution that runs fairly quickly, and it gets a better random distribution without depending on id values being contiguous or starting at 1.
SET #r := (SELECT ROUND(RAND() * (SELECT COUNT(*) FROM mytable)));
SET #sql := CONCAT('SELECT * FROM mytable LIMIT ', #r, ', 1');
PREPARE stmt1 FROM #sql;
EXECUTE stmt1;

Maybe you could do something like:
SELECT * FROM table
WHERE id=
(FLOOR(RAND() *
(SELECT COUNT(*) FROM table)
)
);
This is assuming your ID numbers are all sequential with no gaps.

Add a column containing a calculated random value to each row, and use that in the ordering clause, limiting to one result upon selection. This works out faster than having the table scan that ORDER BY RANDOM() causes.
Update: You still need to calculate some random value prior to issuing the SELECT statement upon retrieval, of course, e.g.
SELECT * FROM `foo` WHERE `foo_rand` >= {some random value} LIMIT 1

There is another way to produce random rows using only a query and without order by rand().
It involves User Defined Variables.
See how to produce random rows from a table

In order to find random rows from a table, don’t use ORDER BY RAND() because it forces MySQL to do a full file sort and only then to retrieve the limit rows number required. In order to avoid this full file sort, use the RAND() function only at the where clause. It will stop as soon as it reaches to the required number of rows.
See
http://www.rndblog.com/how-to-select-random-rows-in-mysql/

if you don't delete row in this table, the most efficient way is:
(if you know the mininum id just skip it)
SELECT MIN(id) AS minId, MAX(id) AS maxId FROM table WHERE 1
$randId=mt_rand((int)$row['minId'], (int)$row['maxId']);
SELECT id,name,... FROM table WHERE id=$randId LIMIT 1

I see here a lot of solution. One or two seems ok but other solutions have some constraints. But the following solution will work for all situation
select a.* from random_data a, (select max(id)*rand() randid from random_data) b
where a.id >= b.randid limit 1;
Here, id, don't need to be sequential. It could be any primary key/unique/auto increment column. Please see the following Fastest way to select a random row from a big MySQL table
Thanks
Zillur
- www.techinfobest.com

For selecting multiple random rows from a given table (say 'words'), our team came up with this beauty:
SELECT * FROM
`words` AS r1 JOIN
(SELECT MAX(`WordID`) as wid_c FROM `words`) as tmp1
WHERE r1.WordID >= (SELECT (RAND() * tmp1.wid_c) AS id) LIMIT n

The classic "SELECT id FROM table ORDER BY RAND() LIMIT 1" is actually OK.
See the follow excerpt from the MySQL manual:
If you use LIMIT row_count with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result.

With a order yo will do a full scan table.
Its best if you do a select count(*) and later get a random row=rownum between 0 and the last registry

An easy but slow way would be (good for smallish tables)
SELECT * from TABLE order by RAND() LIMIT 1

In pseudo code:
sql "select id from table"
store result in list
n = random(size of list)
sql "select * from table where id=" + list[n]
This assumes that id is a unique (primary) key.

Take a look at this link by Jan Kneschke or this SO answer as they both discuss the same question. The SO answer goes over various options also and has some good suggestions depending on your needs. Jan goes over all the various options and the performance characteristics of each. He ends up with the following for the most optimized method by which to do this within a MySQL select:
SELECT name
FROM random AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1;
HTH,
-Dipin

I'm a bit new to SQL but how about generating a random number in PHP and using
SELECT * FROM the_table WHERE primary_key >= $randNr
this doesn't solve the problem with holes in the table.
But here's a twist on lassevks suggestion:
SELECT primary_key FROM the_table
Use mysql_num_rows() in PHP create a random number based on the above result:
SELECT * FROM the_table WHERE primary_key = rand_number
On a side note just how slow is SELECT * FROM the_table:
Creating a random number based on mysql_num_rows() and then moving the data pointer to that point mysql_data_seek(). Just how slow will this be on large tables with say a million rows?

I ran into the problem where my IDs were not sequential. What I came up with this.
SELECT * FROM products WHERE RAND()<=(5/(SELECT COUNT(*) FROM products)) LIMIT 1
The rows returned are approximately 5, but I limit it to 1.
If you want to add another WHERE clause it becomes a bit more interesting. Say you want to search for products on discount.
SELECT * FROM products WHERE RAND()<=(100/(SELECT COUNT(*) FROM pt_products)) AND discount<.2 LIMIT 1
What you have to do is make sure you are returning enough result which is why I have it set to 100. Having a WHERE discount<.2 clause in the subquery was 10x slower, so it's better to return more results and limit.

Use the below query to get the random row
SELECT user_firstname ,
COUNT(DISTINCT usr_fk_id) cnt
FROM userdetails
GROUP BY usr_fk_id
ORDER BY cnt ASC
LIMIT 1

In my case my table has an id as primary key, auto-increment with no gaps, so I can use COUNT(*) or MAX(id) to get the number of rows.
I made this script to test the fastest operation:
logTime();
query("SELECT COUNT(id) FROM tbl");
logTime();
query("SELECT MAX(id) FROM tbl");
logTime();
query("SELECT id FROM tbl ORDER BY id DESC LIMIT 1");
logTime();
The results are:
Count: 36.8418693542479 ms
Max: 0.241041183472 ms
Order: 0.216960906982 ms
Answer with the order method:
SELECT FLOOR(RAND() * (
SELECT id FROM tbl ORDER BY id DESC LIMIT 1
)) n FROM tbl LIMIT 1
...
SELECT * FROM tbl WHERE id = $result;

I have used this and the job was done
the reference from here
SELECT * FROM myTable WHERE RAND()<(SELECT ((30/COUNT(*))*10) FROM myTable) ORDER BY RAND() LIMIT 30;

Create a Function to do this most likely the best answer and most fastest answer here!
Pros - Works even with Gaps and extremely fast.
<?
$sqlConnect = mysqli_connect('localhost','username','password','database');
function rando($data,$find,$max = '0'){
global $sqlConnect; // Set as mysqli connection variable, fetches variable outside of function set as GLOBAL
if($data == 's1'){
$query = mysqli_query($sqlConnect, "SELECT * FROM `yourtable` ORDER BY `id` DESC LIMIT {$find},1");
$fetched_data = mysqli_fetch_assoc($query);
if(mysqli_num_rows($fetched_data>0){
return $fetch_$data;
}else{
rando('','',$max); // Start Over the results returned nothing
}
}else{
if($max != '0'){
$irand = rand(0,$max);
rando('s1',$irand,$max); // Start rando with new random ID to fetch
}else{
$query = mysqli_query($sqlConnect, "SELECT `id` FROM `yourtable` ORDER BY `id` DESC LIMIT 0,1");
$fetched_data = mysqli_fetch_assoc($query);
$max = $fetched_data['id'];
$irand = rand(1,$max);
rando('s1',$irand,$max); // Runs rando against the random ID we have selected if data exist will return
}
}
}
$your_data = rando(); // Returns listing data for a random entry as a ASSOC ARRAY
?>
Please keep in mind this code as not been tested but is a working concept to return random entries even with gaps.. As long as the gaps are not huge enough to cause a load time issue.

Quick and dirty method:
SET #COUNTER=SELECT COUNT(*) FROM your_table;
SELECT PrimaryKey
FROM your_table
LIMIT 1 OFFSET (RAND() * #COUNTER);
The complexity of the first query is O(1) for MyISAM tables.
The second query accompanies a table full scan. Complexity = O(n)
Dirty and quick method:
Keep a separate table for this purpose only. You should also insert the same rows to this table whenever inserting to the original table. Assumption: No DELETEs.
CREATE TABLE Aux(
MyPK INT AUTO_INCREMENT,
PrimaryKey INT
);
SET #MaxPK = (SELECT MAX(MyPK) FROM Aux);
SET #RandPK = CAST(RANDOM() * #MaxPK, INT)
SET #PrimaryKey = (SELECT PrimaryKey FROM Aux WHERE MyPK = #RandPK);
If DELETEs are allowed,
SET #delta = CAST(#RandPK/10, INT);
SET #PrimaryKey = (SELECT PrimaryKey
FROM Aux
WHERE MyPK BETWEEN #RandPK - #delta AND #RandPK + #delta
LIMIT 1);
The overall complexity is O(1).

SELECT DISTINCT * FROM yourTable WHERE 4 = 4 LIMIT 1;

quick selection of a random row from a large table in mysql

What is a fast way to select a random row from a large mysql table?
I'm working in php, but I'm interested in any solution even if it's in another language.

Grab all the id's, pick a random one from it, and retrieve the full row.
If you know the id's are sequential without holes, you can just grab the max and calculate a random id.
If there are holes here and there but mostly sequential values, and you don't care about a slightly skewed randomness, grab the max value, calculate an id, and select the first row with an id equal to or above the one you calculated. The reason for the skewing is that id's following such holes will have a higher chance of being picked than ones that follow another id.
If you order by random, you're going to have a terrible table-scan on your hands, and the word quick doesn't apply to such a solution.
Don't do that, nor should you order by a GUID, it has the same problem.

I knew there had to be a way to do it in a single query in a fast way. And here it is:
A fast way without involvement of external code, kudos to
http://jan.kneschke.de/projects/mysql/order-by-rand/
SELECT name
FROM random AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1;

MediaWiki uses an interesting trick (for Wikipedia's Special:Random feature): the table with the articles has an extra column with a random number (generated when the article is created). To get a random article, generate a random number and get the article with the next larger or smaller (don't recall which) value in the random number column. With an index, this can be very fast. (And MediaWiki is written in PHP and developed for MySQL.)
This approach can cause a problem if the resulting numbers are badly distributed; IIRC, this has been fixed on MediaWiki, so if you decide to do it this way you should take a look at the code to see how it's currently done (probably they periodically regenerate the random number column).

Here's a solution that runs fairly quickly, and it gets a better random distribution without depending on id values being contiguous or starting at 1.
SET #r := (SELECT ROUND(RAND() * (SELECT COUNT(*) FROM mytable)));
SET #sql := CONCAT('SELECT * FROM mytable LIMIT ', #r, ', 1');
PREPARE stmt1 FROM #sql;
EXECUTE stmt1;

Maybe you could do something like:
SELECT * FROM table
WHERE id=
(FLOOR(RAND() *
(SELECT COUNT(*) FROM table)
)
);
This is assuming your ID numbers are all sequential with no gaps.

Add a column containing a calculated random value to each row, and use that in the ordering clause, limiting to one result upon selection. This works out faster than having the table scan that ORDER BY RANDOM() causes.
Update: You still need to calculate some random value prior to issuing the SELECT statement upon retrieval, of course, e.g.
SELECT * FROM `foo` WHERE `foo_rand` >= {some random value} LIMIT 1

There is another way to produce random rows using only a query and without order by rand().
It involves User Defined Variables.
See how to produce random rows from a table

In order to find random rows from a table, don’t use ORDER BY RAND() because it forces MySQL to do a full file sort and only then to retrieve the limit rows number required. In order to avoid this full file sort, use the RAND() function only at the where clause. It will stop as soon as it reaches to the required number of rows.
See
http://www.rndblog.com/how-to-select-random-rows-in-mysql/

if you don't delete row in this table, the most efficient way is:
(if you know the mininum id just skip it)
SELECT MIN(id) AS minId, MAX(id) AS maxId FROM table WHERE 1
$randId=mt_rand((int)$row['minId'], (int)$row['maxId']);
SELECT id,name,... FROM table WHERE id=$randId LIMIT 1

I see here a lot of solution. One or two seems ok but other solutions have some constraints. But the following solution will work for all situation
select a.* from random_data a, (select max(id)*rand() randid from random_data) b
where a.id >= b.randid limit 1;
Here, id, don't need to be sequential. It could be any primary key/unique/auto increment column. Please see the following Fastest way to select a random row from a big MySQL table
Thanks
Zillur
- www.techinfobest.com

For selecting multiple random rows from a given table (say 'words'), our team came up with this beauty:
SELECT * FROM
`words` AS r1 JOIN
(SELECT MAX(`WordID`) as wid_c FROM `words`) as tmp1
WHERE r1.WordID >= (SELECT (RAND() * tmp1.wid_c) AS id) LIMIT n

The classic "SELECT id FROM table ORDER BY RAND() LIMIT 1" is actually OK.
See the follow excerpt from the MySQL manual:
If you use LIMIT row_count with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result.

With a order yo will do a full scan table.
Its best if you do a select count(*) and later get a random row=rownum between 0 and the last registry

An easy but slow way would be (good for smallish tables)
SELECT * from TABLE order by RAND() LIMIT 1

In pseudo code:
sql "select id from table"
store result in list
n = random(size of list)
sql "select * from table where id=" + list[n]
This assumes that id is a unique (primary) key.

Take a look at this link by Jan Kneschke or this SO answer as they both discuss the same question. The SO answer goes over various options also and has some good suggestions depending on your needs. Jan goes over all the various options and the performance characteristics of each. He ends up with the following for the most optimized method by which to do this within a MySQL select:
SELECT name
FROM random AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1;
HTH,
-Dipin

I'm a bit new to SQL but how about generating a random number in PHP and using
SELECT * FROM the_table WHERE primary_key >= $randNr
this doesn't solve the problem with holes in the table.
But here's a twist on lassevks suggestion:
SELECT primary_key FROM the_table
Use mysql_num_rows() in PHP create a random number based on the above result:
SELECT * FROM the_table WHERE primary_key = rand_number
On a side note just how slow is SELECT * FROM the_table:
Creating a random number based on mysql_num_rows() and then moving the data pointer to that point mysql_data_seek(). Just how slow will this be on large tables with say a million rows?

I ran into the problem where my IDs were not sequential. What I came up with this.
SELECT * FROM products WHERE RAND()<=(5/(SELECT COUNT(*) FROM products)) LIMIT 1
The rows returned are approximately 5, but I limit it to 1.
If you want to add another WHERE clause it becomes a bit more interesting. Say you want to search for products on discount.
SELECT * FROM products WHERE RAND()<=(100/(SELECT COUNT(*) FROM pt_products)) AND discount<.2 LIMIT 1
What you have to do is make sure you are returning enough result which is why I have it set to 100. Having a WHERE discount<.2 clause in the subquery was 10x slower, so it's better to return more results and limit.

Use the below query to get the random row
SELECT user_firstname ,
COUNT(DISTINCT usr_fk_id) cnt
FROM userdetails
GROUP BY usr_fk_id
ORDER BY cnt ASC
LIMIT 1

In my case my table has an id as primary key, auto-increment with no gaps, so I can use COUNT(*) or MAX(id) to get the number of rows.
I made this script to test the fastest operation:
logTime();
query("SELECT COUNT(id) FROM tbl");
logTime();
query("SELECT MAX(id) FROM tbl");
logTime();
query("SELECT id FROM tbl ORDER BY id DESC LIMIT 1");
logTime();
The results are:
Count: 36.8418693542479 ms
Max: 0.241041183472 ms
Order: 0.216960906982 ms
Answer with the order method:
SELECT FLOOR(RAND() * (
SELECT id FROM tbl ORDER BY id DESC LIMIT 1
)) n FROM tbl LIMIT 1
...
SELECT * FROM tbl WHERE id = $result;

I have used this and the job was done
the reference from here
SELECT * FROM myTable WHERE RAND()<(SELECT ((30/COUNT(*))*10) FROM myTable) ORDER BY RAND() LIMIT 30;

Create a Function to do this most likely the best answer and most fastest answer here!
Pros - Works even with Gaps and extremely fast.
<?
$sqlConnect = mysqli_connect('localhost','username','password','database');
function rando($data,$find,$max = '0'){
global $sqlConnect; // Set as mysqli connection variable, fetches variable outside of function set as GLOBAL
if($data == 's1'){
$query = mysqli_query($sqlConnect, "SELECT * FROM `yourtable` ORDER BY `id` DESC LIMIT {$find},1");
$fetched_data = mysqli_fetch_assoc($query);
if(mysqli_num_rows($fetched_data>0){
return $fetch_$data;
}else{
rando('','',$max); // Start Over the results returned nothing
}
}else{
if($max != '0'){
$irand = rand(0,$max);
rando('s1',$irand,$max); // Start rando with new random ID to fetch
}else{
$query = mysqli_query($sqlConnect, "SELECT `id` FROM `yourtable` ORDER BY `id` DESC LIMIT 0,1");
$fetched_data = mysqli_fetch_assoc($query);
$max = $fetched_data['id'];
$irand = rand(1,$max);
rando('s1',$irand,$max); // Runs rando against the random ID we have selected if data exist will return
}
}
}
$your_data = rando(); // Returns listing data for a random entry as a ASSOC ARRAY
?>
Please keep in mind this code as not been tested but is a working concept to return random entries even with gaps.. As long as the gaps are not huge enough to cause a load time issue.

Quick and dirty method:
SET #COUNTER=SELECT COUNT(*) FROM your_table;
SELECT PrimaryKey
FROM your_table
LIMIT 1 OFFSET (RAND() * #COUNTER);
The complexity of the first query is O(1) for MyISAM tables.
The second query accompanies a table full scan. Complexity = O(n)
Dirty and quick method:
Keep a separate table for this purpose only. You should also insert the same rows to this table whenever inserting to the original table. Assumption: No DELETEs.
CREATE TABLE Aux(
MyPK INT AUTO_INCREMENT,
PrimaryKey INT
);
SET #MaxPK = (SELECT MAX(MyPK) FROM Aux);
SET #RandPK = CAST(RANDOM() * #MaxPK, INT)
SET #PrimaryKey = (SELECT PrimaryKey FROM Aux WHERE MyPK = #RandPK);
If DELETEs are allowed,
SET #delta = CAST(#RandPK/10, INT);
SET #PrimaryKey = (SELECT PrimaryKey
FROM Aux
WHERE MyPK BETWEEN #RandPK - #delta AND #RandPK + #delta
LIMIT 1);
The overall complexity is O(1).

SELECT DISTINCT * FROM yourTable WHERE 4 = 4 LIMIT 1;

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Select nth percentile from MySQL - mysql

There's also this solution, which uses a monster string made by GROUP_CONCAT. I had to up the max on the output like so to get it to work: SET SESSION group_concat_max_len = 1000000; MySql wizards out there: feel free to comment on the relative performance of the methods.

Related

SQL subquery with reference to parent table gives "Unknown column 'test_results.id' in 'where clause'"

MySQL limit work around

merging SQL statements and how can it affect processing time

What is the best way to pick a random row from a table in MySQL? [duplicate]

quick selection of a random row from a large table in mysql

Categories

Resources