MYSQL SELECT random on large table ORDER BY SCORE [duplicate] - mysql

This question already has answers here:
Optimizing my mysql statement! - RAND() TOO SLOW
(6 answers)
Closed 8 years ago.
I have a large mysql table with about 25000 rows. There are 5 table fields: ID, NAME, SCORE, AGE,SEX
I need to select random 5 MALES order BY SCORE DESC
For instance, if there are 100 men that score 60 each and another 100 that score 45 each, the script should return random 5 from the first 200 men from the list of 25000
ORDER BY RAND()
is very slow
The real issue is that the 5 men should be a random selection within the first 200 records. Thanks for the help

so to get something like this I would use a subquery.. that way you are only putting the RAND() on the outer query which will be much less taxing.
From what I understood from your question you want 200 males from the table with the highest score... so that would be something like this:
SELECT *
FROM table_name
WHERE age = 'male'
ORDER BY score DESC
LIMIT 200
now to randomize 5 results it would be something like this.
SELECT id, score, name, age, sex
FROM
( SELECT *
FROM table_name
WHERE age = 'male'
ORDER BY score DESC
LIMIT 200
) t -- could also be written `AS t` or anything else you would call it
ORDER BY RAND()
LIMIT 5

I dont think that sorting by random can be "optimised out" in any way as sorting is N*log(N) operation. Sorting is avoided by query analyzer by using indexes.
The ORDER BY RAND() operation actually re-queries each row of your table, assigns a random number ID and then delivers the results. This takes a large amount of processing time for table of more than 500 rows. And since your table is containing approx 25000 rows then it will definitely take a good amount of time.

Related

SQL Query - selecting higher priority rows more often

I am trying to do SQL code in mysqli query to select rows with higher priority more often. I have a DB where all posts are sorted by priority, but I want it select like this (10 - the highest priority):
**Priority**
10
3
10
9
7
10
9
1
10
How can I do this? I have tried that to solve by more ways but no result. Thank you.
If you want to sample your data with preference to higher priorities, you could do something like this:
SELECT *
FROM (
SELECT OrderDetailID
,mod(OrderDetailID, 10) + 1 AS priority
,rand() * 10 AS rand_priority
FROM OrderDetails
) A
WHERE rand_priority < priority
ORDER BY OrderDetailID
This query runs in MySQL Tryit from W3Schools.
mod(OrderDetailID, 10) + 1 simulates a 1-10 priority - your table just has this value in it already
rand() * 10 gives you a random number between 0 and 10
Then by filtering to only ones where the random number is less than the priority, you get a result set where the higher priorities are more likely.
You may use rank function if your MySQL version supports it. It will order your data by priority in descending order and ranks each row. If the two rows have same priority then both rows will have same ranking. Then you can filter out the first rank data which will give you highest priority rows always.
Select * FROM
(
SELECT
col1,
col2,
priority,
RANK() OVER w AS 'rank'
FROM MyTable
WINDOW w AS (ORDER BY priority)
) MyQuery
Where rank = 1
Note : Syntax might be incorrect, please feel to edit the query.
This post might help you for ranking if your MySql version doesn't support Rank.

Mysql select records with offset

I'm looking for a mysql select that will allow me to select (LIMIT 8) records after some changing number of first few matches;
select id
from customers
where name LIKE "John%"
Limit 8
So if i have a table with 1000 of johns with various last names
I want to be able to select records 500-508
You can send the offset to the limit statement, like this:
SELECT id
FROM customers
WHERE name LIKE "John%"
LIMIT 8 OFFSET 500
Notice the OFFSET 500 on the limit. That sets the 'start point' past the first 500 entries (at entry #501).
Therefor, entries #501, #502, #503, #504, #505, #506, #507 and #508 will be selected.
This can also be written:
LIMIT 500, 8
Personally, I don't like that as much and don't understand the order.
Pedantic point: 500-508 is 9 entries, so I had to adjust.
As a solution please try executing the following sql query
select id from customers where name LIKE "John%" Limit 500,8

MySQL equally distributed random rows with WHERE clause

I have this table,
person_id int(10) pk
points int(6) index
other columns not very important
I have this random function which is very fast on a table with 10M rows:
SELECT person_id
FROM persons AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(person_id)
FROM persons)) AS id)
AS r2
WHERE r1.person_id >= r2.id
ORDER BY r1.person_id ASC
LIMIT 1
This is all great but now I wish to show only people with points > 0. Example table:
PERSON_ID POINTS
1 4
2 6
3 0
4 3
When I append AND points > 0 to the where clause, person_id 3 can't be selected, so a gap is created and when the random select person_id 3, person_id 4 will be selected. This gives person 4 a bigger chance to be chosen. Any one got suggestions how I can adjust the query to make it work with the where clause and give all rows same % of chance to be selected.
Info table: The table is uniform, no gaps in person_id's. About 90% will have 0 points. I want to make the query for where points = 0 and points > 0.
Before someone will say, use rand(): this is not solution for tables with more than a few 100k rows.
Bonus question: will it be possible to select x random rows in 1 query, so I do not have to call this query a few times when i want more random rows?
Important note: performance is key, with 10M+ rows the query may not take much longer than the current query, which takes 0.0005 seconds, I prefer to stay under 0.05 second.
Last note: If you think the query will never be this fast with above requirements, but another solution is possible (like fetching 100 rows and showing x random which has more than 0 points), please tell :)
Really appreciate your help and all help is welcome :)
You could generate in-line gap-free id's for the records that you really want to work with, and generate then the random selector using the total number of records available.
Try with this (props to the chosen answer here for the row_number generator):
SELECT r1.*
FROM
(SELECT person_id,
#curRow := #curRow + 1 AS row_number
FROM persons as p,
(SELECT #curRow := 0) r0
WHERE points>0) r1
, (SELECT COUNT(1) * RAND() id
FROM persons
WHERE points>0) r2
WHERE r1.person_id>=r2.id
ORDER BY r1.person_id ASC
LIMIT 1;
You can mess with it in this sqlfiddle.

Sum Of mysql table record

I have a table named tbl_Question and a column named INT_MARK which has different marks for different questions. Like this:
VH_QUESTION INT_MARK
----------- --------
Q1 2
Q2 4
My question is: How to get a random set of 20 questions whose total sum of marks is 50?
select VH_QUESTION, sum(INT_MARK) from tbl_Question
group by VH_QUESTION
having sum(INT_MARK) > 50
order by rand() limit 1
I think this question may help you - seems a very similar problem.
If that don't work, I'd try to divide the problem in two: first, you make a combinatory of your questions. Then, you filter them by it's sum of points.
I couldn't find, however, how to produce all combinations of the table. I don't know how difficult that would be.
select VH_QUESTION, sum(INT_MARK) from tbl_Question
group by VH_QUESTION
having sum(INT_MARK) >= 50
order by rand() limit 20
Quick answer
SELECT * ,SUM(INT_MARK) as total_mark FROM tbl_Question
GROUP BY VH_QUESTION
HAVING total_mark="50"
ORDER BY RAND()
LIMIT 5
it returns 0 line when no answers are possible but each time it finds one the questionsare random.
You could check the benchmark to see if you can have a faster query for large tables.

Select a Portion of Vast Data Over Time with MySQL

I have hundreds of thousands of price points spanning 40 years plus. I would like to construct a query that will only return 3000 total data points, with the last 500 being the most recent data points, and the other 2500 being just a sample of the rest of the data, evenly distributed.
Is it possible to do this in one query? How would I select just a sample of the large amount of data? This is a small example of what I mean for getting just a sample of the other 2500 data points:
1
2
3
4
5
6
7
8
9
10
And I want to return something like this:
1
5
10
Here's the query for the last 500:
SELECT * FROM price ORDER BY time_for DESC LIMIT 500
I'm not sure how to go about getting the sample data from the other data points.
Try this:
(SELECT * FROM price ORDER BY time_for DESC LIMIT 500)
UNION ALL
(SELECT * FROM price WHERE time_for < (SELECT time_for FROM price ORDER BY time_for LIMIT 500, 1) ORDER BY rand() LIMIT 2500)
ORDER BY time_for
Note: It's probably going to be slow. How big is this table?
It might be faster to only get the primary ID from all these rows, then join it to the original in a secondary query once it's narrowed down. This is because ORDER BY rand() LIMIT has to sort the entire table. If the table is large this can take a LONG time, and a lot of disk space. Retrieving only the ID reduces the necessary disk space.
The previous answer is good, but you did specify that you want the results to be evenly distributed so I'll add this possibility too. By iterating a counter over the rows you can use a MOD operator to sample an even distribution. I don't have a MYSQL install right now to test this so apologies if the syntax isn't 100% spot on. But it should be close enough and may give you some ideas.
( SELECT p1.*
FROM price p1
ORDER BY p1.time_for DESC
LIMIT 500 )
UNION ALL
( SELECT #i := #i + 1 AS row_num,
p2.*
FROM price p2,
(SELECT #i: = 0)
WHERE row_num > 500
AND (row_num % 500) = 0
ORDER BY time_for DESC )
The first query gives the 500 latest rows. The second query gives every 500th row after that, thus returning an even distribution from the rest of the data. Obviously you can tune this parameter to achieve the desired sample spacing. Or base it on the total number of rows in the table to calculate the necessary spacing to give exactly 2500 records.