Is there a way to be less random than ORDER BY RAND() to increase speed? - mysql

I run this query 5 times, 5 seconds apart, on a table of 500,000 rows:
SELECT * FROM `apps` WHERE dev_name = '' ORDER BY RAND() LIMIT 10;
I'd like to get 50 rows that have a 90-95% chance of being unique. The query takes 10 seconds right now. I'd rather get the time down and accept the results being somewhat less random.

Try adding
AND RAND() >= 0.90
(or 0.95 if you like) to your WHERE clause.
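A minimal sketch of the idea, using SQLite with a Python-registered rand() standing in for MySQL's RAND() and invented table contents: the random predicate discards most rows cheaply, so the expensive ORDER BY rand() sort only shuffles the survivors.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.create_function("rand", 0, random.random)  # stand-in for MySQL's RAND()
conn.execute("CREATE TABLE apps (id INTEGER PRIMARY KEY, dev_name TEXT)")
conn.executemany("INSERT INTO apps (dev_name) VALUES (?)", [("",)] * 10000)

# rand() >= 0.90 discards ~90% of the rows before the sort, so the
# ORDER BY rand() only has to shuffle roughly 1,000 rows, not 10,000.
rows = conn.execute(
    "SELECT * FROM apps WHERE dev_name = '' AND rand() >= 0.90 "
    "ORDER BY rand() LIMIT 10").fetchall()
```

One caveat: if the threshold is too aggressive for the table size, fewer than LIMIT rows may pass the filter, so pick the cutoff with the row count in mind.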

You may want to study Fetching Random Rows from a Table, as highlighted in a related question. Another good article can be found here.

Related

Why is extracting the last portion (LIMIT and OFFSET) of rows from MySQL much slower than extracting all other portions?

I have to extract a fairly large chunk of data from a database (MySQL/MariaDB), so I decided to take it in small chunks (LIMIT = 1000 rows). In total I need to extract ~9228 rows (which doesn't sound like much, but fetch time reaches 100-120 seconds when I try to get all 9228 rows at once with one query).
When I fetch the first 8 chunks (1000 rows each), everything is fine: ~0.4 seconds per query. But when I try to extract the last 228 rows, everything goes really slow: 80 seconds if I use LIMIT 1000 OFFSET 9000, or 50 seconds when I use the exact number of rows, LIMIT 228 OFFSET 9000. And the query used to get the total number of rows takes 30 seconds, so the two queries together come to 80 seconds again.
My SQL query to get the data looks like the following:
SELECT events.eventid, functions.triggerid FROM events
INNER JOIN functions ON events.objectid = functions.triggerid
WHERE
events.name LIKE 'DISCONNECT MSK-AP%'
OR events.name LIKE 'AP MSK-AP%' # '%MSK-AP%' is much slower than OR
AND events.value = 1
AND events.clock >= '1588280400'
AND events.clock <= '1590958799'
GROUP BY events.eventid
ORDER BY events.eventid DESC
LIMIT 1000 OFFSET 0; # SO OFFSET COULD BE 0, 1000, 2000, ... 8000, 9000
My SQL query to get the total number of rows (it is slow: 30 seconds!) is as follows:
SELECT COUNT(distinct(events.eventid)) FROM events
INNER JOIN functions ON events.objectid = functions.triggerid
WHERE
events.name LIKE 'DISCONNECT MSK-AP%'
OR events.name LIKE 'AP MSK-AP%'
AND events.value = 1
AND events.clock >= '1588280400'
AND events.clock <= '1590958799';
My Database version:
protocol_version 10
slave_type_conversions
version 5.5.60-MariaDB
version_comment MariaDB Server
version_compile_machine x86_64
version_compile_os Linux
Why is the query for the last chunk so slow compared with the others, and what can I do to solve the issue? Could a temporary database table help in this case?
Why I am not sure the answer to this question fits my case:
Why does MYSQL higher LIMIT offset slow the query down?
Because the problem does not correlate with offset size, e.g.:
LIMIT 100 OFFSET 9100; - 0.25 seconds BUT
LIMIT 100 OFFSET 9200; - 114 seconds!
So the problem appears when offset + limit is close to or larger than the total number of rows (9228)!
OFFSET sucks performance.
A better way is to "remember where you left off".
Discussion: http://mysql.rjweb.org/doc.php/pagination
Why slower than reading all?
When using OFFSET, the query first counts off the number of rows given by the OFFSET, then delivers the number of rows given by the LIMIT. So it gets slower and slower as the offset grows; the final offset takes about as long as reading the entire table.
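A minimal sketch of "remember where you left off" (keyset pagination), using SQLite and invented rows standing in for the events table: instead of counting off OFFSET rows, each page seeks directly to the last key seen, so every page costs roughly the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (eventid INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(i, "DISCONNECT MSK-AP1") for i in range(1, 9229)])

def fetch_page(last_eventid, page_size=1000):
    # Keyset pagination: the index seeks straight to last_eventid instead
    # of stepping over OFFSET rows, so the last page is as fast as the first.
    return conn.execute(
        "SELECT eventid FROM events WHERE eventid < ? "
        "ORDER BY eventid DESC LIMIT ?",
        (last_eventid, page_size)).fetchall()

page1 = fetch_page(10**9)         # first page: any key above the maximum
page2 = fetch_page(page1[-1][0])  # next page starts where we left off
```

This also removes the need for the slow COUNT query just to compute offsets: you simply keep fetching until a page comes back short.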

MYSQL SELECT random on large table ORDER BY SCORE [duplicate]

This question already has answers here:
Optimizing my mysql statement! - RAND() TOO SLOW
(6 answers)
Closed 8 years ago.
I have a large MySQL table with about 25,000 rows. There are 5 fields: ID, NAME, SCORE, AGE, SEX.
I need to select 5 random MALES ordered BY SCORE DESC.
For instance, if there are 100 men who score 60 each and another 100 who score 45 each, the script should return 5 random men from those first 200 out of the 25,000.
ORDER BY RAND()
is very slow
The real issue is that the 5 men should be a random selection within the first 200 records. Thanks for the help
So to get something like this I would use a subquery; that way you are only putting the RAND() on the outer query, which will be much less taxing.
From what I understood from your question, you want the 200 males from the table with the highest scores... so that would be something like this:
SELECT *
FROM table_name
WHERE sex = 'male'
ORDER BY score DESC
LIMIT 200
Now, to randomize 5 results, it would be something like this:
SELECT id, score, name, age, sex
FROM
( SELECT *
FROM table_name
WHERE sex = 'male'
ORDER BY score DESC
LIMIT 200
) t -- could also be written `AS t` or anything else you would call it
ORDER BY RAND()
LIMIT 5
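The same shape of query can be sketched end to end, here with SQLite's RANDOM() standing in for MySQL's RAND() and fabricated rows: rank first (which an index on score can serve), then shuffle only the 200-row subset.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, "
             "score INTEGER, age INTEGER, sex TEXT)")
conn.executemany(
    "INSERT INTO people (name, score, age, sex) VALUES (?, ?, ?, ?)",
    [(f"p{i}", random.randint(0, 100), 30, "male") for i in range(25000)])

# The inner query does the cheap ordered LIMIT; the expensive random
# sort only ever sees 200 rows instead of 25,000.
rows = conn.execute(
    "SELECT id, score FROM "
    "  (SELECT id, score FROM people WHERE sex = 'male' "
    "   ORDER BY score DESC LIMIT 200) AS t "
    "ORDER BY RANDOM() LIMIT 5").fetchall()
```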
I don't think that sorting by random can be "optimised out" in any way, as sorting is an N*log(N) operation. The query analyzer normally avoids sorting by using indexes, but no index can help with a random order.
The ORDER BY RAND() operation actually visits each row of your table, assigns it a random number, sorts on it, and then delivers the results. This takes a large amount of processing time for any table of more than 500 rows, and since your table contains approximately 25,000 rows it will definitely take a good amount of time.

Sum Of mysql table record

I have a table named tbl_Question with a column named INT_MARK, which holds a different mark for each question. Like this:
VH_QUESTION INT_MARK
----------- --------
Q1 2
Q2 4
My question is: How to get a random set of 20 questions whose total sum of marks is 50?
select VH_QUESTION, sum(INT_MARK) from tbl_Question
group by VH_QUESTION
having sum(INT_MARK) > 50
order by rand() limit 1
I think this question may help you - it seems a very similar problem.
If that doesn't work, I'd try to divide the problem in two: first, generate the combinations of your questions; then filter them by their sum of marks.
I couldn't find, however, how to produce all combinations from within the table. I don't know how difficult that would be.
select VH_QUESTION, sum(INT_MARK) from tbl_Question
group by VH_QUESTION
having sum(INT_MARK) >= 50
order by rand() limit 20
Quick answer
SELECT * ,SUM(INT_MARK) as total_mark FROM tbl_Question
GROUP BY VH_QUESTION
HAVING total_mark = 50
ORDER BY RAND()
LIMIT 5
It returns 0 rows when no answer is possible, but each time it finds one, the questions are random.
You could run a benchmark to see whether a faster query is possible for large tables.
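For what the question literally asks (a random set of 20 questions whose marks sum to exactly 50), GROUP BY/HAVING alone cannot express it, since it is a subset-sum-style constraint. A hedged sketch of one practical approach, rejection sampling in application code (table name from the question, contents invented):

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_Question (VH_QUESTION TEXT, INT_MARK INTEGER)")
conn.executemany("INSERT INTO tbl_Question VALUES (?, ?)",
                 [(f"Q{i}", (i % 4) + 1) for i in range(200)])

def random_paper(n=20, target=50, tries=10000):
    """Rejection sampling: draw n random questions until their marks sum
    to the target. Returns None if no draw succeeds within `tries`."""
    pool = conn.execute(
        "SELECT VH_QUESTION, INT_MARK FROM tbl_Question").fetchall()
    for _ in range(tries):
        sample = random.sample(pool, n)
        if sum(mark for _, mark in sample) == target:
            return sample
    return None
```

This works well when many 20-question combinations hit the target sum; if valid combinations are rare, a proper subset-sum search would be needed instead.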

Why does it take more time for MySQL to get records starting at a higher offset? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How can I speed up a MySQL query with a large offset in the LIMIT clause?
In our application we are showing records from MySQL on a web page. Like in most such applications we use paging. So the query looks like this:
select * from sms_message
where account_group_name = 'scott'
and received_on > '2012-10-11' and
received_on < '2012-11-30'
order by received_on desc
limit 200 offset 3000000;
This query takes 104 seconds. If I change the offset to a low one, or remove it completely, it takes only half a second. Why is that?
There is only one compound index, on account_group_name, received_on and two other columns. The table is InnoDB.
UPDATE:
Explain returns:
1 SIMPLE sms_message ref all_filters_index all_filters_index 258 const 2190030 Using where
all_filters_index is 4-columns filter mentioned above.
Yes, this is true: time increases as the offset value increases. The reason is that OFFSET works on the position of rows in the result set, which is not indexed, so to find the row at offset x the database engine must step through all the rows from 0 to x.
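When keeping OFFSET is unavoidable (e.g. jump-to-page UIs), a common mitigation is the deferred join: page over a narrow id-only subquery, then look up the full rows for just the requested page. A rough sketch with SQLite and synthetic data standing in for the sms_message table (offsets scaled down):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sms_message (id INTEGER PRIMARY KEY, "
             "account_group_name TEXT, received_on TEXT)")
conn.executemany(
    "INSERT INTO sms_message (account_group_name, received_on) VALUES (?, ?)",
    [("scott", f"2012-10-{11 + i % 19:02d}") for i in range(5000)])

# Deferred join: the inner query scans only the narrow (indexable) columns
# while counting off the offset; full rows are fetched for just 200 ids.
rows = conn.execute(
    "SELECT m.* FROM sms_message AS m "
    "JOIN (SELECT id FROM sms_message "
    "      WHERE account_group_name = 'scott' "
    "      ORDER BY received_on DESC LIMIT 200 OFFSET 3000) AS t "
    "ON m.id = t.id").fetchall()
```

This doesn't remove the row-counting cost, but it makes each counted row much cheaper, which often helps when the full rows are wide.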

How to randomly select multiple rows satisfying certain conditions from a MySQL table?

I'm looking for an efficient way of randomly selecting 100 rows satisfying certain conditions from a MySQL table with potentially millions of rows.
Almost everything I've found suggests avoiding the use of ORDER BY RAND(), because of poor performance and scalability.
However, this article suggests ORDER BY RAND() may still be used as a "nice and fast way" to fetch random data.
Based on this article, below is some example code showing what I'm trying to accomplish. My questions are:
Is this an efficient way of randomly selecting 100 (or up to several hundred) rows from a table with potentially millions of rows?
When will performance become an issue?
SELECT user.*
FROM (
SELECT id
FROM user
WHERE is_active = 1
AND deleted = 0
AND expiretime > '.time().'
AND id NOT IN (10, 13, 15)
AND id NOT IN (20, 30, 50)
AND id NOT IN (103, 140, 250)
ORDER BY RAND()
LIMIT 100
)
AS random_users
STRAIGHT JOIN user
ON user.id = random_users.id
I strongly urge you to read this article. The last section covers the selection of multiple random rows, and you should notice the SELECT statement in the PROCEDURE described there. That is the spot where you add your specific WHERE conditions.
The problem with ORDER BY RAND() is that this operation has complexity of n*log2(n), while the method described in the article that I linked, has almost constant complexity.
Let's assume that selecting a random row from a table which contains 10 entries using ORDER BY RAND() takes 1 time unit:
entries | time units
-------------------------
10 | 1 /* if this takes 0.001s */
100 | 20
1'000 | 300
10'000 | 4'000
100'000 | 50'000
1'000'000 | 600'000 /* then this will need 10 minutes */
And you wrote that you are dealing with a table on the scale of millions.
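The near-constant-complexity family of methods the linked article describes boils down to jumping to a random primary-key value instead of sorting. A rough sketch with SQLite, synthetic data, and the assumption that ids are reasonably gap-free (a big caveat in practice, since gaps skew the distribution):

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, is_active INTEGER)")
conn.executemany("INSERT INTO user (is_active) VALUES (?)",
                 [(1,) for _ in range(10000)])

def random_active_user():
    # Jump to a random point in the primary-key range, then take the
    # first matching row at or after it: an index seek, not a full sort.
    max_id = conn.execute("SELECT MAX(id) FROM user").fetchone()[0]
    return conn.execute(
        "SELECT * FROM user WHERE id >= ? AND is_active = 1 "
        "ORDER BY id LIMIT 1",
        (random.randint(1, max_id),)).fetchone()
```

Repeating the call n times (discarding duplicates) gives n rows without ever sorting the whole table.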
I'm afraid no-one is going to be able to answer your question with any accuracy. If you really want to know, you'll need to run some benchmarks against your system (ideally not the live one, but an exact copy). Benchmark this solution against a different one (getting the random rows in PHP, for example) and compare the numbers to what you or your client consider "good performance". Then ramp up your data, keeping the distribution of column values as close to real as you can, and see where performance starts to drop off. To be honest, if it works for you now with a bit of headroom, I'd go for it. When (if!) it becomes a bottleneck you can look at it again - or just throw extra iron at your database...
Preprocess as much as possible
try something like this (VB-like example)
Dim sRND As New StringBuilder : Dim iRandom As New Random()
Dim iMaxID As Integer = **put your maxId here**
' excluded ids from the question; also reused to skip duplicate picks
Dim excluded As New HashSet(Of Integer)({10, 13, 15, 20, 30, 50, 103, 140, 250})
Dim Cnt As Integer = 0
While Cnt < 100
    Dim RndVal As Integer = iRandom.Next(1, iMaxID)
    ' compare ids as numbers, not substrings ("5" would match "15" in a string)
    If Not excluded.Contains(RndVal) Then
        excluded.Add(RndVal)
        Cnt += 1
        sRND.Append("," & RndVal)
    End If
End While
String.Format("SELECT * FROM user WHERE is_active = 1 AND deleted = 0 AND expiretime > {0} AND id IN ({1}) ...blahblablah... LIMIT 100", time(), Mid(sRND.ToString(), 2))
I didn't check the syntax, but I hope you get my drift.
This will make MySQL read only the records that match the IN list and stop when it reaches 100, without needing to process all records first.
Please let me know the elapsed-time difference if you try it. (I'm curious.)