I want to improve my current query. So I have this table called Incomes. Where I have a sourceId varchar field. I have a single SELECT for the fields I need, but I needed to add an extra field called isFirstTime to represent if it was the first time on the row on what that sourceId was used. This is my current query:
SELECT DISTINCT
`income`.*,
CASE WHEN (
SELECT
`income2`.id
FROM
`income` as `income2`
WHERE
`income2`."sourceId" = `income`."sourceId"
ORDER BY
`income2`.created asc
LIMIT 1
) = `income`.id THEN true ELSE false END
as isFirstIncome
FROM
`income` as `income`
WHERE `income`.incomeType IN ('passive', 'active') AND `income`.status = 'paid'
ORDER BY `income`.created desc
LIMIT 50
The query works but slows down if I keep increasing the LIMIT or OFFSET. Any suggestions?
UPDATE 1:
Added WHERE statements used on the original query
UPDATE 2:
MYSQL version 5.7.22
You can achieve it using Ordered Analytical Function.
You can use ROW_NUMBER or RANK to get the desired result.
Below query will give the desired output.
SELECT *,
CASE
WHEN Row_number()
OVER(
PARTITION BY sourceid
ORDER BY created ASC) = 1 THEN true
ELSE false
END AS isFirstIncome
FROM income
WHERE incomeType IN ('passive', 'active') AND status = 'paid'
ORDER BY created desc
DB Fiddle: See the result here
My first thought is that isFirstIncome should be an extra column in the table. It should be populated as the data is inserted.
If you don't like that, let's try to optimize the query...
Let's avoid doing the subquery more than 50 times. This requires turning the query inside-out. (It's like "explode-implode", where the query gathers lots of stuff, then sorts it and throws most of the rows away.)
To summarize:
do the least amount of effort to just identify the 5 rows.
JOIN to whatever tables are needed (including itself if appropriate); this is to get any other columns desired (including isFirstIncome).
SELECT i3.*,
( ... using i3 ... ) as isFirstIncome
FROM (
SELECT i1.id, i1.sourceId
FROM `income` AS i1
WHERE i1.incomeType IN ('passive', 'active')
AND i1.status = 'paid'
ORDER BY i1.created DESC
LIMIT 50
) AS i2
JOIN income AS i3 USING(id)
ORDER BY i2.created DESC -- yes, repeated
(I left out the computation of isFirstIncome; it is discussed in other Answers. But note that it will be executed at most 50 times.)
(The aliases -- i1, i2, i3 -- are numbered in the order they will be "used"; this is to assist in following the SQL.)
To assist in performance, add
INDEX(status, incomeType, created, id, sourceId)
It should help with my formulation, but probably not for the other versions. Your version would benefit from
INDEX(sourceId, created, id)
This query (along with a few others I think have a related issue) did not take 30 seconds when MySQL was local on the same EC2 instance as the rest of the website. More like milliseconds.
Does anything look off?
SELECT *, chv_images.image_id FROM chv_images
LEFT JOIN chv_storages ON chv_images.image_storage_id =
chv_storages.storage_id
LEFT JOIN chv_users ON chv_images.image_user_id = chv_users.user_id
LEFT JOIN chv_albums ON chv_images.image_album_id = chv_albums.album_id
LEFT JOIN chv_categories ON chv_images.image_category_id =
chv_categories.category_id
LEFT JOIN chv_meta ON chv_images.image_id = chv_meta.image_id
LEFT JOIN chv_likes ON chv_likes.like_content_type = "image" AND
chv_likes.like_content_id = chv_images.image_id AND chv_likes.like_user_id = 1
LEFT JOIN chv_follows ON chv_follows.follow_followed_user_id =
chv_images.image_user_id
LEFT JOIN chv_follows_projects ON
chv_follows_projects.follows_project_project_id =
chv_images.image_project_id LEFT JOIN chv_projects ON
chv_projects.project_id = follows_project_project_id WHERE
chv_follows.follow_user_id='1' OR (follows_project_user_id = 1 AND
chv_projects.project_privacy = "public" AND
chv_projects.project_is_public_upload = 1) GROUP BY chv_images.image_id
ORDER BY chv_images.image_id DESC
LIMIT 0,15
And this is what EXPLAIN shows:
Thank you
Update: This query has the same issue. It does not have a GROUP BY.
SELECT *, chv_images.image_id FROM chv_images
LEFT JOIN chv_storages ON chv_images.image_storage_id =
chv_storages.storage_id
LEFT JOIN chv_users ON chv_images.image_user_id = chv_users.user_id
LEFT JOIN chv_albums ON chv_images.image_album_id = chv_albums.album_id
LEFT JOIN chv_categories ON chv_images.image_category_id =
chv_categories.category_id
LEFT JOIN chv_meta ON chv_images.image_id = chv_meta.image_id
LEFT JOIN chv_likes ON chv_likes.like_content_type = "image" AND
chv_likes.like_content_id = chv_images.image_id AND chv_likes.like_user_id = 1
ORDER BY chv_images.image_id DESC
LIMIT 0,15
That EXPLAIN shows several table-scans (type: ALL), so it's not surprising that it takes over 30 seconds.
Here's your EXPLAIN:
Notice the column rows shows an estimated 14420 rows read from the first table chv_images. It's doing a table-scan of all the rows.
In general, when you do a series of JOINs, you can multiple together all the values in the rows column of the EXPLAIN, and the final result is how many row-reads MySQL has to do. In this case it's 14420 * 2 * 1 * 1 * 2 * 1 * 916, or 52,834,880 row-reads. That should put into perspective the high cost of doing several table-scans in the same query.
You might help avoid those table-scans by creating some indexes on these tables:
ALTER TABLE chv_storages
ADD INDEX (storage_id);
ALTER TABLE chv_categories
ADD INDEX (category_id);
ALTER TABLE chv_likes
ADD INDEX (like_content_id, like_content_type, like_user_id);
Try creating those indexes and then run the EXPLAIN again.
The other tables are already doing lookups by primary key (type: eq_ref) or by secondary key (type: ref) so those are already optimized.
Your EXPLAIN shows your query uses a temporary table and filesort. You should reconsider whether you need the GROUP BY, because that's probably causing the extra work.
Another tip is to avoid using SELECT * because it might be forcing the query to read many extra columns that you don't need. Instead, explicitly name only the columns you need.
Is there any indexes in chv_images?
I propose:
CREATE INDEX idx_image_id ON chv_images (image_id);
(Bill's ideas are good. I'll take the discussion a different way...)
Explode-Implode -- If the LEFT JOINs match no more than 1 row, change, for example,
SELECT
...
LEFT JOIN chv_meta ON chv_images.image_id = chv_meta.image_id
into
SELECT ...,
( SELECT foo FROM chv_meta WHERE image_id = chv_images.image_id ) AS foo, ...
If that can be done for all the JOINs, you can get rid of GROUP BY. This will avoid the costly "explode-implode" where JOINs lead to more rows, then GROUP BY gets rid of the dups. (I suspect you can't move all the joins in.)
OR -> UNION -- OR is hard to optimize. Your query looks like a good candidate for turning into UNION, then making more indexes that will become useful.
WHERE chv_follows.follow_user_id='1'
OR (follows_project_user_id = 1
AND chv_projects.project_privacy = "public"
AND chv_projects.project_is_public_upload = 1
)
Assuming that follows_project_user_id is in `chv_images,
( SELECT ...
WHERE chv_follows.follow_user_id='1' )
UNION DISTINCT -- or ALL, if you are sure there won't be dups
( SELECT ...
WHERE follows_project_user_id = 1
AND chv_projects.project_privacy = "public"
AND chv_projects.project_is_public_upload = 1 )
Indexes needed:
chv_follows: (follow_user_id)
chv_projects: (project_privacy, project_is_public_upload) -- either order
But this has not yet handled the ORDER BY and LIMIT. The general pattern for such:
( SELECT ... ORDER BY ... LIMIT 15 )
UNION
( SELECT ... ORDER BY ... LIMIT 15 )
ORDER BY ... LIMIT 15
Yes, the ORDER BY and LIMIT are repeated.
That works for page 1. If you want the next 15 rows, see http://mysql.rjweb.org/doc.php/pagination#pagination_and_union
After building those two sub-selects, look at them; I think you will be able to optimize each one, and may need new indexes because the Optimizer will start with a different 'first' table.
Let's assume I have the following tables:
items table
item_id|view_count
item_views table
view_id|item_id|ip_address|last_view
What I would like to do is:
If last view of item with given item_id by given ip_address was 1+ hour ago I would like to increment view_count of item in items table. And as a result get the view count of item. How I will do it normally:
q = SELECT count(*) FROM item_views WHERE item_id='item_id' AND ip_address='some_ip' AND last_view < current_time-60*60
if(q==1) then q = UPDATE items SET view_count = view_count+1 WHERE item_id='item_id'
//and finally get view_count of item
q = SELECT view_count FROM items WHERE item_id='item_id'
Here I used 3 SQL queries. How can I merge it into one SQL query? And how can it affect the processing time? Will it be faster or slower than previous method?
I don't think your logic is correct for what you describe that you want. The query:
SELECT count(*)
FROM item_views
WHERE item_id='item_id' AND
ip_address='some_ip' AND
last_view < current_time-60*60
is counting the number of views longer ago than your time frame. I think you want:
last_view > current_time-60*60
and then have if q = 0 on the next line.
MySQL is pretty good with the performance of not exists, so the following should work well:
update items
set view_count = view_count+1
WHERE item_id='item_id' and
not exists (select 1
from item_views
where item_id='item_id' AND
ip_address='some_ip' AND
last_view > current_time-60*60
)
It will work much better with an index on item_views(item_id, ip_address, last_view) and an index on item(item_id).
In MySQL scripting, you could then write:
. . .
set view_count = (#q := view_count+1)
. . .
This would also give you the variable you are looking for.
update target
set target.view_count = target.view_count + 1
from items target
inner join (
select item_id
from item_views
where item_id = 'item_id'
and ip_address = 'some_ip'
and last_view < current_time - 60*60
) ref
on ref.item_id = target.item_id;
You can only combine the update statement with the condition using a join as in the above example; but you'll still need a separate select statement.
It may be slower on very large set and/or unindexed table.
I know there are simmilar questions out there, but here`s my implementation on a fast random select row:
SELECT i.id, i.thumb_img, i.af, i.width, i.height
FROM images_detail id
JOIN images AS i ON id.imageid = i.id
WHERE id.imageid >=1
AND id.newsroom =1
AND i.width > i.height
AND id.imageid >= FLOOR( 1 + RAND( ) *23111593 )
LIMIT 1
The problem with this query is that indifferent of the RANDOM expression in id.imageid >= FLOOR( 1 + RAND( ) *23111593 ) it always returns the same ID, why?
Any help please?
Later edit:
The query takes 0.0005 seconds, using EXPLAIN, it reports back USING WHERE and 12993 ROWS returned
The Id's are auto-incremented, it's not 23111593 because RAND() returns 0.xxxxx so RAND()*23111593, returns about 12993 rows. The problem is, the same ID is at the top and I don`t want to call an ORDER BY clause.
http://www.greggdev.com/web/articles.php?id=6
or use
ORDER BY RAND() LIMIT 0,1;
Do you have any "imageid" in your database that is greater than the span you define (1<=X<=23111593)?
You might have only one entry in the DB.
(also, keep in mind that you might only have one that actually fits the query of id.newsroom = 1, i.width>i.height etc.)
Eventually I've loaded ALL the results in an array, stored it in a cache and made in PHP $key=rand(0,sizeof(array)) and then used it as follows array[$key]['property']
What is a fast way to select a random row from a large mysql table?
I'm working in php, but I'm interested in any solution even if it's in another language.
Grab all the id's, pick a random one from it, and retrieve the full row.
If you know the id's are sequential without holes, you can just grab the max and calculate a random id.
If there are holes here and there but mostly sequential values, and you don't care about a slightly skewed randomness, grab the max value, calculate an id, and select the first row with an id equal to or above the one you calculated. The reason for the skewing is that id's following such holes will have a higher chance of being picked than ones that follow another id.
If you order by random, you're going to have a terrible table-scan on your hands, and the word quick doesn't apply to such a solution.
Don't do that, nor should you order by a GUID, it has the same problem.
I knew there had to be a way to do it in a single query in a fast way. And here it is:
A fast way without involvement of external code, kudos to
http://jan.kneschke.de/projects/mysql/order-by-rand/
SELECT name
FROM random AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1;
MediaWiki uses an interesting trick (for Wikipedia's Special:Random feature): the table with the articles has an extra column with a random number (generated when the article is created). To get a random article, generate a random number and get the article with the next larger or smaller (don't recall which) value in the random number column. With an index, this can be very fast. (And MediaWiki is written in PHP and developed for MySQL.)
This approach can cause a problem if the resulting numbers are badly distributed; IIRC, this has been fixed on MediaWiki, so if you decide to do it this way you should take a look at the code to see how it's currently done (probably they periodically regenerate the random number column).
Here's a solution that runs fairly quickly, and it gets a better random distribution without depending on id values being contiguous or starting at 1.
SET #r := (SELECT ROUND(RAND() * (SELECT COUNT(*) FROM mytable)));
SET #sql := CONCAT('SELECT * FROM mytable LIMIT ', #r, ', 1');
PREPARE stmt1 FROM #sql;
EXECUTE stmt1;
Maybe you could do something like:
SELECT * FROM table
WHERE id=
(FLOOR(RAND() *
(SELECT COUNT(*) FROM table)
)
);
This is assuming your ID numbers are all sequential with no gaps.
Add a column containing a calculated random value to each row, and use that in the ordering clause, limiting to one result upon selection. This works out faster than having the table scan that ORDER BY RANDOM() causes.
Update: You still need to calculate some random value prior to issuing the SELECT statement upon retrieval, of course, e.g.
SELECT * FROM `foo` WHERE `foo_rand` >= {some random value} LIMIT 1
There is another way to produce random rows using only a query and without order by rand().
It involves User Defined Variables.
See how to produce random rows from a table
In order to find random rows from a table, don’t use ORDER BY RAND() because it forces MySQL to do a full file sort and only then to retrieve the limit rows number required. In order to avoid this full file sort, use the RAND() function only at the where clause. It will stop as soon as it reaches to the required number of rows.
See
http://www.rndblog.com/how-to-select-random-rows-in-mysql/
if you don't delete row in this table, the most efficient way is:
(if you know the mininum id just skip it)
SELECT MIN(id) AS minId, MAX(id) AS maxId FROM table WHERE 1
$randId=mt_rand((int)$row['minId'], (int)$row['maxId']);
SELECT id,name,... FROM table WHERE id=$randId LIMIT 1
I see here a lot of solution. One or two seems ok but other solutions have some constraints. But the following solution will work for all situation
select a.* from random_data a, (select max(id)*rand() randid from random_data) b
where a.id >= b.randid limit 1;
Here, id, don't need to be sequential. It could be any primary key/unique/auto increment column. Please see the following Fastest way to select a random row from a big MySQL table
Thanks
Zillur
- www.techinfobest.com
For selecting multiple random rows from a given table (say 'words'), our team came up with this beauty:
SELECT * FROM
`words` AS r1 JOIN
(SELECT MAX(`WordID`) as wid_c FROM `words`) as tmp1
WHERE r1.WordID >= (SELECT (RAND() * tmp1.wid_c) AS id) LIMIT n
The classic "SELECT id FROM table ORDER BY RAND() LIMIT 1" is actually OK.
See the follow excerpt from the MySQL manual:
If you use LIMIT row_count with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result.
With a order yo will do a full scan table.
Its best if you do a select count(*) and later get a random row=rownum between 0 and the last registry
An easy but slow way would be (good for smallish tables)
SELECT * from TABLE order by RAND() LIMIT 1
In pseudo code:
sql "select id from table"
store result in list
n = random(size of list)
sql "select * from table where id=" + list[n]
This assumes that id is a unique (primary) key.
Take a look at this link by Jan Kneschke or this SO answer as they both discuss the same question. The SO answer goes over various options also and has some good suggestions depending on your needs. Jan goes over all the various options and the performance characteristics of each. He ends up with the following for the most optimized method by which to do this within a MySQL select:
SELECT name
FROM random AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1;
HTH,
-Dipin
I'm a bit new to SQL but how about generating a random number in PHP and using
SELECT * FROM the_table WHERE primary_key >= $randNr
this doesn't solve the problem with holes in the table.
But here's a twist on lassevks suggestion:
SELECT primary_key FROM the_table
Use mysql_num_rows() in PHP create a random number based on the above result:
SELECT * FROM the_table WHERE primary_key = rand_number
On a side note just how slow is SELECT * FROM the_table:
Creating a random number based on mysql_num_rows() and then moving the data pointer to that point mysql_data_seek(). Just how slow will this be on large tables with say a million rows?
I ran into the problem where my IDs were not sequential. What I came up with this.
SELECT * FROM products WHERE RAND()<=(5/(SELECT COUNT(*) FROM products)) LIMIT 1
The rows returned are approximately 5, but I limit it to 1.
If you want to add another WHERE clause it becomes a bit more interesting. Say you want to search for products on discount.
SELECT * FROM products WHERE RAND()<=(100/(SELECT COUNT(*) FROM pt_products)) AND discount<.2 LIMIT 1
What you have to do is make sure you are returning enough result which is why I have it set to 100. Having a WHERE discount<.2 clause in the subquery was 10x slower, so it's better to return more results and limit.
Use the below query to get the random row
SELECT user_firstname ,
COUNT(DISTINCT usr_fk_id) cnt
FROM userdetails
GROUP BY usr_fk_id
ORDER BY cnt ASC
LIMIT 1
In my case my table has an id as primary key, auto-increment with no gaps, so I can use COUNT(*) or MAX(id) to get the number of rows.
I made this script to test the fastest operation:
logTime();
query("SELECT COUNT(id) FROM tbl");
logTime();
query("SELECT MAX(id) FROM tbl");
logTime();
query("SELECT id FROM tbl ORDER BY id DESC LIMIT 1");
logTime();
The results are:
Count: 36.8418693542479 ms
Max: 0.241041183472 ms
Order: 0.216960906982 ms
Answer with the order method:
SELECT FLOOR(RAND() * (
SELECT id FROM tbl ORDER BY id DESC LIMIT 1
)) n FROM tbl LIMIT 1
...
SELECT * FROM tbl WHERE id = $result;
I have used this and the job was done
the reference from here
SELECT * FROM myTable WHERE RAND()<(SELECT ((30/COUNT(*))*10) FROM myTable) ORDER BY RAND() LIMIT 30;
Create a Function to do this most likely the best answer and most fastest answer here!
Pros - Works even with Gaps and extremely fast.
<?
$sqlConnect = mysqli_connect('localhost','username','password','database');
function rando($data,$find,$max = '0'){
global $sqlConnect; // Set as mysqli connection variable, fetches variable outside of function set as GLOBAL
if($data == 's1'){
$query = mysqli_query($sqlConnect, "SELECT * FROM `yourtable` ORDER BY `id` DESC LIMIT {$find},1");
$fetched_data = mysqli_fetch_assoc($query);
if(mysqli_num_rows($fetched_data>0){
return $fetch_$data;
}else{
rando('','',$max); // Start Over the results returned nothing
}
}else{
if($max != '0'){
$irand = rand(0,$max);
rando('s1',$irand,$max); // Start rando with new random ID to fetch
}else{
$query = mysqli_query($sqlConnect, "SELECT `id` FROM `yourtable` ORDER BY `id` DESC LIMIT 0,1");
$fetched_data = mysqli_fetch_assoc($query);
$max = $fetched_data['id'];
$irand = rand(1,$max);
rando('s1',$irand,$max); // Runs rando against the random ID we have selected if data exist will return
}
}
}
$your_data = rando(); // Returns listing data for a random entry as a ASSOC ARRAY
?>
Please keep in mind this code as not been tested but is a working concept to return random entries even with gaps.. As long as the gaps are not huge enough to cause a load time issue.
Quick and dirty method:
SET #COUNTER=SELECT COUNT(*) FROM your_table;
SELECT PrimaryKey
FROM your_table
LIMIT 1 OFFSET (RAND() * #COUNTER);
The complexity of the first query is O(1) for MyISAM tables.
The second query accompanies a table full scan. Complexity = O(n)
Dirty and quick method:
Keep a separate table for this purpose only. You should also insert the same rows to this table whenever inserting to the original table. Assumption: No DELETEs.
CREATE TABLE Aux(
MyPK INT AUTO_INCREMENT,
PrimaryKey INT
);
SET #MaxPK = (SELECT MAX(MyPK) FROM Aux);
SET #RandPK = CAST(RANDOM() * #MaxPK, INT)
SET #PrimaryKey = (SELECT PrimaryKey FROM Aux WHERE MyPK = #RandPK);
If DELETEs are allowed,
SET #delta = CAST(#RandPK/10, INT);
SET #PrimaryKey = (SELECT PrimaryKey
FROM Aux
WHERE MyPK BETWEEN #RandPK - #delta AND #RandPK + #delta
LIMIT 1);
The overall complexity is O(1).
SELECT DISTINCT * FROM yourTable WHERE 4 = 4 LIMIT 1;