How to convert dependent subquery to join for better performance? - mysql

I have a database that stores "themes" and every theme is associated with a whole bunch of images (=screenshots of these themes). Now I want to display the latest 10 themes and for every theme I only want to get one single image from the database (the one with the lowest ID).
Currently my query looks like this (I am using a subquery):
SELECT DISTINCT
t.theme_id, t.theme_name, theme_date_last_modification, image_id, image_type
FROM
themes t, theme_images i
WHERE
i.theme_id = t.theme_id
AND t.theme_status = 3
AND t.theme_date_added < now( )
AND i.image_id = (
SELECT MIN( image_id )
FROM theme_images ii
WHERE ii.theme_id = t.theme_id
)
GROUP BY
t.theme_id
ORDER BY
t.theme_date_last_modification DESC
LIMIT 10
It works, but the query is very slow. When I use EXPLAIN I can see that there's a "dependent subquery". Is it possible to convert this dependent subquery into some kind of join that MySQL can process faster?
P.S.: My actual query is much more complex and makes use of more tables. I have already tried to simplify it as much as possible so that you can concentrate on the actual cause of the performance problems.
EDIT:
This is the output of EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t index PRIMARY,themes themes 212 NULL 5846 Using where; Using index; Using temporary; Using filesort
1 PRIMARY i eq_ref PRIMARY,theme_id,image_id PRIMARY 4 func 1 Using where
2 DEPENDENT SUBQUERY ii ref theme_id theme_id 4 themes.t.theme_id 6

Try this query first:
SELECT
t.*, ti1.*
FROM
themes t
JOIN theme_images ti1
ON ti1.theme_id = t.theme_id
JOIN (SELECT theme_id, MIN(image_id) image_id FROM theme_images GROUP BY theme_id) ti2
ON ti1.theme_id = ti2.theme_id AND ti1.image_id = ti2.image_id
ORDER BY
t.theme_date_last_modification DESC
LIMIT 10
One more solution -
SELECT
t.*, ti.*
FROM
themes t
JOIN (SELECT * FROM theme_images ORDER BY image_id) ti
ON ti.theme_id = t.theme_id
GROUP BY
t.theme_id
ORDER BY
t.theme_date_last_modification DESC
LIMIT
10
Then add your WHERE filter.
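The first rewrite above (joining to a derived table that holds MIN(image_id) per theme) can be tried out on toy data. This is only a sketch: SQLite from the Python standard library stands in for MySQL, the table and column names follow the question, and the sample rows and the index name idx_images_theme are invented for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; the sample rows are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE themes (
        theme_id INTEGER PRIMARY KEY,
        theme_name TEXT,
        theme_date_last_modification TEXT
    );
    CREATE TABLE theme_images (
        image_id INTEGER PRIMARY KEY,
        theme_id INTEGER,
        image_type TEXT
    );
    CREATE INDEX idx_images_theme ON theme_images (theme_id, image_id);

    INSERT INTO themes VALUES
        (1, 'dark',  '2012-01-01'),
        (2, 'light', '2012-03-01'),
        (3, 'retro', '2012-02-01');
    INSERT INTO theme_images VALUES
        (10, 1, 'png'), (11, 1, 'jpg'),
        (20, 2, 'jpg'), (21, 2, 'png'),
        (30, 3, 'gif');
""")

# The derived table first_img has one row per theme and is computed once,
# instead of running a correlated subquery for every candidate row.
QUERY = """
    SELECT t.theme_id, t.theme_name, i.image_id, i.image_type
    FROM themes AS t
    JOIN (
        SELECT theme_id, MIN(image_id) AS image_id
        FROM theme_images
        GROUP BY theme_id
    ) AS first_img ON first_img.theme_id = t.theme_id
    JOIN theme_images AS i ON i.image_id = first_img.image_id
    ORDER BY t.theme_date_last_modification DESC
    LIMIT 10
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each theme appears once, paired with its lowest image_id. Whether this beats the dependent subquery on real data depends on table sizes and indexes, so compare the EXPLAIN output of both forms.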

One approach is to first LIMIT on the themes table, then JOIN to images:
SELECT
t.theme_id, t.theme_name, t.theme_date_last_modification,
ti.image_id, ti.image_type
FROM
( SELECT theme_id, theme_name, theme_date_last_modification
FROM themes t
WHERE theme_status = 3
AND theme_date_added < now( )
ORDER BY
theme_date_last_modification DESC
LIMIT 10
) AS t
JOIN -- LEFT JOIN if you want themes without an image
theme_images AS ti -- to be shown
ON ti.theme_id = t.theme_id
AND ti.image_id =
( SELECT ii.image_id
FROM theme_images AS ii
WHERE ii.theme_id = t.theme_id
ORDER BY ii.image_id
LIMIT 1
)
ORDER BY
t.theme_date_last_modification DESC ;
With an index on themes (theme_status, theme_date_last_modification, theme_id, theme_date_added) the limit subquery should be efficient.
I suppose you also have a (unique) index on theme_images (theme_id, image_id).
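The "limit first, then join" idea can be sketched on toy data as follows. SQLite (Python standard library) is used as a stand-in for MySQL, so now() becomes datetime('now'); the sample rows are invented, and LIMIT 2 replaces LIMIT 10 so that the effect of the limit is visible.

```python
import sqlite3

# SQLite stands in for MySQL; rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE themes (
        theme_id INTEGER PRIMARY KEY,
        theme_name TEXT,
        theme_status INTEGER,
        theme_date_added TEXT,
        theme_date_last_modification TEXT
    );
    CREATE TABLE theme_images (
        image_id INTEGER PRIMARY KEY,
        theme_id INTEGER
    );
    CREATE INDEX idx_images_theme ON theme_images (theme_id, image_id);

    INSERT INTO themes VALUES
        (1, 'dark',  3, '2012-01-01', '2012-01-05'),
        (2, 'light', 3, '2012-01-02', '2012-03-01'),
        (3, 'retro', 1, '2012-01-03', '2012-04-01'),
        (4, 'neon',  3, '2012-01-04', '2012-02-01');
    INSERT INTO theme_images VALUES
        (10, 1), (11, 1), (20, 2), (21, 2), (30, 3), (40, 4), (41, 4);
""")

# Step 1 (inner query): pick the few themes to display.
# Step 2 (outer join): the correlated lookup now runs only for those rows.
QUERY = """
    SELECT t.theme_id, t.theme_name, ti.image_id
    FROM (
        SELECT theme_id, theme_name, theme_date_last_modification
        FROM themes
        WHERE theme_status = 3
          AND theme_date_added < datetime('now')
        ORDER BY theme_date_last_modification DESC
        LIMIT 2
    ) AS t
    JOIN theme_images AS ti
      ON ti.theme_id = t.theme_id
     AND ti.image_id = (
            SELECT ii.image_id
            FROM theme_images AS ii
            WHERE ii.theme_id = t.theme_id
            ORDER BY ii.image_id
            LIMIT 1
         )
    ORDER BY t.theme_date_last_modification DESC
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Theme 3 is dropped by the status filter and theme 1 by the limit, so only two image lookups are needed regardless of how many themes exist.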

Related

Find employees latest activity is slow when adding ORDER BY

I am working on a legacy system in Laravel and I am trying to pull the latest action of some specific types of actions an employee has done.
Performance is good when I don't add ORDER BY. When adding it the query will go from something like 130 ms to 18 seconds. There are about 1.5 million rows in the actions table.
How do I fix the performance problem?
I have tried to isolate the problem by cutting out all the other parts of the query so it is more readable for you:
SELECT
employees.id,
(
SELECT DATE_FORMAT(actions.date, '%Y-%m-%d')
FROM pivot
JOIN actions
ON pivot.actions_id = actions.id
WHERE employees.id = pivot.employee_id
AND (actions.type = 'meeting'
OR (actions.type = 'phone_call'
AND JSON_VALID(actions.data) = 1
AND actions.data->>'$.update_status' = 1))
LIMIT 1
) AS latest_action
FROM employees
ORDER BY latest_action DESC
I tried using LEFT JOIN and MAX() instead but it didn't seem to solve my problem.
I only added a subquery because the original query is already very complex. But if you have an alternative suggestion I am all ears.
UPDATE
Result of EXPLAIN:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY employees NULL ALL NULL NULL NULL NULL 15217 10 Using where
2 DEPENDENT SUBQUERY pivot NULL ref actions_type_index,pivot_type_index pivot_type_index 4 dev.employees.id 104 11.11 Using index condition
2 DEPENDENT SUBQUERY actions NULL eq_ref PRIMARY,Logs PRIMARY 4 dev.pivot.actions_id 1 6.68 Using where
UPDATE 2
Here are the indexes. I don't think the employee_type column in these indexes is important for my specific query, but maybe it should be reworked?
# pivot table
KEY `actions_type_index` (`actions_id`,`employee_type`),
KEY `pivot_type_index` (`employee_id`,`employee_type`)
# actions table
KEY `Logs` (`type`,`id`,`is_log`)
# I tried to add `date` index to `actions` table but the problem remains.
KEY `date_index` (`date`)
First of all your query is very non-optimal.
I would rewrite it this way:
SELECT
e.id,
DATE_FORMAT(MAX(a.date), '%Y-%m-%d') AS latest_action
FROM employees e
LEFT JOIN pivot p ON p.employee_id = e.id
LEFT JOIN actions a ON p.actions_id = a.id AND (a.type = 'meeting'
OR (a.type = 'phone_call'
AND JSON_VALID(a.data) = 1
AND a.data->>'$.update_status' = 1))
GROUP BY e.id
ORDER BY latest_action DESC
Obviously there must be indexes on p.employee_id, p.actions_id, a.date. Also would be good on a.type.
Also it would be good to replace a.data->>'$.update_status' with some simple field with an index on it.
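A minimal sketch of the LEFT JOIN plus MAX() rewrite on toy data, assuming SQLite (Python standard library) as a stand-in for MySQL. The JSON condition on phone_call rows is left out to keep the example small, date() replaces DATE_FORMAT, and the sample rows and the index name pivot_employee are invented.

```python
import sqlite3

# SQLite stands in for MySQL; only the 'meeting' branch of the filter is shown.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY);
    CREATE TABLE actions (id INTEGER PRIMARY KEY, type TEXT, date TEXT);
    CREATE TABLE pivot (employee_id INTEGER, actions_id INTEGER);
    CREATE INDEX pivot_employee ON pivot (employee_id, actions_id);

    INSERT INTO employees VALUES (1), (2), (3);
    INSERT INTO actions VALUES
        (1, 'meeting', '2020-01-05 09:00:00'),
        (2, 'meeting', '2020-02-10 14:30:00'),
        (3, 'email',   '2020-03-01 08:00:00'),
        (4, 'meeting', '2020-01-20 11:00:00');
    INSERT INTO pivot VALUES (1, 1), (1, 2), (1, 3), (2, 4);
""")

# The type filter lives in the ON clause so that employees without a matching
# action still appear, with NULL as their latest_action.
QUERY = """
    SELECT e.id, date(MAX(a.date)) AS latest_action
    FROM employees AS e
    LEFT JOIN pivot AS p ON p.employee_id = e.id
    LEFT JOIN actions AS a ON a.id = p.actions_id AND a.type = 'meeting'
    GROUP BY e.id
    ORDER BY latest_action DESC
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Unlike the original subquery with LIMIT 1 and no ORDER BY, MAX() returns the latest matching date deterministically.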

Can't figure out why this MySQL query is slow

I have one particular MySQL query which is slow, and I can't figure out why.
SELECT
s.title,
p.minPrice,
s.booking, r.url
FROM shows s
INNER JOIN showResources r
ON r.showID = s.id
INNER JOIN performances p
ON p.showID = s.id
WHERE s.lastDate >= CURDATE()
AND r.type = 'rectangle-poster'
AND p.minPrice > 0
GROUP BY s.id
ORDER BY p.minPrice ASC
LIMIT 30
The EXPLAIN for this query is as follows:
id select_type table type possible_keys key key_len ref rows extra
1 SIMPLE s range PRIMARY,lastDate lastDate 4 NULL 291 Using index condition; Using temporary; Using filesort
1 SIMPLE r ref showID,type showID 5 thistle.s.id 1 Using where
1 SIMPLE p ref showID,minPrice showID 5 thistle.s.id 1 Using where
Other, seemingly far more complex queries on the same server are blisteringly fast - but this one typically takes about 4 seconds to run, and I just can't figure out why. I've even gone as far as deleting the tables and recreating them just in case it was some weird corruption, but no luck. Can a MySQL expert tell me what I'm doing wrong here?
Try this:
SELECT
s.id AS id,
s.title,
p.minPrice AS min_price,
s.booking,
r.url
FROM shows s
INNER JOIN showResources r
ON r.showID = s.id AND s.lastDate >= CURDATE() AND r.type = 'rectangle-poster'
INNER JOIN performances p
ON p.showID = s.id AND p.minPrice > 0
GROUP BY id
ORDER BY min_price ASC
LIMIT 30

order by makes query slow

I have two tables :
video (ID, TITLE, ..., UPLOADED_DATE)
join_video_category (ID (not used), ID_VIDEO, ID_CATEGORY)
rows in video : 4 500 000
rows in join_video_category : 5 800 000
One video can have many categories.
I have a query that works perfectly and takes 20 ms at most to return a result:
SELECT * FROM video WHERE ID IN
(SELECT ID_VIDEO FROM join_video_category WHERE ID_CATEGORY=11)
LIMIT 1000;
This query fetches 1000 videos; the order is not important.
But when I want to get the 10 latest videos from a category, my query takes around 30-40 seconds:
SELECT * FROM video WHERE ID IN
(SELECT ID_VIDEO FROM join_video_category WHERE ID_CATEGORY=11)
ORDER BY UPLOADED_DATE DESC LIMIT 10;
I have indexes on ID_CATEGORY, ID_VIDEO and UPLOADED_DATE, and a primary key on ID in both video and join_video_category.
I have tested it with JOIN on my query, it's the same result.
First, you are comparing two very different queries. The first returns videos as soon as it encounters them. The second has to read all the videos in the category and then sort them.
Try rewriting this as a JOIN:
SELECT v.*
FROM video v JOIN
join_video_category vc
ON v.id = vc.id_video
WHERE vc.ID_CATEGORY = 11
ORDER BY v.UPLOADED_DATE DESC
LIMIT 10;
That may or may not help. You have a lot of data and so you might have a lot of videos for a given category. If so, a where clause that gets more recent data might really help:
SELECT v.*
FROM video v JOIN
join_video_category vc
ON v.id = vc.id_video
WHERE vc.ID_CATEGORY = 11 AND v.UPLOADED_DATE >= '2015-01-01'
ORDER BY v.UPLOADED_DATE DESC
LIMIT 10;
Finally, if that doesn't work, consider adding something like UPLOADED_DATE into join_video_category. Then, this query should blaze:
select vc.ID_VIDEO
from join_video_category vc
where vc.ID_CATEGORY = 11
order by vc.UPLOADED_DATE desc
limit 10;
with an index on join_video_category(id_category, uploaded_date, id_video).
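The denormalization suggested above can be sketched like this, with SQLite (Python standard library) standing in for MySQL. The uploaded_date column on the junction table is the hypothetical copy of video.UPLOADED_DATE, and the index name and sample rows are invented.

```python
import sqlite3

# SQLite stands in for MySQL; uploaded_date is a hypothetical denormalized copy.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE join_video_category (
        id_video INTEGER,
        id_category INTEGER,
        uploaded_date TEXT
    );
    CREATE INDEX idx_cat_date_video
        ON join_video_category (id_category, uploaded_date, id_video);
""")

# Videos 1..28: odd ids go to category 11, even ids to category 12.
sample = [(v, 11 if v % 2 else 12, "2015-01-%02d" % v) for v in range(1, 29)]
conn.executemany("INSERT INTO join_video_category VALUES (?, ?, ?)", sample)

QUERY = """
    SELECT id_video
    FROM join_video_category
    WHERE id_category = 11
    ORDER BY uploaded_date DESC
    LIMIT 10
"""
latest = [row[0] for row in conn.execute(QUERY)]
print(latest)

# Print the plan to see whether the composite index is used for both the
# filter and the ordering (the exact wording varies between SQLite versions).
for step in conn.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(step)
```

Because the index starts with the category and continues with the date, the engine can read the ten newest entries straight from the index instead of sorting every video in the category. The cost is that the copied date has to be kept in sync with the video table.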
solution #1:
Replacing "in" with "exists" may improve the performance; please try the query below.
SELECT * FROM video WHERE exists
(SELECT * FROM join_video_category WHERE ID_CATEGORY=11 AND join_video_category.ID_VIDEO = video.ID)
ORDER BY UPLOADED_DATE DESC LIMIT 10;
solution #2:
1) create temp_table
CREATE TABLE TEMP_TABLE AS SELECT * FROM join_video_category WHERE ID_CATEGORY=11;
2) use the temp table in solution #1
SELECT * FROM video WHERE exists
(SELECT * FROM temp_table WHERE temp_table.ID_VIDEO = video.ID)
ORDER BY UPLOADED_DATE DESC LIMIT 10;
Good Luck!!
If it is 1:Many, don't use an extra table between Video and Category. However, your row counts imply that it is Many:Many.
If it is 1:Many, simply have the category_id in the Video table, then simplify all the queries.
If it is Many:Many, then be sure to use this pattern for the junction table:
CREATE TABLE map_video_category (
video_id ...,
category_id ...,
PRIMARY KEY(video_id, category_id), -- both ids, one direction
INDEX (category_id, video_id) -- both ids, the other direction
) ENGINE=InnoDB; -- significantly better than MyISAM on INDEX handling here
The ID that you mentioned is a waste. The composite keys are optimal for all situations, and will improve performance in most situations.
Avoid IN ( SELECT ... ); the optimizer, especially in older MySQL versions, often does a poor job of optimizing it. Change to a JOIN, LEFT JOIN, EXISTS, or some other construct.
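Here is a small sketch of the junction-table pattern and of a JOIN used in place of IN ( SELECT ... ). SQLite (Python standard library) stands in for MySQL, so there is no ENGINE clause; the column types, the index name idx_category_video and the sample rows are assumptions made for the example.

```python
import sqlite3

# SQLite stands in for MySQL; types and sample data are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE video (id INTEGER PRIMARY KEY, title TEXT, uploaded_date TEXT);
    CREATE TABLE map_video_category (
        video_id INTEGER NOT NULL,
        category_id INTEGER NOT NULL,
        PRIMARY KEY (video_id, category_id)          -- both ids, one direction
    );
    CREATE INDEX idx_category_video
        ON map_video_category (category_id, video_id);  -- the other direction

    INSERT INTO video VALUES
        (1, 'a', '2015-01-01'), (2, 'b', '2015-02-01'), (3, 'c', '2015-03-01');
    INSERT INTO map_video_category VALUES (1, 11), (2, 11), (2, 12), (3, 12);
""")

# JOIN form of "videos in category 11, newest first".
QUERY = """
    SELECT v.id, v.title
    FROM map_video_category AS m
    JOIN video AS v ON v.id = m.video_id
    WHERE m.category_id = 11
    ORDER BY v.uploaded_date DESC
    LIMIT 10
"""
rows = conn.execute(QUERY).fetchall()
print(rows)

# The composite primary key also rejects duplicate (video, category) pairs.
try:
    conn.execute("INSERT INTO map_video_category VALUES (1, 11)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

With no surrogate ID column, the table is just the two composite indexes, and each lookup direction has an index that starts with the column being filtered.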

Poor Performance from MySQL JOIN - How to Make Improvements?

A bit of a generic question title but I have the following query:
SELECT t.from_number, COUNT(*) AS calls
FROM t
WHERE t.organisation_id = 999
AND t.direction = 'inbound'
AND t.start_time BETWEEN '2014-03-26' AND NOW()
AND t.from_number != ''
GROUP BY t.from_number
ORDER BY calls DESC LIMIT 20
and it executes in 488ms.
However, as well as retrieving the data from that table, I need to look up who the number belongs to.
SELECT t.from_number, COUNT(*) AS calls
FROM t
LEFT JOIN n on CONCAT('44', n.number) = t.from_number
WHERE t.organisation_id = 999
AND t.direction = 'inbound'
AND t.start_time BETWEEN '2014-03-26' AND NOW()
AND t.from_number != ''
GROUP BY t.from_number
ORDER BY calls DESC LIMIT 20
As soon as I add the JOIN the query execution time jumps up to anything from 8 - 12 seconds and that's only to find the organisation that the number belongs to, I'd need yet another join after that to retrieve the organisation name from the organisations table.
The cardinality of t and n are > 2,000,000 and ~ 63,000 respectively, and, as you can guess from above, the numbers are stored slightly differently in each:
t stores numbers as 123456789 since the country code (44) is stored in a separate column but n stores numbers as 44123456789 which is why I need to use the CONCAT but I didn't think this would affect performance since it's not in the WHERE clause.
As far as I can tell, I have indexed the important columns in each table.
Are there any suggestions on how I can improve the performance of queries when it comes to these tables?
Update
EXPLAIN output added
id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
1 SIMPLE t index_merge organisation_id,start_time,direction,from_number organisation_id,direction 4,13 NULL 4174 Using intersect(organisation_id,direction); Using where; Using temporary; Using filesort
1 SIMPLE n index NULL number 768 NULL 62759 Using index
The problem is on the JOIN clause:
LEFT JOIN n on CONCAT('44', n.number) = t.from_number
It is joining the tables using the result of the function CONCAT('44', n.number).
Some databases (such as Oracle) can create an index based on a function, but older MySQL versions cannot (functional key parts were only added in MySQL 8.0.13). So MySQL cannot use any existing index on table n to make the join.
A solution would be to create a new column on n with the result of the used function and to index it.
You could use code similar to:
ALTER TABLE n ADD COLUMN extended_number varchar(128) null;
UPDATE n
SET extended_number = CONCAT('44', number);
CREATE INDEX ext_numb_idx
ON n (extended_number);
After this, modify the JOIN clause of the query:
SELECT t.from_number, COUNT(*) AS calls
FROM t
LEFT JOIN n on n.extended_number = t.from_number
WHERE t.organisation_id = 999
AND t.direction = 'inbound'
AND t.start_time BETWEEN '2014-03-26' AND NOW()
AND t.from_number != ''
GROUP BY t.from_number
ORDER BY calls DESC LIMIT 20
Then MySQL will use the newly created index and will execute the query much faster.
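The precomputed-column approach can be sketched end to end on toy data. SQLite (Python standard library) stands in for MySQL here, so CONCAT('44', number) is written as '44' || number; the owner column and the sample numbers are hypothetical and only serve to show that the join finds matches.

```python
import sqlite3

# SQLite stands in for MySQL; the owner column and the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (from_number TEXT);
    CREATE TABLE n (number TEXT, owner TEXT);
    INSERT INTO t VALUES ('44111'), ('44111'), ('44222'), ('44333');
    INSERT INTO n VALUES ('111', 'alice'), ('222', 'bob');

    -- Store the join key once instead of computing it for every comparison.
    ALTER TABLE n ADD COLUMN extended_number TEXT;
    UPDATE n SET extended_number = '44' || number;
    CREATE INDEX ext_numb_idx ON n (extended_number);
""")

# The join now compares two plain columns, so the index can be used.
QUERY = """
    SELECT t.from_number, n.owner, COUNT(*) AS calls
    FROM t
    LEFT JOIN n ON n.extended_number = t.from_number
    GROUP BY t.from_number, n.owner
    ORDER BY calls DESC, t.from_number
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

New rows in n need the extra column filled in as well, for example by the application or by a trigger, otherwise they will silently fail to match.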

How to make JOINS faster?

I had this query to start out with:
SELECT DISTINCT spentits.*
FROM `spentits`
WHERE (spentits.user_id IN
(SELECT following_id
FROM `follows`
WHERE `follows`.`follower_id` = '44'
AND `follows`.`accepted` = 1)
OR spentits.user_id = '44')
ORDER BY id DESC
LIMIT 15 OFFSET 0
This query takes 10ms to execute.
But once I add a simple join in:
SELECT DISTINCT spentits.*
FROM `spentits`
LEFT JOIN wishlist_items ON wishlist_items.user_id = 44 AND wishlist_items.spentit_id = spentits.id
WHERE (spentits.user_id IN
(SELECT following_id
FROM `follows`
WHERE `follows`.`follower_id` = '44'
AND `follows`.`accepted` = 1)
OR spentits.user_id = '44')
ORDER BY id DESC
LIMIT 15 OFFSET 0
The execution time increased by 11x. Now it takes around 120ms to execute. What's interesting is that if I remove either the LEFT JOIN clause or the ORDER BY id DESC, the time goes back to 10ms.
I am new to databases so I don't understand this. Why is it that removing either one of these clauses speeds it up 11x ? And how can I keep it as is but make it faster?
I have indexes on spentits.user_id, follows.follower_id, follows.accepted, and on primary ids of each table.
EXPLAIN:
1 PRIMARY spentits index index_spentits_on_user_id PRIMARY 4 NULL 15 Using where; Using temporary
1 PRIMARY wishlist_items ref index_wishlist_items_on_user_id,index_wishlist_items_on_spentit_id index_wishlist_items_on_spentit_id 5 spentit.spentits.id 1 Using where; Distinct
2 SUBQUERY follows index_merge index_follows_on_follower_id,index_follows_on_following_id,index_follows_on_accepted index_follows_on_follower_id,index_follows_on_accepted 5,2 NULL 566 Using intersect(index_follows_on_follower_id,index_follows_on_accepted); Using where
You should have index also on:
wishlist_items.spentit_id
Because you are joining over that column
The LEFT JOIN is easy to explain: conceptually, a cross product of all entries against all other entries is made. The conditions of the join (in your case: take all entries on the left and find fitting ones on the right) are applied afterwards. So if your spentits table is large it will take the server some time. I would suggest you get rid of your subquery and make three joins. Start with the smallest table to avoid big amounts of data.
In the 2nd example the subselect runs for every spentits.user_id.
If you write it like this it will be faster because the subselect runs once:
SELECT DISTINCT spentits.*
FROM `spentits`
LEFT JOIN (SELECT following_id
FROM `follows`
WHERE `follows`.`follower_id` = '44'
AND `follows`.`accepted` = 1) AS `follow`
ON `follow`.following_id = spentits.user_id
LEFT JOIN wishlist_items ON wishlist_items.user_id = 44 AND wishlist_items.spentit_id = spentits.id
WHERE `follow`.following_id IS NOT NULL
OR spentits.user_id = '44'
ORDER BY spentits.id DESC
LIMIT 15 OFFSET 0
As you can see, the subselect moved to the FROM part of the query and creates a derived table (also called an inline view).
JOINs and derived tables are often faster than a subselect in the WHERE part, although the result depends on the MySQL version and the available indexes.
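To check that moving the subselect into the FROM part does not change the result, the two forms can be compared on toy data. This is a sketch with SQLite (Python standard library) standing in for MySQL; the wishlist_items join is left out for brevity and the sample rows are invented.

```python
import sqlite3

# SQLite stands in for MySQL; rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE spentits (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE follows (
        follower_id INTEGER, following_id INTEGER, accepted INTEGER
    );
    INSERT INTO spentits VALUES (1, 44), (2, 7), (3, 8), (4, 9), (5, 7);
    INSERT INTO follows VALUES (44, 7, 1), (44, 8, 0), (44, 9, 1), (5, 8, 1);
""")

# Original form: subselect in the WHERE part.
IN_QUERY = """
    SELECT s.*
    FROM spentits AS s
    WHERE s.user_id IN (SELECT following_id FROM follows
                        WHERE follower_id = 44 AND accepted = 1)
       OR s.user_id = 44
    ORDER BY s.id DESC
    LIMIT 15
"""

# Rewritten form: the same subselect as a derived table in the FROM part.
JOIN_QUERY = """
    SELECT DISTINCT s.*
    FROM spentits AS s
    LEFT JOIN (SELECT following_id FROM follows
               WHERE follower_id = 44 AND accepted = 1) AS f
           ON f.following_id = s.user_id
    WHERE f.following_id IS NOT NULL OR s.user_id = 44
    ORDER BY s.id DESC
    LIMIT 15
"""
in_rows = conn.execute(IN_QUERY).fetchall()
join_rows = conn.execute(JOIN_QUERY).fetchall()
print(in_rows)
print(join_rows)
```

Both queries return the posts of user 44 and of the accepted followees (7 and 9). Which one runs faster on a real server is something to confirm with EXPLAIN rather than assume.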