MySQL, Using indexes with subquery - mysql

I'm still trying to learn my way around indexes. Shouldn't the JOINs in the outer query be using the index on the primary key? Do indexes not work in combination with subqueries? Thanks!
SELECT SQL_BIG_RESULT
I.item_group_id
FROM
(
SELECT SQL_BIG_RESULT
MAX(ITM.id) as max_id
FROM a_movements M
JOIN a_items_to_movements ITM ON ITM.movement_id = M.id -- Index used
WHERE M.warehouse_id IN (...) -- Index used
GROUP BY ITM.item_id
ORDER BY NULL
) X
JOIN a_items_to_movements ITM ON ITM.id = X.max_id -- Index not used
JOIN a_movements M ON M.id = ITM.movement_id
AND M.direction = 0
AND M.settled IS NOT NULL
JOIN a_items I ON I.id = ITM.item_id -- Index not used
GROUP BY I.item_group_id
ORDER BY NULL
EDIT: attached EXPLAIN output here: https://imgur.com/PdO3mIo

Related

MySQL query SLOW Don't know how to optimize

I think the hardware is fine: 60 GB RAM, 10 cores, SSD.
Hi, I'm having a big issue with this query running slowly on MySQL. The query is below:
# Thread_id: 1165100 Schema: back-Alvo-11-07-19 QC_hit: No
# Query_time: 9.015205 Lock_time: 0.000188 Rows_sent: 1 Rows_examined: 2616880
# Rows_affected: 0
SET timestamp=1568549358;
SELECT count(*) as total_rows FROM(
(SELECT m.*
FROM phpfox_channel_video AS m
INNER JOIN phpfox_channel_category AS mc
ON(mc.category_id = mc.category_id)
INNER JOIN phpfox_channel_category_data AS mcd
ON(mcd.video_id = m.video_id)
WHERE m.in_process = 0 AND m.view_id = 0
AND m.module_id = 'videochannel'
AND m.item_id = 0 AND m.privacy IN(0)
AND mcd.category_id = 17
GROUP BY m.video_id
ORDER BY m.time_stamp DESC
LIMIT 12
)) AS m
JOIN phpfox_user AS u
ON(u.user_id = m.user_id);
This query is very slow: as you can see, it takes 9 seconds.
Online advice on optimizing queries always talks about adding indexes,
but as the EXPLAIN statement below shows, I already have indexes.
Do you have any idea where I should look to improve the speed of this query? I'm not a DB guy and am having a hard time with this. This is a website with 400,000 videos.
Thanks
The EXPLAIN output shows that no index is being used on table phpfox_channel_video (alias m), and that a temporary index is being built on table phpfox_channel_category (alias mc). In other words, no existing index fits, so MySQL builds one first, which takes considerable time.
Also, the index for table phpfox_channel_category_data AS mcd could be better.
The indexes you need are:
CREATE INDEX idx_cat_data_video_id ON phpfox_channel_category_data
(category_id, video_id);
CREATE INDEX idx_channel_cat_id ON phpfox_channel_category (category_id);
CREATE INDEX idx_video_mult ON phpfox_channel_video
(in_process, view_id, module_id, item_id, privacy, video_id, time_stamp);
Don't fetch m.* if you are only going to do COUNT(*).
If phpfox_channel_category is a many-to-many mapping table, follow the tips in http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
m needs INDEX(in_process, view_id, module_id, item_id, privacy) in any order.
Avoid the GROUP BY:
INNER JOIN phpfox_channel_category AS mc ON(mc.category_id = mc.category_id)
INNER JOIN phpfox_channel_category_data AS mcd ON(mcd.video_id = m.video_id)
AND mcd.category_id = 17
GROUP BY m.video_id
--> (something like)
AND EXISTS(
SELECT 1
FROM phpfox_channel_category AS mc
JOIN phpfox_channel_category_data AS mcd
ON mcd.category_id = mc.category_id
WHERE mcd.category_id = 17
AND mcd.video_id = m.video_id
)
Let's make sure that we are optimizing the right query. I suggest we check this condition in the ON clause:
mc.category_id = mc.category_id
We know that's going to be TRUE for every row in mc with a non-NULL value of category_id. We could express that condition as:
mc.category_id IS NOT NULL
This means the join is almost a cross join; every row returned from m matched with every row from mc. That is, we could get an equivalent result writing:
FROM phpfox_channel_video m
JOIN phpfox_channel_category mc
ON mc.category_id IS NOT NULL
I suspect that's not actually the result we're after. I think we were meaning to match to m.category_id. But that's just a guess.
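To see the effect of the self-comparison concretely, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the snippet runs anywhere), and the tiny video and category tables with their rows are hypothetical. The condition behaves the same way in both engines, because NULL = NULL is not true in standard SQL.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; tables and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE video (video_id INTEGER PRIMARY KEY);
    CREATE TABLE category (category_id INTEGER);
    INSERT INTO video VALUES (1), (2), (3);
    INSERT INTO category VALUES (10), (20), (NULL);
""")

# The ON clause from the question: a column compared with itself.
tautology_rows = conn.execute(
    "SELECT COUNT(*) FROM video m JOIN category mc "
    "ON mc.category_id = mc.category_id"
).fetchone()[0]

# The same condition spelled out explicitly.
not_null_rows = conn.execute(
    "SELECT COUNT(*) FROM video m JOIN category mc "
    "ON mc.category_id IS NOT NULL"
).fetchone()[0]

# Every video is paired with every category row that has a non-NULL id.
print(tautology_rows, not_null_rows)
```

With 3 videos and 2 non-NULL categories, both queries produce 6 joined rows, which is the near cross join described above.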
If the video_id column is the PRIMARY KEY or a UNIQUE KEY on m, we can avoid the potentially expensive GROUP BY operation by replacing the joins that create duplicate rows with EXISTS and correlated subqueries. If we never generate an intermediate result with duplicate values of video_id, there is nothing left for the GROUP BY to collapse.
Also, for the inline view query, rather than return all columns * we can return just the expressions that we need. In the outer query, the only column referenced is user_id.
So we could write something like this:
SELECT COUNT(*) AS total_rows
FROM (
SELECT m.user_id
FROM phpfox_channel_video m
WHERE EXISTS ( SELECT 1
FROM phpfox_channel_category mc
WHERE mc.category_id = m.category_id
-- mc.category_id = mc.category_id -- <original
)
AND EXISTS ( SELECT 1
FROM phpfox_channel_category_data mcd
WHERE mcd.video_id = m.video_id
AND mcd.category_id = 17
)
AND m.in_process = 0
AND m.view_id = 0
AND m.module_id = 'videochannel'
AND m.item_id = 0
AND m.privacy IN (0)
ORDER BY m.time_stamp DESC
LIMIT 12
) d
JOIN phpfox_user u
ON u.user_id = d.user_id
For tuning, the optimal index for m has the columns with equality predicates as its leading columns, followed by the time_stamp column. That way the ORDER BY can be satisfied by returning rows in index order, and we avoid a "Using filesort" operation. It looks like the reason we need the rows ordered is the LIMIT clause.
... ON phpfox_channel_video (in_process, view_id, item_id, module_id
, time_stamp, video_id, ... )
For the other two tables, we want indexes whose leading columns have equality predicates:
... ON phpfox_channel_category_data (video_id, category_id, ...)
... ON phpfox_channel_category ( category_id, ... )
NOTES:
(It's not entirely clear why we need an inline view, and we are delaying the join from the user_id reference. Then again, the point of the entire query isn't really obvious to me; I'm just providing a re-write, given the provided SQL, with the change to the condition category_id.)
The above assumed that category_id column exists in m, and that it's a one-to-many relationship.
But if that's not true, and the mcd table is actually a junction table resolving a many-to-many relationship between video and category, then the join condition was probably meant to be
mcd.category_id = mc.category_id
^
In that case we would want to replace the WHERE EXISTS and AND EXISTS in the query above with a single correlated subquery. Something like this:
SELECT COUNT(*) AS total_rows
FROM (
SELECT m.user_id
FROM phpfox_channel_video m
WHERE EXISTS ( SELECT 1
FROM phpfox_channel_category mc
JOIN phpfox_channel_category_data mcd
ON mcd.category_id = mc.category_id
WHERE mcd.video_id = m.video_id
AND mcd.category_id = 17
)
AND m.in_process = 0
AND m.view_id = 0
AND m.module_id = 'videochannel'
AND m.item_id = 0
AND m.privacy IN (0)
ORDER BY m.time_stamp DESC
LIMIT 12
) d
JOIN phpfox_user u
ON u.user_id = d.user_id
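As a rough check that the EXISTS form returns the same videos as the original JOIN plus GROUP BY, here is a small sketch. It runs on Python's built-in sqlite3 module rather than MySQL (an assumption for portability), keeps only the columns needed for the comparison, and uses made-up rows. It follows the junction-table reading of the schema, which is itself a guess.

```python
import sqlite3

# SQLite in memory as a stand-in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phpfox_channel_video (video_id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE phpfox_channel_category (category_id INTEGER PRIMARY KEY);
    CREATE TABLE phpfox_channel_category_data (video_id INTEGER, category_id INTEGER);
    INSERT INTO phpfox_channel_video VALUES (1, 100), (2, 100), (3, 200);
    INSERT INTO phpfox_channel_category VALUES (17), (18);
    INSERT INTO phpfox_channel_category_data VALUES (1, 17), (2, 18), (3, 17);
""")

# Original shape: the self-comparison join multiplies rows, GROUP BY collapses them.
join_sql = """
    SELECT m.video_id
    FROM phpfox_channel_video m
    JOIN phpfox_channel_category mc ON mc.category_id = mc.category_id
    JOIN phpfox_channel_category_data mcd ON mcd.video_id = m.video_id
    WHERE mcd.category_id = 17
"""
raw_join_rows = conn.execute(join_sql).fetchall()
grouped = conn.execute(
    join_sql + " GROUP BY m.video_id ORDER BY m.video_id"
).fetchall()

# Rewrite: EXISTS never produces duplicates, so no GROUP BY is needed.
exists_rows = conn.execute("""
    SELECT m.video_id
    FROM phpfox_channel_video m
    WHERE EXISTS (SELECT 1
                  FROM phpfox_channel_category mc
                  JOIN phpfox_channel_category_data mcd
                    ON mcd.category_id = mc.category_id
                  WHERE mcd.video_id = m.video_id
                    AND mcd.category_id = 17)
    ORDER BY m.video_id
""").fetchall()

print(len(raw_join_rows), grouped, exists_rows)
```

On this data the ungrouped join yields 4 rows for 2 distinct videos, while both the grouped query and the EXISTS query return videos 1 and 3. It says nothing about timing on the real 400,000-row table; that still needs EXPLAIN on the actual server.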

MySQL Join makes query slow - can't figure out why

I am attaching results of two Explain statements for an old query and the newer version of that query.
Do you see anything that does not make sense or looks wrong? The query became slow (4.5 seconds) after I added the tm, tsa and tcd tables.
Before those three tables were added to the query it was extremely fast (0.001 seconds). Here is what the explain looked like
tm table has four columns (tm_id (PK), owner_id, manager_id, status), tcd has three columns (tm_id, cd_id, created_date). tm_id and cd_id make a composite primary key and there is another index on cd_id. Same is the case with tsa with three columns (tm_id, smpa_id, created_date) with tm_id and smpa_id being a composite primary key and smpa_id has another index.
What could be the reason for such slowness?
old query:
SELECT upcm_id, COUNT( * )
FROM user_post_content_master AS upcm
JOIN content_deck AS cd ON cd.cd_id = upcm.cd_id
JOIN social_media_post_account AS smpa ON smpa.smpa_id = upcm.smpa_id
JOIN post_content_master AS pcm ON pcm.pcm_id = upcm.pcm_id
WHERE smpa.user_id =2196
AND upcm.upcm_post_date >=1545891957
AND upcm.upcm_status =1
AND upcm.upcm_post_date >=1546560000
AND upcm.upcm_post_date <=1546732799
GROUP BY upcm.upcm_id
ORDER BY upcm.upcm_post_date ASC
New Query:
SELECT upcm_id, COUNT( * )
FROM user_post_content_master AS upcm
JOIN content_deck AS cd ON cd.cd_id = upcm.cd_id
JOIN social_media_post_account AS smpa ON smpa.smpa_id = upcm.smpa_id
JOIN post_content_master AS pcm ON pcm.pcm_id = upcm.pcm_id
JOIN team_content_deck AS tcd ON ( tcd.cd_id = upcm.cd_id )
JOIN team_social_account AS tsa ON tsa.smpa_id = upcm.smpa_id
JOIN team_members AS tm ON tm.team_member_id = tsa.team_member_id
AND tm.team_member_id = tcd.team_member_id
AND tm.owner_id =2196
AND tm.manager_id =2196
AND tm.status =1
WHERE smpa.user_id =2196
AND upcm.upcm_post_date >=1545891957
AND upcm.upcm_status =1
AND upcm.upcm_post_date >=1546560000
AND upcm.upcm_post_date <=1546732799
GROUP BY upcm.upcm_id
ORDER BY upcm.upcm_post_date ASC
If I remove the conditions from the tm table, it is fast again. Nothing changed in the joins though.
EXPLAIN SELECT upcm_id, COUNT( * )
FROM user_post_content_master AS upcm
JOIN content_deck AS cd ON cd.cd_id = upcm.cd_id
JOIN social_media_post_account AS smpa ON smpa.smpa_id = upcm.smpa_id
JOIN post_content_master AS pcm ON pcm.pcm_id = upcm.pcm_id
JOIN team_content_deck AS tcd ON ( tcd.cd_id = upcm.cd_id )
JOIN team_social_account AS tsa ON tsa.smpa_id = upcm.smpa_id
JOIN team_members AS tm ON tm.team_member_id = tsa.team_member_id
AND tm.team_member_id = tcd.team_member_id
WHERE smpa.user_id =2196
AND upcm.upcm_post_date >=1545891957
AND upcm.upcm_status =1
AND upcm.upcm_post_date >=1546560000
AND upcm.upcm_post_date <=1546732799
GROUP BY upcm.upcm_id
ORDER BY upcm.upcm_post_date ASC
The difference is most likely the key selected for upcm: the old query used upcm_post_date, while the new query uses cd_id.
There isn't enough data to be sure, but judging by the name, cd_id has much lower cardinality than upcm_post_date.
Update (extracted from my comments below):
One possible reason is the table order MySQL chose for the query: content_deck comes before user_post_content_master. Because MySQL uses a nested-loop algorithm for JOIN, user_post_content_master ends up in an inner loop of the join.
When the tm.owner_id condition is present, there is a constant lookup, which leads the MySQL optimizer to decide that it wins over a range scan.
The book High Performance MySQL has a chapter on query optimization. One technique it describes is join decomposition, i.e., splitting one big join query into several smaller ones. An extra benefit is that you can cache some common data.
I am not sure whether an index hint can help in this case (hinting or forcing MySQL to use upcm_post_date for upcm): SELECT * FROM user_post_content_master USE INDEX (upcm_post_date)
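A minimal sketch of what join decomposition could look like for part of this query, assuming a heavily simplified schema: only two of the tables, only the columns needed for the filter, and hypothetical rows. It uses Python's built-in sqlite3 module in place of MySQL so it can be run as-is. The idea is to fetch the account ids first, then run a second query with an IN list.

```python
import sqlite3

# SQLite in memory as a stand-in for MySQL; schema trimmed, rows made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE social_media_post_account (smpa_id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE user_post_content_master (
        upcm_id INTEGER PRIMARY KEY, smpa_id INTEGER,
        upcm_post_date INTEGER, upcm_status INTEGER);
    CREATE INDEX upcm_post_date ON user_post_content_master (upcm_post_date);
    INSERT INTO social_media_post_account VALUES (1, 2196), (2, 2196), (3, 7);
    INSERT INTO user_post_content_master VALUES
        (10, 1, 1546600000, 1), (11, 2, 1546700000, 1), (12, 3, 1546600000, 1),
        (13, 1, 1540000000, 1), (14, 2, 1546650000, 0);
""")

# Step 1: a small query for the account ids (a result that could be cached).
smpa_ids = [row[0] for row in conn.execute(
    "SELECT smpa_id FROM social_media_post_account WHERE user_id = ?", (2196,))]

# Step 2: a range query on upcm alone, restricted to those ids.
decomposed = []
if smpa_ids:  # an empty IN () list would be a syntax error
    placeholders = ",".join("?" * len(smpa_ids))
    decomposed = [row[0] for row in conn.execute(
        "SELECT upcm_id FROM user_post_content_master "
        "WHERE smpa_id IN (" + placeholders + ") "
        "AND upcm_status = 1 "
        "AND upcm_post_date BETWEEN 1546560000 AND 1546732799 "
        "ORDER BY upcm_post_date",
        smpa_ids)]

# Reference: the same filter written as a single join.
joined = [row[0] for row in conn.execute("""
    SELECT upcm.upcm_id
    FROM user_post_content_master upcm
    JOIN social_media_post_account smpa ON smpa.smpa_id = upcm.smpa_id
    WHERE smpa.user_id = 2196
      AND upcm.upcm_status = 1
      AND upcm.upcm_post_date BETWEEN 1546560000 AND 1546732799
    ORDER BY upcm.upcm_post_date
""")]

print(decomposed, joined)
```

Both approaches return the same ids here. Whether the decomposed version is actually faster on the real tables depends on the data and the plan MySQL picks, so it is worth measuring rather than assuming.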

How to join latest record for each foreign key without Inner select using group by and then on clause?

I have two tables r_instance(id as primary key,name,user_id,..etc) and r_response(id,comment,r_instance_id as Foreign key).
Each r_instance row has multiple r_response rows (say, a minimum of 3).
I want to get the latest id and comment when joining r_response with r_instance,
but without a GROUP BY subquery followed by an ON clause on r_response, as that degrades query performance. In terms of EXPLAIN, the type column should not show ALL.
My query is :
SELECT ri.id, ri.name, rr.id, rr.comment
FROM r_instance ri
JOIN (SELECT MAX(id) maxResponseId, r_instance_id instanceId
from r_response
GROUP BY r_instance_id) lastRes ON lastRes.instanceId = ri.id
JOIN r_response rr ON rr.id = lastRes.maxResponseId
You could use the window function ROW_NUMBER() (MySQL 8.0+):
SELECT * FROM
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY r_instance_id ORDER BY id DESC) Sq
from r_response
) a
INNER JOIN r_instance R ON R.id = a.r_instance_id
WHERE a.Sq = 1
Here is another method:
SELECT ri.id, ri.name, rr.id, rr.comment
FROM r_instance ri
JOIN r_response rr
ON ri.id = rr.r_instance_id
WHERE rr.id = (SELECT MAX(rr2.id)
FROM r_response rr2
WHERE rr2.r_instance_id = rr.r_instance_id
);
For performance, you want an index on r_response(r_instance_id, id).
I should note that this does not always give the best performance. It is another strategy for expressing the same logic, resulting in a different execution plan. It might result in better performance.
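To check that the correlated-subquery form picks the same rows as the GROUP BY derived table from the question, here is a small sketch with hypothetical data. It runs on Python's built-in sqlite3 module as a stand-in for MySQL (an assumption for portability) and creates the suggested index on r_response(r_instance_id, id); the index name is made up.

```python
import sqlite3

# SQLite in memory as a stand-in for MySQL; rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r_instance (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE r_response (id INTEGER PRIMARY KEY, comment TEXT, r_instance_id INTEGER);
    -- The index suggested above (hypothetical name).
    CREATE INDEX idx_response_instance_id ON r_response (r_instance_id, id);
    INSERT INTO r_instance VALUES (1, 'a'), (2, 'b');
    INSERT INTO r_response VALUES
        (1, 'first', 1), (2, 'second', 1), (3, 'only', 2), (4, 'third', 1);
""")

# The question's approach: derived table with GROUP BY, then join back.
derived = conn.execute("""
    SELECT ri.id, ri.name, rr.id, rr.comment
    FROM r_instance ri
    JOIN (SELECT MAX(id) AS maxResponseId, r_instance_id AS instanceId
          FROM r_response
          GROUP BY r_instance_id) lastRes ON lastRes.instanceId = ri.id
    JOIN r_response rr ON rr.id = lastRes.maxResponseId
    ORDER BY ri.id
""").fetchall()

# The alternative: correlated subquery selecting the max id per instance.
correlated = conn.execute("""
    SELECT ri.id, ri.name, rr.id, rr.comment
    FROM r_instance ri
    JOIN r_response rr ON ri.id = rr.r_instance_id
    WHERE rr.id = (SELECT MAX(rr2.id)
                   FROM r_response rr2
                   WHERE rr2.r_instance_id = rr.r_instance_id)
    ORDER BY ri.id
""").fetchall()

print(derived)
print(correlated)
```

Both queries return one row per instance with its highest response id. Which one is faster on MySQL depends on the execution plan, so compare them with EXPLAIN on the real data.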

MySQL Query Times out - Need to speed it up

I whipped up a query here that does something particular with retrieving results that do not match the join (as suggested by this SO question).
SELECT cf.f_id
FROM comments_following AS cf
INNER JOIN comments AS c ON cf.c_id = c.id
WHERE NOT EXISTS (
SELECT 1 FROM follows WHERE f_id = cf.f_id
)
Any ideas on how to speed this up? There are anywhere from 30k-200k rows it's looking through and appears to be using indexes, but the query times out.
EXPLAIN/DESCRIBE Info:
1 PRIMARY c ALL PRIMARY NULL NULL NULL 39119
1 PRIMARY cf ref c_id, c_id_2 c_id 8 ...c.id 11 Using where; Using index
2 DEPENDENT SUBQUERY following index NULL PRIMARY 8 NULL 35612 Using where; Using index
The comments table isn't used explicitly in the query. Is it being used for filtering? If not, try:
SELECT cf.f_id
FROM comments_following cf
WHERE NOT EXISTS (
SELECT 1 FROM follows WHERE follows.f_id = cf.f_id
)
By the way, if this generates an unknown-column error (because follows.f_id does not exist), then that is the problem. In that case, the unqualified f_id in your original subquery resolves to cf.f_id, so what looks like a correlated subquery does not actually compare anything against follows.
Or the left outer join version:
SELECT cf.f_id
FROM comments_following cf left outer join
follows f
on f.f_id = cf.f_id
where f.f_id is null
Having an index on follows(f_id) should make both these versions run faster.
A LEFT JOIN is sometimes faster than a WHERE NOT EXISTS subquery. Try:
SELECT cf.f_id
FROM comments_following AS cf
INNER JOIN comments AS c ON cf.c_id = c.id
LEFT JOIN follows AS f ON f.f_id = cf.f_id
WHERE f.f_id IS NULL
The answer to this problem was to place a second index on follows.f_id.
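For reference, a minimal sketch showing that the NOT EXISTS and LEFT JOIN ... IS NULL forms return the same rows once follows.f_id is indexed. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption), with made-up rows and a hypothetical index name.

```python
import sqlite3

# SQLite in memory as a stand-in for MySQL; rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments_following (f_id INTEGER, c_id INTEGER);
    CREATE TABLE follows (id INTEGER PRIMARY KEY, f_id INTEGER);
    -- The extra index that resolved the timeout (hypothetical name).
    CREATE INDEX idx_follows_f_id ON follows (f_id);
    INSERT INTO comments_following VALUES (1, 10), (2, 10), (3, 11), (4, 12);
    INSERT INTO follows (f_id) VALUES (2), (4);
""")

# Anti-join written with NOT EXISTS and a fully qualified correlation.
not_exists = conn.execute("""
    SELECT cf.f_id
    FROM comments_following cf
    WHERE NOT EXISTS (SELECT 1 FROM follows WHERE follows.f_id = cf.f_id)
    ORDER BY cf.f_id
""").fetchall()

# The same anti-join written as LEFT JOIN ... IS NULL.
left_join = conn.execute("""
    SELECT cf.f_id
    FROM comments_following cf
    LEFT JOIN follows f ON f.f_id = cf.f_id
    WHERE f.f_id IS NULL
    ORDER BY cf.f_id
""").fetchall()

print(not_exists, left_join)
```

Both forms return the f_id values that have no match in follows. The index only changes how quickly each probe into follows is answered, not the result.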

MySQL Update query with left join and group by

I am trying to create an update query and making little progress in getting the right syntax.
The following query is working:
SELECT t.Index1, t.Index2, COUNT( m.EventType )
FROM Table t
LEFT JOIN MEvents m ON
(m.Index1 = t.Index1 AND
m.Index2 = t.Index2 AND
(m.EventType = 'A' OR m.EventType = 'B')
)
WHERE (t.SpecialEventCount IS NULL)
GROUP BY t.Index1, t.Index2
It creates a list of triplets Index1,Index2,EventCounts.
It only does this for case where t.SpecialEventCount is NULL. The update query I am trying to write should set this SpecialEventCount to that count, i.e. COUNT(m.EventType) in the query above. This number could be 0 or any positive number (hence the left join). Index1 and Index2 together are unique in Table t and they are used to identify events in MEvent.
How do I have to modify the select query to become an update query? I.e. something like
UPDATE Table SET SpecialEventCount=COUNT(m.EventType).....
but I am confused what to put where and have failed with numerous different guesses.
I take it that (Index1, Index2) is a unique key on Table, otherwise I would expect the reference to t.SpecialEventCount to result in an error.
Edited query to use subquery as it didn't work using GROUP BY
UPDATE
Table AS t
LEFT JOIN (
SELECT
Index1,
Index2,
COUNT(EventType) AS NumEvents
FROM
MEvents
WHERE
EventType = 'A' OR EventType = 'B'
GROUP BY
Index1,
Index2
) AS m ON
m.Index1 = t.Index1 AND
m.Index2 = t.Index2
SET
t.SpecialEventCount = COALESCE(m.NumEvents, 0)
WHERE
t.SpecialEventCount IS NULL
Doing a left join with a subquery will generate a giant
temporary table in-memory that will have no indexes.
For updates, try avoiding joins and using correlated
subqueries instead:
UPDATE
Table AS t
SET
t.SpecialEventCount = (
SELECT COUNT(m.EventType)
FROM MEvents m
WHERE m.EventType in ('A','B')
AND m.Index1 = t.Index1
AND m.Index2 = t.Index2
)
WHERE
t.SpecialEventCount IS NULL
Do some profiling, but this can be significantly faster in some cases.
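Here is a small runnable sketch of the correlated-subquery update, assuming a hypothetical table name main_table in place of Table (which is a reserved word) and made-up rows. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the UPDATE is written without a table alias; the subquery refers to main_table directly.

```python
import sqlite3

# SQLite in memory as a stand-in for MySQL; table name and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_table (
        Index1 INTEGER, Index2 INTEGER, SpecialEventCount INTEGER,
        PRIMARY KEY (Index1, Index2));
    CREATE TABLE MEvents (Index1 INTEGER, Index2 INTEGER, EventType TEXT);
    INSERT INTO main_table VALUES (1, 1, NULL), (1, 2, NULL), (2, 1, 99);
    INSERT INTO MEvents VALUES (1, 1, 'A'), (1, 1, 'B'), (1, 1, 'C'), (2, 1, 'A');
""")

# COUNT over an empty set is 0, so rows without events get 0 rather than NULL.
conn.execute("""
    UPDATE main_table
    SET SpecialEventCount = (
        SELECT COUNT(m.EventType)
        FROM MEvents m
        WHERE m.EventType IN ('A', 'B')
          AND m.Index1 = main_table.Index1
          AND m.Index2 = main_table.Index2)
    WHERE SpecialEventCount IS NULL
""")

result = conn.execute(
    "SELECT Index1, Index2, SpecialEventCount FROM main_table "
    "ORDER BY Index1, Index2"
).fetchall()
print(result)
```

The row (1, 1) gets a count of 2 because the 'C' event is filtered out, (1, 2) gets 0, and (2, 1) keeps its existing value because it was not NULL.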
My example:
update card_crowd as cardCrowd
LEFT JOIN
(
select cc.id , count(ccr.crowd_id) as num -- count(1) would report 1 for crowds with no rows in card_crowd_r
from card_crowd cc LEFT JOIN
card_crowd_r ccr on cc.id = ccr.crowd_id
group by cc.id
) as tt
on cardCrowd.id = tt.id
set cardCrowd.join_num = tt.num;