MySQL Join makes query slow - can't figure out why

I am attaching the EXPLAIN output for an old query and for the newer version of that query.
Do you see anything that does not make sense or looks wrong? The query became slow (4.5 seconds) after I added the tm, tsa and tcd tables.
Before those three tables were added, the query was extremely fast (0.001 seconds). Here is what the EXPLAIN looked like:
The tm table has four columns (tm_id (PK), owner_id, manager_id, status), and tcd has three columns (tm_id, cd_id, created_date). In tcd, tm_id and cd_id form a composite primary key, and there is a separate index on cd_id. The same goes for tsa, which has three columns (tm_id, smpa_id, created_date): tm_id and smpa_id form a composite primary key, and smpa_id has its own index.
What could be the reason for such slowness?
Old query:
SELECT upcm_id, COUNT( * )
FROM user_post_content_master AS upcm
JOIN content_deck AS cd ON cd.cd_id = upcm.cd_id
JOIN social_media_post_account AS smpa ON smpa.smpa_id = upcm.smpa_id
JOIN post_content_master AS pcm ON pcm.pcm_id = upcm.pcm_id
WHERE smpa.user_id =2196
AND upcm.upcm_post_date >=1545891957
AND upcm.upcm_status =1
AND upcm.upcm_post_date >=1546560000
AND upcm.upcm_post_date <=1546732799
GROUP BY upcm.upcm_id
ORDER BY upcm.upcm_post_date ASC
New query:
SELECT upcm_id, COUNT( * )
FROM user_post_content_master AS upcm
JOIN content_deck AS cd ON cd.cd_id = upcm.cd_id
JOIN social_media_post_account AS smpa ON smpa.smpa_id = upcm.smpa_id
JOIN post_content_master AS pcm ON pcm.pcm_id = upcm.pcm_id
JOIN team_content_deck AS tcd ON ( tcd.cd_id = upcm.cd_id )
JOIN team_social_account AS tsa ON tsa.smpa_id = upcm.smpa_id
JOIN team_members AS tm ON tm.team_member_id = tsa.team_member_id
AND tm.team_member_id = tcd.team_member_id
AND tm.owner_id =2196
AND tm.manager_id =2196
AND tm.status =1
WHERE smpa.user_id =2196
AND upcm.upcm_post_date >=1545891957
AND upcm.upcm_status =1
AND upcm.upcm_post_date >=1546560000
AND upcm.upcm_post_date <=1546732799
GROUP BY upcm.upcm_id
ORDER BY upcm.upcm_post_date ASC
If I remove the conditions on the tm table, it is fast again, even though nothing changed in the joins.
EXPLAIN SELECT upcm_id, COUNT( * )
FROM user_post_content_master AS upcm
JOIN content_deck AS cd ON cd.cd_id = upcm.cd_id
JOIN social_media_post_account AS smpa ON smpa.smpa_id = upcm.smpa_id
JOIN post_content_master AS pcm ON pcm.pcm_id = upcm.pcm_id
JOIN team_content_deck AS tcd ON ( tcd.cd_id = upcm.cd_id )
JOIN team_social_account AS tsa ON tsa.smpa_id = upcm.smpa_id
JOIN team_members AS tm ON tm.team_member_id = tsa.team_member_id
AND tm.team_member_id = tcd.team_member_id
WHERE smpa.user_id =2196
AND upcm.upcm_post_date >=1545891957
AND upcm.upcm_status =1
AND upcm.upcm_post_date >=1546560000
AND upcm.upcm_post_date <=1546732799
GROUP BY upcm.upcm_id
ORDER BY upcm.upcm_post_date ASC

The difference is most likely due to the index selected for upcm: the old query used upcm_post_date, while the new query uses cd_id.
There isn't enough data to be sure, but judging from the names, cd_id probably has much lower cardinality than upcm_post_date.
Update (Extracted from my comments below):
One possible reason is the join order MySQL chose for the query: content_deck comes before user_post_content_master. Because MySQL uses a nested-loop algorithm for joins, user_post_content_master ends up in an inner loop.
With the tm.owner_id condition present, there is a constant lookup, which leads the MySQL optimizer to decide that this plan beats a range scan.
The book High Performance MySQL has a chapter on query optimization that describes a technique called join decomposition: splitting one big join query into several smaller ones. An extra benefit is that you can cache some of the common data.
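A minimal sketch of join decomposition, using Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names follow the question, but the schema is cut down to a few columns and the rows are made up. Step 1 runs a small query against social_media_post_account; step 2 queries user_post_content_master with the ids found in step 1. The sketch only shows that both approaches return the same rows; it says nothing about MySQL's timings.

```python
import sqlite3

def build_db():
    """Create a tiny, hypothetical subset of the question's schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE social_media_post_account (smpa_id INTEGER PRIMARY KEY, user_id INTEGER);
        CREATE TABLE user_post_content_master (
            upcm_id INTEGER PRIMARY KEY, smpa_id INTEGER,
            upcm_status INTEGER, upcm_post_date INTEGER);
        CREATE INDEX idx_upcm_post_date ON user_post_content_master (upcm_post_date);
        INSERT INTO social_media_post_account VALUES (1, 2196), (2, 2196), (3, 999);
        INSERT INTO user_post_content_master VALUES
            (10, 1, 1, 1546600000), (11, 2, 1, 1546700000),
            (12, 3, 1, 1546600000), (13, 1, 0, 1546600000),
            (14, 2, 1, 1540000000);
    """)
    return conn

def single_join(conn, user_id, start, end):
    """One query that joins everything, like the original."""
    sql = """
        SELECT upcm.upcm_id
        FROM user_post_content_master AS upcm
        JOIN social_media_post_account AS smpa ON smpa.smpa_id = upcm.smpa_id
        WHERE smpa.user_id = ? AND upcm.upcm_status = 1
          AND upcm.upcm_post_date BETWEEN ? AND ?
        ORDER BY upcm.upcm_post_date, upcm.upcm_id
    """
    return [row[0] for row in conn.execute(sql, (user_id, start, end))]

def decomposed(conn, user_id, start, end):
    """The same result from two smaller queries."""
    # Step 1: small query on the small table (a candidate for caching).
    smpa_ids = [row[0] for row in conn.execute(
        "SELECT smpa_id FROM social_media_post_account WHERE user_id = ?", (user_id,))]
    if not smpa_ids:
        return []
    # Step 2: query the big table, restricted to the ids from step 1.
    placeholders = ",".join("?" * len(smpa_ids))
    sql = f"""
        SELECT upcm_id FROM user_post_content_master
        WHERE smpa_id IN ({placeholders}) AND upcm_status = 1
          AND upcm_post_date BETWEEN ? AND ?
        ORDER BY upcm_post_date, upcm_id
    """
    return [row[0] for row in conn.execute(sql, (*smpa_ids, start, end))]

conn = build_db()
print(single_join(conn, 2196, 1546560000, 1546732799))  # [10, 11]
print(decomposed(conn, 2196, 1546560000, 1546732799))   # [10, 11]
```

The list returned by step 1 is the kind of "common data" that could be cached between requests.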
I am not sure whether an index hint can help in this case (to hint or force MySQL to use upcm_post_date for upcm): SELECT * FROM user_post_content_master USE INDEX (upcm_post_date)
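USE INDEX / FORCE INDEX need a MySQL server, so here is a rough analogue in SQLite instead: its INDEXED BY clause makes a query use a named index (and raises an error if that index cannot be used). The index names idx_upcm_cd_id and idx_upcm_post_date are hypothetical. The useful habit carries over to MySQL: add the hint, then confirm with EXPLAIN that the plan actually changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_post_content_master (
        upcm_id INTEGER PRIMARY KEY, cd_id INTEGER,
        upcm_status INTEGER, upcm_post_date INTEGER);
    -- Hypothetical index names, one per candidate access path.
    CREATE INDEX idx_upcm_cd_id ON user_post_content_master (cd_id);
    CREATE INDEX idx_upcm_post_date ON user_post_content_master (upcm_post_date);
""")

def plan(sql, params):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

params = (7, 1546560000, 1546732799)
where = "WHERE upcm.cd_id = ? AND upcm.upcm_post_date BETWEEN ? AND ?"
default_sql = "SELECT upcm_id FROM user_post_content_master AS upcm " + where
# INDEXED BY goes right after the table alias in SQLite.
hinted_sql = ("SELECT upcm_id FROM user_post_content_master AS upcm "
              "INDEXED BY idx_upcm_post_date " + where)

print("default:", plan(default_sql, params))
print("hinted: ", plan(hinted_sql, params))
```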

Related

MySQL doesn't use the index I expect when my query has a large number of values in `IN` clause

I have a problem when an IN clause contains too many values. Consider this query:
EXPLAIN
SELECT DISTINCT t.entry_id , t.sticky , wd.field_id_104 , t.title
FROM exp_channel_titles AS t
LEFT JOIN exp_channels ON t.channel_id = exp_channels.channel_id
LEFT JOIN exp_channel_data AS wd ON t.entry_id = wd.entry_id
LEFT JOIN exp_members AS m ON m.member_id = t.author_id
INNER JOIN exp_category_posts ON t.entry_id = exp_category_posts.entry_id
INNER JOIN exp_categories ON exp_category_posts.cat_id = exp_categories.cat_id
WHERE t.entry_id !=''
AND t.site_id IN ('1')
AND t.entry_date < 1610109517
AND (t.expiration_date = 0 OR t.expiration_date > 1610109517)
AND t.entry_id IN ('0','649','650','651','652','653','654','655')
;
If there are only a few values, the EXPLAIN output is as follows, which is fine:
But with IN ('0','649','650','651','652','653','654','655', ... a thousand values),
the query runs for about a minute and the EXPLAIN output changes to this:
How can I fix that?
UPDATE: range_optimizer_max_mem_size has already been set to 0, so that isn't the issue.
We have had similar problems at my company when someone runs a query with a very long list of values in an IN (...) predicate.
We found that MySQL enforces a limit on memory available to the range optimizer. If the list of values is too long, it exceeds the memory limit, and the optimizer cannot finish its analysis to see if it should use the index. So it gives up and says, "forget it! it's a table-scan for you."
We fix it by setting the MySQL Server configuration value range_optimizer_max_mem_size=0 which means there is no limit to the memory that the range optimizer can use.
This creates a risk that if someone were to run a query with a million values in the IN (...) list, it could use a lot of memory, maybe enough to kill the MySQL Server. But so far the tradeoff is preferable, to allow the optimizer to choose the index.
See documentation:
https://dev.mysql.com/doc/refman/5.7/en/range-optimization.html
https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_range_optimizer_max_mem_size
Re your comment:
Another common reason for the optimizer to choose to do a table-scan is that it calculates that your conditions match a large enough portion of the table that it's more expensive to use the index than to simply run a table-scan and examine every row.
The threshold for this isn't documented, and it depends on the implementation of the cost-based optimizer, so it might change from version to version. But my observation is that usually if your conditions match more than 20% of the table, the optimizer chooses the table-scan.
You could use an index hint (FORCE INDEX) to tell the optimizer to treat a table-scan as very expensive, so the index is preferred over a table-scan.
Explode-implode. This is a classic example of an inefficient way to write a query:
1. JOIN several tables (explode).
2. Filter.
3. Collapse the results (implode), usually with GROUP BY or LIMIT; DISTINCT has the same effect.
So... turn the query inside out:
1. Find the ids of the desired rows in t.
2. JOIN that to the rest of the tables.
Presumably the DISTINCT will then not be needed at all.
SELECT t2.entry_id, t2.sticky, wd.field_id_104, t2.title
FROM ( SELECT entry_id
FROM exp_channel_titles
WHERE entry_id !=''
AND site_id IN ('1')
AND entry_date < 1610109517
AND (expiration_date = 0 OR expiration_date > 1610109517)
AND entry_id IN ('0','649','650','651','652','653','654','655')
) AS t
JOIN exp_channel_titles AS t2 USING(entry_id)
LEFT JOIN exp_channels ON t2.channel_id = exp_channels.channel_id
LEFT JOIN exp_channel_data AS wd ON t2.entry_id = wd.entry_id
;
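A small sketch, in Python's sqlite3, of the explode-implode effect and the inside-out rewrite. The schema is a hypothetical, trimmed-down version of the question's tables with made-up rows. Every entry is put in three categories, so the category join multiplies the rows and DISTINCT has to collapse them again. The rewrite drops the category joins, which only returns the same rows here because every entry has at least one category.

```python
import sqlite3

NOW = 1610109517
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE exp_channel_titles (
        entry_id INTEGER PRIMARY KEY, site_id INTEGER, entry_date INTEGER,
        expiration_date INTEGER, sticky INTEGER, title TEXT);
    CREATE TABLE exp_channel_data (entry_id INTEGER PRIMARY KEY, field_id_104 TEXT);
    CREATE TABLE exp_category_posts (entry_id INTEGER, cat_id INTEGER);
    INSERT INTO exp_channel_titles VALUES
        (649, 1, 1600000000, 0, 0, 'A'),
        (650, 1, 1600000000, 1700000000, 1, 'B'),
        (651, 1, 1600000000, 1500000000, 0, 'C'),   -- already expired
        (652, 1, 1600000000, 0, 0, 'D');            -- not in the IN list
    INSERT INTO exp_channel_data VALUES (649, 'x'), (650, 'y');
""")
# Every entry is in three categories, so the category join multiplies rows.
conn.executemany("INSERT INTO exp_category_posts VALUES (?, ?)",
                 [(e, c) for e in (649, 650, 651, 652) for c in (1, 2, 3)])

explode = """
    SELECT {distinct} t.entry_id, t.sticky, wd.field_id_104, t.title
    FROM exp_channel_titles AS t
    LEFT JOIN exp_channel_data AS wd ON t.entry_id = wd.entry_id
    INNER JOIN exp_category_posts AS cp ON t.entry_id = cp.entry_id
    WHERE t.site_id = 1 AND t.entry_date < :now
      AND (t.expiration_date = 0 OR t.expiration_date > :now)
      AND t.entry_id IN (649, 650, 651)
    ORDER BY t.entry_id
"""

# Inside out: find the wanted ids first, then join the other tables.
rewrite = """
    SELECT t2.entry_id, t2.sticky, wd.field_id_104, t2.title
    FROM (SELECT entry_id FROM exp_channel_titles
          WHERE site_id = 1 AND entry_date < :now
            AND (expiration_date = 0 OR expiration_date > :now)
            AND entry_id IN (649, 650, 651)) AS t
    JOIN exp_channel_titles AS t2 USING (entry_id)
    LEFT JOIN exp_channel_data AS wd ON t2.entry_id = wd.entry_id
    ORDER BY t2.entry_id
"""

params = {"now": NOW}
exploded = conn.execute(explode.format(distinct=""), params).fetchall()
collapsed = conn.execute(explode.format(distinct="DISTINCT"), params).fetchall()
inside_out = conn.execute(rewrite, params).fetchall()
print(len(exploded))            # 6
print(collapsed == inside_out)  # True
```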
Another reformulation
Since there is only one use for wd, this might be better:
SELECT t.entry_id,
t.sticky,
( SELECT wd.field_id_104
FROM exp_channel_data AS wd
WHERE wd.entry_id = t.entry_id
) AS field_id_104,
t.title
FROM exp_channel_titles AS t
WHERE entry_id !=''
AND site_id IN ('1')
AND entry_date < 1610109517
AND (expiration_date = 0 OR expiration_date > 1610109517)
AND entry_id IN ('0','649','650','651','652','653','654','655')
;
and have a 5-column index starting with site_id, entry_date
Other issues:
AND (t.expiration_date = 0 OR t.expiration_date > 1610109517)
This OR is not sargable. Can you redesign the table to avoid this OR?
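One possible redesign, sketched in Python's sqlite3 under a stated assumption: store "never expires" as a large sentinel value (NEVER below, a hypothetical choice) instead of 0. The condition then becomes the single range expiration_date > now, which an index on expiration_date can serve. The table names titles_old and titles_new are made up for the comparison.

```python
import sqlite3

NOW = 1610109517
NEVER = 2**31 - 1  # hypothetical sentinel meaning "never expires"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles_old (entry_id INTEGER PRIMARY KEY, expiration_date INTEGER)")
conn.execute("CREATE TABLE titles_new (entry_id INTEGER PRIMARY KEY, expiration_date INTEGER)")
conn.execute("CREATE INDEX idx_new_exp ON titles_new (expiration_date)")

rows = [(1, 0), (2, 1500000000), (3, 1700000000), (4, 0)]
conn.executemany("INSERT INTO titles_old VALUES (?, ?)", rows)
# Same data, with 0 ("no expiry") replaced by the sentinel.
conn.executemany("INSERT INTO titles_new VALUES (?, ?)",
                 [(eid, NEVER if exp == 0 else exp) for eid, exp in rows])

old = conn.execute(
    "SELECT entry_id FROM titles_old "
    "WHERE expiration_date = 0 OR expiration_date > ? ORDER BY entry_id", (NOW,)).fetchall()
new = conn.execute(
    "SELECT entry_id FROM titles_new "
    "WHERE expiration_date > ? ORDER BY entry_id", (NOW,)).fetchall()
print(old, new)  # [(1,), (3,), (4,)] [(1,), (3,), (4,)]

# Inspect the plan for the single-range version.
print(conn.execute("EXPLAIN QUERY PLAN SELECT entry_id FROM titles_new "
                   "WHERE expiration_date > ?", (NOW,)).fetchall())
```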
Without the above reformulation, this may help:
INDEX(site_id, entry_date)
Also, get rid of these, since they seem to be totally useless:
LEFT JOIN exp_channels ON t.channel_id = exp_channels.channel_id
LEFT JOIN exp_members AS m ON m.member_id = t.author_id
And these may be useless:
INNER JOIN exp_category_posts ON t.entry_id = exp_category_posts.entry_id
INNER JOIN exp_categories ON exp_category_posts.cat_id = exp_categories.cat_id

How to join latest record for each foreign key without Inner select using group by and then on clause?

I have two tables: r_instance (id as primary key, name, user_id, etc.) and r_response (id, comment, r_instance_id as foreign key).
Each r_instance row has multiple r_response rows (at least 3, say).
I want to get the latest r_response id and comment when joining r_response with r_instance,
but without using GROUP BY on r_response followed by an ON clause, as that degrades query performance. In other words, when checking the query with EXPLAIN, the type column should not show ALL.
My query is :
SELECT ri.id, ri.name, rr.id, rr.comment
FROM r_instance ri
JOIN (SELECT MAX(id) maxResponseId, r_instance_id instanceId
from r_response
GROUP BY r_instance_id) lastRes ON lastRes.instanceId = ri.id
JOIN r_response rr ON rr.id = lastRes.maxResponseId
You could use the ROW_NUMBER() window function (MySQL 8.0+):
SELECT * FROM
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY r_instance_id ORDER BY id DESC) Sq
from r_response
) a
INNER JOIN r_instance R ON R.id = a.r_instance_id
WHERE a.Sq = 1
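A runnable sketch of the ROW_NUMBER() approach, using Python's sqlite3 (window functions need SQLite 3.25 or later) with a few made-up rows. It keeps only the row numbered 1 in each r_instance_id partition, i.e. the response with the highest id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r_instance (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE r_response (id INTEGER PRIMARY KEY, comment TEXT, r_instance_id INTEGER);
    INSERT INTO r_instance VALUES (1, 'first'), (2, 'second');
    INSERT INTO r_response VALUES
        (10, 'a', 1), (11, 'b', 1), (12, 'c', 1),
        (20, 'd', 2), (21, 'e', 2), (22, 'f', 2);
""")

latest = conn.execute("""
    SELECT R.id, R.name, a.id, a.comment
    FROM (
        -- Number each instance's responses, newest (highest id) first.
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY r_instance_id ORDER BY id DESC) AS Sq
        FROM r_response
    ) AS a
    INNER JOIN r_instance AS R ON R.id = a.r_instance_id
    WHERE a.Sq = 1
    ORDER BY R.id
""").fetchall()
print(latest)  # [(1, 'first', 12, 'c'), (2, 'second', 22, 'f')]
```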
Here is another method:
SELECT ri.id, ri.name, rr.id, rr.comment
FROM r_instance ri
JOIN r_response rr
ON ri.id = rr.r_instance_id
WHERE rr.id = (SELECT MAX(rr2.id)
FROM r_response rr2
WHERE rr2.r_instance_id = rr.r_instance_id
);
For performance, you want an index on r_response(r_instance_id, id).
I should note that this does not always give the best performance. It is another strategy for expressing the same logic, resulting in a different execution plan. It might result in better performance.
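A sketch comparing the correlated MAX() subquery with the question's GROUP BY version, in Python's sqlite3 with made-up rows and the suggested (r_instance_id, id) index (its name is hypothetical). It checks that both forms return the same rows; whether the correlated form is faster in MySQL still has to be checked with EXPLAIN on real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r_instance (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE r_response (id INTEGER PRIMARY KEY, comment TEXT, r_instance_id INTEGER);
    -- The composite index suggested in the answer (name is hypothetical).
    CREATE INDEX idx_resp_instance_id ON r_response (r_instance_id, id);
    INSERT INTO r_instance VALUES (1, 'first'), (2, 'second');
    INSERT INTO r_response VALUES
        (10, 'a', 1), (20, 'd', 2), (11, 'b', 1),
        (22, 'f', 2), (12, 'c', 1), (21, 'e', 2);
""")

correlated = conn.execute("""
    SELECT ri.id, ri.name, rr.id, rr.comment
    FROM r_instance ri
    JOIN r_response rr ON ri.id = rr.r_instance_id
    WHERE rr.id = (SELECT MAX(rr2.id) FROM r_response rr2
                   WHERE rr2.r_instance_id = rr.r_instance_id)
    ORDER BY ri.id
""").fetchall()

group_by = conn.execute("""
    SELECT ri.id, ri.name, rr.id, rr.comment
    FROM r_instance ri
    JOIN (SELECT MAX(id) AS maxResponseId, r_instance_id AS instanceId
          FROM r_response GROUP BY r_instance_id) lastRes ON lastRes.instanceId = ri.id
    JOIN r_response rr ON rr.id = lastRes.maxResponseId
    ORDER BY ri.id
""").fetchall()

print(correlated == group_by, correlated)
# True [(1, 'first', 12, 'c'), (2, 'second', 22, 'f')]
```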

Understanding the difference between two queries from a performance point of view

I have two versions of the same query. Both produce the same results (164 rows), but the second one takes about 0.5 s while the first one takes 17 s. Can someone explain what's going on here?
TABLE organizations : 11988 ROWS
TABLE transaction_metas : 58232 ROWS
TABLE contracts_history : 219469 ROWS
# TAKES 17 SEC
SELECT contracts_history.buyer_id as id, org.name, SUM(transactions_count) as transactions_count, GROUP_CONCAT(DISTINCT(tm.value)) as balancing_authorities
From `contracts_history`
INNER JOIN `organizations` as `org`
ON `org`.`id` = `contracts_history`.`buyer_id`
LEFT JOIN `transaction_metas` as `tm`
ON `tm`.`contract_token` = `contracts_history`.`token` and `tm`.`field` = '1'
WHERE `contracts_history`.`seller_id` = '850'
GROUP BY `contracts_history`.`buyer_id` ORDER BY `balancing_authorities` DESC
# TAKES .6 SEC
SELECT contracts_history.buyer_id as id, org.name, SUM(transactions_count) as transactions_count, GROUP_CONCAT(DISTINCT(tm.value)) as balancing_authorities
From `contracts_history`
INNER JOIN `organizations` as `org`
ON `org`.`id` = `contracts_history`.`buyer_id`
left join (select * from `transaction_metas` where contract_token in (select token from `contracts_history` where seller_id = 850)) as `tm`
ON `tm`.`contract_token` = `contracts_history`.`token` and `tm`.`field` = '1'
WHERE `contracts_history`.`seller_id` = '850'
GROUP BY `contracts_history`.`buyer_id` ORDER BY `balancing_authorities` DESC
Explain Results:
First Query: https://prnt.sc/hjtiw6
Second Query: https://prnt.sc/hjtjjg
Based on my debugging of the first query, it was clear that the left join to the transaction_metas table was making it slow, so I tried to limit its rows instead of joining to the full table. That seems to work, but I don't understand why.
A join produces combinations of rows from your tables. With that in mind: in the first query, the engine builds all the combinations and only then filters them. In the second, it applies the filter before it builds the combinations.
The best approach would be to put the filter in the JOIN clause, without a subquery.
Something like this:
SELECT contracts_history.buyer_id as id, org.name, SUM(transactions_count) as transactions_count, GROUP_CONCAT(DISTINCT(tm.value)) as balancing_authorities
From `contracts_history`
INNER JOIN `organizations` as `org`
ON `org`.`id` = `contracts_history`.`buyer_id`
AND `contracts_history`.`seller_id` = '850'
LEFT JOIN `transaction_metas` as `tm`
ON `tm`.`contract_token` = `contracts_history`.`token`
AND `tm`.`field` = 1
GROUP BY `contracts_history`.`buyer_id` ORDER BY `balancing_authorities` DESC
Note: when you reduce the size of the joined tables by filtering them in subqueries, the rows may fit into the join buffer. It's a nice trick when the buffer limit is small.
A better explanation:
https://dev.mysql.com/doc/refman/5.5/en/explain-output.html
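A sketch in Python's sqlite3, with made-up rows, checking that the three versions discussed here (filter in WHERE, prefiltered derived table, filter in the ON clause) return the same result. It orders by buyer_id instead of balancing_authorities so the output order is deterministic. It only demonstrates equivalent results; the speed difference comes from MySQL's execution plan and join buffer, which SQLite does not reproduce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contracts_history (token TEXT, buyer_id INTEGER, seller_id INTEGER,
                                    transactions_count INTEGER);
    CREATE TABLE transaction_metas (contract_token TEXT, field TEXT, value TEXT);
    INSERT INTO organizations VALUES (1, 'Org A'), (2, 'Org B'), (3, 'Org C');
    INSERT INTO contracts_history VALUES
        ('t1', 1, 850, 5), ('t2', 1, 850, 3), ('t3', 2, 850, 4), ('t4', 3, 999, 7);
    INSERT INTO transaction_metas VALUES
        ('t1', '1', 'CAISO'), ('t2', '1', 'CAISO'), ('t3', '2', 'ERCOT'), ('t4', '1', 'PJM');
""")

SELECT_LIST = """SELECT ch.buyer_id AS id, org.name,
       SUM(ch.transactions_count) AS transactions_count,
       GROUP_CONCAT(DISTINCT tm.value) AS balancing_authorities"""

# 1. Seller filter in WHERE (the slow query in the question).
filter_in_where = SELECT_LIST + """
    FROM contracts_history ch
    INNER JOIN organizations org ON org.id = ch.buyer_id
    LEFT JOIN transaction_metas tm ON tm.contract_token = ch.token AND tm.field = '1'
    WHERE ch.seller_id = 850
    GROUP BY ch.buyer_id ORDER BY ch.buyer_id"""

# 2. transaction_metas prefiltered in a derived table (the fast query).
prefiltered = SELECT_LIST + """
    FROM contracts_history ch
    INNER JOIN organizations org ON org.id = ch.buyer_id
    LEFT JOIN (SELECT * FROM transaction_metas
               WHERE contract_token IN (SELECT token FROM contracts_history
                                        WHERE seller_id = 850)) AS tm
           ON tm.contract_token = ch.token AND tm.field = '1'
    WHERE ch.seller_id = 850
    GROUP BY ch.buyer_id ORDER BY ch.buyer_id"""

# 3. Seller filter moved into the INNER JOIN's ON clause (the answer's version).
filter_in_on = SELECT_LIST + """
    FROM contracts_history ch
    INNER JOIN organizations org ON org.id = ch.buyer_id AND ch.seller_id = 850
    LEFT JOIN transaction_metas tm ON tm.contract_token = ch.token AND tm.field = '1'
    GROUP BY ch.buyer_id ORDER BY ch.buyer_id"""

results = [conn.execute(q).fetchall() for q in (filter_in_where, prefiltered, filter_in_on)]
print(results[0])  # [(1, 'Org A', 8, 'CAISO'), (2, 'Org B', 4, None)]
```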

Understanding why this query is slow

The query below is very slow (it takes around 1 second), even though it only searches about 2500 records (plus the inner-joined tables).
If I remove the ORDER BY, the query runs much faster (0.05 s or less).
Alternatively, if I remove the nested select below "# used to select where no ProfilePhoto specified", it also runs fast, but I need both of these.
I have indexes (or a primary key) on: tPhoto_PhotoID, PhotoID, p.Enabled, CustomerID, tCustomer_CustomerID, ProfilePhoto (bool), u.UserName, e.PrivateEmail, m.tUser_UserID, Enabled, Active, m.tMemberStatuses_MemberStatusID, e.tCustomerMembership_MembershipID, e.DateCreated
(Do I have too many indexes? My understanding is that you add them anywhere you use WHERE or ON.)
The Query :
SELECT e.CustomerID,
e.CustomerName,
e.Location,
SUBSTRING_INDEX(e.CustomerProfile,' ', 25) AS Description,
IFNULL(p.PhotoURL, PhotoTable.PhotoURL) AS PhotoURL
FROM tCustomer e
LEFT JOIN (tCustomerPhoto ep INNER JOIN tPhoto p ON (ep.tPhoto_PhotoID = p.PhotoID AND p.Enabled=1))
ON e.CustomerID = ep.tCustomer_CustomerID AND ep.ProfilePhoto = 1
# used to select where no ProfilePhoto specified
LEFT JOIN ((SELECT pp.PhotoURL, epp.tCustomer_CustomerID
FROM tPhoto pp
LEFT JOIN tCustomerPhoto epp ON epp.tPhoto_PhotoID = pp.PhotoID
GROUP BY epp.tCustomer_CustomerID) AS PhotoTable) ON e.CustomerID = PhotoTable.tCustomer_CustomerID
INNER JOIN tUser u ON u.UserName = e.PrivateEmail
INNER JOIN tmembers m ON m.tUser_UserID = u.UserID
WHERE e.Enabled=1
AND e.Active=1
AND m.tMemberStatuses_MemberStatusID = 2
AND e.tCustomerMembership_MembershipID != 6
ORDER BY e.DateCreated DESC
LIMIT 12
I have similar queries that run much faster.
Any opinions would be appreciated.
Until there is more clarity on how this differs from your other, faster queries, try running EXPLAIN {YourSelectQuery} in the MySQL client and use the plan it shows to find ways to improve performance.
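In MySQL, a sort that cannot be satisfied from an index shows up as "Using filesort" in the Extra column of EXPLAIN. As a rough illustration in Python's sqlite3, where the analogous plan step is "USE TEMP B-TREE FOR ORDER BY", this sketch compares the plan for a cut-down version of the query before and after adding a hypothetical composite index (Enabled, Active, DateCreated). The real query's joins and derived table may well lead MySQL to a different plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tCustomer (
    CustomerID INTEGER PRIMARY KEY, Enabled INTEGER, Active INTEGER, DateCreated INTEGER)""")

# Cut-down version of the question's query: filter, sort, limit.
sql = """SELECT CustomerID FROM tCustomer
         WHERE Enabled = 1 AND Active = 1
         ORDER BY DateCreated DESC LIMIT 12"""

def plan():
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan()
# Hypothetical index: equality columns first, then the ORDER BY column,
# so rows can be read already sorted instead of being sorted afterwards.
conn.execute("CREATE INDEX idx_enabled_active_created "
             "ON tCustomer (Enabled, Active, DateCreated)")
after = plan()
print(before)
print(after)
```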

Mysql not using index in LEFT JOIN when joined column used in WHERE clause

I've been puzzling over this problem in MySQL 5.0.51a for quite a while now:
When using a LEFT JOIN to join a table AND using a column of the joined table in the WHERE clause, MySQL fails to use the primary index of the joined table for the join; even FORCE INDEX (PRIMARY) fails.
If no column of the joined table is in the WHERE clause, everything works fine.
If the GROUP BY is removed, the index is also used.
Yet I need both of them.
Faulty:
(in my particular case, up to 1000 seconds of execution time)
SELECT *
FROM tbl_contract co
LEFT JOIN tbl_customer cu ON cu.customer_id = co.customer_id
WHERE cu.marketing_allowed = 1 AND co.marketing_allowed = 1
GROUP BY cu.id
ORDER BY cu.name ASC
Working, but not solving my problems:
SELECT *
FROM tbl_contract co
LEFT JOIN tbl_customer cu ON cu.customer_id = co.customer_id
GROUP BY co.id
Table structures (simplified, as the real tables are more complex):
tbl_contract:
id: INT(11) PRIMARY
customer_id: INT(11)
marketing_allowed: TINYINT(1)
tbl_customer:
customer_id: INT(11) PRIMARY
marketing_allowed: TINYINT(1)
MySQL's EXPLAIN lists PRIMARY as a possible key for the join, but doesn't use it.
There is one workaround:
SELECT (...)
HAVING cu.marketing_allowed = 1
This solves the problem, BUT we use the query in other contexts where we can only select ONE column in the whole statement, and HAVING needs the marketing_allowed column to be selected in the SELECT statement.
I also noticed that running ANALYZE TABLE on the relevant tables makes MySQL 5.5.8 on my local system do the right thing, but I cannot always guarantee that ANALYZE has been run right before the statement. Anyway, this solution does not work under MySQL 5.0.51a on our production server. :(
Is there a special rule in MySQL that I didn't notice? Why are LEFT JOIN indexes not used if columns appear in the WHERE clause? Why can't I force them?
Thx in advance,
René
[EDIT]
Thanks to some replies, I could optimize the query using an INNER JOIN, but unfortunately, although it seems absolutely fine, MySQL still refuses to use an index when there is an ORDER BY clause, as I found out:
SELECT *
FROM tbl_contract co
INNER JOIN tbl_customer cu ON cu.customer_id = co.customer_id AND cu.marketing_allowed = 1
WHERE cu.marketing_allowed = 1
ORDER BY cu.name ASC
If you leave the ORDER BY out, MySQL will use the index correctly.
I have removed the GROUP BY as it has no relevance in the example.
[EDIT2]
Forcing indexes does not help either. So the question is: why does MySQL not use an index for joining, when the ORDER BY is executed AFTER joining and reducing the result set with a WHERE clause? This should usually not influence joining...
I'm not sure I understand what you are asking, but
SELECT *
FROM tbl_contract co
LEFT JOIN tbl_customer cu ON cu.customer_id = co.customer_id
WHERE cu.marketing_allowed = 1 AND co.marketing_allowed = 1
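will not actually do an outer join: the WHERE condition cu.marketing_allowed = 1 rejects the NULL-extended rows that the LEFT JOIN adds for contracts without a matching customer, so the LEFT JOIN behaves like an INNER JOIN.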
You probably meant to use:
SELECT *
FROM tbl_contract co
LEFT JOIN tbl_customer cu
ON cu.customer_id = co.customer_id
AND cu.marketing_allowed = 1
WHERE co.marketing_allowed = 1
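A small sketch in Python's sqlite3, with made-up rows, showing the difference: with the customer condition in WHERE, the NULL-extended rows are discarded and the result matches an INNER JOIN; with it in the ON clause, every contract is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_contract (id INTEGER PRIMARY KEY, customer_id INTEGER,
                               marketing_allowed INTEGER);
    CREATE TABLE tbl_customer (customer_id INTEGER PRIMARY KEY, marketing_allowed INTEGER);
    INSERT INTO tbl_customer VALUES (100, 1), (200, 0);
    INSERT INTO tbl_contract VALUES
        (1, 100, 1),  -- customer allows marketing
        (2, 200, 1),  -- customer does not allow marketing
        (3, 300, 1);  -- no matching customer row at all
""")

def rows(sql):
    return conn.execute(sql).fetchall()

# Customer condition in WHERE: NULL-extended rows fail it and are dropped.
in_where = rows("""
    SELECT co.id, cu.customer_id FROM tbl_contract co
    LEFT JOIN tbl_customer cu ON cu.customer_id = co.customer_id
    WHERE cu.marketing_allowed = 1 AND co.marketing_allowed = 1
    ORDER BY co.id""")

# The same filter written as an INNER JOIN.
inner = rows("""
    SELECT co.id, cu.customer_id FROM tbl_contract co
    INNER JOIN tbl_customer cu ON cu.customer_id = co.customer_id
    WHERE cu.marketing_allowed = 1 AND co.marketing_allowed = 1
    ORDER BY co.id""")

# Customer condition in ON: every contract is kept, unmatched ones get NULLs.
in_on = rows("""
    SELECT co.id, cu.customer_id FROM tbl_contract co
    LEFT JOIN tbl_customer cu
      ON cu.customer_id = co.customer_id AND cu.marketing_allowed = 1
    WHERE co.marketing_allowed = 1
    ORDER BY co.id""")

print(in_where, inner)  # [(1, 100)] [(1, 100)]
print(in_on)            # [(1, 100), (2, None), (3, None)]
```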
I had the same trouble: the MySQL optimizer was not using indexes when I used a JOIN with conditions. I changed my SQL statement from a JOIN to subqueries:
SELECT
t1.field1,
t1.field2,
...
(SELECT
t2.field3
FROM table2 t2
WHERE t2.fieldX=t1.fieldX
) AS field3,
(SELECT
t2.field4
FROM table2 t2
WHERE t2.fieldX=t1.fieldX
) AS field4
FROM table1 t1
WHERE t1.fieldZ='valueZ'
ORDER BY t1.sortedField
This query is more complicated, but because the indexes are used, it is also much faster.
You could also use STRAIGHT_JOIN, but performance is better with the query above. Here's a comparison on my DB, with 100k rows in table1 and 20k in table2:
0.00s using the query above
0.10s using STRAIGHT_JOIN
0.30s using JOIN
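A sketch in Python's sqlite3 of the subquery rewrite, using the answer's placeholder names (table1, table2, fieldX, ...) with made-up rows. It assumes fieldX is unique in table2, so each scalar subquery returns at most one row (MySQL raises an error if one returns more). A scalar subquery yields NULL when nothing matches, so it corresponds to a LEFT JOIN, not an INNER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, fieldX INTEGER, fieldZ TEXT,
                         sortedField INTEGER);
    -- Assumption: fieldX is unique in table2.
    CREATE TABLE table2 (fieldX INTEGER PRIMARY KEY, field3 TEXT, field4 TEXT);
    INSERT INTO table1 VALUES (1, 10, 'valueZ', 3), (2, 20, 'valueZ', 1),
                              (3, 30, 'other', 2), (4, 40, 'valueZ', 2);
    INSERT INTO table2 VALUES (10, 'a3', 'a4'), (20, 'b3', 'b4'), (30, 'c3', 'c4');
""")

# One correlated scalar subquery per looked-up column.
subqueries = conn.execute("""
    SELECT t1.id,
           (SELECT t2.field3 FROM table2 t2 WHERE t2.fieldX = t1.fieldX) AS field3,
           (SELECT t2.field4 FROM table2 t2 WHERE t2.fieldX = t1.fieldX) AS field4
    FROM table1 t1
    WHERE t1.fieldZ = 'valueZ'
    ORDER BY t1.sortedField""").fetchall()

# The equivalent LEFT JOIN.
left_join = conn.execute("""
    SELECT t1.id, t2.field3, t2.field4
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.fieldX = t1.fieldX
    WHERE t1.fieldZ = 'valueZ'
    ORDER BY t1.sortedField""").fetchall()

print(subqueries == left_join, subqueries)
# True [(2, 'b3', 'b4'), (4, None, None), (1, 'a3', 'a4')]
```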
Have you tried putting multiple conditions in the JOIN clause?
SELECT *
FROM tbl_contract co
LEFT JOIN tbl_customer cu ON cu.customer_id = co.customer_id AND cu.marketing_allowed = 1
WHERE co.marketing_allowed = 1