Tips for improving this slow MySQL query?

I'm using a query which generally executes in under a second, but sometimes takes between 10 and 40 seconds to finish. I'm actually not totally clear on how the subquery works; I just know that it works, in that it gives me 15 rows for each faver_profile_id.
I'm logging slow queries and it's telling me 5823244 rows were examined, which is odd because there aren't anywhere close to that many rows in any of the tables involved (the favorites table has the most, at 50,000 rows).
Can anyone offer me some pointers? Is it an issue with the subquery and needing to use filesort?
EDIT: Running EXPLAIN shows that the users table is not using an index (even though id is the primary key). Under Extra it says: Using temporary; Using filesort.
SELECT F.id,F.created,U.username,U.fullname,U.id,I.*
FROM favorites AS F
INNER JOIN users AS U ON F.faver_profile_id = U.id
INNER JOIN items AS I ON F.notice_id = I.id
WHERE faver_profile_id IN (360,379,95,315,278,1)
AND F.removed = 0
AND I.removed = 0
AND F.collection_id is null
AND I.nudity = 0
AND (SELECT COUNT(*) FROM favorites WHERE faver_profile_id = F.faver_profile_id
AND created > F.created AND removed = 0 AND collection_id is null) < 15
ORDER BY F.faver_profile_id, F.created DESC;

The number of rows examined is large because many rows have been examined more than once. You are getting this because of an incorrectly optimized query plan, which results in table scans where index lookups should have been performed. In this case the number of rows examined is multiplicative, i.e. of an order of magnitude comparable to the product of the total number of rows in more than one table.
Make sure that you have run ANALYZE TABLE on your three tables.
Read up on how to avoid table scans, then identify and create any missing indexes (a sketch of a possible index follows the hinted query below)
Rerun ANALYZE and re-EXPLAIN your queries
the number of examined rows should drop dramatically
if not, post the full EXPLAIN plan
use query hints to force the use of indexes (to see the index names for a table, use SHOW INDEX):
SELECT
F.id,F.created,U.username,U.fullname,U.id,I.*
FROM favorites AS F FORCE INDEX (faver_profile_id_key)
INNER JOIN users AS U FORCE INDEX FOR JOIN (PRIMARY) ON F.faver_profile_id = U.id
INNER JOIN items AS I FORCE INDEX FOR JOIN (PRIMARY) ON F.notice_id = I.id
WHERE faver_profile_id IN (360,379,95,315,278,1)
AND F.removed = 0
AND I.removed = 0
AND F.collection_id is null
AND I.nudity = 0
AND (SELECT COUNT(*) FROM favorites FORCE INDEX (faver_profile_id_key) WHERE faver_profile_id = F.faver_profile_id
AND created > F.created AND removed = 0 AND collection_id is null) < 15
ORDER BY F.faver_profile_id, F.created DESC;
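If the hinted index does not exist yet, create it first. A minimal sketch (the name faver_profile_id_key is taken from the hints above; the column choice is an assumption based on the WHERE, subquery and ORDER BY clauses):
ALTER TABLE favorites
  ADD INDEX faver_profile_id_key (faver_profile_id, created);
With faver_profile_id first, both the IN lookup and the correlated COUNT(*) subquery can seek straight to one profile's favorites, with created available for the range test and the sort.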
You may also change your query to use GROUP BY faver_profile_id/HAVING count < 15 instead of the nested SELECT COUNT(*) subquery, as suggested by vartec. The performance of your original query and vartec's should be comparable if both are properly optimized, e.g. using hints (your query would use nested index lookups, whereas vartec's would use a hash-based strategy).

I think with GROUP BY and HAVING it should be faster.
Is that what you want?
SELECT F.id,F.created,U.username,U.fullname,U.id, I.field1, I.field2, count(*) as CNT
FROM favorites AS F
INNER JOIN users AS U ON F.faver_profile_id = U.id
INNER JOIN items AS I ON F.notice_id = I.id
WHERE faver_profile_id IN (360,379,95,315,278,1)
AND F.removed = 0
AND I.removed = 0
AND F.collection_id is null
AND I.nudity = 0
GROUP BY F.id,F.created,U.username,U.fullname,U.id,I.field1, I.field2
HAVING CNT < 15
ORDER BY F.faver_profile_id, F.created DESC;
Don't know which fields from items you need, so I've put placeholders.

I suggest you use MySQL's EXPLAIN to see how your server handles the query. My bet is that your indexes aren't optimal, but EXPLAIN should do much better than my bet.
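For reference, you simply prefix the statement with EXPLAIN. Shown here on the question's query (correlated subquery omitted for brevity):
EXPLAIN
SELECT F.id, F.created, U.username, U.fullname, U.id, I.*
FROM favorites AS F
INNER JOIN users AS U ON F.faver_profile_id = U.id
INNER JOIN items AS I ON F.notice_id = I.id
WHERE F.faver_profile_id IN (360,379,95,315,278,1)
  AND F.removed = 0 AND I.removed = 0
  AND F.collection_id IS NULL AND I.nudity = 0;
Each row of the output describes one table in the join: the access type, the possible and chosen keys, and an estimated row count.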

You could do a loop over each id and use LIMIT instead of the COUNT(*) subquery:
foreach $id in [123,456,789]:
SELECT
F.id,
F.created,
U.username,
U.fullname,
U.id,
I.*
FROM
favorites AS F INNER JOIN
users AS U ON F.faver_profile_id = U.id INNER JOIN
items AS I ON F.notice_id = I.id
WHERE
F.faver_profile_id = {$id} AND
I.removed = 0 AND
I.nudity = 0 AND
F.removed = 0 AND
F.collection_id is null
ORDER BY
F.faver_profile_id,
F.created DESC
LIMIT
15;
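If you'd rather make a single round trip than one query per id, MySQL allows each parenthesized SELECT in a UNION to carry its own ORDER BY and LIMIT. A sketch under that assumption (two ids shown; repeat one block per id):
(SELECT F.id, F.created, U.username, U.fullname, U.id, I.*
 FROM favorites AS F
 INNER JOIN users AS U ON F.faver_profile_id = U.id
 INNER JOIN items AS I ON F.notice_id = I.id
 WHERE F.faver_profile_id = 360
   AND F.removed = 0 AND I.removed = 0
   AND F.collection_id IS NULL AND I.nudity = 0
 ORDER BY F.created DESC
 LIMIT 15)
UNION ALL
(SELECT F.id, F.created, U.username, U.fullname, U.id, I.*
 FROM favorites AS F
 INNER JOIN users AS U ON F.faver_profile_id = U.id
 INNER JOIN items AS I ON F.notice_id = I.id
 WHERE F.faver_profile_id = 379
   AND F.removed = 0 AND I.removed = 0
   AND F.collection_id IS NULL AND I.nudity = 0
 ORDER BY F.created DESC
 LIMIT 15);
-- repeat one parenthesized block per remaining id (95, 315, 278, 1)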

I'll suppose the result of that query is intended to be shown as a paged list. In that case, perhaps you could consider doing a simpler unjoined query, plus a second query to read only the 15, 20 or 30 elements actually shown. Isn't a JOIN a heavy operation? This would simplify the query, and it wouldn't become slower when the joined tables grow; a sketch follows.
Tell me if I'm wrong, please.
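To make the idea concrete, here is a hypothetical two-step version (names follow the original query; the id lists in step 2 would be built from the rows of step 1):
-- step 1: page through favorites alone, no joins
SELECT id, created, faver_profile_id, notice_id
FROM favorites
WHERE faver_profile_id IN (360,379,95,315,278,1)
  AND removed = 0 AND collection_id IS NULL
ORDER BY faver_profile_id, created DESC
LIMIT 30;
-- step 2: fetch details only for the rows actually displayed
SELECT id, username, fullname FROM users WHERE id IN (360,379,95,315,278,1);
SELECT * FROM items WHERE id IN (101, 102, 103);  -- hypothetical notice_ids from step 1
Whether this wins depends on round-trip overhead versus join cost; it tends to help when the joined tables are large and only one page of rows is shown.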

Related

MySQL doesn't use the index I expect when my query has a large number of values in `IN` clause

I have a problem when the IN clause contains too many values. Consider this query:
EXPLAIN
SELECT DISTINCT t.entry_id , t.sticky , wd.field_id_104 , t.title
FROM exp_channel_titles AS t
LEFT JOIN exp_channels ON t.channel_id = exp_channels.channel_id
LEFT JOIN exp_channel_data AS wd ON t.entry_id = wd.entry_id
LEFT JOIN exp_members AS m ON m.member_id = t.author_id
INNER JOIN exp_category_posts ON t.entry_id = exp_category_posts.entry_id
INNER JOIN exp_categories ON exp_category_posts.cat_id = exp_categories.cat_id
WHERE t.entry_id !=''
AND t.site_id IN ('1')
AND t.entry_date < 1610109517
AND (t.expiration_date = 0 OR t.expiration_date > 1610109517)
AND t.entry_id IN ('0','649','650','651','652','653','654','655')
;
With only a few values in the list, the EXPLAIN output is fine. But if the IN clause contains a thousand values, e.g. IN ('0','649','650','651','652','653','654','655', and so on), the query runs for about a minute and the EXPLAIN output switches to a full table scan.
How do I fix that?
UPDATE: range_optimizer_max_mem_size had already been set to 0 and isn't the issue.
We have had similar problems at my company when someone runs a query with a very long list of values in an IN (...) predicate.
We found that MySQL enforces a limit on memory available to the range optimizer. If the list of values is too long, it exceeds the memory limit, and the optimizer cannot finish its analysis to see if it should use the index. So it gives up and says, "forget it! it's a table-scan for you."
We fix it by setting the MySQL Server configuration value range_optimizer_max_mem_size=0 which means there is no limit to the memory that the range optimizer can use.
This creates a risk that if someone were to run a query with a million values in the IN (...) list, it could use a lot of memory, maybe enough to kill the MySQL Server. But so far the tradeoff is preferable, to allow the optimizer to choose the index.
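For reference, the setting can be changed at runtime (with sufficient privileges) or persisted in the server configuration file:
SET GLOBAL range_optimizer_max_mem_size = 0;
-- or in my.cnf under [mysqld]:
--   range_optimizer_max_mem_size = 0
A value of 0 means no limit; a finite value just needs to be large enough for your longest IN lists.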
See documentation:
https://dev.mysql.com/doc/refman/5.7/en/range-optimization.html
https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_range_optimizer_max_mem_size
Re your comment:
Another common reason for the optimizer to choose to do a table-scan is that it calculates that your conditions match a large enough portion of the table that it's more expensive to use the index than to simply run a table-scan and examine every row.
The threshold for this isn't documented, and it depends on the implementation of the cost-based optimizer, so it might change from version to version. But my observation is that usually if your conditions match more than 20% of the table, the optimizer chooses the table-scan.
You could use an index hint to tell the optimizer to treat a table-scan as infinitely expensive, so the index is preferred to a table-scan.
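For example, something like the following sketch; it assumes entry_id is the table's primary key, so PRIMARY is the index the IN list should use:
SELECT t.entry_id, t.sticky, t.title
FROM exp_channel_titles AS t FORCE INDEX (PRIMARY)
WHERE t.site_id IN ('1')
  AND t.entry_date < 1610109517
  AND t.entry_id IN ('0','649','650','651','652','653','654','655');
With FORCE INDEX the optimizer treats a table scan as infinitely expensive, so it uses the named index whenever it possibly can.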
Explode-implode. This is a classic problem of an inefficient way to write a query.
JOIN several tables
Filter
Collapse the results -- usually by GROUP BY or LIMIT, but DISTINCT has the same effect.
So... Turn the query inside out.
Find the ids of the desired rows in t
JOIN that to the rest of the tables.
Presumably the DISTINCT will not be needed at all.
SELECT t2.entry_id, t2.sticky, wd.field_id_104, t2.title
FROM ( SELECT id
FROM exp_channel_titles
WHERE entry_id !=''
AND site_id IN ('1')
AND entry_date < 1610109517
AND (expiration_date = 0 OR expiration_date > 1610109517)
AND entry_id IN ('0','649','650','651','652','653','654','655')
) AS t
JOIN exp_channel_titles AS t2 USING(id)
LEFT JOIN exp_channels ON t2.channel_id = exp_channels.channel_id
LEFT JOIN exp_channel_data AS wd ON t2.entry_id = wd.entry_id
;
Another reformulation
Since there is only one use for wd, this might be better:
SELECT t.entry_id,
       t.sticky,
       ( SELECT wd.field_id_104
           FROM exp_channel_data AS wd
          WHERE wd.entry_id = t.entry_id
       ) AS field_id_104,
       t.title
FROM exp_channel_titles AS t
WHERE t.entry_id != ''
  AND t.site_id IN ('1')
  AND t.entry_date < 1610109517
  AND (t.expiration_date = 0 OR t.expiration_date > 1610109517)
  AND t.entry_id IN ('0','649','650','651','652','653','654','655')
;
and have a 5-column index starting with site_id, entry_date
Other...
AND (t.expiration_date = 0 OR t.expiration_date > 1610109517)
OR is not sargable. Can you redesign the table to avoid this OR? One possible redesign is sketched below.
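One common redesign is to replace the 0 sentinel with a far-future timestamp, so the predicate collapses to a single sargable range. A sketch, assuming expiration_date holds Unix timestamps and no application code depends on the 0 value:
UPDATE exp_channel_titles
SET expiration_date = 2147483647  -- max 32-bit Unix time, i.e. "never expires"
WHERE expiration_date = 0;
-- the filter then becomes a single range:
--   AND t.expiration_date > 1610109517
Alternatively, keep the schema and split the OR into a UNION of two index-friendly SELECTs.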
Without the above reformulation, this may help:
INDEX(site_id, entry_date)
Also, get rid of these, since they seem to be totally useless:
LEFT JOIN exp_channels ON t.channel_id = exp_channels.channel_id
LEFT JOIN exp_members AS m ON m.member_id = t.author_id
And these may be useless:
INNER JOIN exp_category_posts ON t.entry_id = exp_category_posts.entry_id
INNER JOIN exp_categories ON exp_category_posts.cat_id = exp_categories.cat_id

Slow MariaDB when Joining Millions of Records

I installed a plug-in and I have done optimisation on the back end (SSD, single-column indexes for the columns used in GROUP BY & WHERE),
but when running this query
SELECT u.user_id, u.profile_page_id, u.server_id AS user_server_id, u.user_name, u.full_name, u.gender, u.user_image, u.is_invisible, u.user_group_id, u.language_id, u.birthday, u.country_iso, m.*
FROM(
(SELECT m.*
FROM phpfox_channel_video AS m
INNER JOIN phpfox_channel_category AS mc
ON(mc.category_id = mc.category_id)
INNER JOIN phpfox_channel_category_data AS mcd
ON(mcd.video_id = m.video_id)
WHERE m.in_process = 0 AND m.view_id = 0 AND m.module_id = 'videochannel' AND m.item_id = 0 AND m.privacy IN(0) AND mcd.category_id = 17
GROUP BY m.video_id
ORDER BY m.time_stamp DESC
)) AS m
JOIN phpfox_user AS u
ON(u.user_id = m.user_id)
ORDER BY m.time_stamp DESC
LIMIT 24;
it takes 20 seconds, while changing it to this instead
SELECT u.user_id, u.profile_page_id, u.server_id AS user_server_id, u.user_name, u.full_name, u.gender, u.user_image, u.is_invisible, u.user_group_id, u.language_id, u.birthday, u.country_iso, m.*
FROM(
(SELECT m.*
FROM phpfox_channel_video AS m
INNER JOIN phpfox_channel_category_data AS mcd
ON(mcd.video_id = m.video_id AND mcd.category_id = 17)
WHERE m.in_process = 0 AND m.view_id = 0 AND m.module_id = 'videochannel' AND m.item_id = 0 AND m.privacy IN(0)
GROUP BY m.video_id
ORDER BY m.time_stamp DESC
)) AS m
JOIN phpfox_user AS u
ON(u.user_id = m.user_id)
ORDER BY m.time_stamp DESC
LIMIT 24;
This runs in about 5-6 seconds.
The phpfox_channel_video table contains 2 million rows (and will keep growing quickly; it's a social media site and users can upload files too), so caching isn't very useful (though it is enabled).
Any hints on how to optimise this? I have minimal experience with MariaDB/MySQL, as I've been accustomed to MS SQL for big data and creating my own structures. Any recommended method without much altering of the tables would be welcome (adding tables is OK).
Or do I need to restructure the PHP & tables to get the query below 1 second?
Thank you!
I found these links
http://mysql.rjweb.org/doc.php/memory &
http://mysql.rjweb.org/doc.php/ricksrots#indexing
Are they still relevant?
Attached are the EXPLAIN results.
As for the indexes, the current config indexes every column mentioned as a key, on all the tables involved in the query above.
Would a printout of my current server configuration be helpful? Thanks!
INNER JOIN phpfox_channel_category AS mc ON(mc.category_id = mc.category_id)
Is almost useless.
You don't use any columns of mc for other purposes.
This JOIN is performed.
This JOIN verifies that there is a corresponding row in mc.
This JOIN will bloat the temp table if there are multiple corresponding rows.
Bloat leads to wasted work in the GROUP BY.
Similarly, your second query does not use mcd.
Please use different aliases for derived tables. It is hard to follow the multiple uses of m.
This is totally useless:
ORDER BY m.time_stamp DESC
MySQL/MariaDB is free to ignore an ORDER BY in a derived table. A table is defined to be an unordered set of rows. Ordering can only be done at the end.
Suggested index
m: INDEX(item_id, module_id, view_id, in_process, -- any order; tested with '='
privacy, -- sometimes has a list?
video_id) -- last
mcd: INDEX(category_id, video_id) -- in either order
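In executable form those suggestions might look like this (the index names are mine):
ALTER TABLE phpfox_channel_video
  ADD INDEX idx_video_filter (item_id, module_id, view_id, in_process, privacy, video_id);
ALTER TABLE phpfox_channel_category_data
  ADD INDEX idx_category_video (category_id, video_id);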
There is a more logical way to do this, and possibly faster:
INNER JOIN phpfox_channel_category_data AS mcd
ON mcd.video_id = m.video_id
AND mcd.category_id = 17
Remove that, and remove the GROUP BY m.video_id, then add this to the WHERE:
AND EXISTS( SELECT 1 FROM phpfox_channel_category_data AS mcd
WHERE mcd.video_id = m.video_id
AND mcd.category_id = 17 )
(The index mentioned above still applies.)
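Putting those fragments together, the derived table could look like the following sketch (assembled from the pieces above, not a tested drop-in):
SELECT m.*
FROM phpfox_channel_video AS m
WHERE m.in_process = 0
  AND m.view_id = 0
  AND m.module_id = 'videochannel'
  AND m.item_id = 0
  AND m.privacy IN(0)
  AND EXISTS( SELECT 1 FROM phpfox_channel_category_data AS mcd
              WHERE mcd.video_id = m.video_id
                AND mcd.category_id = 17 )
ORDER BY m.time_stamp DESC
LIMIT 24;
Because each video now appears at most once, the GROUP BY disappears, and with the LIMIT pushed into this innermost query only 24 rows need to be joined to phpfox_user.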
Note that I have perhaps eliminated two "filesorts" -- for the GROUP BY and the ORDER BY. Another note: EXPLAIN does not always show how many filesorts there really are. (But EXPLAIN FORMAT=JSON SELECT ... does.)
I managed to clean up the query. After checking the table, it turns out that
WHERE m.in_process = 0
AND m.view_id = 0
AND m.module_id = 'videochannel'
AND m.item_id = 0
AND m.privacy IN(0)
doesn't need to be run, because every row in the table matches those conditions (for the current state of this website). So I just optimized those long queries, and managed to get below 1 second now.

Understanding the difference between two queries from a performance point of view

I have these two versions of the same query. Both produce the same results (164 rows), but the second one takes .5 sec while the first one takes 17 sec. Can someone explain what's going on here?
TABLE organizations : 11988 ROWS
TABLE transaction_metas : 58232 ROWS
TABLE contracts_history : 219469 ROWS
# TAKES 17 SEC
SELECT contracts_history.buyer_id as id, org.name, SUM(transactions_count) as transactions_count, GROUP_CONCAT(DISTINCT(tm.value)) as balancing_authorities
From `contracts_history`
INNER JOIN `organizations` as `org`
ON `org`.`id` = `contracts_history`.`buyer_id`
LEFT JOIN `transaction_metas` as `tm`
ON `tm`.`contract_token` = `contracts_history`.`token` and `tm`.`field` = '1'
WHERE `contracts_history`.`seller_id` = '850'
GROUP BY `contracts_history`.`buyer_id` ORDER BY `balancing_authorities` DESC
# TAKES .6 SEC
SELECT contracts_history.buyer_id as id, org.name, SUM(transactions_count) as transactions_count, GROUP_CONCAT(DISTINCT(tm.value)) as balancing_authorities
From `contracts_history`
INNER JOIN `organizations` as `org`
ON `org`.`id` = `contracts_history`.`buyer_id`
left join (select * from `transaction_metas` where contract_token in (select token from `contracts_history` where seller_id = 850)) as `tm`
ON `tm`.`contract_token` = `contracts_history`.`token` and `tm`.`field` = '1'
WHERE `contracts_history`.`seller_id` = '850'
GROUP BY `contracts_history`.`buyer_id` ORDER BY `balancing_authorities` DESC
Explain Results:
First Query: https://prnt.sc/hjtiw6
Second Query: https://prnt.sc/hjtjjg
Based on my debugging of the first query, it was clear that the LEFT JOIN to the transaction_metas table was making it slow, so I tried to limit its rows instead of joining the full table. It seems to work, but I don't understand why.
A join is a set of combinations of rows from your tables. With that in mind: in the first query the engine combines all the rows and only filters afterwards; in the second one it applies the filter before it makes the combinations.
The best approach would be to apply the filter in the JOIN clause, without a subquery.
Much like this:
SELECT contracts_history.buyer_id as id, org.name, SUM(transactions_count) as transactions_count, GROUP_CONCAT(DISTINCT(tm.value)) as balancing_authorities
From `contracts_history`
INNER JOIN `organizations` as `org`
ON `org`.`id` = `contracts_history`.`buyer_id`
AND `contracts_history`.`seller_id` = '850'
LEFT JOIN `transaction_metas` as `tm`
ON `tm`.`contract_token` = `contracts_history`.`token`
AND `tm`.`field` = 1
GROUP BY `contracts_history`.`buyer_id` ORDER BY `balancing_authorities` DESC
Note: when you reduce the size of the joined tables by filtering with subqueries, it may allow the rows to fit into the join buffer. It's a nice trick when the buffer limit is small.
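The buffer in question is controlled by join_buffer_size in MySQL/MariaDB, which can be raised for a single session before a heavy report query (the value here is only an example; sizing is workload-specific):
SET SESSION join_buffer_size = 4194304;  -- 4 MB, for this session only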
A better explanation:
https://dev.mysql.com/doc/refman/5.5/en/explain-output.html

MySQL running very slow due to `ORDER BY` & `LIMIT`

I have a performance issue with the query below on MySQL. The query involves 5 tables. When I apply the ORDER BY and LIMIT, the results are retrieved in 0.3 secs, but without the ORDER BY and LIMIT I was able to get the results in 0.01 secs. I have tried changing the query, but that did not work. Could someone please help me with this query so I can get the results in the desired time (<0.3 secs)?
Below are the details.
m_todos = 286579 (records)
m_pat = 214858 (records)
users = 119 (records)
m_programs = 26 (records)
role = 4 (records)
SELECT *
FROM (
SELECT t.*,
mp.name as A_name,
u.first_name, u.last_name,
p.first, p.last, p.zone, p.language,p.handling,
r.name,
u2.first_name AS created_first_name,
u2.last_name AS created_last_name
FROM m_todos t
INNER JOIN role r ON t.role_id=r.id
INNER JOIN m_pat p ON t.patient_id = p.id
LEFT JOIN users u2 ON t.created_id=u2.id
LEFT JOIN m_programs mp ON t.prog_id=mp.id
LEFT JOIN users u ON t.user_id=u.id
WHERE t.role_id !='9'
AND t.completed = '0000-00-00 00:00:00'
) C
ORDER BY priority DESC, due ASC
LIMIT 0,10
Get rid of the outer SELECT; move the ORDER BY and LIMIT in.
Indexes:
t: (completed)
t: (priority, due)
I assume priority and due are in t?? Please be explicit in the query. It could make a huge difference.
If the following works, it should speed things up a lot: Start by finding the t.id without all the JOINs:
SELECT id
FROM m_todos
WHERE role_id !='9'
AND completed = '0000-00-00 00:00:00'
ORDER BY priority DESC, due ASC
LIMIT 10
That will benefit from this covering composite index:
INDEX(completed, role_id, priority, due, id)
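In ALTER TABLE form (the index name is mine):
ALTER TABLE m_todos
  ADD INDEX idx_completed_cover (completed, role_id, priority, due, id);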
Debug that. Then use it in the rest:
SELECT t.*, the-other-stuff
FROM ( that-query ) AS t1
JOIN m_todos AS t USING(id)
then-the-rest-of-the-JOINs
ORDER BY priority DESC, due ASC -- yes, again
If you don't need all of t.*, it may be beneficial to spell out the actual columns needed.
The reason for this to run much faster is that the 10 rows are found efficiently by looking only at the one table. The original code was shoveling around a lot more rows than 10 and they included all the columns of t, plus columns from the other tables.
My version does only 10 lookups for all the extra stuff.
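Assembled from the pieces above, the final query might look like this sketch (the column list is copied from the original query; untested):
SELECT t.*,
       mp.name AS A_name,
       u.first_name, u.last_name,
       p.first, p.last, p.zone, p.language, p.handling,
       r.name,
       u2.first_name AS created_first_name,
       u2.last_name AS created_last_name
FROM ( SELECT id
       FROM m_todos
       WHERE role_id != '9'
         AND completed = '0000-00-00 00:00:00'
       ORDER BY priority DESC, due ASC
       LIMIT 10
     ) AS t1
JOIN m_todos AS t USING(id)
INNER JOIN role r ON t.role_id = r.id
INNER JOIN m_pat p ON t.patient_id = p.id
LEFT JOIN users u2 ON t.created_id = u2.id
LEFT JOIN m_programs mp ON t.prog_id = mp.id
LEFT JOIN users u ON t.user_id = u.id
ORDER BY t.priority DESC, t.due ASC;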

slow query with lots of left joins

When I check SHOW PROCESSLIST in the database I see the query below. It uses CPU heavily (more than 100%), and it took 80 seconds to complete. We have a separate server for the database (64GB RAM).
INSERT INTO `search_tmp_598075de5c7e67_73335919`
SELECT `main_select`.`entity_id`, MAX(score) AS `relevance`
FROM (SELECT `search_index`.`entity_id`, (((0)) * 1) AS score
FROM `catalogsearch_fulltext_scope1` AS `search_index`
LEFT JOIN `catalog_eav_attribute` AS `cea`
ON search_index.attribute_id = cea.attribute_id
LEFT JOIN `catalog_category_product_index` AS `category_ids_index`
ON search_index.entity_id = category_ids_index.product_id
LEFT JOIN `review_entity_summary` AS `rating`
ON `rating`.`entity_pk_value`=`search_index`.entity_id
AND `rating`.entity_type = 1
AND `rating`.store_id = 1
WHERE (category_ids_index.category_id = 2299)
) AS `main_select`
GROUP BY `entity_id`
ORDER BY `relevance` DESC
LIMIT 10000
Why does this query use my full CPU resources?
Some inefficiencies:
There is a non-null condition on the records of the outer joined catalog_category_product_index. This turns the outer join into an inner join. It will be more efficient to use an inner join clause.
There is no need to have a nested query: the grouping, ordering and limiting can be done directly on the inner query.
(((0)) * 1) is just a complex way of saying 0, and taking the MAX of that will obviously still return a relevance of 0 for all records. Not only is this an inefficient way to output 0, it also makes no sense. I assume your real query has some less evident calculation there, which might need optimisation.
If catalog_eav_attribute.attribute_id is a unique field, then there is no sense in outer joining that table, because that data is not used anywhere
If review_entity_summary.entity_pk_value is unique (at least when entity_type = 1 and store_id = 1), then again there is no use in outer joining that table, because that data is not used anywhere
If the fields in the above 2 bullet points are non-unique, but the number of records returned per search_index.entity_id value does not influence the result (and as it currently stands, with the obscure (((0)) * 1) value, it does not), then neither outer join is needed either.
With these assumptions, the select part can be reduced to:
SELECT search_index.entity_id,
MAX(((0)) * 1) AS relevance
FROM catalogsearch_fulltext_scope1 AS search_index
INNER JOIN catalog_category_product_index AS category_ids_index
ON search_index.entity_id = category_ids_index.product_id
WHERE category_ids_index.category_id = 2299
GROUP BY search_index.entity_id
ORDER BY relevance DESC
LIMIT 10000
I still left the (((0)) * 1) in there, but it really makes no sense.