I am struggling with a MySQL problem.
I have two identical queries, differing only in the item_id at the end, but they return different execution plans when I run them through EXPLAIN/EXPLAIN ANALYZE.
This results in a huge difference in the time needed to return a result.
The query is something like
explain select `orders`.*,
(select count(*) from `items`
inner join `orders_items` on `items`.`id` = `orders_items`.`item_id`
where `orders`.`id` = `orders_items`.`order_id`
and `items`.`deleted_at` is null) as `items_count`,
(select count(*) from `returns`
where `orders`.`id` = `returns`.`order_id`
and `returns`.`deleted_at` is null) as `returns_count`,
(select count(*) from `shippings`
where `orders`.`id` = `shippings`.`order_id`
and `shippings`.`deleted_at` is null) as `shippings_count`,
(select count(*) from `orders` as `laravel_reserved_2`
where `orders`.`id` = `laravel_reserved_2`.`recurred_from_id`
and `laravel_reserved_2`.`deleted_at` is null) as `recurred_orders_count`,
(select COALESCE(SUM(orders_items.amount), 0) from `items`
inner join `orders_items` on `items`.`id` = `orders_items`.`item_id`
where `orders`.`id` = `orders_items`.`order_id`
and `items`.`deleted_at` is null) as `items_sum_orders_itemsamount`,
`orders`.*,
`orders_items`.`item_id` as `pivot_item_id`,
`orders_items`.`order_id` as `pivot_order_id`,
`orders_items`.`amount` as `pivot_amount`,
`orders_items`.`id` as `pivot_id`
from `orders`
inner join `orders_items` on `orders`.`id` = `orders_items`.`order_id`
where `orders_items`.`item_id` = 497
and `import_finished` = 1
and `orders`.`deleted_at` is null
order by `id` desc limit 50 offset 0;
As you can see, it is a Laravel/Eloquent query.
This is the execution plan for the query above:
But when I change the item_id at the end, it returns the following execution plan:
It is absolutely random: 30% of the item_ids return the faster plan and 70% return the slower one, and I have no idea why. The related data is almost the same for every item in our database.
I also flushed the query cache to see if it was causing the different execution plans, but with no success.
I have been googling for 4 hours now, but I can't find anything about this exact problem.
Can someone tell me why this happens?
Thanks in advance!!
Edit 01/21/2023 - 19:04:
It seems MySQL doesn't like ordering by columns that are not part of the related WHERE clause, in this case the pivot table orders_items.
I just replaced the
order by id
with
order by orders_items.order_id
This results in a 10 times faster query.
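For reference, the tail of the query now reads like this (a sketch; the result order is unchanged because the join enforces orders.id = orders_items.order_id on every row):
-- select list unchanged
from `orders`
inner join `orders_items` on `orders`.`id` = `orders_items`.`order_id`
where `orders_items`.`item_id` = 497
and `import_finished` = 1
and `orders`.`deleted_at` is null
-- ordering on the pivot column, which presumably can be satisfied
-- from an orders_items index instead of an extra sort
order by `orders_items`.`order_id` desc limit 50 offset 0;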
A query using different execution plans just because of a different parameter value can have several causes. The simplest explanation would be the position of the given item_id in the relevant index: that position can affect the estimated cost of using the index, which in turn can decide whether the index is used at all. (This is just one example.)
It is important to note that the EXPLAIN statement gives you the planned execution plan, but not necessarily the one that is actually used.
EXPLAIN ANALYZE is the command that outputs the actually used execution plan. It may still yield different results for different parameters.
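For example (EXPLAIN ANALYZE requires MySQL 8.0.18+ and actually executes the statement):
-- EXPLAIN shows the plan the optimizer intends to use;
-- EXPLAIN ANALYZE runs the query and reports the actual per-step
-- row counts and timings, so plan and reality can be compared
EXPLAIN ANALYZE
select `orders`.*
from `orders`
inner join `orders_items` on `orders`.`id` = `orders_items`.`order_id`
where `orders_items`.`item_id` = 497;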
Looking at the tail of your query (??? marks columns whose owning table is ambiguous):
ON `orders`.`id` = `orders_items`.`order_id`
where `orders_items`.`item_id` = 497
and ???.`import_finished` = 1
and `orders`.`deleted_at` is null
order by ???.`id` desc
limit 50 offset 0;
One order maps to many order items, correct?
So, ORDER BY orders.id is different from ORDER BY orders_items.item_id.
Does item_id = 497 show up in many different orders?
So, think about which ORDER BY you really want.
Meanwhile, these may help with performance:
orders_items: INDEX(order_id, item_id)
returns: INDEX(order_id, deleted_at)
shippings: INDEX(order_id, deleted_at)
laravel_reserved_2: INDEX(recurred_from_id, deleted_at)
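In plain MySQL those suggestions would look roughly like this (index names are placeholders; laravel_reserved_2 is just Laravel's alias for orders, so that index belongs on orders):
ALTER TABLE orders_items ADD INDEX idx_order_item (order_id, item_id);
ALTER TABLE returns ADD INDEX idx_order_deleted (order_id, deleted_at);
ALTER TABLE shippings ADD INDEX idx_order_deleted (order_id, deleted_at);
-- laravel_reserved_2 is an alias of orders in the generated query
ALTER TABLE orders ADD INDEX idx_recurred_deleted (recurred_from_id, deleted_at);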
Related
I have these two versions of the same query. Both produce the same results (164 rows), but the second one takes 0.5 sec while the first takes 17 sec. Can someone explain what's going on here?
TABLE organizations : 11988 ROWS
TABLE transaction_metas : 58232 ROWS
TABLE contracts_history : 219469 ROWS
# TAKES 17 SEC
SELECT contracts_history.buyer_id AS id, org.name, SUM(transactions_count) AS transactions_count, GROUP_CONCAT(DISTINCT tm.value) AS balancing_authorities
FROM `contracts_history`
INNER JOIN `organizations` AS `org`
ON `org`.`id` = `contracts_history`.`buyer_id`
LEFT JOIN `transaction_metas` AS `tm`
ON `tm`.`contract_token` = `contracts_history`.`token` AND `tm`.`field` = '1'
WHERE `contracts_history`.`seller_id` = '850'
GROUP BY `contracts_history`.`buyer_id` ORDER BY `balancing_authorities` DESC
# TAKES .6 SEC
SELECT contracts_history.buyer_id AS id, org.name, SUM(transactions_count) AS transactions_count, GROUP_CONCAT(DISTINCT tm.value) AS balancing_authorities
FROM `contracts_history`
INNER JOIN `organizations` AS `org`
ON `org`.`id` = `contracts_history`.`buyer_id`
LEFT JOIN (SELECT * FROM `transaction_metas` WHERE contract_token IN (SELECT token FROM `contracts_history` WHERE seller_id = 850)) AS `tm`
ON `tm`.`contract_token` = `contracts_history`.`token` AND `tm`.`field` = '1'
WHERE `contracts_history`.`seller_id` = '850'
GROUP BY `contracts_history`.`buyer_id` ORDER BY `balancing_authorities` DESC
Explain Results:
First Query: https://prnt.sc/hjtiw6
Second Query: https://prnt.sc/hjtjjg
Based on my debugging of the first query, it was clear that the LEFT JOIN to the transaction_metas table was making it slow, so I tried to limit its rows instead of joining against the full table. It seems to work, but I don't understand why.
A join is a set of combinations of rows from your tables. With that in mind: in the first query, the engine builds all the combinations and only filters afterwards; in the second one, it applies the filter before building the combinations.
The best case would be to apply the filter in the JOIN clause, without a subquery.
Much like this:
SELECT contracts_history.buyer_id AS id, org.name, SUM(transactions_count) AS transactions_count, GROUP_CONCAT(DISTINCT tm.value) AS balancing_authorities
FROM `contracts_history`
INNER JOIN `organizations` AS `org`
ON `org`.`id` = `contracts_history`.`buyer_id`
AND `contracts_history`.`seller_id` = '850'
LEFT JOIN `transaction_metas` AS `tm`
ON `tm`.`contract_token` = `contracts_history`.`token`
AND `tm`.`field` = 1
GROUP BY `contracts_history`.`buyer_id` ORDER BY `balancing_authorities` DESC
Note: when you reduce the size of the joined tables by filtering with subqueries, the rows may fit into the join buffer; a nice trick when the buffer limit is small.
A better explanation:
https://dev.mysql.com/doc/refman/5.5/en/explain-output.html
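As an aside, whichever form you use, the join benefits from index support; a plausible pair of indexes for this query (index names are placeholders, assuming the columns are as shown):
-- finds the seller's contracts and supplies token for the join
ALTER TABLE contracts_history ADD INDEX idx_seller_token (seller_id, token);
-- lets the LEFT JOIN probe the metas by token and field directly
ALTER TABLE transaction_metas ADD INDEX idx_token_field (contract_token, field);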
I have a performance issue with the query below on MySQL; it involves 5 tables. When I apply the ORDER BY and LIMIT, the results are retrieved in 0.3 sec, but without them I get the results in 0.01 sec. I am tired of changing the query; nothing has worked. Could someone please help me with this query so I can get the results in the desired time (<0.3 sec)?
Below are the details.
m_todos = 286579 (records)
m_pat = 214858 (records)
users = 119 (records)
m_programs = 26 (records)
role = 4 (records)
SELECT *
FROM (
SELECT t.*,
mp.name as A_name,
u.first_name, u.last_name,
p.first, p.last, p.zone, p.language,p.handling,
r.name,
u2.first_name AS created_first_name,
u2.last_name AS created_last_name
FROM m_todos t
INNER JOIN role r ON t.role_id=r.id
INNER JOIN m_pat p ON t.patient_id = p.id
LEFT JOIN users u2 ON t.created_id=u2.id
LEFT JOIN m_programs mp ON t.prog_id=mp.id
LEFT JOIN users u ON t.user_id=u.id
WHERE t.role_id !='9'
AND t.completed = '0000-00-00 00:00:00'
) C
ORDER BY priority DESC, due ASC
LIMIT 0,10
Get rid of the outer SELECT; move the ORDER BY and LIMIT in.
Indexes:
t: (completed)
t: (priority, due)
I assume priority and due are in t? Please be explicit in the query; it could make a huge difference.
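Assuming priority and due do live in m_todos (t), the flattened version would look something like this (a sketch, untested against your schema):
SELECT t.*,
       mp.name AS A_name,
       u.first_name, u.last_name,
       p.first, p.last, p.zone, p.language, p.handling,
       r.name,
       u2.first_name AS created_first_name,
       u2.last_name  AS created_last_name
FROM m_todos t
INNER JOIN role r        ON t.role_id = r.id
INNER JOIN m_pat p       ON t.patient_id = p.id
LEFT JOIN users u2       ON t.created_id = u2.id
LEFT JOIN m_programs mp  ON t.prog_id = mp.id
LEFT JOIN users u        ON t.user_id = u.id
WHERE t.role_id != '9'
  AND t.completed = '0000-00-00 00:00:00'
-- sort columns qualified with t, per the note above
ORDER BY t.priority DESC, t.due ASC
LIMIT 0, 10;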
If the following works, it should speed things up a lot: Start by finding the t.id without all the JOINs:
SELECT id
FROM m_todos
WHERE role_id !='9'
AND completed = '0000-00-00 00:00:00'
ORDER BY priority DESC, due ASC
LIMIT 10
That will benefit from this covering composite index:
INDEX(completed, role_id, priority, due, id)
Debug that. Then use it in the rest:
SELECT t.*, the-other-stuff
FROM ( that-query ) AS t1
JOIN m_todos AS t USING(id)
then-the-rest-of-the-JOINs
ORDER BY priority DESC, due ASC -- yes, again
If you don't need all of t.*, it may be beneficial to spell out the actual columns needed.
The reason this runs much faster is that the 10 rows are found efficiently by looking only at the one table. The original code was shoveling around far more rows than 10, and they included all the columns of t plus columns from the other tables.
My version does only 10 lookups for all the extra stuff.
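Spelled out with the question's tables, the two-step pattern would look roughly like this (a sketch; it assumes priority and due are columns of m_todos):
SELECT t.*,
       mp.name AS A_name,
       u.first_name, u.last_name,
       p.first, p.last, p.zone, p.language, p.handling,
       r.name,
       u2.first_name AS created_first_name,
       u2.last_name  AS created_last_name
FROM (
    -- find the 10 ids cheaply, touching only the covering index
    SELECT id
    FROM m_todos
    WHERE role_id != '9'
      AND completed = '0000-00-00 00:00:00'
    ORDER BY priority DESC, due ASC
    LIMIT 10
) AS t1
JOIN m_todos AS t USING (id)
INNER JOIN role r        ON t.role_id = r.id
INNER JOIN m_pat p       ON t.patient_id = p.id
LEFT JOIN users u2       ON t.created_id = u2.id
LEFT JOIN m_programs mp  ON t.prog_id = mp.id
LEFT JOIN users u        ON t.user_id = u.id
ORDER BY t.priority DESC, t.due ASC;  -- yes, again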
When I check SHOW PROCESSLIST in the database, I see the query below. It uses CPU heavily (more than 100%) and took 80 seconds to complete. We have a separate server for the database (64 GB RAM).
INSERT INTO `search_tmp_598075de5c7e67_73335919`
SELECT `main_select`.`entity_id`, MAX(score) AS `relevance`
FROM (SELECT `search_index`.`entity_id`, (((0)) * 1) AS score
FROM `catalogsearch_fulltext_scope1` AS `search_index`
LEFT JOIN `catalog_eav_attribute` AS `cea`
ON search_index.attribute_id = cea.attribute_id
LEFT JOIN `catalog_category_product_index` AS `category_ids_index`
ON search_index.entity_id = category_ids_index.product_id
LEFT JOIN `review_entity_summary` AS `rating`
ON `rating`.`entity_pk_value`=`search_index`.entity_id
AND `rating`.entity_type = 1
AND `rating`.store_id = 1
WHERE (category_ids_index.category_id = 2299)
) AS `main_select`
GROUP BY `entity_id`
ORDER BY `relevance` DESC
LIMIT 10000
Why does this query use my full CPU resources?
Some inefficiencies:
There is a non-null condition on the records of the outer joined catalog_category_product_index. This turns the outer join into an inner join. It will be more efficient to use an inner join clause.
There is no need to have a nested query: the grouping, ordering and limiting can be done directly on the inner query.
(((0)) * 1) is just a complex way of saying 0, and taking the MAX of that will obviously still return a relevance of 0 for all records. Not only is this an inefficient way to output 0, it also makes no sense. I assume your real query has some less evident calculation there, which might need optimisation.
If catalog_eav_attribute.attribute_id is a unique field, then there is no sense in outer joining that table, because that data is not used anywhere
If review_entity_summary.entity_pk_value is unique (at least when entity_type = 1 and store_id = 1), then again there is no use in outer joining that table, because that data is not used anywhere
If the fields in the above 2 bullet points are non-unique, but the number of records returned per search_index.entity_id value does not influence the result (as it currently stands with the obscure (((0)) * 1) value, it does not), then neither outer join is needed.
With these assumptions, the select part can be reduced to:
SELECT search_index.entity_id,
MAX(((0)) * 1) AS relevance
FROM catalogsearch_fulltext_scope1 AS search_index
INNER JOIN catalog_category_product_index AS category_ids_index
ON search_index.entity_id = category_ids_index.product_id
WHERE category_ids_index.category_id = 2299
GROUP BY search_index.entity_id
ORDER BY relevance DESC
LIMIT 10000
I still left the (((0)) * 1) in there, but it really makes no sense.
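If the reduced form is still slow, index support for the remaining filter and join could help; a guess at a suitable index (the name is a placeholder, and Magento may already ship an equivalent one):
-- drives the WHERE on category_id and supplies product_id for the join
ALTER TABLE catalog_category_product_index
  ADD INDEX idx_category_product (category_id, product_id);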
SELECT LM.user_id, LM.users_lineup_id, MIN(LM.total_score) AS total_score
FROM vi_lineup_master LM
JOIN vi_contest AS C
  ON C.contest_unique_id = LM.contest_unique_id
JOIN (SELECT MIN(total_score) AS total_score
      FROM vi_lineup_master
      GROUP BY group_unique_id
     ) AS preq
  ON LM.total_score = preq.total_score
WHERE LM.contest_unique_id = 'iledhSBDO'
  AND C.league_contest_type = 1
GROUP BY group_unique_id
The query above finds the loser per group of a game. It returns accurate results, but it stops responding on large data sets. How can I optimize it?
You can try to move your JOINs to subqueries. Also, pay attention to your "wrong" GROUP BY usage in the outer query: in MySQL you can group by some columns and select others that are not in the GROUP BY clause without any aggregate function, but the database cannot guarantee which of the possible values it will return for them. For the sake of consistency in your application, wrap them in an aggregate function (see the ONLY_FULL_GROUP_BY note after the query).
Check if this one helps:
SELECT
MIN(LM.user_id) AS user_id,
MIN(LM.users_lineup_id) AS users_lineup_id,
MIN(LM.total_score) AS total_score
FROM vi_lineup_master LM
WHERE 1=1
-- check if this "contest_unique_id" is equals
-- to 'iledhSBDO' for a "league_contest_type" valued 1
AND LM.contest_unique_id IN
(
SELECT C.contest_unique_id
FROM vi_contest AS C
WHERE 1=1
AND C.contest_unique_id = 'iledhSBDO'
AND C.league_contest_type = 1
)
-- check if this "total_score" is one of the
-- "min(total_score)" from each "group_unique_id"
AND LM.total_score IN
(
SELECT MIN(total_score)
FROM vi_lineup_master
GROUP BY group_unique_id
)
GROUP BY LM.group_unique_id
;
Also, some pieces of this query may seem redundant; that's because I did not want to change the filters you wrote, just move them.
Also, your query logic seems a bit strange to me, based on the table/column names and how you wrote it... please check the comments in my query, which reflect what I understood of your implementation.
Hope it helps.
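As a footnote to the GROUP BY remark above: enabling ONLY_FULL_GROUP_BY makes MySQL reject such indeterminate queries outright instead of silently picking arbitrary values:
-- with this mode set, selecting a non-aggregated column that is not
-- functionally dependent on the GROUP BY columns raises an error
SET SESSION sql_mode = CONCAT(@@sql_mode, ',ONLY_FULL_GROUP_BY');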
In the following query I show the latest status of a sale (by stage, in this case number 3). The query relies on a subquery over the sale's status history:
SELECT v.id_sale,
IFNULL((
SELECT (CASE WHEN IFNULL( vec.description, '' ) = ''
THEN ve.name
ELSE vec.description
END)
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
WHERE veh.id_sale = v.id_sale
AND vec.id_stage = 3
ORDER BY veh.id_record DESC
LIMIT 1
), 'x') sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
WHERE 1 =1
AND v.flag =1
AND v.id_quarters =4
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
The query takes 0.0057 sec and returns 1011 records.
Because I have to filter the sales by the name of the state, which would mean repeating the subquery in a WHERE clause, I decided to rewrite the query using joins. In this case, I use the MAX function to obtain the latest status:
SELECT
v.id_sale,
IFNULL(veh3.State3,'x') AS sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
LEFT JOIN (
SELECT veh.id_sale,
(CASE WHEN IFNULL(vec.description,'') = ''
THEN ve.name
ELSE vec.description END) AS State3
FROM t_record veh
INNER JOIN (
SELECT id_sale, MAX(id_record) AS max_rating
FROM(
SELECT veh.id_sale, id_record
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign AND vec.id_stage = 3
) m
GROUP BY id_sale
) x ON x.max_rating = veh.id_record
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
) veh3 ON veh3.id_sale = v.id_sale
WHERE v.flag = 1
AND v.id_quarters = 4
This query returns the same 1011 results, but the problem is that it takes 0.0753 sec.
Reviewing the possibilities, I found the factor that makes the difference in the speed of the two queries:
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
If I remove this clause, both queries take the same time... Why does it work better? Is there any way to use this clause with the joins? I hope you can help.
EDIT
I will show the results of EXPLAIN for each query respectively:
q1:
q2:
Interesting, so that little statement basically determines whether there is a match between t_record.id_sale and t_sale.id_sale.
Why does this make your query run faster? Because WHERE conditions are applied before the subselects in the SELECT list, so if there is no record to go with the sale, the engine doesn't bother processing the subselect, which saves you some time. That's why it works better.
Is it going to work in your join syntax? I don't really know without having your tables to test against, but you can always just apply it to the end and find out (see the sketch below). Add the keyword EXPLAIN to the beginning of your query and you will get a plan of execution that will help you optimize things. Probably the best way to get better results with your join syntax is to add some indexes to your tables.
But I ask you: is this even necessary? You have a query returning in less than 8 hundredths of a second. Unless it runs thousands of times an hour, it is not really taxing your DB at all, and your time is probably better spent making improvements elsewhere in your application.
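For reference, "applying it to the end" of the join version would look something like this (a sketch reusing the question's second query, untested against the real schema):
SELECT v.id_sale,
       IFNULL(veh3.State3, 'x') AS sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
LEFT JOIN (
    SELECT veh.id_sale,
           (CASE WHEN IFNULL(vec.description, '') = ''
                 THEN ve.name
                 ELSE vec.description END) AS State3
    FROM t_record veh
    INNER JOIN (
        SELECT id_sale, MAX(id_record) AS max_rating
        FROM (
            SELECT veh.id_sale, id_record
            FROM t_record veh
            INNER JOIN t_state_campaign vec
                    ON vec.id_state_campaign = veh.id_state_campaign
                   AND vec.id_stage = 3
        ) m
        GROUP BY id_sale
    ) x ON x.max_rating = veh.id_record
    INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
    INNER JOIN t_state ve ON ve.id_state = vec.id_state
) veh3 ON veh3.id_sale = v.id_sale
WHERE v.flag = 1
  AND v.id_quarters = 4
  -- the EXISTS test from the subquery version, appended here
  AND EXISTS (SELECT '1' FROM t_record WHERE id_sale = v.id_sale LIMIT 1);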