I'm having some trouble (both technical and conceptual) optimizing some very slow queries.
This is my original query:
select Con_Progr,
Pre_Progr
from contribuente,
preavviso_ru,
comune,
via
where [lots of where clauses]
and ((Via_Progr = Con_Via_Sec) or (Via_Progr = Con_Via_Res and (Con_Via_Sec is null or Con_Via_Sec = '0')))
order by Con_Cognome,
Con_Nome asc;
This query takes about 38 seconds to execute, which is really slow. I manipulated it a bit and managed to bring it down to about 0.1 seconds; the query now looks like this:
(select Con_Progr,
Pre_Progr
from preavviso_ru
join contribuente
on Pre_Contribuente = Con_Progr
join via
on Via_Progr = Con_Via_Sec
join comune
on Via_Comune = Com_CC
where [lots of where clauses]
order by Con_Cognome,
Con_Nome asc
)
union
(
select Con_Progr,
Pre_Progr
from preavviso_ru
join contribuente
on Pre_Contribuente = Con_Progr
join via
on Via_Progr = Con_Via_Res
join comune
on Via_Comune = Com_CC
where [lots of where clauses]
and (Con_Via_Sec is null or Con_Via_Sec = '0')
order by Con_Cognome,
Con_Nome asc
)
As you can see, I split the where clause of the original query, which used an OR operator, into two different subqueries and then merged them with a union. That resolved the speed problem. The result is not perfect, though, because I've lost the ordering. I tried to select the columns from the subqueries and then perform the ordering on that result, like this:
select Con_Progr,
Pre_Progr
from (
[FIRST SUBQUERY]
) as T1 union (
[SECOND SUBQUERY]
) as T2
order by Con_Cognome,
Con_Nome asc
but I get a syntax error near 'union'. Any suggestions?
That was the technical issue. Now for the conceptual one: the two subqueries are VERY similar (they differ only by one join clause and one where clause). Is there a way to rearrange the second query (the fast one) in a more elegant way?
I resolved the technical issue; I had misplaced the parentheses:
select Con_Progr,
Pre_Progr
from (
[FIRST SUBQUERY]
union
[SECOND SUBQUERY]
) as T
order by Con_Cognome,
Con_Nome asc
Now it works almost perfectly (there's still a bit of discrepancy in the ordering, but it's not a problem).
About the efficiency of the query, I found that the problem is the OR condition on two different columns, since MySQL 4.0 (which I'm using for backwards compatibility reasons) can use only one index per table in a query. But still, why doesn't it use at least one of those indexes? I'll test a bit more...
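One possible reason for the remaining ordering discrepancy is that an ORDER BY inside an individual UNION branch says nothing about the order of the final result. A minimal sketch of the usual alternative follows: both branches select the ordering columns, and a single ORDER BY is applied after the last parenthesized branch. The `1 = 1` predicate is only a stand-in for the filters elided in the post, and adding Con_Cognome and Con_Nome to the select list assumes they are determined by Con_Progr, so the UNION removes the same duplicates as before.

```sql
-- Sketch only: "1 = 1" stands in for the [lots of where clauses] elided above.
(select Con_Progr, Pre_Progr, Con_Cognome, Con_Nome
   from preavviso_ru
   join contribuente on Pre_Contribuente = Con_Progr
   join via          on Via_Progr = Con_Via_Sec
   join comune       on Via_Comune = Com_CC
  where 1 = 1)
union
(select Con_Progr, Pre_Progr, Con_Cognome, Con_Nome
   from preavviso_ru
   join contribuente on Pre_Contribuente = Con_Progr
   join via          on Via_Progr = Con_Via_Res
   join comune       on Via_Comune = Com_CC
  where 1 = 1
    and (Con_Via_Sec is null or Con_Via_Sec = '0'))
-- A trailing ORDER BY after the last branch sorts the combined result.
order by Con_Cognome, Con_Nome;
```

This also avoids the derived table, at the cost of returning the two extra columns to the caller.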
I am struggling with a MySQL problem.
I have two identical queries; only the item_id at the end differs, but they get different execution plans when I run them with analyze/explain.
This results in a huge difference of time needed to return a result.
The query is something like
explain select `orders`.*,
(select count(*) from `items`
inner join `orders_items` on `items`.`id` = `orders_items`.`item_id`
where `orders`.`id` = `orders_items`.`order_id`
and `items`.`deleted_at` is null) as `items_count`,
(select count(*) from `returns`
where `orders`.`id` = `returns`.`order_id`
and `returns`.`deleted_at` is null) as `returns_count`,
(select count(*) from `shippings`
where `orders`.`id` = `shippings`.`order_id`
and `shippings`.`deleted_at` is null) as `shippings_count`,
(select count(*) from `orders` as `laravel_reserved_2`
where `orders`.`id` = `laravel_reserved_2`.`recurred_from_id`
and `laravel_reserved_2`.`deleted_at` is null) as `recurred_orders_count`,
(select COALESCE(SUM(orders_items.amount), 0) from `items`
inner join `orders_items` on `items`.`id` = `orders_items`.`item_id`
where `orders`.`id` = `orders_items`.`order_id`
and `items`.`deleted_at` is null) as `items_sum_orders_itemsamount`,
`orders`.*,
`orders_items`.`item_id` as `pivot_item_id`,
`orders_items`.`order_id` as `pivot_order_id`,
`orders_items`.`amount` as `pivot_amount`,
`orders_items`.`id` as `pivot_id`
from `orders`
inner join `orders_items` on `orders`.`id` = `orders_items`.`order_id`
where `orders_items`.`item_id` = 497
and `import_finished` = 1
and `orders`.`deleted_at` is null
order by `id` desc limit 50 offset 0;
As you can see, it is a Laravel/Eloquent query.
This is the execution plan for the query above:
But when I change the item_id at the end, it returns the following execution plan:
It seems completely random. About 30% of the item_ids get the faster plan and 70% get the slower one, and I have no idea why. The related data is almost the same for every item we have in our database.
I also flushed the query cache to see whether it was causing the different execution plans, but with no success.
I have been googling for 4 hours, but I can't find anything about this exact problem.
Can someone tell me why this happens?
Thanks in advance!!
Edit 01/21/2023 - 19:04:
It seems that MySQL doesn't like ordering by columns that do not belong to the table filtered in the where clause, in this case the pivot table orders_items.
I just replaced the
order by id
with
order by orders_items.order_id
This results in a 10 times faster query.
A query using different execution plans just because of a different parameter can have several reasons. The simplest explanation would be the position of the used item_id in the relevant index. The position in the index may affect the cost of using the index which in turn may affect if it is used at all. (this is just an example)
It is important to note that the explain statement will give you the planned execution plan but maybe not the actually used one.
EXPLAIN ANALYZE is the command which will output the actually used execution plan for you. It may still yield different results for different parameters.
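To make the difference concrete, here is a minimal sketch that runs both commands on a trimmed-down version of the query from the question. The trimming (dropping the counted subqueries and the import_finished filter) is an assumption made for brevity, and EXPLAIN ANALYZE assumes MySQL 8.0.18 or later.

```sql
-- Estimated plan only: the query itself is not executed.
EXPLAIN
SELECT `orders`.`id`
FROM `orders`
INNER JOIN `orders_items` ON `orders`.`id` = `orders_items`.`order_id`
WHERE `orders_items`.`item_id` = 497
  AND `orders`.`deleted_at` IS NULL
ORDER BY `orders`.`id` DESC
LIMIT 50;

-- Executes the query and reports actual row counts and timings per step
-- (assumes MySQL 8.0.18+).
EXPLAIN ANALYZE
SELECT `orders`.`id`
FROM `orders`
INNER JOIN `orders_items` ON `orders`.`id` = `orders_items`.`order_id`
WHERE `orders_items`.`item_id` = 497
  AND `orders`.`deleted_at` IS NULL
ORDER BY `orders`.`id` DESC
LIMIT 50;
```

Running the pair once with a "fast" item_id and once with a "slow" one should show where the estimated and actual row counts diverge.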
ON `orders`.`id` = `orders_items`.`order_id`
where `orders_items`.`item_id` = 497
and ???.`import_finished` = 1
and `orders`.`deleted_at` is null
order by ???.`id` desc
limit 50 offset 0;
One order maps to many order items, correct?
So, ORDER BY orders.id is different than ORDER BY orders_items.item_id.
Does item_id = 497 show up in many different orders?
So, think about which ORDER BY you really want.
Meanwhile, these may help with performance:
orders_items: INDEX(order_id, item_id)
returns: INDEX(order_id, deleted_at)
shippings: INDEX(order_id, deleted_at)
orders (aliased as laravel_reserved_2): INDEX(recurred_from_id, deleted_at)
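The suggestions above could be applied roughly as follows. This is a sketch: the index names are hypothetical, and because laravel_reserved_2 is only the alias the query gives to `orders`, the last index is created on `orders` itself.

```sql
-- Index names are made up for illustration; pick whatever fits your conventions.
ALTER TABLE `orders_items` ADD INDEX `idx_order_item` (`order_id`, `item_id`);
ALTER TABLE `returns`      ADD INDEX `idx_order_deleted` (`order_id`, `deleted_at`);
ALTER TABLE `shippings`    ADD INDEX `idx_order_deleted` (`order_id`, `deleted_at`);
-- laravel_reserved_2 is an alias of `orders` in the query.
ALTER TABLE `orders`       ADD INDEX `idx_recurred_deleted` (`recurred_from_id`, `deleted_at`);
```

It is worth re-running EXPLAIN afterwards to confirm that the correlated subqueries actually pick these indexes up.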
SELECT LM.user_id,LM.users_lineup_id, min( LM.total_score ) AS total_score
FROM vi_lineup_master LM JOIN
vi_contest AS C
ON C.contest_unique_id = LM.contest_unique_id join
(SELECT min( total_score ) as total_score
FROM vi_lineup_master
GROUP BY group_unique_id
) as preq
ON LM.total_score = preq.total_score
WHERE LM.contest_unique_id = 'iledhSBDO' AND
C.league_contest_type = 1
GROUP BY group_unique_id
The above query finds the loser in each group of a game. It returns accurate results, but it stops responding with large data. How can I optimize it?
You can try moving your JOINs into subqueries. You should also pay attention to the "wrong" GROUP BY usage in the outer query. In MySQL you can group by some columns and select others that are not listed in the GROUP BY clause without any aggregate function, but the database can't guarantee which row's values it will return. For the sake of consistency in your application, wrap them in an aggregate function.
Check if this one helps:
SELECT
MIN(LM.user_id) AS user_id,
MIN(LM.users_lineup_id) AS users_lineup_id,
MIN(LM.total_score) AS total_score
FROM vi_lineup_master LM
WHERE 1=1
-- check if this "contest_unique_id" is equals
-- to 'iledhSBDO' for a "league_contest_type" valued 1
AND LM.contest_unique_id IN
(
SELECT C.contest_unique_id
FROM vi_contest AS C
WHERE 1=1
AND C.contest_unique_id = 'iledhSBDO'
AND C.league_contest_type = 1
)
-- check if this "total_score" is one of the
-- "min(total_score)" from each "group_unique_id"
AND LM.total_score IN
(
SELECT MIN(total_score)
FROM vi_lineup_master
GROUP BY group_unique_id
)
GROUP BY LM.group_unique_id
;
Some pieces of this query may seem redundant, but that's because I did not want to change the filters you wrote; I just moved them.
Also, your query logic seems a bit strange to me, based on the table/column names and how you wrote it... please check the comments in my query, which reflect what I understood of your implementation.
Hope it helps.
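If the goal really is "the lowest-scoring lineup in each group", a common alternative is to join back on both the group and its minimum score, so that no column needs an arbitrary aggregate. The following is a sketch under assumptions: it assumes group_unique_id lives in vi_lineup_master (as the inner query in the question suggests) and that the minimum should be computed within the same contest.

```sql
SELECT LM.group_unique_id,
       LM.user_id,
       LM.users_lineup_id,
       LM.total_score
FROM vi_lineup_master AS LM
JOIN vi_contest AS C
  ON C.contest_unique_id = LM.contest_unique_id
JOIN (
    -- Lowest score per group, restricted to the contest of interest (assumption).
    SELECT group_unique_id, MIN(total_score) AS total_score
    FROM vi_lineup_master
    WHERE contest_unique_id = 'iledhSBDO'
    GROUP BY group_unique_id
) AS preq
  ON preq.group_unique_id = LM.group_unique_id
 AND preq.total_score     = LM.total_score
WHERE LM.contest_unique_id = 'iledhSBDO'
  AND C.league_contest_type = 1;
```

Note that ties produce one row per tied lineup. A composite index on vi_lineup_master (contest_unique_id, group_unique_id, total_score) would probably help both the inner and the outer lookup, but that is a guess without seeing the schema.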
I am trying to compare values within the same table, which has more than 1,000,000 rows. Below is my query; it takes around 25 seconds to return results.
EXPLAIN SELECT DISTINCT a.studyid,a.number,a.load_number,b.studyid,b.number,b.load_number FROM
(SELECT t1.*, buildnumber,platformid FROM t t1
INNER JOIN testlog t2 ON t1.`testid` = t2.`testid`
WHERE (buildnumber =1031719 AND platformid IN (SELECT platformid FROM platform WHERE platform.`Description` = "Windows 7 SP1"))
)AS a
JOIN
(SELECT t1.*,buildnumber,platformid FROM t t1
INNER JOIN testlog t2 ON t1.`testid` = t2.`testid`
WHERE (buildnumber =1030716 AND platformid IN (SELECT platformid FROM platform WHERE platform.`Description` = "Windows 7 SP1"))
)AS b
ON a.studyid=b.studyid AND a.load_number = b.load_number AND a.number = b.number
Could anyone help me improve this query so that it returns results fast enough?
The problem is that even though I have indexes on number and load_number, the query doesn't use them. I don't know why they are always ignored.
Thanks.
First, you have a silly query. You are retrieving six columns, but there are only three values. Look at the on clause.
I think your best bet is to rewrite the query using conditional aggregation. I think the following is equivalent:
SELECT t1.studyid, t1.load_number, t1.number
FROM t t1 INNER JOIN
testlog t2
ON t1.testid = t2.testid
WHERE t2.buildnumber IN (1031719, 1030716) AND
platformid IN (SELECT platformid FROM platform p WHERE p.Description = 'Windows 7 SP1')
GROUP BY studyid, load_number, number
HAVING MIN(buildnumber) <> MAX(buildnumber)
For this query, you want indexes on platform(Description, platformid) and testlog(buildnumber, platformid) and t(testid).
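Spelled out as DDL, those indexes might look like the sketch below. The index names are hypothetical, and if Description is a TEXT or very long VARCHAR column, MySQL would need a prefix length such as Description(64).

```sql
-- Index names are illustrative only.
CREATE INDEX idx_platform_desc ON platform (Description, platformid);
CREATE INDEX idx_testlog_build ON testlog (buildnumber, platformid);
CREATE INDEX idx_t_testid      ON t (testid);
```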
Problem #1:
IN ( SELECT ... ) optimizes very poorly. The subquery is rerun again and again. It looks like you are expecting exactly one id from that query; if so, change it to = ( SELECT ... ). That way it will be run exactly once.
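As a sketch of that change applied to one of the two subqueries from the question: this assumes buildnumber and platformid are columns of testlog, and that exactly one platform row has this description. If the scalar subquery returned more than one row, MySQL would raise an error instead of silently picking one.

```sql
SELECT t1.*, t2.buildnumber, t2.platformid
FROM t t1
INNER JOIN testlog t2 ON t1.testid = t2.testid
WHERE t2.buildnumber = 1031719
  -- Scalar subquery: evaluated once, must yield at most one row.
  AND t2.platformid = (SELECT platformid
                       FROM platform
                       WHERE platform.Description = 'Windows 7 SP1');
```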
Problem #2:
FROM ( SELECT ... )
JOIN ( SELECT ... ) ON ...
optimizes poorly because neither subquery has an index that the join can use. Can you merge the two subqueries into one, as Gordon was trying? If not, then put one of them into a TEMPORARY TABLE and add an appropriate index to that table so that the ON clause will be able to use it. Probably PRIMARY KEY(studyid, load_number, number).
Footnote: The latest versions of MySQL have made improvements on these problems by dynamically generating indexes. What version are you using?
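A rough sketch of the temporary-table approach follows. The table name tmp_b and the index name are made up, buildnumber and platformid are assumed to live in testlog, and a plain INDEX is used instead of a PRIMARY KEY so the statement does not fail if any of the three columns is nullable.

```sql
-- Materialize one side of the join with an index on the join columns.
CREATE TEMPORARY TABLE tmp_b (
    INDEX idx_match (studyid, load_number, number)
)
SELECT DISTINCT t1.studyid, t1.load_number, t1.number
FROM t t1
INNER JOIN testlog t2 ON t1.testid = t2.testid
WHERE t2.buildnumber = 1030716
  AND t2.platformid = (SELECT platformid FROM platform
                       WHERE platform.Description = 'Windows 7 SP1');

-- Join the other side against the indexed temporary table.
SELECT DISTINCT a.studyid, a.number, a.load_number
FROM t a
INNER JOIN testlog t2 ON a.testid = t2.testid
INNER JOIN tmp_b b
   ON a.studyid     = b.studyid
  AND a.load_number = b.load_number
  AND a.number      = b.number
WHERE t2.buildnumber = 1031719
  AND t2.platformid = (SELECT platformid FROM platform
                       WHERE platform.Description = 'Windows 7 SP1');

DROP TEMPORARY TABLE tmp_b;
```

Only three columns are returned here because, as noted above, the ON clause forces the a and b columns to be equal anyway.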
I have a couple of tables joined in MySQL in a one-to-many relationship.
I am trying to select items from one table, ordered by the minimum values from the other table.
Without grouping, it looks like this:
Code:
select `catalog_products`.id
, `catalog_products`.alias
, `tmpKits`.`minPrice`
from `catalog_products`
left join `product_kits` on `product_kits`.`product_id` = `catalog_products`.`id`
left join (
SELECT MIN(new_price) AS minPrice, id FROM product_kits GROUP BY id
) AS tmpKits on `tmpKits`.`id` = `product_kits`.`id`
where `category_id` in ('62')
order by product_kits.new_price ASC
Result:
But when I add group by, I get this:
Code:
select `catalog_products`.id
, `catalog_products`.alias
, `tmpKits`.`minPrice`
from `catalog_products`
left join `product_kits` on `product_kits`.`product_id` = `catalog_products`.`id`
left join (
SELECT MIN(new_price) AS minPrice, id FROM product_kits GROUP BY id
) AS tmpKits on `tmpKits`.`id` = `product_kits`.`id`
where `category_id` in ('62')
group by `catalog_products`.`id`
order by product_kits.new_price ASC
Result:
And this is incorrect sorting!
Somehow when I group this results, I get id 280 before 281!
But I need to get:
281|1600.00
280|2340.00
So, grouping breaks existing ordering!
For one, when you apply the GROUP BY to only one column, there is no guarantee that the values in the other columns will be consistently correct. Unfortunately, MySQL allows this type of SELECT/GROUP BY, while other products don't. Two, the syntax of using an ORDER BY in a subquery, while allowed in MySQL, is not allowed in other database products, including SQL Server. You should use a solution that will return the proper result each time it is executed.
So the query will be:
select CP.`id`, CP.`alias`, TK.`minPrice`
from catalog_products CP
left join `product_kits` PK on PK.`product_id` = CP.`id`
left join (
SELECT MIN(`new_price`) AS "minPrice", `id` FROM product_kits GROUP BY `id`
) AS TK on TK.`id` = PK.`id`
where CP.`category_id` IN ('62')
group by CP.`id`
order by PK.`new_price` ASC
The thing is that group by does not recognize order by in MySQL.
Actually, what I was doing is really bad practice.
In this case you should use distinct and by catalog_products.*
In my opinion, group by is really useful when you need to group the results of aggregate functions.
Otherwise you should not use it to get unique values.
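For the "order products by their cheapest kit" case, one deterministic option is to aggregate the price directly and sort by the aggregate. This is a sketch that assumes category_id is a column of catalog_products and that the minimum price per product is what should drive the ordering.

```sql
SELECT CP.`id`,
       CP.`alias`,
       MIN(PK.`new_price`) AS minPrice
FROM `catalog_products` CP
LEFT JOIN `product_kits` PK ON PK.`product_id` = CP.`id`
WHERE CP.`category_id` IN ('62')
-- Every non-aggregated column appears in GROUP BY, so the result is well defined.
GROUP BY CP.`id`, CP.`alias`
ORDER BY minPrice ASC;
```

Because of the LEFT JOIN, products without any kit get a NULL minPrice, which MySQL sorts first in ascending order; switch to an INNER JOIN if those products should be excluded.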
I joined a few tables into a view for easier coding. Now, when I join a few other tables with that view, I get poor performance. As the view returns more rows, those queries get dramatically slower. I wrote a lot of code using this view, so I'd rather not rewrite all of those queries :). Is there an elegant way to speed up this view when I join it with other tables?
This is one of my queries where tickets_parsed is a view:
SELECT detailValue, SUM(total_tickets) AS total_tickets, SUM(money_in) AS money_in, SUM(handling_charges) AS handling_charges
FROM (
SELECT bsid, COUNT(*) AS total_tickets, SUM(amount_total) AS money_in, SUM(handling_charges) AS handling_charges
FROM `bingo`.tickets_parsed
WHERE tickettime BETWEEN '$date' AND '$date2a'
AND ticketstatus <> 'CLOSED'
GROUP BY bsid
ORDER BY NULL
) AS sub
NATURAL JOIN betshop_details
WHERE detailID = 5
GROUP BY detailValue
ORDER BY NULL
Your query uses the view in a subquery, which likely prevents it from using indexes. But I am not sure that the subquery is necessary.
You could possibly use something like this:-
SELECT a.detailValue, COUNT(*) AS total_tickets, SUM(b.amount_total) AS money_in, SUM(b.handling_charges) AS handling_charges
FROM betshop_details a
INNER JOIN bingo.tickets_parsed b
ON a.bsid = b.bsid
WHERE a.detailID = 5
AND b.tickettime BETWEEN '$date' AND '$date2a'
AND b.ticketstatus <> 'CLOSED'
GROUP BY a.detailValue
ORDER BY NULL