I have the following query with a LEFT JOIN between the first two subqueries e and a, and an INNER JOIN between the last two subqueries a and m:
SELECT {cols}
FROM
(SELECT {cols}
FROM {table}
WHERE {conditions}) AS e
LEFT JOIN
(SELECT {cols}
FROM {table}
WHERE {conditions}) AS a
ON e.col = a.col
INNER JOIN
(SELECT {cols}
FROM {table}
WHERE {conditions}) AS m
ON e.col = m.col
When I change the second join from INNER JOIN to LEFT JOIN, the execution time increases by a factor of ~200. The number of records from each subquery is as follows:
e -> Number of records: 303
a -> Number of records: 18
m -> Number of records: 295
I assumed MySQL would evaluate each subquery as an independent subquery and then do the joins, in which case, the change from INNER JOIN to LEFT JOIN should not lead to such an increase in the execution time given the relatively low number of records as shown above.
So, evidently, that is not the execution order being followed.
EXPLAIN PLAN:
Case 1 with INNER JOIN: join e with m first, then join with a.
Case 2 with LEFT JOIN: join e with a first, then join with m.
I'm not sure why the two plans are different in the two cases and how this might lead to a difference in the execution time.
Can anyone help explain to me what the actual execution order may be?
The join order is up to the query planner to optimize as it sees fit. Sometimes it gets it wrong. If you think the join order is suboptimal you can force it by specifying SELECT STRAIGHT_JOIN instead of SELECT. This will force the query planner to join tables in the order listed in the query.
Outer joins also tend to be slower because they have to keep more rows: they cannot discard rows that lack a match on the other side, and a LEFT JOIN restricts the order in which the planner can read the tables.
Related
I've a query like below,
SELECT
c.testID
FROM a
INNER JOIN b ON a.id=b.ID
INNER JOIN c ON b.r_ID=c.id
WHERE c.test IS NOT NULL;
Can this query be optimized further? I want the inner join between the three tables to happen only for rows that meet the where clause.
Logically, the WHERE clause works as a filter on the data that remains after all the JOINs,
whereas if you put the same restriction in the JOIN clause itself, the join runs on already-filtered data and no filter is needed after the join. For inner joins the optimizer may well treat both forms identically, so check the plan.
SELECT c.testID
FROM a
INNER JOIN b ON a.id = b.ID
INNER JOIN c ON b.r_ID = c.id AND c.test IS NOT NULL;
Moreover, you should create an index on the column test in table c to speed up the query.
Also, learn to use the EXPLAIN command on your queries for best results.
Try the following:
SELECT
c.testID
FROM c
INNER JOIN b ON c.test IS NOT NULL AND b.r_ID=c.testID
INNER JOIN a ON a.id=b.r_ID;
I changed the order of the joins and conditions so that the first condition to be evaluated is c.test IS NOT NULL.
Disclaimer: You should use the EXPLAIN command to see how the query is actually executed.
I'm pretty sure that even the minor change I just made might make no difference, because the MySQL optimizer works on all queries.
See the MySQL Documentation: Optimizing Queries with EXPLAIN
Three queries Compared
Have a look at the following fiddle:
https://www.db-fiddle.com/f/fXsT8oMzJ1H31FwMHrxR3u/0
I ran three different queries and in the end, MySQL optimized and ran them the same way.
Three Queries:
EXPLAIN SELECT
c.testID
FROM c
INNER JOIN b ON c.test IS NOT NULL AND b.r_ID=c.testID
INNER JOIN a ON a.id=b.r_ID;
EXPLAIN SELECT c.testID
FROM a
INNER JOIN b ON a.id = b.r_id
INNER JOIN c ON b.r_ID = c.testID AND c.test IS NOT NULL;
EXPLAIN SELECT
c.testID
FROM a
INNER JOIN b ON a.id=b.r_ID
INNER JOIN c ON b.r_ID=c.testID
WHERE c.test IS NOT NULL;
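A quick way to confirm that the three forms at least return the same rows is to run them against a tiny data set. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, made so the example is self-contained), with made-up table contents. It compares only the returned rows, not the execution plans.

```python
import sqlite3

# Hypothetical miniature versions of tables a, b and c.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (r_ID INTEGER);
    CREATE TABLE c (testID INTEGER, test TEXT);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1), (2), (4);
    INSERT INTO c VALUES (1, 'x'), (2, NULL), (3, 'y');
""")

queries = {
    "c_first": """
        SELECT c.testID FROM c
        INNER JOIN b ON c.test IS NOT NULL AND b.r_ID = c.testID
        INNER JOIN a ON a.id = b.r_ID""",
    "filter_in_on": """
        SELECT c.testID FROM a
        INNER JOIN b ON a.id = b.r_ID
        INNER JOIN c ON b.r_ID = c.testID AND c.test IS NOT NULL""",
    "filter_in_where": """
        SELECT c.testID FROM a
        INNER JOIN b ON a.id = b.r_ID
        INNER JOIN c ON b.r_ID = c.testID
        WHERE c.test IS NOT NULL""",
}

# Sort each result so the comparison ignores row order.
results = {name: sorted(conn.execute(sql).fetchall())
           for name, sql in queries.items()}

for name, rows in results.items():
    print(name, rows)
```

For inner joins, where the filter sits does not change the result set. Whether the plans also match is something only EXPLAIN on the real server can tell you.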
All tables should have a PRIMARY KEY. Assuming that id is the PRIMARY KEY for the tables that it is in, then you need these secondary keys for maximal performance:
c: INDEX(test, testID, id) -- `test` must be first
b: INDEX(r_ID)
Both of those are useful and "covering".
Another thing to note: b and a are virtually unused in the query, so you may as well write the query this way:
SELECT c.testID
FROM c
WHERE c.test IS NOT NULL;
At that point, all you need is INDEX(test, testID).
I suspect you "simplified" your query by leaving out some uses of a and b. Well, I simplified it from there, just as the Optimizer should have done. (However, elimination of tables is an optimization that it does not do; it figures that is something the user would have done.)
On the other hand, b and a are not totally useless. The JOINs verify that there are corresponding rows, possibly many such rows, in those tables. Again, I think you had some other purpose.
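To illustrate why dropping a and b is not always a pure simplification, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption), with invented rows. Two rows in b point at the same row in c, and one row in c has no match in b at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (ID INTEGER, r_ID INTEGER);
    CREATE TABLE c (id INTEGER, testID INTEGER, test TEXT);
    INSERT INTO a VALUES (1), (2);
    -- two b rows point at c.id = 10, none at c.id = 20
    INSERT INTO b VALUES (1, 10), (2, 10);
    INSERT INTO c VALUES (10, 100, 'x'), (20, 200, 'y');
""")

# The original three-table query.
joined = conn.execute("""
    SELECT c.testID FROM a
    INNER JOIN b ON a.id = b.ID
    INNER JOIN c ON b.r_ID = c.id
    WHERE c.test IS NOT NULL
""").fetchall()

# The single-table version that leaves out a and b.
simplified = conn.execute(
    "SELECT c.testID FROM c WHERE c.test IS NOT NULL ORDER BY c.testID"
).fetchall()

print("joined:    ", joined)
print("simplified:", simplified)
```

With this data the joined query returns testID 100 twice and never returns 200, while the single-table query returns each once. The two forms are interchangeable only if every row of c has exactly one matching chain through b and a.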
I have a query that looks like this:
select `adverts`.*
from `adverts`
inner join `advert_category` on `advert_category`.`advert_id` = `adverts`.`id`
inner join `advert_location` on `adverts`.`id` = `advert_location`.`advert_id`
where `advert_location`.`location_id` = ?
and `advert_category`.`category_id` = ?
order by `updated_at` desc
The problem here is I have a huge database and this query is absolutely ravaging my database.
What I really need is to do the first join, and then do the where clause. This would whittle down my result from around 100k rows to fewer than 10k; then I want to do the other join, in order to whittle down the result again so I can get the advert_location on the category items.
Doing it as is just isn't viable.
So, how do I go about using a join and a where condition, and then after getting that response doing a further join with a where condition?
Thanks
This is your query, written a bit simpler so I can read it:
select a.*
from adverts a inner join
advert_category ac
on ac.advert_id = a.id inner join
advert_location al
on al.advert_id = a.id
where al.location_id = ? and
ac.category_id = ?
order by a.updated_at desc;
I am speculating that advert_category and advert_location have multiple rows per advert. In that case, you are getting a Cartesian product for each advert.
A better way to write the query uses exists:
select a.*
from adverts a
where exists (select 1
from advert_location al
where al.advert_id = a.id and al.location_id = ?
) and
exists (select 1
from advert_category ac
where ac.advert_id = a.id and ac.category_id = ?
)
order by a.updated_at desc;
For this version, you want indexes on advert_location(advert_id, location_id), advert_category(advert_id, category_id), and probably adverts(updated_at, id).
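The difference between the join form and the EXISTS form shows up in results as soon as a link table holds more than one matching row for an advert. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (an assumption); the rows, including the deliberately duplicated link row, are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE adverts (id INTEGER PRIMARY KEY, updated_at TEXT);
    CREATE TABLE advert_category (advert_id INTEGER, category_id INTEGER);
    CREATE TABLE advert_location (advert_id INTEGER, location_id INTEGER);
    INSERT INTO adverts VALUES (1, '2020-01-01'), (2, '2020-01-02');
    -- advert 1 is linked to category 7 twice (a duplicate link row)
    INSERT INTO advert_category VALUES (1, 7), (1, 7), (2, 8);
    INSERT INTO advert_location VALUES (1, 5), (2, 5);
""")

params = (5, 7)  # location_id, category_id (hypothetical values)

join_rows = conn.execute("""
    SELECT a.id FROM adverts a
    INNER JOIN advert_category ac ON ac.advert_id = a.id
    INNER JOIN advert_location al ON al.advert_id = a.id
    WHERE al.location_id = ? AND ac.category_id = ?
    ORDER BY a.updated_at DESC
""", params).fetchall()

exists_rows = conn.execute("""
    SELECT a.id FROM adverts a
    WHERE EXISTS (SELECT 1 FROM advert_location al
                  WHERE al.advert_id = a.id AND al.location_id = ?)
      AND EXISTS (SELECT 1 FROM advert_category ac
                  WHERE ac.advert_id = a.id AND ac.category_id = ?)
    ORDER BY a.updated_at DESC
""", params).fetchall()

print("join:  ", join_rows)    # advert 1 appears once per matching link row
print("exists:", exists_rows)  # advert 1 appears at most once
```

EXISTS only asks whether a matching row is present, so each advert is returned at most once however many link rows match.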
You can write the first join as a derived table including a WHERE condition and then do the second join (but a decent optimizer might merge the derived table back in and do what it thinks is best based on statistics):
select adverts.*
from
(
select `adverts`.*
from `adverts`
inner join `advert_category`
on `advert_category`.`advert_id` =`adverts`.`id`
where `advert_category`.`category_id` = ?
) as adverts
inner join `advert_location`
on `adverts`.`id` = `advert_location`.`advert_id`
where `advert_location`.`location_id` = ?
order by `updated_at` desc
MySQL will reorder inner joins for you during optimization, regardless of how you wrote them in your query. Inner join is the same in either direction (in algebra this is called commutative), so this is safe to do.
You can see the result of join reordering if you use EXPLAIN on your query.
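The commutativity claim can be checked on a toy example. This sketch uses Python's sqlite3 module as a stand-in for MySQL (an assumption) and two hypothetical tables x and y. It also shows that the same swap is not safe for a LEFT JOIN, which is why the optimizer has less freedom with outer joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (col INTEGER);
    CREATE TABLE y (col INTEGER);
    INSERT INTO x VALUES (1), (2), (3);
    INSERT INTO y VALUES (2), (3), (4);
""")

def rows(sql):
    # Use a set so the comparison ignores the order rows come back in.
    return set(conn.execute(sql).fetchall())

inner_xy = rows("SELECT x.col, y.col FROM x INNER JOIN y ON x.col = y.col")
inner_yx = rows("SELECT x.col, y.col FROM y INNER JOIN x ON x.col = y.col")
left_xy = rows("SELECT x.col, y.col FROM x LEFT JOIN y ON x.col = y.col")
left_yx = rows("SELECT x.col, y.col FROM y LEFT JOIN x ON x.col = y.col")

print("inner joins agree:", inner_xy == inner_yx)
print("left joins agree: ", left_xy == left_yx)
```

Swapping the operands of the inner join leaves the result unchanged. Swapping the operands of the LEFT JOIN changes which side's unmatched rows are kept.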
If you don't like the order MySQL chose for your joins, you can override it with this kind of syntax:
from `adverts`
straight_join `advert_category` ...
https://dev.mysql.com/doc/refman/5.7/en/join.html says:
STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer processes the tables in a suboptimal order.
Once the optimizer has decided on the join order, it always does one join at a time, in that order. This is called the nested-loop join method.
There isn't really any way to "do the join then do the where clause". Conditions are combined together when looking up rows for joined tables. But this is a good thing, because you can then create a compound index that helps match rows based on both join conditions and where conditions.
PS: When asking a query optimization question, you should include the EXPLAIN output, and also run SHOW CREATE TABLE <tablename> for each table and include the result. Then we don't have to guess at the columns and indexes in your tables.
I have two MySQL queries that I am trying to use to balance the db and find issues.
Query 1 for advance totals 286940.99 and query 2 totals 288645.3 which is 1,704.31 different (the advance unallocated is also wrong).
The issue I have is that they are both summing the same column, the only difference being that query 2 has a left join to another table. The left join should not affect the primary table at all.
Here are my queries
Query 1
SELECT sum(advance) as advance, sum(advance_unallocated) as advance_unallocated FROM `deals`
Query 2
SELECT
pb.id as pbid,
d.id as did,
sum(d.advance) as advance,
sum(d.advance_unallocated) as advance_unallocated,
sum(pb.payments) as pb_payment,
sum(pb.payments) - sum(d.advance) - sum(d.advance_unallocated) as diff,
pb.payment_method as p_payment_method
FROM `deals` d
LEFT OUTER JOIN `payment_balance` pb on d.id = pb.link_id and pb.table_name = 'deals'
This happens because LEFT JOIN does not return just zero or one row per row of the left table, as you assumed: it returns every row that matches the join conditions, or a single row filled with NULLs if none match.
In your particular case, one or more deals match the join condition more than once "unexpectedly", so their advance is summed several times.
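The effect is easy to reproduce on a small scale. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (an assumption), with invented amounts and a reduced set of columns. The last query shows one possible fix, pre-aggregating payment_balance per deal, which is a suggestion rather than something taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deals (id INTEGER PRIMARY KEY, advance REAL);
    CREATE TABLE payment_balance (link_id INTEGER, table_name TEXT, payments REAL);
    INSERT INTO deals VALUES (1, 100.0), (2, 50.0);
    -- deal 1 has two matching payment rows, deal 2 has none
    INSERT INTO payment_balance VALUES (1, 'deals', 60.0), (1, 'deals', 40.0);
""")

plain_sum = conn.execute("SELECT SUM(advance) FROM deals").fetchone()[0]

# Deal 1 appears twice after the join, so its advance is counted twice.
joined_sum = conn.execute("""
    SELECT SUM(d.advance) FROM deals d
    LEFT OUTER JOIN payment_balance pb
        ON d.id = pb.link_id AND pb.table_name = 'deals'
""").fetchone()[0]

# One possible fix: collapse payments to one row per deal before joining.
fixed_sum = conn.execute("""
    SELECT SUM(d.advance) FROM deals d
    LEFT OUTER JOIN (
        SELECT link_id, SUM(payments) AS payments
        FROM payment_balance
        WHERE table_name = 'deals'
        GROUP BY link_id
    ) pb ON d.id = pb.link_id
""").fetchone()[0]

print(plain_sum, joined_sum, fixed_sum)
```

On the real tables, grouping payment_balance by link_id and looking for counts above 1 should reveal which deals account for the 1,704.31 difference.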
I have two tables:
Shop_Products
Shop_Products_Egenskaber_Overruling
I want to select all records in Shop_Products_Egenskaber_Overruling which have a related record in Shop_Products, meaning a record with an equal ProductNum.
This works for me with the statement below, but I don't think a CROSS JOIN is the best approach for large record sets. When using the statement in web controls, it becomes pretty slow, even with only 1000 records. Is there a better way to accomplish this?
SELECT Shop_Products.*, Shop_Products_Egenskaber_Overruling.*
FROM Shop_Products CROSS JOIN
Shop_Products_Egenskaber_Overruling
WHERE Shop_Products.ProductNum = Shop_Products_Egenskaber_Overruling.ProductNum
Any optimizing suggestions?
Best regards.
You can do it this way, but I'm not sure it will guarantee any optimization:
SELECT Shop_Products.*, Shop_Products_Egenskaber_Overruling.*
FROM Shop_Products
INNER JOIN Shop_Products_Egenskaber_Overruling on Shop_Products.ProductNum = Shop_Products_Egenskaber_Overruling.ProductNum
You are actually looking for an INNER JOIN.
SELECT
SP.*,
SPEO.*
FROM SHOP_PRODUCTS SP
INNER JOIN Shop_Products_Egenskaber_Overruling SPEO
ON SP.ProductNum = SPEO.ProductNum
This can improve performance over your CROSS JOIN, because the condition matching records with equal ProductNum is part of the JOIN condition and the WHERE clause is eliminated.
Logically, WHERE clauses are evaluated AFTER a JOIN. In your case, all possible combinations are created by the CROSS JOIN and then filtered by the conditions in the WHERE clause, unless the optimizer rewrites the query.
By using an INNER JOIN you are doing the filtering in the first step.
A cross join is slower because it produces all combinations, which are then filtered by the WHERE predicate, so you can use INNER JOIN for better performance. It would still be useful to check the execution plan of this query, though, because in Oracle, for example, there is no difference between the WHERE and INNER JOIN solutions (see "Inner join vs Where").
Try using INNER JOIN
SELECT Produkter.*, Egenskaber.*
FROM Shop_Products Produkter
INNER JOIN Shop_Products_Egenskaber_Overruling Egenskaber ON Produkter.ProductNum=Egenskaber.ProductNum
I also gave them Norwegian names.
I'm wondering if a 'normal' inner join leads to higher execution performance in MySQL queries than a simplistic query where you list all tables and then join them with 'and t1.t2id = t2.id' and so on ..
The execution plan and runtime are the same.
One is called ANSI style (INNER JOIN, LEFT, RIGHT); the other is called Theta style.
These two queries are equivalent in every way to the MySQL server:
SELECT * FROM A INNER JOIN B ON A.ID = B.ID;
SELECT * FROM A, B WHERE A.ID = B.ID;
You can test this by typing EXPLAIN in front of both queries and the result returned should be the same.
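The same kind of check can be scripted. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (an assumption), so it runs SQLite's EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN; the extra columns and rows are made up. It compares both the returned rows and the plan text of the two styles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER, a_val TEXT);
    CREATE TABLE B (ID INTEGER, b_val TEXT);
    INSERT INTO A VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO B VALUES (2, 'b2'), (3, 'b3'), (4, 'b4');
""")

ansi_sql = "SELECT * FROM A INNER JOIN B ON A.ID = B.ID ORDER BY A.ID"
theta_sql = "SELECT * FROM A, B WHERE A.ID = B.ID ORDER BY A.ID"

ansi = conn.execute(ansi_sql).fetchall()
theta = conn.execute(theta_sql).fetchall()

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

same_rows = ansi == theta
same_plan = plan(ansi_sql) == plan(theta_sql)

print("same rows:", same_rows)
print("same plan:", same_plan)
```

On MySQL itself, run EXPLAIN on both statements and compare the output rows in the same way.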