MySQL join performance vs correlated queries - mysql

I'm wondering if a 'normal' inner join leads to higher execution performance in MySQL queries than a simplistic query where you list all tables and then join them with 'and t1.t2id = t2.id' and so on ..

The execution plan and runtime is the same.
One is called ANSI style (INNER JOIN, LEFT, RIGHT) the other is called Theta style.
These two queries are equivalent in every way to mysql server
SELECT * FROM A INNER JOIN B ON A.ID = B.ID;
SELECT * FROM A, B WHERE A.ID = B.ID;
You can test this by typing EXPLAIN in front of both queries and the result returned should be the same.

Related

Order of execution in MySQL query with multiple JOINS of subqueries

I have the following query with a LEFT JOIN between the first two subqueries e and a, and an INNER JOIN between the last two subqueries a and m:
SELECT {cols}
FROM
(SELECT {cols}
FROM {table}
WHERE {conditions}) AS e
LEFT JOIN
(SELECT {cols}
FROM {table}
WHERE {conditions}) AS a
ON e.col = a.col
INNER JOIN
(SELECT {cols}
FROM {table}
WHERE {conditions}) AS m
ON e.col = m.col
When I change the second join from INNER JOIN to LEFT JOIN, the execution time increase by a factor of ~200. The number of records from each subquery is as follows:
e -> Number of records: 303
a -> Number of records: 18
m -> Number of recordings: 295
I assumed MySQL would evaluate each subquery as an independent subquery and then do the joins, in which case, the change from INNER JOIN to LEFT JOIN should not lead to such an increase in the execution time given the relatively low number of records as shown above.
So, obviously it seems that's not the execution order being followed.
EXPLAIN PLAN:
Case 1 with INNER JOIN: join e with m first, then join with a.
Case 2 with LEFT JOIN: join e with a first, then join with m.
I'm not sure why the two plans are different in the two cases and how this might lead to a difference in the execution time.
Can anyone help explain to me what the actual execution order may be?
The join order is up to the query planner to optimize as it sees fit. Sometimes it gets it wrong. If you think the join order is suboptimal you can force it by specifying SELECT STRAIGHT_JOIN instead of SELECT. This will force the query planner to join tables in the order listed in the query.
Outer joins are always going to be slower because they have to scan more rows - they cannot discard rows that don't match in both sides.

SQL inner join multiple tables with one query

I've a query like below,
SELECT
c.testID,
FROM a
INNER JOIN b ON a.id=b.ID
INNER JOIN c ON b.r_ID=c.id
WHERE c.test IS NOT NULL;
Can this query be optimized further?, I want inner join between three tables to happen only if it meets the where clause.
Where clause works as filter on the data what appears after all JOINs,
whereas if you use same restriction to JOIN clause itself then it will be optimized in sense of avoiding filter after join. That is, join on filtered data instead.
SELECT c.testID,
FROM a
INNER JOIN b ON a.id = b.ID
INNER JOIN c ON b.r_ID = c.id AND c.test IS NOT NULL;
Moreover, you must create an index for the column test in table c to speed up the query.
Also, learn EXPLAIN command to the queries for best results.
Try the following:
SELECT
c.testID
FROM c
INNER JOIN b ON c.test IS NOT NULL AND b.r_ID=c.testID
INNER JOIN a ON a.id=b.r_ID;
I changed the order of the joins and conditions so that the first statement to be evaluated is c.test IS NOT NULL
Disclaimer: You should use the explain command in order to see the execution.
I'm pretty sure that even the minor change I just did might have no difference due to the MySql optimizer that work on all queries.
See the MySQL Documentation: Optimizing Queries with EXPLAIN
Three queries Compared
Have a look at the following fiddle:
https://www.db-fiddle.com/f/fXsT8oMzJ1H31FwMHrxR3u/0
I ran three different queries and in the end, MySQL optimized and ran them the same way.
Three Queries:
EXPLAIN SELECT
c.testID
FROM c
INNER JOIN b ON c.test IS NOT NULL AND b.r_ID=c.testID
INNER JOIN a ON a.id=b.r_ID;
EXPLAIN SELECT c.testID
FROM a
INNER JOIN b ON a.id = b.r_id
INNER JOIN c ON b.r_ID = c.testID AND c.test IS NOT NULL;
EXPLAIN SELECT
c.testID
FROM a
INNER JOIN b ON a.id=b.r_ID
INNER JOIN c ON b.r_ID=c.testID
WHERE c.test IS NOT NULL;
All tables should have a PRIMARY KEY. Assuming that id is the PRIMARY KEY for the tables that it is in, then you need these secondary keys for maximal performance:
c: INDEX(test, test_id, id) -- `test` must be first
b: INDEX(r_ID)
Both of those are useful and "covering".
Another thing to note: b and a is virtually unused in the query, so you may as well write the query this way:
SELECT c.testID,
FROM c
WHERE c.test IS NOT NULL;
At that point, all you need is INDEX(test, testID).
I suspect you "simplified" your query by leaving out some uses of a and b. Well, I simplified it from there, just as the Optimizer should have done. (However, elimination of tables is an optimization that it does not do; it figures that is something the user would have done.)
On the other hand, b and a are not totally useless. The JOIN verify that there are corresponding rows, possibly many such rows, in those tables. Again, I think you had some other purpose.

MySQL, functional difference between ON and WHERE in specific statement

I am studying select queries for MySQL join functions.
I am slightly confused on the below query. I understand the below statement to join attributes from multiple tables with the ON clause, and then filter the results set with the WHERE clause.
Is this correct? What other functionality does this provide? Are there better alternatives?
The tables, attributes, and schema are not relevant to this question, specifically just the ON and WHERE interaction. Thanks in advance for any insight you can provide, appreciated.
SELECT DISTINCT Movies.title
FROM Rentals
INNER JOIN Customers
INNER JOIN Copies
INNER JOIN Movies ON Rentals.customerNum=Customers.customerNum
AND Rentals.barcode=Copies.barcode
AND Copies.movieNum=Movies.movieNum
WHERE Customers.givenName='Chad'
AND Customers.familyName='Black';
INNER JOIN (and the outer joins) are binary operators that should be followed by an ON clause. Your particular syntax works in MySQL, but will not work in any other database (because it is missing two ON clauses).
I would recommend writing the query as:
SELECT DISTINCT m.title
FROM Movies m JOIN
Copies co
ON co.movieNum = m.movieNum JOIN
Rentals r
ON r.barcode = co.barcode JOIN
Customers c
ON c.customerNum = r.customerNum
WHERE c.givenName = 'Chad' AND
c.familyName = 'Black';
You should always put the JOIN conditions in the ON clause, with one ON per JOIN. This also introduces table aliases, which make the query easier to write and to read.
The WHERE clause has additional filtering conditions. These could also be in ON clauses, but I think the query reads better with them in the WHERE clause. You can glance at the query and see: "We are getting something from a bunch of tables for Chad Black".
Ordinary inner JOIN operations only generate result rows for table rows matching their ON condition. They suppress any rows that don't match. That means you can move the contents of ON clauses to your WHERE clause and get the same result set. Still, don't do that; JOINs are easier to understand when they have ON clauses.
If you use LEFT JOIN, a kind of outer join, you get rows from the first table you mention that don't match any rows in the second table according to the ON clause.
SELECT a.name, b.name
FROM a
LEFT JOIN b ON a.a_id = b.a_id
gives you are result set containing all rows of a with NULL values in b.name indicating that the ON condition did not match.

mysql joining efficiency - join with where then join with something else

I have a query that looks like this:
select `adverts`.*
from `adverts`
inner join `advert_category` on `advert_category`.`advert_id` = `adverts`.`id`
inner join `advert_location` on `adverts`.`id` = `advert_location`.`advert_id`
where `advert_location`.`location_id` = ?
and `advert_category`.`category_id` = ?
order by `updated_at` desc
The problem here is I have a huge database and this response is absolutely ravaging my database.
What I really need is to do the first join, and then do there where clause. This will whittle down my response from like 100k queries to less than 10k, then I want to do the other join, in order to whittle down the responses again so I can get the advert_location on the category items.
Doing it as is just isn't viable.
So, how do I go about using a join and a where condition, and then after getting that response doing a further join with a where condition?
Thanks
This is your query, written a bit simpler so I can read it:
select a.*
from adverts a inner join
advert_category ac
on ac.advert_id = a.id inner join
advert_location al
on al.advert_id = a.id
where al.location_id = ? and
ac.category_id = ?
order by a.updated_at desc;
I am speculating that advert_category and advert_locations have multiple rows per advert. In that case, you are getting a Cartesian product for each advert.
A better way to write the query uses exists:
select a.*
from adverts a
where exists (select 1
from advert_location al
where al.advert_id = a.id and al.location_id = ?
) and
exists (select 1
from advert_category ac
where ac.advert_id = a.id and ac.category_id = ?
)
order by a.updated_at desc;
For this version, you want indexes on advert_location(advert_id, location_id), advert_category(advert_id, category_id), and probably advert(updated_at, id).
You can write the 1st join in a Derived Table including a WHERE-condition and then do the 2nd join (but a decent optimizer might resolve the Derived Table again and do what he thinks is best based on statistics):
select adverts.*
from
(
select `adverts`.*
from `adverts`
inner join `advert_category`
on `advert_category`.`advert_id` =`adverts`.`id`
where `advert_category`.`category_id` = ?
) as adverts
inner join `advert_location`
on `adverts`.`id` = `advert_location`.`advert_id`
where `advert_location`.`location_id` = ?
order by `updated_at` desc
MySQL will reorder inner joins for you during optimization, regardless of how you wrote them in your query. Inner join is the same in either direction (in algebra this is called commutative), so this is safe to do.
You can see the result of join reordering if you use EXPLAIN on your query.
If you don't like the order MySQL chose for your joins, you can override it with this kind of syntax:
from `adverts`
straight_join `advert_category` ...
https://dev.mysql.com/doc/refman/5.7/en/join.html says:
STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer processes the tables in a suboptimal order.
Once the optimizer has decided on the join order, it always does one join at a time, in that order. This is called the nested join method.
There isn't really any way to "do the join then do the where clause". Conditions are combined together when looking up rows for joined tables. But this is a good thing, because you can then create a compound index that helps match rows based on both join conditions and where conditions.
PS: When asking query optimization question, you should include the EXPLAIN output, and also run SHOW CREATE TABLE <tablename> for each table, and include the result. Then we don't have to guess at the columns and indexes in your table.

Inner join works Too Slower - just show loading label in phpmyadmin

I have 3 tables each have almost 70,000 data
when i execute select query in which i add one inner join than it works faster.
Following works faster
select A.id from product as A
inner join product_cat as B on A.id=B.mapped_id
OR
select A.id from product as A
inner join product_sup as C on A.id = C.mapped_id
(It works faster for one inner join)
but when i add both inner join in same select query than it works too slower(Does not display data it just show loading label in phpmyadmin)
select A.id
from product as A
inner join product_cat as B
on A.id = B.mapped_id
inner join product_sup as C
on A.id = C.mapped_id
my purpose it only to find out how much record is there in database.
also tried with count function though takes too much time.
Any help will be appreciated,
Thanks,
Use EXPLAIN to analyze performance of the query and identify any missing indexes.
http://dev.mysql.com/doc/refman/5.5/en/using-explain.html
http://dev.mysql.com/doc/refman/5.5/en/explain-output.html
Add your actual query, execution plan printed by EXPLAIN and CREATE TABLE statements of all involved tables to your question if you want to get a specific advice.
Maybe your query is trying to return more than 1 million of result and that is the reason fo the slowness. Maybe you are doing a Cartesian product between tables B and C.
Let's put an example.
Table (A) = (id=1)
Table (B) = (id=1,mapped_id=1),(id=2,mapped_id=1),
(id=3,mapped_id=1),(id=4,mapped_id=1)
Table (C) = (id=1,mapped_id=1),(id=2,mapped_id=1), (id=3,mapped_id=1)
If we do that query with those data it would return 12 rows, all of them with A.id=1
To solve the problem you could try to use a DISTINCT on the SELECT clause or to do a group with the GROUP BYclase, but I think the better solution is to redesing the query depending on your goals.
If you want to use the group by your query will be something like this
select A.id from product as A
inner join product_cat as B on A.id = B.mapped_id
inner join product_sup as C on A.id = C.mapped_id
group by A.id