The two queries below return the same result set. The first uses only INNER JOIN; the second uses a mix of LEFT and RIGHT JOINs. I personally prefer INNER JOIN when there is no specific requirement for other joins, but I want to know whether there is any difference between the two queries in terms of performance or execution time. Is it OK to use INNER JOIN rather than a mix of joins?
1.
SELECT film.title, category.name, film.rating, language.name
FROM film INNER JOIN film_category ON film_category.film_id = film.film_id
INNER JOIN category ON category.category_id = film_category.category_id
INNER JOIN language ON language.language_id = film.language_id
WHERE category.name = "Sci-Fi" AND film.rating = "NC-17";
2.
SELECT film.title, film.release_year, film.rating,category.name, language.name
FROM film LEFT JOIN language ON language.language_id=film.language_id
RIGHT JOIN film_category ON film_category.film_id = film.film_id
LEFT JOIN category ON category.category_id=film_category.category_id
WHERE film.rating="NC-17" AND category.name="Sci-Fi";
Please see "INNER JOIN vs LEFT JOIN performance in SQL Server".
However, the proper join type depends on the use case and on the result set you need to extract.
Please do not mix the different types except in this way: INNER, INNER, ... LEFT, LEFT, ... Any other combination has ambiguities about what gets done first. If you must mix them up, use parentheses to indicate which JOIN must be done before the others.
As for whether INNER/LEFT/RIGHT are identical, let me explain with one example:
SELECT ...
FROM a
LEFT JOIN b ON ... -- really INNER
WHERE b.x = 17
That WHERE effectively turns the LEFT JOIN into an INNER JOIN, because a NULL-extended row can never satisfy b.x = 17. The Optimizer will make that conversion. I, as a human, will stumble over the query until I realize that. So, humor me by calling it INNER JOIN.
Phrased another way, use LEFT only when the "right" table's columns are optional, but you want NULLs when they are missing. Of course, you may want the NULLs so you can say "find rows of a that are not in b":
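A minimal sketch of that point, run against an in-memory SQLite database through Python's sqlite3 module rather than MySQL (an assumption made only so the example is self-contained). The tables a and b and their rows are hypothetical.

```python
import sqlite3

# Hypothetical data: a.id = 3 has no matching row in b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER, x INTEGER);
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO b VALUES (10, 1, 17), (11, 2, 99);
""")

# LEFT JOIN, then a WHERE filter on a column of the right-hand table.
left_rows = conn.execute("""
    SELECT a.id, b.x
    FROM a
    LEFT JOIN b ON b.a_id = a.id
    WHERE b.x = 17
    ORDER BY a.id
""").fetchall()

# The same query written honestly as an INNER JOIN.
inner_rows = conn.execute("""
    SELECT a.id, b.x
    FROM a
    INNER JOIN b ON b.a_id = a.id
    WHERE b.x = 17
    ORDER BY a.id
""").fetchall()

print(left_rows)   # [(1, 17)]
print(inner_rows)  # [(1, 17)]
```

The NULL-extended row for a.id = 3 is produced by the LEFT JOIN and then discarded by the WHERE, so both spellings return the same rows.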
SELECT ...
FROM a
LEFT JOIN b ON ...
WHERE b.id IS NULL -- common use for LEFT
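As a rough illustration of this "rows of a that are not in b" pattern, the sketch below runs it on an in-memory SQLite database via Python's sqlite3 module (an assumption; the thread is about MySQL) and compares it with a NOT EXISTS form. The tables and rows are hypothetical.

```python
import sqlite3

# Hypothetical data: no row of b references a.id = 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (10, 1), (11, 1), (12, 3);
""")

# Anti-join via LEFT JOIN: keep only the NULL-extended rows.
missing_via_left = conn.execute("""
    SELECT a.id
    FROM a
    LEFT JOIN b ON b.a_id = a.id
    WHERE b.id IS NULL
    ORDER BY a.id
""").fetchall()

# The same question asked with NOT EXISTS.
missing_via_exists = conn.execute("""
    SELECT a.id
    FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id)
    ORDER BY a.id
""").fetchall()

print(missing_via_left)    # [(2,)]
print(missing_via_exists)  # [(2,)]
```

Testing b.id (a column that cannot be NULL in a real b row) is what makes the IS NULL check reliable here.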
While I have your attention, here are some notes/rules:
The keywords INNER, CROSS, and OUTER are ignored by MySQL. The ON and WHERE clauses will determine which type of JOIN is really intended.
Have you ever seen an owl turn its head nearly all the way around? That's what happens to my head when I see a RIGHT JOIN. Please convert it to a LEFT JOIN.
Though the Optimizer does not require this distinction, please use ON to specify how the tables are related and use WHERE for filtering. (With INNER, they are equivalent; with LEFT, you may get different results.)
Sometimes EXISTS( SELECT ... ) is better than a LEFT JOIN.
Optimizations vary depending on the existence of GROUP BY, ORDER BY, and LIMIT. But that is a loooong discussion.
Back to your question of which is faster, etc. Well, if the Optimizer is going to turn one into another, then those two have identical performance.
My question is: when I have a SELECT with more than one JOIN, where am I supposed to put the ON clause?
For example:
when it's an inner join after an inner join
when it's an inner join after a left join
when it's a left join after an inner join
In the first example, I've seen people put the ON clause right after each join.
In the second example, I've seen people put all the ON clauses after the last JOIN. So right now I'm a little confused about where to put them, and whether I get the same answer if they are put in different places.
You should interleave the on clauses, regardless of the type of join. So:
from a join
b
on . . . left join
c
on . . .
And so on as you add more tables.
MySQL makes the on clause optional, which confuses things. However, standard SQL does allow:
from a join
b join
c
on b.? = c.?
on a.? = b.?
However, this is generally discouraged. People find that hard to follow.
I have two tables: toynav_product_import (18,533 rows) and catalog_product_entity (42,000 rows).
With the query below, the LEFT JOIN version takes more than 2 minutes, while the INNER JOIN version runs in 0.009 seconds. The first table has the necessary index on the barcode field.
SELECT tpi.barcode FROM toynav_product_import tpi
INNER JOIN catalog_product_entity cpe ON tpi.barcode = cpe.sku
Please advise
toynav_product_import
catalog_product_entity
An outer join (LEFT JOIN or RIGHT JOIN) has to do all the work of an INNER JOIN plus the extra work of null-extending the results.
And even if a LEFT JOIN were faster in specific situations, it is not functionally equivalent to an INNER JOIN, so you cannot simply go replacing all instances of one with the other!
Sorry, I can't post this as a comment.
A LEFT JOIN is generally slower than an INNER JOIN. By definition, an outer join (LEFT JOIN or RIGHT JOIN) has to do all the work of an INNER JOIN plus the extra work of null-extending the results. It can also return more rows than the inner join, which adds to the execution time.
But by indexing the foreign keys properly, you can definitely improve the performance of the joins.
It also depends on the data. A LEFT JOIN is not always slower, and there are cases where it is faster, but for the reasons described above an INNER JOIN is usually faster.
Please refer to this link, the guy explained the difference very clearly.
I am studying select queries for MySQL join functions.
I am slightly confused by the query below. As I understand it, the statement joins attributes from multiple tables with the ON clause and then filters the result set with the WHERE clause.
Is this correct? What other functionality does this provide? Are there better alternatives?
The tables, attributes, and schema are not relevant to this question, specifically just the ON and WHERE interaction. Thanks in advance for any insight you can provide, appreciated.
SELECT DISTINCT Movies.title
FROM Rentals
INNER JOIN Customers
INNER JOIN Copies
INNER JOIN Movies ON Rentals.customerNum=Customers.customerNum
AND Rentals.barcode=Copies.barcode
AND Copies.movieNum=Movies.movieNum
WHERE Customers.givenName='Chad'
AND Customers.familyName='Black';
INNER JOIN (and the outer joins) are binary operators that should be followed by an ON clause. Your particular syntax works in MySQL, but will not work in any other database (because it is missing two ON clauses).
I would recommend writing the query as:
SELECT DISTINCT m.title
FROM Movies m JOIN
Copies co
ON co.movieNum = m.movieNum JOIN
Rentals r
ON r.barcode = co.barcode JOIN
Customers c
ON c.customerNum = r.customerNum
WHERE c.givenName = 'Chad' AND
c.familyName = 'Black';
You should always put the JOIN conditions in the ON clause, with one ON per JOIN. This also introduces table aliases, which make the query easier to write and to read.
The WHERE clause has additional filtering conditions. These could also be in ON clauses, but I think the query reads better with them in the WHERE clause. You can glance at the query and see: "We are getting something from a bunch of tables for Chad Black".
Ordinary inner JOIN operations only generate result rows for table rows matching their ON condition. They suppress any rows that don't match. That means you can move the contents of ON clauses to your WHERE clause and get the same result set. Still, don't do that; JOINs are easier to understand when they have ON clauses.
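A small sketch of that equivalence for inner joins, using an in-memory SQLite database through Python's sqlite3 module instead of MySQL (an assumption for self-containment). The movies and copies tables are cut-down, hypothetical versions of the ones in the question.

```python
import sqlite3

# Hypothetical data: movie 2 has no copies.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (movieNum INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE copies (barcode INTEGER PRIMARY KEY, movieNum INTEGER);
    INSERT INTO movies VALUES (1, 'Alien'), (2, 'Brazil'), (3, 'Contact');
    INSERT INTO copies VALUES (100, 1), (101, 1), (102, 3);
""")

# Join condition where it belongs: in the ON clause.
with_on = conn.execute("""
    SELECT m.title, co.barcode
    FROM movies m
    JOIN copies co ON co.movieNum = m.movieNum
    ORDER BY co.barcode
""").fetchall()

# Same condition moved to WHERE: same rows, harder to read.
with_where = conn.execute("""
    SELECT m.title, co.barcode
    FROM movies m
    CROSS JOIN copies co
    WHERE co.movieNum = m.movieNum
    ORDER BY co.barcode
""").fetchall()

print(with_on)     # [('Alien', 100), ('Alien', 101), ('Contact', 102)]
print(with_where)  # [('Alien', 100), ('Alien', 101), ('Contact', 102)]
```

The results match, but only the first form tells the reader at a glance how the two tables are related.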
If you use LEFT JOIN, a kind of outer join, you get rows from the first table you mention that don't match any rows in the second table according to the ON clause.
SELECT a.name, b.name
FROM a
LEFT JOIN b ON a.a_id = b.a_id
gives you a result set containing all rows of a, with NULL values in b.name wherever the ON condition did not match.
Say I have this query:
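To make the NULL-extension concrete, here is a minimal sketch of that query on an in-memory SQLite database via Python's sqlite3 module (an assumption; the thread concerns MySQL). The rows are hypothetical, and an INNER JOIN is run alongside for contrast.

```python
import sqlite3

# Hypothetical data: 'beta' has no matching row in b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (b_id INTEGER PRIMARY KEY, a_id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO b VALUES (1, 1, 'child of alpha');
""")

left_rows = conn.execute("""
    SELECT a.name, b.name
    FROM a
    LEFT JOIN b ON a.a_id = b.a_id
    ORDER BY a.a_id
""").fetchall()

inner_rows = conn.execute("""
    SELECT a.name, b.name
    FROM a
    INNER JOIN b ON a.a_id = b.a_id
    ORDER BY a.a_id
""").fetchall()

# SQL NULL comes back as Python None.
print(left_rows)   # [('alpha', 'child of alpha'), ('beta', None)]
print(inner_rows)  # [('alpha', 'child of alpha')]
```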
SELECT TableA.Id, TableA.Number, TableA.Name, TableA.HOl, TableB.Contact, TableC.activity
FROM TableA
left JOIN TableB on TableA.Id = TableB.TableA_Id
left join TableC on TableB.userid = TableC.userid
where TableA.hol = 50
order By TableA.Id
Is it better to put the TableA.Hol condition in the WHERE or in the ON clause?
I am not sure if it makes a difference. I am trying to determine why the query is slow. Maybe it is something else with my joins?
This is your query:
select TableA.Id, TableA.Number, TableA.Name, TableA.HOl, TableB.Contact, TableC.activity
from TableA left join
TableB
on TableA.Id = TableB.TableA_Id left join
TableC
on TableB.userid = TableC.userid
where TableA.hol = 50
order By TableA.Id;
A left join keeps all rows in the first table, regardless of what the on clause evaluates to. This means that a condition on the first table in the on clause does not filter out any rows of that table. It is not exactly ignored, though: when the condition is false, the columns from the second table will be NULL for those rows.
So, filters on the first table in a left join should be in the where clause.
Conditions on subsequent tables should be in the on clause. Otherwise, those conditions will turn the outer join into an inner join.
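The two rules above can be sketched as follows, on an in-memory SQLite database through Python's sqlite3 module (an assumption; the question is about MySQL). TableA and TableB are cut-down, hypothetical versions of the tables in the question.

```python
import sqlite3

# Hypothetical data: only TableA.Id = 1 has hol = 50.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Id INTEGER PRIMARY KEY, hol INTEGER);
    CREATE TABLE TableB (TableA_Id INTEGER, Contact TEXT);
    INSERT INTO TableA VALUES (1, 50), (2, 10);
    INSERT INTO TableB VALUES (1, 'ann'), (2, 'bob');
""")

# First-table filter in WHERE: rows of TableA are actually removed.
filter_in_where = conn.execute("""
    SELECT A.Id, B.Contact
    FROM TableA A
    LEFT JOIN TableB B ON A.Id = B.TableA_Id
    WHERE A.hol = 50
    ORDER BY A.Id
""").fetchall()

# Same filter in ON: every TableA row survives; B is NULL where it fails.
filter_in_on = conn.execute("""
    SELECT A.Id, B.Contact
    FROM TableA A
    LEFT JOIN TableB B ON A.Id = B.TableA_Id AND A.hol = 50
    ORDER BY A.Id
""").fetchall()

# Second-table condition in ON: the join stays an outer join.
second_table_in_on = conn.execute("""
    SELECT A.Id, B.Contact
    FROM TableA A
    LEFT JOIN TableB B ON A.Id = B.TableA_Id AND B.Contact = 'bob'
    ORDER BY A.Id
""").fetchall()

print(filter_in_where)     # [(1, 'ann')]
print(filter_in_on)        # [(1, 'ann'), (2, None)]
print(second_table_in_on)  # [(1, None), (2, 'bob')]
```

Only the WHERE version reduces the number of TableA rows; the ON versions change which rows get a match, not which rows appear.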
SELECT A.Id, A.Number, A.Name, A.HOl, B.Contact, C.activity
FROM TableA A
LEFT OUTER JOIN TableB B
ON (A.Id = B.TableA_Id)
LEFT OUTER JOIN TableC C
ON (B.userid = C.userid)
AND A.hol = 50
ORDER BY A.Id
If you are referencing more than one table you can use an alias which improves readability. But this has nothing to do with performance.
Regardless of whether you are using JOIN or LEFT JOIN, use ON to specify how the tables are related and use WHERE to filter.
In the case of JOIN, it does not matter where you put the filtering; it is for readability that you should follow the above rule.
In the case of LEFT JOIN, the results are likely to be different.
If you do
EXPLAIN EXTENDED SELECT ...
SHOW WARNINGS;
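you can see what the SQL parser decided to do. (In MySQL 8.0 and later the EXTENDED keyword has been removed; plain EXPLAIN followed by SHOW WARNINGS gives the same information.) In general, it moves ON clauses to WHERE, indicating that it does not matter (to the semantics) which place they are in. But, for LEFT JOIN, some things must remain in the ON.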
Note another thing:
FROM a ...
LEFT JOIN b ...
WHERE b.foo = 123
effectively throws out the LEFT. The difference between LEFT and non-LEFT is whether you get rows of b filled with NULLs. But WHERE b.foo = 123 says you definitely do not want such rows. So, for clarity for the reader, do not say LEFT.
So, I agree with your original formulation. But I also like short aliases for all tables. Be sure to qualify all columns -- the reader may not know which table a column is in.
Your title says "multiple" joins. I discussed a single JOIN; the lecture applies to any number of JOINs.
sub1 and sub2 both have a 1-to-1 relationship with super.
I wish to determine whether a join exists for either one of them for a given super record.
The following two queries should produce my desired results. Are there any reasons to use !ISNULL() versus IS NOT NULL?
SELECT super.*
FROM super
LEFT OUTER JOIN sub1 ON super.id=sub1.super_id
LEFT OUTER JOIN sub2 ON super.id=sub2.super_id
WHERE (!ISNULL(sub1.id) OR !ISNULL(sub2.id)) AND super.id=123;
SELECT super.*
FROM super
LEFT OUTER JOIN sub1 ON super.id=sub1.super_id
LEFT OUTER JOIN sub2 ON super.id=sub2.super_id
WHERE (sub1.id IS NOT NULL OR sub2.id IS NOT NULL) AND super.id=123;
Use your second choice (IS NOT NULL). The query optimizer may or may not be able to help with the efficiency of your second query. But the query optimizer doesn't do functions. It assumes that it has to evaluate any function you give for all possible rows and columns; it doesn't try to infer the functions' meaning.