What is the following LEFT JOIN SQL statement doing? - MySQL

The query below is copied from the book High Performance MySQL:
select film.film_id from sakila.film
left outer join sakila.film_actor using(film_id)
where film_actor.film_id is null;
I could not understand what it is doing.
Does the WHERE clause filter film_actor before the join? If so, how is the join performed (film_id is already NULL, so how can it be joined to film on film_id)?

It's a standard SQL pattern for finding parent rows that have no children, in this case films that have no actors.
It works because a left join that finds no match fills every column of the joined row with NULL, and the WHERE clause is evaluated after the join is made. Testing a joined column that can never actually be NULL (such as the join key) for NULL therefore returns only the missed joins.
Note also that you don't need DISTINCT, because exactly one row is returned for each missed join.
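As a rough illustration of the two steps described above, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The table contents are made up, and the join is written with ON rather than USING purely to keep the sketch portable; the behaviour shown (NULL-filled rows, then the WHERE filter) is standard SQL.

```python
import sqlite3

# Tiny stand-in for the sakila tables; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER NOT NULL);
    INSERT INTO film VALUES (1, 'ALPHA'), (2, 'BETA'), (3, 'GAMMA');
    INSERT INTO film_actor VALUES (10, 1), (11, 1), (12, 3);
""")

# Step 1: the LEFT JOIN on its own. Film 2 has no actor, so the film_actor
# columns of its row come back as NULL (None in Python).
joined = conn.execute("""
    SELECT film.film_id, film_actor.film_id
    FROM film
    LEFT OUTER JOIN film_actor ON film_actor.film_id = film.film_id
    ORDER BY film.film_id, film_actor.actor_id
""").fetchall()

# Step 2: the WHERE clause runs on the joined rows and keeps only the misses.
unmatched = conn.execute("""
    SELECT film.film_id
    FROM film
    LEFT OUTER JOIN film_actor ON film_actor.film_id = film.film_id
    WHERE film_actor.film_id IS NULL
""").fetchall()

print(joined)     # every film appears; film 2 is paired with None
print(unmatched)  # only the film without actors
```

The filter never sees a film_actor row whose film_id was NULL in the table itself; the NULL only exists in the row that the outer join produced.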

Related

Execution time taken by different SQL JOINS

The two queries below return the same result set. The first uses only INNER JOIN; the second uses a mix of LEFT and RIGHT JOINs. I personally prefer INNER JOIN when nothing specifically requires another join type. Is there any difference between the two queries in terms of performance or execution time? Is it OK to use INNER JOIN rather than the mix of joins?
1.
SELECT film.title, category.name, film.rating, language.name
FROM film INNER JOIN film_category ON film_category.film_id = film.film_id
INNER JOIN category ON category.category_id = film_category.category_id
INNER JOIN language ON language.language_id = film.language_id
WHERE category.name = "Sci-Fi" AND film.rating = "NC-17";
2.
SELECT film.title, film.release_year, film.rating,category.name, language.name
FROM film LEFT JOIN language ON language.language_id=film.language_id
RIGHT JOIN film_category ON film_category.film_id = film.film_id
LEFT JOIN category ON category.category_id=film_category.category_id
WHERE film.rating="NC-17" AND category.name="Sci-Fi";
Please see this INNER JOIN vs LEFT JOIN performance in SQL Server.
However, the proper join type depends on the use case and on the result set you need to extract.
Please do not mix the different types except in this way: INNER, INNER, ... LEFT, LEFT, ... Any other combination has ambiguities about what gets done first. If you must mix them up, use parentheses to indicate which JOIN must be done before the others.
As for whether INNER/LEFT/RIGHT are identical, let me explain with one example:
SELECT ...
FROM a
LEFT JOIN b ON ... -- really INNER
WHERE b.x = 17
That WHERE effectively turns the LEFT JOIN into INNER JOIN. The Optimizer will do such. I, as a human, will stumble over the query until I realize that. So, humor me by calling it INNER JOIN.
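To see why that WHERE makes the LEFT JOIN behave like an INNER JOIN, here is a small sketch using Python's sqlite3 as a stand-in for MySQL. The tables a and b and their rows are hypothetical. The NULL-filled row produced for an unmatched a row can never satisfy b.x = 17, so both forms return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (a_id INTEGER, x INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1, 17), (2, 99);  -- a.id = 3 has no row in b
""")

# LEFT JOIN followed by a WHERE test on a column of the right-hand table.
left_rows = conn.execute("""
    SELECT a.id, b.x
    FROM a
    LEFT JOIN b ON b.a_id = a.id
    WHERE b.x = 17
    ORDER BY a.id
""").fetchall()

# The same query written as the INNER JOIN it effectively is.
inner_rows = conn.execute("""
    SELECT a.id, b.x
    FROM a
    INNER JOIN b ON b.a_id = a.id
    WHERE b.x = 17
    ORDER BY a.id
""").fetchall()

print(left_rows == inner_rows)
```

This only compares result sets; it says nothing about what any particular optimizer does internally.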
Phrased another way, use LEFT only when the "right" table's columns are optional, but you want NULLs when they are missing. Of course, you may want the NULLs so you can say "find rows of a that are not in b":
SELECT ...
FROM a
LEFT JOIN b ON ...
WHERE b.id IS NULL -- common use for LEFT
While I have your attention, here are some notes/rules:
The keywords INNER, CROSS, and OUTER are ignored by MySQL. The ON and WHERE clauses will determine which type of JOIN is really intended.
Have you ever seen an owl turn its head nearly all the way around? That's what happens to my head when I see a RIGHT JOIN. Please convert it to a LEFT JOIN.
Though the Optimizer does not require this distinction, please use ON to specify how the tables are related and use WHERE for filtering. (With INNER, they are equivalent; with LEFT, you may get different results.)
Sometimes EXISTS( SELECT ... ) is better than a LEFT JOIN.
Optimizations vary depending on the existence of GROUP BY, ORDER BY, and LIMIT. But that is a loooong discussion.
Back to your question of which is faster, etc. Well, if the Optimizer is going to turn one into another, then those two have identical performance.
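For the note about EXISTS, a sketch like the following (Python's sqlite3 standing in for MySQL, with hypothetical tables a and b) shows how the two styles relate in terms of results. It does not measure speed, which depends on the engine, the indexes, and the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (100, 1), (101, 1), (102, 3);
""")

def rows(sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# "Rows of a that are not in b", written two ways.
anti_join = rows("""
    SELECT a.id FROM a
    LEFT JOIN b ON b.a_id = a.id
    WHERE b.id IS NULL
    ORDER BY a.id
""")
not_exists = rows("""
    SELECT a.id FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id)
    ORDER BY a.id
""")

# "Rows of a that are in b": a plain JOIN repeats a.id once per matching
# b row, while EXISTS returns each a row at most once.
join_rows = rows("SELECT a.id FROM a JOIN b ON b.a_id = a.id ORDER BY a.id")
exists_rows = rows("""
    SELECT a.id FROM a
    WHERE EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id)
    ORDER BY a.id
""")

print(anti_join, not_exists)
print(join_rows, exists_rows)
```

One reason EXISTS can be the better fit is visible in the last two queries: it needs no DISTINCT to undo the duplication that a join introduces.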

MySQL, functional difference between ON and WHERE in specific statement

I am studying select queries for MySQL join functions.
I am slightly confused on the below query. I understand the below statement to join attributes from multiple tables with the ON clause, and then filter the results set with the WHERE clause.
Is this correct? What other functionality does this provide? Are there better alternatives?
The tables, attributes, and schema are not relevant to this question, specifically just the ON and WHERE interaction. Thanks in advance for any insight you can provide, appreciated.
SELECT DISTINCT Movies.title
FROM Rentals
INNER JOIN Customers
INNER JOIN Copies
INNER JOIN Movies ON Rentals.customerNum=Customers.customerNum
AND Rentals.barcode=Copies.barcode
AND Copies.movieNum=Movies.movieNum
WHERE Customers.givenName='Chad'
AND Customers.familyName='Black';
INNER JOIN (and the outer joins) are binary operators that should be followed by an ON clause. Your particular syntax works in MySQL, but will not work in any other database (because it is missing two ON clauses).
I would recommend writing the query as:
SELECT DISTINCT m.title
FROM Movies m JOIN
Copies co
ON co.movieNum = m.movieNum JOIN
Rentals r
ON r.barcode = co.barcode JOIN
Customers c
ON c.customerNum = r.customerNum
WHERE c.givenName = 'Chad' AND
c.familyName = 'Black';
You should always put the JOIN conditions in the ON clause, with one ON per JOIN. This also introduces table aliases, which make the query easier to write and to read.
The WHERE clause has additional filtering conditions. These could also be in ON clauses, but I think the query reads better with them in the WHERE clause. You can glance at the query and see: "We are getting something from a bunch of tables for Chad Black".
Ordinary inner JOIN operations only generate result rows for table rows matching their ON condition. They suppress any rows that don't match. That means you can move the contents of ON clauses to your WHERE clause and get the same result set. Still, don't do that; JOINs are easier to understand when they have ON clauses.
If you use LEFT JOIN, a kind of outer join, you get rows from the first table you mention that don't match any rows in the second table according to the ON clause.
SELECT a.name, b.name
FROM a
LEFT JOIN b ON a.a_id = b.a_id
gives you a result set containing all rows of a, with NULL in b.name wherever the ON condition did not match.
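The two points above (an inner join's ON condition can be moved to WHERE without changing the result, while a LEFT JOIN keeps unmatched rows) can be checked with a small sketch. It uses Python's sqlite3 as a stand-in for MySQL, and the rows in a and b are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (a_id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (1, 'b1');          -- nothing in b for a_id = 2
""")

# Inner join with the condition in ON.
inner_on = conn.execute("""
    SELECT a.name, b.name FROM a
    JOIN b ON a.a_id = b.a_id
""").fetchall()

# Same condition moved to WHERE: same rows, but harder to read.
inner_where = conn.execute("""
    SELECT a.name, b.name FROM a
    CROSS JOIN b
    WHERE a.a_id = b.a_id
""").fetchall()

# LEFT JOIN keeps the unmatched row of a and fills b.name with NULL.
left = conn.execute("""
    SELECT a.name, b.name FROM a
    LEFT JOIN b ON a.a_id = b.a_id
    ORDER BY a.a_id
""").fetchall()

print(inner_on, inner_where)
print(left)
```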

Interpreting SQL Join Statement

I'm supposed to be converting this query from MySQL to SQL Server. However, the join statement is throwing me off. I haven't seen joins done like this and I am slightly confused about how to translate it.
SELECT
`Supplier Confirmed Orders` + `Log Tech Confirmed Orders` AS 'orders confirmed',
`Orders in CVN`-`Cancelled Orders` AS 'Orders in CVN',
tblloadingmonths.`month`,
tblvendorindex.`vendorindexid`,
'Service' AS category
FROM
tblloadingmonths
JOIN
tblvendorindex
LEFT JOIN
tblcvn ON tblloadingmonths.`month` = tblcvn.`month`
AND tblvendorindex.vendorindexid = tblcvn.vendorindexid
What's throwing me off is that the loadingmonths and vendorindex tables don't have any common fields, but they're being joined and then left joined with cvn. I've always been taught to do tableA join tableB ON colA = colB join tableC ON colB = colC, but not tableA join tableB left join tableC ON colA = colC AND colB = colC. As it stands, the query can't run in SQL Server with the joins the way they are. I had to set it up like this:
SELECT
CVN.[Supplier Confirmed Orders] + CVN.[Log Tech Confirmed Orders] AS 'orders confirmed',
(CVN.[Orders in CVN] - CVN.[Cancelled Orders]) AS 'Orders in CVN',
tblloadingmonths.month,
tblvendorindex.vendorindexid,
'Service' AS category,
'CVN Compliance' as metric
FROM
cvn
JOIN
tblvendorindex ON tblvendorindex.vendorindexid = CVN.vendorindexid
INNER JOIN
tblloadingmonths ON tblloadingmonths.month = CVN.month
I'm getting different results for this converted query. Any guidance would be greatly appreciated.
You were taught correctly: each join should have its own join clause (ON ...), and the second query is far more readable and is preferred. Similarly, old-style joins simply list everything in the WHERE clause for INNER JOIN, but again this is hard to read. As for your different results, they differ because:
The cvn table name is different from tblcvn, meaning it's a different object. If that was a typo, then...
In the first query you are LEFT JOINing tblcvn to the tblloadingmonths and tblvendorindex tables, meaning a row must exist in both of those (per your join clause) for tblcvn rows to be returned. However, since it's a LEFT JOIN, the rows for tblloadingmonths and tblvendorindex will be returned regardless of whether there is a match in tblcvn. In the second, you are using cvn as the base table and using INNER JOIN throughout. This means a match must exist for each join clause for the rows in cvn to be returned. Otherwise, they are filtered out.
To recap, when using LEFT JOIN...
The table in the FROM clause matters. Swapping it with a LEFT JOIN table could change the results (as you witnessed)
Criteria in a WHERE clause could turn the LEFT into an INNER join.
I can't tell you whether MySQL automatically infers a join from key assignments or whether it's making a Cartesian product. When using JOIN in SQL Server you have to list the ON clause. You could use old-style joins (FROM Table1, Table2, ...), but without a WHERE clause this gives you a Cartesian product. This should get you close with some edits on your part:
SELECT
CVN.[Supplier Confirmed Orders] + CVN.[Log Tech Confirmed Orders] AS 'orders confirmed',
(CVN.[Orders in CVN] - CVN.[Cancelled Orders]) AS 'Orders in CVN',
tblloadingmonths.month,
tblvendorindex.vendorindexid,
'Service' AS category,
'CVN Compliance' as metric
FROM
tblvendorindex
INNER JOIN
tblloadingmonths ON tblvendorindex.??? = tblloadingmonths.???? --find out what the relation is, a foreign key perhaps. Would need a data model to determine.
LEFT JOIN
cvn ON tblloadingmonths.month = cvn.month
AND tblvendorindex.vendorindexid = cvn.vendorindexid
Thanks to your comments, you confirmed the MySQL join was producing a Cartesian Product implicitly which you can achieve with a CROSS JOIN explicitly in SQL Server, as you answered.
SELECT
CVN.[Supplier Confirmed Orders] + CVN.[Log Tech Confirmed Orders] AS 'orders confirmed',
(CVN.[Orders in CVN] - CVN.[Cancelled Orders]) AS 'Orders in CVN',
tblloadingmonths.month,
tblvendorindex.vendorindexid,
'Service' AS category,
'CVN Compliance' as metric
FROM
tblvendorindex
CROSS JOIN tblloadingmonths
LEFT JOIN
cvn ON tblloadingmonths.month = cvn.month
AND tblvendorindex.vendorindexid = cvn.vendorindexid
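To make the difference between the two shapes concrete, here is a minimal sketch using Python's sqlite3 in place of MySQL or SQL Server. The table names follow the question, but the single orders column and all rows are invented for illustration. The CROSS JOIN builds every month/vendor pair and the LEFT JOIN attaches cvn data where it exists; starting from cvn with inner joins returns only the pairs that already have a cvn row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblloadingmonths (month TEXT);
    CREATE TABLE tblvendorindex (vendorindexid INTEGER);
    CREATE TABLE cvn (month TEXT, vendorindexid INTEGER, orders INTEGER);
    INSERT INTO tblloadingmonths VALUES ('2024-01'), ('2024-02');
    INSERT INTO tblvendorindex VALUES (1), (2);
    INSERT INTO cvn VALUES ('2024-01', 1, 5);   -- only one pair has data
""")

# Every month/vendor combination, with cvn data where available.
grid = conn.execute("""
    SELECT m.month, v.vendorindexid, c.orders
    FROM tblvendorindex AS v
    CROSS JOIN tblloadingmonths AS m
    LEFT JOIN cvn AS c
           ON c.month = m.month
          AND c.vendorindexid = v.vendorindexid
    ORDER BY m.month, v.vendorindexid
""").fetchall()

# The inner-join rewrite from the question: cvn drives the result.
inner_only = conn.execute("""
    SELECT m.month, v.vendorindexid, c.orders
    FROM cvn AS c
    JOIN tblvendorindex AS v ON v.vendorindexid = c.vendorindexid
    JOIN tblloadingmonths AS m ON m.month = c.month
""").fetchall()

print(len(grid), len(inner_only))
```

With two months and two vendors the grid has four rows, three of them with NULL orders, while the inner-join version has one row. That gap is the kind of difference the question describes.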

Is condition in the JOIN clause evil SQL

Is it better to have an SQL condition in the JOIN clause or in the WHERE clause? Is the SQL engine optimized for either way? Does it depend on the engine?
Is it always possible to replace a condition in the JOIN clause with a condition in the WHERE clause?
Here is an example to illustrate what I mean by condition:
SELECT role_.name
FROM user_role
INNER JOIN user ON user_role.user_id_ = user.id AND
user_role.user_id_ = #user_id
INNER JOIN role ON user_role.role_id = role_.id
vs.
SELECT role_.name
FROM user_role
INNER JOIN user ON user_role.user_id_ = user.id
INNER JOIN role ON user_role.role_id = role_.id
WHERE user.id = #user_id
A condition in the JOIN clause and the same condition in the WHERE clause are equivalent if INNER JOIN is used.
If any other join is used, such as LEFT or RIGHT, then after rows are matched on the ON condition another step occurs: the outer rows, i.e. the non-matching rows, are added back.
The WHERE condition then simply filters out all non-matching rows.
See this thread
Having the non-key condition in the join clause is not only OK, it is preferable especially in this query, because you can avoid some joins to other tables that are further joined to the table to which the on clause belongs.
The WHERE clause is evaluated after all joins have been made; it's a filter on the result set. But by putting the condition in the join clause, you can stop rows from being joined at the time they're being joined.
In your case it makes no difference, because you don't have any following tables, but I use this technique often to gain performance in my queries.
By looking at the plans generated for both queries, we can see that having the condition in the INNER JOIN or in the WHERE clause generates the same plan.
But the problem with putting the condition in the WHERE clause is that you will not be able to handle OUTER JOINs.
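The OUTER JOIN caveat can be seen in a short sketch. It uses Python's sqlite3 as a stand-in, and the users and user_role tables here are hypothetical, simplified versions of the ones in the question. With a LEFT JOIN, the same condition gives different results depending on whether it sits in ON or in WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_role (user_id INTEGER, role_name TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO user_role VALUES (1, 'admin'), (2, 'guest');
""")

# Condition in ON: it only limits which role rows may be attached.
# Users without an admin role are kept, with NULL in role_name.
cond_in_on = conn.execute("""
    SELECT u.name, r.role_name
    FROM users AS u
    LEFT JOIN user_role AS r
           ON r.user_id = u.id
          AND r.role_name = 'admin'
    ORDER BY u.id
""").fetchall()

# Condition in WHERE: it filters the joined rows, so users without an
# admin role disappear from the result entirely.
cond_in_where = conn.execute("""
    SELECT u.name, r.role_name
    FROM users AS u
    LEFT JOIN user_role AS r ON r.user_id = u.id
    WHERE r.role_name = 'admin'
    ORDER BY u.id
""").fetchall()

print(cond_in_on)
print(cond_in_where)
```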

ISNULL() versus IS NULL performance when using LEFT JOINs

sub1 and sub2 both have a 1-to-1 relationship with super.
I wish to determine whether a join exists for either one of them for a given super record.
The following two queries should produce my desired results. Are there any reasons to use !ISNULL() versus IS NOT NULL?
SELECT super.*
FROM super
LEFT OUTER JOIN sub1 ON super.id=sub1.super_id
LEFT OUTER JOIN sub2 ON super.id=sub2.super_id
WHERE (!ISNULL(sub1.id) OR !ISNULL(sub2.id)) AND super.id=123;
SELECT super.*
FROM super
LEFT OUTER JOIN sub1 ON super.id=sub1.super_id
LEFT OUTER JOIN sub2 ON super.id=sub2.super_id
WHERE (sub1.id IS NOT NULL OR sub2.id IS NOT NULL) AND super.id=123;
Use your second choice (IS NOT NULL). The query optimizer may or may not be able to help with the efficiency of your second query. But the query optimizer doesn't do functions. It assumes that it has to evaluate any function you give for all possible rows and columns; it doesn't try to infer the functions' meaning.