ISNULL() verse IS NULL performance when using LEFT JOINs - mysql

sub1 and sub2 both have a 1-to-1 relationship with super.
I wish to determine whether a join exists for either one of them for a given super record.
The following two queries should produce my desired results. Are there any reasons to use !ISNULL() versus IS NOT NULL?
SELECT super.*
FROM super
LEFT OUTER JOIN sub1 ON super.id=sub1.super_id
LEFT OUTER JOIN sub2 ON super.id=sub2.super_id
WHERE (!ISNULL(sub1.id) OR !ISNULL(sub2.id)) AND super.id=123;
SELECT super.*
FROM super
LEFT OUTER JOIN sub1 ON super.id=sub1.super_id
LEFT OUTER JOIN sub2 ON super.id=sub2.super_id
WHERE (sub1.id IS NOT NULL OR sub2.id IS NOT NULL) AND super.id=123;

Use your second choice (IS NOT NULL). The query optimizer may or may not be able to help with the efficiency of your second query. But the query optimizer doesn't do functions. It assumes that it has to evaluate any function you give for all possible rows and columns; it doesn't try to infer the functions' meaning.

Related

Execution time taken by different SQL JOINS

Below two queries result the same result set. In first I have only used INNER JOIN and in second query mix of joins like LEFT and RIGHT JOIN. I personally prefer INNER JOIN when there is no specific task/requirement of other joins. But I just want to know that is there any difference between the below two queries in terms of performance or execution time. Is it ok to use inner join than the mix of joins?
1.
SELECT film.title, category.name, film.rating, language.name
FROM film INNER JOIN film_category ON film_category.film_id = film.film_id
INNER JOIN category ON category.category_id = film_category.category_id
INNER JOIN language ON language.language_id = film.language_id
WHERE category.name = "Sci-Fi" AND film.rating = "NC-17";
SELECT film.title, film.release_year, film.rating,category.name, language.name
FROM film LEFT JOIN language ON language.language_id=film.language_id
RIGHT JOIN film_category ON film_category.film_id = film.film_id
LEFT JOIN category ON category.category_id=film_category.category_id
WHERE film.rating="NC-17" AND category.name="Sci-Fi";
Please see this INNER JOIN vs LEFT JOIN performance in SQL Server.
However, choosing the proper join type is depending on the usecase and result set which you need to extract.
Please do not mix the different types except in this way: INNER, INNER, ... LEFT, LEFT, ... Any other combination has ambiguities about what gets done first. If you must mix them up, use parentheses to indicate which JOIN must be done before the others.
As for whether INNER/LEFT/RIGHT are identical, let me explain with one example:
SELECT ...
FROM a
LEFT JOIN b ON ... -- really INNER
WHERE b.x = 17
That WHERE effectively turns the LEFT JOIN into INNER JOIN. The Optimizer will do such. I, as a human, will stumble over the query until I realize that. So, humor me by calling it INNER JOIN.
Phrased another way, use LEFT only when the "right" table's columns are optional, but you want NULLs when they are missing. Of course, you may want the NULLs so you can say "find rows of a that are not in b:
SELECT ...
FROM a
LEFT JOIN b ON ...
WHERE b.id IS NULL -- common use for LEFT
While I have your attention, here are some notes/rules:
The keywords INNER, CROSS, and OUTER are ignored by MySQL. The ON and WHERE clauses will determine which type of JOIN is really intended.
Have you ever seen an owl turn its head nearly all the way around? That's what happens to my head when I see a RIGHT JOIN. Please convert it to a LEFT JOIN.
Though the Optimizer does not require this distinction, please use ON to specify how the tables are related and use WHERE for filtering. (With INNER, they are equivalent; with LEFT, you may get different results.)
Sometimes EXISTS( SELECT ... ) is better than a LEFT JOIN.
Optimizations vary depending on the existence of GROUP BY, ORDER BY, and LIMIT. But that is a loooong discussion.
Back to your question of which is faster, etc. Well, if the Optimizer is going to turn one into another, then those two have identical performance.

SQL inner join multiple tables with one query

I've a query like below,
SELECT
c.testID,
FROM a
INNER JOIN b ON a.id=b.ID
INNER JOIN c ON b.r_ID=c.id
WHERE c.test IS NOT NULL;
Can this query be optimized further?, I want inner join between three tables to happen only if it meets the where clause.
Where clause works as filter on the data what appears after all JOINs,
whereas if you use same restriction to JOIN clause itself then it will be optimized in sense of avoiding filter after join. That is, join on filtered data instead.
SELECT c.testID,
FROM a
INNER JOIN b ON a.id = b.ID
INNER JOIN c ON b.r_ID = c.id AND c.test IS NOT NULL;
Moreover, you must create an index for the column test in table c to speed up the query.
Also, learn EXPLAIN command to the queries for best results.
Try the following:
SELECT
c.testID
FROM c
INNER JOIN b ON c.test IS NOT NULL AND b.r_ID=c.testID
INNER JOIN a ON a.id=b.r_ID;
I changed the order of the joins and conditions so that the first statement to be evaluated is c.test IS NOT NULL
Disclaimer: You should use the explain command in order to see the execution.
I'm pretty sure that even the minor change I just did might have no difference due to the MySql optimizer that work on all queries.
See the MySQL Documentation: Optimizing Queries with EXPLAIN
Three queries Compared
Have a look at the following fiddle:
https://www.db-fiddle.com/f/fXsT8oMzJ1H31FwMHrxR3u/0
I ran three different queries and in the end, MySQL optimized and ran them the same way.
Three Queries:
EXPLAIN SELECT
c.testID
FROM c
INNER JOIN b ON c.test IS NOT NULL AND b.r_ID=c.testID
INNER JOIN a ON a.id=b.r_ID;
EXPLAIN SELECT c.testID
FROM a
INNER JOIN b ON a.id = b.r_id
INNER JOIN c ON b.r_ID = c.testID AND c.test IS NOT NULL;
EXPLAIN SELECT
c.testID
FROM a
INNER JOIN b ON a.id=b.r_ID
INNER JOIN c ON b.r_ID=c.testID
WHERE c.test IS NOT NULL;
All tables should have a PRIMARY KEY. Assuming that id is the PRIMARY KEY for the tables that it is in, then you need these secondary keys for maximal performance:
c: INDEX(test, test_id, id) -- `test` must be first
b: INDEX(r_ID)
Both of those are useful and "covering".
Another thing to note: b and a is virtually unused in the query, so you may as well write the query this way:
SELECT c.testID,
FROM c
WHERE c.test IS NOT NULL;
At that point, all you need is INDEX(test, testID).
I suspect you "simplified" your query by leaving out some uses of a and b. Well, I simplified it from there, just as the Optimizer should have done. (However, elimination of tables is an optimization that it does not do; it figures that is something the user would have done.)
On the other hand, b and a are not totally useless. The JOIN verify that there are corresponding rows, possibly many such rows, in those tables. Again, I think you had some other purpose.

MySQL, functional difference between ON and WHERE in specific statement

I am studying select queries for MySQL join functions.
I am slightly confused on the below query. I understand the below statement to join attributes from multiple tables with the ON clause, and then filter the results set with the WHERE clause.
Is this correct? What other functionality does this provide? Are there better alternatives?
The tables, attributes, and schema are not relevant to this question, specifically just the ON and WHERE interaction. Thanks in advance for any insight you can provide, appreciated.
SELECT DISTINCT Movies.title
FROM Rentals
INNER JOIN Customers
INNER JOIN Copies
INNER JOIN Movies ON Rentals.customerNum=Customers.customerNum
AND Rentals.barcode=Copies.barcode
AND Copies.movieNum=Movies.movieNum
WHERE Customers.givenName='Chad'
AND Customers.familyName='Black';
INNER JOIN (and the outer joins) are binary operators that should be followed by an ON clause. Your particular syntax works in MySQL, but will not work in any other database (because it is missing two ON clauses).
I would recommend writing the query as:
SELECT DISTINCT m.title
FROM Movies m JOIN
Copies co
ON co.movieNum = m.movieNum JOIN
Rentals r
ON r.barcode = co.barcode JOIN
Customers c
ON c.customerNum = r.customerNum
WHERE c.givenName = 'Chad' AND
c.familyName = 'Black';
You should always put the JOIN conditions in the ON clause, with one ON per JOIN. This also introduces table aliases, which make the query easier to write and to read.
The WHERE clause has additional filtering conditions. These could also be in ON clauses, but I think the query reads better with them in the WHERE clause. You can glance at the query and see: "We are getting something from a bunch of tables for Chad Black".
Ordinary inner JOIN operations only generate result rows for table rows matching their ON condition. They suppress any rows that don't match. That means you can move the contents of ON clauses to your WHERE clause and get the same result set. Still, don't do that; JOINs are easier to understand when they have ON clauses.
If you use LEFT JOIN, a kind of outer join, you get rows from the first table you mention that don't match any rows in the second table according to the ON clause.
SELECT a.name, b.name
FROM a
LEFT JOIN b ON a.a_id = b.a_id
gives you are result set containing all rows of a with NULL values in b.name indicating that the ON condition did not match.

what is following left join sql statement doing?

Below is copied from high performance mysql book:
select film.film_id from sakila.film
left outer join sakila.film_actor using(film_id)
where film_actor.film_id is null;
I could not understand what it is doing.
Does the where clause filter for film_actor before joining. If so, how does join performs (film_id is null already, how does it join with film using film_id)
It's a standard SQL pattern for finding parent rows that have no children, in this case films that don't have an actor.
It works because missed left joins have all nulls in the missed joined row, and the where clause is evaluated after the join is made. Specifying a column that can't be null in reality in the joined row as being null returns only mussed joins.
Note also that you don't need distinct, because there is only ever one such row returned for missed joins.

What would cause a query to run slowly when used a subquery, but not when run separately?

I have something similar to the following:
SELECT c.id
FROM contact AS c
WHERE c.id IN (SELECT s.contact_id
FROM sub_table AS s
LEFT JOIN contact_sub AS c2 ON (s.id = c2.sub_field)
WHERE c2.phone LIKE '535%')
ORDER BY c.name
The problem is that the query takes a very very very long time (>2minutes), but if I take the subquery, run it separately, implode the ids and insert them into the main query, it runs in well less than 1 second, including the data retrival and implosion.
I have checked the explains on both methods and keys are being used appropriately and the same ways. The subquery doesn't return more than 200 IDs.
What could be causing the subquery method to take so much longer?
BTW, I know the query above can be written with joins, but the query I have can't be--this is just a simplified version.
Using MySQL 5.0.22.
Sounds suspiciously like MySQL bug #32665: Query with dependent subquery is too slow.
What happens if you try it like this?
SELECT c.id
FROM contact AS c
INNER JOIN (SELECT s.contact_id
FROM sub_table AS s
LEFT JOIN contact_sub AS c2 ON (s.id = c2.sub_field)
WHERE c2.phone LIKE '535%') subq ON subq.contact_id=c.id
ORDER BY c.name
Assuming that the result of s.contact_id is unique. You can add distinct to the subquery if it is not.
I always use uncorrelated subqueries this way rather than using the IN operator in the where clause.
Have you checked the Execution Plan for the query? This will usually show you the problem.
Can't you do another join instead of a subquery?
SELECT c.id
FROM contact AS c
JOIN sub_table AS s on c.id = s.contact_id
LEFT JOIN contact_sub AS cs ON (s.id = cs.sub_field)
WHERE cs.phone LIKE '535%'
ORDER BY c.name
Since the subquery is referring to a field sub_field in the outer select, it has to be run once for each row in the outer table - the results for the inner query will change with each row in the outer table.
It's a correlated subquery. It runs once for each row in the outer select. (I think. You have two tables with the same correlation name, I'm assuming that's a typo. That you say it can't be rewritten as a join means it's correlated. )
Ok, I'm going to give you something to try. You say that the subquery is not correlated, and that you still can't join on it. And that it you take the output of the subquery, and lexically substitute that for the subquery, the main query runs much faster.
So try this: make the subquery into a view: create view foo followed by the text of the subquery. Then rewrite the main query to get rid of the "IN" clause and instead join to the view.
How's the timing on that?