Is it better to put an SQL condition in the JOIN clause or in the WHERE clause? Is the SQL engine optimized for either way? Does it depend on the engine?
Is it always possible to replace a condition in the JOIN clause with a condition in the WHERE clause?
Here is an example to illustrate what I mean by "condition":
SELECT role_.name
FROM user_role
INNER JOIN user ON user_role.user_id_ = user.id AND
user_role.user_id_ = #user_id
INNER JOIN role ON user_role.role_id = role_.id
vs.
SELECT role_.name
FROM user_role
INNER JOIN user ON user_role.user_id_ = user.id
INNER JOIN role ON user_role.role_id = role_.id
WHERE user.id = #user_id
A condition in the JOIN clause and the same condition in the WHERE clause are equivalent when INNER JOIN is used.
With any other join, such as LEFT or RIGHT, another step occurs after the rows are matched on the ON condition: the outer rows, i.e. the non-matching rows, are added back.
The WHERE condition is applied after that step and simply filters out every row that does not satisfy it.
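The difference can be checked on a toy dataset. The following is only a sketch: it uses Python's built-in sqlite3 module rather than the engine from the question, and the tables users and user_role with their contents are hypothetical stand-ins for the asker's schema.

```python
import sqlite3

# Hypothetical miniature schema, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE user_role (user_id INTEGER, role_id INTEGER);
    INSERT INTO users VALUES (1), (2);
    INSERT INTO user_role VALUES (1, 10), (1, 11), (2, 12);
""")

# INNER JOIN with the filter in the ON clause.
inner_on = conn.execute("""
    SELECT ur.role_id
    FROM user_role ur
    INNER JOIN users u ON ur.user_id = u.id AND u.id = ?
    ORDER BY ur.role_id
""", (1,)).fetchall()

# INNER JOIN with the same filter in the WHERE clause.
inner_where = conn.execute("""
    SELECT ur.role_id
    FROM user_role ur
    INNER JOIN users u ON ur.user_id = u.id
    WHERE u.id = ?
    ORDER BY ur.role_id
""", (1,)).fetchall()

# LEFT JOIN with the filter in the ON clause: the non-matching
# user_role row is added back as an outer row with NULL for u.id.
left_on = conn.execute("""
    SELECT ur.role_id, u.id
    FROM user_role ur
    LEFT JOIN users u ON ur.user_id = u.id AND u.id = ?
    ORDER BY ur.role_id
""", (1,)).fetchall()

print(inner_on == inner_where)  # True
print(left_on)                  # [(10, 1), (11, 1), (12, None)]
```

With the inner join both placements return the same two roles; with the left join the row for role 12 survives as an outer row.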
See this thread
Having the non-key condition in the join clause is not only OK, it is preferable especially in this query, because you can avoid some joins to other tables that are further joined to the table to which the on clause belongs.
The WHERE clause is evaluated after all joins have been made; it is a filter on the result set. By putting the condition in the join clause, you can stop rows from being joined at the time they are being joined.
In your case it makes no difference, because you don't have any following tables, but I use this technique often to gain performance in my queries.
By looking at the plan generated for both the queries we can see that having the condition in the INNER JOIN or WHERE clause generates the same plan.
But the problem with putting the condition in the WHERE clause is that you will not be able to handle OUTER JOINs correctly.
Related
I am studying select queries for MySQL join functions.
I am slightly confused on the below query. I understand the below statement to join attributes from multiple tables with the ON clause, and then filter the results set with the WHERE clause.
Is this correct? What other functionality does this provide? Are there better alternatives?
The tables, attributes, and schema are not relevant to this question, specifically just the ON and WHERE interaction. Thanks in advance for any insight you can provide, appreciated.
SELECT DISTINCT Movies.title
FROM Rentals
INNER JOIN Customers
INNER JOIN Copies
INNER JOIN Movies ON Rentals.customerNum=Customers.customerNum
AND Rentals.barcode=Copies.barcode
AND Copies.movieNum=Movies.movieNum
WHERE Customers.givenName='Chad'
AND Customers.familyName='Black';
INNER JOIN (and the outer joins) are binary operators that should be followed by an ON clause. Your particular syntax works in MySQL, but will not work in any other database (because it is missing two ON clauses).
I would recommend writing the query as:
SELECT DISTINCT m.title
FROM Movies m JOIN
Copies co
ON co.movieNum = m.movieNum JOIN
Rentals r
ON r.barcode = co.barcode JOIN
Customers c
ON c.customerNum = r.customerNum
WHERE c.givenName = 'Chad' AND
c.familyName = 'Black';
You should always put the JOIN conditions in the ON clause, with one ON per JOIN. This also introduces table aliases, which make the query easier to write and to read.
The WHERE clause has additional filtering conditions. These could also be in ON clauses, but I think the query reads better with them in the WHERE clause. You can glance at the query and see: "We are getting something from a bunch of tables for Chad Black".
Ordinary inner JOIN operations only generate result rows for table rows matching their ON condition. They suppress any rows that don't match. That means you can move the contents of ON clauses to your WHERE clause and get the same result set. Still, don't do that; JOINs are easier to understand when they have ON clauses.
If you use LEFT JOIN, a kind of outer join, you get rows from the first table you mention that don't match any rows in the second table according to the ON clause.
SELECT a.name, b.name
FROM a
LEFT JOIN b ON a.a_id = b.a_id
gives you a result set containing all rows of a, with NULL values in b.name indicating that the ON condition did not match.
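To see this on concrete rows, here is a small sketch using Python's sqlite3 module instead of MySQL. The contents of a and b are made up for the illustration; only the query itself comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data: a2 has no matching row in b.
    CREATE TABLE a (a_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (a_id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (1, 'b1');
""")

left_rows = conn.execute("""
    SELECT a.name, b.name
    FROM a
    LEFT JOIN b ON a.a_id = b.a_id
    ORDER BY a.a_id
""").fetchall()

# The same query as an inner join suppresses the unmatched row.
inner_rows = conn.execute("""
    SELECT a.name, b.name
    FROM a
    INNER JOIN b ON a.a_id = b.a_id
    ORDER BY a.a_id
""").fetchall()

print(left_rows)   # [('a1', 'b1'), ('a2', None)]
print(inner_rows)  # [('a1', 'b1')]
```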
I have a query that looks like this:
select `adverts`.*
from `adverts`
inner join `advert_category` on `advert_category`.`advert_id` = `adverts`.`id`
inner join `advert_location` on `adverts`.`id` = `advert_location`.`advert_id`
where `advert_location`.`location_id` = ?
and `advert_category`.`category_id` = ?
order by `updated_at` desc
The problem here is I have a huge database and this response is absolutely ravaging my database.
What I really need is to do the first join, and then do the where clause. This will whittle down my response from around 100k rows to fewer than 10k. Then I want to do the other join, in order to whittle down the response again so I can get the advert_location on the category items.
Doing it as is just isn't viable.
So, how do I go about using a join and a where condition, and then after getting that response doing a further join with a where condition?
Thanks
This is your query, written a bit simpler so I can read it:
select a.*
from adverts a inner join
advert_category ac
on ac.advert_id = a.id inner join
advert_location al
on al.advert_id = a.id
where al.location_id = ? and
ac.category_id = ?
order by a.updated_at desc;
I am speculating that advert_category and advert_location have multiple rows per advert. In that case, you are getting a Cartesian product for each advert.
A better way to write the query uses exists:
select a.*
from adverts a
where exists (select 1
from advert_location al
where al.advert_id = a.id and al.location_id = ?
) and
exists (select 1
from advert_category ac
where ac.advert_id = a.id and ac.category_id = ?
)
order by a.updated_at desc;
For this version, you want indexes on advert_location(advert_id, location_id), advert_category(advert_id, category_id), and probably adverts(updated_at, id).
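The row multiplication is easy to reproduce on a small scale. The sketch below uses Python's sqlite3 module, not MySQL, and assumes (hypothetically) that the link tables contain repeated rows for advert 1; the sample data is invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE adverts (id INTEGER PRIMARY KEY, updated_at TEXT);
    CREATE TABLE advert_category (advert_id INTEGER, category_id INTEGER);
    CREATE TABLE advert_location (advert_id INTEGER, location_id INTEGER);
    -- Assumed data: advert 1 appears twice in each link table.
    INSERT INTO adverts VALUES (1, '2020-01-02'), (2, '2020-01-01');
    INSERT INTO advert_category VALUES (1, 5), (1, 5), (2, 5);
    INSERT INTO advert_location VALUES (1, 7), (1, 7), (2, 8);
""")

# Join version: every category row pairs with every location row.
join_ids = conn.execute("""
    SELECT a.id
    FROM adverts a
    INNER JOIN advert_category ac ON ac.advert_id = a.id
    INNER JOIN advert_location al ON al.advert_id = a.id
    WHERE al.location_id = ? AND ac.category_id = ?
    ORDER BY a.updated_at DESC
""", (7, 5)).fetchall()

# EXISTS version: each advert is returned at most once.
exists_ids = conn.execute("""
    SELECT a.id
    FROM adverts a
    WHERE EXISTS (SELECT 1 FROM advert_location al
                  WHERE al.advert_id = a.id AND al.location_id = ?)
      AND EXISTS (SELECT 1 FROM advert_category ac
                  WHERE ac.advert_id = a.id AND ac.category_id = ?)
    ORDER BY a.updated_at DESC
""", (7, 5)).fetchall()

print(len(join_ids))  # 4, i.e. 2 category rows x 2 location rows
print(exists_ids)     # [(1,)]
```

Whether the real tables contain such repeats is not known from the question; if they do not, both forms return the same rows and the choice is mainly about how the optimizer handles them.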
You can write the 1st join in a Derived Table including a WHERE condition and then do the 2nd join (but a decent optimizer might resolve the Derived Table again and do what it thinks is best based on statistics):
select adverts.*
from
(
select `adverts`.*
from `adverts`
inner join `advert_category`
on `advert_category`.`advert_id` =`adverts`.`id`
where `advert_category`.`category_id` = ?
) as adverts
inner join `advert_location`
on `adverts`.`id` = `advert_location`.`advert_id`
where `advert_location`.`location_id` = ?
order by `updated_at` desc
MySQL will reorder inner joins for you during optimization, regardless of how you wrote them in your query. Inner join is the same in either direction (in algebra this is called commutative), so this is safe to do.
You can see the result of join reordering if you use EXPLAIN on your query.
If you don't like the order MySQL chose for your joins, you can override it with this kind of syntax:
from `adverts`
straight_join `advert_category` ...
https://dev.mysql.com/doc/refman/5.7/en/join.html says:
STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer processes the tables in a suboptimal order.
Once the optimizer has decided on the join order, it always does one join at a time, in that order. This is called the nested-loop join method.
There isn't really any way to "do the join then do the where clause". Conditions are combined together when looking up rows for joined tables. But this is a good thing, because you can then create a compound index that helps match rows based on both join conditions and where conditions.
PS: When asking a query optimization question, you should include the EXPLAIN output, and also run SHOW CREATE TABLE <tablename> for each table and include the result. Then we don't have to guess at the columns and indexes in your tables.
I executed these Queries
SELECT *
FROM A
INNER JOIN B ON coalesce(A.a1,'') = coalesce(B.a1,'') and A.a1 <> '';
/\
||
Condition in On Clause
and
SELECT *
FROM A
INNER JOIN B ON coalesce(A.a1,'') = coalesce(B.a1,'') WHERE A.a1 <> '';
/\
||
Condition in Where Clause
and get different results.
I want to understand the difference between putting a filtering condition in the ON clause and putting it in the WHERE clause.
Which one is better in terms of performance?
Update
Sample Data
a1 is blank '' in both tables.
With the 1st query I am getting no rows --> 0 rows
but with the 2nd query I am getting multiple rows --> 1251 rows
They are not same.
Consider these queries:
SELECT *
FROM Orders
LEFT JOIN OrderLines ON OrderLines.OrderID=Orders.ID
WHERE Orders.ID = 12345
and
SELECT *
FROM Orders
LEFT JOIN OrderLines ON OrderLines.OrderID=Orders.ID AND Orders.ID=12345
The first will return an order and its lines, if any, for order number 12345. The second will return all orders, but only order 12345 will have any lines associated with it.
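A quick way to convince yourself is to run both queries against a tiny dataset. This sketch uses Python's sqlite3 module; the second order (99999), the LineID and Item columns, and the line items are hypothetical additions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (ID INTEGER PRIMARY KEY);
    CREATE TABLE OrderLines (LineID INTEGER PRIMARY KEY,
                             OrderID INTEGER, Item TEXT);
    INSERT INTO Orders VALUES (12345), (99999);
    INSERT INTO OrderLines VALUES
        (1, 12345, 'widget'), (2, 12345, 'gadget'), (3, 99999, 'gizmo');
""")

# Filter in WHERE: only order 12345 and its lines come back.
where_rows = conn.execute("""
    SELECT Orders.ID, OrderLines.Item
    FROM Orders
    LEFT JOIN OrderLines ON OrderLines.OrderID = Orders.ID
    WHERE Orders.ID = 12345
    ORDER BY Orders.ID, OrderLines.LineID
""").fetchall()

# Filter in ON: every order comes back, but only 12345 has lines.
on_rows = conn.execute("""
    SELECT Orders.ID, OrderLines.Item
    FROM Orders
    LEFT JOIN OrderLines ON OrderLines.OrderID = Orders.ID
                        AND Orders.ID = 12345
    ORDER BY Orders.ID, OrderLines.LineID
""").fetchall()

print(where_rows)  # [(12345, 'widget'), (12345, 'gadget')]
print(on_rows)     # [(12345, 'widget'), (12345, 'gadget'), (99999, None)]
```

Order 99999 does have a line in the table, yet it shows up with NULL in the second result because the ON condition rejected the match.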
With an INNER JOIN, the clauses are effectively equivalent. However, just because they are functionally the same, in that they produce the same results, does not mean the two kinds of clauses have the same semantic meaning.
I meant:
-Does not matter for inner joins
-Matters for outer joins
a. 'where' clause: After joining. Records after join would be filtered.
b. 'on' clause - Before joining. Records (from right Table) would be filtered before joining, this may end up as null in the result (since OUTER join).
Please refer to these articles:
http://weblogs.sqlteam.com/jeffs/archive/2007/05/14/criteria-on-outer-joined-tables.aspx
and
https://sites.google.com/site/nosuchmethodexception/database/join/join-vs-where-clause
In your first query, the condition " and A.a1<>'' " is applied only to table B. However, the where condition in your second query is applied to A and B.
It doesn't have any effect on an inner join, but it does affect a left join.
Below is copied from high performance mysql book:
select film.film_id from sakila.film
left outer join sakila.film_actor using(film_id)
where film_actor.film_id is null;
I could not understand what it is doing.
Does the where clause filter film_actor before joining? If so, how does the join work (film_id is already null, so how does it join with film using film_id)?
It's a standard SQL pattern for finding parent rows that have no children, in this case films that don't have an actor.
It works because missed left joins have all nulls in the missed joined row, and the where clause is evaluated after the join is made. Testing for null a column of the joined row that can never really be null therefore returns only the missed joins.
Note also that you don't need distinct, because there is only ever one such row returned for missed joins.
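Here is a minimal sketch of that pattern, run through Python's sqlite3 module rather than MySQL. The tables are cut-down, hypothetical versions of the sakila ones, and the join is written with ON instead of USING purely to keep the example portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: film 2 has no rows in film_actor.
    CREATE TABLE film (film_id INTEGER PRIMARY KEY);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
    INSERT INTO film VALUES (1), (2), (3);
    INSERT INTO film_actor VALUES (1, 1), (2, 1), (1, 3);
""")

# All join results, to show what the WHERE clause gets to see.
joined = conn.execute("""
    SELECT film.film_id, film_actor.film_id
    FROM film
    LEFT OUTER JOIN film_actor ON film_actor.film_id = film.film_id
    ORDER BY film.film_id, film_actor.actor_id
""").fetchall()

# Keep only the outer rows, i.e. films with no actor.
films_without_actor = conn.execute("""
    SELECT film.film_id
    FROM film
    LEFT OUTER JOIN film_actor ON film_actor.film_id = film.film_id
    WHERE film_actor.film_id IS NULL
""").fetchall()

print(joined)               # [(1, 1), (1, 1), (2, None), (3, 3)]
print(films_without_actor)  # [(2,)]
```

The first result shows that the NULL only appears after the join has run, which is why the filter cannot be acting on film_actor beforehand.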
I have two MySQL queries:
First:
SELECT DISTINCT (broker.broker_id),company_id ,broker_name,company_name,mobile1_no,email,pan_card_num,broker_id,broker_id,company_id
FROM broker_firm AS broker_firm
LEFT JOIN broker ON broker_firm.company_id = broker.firm_id
AND broker_firm.is_active =1
AND broker.is_active =1
This query is generating 331 results.
Second:
SELECT COUNT( broker.broker_id ) AS tot
FROM broker_firm AS broker_firm
LEFT JOIN broker AS broker ON broker_firm.company_id = broker.firm_id
AND broker_firm.is_active =1
AND broker.is_active =1
This query is generating 289 results.
Can anyone please tell me the reason why? I expected both of the results to be same. Or maybe, the Count(*) result to be greater.
Thanks in advance
When you do a left join, the logic is simple: keep all the rows in the first table, regardless of whether the condition in the on clause is true. If the condition is false, then all the columns in the second table get a value of NULL.
When you do an inner join, the logic is to keep only the rows for which the condition in the on clause is true.
In the first query, the additional conditions are in the on clause. Hence, all rows in the first table are kept (and don't forget that the join itself may result in duplicates). In the second query, the where clause has a condition broker.is_active = 1. This condition will fail when is_active is NULL -- which is what happens when the records don't match. In other words, the condition is turning the left join into an inner join.
EDIT:
The idea is the same. The second query is counting the matching records. count(broker.broker_id) counts the non-NULL values for that column. This is the same as doing an inner join.
The first query is counting all records. select distinct selects distinct values of all the columns. Your syntax is a bit confusing, because it suggests that you just want the distinct value of one column. But that is not how it works. Because you have columns from both tables in the select, the non-matching brokers will have their company information on the row, making that row distinct from all other rows.
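The gap between the two counts can be reproduced with a few rows. The following sketch uses Python's sqlite3 module and hypothetical, simplified versions of broker_firm and broker (the is_active conditions are left out to keep it short).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: firm 3 has no broker.
    CREATE TABLE broker_firm (company_id INTEGER PRIMARY KEY);
    CREATE TABLE broker (broker_id INTEGER PRIMARY KEY, firm_id INTEGER);
    INSERT INTO broker_firm VALUES (1), (2), (3);
    INSERT INTO broker VALUES (100, 1), (101, 1), (102, 2);
""")

# COUNT(*) counts every joined row, including the outer rows;
# COUNT(broker.broker_id) skips rows where broker_id is NULL.
all_rows, matched = conn.execute("""
    SELECT COUNT(*), COUNT(broker.broker_id)
    FROM broker_firm
    LEFT JOIN broker ON broker_firm.company_id = broker.firm_id
""").fetchone()

# Firms that have no broker at all.
(unmatched_firms,) = conn.execute("""
    SELECT COUNT(*)
    FROM broker_firm
    WHERE company_id NOT IN (SELECT firm_id FROM broker)
""").fetchone()

print(all_rows, matched, unmatched_firms)  # 4 3 1
```

In this toy data the difference between the two counts equals the number of firms without a broker, which mirrors the 331 versus 289 gap in the question.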
Why don't you use both in one query?
SELECT broker.broker_id,company_id ,broker_name,company_name,mobile1_no,email,pan_card_num,broker_id,broker_id,company_id,COUNT( broker.broker_id ) AS tot
FROM broker_firm AS broker_firm
LEFT JOIN broker ON broker_firm.company_id = broker.firm_id
AND broker_firm.is_active =1
AND broker.is_active =1
GROUP BY broker.broker_id
The first query counts all firms; you have 42 firms with no broker.
Try:
select count(broker_firm.company_id)
FROM broker_firm
where broker_firm.company_id not in (select firm_id from broker)