MySQL JOINS without where clause - mysql

I know that in MySQL SQL it only makes sense to index those fields you use in the WHERE clause. But if you are using JOINS, i believe that the JOIN also acts as the WHERE clause because it is comparing two fields. For example:
select b.name, p.location
from Branch as p, Person as p
where b.id = p.id;
is the same as
select b.name, p.location
from Branch as p
INNER JOIN Person as p ON (p.id = b.id);
So my understanding is that the INNER JOIN = WHERE clause in a way, or translated that way by MySQL, and hence can be indexed on i.e, a columns used on a JOIN are indexed (if they have indexes created on them). Is my understanding correct?

Where and Join are pretty similar. However, Join is more at table level, and where is more at column level. Yes, you are corrrect.

Related

SQL inner join multiple tables with one query

I've a query like below,
SELECT
c.testID,
FROM a
INNER JOIN b ON a.id=b.ID
INNER JOIN c ON b.r_ID=c.id
WHERE c.test IS NOT NULL;
Can this query be optimized further?, I want inner join between three tables to happen only if it meets the where clause.
Where clause works as filter on the data what appears after all JOINs,
whereas if you use same restriction to JOIN clause itself then it will be optimized in sense of avoiding filter after join. That is, join on filtered data instead.
SELECT c.testID,
FROM a
INNER JOIN b ON a.id = b.ID
INNER JOIN c ON b.r_ID = c.id AND c.test IS NOT NULL;
Moreover, you must create an index for the column test in table c to speed up the query.
Also, learn EXPLAIN command to the queries for best results.
Try the following:
SELECT
c.testID
FROM c
INNER JOIN b ON c.test IS NOT NULL AND b.r_ID=c.testID
INNER JOIN a ON a.id=b.r_ID;
I changed the order of the joins and conditions so that the first statement to be evaluated is c.test IS NOT NULL
Disclaimer: You should use the explain command in order to see the execution.
I'm pretty sure that even the minor change I just did might have no difference due to the MySql optimizer that work on all queries.
See the MySQL Documentation: Optimizing Queries with EXPLAIN
Three queries Compared
Have a look at the following fiddle:
https://www.db-fiddle.com/f/fXsT8oMzJ1H31FwMHrxR3u/0
I ran three different queries and in the end, MySQL optimized and ran them the same way.
Three Queries:
EXPLAIN SELECT
c.testID
FROM c
INNER JOIN b ON c.test IS NOT NULL AND b.r_ID=c.testID
INNER JOIN a ON a.id=b.r_ID;
EXPLAIN SELECT c.testID
FROM a
INNER JOIN b ON a.id = b.r_id
INNER JOIN c ON b.r_ID = c.testID AND c.test IS NOT NULL;
EXPLAIN SELECT
c.testID
FROM a
INNER JOIN b ON a.id=b.r_ID
INNER JOIN c ON b.r_ID=c.testID
WHERE c.test IS NOT NULL;
All tables should have a PRIMARY KEY. Assuming that id is the PRIMARY KEY for the tables that it is in, then you need these secondary keys for maximal performance:
c: INDEX(test, test_id, id) -- `test` must be first
b: INDEX(r_ID)
Both of those are useful and "covering".
Another thing to note: b and a is virtually unused in the query, so you may as well write the query this way:
SELECT c.testID,
FROM c
WHERE c.test IS NOT NULL;
At that point, all you need is INDEX(test, testID).
I suspect you "simplified" your query by leaving out some uses of a and b. Well, I simplified it from there, just as the Optimizer should have done. (However, elimination of tables is an optimization that it does not do; it figures that is something the user would have done.)
On the other hand, b and a are not totally useless. The JOIN verify that there are corresponding rows, possibly many such rows, in those tables. Again, I think you had some other purpose.

MySQL, functional difference between ON and WHERE in specific statement

I am studying select queries for MySQL join functions.
I am slightly confused on the below query. I understand the below statement to join attributes from multiple tables with the ON clause, and then filter the results set with the WHERE clause.
Is this correct? What other functionality does this provide? Are there better alternatives?
The tables, attributes, and schema are not relevant to this question, specifically just the ON and WHERE interaction. Thanks in advance for any insight you can provide, appreciated.
SELECT DISTINCT Movies.title
FROM Rentals
INNER JOIN Customers
INNER JOIN Copies
INNER JOIN Movies ON Rentals.customerNum=Customers.customerNum
AND Rentals.barcode=Copies.barcode
AND Copies.movieNum=Movies.movieNum
WHERE Customers.givenName='Chad'
AND Customers.familyName='Black';
INNER JOIN (and the outer joins) are binary operators that should be followed by an ON clause. Your particular syntax works in MySQL, but will not work in any other database (because it is missing two ON clauses).
I would recommend writing the query as:
SELECT DISTINCT m.title
FROM Movies m JOIN
Copies co
ON co.movieNum = m.movieNum JOIN
Rentals r
ON r.barcode = co.barcode JOIN
Customers c
ON c.customerNum = r.customerNum
WHERE c.givenName = 'Chad' AND
c.familyName = 'Black';
You should always put the JOIN conditions in the ON clause, with one ON per JOIN. This also introduces table aliases, which make the query easier to write and to read.
The WHERE clause has additional filtering conditions. These could also be in ON clauses, but I think the query reads better with them in the WHERE clause. You can glance at the query and see: "We are getting something from a bunch of tables for Chad Black".
Ordinary inner JOIN operations only generate result rows for table rows matching their ON condition. They suppress any rows that don't match. That means you can move the contents of ON clauses to your WHERE clause and get the same result set. Still, don't do that; JOINs are easier to understand when they have ON clauses.
If you use LEFT JOIN, a kind of outer join, you get rows from the first table you mention that don't match any rows in the second table according to the ON clause.
SELECT a.name, b.name
FROM a
LEFT JOIN b ON a.a_id = b.a_id
gives you are result set containing all rows of a with NULL values in b.name indicating that the ON condition did not match.

How to do multiple joins when some tables are empty

I am trying to append 'lookup data' to a main record text/description field so it looks something like this:
This is the Description Text
LookUpName1:
LookupValue1
LookupValueN
This worked fine with Inner Join like so
Select J.id, Concat('<b>LookUpName</b>:<br>',group_concat(F.LookUpValue SEPARATOR '<br>'))
from MainTable J Inner Join
LookUpTable L Inner Join
LookUpValuesTable F
On J.ID = L.JobID and F.ID = L.FilterID
Group by J.ID
However my goal is to add append multiple Lookup Tables and if I add them to this as Inner Joins I naturally just get those record where both/all the LookupTables have records.
On the other hand when I tried Join or Left Join I got an error on the Group by J.ID.
My goal is to append any of the existing Lookup Table values to all of the Description. Right now all I can achieve is returning appended descriptions which have ALL of the Lookup table values.
Your query would work if the on clauses were in the "right" place:
select J.id,
Concat('<b>LookUpName</b>:<br>', group_concat(F.LookUpValue separator '<br>'))
from MainTable J left join
LookUpTable L
on J.ID = L.JobID left join
LookUpValuesTable F
on F.ID = L.FilterID
group by J.ID;
The problem with your query is a MySQL (mis)feature. The on clause is optional for an inner join. Don't ask me why the MySQL designers thought inner join and cross join should be syntactically equivalent. Every other database requires an on clause for an inner join. It is easy enough to express a cross join using on 1=1.
However, the on clause is required for the left join, so when you switch to a left join, the compiler has a problem with the unorthodox syntax. The real problem is a missing on clause; this just happens to show up as "I wasn't expecting a group by yet." Using more traditional syntax with each join followed by an on should fix the problem.

Where to put conditions in Multiple Joins?

Say if I have this query
SELECT TableA.Id, TableA.Number, TableA.Name, TableA.HOl, TableB.Contact, TableC.activity
FROM TableA
left JOIN TableB on TableA.Id = TableB.TableA_Id
left join TableC on TableB.userid = TableC.userid
where TableA.hol = 50
order By TableA.Id
Is it better to but the TableA.Hol in the where or in the ON clause?
I am not sure if it makes a difference, I am trying to determine why it slow. Maybe it something else with my joins?
This is your query:
select TableA.Id, TableA.Number, TableA.Name, TableA.HOl, TableB.Contact, TableC.activity
from TableA left join
TableB
on TableA.Id = TableB.TableA_Id left join
TableC
on TableB.userid = TableC.userid
where TableA.hol = 50
order By TableA.Id;
A left join keeps all rows in the first table, regardless of what the on clause evaluates to. This means that a condition on the first table is effectively ignored in the on clause. Well, not exactly ignored -- the condition is false so the columns from the second table will be NULL for those rows.
So, filters on the first table in a left join should be in the where clause.
Conditions on subsequent tables should be in the on clause. Otherwise, those conditions will turn the outer join into an inner join.
SELECT A.Id, A.Number, A.Name, A.HOl, B.Contact, C.activity
FROM TableA A
LEFT OUTER JOIN TableB B
ON (A.Id = B.TableA_Id)
LEFT OUTER JOIN TableC C
ON (B.userid = C.userid)
AND A.hol = 50
ORDER BY A.Id
If you are referencing more than one table you can use an alias which improves readability. But this has nothing to do with performance.
Regardless of whether you are using JOIN or LEFT JOIN, use ON to specify how the tables are related and use WHERE to filter.
In the case of JOIN, the it does not matter where you put the filtering; it is for readability that you should follow the above rule.
In the case of LEFT JOIN, the results are likely to be different.
If you do
EXPLAIN EXTENDED SELECT ...
SHOW WARNINGS;
you can see what the SQL parser decided to do. In general, it moves ON clauses are to WHERE, indicating that it does not matter (to the semantics) which place they are. But, for LEFT JOIN, some things must remain in the ON.
Note another thing:
FROM a ...
LEFT JOIN b ...
WHERE b.foo = 123
effectively throws out the LEFT. The difference between LEFT and non-LEFT is whether you get rows of b filled with NULLs. But WHERE b.foo = 123 says you definitely do not want such rows. So, for clarity for the reader, do not say LEFT.
So, I agree with your original formulation. But I also like short aliases for all tables. Be sure to qualify all columns -- the reader may not know which table a column is in.
Your title says "multiple" joins. I discussed a single JOIN; the lecture applies to any number of JOINs.

mysql joining efficiency - join with where then join with something else

I have a query that looks like this:
select `adverts`.*
from `adverts`
inner join `advert_category` on `advert_category`.`advert_id` = `adverts`.`id`
inner join `advert_location` on `adverts`.`id` = `advert_location`.`advert_id`
where `advert_location`.`location_id` = ?
and `advert_category`.`category_id` = ?
order by `updated_at` desc
The problem here is I have a huge database and this response is absolutely ravaging my database.
What I really need is to do the first join, and then do there where clause. This will whittle down my response from like 100k queries to less than 10k, then I want to do the other join, in order to whittle down the responses again so I can get the advert_location on the category items.
Doing it as is just isn't viable.
So, how do I go about using a join and a where condition, and then after getting that response doing a further join with a where condition?
Thanks
This is your query, written a bit simpler so I can read it:
select a.*
from adverts a inner join
advert_category ac
on ac.advert_id = a.id inner join
advert_location al
on al.advert_id = a.id
where al.location_id = ? and
ac.category_id = ?
order by a.updated_at desc;
I am speculating that advert_category and advert_locations have multiple rows per advert. In that case, you are getting a Cartesian product for each advert.
A better way to write the query uses exists:
select a.*
from adverts a
where exists (select 1
from advert_location al
where al.advert_id = a.id and al.location_id = ?
) and
exists (select 1
from advert_category ac
where ac.advert_id = a.id and ac.category_id = ?
)
order by a.updated_at desc;
For this version, you want indexes on advert_location(advert_id, location_id), advert_category(advert_id, category_id), and probably advert(updated_at, id).
You can write the 1st join in a Derived Table including a WHERE-condition and then do the 2nd join (but a decent optimizer might resolve the Derived Table again and do what he thinks is best based on statistics):
select adverts.*
from
(
select `adverts`.*
from `adverts`
inner join `advert_category`
on `advert_category`.`advert_id` =`adverts`.`id`
where `advert_category`.`category_id` = ?
) as adverts
inner join `advert_location`
on `adverts`.`id` = `advert_location`.`advert_id`
where `advert_location`.`location_id` = ?
order by `updated_at` desc
MySQL will reorder inner joins for you during optimization, regardless of how you wrote them in your query. Inner join is the same in either direction (in algebra this is called commutative), so this is safe to do.
You can see the result of join reordering if you use EXPLAIN on your query.
If you don't like the order MySQL chose for your joins, you can override it with this kind of syntax:
from `adverts`
straight_join `advert_category` ...
https://dev.mysql.com/doc/refman/5.7/en/join.html says:
STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer processes the tables in a suboptimal order.
Once the optimizer has decided on the join order, it always does one join at a time, in that order. This is called the nested join method.
There isn't really any way to "do the join then do the where clause". Conditions are combined together when looking up rows for joined tables. But this is a good thing, because you can then create a compound index that helps match rows based on both join conditions and where conditions.
PS: When asking query optimization question, you should include the EXPLAIN output, and also run SHOW CREATE TABLE <tablename> for each table, and include the result. Then we don't have to guess at the columns and indexes in your table.