left/right join performance - mysql

Is there a different between in left and right join, the below sql statements are same result, but is the both is the same performance?
SELECT count(*)
FROM writers
RIGHT JOIN blogs ON blogs.members_id = model_id
WHERE member_id = 123 AND blogs.status = 1;
SELECT count(*)
FROM blogs
LEFT JOIN writers ON writers.model_id = blogs.members_id
WHERE writers.member_id = 123
AND blogs.status = 1;

No difference at all. Both are implementing inner joins. Huh? You think you specified an outer join, but your where clause is undoing it.
Presumably, you intend:
SELECT count(*)
FROM blogs b LEFT JOIN
writers w
ON w.model_id = b.members_id AND w.member_id = 123
WHERE b.status = 1;
Although you can write this as a RIGHT JOIN, I strongly discourage it. Here are some reasons:
SQL is read left-to-right. It is easier to follow "keep all the rows in the first table" than "keep all the rows in the last table which I haven't yet read".
SQL is parsed left-to-right. For a single join, this doesn't make a difference. But with multiple joins, there are subtle differences between A LEFT JOIN B LEFT JOIN C and C RIGHT JOIN B RIGHT JOIN A.

Related

SQL inner join multiple tables with one query

I've a query like below,
SELECT
c.testID,
FROM a
INNER JOIN b ON a.id=b.ID
INNER JOIN c ON b.r_ID=c.id
WHERE c.test IS NOT NULL;
Can this query be optimized further?, I want inner join between three tables to happen only if it meets the where clause.
Where clause works as filter on the data what appears after all JOINs,
whereas if you use same restriction to JOIN clause itself then it will be optimized in sense of avoiding filter after join. That is, join on filtered data instead.
SELECT c.testID,
FROM a
INNER JOIN b ON a.id = b.ID
INNER JOIN c ON b.r_ID = c.id AND c.test IS NOT NULL;
Moreover, you must create an index for the column test in table c to speed up the query.
Also, learn EXPLAIN command to the queries for best results.
Try the following:
SELECT
c.testID
FROM c
INNER JOIN b ON c.test IS NOT NULL AND b.r_ID=c.testID
INNER JOIN a ON a.id=b.r_ID;
I changed the order of the joins and conditions so that the first statement to be evaluated is c.test IS NOT NULL
Disclaimer: You should use the explain command in order to see the execution.
I'm pretty sure that even the minor change I just did might have no difference due to the MySql optimizer that work on all queries.
See the MySQL Documentation: Optimizing Queries with EXPLAIN
Three queries Compared
Have a look at the following fiddle:
https://www.db-fiddle.com/f/fXsT8oMzJ1H31FwMHrxR3u/0
I ran three different queries and in the end, MySQL optimized and ran them the same way.
Three Queries:
EXPLAIN SELECT
c.testID
FROM c
INNER JOIN b ON c.test IS NOT NULL AND b.r_ID=c.testID
INNER JOIN a ON a.id=b.r_ID;
EXPLAIN SELECT c.testID
FROM a
INNER JOIN b ON a.id = b.r_id
INNER JOIN c ON b.r_ID = c.testID AND c.test IS NOT NULL;
EXPLAIN SELECT
c.testID
FROM a
INNER JOIN b ON a.id=b.r_ID
INNER JOIN c ON b.r_ID=c.testID
WHERE c.test IS NOT NULL;
All tables should have a PRIMARY KEY. Assuming that id is the PRIMARY KEY for the tables that it is in, then you need these secondary keys for maximal performance:
c: INDEX(test, test_id, id) -- `test` must be first
b: INDEX(r_ID)
Both of those are useful and "covering".
Another thing to note: b and a is virtually unused in the query, so you may as well write the query this way:
SELECT c.testID,
FROM c
WHERE c.test IS NOT NULL;
At that point, all you need is INDEX(test, testID).
I suspect you "simplified" your query by leaving out some uses of a and b. Well, I simplified it from there, just as the Optimizer should have done. (However, elimination of tables is an optimization that it does not do; it figures that is something the user would have done.)
On the other hand, b and a are not totally useless. The JOIN verify that there are corresponding rows, possibly many such rows, in those tables. Again, I think you had some other purpose.

How to do multiple joins when some tables are empty

I am trying to append 'lookup data' to a main record text/description field so it looks something like this:
This is the Description Text
LookUpName1:
LookupValue1
LookupValueN
This worked fine with Inner Join like so
Select J.id, Concat('<b>LookUpName</b>:<br>',group_concat(F.LookUpValue SEPARATOR '<br>'))
from MainTable J Inner Join
LookUpTable L Inner Join
LookUpValuesTable F
On J.ID = L.JobID and F.ID = L.FilterID
Group by J.ID
However my goal is to add append multiple Lookup Tables and if I add them to this as Inner Joins I naturally just get those record where both/all the LookupTables have records.
On the other hand when I tried Join or Left Join I got an error on the Group by J.ID.
My goal is to append any of the existing Lookup Table values to all of the Description. Right now all I can achieve is returning appended descriptions which have ALL of the Lookup table values.
Your query would work if the on clauses were in the "right" place:
select J.id,
Concat('<b>LookUpName</b>:<br>', group_concat(F.LookUpValue separator '<br>'))
from MainTable J left join
LookUpTable L
on J.ID = L.JobID left join
LookUpValuesTable F
on F.ID = L.FilterID
group by J.ID;
The problem with your query is a MySQL (mis)feature. The on clause is optional for an inner join. Don't ask me why the MySQL designers thought inner join and cross join should be syntactically equivalent. Every other database requires an on clause for an inner join. It is easy enough to express a cross join using on 1=1.
However, the on clause is required for the left join, so when you switch to a left join, the compiler has a problem with the unorthodox syntax. The real problem is a missing on clause; this just happens to show up as "I wasn't expecting a group by yet." Using more traditional syntax with each join followed by an on should fix the problem.

Where to put conditions in Multiple Joins?

Say if I have this query
SELECT TableA.Id, TableA.Number, TableA.Name, TableA.HOl, TableB.Contact, TableC.activity
FROM TableA
left JOIN TableB on TableA.Id = TableB.TableA_Id
left join TableC on TableB.userid = TableC.userid
where TableA.hol = 50
order By TableA.Id
Is it better to but the TableA.Hol in the where or in the ON clause?
I am not sure if it makes a difference, I am trying to determine why it slow. Maybe it something else with my joins?
This is your query:
select TableA.Id, TableA.Number, TableA.Name, TableA.HOl, TableB.Contact, TableC.activity
from TableA left join
TableB
on TableA.Id = TableB.TableA_Id left join
TableC
on TableB.userid = TableC.userid
where TableA.hol = 50
order By TableA.Id;
A left join keeps all rows in the first table, regardless of what the on clause evaluates to. This means that a condition on the first table is effectively ignored in the on clause. Well, not exactly ignored -- the condition is false so the columns from the second table will be NULL for those rows.
So, filters on the first table in a left join should be in the where clause.
Conditions on subsequent tables should be in the on clause. Otherwise, those conditions will turn the outer join into an inner join.
SELECT A.Id, A.Number, A.Name, A.HOl, B.Contact, C.activity
FROM TableA A
LEFT OUTER JOIN TableB B
ON (A.Id = B.TableA_Id)
LEFT OUTER JOIN TableC C
ON (B.userid = C.userid)
AND A.hol = 50
ORDER BY A.Id
If you are referencing more than one table you can use an alias which improves readability. But this has nothing to do with performance.
Regardless of whether you are using JOIN or LEFT JOIN, use ON to specify how the tables are related and use WHERE to filter.
In the case of JOIN, the it does not matter where you put the filtering; it is for readability that you should follow the above rule.
In the case of LEFT JOIN, the results are likely to be different.
If you do
EXPLAIN EXTENDED SELECT ...
SHOW WARNINGS;
you can see what the SQL parser decided to do. In general, it moves ON clauses are to WHERE, indicating that it does not matter (to the semantics) which place they are. But, for LEFT JOIN, some things must remain in the ON.
Note another thing:
FROM a ...
LEFT JOIN b ...
WHERE b.foo = 123
effectively throws out the LEFT. The difference between LEFT and non-LEFT is whether you get rows of b filled with NULLs. But WHERE b.foo = 123 says you definitely do not want such rows. So, for clarity for the reader, do not say LEFT.
So, I agree with your original formulation. But I also like short aliases for all tables. Be sure to qualify all columns -- the reader may not know which table a column is in.
Your title says "multiple" joins. I discussed a single JOIN; the lecture applies to any number of JOINs.

MySQL JOINS without where clause

I know that in MySQL SQL it only makes sense to index those fields you use in the WHERE clause. But if you are using JOINS, i believe that the JOIN also acts as the WHERE clause because it is comparing two fields. For example:
select b.name, p.location
from Branch as p, Person as p
where b.id = p.id;
is the same as
select b.name, p.location
from Branch as p
INNER JOIN Person as p ON (p.id = b.id);
So my understanding is that the INNER JOIN = WHERE clause in a way, or translated that way by MySQL, and hence can be indexed on i.e, a columns used on a JOIN are indexed (if they have indexes created on them). Is my understanding correct?
Where and Join are pretty similar. However, Join is more at table level, and where is more at column level. Yes, you are corrrect.

MySQL - SELECT with LEFT JOIN optimization

I have 3 tables to select from. 2 of them are always necessary (tbl_notes, tbl_clients) while the 3rd is optional (tbl_notes_categories).
I've always used a LEFT JOIN in my queries with questionable correlating records to the primary table.
But I'm not getting any results with the query below.
Would someone point out how I'm using the LEFT JOIN incorrectly?
SELECT n.*, c.clientname, nc.notecategoryname
FROM tbl_notes n, tbl_clients c
LEFT JOIN tbl_notes_categories nc ON n.categoryid = nc.categoryid
WHERE n.clientid = c.clientid
AND c.clientid = 12345
ORDER BY n.dateinserted DESC
In fact, I'm getting a sql error.
#1054 - Unknown column 'n.categoryid' in 'on clause'
categoryid certainly does exist in tbl_notes
I probably need to brush up on how JOINS really work. I'm guessing I cannot have a LEFT JOIN with 2 database tables before it?
On a side note, I can foresee times when there will be multiple required tables, with several optional tables. (in this case tbl_notes_categories is optional)
Assuming the column categoryid exists in the tbl_notes table...
Try rewriting the query to use the JOIN syntax, rather than using the old-school comma as the join operator. (If the problem isn't a misnamed column, I suspect the problem is in mixing the two types of syntax... but this is just a suspicion, I have no reason to test mixing old-style comma joins with JOIN keywords.)
I'd write the statement like this:
SELECT n.*, c.clientname, nc.notecategoryname
FROM tbl_notes n
JOIN tbl_clients c
ON n.clientid = c.clientid
LEFT
JOIN tbl_notes_categories nc
ON nc.categoryid = n.categoryid
WHERE c.clientid = 12345
ORDER BY n.dateinserted DESC
(Actually, I would specify the individual columns to return from n, rather than using n.*, but that's just a style preference, not a SQL syntax requirement.)