I have two SQL queries, one using WHERE and one using JOIN:
SELECT * FROM Table1 T1, Table2 T2 WHERE T1.Key = T2.Key AND T2.Key = T1.Key
SELECT * FROM Table1 T1 JOIN Table2 T2 ON T1.Key = T2.Key AND T2.Key = T1.Key
Are there any differences in the two queries? If they are the same, which one is more efficient to use?
Your first query uses ANSI-89 SQL syntax; your second uses the more modern ANSI-92 join syntax. Functionally they are equivalent. The second syntax is easier to read because it keeps the join condition next to the joined table, which matters far more once a query has multiple joins.
See this question for more details.
Yes, both queries return the same results. The second one uses an explicit join and is the recommended form.
As for efficiency: most databases will optimize both queries to the same execution plan, though a very few may optimize the second one better.
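A quick way to convince yourself that the two forms return the same rows is to run both against a toy database. The sketch below uses Python's built-in sqlite3 module, with SQLite standing in for whatever database the question targets. The table contents are invented, and the column is called k instead of Key because KEY is an SQL keyword. It compares results only, not execution plans.

```python
import sqlite3

# In-memory SQLite database; data is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (k INTEGER, a TEXT);
    CREATE TABLE Table2 (k INTEGER, b TEXT);
    INSERT INTO Table1 VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO Table2 VALUES (1, 'p'), (3, 'q'), (3, 'r'), (4, 's');
""")

# Implicit (comma) join with the condition in WHERE, as in the first query.
ansi89 = "SELECT * FROM Table1 T1, Table2 T2 WHERE T1.k = T2.k AND T2.k = T1.k"
# Explicit JOIN ... ON, as in the second query.
ansi92 = "SELECT * FROM Table1 T1 JOIN Table2 T2 ON T1.k = T2.k AND T2.k = T1.k"

# Sort because SQL does not guarantee row order without ORDER BY.
rows89 = sorted(conn.execute(ansi89).fetchall())
rows92 = sorted(conn.execute(ansi92).fetchall())
print(rows89 == rows92)
```

The duplicated condition (T2.k = T1.k) is redundant in both forms and does not change the result.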
It seems that the second statement applies the WHERE condition before joining, while the first one joins first and applies the WHERE condition afterwards, so the second one should be faster because it does less joining. But is that really the case? Is there a reference that says definitively that, in the first statement, the WHERE condition is applied only after all the joins have finished?
SELECT * FROM class t1
LEFT JOIN class_students t2 ON t1.id = t2.class_id
LEFT JOIN student t3 ON t2.student_id = t3.id
WHERE t1.id = 1;
or
SELECT * FROM (SELECT * FROM class WHERE id = 1) t1
LEFT JOIN class_students t2 ON t1.id = t2.class_id
LEFT JOIN student t3 ON t2.student_id = t3.id;
Your second option has a "derived table" (a subquery in FROM or JOIN). Subqueries usually take extra effort, so it is usually better to avoid them.
In your particular example, the Optimizer will probably start with t1 because the WHERE clause mentions it. That is, the execution will filter based on t1.id = 1, just as you suggest the second version would do.
Note my hedged wording ("usually", "probably"). There are exceptions to these statements; if you find a case where the second version runs faster, post it and I may be able to explain why. (A likely example is a subquery with GROUP BY and/or LIMIT, which differs enough from a plain WHERE to make a difference.)
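Whatever the speed difference, the two versions should return the same rows, because the filter is on t1, the table that every LEFT JOIN in the query preserves. The sketch below checks that with Python's sqlite3 and invented data (SQLite stands in for MySQL, so it says nothing about which version runs faster).

```python
import sqlite3

# Hypothetical schema and rows matching the table names in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE class (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE class_students (class_id INTEGER, student_id INTEGER);
    INSERT INTO class VALUES (1, 'Math'), (2, 'Art'), (3, 'Empty');
    INSERT INTO student VALUES (10, 'Ann'), (11, 'Bob'), (12, 'Cy');
    INSERT INTO class_students VALUES (1, 10), (1, 11), (2, 12);
""")

# Filter applied in the outer WHERE clause.
filter_in_where = """
    SELECT * FROM class t1
    LEFT JOIN class_students t2 ON t1.id = t2.class_id
    LEFT JOIN student t3 ON t2.student_id = t3.id
    WHERE t1.id = 1
"""
# Filter applied inside a derived table before the joins.
filter_in_derived_table = """
    SELECT * FROM (SELECT * FROM class WHERE id = 1) t1
    LEFT JOIN class_students t2 ON t1.id = t2.class_id
    LEFT JOIN student t3 ON t2.student_id = t3.id
"""

a = sorted(conn.execute(filter_in_where).fetchall())
b = sorted(conn.execute(filter_in_derived_table).fetchall())
print(a == b)
```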
Should I use INNER JOIN conditions as WHERE conditions?
Consider these two sample queries, which illustrate the question:
SELECT t1.*, t2.*
FROM table1 AS t1
INNER JOIN table2 AS t2
ON t1.id = t2.foreign_key
WHERE t1.year < 2014
and this one, without the WHERE clause:
SELECT t1.*, t2.*
FROM table1 AS t1
INNER JOIN table2 AS t2
ON t1.id = t2.foreign_key
AND t1.year < 2014
Since the join type is INNER, both queries return the same result set.
Which is better in terms of performance?
Generally, performance should be similar, since both queries should be executed the same way (if the query optimizer is good).
I usually use the WHERE clause, since keeping the join condition simple makes it more likely that an index will be used (if there is an appropriate index).
For example, if you change your query slightly (note the order of the conditions):
SELECT t1.*, t2.*
FROM table1 AS t1
INNER JOIN table2 AS t2
ON t1.year < 2014
AND t1.id = t2.foreign_key
Some optimizers could decide not to use the index on the t2.foreign_key column.
Check your query plans; they should be nearly identical.
Also, the database engine can optimize both queries to the same, better execution plan, so there should be no difference.
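One way to settle it is to ask the engine for its plan. The sketch below uses SQLite (through Python's sqlite3) as a stand-in; SQLite's optimizer documentation says it folds the ON terms of an inner join into the WHERE clause before planning, so all three variants, including the one with the reordered ON conditions, should get the same plan. MySQL's EXPLAIN output looks different and is not reproduced here. The schema and the index name idx_t2_fk are hypothetical.

```python
import sqlite3

# Hypothetical schema: t2.foreign_key references t1.id and is indexed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, foreign_key INTEGER);
    CREATE INDEX idx_t2_fk ON table2 (foreign_key);
""")

queries = {
    "where": """SELECT t1.*, t2.* FROM table1 AS t1
                INNER JOIN table2 AS t2 ON t1.id = t2.foreign_key
                WHERE t1.year < 2014""",
    "on": """SELECT t1.*, t2.* FROM table1 AS t1
             INNER JOIN table2 AS t2 ON t1.id = t2.foreign_key
             AND t1.year < 2014""",
    "on_reordered": """SELECT t1.*, t2.* FROM table1 AS t1
                       INNER JOIN table2 AS t2 ON t1.year < 2014
                       AND t1.id = t2.foreign_key""",
}

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable step description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plans = {name: plan(sql) for name, sql in queries.items()}
for name, steps in plans.items():
    print(name, steps)
```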
OK, I am using MySQL. I have two simple tables:
Table1
ID | Text
12 | txt1
13 | txt2
42 | txt3
.....
Table2
ID | Type | Text
13 | 1    | MuTxt1
42 | 1    | MuTxt2
12 | 2    | Xnnn
Now I want to join these two tables to get all the data for rows with Type=1 in Table2.
SQL1:
Select * from
Table1 t1
Join
(select * from Table2 where Type=1) t2
on t1.ID=t2.ID
SQL2:
Select * from
Table1 t1
Join
Table2 t2
on t1.ID=t2.ID
where t2.Type=1
These 2 queries give the same result, but which one is faster?
I don't know how MySQL performs a join (or how joins work in MySQL), which is why I'm asking.
Extra info: if instead of Type=1 I want t2.Text='MuTxt1', SQL2 becomes:
Select * from
Table1 t1
Join
Table2 t2
on t1.ID=t2.ID
where t2.text='MuTxt1'
I have a feeling this query is slower. Is it?
Sometimes the MySQL query optimizer does a pretty decent job and sometimes it sucks. That said, there are exceptions to my answer, where the optimizer handles the other form better.
Subqueries are generally expensive, because MySQL needs to execute them and store their results separately. Normally, if you could use either a subquery or a join, the join is faster. This is especially true when the subquery is part of your WHERE clause and has no LIMIT.
Select *
from Table1 t1
Join Table2 t2 on t1.ID=t2.ID
where t2.Type=1
and
Select *
from Table1 t1
Join Table2 t2
where t1.ID = t2.ID AND t2.Type=1
should perform equally well, while
Select *
from Table1 t1
Join (select *
from Table2
where Type=1) t2
on t1.ID=t2.ID
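is most likely a lot slower, because MySQL stores the result of select * from Table2 where Type=1 in a temporary table (at least in older versions; since MySQL 5.7 the optimizer can often merge such a derived table into the outer query).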
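Whatever their speed, the three forms should return identical rows. The sketch below loads the sample rows from the question into an in-memory SQLite database via Python's sqlite3 and compares the results. SQLite stands in for MySQL here, so this says nothing about MySQL's temporary-table behavior. The third variant moves the Type filter into the ON clause.

```python
import sqlite3

# Sample rows copied from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER, Text TEXT);
    CREATE TABLE Table2 (ID INTEGER, Type INTEGER, Text TEXT);
    INSERT INTO Table1 VALUES (12, 'txt1'), (13, 'txt2'), (42, 'txt3');
    INSERT INTO Table2 VALUES (13, 1, 'MuTxt1'), (42, 1, 'MuTxt2'), (12, 2, 'Xnnn');
""")

variants = {
    "derived table": "SELECT * FROM Table1 t1 "
                     "JOIN (SELECT * FROM Table2 WHERE Type = 1) t2 ON t1.ID = t2.ID",
    "where clause": "SELECT * FROM Table1 t1 JOIN Table2 t2 ON t1.ID = t2.ID WHERE t2.Type = 1",
    "on clause": "SELECT * FROM Table1 t1 JOIN Table2 t2 ON t1.ID = t2.ID AND t2.Type = 1",
}

# Sorted so the comparison does not depend on the engine's row order.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in variants.items()}
for name, rows in results.items():
    print(name, rows)
```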
Conceptually, a join builds a table of all combinations of rows from both tables and then removes the rows that do not match the conditions. In practice, MySQL avoids building that full table: it tries to use indexes containing the columns compared in the ON clause and those specified in the WHERE clause.
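"Every combination, then filter" is the logical definition of a join rather than how an engine runs it. The pure-Python sketch below, using the sample rows from the question, contrasts that model with a lookup through a dictionary that plays the role of an index on Table2.ID. The dictionary is an illustrative stand-in, not MySQL's actual implementation; both functions are hypothetical names.

```python
from collections import defaultdict

# Sample rows from the question: Table1 (ID, Text) and Table2 (ID, Type, Text).
table1 = [(12, "txt1"), (13, "txt2"), (42, "txt3")]
table2 = [(13, 1, "MuTxt1"), (42, 1, "MuTxt2"), (12, 2, "Xnnn")]

def join_by_cross_product(left, right):
    """Logical model: pair every row with every row, then keep the matches."""
    combos = [l + r for l in left for r in right]  # len(left) * len(right) rows
    # row[0] is Table1.ID, row[2] is Table2.ID, row[3] is Table2.Type
    return [row for row in combos if row[0] == row[2] and row[3] == 1]

def join_by_lookup(left, right):
    """Index-style execution: look up matching rows instead of pairing everything."""
    index = defaultdict(list)  # stands in for an index on Table2.ID
    for r in right:
        if r[1] == 1:          # apply the Type = 1 filter while building the lookup
            index[r[0]].append(r)
    return [l + r for l in left for r in index.get(l[0], [])]

print(join_by_cross_product(table1, table2))
print(join_by_lookup(table1, table2))
```

Both return the same rows; the lookup version simply never materializes the 9-row cross product.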
If you want to know which indexes are used, put EXPLAIN in front of your query and run it.
In my view, the second query is better than the first in terms of both readability and performance. You can also put the filter condition in the JOIN clause, like this:
Select * from
Table1 t1
Join
Table2 t2 on t1.ID=t2.ID and t2.Type=1
You can compare the execution times of all the queries on SQL Fiddle: Query 1, Query 2, My Query.
I think this question is hard to answer, since we don't know exactly how the database's query optimizer works internally. Usually these kinds of constructions are evaluated in a similar way (the database may recognize that the first and second queries are equivalent and plan them the same way, or it may not).
I would write the second one, since it makes clearer what is happening.
I'm struggling with someone else's code. What might the WHERE clause do in the following (MySQL) statement?
SELECT * FROM t1, t2 WHERE t1.id = t2.id IN (1,2,3)
It's not providing the desired result in my case, but I'm trying to figure what the original author intended.
Can anyone provide an example of the use of a WHERE clause like this?
This condition is evaluated from the right: MySQL first evaluates t2.id IN (1,2,3), gets the result (0 or 1), and compares that result with t1.id. All rows of t2 whose id is in the IN list are joined to the row of t1 whose id is 1; all other rows of t2 are joined to the row of t1 whose id is 0. Here is a small demo on sqlfiddle.com.
It is hard to imagine that this was the author's intent, however. I think the more likely intent was to check that both ids are in the list and also equal to each other. The equality matters because the author apparently wanted to join the two tables.
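To see how far apart the two readings are, the sketch below evaluates both over small, hypothetical id lists in plain Python. It follows the answer's reading of MySQL precedence, t1.id = (t2.id IN (1,2,3)), with the IN test producing 1 or 0; NULL ids are ignored for simplicity.

```python
# Hypothetical ids; only the id columns matter for this condition.
t1_ids = [0, 1, 2, 3]
t2_ids = [1, 2, 5, 7]
in_list = {1, 2, 3}

# As written: t1.id = (t2.id IN (1,2,3)), where the IN test yields 1 or 0.
as_written = [(a, b) for a in t1_ids for b in t2_ids if a == int(b in in_list)]

# Likely intent: t1.id = t2.id AND t2.id IN (1,2,3).
likely_intent = [(a, b) for a in t1_ids for b in t2_ids if a == b and b in in_list]

print(as_written)
print(likely_intent)
```

As written, only t1 rows with id 0 or 1 can ever match. In SQL, the likely intent would be WHERE t1.id = t2.id AND t2.id IN (1,2,3).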
A more modern way of doing joins is with ANSI SQL syntax. Here is the equivalent of your query in ANSI SQL:
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id IN (1,2,3)
We have tables with more than 3 million records. Using INNER JOIN is much slower than SELECT * FROM db1, db2 WHERE db1.field = db2.field.
Any thoughts?
INNER JOIN should not be any different from SELECT * FROM t1, t2 WHERE t1.c = t2.c; it is just a different syntax for doing the same thing and is treated the same way by the optimiser.
Any difference in performance comes from some other aspect of the query. If you want a reasonable answer, please post:
- The schema of both tables, including their indexes (SHOW CREATE TABLE gives you this)
- Both of the queries you're comparing
- Some detail about your performance-testing methodology (it may be flawed)
- The EXPLAIN output of both queries
SELECT * from t1, t2 where t1.id = t2.id
is equivalent to
SELECT * from t1 INNER JOIN t2 on t1.id = t2.id
However, if the query has other criteria, the behaviour may differ. For instance, the query
SELECT * from t1, t2 where t1.id = t2.id and t1.col1 is not null;
can be written in two different ways with the INNER JOIN:
SELECT * from t1 INNER JOIN t2 on t1.id = t2.id and t1.col1 is not null
or
SELECT * from t1 INNER JOIN t2 on t1.id = t2.id
WHERE t1.col1 is not null
Whether these end up as the same query depends on the optimiser and on the complexity of the other parts of the query. The EXPLAIN output will tell you whether you are executing the same query plan.
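Why might the above queries differ? Because the NOT NULL restriction is applied at a different stage of the query, which may affect performance. For an INNER JOIN the rows returned are the same either way; with an outer join such as LEFT JOIN, moving the condition between ON and WHERE can also change the number of rows returned.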
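The row-count point is easy to check. The sketch below uses Python's sqlite3 with made-up data (the val column is hypothetical) and runs the NOT NULL filter in ON and in WHERE, once with INNER JOIN and once with LEFT JOIN.

```python
import sqlite3

# Made-up rows: t1 row 2 has a NULL col1, t1 row 3 has no partner in t2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, col1 TEXT);
    CREATE TABLE t2 (id INTEGER, val TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, NULL), (3, 'c');
    INSERT INTO t2 VALUES (1, 'x'), (2, 'y');
""")

def rows(sql):
    # key=repr avoids comparing None with str/int while sorting.
    return sorted(conn.execute(sql).fetchall(), key=repr)

inner_on = rows("SELECT * FROM t1 INNER JOIN t2 ON t1.id = t2.id AND t1.col1 IS NOT NULL")
inner_where = rows("SELECT * FROM t1 INNER JOIN t2 ON t1.id = t2.id WHERE t1.col1 IS NOT NULL")
left_on = rows("SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.id AND t1.col1 IS NOT NULL")
left_where = rows("SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.id WHERE t1.col1 IS NOT NULL")

print(inner_on == inner_where)       # same rows for INNER JOIN
print(len(left_on), len(left_where)) # different row counts for LEFT JOIN
```

With LEFT JOIN, a condition in ON only decides which t2 rows get attached (t1 rows are kept regardless), while the same condition in WHERE removes t1 rows from the final result.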
In general, the ...where db1.field=db2.field... syntax is an inner join. It's just the implicit notation instead of the explicit. If you're joining on the same columns and returning the same columns, performance should be identical. More: http://en.wikipedia.org/wiki/Join_(SQL)#Inner_join
I generally use explicit INNER JOIN or LEFT JOIN syntax according to needs. When the optimizer does a bad job, a STRAIGHT_JOIN can often sort it out, with suitable rearrangement of the query.
With any join involving large tables, it's worth using EXPLAIN.