Should I put INNER JOIN conditions in the WHERE clause?
Consider these two sample queries to explain the question:
SELECT t1.*, t2.*
FROM table1 AS t1
INNER JOIN table2 AS t2
ON t1.id = t2.foreign_key
WHERE t1.year < 2014
and this one, without the WHERE clause:
SELECT t1.*, t2.*
FROM table1 AS t1
INNER JOIN table2 AS t2
ON t1.id = t2.foreign_key
AND t1.year < 2014
Since the JOIN type is INNER, both queries return the same result set.
Which is better in terms of performance?
Generally, performance should be similar, since both queries should be executed the same way (provided the query optimizer is good).
I usually use the WHERE clause, because keeping the join condition simple makes it more likely that an index will be used (if an appropriate index exists).
For example, suppose you change your query slightly (note the order of the conditions):
SELECT t1.*, t2.*
FROM table1 AS t1
INNER JOIN table2 AS t2
ON t1.year < 2014
AND t1.id = t2.foreign_key
Some optimizers might then decide not to use the index on the t2.foreign_key column.
Check your query plans; they should be nearly identical.
Also, the database engine can rewrite the query into a better execution plan, so there should be no difference.
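To make the "check your query plans" advice concrete, here is a minimal sketch that runs both forms of the query and prints their plans. It uses SQLite through Python's sqlite3 module purely as a stand-in engine; the sample rows and the index name idx_t2_fk are assumptions made up for illustration, and other engines (MySQL, PostgreSQL) have their own EXPLAIN syntax and may plan differently.

```python
import sqlite3

# In-memory SQLite database; the rows and index name are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, foreign_key INTEGER);
    CREATE INDEX idx_t2_fk ON table2 (foreign_key);
    INSERT INTO table1 VALUES (1, 2012), (2, 2013), (3, 2014), (4, 2015);
    INSERT INTO table2 VALUES (10, 1), (11, 1), (12, 2), (13, 3), (14, 4);
""")

# Filter placed in the WHERE clause.
WHERE_VERSION = """
    SELECT t1.id, t1.year, t2.id
    FROM table1 AS t1
    INNER JOIN table2 AS t2 ON t1.id = t2.foreign_key
    WHERE t1.year < 2014
    ORDER BY t1.id, t2.id
"""

# Same filter placed in the ON clause.
ON_VERSION = """
    SELECT t1.id, t1.year, t2.id
    FROM table1 AS t1
    INNER JOIN table2 AS t2 ON t1.id = t2.foreign_key AND t1.year < 2014
    ORDER BY t1.id, t2.id
"""

def plan(sql):
    """Return the human-readable steps of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

rows_where = conn.execute(WHERE_VERSION).fetchall()
rows_on = conn.execute(ON_VERSION).fetchall()

print("same rows:", rows_where == rows_on)
print("WHERE plan:", plan(WHERE_VERSION))
print("ON plan:   ", plan(ON_VERSION))
```

On your own database, the equivalent check is to run your engine's EXPLAIN on both queries and compare the output.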
Related
I am working on a MySQL query that takes 30 seconds to execute. It has this form:
SELECT id
FROM table1 t1
INNER JOIN table2 t2
ON t1.id = t2.idt2
The INNER JOIN takes 25 of the 30 seconds. When I rewrite it like this:
SELECT id
FROM table1 t1
INNER JOIN (
SELECT idt2,col1,col2,col3
FROM table2
) t2
ON t1.id = t2.idt2
It takes only 8 seconds! Why does it work? I'm afraid of losing data.
(Obviously, my real query is more complex than this one; this is just an example.)
Well, you haven't shown us the EXPLAIN output:
EXPLAIN SELECT id
FROM table1 t1
INNER JOIN table2 t2
ON t1.id = t2.idt2
This would definitely give us some insight into your query and table structures.
Based on your scenario, the first query looks like it has an indexing problem.
What happens in your second query is that the optimizer creates a temporary result set from your subquery, which further filters your data. I don't recommend doing that in most cases.
The purpose of a subquery is to express complex logic; it is not an instant solution for everything.
A SQL engine would use a hash join for a query like this:
select * from table1 t1 left join table2 t2 on t1.id = t2.id;
That's fine. But what if the query is like this:
select * from table1 t1 left join table2 t2 on t1.id > t2.id;
How should this be handled?
A nested loop join would work, but is there a better way?
For distributed SQL, a straight non-equi join (t1.id > t2.id) is pretty expensive to execute. If one side is small, you broadcast it and then use a sorted index on every node. If both sides are large, you can range-partition one side and build a sorted index, and then replicate the other side's rows to every range that might match.
Normally, you have a combination of equality and non-equality conditions, like t1.id = t2.id and t1.cost < t2.cost. In that case, you can do a normal distributed hash join and then keep a sorted list of the secondary items to evaluate the non-equality part. This is what Presto does.
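The "hash on the equality, sorted list for the inequality" idea can be sketched in a single process as follows. This is only an illustration of the technique under simplifying assumptions (rows are plain (id, cost) tuples, inner join semantics, everything fits in memory); the function name hash_then_range_join is hypothetical and this is not Presto's actual implementation.

```python
from bisect import bisect_right
from collections import defaultdict

def hash_then_range_join(left, right):
    """Join (id, cost) rows where left.id == right.id and left.cost < right.cost.

    Hypothetical single-node sketch: hash on the equality key, then use a
    sorted list per key to answer the inequality with a binary search.
    """
    # Build phase: bucket the right-side costs by id, then sort each bucket.
    buckets = defaultdict(list)
    for rid, rcost in right:
        buckets[rid].append(rcost)
    for costs in buckets.values():
        costs.sort()

    # Probe phase: hash lookup on id, binary search for the first larger cost.
    out = []
    for lid, lcost in left:
        costs = buckets.get(lid)
        if not costs:
            continue
        start = bisect_right(costs, lcost)  # first index with cost > lcost
        for rcost in costs[start:]:
            out.append((lid, lcost, rcost))
    return out

left = [(1, 5), (2, 7), (3, 1)]
right = [(1, 3), (1, 5), (1, 9), (2, 8), (2, 2), (4, 10)]
print(hash_then_range_join(left, right))
```

Compared with a nested loop, each probe only touches the rows that share the key and skips straight to the ones that satisfy the inequality.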
I am trying to join two tables using a left join, that is, table1 left join table2.
I would like only some of the rows from table1 to be joined with table2. For query performance, is it better to filter the rows from table1 in a subquery or in the WHERE clause?
select t1.a
,t1.b
,t2.c
from (select *
from table1
where a='x'
) t1 LEFT JOIN table2 t2 on t1.d=t2.d
or
select t1.a
,t1.b
,t2.c
from table1 t1 LEFT JOIN table2 t2 on t1.d=t2.d
where t1.a='x'
Check the query plan, but I doubt it would make any difference.
It depends heavily on the structure and content of your database. The best approach is to look at the query plan and compare it for both versions of your query.
You may find this documentation useful: MySQL Query Execution Plan
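As a rough way to compare the two versions, the sketch below runs both against a tiny in-memory SQLite database (a stand-in engine used through Python's sqlite3 module; the sample rows are invented for illustration) and prints each plan. Because the filter is on the left-hand table, both forms return the same rows; the plans on your real MySQL schema may of course look different.

```python
import sqlite3

# In-memory SQLite database with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a TEXT, b INTEGER, d INTEGER);
    CREATE TABLE table2 (c TEXT, d INTEGER);
    INSERT INTO table1 VALUES ('x', 1, 100), ('x', 2, 200), ('y', 3, 100);
    INSERT INTO table2 VALUES ('c1', 100), ('c2', 100), ('c3', 300);
""")

# Version 1: filter table1 in a subquery before the LEFT JOIN.
SUBQUERY_VERSION = """
    SELECT t1.a, t1.b, t2.c
    FROM (SELECT * FROM table1 WHERE a = 'x') t1
    LEFT JOIN table2 t2 ON t1.d = t2.d
    ORDER BY t1.b, t2.c
"""

# Version 2: filter table1 in the WHERE clause after the LEFT JOIN.
WHERE_VERSION = """
    SELECT t1.a, t1.b, t2.c
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.d = t2.d
    WHERE t1.a = 'x'
    ORDER BY t1.b, t2.c
"""

def plan(sql):
    """Return the human-readable steps of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

rows_subquery = conn.execute(SUBQUERY_VERSION).fetchall()
rows_where = conn.execute(WHERE_VERSION).fetchall()

print("same rows:", rows_subquery == rows_where)
print("subquery plan:", plan(SUBQUERY_VERSION))
print("WHERE plan:   ", plan(WHERE_VERSION))
```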
I would like to know whether these two versions are equivalent in their results, and which is better for performance and why.
Nested SELECT in SELECT version:
select
t1.c1,
t1.c2,
(select Count(t2.c1) from t2 where t2.id = t1.id) as count_t
from
t1
VS
select t1.c1,t1.c2, Count(t2.c1)
from t1,t2
where t2.id= t1.id
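The first query is analogous to this query (a GROUP BY is needed so that each t1 row gets its own count; this assumes t1.id is unique):
SELECT
t1.c1,
t1.c2,
COUNT(t2.c1)
FROM t1
LEFT JOIN t2
ON t2.id = t1.id
GROUP BY t1.id, t1.c1, t1.c2;
It selects all records from the first table and all matching records from the second table (that is what a LEFT JOIN does).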
The second is analogous to this query (again with a GROUP BY so that the count is per t1 row):
SELECT
t1.c1,
t1.c2,
COUNT(t2.c1)
FROM t1
JOIN t2
ON t2.id = t1.id
GROUP BY t1.id, t1.c1, t1.c2;
It selects only the records that match in both tables (that is what an INNER JOIN does).
Well, they are different queries. The first one selects all rows from t1, returning 0 for the count if there is no matching id in table t2.
The second query only returns rows where t1 and t2 both have a row with the same id.
The first query will likely suffer from performance problems on large data sets. The second query will potentially have a Cartesian (row multiplication) problem. I would go with a JOIN or a LEFT JOIN, depending on whether you want records from t1 when t2 has no related records, and then add a GROUP BY to control the row multiplication.
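The difference between the variants is easy to see on a small data set. The sketch below is an illustration only: it uses SQLite through Python's sqlite3 module as a stand-in engine, the sample rows are invented, and t1.id is assumed to be unique.

```python
import sqlite3

# In-memory SQLite database; t1 row 3 deliberately has no match in t2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, c1 TEXT, c2 TEXT);
    CREATE TABLE t2 (id INTEGER, c1 TEXT);
    INSERT INTO t1 VALUES (1, 'a', 'A'), (2, 'b', 'B'), (3, 'c', 'C');
    INSERT INTO t2 VALUES (1, 'p'), (1, 'q'), (2, 'r');
""")

# Correlated subquery: one count per t1 row, 0 when nothing matches.
subquery_rows = conn.execute("""
    SELECT t1.c1, t1.c2,
           (SELECT COUNT(t2.c1) FROM t2 WHERE t2.id = t1.id) AS count_t
    FROM t1
    ORDER BY t1.id
""").fetchall()

# LEFT JOIN + GROUP BY: same result as the correlated subquery.
left_join_rows = conn.execute("""
    SELECT t1.c1, t1.c2, COUNT(t2.c1)
    FROM t1
    LEFT JOIN t2 ON t2.id = t1.id
    GROUP BY t1.id, t1.c1, t1.c2
    ORDER BY t1.id
""").fetchall()

# INNER JOIN + GROUP BY: t1 rows without a match disappear.
inner_join_rows = conn.execute("""
    SELECT t1.c1, t1.c2, COUNT(t2.c1)
    FROM t1
    JOIN t2 ON t2.id = t1.id
    GROUP BY t1.id, t1.c1, t1.c2
    ORDER BY t1.id
""").fetchall()

print("subquery:  ", subquery_rows)
print("left join: ", left_join_rows)
print("inner join:", inner_join_rows)
```

The unmatched t1 row keeps a count of 0 in the first two variants and is missing from the third, which is the behavioural difference described above.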
We have tables with more than 3 million records. Using INNER JOIN is much slower than select * from db1,db2 where db1.field=db2.field
Any thoughts?
INNER JOIN should not be any different from a SELECT FROM t1,t2 WHERE t1.c=t2.c; it is just a different syntax for the same thing and is treated the same by the optimiser.
Any difference in performance comes from some other aspect of the query. If you want a reasonable answer, please post:
The schema of both tables, including their indexes (SHOW CREATE TABLE gives you this)
Both of the queries you're comparing
Some detail about your performance testing methodology (it may be flawed)
The EXPLAIN output of both queries
SELECT * from t1, t2 where t1.id = t2.id
is equivalent to
SELECT * from t1 INNER JOIN t2 on t1.id = t2.id
However, if the query has other criteria, the behaviour may differ. For instance:
SELECT * from t1, t2 where t1.id = t2.id and t1.col1 is not null;
can be written in two different ways with the INNER JOIN:
SELECT * from t1 INNER JOIN t2 on t1.id = t2.id and t1.col1 is not null
or
SELECT * from t1 INNER JOIN t2 on t1.id = t2.id
WHERE t1.col1 is not null
These may or may not end up as the same query after optimisation, depending on the optimiser and on the complexity of the other parts of the query. The EXPLAIN PLAN will tell you whether you are executing the same query.
Why might the above queries differ? Because the NOT NULL restriction can be applied at different stages of the query, which may have an impact on performance. For an INNER JOIN both forms return the same rows; the number of rows returned changes only when the join is an outer join.
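Where the restriction is placed matters for the result set only with outer joins, which the following sketch tries to show. It is an illustration under assumptions: SQLite via Python's sqlite3 module stands in for the real engine, the rows are invented, and the helper run() is hypothetical.

```python
import sqlite3

# In-memory SQLite database; t1 row 2 has a NULL in col1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, col1 TEXT);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, NULL), (3, 'c');
    INSERT INTO t2 VALUES (1, 'x'), (2, 'y'), (3, 'z');
""")

def run(join_type, placement):
    """Run the join with the NOT NULL filter in the ON or the WHERE clause."""
    if placement == "on":
        filter_sql = "AND t1.col1 IS NOT NULL"
    else:
        filter_sql = "WHERE t1.col1 IS NOT NULL"
    sql = (
        "SELECT t1.id, t1.col1, t2.val "
        f"FROM t1 {join_type} JOIN t2 ON t1.id = t2.id {filter_sql} "
        "ORDER BY t1.id"
    )
    return conn.execute(sql).fetchall()

inner_on = run("INNER", "on")
inner_where = run("INNER", "where")
left_on = run("LEFT", "on")
left_where = run("LEFT", "where")

print("INNER, filter in ON:   ", inner_on)
print("INNER, filter in WHERE:", inner_where)
print("LEFT,  filter in ON:   ", left_on)
print("LEFT,  filter in WHERE:", left_where)
```

With INNER JOIN the two placements give identical rows. With LEFT JOIN, the ON version keeps the t1 row whose col1 is NULL (with NULL for the t2 columns), while the WHERE version removes it.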
In general, the ...where db1.field=db2.field... syntax is an inner join. It's just the implicit notation instead of the explicit. If you're joining on the same columns and returning the same columns, performance should be identical. More: http://en.wikipedia.org/wiki/Join_(SQL)#Inner_join
I generally use explicit INNER JOIN or LEFT JOIN syntax according to needs. When the optimizer does a bad job, a STRAIGHT_JOIN can often sort it out, with suitable rearrangement of the query.
With any join involving large tables, it's worth using EXPLAIN.