Which of the two SELECT statements is faster? - mysql

It seems that the second statement applies the WHERE condition before joining, while the first one joins before applying the WHERE condition, so the second would be faster because it does less joining. But is that really the case? Is there a reference that says definitively that, in the first statement, the WHERE condition is executed only after all the joins have finished?
SELECT * FROM class t1
LEFT JOIN class_students t2 ON t1.id = t2.class_id
LEFT JOIN student t3 ON t2.student_id = t3.id
WHERE t1.id = 1;
or
SELECT * FROM (SELECT * FROM class WHERE id = 1) t1
LEFT JOIN class_students t2 ON t1.id = t2.class_id
LEFT JOIN student t3 ON t2.student_id = t3.id;

Your second option has a "derived table" (a subquery in FROM or JOIN). Subqueries usually take extra effort. So, usually it is better to avoid them.
In your particular example, the Optimizer will probably start with t1 because the WHERE clause mentions it. That is, the execution will filter based on t1.id = 1, just as you suggest the second version would do.
Note my hedge words ("usually", "probably"). There are exceptions to my statements; if you find a case where the second version runs faster, present it, and I may be able to explain why. (A likely example is where the subquery has GROUP BY and/or LIMIT. That is different enough from WHERE to make a difference.)
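To check at least that the two statements return the same rows, here is a minimal sketch. It assumes SQLite (via Python's sqlite3 module) as a stand-in for MySQL, and the sample rows are invented for illustration. It says nothing about speed or about the plan MySQL would pick; for that, run EXPLAIN on both statements against your own data.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE class (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE class_students (class_id INTEGER, student_id INTEGER);
    INSERT INTO class VALUES (1, 'math'), (2, 'art'), (3, 'empty');
    INSERT INTO student VALUES (10, 'ann'), (11, 'bob'), (12, 'cy');
    INSERT INTO class_students VALUES (1, 10), (1, 11), (2, 12);
""")

# First statement from the question: filter in the outer WHERE clause.
filter_in_where = """
    SELECT * FROM class t1
    LEFT JOIN class_students t2 ON t1.id = t2.class_id
    LEFT JOIN student t3 ON t2.student_id = t3.id
    WHERE t1.id = ?
"""

# Second statement from the question: filter inside a derived table.
filter_in_derived_table = """
    SELECT * FROM (SELECT * FROM class WHERE id = ?) t1
    LEFT JOIN class_students t2 ON t1.id = t2.class_id
    LEFT JOIN student t3 ON t2.student_id = t3.id
"""

def rows(sql, class_id):
    """Run a statement and return its rows in a stable order."""
    # key=repr avoids comparing None with integers while sorting.
    return sorted(conn.execute(sql, (class_id,)).fetchall(), key=repr)

for class_id in (1, 3):
    same = rows(filter_in_where, class_id) == rows(filter_in_derived_table, class_id)
    print(class_id, same)
```

Since the results match, the only open question is which plan the optimizer chooses, and that is what EXPLAIN answers.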

Related

SQL : Join after Where Clause

I've got a big SQL statement.
I want to apply the WHERE clause to the first table (T1) first and then do all the joins on only the rows it selects.
Right now the query is very slow, because MySQL joins all the tables to T1 and only then tests the WHERE clause!
SELECT * from FIRSTTABLE T1
LEFT JOIN T2 on (....)
LEFT JOIN T3 on (....)
WHERE T1.column = '1' OR T1.column= '5'
Any ideas?
You can do something like:
SELECT * from
(
SELECT * from FIRSTTABLE T1
WHERE T1.column = '1' OR T1.column= '5'
) as T2
LEFT JOIN T3 on (....)
LEFT JOIN T4 on (....)
This is too long for a comment.
SQL queries are compiled and optimized before they are executed. The order of clauses in a query really has nothing to do with the final execution plan. More specifically, the filtering conditions in the WHERE clause could take place before, after, or even during the JOIN processing.
You can start to learn about SQL optimization by understanding indexes. The MySQL documentation is a very reasonable place to start.

How does a SQL engine handle a join with a non-equality condition?

A SQL engine would use a hash join for a query like this:
select * from table1 t1 left join table2 t2 on t1.id = t2.id;
That's fine. But what if the query is like this:
select * from table1 t1 left join table2 t2 on t1.id > t2.id;
How is this handled?
A nested loop join would work, but is there any better way?
For distributed SQL, a straight-up non-equi join (t1.id > t2.id) is pretty expensive to execute. If one side is small, you broadcast it and then use a sorted index on every node. If both sides are large, you can range partition one side and build a sorted index, and then replicate the rows of the other side to every range that might match.
Normally, you have a combination of equality and non-equality conditions, like t1.id = t2.id and t1.cost < t2.cost. In that case, you can do a normal distributed hash join and then keep a sorted list of the secondary items to perform the non-equality part. This is what Presto does.
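The "sorted index" idea can be sketched on a single node as follows. This is only an illustration in Python, not how Presto or any particular engine implements it; the function names are hypothetical and the rows are reduced to bare ids. The right side is sorted once, and for each left row a binary search finds the cut point below which every right id satisfies left > right.

```python
from bisect import bisect_left

def left_join_greater_than(left_ids, right_ids):
    """LEFT JOIN ... ON left > right, using a sorted copy of the right side."""
    sorted_right = sorted(right_ids)  # plays the role of the sorted index
    result = []
    for left in left_ids:
        # bisect_left returns i such that every sorted_right[:i] < left.
        cut = bisect_left(sorted_right, left)
        if cut == 0:
            result.append((left, None))  # LEFT JOIN keeps unmatched left rows
        else:
            result.extend((left, right) for right in sorted_right[:cut])
    return result

def left_join_nested_loop(left_ids, right_ids):
    """Reference version: compare every left row with every right row."""
    result = []
    for left in left_ids:
        matched = [(left, right) for right in right_ids if left > right]
        result.extend(matched if matched else [(left, None)])
    return result

joined = left_join_greater_than([5, 1, 3], [2, 3, 4])
print(joined)
```

The output can still be large, since a non-equi join may match many pairs; what the sort saves is the scan over right rows that cannot match.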

What is the difference between using “JOIN” and “WHERE”?

I have two SQL queries, one with a WHERE and one with a JOIN.
SELECT * FROM Table1 T1, Table2 T2 WHERE T1.Key = T2.Key AND T2.Key = T1.Key
SELECT * FROM Table1 T1 JOIN Table2 T2 ON T1.Key = T2.Key And T2.Key = T1.Key
Are there any differences in the two queries? If they are the same, which one is more efficient to use?
Your first query uses ANSI-89 SQL syntax, your second query the more modern ANSI-92 join syntax. Functionally they are equivalent. The second syntax is easier to read because it keeps the condition near the joined table. That's far more visible with multiple joins.
See this question for more details.
Yes, both queries will give the same results. The second one uses explicit join and is the recommended one.
As for efficiency, most databases will optimize both queries to the same execution plan, though a few may optimize the second one better.

SQL "IN" combined with "=" in WHERE clause

I'm struggling with someone else's code. What might the WHERE clause do in the following (MySQL) statement?
SELECT * FROM t1, t2 WHERE t1.id = t2.id IN (1,2,3)
It's not providing the desired result in my case, but I'm trying to figure what the original author intended.
Can anyone provide an example of the use of a WHERE clause like this?
This condition starts from the right, evaluates t2.id IN (1,2,3), gets the result (0 or 1), and uses it for join with t1.id. All rows of t2 with id from the IN list are joined to the row in t1 that has id of one; all other rows of t2 are joined with the row in t1 that has id of zero. Here is a small demo on sqlfiddle.com: link.
It is hard to imagine that that was the intent of the author, however: I think a more likely check was for both items to be in the list, and also being equal to each other. The equality to each other is important, because it looks like the author wanted to join the two tables.
A more modern way of doing joins is with ANSI SQL syntax. Here is the equivalent of your query in ANSI SQL:
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id IN (1,2,3)
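To make that reading concrete, here is a small Python simulation. It is a sketch that simply follows the interpretation given above, t1.id = (t2.id IN (1,2,3)), rather than running MySQL, and it ignores NULL handling. The id values and function names are made up for illustration.

```python
def where_as_written(t1_id, t2_id):
    """Mimic t1.id = (t2.id IN (1,2,3)), following the reading above."""
    in_list = 1 if t2_id in (1, 2, 3) else 0  # the IN test yields 1 or 0
    return t1_id == in_list

def where_probably_intended(t1_id, t2_id):
    """Mimic t1.id = t2.id AND t1.id IN (1,2,3)."""
    return t1_id == t2_id and t1_id in (1, 2, 3)

# Hypothetical ids in each table.
t1_ids = [0, 1, 2, 3]
t2_ids = [1, 2, 3, 4, 5]

as_written = [(a, b) for a in t1_ids for b in t2_ids if where_as_written(a, b)]
intended = [(a, b) for a in t1_ids for b in t2_ids if where_probably_intended(a, b)]

print(as_written)
print(intended)
```

Under this reading only the t1 rows with id 0 or 1 can ever match, which is why the query is unlikely to give the result the original author wanted.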

thoughts on innerjoin mysql

We have tables with more than 3m records. When using INNER JOIN, the query is much slower than select * from db1,db2 where db1.field=db2.field
Any thoughts?
INNER JOIN should not be any different from a SELECT FROM t1,t2 WHERE t1.c=t2.c, it is just a different syntax for doing the same thing and is treated the same by the optimiser.
Any difference in performance is in some other aspect of the query. If you want a reasonable answer, please post:
The schema of both tables including their indexes (SHOW CREATE TABLE gives you this)
Both the queries you're comparing
Some detail about your performance testing methodology (it may be flawed)
The EXPLAIN output of both queries
SELECT * from t1, t2 where t1.id = t2.id
is equivalent to
SELECT * from t1 INNER JOIN t2 on t1.id = t2.id.
However, if there are other criteria for the SQL query, then the behaviour may differ. For instance:
SELECT * from t1, t2 where t1.id = t2.id and t1.col1 is not null;
can be written in two different ways with the INNER JOIN:
SELECT * from t1 INNER JOIN t2 on t1.id = t2.id and t1.col1 is not null
or
SELECT * from t1 INNER JOIN t2 on t1.id = t2.id
WHERE t1.col1 is not null
These may or may not end up as the same query, depending on the optimiser and on the complexity of the other parts of the query. The EXPLAIN output will tell you whether you are executing the same plan.
Why might the above queries differ? Because the NOT NULL restriction may be applied at a different stage of the plan, which can have an impact on performance. For an INNER JOIN the rows returned are the same either way; it is with an outer join (such as LEFT JOIN) that moving a condition between ON and WHERE can change the number of rows returned.
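Whether moving the condition between ON and WHERE changes the rows returned is easy to check. The sketch below assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, with made-up rows, and counts the rows for both placements under an INNER JOIN and under a LEFT JOIN. It does not measure performance.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, col1 TEXT);
    CREATE TABLE t2 (id INTEGER);
    INSERT INTO t1 VALUES (1, 'a'), (2, NULL), (3, 'c');
    INSERT INTO t2 VALUES (1), (2);
""")

def count(sql):
    """Return the number of rows a statement produces."""
    return len(conn.execute(sql).fetchall())

inner_on = count(
    "SELECT * FROM t1 INNER JOIN t2 ON t1.id = t2.id AND t1.col1 IS NOT NULL")
inner_where = count(
    "SELECT * FROM t1 INNER JOIN t2 ON t1.id = t2.id WHERE t1.col1 IS NOT NULL")
left_on = count(
    "SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.id AND t1.col1 IS NOT NULL")
left_where = count(
    "SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.id WHERE t1.col1 IS NOT NULL")

print("INNER JOIN:", inner_on, inner_where)
print("LEFT JOIN: ", left_on, left_where)
```

With this data the two INNER JOIN forms return the same number of rows, while the two LEFT JOIN forms do not: a condition in ON only decides which t2 rows match, whereas a condition in WHERE removes rows after the join.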
In general, the ...where db1.field=db2.field... syntax is an inner join. It's just the implicit notation instead of the explicit. If you're joining on the same columns and returning the same columns, performance should be identical. More: http://en.wikipedia.org/wiki/Join_(SQL)#Inner_join
I generally use explicit INNER JOIN or LEFT JOIN syntax according to needs. When the optimizer does a bad job, a STRAIGHT_JOIN can often sort it out, with suitable rearrangement of the query.
With any join involving large tables, it's worth using EXPLAIN.