MySQL JOIN behind the scenes - mysql

I remember reading somewhere / being told / inventing a rumor (^_^) that the following two queries are the same behind the scenes in MySQL servers:
SELECT *
FROM a
JOIN b
ON a.id = b.id
and
SELECT *
FROM a, b
WHERE a.id = b.id
Is it true? If so, is one better than the other in other respects (such as parsing efficiency or standard compliance)?

It is in fact true. The first query uses the SQL-92 join syntax and the second uses the older SQL-89 (comma) syntax.
The SQL-92 standard introduced INNER JOIN .. ON and OUTER JOIN .. ON in order to replace the more complex(?) syntax of SQL-89.
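SQL-89 itself had no outer join syntax, so vendors added their own. In SQL Server and Sybase, for example, a left outer join was written: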
SELECT ...
FROM t1, t2
WHERE t1.id *= t2.id
whereas in SQL-92 it would be:
SELECT ...
FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id
I preferred the SQL-89 style over SQL-92 for a long while, but I think SQL Server 2008 compatibility mode removed support for the old *= and =* outer join operators (the comma-style inner join still works).
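To see what the SQL-92 LEFT OUTER JOIN preserves compared with an inner join, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for SQL Server/MySQL (an assumption), and the tables t1/t2 and their rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server/MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, label TEXT);
    CREATE TABLE t2 (id INTEGER, extra TEXT);
    INSERT INTO t1 VALUES (1, 'has match'), (2, 'no match');
    INSERT INTO t2 VALUES (1, 'matched row');
""")

# SQL-92 replacement for the old "t1.id *= t2.id": every t1 row is kept,
# and the t2 columns come back as NULL (None in Python) when nothing matches.
outer_rows = conn.execute("""
    SELECT t1.id, t1.label, t2.extra
    FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id
    ORDER BY t1.id
""").fetchall()

# An inner join drops t1 rows that have no partner in t2.
inner_rows = conn.execute("""
    SELECT t1.id, t1.label, t2.extra
    FROM t1 INNER JOIN t2 ON t1.id = t2.id
    ORDER BY t1.id
""").fetchall()

print(outer_rows)  # [(1, 'has match', 'matched row'), (2, 'no match', None)]
print(inner_rows)  # [(1, 'has match', 'matched row')]
```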

Yep, these are identical. But it's not something specific to MySQL; they are just two different join styles. The one you wrote on top is the newer and preferred one.
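One way to check the claim is to run both queries from the question against the same data and compare the results. This is a minimal sketch that uses SQLite (through Python's sqlite3 module) as a stand-in for MySQL, which is an assumption; the column names and rows are invented.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, color TEXT);
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO b VALUES (1, 'red'), (3, 'blue'), (4, 'green');
""")

# Explicit join syntax (first query in the question).
explicit = conn.execute(
    "SELECT * FROM a JOIN b ON a.id = b.id ORDER BY a.id").fetchall()

# Comma syntax with the join condition in WHERE (second query).
implicit = conn.execute(
    "SELECT * FROM a, b WHERE a.id = b.id ORDER BY a.id").fetchall()

print(explicit)              # [(1, 'one', 1, 'red'), (3, 'three', 3, 'blue')]
print(explicit == implicit)  # True
```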

Related

What is the difference between using “JOIN” and “WHERE”?

I have two SQL queries one with a WHERE and one with JOIN.
SELECT * FROM Table1 T1, Table2 T2 WHERE T1.Key = T2.Key AND T2.Key = T1.Key
SELECT * FROM Table1 T1 JOIN Table2 T2 ON T1.Key = T2.Key And T2.Key = T1.Key
Are there any differences between the two queries? If they are the same, which one is more efficient to use?
Your first query uses ANSI-89 SQL syntax, your second query the more modern ANSI-92 join syntax. Functionally they are equivalent. The second syntax is easier to read because it keeps the condition near the joined table. That's far more visible with multiple joins.
See this question for more details.
Yes, both queries will give the same results. The second one uses explicit join and is the recommended one.
As for efficiency: most databases will optimize both queries to the same execution plan, though a few may optimize the second one better.
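The efficiency point can be checked by comparing execution plans. This sketch asks SQLite (an assumed stand-in for whichever database you use) for EXPLAIN QUERY PLAN on both queries from the question; the table contents are made up. In SQLite the ON terms of an inner join are planned exactly like WHERE terms, so the two plans should match.

```python
import sqlite3

# SQLite as a stand-in database (assumption); rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Key INTEGER, a TEXT);
    CREATE TABLE Table2 (Key INTEGER, b TEXT);
    INSERT INTO Table1 VALUES (1, 'x'), (2, 'y');
    INSERT INTO Table2 VALUES (2, 'z'), (3, 'w');
""")

comma_sql = ("SELECT * FROM Table1 T1, Table2 T2 "
             "WHERE T1.Key = T2.Key AND T2.Key = T1.Key")
join_sql = ("SELECT * FROM Table1 T1 JOIN Table2 T2 "
            "ON T1.Key = T2.Key AND T2.Key = T1.Key")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

same_plan = plan(comma_sql) == plan(join_sql)
same_rows = sorted(conn.execute(comma_sql)) == sorted(conn.execute(join_sql))
print(plan(join_sql))
print(same_plan, same_rows)
```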

Teradata equivalent of MySQL's USING

My question is quite similar to this one, but in Teradata:
SQL Server equivalent of MySQL's USING
Is there any equivalent shortcut to this query?
SELECT *
FROM t1
JOIN t2
ON (t1.column = t2.column)
No. The closest you can get to a natural join is:
SELECT *
FROM T1, T2
WHERE t1.column = t2.column;
Yes. It's ANSI JOIN syntax. For example:
SELECT
*
FROM T1
INNER JOIN T2 ON T1.column = T2.column
;
For multi-column join criteria, do the following:
SELECT
*
FROM T1
INNER JOIN T2 ON T2.column1 = T1.column1
AND T2.column2 = T1.column2
LEFT OUTER JOIN T3 ON T3.column1 = T2.column1
;
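The multi-column pattern above can be tried out as follows. This is a sketch in SQLite rather than Teradata (an assumption), with made-up tables named after the placeholders T1, T2 and T3; the extra columns v1, v2 and v3 are hypothetical. It shows that T1 rows need both columns to match T2, while the LEFT OUTER JOIN to T3 keeps rows without a T3 partner.

```python
import sqlite3

# SQLite stands in for Teradata (assumption); v1/v2/v3 are hypothetical columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (column1 INTEGER, column2 INTEGER, v1 TEXT);
    CREATE TABLE T2 (column1 INTEGER, column2 INTEGER, v2 TEXT);
    CREATE TABLE T3 (column1 INTEGER, v3 TEXT);
    INSERT INTO T1 VALUES (1, 10, 'a'), (1, 20, 'b'), (2, 10, 'c'), (3, 30, 'd');
    INSERT INTO T2 VALUES (1, 10, 'x'), (1, 20, 'y'), (2, 99, 'z'), (3, 30, 'w');
    INSERT INTO T3 VALUES (1, 'only for column1 = 1');
""")

rows = conn.execute("""
    SELECT T1.v1, T2.v2, T3.v3
    FROM T1
    INNER JOIN T2 ON T2.column1 = T1.column1
                 AND T2.column2 = T1.column2   -- both columns must match
    LEFT OUTER JOIN T3 ON T3.column1 = T2.column1
    ORDER BY T1.v1
""").fetchall()

# 'c' is gone (column2 differs); 'd' survives with NULL from T3.
print(rows)
```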
Detailed, comprehensive information with examples is available in Chapter 2 of Teradata® RDBMS SQL Reference - Volume 6 Data Manipulation Statements.
If Teradata supports NATURAL JOINs, then you're set. In MySQL, a NATURAL JOIN is an INNER JOIN with an implicit USING clause that names every column the two tables have in common. You can also write NATURAL LEFT [OUTER] JOIN or NATURAL RIGHT [OUTER] JOIN to further specify how you want the join made.
Check the Teradata documentation; hopefully it supports them.
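To show what USING and NATURAL JOIN buy you over ON, here is a sketch in SQLite, which supports both (used as a stand-in; whether Teradata does is not settled here). The tables and rows are invented. The visible difference is that USING and NATURAL JOIN return the shared column only once in SELECT *.

```python
import sqlite3

# SQLite used because it supports USING and NATURAL JOIN (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, a TEXT);
    CREATE TABLE t2 (id INTEGER, b TEXT);
    INSERT INTO t1 VALUES (1, 'x'), (2, 'y');
    INSERT INTO t2 VALUES (1, 'p'), (3, 'q');
""")

def columns_and_rows(sql):
    cur = conn.execute(sql)
    return [d[0] for d in cur.description], cur.fetchall()

on_cols, on_rows = columns_and_rows("SELECT * FROM t1 JOIN t2 ON t1.id = t2.id")
using_cols, using_rows = columns_and_rows("SELECT * FROM t1 JOIN t2 USING (id)")
natural_cols, natural_rows = columns_and_rows("SELECT * FROM t1 NATURAL JOIN t2")

print(on_cols)     # ['id', 'a', 'id', 'b']  (join column appears twice)
print(using_cols)  # ['id', 'a', 'b']        (join column appears once)
print(natural_cols == using_cols, natural_rows == using_rows)  # True True
```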

What is the default MySQL JOIN behaviour, INNER or OUTER?

So I've been looking through the internet the last hour, reading and looking for the definitive answer to this simple question.
What is the default JOIN in MySQL?
SELECT * FROM t1 JOIN t2
Is that the same as
SELECT * FROM t1, t2
OR
SELECT * FROM t1 INNER JOIN t2
Also a related question: when you use "WHERE" clauses, is it the same as JOIN or INNER JOIN?
Right now I'm thinking a stand-alone JOIN is identical to using commas and WHERE clauses.
In MySQL, an unqualified JOIN implies INNER JOIN. In other words, the INNER in INNER JOIN is optional. INNER and CROSS are synonyms in MySQL. For clarity I write JOIN or INNER JOIN if I have a join condition and CROSS JOIN if I don't have a condition.
The allowed syntax for joins is described in the documentation.
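The equivalences can be checked row by row. This is a sketch with SQLite as a stand-in for MySQL (an assumption; SQLite also treats CROSS JOIN as an inner join, but additionally uses it as a hint to keep the table order). The tables and rows are made up.

```python
import sqlite3

# SQLite as a stand-in for MySQL (assumption); invented data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (2), (3), (4), (5);
""")

def run(sql):
    return sorted(conn.execute(sql).fetchall())

# With a join condition, all three spellings return the same rows.
with_condition = {
    "JOIN ... ON": run("SELECT * FROM t1 JOIN t2 ON t1.id = t2.id"),
    "INNER JOIN ... ON": run("SELECT * FROM t1 INNER JOIN t2 ON t1.id = t2.id"),
    "comma + WHERE": run("SELECT * FROM t1, t2 WHERE t1.id = t2.id"),
}

# Without a condition, both forms give the Cartesian product (3 * 4 rows).
without_condition = {
    "CROSS JOIN": run("SELECT * FROM t1 CROSS JOIN t2"),
    "comma": run("SELECT * FROM t1, t2"),
}

for name, rows in {**with_condition, **without_condition}.items():
    print(f"{name:18} {len(rows)} rows")
```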
Right now I'm thinking a stand-alone JOIN is nothing more than (identical to) using commas and WHERE clauses.
The effect is the same, but the history behind them is different. The comma syntax comes from the ANSI-89 standard. However, there are a number of problems with this syntax, so the JOIN keyword was introduced in the ANSI-92 standard.
I would strongly recommend that you always use JOIN syntax rather than the comma.
T1 JOIN T2 ON ... is more readable than T1, T2 WHERE ....
It is more maintainable because the table relationships and filters are clearly defined rather than mixed together.
The JOIN syntax is easier to convert to OUTER JOIN than the comma syntax.
Mixing the comma and JOIN syntax in the same statement can give curious errors due to the precedence rules.
It is less likely to accidentally create a cartesian product when using the JOIN syntax due to a forgotten join clause, because the join clauses are written next to the joins and it is easy to see if one is missing.
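The Cartesian-product point is easiest to see with three tables. In this sketch (SQLite as a stand-in; the customers/orders/items schema is hypothetical) the comma query is missing its orders-to-items condition. Nothing looks wrong syntactically, but every item is paired with every matched order.

```python
import sqlite3

# Hypothetical schema in SQLite (stand-in for MySQL, assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER);
    CREATE TABLE items (order_id INTEGER, sku TEXT);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1), (11, 2);
    INSERT INTO items VALUES (10, 'pen'), (10, 'ink'), (11, 'pad');
""")

# Comma syntax: the items condition was forgotten, so items is cross-joined.
forgotten = conn.execute("""
    SELECT COUNT(*) FROM customers c, orders o, items i
    WHERE c.id = o.customer_id
""").fetchone()[0]

# JOIN syntax: each table carries its own ON clause, so a gap stands out.
correct = conn.execute("""
    SELECT COUNT(*)
    FROM customers c
    JOIN orders o ON c.id = o.customer_id
    JOIN items  i ON i.order_id = o.id
""").fetchone()[0]

print(forgotten, correct)  # 6 3
```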
These are all equivalent, and also equivalent to CROSS JOIN.
There are some differences between using comma and [INNER | CROSS] JOIN syntax, which might be important when joining more tables. Pretty much all you need to know is described here in the MySQL JOIN documentation.

MySQL statement using OUTER JOIN vs using WHERE to set conditions

For the statements with INNER JOIN:
SELECT column(s) FROM table1
INNER JOIN table2 ON condition(s)
...
INNER JOIN tableN ON condition(s);
I can write an equivalent statement with this:
SELECT column(s) FROM table1, table2, ..., tableN WHERE condition(s);
Notice how I use WHERE to set my conditions in the second statement.
Question: can I write equivalent statements using WHERE to set my conditions for any OUTER (LEFT/RIGHT) JOIN statements as well?
No, not in ANSI SQL or MySQL. Some other databases have their own syntax that was used before the ANSI JOIN syntax was accepted. For example in Oracle 8:
WHERE table1.id=table2.thing (+)
But today the ANSI JOIN syntax should generally be preferred for both kinds of join.
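To illustrate why the answer is "no": with only a comma and WHERE you get an inner join, and producing the unmatched, NULL-padded rows takes a second query. This sketch (SQLite as a stand-in, invented data, table names borrowed from the Oracle snippet) shows one such emulation with UNION ALL and NOT EXISTS; it is a workaround for comparison, not a recommended replacement for LEFT JOIN.

```python
import sqlite3

# SQLite stand-in (assumption); rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (thing INTEGER, info TEXT);
    INSERT INTO table1 VALUES (1, 'matched'), (2, 'unmatched');
    INSERT INTO table2 VALUES (1, 'info for 1');
""")

left_join = sorted(conn.execute("""
    SELECT table1.id, table1.name, table2.info
    FROM table1 LEFT JOIN table2 ON table1.id = table2.thing
""").fetchall())

# Comma + WHERE alone is an inner join: the unmatched row disappears.
comma_where = sorted(conn.execute("""
    SELECT table1.id, table1.name, table2.info
    FROM table1, table2 WHERE table1.id = table2.thing
""").fetchall())

# Emulation without JOIN keywords: add the unmatched rows padded with NULL.
emulated = sorted(conn.execute("""
    SELECT table1.id, table1.name, table2.info
    FROM table1, table2 WHERE table1.id = table2.thing
    UNION ALL
    SELECT table1.id, table1.name, NULL
    FROM table1
    WHERE NOT EXISTS (SELECT 1 FROM table2 WHERE table2.thing = table1.id)
""").fetchall())

print(comma_where)            # [(1, 'matched', 'info for 1')]
print(emulated == left_join)  # True
```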
I am no MySQL expert, but in Oracle the syntax for an outer join in a WHERE condition is WHERE t1.id = t2.id(+)
From the MySQL 5.0 Reference Manual:
"INNER JOIN and , (comma) are semantically equivalent in the absence of a join condition"
So to me, that reads as saying that a comma implies an INNER JOIN.
I recommend you use ANSI join syntax for greater clarity and portability.
SELECT ...
FROM table1 t1
RIGHT OUTER JOIN table2 t2 ON (t1.x = t2.y AND somecondition)
In Sybase you can use table1.field1 *= table2.field1 for a left outer join and table1.field1 =* table2.field1 for a right outer join.

thoughts on innerjoin mysql

We have tables with more than 3M records. When using INNER JOIN, it is much slower than select * from db1,db2 where db1.field=db2.field
Any thoughts?
INNER JOIN should not be any different from a SELECT FROM t1,t2 WHERE t1.c=t2.c, it is just a different syntax for doing the same thing and is treated the same by the optimiser.
Any difference in performance is in some other aspect of the query. Please post:
The schema of both tables including their indexes (SHOW CREATE TABLE gives you this)
Both the queries you're comparing
Some detail about your performance testing methodology (it may be flawed)
The EXPLAIN output of both queries.
If you want a reasonable answer.
SELECT * from t1, t2 where t1.id = t2.id
is equivalent to
SELECT * from t1 INNER JOIN t2 on t1.id = t2.id.
However, if there are other criteria in the SQL query, then the behaviour may differ. For instance,
SELECT * from t1, t2 where t1.id = t2.id and t1.col1 is not null;
can be written in two different ways with the INNER JOIN:
SELECT * from t1 INNER JOIN t2 on t1.id = t2.id and t1.col1 is not null
or
SELECT * from t1 INNER JOIN t2 on t1.id = t2.id
WHERE t1.col1 is not null
Whether this ends up as the same query depends on the optimiser and on the complexity of the other parts of the query; the EXPLAIN output will tell you whether you are executing the same plan.
Why might the above queries differ? Because the NOT NULL restriction is written at different stages of the query, which may have an impact on performance. For an INNER JOIN the rows returned are the same either way; the number of rows can only change with an OUTER JOIN, where a condition in ON and the same condition in WHERE mean different things.
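The ON-versus-WHERE question can be tested directly. In this sketch (SQLite as a stand-in for MySQL, made-up rows), the NOT NULL filter gives identical results in ON and in WHERE for the INNER JOIN, but changes the row count for a LEFT JOIN: in ON it only decides whether t2 matches, while in WHERE it removes rows after the join.

```python
import sqlite3

# SQLite stand-in (assumption); invented data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, col1 TEXT);
    CREATE TABLE t2 (id INTEGER, col2 TEXT);
    INSERT INTO t1 VALUES (1, 'x'), (2, NULL), (3, 'z');
    INSERT INTO t2 VALUES (1, 'a'), (2, 'b');
""")

def run(join, where_clause=""):
    sql = f"SELECT t1.id, t1.col1, t2.col2 FROM t1 {join} {where_clause} ORDER BY t1.id"
    return conn.execute(sql).fetchall()

inner_on = run("INNER JOIN t2 ON t1.id = t2.id AND t1.col1 IS NOT NULL")
inner_where = run("INNER JOIN t2 ON t1.id = t2.id", "WHERE t1.col1 IS NOT NULL")
left_on = run("LEFT JOIN t2 ON t1.id = t2.id AND t1.col1 IS NOT NULL")
left_where = run("LEFT JOIN t2 ON t1.id = t2.id", "WHERE t1.col1 IS NOT NULL")

print(inner_on == inner_where)  # True
print(left_on)     # [(1, 'x', 'a'), (2, None, None), (3, 'z', None)]
print(left_where)  # [(1, 'x', 'a'), (3, 'z', None)]
```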
In general, the ...where db1.field=db2.field... syntax is an inner join. It's just the implicit notation instead of the explicit. If you're joining on the same columns and returning the same columns, performance should be identical. More: http://en.wikipedia.org/wiki/Join_(SQL)#Inner_join
I generally use explicit INNER JOIN or LEFT JOIN syntax according to needs. When the optimizer does a bad job, a STRAIGHT_JOIN can often sort it out, with suitable rearrangement of the query.
With any join involving large tables, it's worth using EXPLAIN.
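As a small illustration of reading plans, this sketch uses SQLite's EXPLAIN QUERY PLAN (MySQL's EXPLAIN output looks different; SQLite is only a stand-in here). The index name idx_db2_field is hypothetical. The plan is printed before and after indexing the join column, and the second plan is expected to look rows up through the new index.

```python
import sqlite3

# SQLite stand-in (assumption); idx_db2_field is a hypothetical index name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE db1 (field INTEGER, payload TEXT);
    CREATE TABLE db2 (field INTEGER, payload TEXT);
""")
query = "SELECT * FROM db1 INNER JOIN db2 ON db1.field = db2.field"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(query)
conn.execute("CREATE INDEX idx_db2_field ON db2 (field)")
after = plan(query)

print("before:", before)
print("after: ", after)
```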