Speaking in principle, which one is faster:
SELECT * FROM t1 INNER JOIN t2 ON t1.c = t2.c INNER JOIN t3 ON t2.c = t3.c
vs
SELECT * FROM t1 INNER JOIN t2 USING (c) INNER JOIN t3 USING (c)
The easiest way for you to tell this would be to look at your explain plan. If you look at both, you'll probably see zero difference.
The using() keyword here is simply a shorthand expression. It evaluates to the same thing as your other option, and therefore makes no difference to performance.
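One way to check this for yourself is to run both forms against a small data set and compare the rows and the plans. The sketch below is only an illustration: it uses Python's built-in sqlite3 module rather than your own database, and the columns v1, v2, v3 and the sample rows are made up. Plans on other engines will look different, so treat the printed plan as something to compare by eye, not as proof for MySQL or SQL Server.

```python
import sqlite3

# Hypothetical sample data; only the column c comes from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (c INTEGER, v1 TEXT);
    CREATE TABLE t2 (c INTEGER, v2 TEXT);
    CREATE TABLE t3 (c INTEGER, v3 TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (1, 'x'), (2, 'y');
    INSERT INTO t3 VALUES (2, 'p'), (3, 'q');
""")

on_query = """
    SELECT t1.c, v1, v2, v3
    FROM t1
    INNER JOIN t2 ON t1.c = t2.c
    INNER JOIN t3 ON t2.c = t3.c
"""
using_query = """
    SELECT t1.c, v1, v2, v3
    FROM t1
    INNER JOIN t2 USING (c)
    INNER JOIN t3 USING (c)
"""

on_rows = sorted(conn.execute(on_query).fetchall())
using_rows = sorted(conn.execute(using_query).fetchall())
print(on_rows == using_rows)  # True

# Print each plan so the two can be compared by eye.
for label, query in (("ON", on_query), ("USING", using_query)):
    print(label)
    for row in conn.execute("EXPLAIN QUERY PLAN " + query):
        print("   ", row[-1])  # last column holds the plan step description
```

Note that the select list names its columns explicitly: with SELECT *, the USING form merges the shared column c into one output column, so the two forms would not return identically shaped rows.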
The FROM clause of a SQL statement is logically processed before any other part, and both statements do their joining in the FROM clause (just don't move the join criteria into a WHERE clause). Assuming both are syntactically correct, the bigger questions are whether the joins run many-to-one (M:1) at each step and whether there are indexes on both the primary and foreign keys. Then run them through a query analyzer to gauge the actual performance.
Related
It seems to me that you can do the same thing in a SQL query using either NOT EXISTS, NOT IN, or LEFT JOIN WHERE IS NULL. For example:
SELECT a FROM table1 WHERE a NOT IN (SELECT a FROM table2)
SELECT a FROM table1 WHERE NOT EXISTS (SELECT * FROM table2 WHERE table1.a = table2.a)
SELECT table1.a FROM table1 LEFT JOIN table2 ON table1.a = table2.a WHERE table2.a IS NULL
I'm not sure if I got all the syntax correct, but these are the general techniques I've seen. Why would I choose to use one over the other? Does performance differ...? Which one of these is the fastest / most efficient? (If it depends on implementation, when would I use each one?)
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: PostgreSQL
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL
In a nutshell:
NOT IN is a little bit different: it never matches if there is but a single NULL in the list.
In MySQL, NOT EXISTS is a little bit less efficient
In SQL Server, LEFT JOIN / IS NULL is less efficient
In PostgreSQL, NOT IN is less efficient
In Oracle, all three methods are the same.
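The NULL behaviour of NOT IN mentioned in the first point is easy to reproduce. This is a minimal sketch using Python's sqlite3 module with made-up rows (SQLite is not one of the four engines above, but it follows the same three-valued logic for NOT IN); it says nothing about relative speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a INTEGER);
    CREATE TABLE table2 (a INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (2), (NULL);  -- a single NULL in the list
""")

def fetch(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]

not_in = fetch("""
    SELECT a FROM table1
    WHERE a NOT IN (SELECT a FROM table2)
    ORDER BY a
""")
not_exists = fetch("""
    SELECT a FROM table1
    WHERE NOT EXISTS (SELECT * FROM table2 WHERE table1.a = table2.a)
    ORDER BY a
""")
left_join = fetch("""
    SELECT table1.a
    FROM table1 LEFT JOIN table2 ON table1.a = table2.a
    WHERE table2.a IS NULL
    ORDER BY table1.a
""")

print(not_in)      # []      (1 NOT IN (2, NULL) is unknown, so no row passes)
print(not_exists)  # [1, 3]
print(left_join)   # [1, 3]
```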
If the database is good at optimising the query, the two first will be transformed to something close to the third.
For simple situations like the ones in your question, there should be little or no difference, as they will all be executed as joins. In more complex queries, the database might not be able to make a join out of the NOT IN and NOT EXISTS queries. In that case the queries will get a lot slower. On the other hand, a join may also perform badly if there is no index that can be used, so just because you use a join doesn't mean that you are safe. You would have to examine the execution plan of the query to tell if there may be any performance problems.
Assuming you are avoiding nulls, they are all ways of writing an anti-join using Standard SQL.
An obvious omission is the equivalent using EXCEPT:
SELECT a FROM table1
EXCEPT
SELECT a FROM table2
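One caveat worth knowing before swapping EXCEPT in: as a set operator it returns distinct rows, so it is not a drop-in replacement when table1 contains duplicates. A small sketch with made-up rows, run through Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a INTEGER);
    CREATE TABLE table2 (a INTEGER);
    INSERT INTO table1 VALUES (1), (1), (3);  -- note the duplicate 1
    INSERT INTO table2 VALUES (3);
""")

except_rows = [row[0] for row in conn.execute(
    "SELECT a FROM table1 EXCEPT SELECT a FROM table2"
)]
anti_join_rows = [row[0] for row in conn.execute("""
    SELECT a FROM table1
    WHERE NOT EXISTS (SELECT * FROM table2 WHERE table2.a = table1.a)
""")]

print(except_rows)     # [1]     duplicates collapsed
print(anti_join_rows)  # [1, 1]  duplicates preserved
```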
Note in Oracle you need to use the MINUS operator (arguably a better name):
SELECT a FROM table1
MINUS
SELECT a FROM table2
Speaking of proprietary syntax, there may also be non-Standard equivalents worth investigating depending on the product you are using e.g. OUTER APPLY in SQL Server (something like):
SELECT t1.a
FROM table1 t1
OUTER APPLY
(
SELECT t2.a
FROM table2 t2
WHERE t2.a = t1.a
) AS dt1
WHERE dt1.a IS NULL;
When you need to insert data into a table with a multi-field primary key, consider that it can be much faster (I tried this in Access, but I expect it holds for most databases) not to check first that no record with those key values exists. Instead, just insert into the table and let the primary key reject the duplicates, so that no record is stored twice.
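As a rough illustration of that idea, here is a sketch using Python's sqlite3 module. The orders table and its columns are hypothetical, and INSERT OR IGNORE is SQLite's spelling; MySQL has INSERT IGNORE and PostgreSQL has INSERT ... ON CONFLICT DO NOTHING, while Access handles the rejected rows in its own way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a two-column (composite) primary key.
conn.execute("""
    CREATE TABLE orders (
        customer_id INTEGER,
        product_id  INTEGER,
        qty         INTEGER,
        PRIMARY KEY (customer_id, product_id)
    )
""")

incoming = [(1, 10, 5), (1, 11, 2), (1, 10, 9), (2, 10, 1)]  # third row repeats key (1, 10)

# No "does it already exist?" check: the key constraint skips the repeat.
conn.executemany("INSERT OR IGNORE INTO orders VALUES (?, ?, ?)", incoming)

stored = conn.execute(
    "SELECT customer_id, product_id, qty FROM orders ORDER BY customer_id, product_id"
).fetchall()
print(stored)  # [(1, 10, 5), (1, 11, 2), (2, 10, 1)]
```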
From a performance perspective, try to avoid negated predicates such as NOT IN and NOT EXISTS where you can,
because to evaluate the negation the DBMS may have to run through all the available rows and then discard the ones that match.
I have this query
SELECT t1.col1
FROM table1 t1
INNER JOIN table2 t2 ON t2.col1 = t1.col1
WHERE t2.col IN (1, 2, 3)
and this query
SELECT t1.col1
FROM table1 t1
INNER JOIN table2 t2 ON t2.col1 = t1.col1 AND t2.col IN (1, 2, 3)
Both gave me the same execution plan, and both showed
Using where;
Does that mean the optimizer converts my second query to the first form, and that I should use the first form?
And is there a way to see the query as rewritten by the optimizer?
SQL is not a procedural language. It is a descriptive language. A query describes the result set that you want to produce.
With an inner join, the two queries in your question are identical -- they produce the same result set under all circumstances. Which to prefer is a stylistic preference. MySQL should treat the two the same way from an optimization perspective.
One common preference is that filters on a single table belong in WHERE rather than in ON.
With an outer join, the two queries are not the same, and you should use the one that expresses your intent.
The first one filters after joining the two tables; while the second one joins using the filtering of the values as a condition of joining the two tables. You may check this thread:
SQL join: where clause vs. on clause
You can time both queries, but either way you can write the query in both forms; the optimizer will try to optimize it whether it is written the first way or the second.
I like this rule:
ON ... says how the tables are related.
WHERE ... filters the results.
This becomes important with LEFT JOIN because ON does not act as a filter.
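To make the difference concrete, here is a small sketch that runs the question's two forms as both INNER and LEFT joins. It uses Python's sqlite3 module; the table and column names follow the question, while the sample rows and the run() helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER);
    CREATE TABLE table2 (col1 INTEGER, col INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1, 1), (2, 9);  -- row (2, 9) fails the IN filter
""")

def run(join_type, filter_in):
    """Build the query with the filter either in WHERE or in ON."""
    if filter_in == "where":
        tail = "ON t2.col1 = t1.col1 WHERE t2.col IN (1, 2, 3)"
    else:
        tail = "ON t2.col1 = t1.col1 AND t2.col IN (1, 2, 3)"
    sql = (f"SELECT t1.col1, t2.col FROM table1 t1 "
           f"{join_type} JOIN table2 t2 {tail} ORDER BY t1.col1")
    return conn.execute(sql).fetchall()

inner_where = run("INNER", "where")
inner_on = run("INNER", "on")
left_where = run("LEFT", "where")
left_on = run("LEFT", "on")

print(inner_where)  # [(1, 1)]
print(inner_on)     # [(1, 1)]  same as above
print(left_where)   # [(1, 1)]  WHERE throws away the NULL-extended rows
print(left_on)      # [(1, 1), (2, None), (3, None)]  every table1 row survives
```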
This shows you that the Optimizer treats them the same:
EXPLAIN EXTENDED SELECT ...
SHOW WARNINGS;
When I execute a MySQL query like
select * from t1 where column1 in (select column1 from t2),
what really happens?
I want to know whether it executes the inner statement for every row.
PS: I have 300,000 rows in t1 and 50,000 rows in t2 and it is taking a hell of a time.
I'm flabbergasted to see that everyone suggests a JOIN as if it were the same thing. IT IS NOT, at least not with the information given here. E.g., what if t2.column1 has duplicates?
=> Assuming there are no duplicates in t2.column1, then yes, put a UNIQUE INDEX on that column and use a JOIN construction, as it is more readable and easier to maintain. Whether it will be faster depends on what the query engine makes of it. In MSSQL the query optimizer would (probably) consider them the same thing; maybe MySQL is 'not so eager' to recognize this... I don't know.
=> Assuming there can be duplicates in t2.column1, put a (non-unique) INDEX on that column and rewrite the WHERE IN (SELECT ..) into a WHERE EXISTS (SELECT * FROM t2 WHERE t2.column1 = t1.column1). Again, mostly for readability and ease of maintenance; most likely the query engine will treat them the same...
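The duplicates problem can be seen with three rows. The following is a sketch with made-up data, run through Python's sqlite3 module rather than MySQL; the point is only the row counts, not the timing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (column1 INTEGER);
    CREATE TABLE t2 (column1 INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1), (1), (2);  -- value 1 appears twice in t2
""")

def fetch(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]

in_rows = fetch("""
    SELECT column1 FROM t1
    WHERE column1 IN (SELECT column1 FROM t2)
    ORDER BY column1
""")
exists_rows = fetch("""
    SELECT column1 FROM t1
    WHERE EXISTS (SELECT * FROM t2 WHERE t2.column1 = t1.column1)
    ORDER BY column1
""")
join_rows = fetch("""
    SELECT t1.column1
    FROM t1 INNER JOIN t2 ON t1.column1 = t2.column1
    ORDER BY t1.column1
""")

print(in_rows)      # [1, 2]
print(exists_rows)  # [1, 2]
print(join_rows)    # [1, 1, 2]  the JOIN repeats t1's row once per match in t2
```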
The things to remember are
Always make sure you have proper indexing (but don't go overboard)
Always realize that what really happens will be an interpretation of your sql-code; not a 'direct translation'. You can write the same functionality in different ways to achieve the same goal. And some of these are indeed more resilient to different scenarios.
If you only have 10 rows, pretty much everything works. If you have 10M rows, it could be worth examining the query plan... which will most likely be different from the one with 10 rows.
A join would be quicker, viz:
select t1.* from t1 INNER JOIN t2 on t1.column1=t2.column1
Try with INNER JOIN
SELECT t1.*
FROM t1
INNER JOIN t2 ON t1.column1=t2.column1
You should index column1 in both tables, and then you can use an inner join.
To create the indexes:
CREATE INDEX index1 ON t1 (column1);
CREATE INDEX index2 ON t2 (column1);
select t1.* from t1 INNER JOIN t2 on t1.column1=t2.column1
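To confirm that an index is actually picked up, look at the plan after creating it. The sketch below does this with Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN; in MySQL you would use EXPLAIN instead. The index names and the payload column are made up, and which table the engine scans and which it searches may differ from what you see on your own data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (column1 INTEGER, payload TEXT);
    CREATE TABLE t2 (column1 INTEGER);
    CREATE INDEX idx_t1_column1 ON t1 (column1);  -- hypothetical index names
    CREATE INDEX idx_t2_column1 ON t2 (column1);
""")

plan = [row[-1] for row in conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT t1.* FROM t1 INNER JOIN t2 ON t1.column1 = t2.column1
""")]

for step in plan:
    print(step)

# True when at least one plan step mentions one of the indexes created above.
uses_named_index = any(
    "idx_t1_column1" in step or "idx_t2_column1" in step for step in plan
)
print(uses_named_index)
```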
What is the difference between these queries? Which one should I prefer, and why?
SELECT
t1.*,
t2.x
FROM
t1,
t2
WHERE
t2.`id` = t1.`id`
or
SELECT
t1.*,
t2.x
FROM
t1
INNER JOIN # LEFT JOIN ?
t2
ON t2.`id` = t1.`id`
Does using commas have the same effect as using LEFT JOINs?
That's embarrassing. It's the first time in years I've asked myself about this. I've always used the first version, but now I feel like I missed a few lines in my first SQL introduction. ;)
The "comma" syntax is equivalent to the INNER JOIN syntax. The query optimizer should be running the same query for you regardless of how you ask for it. There's a general recommendation to do your joins using the JOIN language and do your filtering using the WHERE language as it makes your intention more clear
The queries are the same; the first is just shorthand syntax for the second. LEFT JOIN is something entirely different.
INNER JOIN only shows the records common to both tables. LEFT JOIN takes all records from the left table and matches them to those of the right table. If no records in the right table match, NULL will be selected in their place.
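A quick way to see all three side by side is the sketch below, which uses Python's sqlite3 module with made-up rows (the name column and the sample values are assumptions; id and x come from the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, name TEXT);
    CREATE TABLE t2 (id INTEGER, x TEXT);
    INSERT INTO t1 VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO t2 VALUES (1, 'x1'), (3, 'x3');  -- no row for id 2
""")

comma_rows = conn.execute("""
    SELECT t1.id, t2.x FROM t1, t2
    WHERE t2.id = t1.id ORDER BY t1.id
""").fetchall()
inner_rows = conn.execute("""
    SELECT t1.id, t2.x FROM t1 INNER JOIN t2
    ON t2.id = t1.id ORDER BY t1.id
""").fetchall()
left_rows = conn.execute("""
    SELECT t1.id, t2.x FROM t1 LEFT JOIN t2
    ON t2.id = t1.id ORDER BY t1.id
""").fetchall()

print(comma_rows)  # [(1, 'x1'), (3, 'x3')]
print(inner_rows)  # [(1, 'x1'), (3, 'x3')]  identical to the comma form
print(left_rows)   # [(1, 'x1'), (2, None), (3, 'x3')]  id 2 kept, x is NULL
```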
Inner joins are left joins that skip rows with empty (NULL) results from the second table.
Query optimizers have more possibilities when everything is done in the WHERE clause. Some RDBMSs don't support commands like NATURAL JOIN and other join types, so using WHERE is something one can really rely on.
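The first query uses the comma syntax, which forms a Cartesian product of the tables and relies on the WHERE clause to narrow it down; if that condition is missing or incomplete, it usually produces undesired results on a large database.
Using explicit joins instead makes it easier to get exactly what you want.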
When I'm selecting data from multiple tables I used to use JOINS a lot and recently I started to use another way but I'm unsure of the impact in the long run.
Examples:
SELECT * FROM table_1 LEFT JOIN table_2 ON (table_1.column = table_2.column)
So this is your basic LEFT JOIN across tables but take a look at the query below.
SELECT * FROM table_1,table_2 WHERE table_1.column = table_2.column
Personally, if I were joining across, let's say, 7 tables of data, I would prefer to do this over JOINs.
But are there any pros and cons in regards to the 2 methods ?
Second method is a shortcut for INNER JOIN.
SELECT * FROM table_1 INNER JOIN table_2 ON table_1.column = table_2.column
This will only select records that match the condition in both tables (LEFT JOIN will select all records from the table on the left, plus the matching records from the table on the right).
Quote from http://dev.mysql.com/doc/refman/5.0/en/join.html
[...] we consider each comma in a list of table_reference items as equivalent to an inner join
And
INNER JOIN and , (comma) are semantically equivalent in the absence of a join condition: both produce a Cartesian product between the specified tables (that is, each and every row in the first table is joined to each and every row in the second table).
However, the precedence of the comma operator is less than that of INNER JOIN, CROSS JOIN, LEFT JOIN, and so on. If you mix comma joins with the other join types when there is a join condition, an error of the form Unknown column 'col_name' in 'on clause' may occur. Information about dealing with this problem is given later in this section.
In general there are quite a few things mentioned there, that should make you consider not using commas.
The first method is the ANSI/ISO version of the Join. The second method is the older format (ANSI-89, predating the SQL-92 JOIN syntax) to produce the equivalent of an Inner Join. It does this by cross joining all the tables you list and then narrowing the Cartesian product in the Where clause to produce the equivalent of an inner join.
I would strongly recommend against the second method.
It is harder for other developers to read
It breaks the rule of least astonishment to other developers who will wonder whether you simply did not know any better or if there was some specific reason for not using the ANSI/ISO format.
It will cause you grief when you start trying to use that format with something other than Inner Joins.
It makes it harder to discern your intent especially in a large query with many tables. Are all of these tables supposed to be inner joins? Did you miss something in the Where clause and create a cross join? Did you intend to make a cross join? Etc.
There is simply no reason to use the second format and in fact many database systems are ending support for that format.
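The accidental cross join mentioned above is easy to trigger with the comma format. A minimal sketch using Python's sqlite3 module; the column name col and the row counts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (col INTEGER);
    CREATE TABLE table_2 (col INTEGER);
    INSERT INTO table_1 VALUES (1), (2), (3);
    INSERT INTO table_2 VALUES (1), (2), (3), (4);
""")

# Intended query: comma join narrowed by the WHERE clause.
with_where = conn.execute("""
    SELECT COUNT(*) FROM table_1, table_2
    WHERE table_1.col = table_2.col
""").fetchone()[0]

# Same query with the WHERE clause forgotten: every row pairs with every row.
without_where = conn.execute("""
    SELECT COUNT(*) FROM table_1, table_2
""").fetchone()[0]

print(with_where)     # 3
print(without_where)  # 12  (3 rows x 4 rows)
```

With seven tables in the FROM list, one missing condition multiplies the result in the same way, and nothing in the query text marks it as a mistake.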
ANSI Syntax
Both queries are JOINs, and both use ANSI syntax but one is older than the other.
Joins written with the JOIN keyword use ANSI-92 syntax. ANSI-89 syntax is when you have comma-separated tables in the FROM clause and the criteria that join them are in the WHERE clause. When comparing INNER JOINs, there is no performance difference - this:
SELECT *
FROM table_1 t1, table_2 t2
WHERE t1.column = t2.column
...will produce the same query plan as:
SELECT *
FROM TABLE_1 t1
JOIN TABLE_2 t2 ON t2.column = t1.column
Apples to Oranges
Another difference is that the two queries are not identical - a LEFT [OUTER] JOIN will produce all rows from TABLE_1, and references to TABLE_2 in the output will be NULL if there's no match based on the JOIN criteria (specified in the ON clause). The second example is an INNER JOIN, which will only produce rows that have matching records in TABLE_2. Here's a link to a visual representation of JOINs to reinforce the difference...
Pros/Cons
The main reason to use ANSI-92 syntax is because ANSI-89 doesn't have any OUTER JOIN (LEFT, RIGHT, FULL) support. ANSI-92 syntax was specifically introduced to address this shortcoming, because vendors were implementing their own, custom syntax. Oracle used (+); SQL Server used an asterisk on the side of the equals in the join criteria (IE: t1.column =* t2.column).
The next reason to use ANSI-92 syntax is that it's more explicit and more readable, and it separates the criteria used for joining tables from the actual filtering.
I personally feel the explicit join syntax (A JOIN B, A LEFT JOIN B) is preferable. Both because it's more explicit about what you're doing, and because if you use implicit join syntax for inner joins, you still have to use the explicit syntax for outer joins and thus your SQL formatting will be inconsistent.