MySQL: SELECT "FROM t1, t2" or "FROM t1 JOIN t2 ON"?

What is the difference between these queries? Which one should I prefer, and why?
SELECT
t1.*,
t2.x
FROM
t1,
t2
WHERE
t2.`id` = t1.`id`
or
SELECT
t1.*,
t2.x
FROM
t1
INNER JOIN # LEFT JOIN ?
t2
ON t2.`id` = t1.`id`
Does using commas have the same effect as using LEFT JOINs?
That's embarrassing. It's the first time in years that I've asked myself about this. I have always used the first version, but now I feel like I missed a few lines in my first SQL introduction. ;)

The "comma" syntax is equivalent to the INNER JOIN syntax. The query optimizer should run the same query for you regardless of how you write it. The general recommendation is to express your joins with JOIN ... ON and your filtering with WHERE, because that makes your intention clearer.
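A quick way to convince yourself of the equivalence is to run both forms side by side. This minimal sketch uses Python's built-in sqlite3 module as a stand-in for MySQL (the comma/INNER JOIN equivalence is standard SQL, but SQLite's optimizer is not MySQL's), with made-up tables t1 and t2:

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; tables and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, name TEXT);
    CREATE TABLE t2 (id INTEGER, x TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (1, 'x1'), (3, 'x3'), (4, 'x4');
""")

# Implicit (comma) join: the join condition lives in WHERE.
comma_rows = conn.execute(
    "SELECT t1.*, t2.x FROM t1, t2 WHERE t2.id = t1.id ORDER BY t1.id"
).fetchall()

# Explicit INNER JOIN: the same condition lives in ON.
join_rows = conn.execute(
    "SELECT t1.*, t2.x FROM t1 INNER JOIN t2 ON t2.id = t1.id ORDER BY t1.id"
).fetchall()

print(comma_rows)               # [(1, 'a', 'x1'), (3, 'c', 'x3')]
print(comma_rows == join_rows)  # True
```

On a real MySQL server you would compare the EXPLAIN output of the two statements rather than just their results.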

The queries are the same; the first is just shorthand syntax for the second. LEFT JOIN is something entirely different.
INNER JOIN only shows the records common to both tables. LEFT JOIN takes all records from the left table and matches them to those of the right table. If no records in the right table match, NULL is selected in their place.
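To see the INNER JOIN / LEFT JOIN difference concretely, here is a small sketch, again with Python's sqlite3 standing in for MySQL and hypothetical data in which t1 row 2 has no partner in t2:

```python
import sqlite3

# SQLite stands in for MySQL; t1 row 2 deliberately has no match in t2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, name TEXT);
    CREATE TABLE t2 (id INTEGER, x TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO t2 VALUES (1, 'x1');
""")

# INNER JOIN keeps only rows that have a partner in both tables.
inner = conn.execute(
    "SELECT t1.id, t2.x FROM t1 INNER JOIN t2 ON t2.id = t1.id ORDER BY t1.id"
).fetchall()

# LEFT JOIN keeps every t1 row and fills missing t2 columns with NULL (None).
left = conn.execute(
    "SELECT t1.id, t2.x FROM t1 LEFT JOIN t2 ON t2.id = t1.id ORDER BY t1.id"
).fetchall()

print(inner)  # [(1, 'x1')]
print(left)   # [(1, 'x1'), (2, None)]
```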

Inner joins are like left joins that skip the rows with empty (NULL) results in the second table.
Query optimizers have more possibilities when everything is done in the WHERE clause. Some RDBMSs don't support commands like NATURAL JOIN and other joins, so using WHERE is something one can really rely on.

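The first query is a Cartesian product (cross join) narrowed down by the WHERE clause; if the join condition is missing or incomplete, it produces undesired results, especially in a large database.
Using explicit joins instead makes it clear that you get exactly what you want.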
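The Cartesian-product risk shows up when the join condition is forgotten. In this sketch (Python's sqlite3 as a stand-in for MySQL, hypothetical tables of 3 and 4 rows), dropping the WHERE condition, or writing INNER JOIN with no ON, multiplies the row counts:

```python
import sqlite3

# Hypothetical tables: only ids 1 and 2 appear in both.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1), (2), (5), (6);
""")

def count(sql):
    """Return the single COUNT(*) value produced by sql."""
    return conn.execute(sql).fetchone()[0]

with_condition = count("SELECT COUNT(*) FROM t1, t2 WHERE t2.id = t1.id")
forgot_condition = count("SELECT COUNT(*) FROM t1, t2")        # 3 x 4 rows
inner_no_on = count("SELECT COUNT(*) FROM t1 INNER JOIN t2")   # also 3 x 4 rows

print(with_condition, forgot_condition, inner_no_on)  # 2 12 12
```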

Related

INNER JOIN: condition in ON, or in WHERE?

I have this query:
SELECT t1.col1
FROM table1 t1
INNER JOIN table2 t2 ON t2.col1 = t1.col1
WHERE t2.col IN (1, 2, 3)
and this query
SELECT t1.col1
FROM table1 t1
INNER JOIN table2 t2 ON t2.col1 = t1.col1 AND t2.col IN (1, 2, 3)
Both gave me the same execution plan, which told me:
Using where;
Does that mean the optimizer converts my second query into the first form, and that I should follow the first form?
And is there a way to see the query as the optimizer rewrote it?
SQL is not a procedural language. It is a descriptive language. A query describes the result set that you want to produce.
With an inner join, the two queries in your question are identical -- they produce the same result set under all circumstances. Which to prefer is a stylistic preference. MySQL should treat the two the same way from an optimization perspective.
One preference is that filters on a single table are more appropriate in WHERE than in ON.
With an outer join, the two queries are not the same, and you should use the one that expresses your intent.
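Here is a sketch of the inner-versus-outer distinction above, using Python's sqlite3 as a stand-in for MySQL and the table1/table2 names from the question with made-up data. Moving the IN filter between ON and WHERE leaves the INNER JOIN result unchanged but changes the LEFT JOIN result:

```python
import sqlite3

# Made-up data: table2 matches col1 = 1 (col 1) and col1 = 2 (col 5); 3 has no match.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER);
    CREATE TABLE table2 (col1 INTEGER, col INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1, 1), (2, 5);
""")

def run(join_kind, filter_in_on):
    """Join table1 to table2, placing the IN filter either in ON or in WHERE."""
    cond = "t2.col1 = t1.col1"
    flt = "t2.col IN (1, 2, 3)"
    if filter_in_on:
        sql = f"SELECT t1.col1, t2.col FROM table1 t1 {join_kind} table2 t2 ON {cond} AND {flt}"
    else:
        sql = f"SELECT t1.col1, t2.col FROM table1 t1 {join_kind} table2 t2 ON {cond} WHERE {flt}"
    return conn.execute(sql + " ORDER BY t1.col1").fetchall()

inner_where = run("INNER JOIN", filter_in_on=False)
inner_on = run("INNER JOIN", filter_in_on=True)
left_where = run("LEFT JOIN", filter_in_on=False)
left_on = run("LEFT JOIN", filter_in_on=True)

print(inner_where, inner_on)  # [(1, 1)] [(1, 1)]
print(left_where)             # [(1, 1)]
print(left_on)                # [(1, 1), (2, None), (3, None)]
```

With the LEFT JOIN, the filter in ON only decides which table2 rows get attached; every table1 row still survives.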
The first one filters after joining the two tables, while the second one uses the filter on the values as part of the condition for joining the two tables. You may check this thread:
SQL join: where clause vs. on clause
You can time both queries, but either way you can write the query in both forms; the optimizer will try to optimize it whether you use the first form or the second one.
I like this rule:
ON ... says how the tables are related.
WHERE ... filters the results.
This becomes important with LEFT JOIN because ON does not act as a filter.
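This shows you that the optimizer treats them the same (EXPLAIN EXTENDED is deprecated in MySQL 5.7 and removed in 8.0; there, plain EXPLAIN followed by SHOW WARNINGS shows the rewritten query):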
EXPLAIN EXTENDED SELECT ...
SHOW WARNINGS;

MySQL Performance - JOIN ON vs JOIN USING on multiple tables

Speaking in principle, which one is faster:
SELECT * FROM t1 INNER JOIN t2 ON t2.c = t1.c INNER JOIN t3 ON t3.c = t2.c
vs
SELECT * FROM t1 INNER JOIN t2 USING (c) INNER JOIN t3 USING (c)
The easiest way for you to tell this would be to look at your explain plan. If you look at both, you'll probably see zero difference.
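The USING() clause here is simply a shorthand expression. It evaluates to the same join as your other option, and therefore makes no difference to performance. The only visible difference is that SELECT * lists the shared column c once instead of once per table.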
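As a sketch of the shorthand claim (Python's sqlite3 standing in for MySQL, hypothetical tables t1, t2, t3 sharing a column c), the ON and USING forms return the same rows for the same select list; only SELECT * differs, because USING merges the shared column:

```python
import sqlite3

# Hypothetical tables that all share the join column c.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (c INTEGER, a TEXT);
    CREATE TABLE t2 (c INTEGER, b TEXT);
    CREATE TABLE t3 (c INTEGER, d TEXT);
    INSERT INTO t1 VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO t2 VALUES (1, 'b1'), (2, 'b2');
    INSERT INTO t3 VALUES (1, 'd1');
""")

ON_JOIN = "FROM t1 INNER JOIN t2 ON t2.c = t1.c INNER JOIN t3 ON t3.c = t2.c"
USING_JOIN = "FROM t1 INNER JOIN t2 USING (c) INNER JOIN t3 USING (c)"

# Same explicit select list, so the rows should be identical.
on_rows = conn.execute(f"SELECT t1.c, a, b, d {ON_JOIN} ORDER BY t1.c").fetchall()
using_rows = conn.execute(f"SELECT t1.c, a, b, d {USING_JOIN} ORDER BY t1.c").fetchall()

# SELECT * differs: the USING form drops the duplicated c columns.
on_cols = [col[0] for col in conn.execute(f"SELECT * {ON_JOIN}").description]
using_cols = [col[0] for col in conn.execute(f"SELECT * {USING_JOIN}").description]

print(on_rows == using_rows)
print(on_cols)
print(using_cols)
```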
The FROM part of any SQL statement is logically processed before any other part, and both statements do their joining in the FROM portion (just don't add the join criteria to a WHERE clause). Assuming both have correct syntax, the bigger question is whether these joins flow in an M:1, M:1 context and whether there are indexes on both the primary and foreign keys. Then run them through a query analyzer to gauge the actual performance.

Why does LIKE '%%' depend on irrelevant joins?

I have a huge query which runs quite well by itself. It has a lot of join statements. So, its structure looks like this:
SELECT ... FROM mytable t
LEFT OUTER JOIN mytable2 t2 ON t2.attr = t.attr1
LEFT OUTER JOIN mytable3 t3 ON t3.attr = t.attr3
...
LEFT OUTER JOIN mytableN tN ON tN.attr = t.attrN
It runs in just a millisecond. But if I add a LIKE condition:
SELECT ... FROM mytable t
LEFT OUTER JOIN mytable2 t2 ON t2.attr = t.attr1
LEFT OUTER JOIN mytable3 t3 ON t3.attr = t.attr3
...
LEFT OUTER JOIN mytableN tN ON tN.attr = t.attrN
WHERE tK.attrP LIKE '%Something%'
then it practically never ends. I could not wait until the end and had to stop it manually. But if I rewrite the query like this:
SELECT ... FROM mytable t
LEFT OUTER JOIN mytablek tK ON tK.attr = t.attr1
WHERE tK.attrP LIKE '%Something%'
then it again runs like a flash. Why is that? I see no logic in all those extra joins, which have nothing to do with the field attrP, having any effect on the speed of the query. I guess I know how to optimize this query, but still, the more I work with MySQL, the less I like it. Hundreds of times I have struggled with something that had no reasonable explanation.
EDIT
Well, I thought I knew how to optimize it, by using an inner join in this way:
SELECT ... FROM mytable t
... bunch of joins
INNER JOIN mytablek tK ON tK.attr = t.attr1 AND tK.attrP LIKE '%Something%'
... bunch of joins
But this has no effect.
EDIT
Well, I found a solution: use MATCH ... AGAINST. Unfortunately, this solution is not universal. In fact, MATCH ... AGAINST throws an error when you try to search in a field returned by a subquery. Poor MySQL.
Adding WHERE tK.attrP LIKE '%Something%' probably removes records from the result set. We don't know how many, though. Maybe 1%, maybe 99%.
We don't even know what percentage of the records would be affected if we only joined mytable with mytableK and applied that clause. Would it be worth joining these tables first and, with the supposedly few records left, doing the other joins using just loops to get the other tables' records? Or would it be better to join everything first using great join algorithms on the tables, and only at the end filter with LIKE?
We don't know, and the DBMS doesn't know either.
But you notice that the DBMS is fast on the pure joins but slow when it applies the LIKE clause. So hint the DBMS to do one thing first and the other later:
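SELECT *
FROM
(
-- the elided select list must expose tK.attrP AS tK_attrP for the outer WHERE
SELECT ... FROM mytable t
LEFT OUTER JOIN mytable2 t2 ON t2.attr = t.attr1
LEFT OUTER JOIN mytable3 t3 ON t3.attr = t.attr3
...
LEFT OUTER JOIN mytableN tN ON tN.attr = t.attrN
) AS joined  -- MySQL requires an alias for every derived table
-- MySQL 5.7+ may merge this derived table into the outer query (optimizer_switch derived_merge),
-- so the "joins first, LIKE later" ordering is not guaranteed
WHERE tK_attrP LIKE '%Something%';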
The SQL engine has many options when running the query. One of the strengths of the language is the optimizer, which chooses the "best" way to run a given query. Of course, what the engine thinks is best is not necessarily the best.
The second point is that your WHERE condition on tK turns the left join to tK into an inner join. So you might as well write the query that way (for clarity).
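The point about the WHERE condition can be checked directly. In this sketch (Python's sqlite3 as a stand-in for MySQL, hypothetical mytable/mytablek rows), filtering on tk.attrP in WHERE discards the NULL-extended rows, so the LEFT OUTER JOIN returns exactly what the INNER JOIN returns:

```python
import sqlite3

# Hypothetical rows: mytable id 3 has no partner in mytablek.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER, attr1 INTEGER);
    CREATE TABLE mytablek (attr INTEGER, attrP TEXT);
    INSERT INTO mytable VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO mytablek VALUES (10, 'has Something in it'), (20, 'nothing here');
""")

BASE = "SELECT t.id FROM mytable t {join} mytablek tk ON tk.attr = t.attr1"
FILTER = " WHERE tk.attrP LIKE '%Something%' ORDER BY t.id"

# Without the filter, the LEFT JOIN keeps all three mytable rows.
left_plain = conn.execute(BASE.format(join="LEFT OUTER JOIN") + " ORDER BY t.id").fetchall()
# NULL LIKE '...' is not true, so the filter removes the unmatched rows.
left_filtered = conn.execute(BASE.format(join="LEFT OUTER JOIN") + FILTER).fetchall()
inner_filtered = conn.execute(BASE.format(join="INNER JOIN") + FILTER).fetchall()

print(left_plain)                       # [(1,), (2,), (3,)]
print(left_filtered == inner_filtered)  # True
```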
With that background, there are two possible answers to your question. The first is that when you run the other queries, you are noting when results first appear. This is the "time-to-first-row" measurement. However, the rows that match your more complicated query are at the end of the input, and MySQL needs to process all the non-matching rows to find the matching ones. This would be particularly true if some of the intermediate results create a Cartesian product for a given row in the first table.
Another possibility is that the execution plan changes. Because the left join to tK is really an inner join, MySQL has a lot of flexibility in rewriting it.
My next recommendation is to put the join to table mytablek as the first table, rather than the last. Perhaps that will help MySQL find the best optimization.
The second would be to use a subquery as the first table in the FROM clause:
from (select tk.*
      from mytablek tk
      where tk.attrP LIKE '%Something%'
     ) tk
This may force the engine to whittle down the rows quickly and point the optimizer in a better direction.

Does MySQL optimize the IN clause?

When I execute a MySQL query like
select * from t1 where colomn1 in (select colomn1 from t2)
what really happens?
Does it execute the inner statement for every row?
PS: I have 300,000 rows in t1 and 50,000 rows in t2 and it is taking a hell of a time.
I'm flabbergasted to see that everyone suggests using JOIN as if it were the same thing. IT IS NOT, not with the information given here. For example, what if t2.column1 has duplicates?
=> Assuming there are no duplicates in t2.column1, then yes, put a UNIQUE INDEX on that column and use a JOIN construction, as it is more readable and easier to maintain. Whether it is going to be faster depends on what the query engine makes of it. In MSSQL the query optimizer would (probably) consider them the same thing; maybe MySQL is 'not so eager' to recognize this... I don't know.
=> Assuming there can be duplicates in t2.column1, put a (non-unique) INDEX on that column and rewrite the WHERE IN (SELECT ..) into a WHERE EXISTS (SELECT * FROM t2 WHERE t2.column1 = t1.column1). Again, this is mostly for readability and ease of maintenance; most likely the query engine will treat them the same...
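The duplicate issue is easy to reproduce. This sketch uses Python's sqlite3 as a stand-in for MySQL, with hypothetical tables where t2 contains the value 1 twice: IN and EXISTS return each t1 row at most once, while the plain JOIN repeats it. How each form is executed is engine- and version-specific; MySQL 5.6 and later, for instance, can apply semi-join strategies to IN subqueries, which older versions did not.

```python
import sqlite3

# Hypothetical data: value 1 appears twice in t2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (column1 INTEGER);
    CREATE TABLE t2 (column1 INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1), (1), (3);
    CREATE INDEX t2_column1 ON t2 (column1);  -- hypothetical non-unique index
""")

in_rows = conn.execute(
    "SELECT column1 FROM t1 WHERE column1 IN (SELECT column1 FROM t2) ORDER BY column1"
).fetchall()

exists_rows = conn.execute(
    "SELECT column1 FROM t1 "
    "WHERE EXISTS (SELECT * FROM t2 WHERE t2.column1 = t1.column1) ORDER BY column1"
).fetchall()

# The JOIN emits one row per matching t2 row, so t1's value 1 shows up twice.
join_rows = conn.execute(
    "SELECT t1.column1 FROM t1 INNER JOIN t2 ON t1.column1 = t2.column1 ORDER BY t1.column1"
).fetchall()

print(in_rows)      # [(1,), (3,)]
print(exists_rows)  # [(1,), (3,)]
print(join_rows)    # [(1,), (1,), (3,)]
```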
The things to remember are:
Always make sure you have proper indexing (but don't go overboard).
Always realize that what really happens will be an interpretation of your SQL code, not a 'direct translation'. You can write the same functionality in different ways to achieve the same goal, and some of these are indeed more resilient to different scenarios.
If you only have 10 rows, pretty much everything works. If you have 10M rows it could be worth examining the query plan... which most-likely will be different from the one with 10 rows.
A join would be quicker, viz:
select t1.* from t1 INNER JOIN t2 on t1.colomn1=t2.colomn1
Try with INNER JOIN
SELECT t1.*
FROM t1
INNER JOIN t2 ON t1.column1=t2.column1
You should index the join column (colomn1) in both tables, and then you can use an inner join.
For indexing:
CREATE INDEX index1 ON t1 (colomn1);
CREATE INDEX index2 ON t2 (colomn1);
select t1.* from t1 INNER JOIN t2 on t1.colomn1=t2.colomn1
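To check whether the indexes are actually used, inspect the query plan. On MySQL that is EXPLAIN; this sketch uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 as a stand-in (the plan text format is SQLite's, not MySQL's), with the index names index1/index2 and a hypothetical payload column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (colomn1 INTEGER, payload TEXT);  -- payload is hypothetical
    CREATE TABLE t2 (colomn1 INTEGER);
    CREATE INDEX index1 ON t1 (colomn1);
    CREATE INDEX index2 ON t2 (colomn1);
""")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT t1.* FROM t1 INNER JOIN t2 ON t1.colomn1 = t2.colomn1"
).fetchall()

# The last column of each plan row is the human-readable step description.
details = [row[-1] for row in plan]
for d in details:
    print(d)

uses_index = any("index1" in d or "index2" in d for d in details)
print(uses_index)
```

SQLite picks one table to scan and probes the other through its index; which side it picks may vary, so the check only asks whether either index appears.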

Pros / Cons of MySQL JOINs

When selecting data from multiple tables, I used to use JOINs a lot, but recently I started using another approach and I'm unsure of the impact in the long run.
Examples:
SELECT * FROM table_1 LEFT JOIN table_2 ON (table_1.column = table_2.column)
So this is your basic LEFT JOIN across tables but take a look at the query below.
SELECT * FROM table_1,table_2 WHERE table_1.column = table_2.column
Personally, if I were joining across, let's say, 7 tables of data, I would prefer this over JOINs.
But are there any pros and cons regarding the two methods?
The second method is a shortcut for INNER JOIN:
SELECT * FROM table_1 INNER JOIN table_2 ON table_1.column = table_2.column
It will only select records that match the condition in both tables (a LEFT JOIN selects all records from the table on the left, plus the matching records from the table on the right).
Quote from http://dev.mysql.com/doc/refman/5.0/en/join.html
[...] we consider each comma in a list of table_reference items as equivalent to an inner join
And
INNER JOIN and , (comma) are semantically equivalent in the absence of a join condition: both produce a Cartesian product between the specified tables (that is, each and every row in the first table is joined to each and every row in the second table).
However, the precedence of the comma operator is less than of INNER JOIN, CROSS JOIN, LEFT JOIN, and so on. If you mix comma joins with the other join types when there is a join condition, an error of the form Unknown column 'col_name' in 'on clause' may occur. Information about dealing with this problem is given later in this section.
In general, quite a few things mentioned there should make you consider not using commas.
The first method is the ANSI/ISO version of the Join. The second method is the older format (pre-89) to produce the equivalent of an Inner Join. It does this by cross joining all the tables you list and then narrowing the Cartesian product in the Where clause to produce the equivalent of an inner join.
I would strongly recommend against the second method.
It is harder for other developers to read
It breaks the principle of least astonishment for other developers, who will wonder whether you simply did not know any better or whether there was some specific reason for not using the ANSI/ISO format.
It will cause you grief when you start trying to use that format with something other than Inner Joins.
It makes it harder to discern your intent especially in a large query with many tables. Are all of these tables supposed to be inner joins? Did you miss something in the Where clause and create a cross join? Did you intend to make a cross join? Etc.
There is simply no reason to use the second format, and some database systems have already dropped the vendor-specific outer-join variants of it (SQL Server's *= and =* operators, for example).
ANSI Syntax
Both queries are JOINs, and both use ANSI syntax, but one is older than the other.
Joins using the JOIN keyword mean that ANSI-92 syntax is being used. ANSI-89 syntax is when you have tables comma-separated in the FROM clause and the criteria that join them in the WHERE clause. When comparing INNER JOINs, there is no performance difference; this:
SELECT *
FROM table_1 t1, table_2 t2
WHERE t1.column = t2.column
...will produce the same query plan as:
SELECT *
FROM TABLE_1 t1
JOIN TABLE_2 t2 ON t2.column = t1.column
Apples to Oranges
Another difference is that the two queries are not identical: a LEFT [OUTER] JOIN will produce all rows from TABLE_1, and references to TABLE_2 in the output will be NULL if there's no match based on the JOIN criteria (specified in the ON clause). The second example is an INNER JOIN, which will only produce rows that have matching records in TABLE_2. A visual representation of JOINs can help reinforce the difference.
Pros/Cons
The main reason to use ANSI-92 syntax is that ANSI-89 doesn't have any OUTER JOIN (LEFT, RIGHT, FULL) support. ANSI-92 syntax was specifically introduced to address this shortcoming, because vendors were implementing their own custom syntax. Oracle used (+); SQL Server used an asterisk on the side of the equals sign in the join criteria (e.g., t1.column =* t2.column).
The next reason to use ANSI-92 syntax is that it's more explicit and more readable, and it separates what is used for joining tables from actual filtering.
I personally feel the explicit join syntax (A JOIN B, A LEFT JOIN B) is preferable. Both because it's more explicit about what you're doing, and because if you use implicit join syntax for inner joins, you still have to use the explicit syntax for outer joins and thus your SQL formatting will be inconsistent.