inner join on condition or using where? - mysql

I have this query:
SELECT t1.col1
FROM table1 t1
INNER JOIN table2 t2 ON t2.col1 = t1.col1
WHERE t2.col IN (1, 2, 3)
and this query:
SELECT t1.col1
FROM table1 t1
INNER JOIN table2 t2 ON t2.col1 = t1.col1 AND t2.col IN (1, 2, 3)
Both gave me the same execution plan, and both showed
Using where;
Does that mean the optimizer converts my second query to the first form, and that I should follow the first form?
And is there a way to see the query as rewritten by the optimizer?

SQL is not a procedural language. It is a descriptive language. A query describes the result set that you want to produce.
With an inner join, the two queries in your question are identical -- they produce the same result set under all circumstances. Which to prefer is a stylistic preference. MySQL should treat the two the same way from an optimization perspective.
One preference is that filters on a single table are more appropriate for WHERE than ON.
With an outer join, the two queries are not the same, and you should use the one that expresses your intent.
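The difference between the inner and outer cases can be checked on a tiny data set. The following is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL; the sample rows and the run helper are made up for illustration, while the table and column names follow the question.

```python
import sqlite3

# SQLite stands in for MySQL here; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER);
    CREATE TABLE table2 (col1 INTEGER, col INTEGER);
    INSERT INTO table1 VALUES (10), (20), (30);
    INSERT INTO table2 VALUES (10, 1), (20, 9);  -- 30 has no partner row
""")

def run(join_type, filter_in_on):
    """Run the question's query with the IN filter placed in ON or in WHERE."""
    if filter_in_on:
        tail = "ON t2.col1 = t1.col1 AND t2.col IN (1, 2, 3)"
    else:
        tail = "ON t2.col1 = t1.col1 WHERE t2.col IN (1, 2, 3)"
    sql = (
        "SELECT t1.col1 FROM table1 t1 "
        f"{join_type} JOIN table2 t2 {tail} ORDER BY t1.col1"
    )
    return [row[0] for row in conn.execute(sql)]

inner_where = run("INNER", filter_in_on=False)
inner_on = run("INNER", filter_in_on=True)
left_where = run("LEFT", filter_in_on=False)
left_on = run("LEFT", filter_in_on=True)

print(inner_where, inner_on)  # same rows for the inner join
print(left_where, left_on)    # different rows for the left join
```

With the inner join both placements return only 10. With the left join, the WHERE version still returns only 10, while the ON version keeps every table1 row (10, 20, 30) and merely decides which table2 rows get attached.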

The first one filters after joining the two tables, while the second one uses the filter as part of the condition for joining the two tables. You may check this thread:
SQL join: where clause vs. on clause

You can time both queries, but you can write the query either way: the optimizer will try to optimize it whether it is written in the first form or the second.

I like this rule:
ON ... says how the tables are related.
WHERE ... filters the results.
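This becomes important with LEFT JOIN, because there ON does not filter out rows of the left table.
This shows you that the optimizer treats them the same; SHOW WARNINGS prints the query as the optimizer rewrote it:
EXPLAIN EXTENDED SELECT ...
SHOW WARNINGS;
(From MySQL 5.7 on, plain EXPLAIN produces the same extended information, and the EXTENDED keyword was removed in 8.0.)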

Related

Does adding a join condition on two different tables (excluding the table to be joined) slow down the query?

I have 3 tables in MySQL (table1, table2 and table3), and each of them holds a lot of data (>100k rows).
My join is:
select * from table1 t1
join table2 t2 on t1.col1 = t2.col1
join table3 t3 on t3.col2 = t2.col2 and t3.col3 = t1.col3
This query returns its result very slowly. I believe the issue is in the second join condition, because if I remove the second condition the query returns its result instantly.
Can anyone please explain why the query is slow?
Thanks in advance.
Do you have these indexes?
table2: (col1)
table3: (col2, col3) -- in either order
Another tip: Don't use * (as in SELECT *) unless you really need all the columns. It prevents certain optimizations. If you want to discuss this further, please provide the real query and SHOW CREATE TABLE for each table.
If any of the columns used for joining are not the same datatype, character set, and collation, then indexes may not be useful.
Please provide EXPLAIN SELECT ...; it will give some clues we can discuss.
How many rows in the resultset? Sounds like over 100K? If so, then perhaps the network transfer time is the real slowdown?
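As a rough illustration of the index advice above, here is a minimal sketch that creates the two suggested indexes and runs the three-table join with a named column list. It assumes SQLite (via Python's sqlite3) as a stand-in for MySQL; the index names and sample rows are hypothetical, and SQLite's EXPLAIN QUERY PLAN is not the same as MySQL's EXPLAIN, so the printed plan is only indicative.

```python
import sqlite3

# SQLite stands in for MySQL; index names and rows are invented for the sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER, col3 INTEGER);
    CREATE TABLE table2 (col1 INTEGER, col2 INTEGER);
    CREATE TABLE table3 (col2 INTEGER, col3 INTEGER);
    -- The indexes suggested in the answer.
    CREATE INDEX idx_table2_col1 ON table2 (col1);
    CREATE INDEX idx_table3_col2_col3 ON table3 (col2, col3);
    INSERT INTO table1 VALUES (1, 100), (2, 200);
    INSERT INTO table2 VALUES (1, 7), (2, 8);
    INSERT INTO table3 VALUES (7, 100), (8, 999);
""")

# Name the columns instead of using SELECT *.
query = """
    SELECT t1.col1, t2.col2, t3.col3
    FROM table1 t1
    JOIN table2 t2 ON t1.col1 = t2.col1
    JOIN table3 t3 ON t3.col2 = t2.col2 AND t3.col3 = t1.col3
"""
rows = conn.execute(query).fetchall()

# The last column of each plan row is a human-readable description of the step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)
print(rows)
```

On a real MySQL server the equivalent check is EXPLAIN SELECT ... on the actual query, looking at which key each table uses.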
Since the second join condition refers to both of the other tables, it adds more checks during evaluation. This creates a triangle rather than a long joined line.
Also, since all three tables have ~100K rows, even with indexes on the given columns there is bound to be a performance hit, partly because all columns are being retrieved.
At the least, write the select list as T1.col1, T1.col2, ..., T2.col1, ... and so on.
Also have indexes on all columns used in the join conditions.
Moreover, do you really want a huge join without a WHERE clause? Try adding restrictive conditions for each table: the rows available from each table can then be filtered first (100k may become 10k) before the join is attempted.
Also check the EXPLAIN output (or a profiler) to see whether a full table scan is being used (most probably yes); if so, getting the query to use an index should improve the situation.

MySQL Performance - JOIN ON vs JOIN USING on multiple tables

Speaking in principle, which one is faster:
SELECT * FROM t1 INNER JOIN t2 ON t1.c = t2.c INNER JOIN t3 ON t2.c = t3.c
vs
SELECT * FROM t1 INNER JOIN t2 USING (c) INNER JOIN t3 USING (c)
The easiest way for you to tell this would be to look at your explain plan. If you look at both, you'll probably see zero difference.
The using() keyword here is simply a shorthand expression. It evaluates to the same thing as your other option, and therefore makes no difference to performance.
The FROM clause of any SQL statement is logically processed before every other part of the statement, and both statements do their joining in the FROM clause (just don't move the join criteria to a WHERE clause). Assuming both are syntactically correct, the bigger question is whether the joins run many-to-one (M:1) at each step and whether there are indexes on both the primary and the foreign keys. Then run both through a query analyzer to gauge the actual performance.
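That the two spellings return the same rows is easy to confirm on sample data. A minimal sketch, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL, with invented rows; it checks result equality only and says nothing about timing.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (c INTEGER);
    CREATE TABLE t2 (c INTEGER);
    CREATE TABLE t3 (c INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (2), (3), (4);
    INSERT INTO t3 VALUES (3), (4), (5);
""")

# Explicit ON conditions, one per join.
with_on = conn.execute("""
    SELECT t1.c
    FROM t1
    INNER JOIN t2 ON t1.c = t2.c
    INNER JOIN t3 ON t2.c = t3.c
    ORDER BY t1.c
""").fetchall()

# The USING shorthand for the same equality conditions.
with_using = conn.execute("""
    SELECT t1.c
    FROM t1
    INNER JOIN t2 USING (c)
    INNER JOIN t3 USING (c)
    ORDER BY t1.c
""").fetchall()

print(with_on, with_using)
```

Only the value 3 is present in all three tables, so both queries return that single row.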

How to make SQL query faster?

I have a big DB, about 1 million rows. I need to do something like this:
select * from t1 WHERE id1 NOT IN (SELECT id2 FROM t2)
But it runs very slowly. I know that I can do it using "JOIN" syntax, but I can't work out how.
Try this way:
select *
from t1
left join t2 on t1.id1 = t2.id2
where t2.id2 is null
First of all you should optimize the indexes on both tables, and after that you should use a join.
There are different ways a dbms can deal with this task:
It can select id2 from t2 and then select all t1 where id1 is not in that set. You suggest this using the IN clause.
It can select record by record from t1 and look for each record if it finds a match in t2. You would suggest this using the EXISTS clause.
You can outer join the table then throw away all matches and stay with the non-matching entries. This may look like a bad way, especially when there are many matches, because you would get big intermediate data and then throw most of it away. However, depending on how the dbms works, it can be rather fast, for example when it applies hash join techniques.
It all depends on table sizes, number of matches, indexes, etc. and on what the dbms makes of your query. There are dbms that are able to completely re-write your query to find the best execution plan.
Having said all this, you can just try different things:
the IN clause with (SELECT DISTINCT id2 FROM t2). DISTINCT can reduce the intermediate result significantly and really speed up your query. (But maybe your dbms does that anyhow to get a good execution plan.)
use an EXISTS clause and see if that is faster
the outer join suggested by Parado
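The three formulations listed above can be compared side by side. This is a minimal sketch, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL and invented sample rows; it checks which rows come back, not how fast. It also shows one caveat worth knowing before swapping forms: if id2 can be NULL, NOT IN stops returning rows, while the other two forms are unaffected.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id1 INTEGER);
    CREATE TABLE t2 (id2 INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3), (4);
    INSERT INTO t2 VALUES (2), (4), (4);
""")

queries = {
    "not_in": """
        SELECT id1 FROM t1
        WHERE id1 NOT IN (SELECT DISTINCT id2 FROM t2)
        ORDER BY id1""",
    "not_exists": """
        SELECT id1 FROM t1
        WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id1)
        ORDER BY id1""",
    "left_join": """
        SELECT t1.id1 FROM t1
        LEFT JOIN t2 ON t1.id1 = t2.id2
        WHERE t2.id2 IS NULL
        ORDER BY t1.id1""",
}

def run_all():
    """Return the id1 values produced by each formulation."""
    return {name: [r[0] for r in conn.execute(sql)] for name, sql in queries.items()}

without_null = run_all()

# A NULL in t2.id2 makes every NOT IN comparison unknown for unmatched rows.
conn.execute("INSERT INTO t2 VALUES (NULL)")
with_null = run_all()

print(without_null)
print(with_null)
```

Without NULLs all three return 1 and 3. After the NULL is inserted, NOT IN returns nothing, while NOT EXISTS and the outer join still return 1 and 3.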

mysql SELECT "FROM t1, t2" -or- "FROM t1 JOIN t2 ON"

What is the difference between these queries? Which one should I prefer, and why?
SELECT
t1.*,
t2.x
FROM
t1,
t2
WHERE
t2.`id` = t1.`id`
or
SELECT
t1.*,
t2.x
FROM
t1
INNER JOIN # LEFT JOIN ?
t2
ON t2.`id` = t1.`id`
Does using commas have the same effect as using LEFT JOINs?
That's embarrassing. It's the first time in years that I've asked myself about this. I have always used the first version, but now I feel like I missed some lines in my first SQL introduction. ;)
The "comma" syntax is equivalent to the INNER JOIN syntax. The query optimizer should run the same query for you regardless of how you ask for it. There's a general recommendation to do your joins with the JOIN syntax and your filtering in the WHERE clause, as it makes your intention clearer.
The queries are the same; the first is just shorthand syntax for the second. LEFT JOIN is something entirely different.
INNER JOIN only shows the records common to both tables. LEFT JOIN takes all records from the left table and matches them to those of the right table. If no records in the right table match, NULL will be selected in their place.
Inner joins are left joins that skip rows with empty (NULL) results in the second table.
Query optimizers have more possibilities when everything is done using the WHERE clause. Some RDBMSs don't support commands like NATURAL JOIN and other joins, so using WHERE is something one can really rely on.
With the comma syntax, leaving out the WHERE condition turns the query into a Cartesian product, which usually produces undesired results on a large database.
Using explicit joins instead will produce exactly what you want.
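To make the comparison concrete, the sketch below runs the comma form, the INNER JOIN form, and the comma form with its WHERE condition left out. It assumes SQLite (via Python's sqlite3) as a stand-in for MySQL; the name column and the sample rows are invented for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; the name column and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, name TEXT);
    CREATE TABLE t2 (id INTEGER, x TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (1, 'x1'), (2, 'x2');
""")

# Comma syntax with the join condition in WHERE.
comma_rows = conn.execute(
    "SELECT t1.id, t2.x FROM t1, t2 WHERE t2.id = t1.id ORDER BY t1.id"
).fetchall()

# Explicit INNER JOIN with the same condition in ON.
inner_rows = conn.execute(
    "SELECT t1.id, t2.x FROM t1 INNER JOIN t2 ON t2.id = t1.id ORDER BY t1.id"
).fetchall()

# Comma syntax with the condition forgotten: every t1 row pairs with every t2 row.
cartesian_rows = conn.execute("SELECT t1.id, t2.x FROM t1, t2").fetchall()

print(comma_rows)
print(inner_rows)
print(len(cartesian_rows))
```

The first two queries return the same two rows; the third returns 3 x 2 = 6 rows, which is the accidental Cartesian product that explicit JOIN ... ON syntax makes harder to write by mistake.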

What are the differences between these query JOIN types and are there any caveats?

I have multiple queries (from different sections of my site) that I am executing.
Some are like this:
SELECT field, field1
FROM table1, table2
WHERE table1.id = table2.id
AND ....
and some are like this:
SELECT field, field1
FROM table1
JOIN table2
USING (id)
WHERE ...
AND ....
and some are like this:
SELECT field, field1
FROM table1
LEFT JOIN table2
ON (table1.id = table2.id)
WHERE ...
AND ....
Which of these queries is better, or slower/faster or more standard?
The first two queries are equivalent. In MySQL, USING (id) is almost the same as writing table1.id = table2.id (see the documentation: USING is part of the SQL:2003 spec, and there are some differences in how NULL values are handled).
You could easily write them as:
SELECT field, field1
FROM table1
INNER JOIN table2 ON (table1.id = table2.id)
The third query is a LEFT JOIN. This will select all the matching rows in both tables, and will also return all the rows in table1 that have no matches in table2. For these rows, the columns in table2 will be represented by NULL values.
I like Jeff Atwood's visual explanation of these
Now, on to which is better or worse. The answer is: it depends. They are for different things. A left join returns at least as many rows as the inner join, because it also keeps the rows of table1 that have no match in table2. But the performance of the queries will be affected by many factors, like table size, the types of the columns, and what else the database is doing at the same time.
Your first concern should be to use the query you need to get the data out. You might honestly want to know what rows in table1 have no match in table2; in this case you'd use a LEFT JOIN. Or you might only want rows that match - the INNER JOIN.
As Krister points out, you can use the EXPLAIN keyword to tell you how the database will execute each kind of query. This is very useful when trying to figure out just why a query is slow, as you can see where the database spends all of its time.
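The row-count difference between the join types described above can be seen on a few rows. A minimal sketch, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL; the table and column names follow the question, and the data is invented.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, field TEXT);
    CREATE TABLE table2 (id INTEGER, field1 TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (1, 'x'), (3, 'z');
""")

# Only rows that match in both tables.
inner_rows = conn.execute("""
    SELECT field, field1
    FROM table1
    INNER JOIN table2 ON table1.id = table2.id
    ORDER BY table1.id
""").fetchall()

# Every table1 row; table2 columns are NULL (None in Python) when nothing matches.
left_rows = conn.execute("""
    SELECT field, field1
    FROM table1
    LEFT JOIN table2 ON table1.id = table2.id
    ORDER BY table1.id
""").fetchall()

# Rows of table1 that have no match in table2.
unmatched = conn.execute("""
    SELECT field
    FROM table1
    LEFT JOIN table2 ON table1.id = table2.id
    WHERE table2.id IS NULL
""").fetchall()

print(inner_rows)
print(left_rows)
print(unmatched)
```

The inner join drops the row with id 2, the left join keeps it with a NULL in field1, and the last query isolates it.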
Personally, I prefer using left joins in my queries, though you can run into issues with NULL records or duplicates; that can be resolved with a simple modification with an outer clause. It's my understanding that a join is a bit more resource-intensive, but this is up for debate and might come down to personal preference.
Just my $.02.
The third example, using ON (field1=field2), is the more common form and seems to be the more widely accepted standard.
I don't know about the performance difference; you would have to run EXPLAIN on each query to see what MySQL actually ends up doing with them.
I do know though that the first, with WHERE being used to join them all, is much less readable on anything other than trivial queries. Once you have some complex conditions in a query, it's confusing to have "join conditions" all muddled in with "selection conditions".