Difference between "INNER JOIN table" and "INNER JOIN (SELECT table)"? - mysql

I am working on a MySQL query that takes 30 seconds to execute. It looks like this:
SELECT id
FROM table1 t1
INNER JOIN table2 t2
ON t1.id = t2.idt2
The INNER JOIN accounts for 25 of the 30 seconds. When I rewrite it like this:
SELECT id
FROM table1 t1
INNER JOIN (
SELECT idt2,col1,col2,col3
FROM table2
) t2
ON t1.id = t2.idt2
It takes only 8 seconds! Why does this work? I'm afraid of losing data.
(Obviously, my real query is more complex than this one; this is just an example.)

Well, you haven't shown us the EXPLAIN output:
EXPLAIN SELECT id
FROM table1 t1
INNER JOIN table2 t2
ON t1.id = t2.idt2
This would definitely give us some insight into your query and table structures.
Based on your scenario, the first query looks like it has an indexing problem.
What happens in your second query is that the optimizer creates a temporary set from your subquery, further filtering your data. I don't recommend doing that in most cases.
The purpose of a subquery is to solve complex logic; it is not an instant solution for everything.
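On the "afraid of losing data" worry: a derived table that only lists columns (no WHERE, no GROUP BY, no LIMIT) should return the same rows as joining the table directly. Here is a minimal sketch that checks this on toy data. It uses Python's sqlite3 module with SQLite standing in for MySQL, so it says nothing about timings; the sample rows and the extra col1 in the select list are assumptions made up for the illustration.

```python
import sqlite3

# Hypothetical miniature versions of table1/table2 (SQLite, not MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (idt2 INTEGER, col1 TEXT, col2 TEXT, col3 TEXT);
    INSERT INTO table1 (id) VALUES (1), (2), (3);
    INSERT INTO table2 VALUES
        (1, 'a', 'b', 'c'),
        (1, 'd', 'e', 'f'),
        (3, 'g', 'h', 'i');
""")

# Join against the base table directly.
direct = conn.execute("""
    SELECT t1.id, t2.col1
    FROM table1 t1
    INNER JOIN table2 t2 ON t1.id = t2.idt2
    ORDER BY t1.id, t2.col1
""").fetchall()

# Join against a derived table that only projects columns.
derived = conn.execute("""
    SELECT t1.id, t2.col1
    FROM table1 t1
    INNER JOIN (SELECT idt2, col1, col2, col3 FROM table2) t2
        ON t1.id = t2.idt2
    ORDER BY t1.id, t2.col1
""").fetchall()

print(direct == derived)
```

The two forms differ in how the engine may execute them, not in which rows they return, so comparing the EXPLAIN output of both on the real tables is still the way to find out where the time goes.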

Related

How to optimize mysql on left join

I'll try to explain at a very high level.
I have two complex SELECT queries (for the sake of example, I have reduced them to the following):
SELECT id, t3_id FROM t1;
SELECT t3_id, MAX(added) as last FROM t2 GROUP BY t3_id;
Query 1 returns 16k rows and query 2 returns 15k.
Each query individually takes less than 1 second to run.
However, I need to sort the results by the column added from query 2. When I try to use a LEFT JOIN:
SELECT
t1.id, t1.t3_Id
FROM
t1
LEFT JOIN
(SELECT t3_id, MAX(added) as last FROM t2 GROUP BY t3_id) AS t_t2
ON t_t2.t3_id = t1.t3_id
GROUP BY t1.t3_id
ORDER BY t_t2.last
the execution time goes up to over 1 minute.
I would like to understand the reason.
What causes such a huge slowdown?
NOTE:
All the columns used in every table are indexed, e.g.:
table t1 has an index on id, t3_Id
table t2 has an index on t3_id and added
EDIT1
Following @Tim Biegeleisen's suggestion, I changed the query to the following, and it now executes in about 16 seconds. If I remove the ORDER BY, the query executes in less than 1 second. So the ORDER BY is the sole cause of the problem.
SELECT
t1.id, t1.t3_Id
FROM
t1
LEFT JOIN
t2 ON t2.t3_id = t1.t3_id
GROUP BY t1.t3_id
ORDER BY MAX(t2.added)
Even though table t2 has an index on column t3_id, when you join t1 you are actually joining to a derived table, which either can't use the index, or can't use it completely effectively. Since t1 has 16K rows and you are doing a LEFT JOIN, this means the database engine will need to scan the entire derived table for each record in t1.
You should use MySQL's EXPLAIN to see what the exact execution strategy is, but my suspicion is that the derived table is what is slowing you down.
The correct query should be:
SELECT
t1.id,
t1.t3_Id,
MAX(t2.added) as last
FROM t1
LEFT JOIN t2 on t1.t3_Id = t2.t3_Id
GROUP BY t1.t3_Id
ORDER BY last;
This happens because a temporary table is generated for each record.
I think you could try to order everything after the records are available. Maybe:
select * from (
select * from
(select t3_id,max(t1_id) from t1 group by t3_id) as t1
left join (select t3_id,max(added) as last from t2 group by t3_id) as t2
on t1.t3_id = t2.t3_id ) as xx
order by last
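To check that the derived-table form and the direct LEFT JOIN with MAX(...) give the same groups in the same order, here is a small sketch on toy data. It runs on SQLite through Python's sqlite3, not on MySQL, so it only compares results, not speed. The sample rows are made up, the alias is renamed from last to last_added, and the t1.t3_id tie-breaker in ORDER BY is added only to make the output deterministic; none of that comes from the original queries.

```python
import sqlite3

# Hypothetical toy data: t3_id 30 has no rows in t2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, t3_id INTEGER);
    CREATE TABLE t2 (t3_id INTEGER, added TEXT);
    INSERT INTO t1 VALUES (1, 10), (2, 10), (3, 20), (4, 30);
    INSERT INTO t2 VALUES
        (10, '2020-01-05'),
        (10, '2020-03-01'),
        (20, '2020-02-01');
""")

# Variant 1: aggregate t2 in a derived table, then join.
via_derived = conn.execute("""
    SELECT t1.t3_id, t_t2.last_added
    FROM t1
    LEFT JOIN (
        SELECT t3_id, MAX(added) AS last_added
        FROM t2
        GROUP BY t3_id
    ) AS t_t2 ON t_t2.t3_id = t1.t3_id
    GROUP BY t1.t3_id, t_t2.last_added
    ORDER BY t_t2.last_added, t1.t3_id
""").fetchall()

# Variant 2: join first, aggregate afterwards.
via_join = conn.execute("""
    SELECT t1.t3_id, MAX(t2.added) AS last_added
    FROM t1
    LEFT JOIN t2 ON t2.t3_id = t1.t3_id
    GROUP BY t1.t3_id
    ORDER BY last_added, t1.t3_id
""").fetchall()

print(via_derived == via_join)
```

Since both variants agree on the result, the choice between them is purely a matter of which plan the engine executes faster, which is what EXPLAIN on the real tables would show.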

Which Query is faster if we put the "Where" inside the Join Table or put it at the end?

OK, I am using a MySQL database. I have 2 simple tables.
Table1
ID | Text
12 | txt1
13 | txt2
42 | txt3
...
Table2
ID | Type | Text
13 | 1    | MuTxt1
42 | 1    | MuTxt2
12 | 2    | Xnnn
Now I want to join these 2 tables to get all the data for Type=1 in Table2.
SQL1:
Select * from
Table1 t1
Join
(select * from Table2 where Type=1) t2
on t1.ID=t2.ID
SQL2:
Select * from
Table1 t1
Join
Table2 t2
on t1.ID=t2.ID
where t2.Type=1
These 2 queries give the same result, but which one is faster?
I don't know how MySQL performs the join (or how joins work in MySQL), and that is why I am asking.
Extra info: if I don't want Type=1 but instead want t2.Text='MuTxt1', SQL2 becomes:
Select * from
Table1 t1
Join
Table2 t2
on t1.ID=t2.ID
where t2.text='MuTxt1'
I feel like this query would be slower?
Sometimes the MySQL query optimizer does a pretty decent job and sometimes it sucks. Having said that, there are exceptions to my answer where the optimizer optimizes something else better.
Subqueries are generally expensive, as MySQL needs to execute them and store the results separately. Normally, if you could use either a subquery or a join, the join is faster, especially when the subquery is part of your WHERE clause and has no LIMIT.
Select *
from Table1 t1
Join Table2 t2 on t1.ID=t2.ID
where t2.Type=1
and
Select *
from Table1 t1
Join Table2 t2
where t1.ID =t2.ID AND t2.Type=1
should perform equally well, while
Select *
from Table1 t1
Join (select *
from Table2
where Type=1) t2
on t1.ID=t2.ID
is most likely a lot slower, as MySQL stores the result of select * from Table2 where Type=1 in a temporary table.
Conceptually, a join works by building a table of all combinations of rows from both tables and then removing the rows that do not match the conditions. In practice, MySQL will of course try to use indexes containing the columns compared in the ON clause and specified in the WHERE clause.
If you are interested in which indexes are used, write EXPLAIN in front of your query and execute it.
In my view, the second query is better than the first in terms of both readability and performance. You can also put the filter condition in the JOIN clause, like this:
Select * from
Table1 t1
Join
Table2 t2 on t1.ID=t2.ID and t2.Type=1
You can compare the execution times of all three queries (Query 1, Query 2, and my query) in SQL Fiddle.
I think this question is hard to answer, since we don't know exactly how the database's query parser works internally. Usually the database evaluates these kinds of constructions in a similar way (it may recognize that the first and second queries are identical and handle them the same way, or it may not).
I would write the second one, since it makes clearer what is happening.
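For an inner join, the three placements of the Type=1 filter discussed in this thread (inside a derived table, in the WHERE clause, and in the ON clause) should all return the same rows. A minimal sketch that checks this with the sample rows from the question, using Python's sqlite3 (SQLite, not MySQL, so it says nothing about which form is faster); the explicit column list and the ORDER BY are added here just to make the comparison stable:

```python
import sqlite3

# Sample rows taken from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER, Text TEXT);
    CREATE TABLE Table2 (ID INTEGER, Type INTEGER, Text TEXT);
    INSERT INTO Table1 VALUES (12, 'txt1'), (13, 'txt2'), (42, 'txt3');
    INSERT INTO Table2 VALUES
        (13, 1, 'MuTxt1'),
        (42, 1, 'MuTxt2'),
        (12, 2, 'Xnnn');
""")

# SQL1: filter inside a derived table.
in_derived = conn.execute("""
    SELECT t1.ID, t1.Text, t2.Text
    FROM Table1 t1
    JOIN (SELECT * FROM Table2 WHERE Type = 1) t2 ON t1.ID = t2.ID
    ORDER BY t1.ID
""").fetchall()

# SQL2: filter in the WHERE clause.
in_where = conn.execute("""
    SELECT t1.ID, t1.Text, t2.Text
    FROM Table1 t1
    JOIN Table2 t2 ON t1.ID = t2.ID
    WHERE t2.Type = 1
    ORDER BY t1.ID
""").fetchall()

# Filter as part of the join condition.
in_on = conn.execute("""
    SELECT t1.ID, t1.Text, t2.Text
    FROM Table1 t1
    JOIN Table2 t2 ON t1.ID = t2.ID AND t2.Type = 1
    ORDER BY t1.ID
""").fetchall()

print(in_derived == in_where == in_on)
```

This equivalence holds for inner joins; whether the engine executes the three forms identically is a separate question that only the query plan can answer.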

Left join part of the table

I am trying to join two tables using a left join, that is, table1 LEFT JOIN table2.
I would like only some of the rows from table1 to be joined with table2. To improve query performance, is it better to filter the rows from table1 in a subquery or in the WHERE clause?
select t1.a
,t1.b
,t2.c
from (select *
from table1
where a='x'
) t1 LEFT JOIN table2 t2 on t1.d=t2.d
or
select t1.a
,t1.b
,t2.c
from table1 t1 LEFT JOIN table2 t2 on t1.d=t2.d
where t1.a='x'
Check the query plan, but I doubt it would make any difference.
It depends heavily on the structure and content of your database. The best way is to look at the query plan and compare it for both versions of your query.
You may find this documentation useful: MySQL Query Execution Plan
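As a sanity check that the two forms in the question return the same rows, here is a small sketch using Python's sqlite3 (SQLite rather than MySQL, so no conclusions about performance). The columns follow the question; the sample rows and the ORDER BY are assumptions added for the illustration.

```python
import sqlite3

# Hypothetical toy rows; only the column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a TEXT, b TEXT, d INTEGER);
    CREATE TABLE table2 (c TEXT, d INTEGER);
    INSERT INTO table1 VALUES ('x', 'b1', 1), ('x', 'b2', 2), ('y', 'b3', 1);
    INSERT INTO table2 VALUES ('c1', 1);
""")

# Filter the left table in a derived table before joining.
filter_in_subquery = conn.execute("""
    SELECT t1.a, t1.b, t2.c
    FROM (SELECT * FROM table1 WHERE a = 'x') t1
    LEFT JOIN table2 t2 ON t1.d = t2.d
    ORDER BY t1.b
""").fetchall()

# Filter the left table in the WHERE clause after joining.
filter_in_where = conn.execute("""
    SELECT t1.a, t1.b, t2.c
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.d = t2.d
    WHERE t1.a = 'x'
    ORDER BY t1.b
""").fetchall()

print(filter_in_subquery == filter_in_where)
```

The row with b = 'b2' has no match in table2 and is kept with a NULL c in both versions, because the filter is on the left-hand table.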

Sql queries difference

I would like to know whether these two versions are equivalent in their results, and which is better for performance and why.
Nested SELECT in SELECT version:
select
t1.c1,
t1.c2,
(select Count(t2.c1) from t2 where t2.id = t1.id) as count_t
from
t1
VS
select t1.c1,t1.c2, Count(t2.c1)
from t1,t2
where t2.id= t1.id
The first query is analogous to this query:
SELECT
t1.c1,
t1.c2,
COUNT(t2.c1)
FROM t1
LEFT JOIN t2
ON t2.id = t1.id;
It selects all records from the first table and all matching records from the second table (that is what a LEFT JOIN does).
The second is analogous to this query:
SELECT
t1.c1,
t1.c2,
COUNT(t2.c1)
FROM t1
JOIN t2
ON t2.id = t1.id;
It selects only records that match in both tables (that is what an INNER JOIN does).
Well, they are different queries. The first one selects all rows from t1, returning 0 for the count if there is no matching id in table t2.
The second query only returns rows where t1 and t2 both have a row with the same id.
The first query will likely suffer from performance issues on large data sets. The second query potentially has a Cartesian product problem. I would go with a JOIN or LEFT JOIN, depending on whether you want records from table 1 when table 2 has no related records, and then add a GROUP BY clause to control the Cartesian product.
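To make the difference concrete, here is a minimal sketch comparing the correlated subquery with a LEFT JOIN and an INNER JOIN, each with a GROUP BY added as the last answer suggests. It runs on SQLite through Python's sqlite3, not on MySQL. The sample rows are made up, and it assumes t1.id is unique, which the question does not state.

```python
import sqlite3

# Hypothetical toy data: t1 row with id 2 has no match in t2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, c1 TEXT, c2 TEXT);
    CREATE TABLE t2 (id INTEGER, c1 TEXT);
    INSERT INTO t1 VALUES (1, 'a', 'A'), (2, 'b', 'B'), (3, 'c', 'C');
    INSERT INTO t2 VALUES (1, 'p'), (1, 'q'), (3, 'r');
""")

# Correlated subquery in the select list (the first version).
correlated = conn.execute("""
    SELECT t1.c1, t1.c2,
           (SELECT COUNT(t2.c1) FROM t2 WHERE t2.id = t1.id) AS count_t
    FROM t1
    ORDER BY t1.id
""").fetchall()

# LEFT JOIN plus GROUP BY: keeps t1 rows without a match, counting 0.
left_join = conn.execute("""
    SELECT t1.c1, t1.c2, COUNT(t2.c1) AS count_t
    FROM t1
    LEFT JOIN t2 ON t2.id = t1.id
    GROUP BY t1.id, t1.c1, t1.c2
    ORDER BY t1.id
""").fetchall()

# INNER JOIN plus GROUP BY: drops t1 rows without a match.
inner_join = conn.execute("""
    SELECT t1.c1, t1.c2, COUNT(t2.c1) AS count_t
    FROM t1
    JOIN t2 ON t2.id = t1.id
    GROUP BY t1.id, t1.c1, t1.c2
    ORDER BY t1.id
""").fetchall()

print(correlated == left_join, len(inner_join))
```

On this data the correlated subquery and the grouped LEFT JOIN agree row for row, while the grouped INNER JOIN omits the t1 row that has no counterpart in t2.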

Shorten a join query

I have a query with 3 joins:
SELECT t1.email, t2.firstname, t2.lastname, t4.value
FROM t1
left join t2 on t1.email = t2.email
Inner join t3 on t2.entity_id = t3.order_id
Inner join t4 on t3.product_id = t4.entity_id
WHERE t4.attribute_id = 126
I think my server just can't handle it :) The query times out and an error occurs!
Thanks a lot
Table structure:
T1:
email (which is the same as in t2)
T2:
email firstname lastname orderid (which is called entity id in t3)
T3:
entityid product id (which is called entity id in t4)
T4:
entityid attributeid value
Unless t2 links straight to t4 there is no way.
Also, do you need a left join between t1 and t2?
As @Sachin already stated, you can't "shorten" this query unless t2 links straight to t4 without requiring a comparison with t3. However, in order to speed up your query, you should have indexes on some or all of the columns referenced in your join conditions (i.e. t1.email, t2.email, t2.entity_id, etc).
Having an index on each of these columns will give you much faster SELECT queries, but it will slow down your INSERT and UPDATE queries. So if you SELECT more often than you INSERT or UPDATE, then you should definitely be using indexes. If not, try to make indexes in wise places (tables that have INSERT or UPDATE statements run less often but still have a lot of rows, for instance).
For further clarification, see the following links:
More information on how indexes work
Syntax for creating indexes
Try your query this way:
SELECT t1.email, t2.firstname, t2.lastname, t4.value
FROM t4
INNER JOIN t3 ON t3.product_id = t4.entity_id
INNER JOIN t2 ON t2.entity_id = t3.order_id
INNER JOIN t1 ON t1.email = t2.email
WHERE t4.attribute_id = 126
It's basically your query but "backwards". With your original query, the DBMS has to try to join t2 for ALL records in t1, then join t3 for ALL records found in t2, before it can even attempt to address your WHERE clause.
With my version, you find all the records in t4 where attribute_id = 126 first, THEN attempt to join the other tables. It should be a lot quicker. You should then be able to speed things up even more by making sure the proper indexes exist on the tables involved. You can prepend the keyword EXPLAIN to your query to see how the DBMS attempts to seek data in your query.
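Here is a rough sketch of both suggestions from this thread, the indexes on the join columns and the "backwards" join order, run against toy data to confirm that the rewritten query returns the same rows as the original. It uses Python's sqlite3 (SQLite, not MySQL), so it shows nothing about speed. The sample rows and the index names (idx_...) are hypothetical, and the column names follow the query rather than the table description in the question.

```python
import sqlite3

# Hypothetical toy data; c@x.com has no row in t2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (email TEXT);
    CREATE TABLE t2 (email TEXT, firstname TEXT, lastname TEXT, entity_id INTEGER);
    CREATE TABLE t3 (order_id INTEGER, product_id INTEGER);
    CREATE TABLE t4 (entity_id INTEGER, attribute_id INTEGER, value TEXT);

    -- Indexes on the columns used in the join and filter conditions.
    CREATE INDEX idx_t1_email ON t1 (email);
    CREATE INDEX idx_t2_email ON t2 (email);
    CREATE INDEX idx_t2_entity ON t2 (entity_id);
    CREATE INDEX idx_t3_order ON t3 (order_id);
    CREATE INDEX idx_t3_product ON t3 (product_id);
    CREATE INDEX idx_t4_attr_entity ON t4 (attribute_id, entity_id);

    INSERT INTO t1 VALUES ('a@x.com'), ('b@x.com'), ('c@x.com');
    INSERT INTO t2 VALUES ('a@x.com', 'Ann', 'Lee', 1), ('b@x.com', 'Bob', 'Ray', 2);
    INSERT INTO t3 VALUES (1, 100), (2, 200);
    INSERT INTO t4 VALUES (100, 126, 'red'), (100, 127, 'large'), (200, 125, 'blue');
""")

# The query as written in the question.
original = conn.execute("""
    SELECT t1.email, t2.firstname, t2.lastname, t4.value
    FROM t1
    LEFT JOIN t2 ON t1.email = t2.email
    INNER JOIN t3 ON t2.entity_id = t3.order_id
    INNER JOIN t4 ON t3.product_id = t4.entity_id
    WHERE t4.attribute_id = 126
    ORDER BY t1.email
""").fetchall()

# The reordered version starting from t4.
reordered = conn.execute("""
    SELECT t1.email, t2.firstname, t2.lastname, t4.value
    FROM t4
    INNER JOIN t3 ON t3.product_id = t4.entity_id
    INNER JOIN t2 ON t2.entity_id = t3.order_id
    INNER JOIN t1 ON t1.email = t2.email
    WHERE t4.attribute_id = 126
    ORDER BY t1.email
""").fetchall()

print(original == reordered)
```

The LEFT JOIN in the original does not keep any extra rows here, because the later INNER JOIN conditions on t2.entity_id discard every t1 row that has no match in t2.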