Let's say I have two tables A and B and the following query:
select *
from A
inner join B on A.id = B.id
Where A.id = 5
Does MySQL perform the join first or the where?
Edit:
Because if, for example, A contains 1000 rows, after the where condition it'll contain only 1 row.
Performing join on a 1 row table is much more efficient so it seems like performing the where first and only then the join is more efficient.
The join happens before the where, however...
The WHERE clause is logically a filter on all rows returned by the join. However, the optimizer will recognise that if an index exists on A.id, it can use that index to retrieve only the matching rows from A and then perform the join. In theory the WHERE clause would then filter the results, but the optimizer will recognise that the condition is already met and skip it as a filter.
All that said, the optimizer will always return the same result as would be returned without the optimizer.
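To make the "same result either way" point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), and the column names `name` and `note` are hypothetical. It compares the query as written with a hand-written "filter first" version:

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY, note TEXT);
""")
conn.executemany("INSERT INTO A VALUES (?, ?)", [(i, f"a{i}") for i in range(1, 1001)])
conn.executemany("INSERT INTO B VALUES (?, ?)", [(i, f"b{i}") for i in range(1, 1001)])

# Logical order: join first, then filter with WHERE.
join_then_filter = conn.execute(
    "SELECT * FROM A INNER JOIN B ON A.id = B.id WHERE A.id = 5"
).fetchall()

# Hand-written "filter first" form: shrink A to one row, then join.
filter_then_join = conn.execute(
    "SELECT * FROM (SELECT * FROM A WHERE id = 5) AS A "
    "INNER JOIN B ON A.id = B.id"
).fetchall()

print(join_then_filter)
print(filter_then_join)
```

Both queries return the same single row, which is why the optimizer is free to pick whichever order is cheaper. In MySQL itself, EXPLAIN shows the order that was actually chosen.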
I have the following two queries and noticed they have a huge performance difference.
Query1
SELECT count(distinct b.id) FROM tableA as a
LEFT JOIN tableB as b on a.id = b.aId
GROUP BY a.id
Query2
SELECT count(distinct b.id) FROM tableA as a
LEFT JOIN (SELECT * FROM tableB) as b on a.id = b.aId
GROUP BY a.id
The queries basically join one table to another, and I noticed that Query1 takes about 80ms whereas Query2 takes about 2sec with thousands of rows in my system. Could anyone explain why this happens? Is it wise to use the Query2 style only when I am forced to? Or is there a better way to do the same thing than Query2?
When you replace tableB with (SELECT * FROM tableB) you may be forcing the query engine to materialize a subquery, that is, an intermediate table result. (MySQL 5.7 and later can often merge such a simple derived table back into the outer query; older versions always materialize it.) In other words, in the second query, you aren't actually joining directly to tableB, you are joining to some intermediate table. As a result, any indices which might have existed on tableB to make the query faster would not be available. Based on your current example, I see no reason to use the second version.
Under certain conditions you might be forced to use the second version though. For example, if you needed to transform tableB in some way, you might need a subquery to do that.
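As an illustration of a case where a derived table earns its place, the sketch below pre-aggregates tableB before joining. It runs on Python's built-in sqlite3 as a stand-in for MySQL (an assumption), with hypothetical sample rows, and checks that the direct join and the pre-aggregated form agree:

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, aId INTEGER);
    CREATE INDEX idx_tableB_aId ON tableB (aId);
    INSERT INTO tableA VALUES (1), (2), (3);
    INSERT INTO tableB VALUES (10, 1), (11, 1), (12, 2);
""")

# Direct join: the index on tableB.aId stays usable.
direct = conn.execute("""
    SELECT a.id, COUNT(DISTINCT b.id)
    FROM tableA AS a
    LEFT JOIN tableB AS b ON a.id = b.aId
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

# Derived table that really transforms tableB (one row per aId).
pre_aggregated = conn.execute("""
    SELECT a.id, COALESCE(b.cnt, 0)
    FROM tableA AS a
    LEFT JOIN (SELECT aId, COUNT(*) AS cnt FROM tableB GROUP BY aId) AS b
           ON a.id = b.aId
    ORDER BY a.id
""").fetchall()

print(direct)
print(pre_aggregated)
```

Here the subquery does real work (grouping), unlike a bare SELECT * wrapper, which only adds an intermediate result.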
I have a complex query which results in a table which includes a time column. There are always two rows with the same time.
The result also contains a value column. The value of two rows with the same time is always different.
I now want to extend the query to join the rows with the same time together. So my thought was to join the derived table like this:
SELECT A.time, A.value AS valueA, B.value as valueB FROM
(
OLD_QUERY
) AS A INNER JOIN A AS B ON
A.time=B.time AND
A.value <> B.value;
However, the JOIN A AS B part of the query does not work. A is not recognized as the derived table. MySQL is searching for a table A in the database and does not find it.
So the question is: How can I join a derived table?
You cannot join a single reference to a table (or subquery) to itself; a subquery must be repeated.
Example: You cannot even do
SELECT A.* FROM sometable AS A INNER JOIN A ...
The A after the INNER JOIN is invalid unless you actually have a real table called A.
You can insert the subquery's results into another table, and use that; but it cannot be a true TEMPORARY table, as those cannot be joined to themselves or referenced twice at all in almost any query. By referenced twice, I mean joined, unioned, or used in a "WHERE IN" subquery when it is already referenced in the FROM.
If nothing else distinguishes the rows, you can just use aggregation to get the two values:
select time, min(value), max(value)
from (<your query here>) a
group by time;
In MySQL 8+, you can use a CTE:
with a as (
<your query here>
)
select a1.time, a1.value, a2.value
from a a1 join
a a2
on a1.time = a2.time and a1.value <> a2.value;
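Both suggestions can be tried out with the sketch below. It uses Python's built-in sqlite3 as a stand-in for MySQL 8 (an assumption), and a hypothetical `readings` table plays the role of the complex original query. Note that the join condition uses `<` instead of `<>`, a small variation so that each pair of rows appears once rather than twice:

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL 8 (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (time TEXT, value INTEGER);
    INSERT INTO readings VALUES
        ('10:00', 1), ('10:00', 7),
        ('11:00', 4), ('11:00', 2);
""")

# CTE referenced twice: a self-join of the derived result.
cte_rows = conn.execute("""
    WITH a AS (SELECT time, value FROM readings)
    SELECT a1.time, a1.value, a2.value
    FROM a AS a1
    JOIN a AS a2 ON a1.time = a2.time AND a1.value < a2.value
    ORDER BY a1.time
""").fetchall()

# Aggregation alternative: no self-join needed.
agg_rows = conn.execute("""
    SELECT time, MIN(value), MAX(value)
    FROM (SELECT time, value FROM readings) AS a
    GROUP BY time
    ORDER BY time
""").fetchall()

print(cte_rows)
print(agg_rows)
```

With exactly two distinct values per time, the two approaches produce the same rows.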
If I have this query:
select * from tableA
left outer join tableB on tableA.id=tableB.id
AND tableB.foo = 1
where tableA.owner=10
I get 29 results, but if I move that AND into the WHERE clause like:
select * from tableA
left outer join tableB on tableA.id=tableB.id
where tableA.owner=10
AND tableB.foo = 1
I then get only 17 results.
I've looked all around and cannot find a definitive guide as to how using the AND differs when you use it in the JOIN versus the WHERE clause. Can anyone explain this to me?
Also, if I do something like AND tableB.foo = NULL in the JOIN all of my tableB.foo fields are NULL in the query results, even if they are not null in the table. Does having the AND in the JOIN clause change that field in the FROM selection before being filtered by the WHERE clause?
All of the criteria for the table you are outer-joining to should be in the JOIN clause (like your first query). Putting it in the WHERE clause (like your second query) implicitly converts the OUTER JOIN to an INNER JOIN.
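As for your question about AND tableB.foo = NULL: that parses, but it does not do what you expect. Any comparison with NULL using = evaluates to NULL rather than true, so the condition never matches and every tableB column comes back as NULL. NULLs require special treatment, using operators like IS NULL. You should use AND tableB.foo IS NULL instead.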
An outer join joins just the same as an inner join. With the addition that when there is no match for a record, a dummy record with all columns set to null gets joined (so you still get the row from the first table in your results).
In your first query you are looking for matches in tableB with the same ID and foo = 1. For records in tableA with no such match you still get a result row (with all tableB fields null).
In the second query you are looking for matches in tableB with the same ID. For records in tableA with no such match you still get a result row (with all tableB fields null). Then in your where clause you only keep rows with foo = 1. This dismisses all outer-joined records (because their foo is null) and you are where you would have been with a plain inner join.
So always put all criteria on an outer-joined table in the ON clause. (There is one exception though; an anti join, but you can learn that pattern another time.)
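The difference in row counts can be reproduced on a small scale. The following is a minimal sketch using Python's built-in sqlite3 as a stand-in for MySQL (an assumption; outer join semantics are the same here), with hypothetical sample rows:

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, owner INTEGER);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, foo INTEGER);
    INSERT INTO tableA VALUES (1, 10), (2, 10), (3, 10), (4, 99);
    INSERT INTO tableB VALUES (1, 1), (2, 0);
""")

# Condition on tableB inside ON: unmatched tableA rows survive with NULLs.
in_on = conn.execute("""
    SELECT tableA.id, tableB.foo
    FROM tableA
    LEFT OUTER JOIN tableB ON tableA.id = tableB.id AND tableB.foo = 1
    WHERE tableA.owner = 10
    ORDER BY tableA.id
""").fetchall()

# Same condition in WHERE: rows where tableB.foo is NULL are filtered out.
in_where = conn.execute("""
    SELECT tableA.id, tableB.foo
    FROM tableA
    LEFT OUTER JOIN tableB ON tableA.id = tableB.id
    WHERE tableA.owner = 10 AND tableB.foo = 1
    ORDER BY tableA.id
""").fetchall()

print(in_on)     # three rows, two of them with foo = None
print(in_where)  # one row, as an inner join would give
```

The ON version keeps every tableA row with owner = 10, while the WHERE version keeps only the rows that actually matched.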
I am trying the queries below, where TABLEB is empty and TABLEA contains records.
--This query fetched no records
SELECT TABLEA.COLA,TABLEA.COLB FROM TABLEA
LEFT JOIN TABLEB
ON TABLEA.ID=TABLEB.ID
WHERE TABLEB.COL1<>'XYZ'
--This query fetched records .
SELECT COL1 FROM
(
SELECT TABLEA.COLA,TABLEA.COLB FROM TABLEA
LEFT JOIN TABLEB
ON TABLEA.ID=TABLEB.ID
)A WHERE COL1 <>'XYZ'
Could you help me understand why the first query didn't return any records, though the two look the same? My understanding of the first query is: "I did a left join, so if records don't exist in TABLEB, their columns should be replaced with NULL values. As NULL <> 'XYZ', all records should be fetched." Is that right?
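Placing a WHERE condition on the OUTER joined table of an OUTER JOIN effectively renders that join as an INNER JOIN. So, if there are no rows in the outer-joined table which satisfy the condition, then no rows will be returned. Here TABLEB is empty, so TABLEB.COL1 is NULL on every row, and NULL <> 'XYZ' evaluates to NULL rather than true, which means the WHERE clause discards every row.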
The solution then is to include any such conditions within the join itself. In the example above this is as simple as changing WHERE to AND.
The one condition that must be placed in the WHERE clause is the test for NULL, the so-called exclusion join, i.e. when you actually want to return the inverse set.
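The three cases (condition in WHERE, condition in ON, and the exclusion join) can be sketched as follows, using Python's built-in sqlite3 as a stand-in for MySQL (an assumption) and hypothetical rows in TABLEA, with TABLEB left empty as in the question:

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLEA (ID INTEGER PRIMARY KEY, COLA TEXT, COLB TEXT);
    CREATE TABLE TABLEB (ID INTEGER PRIMARY KEY, COL1 TEXT);
    INSERT INTO TABLEA VALUES (1, 'a1', 'b1'), (2, 'a2', 'b2');
""")

# Comparing NULL with <> yields NULL (Python None), not true.
null_compare = conn.execute("SELECT NULL <> 'XYZ'").fetchone()[0]

# Condition in WHERE: every row has COL1 = NULL, so nothing survives.
where_rows = conn.execute("""
    SELECT TABLEA.COLA, TABLEA.COLB FROM TABLEA
    LEFT JOIN TABLEB ON TABLEA.ID = TABLEB.ID
    WHERE TABLEB.COL1 <> 'XYZ'
    ORDER BY TABLEA.ID
""").fetchall()

# Condition moved into ON: all TABLEA rows are kept.
on_rows = conn.execute("""
    SELECT TABLEA.COLA, TABLEA.COLB FROM TABLEA
    LEFT JOIN TABLEB ON TABLEA.ID = TABLEB.ID AND TABLEB.COL1 <> 'XYZ'
    ORDER BY TABLEA.ID
""").fetchall()

# Exclusion join: keep only TABLEA rows with no match in TABLEB.
exclusion_rows = conn.execute("""
    SELECT TABLEA.COLA, TABLEA.COLB FROM TABLEA
    LEFT JOIN TABLEB ON TABLEA.ID = TABLEB.ID
    WHERE TABLEB.ID IS NULL
    ORDER BY TABLEA.ID
""").fetchall()

print(null_compare, where_rows, on_rows, exclusion_rows)
```

Because TABLEB is empty, the exclusion join returns every TABLEA row, which is the one situation where a test on the outer-joined table belongs in WHERE.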
I have something in a query that I have to edit, that I don't understand.
There are 4 tables that are joined: tickets, tasks, tickets_users, users. The whole query is not important, but you have an example at the end of the post. What bugs me is this kind of code used many times in relation to other tables:
(SELECT name
FROM users
WHERE users.id=tickets_users.users_id
) AS RequesterName,
Is this a subquery with the tables users and tickets_users joined? What is this?
WHERE users.id=tickets_users.users_id
If this was a join I would have expected to see:
ON users.id = tickets_users.users_id
And how is this different from a typical join? Just use the same column definition: users.name and just join with the users table.
Can anyone enlighten me on the advanced SQL querying prowess of the original author?
The query looks like this:
SELECT
description,
(SELECT name
FROM users
WHERE users.id = tickets_users.users_id) AS RequesterName,
(SELECT description
FROM tickets
WHERE tickets.id = ticket_tasks.tickets_id) AS TicketDescription,
ticket_tasks.content AS TaskDescription
FROM
ticket_tasks
RIGHT JOIN
tickets ON ticket_tasks.tickets_id = tickets.id
INNER JOIN
tickets_users ON tickets_users.tickets_id = ticket_tasks.tickets_id
This is what is called a correlated subquery. To describe it in simple terms, it is doing a select inside a select.
However, doing this many times in one query is generally not recommended: each subquery may be re-executed for every outer row, so performance can suffer badly on large tables.
A correlated subquery is evaluated row by row, once for each row of the outer select. If that doesn't make sense, then think of it this way:
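SELECT
id,
-- a subquery in the select list must return a single value, so count the matches
(SELECT COUNT(*) FROM tableA AS ta WHERE ta.id > t.id) AS larger_ids
FROM
tableB AS t;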
For each row in tableB, every row in tableA will be examined and compared to the tableB id.
NOTE:
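If you have 100 rows in each of the 4 tables and no usable index, each correlated subquery in the select list scans up to 100 rows for every outer row, so with 100 outer rows that is about 10,000 row comparisons per subquery. Subqueries placed side by side add up (three of them is about 30,000 comparisons). Only subqueries nested inside one another multiply, up to 100*100*100*100 = 100,000,000 (one hundred million) comparisons in the worst case. With an index on the looked-up column, each lookup is far cheaper.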
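For comparison, here is a minimal sketch showing that the correlated subquery from the question and an equivalent LEFT JOIN return the same rows. It uses Python's built-in sqlite3 as a stand-in for MySQL (an assumption), the sample rows are hypothetical, and it relies on users.id being unique so the subquery yields at most one name per row:

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tickets_users (tickets_id INTEGER, users_id INTEGER);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO tickets_users VALUES (100, 1), (101, 2), (102, 3);
""")

# Correlated subquery: evaluated once per tickets_users row.
correlated = conn.execute("""
    SELECT tickets_id,
           (SELECT name FROM users
            WHERE users.id = tickets_users.users_id) AS RequesterName
    FROM tickets_users
    ORDER BY tickets_id
""").fetchall()

# Equivalent LEFT JOIN: a missing user still yields a row with NULL.
joined = conn.execute("""
    SELECT tickets_users.tickets_id, users.name AS RequesterName
    FROM tickets_users
    LEFT JOIN users ON users.id = tickets_users.users_id
    ORDER BY tickets_users.tickets_id
""").fetchall()

print(correlated)
print(joined)
```

The LEFT JOIN (not an INNER JOIN) is the matching form, because a scalar subquery that finds nothing returns NULL instead of dropping the row.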
A correlated subquery is NOT a join, but rather a subquery..
SELECT *
FROM
(SELECT id FROM t -- this is a subquery
) AS temp
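However, JOINs are different. Generally you can write an inner join in one of these two ways.
Explicit JOIN syntax (preferred):
SELECT *
FROM t
JOIN t1 ON t1.id = t.id
Implicit (comma) syntax:
SELECT *
FROM t, t1
WHERE t1.id = t.id
Logically, the second form describes the Cartesian product of the two tables filtered by the WHERE clause, while the first filters as it joins. In practice MySQL's optimizer treats the two forms the same way for inner joins, so there is normally no speed difference. The explicit form is still preferred because it keeps the join condition next to the join, which makes a missing condition easier to spot.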
There are a few different types of joins, and all are useful in their respective situations:
INNER JOIN (same as JOIN)
LEFT JOIN
RIGHT JOIN
LEFT OUTER JOIN (same as LEFT JOIN)
RIGHT OUTER JOIN (same as RIGHT JOIN)
In MySQL, FULL JOIN or FULL OUTER JOIN does not exist, so in order to do a FULL join you need to combine a LEFT and a RIGHT join with UNION. See this link for a better understanding of what joins do with Venn diagrams LINK
REMEMBER: the link covers SQL in general, so it includes the FULL joins as well. Those don't work in MySQL.
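A common way to emulate a full join is to UNION two outer joins. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL (an assumption) with hypothetical columns `a` and `b`. Because older SQLite versions lack RIGHT JOIN, the second branch is written as a LEFT JOIN with the tables swapped; in MySQL it would usually be written as a RIGHT JOIN:

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t  (id INTEGER PRIMARY KEY, a TEXT);
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, b TEXT);
    INSERT INTO t  VALUES (1, 'x'), (2, 'y');
    INSERT INTO t1 VALUES (2, 'p'), (3, 'q');
""")

# Emulated FULL OUTER JOIN: all rows of t plus all rows of t1.
full_rows = conn.execute("""
    SELECT t.id AS t_id, t.a, t1.id AS t1_id, t1.b
    FROM t LEFT JOIN t1 ON t1.id = t.id
    UNION
    SELECT t.id AS t_id, t.a, t1.id AS t1_id, t1.b
    FROM t1 LEFT JOIN t ON t1.id = t.id
""").fetchall()

for row in full_rows:
    print(row)
```

UNION removes duplicate rows, which is what merges the matched rows from both branches. It also collapses genuinely duplicated rows in the data, so tables without a unique key may need a UNION ALL variant that filters the second branch with an IS NULL test.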