I am trying to get one table, along with the lowest value of a column of another table by LEFT JOIN. I am using subquery to do this.
Sample Snippet:
SELECT *
FROM A
JOIN
(select A_id,
MIN(id) AS complete_date
from C
group by A_id) B ON (A.id=B.A_id)
WHERE A.status="complete";
Is there any possible and efficient way to achieve this without subquery and group by.
A correlated subquery -- with the right indexes -- is often the fastest approach:
SELECT A.*,
(SELECT MIN(C.id)
FROM C
WHERE A.id = C.A_id
) as complete_date
FROM A
WHERE A.status = 'complete;
This avoids the aggregation on an entire table, which is why there is a performance gain.
The index you need is on C(A_Id, id) (the second column is not as important as the first). You may also want an index on A(status).
Related
so I've been working on optimize the inner join with the subquery which has group by statement. the query below it takes around 1.8 to 2 sec to fetch. I would like to optimize it and I think the subquery would be a key.
I'm not sure the subquery with group by in inner join can use index to join the other table. what I believe is that column in subquery (which is A2, C2 in this case) can not have a its own index inside inner join. is that correct?
so, my question is how can I optimize this query statement and is that possible to set index on A2, C2 in inner join.
SELECT A, C, X.S
FROM tb_g X
INNER JOIN (
SELECT A AS A2, MAX(C) AS C2
FROM tb_g
GROUP BY A
) Y
ON X.A = Y.A2 AND X.C = Y.C2;
A composite index on A, C should allow this query to be optimized as well as possible:
ALTER TABLE tb_g ADD INDEX (A, C);
This index allows the subquery to be calculated entirely with the index, and then the join with the intermediate table can be done with optimal fetching in the original table.
I have this two queries following and noticed they have a huge performance difference
Query1
SELECT count(distinct b.id) FROM tableA as a
LEFT JOIN tableB as b on a.id = b.aId
GROUP BY a.id
Query2
SELECT count(distinct b.id) FROM tableA as a
LEFT JOIN (SELECT * FROM tableB) as b on a.id = b.aId
GROUP BY a.id
the queries are basically joining one table to another and I noticed that Query1 takes about 80ms whereas Query2 takes about 2sec with thousands of data in my system. Could anyone explain me why this happens ? and if it's a wise choice to use only Query2 style whenever I am forced to use it ? or is there a better way to do the same thing but better than Query2 ?
When you replace tableB with (SELECT * FROM tableB) you are forcing the query engine to materialize a subquery, or intermediate table result. In other words, in the second query, you aren't actually joining directly to tableB, you are joining to some intermediate table. As a result of this, any indices which might have existed on tableB to make the query faster would not be available. Based on your current example, I see no reason to use the second version.
Under certain conditions you might be forced to use the second version though. For example, if you needed to transform tableB in some way, you might need a subquery to do that.
I have a complex query which results in a table which includes a time column. There are always two rows with the same time:
The result also contains a value column. The value of two rows with the same time is always different.
I now want to extend the query to join the rows with the same time together. So my thought was to join the derived table like this:
SELECT A.time, A.value AS valueA, B.value as valueB FROM
(
OLD_QUERY
) AS A INNER JOIN A AS B ON
A.time=B.time AND
A.value <> B.value;
However, the JOIN A AS B part of the query does not work. A is not recognized as the derived table. MySQL is searching for a table A in the database and does not find it.
So the question is: How can I join a derived table?
You cannot join a single reference to a table (or subquery) to itself; a subquery must be repeated.
Example: You cannot even do
SELECT A.* FROM sometable AS A INNER JOIN A ...
The A after the INNER JOIN is invalid unless you actually have a real table called A.
You can insert the subquery's results into another table, and use that; but it cannot be a true TEMPORARY table, as those cannot be joined to themselves or referenced twice at all in almost any query. _By referenced twice, I mean joined, unioned, used as an "WHERE IN" subquery when it is already referenced in the FROM.
If nothing else distinguishes the rows, you can just use aggregation to get the two values:
select time, min(value), max(value)
from (<your query here>) a
group by time;
In MySQL 8+, you can use a cte:
with a as (
<your query here>
)
select a1.time, a1.value, a2.value
from a a1 join
a a2
on a1.time = a2.time and a1.value <> a2.value;
Im trying to do two queries on the same table to get the Count(*) value.
I have this
SELECT `a`.`name`, `a`.`points` FROM `rank` AS a WHERE `id` = 1
And in the same query I want to do this
SELECT `b`.`Count(*)` FROM `rank` as b WHERE `b`.`points` >= `a`.`points`
I tried searching but did not find how to do a Count(*) in the same query.
Typically you would not intermingle a non aggregate and aggregate query together in MySQL. You might do this in databases which support analytic functions, such as SQL Server, but not in (the current version of) MySQL. That being said, your second query can be handled using a correlated subquery in the select clause the first query. So you may try the following:
SELECT
a.name,
a.points,
(SELECT COUNT(*) FROM rank b WHERE b.points >= a.points) AS cnt
FROM rank a
WHERE a.id = 1;
As I understand from the question, you want to find out in a table for a given id how many rows have the points greater than this row. This can be achieved using full join.
select count(*) from rank a join rank b on(a.id != b.id) where a.id=1 and b.points >= a.points;
I am trying to make a view of records in t1 where the source id from t1 is not in t2.
Like... "what records are not present in the other table?"
Do I need to include t2 in the FROM clause? Thanks
SELECT t1.fee_source_id, t1.company_name, t1.document
FROM t1
WHERE t1.fee_source_id NOT IN (
SELECT t1.fee_source_id
FROM t1 INNER JOIN t2 ON t1.fee_source_id = t2.fee_source
)
ORDER BY t1.aif_id DESC
You're looking to effect an anti-join, for which there are three possibilities in MySQL:
Using IN:
SELECT fee_source_id, company_name, document
FROM t1
WHERE fee_source_id NOT IN (SELECT fee_source FROM t2)
ORDER BY aif_id DESC
Using EXISTS:
SELECT fee_source_id, company_name, document
FROM t1
WHERE NOT EXISTS (
SELECT * FROM t2 WHERE t2.fee_source = t1.fee_source_id LIMIT 1
)
ORDER BY aif_id DESC
Using JOIN:
SELECT t1.fee_source_id, t1.company_name, t1.document
FROM t1 LEFT JOIN t2 ON t2.fee_source = t1.fee_source_id
WHERE t2.fee_source IS NULL
ORDER BY t1.aif_id DESC
According to #Quassnoi's analysis:
Summary
MySQL can optimize all three methods to do a sort of NESTED LOOPS ANTI JOIN.
It will take each value from t_left and look it up in the index on t_right.value. In case of an index hit or an index miss, the corresponding predicate will immediately return FALSE or TRUE, respectively, and the decision to return the row from t_left or not will be made immediately without examining other rows in t_right.
However, these three methods generate three different plans which are executed by three different pieces of code. The code that executes EXISTS predicate is about 30% less efficient than those that execute index_subquery and LEFT JOIN optimized to use Not exists method.
That’s why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.
However, I'm not entirely sure how this analysis reconciles with the MySQL manual section on Optimizing Subqueries with EXISTS Strategy which (to my reading) suggests that the second approach above should be more efficient than the first.
Another option below (similar to anti-join)... Great answer above though. Thanks!
SELECT D1.deptno, D1.dname
FROM dept D1
MINUS
SELECT D2.deptno, D2.dname
FROM dept D2, emp E2
WHERE D2.deptno = E2.deptno
ORDER BY 1;