Optimizing an inner join on a GROUP BY subquery - MySQL

So I've been working on optimizing an inner join with a subquery that has a GROUP BY clause. The query below takes around 1.8 to 2 seconds to fetch. I would like to optimize it, and I think the subquery is the key.
I'm not sure whether a subquery with GROUP BY inside an inner join can use an index to join to the other table. What I believe is that the columns of the subquery (A2 and C2 in this case) cannot have their own index inside the inner join. Is that correct?
So my question is: how can I optimize this query, and is it possible to set an index on A2 and C2 in the inner join?
SELECT A, C, X.S
FROM tb_g X
INNER JOIN (
SELECT A AS A2, MAX(C) AS C2
FROM tb_g
GROUP BY A
) Y
ON X.A = Y.A2 AND X.C = Y.C2;

A composite index on A, C should allow this query to be optimized as well as possible:
ALTER TABLE tb_g ADD INDEX (A, C);
This index allows the subquery to be computed entirely from the index, and the join from the intermediate (derived) table back to tb_g can then use the same index to look up the matching rows by (A, C).
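To try the answer on a small scale, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. That is an assumption: SQLite's planner and its EXPLAIN QUERY PLAN output differ from MySQL's EXPLAIN, and the sample rows and the index name idx_tb_g_a_c are hypothetical. It creates the composite index, runs the question's query, and returns the row with the largest C for each A.

```python
import sqlite3


def groupwise_max():
    """Run the question's query on hypothetical data, with SQLite standing in for MySQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tb_g (A INTEGER, C INTEGER, S TEXT)")
    conn.executemany(
        "INSERT INTO tb_g (A, C, S) VALUES (?, ?, ?)",
        [(1, 10, "a"), (1, 30, "b"), (1, 20, "c"), (2, 5, "d"), (2, 7, "e")],
    )
    # SQLite equivalent of the answer's: ALTER TABLE tb_g ADD INDEX (A, C);
    conn.execute("CREATE INDEX idx_tb_g_a_c ON tb_g (A, C)")

    # The question's query; ORDER BY is added only to make the output deterministic.
    rows = conn.execute(
        """
        SELECT X.A, X.C, X.S
        FROM tb_g X
        INNER JOIN (
            SELECT A AS A2, MAX(C) AS C2
            FROM tb_g
            GROUP BY A
        ) Y
        ON X.A = Y.A2 AND X.C = Y.C2
        ORDER BY X.A
        """
    ).fetchall()

    # Human-readable plan lines for the subquery (SQLite's format, not MySQL's).
    plan = [
        row[-1]
        for row in conn.execute(
            "EXPLAIN QUERY PLAN SELECT A, MAX(C) FROM tb_g GROUP BY A"
        )
    ]
    conn.close()
    return rows, plan


if __name__ == "__main__":
    rows, plan = groupwise_max()
    print(rows)  # [(1, 30, 'b'), (2, 7, 'e')]
    print(plan)
```

On MySQL itself, running EXPLAIN on the original query is the reliable way to confirm the index is used. Note that if two rows share the same maximum C for one A, the join returns both of them.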

Related

MySQL: join tables using an aggregate (MIN) in the WHERE condition, without a subquery

I am trying to get one table, along with the lowest value of a column of another table, by LEFT JOIN. I am using a subquery to do this.
Sample Snippet:
SELECT *
FROM A
JOIN
(select A_id,
MIN(id) AS complete_date
from C
group by A_id) B ON (A.id=B.A_id)
WHERE A.status="complete";
Is there a possible and efficient way to achieve this without a subquery and GROUP BY?
A correlated subquery -- with the right indexes -- is often the fastest approach:
SELECT A.*,
(SELECT MIN(C.id)
FROM C
WHERE A.id = C.A_id
) as complete_date
FROM A
WHERE A.status = 'complete';
This avoids aggregating the entire C table, which is where the performance gain comes from: with the index below, each MIN() becomes a short index lookup for a single A.id.
The index you need is on C(A_Id, id) (the second column is not as important as the first). You may also want an index on A(status).
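The two forms are not quite identical, and a small experiment makes the difference visible. This is a minimal sketch with Python's sqlite3 standing in for MySQL; the table contents and the index names are hypothetical. It runs the question's derived-table query and the answer's correlated subquery side by side on data where one completed A row has no C rows.

```python
import sqlite3


def compare_min_queries():
    """Compare the derived-table JOIN with the correlated MIN() subquery (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE A (id INTEGER PRIMARY KEY, status TEXT);
        CREATE TABLE C (id INTEGER PRIMARY KEY, A_id INTEGER);
        -- Indexes recommended in the answer: C(A_id, id) and A(status)
        CREATE INDEX idx_c_a_id_id ON C (A_id, id);
        CREATE INDEX idx_a_status ON A (status);
        INSERT INTO A (id, status) VALUES (1, 'complete'), (2, 'complete'), (3, 'pending');
        -- A row 2 is 'complete' but has no C rows at all
        INSERT INTO C (id, A_id) VALUES (10, 1), (11, 1), (12, 3);
        """
    )
    # Derived-table version from the question (only A.id is selected for brevity).
    derived = conn.execute(
        """
        SELECT A.id, B.complete_date
        FROM A
        JOIN (SELECT A_id, MIN(id) AS complete_date
              FROM C
              GROUP BY A_id) B ON A.id = B.A_id
        WHERE A.status = 'complete'
        ORDER BY A.id
        """
    ).fetchall()
    # Correlated-subquery version from the answer.
    correlated = conn.execute(
        """
        SELECT A.id,
               (SELECT MIN(C.id) FROM C WHERE A.id = C.A_id) AS complete_date
        FROM A
        WHERE A.status = 'complete'
        ORDER BY A.id
        """
    ).fetchall()
    conn.close()
    return derived, correlated


if __name__ == "__main__":
    derived, correlated = compare_min_queries()
    print(derived)     # [(1, 10)]
    print(correlated)  # [(1, 10), (2, None)]
```

The inner JOIN drops A rows with no C rows, while the correlated subquery keeps them with a NULL complete_date (LEFT JOIN semantics, which matches the question's wording). If the inner-join behavior is wanted, those NULL rows have to be filtered out explicitly.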

How to join a derived table

I have a complex query which results in a table that includes a time column. There are always two rows with the same time.
The result also contains a value column. The value of two rows with the same time is always different.
I now want to extend the query to join the rows with the same time together. So my thought was to join the derived table like this:
SELECT A.time, A.value AS valueA, B.value as valueB FROM
(
OLD_QUERY
) AS A INNER JOIN A AS B ON
A.time=B.time AND
A.value <> B.value;
However, the JOIN A AS B part of the query does not work. A is not recognized as the derived table. MySQL is searching for a table A in the database and does not find it.
So the question is: How can I join a derived table?
You cannot join a single reference to a table (or subquery) to itself; a subquery must be repeated.
Example: You cannot even do
SELECT A.* FROM sometable AS A INNER JOIN A ...
The A after the INNER JOIN is invalid unless you actually have a real table called A.
You can insert the subquery's results into another table and use that, but it cannot be a true TEMPORARY table, because MySQL does not allow a TEMPORARY table to be joined to itself or referenced twice in almost any query. By "referenced twice" I mean joined, unioned, or used as a WHERE IN subquery when it is already referenced in the FROM clause.
If nothing else distinguishes the rows, you can just use aggregation to get the two values:
select time, min(value), max(value)
from (<your query here>) a
group by time;
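As a quick check of the aggregation approach, here is a minimal sketch using Python's sqlite3 as a stand-in for MySQL. The table old_result is a hypothetical placeholder for the output of OLD_QUERY, filled with exactly two rows per time value. Because the derived table is referenced only once, no self-join is needed.

```python
import sqlite3


def pair_by_time_aggregate():
    """Collapse each pair of rows sharing a time into one row (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for OLD_QUERY's output: exactly two rows per time.
    conn.execute("CREATE TABLE old_result (time TEXT, value INTEGER)")
    conn.executemany(
        "INSERT INTO old_result (time, value) VALUES (?, ?)",
        [("10:00", 5), ("10:00", 9), ("11:00", 3), ("11:00", 1)],
    )
    # The derived table "a" appears once; MIN and MAX pick out the two values.
    rows = conn.execute(
        """
        SELECT time, MIN(value), MAX(value)
        FROM (SELECT time, value FROM old_result) a
        GROUP BY time
        ORDER BY time
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(pair_by_time_aggregate())  # [('10:00', 5, 9), ('11:00', 1, 3)]
```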
In MySQL 8+, you can use a CTE:
with a as (
<your query here>
)
select a1.time, a1.value, a2.value
from a a1 join
a a2
on a1.time = a2.time and a1.value <> a2.value;
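The CTE version can be tried the same way. This minimal sketch uses Python's sqlite3 (which supports WITH) as a stand-in for MySQL 8, again with a hypothetical old_result table in place of OLD_QUERY. It shows that a CTE, unlike a derived-table alias, can be referenced twice, and that the <> condition returns each pair in both orders.

```python
import sqlite3


def pair_by_time_cte():
    """Self-join a CTE on time (hypothetical data); returns (both_orders, one_order)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE old_result (time TEXT, value INTEGER)")
    conn.executemany(
        "INSERT INTO old_result (time, value) VALUES (?, ?)",
        [("10:00", 5), ("10:00", 9), ("11:00", 3), ("11:00", 1)],
    )
    # The CTE "a" is referenced twice, which a derived-table alias cannot be.
    both_orders = conn.execute(
        """
        WITH a AS (SELECT time, value FROM old_result)
        SELECT a1.time, a1.value, a2.value
        FROM a a1
        JOIN a a2
          ON a1.time = a2.time AND a1.value <> a2.value
        ORDER BY a1.time, a1.value
        """
    ).fetchall()
    # Using < instead of <> keeps one row per time (smaller value first).
    one_order = conn.execute(
        """
        WITH a AS (SELECT time, value FROM old_result)
        SELECT a1.time, a1.value, a2.value
        FROM a a1
        JOIN a a2
          ON a1.time = a2.time AND a1.value < a2.value
        ORDER BY a1.time
        """
    ).fetchall()
    conn.close()
    return both_orders, one_order


if __name__ == "__main__":
    both_orders, one_order = pair_by_time_cte()
    print(both_orders)  # [('10:00', 5, 9), ('10:00', 9, 5), ('11:00', 1, 3), ('11:00', 3, 1)]
    print(one_order)    # [('10:00', 5, 9), ('11:00', 1, 3)]
```

Whether one row or two rows per time is wanted depends on the use case; the < variant gives the same pairs as the MIN/MAX aggregation.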

How to join 3 tables where each has the key to the next in line

Imagine the following scenario:
There are 3 tables A, B and C.
Table A has no knowledge of either table B or table C.
Table B has a foreign key to table A.
Table C has a foreign key to table B.
In table B as well as in table C, multiple items can share the same foreign key value.
As you can see, the items in C indirectly reference A through B.
What I want is to get all entries from A that are referenced in C, but without any information from B or C in my result table and without duplicates.
Is this even possible?
I have tried this like so but have no idea if it is correct:
select tableA.*
from tableA,
(select distinct tableB.AId as Aid
from tableB left join tableC on tableC.BId = tableB.id
group by tableB.id)
as temp
where tableA.id = temp.Aid
I am not sure if I understand it correctly, but you can try this one:
SELECT DISTINCT `A`.`id`, `A`.`value1`, `A`.`value2` FROM `A`
INNER JOIN `B` ON `B`.`id-a` = `A`.`id`
INNER JOIN `C` ON `C`.`id-b` = `B`.`id`
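It returns each row of table A that is referenced by a row in table B which, in turn, is referenced by at least one row in table C. The DISTINCT removes the duplicates that appear when several B or C rows lead back to the same A row.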
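To see why the DISTINCT is needed, here is a minimal sketch with Python's sqlite3 standing in for MySQL (SQLite accepts MySQL-style backtick quoting, so the answer's column names such as `id-a` work unchanged). The rows are hypothetical: A row 1 is reached through two B rows and three C rows, A row 2 has a B row with no C row, and A row 3 has no B rows.

```python
import sqlite3


def distinct_join_demo():
    """Run the answer's three-table join with and without DISTINCT (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE `A` (`id` INTEGER PRIMARY KEY, `value1` TEXT, `value2` TEXT);
        CREATE TABLE `B` (`id` INTEGER PRIMARY KEY, `id-a` INTEGER);
        CREATE TABLE `C` (`id` INTEGER PRIMARY KEY, `id-b` INTEGER);
        INSERT INTO `A` VALUES (1, 'x', 'y'), (2, 'p', 'q'), (3, 'm', 'n');
        -- A1 has two B rows; A2's only B row has no C row; A3 has no B rows.
        INSERT INTO `B` VALUES (10, 1), (11, 1), (12, 2);
        -- B10 is referenced twice by C, B11 once.
        INSERT INTO `C` VALUES (100, 10), (101, 10), (102, 11);
        """
    )
    # Shared body of the query; only the SELECT / SELECT DISTINCT keyword differs.
    join_body = """
        `A`.`id`, `A`.`value1`, `A`.`value2` FROM `A`
        INNER JOIN `B` ON `B`.`id-a` = `A`.`id`
        INNER JOIN `C` ON `C`.`id-b` = `B`.`id`
        ORDER BY `A`.`id`
    """
    plain = conn.execute("SELECT" + join_body).fetchall()
    distinct = conn.execute("SELECT DISTINCT" + join_body).fetchall()
    conn.close()
    return plain, distinct


if __name__ == "__main__":
    plain, distinct = distinct_join_demo()
    print(plain)     # [(1, 'x', 'y'), (1, 'x', 'y'), (1, 'x', 'y')]
    print(distinct)  # [(1, 'x', 'y')]
```

Without DISTINCT, A row 1 appears once per matching C row; with it, the result has one row per referenced A row.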
An alternative approach to Masoud's good response would be to use EXISTS with a correlated subquery.
The subquery below joins B to C in a correlated fashion (note the B.IDA = A.ID condition, where A is outside the subquery).
If we assume good database design, then A will not have duplicate records, so we can omit DISTINCT here, since we are not joining A to the other tables. Instead, we are simply checking for the existence of a B record pointing at the A record, which must also have a record in C because of the inner join. This has two performance advantages:
1. It doesn't have to join all the records together, which would then require a DISTINCT, so you avoid the cost of the DISTINCT.
2. It can exit early: once a key value of A is found in the subquery (the B to C join), it can stop looking, so it doesn't have to join all of B to all of A.
We select 1 in the subquery because the selected value is never used. We're just using the correlation of A to (B JOIN C) to determine which rows of A to display.
SELECT A.*
FROM A
WHERE EXISTS( SELECT 1
FROM C
INNER JOIN B
on C.IDB = B.ID
AND B.IDA = A.ID)
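Here is a minimal sketch checking that the corrected EXISTS query returns the same rows as the DISTINCT join, using Python's sqlite3 as a stand-in for MySQL. The column names IDA and IDB follow this answer; the name column and all rows are hypothetical, and ORDER BY is added only for deterministic output.

```python
import sqlite3


def exists_vs_distinct():
    """Compare the correlated EXISTS query with a DISTINCT join (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE A (ID INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE B (ID INTEGER PRIMARY KEY, IDA INTEGER);
        CREATE TABLE C (ID INTEGER PRIMARY KEY, IDB INTEGER);
        INSERT INTO A VALUES (1, 'one'), (2, 'two'), (3, 'three');
        -- A1 has two B rows with C rows; A2's B row has no C row; A3 has no B rows.
        INSERT INTO B VALUES (10, 1), (11, 1), (12, 2);
        INSERT INTO C VALUES (100, 10), (101, 11);
        """
    )
    # Correlated EXISTS: no join to A, so no DISTINCT is needed.
    exists_rows = conn.execute(
        """
        SELECT A.*
        FROM A
        WHERE EXISTS (SELECT 1
                      FROM C
                      INNER JOIN B
                      ON C.IDB = B.ID
                      AND B.IDA = A.ID)
        ORDER BY A.ID
        """
    ).fetchall()
    # Join-based equivalent, which needs DISTINCT to remove duplicates.
    distinct_rows = conn.execute(
        """
        SELECT DISTINCT A.*
        FROM A
        INNER JOIN B ON B.IDA = A.ID
        INNER JOIN C ON C.IDB = B.ID
        ORDER BY A.ID
        """
    ).fetchall()
    conn.close()
    return exists_rows, distinct_rows


if __name__ == "__main__":
    exists_rows, distinct_rows = exists_vs_distinct()
    print(exists_rows)    # [(1, 'one')]
    print(distinct_rows)  # [(1, 'one')]
```

Matching results on a toy data set say nothing about speed; comparing EXPLAIN output on the real MySQL tables is the way to judge the early-exit benefit.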
Taking what you tried and reviewing it:
select tableA.*
from tableA,
(select distinct tableB.AId as Aid
from tableB left join tableC on tableC.BId = tableB.id
group by tableB.id)
as temp
where tableA.id = temp.Aid
Starting with the "FROM"
You have tableA, (subquery) temp. Written with a comma, this is logically a CROSS JOIN: every record from A is paired with every record of the (B JOIN C) result, so with 1000 records in A and 1000 records in temp you are asking the database engine for 1000*1000 rows, which then get filtered by the WHERE clause down to the records where temp and A match. In practice the optimizer usually treats that WHERE equality as a join condition and avoids building the full cross product, but I find the comma syntax confusing to maintain. So I would rewrite it as
SELECT tableA.*
FROM tableA
INNER JOIN (SELECT distinct tableB.AId as Aid
FROM tableB left join tableC on tableC.BId = tableB.id
GROUP BY tableB.id) as temp
ON tableA.id = temp.Aid
Looking at the subquery (temp)
We don't need a GROUP BY, as we are not aggregating. The DISTINCT already brings us down to one record per Aid, though at some cost in execution time.
So I would re-write as this:
SELECT tableA.*
FROM tableA
INNER JOIN (SELECT distinct tableB.AId as Aid
FROM tableB
LEFT JOIN tableC
on tableC.BId = tableB.id) as temp
ON tableA.id = temp.Aid
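A minimal sketch confirming that the comma join with GROUP BY and the INNER JOIN rewrite without GROUP BY return the same rows, using Python's sqlite3 as a stand-in for MySQL. The table and column names follow the question; the name column and all rows are hypothetical, and ORDER BY is added only for deterministic output.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tableA (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tableB (id INTEGER PRIMARY KEY, AId INTEGER);
CREATE TABLE tableC (id INTEGER PRIMARY KEY, BId INTEGER);
INSERT INTO tableA VALUES (1, 'one'), (2, 'two'), (3, 'three');
-- A1 has two B rows with C rows; A2's only B row has no C row; A3 has no B rows.
INSERT INTO tableB VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO tableC VALUES (100, 10), (101, 11);
"""


def compare_rewrites():
    """Run the question's query and the rewrite without GROUP BY (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # The question's original query: comma join plus a redundant GROUP BY.
    original = conn.execute(
        """
        select tableA.*
        from tableA,
        (select distinct tableB.AId as Aid
         from tableB left join tableC on tableC.BId = tableB.id
         group by tableB.id) as temp
        where tableA.id = temp.Aid
        order by tableA.id
        """
    ).fetchall()
    # The rewrite: explicit INNER JOIN, no GROUP BY.
    rewritten = conn.execute(
        """
        SELECT tableA.*
        FROM tableA
        INNER JOIN (SELECT distinct tableB.AId as Aid
                    FROM tableB
                    LEFT JOIN tableC
                    on tableC.BId = tableB.id) as temp
        ON tableA.id = temp.Aid
        ORDER BY tableA.id
        """
    ).fetchall()
    conn.close()
    return original, rewritten


if __name__ == "__main__":
    original, rewritten = compare_rewrites()
    print(original)   # [(1, 'one'), (2, 'two')]
    print(rewritten)  # [(1, 'one'), (2, 'two')]
```

Both versions return row 2 ('two') even though no tableC row references it, because the subquery still uses a LEFT JOIN.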
Then, looking at the whole query, if we replace the outer join to temp with EXISTS and use correlation, we avoid the performance hit of both the join and the DISTINCT. I'd also switch the LEFT JOIN to an INNER JOIN: we only want A records that have matching records in both B and C, and a LEFT JOIN keeps B rows with no match in C (their tableC columns come back as NULL), which would let through A records that no C record references.
This gets me to the answer I initially provided.
SELECT tableA.*
FROM tableA
WHERE EXISTS (SELECT 1
FROM tableB
INNER JOIN tableC
on tableC.BId = tableB.id
AND tableB.AId = tableA.id)
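A minimal sketch comparing the final EXISTS query with the question's original query, using Python's sqlite3 as a stand-in for MySQL. The table and column names follow the question; the name column and all rows are hypothetical (A row 2 has a B row but no C row), and ORDER BY is added only for deterministic output.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tableA (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tableB (id INTEGER PRIMARY KEY, AId INTEGER);
CREATE TABLE tableC (id INTEGER PRIMARY KEY, BId INTEGER);
INSERT INTO tableA VALUES (1, 'one'), (2, 'two'), (3, 'three');
-- A1 has two B rows with C rows; A2's only B row has no C row; A3 has no B rows.
INSERT INTO tableB VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO tableC VALUES (100, 10), (101, 11);
"""


def final_vs_original():
    """Run the final EXISTS query and the question's original query (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Final answer: correlated EXISTS over an INNER JOIN of B and C.
    final = conn.execute(
        """
        SELECT tableA.*
        FROM tableA
        WHERE EXISTS (SELECT 1
                      FROM tableB
                      INNER JOIN tableC
                      on tableC.BId = tableB.id
                      AND tableB.AId = tableA.id)
        ORDER BY tableA.id
        """
    ).fetchall()
    # The question's original query, which uses a LEFT JOIN from B to C.
    original = conn.execute(
        """
        select tableA.*
        from tableA,
        (select distinct tableB.AId as Aid
         from tableB left join tableC on tableC.BId = tableB.id
         group by tableB.id) as temp
        where tableA.id = temp.Aid
        order by tableA.id
        """
    ).fetchall()
    conn.close()
    return final, original


if __name__ == "__main__":
    final, original = final_vs_original()
    print(final)     # [(1, 'one')]
    print(original)  # [(1, 'one'), (2, 'two')]
```

Only the EXISTS version with an INNER JOIN meets the stated requirement ("entries from A that are referenced in C"); the original also returns A row 2, which C never references.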

MySQL trying to reuse results of subquery in an efficient way

I have a query like this:
SELECT q,COUNT(x),y,
(SELECT i FROM (SELECT q,w FROM tableA WHERE conds)
JOIN tableC ON (cond)
WHERE id = t.q)
FROM (SELECT q,w FROM tableA WHERE conds) t
JOIN tableB
GROUP BY q
The subquery (SELECT q,w FROM tableA WHERE conds) returns several hundred rows. After the GROUP BY q there are around 20 rows left.
The subquery (SELECT i FROM (SELECT q,w FROM tableA WHERE conds) join tableC WHERE id = t.q) uses exactly the same subquery internally, but then also selects a fraction of the results based on which q value is currently being grouped.
My problem seems to be this: performance is too slow because I can't find a way to put the WHERE id = t.q inside the (SELECT q,w FROM tableA WHERE conds) subquery. I can only guess that for every unique value of q the inner query is run again, produces hundreds of rows, and then has to apply the WHERE clause to an unindexed temporary table. I think I need to apply the WHERE before the full join.
Any ideas please?
This query could produce the same results, but so much information is missing from the question, who can be sure?
Select
q,
count(x),
y,
i
From
tableA a
inner join
tableC c
on cond and c.id = a.q
cross join -- is this an inner join?
tableB b
Where
conds
Group By
q,
y,
i

Transforming a Complicated Requirement into a SQL Query

I am having trouble with the relational algebra and transformation into SQL of this rather complicated query:
I need to select all values from table A joined to table B where there are no matching records in table B, or there are matching records but none of them has a field value that is one of 4 specific values (out of 8 possible values).
The database is MySQL 5.0, using the InnoDB engine for the tables.
Select
a.*
from
a
left join
b
on
a.id=b.id
where
b.id is null
or
b.field1 not in ("value1","value2","value3","value4");
I'm not sure if there is any real performance improvement but one other way is:
SELECT
*
FROM
tableA
WHERE
id NOT IN ( SELECT id FROM tableB WHERE field1 NOT IN ("value1", "value2"));
Your requirements are a bit unclear. My 1st interpretation is that you only want the A columns, and never more than 1 instance of a given A row.
select * from A where not exists (
select B.id
from B
where B.id=A.id
and B.field in ('badVal1','badVal2','badVal3','badVal4')
)
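The first answer's LEFT JOIN ... OR query and this NOT EXISTS query behave differently when an A row has several B rows. A minimal sketch, using Python's sqlite3 as a stand-in for MySQL, makes that visible. The data is hypothetical, only id is selected for brevity, the column is called field1 in both queries (this answer writes B.field), and the string literals use single quotes rather than the double quotes in the first answer.

```python
import sqlite3


def compare_requirement_queries():
    """Compare LEFT JOIN ... OR with NOT EXISTS on hypothetical data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE A (id INTEGER PRIMARY KEY);
        CREATE TABLE B (id INTEGER, field1 TEXT);  -- B.id refers to A.id
        INSERT INTO A (id) VALUES (1), (2), (3), (4);
        -- A1: no B rows; A2: only an allowed value; A3: only a forbidden value;
        -- A4: one forbidden and one allowed value.
        INSERT INTO B (id, field1) VALUES
            (2, 'ok'), (3, 'badVal1'), (4, 'badVal2'), (4, 'ok');
        """
    )
    # First answer: any single B row with an allowed value lets the A row through.
    left_join_or = [
        row[0]
        for row in conn.execute(
            """
            SELECT a.id
            FROM a
            LEFT JOIN b ON a.id = b.id
            WHERE b.id IS NULL
               OR b.field1 NOT IN ('badVal1', 'badVal2', 'badVal3', 'badVal4')
            ORDER BY a.id
            """
        )
    ]
    # This answer: any B row with a forbidden value excludes the A row.
    not_exists = [
        row[0]
        for row in conn.execute(
            """
            SELECT A.id FROM A WHERE NOT EXISTS (
                SELECT B.id
                FROM B
                WHERE B.id = A.id
                  AND B.field1 IN ('badVal1', 'badVal2', 'badVal3', 'badVal4')
            )
            ORDER BY A.id
            """
        )
    ]
    conn.close()
    return left_join_or, not_exists


if __name__ == "__main__":
    left_join_or, not_exists = compare_requirement_queries()
    print(left_join_or)  # [1, 2, 4]
    print(not_exists)    # [1, 2]
```

A row 4 has one forbidden and one allowed B value: the LEFT JOIN ... OR version returns it through the allowed row, while NOT EXISTS excludes it. The NOT EXISTS result matches the "none of the matching records" reading of the requirement.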
My 2nd interpretation is that you want all columns from (A outer joined to B), possibly with more than one instance of an A row if there are multiple B rows, as long as no B row for that A has a forbidden value.
select * from A
left outer join B on A.id=B.id
where not exists (
select C.id
from B as C
where A.id=C.id
and C.field in ('badVal1','badVal2','badVal3','badVal4')
)
Both queries could be expressed using NOT IN instead of a correlated NOT EXISTS. It's hard to know which would be faster without knowing the data.
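One point to check before switching to NOT IN: it is only equivalent to NOT EXISTS when the subquery never returns NULL. Here is a minimal sketch, using Python's sqlite3 as a stand-in for MySQL, with hypothetical data and the column called field1. It rewrites the 1st interpretation with NOT IN, and optionally adds a B row whose id is NULL.

```python
import sqlite3


def not_in_vs_not_exists(with_null_id):
    """Compare NOT EXISTS and NOT IN, optionally with a NULL B.id (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE A (id INTEGER PRIMARY KEY);
        CREATE TABLE B (id INTEGER, field1 TEXT);  -- B.id refers to A.id
        INSERT INTO A (id) VALUES (1), (2), (3), (4);
        INSERT INTO B (id, field1) VALUES
            (2, 'ok'), (3, 'badVal1'), (4, 'badVal2'), (4, 'ok');
        """
    )
    if with_null_id:
        # A forbidden-value row whose id is NULL (e.g. an unset foreign key).
        conn.execute("INSERT INTO B (id, field1) VALUES (NULL, 'badVal3')")
    bad = "('badVal1', 'badVal2', 'badVal3', 'badVal4')"
    not_exists = [
        row[0]
        for row in conn.execute(
            f"""
            SELECT A.id FROM A
            WHERE NOT EXISTS (SELECT B.id FROM B
                              WHERE B.id = A.id AND B.field1 IN {bad})
            ORDER BY A.id
            """
        )
    ]
    not_in = [
        row[0]
        for row in conn.execute(
            f"""
            SELECT A.id FROM A
            WHERE A.id NOT IN (SELECT B.id FROM B WHERE B.field1 IN {bad})
            ORDER BY A.id
            """
        )
    ]
    conn.close()
    return not_exists, not_in


if __name__ == "__main__":
    print(not_in_vs_not_exists(False))  # ([1, 2], [1, 2])
    print(not_in_vs_not_exists(True))   # ([1, 2], [])
```

With a NULL in the subquery result, `A.id NOT IN (...)` evaluates to NULL rather than true for every A row, so the NOT IN form returns nothing. NOT EXISTS is unaffected. Declaring B.id NOT NULL, or adding `B.id IS NOT NULL` inside the subquery, makes the two forms equivalent.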