MySQL: trying to reuse the results of a subquery efficiently

I have a query like this:
SELECT q,COUNT(x),y,
(SELECT i FROM (SELECT q,w FROM tableA WHERE conds)
JOIN tableC ON (cond)
WHERE id = t.q)
FROM (SELECT q,w FROM tableA WHERE conds) t
JOIN tableB
GROUP BY q
The subquery (SELECT q,w FROM tableA WHERE conds) returns several hundred rows. After the GROUP BY q, around 20 rows are left.
The subquery (SELECT i FROM (SELECT q,w FROM tableA WHERE conds) JOIN tableC WHERE id = t.q) contains exactly the same subquery as the one above, but then keeps only the fraction of its results that matches the q value currently being grouped.
My problem seems to be this: performance is too slow because I can't seem to put the WHERE id = t.q inside the (SELECT q,w FROM tableA WHERE conds) subquery. My guess is that the subquery is run again for every unique value of q, produces hundreds of rows, and then the WHERE clause has to be applied to an unindexed temporary table. I think I need to apply the WHERE before the full join.
Any ideas please?

This query might produce the same results, but so much information is missing from the question that it is hard to be sure:
Select
q,
count(x),
y,
i
From
tableA a
inner join
tableC c
on cond and c.id = a.q
cross join -- is this an inner join?
tableB b
Where
conds
Group By
q,
y,
i
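A minimal sketch of the flattening idea, run against SQLite through Python's sqlite3 module as a stand-in for MySQL. The schema, the sample rows and the filter w > 10 (standing in for conds) are hypothetical, and tableB is left out because the question does not say how it is joined. It checks that the correlated form and the flattened join return the same per-q rows.

```python
import sqlite3

# Hypothetical data: tableA(q, w, x) and tableC(id, i); tableB is omitted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (q INTEGER, w INTEGER, x INTEGER);
CREATE TABLE tableC (id INTEGER PRIMARY KEY, i TEXT);
INSERT INTO tableA VALUES (1, 10, 100), (1, 11, 101), (2, 20, 200),
                          (2, 21, 201), (3, 30, 300);
INSERT INTO tableC VALUES (1, 'one'), (2, 'two'), (3, 'three');
""")

# Original shape: filter tableA in a derived table, then look up i
# with a correlated scalar subquery for each group.
correlated = conn.execute("""
SELECT t.q, COUNT(t.x),
       (SELECT c.i FROM tableC c WHERE c.id = t.q)
FROM (SELECT q, w, x FROM tableA WHERE w > 10) t
GROUP BY t.q
ORDER BY t.q
""").fetchall()

# Flattened shape: filter once and join tableC directly on c.id = a.q,
# so the id = q condition is applied as a join predicate.
flattened = conn.execute("""
SELECT a.q, COUNT(a.x), c.i
FROM tableA a
INNER JOIN tableC c ON c.id = a.q
WHERE a.w > 10
GROUP BY a.q, c.i
ORDER BY a.q
""").fetchall()

print(flattened)
```

One difference to keep in mind: the INNER JOIN drops any q that has no matching tableC row, where the correlated subquery would return NULL for it; a LEFT JOIN keeps those groups.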

Related

How to join 3 tables where each has the key to the next in line

Imagine the following scenario:
There are 3 tables A, B and C.
Table A has no knowledge of either table B or table C.
Table B has a foreign key to table A.
Table C has a foreign key to table B.
In table B as well as in table C there can be multiple items sharing the same foreign key value.
As you can see, the items in C reference A indirectly, through B.
What I want is to get all entries from A that are referenced from C, without any columns from B or C in my result and without duplicates.
Is this even possible?
I have tried it like this, but I have no idea whether it is correct:
select tableA.*
from tableA,
(select distinct tableB.AId as Aid
from tableB left join tableC on tableC.BId = tableB.id
group by tableB.id)
as temp
where tableA.id = temp.Aid
I am not sure if I understand it correctly, but you can try this one:
SELECT DISTINCT `A`.`id`, `A`.`value1`, `A`.`value2` FROM `A`
INNER JOIN `B` ON `B`.`id-a` = `A`.`id`
INNER JOIN `C` ON `C`.`id-b` = `B`.`id`
It returns every row of table A that is referenced, through table B, by at least one row in table C.
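A small sketch of why the DISTINCT matters in the join approach, run on SQLite via Python's sqlite3. The tables and rows are hypothetical, and the hyphenated column names (id-a, id-b) are replaced with a_id and b_id to avoid quoting. Without DISTINCT, an A row comes back once per matching (B, C) pair.

```python
import sqlite3

# Hypothetical schema: B.a_id -> A.id, C.b_id -> B.id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (id INTEGER PRIMARY KEY, value1 TEXT);
CREATE TABLE B (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES A(id));
CREATE TABLE C (id INTEGER PRIMARY KEY, b_id INTEGER REFERENCES B(id));
INSERT INTO A VALUES (1, 'x'), (2, 'y'), (3, 'z');
-- A1 has two B rows with C rows; A2 has a B row but no C; A3 has no B.
INSERT INTO B VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO C VALUES (100, 10), (101, 10), (110, 11);
""")

# Without DISTINCT, A1 appears once per matching (B, C) pair.
raw = conn.execute("""
SELECT A.id FROM A
INNER JOIN B ON B.a_id = A.id
INNER JOIN C ON C.b_id = B.id
ORDER BY A.id
""").fetchall()

# DISTINCT collapses those duplicates back to one row per A.
distinct_rows = conn.execute("""
SELECT DISTINCT A.id, A.value1 FROM A
INNER JOIN B ON B.a_id = A.id
INNER JOIN C ON C.b_id = B.id
ORDER BY A.id
""").fetchall()

print(raw, distinct_rows)
```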
An alternative approach to Masoud's good response would be to use EXISTS with a correlated subquery.
The subquery below joins B to C in a correlated fashion (notice the condition B.IDA = A.ID, where A is outside the subquery).
If we assume good database design, A will not have duplicate records, so we can omit DISTINCT here: we are not joining A to the other tables. Instead, we are simply checking for the existence of an "A" record in the B table that must also have a record in the C table because of the inner join. This has two performance advantages:
1. It doesn't have to join all the records together, which would then necessitate a DISTINCT, so you avoid the cost of the DISTINCT.
2. It can exit early: once a key value of A is found in the subquery (the B-to-C join), it can stop looking, so it doesn't have to join all of B to all of A.
We select "1" in the subquery because we don't care what we select; the value is never used. We're just using the correlation of A to (B JOIN C) to determine which rows of A to display.
SELECT A.*
FROM A
WHERE EXISTS (SELECT 1
FROM C
INNER JOIN B
ON C.IDB = B.ID
AND B.IDA = A.ID)
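A runnable sketch of the EXISTS form on SQLite via Python's sqlite3, with hypothetical rows and the answer's column names (A.ID, B.IDA, C.IDB). Here the correlation B.IDA = A.ID is placed in the subquery's WHERE rather than its ON clause; for an inner join the two placements are equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (ID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE B (ID INTEGER PRIMARY KEY, IDA INTEGER);
CREATE TABLE C (ID INTEGER PRIMARY KEY, IDB INTEGER);
INSERT INTO A VALUES (1, 'x'), (2, 'y'), (3, 'z');
INSERT INTO B VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO C VALUES (100, 10), (101, 10), (110, 11);
""")

# EXISTS only asks "is there at least one B/C pair for this A row?",
# so A rows are never multiplied and no DISTINCT is needed.
rows = conn.execute("""
SELECT A.*
FROM A
WHERE EXISTS (SELECT 1
              FROM C
              INNER JOIN B ON C.IDB = B.ID
              WHERE B.IDA = A.ID)
ORDER BY A.ID
""").fetchall()

print(rows)
```

A1 has three matching B/C pairs but is returned once; A2 (a B row with no C row) and A3 (no B row) are excluded.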
Taking what you tried and reviewing it:
select tableA.*
from tableA,
(select distinct tableB.AId as Aid
from tableB left join tableC on tableC.BId = tableB.id
group by tableB.id)
as temp
where tableA.id = temp.Aid
Starting with the "FROM"
You have tableA, (subquery) temp. This is a CROSS JOIN, meaning all records from A are joined to ALL records of (B JOIN C). So if you have 1000 records in A and 1000 records in the temp result, you'd be telling the database engine to generate 1000*1000 records, which then get filtered down to only the records matching between temp and A. The engine may well be smart enough to avoid the cross join and optimize the query, but I find this form confusing to maintain. So I would rewrite it as:
SELECT tableA.*
FROM tableA
INNER JOIN (SELECT distinct tableB.AId as Aid
FROM tableB left join tableC on tableC.BId = tableB.id
GROUP BY tableB.id) as temp
ON tableA.id = temp.Aid
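A quick check, on SQLite via Python's sqlite3 with hypothetical rows, that the comma-join-plus-WHERE form and the explicit INNER JOIN ... ON form return the same rows; the rewrite is about readability, not results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tableB (id INTEGER PRIMARY KEY, AId INTEGER);
CREATE TABLE tableC (id INTEGER PRIMARY KEY, BId INTEGER);
INSERT INTO tableA VALUES (1, 'x'), (2, 'y'), (3, 'z');
INSERT INTO tableB VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO tableC VALUES (100, 10), (110, 11);
""")

subquery = """(SELECT DISTINCT tableB.AId AS Aid
               FROM tableB LEFT JOIN tableC ON tableC.BId = tableB.id
               GROUP BY tableB.id)"""

# Comma join filtered in WHERE (the original form).
comma_form = conn.execute(
    f"SELECT tableA.* FROM tableA, {subquery} AS temp "
    "WHERE tableA.id = temp.Aid ORDER BY tableA.id").fetchall()

# Explicit INNER JOIN ... ON (the rewritten form).
join_form = conn.execute(
    f"SELECT tableA.* FROM tableA INNER JOIN {subquery} AS temp "
    "ON tableA.id = temp.Aid ORDER BY tableA.id").fetchall()

print(comma_form == join_form, join_form)
```

Both forms also return tableA row 2, which has a B row but no C row; that comes from the LEFT JOIN inside temp, not from the join syntax.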
Looking at the subquery (temp)
We don't need a GROUP BY, as we are not aggregating. The DISTINCT does bring us down to 1 record per AId, but at a cost in execution time.
So I would re-write as this:
SELECT tableA.*
FROM tableA
INNER JOIN (SELECT distinct tableB.AId as Aid
FROM tableB
LEFT JOIN tableC
on tableC.BId = tableB.id) as temp
ON tableA.id = temp.Aid
Then, looking at the whole query: if we change the outer query's join to temp into an EXISTS, using correlation, we avoid the performance hit of both the join and the DISTINCT. I'd also switch the LEFT JOIN to an INNER JOIN, as we only want B records that have matching C records; with a LEFT JOIN we'd also keep B rows with no matching C row (NULLs in the C columns), which serve no purpose for us.
This gets me to the answer I initially provided.
SELECT tableA.*
FROM tableA
WHERE EXISTS (SELECT 1
FROM tableB
INNER JOIN tableC
ON tableC.BId = tableB.id
AND tableB.AId = tableA.id)
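A sketch on SQLite via Python's sqlite3, with hypothetical rows, contrasting the original attempt (LEFT JOIN inside the derived table) with EXISTS over an INNER JOIN. The data includes an A row whose only B row has no C row, which is exactly the case where the two differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tableB (id INTEGER PRIMARY KEY, AId INTEGER);
CREATE TABLE tableC (id INTEGER PRIMARY KEY, BId INTEGER);
INSERT INTO tableA VALUES (1, 'x'), (2, 'y'), (3, 'z');
-- A2 has a B row (20) but no C row referencing it.
INSERT INTO tableB VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO tableC VALUES (100, 10), (110, 11);
""")

# Original attempt: the LEFT JOIN keeps B rows without any C row.
left_join = conn.execute("""
SELECT tableA.* FROM tableA,
  (SELECT DISTINCT tableB.AId AS Aid
   FROM tableB LEFT JOIN tableC ON tableC.BId = tableB.id
   GROUP BY tableB.id) AS temp
WHERE tableA.id = temp.Aid
ORDER BY tableA.id
""").fetchall()

# EXISTS with an INNER JOIN: only A rows reachable from some C row.
exists_inner = conn.execute("""
SELECT tableA.* FROM tableA
WHERE EXISTS (SELECT 1
              FROM tableB
              INNER JOIN tableC ON tableC.BId = tableB.id
              WHERE tableB.AId = tableA.id)
ORDER BY tableA.id
""").fetchall()

print(left_join, exists_inner)
```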

Joining 3 Subqueries

I need a little bit of help creating this query. I'm joining TableA and TableB and getting a value out of it, then joining TableA and TableC and getting a value out of it. Finally, I subtract the two values.
I'm not sure whether to write this as a single query with several JOINs, or as 2 subqueries whose results I then subtract.
So far I have something like:
SELECT SUM(A.quantity) FROM TableA A JOIN TableB WHERE ...
then
SELECT SUM(A.quantity) FROM TableA A JOIN TableC WHERE ...
Since TableA joined with TableB might return no rows while TableA joined with TableC does (or vice versa, or both might, or neither), I can't simply JOIN TableA, TableB and TableC.
You can do this with a cross join:
select coalesce(s1.q1, 0) - coalesce(s2.q2, 0)
from (SELECT SUM(A.quantity) as q1 FROM TableA A JOIN TableB WHERE ...) s1 cross join
(SELECT SUM(A.quantity) as q2 FROM TableA A JOIN TableC WHERE ...) s2;
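Each subquery is an aggregate without GROUP BY, so it always returns exactly one row, and the cross join therefore returns exactly one row. When a join matches nothing, SUM() returns NULL rather than 0; coalesce() turns that NULL into 0 so the subtraction still works.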
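A minimal sketch on SQLite via Python's sqlite3. The join conditions (B.a_id = A.id, C.a_id = A.id) and the rows are hypothetical, since the question elides its conditions; TableC is left empty so the second SUM has nothing to add up. It shows that NULL propagates through the subtraction unless coalesce() is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (id INTEGER PRIMARY KEY, quantity INTEGER);
CREATE TABLE TableB (a_id INTEGER);
CREATE TABLE TableC (a_id INTEGER);
INSERT INTO TableA VALUES (1, 5), (2, 7), (3, 11);
INSERT INTO TableB VALUES (1), (2);
-- TableC is left empty, so the second SUM has no rows to add up.
""")

# Each derived table is an aggregate without GROUP BY, so it always
# yields exactly one row; the cross join therefore yields one row too.
diff = conn.execute("""
SELECT COALESCE(s1.q1, 0) - COALESCE(s2.q2, 0)
FROM (SELECT SUM(A.quantity) AS q1
      FROM TableA A JOIN TableB B ON B.a_id = A.id) s1
CROSS JOIN
     (SELECT SUM(A.quantity) AS q2
      FROM TableA A JOIN TableC C ON C.a_id = A.id) s2
""").fetchone()[0]

# Without COALESCE, the NULL from the empty SUM makes the result NULL.
raw = conn.execute("""
SELECT s1.q1 - s2.q2
FROM (SELECT SUM(A.quantity) AS q1
      FROM TableA A JOIN TableB B ON B.a_id = A.id) s1
CROSS JOIN
     (SELECT SUM(A.quantity) AS q2
      FROM TableA A JOIN TableC C ON C.a_id = A.id) s2
""").fetchone()[0]

print(diff, raw)
```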

SQL add rows count from a second table to the main query

I'm trying to improve a (not so much) simple query:
I need to retrieve every row from Table A.
Then join Table A with Table B so I get all the data I need.
At the same time, I need to add an extra column with the count() from Table C.
Something like:
SELECT a.*,
(SELECT Count(*)
FROM table_c c
WHERE c.a_id = a.id) AS counter,
b.*
FROM table_a a
LEFT JOIN table_b b
ON b.a_id = a.id
This works, but in reality I'm making 2 queries, and I need to improve it so that it only does one (if that's even possible).
Does anyone know how I can achieve that?
The simplest approach is likely to move the correlated sub-query into a non-correlated sub-query in the FROM clause that pre-aggregates table_c, and join to that.
NOTE: Many optimisers deal with correlated sub-queries extremely effectively. Your example query could be perfectly reasonable.
SELECT
a.*,
b.*,
COALESCE(c.row_count, 0) AS row_count
FROM
table_a a
LEFT JOIN
table_b b
ON b.a_id = a.id
LEFT JOIN
(
SELECT
a_id,
Count(*) row_count
FROM
table_c
GROUP BY
a_id
)
c
ON c.a_id = a.id
Another note: SQL is a declarative expression of what you want; it is not executed directly but translated into a plan using nested loops, hash joins, etc. Do not assume that having two queries is a bad thing. In this case, my example may significantly reduce the number of reads compared with a single join followed by GROUP BY and COUNT(DISTINCT).
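A sketch on SQLite via Python's sqlite3, with hypothetical tables and rows, checking that the pre-aggregated LEFT JOIN gives the same counts as the correlated sub-query from the question. After the LEFT JOIN, an a row with no table_c rows gets a NULL row_count, so COALESCE maps it back to the 0 that COUNT(*) would return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE table_b (id INTEGER PRIMARY KEY, a_id INTEGER, info TEXT);
CREATE TABLE table_c (id INTEGER PRIMARY KEY, a_id INTEGER);
INSERT INTO table_a VALUES (1, 'x'), (2, 'y');
INSERT INTO table_b VALUES (10, 1, 'b1'), (20, 2, 'b2');
INSERT INTO table_c VALUES (100, 1), (101, 1), (102, 1);
""")

# Correlated form from the question: COUNT(*) per row, 0 when no match.
correlated = conn.execute("""
SELECT a.id, (SELECT COUNT(*) FROM table_c c WHERE c.a_id = a.id), b.info
FROM table_a a LEFT JOIN table_b b ON b.a_id = a.id
ORDER BY a.id
""").fetchall()

# Pre-aggregated form: count table_c once per a_id, then LEFT JOIN it.
joined = conn.execute("""
SELECT a.id, COALESCE(c.row_count, 0), b.info
FROM table_a a
LEFT JOIN table_b b ON b.a_id = a.id
LEFT JOIN (SELECT a_id, COUNT(*) AS row_count
           FROM table_c GROUP BY a_id) c ON c.a_id = a.id
ORDER BY a.id
""").fetchall()

print(joined)
```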
Try this:
SELECT
a.*,
b.*,
SUM(IF(c.a_id IS NULL,0,1)) as counter
FROM table_a a
LEFT JOIN table_b b
ON b.a_id = a.id
LEFT JOIN table_c c
ON c.a_id = a.id
-- assumes a.id and b.id are primary keys, so a.* and b.* depend on them
GROUP BY
a.id,
b.id
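A sketch of the join-then-count idea on SQLite via Python's sqlite3, with hypothetical tables and rows. SQLite has no IF(), so the same test is written with CASE; the count is per (a, b) pair, and an a row with no table_b or table_c rows still appears with a counter of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE table_b (id INTEGER PRIMARY KEY, a_id INTEGER);
CREATE TABLE table_c (id INTEGER PRIMARY KEY, a_id INTEGER);
INSERT INTO table_a VALUES (1, 'x'), (2, 'y');
INSERT INTO table_b VALUES (10, 1), (11, 1);
INSERT INTO table_c VALUES (100, 1), (101, 1);
""")

# Join everything, then count the non-NULL C matches per (a, b) pair.
# CASE replaces MySQL's IF(), which SQLite does not have.
rows = conn.execute("""
SELECT a.id AS aid, b.id AS bid,
       SUM(CASE WHEN c.a_id IS NULL THEN 0 ELSE 1 END) AS counter
FROM table_a a
LEFT JOIN table_b b ON b.a_id = a.id
LEFT JOIN table_c c ON c.a_id = a.id
GROUP BY a.id, b.id
ORDER BY a.id, b.id
""").fetchall()

print(rows)
```

COUNT(c.a_id) would give the same numbers, since COUNT of a column skips NULLs.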

How to join 2 tables without an ON clause

I want to get SUM(column1) from two different tables and compute their difference. I am using MySQL.
Table A's sum = 1234
Table B's sum = 4001
I'm not sure what to put in my ON clause:
SELECT
SUM(a.column1) AS table_a_sum,
SUM(b.column1) AS table_b_sum,
SUM(a.column1) - SUM(b.column1) AS difference
FROM table_a a
JOIN table_b b
ON ??????
A join without a condition is a cross join. A cross join repeats each row of the left-hand table for each row of the right-hand table:
FROM table_a a
CROSS JOIN table_b b
Note that in MySQL, cross join / join / inner join are identical. So you could write:
FROM table_a a
JOIN table_b b
As long as you omit the on clause, this will work as a cross join.
If you'd like to sum columns from two tables, a cross join won't work because it repeats rows: you'd get highly inflated numbers. For sums, a better approach uses subqueries, as in @sgeddes's answer.
Here's one option using subqueries -- there are several ways to do this:
SELECT
table_a_sum,
table_b_sum,
table_a_sum - table_b_sum AS difference
FROM
(SELECT SUM(column1) table_a_sum FROM table_a) a,
(SELECT SUM(column1) table_b_sum FROM table_b) b
You want to summarize first and then do the calculations:
select a.suma, b.sumb, a.suma - b.sumb
from (select sum(a.column1) as suma from tablea) a cross join
(select sum(b.column1) as sumb from tableb) b
Cross joining the raw tables generates a Cartesian product that will inflate your sums.
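A sketch on SQLite via Python's sqlite3 reproducing the question's totals (1234 and 4001) with hypothetical two-row tables. Cross joining the raw tables doubles each sum here, because each table has two rows; summarizing first gives the intended numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (column1 INTEGER);
CREATE TABLE table_b (column1 INTEGER);
INSERT INTO table_a VALUES (1000), (234);   -- sums to 1234
INSERT INTO table_b VALUES (4000), (1);     -- sums to 4001
""")

# Cross joining the raw tables: every a row is paired with every b row,
# so each sum is multiplied by the other table's row count.
inflated = conn.execute("""
SELECT SUM(a.column1), SUM(b.column1)
FROM table_a a CROSS JOIN table_b b
""").fetchone()

# Summarize first, then cross join the two one-row results.
correct = conn.execute("""
SELECT a.suma, b.sumb, a.suma - b.sumb
FROM (SELECT SUM(column1) AS suma FROM table_a) a
CROSS JOIN (SELECT SUM(column1) AS sumb FROM table_b) b
""").fetchone()

print(inflated, correct)
```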

SQL - selecting userid which has max number of not null rows in a different table

I have 3 tables, say A, B, C.
Table A has userid column.
Table B has caid column.
Table C has lisid and image columns.
One userid can have one or several caids.
One caid can have one or several lisids.
How do I select the userid that has the maximum number of rows where the image column is not null (for some lisids the image column is blank, for others it has a value)?
Can someone please help?
Presumably, the ids are spread among the tables in a reasonable fashion. If so, the following should do this:
select b.userid, count(*)
from TableB b join
TableC c
on b.caid = c.caid
where c.image is not null
group by b.userid
order by count(*) desc
limit 1
The question in the comments is how you connect TableA to TableB and TableB to TableC. The reasonable approach is to have the userid in TableB and the caid in TableC.
Getting all the userids that tie for the maximum requires a bit more work. Essentially, you have to join the per-user counts to the maximum count:
select s.*
from (select b.userid, count(*) as cnt
from TableB b join
TableC c
on b.caid = c.caid
where c.image is not null
group by b.userid
) s join
(select count(*) as maxcnt
from TableB b join
TableC c
on b.caid = c.caid
where c.image is not null
group by b.userid
order by count(*) desc
limit 1
) smax
on s.cnt = smax.maxcnt
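A runnable sketch of the tie-preserving query on SQLite via Python's sqlite3, with hypothetical rows in which two users tie for the most non-NULL images. Both derived tables filter on image IS NOT NULL and group by userid, and the join compares cnt to maxcnt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableB (caid INTEGER PRIMARY KEY, userid INTEGER);
CREATE TABLE TableC (lisid INTEGER PRIMARY KEY, caid INTEGER, image TEXT);
INSERT INTO TableB VALUES (1, 100), (2, 100), (3, 200), (4, 300);
INSERT INTO TableC VALUES
  (10, 1, 'a.png'), (11, 2, 'b.png'), (12, 2, NULL),
  (13, 3, 'c.png'), (14, 3, 'd.png'),
  (15, 4, 'e.png'), (16, 4, NULL);
""")

# Per-user counts of rows with a non-NULL image.
per_user = """SELECT b.userid, COUNT(*) AS cnt
              FROM TableB b JOIN TableC c ON b.caid = c.caid
              WHERE c.image IS NOT NULL
              GROUP BY b.userid"""

# The single highest count, found with ORDER BY ... LIMIT 1.
max_count = """SELECT COUNT(*) AS maxcnt
               FROM TableB b JOIN TableC c ON b.caid = c.caid
               WHERE c.image IS NOT NULL
               GROUP BY b.userid
               ORDER BY COUNT(*) DESC
               LIMIT 1"""

# Keep every user whose count equals the maximum, so ties survive.
tied = conn.execute(f"""
SELECT s.userid, s.cnt
FROM ({per_user}) s
JOIN ({max_count}) smax ON s.cnt = smax.maxcnt
ORDER BY s.userid
""").fetchall()

print(tied)
```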
Other databases have window functions (ranking functions such as RANK()) that make this sort of query much simpler. Alas, MySQL did not offer these before version 8.0; MySQL 8.0 and later do support them.