mysql, SQL Inner join internals

I have 4 tables with schemas as below :
create table A
(
id int,
B_id int,
D_id int
);
create table B
(
id int,
C_id int
);
create table C
(
id int,
C_Parent_id int
);
create table C_Parent
(
id int
);
create table D
(
id int
);
In my case, any row in A can be linked to a row in B / D, any row in B can be linked to a row in C, and any row in C is always linked to a row in C_Parent.
I then insert some test data so that a row in table A is associated with a row in table B that is NOT associated with any row in table C.
I want to fetch some data from A, B (via the A->B link), D (via the A->D link) and C (if the B->C link exists).
The problem is that the query below does not work:
select A.id as Aid, B.id as bid, C.id as Cid, D.id as Did from
A inner join B on A.B_id = B.id
left outer join C on B.C_id = C.id
inner join C_Parent on C_Parent.id = C.C_Parent_id
left outer join D on A.D_id = D.id;
Perhaps this is because the inner join is applied to the entire result of the joins up to that point, and not just to the row from table C named in the join condition (on C_Parent.id = C.C_Parent_id). Why is that?
I have modified the query above into the query below, which works:
select A.id as Aid, B.id as bid, C.id as Cid, D.id as Did from
A inner join B on A.B_id = B.id
left outer join
(
C inner join C_Parent on C_Parent.id = C.C_Parent_id
)on B.C_id = C.id
left outer join D on A.D_id = D.id

Q: Why is that?
A: When C.C_Parent_id is NULL, the equality comparison in the join condition
C_Parent.id = C.C_Parent_id
won't evaluate to TRUE, so a row is not returned, because it's an INNER JOIN operation.
And C.C_Parent_ID is going to be NULL anytime a row from B was returned when there wasn't a "matching" row from C.
Here's how I think about how the outer join operates. And this may help you understand what's happening.
For sake of example, consider
SELECT b.id
     , c.b_id
  FROM b
  LEFT
  JOIN c
    ON c.b_id = b.id
This is going to return all rows from b, along with the matching rows from c, just like an inner join, with one exception.
As you already know, if this were an inner join, a row from b would not be returned when there is no matching row from c.
But with the OUTER JOIN, the row from b is returned anyway.
So this is how I think about it...
When there isn't a matching row from c, the database conjures a dummy row to serve in the place of a matching row from c. The dummy row has all the same columns as c, but all of the columns in the dummy row have a value of NULL. And since there is now a "matching" row, the row from b can be returned.
So it's like an inner join except the database is generating a "dummy" row of NULL values to serve as a matching row.
I'm not sure if that helps you understand it. But that's how I got my brain wrapped around an "outer join" operation.
So in your example, when a dummy row from C gets conjured to satisfy the outer join, all of the columns from C are NULL. Which is why, when you get to the join condition ... C.C_Parent_ID is NULL.
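The effect described above can be reproduced with a small script. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL; the sample rows are hypothetical, since the original test data is not shown, and table D is left out for brevity. The third query uses a derived table (aliased CP, a name introduced here) instead of the parenthesized join from the question; it expresses the same idea of joining C to C_Parent first and only then outer-joining the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INT, B_id INT, D_id INT);
    CREATE TABLE B (id INT, C_id INT);
    CREATE TABLE C (id INT, C_Parent_id INT);
    CREATE TABLE C_Parent (id INT);
    -- Hypothetical data: A 1 -> B 10, which has no C row.
    --                    A 2 -> B 11 -> C 20 -> C_Parent 100.
    INSERT INTO A VALUES (1, 10, NULL);
    INSERT INTO A VALUES (2, 11, NULL);
    INSERT INTO B VALUES (10, NULL);
    INSERT INTO B VALUES (11, 20);
    INSERT INTO C VALUES (20, 100);
    INSERT INTO C_Parent VALUES (100);
""")

# Step 1: the LEFT JOIN alone keeps A 1 and fills C's columns with NULL.
after_left_join = conn.execute("""
    SELECT A.id, B.id, C.id, C.C_Parent_id
    FROM A
    INNER JOIN B ON A.B_id = B.id
    LEFT OUTER JOIN C ON B.C_id = C.id
    ORDER BY A.id
""").fetchall()

# Step 2: the chained INNER JOIN compares C_Parent.id with that NULL,
# which never evaluates to TRUE, so the A 1 row is dropped.
chained = conn.execute("""
    SELECT A.id, B.id, C.id
    FROM A
    INNER JOIN B ON A.B_id = B.id
    LEFT OUTER JOIN C ON B.C_id = C.id
    INNER JOIN C_Parent ON C_Parent.id = C.C_Parent_id
    ORDER BY A.id
""").fetchall()

# Step 3: join C to C_Parent first, then outer-join that result to B.
grouped = conn.execute("""
    SELECT A.id, B.id, CP.id
    FROM A
    INNER JOIN B ON A.B_id = B.id
    LEFT OUTER JOIN (
        SELECT C.id AS id
        FROM C
        INNER JOIN C_Parent ON C_Parent.id = C.C_Parent_id
    ) AS CP ON B.C_id = CP.id
    ORDER BY A.id
""").fetchall()

print(after_left_join)  # A 1 is present, with NULLs for C
print(chained)          # A 1 is gone
print(grouped)          # A 1 is back, with NULL for C
```

With this data, only the second query loses the row for A 1, which matches the explanation: the inner join sees the NULL-filled row, not a missing one.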

Related

How to join 3 tables where each has the key to the next in line

Imagine the following scenario:
There are 3 tables A, B and C.
Table A has no knowledge of either table B and table C.
Table B has a foreign key to table A.
Table C has foreign key to table B.
In table B as well as in table C there can be multiple items sharing the same foreign key value.
As you can see, the items from C are indirectly referenced to A through B.
What I want is to get all entries from A that are referenced in C but without any information from B or C in my result tables and without duplicates.
Is this even possible?
I have tried this like so but have no idea if it is correct:
select tableA.*
from tableA,
(select distinct tableB.AId as Aid
from tableB left join tableC on tableC.BId = tableB.id
group by tableB.id)
as temp
where tableA.id = temp.Aid

I am not sure if I understand it correctly, but you can try this one:
SELECT DISTINCT `A`.`id`, `A`.`value1`, `A`.`value2` FROM `A`
INNER JOIN `B` ON `B`.`id-a` = `A`.`id`
INNER JOIN `C` ON `C`.`id-b` = `B`.`id`
It returns all values from table A if there is a key on Table C which is linked to Table B with corresponding foreign key on table A

An alternative to Masoud's good response is to use EXISTS with a correlated subquery.
The subquery below joins B to C and is correlated to the outer query (notice the B.IDA = A.ID condition, where A is outside the subquery).
If we assume good database design, A will not contain duplicate records, so we can omit DISTINCT here because we are not joining A to the other tables. Instead, we simply check for the existence of an "A" key in the B table, which must also have a record in the C table because of the inner join. This has two performance advantages:
1. It doesn't have to join all the records together, which would then necessitate a DISTINCT, so you avoid the performance hit of the DISTINCT.
2. It can escape early: once a key value of A is found in the subquery (the B to C join), it can stop looking, so it doesn't have to join all of B to all of A.
We select "1" in the subquery because we don't care what is selected; the value is not used anywhere. We're just using the correlation of A to (B JOIN C) to determine which rows of A to display.
SELECT A.*
FROM A
WHERE EXISTS (SELECT 1
FROM C
INNER JOIN B
on C.IDB = B.ID
AND B.IDA = A.ID)
Taking what you tried and reviewing it:
select tableA.*
from tableA,
(select distinct tableB.AId as Aid
from tableB left join tableC on tableC.BId = tableB.id
group by tableB.id)
as temp
where tableA.id = temp.Aid
Starting with the "FROM"
You have tableA, (subquery) temp. This is a CROSS JOIN, meaning all records from A are joined to ALL records of (B JOIN C). If you have 1000 records in A and 1000 records in the temp result, you are telling the database engine to generate 1000*1000 records, which the WHERE clause then filters down to those where temp and A match. The engine may be smart enough to avoid the cross join and optimize the query, but I find this form confusing to maintain. So I would rewrite it as:
SELECT tableA.*
FROM tableA
INNER JOIN (SELECT distinct tableB.AId as Aid
FROM tableB left join tableC on tableC.BId = tableB.id
GROUP BY tableB.id) as temp
ON tableA.id = temp.Aid
Looking at the subquery (temp)
We don't need a GROUP BY, as we are not aggregating. The DISTINCT does bring us down to one record per AId, but at a cost to execution time.
So I would re-write as this:
SELECT tableA.*
FROM tableA
INNER JOIN (SELECT distinct tableB.AId as Aid
FROM tableB
LEFT JOIN tableC
on tableC.BId = tableB.id) as temp
ON tableA.id = temp.Aid
Then, looking at the whole: if we change the outer query's join to temp into an EXISTS with a correlated subquery, we avoid the performance hit of both the join and the DISTINCT. I'd also switch the LEFT JOIN to an INNER JOIN, since we only want records that exist in both B and C; with a LEFT JOIN we would also keep B rows with NULLs in place of C, which serve no purpose for us.
This gets me to the answer I initially provided:
SELECT tableA.*
FROM tableA
WHERE EXISTS (SELECT 1
FROM tableB
INNER JOIN tableC
on tableC.BId = tableB.id
AND tableB.AId = tableA.id)
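To see the difference between the join-based and the EXISTS-based forms, here is a minimal sketch, assuming Python's built-in sqlite3 module in place of MySQL. The column value1 and all sample rows are hypothetical. It only compares result rows; it makes no claim about how either engine actually executes the queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INT PRIMARY KEY, value1 TEXT);
    CREATE TABLE tableB (id INT PRIMARY KEY, AId INT);
    CREATE TABLE tableC (id INT PRIMARY KEY, BId INT);
    -- Hypothetical data: A 1 is reachable from C through two B rows,
    -- A 2 has a B row but no C row, A 3 has no B row at all.
    INSERT INTO tableA VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO tableB VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO tableC VALUES (100, 10), (101, 10), (102, 11);
""")

# Plain inner joins repeat A 1 once per matching (B, C) pair.
plain_join = conn.execute("""
    SELECT tableA.id
    FROM tableA
    INNER JOIN tableB ON tableB.AId = tableA.id
    INNER JOIN tableC ON tableC.BId = tableB.id
    ORDER BY tableA.id
""").fetchall()

# DISTINCT removes the repeats after the join has produced them.
distinct_join = conn.execute("""
    SELECT DISTINCT tableA.id
    FROM tableA
    INNER JOIN tableB ON tableB.AId = tableA.id
    INNER JOIN tableC ON tableC.BId = tableB.id
    ORDER BY tableA.id
""").fetchall()

# EXISTS never multiplies rows of A, so no DISTINCT is needed.
exists_rows = conn.execute("""
    SELECT tableA.id
    FROM tableA
    WHERE EXISTS (SELECT 1
                  FROM tableB
                  INNER JOIN tableC ON tableC.BId = tableB.id
                  WHERE tableB.AId = tableA.id)
    ORDER BY tableA.id
""").fetchall()

print(plain_join)     # A 1 repeated three times
print(distinct_join)  # A 1 once
print(exists_rows)    # A 1 once
```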

Selecting Rows from one table based on conditions in two separate tables

I've tried using joins and WHERE ... IN statements for the following, but I either get a timeout because the query takes too long to run, or I get a duplicate column name error.
I have 3 tables:
A, B, C
I'd like to create a table consisting of rows from A based on constraints in B and C. So the row in A has to fulfill a condition in B OR C:
(A.ID = B.ID AND A.PURCHASE = B.PURCHASE) OR (A.ID = C.ID AND A.PURCHASE = C.PURCHASE).
I've been using MySQL for around... a week, and this is the closest I've gotten (it hangs):
CREATE TABLE D
SELECT T1.*
FROM TABLE A AS T1, TABLE B AS T2, TABLE B AS T3
JOIN T2, T3 ON ((T1.CUSTOMER_ID = T2.CUSTOMER_ID AND T1.DAY_ID =
T2.DAY_ID) OR (T1.CUSTOMER_ID = T3.CUSTOMER_ID AND T1.DAY_ID = T2.DAY_ID));
Thanks for any help!

Join together A, B, and C using LEFT JOIN, and then retain records from A where a match occurred in either B or C.
INSERT INTO D
SELECT DISTINCT a.* -- remove duplicate records
FROM tableA a
LEFT JOIN tableB b
ON a.CUSTOMER_ID = b.CUSTOMER_ID AND a.DAY_ID = b.DAY_ID
LEFT JOIN tableC c
ON a.CUSTOMER_ID = c.CUSTOMER_ID AND a.DAY_ID = c.DAY_ID
WHERE b.CUSTOMER_ID IS NOT NULL OR -- retain records where either
c.CUSTOMER_ID IS NOT NULL -- condition matches
If the D table is not already created, then create it using the same definition as for table A.
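The following is a minimal sketch of that approach, assuming Python's built-in sqlite3 module in place of MySQL and hypothetical sample rows. It uses CREATE TABLE ... AS SELECT so that D is created and filled in one step, which is an assumption about how you want to create D rather than part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (CUSTOMER_ID INT, DAY_ID INT);
    CREATE TABLE tableB (CUSTOMER_ID INT, DAY_ID INT);
    CREATE TABLE tableC (CUSTOMER_ID INT, DAY_ID INT);
    -- Hypothetical data: customer 1 matches B (twice), customer 2
    -- matches C, customer 3 matches neither.
    INSERT INTO tableA VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO tableB VALUES (1, 1), (1, 1);
    INSERT INTO tableC VALUES (2, 1);
""")

join_sql = """
    FROM tableA a
    LEFT JOIN tableB b
        ON a.CUSTOMER_ID = b.CUSTOMER_ID AND a.DAY_ID = b.DAY_ID
    LEFT JOIN tableC c
        ON a.CUSTOMER_ID = c.CUSTOMER_ID AND a.DAY_ID = c.DAY_ID
    WHERE b.CUSTOMER_ID IS NOT NULL OR c.CUSTOMER_ID IS NOT NULL
"""

# Without DISTINCT, customer 1 appears once per matching row in B.
rows_without_distinct = conn.execute(
    "SELECT COUNT(*) " + join_sql
).fetchone()[0]

# With DISTINCT, each qualifying row of A is stored once in D.
conn.execute("CREATE TABLE D AS SELECT DISTINCT a.* " + join_sql)
d_rows = conn.execute(
    "SELECT CUSTOMER_ID, DAY_ID FROM D ORDER BY CUSTOMER_ID"
).fetchall()

print(rows_without_distinct)
print(d_rows)
```

Note that DISTINCT also collapses rows that are genuinely duplicated in A itself, which may or may not be what you want.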

using distinct rows for join mysql

Suppose I have table A, B
ID in A is unique but in table B, ID is not unique
I want to SELECT DISTINCT ID
query 1:
SELECT DISTINCT ID FROM A a LEFT JOIN B b ON a.ID = B.ID WHERE ...
query 2:
SELECT DISTINCT ID FROM A WHERE ID IN (SELECT DISTINCT ID FROM B where ...)
or
SELECT DISTINCT ID FROM A a LEFT JOIN (SELECT DISTINCT ID FROM B) b ON a.ID = B.ID WHERE ...
The end result is the same, but:
In query 1 the temp table takes more space, because multiple rows from table B come with a repeated ID.
In query 2 I am able to optimize space and further processing, as it has a limited number of rows, all with distinct IDs.
Is there any way to use DISTINCT rows from table B in a join while avoiding subqueries?
I actually also have a table C that I will join with this, so I need to care about the number of rows taking part in the second join when joining further with table C.

SELECT DISTINCT a.ID FROM A a LEFT JOIN (SELECT DISTINCT ID FROM B) b ON a.ID = b.ID WHERE ...
Is this what you want?
Edit, so the answer is a bit more visible:
Since your A is unique but B isn't, you can just swap the tables:
SELECT DISTINCT b.ID FROM B b LEFT JOIN A a ON a.ID = b.ID WHERE ...

Get records not present in another table with a specific user id

I've this table (a)
And this table (b)
Now I have to get all records from A which are not present in B (a.id not present as b.idDomanda) and where B.idUser is not 1. So In this case, it should return only id 2 from a, but it returns 1 and 2.
This is my Query
SELECT a.* FROM a LEFT JOIN b ON a.id=b.idDomanda WHERE ( b.idUser <> 1 OR b.idUser IS NULL ) GROUP BY a.id

You want to move the condition on b to the on clause:
SELECT a.*
FROM a LEFT JOIN
b
ON a.id = b.idDomanda and b.idUser <> 1
WHERE b.idUser IS NULL
GROUP BY a.id;
The group by suggests that you might want to use not exists instead:
select a.*
from a
where not exists (select 1
from b
where a.id = b.idDomanda and b.idUser <> 1
);

There should be no results given your data set.
The requirement was: all records from A which are not present in B (a.id not present as b.idDomanda).
Given the test data set, all of A is in fact in b.idDomanda, even when filtering out idUser = 1.
But as the previous answer pointed out, that is the query to use for the check.
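Because the original tables are not reproduced in the text, the following sketch uses hypothetical data and only illustrates the mechanics: placing the idUser condition in the WHERE clause and placing it in the ON clause give different results, and the LEFT JOIN ... IS NULL form agrees with NOT EXISTS. It assumes Python's built-in sqlite3 module in place of MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INT);
    CREATE TABLE b (idDomanda INT, idUser INT);
    -- Hypothetical data: a 1 is referenced by user 1, a 2 by user 2,
    -- and a 3 is not referenced at all.
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1, 1), (2, 2);
""")

# Condition in WHERE: applied after the outer join, so it keeps rows
# whose b match has idUser <> 1, plus rows with no b match at all.
where_filter = conn.execute("""
    SELECT a.id
    FROM a LEFT JOIN b ON a.id = b.idDomanda
    WHERE (b.idUser <> 1 OR b.idUser IS NULL)
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

# Condition in ON: decides which b rows count as a match; the WHERE
# clause then keeps only rows of a that found no such match.
on_filter = conn.execute("""
    SELECT a.id
    FROM a LEFT JOIN b ON a.id = b.idDomanda AND b.idUser <> 1
    WHERE b.idUser IS NULL
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

# NOT EXISTS states the same anti-join directly.
not_exists = conn.execute("""
    SELECT a.id
    FROM a
    WHERE NOT EXISTS (SELECT 1
                      FROM b
                      WHERE a.id = b.idDomanda AND b.idUser <> 1)
    ORDER BY a.id
""").fetchall()

print(where_filter)
print(on_filter)
print(not_exists)
```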

MYSQL: evaluating a missing row into a result

I am trying to fetch records from 2 tables mapped by an id where on the second table there may be a row that is missing.
I have a column called name on the second table which contains a string value. The value I need to extract is 'subscriptions' but this does not always exist in the table. There is the possibility to have different values within this column which I do not want to extract.
Is it possible to check to see if the value exists and if it doesn't output null to all the fields.
So far I have this which returns all the records
select COUNT(*)
from PUser a, PAttribute b
where exists (select null
from PAttribute c
where c.name = 'subscriptions' or c.name is null)
and a.id = b.userid;
Hope that explains it.
EDIT
PUser table
id
other columns
PAttribute table
userid mapped to PUser.id
name
Now a userid can have multiple rows, each with a different value in name, e.g. 'subscriptions', 'source', etc.
I want to fetch all users, whether they have a row with the value 'subscriptions' in the name column or have no such row at all.
If they don't have this row, the output should be null.
EDIT 2:
Worked this out and I needed
select COUNT(*),(select b.stringValue from PAttribute b where b.userid = a.id and b.name = 'subscriptions') from PUser a order by a.id desc;

Your example is using implicit joins, which are inner joins. This means that a result will only be returned if a row exists in both tables. Instead, you need to use a left join. Change your query to this:
select COUNT(*)
from PUser a LEFT JOIN PAttribute b ON a.id = b.userID
where exists (select null
from PAttribute c
where c.name = 'subscriptions' or c.name is null);
Or (not exactly sure what your desired behavior is), this might work for you:
SELECT count(*)
FROM PUser a LEFT JOIN PAttribute b ON a.id = b.userID
WHERE b.name = 'subscriptions' OR b.name IS NULL;

If you want to keep only the 'subscriptions' rows, you can move the condition into the JOIN ... ON clause. To keep rows from PUser even when there is no matching row in PAttribute with name set to 'subscriptions' (and thus obtain NULL fields), use a LEFT OUTER JOIN.
select COUNT(*)
from PUser a LEFT OUTER JOIN PAttribute b ON ( a.id = b.userid AND b.name = 'subscriptions' )
;
This is a little different from your query: EXISTS may be less performant and, moreover, the SELECT inside the EXISTS searches for a row in PAttribute whose name is NULL, which is quite different from handling missing rows.
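Here is a minimal sketch of why the position of the name condition matters, assuming Python's built-in sqlite3 module in place of MySQL and hypothetical sample rows. The stringValue column is taken from the query in the question's second edit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PUser (id INT);
    CREATE TABLE PAttribute (userid INT, name TEXT, stringValue TEXT);
    -- Hypothetical data: user 1 has 'subscriptions' and 'source',
    -- user 2 has only 'source', user 3 has no attributes.
    INSERT INTO PUser VALUES (1), (2), (3);
    INSERT INTO PAttribute VALUES
        (1, 'subscriptions', 'weekly'),
        (1, 'source', 'web'),
        (2, 'source', 'ad');
""")

# Condition in ON: every user is kept; users without a
# 'subscriptions' row get NULL for stringValue.
filter_in_on = conn.execute("""
    SELECT a.id, b.stringValue
    FROM PUser a
    LEFT OUTER JOIN PAttribute b
        ON a.id = b.userid AND b.name = 'subscriptions'
    ORDER BY a.id
""").fetchall()

# Condition in WHERE: user 2 matched a 'source' row, so b.name is
# neither 'subscriptions' nor NULL and the user disappears.
filter_in_where = conn.execute("""
    SELECT a.id, b.stringValue
    FROM PUser a
    LEFT OUTER JOIN PAttribute b ON a.id = b.userid
    WHERE b.name = 'subscriptions' OR b.name IS NULL
    ORDER BY a.id
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```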
