How this queries will be evaluated?
(what i ask is: what will be the logic that the db-engine will use to gather the data?)?
A:
SELECT tableA.* FROM tableA
LEFT JOIN tableB ON tableB.key1 = tableA.key1
INNER JOIN tableC ON tableC.key2 = tableB.key2
B:
SELECT tableA.* FROM tableA
INNER JOIN tableC
LEFT JOIN tableB
ON (tableB.key1 = tableA.key1) AND (tableC.key2 = tableB.key2)
C:
What is the syntax, for joining multiple tables? (A and B for example)
D:
What is the logic behind the order of joins?
(How different joins (left, and inner) should be combines in a query?)
ANY-one?
SELECT tableA.* FROM tableA
LEFT JOIN tableB ON tableB.key1 = tableA.key1
INNER JOIN tableC ON tableC.key2 = tableB.key2
Since no brackets are used this should be evaluated left-to-right, so:
SELECT tableA.*
FROM
(tableA LEFT JOIN tableB ON tableB.key1 = tableA.key1)
INNER JOIN tableC ON tableC.key2 = tableB.key2
Meaning you first select all records from table A, with the matching records from B if they exist (outer join). That result set is then joined with C, but since you join on B.Key, all previous records where B = null will now disappear.
I am quite sure that the first join could be an inner join, giving the same result.
SELECT tableA.* FROM tableA
INNER JOIN tableC
LEFT JOIN tableB
ON (tableB.key1 = tableA.key1) AND (tableC.key2 = tableB.key2)
Now we first cross-join every record from A with every record from C (cartesian product).
That (possibly huge and possibly meaningless) resultset we extend with data from B where we can find it (meaning we add a record from B wherever we have a match with either A or B).
In general, when joining several tables, just take it step by step and always try to realize what you are joining with what. When in doubt, use brackets :)
Related
I'm trying to JOIN a Master Dataset, via a left join with 2 other Datasets, all of them have the same Key field. So nothing special there.
One of those secondary Datasets is the result of another Query and therefor might or might not exist. Obviously my JOIN statement fails when this table doesn't exist.
Below a really simplified version of the code, the JOIN is used to exclude rows from the table_a that exist in table b or c (if they exist).
SELECT a.id, a.name
FROM table_a a
LEFT JOIN table_b b
ON a.id = b.id
LEFT JOIN table c c
ON a.id = c.id
WHERE b.id IS NULL
AND c.id IS NULL;
I am not sure that I understand your question well, but I think that you should better do:
SELECT a.id,a.name
FROM table_a a
WHERE a.id NOT IN
(SELECT id FROM table_b)
AND a.id NOT IN
(SELECT id FROM table_c)
Any query optimizer should have the exact same performance with this request, and I find it much more readable.
Imagine the following scenario:
There are 3 tables A, B and C.
Table A has no knowledge of either table B and table C.
Table B has a foreign key to table A.
Table C has foreign key to table B.
In table B as well as in table C there can be multiple items sharing the same foreign key value.
As you can see, the items from C are indirectly referenced to A through B.
What I want is to get all entries from A that are referenced in C but without any information from B or C in my result tables and without duplicates.
Is this even possible?
I have tried this like so but have no idea if it is correct:
select tableA.*
from tableA,
(select distinct tableB.AId as Aid
from tableB left join tableC on tableC.BId = tableB.id
group by tableB.id)
as temp
where tableA.id = temp.Aid
I am not sure if I understand it correctly, but you can try this one:
SELECT DISTINCT `A`.`id`, `A`.`value1`, `A`.`value2` FROM `A`
INNER JOIN `B` ON `B`.`id-a` = `A`.`id`
INNER JOIN `C` ON `C`.`id-b` = `B`.`id`
It returns all values from table A if there is a key on Table C which is linked to Table B with corresponding foreign key on table A
An alternative approach to Masoud's good response would be to use an exists though a correlated subquery.
The below subquery joins B to C in a correlated fashion (notice the B.IDA to A.ID and A is outside the subquery).
If we assume good database design, then A will not have duplicate records, thus we can omit a distinct here since we are not joining A to the other tables. Instead we are simply checking for the existence of an "A" record in the B table which must have a record in the C table due to the inner join. This has two advantages for performance
It doesn't have to join all the records together which would then
necessitate a distinct; thus you don't have the performance hit on
the distinct.
It can early escape. once a key value of A is found in the
subquery (B to C join) , it can stop looking and thus don't have to join all of B to all of A.
We select "1" in the subquery as we don't care what we select as the value will not be used anywhere. We're just using the coloration of A to (B JOIN C) to determine what in A to display.
SELECT A.*
FROM A
WHERE EXISTS( SELECT 1
FROM C
INNER JOIN B
on C.IDB = B.ID)
AND B.IDA = A.ID)
Taking what you tried and reviewing it:
select tableA.*
from tableA,
(select distinct tableB.AId as Aid
from tableB left join tableC on tableC.BId = tableB.id
group by tableB.id)
as temp
where tableA.id = temp.Aid
Starting with the "FROM"
You have tableA, (subquery) temp. This is a CROSS JOIN meaning all records from A will be joined to ALL records of (B JOIN C) so if you have 1000 records in A and 1000 records in the temp result then you'd be telling the database engine to generate 1000*1000 records in your result set; which then gets filtered to only include records matching in temp and A. The engine may be smart enough to avoid the cross join and optimize the query, but I find it confusing to maintain. So I would rewrite as
SELECT tableA.*
FROM tableA
INNER JOIN (SELECT distinct tableB.AId as Aid
FROM tableB left join tableC on tableC.BId = tableB.id
GROUP BY tableB.id) as temp
ON tableA.id = temp.Aid
Looking at the subquery (temp)
We don't need a group by as we are not aggregating. The distinct does bring us down to 1 record but at a cost to execution time.
So I would re-write as this:
SELECT tableA.*
FROM tableA
INNER JOIN (SELECT distinct tableB.AId as Aid
FROM tableB
LEFT JOIN tableC
on tableC.BId = tableB.id) as temp
ON tableA.id = temp.Aid
Then looking at the whole, if we change the outer query join to temp and make it an exists... using coloration we don't have the performance hit of the join, nor the distinct. and I'd switch the left join to an inner as we only want records in C and B so we'd have null in B if we left it as a "LEFT JOIN" which serve no purpose for us.
This gets me to the answer I initially provided.
SELECT tableA.*
FROM tableA
WHERE EXISTS (SELECT 1
FROM tableB
INNER JOIN tableC
on tableC.BId = tableB.id
AND tableB.AID = A.ID) as temp
Very often join fields have the same name in joined tables. If just join
SELECT a.*, b.* FROM a INNER JOIN b ON a.id=b.id
it will produce id field twice.
Is it possible to include ALL fields from joined table EXCEPT joined one?
UPDATE
I am using MySQL but standard way is also interesting to me!
UPDATE 2
Regarding USING syntax, how to use it with multiple joins?
SELECT * FROM
a INNER JOIN b USING (b_id)
INNER JOIN c USING (c_id)
swears table b doesn't contain c_id field, which is true, since it is inside a.
Normally I would write
SELECT * FROM
a INNER JOIN b ON a.b_id = b.b_id
INNER JOIN c ON a.c_id = c.c_id
In standard SQL this is achieved through USING
select *
from a
join b using (id);
This will return the id column only once.
I wanted to have a full outer join in memsql. Something like
SELECT *
FROM A FULL OUTER JOIN B
ON A.id = B.id
Is it possible ?
It appears that MemSQL does not have a FULL OUTER JOIN syntax. However, you should be able to simulate a full outer join in MemSQL using a combination of LEFT and RIGHT OUTER JOIN operations:
SELECT * FROM A
LEFT OUTER JOIN B ON A.id = B.id
UNION ALL
SELECT * FROM A
RIGHT OUTER JOIN B on A.id = B.id
WHERE ISNULL(A.id)
The first SELECT covers the orange area, namely matching records between A and B along with records in A which do not match to anything in B. The second query obtains only records in B which do not match to anything in A. Using UNION ALL instead of UNION ensures that duplicates are not removed.
All,
I'm reviewing some basic MySQL JOIN tutorials. Jeff Atwood gives an example on his blog that goes like this:
SELECT * FROM TableA
LEFT OUTER JOIN TableB
ON TableA.name = TableB.name
It seems to me that the query doesn't need - from a semantic standpoint - the TableB mention on the second line. Do I understand that correctly? It seems to me that all the info is already available in the 3rd line.
I'm not trying to stir any type of trouble about the efficiency of the SQL language, I just want to understand whether this mention of TableB brings any new info, in this context or in others.
It doesn't bring new info, but it does bring precision and the ability for you to alias the table:
SELECT a.* FROM TableA a
LEFT OUTER JOIN TableBWhichHasAReallyLongUnweildyName b
ON a.name = b.name
Note that in this formulation I can both explicitly ask for the data from a and figure out the join more easily in line 3. However, the compiler wouldn't have non-arbitrary guidance as to where to get that information of what table to use unless I explicitly declare it. Consider if this was the way SQL worked:
SELECT b.* FROM TableA a
LEFT OUTER JOIN
ON TableA.name = TableB.name b
WHERE TableB.value v > 1
Am I saying that the tableB is aliased 'b', or TableB.name is aliased 'b'? It's just confusing; better to be explicit and authoritative.
TableB is only present on the third line because the columns you're joining on aren't uniquely named.
For example, say you wanted to join TableA and TableB on columns Foo and Bar, where only TableA has a ciolumn called Foo and only TableB has a column named Bar. Then you could write this:
SELECT * FROM TableA
LEFT OUTER JOIN TableB
ON Foo = Bar