According to a Google search, MySQL does not support FULL OUTER JOIN, but it can be simulated with UNION or UNION ALL. However, each of these either removes genuine duplicates or shows spurious duplicates.
What would be a correct and efficient way?
This question seems relevant, but I couldn't get an answer from it.
You can use a LEFT JOIN and a RIGHT JOIN:
SELECT * FROM tableA LEFT JOIN tableB ON tableA.b_id = tableB.id
UNION ALL
SELECT * FROM tableA RIGHT JOIN tableB ON tableA.b_id = tableB.id
WHERE tableA.b_id IS NULL
There is also some information on Wikipedia about this topic: Full outer join.
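The Wikipedia article suggests using UNION in MySQL. This is slightly slower than UNION ALL, but more importantly it won't always give the correct result: it removes duplicated rows from the output, including genuine duplicates. So prefer UNION ALL over UNION here. UNION ALL is safe because the WHERE tableA.b_id IS NULL filter keeps only the tableB rows that have no match in tableA, so no row can appear in both halves.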
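The pattern can be checked with a small sketch. It uses Python's built-in sqlite3 module instead of MySQL, so treat it as an illustration of the SQL logic rather than a MySQL test; the table contents and the name/val columns are hypothetical. tableA deliberately contains two identical rows, which a real FULL OUTER JOIN keeps. The RIGHT JOIN half is written as a LEFT JOIN with the tables swapped, which is equivalent and also runs on SQLite versions older than 3.39, where RIGHT JOIN is not available.

```python
import sqlite3

# Hypothetical sample data: tableA holds two genuinely identical rows,
# which a correct FULL OUTER JOIN must keep.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (name TEXT, b_id INTEGER);
    CREATE TABLE tableB (id INTEGER, val TEXT);
    INSERT INTO tableA VALUES ('x', 10), ('x', 10), ('y', 99);
    INSERT INTO tableB VALUES (10, 'ten'), (20, 'twenty');
""")

# Every tableA row, matched or not.
LEFT_PART = """
    SELECT tableA.name, tableA.b_id, tableB.id, tableB.val
    FROM tableA LEFT JOIN tableB ON tableA.b_id = tableB.id
"""
# Only tableB rows with no partner in tableA (an anti-join).
# Same as the answer's RIGHT JOIN half, with the tables swapped.
ANTI_PART = """
    SELECT tableA.name, tableA.b_id, tableB.id, tableB.val
    FROM tableB LEFT JOIN tableA ON tableA.b_id = tableB.id
    WHERE tableA.b_id IS NULL
"""

full_outer = conn.execute(LEFT_PART + " UNION ALL " + ANTI_PART).fetchall()
with_union = conn.execute(LEFT_PART + " UNION " + ANTI_PART).fetchall()

for row in full_outer:
    print(row)
print(len(full_outer), "rows with UNION ALL;", len(with_union), "rows with UNION")
```

With this data the UNION ALL version returns 4 rows, while UNION collapses the two identical ('x', 10, 10, 'ten') rows into one and returns 3.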
I have the following two queries and noticed a huge performance difference between them:
Query1
SELECT count(distinct b.id) FROM tableA as a
LEFT JOIN tableB as b on a.id = b.aId
GROUP BY a.id
Query2
SELECT count(distinct b.id) FROM tableA as a
LEFT JOIN (SELECT * FROM tableB) as b on a.id = b.aId
GROUP BY a.id
The queries basically join one table to another, and I noticed that Query1 takes about 80 ms whereas Query2 takes about 2 s with thousands of rows in my system. Could anyone explain why this happens? Is it a wise choice to use the Query2 style whenever I am forced to use it, or is there a better way to do the same thing than Query2?
When you replace tableB with (SELECT * FROM tableB), you can force the query engine to materialize a subquery, that is, an intermediate table result. In that case the second query isn't actually joining directly to tableB; it is joining to an intermediate table, so any indexes that might have existed on tableB to make the query faster are not available. (This is how MySQL behaves before 5.7; newer versions can often merge a simple derived table like this one into the outer query, so the gap may be smaller there.) Based on your current example, I see no reason to use the second version.
Under certain conditions you might be forced to use the second version though. For example, if you needed to transform tableB in some way, you might need a subquery to do that.
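Whether the derived table in Query2 is actually materialized depends on the engine and version, so the following sketch only checks the engine-independent part: that rewriting Query2 as Query1 leaves the result unchanged. It uses Python's sqlite3 module with a hypothetical schema and data. In MySQL itself, EXPLAIN is the place to look; a materialized subquery shows up there as a DERIVED row.

```python
import sqlite3

# Hypothetical schema mirroring the question: tableB.aId points at tableA.id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, aId INTEGER);
    CREATE INDEX idx_tableB_aId ON tableB (aId);
    INSERT INTO tableA VALUES (1), (2), (3);
    INSERT INTO tableB VALUES (100, 1), (101, 1), (102, 2);
""")

# a.id is added to the SELECT list and ORDER BY only so the outputs line up.
QUERY1 = """
    SELECT a.id, count(DISTINCT b.id) FROM tableA AS a
    LEFT JOIN tableB AS b ON a.id = b.aId
    GROUP BY a.id ORDER BY a.id
"""
# Same query, but joined to a derived table instead of tableB itself.
QUERY2 = """
    SELECT a.id, count(DISTINCT b.id) FROM tableA AS a
    LEFT JOIN (SELECT * FROM tableB) AS b ON a.id = b.aId
    GROUP BY a.id ORDER BY a.id
"""

result1 = conn.execute(QUERY1).fetchall()
result2 = conn.execute(QUERY2).fetchall()
print(result1)
print(result2)
```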
Consider two tables, tableA and tableB:
tableA
|id|driver_id|vehicle_id|is_allowed|license_number|driver_name|
tableB
|id|driver_id|vehicle_id|offence|payable_amount|driver_name|
Goal: find the driver_id and vehicle_id of the allowed driver whose name is XYZ.
Query1: SELECT * FROM tableA, tableB {join-condition} {filter-condition}
SELECT tableA.driver_id,tableA.vehicle_id FROM tableA,tableB
WHERE
tableA.driver_id=tableB.driver_id AND
tableA.vehicle_id=tableB.vehicle_id AND
tableA.driver_name='XYZ' AND
tableB.driver_name='XYZ' AND
tableA.is_allowed = 1
Query2: SELECT * FROM (SELECT * FROM tableA {filter-condition}) JOIN (SELECT * FROM tableB {filter-condition}) ON {join-condition} {filter-condition}
SELECT tableAA.driver_id,tableAA.vehicle_id FROM
(SELECT tableA.driver_id,tableA.vehicle_id from tableA WHERE tableA.driver_name='XYZ' AND
tableA.is_allowed = 1) as tableAA
JOIN
(SELECT tableB.driver_id,tableB.vehicle_id from tableB WHERE tableB.driver_name='XYZ') as tableBB
ON
tableAA.driver_id=tableBB.driver_id AND
tableAA.vehicle_id=tableBB.vehicle_id
Which type of query is more readable, better optimized, and standard-compliant?
A correct version would look like this:
SELECT a.driver_id, a.vehicle_id
FROM tableA a JOIN
tableB b
ON a.driver_id = b.driver_id AND
a.vehicle_id = b.vehicle_id
WHERE a.driver_name = 'XYZ' AND
b.driver_name = 'XYZ' AND
a.is_allowed = 1;
Notes:
JOIN is accepted as the right way to combine tables in the FROM clause. Simple rule: Never use commas in the FROM clause.
The ON clause should contain all predicates that contain columns from more than one table.
The use of table aliases is a preference that makes queries easier to write and to read.
You might want to use IN or EXISTS, because your query is not returning columns from TableB.
Do not use unnecessary subqueries in the FROM clause. In some databases (notably MySQL), this impedes the use of indexes and adds additional overhead for materialization of the intermediate table.
So, to answer your question: the first version is probably the more optimized one (because it does not materialize subqueries unnecessarily), but neither version is preferred; write the query with an explicit JOIN as shown above.
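The note about IN or EXISTS matters when tableB can hold several rows for the same driver and vehicle (for example, several offences): the JOIN then returns the tableA row once per match. The sketch below, using Python's sqlite3 module and hypothetical rows, compares the answer's JOIN with an EXISTS rewrite of the same conditions.

```python
import sqlite3

# Hypothetical rows; column names follow the question's tableA/tableB layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, driver_id INTEGER, vehicle_id INTEGER,
                         is_allowed INTEGER, license_number TEXT, driver_name TEXT);
    CREATE TABLE tableB (id INTEGER, driver_id INTEGER, vehicle_id INTEGER,
                         offence TEXT, payable_amount REAL, driver_name TEXT);
    INSERT INTO tableA VALUES (1, 7, 70, 1, 'L-1', 'XYZ'),
                              (2, 8, 80, 0, 'L-2', 'XYZ');
    -- driver 7 with vehicle 70 has two offences recorded in tableB
    INSERT INTO tableB VALUES (1, 7, 70, 'speeding', 50.0, 'XYZ'),
                              (2, 7, 70, 'parking', 20.0, 'XYZ'),
                              (3, 8, 80, 'speeding', 50.0, 'XYZ');
""")

# The JOIN version from the answer: one output row per matching tableB row.
JOIN_SQL = """
    SELECT a.driver_id, a.vehicle_id
    FROM tableA a JOIN
         tableB b
         ON a.driver_id = b.driver_id AND
            a.vehicle_id = b.vehicle_id
    WHERE a.driver_name = 'XYZ' AND
          b.driver_name = 'XYZ' AND
          a.is_allowed = 1
"""
# EXISTS only asks whether a matching tableB row exists,
# so each qualifying tableA row is returned once.
EXISTS_SQL = """
    SELECT a.driver_id, a.vehicle_id
    FROM tableA a
    WHERE a.driver_name = 'XYZ' AND
          a.is_allowed = 1 AND
          EXISTS (SELECT 1
                  FROM tableB b
                  WHERE b.driver_id = a.driver_id AND
                        b.vehicle_id = a.vehicle_id AND
                        b.driver_name = 'XYZ')
"""

join_rows = conn.execute(JOIN_SQL).fetchall()
exists_rows = conn.execute(EXISTS_SQL).fetchall()
print(join_rows)
print(exists_rows)
```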
The first one is better in terms of both standards and performance, but the comma-join syntax is very old-fashioned, so it can be written this way:
SELECT tableA.driver_id,tableA.vehicle_id
FROM tableA
INNER JOIN tableB ON tableA.driver_id=tableB.driver_id
AND tableA.vehicle_id=tableB.vehicle_id
AND tableA.driver_name='XYZ'
AND tableB.driver_name='XYZ'
AND tableA.is_allowed = 1
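For an INNER JOIN, moving the filter conditions from WHERE into ON does not change the result, so this version and a JOIN ... WHERE version are equivalent. A quick check with Python's sqlite3 module and hypothetical rows:

```python
import sqlite3

# Hypothetical rows: only driver 7 is allowed and named XYZ in both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (driver_id INTEGER, vehicle_id INTEGER,
                         is_allowed INTEGER, driver_name TEXT);
    CREATE TABLE tableB (driver_id INTEGER, vehicle_id INTEGER,
                         offence TEXT, driver_name TEXT);
    INSERT INTO tableA VALUES (7, 70, 1, 'XYZ'), (8, 80, 0, 'XYZ'), (9, 90, 1, 'ABC');
    INSERT INTO tableB VALUES (7, 70, 'speeding', 'XYZ'), (8, 80, 'parking', 'XYZ'),
                              (9, 90, 'speeding', 'ABC');
""")

# All conditions in ON, as in this answer.
FILTERS_IN_ON = """
    SELECT tableA.driver_id, tableA.vehicle_id
    FROM tableA
    INNER JOIN tableB ON tableA.driver_id = tableB.driver_id
        AND tableA.vehicle_id = tableB.vehicle_id
        AND tableA.driver_name = 'XYZ'
        AND tableB.driver_name = 'XYZ'
        AND tableA.is_allowed = 1
    ORDER BY 1, 2
"""
# Join conditions in ON, filters in WHERE.
FILTERS_IN_WHERE = """
    SELECT a.driver_id, a.vehicle_id
    FROM tableA a
    JOIN tableB b ON a.driver_id = b.driver_id AND a.vehicle_id = b.vehicle_id
    WHERE a.driver_name = 'XYZ' AND b.driver_name = 'XYZ' AND a.is_allowed = 1
    ORDER BY 1, 2
"""

on_rows = conn.execute(FILTERS_IN_ON).fetchall()
where_rows = conn.execute(FILTERS_IN_WHERE).fetchall()
print(on_rows, where_rows)
```

This equivalence holds only for inner joins. With a LEFT JOIN, conditions in ON only decide which tableB rows match and never remove tableA rows, whereas the same conditions in WHERE do.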
I'm working on a query and came up with a solution like this
SELECT a.c1, a.c2, b.c3, b.c4
FROM table4 AS a
LEFT JOIN ((
SELECT c1, c2 FROM table1 LEFT JOIN table2 ... something
) UNION ALL (
SELECT c3, c4 FROM table1 LEFT JOIN table3 ... something
)) AS b
ON a.key = b.key
WHERE a.key ... something
ORDER BY a.key
This works exactly as I want and doesn't take much execution time. But it feels a little weird to me to put the result of a UNION in another SELECT and JOIN it with another table in a query.
So my question is: is it good practice to use a query like this, or should I find a better solution?
This query is rather complex, but whether this complexity is needed for what you want to achieve is not the question here. We cannot answer that either, without knowing your table structures, real query, and desired results.
But as to the technical side: your query is perfectly okay. If this is the most straightforward way to get at the desired data, there is nothing against it. There is no rule that you shouldn't outer join a union result or the like. You just put your data together step by step, and if you end up with this query, then that's it.
As to UNION vs. UNION ALL, as mentioned by Gordon Linoff: it often happens that someone writes UNION instead of UNION ALL, unintentionally giving the DBMS more work than necessary. In your case, however, the choice may change the result, so you must consider whether duplicates are possible in your UNION query and whether you want them removed or not.
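To see the UNION vs. UNION ALL point in a query shaped like the question's, here is a sketch using Python's sqlite3 module. The tables, columns, and data are hypothetical stand-ins for the elided parts; table2 and table3 each contribute the same (1, 'p') row, so the two operators give different results.

```python
import sqlite3

# Hypothetical stand-ins for the elided "... something" parts of the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table4 (id INTEGER, label TEXT);
    CREATE TABLE table2 (id INTEGER, val TEXT);
    CREATE TABLE table3 (id INTEGER, val TEXT);
    INSERT INTO table4 VALUES (1, 'one'), (2, 'two');
    INSERT INTO table2 VALUES (1, 'p');
    INSERT INTO table3 VALUES (1, 'p'), (2, 'q');
""")

# The derived table b is a compound SELECT. SQLite does not accept the
# parenthesized members that MySQL allows, so they are written without them.
TEMPLATE = """
    SELECT a.id, a.label, b.val
    FROM table4 AS a
    LEFT JOIN (
        SELECT id, val FROM table2
        {op}
        SELECT id, val FROM table3
    ) AS b ON a.id = b.id
    ORDER BY a.id, b.val
"""

all_rows = conn.execute(TEMPLATE.format(op="UNION ALL")).fetchall()
union_rows = conn.execute(TEMPLATE.format(op="UNION")).fetchall()
print(all_rows)
print(union_rows)
```

UNION ALL keeps both (1, 'p') rows, so table4's row 1 appears twice; UNION removes the duplicate before the join.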
Try this; in my opinion it should be better:
SELECT a.c1,a.c2,b.c3,c.c4
FROM table1 as a
LEFT JOIN table2 as b on ...
LEFT JOIN table3 as c on ...
WHERE something
ORDER BY a.key
I'm working with PL/SQL Developer for the very first time, and I found this kind of query:
select * from tableA, tableB
where tableA.field1 = tableB.field1(+)
I'm wondering what the (+) in the query does. Could you please explain it?
where tableA.field1 = tableB.field1(+)
This is the old syntax for an outer join, adopted by Oracle, and made redundant when ANSI actually standardised the SQL language. Oracle themselves now suggest you use outer join in preference to this old syntax (from the link below):
Oracle recommends that you use the FROM clause OUTER JOIN syntax rather than the Oracle join operator.
See this entry in the Oracle docs for more detail.
This is Oracle SQL outer join syntax.
It can be interpreted as
select * from tableA
left outer join tableB on tableA.field1 = tableB.field1
From the oracle documentation:
(+) Indicates that the preceding column is the outer join column in a join.
It can be written as
select * from tableA left outer join tableB on tableA.field1 = tableB.field1
The (+) operator marks tableB as the optional side: all rows from tableA (the left table) are returned, matching and non-matching,
and matching rows are returned from tableB.
If a tableA row has no matching row in tableB, the tableB columns are returned as NULL.
(+) is used to retrieve both matched and unmatched records from the tables.
Example:
table A and table B
if you write A.column1 = B.column1(+),
it also retrieves the unmatched records from table A; this is called a left outer join.
That's Oracle specific notation for a LEFT OUTER JOIN
Example:
select ...
from a,b
where a.id=b.id(+)
The query would be re-written
SELECT ...
FROM a
LEFT JOIN b ON b.id = a.id
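SQLite does not accept Oracle's (+) operator, so the sketch below (Python's sqlite3 module, hypothetical tables and data) runs only the ANSI rewrite and keeps the Oracle form in a comment. It shows that the unmatched row from a comes back with NULL (None in Python) for b's column.

```python
import sqlite3

# Hypothetical tables a and b from the example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, info TEXT);
    INSERT INTO a VALUES (1, 'x'), (2, 'y');
    INSERT INTO b VALUES (1, 'match');
""")

# Oracle-only form (not valid in SQLite):
#   SELECT a.id, a.name, b.info FROM a, b WHERE a.id = b.id(+)
# The (+) sits on b's column, so b is the optional side and every row of a is kept.
ANSI_SQL = """
    SELECT a.id, a.name, b.info
    FROM a
    LEFT JOIN b ON b.id = a.id
    ORDER BY a.id
"""
rows = conn.execute(ANSI_SQL).fetchall()
for row in rows:
    print(row)
```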
We have been searching for this, but all we find are examples that join 2 tables with left and right inner/outer joins.
I love you guys.
MySQL doesn't support FULL OUTER JOIN.
As you mention, you can simulate a FULL OUTER JOIN of two tables using a combination of LEFT and RIGHT OUTER joins.
SELECT * FROM tableA LEFT JOIN tableB ON tableA.b_id = tableB.id
UNION ALL
SELECT * FROM tableA RIGHT JOIN tableB ON tableA.b_id = tableB.id
WHERE tableA.b_id IS NULL
The same technique can in theory be extended to more than two tables. I'd suggest first using the above approach to join two of the tables as a view. Then use the same approach again to join the view to the third table.
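The two-step idea can be sketched with Python's sqlite3 module. The tables t1, t2, t3, their shared key column k, and the data are all hypothetical, and RIGHT JOIN is written as a swapped LEFT JOIN so the code also runs on SQLite versions older than 3.39. Because a row of the first view may come from either side, its key is taken with COALESCE before joining the third table.

```python
import sqlite3

# Hypothetical tables sharing a key column k.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (k INTEGER, v1 TEXT);
    CREATE TABLE t2 (k INTEGER, v2 TEXT);
    CREATE TABLE t3 (k INTEGER, v3 TEXT);
    INSERT INTO t1 VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO t2 VALUES (2, 'b2'), (3, 'b3');
    INSERT INTO t3 VALUES (3, 'c3'), (4, 'c4');

    -- Step 1: t1 FULL OUTER JOIN t2, emulated and saved as a view.
    CREATE VIEW t12 AS
        SELECT t1.k AS k1, t1.v1, t2.k AS k2, t2.v2
        FROM t1 LEFT JOIN t2 ON t1.k = t2.k
        UNION ALL
        SELECT t1.k, t1.v1, t2.k, t2.v2
        FROM t2 LEFT JOIN t1 ON t1.k = t2.k
        WHERE t1.k IS NULL;

    -- Step 2: the same pattern, joining the view to t3.
    -- A view row's key is whichever side was present.
    CREATE VIEW t123 AS
        SELECT COALESCE(t12.k1, t12.k2) AS k, t12.v1, t12.v2, t3.v3
        FROM t12 LEFT JOIN t3 ON COALESCE(t12.k1, t12.k2) = t3.k
        UNION ALL
        SELECT t3.k, NULL, NULL, t3.v3
        FROM t3 LEFT JOIN t12 ON COALESCE(t12.k1, t12.k2) = t3.k
        WHERE t12.k1 IS NULL AND t12.k2 IS NULL;
""")

rows = conn.execute("SELECT k, v1, v2, v3 FROM t123 ORDER BY k").fetchall()
for row in rows:
    print(row)
```

Each additional table repeats step 2 against the previous view.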
I don't know what to say about the love part, but
Having tables named a and b:
SELECT a.*, b.* FROM a, b
Does this do the trick?