Can I specify the order in which the joins occur? - mysql

Say I have three tables, A, B and C. Conceptually A (optionally) has one B, and B (always) has one C.
Table A:
a_id
... other stuff
Table B:
a_fk_id (foreign key to A.a_id, unique, primary, not null)
c_fk_id (foreign key to C.c_id, not null)
... other stuff
Table C:
c_id
... other stuff
I want to select all records from A, along with their associated records from B and C if present. However, the B and C data must appear in the result only if both B and C are present.
I feel like I want to do:
SELECT *
FROM
A
LEFT JOIN B on A.a_id=B.a_fk_id
INNER JOIN C on B.c_fk_id=C.c_id
But joins seem to be left-associative (the first join happens before the second join), so this will not return records from A that don't have an entry in B (and therefore none in C).
AFAICT I must use sub queries, something along the lines of:
SELECT *
FROM
A
LEFT JOIN (
SELECT * FROM B INNER JOIN C ON B.c_fk_id=C.c_id
) as tmp ON A.a_id = tmp.a_fk_id
but once I have a couple of such relationships in a query (in reality I may have two or three nested), I'm worried both about code complexity and about the query optimizer.
Is there a way for me to specify the join order, other than this subquery method?
Thanks.
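To make the difference between the two queries in the question concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the SQL used here is plain enough to behave the same way in both), and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (a_id INTEGER PRIMARY KEY);
    CREATE TABLE C (c_id INTEGER PRIMARY KEY);
    CREATE TABLE B (a_fk_id INTEGER PRIMARY KEY, c_fk_id INTEGER NOT NULL);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO C VALUES (10);
    INSERT INTO B VALUES (1, 10);  -- only A row 1 has a B (and therefore a C)
""")

# LEFT JOIN followed by INNER JOIN: the inner join discards the NULL-extended row.
chained = conn.execute("""
    SELECT A.a_id, B.c_fk_id, C.c_id
    FROM A
    LEFT JOIN B ON A.a_id = B.a_fk_id
    INNER JOIN C ON B.c_fk_id = C.c_id
    ORDER BY A.a_id
""").fetchall()

# LEFT JOIN onto a derived table that already contains B INNER JOIN C.
derived = conn.execute("""
    SELECT A.a_id, tmp.c_fk_id, tmp.c_id
    FROM A
    LEFT JOIN (
        SELECT B.a_fk_id, B.c_fk_id, C.c_id
        FROM B INNER JOIN C ON B.c_fk_id = C.c_id
    ) AS tmp ON A.a_id = tmp.a_fk_id
    ORDER BY A.a_id
""").fetchall()

print(chained)  # [(1, 10, 10)]
print(derived)  # [(1, 10, 10), (2, None, None)]
```

The chained form loses A row 2, while the derived-table form keeps it with NULLs for the B and C columns.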

In SQL Server you can do
SELECT *
FROM a
LEFT JOIN b
INNER JOIN c
ON b.c_fk_id = c.c_id
ON a.a_id = b.a_fk_id
The position of the ON clause means that the LEFT JOIN on b logically happens last. As far as I know this is standard (claimed to be ANSI prescribed here) but I'm sure the downvotes will notify me if it doesn't work in MySQL!

Edit: And that's what I get for talking faster than I think. My previous solution doesn't work because 'c' hasn't been joined yet. Let's try this again.
We can use a WHERE clause to keep only the rows that meet your criteria: either C has a value (IS NOT NULL) or B has no value (IS NULL). Like this:
SELECT *
FROM a
LEFT JOIN b ON (b.a = a.a)
LEFT JOIN c ON (c.b = b.b)
WHERE (c.c IS NOT NULL OR b.b IS NULL);
Without WHERE Results:
mysql> SELECT * FROM a LEFT JOIN b ON (b.a = a.a) LEFT JOIN c ON (c.b = b.b);
+------+------+------+------+------+
| a | a | b | c | b |
+------+------+------+------+------+
| 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 2 | NULL | NULL |
| 2 | 2 | 3 | 2 | 3 |
| 3 | NULL | NULL | NULL | NULL |
| 4 | NULL | NULL | NULL | NULL |
+------+------+------+------+------+
With WHERE Results:
mysql> SELECT * FROM a LEFT JOIN b ON (b.a = a.a) LEFT JOIN c ON (c.b = b.b) WHERE (c.c IS NOT NULL OR b.b IS NULL);
+------+------+------+------+------+
| a | a | b | c | b |
+------+------+------+------+------+
| 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 3 | 2 | 3 |
| 3 | NULL | NULL | NULL | NULL |
| 4 | NULL | NULL | NULL | NULL |
+------+------+------+------+------+

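Yes, MySQL has STRAIGHT_JOIN for this.
With this keyword, the tables are read in exactly the order in which you list them: the left table is always read before the right one. Note that STRAIGHT_JOIN behaves like an inner join, so it fixes the optimizer's read order rather than changing how an outer join nests.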
See: http://dev.mysql.com/doc/refman/5.5/en/join.html

Well, I thought up another solution as well, and I'm posting it for completeness (Though I'm actually using Martin's answer).
Use a RIGHT JOIN:
SELECT
*
FROM
b
INNER JOIN c ON b.c_fk_id = c.c_id
RIGHT JOIN a ON a.a_id = b.a_fk_id
I'm pretty sure every piece I've read about JOINS said that RIGHT JOINs were pointless, but there you are.

Related

Can I be selective on what rows I join on in MySQL

Suppose I have two tables, people and emails. emails has a person_id, an address, and an is_primary:
people:
id
emails:
person_id
address
is_primary
To get all email addresses per person, I can do a simple join:
select * from people join emails on people.id = emails.person_id
What if I only want (at most) one row from the right table for each row in the left table? And, if a particular person has multiple emails and one is marked as is_primary, is there a way to prefer which row to use when joining?
So, if I have
people:
------
| id |
------
| 1 |
| 2 |
| 3 |
| 4 |
------
emails:
-----------------------------------------
| id | person_id | address | is_primary |
-----------------------------------------
| 1 | 1 | a#b.c | true |
| 2 | 1 | b#b.c | false |
| 3 | 2 | c#b.c | true |
| 4 | 4 | d#b.c | false |
-----------------------------------------
is there a way to get this result:
------------------------------------------------
| people.id | emails.id | address | is_primary |
------------------------------------------------
| 1 | 1 | a#b.c | true | // chosen over b#b.c because it's primary
| 2 | 3 | c#b.c | true |
| 3 | null | null | null | // no email for person 3
| 4 | 4 | d#b.c | false | // no primary email for person 4
------------------------------------------------
You have a slight misunderstanding of how left/right joins work.
This join
select * from people join emails on people.id = emails.person_id
will get you every column from both tables for all records that match your ON condition.
The left join
select * from people left join emails on people.id = emails.person_id
will give you every record from people, regardless of whether there's a corresponding record in emails. When there isn't, the columns from the emails table will just be NULL.
If a person has multiple emails, the result will contain multiple records for this person. Beginners often wonder why the data is duplicated.
If you want to restrict the data to the rows where is_primary has the value 1, you can do so in the WHERE clause when you're doing an inner join (your first query, although you omitted the inner keyword).
When you have a left/right join query, you have to put this filter in the ON clause. If you put it in the WHERE clause, you would implicitly turn the left/right join into an inner join, because the WHERE clause would filter out the NULL rows that I mentioned above. Or you could write the query like this:
select * from people left join emails on people.id = emails.person_id
where (emails.is_primary = 1 or emails.is_primary is null)
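As a rough illustration of where the filter goes, the sketch below runs the three variants against the sample data from the question. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption), with true/false stored as 1/0.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY);
    CREATE TABLE emails (id INTEGER PRIMARY KEY, person_id INTEGER,
                         address TEXT, is_primary INTEGER);
    INSERT INTO people VALUES (1), (2), (3), (4);
    INSERT INTO emails VALUES
        (1, 1, 'a#b.c', 1),
        (2, 1, 'b#b.c', 0),
        (3, 2, 'c#b.c', 1),
        (4, 4, 'd#b.c', 0);
""")

base = "SELECT p.id, e.address FROM people p LEFT JOIN emails e ON p.id = e.person_id"

# Filter in the ON clause: every person survives, unmatched ones get NULL.
in_on = conn.execute(base + " AND e.is_primary = 1 ORDER BY p.id").fetchall()

# Filter in the WHERE clause: the NULL-extended rows are filtered out.
in_where = conn.execute(base + " WHERE e.is_primary = 1 ORDER BY p.id").fetchall()

# WHERE with an explicit IS NULL escape hatch.
or_null = conn.execute(
    base + " WHERE (e.is_primary = 1 OR e.is_primary IS NULL) ORDER BY p.id"
).fetchall()

print(in_on)     # [(1, 'a#b.c'), (2, 'c#b.c'), (3, None), (4, None)]
print(in_where)  # [(1, 'a#b.c'), (2, 'c#b.c')]
print(or_null)   # [(1, 'a#b.c'), (2, 'c#b.c'), (3, None)]
```

Note that on this data the OR ... IS NULL variant is not equivalent to the ON-clause filter: person 4, who has only a non-primary email, disappears from the result.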
EDIT after clarification:
Paul Spiegel's answer is good, therefore my upvote, but I'm not sure if it performs well, since it has a dependent subquery. So I created this query. It may depend on your data though. Try both answers.
select
p.*,
coalesce(e1.address, e2.address) AS address
from people p
left join emails e1 on p.id = e1.person_id and e1.is_primary = 1
left join (
select person_id, address
from emails e
where id = (select min(id) from emails where emails.is_primary = 0 and emails.person_id = e.person_id)
) e2 on p.id = e2.person_id
Use a correlated subquery with LIMIT 1 in the ON clause of the LEFT JOIN:
select *
from people p
left join emails e
on e.person_id = p.id
and e.id = (
select e1.id
from emails e1
where e1.person_id = e.person_id
order by e1.is_primary desc, -- true first
e1.id -- If e1.is_primary is ambiguous
limit 1
)
order by p.id
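The query above can be tried against the sample data from the question. This is only a sketch: it runs on SQLite through Python's sqlite3 module rather than on MySQL (an assumption), and it stores true/false as 1/0.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY);
    CREATE TABLE emails (id INTEGER PRIMARY KEY, person_id INTEGER,
                         address TEXT, is_primary INTEGER);
    INSERT INTO people VALUES (1), (2), (3), (4);
    INSERT INTO emails VALUES
        (1, 1, 'a#b.c', 1),
        (2, 1, 'b#b.c', 0),
        (3, 2, 'c#b.c', 1),
        (4, 4, 'd#b.c', 0);
""")

# For each person, join only the single "best" email:
# primary first, lowest id as the tie-breaker.
rows = conn.execute("""
    SELECT p.id, e.id, e.address, e.is_primary
    FROM people p
    LEFT JOIN emails e
      ON e.person_id = p.id
     AND e.id = (
         SELECT e1.id
         FROM emails e1
         WHERE e1.person_id = e.person_id
         ORDER BY e1.is_primary DESC, e1.id
         LIMIT 1
     )
    ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
# (1, 1, 'a#b.c', 1)
# (2, 3, 'c#b.c', 1)
# (3, None, None, None)
# (4, 4, 'd#b.c', 0)
```

Each person appears exactly once, which matches the result table requested in the question.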

Fastest way of doing a 1 to 1 left join on tables with a 1 to many relationship (MySQL)

I have two tables that have a 1 to many relationship, which I'm doing a 1:1 left join on. The query returns the correct results but it shows up in my slow query log (it takes up to 5s). Is there a better way to write this query?
select * from
tablea a left join tableb b
on a.tablea_id = b.tablea_id
and b.tableb_id = (select max(tableb_id) from tableb b2 where b2.tablea_id = a.tablea_id)
i.e. I would like TableA left joined to the row in TableB with the largest tableb_id.
TableA
tablea_id
1
2
TableB
tableb_id, tablea_id, data
1, 1, x
2, 1, y
Expected Result
tablea_id, tableb_id, data
1, 2, y
2, null, null
TableA has an index on tablea_id and TableB has a composite index on tablea_id,tableb_id.
Explain Output
+----+--------------------+---------------+--------+-----------------+---------------+---------+----------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------------+--------+-----------------+---------------+---------+----------------------+-------+-------------+
| 1 | PRIMARY | c | index | NULL | department_id | 4 | NULL | 18966 | Using index |
| 1 | PRIMARY | recent_cv_lut | eq_ref | PRIMARY,case_id | PRIMARY | 4 | func | 1 | |
| 2 | DEPENDENT SUBQUERY | cases_visits | ref | case_id | case_id | 4 | abcd_records_v2.c.id | 2 | Using index |
+----+--------------------+---------------+--------+-----------------+---------------+---------+----------------------+-------+-------------+
Likely, that correlated subquery is getting executed for each row from tableb.
(Without the output from EXPLAIN, we're really just guessing as to whether appropriate indexes are available, and if MySQL is making use of them.)
It might be more efficient to use an inline view query, to get the maximum tableb_id value for each tablea_id in one shot, and then use a join operation. Something like this:
SELECT a.*
, b.*
FROM tablea a
LEFT JOIN ( SELECT n.tablea_id
, MAX(n.tableb_id) AS max_tableb_id
FROM tableb n
GROUP BY n.tablea_id
) m
ON m.tablea_id = a.tablea_id
LEFT JOIN tableb b
ON b.tablea_id = m.tablea_id
AND b.tableb_id = m.max_tableb_id
That's an alternative, but there's no guarantee that it's going to be faster. It really depends on a whole load of things that we don't have any information about. (Number of rows, cardinality, datatypes, available indexes, etc.)
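As a quick sanity check that the inline-view rewrite returns the same rows as the correlated-subquery original, here is a small sketch using the sample data from the question. It runs on SQLite via Python's sqlite3 module (an assumption, standing in for MySQL) and says nothing about relative performance.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (tablea_id INTEGER PRIMARY KEY);
    CREATE TABLE tableb (tableb_id INTEGER PRIMARY KEY,
                         tablea_id INTEGER, data TEXT);
    INSERT INTO tablea VALUES (1), (2);
    INSERT INTO tableb VALUES (1, 1, 'x'), (2, 1, 'y');
""")

# Original form: correlated subquery in the ON clause.
correlated = conn.execute("""
    SELECT a.tablea_id, b.tableb_id, b.data
    FROM tablea a
    LEFT JOIN tableb b
      ON a.tablea_id = b.tablea_id
     AND b.tableb_id = (SELECT MAX(tableb_id) FROM tableb b2
                        WHERE b2.tablea_id = a.tablea_id)
    ORDER BY a.tablea_id
""").fetchall()

# Rewrite: compute MAX(tableb_id) per tablea_id once, then join back.
inline_view = conn.execute("""
    SELECT a.tablea_id, b.tableb_id, b.data
    FROM tablea a
    LEFT JOIN (SELECT n.tablea_id, MAX(n.tableb_id) AS max_tableb_id
               FROM tableb n
               GROUP BY n.tablea_id) m
      ON m.tablea_id = a.tablea_id
    LEFT JOIN tableb b
      ON b.tablea_id = m.tablea_id
     AND b.tableb_id = m.max_tableb_id
    ORDER BY a.tablea_id
""").fetchall()

print(correlated)   # [(1, 2, 'y'), (2, None, None)]
print(inline_view)  # [(1, 2, 'y'), (2, None, None)]
```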
EDIT
As an alternative, we could do the join between tablea and tableb in an inline view. This might improve performance. (Again, it really depends on a lot of things we don't have any information about.)
SELECT m.tablea_id
, m.foo
, b.*
FROM ( SELECT a.tablea_id
, a.foo
, MAX(n.tableb_id) AS max_tableb_id
FROM tablea a
LEFT JOIN tableb n ON n.tablea_id = a.tablea_id
GROUP BY a.tablea_id
) m
LEFT JOIN tableb b
ON b.tablea_id = m.tablea_id
AND b.tableb_id = m.max_tableb_id

MySQL UNION Only Includes First Table

I need to perform a FULL OUTER JOIN on two tables and I'm trying to implement it in MySQL using the LEFT JOIN/RIGHT JOIN/UNION ALL technique.
Here are the original tables:
giving_totals:
+--------------+---------------+-------------+
| country_iso2 | total_given | supersector |
+--------------+---------------+-------------+
| AE | 1396986989.02 | 3 |
| AE | 596757809.20 | 4 |
| AE | 551810209.87 | 5 |
| AE | 25898255.77 | 7 |
| AE | 32817.63 | 9 |
...
+--------------+---------------+-------------+
receiving_totals:
+--------------+----------------+-------------+
| country_iso2 | total_received | supersector |
+--------------+----------------+-------------+
| AE | 34759000.00 | 3 |
| AE | 148793.82 | 7 |
| AE | 734.30 | 9 |
| AF | 6594479965.85 | 1 |
| AF | 2559712971.26 | 2 |
+--------------+----------------+-------------+
I want the resulting table to have one entry for each country for each supersector code even if it did not give or receive money for that sector (this is from the AidData project dataset in case anyone is familiar.) I thought to accomplish this by doing a UNION of a LEFT JOIN (to get all giving entries) and RIGHT JOIN (to get all receiving entries.) Here's the query I tried:
SELECT g.country_iso2 AS country_iso2, g.total_given AS `total_given`,R.total_received AS `total_received`,g.supersector AS `supersector`
FROM (`giving_totals` `g`
LEFT JOIN `receiving_totals` `r`
ON(((g.country_iso2 = r.country_iso2)
AND (g.supersector = r.supersector))))
UNION ALL
SELECT g.country_iso2 AS country_iso2, g.total_given AS `total_given`,R.total_received AS `total_received`,g.supersector AS `supersector`
FROM (`giving_totals` `g`
RIGHT JOIN `receiving_totals` `r`
ON(((g.country_iso2 = r.country_iso2)
AND (g.supersector = r.supersector))))
But this only returns the first join, whether or not I put the right or left join first. I think I may be misunderstanding the UNION operation because the individual joins each return what I expected. Any help is appreciated as always.
Here is an alternative method to do a full outer join:
SELECT driver.country_iso2 AS country_iso2,
gt.total_given AS `total_given`,
rt.total_received AS `total_received`,
driver.supersector AS `supersector`
from ((select distinct country_iso2, supersector
from giving_totals
) union
(select distinct country_iso2, supersector
from receiving_totals
)
) driver left outer join
giving_totals gt
on gt.country_iso2 = driver.country_iso2 and
gt.supersector = driver.supersector left outer join
receiving_totals rt
on rt.country_iso2 = driver.country_iso2 and
rt.supersector = driver.supersector
That is, do the union as a subquery to get all the combinations you are interested in. Then you can do a left outer join to that table.
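The driver-table idea can be sketched as follows, joining both totals tables to the union on country_iso2 and supersector. The sketch runs on SQLite via Python's sqlite3 module (an assumption, standing in for MySQL), and the amounts are made-up round numbers rather than the real dataset.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE giving_totals (country_iso2 TEXT, total_given INTEGER,
                                supersector INTEGER);
    CREATE TABLE receiving_totals (country_iso2 TEXT, total_received INTEGER,
                                   supersector INTEGER);
    -- Made-up sample amounts.
    INSERT INTO giving_totals VALUES ('AE', 100, 3), ('AE', 50, 4);
    INSERT INTO receiving_totals VALUES ('AE', 7, 3), ('AF', 9, 1);
""")

# UNION (not UNION ALL) removes duplicate (country, supersector) pairs,
# giving one driver row per combination that appears in either table.
rows = conn.execute("""
    SELECT driver.country_iso2, gt.total_given, rt.total_received,
           driver.supersector
    FROM (SELECT country_iso2, supersector FROM giving_totals
          UNION
          SELECT country_iso2, supersector FROM receiving_totals) AS driver
    LEFT JOIN giving_totals gt
      ON gt.country_iso2 = driver.country_iso2
     AND gt.supersector = driver.supersector
    LEFT JOIN receiving_totals rt
      ON rt.country_iso2 = driver.country_iso2
     AND rt.supersector = driver.supersector
    ORDER BY driver.country_iso2, driver.supersector
""").fetchall()

for row in rows:
    print(row)
# ('AE', 100, 7, 3)
# ('AE', 50, None, 4)
# ('AF', None, 9, 1)
```

Rows that exist in only one of the two tables still appear, with NULL on the missing side, which is the behaviour a FULL OUTER JOIN would give.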
The reason for your problem is the aliases in the second query: it takes country_iso2 and supersector from g rather than from r. You can try this instead:
SELECT r.country_iso2 AS country_iso2, g.total_given AS `total_given`,r.total_received AS `total_received`,r.supersector AS `supersector`
FROM (`giving_totals` `g`
RIGHT JOIN `receiving_totals` `r`
ON(((g.country_iso2 = r.country_iso2)
AND (g.supersector = r.supersector))))
The original form would have NULLs for these values.

When to use LEFT JOIN and when to use INNER JOIN?

I feel like I was always taught to use LEFT JOINs and I often see them mixed with INNERs to accomplish the same type of query throughout several pieces of code that are supposed to do the same thing on different pages. Here goes:
SELECT ac.reac, pt.pt_name, soc.soc_name, pt.pt_soc_code
FROM
AECounts ac
INNER JOIN 1_low_level_term llt on ac.reac = llt.llt_name
LEFT JOIN 1_pref_term pt ON llt.pt_code = pt.pt_code
LEFT JOIN 1_soc_term soc ON pt.pt_soc_code = soc.soc_code
LIMIT 100,10000
That's the one I am working on.
I see a lot like:
SELECT COUNT(DISTINCT p.`case`) as count
FROM FDA_CaseReports cr
INNER JOIN ae_indi i ON i.isr = cr.isr
LEFT JOIN ae_case_profile p ON cr.isr = p.isr
It seems like the LEFT may as well be INNER. Is there any catch?
Is there any catch? Yes there is -- left joins are a form of outer join, while inner joins are a form of, well, inner join.
Here are examples that show the difference. We'll start with the base data:
mysql> select * from j1;
+----+------------+
| id | thing |
+----+------------+
| 1 | hi |
| 2 | hello |
| 3 | guten tag |
| 4 | ciao |
| 5 | buongiorno |
+----+------------+
mysql> select * from j2;
+----+-----------+
| id | thing |
+----+-----------+
| 1 | bye |
| 3 | tschau |
| 4 | au revoir |
| 6 | so long |
| 7 | tschuessi |
+----+-----------+
And here we'll see the difference between an inner join and a left join:
mysql> select * from j1 inner join j2 on j1.id = j2.id;
+----+-----------+----+-----------+
| id | thing | id | thing |
+----+-----------+----+-----------+
| 1 | hi | 1 | bye |
| 3 | guten tag | 3 | tschau |
| 4 | ciao | 4 | au revoir |
+----+-----------+----+-----------+
Hmm, 3 rows.
mysql> select * from j1 left join j2 on j1.id = j2.id;
+----+------------+------+-----------+
| id | thing | id | thing |
+----+------------+------+-----------+
| 1 | hi | 1 | bye |
| 2 | hello | NULL | NULL |
| 3 | guten tag | 3 | tschau |
| 4 | ciao | 4 | au revoir |
| 5 | buongiorno | NULL | NULL |
+----+------------+------+-----------+
Wow, 5 rows! What happened?
Outer joins such as left join preserve rows that don't match -- so rows with id 2 and 5 are preserved by the left join query. The remaining columns are filled in with NULL.
In other words, left and inner joins are not interchangeable.
Here's a rough answer, that is sort of how I think about joins. Hoping this will be more helpful than a very precise answer due to the aforementioned math issues... ;-)
Inner joins narrow down the set of rows returned. Outer joins (left or right) don't drop any rows from the preserved table; they just "pick up" additional columns where a match exists (a row can repeat if it has several matches).
In your first example, the result will be rows from AECounts that match the conditions specified to the 1_low_level_term table. Then for those rows, it tries to join to 1_pref_term and 1_soc_term. But if there's no match, the rows remain and the joined in columns are null.
An INNER JOIN will only return the rows where there are matching values in both tables, whereas a LEFT JOIN will return ALL the rows from the LEFT table even if there is no matching row in the RIGHT table.
A quick example
TableA
ID Value
1 TableA.Value1
2 TableA.Value2
3 TableA.Value3
TableB
ID Value
2 TableB.ValueB
3 TableB.ValueC
An INNER JOIN produces:
SELECT a.ID,a.Value,b.ID,b.Value
FROM TableA a INNER JOIN TableB b ON b.ID = a.ID
a.ID a.Value b.ID b.Value
2 TableA.Value2 2 TableB.ValueB
3 TableA.Value3 3 TableB.ValueC
A LEFT JOIN produces:
SELECT a.ID,a.Value,b.ID,b.Value
FROM TableA a LEFT JOIN TableB b ON b.ID = a.ID
a.ID a.Value b.ID b.Value
1 TableA.Value1 NULL NULL
2 TableA.Value2 2 TableB.ValueB
3 TableA.Value3 3 TableB.ValueC
As you can see, the LEFT JOIN includes the row from TableA where ID = 1 even though there's no matching row in TableB where ID = 1, whereas the INNER JOIN excludes the row specifically because there's no matching row in TableB.
HTH
Use an inner join when you want only the results that appear in both tables and match the join condition.
Use a left join when you want all the results from Table A, but if Table B has data relevant to some of Table A's records, then you also want to use that data in the same query.
Use a full join when you want all the results from both Tables.
For newbies, because it helped me when I was one: an INNER JOIN is always a subset of a LEFT or RIGHT JOIN, and all of these are always subsets of a FULL JOIN. It helped me understand the basic idea.
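The subset relationship can be checked on the j1/j2 sample data used earlier in this thread. This is only a sketch on SQLite via Python's sqlite3 module (an assumption, standing in for MySQL), and it compares just the INNER and LEFT cases.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE j1 (id INTEGER, thing TEXT);
    CREATE TABLE j2 (id INTEGER, thing TEXT);
    INSERT INTO j1 VALUES (1, 'hi'), (2, 'hello'), (3, 'guten tag'),
                          (4, 'ciao'), (5, 'buongiorno');
    INSERT INTO j2 VALUES (1, 'bye'), (3, 'tschau'), (4, 'au revoir'),
                          (6, 'so long'), (7, 'tschuessi');
""")

# Collect each join's (j1.id, j2.id) pairs as a set of tuples.
inner_rows = set(conn.execute(
    "SELECT j1.id, j2.id FROM j1 INNER JOIN j2 ON j1.id = j2.id"))
left_rows = set(conn.execute(
    "SELECT j1.id, j2.id FROM j1 LEFT JOIN j2 ON j1.id = j2.id"))

print(inner_rows <= left_rows)         # True
print(sorted(left_rows - inner_rows))  # [(2, None), (5, None)]
```

The only rows the LEFT JOIN adds are the NULL-extended ones for ids 2 and 5, which have no match in j2.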

How to create left joins without repetition of rows on the left side

I have a scenario where there are two tables (tables A and B) linked in a one to many relationship. For a row in table A, the maximum number of linked rows in B is two, and these two rows (if they exist) are different from each other through a type column whose value is either x or y.
Table A:
Aid | Name
1 | name1
2 | name2
3 | name3
Table B:
Bid | type | Aid
1 | x | 1
2 | x | 2
3 | y | 2
Now, what I want is to have a join query for the two tables in such a way that all rows in A will be displayed (no repetition) and two columns called type x and type y will hold a boolean / integer value to show the existence of types x and y for each row in A. i.e,
Aid | Name | Type X | Type Y |
1 | name1 | X | NULL |
2 | name2 | X | Y |
3 | name3 | NULL | NULL |
My DBMS is MySQL.
Thanks.
You have to use two joins:
SELECT A.*, b1.type AS typeX, b2.type as typeY
FROM A
LEFT JOIN B b1
ON A.aid = b1.aid
AND b1.type = 'x'
LEFT JOIN B b2
ON A.aid = b2.aid
AND b2.type = 'y'
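Here is a minimal sketch of the two-join approach run against the sample rows from the question. It uses SQLite through Python's sqlite3 module (an assumption, standing in for MySQL); the lowercase column names and the type_x/type_y output aliases are chosen for this example.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (aid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE B (bid INTEGER PRIMARY KEY, type TEXT, aid INTEGER);
    INSERT INTO A VALUES (1, 'name1'), (2, 'name2'), (3, 'name3');
    INSERT INTO B VALUES (1, 'x', 1), (2, 'x', 2), (3, 'y', 2);
""")

# One LEFT JOIN per type; the type filter lives in the ON clause so that
# rows of A without that type are kept with NULL instead of being dropped.
rows = conn.execute("""
    SELECT A.aid, A.name, b1.type AS type_x, b2.type AS type_y
    FROM A
    LEFT JOIN B b1 ON A.aid = b1.aid AND b1.type = 'x'
    LEFT JOIN B b2 ON A.aid = b2.aid AND b2.type = 'y'
    ORDER BY A.aid
""").fetchall()

for row in rows:
    print(row)
# (1, 'name1', 'x', None)
# (2, 'name2', 'x', 'y')
# (3, 'name3', None, None)
```

Because each A row has at most one B row per type, every A row appears exactly once.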
Well, this happens because your second table uses the EAV-model. If you had two tables, one for type_x and one for type_y, your relational schema would be a lot cleaner.
Of course, EAV does work, albeit more clumsily:
SELECT a.aid, a.name, b_x.type AS type_x, b_y.type AS type_y
FROM table_a a
LEFT JOIN table_b b_x
ON a.aid = b_x.aid
AND b_x.type = 'x'
LEFT JOIN table_b b_y
ON a.aid = b_y.aid
AND b_y.type = 'y'