Understanding self-join with different ON clause [duplicate] - mysql

This question already has answers here:
Understanding the number of matched rows in LEFT JOIN
Closed 4 years ago.
Here is my table structure:
// mytable
+----+---------+----------+
| id | related | subject |
+----+---------+----------+
| 1 | NULL | subject1 |
| 2 | 1 | |
+----+---------+----------+
And here are two queries which seem identical to me, but give different results in tests:
SELECT a.id, IFNULL(b.subject, a.subject)
FROM mytable a
LEFT JOIN mytable b ON a.id = b.related
+----+----------+
| 1 | subject1 |
| 2 | |
+----+----------+
SELECT a.id, IFNULL(b.subject, a.subject)
FROM mytable a
LEFT JOIN mytable b ON b.id = a.related
+----+----------+
| 1 | subject1 |
| 2 | subject1 |
+----+----------+
Look, it is a self-join. So why do ON a.id = b.related and ON b.id = a.related give different results?

Running your queries with SELECT * to uncover some of the mystery:
Your first query:
SELECT *
FROM mytable a
LEFT JOIN mytable b ON a.id = b.related;
Produces the following:
+----+---------+----------+--------+----------+----------+
| id | related | subject | id1 | related1 | subject1 |
+----+---------+----------+--------+----------+----------+
| 2 | 1 | <null> | <null> | <null> | <null> |
| 1 | <null> | subject1 | 2 | 1 | <null> |
+----+---------+----------+--------+----------+----------+
Your second query:
SELECT *
FROM mytable a
LEFT JOIN mytable b ON b.id = a.related;
Produces this:
+----+---------+----------+--------+----------+----------+
| id | related | subject | id1 | related1 | subject1 |
+----+---------+----------+--------+----------+----------+
| 2 | 1 | <null> | 1 | <null> | subject1 |
| 1 | <null> | subject1 | <null> | <null> | <null> |
+----+---------+----------+--------+----------+----------+
For a.id = 2, your first query looks for a row in b with related = 2. There is no such row, so all of b's columns are NULL, and since id 2 has no subject either, you get no subject out of your IFNULL().
For a.id = 2, your second query matches a.related = 1 to b.id = 1. This pulls the subject from b.id 1, so you get a subject back for id 2.
You really have to mentally map out how a LEFT JOIN works here and how it is affected by your ON clause. You have two very different queries here as a result.
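To see the difference concretely, here is a minimal sketch that recreates mytable and runs both queries. It uses Python's built-in sqlite3 module as a stand-in for MySQL. That is an assumption: IFNULL and LEFT JOIN behave the same way for this data. It stores the blank subject of id 2 as NULL, matching the SELECT * output above. self_join is a hypothetical helper name.

```python
import sqlite3

def self_join(on_clause: str) -> list:
    """Run the question's IFNULL self-join with the given ON clause."""
    # Assumption: SQLite stands in for MySQL; IFNULL and LEFT JOIN behave alike here.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE mytable (id INTEGER PRIMARY KEY, related INTEGER, subject TEXT);
        INSERT INTO mytable VALUES (1, NULL, 'subject1'), (2, 1, NULL);
    """)
    sql = ("SELECT a.id, IFNULL(b.subject, a.subject) "
           "FROM mytable a LEFT JOIN mytable b ON " + on_clause +
           " ORDER BY a.id")
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows

# For a.id = 2: no row has related = 2, so b is all NULL and IFNULL(NULL, NULL) is NULL
print(self_join("a.id = b.related"))  # [(1, 'subject1'), (2, None)]
# For a.id = 2: a.related = 1 matches b.id = 1, so b.subject is 'subject1'
print(self_join("b.id = a.related"))  # [(1, 'subject1'), (2, 'subject1')]
```

The only thing that changes between the two calls is which column of each alias is compared. That is enough to change which row of b gets paired with each row of a.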

Both queries are getting all rows from a.
Both queries are doing an outer join to b.
What's different is the condition that is used for finding a "match" from b.
(The queries might seem to be identical, but the truth is that they are significantly different.)
As a demonstration, run a query like this:
SELECT a.id AS `a_id`
, a.related AS `a_related`
, a.subject AS `a_subject`
, b.id AS `b_id`
, b.related AS `b_related`
, b.subject AS `b_subject`
FROM mytable a
LEFT
JOIN mytable b
ON b.related = a.id
And then change the ON clause to:
ON b.id = a.related
You might also want to repeat both of those queries removing the LEFT keyword (to make it an inner join instead of an outer join.)
One way to look at an outer join: when a matching row from b is not found, a dummy row from b is invented. That dummy row consists entirely of NULL values, and it is joined to a as if it were a matching row. (This isn't necessarily what the database engine actually does, but thinking about it this way gives us insight into the results that the outer join returns.)
Take a close look at the results of the queries, and you will be able to see why they are different.
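The "dummy NULL row" way of thinking can be checked mechanically. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL, and matches_dummy_row_model is a hypothetical helper name. It runs each ON clause both as an inner join and as a LEFT JOIN. It then confirms that the outer join returns the inner-join rows plus every unmatched row of a paired with an all-NULL b row. It checks the mental model only; it says nothing about how the engine actually executes the join.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, related INTEGER, subject TEXT);
    INSERT INTO mytable VALUES (1, NULL, 'subject1'), (2, 1, NULL);
""")

COLS = "a.id, a.related, a.subject, b.id, b.related, b.subject"

def matches_dummy_row_model(on_clause: str) -> bool:
    """Check that LEFT JOIN = INNER JOIN rows + unmatched a rows padded with a NULL b row."""
    outer = conn.execute(f"SELECT {COLS} FROM mytable a LEFT JOIN mytable b ON {on_clause}").fetchall()
    inner = conn.execute(f"SELECT {COLS} FROM mytable a JOIN mytable b ON {on_clause}").fetchall()
    matched = {row[0] for row in inner}  # a.id values that found a real match
    all_a = conn.execute("SELECT id, related, subject FROM mytable").fetchall()
    # Unmatched a rows are joined to an invented b row made entirely of NULLs
    padded = [a_row + (None, None, None) for a_row in all_a if a_row[0] not in matched]
    # Counter compares the rows as multisets, ignoring row order
    return Counter(outer) == Counter(inner + padded)

print(matches_dummy_row_model("b.related = a.id"))  # True
print(matches_dummy_row_model("b.id = a.related"))  # True
```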
The fact that a and b refer to the same table is a special case. We would see the same results if those were two different tables. It really doesn't matter... to the query, those are two different sources which just happen to refer to the same table. Don't let the fact that a and b refer to the same table cause any confusion.

Related

SQL Join with NOT IN() does not work

I have 2 tables that both contain the same key, p_id:
test1 test2
+-------------+ +----------------------+
| p_id | name | | o_id | name | p_id |
+-------------+ +----------------------+
| 1 | Paul | | 1 | London | 1 |
| 2 | Marc | | 2 | Paris | 1 |
+-------------+ +----------------------+
Now I want to get all entries from test1 that have no relationship to test2.
In the example above I have abstracted my tables so RIGHT JOIN is not possible (in reality I have to join 4 tables).
SELECT a.*,b.*
FROM test1 a
LEFT JOIN test2 b
ON a.p_id=b.p_id
WHERE b.p_id NOT IN(SELECT DISTINCT p_id FROM test2);
I expect one row with p_id=2. However I get an empty result.
When I change my code into this:
SELECT a.*,b.*
FROM test1 a
LEFT JOIN test2 b
ON a.p_id=b.p_id
WHERE a.p_id NOT IN(SELECT DISTINCT p_id FROM test2);
Then it works fine. But why? I thought the LEFT JOIN is processed first (1 row as result) and the WHERE after that (the JOIN has not found p_id in test2, so b.p_id is NULL; NULL is not in the subselect, so there should still be 1 row as result).
Could someone explain this behavior, please?
It has to do with how NULL is handled in comparisons.
To test/see, you can run simple queries like:
SELECT 1
FROM DUAL
WHERE NULL = NULL;
SELECT 1
FROM DUAL
WHERE NULL NOT IN (1, 2, 3);
Neither returns a row, because both conditions evaluate to NULL, which is "not true".
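The same checks can be run from Python's built-in sqlite3 module, used here as a stand-in for MySQL. SQLite has no DUAL table, so the FROM clause is simply omitted. rows is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in; it has no DUAL, so FROM is omitted

def rows(sql: str) -> list:
    return conn.execute(sql).fetchall()

# A WHERE condition that evaluates to NULL filters the row out, just like FALSE does.
print(rows("SELECT 1 WHERE NULL = NULL"))            # []
print(rows("SELECT 1 WHERE NULL NOT IN (1, 2, 3)"))  # []
print(rows("SELECT 1 WHERE NULL IS NULL"))           # [(1,)]

# The comparisons themselves yield NULL (None in Python), not FALSE.
print(rows("SELECT NULL = NULL, NULL NOT IN (1, 2, 3)"))  # [(None, None)]
```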
As Uueerdo said, it's a NULL comparison issue. But that aside, you should really use an anti-join:
SELECT a.*,b.*
FROM test1 a
LEFT JOIN test2 b
ON a.p_id=b.p_id
WHERE b.p_id IS NULL;
It's cleaner and generally more efficient.
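Here is a minimal sketch comparing the original NOT IN filter with the anti-join on the question's sample data. Python's sqlite3 again stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
conn.executescript("""
    CREATE TABLE test1 (p_id INTEGER, name TEXT);
    CREATE TABLE test2 (o_id INTEGER, name TEXT, p_id INTEGER);
    INSERT INTO test1 VALUES (1, 'Paul'), (2, 'Marc');
    INSERT INTO test2 VALUES (1, 'London', 1), (2, 'Paris', 1);
""")

base = "SELECT a.p_id, a.name FROM test1 a LEFT JOIN test2 b ON a.p_id = b.p_id "

# Original attempt: for Marc, b.p_id is NULL, and NULL NOT IN (...) is NULL, so the row is dropped
not_in = conn.execute(base + "WHERE b.p_id NOT IN (SELECT DISTINCT p_id FROM test2)").fetchall()
# Anti-join: keep exactly the rows where the outer join found no match
anti = conn.execute(base + "WHERE b.p_id IS NULL").fetchall()

print(not_in)  # []
print(anti)    # [(2, 'Marc')]
```

The anti-join keeps Marc because the IS NULL test is true exactly for rows whose b columns were filled with NULLs by the outer join.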
There is nothing wrong with NOT IN itself. Since you are filtering on the right table in the WHERE clause, the LEFT JOIN is implicitly converted to an INNER JOIN.
Without the WHERE clause, the result looks like this:
+------+------+--------+--------+--------+
| p_id | name | o_id | name | p_id |
+------+------+--------+--------+--------+
| 1 | Paul | 1 | London | 1 |
| 1 | Paul | 2 | Paris | 1 |
| 2 | Marc | (null) | (null) | (null) |
+------+------+--------+--------+--------+
If you then apply the filter
WHERE b.p_id NOT IN(SELECT DISTINCT p_id FROM test2);
The subquery returns 1, which is present in the last column of the result above, so those rows are filtered out and you get no result.
In case you are wondering why the last record, which has NULL, is not returned even though NULL is not 1: NULL cannot be compared using =, IN, NOT IN, etc. You need to use the IS operator to check for NULL.
The proper way to do this is with NOT EXISTS, which handles NULL values as well:
select *
from test1 a
Where Not Exists (select 1 from test2 b Where a.p_id = b.p_id)
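The "handles NULL values" point is easiest to see when the subquery itself can return NULL. The sketch below adds a hypothetical row (3, 'Rome', NULL) to test2, which is not part of the question's data, and uses Python's sqlite3 as a stand-in for MySQL. NOT EXISTS still finds Marc. A plain a.p_id NOT IN (SELECT p_id FROM test2) returns nothing, because 2 NOT IN (1, 1, NULL) evaluates to NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
conn.executescript("""
    CREATE TABLE test1 (p_id INTEGER, name TEXT);
    CREATE TABLE test2 (o_id INTEGER, name TEXT, p_id INTEGER);
    INSERT INTO test1 VALUES (1, 'Paul'), (2, 'Marc');
    -- Hypothetical extra row 3 with a NULL p_id, added to show the NULL pitfall
    INSERT INTO test2 VALUES (1, 'London', 1), (2, 'Paris', 1), (3, 'Rome', NULL);
""")

not_exists = conn.execute("""
    SELECT a.p_id, a.name FROM test1 a
    WHERE NOT EXISTS (SELECT 1 FROM test2 b WHERE a.p_id = b.p_id)
""").fetchall()

# 2 NOT IN (1, 1, NULL) is NULL, not TRUE, so Marc disappears here
not_in = conn.execute("""
    SELECT a.p_id, a.name FROM test1 a
    WHERE a.p_id NOT IN (SELECT p_id FROM test2)
""").fetchall()

print(not_exists)  # [(2, 'Marc')]
print(not_in)      # []
```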
In general, WHERE is executed first, then JOIN. Also, when you use LEFT JOIN, it's the LEFT table that has everything included, so you shouldn't expect one row with p_id=2 after the JOIN, as you stated.

Select all rows from a table with an indication of existence in another table using mysql query

I have three tables
Table a
+-----+-------+
| aid | value |
+-----+-------+
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | D |
+-----+-------+
Table b
+-----+------+
| bid | name |
+-----+------+
| 1 | A |
| 2 | B |
| 3 | C |
+-----+------+
Table ba (mapping of table a and table b)
+-----+-----+
| bid | aid |
+-----+-----+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 3 | 2 |
| 1 | 3 |
| 2 | 3 |
| 2 | 4 |
+-----+-----+
From these tables I want a query like
SELECT aid, mapped('true'-if(aid exist in ba) 'false'-otherwise)
FROM a
JOIN b
JOIN ba
WHERE bid=1
to get a result from which I can generate a list
(when bid=1)
A-mapped
B-not mapped
C-mapped
D-not mapped
(when bid=2)
A-mapped
B-not mapped
C-mapped
D-mapped
(when bid=3)
A-mapped
B-mapped
C-not mapped
D-not mapped
Right now I am generating the list in a while loop for all the rows of table 'a' and inside the loop a query is executed for each iteration to check the existence in table 'ba'.
I think this can be done independently of table b:
SELECT CONCAT_WS('-', a.value, IF(ba.aid IS NULL, 'not mapped', 'mapped'))
FROM a LEFT JOIN ba ON a.aid = ba.aid AND ba.bid = 1
ORDER BY a.aid
Note: I took table "a" as the base table, since your samples include all values from table "a".
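Here is a minimal sketch of the same LEFT JOIN, run through Python's sqlite3 as a stand-in for MySQL. SQLite does not have MySQL's IF(), so CASE and the || operator replace IF() and CONCAT_WS(). mapping_list is a hypothetical helper, and the bid is passed as a parameter instead of the literal 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
conn.executescript("""
    CREATE TABLE a  (aid INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE ba (bid INTEGER, aid INTEGER);
    INSERT INTO a  VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO ba VALUES (1, 1), (2, 1), (3, 1), (3, 2), (1, 3), (2, 3), (2, 4);
""")

def mapping_list(bid: int) -> list:
    # The bid filter lives in the ON clause, so unmapped rows of a survive with ba.aid = NULL.
    # || and CASE replace MySQL's CONCAT_WS() and IF() in this SQLite sketch.
    sql = """
        SELECT a.value || '-' || CASE WHEN ba.aid IS NULL THEN 'not mapped' ELSE 'mapped' END
        FROM a LEFT JOIN ba ON a.aid = ba.aid AND ba.bid = ?
        ORDER BY a.aid
    """
    return [row[0] for row in conn.execute(sql, (bid,))]

print(mapping_list(1))  # ['A-mapped', 'B-not mapped', 'C-mapped', 'D-not mapped']
```

The bid condition has to stay in the ON clause. If it moved into WHERE, it would discard every row of a that is not mapped to that bid, because for those rows ba.bid is either NULL or a different bid.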
This is a tricky question, but the difficult part is in figuring out how to formulate the query. Once that is out of the way, it is downhill from there. One approach is to use a cross join between the A and B tables to obtain all possible mappings. Then LEFT JOIN to the mapping table to determine which pairs are being mapped and which are not. Try the following query:
SELECT tb.bid, ta.value,
CASE WHEN ba.bid IS NOT NULL THEN 'mapped' ELSE 'not mapped' END AS label
FROM tb CROSS JOIN ta -- cross join to obtain all bid/aid pairs
LEFT JOIN ba -- to determine which pairs are mapped/not mapped
ON ta.aid = ba.aid AND tb.bid = ba.bid
ORDER BY tb.bid, ta.value
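Here is a minimal sketch that runs this query with Python's sqlite3 standing in for MySQL. It uses the answer's table names ta and tb and then groups the flat result into one list per bid. The explicit CROSS JOIN is the portable spelling; MySQL also accepts INNER JOIN without an ON clause.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
conn.executescript("""
    CREATE TABLE ta (aid INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE tb (bid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ba (bid INTEGER, aid INTEGER);
    INSERT INTO ta VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO tb VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO ba VALUES (1, 1), (2, 1), (3, 1), (3, 2), (1, 3), (2, 3), (2, 4);
""")

rows = conn.execute("""
    SELECT tb.bid, ta.value,
           CASE WHEN ba.bid IS NOT NULL THEN 'mapped' ELSE 'not mapped' END AS label
    FROM tb CROSS JOIN ta                 -- every bid/aid pair
    LEFT JOIN ba                          -- is this pair present in the mapping table?
      ON ta.aid = ba.aid AND tb.bid = ba.bid
    ORDER BY tb.bid, ta.value
""").fetchall()

# Group the flat result into one list per bid, as in the question's expected output
lists = defaultdict(list)
for bid, value, label in rows:
    lists[bid].append(f"{value}-{label}")

for bid in sorted(lists):
    print(bid, lists[bid])
```

Compared with filtering on a single bid, this returns all 3 x 4 pairs in one round trip. That replaces the per-row queries inside the question's while loop.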

Select value from left table and full join right table

This is a bit difficult to explain, but I'll give my best:
Let's say, I have table A:
event | task | ref_person
------+------+-----------
1 | 20 | 1
2 | 9 | 2
And I have table B (containing person):
id | name
---+-----
1 | foo
2 | bar
3 | jim
What would a MySQL query look like that produces this sort of table:
event | task | person
------+------+-------
1 | 20 | foo
1 | NULL | bar
1 | NULL | jim
2 | NULL | foo
2 | 9 | bar
2 | NULL | jim
My current approach uses a RIGHT JOIN, but this won't get me the event combined with the NULL value.
This is what my current statement looks like:
SELECT
a.*,
b.name
FROM
a
RIGHT JOIN b
ON b.id = a.ref_person
ORDER BY
a.event,
b.name
You want to do a cross join to get all the rows, then case logic to get the task:
select a.event,
(case when a.ref_person = b.id then a.task end) as task,
b.name
from tablea a cross join
tableb b ;
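Here is a minimal sketch that runs this query on the question's sample data, using Python's sqlite3 as a stand-in for MySQL. The ORDER BY is an addition (the answer's query has none) so the rows come out in the order shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
conn.executescript("""
    CREATE TABLE tablea (event INTEGER, task INTEGER, ref_person INTEGER);
    CREATE TABLE tableb (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO tablea VALUES (1, 20, 1), (2, 9, 2);
    INSERT INTO tableb VALUES (1, 'foo'), (2, 'bar'), (3, 'jim');
""")

rows = conn.execute("""
    SELECT a.event,
           CASE WHEN a.ref_person = b.id THEN a.task END AS task,  -- NULL unless b is the referenced person
           b.name
    FROM tablea a CROSS JOIN tableb b
    ORDER BY a.event, b.id
""").fetchall()

for row in rows:
    print(row)
```

The cross join supplies every event/person combination. The CASE without an ELSE then keeps the task only on the one combination that the event actually references.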

Join tables with one known variable in second table

I have a problem with my SQL query.
Situation is as follows:
I have two tables, A and B.
Table A:
---------------------------------------------
| A.id | A.t_id | A.f_id | A.type |
---------------------------------------------
| 1 | 32 | 3 | Loading |
| 2 | 34 | 5 | Discharge |
| 3 | 32 | 3 | Discharge |
---------------------------------------------
Table B:
-----------------------
| B.id | B.shipid |
-----------------------
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
-----------------------
I need all the rows from A where A.type = 'Loading' and A.t_id is a B.id whose B.shipid = 2. My query so far is:
SELECT * FROM A, B WHERE (A.type='Loading' AND B.shipid=2 AND A.t_id=B.id)
but this doesn't return the right records (none, actually), even though the data should fit the query. Where does my query go wrong?
Try this:
SELECT
*
FROM
A
INNER JOIN B ON A.t_id=B.id
WHERE A.type='Loading' AND B.shipid=2
Try this
SELECT * FROM A a JOIN B b ON a.t_id = b.id WHERE b.shipid = 2 AND a.type = 'Loading'
-- a and b are aliases for A and B; aliases are especially useful with longer table names
And you should look up SQL JOIN: http://www.w3schools.com/sql/sql_join.asp
If you need all rows from A, then it sounds like a LEFT JOIN, so please also provide your expected results to confirm.
SELECT A.* FROM A
INNER JOIN B
ON A.f_id=B.id
AND A.type='Loading' AND B.shipid=2
;
Since you have a condition to get records for Loading, the only issue is that the ids aren't matching, as the comments above pointed out: the A.t_id values (32, 34) never appear in B.id, so the join has to use A.f_id instead.
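Here is a minimal sketch, with Python's sqlite3 standing in for MySQL, that runs the same filter twice: once joining on A.t_id and once on A.f_id. loading_for_ship2 is a hypothetical helper. It interpolates the column name into the SQL, which is acceptable only because it is called with fixed names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
conn.executescript("""
    CREATE TABLE A (id INTEGER, t_id INTEGER, f_id INTEGER, type TEXT);
    CREATE TABLE B (id INTEGER, shipid INTEGER);
    INSERT INTO A VALUES (1, 32, 3, 'Loading'), (2, 34, 5, 'Discharge'), (3, 32, 3, 'Discharge');
    INSERT INTO B VALUES (1, 1), (2, 1), (3, 2);
""")

def loading_for_ship2(join_col: str) -> list:
    # join_col is only ever one of the two fixed column names used below
    sql = f"""
        SELECT A.id, A.t_id, A.f_id, A.type FROM A
        INNER JOIN B ON A.{join_col} = B.id
        WHERE A.type = 'Loading' AND B.shipid = 2
    """
    return conn.execute(sql).fetchall()

print(loading_for_ship2("t_id"))  # [] because t_id values 32 and 34 never appear in B.id
print(loading_for_ship2("f_id"))  # [(1, 32, 3, 'Loading')]
```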

Select rows from one table, join most recent row from other table with one-to-many relationship

What I would like to do is select a specific set of rows from one table (table A) and join with another table (table B), such that only one record will appear from table A, joined with the most recent record from table B, based on a datetime column.
For example, table A has this structure (heavily simplified):
id | col_1 | col_2
---+-----------+----------------
1 | something | something else
2 | val_1 | val_2
3 | stuff | ting
4 | goats | sheep
And table B looks like this:
id | fk_A | datetime_col | col_3
---+-----------+---------------------+--------
1 | 1 | 2012-02-01 15:42:14 | Note 1
2 | 1 | 2012-02-02 09:46:54 | Note 2
3 | 1 | 2011-11-14 11:18:32 | Note 3
4 | 2 | 2009-04-30 16:49:01 | Note 4
5 | 4 | 2013-06-21 15:42:14 | Note 5
6 | 4 | 2011-02-01 18:44:24 | Note 6
What I would like is a result set that looks like this:
id | col_1 | col_2 | datetime_col | col_3
---+-----------+----------------+---------------------+--------
1 | something | something else | 2012-02-02 09:46:54 | Note 2
2 | val_1 | val_2 | 2009-04-30 16:49:01 | Note 4
3 | stuff | ting | NULL | NULL
4 | goats | sheep | 2013-06-21 15:42:14 | Note 5
So you can see that table B has been joined with table A on B.fk_A = A.id, but only the most recent corresponding record from B has been included in the results.
I have tried various combinations of SELECT DISTINCT, LEFT JOIN and subqueries, and I just can't get it to work. I either get no results or something like this:
id | col_1 | col_2 | datetime_col | col_3
---+-----------+----------------+---------------------+--------
1 | something | something else | 2012-02-01 15:42:14 | Note 1
1 | something | something else | 2012-02-02 09:46:54 | Note 2
1 | something | something else | 2011-11-14 11:18:32 | Note 3
2 | val_1 | val_2 | 2009-04-30 16:49:01 | Note 4
3 | stuff | ting | NULL | NULL
4 | goats | sheep | 2013-06-21 15:42:14 | Note 5
4 | goats | sheep | 2011-02-01 18:44:24 | Note 6
...with the records from table A repeated.
Obviously my SQL-fu is just not good enough for this task, so I would be most grateful if one of you kind people could point me in the right direction. I have done quite a bit of Googling and searching around SO and I have not found anything that matches this specific task, although I am sure the question has been asked before - I suspect there is an SQL keyword that I am forgetting/unaware of and if I searched for that I would find the answer instantly.
I think this question deals with the same problem although I am not 100% sure and the accepted answer involves SELECT TOP, which I thought (?) was not valid in MySQL.
As my actual query is much more complicated and joins several tables, I shall show it in case it makes any difference to how this is done:
SELECT `l` . * , `u`.`name` AS 'owner_name', `s`.`name` AS 'acquired_by_name', `d`.`type` AS `dtype` , `p`.`type` AS `ptype`
FROM `leads` l
LEFT JOIN `web_users` u ON `u`.`id` = `l`.`owner`
LEFT JOIN `web_users` s ON `s`.`id` = `l`.`acquired_by`
LEFT JOIN `deal_types` d ON `d`.`id` = `l`.`deal_type`
LEFT JOIN `property_types` p ON `p`.`id` = `l`.`property_type`
This query works and returns the data I want (sometimes I also add a WHERE clause but this works fine), but I would now like to:
LEFT JOIN `notes` n ON `n`.`lead_id` = `l`.`id`
...where notes contains the "many records" and leads contains the "one record" they relate to.
It should also be noted that potentially I would also want to return the oldest record (in a different query) but I imagine this will be a simple case of inverting an ASC/DESC somewhere, or something similarly easy.
I think this will help you:
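SELECT A.id, A.col_1, A.col_2, A.datetime_col, A.col_3
FROM
(SELECT B.id, B.col_1, B.col_2, C.datetime_col, C.col_3
FROM tableA B LEFT OUTER JOIN tableB C ON B.id = C.fk_A
ORDER BY C.datetime_col desc) as A
GROUP BY A.id
-- Caution: MySQL does not guarantee which row of each group GROUP BY keeps, the optimizer
-- may ignore ORDER BY inside a derived table, and ONLY_FULL_GROUP_BY would reject this query.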
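A GROUP BY that picks non-aggregated columns relies on behavior MySQL does not guarantee. Here is a different technique, swapped in rather than taken from the answer: keep only the B row whose datetime_col equals the per-id MAX() returned by a correlated subquery. This is a minimal sketch using Python's sqlite3 as a stand-in for MySQL. Datetimes are stored as ISO-8601 text so that they compare correctly as strings, and one_note_per_row is a hypothetical helper. Passing MIN instead returns the oldest note, which covers the question's follow-up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, col_1 TEXT, col_2 TEXT);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, fk_A INTEGER, datetime_col TEXT, col_3 TEXT);
    INSERT INTO tableA VALUES (1, 'something', 'something else'), (2, 'val_1', 'val_2'),
                              (3, 'stuff', 'ting'), (4, 'goats', 'sheep');
    INSERT INTO tableB VALUES
        (1, 1, '2012-02-01 15:42:14', 'Note 1'), (2, 1, '2012-02-02 09:46:54', 'Note 2'),
        (3, 1, '2011-11-14 11:18:32', 'Note 3'), (4, 2, '2009-04-30 16:49:01', 'Note 4'),
        (5, 4, '2013-06-21 15:42:14', 'Note 5'), (6, 4, '2011-02-01 18:44:24', 'Note 6');
""")

def one_note_per_row(agg: str = "MAX") -> list:
    # Keep only the B row whose datetime equals the per-A extreme: MAX = newest, MIN = oldest.
    assert agg in ("MAX", "MIN")
    sql = f"""
        SELECT A.id, A.col_1, A.col_2, B.datetime_col, B.col_3
        FROM tableA A
        LEFT JOIN tableB B
          ON B.fk_A = A.id
         AND B.datetime_col = (SELECT {agg}(B2.datetime_col)
                               FROM tableB B2 WHERE B2.fk_A = A.id)
        ORDER BY A.id
    """
    return conn.execute(sql).fetchall()

for row in one_note_per_row():
    print(row)
```

Because the date condition sits in the ON clause, id 3 (no notes) still appears with NULLs. If two notes for the same id share the newest datetime, both rows are returned; a tie-breaker such as B.id would be needed to guarantee exactly one.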