SQL Join with NOT IN() does not work - mysql

I have two tables that both contain the same key, p_id:
test1
+------+------+
| p_id | name |
+------+------+
| 1    | Paul |
| 2    | Marc |
+------+------+
test2
+------+--------+------+
| o_id | name   | p_id |
+------+--------+------+
| 1    | London | 1    |
| 2    | Paris  | 1    |
+------+--------+------+
Now I want to get all entries from test1 that have no relationship to test2.
In the example above I have abstracted my tables so RIGHT JOIN is not possible (in reality I have to join 4 tables).
SELECT a.*,b.*
FROM test1 a
LEFT JOIN test2 b
ON a.p_id=b.p_id
WHERE b.p_id NOT IN(SELECT DISTINCT p_id FROM test2);
I expect one row with p_id=2. However I get an empty result.
When I change my code into this:
SELECT a.*,b.*
FROM test1 a
LEFT JOIN test2 b
ON a.p_id=b.p_id
WHERE a.p_id NOT IN(SELECT DISTINCT p_id FROM test2);
Then it works fine. But why? I thought the LEFT JOIN is processed first (1 row as result) and the WHERE is processed after that (the JOIN has not found p_id in test2, so b.p_id is NULL; NULL is not in the subselect, so still 1 row as result).
Could someone explain this behavior, please?

It has to do with how NULL is handled in comparisons.
To test/see, you can run simple queries like:
SELECT 1
FROM DUAL
WHERE NULL = NULL;
SELECT 1
FROM DUAL
WHERE NULL NOT IN (1, 2, 3);
Neither returns a row, because both conditions evaluate to NULL, which is "not true".
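To see the values themselves rather than just an empty result, here is a minimal sketch. It assumes an in-memory SQLite database (through Python's sqlite3 module) as a stand-in for MySQL; both follow the same three-valued logic for these expressions. The helper name evaluate is made up for the illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; NULL comparisons behave the same way here.
conn = sqlite3.connect(":memory:")


def evaluate(expr):
    """Return the value of a scalar SQL expression (None means SQL NULL)."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]


null_equals_null = evaluate("NULL = NULL")            # NULL, not true
null_not_in_list = evaluate("NULL NOT IN (1, 2, 3)")  # NULL, not true
null_is_null = evaluate("NULL IS NULL")               # 1, i.e. true

# WHERE keeps a row only when its condition is true, so a NULL condition drops it.
rows_kept = conn.execute("SELECT 1 WHERE NULL NOT IN (1, 2, 3)").fetchall()

print(null_equals_null, null_not_in_list, null_is_null, rows_kept)
```

Only IS NULL yields a true value; the other two conditions are NULL, so WHERE discards the row.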

As Uueerdo said, it's a NULL comparison issue. But that aside, you should really use an anti-join:
SELECT a.*,b.*
FROM test1 a
LEFT JOIN test2 b
ON a.p_id=b.p_id
WHERE b.p_id IS NULL;
It's cleaner and generally more efficient.
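A small sketch of the difference, using the two tables from the question. It assumes SQLite (via Python's sqlite3) in place of MySQL; the column types are guesses, since the question does not give them.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test1 (p_id INTEGER, name TEXT);
    CREATE TABLE test2 (o_id INTEGER, name TEXT, p_id INTEGER);
    INSERT INTO test1 VALUES (1, 'Paul'), (2, 'Marc');
    INSERT INTO test2 VALUES (1, 'London', 1), (2, 'Paris', 1);
""")

LEFT_JOIN = """
    SELECT a.p_id, a.name, b.o_id, b.p_id
    FROM test1 a
    LEFT JOIN test2 b ON a.p_id = b.p_id
"""

# The bare LEFT JOIN: Marc survives, with NULLs in the columns of b.
joined = conn.execute(LEFT_JOIN + " ORDER BY a.p_id, b.o_id").fetchall()

# Filtering b.p_id with NOT IN: 1 NOT IN (1) is false and NULL NOT IN (1) is NULL,
# so no row passes the WHERE clause.
not_in_on_b = conn.execute(
    LEFT_JOIN + " WHERE b.p_id NOT IN (SELECT DISTINCT p_id FROM test2)"
).fetchall()

# The anti-join: keep only the rows of a for which no match in b was found.
anti_join = conn.execute(LEFT_JOIN + " WHERE b.p_id IS NULL").fetchall()

print(joined)
print(not_in_on_b)
print(anti_join)
```

The join itself produces three rows; the NOT IN filter on b.p_id removes all of them, while IS NULL keeps exactly the unmatched one.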

There is nothing wrong with NOT IN itself. Because you are filtering on the right table in the WHERE clause, the LEFT JOIN is implicitly converted to an INNER JOIN.
Without the WHERE clause the result looks like this:
+------+------+--------+--------+--------+
| p_id | name | o_id | name | p_id |
+------+------+--------+--------+--------+
| 1 | Paul | 1 | London | 1 |
| 1 | Paul | 2 | Paris | 1 |
| 2 | Marc | (null) | (null) | (null) |
+------+------+--------+--------+--------+
Now apply the filter
WHERE b.p_id NOT IN(SELECT DISTINCT p_id FROM test2);
The sub-query returns 1, which is the value in the last column of the first two rows above, so those rows are excluded.
In case you are wondering why the last record, whose b.p_id is NULL, is not returned even though it is not 1: NULL cannot be compared using =, IN, NOT IN, etc. The IS operator (IS NULL, IS NOT NULL) has to be used to check for NULL.
The proper way to do this is NOT EXISTS, which handles NULL values as well:
select *
from test1 a
Where Not Exists (select 1 from test2 b Where a.p_id = b.p_id)
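The point about NULL handling shows up as soon as the sub-query itself can return NULL. The sketch below assumes SQLite (via Python's sqlite3) as a stand-in for MySQL and adds one hypothetical row to test2 with a NULL p_id; that row is not part of the question.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test1 (p_id INTEGER, name TEXT);
    CREATE TABLE test2 (o_id INTEGER, name TEXT, p_id INTEGER);
    INSERT INTO test1 VALUES (1, 'Paul'), (2, 'Marc');
    INSERT INTO test2 VALUES (1, 'London', 1), (2, 'Paris', 1);
    -- Hypothetical extra row (not in the question) with a NULL key.
    INSERT INTO test2 VALUES (3, 'Berlin', NULL);
""")

# 2 NOT IN (1, 1, NULL) evaluates to NULL, so even Marc is filtered out.
not_in = conn.execute("""
    SELECT a.p_id, a.name
    FROM test1 a
    WHERE a.p_id NOT IN (SELECT p_id FROM test2)
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL row is harmless.
not_exists = conn.execute("""
    SELECT a.p_id, a.name
    FROM test1 a
    WHERE NOT EXISTS (SELECT 1 FROM test2 b WHERE a.p_id = b.p_id)
""").fetchall()

print(not_in)
print(not_exists)
```

With a NULL in the sub-query result, NOT IN returns nothing at all, whereas NOT EXISTS still returns Marc.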

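Logically, the JOIN is evaluated first and the WHERE clause then filters its result (the optimizer may reorder the work, but the result has to be the same). Also, when you use LEFT JOIN, every row of the LEFT table is included, so you shouldn't expect just one row with p_id=2 after the JOIN, as you stated; the join returns three rows.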

Related

Understanding self-join with different ON clause [duplicate]

Here is my table structure:
// mytable
+----+---------+----------+
| id | related | subject |
+----+---------+----------+
| 1 | NULL | subject1 |
| 2 | 1 | |
+----+---------+----------+
And there are two queries which seem identical to me, but have different results in tests:
SELECT a.id, IFNULL(b.subject, a.subject)
FROM mytable a
LEFT JOIN mytable b ON a.id = b.related
+----+----------+
| 1 | subject1 |
| 2 | |
+----+----------+
SELECT a.id, IFNULL(b.subject, a.subject)
FROM mytable a
LEFT JOIN mytable b ON b.id = a.related
+----+----------+
| 1 | subject1 |
| 2 | subject1 |
+----+----------+
Look, it is a self-join. So why are the results of ON a.id = b.related and ON b.id = a.related different?
Running your queries with SELECT * to uncover some of the mystery:
Your first query:
SELECT *
FROM mytable a
LEFT JOIN mytable b ON a.id = b.related;
Produces the following:
+----+---------+----------+--------+----------+----------+
| id | related | subject | id1 | related1 | subject1 |
+----+---------+----------+--------+----------+----------+
| 2 | 1 | <null> | <null> | <null> | <null> |
| 1 | <null> | subject1 | 2 | 1 | <null> |
+----+---------+----------+--------+----------+----------+
Your second query:
SELECT *
FROM mytable a
LEFT JOIN mytable b ON b.id = a.related;
Produces this:
+----+---------+----------+--------+----------+----------+
| id | related | subject | id1 | related1 | subject1 |
+----+---------+----------+--------+----------+----------+
| 2 | 1 | <null> | 1 | <null> | subject1 |
| 1 | <null> | subject1 | <null> | <null> | <null> |
+----+---------+----------+--------+----------+----------+
Your first query joins a.id to b.related. For a.id 2 there is no row with related 2, and since id 2 has no subject of its own, you get no subject out of your IFNULL().
Your second query is joining related 1 to id 1 for a.id 2. This pulls a subject from b.id 1 and you get a subject back for id 2 as a result.
You really have to mentally map out how a LEFT JOIN works here and how it is affected by your ON clause. You have two very different queries here as a result.
Both queries are getting all rows from a.
Both queries are doing an outer join to b.
What's different is the condition that is used for finding a "match" from b.
(The queries might seem to be identical, but the truth is that they are significantly different.)
As a demonstration, run a query like this:
SELECT a.id AS `a_id`
, a.related AS `a_related`
, a.subject AS `a_subject`
, b.id AS `b_id`
, b.related AS `b_related`
, b.subject AS `b_subject`
FROM mytable a
LEFT
JOIN mytable b
ON b.related = a.id
And then change the ON clause
ON b.id = a.related
You might also want to repeat both of those queries removing the LEFT keyword (to make it an inner join instead of an outer join.)
One way to look at an outer join... when a matching row from b is not found, a dummy row from b is invented. That dummy row consists entirely of NULL values, and the dummy row is joined to a, as if it were a matching row. (This isn't necessarily what the database engine actually does, but thinking about it this way gives us an insight to the results that the outer join returns.)
Take a close look at the results of the queries, and you will be able to see why the two queries return different results.
The fact that a and b refer to the same table is a special case. We would see the same results if those were two different tables. It really doesn't matter... to the query, those are two different sources which just happen to refer to the same table. Don't let the fact that a and b refer to the same table cause any confusion.
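The two ON clauses can be compared side by side with a short sketch. It assumes SQLite (via Python's sqlite3) in place of MySQL, and it assumes the empty subject of row 2 is NULL, as the SELECT * output suggests. The helper name run is invented for the illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER, related INTEGER, subject TEXT);
    -- Assumption: the blank subject of row 2 is NULL.
    INSERT INTO mytable VALUES (1, NULL, 'subject1'), (2, 1, NULL);
""")


def run(on_clause):
    """Run the self-join with the given ON condition, returning (a.id, b.id, subject)."""
    sql = f"""
        SELECT a.id, b.id, IFNULL(b.subject, a.subject)
        FROM mytable a
        LEFT JOIN mytable b ON {on_clause}
        ORDER BY a.id
    """
    return conn.execute(sql).fetchall()


# For each a, find the b that points AT a (b is the child of a).
first = run("a.id = b.related")
# For each a, find the b that a points TO (b is the parent of a).
second = run("b.id = a.related")

print(first)
print(second)
```

The middle column shows which row of b was matched: the first condition pairs a with its child, the second pairs a with its parent, which is why only the second one can pull subject1 into row 2.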

Mysql: String comparison using = operator

I'm new to MySQL and I'm learning join queries now. When I compare strings I get the weird output shown below. I have two tables. Table classroom:
MariaDB [test]> select * from classroom;
+---------+-----------+
| subject | classroom |
+---------+-----------+
| maths | 1 |
| englishs| 2 |
+---------+-----------+
Table student:
MariaDB [test]> select * from student;
+------+------+---------+
| id | name | subject |
+------+------+---------+
| 1 | abc | maths |
| 2 | abcd | english |
+------+------+---------+
I have tried this query
select b.classroom,a.name,b.subject from student a left join classroom b
on a.subject = b.subject ;
and the output is like,
+-----------+------+---------+
| classroom | name | subject |
+-----------+------+---------+
| 1 | abc | maths |
| NULL | abcd | NULL |
+-----------+------+---------+
I don't understand why I am getting the second row if the strings don't match in both tables.
This has nothing to do with string comparison.
You are using an outer join, but the result you are expecting is the one that an inner join gives.
Take a look at this post for a good explanation about inner and outer joins.
From that post:
An inner join of A and B gives the result of A intersect B, i.e. the inner part of a Venn diagram intersection.
An outer join of A and B gives the results of A union B, i.e. the outer parts of a Venn diagram union.
Try this; it may work. A comma-separated FROM list with the condition in WHERE is an inner join, so the unmatched row is dropped:
select b.classroom,a.name,b.subject from student a,classroom b where a.subject = b.subject
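For comparison, here is a sketch that runs the outer join, an explicit INNER JOIN, and the comma-style join on the data from the question. It assumes SQLite (via Python's sqlite3) in place of MariaDB; the column types are guesses.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MariaDB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE classroom (subject TEXT, classroom INTEGER);
    CREATE TABLE student (id INTEGER, name TEXT, subject TEXT);
    INSERT INTO classroom VALUES ('maths', 1), ('englishs', 2);
    INSERT INTO student VALUES (1, 'abc', 'maths'), (2, 'abcd', 'english');
""")

# LEFT JOIN keeps every student; unmatched ones get NULLs for the classroom columns.
left_rows = conn.execute("""
    SELECT b.classroom, a.name, b.subject
    FROM student a
    LEFT JOIN classroom b ON a.subject = b.subject
    ORDER BY a.id
""").fetchall()

# INNER JOIN keeps only students whose subject matches a classroom row exactly.
inner_rows = conn.execute("""
    SELECT b.classroom, a.name, b.subject
    FROM student a
    INNER JOIN classroom b ON a.subject = b.subject
    ORDER BY a.id
""").fetchall()

# The comma-style join with a WHERE condition gives the same rows as INNER JOIN.
implicit_rows = conn.execute("""
    SELECT b.classroom, a.name, b.subject
    FROM student a, classroom b
    WHERE a.subject = b.subject
    ORDER BY a.id
""").fetchall()

print(left_rows)
print(inner_rows)
print(implicit_rows)
```

'english' and 'englishs' never compare equal, so the string comparison is doing its job; the extra row comes purely from the LEFT JOIN.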

Select value from left table and full join right table

This is a bit difficult to explain, but I'll give my best:
Let's say, I have table A:
event | task | ref_person
------+------+-----------
1 | 20 | 1
2 | 9 | 2
And I have table B (containing person):
id | name
---+-----
1 | foo
2 | bar
3 | jim
What does a MySQL-query look like, that produces this sort of table:
event | task | person
------+------+-------
1 | 20 | foo
1 | NULL | bar
1 | NULL | jim
2 | NULL | foo
2 | 9 | bar
2 | NULL | jim
My current approach is by using a RIGHT JOIN, but this won't get me the event combined with the NULL-value.
This is what my current statement looks like:
SELECT
a.*,
b.name
FROM
a
RIGHT JOIN b
ON b.id = a.ref_person
ORDER BY
a.event,
b.name
Notice
sqlfiddle seems down, I'll add one as soon as it's up again
You want to do a cross join to get all the rows, then case logic to get the task:
select a.event,
(case when a.ref_person = b.id then a.task end) as task,
b.name
from tablea a cross join
tableb b ;
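A quick way to check this against the expected output is the sketch below. It assumes SQLite (via Python's sqlite3) in place of MySQL and adds an ORDER BY on a.event and b.id (my addition) so the rows come back in the order shown in the question.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (event INTEGER, task INTEGER, ref_person INTEGER);
    CREATE TABLE tableb (id INTEGER, name TEXT);
    INSERT INTO tablea VALUES (1, 20, 1), (2, 9, 2);
    INSERT INTO tableb VALUES (1, 'foo'), (2, 'bar'), (3, 'jim');
""")

# CROSS JOIN pairs every event with every person; CASE shows the task only
# on the row where the person is the one the event refers to (NULL otherwise).
rows = conn.execute("""
    SELECT a.event,
           CASE WHEN a.ref_person = b.id THEN a.task END AS task,
           b.name
    FROM tablea a
    CROSS JOIN tableb b
    ORDER BY a.event, b.id
""").fetchall()

for row in rows:
    print(row)
```

Two events times three people gives the six rows asked for, with the task filled in only for the referenced person.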

Select rows from one table, join most recent row from other table with one-to-many relationship

What I would like to do is select a specific set of rows from one table (table A) and join with another table (table B), such that only one record will appear from table A, joined with the most recent record from table B, based on a datetime column.
For example, table A has this structure (heavily simplified):
id | col_1 | col_2
---+-----------+----------------
1 | something | something else
2 | val_1 | val_2
3 | stuff | ting
4 | goats | sheep
And table B looks like this:
id | fk_A | datetime_col | col_3
---+-----------+---------------------+--------
1 | 1 | 2012-02-01 15:42:14 | Note 1
2 | 1 | 2012-02-02 09:46:54 | Note 2
3 | 1 | 2011-11-14 11:18:32 | Note 3
4 | 2 | 2009-04-30 16:49:01 | Note 4
5 | 4 | 2013-06-21 15:42:14 | Note 5
6 | 4 | 2011-02-01 18:44:24 | Note 6
What I would like is a result set that looks like this:
id | col_1 | col_2 | datetime_col | col_3
---+-----------+----------------+---------------------+--------
1 | something | something else | 2012-02-02 09:46:54 | Note 2
2 | val_1 | val_2 | 2009-04-30 16:49:01 | Note 4
3 | stuff | ting | NULL | NULL
4 | goats | sheep | 2013-06-21 15:42:14 | Note 5
So you can see that table B has been joined with table A on B.fk_A = A.id, but only the most recent corresponding record from B has been included in the results.
I have tried various combinations of SELECT DISTINCT, LEFT JOIN and sub-queries and I just can't get it to work, I either get no results or something like this:
id | col_1 | col_2 | datetime_col | col_3
---+-----------+----------------+---------------------+--------
1 | something | something else | 2012-02-01 15:42:14 | Note 1
1 | something | something else | 2012-02-02 09:46:54 | Note 2
1 | something | something else | 2011-11-14 11:18:32 | Note 3
2 | val_1 | val_2 | 2009-04-30 16:49:01 | Note 4
3 | stuff | ting | NULL | NULL
4 | goats | sheep | 2013-06-21 15:42:14 | Note 5
4 | goats | sheep | 2011-02-01 18:44:24 | Note 6
...with the records from table A repeated.
Obviously my SQL-fu is just not good enough for this task, so I would be most grateful if one of you kind people could point me in the right direction. I have done quite a bit of Googling and searching around SO and I have not found anything that matches this specific task, although I am sure the question has been asked before - I suspect there is an SQL keyword that I am forgetting/unaware of and if I searched for that I would find the answer instantly.
I think this question deals with the same problem although I am not 100% sure and the accepted answer involves SELECT TOP, which I thought (?) was not valid in MySQL.
As my actual query is much more complicated and joins several tables, I shall show it in case it makes any difference to how this is done:
SELECT `l` . * , `u`.`name` AS 'owner_name', `s`.`name` AS 'acquired_by_name', `d`.`type` AS `dtype` , `p`.`type` AS `ptype`
FROM `leads` l
LEFT JOIN `web_users` u ON `u`.`id` = `l`.`owner`
LEFT JOIN `web_users` s ON `s`.`id` = `l`.`acquired_by`
LEFT JOIN `deal_types` d ON `d`.`id` = `l`.`deal_type`
LEFT JOIN `property_types` p ON `p`.`id` = `l`.`property_type`
This query works and returns the data I want (sometimes I also add a WHERE clause but this works fine), but I would now like to:
LEFT JOIN `notes` n ON `n`.`lead_id` = `l`.`id`
...where notes contains the "many records" and leads contains the "one record" they relate to.
It should also be noted that potentially I would also want to return the oldest record (in a different query) but I imagine this will be a simple case of inverting an ASC/DESC somewhere, or something similarly easy.
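I think this will help you. Note that the join has to match tableB's foreign key fk_A, not its own id:
SELECT A.id, A.col_1, A.col_2, A.datetime_col, A.col_3
FROM
(SELECT B.id, B.col_1, B.col_2, C.datetime_col, C.col_3
FROM tableA B LEFT OUTER JOIN tableB C ON B.id = C.fk_A
ORDER BY C.datetime_col desc) as A
GROUP BY A.id
Be aware that this relies on MySQL picking the first row of each group, which is not guaranteed, and the query is rejected when ONLY_FULL_GROUP_BY is enabled.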
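A different technique that does not depend on which row GROUP BY happens to pick is to compute the latest datetime per fk_A in a derived table and join back on it. This is only a sketch, not the answer's method: it assumes SQLite (via Python's sqlite3) in place of MySQL, stores the datetimes as ISO-formatted text, and uses the made-up aliases m and latest.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL;
# datetimes are ISO-formatted text, so MAX() orders them chronologically.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, col_1 TEXT, col_2 TEXT);
    CREATE TABLE tableB (id INTEGER, fk_A INTEGER, datetime_col TEXT, col_3 TEXT);
    INSERT INTO tableA VALUES
        (1, 'something', 'something else'), (2, 'val_1', 'val_2'),
        (3, 'stuff', 'ting'), (4, 'goats', 'sheep');
    INSERT INTO tableB VALUES
        (1, 1, '2012-02-01 15:42:14', 'Note 1'),
        (2, 1, '2012-02-02 09:46:54', 'Note 2'),
        (3, 1, '2011-11-14 11:18:32', 'Note 3'),
        (4, 2, '2009-04-30 16:49:01', 'Note 4'),
        (5, 4, '2013-06-21 15:42:14', 'Note 5'),
        (6, 4, '2011-02-01 18:44:24', 'Note 6');
""")

# m holds one row per fk_A with its most recent datetime; joining tableB on
# both fk_A and that datetime keeps only the latest note. Rows of tableA
# without any note keep NULLs thanks to the LEFT JOINs.
rows = conn.execute("""
    SELECT a.id, a.col_1, a.col_2, b.datetime_col, b.col_3
    FROM tableA a
    LEFT JOIN (SELECT fk_A, MAX(datetime_col) AS latest
               FROM tableB
               GROUP BY fk_A) m
           ON m.fk_A = a.id
    LEFT JOIN tableB b
           ON b.fk_A = m.fk_A AND b.datetime_col = m.latest
    ORDER BY a.id
""").fetchall()

for row in rows:
    print(row)
```

Swapping MAX for MIN should give the oldest record instead. One caveat: if two notes for the same row share the exact same latest datetime, both are returned.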

Join to same table several times in a single query?

I'm trying to get deeper into "advanced"-SQL'ing but have a slight problem with some pretty basic stuff.
I have one table, where one row refers to another row. There are of course unique id's as well but I'll skip those here:
+-------+------+-------+
| field | name | value |
+-------+------+-------+
| 1     | aa   | 0     |
| 1     | ab   | 0     |
| 2     | ba   | 1     |
| 2     | bb   | 1     |
| 3     | ca   | 2     |
| 3     | cb   | 2     |
+-------+------+-------+
What I want to accomplish is to get field when I know field=3 and name= 'ca'.
I've tried something like this:
SELECT table.value AS parent_id FROM table WHERE table.field=3 AND table.name='ca'
That works at some point, it lists everything at 2:field, I need then to find value 1 from the field. BUT if the 2:field does not have any references (as above illustrated, 1:field) then I need the last value which will be 2:field.
How would that be possible in MySQL?
What you need is a self-join: use the same table twice in the same query, under different aliases...
select
t1.field,
t1.name,
t1.value as ThisIsYourParentKey,
t2.name as ParentName,
t2.value as GrandParentKey
from
YourTable t1
left join YourTable t2
on t1.value = t2.field
where
t1.name = 'ca'
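Here is a sketch of that self-join run against the sample rows from the question, looking up the name 'ca'. It assumes SQLite (via Python's sqlite3) in place of MySQL. The extra column resolved, built with COALESCE, is my guess at the requested fallback (use the row's own value when no parent row exists), not something the answer states; the helper lookup is also invented.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (field INTEGER, name TEXT, value INTEGER);
    INSERT INTO YourTable VALUES
        (1, 'aa', 0), (1, 'ab', 0), (2, 'ba', 1),
        (2, 'bb', 1), (3, 'ca', 2), (3, 'cb', 2);
""")


def lookup(name):
    """Self-join: t1 is the row looked up, t2 holds the rows its value refers to."""
    return conn.execute("""
        SELECT t1.field, t1.name, t1.value AS parent_key,
               t2.name AS parent_name, t2.value AS grandparent_key,
               -- Hypothetical fallback: keep t1.value when no parent row exists.
               COALESCE(t2.value, t1.value) AS resolved
        FROM YourTable t1
        LEFT JOIN YourTable t2 ON t1.value = t2.field
        WHERE t1.name = ?
        ORDER BY t2.name
    """, (name,)).fetchall()


ca_rows = lookup("ca")  # value 2 matches both rows with field 2
aa_rows = lookup("aa")  # value 0 matches no field, so t2 is all NULL

print(ca_rows)
print(aa_rows)
```

Because field is not unique in the sample data, 'ca' joins to two parent rows; add DISTINCT or a more specific join condition if a single row is needed.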