How does NOT IN subquery work with NULL values? - mysql

I am confused about how the following works in MySQL. In the queries below, the first SELECT returns all rows from table2 while the second SELECT returns none of them. Is there an explanation of how NULL works with the NOT IN operator? Is there any documentation that explains this?
CREATE TABLE table1 (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
PRIMARY KEY (id)
);
CREATE TABLE table2 (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
table1_id INT UNSIGNED,
PRIMARY KEY (id)
);
INSERT INTO table2 (id, table1_id) VALUES (1, NULL);
SELECT COUNT(*) FROM table2 WHERE table1_id NOT IN (SELECT id FROM table1);
+----------+
| COUNT(*) |
+----------+
| 1 |
+----------+
INSERT INTO table1 (id) VALUES (1);
SELECT COUNT(*) FROM table2 WHERE table1_id NOT IN (SELECT id FROM table1);
+----------+
| COUNT(*) |
+----------+
| 0 |
+----------+

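The reason is SQL's three-valued logic. According to the SQL specification, Foo IN (A, B, C) translates to (Foo = A OR Foo = B OR Foo = C), and Foo NOT IN (A, B, C) translates to (Foo <> A AND Foo <> B AND Foo <> C). Any comparison with NULL is UNKNOWN, and a WHERE clause treats UNKNOWN like FALSE. In your first query the subquery is empty, so there is nothing to compare against and NOT IN is TRUE even though table1_id is NULL. In your second query the condition becomes NULL <> 1, which is UNKNOWN, so the row is filtered out. The same thing happens when the list itself contains a NULL: Foo NOT IN (NULL, 1, 2) can never be TRUE.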

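Adding IFNULL(id, '') to the subquery will not help here, because the NULL is in table2.table1_id (the left-hand side of NOT IN), not in the subquery. To keep the rows whose table1_id is NULL, test for NULL explicitly, for example:
SELECT COUNT(*) FROM table2 WHERE table1_id IS NULL OR table1_id NOT IN (SELECT id FROM table1);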
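The behaviour in the question can be reproduced without a MySQL server. The sketch below is an assumption-laden stand-in: it uses Python's built-in sqlite3 module instead of MySQL (NOT IN follows the same three-valued logic in both), and it adds a NOT EXISTS variant as one possible way to keep the NULL row. The helper name count is made up for the illustration.

```python
import sqlite3

# SQLite stands in for MySQL here (an assumption, chosen so the example
# runs anywhere Python does).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, table1_id INTEGER);
    INSERT INTO table2 (id, table1_id) VALUES (1, NULL);
""")

NOT_IN = (
    "SELECT COUNT(*) FROM table2 "
    "WHERE table1_id NOT IN (SELECT id FROM table1)"
)
NOT_EXISTS = (
    "SELECT COUNT(*) FROM table2 t2 "
    "WHERE NOT EXISTS (SELECT 1 FROM table1 t1 WHERE t1.id = t2.table1_id)"
)


def count(sql):
    """Run a COUNT(*) query and return the single number (hypothetical helper)."""
    return conn.execute(sql).fetchone()[0]


# Empty subquery: NOT IN over an empty set is TRUE, even for a NULL operand.
empty_not_in = count(NOT_IN)

conn.execute("INSERT INTO table1 (id) VALUES (1)")

# Non-empty subquery: NULL <> 1 is UNKNOWN, so the row is filtered out.
nonempty_not_in = count(NOT_IN)

# NOT EXISTS only asks whether a matching row exists, so the NULL row stays.
nonempty_not_exists = count(NOT_EXISTS)

print(empty_not_in, nonempty_not_in, nonempty_not_exists)
```

NOT EXISTS is often preferred over NOT IN when either side can be NULL, because its result is always TRUE or FALSE and never UNKNOWN.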

Related

Preferential Select Query

The issue that we are trying to tackle is best shown with the following illustrative example:
CREATE TABLE table_1
(
id INT UNSIGNED AUTO_INCREMENT,
colA INT,
colB VARCHAR(10),
PRIMARY KEY(id)
);
CREATE TABLE table_2
(
id INT UNSIGNED AUTO_INCREMENT,
colY INT,
colZ VARCHAR(10),
PRIMARY KEY(id)
);
INSERT INTO table_1(colA, colB) VALUES(1, 'NPD5A6V9EI'), (2, 'ISO4IK42YQ'), (4, 'J12QAN4O42'), (6,'V8YTZFHCU4');
INSERT INTO table_2(colY, colZ) VALUES(3, 'RBUNWLO753'), (4, 'X2BCEY7O8B'), (5, 'BNUS7R4225'), (6, '72NOWCTH5G');
We would like to select our result based on the value of colA in table_1, but if that does not return a result, we would like to return our result based on the value of colY in table_2. In other words, SELECTing from table_2 is the backup for SELECTing from table_1. The query returns NULL only if neither table satisfies the condition.
A pseudo SQL query could be:
SELECT colB FROM table_1 where colA = 3 OR SELECT colZ FROM table_2 where colY = 3;
The query should return output based on the following I/O table:
I O
= =
1 NPD5A6V9EI -- From table_1
2 ISO4IK42YQ -- From table_1
3 RBUNWLO753 -- From table_2
4 J12QAN4O42 -- From table_1 (has precedence over table_2 entry)
5 BNUS7R4225 -- From table_2
6 V8YTZFHCU4 -- From table_1 (has precedence over table_2 entry)
9 NULL
Kindly suggest solutions that:
make use of the latest DB features (for posterity)
work with MySQL version 5.6.51 (for our application)
Write a subquery that generates all the I rows that you want.
Then left join this with the two tables, and use IFNULL to take the matching value from table_1 in preference to table_2.
SELECT ids.id AS I, IFNULL(t1.colB, t2.colZ) AS O
FROM (SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3 ... UNION ALL SELECT 9) AS ids
LEFT JOIN table_1 AS t1 ON t1.colA = ids.id
LEFT JOIN table_2 AS t2 ON t2.colY = ids.id
ORDER BY ids.id
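For a single input value, the same preference can be sketched with COALESCE over two scalar subqueries, which needs no window functions. This is only an illustration: it runs on SQLite through Python's sqlite3 module rather than on MySQL, the lookup helper and the :i parameter are hypothetical names, and it assumes colB is never NULL in a matching table_1 row (otherwise the table_2 value would be returned).

```python
import sqlite3

# SQLite stands in for MySQL (an assumption); the sample rows are the ones
# from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER PRIMARY KEY, colA INT, colB VARCHAR(10));
    CREATE TABLE table_2 (id INTEGER PRIMARY KEY, colY INT, colZ VARCHAR(10));
    INSERT INTO table_1 (colA, colB) VALUES
        (1, 'NPD5A6V9EI'), (2, 'ISO4IK42YQ'), (4, 'J12QAN4O42'), (6, 'V8YTZFHCU4');
    INSERT INTO table_2 (colY, colZ) VALUES
        (3, 'RBUNWLO753'), (4, 'X2BCEY7O8B'), (5, 'BNUS7R4225'), (6, '72NOWCTH5G');
""")

# COALESCE returns its first non-NULL argument, so table_1 wins when it
# has a match and table_2 is only consulted otherwise.
LOOKUP = """
    SELECT COALESCE(
        (SELECT colB FROM table_1 WHERE colA = :i LIMIT 1),
        (SELECT colZ FROM table_2 WHERE colY = :i LIMIT 1)
    )
"""


def lookup(i):
    """Return the preferred value for input i, or None (hypothetical helper)."""
    return conn.execute(LOOKUP, {"i": i}).fetchone()[0]


results = {i: lookup(i) for i in (1, 2, 3, 4, 5, 6, 9)}
for i, o in results.items():
    print(i, o)
```

A scalar subquery that matches no row evaluates to NULL, which is what lets COALESCE fall through to the second table and finally to NULL for input 9.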
I simply don't know where your last row comes from.
With MySQL 8 you can also use the window function ROW_NUMBER.
The rest is self-explanatory: the sorting comes from colA and colY; when both tables have the same number, the second column orderby2 sorts the row from the first table first.
SELECT @i := @i + 1 AS I,
colB AS O
FROM
(SELECT colA AS orderby1, colB, 1 AS orderby2 FROM table_1
UNION
SELECT colY, colZ, 2 FROM table_2) t1, (SELECT @i := 0) t2
ORDER BY orderby1, orderby2
I | O
-: | :---------
1 | NPD5A6V9EI
2 | ISO4IK42YQ
3 | RBUNWLO753
4 | J12QAN4O42
5 | X2BCEY7O8B
6 | BNUS7R4225
7 | V8YTZFHCU4
8 | 72NOWCTH5G

Select IDs from table where col_2 is not null (duplicate id)

I have a table that contains the following data
ID | Col_2
A | 'ABC'
A | 'GHI'
A | null
B | 'null'
B | 'HJH'
B | 'NBN'
C | null
I have two cases to cater for:
Duplicate IDs:
In case of duplicate IDs, I only want the rows for those IDs that do not have null in col_2.
E.g.
Query should return :
A | 'ABC'
A | 'GHI'
B | 'HJH'
B | 'NBN'
Non Duplicate Id:
In case of a non-duplicate ID, the query should return the row irrespective of the value present in col_2.
So the final result of the query should be
ID | Col_2
A | 'ABC'
A | 'GHI'
B | 'HJH'
B | 'NBN'
C | null
I have managed to create the following query, which handles the duplicate ID case but not the non-duplicate case.
Query :
select id,col_2
from mytable
group by id,col_2
having (sum(case when col_2 is not null then 1 else 0 end) > 0)
What changes should be made to the query to cater for the non-duplicate case as well?
Thanks in advance!!!
Assuming NULL is NULL and not a string and that you have only one NULL value per id, you can do something like this:
select t.*
from t
where t.col_2 is not null or
not exists (select 1 from t t2 where t2.id = t.id and t2.col_2 is not null);
If your null values can be duplicated and you want only one row for them, then tweak this to:
select t.*
from t
where t.col_2 is not null
union all
select distinct t.*
from t
where not exists (select 1 from t t2 where t2.id = t.id and t2.col_2 is not null);
For performance, you want an index on (id, col_2).
If you just want the col_2 values for each id, you can concatenate them on each row:
select id, group_concat(col_2)
from t
group by id;
Another alternative uses window functions:
select t.id, col_2
from (select t.*,
rank() over (partition by id order by col_2 is not null desc) as seqnum
from t
) t
where seqnum = 1;
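To check the NOT EXISTS approach against the sample data from the question, here is a small sketch. It runs on SQLite through Python's sqlite3 module rather than on MySQL, and it assumes (as the answer does) that every null in the sample, including the quoted one for B, is a real NULL and not a string.

```python
import sqlite3

# SQLite stands in for MySQL (an assumption). The quoted 'null' for B in
# the question is loaded as a real NULL here, matching the answer's premise.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id TEXT, col_2 TEXT);
    INSERT INTO t VALUES
        ('A', 'ABC'), ('A', 'GHI'), ('A', NULL),
        ('B', NULL),  ('B', 'HJH'), ('B', 'NBN'),
        ('C', NULL);
""")

# Keep every non-NULL row, plus NULL rows whose id has no non-NULL row at all.
rows = conn.execute("""
    SELECT t.id, t.col_2
    FROM t
    WHERE t.col_2 IS NOT NULL
       OR NOT EXISTS (SELECT 1 FROM t t2
                      WHERE t2.id = t.id AND t2.col_2 IS NOT NULL)
    ORDER BY t.id, t.col_2
""").fetchall()

for row in rows:
    print(row)
```

The ORDER BY is added only to make the output deterministic; it is not part of the filtering logic.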

How to optimize SQL that finds the max record of each group when the table is large?

I have a table which contains nearly 1 million+ records. I want to find the max record of each group.
Here is my sql:
SELECT *
FROM t
WHERE id IN (SELECT max(id) AS id
FROM t
WHERE a = 'some' AND b = 0
GROUP BY c, d);
The table is declared as follows.
CREATE TABLE `t` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'id',
`a` varchar(32) NOT NULL COMMENT 'a',
`b` tinyint(3) unsigned NOT NULL COMMENT 'b',
`c` bigint(20) unsigned NOT NULL COMMENT 'c',
`d` varchar(32) NOT NULL COMMENT 'd',
PRIMARY KEY (`id`),
KEY `idx_c_d` (`c`,`d`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='test table';
I have a composite index on c and d, so the inner statement (SELECT max(id) AS id FROM t WHERE a = 'some' AND b = 0 GROUP BY c, d) executes in 200 ms. But the whole statement takes nearly 6 seconds (the result contains 5000 rows).
Here is what EXPLAIN shows (some columns are omitted).
+-------------+-------+-------+---------------+--------+---------+----------+--------------------------+
| select_type | table | type | possible_keys | key | rows | filtered | Extra |
+-------------+-------+-------+---------------+--------+---------+----------+--------------------------+
| PRIMARY | t | ALL | NULL | NULL | 9926024 | 100.00 | Using where |
| SUBQUERY | t | index | idx_1 | idex_1 | 9926024 | 1.00 | Using where; Using index |
+-------------+-------+-------+---------------+--------+---------+----------+--------------------------+
All different ways to "skin-a-cat", but here's slightly different... Since you are looking for IN, I would move that query into the front position. Also, it MAY help using MySQL's language specific keyword "STRAIGHT_JOIN" telling MySQL to do in the order you have listed. Again it MAY help
SELECT
T.*
FROM
(SELECT max(id) AS id
FROM t
WHERE b = 0
AND a = 'some'
GROUP BY c, d) PQ
JOIN T
on PQ.ID = T.ID
I would also have index specifically in order of
(b, a, c, d, id )
Obviously keep the primary ID key, and if using STRAIGHT_JOIN, would be
SELECT STRAIGHT_JOIN
T.* ( ... rest of query)
You can try using a correlated subquery and creating an index on columns c and d:
SELECT t1.* FROM table_name t1
WHERE id = (SELECT max(id) AS id FROM table_name t2 where
t1.c=t2.c and t1.d=t2.d
) and t1.a = 'some' AND t1.b = 0
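Avoiding the need for a subquery (the IS NULL test has to go in the WHERE clause, otherwise the anti-join never filters anything):
SELECT t1.*
FROM t t1
LEFT OUTER JOIN t t2
ON t1.c = t2.c
AND t1.d = t2.d
AND t1.id < t2.id
AND t2.a = 'some'
AND t2.b = 0
WHERE t1.a = 'some'
AND t1.b = 0
AND t2.id IS NULL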
I recommend using a correlated subquery:
SELECT t.*
FROM t
WHERE t.id IN (SELECT MAX(t2.id)
FROM t t2
WHERE t2.c = t.c AND t2.d = t.d AND
t2.a = 'some' AND t2.b = 0
);
This assumes that id is unique in the table.
For performance, you want an index on (c, d, a, b, id).
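The derived-table join suggested earlier in this thread can be sanity-checked on a handful of rows. This sketch uses SQLite through Python's sqlite3 module as a stand-in for MySQL, with made-up sample rows and a made-up index name, so it only confirms that the query returns the right rows. It says nothing about timing, since the two optimizers differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (
        id INTEGER PRIMARY KEY,
        a  TEXT NOT NULL,
        b  INTEGER NOT NULL,
        c  INTEGER NOT NULL,
        d  TEXT NOT NULL
    );
    -- Hypothetical sample rows, not taken from the question.
    INSERT INTO t (id, a, b, c, d) VALUES
        (1, 'some',  0, 10, 'x'),
        (2, 'some',  0, 10, 'x'),
        (3, 'other', 0, 10, 'x'),
        (4, 'some',  0, 20, 'y'),
        (5, 'some',  1, 20, 'y');
    -- Filter columns first, then the grouping columns (index name is made up).
    CREATE INDEX idx_b_a_c_d ON t (b, a, c, d);
""")

# Step 1 (derived table pq): one MAX(id) per (c, d) among the filtered rows.
# Step 2: join back on the primary key to fetch the full rows.
rows = conn.execute("""
    SELECT t.*
    FROM (SELECT MAX(id) AS id
          FROM t
          WHERE b = 0 AND a = 'some'
          GROUP BY c, d) AS pq
    JOIN t ON pq.id = t.id
    ORDER BY t.id
""").fetchall()

for row in rows:
    print(row)
```

Joining on the primary key means the outer table is probed once per group instead of being scanned in full, which is the point of moving the aggregate into a derived table.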

How can I join two tables but only return rows that don't match specific column which include NULL as well?

I have two tables which look like this:
T1: ID | oldID
T2: ID | newID
I basically need to join these tables when their IDs match. However, I only want to return results of the form ID | oldID | newID where oldID does not equal newID.
Example data:
T1: 1 | 100
2 | NULL
3 | 200
4 | 500
T2: 1 | NULL
2 | 300
3 | 200
4 | 400
My expected result:
T3: 1 | 100 | NULL
2 | NULL | 300
4 | 500 | 400
Can anyone point me on the right track?
Try this answer, Hope this helps
CREATE TABLE #T1 (ID INT, oldID INT)
INSERT INTO #T1 VALUES(1,100)
INSERT INTO #T1 VALUES(2,NULL)
INSERT INTO #T1 VALUES(3,200)
INSERT INTO #T1 VALUES(4,500)
CREATE TABLE #T2 (ID INT, [NewId] INT)
INSERT INTO #T2 VALUES(1,NULL)
INSERT INTO #T2 VALUES(2,300)
INSERT INTO #T2 VALUES(3,200)
INSERT INTO #T2 VALUES(4,400)
select T1.ID,T1.oldID,T2.[NewId]
FROM #T1 T1, #T2 T2
WHERE T1.id=T2.id and ISNULL(T1.oldID,0) != ISNULL(T2.[NewId],0)
DROP TABLE #T1
DROP TABLE #T2
For SQL Server
SELECT T1.ID ,T1.oldID,T2.[newID]
FROM T1
INNER JOIN T2 ON T1.ID = T2.ID AND ISNULL(T1.oldID,0) <> ISNULL(T2.[newID],0)
For MySQL, use IFNULL instead of ISNULL.
If zero (0) is a valid value in either oldID or newID, use some other value that never occurs in the data (e.g. -1) as the ISNULL replacement.
select t1.ID,t1.oldID,t2.NewId
FROM T1 t1, T2 t2
WHERE t1.ID=t2.ID
AND COALESCE(t1.oldID,0) <> COALESCE(t2.NewId,0)
Explanation
select t1.ID,t1.oldID,t2.NewId
FROM T1 t1, T2 t2
WHERE t1.ID=t2.ID #IDs are equal in two tables
AND t1.oldID <> t2.NewId
Result is:
# ID, oldID, NewId
'4', '500', '400'
If you check like this, you will not get the result you expected, because the NULL values are not handled properly.
If you use COALESCE instead, it evaluates as follows:
COALESCE(value1,value2,value3,...)
The above syntax is equivalent to the following IF-THEN-ELSE statement
IF value1 is not NULL THEN
result = value1;
ELSIF value2 is not NULL THEN
result = value2;
ELSIF value3 is not NULL THEN
result = value3;
ELSE
result = NULL;
END IF;
Example:
SELECT COALESCE(NULL, 2, 3); -- returns 2
So in our query, COALESCE(t1.oldID, 0) will return 0 if t1.oldID is NULL.
select t1.ID,t1.oldID,t2.NewId
FROM T1 t1, T2 t2
WHERE t1.ID=t2.ID
AND COALESCE(t1.oldID,0) <> COALESCE(t2.NewId,0)
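One caveat with a sentinel such as 0 is that it cannot tell a real 0 apart from NULL. The sketch below illustrates this with an extra, hypothetical row (ID 5) added to the question's data. It runs on SQLite through Python's sqlite3 module, where the null-safe inequality is written IS NOT; in MySQL the corresponding null-safe test would be NOT (t1.oldID <=> t2.newID). The fetch helper is a made-up name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INT, oldID INT);
    CREATE TABLE T2 (ID INT, newID INT);
    -- Rows 1-4 are from the question; row 5 is a hypothetical edge case
    -- where a real 0 meets a NULL.
    INSERT INTO T1 VALUES (1, 100), (2, NULL), (3, 200), (4, 500), (5, 0);
    INSERT INTO T2 VALUES (1, NULL), (2, 300), (3, 200), (4, 400), (5, NULL);
""")


def fetch(condition):
    """Join T1 and T2 on ID and keep rows matching condition (hypothetical helper)."""
    sql = (
        "SELECT T1.ID, T1.oldID, T2.newID "
        "FROM T1 JOIN T2 ON T1.ID = T2.ID "
        "WHERE " + condition + " ORDER BY T1.ID"
    )
    return conn.execute(sql).fetchall()


# Sentinel comparison: NULL is replaced by 0, so row 5 looks unchanged.
sentinel = fetch("COALESCE(T1.oldID, 0) <> COALESCE(T2.newID, 0)")

# Null-safe comparison: NULL is treated as a value distinct from every number.
null_safe = fetch("T1.oldID IS NOT T2.newID")

print(sentinel)
print(null_safe)
```

On the question's own four rows both conditions give the expected result; they only diverge when the sentinel value actually occurs in the data.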

mysql - Unable to get all columns with select *

I have following 2 tables t1, t2
CREATE TABLE t1 (
id INT PRIMARY KEY
);
CREATE TABLE t2 (
id INT PRIMARY KEY
);
INSERT INTO t1 VALUES (1),(2),(3);
INSERT INTO t2 VALUES (2),(3),(4);
I am running
select * from t1 left join t2 using(id);
Result:
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
+----+
On running script:
select t1.id, t2.id from t1 left join t2 using(id);
Result:
+----+------+
| id | id |
+----+------+
| 1 | NULL |
| 2 | 2 |
| 3 | 3 |
+----+------+
select * is supposed to return all the columns, so why am I not getting 2 columns when I use select *?
Note: I am using MySQL.
As the documentation says:
Natural joins and joins with USING, including outer join variants, are processed according to the SQL:2003 standard:
Redundant columns of a NATURAL join do not appear. Consider this set of statements:
CREATE TABLE t1 (i INT, j INT);
CREATE TABLE t2 (k INT, j INT);
INSERT INTO t1 VALUES(1, 1);
INSERT INTO t2 VALUES(1, 1);
SELECT * FROM t1 JOIN t2 USING (j);
column j is named in the USING clause and should appear only once in the output, not twice.
This is the normal behavior of the USING clause. Here's a quote from MySQL documentation:
Similarly, in the second SELECT statement, column j is named in the USING clause and should appear only once in the output, not twice.
(JOIN syntax)
And here's from Wikipedia:
[...] any columns mentioned in the USING list will appear only once, with an unqualified name, rather than once for each table in the join.
(JOIN (SQL))
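The column merging can be observed directly by looking at the result's column list. This is a sketch on SQLite through Python's sqlite3 module, used here as a stand-in for MySQL; both merge the USING column under SELECT *. The columns helper is a made-up name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INT PRIMARY KEY);
    CREATE TABLE t2 (id INT PRIMARY KEY);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (2), (3), (4);
""")


def columns(sql):
    """Return the column names a query produces (hypothetical helper)."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description]


# USING (id): SELECT * shows the join column once.
using_cols = columns("SELECT * FROM t1 LEFT JOIN t2 USING (id)")

# ON t1.id = t2.id: SELECT * shows the column from both tables.
on_cols = columns("SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.id")

# Qualified names still reach both sides, even with USING.
explicit = conn.execute(
    "SELECT t1.id, t2.id FROM t1 LEFT JOIN t2 USING (id) ORDER BY t1.id"
).fetchall()

print(len(using_cols), len(on_cols))
print(explicit)
```

So if both id columns are needed under SELECT *, writing the join with ON instead of USING is one way to get them.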