How can I select a grouped row where it only contains null values in another column? - mysql

It's a bit confusing, so I'll try to exemplify in the table below:
example-table
id data1 data2 data3
1 NULL NULL NULL
1 NULL NULL NULL
2 1 NULL 1
2 1 NULL NULL
2 1 NULL 1
3 1 NULL NULL
3 1 NULL 1
4 NULL NULL NULL
So, grouping by ID, I only want to display the IDs whose data is NULL in every column of every row. In the example table above, that is only IDs 1 and 4.
The code I'm trying to use - just for reference:
select id
from example-table
group by id, data1, data2, data3
having data1 is null and data2 is null and data3 is null;
Any suggestions?

You can group by id only and set the conditions in the HAVING clause:
SELECT id
FROM tablename
GROUP BY id
HAVING COALESCE(MAX(data1), MAX(data2), MAX(data3)) IS NULL;
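To check the query against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite standing in for MySQL is an assumption made only so the example runs anywhere, and the ORDER BY is added just to make the output order predictable.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (id INTEGER, data1 INTEGER, data2 INTEGER, data3 INTEGER)"
)
rows = [
    (1, None, None, None), (1, None, None, None),
    (2, 1, None, 1), (2, 1, None, None), (2, 1, None, 1),
    (3, 1, None, None), (3, 1, None, 1),
    (4, None, None, None),
]
conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", rows)

# MAX() skips NULLs, so it is NULL only when the whole group is NULL;
# COALESCE is NULL only when all three MAX() results are NULL.
query = """
SELECT id
FROM tablename
GROUP BY id
HAVING COALESCE(MAX(data1), MAX(data2), MAX(data3)) IS NULL
ORDER BY id
"""
all_null_ids = [r[0] for r in conn.execute(query)]
print(all_null_ids)  # [1, 4]
```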

Related

How to group by one column with multi column query in hive

I have the following data:
ID Date1 Date2 Date3
1 2022/01/01 null null
1 null 2021/04/01 null
2 2022/03/01 null null
2 null 2021/06/01 null
3 2022/01/01 null null
4 null 2021/04/01 null
and I'm trying to get the following result:
ID Date1 Date2 Date3
1 2022/01/01 2021/04/01 null
2 2022/03/01 2021/06/01 null
3 2022/01/01 null null
4 null 2021/04/01 null
Grouping by ID alone isn't working: I get the error "expression not in group by key".
Any help is appreciated.
Aggregate by the ID column and then take the MAX() of the other three columns:
SELECT ID, MAX(Date1) AS Date1, MAX(Date2) AS Date2, MAX(Date3) AS Date3
FROM yourTable
GROUP BY ID;
Note that the MAX() aggregate function ignores NULL values by default. Therefore, the non-NULL values from each group of records sharing an ID will be retained in the above query, and a column stays NULL only when every record in the group is NULL there.
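As a quick check of that behaviour, the sketch below loads the question's rows into an in-memory SQLite database and runs the same query. SQLite in place of Hive and TEXT columns for the dates are assumptions made to keep the example self-contained; the ORDER BY is added only for a stable output order.

```python
import sqlite3

# Assumption: SQLite stands in for Hive, and dates are stored as TEXT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (ID INTEGER, Date1 TEXT, Date2 TEXT, Date3 TEXT)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?)", [
    (1, "2022/01/01", None, None),
    (1, None, "2021/04/01", None),
    (2, "2022/03/01", None, None),
    (2, None, "2021/06/01", None),
    (3, "2022/01/01", None, None),
    (4, None, "2021/04/01", None),
])

# MAX() ignores NULLs, so each group collapses to its non-NULL value per column.
merged = conn.execute("""
    SELECT ID, MAX(Date1) AS Date1, MAX(Date2) AS Date2, MAX(Date3) AS Date3
    FROM yourTable
    GROUP BY ID
    ORDER BY ID
""").fetchall()
for row in merged:
    print(row)
```

Date3 comes back as NULL (None in Python) for every ID because no row in any group has a value there.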

Delete rows with same columns data

I am having a temp table with the following data:
tbl_t
id name_id date t1 t2 s1 s2
1 25 10/05/20 same same NULL NULL
2 23 11/05/21 same same home NULL
3 25 12/05/20 same NULL NULL NULL
4 25 13/06/20 NULL NULL NULL NULL
Desire output:
tbl_t
id name_id date t1 t2 s1 s2
2 23 11/05/21 same same home NULL
3 25 12/05/20 same NULL NULL NULL
I want to delete all rows where t1=t2 and s1=s2.
I tried the following SQL, but I noticed that it is not working.
DELETE FROM tbl_t WHERE t1=t2 AND s1=s2
The problem is the NULL values. Use the NULL-safe comparison operator:
DELETE FROM tbl_t
WHERE t1 <=> t2 AND s1 <=> s2;
Almost any comparison with NULL results in NULL -- including NULL = NULL. And NULL values are treated as false in a WHERE clause (or equivalently WHERE clauses only keep rows where the condition evaluates unequivocally to true).
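The difference between the two comparisons can be seen on the question's data with a small sketch. It runs on SQLite (an assumption, chosen so the example is self-contained), which has no <=> operator; SQLite's IS operator is used in its place because it likewise treats two NULLs as equal.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; `IS` plays the role of MySQL's `<=>`.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_t (id INTEGER, name_id INTEGER, date TEXT,"
    " t1 TEXT, t2 TEXT, s1 TEXT, s2 TEXT)"
)
conn.executemany("INSERT INTO tbl_t VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, 25, "10/05/20", "same", "same", None, None),
    (2, 23, "11/05/21", "same", "same", "home", None),
    (3, 25, "12/05/20", "same", None, None, None),
    (4, 25, "13/06/20", None, None, None, None),
])

# Plain equality: every row has at least one NULL operand, so the condition
# evaluates to NULL for each row and nothing matches.
plain_matches = conn.execute(
    "SELECT COUNT(*) FROM tbl_t WHERE t1 = t2 AND s1 = s2"
).fetchone()[0]

# NULL-safe comparison: NULL compared with NULL counts as equal.
conn.execute("DELETE FROM tbl_t WHERE t1 IS t2 AND s1 IS s2")
remaining_ids = [r[0] for r in conn.execute("SELECT id FROM tbl_t ORDER BY id")]
print(plain_matches, remaining_ids)  # 0 [2, 3]
```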

MySQL - How to return distinct IDs where all rows for the same ID have null field value

I have a query with two joins that returns this data:
ID Score
1 NULL
1 5
1 6
2 NULL
2 NULL
3 5
3 8
3 3
3 NULL
3 NULL
3 7
4 NULL
4 NULL
4 3
4 9
I would like to return the unique IDs which have a NULL value in the Score column for each of the rows with the same ID. In this case, the query should only return one row with the ID of 2 since that is the only ID which has all NULL values in the Score column.
Thank you!
You could wrap your original query in a subselect, aggregate it by ID, and apply COUNT() to the Score column. COUNT(Score) counts only the non-NULL values, so if all values are NULL for a particular ID it returns 0. You can then filter on that result in the HAVING clause:
select ID,
count(Score) count_null
from (your query)t
group by ID
having count_null = 0
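A minimal sketch of the same idea, run against the sample rows with Python's sqlite3 module. The table name scores is hypothetical and stands in for the "(your query)" subselect; SQLite in place of MySQL is likewise an assumption.

```python
import sqlite3

# Assumption: a plain table named `scores` replaces the joined subquery.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (ID INTEGER, Score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [
    (1, None), (1, 5), (1, 6),
    (2, None), (2, None),
    (3, 5), (3, 8), (3, 3), (3, None), (3, None), (3, 7),
    (4, None), (4, None), (4, 3), (4, 9),
])

# COUNT(Score) counts non-NULL scores only, so 0 means "every Score is NULL".
ids_all_null = [r[0] for r in conn.execute("""
    SELECT ID
    FROM scores
    GROUP BY ID
    HAVING COUNT(Score) = 0
""")]
print(ids_all_null)  # [2]
```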

Are these SQL statements equal?

I have the following table:
| sample_id (varchar, unique) | field1 (int) | field2 (int) | ...
--------------------------------------------------------------------
| 9b7acb476c4ab04c7ddbc | 100 | 56 | ...
| a2e4df67e98ccaf088abf | 23 | NULL | ...
| fcbe9cecd6b96cba7c6ee | NULL | 43 | ...
...
I have the following code created by a prior user to query two fields at the same time and getting a random subset of the rows:
SELECT sample_id, field1, field2
FROM samples
WHERE field1 != NULL
UNION ALL
SELECT sample_id, field1, field2
FROM samples
WHERE field2 != NULL
ORDER BY RAND()
LIMIT 1000
I thought of optimizing the code by rewriting the query as:
SELECT sample_id, field1, field2
FROM samples
WHERE field1 != NULL
OR field2 != NULL
ORDER BY RAND()
LIMIT 1000
Based on some documentation I read here it seems that both the queries are equivalent but I'm not sure how the ORDER BY RAND() line would be handled in the query. Is it only applied to the second query (i.e. the query after the UNION ALL)?
[THIS WAS THE ORIGINAL VERSION OF THE QUESTION]
Not at all. != NULL will filter out all data, because almost all comparisons to NULL return NULL, which is treated as false.
!= '' will return all values that do not contain an empty string and are not NULL.
The correct comparisons to NULL use IS NULL and IS NOT NULL.
[AFTER THE EDIT]
The query you want is:
SELECT sample_id, field1, field2
FROM samples
WHERE field1 IS NOT NULL OR field2 IS NOT NULL
ORDER BY RAND()
LIMIT 1000;
First of all, your queries are currently not working. They both select using conditions that never evaluate to true, because every != comparison with NULL yields NULL:
mysql> select '' != NULL, NULL != NULL, 0 != NULL, 'hello' != NULL, 42 != NULL, (1=0)!=NULL, (1=1)!=NULL;
+------------+--------------+-----------+-----------------+------------+-------------+-------------+
| '' != NULL | NULL != NULL | 0 != NULL | 'hello' != NULL | 42 != NULL | (1=0)!=NULL | (1=1)!=NULL |
+------------+--------------+-----------+-----------------+------------+-------------+-------------+
| NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+------------+--------------+-----------+-----------------+------------+-------------+-------------+
1 row in set (0.00 sec)
select 1 from test where null;
Empty set (0.00 sec)
Now if you use a different condition, e.g. WHERE field1 IS NOT NULL, the queries may still be subtly not equivalent.
The first updated UNIONed subquery will now return rows where field1 is not null. This will be supplemented by rows where field2 is not null.
UNION ALL does not suppress duplicates (a plain UNION would).
A row with field1 and field2 both non-NULL, if it exists, will be selected twice by UNION ALL, and have double probability of being chosen.
So in both cases you get at most 1000 records, but the two sets will be subtly different.
It may well be that your optimized query, once updated with IS NOT NULL instead of !=, is the actual query you needed from the beginning.
But if you do want double probability for rows where both fields are non-NULL, then the optimized query will not be equivalent, and it may skew results if you use those data as input in some stochastic process.
The order by is applied to the result of the union.
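The difference between the two query shapes can be sketched on the three sample rows. The example runs on SQLite (an assumption), and it orders by sample_id instead of RAND() so the output is deterministic and it is visible that the ordering covers the combined result.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; ORDER BY sample_id replaces ORDER BY RAND().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (sample_id TEXT, field1 INTEGER, field2 INTEGER)")
conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", [
    ("9b7acb476c4ab04c7ddbc", 100, 56),
    ("a2e4df67e98ccaf088abf", 23, None),
    ("fcbe9cecd6b96cba7c6ee", None, 43),
])

# UNION ALL keeps duplicates: the row with both fields set is returned by both branches.
union_rows = conn.execute("""
    SELECT sample_id FROM samples WHERE field1 IS NOT NULL
    UNION ALL
    SELECT sample_id FROM samples WHERE field2 IS NOT NULL
    ORDER BY sample_id
""").fetchall()

# The OR version returns each qualifying row exactly once.
or_rows = conn.execute("""
    SELECT sample_id FROM samples
    WHERE field1 IS NOT NULL OR field2 IS NOT NULL
    ORDER BY sample_id
""").fetchall()

print(len(union_rows), len(or_rows))  # 4 3
```

The UNION ALL result is sorted across both branches, which matches the ORDER BY being applied to the union as a whole rather than to the second SELECT only.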

Sql server Query to get the Table Altered Result

I have a table as below
ClientID AccountNumber BalanceOnDay0 BalanceOnDay1 BalanceOnDay2 BalanceOnDay3 BalanceOnDay4 BalanceOnDay5 BalanceOnDay6 BalanceOnDay7
ABC1 123 10 NULL NULL NULL NULL NULL NULL NULL
ABC1 123 NULL NULL NULL NULL NULL NULL NULL 3
I would like to see the result as below.
ClientID AccountNumber BalanceOnDay0 BalanceOnDay1 BalanceOnDay2 BalanceOnDay3 BalanceOnDay4 BalanceOnDay5 BalanceOnDay6 BalanceOnDay7
ABC1 123 10 NULL NULL NULL NULL NULL NULL 3
Please suggest!
You can use SUM() if you want to combine the balance values, if you have multiple records:
select clientid,
accountnumber,
sum(BalanceOnDay0) BalanceOnDay0,
sum(BalanceOnDay1) BalanceOnDay1,
sum(BalanceOnDay2) BalanceOnDay2,
sum(BalanceOnDay3) BalanceOnDay3,
sum(BalanceOnDay4) BalanceOnDay4,
sum(BalanceOnDay5) BalanceOnDay5,
sum(BalanceOnDay6) BalanceOnDay6,
sum(BalanceOnDay7) BalanceOnDay7
from table1
group by clientid, accountnumber
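To illustrate how SUM() collapses the two rows, here is a small sketch on SQLite (an assumption in place of SQL Server). Only three of the eight balance columns are kept to shorten the example; the remaining columns would be handled the same way.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; columns trimmed to Day0, Day1, Day7.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (ClientID TEXT, AccountNumber INTEGER,"
    " BalanceOnDay0 INTEGER, BalanceOnDay1 INTEGER, BalanceOnDay7 INTEGER)"
)
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", [
    ("ABC1", 123, 10, None, None),
    ("ABC1", 123, None, None, 3),
])

# SUM() skips NULLs; a column that is NULL in every row of the group stays NULL.
combined = conn.execute("""
    SELECT ClientID,
           AccountNumber,
           SUM(BalanceOnDay0) AS BalanceOnDay0,
           SUM(BalanceOnDay1) AS BalanceOnDay1,
           SUM(BalanceOnDay7) AS BalanceOnDay7
    FROM table1
    GROUP BY ClientID, AccountNumber
""").fetchall()
print(combined)  # [('ABC1', 123, 10, None, 3)]
```

If a day could have several non-NULL balances for the same account, SUM() would add them together, so MAX() may be the safer choice when the values should be kept as they are.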