GROUP BY - do not group NULL - mysql

I'm trying to figure out a way to return results using GROUP BY.
GROUP BY is working as expected, but my question is: is it possible to have GROUP BY ignore NULL values, so that rows where the grouped field is NULL are not collapsed into one group? I still need every row where that field is NULL.
SELECT `table1`.*,
GROUP_CONCAT(id SEPARATOR ',') AS `children_ids`
FROM `table1`
WHERE (enabled = 1)
GROUP BY `ancestor`
So if, say, 5 rows have a NULL ancestor, the query returns 1 row for them, but I want all 5.

Perhaps you should add something to the null columns to make them unique and group on that? I was looking for some sort of sequence to use instead of UUID() but this might work just as well.
SELECT `table1`.*,
IFNULL(ancestor,UUID()) as unq_ancestor,
GROUP_CONCAT(id SEPARATOR ',') AS `children_ids`
FROM `table1`
WHERE (enabled = 1)
GROUP BY unq_ancestor

When grouping by column Y, all rows for which the value in Y is NULL are grouped together.
This behaviour is defined by the SQL-2003 standard, though it's slightly surprising because NULL is not equal to NULL.
You can work around it by grouping on a different value, some function (mathematically speaking) of the data in your grouping column.
If you have a unique column X then this is easy.
Input
X Y
-------------
1 a
2 a
3 b
4 b
5 c
6 (NULL)
7 (NULL)
8 d
Without fix
SELECT GROUP_CONCAT(`X`)
FROM `tbl`
GROUP BY `Y`;
Result:
GROUP_CONCAT(`X`)
-------------------
6,7
1,2
3,4
5
8
With fix
SELECT GROUP_CONCAT(`X`)
FROM `tbl`
GROUP BY IFNULL(`Y`, `X`);
Result:
GROUP_CONCAT(`X`)
-------------------
6
7
1,2
3,4
5
8
Let's take a closer look at how this is working
SELECT GROUP_CONCAT(`X`), IFNULL(`Y`, `X`) AS `grp`
FROM `tbl`
GROUP BY `grp`;
Result:
GROUP_CONCAT(`X`)   `grp`
-----------------------------
6                   6
7                   7
1,2                 a
3,4                 b
5                   c
8                   d
If you don't have a unique column that you can use, you can try to generate a unique placeholder value instead. I'll leave this as an exercise to the reader.
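The IFNULL(`Y`, `X`) workaround above can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained; IFNULL and GROUP_CONCAT exist in both), and the helper name grouped_ids is hypothetical. It also assumes Y never holds a value that equals one of the X values, since such a row would share a group with that X.

```python
import sqlite3

# SQLite (bundled with Python) stands in for MySQL here; this is an
# assumption for the sake of a runnable example, not the original setup.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (X INTEGER PRIMARY KEY, Y TEXT)")
conn.executemany(
    "INSERT INTO tbl (X, Y) VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "b"), (4, "b"),
     (5, "c"), (6, None), (7, None), (8, "d")],
)


def grouped_ids(group_expr):
    """Return one sorted list of X values per group for a GROUP BY expression."""
    sql = f"SELECT GROUP_CONCAT(X) FROM tbl GROUP BY {group_expr}"
    groups = [sorted(int(x) for x in row[0].split(","))
              for row in conn.execute(sql)]
    return sorted(groups)


plain = grouped_ids("Y")             # both NULL rows collapse into one group
fixed = grouped_ids("IFNULL(Y, X)")  # each NULL row gets its own group
print(plain)
print(fixed)
```

The group members are sorted in Python because the order of values inside GROUP_CONCAT is not guaranteed unless the query asks for one.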

GROUP BY IFNULL(required_field, id)

SELECT table1.*,
GROUP_CONCAT(id SEPARATOR ',') AS children_ids
FROM table1
WHERE (enabled = 1)
GROUP BY ancestor
, CASE WHEN ancestor IS NULL
THEN table1.id
ELSE 0
END

A possibly faster version of the previous solution, in case you have a unique identifier in table1 (let's suppose it is table1.id):
SELECT `table1`.*,
GROUP_CONCAT(id SEPARATOR ',') AS `children_ids`,
IF(ISNULL(ancestor),table1.id,NULL) as `do_not_group_on_null_ancestor`
FROM `table1`
WHERE (enabled = 1)
GROUP BY `ancestor`, `do_not_group_on_null_ancestor`

To UNION several tables, GROUP_CONCAT a different column from each, and SUM another, so that all values for a given key (the unique primary or foreign key, column1 here) end up on the same row:
select column1, column2, column3,
GROUP_CONCAT(if(column4 = '', null, column4)) as column4,
sum(column5) as column5
from (
select column1, group_concat(column2) as column2, sum(column3) as column3,
'' as column4, '' as column5
from table1
group by column1
union all
select column1, '' as column2, '' as column3,
group_concat(column4) as column4, sum(column5) as column5
from table2
group by column1
) as t
group by column1

Related

Sorting by frequency in MySQL/SQL

Is there a way of sorting by the frequency with which a value occurs? If a value appears in multiple rows, would we just use the WHERE clause? Is it just about making the query more specific?
As a simple example:
CREATE TABLE mytable
( id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY
, val VARCHAR(15) NOT NULL
);
INSERT INTO mytable (id, val) VALUES
(1,'one')
,(2,'prime')
,(3,'prime')
,(4,'square')
,(5,'prime')
,(6,'six')
,(7,'prime')
,(8,'cube')
,(9,'square')
;
We can write a simple query to return the rows
SELECT t.val
, t.id
FROM mytable t
ORDER BY t.val
But what query do we use to get the most frequently occurring values listed first? To return a result like this:
freq val id
---- ------ --
4 prime 2
4 prime 3
4 prime 5
4 prime 7
2 square 4
2 square 9
1 cube 8
1 one 1
1 six 6
where freq is the frequency (the count of the number of rows) that a value appears in the val column. The value 'prime' appears in four rows, so freq has a value of 4.
What MySQL SELECT query would I use to return a result like this?
Try this:
SELECT A.Freq , A.val , A.id
FROM ( SELECT COUNT(*) AS Freq , val , id
FROM mytable
GROUP BY val , id ) A
ORDER BY Freq DESC ;
EDIT:
As suggested by spencer7593, the id is defined as auto-increment in the table and hence the GROUP BY should not include it. Still, if that would be the case, it is not clear how the result could be as shown. I'm adding here an alternative SELECT that, supposedly, should yield the shown output:
SELECT B.Freq , A.val , A.id
FROM mytable A
INNER JOIN ( SELECT val , COUNT(*) AS Freq
FROM mytable
GROUP BY val) B
ON A.val = B.val
ORDER BY B.Freq DESC , A.val , A.id ;
[NOTE: This was NOT tested!!!!]
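Since the answer above is marked as untested, here is a minimal sketch that runs the join-against-a-count-subquery idea on the question's sample data. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption for runnability), and the aliases t and f are illustrative names only.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); the query itself is plain SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, val TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO mytable (id, val) VALUES (?, ?)",
    [(1, "one"), (2, "prime"), (3, "prime"), (4, "square"), (5, "prime"),
     (6, "six"), (7, "prime"), (8, "cube"), (9, "square")],
)

# Count rows per val once, join the counts back to every row, then sort by
# frequency first and by val and id to make the order fully deterministic.
FREQ_SQL = """
SELECT f.freq, t.val, t.id
FROM mytable AS t
INNER JOIN (SELECT val, COUNT(*) AS freq
            FROM mytable
            GROUP BY val) AS f
        ON f.val = t.val
ORDER BY f.freq DESC, t.val, t.id
"""

rows = conn.execute(FREQ_SQL).fetchall()
for row in rows:
    print(row)
```

On this data the rows come back in the order shown in the question: the four 'prime' rows, then the two 'square' rows, then 'cube', 'one' and 'six'.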

Select column one where column two is unique

I wasn't sure how to word my question, but here we go... With an example of what I'm trying to achieve.
I have a table, which looks like this:
-------------------
X_ID | Y_ID
-------------------
2 | 8
2 | 12
--------------------
I want to return one row per X_ID, which has only one Y_ID for that X_ID.
I don't want to return a row if it has > 1 Y_ID, for a particular X_ID. And, I want to return only one row out of all X_IDs that match the above rule.
Using the table above as an example, I need a query that would return 0 rows for the data in that table.
I need the same query to return 1 row from the following table
-------------------
X_ID | Y_ID
-------------------
2 | 8
2 | 12
3 | 19
3 | 19
-------------------
I need a query that will return one row from this: either of the bottom 2.
I just need the X_ID.
I've tried just about as much as I can think of, using DISTINCT and GROUP BY.
Any ideas?
Oddly, this would be more complicated in MSSQL since it would require you to group on X_ID and Y_ID; but this should work. Normally, the Y_ID in the results would be an effectively random selection of all the Y_ID values found in that group; but since we are specifically filtering out groups with more than one Y_ID, it ends up being the exact Y_ID you need.
SELECT X_ID, Y_ID
FROM (
SELECT X_ID, Y_ID, COUNT(DISTINCT Y_ID) AS yCount
FROM theTable
GROUP BY X_ID
HAVING yCount = 1
) AS subQ
;
I'm not quite as well versed in the intricacies of MSSQL, but I think something like this would work (for both MySQL and MSSQL).
SELECT t1.X_ID, t1.Y_ID
FROM theTable AS t1
INNER JOIN (
SELECT X_ID, COUNT(DISTINCT Y_ID) AS yCount
FROM theTable
GROUP BY X_ID
HAVING COUNT(DISTINCT Y_ID) = 1
) AS t2 ON t1.X_ID = t2.X_ID
;
I say think because I am not 100% sure MSSQL supports COUNT(DISTINCT value).
You can try this. It should do what you want it to do.
SELECT X_ID, Y_ID FROM theTable
GROUP BY X_ID
HAVING COUNT(DISTINCT X_ID, Y_ID) = 1
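A minimal sketch of the GROUP BY ... HAVING COUNT(DISTINCT Y_ID) = 1 idea, run against both sample tables from the question. Python's built-in sqlite3 module stands in for MySQL (an assumption for runnability), the function name x_ids_with_single_y is hypothetical, and the LIMIT 1 is added on the reading that the asker wants a single X_ID even when several qualify.

```python
import sqlite3


def x_ids_with_single_y(pairs):
    """Return at most one X_ID whose rows all share the same Y_ID."""
    # SQLite stands in for MySQL (assumption); theTable is the name used above.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE theTable (X_ID INTEGER, Y_ID INTEGER)")
    conn.executemany("INSERT INTO theTable VALUES (?, ?)", pairs)
    sql = """
        SELECT X_ID
        FROM theTable
        GROUP BY X_ID
        HAVING COUNT(DISTINCT Y_ID) = 1  -- keep groups with one distinct Y_ID
        ORDER BY X_ID
        LIMIT 1                          -- only one qualifying X_ID is wanted
    """
    result = [row[0] for row in conn.execute(sql)]
    conn.close()
    return result


first = x_ids_with_single_y([(2, 8), (2, 12)])
second = x_ids_with_single_y([(2, 8), (2, 12), (3, 19), (3, 19)])
print(first)
print(second)
```

Selecting only X_ID sidesteps the question of which Y_ID a non-aggregated column would return, which is all the asker said they need.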

Select a value from MySQL database only in case it exists only once

Let's say I have a MySQL table that has the following entries:
1
2
3
2
5
6
7
6
6
8
When I do a "SELECT * ..." I get back all the entries. But I want to get back only those entries that exist exactly once in the table. That means the rows with the values 2 (exists two times) and 6 (exists three times) have to be dropped completely from my result.
I found the keyword DISTINCT, but as far as I understand it only prevents entries from being shown twice; it does not filter them out completely.
I think it can be done somehow with COUNT, but nothing I tried was really successful. So what is the correct SQL statement here?
Edit: to clarify that, the result I want to get back is
1
3
5
7
8
You can use COUNT() in combination with a GROUP BY and a HAVING clause like this:
SELECT yourCol
FROM yourTable
GROUP BY yourCol
HAVING COUNT(*) < 2
Example fiddle.
You want to mix GROUP BY and COUNT().
Assuming the column is called 'id' and the table is called 'table', the following statement will work:
SELECT * FROM `table` GROUP BY id HAVING COUNT(id) = 1
This will filter out duplicate results entirely (e.g. it'll take out your 2's and 6's)
Three ways. One with GROUP BY and HAVING:
SELECT columnX
FROM tableX
GROUP BY columnX
HAVING COUNT(*) = 1 ;
one with a correlated NOT EXISTS subquery:
SELECT columnX
FROM tableX AS t
WHERE NOT EXISTS
( SELECT *
FROM tableX AS t2
WHERE t2.columnX = t.columnX
AND t2.pk <> t.pk -- pk is the primary key of the table
) ;
and an improvement on the first way (if you have a primary key pk column and an index on (columnX, pk)):
SELECT columnX
FROM tableX
GROUP BY columnX
HAVING MIN(pk) = MAX(pk) ;
select id from foo group by id having count(*) < 2;
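To compare two of the approaches above on the question's data, here is a small sketch that runs the GROUP BY / HAVING COUNT(*) = 1 query and the correlated NOT EXISTS query side by side. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption for runnability); tableX, columnX and pk follow the placeholder names used in the answer.

```python
import sqlite3

# SQLite stands in for MySQL (assumption). pk is filled in automatically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (pk INTEGER PRIMARY KEY, columnX INTEGER)")
conn.executemany(
    "INSERT INTO tableX (columnX) VALUES (?)",
    [(v,) for v in (1, 2, 3, 2, 5, 6, 7, 6, 6, 8)],
)

# Variant 1: keep only the groups that contain exactly one row.
HAVING_SQL = """
SELECT columnX
FROM tableX
GROUP BY columnX
HAVING COUNT(*) = 1
ORDER BY columnX
"""

# Variant 2: keep a row only if no other row carries the same value.
NOT_EXISTS_SQL = """
SELECT columnX
FROM tableX AS t
WHERE NOT EXISTS (SELECT 1
                  FROM tableX AS t2
                  WHERE t2.columnX = t.columnX
                    AND t2.pk <> t.pk)
ORDER BY columnX
"""

via_having = [row[0] for row in conn.execute(HAVING_SQL)]
via_not_exists = [row[0] for row in conn.execute(NOT_EXISTS_SQL)]
print(via_having)
print(via_not_exists)
```

Both queries drop 2 and 6 entirely and return the same values; which one is faster on a real table depends on the indexes available.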

Adding one extra row to the result of MySQL select query

I have a MySQL table like this
id Name count
1 ABC 1
2 CDF 3
3 FGH 4
Using a simple SELECT query I get the values as
1 ABC 1
2 CDF 3
3 FGH 4
How can I get the result like this?
1 ABC 1
2 CDF 3
3 FGH 4
4 NULL 0
Note the last row. After the last record, an extra row of the form last_id+1, NULL, 0 should be added, even though no such row exists in my original table.
The number of rows is not fixed at 3 or 4; there may be N rows.
The answer is very simple
select (select max(id) from mytable)+1 as id, NULL as Name, 0 as count union all select id,Name,count from mytable;
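A minimal sketch of the UNION ALL approach above, with an ORDER BY added so the synthetic row reliably comes last (without it, the order of a UNION result is not guaranteed). Python's built-in sqlite3 module stands in for MySQL here, which is an assumption made for runnability.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); backticks quote the count column
# the same way the document's MySQL examples do.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, Name TEXT, `count` INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "ABC", 1), (2, "CDF", 3), (3, "FGH", 4)],
)

# Real rows, plus one synthetic row (max id + 1, NULL, 0), sorted by id.
EXTRA_ROW_SQL = """
SELECT id, Name, `count` FROM mytable
UNION ALL
SELECT MAX(id) + 1, NULL, 0 FROM mytable
ORDER BY id
"""

rows = conn.execute(EXTRA_ROW_SQL).fetchall()
for row in rows:
    print(row)
```

One thing to keep in mind: on an empty table MAX(id) is NULL, so the synthetic row would get a NULL id rather than 1.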
This looks a little messy but it should work.
SELECT a.id, b.name, coalesce(b.`count`, 0) as `count`
FROM
(
SELECT 1 as ID
UNION
SELECT 2 as ID
UNION
SELECT 3 as ID
UNION
SELECT 4 as ID
) a LEFT JOIN table1 b
ON a.id = b.id
WHERE a.ID IN (1,2,3,4)
UPDATE 1
You could simply generate a table with one column (preferably named ID) holding sequential numbers, up to 10,000 or more, and then join it to the table that holds the original records. For example, assuming you have a table named DummyRecord with one column and 10,000 rows in it:
SELECT a.id, b.name, coalesce(b.`count`, 0) as `count`
FROM DummyRecord a LEFT JOIN table1 b
ON a.id = b.id
WHERE a.ID >= 1 AND
a.ID <= 4
that's it. Or if you want to have from 10 to 100, then you could use this condition
...
WHERE a.ID >= 10 AND
a.ID <= 100
To clarify this is how one can append an extra row to the result set
select * from table union select 123 as id,'abc' as name
results
id | name
------------
*** | ***
*** | ***
123 | abc
Simply use mysql ROLLUP.
SELECT * FROM your_table
GROUP BY Name WITH ROLLUP;
select
x.id,
t.name,
ifnull(t.count, 0) as count
from
(SELECT 1 AS id
-- Part of the query below, you will need to generate dynamically,
-- just as you would otherwise need to generate 'in (1,2,3,4)'
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
) x
LEFT JOIN YourTable t
ON t.id = x.id
If the id does not exist in the table you're selecting from, you'll need to LEFT JOIN against a list of every id you want returned - this way, it will return the null values for ones that don't exist and the true values for those that do.
I would suggest creating a numbers table that is a single-columned table filled with numbers:
CREATE TABLE `numbers` (
id int(11) unsigned NOT NULL
);
And then inserting a large amount of numbers, starting at 1 and going up to what you think the highest id you'll ever see plus a thousand or so. Maybe go from 1 to 1000000 to be on the safe side. Regardless, you just need to make sure it's more-than-high enough to cover any possible id you'll run into.
After that, your query can look like:
SELECT n.id, t.*
FROM
`numbers` n
LEFT JOIN `table` t
ON t.id = n.id
WHERE n.id IN (1,2,3,4);
This solution will allow for a dynamically growing list of ids without the need for a sub-query with a list of unions; though, the other solutions provided will equally work for a small known list too (and could also be dynamically generated).

Is it possible to add conditions to a MAX() call in an aggregated query?

Background
My typical use case:
# Table
id category dataUID
---------------------------
0 A (NULL)
1 B (NULL)
2 C text1
3 C text1
4 D text2
5 D text3
# Query
SELECT MAX(`id`) AS `id` FROM `table`
GROUP BY `category`
This is fine; it will strip out any "duplicate categories" in the recordset that's being worked on, giving me the "highest" ID for each category.
I can then go on to use this ID to pull out all the data again:
# Query
SELECT * FROM `table` JOIN (
SELECT MAX(`id`) AS `id` FROM `table`
GROUP BY `category`
) _ USING(`id`)
# Result
id category dataUID
---------------------------
0 A (NULL)
1 B (NULL)
3 C text1
5 D text3
Note that this is not the same as:
SELECT MAX(`id`) AS `id`, `category`, `dataUID` FROM `table`
GROUP BY `category`
Per the documentation:
In standard SQL, a query that includes a GROUP BY clause cannot refer
to nonaggregated columns in the select list that are not named in the
GROUP BY clause. For example, this query is illegal in standard SQL
because the name column in the select list does not appear in the
GROUP BY:
SELECT o.custid, c.name, MAX(o.payment) FROM orders AS o, customers
AS c WHERE o.custid = c.custid GROUP BY o.custid;
For the query to be legal, the name column must be omitted from the
select list or named in the GROUP BY clause.
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group.
[..]
This extension assumes that the nongrouped columns will have the same group-wise values. Otherwise, the result is indeterminate.
So I'd get an unspecified value for dataUID — as an example, either text2 or text3 for result with id 5.
This is actually a problem for other fields in my real case; as it happens, for the dataUID column specifically, generally I don't really care which value I get.
Problem
However!
If any of the rows for a given category has a NULL dataUID, and at least one other row has a non-NULL dataUID, I'd like MAX to ignore the NULL ones.
So:
id category dataUID
---------------------------
4 D text2
5 D (NULL)
At present, since I pick out the row with the maximum ID, I get:
5 D (NULL)
But, because the dataUID is NULL, instead I want:
4 D text2
How can I get this? How can I add conditional logic to the use of aggregate MAX?
I thought of maybe handing MAX a tuple and pulling the id out from it afterwards:
GET_SECOND_PART_SOMEHOW(MAX((IF(`dataUID` IS NOT NULL, 1, 0), `id`))) AS `id`
But I don't think MAX will accept arbitrary expressions like that, let alone tuples, and I don't know how I'd retrieve the second part of the tuple after-the-fact.
A slight tweak to @ypercube's answer. To get the ids you can use
SELECT COALESCE(MAX(CASE
WHEN dataUID IS NOT NULL THEN id
END), MAX(id)) AS id
FROM `table`
GROUP BY category
And then plug that into a join
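Here is a minimal sketch of that COALESCE(MAX(CASE ...), MAX(id)) query plugged into the join, run on the question's sample data with row 5 changed to a NULL dataUID as in the problem statement. Python's built-in sqlite3 module stands in for MySQL (an assumption for runnability), and the alias picked is an illustrative name.

```python
import sqlite3

# SQLite stands in for MySQL (assumption). Row 5 has a NULL dataUID, as in
# the "Problem" part of the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE `table` (id INTEGER PRIMARY KEY, category TEXT, dataUID TEXT)")
conn.executemany(
    "INSERT INTO `table` VALUES (?, ?, ?)",
    [(0, "A", None), (1, "B", None), (2, "C", "text1"),
     (3, "C", "text1"), (4, "D", "text2"), (5, "D", None)],
)

# Per category: the highest id among rows with a non-NULL dataUID, falling
# back to the highest id overall when every dataUID in the category is NULL.
PICK_SQL = """
SELECT t.id, t.category, t.dataUID
FROM `table` AS t
JOIN (
    SELECT COALESCE(MAX(CASE WHEN dataUID IS NOT NULL THEN id END),
                    MAX(id)) AS id
    FROM `table`
    GROUP BY category
) AS picked ON picked.id = t.id
ORDER BY t.id
"""

rows = conn.execute(PICK_SQL).fetchall()
for row in rows:
    print(row)
```

The CASE expression yields NULL for rows without a dataUID, and MAX skips NULLs, which is what lets the condition live inside the aggregate. Category D resolves to id 4 rather than 5, while A and B fall back to their only rows.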
This was easier than I thought, in the end, because it turns out MySQL will accept an arbitrary expression inside MAX.
I can get the ordering I want by injecting a leading character into id to serve as an ordering hint:
SUBSTRING(MAX(IF (`dataUID` IS NULL, CONCAT('a',`id`), CONCAT('b',`id`))) FROM 2)
Walk-through:
id category dataUID IF (`dataUID` IS NULL, CONCAT('a',`id`), CONCAT('b',`id`))
--------------------------------------------------------------------------------------
0 A (NULL) a0
1 B (NULL) a1
2 C text1 b2
3 C text1 b3
4 D text2 b4
5 D (NULL) a5
So:
SELECT
`category`, MAX(IF (`dataUID` IS NULL, CONCAT('a',`id`), CONCAT('b',`id`))) AS `max_id_with_hint`
FROM `table`
GROUP BY `category`
category max_id_with_hint
------------------------------
A a0
B a1
C b3
D b4
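It's then a simple matter to chop the ordering hint off again. One caveat: MAX compares these values as strings, so 'b9' sorts above 'b10'; if ids can have different numbers of digits, pad them to a fixed width first (for example with LPAD).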
Thanks in particular to @JlStone for setting me, via COALESCE, on the path to embedding expressions inside the call to MAX and directly manipulating the values supplied to MAX.
From what I can remember you can use COALESCE inside of grouping statements. For example.
SELECT MAX(COALESCE(`id`,1)) ...
Hm, it seems I read too quickly the first time. I think maybe you want something like this?
SELECT * FROM `table` JOIN (
SELECT MAX(`id`) AS `id` FROM `table`
WHERE `dataUID` IS NOT NULL
GROUP BY `category`
) _ USING(`id`)
or perhaps
SELECT MAX(`id`) AS `id`,
COALESCE (`dataUID`, 0) as `dataUID`
FROM `table`
GROUP BY `category`
select *
from t1
join (
select max(id) as id,
max(if(dataUID is NULL, NULL, id)) as fallbackid,
category
from t1 group by category) as ids
on if(ids.id = ids.fallbackid or ids.fallbackid is null, ids.id, ids.fallbackid) = t1.id;
SELECT t.*
FROM `table` AS t
JOIN
( SELECT DISTINCT category
FROM `table`
) AS tdc
ON t.id =
COALESCE(
( SELECT MAX(id) AS id
FROM `table`
WHERE category = tdc.category
AND dataUID IS NOT NULL
)
, ( SELECT MAX(id) AS id
FROM `table`
WHERE category = tdc.category
AND dataUID IS NULL
)
)
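You need the OVER clause (window functions, available from MySQL 8.0):
SELECT id, category, dataUID
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY category ORDER BY (dataUID IS NULL), id DESC) rn,
id, category, dataUID FROM `table`
) q
WHERE rn = 1
Ordering on (dataUID IS NULL) first puts rows with a non-NULL dataUID (the expression is 0) ahead of rows with a NULL one (the expression is 1); within each of those the highest id comes first, so rn = 1 is the row asked for.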