How to bypass a reference to an outer table in subquery? - mysql

I've been dealing with these two tables:
Document
id  company_id  etc
=======================
1   2           x
2   2           x
Version
id  document_id  version  date_created  date_issued  date_accepted
==========================================================================
1   1            1        2013-04-29    2013-04-30   NULL
2   2            1        2013-05-01    NULL         NULL
3   1            2        2013-05-01    2013-05-01   2013-05-03
There's a page where I want to list all documents with their attributes.
I would also like to show a single status for each document.
The status can be derived from the most recent date among the document's Versions.
It is possible that an older version is the one that was accepted.
The query result I am looking for is like this:
id  company_id  etc  status
==================================
1   2           x    accepted
2   2           x    created
I started out by making a query which combines all dates and add a status next to it.
It works as expected and when I add the document_id things look alright.
SELECT `status`
FROM (
SELECT max(date_created) as `date`,'created' as `status` FROM version WHERE document_id = 1
UNION
SELECT max(date_issued),'issued' FROM version WHERE document_id = 1
UNION
SELECT max(date_accepted),'accepted' FROM version WHERE document_id = 1
ORDER BY date DESC
LIMIT 1
) as maxi
When I try to incorporate this query as a subquery, I can't make it work.
SELECT *, (
SELECT `status` FROM (
SELECT max(date_created) as `date`,'created' as `status` FROM version WHERE document_id = document.id
UNION
SELECT max(date_issued),'issued' FROM version WHERE document_id = document.id
UNION
SELECT max(date_accepted),'accepted' FROM version WHERE document_id = document.id
ORDER BY date DESC
LIMIT 1
) as maxi
) as `status`
FROM `document`
This gets me the error Unknown column 'document.id' in 'where clause'. Reading around on SO, I figured out that the inner query simply can't reach the value document.id, since it sits in a subquery inside a subquery. So I tried another approach: get all the statuses at once, to avoid the WHERE clause, and JOIN them. I ended up with the following query.
SELECT MAX(`date`),`status`, document_id
FROM (
SELECT date_created as `date`, 'created' as `status`,document_id FROM `version`
UNION
SELECT date_issued, 'issued',document_id FROM `version`
UNION
SELECT date_accepted, 'accepted',document_id FROM `version`
) as dates
GROUP BY document_id
No error this time, but I realized that the status couldn't be the correct one, since it gets lost during the GROUP BY. I've tried combinations of the two, but both flaws keep hindering me. Could anyone suggest how to do this in a single query without changing my database? (I know that saving the dates in a separate table would simplify things.)

I have not tested this, but you can do it like this (you might need to tweak the details)
It is basically looking at it from a completely different angle.
select
d.*,
(CASE GREATEST(ifnull(v.date_created, 0), ifnull(v.date_issued, 0), ifnull(v.date_accepted, 0))
WHEN v.date_accepted THEN 'accepted'
WHEN v.date_issued THEN 'issued'
WHEN v.date_created THEN 'created'
ELSE 'unknown' -- no version row, or no dates set
END) as status
from document d
left join version v on
v.document_id = d.id and
not exists (select 1 from version x where x.document_id = v.document_id and x.version > v.version)

Can you normalise your table designs to move the status / dates onto a different table from the Versions?
If not, possibly something like this:
SELECT Document.id, Document.company_id, Document.etc, CASE WHEN Sub1.status = 3 THEN 'accepted' WHEN Sub1.status = 2 THEN 'issued' WHEN Sub1.status = 1 THEN 'created' ELSE NULL END AS status
FROM Document
INNER JOIN (
SELECT document_id, MAX(CASE WHEN date_accepted IS NOT NULL THEN 3 WHEN date_issued IS NOT NULL THEN 2 WHEN date_created IS NOT NULL THEN 1 ELSE NULL END) AS status
FROM Version
GROUP BY document_id
) Sub1
ON Document.id = Sub1.document_id
The subselect gets the highest status for each document from the Version table. Each version's highest status is returned as a number, and grouping on the document id takes the highest status of any version of that document. This is joined back against the Document table, and the number is converted into the text description.
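As a rough check of the subselect idea above, here is a minimal sketch that runs it against the sample data from the question. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), and the name STATUS_QUERY is invented for the illustration. Note that it ranks by status level rather than by the most recent date, so it matches the requirement only while dates progress from created to issued to accepted.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (id INTEGER PRIMARY KEY, company_id INTEGER, etc TEXT);
CREATE TABLE version (
    id INTEGER PRIMARY KEY, document_id INTEGER, version INTEGER,
    date_created TEXT, date_issued TEXT, date_accepted TEXT
);
INSERT INTO document VALUES (1, 2, 'x'), (2, 2, 'x');
INSERT INTO version VALUES
    (1, 1, 1, '2013-04-29', '2013-04-30', NULL),
    (2, 2, 1, '2013-05-01', NULL, NULL),
    (3, 1, 2, '2013-05-01', '2013-05-01', '2013-05-03');
""")

# Rank each version (3 = accepted, 2 = issued, 1 = created), keep the highest
# rank per document, then translate the rank back into a label.
STATUS_QUERY = """
SELECT d.id, d.company_id, d.etc,
       CASE s.status WHEN 3 THEN 'accepted'
                     WHEN 2 THEN 'issued'
                     WHEN 1 THEN 'created' END AS status
FROM document d
JOIN (
    SELECT document_id,
           MAX(CASE WHEN date_accepted IS NOT NULL THEN 3
                    WHEN date_issued   IS NOT NULL THEN 2
                    WHEN date_created  IS NOT NULL THEN 1 END) AS status
    FROM version
    GROUP BY document_id
) s ON s.document_id = d.id
ORDER BY d.id
"""

rows = conn.execute(STATUS_QUERY).fetchall()
for row in rows:
    print(row)
```

On the sample rows this lists document 1 as accepted and document 2 as created, which is the result the question asks for.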

select Doc.id,Doc.company_id,Doc.etc,f.status
from Document Doc
inner join
(select Ver.document_id,
case when Ver.date_accepted is not null then 'Accepted'
when Ver.date_issued is not null then 'Issued'
when Ver.date_created is not null then 'Created'
end as status
from version Ver
inner join (
select document_id,MAX(version) VersionId
from version
group by document_id
)t on t.document_id=Ver.document_id
where t.VersionId=Ver.version
)f on Doc.id=f.document_id

Related

How to simulate a N-dimension INTERSECT with generic SQL

Here is an example of two extracts from the same table:
SELECT source_id
FROM table_cust_string_value
WHERE cust_string_id=2 AND VALUE LIKE '%TATA%';
SELECT source_id
FROM table_cust_string_value
WHERE cust_string_id=4 AND VALUE LIKE '%TUTU%';
They give 2 sets of source_id.
Right. Now if I need an intersect of those with MySQL (where INTERSECT does not exist) I found this way:
SELECT DISTINCT source_id
FROM (
SELECT source_id
FROM table_cust_string_value
WHERE cust_string_id=2 AND VALUE LIKE '%TATA%'
) t1
INNER JOIN (
SELECT source_id
FROM table_cust_string_value
WHERE cust_string_id=4 AND VALUE LIKE '%TUTU%'
) t2
USING (source_id);
But what if I need to do this for N sets?
I can't find a solution, and I'm worried about the performance of doing it this way.
You can use a grouping approach. Depending on what indexes you have available this might work out better.
SELECT source_id
FROM table_cust_string_value
WHERE cust_string_id IN ( 2, 4 )
GROUP BY source_id
HAVING MAX(CASE WHEN cust_string_id = 2 AND VALUE LIKE '%TATA%' THEN 1 END) = 1
AND MAX(CASE WHEN cust_string_id = 4 AND VALUE LIKE '%TUTU%' THEN 1 END) = 1
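The grouping approach extends to N sets by adding one HAVING term per set. The sketch below builds that query for an arbitrary list of conditions. It is only an illustration: it runs on Python's sqlite3 rather than MySQL, and the helper name intersect_sources and the sample rows are made up for the example.

```python
import sqlite3

def intersect_sources(conn, conditions):
    """Return source_ids matching every (cust_string_id, LIKE pattern) pair.

    Hypothetical helper: one HAVING term is generated per condition.
    """
    having = " AND ".join(
        "MAX(CASE WHEN cust_string_id = ? AND value LIKE ? THEN 1 END) = 1"
        for _ in conditions
    )
    id_placeholders = ", ".join("?" for _ in conditions)
    sql = (
        "SELECT source_id FROM table_cust_string_value "
        f"WHERE cust_string_id IN ({id_placeholders}) "
        f"GROUP BY source_id HAVING {having} ORDER BY source_id"
    )
    # Parameters follow placeholder order: the IN list first, then HAVING pairs.
    params = [cid for cid, _ in conditions]
    for cid, pattern in conditions:
        params += [cid, pattern]
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_cust_string_value "
    "(source_id INTEGER, cust_string_id INTEGER, value TEXT)"
)
conn.executemany(
    "INSERT INTO table_cust_string_value VALUES (?, ?, ?)",
    [
        (1, 2, "xTATAx"), (1, 4, "TUTU"),
        (2, 2, "TATA"), (2, 4, "nope"),
        (3, 4, "TUTU"),
        (4, 2, "aTATA"), (4, 4, "TUTUb"), (4, 6, "TITI"),
    ],
)

two_sets = intersect_sources(conn, [(2, "%TATA%"), (4, "%TUTU%")])
three_sets = intersect_sources(
    conn, [(2, "%TATA%"), (4, "%TUTU%"), (6, "%TITI%")]
)
print(two_sets, three_sets)
```

The table is read once regardless of N, which is the main appeal over chaining N inner joins.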

SQL/MySQL DELETE all rows EXCEPT 2 of them

I have a database table setup like this:
id | code | group_id | status
---|-------|---------|------------
1 | abcd1 | group_1 | available
2 | abcd2 | group_1 | available
3 | adsd3 | group_1 | available
4 | dfgd4 | group_1 | available
5 | vfcd5 | group_1 | available
6 | bgcd6 | group_2 | available
7 | abcd7 | group_2 | available
8 | ahgf8 | group_2 | available
9 | dfgd9 | group_2 | available
10 | qwer6 | group_2 | available
In the example above, each group_id has 5 rows in total (arbitrary for this example; the row count is dynamic and will vary). I need to remove every row with status available except for 2 of them per group (which 2 does not matter, as long as 2 remain).
Basically, every unique group_id should end up with only 2 rows with a status of available. I can write a simple SQL query to remove all of them, but I am struggling to come up with one that removes all except 2. Please help :)
If code is unique, you can use subqueries to keep the "min" and "max"
DELETE FROM t
WHERE t.status = 'available'
AND (t.group_id, t.code) NOT IN (
SELECT group_id, MAX(code)
FROM t
WHERE status = 'available'
GROUP BY group_id
)
AND (t.group_id, t.code) NOT IN (
SELECT group_id, MIN(code)
FROM t
WHERE status = 'available'
GROUP BY group_id
)
Similarly, with an auto increment id:
DELETE FROM t
WHERE t.status = 'available'
AND t.id NOT IN (
SELECT MAX(id) FROM t WHERE status = 'available' GROUP BY group_id
UNION
SELECT MIN(id) FROM t WHERE status = 'available' GROUP BY group_id
)
I reworked the subquery into a UNION instead in this version, but the "AND" format would work just as well too. Also, if "code" was unique across the whole table, the NOT IN could be simplified down to excluding the group_id as well (though it would still be needed in the subqueries' GROUP BY clauses).
Edit: MySQL doesn't like subqueries referencing tables being UPDATEd/DELETEd in the WHERE of the query doing the UPDATE/DELETE; in those cases, you can usually double-wrap the subquery to give it an alias, causing MySQL to treat it as a temporary table (behind the scenes).
DELETE FROM t
WHERE t.status = 'available'
AND t.id NOT IN (
SELECT * FROM (
SELECT MAX(id) FROM t WHERE status = 'available' GROUP BY group_id
UNION
SELECT MIN(id) FROM t WHERE status = 'available' GROUP BY group_id
) AS a
)
Another alternative, I don't recall if MySQL complains as much about joins in DELETE/UPDATE....
DELETE t
FROM t
LEFT JOIN (
SELECT MIN(id) AS minId, MAX(id) AS maxId, 1 AS keep_flag
FROM t
WHERE status = 'available'
GROUP BY group_id
) AS tKeep ON t.id IN (tKeep.minId, tKeep.maxId)
WHERE t.status = 'available'
AND tKeep.keep_flag IS NULL
To keep the min and max ids, I think a join is the simplest solution:
DELETE t
FROM t LEFT JOIN
(SELECT group_id, MIN(id) as min_id, MAX(id) as max_id
FROM t
WHERE t.status = 'available'
GROUP BY group_id
) tt
ON t.id IN (tt.min_id, tt.max_id)
WHERE t.status = 'available' AND
tt.group_id IS NULL;
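The keep-the-MIN-and-MAX idea from the answers above can be tried on a toy copy of the table. This is a sketch under assumptions: it runs on Python's sqlite3, which does not have MySQL's restriction on referencing the target table in a subquery, so the plain NOT IN form works here without the double wrapping; row 11 is an extra made-up row to show that non-available rows are left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, code TEXT, group_id TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "abcd1", "group_1", "available"),
        (2, "abcd2", "group_1", "available"),
        (3, "adsd3", "group_1", "available"),
        (4, "dfgd4", "group_1", "available"),
        (5, "vfcd5", "group_1", "available"),
        (6, "bgcd6", "group_2", "available"),
        (7, "abcd7", "group_2", "available"),
        (8, "ahgf8", "group_2", "available"),
        (9, "dfgd9", "group_2", "available"),
        (10, "qwer6", "group_2", "available"),
        (11, "zzzz1", "group_2", "used"),  # hypothetical non-available row
    ],
)

# Delete every 'available' row except the lowest and highest id per group.
conn.execute("""
DELETE FROM t
WHERE status = 'available'
  AND id NOT IN (
      SELECT MIN(id) FROM t WHERE status = 'available' GROUP BY group_id
      UNION
      SELECT MAX(id) FROM t WHERE status = 'available' GROUP BY group_id
  )
""")

remaining = [row[0] for row in conn.execute("SELECT id FROM t ORDER BY id")]
print(remaining)
```

A group with a single available row keeps that one row, since its MIN and MAX are the same id.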
If the column "id" is the PRIMARY KEY or a UNIQUE KEY, then we could use a correlated subquery to get the second lowest value for a particular group_id.
We could then use that to identify rows for group_id that have higher values of the "id" column.
A query something like this:
SELECT t.`id`
, t.`group_id`
FROM `setup_like_this` t
WHERE t.`status` = 'available'
AND t.`id`
> ( SELECT s.`id`
FROM `setup_like_this` s
WHERE s.`status` = 'available'
AND s.`group_id` = t.`group_id`
ORDER
BY s.`id`
LIMIT 1,1
)
We test that as a SELECT first, to examine the rows that are returned. When we are satisfied this query is returning the set of rows we want to delete, we can replace SELECT ... FROM with DELETE t.* FROM to convert it to a DELETE statement to remove the rows.
Converting this to a DELETE statement raises error 1093 (You can't specify target table 't' for update in FROM clause).
One workaround is to make the query above into an inline view, and then join it to the target table:
DELETE q.*
FROM `setup_like_this` q
JOIN ( -- inline view, query from above returns `id` of rows we want to delete
SELECT t.`id`
, t.`group_id`
FROM `setup_like_this` t
WHERE t.`status` = 'available'
AND t.`id`
> ( SELECT s.`id`
FROM `setup_like_this` s
WHERE s.`status` = 'available'
AND s.`group_id` = t.`group_id`
ORDER
BY s.`id`
LIMIT 1,1
)
) r
ON r.id = q.id
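To see which rows the "greater than the second-lowest id" test selects, here is a small sketch on Python's sqlite3 (a stand-in for MySQL, so error 1093 does not arise and the ids are simply collected first and deleted afterwards). It writes LIMIT 1 OFFSET 1 in place of LIMIT 1,1, and group_3 is a made-up single-row group added to show the edge case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE setup_like_this "
    "(id INTEGER PRIMARY KEY, group_id TEXT, status TEXT)"
)
rows = [(i, "group_1", "available") for i in range(1, 6)]
rows += [(i, "group_2", "available") for i in range(6, 11)]
rows += [(11, "group_3", "available")]  # hypothetical group with one row
conn.executemany("INSERT INTO setup_like_this VALUES (?, ?, ?)", rows)

# Rows whose id is above the second-lowest available id of their group.
DELETE_IDS = """
SELECT t.id
FROM setup_like_this t
WHERE t.status = 'available'
  AND t.id > (SELECT s.id
              FROM setup_like_this s
              WHERE s.status = 'available'
                AND s.group_id = t.group_id
              ORDER BY s.id
              LIMIT 1 OFFSET 1)
ORDER BY t.id
"""

to_delete = [row[0] for row in conn.execute(DELETE_IDS)]
conn.executemany(
    "DELETE FROM setup_like_this WHERE id = ?", [(i,) for i in to_delete]
)
remaining = [
    row[0] for row in conn.execute("SELECT id FROM setup_like_this ORDER BY id")
]
print(to_delete, remaining)
```

For a group with fewer than two available rows the subquery yields NULL, the comparison is never true, and nothing in that group is deleted.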
select id, code, group_id, status
from (
select id, code, group_id, status
, ROW_NUMBER() OVER (
PARTITION BY group_id
ORDER BY id DESC) AS rownum
from a
) q
where rownum < 3

Is there a simpler way to find MODE(S) of some values in MySQL

MODE is the value that occurs the MOST times in the data; there can be ONE MODE or MANY MODES.
Here are some values in two tables (sqlFiddle):
create table t100(id int auto_increment primary key, value int);
create table t200(id int auto_increment primary key, value int);
insert into t100(value) values (1),
(2),(2),(2),
(3),(3),
(4);
insert into t200(value) values (1),
(2),(2),(2),
(3),(3),
(4),(4),(4);
right now, to get the MODE(S) returned as comma separated list, I run the below query for table t100
SELECT GROUP_CONCAT(value) as modes,occurs
FROM
(SELECT value,occurs FROM
(SELECT value,count(*) as occurs
FROM
T100
GROUP BY value)T1,
(SELECT max(occurs) as maxoccurs FROM
(SELECT value,count(*) as occurs
FROM
T100
GROUP BY value)T2
)T3
WHERE T1.occurs = T3.maxoccurs)T4
GROUP BY occurs;
and the below query for table t200 (the same query with just the table name changed). I have 2 tables in this example to show that it works both where there is 1 MODE and where there are multiple MODES.
SELECT GROUP_CONCAT(value) as modes,occurs
FROM
(SELECT value,occurs FROM
(SELECT value,count(*) as occurs
FROM
T200
GROUP BY value)T1,
(SELECT max(occurs) as maxoccurs FROM
(SELECT value,count(*) as occurs
FROM
T200
GROUP BY value)T2
)T3
WHERE T1.occurs = T3.maxoccurs)T4
GROUP BY occurs;
My question is "Is there a simpler way?"
I was thinking of using HAVING count(*) = max(count(*)) or something similar to get rid of the extra join, but couldn't get HAVING to return the result I wanted.
UPDATED:
As suggested by @zneak, I can simplify T3 like below:
SELECT GROUP_CONCAT(value) as modes,occurs
FROM
(SELECT value,occurs FROM
(SELECT value,count(*) as occurs
FROM
T200
GROUP BY value)T1,
(SELECT count(*) as maxoccurs
FROM
T200
GROUP BY value
ORDER BY count(*) DESC
LIMIT 1
)T3
WHERE T1.occurs = T3.maxoccurs)T4
GROUP BY occurs;
Now, is there a way to get rid of T3 altogether?
I tried this, but it returns no rows for some reason:
SELECT value,occurs FROM
(SELECT value,count(*) as occurs
FROM t200
GROUP BY `value`)T1
HAVING occurs=max(occurs)
basically I am wondering if there's a way to do it such that I only need to specify t100 or t200 once.
UPDATED: i found a way to specify t100 or t200 only once by adding a variable to set my own maxoccurs like below
SELECT GROUP_CONCAT(CASE WHEN occurs=@maxoccurs THEN value ELSE NULL END) as modes
FROM
(SELECT value,occurs,@maxoccurs:=GREATEST(@maxoccurs,occurs) as maxoccurs
FROM (SELECT value,count(*) as occurs
FROM t200
GROUP BY `value`)T1,(SELECT @maxoccurs:=0)mo
)T2
You are very close with the last query. The following finds one mode:
SELECT value, occurs
FROM (SELECT value,count(*) as occurs
FROM t200
GROUP BY `value`
ORDER BY occurs DESC
LIMIT 1
) T1
I think your question was about multiple modes, though:
SELECT value, occurs
FROM (SELECT value, count(*) as occurs
FROM t200
GROUP BY `value`
) T1
WHERE occurs = (select max(occurs)
from (select `value`, count(*) as occurs
from t200
group by `value`
) t
);
EDIT:
This is much easier in almost any other database. MySQL (before version 8.0) supports neither WITH nor window/analytic functions.
Your query (shown below) does not do what you think it is doing:
SELECT value, occurs
FROM (SELECT value, count(*) as occurs
FROM t200
GROUP BY `value`
) T1
HAVING occurs = max(occurs) ;
The final HAVING clause refers to the column occurs but also uses max(occurs). Because of max(occurs), this is an aggregation query that returns one row, summarizing all rows from the subquery.
The column occurs is not used for grouping. So, what value does MySQL use? It uses an arbitrary value from one of the rows in the subquery. This arbitrary value might match the maximum, or it might not. Either way, the value comes from only one row; there is no iteration over the rows.
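For reference, the multiple-modes query above can be exercised on the two sample tables. This is a minimal sketch using Python's sqlite3 in place of MySQL; the helper name modes is invented for the example, and the table name is checked against a fixed list because it cannot be passed as a bound parameter.

```python
import sqlite3

def modes(conn, table):
    """Return (value, occurs) for every mode of `table` (hypothetical helper)."""
    if table not in ("t100", "t200"):
        raise ValueError("unexpected table name")
    sql = f"""
    SELECT value, occurs
    FROM (SELECT value, COUNT(*) AS occurs FROM {table} GROUP BY value) AS t1
    WHERE occurs = (SELECT MAX(occurs)
                    FROM (SELECT COUNT(*) AS occurs
                          FROM {table}
                          GROUP BY value) AS t2)
    ORDER BY value
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t100 (id INTEGER PRIMARY KEY AUTOINCREMENT, value INT);
CREATE TABLE t200 (id INTEGER PRIMARY KEY AUTOINCREMENT, value INT);
INSERT INTO t100 (value) VALUES (1), (2), (2), (2), (3), (3), (4);
INSERT INTO t200 (value) VALUES (1), (2), (2), (2), (3), (3), (4), (4), (4);
""")

modes_t100 = modes(conn, "t100")
modes_t200 = modes(conn, "t200")
print(modes_t100, modes_t200)
```

The table is still named twice inside the query text, but the caller specifies it only once, which is close to what the question was after.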
I realize this is a very old question but in looking for the best way to find the MODE in a MySQL table, I came up with this:
SELECT [column name], count(*) as [ccount] FROM [table] WHERE [field] = [item] GROUP BY [column name] ORDER BY [ccount] DESC LIMIT 1 ;
In my actual situation, I had a log with recorded events in it. I wanted to know during which period (1, 2 or 3 as recorded in my log) the specific event occurred the greatest number of times (e.g., the MODE of the "period" column of the table for that specific event).
My table looked like this (abridged):
EVENT_TYPE | PERIOD
-------------------------
1 | 3
1 | 3
1 | 3
1 | 2
2 | 1
2 | 1
2 | 1
2 | 3
Using the query:
SELECT event_type, period, count(*) as pcount FROM proto_log WHERE event_type = 1 GROUP BY period ORDER BY pcount DESC LIMIT 1 ;
I get the result:
EVENT_TYPE | PERIOD | PCOUNT
--------------------------------------
1 | 3 | 3
Using this result, the period column ($result['period'] for example) should contain the MODE for that query and of course pcount contains the actual count.
If you wanted to get multiple modes, I suppose you could keep adding other criteria to your WHERE clause using ORs:
SELECT event_type, period, count(*) as pcount FROM proto_log WHERE event_type = 1 OR event_type = 2 GROUP BY period ORDER BY pcount DESC LIMIT 2 ;
The multiple ORs should give you the additional results, and the increased LIMIT will add the additional MODES to the results. (Otherwise it will still only show the top 1 result.)
Results:
EVENT_TYPE | PERIOD | PCOUNT
--------------------------------------
1 | 3 | 3
2 | 1 | 3
I am not 100% sure this is doing exactly what I think it is doing, or if it will work in all situations, so please let me know if I am on or off track here.

MySQL sorting by date with GROUP BY

My table titles looks like this
id |group|date |title
---+-----+--------------------+--------
1 |1 |2012-07-26 18:59:30 | Title 1
2 |1 |2012-07-26 19:01:20 | Title 2
3 |2 |2012-07-26 19:18:15 | Title 3
4 |2 |2012-07-26 20:09:28 | Title 4
5 |2 |2012-07-26 23:59:52 | Title 5
I need latest result from each group ordered by date in descending order. Something like this
id |group|date |title
---+-----+--------------------+--------
5 |2 |2012-07-26 23:59:52 | Title 5
2 |1 |2012-07-26 19:01:20 | Title 2
I tried
SELECT *
FROM `titles`
GROUP BY `group`
ORDER BY MAX( `date` ) DESC
but I'm getting the first rows from each group, like this:
id |group|date |title
---+-----+--------------------+--------
3 |2 |2012-07-26 19:18:15 | Title 3
1 |1 |2012-07-26 18:59:30 | Title 1
What am I doing wrong?
Is this query going to be more complicated if I use LEFT JOIN?
This page was very helpful to me; it taught me how to use self-joins to get the max/min/something-n rows per group.
In your situation, it can be applied to the effect you want like so:
SELECT * FROM
(SELECT `group`, MAX(`date`) AS `date` FROM titles GROUP BY `group`)
AS x JOIN titles USING (`group`, `date`);
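A quick way to convince yourself the join-on-max approach returns the rows asked for is to run it on the sample data. The sketch below does so with Python's sqlite3 as a stand-in for MySQL; it quotes the reserved names with double quotes and spells the join out with ON instead of USING, both of which are choices made for this illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE titles '
    '(id INTEGER PRIMARY KEY, "group" INTEGER, "date" TEXT, title TEXT)'
)
conn.executemany(
    "INSERT INTO titles VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2012-07-26 18:59:30", "Title 1"),
        (2, 1, "2012-07-26 19:01:20", "Title 2"),
        (3, 2, "2012-07-26 19:18:15", "Title 3"),
        (4, 2, "2012-07-26 20:09:28", "Title 4"),
        (5, 2, "2012-07-26 23:59:52", "Title 5"),
    ],
)

# Find the latest date per group, then join back to fetch the full row.
LATEST_PER_GROUP = """
SELECT t.id, t."group", t."date", t.title
FROM titles t
JOIN (
    SELECT "group", MAX("date") AS max_date
    FROM titles
    GROUP BY "group"
) x ON x."group" = t."group" AND x.max_date = t."date"
ORDER BY t."date" DESC
"""

latest = conn.execute(LATEST_PER_GROUP).fetchall()
for row in latest:
    print(row)
```

If two rows in a group share the same latest date, both come back, so a tie-breaker (for example the highest id) would be needed to guarantee one row per group.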
I found this topic via Google, looked like I had the same issue.
Here's my own solution if, like me, you don't like subqueries :
-- Create a temporary table like the output
CREATE TEMPORARY TABLE titles_tmp LIKE titles;
-- Add a unique key on where you want to GROUP BY
ALTER TABLE titles_tmp ADD UNIQUE KEY `group` (`group`);
-- Read the result into the tmp_table. Duplicates won't be inserted.
INSERT IGNORE INTO titles_tmp
SELECT *
FROM `titles`
ORDER BY `date` DESC;
-- Read the temporary table as output
SELECT *
FROM titles_tmp
ORDER BY `group`;
It performs much better. Here's how to increase speed further if the date column has the same order as the auto-increment one (you then don't need an ORDER BY clause):
-- Create a temporary table like the output
CREATE TEMPORARY TABLE titles_tmp LIKE titles;
-- Add a unique key on where you want to GROUP BY
ALTER TABLE titles_tmp ADD UNIQUE KEY `group` (`group`);
-- Read the result into the tmp_table, in the natural order. Duplicates will update the temporary table with the freshest information.
INSERT INTO titles_tmp
SELECT *
FROM `titles`
ON DUPLICATE KEY
UPDATE `id` = VALUES(`id`),
`date` = VALUES(`date`),
`title` = VALUES(`title`);
-- Read the temporary table as output
SELECT *
FROM titles_tmp
ORDER BY `group`;
Result :
+----+-------+---------------------+---------+
| id | group | date | title |
+----+-------+---------------------+---------+
| 2 | 1 | 2012-07-26 19:01:20 | Title 2 |
| 5 | 2 | 2012-07-26 23:59:52 | Title 5 |
+----+-------+---------------------+---------+
On large tables this method makes a significant difference in terms of performance.
Well, if dates are unique within a group this would work (if not, you'll see several rows that match the max date in a group). (Also, the column names are badly chosen: 'group' and 'date' might give you syntax errors, especially 'group', so quote them with backticks.)
select t1.* from titles t1, (select `group`, max(`date`) as `date` from titles group by `group`) t2
where t2.`date` = t1.`date`
and t1.`group` = t2.`group`
order by t1.`date` desc
Another approach is to make use of MySQL user variables to identify a "control break" in the group values.
If you can live with an extra column being returned, something like this will work:
SELECT IF(s.group = @prev_group,0,1) AS latest_in_group
, s.id
, @prev_group := s.group AS `group`
, s.date
, s.title
FROM (SELECT t.id,t.group,t.date,t.title
FROM titles t
ORDER BY t.group DESC, t.date DESC, t.id DESC
) s
JOIN (SELECT @prev_group := NULL) p
HAVING latest_in_group = 1
ORDER BY s.group DESC
What this is doing is ordering all the rows by group and by date in descending order. (We specify DESC on all the columns in the ORDER BY, in case there is an index on (group,date,id) that MySQL can do a "reverse scan" on. The inclusion of the id column gets us deterministic (repeatable) behavior in the case when there is more than one row with the latest date value.) That's the inline view aliased as s.
The "trick" we use is to compare the group value to the group value from the previous row. Whenever we have a different value, we know that we are starting a "new" group, and that this row is the "latest" row (we have the IF function return a 1). Otherwise (when the group values match), it's not the latest row (and we have the IF function return a 0).
Then, we filter out all the rows that don't have that latest_in_group set as a 1.
It's possible to remove that extra column by wrapping that query (as an inline view) in another query:
SELECT r.id
, r.group
, r.date
, r.title
FROM ( SELECT IF(s.group = @prev_group,0,1) AS latest_in_group
, s.id
, @prev_group := s.group AS `group`
, s.date
, s.title
FROM (SELECT t.id,t.group,t.date,t.title
FROM titles t
ORDER BY t.group DESC, t.date DESC, t.id DESC
) s
JOIN (SELECT @prev_group := NULL) p
HAVING latest_in_group = 1
) r
ORDER BY r.group DESC
If your id field is an auto-incrementing field, and it's safe to say that the highest value of the id field is also the highest value for the date of any group, then this is a simple solution:
SELECT b.*
FROM (SELECT MAX(id) AS maxid FROM titles GROUP BY `group`) a
JOIN titles b ON a.maxid = b.id
ORDER BY b.date DESC
Use the below mysql query to get latest updated/inserted record from table.
SELECT * FROM
(
select * from `titles` order by `date` desc
) as tmp_table
group by `group`
order by `date` desc
Use the following query to get the most recent record from each group
SELECT
T1.* FROM
(SELECT
MAX(ID) AS maxID
FROM
T2
GROUP BY Type) AS aux
INNER JOIN
T2 AS T1 ON T1.ID = aux.maxID ;
Where ID is your auto increment field and Type is the type of records, you wanted to group by.
MySQL uses a nonstandard extension of GROUP BY which is not reliable if you want to get such results. Therefore, you could use
select id, `group`, `date`, title from titles as t where id =
(select id from titles where `group` = t.`group` order by `date` desc limit 1);
In this query, the table is scanned in full for each group so that it can find the most recent date. I could not find any better alternative. Hope this helps someone.

Is it possible to add conditions to a MAX() call in an aggregated query?

Background
My typical use case:
# Table
id category dataUID
---------------------------
0 A (NULL)
1 B (NULL)
2 C text1
3 C text1
4 D text2
5 D text3
# Query
SELECT MAX(`id`) AS `id` FROM `table`
GROUP BY `category`
This is fine; it will strip out any "duplicate categories" in the recordset that's being worked on, giving me the "highest" ID for each category.
I can then go on use this ID to pull out all the data again:
# Query
SELECT * FROM `table` JOIN (
SELECT MAX(`id`) AS `id` FROM `table`
GROUP BY `category`
) _ USING(`id`)
# Result
id category dataUID
---------------------------
0 A (NULL)
1 B (NULL)
3 C text1
5 D text3
Note that this is not the same as:
SELECT MAX(`id`) AS `id`, `category`, `dataUID` FROM `table`
GROUP BY `category`
Per the documentation:
In standard SQL, a query that includes a GROUP BY clause cannot refer
to nonaggregated columns in the select list that are not named in the
GROUP BY clause. For example, this query is illegal in standard SQL
because the name column in the select list does not appear in the
GROUP BY:
SELECT o.custid, c.name, MAX(o.payment) FROM orders AS o, customers
AS c WHERE o.custid = c.custid GROUP BY o.custid;
For the query to be legal, the name column must be omitted from the
select list or named in the GROUP BY clause.
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group.
[..]
This extension assumes that the nongrouped columns will have the same group-wise values. Otherwise, the result is indeterminate.
So I'd get an unspecified value for dataUID — as an example, either text2 or text3 for result with id 5.
This is actually a problem for other fields in my real case; as it happens, for the dataUID column specifically, generally I don't really care which value I get.
Problem
However!
If any of the rows for a given category has a NULL dataUID, and at least one other row has a non-NULL dataUID, I'd like MAX to ignore the NULL ones.
So:
id category dataUID
---------------------------
4 D text2
5 D (NULL)
At present, since I pick out the row with the maximum ID, I get:
5 D (NULL)
But, because the dataUID is NULL, instead I want:
4 D text2
How can I get this? How can I add conditional logic to the use of aggregate MAX?
I thought of maybe handing MAX a tuple and pulling the id out from it afterwards:
GET_SECOND_PART_SOMEHOW(MAX((IF(`dataUID` IS NOT NULL, 1, 0), `id`))) AS `id`
But I don't think MAX will accept arbitrary expressions like that, let alone tuples, and I don't know how I'd retrieve the second part of the tuple after-the-fact.
Slight tweak to @ypercube's answer. To get the ids you can use
SELECT COALESCE(MAX(CASE
WHEN dataUID IS NOT NULL THEN id
END), MAX(id)) AS id
FROM `table`
GROUP BY category
And then plug that into a join
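Plugging that expression into a join might look like the sketch below. It runs on Python's sqlite3 rather than MySQL, and the table is called items here (a made-up name, chosen because table is a reserved word); the rows are the ones from the question, with row 5 carrying the NULL dataUID from the problem statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, category TEXT, dataUID TEXT)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        (0, "A", None),
        (1, "B", None),
        (2, "C", "text1"),
        (3, "C", "text1"),
        (4, "D", "text2"),
        (5, "D", None),
    ],
)

# Per category: prefer the highest id with a non-NULL dataUID,
# and fall back to the plain highest id when every dataUID is NULL.
PICK_ROWS = """
SELECT i.id, i.category, i.dataUID
FROM items AS i
JOIN (
    SELECT COALESCE(MAX(CASE WHEN dataUID IS NOT NULL THEN id END),
                    MAX(id)) AS id
    FROM items
    GROUP BY category
) AS pick ON pick.id = i.id
ORDER BY i.id
"""

picked = conn.execute(PICK_ROWS).fetchall()
for row in picked:
    print(row)
```

Because the comparison stays numeric, this form does not depend on how ids sort as strings.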
This was easier than I thought, in the end, because it turns out MySQL will accept an arbitrary expression inside MAX.
I can get the ordering I want by injecting a leading character into id to serve as an ordering hint:
SUBSTRING(MAX(IF (`dataUID` IS NULL, CONCAT('a',`id`), CONCAT('b',`id`))) FROM 2)
Walk-through:
id category dataUID IF (`dataUID` IS NULL, CONCAT('a',`id`), CONCAT('b',`id`))
--------------------------------------------------------------------------------------
0 A (NULL) a0
1 B (NULL) a1
2 C text1 b2
3 C text1 b3
4 D text2 b4
5 D (NULL) a5
So:
SELECT
`category`, MAX(IF (`dataUID` IS NULL, CONCAT('a',`id`), CONCAT('b',`id`))) AS `max_id_with_hint`
FROM `table`
GROUP BY `category`
category max_id_with_hint
------------------------------
A a0
B a1
C b3
D b4
It's then a simple matter to chop the ordering hint off again.
Thanks in particular to @JlStone for setting me, via COALESCE, on the path to embedding expressions inside the call to MAX and directly manipulating the values supplied to MAX.
From what I can remember you can use COALESCE inside of grouping statements. For example.
SELECT MAX(COALESCE(`id`,1)) ...
Hm, seems I read too quickly the first time. I think maybe you want something like this?
SELECT * FROM `table` JOIN (
SELECT MAX(`id`) AS `id` FROM `table`
WHERE `dataUID` IS NOT NULL
GROUP BY `category`
) _ USING(`id`)
or perhaps
SELECT MAX(`id`) AS `id`,
COALESCE (`dataUID`, 0) as `dataUID`
FROM `table`
GROUP BY `category`
select *
from t1
join (
select max(id) as id,
max(if(dataUID is NULL, NULL, id)) as fallbackid,
category
from t1 group by category) as ids
on if(ids.id = ids.fallbackid or ids.fallbackid is null, ids.id, ids.fallbackid) = t1.id;
SELECT t.*
FROM `table` AS t
JOIN
( SELECT DISTINCT category
FROM `table`
) AS tdc
ON t.id =
COALESCE(
( SELECT MAX(id) AS id
FROM `table`
WHERE category = tdc.category
AND dataUID IS NOT NULL
)
, ( SELECT MAX(id) AS id
FROM `table`
WHERE category = tdc.category
AND dataUID IS NULL
)
)
You need the OVER clause:
SELECT id, category,dataUID
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY category ORDER BY dataUID IS NULL, id desc ) rn,
id, category,dataUID FROM `table`
) q
WHERE rn=1
Sorting on dataUID IS NULL first moves rows with a NULL dataUID to the end of each category, so they are picked only when no other row exists.