GROUP BY a specific order? - mysql

I am trying to write a GROUP BY query where each group shows the row with the newest timestamp. However, I don't think it's possible to ORDER BY before a GROUP BY. Is the following subselect the only way to do what I'm trying to accomplish?
SELECT thread_id, content, timestamp FROM
(
SELECT thread_id, content, timestamp FROM messaging_message
ORDER BY thread_id, timestamp desc
) combined
GROUP BY thread_id
Note that a given thread_id may have multiple messages associated with it, and thus multiple content and timestamp values.

If I understand correctly and you want the most recent content per thread_id, use a MAX() aggregate to find the timestamp, and JOIN against it:
SELECT m.thread_id, m.content, m.timestamp
FROM
messaging_message m
JOIN (
SELECT thread_id, MAX(timestamp) AS maxts
FROM messaging_message
GROUP BY thread_id
) maxt ON m.thread_id = maxt.thread_id AND m.timestamp = maxt.maxts
The ORDER BY doesn't come into play at all by this method. It's all done by grouping.
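A quick way to check the join-against-MAX() approach is to run it on a few rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (the query is plain SQL that both accept); the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messaging_message (thread_id INTEGER, content TEXT, timestamp INTEGER);
    INSERT INTO messaging_message VALUES
        (1, 'first',  100),
        (1, 'second', 200),
        (2, 'reply',  120),
        (2, 'hello',  150);
""")

# The derived table finds each thread's newest timestamp; joining back on
# (thread_id, timestamp) recovers the content of that newest row.
LATEST_PER_THREAD = """
    SELECT m.thread_id, m.content, m.timestamp
    FROM messaging_message m
    JOIN (
        SELECT thread_id, MAX(timestamp) AS maxts
        FROM messaging_message
        GROUP BY thread_id
    ) maxt ON m.thread_id = maxt.thread_id AND m.timestamp = maxt.maxts
    ORDER BY m.thread_id  -- only makes the output order stable
"""
latest = conn.execute(LATEST_PER_THREAD).fetchall()
print(latest)  # [(1, 'second', 200), (2, 'hello', 150)]
```

If two messages in one thread share the same newest timestamp, this join returns both of them.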
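MySQL, unlike other RDBMSs, doesn't strictly require you to list every non-aggregated SELECT column in the GROUP BY (unless the ONLY_FULL_GROUP_BY SQL mode is enabled, which has been the default since MySQL 5.7.5), so you could probably just do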
SELECT thread_id, content, MAX(timestamp) AS maxts FROM messaging_message GROUP BY thread_id
However, that isn't portable, and the value returned for content is indeterminate: MySQL is free to take it from any row in the group, not necessarily the one holding MAX(timestamp). So I don't recommend it. Instead, the JOIN subquery returns the pair of thread_id and timestamp, and those are used to match up against the related content and any other columns you may need from the row. If you had a unique id on each row, you could also make a subquery that returns only the id for each MAX(timestamp) and use it inside an IN(). But absent that unique id, a join against the (thread_id, timestamp) pair does the job.
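A unique id pays off mainly when timestamps can tie. The sketch below assumes a hypothetical `id` primary key on messaging_message and again uses SQLite via Python's sqlite3 as a stand-in for MySQL, with made-up rows. It uses a correlated `=` rather than `IN()`, because MySQL rejects LIMIT inside an IN subquery; the tie-breaking rule (higher id wins) is an arbitrary choice for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messaging_message (
        id        INTEGER PRIMARY KEY,  -- hypothetical unique id column
        thread_id INTEGER,
        content   TEXT,
        timestamp INTEGER
    );
    INSERT INTO messaging_message VALUES
        (1, 1, 'first', 100),
        (2, 1, 'tie A', 200),
        (3, 1, 'tie B', 200),
        (4, 2, 'hello', 150);
""")

# Joining on (thread_id, MAX(timestamp)) returns BOTH tied rows for thread 1.
JOIN_ON_MAX = """
    SELECT m.id, m.thread_id, m.content, m.timestamp
    FROM messaging_message m
    JOIN (SELECT thread_id, MAX(timestamp) AS maxts
          FROM messaging_message GROUP BY thread_id) t
      ON m.thread_id = t.thread_id AND m.timestamp = t.maxts
    ORDER BY m.id
"""

# With a unique id, a correlated subquery picks exactly one id per thread:
# newest timestamp first, and the higher id wins a tie.
ONE_ID_PER_THREAD = """
    SELECT m.id, m.thread_id, m.content, m.timestamp
    FROM messaging_message m
    WHERE m.id = (
        SELECT m2.id
        FROM messaging_message m2
        WHERE m2.thread_id = m.thread_id
        ORDER BY m2.timestamp DESC, m2.id DESC
        LIMIT 1
    )
    ORDER BY m.thread_id
"""
tied = conn.execute(JOIN_ON_MAX).fetchall()
one_each = conn.execute(ONE_ID_PER_THREAD).fetchall()
print(tied)      # [(2, 1, 'tie A', 200), (3, 1, 'tie B', 200), (4, 2, 'hello', 150)]
print(one_each)  # [(3, 1, 'tie B', 200), (4, 2, 'hello', 150)]
```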

Related

Proper way to use MySQL GROUP BY for returning one result from a referenced table

I often have a situation with two tables in MySQL where I need one record for each foreign key. For example:
table post {id, ...}
table comment {id, post_id, ...}
SELECT * FROM comment GROUP BY post_id ORDER BY id ASC
-- Oldest comment for each post
or
table client {id, ...}
table payment {id, client_id, ...}
SELECT * FROM payment GROUP BY client_id ORDER BY id DESC
-- Most recent payment from each client
These queries often fail with the error "SELECT list is not in GROUP BY clause and contains nonaggregated column".
Failed Solutions
I can usually work around this with min()/max(), but that creates a very slow query with mismatched results (the row with min(id) is not necessarily the row with min(textfield)).
SELECT min(id), min(textfield), ... FROM table GROUP BY fk_id
Adding all the columns to GROUP BY returns multiple rows per fk_id, which defeats the purpose of GROUP BY.
SELECT id, textfield, ... FROM table GROUP BY fk_id, id, textfield
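The mismatch described under the first failed solution is easy to reproduce: each MIN() is computed independently per column. A small sketch with SQLite through Python's sqlite3 module; the table name `items` and its rows are hypothetical (the question's `table` placeholder is a reserved word).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, fk_id INTEGER, textfield TEXT);
    INSERT INTO items VALUES
        (1, 10, 'zebra'),
        (2, 10, 'apple'),
        (3, 20, 'mango');
""")

# Each MIN() is evaluated on its own column, so the two values can come
# from different rows of the same group.
mixed = conn.execute(
    "SELECT MIN(id), MIN(textfield) FROM items GROUP BY fk_id ORDER BY fk_id"
).fetchall()

# The (id, textfield) pairs that actually exist, for comparison.
real_rows = set(conn.execute("SELECT id, textfield FROM items").fetchall())

print(mixed)                  # [(1, 'apple'), (3, 'mango')]
print(mixed[0] in real_rows)  # False: no row has id 1 and textfield 'apple'
```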
Same idea as GurV's answer, but using a join instead of a correlated subquery. The basic idea here is that the subquery finds, for each post that has comments, the oldest comment and its corresponding id in the comments table. We then join back to comments again to restrict the result to the records we want.
SELECT t1.*
FROM comments t1
INNER JOIN
(
SELECT post_id, MIN(id) AS min_id
FROM comments
GROUP BY post_id
) t2
ON t1.post_id = t2.post_id AND
t1.id = t2.min_id
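To see the join in action, here is a sketch on a few invented rows, run with SQLite via Python's sqlite3 as a stand-in for MySQL. The `body` column is hypothetical, and "oldest" assumes ids only grow over time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
    INSERT INTO comments VALUES
        (1, 100, 'first on 100'),
        (2, 200, 'first on 200'),
        (3, 100, 'second on 100'),
        (4, 200, 'second on 200');
""")

# t2 holds the smallest comment id per post; joining back to comments
# returns the full row for that id.
OLDEST_PER_POST = """
    SELECT t1.*
    FROM comments t1
    INNER JOIN (
        SELECT post_id, MIN(id) AS min_id
        FROM comments
        GROUP BY post_id
    ) t2
      ON t1.post_id = t2.post_id AND t1.id = t2.min_id
    ORDER BY t1.post_id
"""
oldest = conn.execute(OLDEST_PER_POST).fetchall()
print(oldest)  # [(1, 100, 'first on 100'), (2, 200, 'first on 200')]
```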
You can use a correlated subquery with aggregation to find the earliest comment for each post:
select *
from comments c1
where id = (
select min(id)
from comments c2
where c1.post_id = c2.post_id
)
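The same correlated pattern with MAX() instead of MIN() covers the question's second case, the most recent payment from each client. A sketch on SQLite via Python's sqlite3 (standing in for MySQL); the `amount_cents` column and the rows are hypothetical, and "most recent" assumes the largest id is the newest payment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payment (id INTEGER PRIMARY KEY, client_id INTEGER, amount_cents INTEGER);
    INSERT INTO payment VALUES
        (1, 1, 1000),
        (2, 2,  500),
        (3, 1,  750),
        (4, 3, 2000),
        (5, 2,  100);
""")

# For each outer row p1, the subquery recomputes the largest id for that
# row's client; only the row whose id equals it survives.
NEWEST_PAYMENT = """
    SELECT *
    FROM payment p1
    WHERE id = (
        SELECT MAX(id)
        FROM payment p2
        WHERE p2.client_id = p1.client_id
    )
    ORDER BY client_id
"""
newest = conn.execute(NEWEST_PAYMENT).fetchall()
print(newest)  # [(3, 1, 750), (5, 2, 100), (4, 3, 2000)]
```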
A compound index on comments(post_id, id) should be helpful.
On a large table, running this for every post can be slow. The query is more useful and performant if you are querying a small subset of posts; if you are querying the whole table, then Tim's answer is better suited, I think.

GROUP BY in subquery to get accurate ranking

I'm trying to get the rank of a particular lap time of a specific track owned by a particular user.
There are multiple rows (laps) in this table for a specific user. So I'm trying to GROUP BY as seen in the subquery of FIND_IN_SET.
Right now MySQL (latest version) is complaining that my session_id, user_id, track_id and duration are not aggregated for the GROUP BY.
I don't understand why it's complaining about this, since the GROUP BY is in a subquery.
session_lap_times schema:
session_id, int
user_id, int
track_id, int
duration, decimal
This is what I've got so far.
SELECT
session_id,
user_id,
track_id,
duration,
FIND_IN_SET( duration,
(SELECT GROUP_CONCAT( duration ORDER BY duration ASC ) FROM
(SELECT user_id, track_id, MIN(duration) AS duration
FROM session_lap_times GROUP BY user_id,track_id) AS aa WHERE track_id=s1.track_id)
) as ranking
FROM session_lap_times s1
WHERE user_id=1
It seems like it's trying to enforce the GROUP BY rules on the parent queries as well.
For reference, this is the error I'm getting: http://imgur.com/a/ILufE
Any help is greatly appreciated.
If I'm not mistaken, the problem is here (broken out for clarity):
SELECT user_id,track_id,any_value(duration)
FROM session_lap_times
GROUP BY user_id
The query is probably barfing because track_id is in the select and not in the group by. That means the subselect doesn't stand on its own and makes the whole thing fail.
Try adding track_id to your group by and adjust from there.
You are grouping by user_id, but the other columns in the following subquery are neither grouped nor aggregated; wrap them in an aggregate such as ANY_VALUE():
SELECT
user_id,any_value(track_id),any_value(duration)
FROM session_lap_times GROUP BY user_id
You are also using GROUP_CONCAT in the wrong context in the following subquery, because you do not group by any column of the derived table used for ranking:
(SELECT GROUP_CONCAT( duration ORDER BY duration ASC ) FROM
(SELECT user_id,track_id,any_value(duration)
FROM session_lap_times GROUP BY user_id,track_id) AS aa WHERE track_id=s1.track_id)
) as ranking
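Assuming the goal is the rank of user 1's best lap on each track among every user's best lap on that track, the GROUP BY user_id, track_id derived table can be ranked without GROUP_CONCAT and FIND_IN_SET by counting faster best laps. This swaps in a different technique (a CTE plus a correlated COUNT) from the one in the question; MySQL 8.0 supports CTEs, and there the RANK() window function would be another option. The sketch runs on SQLite via Python's sqlite3 with made-up lap times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session_lap_times (
        session_id INTEGER, user_id INTEGER, track_id INTEGER, duration REAL
    );
    INSERT INTO session_lap_times VALUES
        (1, 1, 7, 62.5),
        (2, 1, 7, 61.0),
        (3, 2, 7, 60.2),
        (4, 3, 7, 63.9),
        (5, 1, 8, 45.0),
        (6, 2, 8, 47.3);
""")

# best_laps: one row per (user, track) holding that user's fastest lap.
# ranking: 1 + the number of users with a strictly faster best lap on the
# same track, so ties share a rank (the same numbering RANK() produces).
RANK_OF_USER_1 = """
    WITH best_laps AS (
        SELECT user_id, track_id, MIN(duration) AS best_time
        FROM session_lap_times
        GROUP BY user_id, track_id
    )
    SELECT b.track_id,
           b.best_time,
           1 + (SELECT COUNT(*)
                FROM best_laps o
                WHERE o.track_id = b.track_id
                  AND o.best_time < b.best_time) AS ranking
    FROM best_laps b
    WHERE b.user_id = 1
    ORDER BY b.track_id
"""
ranks = conn.execute(RANK_OF_USER_1).fetchall()
print(ranks)  # [(7, 61.0, 2), (8, 45.0, 1)]
```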

Do I need inner ORDER BY when there is an outer ORDER BY?

Here is my query:
( SELECT id, table_code, seen, date_time FROM events
WHERE author_id = ? AND seen IS NULL
) UNION
( SELECT id, table_code, seen, date_time FROM events
WHERE author_id = ? AND seen IS NOT NULL
LIMIT 2
) UNION
( SELECT id, table_code, seen, date_time FROM events
WHERE author_id = ?
ORDER BY (seen IS NULL) desc, date_time desc -- inner ORDER BY
LIMIT 15
)
ORDER BY (seen IS NULL) desc, date_time desc; -- outer ORDER BY
As you can see, there is an outer ORDER BY, and one of the subqueries also has its own ORDER BY. I believe the ORDER BY in the subquery is useless because the final result will be sorted by the outer one. Am I right? Or does the inner ORDER BY have any effect on the sorting?
Also, a second question about the query above: in reality I only need id and table_code. I've selected seen and date_time just for the outer ORDER BY. Can I do that better?
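You need the inner order by when you have a limit in the query: the third subquery chooses which 15 rows to return based on its order by. By the same logic, the second subquery has LIMIT 2 with no order by, so which two rows it returns is not defined.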
In general, when you have limit, you should be using order by. This is particularly true if you are learning databases. You might seem to get the right answer -- and then be very surprised when it doesn't work at some later point in time. Just because something seems to work doesn't mean that it is guaranteed to work.
The outer order by just sorts all the rows returned by the subqueries.
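A small experiment makes the point concrete: with the same outer ORDER BY, changing only the inner ORDER BY changes which rows a LIMIT keeps. The sketch uses SQLite through Python's sqlite3 with made-up rows. SQLite does not accept ORDER BY/LIMIT on an individual UNION member, so the limited query is wrapped as a derived table instead of reproducing the full UNION; `picked_ids` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, seen INTEGER, date_time INTEGER);
    INSERT INTO events VALUES
        (1, NULL, 10),
        (2, 1,    20),
        (3, NULL, 30),
        (4, 1,    40),
        (5, 1,    50);
""")


def picked_ids(inner_direction):
    """Return the ids kept by an inner ORDER BY ... LIMIT 2, in outer ORDER BY order."""
    if inner_direction not in ("ASC", "DESC"):
        raise ValueError("inner_direction must be 'ASC' or 'DESC'")
    sql = f"""
        SELECT id FROM (
            SELECT id, seen, date_time FROM events
            ORDER BY date_time {inner_direction}  -- decides WHICH 2 rows survive
            LIMIT 2
        )
        ORDER BY (seen IS NULL) DESC, date_time DESC  -- only reorders those rows
    """
    return [row[0] for row in conn.execute(sql)]


print(picked_ids("DESC"))  # [5, 4]
print(picked_ids("ASC"))   # [1, 2]
```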

SQL find distinct and show other columns

I have read many replies to similar questions but cannot seem to apply them to my situation. I have a table that averages 10,000 records and is ever changing. It contains a column called deviceID, which has about 20 unique values, another called dateAndTime, and many others including status1 and status2. I need to isolate one instance of each deviceID, showing the record that has the most recent dateAndTime. This works great using:
select DISTINCT deviceID, MAX(dateAndTime)
from MyTable
Group By deviceID
ORDER BY MAX(dateAndTime) DESC
(I have noticed that omitting DISTINCT from the above statement yields the same result.)
However, I cannot expand this statement to include the status fields without getting errors or incorrect results. I have tried using IN and EXISTS to isolate rows, all without luck. I am wondering how I can nest or rewrite this query so that the results display the unique deviceIDs, the date of the most recent record, and the corresponding status fields of those records.
If you can guarantee that each (DeviceID, DateAndTime) pair is unique, you can do the following:
SELECT *
FROM
MyTable as T1,
(SELECT DeviceID, max(DateAndTime) as mx FROM MyTable group by DeviceID) as T2
WHERE
T1.DeviceID = T2.DeviceID AND
T1.DateAndTime = T2.mx
So basically, you do a GROUP BY on DeviceID (note: a GROUP BY normally goes with an aggregate function; we are using MAX in this case).
Then you join that derived table back to the table, matching DeviceID and DateAndTime in the WHERE clause.
Side note: GROUP BY returns distinct values with or without DISTINCT, because it already produces exactly one row per group.
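Here is that join run end to end on a few invented rows, using SQLite through Python's sqlite3 as a stand-in for the asker's database. Storing dateAndTime as ISO-8601 text (which sorts chronologically, so MAX() works) is an assumption of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (deviceID TEXT, dateAndTime TEXT, status1 TEXT, status2 TEXT);
    INSERT INTO MyTable VALUES
        ('A', '2024-01-01 10:00', 'ok',   'idle'),
        ('A', '2024-01-02 09:00', 'warn', 'busy'),
        ('B', '2024-01-01 12:00', 'ok',   'busy'),
        ('B', '2023-12-31 08:00', 'err',  'idle');
""")

# T2 = newest dateAndTime per device; matching T1 on both columns brings
# back that record's status fields.
LATEST_STATUS = """
    SELECT T1.deviceID, T1.dateAndTime, T1.status1, T1.status2
    FROM MyTable AS T1,
         (SELECT deviceID, MAX(dateAndTime) AS mx
          FROM MyTable GROUP BY deviceID) AS T2
    WHERE T1.deviceID = T2.deviceID
      AND T1.dateAndTime = T2.mx
    ORDER BY T1.dateAndTime DESC
"""
latest_status = conn.execute(LATEST_STATUS).fetchall()
print(latest_status)
# [('A', '2024-01-02 09:00', 'warn', 'busy'), ('B', '2024-01-01 12:00', 'ok', 'busy')]
```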
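Alternatively, with window functions (MySQL 8.0+, SQL Server, PostgreSQL and others), which return exactly one row per deviceID even if two records share the same dateAndTime:
SELECT a.*
FROM( SELECT *,
ROW_NUMBER() OVER (PARTITION BY deviceID ORDER BY dateAndTime DESC) as rown
FROM MyTable ) a
WHERE a.rown = 1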
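The one-row-per-device property is easiest to see with a tie. This sketch runs the ROW_NUMBER() query on SQLite via Python's sqlite3 (window functions need SQLite 3.25 or newer); the rows are invented, and device 'A' deliberately has two records with the same newest dateAndTime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (deviceID TEXT, dateAndTime TEXT, status1 TEXT, status2 TEXT);
    INSERT INTO MyTable VALUES
        ('A', '2024-01-02 09:00', 'warn', 'busy'),
        ('A', '2024-01-02 09:00', 'ok',   'idle'),  -- same timestamp as the row above
        ('A', '2024-01-01 10:00', 'ok',   'idle'),
        ('B', '2024-01-01 12:00', 'ok',   'busy');
""")

# ROW_NUMBER() numbers each device's rows from newest to oldest; keeping
# rown = 1 leaves exactly one row per device, even when timestamps tie
# (which of the tied rows wins is then arbitrary).
LATEST_ONE_PER_DEVICE = """
    SELECT deviceID, dateAndTime, status1, status2
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY deviceID
                                  ORDER BY dateAndTime DESC) AS rown
        FROM MyTable
    ) a
    WHERE a.rown = 1
    ORDER BY deviceID
"""
one_per_device = conn.execute(LATEST_ONE_PER_DEVICE).fetchall()
print(len(one_per_device))  # 2
print(one_per_device[1])    # ('B', '2024-01-01 12:00', 'ok', 'busy')
```

A MAX()-join on the same data would return both tied 'A' rows.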

How do I specify the order of GROUP BY?

How do I ensure, when I GROUP BY QID, that only the most recent row is returned?
ID, QID, VALUE, TIMESTAMP
45,1,Male,1362044759
58,1,Female,1362045122
59,1,Male,1362045149
60,1,Female,1362045153
82,1,Female,1362045863
83,1,Female,1362045887
92,1,Male,1362046012
101,1,Female,1362046401
SELECT ID, QID, VALUE, TIMESTAMP FROM table GROUP BY QID
...returns the first row. I can't simply use LIMIT 1: this is just an example, and there are lots of QIDs in the table, all of which are grouped.
Thanks.
I'm assuming here you want the "latest" row for each QID. You would normally use a derived-table subquery to get each QID's latest TIMESTAMP value and then join on that:
SELECT ...
FROM myTable AS t
INNER JOIN (SELECT QID, MAX(`TIMESTAMP`) AS MaxT FROM myTable GROUP BY QID) l
ON t.QID = l.QID AND l.MaxT = t.`TIMESTAMP`
This is also assuming your TIMESTAMP column increases as time goes on.
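Running the derived-table join against the question's own rows confirms it keeps ID 101 for QID 1. The sketch uses SQLite via Python's sqlite3 as a stand-in for MySQL (SQLite also accepts MySQL-style backtick quoting). The select list fills in the answer's `SELECT ...` for illustration, and the two QID 2 rows are hypothetical, added only so there is more than one group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (ID INTEGER, QID INTEGER, `VALUE` TEXT, `TIMESTAMP` INTEGER)"
)
rows = [
    # Rows from the question.
    (45, 1, "Male", 1362044759),
    (58, 1, "Female", 1362045122),
    (59, 1, "Male", 1362045149),
    (60, 1, "Female", 1362045153),
    (82, 1, "Female", 1362045863),
    (83, 1, "Female", 1362045887),
    (92, 1, "Male", 1362046012),
    (101, 1, "Female", 1362046401),
    # Hypothetical extra QID, added only to show more than one group.
    (102, 2, "Yes", 1362046500),
    (103, 2, "No", 1362046450),
]
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?)", rows)

# l = newest TIMESTAMP per QID; the join keeps the row carrying it.
LATEST_PER_QID = """
    SELECT t.ID, t.QID, t.`VALUE`, t.`TIMESTAMP`
    FROM myTable AS t
    INNER JOIN (SELECT QID, MAX(`TIMESTAMP`) AS MaxT FROM myTable GROUP BY QID) l
      ON t.QID = l.QID AND l.MaxT = t.`TIMESTAMP`
    ORDER BY t.QID
"""
latest_per_qid = conn.execute(LATEST_PER_QID).fetchall()
print(latest_per_qid)  # [(101, 1, 'Female', 1362046401), (102, 2, 'Yes', 1362046500)]
```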
If you want the most recent record returned:
SELECT *
FROM TBL
ORDER BY `TIMESTAMP` DESC
LIMIT 1;
Otherwise, if you want the most recent record for each QID group, check this Stack Overflow post, which treats the same problem with optimal solutions.
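You could use GROUP_CONCAT to extract grouped data, keeping only the first ID of each group's list when it is ordered by TIMESTAMP descending:
SELECT QID, SUBSTRING_INDEX(GROUP_CONCAT(ID ORDER BY `TIMESTAMP` DESC), ',', 1) AS latest_id
FROM myTable
GROUP BY QID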