SQL - GROUP BY max value - mysql

Note: I'm not sure I gave this question the most fitting title, since I'm not sure of the correct approach, but I couldn't find other examples anywhere because it's quite a specific query.
So, I have a table "votes", which is filled with votes created by users (each uniquely identified by a number in the user_id column) that correspond to relevant posts in another table (each vote record "upvotes" the relevant post in the user interface).
I want to sort these votes by datetime, returning only the latest vote for each post (post_id column), so that no post_id is returned more than once.
I input the following query:
SELECT id, user_id, post_id, created, MAX(created)
FROM votes
GROUP BY post_id, user_id
ORDER BY max(created) DESC
And get returned:
Table: votes
id  | user_id | post_id | created             | MAX(created)
----+---------+---------+---------------------+---------------------
115 | 1       | 42      | 2014-07-03 23:08:31 | 2016-03-07 12:08:31
237 | 2       | 101     | 2014-02-13 23:05:14 | 2016-03-05 23:05:14
431 | 7       | 944     | 2014-10-22 22:58:37 | 2016-03-03 19:58:37
255 | 15      | 101     | 2014-02-15 14:02:01 | 2016-02-01 23:05:14
... | ...     | ...     | ...                 | ...
As you can see, post_id "101" appears twice. The query seems to sort by the maximum created time for each user_id, which produces duplicate post_id values: there are two rows with post_id "101", when I would like to display only the one that has the maximum created time (MAX(created)).
It seems the post_id and user_id columns must be grouped together; otherwise, if I group by post_id alone, I can't sort by MAX(created), since it won't return the max(created) for each post_id.
How do I remove these duplicated post_id values that don't return the maximum created time?
What I'm after:
Table: votes
id  | user_id | post_id | created             | MAX(created)
----+---------+---------+---------------------+---------------------
115 | 1       | 42      | 2014-07-03 23:08:31 | 2016-03-07 12:08:31
237 | 2       | 101     | 2014-02-13 23:05:14 | 2016-03-05 23:05:14
431 | 7       | 944     | 2014-10-22 22:58:37 | 2016-03-03 19:58:37
... | ...     | ...     | ...                 | ...

Assuming you only want the last vote for each post:
SELECT v.*
FROM posts p
JOIN votes v
ON v.id =
(
SELECT id
FROM votes vi
WHERE post_id = p.id
ORDER BY
created DESC
LIMIT 1
)
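A minimal runnable sketch of the correlated-subquery join above, using Python's built-in sqlite3 module with an in-memory SQLite database standing in for MySQL (an assumption; the SQL used here behaves the same in both). The posts/votes table definitions and the sample rows are hypothetical, loosely based on the question's example.

```python
import sqlite3

# Hypothetical schema and data, loosely modelled on the question's example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY);
CREATE TABLE votes (id INTEGER PRIMARY KEY, user_id INTEGER, post_id INTEGER, created TEXT);
INSERT INTO posts (id) VALUES (42), (101), (944);
INSERT INTO votes VALUES
  (115, 1, 42, '2016-03-07 12:08:31'),
  (237, 2, 101, '2016-03-05 23:05:14'),
  (255, 15, 101, '2016-02-01 23:05:14'),
  (431, 7, 944, '2016-03-03 19:58:37');
""")

# For each post, the correlated subquery returns the id of its newest vote,
# so the join keeps exactly one vote row per post.
rows = conn.execute("""
SELECT v.id, v.user_id, v.post_id, v.created
FROM posts p
JOIN votes v
  ON v.id = (
    SELECT vi.id
    FROM votes vi
    WHERE vi.post_id = p.id
    ORDER BY vi.created DESC
    LIMIT 1
  )
ORDER BY v.created DESC
""").fetchall()

for row in rows:
    print(row)
```

Post 101 appears once, represented by vote 237, because vote 255 is older.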

If you are looking for the last user_id who voted on each post_id, group by post_id and order by time descending (or by id, if it is auto-increment).
SELECT post_id,
       SUBSTRING_INDEX(GROUP_CONCAT(user_id ORDER BY created DESC), ',', 1) AS last_user_id,
       MAX(created) AS last_vote,
       GROUP_CONCAT('(', user_id, ',', created, ')' ORDER BY created DESC) AS myhistory
FROM votes
GROUP BY post_id
ORDER BY last_vote DESC
If you need the (user_id, time) history, the GROUP_CONCAT function can build it, as in the myhistory column.

SELECT maintable.*
FROM TABLE_NAME maintable
LEFT OUTER JOIN TABLE_NAME temporarytable
ON maintable.GROUPING_BY_COLUMN = temporarytable.GROUPING_BY_COLUMN
AND maintable.COLUMN_WHERE_THE_MAXIMUM_IS_NEEDED < temporarytable.COLUMN_WHERE_THE_MAXIMUM_IS_NEEDED
WHERE temporarytable.COLUMN_WHERE_THE_MAXIMUM_IS_NEEDED IS NULL
ORDER BY maintable.PRIMARY_KEY_COLUMN DESC
LIMIT 50;
This is an alternative way to get the maximum value from a group. Unlike a GROUP BY solution, this query needs no aggregation.
In addition, in MySQL versions before 8.0, GROUP BY implicitly sorts the groups, which also takes time.
The query joins the table to itself, pairing each row with every row in the same group that has a larger value. A row for which no larger value is found (so the joined temporarytable columns are NULL) holds the maximum.
This can save time when fetching the maximum value per group. If several rows tie for the maximum, all of them are returned.
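Here is a minimal sketch of the LEFT JOIN / IS NULL pattern applied to the votes table from the question, run through Python's sqlite3 module against an in-memory SQLite database (an assumption standing in for MySQL). The placeholders are filled in as: grouping column = post_id, maximum column = created; the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical data; post 101 has two votes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE votes (id INTEGER PRIMARY KEY, user_id INTEGER, post_id INTEGER, created TEXT);
INSERT INTO votes VALUES
  (115, 1, 42, '2016-03-07 12:08:31'),
  (237, 2, 101, '2016-03-05 23:05:14'),
  (255, 15, 101, '2016-02-01 23:05:14'),
  (431, 7, 944, '2016-03-03 19:58:37');
""")

# Pair each vote (m) with every newer vote (t) on the same post.
# When no newer vote exists, t's columns are NULL and m is the latest vote.
rows = conn.execute("""
SELECT m.id, m.user_id, m.post_id, m.created
FROM votes m
LEFT OUTER JOIN votes t
  ON m.post_id = t.post_id
 AND m.created < t.created
WHERE t.created IS NULL
ORDER BY m.created DESC
""").fetchall()

for row in rows:
    print(row)
```

Vote 255 drops out because vote 237 on the same post is newer.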

Related

MySQL group by with left join

I am trying to do a very complex query (at least extremely complex for me not for YOU :) )
I have a users table and a comments table.
SQL Fiddle: http://sqlfiddle.com/#!9/b1f845/2
select user_id, status_id from comments where user_id in (2,3);
+---------+-----------+
| user_id | status_id |
+---------+-----------+
| 2 | 10 |
| 2 | 10 |
| 2 | 10 |
| 2 | 7 |
| 2 | 7 |
| 2 | 10 |
| 3 | 9 |
| 2 | 9 |
| 2 | 6 |
+---------+-----------+
If I use
select user_id, status_id from comments where user_id in (2,3)
It returns a lot of duplicate values.
What I want to get, if possible:
As you can see, status_id = 10 has user_id 2, 3 and 4, with user 2 appearing multiple times.
From this, I want the latest distinct user_ids, highest first; in this example, user_id = 4 and 2. Now the really complex part: I want the user information for user_id 4 and 2 in separate columns, so that in the end I get something like this:
status_id | userOneUserName | userTwoUserName
----------+-----------------+----------------
10        | sadek4          | iamsadek2
7         | iamsadek2       | null
9         | iamsadek2       | sadek2
6         | iamsadek2       | null
How can I achieve something this complex?
Currently I have to do it using application logic.
Thank you for your time.
I think this might be what you literally want here:
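SELECT DISTINCT
    c1.status_id,
    -- highest user_id for this status
    (SELECT MAX(c2.user_id) FROM comments c2
      WHERE c2.status_id = c1.status_id) AS user_1,
    -- second-highest distinct user_id for this status (NULL if there is none)
    (SELECT DISTINCT c2.user_id FROM comments c2
      WHERE c2.status_id = c1.status_id
      ORDER BY c2.user_id DESC LIMIT 1 OFFSET 1) AS user_2
FROM comments c1
WHERE c1.user_id IN (2,3);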
We can use correlated subqueries to find the max user_id and the second-to-max user_id for each status_id, and then spin each of those out as a separate column. A GROUP_CONCAT approach might be preferable here, since it would let you accommodate any number of users as a CSV list.
Also, if you were using MySQL 8.0 or later, you could take advantage of window (ranking) functions such as DENSE_RANK(), which would make this easier.
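A sketch of the window-function approach, assuming a database with window-function support (MySQL 8.0+, or SQLite 3.25+ as used here through Python's sqlite3 module). DENSE_RANK() gives duplicate rows for the same user the same rank, and conditional aggregation then pivots ranks 1 and 2 into two columns. The sample data is the asker's output for user_id IN (2,3); joining to the users table to fetch names is left out, since its columns are not shown.

```python
import sqlite3

# Sample rows from the question's comments table (user_id, status_id).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (user_id INTEGER, status_id INTEGER);
INSERT INTO comments VALUES
  (2, 10), (2, 10), (2, 10), (2, 7), (2, 7), (2, 10), (3, 9), (2, 9), (2, 6);
""")

rows = conn.execute("""
WITH ranked AS (
  -- rank distinct users per status, highest user_id first; duplicates share a rank
  SELECT status_id, user_id,
         DENSE_RANK() OVER (PARTITION BY status_id ORDER BY user_id DESC) AS rnk
  FROM comments
  WHERE user_id IN (2, 3)
)
SELECT status_id,
       MAX(CASE WHEN rnk = 1 THEN user_id END) AS user_1,  -- highest user
       MAX(CASE WHEN rnk = 2 THEN user_id END) AS user_2   -- second highest, or NULL
FROM ranked
GROUP BY status_id
ORDER BY status_id DESC
""").fetchall()

for row in rows:
    print(row)
```

Adding more `CASE WHEN rnk = n` columns extends this to more users per status.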
select status_id, GROUP_CONCAT(DISTINCT user_id SEPARATOR ',')
from comments
group by status_id
I would suggest using GROUP BY and GROUP_CONCAT, e.g. like so:
SELECT status_id, GROUP_CONCAT(userName ORDER BY c.user_id) AS users, GROUP_CONCAT(c.user_id ORDER BY c.user_id) AS user_ids
FROM (
SELECT DISTINCT status_id, user_id FROM comments WHERE user_id in (2,3)
) c
JOIN users u ON (c.user_id = u.id)
GROUP BY status_id
ORDER BY status_id DESC

How can I count summed rows as one row in LIMIT?

I want to select a user's notifications according to these rules:
all unread notifications
always 2 read notifications
at least 15 notifications (by default)
Here is my query, which gets the user's notification ids:
( SELECT id FROM events -- all unread messages
WHERE author_id = ? AND seen = 0
) UNION
( SELECT id FROM events -- 2 read messages
WHERE author_id = ? AND seen <> 0
ORDER BY date_time desc
LIMIT 2
) UNION
( SELECT id FROM events -- at least 15 rows by default
WHERE author_id = ?
ORDER BY seen, date_time desc
LIMIT 15
)
Then I select the ids matched by the query above, plus other info, like this (for practical reasons I don't want to combine these two queries):
SELECT SUM(score) score, post_id, title, content, date_time
FROM events
WHERE id IN ($ids)
GROUP BY post_id, title, content, date_time
ORDER BY MIN(seen), MAX(date_time) desc
It works fine.
The problem: when the first query selects 15 rows that all have the same post_id, the second query sums them up and shows them as a single notification row with the total score.
I guess I have to add that SUM() and that GROUP BY to the first query as well? Any ideas?
An example of the problem: if a user earns 15 upvotes, the first query selects them as 15 notifications, and the second query turns them into one notification. How can I get 15 separate notifications? (The notifications that will be summed in the second query should be counted as one notification in the first query. How?)
As you ultimately want (at least) 15 groups, in my opinion your rules should apply to groups rather than to individual messages.
You can aggregate your data per group and then check whether the group shall be in your results. You'd do this in the HAVING clause with conditional aggregation, i.e. an aggregation function used on a conditional expression. This is one method to count unread messages for example:
SUM(CASE WHEN seen = 0 THEN 1 ELSE 0 END)
This is another:
COUNT(CASE WHEN seen = 0 THEN 1 END)
(The ELSE branch is omitted and defaults to null, which is not counted.)
In MySQL these expressions are even simpler, because false equals 0 and true equals 1. So in MySQL you'd count with:
SUM(seen = 0)
You can use other aggregation functions, too:
HAVING MAX(seen = 0) = 0 -- no unread messages
HAVING MIN(seen = 0) = 1 -- no read messages
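To check that these conditional-aggregation forms agree, here is a small sketch run through Python's sqlite3 module. SQLite, like MySQL, evaluates `seen = 0` to 1 or 0, so the boolean shortcuts work unchanged there (an assumption about the stand-in engine, not about your MySQL setup). The events rows are hypothetical.

```python
import sqlite3

# Hypothetical events: post 11 has one unread and one read row,
# post 33 only unread rows, post 55 only read rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, post_id INTEGER, seen INTEGER);
INSERT INTO events (id, post_id, seen) VALUES
  (1, 11, 0), (5, 11, 1),
  (3, 33, 0),
  (8, 55, 1), (9, 55, 1);
""")

rows = conn.execute("""
SELECT post_id,
       SUM(CASE WHEN seen = 0 THEN 1 ELSE 0 END) AS unread_case_sum,
       COUNT(CASE WHEN seen = 0 THEN 1 END)      AS unread_case_count,
       SUM(seen = 0)                             AS unread_bool_sum,
       MAX(seen = 0)                             AS has_unread,   -- 0 means no unread rows
       MIN(seen = 0)                             AS all_unread    -- 1 means no read rows
FROM events
GROUP BY post_id
ORDER BY post_id
""").fetchall()

for row in rows:
    print(row)
```

The three counting forms give the same number per group, and MAX/MIN act as "any" and "all" tests.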
Now let's select all groups with at least one unread message:
SELECT SUM(score) AS score, post_id, title, content, date_time
FROM events
GROUP BY post_id, title, content, date_time
HAVING SUM(seen = 0) > 0;
(We could also use HAVING MAX(seen = 0) = 1.)
Now your UNION approach to get all groups with at least one unread message, plus as many other groups as necessary to get at least 15 groups:
(
SELECT SUM(score) AS score, post_id, title, content, date_time, SUM(seen = 0) as unread
FROM events
GROUP BY post_id, title, content, date_time
HAVING SUM(seen = 0) > 0
)
UNION
(
SELECT SUM(score) AS score, post_id, title, content, date_time, SUM(seen = 0) as unread
FROM events
GROUP BY post_id, title, content, date_time
ORDER BY SUM(seen = 0) DESC, date_time DESC
LIMIT 15
)
ORDER BY (unread = 0), date_time DESC;
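A runnable sketch of the UNION approach through Python's sqlite3 module, using the nine sample rows posted in this thread and a minimum of 5 groups instead of 15 to keep it small. SQLite does not accept parenthesised, individually limited UNION members or expression ORDER BY terms on a compound query, so each member and the whole union are wrapped in subqueries here; that rewrite is an adaptation for SQLite, not needed in MySQL.

```python
import sqlite3

MIN_GROUPS = 5  # the question uses 15; 5 fits the small sample

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE events (id INTEGER PRIMARY KEY, score INTEGER, post_id INTEGER,
                title TEXT, content TEXT, date_time TEXT, seen INTEGER)""")
sample = [  # (id, score, post_id, date_time, seen)
    (1, 10, 11, "2018-01-11 12:34:56", 0), (2, 20, 22, "2018-01-12 12:34:56", 0),
    (3, 30, 33, "2018-01-13 12:34:56", 0), (4, 40, 44, "2018-01-14 12:34:56", 0),
    (5, 50, 11, "2018-01-11 12:34:56", 1), (6, 60, 22, "2018-01-12 12:34:56", 1),
    (7, 70, 44, "2018-01-14 12:34:56", 1), (8, 80, 55, "2018-01-05 12:34:56", 1),
    (9, 90, 55, "2018-01-05 12:34:56", 1),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, 'hello', 'it''s me', ?, ?)", sample)

rows = conn.execute("""
SELECT * FROM (
  -- every group with at least one unread row
  SELECT * FROM (
    SELECT SUM(score) AS score, post_id, title, content, date_time, SUM(seen = 0) AS unread
    FROM events
    GROUP BY post_id, title, content, date_time
    HAVING SUM(seen = 0) > 0
  )
  UNION
  -- top up to MIN_GROUPS groups, unread groups first, newest first
  SELECT * FROM (
    SELECT SUM(score) AS score, post_id, title, content, date_time, SUM(seen = 0) AS unread
    FROM events
    GROUP BY post_id, title, content, date_time
    ORDER BY unread DESC, date_time DESC
    LIMIT ?
  )
)
ORDER BY (unread = 0), date_time DESC
""", (MIN_GROUPS,)).fetchall()

for score, post_id, _title, _content, date_time, unread in rows:
    print(post_id, score, date_time, unread)
```

Each result row is a whole group, so the limit counts notifications after summing, which is what the question asks for.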
If you want the individual IDs for the groups above, use IN:
SELECT id
FROM events
WHERE (post_id, title, content, date_time) IN
(
SELECT post_id, title, content, date_time
FROM (<above query>) q
);
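A small sketch of expanding groups back into individual ids with a row-value IN, run through Python's sqlite3 module (SQLite supports row values since 3.15; MySQL supports this form too). As a hypothetical stand-in for `<above query>`, it simply selects the groups that contain an unread row, and title/content are omitted from the grouping for brevity.

```python
import sqlite3

# Hypothetical rows: group (11, ...) has an unread row, group (55, ...) does not.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, post_id INTEGER, date_time TEXT, seen INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    (1, 11, "2018-01-11 12:34:56", 0),
    (5, 11, "2018-01-11 12:34:56", 1),
    (8, 55, "2018-01-05 12:34:56", 1),
    (9, 55, "2018-01-05 12:34:56", 1),
])

# Every id whose (post_id, date_time) pair belongs to one of the selected groups.
ids = [r[0] for r in conn.execute("""
SELECT id
FROM events
WHERE (post_id, date_time) IN (
  SELECT post_id, date_time
  FROM (
    SELECT post_id, date_time          -- stand-in for <above query>
    FROM events
    GROUP BY post_id, date_time
    HAVING SUM(seen = 0) > 0
  ) q
)
ORDER BY id
""")]

print(ids)
```

Both rows of the selected group come back, including the read one.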
This is not an answer, but too long for a comment:
You think the rules are all clear, but are they? Let's say it's not at least 15 but only at least 5 rows you want in your final results. From the following table you'd want the IDs 1, 2, 3, and 4, because these are unread. But what about the others?
id | score | post_id | title | content | date_time | seen
---+-------+---------+-------+---------+---------------------+-----
1 | 10 | 11 | hello | it's me | 2018-01-11 12:34:56 | 0
2 | 20 | 22 | hello | it's me | 2018-01-12 12:34:56 | 0
3 | 30 | 33 | hello | it's me | 2018-01-13 12:34:56 | 0
4 | 40 | 44 | hello | it's me | 2018-01-14 12:34:56 | 0
5 | 50 | 11 | hello | it's me | 2018-01-11 12:34:56 | 1
6 | 60 | 22 | hello | it's me | 2018-01-12 12:34:56 | 1
7 | 70 | 44 | hello | it's me | 2018-01-14 12:34:56 | 1
8 | 80 | 55 | hello | it's me | 2018-01-05 12:34:56 | 1
9 | 90 | 55 | hello | it's me | 2018-01-05 12:34:56 | 1
Does it matter that there are read notifications for the same groups? Does it matter that they are newer than notifications 8 and 9? Or will you simply add ID 8 (or 9?) to the set and be done?
No matter whether you select IDs 1, 2, 3, 4, and say 8 or you select all rows, you'd end up with five groups. So please tell us which IDs you'd select and why.

calculate the Ranking of questionnaire

I am trying to develop a ranking table for a sort of questionnaire.
Each day a question is asked at 16h (4:00 pm), which can be answered until 17:59:59 the following day. The table has to show each participant's position, taking into account both the number of correct answers and the answer time.
My table will be of the sort:
+-------+---------+---------------------+
|userid | correct | timestamp |
+-------+---------+---------------------+
| 2 | 1 | 2018-02-07 16:00:01 |
| 1 | 1 | 2018-02-07 16:02:00 |
| 3 | 1 | 2018-02-07 17:00:00 |
| 1 | 0 | 2018-02-08 16:00:02 |
| 3 | 1 | 2018-02-08 16:00:05 |
| 2 | 0 | 2018-02-08 16:01:00 |
+-------+---------+---------------------+
For now I started with this query:
SELECT `userid`, `correct`, `timestamp`,
count(correct) as count
FROM `results`
WHERE correct = 1
GROUP BY `userid`
ORDER BY count DESC, timestamp DESC
But I have already realized that this is not what I intend, because the ranking has to be cumulative across several days.
Does anyone have an idea how I can do this?
A user on Stack Overflow in Portuguese suggested this code, but it does not work either.
SELECT userid, SUM(correct),
SUM(TIMESTAMPDIFF(HOUR,(timestamp,CAST(CONCAT_WS(' ',date(timestamp), '17:59:59') as DATETIME)))) time
FROM results
GROUP BY userid
ORDER BY correct DESC, time
Don't hard-code this time of day (16h) in your query: it may change, and then your query will be wrong.
Instead, you should count by userid and questionnaireid. To do so:
add a new table questionnaire [id, title] (you can add extra columns later: created_time, end_time, …)
edit your results table by adding the questionnaire id as a foreign key: [userid, questionnaireid, correct, timestamp]
then count normally: correct answers by user, by questionnaire
SELECT userid, questionnaireid ,
sum(correct) as total
FROM results r
INNER JOIN questionnaire q
ON r.questionnaireid = q.id
WHERE correct = 1
GROUP BY userid, questionnaireid
ORDER BY total DESC, q.id ASC
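To get the cumulative ranking the question asks for, the same questionnaire table can drive a per-user total across all days. Below is a sketch under stated assumptions: a hypothetical opened_at column on questionnaire records when each question was asked, the asker's timestamp column is renamed answered_at, ties on correct answers are broken by total seconds taken on correct answers, and Python's sqlite3 module stands in for MySQL (in MySQL, TIMESTAMPDIFF(SECOND, …) would replace the strftime arithmetic).

```python
import sqlite3

# Hypothetical schema; results rows are the asker's sample, mapped to questionnaires 1 and 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE questionnaire (id INTEGER PRIMARY KEY, opened_at TEXT);
CREATE TABLE results (userid INTEGER, questionnaireid INTEGER REFERENCES questionnaire(id),
                      correct INTEGER, answered_at TEXT);
INSERT INTO questionnaire VALUES (1, '2018-02-07 16:00:00'), (2, '2018-02-08 16:00:00');
INSERT INTO results VALUES
  (2, 1, 1, '2018-02-07 16:00:01'),
  (1, 1, 1, '2018-02-07 16:02:00'),
  (3, 1, 1, '2018-02-07 17:00:00'),
  (1, 2, 0, '2018-02-08 16:00:02'),
  (3, 2, 1, '2018-02-08 16:00:05'),
  (2, 2, 0, '2018-02-08 16:01:00');
""")

rows = conn.execute("""
SELECT r.userid,
       SUM(r.correct) AS total_correct,
       -- seconds between question opening and answer, counted for correct answers only
       SUM(CASE WHEN r.correct = 1
                THEN CAST(strftime('%s', r.answered_at) AS INTEGER)
                   - CAST(strftime('%s', q.opened_at) AS INTEGER)
                ELSE 0 END) AS seconds_taken
FROM results r
JOIN questionnaire q ON q.id = r.questionnaireid
GROUP BY r.userid
ORDER BY total_correct DESC, seconds_taken ASC
""").fetchall()

for position, (userid, total, seconds) in enumerate(rows, start=1):
    print(position, userid, total, seconds)
```

Because each answer is measured from its own questionnaire's opening time, changing the 16h schedule only means inserting a different opened_at.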

In mysql: how can I select the most recently added row when selecting by MAX if two values are equal (application is a games high score table)

I am trying to construct a highscore table from entries in a table with the layout
id(int) | username(varchar) | score(int) | modified (timestamp)
Selecting the highest score per day for each user works well using the following:
SELECT id, username, MAX( score ) AS hiscore
FROM entries WHERE DATE( modified ) = CURDATE( )
GROUP BY username
Where I am stuck is that players may sometimes achieve the same score multiple times on the same day. In that case I need to make sure the earliest one is always selected, because when two scores match, the player who reached that score first wins.
If my table contains the following:
id | username | score | modified
---+----------+-------+---------------------
1  | userA    | 22    | 2014-01-22 08:00:14
2  | userB    | 22    | 2014-01-22 12:26:06
3  | userA    | 22    | 2014-01-22 16:13:22
4  | userB    | 15    | 2014-01-22 18:49:01
The returned winning table in this case should be:
id | username | score | modified
---+----------+-------+---------------------
1  | userA    | 22    | 2014-01-22 08:00:14
2  | userB    | 22    | 2014-01-22 12:26:06
I tried to achieve this by adding ORDER BY modified DESC to the query, but it always returns the later score. I tried ORDER BY modified ASC as well, but got the same result.
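This is the classic greatest-n-per-group problem, which has been answered frequently on Stack Overflow. Here's a solution for your case:
SELECT e.*
FROM entries e
JOIN (
    -- best score per user per day
    SELECT username, DATE(modified) AS modified_date, MAX(score) AS score
    FROM entries
    GROUP BY username, DATE(modified)
) t ON e.username = t.username AND DATE(e.modified) = t.modified_date AND e.score = t.score
WHERE DATE(e.modified) = CURDATE()
  -- on a tie, keep only the earliest row that reached that score
  AND NOT EXISTS (
      SELECT 1 FROM entries e2
      WHERE e2.username = e.username AND e2.score = e.score
        AND DATE(e2.modified) = DATE(e.modified) AND e2.modified < e.modified
  )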
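A runnable sketch of a greatest-n-per-group join that groups per user and day and breaks ties by keeping the earliest row, as the question requires. It uses Python's sqlite3 module with the question's sample rows; SQLite has no CURDATE(), so a fixed date parameter stands in for it (an assumption made only for the demo).

```python
import sqlite3

# The question's sample entries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entries (id INTEGER PRIMARY KEY, username TEXT, score INTEGER, modified TEXT);
INSERT INTO entries VALUES
  (1, 'userA', 22, '2014-01-22 08:00:14'),
  (2, 'userB', 22, '2014-01-22 12:26:06'),
  (3, 'userA', 22, '2014-01-22 16:13:22'),
  (4, 'userB', 15, '2014-01-22 18:49:01');
""")

day = "2014-01-22"  # stands in for CURDATE()
rows = conn.execute("""
SELECT e.id, e.username, e.score, e.modified
FROM entries e
JOIN (
  -- best score per user per day
  SELECT username, DATE(modified) AS modified_date, MAX(score) AS score
  FROM entries
  GROUP BY username, DATE(modified)
) t ON e.username = t.username
   AND DATE(e.modified) = t.modified_date
   AND e.score = t.score
WHERE DATE(e.modified) = ?
  -- drop a row if the same user reached the same score earlier that day
  AND NOT EXISTS (
    SELECT 1 FROM entries e2
    WHERE e2.username = e.username
      AND e2.score = e.score
      AND DATE(e2.modified) = DATE(e.modified)
      AND e2.modified < e.modified
  )
ORDER BY e.id
""", (day,)).fetchall()

for row in rows:
    print(row)
```

Row 3 is excluded because userA already scored 22 at 08:00:14, matching the expected winning table.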
I think this would work for you and is the simplest way:
SELECT username, MAX(score), MIN(modified)
FROM entries
GROUP BY username
This returns this in your case:
"userB";22;"2014-01-22 12:26:06"
"userA";22;"2014-01-22 08:00:14"
However, I think what you actually want is the most recent row (in which case your example output would be wrong). To get it, you need this:
SELECT username, MAX(score), MAX(modified)
FROM entries
GROUP BY username
Which returns:
"userB";22;"2014-01-22 18:49:01"
"userA";22;"2014-01-22 16:13:22"

Sort data before using GROUP BY?

I have read that grouping happens before ordering. Is there any way I can order first, before grouping, without having to wrap my whole query in another query just to do this?
Let's say I have this data:
id | user_id | date_recorded
1 | 1 | 2011-11-07
2 | 1 | 2011-11-05
3 | 1 | 2011-11-06
4 | 2 | 2011-11-03
5 | 2 | 2011-11-06
Normally, I'd have to do this query in order to get what I want:
SELECT
*
FROM (
SELECT * FROM tbl ORDER BY date_recorded DESC
) t1
GROUP BY t1.user_id
But I'm wondering if there's a better solution.
Your question is somewhat unclear but I have a suspicion what you really want is not any GROUP aggregates at all, but rather ordering by date first, then user ID:
SELECT
id,
user_id,
date_recorded
FROM tbl
ORDER BY date_recorded DESC, user_id ASC
Here is the result. Note the reordering by date_recorded compared with your original example:
id | user_id | date_recorded
1  | 1       | 2011-11-07
3  | 1       | 2011-11-06
5  | 2       | 2011-11-06
2  | 1       | 2011-11-05
4  | 2       | 2011-11-03
Update
To retrieve the full latest record per user_id, a JOIN is needed. The subquery (mx) locates the latest date_recorded per user_id, and that result is joined to the full table to retrieve the remaining columns.
SELECT
mx.user_id,
mx.maxdate,
t.id
FROM (
SELECT
user_id,
MAX(date_recorded) AS maxdate
FROM tbl
GROUP BY user_id
) mx JOIN tbl t ON mx.user_id = t.user_id AND mx.maxdate = t.date_recorded
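A minimal sketch of the latest-record-per-user join, run through Python's sqlite3 module on the question's sample data (SQLite stands in for MySQL here). Note that the join condition must compare the subquery's maxdate alias with t.date_recorded, since the mx subquery exposes no date_recorded column.

```python
import sqlite3

# The question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (id INTEGER PRIMARY KEY, user_id INTEGER, date_recorded TEXT);
INSERT INTO tbl VALUES
  (1, 1, '2011-11-07'), (2, 1, '2011-11-05'), (3, 1, '2011-11-06'),
  (4, 2, '2011-11-03'), (5, 2, '2011-11-06');
""")

rows = conn.execute("""
SELECT mx.user_id, mx.maxdate, t.id
FROM (
  -- latest date per user
  SELECT user_id, MAX(date_recorded) AS maxdate
  FROM tbl
  GROUP BY user_id
) mx
-- join back to fetch the full row that has that date
JOIN tbl t ON mx.user_id = t.user_id AND mx.maxdate = t.date_recorded
ORDER BY mx.user_id
""").fetchall()

for row in rows:
    print(row)
```

If a user had two rows on the same latest date, both would be returned.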
I just use the technique of moving the ordering into the GROUP_CONCAT clause instead of ordering before GROUP BY:
SELECT SUBSTRING_INDEX(GROUP_CONCAT(CAST(id AS CHAR)
           ORDER BY date_recorded DESC), ',', 1) AS id,
       user_id,
       SUBSTRING_INDEX(GROUP_CONCAT(CAST(`date_recorded` AS CHAR)
           ORDER BY `date_recorded` DESC), ',', 1) AS date_recorded
FROM data
GROUP BY user_id