How can I count summed rows as one row in LIMIT? - mysql

I want to select user's notifications according to these rules:
all unread notifications
always 2 read notifications
at least 15 notifications (by default)
Here is my query which gets user's notifications ids:
( SELECT id FROM events -- all unread messages
WHERE author_id = ? AND seen = 0
) UNION
( SELECT id FROM events -- 2 read messages
WHERE author_id = ? AND seen <> 0
ORDER BY date_time desc
LIMIT 2
) UNION
( SELECT id FROM events -- at least 15 rows by default
WHERE author_id = ?
ORDER BY seen, date_time desc
LIMIT 15
)
And then I select the matched ids in query above plus other info like this: (I don't want to combine these two queries because of some reasons in reality)
SELECT SUM(score) score, post_id, title, content, date_time
FROM events
WHERE id IN ($ids)
GROUP BY post_id, title, content, date_time
ORDER BY seen, MAX(date_time) desc
It works fine.
The problem is: When the first query selects 15 rows which all have the same post_id, then the second query will sum them up and show it as one notification row with total-scores.
I guess I have to add that SUM() also in the first query? And that GROUP BY? Any idea?
An example of the problem: if a user earns 15 upvotes, the first query selects them as 15 notifications, and the second query turns them into one notification. How can I get 15 separate notifications? (Notifications that will be summed in the second query should be counted as one notification in the first query. How?)

Since the final result shows one row per group and you want at least 15 of those rows, the rules should apply to groups rather than to individual messages, in my opinion.
You can aggregate your data per group and then check whether the group shall be in your results. You'd do this in the HAVING clause with conditional aggregation, i.e. an aggregation function used on a conditional expression. This is one method to count unread messages for example:
SUM(CASE WHEN seen = 0 THEN 1 ELSE 0 END)
This is another:
COUNT(CASE WHEN seen = 0 THEN 1 END)
(The ELSE branch is omitted and defaults to NULL, which is not counted.)
In MySQL these expressions are even simpler, because false equals 0 and true equals 1. So in MySQL you'd count with:
SUM(seen = 0)
You can use other aggregation functions, too:
HAVING MAX(seen = 0) = 0 -- no unread messages
HAVING MIN(seen = 0) = 1 -- no read messages
Now let's select all groups with at least one unread message:
SELECT SUM(score) AS score, post_id, title, content, date_time
FROM events
GROUP BY post_id, title, content, date_time
HAVING SUM(seen = 0) > 0;
(We could also use HAVING MAX(seen = 0) = 1.)
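The three counting expressions above can be compared side by side. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL (SQLite also evaluates seen = 0 to 0 or 1); the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (post_id, seen); seen = 0 means unread.
rows = [(11, 0), (11, 1), (22, 1), (22, 1), (33, 0), (33, 0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (post_id INTEGER, seen INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)

# Three equivalent ways to count the unread rows of each group.
counts = conn.execute("""
    SELECT post_id,
           SUM(CASE WHEN seen = 0 THEN 1 ELSE 0 END) AS via_sum_case,
           COUNT(CASE WHEN seen = 0 THEN 1 END)      AS via_count_case,
           SUM(seen = 0)                             AS via_boolean
    FROM events
    GROUP BY post_id
    ORDER BY post_id
""").fetchall()

# HAVING keeps only the groups with at least one unread row.
unread_groups = [r[0] for r in conn.execute("""
    SELECT post_id
    FROM events
    GROUP BY post_id
    HAVING SUM(seen = 0) > 0
    ORDER BY post_id
""")]

print(counts)
print(unread_groups)
```

All three columns agree for every group, and only the groups that contain an unread row pass the HAVING filter.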
Now your UNION approach to get all groups with at least one unread message, plus as many other groups as necessary to get at least 15 groups:
(
SELECT SUM(score) AS score, post_id, title, content, date_time, SUM(seen = 0) as unread
FROM events
GROUP BY post_id, title, content, date_time
HAVING SUM(seen = 0) > 0
)
UNION
(
SELECT SUM(score) AS score, post_id, title, content, date_time, SUM(seen = 0) as unread
FROM events
GROUP BY post_id, title, content, date_time
ORDER BY SUM(seen = 0) DESC, date_time DESC
LIMIT 15
)
ORDER BY (unread = 0), date_time DESC;
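To see how the two UNION branches interact, here is a small sketch run through Python's sqlite3 module. It is an approximation, not the MySQL query itself: SQLite does not accept parenthesized SELECTs in a compound query, so the LIMIT branch is wrapped in a subquery and the final ORDER BY is applied in an outer SELECT. The sample rows, the reduced column list (no title or content), and the function name notification_groups are assumptions for illustration.

```python
import sqlite3

# Hypothetical sample rows: (id, score, post_id, date_time, seen).
rows = [
    (1, 10, 11, "2018-01-11 12:34:56", 0),
    (2, 20, 22, "2018-01-12 12:34:56", 0),
    (3, 30, 33, "2018-01-13 12:34:56", 0),
    (4, 40, 44, "2018-01-14 12:34:56", 0),
    (5, 50, 11, "2018-01-11 12:34:56", 1),
    (6, 60, 22, "2018-01-12 12:34:56", 1),
    (7, 70, 44, "2018-01-14 12:34:56", 1),
    (8, 80, 55, "2018-01-05 12:34:56", 1),
    (9, 90, 55, "2018-01-05 12:34:56", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, score INTEGER, post_id INTEGER,"
             " date_time TEXT, seen INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)

# One row per group, with the number of unread rows in that group.
GROUPED = """
    SELECT SUM(score) AS score, post_id, date_time, SUM(seen = 0) AS unread
    FROM events
    GROUP BY post_id, date_time
"""

def notification_groups(conn, min_groups):
    """All groups with unread rows, topped up to at least min_groups groups."""
    sql = f"""
        SELECT post_id, score, unread FROM (
            {GROUPED} HAVING SUM(seen = 0) > 0
            UNION
            SELECT * FROM ({GROUPED}
                           ORDER BY SUM(seen = 0) DESC, date_time DESC
                           LIMIT ?)
        )
        ORDER BY (unread = 0), date_time DESC
    """
    return conn.execute(sql, (min_groups,)).fetchall()

print(notification_groups(conn, 3))
print(notification_groups(conn, 5))
```

With a minimum of 3, all four groups that contain unread rows are still returned; with a minimum of 5, the fully read group for post 55 is added at the end.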
If you want the single IDs for above groups, then use IN:
SELECT id
FROM events
WHERE (post_id, title, content, date_time) IN
(
SELECT post_id, title, content, date_time
FROM (<above query>) q
);

This is not an answer, but too long for a comment:
You think the rules are all clear, but are they? Let's say it's not at least 15 but only at least 5 rows you want in your final results. From the following table you'd want the IDs 1, 2, 3, and 4, because these are unread. But what about the others?
id | score | post_id | title | content | date_time | seen
---+-------+---------+-------+---------+---------------------+-----
1 | 10 | 11 | hello | it's me | 2018-01-11 12:34:56 | 0
2 | 20 | 22 | hello | it's me | 2018-01-12 12:34:56 | 0
3 | 30 | 33 | hello | it's me | 2018-01-13 12:34:56 | 0
4 | 40 | 44 | hello | it's me | 2018-01-14 12:34:56 | 0
5 | 50 | 11 | hello | it's me | 2018-01-11 12:34:56 | 1
6 | 60 | 22 | hello | it's me | 2018-01-12 12:34:56 | 1
7 | 70 | 44 | hello | it's me | 2018-01-14 12:34:56 | 1
8 | 80 | 55 | hello | it's me | 2018-01-05 12:34:56 | 1
9 | 90 | 55 | hello | it's me | 2018-01-05 12:34:56 | 1
Does it matter that there are read notifications for the same groups? Does it matter that they are newer than notifications 8 and 9? Or will you simply add ID 8 (or 9?) to the set and be done?
No matter whether you select IDs 1, 2, 3, 4, and say 8 or you select all rows, you'd end up with five groups. So please tell us which IDs you'd select and why.

Related

Calculate unique items seen by users via sql

I need help to resolve the next case.
The data which users want to see is accessible by pagination requests and later these requests are stored in the database in the next form:
+----+---------+-------+--------+
| id | user id | first | amount |
+----+---------+-------+--------+
| 1 | 1 | 0 | 5 |
| 2 | 1 | 10 | 10 |
| 3 | 1 | 10 | 5 |
| 4 | 1 | 15 | 10 |
| 5 | 2 | 0 | 10 |
| 6 | 2 | 0 | 5 |
| 7 | 2 | 10 | 5 |
+----+---------+-------+--------+
The table is ordered by user id asc, first asc, amount desc.
The task is to write an SQL statement that calculates the total number of unique items each user has seen.
For the first user total amount must be 20, since the request with id=1 returned first 5 items, with id=2 returned another 10 items. Request with id=3 returns data already 'seen' by request with id=2. Request with id=4 intersects with id=2, but still returns 5 'unseen' pieces of data.
For the second user total amount must be 15.
As a result of SQL statement, I should get the next output:
+---------+-------+
| user id | total |
+---------+-------+
| 1 | 20 |
+---------+-------+
| 2 | 15 |
+---------+-------+
I am using MySQL 5.7, so window functions are not available to me. I have been stuck on this task for a day already and still cannot get the desired output. If it is not possible with this setup, I will end up calculating the results in application code. I would appreciate any suggestions or help with this task, thank you!
This is a type of gaps and islands problem. In this case, use a cumulative max to determine if one request intersects with a previous request. If not, that is the beginning of an "island" of adjacent requests. A cumulative sum of the beginnings assigns an "island", then an aggregation counts each island.
So, the islands look like this:
select userid, min(first), max(first + amount) as last
from (select t.*,
sum(case when prev_last >= first then 0 else 1 end) over
(partition by userid order by first) as grp
from (select t.*,
max(first + amount) over (partition by userid order by first range between unbounded preceding and 1 preceding) as prev_last
from t
) t
) t
group by userid, grp;
You then want this summed by userid, so that is one more level of aggregation:
with islands as (
select userid, min(first) as first, max(first + amount) as last
from (select t.*,
sum(case when prev_last >= first then 0 else 1 end) over
(partition by userid order by first) as grp
from (select t.*,
max(first + amount) over (partition by userid order by first range between unbounded preceding and 1 preceding) as prev_last
from t
) t
) t
group by userid, grp
)
select userid, sum(last - first) as total
from islands
group by userid;
Here is a db<>fiddle.
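The island logic can also be cross-checked outside the database, which is the fallback the question mentions. A minimal sketch in plain Python, assuming the request rows are available as (user id, first, amount) tuples; the function name unique_items_seen is hypothetical.

```python
from collections import defaultdict

# Request rows from the question: (user_id, first, amount).
requests = [
    (1, 0, 5), (1, 10, 10), (1, 10, 5), (1, 15, 10),
    (2, 0, 10), (2, 0, 5), (2, 10, 5),
]

def unique_items_seen(requests):
    """Sum the lengths of the merged [first, first + amount) ranges per user."""
    ranges_by_user = defaultdict(list)
    for user_id, first, amount in requests:
        ranges_by_user[user_id].append((first, first + amount))

    totals = {}
    for user_id, ranges in ranges_by_user.items():
        total = 0
        island_start = island_end = None
        for start, end in sorted(ranges):
            if island_end is None or start > island_end:
                # No earlier range reaches this one: close the old island
                # and start a new one.
                if island_end is not None:
                    total += island_end - island_start
                island_start, island_end = start, end
            else:
                # Overlapping or adjacent: extend the current island
                # (this is the running maximum of first + amount).
                island_end = max(island_end, end)
        if island_end is not None:
            total += island_end - island_start
        totals[user_id] = total
    return totals

print(unique_items_seen(requests))
```

The test start > island_end mirrors the SQL condition prev_last >= first, which keeps a request in the current island.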
This logic is similar to Gordon's, but runs on older releases of MySQL, too.
select userid
-- overall length minus gaps
,max(maxlast)-min(minfirst) + sum(gaplen) as total
from
(
select userid
,prevlast
,min(first) as minfirst -- first of group
,max(last) as maxlast -- last of group
-- if there was a gap, calculate length of gap
,min(case when prevlast < first then prevlast - first else 0 end) as gaplen
from
(
select t.*
,first + amount as last -- last value in range
,( -- maximum end of all previous rows
select max(first + amount)
from t as t2
where t2.userid = t.userid
and t2.first < t.first
) as prevlast
from t
) as dt
group by userid, prevlast
) as dt
group by userid
order by userid
See fiddle
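The correlated-subquery version can be tried without a MySQL server. The sketch below runs essentially the same query through Python's sqlite3 module on the data from the question. The columns are renamed to first_item and last_item (hypothetical names) to avoid any clash with keywords, and the derived tables get distinct aliases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, userid INTEGER,"
             " first_item INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, 1, 0, 5), (2, 1, 10, 10), (3, 1, 10, 5), (4, 1, 15, 10),
    (5, 2, 0, 10), (6, 2, 0, 5), (7, 2, 10, 5),
])

totals = conn.execute("""
    SELECT userid,
           -- overall length plus the (negative) gap lengths
           MAX(maxlast) - MIN(minfirst) + SUM(gaplen) AS total
    FROM (
        SELECT userid, prevlast,
               MIN(first_item) AS minfirst,
               MAX(last_item)  AS maxlast,
               -- negative gap length when the previous ranges end early
               MIN(CASE WHEN prevlast < first_item
                        THEN prevlast - first_item ELSE 0 END) AS gaplen
        FROM (
            SELECT t.userid, t.first_item,
                   t.first_item + t.amount AS last_item,
                   -- maximum end of all rows that start earlier
                   (SELECT MAX(t2.first_item + t2.amount)
                    FROM t AS t2
                    WHERE t2.userid = t.userid
                      AND t2.first_item < t.first_item) AS prevlast
            FROM t
        ) AS ranges
        GROUP BY userid, prevlast
    ) AS islands
    GROUP BY userid
    ORDER BY userid
""").fetchall()

print(totals)
```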

Select promoted items grouped by another attribute

From table like below:
id | node_id | promoted | group_type | created_at |status
------------------------------------------------------------------
8 | 4321 | 1 | 3 | 2018-01-08 13:29:55| 1
4 | 4321 | 0 | 3 | 2018-01-06 11:22:53| 1
3 | 4321 | 0 | 1 | 2018-01-05 23:19:02| 1
2 | 4321 | 1 | 1 | 2018-01-05 21:20:15| 1
1 | 4321 | 1 | 3 | 2018-01-05 11:09:51| 1
I have to get one id and group_type value for each group_type.
If there is a promoted item in the group, the query should return its id and group_type.
If there is more than one promoted item in the group, the most recent promoted record should be returned.
If there is no promoted item in the group, the query should return the most recent record.
Using query below I managed to get almost what I need
SELECT a.id, a.group_type, a.promoted, a.created_at
FROM (
SELECT group_type, MAX(promoted) AS max_promoted
FROM nodes
WHERE node_id=4321 AND status=1
GROUP BY group_type
) AS g
INNER JOIN nodes AS a
ON a.group_type = g.group_type AND a.promoted = g.max_promoted
WHERE node_id= 4321 AND status=1 ORDER BY created_at
Unfortunately when there is more than one promoted item in the group I get both.
Any idea how to get only one promoted item per group?
EDIT:
If there is more than one group, query should return multiple rows but one per every group.
You can limit the result by adding LIMIT 0,1 at the end of the query.
Since the result is already ordered, this will work.
For more information about LIMIT see : https://dev.mysql.com/doc/refman/5.7/en/limit-optimization.html
Edited: Order the items in descending order so that the latest one comes first, and limit the result to as many rows as required (1, 2, and so on). A UNION adds the latest record overall as a fallback for the case where nothing is promoted. The final LIMIT leaves only the single required row. Here's your query:
(SELECT a.id, a.group_type, a.promoted, a.created_at
FROM (
SELECT group_type, MAX(promoted) AS max_promoted
FROM nodes
WHERE node_id=4321 and status=1
GROUP BY group_type
) AS g
INNER JOIN nodes AS a
ON a.group_type = g.group_type AND a.promoted = g.max_promoted
WHERE node_id= 4321 and status=1 ORDER BY created_at desc
limit 1)
union
(select a.id, a.group_type, a.promoted, a.created_at from nodes a order by created_at desc limit 1)
limit 1
Hope it helps!
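The EDIT in the question asks for one row per group, which a single trailing LIMIT cannot provide. One alternative, shown here as a sketch rather than as the answer's method, is a correlated subquery that picks the best row inside each group by ordering on promoted and then created_at. It is run through Python's sqlite3 module; rows 5 and 6 (group_type 2, nothing promoted) are hypothetical additions that exercise the fallback rule.

```python
import sqlite3

# (id, node_id, promoted, group_type, created_at, status)
rows = [
    (8, 4321, 1, 3, "2018-01-08 13:29:55", 1),
    (4, 4321, 0, 3, "2018-01-06 11:22:53", 1),
    (3, 4321, 0, 1, "2018-01-05 23:19:02", 1),
    (2, 4321, 1, 1, "2018-01-05 21:20:15", 1),
    (1, 4321, 1, 3, "2018-01-05 11:09:51", 1),
    # Hypothetical rows: a group without any promoted item.
    (5, 4321, 0, 2, "2018-01-07 09:00:00", 1),
    (6, 4321, 0, 2, "2018-01-09 10:00:00", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER, node_id INTEGER, promoted INTEGER,"
             " group_type INTEGER, created_at TEXT, status INTEGER)")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?, ?, ?)", rows)

picked = conn.execute("""
    SELECT n.id, n.group_type
    FROM nodes AS n
    WHERE n.node_id = ? AND n.status = 1
      AND n.id = (
          -- best candidate of the same group: promoted first, then newest
          SELECT c.id
          FROM nodes AS c
          WHERE c.node_id = n.node_id
            AND c.status = 1
            AND c.group_type = n.group_type
          ORDER BY c.promoted DESC, c.created_at DESC
          LIMIT 1
      )
    ORDER BY n.group_type
""", (4321,)).fetchall()

print(picked)
```

Each group contributes exactly one row: the newest promoted item when one exists, otherwise the newest item.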

SQL - GROUP BY max value

Note: I'm not sure if I gave this question the most leading title since I'm not sure on the correct approach towards this, but I couldn't find other examples anywhere since it's quite a specific query.
So, I have a table "votes", which is filled with votes created by users (uniquely identified as a number in the user_id column) which correspond to relevant posts in another table (vote records "upvote" each relevant post within the user interface).
I intend to sort these votes (by datetime) in order of latest vote created for each post (post_id column), and as such, avoiding duplicate returned values of each post_id.
I input the following query:
SELECT id, user_id, post_id, created, MAX(created)
FROM votes
GROUP BY post_id, user_id
ORDER BY max(created) DESC
And get returned:
Table: votes
id | user_id | post_id | created | MAX(created)
----+-----------+-----------+-----------------------+--------------------
115 | 1 | 42 | 2014-07-03 23:08:31 | 2016-03-07 12:08:31
----+-----------+-----------+-----------------------+--------------------
237 | 2 | 101 | 2014-02-13 23:05:14 | 2016-03-05 23:05:14
----+-----------+-----------+-----------------------+--------------------
431 | 7 | 944 | 2014-10-22 22:58:37 | 2016-03-03 19:58:37
----+-----------+-----------+-----------------------+--------------------
255 | 15 | 101 | 2014-02-15 14:02:01 | 2016-02-01 23:05:14
----+-----------+-----------+-----------------------+--------------------
... | ... | ... | ... | ...
As you can see, there is a duplicate of the post_id "101". The result of this query seems to sort by maximum created time for each user_id, showing duplicated post_ids: there are two rows with the post_id "101", when I would only like to display the one row for post_id "101" that has the maximum created time (MAX(created)).
The post_id and user_id columns seemingly must be grouped together, else if I just group by post_id I'm unable to sort by MAX(created) since it won't return the max(created) for each post_id.
How do I remove these duplicated post_id values that don't return the maximum created time?
What I'm after:
Table: votes
id | user_id | post_id | created | MAX(created)
----+-----------+-----------+-----------------------+--------------------
115 | 1 | 42 | 2014-07-03 23:08:31 | 2016-03-07 12:08:31
----+-----------+-----------+-----------------------+--------------------
237 | 2 | 101 | 2014-02-13 23:05:14 | 2016-03-05 23:05:14
----+-----------+-----------+-----------------------+--------------------
431 | 7 | 944 | 2014-10-22 22:58:37 | 2016-03-03 19:58:37
----+-----------+-----------+-----------------------+--------------------
... | ... | ... | ... | ...
Assuming you only want the last vote for each post:
SELECT v.*
FROM posts p
JOIN votes v
ON v.id =
(
SELECT id
FROM votes vi
WHERE post_id = p.id
ORDER BY
created DESC
LIMIT 1
)
If you want the last user_id who voted on each post_id, try grouping by post_id and ordering by time descending (or by id if it is auto-increment).
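-- Relies on ONLY_FULL_GROUP_BY being disabled; the non-aggregated columns
-- of tbl.* are taken from an arbitrary row of each group.
SELECT tbl.*, GROUP_CONCAT('(', tbl.user_id, ',', tbl.created, ')') AS myhistory
FROM (
SELECT id, user_id, post_id, created
FROM votes
ORDER BY created DESC
) AS tbl
GROUP BY tbl.post_id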
If you need history for (user_id,time) you can use group_concat function as mentioned in code for myhistory column.
SELECT maintable.*
FROM TABLE_NAME maintable
LEFT OUTER JOIN TABLE_NAME temporarytable
ON maintable.GROUPING_BY_COLUMN = temporarytable.GROUPING_BY_COLUMN
AND maintable.COLUMN_WHERE_THE_MAXIMUM_IS_NEEDED < temporarytable.COLUMN_WHERE_THE_MAXIMUM_IS_NEEDED
WHERE temporarytable.COLUMN_WHERE_THE_MAXIMUM_IS_NEEDED IS NULL
ORDER BY PRIMARY_KEY_COLUMN DESC
LIMIT 50;
This is an alternative way to get the row with the maximum value in each group. The query does not need aggregation, unlike a GROUP BY approach.
In addition, when grouping with GROUP BY, each of the groups is sorted by primary key, which also takes a lot of time.
The query compares each row of the table with the other rows of the same group, looking for one with a larger value. If no such row is found, the current row holds the maximum.
This query can save you time when getting the maximum value per group.
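Applied to the votes table from the question, the self-join pattern can be sketched as follows, again using Python's sqlite3 module. The sample rows are hypothetical, and the created column is assumed to hold the time of the vote.

```python
import sqlite3

# Hypothetical votes: (id, user_id, post_id, created).
votes = [
    (115, 1, 42, "2016-03-07 12:08:31"),
    (300, 3, 42, "2015-01-01 00:00:00"),
    (237, 2, 101, "2016-03-05 23:05:14"),
    (255, 15, 101, "2016-02-01 23:05:14"),
    (431, 7, 944, "2016-03-03 19:58:37"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE votes (id INTEGER, user_id INTEGER,"
             " post_id INTEGER, created TEXT)")
conn.executemany("INSERT INTO votes VALUES (?, ?, ?, ?)", votes)

latest = conn.execute("""
    SELECT v.id, v.user_id, v.post_id, v.created
    FROM votes AS v
    LEFT JOIN votes AS newer
           ON newer.post_id = v.post_id
          AND newer.created > v.created
    -- no newer vote found for the same post: v is the latest one
    WHERE newer.id IS NULL
    ORDER BY v.created DESC
""").fetchall()

print(latest)
```

If two votes for the same post share the same maximum created value, both rows are returned, so a tie-breaker on id would be needed in that case.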

In mysql: how can I select the most recently added row when selecting by MAX if two values are equal (application is a games high score table)

I am trying to construct a highscore table from entries in a table with the layout
id(int) | username(varchar) | score(int) | modified (timestamp)
selecting the highest scores per day for each user is working well using the following:
SELECT id, username, MAX( score ) AS hiscore
FROM entries WHERE DATE( modified ) = CURDATE( )
Where I am stuck is that in some cases players may achieve the same score multiple times on the same day. In that case I need to make sure that the earliest entry is always selected, because when two scores match, the player who reached that score first wins.
if my table contains the following:
id | username | score | modified
________|___________________|____________|_____________________
1 | userA | 22 | 2014-01-22 08:00:14
2 | userB | 22 | 2014-01-22 12:26:06
3 | userA | 22 | 2014-01-22 16:13:22
4 | userB | 15 | 2014-01-22 18:49:01
The returned winning table in this case should be:
id | username | score | modified
________|___________________|____________|_____________________
1 | userA | 22 | 2014-01-22 08:00:14
2 | userB | 22 | 2014-01-22 12:26:06
I tried to achieve this by adding ORDER BY modified desc to the query, but it always returns the later score. I tried ORDER BY modified asc as well, but I got the same result
This is the classic greatest-n-per-group problem, which has been answered frequently on StackOverflow. Here's a solution for your case:
SELECT e.*
FROM entries e
JOIN (
SELECT DATE(modified) AS modified_date, MAX(score) AS score
FROM entries
GROUP BY modified_date
) t ON DATE(e.modified) = t.modified_date AND e.score = t.score
WHERE DATE(e.modified) = CURDATE()
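For the rule stated in the question (highest score per user for the day, with the earliest row winning when scores are equal), a NOT EXISTS filter is one possible formulation. This is a sketch under assumptions, not the answer's query: it runs through Python's sqlite3 module and uses a fixed day parameter instead of CURDATE() so that the sample data from the question gives a reproducible result.

```python
import sqlite3

# Rows from the question: (id, username, score, modified).
entries = [
    (1, "userA", 22, "2014-01-22 08:00:14"),
    (2, "userB", 22, "2014-01-22 12:26:06"),
    (3, "userA", 22, "2014-01-22 16:13:22"),
    (4, "userB", 15, "2014-01-22 18:49:01"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER, username TEXT,"
             " score INTEGER, modified TEXT)")
conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?)", entries)

winners = conn.execute("""
    SELECT e.id, e.username, e.score, e.modified
    FROM entries AS e
    WHERE DATE(e.modified) = :day
      AND NOT EXISTS (
          -- a row of the same user and day that beats e:
          -- higher score, or same score reached earlier
          SELECT 1
          FROM entries AS better
          WHERE better.username = e.username
            AND DATE(better.modified) = :day
            AND (better.score > e.score
                 OR (better.score = e.score
                     AND better.modified < e.modified))
      )
    ORDER BY e.score DESC, e.modified
""", {"day": "2014-01-22"}).fetchall()

print(winners)
```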
I think this would work for you and is the simplest way:
SELECT username, MAX(score), MIN(modified)
FROM entries
GROUP BY username
This returns this in your case:
"userB";22;"2014-01-22 12:26:06"
"userA";22;"2014-01-22 08:00:14"
However, I think what you actually want is the most recent row (in which case your example would be wrong). To get that, you need this:
SELECT username, MAX(score), MAX(modified)
FROM entries
GROUP BY username
Which returns:
"userB";22;"2014-01-22 18:49:01"
"userA";22;"2014-01-22 16:13:22"

How to include dates with zero messages into the resultset anyway?

I have the following table with messages:
+---------+---------+------------+----------+
| msg_id | user_id | m_date | m_time |
+---------+---------+------------+----------+
| 1 | 1 | 2011-01-22 | 06:23:11 |
| 2 | 1 | 2011-01-23 | 16:17:03 |
| 3 | 1 | 2011-01-23 | 17:05:45 |
| 4 | 2 | 2011-01-22 | 23:58:13 |
| 5 | 2 | 2011-01-23 | 23:59:32 |
| 6 | 2 | 2011-01-24 | 21:02:41 |
| 7 | 3 | 2011-01-22 | 13:45:00 |
| 8 | 3 | 2011-01-23 | 13:22:34 |
| 9 | 3 | 2011-01-23 | 18:22:34 |
| 10 | 3 | 2011-01-24 | 02:22:22 |
| 11 | 3 | 2011-01-24 | 13:12:00 |
+---------+---------+------------+----------+
What I want is for each day, to see how many messages each user has sent BEFORE and AFTER 16:00:
SELECT
user_id,
m_date,
SUM(m_time <= '16:00') AS before16,
SUM(m_time > '16:00') AS after16
FROM messages
GROUP BY user_id, m_date
ORDER BY user_id, m_date ASC
This produces:
user_id m_date before16 after16
-------------------------------------
1 2011-01-22 1 0
1 2011-01-23 0 2
2 2011-01-22 0 1
2 2011-01-23 0 1
2 2011-01-24 0 1
3 2011-01-22 1 0
3 2011-01-23 1 1
3 2011-01-24 2 0
Because user 1 has written no messages on 2011-01-24, this date is not in the resultset. However, this is undesirable. I have a second table in my database, called "date_range":
+---------+------------+
| date_id | d_date |
+---------+------------+
| 1 | 2011-01-21 |
| 1 | 2011-01-22 |
| 1 | 2011-01-23 |
| 1 | 2011-01-24 |
+---------+------------+
I want to check the "messages" against this table. For each user, all these dates have to be in the resultset. As you can see, none of the users have written messages on 2011-01-21, and as said, user 1 has no messages on 2011-01-24. The desired output of the query would be:
user_id d_date before16 after16
-------------------------------------
1 2011-01-21 0 0
1 2011-01-22 1 0
1 2011-01-23 0 2
1 2011-01-24 0 0
2 2011-01-21 0 0
2 2011-01-22 0 1
2 2011-01-23 0 1
2 2011-01-24 0 1
3 2011-01-21 0 0
3 2011-01-22 1 0
3 2011-01-23 1 1
3 2011-01-24 2 0
How can I link the two tables so that the query result also holds rows with zero values for before16 and after16?
Edit: yes, I have a "users" table:
+---------+------------+
| user_id | user_date |
+---------+------------+
| 1 | foo |
| 2 | bar |
| 3 | foobar |
+---------+------------+
Test bed:
create table messages (msg_id integer, user_id integer, _date date, _time time);
create table date_range (date_id integer, _date date);
insert into messages values
(1,1,'2011-01-22','06:23:11'),
(2,1,'2011-01-23','16:17:03'),
(3,1,'2011-01-23','17:05:05');
insert into date_range values
(1, '2011-01-21'),
(1, '2011-01-22'),
(1, '2011-01-23'),
(1, '2011-01-24');
Query:
SELECT p._date, p.user_id,
coalesce(m.before16, 0) b16, coalesce(m.after16, 0) a16
FROM
(SELECT DISTINCT user_id, dr._date FROM messages m, date_range dr) p
LEFT JOIN
(SELECT user_id, _date,
SUM(_time <= '16:00') AS before16,
SUM(_time > '16:00') AS after16
FROM messages
GROUP BY user_id, _date
ORDER BY user_id, _date ASC) m
ON p.user_id = m.user_id AND p._date = m._date;
EDIT:
Your initial query is left as is; I hope it doesn't require any explanation.
SELECT DISTINCT user_id, dr._date FROM messages m, date_range dr returns the Cartesian product (CROSS JOIN) of the two tables, which gives all required dates for each user in question. As I'm interested in each pair only once, I use the DISTINCT clause. Try the query with and without it.
Then I use a LEFT JOIN on the two sub-selects.
This join means: first, an INNER join is performed, i.e. all rows with matching fields in the ON condition are returned. Then, for each row in the left-side relation of the join that has no match on the right side, NULLs are returned (thus the name LEFT JOIN: the left relation is always there and the right one may be NULL). This join does what you expect: it returns user_id + date combinations even if there were no messages on the given date for a given user. Note that I put the user_id + date sub-select first (on the left) and the messages query second (on the right).
coalesce() is used to replace NULL with zero.
I hope this clarifies how the query works.
Give this a shot:
select u.user_id, u._date,
coalesce(sum(_time <= '16:00'), 0) as before16, -- sum() is NULL when there is no message
coalesce(sum(_time > '16:00'), 0) as after16
from (
select m.user_id, d._date
from messages m
cross join date_range d
group by m.user_id, d._date
) u
left join messages m on u.user_id=m.user_id
and u._date=m._date
group by u.user_id, u._date
The inner query just builds the set of all possible/desired user-date pairs. It would be more efficient to use a users table, but you didn't mention that you had one, so I won't assume it. Otherwise, you just need the left join so that the non-joined records are not removed.
EDIT
--More detailed explanation: taking the query apart.
Start with the innermost query; the goal is to get a list of all desired dates for every user. Since there's a table of users and a table of dates it can look like this:
select distinct u.user_id, d.d_date
from users u
cross join date_range d
The key here is the cross join, taking every row in the users table and associating it with every row in the date_range table. The distinct keyword is really just a shorthand for a group by on all columns, and is here just in case there's duplicated data.
Note that there are several other methods of getting this same result set (like in my original query), but this is probably the simplest from both a logical and computational standpoint.
Really, the only other steps are to add the left join (associating all of the rows we got above to all available data, and not removing anything that doesn't have any data) and the group by and select components which are basically the same as you had before. So, putting everything together it looks like this:
select t.user_id, t.d_date,
coalesce(sum(m.m_time <= '16:00'), 0) as before16, -- sum() is NULL when there is no message
coalesce(sum(m.m_time > '16:00'), 0) as after16
from (
select distinct u.user_id, d.d_date
from users u
cross join date_range d
) t
left join messages m on t.user_id = m.user_id
and t.d_date = m.m_date
group by t.user_id, t.d_date
Based on some other comments/questions, note the explicit use of prefixes for all uses of all tables and sub-queries (which is pretty straight forward since we're not using any table more than once anymore): u for the users table, d for the date_range table, t for the sub-query containing the dates to use for each user, and m for the message table. This is probably where my first explanation fell a little short, since I used the message table twice, both times with the same prefix. It works there because of the context of both uses (one was in a sub-query), but it probably isn't the best practice.
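The cross join plus left join can be checked against the desired output from the question. The sketch below assumes Python's sqlite3 module as a stand-in for MySQL and wraps the sums in COALESCE, because SUM over a group that has no matching message is NULL rather than 0. Note that SQLite compares the times as text here, which is sufficient for this sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER);
    CREATE TABLE date_range (d_date TEXT);
    CREATE TABLE messages (msg_id INTEGER, user_id INTEGER,
                           m_date TEXT, m_time TEXT);
""")
conn.executemany("INSERT INTO users VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO date_range VALUES (?)", [
    ("2011-01-21",), ("2011-01-22",), ("2011-01-23",), ("2011-01-24",),
])
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", [
    (1, 1, "2011-01-22", "06:23:11"), (2, 1, "2011-01-23", "16:17:03"),
    (3, 1, "2011-01-23", "17:05:45"), (4, 2, "2011-01-22", "23:58:13"),
    (5, 2, "2011-01-23", "23:59:32"), (6, 2, "2011-01-24", "21:02:41"),
    (7, 3, "2011-01-22", "13:45:00"), (8, 3, "2011-01-23", "13:22:34"),
    (9, 3, "2011-01-23", "18:22:34"), (10, 3, "2011-01-24", "02:22:22"),
    (11, 3, "2011-01-24", "13:12:00"),
])

result = conn.execute("""
    SELECT u.user_id, d.d_date,
           COALESCE(SUM(m.m_time <= '16:00'), 0) AS before16,
           COALESCE(SUM(m.m_time >  '16:00'), 0) AS after16
    FROM users AS u
    CROSS JOIN date_range AS d          -- every user paired with every date
    LEFT JOIN messages AS m             -- keep pairs without any message
           ON m.user_id = u.user_id
          AND m.m_date  = d.d_date
    GROUP BY u.user_id, d.d_date
    ORDER BY u.user_id, d.d_date
""").fetchall()

for row in result:
    print(row)
```

The result has one row for each of the 3 users and 4 dates, with zeros for the pairs that have no messages.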
It is not neat, but if you have a users table, then maybe something like this:
SELECT
user_id,
_date,
SUM(_time <= '16:00') AS before16,
SUM(_time > '16:00') AS after16
FROM messages
GROUP BY user_id, _date
UNION
SELECT
user_id,
_date, -- the date column of date_range
0 AS before16,
0 AS after16
FROM
users,
date_range
ORDER BY user_id, _date ASC
chezy525's solution works great, I ported it to postgresql and removed/renamed some aliases:
select users_and_dates.user_id, users_and_dates._date,
SUM(case when _time <= '16:00' then 1 else 0 end) as before16,
SUM(case when _time > '16:00' then 1 else 0 end) as after16
from (
select messages.user_id, date_range._date
from messages
cross join date_range
group by messages.user_id, date_range._date
) users_and_dates
left join messages on users_and_dates.user_id=messages.user_id
and users_and_dates._date=messages._date
group by users_and_dates.user_id, users_and_dates._date;
I ran it on my machine and it worked perfectly.