Select first N messages each user receives - mysql

I have a table that stores messages sent to users. The layout is as follows:
id (auto-incrementing) | message_id | user_id | datetime_sent
I'm trying to find the first N message_id's that each user has received, but am completely stuck. I can do it easily on a per-user basis (when defining the user ID in the query), but not for all users.
Things to note:
Many users can get the same message_id
Message ID's aren't sent sequentially (i.e. we can send message 400 before message 200)
This is a read-only MySQL database
EDIT: On second thought I removed this bit but have added it back in since someone was kind enough to work on it
The end goal is to see what % of users opened one of the first N messages they received.
That table of opens looks like this:
user_id | message_id | datetime_opened

This is an untested answer to the original question (with 2 tables and condition on first 5):
SELECT DISTINCT user_id
FROM (
SELECT om.user_id,
om.message_id,
count(DISTINCT sm2.message_id) messages_before
FROM opened_messages om
INNER JOIN sent_messages sm
ON om.user_id = sm.user_id
AND om.message_id = sm.message_id
LEFT JOIN sent_messages sm2
ON om.user_id = sm2.user_id
AND sm2.datetime_sent < sm.datetime_sent
GROUP BY om.user_id,
om.message_id
HAVING messages_before < 5
) AS base
The subquery joins sm2 to count the messages that were sent to the same user before the opened one, and the HAVING clause keeps only rows with fewer than 5 earlier messages. Because one user can have several messages (up to 5) that meet the condition, the outer query lists each qualifying user only once.

To get the first N (here 2) messages, try
SELECT
user_id
, message_id
FROM (
SELECT
user_id
, message_id
, id
, (CASE WHEN @user_id != user_id THEN @rank := 1 ELSE @rank := @rank + 1 END) AS rank,
(CASE WHEN @user_id != user_id THEN @user_id := user_id ELSE @user_id END) AS _
FROM (SELECT * FROM MessageSent ORDER BY user_id, id) T
JOIN (SELECT @rank := 0) c
JOIN (SELECT @user_id := 0) u
) R
WHERE rank < 3
ORDER BY user_id, id
;
which uses a RANK substitute, derived from @Seaux's response to "Does mysql have the equivalent of Oracle's “analytic functions”?"
To extend this to your original question, just add the appropriate calculation:
SELECT
COUNT(DISTINCT MO.user_id) * 100 /
(SELECT COUNT(DISTINCT user_id)
FROM (
SELECT
user_id
, message_id
, id
, (CASE WHEN @user_id != user_id THEN @rank := 1 ELSE @rank := @rank + 1 END) AS rank,
(CASE WHEN @user_id != user_id THEN @user_id := user_id ELSE @user_id END) AS _
FROM (SELECT * FROM MessageSent ORDER BY user_id, id) T
JOIN (SELECT @rank := 0) c
JOIN (SELECT @user_id := 0) u
) R2
WHERE rank < 3
) AS percentage_who_read_one_of_the_first_messages
FROM MessageOpened MO
JOIN
(SELECT
user_id
, message_id
FROM (
SELECT
user_id
, message_id
, id
, (CASE WHEN @user_id != user_id THEN @rank := 1 ELSE @rank := @rank + 1 END) AS rank,
(CASE WHEN @user_id != user_id THEN @user_id := user_id ELSE @user_id END) AS _
FROM (SELECT * FROM MessageSent ORDER BY user_id, id) T
JOIN (SELECT @rank := 0) c
JOIN (SELECT @user_id := 0) u
) R
WHERE rank < 3) MR
ON MO.user_id = MR.user_id
AND MO.message_id = MR.message_id
;
With no CTEs in MySQL (before 8.0), and with the database being read-only, I see no way around having the above query twice in the statement.
See it in action: SQL Fiddle.
Please comment if and as this requires adjustment / further detail.
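For readers on MySQL 8.0 or later, where CTEs and window functions are available, the same percentage can be sketched without repeating the ranking query. This is only a sketch under assumptions: it reuses the MessageSent and MessageOpened table names from the answer above, takes N = 2, and orders each user's messages by datetime_sent with id as a tie-breaker.

```sql
-- Assumes MySQL 8.0+ (WITH and ROW_NUMBER() are not available before 8.0).
-- MessageSent / MessageOpened are the table names used in the answer above.
WITH ranked AS (
    SELECT user_id,
           message_id,
           ROW_NUMBER() OVER (PARTITION BY user_id
                              ORDER BY datetime_sent, id) AS rn
    FROM MessageSent
),
first_n AS (
    -- Keep the first N (here 2) messages each user received.
    SELECT user_id, message_id
    FROM ranked
    WHERE rn <= 2
)
SELECT COUNT(DISTINCT mo.user_id) * 100
       / (SELECT COUNT(DISTINCT user_id) FROM first_n)
       AS percentage_who_read_one_of_the_first_messages
FROM first_n fn
JOIN MessageOpened mo
  ON mo.user_id = fn.user_id
 AND mo.message_id = fn.message_id;
```

The alias rn is used instead of rank because RANK is a reserved word in MySQL 8.0.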

Related

mysql / sql: how to delete all rows except the Nth last per user?

I have a message (id, userid, message) table that grows rapidly.
I would like to delete all messages per user except his last 30
ex:
if user1 has 100 messages, we will delete the first 70,
if user2 has 40 messages, we will delete the first 10,
if userN has 10 messages, no action is taken
Is there a way to do it with a single SQL statement?
My idea for now is to write a loop in PHP and run N SQL queries, which takes very long for N users.
MySQL (pre 8.0) doesn't have a really convenient way to do this. One method uses variables to enumerate the values:
select m.*,
(@rn := if(@u = userid, @rn + 1,
if(@u := userid, 1, 1)
)
) as seqnum
from (select m.*
from messages m
order by userid, id desc
) m cross join
(select @u := -1, @rn := 0) params;
You can turn this into a delete using join:
delete m
from messages m join
(select m.*,
(@rn := if(@u = userid, @rn + 1,
if(@u := userid, 1, 1)
)
) as seqnum
from (select m.*
from messages m
order by userid, id desc
) m cross join
(select @u := -1, @rn := 0) params
) mm
on m.id = mm.id
where seqnum > 30;
As I say in a comment, I don't think this is a good solution for a real-world problem. The history of messages is useful and there are probably other ways to achieve the performance you want. The difference between 30 messages for a user and 70 messages for a user should not have that much of an effect on performance, in a tuned system.
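On MySQL 8.0 or later, the enumeration can be written with ROW_NUMBER() instead of variables. The following is a minimal sketch, assuming the same messages(id, userid, message) table and that a higher id means a newer message.

```sql
-- Assumes MySQL 8.0+; keeps each user's 30 newest messages (highest id).
DELETE m
FROM messages m
JOIN (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY userid ORDER BY id DESC) AS seqnum
    FROM messages
) mm ON mm.id = m.id
WHERE mm.seqnum > 30;
```

Running the inner SELECT on its own first is a cheap way to check which rows would be removed.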
SET @row_number = 0;
SET @userid = NULL;
DELETE FROM MESSAGE
WHERE ID IN
( SELECT ID FROM
(SELECT ID,
@row_number:=CASE
WHEN @userid = userid THEN
@row_number + 1
ELSE 1
END AS num,
@userid:=userid as userid
FROM (SELECT ID, userid FROM MESSAGE ORDER BY userid, ID DESC) S) A
WHERE NUM > 30 );

how to calculate user ranking from 2 different tables

I have a users table with phase1 and phase2 columns. I need to calculate each user's rank in each phase and store it in these fields.
The ranking is calculated from a different table, points, which holds the points by phase for each user.
What I am trying to do is:
sum all points for each user by phase and calculate the rank based on that
if the users' points are equal, compare the sum of grade1; if that is also equal, compare the sum of grade2
update the users table with the rank in each phase
Here is how my new table looks with some demo data:
sql fiddle demo
Currently I use the code below to calculate the ranking from my old table, where both rank and user info are in the same table:
old sql fiddle demo
update users a
join (
select id,
(
select count(distinct total)
from users d
where c.total < d.total
) +1 rank
from users c
) b on a.id = b.id
set a.rank = b.rank
Oracle has analytic functions called rank() and dense_rank() that would be useful for getting your result.
As you are using MySQL, I tried to write a MySQL equivalent of those functions.
You can get the desired result with the following query, which you can then use to update the users table. You may have to change it further to handle the case where there is a tie on grades as well.
set @pk1 ='';
set @rn1 =1;
set @tot ='';
set @val =1;
SELECT id,
name,
phase,
phasetotal,
denseRank
FROM
(
SELECT id,
name,
phase,
phasetotal,
@rn1 := if(@pk1=phase, @rn1+@val,1) as denseRank,
@val := if(@pk1=phase, if(@tot=phasetotal, @val+1, 1),1) as value,
@pk1 := phase,
@tot := phasetotal
FROM
(
select users.id,users.name, points.phase, sum(points.points)
as phasetotal from users,points where users.id = points.userid
group by users.id, points.phase order by points.phase, phasetotal desc, points.grade1 desc, points.grade2 desc
) A
) B;
Here's the update query:
set @pk1 ='';
set @rn1 =1;
set @tot ='';
set @val =1;
UPDATE users u join (
SELECT id,
name,
phase,
phasetotal,
denseRank
FROM
(
SELECT id,
name,
phase,
phasetotal,
@rn1 := if(@pk1=phase, @rn1+@val,1) as denseRank,
@val := if(@pk1=phase, if(@tot=phasetotal, @val+1, 1),1) as value,
@pk1 := phase,
@tot := phasetotal
FROM
(
select users.id,users.name, points.phase, sum(points.points)
as phasetotal from users,points where users.id = points.userid
group by users.id, points.phase order by points.phase, phasetotal desc, points.grade1 desc, points.grade2 desc
) A
) B ) C on u.id = C.id
SET u.phase1 = CASE WHEN C.phase = 1 and u.phase1 = 0 THEN C.denseRank ELSE u.phase1 END,
u.phase2 = CASE WHEN C.phase = 2 and u.phase2 = 0 THEN C.denseRank ELSE u.phase2 END;
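On MySQL 8.0 or later, RANK() and DENSE_RANK() exist natively, so the per-phase ranking with the grade tie-breakers from the question could be sketched as below. It assumes the points table has the userid, phase, points, grade1 and grade2 columns used in the queries above; the aliases grade1total, grade2total and phase_rank are made up for illustration.

```sql
-- Assumes MySQL 8.0+ (window functions).
SELECT t.userid,
       t.phase,
       t.phasetotal,
       RANK() OVER (PARTITION BY t.phase
                    ORDER BY t.phasetotal  DESC,
                             t.grade1total DESC,
                             t.grade2total DESC) AS phase_rank
FROM (
    -- One row per user and phase with the summed points and grades.
    SELECT userid,
           phase,
           SUM(points) AS phasetotal,
           SUM(grade1) AS grade1total,
           SUM(grade2) AS grade2total
    FROM points
    GROUP BY userid, phase
) AS t;
```

RANK() leaves gaps after ties (1, 1, 3), while DENSE_RANK() does not (1, 1, 2); which one fits depends on how tied users should be numbered.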

MYSQL retrieving user activity where the same user_id can appear a maximum of 3 times

I'm retrieving rows from an user activity table like so
SELECT user_id, type, source_id FROM activity ORDER BY date DESC LIMIT 5
But I don't want the activity feed to be able to be clogged up by the same user, so I want to be able to retrieve a maximum of 3 rows out of 5 that contain the same user_id.
Any ideas how I could do this? Thanks :)
Here is a "traditional" way, where you first enumerate the rows for each user id and then use this information as a filter:
SELECT user_id, type, source_id
FROM (select a.*,
@rn := if(@user_id = user_id, @rn + 1, 1) as rn,
@user_id := user_id
from activity a cross join
(select @rn := 0, @user_id := -1) const
order by user_id, date desc
) a
WHERE rn <= 3
ORDER BY date DESC
LIMIT 5;
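The same filter can be expressed with a window function on MySQL 8.0 or later. This is a sketch, assuming the activity table and the date column from the question; it numbers each user's rows from newest to oldest and keeps at most 3 per user before taking the 5 most recent overall.

```sql
-- Assumes MySQL 8.0+ (window functions).
SELECT user_id, type, source_id
FROM (
    SELECT a.*,
           ROW_NUMBER() OVER (PARTITION BY user_id
                              ORDER BY `date` DESC) AS rn
    FROM activity a
) ranked
WHERE rn <= 3
ORDER BY `date` DESC
LIMIT 5;
```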
You can try this:
SELECT user_id, type, source_id
FROM activity
WHERE 3 > (
SELECT count( * )
FROM activity AS activity1
WHERE activity1.user_id = activity.user_id
AND activity1.date > activity.date)
ORDER BY activity.date DESC
LIMIT 5

Select users with a difference larger than zero between the last two entries in a specific table with SQL?

Here's the SQLFiddle Link to my tables.
I basically want to select only Jack and Jill, as there is a non-zero difference between the last two nums entries of the table foo with the user being their respective names.
How is this possible?
Note: just to mention, in my foo table, I have around 100000 rows, so it would be good if there was a very quick and fast way of retrieving the data.
I prefer doing this using limit with the offset to get the two most recent values. Happily, your table has an id column for determining the order.
select user,
(select num from foo f2 where f2.user = f.user order by f2.id desc limit 1
) lastval,
(select num from foo f2 where f2.user = f.user order by f2.id desc limit 1, 1
) lastval2
from foo f
group by user
having lastval <> lastval2
Here's one way (although I think you'd be more likely to JOIN on a user's id rather than their name):
SELECT u.*
FROM
( SELECT x.*, COUNT(*) rank FROM foo x JOIN foo y ON y.user = x.user AND y.id >= x.id GROUP BY x.id)a
LEFT
JOIN
( SELECT x.*, COUNT(*) rank FROM foo x JOIN foo y ON y.user = x.user AND y.id >= x.id GROUP BY x.id)b
ON b.user = a.user
AND b.num = a.num
AND b.rank = a.rank + 1
JOIN users u
ON u.user = a.user
WHERE b.id IS NULL
AND a.rank = 1;
I think this query can be rewritten as follows, which might be faster...
SELECT u.*
FROM
( SELECT id
, user
, num
, @prev_user := @curr_user
, @curr_user := user
, @rank := IF(@prev_user = @curr_user, @rank+1, @rank:=1) rank
FROM foo
JOIN (SELECT @curr_user := null, @prev_user := null, @rank := 0) sel1
ORDER
BY user
, id DESC
) a
LEFT
JOIN
( SELECT id
, user
, num
, @prev_user := @curr_user
, @curr_user := user
, @rank := IF(@prev_user = @curr_user, @rank+1, @rank:=1) rank
FROM foo
JOIN (SELECT @curr_user := null, @prev_user := null, @rank := 0) sel1
ORDER
BY user
, id DESC
) b
ON b.user = a.user
AND b.num = a.num
AND b.rank = a.rank + 1
JOIN users u
ON u.user = a.user
WHERE b.id IS NULL
AND a.rank = 1;
Based on Strawberry's second solution, I have tried this:
SELECT user, MIN(num) AS MinNum, MAX(num) AS MaxNum
FROM ( SELECT id
, user
, num
, @prev_user := @curr_user
, @curr_user := user
, @rank := IF(@prev_user = @curr_user, @rank+1, 1) AS rank
FROM foo
JOIN (SELECT @curr_user := null, @prev_user := null, @rank := 1) sel1
ORDER BY user, id DESC
) AS Sub
WHERE rank <= 2
GROUP BY user
HAVING MinNum != MaxNum
This ranks the rows in a subselect and then rejects rows whose rank is greater than 2 (unfortunately the user variables give strange results if you try to check this within the subselect itself). The results are then grouped by user, and the min and max values of num are returned. If they differ, the row is returned (since there are only 1 or 2 rows per user, the min and max can only differ if 2 rows are returned AND they have different values).
The advantage of this is that it avoids joining two 100,000-row sets against each other and only needs to do the ranking once (although you would hope that MySQL would optimise this second issue away anyway).
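The same idea (rank once, keep the two newest rows per user, compare min and max) can be sketched with ROW_NUMBER() on MySQL 8.0 or later. It assumes the foo(id, user, num) table from the question and that a higher id means a more recent entry.

```sql
-- Assumes MySQL 8.0+ (window functions).
SELECT `user`
FROM (
    SELECT `user`,
           num,
           ROW_NUMBER() OVER (PARTITION BY `user` ORDER BY id DESC) AS rn
    FROM foo
) ranked
WHERE rn <= 2
GROUP BY `user`
HAVING MIN(num) <> MAX(num);
```

Users with a single row are excluded automatically, because their min and max are the same value.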

query to add incremental field based on GROUP BY

Have a table photos
photos.id
photos.user_id
photos.order
A) Is it possible via a single query to group all photos by user and then update the order 1,2,3..N ?
B) Added twist: what if some of the photos already have an order value associated? Make sure that the new photos.order never gets repeated and fills in any orders lower or higher than those existing (as best as possible)
My only thought is just to run a script on this and loop through it and re'order' everything?
photos.id int(10)
photos.created_at datetime
photos.order int(10)
photos.user_id int(10)
Right now data may look like this
user_id = 1
photo_id = 1
order = NULL
user_id = 2
photo_id = 2
order = NULL
user_id = 1
photo_id = 3
order = NULL
the desired result would be
user_id = 1
photo_id = 1
order = 1
user_id = 2
photo_id = 2
order = 1
user_id = 1
photo_id = 3
order = 2
A)
You can use a variable that increments with each row and resets with each user_ID to get the row count.
SELECT ID,
User_ID,
`Order`
FROM ( SELECT @r:= IF(@u = User_ID, @r + 1,1) AS `Order`,
ID,
User_ID,
@u:= User_ID
FROM Photos,
(SELECT @r:= 1) AS r,
(SELECT @u:= 0) AS u
ORDER BY User_ID, ID
) AS Photos
Example on SQL Fiddle
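For part A on MySQL 8.0 or later, the numbering and the update can be combined in one statement. This is a sketch under assumptions: it uses the photos(id, user_id, order) table from the question, numbers each user's photos by id, and overwrites any existing order values, so it does not address part B.

```sql
-- Assumes MySQL 8.0+ (window functions). Overwrites every photos.order value.
UPDATE photos p
JOIN (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY id) AS rn
    FROM photos
) ranked ON ranked.id = p.id
SET p.`order` = ranked.rn;
```

The backticks around order are needed because ORDER is a reserved word.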
B)
My first solution was to add Order to the sort that assigns the row number, so anything with an Order gets sorted by its order first. This only works if your ordering system has no gaps and starts at 1:
SELECT ID,
User_ID,
RowNumber AS `Order`
FROM ( SELECT @r:= IF(@u = User_ID, @r + 1,1) AS `RowNumber`,
ID,
User_ID,
@u:= User_ID
FROM Photos,
(SELECT @r:= 1) AS r,
(SELECT @u:= 0) AS u
ORDER BY User_ID, `Order`, ID
) AS Photos
ORDER BY `User_ID`, `Order`
Example using Order Field
ORDERING WITH GAPS
I have eventually found a way of maintaining the sort order even when there are gaps in the sequence.
SELECT ID, User_ID, `Order`
FROM Photos
WHERE `Order` IS NOT NULL
UNION ALL
SELECT Photos.ID,
Photos.user_ID,
Numbers.RowNum
FROM ( SELECT ID,
User_ID,
@r1:= IF(@u1 = User_ID,@r1 + 1,1) AS RowNum,
@u1:= User_ID
FROM Photos,
(SELECT @r1:= 0) AS r,
(SELECT @u1:= 0) AS u
WHERE `Order` IS NULL
ORDER BY User_ID, ID
) AS Photos
INNER JOIN
( SELECT User_ID,
RowNum,
@r2:= IF(@u2 = User_ID,@r2 + 1,1) AS RowNum2,
@u2:= User_ID
FROM ( SELECT DISTINCT p.User_ID, o.RowNum
FROM Photos AS p,
( SELECT @i:= @i + 1 AS RowNum
FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY,
( SELECT @i:= 0) AS i
) AS o
WHERE RowNum <= (SELECT COUNT(*) FROM Photos p1 WHERE p.User_ID = p1.User_ID)
AND NOT EXISTS
( SELECT 1
FROM Photos p2
WHERE p.User_ID = p2.User_ID
AND o.RowNum = p2.`Order`
)
AND p.`Order` IS NULL
ORDER BY User_ID, RowNum
) AS p,
(SELECT @r2:= 0) AS r,
(SELECT @u2:= 0) AS u
ORDER BY user_ID, RowNum
) AS Numbers
ON Photos.User_ID = Numbers.User_ID
AND Photos.RowNum = Numbers.RowNum2
ORDER BY User_ID, `Order`
However, as you can see, this is pretty complicated. It works by treating photos with an order value separately from those without. The top query just ranks all photos with no order value in order of ID for each user. The bottom query uses a cross join to generate a sequential list from 1 to n for each user ID (up to the number of entries for each User_ID). So with a data set like this:
ID User_ID Order
1 1 NULL
2 2 NULL
3 1 NULL
4 1 1
5 1 3
6 2 2
7 2 3
It would generate
UserID RowNum
1 1
1 2
1 3
1 4
2 1
2 2
2 3
It then uses NOT EXISTS to eliminate all combinations already used by Photos with a non-null order, and ranks the rest in order of RowNum partitioned by User_ID, giving
UserID RowNum Rownum2
1 2 1
1 4 2
2 1 1
The RowNum2 value can then be matched with the rownum value achieved in the from subquery, giving the correct order value. Long winded, but it works.
Example on SQL Fiddle
Worked for me. I needed to increment version grouping by 4 fields (host, folder, fileName, status) and sort by 1 (downloadedAtTicks).
This is my SELECT:
SET @status := NULL;
SET @version := NULL;
SELECT
id,
host,
folder,
fileName,
status,
downloadedAtTicks,
version,
IF(IF(status IS NULL, 0, status) = @status, @version := @version + 1, @version := 0) AS varVersion,
@status := IF(status IS NULL, 0, status) AS varStatus
FROM csvsource
ORDER BY host, folder, fileName, status, downloadedAtTicks;
And this is my UPDATE:
SET @status := NULL;
SET @version := NULL;
UPDATE
csvsource csv,
(SELECT
id,
IF(IF(status IS NULL, 0, status) = @status, @version := @version + 1, @version := 0) AS varVersion,
@status := IF(status IS NULL, 0, status) AS varStatus
FROM csvsource
ORDER BY host, folder, fileName, status, downloadedAtTicks) AS sub
SET
csv.version = sub.varVersion
WHERE csv.id = sub.id;