Composite query - mysql

Table with following columns:
Player_id (primary key), Event_type(A,B,C), Points.
One player may appear many times for each event_type.
I would like to show an overall ranking, ordered by SUM(Points) descending and grouped by player_id across all event types, subject to these conditions:
only best 5 results per player_id for event type A
only best 2 results per player_id for event type B
only best 3 results per player_id for event type C
I have tried in vain :
SUM(points) WHERE event_type ="X"
GROUP BY Player_id ORDER BY SUM(points) LIMIT N
I've been fighting this headache for a week now and am pretty confused about how to include sub-queries, UNION, or temp tables. I can't figure out how to put all the pieces together...
My dream would be to get this overall ranking running with the ability to access detailed points breakdown per player upon click....
Open to any kind of help on this one...thanks!
Example of the source table :
player_id  event_type  score
1          A           5
1          A           10
1          A           5
1          A           5
1          A           2
1          A           15
1          A           10
1          C           20
1          B           5
1          B           5
1          B           20
2          A           50
2          B           55
Desired output according to this example:
Rank  player_id  overall_score
1     2          105 points [50 from A (best 5) + 55 from B (best 2)]
2     1          90 points [45 from A (best 5) + 20 from C (best 3) + 25 from B (best 2)]

First of all: the features you want are called window functions and ranking. Oracle implements these with the OVER keyword and the rank() function. MySQL did not support them before version 8.0, so on older versions we have to work around this.
I used this answer to create the following query. Give him a +1 too, if this is helpful to you.
SELECT
`player_id`, `event`, `points`,
(SELECT 1 + count(*)
FROM `points`
WHERE `l`.`player_id` = `player_id`
AND `l`.`event` = `event`
AND `points` > `l`.`points`
) AS `rank`
FROM
`points` `l`
This will output for every player_id and event the rank of the points. For example:
Assuming (player_id, event, points) has (1,A,10), (1,A,5), (1,A,2), (1,A,2), (1,A,1), (2,A,0) then the output would be
player_id event points rank
1 A 10 1
1 A 5 2
1 A 2 3
1 A 2 3
1 A 1 5
2 A 0 1
The rank is not dense, so if you have duplicate tuples, you will have output tuples with the same rank as well as gaps in your rank number.
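To see the gaps concretely, the same correlated subquery can be run against the six sample rows. The sketch below uses Python's built-in sqlite3 module purely as a convenient stand-in for MySQL (an assumption; the subquery itself is plain SQL), and the aliases p and rnk are illustrative names.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (player_id INTEGER, event TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?, ?)",
    [(1, "A", 10), (1, "A", 5), (1, "A", 2), (1, "A", 2), (1, "A", 1), (2, "A", 0)],
)

# Rank = 1 + number of rows in the same (player_id, event) group with more points.
ranked = conn.execute(
    """
    SELECT l.player_id, l.event, l.points,
           (SELECT 1 + COUNT(*)
              FROM points AS p
             WHERE p.player_id = l.player_id
               AND p.event = l.event
               AND p.points > l.points) AS rnk
      FROM points AS l
     ORDER BY l.player_id, l.event, l.points DESC
    """
).fetchall()

for row in ranked:
    print(row)
```

The two (1, A, 2) rows share rank 3 and the next row jumps to rank 5, which is the gap described above.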
To get the top N* tuples for each player_id and event you could either create a view or use the subquery in the condition. The view is the preferred way, but on many servers you don't have the privilege to create views.
Creating a view that contains the rank as column.
CREATE VIEW `points_view`
AS SELECT
`player_id`, `event`, `points`,
(SELECT 1 + count(*)
FROM `points`
WHERE `l`.`player_id` = `player_id`
AND `l`.`event` = `event`
AND `points` > `l`.`points`
) as `rank`
FROM
`points` `l`
Get the desired top N results from the view:
SELECT
`player_id`, `event`, `points`
FROM `points_view`
WHERE
`event` = 'A' AND `rank` <= 5
OR
`event` = 'B' AND `rank` <= 2
OR
`event` = 'C' AND `rank` <= 3
Using the rank in the condition
SELECT
`player_id`, `event`, `points`
FROM
`points` `l`
WHERE
(SELECT 1 + count(*)
FROM `points`
WHERE `l`.`player_id` = `player_id`
AND `l`.`event` = `event`
AND `points` > `l`.`points`
) <= N
To further get a different amount of tuples depending on your event, you could do
SELECT
`player_id`, `event`, `points`
FROM
`points` `l`
WHERE
`event` = 'A' AND
(SELECT 1 + count(*)
FROM `points`
WHERE `l`.`player_id` = `player_id`
AND `l`.`event` = `event`
AND `points` > `l`.`points`
) <= 5
OR
`event` = 'B' AND
(SELECT 1 + count(*)
FROM `points`
WHERE `l`.`player_id` = `player_id`
AND `l`.`event` = `event`
AND `points` > `l`.`points`
) <= 2
OR
`event` = 'C' AND
(SELECT 1 + count(*)
FROM `points`
WHERE `l`.`player_id` = `player_id`
AND `l`.`event` = `event`
AND `points` > `l`.`points`
) <= 3
MySQL does not optimize this query, so it ends up running three separate dependent subqueries. I would instead use a single subquery with the largest of your N's, which is 5, and discard the surplus tuples for the other event types afterwards. If performance is not an issue or you don't have much data anyway, keep it as it is.
* As I explained the rank is not dense, so getting all tuples with rank <= N will generally result in more than N tuples. The additional tuples are duplicates.
Simply removing duplicates is a bad idea as you can see from the example table. If you wanted the top 5 results for player_id = 1 and event = A, you would need both tuples (1,A,2). They both have rank 3. But if you remove one of them, you will only end up with the top 4 results (1,A,10,1), (1,A,5,2), (1,A,2,3), (1,A,1,5).
To get a dense rank you could use this subquery
(SELECT count(DISTINCT `points`)
FROM `points`
WHERE `l`.`player_id` = `player_id`
AND `l`.`event` = `event`
AND `points` >= `l`.`points`
) as `dense_rank`
Be careful as this will still produce duplicate ranks.
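A quick way to check the dense variant is to run it on the same six sample rows. As before, this is only a sketch: Python's sqlite3 module stands in for MySQL (an assumption), and the aliases p and dense_rnk are illustrative.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (player_id INTEGER, event TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?, ?)",
    [(1, "A", 10), (1, "A", 5), (1, "A", 2), (1, "A", 2), (1, "A", 1), (2, "A", 0)],
)

# Dense rank = number of distinct point values in the group that are >= this row's.
dense = conn.execute(
    """
    SELECT l.player_id, l.event, l.points,
           (SELECT COUNT(DISTINCT p.points)
              FROM points AS p
             WHERE p.player_id = l.player_id
               AND p.event = l.event
               AND p.points >= l.points) AS dense_rnk
      FROM points AS l
     ORDER BY l.player_id, l.event, l.points DESC
    """
).fetchall()

for row in dense:
    print(row)
```

The ranks no longer have gaps, but both (1, A, 2) rows still share rank 3, so filtering on dense_rnk <= 3 returns four rows for player 1.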
Edit
To sum all events' points into one score, use GROUP BY
SELECT
`player_id`, SUM(`points`)
FROM `points_view`
WHERE
`event` = 'A' AND `rank` <= 5
OR
`event` = 'B' AND `rank` <= 2
OR
`event` = 'C' AND `rank` <= 3
GROUP BY `player_id`
ORDER BY SUM(`points`) DESC
Before the partitioning (GROUP BY) the result contains the correct amount of top-scores so you can simply sum all points together.
The big problem you are facing here is that neither rank nor dense_rank gives you a tool to get exactly 5 tuples for each player_id and event. For example: if someone scored 1 point 1000 times for event A, he will end up with 1000 points, as all of those rows get rank and dense_rank 1.
Oracle has ROWNUM, but again MySQL does not support it, so we have to emulate it. The problem with ROWNUM is that it generates one running number across all tuples, whereas we want the numbering to restart for each (player_id, event) group. I'm still working on this solution though.
Edit2
Using this answer I found this solution to work:
select
player_id, sum( points )
from
(
select
player_id,
event,
points,
/* increment current_pos and reset to 0 if player_id or event changes */
@current_pos := if (@current_player = player_id AND
@current_event = event, @current_pos, 0) + 1 as position,
@current_player := player_id,
@current_event := event
from
(select
/* global variable init */
@current_player := null,
@current_event := null,
@current_pos := 0) set_pos,
points
order by
player_id,
event,
points desc
) pos
WHERE
pos.event = 'A' AND pos.position <= 5
OR
pos.event = 'B' AND pos.position <= 2
OR
pos.event = 'C' AND pos.position <= 3
GROUP BY player_id
ORDER BY SUM( points ) DESC
The inner query selects (player_id, event, points) tuples, sorts them by player_id and event, and finally gives each tuple a position number that is reset every time either player_id or event changes, so the first tuple of each group gets position 1. Because of the ordering, all tuples with the same player_id and event are consecutive. The outer query does the same as the earlier query against the view.
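The counter logic can be checked outside the database. The following Python sketch replays the same steps (sort, number within each player/event group, keep the best N, sum) on the sample table from the question; the names best_n, totals and ranking are illustrative and not part of the query.

```python
# Sample rows from the question: (player_id, event, points).
rows = [
    (1, "A", 5), (1, "A", 10), (1, "A", 5), (1, "A", 5), (1, "A", 2),
    (1, "A", 15), (1, "A", 10), (1, "C", 20),
    (1, "B", 5), (1, "B", 5), (1, "B", 20),
    (2, "A", 50), (2, "B", 55),
]
best_n = {"A": 5, "B": 2, "C": 3}  # how many results count per event type

# Same order as the inner query: player_id, event, points DESC.
rows.sort(key=lambda r: (r[0], r[1], -r[2]))

totals = {}
current_group, position = None, 0
for player_id, event, points in rows:
    # Restart the counter whenever player_id or event changes.
    if (player_id, event) != current_group:
        current_group, position = (player_id, event), 0
    position += 1
    # Only the best N rows of each group contribute to the player's total.
    if position <= best_n[event]:
        totals[player_id] = totals.get(player_id, 0) + points

ranking = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
print(ranking)  # [(2, 105), (1, 90)]
```

This matches the desired output in the question: player 2 with 105 points, then player 1 with 90.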
Edit3 (see comments)
You can create intermediate sums, or different kind of partitions with OLAPs ROLLUP-operator. The query would for example look like this:
select
player_id, event, sum( points )
from
(
select
player_id,
event,
points,
/* increment current_pos and reset to 0 if player_id or event changes */
@current_pos := if (@current_player = player_id AND
@current_event = event, @current_pos, 0) + 1 as position,
@current_player := player_id,
@current_event := event
from
(select
/* global variable init */
@current_player := null,
@current_event := null,
@current_pos := 0) set_pos,
points
order by
player_id,
event,
points desc
) pos
WHERE
pos.event = 'A' AND pos.position <= 5
OR
pos.event = 'B' AND pos.position <= 2
OR
pos.event = 'C' AND pos.position <= 3
GROUP BY player_id, event WITH ROLLUP
/* NO ORDER BY HERE. SEE DOCUMENTATION ON MYSQL's ROLLUP FOR REASON */
The result will now first be grouped by player_id, event, then by only player_id and lastly by null (summing up all rows).
The first groups look like (player_id, event, sum(points)) = {(1, A, 20), (1,B,5)} where 20 and 5 are the sum of the points regarding player_id and event. The second groups look like (player_id, event, sum(points)) = {(1,NULL,25)}. 25 is the sum of all points regarding the player_id. Hope that helps. :-)

You probably need to give the sum(points) a name.
So do:
select player,sum(points) as points from table where event_type = "x" group by player order by points desc limit 5;
(I'd need to see your exact table schema to write this as something you can just drop in, but this is the gist of it)

Related

how to implement two aggregate functions on the same column mysql

SELECT max(sum(`orderquantity`)), `medicinename`
FROM `orerdetails`
WHERE `OID`=
(
SELECT `OrderID`
FROM `order`
where `VID` = 5 AND `OrerResponse` = 1
)
GROUP BY `medicinename`
I want to get the max of the result (the sum of the order quantity), but it gives an error. Is there any solution for this?
You don't need Max() here. Instead sort your recordset by that Sum('orderquantity') descending, and take the first record returned:
SELECT sum(`orderquantity`) as sumoforderqty, `medicinename`
FROM `orerdetails`
WHERE `OID`=
(
SELECT `OrderID`
FROM `order`
where `VID` = 5 AND `OrerResponse` = 1
)
GROUP BY `medicinename`
ORDER BY sumoforderqty DESC
LIMIT 1

MySQL - Limit Field To 5 Maximum Occurrences

Background:
I run a platform which allows users to follow creators and view their content.
The following query successfully displays 50 posts ordered by popularity. There is also some other logic to not show posts the user has already saved/removed, but that is not relevant for this question.
Problem:
If one creator is particularly popular (high popularity), the top 50 posts returned will nearly all be by that creator.
This skews the results as ideally the 50 posts returned will not be in favor of one particular author.
Question:
How can I limit it so the author (which uses the field posted_by) is returned no more than 5 times. It could be less, but definitely no more than 5 times should one particular author be returned.
It should still be finally ordered by popularity DESC
SELECT *
FROM `source_posts`
WHERE `posted_by` IN (SELECT `username`
FROM `source_accounts`
WHERE `id` IN (SELECT `sourceid`
FROM `user_source_accounts`
WHERE `profileid` = '100'))
AND `id` NOT IN (SELECT `postid`
FROM `user_posts_removed`
WHERE `profileid` = '100')
AND `live` = '1'
AND `added` >= Date_sub(Now(), INTERVAL 1 month)
AND `popularity` > 1
ORDER BY `popularity` DESC
LIMIT 50
Thank you.
Edit:
I am using MySQL version 5.7.24, so unfortunately the row_number() function will not work in this instance.
In MySQL 8+, you would simply use row_number():
select sp.*
from (select sp.*,
row_number() over (partition by posted_by order by popularity desc) as seqnum
from source_posts sp
) sp
where seqnum <= 5
order by popularity desc
limit 50;
I'm not sure what the rest of your query is doing, because it is not described in your question. You can, of course, add additional filtering criteria or joins.
EDIT:
In earlier versions, you can use variables:
select sp.*
from (select sp.*,
(@rn := if(@p = posted_by, @rn + 1,
if(@p := posted_by, 1, 1)
)
) as rn
from (select sp.*
from source_posts sp
order by posted_by, popularity desc
) sp cross join
(select @p := '', @rn := 0) params
) sp
where rn <= 5
order by popularity desc
limit 50;
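To make the intended result easy to verify, here is a small Python sketch of the same three steps: number each author's posts by popularity, keep at most five per author, then take the overall top 50. The function name cap_per_author and the sample posts are hypothetical; only the posted_by and popularity fields come from the question.

```python
from collections import defaultdict


def cap_per_author(posts, per_author=5, limit=50):
    """Keep at most `per_author` posts per author, then the `limit` most popular overall."""
    by_author = defaultdict(list)
    for post in posts:
        by_author[post["posted_by"]].append(post)

    kept = []
    for author_posts in by_author.values():
        # Equivalent of numbering rows within the author by popularity DESC
        # and keeping those numbered 1..per_author.
        author_posts.sort(key=lambda p: p["popularity"], reverse=True)
        kept.extend(author_posts[:per_author])

    # Final ordering and LIMIT are applied after the per-author cap.
    kept.sort(key=lambda p: p["popularity"], reverse=True)
    return kept[:limit]


# Hypothetical data: one very popular author with 8 posts, one with 3.
posts = [{"id": i, "posted_by": "alice", "popularity": 100 - i} for i in range(8)]
posts += [{"id": 100 + i, "posted_by": "bob", "popularity": 10 - i} for i in range(3)]

result = cap_per_author(posts)
print([(p["posted_by"], p["popularity"]) for p in result])
```

The important design point is the order of operations: the cap is applied per author first, and only then are the rows sorted and limited.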
You could try the row_number() function. It numbers each author's posts by popularity, so if one author had 50 posts, only those with a row number (named "rn" here) less than or equal to 5 would be returned.
SELECT *
FROM (
SELECT sp.*, row_number() OVER (PARTITION BY `posted_by` ORDER BY `popularity` DESC) AS rn
FROM `source_posts` sp
WHERE `posted_by` IN (SELECT `username`
FROM `source_accounts`
WHERE `id` IN (SELECT `sourceid`
FROM `user_source_accounts`
WHERE `profileid` = '100'))
AND `id` NOT IN (SELECT `postid`
FROM `user_posts_removed`
WHERE `profileid` = '100')
AND `live` = '1'
AND `added` >= Date_sub(Now(), INTERVAL 1 month)
AND `popularity` > 1
) ranked
WHERE rn <= 5
ORDER BY `popularity` DESC
LIMIT 50

MySQL: Effectively group into equally-sized buckets by data

Suppose I have a table of Players, each player has a score, and now I want to divide all players into levels of equal size, based on their score, so if I have n players, level 1 will have the first n/10 players with the highest score, level 2 will have the next n/10, and so on.
I have come up with a query:
UPDATE Players SET Level=? WHERE PlayerID IN (
SELECT * FROM (
SELECT PlayerID FROM Players ORDER BY Score DESC, PlayerID ASC LIMIT ?,?
) AS T1
);
Where I run this 10 times, with the first parameter running from 1-10, the second is 0, n/10, 2*n/10, ... and the third is always n/10.
This works, but it takes quite a long time. Trying to get a better result, I have created a temporary table:
CREATE TEMPORARY TABLE TempTable (
IDX INT UNSIGNED NOT NULL AUTO_INCREMENT,
ID INT UNSIGNED NOT NULL,
PRIMARY KEY (IDX)
) ENGINE=MEMORY;
INSERT INTO TempTable (ID) SELECT PlayerID FROM Players ORDER BY Score DESC, PlayerID ASC;
Then I run ten times:
UPDATE Players SET Level=? WHERE PlayerID IN (
SELECT * FROM TempTable WHERE IDX BETWEEN ? AND ?
);
With the appropriate parameters, and finally:
DROP TABLE TempTable;
However, this runs even slower. So is there a more efficient way to do this in MySQL? I've found this answer, but it appears NTILE is not available in MySQL.
Note: Players have an index on PlayerID (Primary key) and on Score, although running without index on Score doesn't seem to make much of a difference. The reason I sort also by PlayerID is so I have well-defined (consistent) behavior in case of ties.
You could try using a ranking function. This is what I'd use:
SELECT PlayerID,
score,
@levelLimit,
@counter := @counter + 1 AS counter,
@level := IF(@counter % @levelLimit = 0, @level:= @level + 1, @level) as level
FROM Players,
(SELECT @counter := 0) a,
(SELECT @levelLimit := round(count(*)/4 -- number of groups you want to end with
, 0)
FROM Players) b,
(SELECT @level := 1) c
ORDER BY Score DESC,
PlayerID ASC
;
To update the table:
UPDATE Players join (
SELECT PlayerID,
score,
@levelLimit, @counter := @counter + 1 AS counter,
@level := IF(@counter % @levelLimit = 0, @level:= @level + 1, @level) AS level
FROM Players,
(SELECT @counter := 0) a,
(SELECT @levelLimit := round(count(*)/4 -- number of clusters
, 0)
FROM Players) b,
(SELECT @level := 1) c
ORDER BY Score DESC,
PlayerID ASC
) as a on a.PlayerID = Players.PlayerID
SET Players.level = a.level
http://sqlfiddle.com/#!9/7f55f9/3
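For comparison, the target behaviour (levels of nearly equal size, ordered by Score DESC, PlayerID ASC) can be written down in a few lines of Python and used to check whatever SQL you end up with. This is a sketch under assumptions: the function name assign_levels, the integer-division formula and the sample players are mine, not taken from the queries above.

```python
def assign_levels(players, levels=10):
    """Map PlayerID -> level (1 = highest scores), with levels of near-equal size.

    `players` is a list of (player_id, score) pairs.
    """
    # Same ordering as the question: Score DESC, PlayerID ASC for ties.
    ordered = sorted(players, key=lambda p: (-p[1], p[0]))
    n = len(ordered)
    # Row i (0-based) of n rows falls into bucket floor(i * levels / n).
    return {pid: (i * levels) // n + 1 for i, (pid, _score) in enumerate(ordered)}


# Hypothetical data: 8 players split into 4 levels of 2 players each.
players = [(1, 50), (2, 90), (3, 70), (4, 70), (5, 10), (6, 30), (7, 80), (8, 20)]
levels = assign_levels(players, levels=4)
print(levels)
```

When the number of players is not divisible by the number of levels, this formula gives level sizes that differ by at most one.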
The reason that your query is slow is because of this limit bit at the end:
SELECT PlayerID FROM Players ORDER BY Score DESC, PlayerID ASC LIMIT ?,?
Without an offset, LIMIT would let you do one table scan in ten steps. With an offset you are doing it several times over! To reach the offset, the whole data set has to be sorted first, and only then can MySQL move to the rows of interest. My suggestion is to avoid the LIMIT clause entirely by breaking the players up into levels based on their scores.
For example if you have 10 levels, you could do a simple query to get
SELECT max(score), min(score) from ...
and then split the players into 10 equal levels based on the difference between the highest and lowest score. If, like Stack Overflow, you have millions of users with a score of one, you can choose an arbitrary number as the lower bound instead of min.
then
UPDATE Players SET Level=? WHERE PlayerID IN (
SELECT * FROM (
SELECT PlayerID FROM Players WHERE Score < level_upper_bound AND Score > level_lower_bound ) AS T1
);
You would still be updating in 10 steps, but together they amount to roughly one pass over the table rather than ten.

Trying to use ID in MySQL SubSubQuery

So I'll show you what I'm trying to do and explain my problem, there may be an answer different to the approach I'm trying to take.
The query I'm trying to perform is as follows:
SELECT *
FROM report_keywords rk
WHERE rk.report_id = 231
AND (
SELECT SUM(t.conv) FROM (
SELECT conv FROM report_keywords t2 WHERE t2.campaign_id = rk.campaign_id ORDER BY conv DESC LIMIT 10
) t
) >= 30
GROUP BY rk.campaign_id
The error I get is
Unknown column 'rk.campaign_id' in 'where clause'
Obviously this is saying that the table alias rk is not making it to the subsubquery. What I'm trying to do is get all of the campaigns where the sum of the top 10 conversions is greater than or equal to 30.
The relevant table structure is:
id INT,
report_id INT,
campaign_id INT,
conv INT
Any help would be greatly appreciated.
Update
Thanks to Kickstart I was able to do what I wanted. Here's my final query:
SELECT campaign_id, SUM(conv) as sum_conv
FROM (
SELECT campaign_id, conv, @Sequence := if(campaign_id = @campaign_id, @Sequence + 1, 1) AS aSequence, @campaign_id := campaign_id
FROM report_keywords
CROSS JOIN (SELECT @Sequence := 0, @campaign_id := 0) Sub1
WHERE report_id = 231
ORDER BY campaign_id, conv DESC
) t
WHERE aSequence <= 10
GROUP BY campaign_id
HAVING sum_conv >= 30
One option is to use a user variable to add a sequence number, keep the top 10 records for each campaign, then use SUM to total them.
Something like this:-
SELECT rk.*
FROM report_keywords rk
INNER JOIN
(
SELECT campaign_id, SUM(conv) AS SumConv
FROM
(
SELECT campaign_id, conv, @Sequence := if(campaign_id = @campaign_id, @Sequence + 1, 1) AS aSequence, @campaign_id := campaign_id
FROM report_keywords
CROSS JOIN (SELECT @Sequence := 0, @campaign_id := "") Sub1
ORDER BY campaign_id, conv DESC
) Sub2
WHERE aSequence <= 10
GROUP BY campaign_id
) Sub3
ON rk.campaign_id = Sub3.campaign_id AND Sub3.SumConv >= 30
WHERE rk.report_id = 231
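As a cross-check of what the question asks for (campaigns whose ten highest conv values add up to at least 30), here is a short Python sketch of the same computation. The function name campaigns_over_threshold and the sample rows are hypothetical; only campaign_id and conv come from the table structure.

```python
from collections import defaultdict


def campaigns_over_threshold(rows, top_n=10, threshold=30):
    """Return {campaign_id: sum of its top_n conv values} where that sum >= threshold.

    `rows` is an iterable of (campaign_id, conv) pairs, already filtered by report.
    """
    by_campaign = defaultdict(list)
    for campaign_id, conv in rows:
        by_campaign[campaign_id].append(conv)

    result = {}
    for campaign_id, convs in by_campaign.items():
        # Highest values first, so the slice really is the top N.
        top_sum = sum(sorted(convs, reverse=True)[:top_n])
        if top_sum >= threshold:
            result[campaign_id] = top_sum
    return result


# Hypothetical rows: campaign 1 has 12 rows of 4, campaign 2 has 20 rows of 2,
# campaign 3 has a single row of 30.
rows = [(1, 4)] * 12 + [(2, 2)] * 20 + [(3, 30)]
print(campaigns_over_threshold(rows))  # {1: 40, 3: 30}
```

Campaign 2 is dropped even though all of its rows together sum to 40, because only its ten best rows (20 in total) count.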

Checking for maximum length of consecutive days which satisfy specific condition

I have a MySQL table with the structure:
beverages_log(id, users_id, beverages_id, timestamp)
I'm trying to compute the maximum streak of consecutive days during which a user (with id 1) logs a beverage (with id 1) at least 5 times each day. I'm pretty sure that this can be done using views as follows:
CREATE or REPLACE VIEW daycounts AS
SELECT count(*) AS n, DATE(timestamp) AS d FROM beverages_log
WHERE users_id = '1' AND beverages_id = 1 GROUP BY d;
CREATE or REPLACE VIEW t AS SELECT * FROM daycounts WHERE n >= 5;
SELECT MAX(streak) AS current FROM ( SELECT DATEDIFF(MIN(c.d), a.d)+1 AS streak
FROM t AS a LEFT JOIN t AS b ON a.d = ADDDATE(b.d,1)
LEFT JOIN t AS c ON a.d <= c.d
LEFT JOIN t AS d ON c.d = ADDDATE(d.d,-1)
WHERE b.d IS NULL AND c.d IS NOT NULL AND d.d IS NULL GROUP BY a.d) allstreaks;
However, repeatedly creating views for different users every time I run this check seems pretty inefficient. Is there a way in MySQL to perform this computation in a single query, without creating views or repeatedly calling the same subqueries a bunch of times?
This solution seems to perform quite well as long as there is a composite index on users_id and beverages_id -
SELECT *
FROM (
SELECT t.*, IF(@prev + INTERVAL 1 DAY = t.d, @c := @c + 1, @c := 1) AS streak, @prev := t.d
FROM (
SELECT DATE(timestamp) AS d, COUNT(*) AS n
FROM beverages_log
WHERE users_id = 1
AND beverages_id = 1
GROUP BY DATE(timestamp)
HAVING COUNT(*) >= 5
) AS t
INNER JOIN (SELECT @prev := NULL, @c := 1) AS vars
) AS t
ORDER BY streak DESC LIMIT 1;
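The streak counter in this query can be mirrored in plain Python to test it on known data: keep the days with at least 5 entries, walk them in date order, and extend the current streak only when a day directly follows the previous one. The function name longest_streak and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def longest_streak(day_counts, minimum=5):
    """Length of the longest run of consecutive days with at least `minimum` entries.

    `day_counts` maps a date to the number of log entries on that day.
    """
    qualifying = sorted(d for d, n in day_counts.items() if n >= minimum)
    best = current = 0
    previous = None
    for day in qualifying:
        if previous is not None and day - previous == timedelta(days=1):
            current += 1  # day directly follows the previous qualifying day
        else:
            current = 1   # gap (or first day): start a new streak
        best = max(best, current)
        previous = day
    return best


# Hypothetical per-day counts for one user and one beverage.
day_counts = {
    date(2024, 1, 1): 5,
    date(2024, 1, 2): 7,
    date(2024, 1, 3): 2,  # below the threshold, so it breaks the streak
    date(2024, 1, 4): 6,
    date(2024, 1, 5): 5,
    date(2024, 1, 6): 9,
}
print(longest_streak(day_counts))  # 3
```

Note that the Python version sorts the days explicitly, so it does not depend on the order in which grouped rows happen to be returned.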
Why not include user_id in the daycounts view and group by user_id and date?
Also include user_id in view t.
Then, when you are querying against t, add the user_id to the where clause.
That way you don't have to recreate your views for every single user; you just need to remember to include it in your where clause.
That's a little tricky. I'd start with a view to summarize events by day:
CREATE VIEW BView AS
SELECT UserID, BevID, CAST(EventDateTime AS DATE) AS EventDate, COUNT(*) AS NumEvents
FROM beverages_log
GROUP BY UserID, BevID, CAST(EventDateTime AS DATE)
I'd then use a Dates table (just a table with one row per day; very handy to have) to examine all possible date ranges and throw out any with a gap. This will probably be slow as hell, but it's a start:
SELECT
UserID, BevID, MAX(StreakLength) AS StreakLength
FROM
(
SELECT
B1.UserID, B1.BevID, B1.EventDate AS StreakStart, DATEDIFF(EndDate.Date, StartDate.Date) AS StreakLength
FROM
BView AS B1
INNER JOIN Dates AS StartDate ON B1.EventDate = StartDate.Date
INNER JOIN Dates AS EndDate ON EndDate.Date > StartDate.Date
WHERE
B1.NumEvents >= 5
-- Exclude this potential streak if there's a day with no activity
AND NOT EXISTS (SELECT * FROM Dates AS MissedDay WHERE MissedDay.Date > StartDate.Date AND MissedDay.Date <= EndDate.Date AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND MissedDay.Date = B2.EventDate))
-- Exclude this potential streak if there's a day with less than five events
AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND B2.EventDate > StartDate.Date AND B2.EventDate <= EndDate.Date AND B2.NumEvents < 5)
) AS X
GROUP BY
UserID, BevID