SQL query, how to improve? - mysql

I did a task to write an SQL query and I wonder if I can improve it somehow.
Description:
Let's say we have a db on some online service
Let's create tables, and insert some data
create table players (
player_id integer not null unique,
group_id integer not null
);
create table matches (
match_id integer not null unique,
first_player integer not null,
second_player integer not null,
first_score integer not null,
second_score integer not null
);
insert into players values(20, 2);
insert into players values(30, 1);
insert into players values(40, 3);
insert into players values(45, 1);
insert into players values(50, 2);
insert into players values(65, 1);
insert into matches values(1, 30, 45, 10, 12);
insert into matches values(2, 20, 50, 5, 5);
insert into matches values(13, 65, 45, 10, 10);
insert into matches values(5, 30, 65, 3, 15);
insert into matches values(42, 45, 65, 8, 4);
The output of the query should be:
group_id | winner_id
--------------------
1 | 45
2 | 20
3 | 40
So, we should output the winner (player id) of each group. The winner is the player who got the most points in matches.
If a player is alone in the group, they win automatically. If players have an equal number of points, the winner is the one with the lower id value.
The output should be ordered by the group_id field.
My solution:
SELECT
results.group_id,
results.winner_id
FROM
(
SELECT
summed.group_id,
summed.player_id AS winner_id,
MAX(summed.sum) AS total_score
FROM
(
SELECT
mapped.player_id,
mapped.group_id,
SUM(mapped.points) AS sum
FROM
(
SELECT
p.player_id,
p.group_id,
CASE WHEN p.player_id = m.first_player THEN m.first_score WHEN p.player_id = m.second_player THEN m.second_score ELSE 0 END AS points
FROM
players AS p
LEFT JOIN matches AS m ON p.player_id = m.first_player
OR p.player_id = m.second_player
) AS mapped
GROUP BY
mapped.player_id
) as summed
GROUP BY
summed.group_id
ORDER BY
summed.group_id
) AS results;
It works, but I'm 99% sure it can be cleaner. Will be thankful for any suggestions

First, use UNION ALL to extract from matches 2 columns: player_id and score for all players and their scores.
Then aggregate to get each player's total score.
Finally do a LEFT join of players to the resultset you obtained, use GROUP_CONCAT() to collect all players of each group in descending order respective to their total score and with SUBSTRING_INDEX() pick the 1st player:
SELECT p.group_id,
SUBSTRING_INDEX(GROUP_CONCAT(p.player_id ORDER BY t.score DESC, p.player_id), ',', 1) winner_id
FROM players p
LEFT JOIN (
SELECT player_id, SUM(score) score
FROM (
SELECT first_player player_id, first_score score FROM matches
UNION ALL
SELECT second_player, second_score FROM matches
) t
GROUP BY player_id
) t ON t.player_id = p.player_id
GROUP BY p.group_id;
Note that the LEFT join keeps every group in the results, even groups whose players never took part in any match (like group 3 in your sample data). In that case all scores in the group are NULL, so the ordering falls back to player_id and the winner is the player with the lowest id (just like your expected results).

You can unpivot the matches table and sum the points per player (which I think is what you want):
select p.player_id, p.group_id, sum(score) as sum_score
from ((select first_player as player_id, first_score as score
from matches
) union all
(select second_player as player_id, second_score as score
from matches
)
) mp
join players p
using (player_id)
group by p.player_id, p.group_id;
Next, you can introduce a window function to get the top:
select player_id, group_id, sum_score
from (select p.player_id, p.group_id, sum(score) as sum_score,
row_number() over (partition by p.group_id order by sum(score) desc, p.player_id asc) as seqnum
from ((select first_player as player_id, first_score as score
from matches
) union all
(select second_player as player_id, second_score as score
from matches
)
) mp
join players p
using (player_id)
group by p.player_id, p.group_id
) pg
where seqnum = 1;
If you actually want the maximum score over all the matches rather than the sum(), then use max() instead of sum().

Here's another way:
WITH match_records AS (
SELECT match_id,first_player players, first_score scores FROM matches UNION ALL
SELECT match_id,second_player, second_score FROM matches
)
SELECT group_id, player_id
FROM
(SELECT group_id, player_id, players, SUM(scores) ts,
ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY SUM(scores) DESC, player_id) pos
FROM players p LEFT JOIN match_records mr ON mr.players=p.player_id
GROUP BY group_id, player_id, players) fp
WHERE pos=1
ORDER BY group_id;
It's basically the same idea as others (to un-pivot the matches table) but with a slightly different operation.
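For anyone who wants to check the unpivot-and-rank idea against the sample data, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL 8 (SQLite 3.25+ is needed for window functions), and the CTE names scores, totals and ranked are made up for the illustration. It uses a LEFT JOIN so that groups without matches are kept, and breaks ties on the lower player_id.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for MySQL 8; the SQL itself is portable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (
        player_id INTEGER NOT NULL UNIQUE,
        group_id  INTEGER NOT NULL
    );
    CREATE TABLE matches (
        match_id      INTEGER NOT NULL UNIQUE,
        first_player  INTEGER NOT NULL,
        second_player INTEGER NOT NULL,
        first_score   INTEGER NOT NULL,
        second_score  INTEGER NOT NULL
    );
    INSERT INTO players VALUES (20, 2), (30, 1), (40, 3), (45, 1), (50, 2), (65, 1);
    INSERT INTO matches VALUES
        (1, 30, 45, 10, 12), (2, 20, 50, 5, 5), (13, 65, 45, 10, 10),
        (5, 30, 65, 3, 15), (42, 45, 65, 8, 4);
""")

winners = conn.execute("""
    WITH scores AS (            -- unpivot: one row per (player, score)
        SELECT first_player AS player_id, first_score AS score FROM matches
        UNION ALL
        SELECT second_player, second_score FROM matches
    ),
    totals AS (                 -- total per player; players without matches get 0
        SELECT p.group_id, p.player_id, COALESCE(SUM(s.score), 0) AS total
        FROM players AS p
        LEFT JOIN scores AS s ON s.player_id = p.player_id
        GROUP BY p.group_id, p.player_id
    ),
    ranked AS (                 -- best total first, lower id wins a tie
        SELECT group_id, player_id,
               ROW_NUMBER() OVER (PARTITION BY group_id
                                  ORDER BY total DESC, player_id) AS pos
        FROM totals
    )
    SELECT group_id, player_id AS winner_id
    FROM ranked
    WHERE pos = 1
    ORDER BY group_id
""").fetchall()

print(winners)  # [(1, 45), (2, 20), (3, 40)]
```

Splitting the aggregation and the ranking into separate CTEs is only a readability choice; the answers above do both in one SELECT.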

Related

MYSQL sum of max score taken users list

I have following table
CREATE TABLE Table1
(`userid` varchar(11), `score` int, `type` varchar(22));
INSERT INTO Table1
(`userid`, `score`,`type`)
VALUES
(11, 2,'leader'),
(11, 6,'leader'),
(13, 6,'leader'),
(15, 4,'leader'),
(15, 4,'leader'),
(12, 1,'leader'),
(14, 1,'leader');
I need to get the userid of the user with the maximum total score.
If two or more users share the maximum score, I need to get those userids as well.
I have tried the following query:
SELECT userid, sum(score) as totalScore
FROM Table1 WHERE type = "leader" GROUP BY userid
ORDER BY totalScore DESC;
But it returns every user; I can't get only the ids of the first two users, who share the max score.
I need only those first two rows of data.
Please help me.
On MySQL 8+, I suggest using the RANK() analytic function:
WITH cte AS (
SELECT userid, SUM(score) AS totalScore,
RANK() OVER (ORDER BY SUM(score) DESC) rnk
FROM Table1
WHERE type = 'leader'
GROUP BY userid
)
SELECT userid, totalScore
FROM cte
WHERE rnk = 1;
If you need just the top 2 records, add LIMIT to your query:
SELECT userid, sum(score) as totalScore
FROM Table1 WHERE type = "leader" GROUP BY userid
ORDER BY totalScore DESC LIMIT 2;
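The difference between the two answers is that RANK() returns every user tied for the highest total, however many there are, while LIMIT 2 always returns exactly two rows. A minimal sketch of the RANK() variant on the question's data, assuming Python's sqlite3 module as a stand-in for MySQL 8 (the CTE name totals is made up, and the aggregation is split from the ranking for clarity):

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for MySQL 8 for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (userid TEXT, score INTEGER, type TEXT);
    INSERT INTO Table1 (userid, score, type) VALUES
        ('11', 2, 'leader'), ('11', 6, 'leader'), ('13', 6, 'leader'),
        ('15', 4, 'leader'), ('15', 4, 'leader'), ('12', 1, 'leader'),
        ('14', 1, 'leader');
""")

top_scorers = conn.execute("""
    WITH totals AS (
        SELECT userid, SUM(score) AS totalScore
        FROM Table1
        WHERE type = 'leader'
        GROUP BY userid
    ),
    ranked AS (
        -- RANK() gives every user tied for the best total the rank 1
        SELECT userid, totalScore,
               RANK() OVER (ORDER BY totalScore DESC) AS rnk
        FROM totals
    )
    SELECT userid, totalScore
    FROM ranked
    WHERE rnk = 1
    ORDER BY userid
""").fetchall()

print(top_scorers)  # [('11', 8), ('15', 8)]
```

With this data both approaches return users 11 and 15, but if three users shared the top total, only the RANK() version would return all three.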

How can I improve the query so that the ranking of players stays correct even when some of the players haven't played for a while?

I am trying to track how a player's rank in the leaderboard changes by month and year. Because some players play no games during certain periods, their rank may be lower during those periods.
The simplified version of the table can be created with:
create table rating
(player_id Integer(20) ,
game_id integer(20),
start_date_time date,
rating int (10)
);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (1, 1,'2019-01-02',1250);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (1, 2,'2019-01-03',2230);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (1, 3,'2019-02-04',3362);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (1, 4,'2019-02-05',1578);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (2, 5,'2019-01-03',2269);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (2, 6,'2019-01-05',3641);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (2, 7,'2019-02-07',1548);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (2, 8,'2019-02-09',1100);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (3, 9,'2019-01-03',4690);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (3, 10,'2019-01-05',3258);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (3, 11,'2019-01-07',1520);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (3, 12,'2019-01-09',3652);
The query I used is as follows:
select q1.rating_rank, q1.rating, q1.month,q1.year from (
SELECT player_id,month(start_date_time) as month, year(start_date_time) as year, round(avg(rating),2) as rating, count(*) as games_palyed,
rank() over(
partition by year(start_date_time),month(start_date_time)
order by avg(rating) desc ) as rating_rank
FROM rating
group by player_id,month(start_date_time), year(start_date_time)
having rating is not null) as q1
where player_id=1
The result I got is:
rating_rank rating month year
3 1740.00 1 2019
1 2470.00 2 2019
But the third player (id=3) is clearly the best among them; only because he didn't play in February can the first player be ranked no. 1.
In this situation I still want the third player to be no. 1 on the leaderboard. How should I fix this?
I am thinking that maybe I could use a period from 15 days before the date to 15 days after it instead of the exact month, but I'm not sure exactly how that can be done.
Thank you.
It took some time, but I think you can accomplish this with the following query.
First you have to generate all combinations of the possible dates and players.
This will give the dates:
WITH recursive mnths as (
select date_add(min(start_date_time),interval -DAY(min(start_date_time))+1 DAY) as mnth ,
date_add(max(start_date_time),interval -DAY(max(start_date_time))+1 DAY) as maxmnth from rating
UNION ALL -- start date: beginning of next month
SELECT DATE_ADD(mnth, INTERVAL +1 MONTH) , maxmnth
FROM mnths WHERE
mnth < maxmnth
)
And this will combine them with all the players:
select * from mnths
cross join (Select distinct player_id from rating) as P
Then, besides the calculation itself, you also need each player's most recent rating from before the current period. This subquery does that:
(
SELECT
round(avg(rating),2) as rating
from
rating
where start_date_time < mnths.mnth and player_id = P.player_id
group by player_id,month(start_date_time), year(start_date_time)
having round(avg(rating),2) is not null
order by year(start_date_time) desc, month(start_date_time) desc
limit 1 ) as PrevRating,
That will allow you to rank using not only the current rating but also the previous one when the current one does not exist:
order by CASE WHEN AUX.rating IS NULL THEN case WHEN AUX.PrevRating IS NULL THEN 0 ELSE AUX.PrevRating END ELSE AUX.rating END desc
Putting it all together, you end up with this:
WITH recursive mnths as (
select date_add(min(start_date_time),interval -DAY(min(start_date_time))+1 DAY) as mnth ,
date_add(max(start_date_time),interval -DAY(max(start_date_time))+1 DAY) as maxmnth from rating
UNION ALL -- start date: beginning of next month
SELECT DATE_ADD(mnth, INTERVAL +1 MONTH) , maxmnth
FROM mnths WHERE
mnth < maxmnth
)
select q1.rating_rank, q1.rating, q1.month,q1.year from (
select AUX.player_id, AUX.month, AUX.year, CASE WHEN AUX.rating IS NULL THEN case WHEN AUX.PrevRating IS NULL THEN 0 ELSE AUX.PrevRating END ELSE AUX.rating END as rating,
AUX.games_palyed,
rank() over(
partition by AUX.month, AUX.year
order by CASE WHEN AUX.rating IS NULL THEN case WHEN AUX.PrevRating IS NULL THEN 0 ELSE AUX.PrevRating END ELSE AUX.rating END desc ) as rating_rank
FROM(
select
P.player_id,
MONTH(mnths.mnth) as month,
YEAR(mnths.mnth) as year,
(
SELECT
round(avg(rating),2) as rating
from
rating
where start_date_time < mnths.mnth and player_id = P.player_id
group by player_id,month(start_date_time), year(start_date_time)
having round(avg(rating),2) is not null
order by year(start_date_time) desc, month(start_date_time) desc
limit 1 ) as PrevRating,
V.rating rating,
case when V.games_palyed IS NULL THEN 0 ELSE V.games_palyed END as games_palyed
from mnths
cross join (Select distinct player_id from rating) as P
LEFT JOIN
(SELECT
player_id,
month(start_date_time) as month,
year(start_date_time) as year,
round(avg(rating),2) as rating,
count(*) as games_palyed
from
rating as R
group by player_id,month(start_date_time), year(start_date_time)
having rating is not null
) V On YEAR(mnths.mnth) = V.year and MONTH(mnths.mnth) = V.month and P.player_id = V.player_id
) as AUX
) as q1
where q1.player_id=1
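The carry-forward idea can be tried on the question's data with a smaller sketch. It assumes Python's sqlite3 module as a stand-in for MySQL 8, and it simplifies one thing compared with the query above: instead of generating months recursively, it only uses the months in which at least one game was played, so a month with no games at all would not appear. The CTE names monthly, grid, filled and ranked are made up for the illustration.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for MySQL 8; strftime replaces MONTH()/YEAR().
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rating (player_id INTEGER, game_id INTEGER,
                         start_date_time TEXT, rating INTEGER);
    INSERT INTO rating VALUES
        (1, 1, '2019-01-02', 1250), (1, 2, '2019-01-03', 2230),
        (1, 3, '2019-02-04', 3362), (1, 4, '2019-02-05', 1578),
        (2, 5, '2019-01-03', 2269), (2, 6, '2019-01-05', 3641),
        (2, 7, '2019-02-07', 1548), (2, 8, '2019-02-09', 1100),
        (3, 9, '2019-01-03', 4690), (3, 10, '2019-01-05', 3258),
        (3, 11, '2019-01-07', 1520), (3, 12, '2019-01-09', 3652);
""")

player_1_ranks = conn.execute("""
    WITH monthly AS (           -- average rating per player and month
        SELECT player_id,
               strftime('%Y-%m', start_date_time) AS ym,
               AVG(rating) AS avg_rating
        FROM rating
        GROUP BY player_id, strftime('%Y-%m', start_date_time)
    ),
    grid AS (                   -- every (month, player) combination
        SELECT m.ym, p.player_id
        FROM (SELECT DISTINCT ym FROM monthly) AS m
        CROSS JOIN (SELECT DISTINCT player_id FROM rating) AS p
    ),
    filled AS (                 -- this month's average, else the latest earlier one
        SELECT g.ym, g.player_id,
               (SELECT mo.avg_rating
                FROM monthly AS mo
                WHERE mo.player_id = g.player_id AND mo.ym <= g.ym
                ORDER BY mo.ym DESC
                LIMIT 1) AS rating
        FROM grid AS g
    ),
    ranked AS (                 -- rank all players before filtering on one of them
        SELECT ym, player_id, rating,
               RANK() OVER (PARTITION BY ym
                            ORDER BY COALESCE(rating, 0) DESC) AS rating_rank
        FROM filled
    )
    SELECT ym, rating, rating_rank
    FROM ranked
    WHERE player_id = 1
    ORDER BY ym
""").fetchall()

print(player_1_ranks)  # [('2019-01', 1740.0, 3), ('2019-02', 2470.0, 2)]
```

Player 3 did not play in February, so the January average of 3280 is carried forward and player 1 drops to rank 2 for that month instead of rank 1.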

Ranking SUM of Value Column Grouped by Date

I have a query which I would like to add a ranking column. My existing query has three tables as a union query, with a sum of the total order value for that week. This query produces the sum of the total order value for that week, grouped by WeekCommencing, however I am struggling to add a ranking column based on the highest to the lowest total value for that week.
My (Updated) SQLFiddle example is here http://sqlfiddle.com/#!9/f1d43/35
CREATE and INSERT statements:
CREATE TABLE IF NOT EXISTS ORD (
WeekCommencing DATE,
Value DECIMAL(20 , 6 ),
Orders INT(6)
);
CREATE TABLE IF NOT EXISTS REF (
WeekCommencing DATE,
Value DECIMAL(20 , 6 ),
Orders INT(6)
);
CREATE TABLE IF NOT EXISTS SOH (
WeekCommencing DATE,
Value DECIMAL(20 , 6 ),
Orders INT(6)
);
INSERT INTO ORD (WeekCommencing, Value, Orders) VALUES
('2017-07-24',1,1),
('2017-07-31',2,1),
('2017-07-17',3,1);
INSERT INTO REF (WeekCommencing, Value, Orders) VALUES
('2017-07-24',4,1),
('2017-07-17',5,1),
('2017-07-31',6,1);
INSERT INTO SOH (WeekCommencing, Value, Orders) VALUES
('2017-07-17',7,1),
('2017-07-24',8,1),
('2017-07-31',9,1);
My best effort to date:
SELECT
WeekCommencing,
SUM(Value) AS 'TotalValue',
SUM(Orders) AS 'Orders',
@r:=@r+1 As 'Rank'
FROM
(SELECT
WeekCommencing, Value, Orders
FROM
ORD
GROUP BY WeekCommencing UNION ALL SELECT
WeekCommencing, Value, Orders
FROM
REF
GROUP BY WeekCommencing UNION ALL SELECT
WeekCommencing, Value, Orders
FROM
SOH
GROUP BY WeekCommencing) t1,
(SELECT @r:=0) Rank
GROUP BY WeekCommencing DESC;
My attempt currently ranks by the order of WeekCommencing, rather than from the highest to the lowest total value.
My desired result is
WeekCommencing TotalValue Orders Rank
2017-07-31 17 3 1
2017-07-24 13 3 3
2017-07-17 15 3 2
Thanks in advance
SELECT a.*
, @i:=@i+1 rank
FROM
( SELECT weekcommencing
, SUM(value) totalvalue
, COUNT(*) totalorders
FROM
( SELECT weekcommencing, value, orders FROM ord
UNION ALL
SELECT weekcommencing, value, orders FROM ref
UNION ALL
SELECT weekcommencing, value, orders FROM soh
) x
GROUP
BY weekcommencing
) a
, (SELECT @i:=0) vars
ORDER
BY totalvalue DESC;
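On MySQL 8.0 or later the same ranking can be written with RANK() instead of a user variable. A minimal sketch on the question's data, assuming Python's sqlite3 module as a stand-in for MySQL 8; the CTE names all_rows and weekly and the alias rnk are made up, and the orders are summed as in the question rather than counted:

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for MySQL 8 for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ORD (WeekCommencing TEXT, Value NUMERIC, Orders INTEGER);
    CREATE TABLE REF (WeekCommencing TEXT, Value NUMERIC, Orders INTEGER);
    CREATE TABLE SOH (WeekCommencing TEXT, Value NUMERIC, Orders INTEGER);
    INSERT INTO ORD VALUES ('2017-07-24', 1, 1), ('2017-07-31', 2, 1), ('2017-07-17', 3, 1);
    INSERT INTO REF VALUES ('2017-07-24', 4, 1), ('2017-07-17', 5, 1), ('2017-07-31', 6, 1);
    INSERT INTO SOH VALUES ('2017-07-17', 7, 1), ('2017-07-24', 8, 1), ('2017-07-31', 9, 1);
""")

weekly_ranking = conn.execute("""
    WITH all_rows AS (          -- stack the three tables
        SELECT WeekCommencing, Value, Orders FROM ORD
        UNION ALL
        SELECT WeekCommencing, Value, Orders FROM REF
        UNION ALL
        SELECT WeekCommencing, Value, Orders FROM SOH
    ),
    weekly AS (                 -- one row per week
        SELECT WeekCommencing,
               SUM(Value)  AS TotalValue,
               SUM(Orders) AS TotalOrders
        FROM all_rows
        GROUP BY WeekCommencing
    )
    SELECT WeekCommencing, TotalValue, TotalOrders,
           RANK() OVER (ORDER BY TotalValue DESC) AS rnk
    FROM weekly
    ORDER BY WeekCommencing DESC
""").fetchall()

for row in weekly_ranking:
    print(row)
```

Because the rank is computed by the window function, the final ORDER BY can sort by week (as in the desired result) without disturbing the rank values.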

MYSQL GROUP BY MAX score

I have a table called scores which contains columns
How do I select which id_team is the top scorer per game?
I'm trying this, but it doesn't give the correct result:
SELECT MAX( score ) , id_team
FROM scores
GROUP BY `id_game`
LIMIT 0 , 30
You can use a self join to find the team id that has the max score for each game:
SELECT s.*
FROM scores s
JOIN (
SELECT MAX(score) score, id_game
FROM scores
GROUP BY id_game ) ss USING(score ,id_game )
LIMIT 0 , 30
select A.id_game, A.id_team as winning_team
from scores A,
(
select max(score) as max, id_game
from scores
group by id_game
)B
where A.id_game = B.id_game
and A.score = B.max
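Both answers join the table to its own per-game maximum. A minimal sketch of that pattern, assuming Python's sqlite3 module as a stand-in for MySQL; the sample rows are invented for the illustration (the question does not show its data), and only the column names id_game, id_team and score come from the queries above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: game 1 has a clear winner, game 2 is a tie.
    CREATE TABLE scores (id_game INTEGER, id_team INTEGER, score INTEGER);
    INSERT INTO scores VALUES (1, 10, 3), (1, 20, 5), (2, 10, 7), (2, 30, 7);
""")

top_teams = conn.execute("""
    SELECT s.id_game, s.id_team, s.score
    FROM scores AS s
    JOIN (
        SELECT id_game, MAX(score) AS max_score
        FROM scores
        GROUP BY id_game
    ) AS best
      ON best.id_game = s.id_game
     AND best.max_score = s.score
    ORDER BY s.id_game, s.id_team
""").fetchall()

print(top_teams)  # [(1, 20, 5), (2, 10, 7), (2, 30, 7)]
```

Note that a tie for the top score returns every tied team for that game, so a game can appear more than once in the result.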

Composite query

Table with following columns:
Player_id (primary key), Event_type(A,B,C), Points.
A player may appear many times for each event_type.
I would like to show an overall ranking by SUM(Points) descending, grouped by player_id across all event types, while applying some conditions:
only best 5 results per player_id for event type A
only best 2 results per player_id for event type B
only best 3 results per player_id for event type C
I have tried in vain:
SUM(points) WHERE event_type ="X"
GROUP BY Player_id ORDER BY SUM(points) LIMIT N
I've been fighting this headache for a week now, and I'm pretty confused when it comes to including sub-queries, UNION, or temp tables. I can't figure out how to put all the pieces together...
My dream would be to get this overall ranking running, with the ability to see a detailed points breakdown per player on click.
Open to any kind of help on this one, thanks!
Example of the source table :
player_id | event_type | score
------------------------------
1 | A | 5
1 | A | 10
1 | A | 5
1 | A | 5
1 | A | 2
1 | A | 15
1 | A | 10
1 | C | 20
1 | B | 5
1 | B | 5
1 | B | 20
2 | A | 50
2 | B | 55
Desired output according to this example:
Rank | player_id | overall_score
--------------------------------
1 | 2 | 105 points [50 from A (best 5) + 55 from B (best 2)]
2 | 1 | 90 points [45 from A (best 5) + 20 from C (best 3) + 25 from B (best 2)]
First of all: the features you want are called window functions and ranking. Oracle implements these with the OVER keyword and the rank() function. MySQL before version 8.0 does not support these features, so we have to work around this.
I used this answer to create the following query. Give him a +1 too, if this is helpful to you.
SELECT
`player_id`, `event`, `points`,
(SELECT 1 + count(*)
FROM `points`
WHERE `l`.`player_id` = `player_id`
AND `l`.`event` = `event`
AND `points` > `l`.`points`
) AS `rank`
FROM
`points` `l`
This will output for every player_id and event the rank of the points. For example:
Assuming (player_id, event, points) has (1,A,10), (1,A,5), (1,A,2), (1,A,2), (1,A,1), (2,A,0) then the output would be
player_id event points rank
1 A 10 1
1 A 5 2
1 A 2 3
1 A 2 3
1 A 1 5
2 A 0 1
The rank is not dense, so if you have duplicate tuples, you will have output tuples with the same rank as well as gaps in your rank number.
To get the top N* tuples for each player_id and event you could either create a view or use the subquery in the condition. The view is the preferred way, but on many servers you don't have the privilege to create views.
Creating a view that contains the rank as column.
CREATE VIEW `points_view`
AS SELECT
`player_id`, `event`, `points`,
(SELECT 1 + count(*)
FROM `points`
WHERE `l`.`player_id` = `player_id`
AND `l`.`event` = `event`
AND `points` > `l`.`points`
) as `rank`
FROM
`points` `l`
Get the desired top N results from the view:
SELECT
`player_id`, `event`, `points`
FROM `points_view`
WHERE
`event` = 'A' AND `rank` <= 5
OR
`event` = 'B' AND `rank` <= 2
OR
`event` = 'C' AND `rank` <= 3
Using the rank in the condition
SELECT
`player_id`, `event`, `points`
FROM
`points` `l`
WHERE
(SELECT 1 + count(*)
FROM `points`
WHERE `l`.`player_id` = `player_id`
AND `l`.`event` = `event`
AND `points` > `l`.`points`
) <= N
To further get a different amount of tuples depending on your event, you could do
SELECT
`player_id`, `event`, `points`
FROM
`points` `l`
WHERE
`event` = 'A' AND
(SELECT 1 + count(*)
FROM `points`
WHERE `l`.`player_id` = `player_id`
AND `l`.`event` = `event`
AND `points` > `l`.`points`
) <= 5
OR
`event` = 'B' AND
(SELECT 1 + count(*)
FROM `points`
WHERE `l`.`player_id` = `player_id`
AND `l`.`event` = `event`
AND `points` > `l`.`points`
) <= 2
OR
`event` = 'C' AND
(SELECT 1 + count(*)
FROM `points`
WHERE `l`.`player_id` = `player_id`
AND `l`.`event` = `event`
AND `points` > `l`.`points`
) <= 3
I would just use the maximum of your N's which is 5 and ignore the other tuples for the other event-types as MySQL does not optimize this query which results in 3 separate dependent subqueries. If performance is not an issue or you don't have much data anyways, keep it that way.
* As I explained the rank is not dense, so getting all tuples with rank <= N will generally result in more than N tuples. The additional tuples are duplicates.
Simply removing duplicates is a bad idea as you can see from the example table. If you wanted the top 5 results for player_id = 1 and event = A, you would need both tuples (1,A,2). They both have rank 3. But if you remove one of them, you will only end up with the top 4 results (1,A,10,1), (1,A,5,2), (1,A,2,3), (1,A,1,5).
To get a dense rank you could use this subquery
(SELECT count(DISTINCT `points`)
FROM `points`
WHERE `l`.`player_id` = `player_id`
AND `l`.`event` = `event`
AND `points` >= `l`.`points`
) as `dense_rank`
Be careful as this will still produce duplicate ranks.
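To see how the two correlated subqueries behave, here is a minimal sketch that runs both on the example tuples from above. It assumes Python's sqlite3 module as a stand-in for MySQL, gives the inner table the alias r for readability, and names the output columns rnk and dense_rnk (these names are made up for the illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE points (player_id INTEGER, event TEXT, points INTEGER);
    INSERT INTO points VALUES
        (1, 'A', 10), (1, 'A', 5), (1, 'A', 2), (1, 'A', 2), (1, 'A', 1),
        (2, 'A', 0);
""")

ranks = conn.execute("""
    SELECT l.player_id, l.event, l.points,
           -- rank with gaps: 1 + number of strictly better rows
           (SELECT 1 + COUNT(*)
            FROM points AS r
            WHERE r.player_id = l.player_id
              AND r.event = l.event
              AND r.points > l.points) AS rnk,
           -- dense rank: number of distinct values at least as good
           (SELECT COUNT(DISTINCT r.points)
            FROM points AS r
            WHERE r.player_id = l.player_id
              AND r.event = l.event
              AND r.points >= l.points) AS dense_rnk
    FROM points AS l
    ORDER BY l.player_id, l.event, l.points DESC
""").fetchall()

for row in ranks:
    print(row)
# (1, 'A', 10, 1, 1)
# (1, 'A', 5, 2, 2)
# (1, 'A', 2, 3, 3)
# (1, 'A', 2, 3, 3)
# (1, 'A', 1, 5, 4)
# (2, 'A', 0, 1, 1)
```

The two rows with 2 points share a rank in both columns; the only difference is that the plain rank then skips to 5 while the dense rank continues with 4.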
Edit
To sum all event's points to one score, use GROUP BY
SELECT
`player_id`, SUM(`points`)
FROM `points_view`
WHERE
`event` = 'A' AND `rank` <= 5
OR
`event` = 'B' AND `rank` <= 2
OR
`event` = 'C' AND `rank` <= 3
GROUP BY `player_id`
ORDER BY SUM(`points`) DESC
Before the partitioning (GROUP BY) the result contains the correct amount of top-scores so you can simply sum all points together.
The big problem you are facing here is that neither rank nor dense_rank gives you a tool to get exactly 5 tuples for each player_id and event. For example: if someone got 1 point 1000 times for event A, he will end up with 1000 points, as all of those rows get rank and dense_rank 1.
There is ROWNUM, but again: MySQL does not support this, so we have to emulate it. The problem with ROWNUM is that it will generate a composite number for all tuples. But we want composite numbers for groups of player_id, event. I'm still working on this solution though.
Edit2
Using this answer I found this solution to work:
select
player_id, sum( points )
from
(
select
player_id,
event,
points,
/* increment current_pos and reset to 0 if player_id or event changes */
@current_pos := if (@current_player = player_id AND
@current_event = event, @current_pos, 0) + 1 as position,
@current_player := player_id,
@current_event := event
from
(select
/* global variable init */
@current_player := null,
@current_event := null,
@current_pos := 0) set_pos,
points
order by
player_id,
event,
points desc
) pos
WHERE
pos.event = 'A' AND pos.position <= 5
OR
pos.event = 'B' AND pos.position <= 2
OR
pos.event = 'C' AND pos.position <= 3
GROUP BY player_id
ORDER BY SUM( points ) DESC
The inner query selects (player_id, event, points) tuples, sorts them by player_id and event, and finally gives each tuple a composite number which is reset to 0 every time either player_id or event changes. Because of the ordering, all tuples with the same player_id will be consecutive. The outer query does the same as the query previously used with the view.
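On MySQL 8.0 or later the position counter can be produced with ROW_NUMBER() instead of user variables. A minimal sketch on the question's sample data, assuming Python's sqlite3 module as a stand-in for MySQL 8 and the table and column names used in this answer (points, player_id, event, points); the alias overall_score is taken from the question's desired output:

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for MySQL 8 for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE points (player_id INTEGER, event TEXT, points INTEGER);
    INSERT INTO points VALUES
        (1, 'A', 5), (1, 'A', 10), (1, 'A', 5), (1, 'A', 5), (1, 'A', 2),
        (1, 'A', 15), (1, 'A', 10), (1, 'C', 20),
        (1, 'B', 5), (1, 'B', 5), (1, 'B', 20),
        (2, 'A', 50), (2, 'B', 55);
""")

overall = conn.execute("""
    SELECT player_id, SUM(points) AS overall_score
    FROM (
        -- number each player's results per event, best result first;
        -- unlike RANK(), ROW_NUMBER() never repeats a position
        SELECT player_id, event, points,
               ROW_NUMBER() OVER (PARTITION BY player_id, event
                                  ORDER BY points DESC) AS position
        FROM points
    ) AS pos
    WHERE (pos.event = 'A' AND pos.position <= 5)
       OR (pos.event = 'B' AND pos.position <= 2)
       OR (pos.event = 'C' AND pos.position <= 3)
    GROUP BY player_id
    ORDER BY overall_score DESC
""").fetchall()

print(overall)  # [(2, 105), (1, 90)]
```

Player 1 keeps 15 + 10 + 10 + 5 + 5 = 45 from event A, 20 + 5 = 25 from B and 20 from C, which gives the 90 points from the desired output.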
Edit3 (see comments)
You can create intermediate sums, or different kinds of partitions, with the OLAP ROLLUP operator. The query would for example look like this:
select
player_id, event, sum( points )
from
(
select
player_id,
event,
points,
/* increment current_pos and reset to 0 if player_id or event changes */
@current_pos := if (@current_player = player_id AND
@current_event = event, @current_pos, 0) + 1 as position,
@current_player := player_id,
@current_event := event
from
(select
/* global variable init */
@current_player := null,
@current_event := null,
@current_pos := 0) set_pos,
points
order by
player_id,
event,
points desc
) pos
WHERE
pos.event = 'A' AND pos.position <= 5
OR
pos.event = 'B' AND pos.position <= 2
OR
pos.event = 'C' AND pos.position <= 3
GROUP BY player_id, event WITH ROLLUP
/* NO ORDER BY HERE. SEE DOCUMENTATION ON MYSQL's ROLLUP FOR REASON */
The result will now first be grouped by player_id, event, then by only player_id and lastly by null (summing up all rows).
The first groups look like (player_id, event, sum(points)) = {(1, A, 20), (1,B,5)} where 20 and 5 are the sum of the points regarding player_id and event. The second groups look like (player_id, event, sum(points)) = {(1,NULL,25)}. 25 is the sum of all points regarding the player_id. Hope that helps. :-)
You probably need to give the sum(points) a name.
So do:
select player,sum(points) as points from table where event_type = "x" group by player order by points desc limit 5;
(I'd need to see your exact table schema to write this as something you can just drop in, but this is the gist of it)