MySQL: Greatest n per group with joins and conditions

Table Structure
I have a table similar to the following:
venues
The following table describes a list of businesses
id name
50 Nando's
60 KFC
rewards
The table describes a number of rewards, the venue it corresponds to and the number of points needed to redeem the reward.
id  venue_id  name     points
1   50        5% off   10
2   50        10% off  20
3   50        11% off  30
4   50        15% off  40
5   50        20% off  50
6   50        30% off  50
7   60        30% off  70
8   60        60% off  100
9   60        65% off  120
10  60        70% off  130
11  60        80% off  140
points_data
The table describes the number of points a user has remaining at each venue.
venue_id points_remaining
50 30
60 90
Note that this table is actually computed within SQL like so:
select * from (
select venue_id, (total_points - points_redeemed) as points_remaining
from (
select venue_id, sum(total_points) as total_points, sum(points_redeemed) as points_redeemed
from (
(
select venue_id, sum(points) as total_points, 0 as points_redeemed
from check_ins
group by venue_id
)
UNION
(
select venue_id, 0 as total_points, sum(points) as points_redeemed
from reward_redemptions rr
join rewards r on rr.reward_id = r.id
group by venue_id
)
) a
group by venue_id
) b
GROUP BY venue_id
) points_data
but for this question you can probably just ignore that massive query and assume the table is just called points_data.
Desired Output
I want to get a single query that gets:
The top 2 rewards the user is eligible for at each venue
The lowest 2 rewards the user is not yet eligible for at each venue
So for the above data, the output would be:
id  venue_id  name     points
2   50        10% off  20
3   50        11% off  30
4   50        15% off  40
5   50        20% off  50
7   60        30% off  70
8   60        60% off  100
9   60        65% off  120
What I got so far
The best solution I found so far is first getting the points_data, and then using code (i.e. PHP) to dynamically write the following:
(
select * from rewards
where venue_id = 50
and points > 30
ORDER BY points asc
LIMIT 2
)
union all
(
select * from rewards
where venue_id = 50
and points <= 30
ORDER BY points desc
LIMIT 2
)
UNION ALL
(
select * from rewards
where venue_id = 60
and points <= 90
ORDER BY points desc
LIMIT 2
)
UNION ALL
(
select * from rewards
where venue_id = 60
and points > 90
ORDER BY points asc
LIMIT 2
)
ORDER BY venue_id, points asc;
However, I feel the query can get too long and inefficient. For example, if a user has points in 400 venues, that is 800 subqueries.
I tried also doing a join like so, but can't really get better than:
select * from points_data
INNER JOIN rewards on rewards.venue_id = points_data.venue_id
where points > points_remaining;
which is far from what I want.

Correlated subqueries counting the number of higher or lower rewards to determine the top or bottom entries are one way.
SELECT r1.*
FROM rewards r1
INNER JOIN points_data pd1
ON pd1.venue_id = r1.venue_id
WHERE r1.points <= pd1.points_remaining
AND (SELECT count(*)
FROM rewards r2
WHERE r2.venue_id = r1.venue_id
AND r2.points <= pd1.points_remaining
AND (r2.points > r1.points
OR r2.points = r1.points
AND r2.id > r1.id)) < 2
OR r1.points > pd1.points_remaining
AND (SELECT count(*)
FROM rewards r2
WHERE r2.venue_id = r1.venue_id
AND r2.points > pd1.points_remaining
AND (r2.points < r1.points
OR r2.points = r1.points
AND r2.id < r1.id)) < 2
ORDER BY r1.venue_id,
r1.points;
SQL Fiddle
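To check the correlated-subquery approach against the question's sample data, here is a minimal sketch that runs the same logic through Python's built-in sqlite3 module. SQLite standing in for MySQL is an assumption made only for easy local testing, and points_data is created as a plain table rather than computed from check_ins. Explicit parentheses are added around the two branches, which does not change the logic.

```python
import sqlite3

# Assumption: in-memory SQLite instead of MySQL; the query itself is portable SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rewards (id INTEGER PRIMARY KEY, venue_id INTEGER, name TEXT, points INTEGER);
CREATE TABLE points_data (venue_id INTEGER PRIMARY KEY, points_remaining INTEGER);
INSERT INTO rewards VALUES
  (1, 50, '5% off', 10), (2, 50, '10% off', 20), (3, 50, '11% off', 30),
  (4, 50, '15% off', 40), (5, 50, '20% off', 50), (6, 50, '30% off', 50),
  (7, 60, '30% off', 70), (8, 60, '60% off', 100), (9, 60, '65% off', 120),
  (10, 60, '70% off', 130), (11, 60, '80% off', 140);
INSERT INTO points_data VALUES (50, 30), (60, 90);
""")

QUERY = """
SELECT r1.id
FROM rewards r1
JOIN points_data pd1 ON pd1.venue_id = r1.venue_id
WHERE (r1.points <= pd1.points_remaining
       -- fewer than 2 affordable rewards rank above this one
       AND (SELECT count(*) FROM rewards r2
            WHERE r2.venue_id = r1.venue_id
              AND r2.points <= pd1.points_remaining
              AND (r2.points > r1.points
                   OR (r2.points = r1.points AND r2.id > r1.id))) < 2)
   OR (r1.points > pd1.points_remaining
       -- fewer than 2 unaffordable rewards rank below this one
       AND (SELECT count(*) FROM rewards r2
            WHERE r2.venue_id = r1.venue_id
              AND r2.points > pd1.points_remaining
              AND (r2.points < r1.points
                   OR (r2.points = r1.points AND r2.id < r1.id))) < 2)
ORDER BY r1.venue_id, r1.points
"""

result_ids = [row[0] for row in conn.execute(QUERY)]
print(result_ids)  # [2, 3, 4, 5, 7, 8, 9]
```

The id comparison breaks the tie between the two 50-point rewards, which is why reward 5 is returned and reward 6 is not.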
Since MySQL 8.0, a solution using the row_number() window function is an alternative, though I suppose you are on a lower version.
SELECT x.id,
x.venue_id,
x.name,
x.points
FROM (SELECT r.id,
r.venue_id,
r.name,
r.points,
pd.points_remaining,
row_number() OVER (PARTITION BY r.venue_id,
r.points <= pd.points_remaining
ORDER BY r.points DESC) rntop,
row_number() OVER (PARTITION BY r.venue_id,
r.points > pd.points_remaining
ORDER BY r.points ASC) rnbottom
FROM rewards r
INNER JOIN points_data pd
ON pd.venue_id = r.venue_id) x
WHERE x.points <= x.points_remaining
AND x.rntop <= 2
OR x.points > x.points_remaining
AND x.rnbottom <= 2
ORDER BY x.venue_id,
x.points;
db<>fiddle
The tricky part here is to also partition the set, per venue, into the rewards the user has enough points to redeem and the rewards the user cannot afford yet. Because logical expressions in MySQL evaluate to 0 or 1 (in a non-Boolean context), the comparison expressions themselves can be used as partition keys.
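As an illustration of using a comparison as a partition key, the following sketch runs a window-function query of the same shape on the question's sample data with Python's sqlite3 module. It assumes SQLite 3.25 or newer (for window functions) in place of MySQL 8.0, and it adds id as a tiebreaker in the window ordering, which the query above does not have.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for MySQL 8.0; both support row_number().
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rewards (id INTEGER PRIMARY KEY, venue_id INTEGER, name TEXT, points INTEGER);
CREATE TABLE points_data (venue_id INTEGER PRIMARY KEY, points_remaining INTEGER);
INSERT INTO rewards VALUES
  (1, 50, '5% off', 10), (2, 50, '10% off', 20), (3, 50, '11% off', 30),
  (4, 50, '15% off', 40), (5, 50, '20% off', 50), (6, 50, '30% off', 50),
  (7, 60, '30% off', 70), (8, 60, '60% off', 100), (9, 60, '65% off', 120),
  (10, 60, '70% off', 130), (11, 60, '80% off', 140);
INSERT INTO points_data VALUES (50, 30), (60, 90);
""")

# A comparison evaluates to 1 (true) or 0 (false), so it can act as a grouping key.
flags = conn.execute("SELECT 20 <= 30, 40 <= 30").fetchone()

QUERY = """
SELECT x.id
FROM (SELECT r.id, r.venue_id, r.points, pd.points_remaining,
             row_number() OVER (PARTITION BY r.venue_id, r.points <= pd.points_remaining
                                ORDER BY r.points DESC, r.id DESC) AS rntop,
             row_number() OVER (PARTITION BY r.venue_id, r.points <= pd.points_remaining
                                ORDER BY r.points ASC, r.id ASC) AS rnbottom
      FROM rewards r
      JOIN points_data pd ON pd.venue_id = r.venue_id) x
WHERE (x.points <= x.points_remaining AND x.rntop <= 2)
   OR (x.points > x.points_remaining AND x.rnbottom <= 2)
ORDER BY x.venue_id, x.points, x.id
"""

window_ids = [row[0] for row in conn.execute(QUERY)]
print(flags, window_ids)
```

Both row numbers share one partition definition here: the affordable and unaffordable halves of each venue are numbered separately, and the outer filter picks the top of one half and the bottom of the other.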

Related

MySQL order by distinct items first

I have a table of item possessions which is something like this:
item_possession
===============
id product_id status
1 50 (weapon) available
2 50 (weapon) unavailable
3 10 (shield) unavailable
4 10 (shield) unavailable
5 50 (weapon) available
6 20 (helmet) available
7 20 (helmet) available
8 50 (weapon) available
9 50 (weapon) available
10 30 (thunder) unavailable
11 20 (helmet) available
Note: This is a game and I can own duplicated products (I can sell them), please note that I can have different rows referencing the same item.
Is it possible to order my item possessions so that one row per distinct item is listed first (I don't care about the order within that block, I can just use the table ID) and the duplicates come at the end?
Something like this:
item_possession
===============
id product_id status
1 50 (weapon) available
3 10 (shield) unavailable
6 20 (helmet) available
10 30 (thunder) unavailable
2 50 (weapon) unavailable
4 10 (shield) unavailable
5 50 (weapon) available
7 20 (helmet) available
8 50 (weapon) available
9 50 (weapon) available
11 20 (helmet) available

or, old school...
SELECT x.*
FROM item_possession x
JOIN item_possession y
ON y.product_id = x.product_id
AND y.id <= x.id
GROUP
BY x.id
ORDER
BY COUNT(*)
, id;
EDIT: Actually, you seem to want this...
SELECT x.*
FROM item_possession x
LEFT
JOIN
( SELECT MIN(id) id FROM item_possession GROUP BY product_id ) y
ON y.id = x.id
ORDER
BY y.id IS NULL,x.id;
For further help, see Why should I provide an MCRE for what seems to me to be a very simple SQL query

You can use window functions for this:
order by row_number() over (partition by product_id order by id)
There is no need to put this in the select. You can also do this using a subquery in older versions:
order by (select count(*)
from item_possession t2
where t2.product_id = t.product_id and t2.id <= t.id
)
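The correlated-count ordering above is only an ORDER BY fragment, so here is a minimal sketch of it as a complete query, run on the question's sample rows with Python's sqlite3 module (SQLite instead of MySQL is an assumption for local testing). The count is exposed as a column named occurrence, a name chosen here for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite instead of MySQL; the correlated count is portable SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_possession (id INTEGER PRIMARY KEY, product_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO item_possession VALUES (?, ?, ?)",
    [
        (1, 50, "available"), (2, 50, "unavailable"), (3, 10, "unavailable"),
        (4, 10, "unavailable"), (5, 50, "available"), (6, 20, "available"),
        (7, 20, "available"), (8, 50, "available"), (9, 50, "available"),
        (10, 30, "unavailable"), (11, 20, "available"),
    ],
)

QUERY = """
SELECT t.id,
       -- how many rows of the same product have an id up to this one:
       -- 1 for the first occurrence, 2 for the second, and so on
       (SELECT count(*)
        FROM item_possession t2
        WHERE t2.product_id = t.product_id AND t2.id <= t.id) AS occurrence
FROM item_possession t
ORDER BY occurrence, t.id
"""

ordered_ids = [row[0] for row in conn.execute(QUERY)]
print(ordered_ids)  # [1, 3, 6, 10, 2, 4, 7, 5, 11, 8, 9]
```

Note that this sorts by occurrence number, so all second occurrences come before all third occurrences. The first four rows match the listing the question asks for; the duplicates after them are not in plain id order, unlike with the MIN(id) join shown earlier.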
Alternatively, you can use variables -- which requires sorting twice:
select ip.*
from (select ip.*,
(@rn := if(@p = ip.product_id, @rn + 1,
if(@p := ip.product_id, 1, 1)
)
) as rn
from (select ip.* from item_possession ip order by ip.product_id, ip.id
) ip cross join
(select @p := -1, @rn := 0) params
) ip
order by rn, id;

You can use row_number():
select t.*, row_number() over (partition by product_id order by id) as seq
from item_possession t
order by seq, id;

Trouble using group by to get a max value across two tables

I have been trying to solve a problem for days without making any progress. Basically, I have two tables, players and matches. Each player in players has a unique player_id, as well as a group_id that identifies which group he/she belongs to. Each match in matches has the player_ids of two players in it, first_player and second_player, who are always from the same group. first_score is the score of first_player and second_score is the score of second_player. A match is won by whoever scores more. Here are the two tables:
create table players (
player_id integer not null unique,
group_id integer not null
);
create table matches (
match_id integer not null unique,
first_player integer not null,
second_player integer not null,
first_score integer not null,
second_score integer not null
);
Now what I am trying to do is to get the players with the most wins from each group, their group ID as well as the number of wins. So, for example, if there are three groups, the result would be something like:
Group Player Wins
1 24 23
2 13 25
3 34 20
Here's what I have right now
SELECT p1.group_id AS Group, p1.player_id AS Player, COUNT(*) AS Wins
FROM players p1, matches m1
WHERE (m1.first_player = p1.player_id AND m1.first_score > m1.second_score)
OR (m1.second_player = p1.player_id AND m1.second_score > m1.first_score)
GROUP BY p1.group_id
HAVING COUNT(*) >= (
SELECT COUNT(*)
FROM players p2, matches m2
WHERE p2.group_id = p1.group_id AND
((m2.first_player = p2.player_id AND m2.first_score > m2.second_score)
OR (m2.second_player = p2.player_id AND m2.second_score > m2.first_score))
)
My idea is to only select players whose wins are greater than, or equal to, the wins of all other players in his group. There is some syntactic problem with my query. I think I am using GROUP BY incorrectly as well.
There is also the issue of a tie in the number of wins, where I should just get the player with the least player_id. But I haven't even gotten to that point yet. I would really appreciate your help, thanks!
EDIT 1
I have a few sample data that I am running my query against.
SELECT * FROM players gives me this:
Player_ID Group_ID
100 1
200 1
300 1
400 2
500 2
600 3
700 3
SELECT * FROM matches gives me this:
match_id first_player second_player first_score second_score
1 100 200 10 20
2 200 300 30 20
3 400 500 30 10
4 500 400 20 20
5 600 700 20 10
So, the query should return:
Group Player Wins
1 200 2
2 400 1
3 600 1
Running the query as is returns the following error:
ERROR: column "p1.player_id" must appear in the GROUP BY clause or be used in an aggregate function
Now I understand that I have to specify player_id in the GROUP BY clause if I want to use it in the SELECT (or HAVING) statement, but I do not wish to group by player ID, only by the group ID.
Even if I do add p1.player_id to GROUP BY in my outer query, I get...the correct answer actually. But I am a bit confused. Doesn't Group By aggregate the table according to that column? Logically speaking, I only want to group by p1.group_id.
Also, if I were to have multiple players in a group with the highest number of wins, how can I just keep the one with the lowest player_id?
Edit 2
If I change the matches table to such that for Group 1, there are two players with 1 win each, the query result omits Group 1 from the result altogether.
So, if my matches table is:
match_id first_player second_player first_score second_score
1 100 200 10 20
2 200 300 10* 20
3 400 500 30 10
4 500 400 20 20
5 600 700 20 10
I would expect the result to be
Group Player Wins
1 200 1
1 300 1
2 400 1
3 600 1
However, I get the following:
Group Player Wins
2 400 1
3 600 1
Note that the desired result is
Group Player Wins
1 200 1
2 400 1
3 600 1
Since I wish to only take the player with the least player_id in the case of a draw.

WITH first_players AS (
SELECT group_id,player_id,SUM(first_score) AS scores FROM players p LEFT JOIN matches m ON p.player_id=m.first_player GROUP BY group_id,player_id
),
second_players AS (
SELECT group_id,player_id,SUM(second_score) AS scores FROM players p LEFT JOIN matches m ON p.player_id=m.second_player GROUP BY group_id,player_id
),
all_players AS (
WITH al AS (
SELECT group_id, player_id, scores FROM first_players
UNION ALL
SELECT group_id, player_id, scores FROM second_players
)
SELECT group_id, player_id,COALESCE(SUM(scores),0) AS scores FROM al GROUP BY group_id, player_id
),
players_rank AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY group_id ORDER BY scores DESC, player_id ASC) AS score_rank,
ROW_NUMBER() OVER(PARTITION BY scores ORDER BY player_id ASC) AS id_rank FROM all_players ORDER BY group_id
)
SELECT group_id, player_id AS winner_id FROM players_rank WHERE score_rank=1 AND id_rank=1
Results
group_id winner_id
1 45
2 20
3 40
Try it Out

Try something like the following:
with cte as
(
select p.Group_ID,t1.winplayer,t1.numberofwin,
row_number()over(partition by p.Group_ID order by t1.numberofwin desc,t1.winplayer) rn from players p join
(
SELECT count(*) as numberofwin,
case when first_score >second_score then first_player
else second_player end as winplayer
FROM matches
where first_score <> second_score -- a draw is not a win for either player
group by case when first_score >second_score then first_player
else second_player end
) t1 on p.Player_ID =t1.winplayer
) select * from cte where rn=1

It works when you add the player_id in the GROUP BY because you know each player plays only in one group. So you group by the player in a certain group. That is why, logically, you can add the player_id to the GROUP BY.
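To make that point concrete, here is a minimal sketch using the sample data from EDIT 1, run through Python's sqlite3 module. It assumes SQLite 3.25 or newer in place of the asker's database, and the names player_wins, win_count and rn are chosen here for illustration. Grouping by (group_id, player_id) yields exactly one row per player, and a row number per group then picks the player with the most wins, lowest player_id first on ties.

```python
import sqlite3

# Assumption: SQLite >= 3.25 (window functions) stands in for the asker's database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (player_id INTEGER NOT NULL UNIQUE, group_id INTEGER NOT NULL);
CREATE TABLE matches (match_id INTEGER NOT NULL UNIQUE,
                      first_player INTEGER NOT NULL, second_player INTEGER NOT NULL,
                      first_score INTEGER NOT NULL, second_score INTEGER NOT NULL);
INSERT INTO players VALUES (100, 1), (200, 1), (300, 1), (400, 2), (500, 2), (600, 3), (700, 3);
INSERT INTO matches VALUES (1, 100, 200, 10, 20), (2, 200, 300, 30, 20),
                           (3, 400, 500, 30, 10), (4, 500, 400, 20, 20),
                           (5, 600, 700, 20, 10);
""")

QUERY = """
WITH player_wins AS (
  -- one row per player; players without a win get win_count = 0
  SELECT p.group_id, p.player_id, COUNT(m.match_id) AS win_count
  FROM players p
  LEFT JOIN matches m
    ON (m.first_player = p.player_id AND m.first_score > m.second_score)
    OR (m.second_player = p.player_id AND m.second_score > m.first_score)
  GROUP BY p.group_id, p.player_id
),
ranked AS (
  SELECT group_id, player_id, win_count,
         ROW_NUMBER() OVER (PARTITION BY group_id
                            ORDER BY win_count DESC, player_id ASC) AS rn
  FROM player_wins
)
SELECT group_id, player_id, win_count
FROM ranked
WHERE rn = 1
ORDER BY group_id
"""

winners = conn.execute(QUERY).fetchall()
print(winners)  # [(1, 200, 2), (2, 400, 1), (3, 600, 1)]
```

Match 4 is a 20-20 draw, so it counts as a win for neither player.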

MySQL: Range based on rows in external table

I am using MySQL to solve this problem. I need to give points to a user based on the total time spent by him on a question. I have calculated the time spent by the user. Let's say it is in user_time table.
user_id question_id time_spent
1 1 7
1 2 50
2 1 11
My points are range based:
[0-10) seconds: 100 points,
[10-20) seconds: 300 points,
[20-30) seconds: 500 points,
[30, inf): 1000 points
Exactly 10 seconds will fetch me 300 points, though the chances of hitting an exact number are low given that I am computing from the system clock difference.
This information is currently stored in an external table, points_table:
time_spent points
0 100
10 300
20 500
30 1000
I need a query which finds out which range the seconds belong to and give me that result.
user_id question_id points
1 1 100
1 2 1000
2 1 300
I tried thinking of different type of joins but couldn't think of one which will answer this specific requirement.

I think the easiest approach is a correlated subquery. Something like this:
select ut.*,
(select pt.points
from points_table pt
where pt.time_spent <= ut.time_spent
order by pt.time_spent desc
limit 1
) as points
from user_time ut
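As a quick check of the correlated-subquery lookup, the sketch below runs it on the question's sample data with Python's sqlite3 module (SQLite in place of MySQL is an assumption for local testing). The helper points_for is a hypothetical name added here to probe the range boundaries.

```python
import sqlite3

# Assumption: in-memory SQLite instead of MySQL; ORDER BY ... LIMIT 1 works in both.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_time (user_id INTEGER, question_id INTEGER, time_spent INTEGER);
CREATE TABLE points_table (time_spent INTEGER PRIMARY KEY, points INTEGER);
INSERT INTO user_time VALUES (1, 1, 7), (1, 2, 50), (2, 1, 11);
INSERT INTO points_table VALUES (0, 100), (10, 300), (20, 500), (30, 1000);
""")

QUERY = """
SELECT ut.user_id, ut.question_id,
       -- the range a value falls in starts at the largest threshold <= the value
       (SELECT pt.points
        FROM points_table pt
        WHERE pt.time_spent <= ut.time_spent
        ORDER BY pt.time_spent DESC
        LIMIT 1) AS points
FROM user_time ut
ORDER BY ut.user_id, ut.question_id
"""
scored = conn.execute(QUERY).fetchall()


def points_for(seconds):
    """Hypothetical helper: look up the points for a single time value."""
    row = conn.execute(
        "SELECT points FROM points_table WHERE time_spent <= ? "
        "ORDER BY time_spent DESC LIMIT 1",
        (seconds,),
    ).fetchone()
    return row[0] if row else None


print(scored)          # [(1, 1, 100), (1, 2, 1000), (2, 1, 300)]
print(points_for(10))  # 300, because the lower bound of each range is inclusive
```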

Try this:
SELECT ut.user_id, ut.question_id, A.points
FROM user_time ut
INNER JOIN (SELECT p1.time_spent AS time_spent1,
MIN(p2.time_spent) AS time_spent2, -- start of the next range; NULL for the last range
p1.points
FROM points_table p1
LEFT JOIN points_table p2 ON p1.time_spent < p2.time_spent
GROUP BY p1.time_spent, p1.points
) AS A ON ut.time_spent >= A.time_spent1
AND (A.time_spent2 IS NULL OR ut.time_spent < A.time_spent2)

For another take on this, you could achieve the same result without having the points table:
SELECT *,
CASE
WHEN time_spent >= 30 THEN 1000
WHEN time_spent >= 20 THEN 500
WHEN time_spent >= 10 THEN 300
ELSE 100
END 'Points'
FROM user_time;

Missing records from one table in SQL Server 2008R2

Table 1:
Date PlacementID CampaignID Impressions
04/01/2014 100 10 1000
04/01/2014 101 10 1500
04/01/2014 100 11 500
Table 2:
Date PlacementID CampaignID Cost
04/01/2014 100 10 5000
04/01/2014 101 10 6000
04/01/2014 100 11 7000
04/01/2014 103 10 8000
When I join these tables using FULL JOIN and LEFT JOIN, I cannot get the unmatched record, which is the last row in Table 2 (PlacementID 103, CampaignID 10, Cost 8000). I have checked the raw data and files, and this record really exists in only one of the two sources. I still want to include it in the final table. How can I do that? The two tables come from two different sources, and so far my results contain only the records common to both.
Moreover, the missing value is exactly what is needed to make the final figures correct, so I want to include everything. My SQL script is below:
SELECT A.placementid,
A.campaignid,
A.date,
Sum(A.impressions) AS Impressions,
Sum(CASE
WHEN C.placement_count > 1 THEN ( B.cost / C.placement_count )
ELSE B.cost
END) AS Cost
FROM table1 A
FULL JOIN table2 B
ON A.placementid = B.placementid
AND A.campaignid = B.campaignid
AND A.date = B.date
LEFT JOIN (SELECT Count(placementid) AS Placement_Count,
placementid, campaignid,
date
FROM table1
GROUP BY placementid,
campaignid,
date) c
ON A.placementid = C.placementid
AND A.campaignid = C.campaignid
AND A.date = C.date
GROUP BY A.placementid,
A.campaignid,
A.date
I am dividing Cost by the placement count because in the source the cost was allocated only once per placement, whereas in the actual table the same PlacementID can repeat more than once on the same date.

As you didn't provide any expected output, I'm guessing here, but if the result you want is this:
PlacementID CampaignID Date Impressions Cost
----------- ----------- ----------------------- ----------- -----------
100 10 2014-04-01 02:00:00.000 1000 5000
100 11 2014-04-01 02:00:00.000 500 7000
101 10 2014-04-01 02:00:00.000 1500 6000
103 10 2014-04-01 02:00:00.000 NULL 8000
Then the following query should do it:
SELECT COALESCE(A.PlacementID,b.placementid) AS PlacementID,
COALESCE(A.campaignid, b.campaignid) AS CampaignID,
COALESCE(A.date, b.date) AS [Date],
SUM(A.impressions) AS Impressions,
SUM(CASE
WHEN C.placement_count > 1 THEN ( B.cost / C.placement_count )
ELSE B.cost
END ) AS Cost
FROM table1 A
FULL JOIN table2 B
ON A.[PlacementID] = B.placementid
AND A.campaignid = B.campaignid
AND A.date = B.date
LEFT JOIN (SELECT COUNT(PlacementID) AS Placement_Count,
placementid, campaignid,
date
FROM table1
GROUP BY placementid,
campaignid,
date) c
ON A.[PlacementID] = C.placementid
AND A.campaignid = C.campaignid
AND A.date = C.date
GROUP BY COALESCE(A.PlacementID, B.PlacementID),
COALESCE(A.campaignid, b.campaignid),
COALESCE(A.date, b.date)
Sample SQL Fiddle

MySql JOIN on most recent start_date?

I have two tables: one with transactions (with a date), the other with a percentage and the date that percentage went into effect (assume 00:00:00). The percentage remains in effect until a new percentage goes into effect. I need to join on the percentage that was in effect when the transaction happened.
transactions_table
event_date amount
2011-01-01 230
2011-02-18 194
2011-03-22 56
2011-04-30 874
percent_table
effective percent
2010-12-30 15
2011-03-05 25
2011-04-12 30
The result I'm looking for is:
event_date amount percent
2011-01-01 230 15
2011-02-18 194 15
2011-03-22 56 25
2011-04-30 874 30
I've tried:
SELECT t.event_date, t.amount, p.percent
FROM transactions_table AS t
LEFT JOIN percent_table AS p ON t.event_date >= p.effective
ORDER BY `t`.`event_date` DESC LIMIT 0 , 30;
That gives me seemingly random percentages. It seems to me that I need the row with the greatest p.effective that is still <= t.event_date, not just any row with p.effective <= t.event_date.
I tried:
SELECT t.event_date, p.percent
FROM bedic_sixsummits_transactions AS t
LEFT JOIN bedic_sixsummits_percent AS p ON MAX(t.event_date >= p.effective)
ORDER BY `t`.`event_date` DESC LIMIT 0 , 30
but MySQL just laughed at my feeble attempt.
How can I do this?

SELECT t.event_date, t.amount, p.percent
FROM bedic_sixsummits_transactions AS t
LEFT JOIN bedic_sixsummits_percent AS p
ON p.effective =
( SELECT MAX( p2.effective ) FROM bedic_sixsummits_percent AS p2
WHERE p2.effective <= t.event_date
)
ORDER BY t.event_date DESC LIMIT 0 , 30
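A minimal sketch of this join on the question's sample data, run through Python's sqlite3 module. SQLite in place of MySQL is an assumption for local testing, the table names are the ones from the question's sample (transactions_table, percent_table) rather than the bedic_sixsummits_ names, and dates are stored as ISO strings, which compare correctly as text.

```python
import sqlite3

# Assumption: in-memory SQLite instead of MySQL; ISO date strings sort chronologically.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions_table (event_date TEXT, amount INTEGER);
CREATE TABLE percent_table (effective TEXT, percent INTEGER);
INSERT INTO transactions_table VALUES
  ('2011-01-01', 230), ('2011-02-18', 194), ('2011-03-22', 56), ('2011-04-30', 874);
INSERT INTO percent_table VALUES
  ('2010-12-30', 15), ('2011-03-05', 25), ('2011-04-12', 30);
""")

QUERY = """
SELECT t.event_date, t.amount, p.percent
FROM transactions_table AS t
LEFT JOIN percent_table AS p
  -- match only the most recent rate that was already in effect
  ON p.effective = (SELECT MAX(p2.effective)
                    FROM percent_table AS p2
                    WHERE p2.effective <= t.event_date)
ORDER BY t.event_date
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

A transaction dated before the first effective date would have no matching rate, and the LEFT JOIN would return it with a NULL percent rather than dropping it.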

Even simpler, with no subquery:
SELECT event_date, amount, MAX(_percent) as _percent
FROM transactions_table
LEFT JOIN percent_table p1 ON event_date >= effective
GROUP BY event_date, amount
ORDER BY event_date;
http://sqlfiddle.com/#!3/e8ca3/17/0
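Note that this works only because of the business model involved: it assumes the percentage never decreases over time, so the largest percentage already in effect is also the most recent one. If you want to retrieve other fields of percent_table, this approach is no longer appropriate.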
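The MAX-based shortcut depends on the data: it returns the right rate only while the percentage never goes down. A small sketch with hypothetical rates (not the question's data) where the percentage drops shows the two approaches diverging. It uses Python's sqlite3 module, with SQLite standing in for MySQL as an assumption.

```python
import sqlite3

# Assumption: in-memory SQLite instead of MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions_table (event_date TEXT, amount INTEGER);
CREATE TABLE percent_table (effective TEXT, percent INTEGER);
INSERT INTO transactions_table VALUES ('2011-03-22', 56);
-- Hypothetical rates: the percentage DROPS from 25 to 15 on 2011-03-05.
INSERT INTO percent_table VALUES ('2010-12-30', 25), ('2011-03-05', 15);
""")

# Shortcut: largest percentage among all rates already in effect.
max_based = conn.execute("""
SELECT MAX(p.percent)
FROM transactions_table t
LEFT JOIN percent_table p ON t.event_date >= p.effective
GROUP BY t.event_date, t.amount
""").fetchone()[0]

# Correlated subquery: percentage of the most recent rate in effect.
latest_based = conn.execute("""
SELECT p.percent
FROM transactions_table t
LEFT JOIN percent_table p
  ON p.effective = (SELECT MAX(p2.effective)
                    FROM percent_table p2
                    WHERE p2.effective <= t.event_date)
""").fetchone()[0]

print(max_based, latest_based)  # 25 15
```

Here the rate in effect on 2011-03-22 is 15, so only the correlated-subquery version is correct for this data.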
