MySQL query for multi-column distinct plus an ancillary column condition - mysql
Imagine a flat table that tracks game matches in which each game has three participants: an attacker, a defender and a bettor who is wagering on the outcome of the battle between players 1 and 2. The table includes the names of the players and the bettor of each game, as well as the date of the game, the scores of each player, the game venue and the name of the referee. I have included the CREATE sql for some sample data below.
DROP TABLE IF EXISTS `game`;
CREATE TABLE `game` (
`game_date` text,
`player_1` text,
`player_2` text,
`bettor` text,
`p1_score` double DEFAULT NULL,
`p2_score` double DEFAULT NULL,
`result` double DEFAULT NULL,
`venue` text,
`referee` text
);
INSERT INTO `game` VALUES ('2020-04-05','Bob','Kelly','Kevin',100,78,0.2,'TS1','Richard'),('2020-03-06','Jim','Bob','Dave',100,97,1.2,'TS2','Mike'),('2020-02-05','Jim','Bob','Kevin',100,86,0.9,'TS2','Mike'),('2020-01-06','Kelly','Bob','Jim',100,92,1.3,'TS2','Richard'),('2019-12-07','Kelly','Bob','Jim',100,98,1.7,'TS1','Mike'),('2019-11-07','Kelly','Bob','Kevin',78,100,2.1,'TS2','Mike'),('2019-10-08','Kelly','Bob','Kevin',97,100,1.5,'TS1','Mike'),('2019-09-08','Kelly','Jim','Dave',86,100,2.4,'TS1','Richard'),('2019-08-09','Kelly','Jim','Dave',92,100,2.8,'TS2','Mike'),('2019-07-10','Kelly','Jim','Dave',98,100,2.2,'TS2','Mike'),('2019-06-10','Kelly','Jim','Dave',100,78,1.9,'TS2','Richard'),('2019-05-11','Sarah','Jim','Kevin',100,97,2.1,'TS1','Mike'),('2019-04-11','Sarah','Jim','Kevin',100,86,2.1,'TS2','Mike'),('2019-03-12','Sarah','Jim','Kevin',100,92,2.8,'TS1','Mike'),('2019-02-10','Sarah','Jim','Kevin',100,98,1.8,'TS1','Richard');
I need a query that returns match info for each unique assembly of match participants... but only for the first match that the three participants ever played in all together, i.e., for the earliest game_date among the matches that all three participated in.
For example, a game where Bob was player 1, Kelly was player two and Kevin was the bettor would constitute a unique threesome. In the data, there is only one such pairing for this threesome so the query would return a row for that one match.
In the case of Sarah as player 1, Jim as player 2 and Kevin as bettor, there are four matches with that threesome, so the query would return only the info for the earliest match, i.e., the one dated 2/10/2019.
Note that in the sample data there are two matches with the threesome 'Kelly','Bob','Jim'. There are also two other matches with the threesome 'Kelly','Jim','Bob'. These are not the same because Bob and Jim swap places as player 2 and bettor. So the query would return one row for each of them, i.e., the matches dated '12/07/2019' and '08/09/2019', respectively.
Using DISTINCT, I can return a list of all of the unique player groupings.
SELECT DISTINCT player_1, player_2, bettor FROM game;
Using GROUP BY, I can return all of the game info for all of the matches the group played in.
SELECT * FROM game GROUP BY player_1, player_2, bettor;
But I can't figure out how to return all of the game info but only for the earliest game where all three participants played together and in distinct roles in the games.
I have tried sub-queries using MIN() for game_date but that's a loser. I suspect there is perhaps an INNER JOIN solution but I haven't found it yet.
I am grateful for any guidance you can provide.
One canonical approach uses a join to a subquery which finds the earliest game date for each trio:
SELECT g1.*
FROM game g1
INNER JOIN
(
SELECT player_1, player_2, bettor,
MIN(game_date) AS min_game_date
FROM game
GROUP BY player_1, player_2, bettor
) g2
ON g2.player_1 = g1.player_1 AND
g2.player_2 = g1.player_2 AND
g2.bettor = g1.bettor AND
g2.min_game_date = g1.game_date;
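To try the join-to-subquery pattern without a MySQL server, here is a minimal sketch that runs it through Python's built-in sqlite3 module. It assumes the question's column names (player_1, player_2, bettor) and a table called game, and it loads only a handful of the sample rows. SQLite stands in for MySQL here, so treat it as an illustration of the technique rather than a MySQL test.

```python
import sqlite3

# A subset of the question's sample rows: (game_date, player_1, player_2, bettor).
ROWS = [
    ("2020-04-05", "Bob", "Kelly", "Kevin"),
    ("2020-01-06", "Kelly", "Bob", "Jim"),
    ("2019-12-07", "Kelly", "Bob", "Jim"),
    ("2019-05-11", "Sarah", "Jim", "Kevin"),
    ("2019-02-10", "Sarah", "Jim", "Kevin"),
]

# Join each game to the earliest date of its (player_1, player_2, bettor) trio.
QUERY = """
SELECT g1.game_date, g1.player_1, g1.player_2, g1.bettor
FROM game g1
INNER JOIN (
    SELECT player_1, player_2, bettor, MIN(game_date) AS min_game_date
    FROM game
    GROUP BY player_1, player_2, bettor
) g2
    ON  g2.player_1 = g1.player_1
    AND g2.player_2 = g1.player_2
    AND g2.bettor = g1.bettor
    AND g2.min_game_date = g1.game_date
ORDER BY g1.game_date
"""


def earliest_games(rows):
    """Load rows into an in-memory table and return the first game per trio."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE game (game_date TEXT, player_1 TEXT, player_2 TEXT, bettor TEXT)"
    )
    conn.executemany("INSERT INTO game VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


result = earliest_games(ROWS)
for row in result:
    print(row)
```

MIN() works on the text dates only because they are stored in ISO yyyy-mm-dd form, which sorts chronologically. If two games of the same trio share the earliest date, this pattern returns both of them.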
If you are running MySQL 8+, then the ROW_NUMBER analytic function provides another option:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY player_1, player_2,
bettor
ORDER BY game_date) rn
FROM game
)
SELECT *
FROM cte
WHERE rn = 1;
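The ROW_NUMBER variant can be sketched the same way. The example below again uses Python's sqlite3 module as a stand-in for MySQL 8 (it needs an SQLite build with window functions, 3.25 or later) and assumes the question's column names. One row is a hypothetical tie on the earliest date, added to show that ROW_NUMBER keeps exactly one row per trio; the extra venue term in ORDER BY is my own addition to make the tie-break deterministic.

```python
import sqlite3

# (game_date, player_1, player_2, bettor, venue)
ROWS = [
    ("2019-05-11", "Sarah", "Jim", "Kevin", "TS1"),
    ("2019-02-10", "Sarah", "Jim", "Kevin", "TS1"),
    ("2019-02-10", "Sarah", "Jim", "Kevin", "TS2"),  # hypothetical tie, not in the sample data
    ("2020-01-06", "Kelly", "Bob", "Jim", "TS2"),
    ("2019-12-07", "Kelly", "Bob", "Jim", "TS1"),
]

# Number the games of each trio by date (venue breaks ties) and keep number 1.
QUERY = """
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY player_1, player_2, bettor
        ORDER BY game_date, venue
    ) AS rn
    FROM game
)
SELECT game_date, player_1, player_2, bettor, venue
FROM cte
WHERE rn = 1
ORDER BY game_date
"""


def first_game_per_trio(rows):
    """Return exactly one earliest game for each (player_1, player_2, bettor)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE game (game_date TEXT, player_1 TEXT, player_2 TEXT, "
        "bettor TEXT, venue TEXT)"
    )
    conn.executemany("INSERT INTO game VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


first_games = first_game_per_trio(ROWS)
for row in first_games:
    print(row)
```

Unlike the join to a MIN() subquery, which would return both tied rows, this version returns a single row per trio.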
Related
In MySQL, How to Select a Row From A Table Exactly Once to Populate Another Table?
I have a table of seven recipes, each of which needs to be assigned to a student. Each student can be assigned a maximum of one recipe, and there are more total students than total recipes, so some students will not receive any assignment.
In my table of assignments, I need to populate which recipe is assigned to which student. (In my business requirements, assignments must be a freestanding table; I cannot add a column to the recipes table.)
Below is the script I am using (including for creating sample data). I had hoped that by using the NOT EXISTS clause I could prevent a student from being assigned more than one recipe, but this is not working because the same student is being assigned to every recipe.
Any guidance on how to fix my script would be greatly appreciated. Thank you!
/* CREATE TABLE HAVING SEVEN RECIPES */
CREATE TABLE TempRecipes( Recipe VARCHAR(16) );
INSERT INTO TempRecipes VALUES
('Cake'), ('Pie'), ('Cookies'), ('Ice Cream'), ('Brownies'), ('Jello'), ('Popsicles');
/* CREATE TABLE HAVING TEN STUDENTS, i.e. MORE STUDENTS THAN AVAILABLE RECIPES */
CREATE TABLE TempStudents( Student VARCHAR(16) );
INSERT INTO TempStudents VALUES
('Ann'), ('Bob'), ('Charlie'), ('Daphne'), ('Earl'), ('Francine'), ('George'), ('Heather'), ('Ivan'), ('Janet');
/* CREATE TABLE TO STORE THE ASSIGNMENTS */
CREATE TABLE TempAssignments( Recipe VARCHAR(16), Student VARCHAR(16) );
INSERT INTO TempAssignments( Recipe, Student )
SELECT TempRecipes.Recipe,
(
SELECT S1.Student
FROM TempStudents S1
WHERE NOT EXISTS (SELECT TempAssignments.Student FROM TempAssignments WHERE TempAssignments.Student = S1.Student)
LIMIT 1
) Student
FROM TempRecipes;
One approach you can consider is to write two separate queries, turn each into a derived table, and give each row a unique identifier that you can match against the other table. I think that the unique identifier can be a row number. This suggestion is for MySQL v8+, which supports the ROW_NUMBER() function (and, if I'm not mistaken, MariaDB v10.2+).
You've already established these conditions:
1. Each student can be assigned a maximum of one recipe.
2. If there are more students than recipes, some students will not receive any recipe assignment.
Let's assume that there's an additional condition:
3. The recipe assigned will be random.
So, both tables will have basically the same query structure:
SELECT Student, ROW_NUMBER() OVER (ORDER BY RAND()) AS Rn1 FROM TempStudents;
SELECT Recipe, ROW_NUMBER() OVER (ORDER BY RAND()) AS Rn2 FROM TempRecipes;
In these queries, additional condition no. 3 ("random assignment") is implemented in the ROW_NUMBER() function. If you run them as is, you'll almost certainly get a different row number assignment every time. If you don't want that (say you prefer to order by student or recipe name descending), just replace ORDER BY RAND() with, for example, ORDER BY Student DESC.
Next we'll turn both queries into derived tables and join them by matching the row numbers:
SELECT *
FROM
(SELECT Student, ROW_NUMBER() OVER (ORDER BY RAND()) AS Rn1 FROM TempStudents) a
LEFT JOIN
(SELECT Recipe, ROW_NUMBER() OVER (ORDER BY RAND()) AS Rn2 FROM TempRecipes) b
ON a.Rn1=b.Rn2;
The reason I'm doing a LEFT JOIN here is to show that there will be some students without a recipe assignment.
Here's the result:
Student   Rn1  Recipe     Rn2
Ann       1    Cookies    1
Bob       2    Jello      2
Charlie   3    Pie        3
Daphne    4    Brownies   4
Earl      5    Popsicles  5
Francine  6    Cake       6
George    7    Ice Cream  7
Heather   8    NULL       NULL
Ivan      9    NULL       NULL
Janet     10   NULL       NULL
If you use an INNER JOIN instead, you won't see the last 3 rows of the result above, since there's no matching row number for them in the recipe table.
Our last step is just adding the insert command to the query, like so:
INSERT INTO TempAssignments SELECT Recipe, Student FROM ....
Do note that this example uses random ordering, so the result in the TempAssignments table after the insert might not be the same as the one you get while testing. Here's a fiddle for reference
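The row-number pairing above can be checked end to end with a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL 8 (window functions need SQLite 3.25 or later) and, as an assumption for reproducibility, orders by name instead of RAND(), so the pairing is deterministic rather than random. An INNER JOIN is used so that only students who actually receive a recipe are inserted.

```python
import sqlite3

RECIPES = ["Cake", "Pie", "Cookies", "Ice Cream", "Brownies", "Jello", "Popsicles"]
STUDENTS = ["Ann", "Bob", "Charlie", "Daphne", "Earl",
            "Francine", "George", "Heather", "Ivan", "Janet"]


def assign(recipes, students):
    """Pair the n-th recipe with the n-th student and store the pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TempRecipes (Recipe VARCHAR(16))")
    conn.execute("CREATE TABLE TempStudents (Student VARCHAR(16))")
    conn.execute("CREATE TABLE TempAssignments (Recipe VARCHAR(16), Student VARCHAR(16))")
    conn.executemany("INSERT INTO TempRecipes VALUES (?)", [(r,) for r in recipes])
    conn.executemany("INSERT INTO TempStudents VALUES (?)", [(s,) for s in students])
    # Number both tables independently, then match rows with equal numbers.
    conn.execute("""
        INSERT INTO TempAssignments (Recipe, Student)
        SELECT b.Recipe, a.Student
        FROM (SELECT Student, ROW_NUMBER() OVER (ORDER BY Student) AS rn
              FROM TempStudents) a
        INNER JOIN
             (SELECT Recipe, ROW_NUMBER() OVER (ORDER BY Recipe) AS rn
              FROM TempRecipes) b
          ON a.rn = b.rn
    """)
    rows = conn.execute(
        "SELECT Recipe, Student FROM TempAssignments ORDER BY Student"
    ).fetchall()
    conn.close()
    return rows


assignments = assign(RECIPES, STUDENTS)
for recipe, student in assignments:
    print(student, "->", recipe)
```

Because each row number occurs once per derived table, no student and no recipe can appear twice, which is the property the NOT EXISTS attempt was meant to enforce.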
JOINing tables while ignoring duplicates
So, let's say I have a hash/relational table that connects users, teams a user can join, and challenges in which teams participate (teams_users_challenges), as well as a table that stores entered data for all users in a given challenge (entry_data). I want to get the average scores for each user in the challenge (the average value per day in a given week). However, there is a chance that a user will somehow join more than one team erroneously (which shouldn't happen, but does on occasion). Here is the SQL query that gets a particular user's score:
SELECT tuc.user_id, SUM(ed.data_value) / 7 as value
FROM teams_users_challenges tuc
LEFT JOIN entry_data ed ON (
tuc.user_id = ed.user_id
AND ed.entry_date BETWEEN '2013-09-16' AND '2013-09-22'
)
WHERE tuc.challenge_id = ___
AND tuc.user_id = ___
If a user has mistakenly joined more than one team, (s)he would have more than one entry in teams_users_challenges, which would essentially duplicate the data retrieved. So if a user is on 3 different teams for the same challenge, (s)he would have 3 entries in teams_users_challenges, which would multiply their average value by 3, thanks to the LEFT JOIN that automatically takes in all records, and not just one.
I've tried using GROUP BY, but that doesn't seem to restrict the data to only one instance within teams_users_challenges. Does anybody have any ideas as to how I could restrict the query to only take in one record within teams_users_challenges?
ADDENDUM: The columns within teams_users_challenges are team_id, user_id, and challenge_id.
If this is a new, empty table, you can express your 'business rule' that a user should only join one team per challenge as a unique constraint in SQL:
alter table teams_users_challenges
add constraint oneUserPerTeamPerChallenge unique (
user_id
, challenge_id
);
If you can't change the table, you'll need to group by user and challenge and pick a single team from each group in the query result. Maybe pick just the latest team.
I can't test it, but if you can't clean up the data as Yawar suggested, try:
SELECT SINGLE_TEAM.user_id, SUM(ed.data_value) / 7 as value
FROM entry_data ed
LEFT JOIN (
select tuc.user_id, tuc.challenge_id
from teams_users_challenges tuc
group by tuc.user_id, tuc.challenge_id
) AS SINGLE_TEAM
ON SINGLE_TEAM.user_id = ed.user_id
AND ed.entry_date BETWEEN '2013-09-16' AND '2013-09-22'
WHERE SINGLE_TEAM.challenge_id = ___
AND SINGLE_TEAM.user_id = ___
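The effect of collapsing the membership table before joining can be seen in a small sketch. It uses Python's sqlite3 module with made-up data (user 1 mistakenly on three teams in challenge 5, plus a few entries); all values are hypothetical, and the deduplicating query is one possible form of the idea, not necessarily the exact query from the answers.

```python
import sqlite3


def weekly_averages(challenge_id=5, user_id=1):
    """Return (naive, deduplicated) weekly averages for one user."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE teams_users_challenges (team_id INT, user_id INT, challenge_id INT);
        CREATE TABLE entry_data (user_id INT, entry_date TEXT, data_value REAL);
        -- user 1 is erroneously on three teams for challenge 5
        INSERT INTO teams_users_challenges VALUES (10, 1, 5), (11, 1, 5), (12, 1, 5);
        INSERT INTO entry_data VALUES
            (1, '2013-09-16', 7), (1, '2013-09-18', 14), (1, '2013-09-20', 21),
            (1, '2013-09-30', 100);  -- outside the week, must be ignored
    """)
    # Naive join: every membership row repeats the user's entries.
    naive = conn.execute("""
        SELECT SUM(ed.data_value) / 7
        FROM teams_users_challenges tuc
        LEFT JOIN entry_data ed
          ON tuc.user_id = ed.user_id
         AND ed.entry_date BETWEEN '2013-09-16' AND '2013-09-22'
        WHERE tuc.challenge_id = ? AND tuc.user_id = ?
    """, (challenge_id, user_id)).fetchone()[0]
    # Collapse memberships to one row per (user, challenge) before joining.
    deduped = conn.execute("""
        SELECT SUM(ed.data_value) / 7
        FROM (SELECT DISTINCT user_id, challenge_id
              FROM teams_users_challenges) st
        LEFT JOIN entry_data ed
          ON st.user_id = ed.user_id
         AND ed.entry_date BETWEEN '2013-09-16' AND '2013-09-22'
        WHERE st.challenge_id = ? AND st.user_id = ?
    """, (challenge_id, user_id)).fetchone()[0]
    conn.close()
    return naive, deduped


naive, deduped = weekly_averages()
print("naive:", naive, "deduplicated:", deduped)
```

With three membership rows the naive figure comes out three times too large, while the deduplicated query reports the intended weekly average.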
Correct use of the HAVING clause, to return the unique row
I have 3 tables in this scenario: Teams, Players and PlayersInTeams.
A Player is just a registered user. No W/L data is associated with a Player.
A Team maintains win/loss records. If a player is playing by himself, then he plays with his "solo" team (a Team with only that Player in it). Every time Bob and Jan win together, their Team entry gets wins++. Every time Jason and Tommy lose together, their Team entry gets losses++.
The PlayersInTeams table only has 2 columns, and it's an intersection table between Players and Teams:
> desc PlayersInTeams ;
+------------+---------+
| Field      | Type    |
+------------+---------+
| fkPlayerId | int(11) |
| fkTeamId   | int(11) |
+------------+---------+
So here is the tough part: because a Player can be part of multiple Teams, it is important to fetch the right TeamId from the Teams table at the beginning of a match.
A Player's SOLO team is given by
select fkTeamId from PlayersInTeams where fkPlayerId=1 HAVING count(fkTeamId)=1;
NO IT'S NOT!! But I don't understand why.
I'm trying to say: get the fkTeamId from PlayersInTeams where the fkPlayerId=1, but also, the count of rows that have this particular fkTeamId is exactly 1.
The query returns (empty set), and actually if I change the HAVING clause to being incorrect (HAVING count(fkTeamId)<>1;), it returns the row I want.
To fix your query, add a group by. To compute the count per team, you'll need to change the where clause to return all teams that player 1 is on:
select fkTeamId
from PlayersInTeams
where fkTeamId in
(
select fkTeamId
from PlayersInTeams
where fkPlayerId = 1
)
group by fkTeamId
having count(*) = 1;
Example at SQL Fiddle.
Below is a detailed explanation of why your count(fkTeamId) = 1 condition works in a surprising way. When a query contains an aggregate like count, but there is no group by clause, the database treats the entire result set as a single group. In databases other than MySQL, you could not select a column that is not in a group by without an aggregate. In MySQL (when ONLY_FULL_GROUP_BY is disabled), all those columns are returned with the first value encountered by the database (essentially a random value from the group). For example:
create table YourTable (player int, team int);
insert YourTable values (1,1), (1,2), (2,2);
select player
, team
, count(team)
from YourTable
where player = 1
-->
player  team  count(team)
1       1     2
The first two columns come from a random row with player = 1. The count(team) value is 2, because there are two rows with player = 1 and a non-null team. The count says nothing about the number of players in the team.
The most natural thing to do is to count the rows to see what is going on:
select fkTeamId, count(*)
from PlayersInTeams
where fkPlayerId=1
group by fkTeamId;
The group by clause is a more natural way to write the query:
select fkTeamId
from PlayersInTeams
where fkPlayerId=1
group by fkTeamId
having count(fkteamid) = 1
However, if there is only one row for a player, then your original version should work -- the filtering would take it to one row, the fkTeamId would be the team on the row and the having would be satisfied. One possibility is that you have duplicate rows in the data. If duplicates are a problem, you can do this:
select fkTeamId
from PlayersInTeams
where fkPlayerId=1
having count(distinct fkteamid) = 1
EDIT for "solo team":
As pointed out by Andomar, the definition of solo team is not quite what I expected. It is a player being the only player on the team. So, to get the list of teams where a given player is the team:
select fkTeamId
from PlayersInTeams
group by fkTeamId
having sum(fkPlayerId <> 1) = 0
That is, you cannot filter out the other players and expect to get this information. You specifically need them, to be sure they are not on the team.
If you wanted to get all solo teams:
select fkTeamId
from PlayersInTeams
group by fkTeamId
having count(*) = 1
HAVING is usually used with a GROUP BY statement - it's like a WHERE which gets applied to the grouped data.
SELECT fkTeamId
FROM PlayersInTeams
WHERE fkPlayerId = 1
GROUP BY fkTeamId
HAVING COUNT(fkPlayerId) = 1
SqlFiddle here: http://sqlfiddle.com/#!2/36530/21
Try finding teams with one player first, then using that to find the player id if they are in any of those teams:
select DISTINCT PlayersInTeams.fkTeamId
from
(
select fkTeamId
from PlayersInTeams
GROUP BY fkTeamId
HAVING count(fkPlayerId)=1
) AS Sub
INNER JOIN PlayersInTeams
ON PlayersInTeams.fkTeamId = Sub.fkTeamId
WHERE PlayersInTeams.fkPlayerId = 1;
Everybody's answers here were very helpful. Besides missing GROUP BY, I found that my problem was mainly that my WHERE clause was "too early".
Say we had
insert into PlayersInTeams values
(1, 1), -- player 1 is in solo team id=1
(1, 2),(2,2) -- player 1&2 are on duo team id=2
;
Now we try:
select fkTeamId from PlayersInTeams
where fkPlayerId=1 -- X kills off data we need
group by fkTeamId
HAVING count(fkTeamId)=1;
In particular, the WHERE was filtering the temporary result set so that any row that didn't have fkPlayerId=1 was being cut out of the result set. Since each fkPlayerId is associated with each fkTeamId only once, of course the COUNT of the number of occurrences by fkTeamId is always 1. So you always get back just a list of the teams player 1 is part of, but you still don't know his SOLO team.
Gordon's answer with my addendum
select fkPlayerId,fkTeamId
from PlayersInTeams
group by fkTeamId
having count(fkTeamId) = 1
and fkPlayerId=1 ;
was particularly good because it's a cheap query and does what we want. It chooses all the SOLO teams first (all fkTeamIds that only occur ONCE in the table, via HAVING count(fkTeamId) = 1), then from there we go and say, "ok, now just give me the one that has this fkPlayerId=1". And voila, the solo team.
DUO teams were harder, but similar. The difficulty was similar to the problem above, only with ambiguity between teams of 3 and teams of 2. Details in an SQL Fiddle.
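The "WHERE too early" effect is easy to reproduce. The sketch below uses Python's sqlite3 module as a stand-in for MySQL and the three sample rows from the post; it compares the early-filter query with the subquery form that keeps the other players' rows while counting. The function and variable names are my own.

```python
import sqlite3


def solo_team_queries(player_id=1):
    """Return (teams found by the early-filter query, actual solo teams)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PlayersInTeams (fkPlayerId INT, fkTeamId INT)")
    conn.executemany(
        "INSERT INTO PlayersInTeams VALUES (?, ?)",
        [(1, 1),            # player 1 is in solo team 1
         (1, 2), (2, 2)],   # players 1 and 2 are on duo team 2
    )
    # WHERE removes the other players' rows first, so every count is 1.
    early_filter = [row[0] for row in conn.execute("""
        SELECT fkTeamId FROM PlayersInTeams
        WHERE fkPlayerId = ?
        GROUP BY fkTeamId
        HAVING COUNT(fkTeamId) = 1
        ORDER BY fkTeamId
    """, (player_id,))]
    # Restrict to the player's teams, but count all members of each team.
    solo = [row[0] for row in conn.execute("""
        SELECT fkTeamId FROM PlayersInTeams
        WHERE fkTeamId IN (SELECT fkTeamId FROM PlayersInTeams WHERE fkPlayerId = ?)
        GROUP BY fkTeamId
        HAVING COUNT(*) = 1
        ORDER BY fkTeamId
    """, (player_id,))]
    conn.close()
    return early_filter, solo


early_filter, solo = solo_team_queries()
print("early filter:", early_filter)
print("solo teams:", solo)
```

The first query reports both of player 1's teams because the duo partner's row was filtered out before counting; the second reports only team 1.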
How to combine 2 columns into one and use that merged column to hold multiple values
I have a table with columns id, playername, sport1, sport2 However, my requirements say that a player can have n number of sports, so a multi-value column 'sports' is needed or else I would be going with sport3, sport4, and so on. How do I alter the table to combine sport1, sport2 and use that as a multi-value column for new rows going forward?
The classic solution is to have a second table with columns for the playerId and the sport. Then, to see all the sports a player has, join the two.
sqlite does not support multivalued fields. The only DBMS I am aware of that can do this is Microsoft Access, but I wouldn't, in general, recommend switching from sqlite to Access, even if you are guaranteed to be running on Windows computers only.
This is a classic problem of database design. I recommend you switch from a single table to multiple tables so that each entity type is accurately portrayed in the database. For example, the name of a sport is not an attribute of a player, it is an attribute of a sport, so you probably need a Sport table. You clearly need a Player table. And since players can play multiple sports, and each sport might be played by multiple players, you have a many-to-many relationship. This can be modelled in an RDBMS using multiple tables with foreign keys. Consider the following database structure (think of this as pseudo-code, though it may work as is).
CREATE TABLE Player (
PlayerId int NOT NULL PRIMARY KEY,
Name string NOT NULL UNIQUE
)
CREATE TABLE Sport (
SportId int NOT NULL PRIMARY KEY,
Name string NOT NULL UNIQUE
)
CREATE TABLE PlayerSport (
PlayerId int NOT NULL FOREIGN KEY REFERENCES Player (PlayerId),
SportId int NOT NULL FOREIGN KEY REFERENCES Sport (SportId),
PRIMARY KEY ( PlayerId, SportId )
)
This structure comes about via a process of database normalization.
You can now get a list of all sports for a player,
SELECT Sport.Name
FROM Sport
INNER JOIN PlayerSport
ON PlayerSport.SportId = Sport.SportId
AND PlayerSport.PlayerId = #PlayerId
ORDER BY Sport.Name
Or all players for a sport,
SELECT Player.Name
FROM Player
INNER JOIN PlayerSport
ON PlayerSport.PlayerId = Player.PlayerId
AND PlayerSport.SportId = #SportId
ORDER BY Player.Name
Or all players and sports,
SELECT Player.Name AS Player, Sport.Name AS Sport
FROM Player
INNER JOIN PlayerSport ON PlayerSport.PlayerId = Player.PlayerId
INNER JOIN Sport ON Sport.SportId = PlayerSport.SportId
ORDER BY Player.Name, Sport.Name
But note that the structure of this final query is not the same as your original table. Outputting a row per player and a column per sport is not possible in static SQL. You could construct a dynamic query, but this is not easy, is error prone, and is vulnerable to SQL injection.
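For a concrete version of the three-table design, here is a minimal sketch using Python's sqlite3 module. The table and column names follow the pseudo-code above, while the column types are adapted to SQLite (INTEGER and TEXT) and the players and sports are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE Player (
        PlayerId INTEGER NOT NULL PRIMARY KEY,
        Name     TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Sport (
        SportId INTEGER NOT NULL PRIMARY KEY,
        Name    TEXT NOT NULL UNIQUE
    );
    -- Junction table: one row per (player, sport) pair.
    CREATE TABLE PlayerSport (
        PlayerId INTEGER NOT NULL REFERENCES Player (PlayerId),
        SportId  INTEGER NOT NULL REFERENCES Sport (SportId),
        PRIMARY KEY (PlayerId, SportId)
    );
    INSERT INTO Player VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Sport VALUES (1, 'Chess'), (2, 'Tennis');
    INSERT INTO PlayerSport VALUES (1, 1), (1, 2), (2, 2);
""")


def sports_for_player(player_id):
    """All sports played by one player, alphabetically."""
    rows = conn.execute("""
        SELECT Sport.Name
        FROM Sport
        INNER JOIN PlayerSport ON PlayerSport.SportId = Sport.SportId
        WHERE PlayerSport.PlayerId = ?
        ORDER BY Sport.Name
    """, (player_id,))
    return [name for (name,) in rows]


def players_for_sport(sport_id):
    """All players of one sport, alphabetically."""
    rows = conn.execute("""
        SELECT Player.Name
        FROM Player
        INNER JOIN PlayerSport ON PlayerSport.PlayerId = Player.PlayerId
        WHERE PlayerSport.SportId = ?
        ORDER BY Player.Name
    """, (sport_id,))
    return [name for (name,) in rows]


print(sports_for_player(1))
print(players_for_sport(2))
```

Giving a player another sport is then a single INSERT into PlayerSport, with no schema change and no sport3 or sport4 columns.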
Merging multiple rows as separate columns using keys from second table
I have two tables.
Users:
int player_id
varchar player_name
Games:
int game_id
int player_id1
int player_id2
I want to make a query that takes in a player id as a parameter, and returns info on each game, along with the players' names. So far, what I have is the following:
SELECT
game_id,
player_id1,
player_id2
from GAMES, GAME_STATES
where player_id1=#playerid or player_id2=#playerid
The part I'm stuck at is a simple way to have it return the names of players along with the player ids. The returned rows would have 5 columns: one for the game id, two for the player ids, and two for their names.
One solution I thought of is:
SELECT
game_id,
player_id1,
(select player_name from USERS where player_id=player_id1) as player_name1,
player_id2,
(select player_name from USERS where player_id=player_id2) as player_name2
from GAMES, GAME_STATES
where player_id1=#playerid or player_id2=#playerid
However, this seems like a lot of extra work on the database since there would be 2 more queries per row returned. If I have to do that, I'm wondering if making requests for names as a second query on the client side is a better option? Then the client could create a list of unique ids, and do one query for all of them. I'm not too worried about latency since the client and server are in the same data center.
Thank you for your help.
SELECT game_id, u1.player_name, u2.player_name
FROM games AS game
INNER JOIN users AS u1
ON u1.player_id = game.player_id1
INNER JOIN users AS u2
ON u2.player_id = game.player_id2
WHERE player_id1 = #playerid OR player_id2 = #playerid
Should do the trick.
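Joining the users table twice, once per player column, can be sketched as follows. The example uses Python's sqlite3 module with the table layout from the question and made-up rows, and it returns the five columns the question asks for; the aliases player_name1 and player_name2 are taken from the asker's own draft.

```python
import sqlite3


def games_for_player(player_id):
    """Return (game_id, id1, name1, id2, name2) for every game the player is in."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (player_id INT PRIMARY KEY, player_name VARCHAR(32));
        CREATE TABLE games (game_id INT PRIMARY KEY, player_id1 INT, player_id2 INT);
        INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO games VALUES (100, 1, 2), (101, 3, 1), (102, 2, 3);
    """)
    # users is joined twice under different aliases, one per player column.
    rows = conn.execute("""
        SELECT g.game_id,
               g.player_id1, u1.player_name AS player_name1,
               g.player_id2, u2.player_name AS player_name2
        FROM games AS g
        INNER JOIN users AS u1 ON u1.player_id = g.player_id1
        INNER JOIN users AS u2 ON u2.player_id = g.player_id2
        WHERE g.player_id1 = ? OR g.player_id2 = ?
        ORDER BY g.game_id
    """, (player_id, player_id)).fetchall()
    conn.close()
    return rows


games = games_for_player(1)
for row in games:
    print(row)
```

This resolves both names in one statement, so neither per-row subqueries nor a second client-side lookup is needed.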