I have the following table of matches of a 4 player game called games.
+---------+---------+---------+---------+---------+
| game_id | player1 | player2 | player3 | player4 |
+---------+---------+---------+---------+---------+
| 1001 | john | dave | NULL | NULL |
| 1002 | dave | john | mike | tim |
| 1003 | mike | john | dave | NULL |
| 1004 | tim | dave | NULL | NULL |
+---------+---------+---------+---------+---------+
There are two questions I want to be able to answer:
1) Who played in the most games? (Dave)
2) What pair of players played the most games together? (John & Dave)
For #1 I tried to adapt the answer I found here: mySQL query to find the most repeated value but it only seems to be able to answer the question for a single column. Meaning I could learn who was player1 the most, but not who played in the most games as any player:
SELECT player1, COUNT(*) AS games_played FROM games
GROUP BY player1
ORDER BY games_played DESC;
Is there a way to join these columns together or would I have to handle this in application code?
Not sure where to start for #2. I'm wondering if my table structure should instead consolidate players to a single column:
+----+---------+--------+
| id | game_id | player |
+----+---------+--------+
| 1 | 1001 | john |
| 2 | 1001 | dave |
| 3 | 1002 | john |
| 4 | 1002 | dave |
| 5 | 1002 | mike |
| 6 | 1002 | tim |
+----+---------+--------+
Your best bet is to normalize the database. This is a many-to-many relationship and needs a link table to connect a game to its corresponding players. The computations would then be much easier. Nevertheless, for question one you could use a derived table that unites all the player columns into one:
SELECT `player`,
COUNT(*) as `count`
FROM
(
SELECT `player1` `player`
FROM `games`
UNION ALL
SELECT `player2` `player`
FROM `games`
UNION ALL
SELECT `player3` `player`
FROM `games`
UNION ALL
SELECT `player4` `player`
FROM `games`
) p
GROUP BY `player` HAVING `player` IS NOT NULL
ORDER BY `count` DESC
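As a quick check of the UNION ALL approach, here is a minimal sketch that loads the question's sample rows and runs the same kind of query. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (the query itself is plain SQL); the names `UNPIVOT` and `games_per_player` are only illustrative.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; only portable SQL is used below.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (game_id INTEGER, player1 TEXT, player2 TEXT,"
    " player3 TEXT, player4 TEXT)"
)
conn.executemany(
    "INSERT INTO games VALUES (?, ?, ?, ?, ?)",
    [
        (1001, "john", "dave", None, None),
        (1002, "dave", "john", "mike", "tim"),
        (1003, "mike", "john", "dave", None),
        (1004, "tim", "dave", None, None),
    ],
)

# Stack the four player columns into a single "player" column.
UNPIVOT = """
    SELECT player1 AS player FROM games
    UNION ALL SELECT player2 FROM games
    UNION ALL SELECT player3 FROM games
    UNION ALL SELECT player4 FROM games
"""

games_per_player = conn.execute(
    f"""
    SELECT player, COUNT(*) AS games_played
    FROM ({UNPIVOT}) AS p
    WHERE player IS NOT NULL          -- drop the empty seats
    GROUP BY player
    ORDER BY games_played DESC, player
    """
).fetchall()

print(games_per_player)
```

The first row is the player with the most games; for the sample data that is dave with 4.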
For the second question, inner join the derived table to itself on game_id; requiring the first player's name to sort before the second's keeps each pair only once:
SELECT `p`.`player`,
`p2`.`player`,
count(*) AS count
FROM
(
SELECT `game_id`, `player1` `player`
FROM `games`
UNION ALL
SELECT `game_id`, `player2` `player`
FROM `games`
UNION ALL
SELECT `game_id`, `player3` `player`
FROM `games`
UNION ALL
SELECT `game_id`, `player4` `player`
FROM `games`
) p
INNER JOIN
(
SELECT `game_id`, `player1` `player`
FROM `games`
UNION ALL
SELECT `game_id`, `player2` `player`
FROM `games`
UNION ALL
SELECT `game_id`, `player3` `player`
FROM `games`
UNION ALL
SELECT `game_id`, `player4` `player`
FROM `games`
) p2
ON `p`.`game_id` = `p2`.`game_id` AND `p`.`player` < `p2`.`player`
WHERE `p`.`player` IS NOT NULL AND `p2`.`player` IS NOT NULL
GROUP BY `p`.`player`, `p2`.`player`
ORDER BY `count` DESC
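The self-join can be written more compactly if common table expressions are available (MySQL 8.0+ is assumed for that; 5.7 would need the repeated derived table as above). A minimal sketch, again using Python's sqlite3 as a stand-in and illustrative names such as `games_together`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (game_id INTEGER, player1 TEXT, player2 TEXT,"
    " player3 TEXT, player4 TEXT)"
)
conn.executemany(
    "INSERT INTO games VALUES (?, ?, ?, ?, ?)",
    [
        (1001, "john", "dave", None, None),
        (1002, "dave", "john", "mike", "tim"),
        (1003, "mike", "john", "dave", None),
        (1004, "tim", "dave", None, None),
    ],
)

pairs = conn.execute(
    """
    WITH p AS (  -- one row per (game, player), empty seats dropped
        SELECT game_id, player1 AS player FROM games WHERE player1 IS NOT NULL
        UNION ALL
        SELECT game_id, player2 FROM games WHERE player2 IS NOT NULL
        UNION ALL
        SELECT game_id, player3 FROM games WHERE player3 IS NOT NULL
        UNION ALL
        SELECT game_id, player4 FROM games WHERE player4 IS NOT NULL
    )
    SELECT a.player, b.player, COUNT(*) AS games_together
    FROM p AS a
    JOIN p AS b
      ON a.game_id = b.game_id
     AND a.player < b.player      -- each unordered pair appears once
    GROUP BY a.player, b.player
    ORDER BY games_together DESC, a.player, b.player
    """
).fetchall()

print(pairs[0])
```

For the sample data the top pair is dave and john, who share 3 games.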
I would start by restructuring your design into 3 tables.
1) players, which holds the player data and their unique ids
CREATE TABLE players
(`id` int, `name` varchar(255))
;
INSERT INTO players
(`id`, `name`)
VALUES
(1, 'john'),
(2, 'dave'),
(3, 'mike'),
(4, 'tim');
2) games, which holds the game data and their unique ids
CREATE TABLE games
(`id` int, `name` varchar(25))
;
INSERT INTO games
(`id`, `name`)
VALUES
(1001, 'G1'),
(1002, 'G2'),
(1003, 'G3'),
(1004, 'G4');
3) player_games, a junction table that relates these 2 entities in a many-to-many relationship; it holds the game id and player id, as per your sample data
CREATE TABLE player_games
(`game_id` int, `player_id` int(11))
;
INSERT INTO player_games
(`game_id`, `player_id`)
VALUES
(1001, 1),
(1001, 2),
(1002, 1),
(1002, 2),
(1002, 3),
(1002, 4),
(1003, 3),
(1003, 1),
(1003, 2),
(1004, 4),
(1004, 2)
;
For "Who played in the most games?": as per your sample data set it is dave, not john, who played in 4 games.
select t.games_played,group_concat(t.name) players
from (
select p.name,
count(distinct pg.game_id) games_played
from player_games pg
join players p on p.id = pg.player_id
group by p.name
) t
group by games_played
order by games_played desc
limit 1
With the above query, more than one player may have played the most games (for example, if dave played 4 games and tim also played 4 games), so all of them are included in the result.
For "What pair of players played the most games together?" (John & Dave):
select t.games_played,group_concat(t.player_name) players
from (
select group_concat(distinct pg.game_id),
concat(least(p.name, p1.name), ' ', greatest(p.name, p1.name)) player_name,
count(distinct pg.game_id) games_played
from player_games pg
join player_games pg1 on pg.game_id = pg1.game_id
and pg.player_id <> pg1.player_id
join players p on p.id = pg.player_id
join players p1 on p1.id = pg1.player_id
group by player_name
) t
group by games_played
order by games_played desc
limit 1;
In the above query I have self-joined the player_games table to get the combinations of players for each game, and then grouped the data by each unique pair. Again I followed the same logic to handle the chance that more than one pair of players has played the most games together.
I have two tables, users and interests, which I'm trying to join. The users table has the columns id, name, interest, etc. The interest column contains multiple values, such as "1,2,3". My second table, interests, has 2 columns, id and name:
id | name
-------------
1 | business
2 | farming
3 | fishing
What I want to do is join the interests table with the users table so I get the following output:
users table:
id | name | interest | interest_name
----------------------------------------------
1 | username | "1,2" | "business, farming"
2 | username | "2,3" | " farming, fishing"
I wrote the following query to achieve this:
select users.*, interests.name as interest_name
from users
left join interests on users.interest = interests.id;
Results I got:
id | name | interest | interest_name
----------------------------------------
1 | username | "1,2" | "business"
2 | username | "2,3" | " farming"
Problem:
I'm only getting the name of the first value from the interest column, whereas I want the names for all the values in it. I have already tried using group_concat and find_in_set but got the same results.
In the case you cannot create an additional database table in order to normalize the data...
Here's a solution that creates an ad hoc, temporary user_interests table within the query.
SELECT users.id user_id, username, interests, interests.interest
FROM users
LEFT JOIN (
SELECT
users.id user_id,
(SUBSTRING_INDEX(SUBSTRING_INDEX(users.interests, ',', ui.ui_id), ',', -1) + 0) ui_id
FROM users
LEFT JOIN (SELECT id AS ui_id FROM interests) ui
ON CHAR_LENGTH(users.interests) - CHAR_LENGTH(REPLACE(users.interests, ',', '')) >= (ui.ui_id - 1)
) user_interests ON users.id = user_interests.user_id
LEFT JOIN interests ON user_interests.ui_id = interests.id
ORDER BY user_id, ui_id;
Outputs:
user_id | username | interests    | interest
--------+----------+--------------+---------
1 | fred | 3,4,8,6,10 | fishing
1 | fred | 3,4,8,6,10 | sports
1 | fred | 3,4,8,6,10 | religion
1 | fred | 3,4,8,6,10 | science
1 | fred | 3,4,8,6,10 | philanthropy
2 | joe | 7,11,8,9 | art
2 | joe | 7,11,8,9 | science
2 | joe | 7,11,8,9 | politics
2 | joe | 7,11,8,9 | cooking
As you can see...
SELECT
users.id user_id,
(SUBSTRING_INDEX(SUBSTRING_INDEX(users.interests, ',', ui.ui_id), ',', -1) + 0) ui_id
FROM users
LEFT JOIN (SELECT id AS ui_id FROM interests) ui
ON CHAR_LENGTH(users.interests) - CHAR_LENGTH(REPLACE(users.interests, ',', '')) >= (ui.ui_id - 1)
...builds and populates the temporary table user_interests with the users.interests field data normalized:
user_id | ui_id
--------+------
1 | 3
1 | 4
1 | 6
1 | 8
1 | 10
2 | 7
2 | 8
2 | 9
2 | 11
...which is then LEFT JOIN'ed between the users and interests tables.
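To see why the nested SUBSTRING_INDEX calls pick out the n-th value of the list, here is a small Python sketch that mimics the function. The helpers `substring_index`, `nth_item` and `split_ids` are hypothetical names written for this illustration, not part of MySQL or of the answer; they only mirror the documented behaviour of SUBSTRING_INDEX for a single-character delimiter.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Rough Python equivalent of MySQL's SUBSTRING_INDEX(s, delim, count)."""
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter.
        return delim.join(parts[:count])
    if count < 0:
        # Everything to the right of the count-th delimiter from the end.
        return delim.join(parts[count:])
    return ""


def nth_item(csv: str, n: int) -> str:
    # The inner call keeps the first n items, the outer call keeps the last of those.
    return substring_index(substring_index(csv, ",", n), ",", -1)


def split_ids(csv: str) -> list[int]:
    # Number of items = number of commas + 1, the same test the ON clause uses.
    item_count = csv.count(",") + 1
    return [int(nth_item(csv, n)) for n in range(1, item_count + 1)]


print(nth_item("3,4,8,6,10", 3))
print(split_ids("7,11,8,9"))
```

In the SQL version the sequence 1..n comes from the ids in the interests table, so it appears to rely on those ids being contiguous from 1 and at least as numerous as the longest list.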
Try it here: https://onecompiler.com/mysql/3yfhmgq3y
-- create
CREATE TABLE users (
id INT PRIMARY KEY,
username VARCHAR(20),
interests VARCHAR(20)
);
CREATE TABLE interests (
id INT PRIMARY KEY,
interest VARCHAR(20)
);
-- insert
INSERT INTO users VALUES (1, 'fred', '3,4,8,6,10'), (2, 'joe', '7,11,8,9');
INSERT INTO interests VALUES (1, 'business'), (2, 'farming'), (3, 'fishing'), (4, 'sports'), (5, 'technology'), (6, 'religion'), (7, 'art'), (8, 'science'), (9, 'politics'), (10, 'philanthropy'), (11, 'cooking');
-- select
SELECT users.id user_id, username, interests, interests.interest
FROM users
LEFT JOIN (
SELECT
users.id user_id,
(SUBSTRING_INDEX(SUBSTRING_INDEX(users.interests, ',', ui.ui_id), ',', -1) + 0) ui_id
FROM users
LEFT JOIN (SELECT id AS ui_id FROM interests) ui
ON CHAR_LENGTH(users.interests) - CHAR_LENGTH(REPLACE(users.interests, ',', '')) >= (ui.ui_id - 1)
) user_interests ON users.id = user_interests.user_id
LEFT JOIN interests ON user_interests.ui_id = interests.id
ORDER BY user_id, ui_id;
Inspired by Leon Straathof's and fthiella's answers to this SO question.
Pull the interest column out of the users table and create a user_interests table that contains the user ids and interest ids:
user_id | interest_id
--------+------------
1 | 1
1 | 2
2 | 2
2 | 3
Then join the users table to the user_interests table, and the user_interests table to the interests table:
SELECT users.username, interests.interest
FROM users
LEFT JOIN user_interests ON users.id = user_interests.user_id
LEFT JOIN interests ON user_interests.interest_id = interests.id
WHERE interest_id IS NOT NULL;
Outputs:
username | interest
---------+---------
Clark | business
Clark | farming
Dave | farming
Dave | fishing
Then use your server programming language to compile the query results.
Try it here: https://onecompiler.com/mysql/3yfe5pp7x
-- create
CREATE TABLE users (
id INTEGER PRIMARY KEY,
username TEXT NOT NULL
);
CREATE TABLE user_interests (
user_id INTEGER,
interest_id INTEGER,
UNIQUE KEY user_interests_constraint (user_id,interest_id)
);
CREATE TABLE interests (
id INTEGER PRIMARY KEY,
interest TEXT NOT NULL
);
-- insert
INSERT INTO users VALUES (1, 'Clark'), (2, 'Dave'), (3, 'Ava');
INSERT INTO interests VALUES (1, 'business'), (2, 'farming'), (3, 'fishing');
INSERT INTO user_interests VALUES (1, 1), (1, 2), (2, 2), (2, 3);
-- fetch
SELECT users.username, interests.interest
FROM users
LEFT JOIN user_interests ON users.id = user_interests.user_id
LEFT JOIN interests ON user_interests.interest_id = interests.id
WHERE interest_id IS NOT NULL;
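As a sketch of the "compile the query results in your server language" step, the following groups the joined rows per user and builds the comma-separated string from the question. Python with the built-in sqlite3 module is assumed here purely for illustration; any server language and MySQL driver would follow the same pattern, and `interests_by_user` and `summary` are made-up names.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE interests (id INTEGER PRIMARY KEY, interest TEXT NOT NULL);
    CREATE TABLE user_interests (
        user_id INTEGER,
        interest_id INTEGER,
        UNIQUE (user_id, interest_id)
    );
    INSERT INTO users VALUES (1, 'Clark'), (2, 'Dave'), (3, 'Ava');
    INSERT INTO interests VALUES (1, 'business'), (2, 'farming'), (3, 'fishing');
    INSERT INTO user_interests VALUES (1, 1), (1, 2), (2, 2), (2, 3);
    """
)

# Inner joins: users without any interest (Ava) produce no rows.
rows = conn.execute(
    """
    SELECT users.username, interests.interest
    FROM users
    JOIN user_interests ON users.id = user_interests.user_id
    JOIN interests ON user_interests.interest_id = interests.id
    ORDER BY users.id, interests.id
    """
).fetchall()

# Collect the interest names per user, then join them into one string each.
interests_by_user = defaultdict(list)
for username, interest in rows:
    interests_by_user[username].append(interest)

summary = {user: ", ".join(names) for user, names in interests_by_user.items()}
print(summary)
```

If you would rather have the database build the string, GROUP_CONCAT(interests.interest) with GROUP BY users.id is the usual alternative.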
This refers to another Stack Overflow question; however, the answers there didn't include the player from group_id 3.
I tried to replicate the answer in MySQL, but I am not familiar with PostgreSQL. Can anyone show how to do this in MySQL?
The task is to return the player with the highest total score in each group as winner_id.
create table players (
player_id integer not null unique,
group_id integer not null
);
create table matches (
match_id integer not null unique,
first_player integer not null,
second_player integer not null,
first_score integer not null,
second_score integer not null
);
insert into players values(20, 2);
insert into players values(30, 1);
insert into players values(40, 3);
insert into players values(45, 1);
insert into players values(50, 2);
insert into players values(65, 1);
insert into matches values(1, 30, 45, 10, 12);
insert into matches values(2, 20, 50, 5, 5);
insert into matches values(13, 65, 45, 10, 10);
insert into matches values(5, 30, 65, 3, 15);
insert into matches values(42, 45, 65, 8, 4);
matches table
match_id | first_player | second_player | first_score | second_score
----------+--------------+---------------+-------------+--------------
1 | 30 | 45 | 10 | 12
2 | 20 | 50 | 5 | 5
13 | 65 | 45 | 10 | 10
5 | 30 | 65 | 3 | 15
42 | 45 | 65 | 8 | 4
Expected output
group_id | winner_id
----------+-----------
1 | 45
2 | 20
3 | 40
I presume that since you can't use the solution to the other question that you are using MySQL 5.7 or below. In that case, you have to simulate the ROW_NUMBER/PARTITION functionality, which you can do with a LEFT JOIN from a derived table of scores per player with itself, joining on the score being greater than that in the first table. Any player who has no scores greater in the joined table clearly has the highest score. Since there can be ties, we then take the minimum of the player_id values from that table (when there is no tie, this has no effect).
SELECT group_id, MIN(player_id) AS player_id
FROM (
SELECT t1.group_id, t1.player_id
FROM (
SELECT p.player_id, p.group_id,
-- COALESCE gives players with no matches a score of 0 instead of NULL,
-- so any group-mate with a positive score outranks them
COALESCE(SUM(CASE WHEN m.first_player = p.player_id THEN m.first_score
ELSE m.second_score
END), 0) AS score
FROM players p
LEFT JOIN matches m ON m.first_player = p.player_id OR m.second_player = p.player_id
GROUP BY p.player_id, p.group_id
) t1
LEFT JOIN (
SELECT p.player_id, p.group_id,
COALESCE(SUM(CASE WHEN m.first_player = p.player_id THEN m.first_score
ELSE m.second_score
END), 0) AS score
FROM players p
LEFT JOIN matches m ON m.first_player = p.player_id OR m.second_player = p.player_id
GROUP BY p.player_id, p.group_id
) t2 ON t2.group_id = t1.group_id AND t2.score > t1.score
GROUP BY t1.group_id, t1.player_id
HAVING COUNT(t2.player_id) = 0
) w
GROUP BY group_id
Output:
group_id player_id
1 45
2 20
3 40
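For comparison, on MySQL 8.0+ the ROW_NUMBER/PARTITION approach mentioned above is available directly. A minimal sketch under that assumption, run here through Python's sqlite3 (which needs SQLite 3.25 or newer for window functions); the CTE names `scores` and `ranked` are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE players (player_id INTEGER NOT NULL UNIQUE, group_id INTEGER NOT NULL);
    CREATE TABLE matches (
        match_id INTEGER NOT NULL UNIQUE,
        first_player INTEGER NOT NULL,
        second_player INTEGER NOT NULL,
        first_score INTEGER NOT NULL,
        second_score INTEGER NOT NULL
    );
    INSERT INTO players VALUES (20, 2), (30, 1), (40, 3), (45, 1), (50, 2), (65, 1);
    INSERT INTO matches VALUES
        (1, 30, 45, 10, 12), (2, 20, 50, 5, 5), (13, 65, 45, 10, 10),
        (5, 30, 65, 3, 15), (42, 45, 65, 8, 4);
    """
)

winners = conn.execute(
    """
    WITH scores AS (   -- total score per player, 0 if they never played
        SELECT p.player_id, p.group_id,
               COALESCE(SUM(CASE WHEN m.first_player = p.player_id
                                 THEN m.first_score
                                 ELSE m.second_score END), 0) AS score
        FROM players p
        LEFT JOIN matches m
               ON m.first_player = p.player_id OR m.second_player = p.player_id
        GROUP BY p.player_id, p.group_id
    ),
    ranked AS (        -- rank within each group; ties go to the lowest player_id
        SELECT group_id, player_id,
               ROW_NUMBER() OVER (PARTITION BY group_id
                                  ORDER BY score DESC, player_id) AS rn
        FROM scores
    )
    SELECT group_id, player_id AS winner_id
    FROM ranked
    WHERE rn = 1
    ORDER BY group_id
    """
).fetchall()

print(winners)
```

The tie-break on player_id in the ORDER BY plays the same role as the MIN(player_id) in the LEFT JOIN version.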
This is a follow-up question to How to count distinct values from two columns into one number.
There I wanted to know how to do the counting part and neglected to mention that I am already joining some other tables into the mix.
The answer given to the previous question is the correct one for that case.
Here's my additional problem now.
I have 3 tables:
Assignments
+----+-------------------+
| id | name |
+----+-------------------+
| 1 | first-assignment |
| 2 | second-assignment |
+----+-------------------+
Submissions
+----+---------------+------------+
| id | assignment_id | student_id |
+----+---------------+------------+
| 1 | 1 | 2 |
| 2 | 2 | 1 |
| 3 | 1 | 3 |
+----+---------------+------------+
Group_submissions
+----+---------------+------------+
| id | submission_id | student_id |
+----+---------------+------------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
+----+---------------+------------+
Each submission belongs to an assignment.
Submissions can be individual or group submissions.
When a submission is individual, the student who submitted to an assignment (assignment_id) goes into the submissions table (student_id).
When it is a group submission, the same thing happens, with two additional details:
The student who makes the submission goes into the submissions table.
The others go into the group_submissions table and are associated with the id in the submissions table (so submission_id is a FK to the submissions table).
I want to return every assignment with its columns, plus the number of students who have made submissions to that assignment. Keep in mind that students who did not make the submission themselves (are not in the submissions table) but participated in a group submission (are in the group_submissions table) also count.
Something like this:
+----+-------------------+----------+
| id | name | students |
+----+-------------------+----------+
| 1 | first-assignment | 3 |
| 2 | second-assignment | 2 |
+----+-------------------+----------+
I tried 2 ways of getting the numbers:
count(distinct case when group_submissions.student_id is not null then
group_submissions.student_id when assignment_submissions.student_id is
not null then assignment_submissions.student_id end)
This doesn't work because the case statement will short circuit once the first condition is met. For example: If one student has done group submissions but has never actually done the submission he/she will be displayed on the group_submissions table only. So if on the submissions table the id is 1 and on the group_submission table the id is 2, and id 2 does not occur on the submissions table it will not be counted.
count(distinct case when group_submissions.student_id is not null then group_submissions.student_id end)
+ count(distinct case when submissions.student_id is not null then submissions.student_id end)
This one doesn't work because it gives duplicates if a student is in both tables.
NOTE: This is a MySQL database
Since you can't change the data, you'll need to use a UNION subquery, and then aggregate over that.
SELECT a.id, a.name, COUNT(DISTINCT x.student_id) AS students
FROM Assignments AS a
LEFT JOIN (
SELECT assignment_id, student_id FROM Submissions
UNION
SELECT s.assignment_id, g.student_id
FROM Submissions AS s
INNER JOIN Group_submissions AS g ON s.id = g.submission_id
) AS x ON a.id = x.assignment_id
GROUP BY a.id, a.name
;
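To see the de-duplication at work, here is a minimal sketch that runs the same kind of UNION query through Python's sqlite3 (a stand-in for MySQL). The first two Group_submissions rows are the question's data; the third row is a hypothetical addition that credits student 2, who already submitted for assignment 1, on another submission for that assignment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Assignments (id INTEGER, name TEXT);
    CREATE TABLE Submissions (id INTEGER, assignment_id INTEGER, student_id INTEGER);
    CREATE TABLE Group_submissions (id INTEGER, submission_id INTEGER, student_id INTEGER);
    INSERT INTO Assignments VALUES (1, 'first-assignment'), (2, 'second-assignment');
    INSERT INTO Submissions VALUES (1, 1, 2), (2, 2, 1), (3, 1, 3);
    -- Row 3 is hypothetical: student 2 appears in both tables for assignment 1.
    INSERT INTO Group_submissions VALUES (1, 1, 1), (2, 2, 2), (3, 3, 2);
    """
)

counts = conn.execute(
    """
    SELECT a.id, a.name, COUNT(DISTINCT x.student_id) AS students
    FROM Assignments AS a
    LEFT JOIN (
        -- submitters
        SELECT assignment_id, student_id FROM Submissions
        UNION
        -- group members, mapped to the assignment of their submission
        SELECT s.assignment_id, g.student_id
        FROM Submissions AS s
        JOIN Group_submissions AS g ON s.id = g.submission_id
    ) AS x ON a.id = x.assignment_id
    GROUP BY a.id, a.name
    ORDER BY a.id
    """
).fetchall()

print(counts)
```

Student 2 is counted once for the first assignment even though they appear in both tables, which is the case the simple "add the two counts" approach gets wrong.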
Edit: vhu's first part is better as long as you cannot have assignment X submitted by student Y with a group_submission credit for student Z, and another submission for assignment X made directly by student Z or carrying a group_submission credit for student Y (because then they would be counted twice).
If each student is either in the submissions table or in group_submissions (never both for the same assignment), you can simply join the tables and add the counts:
-- DISTINCT s.id: the LEFT JOIN repeats a submission row once per group member
SELECT a.id, COUNT(DISTINCT s.id) + COUNT(gs.id) AS students FROM assignments a
JOIN submissions s ON a.id = s.assignment_id
LEFT JOIN group_submissions gs ON s.id = gs.submission_id
GROUP BY a.id;
If there are duplicates, i.e. student can be both in submissions and group_submissions tables, then you can union the two and select from there:
SELECT assignment_id,COUNT(DISTINCT student_id)
FROM (
SELECT assignment_id,student_id
FROM submissions
UNION
SELECT assignment_id,gs.student_id
FROM group_submissions gs
JOIN submissions s on gs.submission_id = s.id) T1
GROUP BY assignment_id;
You already tagged the question as mysql; the version number is usually also useful for a good answer.
The following gives you a correct answer:
SELECT
a.id,a.name
, LENGTH(CONCAT(GROUP_CONCAT(s.`student_id`) ,IF(GROUP_CONCAT(gs.student_id) is NULL,'',','),IF(GROUP_CONCAT(gs.student_id) is NULL,'',GROUP_CONCAT(gs.student_id))))
- LENGTH(REPLACE(CONCAT(GROUP_CONCAT(s.`student_id`) ,IF(GROUP_CONCAT(gs.student_id) is NULL,'',','),IF(GROUP_CONCAT(gs.student_id) is NULL,'',GROUP_CONCAT(gs.student_id))), ',', '')) + 1 as count_studints
FROM
Submissions s
LEFT JOIN Group_submissions gs ON gs.submission_id = s.id
INNER JOIN Assignments a on s.assignment_id = a.id
WHERE s.`student_id` NOT IN (SELECT student_id
FROM Group_submissions gs
WHERE gs.submission_id = s.id)
GROUP BY a.id,a.name;
CREATE TABLE Group_submissions (
`id` INTEGER,
`submission_id` INTEGER,
`student_id` INTEGER
);
INSERT INTO Group_submissions
(`id`, `submission_id`, `student_id`)
VALUES
('1', '1', '1'),
('2', '2', '2');
CREATE TABLE Submissions (
`id` INTEGER,
`assignment_id` INTEGER,
`student_id` INTEGER
);
INSERT INTO Submissions
(`id`, `assignment_id`, `student_id`)
VALUES
('1', '1', '2'),
('2', '2', '1'),
('3', '1', '3'),
('4', '3', '1');
CREATE TABLE Assignments (
`id` INTEGER,
`name` VARCHAR(17)
);
INSERT INTO Assignments
(`id`, `name`)
VALUES
('1', 'first-assignment'),
('2', 'second-assignment'),
('3', 'third-assignment');
Running the query above against this sample data returns:
id | name | count_studints
-: | :---------------- | -------------:
1 | first-assignment | 3
2 | second-assignment | 2
3 | third-assignment | 1
Basically, what I want is to select all the race records, each with its record holder and best time. I looked at similar queries and found 3 that were faster than the rest.
The problem is that one of them completely ignores the race whose record is held by userid 2.
These are my tables, indexes, and some sample data:
CREATE TABLE `races` (
`raceid` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(20) NOT NULL,
PRIMARY KEY (`raceid`),
UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `users` (
`userid` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(20) NOT NULL,
PRIMARY KEY (`userid`),
UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `race_times` (
`raceid` smallint(5) unsigned NOT NULL,
`userid` mediumint(8) unsigned NOT NULL,
`time` mediumint(8) unsigned NOT NULL,
PRIMARY KEY (`raceid`,`userid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `races` (`raceid`, `name`) VALUES
(1, 'Doherty'),
(3, 'Easter Basin Naval S'),
(5, 'Flint County'),
(6, 'Fort Carson'),
(4, 'Glen Park'),
(2, 'Palomino Creek'),
(7, 'Tierra Robada');
INSERT INTO `users` (`userid`, `name`) VALUES
(1, 'Player 1'),
(2, 'Player 2');
INSERT INTO `race_times` (`raceid`, `userid`, `time`) VALUES
(1, 1, 51637),
(1, 2, 50000),
(2, 1, 148039),
(3, 1, 120516),
(3, 2, 124773),
(4, 1, 101109),
(6, 1, 89092),
(6, 2, 89557),
(7, 1, 77933),
(7, 2, 78038);
So if I run these 2 queries:
SELECT rt1.raceid, r.name, rt1.userid, p.name, rt1.time
FROM race_times rt1
LEFT JOIN users p ON (rt1.userid = p.userid)
JOIN races r ON (r.raceid = rt1.raceid)
WHERE rt1.time = (SELECT MIN(rt2.time) FROM race_times rt2 WHERE rt1.raceid = rt2.raceid)
GROUP BY r.name;
or..
SELECT rt1.*, r.name, p.name
FROM race_times rt1
LEFT JOIN users p ON p.userid = rt1.userid
JOIN races r ON r.raceid = rt1.raceid
WHERE EXISTS (SELECT NULL FROM race_times rt2 WHERE rt2.raceid = rt1.raceid
GROUP BY rt2.raceid HAVING MIN(rt2.time) >= rt1.time);
I receive correct results as shown below:
raceid | name | userid | name | time |
-------+----------------------+--------+----------+--------|
1 | Doherty | 2 | Player 2 | 50000 |
3 | Easter Basin Naval S | 1 | Player 1 | 120516 |
6 | Fort Carson | 1 | Player 1 | 89092 |
4 | Glen Park | 1 | Player 1 | 101109 |
2 | Palomino Creek | 1 | Player 1 | 148039 |
7 | Tierra Robada | 1 | Player 1 | 77933 |
and here is the faulty query:
SELECT rt.raceid, r.name, rt.userid, p.name, rt.time
FROM race_times rt
LEFT JOIN users p ON p.userid = rt.userid
JOIN races r ON r.raceid = rt.raceid
GROUP BY r.name
HAVING rt.time = MIN(rt.time);
and the result is this:
raceid | name | userid | name | time |
-------+----------------------+--------+----------+--------|
3 | Easter Basin Naval S | 1 | Player 1 | 120516 |
6 | Fort Carson | 1 | Player 1 | 89092 |
4 | Glen Park | 1 | Player 1 | 101109 |
2 | Palomino Creek | 1 | Player 1 | 148039 |
7 | Tierra Robada | 1 | Player 1 | 77933 |
As you can see, race "Doherty" (raceid: 1) is owned by "Player 2" (userid: 2) and it is not shown along with the rest of race records (which are all owned by userid 1). What is the problem?
Regards,
HAVING is a post-filter. The query first gets all the results and then filters them further based on the HAVING clause. The GROUP BY compacts the rows of each group into one, and for the non-aggregated columns MySQL takes the values from one row of the group (the server is free to choose which; here it happens to be the first entry in each set). Since player 1 is the first entry for race 1, that's the row being processed by the HAVING. It is then filtered out because its time does not equal the MIN(time) for the group.
This is why the other ones you posted are using a sub-query. My personal preference is for the first example, as to me it's slightly easier to read. Performance wise they should be the same.
While it's not a bad idea to try and avoid sub queries in the where clause, this is mostly valid when you can accomplish the same result with a JOIN. Other times it's not possible to get the result with a JOIN and a sub query is required.
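For this particular problem, one way to move the subquery out of the WHERE clause is to join against a derived table holding the best time per race. This is only a sketch of that variant, not one of the queries from the question; it is run through Python's sqlite3 as a stand-in for MySQL, with column types simplified and `best` / `best_time` as made-up names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE races (raceid INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE race_times (
        raceid INTEGER NOT NULL,
        userid INTEGER NOT NULL,
        time INTEGER NOT NULL,
        PRIMARY KEY (raceid, userid)
    );
    INSERT INTO races VALUES
        (1, 'Doherty'), (3, 'Easter Basin Naval S'), (5, 'Flint County'),
        (6, 'Fort Carson'), (4, 'Glen Park'), (2, 'Palomino Creek'),
        (7, 'Tierra Robada');
    INSERT INTO users VALUES (1, 'Player 1'), (2, 'Player 2');
    INSERT INTO race_times VALUES
        (1, 1, 51637), (1, 2, 50000), (2, 1, 148039), (3, 1, 120516),
        (3, 2, 124773), (4, 1, 101109), (6, 1, 89092), (6, 2, 89557),
        (7, 1, 77933), (7, 2, 78038);
    """
)

records = conn.execute(
    """
    SELECT rt.raceid, r.name, rt.userid, u.name, rt.time
    FROM race_times AS rt
    JOIN (  -- best time per race, computed once
        SELECT raceid, MIN(time) AS best_time
        FROM race_times
        GROUP BY raceid
    ) AS best ON best.raceid = rt.raceid AND best.best_time = rt.time
    JOIN races AS r ON r.raceid = rt.raceid
    LEFT JOIN users AS u ON u.userid = rt.userid
    ORDER BY r.name
    """
).fetchall()

for row in records:
    print(row)
```

Races without any recorded time (Flint County here) are left out, and a tie for the best time would return one row per tied player.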
I have two tables, players and games, created as follows:
CREATE TABLE IF NOT EXISTS `players` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`created_at` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
CREATE TABLE IF NOT EXISTS `games` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`player` int(11) NOT NULL,
`played_at` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
I wish to extract 3 values for each day:
The number of players created on that day
The number of players who played on that day
The number of players who played for the first time on that day
So, suppose for example that the players table looks as follows:
+----+--------+---------------------+
| id | name | created_at |
+----+--------+---------------------+
| 1 | Alan | 2016-02-01 00:00:00 |
| 2 | Benny | 2016-02-01 06:00:00 |
| 3 | Calvin | 2016-02-02 00:00:00 |
| 4 | Dan | 2016-02-03 00:00:00 |
+----+--------+---------------------+
And the games table looks as follows:
+----+--------+---------------------+
| id | player | played_at |
+----+--------+---------------------+
| 1 | 1 | 2016-02-01 01:00:00 |
| 2 | 3 | 2016-02-02 02:00:00 |
| 3 | 2 | 2016-02-03 14:00:00 |
| 4 | 3 | 2016-02-03 17:00:00 |
| 5 | 3 | 2016-02-03 18:00:00 |
+----+--------+---------------------+
Then the query should return something like
+------------+-----+--------+-------+
| day | new | played | first |
+------------+-----+--------+-------+
| 2016-02-01 | 2 | 1 | 1 |
| 2016-02-02 | 1 | 1 | 1 |
| 2016-02-03 | 1 | 2 | 1 |
+------------+-----+--------+-------+
I have a solution for 1 (new):
SELECT Date(created_at) AS day,
Count(*) AS new
FROM players
GROUP BY day;
That's easy. I think I also have a solution for 2 (played), thanks to MySQL COUNT DISTINCT:
select Date(played_at) AS day,
Count(Distinct player) AS played
FROM games
GROUP BY day;
But I have no idea how to get the needed result for 3 (first). I also don't know how to put everything in a single query, to save execution time (the games table may include millions of records).
In case you need it, here's a query which inserts the example data:
INSERT INTO `players` (`id`, `name`, `created_at`) VALUES
(1, 'Alan', '2016-02-01 00:00:00'),
(2, 'Benny', '2016-02-01 06:00:00'),
(3, 'Calvin', '2016-02-02 00:00:00'),
(4, 'Dan', '2016-02-03 00:00:00');
INSERT INTO `games` (`id`, `player`, `played_at`) VALUES
(1, 1, '2016-02-01 01:00:00'),
(2, 3, '2016-02-02 02:00:00'),
(3, 2, '2016-02-03 14:00:00'),
(4, 3, '2016-02-03 17:00:00'),
(5, 3, '2016-02-03 18:00:00');
One version is to get all relevant data into a union and do the analysis from there:
SELECT date AS day,
SUM(type='P') new,
COUNT(DISTINCT CASE WHEN type='G' THEN pid END) played,
SUM(type='F') first
FROM (
SELECT id pid, DATE(created_at) date, 'P' type FROM players
UNION ALL
SELECT player, DATE(played_at) date, 'G' FROM games
UNION ALL
SELECT player, MIN(DATE(played_at)), 'F' FROM games GROUP BY player
) z
GROUP BY date;
In the union:
Records with type P are player creation statistics.
Records with type G are player-related game statistics.
Records with type F are statistics for when players played their first game.
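The tagged-union idea can be checked against the question's sample data with a short script. This sketch assumes Python's sqlite3 as a stand-in for MySQL and renames the columns (`day`, `kind`, `new_players`, `first_played`) to stay clear of keywords; those names are not from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT, created_at TEXT);
    CREATE TABLE games (id INTEGER PRIMARY KEY, player INTEGER, played_at TEXT);
    INSERT INTO players VALUES
        (1, 'Alan',   '2016-02-01 00:00:00'),
        (2, 'Benny',  '2016-02-01 06:00:00'),
        (3, 'Calvin', '2016-02-02 00:00:00'),
        (4, 'Dan',    '2016-02-03 00:00:00');
    INSERT INTO games VALUES
        (1, 1, '2016-02-01 01:00:00'),
        (2, 3, '2016-02-02 02:00:00'),
        (3, 2, '2016-02-03 14:00:00'),
        (4, 3, '2016-02-03 17:00:00'),
        (5, 3, '2016-02-03 18:00:00');
    """
)

stats = conn.execute(
    """
    SELECT day,
           SUM(kind = 'P') AS new_players,
           COUNT(DISTINCT CASE WHEN kind = 'G' THEN pid END) AS played,
           SUM(kind = 'F') AS first_played
    FROM (
        -- P: one row per player creation
        SELECT id AS pid, DATE(created_at) AS day, 'P' AS kind FROM players
        UNION ALL
        -- G: one row per game played
        SELECT player, DATE(played_at), 'G' FROM games
        UNION ALL
        -- F: one row per player, on the day of their first game
        SELECT player, MIN(DATE(played_at)), 'F' FROM games GROUP BY player
    ) AS z
    GROUP BY day
    ORDER BY day
    """
).fetchall()

for row in stats:
    print(row)
```

The three rows match the expected table from the question: (2, 1, 1), (1, 1, 1) and (1, 2, 1) for the three days.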
You can count the rows of a derived table built on min(played_at) and filtered with HAVING:
select count(player) from
( select player, min(played_at)
from games
group by player
having date(min(played_at)) = YOUR_GIVEN_DATE ) as t;
This query will give you the result:
select day,( select count(distinct(id)) from players where Date(created_at) = temp.day ) as no_created_at ,
( select count(distinct(player)) from games where Date(played_at) = temp.day) as no_played_at,
( select count(distinct(player)) from games where Date(played_at) =
(select min(Date(played_at)) from games internal_games
where internal_games.player =games.player and Date(games.played_at) = temp.day )) as no_first_played_at
from (
SELECT Date(created_at) AS day
FROM players
GROUP BY day
union
select Date(played_at) AS day
FROM games
GROUP BY day) temp
Here's a solution with a bunch of subqueries, which accounts for the possibility that players may have been created on days with no games, and vice versa:
select
all_dates.date as day,
ifnull(new.num, 0) as new,
ifnull(players.num, 0) as players,
ifnull(first.num, 0) as first
from (
select date(created_at) as date from players
union
select date(played_at) from games
) as all_dates
left join (
select date(created_at) as created_at_date, count(*) as num
from players
group by created_at_date
) as new on all_dates.date = new.created_at_date
left join (
select date(played_at) as played_at_date, count(distinct player) as num
from games
group by played_at_date
) as players on all_dates.date = players.played_at_date
left join (
select min_date, count(*) num
from (
select player, date(min(played_at)) as min_date
from games
group by player
) as players_first
group by min_date
) as first on all_dates.date = first.min_date
order by day;