Query to count the top three items that appear multiple times - mysql

In the query below I have a basic table with two fields, course_name and course_id. I am trying to display the top three courses by the number of times they appear in the table. I am using COUNT, then comparing the values in an additional subselect to determine the top three, and finally grouping the results by course_id. I am getting a MySQL syntax error. How can I display the top three courses with the number of times they appear in the table?
SELECT course_name, course_id, COUNT(1) AS cnt
FROM courses
JOIN (SELECT distinct cnt cnt3
FROM courses
ORDER BY cnt DESC
LIMIT 2, 1) x
ON cnt >= cnt3
ORDER by cnt DESC
GROUP BY course_id
Table Schema:
CREATE TABLE courses
(`course_name` varchar(15), `course_id` int)
;
INSERT INTO courses
(`course_name`, `course_id`)
VALUES
('Math', 1),
('Science', 2),
('PHYS', 3),
('Study Hall', 4),
('History', 5),
('Social Studies', 6),
('Math', 1),
('PHYS', 3),
('Math', 1),
('Science', 2),
('Science', 2),
('Study Hall', 4),
('History', 5)
;
Desired Result:
+-------------+-------+
| Course_name | Count |
+-------------+-------+
| Math | 3 |
| Science | 3 |
| PHYS | 2 |
| Study | 2 |
| History | 2 |
+-------------+-------+

Your syntax error is due to this:
ON cnt >= cnt3
cnt is a column alias defined in the SELECT list, so you can't reference it in the join's ON clause. Also, your ORDER BY and GROUP BY clauses are in the wrong order: GROUP BY has to come before ORDER BY.
Edit starts here
Looking at your query, you may have overengineered it. Would this not give you your answer?
select course_name, course_id, count(*) records
from courses
group by course_name, course_id
having count(*) > 1
order by records desc
limit 3

Select course_name,cnt from(
Select course_id,course_name,count(course_id) as cnt from courses group by course_id,course_name
)tmp
Order by cnt desc limit 0,3
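Neither query above returns the five rows in the desired result, because LIMIT 3 cuts off courses that tie with the third one. A minimal sketch of one way to keep those ties, assuming the courses table from the question: the inner subquery (aliased x here, with cnt3 as in the question) picks the count of the third-ranked course, and the outer query keeps every course whose count is at least that large.

```sql
-- Sketch only: keeps every course whose count ties with or beats the third-highest row.
SELECT c.course_name, c.cnt
FROM (
    -- One row per course with its number of occurrences.
    SELECT course_name, course_id, COUNT(*) AS cnt
    FROM courses
    GROUP BY course_name, course_id
) AS c
JOIN (
    -- Count belonging to the third course when sorted by count, highest first.
    SELECT COUNT(*) AS cnt3
    FROM courses
    GROUP BY course_id
    ORDER BY cnt3 DESC
    LIMIT 2, 1
) AS x ON c.cnt >= x.cnt3
ORDER BY c.cnt DESC, c.course_name;
```

With the sample data the third-ranked count is 2, so Math, Science, PHYS, Study Hall and History are returned and Social Studies is not. If the table holds fewer than three distinct courses, the subquery is empty and the join returns no rows.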

Related

How to select the first three rows for each user_id in MySQL [duplicate]

The following is the simplest possible example, though any solution should be able to scale to however many n top results are needed:
Given a table like that below, with person, group, and age columns, how would you get the 2 oldest people in each group? (Ties within groups should not yield more results, but give the first 2 in alphabetical order)
+--------+-------+-----+
| Person | Group | Age |
+--------+-------+-----+
| Bob | 1 | 32 |
| Jill | 1 | 34 |
| Shawn | 1 | 42 |
| Jake | 2 | 29 |
| Paul | 2 | 36 |
| Laura | 2 | 39 |
+--------+-------+-----+
Desired result set:
+--------+-------+-----+
| Shawn | 1 | 42 |
| Jill | 1 | 34 |
| Laura | 2 | 39 |
| Paul | 2 | 36 |
+--------+-------+-----+
NOTE: This question builds on a previous one, Get records with max value for each group of grouped SQL results, which covers getting a single top row from each group and received a great MySQL-specific answer from @Bohemian:
select *
from (select * from mytable order by `Group`, Age desc, Person) x
group by `Group`
Would love to be able to build off this, though I don't see how.
Here is one way to do this, using UNION ALL (See SQL Fiddle with Demo). This works with two groups; if you have more than two groups, you would need to add a query for each group, specifying its group number:
(
select *
from mytable
where `group` = 1
order by age desc
LIMIT 2
)
UNION ALL
(
select *
from mytable
where `group` = 2
order by age desc
LIMIT 2
)
There are a variety of ways to do this, see this article to determine the best route for your situation:
http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/
Edit:
This might work for you too. It generates a row number for each record within its group. Using an example from the link above, this returns only those records with a row number less than or equal to 2:
select person, `group`, age
from
(
select person, `group`, age,
(@num:=if(@group = `group`, @num +1, if(@group := `group`, 1, 1))) row_number
from test t
CROSS JOIN (select @num:=0, @group:=null) c
order by `Group`, Age desc, person
) as x
where x.row_number <= 2;
See Demo
In other databases you can do this using ROW_NUMBER. MySQL versions before 8.0 don't support ROW_NUMBER, but you can use variables to emulate it:
SELECT
person,
groupname,
age
FROM
(
SELECT
person,
groupname,
age,
@rn := IF(@prev = groupname, @rn + 1, 1) AS rn,
@prev := groupname
FROM mytable
JOIN (SELECT @prev := NULL, @rn := 0) AS vars
ORDER BY groupname, age DESC, person
) AS T1
WHERE rn <= 2
See it working online: sqlfiddle
Edit: I just noticed that bluefeet posted a very similar answer: +1 to him. However, this answer has two small advantages:
It is a single query. The variables are initialized inside the SELECT statement.
It handles ties as described in the question (alphabetical order by name).
So I'll leave it here in case it can help someone.
Try this:
SELECT a.person, a.group, a.age FROM person AS a WHERE
(SELECT COUNT(*) FROM person AS b
WHERE b.group = a.group AND b.age >= a.age) <= 2
ORDER BY a.group ASC, a.age DESC
DEMO
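Note that the query above compares on age alone, so two people of the same age each count the other as "at least as old" and a tie at the cutoff changes which rows come back. A minimal sketch of a tie-aware variant, assuming the same person table and assuming person names are unique within a group: it counts only the rows that sort strictly ahead of the current one (older, or same age with an alphabetically earlier name) and keeps rows with fewer than 2 such rows.

```sql
-- Sketch only: top 2 per group, ties on age broken alphabetically by person.
SELECT a.person, a.`group`, a.age
FROM person AS a
WHERE (
    SELECT COUNT(*)
    FROM person AS b
    WHERE b.`group` = a.`group`
      AND (b.age > a.age
           OR (b.age = a.age AND b.person < a.person))
) < 2
ORDER BY a.`group` ASC, a.age DESC, a.person ASC;
```

Changing the 2 to another value gives the top n per group. Like the original, this runs a correlated subquery per row, so it may be slow on large tables without an index on the group and age columns.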
How about using self-joining:
CREATE TABLE mytable (person, groupname, age);
INSERT INTO mytable VALUES('Bob',1,32);
INSERT INTO mytable VALUES('Jill',1,34);
INSERT INTO mytable VALUES('Shawn',1,42);
INSERT INTO mytable VALUES('Jake',2,29);
INSERT INTO mytable VALUES('Paul',2,36);
INSERT INTO mytable VALUES('Laura',2,39);
SELECT a.* FROM mytable AS a
LEFT JOIN mytable AS a2
ON a.groupname = a2.groupname AND a.age <= a2.age
GROUP BY a.person
HAVING COUNT(*) <= 2
ORDER BY a.groupname, a.age DESC;
gives me:
a.person a.groupname a.age
---------- ----------- ----------
Shawn 1 42
Jill 1 34
Laura 2 39
Paul 2 36
I was strongly inspired by the answer from Bill Karwin to Select top 10 records for each category
Also, I'm using SQLite, but this should work on MySQL.
Another thing: in the above, I replaced the group column with a groupname column for convenience.
Edit:
Following up on the OP's comment regarding missing tie results, I built on snuffin's answer to show all the ties. This means that if the last rows are tied, more than 2 rows can be returned per group, as shown below:
.headers on
.mode column
CREATE TABLE foo (person, groupname, age);
INSERT INTO foo VALUES('Paul',2,36);
INSERT INTO foo VALUES('Laura',2,39);
INSERT INTO foo VALUES('Joe',2,36);
INSERT INTO foo VALUES('Bob',1,32);
INSERT INTO foo VALUES('Jill',1,34);
INSERT INTO foo VALUES('Shawn',1,42);
INSERT INTO foo VALUES('Jake',2,29);
INSERT INTO foo VALUES('James',2,15);
INSERT INTO foo VALUES('Fred',1,12);
INSERT INTO foo VALUES('Chuck',3,112);
SELECT a.person, a.groupname, a.age
FROM foo AS a
WHERE a.age >= (SELECT MIN(b.age)
FROM foo AS b
WHERE (SELECT COUNT(*)
FROM foo AS c
WHERE c.groupname = b.groupname AND c.age >= b.age) <= 2
GROUP BY b.groupname)
ORDER BY a.groupname ASC, a.age DESC;
gives me:
person groupname age
---------- ---------- ----------
Shawn 1 42
Jill 1 34
Laura 2 39
Paul 2 36
Joe 2 36
Chuck 3 112
Snuffin's solution seems quite slow to execute when you have many rows, and the Mark Byers/Rick James and Bluefeet solutions don't work in my environment (MySQL 5.6) because ORDER BY is applied after the SELECT expressions are evaluated. Here is a variant of the Mark Byers/Rick James solution that fixes this issue (with an extra nested select):
select person, groupname, age
from
(
select person, groupname, age,
(@rn:=if(@prev = groupname, @rn +1, 1)) as rownumb,
@prev:= groupname
from
(
select person, groupname, age
from persons
order by groupname , age desc, person
) as sortedlist
JOIN (select @prev:=NULL, @rn :=0) as vars
) as groupedlist
where rownumb<=2
order by groupname , age desc, person;
I tried a similar query on a table with 5 million rows and it returned the result in less than 3 seconds.
If the other answers are not fast enough, give this code a try:
SELECT
province, n, city, population
FROM
( SELECT @prev := '', @n := 0 ) init
JOIN
( SELECT @n := if(province != @prev, 1, @n + 1) AS n,
@prev := province,
province, city, population
FROM Canada
ORDER BY
province ASC,
population DESC
) x
WHERE n <= 3
ORDER BY province, n;
Output:
+---------------------------+------+------------------+------------+
| province | n | city | population |
+---------------------------+------+------------------+------------+
| Alberta | 1 | Calgary | 968475 |
| Alberta | 2 | Edmonton | 822319 |
| Alberta | 3 | Red Deer | 73595 |
| British Columbia | 1 | Vancouver | 1837970 |
| British Columbia | 2 | Victoria | 289625 |
| British Columbia | 3 | Abbotsford | 151685 |
| Manitoba | 1 | ...
Check this out:
SELECT
p.Person,
p.`Group`,
p.Age
FROM
people p
INNER JOIN
(
SELECT MAX(Age) AS Age, `Group` FROM people GROUP BY `Group`
UNION
SELECT MAX(p3.Age) AS Age, p3.`Group` FROM people p3 INNER JOIN (SELECT MAX(Age) AS Age, `Group` FROM people GROUP BY `Group`) p4 ON p3.Age < p4.Age AND p3.`Group` = p4.`Group` GROUP BY `Group`
) p2 ON p.Age = p2.Age AND p.`Group` = p2.`Group`
ORDER BY
`Group`,
Age DESC,
Person;
SQL Fiddle: http://sqlfiddle.com/#!2/cdbb6/15
WITH cte_window AS (
SELECT movie_name,director_id,release_date,
ROW_NUMBER() OVER( PARTITION BY director_id ORDER BY release_date DESC) r
FROM movies
)
SELECT * FROM cte_window WHERE r <= <n>;
The query above returns the latest n movies for each director.
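Applied to the table from the question, the same window-function approach might look like the sketch below. It assumes MySQL 8.0 or later (where ROW_NUMBER() is available) and the mytable(person, `group`, age) layout from the question; the names ranked and rn are only illustrative.

```sql
-- Sketch only: 2 oldest people per group, ties on age broken alphabetically by person.
SELECT person, `group`, age
FROM (
    SELECT person, `group`, age,
           ROW_NUMBER() OVER (
               PARTITION BY `group`
               ORDER BY age DESC, person ASC
           ) AS rn
    FROM mytable
) AS ranked
WHERE rn <= 2
ORDER BY `group`, age DESC, person;
```

Because ROW_NUMBER() never assigns the same number twice within a partition, ties do not produce extra rows, which matches the requirement stated in the question. Replace the 2 with whatever n is needed.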
I wanted to share this because I spent a long time searching for an easy way to implement this in a Java program I'm working on. This doesn't quite give the output you're looking for, but it's close. The MySQL function GROUP_CONCAT() worked really well for specifying how many results to return in each group. Using LIMIT or any of the other fancy ways of trying to do this with COUNT didn't work for me. So if you're willing to accept a modified output, it's a great solution. Let's say I have a table called 'student' with student ids, their gender, and gpa, and I want the top 5 gpas for each gender. Then I can write the query like this:
SELECT sex, SUBSTRING_INDEX(GROUP_CONCAT(cast(gpa AS char ) ORDER BY gpa desc), ',',5)
AS subcategories FROM student GROUP BY sex;
Note that the parameter '5' tells it how many entries to concatenate into each row
And the output would look something like
+--------+----------------+
| Male | 4,4,4,4,3.9 |
| Female | 4,4,3.9,3.9,3.8|
+--------+----------------+
You can also change the ORDER BY column to order them a different way. So if I had the student's age, I could replace 'gpa desc' with 'age desc' and it would work. You can also add columns to the GROUP BY clause to get more columns in the output. So this is just a way I found that is pretty flexible and works well if you are OK with just listing results.
In SQL Server, row_number() is a powerful function that gives the result easily, as below:
select Person,[group],age
from
(
select * ,row_number() over(partition by [group] order by age desc) rn
from mytable
) t
where rn <= 2
There is a really nice answer to this problem at MySQL - How To Get Top N Rows per Each Group
Based on the solution in the referenced link, your query would be like:
SELECT Person, `Group`, Age
FROM
(SELECT Person, `Group`, Age,
@group_rank := IF(@current_group = `Group`, @group_rank + 1, 1) AS group_rank,
@current_group := `Group`
FROM `your_table`
ORDER BY `Group`, Age DESC
) ranked
WHERE group_rank <= n
ORDER BY `Group`, Age DESC;
where n is the top n and your_table is the name of your table.
I think the explanation in the reference is really clear. For quick reference I will copy and paste it here:
Currently MySQL does not support ROW_NUMBER() function that can assign
a sequence number within a group, but as a workaround we can use MySQL
session variables.
These variables do not require declaration, and can be used in a query
to do calculations and to store intermediate results.
@current_country := country This code is executed for each row and
stores the value of country column to @current_country variable.
@country_rank := IF(@current_country = country, @country_rank + 1, 1)
In this code, if @current_country is the same we increment rank,
otherwise set it to 1. For the first row @current_country is NULL, so
rank is also set to 1.
For correct ranking, we need to have ORDER BY country, population DESC
SELECT
p1.Person,
p1.`GROUP`,
p1.Age
FROM
person AS p1
WHERE
(
SELECT
COUNT( DISTINCT ( p2.age ) )
FROM
person AS p2
WHERE
p2.`GROUP` = p1.`GROUP`
AND p2.Age >= p1.Age
) < 2
ORDER BY
p1.`GROUP` ASC,
p1.age DESC
reference leetcode

sql statement to find max value for each primary key record [duplicate]

The following is the simplest possible example, though any solution should be able to scale to however many n top results are needed:
Given a table like that below, with person, group, and age columns, how would you get the 2 oldest people in each group? (Ties within groups should not yield more results, but give the first 2 in alphabetical order)
+--------+-------+-----+
| Person | Group | Age |
+--------+-------+-----+
| Bob | 1 | 32 |
| Jill | 1 | 34 |
| Shawn | 1 | 42 |
| Jake | 2 | 29 |
| Paul | 2 | 36 |
| Laura | 2 | 39 |
+--------+-------+-----+
Desired result set:
+--------+-------+-----+
| Shawn | 1 | 42 |
| Jill | 1 | 34 |
| Laura | 2 | 39 |
| Paul | 2 | 36 |
+--------+-------+-----+
NOTE: This question builds on a previous one- Get records with max value for each group of grouped SQL results - for getting a single top row from each group, and which received a great MySQL-specific answer from #Bohemian:
select *
from (select * from mytable order by `Group`, Age desc, Person) x
group by `Group`
Would love to be able to build off this, though I don't see how.
Here is one way to do this, using UNION ALL (See SQL Fiddle with Demo). This works with two groups, if you have more than two groups, then you would need to specify the group number and add queries for each group:
(
select *
from mytable
where `group` = 1
order by age desc
LIMIT 2
)
UNION ALL
(
select *
from mytable
where `group` = 2
order by age desc
LIMIT 2
)
There are a variety of ways to do this, see this article to determine the best route for your situation:
http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/
Edit:
This might work for you too, it generates a row number for each record. Using an example from the link above this will return only those records with a row number of less than or equal to 2:
select person, `group`, age
from
(
select person, `group`, age,
(#num:=if(#group = `group`, #num +1, if(#group := `group`, 1, 1))) row_number
from test t
CROSS JOIN (select #num:=0, #group:=null) c
order by `Group`, Age desc, person
) as x
where x.row_number <= 2;
See Demo
In other databases you can do this using ROW_NUMBER. MySQL doesn't support ROW_NUMBER but you can use variables to emulate it:
SELECT
person,
groupname,
age
FROM
(
SELECT
person,
groupname,
age,
#rn := IF(#prev = groupname, #rn + 1, 1) AS rn,
#prev := groupname
FROM mytable
JOIN (SELECT #prev := NULL, #rn := 0) AS vars
ORDER BY groupname, age DESC, person
) AS T1
WHERE rn <= 2
See it working online: sqlfiddle
Edit I just noticed that bluefeet posted a very similar answer: +1 to him. However this answer has two small advantages:
It it is a single query. The variables are initialized inside the SELECT statement.
It handles ties as described in the question (alphabetical order by name).
So I'll leave it here in case it can help someone.
Try this:
SELECT a.person, a.group, a.age FROM person AS a WHERE
(SELECT COUNT(*) FROM person AS b
WHERE b.group = a.group AND b.age >= a.age) <= 2
ORDER BY a.group ASC, a.age DESC
DEMO
How about using self-joining:
CREATE TABLE mytable (person, groupname, age);
INSERT INTO mytable VALUES('Bob',1,32);
INSERT INTO mytable VALUES('Jill',1,34);
INSERT INTO mytable VALUES('Shawn',1,42);
INSERT INTO mytable VALUES('Jake',2,29);
INSERT INTO mytable VALUES('Paul',2,36);
INSERT INTO mytable VALUES('Laura',2,39);
SELECT a.* FROM mytable AS a
LEFT JOIN mytable AS a2
ON a.groupname = a2.groupname AND a.age <= a2.age
GROUP BY a.person
HAVING COUNT(*) <= 2
ORDER BY a.groupname, a.age DESC;
gives me:
a.person a.groupname a.age
---------- ----------- ----------
Shawn 1 42
Jill 1 34
Laura 2 39
Paul 2 36
I was strongly inspired by the answer from Bill Karwin to Select top 10 records for each category
Also, I'm using SQLite, but this should work on MySQL.
Another thing: in the above, I replaced the group column with a groupname column for convenience.
Edit:
Following-up on the OP's comment regarding missing tie results, I incremented on snuffin's answer to show all the ties. This means that if the last ones are ties, more than 2 rows can be returned, as shown below:
.headers on
.mode column
CREATE TABLE foo (person, groupname, age);
INSERT INTO foo VALUES('Paul',2,36);
INSERT INTO foo VALUES('Laura',2,39);
INSERT INTO foo VALUES('Joe',2,36);
INSERT INTO foo VALUES('Bob',1,32);
INSERT INTO foo VALUES('Jill',1,34);
INSERT INTO foo VALUES('Shawn',1,42);
INSERT INTO foo VALUES('Jake',2,29);
INSERT INTO foo VALUES('James',2,15);
INSERT INTO foo VALUES('Fred',1,12);
INSERT INTO foo VALUES('Chuck',3,112);
SELECT a.person, a.groupname, a.age
FROM foo AS a
WHERE a.age >= (SELECT MIN(b.age)
FROM foo AS b
WHERE (SELECT COUNT(*)
FROM foo AS c
WHERE c.groupname = b.groupname AND c.age >= b.age) <= 2
GROUP BY b.groupname)
ORDER BY a.groupname ASC, a.age DESC;
gives me:
person groupname age
---------- ---------- ----------
Shawn 1 42
Jill 1 34
Laura 2 39
Paul 2 36
Joe 2 36
Chuck 3 112
Snuffin solution seems quite slow to execute when you've got plenty of rows and Mark Byers/Rick James and Bluefeet solutions doesn't work on my environnement (MySQL 5.6) because order by is applied after execution of select, so here is a variant of Marc Byers/Rick James solutions to fix this issue (with an extra imbricated select):
select person, groupname, age
from
(
select person, groupname, age,
(#rn:=if(#prev = groupname, #rn +1, 1)) as rownumb,
#prev:= groupname
from
(
select person, groupname, age
from persons
order by groupname , age desc, person
) as sortedlist
JOIN (select #prev:=NULL, #rn :=0) as vars
) as groupedlist
where rownumb<=2
order by groupname , age desc, person;
I tried similar query on a table having 5 millions rows and it returns result in less than 3 seconds
If the other answers are not fast enough Give this code a try:
SELECT
province, n, city, population
FROM
( SELECT #prev := '', #n := 0 ) init
JOIN
( SELECT #n := if(province != #prev, 1, #n + 1) AS n,
#prev := province,
province, city, population
FROM Canada
ORDER BY
province ASC,
population DESC
) x
WHERE n <= 3
ORDER BY province, n;
Output:
+---------------------------+------+------------------+------------+
| province | n | city | population |
+---------------------------+------+------------------+------------+
| Alberta | 1 | Calgary | 968475 |
| Alberta | 2 | Edmonton | 822319 |
| Alberta | 3 | Red Deer | 73595 |
| British Columbia | 1 | Vancouver | 1837970 |
| British Columbia | 2 | Victoria | 289625 |
| British Columbia | 3 | Abbotsford | 151685 |
| Manitoba | 1 | ...
Check this out:
SELECT
p.Person,
p.`Group`,
p.Age
FROM
people p
INNER JOIN
(
SELECT MAX(Age) AS Age, `Group` FROM people GROUP BY `Group`
UNION
SELECT MAX(p3.Age) AS Age, p3.`Group` FROM people p3 INNER JOIN (SELECT MAX(Age) AS Age, `Group` FROM people GROUP BY `Group`) p4 ON p3.Age < p4.Age AND p3.`Group` = p4.`Group` GROUP BY `Group`
) p2 ON p.Age = p2.Age AND p.`Group` = p2.`Group`
ORDER BY
`Group`,
Age DESC,
Person;
SQL Fiddle: http://sqlfiddle.com/#!2/cdbb6/15
WITH cte_window AS (
SELECT movie_name,director_id,release_date,
ROW_NUMBER() OVER( PARTITION BY director_id ORDER BY release_date DESC) r
FROM movies
)
SELECT * FROM cte_window WHERE r <= <n>;
Above query will returns latest n movies for each directors.
I wanted to share this because I spent a long time searching for an easy way to implement this in a java program I'm working on. This doesn't quite give the output you're looking for but its close. The function in mysql called GROUP_CONCAT() worked really well for specifying how many results to return in each group. Using LIMIT or any of the other fancy ways of trying to do this with COUNT didn't work for me. So if you're willing to accept a modified output, its a great solution. Lets say I have a table called 'student' with student ids, their gender, and gpa. Lets say I want to top 5 gpas for each gender. Then I can write the query like this
SELECT sex, SUBSTRING_INDEX(GROUP_CONCAT(cast(gpa AS char ) ORDER BY gpa desc), ',',5)
AS subcategories FROM student GROUP BY sex;
Note that the parameter '5' tells it how many entries to concatenate into each row
And the output would look something like
+--------+----------------+
| Male | 4,4,4,4,3.9 |
| Female | 4,4,3.9,3.9,3.8|
+--------+----------------+
You can also change the ORDER BY variable and order them a different way. So if I had the student's age I could replace the 'gpa desc' with 'age desc' and it will work! You can also add variables to the group by statement to get more columns in the output. So this is just a way I found that is pretty flexible and works good if you are ok with just listing results.
In SQL Server row_numer() is a powerful function that can get result easily as below
select Person,[group],age
from
(
select * ,row_number() over(partition by [group] order by age desc) rn
from mytable
) t
where rn <= 2
There is a really nice answer to this problem at MySQL - How To Get Top N Rows per Each Group
Based on the solution in the referenced link, your query would be like:
SELECT Person, `Group`, Age
FROM
(SELECT Person, `Group`, Age,
@group_rank := IF(@current_group = `Group`, @group_rank + 1, 1) AS group_rank,
@current_group := `Group`
FROM `your_table`
ORDER BY `Group`, Age DESC
) ranked
WHERE group_rank <= n
ORDER BY `Group`, Age DESC;
where n is the top n and your_table is the name of your table.
I think the explanation in the reference is really clear. For quick reference I will copy and paste it here:
Currently MySQL does not support ROW_NUMBER() function that can assign
a sequence number within a group, but as a workaround we can use MySQL
session variables.
These variables do not require declaration, and can be used in a query
to do calculations and to store intermediate results.
@current_country := country This code is executed for each row and
stores the value of country column to @current_country variable.
@country_rank := IF(@current_country = country, @country_rank + 1, 1)
In this code, if @current_country is the same we increment rank,
otherwise set it to 1. For the first row @current_country is NULL, so
rank is also set to 1.
For correct ranking, we need to have ORDER BY country, population DESC
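The running-rank logic described in the quote can be sketched outside the database. This is only an illustration in Python, not the referenced article's code: the function name `top_n_per_group` and the sample rows are hypothetical. It sorts by group and descending age, resets the rank whenever the group changes, and keeps rows whose rank is at most n.

```python
def top_n_per_group(rows, n):
    """Return the top-n oldest people per group.

    rows: iterable of (person, group, age) tuples (hypothetical sample shape).
    """
    # Equivalent of ORDER BY group, age DESC
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))

    current_group = None  # plays the role of @current_group (NULL on the first row)
    group_rank = 0        # plays the role of @group_rank
    result = []
    for person, group, age in ordered:
        # Same group as the previous row: increment; otherwise restart at 1
        group_rank = group_rank + 1 if group == current_group else 1
        current_group = group
        if group_rank <= n:
            result.append((person, group, age))
    return result


# Hypothetical sample data, not taken from the question
rows = [
    ("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
    ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39),
]
top_two = top_n_per_group(rows, 2)
print(top_two)
```

As in the SQL version, the result depends on the rows being processed in sorted order; without the sort, the rank resets at the wrong places.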
SELECT
p1.Person,
p1.`GROUP`,
p1.Age
FROM
person AS p1
WHERE
(
SELECT
COUNT( DISTINCT ( p2.age ) )
FROM
person AS p2
WHERE
p2.`GROUP` = p1.`GROUP`
AND p2.Age >= p1.Age
) < 2
ORDER BY
p1.`GROUP` ASC,
p1.age DESC
reference leetcode

finding number of repeated users who have interacted with a category

I am new to SQL and am trying to figure out the following.
Imagine the following table:
user_id, category_id
1, 12344
1, 12344
1, 12345
2, 12345
2, 12345
3, 12344
3, 12344
and so on..
I want to find the number of repeat users each category got.
So, in the example above:
12344, 2 (because user_id 1 and 3 are repeated users)
12345, 1 (user_id 2 is repeated user.. 1 is not as that user visited just once)
How do I figure this out in SQL/Hive?
E.g.:
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,user_id INT NOT NULL
,category_id INT NOT NULL
);
INSERT INTO my_table (user_id,category_id) VALUES
(1, 12344),
(1, 12344),
(1, 12345),
(2, 12345),
(2, 12345),
(3, 12344),
(3, 12344);
SELECT category_id
, COUNT(DISTINCT user_id) total -- count each repeat user once, even with 3+ visits
FROM
( SELECT x.*
FROM my_table x
JOIN my_table y
ON y.user_id = x.user_id
AND y.category_id = x.category_id
AND y.id < x.id
) a
GROUP
BY category_id;
+-------------+-------+
| category_id | total |
+-------------+-------+
| 12344 | 2 |
| 12345 | 1 |
+-------------+-------+
It's a little hard to follow what you're looking for, but test this:
select category_id, count(user_id)
from (select category_id, user_id, count(table_primary_id) as 'total'
from tablename
group by category_id, user_id) a
where total > 1
group by category_id
The subquery counts the number of times a user visited a category, and the outside query should count the number of users who visited a category more than once.
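The two-level aggregation described above can be checked against the question's sample data. The following is a minimal sketch using Python's built-in sqlite3 module rather than MySQL or Hive; the table name `visits` and the alias `repeat_users` are assumptions made for the example, and it filters with HAVING inside the subquery instead of an outer WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; columns follow the question
conn.execute("CREATE TABLE visits (user_id INTEGER, category_id INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [(1, 12344), (1, 12344), (1, 12345), (2, 12345),
     (2, 12345), (3, 12344), (3, 12344)],
)

query = """
SELECT category_id, COUNT(*) AS repeat_users
FROM (
    -- one row per (category, user) pair that occurs more than once
    SELECT category_id, user_id
    FROM visits
    GROUP BY category_id, user_id
    HAVING COUNT(*) > 1
) AS per_user
GROUP BY category_id
ORDER BY category_id
"""
repeat_counts = conn.execute(query).fetchall()
print(repeat_counts)  # [(12344, 2), (12345, 1)]
```

Because the inner query yields at most one row per user and category, a user with three or more visits is still counted once.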

SQL: How to make a conditional COUNT and SUM

Let's say I have the following table:
id | letter | date
--------------------------------
1 | A | 2011-01-01
2 | A | 2011-04-01
3 | A | 2011-04-01
4 | B | 2011-01-01
5 | B | 2011-01-01
6 | B | 2011-01-01
I would like to count the rows broken down by letter and date, with each count also including the rows from all previous dates. Every letter should have a row for every date in the table (i.e. letter B doesn't have a 2011-04-01 date, but still appears in the result).
The resulting table would look like this
letter| date | total
--------------------------------
A | 2011-01-01 | 1
A | 2011-04-01 | 3
B | 2011-01-01 | 3
B | 2011-04-01 | 3
How to achieve this in a SQL query?
Thank you for your help!
NOTE
I didn't notice it was mysql, which doesn't support CTE. You may be able to define temporary tables to use this.
This is an interesting problem. You kind of need to join all letters with all dates and then count the preceding rows. If you weren't concerned with having rows for letters that have a count of 0 for the dates, you could probably just do something like this:
SELECT letter, date,
(SELECT COUNT(*)
FROM tbl tbl2
WHERE tbl2.letter = tbl1.letter
AND tbl2.date <= tbl1.date) AS total
FROM tbl tbl1
ORDER BY date, letter
/deleted CTE solution/
Solution without CTE
SELECT tblDates.[date], tblLetters.letter,
(SELECT COUNT(*)
FROM tblData tbl2
WHERE tbl2.letter = tblLetters.letter
AND tbl2.[date] <= tblDates.[date]) AS total
FROM (SELECT DISTINCT [date] FROM tblData) tblDates
CROSS JOIN (SELECT DISTINCT letter FROM tblData) tblLetters
ORDER BY tblDates.[date], tblLetters.letter
The requirement
every letter should have a row to every date of the table
requires a cross join of the distinct dates and letters. Once you do that, it's pretty straightforward:
SELECT letterdate.letter,
letterdate.DATE,
COUNT(yt.id) total
FROM (SELECT letter,
date
FROM (SELECT DISTINCT DATE
FROM yourtable) dates,
(SELECT DISTINCT letter
FROM yourtable) letter) letterdate
LEFT JOIN yourtable yt
ON letterdate.letter = yt.letter
AND yt.DATE <= letterdate.DATE
GROUP BY letterdate.letter,
letterdate.DATE
A slight variation on the previous:
declare @table1 table (id int, letter char, date smalldatetime)
insert into @table1 values (1, 'A', '1/1/2011')
insert into @table1 values (2, 'A', '4/1/2011')
insert into @table1 values (3, 'A', '4/1/2011')
insert into @table1 values (4, 'B', '1/1/2011')
insert into @table1 values (5, 'B', '1/1/2011')
insert into @table1 values (6, 'B', '1/1/2011')
select b.letter, b.date, count(0) AS count_
from (
select distinct letter, a.date from @table1
cross join (select distinct date from @table1 ) a
) b
join @table1 t1
on t1.letter = b.letter
and t1.date <= b.date
group by b.letter, b.date
order by b.letter
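The cross-join approach used in these answers can be verified end to end on the question's data. The sketch below runs in SQLite through Python's sqlite3 module, not in MySQL or SQL Server; the table name `tbl` and the aliases `l`, `d`, and `ld` are assumptions made for the example. It builds every letter/date pair, then left-joins the rows dated on or before each pair's date and counts them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dates are stored as ISO strings, so string comparison matches date order
conn.execute("CREATE TABLE tbl (id INTEGER, letter TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, "A", "2011-01-01"), (2, "A", "2011-04-01"), (3, "A", "2011-04-01"),
     (4, "B", "2011-01-01"), (5, "B", "2011-01-01"), (6, "B", "2011-01-01")],
)

query = """
SELECT ld.letter, ld.date, COUNT(t.id) AS total
FROM (
    -- every distinct letter paired with every distinct date
    SELECT l.letter, d.date
    FROM (SELECT DISTINCT letter FROM tbl) AS l
    CROSS JOIN (SELECT DISTINCT date FROM tbl) AS d
) AS ld
LEFT JOIN tbl AS t
    ON t.letter = ld.letter
   AND t.date <= ld.date      -- rows on or before this date
GROUP BY ld.letter, ld.date
ORDER BY ld.letter, ld.date
"""
running_totals = conn.execute(query).fetchall()
for row in running_totals:
    print(row)
```

Counting `t.id` rather than `*` means a letter/date pair with no matching rows would report 0 instead of 1, which is why the LEFT JOIN is paired with `COUNT(t.id)`.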