Get the max value in a specific group of rows - mysql

I have these two tables:
popular_song
song_name | rate | country_id
------------------------------
Tic Tac | 10 | 1
Titanic | 2 | 1
Love Boat | 8 | 2
Battery | 9 | 2
country
conutry_id | country
--------------------------
1 | United States
2 | Germany
What I'd like to achieve is to get the most popular song in each country, e.g.:
song_name | rate | country
--------------------------
Tic Tac | 10 | United States
Battery | 9 | Germany
I've tried this query:
SELECT MAX(rate), song_name, country
FROM popular_song ps JOIN country cnt
ON ps.country_id = cnt.country_id
GROUP BY country
But this doesn't work. I've tried looking at questions like "Order by before group by" but didn't find an answer.
Which MySQL query could achieve this result?

You can use a self join of the popular_song table against the maximum rate per country:
SELECT ps.*,cnt.country
FROM popular_song ps
JOIN (SELECT MAX(rate) rate, country_id FROM popular_song GROUP BY country_id) t1
ON(ps.country_id = t1.country_id and ps.rate= t1.rate)
JOIN country cnt
ON ps.country_id = cnt.conutry_id
See Demo
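For anyone who wants to try the join-on-maximum idea without a MySQL server, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in here (an assumption of this sketch, not something the question uses), and the explicit column list replaces ps.* to keep the output short.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (conutry_id INTEGER, country TEXT);
CREATE TABLE popular_song (song_name TEXT, rate INTEGER, country_id INTEGER);
INSERT INTO country VALUES (1, 'United States'), (2, 'Germany');
INSERT INTO popular_song VALUES
  ('Tic Tac', 10, 1), ('Titanic', 2, 1), ('Love Boat', 8, 2), ('Battery', 9, 2);
""")

# Derived table t1 holds the best rate per country; joining back on
# (country_id, rate) recovers the full row that carries that rate.
top_songs = conn.execute("""
SELECT ps.song_name, ps.rate, cnt.country
FROM popular_song ps
JOIN (SELECT country_id, MAX(rate) AS rate
      FROM popular_song
      GROUP BY country_id) t1
  ON ps.country_id = t1.country_id AND ps.rate = t1.rate
JOIN country cnt
  ON ps.country_id = cnt.conutry_id
ORDER BY cnt.conutry_id
""").fetchall()

print(top_songs)
```

With the sample rows from the question this prints one row per country: Tic Tac for the United States and Battery for Germany.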

There is a trick that you can use with substring_index() and group_concat():
SELECT MAX(rate),
substring_index(group_concat(song_name order by rate desc separator '|'), '|', 1) as song,
country
FROM popular_song ps JOIN
country cnt
ON ps.country_id = cnt.country_id
GROUP BY country;
EDIT:
If you have big tables and lots of songs per country, I would suggest the not exists approach:
select rate, song_name, country
from popular_song ps join
country cnt
on ps.country_id = cnt.country_id
where not exists (select 1
from popular_song ps2
where ps2.country_id = ps.country_id and ps2.rate > ps.rate
);
Along with an index on popular_song(country_id, rate). I recommended the group_concat() approach because the OP already had a query with a group by, so the trick is the easiest to plug into such a query.

Here is another way I've learned from @Gordon Linoff. Here is that question, which you could learn from too.
SELECT ps.*,cnt.country
FROM
(SELECT popular_song.*,
@rownum:= if (@c = country_id ,@rownum+1,if(@c := country_id, 1, 1) )as row_number
FROM popular_song ,
(SELECT @c := '', @rownum:=0) r
order by country_id, rate desc) as ps
LEFT JOIN country cnt
ON ps.country_id = cnt.conutry_id
WHERE ps.row_number = 1
This is a way of emulating the row_number() over (partition by ...) window function in MySQL.

You can do this with NOT EXISTS like this:
SELECT rate, song_name, cnt.country
FROM popular_song ps JOIN country cnt
ON ps.country_id = cnt.country_id
WHERE NOT EXISTS
(SELECT * FROM popular_song
WHERE ps.country_id = country_id AND rate > ps.rate)
The question does not specify whether two songs may be returned for a country when their ratings are equal. The above query will return several records per country if the top rating is not unique within that country.
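To make the tie behaviour concrete, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption of this sketch). The sample rows are made up so that country 1 has two songs with the same top rate. The second query adds a tie-break on song_name; that tie-break is my own illustration, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE popular_song (song_name TEXT, rate INTEGER, country_id INTEGER);
-- Made-up data: two songs tie for the top rate in country 1.
INSERT INTO popular_song VALUES
  ('Tic Tac', 10, 1), ('Titanic', 10, 1), ('Battery', 9, 2);
""")

# Plain NOT EXISTS: keeps every row that has no strictly better row
# in the same country, so both tied songs survive.
with_ties = conn.execute("""
SELECT ps.song_name, ps.rate, ps.country_id
FROM popular_song ps
WHERE NOT EXISTS (SELECT 1 FROM popular_song ps2
                  WHERE ps2.country_id = ps.country_id
                    AND ps2.rate > ps.rate)
ORDER BY ps.country_id, ps.song_name
""").fetchall()

# Hypothetical tie-break: among equal rates, prefer the alphabetically
# first song_name, so exactly one row per country remains.
one_per_country = conn.execute("""
SELECT ps.song_name, ps.rate, ps.country_id
FROM popular_song ps
WHERE NOT EXISTS (SELECT 1 FROM popular_song ps2
                  WHERE ps2.country_id = ps.country_id
                    AND (ps2.rate > ps.rate
                         OR (ps2.rate = ps.rate
                             AND ps2.song_name < ps.song_name)))
ORDER BY ps.country_id
""").fetchall()

print(with_ties)
print(one_per_country)
```

The first result has three rows (both tied songs plus Battery), the second has two. Which song should win a tie is a business decision; any column that is unique within a country would work as the tie-break.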

Related

How do we select the students who take all classes taught by Davidson, and how do we select the names of all teachers who teach at least 3 classes?

Hi all! I'm taking a database class and have a couple of questions that are confusing me. My tables are below:
Student(S_ID, S_FIRST_NAME, S_LAST_NAME, S_MAJOR)
Course(C_ID, C_NAME, C_INST_NAME, C_ROOM)
takes(S_ID,C_ID)
Q-1. I want to select the S_ID of every student who takes all courses taught by Davidson. I tried this code:
select s.S_ID from student s inner join (select t.S_ID from takes t inner join
course c on t.C_ID = c.C_ID group by t.S_ID having sum(case when c.C_INST_NAME
= 'Davidson' then 1 else 0 end) = 3) t on s.S_ID = t.S_ID;
It works because I know how many classes Davidson teaches (in my case 3). How do we write the query if we don't know how many classes he teaches?
Q-2. I want to select all the instructors who teach at least 3 classes. For this question I did the following:
select distinct C_INST_NAME from course where C_ID >= 3;
+-------------+
| C_INST_NAME |
+-------------+
| Peterson |
| Davidson |
| Jackson |
| Hanney |
+-------------+
But I got all the instructors. Any help would be appreciated, thank you!
Q1.
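-- Keep only each student's Davidson courses, then compare that list with Davidson's full course list.
-- An aggregate such as GROUP_CONCAT cannot be used in WHERE, so the comparison goes in HAVING.
SELECT
t.S_ID, GROUP_CONCAT(t.C_ID ORDER BY t.C_ID ASC)
FROM
takes t
JOIN
Course ON Course.C_ID = t.C_ID
WHERE
Course.C_INST_NAME = "Davidson"
GROUP BY
t.S_ID
HAVING
GROUP_CONCAT(t.C_ID ORDER BY t.C_ID ASC) = (
SELECT
GROUP_CONCAT(Course.C_ID ORDER BY Course.C_ID ASC)
FROM
Course
WHERE
Course.C_INST_NAME = "Davidson"
)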
Q2.
SELECT
COUNT(C_ID) AS "numClassesTaught", Course.C_INST_NAME
FROM
Course
GROUP BY
Course.C_INST_NAME
HAVING
numClassesTaught >= 3
Those should both work.
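If you want to check both questions on a small data set, here is a sketch using Python's sqlite3 module in place of MySQL (an assumption of this sketch). It uses a count-based variant for Q1: count each student's distinct Davidson courses and compare with the number of courses Davidson teaches. All rows are made-up sample data.

```python
import sqlite3

# SQLite in memory stands in for MySQL; all rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Course (C_ID INTEGER, C_NAME TEXT, C_INST_NAME TEXT, C_ROOM TEXT);
CREATE TABLE takes (S_ID INTEGER, C_ID INTEGER);
INSERT INTO Course VALUES
  (1, 'Databases', 'Davidson', 'R1'), (2, 'Networks', 'Davidson', 'R2'),
  (3, 'Compilers', 'Davidson', 'R3'), (4, 'Algebra', 'Peterson', 'R4');
INSERT INTO takes VALUES
  (100, 1), (100, 2), (100, 3), (100, 4),
  (200, 1), (200, 2),
  (300, 1), (300, 2), (300, 3);
""")

# Q1: students whose number of distinct Davidson courses equals the
# number of courses Davidson teaches (no hard-coded 3).
takes_all_davidson = conn.execute("""
SELECT t.S_ID
FROM takes t
JOIN Course c ON c.C_ID = t.C_ID
WHERE c.C_INST_NAME = 'Davidson'
GROUP BY t.S_ID
HAVING COUNT(DISTINCT t.C_ID) =
       (SELECT COUNT(*) FROM Course WHERE C_INST_NAME = 'Davidson')
ORDER BY t.S_ID
""").fetchall()

# Q2: instructors with at least 3 rows in Course.
busy_instructors = conn.execute("""
SELECT C_INST_NAME
FROM Course
GROUP BY C_INST_NAME
HAVING COUNT(*) >= 3
""").fetchall()

print(takes_all_davidson)
print(busy_instructors)
```

Student 200 is excluded because one Davidson course is missing, while student 100 still qualifies even though they also take a Peterson course.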

Select MAX value with restriction to rows

I have 3 tables:
matchdays:
matchday_id | season_id | userid | points | matchday
----------------------------------------------------
1 | 1 | 1 | 33 | 1
2 | 1 | 2 | 45 | 1
etc
players
userid | username
-----------------
1 | user1
2 | user2
etc.
seasons
season_id | title | userid
----------------------------
1 | 2011 | 3
2 | 2012 | 10
3 | 2013 | 5
My query:
SELECT s.title, p.username, SUM(points) FROM matchdays m
INNER JOIN players p ON p.userid = m.userid
INNER JOIN seasons s ON m.userid = s.userid
group by s.season_id
This results in (example!):
title | username | SUM(points)
------------------------------
2011 | user3 | 3744
2012 | user10 | 3457
2013 | user5 | 3888
What it should look like is a table with the winner (max points) of every season. Right now, the title and username are correct, but the sum of the points is way too high. I couldn't figure out which sum is being calculated. Ideally, the sum is the total of every matchday of a season for each user.
Your main issue is that you group by seasons only. Thus your SUM is running on all points over a season, regardless of the player.
The whole approach is wrong anyway. The userid column in the seasons table is your biggest issue, and you seem to know it.
I will explain how to calculate your rankings in the database once and for all and keep them at your disposal at all times, which will save you a lot of headaches and obviously some CPU and loading time as well.
Start by creating a new table "Rankings":
CREATE table rankings (season_id INT, userid INT, points INT, rank INT)
If you have a lot of players, index all columns but points
Then, populate the table for each season:
This is a one-shot operation to run each time a season has ended.
So for the time being, you will have to run it once for each past season.
The key here is to compute the rank of each player for the season, which will come in handy later. Because MySQL (before version 8.0) doesn't have a window function for that, we have to use an old trick: incrementing a counter.
Let me break it down.
This will compute the points of a season, and provide the ranking for that season:
SELECT season_id, userid, SUM(points) as points
FROM matchdays
WHERE season_id = 1
GROUP BY season_id, userid
ORDER BY points DESC
Now we adapt this query to add a rank column :
SELECT
season_id, userid, points,
@curRank := @curRank + 1 AS rank
FROM
(
SELECT season_id, userid, SUM(points) as points
FROM matchdays
WHERE season_id = 1
GROUP BY season_id, userid
) T,
(
SELECT @curRank := 0
) R
ORDER BY T.points DESC
That's it.
Now we can INSERT the results of this computation into our ranking table, to store it once for good :
INSERT INTO rankings
SELECT
season_id, userid, points,
@curRank := @curRank + 1 AS rank
FROM
(
SELECT season_id, userid, SUM(points) as points
FROM matchdays
WHERE season_id = 1
GROUP BY season_id, userid
) T,
(
SELECT @curRank := 0
) R
ORDER BY T.points DESC
Change the season_id = 1 and repeat for each season.
Save this query somewhere, and in the future, run it once each time a season has ended.
Now you have a proper database-computed ranking and a nice ranking table that you can query whenever you want.
You want the winner for each season? It's as simple as this:
SELECT S.title, P.username, R.points
FROM rankings R
INNER JOIN seasons S ON R.season_id=S.season_id
INNER JOIN players P ON R.userid=P.userid
WHERE R.rank = 1
You will discover over time that you can do a lot of different things very simply with your rankings table.
Your join is wrong; try something like:
SELECT s.title, p.username, SUM(m.points) as points FROM matchdays m
JOIN players p ON p.userid = m.userid
JOIN seasons s ON m.season_id = s.season_id
group by s.season_id, p.userid
ORDER by points DESC;
As pointed out, userid doesn't belong in (and is not needed in) the 'seasons' table.
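If you only need the winner of each season and don't want a separate rankings table, the per-user totals can be joined back to their per-season maximum. Below is a sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption of this sketch). The rows are made up, and the seasons table is shown without the userid column, as suggested above.

```python
import sqlite3

# SQLite in memory stands in for MySQL; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (userid INTEGER, username TEXT);
CREATE TABLE seasons (season_id INTEGER, title TEXT);
CREATE TABLE matchdays (season_id INTEGER, userid INTEGER,
                        points INTEGER, matchday INTEGER);
INSERT INTO players VALUES (1, 'user1'), (2, 'user2');
INSERT INTO seasons VALUES (1, '2011'), (2, '2012');
INSERT INTO matchdays VALUES
  (1, 1, 33, 1), (1, 2, 45, 1), (1, 1, 20, 2), (1, 2, 5, 2),
  (2, 1, 10, 1), (2, 2, 30, 1);
""")

# totals: one row per (season, user) with the summed points.
# m: the best total per season; joining back picks the winner(s).
winners = conn.execute("""
WITH totals AS (
  SELECT season_id, userid, SUM(points) AS points
  FROM matchdays
  GROUP BY season_id, userid
)
SELECT s.title, p.username, t.points
FROM totals t
JOIN (SELECT season_id, MAX(points) AS points
      FROM totals
      GROUP BY season_id) m
  ON m.season_id = t.season_id AND m.points = t.points
JOIN seasons s ON s.season_id = t.season_id
JOIN players p ON p.userid = t.userid
ORDER BY s.season_id
""").fetchall()

print(winners)
```

Here user1 wins 2011 with 53 points and user2 wins 2012 with 30. If two users tie for the top total in a season, both rows are returned.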

SQL: get A with max B for every distinct C

In my example, I have a table containing info about different venues, with columns for city, venue_name, and capacity. I need to select the city and venue_name for the venue with the highest capacity within each city. So if I have data:
city | venue | capacity
LA | venue1 | 10000
LA | venue2 | 20000
NY | venue3 | 1000
NY | venue4 | 500
... the query should return:
LA | venue2
NY | venue3
Can anybody give me advice on how to accomplish this query in SQL? I've gotten tangled up in joins and nested queries :P. Thanks!
select t.city, t.venue
from tbl t
join (select city, max(capacity) as max_capacity from tbl group by city) v
on t.city = v.city
and t.capacity = v.max_capacity
One way to do this is with not exists:
select i.*
from info i
where not exists (select 1
from info i2
where i2.city = i.city and i2.capacity > i.capacity);
The common approach is to join the table back to itself using a subquery with max:
select y.city, y.venue_name
from yourtable y
join (select city, max(capacity) maxcapacity
from yourtable
group by city
) t on y.city = t.city and y.capacity = t.maxcapacity
You can use an outer apply to order those values and bring the results back to your main query.
http://www.codeproject.com/Articles/607246/Making-OUTER-and-CROSS-APPLY-work-for-you
Another alternative would be to use the ROW_NUMBER() function. http://msdn.microsoft.com/en-us/library/ms186734.aspx
SELECT
v.city,
Ranked.Venue,
Ranked.Capacity
FROM Venues v WITH (NOLOCK)
Outer Apply
(
SELECT TOP 1
Venue, Capacity
FROM Venues Ranked WITH (NOLOCK)
WHERE v.City = Ranked.City
ORDER BY Capacity DESC
) as Ranked
GROUP BY
v.city,
Ranked.Venue,
Ranked.Capacity

Identifying groups in Group By

I am running a complicated group by statement and I get all my results in their respective groups. But I want to create a custom column with their "group id". Essentially all the items that are grouped together would share an ID.
This is what I get:
partID | Description
-------+------------
11000 | "Oven"
12000 | "Oven"
13000 | "Stove"
13020 | "Stove"
12012 | "Grill"
This is what I want:
partID | Description | GroupID
-------+-------------+----------
11000 | "Oven" | 1
12000 | "Oven" | 1
13000 | "Stove" | 2
13020 | "Stove" | 2
12012 | "Grill" | 3
"GroupID" does not exist as data in any of the tables, it would be a custom generated column (alias) that would be associated to that group's key,id,index, whatever it would be called.
How would I go about doing this?
I think this is the query that returns the five rows:
select partId, Description
from part p;
Here is one way (using standard SQL) to get the groups:
select partId, Description,
(select count(distinct Description)
from part p2
where p2.Description <= p.Description
) as GroupId
from part p;
This is using a correlated subquery. The subquery finds all the description values less than or equal to the current one and counts the distinct values. Note that this gives a different set of values from the ones in the OP. These will be assigned alphabetically rather than by first encounter in the data. If that is important, the OP should add that to the question. Based on the question, the particular ordering did not seem important.
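Here is a small sketch of the correlated-subquery approach on the rows from the question, run through Python's sqlite3 module as a stand-in for MySQL (an assumption of this sketch). It shows the alphabetical assignment mentioned above: Grill gets 1, Oven 2, Stove 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE part (partID INTEGER, Description TEXT);
INSERT INTO part VALUES
  (11000, 'Oven'), (12000, 'Oven'), (13000, 'Stove'),
  (13020, 'Stove'), (12012, 'Grill');
""")

# For each row, count the distinct descriptions that sort at or before
# the row's own description; rows sharing a description share the count.
groups = conn.execute("""
SELECT p.partID, p.Description,
       (SELECT COUNT(DISTINCT p2.Description)
        FROM part p2
        WHERE p2.Description <= p.Description) AS GroupID
FROM part p
ORDER BY p.partID
""").fetchall()

for row in groups:
    print(row)
```

The IDs are stable as long as the set of descriptions does not change; adding a new description that sorts earlier would shift the later IDs.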
Here's one way to get it:
SELECT p.partID,p.Description,b.groupID
FROM (
SELECT Description,@rn := @rn + 1 AS groupID
FROM (
SELECT distinct description
FROM part,(SELECT @rn:= 0) c
) a
) b
INNER JOIN part p ON p.description = b.description;
sqlfiddle demo
This assigns a different groupID to each description, and then joins the original table on that description.
Based on your comments in response to Gordon's answer, I think what you need is a derived table to generate your groupids, like so:
select
t1.description,
@cntr := @cntr + 1 as GroupID
FROM
(select distinct table1.description from table1) t1
cross join
(select @cntr:=0) t2
which will give you:
DESCRIPTION GROUPID
Oven 1
Stove 2
Grill 3
Then you can use that in your original query, joining on description:
select
t1.partid,
t1.description,
t2.GroupID
from
table1 t1
inner join
(
select
t1.description,
@cntr := @cntr + 1 as GroupID
FROM
(select distinct table1.description from table1) t1
cross join
(select @cntr:=0) t2
) t2
on t1.description = t2.description
SQL Fiddle
SELECT partID , Description, @s:=@s+1 GroupID
FROM part, (SELECT @s:= 0) AS s
GROUP BY Description

SQL: Returning the most common value for each person

EDIT: I'm using MySQL, I found another post with the same question, but it's in Postgres; I require MySQL.
Get most common value for each value of another column in SQL
I ask this question after extensive searching of this site and others but have not found a result that works as I intend it to.
I have a table of people (recordid, personid, transactionid) and a transaction table (transactionid, rating). I require a single SQL statement that can return the most common rating each person has.
I currently have this SQL statement that returns the most common rating for a specified person id. It works and perhaps it may help others.
SELECT transactionTable.rating as MostCommonRating
FROM personTable, transactionTable
WHERE personTable.transactionid = transactionTable.transactionid
AND personTable.personid = 1
GROUP BY transactionTable.rating
ORDER BY COUNT(transactionTable.rating) desc
LIMIT 1
However I require a statement that does what the above statement does for each personid in personTable.
My attempt is below; however, it times out my MySQL server.
SELECT personid AS pid,
(SELECT transactionTable.rating as MostCommonRating
FROM personTable, transactionTable
WHERE personTable.transactionid = transactionTable.transactionid
AND personTable.personid = pid
GROUP BY transactionTable.rating
ORDER BY COUNT(transactionTable.rating) desc
LIMIT 1)
FROM persontable
GROUP BY personid
Any help you can give me would be much obliged. Thanks.
PERSONTABLE:
RecordID, PersonID, TransactionID
1, Adam, 1
2, Adam, 2
3, Adam, 3
4, Ben, 1
5, Ben, 3
6, Ben, 4
7, Caitlin, 4
8, Caitlin, 5
9, Caitlin, 1
TRANSACTIONTABLE:
TransactionID, Rating
1 Good
2 Bad
3 Good
4 Average
5 Average
The output of the SQL statement I am searching for would be:
OUTPUT:
PersonID, MostCommonRating
Adam Good
Ben Good
Caitlin Average
Preliminary comment
Please learn to use the explicit JOIN notation, not the old (pre-1992) implicit join notation.
Old style:
SELECT transactionTable.rating as MostCommonRating
FROM personTable, transactionTable
WHERE personTable.transactionid = transactionTable.transactionid
AND personTable.personid = 1
GROUP BY transactionTable.rating
ORDER BY COUNT(transactionTable.rating) desc
LIMIT 1
Preferred style:
SELECT transactionTable.rating AS MostCommonRating
FROM personTable
JOIN transactionTable
ON personTable.transactionid = transactionTable.transactionid
WHERE personTable.personid = 1
GROUP BY transactionTable.rating
ORDER BY COUNT(transactionTable.rating) desc
LIMIT 1
You need an ON condition for each JOIN.
Also, the personID values in the data are strings, not numbers, so you'd need to write
WHERE personTable.personid = "Ben"
for example, to get the query to work on the tables shown.
Main answer
You're seeking to find an aggregate of an aggregate: in this case, the maximum of a count. So, any general solution is going to involve both MAX and COUNT. You can't apply MAX directly to COUNT, but you can apply MAX to a column from a sub-query where the column happens to be a COUNT.
Build the query up using Test-Driven Query Design — TDQD.
Select person and transaction rating
SELECT p.PersonID, t.Rating, t.TransactionID
FROM PersonTable AS p
JOIN TransactionTable AS t
ON p.TransactionID = t.TransactionID
Select person, rating, and number of occurrences of rating
SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
FROM PersonTable AS p
JOIN TransactionTable AS t
ON p.TransactionID = t.TransactionID
GROUP BY p.PersonID, t.Rating
This result will become a sub-query.
Find the maximum number of times the person gets any rating
SELECT s.PersonID, MAX(s.RatingCount)
FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
FROM PersonTable AS p
JOIN TransactionTable AS t
ON p.TransactionID = t.TransactionID
GROUP BY p.PersonID, t.Rating
) AS s
GROUP BY s.PersonID
Now we know which is the maximum count for each person.
Required result
To get the result, we need to select the rows from the sub-query which have the maximum count. Note that if someone has 2 Good and 2 Bad ratings (and 2 is the maximum number of ratings of the same type for that person), then two records will be shown for that person.
SELECT s.PersonID, s.Rating
FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
FROM PersonTable AS p
JOIN TransactionTable AS t
ON p.TransactionID = t.TransactionID
GROUP BY p.PersonID, t.Rating
) AS s
JOIN (SELECT s.PersonID, MAX(s.RatingCount) AS MaxRatingCount
FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
FROM PersonTable AS p
JOIN TransactionTable AS t
ON p.TransactionID = t.TransactionID
GROUP BY p.PersonID, t.Rating
) AS s
GROUP BY s.PersonID
) AS m
ON s.PersonID = m.PersonID AND s.RatingCount = m.MaxRatingCount
If you want the actual rating count too, that's easily selected.
That's a fairly complex piece of SQL. I would hate to try writing that from scratch. Indeed, I probably wouldn't bother; I'd develop it step-by-step, more or less as shown. But because we've debugged the sub-queries before we use them in bigger expressions, we can be confident of the answer.
WITH clause
Note that Standard SQL provides a WITH clause that prefixes a SELECT statement, naming a sub-query. (It can also be used for recursive queries, but we aren't needing that here.)
WITH RatingList AS
(SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
FROM PersonTable AS p
JOIN TransactionTable AS t
ON p.TransactionID = t.TransactionID
GROUP BY p.PersonID, t.Rating
)
SELECT s.PersonID, s.Rating
FROM RatingList AS s
JOIN (SELECT s.PersonID, MAX(s.RatingCount) AS MaxRatingCount
FROM RatingList AS s
GROUP BY s.PersonID
) AS m
ON s.PersonID = m.PersonID AND s.RatingCount = m.MaxRatingCount
This is simpler to write. Unfortunately, MySQL did not support the WITH clause before version 8.0.
The SQL above has now been tested against IBM Informix Dynamic Server 11.70.FC2 running on Mac OS X 10.7.4. That test exposed the problem diagnosed in the preliminary comment. The SQL for the main answer worked correctly without needing to be changed.
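For readers who want to reproduce the result quickly, here is a sketch of the WITH-clause version run against the sample data from the question, using Python's sqlite3 module as a stand-in database (an assumption of this sketch; the answer itself was tested on Informix). The inner alias r is mine, used only to avoid reusing s.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PersonTable (RecordID INTEGER, PersonID TEXT, TransactionID INTEGER);
CREATE TABLE TransactionTable (TransactionID INTEGER, Rating TEXT);
INSERT INTO PersonTable VALUES
  (1, 'Adam', 1), (2, 'Adam', 2), (3, 'Adam', 3),
  (4, 'Ben', 1), (5, 'Ben', 3), (6, 'Ben', 4),
  (7, 'Caitlin', 4), (8, 'Caitlin', 5), (9, 'Caitlin', 1);
INSERT INTO TransactionTable VALUES
  (1, 'Good'), (2, 'Bad'), (3, 'Good'), (4, 'Average'), (5, 'Average');
""")

# RatingList: how often each person received each rating.
# m: the highest such count per person; the join keeps the matching rating(s).
most_common = conn.execute("""
WITH RatingList AS (
  SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
  FROM PersonTable AS p
  JOIN TransactionTable AS t
    ON p.TransactionID = t.TransactionID
  GROUP BY p.PersonID, t.Rating
)
SELECT s.PersonID, s.Rating
FROM RatingList AS s
JOIN (SELECT r.PersonID, MAX(r.RatingCount) AS MaxRatingCount
      FROM RatingList AS r
      GROUP BY r.PersonID) AS m
  ON s.PersonID = m.PersonID AND s.RatingCount = m.MaxRatingCount
ORDER BY s.PersonID
""").fetchall()

print(most_common)
```

On this data each person has a single most common rating, so three rows come back. With a tie (say 2 Good and 2 Bad), that person would appear twice, as noted in the answer.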
Here's a somewhat hacky abuse of the fact that the max aggregate function in MySQL does lexical sorting on varchars (as well as the expected numerical sorting on integers/floats):
SELECT
PersonID,
substring(max(concat(lpad(c, 20, '0'), Rating)), 21) AS MostFrequentRating
FROM (
SELECT PersonID, Rating, count(*) c
FROM PERSONTABLE INNER JOIN TRANSACTIONTABLE USING(TransactionID)
GROUP BY PersonID, Rating
) AS grouped_ratings
GROUP BY PersonID;
Which gives the desired:
+----------+--------------------+
| PersonID | MostFrequentRating |
+----------+--------------------+
| Adam | Good |
| Ben | Good |
| Caitlin | Average |
+----------+--------------------+
(note, if there are multiple modes per person, it will pick the one with the highest alphabetic entry, so — pretty much randomly — Good over Bad and Bad over Average)
You should be able to see what the max is operating over by examining the following:
SELECT PersonID, Rating, count(*) c, concat(lpad(count(*), 20, '0'), Rating) as LexicalMaxMe
FROM PERSONTABLE INNER JOIN TRANSACTIONTABLE USING(TransactionID)
GROUP BY PersonID, Rating
ORDER BY PersonID, c DESC;
Which outputs:
+----------+---------+---+-----------------------------+
| PersonID | Rating | c | LexicalMaxMe |
+----------+---------+---+-----------------------------+
| Adam | Good | 2 | 00000000000000000002Good |
| Adam | Bad | 1 | 00000000000000000001Bad |
| Ben | Good | 2 | 00000000000000000002Good |
| Ben | Average | 1 | 00000000000000000001Average |
| Caitlin | Average | 2 | 00000000000000000002Average |
| Caitlin | Good | 1 | 00000000000000000001Good |
+----------+---------+---+-----------------------------+
For anyone using Microsoft SQL Server: You have the possibility to create a custom aggregate function to get the most common value. Example 2 of this blog post by Ahmed Tarek Hasan describes how to do it:
http://developmentsimplyput.blogspot.nl/2013/03/creating-sql-custom-user-defined.html