Getting Distinct Max Values from Multiple Columns - mysql

I'm working on a sports database, and I want to write a query that will return the name and the statistical value for certain categories. For example, goal leader, assist leader, points leader, +/- leader, penalty minutes leader, etc. I am using a table called NJDSkaters which contains player names and stats from a specific team. Here is the query code:
SELECT CONCAT(PlayerName,' - ',Goals) AS GoalLeader,
CONCAT(PlayerName,' - ',Assists) AS AssistLeader,
CONCAT(PlayerName,' - ',Points) AS PointsLeader
FROM NJDSkaters
WHERE Goals = (SELECT DISTINCT MAX(Goals) FROM NJDSkaters)
OR Assists = (SELECT DISTINCT MAX(Assists) FROM NJDSkaters)
OR Points = (SELECT DISTINCT MAX(Points) FROM NJDSkaters);
Here is a snippet from my skater register table which will show the players who should be returned by this query:
As you can see, my desired return query should have 'Ilya Kovalchuk - 37' returned as GoalLeader, 'Patrik Elias - 52' as AssistLeader, and 'Ilya Kovalchuk - 83' as PointsLeader. Running the query does provide these results, but there is extra information included that I do not want, as you can see here:
My question is, how do I get rid of the excess information? I only want the leaders in each category, and I don't want to see the #2 player listed, even if that player is #1 in some other category. Essentially, I want only 1 row in this result. Before, I had code that would return all players with the leaders at the top, so this code is a step closer to my desired result, but now I'm stuck. Searching for an answer to this problem has been challenging, as it is difficult to phrase the question in general terms.

You need to pivot your data. I would use something like this:
SELECT
MAX(CASE WHEN NJDSkaters.Goals=mx.goals
THEN CONCAT(PlayerName,' - ', NJDSkaters.Goals) END) GoalLeader,
MAX(CASE WHEN NJDSkaters.Assists=mx.assists
THEN CONCAT(PlayerName,' - ', NJDSkaters.Assists) END) AssistsLeader,
MAX(CASE WHEN NJDSkaters.Points=mx.points
THEN CONCAT(PlayerName,' - ', NJDSkaters.Points) END) PointsLeader
FROM
NJDSkaters INNER JOIN (
SELECT MAX(Goals) goals, MAX(Assists) assists, MAX(Points) points
FROM NJDSkaters) mx
ON NJDSkaters.Goals=mx.goals
OR NJDSkaters.Assists=mx.assists
OR NJDSkaters.Points=mx.points
Please see fiddle here.
You might also want to use GROUP_CONCAT instead of MAX in case more than one player shares the same maximum value:
SELECT
CONCAT(GROUP_CONCAT(CASE WHEN NJDSkaters.Goals=mx.goals
THEN PlayerName END), ' - ', mx.goals) GoalLeader,
CONCAT(GROUP_CONCAT(CASE WHEN NJDSkaters.Assists=mx.assists
THEN PlayerName END), ' - ', mx.assists) AssistsLeader,
CONCAT(GROUP_CONCAT(CASE WHEN NJDSkaters.Points=mx.points
THEN PlayerName END), ' - ', mx.points) PointsLeader
FROM
NJDSkaters INNER JOIN (
SELECT MAX(Goals) goals, MAX(Assists) assists, MAX(Points) points
FROM NJDSkaters) mx
ON NJDSkaters.Goals=mx.goals
OR NJDSkaters.Assists=mx.assists
OR NJDSkaters.Points=mx.points
A little explanation:
The subquery mx returns the maximum number of goals, the maximum number of assists, and the maximum points.
I'm joining the table NJDSkaters with this subquery to return all of the rows that have the maximum number of goals OR the maximum number of assists OR the maximum points.
CASE WHEN NJDSkaters.Goals=mx.goals THEN PlayerName END returns the PlayerName if that player has the maximum number of goals; otherwise it returns NULL. The same goes for assists and points.
Using GROUP_CONCAT, I'm concatenating all of the player names returned by the CASE WHEN. GROUP_CONCAT skips NULL values, so it only concatenates players that have the maximum value for their category.
Using CONCAT, I'm concatenating the string returned by the GROUP_CONCAT above with the maximum value for each category.
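To make the pivot concrete, here is a minimal sketch that runs the MAX(CASE ...) version against an in-memory SQLite database from Python. It rests on several assumptions: SQLite stands in for MySQL (so || replaces CONCAT), the table alias s is added for brevity, and the sample rows are illustrative; only Kovalchuk's 37 goals and 83 points and Elias's 52 assists come from the question.

```python
import sqlite3

# Illustrative rows (assumed); the third player exists only to be filtered out.
rows = [
    ("Ilya Kovalchuk", 37, 46, 83),
    ("Patrik Elias", 26, 52, 78),
    ("Zach Parise", 31, 38, 69),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE NJDSkaters (PlayerName TEXT, Goals INT, Assists INT, Points INT)"
)
conn.executemany("INSERT INTO NJDSkaters VALUES (?, ?, ?, ?)", rows)

# Each CASE yields a label only on the row holding that category's maximum;
# MAX() ignores the NULLs and collapses the joined rows into a single row.
query = """
SELECT
  MAX(CASE WHEN s.Goals = mx.goals
      THEN s.PlayerName || ' - ' || s.Goals END) AS GoalLeader,
  MAX(CASE WHEN s.Assists = mx.assists
      THEN s.PlayerName || ' - ' || s.Assists END) AS AssistsLeader,
  MAX(CASE WHEN s.Points = mx.points
      THEN s.PlayerName || ' - ' || s.Points END) AS PointsLeader
FROM NJDSkaters s
INNER JOIN (SELECT MAX(Goals) AS goals, MAX(Assists) AS assists, MAX(Points) AS points
            FROM NJDSkaters) mx
   ON s.Goals = mx.goals
   OR s.Assists = mx.assists
   OR s.Points = mx.points
"""
leaders = conn.execute(query).fetchone()
print(leaders)
```

With these rows the query returns one row with the three leaders. If two players tied for a maximum, MAX would keep only the alphabetically last label, which is the case the GROUP_CONCAT variant is meant to handle.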

Why not limit the result by using LIMIT 1?
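A single LIMIT 1 on the original query would keep one whole row, so the assist column would still show the goal leader's assists. One possible reading of the suggestion is a LIMIT 1 inside a separate scalar subquery per category. The sketch below tries that reading in SQLite from Python; the sample rows are illustrative assumptions, and || stands in for MySQL's CONCAT.

```python
import sqlite3

# Illustrative rows (assumed), matching the leaders named in the question.
rows = [
    ("Ilya Kovalchuk", 37, 46, 83),
    ("Patrik Elias", 26, 52, 78),
    ("Zach Parise", 31, 38, 69),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE NJDSkaters (PlayerName TEXT, Goals INT, Assists INT, Points INT)"
)
conn.executemany("INSERT INTO NJDSkaters VALUES (?, ?, ?, ?)", rows)

# One scalar subquery per category: sort by that stat, keep the top row.
query = """
SELECT
  (SELECT PlayerName || ' - ' || Goals FROM NJDSkaters
    ORDER BY Goals DESC LIMIT 1) AS GoalLeader,
  (SELECT PlayerName || ' - ' || Assists FROM NJDSkaters
    ORDER BY Assists DESC LIMIT 1) AS AssistLeader,
  (SELECT PlayerName || ' - ' || Points FROM NJDSkaters
    ORDER BY Points DESC LIMIT 1) AS PointsLeader
"""
limit_leaders = conn.execute(query).fetchone()
print(limit_leaders)
```

This scans the table once per category, and when two players tie for a maximum it picks one of them arbitrarily, so it is a trade-off rather than a drop-in replacement for the pivot.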

Related

MySQL alternative to subquery/join

I am looking for an efficient alternative to subqueries/joins for this query. Let's say I have a table that stores information about companies with the following columns:
name: the name of the company
state: the state the company is located in
revenue: the annual revenue of the company
employees: how many employees this company has
active_business: whether or not the company is in business (1 = yes, 0 = no)
Let's say that from this table, I want to find out how many companies in each state meet the requirement for some minimum amount of revenue, and also how many companies meet the requirement for some minimum number of employees. This can be expressed as the following subquery (can also be written as a join):
SELECT state,
(
SELECT count(*)
FROM records AS a
WHERE a.state = records.state
AND a.revenue > 1000000
) AS companies_with_min_revenue,
(
SELECT count(*)
FROM records AS a
WHERE a.state = records.state
AND a.employees > 10
) AS companies_with_min_employees
FROM records
WHERE active_business = 1
GROUP BY state
My question is this: can I do this without the subqueries or joins? Since the query is already iterating over each row (there are no indexes), is there some way I can add a condition so that if the row meets the minimum revenue requirement and is in the same state, it increments some sort of counter for the query (similar to map/reduce)?
I think CASE and SUM will solve it:
SELECT state
, SUM(CASE WHEN R.revenue > 1000000 THEN 1 ELSE 0 END) AS companies_with_min_revenue
, SUM(CASE WHEN R.employees > 10 THEN 1 ELSE 0 END) AS companies_with_min_employees
FROM records R
WHERE R.active_business = 1
GROUP BY R.state
As you can see, we will have a value of 1 per record with a revenue of greater than 1000000 (else 0), then we'll take the sum. The same goes with the other column.
Thanks to this StackOverflow question. You'll find this when you search "sql conditional count" in google.
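As a quick check of the SUM(CASE ...) pattern, the sketch below runs it in SQLite from Python. The company rows are made up for illustration; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records "
    "(name TEXT, state TEXT, revenue INT, employees INT, active_business INT)"
)
# Made-up sample data (assumed).
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?, ?)",
    [
        ("Acme", "NY", 2000000, 5, 1),
        ("Bolt", "NY", 500000, 50, 1),
        ("Cogs", "NY", 3000000, 20, 0),  # inactive: removed by the WHERE clause
        ("Echo", "NY", 4000000, 3, 1),
        ("Dyna", "CA", 1500000, 12, 1),
    ],
)

# Each CASE contributes 1 or 0 per row; SUM turns that into a per-state count.
query = """
SELECT R.state,
       SUM(CASE WHEN R.revenue > 1000000 THEN 1 ELSE 0 END) AS companies_with_min_revenue,
       SUM(CASE WHEN R.employees > 10 THEN 1 ELSE 0 END) AS companies_with_min_employees
FROM records R
WHERE R.active_business = 1
GROUP BY R.state
ORDER BY R.state
"""
counts = conn.execute(query).fetchall()
print(counts)
```

One difference worth checking against your requirements: the correlated subqueries in the question do not filter on active_business, so they would also count inactive companies such as "Cogs" here, whereas this single-pass version counts only active ones.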

Mysql SUM CASE with unique IDs only

Easiest explained through an example.
A father has children who win races.
How many of a father's offspring have won a race, and how many races in total have a father's offspring won? (winners and wins)
I can easily figure out the total number of wins, but sometimes a child wins more than one race, so to figure out winners I need to count each child once if it has won, not once for every win.
In the below extract from a query I cannot use Distinct, so this doesn't work
SUM(CASE WHEN r.finish = '1' AND DISTINCT h.runnersid THEN 1 ELSE 0 END ) AS winners,
This also won't work
SUM(SELECT DISTINCT r.runnersid FROM runs r WHERE r.finish='1') AS winners
This works when I need to find the total amount of wins.
SUM(CASE WHEN r.finish = '1' THEN 1 ELSE 0 END ) AS wins,
Here is a sqlfiddle http://sqlfiddle.com/#!2/e9a81/1
Let's take this step by step.
You have two pieces of information you are looking for: who has won a race, and how many races they have won.
Taking the first one, you can select a distinct runnersid where they have a first place finish:
SELECT DISTINCT runnersid
FROM runs
WHERE finish = 1;
For the second one, you can select every runnersid where they have a first place finish, count the number of rows returned, and group by runnersid to get the total wins for each:
SELECT runnersid, COUNT(*) AS numWins
FROM runs
WHERE finish = 1
GROUP BY runnersid;
The second one actually has everything you want. You don't need to do anything with that first query, but I used it to help demonstrate the thought process I take when trying to accomplish a task like this.
Here is the SQL Fiddle example.
EDIT
As you've seen, you don't really need the SUM here. Because finish represents a place in the race, you don't want to SUM that value, but you want to COUNT the number of wins.
EDIT2
An additional edit based on OP's requirements. The above does not match what OP needs, but I have left it in as a reference for future readers. What OP really needs, as I understand it now, is the number of children each father has that have won a race. I will again explain my thought process step by step.
First I wrote a simple query that pulls all of the winning father-son pairs. I was able to use GROUP BY to get the distinct winning pairs:
SELECT father, name
FROM runs
WHERE finish = 1
GROUP BY father, name;
Once I had done that, I used it as a subquery with the COUNT(*) function to get the number of winners for each father (this means I have to group by father):
SELECT father, COUNT(*) AS numWinningChildren
FROM(SELECT father, name
FROM runs
WHERE finish = 1
GROUP BY father, name) t
GROUP BY father;
If you just need the fathers with winning children, you are done. If you want to see all fathers, I would write one query to select all fathers, join it with our result set above, and replace any values where numWinningChildren is null, with 0.
I'll leave that part to you to challenge yourself a bit. Also because SQL Fiddle is down at the moment and I can't test what I was thinking, but I was able to test those above with success.
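For readers who want both numbers at once, one alternative is to count distinct winners and total wins in a single grouped pass, which also keeps fathers with no winning children. The sketch below tries this in SQLite from Python. The column names father, name, and finish follow the queries above, but the exact schema and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (name TEXT, father TEXT, finish INT)")
# Made-up results (assumed): amy wins twice, liam's only child never wins.
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?)",
    [
        ("amy", "jack", 1),
        ("amy", "jack", 1),
        ("bob", "jack", 2),
        ("cat", "jack", 1),
        ("dan", "karl", 3),
        ("eve", "karl", 1),
        ("fay", "liam", 4),
    ],
)

# COUNT(DISTINCT ...) ignores the NULLs produced for non-winning rows,
# so each winning child is counted once however many races it won.
query = """
SELECT father,
       COUNT(DISTINCT CASE WHEN finish = 1 THEN name END) AS winners,
       SUM(CASE WHEN finish = 1 THEN 1 ELSE 0 END) AS wins
FROM runs
GROUP BY father
ORDER BY father
"""
per_father = conn.execute(query).fetchall()
print(per_father)
```

Because the filter lives inside the aggregates rather than in a WHERE clause, fathers without any first-place finish still appear, with 0 in both columns.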
I think you want the father name along with the count of the wins by his sons.
select father, count(distinct(id)) wins
from runs where father = 'jack' and finish = 1
group by father
sqlfiddle
I am not sure if this is what you are looking for
Select user_id, sum(case when finish='1' then 1 else 0 end) as total
From table
Group by user_id

MySQL ORDER BY Column = value AND distinct?

I'm getting grey hair by now...
I have a table like this.
ID - Place - Person
1 - London - Anna
2 - Stockholm - Johan
3 - Gothenburg - Anna
4 - London - Nils
And I want to get the result where all the different persons are included, but I want to choose which Place to order by.
For example. I want to get a list where they are ordered by LONDON and the rest will follow, but distinct on PERSON.
Output like this:
ID - Place - Person
1 - London - Anna
4 - London - Nils
2 - Stockholm - Johan
Tried this:
SELECT ID, Person
FROM users
ORDER BY FIELD(Place,'London'), Person ASC
But it gives me:
ID - Place - Person
1 - London - Anna
4 - London - Nils
3 - Gothenburg - Anna
2 - Stockholm - Johan
And I really don't want Anna, or any person, to be in the result more than once.
This is one way to get the specified output, but this uses MySQL specific behavior which is not guaranteed:
SELECT q.ID
, q.Place
, q.Person
FROM ( SELECT IF(p.Person<=>@prev_person,0,1) AS r
, @prev_person := p.Person AS person
, p.Place
, p.ID
FROM users p
CROSS
JOIN (SELECT @prev_person := NULL) i
ORDER BY p.Person, !(p.Place<=>'London'), p.ID
) q
WHERE q.r = 1
ORDER BY !(q.Place<=>'London'), q.Person
This query uses an inline view to return all the rows in a particular order, by Person, so that all of the 'Anna' rows are together, followed by all the 'Johan' rows, etc. The set of rows for each person is ordered by, Place='London' first, then by ID.
The "trick" is to use a MySQL user variable to compare the values from the current row with values from the previous row. In this example, we're checking if the 'Person' on the current row is the same as the 'Person' on the previous row. Based on that check, we return a 1 if this is the "first" row we're processing for a person, otherwise we return a 0.
The outermost query processes the rows from the inline view, and excludes all but the "first" row for each Person (the 0 or 1 we returned from the inline view.)
(This isn't the only way to get the resultset. But this is one way of emulating analytic functions which are available in other RDBMS.)
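For comparison, in databases that support window functions (MySQL itself added them in version 8.0), we could use SQL something like this. The row number has to be computed in an inline view, because a column alias such as rn cannot be referenced in the WHERE clause of the same query level:
SELECT s.ID
, s.Place
, s.Person
FROM ( SELECT ROW_NUMBER() OVER (PARTITION BY t.Person ORDER BY
CASE WHEN t.Place='London' THEN 0 ELSE 1 END, t.ID) AS rn
, t.ID
, t.Place
, t.Person
FROM users t
) s
WHERE s.rn=1
ORDER BY CASE WHEN s.Place='London' THEN 0 ELSE 1 END, s.Person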
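A window-function query of this shape can be tried against the four rows from the question. The sketch below uses SQLite from Python and assumes SQLite 3.25 or later (the first release with window functions); the ROW_NUMBER call sits in a derived table so that rn can be filtered in the outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (ID INT, Place TEXT, Person TEXT)")
# The four rows shown in the question.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "London", "Anna"),
        (2, "Stockholm", "Johan"),
        (3, "Gothenburg", "Anna"),
        (4, "London", "Nils"),
    ],
)

# Number each person's rows with London first, keep row 1 per person,
# then sort the survivors so that London rows lead the result.
query = """
SELECT s.ID, s.Place, s.Person
FROM (SELECT t.ID, t.Place, t.Person,
             ROW_NUMBER() OVER (
                 PARTITION BY t.Person
                 ORDER BY CASE WHEN t.Place = 'London' THEN 0 ELSE 1 END, t.ID
             ) AS rn
      FROM users t) s
WHERE s.rn = 1
ORDER BY CASE WHEN s.Place = 'London' THEN 0 ELSE 1 END, s.Person
"""
deduped = conn.execute(query).fetchall()
print(deduped)
```

Anna's Gothenburg row gets rn = 2 and is dropped, which gives the three-row output the question asks for.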
Followup
At the beginning of the answer, I referred to MySQL behavior that was not guaranteed. I was referring to the usage of MySQL User-Defined variables within a SQL statement.
Excerpts from MySQL 5.5 Reference Manual http://dev.mysql.com/doc/refman/5.5/en/user-variables.html
"As a general rule, other than in SET statements, you should never assign a value to a user variable and read the value within the same statement."
"For other statements, such as SELECT, you might get the results you expect, but this is not guaranteed."
"the order of evaluation for expressions involving user variables is undefined."
Try this:
SELECT ID, Place, Person
FROM users
GROUP BY Person
ORDER BY FIELD(Place,'London') DESC, Person ASC;
You want to use group by instead of distinct:
SELECT ID, Person
FROM users
GROUP BY ID, Person
ORDER BY MAX(FIELD(Place, 'London')), Person ASC;
The GROUP BY does the same thing as SELECT DISTINCT. But, you are allowed to mention other fields in clauses such as HAVING and ORDER BY.

Multiple LEFT JOINs to self with criteria to produce distribution

Although several questions come close to what I want (and as I write this stackoverflow has suggested several more, none of which quite capture my problem), I just don't seem to be able to find my way out of the SQL thicket.
I have a single table (let's call it the user_classification_fct) that has three fields: user, week, and class (e.g. user #1 in week #1 had a class of 'Regular User', while user #2 in week #1 has a class of 'Infrequent User'). (As an aside, I have implemented classes as INTs, but wanted to work with something legible in the form of VARCHAR while I sorted out the SQL.)
What I want to do is produce a summary report of how user behaviour is changing in aggregate along the lines of:
There were 50 users who were regular users in both week 1 and week 2 and ...
There were 10 users who were regular users in week 1, but fell to infrequent users in week 2
There were 5 users who went from infrequent in week 1 to regular in week 2
... and so on ...
What makes this slightly more tricky is that user #5000 might only have started using the service in week 2 and so have no record in the table for week 1. In that case, I'd want to see a NULL for week 1 and a 'Regular User' (or whatever is appropriate) for week 2. The size of the table is not strictly relevant, but with 5 weeks' worth of data I'm looking at 42 million rows, so I do not want to insert 4 'fake' rows of 'Non-User' for someone who only starts using the service in week 5 or something.
To me this seems rather obviously like a case for using a LEFT or RIGHT JOIN in MySQL because the NULL should come through on the 'missing' record.
I have tried using both WHERE and AND conditions on the LEFT JOINs and am just not getting the 'right' answers (i.e. I either get no NULL values at all in the case of trailing WHERE conditions, or my counts are far, far too high for the number of distinct users (which is ca. 10 million) in the case of the AND constraints used below). Here was my last attempt to get this working:
SELECT
ucf1.class_nm AS 'Class in 2012/15',
ucf2.class_nm AS 'Class in 2012/16',
ucf3.class_nm AS 'Class in 2012/17',
ucf4.class_nm AS 'Class in 2012/18',
ucf5.class_nm AS 'Class in 2012/19',
count(*) AS 'Count'
FROM
user_classification_fct ucf5
LEFT JOIN user_classification_fct ucf4
ON ucf5.user_id=ucf4.user_id
AND ucf5.week_key=201219 AND ucf4.week_key=201218
LEFT JOIN user_classification_fct ucf3
ON ucf4.user_id=ucf3.user_id
AND ucf4.week_key=201218 AND ucf3.week_key=201217
LEFT JOIN user_classification_fct ucf2
ON ucf3.user_id=ucf2.user_id
AND ucf3.week_key=201217 AND ucf2.week_key=201216
LEFT JOIN user_classification_fct ucf1
ON ucf2.user_id=ucf1.user_id
AND ucf2.week_key=201216 AND ucf1.week_key=201215
GROUP BY 1,2,3,4,5;
In looking at the various other questions on stackoverflow.com, it may well be that I need to perform the queries one-at-a-time and UNION the result sets together or use parentheses to chain them one-to-another, but those approaches are not ones that I'm familiar with (yet) and I can't even get a single LEFT JOIN (i.e. week 5 to week 1, dropping all the other weeks of data) to return something useful.
Any tips would be much, much appreciated and I would really appreciate suggestions that work in MySQL as switching database products is not an option.
You can do this with a group by. I would start by summarizing all the possible combinations for the five weeks as:
select c_201215, c_201216, c_201217, c_201218, c_201219,
count(*) as cnt
from (select user_id,
max(case when week_key=201215 then class_nm end) as c_201215,
max(case when week_key=201216 then class_nm end) as c_201216,
max(case when week_key=201217 then class_nm end) as c_201217,
max(case when week_key=201218 then class_nm end) as c_201218,
max(case when week_key=201219 then class_nm end) as c_201219
from user_classification_fct ucf
group by user_id
) t
group by c_201215, c_201216, c_201217, c_201218, c_201219
This may solve your problem. If you have 5 classes (including NULL), then this will return at most 5^5 or 3,125 rows.
This fits into Excel, so you can do the final processing there. Alternatively, you can still use the database.
If you want to extract pairs of weeks, then I would suggest putting the above into a temporary table, say "t". And doing a series of extracts with unions:
select *
from ((select '201215' as weekstart, c_201215, c_201216, sum(cnt) as cnt
from t
group by c_201215, c_201216
) union all
(select '201216', c_201216, c_201217, sum(cnt) as cnt
from t
group by c_201216, c_201217
) union all
(select '201217', c_201217, c_201218, sum(cnt) as cnt
from t
group by c_201217, c_201218
) union all
(select '201218', c_201218, c_201219, sum(cnt) as cnt
from t
group by c_201218, c_201219
)
) tg
order by 1, cnt desc
I suggest putting it in a temporary table because you don't want to mess around with common-subquery optimizations on such a large table. You'll get to your final answer by summarizing first, and then bringing the data together.
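The summarize-first idea can be seen on a tiny data set. The sketch below, run in SQLite from Python, pivots each user's weekly class into columns with MAX(CASE ...) and then counts the combinations. It is cut down to two weeks, and the rows and short class names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_classification_fct (user_id INT, week_key INT, class_nm TEXT)"
)
# Made-up rows (assumed); user 4 has no record for week 201215.
conn.executemany(
    "INSERT INTO user_classification_fct VALUES (?, ?, ?)",
    [
        (1, 201215, "Regular"), (1, 201216, "Regular"),
        (2, 201215, "Regular"), (2, 201216, "Infrequent"),
        (3, 201215, "Regular"), (3, 201216, "Regular"),
        (4, 201216, "Regular"),
    ],
)

# Inner query: one row per user, one column per week (NULL when absent).
# Outer query: count how many users share each week-to-week combination.
query = """
SELECT c_201215, c_201216, COUNT(*) AS cnt
FROM (SELECT user_id,
             MAX(CASE WHEN week_key = 201215 THEN class_nm END) AS c_201215,
             MAX(CASE WHEN week_key = 201216 THEN class_nm END) AS c_201216
      FROM user_classification_fct
      GROUP BY user_id) t
GROUP BY c_201215, c_201216
ORDER BY cnt DESC, c_201216
"""
transitions = conn.execute(query).fetchall()
print(transitions)
```

The user who joined in the second week shows up with None (NULL) for the first week, without any self-join or placeholder rows.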

MySQL count occurrences of a string

I have this query...
SELECT SUM(brownlow_votes) AS votes,
player_id,
player_name,
player_team,
COUNT(*) AS vote_count
FROM afl_brownlow_phantom, afl_playerstats
WHERE player_id=brownlow_player
AND brownlow_match=player_match
GROUP BY player_id
ORDER BY votes DESC LIMIT 50
So "votes" becomes the number of votes a player has, "vote_count" becomes the number of times (matches in which) a player has been voted for. This works fine.
However, I have another column called "brownlow_lock" which is either blank or 'Y'. How do I get the number of occurrences of 'Y'? I know I could solve this by changing it to 0 or 1 and just doing a SUM(), but I don't want to have to go and edit the tons of pages that are inserting data.
If I have understood you correctly you just need to add
COUNT(CASE WHEN brownlow_lock='Y' THEN 1 END) AS Cnt
to your query
Try using the IF control flow function
SELECT SUM(IF(brownlow_lock='Y',1,0)) lock_count ...
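Here is a small sketch of the COUNT(CASE ...) form in context, run in SQLite from Python. To keep it short it assumes a single table and leaves out the join to afl_playerstats; the vote rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE afl_brownlow_phantom "
    "(brownlow_player INT, brownlow_match INT, brownlow_votes INT, brownlow_lock TEXT)"
)
# Made-up votes (assumed); brownlow_lock is either '' or 'Y' as in the question.
conn.executemany(
    "INSERT INTO afl_brownlow_phantom VALUES (?, ?, ?, ?)",
    [
        (10, 1, 3, "Y"),
        (10, 2, 2, ""),
        (10, 3, 1, "Y"),
        (11, 1, 2, ""),
        (11, 2, 3, "Y"),
    ],
)

# CASE without ELSE yields NULL for non-matching rows, and COUNT skips NULLs,
# so lock_count only counts the rows where brownlow_lock is 'Y'.
query = """
SELECT brownlow_player,
       SUM(brownlow_votes) AS votes,
       COUNT(*) AS vote_count,
       COUNT(CASE WHEN brownlow_lock = 'Y' THEN 1 END) AS lock_count
FROM afl_brownlow_phantom
GROUP BY brownlow_player
ORDER BY votes DESC
"""
lock_counts = conn.execute(query).fetchall()
print(lock_counts)
```

The SUM(IF(...)) form is MySQL-specific; the COUNT(CASE ...) form used here is standard SQL and should behave the same way in MySQL.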