Better way to select max values in multiple-column group?

Simplified the following question I got from a coding challenge...
I have a table grades like:
| year | sex | person | mark |
|-----:|:----|:-------|-----:|
| 2000 | M | Mark | 70 |
| 2010 | F | Alyssa | 23 |
| 2020 | M | Robert | 54 |
I want to select the people per year for both sexes that have the highest marks.
My Attempt:
SELECT
year,
MAX(CASE
WHEN sex = ‘F’ THEN person
ELSE ‘’ END) AS person_f,
MAX(CASE
WHEN sex = ‘M’ THEN person
ELSE ‘’ END) AS person_m
FROM (
SELECT
year,
sex,
person,
**mark
FROM grades
WHERE mark IN (
SELECT MAX(mark) AS mark
FROM grades
GROUP BY year, sex)
**) AS t
WHERE x = 1
GROUP BY 1
ORDER BY 1
I modified everything within the ** ** but the rest of the code was pre-populated. The code seemed right to me, but somehow only passed 2/4 test cases, and there were no tiebreaker records.
Also, I omitted the WHERE x = 1 line, but the correct solution apparently needs that. (yes, x isn't a column in any table)
Is there a more elegant/efficient way to solve this?
Can't seem to figure it out, and it's really bugging me.

First, you need to use straight single quotes for strings.
The problem with your query is the subquery for the marks: it selects the highest mark of every group without associating it with the year and sex, so a row passes the filter whenever its mark equals the maximum of any group, not just its own.
MySQL allows an IN clause with multiple columns:
SELECT
year,
MAX(CASE
WHEN sex = 'F' THEN person
ELSE '' END) AS person_f,
MAX(CASE
WHEN sex = 'M' THEN person
ELSE '' END) AS person_m
FROM (
SELECT
year,
sex,
person,
mark
FROM grades
WHERE (year,sex,mark) IN (
SELECT year, sex,MAX(mark) AS mark
FROM grades
GROUP BY year, sex)
) AS t
GROUP BY 1
ORDER BY 1
| year | person_f | person_m |
|-----:|:----------|:----------|
| 2000 | | Mark |
| 2010 | Alyssa | |
| 2020 | | Robert |
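To see why the multi-column IN matters, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; SQLite also accepts row values in IN). The row for Bob is hypothetical and not part of the question's data: his mark is not the best in his own group, but it equals the maximum of a different group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (year INT, sex TEXT, person TEXT, mark INT)")
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?, ?)",
    [
        (2000, "M", "Mark", 70),
        (2000, "M", "Bob", 54),     # hypothetical row: not the top mark in 2000/M
        (2010, "F", "Alyssa", 23),
        (2020, "M", "Robert", 54),  # 54 is the top mark in 2020/M
    ],
)

# Single-column IN: a mark matches if it is the maximum of ANY group.
loose = conn.execute(
    """
    SELECT person FROM grades
    WHERE mark IN (SELECT MAX(mark) FROM grades GROUP BY year, sex)
    ORDER BY person
    """
).fetchall()

# Multi-column IN: the mark must be the maximum of the row's OWN group.
strict = conn.execute(
    """
    SELECT person FROM grades
    WHERE (year, sex, mark) IN (
        SELECT year, sex, MAX(mark) FROM grades GROUP BY year, sex)
    ORDER BY person
    """
).fetchall()

loose_names = [row[0] for row in loose]
strict_names = [row[0] for row in strict]
print(loose_names)   # Bob is wrongly included
print(strict_names)  # only the real group winners
```

With the single-column filter Bob slips through because 54 happens to be the best mark of 2020/M; the multi-column filter rejects him.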

I believe this approach incorporates the WHERE x = 1 clause as well.
SELECT
year,
MAX(CASE
WHEN sex = 'F' THEN person
ELSE '' END) AS person_f,
MAX(CASE
WHEN sex = 'M' THEN person
ELSE '' END) AS person_m
FROM (
SELECT
year,
sex,
person,
RANK() OVER (PARTITION BY year, sex ORDER BY mark DESC) AS x
FROM grades) AS t
WHERE x = 1
GROUP BY 1
ORDER BY 1

You can use rank. I added info to the table so you could see what happens in different scenarios including a tie.
select year
,sex
,person
,mark
from (
select *
,rank() over(partition by year, sex order by mark desc) as rnk
from t
) t
where rnk = 1
order by year, sex
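| year | sex | person | mark |
|-----:|:----|:-------|-----:|
| 2000 | F | Alyssa | 23 |
| 2000 | M | Mark | 70 |
| 2000 | M | Danny | 70 |
| 2010 | F | Alma | 100 |
| 2010 | M | Dudu | 47 |
| 2020 | F | Noga | 98 |
| 2020 | M | Moshe | 56 |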
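As a rough illustration of how RANK() treats ties, the sketch below runs the same pattern through Python's sqlite3 module (an assumption: SQLite 3.25 or later, which supports window functions, standing in for MySQL 8.0). The lower-scoring rows for Avi and Dana are hypothetical filler. ROW_NUMBER() is included only for contrast, since it keeps exactly one row per group and picks arbitrarily among tied rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (year INT, sex TEXT, person TEXT, mark INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (2000, "M", "Mark", 70),
        (2000, "M", "Danny", 70),  # tie with Mark
        (2000, "M", "Avi", 12),    # hypothetical lower mark
        (2000, "F", "Alyssa", 23),
        (2000, "F", "Dana", 5),    # hypothetical lower mark
    ],
)


def top_per_group(window_func):
    """Return (year, sex, person) rows numbered 1 by the given window function."""
    sql = f"""
        SELECT year, sex, person
        FROM (SELECT *, {window_func}() OVER (
                  PARTITION BY year, sex ORDER BY mark DESC) AS rnk
              FROM t) AS ranked
        WHERE rnk = 1
        ORDER BY year, sex, person
    """
    return conn.execute(sql).fetchall()


with_ties = top_per_group("RANK")
one_each = top_per_group("ROW_NUMBER")
print(with_ties)      # both Danny and Mark appear for 2000/M
print(len(one_each))  # one row per (year, sex) group
```

If the challenge expects a single name per group, a deterministic tiebreaker in the ORDER BY of the window would be needed.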

Related

Selecting multiple columns from two tables in which one column of a table has multiple where conditions and group them by two columns and order by one

I have two tables namely "appointment" and "skills_data".
Structure of appointment table is:
id_ap || ap_meet_date || id_skill || ap_status.
And the value of ap_status are complete, confirm, cancel and missed.
And the skills_data table contains two columns namely:
id_skill || skill
I want to get the count of total number of appointments for each of these conditions
ap_status = ('complete' and 'confirm'),
ap_status = 'cancel' and
ap_status = 'missed'
GROUP BY id_skill and year and
order by year DESC
I tried this query which only gives me count of one condition but I want to get other two based on group by and order by clauses as mentioned.
If there is no record(for example: zero appointments missed in 2018 for a skill) matching for certain conditions, then it should display the output value 0 for zero count.
Could someone please suggest a query, and tell me whether I should use multiple SELECT queries or a CASE clause to achieve my expected results? I have a lot of records in the appointment table and want an efficient way to query them. Thank you!
SELECT a.id_skill, YEAR(a.ap_meet_date) As year, s.skill,COUNT(*) as count_comp_conf
FROM appointment a,skills_data s WHERE a.id_skill=s.id_skill and a.ap_status IN ('complete', 'confirm')
GROUP BY `id_skill`, `year`
ORDER BY `YEAR` DESC
Output from my query:
id_skill | year | skill | count_comp_conf
-----------------------------------------
1 2018 A 20
2 2018 B 15
1 2019 A 10
2 2019 B 12
3 2019 C 10
My expected output should be like this:
id_skill | year | skill | count_comp_conf | count_cancel | count_missed
------------------------------------------------------------------------
1 2018 A 20 5 1
2 2018 B 15 8 0
1 2019 A 10 4 1
2 2019 B 12 0 5
3 2019 C 10 2 2

You can use conditional aggregation with a CASE WHEN expression:
SELECT a.id_skill, YEAR(a.ap_meet_date) As year, s.skill,
COUNT(case when a.ap_status IN ('complete', 'confirm') then 1 end) as count_comp_conf,
COUNT(case when a.ap_status = 'cancel' then 1 end) as count_cancel,
COUNT(case when a.ap_status = 'missed' then 1 end) as count_missed
FROM appointment a inner join skills_data s on a.id_skill=s.id_skill
GROUP BY `id_skill`, `year`
ORDER BY `YEAR` DESC
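The counting trick can be checked with a small sketch in Python's sqlite3 module (an assumption, used here in place of MySQL; strftime replaces MySQL's YEAR()). The sample rows are hypothetical. COUNT only counts non-NULL values, and a CASE without ELSE yields NULL, so each column counts just the rows matching its condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE skills_data (id_skill INTEGER PRIMARY KEY, skill TEXT);
    CREATE TABLE appointment (
        id_ap INTEGER PRIMARY KEY, ap_meet_date TEXT,
        id_skill INT, ap_status TEXT);
    INSERT INTO skills_data VALUES (1, 'A'), (2, 'B');
    -- Hypothetical sample rows, not taken from the question.
    INSERT INTO appointment (ap_meet_date, id_skill, ap_status) VALUES
        ('2018-03-01', 1, 'complete'),
        ('2018-04-02', 1, 'confirm'),
        ('2018-05-03', 1, 'cancel'),
        ('2019-01-10', 2, 'missed'),
        ('2019-02-11', 2, 'missed');
    """
)

rows = conn.execute(
    """
    SELECT a.id_skill,
           CAST(strftime('%Y', a.ap_meet_date) AS INTEGER) AS year,
           s.skill,
           COUNT(CASE WHEN a.ap_status IN ('complete', 'confirm') THEN 1 END)
               AS count_comp_conf,
           COUNT(CASE WHEN a.ap_status = 'cancel' THEN 1 END) AS count_cancel,
           COUNT(CASE WHEN a.ap_status = 'missed' THEN 1 END) AS count_missed
    FROM appointment a
    JOIN skills_data s ON a.id_skill = s.id_skill
    GROUP BY a.id_skill, year, s.skill
    ORDER BY year DESC
    """
).fetchall()
print(rows)
```

Statuses with no matching rows come out as 0 rather than NULL. A (skill, year) pair still only appears if it has at least one appointment of any status.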

SELECT a.id_skill,
YEAR(a.ap_meet_date) As year,
s.skill,
SUM(IF(a.ap_status IN ('complete', 'confirm'),1,0)) AS count_comp_conf,
SUM(IF(a.ap_status='cancel',1,0)) AS count_cancel,
SUM(IF(a.ap_status='missed',1,0)) AS count_missed
FROM appointment a,skills_data s WHERE a.id_skill=s.id_skill
GROUP BY `id_skill`, `year`
ORDER BY `YEAR` DESC;
Try using an IF condition inside SUM, as shown above.

With the query below you will get the output.
select id_skill,
year,
skill,
sum(comp_conf) as count_comp_conf,
sum(cancel) as count_cancel,
sum(missed) as count_missed
from (select a.id_skill, year(a.ap_meet_date) as year, s.skill,
case when a.ap_status in ('complete', 'confirm') then 1 else 0 end as comp_conf,
case when a.ap_status = 'cancel' then 1 else 0 end as cancel,
case when a.ap_status = 'missed' then 1 else 0 end as missed
from appointment a join skills_data s on (a.id_skill = s.id_skill)) t
group by id_skill, year, skill
order by year desc;

Grouping by to find average differences for specific indexes in SQL

I have the following table:
person_index score year
3 76 2003
3 86 2004
3 86 2005
3 87 2006
4 55 2005
4 91 2006
I want to group by person_index, getting the average score difference between consecutive years, such that I end up with one row per person, indicating the average increase/decrease:
person_index avg(score_diff)
3 3.67
4 36
So for person with index 3 - there were changes over 3 years, one was 10pt, one was 0, and one was 1pt. Therefore, their average score_diff is 3.67.
EDIT: to clarify, scores can also decrease. And years aren't necessarily consecutive (one person might not get a score at a certain year, so could be 2013 followed by 2015).

The simplest way is to use LAG (MySQL 8.0+):
WITH cte AS (
SELECT *, score - LAG(score) OVER(PARTITION BY person_index ORDER BY year) AS diff
FROM tab
)
SELECT person_index, AVG(diff) AS avg_diff
FROM cte
GROUP BY person_index;
Output:
+---------------+----------+
| person_index | avg_diff |
+---------------+----------+
| 3 | 3.6667 |
| 4 | 36.0000 |
+---------------+----------+
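The output above can be reproduced with a short sketch in Python's sqlite3 module (an assumption: SQLite 3.25 or later as a stand-in for MySQL 8.0). LAG returns NULL for each person's first year, and AVG skips NULLs, so only real year-to-year differences are averaged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (person_index INT, score INT, year INT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [(3, 76, 2003), (3, 86, 2004), (3, 86, 2005), (3, 87, 2006),
     (4, 55, 2005), (4, 91, 2006)],
)

rows = conn.execute(
    """
    WITH cte AS (
        SELECT person_index,
               score - LAG(score) OVER (
                   PARTITION BY person_index ORDER BY year) AS diff
        FROM tab
    )
    SELECT person_index, AVG(diff)
    FROM cte
    GROUP BY person_index
    ORDER BY person_index
    """
).fetchall()

# Round only for display; the query itself returns full precision.
averages = {person: round(avg, 2) for person, avg in rows}
print(averages)
```

Because the window is ordered by year rather than filtered on consecutive years, a gap such as 2013 followed by 2015 still counts as one difference.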

If the scores only increase -- as in your example -- the consecutive differences add up to the last score minus the first, so you can simply do:
select person_index,
( max(score) - min(score) ) / nullif(count(*) - 1, 0) as avg_diff
from t
group by person_index;
If they do not only increase, it is a bit trickier because you have to look up the first and last scores:
select p.person_index,
(tmax.score - tmin.score) / nullif(p.cnt - 1, 0) as avg_diff
from (select person_index, min(year) as miny, max(year) as maxy, count(*) as cnt
from t
group by person_index
) p join
t tmin
on tmin.person_index = p.person_index and tmin.year = p.miny join
t tmax
on tmax.person_index = p.person_index and tmax.year = p.maxy;

Display distinct users reaching specific count thresholds

Here's my table, showing user names and the timestamp they scored a point:
id user date
1 Aaron 23/02/2012 22:44
2 Betty 23/02/2012 22:47
3 Carlos 24/02/2012 16:01
4 David 28/02/2012 11:40
5 David 28/02/2012 12:32
6 David 28/02/2012 16:59
7 Aaron 2/03/2012 13:46
8 Aaron 30/03/2012 18:37
9 Betty 30/03/2012 19:58
10 Emma 9/04/2012 6:49
11 Emma 9/04/2012 13:19
12 Emma 9/04/2012 18:20
13 Emma 9/04/2012 20:46
14 Aaron 10/04/2012 15:47
15 Betty 10/04/2012 19:15
16 Betty 10/04/2012 20:40
17 Carlos 11/04/2012 9:44
18 Carlos 11/04/2012 20:01
19 David 11/04/2012 23:17
20 David 12/04/2012 17:09
And here is the results table I am trying to achieve, i.e. an x axis showing month-year, and a y axis displaying the number of users who reached a certain points threshold within that month:
date 1 point First time? 2 points First time? 3 points First time? 4 points First time? Total
Feb-12 A,B,C A,B,C D D 4
Mar-12 B A A 3
Apr-12 A,B,C B,C,D B,C,D E E 4
I've only got as far as calculating the total number of points and the total number of distinct scorers within a given month:
SELECT DISTINCT CONCAT (MONTHNAME(date), ' ', YEAR(date)) as 'date', COUNT(id) as total_points, COUNT(distinct referrer_id) as number_of_scorers
from points
group by CONCAT (MONTH(date), ' ', YEAR(date))
order by YEAR(date), MONTH(date)
which is only giving me:
date total_points number_of_scorers
Feb-12 6 4
Mar-12 3 3
etc.
So my questions are:
How can I amend the query to show me which users reached each point threshold within each month?
How can I amend the query to show me which users reached each point threshold for the first time within that month?
Thanks

The basic query you need is this:
select date_format(date, '%Y-%m') as yyyymm, user, count(*) as points
from t
group by date_format(date, '%Y-%m'), user;
This gets the number of points for each user in a month.
The rest is just aggregations, joins, and conditions:
select ymu.yyyymm,
group_concat(case when ymu.points = 1 then ymu.user end) as Points1_Users,
group_concat(case when ymu.points = 1 and ymu.yyyymm = u.min_yyyymm then ymu.user end) as Points1_Users_First,
group_concat(case when ymu.points = 2 then ymu.user end) as Points2_Users,
group_concat(case when ymu.points = 2 and ymu.yyyymm = u.min_yyyymm then ymu.user end) as Points2_Users_First
from (select date_format(date, '%Y-%m') as yyyymm, user, count(*) as points
from t
group by date_format(date, '%Y-%m'), user
) ymu join
(select user, min(yyyymm) as min_yyyymm
from (select date_format(date, '%Y-%m') as yyyymm, user, count(*) as points
from t
group by date_format(date, '%Y-%m'), user
) t
group by user
) u
on ymu.user = u.user
group by ymu.yyyymm
order by ymu.yyyymm;

SQL query with different averages for different columns for the same data

The SQL challenge I'm working on is to build a query to display the name of an individual and their average performance over 10 iterations of data in one column and their average over 50 iterations of data in the next column. Grouped by name of course. The iterations progress in order therefore the average of the past 10 for an individual would be an average score of the 10 highest ID numbers for that individual. The raw dataset looks like this:
ID, Name, Score
1, Joe, 10
2, Bob, 13
3, Joe, 9
4, Kim, 6
5, Rob, 8
6, Han, 9
7, Kim, 12
There are about 1000 rows like this with about 50 names. The end goal is to run a query that returns something like this:
Name, AvgPast10, AvgPast50
Bob, 8, 10
Joe, 7, 9
Kim, 6, 10
Han, 9, 6
Rob, 7, 5
When I tried to do this I realized that there might be different ways of doing it: maybe a self join, perhaps nested select statements. I tried and realized that I was getting in over my head. Also, my boss is a real stickler for query optimization. For some reason he despises nested select statements. If I need them then I had better have a compelling reason, or at least some idea of how optimization was built into the query.

Admittedly this one uses a nested select (or a subquery):
SELECT Name, AVG(CASE WHEN Rank <= 10 THEN Score END) AvgPast10,
AVG(CASE WHEN Rank <= 50 THEN Score END) AvgPast50
FROM (
SELECT Name,
@rank := IF(@Name = Name, @rank+1, 1) as Rank,
@Name := Name, Score
FROM tbl
ORDER BY Name, ID DESC
) A
GROUP BY Name
See my Demo that uses Past 3 and Past 5 for simplicity.

Your sample is very small so I have used 2 and 50 below, but hopefully the process is clear regardless of numbers or how many averages
| NAME | AVG_2 | AVG_50 |
|------|-------|--------|
| Bob | 13 | 13 |
| Han | 9 | 9 |
| Joe | 9 | 9.5 |
| Kim | 12 | 9 |
| Rob | 8 | 8 |
SELECT
name
, sum_2 / (count_2 * 1.0) AS avg_2
, sum_50 / (count_50 * 1.0) AS avg_50
FROM (
SELECT
name
, COUNT(CASE
WHEN rn <= 2 THEN score END) count_2
, SUM(CASE
WHEN rn <= 2 THEN score END) sum_2
, COUNT(CASE
WHEN rn <= 50 THEN score END) count_50
, SUM(CASE
WHEN rn <= 50 THEN score END) sum_50
FROM (
SELECT
*
, ROW_NUMBER() OVER (PARTITION BY name ORDER BY ID DESC) AS rn
FROM Scores
) x
GROUP BY
name
) y
ORDER BY
name
I wasn't sure what you wanted to do if the number of observations is less than the quantity required (e.g. a count of 20 but average is for 50), I have used the actual count in this example.
see: http://sqlfiddle.com/#!3/84cf6/2
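For a quick local check of the ROW_NUMBER idea, here is a minimal sketch in Python's sqlite3 module (an assumption: SQLite 3.25 or later instead of the SQL Server fiddle). It uses the seven sample rows from the question and, because the sample is tiny, windows of the last 1 and last 2 rows rather than 10 and 50. AVG over a CASE without ELSE averages only the rows inside the window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (ID INT, Name TEXT, Score INT)")
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?, ?)",
    [(1, "Joe", 10), (2, "Bob", 13), (3, "Joe", 9), (4, "Kim", 6),
     (5, "Rob", 8), (6, "Han", 9), (7, "Kim", 12)],
)

rows = conn.execute(
    """
    SELECT Name,
           AVG(CASE WHEN rn <= 1 THEN Score END) AS avg_last_1,
           AVG(CASE WHEN rn <= 2 THEN Score END) AS avg_last_2
    FROM (SELECT Name, Score,
                 -- rn = 1 is the most recent row (highest ID) per name
                 ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID DESC) AS rn
          FROM Scores) AS x
    GROUP BY Name
    ORDER BY Name
    """
).fetchall()

result = {name: (last_1, last_2) for name, last_1, last_2 in rows}
print(result)
```

Names with fewer rows than the window size are simply averaged over the rows they have, which matches the choice described above.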

Mysql average based on sum if in another column

For example purposes, let's say I'm trying to figure out the average score for males and females from each parent.
Example data looks like this:
parentID childID sex score
------------------------------------
1 21 m 17
1 23 f 12
2 33 f 55
2 55 m 22
3 67 m 26
3 78 f 29
3 93 m 31
This is the result I want:
parentID offspring m f avg-m avg-f avg-both
----------------------------------------------------
1 2 1 1 17 12 14.5
2 2 1 1 22 55 38.5
3 3 2 1 28.5 29 28.67
With the below query I can find the average for both males and females but I'm not sure how to get the average for either male or female
SELECT parentID, COUNT( childID ) AS offspring, SUM( IF( sex = 'm', 1, 0 ) ) AS m, SUM( IF( sex = 'f', 1, 0 ) ) AS f, max(score) as avg-both
FROM sexb_1
WHERE avg-both > 11
GROUP BY parentID
I tried something like this in the query but it returns an error
AVG(IF(sex = 'm', max(score),0)) as avg-m

I tried something like this in the query but it returns an error
AVG(IF(sex = 'm', max(score),0)) as avg-m
You can't use one aggregate function within another (in this case, MAX() within AVG())—what would that even mean? Once one has discovered the MAX() of the group, over what is there to take an average?
Instead, you want to take the AVG() of score values where the sex matches your requirement; since AVG() ignores NULL values and the default for unmatched CASE expressions is NULL, one can simply do:
SELECT parentID,
COUNT(*) offspring,
SUM(sex='m') m,
SUM(sex='f') f,
AVG(CASE sex WHEN 'm' THEN score END) `avg-m`,
AVG(CASE sex WHEN 'f' THEN score END) `avg-f`,
AVG(score) `avg-both`
FROM sexb_1
GROUP BY parentID
HAVING `avg-both` > 11
See it on sqlfiddle.
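The NULL-skipping behaviour is easy to confirm with a small sketch in Python's sqlite3 module (an assumption, standing in for MySQL) on the question's data. The avg_m_with_zero column is added purely for contrast: it shows what happens when unmatched rows contribute 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sexb_1 (parentID INT, childID INT, sex TEXT, score INT)"
)
conn.executemany(
    "INSERT INTO sexb_1 VALUES (?, ?, ?, ?)",
    [(1, 21, "m", 17), (1, 23, "f", 12), (2, 33, "f", 55), (2, 55, "m", 22),
     (3, 67, "m", 26), (3, 78, "f", 29), (3, 93, "m", 31)],
)

rows = conn.execute(
    """
    SELECT parentID,
           COUNT(*) AS offspring,
           AVG(CASE sex WHEN 'm' THEN score END) AS avg_m,  -- NULL for 'f' rows
           AVG(CASE sex WHEN 'f' THEN score END) AS avg_f,  -- NULL for 'm' rows
           AVG(CASE sex WHEN 'm' THEN score ELSE 0 END) AS avg_m_with_zero
    FROM sexb_1
    GROUP BY parentID
    HAVING AVG(score) > 11
    ORDER BY parentID
    """
).fetchall()
for row in rows:
    print(row)
```

For parent 3 the NULL version averages the two male scores (26 and 31), while the zero version divides their sum by all three children and understates the result.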

Using if
SELECT parentID, COUNT( childID ) AS offspring,
SUM(iF( sex='m', 1 ,0 )) AS m,
SUM(iF( sex='f', 1 ,0 )) AS f,
AVG(if(sex='m', score, null)) as avg_m,
AVG(if(sex='f', score, null)) as avg_f,
AVG(score) as avgboth
FROM sexb_1
GROUP BY parentID
HAVING avgboth > 11
In your query, the error is due to the alias avg-both: MySQL parses it as the difference of avg and both. You need to quote the alias with backticks or use an underscore instead.
Also, you cannot use alias names inside the WHERE clause. WHERE is evaluated right after the tables in FROM are read, before the SELECT list is computed, so the database doesn't know the alias names yet. Use HAVING to filter on an aggregate.

You can try below query-
SELECT
parentID, COUNT(childID) AS `offspring`,
COUNT(IF(sex = 'm',sex ,NULL )) AS `m`, COUNT(IF(sex = 'f', sex,NULL)) AS `f`,
AVG(IF(sex = 'm',score,NULL )) AS `avg-m`, AVG(IF(sex = 'f', score,NULL)) AS `avg-f`,
AVG(score) AS `avg-both`
FROM sexb_1
GROUP BY parentID
HAVING `avg-both` > 11;
