MySQL: Calculating Median of Values grouped by a Column


I have the following table:
+------------+-------+
| SchoolName | Marks |
+------------+-------+
| A | 71 |
| A | 71 |
| A | 71 |
| B | 254 |
| B | 135 |
| B | 453 |
| B | 153 |
| C | 453 |
| C | 344 |
| C | 223 |
| B | 453 |
| D | 300 |
| D | 167 |
+------------+-------+
And here is the average of marks grouped by school names:
+------------+------------+
| SchoolName | avg(Marks) |
+------------+------------+
| A | 71.0000 |
| B | 289.6000 |
| C | 340.0000 |
| D | 233.5000 |
+------------+------------+
https://www.db-fiddle.com/f/5t7N3Vx8FSQmwUJgKLqjfK/9
However, rather than the average, I want to calculate the median of the marks grouped by school name.
I am using
SELECT AVG(dd.Marks) as median_val
FROM (
    SELECT d.Marks, @rownum:=@rownum+1 as `row_number`, @total_rows:=@rownum
    FROM tablename d, (SELECT @rownum:=0) r
    WHERE d.Marks is NOT NULL
    ORDER BY d.Marks
) as dd
WHERE dd.row_number IN ( FLOOR((@total_rows+1)/2), FLOOR((@total_rows+2)/2) );
to calculate the median of the entire Marks column, but I don't know how to do it for each school separately.

Your query computes row numbers using user variables, which makes it more complicated to handle partitions. Since you are using MySQL 8.0, I would suggest using window functions instead.
This should get you close to what you expect:
select
    SchoolName,
    avg(Marks) as median_val
from (
    select
        SchoolName,
        Marks,
        row_number() over(partition by SchoolName order by Marks) rn,
        count(*) over(partition by SchoolName) cnt
    from tablename
) as dd
where rn in ( FLOOR((cnt + 1) / 2), FLOOR( (cnt + 2) / 2) )
group by SchoolName
The arithmetic stays the same, but we are using window functions in groups of records having the same SchoolName (instead of a global partition in your initial query). Then, the outer query filters and aggregates by SchoolName.
In your DB Fiddle, this returns:
| SchoolName | median_val |
| ---------- | ---------- |
| A | 71 |
| B | 254 |
| C | 344 |
| D | 233.5 |
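As a quick way to check the row-picking arithmetic without a MySQL server, here is a minimal sketch that runs essentially the same query through Python's built-in sqlite3 module. This is a stand-in, not MySQL: it assumes an SQLite build with window functions (3.25 or newer), and because SQLite's `/` on integers already truncates, the FLOOR() calls are dropped. In MySQL keep FLOOR(), since `/` returns a decimal there.

```python
import sqlite3

# Sample rows copied from the question's table.
rows = [
    ("A", 71), ("A", 71), ("A", 71),
    ("B", 254), ("B", 135), ("B", 453), ("B", 153),
    ("C", 453), ("C", 344), ("C", 223),
    ("B", 453),
    ("D", 300), ("D", 167),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (SchoolName TEXT, Marks INTEGER)")
conn.executemany("INSERT INTO tablename VALUES (?, ?)", rows)

# Assumption: integer division stands in for FLOOR(.../2) in this SQLite version.
query = """
SELECT SchoolName, AVG(Marks) AS median_val
FROM (
    SELECT SchoolName,
           Marks,
           ROW_NUMBER() OVER (PARTITION BY SchoolName ORDER BY Marks) AS rn,
           COUNT(*)     OVER (PARTITION BY SchoolName)                AS cnt
    FROM tablename
) AS dd
WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
GROUP BY SchoolName
ORDER BY SchoolName
"""
medians = dict(conn.execute(query).fetchall())
print(medians)  # {'A': 71.0, 'B': 254.0, 'C': 344.0, 'D': 233.5}
```

For an odd count both expressions pick the same middle row; for an even count (school D) they pick the two middle rows, and AVG() takes their mean.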

Related

Mysql Query numbering groups rows

I looked for days for a way to show a compact continuous numbering for group rows.
The products can be single type in the carton or mix together. Some of the carton markings are already printed so I cannot rearrange carton markings.
I have this table:
+-----+------------+--------+-----------+
| qty | product_id | Type | carton_no |
+-----+------------+--------+-----------+
| 18 | 111 | single | 1 |
| 18 | 111 | single | 2 |
| 18 | 111 | single | 3 |
| 48 | 115 | single | 4 |
| 48 | 115 | single | 5 |
| 48 | 115 | single | 6 |
| 36 | 119 | single | 7 |
| 36 | 119 | single | 8 |
| 18 | 111 | single | 9 |
| 36 | 119 | single | 10 |
| 16 | 199 | single | 11 |
| 16 | 199 | single | 12 |
| 4 | 111 | mix | 13 |
| 4 | 115 | mix | 13 |
| 4 | 119 | mix | 13 |
| 4 | 199 | mix | 13 |
+-----+------------+--------+-----------+
The documents processor needs a view like this:
+-----------+-----+------------+--------+
| Numbering | QTY | product_id | Type |
+-----------+-----+------------+--------+
| 1-4 | 72 | 111 | single |
| 5-7 | 144 | 115 | single |
| 8-10 | 108 | 119 | single |
| 11-12 | 32 | 199 | single |
| 13 | 4 | 111 | mix |
| 13 | 4 | 115 | mix |
| 13 | 4 | 119 | mix |
| 13 | 4 | 199 | mix |
+-----------+-----+------------+--------+
The numbering is actually a running count of the total cartons for each product_id, ordered by type, product_id ASC.
Any ideas?

WITH
cte1 AS (
SELECT qty,
product_id,
Type,
carton_no,
CASE WHEN product_id = LAG(product_id) OVER (ORDER BY carton_no)
THEN 0
ELSE 1
END new_group
FROM src ),
cte2 AS (
SELECT qty,
product_id,
Type,
carton_no,
SUM(new_group) OVER (ORDER BY carton_no) group_num
FROM cte1
)
SELECT CASE WHEN MAX(carton_no) > MIN(carton_no)
THEN CONCAT(MIN(carton_no), '-', MAX(carton_no))
ELSE MIN(carton_no)
END Numbering ,
SUM(qty) QTY,
product_id,
ANY_VALUE(Type) Type
FROM cte2
GROUP BY group_num, product_id;

WITH
cte1 AS (
SELECT qty,
product_id,
Type,
carton_no,
CASE WHEN product_id = LAG(product_id) OVER (ORDER BY type desc, product_id)
THEN 0
ELSE 1
END new_group
FROM src order by type desc, product_id ),
cte2 AS (
SELECT qty,
product_id,
Type,
carton_no,
SUM(new_group) OVER (ORDER BY type desc, product_id) group_num
FROM cte1 ),
cte3 AS (
SELECT SUM(qty) QTY,
product_id,
Type,
group_num,
carton_no,
count(group_num) sum,
LAG(count(group_num)) OVER () prevsum
FROM cte2 group by group_num order by type desc, carton_no
)
SELECT CASE WHEN group_num = 1 THEN CONCAT(group_num,'-', sum)
WHEN group_num <> 1 and Type = "mix" and LAG(carton_no) OVER (ORDER BY carton_no) <> carton_no THEN CONCAT(SUM(prevsum) OVER (ORDER BY type desc, product_id) + 1)
WHEN group_num <> 1 and Type = "mix" and LAG(carton_no) OVER (ORDER BY carton_no) = carton_no THEN CONCAT(LAG(carton_no) OVER (ORDER BY carton_no))
WHEN group_num <> 1 and Type = "single" THEN CONCAT(SUM(prevsum) OVER (ORDER BY type desc, product_id) + 1,'-', SUM(prevsum) OVER (ORDER BY type desc, product_id) + sum)
END numbering,
qty,
product_id,
type
FROM cte3
I think I solved the problem: the code works in Workbench, but not in the fiddle. Any idea why it fails in the fiddle, and how to make it more compact?

Retrieving a variable number of rows using a table join

This is an additional layer of complexity on another question I asked here: Using GROUP BY and ORDER BY in same MySQL query
Same table structure and problem, except this time imagine that the past_elections table is now set up as...
| election_ID | Date | jurisdiction | Race | Seats |
|-------------|------------|----------------|---------------|-------|
| 1 | 2016-11-08 | federal | president | 1 |
| 2 | 2016-11-08 | state_district | state senator | 2 |
(last record has seats set as 2 instead of 1.)
I want to use the Seats number to grab different numbers of records, ordered by the number of votes, for each group. So in this case with the following additional tables...
candidates
| Candidate_ID | FirstName | LastName | MiddleName |
|--------------|-----------|----------|------------|
| 1 | Aladdin | Arabia | A. |
| 2 | Long | Silver | John |
| 3 | Thor | Odinson | NULL |
| 4 | Baba | Yaga | NULL |
| 5 | Robin | Hood | Locksley |
| 6 | Sherlock | Holmes | J. |
| 7 | King | Kong | NULL |
past_elections-candidates
| ID | PastElection | Candidate | Votes |
|----|--------------|-----------|-------|
| 1 | 1 | 1 | 200 |
| 2 | 1 | 2 | 100 |
| 3 | 1 | 6 | 50 |
| 4 | 2 | 3 | 75 |
| 5 | 2 | 4 | 25 |
| 6 | 2 | 5 | 150 |
| 7 | 2 | 7 | 100 |
I would expect the following output:
| election_ID | FirstName | LastName | votes | percent |
|-------------|-----------|----------|-------|---------|
| 1 | Aladdin | Arabia | 200 | 0.5714 |
| 2 | Robin | Hood | 150 | 0.4286 |
| 2 | King | Kong | 100 | 0.2857 |
I've tried setting a variable and using that with a LIMIT clause, but variables don't work in limits. I've also tried using ROW_NUMBER() (I'm not using MySQL 8.0 so this won't work but I'd be willing to upgrade if it did) or a related workaround like @row_number := IF ... and then filtering based on the row number, but nothing has worked.
Last tried query:
SELECT pe.election_ID as elec,
       pe.Seats as s,
       pecs.row_num,
       c.FirstName,
       c.LastName,
       pecs.max_votes AS votes,
       pecs.max_votes / pecs.total_votes AS percent
FROM past_elections pe
JOIN `past_elections-candidates` pec ON pec.PastElection = pe.election_ID
JOIN (SELECT PastElection,
             Candidate,
             @row_num := IF(PastElection = @current_election, @current_election + 1, 1) as row_num,
             MAX(Votes) AS max_votes,
             SUM(Votes) AS total_votes,
             @current_election := PastElection
      FROM `past_elections-candidates`
      GROUP BY PastElection) pecs ON pecs.PastElection = pec.PastElection AND pecs.row_num <= pe.Seats
JOIN candidates c ON c.Candidate_ID = pec.Candidate

Use MySQL 8 regardless ;)
Use ROW_NUMBER to order the past elections:
SELECT *, ROW_NUMBER() OVER(PARTITION BY pastelection ORDER BY votes DESC) as rown
FROM `past_elections-candidates`
Join this to past_elections as a subquery (this is just the bit you're stuck on, "using pe.seats to vary the number of rows returned per election", and doesn't include the percent bits):
SELECT *
FROM
    past_elections pe
    INNER JOIN
    (
        SELECT *, ROW_NUMBER() OVER(PARTITION BY PastElection ORDER BY votes DESC) as rown
        FROM `past_elections-candidates`
    ) pecr
    ON pecr.PastElection = pe.election_ID AND
       pecr.rown <= pe.seats
If you want to test things out on 8 before you upgrade, loads of the db fiddle sites support v8
PS: the percentage can be computed at the same time as the ROW_NUMBER, e.g.:
votes/SUM(votes) OVER(PARTITION BY PastElection)
E.g. for election ID 1 that sum will be 200+100+50, giving 200/350 = ~57%
SELECT *, votes/SUM(votes) OVER(PARTITION BY PastElection) as pcnt, ROW_NUMBER() OVER(PARTITION BY PastElection ORDER BY votes DESC) as rown
FROM `past_elections-candidates`
You need to calculate it before filtering; once the rows are limited by seats, the sum would no longer cover every candidate in the election.
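Putting those pieces together, a rough end-to-end sketch might look like the following. It runs on Python's built-in sqlite3 rather than MySQL, purely so it can be tried without a server; it assumes an SQLite build with window functions (3.25+). Two deviations from the MySQL version are assumptions of the sketch: the hyphenated table name is quoted with double quotes instead of backticks, and `Votes * 1.0` forces a non-integer division, which MySQL's `/` does on its own. Only the columns needed here are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE past_elections (election_ID INTEGER, Seats INTEGER);
INSERT INTO past_elections VALUES (1, 1), (2, 2);

CREATE TABLE candidates (Candidate_ID INTEGER, FirstName TEXT, LastName TEXT);
INSERT INTO candidates VALUES
    (1, 'Aladdin', 'Arabia'), (2, 'Long', 'Silver'), (3, 'Thor', 'Odinson'),
    (4, 'Baba', 'Yaga'), (5, 'Robin', 'Hood'), (6, 'Sherlock', 'Holmes'),
    (7, 'King', 'Kong');

CREATE TABLE "past_elections-candidates"
    (PastElection INTEGER, Candidate INTEGER, Votes INTEGER);
INSERT INTO "past_elections-candidates" VALUES
    (1, 1, 200), (1, 2, 100), (1, 6, 50),
    (2, 3, 75), (2, 4, 25), (2, 5, 150), (2, 7, 100);
""")

# The share and the row number are both computed in the subquery,
# i.e. before the join condition keeps only the top `Seats` rows.
query = """
SELECT pe.election_ID, c.FirstName, c.LastName, pecr.Votes, ROUND(pecr.pcnt, 4)
FROM past_elections pe
JOIN (
    SELECT PastElection, Candidate, Votes,
           Votes * 1.0 / SUM(Votes) OVER (PARTITION BY PastElection) AS pcnt,
           ROW_NUMBER() OVER (PARTITION BY PastElection ORDER BY Votes DESC) AS rown
    FROM "past_elections-candidates"
) pecr ON pecr.PastElection = pe.election_ID AND pecr.rown <= pe.Seats
JOIN candidates c ON c.Candidate_ID = pecr.Candidate
ORDER BY pe.election_ID, pecr.rown
"""
winners = conn.execute(query).fetchall()
for row in winners:
    print(row)
```

On the sample data this yields one row for election 1 (Aladdin Arabia) and two rows for election 2 (Robin Hood, King Kong), matching the expected output in the question.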

I don't have the right fields listed but this is as close as I'll probably get tonight... I've gotten the rows I need but need to join the candidate table to get the name out...
Using Dense_Rank seems to work for this...
SELECT * FROM (
SELECT pec.PastElection,
c.FirstName,
c.LastName,
pec.Votes,
pecs.totalVotes,
pe.Seats as s,
DENSE_RANK() OVER(PARTITION BY PastElection ORDER BY Votes DESC) as rank_votes
FROM `past_elections-candidates` pec
JOIN (SELECT PastElection,
Max(Votes) as maxVotes,
Sum(Votes) as totalVotes
FROM `past_elections-candidates`
GROUP BY PastElection) pecs ON pecs.PastElection = pec.PastElection
JOIN `past_elections` pe ON pec.PastElection = pe.election_ID
JOIN candidates c ON c.Candidate_ID = pec.Candidate
) t WHERE rank_votes <= s;
This results in
| PastElection | FirstName | LastName | Votes | totalVotes | s | rank_votes |
|--------------|-----------|----------|-------|------------|---|------------|
| 1 | Aladdin | Arabia | 200 | 350 | 1 | 1 |
| 2 | Robin | Hood | 150 | 350 | 2 | 1 |
| 2 | King | Kong | 100 | 350 | 2 | 2 |
I guess it's just kind of messy having the rank_votes and s columns in the data, but that's honestly fine with me if it gets the results I need.

SELECT query where LIMIT is a distinct count of repeating key

I have a problem with selecting a specific amount of data. The problem is that one of the columns (name) has the same value repeated across rows.
--------------------
| id | name | key |
--------------------
| 1 | alfa | a |
| 2 | alfa | b |
| 3 | alfa | c |
| 4 | beal | a |
| 5 | beal | b |
| 6 | gala | c |
| 7 | gala | d |
| 8 | delt | a |
| 9 | ceta | a |
--------------------
In this situation I want to select three individual names. For example I want to limit distinct name to 3 positions to get this result:
SAMPLE DUMP CODE:
SELECT * in Table
WHERE `name` LIKE '%al%'
LIMIT BY DISTINCT
`name`, 3
------ RESULT ------
| 1 | alfa | a |
| 2 | alfa | b |
| 3 | alfa | c |
| 4 | beal | a |
| 5 | beal | b |
| 6 | gala | c |
| 7 | gala | d |
--------------------
I will be glad for help.

Without window functions:
select *
from (
select distinct name
from mytable
where `name` like '%al%'
order by name
limit 3
) n
natural join mytable
If you don't like NATURAL JOINs you can also use
select t.*
from (
select distinct name
from mytable
where `name` like '%al%'
order by name
limit 3
) n
join mytable t on t.name = n.name
If window functions are supported, you can use DENSE_RANK():
with cte as (
select *,
dense_rank() over (order by name) as dr
from mytable
where `name` like '%al%'
)
select id, name, `key`
from cte
where dr <= 3
I prefer the LIMIT 3 subquery, since it can stop the index scan (depending on optimizer) after three distinct names are found.

MySQL 8.0 solution utilizing Window functions is as follows:
SELECT
dt.id, dt.name, dt.`key`
FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS rn,
id,
name,
`key`
FROM your_table_name
WHERE name LIKE '%al%'
) AS dt
WHERE dt.rn <= 3
ORDER BY dt.id
Explanation:
In a derived table (subquery), determine ROW_NUMBER() within a partition (group) of each name, ordered by id in ascending order. Only names matching the %al% condition are considered.
Now, use the subquery result to SELECT only the rows having a row number up to 3 (basically limiting to 3 rows per name). Note that this caps the rows per name rather than the number of distinct names; on the sample data the result is the same because only three names match %al%.
By the way, key is a reserved keyword in MySQL. You should consider renaming the column to something else; otherwise you will need to use backticks around it.
Result
| id | name | key |
| --- | ---- | --- |
| 1 | alfa | a |
| 2 | alfa | b |
| 3 | alfa | c |
| 4 | beal | a |
| 5 | beal | b |
| 6 | gala | c |
| 7 | gala | d |
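To see how capping rows per name differs from capping the number of distinct names, here is a small sketch using Python's built-in sqlite3 as a stand-in for MySQL 8.0 (assuming an SQLite build with window functions, 3.25+). The table name mytable and the extra row with id 10 ('dale') are hypothetical; the row is not in the question and is added only so that four names match '%al%'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable (id INTEGER, name TEXT, "key" TEXT)')
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    (1, "alfa", "a"), (2, "alfa", "b"), (3, "alfa", "c"),
    (4, "beal", "a"), (5, "beal", "b"),
    (6, "gala", "c"), (7, "gala", "d"),
    (8, "delt", "a"), (9, "ceta", "a"),
    (10, "dale", "a"),  # hypothetical extra row, not in the question
])

# DENSE_RANK over name: keeps every row of the first three distinct names.
names_dense = [r[0] for r in conn.execute("""
    SELECT DISTINCT name FROM (
        SELECT name, DENSE_RANK() OVER (ORDER BY name) AS dr
        FROM mytable WHERE name LIKE '%al%'
    ) AS t
    WHERE dr <= 3
    ORDER BY name
""")]

# ROW_NUMBER per name: keeps up to three rows of every matching name.
names_rownum = [r[0] for r in conn.execute("""
    SELECT DISTINCT name FROM (
        SELECT name, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS rn
        FROM mytable WHERE name LIKE '%al%'
    ) AS t
    WHERE rn <= 3
    ORDER BY name
""")]

print(names_dense)   # ['alfa', 'beal', 'dale']
print(names_rownum)  # ['alfa', 'beal', 'dale', 'gala']
```

With only three matching names, as in the question's data, both queries return the same rows, so which one fits depends on whether the limit is meant per name or on the number of names.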

sort data by specific order sequence (mysql)

So, let's say I have this data:
id | value | group
1 | 100 | A
2 | 120 | A
3 | 150 | B
4 | 170 | B
I want to sort it so it becomes like this:
id | value | group
1 | 100 | A
3 | 150 | B
2 | 120 | A
4 | 170 | B
There will be more groups than that, so if the data arrives with groups ordered like (A,C,B,D,B,C,A), it should become (A,B,C,D,A,B,C).

You can add a counter column to the table, which will be used to sort the table:
select t.id, t.value, t.`group`
from (
select t.id, t.value, t.`group`,
(select count(*) from tablename
where `group` = t.`group` and id < t.id) counter
from tablename t
) t
order by t.counter, t.`group`
Results:
| id | value | group |
| --- | ----- | ----- |
| 1 | 100 | A |
| 3 | 150 | B |
| 2 | 120 | A |
| 4 | 170 | B |

On MySQL 8.0 you can also approach this with a window function, numbering the rows within each group by id:
SELECT *
FROM `tablename`
ORDER BY
    row_number() OVER (PARTITION BY `group` ORDER BY id), `group`
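A minimal sketch of the same idea, run through Python's built-in sqlite3 instead of MySQL (assuming SQLite 3.25+ for window functions). As an assumption of this sketch, the row number is computed in a subquery and sorted on in the outer query, and the window is ordered by id so the position inside each group is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tablename (id INTEGER, value INTEGER, "group" TEXT)')
conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", [
    (1, 100, "A"), (2, 120, "A"), (3, 150, "B"), (4, 170, "B"),
])

# rn is the position of a row inside its own group (1st A, 2nd A, ...).
# Sorting by rn first and group second interleaves the groups.
ordered = conn.execute("""
    SELECT id, value, "group"
    FROM (
        SELECT id, value, "group",
               ROW_NUMBER() OVER (PARTITION BY "group" ORDER BY id) AS rn
        FROM tablename
    ) AS t
    ORDER BY rn, "group"
""").fetchall()

print(ordered)  # [(1, 100, 'A'), (3, 150, 'B'), (2, 120, 'A'), (4, 170, 'B')]
```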

MySQL - generate numbers for groups of a result

I need a query to return this result:
+---------+-----+-------+
| ref_nid | nid | delta |
+---------+-----+-------+
| AA | 97 | 1 |
| BB | 97 | 2 |
| CC | 97 | 3 |
| DD | 98 | 1 |
| EE | 98 | 2 |
| FF | 98 | 3 |
+---------+-----+-------+
However, I do not have the delta column. I need to generate it for each nid group.
In other words, I need an auto incremented number for each group of the result.

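Check out this guy's blog
select @rownum:=@rownum+1 `rank`, p.* from player p, (SELECT @rownum:=0) r order by score desc limit 10;
Basically,
set @i = 0;
select id, @i:=@i+1 as myrow from mytable
Note that this numbers the whole result; to restart the counter for each nid, the variable has to be reset whenever nid changes.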
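If MySQL 8.0 is an option, a per-group counter can be written with ROW_NUMBER() instead of user variables. This swaps in a different technique from the answer above. The sketch below runs it through Python's built-in sqlite3 as a stand-in (assuming SQLite 3.25+); the table name mytable and the choice to order each group by ref_nid are assumptions, since the question gives neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ref_nid TEXT, nid INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [
    ("AA", 97), ("BB", 97), ("CC", 97),
    ("DD", 98), ("EE", 98), ("FF", 98),
])

# PARTITION BY nid restarts the numbering for every nid group.
numbered = conn.execute("""
    SELECT ref_nid, nid,
           ROW_NUMBER() OVER (PARTITION BY nid ORDER BY ref_nid) AS delta
    FROM mytable
    ORDER BY nid, delta
""").fetchall()

for row in numbered:
    print(row)
```

On the sample rows this gives delta 1, 2, 3 for nid 97 and again 1, 2, 3 for nid 98.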

We Keep Coding
