How to calculate the median category-wise in MySQL

I need to find the median of total scores region-wise. I arrived at a solution by trial and error on the data, but the query is not optimized. I need an efficient MySQL query for this problem.
Thanks for the solutions.
Edit: first, exams have to be filtered from the assessment table; second, total_score needs to be summed across all subjects for each student using the studentassessment table. Finally, the region-wise median needs to be calculated.
SELECT region,
Avg(total_score) AS median
FROM (SELECT row_num,
region,
total_score,
region_cnt
FROM (SELECT Row_number()
OVER (
partition BY region
ORDER BY total_score) AS row_num,
region,
total_score,
Count(region)
OVER (
partition BY region) AS region_cnt
FROM (SELECT i.region AS region,
Sum(S.score) AS total_score
FROM tredence.assesment A
INNER JOIN tredence.studentassessment S
ON A.id_assessment = S.id_assessment
INNER JOIN tredence.studentinfo i
ON i.id_student = S.id_student
WHERE A.assessment = 'Exam'
GROUP BY S.id_student,
i.region
ORDER BY region,
total_score) t) r
GROUP BY 1,
2,
3,
4
HAVING row_num IN ( Floor(region_cnt / 2), Ceil(region_cnt / 2) )) z
GROUP BY region
ORDER BY median DESC
tables and columns:
|Assessments |student_info|student_assessment|
|---------------|------------|------------------|
|course_code |course_code |id_assessment |
|batch_code |batch_code |id_student |
|id_assessments |id_student |date_submitted |
|assessment_type|gender |is_banked |
|date |region |score |
Output:
|Region |Median|
|-------------|------|
|North Region | 82 |
|London Region| 80 |
|Scotland | 80 |
|Ireland | 76 |

Assume you reduce the set to the following. Note: id_student isn't required at this point in the calculation.
CREATE TABLE tscores (
id int primary key auto_increment
, region int
, id_student int
, total_score int
, index (region, total_score)
);
INSERT INTO tscores (region, id_student, total_score) VALUES
(1, 1000, 40)
, (1, 1001, 50)
, (1, 1002, 30)
, (1, 1003, 90)
, (2, 1101, 50)
, (2, 1102, 51)
, (2, 1103, 55)
;
SQL and Result:
WITH cte1 AS (
SELECT region, total_score
, ((COUNT(*) OVER (PARTITION BY region) + 1) / 2) AS n
, ROW_NUMBER() OVER (PARTITION BY region ORDER BY total_score) AS rn
FROM tscores AS t
)
SELECT region
, truncate(AVG(total_score), 2) AS med_score
FROM cte1 AS t
WHERE rn IN (ceil(n), floor(n))
GROUP BY region
;
+--------+-----------+
| region | med_score |
+--------+-----------+
| 1 | 45.00 |
| 2 | 51.00 |
+--------+-----------+
2 rows in set (0.004 sec)
Still not quite enough detail. But here's SQL that runs against your schema, minus the typos I think you had in your SQL:
WITH tscores AS (
SELECT i.region AS region
, Sum(S.score) AS total_score
FROM tredence.assessments A
JOIN tredence.studentassessment S
ON A.id_assessment = S.id_assessment
JOIN tredence.studentinfo i
ON i.id_student = S.id_student
WHERE A.assessment = 'Exam'
GROUP BY S.id_student
, i.region
)
, cte1 AS (
SELECT region, total_score
, ((COUNT(*) OVER (PARTITION BY region) + 1) / 2) AS n
, ROW_NUMBER() OVER (PARTITION BY region ORDER BY total_score) AS rn
FROM tscores AS t
)
SELECT region
, truncate(AVG(total_score), 2) AS med_score
FROM cte1 AS t
WHERE rn IN (ceil(n), floor(n))
GROUP BY region
;

Related

Count without "Invalid use of group function" in MySQL

I have a table like this,
CREATE TABLE order_match
(`order_buyer_id` int, `createdby` int, `createdAt` datetime, `quantity` decimal(10,2))
;
INSERT INTO order_match
(`order_buyer_id`, `createdby`, `createdAt`, `quantity`)
VALUES
(19123, 19, '2017-02-02', 5),
(193241, 19, '2017-02-03', 5),
(123123, 20, '2017-02-03', 1),
(32242, 20, '2017-02-04', 4),
(32434, 20, '2017-02-04', 5),
(2132131, 12, '2017-02-02', 6)
;
here's the fiddle
In this table, order_buyer_id is the ID of the transaction, createdby is the buyer, createdAt is the time of each transaction, and quantity is the quantity of the transaction.
I want to find the maximum, minimum, median, and average number of orders among repeat buyers (buyers with more than one transaction).
For this table, the expected results look like this:
+-----+-----+---------+--------+
| MAX | MIN | Average | Median |
+-----+-----+---------+--------+
| 3 | 2 | 2.5 | 3 |
+-----+-----+---------+--------+
Note: I'm using MySQL 5.7.
I am using this query:
select -- om.createdby, om.quantity, x1.count_
MAX(count(om.createdby)) AS max,
MIN(count(om.createdby)) AS min,
AVG(count(om.createdby)) AS average
from (select count(xx.count_) as count_
from (select count(createdby) as count_ from order_match
group by createdby
having count(createdby) > 1) xx
) x1,
(select createdby
from order_match
group by createdby
having count(createdby) > 1) yy,
order_match om
where yy.createdby = om.createdby
and om.createdAt <= '2017-02-04'
and EXISTS (select 1 from order_match om2
where om.createdby = om2.createdby
and om2.createdAt >= '2017-02-02'
and om2.createdAt <= '2017-02-04')
but it fails with:
Invalid use of group function
We can try aggregating by createdby, and then taking the aggregates you want:
SELECT
MAX(cnt) AS MAX,
MIN(cnt) AS MIN,
AVG(cnt) AS Average
FROM
(
SELECT createdby, COUNT(*) AS cnt
FROM order_match
GROUP BY createdby
HAVING COUNT(*) > 1
) t
Simulating the median in MySQL 5.7 is a lot of work, and ugly. If you have a long-term need for the median, consider upgrading to MySQL 8+.
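For reference, on MySQL 8.0+ the median of the per-buyer counts could be sketched roughly as below. This is only a sketch: it assumes the order_match table from the question, and the CTE names counts and ranked are made up for illustration. For an even number of rows it averages the two middle values; if the upper middle value is wanted instead, keep only the CEIL row.

```sql
-- Requires MySQL 8.0+ (CTEs and window functions).
WITH counts AS (
    -- One row per repeat buyer with the number of transactions.
    SELECT createdby, COUNT(*) AS cnt
    FROM order_match
    GROUP BY createdby
    HAVING COUNT(*) > 1
),
ranked AS (
    -- Number the counts in ascending order and attach the total row count.
    SELECT cnt,
           ROW_NUMBER() OVER (ORDER BY cnt) AS rn,
           COUNT(*) OVER () AS total
    FROM counts
)
SELECT AVG(cnt) AS median
FROM ranked
-- Odd total: both expressions pick the middle row.
-- Even total: they pick the two middle rows, which AVG() then averages.
WHERE rn IN (FLOOR((total + 1) / 2), CEIL((total + 1) / 2));
```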

Select the last price according to the date using GROUP BY

I'm trying to write a query with a GROUP BY.
Here is an example of my table ticket:
id DtSell Price Qt
1 01-01-2017 3.00 1
1 02-01-2017 2.00 3
2 01-01-2017 5.00 5
2 02-01-2017 8.00 2
And my query:
SELECT id, Price, sum(Qt) FROM ticket
GROUP BY id;
Unfortunately, the price returned is not necessarily the right one; I would like the last price according to DtSell, like this:
id Price sum(Qt)
1 2.00 4
2 8.00 7
But I couldn't find how to do it.
Can you help me?
Thank you in advance!
You might need a subquery; try the following:
SELECT
t1.id,
(SELECT t2.price FROM ticket t2 WHERE t2.id=t1.id
ORDER BY t2.DtSell DESC LIMIT 1 ) AS price,
SUM(t1.Qt)
FROM ticket t1 GROUP BY t1.id;
You can do this with a group_concat()/substring_index() trick:
SELECT id, SUM(Qt),
SUBSTRING_INDEX(GROUP_CONCAT(price ORDER BY dtsell DESC), ',', 1) as last_price
FROM ticket
GROUP BY id;
Two notes:
This is subject to internal limits on the length of the intermediate string used for GROUP_CONCAT() (a limit that can easily be changed).
It changes the type of price to a string.
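Both notes can be addressed in the query itself. The sketch below raises the session limit through the group_concat_max_len system variable and casts the result back to a number. The value 1000000 and the DECIMAL(10,2) type are assumptions chosen for illustration; pick ones that match your data.

```sql
-- Raise the GROUP_CONCAT() length limit for this session only
-- (the default is 1024 bytes).
SET SESSION group_concat_max_len = 1000000;

SELECT id,
       -- Take the first element of the date-ordered list, then cast the
       -- string back to a numeric type (DECIMAL(10,2) is an assumed type).
       CAST(SUBSTRING_INDEX(GROUP_CONCAT(Price ORDER BY DtSell DESC), ',', 1)
            AS DECIMAL(10,2)) AS last_price,
       SUM(Qt) AS total_qt
FROM ticket
GROUP BY id;
```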
Try this query.
SELECT id, Price, sum(Qt) FROM ticket
GROUP BY id,Price
Your Output;
id Price sum(Qt)
1 3.00 4
2 8.00 7
You can select all rows from ticket grouped by id (to sum the quantity), then join to the rows that have the max dtsell for each id group (to select the price).
http://sqlfiddle.com/#!9/574cb9/8
SELECT t.id
, t3.price
, SUM(t.Qt)
FROM ticket t
JOIN ( SELECT t1.id
, t1.price
FROM ticket t1
JOIN ( SELECT id
, MAX(dtsell) dtsell
FROM ticket
GROUP BY id ) t2
ON t1.id = t2.id
AND t1.dtsell = t2.dtsell ) t3
ON t3.id = t.id
GROUP BY t.id;
You can do it like this (SQL Server syntax, using a table variable):
declare @t table (id int, dtsell date, price numeric(18,2),qt int)
insert into @t
values
(1 ,'01-01-2017', 3.00 , 1),
(1 ,'02-01-2017', 2.00 , 3),
(2 ,'01-01-2017', 5.00 , 5),
(2 ,'02-01-2017', 8.00 , 2)
select x.id,price,z.Qt from (
select id,price,dtsell,row_number() over(partition by id order by dtsell desc ) as rn from @t
)x
inner join (select SUM(qt) as Qt,ID from @t group by id ) z on x.id = z.id
where rn = 1

How to get most occurences of rows for every user in mysql

user_id category suburb dated walk_time
1 experience US 2016-04-09 5
1 discovery US 2016-04-09 5
1 experience UK 2016-04-09 5
1 experience AUS 2016-04-23 10
2 actions IND 2016-04-15 2
2 actions IND 2016-04-15 1
2 discovery US 2016-04-21 2
3 discovery FR 2016-04-12 3
3 Emotions IND 2016-04-23 3
3 discovery UK 2016-04-12 4
3 experience IND 2016-04-12 3
I am trying to get each user's most frequent category, suburb, dated, and walk_time,
so the resulting table would be
user_id category suburb dated walk_time
1 experience US 2016-04-09 5
2 actions IND 2016-04-15 2
3 discovery IND 2016-04-12 3
The query I am trying here is
select user_id,
substring_index(group_concat(suburb order by cnt desc), ',', 1) as suburb_visited,
substring_index(group_concat(category order by cct desc), ',', 1) as category_used,
substring_index(group_concat(walk_time order by wct desc), ',', 1) as walked,
substring_index(group_concat(dated order by nct desc), ',', 1) as dated_at
from (select user_id, suburb, count(*) as cnt,category, count(*) cct, walk_time, count(*) wct, dated,count(*) nct
from temp_user_notes
group by user_id, suburb,category,walk_time,dated
) upv
group by user_id;
SELECT user_id,
(SELECT category FROM temp_user_notes t1
WHERE t1.user_id = T.user_id
GROUP BY category ORDER BY count(*) DESC LIMIT 1) as category,
(SELECT suburb FROM temp_user_notes t2
WHERE t2.user_id = T.user_id
GROUP BY suburb ORDER BY count(*) DESC LIMIT 1) as suburb,
(SELECT dated FROM temp_user_notes t3
WHERE t3.user_id = T.user_id
GROUP BY dated ORDER BY count(*) DESC LIMIT 1) as dated,
(SELECT walk_time FROM temp_user_notes t4
WHERE t4.user_id = T.user_id
GROUP BY walk_time ORDER BY count(*) DESC LIMIT 1) as walk_time
FROM (SELECT DISTINCT user_id FROM temp_user_notes) T
http://sqlfiddle.com/#!9/8aac6a/19
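On MySQL 8.0+, the same "most frequent value per user" idea could also be sketched with ROW_NUMBER(). The example below handles only the category column and assumes the temp_user_notes table from the question; the aliases counted and ranked, and the alphabetical tie-break, are assumptions made for illustration. The other three columns would follow the same pattern.

```sql
-- Requires MySQL 8.0+ (window functions).
SELECT user_id, category
FROM (
    SELECT user_id, category,
           -- Rank each user's categories from most to least frequent;
           -- ties are broken alphabetically (an arbitrary, assumed rule).
           ROW_NUMBER() OVER (PARTITION BY user_id
                              ORDER BY cnt DESC, category) AS rn
    FROM (
        -- Count how often each category occurs per user.
        SELECT user_id, category, COUNT(*) AS cnt
        FROM temp_user_notes
        GROUP BY user_id, category
    ) AS counted
) AS ranked
WHERE rn = 1;
```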
Try this. It looks a little complicated, but I hope it helps ;)
MySQL schema:
CREATE TABLE table1
(`user_id` int, `category` varchar(10), `suburb` varchar(3), `dated` datetime, `walk_time` int)
;
INSERT INTO table1
(`user_id`, `category`, `suburb`, `dated`, `walk_time`)
VALUES
(1, 'experience', 'US', '2016-04-09 00:00:00', 5),
(1, 'discovery', 'US', '2016-04-09 00:00:00', 5),
(1, 'experience', 'UK', '2016-04-09 00:00:00', 5),
(1, 'experience', 'AUS', '2016-04-23 00:00:00', 10),
(2, 'actions', 'IND', '2016-04-15 00:00:00', 2),
(2, 'actions', 'IND', '2016-04-15 00:00:00', 1),
(2, 'discovery', 'US', '2016-04-21 00:00:00', 2),
(3, 'discovery', 'FR', '2016-04-12 00:00:00', 3),
(3, 'Emotions', 'IND', '2016-04-23 00:00:00', 3),
(3, 'discovery', 'UK', '2016-04-12 00:00:00', 4),
(3, 'experience', 'IND', '2016-04-12 00:00:00', 3)
;
Query SQL:
select c.user_id, c.category, s.suburb, d.dated, w.walk_time
from (
select user_id, left(group_concat(category order by cnt desc), locate(',', group_concat(category order by cnt desc)) - 1) as category
from (
select
user_id, category, count(1) as cnt
from table1
group by user_id, category
) t
group by user_id
) c
inner join (
select user_id, left(group_concat(suburb order by cnt desc), locate(',', group_concat(suburb order by cnt desc)) - 1) as suburb
from (
select
user_id, suburb, count(1) as cnt
from table1
group by user_id, suburb
) t
group by user_id
) s on c.user_id = s.user_id
inner join (
select user_id, left(group_concat(dated order by cnt desc), locate(',', group_concat(dated order by cnt desc)) - 1) as dated
from (
select
user_id, dated, count(1) as cnt
from table1
group by user_id, dated
) t
group by user_id
) d on c.user_id = d.user_id
inner join (
select user_id, left(group_concat(walk_time order by cnt desc), locate(',', group_concat(walk_time order by cnt desc)) - 1) as walk_time
from (
select
user_id, walk_time, count(1) as cnt
from table1
group by user_id, walk_time
) t
group by user_id
) w on c.user_id = w.user_id
Result:
| user_id | category | suburb | dated | walk_time |
+---------+------------+--------+---------------------+-----------+
| 1 | experience | US | 2016-04-09 00:00:00 | 5 |
| 2 | actions | IND | 2016-04-15 00:00:00 | 2 |
| 3 | discovery | IND | 2016-04-12 00:00:00 | 3 |

How would I return the result of SQL math operations?

So I was taking a test recently with some higher level SQL problems. I only have what I would consider "intermediate" experience in SQL and I've been working on this for a day or so now. I just can't figure it out.
Here's the problem:
You have a table with 4 columns as such:
EmployeeID int unique
EmployeeType int
EmployeeSalary int
Created date
Goal: I need to retrieve the difference between the latest two EmployeeSalary for any EmployeeType with more than 1 entry. It has to be done in one statement (nested queries are fine).
Example Data Set: http://sqlfiddle.com/#!9/0dfc7
EmployeeID | EmployeeType | EmployeeSalary | Created
-----------|--------------|----------------|--------------------
1 | 53 | 50 | 2015-11-15 00:00:00
2 | 66 | 20 | 2014-11-11 04:20:23
3 | 66 | 30 | 2015-11-03 08:26:21
4 | 66 | 10 | 2013-11-02 11:32:47
5 | 78 | 70 | 2009-11-08 04:47:47
6 | 78 | 45 | 2006-11-01 04:42:55
So for this data set, the proper return would be:
EmployeeType | EmployeeSalary
-------------|---------------
66 | 10
78 | 25
The 10 comes from subtracting the latest two EmployeeSalary values (30 - 20) for the EmployeeType of 66. The 25 comes from subtracting the latest two EmployeeSalary values (70 - 45) for the EmployeeType of 78. We skip EmployeeType 53 completely because it has only one entry.
This one has been destroying my brain. Any clues?
Thanks!
How to make really simple query complex?
One funny way (not the best performance) to do it is:
SELECT final.EmployeeType, SUM(salary) AS difference
FROM (
SELECT b.EmployeeType, b.EmployeeSalary AS salary
FROM tab b
JOIN (SELECT EmployeeType, GROUP_CONCAT(EmployeeSalary ORDER BY Created DESC) AS c
FROM tab
GROUP BY EmployeeType
HAVING COUNT(*) > 1) AS sub
ON b.EmployeeType = sub.EmployeeType
AND FIND_IN_SET(b.EmployeeSalary, sub.c) = 1
UNION ALL
SELECT b.EmployeeType, -b.EmployeeSalary AS salary
FROM tab b
JOIN (SELECT EmployeeType, GROUP_CONCAT(EmployeeSalary ORDER BY Created DESC) AS c
FROM tab
GROUP BY EmployeeType
HAVING COUNT(*) > 1) AS sub
ON b.EmployeeType = sub.EmployeeType
AND FIND_IN_SET(b.EmployeeSalary, sub.c) = 2
) AS final
GROUP BY final.EmployeeType;
SqlFiddleDemo
EDIT:
The key point is that MySQL (before version 8.0) doesn't support window functions, so you need to use equivalent code.
For example, a solution in SQL Server:
SELECT EmployeeType, SUM(CASE rn WHEN 1 THEN EmployeeSalary
ELSE -EmployeeSalary END) AS difference
FROM (SELECT *,
ROW_NUMBER() OVER(PARTITION BY EmployeeType ORDER BY Created DESC) AS rn
FROM #tab
) AS sub
WHERE rn IN (1,2)
GROUP BY EmployeeType
HAVING COUNT(EmployeeType) > 1
LiveDemo
And MySQL equivalent:
SELECT EmployeeType, SUM(CASE rn WHEN 1 THEN EmployeeSalary
ELSE -EmployeeSalary END) AS difference
FROM (
SELECT t1.EmployeeType, t1.EmployeeSalary,
count(t2.Created) + 1 as rn
FROM tab t1
LEFT JOIN tab t2
ON t1.EmployeeType = t2.EmployeeType
AND t1.Created < t2.Created
GROUP BY t1.EmployeeType, t1.EmployeeSalary
) AS sub
WHERE rn IN (1,2)
GROUP BY EmployeeType
HAVING COUNT(EmployeeType) > 1;
LiveDemo2
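On MySQL 8.0+, where window functions are available, the difference could also be sketched with LAG(). This assumes the table is named tab, as in the first query above; the aliases w, prev_salary, and rn are made up for illustration.

```sql
-- Requires MySQL 8.0+ (window functions).
SELECT EmployeeType,
       EmployeeSalary - prev_salary AS difference
FROM (
    SELECT EmployeeType, EmployeeSalary,
           -- Salary of the chronologically preceding row of the same type.
           LAG(EmployeeSalary) OVER w AS prev_salary,
           -- rn = 1 marks the most recent row of each type.
           ROW_NUMBER() OVER (PARTITION BY EmployeeType
                              ORDER BY Created DESC) AS rn
    FROM tab
    WINDOW w AS (PARTITION BY EmployeeType ORDER BY Created)
) AS sub
-- Keep only the latest row, and only for types with an earlier row.
WHERE rn = 1
  AND prev_salary IS NOT NULL;
```

Like the expected output in the question, this returns latest minus previous, so the result can be negative when the salary dropped.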
The dataset of the fiddle is different from the example above, which is confusing (not to mention a little perverse). Anyway, there's lots of ways to skin this particular cat. Here's one (not the fastest, however):
SELECT a.employeetype, ABS(a.employeesalary-b.employeesalary) diff
FROM
( SELECT x.*
, COUNT(*) rank
FROM employees x
JOIN employees y
ON y.employeetype = x.employeetype
AND y.created >= x.created
GROUP
BY x.employeetype
, x.created
) a
JOIN
( SELECT x.*
, COUNT(*) rank
FROM employees x
JOIN employees y
ON y.employeetype = x.employeetype
AND y.created >= x.created
GROUP
BY x.employeetype
, x.created
) b
ON b.employeetype = a.employeetype
AND b.rank = a.rank+1
WHERE a.rank = 1;
A very similar but faster solution looks like this (although you sometimes need to assign different variables between tables a and b, for reasons I still don't fully understand)...
SELECT a.employeetype
, ABS(a.employeesalary-b.employeesalary) diff
FROM
( SELECT x.*
, CASE WHEN @prev = x.employeetype THEN @i:=@i+1 ELSE @i:=1 END i
, @prev := x.employeetype prev
FROM employees x
, (SELECT @prev := 0, @i:=1) vars
ORDER
BY x.employeetype
, x.created DESC
) a
JOIN
( SELECT x.*
, CASE WHEN @prev = x.employeetype THEN @i:=@i+1 ELSE @i:=1 END i
, @prev := x.employeetype prev
FROM employees x
, (SELECT @prev := 0, @i:=1) vars
ORDER
BY x.employeetype
, x.created DESC
) b
ON b.employeetype = a.employeetype
AND b.i = a.i + 1
WHERE a.i = 1;

Difficult MySQL Query - Getting Max difference between dates

I have a MySQL table of the following form
account_id | call_date
1 2013-06-07
1 2013-06-09
1 2013-06-21
2 2012-05-01
2 2012-05-02
2 2012-05-06
I want to write a MySQL query that will get the maximum difference (in days) between successive dates in call_date for each account_id. So for the above example, the result of this query would be
account_id | max_diff
1 12
2 4
I'm not sure how to do this. Is this even possible to do in a MySQL query?
I can do datediff(max(call_date),min(call_date)) but this would ignore dates in between the first and last call dates. I need some way of getting the datediff() between each successive call_date for each account_id, then finding the maximum of those.
I'm sure fp's answer will be faster, but just for fun...
SELECT account_id
, MAX(diff) max_diff
FROM
( SELECT x.account_id
, DATEDIFF(MIN(y.call_date),x.call_date) diff
FROM my_table x
JOIN my_table y
ON y.account_id = x.account_id
AND y.call_date > x.call_date
GROUP
BY x.account_id
, x.call_date
) z
GROUP
BY account_id;
CREATE TABLE t
(`account_id` int, `call_date` date)
;
INSERT INTO t
(`account_id`, `call_date`)
VALUES
(1, '2013-06-07'),
(1, '2013-06-09'),
(1, '2013-06-21'),
(2, '2012-05-01'),
(2, '2012-05-02'),
(2, '2012-05-06')
;
select account_id, max(diff) from (
select
account_id,
timestampdiff(day, coalesce(@prev, call_date), call_date) diff,
@prev := call_date
from
t
, (select @prev:=null) v
order by account_id, call_date
) sq
group by account_id
| ACCOUNT_ID | MAX(DIFF) |
|------------|-----------|
| 1 | 12 |
| 2 | 4 |
see it working live in an sqlfiddle
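On MySQL 8.0+, the user variable could be replaced with LAG(), which also restarts cleanly for each account. A minimal sketch, assuming the table t created above:

```sql
-- Requires MySQL 8.0+ (window functions).
SELECT account_id, MAX(diff) AS max_diff
FROM (
    SELECT account_id,
           -- Days between this call and the previous call of the same
           -- account; NULL for the first call of each account.
           DATEDIFF(call_date,
                    LAG(call_date) OVER (PARTITION BY account_id
                                         ORDER BY call_date)) AS diff
    FROM t
) AS sq
GROUP BY account_id;
```

Accounts with a single call get a max_diff of NULL, because MAX() ignores NULL and no other value is left.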
If you have an index on account_id, call_date, then you can do this rather efficiently without variables:
select account_id, max(datediff(call_date, prev_call_date)) as diff
from (select t.*,
(select t2.call_date
from my_table t2
where t2.account_id = t.account_id and t2.call_date < t.call_date
order by t2.call_date desc
limit 1
) as prev_call_date
from my_table t
) t
group by account_id;
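The index this answer relies on could be created as follows. The table name my_table (borrowed from the first answer) and the index name idx_account_call_date are assumptions; substitute your own.

```sql
-- Composite index so the correlated subquery can find the previous
-- call_date for an account with an index lookup instead of a scan.
CREATE INDEX idx_account_call_date
    ON my_table (account_id, call_date);
```

Running EXPLAIN on the query afterwards is one way to check whether the optimizer actually uses the index.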
Just for educational purposes, doing it with JOIN:
SELECT t1.account_id,
MAX(DATEDIFF(t2.call_date, t1.call_date)) AS max_diff
FROM t t1
LEFT JOIN t t2
ON t2.account_id = t1.account_id
AND t2.call_date > t1.call_date
LEFT JOIN t t3
ON t3.account_id = t1.account_id
AND t3.call_date > t1.call_date
AND t3.call_date < t2.call_date
WHERE t3.account_id IS NULL
GROUP BY t1.account_id
Since you didn't specify, this shows max_diff of NULL for accounts with only 1 call.
SELECT a1.account_id , max(a1.call_date - a2.call_date)
FROM account a2, account a1
WHERE a1.account_id = a2.account_id
AND a1.call_date > a2.call_date
AND NOT EXISTS
(SELECT 1 FROM account a3 WHERE a3.account_id = a1.account_id AND a1.call_date > a3.call_date AND a2.call_date < a3.call_date)
GROUP BY a1.account_id
Which gives:
ACCOUNT_ID MAX(A1.CALL_DATE - A2.CALL_DATE)
1 12
2 4