How to calculate the median properly in MySQL 5.7

This is my fiddle: https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=7946871d9c25cd8914353c70fde1fe8d
This is my query:
select count(user_id) as itung, user_Id from
(SELECT t1.user_id,
t1.createdAt cretecompare1,
t2.createdAt cretecompare2,
DATEDIFF(t2.createdAt, t1.createdAt) diff
-- table for a transaction
FROM test t1
-- table for prev. transaction
JOIN test t2 ON t1.user_id = t2.user_id
AND t1.createdAt < t2.createdAt
AND 7 NOT IN (t1.status_id, t2.status_id)
JOIN (SELECT t3.user_id
FROM test t3
WHERE t3.status_id != 7
GROUP BY t3.user_id
HAVING SUM(t3.createdAt < '2020-04-01') > 1
AND SUM(t3.createdAt BETWEEN '2020-02-01' AND '2020-04-01')) t4 ON t1.user_id = t4.user_id
WHERE NOT EXISTS (SELECT NULL
FROM test t5
WHERE t1.user_id = t5.user_id
AND t5.status_id != 7
AND t1.createdAt < t5.createdAt
AND t5.createdAt < t2.createdAt)
HAViNG cretecompare2 BETWEEN '2020-02-01' AND '2020-04-01') aa
group by user_Id
output table:
+--------+---------+
| itung | user_Id |
+--------+---------+
| 1 | 13 |
| 2 | 14 |
+--------+---------+
Based on that table, I want to find max(itung), min(itung), and the median of itung with this query:
select max(itung), min(itung), format(avg(itung), 2), IF(count(*)%2 = 1, CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(itung ORDER BY itung SEPARATOR ',')
, ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL), ROUND((CAST(SUBSTRING_INDEX(SUBSTRING_INDEX
( GROUP_CONCAT(itung ORDER BY itung SEPARATOR ','), ',', 50/100
* COUNT(*) + 1), ',', -1) AS DECIMAL) + CAST(SUBSTRING_INDEX(SUBSTRING_INDEX
( GROUP_CONCAT(itung ORDER BY itung SEPARATOR ','), ',', 50/100
* COUNT(*)), ',', -1) AS DECIMAL)) / 2)) as median from
(select count(user_id) as itung, user_Id from
(SELECT t1.user_id,
t1.createdAt cretecompare1,
t2.createdAt cretecompare2,
DATEDIFF(t2.createdAt, t1.createdAt) diff
-- table for a transaction
FROM test t1
-- table for prev. transaction
JOIN test t2 ON t1.user_id = t2.user_id
AND t1.createdAt < t2.createdAt
AND 7 NOT IN (t1.status_id, t2.status_id)
JOIN (SELECT t3.user_id
FROM test t3
WHERE t3.status_id != 7
GROUP BY t3.user_id
HAVING SUM(t3.createdAt < '2020-04-01') > 1
AND SUM(t3.createdAt BETWEEN '2020-02-01' AND '2020-04-01')) t4 ON t1.user_id = t4.user_id
WHERE NOT EXISTS (SELECT NULL
FROM test t5
WHERE t1.user_id = t5.user_id
AND t5.status_id != 7
AND t1.createdAt < t5.createdAt
AND t5.createdAt < t2.createdAt)
HAViNG cretecompare2 BETWEEN '2020-02-01' AND '2020-04-01') aa
group by user_Id) ab
output table:
+------------+------------+-----------------------+--------+
| max(itung) | min(itung) | format(avg(itung), 2) | median |
+------------+------------+-----------------------+--------+
| 2 | 1 | 1.50 | 2 |
+------------+------------+-----------------------+--------+
The median is wrong here: it should be 1.5, not 2. What is wrong with my median query?

You have ROUND() there to round the reported median to an integer. If you don't want that, remove it:
select max(itung), min(itung), format(avg(itung), 2), IF(count(*)%2 = 1, CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(itung ORDER BY itung SEPARATOR ',') , ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL), (CAST(SUBSTRING_INDEX(SUBSTRING_INDEX ( GROUP_CONCAT(itung ORDER BY itung SEPARATOR ','), ',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) + CAST(SUBSTRING_INDEX(SUBSTRING_INDEX ( GROUP_CONCAT(itung ORDER BY itung SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL)) / 2) as median
Or keep ROUND() and pass the number of decimal places to round to, here 3:
select max(itung), min(itung), format(avg(itung), 2), IF(count(*)%2 = 1, CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(itung ORDER BY itung SEPARATOR ',') , ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL), ROUND((CAST(SUBSTRING_INDEX(SUBSTRING_INDEX ( GROUP_CONCAT(itung ORDER BY itung SEPARATOR ','), ',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) + CAST(SUBSTRING_INDEX(SUBSTRING_INDEX ( GROUP_CONCAT(itung ORDER BY itung SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL)) / 2, 3)) as median
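To see the effect in isolation, here is a minimal sketch that applies the same CAST / ROUND pattern to the two middle values from the question (1 and 2), written as literals instead of being picked out of the real subquery. The column aliases are made up for illustration.

```sql
-- '2' and '1' stand in for the two middle entries of the GROUP_CONCAT list.
SELECT
  -- ROUND() with no second argument rounds to an integer: 2
  ROUND((CAST('2' AS DECIMAL) + CAST('1' AS DECIMAL)) / 2)    AS rounded_to_integer,
  -- without ROUND() the average keeps its fraction: 1.5 (shown as 1.5000 by default)
  (CAST('2' AS DECIMAL) + CAST('1' AS DECIMAL)) / 2           AS not_rounded,
  -- ROUND(x, 3) keeps three decimal places: 1.500
  ROUND((CAST('2' AS DECIMAL) + CAST('1' AS DECIMAL)) / 2, 3) AS rounded_to_3_places;
```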
Be aware that looking up the median values from a GROUP_CONCAT comma-separated list of all the values only works if there are not too many rows, since the GROUP_CONCAT result is truncated at @@group_concat_max_len, which defaults to 1024 bytes on MySQL and on MariaDB before 10.2.
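If the list might get longer than the limit, the limit can be inspected and raised for the current session. This is only a sketch; the value 1000000 is an arbitrary example, not a recommendation.

```sql
-- Show the current limit (in bytes) for this session.
SELECT @@group_concat_max_len;

-- Raise it for this session only; other connections keep their own value.
SET SESSION group_concat_max_len = 1000000;
```

When the result is truncated, MySQL raises a warning rather than an error, so running SHOW WARNINGS after the median query is a cheap way to check.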

Related

How to calculate percentiles in MySQL with decimals?

I want to archive daily price statistics in MySQL 5.6, including the 25th/50th/75th percentiles. The formulas I found seem to work, but the results are rounded to integers.
How can I get the 25th/50th/75th percentiles of price with two decimal places of precision?
INSERT INTO prices_daily
(DATE, ID, PRICE_MIN, PRICE_MAX, PRICE_AVG, `PRICE_25PZTL`, `PRICE_50PZTL`, `PRICE_75PZTL`, STRENGTH)
SELECT
DATE,
ID,
ROUND(MIN(price),2) AS PRICE_MIN,
ROUND(MAX(price),2) AS PRICE_MAX,
ROUND(AVG(price),2) AS PRICE_AVG,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(price ORDER BY price SEPARATOR ','), ',', 25/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS 'PRICE_25PZTL',
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(price ORDER BY price SEPARATOR ','), ',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS 'PRICE_50PZTL',
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(price ORDER BY price SEPARATOR ','), ',', 75/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS 'PRICE_75PZTL',
count(ID) AS STRENGTH
FROM
`articles`
WHERE
DATE = CURDATE()
GROUP BY
DATE,
ID
ORDER BY
STRENGTH DESC
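One likely cause, offered as an observation rather than a confirmed diagnosis: in MySQL, CAST(... AS DECIMAL) without a precision and scale means DECIMAL(10,0), so the fractional part is rounded away. A small sketch with a literal price (the value and aliases are invented for illustration):

```sql
SELECT
  -- no scale given: treated as DECIMAL(10,0), result 12
  CAST('12.34' AS DECIMAL)       AS no_scale,
  -- explicit scale of 2: result 12.34
  CAST('12.34' AS DECIMAL(10,2)) AS two_places;
```

If that is what happens here, writing AS DECIMAL(10,2) in the three percentile expressions should keep two decimals; the precision of 10 is an assumed value and should match the price column.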

Mysql query with loop

I have this query; it works fine for viewing and for CSV export from phpMyAdmin.
Is it possible to do this with a loop instead of repeating the SELECT? Thanks!
SELECT
id, date, name,
SUBSTRING_INDEX(SUBSTRING_INDEX(message, '-', 1), '(', -1) AS op,
SUBSTRING_INDEX(SUBSTRING_INDEX(message, '-', 4), '-', -3) AS dt,
SUBSTRING_INDEX(SUBSTRING_INDEX(message, ')', 1), '-', -1) AS hour,
SUBSTRING_INDEX(SUBSTRING_INDEX(message, '(', 2), '-', -1) AS note
FROM center
WHERE center.date BETWEEN '2019-08-01 00:00:00' AND '2019-12-31 00:00:00'
and message!= ''
HAVING op = 'op1' OR op = 'op2'
UNION SELECT
id, date, name,
SUBSTRING_INDEX(SUBSTRING_INDEX(message, '-', 6), '(', -1) AS op,
SUBSTRING_INDEX(SUBSTRING_INDEX(message, '-', 9), '-', -3) AS dt,
SUBSTRING_INDEX(SUBSTRING_INDEX(message, ')', 2), '-', -1) AS hour,
SUBSTRING_INDEX(SUBSTRING_INDEX(message, '(', 3), '-', -1) AS note
FROM center
WHERE center.date BETWEEN '2019-08-01 00:00:00' AND '2019-12-31 00:00:00'
and message!= ''
HAVING op = 'op1' OR op = 'op2'
UNION SELECT.... more
You can test this query. It splits out a maximum of 10 pieces from each row.
SELECT `id`,`date`,`name`,CONCAT('op',`op`) as op,`dt`,`hour`,`note`
,subid,cols -- only for test. you can remove this line
FROM (
SELECT c.id,c.date,c.name,
cnt.*,
-- count the pieces in one row
(LENGTH(message)-LENGTH(replace(message,'(op','')))/3 as cols,
-- Split the string into pieces and store the current piece in @content
@content := SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(' (op ',c.message,' (op'), ' (op', subid+3), ' (op', -1)
, SUBSTRING_INDEX(@content, ' - ',1) as op
, SUBSTRING_INDEX( SUBSTRING_INDEX(@content, ' - ',2), ' - ',-1) as dt
, TRIM( TRAILING ')' FROM SUBSTRING_INDEX( SUBSTRING_INDEX(@content, ' - ',3), ' - ',-1)) as hour
, SUBSTRING_INDEX( SUBSTRING_INDEX(@content, ' - ',4), ' - ',-1) as note
FROM center c
CROSS JOIN (
SELECT 0 as subid UNION ALL SELECT 1 UNION ALL SELECT 2
UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
UNION ALL SELECT 9
) as cnt
) as result
WHERE
subid < cols
AND `date` BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'
AND op in (1,2)
ORDER BY id,subid,cols;
Here is a sample: http://www.sqlfiddle.com/#!9/8bc3b4/60
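The two building blocks of that query, counting the pieces and picking the n-th field with nested SUBSTRING_INDEX calls, can be tried on a single string. This is only a sketch: the message below is a hypothetical example, and the real format in the center table may differ.

```sql
-- Hypothetical message with two '(op...' pieces.
SET @msg = '(op1 - 2019-08-01 - 10:00) first note (op2 - 2019-08-02 - 11:30) second note';

SELECT
  -- every '(op' that REPLACE removes shortens the string by 3 characters,
  -- so the length difference divided by 3 is the number of pieces: 2
  (LENGTH(@msg) - LENGTH(REPLACE(@msg, '(op', ''))) / 3 AS pieces,
  -- keep everything before the 2nd ' - ', then take the last part of that:
  -- '2019-08-01'
  SUBSTRING_INDEX(SUBSTRING_INDEX(@msg, ' - ', 2), ' - ', -1) AS second_field;
```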

Delete all records having value 5x lower than the second highest value

I have a table (prices) with two fields, code (char) and price (decimal). Among records with the same code, I need to find all records whose price is 5x lower (or less) than the two highest prices for that code.
E.g., in this case I want to delete id=1:
id code price
1 1001 10
2 1001 101
3 1001 40
4 1001 201
5 1002 122
6 1002 50
DELETE
FROM myTable
WHERE ID IN (
SELECT *
FROM (
SELECT t2.id
FROM myTable t2
WHERE EXISTS (
SELECT 1
FROM myTable t3
WHERE t3.code = t2.code
AND t3.price > t2.price * 5
HAVING COUNT(*) > 1
)
) t
)
;
My approach:
DELETE t
FROM t
JOIN (SELECT code,
REVERSE(SUBSTRING_INDEX(REVERSE(SUBSTRING_INDEX(
GROUP_CONCAT(price ORDER BY price DESC SEPARATOR ';'), ';', 2)), ';',1))
AS second_price
FROM t
GROUP BY code) s
ON t.code = s.code
AND t.price * 5 < s.second_price;
It is based on selecting second price:
SELECT code,
REVERSE(SUBSTRING_INDEX(REVERSE(SUBSTRING_INDEX(
GROUP_CONCAT(price ORDER BY price DESC SEPARATOR ';'), ';', 2)), ';',1))
AS second_price
FROM t
GROUP BY code;
EDIT:
Actually it could be much easier:
DELETE t
FROM t
JOIN (SELECT code,
SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(price ORDER BY price DESC SEPARATOR ';'),
';', 2), ';',-1) AS second_price
FROM t
GROUP BY code) s
ON t.code = s.code
AND t.price * 5 < s.second_price;
Good and fast solution, but honestly, I can not clearly understand it.
It is very easy to follow:
SELECT code,
GROUP_CONCAT(price ORDER BY price DESC SEPARATOR ';'),
SUBSTRING_INDEX(GROUP_CONCAT(price ORDER BY price DESC SEPARATOR ';'), ';', 2),
SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(price ORDER BY price DESC SEPARATOR ';'),
';', 2), ';',-1)
FROM t
GROUP BY code;
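Applied to the sample rows from the question, the steps look like this. It is a sketch that uses an inline derived table in place of the real table, and the column aliases are made up for illustration.

```sql
SELECT code,
       -- all prices of the code, highest first
       GROUP_CONCAT(price ORDER BY price DESC SEPARATOR ';') AS all_prices,
       -- keep only the first two entries of that list
       SUBSTRING_INDEX(GROUP_CONCAT(price ORDER BY price DESC SEPARATOR ';'), ';', 2) AS top_two,
       -- take the last of those two entries, i.e. the second-highest price
       SUBSTRING_INDEX(SUBSTRING_INDEX(
           GROUP_CONCAT(price ORDER BY price DESC SEPARATOR ';'),
           ';', 2), ';', -1) AS second_price
FROM (SELECT 1 AS id, '1001' AS code, 10 AS price
      UNION ALL SELECT 2, '1001', 101
      UNION ALL SELECT 3, '1001', 40
      UNION ALL SELECT 4, '1001', 201
      UNION ALL SELECT 5, '1002', 122
      UNION ALL SELECT 6, '1002', 50) t
GROUP BY code;
-- code 1001: all_prices '201;101;40;10', top_two '201;101', second_price '101'
-- code 1002: all_prices '122;50',        top_two '122;50',  second_price '50'
```

Only id=1 satisfies price * 5 < second_price (10 * 5 = 50 < 101), which matches the row the question wants deleted.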

mysql query to display row data in one column

I wrote a query like this:
SELECT
SUBSTRING_INDEX(inter.CHR_SKILLLEVELS, ',', 1) AS level1,
IF(@num_lines > 1, SUBSTRING_INDEX(SUBSTRING_INDEX(inter.CHR_SKILLLEVELS, ',', 2), ',', -1), '') AS level2,
IF(@num_lines > 2, SUBSTRING_INDEX(SUBSTRING_INDEX(inter.CHR_SKILLLEVELS, ',', 3), ',', -1), '') AS level3,
IF(@num_lines > 3, SUBSTRING_INDEX(SUBSTRING_INDEX(inter.CHR_SKILLLEVELS, ',', 4), ',', -1), '') AS level4,
IF(@num_lines > 4, SUBSTRING_INDEX(SUBSTRING_INDEX(inter.CHR_SKILLLEVELS, ',', 5), ',', -1), '') AS level5
FROM hrm_t_interview inter,
(SELECT @num_lines := 1 + LENGTH(CHR_SKILLLEVELS) - LENGTH(REPLACE(CHR_SKILLLEVELS, ',', '')) FROM hrm_t_interview WHERE INT_APPLICANTID=15) temp
WHERE inter.INT_APPLICANTID=15
It displays the values like this:
level1 | level2 | level3
======================================
4 | 3 | 5
I want to display values like
column1 | column2
========================
level1 | 4
level2 | 3
level3 | 5
Please help me do this in MySQL.
A crude way would be to use multiple unioned queries, something like this:
SELECT 'level1' AS column1, SUBSTRING_INDEX(inter.CHR_SKILLLEVELS, ',', 1) AS column2
FROM hrm_t_interview inter
INNER JOIN
(
SELECT INT_APPLICANTID, 1 + LENGTH(CHR_SKILLLEVELS) - LENGTH(REPLACE(CHR_SKILLLEVELS, ',', '')) AS num_lines
FROM hrm_t_interview
GROUP BY INT_APPLICANTID
HAVING num_lines > 0
) temp
ON inter.INT_APPLICANTID = temp.INT_APPLICANTID
WHERE inter.INT_APPLICANTID=15
UNION
SELECT 'level2' AS column1, SUBSTRING_INDEX(SUBSTRING_INDEX(inter.CHR_SKILLLEVELS, ',', 2), ',', -1) AS column2
FROM hrm_t_interview inter
INNER JOIN
(
SELECT INT_APPLICANTID, 1 + LENGTH(CHR_SKILLLEVELS) - LENGTH(REPLACE(CHR_SKILLLEVELS, ',', '')) AS num_lines
FROM hrm_t_interview
GROUP BY INT_APPLICANTID
HAVING num_lines > 1
) temp
ON inter.INT_APPLICANTID = temp.INT_APPLICANTID
WHERE inter.INT_APPLICANTID=15
UNION
SELECT 'level3' AS column1, SUBSTRING_INDEX(SUBSTRING_INDEX(inter.CHR_SKILLLEVELS, ',', 3), ',', -1) AS column2
FROM hrm_t_interview inter
INNER JOIN
(
SELECT INT_APPLICANTID, 1 + LENGTH(CHR_SKILLLEVELS) - LENGTH(REPLACE(CHR_SKILLLEVELS, ',', '')) AS num_lines
FROM hrm_t_interview
GROUP BY INT_APPLICANTID
HAVING num_lines > 2
) temp
ON inter.INT_APPLICANTID = temp.INT_APPLICANTID
WHERE inter.INT_APPLICANTID=15
UNION
SELECT 'level4' AS column1, SUBSTRING_INDEX(SUBSTRING_INDEX(inter.CHR_SKILLLEVELS, ',', 4), ',', -1) AS column2
FROM hrm_t_interview inter
INNER JOIN
(
SELECT INT_APPLICANTID, 1 + LENGTH(CHR_SKILLLEVELS) - LENGTH(REPLACE(CHR_SKILLLEVELS, ',', '')) AS num_lines
FROM hrm_t_interview
GROUP BY INT_APPLICANTID
HAVING num_lines > 3
) temp
ON inter.INT_APPLICANTID = temp.INT_APPLICANTID
WHERE inter.INT_APPLICANTID=15
UNION
SELECT 'level5' AS column1, SUBSTRING_INDEX(SUBSTRING_INDEX(inter.CHR_SKILLLEVELS, ',', 5), ',', -1) AS column2
FROM hrm_t_interview inter
INNER JOIN
(
SELECT INT_APPLICANTID, 1 + LENGTH(CHR_SKILLLEVELS) - LENGTH(REPLACE(CHR_SKILLLEVELS, ',', '')) AS num_lines
FROM hrm_t_interview
GROUP BY INT_APPLICANTID
HAVING num_lines > 4
) temp
ON inter.INT_APPLICANTID = temp.INT_APPLICANTID
WHERE inter.INT_APPLICANTID=15
A slightly more elegant approach is to generate a range of rows, one for each possible delimited value.
Not tested, but something like this:
SELECT CONCAT('level', temp2.iCnt) AS column1, SUBSTRING_INDEX(SUBSTRING_INDEX(inter.CHR_SKILLLEVELS, ',', temp2.iCnt), ',', -1) AS column2
FROM hrm_t_interview inter
INNER JOIN
(
-- INT_APPLICANTID has to be selected here so the join condition below can use it
SELECT INT_APPLICANTID, 1 + LENGTH(CHR_SKILLLEVELS) - LENGTH(REPLACE(CHR_SKILLLEVELS, ',', '')) AS num_lines
FROM hrm_t_interview
GROUP BY INT_APPLICANTID
) temp
ON inter.INT_APPLICANTID = temp.INT_APPLICANTID
INNER JOIN
(
-- positions start at 1; SUBSTRING_INDEX with a count of 0 returns an empty string
SELECT 1 AS iCnt UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5
) temp2
ON temp.num_lines >= temp2.iCnt
WHERE inter.INT_APPLICANTID=15
If you can paste up some test data I will check the sql.
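In the absence of test data, the row-generator idea can at least be checked against a literal. The sketch below assumes a skill-level string of '4,3,5' (taken from the output shown in the question) in place of the real hrm_t_interview row; the alias names s and n are invented.

```sql
SELECT CONCAT('level', n.iCnt) AS column1,
       -- keep the first iCnt items, then take the last one of those
       SUBSTRING_INDEX(SUBSTRING_INDEX(s.levels, ',', n.iCnt), ',', -1) AS column2
FROM (SELECT '4,3,5' AS levels) s
JOIN (SELECT 1 AS iCnt UNION ALL SELECT 2 UNION ALL SELECT 3
      UNION ALL SELECT 4 UNION ALL SELECT 5) n
  -- number of items = number of commas + 1; skip positions beyond that
  ON n.iCnt <= 1 + LENGTH(s.levels) - LENGTH(REPLACE(s.levels, ',', ''))
ORDER BY n.iCnt;
-- level1 | 4
-- level2 | 3
-- level3 | 5
```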

Separate group_concat() result

I am trying to get the 5 latest order times for each customer using:
SET SESSION group_concat_max_len = 99;
select o.customer_id, substring_index(m.orders,',', 1) as order1,
(case when numc >=2 then substring_index(substring_index(m.orders, ',', 2), ',', -1)end) as order2,
(case when numc >=3 then substring_index(substring_index(m.orders, ',', 3), ',', -1)end) as order3,
(case when numc >=4 then substring_index(substring_index(m.orders, ',', 4), ',', -1)end) as order4,
(case when numc >=5 then substring_index(substring_index(m.orders, ',', 5), ',', -1)end) as order5
from orders o,
(select group_concat(date order by date desc) as orders, count(*) as numc
FROM orders) m
where country_id='27'
group by customer_id
But it returns sysdate for all customers.
What am I doing wrong?
I'm not sure what you mean by "sysdate for all customers". But, the join is not doing what you expect.
Just do the aggregation once:
select o.customer_id, substring_index(group_concat(date order by date desc),',', 1) as order1,
(case when count(*) >=2 then substring_index(substring_index(group_concat(date order by date desc), ',', 2), ',', -1)end) as order2,
(case when count(*) >=3 then substring_index(substring_index(group_concat(date order by date desc), ',', 3), ',', -1)end) as order3,
(case when count(*) >=4 then substring_index(substring_index(group_concat(date order by date desc), ',', 4), ',', -1)end) as order4,
(case when count(*) >=5 then substring_index(substring_index(group_concat(date order by date desc), ',', 5), ',', -1)end) as order5
from orders o
where country_id='27'
group by customer_id
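To check the pattern without the real orders table, here is a cut-down sketch with only two order columns and a few made-up rows in a derived table; the customer ids and dates are invented for illustration.

```sql
SELECT customer_id,
       -- latest order: first entry of the descending list
       SUBSTRING_INDEX(GROUP_CONCAT(`date` ORDER BY `date` DESC), ',', 1) AS order1,
       -- second latest: only defined when the customer has at least 2 orders
       CASE WHEN COUNT(*) >= 2
            THEN SUBSTRING_INDEX(SUBSTRING_INDEX(
                   GROUP_CONCAT(`date` ORDER BY `date` DESC), ',', 2), ',', -1)
       END AS order2
FROM (SELECT 1 AS customer_id, '2020-01-03' AS `date`
      UNION ALL SELECT 1, '2020-01-05'
      UNION ALL SELECT 2, '2020-01-04') o
GROUP BY customer_id;
-- customer 1: order1 = 2020-01-05, order2 = 2020-01-03
-- customer 2: order1 = 2020-01-04, order2 = NULL
```

Because the aggregation happens inside GROUP BY customer_id, each customer gets a list built from its own rows, which is what the cross join in the question did not do.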