SQL - Percentiles


I have one table:
country(ID, city, freg, counts, date)
I want to calculate the 90th percentile of counts in a specific interval of dates ($min and $max).
I've already done the same with the average (code below):
SELECT
AVG(counts)
FROM country
WHERE date >= @min AND date < @max
;
How can I calculate the 90th percentile instead of the average?

Finally, something GROUP_CONCAT is good for...
SELECT SUBSTRING_INDEX(
SUBSTRING_INDEX(
GROUP_CONCAT(ct.ctdivol ORDER BY ct.ctdivol SEPARATOR ','),',',90/100 * COUNT(*) + 1
),',',-1
) `90th Percentile`
FROM ct
JOIN exam e
ON e.examid = ct.examid
AND e.date BETWEEN @min AND @max
WHERE e.modality = 'ct';
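
To see why the nested SUBSTRING_INDEX calls pick out a single value, here is a minimal Python sketch of the same slicing. The helper `substring_index` is a hypothetical stand-in for the MySQL function (integer counts only), and the sample values are made up for illustration.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Rough stand-in for MySQL's SUBSTRING_INDEX, integer counts only."""
    parts = s.split(delim)
    if count > 0:
        # Keep everything to the left of the count-th delimiter.
        return delim.join(parts[:count])
    if count < 0:
        # Keep everything to the right of the count-th delimiter from the end.
        return delim.join(parts[count:])
    return ""


# Made-up sample: 20 values, already sorted the way
# GROUP_CONCAT(... ORDER BY ...) would emit them.
values = [5 * i for i in range(1, 21)]           # 5, 10, ..., 100
joined = ",".join(str(v) for v in values)

position = 90 * len(values) // 100 + 1           # 90/100 * COUNT(*) + 1
head = substring_index(joined, ",", position)    # the first `position` values
percentile_90 = substring_index(head, ",", -1)   # the last of those values
print(position, percentile_90)
```

Keep in mind that GROUP_CONCAT output is truncated at group_concat_max_len (1024 bytes by default), so the trick only suits small result sets unless that setting is raised.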

It appears doing it with a single query is not possible. At least not in MySQL.
You can do it in multiple queries:
1) Select how many rows satisfy your condition.
SELECT
COUNT(*)
FROM exam
INNER JOIN ct on exam.examID = ct.examID AND ct.ctdivol_mGy > 0
WHERE exam.modality = 'CT'
AND exam.date >= @min AND exam.date < @max
2) Check the percentile threshold by multiplying the number of rows by percentile/100. For example:
Number of rows in previous count: 200
Percentile: 90%
Number of rows to threshold: 200 * (90/100) = 180
3) Repeat the query, order by the column you want the percentile of, and use LIMIT with OFFSET to return only the row number found in step 2, like so:
SELECT
ct.ctdivol_mGy
FROM exam
INNER JOIN ct on exam.examID = ct.examID AND ct.ctdivol_mGy > 0
WHERE exam.modality = 'CT'
AND exam.date >= @min AND exam.date < @max
ORDER BY ct.ctdivol_mGy
LIMIT 1 OFFSET 179 -- skip 179 rows and take the next one, i.e. the 180th row
This returns the 180th value of the ordered rows, which is the 90th percentile you need.
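
The three steps above can be sketched in Python as follows. The function name `percentile_by_offset` and the sample data are hypothetical, and rounding the row number up when the product is not a whole number is an assumption; the answer only covers the whole-number case.

```python
def percentile_by_offset(values, percentile):
    """Count the rows, compute the threshold row, then pick that row."""
    ordered = sorted(values)                       # ORDER BY the value
    row_count = len(ordered)                       # step 1: COUNT(*)
    # Step 2: row_count * percentile / 100, rounded up (assumption).
    row_number = (row_count * percentile + 99) // 100
    if row_number < 1:
        raise ValueError("no rows to take a percentile from")
    # Step 3: LIMIT 1 OFFSET row_number - 1
    return ordered[row_number - 1]


sample = list(range(1, 201))                       # 200 made-up rows
print(percentile_by_offset(sample, 90))            # the 180th value
```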
Hope this helps!

Related

current average for each row of data

I have a table output as follows:
Date Output
1-Jan 20
2-Jan 40
3-Jan 30
4-Jan 100
5-Jan 120
6-Jan 10
7-Jan 90
8-Jan 80
9-Jan 60
...
31-Dec 120
I need to query the average for each date, where the average is the cumulative average of the values from the first date to the current date, as below:
Date Output Average
1-Jan 20 20
2-Jan 40 30
3-Jan 30 30
4-Jan 100 47.5
5-Jan 120 62
6-Jan 10 53.33
Can anyone help, please?

SELECT `date`, `output`,
(SELECT avg(`output`) from Table1 where Table1.`date` <= b.`date`)
as `average` FROM Table1 b

Axel's answer works. Alternatively, you can do it in a single query with variables:
set @count := 0;
set @total := 0;
select case when ((@count := @count + 1) and ((@total := @total + output) or 1))
then @total / @count
end rolling_average,
`date`,
`output`
from data
order by `date` asc
http://sqlfiddle.com/#!9/2e006/14
This avoids the dependent subquery, which depending on the size of your data may result in better performance.
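
As a rough illustration of what the two variables are doing, here is the same running total and running count in Python, using the first six values from the question. The variable names are only for illustration.

```python
from itertools import accumulate

# First six Output values from the question, in date order.
outputs = [20, 40, 30, 100, 120, 10]

# The running total plays the role of @total; the position of each
# row (starting at 1) plays the role of @count.
running_totals = list(accumulate(outputs))
running_averages = [
    total / count for count, total in enumerate(running_totals, start=1)
]

for value, average in zip(outputs, running_averages):
    print(value, round(average, 2))
```

Each row is visited once, which is why this approach tends to scale better than recomputing the average over all earlier rows for every row.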

Pala's idea is a good one. In addition to lacking the order by, it also fails if the cumulative sum were ever zero or if output were ever NULL. This can easily be fixed:
select `date`, `output`,
if((@count := @count + 1) is not null,
if((@total := @total + coalesce(output, 0)) is not null,
@total/@count, 0
), 0
) as running_average
from data cross join
(select @count := 0, @total := 0) init
order by date;

Here's another way, although Pala's method scales better...
SELECT x.*
, AVG(y.output) avg
FROM output x
JOIN output y
ON y.date <= x.date
GROUP
BY x.date
ORDER
BY x.date;
The ORDER BY clause is apparently necessary after version 5.5/5.6.

Grouping top N rows, but also needing to sum column values

I am using the following query to group the top N rows in my data set:
SELECT mgap_ska_id,mgap_ska_id_name, account_manager_id,
mgap_growth AS growth,mgap_recovery,
(mgap_growth+mgap_recovery) total
FROM
(SELECT mgap_ska_id,mgap_ska_id_name, account_manager_id, mgap_growth,
mgap_recovery,(mgap_growth+mgap_recovery) total,
@acid_rank := IF(@current_acid = account_manager_id, @acid_rank + 1, 1)
AS acid_rank,
@current_acid := account_manager_id
FROM mgap_orders
ORDER BY account_manager_id, mgap_growth DESC
) ranked
WHERE acid_rank <= 5
The result is VERY close to what I need, but I am having an aggregate issue that I need help with. I have attached a screenshot of my query results (I had to block out the customer names and IDs for privacy). The mgap_ska_id and account_manager_id are INT columns and mgap_ska_id_name is a VARCHAR.
In theory I need to SUM (I know it's an aggregate; that is the issue) multiple mgap_growth values while keeping the ranking intact.
If I GROUP BY, then I lose the top 5 ranking. Currently, the mgap_growth column holds only one value per mgap_ska_id; I need it to be the SUM of all mgap_growth values per mgap_ska_id while keeping the top five ranking as shown.
Thanks!

You can add the following line in your select fields:
(SELECT SUM(t.mgap_growth) FROM mgap_orders t WHERE t.mgap_ska_id = ranked.mgap_ska_id ) AS total_mgap_growth
So your code will be:
SELECT mgap_ska_id,mgap_ska_id_name, account_manager_id,
mgap_growth AS growth,mgap_recovery,
(mgap_growth+mgap_recovery) total,
(SELECT SUM(t.mgap_growth) FROM mgap_orders t WHERE t.mgap_ska_id = ranked.mgap_ska_id ) AS total_mgap_growth
FROM
(SELECT mgap_ska_id,mgap_ska_id_name, account_manager_id, mgap_growth,
mgap_recovery,(mgap_growth+mgap_recovery) total,
@acid_rank := IF(@current_acid = account_manager_id, @acid_rank + 1, 1)
AS acid_rank,
@current_acid := account_manager_id
FROM mgap_orders
ORDER BY account_manager_id, mgap_growth DESC
) ranked
WHERE acid_rank <= 5
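
The combination of per-manager ranking and a per-customer sum can be sketched in Python like this. The rows, the reduced column set, and the top-2 cutoff (instead of 5) are made up to keep the example short.

```python
from collections import defaultdict

# Hypothetical rows: (mgap_ska_id, account_manager_id, mgap_growth)
orders = [
    (1, 10, 50), (1, 10, 30), (2, 10, 70), (3, 10, 20),
    (4, 20, 90), (4, 20, 5), (5, 20, 40),
]
TOP_N = 2  # the query above uses 5

# Role of the correlated subquery: SUM(mgap_growth) per mgap_ska_id.
growth_by_ska = defaultdict(int)
for ska_id, _, growth in orders:
    growth_by_ska[ska_id] += growth

# Role of the inner query: ORDER BY account_manager_id, mgap_growth DESC,
# restarting the rank whenever the manager changes.
ranked = sorted(orders, key=lambda row: (row[1], -row[2]))
result = []
current_manager, rank = None, 0
for ska_id, manager_id, growth in ranked:
    rank = rank + 1 if manager_id == current_manager else 1
    current_manager = manager_id
    if rank <= TOP_N:
        result.append((ska_id, manager_id, growth, growth_by_ska[ska_id]))

for row in result:
    print(row)
```

Note that the ranking is still driven by the individual mgap_growth rows, as in the query above; the sum only appears as an extra column.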

GROUP_CONCAT or SUM with Count(*) inside and GROUP BY in MySQL

I currently have this query:
SELECT ((count(*) DIV 20) * 10) AS money_earned
FROM ticket
WHERE
closed_by = 269 AND
status = 1 AND
closed_index >= TO_DAYS("2012/01/01") AND closed_index <= TO_DAYS("2013/01/01")
GROUP BY closed_index;
It yields this:
money_earned
60
50
30
20
20
Is there any way to either sum these rows or concatenate them into a string? I have tried to use GROUP_CONCAT, but I get an "invalid use of group function" error.
Instead, I would like to yield the following in a single query if possible:
money_earned
180
or
money_earned
60,50,30,20,20

You can use your query as a subquery:
select sum(money_earned), group_concat(money_earned)
from (SELECT ((count(*) DIV 20) * 10) AS money_earned
FROM ticket
WHERE closed_by = 269 AND
status = 1 AND
closed_index >= TO_DAYS("2012/01/01") AND closed_index <= TO_DAYS("2013/01/01")
GROUP BY closed_index
) tci;
Because of the rounding, I would be wary of doing this in a single query.
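
One reading of the rounding caveat, sketched in Python with made-up per-day ticket counts chosen to reproduce the amounts in the question: applying DIV 20 per day and then summing is not the same as applying it once to the overall count.

```python
# Hypothetical COUNT(*) per closed_index (one entry per day).
tickets_per_day = [130, 110, 70, 50, 50]

# What the subquery does: integer-divide each day's count, then scale.
per_day = [(count // 20) * 10 for count in tickets_per_day]
summed = sum(per_day)                               # SUM(money_earned)
concatenated = ",".join(str(m) for m in per_day)    # GROUP_CONCAT(money_earned)

# What a single ungrouped query would compute instead.
all_at_once = (sum(tickets_per_day) // 20) * 10

print(summed, concatenated, all_at_once)
```

The leftover tickets that are dropped each day add up, so the two totals differ for this sample.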

Include an additional counter in the MySQL result set

Can I include an additional counter in a MySQL result set? I have the following query which gives me two columns back. I need an additional column (only in the result) indicating the row of each line in the result set.
select orderid, round(sum(unitprice * quantity),2) as value
from order_details
group by orderid
order by 2 desc
limit 10
I need something like the following:
10865 1 17250.00
11030 2 16321.90
10981 3 15810.00
10372 4 12281.20
10424 5 11493.20

Try this:
SET @counter = 0;
Select sub.*
FROM
(
select orderid, (@counter := @counter +1) as counter,
round(sum(unitprice * quantity),2) as value
from order_details
group by orderid
) sub
order by 2 desc

Try following
SET @counter = 0;
select orderid, (@counter := @counter + 1) as counter, round(sum(unitprice * quantity),2) as value
from order_details
group by orderid
order by 3 desc
limit 10
Hope it helps...

Based on the two answers I managed to get the following:
SET @counter = 0;
Select sub.orderid,sub.value,(@counter := @counter +1) as counter
FROM
(
select orderid,
round(sum(unitprice * quantity),2) as value
from order_details
group by orderid
) sub
order by 2 desc
limit 10
The original answers showed the IDs from the inner query resulting in larger ints with huge gaps. Using the modification I get just the '1 to x' range that I need for my pgfplots LaTeX plot.
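
The point of the modification is that the counter has to be assigned after the totals are aggregated and sorted. A small Python sketch of that ordering, using the totals from the question as if they were the output of the inner query:

```python
# Aggregated totals per order, as the inner query would return them
# (values taken from the example in the question, in arbitrary order).
order_totals = {
    10372: 12281.20,
    10865: 17250.00,
    10424: 11493.20,
    11030: 16321.90,
    10981: 15810.00,
}

# Sort by value descending first, then number the rows starting at 1.
ranked = sorted(order_totals.items(), key=lambda item: item[1], reverse=True)
numbered = [
    (order_id, counter, value)
    for counter, (order_id, value) in enumerate(ranked, start=1)
]

for row in numbered:
    print(row)
```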

Checking for maximum length of consecutive days which satisfy specific condition

I have a MySQL table with the structure:
beverages_log(id, users_id, beverages_id, timestamp)
I'm trying to compute the maximum streak of consecutive days during which a user (with id 1) logs a beverage (with id 1) at least 5 times each day. I'm pretty sure that this can be done using views as follows:
CREATE or REPLACE VIEW daycounts AS
SELECT count(*) AS n, DATE(timestamp) AS d FROM beverages_log
WHERE users_id = '1' AND beverages_id = 1 GROUP BY d;
CREATE or REPLACE VIEW t AS SELECT * FROM daycounts WHERE n >= 5;
SELECT MAX(streak) AS current FROM ( SELECT DATEDIFF(MIN(c.d), a.d)+1 AS streak
FROM t AS a LEFT JOIN t AS b ON a.d = ADDDATE(b.d,1)
LEFT JOIN t AS c ON a.d <= c.d
LEFT JOIN t AS d ON c.d = ADDDATE(d.d,-1)
WHERE b.d IS NULL AND c.d IS NOT NULL AND d.d IS NULL GROUP BY a.d) allstreaks;
However, repeatedly creating views for different users every time I run this check seems pretty inefficient. Is there a way in MySQL to perform this computation in a single query, without creating views or repeatedly calling the same subqueries a bunch of times?

This solution seems to perform quite well as long as there is a composite index on users_id and beverages_id -
SELECT *
FROM (
SELECT t.*, IF(@prev + INTERVAL 1 DAY = t.d, @c := @c + 1, @c := 1) AS streak, @prev := t.d
FROM (
SELECT DATE(timestamp) AS d, COUNT(*) AS n
FROM beverages_log
WHERE users_id = 1
AND beverages_id = 1
GROUP BY DATE(timestamp)
HAVING COUNT(*) >= 5
) AS t
INNER JOIN (SELECT @prev := NULL, @c := 1) AS vars
) AS t
ORDER BY streak DESC LIMIT 1;
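
The streak logic in this query can be sketched in Python as follows. The per-day counts are made up, and `previous` and `current` play the roles of the @prev and @c variables.

```python
from datetime import date, timedelta

# Hypothetical result of the inner GROUP BY: logs per day for one user
# and one beverage.
day_counts = {
    date(2013, 1, 1): 6,
    date(2013, 1, 2): 5,
    date(2013, 1, 3): 2,   # below the threshold, breaks the streak
    date(2013, 1, 4): 7,
    date(2013, 1, 5): 5,
    date(2013, 1, 6): 9,
    date(2013, 1, 8): 5,   # Jan 7 is missing, breaks the streak
}
MIN_PER_DAY = 5

# HAVING COUNT(*) >= 5, read in date order.
qualifying = sorted(d for d, n in day_counts.items() if n >= MIN_PER_DAY)

longest = current = 0
previous = None
for day in qualifying:
    if previous is not None and day - previous == timedelta(days=1):
        current += 1       # the day directly follows the previous one
    else:
        current = 1        # a gap: start a new streak
    longest = max(longest, current)
    previous = day

print(longest)
```

The Python version sorts the days explicitly; the SQL version depends on the derived table's rows being read in date order.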

Why not include user_id in the daycounts view and group by user_id and date?
Also include user_id in view t.
Then, when you are querying against t, add the user_id to the WHERE clause.
That way you don't have to recreate your views for every single user; you just need to remember to include the user in your WHERE clause.

That's a little tricky. I'd start with a view to summarize events by day:
CREATE VIEW BView AS
SELECT UserID, BevID, CAST(EventDateTime AS DATE) AS EventDate, COUNT(*) AS NumEvents
FROM beverages_log
GROUP BY UserID, BevID, CAST(EventDateTime AS DATE)
I'd then use a Dates table (just a table with one row per day; very handy to have) to examine all possible date ranges and throw out any with a gap. This will probably be slow as hell, but it's a start:
SELECT
UserID, BevID, MAX(StreakLength) AS StreakLength
FROM
(
SELECT
B1.UserID, B1.BevID, B1.EventDate AS StreakStart, DATEDIFF(EndDate.Date, StartDate.Date) AS StreakLength
FROM
BView AS B1
INNER JOIN Dates AS StartDate ON B1.EventDate = StartDate.Date
INNER JOIN Dates AS EndDate ON EndDate.Date > StartDate.Date
WHERE
B1.NumEvents >= 5
-- Exclude this potential streak if there's a day with no activity
AND NOT EXISTS (SELECT * FROM Dates AS MissedDay WHERE MissedDay.Date > StartDate.Date AND MissedDay.Date <= EndDate.Date AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND MissedDay.Date = B2.EventDate))
-- Exclude this potential streak if there's a day with less than five events
AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND B2.EventDate > StartDate.Date AND B2.EventDate <= EndDate.Date AND B2.NumEvents < 5)
) AS X
GROUP BY
UserID, BevID
