I need to create a view to get the sum of the count column and display the average as a new column. I used the code below:
select
count(`t`.`surg_priority`) AS `Surgery_Count`,
`t`.`surg_priority` AS `Surgery_Type`
from
`DataBase`.`booking` t
group by
`t`.`surg_priority`
This was the result. I need a new column called average that gives each surgery type's share of the total count:
Surgery Average = (Surgery Count / Sum of Surgery Count) * 100
I also tried
select
count(`t`.`surg_priority`) AS `Surgery_Count`,
`t`.`surg_priority` AS `Surgery_Type`,
(
(count(`t`.`surg_priority`)/(sum(Surgery_Count))
)* 100 AS `Surgery_AVG`
from
`DataBase`.`orbkn_booking` t
group by
`t`.`surg_priority`
This didn't work either. It has to be a view, so I can't use variables or cumulative functions.
You can compute the total count in a subquery and divide the Surgery_Count by that:
select
count(`t`.`surg_priority`) AS `Surgery_Count`,
`t`.`surg_priority` AS `Surgery_Type`,
100.0 * count(`t`.`surg_priority`) /
(select count(`surg_priority`) from `DataBase`.`booking`) AS `Surgery_Avg`
from
`DataBase`.`booking` t
group by
`t`.`surg_priority`
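A minimal sketch of the same idea wrapped in a view, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL; the table contents and the view name surgery_share are hypothetical. The 100.0 factor matters more in SQLite than in MySQL: SQLite truncates integer / integer division, while MySQL's / already returns a decimal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for `DataBase`.`booking`.
conn.execute("CREATE TABLE booking (surg_priority TEXT)")
conn.executemany(
    "INSERT INTO booking (surg_priority) VALUES (?)",
    [("Elective",)] * 3 + [("Emergency",)] * 1,
)

# Each group's count is divided by the total count from a scalar subquery.
conn.execute("""
    CREATE VIEW surgery_share AS
    SELECT t.surg_priority AS Surgery_Type,
           COUNT(t.surg_priority) AS Surgery_Count,
           100.0 * COUNT(t.surg_priority)
               / (SELECT COUNT(surg_priority) FROM booking) AS Surgery_Avg
    FROM booking t
    GROUP BY t.surg_priority
""")

rows = conn.execute(
    "SELECT Surgery_Type, Surgery_Count, Surgery_Avg FROM surgery_share ORDER BY Surgery_Type"
).fetchall()
print(rows)  # [('Elective', 3, 75.0), ('Emergency', 1, 25.0)]
```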
Use window functions:
select b.surg_priority as Surgery_Type, count(*) as surgery_count,
count(*) * 100.0 / sum(count(*)) over () as ratio
from DataBase.booking b
group by b.surg_priority;
Window functions have been available since MySQL 8.0.
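A small sketch checking that the window-function version returns the same numbers as a scalar-subquery version, again assuming SQLite from Python as a stand-in (window functions need SQLite 3.25+, just as they need MySQL 8.0+); the sample rows are hypothetical. Because window functions are evaluated after GROUP BY, SUM(COUNT(*)) OVER () adds the per-group counts into one grand total that appears on every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE booking (surg_priority TEXT)")
conn.executemany(
    "INSERT INTO booking (surg_priority) VALUES (?)",
    [("Elective",)] * 3 + [("Emergency",)] * 1,
)

# SUM(COUNT(*)) OVER () sums the grouped counts across all groups.
window_rows = conn.execute("""
    SELECT b.surg_priority AS Surgery_Type,
           COUNT(*) AS surgery_count,
           COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS ratio
    FROM booking b
    GROUP BY b.surg_priority
    ORDER BY Surgery_Type
""").fetchall()

# Same result through a scalar subquery, for comparison.
subquery_rows = conn.execute("""
    SELECT surg_priority, COUNT(*),
           COUNT(*) * 100.0 / (SELECT COUNT(*) FROM booking)
    FROM booking
    GROUP BY surg_priority
    ORDER BY surg_priority
""").fetchall()

print(window_rows)  # [('Elective', 3, 75.0), ('Emergency', 1, 25.0)]
```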
Also, don't clutter your queries with backticks. They just make queries harder to write and read.
I have a table named "Table1" in MySQL. I need to find the sum of the mean and the standard deviation of column "Open". I did it easily in Python, but I can't do it in SQL.
Select * from BANKNIFTY_cal_spread;
Date                | Current | Next     | difference
2021-09-03 00:00:00 | 36914.8 | 37043.95 | 129.14999999999418
2021-09-06 00:00:00 | 36734   | 36869.15 | 135.15000000000146
2021-09-07 00:00:00 | 36572.9 | 36710.65 | 137.75
2021-09-08 00:00:00 | 36945   | 37065    | 120
2021-09-09 00:00:00 | 36770   | 36895.1  | 125.09999999999854
Python code:
# df is a pandas DataFrame holding the BANKNIFTY_cal_spread rows.
nf_fut_mean = round(df['difference'].mean())
print(f"NF Future Mean: {nf_fut_mean}")
nf_fut_std = round(df['difference'].std())
print(f"NF Future Standard Deviation: {nf_fut_std}")
upper_range = round((nf_fut_mean + nf_fut_std))
lower_range = round((nf_fut_mean - nf_fut_std))
I searched for an SQL solution but didn't find one. I tried building a query, but it doesn't show correct results in the Grafana alerting query builder.
For now, I added mean, std dev, upper_range and lower_range columns using a pandas DataFrame and pushed them to the MySQL table.
@Booboo,
After removing Date from the SQL query, it shows correct results in two columns, average + std_deviation and average - std_deviation:
select average + std_deviation, average - std_deviation from (
select avg(difference) as average, stddev_pop(difference) as std_deviation from BANKNIFTY_cal_spread
) sq
It looks as though the sample you're using for the MEAN, STDDEV, etc. aggregations is the entire table, in which case you have to drop the DATE field from the query's result set.
You could also define the baseline query as a CTE (Common Table Expression) with a WITH clause instead of a subquery (CTEs require MySQL 8.0 or later), and then apply the subsequent processing:
WITH BN_CTE AS
(
select avg(difference) as average, stddev_pop(difference) as std_deviation from BANKNIFTY_cal_spread
)
select average + std_deviation, average - std_deviation from BN_CTE;
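A sketch of the CTE version run end to end, assuming SQLite from Python as a stand-in. SQLite has no built-in STDDEV_POP, so the code registers a hypothetical Python aggregate under that name (through sqlite3's create_aggregate) that mimics MySQL's population standard deviation; the five rows are the difference values shown in the question.

```python
import sqlite3
import statistics


class StddevPop:
    """Hypothetical stand-in for MySQL's STDDEV_POP (SQLite has none built in)."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:  # ignore NULLs, as SQL aggregates do
            self.values.append(value)

    def finalize(self):
        # Population standard deviation: divides by n, not n - 1.
        return statistics.pstdev(self.values) if self.values else None


differences = [129.14999999999418, 135.15000000000146, 137.75, 120.0, 125.09999999999854]

conn = sqlite3.connect(":memory:")
conn.create_aggregate("stddev_pop", 1, StddevPop)
conn.execute("CREATE TABLE BANKNIFTY_cal_spread (difference REAL)")
conn.executemany(
    "INSERT INTO BANKNIFTY_cal_spread (difference) VALUES (?)",
    [(d,) for d in differences],
)

# The CTE computes average and std_deviation once; the outer SELECT reuses them.
upper, lower = conn.execute("""
    WITH BN_CTE AS (
        SELECT avg(difference) AS average,
               stddev_pop(difference) AS std_deviation
        FROM BANKNIFTY_cal_spread
    )
    SELECT average + std_deviation, average - std_deviation FROM BN_CTE
""").fetchone()
print(round(upper), round(lower))  # 136 123
```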
Since the data you posted has only a single Open value for any given Date value, your standard deviation should be 0 (and the average just that single value).
I have difficulty understanding your SQL, since I cannot see how it relates to finding the sum (and presumably the difference, which you also seem to want) of the average and standard deviation of column Open in table Table1. Going only by your English-language description of what you are trying to do and your definition of table Table1, the following should work. Because we want both the sum and the difference of two values that are not trivial to calculate, we should calculate those two values only once:
select Date, average + std_deviation, average - std_deviation from (
select Date, avg(Open) as average, stddev_pop(Open) as std_deviation from Table1
group by Date
) sq
order by Date
Note that I am using column aliases in the subquery that do not conflict with built-in MySQL function names.
SQL does not let you calculate something in the SELECT clause and then use it in the same SELECT clause. (Yes, @variables allow this in limited cases, but that won't work for aggregates in the way hinted at in the question.)
Either repeat the expressions:
SELECT AVG(difference) AS mean,
AVG(difference) + STDDEV_POP(difference) AS "mean+sigma",
AVG(difference) - STDDEV_POP(difference) AS "mean-sigma"
FROM BANKNIFTY_cal_spread;
Or use a subquery to call the functions only once:
SELECT mean, mean-sigma, mean+sigma
FROM ( SELECT
AVG(difference) AS mean,
stddev_pop(difference) AS sigma
FROM BANKNIFTY_cal_spread
) AS x;
I expect the timings to be similar.
And, as already mentioned, avoid using aliases that are identical to function names, etc.
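One thing worth checking when comparing against the Python numbers: pandas' Series.std() defaults to ddof=1 (sample standard deviation, dividing by n - 1), which corresponds to MySQL's STDDEV_SAMP, while STDDEV_POP divides by n. A small sketch using Python's statistics module (stdev is the n - 1 version, pstdev the n version) on the five difference values from the question shows the two can round differently:

```python
import statistics

# The five `difference` values shown in the question.
differences = [129.14999999999418, 135.15000000000146, 137.75, 120, 125.09999999999854]

mean = round(statistics.mean(differences))
sample_sd = round(statistics.stdev(differences))       # n - 1: pandas .std(), STDDEV_SAMP
population_sd = round(statistics.pstdev(differences))  # n: STDDEV_POP

print(mean, sample_sd, population_sd)  # 129 7 6
print("sample range:", mean - sample_sd, mean + sample_sd)              # 122 136
print("population range:", mean - population_sd, mean + population_sd)  # 123 135
```

So if the SQL results should match the pandas code exactly, STDDEV_SAMP is the closer equivalent.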
I have two queries that calculate some attributes from the table 'agg_table'. The second one finds the median value grouped by msgdate. My expected output should have these 5 fields:
msgdate, avg-Total, avg-duration, stddev and median. Currently I combine them with UNION, which works fine. I will run this query in AWS Athena. Because the second query (for the median) reads agg_data again, the amount of data scanned is doubled: say the input data size is 4 MB; on the Athena history page I can see 8 MB scanned.
I want to avoid the second scan to save cost. Can you please help me achieve this by reading the agg_data table only once?
Query 1: To calculate avg-Total,avg-duration,stddev
SELECT b.msgdate1 as msgdate,ROUND(b.avrg,3) AS "avg-Total",
ROUND(AVG(b.duration),3) AS "avg-duration",ROUND(b.stdv,3) AS stddev
FROM
(
SELECT AVG(a2.duration) OVER(PARTITION BY a2.msgdate) AS avrg, a2.duration as duration,a2.msgdate msgdate1,
CASE
WHEN stddev(a2.duration) OVER(PARTITION BY a2.msgdate) IS NULL THEN 0
ELSE stddev(a2.duration) OVER(PARTITION BY a2.msgdate)
END AS stdv
FROM (
agg_data
) a2
) AS b
GROUP BY b.msgdate1, b.avrg, b.stdv
Query 2: To calculate median
WITH RankedTable AS
(
SELECT msgdate, duration,
ROW_NUMBER() OVER (PARTITION BY msgdate ORDER BY duration) AS Rnk,
COUNT(*) OVER (PARTITION BY msgdate) AS Cnt
FROM agg_data
)
SELECT msgdate,duration as median
FROM RankedTable
WHERE Rnk = Cnt / 2 + 1 or Cnt=1
I'm sure there's some trick that could do what you ask, but it will not be the easiest thing to do with all the window functions – combining these is always complicated.
If you can live with an approximation you could use the approx_percentile function – approx_percentile(column, 0.5) will be an approximation of the median. This can be used in your first query, avoiding the need for the second.
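If an exact median is required, one possible approach (a sketch, not tested on Athena, whose dialect differs) is to reference agg_data only once, inside a CTE, and compute the median alongside the other aggregates with conditional aggregation over the ranked rows. It keeps the question's median rule (row Cnt / 2 + 1, i.e. the upper middle row for even counts; Cnt = 1 already gives row 1). The sketch uses SQLite from Python as a stand-in with hypothetical rows and omits the standard deviation, because SQLite has no built-in one; whether Athena then reports a single scan should be confirmed in its query history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agg_data (msgdate TEXT, duration REAL)")
conn.executemany(
    "INSERT INTO agg_data (msgdate, duration) VALUES (?, ?)",
    [
        ("2021-01-01", 1.0), ("2021-01-01", 3.0), ("2021-01-01", 2.0),
        ("2021-01-02", 4.0), ("2021-01-02", 8.0),
        ("2021-01-03", 5.0),
    ],
)

rows = conn.execute("""
    WITH ranked AS (
        SELECT msgdate, duration,
               ROW_NUMBER() OVER (PARTITION BY msgdate ORDER BY duration) AS rnk,
               COUNT(*) OVER (PARTITION BY msgdate) AS cnt
        FROM agg_data  -- the only reference to the source table
    )
    SELECT msgdate,
           ROUND(AVG(duration), 3) AS avg_duration,
           -- same rule as the original median query: row cnt / 2 + 1
           MAX(CASE WHEN rnk = cnt / 2 + 1 THEN duration END) AS median
    FROM ranked
    GROUP BY msgdate
    ORDER BY msgdate
""").fetchall()
print(rows)  # [('2021-01-01', 2.0, 2.0), ('2021-01-02', 6.0, 8.0), ('2021-01-03', 5.0, 5.0)]
```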
MySQL version 8.0
Hi, say I have a table that looks like this, which I got from a GROUP BY count operation:
select number
, count(number) as `count`
FROM (select *
, CASE
WHEN column = 0 Then 0 Else 1
END AS number
FROM table) t1
GROUP BY number;
This results in:
number count
0 100
1 900
Now for each number I want to add a column that gives corresponding percentage.
Desired:
number count percentage
0 100 10
1 900 90
Thanks in advance!
If you are running MySQL 8.0, you can use window functions:
select
t.*,
count / sum(count) over() ratio
from mytable t
In earlier versions, an option uses a subquery:
select
t.*,
count / (select sum(count) from mytable) ratio
from mytable t
This gives you a ratio between 0 and 1; you can multiply it by 100 if you want a percentage.
Note that if you are getting your original result set from a query, it can very likely be optimized further. You might want to ask a new question, disclosing your original table(s) and query.
In terms of your original query, it would be something like:
select number, count(*), count(*) / sum(count(*)) over () as ratio
from t
group by number;
If you want a percentage rather than a ratio, multiply by 100.
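Applied to the CASE-based bucketing from the question, the count and the percentage can come from one query. A sketch assuming SQLite from Python as a stand-in; the table readings and column val are hypothetical replacements for the placeholder names table and column (both reserved words), and the 100/900 split mirrors the counts in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: 100 rows where the column is 0, 900 where it is not.
conn.execute("CREATE TABLE readings (val INTEGER)")
conn.executemany(
    "INSERT INTO readings (val) VALUES (?)",
    [(0,)] * 100 + [(7,)] * 900,
)

# Bucket in the subquery, count per bucket, then divide by the grand total.
rows = conn.execute("""
    SELECT number,
           COUNT(*) AS cnt,
           100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS percentage
    FROM (SELECT CASE WHEN val = 0 THEN 0 ELSE 1 END AS number FROM readings) t1
    GROUP BY number
    ORDER BY number
""").fetchall()
print(rows)  # [(0, 100, 10.0), (1, 900, 90.0)]
```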
I'm building a query that searches through a Medicare database listing how much doctors charge for various procedures.
Ideally, this query would:
Return every record, meaning every procedure for every doctor. (I'll add filtering WHERE clauses later)
Return the average amount doctors charge for each procedure
Return the percentage difference between the average cost and what each individual doctor charges
Return the average of all those percentage differences for each doctor, generating a meta cost-differential score.
With the query below, I've been able to achieve everything but the last goal.
SELECT medicare.*,
peerAverage.average AS charge_average,
( medicare.average_submitted_chrg_amt - peerAverage.average ) /
peerAverage.average * 100 AS difference_from_average,
Avg( ( medicare.average_submitted_chrg_amt - peerAverage.average ) /
peerAverage.average * 100 ) as total_difference_from_average
FROM medicare
JOIN (SELECT Avg(average_submitted_chrg_amt) AS average,
procedure_code
FROM medicare
GROUP BY procedure_code) AS peerAverage
ON medicare.procedure_code = peerAverage.procedure_code
ORDER BY procedure_code ASC,
difference_from_average DESC
When I add the final SELECT condition (Avg( ( medicare.average_submitted_chrg_amt - peerAverage.average ) / peerAverage.average * 100 ) as total_difference_from_average), the query only returns one record.
Delete that condition and the query returns the correct number of records. What am I doing wrong?
Aggregate functions move the aggregation level up. Until you specify grouping conditions (GROUP BY) for the AVG() function, it aggregates over all values returned by the expression and always returns a single row.
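A window function is one way to get a per-doctor average without collapsing the rows, since AVG(...) OVER (PARTITION BY ...) returns a value on every row instead of one row per group (MySQL 8.0+). This is a swapped-in technique, not something the answer above spells out. The sketch assumes SQLite from Python as a stand-in and a hypothetical doctor_id column, since the question does not name the column that identifies a doctor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal medicare table; doctor_id is an assumed column name.
conn.execute("""
    CREATE TABLE medicare (
        doctor_id TEXT,
        procedure_code TEXT,
        average_submitted_chrg_amt REAL
    )
""")
conn.executemany(
    "INSERT INTO medicare VALUES (?, ?, ?)",
    [
        ("A", "P1", 150.0), ("B", "P1", 50.0),  # P1 average: 100
        ("A", "P2", 25.0), ("B", "P2", 15.0),   # P2 average: 20
    ],
)

rows = conn.execute("""
    WITH diffs AS (
        SELECT m.doctor_id, m.procedure_code,
               (m.average_submitted_chrg_amt - p.average) / p.average * 100
                   AS difference_from_average
        FROM medicare m
        JOIN (SELECT procedure_code, AVG(average_submitted_chrg_amt) AS average
              FROM medicare GROUP BY procedure_code) AS p
          ON m.procedure_code = p.procedure_code
    )
    SELECT doctor_id, procedure_code, difference_from_average,
           -- window AVG: one value per doctor, every row is kept
           AVG(difference_from_average) OVER (PARTITION BY doctor_id)
               AS total_difference_from_average
    FROM diffs
    ORDER BY doctor_id, procedure_code
""").fetchall()
print(rows)
```

All four input rows come back, each carrying its own doctor's average difference.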
I'm trying to calculate the percentage of customers who live within a specified range (<= 10 miles) out of the total number of customers; the data comes from the same table.
Customer_Loc (table name)
Cust_ID | Rd_dist (column names)
I've tried the query below, which returns syntax errors.
select count(Rd_dist) / count (Cust_ID)
from customer_loc
where Rd_Dist <=10 *100 as percentage
I realise the solution to this may be fairly easy but I'm new to SQL and it's had me stuck for ages.
The problem with your query is that you are filtering out all the customers who are more than 10 miles away. You need conditional aggregation, and this is very easy in MySQL:
select (sum(Rd_Dist <= 10) / count(*)) * 100 as percentage
from customer_loc;
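A quick sketch of the conditional aggregation, assuming SQLite from Python as a stand-in with hypothetical rows. Like MySQL, SQLite evaluates Rd_dist <= 10 to 1 or 0, so SUM counts the nearby customers while COUNT(*) still sees everyone; the 100.0 factor avoids SQLite's integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_loc (Cust_ID INTEGER, Rd_dist REAL)")
conn.executemany(
    "INSERT INTO customer_loc VALUES (?, ?)",
    [(1, 2.5), (2, 10.0), (3, 14.0), (4, 30.0)],
)

# No WHERE clause: the condition lives inside SUM instead of filtering rows.
(percentage,) = conn.execute("""
    SELECT 100.0 * SUM(Rd_dist <= 10) / COUNT(*) AS percentage
    FROM customer_loc
""").fetchone()
print(percentage)  # 50.0
```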