How to compute the standard deviation with a "number" column? - mysql

I have this table in MySQL :
value number_ads
1 3
2 1
3 1
3 1
4 1
I would like to compute the standard deviation of the column value, but taking into account that the value 1 for example should be counted 3 times.
The result should be :
AVG = 2.1429 STD = 1.124858267715973
I tried the following query, but it does not give the correct result:
SELECT
SUM(value * number_ads) / SUM(number_ads) AS avg,
SQRT((SUM(POW(value, 2)) - POW(2.1429, 2))/SUM(number_ads))
FROM `test`

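Calculate the square root of the variance. The variance is the difference between the mean of the squared values and the square of the mean, i.e. Sum(x * x) / Count(n) - Mean * Mean. Since each value counts number_ads times, both sums are weighted by number_ads.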
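SELECT
SUM(value * number_ads) / SUM(number_ads) AS avg,
-- a column alias cannot be reused in the same SELECT list, so the mean is written out again
SQRT(SUM(POW(value, 2) * number_ads) / SUM(number_ads) - POW(SUM(value * number_ads) / SUM(number_ads), 2)) AS std
FROM `test`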
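As a cross-check of the formula, here is a minimal Python sketch using the sample rows from the question. The function name `weighted_mean_std` is only illustrative; it computes the population (not sample) standard deviation, which is what the expected result in the question corresponds to.

```python
import math

# Sample rows from the question: (value, number_ads)
rows = [(1, 3), (2, 1), (3, 1), (3, 1), (4, 1)]

def weighted_mean_std(pairs):
    """Population mean and standard deviation, counting each value `weight` times."""
    total_weight = sum(w for _, w in pairs)
    mean = sum(v * w for v, w in pairs) / total_weight
    mean_of_squares = sum(v * v * w for v, w in pairs) / total_weight
    variance = mean_of_squares - mean * mean  # E[x^2] - (E[x])^2
    return mean, math.sqrt(variance)

mean, std = weighted_mean_std(rows)
print(round(mean, 4), round(std, 6))  # 2.1429 1.124858
```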

Related

SQL - Comparing difference between values in same column

I need some help with how to compare values in the same column LogNum to find 'unusual' entries. For example, in my table below LogTbl we can see that on ID number 4 the LogNum entry jumps massively compared to the previous pattern of entries.
How can I compare these LogNum entries and identify/output any that have increased by say more than 5% from the previous entry, using LogDate to age the entries?
ID LogDate LogNum
1 2006-05-26 00:00:00.000 112
2 2006-07-19 00:00:00.000 145
3 2006-09-08 00:00:00.000 162
4 2006-11-01 00:00:00.000 1787
Thanks.
There are no formal criteria for your words 'massive' and 'unusual'. However, I suppose you could select the records where LogNum is within the bounds (LogNum >= AVG(LogNum) - 2 * STDDEV(LogNum)) AND (LogNum <= AVG(LogNum) + 2 * STDDEV(LogNum)); anything outside those bounds can be treated as unusual.
There is a wide range of interpretations of your requirements. One possible idea would be to identify a threshold using an average or standard deviation and filter the rows that exceed the threshold.
with a as (
select *, avg(lognum) over() threshold
from t
)
select *
from a
where lognum > threshold
If you are only interested in differences between adjacent rows, you could use LEAD, i.e. find the rows whose next value (ordered by LogDate) is more than 25% greater than the current one:
select Id, LogDate, Lognum
from (
select *, Lead(lognum) over(order by logdate) nxt
from t
)t
where nxt > lognum * 1.25
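The adjacent-row comparison can also be sketched in Python on the sample data from the question. Note one deliberate difference from the query above: this sketch reports the later row of each pair (the entry that jumped), which is closer to what the question asks for. The function name `flag_jumps` and its `threshold` parameter are illustrative assumptions.

```python
from datetime import date

# Sample rows from the question: (ID, LogDate, LogNum)
log_rows = [
    (1, date(2006, 5, 26), 112),
    (2, date(2006, 7, 19), 145),
    (3, date(2006, 9, 8), 162),
    (4, date(2006, 11, 1), 1787),
]

def flag_jumps(rows, threshold=0.05):
    """Return IDs whose LogNum exceeds the previous entry's by more than `threshold`."""
    ordered = sorted(rows, key=lambda r: r[1])  # order by LogDate
    flagged = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[2] > prev[2] * (1 + threshold):
            flagged.append(cur[0])
    return flagged

print(flag_jumps(log_rows))        # 5% threshold  -> [2, 3, 4]
print(flag_jumps(log_rows, 0.25))  # 25% threshold -> [2, 4]
```

With a 5% threshold every entry after the first is flagged, so for this data a larger threshold is probably needed to isolate ID 4.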

Creating buckets to categorize the values using MySQL

Hi, I am trying to create buckets for a very large number of rows. I have a maximum value of 9759721 and a minimum value of 1006909. I would like to show the results as follows:
distance bucket range
1006909 0 1000000 - 1009999
1013525 1 1010000 - 1019999
1021948 2 1020000 - 1029999
The table might not be very clear, but in general I would like to break the values down in steps of 10000, creating a new bucket every 10000 starting from 1000000.
I tried the following code but it doesn't show the correct output.
select distance,floor(distance/10000) as _floor from data;
I got something like:
distance bucket
1006909 100
1013525 101
1021948 102
1035472 103
1042069 104
9759721 975
This seems to be correct, but I need the bucket to start from 0 and then change every 10000, and I need a range column as well. The minimum value I have for distance is 1006909, so the data doesn't start at 0. Is it still possible to have a bucket column starting from 0 (i.e. assigned to the minimum distance)?
SELECT
d.distance,
DENSE_RANK() OVER (ORDER BY d._floor) - 1 AS bucket,
d._floor * 10000 AS bucket_lower_limit,
d._floor * 10000 + 10000 AS bucket_upper_limit
FROM
(
SELECT
distance,
FLOOR(distance / 10000) AS _floor
FROM
data
)
AS d
NOTE: this will give buckets numbered from 0 upwards, but will also remove all gaps (such that your sample data will have bucket 5 for the last row, not bucket 975).
Alternatively, if you need to preserve the gaps...
SELECT
d.distance,
d._floor - MIN(d._floor) OVER () AS bucket,
d._floor * 10000 AS bucket_lower_limit,
d._floor * 10000 + 10000 AS bucket_upper_limit
FROM
(
SELECT
distance,
FLOOR(distance / 10000) AS _floor
FROM
data
)
AS d
Just calculate 1006909 div 10000 * 10000 = 1000000 and subtract it from distance. That'll make the buckets start from 0:
SELECT distance
, (distance - a) div 10000 AS bucket
, distance div 10000 * 10000 AS range_from
, distance div 10000 * 10000 + (10000 - 1) AS range_to
FROM t
CROSS JOIN (
SELECT MIN(distance) div 10000 * 10000 AS a
FROM t
) AS x
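The gap-preserving variant can be sketched in Python as follows, using the sample distances from the question. The names `bucketize` and `WIDTH` are illustrative; the upper bound uses `width - 1` so that the ranges match the `1000000 - 1009999` style requested in the question.

```python
# Sample distances from the question
distances = [1006909, 1013525, 1021948, 1035472, 1042069, 9759721]
WIDTH = 10000

def bucketize(values, width=WIDTH):
    """Return (distance, bucket, range_from, range_to), with bucket 0 at the minimum."""
    base = min(values) // width  # floor bucket of the smallest value
    result = []
    for v in values:
        floor = v // width
        result.append((v, floor - base, floor * width, floor * width + width - 1))
    return result

for row in bucketize(distances):
    print(row)
```

Because gaps are preserved, the last sample row lands in bucket 875 (975 - 100) rather than 5.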

How to get previous row value that is a mathematical argument with alias in mysql query?

I am having trouble getting the previous row's value in a query. The query has a field percent that is computed as ROUND(COUNT(DISTINCT userSteps.userid) / (SELECT COUNT(*) FROM userSteps) * 100.0, 2) AS percent
The idea is that the delta field holds the difference between the percent of the current row and the percent of the previous row. On the first row it should be 0, since there is no row before it. Any ideas?
I was planning to have (percent - <previous of percent value>) as delta
I tried to use LAG(), but it gave me an "unknown column" error.
SET @startDate='';
SET @endDate='';
SET @version='';
WITH userSteps AS
(SELECT ... )
SELECT steps.version as version,
steps.step as step,
steps.stepDesc as stepDesc,
COUNT(DISTINCT userSteps.userid) AS numUsers,
ROUND(COUNT(DISTINCT userSteps.userid) / (SELECT COUNT(*) FROM userSteps) * 100.0, 2) AS percent,
LAG(percent, 1) OVER (
PARTITION BY steps.version
ORDER BY steps.step
) delta
FROM
(SELECT ...) steps,
userSteps
WHERE userSteps.version = steps.version AND userSteps.maxStep >= steps.step
GROUP BY steps.version, steps.step;
The part I added that causes the error is:
LAG(percent, 1) OVER (
PARTITION BY steps.version
ORDER BY steps.step
) delta
I expect the result to look like this:
version step stepdesc numUsers percent delta
1 1 .. 10 100 0
1 2 .. 5 98 2
1 3 .. 3 92 6
1 4 .. 4 90 2
1 5 .. 8 80 10
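For reference, the delta column in the expected output is the previous row's percent minus the current row's, with 0 on the first row. A minimal Python sketch of that calculation, using the percent values from the table above (the function name `deltas` is illustrative):

```python
# percent values per step for version 1, taken from the expected output above
percents = [100, 98, 92, 90, 80]

def deltas(values):
    """Previous value minus current value; 0 for the first row."""
    return [0] + [prev - cur for prev, cur in zip(values, values[1:])]

print(deltas(percents))  # [0, 2, 6, 2, 10]
```

On the SQL side, a likely cause of the "unknown column" error is that MySQL does not let one expression in a SELECT list refer to an alias (percent) defined in the same list. One possible approach, not confirmed by the thread, is to compute percent in a derived table or CTE and apply LAG() to it in an outer query.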

how to fix query mysql with multiple sum

I have a query that builds its columns from SUM functions:
ROUND(((nominal)*12) * ROUND((SUM((a.NCI)/3*(60/100))+SUM((b.NSI)/3*(40/100)))/3,2)/100,2) AS nominal_persentasi,
ROUND((((nominal)*12) * ROUND((SUM((a.NCI)/3*(60/100))+SUM((b.NSI)/3*(40/100)))/3,2))*(1.1/100)/100,2) AS tambah_persentasi,
ROUND((((nominal)*12) * ROUND((SUM((a.NCI)/3*(60/100))+SUM((b.NSI)/3*(40/100)))/3,2))+((((nominal)*12) * ROUND((SUM((a.NCI)/3*(60/100))+SUM((b.NSI)/3*(40/100)))/3,2))*(1.1/100))/100,2) AS total_penyesuaian
And the results are:
nominal_persentasi | tambah_persentasi | total_penyesuaian
12.000 3.000 1.203.000
The total produced should be 15,000. Why did this happen?
I tried to add nominal_persentasi + tambah_persentasi directly, but the result is 0.
You are missing a division by 100 in your total. Hence, instead of adding 12,000 and 3,000 to get 15,000, you were actually adding 1,200,000 and 3,000 to get 1,203,000.
SELECT ROUND(( (nominal)*12) * ROUND((SUM((a.NCI)/3*(60/100))+SUM((b.NSI)/3*(40/100)))/3,2)/100,2) AS nominal_persentasi,
ROUND((((nominal)*12) * ROUND((SUM((a.NCI)/3*(60/100))+SUM((b.NSI)/3*(40/100)))/3,2))*(1.1/100)/100,2) AS tambah_persentasi,
ROUND((((nominal)*12) * ROUND((SUM((a.NCI)/3*(60/100))+SUM((b.NSI)/3*(40/100)))/3,2)/100) + ((((nominal)*12) * ROUND((SUM((a.NCI)/3*(60/100))+SUM((b.NSI)/3*(40/100)))/3,2))*(1.1/100))/100, 2) AS total_penyesuaian
FROM yourTable -- the first term of total_penyesuaian in your query was missing this division by 100
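The effect of the missing division can be checked with the numbers reported in the question. This is only an arithmetic sketch in Python; `base` is a hypothetical name for the shared sub-expression before it is divided by 100.

```python
# Values reported in the question (both already divided by 100)
nominal_persentasi = 12_000
tambah_persentasi = 3_000

# Hypothetical name for the shared sub-expression before its division by 100
base = nominal_persentasi * 100

original_total = base + tambah_persentasi         # first term not divided by 100
corrected_total = base / 100 + tambah_persentasi  # both terms on the same scale

print(original_total, corrected_total)  # 1203000 15000.0
```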

finding the percentage using case statement

The following SQL query:
SELECT
SUM(aol_int) AS AOL,
SUM(android_phone_int) AS Android_Phone,
SUM(androidTablet_int) AS Android_Tablet
FROM mytable;
gives me the totals for the AOL, Android_Phone and Android_Tablet columns.
However, I am trying to get the percentage of each of AOL, Android_Phone and Android_Tablet, so I rewrote the query as follows:
SELECT
SUM(CASE WHEN 'aol_int' THEN 100 END) / Count(aol_int) AS AOL,
SUM(CASE WHEN 'android_phone_int' THEN 100 END) / Count(android_phone_int) AS Android_Phone,
SUM(CASE WHEN 'androidTablet_int' THEN 100 END) / Count(androidTablet_int) AS Android_Tablet
FROM mytable;
And I am getting NULL in each column. What am I doing wrong here?
If those 3 add up to 100 percent, then you can simply divide each sum by the total sum to get the percentages. I get the idea that you want to see 20% as "20.00" as opposed to "0.2", so in my example I'll multiply the ratio by 100 and round it to a decimal precision of 2, but it's easy for you to change those as you see fit.
select
round((sum(aol_int) /
(sum(aol_int + android_phone_int + androidtablet_int))) *
100,2) as aol_percentage,
round((sum(android_phone_int) /
(sum(aol_int + android_phone_int + androidtablet_int))) *
100,2) as android_phone_percentage,
round((sum(androidtablet_int) /
(sum(aol_int + android_phone_int + androidtablet_int))) *
100,2) as android_tablet_percentage
from mytable
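The same share-of-total calculation, sketched in Python. The totals below are made-up illustrative numbers standing in for SUM(aol_int), SUM(android_phone_int) and SUM(androidtablet_int); they are not data from the question, and the function name `percentages` is likewise an assumption.

```python
# Hypothetical column totals (illustrative values only, not from the question)
totals = {"aol": 20, "android_phone": 50, "android_tablet": 30}

def percentages(sums):
    """Each column's share of the grand total, as a percentage rounded to 2 decimals."""
    grand_total = sum(sums.values())
    return {name: round(value / grand_total * 100, 2) for name, value in sums.items()}

print(percentages(totals))
```

Like the query, this assumes the three columns together make up the whole; if the grand total can be zero, the division needs a guard.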