How to get relative counts/frequency in MySQL with a single query

I want to get the relative counts (frequencies) of the values in a column (there can be many distinct values).
From this toy table numbers:
num
1
2
3
1
1
2
1
0
I want to get this:
num | count
0 | 0.125
1 | 0.5
2 | 0.25
3 | 0.125
I can do this with a variable and two queries:
SET @total = (SELECT COUNT(*) FROM numbers);
SELECT num, ROUND(COUNT(*) / @total, 3) AS count
FROM numbers
GROUP BY num
ORDER BY num ASC;
But how can I get the results in one query (without listing all the possible values of num)?
If I am querying joins of several tables, even getting the total number of rows becomes quite long and ugly.

EDIT: This was tested in MS SQL Server; I misread the question!
You can try this:
--DROP TABLE numbers
CREATE TABLE numbers(num decimal(16,3))
INSERT INTO numbers VALUES(1)
INSERT INTO numbers VALUES(2)
INSERT INTO numbers VALUES(3)
INSERT INTO numbers VALUES(1)
INSERT INTO numbers VALUES(1)
INSERT INTO numbers VALUES(2)
INSERT INTO numbers VALUES(1)
INSERT INTO numbers VALUES(0)
SELECT
num,
CAST(numCount as DECIMAL(16,2)) / CAST(sum(numCount) over() AS decimal(16,2)) frequency
FROM (
SELECT
num,
count(num) numCount
FROM
numbers
GROUP BY
num
) numbers
num frequency
0.000 0.1250000000000000000
1.000 0.5000000000000000000
2.000 0.2500000000000000000
3.000 0.1250000000000000000

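You can use window functions (the [count] bracket quoting below is SQL Server syntax; MySQL 8.0+ supports the same window functions but quotes identifiers with backticks):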
SELECT DISTINCT num,
ROUND(CAST(COUNT(1) OVER (Partition by num) AS DECIMAL) / CAST(COUNT(1)OVER() AS DECIMAL),3) AS [count]
FROM numbers
ORDER BY num ASC
COUNT(num) would give the same results here (as long as num contains no NULLs). Counting a supplied value per row rather than the value in each row is just my personal preference; the partitioning decides which rows are included in the count.
Note that the counts need to be cast as decimal; otherwise the division will be integer division, giving you wrong numbers.
Using DISTINCT instead of GROUP BY lets your window functions apply to the whole table, not just to each group within it, while still returning only one result per num.
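Here is a minimal runnable sketch of the window-function idea. It uses Python's built-in sqlite3 module as a stand-in for the database; SQLite 3.25+ is needed for window functions, and MySQL supports the same OVER clauses from 8.0. The helper names make_numbers_db, frequencies_window and frequencies_without_cast are made up for this illustration. SQLite, like SQL Server, truncates int/int, so the second query shows what happens without the cast. In MySQL the / operator always returns a decimal, and only DIV performs integer division.

```python
import sqlite3


def make_numbers_db():
    """In-memory copy of the toy `numbers` table from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE numbers (num INTEGER)")
    conn.executemany("INSERT INTO numbers VALUES (?)",
                     [(v,) for v in (1, 2, 3, 1, 1, 2, 1, 0)])
    return conn


def frequencies_window(conn):
    # One window count per num and one over the whole table; DISTINCT then
    # collapses the per-row results to one row per num. CAST avoids int/int.
    sql = """
        SELECT DISTINCT num,
               ROUND(CAST(COUNT(*) OVER (PARTITION BY num) AS REAL)
                     / COUNT(*) OVER (), 3) AS freq
        FROM numbers
        ORDER BY num
    """
    return conn.execute(sql).fetchall()


def frequencies_without_cast(conn):
    # Same query without the cast: SQLite truncates int/int, so every ratio is 0.
    sql = """
        SELECT DISTINCT num,
               COUNT(*) OVER (PARTITION BY num) / COUNT(*) OVER () AS freq
        FROM numbers
        ORDER BY num
    """
    return conn.execute(sql).fetchall()


conn = make_numbers_db()
print(frequencies_window(conn))        # [(0, 0.125), (1, 0.5), (2, 0.25), (3, 0.125)]
print(frequencies_without_cast(conn))  # [(0, 0), (1, 0), (2, 0), (3, 0)]
```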

This is about the same number of keystrokes, and about the same performance, but it is only one statement:
SELECT n.num, ROUND(COUNT(*) / t.total, 3) AS count
FROM ( SELECT COUNT(*) AS total FROM numbers ) AS t
CROSS JOIN numbers AS n
GROUP BY n.num, t.total
ORDER BY n.num ASC;
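The single-statement cross join can be checked the same way. This sketch again uses sqlite3 as a stand-in, and the helper names are hypothetical. Because SQLite truncates int/int, the query multiplies by 1.0, which MySQL would not need. It uses no window functions, so the same shape also suits MySQL versions before 8.0. Listing t.total in GROUP BY is harmless, because t has a single row. It also keeps databases that enforce full GROUP BY rules from rejecting the query.

```python
import sqlite3


def make_numbers_db():
    """In-memory copy of the toy `numbers` table from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE numbers (num INTEGER)")
    conn.executemany("INSERT INTO numbers VALUES (?)",
                     [(v,) for v in (1, 2, 3, 1, 1, 2, 1, 0)])
    return conn


def frequencies_cross_join(conn):
    # t is a one-row derived table holding the grand total; cross joining it
    # to every row lets each group's count be divided by that total.
    # "* 1.0" forces real division in SQLite (MySQL's / already returns a decimal).
    sql = """
        SELECT n.num, ROUND(COUNT(*) * 1.0 / t.total, 3) AS freq
        FROM (SELECT COUNT(*) AS total FROM numbers) AS t
        CROSS JOIN numbers AS n
        GROUP BY n.num, t.total
        ORDER BY n.num
    """
    return conn.execute(sql).fetchall()


print(frequencies_cross_join(make_numbers_db()))
# [(0, 0.125), (1, 0.5), (2, 0.25), (3, 0.125)]
```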

Related

Select minimal count of rows with total sum greater than or equal to a given threshold

I have an SQL table trade with the following data:
id| value | price
1| 0.1 |1
2| 0.5 |2
3| 0.9 |2
4| 0.3 |2
How do I write an SQL query that gets the count of entries, limited to a total value of 0.9, for price 2, ascending by id? For example:
Select Count of id FROM trade WHERE sum(value) <= 0.9 and price = '2'
Result should come as
3
There are 3 entries with price 2 (ids 2, 3 and 4, with values 0.5, 0.9 and 0.3). Their sum is 1.7, which is more than 0.9, but 0.5 and 0.3 together only make 0.8, which is less than 0.9. So the result should be 3, as that is the number of price-2 rows it takes to reach a value of at least 0.9.
Also, is there a way to get the ids of the results, ordered by value from lowest to highest?
So it looks like:
4
2
3
Help appreciated :)
select id from
(select id, if(not(@sum > 0.9), 1, 0) mark, (@sum:=@sum+value) as sum
from trade cross join (select @sum:=0) s
where price=2 order by value asc) t
where mark =1;
The inner query computes the cumulative sum plus an additional field, mark, which equals 1 while the running sum is at most 0.9 and turns to 0 once it exceeds 0.9. Because mark is evaluated before the current row's value is added, it lags one step behind, so it also keeps the first row where the sum goes above the limit.
The result of the inner select
id mark sum
4 1 0.30000001192092896
2 1 0.800000011920929
3 1 1.699999988079071
Now the outer query just needs to select the rows with mark equal to 1, which gives 4, 2, 3.
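With window functions (MySQL 8.0+, or SQLite 3.25+ through Python's sqlite3 as used here as a stand-in), the same "sum before this row" mark can be computed without user variables. That avoids the issue that MySQL leaves the evaluation order of expressions involving user variables undefined. This is a sketch: ids_until_threshold is a hypothetical helper, and the <= comparison mirrors the answer's not(@sum > 0.9).

```python
import sqlite3


def make_trade_db():
    """In-memory copy of the `trade` table from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trade (id INTEGER PRIMARY KEY, value REAL, price INTEGER)")
    conn.executemany("INSERT INTO trade VALUES (?, ?, ?)",
                     [(1, 0.1, 1), (2, 0.5, 2), (3, 0.9, 2), (4, 0.3, 2)])
    return conn


def ids_until_threshold(conn, price, limit):
    # sum_before = running total of all *earlier* rows (the frame ends at
    # 1 PRECEDING), i.e. the value @sum held before this row was added.
    sql = """
        SELECT id FROM (
            SELECT id, value,
                   COALESCE(SUM(value) OVER (
                       ORDER BY value, id
                       ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS sum_before
            FROM trade
            WHERE price = ?
        ) AS s
        WHERE sum_before <= ?
        ORDER BY value, id
    """
    return [row[0] for row in conn.execute(sql, (price, limit))]


ids = ids_until_threshold(make_trade_db(), 2, 0.9)
print(ids, len(ids))  # [4, 2, 3] 3
```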
You can achieve this by using temporary user variables that store the partial sums and the number of rows used so far. Note that there are two SQL statements:
SET @tempsum := 0, @rowscount := 0;
SELECT MIN(tempcount) FROM
(SELECT
(@rowscount := @rowscount + 1) AS tempcount,
(@tempsum := @tempsum + value) AS tempsum
FROM trade WHERE
price='2'
ORDER BY value
) AS partialsums
WHERE tempsum>=0.9;
This way the partial sums are built only once. Without variables, you would need to build another subquery which builds multiple partial sums.
Here is the SQL Fiddle: http://sqlfiddle.com/#!9/38fc2/11/1
See also: Create a Cumulative Sum Column in MySQL
You may also use variables to store the IDs involved, i.e.:
SET @tempsum := 0, @rowscount := 0, @ids := '';
SELECT tempcount, tempids FROM
(SELECT
(@tempsum := @tempsum + value) AS tempsum,
(@rowscount := @rowscount + 1) AS tempcount,
(@ids := concat(@ids, ' ', id)) AS tempids
FROM trade WHERE
price='2'
ORDER BY value
) AS partialsums
WHERE tempsum>=0.9
ORDER BY tempcount
LIMIT 1;
See Fiddle: http://sqlfiddle.com/#!9/38fc2/33/0
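For comparison with the variable-based queries, this is the variable-free version the answer alludes to. Each row gets its partial sum and row count from correlated subqueries, which rescan the price group once per row. That quadratic work is the cost the variables avoid. The sketch uses sqlite3 as a stand-in. Breaking ties on value by id is an assumption, and min_rows_reaching is a hypothetical helper name.

```python
import sqlite3


def make_trade_db():
    """In-memory copy of the `trade` table from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trade (id INTEGER PRIMARY KEY, value REAL, price INTEGER)")
    conn.executemany("INSERT INTO trade VALUES (?, ?, ?)",
                     [(1, 0.1, 1), (2, 0.5, 2), (3, 0.9, 2), (4, 0.3, 2)])
    return conn


# Rows up to and including t1 in (value, id) order, within the same price.
UP_TO_T1 = """
    FROM trade t2
    WHERE t2.price = t1.price
      AND (t2.value < t1.value OR (t2.value = t1.value AND t2.id <= t1.id))
"""

PARTIAL_SUMS_SQL = f"""
    SELECT MIN(cnt) FROM (
        SELECT (SELECT COUNT(*) {UP_TO_T1}) AS cnt,
               (SELECT SUM(t2.value) {UP_TO_T1}) AS partial_sum
        FROM trade t1
        WHERE t1.price = ?
    ) AS partialsums
    WHERE partial_sum >= ?
"""


def min_rows_reaching(conn, price, limit):
    # Returns None when even all rows together stay below the limit.
    return conn.execute(PARTIAL_SUMS_SQL, (price, limit)).fetchone()[0]


print(min_rows_reaching(make_trade_db(), 2, 0.9))  # 3
```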
If you need the count of the distinct id values, you could use count(distinct id);
and since you are checking an aggregated result (sum() ...), you should use HAVING, not WHERE:
Select Count(distinct id )
FROM trade
where price = 2
HAVING sum(value) <= 0.9
If you want the count of the rows where id is not NULL, then you could use count(id):
Select Count(id )
FROM trade
where price = 2
HAVING sum(value) <= 0.9
NB: in your query you are using price as a string ('2').
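The difference between WHERE and HAVING is easy to see in a short sketch using sqlite3 as a stand-in. An aggregate in WHERE is rejected outright, because WHERE filters rows before any grouping happens. HAVING filters the whole group after aggregation. With the sample data, the price-2 rows total about 1.7, so the count is returned only when the limit is at least that total. The explicit GROUP BY price is an addition because older SQLite versions reject HAVING without GROUP BY; MySQL accepts the answer's form. The helper names are hypothetical.

```python
import sqlite3


def make_trade_db():
    """In-memory copy of the `trade` table from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trade (id INTEGER PRIMARY KEY, value REAL, price INTEGER)")
    conn.executemany("INSERT INTO trade VALUES (?, ?, ?)",
                     [(1, 0.1, 1), (2, 0.5, 2), (3, 0.9, 2), (4, 0.3, 2)])
    return conn


def count_with_where_aggregate(conn):
    # Invalid: WHERE runs before grouping, so no aggregate exists there yet.
    return conn.execute(
        "SELECT COUNT(id) FROM trade WHERE SUM(value) <= 0.9 AND price = 2").fetchall()


def count_with_having(conn, limit):
    # HAVING runs after aggregation and tests the price-2 group as a whole.
    return conn.execute(
        "SELECT COUNT(DISTINCT id) FROM trade WHERE price = 2 "
        "GROUP BY price HAVING SUM(value) <= ?", (limit,)).fetchall()


conn = make_trade_db()
try:
    count_with_where_aggregate(conn)
except sqlite3.Error as exc:
    print("rejected:", exc)
print(count_with_having(conn, 0.9))  # [] because the group total is about 1.7
print(count_with_having(conn, 2))    # [(3,)]
```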

SQL Sort By Index sum from Multiple Results By Adding Index corresponding to the same key

I am having difficulty sorting the results of multiple queries. The queries I'm running each return a sorted list with zip code as the key. For example, one query returns zip codes sorted by crime rate, with the lowest-crime zip code at index 1; another returns zip codes where the average salary is over or under 100k, with the one closest to 100k at index 1.
Say I have 6 or more similar queries. How can I then sort the zip codes by the sum of their indices across all queries?
Example queries I'm running:
SELECT DISTINCT s1.Zip_Code, s1.Median_Value
FROM NJ_Housing_Expenses s1, NJ_Housing_Expenses s2
WHERE s1.Median_Value < 100000 AND s1.Zip_Code NOT IN (
SELECT Zip_Code
FROM NJ_Housing_Expenses
WHERE Median_Value = 0
)
ORDER BY Median_Value DESC
and
SELECT City, (((Violent_Crime*4) + Property_Crime)/Population) as CrimeSum
From NJ_Crime_Statistics
where Date = 2016
Group By City
Order by CrimeSum ASC
OUTPUT
1 08754
2 08234
3 07332
4 09563
then
1 08754
2 07332
3 09563
4 08234
Then, sorted by the sum of the indices (in parentheses):
1 08754 (2)
2 07332 (5)
3 08234 (6)
4 09563 (7)
Sounds like you want to "number" the rows in each query. We could use a MySQL user-defined variable to do that.
We can wrap a suitable query in parens and reference it as an inline view (in place of a table). As a demonstration:
SELECT q1.Zip_code
, @q1_rn := @q1_rn + 1 AS rn
FROM ( SELECT @q1_rn := 0 ) i
CROSS
JOIN (
-- source query here as inline view
SELECT s1.Zip_Code
, ...
FROM ...
ORDER BY Median_Value DESC
) q1
ORDER BY q1.Median_Value DESC
We can do the same thing for another query, but using a different user-defined variable:
SELECT q2.Zip_code
, @q2_rn := @q2_rn + 1 AS rn
FROM ( SELECT @q2_rn := 0 ) i
CROSS
JOIN (
-- inline view query here
) q2
ORDER BY q2.CrimeSum ASC
We can combine the results of those queries with a UNION ALL set operator, and reference that whole thing as an inline view:
SELECT t.Zip_code
, SUM(t.rn) AS tot_rn
FROM (
(
-- first query from above goes here
)
UNION ALL
(
-- second query from above goes here
)
UNION ALL
(
-- third query
)
UNION ALL
(
-- fourth query
)
) t
GROUP BY t.Zip_code
ORDER BY tot_rn ASC
Add a GROUP BY to collapse all of the rows with the same Zip_Code (the first column returned by each of the source queries; each query should return exactly two columns, Zip_code and rn).
We use a SUM() aggregate to total up the values of rn, giving a total for each Zip_Code.
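The same idea can be sketched with ROW_NUMBER() instead of user-defined variables. MySQL supports this from 8.0; here SQLite 3.25+, via Python's sqlite3, stands in for it. The tables crime and salary and their score columns are hypothetical. Their values were chosen only so that the two orderings match the question's example output. Zip codes are stored as text to keep the leading zero.

```python
import sqlite3


def make_zip_db():
    conn = sqlite3.connect(":memory:")
    # Hypothetical scores; only their order matters for the row numbers.
    conn.execute("CREATE TABLE crime (zip TEXT, crime_sum REAL)")
    conn.execute("CREATE TABLE salary (zip TEXT, dist_from_100k REAL)")
    conn.executemany("INSERT INTO crime VALUES (?, ?)",
                     [("08754", 1.0), ("08234", 2.0), ("07332", 3.0), ("09563", 4.0)])
    conn.executemany("INSERT INTO salary VALUES (?, ?)",
                     [("08754", 10.0), ("07332", 20.0), ("09563", 30.0), ("08234", 40.0)])
    return conn


# Each source query numbers its own rows; UNION ALL stacks them, and the
# outer GROUP BY adds up the row numbers per zip code.
RANK_SUM_SQL = """
    SELECT zip, SUM(rn) AS tot_rn
    FROM (
        SELECT zip, ROW_NUMBER() OVER (ORDER BY crime_sum ASC) AS rn FROM crime
        UNION ALL
        SELECT zip, ROW_NUMBER() OVER (ORDER BY dist_from_100k ASC) AS rn FROM salary
    ) AS t
    GROUP BY zip
    ORDER BY tot_rn ASC, zip
"""


def rank_sum(conn):
    return conn.execute(RANK_SUM_SQL).fetchall()


print(rank_sum(make_zip_db()))
# [('08754', 2), ('07332', 5), ('08234', 6), ('09563', 7)]
```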

Mysql Ranking Query on 2 columns

Table
id user_id rank_solo lp
1 1 15 45
2 2 7 79
3 3 17 15
How can I write a ranking query that sorts on rank_solo (which ranges from 0 to 28) and, when rank_solo is equal, uses lp (0-100) to further determine the ranking?
(If lp is also equal, still assign distinct rankings, so there are no ties.)
The query should give me the ranking for a given user_id. How does this perform on 5M+ rows?
So
User_id 1 would have ranking 2
User_id 2 would have ranking 3
User_id 3 would have ranking 1
You can get the ranking using variables:
select t.*, (@rn := @rn + 1) as ranking
from t cross join
(select @rn := 0) params
order by rank_solo desc, lp;
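Here is a sketch of both parts of the question, using sqlite3 as a stand-in and a hypothetical players table. ROW_NUMBER() numbers every row, and id breaks exact ties so no two users share a ranking. The ranking of a single user is 1 plus the number of rows ordered ahead of it, which avoids numbering all 5M+ rows. The sketch assumes a higher lp ranks higher; flip lp to ASC if not. Whether the optimizer actually uses the index on (rank_solo, lp, id) for the count is worth checking with EXPLAIN.

```python
import sqlite3


def make_players_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, user_id INTEGER, "
                 "rank_solo INTEGER, lp INTEGER)")
    conn.execute("CREATE INDEX idx_rank ON players (rank_solo, lp, id)")
    conn.executemany("INSERT INTO players VALUES (?, ?, ?, ?)",
                     [(1, 1, 15, 45), (2, 2, 7, 79), (3, 3, 17, 15)])
    return conn


def full_ranking(conn):
    # Higher rank_solo first, then higher lp; id makes every ranking unique.
    sql = """SELECT user_id,
                    ROW_NUMBER() OVER (ORDER BY rank_solo DESC, lp DESC, id) AS ranking
             FROM players ORDER BY ranking"""
    return conn.execute(sql).fetchall()


def ranking_for_user(conn, user_id):
    me = conn.execute("SELECT id, rank_solo, lp FROM players WHERE user_id = ?",
                      (user_id,)).fetchone()
    if me is None:
        return None
    my_id, rs, lp = me
    # Count only the rows that sort ahead of this user, using the same order.
    ahead = conn.execute("""
        SELECT COUNT(*) FROM players
        WHERE rank_solo > :rs
           OR (rank_solo = :rs AND lp > :lp)
           OR (rank_solo = :rs AND lp = :lp AND id < :id)""",
        {"rs": rs, "lp": lp, "id": my_id}).fetchone()[0]
    return ahead + 1


conn = make_players_db()
print(full_ranking(conn))                               # [(3, 1), (1, 2), (2, 3)]
print([ranking_for_user(conn, u) for u in (1, 2, 3)])  # [2, 3, 1]
```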
You can use ORDER BY to sort your query:
SELECT *
FROM `Table`
ORDER BY rank_solo, lp
I'm not sure I quite understand what you're saying. With that many rows, create an index on the fields you're using in your selects. For example, in the MySQL client use:
create index RANKINGS on mytablename(rank_solo,lp,user_id);
Depending on what you use in your query to select the data, you may change the index or add another index with a different field combination. This has improved performance on my tables by a factor of 10 or more.
As for the query, if you're selecting a specific user then could you not just use:
select rank_solo from table where user_id={user id}
If you want the highest ranking individual, you could:
select * from yourtable order by rank_solo desc, lp desc limit 1
Remove the limit 1 to list them all.
If I've misunderstood, please comment.
An alternative would be to use a 2nd table.
table2 would have the following fields:
rank (auto_increment)
user_id
rank_solo
lp
Because rank is an auto-increment field, it is filled in automatically as rows are inserted, starting from 1.
Once the 2nd table is ready, just do this when you want to update the rankings:
truncate table table2; -- unlike DELETE, TRUNCATE also resets the AUTO_INCREMENT counter
insert into table2 (user_id, rank_solo, lp) select user_id, rank_solo, lp from table1 order by rank_solo, lp;
It may not be "elegant" but it gets the job done. Plus, if you create an index on both tables, this query would be very quick since the fields are numeric.

One MySQL query to get AVG by different Groupings?

I'm wondering if there is a way to write the following in ONE MySQL query.
I have a table:
cust_ID | rpt_name | req_secs
In the query I'd like to get:
the AVG req_secs when grouped by cust_ID
the AVG req_secs when grouped by rpt_name
the total req_secs AVG
I know I can do separate grouping queries on the same table then UNION the results into one. But I was hoping there was some way to do it in one query.
Thanks.
Well, the following does two out of three:
select n,
(case when n = 1 then cast(cust_id as char(255)) else rpt_name end) as grouping_value,
avg(req_secs)
from t cross join
(select 1 as n union all select 2
) n
group by n, (case when n = 1 then cast(cust_id as char(255)) else rpt_name end);
This essentially "doubles" the data and then does the aggregation for each group. This assumes that cust_id and rpt_name are of compatible types. (The query could be tweaked if this is not the case.)
Actually, you can get the overall average by using rollup:
select n,
(case when n = 1 then cast(cust_id as char(255)) else rpt_name end) as grouping_value,
avg(req_secs)
from t cross join
(select 1 as n union all select 2
) n
group by n, (case when n = 1 then cast(cust_id as char(255)) else rpt_name end) with rollup;
This works for average because the average is the same on the "doubled" data as for the original data. It wouldn't work for sum() or count().
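The claim about the doubled data is easy to check. Cross joining with a two-row set leaves AVG unchanged but doubles SUM and COUNT. This sketch uses sqlite3 as a stand-in, and the sample rows and helper names are made up.

```python
import sqlite3


def make_t():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (cust_id INTEGER, rpt_name TEXT, req_secs REAL)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                     [(1, "a", 10), (1, "b", 20), (2, "a", 30)])
    return conn


def original_vs_doubled(conn):
    original = conn.execute(
        "SELECT AVG(req_secs), SUM(req_secs), COUNT(*) FROM t").fetchone()
    # Every row of t appears once per row of the two-row set n.
    doubled = conn.execute(
        "SELECT AVG(req_secs), SUM(req_secs), COUNT(*) FROM t "
        "CROSS JOIN (SELECT 1 AS n UNION ALL SELECT 2) AS n").fetchone()
    return original, doubled


print(original_vs_doubled(make_t()))
# ((20.0, 60.0, 3), (20.0, 120.0, 6))
```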
No, there is not. You can group by a combination of cust_ID and rpt_name at the same time (i.e. two levels of grouping), but you are not going to be able to do separate top-level groupings and then a non-grouped aggregation at the same time.
Because of the way GROUP BY works, the SQL to do this is a little tricky. One way to get the result is to get three copies of the rows, and group each set of rows separately.
SELECT g.gkey
, IF(g.gkey='cust_id',t.cust_ID,IF(g.gkey='rpt_name',t.rpt_name,'')) AS gval
, AVG(t.req_secs) AS avg_req_secs
FROM (SELECT 'cust_id' AS gkey UNION ALL SELECT 'rpt_name' UNION ALL SELECT 'total') g
CROSS
JOIN mytable t
GROUP
BY g.gkey
, IF(g.gkey='cust_id',t.cust_ID,IF(g.gkey='rpt_name',t.rpt_name,''))
The inline view aliased as "g" doesn't have to use UNION ALL operators; you just need a rowset that returns exactly 3 rows with distinct values. I just used UNION ALL as a convenient way to return three literal values as a rowset, so I could join that to the original table.
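The three-key version can be tried on SQLite through Python's sqlite3 as a stand-in. This sketch uses a portable CASE expression in place of MySQL's IF(), and CAST(... AS TEXT) keeps the group-value column a single type. The sample rows and helper names are hypothetical.

```python
import sqlite3


def make_mytable():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (cust_ID INTEGER, rpt_name TEXT, req_secs REAL)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                     [(1, "a", 10), (1, "b", 20), (2, "a", 30)])
    return conn


# Group value per key; 'total' maps every row to '' so it forms a single group.
GVAL = """CASE g.gkey WHEN 'cust_id'  THEN CAST(t.cust_ID AS TEXT)
                      WHEN 'rpt_name' THEN t.rpt_name
                      ELSE '' END"""

MULTI_AVG_SQL = f"""
    SELECT g.gkey, {GVAL} AS gval, AVG(t.req_secs) AS avg_req_secs
    FROM (SELECT 'cust_id' AS gkey UNION ALL SELECT 'rpt_name' UNION ALL SELECT 'total') AS g
    CROSS JOIN mytable AS t
    GROUP BY g.gkey, {GVAL}
    ORDER BY g.gkey, gval
"""


for row in make_mytable().execute(MULTI_AVG_SQL):
    print(row)
```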

mysql query to generate a commission report based on referred members

A person gets a 10% commission on purchases made by the friends he referred.
There are two tables :
Reference table
Transaction table
Reference Table
Person_id Referrer_id
3 1
4 1
5 1
6 2
Transaction Table
Person_id Amount Action Date
3 100 Purchase 10-20-2011
4 200 Purchase 10-21-2011
6 400 Purchase 12-15-2011
3 200 Purchase 12-30-2011
1 50 Commision 01-01-2012
1 10 Cm_Bonus 01-01-2012
2 20 Commision 01-01-2012
How can I get the following result set for Referrer_Person_id = 1?
Month Ref_Pur Earn_Comm Todate_Earn_Comm BonusRecvd Paid Due
10-2011 300 30 30 0 0 30
11-2011 0 0 30 0 0 30
12-2011 200 20 50 0 0 50
01-2012 0 0 50 10 50 0
Labels used above are:
Ref_Pur = Total Referred Friend's Purchase for that month
Earn_Comm = 10% Commission earned for that month
Todate_Earn_Comm = Total Running Commission earned up to that month
MySQL code that I wrote:
SELECT dx1.month,
dx1.ref_pur,
dx1.earn_comm,
( @cum_earn := @cum_earn + dx1.earn_comm ) as todate_earn_comm
FROM
(
select date_format(`date`,'%Y-%m') as month,
sum(amount) as ref_pur ,
(sum(amount)*0.1) as earn_comm
from transaction tr, reference rf
where tr.person_id=rf.person_id and
tr.action='Purchase' and
rf.referrer_id=1
group by date_format(`date`,'%Y-%m')
order by date_format(`date`,'%Y-%m')
)as dx1
JOIN (select @cum_earn:=0)e;
How can I extend the query to also include the BonusRecvd, Paid and Due transactions, which do not depend on the reference table?
And how can I also generate a row for the month '11-2011', even though no transactions occurred in that month?
If you want to include commission payments and bonuses in the results, you'll probably need to include the corresponding rows (Action IN ('Commision', 'Cm_Bonus')) in the initial dataset you are using to calculate the results. Or, at least, that's what I would do, and it might look like this:
SELECT t.Amount, t.Action, t.Date
FROM Transaction t LEFT JOIN Reference r ON t.Person_id = r.Person_id
WHERE r.Referrer_id = 1 AND t.Action = 'Purchase'
OR t.Person_id = 1 AND t.Action IN ('Commision', 'Cm_Bonus')
And when calculating the monthly SUMs, you can use CASE expressions to distinguish among Amounts related to different types of Action. This is how the corresponding part of the query might look:
…
IFNULL(SUM(CASE Action WHEN 'Purchase' THEN Amount END) , 0) AS Ref_Pur,
IFNULL(SUM(CASE Action WHEN 'Purchase' THEN Amount END) * 0.1, 0) AS Earn_Comm,
IFNULL(SUM(CASE Action WHEN 'Cm_Bonus' THEN Amount END) , 0) AS BonusRecvd,
IFNULL(SUM(CASE Action WHEN 'Commision' THEN Amount END) , 0) AS Paid
…
When calculating the Due values, you can initialise another variable and use it much like @cum_earn, except that you'll also need to subtract Paid, something like this:
(@cum_due := @cum_due + Earn_Comm - Paid) AS Due
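Here is a sketch that puts the conditional SUMs and the two running totals together. It uses sqlite3 as a stand-in, with a window SUM() OVER (ORDER BY month) replacing the @cum_earn and @cum_due variables; MySQL 8.0+ supports the same construct. The sketch assumes dates are stored as ISO 'YYYY-MM-DD' strings. It also names the table Transactions rather than Transaction, to avoid the SQL keyword. Months with no rows at all (11-2011) are still missing from this output; filling them in is the separate step the answer turns to next.

```python
import sqlite3


def make_commission_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Reference (Person_id INTEGER, Referrer_id INTEGER)")
    conn.execute("CREATE TABLE Transactions (Person_id INTEGER, Amount INTEGER, "
                 "Action TEXT, Date TEXT)")
    conn.executemany("INSERT INTO Reference VALUES (?, ?)",
                     [(3, 1), (4, 1), (5, 1), (6, 2)])
    conn.executemany("INSERT INTO Transactions VALUES (?, ?, ?, ?)", [
        (3, 100, "Purchase", "2011-10-20"),
        (4, 200, "Purchase", "2011-10-21"),
        (6, 400, "Purchase", "2011-12-15"),
        (3, 200, "Purchase", "2011-12-30"),
        (1, 50, "Commision", "2012-01-01"),
        (1, 10, "Cm_Bonus", "2012-01-01"),
        (2, 20, "Commision", "2012-01-01"),
    ])
    return conn


REPORT_SQL = """
WITH subset AS (
    -- referred friends' purchases plus the referrer's own payouts
    SELECT t.Amount, t.Action, t.Date
    FROM Transactions t LEFT JOIN Reference r ON t.Person_id = r.Person_id
    WHERE (r.Referrer_id = :ref AND t.Action = 'Purchase')
       OR (t.Person_id = :ref AND t.Action IN ('Commision', 'Cm_Bonus'))
),
monthly AS (
    SELECT strftime('%Y-%m', Date) AS month,
           IFNULL(SUM(CASE Action WHEN 'Purchase' THEN Amount END), 0) AS ref_pur,
           IFNULL(SUM(CASE Action WHEN 'Purchase' THEN Amount END) * 0.1, 0) AS earn_comm,
           IFNULL(SUM(CASE Action WHEN 'Cm_Bonus' THEN Amount END), 0) AS bonus_recvd,
           IFNULL(SUM(CASE Action WHEN 'Commision' THEN Amount END), 0) AS paid
    FROM subset
    GROUP BY month
)
SELECT month, ref_pur, earn_comm,
       SUM(earn_comm) OVER (ORDER BY month) AS todate_earn_comm,
       bonus_recvd, paid,
       SUM(earn_comm - paid) OVER (ORDER BY month) AS due
FROM monthly
ORDER BY month
"""


def commission_report(conn, referrer_id):
    return conn.execute(REPORT_SQL, {"ref": referrer_id}).fetchall()


for row in commission_report(make_commission_db(), 1):
    print(row)
```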
One last problem seems to be missing months. To address it, I would do the following:
1. Get the first and the last date from the subset to be processed (as obtained by the query at the beginning of this post).
2. Get the corresponding month for each of the dates (i.e. another date which is merely the first of the same month).
3. Using a numbers table, generate a list of months covering the range between the two months calculated in the previous step.
4. Filter out the months that are present in the subset to be processed and use the remaining ones to add dummy transactions to the subset.
As you can see, the "subset to be processed" needs to be touched twice when performing these steps. So, for efficiency, I would insert that subset into a temporary table and use that table, instead of executing the same (sub)query several times.
The numbers table mentioned in Step #3 is a tool I would recommend always keeping handy. You only need to initialise it once, and its uses may turn out to be numerous, if you pardon the pun. Here's but one way to populate a numbers table:
CREATE TABLE numbers (n int);
INSERT INTO numbers (n) SELECT 0;
INSERT INTO numbers (n) SELECT cnt + n FROM numbers, (SELECT COUNT(*) AS cnt FROM numbers) s;
INSERT INTO numbers (n) SELECT cnt + n FROM numbers, (SELECT COUNT(*) AS cnt FROM numbers) s;
INSERT INTO numbers (n) SELECT cnt + n FROM numbers, (SELECT COUNT(*) AS cnt FROM numbers) s;
INSERT INTO numbers (n) SELECT cnt + n FROM numbers, (SELECT COUNT(*) AS cnt FROM numbers) s;
INSERT INTO numbers (n) SELECT cnt + n FROM numbers, (SELECT COUNT(*) AS cnt FROM numbers) s;
INSERT INTO numbers (n) SELECT cnt + n FROM numbers, (SELECT COUNT(*) AS cnt FROM numbers) s;
INSERT INTO numbers (n) SELECT cnt + n FROM numbers, (SELECT COUNT(*) AS cnt FROM numbers) s;
INSERT INTO numbers (n) SELECT cnt + n FROM numbers, (SELECT COUNT(*) AS cnt FROM numbers) s;
/* repeat as necessary; every repeated line doubles the number of rows */
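Here is a sketch of the numbers table in use, again through sqlite3 as a stand-in. The doubling INSERT runs in a loop, and the numbers then generate every month between the first and last dates of the subset, which covers Steps #2 and #3 (11-2011 included). build_numbers and month_list are hypothetical helper names. The date() modifiers are SQLite's; MySQL would use DATE_ADD(..., INTERVAL n MONTH) instead.

```python
import sqlite3


def build_numbers(conn, doublings):
    """Create and fill `numbers` as in the answer: start with 0, then double."""
    conn.execute("CREATE TABLE numbers (n INTEGER)")
    conn.execute("INSERT INTO numbers (n) SELECT 0")
    for _ in range(doublings):
        # Copies the table shifted up by its current row count: 0..k-1 -> 0..2k-1.
        conn.execute("INSERT INTO numbers (n) SELECT cnt + n FROM numbers, "
                     "(SELECT COUNT(*) AS cnt FROM numbers) s")


def month_list(conn, first_date, last_date):
    """Every month from first_date's month to last_date's month, as 'YYYY-MM'."""
    sql = """
        SELECT strftime('%Y-%m', date(:first, 'start of month', '+' || n || ' months'))
        FROM numbers
        WHERE date(:first, 'start of month', '+' || n || ' months')
              <= date(:last, 'start of month')
        ORDER BY n
    """
    return [row[0] for row in conn.execute(sql, {"first": first_date, "last": last_date})]


conn = sqlite3.connect(":memory:")
build_numbers(conn, 8)
print(conn.execute("SELECT COUNT(*), MAX(n) FROM numbers").fetchone())  # (256, 255)
print(month_list(conn, "2011-10-20", "2012-01-01"))
# ['2011-10', '2011-11', '2011-12', '2012-01']
```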
And that seems to be it. I will not post a complete solution here, to leave you the chance to apply the above suggestions in your own way, in case you are keen to. But if you are struggling, or just want to verify that they can be applied to the required effect, you can try this SQL Fiddle page for a complete solution "in action".