MySQL double averaging with double grouping - mysql

I have the following data in a MySQL table called test
I run the following SQL query
SELECT user_id,
group_id,
sum(value) as total_present,
avg(value)*100 as attendance_percentage
FROM test t
WHERE t.session_date BETWEEN '2017-10-01' AND '2017-10-15'
GROUP BY user_id
This gives me percentages for each user_id like this:
In the output above, user_id 1 and 2 are in the same group_id. Is there a way to group the query further and take the average of the percentages for users that share a group_id? For the example above, the percentage for group_id 3 should be 70.83335.

Khalid's answer is OK, but consider that you are averaging percentages that are based on different numbers of records. user_id = 2 has more values than user_id = 1, so his percentage should carry more weight.
For example, a user_id = 3 who attended only one session, with 100% attendance, would distort the average.
To weight every record equally, average over the records directly:
SELECT group_id, AVG(value) * 100 AS group_attendance_percentage
FROM yourTable
WHERE session_date BETWEEN '2017-10-01' AND '2017-10-15'
GROUP BY group_id
In this case the result is 71.43 (10 of 14 records) instead of 70.83.
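To make the difference between the two averages concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, chosen only so the example is self-contained), and the rows are hypothetical, reconstructed from the counts mentioned in the related question: user 1 has 6 sessions with 4 present, user 2 has 8 sessions with 6 present, and both are in group 3. The date filter is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (user_id INTEGER, group_id INTEGER, value INTEGER)")

# Hypothetical rows matching the counts in the question:
# user 1: 6 sessions, 4 present; user 2: 8 sessions, 6 present; both in group 3.
rows = [(1, 3, v) for v in (1, 1, 1, 1, 0, 0)]
rows += [(2, 3, v) for v in (1, 1, 1, 1, 1, 1, 0, 0)]
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)

# Average of the per-user percentages: every user counts equally.
per_user_avg = conn.execute("""
    SELECT q.group_id, AVG(q.pct)
    FROM (SELECT user_id, group_id, AVG(value) * 100 AS pct
          FROM test
          GROUP BY user_id, group_id) AS q
    GROUP BY q.group_id
""").fetchall()

# Average over all records: every session counts equally.
per_record_avg = conn.execute("""
    SELECT group_id, AVG(value) * 100
    FROM test
    GROUP BY group_id
""").fetchall()

print(per_user_avg)
print(per_record_avg)
```

Which of the two is "right" depends on whether each user or each session should carry equal weight.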

You can do this by applying a further aggregation to your query:
SELECT q.group_id, AVG(q.attendance_percentage) AS group_attendance_percentage
FROM (
SELECT user_id, group_id, SUM(value) AS total_present, AVG(value) * 100 AS attendance_percentage
FROM test t
WHERE t.session_date BETWEEN '2017-10-01' AND '2017-10-15'
GROUP BY user_id, group_id
) q
GROUP BY q.group_id

Related

why UNION ALL command in mysql doesn't give back any results?

I am trying to merge two queries into one, but UNION is not working for me.
Here is the code:
SELECT
Customer_A,
Activity,
Customer_P,
Purchase
FROM (
SELECT
buyer_id as Customer_A,
COUNT(buyer_id) As Activity
FROM
customer_info_mxs
GROUP BY buyer_id
UNION ALL
SELECT
buyer_id as Customer_P,
SUM(purchase_amount) As Purchase
FROM
customer_info_mxs
GROUP BY buyer_id
)sub
I expect 4 columns in the result, but I get only 2 (Customer_A and Activity).
If the query is supposed to return a list of customers, their number of purchases, and the total amount they’ve spent, then you can use a single query like this:
SELECT mxs.buyer_id as Customer,
COUNT(mxs.purchase_id) As Activity,
SUM(mxs.purchase_amount) As Purchases
FROM customer_info_mxs mxs
GROUP BY mxs.buyer_id;
Otherwise, your first subquery will always be a buyer_id and a value of 1.
Be sure to change purchase_id to whatever the unique id is for each purchase if you wish to see that number.
I think there is some confusion about the union statement. The union statement returns a row set that is the sum of all of the 'unioned' queries; since these queries have only 2 columns, the combined output only has two columns. The fact that the columns have different names is irrelevant. The column names in the output are being applied from the first query of the union.
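The behaviour described here can be checked with a small sketch. It runs the question's UNION ALL through Python's built-in sqlite3 module, used as a stand-in for MySQL (an assumption made to keep the example self-contained), against a few hypothetical purchase rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_info_mxs (buyer_id INTEGER, purchase_amount REAL)")
# Hypothetical purchases: buyer 1 bought twice, buyer 2 once.
conn.executemany(
    "INSERT INTO customer_info_mxs VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.0)],
)

cur = conn.execute("""
    SELECT buyer_id AS Customer_A, COUNT(buyer_id) AS Activity
    FROM customer_info_mxs
    GROUP BY buyer_id
    UNION ALL
    SELECT buyer_id AS Customer_P, SUM(purchase_amount) AS Purchase
    FROM customer_info_mxs
    GROUP BY buyer_id
""")

# Column names come from the first SELECT; the rows of the second SELECT
# are appended underneath rather than added as extra columns.
column_names = [d[0] for d in cur.description]
union_rows = cur.fetchall()
print(column_names)
print(len(union_rows))
```

The result has two columns and one row per buyer from each branch, which is why the aliases Customer_P and Purchase never appear.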
One option is to just do
select buyer_id, count(buyer_id), sum(purchase_amount) from customer_info_mxs group by buyer_id
From your question, it looks like you are trying to do a pivot, turning some of the rows into additional columns. That could be done with ... some difficulty.
I read your comment:
'main goal is to creat a dataset in which returns 5 columns as: Customer_A, Activity (top 100), customer_P, Purchase(top 100), inner join of activity and purchase'
Please try this query:
SET @row_number = 0, @row_number2 = 0;
SELECT t1.Customer_A, t1.Activity, t2.Customer_P, t2.Purchase
FROM (
SELECT (@row_number:=@row_number + 1) AS n, t.Customer_A, t.Activity
FROM (
SELECT buyer_id AS Customer_A, COUNT(buyer_id) AS Activity
FROM customer_info_mxs
GROUP BY buyer_id
ORDER BY Activity DESC
LIMIT 100
) t
) t1
LEFT JOIN (
SELECT (@row_number2:=@row_number2 + 1) AS n, t.Customer_P, t.Purchase
FROM (
SELECT buyer_id AS Customer_P, SUM(purchase_amount) AS Purchase
FROM customer_info_mxs
GROUP BY buyer_id
ORDER BY Purchase DESC
LIMIT 100
) t
) t2 ON t2.n = t1.n
The basic idea is to number the rows of each top-100 list (t1 and t2) from 1 to 100 and then join the two lists on that row number.

sql return most prevalent column value

I'm a beginner at SQL, how do I get a query which returns the most prevalent column value? Probably there is an answer somewhere but I don't know how to google it.
For example in the user_id column the query should return the value 1 because this is the most prevalent number.
One approach is to do a GROUP BY aggregation and then apply a LIMIT trick:
SELECT user_id, COUNT(*) AS cnt
FROM yourTable
GROUP BY user_id
ORDER BY COUNT(*) DESC
LIMIT 1;
If you want something more complex, you would be getting into the realm of rank functionality. MySQL before version 8.0 does not have built-in rank functions, so such queries can be tricky to write there.
In SQL Server, use TOP instead of LIMIT:
SELECT TOP 1 user_id, COUNT(*) AS cnt
FROM yourTable
GROUP BY user_id
ORDER BY COUNT(*) DESC
Have a common table expression count the rows for each user_id, then select the user_id whose count equals the maximum count. This returns all tied user_ids in case of a tie.
with cte as
(
SELECT user_id, COUNT(*) AS cnt
FROM yourTable
GROUP BY user_id
)
select user_id
from cte
where cnt = (select max(cnt) from cte)
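As a rough illustration of how the LIMIT 1 approach and the CTE approach differ on ties, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database (an assumption; in MySQL itself, common table expressions need version 8.0 or later). The data is hypothetical: user_id 1 and user_id 2 both appear three times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (user_id INTEGER)")
# Hypothetical data: user_id 1 and user_id 2 are tied with three rows each.
conn.executemany(
    "INSERT INTO yourTable VALUES (?)",
    [(1,), (1,), (1,), (2,), (2,), (2,), (3,)],
)

# LIMIT 1 keeps a single row, so one of the tied values is silently dropped.
top_one = conn.execute("""
    SELECT user_id, COUNT(*) AS cnt
    FROM yourTable
    GROUP BY user_id
    ORDER BY cnt DESC
    LIMIT 1
""").fetchall()

# The CTE version keeps every user_id whose count equals the maximum.
all_ties = conn.execute("""
    WITH cte AS (
        SELECT user_id, COUNT(*) AS cnt
        FROM yourTable
        GROUP BY user_id
    )
    SELECT user_id
    FROM cte
    WHERE cnt = (SELECT MAX(cnt) FROM cte)
    ORDER BY user_id
""").fetchall()

print(top_one)
print(all_ties)
```

With LIMIT 1, which of the tied values comes back is not defined unless the ORDER BY adds a tie-breaker.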

MySQL and average calculation giving different results

I've created a test table called test with some sample data that looks like the following:
If I add up each user's values, divide by the total number of entries for that user_id, and multiply by 100, I get a percentage. For example:
for user_id 1 there are 6 records in total and the values add up to 4, so 4/6*100 = 66.67%
for user_id 2 there are 8 records in total and the values add up to 6, so 6/8*100 = 75%
I am able to get these values using the following SQL:
SELECT a.user_id, a.total_present / (SELECT count(*) as total_sessions FROM test WHERE session_date BETWEEN '2017-10-01' AND '2017-10-15' AND user_id = a.user_id) * 100 AS attendance_percentage, a.total_present
FROM (
SELECT user_id, count(*) as total_present
FROM test t
WHERE t.session_date BETWEEN '2017-10-01' AND '2017-10-15' AND t.value=1
GROUP BY user_id
) a ORDER BY a.user_id ASC
If I add the percentages up and divide by how many users then I get an average of 70.83%
I was hoping to simplify this by counting all the present records, dividing by the total number of records, and multiplying by 100, like this:
SELECT ((SELECT count(*) as total_present FROM test t WHERE t.session_date BETWEEN '2017-10-01' AND '2017-10-15' AND t.value=1) / (SELECT count(*) as total_sessions FROM test WHERE session_date BETWEEN '2017-10-01' AND '2017-10-15')) * 100 AS average_percentage
This, however, does not give the same average, because each user has a different number of sessions. If both users had the same number of sessions, the two results would match.
So the question is: is there a way to use my simplified method but get the same result as computing each user's percentage and then averaging across users?
You are overcomplicating your query. To get the attendance_percentage per user, you can simply take the average of value (since it is already binary) and multiply it by 100. Here is the query for that:
SELECT user_id, sum(value) as total_present, avg(value)*100 as attendance_percentage
FROM test t
WHERE t.session_date BETWEEN '2017-10-01' AND '2017-10-15'
GROUP BY user_id
Finally, to calculate the average attendance_percentage, just take the average of the result of the query above, like this:
select avg(attendance_percentage) from
(SELECT user_id, sum(value) as total_present, avg(value)*100 as attendance_percentage
FROM test t
WHERE t.session_date BETWEEN '2017-10-01' AND '2017-10-15'
GROUP BY user_id)q;
I have also created this sqlfiddle for you so that you can test the queries.
Edit: In case your value is not binary, you can use your original query to find the per-user attendance_percentage and simply take the average of it using the AVG() function. Your query then becomes:
select avg(attendance_percentage) from
(SELECT a.user_id, a.total_present / (SELECT count(*) as total_sessions FROM test WHERE session_date BETWEEN '2017-10-01' AND '2017-10-15' AND user_id = a.user_id) * 100 AS attendance_percentage, a.total_present
FROM (
SELECT user_id, count(*) as total_present
FROM test t
WHERE t.session_date BETWEEN '2017-10-01' AND '2017-10-15' AND t.value=1
GROUP BY user_id
) a ORDER BY a.user_id ASC)q;
Here is the updated fiddle.
I would suggest avoiding sub-queries unless there is a good reason for them. In most cases you can do without them, and sub-queries often hurt performance and scale poorly.
I think your query could look something like this. Note that a column alias such as total_present cannot be referenced in the same SELECT list, so the expression is repeated:
SELECT user_id, SUM(IF(t.value=1,1,0)) AS total_present, SUM(IF(t.value=1,1,0)) / COUNT(*) * 100 AS attendance_percentage
FROM test t
WHERE t.session_date BETWEEN '2017-10-01' AND '2017-10-15'
GROUP BY user_id
I hope that helps.

Query with distinct and group by

How do I get 9,300 only out of the table above? I just need to add 6500 + 1800 + 1000
Here is my current query
SELECT
SUM(e.amount) / (SELECT count(e2.receipt_no)
FROM entries e2
WHERE e2.receipt_no = e.receipt_no) as total,
e.user_id
FROM
entries e
GROUP BY e.receipt_no
The result is
Now i need to get the total per user_id
Expected output should be
From my understanding, this should give you what you want:
SELECT sum(DISTINCT amount) as total, receipt_no FROM entries GROUP BY receipt_no
Try something like this:
SELECT SUM(DISTINCT(amount)) as total, user_id FROM `entries` GROUP BY user_id
Try this
SELECT SUM(amount) as total,user_id FROM entries GROUP BY user_id
First calculate the DISTINCT amounts grouped by user_id and receipt_no, then sum those totals grouped by user_id:
SELECT sum(total) as total, user_id from (SELECT sum(DISTINCT amount) as total,
user_id, receipt_no FROM entries GROUP BY user_id, receipt_no) as rgrouped
GROUP BY user_id
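De-duplicating per receipt first matters when two different receipts of the same user happen to carry the same amount. The sketch below illustrates this with Python's built-in sqlite3 module as a stand-in for MySQL (an assumption for self-containment). The rows are hypothetical: user 101 mirrors the amounts from the question, while user 189 is given two separate receipts of 500 each to expose the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (user_id INTEGER, receipt_no INTEGER, amount INTEGER)")
# Hypothetical rows; some receipts are stored more than once.
conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", [
    (101, 1, 6500), (101, 1, 6500),
    (101, 2, 1800), (101, 2, 1800),
    (101, 3, 1000),
    (189, 4, 500), (189, 4, 500),
    (189, 5, 500),  # different receipt, same amount as receipt 4
])

# DISTINCT per user: equal amounts on different receipts collapse into one.
per_user_distinct = conn.execute("""
    SELECT user_id, SUM(DISTINCT amount)
    FROM entries
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

# DISTINCT per receipt first, then a plain SUM per user.
per_receipt_then_user = conn.execute("""
    SELECT user_id, SUM(total)
    FROM (SELECT user_id, receipt_no, SUM(DISTINCT amount) AS total
          FROM entries
          GROUP BY user_id, receipt_no) AS rgrouped
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

print(per_user_distinct)
print(per_receipt_then_user)
```

Both queries agree for user 101, but only the two-step version counts both of user 189's receipts.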
You can also try this:
SELECT user_id, SUM(DISTINCT `amount`) FROM `entries` GROUP BY `user_id`
Step 1: Select the distinct amounts for each user id
101 - 6500, 1800, 1000
189 - 1019.00
Step 2: Add up the selected amounts
101 = 6500+1800+1000 = 9300.00
189 = 1019.00
This selects the distinct amounts for each user id and then adds them up, giving the same result.

Greatest 'n' per group by month

I have a MySQL table with the date, name and rating of a person. I need to build a query that shows the best person of each month. The query below gives me the maximum rating of the month, but the wrong name/id of the person.
SELECT DATE_FORMAT(date,'%m.%Y') as date2, MAX(rating), name FROM test GROUP BY date2
Here's sqlfiddle with sample table: http://sqlfiddle.com/#!2/4dd54b/9
I read several greatest-n-per-group topics, but those queries didn't work; I suppose it's because of grouping by DATE_FORMAT. So I am asking here.
The easiest way is to use the substring_index()/group_concat() trick:
SELECT DATE_FORMAT(date, '%m.%Y') as date2, MAX(rating),
substring_index(group_concat(name order by rating desc), ',', 1) as name
FROM test
GROUP BY date2;
A faster solution might look like this - although removal of the DATE_FORMAT function altogether will speed things up even further...
SELECT x.*
FROM test x
JOIN
( SELECT DATE_FORMAT(date,'%Y-%m') dt
, MAX(rating) max_rating
FROM test
GROUP
BY DATE_FORMAT(date,'%Y-%m')
) y
ON y.dt = DATE_FORMAT(x.date,'%Y-%m')
AND y.max_rating = x.rating;
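The join-back pattern above can be tried out with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so SQLite's strftime('%Y-%m', ...) replaces DATE_FORMAT(date, '%Y-%m'); both the substitution and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (date TEXT, name TEXT, rating INTEGER)")
# Hypothetical ratings spread over two months.
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", [
    ("2014-01-05", "Ann", 7),
    ("2014-01-20", "Bob", 9),
    ("2014-02-03", "Ann", 8),
    ("2014-02-14", "Cid", 6),
])

# Step 1 (derived table y): the maximum rating per month.
# Step 2 (join): fetch the full rows whose month and rating match that maximum.
best_per_month = conn.execute("""
    SELECT y.dt, x.name, x.rating
    FROM test x
    JOIN (SELECT strftime('%Y-%m', date) AS dt, MAX(rating) AS max_rating
          FROM test
          GROUP BY strftime('%Y-%m', date)) y
      ON y.dt = strftime('%Y-%m', x.date)
     AND y.max_rating = x.rating
    ORDER BY y.dt
""").fetchall()

print(best_per_month)
```

If two people share the top rating in a month, this pattern returns both rows for that month.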