MySQL and average calculation giving different results

I've created a table called test with some sample data that looks like the following:
If I add up each user's values, divide by the total number of entries for that user_id and multiply by 100, I get a percentage. For example:
for user_id 1 there are a total of 6 records and the values add up to 4. So 4/6*100 = 66.67%
for user_id 2 there are a total of 8 records and the values add up to 6. So 6/8*100 = 75%
I am able to get these values using the following SQL:
SELECT a.user_id, a.total_present / (SELECT count(*) as total_sessions FROM test WHERE session_date BETWEEN '2017-10-01' AND '2017-10-15' AND user_id = a.user_id) * 100 AS attendance_percentage, a.total_present
FROM (
SELECT user_id, count(*) as total_present
FROM test t
WHERE t.session_date BETWEEN '2017-10-01' AND '2017-10-15' AND t.value=1
GROUP BY user_id
) a ORDER BY a.user_id ASC
If I add the percentages up and divide by the number of users, I get an average of 70.83%.
I thought I could simplify this by summing up all the values, dividing by the total number of records and multiplying by 100, like this:
SELECT ((SELECT count(*) as total_present FROM test t WHERE t.session_date BETWEEN '2017-10-01' AND '2017-10-15' AND t.value=1) / (SELECT count(*) as total_sessions FROM test WHERE session_date BETWEEN '2017-10-01' AND '2017-10-15')) * 100 AS average_percentage
This, however, does not give the same average, because each user has a different number of sessions (here it gives (4 + 6) / (6 + 8) * 100 = 71.43%). If both users had the same number of sessions, the results would match.
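The difference is a weighting effect rather than a SQL bug: dividing all present records by all records weights each user by their number of sessions, while averaging the per-user percentages gives every user equal weight. A minimal Python sketch, using only the per-user totals stated above, computes both numbers and checks that the pooled figure is the session-weighted mean of the per-user percentages:

```python
# Per-user totals stated in the question: user_id -> (records with value 1, total records)
counts = {1: (4, 6), 2: (6, 8)}

# Attendance percentage for each user: 4/6*100 and 6/8*100
per_user = {uid: present / total * 100 for uid, (present, total) in counts.items()}

# Average of the per-user percentages: every user weighs the same
unweighted = sum(per_user.values()) / len(per_user)

# "Simplified" query: all present records over all records, (4 + 6) / (6 + 8) * 100
total_present = sum(present for present, _ in counts.values())
total_records = sum(total for _, total in counts.values())
pooled = total_present / total_records * 100

# The pooled figure equals the per-user percentages weighted by each user's record count
weighted = sum(per_user[uid] * total for uid, (_, total) in counts.items()) / total_records

print(f"{unweighted:.2f} {pooled:.2f} {weighted:.2f}")  # 70.83 71.43 71.43
```

So a single SUM/COUNT over all rows can only reproduce the per-user average when every user has the same number of records.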
So the question is: is there a way to calculate this with my simplified method but get a result that matches the one where I compute each user's percentage, add them up and divide by the number of users?

You are overcomplicating your query. To get the attendance_percentage per user, you can simply take the AVG of value (since it's already binary, 0 or 1) and multiply it by 100. Here is the query for that:
SELECT user_id, sum(value) as total_present, avg(value)*100 as attendance_percentage
FROM test t
WHERE t.session_date BETWEEN '2017-10-01' AND '2017-10-15'
GROUP BY user_id
And finally, to calculate the average attendance_percentage, take the AVG of the result of the query above, like this:
select avg(attendance_percentage) from
(SELECT user_id, sum(value) as total_present, avg(value)*100 as attendance_percentage
FROM test t
WHERE t.session_date BETWEEN '2017-10-01' AND '2017-10-15'
GROUP BY user_id)q;
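To try both queries without a MySQL server, here is a sketch that runs them against an in-memory SQLite database through Python's sqlite3 module, used as a stand-in for MySQL. The rows are hypothetical, chosen only to reproduce the totals from the question (user 1 present 4 of 6 times, user 2 present 6 of 8 times):

```python
import sqlite3

# SQLite stand-in for the MySQL table from the question
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (user_id INTEGER, session_date TEXT, value INTEGER)")

# Hypothetical attendance rows (1 = present, 0 = absent) matching the question's totals
values = {1: [1, 1, 0, 1, 0, 1], 2: [1, 1, 1, 0, 1, 1, 0, 1]}
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(uid, f"2017-10-{day:02d}", v) for uid, vals in values.items() for day, v in enumerate(vals, start=1)],
)

per_user_sql = """
    SELECT user_id, SUM(value) AS total_present, AVG(value) * 100 AS attendance_percentage
    FROM test t
    WHERE t.session_date BETWEEN '2017-10-01' AND '2017-10-15'
    GROUP BY user_id
"""
# Per-user percentages: AVG of a 0/1 column is the fraction of present sessions
per_user = conn.execute(per_user_sql + " ORDER BY user_id").fetchall()

# Average of the per-user percentages, computed over the derived table q
overall = conn.execute(f"SELECT AVG(attendance_percentage) FROM ({per_user_sql}) q").fetchone()[0]

print(per_user)  # user 1 -> about 66.67, user 2 -> 75.0
print(round(overall, 2))  # 70.83
```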
I have also created this sqlfiddle for you so that you can test the queries.
Edit: If your value is not binary, you can use your original query to find the per-user attendance_percentage and then simply average it with the AVG() function. Your query then becomes:
select avg(attendance_percentage) from
(SELECT a.user_id, a.total_present / (SELECT count(*) as total_sessions FROM test WHERE session_date BETWEEN '2017-10-01' AND '2017-10-15' AND user_id = a.user_id) * 100 AS attendance_percentage, a.total_present
FROM (
SELECT user_id, count(*) as total_present
FROM test t
WHERE t.session_date BETWEEN '2017-10-01' AND '2017-10-15' AND t.value=1
GROUP BY user_id
) a ORDER BY a.user_id ASC) q;
Here is the updated fiddle.

I would suggest avoiding subqueries when there is no good reason for them. Most of the time you can do without them, and subqueries (correlated ones in particular, which may be re-evaluated for each outer row) can perform poorly on large tables.
I think your query could look something like this:
SELECT user_id, sum(IF(t.value=1,1,0)) as total_present, sum(IF(t.value=1,1,0)) / count(*) * 100 AS attendance_percentage
FROM test t
WHERE t.session_date BETWEEN '2017-10-01' AND '2017-10-15'
GROUP BY user_id
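A sketch of this single-level query run on SQLite through Python's sqlite3, as a stand-in for MySQL. Two adjustments are assumptions of the port: SQLite has no IF(), so the equivalent CASE expression is used, and SQLite's / truncates when both operands are integers, so the sum is multiplied by 100.0 before dividing. The rows are hypothetical, chosen to reproduce the question's totals:

```python
import sqlite3

# SQLite stand-in for the MySQL table from the question
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (user_id INTEGER, session_date TEXT, value INTEGER)")

# Hypothetical rows: user 1 present 4 of 6 sessions, user 2 present 6 of 8
values = {1: [1, 1, 0, 1, 0, 1], 2: [1, 1, 1, 0, 1, 1, 0, 1]}
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(uid, f"2017-10-{day:02d}", v) for uid, vals in values.items() for day, v in enumerate(vals, start=1)],
)

# No subquery: the aggregate is repeated instead of referencing the total_present alias
rows = conn.execute("""
    SELECT user_id,
           SUM(CASE WHEN t.value = 1 THEN 1 ELSE 0 END) AS total_present,
           SUM(CASE WHEN t.value = 1 THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS attendance_percentage
    FROM test t
    WHERE t.session_date BETWEEN '2017-10-01' AND '2017-10-15'
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

for user_id, total_present, pct in rows:
    print(user_id, total_present, round(pct, 2))  # 1 4 66.67, then 2 6 75.0
```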
I hope that helps

Related

Display count of column excluding min and max values

I want to count how many times each activity occurs in the table FRIENDS below. Then I want to print the activities whose occurrence counts are neither the maximum nor the minimum of all counts.
ID/Name/Activity
1/James/Horse Riding
2/Eric/Eating
3/Sean/Eating
4/John/Horse Riding
5/Chris/Eating
6/Jessica/Playing
For example:
Horse Riding occurs 140 times
Playing occurs 170 times
Eating occurs 120 times
Walking occurs 150 times
Running occurs 200 times
The max occurrence here is Running, occurring 200 times, and the minimum occurrence here is Eating, occurring 120 times.
Therefore, I want to display
Horse Riding
Playing
Walking
In no particular order.
This is the code I have so far, but I keep getting a syntax error. When I don't get a syntax error, I get an "Every derived table must have its own alias" error. I am new to SQL, so I appreciate any advice I can get.
SELECT ACTIVITY, count(ACTIVITY) as Occurences FROM FRIENDS,
(SELECT MAX(Occur) AS Ma,MIN(Occur) AS Mi FROM (SELECT ACTIVITY, count(ACTIVITY) as Occur
FROM FRIENDS GROUP by City)) as T
GROUP BY City HAVING Occurences!=T.Ma AND Occurences!=T.Mi ORDER BY Occurences DESC
In MySQL 8.0, you can do this with aggregation and window functions:
select *
from (
select activity, count(*) cnt,
rank() over(order by count(*)) rn_asc,
rank() over(order by count(*) desc) rn_desc
from mytable
group by activity
) t
where rn_asc > 1 and rn_desc > 1
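A sketch of this window-function query running on SQLite (version 3.25 or later, which supports RANK() as a window function) through Python's sqlite3. The table is filled with the example occurrence counts from the question; the id column is an assumption, and an outer ORDER BY is added only to make the output deterministic:

```python
import sqlite3

# SQLite stand-in for MySQL 8.0; needs SQLite 3.25+ for window functions
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, activity TEXT)")

# Occurrence counts from the question's example
counts = {"Horse Riding": 140, "Playing": 170, "Eating": 120, "Walking": 150, "Running": 200}
conn.executemany(
    "INSERT INTO mytable (activity) VALUES (?)",
    [(name,) for name, n in counts.items() for _ in range(n)],
)

# Rank the per-activity counts both ways, then drop rank 1 at either end
rows = conn.execute("""
    SELECT activity, cnt
    FROM (
        SELECT activity, COUNT(*) AS cnt,
               RANK() OVER (ORDER BY COUNT(*)) AS rn_asc,
               RANK() OVER (ORDER BY COUNT(*) DESC) AS rn_desc
        FROM mytable
        GROUP BY activity
    ) t
    WHERE rn_asc > 1 AND rn_desc > 1
    ORDER BY activity
""").fetchall()

print(rows)  # [('Horse Riding', 140), ('Playing', 170), ('Walking', 150)]
```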
The subquery counts the occurrences of each activity and ranks them in both ascending and descending order. All that is left to do is exclude the top and bottom records. If several activities tie for the top (or bottom) count, the query excludes all of them.
In earlier versions, one option is a HAVING clause:
select activity, count(*) cnt
from mytable t
group by activity
having count(*) > (select count(*) from mytable group by activity order by count(*) limit 1)
and count(*) < (select count(*) from mytable group by activity order by count(*) desc limit 1)
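The HAVING version needs no window functions. The sketch below runs it on SQLite through Python's sqlite3, using the six FRIENDS rows from the question in place of mytable. Eating occurs 3 times (the maximum) and Playing once (the minimum), so only Horse Riding should remain:

```python
import sqlite3

# SQLite stand-in for MySQL, loaded with the question's FRIENDS rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (id INTEGER PRIMARY KEY, name TEXT, activity TEXT)")
conn.executemany(
    "INSERT INTO friends VALUES (?, ?, ?)",
    [
        (1, "James", "Horse Riding"),
        (2, "Eric", "Eating"),
        (3, "Sean", "Eating"),
        (4, "John", "Horse Riding"),
        (5, "Chris", "Eating"),
        (6, "Jessica", "Playing"),
    ],
)

# Keep only activities whose count is strictly between the smallest and largest counts
rows = conn.execute("""
    SELECT activity, COUNT(*) AS cnt
    FROM friends t
    GROUP BY activity
    HAVING COUNT(*) > (SELECT COUNT(*) FROM friends GROUP BY activity ORDER BY COUNT(*) LIMIT 1)
       AND COUNT(*) < (SELECT COUNT(*) FROM friends GROUP BY activity ORDER BY COUNT(*) DESC LIMIT 1)
""").fetchall()

print(rows)  # [('Horse Riding', 2)]
```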

How to count the users who have had one transaction per day?

Here is my query:
select count(1) from
(select count(1) num, user_id from pos_transactions pt
where date(created_at) <= '2020-6-21'
group by user_id
having num = 1) x
It gives me the number of users who have had exactly 1 transaction up to 2020-6-21. Now I also want to group it by date(created_at): I want a list of dates (such as 2020-6-21, 2020-6-22, etc.) together with the number of users who had exactly 1 transaction on that date (day).
Any idea how I can do that?
EDIT: The result of the query above is correct; the issue is that it's manual now. I have to increase 2020-6-21 by hand, and I want to make it automatic. In other words, I want a list of all dates (from 2020-6-21 until now), each with the number of users who have had 1 transaction up to that date.
If you want the number of users who had one transaction on each day, then you need to aggregate by the date as well:
select dte, count(*)
from (select date(created_at) as dte, user_id
from pos_transactions pt
where date(created_at) <= '2020-6-21'
group by dte, user_id
having count(*) = 1
) du
group by dte;
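A sketch of this per-day query on SQLite through Python's sqlite3, with hypothetical pos_transactions rows. One assumption of the port: SQLite compares the date strings as plain text, so the literal is zero-padded ('2020-06-21' rather than '2020-6-21'). The inner GROUP BY repeats date(created_at) instead of relying on the dte alias:

```python
import sqlite3

# SQLite stand-in for MySQL with hypothetical transactions
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pos_transactions (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT)")

# Hypothetical data: on 2020-06-20 user 2 has two transactions, so only user 1 counts that day
conn.executemany(
    "INSERT INTO pos_transactions (user_id, created_at) VALUES (?, ?)",
    [
        (1, "2020-06-20 09:00:00"),
        (2, "2020-06-20 10:00:00"),
        (2, "2020-06-20 15:00:00"),
        (1, "2020-06-21 12:00:00"),
        (2, "2020-06-21 13:00:00"),
        (3, "2020-06-21 11:00:00"),
        (4, "2020-06-22 08:00:00"),  # after the cutoff, filtered out
    ],
)

# Inner query: (day, user) pairs with exactly one transaction; outer query: count per day
rows = conn.execute("""
    SELECT dte, COUNT(*)
    FROM (SELECT date(created_at) AS dte, user_id
          FROM pos_transactions pt
          WHERE date(created_at) <= '2020-06-21'
          GROUP BY date(created_at), user_id
          HAVING COUNT(*) = 1
         ) du
    GROUP BY dte
    ORDER BY dte
""").fetchall()

print(rows)  # [('2020-06-20', 1), ('2020-06-21', 3)]
```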

Finding percentage from count by category on a specific day

This is a hypothetical problem I created from the following SQL problem that I stumbled upon, http://sqlfiddle.com/#!9/ef1f32f/15,
which is itself a modified version of http://sqlfiddle.com/#!2/1b72f3/1
I am trying to find the percentage of people who were active during a specific day and group them by location.
I am a beginner learning SQL subqueries.
Here is some code I modified from What percentage of users participated on each day (SQL Query).
Here is sample data:
| id | activity_typeid | competitionid | userid | time | activity_weight | location |
| 22 | 2151 | 52736be97b706 | 421 | 2013-11-04T08:30:38Z | 2 | SF |
| 1951 | 2151 | 52736be97b706 | 231 | 2013-11-01T09:05:22Z | 2 | LA |
| 1961 | 2151 | 52736be97b706 | 241 | 2013-11-01T09:07:30Z | 2 | LA |
Please check the rest in the sqlfiddle link.
SELECT date(time) as typical_day, location,
count(distinct userid) as counting,
count(distinct userid) / sum(cnt) * 100 percentage
FROM activity_entries ae join
(select count(distinct userid) as cnt
from activity_entries) cd
where date(time) = '2013-11-01'
GROUP BY location;
My query returns percentages of 33.3, 20 and 50 for LA, OK and SF. However, the expected percentages are 37.5, 37.5 and 25.
I am not sure what is missing from my code.
I have spent a few hours trying to figure it out, but to no avail. Please help me with this, and please don't close the question.
I don't know how you get the values that you expect. But if you want the proportion of all users, then sum(cnt) is what throws you off: cnt is repeated on every joined row, so sum(cnt) adds it once per row in each group and inflates the denominator. Divide by cnt itself (and add it to the GROUP BY). The query should look more like this:
SELECT date(time) as typical_day, location, cnt,
count(distinct userid) as counting,
count(distinct userid) / cnt * 100 percentage
FROM activity_entries ae CROSS JOIN
(select count(distinct userid) as cnt
from activity_entries) cd
WHERE date(time) = '2013-11-01'
GROUP BY date(time), location, cnt;
If you want the proportion of users on a given day, then:
SELECT date(ae.time) as typical_day, ae.location, cd.cnt,
count(distinct ae.userid) as counting,
count(distinct ae.userid) / cd.cnt * 100 percentage
FROM activity_entries ae JOIN
(SELECT date(time) as date,
COUNT(distinct userid) as cnt
FROM activity_entries
GROUP BY date(time)
) cd
ON date(ae.time) = cd.date
WHERE date(ae.time) = '2013-11-01'
GROUP BY date(ae.time), ae.location, cd.cnt;
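Both answer queries, grouped by location, can be tried on SQLite through Python's sqlite3 using only the three sample rows shown in the question (the rest of the data lives in the fiddle). Assumptions of the port: the percentage is computed as COUNT(DISTINCT ...) * 100.0 / cnt because SQLite's / truncates integer division, and the derived column is named day instead of date. On 2013-11-01 only LA has activity: 2 of the 3 users overall, and 2 of the 2 users active that day:

```python
import sqlite3

# SQLite stand-in for MySQL, loaded with the question's three sample rows
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE activity_entries (
        id INTEGER, activity_typeid INTEGER, competitionid TEXT,
        userid INTEGER, time TEXT, activity_weight INTEGER, location TEXT
    )
""")
conn.executemany(
    "INSERT INTO activity_entries VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (22, 2151, "52736be97b706", 421, "2013-11-04T08:30:38Z", 2, "SF"),
        (1951, 2151, "52736be97b706", 231, "2013-11-01T09:05:22Z", 2, "LA"),
        (1961, 2151, "52736be97b706", 241, "2013-11-01T09:07:30Z", 2, "LA"),
    ],
)

# Share of all users (across every day) who were active on the given day, per location
of_all_users = conn.execute("""
    SELECT date(time) AS typical_day, location, cnt,
           COUNT(DISTINCT userid) AS counting,
           COUNT(DISTINCT userid) * 100.0 / cnt AS percentage
    FROM activity_entries ae
    CROSS JOIN (SELECT COUNT(DISTINCT userid) AS cnt FROM activity_entries) cd
    WHERE date(time) = '2013-11-01'
    GROUP BY date(time), location, cnt
""").fetchall()

# Share of the users active on that same day, per location
of_day_users = conn.execute("""
    SELECT date(ae.time) AS typical_day, ae.location, cd.cnt,
           COUNT(DISTINCT ae.userid) AS counting,
           COUNT(DISTINCT ae.userid) * 100.0 / cd.cnt AS percentage
    FROM activity_entries ae
    JOIN (SELECT date(time) AS day, COUNT(DISTINCT userid) AS cnt
          FROM activity_entries
          GROUP BY date(time)) cd
      ON date(ae.time) = cd.day
    WHERE date(ae.time) = '2013-11-01'
    GROUP BY date(ae.time), ae.location, cd.cnt
""").fetchall()

print(of_all_users)  # LA: 2 of 3 users, about 66.67
print(of_day_users)  # LA: 2 of 2 users, 100.0
```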

MySQL double averaging with double grouping

I have the following data in a MySQL table called test
I run the following SQL query
SELECT user_id,
group_id,
sum(value) as total_present,
avg(value)*100 as attendance_percentage
FROM test t
WHERE t.session_date BETWEEN '2017-10-01' AND '2017-10-15'
GROUP BY user_id
This gives me percentages for each user_id like this:
In the output above, user_id 1 and 2 are in the same group_id. Is there a way to group my query further and take the average of the percentages within each group_id? For the example above, the percentage for group_id 3 should be 70.83335.
Khalid's answer is OK, but consider that you are averaging values with different weights: user_id = 2 has more records than user_id = 1, so that user's percentage should weigh more.
For example, if a user_id = 3 attended only once, with 100% attendance, that single session would distort the average.
You should do:
SELECT group_id, avg(value)*100
FROM yourTable
GROUP BY group_id
In this case the result is about 71.43 (10/14*100) instead of 70.83.
You can do it by applying a further aggregation to your query:
SELECT t.group_id, avg(t.attendance_percentage) as group_attendance_percentage
FROM(
SELECT user_id, group_id, sum(value) as total_present, avg(value)*100 as attendance_percentage
FROM test t
WHERE t.session_date BETWEEN '2017-10-01' AND '2017-10-15'
GROUP BY user_id, group_id
) t
GROUP BY t.group_id
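The two approaches discussed above can be compared side by side. The sketch below runs them on SQLite through Python's sqlite3 as a stand-in for MySQL, with hypothetical rows reproducing the question's totals (user 1 present 4 of 6 times, user 2 present 6 of 8 times, both in group_id 3). The inner GROUP BY lists group_id as well, which MySQL requires when ONLY_FULL_GROUP_BY is enabled:

```python
import sqlite3

# SQLite stand-in for the MySQL table from the question
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (user_id INTEGER, group_id INTEGER, session_date TEXT, value INTEGER)")

# Hypothetical rows: users 1 and 2 both in group 3; present 4 of 6 and 6 of 8 sessions
values = {1: [1, 1, 0, 1, 0, 1], 2: [1, 1, 1, 0, 1, 1, 0, 1]}
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [(uid, 3, f"2017-10-{day:02d}", v) for uid, vals in values.items() for day, v in enumerate(vals, start=1)],
)

# Average of per-user percentages: each user counts once
per_user_avg = conn.execute("""
    SELECT t.group_id, AVG(t.attendance_percentage) AS group_attendance_percentage
    FROM (
        SELECT user_id, group_id, SUM(value) AS total_present, AVG(value) * 100 AS attendance_percentage
        FROM test t
        WHERE t.session_date BETWEEN '2017-10-01' AND '2017-10-15'
        GROUP BY user_id, group_id
    ) t
    GROUP BY t.group_id
""").fetchall()

# Pooled average: each session counts once, so users with more sessions weigh more
pooled_avg = conn.execute("""
    SELECT group_id, AVG(value) * 100
    FROM test
    WHERE session_date BETWEEN '2017-10-01' AND '2017-10-15'
    GROUP BY group_id
""").fetchall()

print([(g, round(p, 2)) for g, p in per_user_avg])  # [(3, 70.83)]
print([(g, round(p, 2)) for g, p in pooled_avg])  # [(3, 71.43)]
```

Which figure is "right" depends on whether every user or every session should count equally.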

Greatest 'n' per group by month

I have a MySQL table with the date, name and rating of each person. I need to build a query that shows the best person of each month. The query below gives me the maximum rating for each month, but the wrong name/id of the person.
SELECT DATE_FORMAT(date,'%m.%Y') as date2, MAX(rating), name FROM test GROUP BY date2
Here's sqlfiddle with sample table: http://sqlfiddle.com/#!2/4dd54b/9
I read several greatest-n-per-group threads, but those queries didn't work; I suppose that's because of grouping by DATE_FORMAT. So I'm asking here.
The easiest way is to use the substring_index()/group_concat() trick:
SELECT DATE_FORMAT(date, '%m.%Y') as date2, MAX(rating),
substring_index(group_concat(name order by rating desc), ',', 1) as name
FROM test
GROUP BY date2;
A faster solution might look like this, although removing the DATE_FORMAT function altogether would speed things up even further:
SELECT x.*
FROM test x
JOIN
( SELECT DATE_FORMAT(date,'%Y-%m') dt
, MAX(rating) max_rating
FROM test
GROUP
BY DATE_FORMAT(date,'%Y-%m')
) y
ON y.dt = DATE_FORMAT(x.date,'%Y-%m')
AND y.max_rating = x.rating;
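A sketch of the join approach on SQLite through Python's sqlite3, as a stand-in for MySQL. SQLite has no DATE_FORMAT, so strftime('%Y-%m', ...) takes its place, and the rows are hypothetical. If two people share the top rating in a month, the join returns both of them:

```python
import sqlite3

# SQLite stand-in for MySQL with hypothetical ratings
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (date TEXT, name TEXT, rating INTEGER)")

# Hypothetical data: Bob is best in January, Carol in February
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        ("2014-01-05", "Alice", 7),
        ("2014-01-20", "Bob", 9),
        ("2014-02-03", "Carol", 8),
        ("2014-02-15", "Dave", 6),
    ],
)

# Join each row to its month's maximum rating; only the top-rated rows survive
rows = conn.execute("""
    SELECT x.*
    FROM test x
    JOIN (SELECT strftime('%Y-%m', date) AS dt, MAX(rating) AS max_rating
          FROM test
          GROUP BY strftime('%Y-%m', date)) y
      ON y.dt = strftime('%Y-%m', x.date)
     AND y.max_rating = x.rating
    ORDER BY x.date
""").fetchall()

print(rows)  # [('2014-01-20', 'Bob', 9), ('2014-02-03', 'Carol', 8)]
```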