Cumulative Sum of group count in mysql query

Cumulative Sum of group count in mysql query - mysql

I have a table look like below....
ID HID Date UID
1 1 2012-01-01 1002
2 1 2012-01-24 2005
3 1 2012-02-15 5152
4 2 2012-01-01 6252
5 2 2012-01-19 10356
6 3 2013-01-06 10989
7 3 2013-03-25 25001
8 3 2014-01-14 35798
How can i group by HID, Year, Month and count(UID) and add a cumulative_sum (which is count of UID). So the final result look like this...
HID Year Month Count cumulative_sum
1 2012 01 2 2
1 2012 02 1 3
2 2012 01 2 2
3 2013 01 1 1
3 2013 03 1 2
3 2014 01 1 3
What's the best way to accomplish this using query?

I made assumptions about the original data set. You should be able to adapt this to the revised dataset - although note that the solution using variables (instead of my self-join) is faster...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(ID INT NOT NULL
,Date DATE NOT NULL
,UID INT NOT NULL PRIMARY KEY
);
INSERT INTO my_table VALUES
(1 ,'2012-01-01', 1002),
(1 ,'2012-01-24', 2005),
(1 ,'2012-02-15', 5152),
(2 ,'2012-01-01', 6252),
(2 ,'2012-01-19', 10356),
(3 ,'2013-01-06', 10989),
(3 ,'2013-03-25', 25001),
(3 ,'2014-01-14', 35798);
SELECT a.*
, SUM(b.count) cumulative
FROM
(
SELECT x.id,YEAR(date) year,MONTH(date) month, COUNT(0) count FROM my_table x GROUP BY id,year,month
) a
JOIN
(
SELECT x.id,YEAR(date) year,MONTH(date) month, COUNT(0) count FROM my_table x GROUP BY id,year,month
) b
ON b.id = a.id AND (b.year < a.year OR (b.year = a.year AND b.month <= a.month)
)
GROUP
BY a.id, a.year,a.month;
+----+------+-------+-------+------------+
| id | year | month | count | cumulative |
+----+------+-------+-------+------------+
| 1 | 2012 | 1 | 2 | 2 |
| 1 | 2012 | 2 | 1 | 3 |
| 2 | 2012 | 1 | 2 | 2 |
| 3 | 2013 | 1 | 1 | 1 |
| 3 | 2013 | 3 | 1 | 2 |
| 3 | 2014 | 1 | 1 | 3 |
+----+------+-------+-------+------------+
If you don't mind an extra column in the result, you can simplify (and accelerate) the above, as follows:
SELECT x.*
, #running:= IF(#previous=x.id,#running,0)+x.count cumulative
, #previous:=x.id
FROM
( SELECT x.id,YEAR(date) year,MONTH(date) month, COUNT(0) count FROM my_table x GROUP BY id,year,month ) x
,( SELECT #cumulative := 0,#running:=0) vals;

The code turns out kind of messy, and it reads as follows:
SELECT
HID,
strftime('%Y', `Date`) AS Year,
strftime('%m', `Date`) AS Month,
COUNT(UID) AS Count,
(SELECT
COUNT(UID)
FROM your_db A
WHERE
A.HID=B.HID
AND
(strftime('%Y', A.`Date`) < strftime('%Y', B.`Date`)
OR
(strftime('%Y', A.`Date`) = strftime('%Y', B.`Date`)
AND
strftime('%m', A.`Date`) <= strftime('%m', B.`Date`)))) AS cumulative_count
FROM your_db B
GROUP BY HID, YEAR, MONTH
Though by using views, it should become much clearer:
CREATE VIEW temp_data AS SELECT
HID,
strftime('%Y', `Date`) as Year,
strftime('%m', `Date`) as Month,
COUNT(UID) as Count
FROM your_db GROUP BY HID, YEAR, MONTH;
Then your statement will read as follows:
SELECT
HID,
Year,
Month,
`Count`,
(SELECT SUM(`Count`)
FROM temp_data A
WHERE
A.HID = B.HID
AND
(A.Year < B.Year
OR
(A.Year = B.Year
AND
A.Month <= B.Month))) AS cumulative_sum
FROM temp_data B;

Related

MySQL: How to get the Active members by month in year

I have got the previous year working members and subtracted previous year relieving employees, then got the previous month relieving list and subtracted it from the result set. Then added the newly added members in a current month.
SQL Fiddle Link
I am sensing that there lot of improvements we can do to the current query. But right now I am out of ideas, Can someone kindly help on this?

IF I have interpreted your existing query correctly, I suggest the following:
select
mnth.num, count(*)
from (
select 1 AS num union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9 union all select 10 union all select 11 union all select 12
) mnth
left join (
select
e.emp_id
, case
when e.hired_date < date_format(current_date(), '%Y-01-01') then 1
else month(e.hired_date)
end AS start_month
, case
when es.relieving_date < date_format(current_date(), '%Y-01-01') then 0
when es.relieving_date >= date_format(current_date(), '%Y-01-01') then month(es.relieving_date)
else month(current_date())
end AS end_month
from employee e
left join employee_separation es on e.emp_id = es.emp_id
) emp on mnth.num between emp.start_month and emp.end_month
where mnth.num <= month(current_date())
group by
mnth.num
;
This produced the following result (current_date() on Nov 21 2017
| num | count(*) |
|-----|----------|
| 1 | 6 |
| 2 | 7 |
| 3 | 8 |
| 4 | 9 |
| 5 | 10 |
| 6 | 9 |
| 7 | 10 |
| 8 | 11 |
| 9 | 12 |
| 10 | 13 |
| 11 | 14 |
DEMO
Depending on data volumes adding a where clause in the emp subquery may help, this also affect a case expression:
, case
when es.relieving_date >= date_format(current_date(), '%Y-01-01') then month(es.relieving_date)
else month(current_date())
end AS end_month
from employee e
left join employee_separation es on e.emp_id = es.emp_id
where es.relieving_date >= date_format(current_date(), '%Y-01-01')

I think what you need to do is to get all the employees who are already working from the employee table with:
SELECT * FROM employee WHERE hired_date<= CURRENT_DATE;
Then get the list of employees whose relieving date is still in the future using:
SELECT * FROM employee_separation WHERE relieving_date > CURRENT_DATE;
Then join the two results and group by the month and year of the reliving date as shown below:
SELECT DATE_FORMAT(B.relieving_date, "%Y-%M") RELIEVING_DATE, COUNT(*)
NUMBER_OF_ACTIVE_MEMBERS FROM
(SELECT * FROM employee WHERE hired_date <= CURRENT_DATE) A INNER JOIN
(SELECT * FROM employee_separation WHERE relieving_date > CURRENT_DATE) B
ON A.emp_id=B.emp_id
GROUP BY DATE_FORMAT(B.relieving_date , "%Y-%M");
Here is a Demo on sql fiddle.

Get the count() where created_date is cumulative and date based

I'm aware that there are several answers on SO about cumulative totals. I have experimented and have not found a solution to my problem.
Here is a sqlfiddle.
We have a contacts table with two fields, eid and create_time:
eid create_time
991772 April, 21 2016 11:34:21
989628 April, 17 2016 02:19:57
985557 April, 04 2016 09:56:39
981920 March, 30 2016 11:03:12
981111 March, 30 2016 09:36:48
I would like to select the number of new contacts in each month along with the size of our contacts database at the end of each month. New contacts by year and month is simple enough. For the size of the contacts table at the end of each month I did some research and found what looked to be a straight forwards method:
set #csum = 0;
select
year(c.create_time) as yr,
month(c.create_time) as mth,
count(c.eid) as new_contacts,
(#csum + count(c.eid)) as cumulative_contacts
from
contacts c
group by
yr,
mth
That runs but gives me unexpected results.
If I run:
select count(*) from contacts where date(create_time) < current_date
I get the total number of records in the table 146.
I therefore expected the final row in my query using #csum to have 146 for April 2016. It has only 3?
What my goal is for field cumulative_contacts:
For the record with e.g. January 2016.
select count(*) from contacts where date(create_time) < '2016-02-01';
And the record for February would have:
select count(*) from contacts where date(create_time) < '2016-03-01';
And so on

Try this, a bit of modification from your sql;)
CREATE TABLE IF NOT EXISTS `contacts` (
`eid` char(50) DEFAULT NULL,
`create_time` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
INSERT INTO `contacts` (`eid`, `create_time`) VALUES
('991772', '2016-04-21 11:34:21'),
('989628', '2016-04-17 02:19:57'),
('985557', '2016-04-04 09:56:39'),
('981920', '2016-03-30 11:03:12'),
('981111', '2016-03-30 09:36:48');
SET #csum = 0;
SELECT t.*, #csum:=(#csum + new_contacts) AS cumulative_contacts
FROM (
SELECT YEAR(c.create_time) AS yr, MONTH(c.create_time) AS mth, COUNT(c.eid) AS new_contacts
FROM contacts c
GROUP BY yr, mth) t
Output results is
| yr | mth | new_contacts | cumulative_contacts |
------ ----- -------------- ---------------------
| 2016 | 3 | 2 | 2 |
| 2016 | 4 | 3 | 5 |

This sql will get the cumulative sum and is pretty efficient. It numbers each row first and then uses that as the cumulative sum.
SELECT s1.yr, s1.mth, s1.new_contacts, s2.cummulative_contacts
FROM
(SELECT
YEAR(create_time) AS yr,
MONTH(create_time) AS mth,
COUNT(eid) AS new_contacts,
MAX(eid) AS max_eid
FROM
contacts
GROUP BY
yr,
mth
ORDER BY create_time) s1 INNER JOIN
(SELECT eid, (#sum:=#sum+1) AS cummulative_contacts
FROM
contacts INNER JOIN
(SELECT #sum := 0) r
ORDER BY create_time) s2 ON max_eid=s2.eid;
--Result sample--
| yr | mth | new_contacts | cumulative_contacts |
|------|-----|--------------|---------------------|
| 2016 | 1 | 4 | 132 |
| 2016 | 2 | 4 | 136 |
| 2016 | 3 | 7 | 143 |
| 2016 | 4 | 3 | 146 |

Try this: fiddele
Here you have a "greater than or equal" join, so each group "contains" all previous values. Times 12 part, converts the hole comparation to months. I did offer this solution as it is not MySql dependant. (can be implemented on many other DBs with minimun or no changes)
select dates.yr, dates.mth, dates.new_contacts, sum(NC.new_contacts) as cumulative_new_contacts
from (
select
year(c.create_time) as yr,
month(c.create_time) as mth,
count(c.eid) as new_contacts
from
contacts c
group by
year(c.create_time),
month(c.create_time)
) as dates
left join
(
select
year(c.create_time) as yr,
month(c.create_time) as mth,
count(c.eid) as new_contacts
from
contacts c
group by
year(c.create_time),
month(c.create_time)
) as NC
on dates.yr*12+dates.mth >= NC.yr*12+NC.mth
group by
dates.yr,
dates.mth,
dates.new_contacts -- not needed by MySql, present here for other DBs compatibility
order by 1,2

Finding count for a Period in sql

I have a table with :
user_id | order_date
---------+------------
12 | 2014-03-23
12 | 2014-01-24
14 | 2014-01-26
16 | 2014-01-23
15 | 2014-03-21
20 | 2013-10-23
13 | 2014-01-25
16 | 2014-03-23
13 | 2014-01-25
14 | 2014-03-22
A Active user is someone who has logged in last 12 months.
Need output as
Period | count of Active user
----------------------------
Oct-2013 - 1
Jan-2014 - 5
Mar-2014 - 10
The Jan 2014 value - includes Oct -2013 1 record and 4 non duplicate record for Jan 2014)

You can use a variable to calculate the running total of active users:
SELECT Period,
#total:=#total+cnt AS `Count of Active Users`
FROM (
SELECT CONCAT(MONTHNAME(order_date), '-', YEAR(order_date)) AS Period,
COUNT(DISTINCT user_id) AS cnt
FROM mytable
GROUP BY Period
ORDER BY YEAR(order_date), MONTH(order_date) ) t,
(SELECT #total:=0) AS var
The subquery returns the number of distinct active users per Month/Year. The outer query uses #total variable in order to calculate the running total of active users' count.
Fiddle Demo here

I've got two queries that do the thing. I am not sure which one's the fastest. Check them aginst your database:
SQL Fiddle
Query 1:
select per.yyyymm,
(select count(DISTINCT o.user_id) from orders o where o.order_date >=
(per.yyyymm - INTERVAL 1 YEAR) and o.order_date < per.yyyymm + INTERVAL 1 MONTH) as `count`
from
(select DISTINCT LAST_DAY(order_date) + INTERVAL 1 DAY - INTERVAL 1 MONTH as yyyymm
from orders) per
order by per.yyyymm
Results:
| yyyymm | count |
|---------------------------|-------|
| October, 01 2013 00:00:00 | 1 |
| January, 01 2014 00:00:00 | 5 |
| March, 01 2014 00:00:00 | 6 |
Query 2:
select DATE_FORMAT(order_date, '%Y-%m'),
(select count(DISTINCT o.user_id) from orders o where o.order_date >=
(LAST_DAY(o1.order_date) + INTERVAL 1 DAY - INTERVAL 13 MONTH) and
o.order_date <= LAST_DAY(o1.order_date)) as `count`
from orders o1
group by DATE_FORMAT(order_date, '%Y-%m')
Results:
| DATE_FORMAT(order_date, '%Y-%m') | count |
|----------------------------------|-------|
| 2013-10 | 1 |
| 2014-01 | 5 |
| 2014-03 | 6 |

The best thing I could do is this:
SELECT Date, COUNT(*) as ActiveUsers
FROM
(
SELECT DISTINCT userId, CONCAT(YEAR(order_date), "-", MONTH(order_date)) as Date
FROM `a`
ORDER BY Date
)
AS `b`
GROUP BY Date
The output is the following:
| Date | ActiveUsers |
|---------|-------------|
| 2013-10 | 1 |
| 2014-1 | 4 |
| 2014-3 | 4 |
Now, for every row you need to sum up the number of active users in previous rows.
For example, here is the code in C#.
int total = 0;
while (reader.Read())
{
total += (int)reader['ActiveUsers'];
Console.WriteLine("{0} - {1} active users", reader['Date'].ToString(), reader['ActiveUsers'].ToString());
}
By the way, for the March of 2014 the answer is 9 because one row is duplicated.

Try this, but thise doesn't handle the last part: The Jan 2014 value - includes Oct -2013
select TO_CHAR(order_dt,'MON-YYYY'), count(distinct User_ID ) cnt from [orders]
where User_ID in
(select User_ID from
(select a.User_ID from [orders] a,
(select a.User_ID,count (a.order_dt) from [orders] a
where a.order_dt > (select max(b.order_dt)-365 from [orders] b where a.User_ID=b.User_ID)
group by a.User_ID
having count(order_dt)>1) b
where a.User_ID=b.User_ID) a
)
group by TO_CHAR(order_dt,'MON-YYYY');

This is what I think you are looking for
SET #cnt = 0;
SELECT Period, #cnt := #cnt + total_active_users AS total_active_users
FROM (
SELECT DATE_FORMAT(order_date, '%b-%Y') AS Period , COUNT( id) AS total_active_users
FROM t
GROUP BY DATE_FORMAT(order_date, '%b-%Y')
ORDER BY order_date
) AS t
This is the output that I get
Period total_active_users
Oct-2013 1
Jan-2014 6
Mar-2014 10
You can also do COUNT(DISTINCT id) to get the unique Ids only
Here is a SQL Fiddle

count the number of rows between intervals

My table is like:
+---------+---------+------------+-----------------------+---------------------+
| visitId | userId | locationId | comments | time |
+---------+---------+------------+-----------------------+---------------------+
| 1 | 3 | 12 | It's a good day here! | 2012-12-12 11:50:12 |
+---------+---------+------------+-----------------------+---------------------+
| 2 | 3 | 23 | very beautiful | 2012-12-12 12:50:12 |
+---------+---------+------------+-----------------------+---------------------+
| 3 | 3 | 52 | nice | 2012-12-12 13:50:12 |
+---------+---------+------------+-----------------------+---------------------+
which records visitors' trajectory and some comments on the places visited.
I want to count the numbers of visitors that visit a specific place (say id=3227) from 0:00 to 23:59, over some interval (ie. 30mins)
I was trying to do this by :
SELECT COUNT(*) FROM visits
GROUP BY HOUR(time), SIGN( MINUTE(time) - 30 )// if they are in the same interval this will yield the same result
WHERE locationId=3227
The problem is that if there is no record that falls in some interval, this will NOT return that interval with count 0. For example, there are no visitors visiting the location from 02:00 to 03:00, this will not give me the intervals of 02:00-02:29 and 02:30-2:59.
I want a result with an exact size of 48 (one for every half hour), how can I do this?

You have to create a table with the 48 rows that you want and use left outer join:
select n.hr, n.hr, coalesce(v.cnt, 0) as cnt
from (select 0 as hr, -1 as sign union all
select 0, 1 union all
select 1, -1 union all
select 1, 1 union all
. . .
select 23, -1 union all
select 23, 1 union all
) left outer join
(SELECT HOUR(time) as hr, SIGN( MINUTE(time) - 30 ) as sign, COUNT(*) as cnt
FROM visits
WHERE locationId=3227
GROUP BY HOUR(time), SIGN( MINUTE(time) - 30 )
) v
on n.hr = v.hr and n.sign = v.sign
order by n.hr, n.hr

Sql query to extract average grouped by a column

I'm trying to generate a SQL query to extract an average montly powerusage (of a year) for an ID.
+----+------------+------------+
| id | powerusage | date |
+----+------------+------------+
| 1 | 750 | 2011-12-2 |
| 1 | 1000 | 2011-12-1 |
| 1 | 1500 | 2011-11-15 |
| 1 | 100 | 2011-11-13 |
| 1 | 50 | 2011-11-10 |
| 2 | 500 | 2011-11-15 |
| 2 | 200 | 2011-11-13 |
+----+------------+------------+
So if ID = 1 I want (avg november + avg december) / 2 = (1750/2 + 1650/3) / 2 = 712.5
select AVG(powerusage) as avgMontlyPowerUsage
from usagetable
where id = 1 and YEAR(date) = 2011
But this will give me 680.
How do I do a average on a group?
Many thanks for all the answers! But I see my question is incorrect. See updated question

Something like
select AVG(montlyPowerUsage) from (
SELECT MONTH(date) as mnth,sum(powerusage) as montlyPowerUsage
from usagetable
where id = 1 and YEAR(date) = 2011 group by MONTH(date)
) t1
For Edited question
select AVG(montlyPowerUsage) from (
SELECT MONTH(date) as mnth,AVG(powerusage) as montlyPowerUsage
from usagetable
where id = 1 and YEAR(date) = 2011 group by MONTH(date)
) t1

mysql> select avg(powerusage)
from
(select monthname(date), sum(powerusage) as powerusage
from usagetable
where id=1 and year(date)=2011
group by monthname(date)) as avg_usage;
+-----------------+
| avg(powerusage) |
+-----------------+
| 1700.0000 |
+-----------------+
select avg(total_powerusage)
from
(select monthname(date), sum(powerusage) as total_powerusage
from usagetable
where id=1 and year(date)=2011
group by monthname(date)
) as avg_usage;
/* the use of subquery
is to return total of unique occurrences,
and sum powerusage of each occurrence,
which mean, you just need to apply AVG into the subquery */

This should give you monthly averages for every year and user. Some of the syntax may be MS SQL specific, but the logic should be good.
SELECT id, AVG(usage), year FROM
(SELECT id, SUM(powerusage) as usage, YEAR(date) as Year, MONTH(date) as Month
FROM usagetable
GROUP BY id, YEAR(date), MONTH(date)) as InnerTable
GROUP BY id, year

Try adding a group by on the id
GROUP BY id
Or the date, whichever suits.

SELECT SUM(powerusage) / (MONTH(MAX(`date`)) - MONTH(MIN(`date`)) + 1)
AS avgMontlyPowerUsage
FROM usagetable
WHERE id = 1
AND YEAR(`date`) = 2011
or (depending on what you need when data is sparse):
SELECT SUM(powerusage) / COUNT( DISTINCT MONTH(`date`) )
AS avgMontlyPowerUsage
FROM usagetable
WHERE id = 1
AND YEAR(`date`) = 2011
Warning: Neither of the above is optimized for performance.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Cumulative Sum of group count in mysql query - mysql

Related

MySQL: How to get the Active members by month in year

Get the count() where created_date is cumulative and date based

Finding count for a Period in sql

count the number of rows between intervals

Sql query to extract average grouped by a column

Categories

Resources