I'm aware that there are several answers on SO about cumulative totals. I have experimented and have not found a solution to my problem.
Here is a sqlfiddle.
We have a contacts table with two fields, eid and create_time:
eid create_time
991772 April, 21 2016 11:34:21
989628 April, 17 2016 02:19:57
985557 April, 04 2016 09:56:39
981920 March, 30 2016 11:03:12
981111 March, 30 2016 09:36:48
I would like to select the number of new contacts in each month along with the size of our contacts database at the end of each month. New contacts by year and month is simple enough. For the size of the contacts table at the end of each month, I did some research and found what looked like a straightforward method:
set @csum = 0;
select
year(c.create_time) as yr,
month(c.create_time) as mth,
count(c.eid) as new_contacts,
(@csum + count(c.eid)) as cumulative_contacts
from
contacts c
group by
yr,
mth
That runs but gives me unexpected results.
If I run:
select count(*) from contacts where date(create_time) < current_date
I get the total number of records in the table: 146.
I therefore expected the final row of my query using @csum to have 146 for April 2016. Why does it have only 3?
My goal for the cumulative_contacts field is as follows.
For example, the record for January 2016 would have:
select count(*) from contacts where date(create_time) < '2016-02-01';
And the record for February would have:
select count(*) from contacts where date(create_time) < '2016-03-01';
And so on
Try this; it is a small modification of your SQL:
CREATE TABLE IF NOT EXISTS `contacts` (
`eid` char(50) DEFAULT NULL,
`create_time` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
INSERT INTO `contacts` (`eid`, `create_time`) VALUES
('991772', '2016-04-21 11:34:21'),
('989628', '2016-04-17 02:19:57'),
('985557', '2016-04-04 09:56:39'),
('981920', '2016-03-30 11:03:12'),
('981111', '2016-03-30 09:36:48');
SET @csum = 0;
SELECT t.*, @csum := (@csum + new_contacts) AS cumulative_contacts
FROM (
SELECT YEAR(c.create_time) AS yr, MONTH(c.create_time) AS mth, COUNT(c.eid) AS new_contacts
FROM contacts c
GROUP BY yr, mth) t;
Output:
| yr | mth | new_contacts | cumulative_contacts |
|------|-----|--------------|---------------------|
| 2016 | 3 | 2 | 2 |
| 2016 | 4 | 3 | 5 |
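On MySQL 8.0 and later, a window function can replace the @csum user variable, so the result no longer depends on the order in which MySQL happens to evaluate the variable assignments. A minimal sketch of the same query, run here against SQLite 3.25+ (which accepts the same SUM(...) OVER (ORDER BY ...) syntax) through Python's sqlite3 module with the answer's five sample rows; strftime() stands in for MySQL's YEAR()/MONTH(), and the helper name running_totals is hypothetical.

```python
import sqlite3

# The answer's five sample rows, stored as ISO-8601 text (sorts chronologically).
ROWS = [
    ("991772", "2016-04-21 11:34:21"),
    ("989628", "2016-04-17 02:19:57"),
    ("985557", "2016-04-04 09:56:39"),
    ("981920", "2016-03-30 11:03:12"),
    ("981111", "2016-03-30 09:36:48"),
]

# Running total via a window function instead of a @csum user variable.
RUNNING_TOTAL = """
    SELECT t.yr, t.mth, t.new_contacts,
           SUM(t.new_contacts) OVER (ORDER BY t.yr, t.mth) AS cumulative_contacts
      FROM (SELECT CAST(strftime('%Y', create_time) AS INTEGER) AS yr,
                   CAST(strftime('%m', create_time) AS INTEGER) AS mth,
                   COUNT(eid) AS new_contacts
              FROM contacts
             GROUP BY yr, mth) AS t
     ORDER BY t.yr, t.mth
"""


def running_totals(rows):
    """Return (yr, mth, new_contacts, cumulative_contacts) tuples, one per month."""
    if sqlite3.sqlite_version_info < (3, 25, 0):
        raise RuntimeError("window functions need SQLite 3.25 or later")
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE contacts (eid TEXT, create_time TEXT)")
        conn.executemany("INSERT INTO contacts VALUES (?, ?)", rows)
        return conn.execute(RUNNING_TOTAL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in running_totals(ROWS):
        print(row)
```

With the sample data this reproduces the two rows shown above (2 and 5).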
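This SQL computes the cumulative sum efficiently. It numbers every row in create_time order and then takes the row number of each month's last contact (the one with the highest eid, which assumes eid increases with create_time) as that month's cumulative total.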
SELECT s1.yr, s1.mth, s1.new_contacts, s2.cumulative_contacts
FROM
(SELECT
YEAR(create_time) AS yr,
MONTH(create_time) AS mth,
COUNT(eid) AS new_contacts,
MAX(eid) AS max_eid -- assumes the highest eid is the month's latest contact
FROM
contacts
GROUP BY
yr,
mth) s1 INNER JOIN
(SELECT eid, (@sum:=@sum+1) AS cumulative_contacts
FROM
contacts INNER JOIN
(SELECT @sum := 0) r
ORDER BY create_time) s2 ON max_eid=s2.eid
ORDER BY s1.yr, s1.mth;
Sample result:
| yr | mth | new_contacts | cumulative_contacts |
|------|-----|--------------|---------------------|
| 2016 | 1 | 4 | 132 |
| 2016 | 2 | 4 | 136 |
| 2016 | 3 | 7 | 143 |
| 2016 | 4 | 3 | 146 |
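The question's target (for each month, COUNT(*) of rows created before the first day of the following month) can also be computed literally with a correlated subquery, which avoids both user variables and the max_eid assumption above. A minimal sketch, run against an in-memory SQLite database through Python's sqlite3 module with the question's five sample rows; SQLite's strftime() and date(..., 'start of month', '+1 month') stand in for MySQL's YEAR()/MONTH() and date arithmetic, and the helper name cumulative_by_cutoff is hypothetical.

```python
import sqlite3

# The question's five sample rows; ISO-8601 text compares in chronological order.
ROWS = [
    ("991772", "2016-04-21 11:34:21"),
    ("989628", "2016-04-17 02:19:57"),
    ("985557", "2016-04-04 09:56:39"),
    ("981920", "2016-03-30 11:03:12"),
    ("981111", "2016-03-30 09:36:48"),
]

# For each month, count every contact created before the first day of the
# following month, mirroring the question's COUNT(*) ... < 'YYYY-MM-01' checks.
CUMULATIVE_BY_CUTOFF = """
    SELECT m.yr, m.mth, m.new_contacts,
           (SELECT COUNT(*)
              FROM contacts AS c2
             WHERE c2.create_time < m.next_month) AS cumulative_contacts
      FROM (SELECT CAST(strftime('%Y', create_time) AS INTEGER) AS yr,
                   CAST(strftime('%m', create_time) AS INTEGER) AS mth,
                   date(create_time, 'start of month', '+1 month') AS next_month,
                   COUNT(eid) AS new_contacts
              FROM contacts
             GROUP BY yr, mth, next_month) AS m
     ORDER BY m.yr, m.mth
"""


def cumulative_by_cutoff(rows):
    """Return (yr, mth, new_contacts, cumulative_contacts) tuples, one per month."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE contacts (eid TEXT, create_time TEXT)")
        conn.executemany("INSERT INTO contacts VALUES (?, ?)", rows)
        return conn.execute(CUMULATIVE_BY_CUTOFF).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in cumulative_by_cutoff(ROWS):
        print(row)
```

The trade-off is one subquery per month, which is fine for a monthly report but grows with the table size.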
Try this:
This uses a "greater than or equal" join, so each month's group "contains" all previous months' values. The times-12 part converts the whole comparison to months: yr*12+mth gives one number per month that keeps increasing across year boundaries. I offered this solution because it is not MySQL-dependent; it can be implemented on many other databases with minimal or no changes.
select dates.yr, dates.mth, dates.new_contacts, sum(NC.new_contacts) as cumulative_new_contacts
from (
select
year(c.create_time) as yr,
month(c.create_time) as mth,
count(c.eid) as new_contacts
from
contacts c
group by
year(c.create_time),
month(c.create_time)
) as dates
left join
(
select
year(c.create_time) as yr,
month(c.create_time) as mth,
count(c.eid) as new_contacts
from
contacts c
group by
year(c.create_time),
month(c.create_time)
) as NC
on dates.yr*12+dates.mth >= NC.yr*12+NC.mth
group by
dates.yr,
dates.mth,
dates.new_contacts -- not needed by MySql, present here for other DBs compatibility
order by 1,2
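To check that the yr*12+mth comparison behaves across a year boundary, here is a minimal sketch of the same ">=" self-join run in SQLite through Python's sqlite3. Two things differ from the answer and are assumptions of this sketch: the monthly subquery is written once as a CTE (available in MySQL 8.0+ and SQLite), and the four rows are hypothetical, chosen so December 2015 precedes January 2016.

```python
import sqlite3

# Hypothetical rows (not from the question) that cross a year boundary.
ROWS = [
    ("1", "2015-12-31 23:00:00"),
    ("2", "2016-01-05 10:00:00"),
    ("3", "2016-01-20 10:00:00"),
    ("4", "2016-02-02 10:00:00"),
]

# Same ">=" self-join as the answer, with the monthly subquery defined once.
SELF_JOIN = """
    WITH monthly AS (
        SELECT CAST(strftime('%Y', create_time) AS INTEGER) AS yr,
               CAST(strftime('%m', create_time) AS INTEGER) AS mth,
               COUNT(eid) AS new_contacts
          FROM contacts
         GROUP BY yr, mth
    )
    SELECT d.yr, d.mth, d.new_contacts,
           SUM(nc.new_contacts) AS cumulative_new_contacts
      FROM monthly AS d
      LEFT JOIN monthly AS nc
        ON d.yr * 12 + d.mth >= nc.yr * 12 + nc.mth  -- one integer per month
     GROUP BY d.yr, d.mth, d.new_contacts
     ORDER BY d.yr, d.mth
"""


def cumulative_self_join(rows):
    """Return (yr, mth, new_contacts, cumulative_new_contacts) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE contacts (eid TEXT, create_time TEXT)")
        conn.executemany("INSERT INTO contacts VALUES (?, ?)", rows)
        return conn.execute(SELF_JOIN).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in cumulative_self_join(ROWS):
        print(row)
```

Comparing months alone (d.mth >= nc.mth) would wrongly leave December 2015 out of January's total; the combined index (24192 for 2015-12, 24193 for 2016-01) does not have that problem.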
Related
I have a MySQL table that looks like this:
id | client_id | date
--------------------------------------
1 | 12 | 02/02/2008
2 | 15 | 12/06/2008
3 | 23 | 11/12/2008
4 | 12 | 18/01/2009
5 | 12 | 03/03/2009
6 | 18 | 02/07/2009
7 | 23 | 08/09/2010
8 | 18 | 02/10/2010
9 | 21 | 30/11/2010
What I am trying to do is get the number of new clients for each year: 2008 has 3 new clients (12, 15, 23), 2009 has 1 new client (18), and 2010 has 1 new client (21).
So far I have this query, which gives me the distinct clients for each year: 3 for 2008, 2 for 2009, and 3 for 2010.
SELECT COUNT(DISTINCT client_id) FROM table GROUP BY YEAR(date)
Any help would be appreciated.
You could use a subquery to get the first year of every client_id (grouping by client_id), and then count the client_ids grouped by that first year:
SELECT COUNT(client_id), YEAR_MIN FROM (
SELECT client_id, MIN(YEAR(date)) AS YEAR_MIN
FROM table
GROUP BY client_id) AS T
GROUP BY YEAR_MIN
So you want to count the first date a client appears in the table. In other words, the row for which no other row exists with an earlier date and the same client. You can do this with an exclusion join.
Then you can count them per year as you're doing now.
SELECT YEAR(t.date) AS yr, COUNT(t.client_id) AS client_count
FROM (
SELECT t1.client_id, t1.date
FROM mytable AS t1
LEFT JOIN mytable AS t2 ON (t1.client_id=t2.client_id AND t1.date > t2.date)
WHERE t2.client_id IS NULL) AS t
GROUP BY yr
You should store dates using the DATE data type, which uses YYYY-MM-DD format. If your dates are stored as strings in DD/MM/YYYY format, > compares the text day-first, so it will not reflect chronological order.
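To make the string-date point concrete: as text, '03/03/2009' < '18/01/2009', even though 3 March 2009 is the later date. A minimal sketch that converts the question's DD/MM/YYYY values to YYYY-MM-DD before running the exclusion join in SQLite through Python's sqlite3; the conversion is done in Python here for brevity (in MySQL, STR_TO_DATE(date, '%d/%m/%Y') is the usual tool), and the helper names to_iso and new_clients_per_year are hypothetical.

```python
import sqlite3

# The question's rows, with dates as DD/MM/YYYY text.
ROWS = [
    (1, 12, "02/02/2008"), (2, 15, "12/06/2008"), (3, 23, "11/12/2008"),
    (4, 12, "18/01/2009"), (5, 12, "03/03/2009"), (6, 18, "02/07/2009"),
    (7, 23, "08/09/2010"), (8, 18, "02/10/2010"), (9, 21, "30/11/2010"),
]


def to_iso(ddmmyyyy):
    """Convert 'DD/MM/YYYY' to 'YYYY-MM-DD' so text order matches date order."""
    day, month, year = ddmmyyyy.split("/")
    return f"{year}-{month}-{day}"


# Exclusion join: keep each client's row for which no earlier row exists.
FIRST_SEEN_PER_YEAR = """
    SELECT CAST(strftime('%Y', t.date) AS INTEGER) AS yr,
           COUNT(t.client_id) AS client_count
      FROM (SELECT t1.client_id, t1.date
              FROM mytable AS t1
              LEFT JOIN mytable AS t2
                ON t1.client_id = t2.client_id AND t1.date > t2.date
             WHERE t2.client_id IS NULL) AS t
     GROUP BY yr
     ORDER BY yr
"""


def new_clients_per_year(rows):
    """Return (year, new_client_count) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (id INTEGER, client_id INTEGER, date TEXT)")
        conn.executemany(
            "INSERT INTO mytable VALUES (?, ?, ?)",
            [(row_id, client, to_iso(d)) for row_id, client, d in rows],
        )
        return conn.execute(FIRST_SEEN_PER_YEAR).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(new_clients_per_year(ROWS))
```

If a client could have two rows on the same earliest date, both would survive the exclusion join, so COUNT(DISTINCT t.client_id) would be the safer aggregate.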
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id SERIAL PRIMARY KEY
,client_id INT NOT NULL
,date INT NOT NULL
);
INSERT INTO my_table VALUES
(1,12,2008),
(2,15,2008),
(3,23,2008),
(4,12,2009),
(5,12,2009),
(6,18,2009),
(7,23,2010),
(8,18,2010),
(9,21,2010);
SELECT year
, COUNT(*) total
FROM
( SELECT client_id, MIN(date) year FROM my_table GROUP BY client_id ) x
GROUP
BY year;
+------+-------+
| year | total |
+------+-------+
| 2008 | 3 |
| 2009 | 1 |
| 2010 | 1 |
+------+-------+
I have a table with:
user_id | order_date
---------+------------
12 | 2014-03-23
12 | 2014-01-24
14 | 2014-01-26
16 | 2014-01-23
15 | 2014-03-21
20 | 2013-10-23
13 | 2014-01-25
16 | 2014-03-23
13 | 2014-01-25
14 | 2014-03-22
An active user is someone who has logged in within the last 12 months.
I need output like this:
Period | count of Active user
----------------------------
Oct-2013 - 1
Jan-2014 - 5
Mar-2014 - 10
(The Jan-2014 value includes the 1 record from Oct-2013 and the 4 non-duplicate records from Jan-2014.)
You can use a variable to calculate the running total of active users:
SELECT Period,
@total:=@total+cnt AS `Count of Active Users`
FROM (
SELECT CONCAT(MONTHNAME(order_date), '-', YEAR(order_date)) AS Period,
COUNT(DISTINCT user_id) AS cnt
FROM mytable
GROUP BY Period
ORDER BY YEAR(order_date), MONTH(order_date) ) t,
(SELECT @total:=0) AS var
The subquery returns the number of distinct active users per month/year. The outer query uses the @total variable to calculate the running total of the active users' count.
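Note that a running sum of per-month distinct counts counts a user once for every month they appear in, whereas "active in the last 12 months" is a distinct count over a trailing window. A minimal sketch contrasting the two in SQLite through Python's sqlite3, using the question's ten rows; the table name mytable and the helper active_users are assumptions, the window (first day of the same month a year earlier through the end of the month) mirrors Query 1 in a later answer, and correlated subqueries stand in for the @total variable.

```python
import sqlite3

# The question's rows as (user_id, order_date).
ORDERS = [
    (12, "2014-03-23"), (12, "2014-01-24"), (14, "2014-01-26"), (16, "2014-01-23"),
    (15, "2014-03-21"), (20, "2013-10-23"), (13, "2014-01-25"), (16, "2014-03-23"),
    (13, "2014-01-25"), (14, "2014-03-22"),
]

# Running sum of per-month distinct users: a user seen in several months
# is counted once per month.
RUNNING_SUM = """
    WITH per_month AS (
        SELECT strftime('%Y-%m', order_date) AS period,
               COUNT(DISTINCT user_id) AS cnt
          FROM mytable
         GROUP BY period
    )
    SELECT p.period,
           (SELECT SUM(q.cnt) FROM per_month AS q WHERE q.period <= p.period) AS total
      FROM per_month AS p
     ORDER BY p.period
"""

# Distinct users with an order from the first day of the same month a year
# earlier up to the end of the month.
TRAILING_DISTINCT = """
    WITH months AS (
        SELECT DISTINCT date(order_date, 'start of month') AS month_start FROM mytable
    )
    SELECT strftime('%Y-%m', m.month_start) AS period,
           (SELECT COUNT(DISTINCT o.user_id)
              FROM mytable AS o
             WHERE o.order_date >= date(m.month_start, '-12 months')
               AND o.order_date < date(m.month_start, '+1 month')) AS total
      FROM months AS m
     ORDER BY m.month_start
"""


def active_users(query, orders=ORDERS):
    """Run one of the queries above against an in-memory copy of the orders."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (user_id INTEGER, order_date TEXT)")
        conn.executemany("INSERT INTO mytable VALUES (?, ?)", orders)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(active_users(RUNNING_SUM))
    print(active_users(TRAILING_DISTINCT))
```

For March 2014 the running sum gives 9 while the trailing distinct count gives 6, because users 12, 14 and 16 ordered in both January and March.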
I have two queries that do this, and I am not sure which one is faster. Check them against your database:
Query 1:
select per.yyyymm,
(select count(DISTINCT o.user_id) from orders o where o.order_date >=
(per.yyyymm - INTERVAL 1 YEAR) and o.order_date < per.yyyymm + INTERVAL 1 MONTH) as `count`
from
(select DISTINCT LAST_DAY(order_date) + INTERVAL 1 DAY - INTERVAL 1 MONTH as yyyymm
from orders) per
order by per.yyyymm
Results:
| yyyymm | count |
|---------------------------|-------|
| October, 01 2013 00:00:00 | 1 |
| January, 01 2014 00:00:00 | 5 |
| March, 01 2014 00:00:00 | 6 |
Query 2:
select DATE_FORMAT(order_date, '%Y-%m'),
(select count(DISTINCT o.user_id) from orders o where o.order_date >=
(LAST_DAY(o1.order_date) + INTERVAL 1 DAY - INTERVAL 13 MONTH) and
o.order_date <= LAST_DAY(o1.order_date)) as `count`
from orders o1
group by DATE_FORMAT(order_date, '%Y-%m')
Results:
| DATE_FORMAT(order_date, '%Y-%m') | count |
|----------------------------------|-------|
| 2013-10 | 1 |
| 2014-01 | 5 |
| 2014-03 | 6 |
The best thing I could do is this:
SELECT Date, COUNT(*) as ActiveUsers
FROM
(
SELECT DISTINCT userId, CONCAT(YEAR(order_date), "-", MONTH(order_date)) as Date
FROM `a`
ORDER BY Date
)
AS `b`
GROUP BY Date
The output is the following:
| Date | ActiveUsers |
|---------|-------------|
| 2013-10 | 1 |
| 2014-1 | 4 |
| 2014-3 | 4 |
Now, for every row you need to sum up the number of active users in previous rows.
For example, here is the code in C#.
int total = 0;
while (reader.Read())
{
// COUNT(*) is a BIGINT in MySQL, so convert rather than cast the boxed value to int.
total += Convert.ToInt32(reader["ActiveUsers"]);
Console.WriteLine("{0} - {1} active users", reader["Date"], total);
}
By the way, for March 2014 the answer is 9 rather than 10, because one row is duplicated.
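The same client-side accumulation, sketched in Python with itertools.accumulate instead of a C# data reader. The (Date, ActiveUsers) pairs are hard-coded from the output table above rather than read from a database, and the sketch assumes they arrive in chronological order.

```python
from itertools import accumulate

# (Date, ActiveUsers) rows as shown in the output table above.
ROWS = [("2013-10", 1), ("2014-1", 4), ("2014-3", 4)]


def running_totals(rows):
    """Pair each period with the running sum of ActiveUsers up to that row."""
    totals = accumulate(count for _, count in rows)
    return [(period, total) for (period, _), total in zip(rows, totals)]


if __name__ == "__main__":
    for period, total in running_totals(ROWS):
        print(f"{period} - {total} active users")
```

This prints 1, 5 and 9 active users, matching the 9 for March 2014 mentioned above.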
Try this, but it doesn't handle the last part (the Jan-2014 value including Oct-2013):
select TO_CHAR(order_dt,'MON-YYYY'), count(distinct User_ID ) cnt from [orders]
where User_ID in
(select User_ID from
(select a.User_ID from [orders] a,
(select a.User_ID,count (a.order_dt) from [orders] a
where a.order_dt > (select max(b.order_dt)-365 from [orders] b where a.User_ID=b.User_ID)
group by a.User_ID
having count(order_dt)>1) b
where a.User_ID=b.User_ID) a
)
group by TO_CHAR(order_dt,'MON-YYYY');
This is what I think you are looking for:
SET @cnt = 0;
SELECT Period, @cnt := @cnt + total_active_users AS total_active_users
FROM (
SELECT DATE_FORMAT(order_date, '%b-%Y') AS Period, COUNT(id) AS total_active_users
FROM t
GROUP BY DATE_FORMAT(order_date, '%b-%Y')
ORDER BY order_date
) AS t
This is the output I get:
Period total_active_users
Oct-2013 1
Jan-2014 6
Mar-2014 10
You can also use COUNT(DISTINCT id) to count unique IDs only.
I have a table that looks like this:
ID HID Date UID
1 1 2012-01-01 1002
2 1 2012-01-24 2005
3 1 2012-02-15 5152
4 2 2012-01-01 6252
5 2 2012-01-19 10356
6 3 2013-01-06 10989
7 3 2013-03-25 25001
8 3 2014-01-14 35798
How can I group by HID, Year, and Month, count UID, and add a cumulative_sum (the running count of UID within each HID)? The final result should look like this:
HID Year Month Count cumulative_sum
1 2012 01 2 2
1 2012 02 1 3
2 2012 01 2 2
3 2013 01 1 1
3 2013 03 1 2
3 2014 01 1 3
What's the best way to accomplish this with a query?
I made some assumptions about the original data set (below, the ID column holds the HID values). You should be able to adapt this to the revised data set, although note that the solution using variables (instead of my self-join) is faster.
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(ID INT NOT NULL
,Date DATE NOT NULL
,UID INT NOT NULL PRIMARY KEY
);
INSERT INTO my_table VALUES
(1 ,'2012-01-01', 1002),
(1 ,'2012-01-24', 2005),
(1 ,'2012-02-15', 5152),
(2 ,'2012-01-01', 6252),
(2 ,'2012-01-19', 10356),
(3 ,'2013-01-06', 10989),
(3 ,'2013-03-25', 25001),
(3 ,'2014-01-14', 35798);
SELECT a.*
, SUM(b.count) cumulative
FROM
(
SELECT x.id,YEAR(date) year,MONTH(date) month, COUNT(0) count FROM my_table x GROUP BY id,year,month
) a
JOIN
(
SELECT x.id,YEAR(date) year,MONTH(date) month, COUNT(0) count FROM my_table x GROUP BY id,year,month
) b
ON b.id = a.id AND (b.year < a.year OR (b.year = a.year AND b.month <= a.month)
)
GROUP
BY a.id, a.year,a.month;
+----+------+-------+-------+------------+
| id | year | month | count | cumulative |
+----+------+-------+-------+------------+
| 1 | 2012 | 1 | 2 | 2 |
| 1 | 2012 | 2 | 1 | 3 |
| 2 | 2012 | 1 | 2 | 2 |
| 3 | 2013 | 1 | 1 | 1 |
| 3 | 2013 | 3 | 1 | 2 |
| 3 | 2014 | 1 | 1 | 3 |
+----+------+-------+-------+------------+
If you don't mind an extra column in the result, you can simplify (and accelerate) the above, as follows:
SELECT x.*
, @running:= IF(@previous=x.id,@running,0)+x.count cumulative
, @previous:=x.id
FROM
( SELECT x.id,YEAR(date) year,MONTH(date) month, COUNT(0) count FROM my_table x GROUP BY id,year,month ) x
,( SELECT @running:=0, @previous:=NULL) vals;
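On MySQL 8.0+, the per-group reset that @previous emulates is exactly what PARTITION BY does in a window function. A minimal sketch in SQLite 3.25+ through Python's sqlite3 with the question's rows; the column name event_date (used to avoid a column called date) and the helper per_group_running are assumptions of this sketch, not part of the answer.

```python
import sqlite3

# The question's rows as (HID, event_date, UID).
ROWS = [
    (1, "2012-01-01", 1002), (1, "2012-01-24", 2005), (1, "2012-02-15", 5152),
    (2, "2012-01-01", 6252), (2, "2012-01-19", 10356),
    (3, "2013-01-06", 10989), (3, "2013-03-25", 25001), (3, "2014-01-14", 35798),
]

# PARTITION BY restarts the running sum for each HID.
PER_GROUP_RUNNING = """
    SELECT hid, yr, mth, cnt,
           SUM(cnt) OVER (PARTITION BY hid ORDER BY yr, mth) AS cumulative_sum
      FROM (SELECT hid,
                   CAST(strftime('%Y', event_date) AS INTEGER) AS yr,
                   CAST(strftime('%m', event_date) AS INTEGER) AS mth,
                   COUNT(uid) AS cnt
              FROM my_table
             GROUP BY hid, yr, mth) AS x
     ORDER BY hid, yr, mth
"""


def per_group_running(rows):
    """Return (hid, yr, mth, count, cumulative_sum) tuples."""
    if sqlite3.sqlite_version_info < (3, 25, 0):
        raise RuntimeError("window functions need SQLite 3.25 or later")
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE my_table (hid INTEGER, event_date TEXT, uid INTEGER)")
        conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
        return conn.execute(PER_GROUP_RUNNING).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in per_group_running(ROWS):
        print(row)
```

The output matches the self-join table above, without the extra @previous column.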
The code turns out somewhat messy (it uses SQLite's strftime()), and it reads as follows:
SELECT
HID,
strftime('%Y', `Date`) AS Year,
strftime('%m', `Date`) AS Month,
COUNT(UID) AS Count,
(SELECT
COUNT(UID)
FROM your_db A
WHERE
A.HID=B.HID
AND
(strftime('%Y', A.`Date`) < strftime('%Y', B.`Date`)
OR
(strftime('%Y', A.`Date`) = strftime('%Y', B.`Date`)
AND
strftime('%m', A.`Date`) <= strftime('%m', B.`Date`)))) AS cumulative_count
FROM your_db B
GROUP BY HID, YEAR, MONTH
Using a view makes it much clearer:
CREATE VIEW temp_data AS SELECT
HID,
strftime('%Y', `Date`) as Year,
strftime('%m', `Date`) as Month,
COUNT(UID) as Count
FROM your_db GROUP BY HID, YEAR, MONTH;
Then your statement will read as follows:
SELECT
HID,
Year,
Month,
`Count`,
(SELECT SUM(`Count`)
FROM temp_data A
WHERE
A.HID = B.HID
AND
(A.Year < B.Year
OR
(A.Year = B.Year
AND
A.Month <= B.Month))) AS cumulative_sum
FROM temp_data B;
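Because strftime('%Y-%m', ...) yields zero-padded text such as '2012-01', comparing that single string already orders months correctly, so the year/month OR logic can collapse to one <=. A minimal sketch in SQLite through Python's sqlite3 using the question's rows; the view is written as a CTE here, and the helper name cumulative_per_hid is hypothetical.

```python
import sqlite3

# The question's rows as (ID, HID, Date, UID).
ROWS = [
    (1, 1, "2012-01-01", 1002), (2, 1, "2012-01-24", 2005), (3, 1, "2012-02-15", 5152),
    (4, 2, "2012-01-01", 6252), (5, 2, "2012-01-19", 10356),
    (6, 3, "2013-01-06", 10989), (7, 3, "2013-03-25", 25001), (8, 3, "2014-01-14", 35798),
]

# One 'YYYY-MM' key replaces the separate Year/Month comparisons.
CUMULATIVE = """
    WITH temp_data AS (
        SELECT HID, strftime('%Y-%m', Date) AS ym, COUNT(UID) AS cnt
          FROM your_db
         GROUP BY HID, ym
    )
    SELECT B.HID, B.ym, B.cnt,
           (SELECT SUM(A.cnt)
              FROM temp_data AS A
             WHERE A.HID = B.HID AND A.ym <= B.ym) AS cumulative_sum
      FROM temp_data AS B
     ORDER BY B.HID, B.ym
"""


def cumulative_per_hid(rows):
    """Return (HID, 'YYYY-MM', count, cumulative_sum) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE your_db (ID INTEGER, HID INTEGER, Date TEXT, UID INTEGER)")
        conn.executemany("INSERT INTO your_db VALUES (?, ?, ?, ?)", rows)
        return conn.execute(CUMULATIVE).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in cumulative_per_hid(ROWS):
        print(row)
```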
I have a history table ('property_histories') that logs events in our property management system. These events can be used to determine whether a given property was available to rent and I am trying to build a (weekly) summary of 'live' properties.
The 4 events in question are 'published', 'unpublished', 'hidden_from_search' and 'unhidden_from_search'.
For a property to be live it must have been:
Published.
If it has ever been unpublished, a subsequent published event must be the most recent.
If it has ever been hidden_from_search, a subsequent 'unhidden_from_search' event must have taken place more recently.
Most properties will have a simple history, most likely a single 'published' event, but some are more complicated. An example is here:
property_histories
----------------------------
id | property_id | City | status | date
1 | 325407 | Paris | published | 2014-01-01
2 | 325407 | Paris | hidden_from_search | 2014-01-24
3 | 325407 | Paris | unhidden_from_search | 2014-02-05
4 | 325407 | Paris | unpublished | 2014-02-15
5 | 410008 | London | published | 2014-01-01
6 | 410008 | London | unpublished | 2014-01-10
7 | 410008 | London | published | 2014-01-18
My aim is to be able to count 'live' properties by week:
weekly_count
----------------------------
Year | Week | City | Live_Count
2014 | 1 | Paris | 0
2014 | 1 | London | 0
2014 | 2 | Paris | 1
2014 | 2 | London | 1
2014 | 3 | Paris | 1
2014 | 3 | London | 0
2014 | 4 | Paris | 1
2014 | 4 | London | 1
2014 | 5 | Paris | 0
2014 | 5 | London | 1
2014 | 6 | Paris | 0
2014 | 6 | London | 1
2014 | 7 | Paris | 1
2014 | 7 | London | 0
2014 | 8 | Paris | 0
2014 | 8 | London | 1
2014 | 9 | Paris | 0
2014 | 9 | London | 1
----------------------------
Help appreciated!!
Your own test results don't match what you're asking for. You state the live count is by week, which means London should be live in week #1 as it was published in week #1 and then unpublished in week #2.
Assuming weeks start on Sunday (the SQL Server default for US English), this will work. Just put in your own date range and replace my numbers table with yours.
If you need Monday to be your start date, use this at the top of your query:
SET DATEFIRST 1
Emulating your test:
-- Create dummy data
CREATE TABLE #property_histories
(
id int, property_id int, City varchar(50), status varchar(50), date date
)
INSERT INTO #property_histories
SELECT 1 , 325407 , 'Paris' , 'published' , '2014-01-01' UNION ALL
SELECT 2 , 325407 , 'Paris' , 'hidden_from_search' , '2014-01-24' UNION ALL
SELECT 3 , 325407 , 'Paris' , 'unhidden_from_search' , '2014-02-05' UNION ALL
SELECT 4 , 325407 , 'Paris' , 'unpublished' , '2014-02-15' UNION ALL
SELECT 5 , 410008 , 'London' , 'published' , '2014-01-01' UNION ALL
SELECT 6 , 410008 , 'London' , 'unpublished' , '2014-01-10' UNION ALL
SELECT 7 , 410008 , 'London' , 'published' , '2014-01-18'
Now the code:
-- TODO: Set your date range
DECLARE @SD Datetime = '2014-01-01'
DECLARE @ED Datetime = '2014-12-31'
DECLARE @Wks INT = Datediff(week,@SD,@ED) -- Don't change this
-- Generate dates table
SELECT NumberID as 'Week',
DATEADD(DAY, 1-DATEPART(WEEKDAY, DateAdd(week,NumberID-1,@SD)), DateAdd(week,NumberID-1,@SD)) as 'WeekStart',
DATEADD(DAY, 7-DATEPART(WEEKDAY, DateAdd(week,NumberID-1,@SD)), DateAdd(week,NumberID-1,@SD)) as 'WeekEnd'
INTO #Dates
FROM Generic.tblNumbers -- TODO: use your own Numbers table here
WHERE NumberID BETWEEN 1 AND @Wks
-- Now generate report
SELECT T.Year, T.Week, T.City,
SUM(CASE WHEN PH1.status = 'published' THEN 1
WHEN PH1.status = 'unhidden_from_search' THEN 1
ELSE 0 END) as 'Live_Count'
FROM #Dates D1
LEFT JOIN
-- Get latest date per week
(SELECT YEAR(D.WeekStart) as 'Year',
D.Week,
PH.City,
PH.property_ID,
MAX(PH.date) as MaxDate
FROM #Dates D
LEFT JOIN #property_histories PH
ON PH.date BETWEEN @SD AND D.WeekEnd
GROUP BY D.WeekStart, D.Week, D.WeekEnd, PH.City, PH.property_id
) T
ON T.Week = D1.Week
LEFT JOIN #property_histories PH1
ON PH1.City = T.City AND PH1.property_id = T.property_id AND PH1.date = T.MaxDate
GROUP BY T.Year, T.Week, T.City
To break down the logic: Firstly I'm creating a helper table with week number, week start and week end dates. Week start is largely redundant but might come in handy for reporting.
I then use a subquery to get the latest date relevant to each week / city / property. For this "max" date, city and property I get the status, and if it's live, I sum it. In layman's terms: get the latest status per city, per property, per week, and SUM(if live).
Unlike the other answers posted, this solution caters for gaps in data. If the latest status recorded for a city and property was actually all the way back to week 1, it still works in any subsequent week.
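The "latest status per property per week" logic can also be sketched without a numbers table by generating week-end dates with a recursive CTE. A minimal sketch in SQLite through Python's sqlite3 with the question's seven rows; it assumes Sunday-start weeks with week 1 ending Saturday 2014-01-04 (2014-01-01 is a Wednesday) and, like the answer, treats a latest status of 'published' or 'unhidden_from_search' as live. A city only appears once it has at least one event on or before a week's end, and the helper name weekly_live_counts is hypothetical.

```python
import sqlite3

# The question's property_histories rows.
HISTORY = [
    (1, 325407, "Paris", "published", "2014-01-01"),
    (2, 325407, "Paris", "hidden_from_search", "2014-01-24"),
    (3, 325407, "Paris", "unhidden_from_search", "2014-02-05"),
    (4, 325407, "Paris", "unpublished", "2014-02-15"),
    (5, 410008, "London", "published", "2014-01-01"),
    (6, 410008, "London", "unpublished", "2014-01-10"),
    (7, 410008, "London", "published", "2014-01-18"),
]

WEEKLY_LIVE = """
WITH RECURSIVE weeks(week, week_end) AS (
    SELECT 1, '2014-01-04'            -- assumed end (Saturday) of week 1
    UNION ALL
    SELECT week + 1, date(week_end, '+7 days') FROM weeks WHERE week < ?
),
latest AS (
    -- Most recent event on or before each week's end, per property.
    SELECT w.week, p.property_id, MAX(p.date) AS max_date
      FROM weeks AS w
      JOIN property_histories AS p ON p.date <= w.week_end
     GROUP BY w.week, p.property_id
)
SELECT l.week, h.city,
       SUM(CASE WHEN h.status IN ('published', 'unhidden_from_search')
                THEN 1 ELSE 0 END) AS live_count
  FROM latest AS l
  JOIN property_histories AS h
    ON h.property_id = l.property_id AND h.date = l.max_date
 GROUP BY l.week, h.city
 ORDER BY l.week, h.city
"""


def weekly_live_counts(history, weeks):
    """Return (week, city, live_count) tuples for weeks 1..weeks."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE property_histories "
                     "(id INTEGER, property_id INTEGER, city TEXT, status TEXT, date TEXT)")
        conn.executemany("INSERT INTO property_histories VALUES (?, ?, ?, ?, ?)", history)
        return conn.execute(WEEKLY_LIVE, (weeks,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in weekly_live_counts(HISTORY, 8):
        print(row)
```

With these assumptions London is live in week 1 and not in week 2, consistent with the observation at the top of this answer.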
I have a feeling I have missed a simpler way to do this.
However, the following query uses 2 subqueries. The first gets all the published / unpublished ranges for a property (i.e., the smallest unpublished date following a published date), while the second does the same for properties being hidden from search.
These are then joined to properties on the property id, where the current date is within the range returned by the subqueries. The WHERE clause then checks that a record is matched for published and not found for the hidden subquery.
Had to use DISTINCT as otherwise the multiple published dates for a single unpublish would trigger duplicate property rows being returned.
SELECT DISTINCT properties.*
FROM properties
INNER JOIN
(
SELECT a.property_id, a.created_at AS start_date, IFNULL(MIN(b.created_at), NOW()) AS end_date
FROM property_histories a
LEFT OUTER JOIN property_histories b
ON a.property_id = b.property_id
AND a.created_at < b.created_at
AND b.status = 'unpublished' -- in ON, so a range with no unpublish yet survives the LEFT JOIN
WHERE a.status = 'published'
GROUP BY a.property_id, a.created_at
) published
ON properties.property_id = published.property_id
AND NOW() BETWEEN published.start_date AND published.end_date
LEFT OUTER JOIN
(
SELECT a.property_id, a.created_at AS start_date, IFNULL(MIN(b.created_at), NOW()) AS end_date
FROM property_histories a
LEFT OUTER JOIN property_histories b
ON a.property_id = b.property_id
AND a.created_at < b.created_at
AND b.status = 'unhidden_from_search' -- a still-hidden property gets NOW() as its end date
WHERE a.status = 'hidden_from_search'
GROUP BY a.property_id, a.created_at
) hidden
ON properties.property_id = hidden.property_id
AND NOW() BETWEEN hidden.start_date AND hidden.end_date
WHERE published.property_id IS NOT NULL
AND hidden.property_id IS NULL
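Where the b.status filter sits matters in this kind of range query: in a WHERE clause it rejects the NULL row a LEFT JOIN produces when no closing event exists, silently turning the outer join into an inner one. A minimal sketch comparing both placements in SQLite through Python's sqlite3 with the question's rows; the sketch uses the question's date column instead of created_at, and COALESCE(..., '9999-12-31') as a hypothetical open-ended sentinel in place of NOW().

```python
import sqlite3

# The question's property_histories rows.
HISTORY = [
    (1, 325407, "Paris", "published", "2014-01-01"),
    (2, 325407, "Paris", "hidden_from_search", "2014-01-24"),
    (3, 325407, "Paris", "unhidden_from_search", "2014-02-05"),
    (4, 325407, "Paris", "unpublished", "2014-02-15"),
    (5, 410008, "London", "published", "2014-01-01"),
    (6, 410008, "London", "unpublished", "2014-01-10"),
    (7, 410008, "London", "published", "2014-01-18"),
]

# Filter on b inside ON: unmatched published rows survive with a NULL b.
FILTER_IN_ON = """
    SELECT a.property_id, a.date AS start_date,
           COALESCE(MIN(b.date), '9999-12-31') AS end_date
      FROM property_histories AS a
      LEFT JOIN property_histories AS b
        ON a.property_id = b.property_id
       AND a.date < b.date
       AND b.status = 'unpublished'
     WHERE a.status = 'published'
     GROUP BY a.property_id, a.date
     ORDER BY a.property_id, a.date
"""

# Filter on b in WHERE: NULL b.status fails the test, dropping open ranges.
FILTER_IN_WHERE = """
    SELECT a.property_id, a.date AS start_date,
           COALESCE(MIN(b.date), '9999-12-31') AS end_date
      FROM property_histories AS a
      LEFT JOIN property_histories AS b
        ON a.property_id = b.property_id
       AND a.date < b.date
     WHERE a.status = 'published'
       AND b.status = 'unpublished'
     GROUP BY a.property_id, a.date
     ORDER BY a.property_id, a.date
"""


def published_ranges(query, history=HISTORY):
    """Return (property_id, start_date, end_date) tuples for the given query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE property_histories "
                     "(id INTEGER, property_id INTEGER, city TEXT, status TEXT, date TEXT)")
        conn.executemany("INSERT INTO property_histories VALUES (?, ?, ?, ?, ?)", history)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(published_ranges(FILTER_IN_ON))
    print(published_ranges(FILTER_IN_WHERE))
```

Only the ON-clause version keeps London's still-open range that starts on 2014-01-18.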
I used a numbers table as a handy shortcut. Essentially, your question revolves around wanting a running sum of published or unhidden events versus unpublished or hidden events. At that point the property IDs become moot in the view (provided their uniqueness is properly constrained elsewhere), and all we need is a custom sum. I have the example on SQLFiddle. Here's the query:
select years.n + 2013 as year, weeks.n as week
, c.City
,
(select
sum(case
when status in ('published','unhidden_from_search') then 1
when status in ('unpublished','hidden_from_search') then -1
else 0
end)
from property_histories p2
where weekofyear(p2.date) <= weeks.n
and p2.city=c.city
) AS Live_Count
from numbers weeks
inner join numbers years on weeks.n <= 52
cross join (select City from property_histories group by city) c
where years.n + 2013 <= (select max(year(date)) from property_histories)
group by years.n + 2013, weeks.n
, c.City
;
I'm doing an inner join where I select rows in a date range (say, BETWEEN '2011-01-01' AND '2011-02-01') and group by an enumerated value. Is there a way to do this for a range of months, with each month as a column? Currently I do this by hand for each month.
Example:
vehicle_type | January | February | March
----------------------------------------------
sedan | 12 | 10 | 4
coupe | 5 | 7 | 23
truck | 0 | 0 | 9
electric | 22 | 10 | 13
hybrid | 0 | 12 | 0
You could create a calendar table...
CREATE TABLE calendar
(
description VARCHAR2(100 BYTE),
when_start DATE,
when_end DATE
)
then use a pivot query, for example:
SELECT
vehicle_type,
SUM(jan),SUM(feb),
--add the other months here
SUM(nov),SUM(dece)
FROM
(
SELECT v.vehicle_type,
CASE WHEN c.description='Jan' THEN
count(*)
END AS jan,
case when c.description='Feb' THEN
count(*)
END AS feb,
-- Add the rest of the months here too
CASE WHEN c.description='Nov' THEN
COUNT(*)
END AS nov,
CASE WHEN c.description='Dec' THEN
COUNT(*)
END AS dece
FROM calendar c
INNER JOIN vehicles v ON v.when >= c.when_start AND v.when <= c.when_end
GROUP BY v.vehicle_type, c.description
)
GROUP BY vehicle_type
ORDER BY vehicle_type
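The same pivot can also be written as conditional aggregation directly on the date, without a calendar table: one SUM(CASE ...) per month column. A minimal sketch in SQLite through Python's sqlite3; the vehicles table, its event_date column, and the rows are hypothetical, and strftime('%m', ...) stands in for your database's month function (for example EXTRACT(MONTH FROM ...) in Oracle).

```python
import sqlite3

# Hypothetical rows as (vehicle_type, event_date); names are illustrative only.
VEHICLES = [
    ("sedan", "2011-01-05"), ("sedan", "2011-01-20"), ("sedan", "2011-02-03"),
    ("coupe", "2011-02-14"), ("truck", "2011-03-09"), ("coupe", "2011-03-30"),
]

# One SUM(CASE ...) column per month; the date range is bound as parameters.
PIVOT = """
    SELECT vehicle_type,
           SUM(CASE WHEN strftime('%m', event_date) = '01' THEN 1 ELSE 0 END) AS january,
           SUM(CASE WHEN strftime('%m', event_date) = '02' THEN 1 ELSE 0 END) AS february,
           SUM(CASE WHEN strftime('%m', event_date) = '03' THEN 1 ELSE 0 END) AS march
      FROM vehicles
     WHERE event_date >= ? AND event_date < ?
     GROUP BY vehicle_type
     ORDER BY vehicle_type
"""


def monthly_pivot(rows, start, end):
    """Return (vehicle_type, january, february, march) counts for [start, end)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE vehicles (vehicle_type TEXT, event_date TEXT)")
        conn.executemany("INSERT INTO vehicles VALUES (?, ?)", rows)
        return conn.execute(PIVOT, (start, end)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in monthly_pivot(VEHICLES, "2011-01-01", "2011-04-01"):
        print(row)
```

Because the columns are fixed in the SQL text, adding a month still means adding a column; fully dynamic month columns need the query to be generated by application code.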