SQL sum multiple values by date into new values by date - mysql

Hi, I have reports that are issued with some regularity, but each also has an initial accounts value stored in a row where the report date is null.
I would like to create a new column, Accts_N, for each School, Year, and ReportDate, equal to the sum of that date's Accts value and the Accts value from the row where the date is null.
So, using the sample table below, School A Year 2017 would have an Accts_N value of 8 for 2016-01-10 and a value of 12 for 2016-02-10.
School | Year | Accts | ReportDate
-------|------|-------|-----------
A | 2017 | 2 | null
A | 2017 | 6 | 2016-01-10
A | 2017 | 10 | 2016-02-10
A | 2018 | 0 | 2016-01-10
A | 2018 | 4 | 2016-02-10
B | 2017 | 9 | null
B | 2018 | 3 | 2016-02-10
I've tried a few different instances of SUM CASE WHEN but I don't think that's the right approach. Can someone suggest a direction for me?
Thank you

If you want to add a new column, then a correlated subquery comes to mind:
select r.*,
(select sum(r2.accts)
from reports r2
where r2.school = r.school and
r2.year = r.year and
(r2.reportdate = r.reportdate or r2.reportdate is null)
) as accts_n
from reports r;
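A quick way to sanity-check the correlated subquery against the sample table is to run it in an in-memory SQLite database from Python. This is only a sketch: SQLite stands in for MySQL, the table name reports comes from the query above, and the WHERE r.reportdate IS NOT NULL filter is my own addition to hide the null-date rows.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the query itself is plain SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reports (school TEXT, year INTEGER, accts INTEGER, reportdate TEXT)"
)
conn.executemany(
    "INSERT INTO reports VALUES (?, ?, ?, ?)",
    [
        ("A", 2017, 2, None),
        ("A", 2017, 6, "2016-01-10"),
        ("A", 2017, 10, "2016-02-10"),
        ("A", 2018, 0, "2016-01-10"),
        ("A", 2018, 4, "2016-02-10"),
        ("B", 2017, 9, None),
        ("B", 2018, 3, "2016-02-10"),
    ],
)

# For every dated row, add the accts of the null-date row of the same school/year.
accts_n = conn.execute(
    """
    SELECT r.school, r.year, r.reportdate,
           (SELECT SUM(r2.accts)
              FROM reports r2
             WHERE r2.school = r.school
               AND r2.year = r.year
               AND (r2.reportdate = r.reportdate OR r2.reportdate IS NULL)
           ) AS accts_n
      FROM reports r
     WHERE r.reportdate IS NOT NULL   -- assumption: null-date rows are not wanted in the output
     ORDER BY r.school, r.year, r.reportdate
    """
).fetchall()

for row in accts_n:
    print(row)
```

School A, Year 2017 comes out as 8 and 12, matching the values asked for in the question.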

How about this? (Note that this is SQL Server syntax: a table variable, OUTER APPLY and ISNULL.)
declare @t table(School nvarchar(50), Year datetime, Accts int, ReportDate datetime)
insert into @t
values
('A','2017',2,null),
('A','2017',6,'2016-01-10'),
('A','2017',10,'2016-02-10'),
('A','2018',0,'2016-01-10'),
('A','2018',4,'2016-02-10'),
('B','2017',9,null),
('B','2018',3,'2016-02-10')
-- add the null-date total (0 if there is none) to each dated row
select t.School, t.Year, t.ReportDate, t.Accts + ISNULL(tNulls.SumAcctsWhenNull,0) AS Accts_N
from @t t
outer apply (select t2.School, t2.Year, SUM(Accts) AS SumAcctsWhenNull
from @t t2
where
t2.ReportDate IS NULL AND
t2.School = t.School AND
t2.Year = t.Year
group by t2.School, t2.Year) tNulls
where
t.ReportDate IS NOT NULL

Related

MySQL select count only new id's for each year

I have a MySQL table that looks like this
id | client_id | date
--------------------------------------
1 | 12 | 02/02/2008
2 | 15 | 12/06/2008
3 | 23 | 11/12/2008
4 | 12 | 18/01/2009
5 | 12 | 03/03/2009
6 | 18 | 02/07/2009
7 | 23 | 08/09/2010
8 | 18 | 02/10/2010
9 | 21 | 30/11/2010
What I am trying to do is get the number of new clients for each year. 2008 has 3 new clients(12,15,23), 2009 has 1 new client(18) and 2010 has 1 new client(21).
So far I have this query that gives me the distinct clients for each year, that is 3 for 2008, 2 for 2009 and 3 for 2010.
SELECT COUNT(DISTINCT client_id) FROM table GROUP BY YEAR(date)
Any help would be appreciated.
You could use a subquery to get the first year of every client_id grouped by client_id, and then count the occurrence of client_id grouped by year, so:
SELECT COUNT(client_id), YEAR_MIN FROM (
SELECT client_id, MIN(YEAR(date)) AS YEAR_MIN
FROM table
GROUP BY client_id) AS T
GROUP BY YEAR_MIN
SQL Fiddle here
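To see the "first year per client, then count per year" idea run end to end, here is a small sketch using SQLite from Python. Assumptions on my side: the table is called visits (the question only says "table"), the dates are stored as ISO strings, and since SQLite has no YEAR() a tiny stand-in function is registered so the SQL reads like the MySQL version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no YEAR(); this stand-in takes the first four characters of an ISO date.
conn.create_function("YEAR", 1, lambda d: int(d[:4]))

conn.execute("CREATE TABLE visits (id INTEGER PRIMARY KEY, client_id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        (1, 12, "2008-02-02"),
        (2, 15, "2008-06-12"),
        (3, 23, "2008-12-11"),
        (4, 12, "2009-01-18"),
        (5, 12, "2009-03-03"),
        (6, 18, "2009-07-02"),
        (7, 23, "2010-09-08"),
        (8, 18, "2010-10-02"),
        (9, 21, "2010-11-30"),
    ],
)

# Inner query: first year each client appears. Outer query: clients per first year.
new_clients = conn.execute(
    """
    SELECT year_min, COUNT(client_id) AS new_clients
      FROM (SELECT client_id, MIN(YEAR(date)) AS year_min
              FROM visits
             GROUP BY client_id) AS t
     GROUP BY year_min
     ORDER BY year_min
    """
).fetchall()

print(new_clients)  # [(2008, 3), (2009, 1), (2010, 1)]
```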
So you want to count the first date a client appears in the table. In other words, the row for which no other row exists with an earlier date and the same client. You can do this with an exclusion join.
Then you can count them per year as you're doing now.
SELECT YEAR(t.date) AS yr, COUNT(t.client_id) AS client_count
FROM (
SELECT t1.client_id, t1.date
FROM mytable AS t1
LEFT JOIN mytable AS t2 ON (t1.client_id=t2.client_id AND t1.date > t2.date)
WHERE t2.client_id IS NULL) AS t
GROUP BY yr
You should store dates using the DATE data type, which uses YYYY-MM-DD format. You won't be able to do > comparisons if your dates are stored as strings in DD-MM-YYYY format.
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id SERIAL PRIMARY KEY
,client_id INT NOT NULL
,date INT NOT NULL
);
INSERT INTO my_table VALUES
(1,12,2008),
(2,15,2008),
(3,23,2008),
(4,12,2009),
(5,12,2009),
(6,18,2009),
(7,23,2010),
(8,18,2010),
(9,21,2010);
SELECT year
, COUNT(*) total
FROM
( SELECT client_id, MIN(date) year FROM my_table GROUP BY client_id ) x
GROUP
BY year;
+------+-------+
| year | total |
+------+-------+
| 2008 | 3 |
| 2009 | 1 |
| 2010 | 1 |
+------+-------+

Select all years from 2 date columns

What I'm trying to do is very simple, but I'm not sure if it's possible and, if it is, how to do it.
I have a MySQL database with a table that has trackId, StartDate and EndDate, and I'm trying to get all the distinct years from both columns in one result set.
So far I have this:
SELECT DISTINCT YEAR(StartDate) as year, YEAR(EndDate) as year
from TRACK
and my result is :
| year | year |
|------|------|
| 2016 | 2017 |
| 2017 | 2018 |
And what I'm trying to get is:
| year |
|------|
| 2016 |
| 2017 |
| 2018 |
Is it possible?
Use UNION:
SELECT * FROM (
SELECT YEAR(StartDate) AS year FROM TRACK
UNION
SELECT YEAR(EndDate) AS year FROM TRACK
) AS years
ORDER BY year
The outer SELECT is only there for the ordering. Note that MySQL requires an alias on the derived table.
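For anyone who wants to try the UNION approach without a MySQL server, here is a minimal sketch in Python with SQLite. The sample rows are made up to match the two rows shown in the question, and YEAR() is a registered stand-in because SQLite does not ship that function. In this sketch the ORDER BY is placed after the last SELECT, where it applies to the whole UNION.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for MySQL's YEAR(): first four characters of an ISO date string.
conn.create_function("YEAR", 1, lambda d: int(d[:4]))

conn.execute("CREATE TABLE track (trackId INTEGER, StartDate TEXT, EndDate TEXT)")
conn.executemany(
    "INSERT INTO track VALUES (?, ?, ?)",
    [
        (1, "2016-05-01", "2017-03-15"),  # hypothetical sample rows
        (2, "2017-06-01", "2018-01-20"),
    ],
)

# UNION (without ALL) removes duplicate years across both columns.
years = [
    row[0]
    for row in conn.execute(
        """
        SELECT YEAR(StartDate) AS year FROM track
        UNION
        SELECT YEAR(EndDate) AS year FROM track
        ORDER BY year
        """
    )
]

print(years)  # [2016, 2017, 2018]
```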
You can use this (SQL Server syntax; UNION already removes duplicates, so the DISTINCT is optional):
SELECT DISTINCT [Year] From
(SELECT StartYear AS [Year] FROM #Test
UNION
SELECT EndYear AS [Year] FROM #Test) A

Get the count() where created_date is cumulative and date based

I'm aware that there are several answers on SO about cumulative totals. I have experimented and have not found a solution to my problem.
Here is a sqlfiddle.
We have a contacts table with two fields, eid and create_time:
eid create_time
991772 April, 21 2016 11:34:21
989628 April, 17 2016 02:19:57
985557 April, 04 2016 09:56:39
981920 March, 30 2016 11:03:12
981111 March, 30 2016 09:36:48
I would like to select the number of new contacts in each month along with the size of our contacts database at the end of each month. New contacts by year and month is simple enough. For the size of the contacts table at the end of each month I did some research and found what looked to be a straightforward method:
set @csum = 0;
select
year(c.create_time) as yr,
month(c.create_time) as mth,
count(c.eid) as new_contacts,
(@csum + count(c.eid)) as cumulative_contacts
from
contacts c
group by
yr,
mth
That runs but gives me unexpected results.
If I run:
select count(*) from contacts where date(create_time) < current_date
I get the total number of records in the table, 146.
I therefore expected the final row in my query using @csum to have 146 for April 2016. It has only 3?
What my goal is for field cumulative_contacts:
For the record with e.g. January 2016.
select count(*) from contacts where date(create_time) < '2016-02-01';
And the record for February would have:
select count(*) from contacts where date(create_time) < '2016-03-01';
And so on
Try this, a bit of modification from your sql;)
CREATE TABLE IF NOT EXISTS `contacts` (
`eid` char(50) DEFAULT NULL,
`create_time` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
INSERT INTO `contacts` (`eid`, `create_time`) VALUES
('991772', '2016-04-21 11:34:21'),
('989628', '2016-04-17 02:19:57'),
('985557', '2016-04-04 09:56:39'),
('981920', '2016-03-30 11:03:12'),
('981111', '2016-03-30 09:36:48');
SET @csum = 0;
SELECT t.*, @csum:=(@csum + new_contacts) AS cumulative_contacts
FROM (
SELECT YEAR(c.create_time) AS yr, MONTH(c.create_time) AS mth, COUNT(c.eid) AS new_contacts
FROM contacts c
GROUP BY yr, mth) t
The output is:
| yr | mth | new_contacts | cumulative_contacts |
|------|-----|--------------|---------------------|
| 2016 | 3 | 2 | 2 |
| 2016 | 4 | 3 | 5 |
This SQL gets the cumulative sum and is pretty efficient. It numbers each row first (in create_time order) and then uses the number of each month's last row as the cumulative sum. It relies on eid increasing with create_time, so that MAX(eid) is the last contact of the month.
SELECT s1.yr, s1.mth, s1.new_contacts, s2.cumulative_contacts
FROM
(SELECT
YEAR(create_time) AS yr,
MONTH(create_time) AS mth,
COUNT(eid) AS new_contacts,
MAX(eid) AS max_eid
FROM
contacts
GROUP BY
yr,
mth
ORDER BY yr, mth) s1 INNER JOIN
(SELECT eid, (@sum:=@sum+1) AS cumulative_contacts
FROM
contacts INNER JOIN
(SELECT @sum := 0) r
ORDER BY create_time) s2 ON max_eid=s2.eid;
--Result sample--
| yr | mth | new_contacts | cumulative_contacts |
|------|-----|--------------|---------------------|
| 2016 | 1 | 4 | 132 |
| 2016 | 2 | 4 | 136 |
| 2016 | 3 | 7 | 143 |
| 2016 | 4 | 3 | 146 |
Try this (fiddle).
Here you have a "greater than or equal" join, so each group "contains" all previous values. The "times 12" part converts the whole comparison to months. I offer this solution because it is not MySQL-dependent (it can be implemented on many other databases with minimal or no changes).
select dates.yr, dates.mth, dates.new_contacts, sum(NC.new_contacts) as cumulative_new_contacts
from (
select
year(c.create_time) as yr,
month(c.create_time) as mth,
count(c.eid) as new_contacts
from
contacts c
group by
year(c.create_time),
month(c.create_time)
) as dates
left join
(
select
year(c.create_time) as yr,
month(c.create_time) as mth,
count(c.eid) as new_contacts
from
contacts c
group by
year(c.create_time),
month(c.create_time)
) as NC
on dates.yr*12+dates.mth >= NC.yr*12+NC.mth
group by
dates.yr,
dates.mth,
dates.new_contacts -- not needed by MySql, present here for other DBs compatibility
order by 1,2
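Since the self-join version is meant to be portable, it can be tried on SQLite from Python as a rough check. This sketch uses the five sample rows from the question; YEAR() and MONTH() are registered stand-ins (SQLite lacks them), and MONTHLY is just a helper string of mine that holds the per-month subquery so it is not typed twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-ins for MySQL's YEAR()/MONTH(), slicing an ISO timestamp string.
conn.create_function("YEAR", 1, lambda ts: int(ts[0:4]))
conn.create_function("MONTH", 1, lambda ts: int(ts[5:7]))

conn.execute("CREATE TABLE contacts (eid TEXT, create_time TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [
        ("991772", "2016-04-21 11:34:21"),
        ("989628", "2016-04-17 02:19:57"),
        ("985557", "2016-04-04 09:56:39"),
        ("981920", "2016-03-30 11:03:12"),
        ("981111", "2016-03-30 09:36:48"),
    ],
)

# New contacts per (year, month); used on both sides of the join.
MONTHLY = """
    SELECT YEAR(c.create_time) AS yr,
           MONTH(c.create_time) AS mth,
           COUNT(c.eid) AS new_contacts
      FROM contacts c
     GROUP BY YEAR(c.create_time), MONTH(c.create_time)
"""

# Each month is joined to itself and every earlier month, then summed.
running = conn.execute(
    f"""
    SELECT dates.yr, dates.mth, dates.new_contacts,
           SUM(nc.new_contacts) AS cumulative_contacts
      FROM ({MONTHLY}) AS dates
      LEFT JOIN ({MONTHLY}) AS nc
        ON dates.yr * 12 + dates.mth >= nc.yr * 12 + nc.mth
     GROUP BY dates.yr, dates.mth, dates.new_contacts
     ORDER BY 1, 2
    """
).fetchall()

print(running)  # [(2016, 3, 2, 2), (2016, 4, 3, 5)]
```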

MySQL SELECT Query to Turn History into Weekly Summary Over Time

I have a history table ('property_histories') that logs events in our property management system. These events can be used to determine whether a given property was available to rent and I am trying to build a (weekly) summary of 'live' properties.
The 4 events in question are 'published', 'unpublished', 'hidden_from_search' and 'unhidden_from_search'.
For a property to be live it must have been:
Published.
If it has ever been unpublished, a subsequent published event must be the most recent.
If it has ever been hidden_from_search, a subsequent 'unhidden_from_search' event must have taken place more recently.
Most properties will have a simple history that most likely consists of a single 'published' event, but some are more complicated. An example is here:
property_histories
----------------------------
id | property_id | City | status | date
1 | 325407 | Paris | published | 2014-01-01
2 | 325407 | Paris | hidden_from_search | 2014-01-24
3 | 325407 | Paris | unhidden_from_search | 2014-02-05
4 | 325407 | Paris | unpublished | 2014-02-15
5 | 410008 | London | published | 2014-01-01
6 | 410008 | London | unpublished | 2014-01-10
7 | 410008 | London | published | 2014-01-18
My aim is to be able to count 'live' properties by week:
weekly_count
----------------------------
Year | Week | City | Live_Count
2014 | 1 | Paris | 0
2014 | 1 | London | 0
2014 | 2 | Paris | 1
2014 | 2 | London | 1
2014 | 3 | Paris | 1
2014 | 3 | London | 0
2014 | 4 | Paris | 1
2014 | 4 | London | 1
2014 | 5 | Paris | 0
2014 | 5 | London | 1
2014 | 6 | Paris | 0
2014 | 6 | London | 1
2014 | 7 | Paris | 1
2014 | 7 | London | 0
2014 | 8 | Paris | 0
2014 | 8 | London | 1
2014 | 9 | Paris | 0
2014 | 9 | London | 1
----------------------------
Help appreciated!!
Your own test results don't match what you're asking for. You state the live count is by week, which means London should be live in week #1 as it was published in week #1 and then unpublished in week #2.
Assuming the week starts on a Sunday (the SQL Server default for US English), this will work. The code below is T-SQL, so it needs adapting for MySQL. Just put in your own date range, and replace my numbers table with yours.
If you need Monday to be your start date, use this at the top of your query
SET DATEFIRST 1
Emulating your test:
-- Create dummy data
CREATE TABLE #property_histories
(
id int, property_id int, City varchar(50), status varchar(50), date date
)
INSERT INTO #property_histories
SELECT 1 , 325407 , 'Paris' , 'published' , '2014-01-01' UNION ALL
SELECT 2 , 325407 , 'Paris' , 'hidden_from_search' , '2014-01-24' UNION ALL
SELECT 3 , 325407 , 'Paris' , 'unhidden_from_search' , '2014-02-05' UNION ALL
SELECT 4 , 325407 , 'Paris' , 'unpublished' , '2014-02-15' UNION ALL
SELECT 5 , 410008 , 'London' , 'published' , '2014-01-01' UNION ALL
SELECT 6 , 410008 , 'London' , 'unpublished' , '2014-01-10' UNION ALL
SELECT 7 , 410008 , 'London' , 'published' , '2014-01-18'
Now the code:
-- TODO: Set your date range
DECLARE @SD Datetime = '2014-01-01'
DECLARE @ED Datetime = '2014-12-31'
DECLARE @Wks INT = Datediff(week,@SD,@ED) -- Don't change this
-- Generate dates table
SELECT NumberID as 'Week',
DATEADD(DAY, 1-DATEPART(WEEKDAY, DateAdd(week,NumberID-1,@SD)), DateAdd(week,NumberID-1,@SD)) as 'WeekStart',
DATEADD(DAY, 7-DATEPART(WEEKDAY, DateAdd(week,NumberID-1,@SD)), DateAdd(week,NumberID-1,@SD)) as 'WeekEnd'
INTO #Dates
FROM Generic.tblNumbers -- TODO: use your own Numbers table here
WHERE NumberID BETWEEN 1 AND @Wks
-- Now generate report
SELECT T.Year, T.Week, T.City,
SUM(CASE WHEN PH1.status = 'published' THEN 1
WHEN PH1.status = 'unhidden_from_search' THEN 1
ELSE 0 END) as 'Live_Count'
FROM #Dates D1
LEFT JOIN
-- Get latest date per week
(SELECT YEAR(D.WeekStart) as 'Year',
D.Week,
PH.City,
PH.property_ID,
MAX(PH.date) as MaxDate
FROM #Dates D
LEFT JOIN #property_histories PH
ON PH.date BETWEEN @SD AND D.WeekEnd
GROUP BY D.WeekStart, D.Week, D.WeekStart, D.WeekEnd, PH.City, PH.property_id
) T
ON T.Week = D1.Week
LEFT JOIN #property_histories PH1
ON PH1.City = T.City AND PH1.property_id = T.property_id AND PH1.date = T.MaxDate
GROUP BY T.Year, T.Week, T.City
To break down the logic: Firstly I'm creating a helper table with week number, week start and week end dates. Week start is largely redundant but might come in handy for reporting.
I then subquery to get the latest date relevant for each week / city / property. For this "max" date, city and property I get the status, and if it's live, I sum it. So in layman's terms: get the latest status per city per property per week and SUM(if live).
Unlike the other answers posted, this solution caters for gaps in data. If the latest status recorded for a city and property was actually all the way back to week 1, it still works in any subsequent week.
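The three "live" rules from the question can also be checked directly for a single cutoff date, which may help when validating any of the weekly queries. The sketch below is not the query above: it is a separate, hedged illustration in Python with SQLite, using the question's sample rows. The names LIVE_AS_OF and live_counts are made up for this example. A property counts as live if its latest published/unpublished event up to the cutoff is 'published' and its latest hide/unhide event (if any) is 'unhidden_from_search'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE property_histories "
    "(id INTEGER, property_id INTEGER, city TEXT, status TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO property_histories VALUES (?, ?, ?, ?, ?)",
    [
        (1, 325407, "Paris", "published", "2014-01-01"),
        (2, 325407, "Paris", "hidden_from_search", "2014-01-24"),
        (3, 325407, "Paris", "unhidden_from_search", "2014-02-05"),
        (4, 325407, "Paris", "unpublished", "2014-02-15"),
        (5, 410008, "London", "published", "2014-01-01"),
        (6, 410008, "London", "unpublished", "2014-01-10"),
        (7, 410008, "London", "published", "2014-01-18"),
    ],
)

# Hypothetical helper query: live properties per city as of :cutoff.
LIVE_AS_OF = """
SELECT p.city, COUNT(*) AS live_count
  FROM (SELECT DISTINCT property_id, city FROM property_histories) AS p
 WHERE (SELECT h.status FROM property_histories h
         WHERE h.property_id = p.property_id
           AND h.status IN ('published', 'unpublished')
           AND h.date <= :cutoff
         ORDER BY h.date DESC, h.id DESC LIMIT 1) = 'published'
   AND COALESCE(
         (SELECT h.status FROM property_histories h
           WHERE h.property_id = p.property_id
             AND h.status IN ('hidden_from_search', 'unhidden_from_search')
             AND h.date <= :cutoff
           ORDER BY h.date DESC, h.id DESC LIMIT 1),
         'unhidden_from_search') = 'unhidden_from_search'
 GROUP BY p.city
"""


def live_counts(cutoff):
    """Return {city: live_count} as of the given ISO date (cities with 0 are omitted)."""
    return dict(conn.execute(LIVE_AS_OF, {"cutoff": cutoff}).fetchall())


print(live_counts("2014-01-12"))  # {'Paris': 1}   London is unpublished at this point
print(live_counts("2014-01-26"))  # {'London': 1}  Paris is hidden from search
```

Calling live_counts once per week-end date from a calendar or numbers table would give the weekly summary; cities with no live properties simply do not appear, so a LEFT JOIN from the list of cities is needed if zero rows are wanted.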
I have a feeling I have missed a simpler way to do this.
However the following query uses 2 sub queries. The first gets all the published / unpublished ranges for a property (ie, the smallest unpublished date following a published date), while the 2nd does the same for properties being hidden from search.
These are then joined to properties on the property id, where the current date is within the range returned by the sub queries. The WHERE clause then checks that a record is matched for published and not found for the hidden sub queries
Had to use DISTINCT as otherwise the multiple published dates for a single unpublish would trigger duplicate property rows being returned.
SELECT DISTINCT properties.*
FROM properties
INNER JOIN
(
SELECT a.property_id, a.created_at AS start_date, IFNULL(MIN(b.created_at), NOW()) AS end_date
FROM property_histories a
LEFT OUTER JOIN property_histories b
ON a.property_id = b.property_id
AND a.created_at < b.created_at
AND b.status = 'unpublished' -- in the ON clause so never-unpublished properties survive the LEFT JOIN
WHERE a.status = 'published'
GROUP BY a.property_id, a.created_at
) published
ON properties.property_id = published.property_id
AND NOW() BETWEEN published.start_date AND published.end_date
LEFT OUTER JOIN
(
SELECT a.property_id, a.created_at AS start_date, IFNULL(MIN(b.created_at), NOW()) AS end_date -- still hidden if never unhidden
FROM property_histories a
LEFT OUTER JOIN property_histories b
ON a.property_id = b.property_id
AND a.created_at < b.created_at
AND b.status = 'unhidden_from_search'
WHERE a.status = 'hidden_from_search'
GROUP BY a.property_id, a.created_at
) hidden
ON properties.property_id = hidden.property_id
AND NOW() BETWEEN hidden.start_date AND hidden.end_date
WHERE published.property_id IS NOT NULL
AND hidden.property_id IS NULL
I used a numbers table as a handy shortcut. Essentially, your question revolved around wanting to know a running sum of published or unhidden versus unpublished or hidden. At this point, the property IDs become a moot point in the view (provided their uniqueness is properly constrained elsewhere), and all we need is a custom sum. I have the example on SQLFiddle. Here's the query:
select years.n + 2013 as year, weeks.n as week
, c.City
,
(select
sum(case
when status in ('published','unhidden_from_search') then 1
when status in ('unpublished','hidden_from_search') then -1
else 0
end)
from property_histories p2
where weekofyear(p2.date) <= weeks.n
and p2.city=c.city
) AS Live_Count
from numbers weeks
inner join numbers years on weeks.n <= 52
cross join (select City from property_histories group by city) c
where years.n + 2013 <= (select max(year(date)) from property_histories)
group by years.n + 2013, weeks.n
, c.City
;

Group by Enum with columns as Date, MySQL

I'm doing an inner join where I select between a date range (say, BETWEEN '2011-01-01' AND '2011-02-01') and group by an enumerated value. Is there a way to do this for each month as a column for a range of months? I'm currently doing this by hand for each month.
Example:
vehicle_type | January | February | March
----------------------------------------------
sedan | 12 | 10 | 4
coupe | 5 | 7 | 23
truck | 0 | 0 | 9
electric | 22 | 10 | 13
hybrid | 0 | 12 | 0
You could create a calendar table...
CREATE TABLE calendar
(
description VARCHAR(100),
when_start DATE,
when_end DATE
)
then use a pivot query
e.g.
SELECT
vehicle_type,
SUM(jan) AS jan, SUM(feb) AS feb,
-- add the other months here
SUM(nov) AS nov, SUM(dece) AS dece
FROM
(
-- one row per vehicle type and calendar period, with the count in that period's column
SELECT v.vehicle_type,
CASE WHEN c.description='Jan' THEN
COUNT(*)
END AS jan,
CASE WHEN c.description='Feb' THEN
COUNT(*)
END AS feb,
-- Add the rest of the months here too
CASE WHEN c.description='Nov' THEN
COUNT(*)
END AS nov,
CASE WHEN c.description='Dec' THEN
COUNT(*)
END AS dece
FROM calendar c
INNER JOIN vehicles v ON v.`when` >= c.when_start AND v.`when` <= c.when_end
GROUP BY v.vehicle_type, c.description
) AS monthly
GROUP BY vehicle_type
ORDER BY vehicle_type
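If a calendar table feels like too much, the same pivot can be done in one pass with conditional aggregation: one SUM(CASE ...) per month column. The sketch below runs it on SQLite from Python. It is only an illustration: the vehicles table layout, the sold_on column and the sample rows are all made up, and MONTH() is a registered stand-in because SQLite lacks it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for MySQL's MONTH(): characters 6-7 of an ISO date string.
conn.create_function("MONTH", 1, lambda d: int(d[5:7]))

# Hypothetical table and rows, just to exercise the query.
conn.execute("CREATE TABLE vehicles (vehicle_type TEXT, sold_on TEXT)")
conn.executemany(
    "INSERT INTO vehicles VALUES (?, ?)",
    [
        ("sedan", "2011-01-05"),
        ("sedan", "2011-01-20"),
        ("sedan", "2011-02-11"),
        ("coupe", "2011-03-02"),
        ("truck", "2011-03-30"),
    ],
)

# One output column per month; each CASE contributes 1 only for rows of that month.
pivot = conn.execute(
    """
    SELECT vehicle_type,
           SUM(CASE WHEN MONTH(sold_on) = 1 THEN 1 ELSE 0 END) AS january,
           SUM(CASE WHEN MONTH(sold_on) = 2 THEN 1 ELSE 0 END) AS february,
           SUM(CASE WHEN MONTH(sold_on) = 3 THEN 1 ELSE 0 END) AS march
      FROM vehicles
     WHERE sold_on BETWEEN '2011-01-01' AND '2011-03-31'
     GROUP BY vehicle_type
     ORDER BY vehicle_type
    """
).fetchall()

for row in pivot:
    print(row)
```

The month columns still have to be listed by hand; SQL needs the column list to be known up front, so a fully dynamic range of months would mean generating the query text in application code.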