Group by year on multiple date columns in MySQL

I have table as following:
hours | ... | task_assigned | task_deadline | task_completion
----------------------------------------------------------------
123 | ... | 2019-08-01 | - | -
234 | ... | - | 2018-08-01 | 2019-08-01
145 | ... | 2017-08-01 | 2017-08-01 | 2018-01-01
I want to calculate total hours for each year, i.e. grouping by year.
Currently I'm only taking into account task_completion field.
If there's no value in task_completion field, the record is not included in SUM calculation.
To elaborate further, say for year 2019, rows 1 and 2 should both be considered. Hence the total hours should be 123 + 234 = 357.
And for year 2018, row 2 and 3.
Similarly, for year 2017, row 3.
SELECT YEAR(task_completion) as year, ROUND(SUM(total_hours), 2) as hours
FROM task
GROUP BY year
HAVING year BETWEEN '$year_from' AND '$year_to'
The resultset:
year | hours
--------------------
2017 | <somevalue>
2018 | <somevalue>
2019 | <somevalue>
How can I include other two date fields too?

You want to consider each row once for each of its years. Use UNION to get these years:
select year, round(sum(total_hours), 2) as hours
from
(
select year(task_assigned) as year, total_hours from task
union
select year(task_deadline) as year, total_hours from task
union
select year(task_completion) as year, total_hours from task
) years_and_hours
group by year
having year between $year_from and $year_to
order by year;
If a row that has the same year in two or three of its date columns should also be counted two or three times in the sum, change UNION to UNION ALL.
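To see the difference between the two set operators on the sample rows from the question, here is a minimal sketch. It assumes SQLite (via Python's sqlite3) as a stand-in for MySQL, so `YEAR(col)` is emulated with `strftime('%Y', col)`, the `-` cells are stored as NULL, and NULL years are filtered out explicitly.

```python
import sqlite3

# Assumption: in-memory SQLite replaces MySQL; dates are ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task (total_hours INTEGER, task_assigned TEXT, "
    "task_deadline TEXT, task_completion TEXT)"
)
conn.executemany(
    "INSERT INTO task VALUES (?, ?, ?, ?)",
    [
        (123, "2019-08-01", None, None),
        (234, None, "2018-08-01", "2019-08-01"),
        (145, "2017-08-01", "2017-08-01", "2018-01-01"),
    ],
)

# {op} is filled with a fixed keyword (UNION or UNION ALL), never user input.
QUERY = """
SELECT year, SUM(total_hours) AS hours
FROM (
    SELECT CAST(strftime('%Y', task_assigned) AS INTEGER) AS year, total_hours FROM task
    {op}
    SELECT CAST(strftime('%Y', task_deadline) AS INTEGER), total_hours FROM task
    {op}
    SELECT CAST(strftime('%Y', task_completion) AS INTEGER), total_hours FROM task
) AS years_and_hours
WHERE year IS NOT NULL
GROUP BY year
ORDER BY year
"""

union_totals = conn.execute(QUERY.format(op="UNION")).fetchall()
union_all_totals = conn.execute(QUERY.format(op="UNION ALL")).fetchall()

print(union_totals)      # row 3 counts once for 2017
print(union_all_totals)  # row 3 counts twice for 2017
```

One caveat worth keeping in mind: UNION removes duplicates across the whole (year, total_hours) pair, so two different tasks with the same hours in the same year would also collapse into one. Selecting a key column (if the table has one) alongside the year avoids that.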

Basically, you want to unpivot the data. I will assume that the - represents a NULL value and your dates are real dates.
select year(dte) as year, sum(total_hours) as hours
from ((select task_assigned as dte, total_hours
from task
) union all
(select task_deadline, total_hours
from task
) union all
(select task_completion, total_hours
from task
)
) d
where dte is not null
group by year(dte)
order by year(dte);
Based on your sample data, the round() is not necessary so I removed it.
If you want to filter for particular years, the filtering should be in a where clause -- so it filters the data before aggregation.
Change the where to:
where year(dte) >= ? and year(dte) <= ?
or:
where dte >= ? and dte <= ?
to pass in the dates.
The ? are for parameter placeholders. Learn how to use parameters rather than munging query strings.
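As a rough illustration of passing the year range as parameters rather than interpolating it into the SQL string, here is a sketch using Python's sqlite3 module. SQLite is an assumption standing in for MySQL (so `strftime('%Y', ...)` replaces `YEAR()`), and the helper name `hours_per_year` is hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite replaces MySQL; "-" cells are stored as NULL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task (total_hours INTEGER, task_assigned TEXT, "
    "task_deadline TEXT, task_completion TEXT)"
)
conn.executemany(
    "INSERT INTO task VALUES (?, ?, ?, ?)",
    [
        (123, "2019-08-01", None, None),
        (234, None, "2018-08-01", "2019-08-01"),
        (145, "2017-08-01", "2017-08-01", "2018-01-01"),
    ],
)

def hours_per_year(conn, year_from, year_to):
    """Unpivot the three date columns and sum hours per year in the given range."""
    sql = """
        SELECT CAST(strftime('%Y', dte) AS INTEGER) AS year,
               SUM(total_hours) AS hours
        FROM (
            SELECT task_assigned AS dte, total_hours FROM task
            UNION ALL
            SELECT task_deadline, total_hours FROM task
            UNION ALL
            SELECT task_completion, total_hours FROM task
        ) AS d
        WHERE dte IS NOT NULL
          AND CAST(strftime('%Y', dte) AS INTEGER) BETWEEN ? AND ?
        GROUP BY year
        ORDER BY year
    """
    # The driver binds the values; nothing is pasted into the SQL text.
    return conn.execute(sql, (year_from, year_to)).fetchall()

result = hours_per_year(conn, 2018, 2019)
print(result)
```

Most MySQL client libraries offer the same idea with their own placeholder syntax, so the query text stays constant and only the bound values change.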

This answer is no longer valid with the updated request.
If I understand correctly, you want to use task_assigned if the task_completion is still null. Use COALESCE for this.
SELECT
YEAR(COALESCE(task_completion, task_assigned)) as year,
ROUND(SUM(total_hours), 2) as hours
FROM task
GROUP BY year
HAVING year BETWEEN $year_from AND $year_to
ORDER BY year;
(I don't think you actually want to use task_deadline, too, for how could a task get completed before getting assigned first? If that can occur, then include it in the COALESCE expression, probably as COALESCE(task_completion, task_assigned, task_deadline).)

Related

MySQL query to get sum of difference between Start date and End date

I have a table which has the following data:
+-----------+-----------+
| Date_from | Date_to |
+-----------+-----------+
| 20-NOV-19 | 22-NOV-19 |
+-----------+-----------+
| 10-NOV-19 | 21-NOV-19 |
+-----------+-----------+
| 14-NOV-19 | 26-NOV-19 |
+-----------+-----------+
I need a query to find the sum of the difference between date_from and date_to.
Example:
The difference between 20-Nov-19 and 22-Nov-19 is 2 days
The difference between 10-Nov-19 and 21-Nov-19 is 11 days but the query has to consider it as 9 days because days 20-Nov, 21-Nov are already considered in the first row.
The difference between 14-Nov-19 and 26-Nov-19 is 12 days but the query has to consider it as 4 days because days 14-Nov to 22-Nov are already considered in the above rows.
The query result should be
15 days (2+9+4)
Any help would be much appreciated.
You can use below query to get total.
Please change Table_name with your actual table name
SELECT SUM(TIMESTAMPDIFF(DAY,Date_from,Date_to)) as total FROM Table_name
I used window functions in a sub-query to calculate the difference between date_from and date_to and then subtract any overlapping days:
SELECT SUM(days) FROM
(SELECT CASE WHEN LEAD(date_from) OVER w < date_to THEN DATEDIFF(date_to, date_from) + DATEDIFF(LEAD(date_from) OVER w, date_to)
ELSE DATEDIFF(date_to, date_from)
END AS days
FROM test
WINDOW w AS (ORDER BY date_to)) as d
Note though that this produces the result 16 days, not 15 as in the question, but then again so does
SELECT DATEDIFF('2019-11-26', '2019-11-10')
You can use a recursive CTE to generate all the dates in each range and take the distinct count. It will give you the total number of days as 17. Since you want the difference, you have to subtract 2 (the start date and the end date).
WITH recursive Date_Ranges AS (
select datefrom as Dt,dateto from mydates
union all
select dt + interval 1 day , dateto
from Date_Ranges
where dt < dateto)
select count(distinct(dt))-2 from Date_Ranges
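If doing this on the client side is an option, the overlap handling can also be sketched as a plain interval merge. This is not one of the SQL answers above, just an alternative illustration in Python; the function name `covered_days` is hypothetical, and it measures each merged range as end minus start, the same way DATEDIFF does.

```python
from datetime import date

def covered_days(ranges):
    """Merge overlapping (start, end) ranges and sum end - start in days."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(ranges):
        if current_end is None or start > current_end:
            # Gap before this range: close the previous merged range.
            if current_end is not None:
                total += (current_end - current_start).days
            current_start, current_end = start, end
        else:
            # Overlap: extend the merged range if this one reaches further.
            current_end = max(current_end, end)
    if current_end is not None:
        total += (current_end - current_start).days
    return total

ranges = [
    (date(2019, 11, 20), date(2019, 11, 22)),
    (date(2019, 11, 10), date(2019, 11, 21)),
    (date(2019, 11, 14), date(2019, 11, 26)),
]
total = covered_days(ranges)
print(total)
```

On the sample data the three ranges merge into a single range from 10-NOV-19 to 26-NOV-19, so this agrees with the 16 days that DATEDIFF reports for those two dates rather than the 15 stated in the question.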

Grouping values with IF, month-by-month, in a select statement

I am trying to run a report that will group by month the totals of expense accounts from an expense table which has the followings columns on each row:
expense_acc
debit
credit
post_date
The desired output of the report is in the following column format:
EXP ACC - JAN - FEB - MAR
This is my SQL select query:
SELECT expense_acc,
if(MONTH(post_date)=1,SUM(expenses.debit-expenses.credit),0) AS 'JAN',
if(MONTH(post_date)=2,SUM(debit-credit),0) AS 'FEB',
if(MONTH(post_date)=3,SUM(debit-credit),0) AS 'MAR'
FROM expenses
WHERE YEAR(expenses.entered)='2016'
GROUP BY expenses.expense_acc
The results are not grouping the expense values by month as expected. I am seeing grouping in the first row, regardless of the transaction date.
You have two parts to your requirement.
A month-by-month aggregate of your table's contents
A pivot table rendering, in which you pivot by-month rows to lie in columns.
Also, SUM(expenses.debit-expenses.credit) isn't resilient if the debit or credit columns ever contain NULL values.
Also, YEAR(date) defeats any index on date.
If you're wise you'll handle these requirements in two steps. For one thing, it will be easier to troubleshoot your results. For another, the next person who comes along will better understand your project.
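The NULL point can be checked quickly. The sketch below assumes SQLite (through Python's sqlite3) as a stand-in for MySQL and two made-up rows, one of which has no credit value; both engines skip NULL inputs in SUM and treat `value - NULL` as NULL.

```python
import sqlite3

# Assumption: in-memory SQLite replaces MySQL; the two rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expenses (expense_acc TEXT, debit INTEGER, credit INTEGER)")
conn.executemany(
    "INSERT INTO expenses VALUES (?, ?, ?)",
    [
        ("2345", 100, None),  # credit is missing
        ("2345", 50, 20),
    ],
)

naive, split, coalesced = conn.execute(
    """
    SELECT SUM(debit - credit),                           -- row 1 becomes NULL and is skipped
           SUM(debit) - SUM(credit),                      -- each column summed on its own
           SUM(COALESCE(debit, 0) - COALESCE(credit, 0))  -- NULL treated as zero
    FROM expenses
    """
).fetchone()

print(naive, split, coalesced)
```

Note that `SUM(debit) - SUM(credit)` still yields NULL when one of the columns is NULL in every row of a group, so the COALESCE form is the more defensive of the two.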
The month-by-month aggregate:
SELECT expense_acc, LAST_DAY(post_date) month_ending,
SUM(expenses.debit) - SUM(expenses.credit) as net_expenses
FROM expenses
WHERE post_date >= '2016-01-01'
AND post_date < '2016-12-31' + INTERVAL 1 DAY
GROUP BY expense_acc, LAST_DAY(post_date)
ORDER BY expense_acc, LAST_DAY(post_date)
This will give you a row for each account and date. The row will show the account, the month-ending date, and the net expenses. (I don't understand expenses.entered in your example. It's best to filter on the same date you use to make your aggregate.) Your auditors will appreciate this separation of logic.
Next, you can use this as a subquery, to make your pivot display.
That's pretty straightforward:
SELECT expense_acc,
SUM(IF(MONTH(month_ending)=1,net_expenses,0)) as jan,
SUM(IF(MONTH(month_ending)=2,net_expenses,0)) as feb,
SUM(IF(MONTH(month_ending)=3,net_expenses,0)) as mar,
...
FROM (
SELECT expense_acc, LAST_DAY(post_date) month_ending,
SUM(expenses.debit) - SUM(expenses.credit) as net_expenses
FROM expenses
WHERE post_date >= '2016-01-01'
AND post_date < '2016-12-31' + INTERVAL 1 DAY
GROUP BY expense_acc, LAST_DAY(post_date)
) z
GROUP BY expense_acc
ORDER BY expense_acc
But, you may want to do the pivoting in a client program. It's notoriously hard to write and maintain MySQL pivot code.
SELECT
SUM(JAN) AS JAN,
SUM(FEB) AS FEB,
SUM(MAR) AS MAR
FROM
(
SELECT expense_acc,
if(MONTH(post_date)=1,SUM(expenses.debit-expenses.credit),0) AS 'JAN',
if(MONTH(post_date)=2,SUM(debit-credit),0) AS 'FEB',
if(MONTH(post_date)=3,SUM(debit-credit),0) AS 'MAR'
FROM expenses
WHERE YEAR(expenses.entered)='2016'
GROUP BY expenses.expense_acc
)TMP
GROUP BY expense_acc
In my opinion, the best approach is this:
SELECT expense_acc, MONTH(post_date) month_num, SUM(debit-credit) as total
FROM expenses
WHERE YEAR(expenses.entered) = '2016'
GROUP BY expenses.expense_acc, MONTH(post_date)
That will give you something like
| expense_acc | month_num | total |
-----------------------------------------
| 2345 | 1 | 50 |
-----------------------------------------
| 2345 | 2 | 30 |
-----------------------------------------
| 2346 | 1 | 45 |
...
Otherwise, with one IF per month, you end up with 12 IFs.
It's better to handle the row formatting in your controller.
Hope it will help you.
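For what the controller-side formatting could look like, here is a small sketch in Python. It assumes the query returns (expense_acc, month_num, total) tuples like the sample rows above; the function name `pivot_by_month` and the three-month column list are hypothetical.

```python
from collections import defaultdict

# Assumption: only the first three months are shown, as in the question.
MONTHS = ["JAN", "FEB", "MAR"]

def pivot_by_month(rows):
    """Turn (expense_acc, month_num, total) rows into one dict per account."""
    table = defaultdict(lambda: dict.fromkeys(MONTHS, 0))
    for expense_acc, month_num, total in rows:
        # Ignore months outside the displayed columns.
        if 1 <= month_num <= len(MONTHS):
            table[expense_acc][MONTHS[month_num - 1]] += total
    return dict(table)

rows = [
    (2345, 1, 50),
    (2345, 2, 30),
    (2346, 1, 45),
]
report = pivot_by_month(rows)
for acc, months in sorted(report.items()):
    print(acc, months)
```

Extending MONTHS to all twelve names gives the full-year layout without touching the SQL.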

Check if instances have occurred minimum once, every year in a specific range

In MySQL I'm tasked with a big dataset, with data from 1970 to 2010.
I want to check for consistency: check if each instance occurs minimum one time per year. I took a snippet from 1970-1972 as example to demonstrate my problem.
input:
id year counts
-- ---- ---------
1 1970 1
1 1971 1
2 1970 3
2 1971 8
2 1972 1
3 1970 4
expected:
id 1970-1972
-- ----------
1 no
2 yes
3 no
I thought about counting within the date range and then picking out those that had 3 counts: 1970, 1971, 1972. The following query doesn't force the check on each point in the range though.
select id, count(*)
from table1
WHERE (year BETWEEN '1970' AND '1972') AND `no_counts` >= 1
group by id
What to do?
You can use GROUP BY with CASE / inline if.
Using CASE:
select id, CASE WHEN COUNT(distinct year) = 3 THEN 'yes' ELSE 'No' END "1970-72"
from abc
WHERE year between 1970 and 1972
GROUP BY id
Using inline IF:
select id,IF( COUNT(distinct year) = 3,'yes','No') "1970-72"
from abc
WHERE year between 1970 and 1972
GROUP BY id
You can use a having clause with distinct count:
select `id`
from `table1`
where `year` between '1970' and '1972'
group by id
having count(distinct `year`) = 3
Is this what you expect?
select id, count(*)
from table1
WHERE (year BETWEEN '1970' AND '1972')
group by id
having count(distinct year) = 3
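The hard-coded 3 in these answers is just the number of years in the range, so it can be derived from the bounds. Below is a sketch of that idea, assuming SQLite via Python's sqlite3 in place of MySQL; the column is called `counts` as in the sample input, and the helper name `consistency` is hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite replaces MySQL; data is the 1970-1972 snippet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, year INTEGER, counts INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 1970, 1), (1, 1971, 1), (2, 1970, 3), (2, 1971, 8), (2, 1972, 1), (3, 1970, 4)],
)

def consistency(conn, year_from, year_to):
    """Return (id, 'yes'/'no') depending on whether every year in the range occurs."""
    expected_years = year_to - year_from + 1
    sql = """
        SELECT id,
               CASE WHEN COUNT(DISTINCT year) = ? THEN 'yes' ELSE 'no' END
        FROM table1
        WHERE year BETWEEN ? AND ? AND counts >= 1
        GROUP BY id
        ORDER BY id
    """
    return conn.execute(sql, (expected_years, year_from, year_to)).fetchall()

result = consistency(conn, 1970, 1972)
print(result)
```

An id with no rows at all inside the range does not appear in the output, since the WHERE clause removes it before grouping.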

Given a table with time periods, query for a list of sum per day

Let's say I have a table that says how many items of something are valid between two dates.
Additionally, there may be multiple such periods.
For example, given a table:
itemtype | count | start | end
A | 10 | 2014-01-01 | 2014-01-10
A | 10 | 2014-01-05 | 2014-01-08
This means that there are 10 items of type A valid 2014-01-01 - 2014-01-10 and additionally, there are 10 valid 2014-01-05 - 2014-01-08.
So for example, the sum of valid items on 2014-01-06 is 20.
How can I query the table to get the sum per day? I would like a result such as
2014-01-01 10
2014-01-02 10
2014-01-03 10
2014-01-04 10
2014-01-05 20
2014-01-06 20
2014-01-07 20
2014-01-08 20
2014-01-09 10
2014-01-10 10
Can this be done with SQL? Either Oracle or MySQL would be fine
The basic syntax you are looking for is as follows:
For my example below I've defined a new table called DateTimePeriods which has a column for StartDate and EndDate both of which are DATE columns.
SELECT
SUM(NumericColumnName)
, DateTimePeriods.StartDate
, DateTimePeriods.EndDate
FROM
TableName
INNER JOIN DateTimePeriods ON TableName.dateColumnName BETWEEN DateTimePeriods.StartDate and DateTimePeriods.EndDate
GROUP BY
DateTimePeriods.StartDate
, DateTimePeriods.EndDate
Obviously the above code won't work on your database but should give you a reasonable place to start. You should look into GROUP BY and Aggregate Functions. I'm also not certain of how universal BETWEEN is for each database type, but you could do it using other comparisons such as <= and >=.
There are several ways to go about this. First, you need a list of dense dates to query. Using a row generator statement can provide that:
select date '2014-01-01' + level -1 d
from dual
connect by level <= 15;
Then for each date, select the sum of inventory:
with
sample_data as
(select 'A' itemtype, 10 item_count, date '2014-01-01' start_date, date '2014-01-10' end_date from dual union all
select 'A', 10, date '2014-01-05', date '2014-01-08' from dual),
periods as (select date '2014-01-01' + level -1 d from dual connect by level <= 15)
select
periods.d,
(select sum(item_count) from sample_data where periods.d between start_date and end_date) available
from periods
where periods.d = date '2014-01-06';
You would need to dynamically set the number of date rows to generate.
If you only needed a single row, then a query like this would work:
with
sample_data as
(select 'A' itemtype, 10 item_count, date '2014-01-01' start_date, date '2014-01-10' end_date from dual union all
select 'A', 10, date '2014-01-05', date '2014-01-08' from dual)
select sum(item_count)
from sample_data
where date '2014-01-06' between start_date and end_date;
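To get one row per day instead of a single date, the same two steps (generate dense dates, then sum the periods covering each date) can be combined. The sketch below is an assumption-laden variant: it uses SQLite through Python's sqlite3 rather than Oracle or MySQL, a recursive CTE as the row generator instead of CONNECT BY, and a hypothetical table name `items` with the column names from the sample_data CTE above.

```python
import sqlite3

# Assumption: in-memory SQLite; dates are ISO-8601 text so they compare correctly.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (itemtype TEXT, item_count INTEGER, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        ("A", 10, "2014-01-01", "2014-01-10"),
        ("A", 10, "2014-01-05", "2014-01-08"),
    ],
)

sql = """
WITH RECURSIVE days(d) AS (
    SELECT ?                                   -- first day of the report
    UNION ALL
    SELECT date(d, '+1 day') FROM days WHERE d < ?   -- stop at the last day
)
SELECT days.d, COALESCE(SUM(items.item_count), 0) AS available
FROM days
LEFT JOIN items ON days.d BETWEEN items.start_date AND items.end_date
GROUP BY days.d
ORDER BY days.d
"""
per_day = conn.execute(sql, ("2014-01-01", "2014-01-10")).fetchall()
for day, available in per_day:
    print(day, available)
```

The LEFT JOIN with COALESCE keeps days that no period covers in the output with a sum of 0, which a plain inner join would drop.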

MySQL - Count Yearly Totals when some Years have nulls

I have 1 table with similar data:
CustomerID | ProjectID | DateListed | DateCompleted
123456 | 045 | 07-29-2010 | 04-03-2011
123456 | 123 | 10-12-2011 | 11-30-2011
123456 | 157 | 12-12-2011 | 02-10-2012
123456 | 258 | 06-07-2011 | NULL
Basically, a customer contacts us, we get a project on our list, and we mark it completed when we're done with it.
What I'm after is a simple (you'd think, at least) count of all projects, with expected output like below:
YEAR | TotalListed | TotalCompleted
2010 | 1 | 0
2011 | 3 | 2
2012 | 0 | 1
However, my query below - because of the join - isn't showing 2012's count, because there's been no listed project for 2012. However, I can't really reverse the query, as then 2010's count wouldn't show up (since nothing was completed in 2010).
I'm open to any suggestions, or tips like how to do this. I've pondered a temp table, is that the best way to go? I'm open to anything that gets me what I need!
(If the code looks familiar, y'all helped me build the subquery! MySQL Subquery with main query data variable)
SELECT YEAR(p1.DateListed) AS YearListed, COUNT(p1.ProjectID) As Listed, PreQuery.Completed
FROM(
SELECT YEAR(DateCompleted) AS YearCompleted, COUNT(ProjectID) AS Completed
FROM projects
WHERE CustomerID = 123456 AND DateListed >= DATE_SUB(Now(), INTERVAL 5 YEAR)
GROUP BY YEAR(DateCompleted)
) PreQuery
RIGHT OUTER JOIN projects p1 ON PreQuery.YearCompleted = YEAR(p1.DateListed)
WHERE CustomerID = 123456 AND DateListed >= DATE_SUB(Now(), INTERVAL 5 YEAR)
GROUP BY YearListed
ORDER BY p1.DateListed
After reviewing your table, query, and expected results, I believe I have found a revised query to suit your needs. It is a fairly full rewrite of your existing query, but I've tested it with your given data and received the results you want:
SELECT
years.`year`,
SUM(IF(YEAR(DateListed) = years.`year`, 1, 0)) AS TotalListed,
SUM(IF(YEAR(DateCompleted) = years.`year`, 1, 0)) AS TotalCompleted
FROM
projects
LEFT JOIN (
SELECT DISTINCT `year` FROM (
SELECT YEAR(DateListed) AS `year` FROM projects
UNION SELECT YEAR(DateCompleted) AS `year` FROM projects WHERE DateCompleted IS NOT NULL
) as year_inner
) AS years
ON YEAR(DateListed) = `year`
OR YEAR(DateCompleted) = `year`
WHERE
CustomerID = 123456 AND DateListed >= DATE_SUB(Now(), INTERVAL 5 YEAR)
GROUP BY
years.`year`
ORDER BY
years.`year`
To explain, we should start with the inner query (aliased as year_inner). It selects a full list of years in the DateListed and DateCompleted columns and then selects a DISTINCT list of those to create the years alias sub-query. This sub-query is used to get a full list of "years" that we want data for. Doing it this way, opposed to a sub-query with counts and groupings will allow you to only have to define the WHERE clause on the outermost query (though, if efficiency becomes an issue with thousands and thousands of records, you could always add a WHERE clause to the inner query too; or an index to the date columns).
After we've built our inner queries, we join the projects table on the results with a LEFT JOIN for the DateListed or DateCompleted's YEAR() value - which will allow us to bring back null columns too!
For the field selections, we use the year column from our inner query to assure that we get a full list of years to display. Then, we compare the current row's DateListed & DateCompleted YEAR() value to the current year; if they're equal, add 1 - else add 0. When we GROUP BY year, our SUM() will count all of the 1's for that year for each column and give you the output you want (hopefully, of course =P).
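The walkthrough above can be tried against the four sample rows. The sketch below makes a few assumptions: SQLite via Python's sqlite3 stands in for MySQL, so `YEAR(col)` becomes `CAST(strftime('%Y', col) AS INTEGER)` and `IF(...)` becomes `CASE WHEN`; dates are stored as ISO-8601 text; and the five-year `DateListed` filter is left out because the sample dates would no longer pass it.

```python
import sqlite3

# Assumption: in-memory SQLite replaces MySQL; sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE projects (CustomerID INTEGER, ProjectID TEXT, "
    "DateListed TEXT, DateCompleted TEXT)"
)
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?, ?)",
    [
        (123456, "045", "2010-07-29", "2011-04-03"),
        (123456, "123", "2011-10-12", "2011-11-30"),
        (123456, "157", "2011-12-12", "2012-02-10"),
        (123456, "258", "2011-06-07", None),
    ],
)

sql = """
SELECT years.year,
       SUM(CASE WHEN CAST(strftime('%Y', DateListed) AS INTEGER) = years.year
                THEN 1 ELSE 0 END) AS TotalListed,
       SUM(CASE WHEN CAST(strftime('%Y', DateCompleted) AS INTEGER) = years.year
                THEN 1 ELSE 0 END) AS TotalCompleted
FROM projects
LEFT JOIN (
    -- UNION already yields a distinct list of years
    SELECT CAST(strftime('%Y', DateListed) AS INTEGER) AS year FROM projects
    UNION
    SELECT CAST(strftime('%Y', DateCompleted) AS INTEGER) FROM projects
    WHERE DateCompleted IS NOT NULL
) AS years
  ON CAST(strftime('%Y', DateListed) AS INTEGER) = years.year
  OR CAST(strftime('%Y', DateCompleted) AS INTEGER) = years.year
WHERE CustomerID = ?
GROUP BY years.year
ORDER BY years.year
"""
yearly = conn.execute(sql, (123456,)).fetchall()
for year, listed, completed in yearly:
    print(year, listed, completed)
```

Each project is joined once per year it touches, and the two conditional sums then decide which column that joined row contributes to; the open project (258) has a NULL completion year, so it only ever counts as listed.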