Grouping values with IF, month-by-month, in a select statement - mysql

I am trying to run a report that groups the totals of expense accounts by month. The data comes from an expense table which has the following columns on each row:
expense_acc
debit
credit
post_date
The desired output of the report is in the following column format:
EXP ACC - JAN - FEB - MAR
This is my SQL select query:
SELECT expense_acc,
if(MONTH(post_date)=1,SUM(expenses.debit-expenses.credit),0) AS 'JAN',
if(MONTH(post_date)=2,SUM(debit-credit),0) AS 'FEB',
if(MONTH(post_date)=3,SUM(debit-credit),0) AS 'MAR'
FROM expenses
WHERE YEAR(expenses.entered)='2016'
GROUP BY expenses.expense_acc
The results are not grouping the expense values by month as expected. I am seeing grouping in the first row, regardless of the transaction date.

You have two parts to your requirement.
A month-by-month aggregate of your table's contents
A pivot table rendering, in which you pivot by-month rows to lie in columns.
Also, SUM(expenses.debit-expenses.credit) isn't resilient if the debit or credit columns ever contain NULL values.
Also, YEAR(date) defeats any index on date.
If you're wise you'll handle these requirements in two steps. For one thing, it will be easier to troubleshoot your results. For another, the next person who comes along will better understand your project.
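To make the NULL caveat above concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere; the NULL arithmetic shown is standard SQL and MySQL behaves the same way). The two rows are made up.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expenses (expense_acc TEXT, debit REAL, credit REAL)")
conn.executemany(
    "INSERT INTO expenses VALUES (?, ?, ?)",
    [
        ("2345", 50.0, None),  # credit is NULL, so debit - credit is NULL for this row
        ("2345", 30.0, 10.0),
    ],
)

naive, split, coalesced = conn.execute(
    """
    SELECT SUM(debit - credit),                           -- silently skips the NULL row
           SUM(debit) - SUM(credit),                      -- each SUM skips only its own NULLs
           SUM(COALESCE(debit, 0) - COALESCE(credit, 0))  -- treats NULL as zero explicitly
    FROM expenses
    """
).fetchone()

print(naive, split, coalesced)  # 20.0 70.0 70.0
```

Note that SUM(debit) - SUM(credit) still returns NULL when every credit in a group is NULL, so the COALESCE form is probably the safer choice if NULLs are expected.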
The month-by-month aggregate:
SELECT expense_acc, LAST_DAY(post_date) month_ending,
SUM(expenses.debit) - SUM(expenses.credit) as net_expenses
FROM expenses
WHERE post_date >= '2016-01-01'
AND post_date < '2016-12-31' + INTERVAL 1 DAY
GROUP BY expense_acc, LAST_DAY(post_date)
ORDER BY expense_acc, LAST_DAY(post_date)
This will give you a row for each account and date. The row will show the account, the month-ending date, and the net expenses. (I don't understand expenses.entered in your example. It's best to filter on the same date you use to make your aggregate.) Your auditors will appreciate this separation of logic.
Next, you can use this as a subquery, to make your pivot display.
That's pretty straightforward:
SELECT expense_acc,
SUM(IF(MONTH(month_ending)=1,net_expenses,0)) as jan,
SUM(IF(MONTH(month_ending)=2,net_expenses,0)) as feb,
SUM(IF(MONTH(month_ending)=3,net_expenses,0)) as mar,
...
FROM (
SELECT expense_acc, LAST_DAY(post_date) month_ending,
SUM(expenses.debit) - SUM(expenses.credit) as net_expenses
FROM expenses
WHERE post_date >= '2016-01-01'
AND post_date < '2016-12-31' + INTERVAL 1 DAY
GROUP BY expense_acc, LAST_DAY(post_date)
) z
GROUP BY expense_acc
ORDER BY expense_acc
But, you may want to do the pivoting in a client program. It's notoriously hard to write and maintain MySQL pivot code.
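If you do go the client-side route, the pivot is only a few lines. This is a rough sketch in Python, assuming the client receives rows shaped like the month-by-month aggregate (expense_acc, month_ending, net_expenses). The function name pivot_by_month and the sample rows are made up for illustration.

```python
from collections import defaultdict
from datetime import date

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

def pivot_by_month(rows):
    """Turn (expense_acc, month_ending, net_expenses) rows into one dict per account.

    Hypothetical helper: every account gets all twelve month keys, defaulting to 0.
    """
    table = defaultdict(lambda: dict.fromkeys(MONTHS, 0))
    for expense_acc, month_ending, net_expenses in rows:
        table[expense_acc][MONTHS[month_ending.month - 1]] += net_expenses
    return dict(table)

# Made-up rows in the shape the aggregate query returns.
rows = [
    ("2345", date(2016, 1, 31), 50),
    ("2345", date(2016, 2, 29), 30),
    ("2346", date(2016, 1, 31), 45),
]

report = pivot_by_month(rows)
for acc in sorted(report):
    print(acc, report[acc]["jan"], report[acc]["feb"], report[acc]["mar"])
```

Adding a month or changing the column order then means touching one list instead of twelve SUM(IF(...)) expressions.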

SELECT expense_acc,
SUM(JAN) AS JAN,
SUM(FEB) AS FEB,
SUM(MAR) AS MAR
FROM
(
SELECT expense_acc,
if(MONTH(post_date)=1,SUM(expenses.debit-expenses.credit),0) AS 'JAN',
if(MONTH(post_date)=2,SUM(debit-credit),0) AS 'FEB',
if(MONTH(post_date)=3,SUM(debit-credit),0) AS 'MAR'
FROM expenses
WHERE YEAR(expenses.entered)='2016'
GROUP BY expenses.expense_acc, MONTH(post_date)
)TMP
GROUP BY expense_acc

In my opinion, the best way is this:
SELECT expense_acc, MONTH(post_date) month_num, SUM(debit-credit) as total
FROM expenses
WHERE YEAR(expenses.entered) = '2016'
GROUP BY expenses.expense_acc, MONTH(post_date)
That will give you something like
| expense_acc | month_num | total |
-----------------------------------
| 2345        | 1         | 50    |
-----------------------------------
| 2345        | 2         | 30    |
-----------------------------------
| 2346        | 1         | 45    |
...
If you write one IF expression per month, you end up with 12 IFs.
It's better to handle the row-to-column formatting in your controller.
I hope this helps.

Related

group by year on multiple date columns mysql

I have a table as follows:
hours | ... | task_assigned | task_deadline | task_completion
----------------------------------------------------------------
123 | ... | 2019-08-01 | - | -
234 | ... | - | 2018-08-01 | 2019-08-01
145 | ... | 2017-08-01 | 2017-08-01 | 2018-01-01
I want to calculate total hours for each year, i.e. grouping by year.
Currently I'm only taking into account task_completion field.
If there's no value in task_completion field, the record is not included in SUM calculation.
To elaborate further, say for year 2019, rows 1 and 2 should both be considered. Hence the total hours should be 123 + 234 = 357.
And for year 2018, rows 2 and 3.
Similarly, for year 2017, row 3.
SELECT YEAR(task_completion) as year, ROUND(SUM(total_hours), 2) as hours
FROM task
GROUP BY year
HAVING year BETWEEN '$year_from' AND '$year_to'
The resultset:
year | hours
--------------------
2017 | <somevalue>
2018 | <somevalue>
2019 | <somevalue>
How can I include the other two date fields too?
You want to consider each row once for each of its years. Use UNION to get these years:
select year, round(sum(total_hours), 2) as hours
from
(
select year(task_assigned) as year, total_hours from task
union
select year(task_deadline) as year, total_hours from task
union
select year(task_completion) as year, total_hours from task
) years_and_hours
group by year
having year between $year_from and $year_to
order by year;
If a row that has the same year in two or three of its date columns should be counted that many times in the sum, change UNION to UNION ALL.
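The difference between the two operators is easy to check on the question's sample rows. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL, so strftime('%Y', ...) replaces YEAR() (an assumption for runnability). It also adds a hypothetical id column that the question's table does not show: without some row identifier, UNION would also merge two different tasks that happen to share the same year and hours.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task (id INTEGER, total_hours INTEGER, "
    "task_assigned TEXT, task_deadline TEXT, task_completion TEXT)"
)
# Sample rows from the question; '-' is assumed to mean NULL.
conn.executemany(
    "INSERT INTO task VALUES (?, ?, ?, ?, ?)",
    [
        (1, 123, "2019-08-01", None, None),
        (2, 234, None, "2018-08-01", "2019-08-01"),
        (3, 145, "2017-08-01", "2017-08-01", "2018-01-01"),
    ],
)

def hours_by_year(conn, union_op):
    """union_op is the SQL keyword 'UNION' or 'UNION ALL' (a keyword, never user input)."""
    sql = f"""
    SELECT year, SUM(total_hours)
    FROM (
        SELECT id, CAST(strftime('%Y', task_assigned) AS INTEGER) AS year, total_hours FROM task
        {union_op}
        SELECT id, CAST(strftime('%Y', task_deadline) AS INTEGER), total_hours FROM task
        {union_op}
        SELECT id, CAST(strftime('%Y', task_completion) AS INTEGER), total_hours FROM task
    ) AS years_and_hours
    WHERE year IS NOT NULL
    GROUP BY year
    ORDER BY year
    """
    return conn.execute(sql).fetchall()

once_per_year = hours_by_year(conn, "UNION")        # task 3 counted once for 2017
every_date = hours_by_year(conn, "UNION ALL")       # task 3 counted twice for 2017
print(once_per_year)  # [(2017, 145), (2018, 379), (2019, 357)]
print(every_date)     # [(2017, 290), (2018, 379), (2019, 357)]
```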
Basically, you want to unpivot the data. I will assume that the - represents a NULL value and your dates are real dates.
select year(dte) as year, sum(total_hours) as hours
from ((select task_assigned as dte, total_hours
from task
) union all
(select task_deadline, total_hours
from task
) union all
(select task_completion, total_hours
from task
)
) d
where dte is not null
group by year(dte)
order by year(dte);
Based on your sample data, the round() is not necessary so I removed it.
If you want to filter for particular years, the filtering should be in a where clause -- so it filters the data before aggregation.
Change the where to:
where year(dte) >= ? and year(dte) <= ?
or:
where dte >= ? and dte <= ?
to pass in the dates.
The ? are for parameter placeholders. Learn how to use parameters rather than munging query strings.
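As a rough illustration of the placeholder advice, here is how binding might look from Python. The sketch uses the built-in sqlite3 module, which accepts ? placeholders; the common MySQL drivers for Python (mysql-connector-python, PyMySQL) use %s instead. SQLite stands in for MySQL here, so strftime replaces YEAR(), and the function name hours_between is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task (total_hours INTEGER, "
    "task_assigned TEXT, task_deadline TEXT, task_completion TEXT)"
)
# Sample rows from the question; '-' is assumed to mean NULL.
conn.executemany(
    "INSERT INTO task VALUES (?, ?, ?, ?)",
    [
        (123, "2019-08-01", None, None),
        (234, None, "2018-08-01", "2019-08-01"),
        (145, "2017-08-01", "2017-08-01", "2018-01-01"),
    ],
)

def hours_between(conn, year_from, year_to):
    """Sum hours per year for dates in [year_from-01-01, (year_to + 1)-01-01)."""
    sql = """
    SELECT CAST(strftime('%Y', dte) AS INTEGER) AS year, SUM(total_hours) AS hours
    FROM (
        SELECT task_assigned AS dte, total_hours FROM task
        UNION ALL
        SELECT task_deadline, total_hours FROM task
        UNION ALL
        SELECT task_completion, total_hours FROM task
    ) AS d
    WHERE dte >= ? AND dte < ?   -- values are bound by the driver, never spliced in
    GROUP BY year
    ORDER BY year
    """
    params = (f"{year_from:04d}-01-01", f"{year_to + 1:04d}-01-01")
    return conn.execute(sql, params).fetchall()

result = hours_between(conn, 2018, 2019)
print(result)  # [(2018, 379), (2019, 357)]
```

The half-open upper bound (strictly less than the first day of the following year) is a design choice in this sketch; it avoids having to spell out the last instant of the final year. Rows whose date is NULL drop out because the comparison is never true for them.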
This answer is no longer valid with the updated request.
If I understand correctly, you want to use task_assigned if the task_completion is still null. Use COALESCE for this.
SELECT
YEAR(COALESCE(task_completion, task_assigned)) as year,
ROUND(SUM(total_hours), 2) as hours
FROM task
GROUP BY year
HAVING year BETWEEN $year_from AND $year_to
ORDER BY year;
(I don't think you actually want to use task_deadline, too, for how could a task get completed before getting assigned first? If such can occur, then include it in the COALESCE expression. Probably: COALESCE(task_completion, task_assigned, task_deadline) then.)

Check if instances have occurred minimum once, every year in a specific range

In MySQL I'm tasked with a big dataset, with data from 1970 to 2010.
I want to check for consistency: check whether each instance occurs at least once per year. I took a snippet from 1970-1972 as an example to demonstrate my problem.
input:
id year counts
-- ---- ---------
1 1970 1
1 1971 1
2 1970 3
2 1971 8
2 1972 1
3 1970 4
expected:
id 1970-1972
-- ----------
1 no
2 yes
3 no
I thought about counting the rows within the date range and then keeping only the ids that had 3 counts: 1970, 1971, 1972. The following query doesn't enforce the check on each year in the range, though.
select id, count(*)
from table1
WHERE (year BETWEEN '1970' AND '1972') AND `no_counts` >= 1
group by id
What to do?
You can use GROUP BY with CASE / inline if.
Using CASE:
select id, CASE WHEN COUNT(distinct year) = 3 THEN 'yes' ELSE 'no' END "1970-72"
from abc
WHERE year between 1970 and 1972
GROUP BY id
Using inline IF:
select id, IF(COUNT(distinct year) = 3, 'yes', 'no') "1970-72"
from abc
WHERE year between 1970 and 1972
GROUP BY id
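To check the CASE approach against the question's sample data, here is a small sketch using Python's built-in sqlite3 as a stand-in for MySQL (an assumption for runnability; the CASE and COUNT(DISTINCT ...) syntax used here is the same in both). Rather than hard-coding 3, the hypothetical helper coverage derives the expected number of years from the range, which may be handy for the full 1970 to 2010 dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, year INTEGER, counts INTEGER)")
# Sample rows from the question.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 1970, 1), (1, 1971, 1),
     (2, 1970, 3), (2, 1971, 8), (2, 1972, 1),
     (3, 1970, 4)],
)

def coverage(conn, first_year, last_year):
    """Return (id, 'yes'/'no') depending on whether the id appears in every year."""
    expected_years = last_year - first_year + 1
    sql = """
    SELECT id,
           CASE WHEN COUNT(DISTINCT year) = ? THEN 'yes' ELSE 'no' END
    FROM table1
    WHERE year BETWEEN ? AND ? AND counts >= 1
    GROUP BY id
    ORDER BY id
    """
    return conn.execute(sql, (expected_years, first_year, last_year)).fetchall()

result = coverage(conn, 1970, 1972)
print(result)  # [(1, 'no'), (2, 'yes'), (3, 'no')]
```

One caveat: an id with no rows at all inside the range is filtered out by the WHERE clause, so it will not show up with 'no'.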
You can use a having clause with distinct count:
select `id`
from `table1`
where `year` between '1970' and '1972'
group by id
having count(distinct `year`) = 3
Is this what you expect?
select id, count(*)
from table1
WHERE (year BETWEEN '1970' AND '1972')
group by id
having count(distinct year) = 3

Need to sum transaction totals from one table using customer information in another

I have spent the last hour looking for something I can use to implement here, but haven't found exactly what I need.
I have 2 tables: TRANSACTIONS & CUSTOMERS
CUSTOMER
internal_id | name | email
TRANSACTIONS
internal_id | customer_id | transaction_date | total_amount
I would like to cycle through all CUSTOMERS, then sum up the total TRANSACTIONS for each by month and year. I thought it would be as easy as just adding select statements as columns to the initial query, but that isn't working obviously:
NOT WORKING:
select customer.internal_id,
(sum(total_amount) as 'total' from TRANSACTIONS where transactions.customer_id = customer.internal_id and transaction_date >= DATE_SUB(NOW(),INTERVAL 1 month)),
(sum(total_amount) as 'total' from TRANSACTIONS where transactions.customer_id = customer.internal_id and transaction_date >= DATE_SUB(NOW(),INTERVAL 1 year))
from CUSTOMER join TRANSACTIONS on CUSTOMER.internal_id = TRANSACTIONS.customer_id
Basically I would like the output to look like this:
CUSTOMER.name | TRANSACTIONS.total_amount_month | TRANSACTIONS.total_amount_year
ABC Company | $335.00 | $8900.34
Is this possible with a single query? I have it implemented with multiple queries using PHP and would just prefer a single query if possible for performance sake.
Thanks!
SELECT c.name,
SUM(IF(transaction_date >= DATE_SUB(NOW(), INTERVAL 1 MONTH), total_amount, 0)) AS total_amount_month,
SUM(total_amount) AS total_amount_year
FROM transactions AS t
JOIN customer AS c ON c.internal_id = t.customer_id
WHERE transaction_date >= DATE_SUB(NOW(), INTERVAL 1 YEAR)
GROUP BY t.customer_id
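The idea in the query above is conditional aggregation: filter to the wider window (one year) in WHERE, then sum the narrower window (one month) with a condition inside SUM. Below is a minimal sketch of the same pattern using Python's built-in sqlite3 as a stand-in for MySQL. SQLite has no IF() or DATE_SUB(), so CASE and date(..., '-1 month') are used instead, and the reference date is passed in as a parameter rather than taken from NOW() so the result is reproducible. The transaction rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (internal_id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute(
    "CREATE TABLE transactions (internal_id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "transaction_date TEXT, total_amount REAL)"
)
conn.execute("INSERT INTO customer VALUES (1, 'ABC Company', 'abc@example.com')")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2024-03-01", 335.0),   # inside the last month
        (2, 1, "2023-06-10", 8565.5),  # inside the last year only
        (3, 1, "2022-01-01", 100.0),   # older than a year, filtered out by WHERE
    ],
)

sql = """
SELECT c.name,
       SUM(CASE WHEN t.transaction_date >= date(:now, '-1 month')
                THEN t.total_amount ELSE 0 END) AS total_amount_month,
       SUM(t.total_amount) AS total_amount_year
FROM transactions AS t
JOIN customer AS c ON c.internal_id = t.customer_id
WHERE t.transaction_date >= date(:now, '-1 year')
GROUP BY c.internal_id, c.name
"""
totals = conn.execute(sql, {"now": "2024-03-15"}).fetchall()
print(totals)  # [('ABC Company', 335.0, 8900.5)]
```

Because this is an inner join, customers with no transactions in the last year do not appear at all; a LEFT JOIN from customer would be needed to list them with zero totals.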

MySQL calculate gain, loss and net gain over a period of time

I have a table something like this:
id | Customer | date
-----------------------------------------
1 | Customer2 | 2013-08-01 00:00:00
-----------------------------------------
2 | Customer1 | 2013-07-15 00:00:00
-----------------------------------------
3 | Customer1 | 2013-07-01 00:00:00
-----------------------------------------
. | ... | ...
-----------------------------------------
n | CustomerN | 2012-03-01 00:00:00
I want to calculate the "gained" customers for each month, the "lost" customers for each month and the Net Gain for each month, even if done in separate tables / views.
How can I do that?
EDIT
Ok, let me demonstrate what I've done so far.
To select Gained customers for any month, I've tried to select customers from Bookings table where the following not exist:
select Customer
from Bookings
where not exists
(select Customer
from Bookings
where
(Bookings.date BETWEEN
DATE_FORMAT(DATE_SUB(Bookings.date, INTERVAL 1 MONTH), '%Y-%m-01 00:00:00')
AND DATE_FORMAT(Bookings.date, '%Y-%m-01 00:00:00'
)
) AND Bookings.date >= STR_TO_DATE('2010-11-01 00:00:00', '%Y-%m-%d 00:00:00'))
This supposedly gets the customers that existed in the "selected" month but not in the previous one. "2010-11-01" is the date of the start of bookings + 1 month.
To select Lost customers for any month, I've tried to select customers from Bookings table where the following not exist:
select Customer
from Booking
where not exists
(select Customer
from Bookings
where
(Bookings.date BETWEEN
DATE_FORMAT(Bookings.date, '%Y-%m-01 00:00:00')
AND Bookings.date
)
AND Bookings.date >= STR_TO_DATE('2010-11-01 00:00:00', '%Y-%m-%d 00:00:00'
)
)
This supposedly gets the customers that existed in a previous month but not in the "selected" one.
For the "Loss" SQL query I got empty result! For the "Gain" I got thousands of rows but not sure if that's accurate.
You can use COUNT DISTINCT to count your customers, and WHERE YEAR(Date) = [year] AND MONTH(Date) = [month] to get the month.
The total number of customers in Sept 2013:
SELECT COUNT(DISTINCT Customer) AS MonthTotalCustomers FROM Bookings
WHERE YEAR(date) = 2013 AND MONTH(date) = 9
The customers gained in Sept 2013:
SELECT COUNT(DISTINCT Customer) AS MonthGainedCustomers FROM Bookings
WHERE YEAR(date) = 2013 AND MONTH(date) = 9
AND Customer NOT IN
(SELECT Customer FROM Bookings
WHERE date < '2013-09-01')
Figuring out the lost customers is more difficult. I would need to know by what criteria you consider them to be 'lost.' If you just mean that they were around in August 2013 but they were not around in September 2013:
SELECT COUNT(DISTINCT Customer) AS MonthLostCustomers FROM Bookings
WHERE YEAR(date) = 2013 AND MONTH(date) = 8
AND Customer NOT IN
(SELECT Customer FROM Bookings
WHERE YEAR(date) = 2013 AND MONTH(date) = 9)
I hope from these examples you can extrapolate what you're looking for.
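The same definitions can be written down with sets, which may make the logic easier to check before committing to SQL. This is only a sketch in plain Python under the definitions used above: "gained" means seen in the month but never before it, and "lost" means seen in the previous month but not in this one. The function names and the Customer3 row are made up.

```python
from collections import defaultdict
from datetime import datetime

def monthly_customers(bookings):
    """Group customer names by (year, month) of their booking date."""
    by_month = defaultdict(set)
    for customer, when in bookings:
        by_month[(when.year, when.month)].add(customer)
    return by_month

def churn(by_month, year, month):
    """Return (gained, lost, net) customer counts for the given month."""
    previous_key = (year - 1, 12) if month == 1 else (year, month - 1)
    current = by_month.get((year, month), set())
    previous = by_month.get(previous_key, set())
    # Everyone seen in any month strictly before the selected one.
    earlier = set().union(*(c for key, c in by_month.items() if key < (year, month)))
    gained = current - earlier
    lost = previous - current
    return len(gained), len(lost), len(gained) - len(lost)

# First three rows follow the question's sample; Customer3 is invented.
bookings = [
    ("Customer2", datetime(2013, 8, 1)),
    ("Customer1", datetime(2013, 7, 15)),
    ("Customer1", datetime(2013, 7, 1)),
    ("Customer3", datetime(2013, 8, 20)),
]

by_month = monthly_customers(bookings)
print(churn(by_month, 2013, 8))  # (2, 1, 1)
```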

MySQL query help with grouping and adding

I have a table called user_logins which tracks user logins into the system. It has three columns, login_id, user_id, and login_time
login_id(INT) | user_id(INT) | login_time(TIMESTAMP)
------------------------------------------------------
1 | 4 | 2010-8-14 08:54:36
1 | 9 | 2010-8-16 08:56:36
1 | 9 | 2010-8-16 08:59:19
1 | 3 | 2010-8-16 09:00:24
1 | 1 | 2010-8-16 09:01:24
I am looking to write a query that will determine the number of unique logins for each day that has a login, only for the past 30 days from the current date. The output should look like this:
logins(INT) | login_date(DATE)
---------------------------
1 | 2010-8-14
3 | 2010-8-16
In the result table, 2010-8-16 has only 3 because user_id 9 logged in twice that day, and a user logging into the system counts as only 1 login for that day. I am only looking for unique logins for a particular day. Remember, I only want the past 30 days, so it's like a snapshot of the last month of user logins for the system.
I have attempted to create the query with little success. What I have so far is this:
SELECT
DATE(login_time) as login_date,
COUNT(login_time) as logins
FROM
user_logins
WHERE
login_time > (SELECT DATE(SUBDATE(NOW())-1)) FROM DUAL)
AND
login_time < LAST_DAY(NOW())
GROUP BY FLOOR(login_time/86400)
I know this is wrong: it returns all logins starting from the beginning of the current month and doesn't group them correctly. Some direction on how to do this would be greatly appreciated. Thank you.
You need to use COUNT(DISTINCT ...):
SELECT
DATE(login_time) AS login_date,
COUNT(DISTINCT user_id) AS logins
FROM user_logins
WHERE login_time > NOW() - interval 30 day
GROUP BY DATE(login_time)
I was a little unsure what you wanted for your WHERE clause because your question seems to contradict itself. You may need to modify the WHERE clause depending on what you want.
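For what it's worth, here is a quick check of the COUNT(DISTINCT ...) approach on the question's sample rows, counting distinct user_id values per day. It uses Python's built-in sqlite3 as a stand-in for MySQL (an assumption for runnability), so date() and datetime(..., '-30 days') replace DATE() and NOW() - INTERVAL 30 DAY. The login_id values and the fixed "now" are made up so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_logins (login_id INTEGER, user_id INTEGER, login_time TEXT)")
conn.executemany(
    "INSERT INTO user_logins VALUES (?, ?, ?)",
    [
        (1, 4, "2010-08-14 08:54:36"),
        (2, 9, "2010-08-16 08:56:36"),
        (3, 9, "2010-08-16 08:59:19"),  # same user, same day: counted once
        (4, 3, "2010-08-16 09:00:24"),
        (5, 1, "2010-08-16 09:01:24"),
    ],
)

sql = """
SELECT date(login_time) AS login_date,
       COUNT(DISTINCT user_id) AS logins
FROM user_logins
WHERE login_time > datetime(?, '-30 days')
GROUP BY date(login_time)
ORDER BY login_date
"""
# A fixed reference time is bound instead of calling NOW().
daily = conn.execute(sql, ("2010-08-20 00:00:00",)).fetchall()
print(daily)  # [('2010-08-14', 1), ('2010-08-16', 3)]
```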
As Mark suggests, you can use COUNT(DISTINCT ...).
Alternatively:
SELECT login_day, COUNT(*)
FROM (
SELECT DATE_FORMAT(login_time, '%D %M %Y') AS login_day,
user_id
FROM user_logins
WHERE login_time>DATE_SUB(NOW(), INTERVAL 1 MONTH)
GROUP BY DATE_FORMAT(login_time, '%D %M %Y'),
user_id
) AS daily_logins
GROUP BY login_day