mysql - query database for expenses and debts - mysql

With respect to the sample table below, keep the following definitions in mind:
start, end and timestamp are all Unix timestamps.
Definition: duration = (end - start) / 3600, that is, in hours.
I would like a MySQL query that does the following:
Group by student and calculate all the money spent by each student, that is, duration x cost.
This is what I have so far. It works, but it is incomplete:
SELECT student, SUM(ceil(cost*(end-start)/3600)) AS expenses
FROM schedules GROUP BY student;
And this one does not work, but it shows what I am actually trying to achieve:
SELECT student, SUM(SELECT ceil(cost*(end-start)/3600) FROM schedules WHERE paid = 1) AS expenses, SUM(SELECT ceil(cost*(end-start)/3600) FROM schedules WHERE paid = 0) AS debts FROM schedules GROUP BY student;
My biggest problem is calculating expenses only for sessions that have already started (from today back into the past), and counting a session as a debt when today is later than start and paid is still 0.
Thank you all for your ideas!
Sample Table
id  meta_id  start       end         admin  student   tutor    course  cost  paid  paydate     timestamp
18  4        1359867600  1359867690  jnc    banjune   cameron  2       90    1     1361521193  1359881165
19  4        1360472400  1360472490  jnc    banjune   cameron  2       90    1     1361521195  1359881165
20  4        1359867600  1359867690  jnc    saadcore  cameron  2       90    1     1361547064  1359881165
25  6        1359914400  1359919800  jnc    johndoe   cameron  3       35    1     1361547080  1359893058
26  6        1360000800  1360006200  jnc    johndoe   cameron  3       35    0     0           1359893058
27  6        1360087200  1360092600  jnc    johndoe   cameron  3       35    0     0           1359893058

I got the desired solution:
SELECT
    student,
    SUM(CASE WHEN paid = 1 AND FROM_UNIXTIME(start) <= now() THEN ceil(cost*(end-start)/3600)
        ELSE 0 END) as expenses,
    SUM(CASE WHEN paid = 0 AND FROM_UNIXTIME(start) <= now() THEN ceil(cost*(end-start)/3600)
        ELSE 0 END) as debts
FROM schedules
GROUP BY student;
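As a cross-check of the conditional aggregation above, here is a minimal Python sketch that recomputes expenses and debts from the sample rows. It is not part of the original answer: the function name `expenses_and_debts` and the `now` parameter are assumptions made for illustration, and integer ceiling division stands in for MySQL's ceil().

```python
from collections import defaultdict

# (student, start, end, cost, paid) taken from the sample table above.
ROWS = [
    ("banjune", 1359867600, 1359867690, 90, 1),
    ("banjune", 1360472400, 1360472490, 90, 1),
    ("saadcore", 1359867600, 1359867690, 90, 1),
    ("johndoe", 1359914400, 1359919800, 35, 1),
    ("johndoe", 1360000800, 1360006200, 35, 0),
    ("johndoe", 1360087200, 1360092600, 35, 0),
]


def expenses_and_debts(rows, now):
    """Return {student: [expenses, debts]} for sessions that started by `now`."""
    totals = defaultdict(lambda: [0, 0])
    for student, start, end, cost, paid in rows:
        entry = totals[student]  # every student gets a row, as with GROUP BY
        if start > now:
            continue  # future sessions count as neither expense nor debt
        # Ceiling of cost * seconds / 3600 without floating point.
        amount = -(-cost * (end - start) // 3600)
        entry[0 if paid == 1 else 1] += amount
    return dict(totals)


# `now` is a hypothetical "current time" later than every sample start.
result = expenses_and_debts(ROWS, now=1361600000)
for student, (expenses, debts) in sorted(result.items()):
    print(student, expenses, debts)
```

Passing `now` explicitly instead of reading the clock keeps the sketch repeatable; an earlier value shows how sessions that have not started yet drop out of both sums.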

Related

SQL subquery in SELECT clause

I'm trying to find admin activity within the last 30 days.
The accounts table stores the user data (username, password, etc.)
At the end of each day, if a user had logged in, it will create a new entry in the player_history table with their updated data. This is so we can track progress over time.
accounts table:
id  username  admin
1   Michael   4
2   Steve     3
3   Louise    3
4   Joe       0
5   Amy       1
player_history table:
id  user_id  created_at  playtime
0   1        2021-04-03  10
1   2        2021-04-04  10
2   3        2021-04-05  15
3   4        2021-04-10  20
4   5        2021-04-11  20
5   1        2021-05-12  40
6   2        2021-05-13  55
7   3        2021-05-17  65
8   4        2021-05-19  75
9   5        2021-05-23  30
10  1        2021-06-01  60
11  2        2021-06-02  65
12  3        2021-06-02  67
13  4        2021-06-03  90
The following query
SELECT a.`username`, SEC_TO_TIME((MAX(h.`playtime`) - MIN(h.`playtime`))*60) as 'time' FROM `player_history` h, `accounts` a WHERE h.`created_at` > '2021-05-06' AND h.`user_id` = a.`id` AND a.`admin` > 0 GROUP BY h.`user_id`
Outputs this table:
Note that this is just admin activity, so Joe is not included in this data.
From 2021-05-06 to present (yyyy-mm-dd):
username  time
Michael   00:20:00
Steve     00:10:00
Louise    00:02:00
Amy       00:00:00
As you can see from this data, Amy's time is shown as 0 although she has played for 10 minutes in the last month. This is because she has only one entry on or after 2021-05-06, so there is nothing to compare it to: MAX and MIN come from the same row, and 30 - 30 = 0.
Another flaw is that it doesn't include all activity in the last month. It only subtracts the lowest value in the window from the highest, so any playtime accrued between the last entry before the window and the first entry inside it is lost.
So I tried fixing this by comparing the highest value on or after 2021-05-06 to the user's most recent entry before that date. I modified the query a bit:
SELECT a.`Username`, SEC_TO_TIME((MAX(h.`playtime`) - (SELECT MAX(`playtime`) FROM `player_history` WHERE a.`id` = `user_id` AND `created_at` < '2021-05-06'))*60) as 'Time' FROM `player_history` h, `accounts` a WHERE h.`created_at` >= '2021-05-06' AND h.`user_id` = a.`id` AND a.`admin` > 0 GROUP BY h.`user_id`
So now it will output:
username  time
Michael   00:50:00
Steve     00:50:00
Louise    00:52:00
Amy       00:10:00
But I feel like this whole query is quite inefficient. Is there a better way to do this?
I think you want LAG():
SELECT a.username,
       SEC_TO_TIME(SUM(h.playtime - COALESCE(h.prev_playtime, 0)) * 60) as time
FROM accounts a JOIN
     (SELECT h.*,
             LAG(h.playtime) OVER (PARTITION BY h.user_id ORDER BY h.created_at) as prev_playtime
      FROM player_history h
     ) h
     ON h.user_id = a.id
WHERE h.created_at > '2021-05-06' AND
      a.admin > 0
GROUP BY a.username;
In addition to the LAG() logic, note the other changes to the query:
The use of proper, explicit, standard, readable JOIN syntax.
The use of consistent columns for the SELECT and GROUP BY.
The removal of single quotes around the column alias.
The removal of backticks; they just clutter the query, making it harder to write and to read.
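To make the LAG() logic concrete, here is a minimal Python sketch that applies the same idea to the sample tables: for each admin, sum the increase between every snapshot after the cutoff and the snapshot before it. The function name `admin_minutes_since` is an assumption for illustration, and the result is left in minutes rather than converted with SEC_TO_TIME.

```python
from collections import defaultdict
from datetime import date

# (id, username, admin) from the accounts table.
ACCOUNTS = [(1, "Michael", 4), (2, "Steve", 3), (3, "Louise", 3), (4, "Joe", 0), (5, "Amy", 1)]
# (user_id, created_at, playtime in minutes) from the player_history table.
HISTORY = [
    (1, date(2021, 4, 3), 10), (2, date(2021, 4, 4), 10), (3, date(2021, 4, 5), 15),
    (4, date(2021, 4, 10), 20), (5, date(2021, 4, 11), 20), (1, date(2021, 5, 12), 40),
    (2, date(2021, 5, 13), 55), (3, date(2021, 5, 17), 65), (4, date(2021, 5, 19), 75),
    (5, date(2021, 5, 23), 30), (1, date(2021, 6, 1), 60), (2, date(2021, 6, 2), 65),
    (3, date(2021, 6, 2), 67), (4, date(2021, 6, 3), 90),
]


def admin_minutes_since(accounts, history, cutoff):
    """Sum per-snapshot playtime increases recorded after `cutoff`, admins only."""
    by_user = defaultdict(list)
    for user_id, created_at, playtime in history:
        by_user[user_id].append((created_at, playtime))

    minutes = {}
    for user_id, username, admin in accounts:
        if admin <= 0:
            continue
        total, seen = 0, False
        prev = 0  # mirrors COALESCE(prev_playtime, 0) for a user's first row
        for created_at, playtime in sorted(by_user[user_id]):
            if created_at > cutoff:
                total += playtime - prev
                seen = True
            prev = playtime  # becomes the LAG value for the next row
        if seen:  # an inner join plus WHERE drops users with no rows in the window
            minutes[username] = total
    return minutes


result = admin_minutes_since(ACCOUNTS, HISTORY, date(2021, 5, 6))
for name, m in result.items():
    print(name, f"{m // 60:02d}:{m % 60:02d}:00")
```

Because the previous snapshot is looked up over the full history before the date filter is applied, the first snapshot inside the window is compared against the last one outside it, which is what the MAX minus MIN version misses.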

The sum of Opening and Closing Balances for each Financial Year Grouped by Category in MYSQL

I have spent three days trying to figure this out, to no avail. I also searched, including on this forum, without success.
It might look like a duplicate question, but it is different from the others that have been asked.
My question is: how do I get the opening balance (balance carried forward, C/F) and the closing balance for each financial year (that is, for each date range), grouped by loan_id?
transaction_t
Na  loan_id  date        credit_amount  debit_amount
1   1        2017-01-01  5,000          4,000
2   1        2017-05-01  6,000          2,000
3   2        2017-10-01  1,000          1,500
4   1        2018-10-30  2,000          400
5   2        2018-11-01  1,200          1,000
6   2        2019-01-15  1,800          500
7   1        2019-05-21  100            200
The table schema and data above are available in a MySQL fiddle. I have also read the thread "MySQL Open Balance Credit Debit Balance", but that solution only works for a single user.
So far I have tried:
SELECT loan_id,
SUM(credit)-(SELECT SUM(a.debit) FROM transaction_t a
WHERE a.transaction_date
BETWEEN '2019-01-01' AND '2020-12-31' AND a.loan_id = loan_id) AS OpeningBalance,
sum(credit),sum(debit),
(@OpeningBalance+SUM(credit))-SUM(debit) AS closing_balance
FROM transaction_t tr
WHERE transaction_date BETWEEN DATE('2019-01-01') AND DATE('2020-12-31')
GROUP BY loan_id
But the above is not giving correct results. How do I get results like these?
A: Query made for date between 2017-01-01 and 2018-12-31
loan_id  opening_balance  sum(credit_amount)  sum(debit_amount)  closing_balance
1        0                13,000.00           6,400.00           6,600.00
2        0                2,200.00            2,500.00           -300
B: Query made for date between 2019-01-01 and 2020-12-31
loan_id  opening_balance  sum(credit_amount)  sum(debit_amount)  closing_balance
1        6,600            100.00              200.00             6,500.00
2        -300             1,800.00            500.00             1,000
You are looking for conditional aggregation.
The key thing is that you need to start scanning the table from the beginning of the history in order to generate the initial balance. Then you just need to adjust the conditional sums:
Consider:
SET @start_date = '2017-01-01';
SET @end_date = '2018-12-31';
SELECT
    loan_id,
    SUM(CASE WHEN transaction_date < @start_date THEN credit - debit ELSE 0 END) opening_balance,
    SUM(CASE WHEN transaction_date BETWEEN @start_date AND @end_date THEN credit ELSE 0 END) sum_credit,
    SUM(CASE WHEN transaction_date BETWEEN @start_date AND @end_date THEN debit ELSE 0 END) sum_debit,
    SUM(CASE WHEN transaction_date <= @end_date THEN credit - debit ELSE 0 END) closing_balance
FROM transaction_t
WHERE transaction_date <= @end_date
GROUP BY loan_id
In your DB Fiddle, this returns:
loan_id opening_balance sum_credit sum_debit closing_balance
1 0 13000 6400 6600
2 0 2200 2500 -300
And when changing the dates to 2019-01-01 and 2020-12-31:
loan_id opening_balance sum_credit sum_debit closing_balance
1 6600 100 200 6500
2 -300 1800 500 1000
NB: that was a well-asked question, that SO could use more of!
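The conditional sums can be checked by hand with a small Python sketch. This is only an illustration under stated assumptions: the function name `balances` is made up, the column names follow the answer's query (transaction_date, credit, debit), and the credit on 2018-11-01 is taken to be 1,200, the value consistent with the expected totals.

```python
from collections import defaultdict
from datetime import date

# (loan_id, transaction_date, credit, debit) from the question's sample table.
TRANSACTIONS = [
    (1, date(2017, 1, 1), 5000, 4000),
    (1, date(2017, 5, 1), 6000, 2000),
    (2, date(2017, 10, 1), 1000, 1500),
    (1, date(2018, 10, 30), 2000, 400),
    (2, date(2018, 11, 1), 1200, 1000),
    (2, date(2019, 1, 15), 1800, 500),
    (1, date(2019, 5, 21), 100, 200),
]


def balances(transactions, start, end):
    """Return {loan_id: (opening, sum_credit, sum_debit, closing)} for [start, end]."""
    acc = defaultdict(lambda: [0, 0, 0])  # opening, credit in range, debit in range
    for loan_id, day, credit, debit in transactions:
        if day > end:
            continue  # same role as WHERE transaction_date <= end
        row = acc[loan_id]
        if day < start:
            row[0] += credit - debit  # history before the range feeds the opening balance
        else:
            row[1] += credit
            row[2] += debit
    return {k: (o, c, d, o + c - d) for k, (o, c, d) in acc.items()}


first = balances(TRANSACTIONS, date(2017, 1, 1), date(2018, 12, 31))
second = balances(TRANSACTIONS, date(2019, 1, 1), date(2020, 12, 31))
print(first)
print(second)
```

The closing balance is derived as opening + credit - debit, which is equivalent to the query's fourth CASE expression because every row up to the end date falls into exactly one of the two branches.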

Compare all rooms to all other rooms (Cartesian product)

I have attendance data that is stored like this:
Building | Room | Date | Morning | Evening
------------------------------------------
BuildA A1 1 10 15
BuildA A1 2 20 35
BuildA A1 3 30 15
BuildA A2 1 60 30
BuildA A2 2 30 10
BuildA A2 3 40 20
BuildB B1 1 20 25
BuildB B1 2 10 35
BuildB B1 3 30 10
BuildB B2 1 15 25
BuildB B2 2 25 35
BuildB B2 3 25 15
I then need to see the difference in attendance for each time of day from the previous day for each room. The result would look like this:
Building | Room | Date | Morning | Evening | MorningDiff | EveningDiff
-----------------------------------------------------------------------
BuildA A1 1 10 15 0 0
BuildA A1 2 20 35 10 20
BuildA A1 3 30 15 10 -20
BuildA A2 1 60 30 0 0
BuildA A2 2 30 10 -30 -20
BuildA A2 3 40 20 10 10
BuildB B1 1 20 25 0 0
BuildB B1 2 10 35 -10 10
BuildB B1 3 30 10 20 -25
BuildB B2 1 15 25 0 0
BuildB B2 2 25 35 10 10
BuildB B2 3 25 15 0 -20
I was able to accomplish that with this query:
select t.*,
COALESCE((`morning` -
(select `morning`
from data t2
where t2.date < t.date
and t2.room = t.room
order by t2.date desc
limit 1 )) ,0)
as MorningDiff,
COALESCE((`evening` -
(select `evening`
from data t2
where t2.date < t.date
and t2.room = t.room
order by t2.date desc
limit 1 )) ,0)
as EveningDiff
from data t
order by room,date asc;
So now I have the difference in attendance. This is where it gets a little complicated now. Maybe first seeing what the final product I am after may clear it up:
Building1 | Room1 | TimeOfDay1 | Direction1 | Building2 | Room2 | TimeOfDay2 | Direction2 | OccuranceCount | Room1DirectionCount | Room2DirectionCount
-----------------------------------------------------------------------------------------------------------------------------------------------------
BuildA A1 Morning Up BuildA A2 Morning Up 1 2 1
BuildA A1 Morning Up BuildA A2 Morning Down 1 2 1
BuildA A1 Morning Up BuildA A2 Evening Up 1 2 1
.
.
.
The reason for getting the difference between dates is to see if the attendance increased or decreased from the previous day. We are not actually concerned with the actual number from the difference, we are just interested if it went up or it went down.
OccuranceCount field - If a room's attendance went up/down one day, we are trying to see whether another room's attendance went up/down the next day. This field counts how many times room2 went up/down one day and room1 went up/down the next day. Taking the first row as an example, it shows that room A1's morning attendance went up 1 time when room A2's morning attendance had gone up the previous day during the 3 day period.
Room1DirectionCount/Room2DirectionCount field - These fields simply show how many times each direction occurred for each room. So if, over a period of 100 days, room A1's attendance increased 60 times, the count would be 60.
Since I am comparing all the rooms to each other I have tried to do a cross join to form a cartesian product but I have been unable to figure out how to do the join properly so it references the other room's previous day.
I am not sure why this question was marked as a duplicate of a question regarding pivot tables? I don't believe this question is answered by that.
I'm not 100% certain I understand your question, and there isn't really enough sample data/expected output to be sure, but I think this query will give you the results you want. It uses a couple of CTEs: one to get the differences for each building/room/date/time-of-day combination, and a second to count those by direction (for the RoomDirectionCount columns). The main query then just counts grouped rows to get the OccurrenceCount column.
with atdiff AS (SELECT
building, room, date, 'Morning' AS time_of_day,
morning - lag(morning) over (partition by building, room order by date) AS diff
from attendance
UNION SELECT
building, room, date, 'Evening',
evening - lag(evening) over (partition by building, room order by date) diff
from attendance),
dircounts AS (SELECT
building, room, time_of_day, SIGN(diff) AS direction, COUNT(*) AS DirectionCount
FROM atdiff
GROUP BY building, room, time_of_day, direction)
select a1.building AS Building1,
a1.room AS Room1,
a1.time_of_day AS TimeOfDay1,
(CASE SIGN(a1.diff) WHEN 1 THEN 'Up' WHEN -1 THEN 'Down' ELSE 'Unchanged' END) AS Direction1,
a2.building AS Building2,
a2.room AS Room2,
a2.time_of_day AS TimeOfDay2,
(CASE SIGN(a2.diff) WHEN 1 THEN 'Up' WHEN -1 THEN 'Down' ELSE 'Unchanged' END) AS Direction2,
COUNT(*) AS OccurrenceCount,
MIN(d1.DirectionCount) AS Room1DirectionCount,
MIN(d2.DirectionCount) AS Room2DirectionCount
from atdiff a1
join atdiff a2 on a2.date = a1.date + 1 AND (a2.building != a1.building OR a2.room != a1.room)
JOIN dircounts d1 ON d1.building = a1.building AND d1.room = a1.room AND d1.time_of_day = a1.time_of_day AND d1.direction = SIGN(a1.diff)
JOIN dircounts d2 ON d2.building = a2.building AND d2.room = a2.room AND d2.time_of_day = a2.time_of_day AND d2.direction = SIGN(a2.diff)
where a1.diff is not NULL
group by Building1, Room1, TimeofDay1, Direction1, Building2, Room2, TimeOfDay2, Direction2
order by Building1, Room1, TimeofDay1 DESC, Direction1 DESC, Building2, Room2, TimeOfDay2 DESC, Direction2 DESC
The output is too long to include here but I've created a demo on dbfiddle. Alternate demo on dbfiddle.uk
Note that I've used a WHERE a1.diff IS NOT NULL clause to exclude results from the first day; you could instead put a COALESCE around the computation of diff in the atdiff CTE and drop that clause.
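The pairing logic of that query can be sketched in Python on the question's sample data. This is an illustration only, under these assumptions: the helper names are made up, each room's first day is skipped instead of producing a NULL diff, and the previous row per room is treated as the previous day (true for the sample, where dates are consecutive).

```python
from collections import Counter

# (building, room, date, morning, evening) from the question's sample data.
ATTENDANCE = [
    ("BuildA", "A1", 1, 10, 15), ("BuildA", "A1", 2, 20, 35), ("BuildA", "A1", 3, 30, 15),
    ("BuildA", "A2", 1, 60, 30), ("BuildA", "A2", 2, 30, 10), ("BuildA", "A2", 3, 40, 20),
    ("BuildB", "B1", 1, 20, 25), ("BuildB", "B1", 2, 10, 35), ("BuildB", "B1", 3, 30, 10),
    ("BuildB", "B2", 1, 15, 25), ("BuildB", "B2", 2, 25, 35), ("BuildB", "B2", 3, 25, 15),
]


def direction(diff):
    return "Up" if diff > 0 else "Down" if diff < 0 else "Unchanged"


def day_over_day(rows):
    """Yield (building, room, time_of_day, date, direction), skipping each room's first day."""
    previous = {}
    for building, room, day, morning, evening in sorted(rows):
        for time_of_day, value in (("Morning", morning), ("Evening", evening)):
            key = (building, room, time_of_day)
            if key in previous:
                yield key + (day, direction(value - previous[key]))
            previous[key] = value


changes = list(day_over_day(ATTENDANCE))
# How often each (building, room, time_of_day, direction) occurs overall.
direction_counts = Counter((b, r, t, d) for b, r, t, _, d in changes)
# Pair a change in one room with a change in a different room on the following day.
pair_counts = Counter(
    (b1, r1, t1, d1, b2, r2, t2, d2)
    for b1, r1, t1, day1, d1 in changes
    for b2, r2, t2, day2, d2 in changes
    if day2 == day1 + 1 and (b1, r1) != (b2, r2)
)

key = ("BuildA", "A1", "Morning", "Up", "BuildA", "A2", "Morning", "Up")
print(pair_counts[key], direction_counts[key[:4]], direction_counts[key[4:]])
```

With only three days of data there is a single usable day pair (day 2 followed by day 3), so every pair count is 1; the nested loop is the Cartesian product the question asks about, filtered the same way as the join condition.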
I am having a hard time figuring out the meaning of some of the columns in your second expected output. However, for what it's worth, here are some examples and demonstrations that might help you.
If you are using MySQL 8.0, you can use window functions to access rows that are related to the current row. The following query returns your first expected output (although where there is no previous date, NULL is returned instead of 0, to distinguish that case from one where the attendance is the same as the previous day):
select
a.*,
morning - lag(a.morning) over (partition by a.building, a.room order by a.date) morning_diff,
evening - lag(a.evening) over (partition by a.building, a.room order by a.date) evening_diff
from attendance a
order by a.building, a.room, a.date
See the db fiddle.
With older versions of mysql, you could use a self-LEFT JOIN to access the previous row :
select
a.*,
a.morning - a1.morning morning_diff,
a.evening - a1.evening evening_diff
from
attendance a
left join attendance a1
on a1.building = a.building and a1.room = a.room and a1.date = a.date - 1
order by a.building, a.room, a.date
See this MySQL 5.7 db fiddle.
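The self-join on date - 1 amounts to a keyed lookup of the previous day's row. A minimal Python sketch of that idea, using room A1 from the sample data (the function name `with_previous_day_diff` is an assumption for illustration):

```python
# (building, room, date, morning, evening): room A1 from the question's sample data.
ATTENDANCE = [
    ("BuildA", "A1", 1, 10, 15),
    ("BuildA", "A1", 2, 20, 35),
    ("BuildA", "A1", 3, 30, 15),
]


def with_previous_day_diff(rows):
    """Append (morning_diff, evening_diff) to each row; None when date - 1 is missing."""
    by_key = {(b, r, d): (m, e) for b, r, d, m, e in rows}
    out = []
    for b, r, d, m, e in sorted(rows):
        prev = by_key.get((b, r, d - 1))  # the LEFT JOIN: may find nothing
        if prev is None:
            out.append((b, r, d, m, e, None, None))
        else:
            out.append((b, r, d, m, e, m - prev[0], e - prev[1]))
    return out


result = with_previous_day_diff(ATTENDANCE)
for row in result:
    print(row)
```

Unlike LAG(), this lookup returns no match when a day is missing from the data, so a gap in the dates yields None rather than a difference against an older row.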
Once you have a query that returns the attendance differences, you can easily see if it went up or down in an outer query. Consider, for example :
select t.*,
case
when morning_diff is null then 'Unknown'
when morning_diff = 0 then 'Even'
when morning_diff > 0 then 'Up'
when morning_diff < 0 then 'Down'
end morning_direction,
case
when evening_diff is null then 'Unknown'
when evening_diff = 0 then 'Even'
when evening_diff > 0 then 'Up'
when evening_diff < 0 then 'Down'
end evening_direction
from (
select
a.*,
morning - lag(a.morning) over (partition by a.building, a.room order by a.date) morning_diff,
evening - lag(a.evening) over (partition by a.building, a.room order by a.date) evening_diff
from attendance a
) t
order by t.building, t.room, t.date;
See this db fiddle.

Amazon Redshift query to get delinquent amount and days past due at the end of month

Changing the question because of a misunderstanding in use case.
Amazon Redshift Query for the following problem statement.
The data structure:
id - primary key
acc_id - id unique to a loan account (this id will be the same for all EMIs of a particular loan account; it may be repeated 6 or 12 times, depending on whether the loan tenure is 6 or 12 months)
status - PAID or UNPAID (an unpaid EMI is followed only by unpaid EMIs)
s_id - a scheduling id, which is a run of consecutive numbers for a particular loan id
due_date - the due date for that particular EMI
principal - amount that is due
The table:
id acc_id status s_id due_date principal
9999957 10003 PAID 102 2018-07-02 12:00:00 4205
9999958 10003 UNPAID 103 2018-08-02 12:00:00 4100
9999959 10003 UNPAID 104 2018-09-02 12:00:00 4266
9999960 10003 UNPAID 105 2018-10-02 12:00:00 4286
9999962 10004 PAID 106 2018-07-02 12:00:00 3200
9999963 10004 PAID 107 2018-08-02 12:00:00 3100
9999964 10004 UNPAID 108 2018-09-02 12:00:00 3266
9999965 10004 UNPAID 109 2018-10-02 12:00:00 3286
The use case -
The unpaid amount becomes delinquent (overdue) after the due_date.
So I need to calculate the delinquent amount at the end of every month, from the first due_date (in this case 2nd July) to the last due_date (assume it to be 2nd November, which is the current month).
I also need to calculate days past due at the end of that month.
Illustration from the above data:
From the sample data provided, no EMI is due at the end of July so amount delinquent is 0
But at the end of August - the id 9999958 is due - as of 31st August
the amount delinquent is 4100 and days past due is 29 (31st August minus 2nd August)
The catch: I need to calculate this for the loan (acc_id) and not the emi.
To explain further: the first EMI will be 29 days past due at the end of the first month and 59 days past due at the end of the second month, while the second EMI will be 29 days past due at the end of the second month. But I need this at loan level (acc_id).
The same example continued for 30th september, the acc_id 10003 is due since 2nd August so as of 30th September the due amount is 8366 (4100 + 4266) and DPD (days_past_due) is 59 (29 + 30).
Also acc_id 10004 is due 3100 and DPD is 28 (30th september - 2nd september).
The final output would be something like this:
Month_End   DPD_Band  Amount
2018/08/31  0-29      4100
2018/08/31  30-59     0
2018/08/31  60-89     0
2018/08/31  90+       0
2018/09/30  0-29      3100
2018/09/30  30-59     8366
2018/09/30  60-89     0
2018/09/30  90+       0
Query attempt: DPD bands can be created based on case statements on delinquent days. I need real help in first creating End-of-months and then finding the portfolio level amounts as explained above for different delinquent days.
Edited to be Redshift compatible after the OP clarified which RDBMS is in use. (MySQL would need a different answer.)
The following creates one record for each month between your first record, and the end of last month.
It then joins on to your unpaid records, and the aggregation chooses which bracket to put the results in to.
WITH
first_month AS
(
SELECT LAST_DAY(MIN(due_date)) AS end_date FROM yourTable
),
months AS
(
SELECT
LAST_DAY(ADD_MONTHS(first_month.end_date, s.id)) AS end_date
FROM
first_month
CROSS JOIN
generate_series(
1,
DATEDIFF(month, (SELECT end_date FROM first_month), CURRENT_DATE)
)
AS s(id)
),
monthly_delinquents AS
(
SELECT
yourTable.*,
months.end_date AS month_end_date,
DATEDIFF(DAY, yourTable.due_date, months.end_date) AS days_past_due
FROM
months
LEFT JOIN
yourTable
ON yourTable.status = 'UNPAID'
AND yourTable.due_date < months.end_date
)
SELECT
month_end_date,
SUM(CASE WHEN days_past_due >= 00 AND days_past_due < 30 THEN principal ELSE 0 END) AS dpd_00_29,
SUM(CASE WHEN days_past_due >= 30 AND days_past_due < 60 THEN principal ELSE 0 END) AS dpd_30_59,
SUM(CASE WHEN days_past_due >= 60 AND days_past_due < 90 THEN principal ELSE 0 END) AS dpd_60_89,
SUM(CASE WHEN days_past_due >= 90 THEN principal ELSE 0 END) AS dpd_90plus
FROM
monthly_delinquents
GROUP BY
month_end_date
ORDER BY
month_end_date
That said, pivoting results like this is usually a bad idea. What happens when something is a year past due? It just sits in the 90plus category and never moves. And if you want to add a band, you need to change this query and every other query you ever write that depends on its columns.
Instead, you could normalise your output...
WITH
first_month AS
(
SELECT LAST_DAY(MIN(due_date)) AS end_date FROM yourTable
),
months AS
(
SELECT
LAST_DAY(ADD_MONTHS(first_month.end_date, s.id)) AS end_date
FROM
first_month
CROSS JOIN
generate_series(
1,
DATEDIFF(month, (SELECT end_date FROM first_month), CURRENT_DATE)
)
AS s(id)
),
monthly_delinquents AS
(
SELECT
yourTable.*,
months.end_date AS month_end_date,
DATEDIFF(DAY, yourTable.due_date, months.end_date) AS days_past_due
FROM
months
LEFT JOIN
yourTable
ON yourTable.status = 'UNPAID'
AND yourTable.due_date < months.end_date
)
SELECT
month_end_date,
(days_past_due / 30) * 30 AS days_past_due_band,
SUM(principal) AS total_principal,
COUNT(*) AS total_rows
FROM
monthly_delinquents
GROUP BY
month_end_date,
(days_past_due / 30) * 30
ORDER BY
month_end_date,
(days_past_due / 30) * 30
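To see what the normalised output looks like on the sample data, here is a minimal Python sketch of the same per-installment logic. It is an illustration under stated assumptions: the helper names are made up, the month range is passed in explicitly rather than generated with generate_series, and month ends with no overdue rows are simply omitted.

```python
import calendar
from collections import defaultdict
from datetime import date

# (acc_id, status, due_date, principal) from the question's sample table.
EMIS = [
    (10003, "PAID", date(2018, 7, 2), 4205),
    (10003, "UNPAID", date(2018, 8, 2), 4100),
    (10003, "UNPAID", date(2018, 9, 2), 4266),
    (10003, "UNPAID", date(2018, 10, 2), 4286),
    (10004, "PAID", date(2018, 7, 2), 3200),
    (10004, "PAID", date(2018, 8, 2), 3100),
    (10004, "UNPAID", date(2018, 9, 2), 3266),
    (10004, "UNPAID", date(2018, 10, 2), 3286),
]


def month_ends(first, last):
    """Yield the last day of every month from `first`'s month through `last`'s month."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, calendar.monthrange(year, month)[1])
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def delinquency_bands(emis, first, last, width=30):
    """Return {(month_end, band_start): total unpaid principal}."""
    totals = defaultdict(int)
    for month_end in month_ends(first, last):
        for _acc_id, status, due, principal in emis:
            if status == "UNPAID" and due < month_end:
                days_past_due = (month_end - due).days
                band_start = (days_past_due // width) * width  # 0, 30, 60, 90, ...
                totals[(month_end, band_start)] += principal
    return dict(totals)


bands = delinquency_bands(EMIS, date(2018, 7, 1), date(2018, 10, 31))
for (month_end, band_start), amount in sorted(bands.items()):
    print(month_end, band_start, amount)
```

Note that this buckets each installment by its own days past due, as the query does; it does not roll an account's whole overdue amount into the band of its oldest unpaid installment, which is what the loan-level figures in the question describe.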

complicated column calculation in mysql to get a count of monthly visits

I have two tables that look like this:
customer table
first last cust_id
John Doe 0
Jane Doe 1
ledger table
posted_date cust_id
2014-01-14 0
2014-01-20 0
2013-12-20 0
2013-12-20 1
2013-11-12 1
2013-11-10 1
I need to calculate the number of months, within the last 12 months, in which the customer posted at least one transaction; this is being called CustomerMonths. This means CustomerMonths for each cust_id would be between 0 and 12. So for this data I would want to see
cust_id CustomerMonths
0 2
1 2
This is because cust_id 0 was in at least once in Jan 2014 and at least once in Dec 2013.
Similarly, cust_id 1 was in at least once in Dec 2013 and at least once in Nov 2013.
For this example for cust_id 0:
2014-01-14,
2014-01-20 = 1 CustomerMonths
2013-12-20 = 1 CustomerMonths
so total CustomerMonths for last 12 months for cust_id 0 is 2.
I have this working for one month but am not sure how to get it to work for the last 12 months. I'd settle for it working for the last two months; I think I could figure out the rest. Here's what I have.
select distinct
c.cust_id,
(case when count(lJan.posted_date) = 0 then 0 else
case when count(lJan.posted_date) > 0 then 1 end end) as CustomerMonths
from `customer` c
left join `ledger` lJan on (lJan.cust_id = c.cust_id and lJan.posted_date between '2014-01-01' and '2014-01-31')
group by c.cust_id
You need to count distinct months, so use count(distinct). The question is what to use as the argument. Try this:
select c.cust_id,
       count(distinct year(l.posted_date) * 100 + month(l.posted_date)) as CustomerMonths
from customer c left join
     ledger l
     on l.cust_id = c.cust_id and
        l.posted_date between '2013-02-01' and '2014-01-31'
group by c.cust_id;
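The idea behind count(distinct year * 100 + month) is to collapse each date to a month key and count the unique keys per customer. A minimal Python sketch on the sample data (the function name `customer_months` is an assumption, and the window bounds are passed as parameters, here a 12-month window ending 2014-01-31):

```python
from datetime import date

# (posted_date, cust_id) from the ledger table; customer ids from the customer table.
LEDGER = [
    (date(2014, 1, 14), 0), (date(2014, 1, 20), 0), (date(2013, 12, 20), 0),
    (date(2013, 12, 20), 1), (date(2013, 11, 12), 1), (date(2013, 11, 10), 1),
]
CUSTOMERS = [0, 1]


def customer_months(customers, ledger, start, end):
    """Count distinct (year, month) pairs with a posting per customer within [start, end]."""
    # Start from the customer list so customers with no postings still get 0,
    # which is what the LEFT JOIN provides in the query.
    months = {cust_id: set() for cust_id in customers}
    for posted, cust_id in ledger:
        if cust_id in months and start <= posted <= end:
            months[cust_id].add((posted.year, posted.month))
    return {cust_id: len(seen) for cust_id, seen in months.items()}


result = customer_months(CUSTOMERS, LEDGER, date(2013, 2, 1), date(2014, 1, 31))
print(result)
```

A set of (year, month) tuples plays the role of the distinct month key; two postings in January 2014 add the same tuple twice and are counted once.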
Another way of writing the select:
select c.cust_id,
count(distinct date_format(l.posted_date, '%Y-%m')) as CustomerMonths