Count id for each day in a month - mysql

I have a MySQL database for a hospital with the columns id, entry_date, and exit_date (the last two are the patient's entry and exit dates).
I would like to count the number of patients in the hospital on each day of a given month.
Counting the ids for a single day is relatively simple (see the query below), but I do not know how to get the count for each day of an entire month.
Day 2019-09-01: x patients
Day 2019-09-02: y patients
Day 2019-09-03: z patients
.
.
.
x + y + z + ... = total patients on each day for all days of september
SELECT Count(id) AS patientsday
FROM saps
WHERE entry_date <= '2019-05-02'
AND ( exit_date > '2019-05-02'
OR exit_date IS NULL )
AND hospital = 'X'

First, assuming at least one patient enters the hospital every day, I would build a temporary table called all_dates containing all the possible dates.
Second, I would create a temporary table by joining your table with all_dates. The idea is to repeat each id once for every day the patient was inside the hospital, so that each row pairs an id with one day of the stay. For example, suppose your table looked like this before:
id entry_date exit_date
1 2019-01-01 2019-01-05
2 2019-01-03 2019-01-04
3 2019-01-10 2019-01-15
With the joined table, your table will look like this:
id possible_dates
1 2019-01-01
1 2019-01-02
1 2019-01-03
1 2019-01-04
1 2019-01-05
2 2019-01-03
2 2019-01-04
3 2019-01-10
3 2019-01-11
3 2019-01-12
3 2019-01-13
3 2019-01-14
3 2019-01-15
Finally, all you have to do is count how many ids you have per day.
Here is the full query for this solution:
WITH all_dates AS (
SELECT distinct entry_date as possible_dates
FROM your_table_name
),
patients_per_day AS (
SELECT id
, possible_dates
FROM all_dates ad
LEFT JOIN your_table_name di
ON ad.possible_dates BETWEEN di.entry_date AND di.exit_date
)
SELECT possible_dates, COUNT(ID)
FROM patients_per_day
GROUP BY 1
Another possible solution follows almost the same strategy and changes only the join condition:
WITH all_dates AS (
SELECT distinct entry_date as possible_dates
FROM your_table_name
),
date_intervals AS (
SELECT id
, entry_date
, exit_date
, datediff(exit_date, entry_date) as date_difference -- length of stay in days
FROM your_table_name
),
patients_per_day AS (
SELECT id
, possible_dates
FROM all_dates ad
LEFT JOIN date_intervals di
ON datediff(ad.possible_dates, di.entry_date) BETWEEN 0 AND di.date_difference
)
SELECT possible_dates, COUNT(ID)
FROM patients_per_day
GROUP BY 1
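Both queries above build all_dates from the distinct entry dates, so a day with no admissions never appears in the result, and a patient whose exit_date is NULL never satisfies the BETWEEN test. A minimal sketch of one way around both points, run through Python's sqlite3 module so that it is self-contained: a recursive CTE generates every day of the month, and the join reuses the condition from the question's single-day query. The sample rows are invented for illustration, and SQLite's date(day, '+1 day') stands in for MySQL's DATE_ADD; MySQL 8 also accepts WITH RECURSIVE, but the exact syntax there is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE saps (id INTEGER, entry_date TEXT, exit_date TEXT);
    -- Hypothetical sample rows; a NULL exit_date means "still in the hospital".
    INSERT INTO saps VALUES
        (1, '2019-09-01', '2019-09-04'),
        (2, '2019-09-03', NULL),
        (3, '2019-08-30', '2019-09-02');
""")

query = """
WITH RECURSIVE all_dates(day) AS (
    SELECT '2019-09-01'
    UNION ALL
    SELECT date(day, '+1 day') FROM all_dates WHERE day < '2019-09-30'
)
SELECT d.day, COUNT(s.id) AS patients
FROM all_dates d
LEFT JOIN saps s
       ON s.entry_date <= d.day
      AND (s.exit_date > d.day OR s.exit_date IS NULL)
GROUP BY d.day
ORDER BY d.day
"""

# One row per calendar day, including days on which nobody was admitted.
counts = dict(conn.execute(query).fetchall())
for day in ("2019-09-01", "2019-09-02", "2019-09-03", "2019-09-04"):
    print(day, counts[day])
```

Because the calendar is generated rather than derived from the data, COUNT(s.id) returns 0 for a day on which the hospital was empty instead of omitting the row.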

This breaks down the number of entries (admissions) for every entry date. You can add a WHERE clause to the inner SELECT to restrict it to a specific month and/or year.
SELECT
CONCAT(YEAR, '-', MONTH, '-', DAY) AS THE_DATE,
ENTRIES
FROM (
SELECT
DATE_FORMAT(entry_date, '%m') AS MONTH,
DATE_FORMAT(entry_date, '%d') AS DAY,
DATE_FORMAT(entry_date, '%Y') AS YEAR,
COUNT(*) AS ENTRIES
FROM
saps
GROUP BY
MONTH,
DAY,
YEAR
) AS ENTRIES
ORDER BY
THE_DATE DESC

Related

MySQL query for records that existed at any point each week

I have a table with created_at and deleted_at timestamps. I need to know, for each week, how many records existed at any point that week:
week records
2022-01 4
2022-02 5
... ...
Essentially, records that were created before the end of the week and deleted after the beginning of the week.
I've tried various variations of the following but it's under-reporting and I can't work out why:
SELECT
DATE_FORMAT(created_at, '%Y-%U') AS week,
COUNT(*)
FROM records
WHERE
deleted_at > DATE_SUB(deleted_at, INTERVAL (WEEKDAY(deleted_at)+1) DAY)
AND created_at < DATE_ADD(created_at, INTERVAL 7 - WEEKDAY(created_at) DAY)
GROUP BY week
ORDER BY week
Any help would be massively appreciated!
I would create a table wk that looks like this (showing the last 5 weeks of last year):
yrweek | wkstart | wkend
-------+------------+------------
202249 | 2022-11-27 | 2022-12-03
202250 | 2022-12-04 | 2022-12-10
202251 | 2022-12-11 | 2022-12-17
202252 | 2022-12-18 | 2022-12-24
202253 | 2022-12-25 | 2022-12-31
To get there, find a way to create 365 consecutive integers, make all the dates of 2022 out of that, and group them by year-week.
This is an example:
CREATE TABLE wk AS
WITH units(units) AS (
SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION
SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
)
,tens AS(SELECT units * 10 AS tens FROM units )
,hundreds AS(SELECT tens * 10 AS hundreds FROM tens )
,
i(i) AS (
SELECT hundreds +tens +units
FROM units
CROSS JOIN tens
CROSS JOIN hundreds
)
,
dt(dt) AS (
SELECT
DATE_ADD(DATE '2022-01-01', INTERVAL i DAY)
FROM i
WHERE i < 365
)
SELECT
YEAR(dt)*100 + WEEK(dt) AS yrweek
, MIN(dt) AS wkstart
, MAX(dt) AS wkend
FROM dt
GROUP BY yrweek
ORDER BY yrweek;
With that table, go:
SELECT
yrweek
, COUNT(*) AS records
FROM wk
JOIN input_table ON wk.wkstart < input_table.deleted_at
AND wk.wkend > input_table.created_at
GROUP BY
yrweek
;
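The query in the question compares each timestamp with a value derived from that same timestamp and groups by the creation week, so each record is only ever counted in the week it was created. The join above instead tests every record against every week. A minimal sketch of that overlap test in Python, assuming Monday-to-Sunday weeks, inclusive bounds, and invented sample records (None stands for a record that has not been deleted):

```python
from datetime import date, timedelta

# Hypothetical records: (created_at, deleted_at); None means not deleted yet.
records = [
    (date(2022, 1, 3), date(2022, 1, 5)),    # lives inside a single week
    (date(2022, 1, 6), date(2022, 1, 20)),   # spans three weeks
    (date(2022, 1, 18), None),               # still exists
]

def week_starts(first_monday, count):
    """Yield `count` consecutive week start dates beginning at first_monday."""
    for i in range(count):
        yield first_monday + timedelta(weeks=i)

def records_per_week(records, first_monday, count):
    """Count the records whose lifetime overlaps each Monday-to-Sunday week."""
    result = {}
    for start in week_starts(first_monday, count):
        end = start + timedelta(days=6)
        result[start] = sum(
            1
            for created, deleted in records
            # Created before the week ends and deleted after it begins.
            if created <= end and (deleted is None or deleted >= start)
        )
    return result

weekly = records_per_week(records, date(2022, 1, 3), 3)
for start, n in weekly.items():
    print(start.isoformat(), n)
```

A record that spans several weeks is counted once in each of them, which is the behaviour the question asks for.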
I first build a list of the records together with, for each one, the number of records created so far (new) and the number deleted so far (closed):
SELECT
created_at,
deleted_at,
(SELECT COUNT(*)
from records r2
where r2.created_at <= r1.created_at ) as new,
(SELECT COUNT(*)
from records r2
where r2.deleted_at <= r1.created_at) as closed
FROM records r1
ORDER BY r1.created_at;
After that it's just adding a GROUP BY:
SELECT
date_format(created_at,'%Y-%U') as week,
MAX((SELECT COUNT(*)
from records r2
where r2.created_at <= r1.created_at )) as new,
MAX((SELECT COUNT(*)
from records r2
where r2.deleted_at <= r1.created_at)) as closed
FROM records r1
GROUP BY week
ORDER BY week;
see: DBFIDDLE
NOTE: Because I use random times, the results will change when re-run. A sample output is:
week new closed
2022-00 31 0
2022-01 298 64
2022-02 570 212
2022-03 800 421

Add the continuous date period records into one record with sql

Original Data:
ID Date Original_col
A 2021-04-10 1
B 2021-03-01 1
B 2021-05-01 1
C 2021-03-01 1
C 2021-03-02 2
C 2021-03-03 3
C 2021-05-07 1
Result data:
ID Date Result_col
A 2021-04-10 1
B 2021-03-01 1
B 2021-05-01 1
C 2021-03-01 3
C 2021-05-07 1
For the ID = 'C' records, the rows dated '2021-03-01' through '2021-03-03' are grouped together: only the start date '2021-03-01' and the maximum Original_col value 3 are kept. The row dated '2021-05-07' is kept as is because no later rows continue it.
There are no strict restrictions on the length of a date period; rows need to be grouped together whenever they are continuous according to Original_col.
You can identify the periods by subtracting an enumerated value. This is constant for "adjacent" days. The rest is just aggregation:
select id, min(date), max(original_col) as result_col
from (select t.*,
row_number() over (partition by id order by date) as seqnum
from t
) t
group by id, (date - interval seqnum day);
If the original_column is really enumerating the adjacent dates, then you don't even need a subquery:
select id, min(date), max(original_col) as result_col
from t
group by id, (date - interval original_col day);
However, I don't know if the values are just coincidences in the sample data in the question.
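To see why subtracting the row number yields a constant for adjacent days, here is a minimal Python sketch of the same idea applied to the question's sample rows. It assumes at most one row per id and date; the function name collapse_runs is made up for illustration.

```python
from datetime import date, timedelta
from itertools import groupby

rows = [
    ("A", date(2021, 4, 10), 1),
    ("B", date(2021, 3, 1), 1),
    ("B", date(2021, 5, 1), 1),
    ("C", date(2021, 3, 1), 1),
    ("C", date(2021, 3, 2), 2),
    ("C", date(2021, 3, 3), 3),
    ("C", date(2021, 5, 7), 1),
]

def collapse_runs(rows):
    """Collapse runs of consecutive dates per id into (id, first_date, max_col)."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for key, id_rows in groupby(ordered, key=lambda r: r[0]):
        # seq plays the role of ROW_NUMBER() OVER (PARTITION BY id ORDER BY date).
        numbered = [
            (d - timedelta(days=seq), d, col)
            for seq, (_, d, col) in enumerate(id_rows, start=1)
        ]
        # Consecutive dates share the same anchor (date - seq); a gap moves it forward.
        for _, run in groupby(numbered, key=lambda t: t[0]):
            run = list(run)
            result.append((key, min(t[1] for t in run), max(t[2] for t in run)))
    return result

for row in collapse_runs(rows):
    print(row[0], row[1].isoformat(), row[2])
```

For id C the first three rows all map to the anchor 2021-02-28, so they fall into one group, while 2021-05-07 gets an anchor of its own.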

Creating an overdraft statement

I'm currently stuck on how to write a query that shows daily overdraft totals for a particular council.
I have the following tables: councils, users, markets, market_transactions, user_deposits.
market_transactions run daily and reduce the user's account balance. Once the account balance reaches 0, further transactions put the user into overdraft (a negative balance). When users make a deposit, their account balance increases.
The tables below show how transactions and deposits are stored.
If I reverse today's transactions I can get the account balance a user had yesterday, but I can't work out how to formulate a query that gets the daily overdraft (OD) amount.
USERS
user_id name account_bal
1 Wells -5
2 James 100
3 Joy 10
4 Mumbi -300
DEPOSITS
id user_id amount date
1 1 5 2021-04-26
2 3 10 2021-04-26
3 3 5 2021-04-25
4 4 5 2021-04-25
TRANSACTIONS
id user_id amount_tendered date
1 1 5 2021-04-27
2 2 10 2021-04-26
3 3 15 2021-04-26
4 4 50 2021-04-25
The Relationships are as follows,
COUNCILS
council_id name
1 a
2 b
3 c
MARKETS
market_id name council_id
1 x 3
2 y 1
3 z 2
MARKET_USER_LINK
id market_id user_id
1 1 3
2 2 2
3 3 1
I'm running this SQL query to get the total amount each user has spent, which I then subtract from the user's current account balance.
I don't know whether I can use this to figure out the account balance for each day.
SELECT u.user_id, total_spent, total_deposits,m.council_id
FROM users u
JOIN market_user_link ul ON ul.user_id= u.user_id
LEFT JOIN markets m ON ul.market_id =m.market_id
LEFT JOIN councils c ON m.council_id =c.council_id
LEFT JOIN (
SELECT user_id, SUM(amount_tendered) AS total_spent
FROM transactions
WHERE DATE(date) BETWEEN DATE('2021-02-01') AND DATE(NOW())
GROUP BY user_id
) t ON t.user_id= u.user_id
ORDER BY user_id, total_spent ASC
// looks like this when run
| user_id | total_spent | council_id |
|-------------|----------------|------------|
| 1 | 50.00 | 1 |
| 2 | 2.00 | 3 |
I was hoping to reverse transactions and deposits done to get the account balance for a day then get the sum of users with an account balance < 0... But this has just failed to work.
The goal is to produce a query that shows daily overdraft (Only SUM the total account balance of users with account balance below 0 ) for a particular council.
Expected Result
date council_id o_d_amount
2021-04-24 1 -300.00
2021-04-24 2 -60.00
2021-04-24 3 -900.00
2021-04-25 1 -600.00
2021-04-25 2 -100.00
2021-04-25 3 -1200.00
This is actually not that hard, but the way you asked makes it hard to follow.
Also, your expected result should match the data you provided.
Edited: The previous solution was wrong. It counted withdrawals and deposits more than once when a user had more than one event on the same date.
Start by having the total exchanged on each day, like
select user_id, date, sum(amount) exchanged_on_day from (
select user_id, date, amount amount from deposits
union all select user_id, date, -amount_tendered amount from transactions
) d
group by user_id, date
order by user_id, date;
What follows gets the state of the account only on days that had deposits or withdrawals.
To get results for all days (and not just those with account movement), you just have to change the cross join part to use a table with all the dates you want (as in "Get all dates between two dates in SQL Server"), but I digress...
select dates.date, c.council_id, u.name username
, u.account_bal - sum(case when e.date >= dates.date then e.exchanged_on_day else 0 end) as amount_on_start_of_day
, u.account_bal - sum(case when e.date > dates.date then e.exchanged_on_day else 0 end) as amount_on_end_of_day
from councils c
inner join markets m on c.council_id=m.council_id
inner join market_user_link mul on m.market_id=mul.market_id
inner join users u on mul.user_id=u.user_id
left join (
select user_id, date, sum(amount) exchanged_on_day from (
select user_id, date, amount amount from deposits
union all select user_id, date, -amount_tendered amount from transactions
) d group by user_id, date
) e on u.user_id=e.user_id -- exchange on each day
cross join (select distinct date from (select date from deposits union select date from transactions) datesInternal) dates -- all days that had a transaction
group by dates.date, c.council_id, u.name, u.account_bal
order by dates.date desc, c.council_id, u.name;
From there you can rearrange to get the result you want.
select date, council_id
, sum(case when amount_on_start_of_day<0 then amount_on_start_of_day else 0 end) o_d_amount_start
, sum(case when amount_on_end_of_day<0 then amount_on_end_of_day else 0 end) o_d_amount_end
from (
select dates.date, c.council_id, u.name username
, u.account_bal - sum(case when e.date >= dates.date then e.exchanged_on_day else 0 end) as amount_on_start_of_day
, u.account_bal - sum(case when e.date > dates.date then e.exchanged_on_day else 0 end) as amount_on_end_of_day
from councils c
inner join markets m on c.council_id=m.council_id
inner join market_user_link mul on m.market_id=mul.market_id
inner join users u on mul.user_id=u.user_id
left join (
select user_id, date, sum(amount) exchanged_on_day from (
select user_id, date, amount amount from deposits
union all select user_id, date, -amount_tendered amount from transactions
) d group by user_id, date
) e on u.user_id=e.user_id -- exchange on each day
cross join (select distinct date from (select date from deposits union select date from transactions) datesInternal) dates -- all days that had a transaction
group by dates.date, c.council_id, u.name, u.account_bal
) result
group by date, council_id
order by date;
You can check it on https://www.db-fiddle.com/f/msScT6B5F7FjU2aQXVr2da/6
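The arithmetic behind the queries above can be checked by hand. The following is a minimal Python sketch, not the answer's SQL: it assumes users.account_bal is the balance after every listed movement, uses the question's sample rows, and flattens the council mapping from market_user_link and markets into a plain dictionary. The function names are made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Current balances from the question's USERS table.
account_bal = {1: -5, 2: 100, 3: 10, 4: -300}
# (user_id, amount, date) rows from DEPOSITS and TRANSACTIONS.
deposits = [(1, 5, date(2021, 4, 26)), (3, 10, date(2021, 4, 26)),
            (3, 5, date(2021, 4, 25)), (4, 5, date(2021, 4, 25))]
transactions = [(1, 5, date(2021, 4, 27)), (2, 10, date(2021, 4, 26)),
                (3, 15, date(2021, 4, 26)), (4, 50, date(2021, 4, 25))]
# user_id -> council_id, resolved through market_user_link and markets.
user_council = {3: 3, 2: 1, 1: 2}

# Deposits count as positive movements, transactions as negative ones.
movements = [(u, amt, d) for u, amt, d in deposits]
movements += [(u, -amt, d) for u, amt, d in transactions]

def end_of_day_balance(user_id, day):
    """Undo every movement dated after `day`, starting from the current balance."""
    later = sum(amt for u, amt, d in movements if u == user_id and d > day)
    return account_bal[user_id] - later

def overdraft_by_council(day):
    """Sum only the negative end-of-day balances per council."""
    totals = defaultdict(int)
    for user_id, council_id in user_council.items():
        totals[council_id] += min(end_of_day_balance(user_id, day), 0)
    return dict(totals)

print(overdraft_by_council(date(2021, 4, 25)))
```

This mirrors the amount_on_end_of_day expression: the current balance minus everything that happened after the day in question.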
Basically, the query maps users to councils, calculates the overdraft periods for each user, then aggregates over councils. I assume that the starting balance is dated at the start of the month, '2021-04-01' (it could be an ending balance as well, see below); change it as needed. I also assume that a negative starting balance counts as an overdraft. For simplicity and easier debugging, the query is divided into a number of steps.
with uc as (
select distinct m.council_id, mul.user_id
from markets m
join market_user_link mul on m.market_id = mul.market_id
),
user_running_total as (
select user_id, date,
coalesce(lead(date) over(partition by user_id order by date) - interval 1 day, date) nxt,
sum(sum(s)) over(partition by user_id order by date) rt
from (
select user_id, date, -amount_tendered s
from transactions
union all
select user_id, date, amount
from deposits
union all
select user_id, se.d, se.s
from users
cross join lateral (
select date(NOW() + interval 1 day) d, 0 s
union all
select '2021-04-01' d, account_bal
) se
) t
group by user_id, date
),
user_overdraft as (
select user_id, date, nxt, least(rt, 0) ovd
from user_running_total
where date <= date(NOW())
),
dates as (
select date
from user_overdraft
union
select nxt
from user_overdraft
),
council__overdraft as (
select uc.council_id, d.date, sum(uo.ovd) total_overdraft, lag(sum(uo.ovd), 1, sum(uo.ovd) - 1) over(partition by uc.council_id order by d.date) prev_ovd
from uc
cross join dates d
join user_overdraft uo on uc.user_id = uo.user_id and d.date between uo.date and uo.nxt
group by uc.council_id, d.date
)
select council_id, date, total_overdraft
from council__overdraft
where total_overdraft <> prev_ovd
order by date, council_id
In fact, council__overdraft is quite usable on its own; the last step just compacts the output by excluding intermediate dates on which the overdraft did not change.
With the following sample data:
users
user_id name account_bal
1 Wells -5
2 James 100
3 Joy 10
4 Mumbi -300
deposits, ordered by date, with an extra row added for the last date
id user_id amount date
3 3 5 2021-04-25
4 4 5 2021-04-25
1 1 5 2021-04-26
2 3 10 2021-04-26
5 3 73 2021-05-06
transactions, ordered by date (note the added row, to illustrate the running total in action)
id user_id amount_tendered date
5 4 50 2021-04-25
2 2 10 2021-04-26
3 3 15 2021-04-26
1 1 5 2021-04-27
4 3 17 2021-04-27
councils
council_id name
1 a
2 b
3 c
markets
market_id name council_id
1 x 3
2 y 1
3 z 2
market_user_link
id market_id user_id
1 1 3
2 2 2
3 3 1
4 3 4
the query output is
council_id date overdraft
1 2021-04-01 0
2 2021-04-01 -305
3 2021-04-01 0
2 2021-04-25 -350
2 2021-04-26 -345
2 2021-04-27 -350
3 2021-04-27 -7
3 2021-05-06 0
Alternatively, provided the users table is holding a closing (NOW()) balance, replace user_running_total CTE with the following code
user_running_total as (
select user_id, date,
coalesce(lead(date) over(partition by user_id order by date) - interval 1 day, date) nxt,
coalesce(sum(sum(s)) over(partition by user_id order by date desc
rows between unbounded preceding and 1 preceding), sum(s)) rt
from (
select user_id, date, amount_tendered s
from transactions
union all
select user_id, date, -amount
from deposits
union all
select user_id, se.d, se.s
from users
cross join lateral (
select date(NOW() + interval 1 day) d, account_bal s
union all
select '2021-04-01' d, 0
) se
) t
where DATE(date) between date '2021-04-01' and date(NOW() + interval 1 day)
group by user_id, date
),
This way the query starts with the closing balance, dated the day after now, and rolls a running total back in reverse order to '2021-04-01' as the starting date.
Output
council_id date overdraft
1 2021-04-01 0
2 2021-04-01 -260
3 2021-04-01 -46
2 2021-04-25 -305
3 2021-04-25 -41
2 2021-04-26 -300
3 2021-04-26 -46
2 2021-04-27 -305
3 2021-04-27 -63
3 2021-05-06 0
db-fiddle both versions

How can I join two unrelated mysql table and use group by date

I have two tables, voucher and ledger, with the following fields:
Ledger:
id total_sale cancel_amount date
1 3000 0 2018-01-20
2 3000 0 2018-01-29
3 5000 0 2018-01-30
4 10000 500 2018-01-30
5 2000 100 2018-01-31
6 2000 0 2018-01-31
voucher:
id expenditure date
1 500 2018-01-20
2 800 2018-01-30
3 1000 2018-01-30
4 200 2018-01-31
5 300 2018-01-31
I want a result like this (for dates between 2018-01-29 and 2018-01-31):
date total_sale total_expenditure
2018-01-29 3000 0
2018-01-30 15000 1800
2018-01-31 4000 500
Please someone help
For such requirements, always prefer a date dimension table. Then you'll never get confused when joining two dissimilar tables with no keys in common.
Tables:
create schema test;
create table date_dim(date date);
insert into date_dim values ('2018-01-20'),
('2018-01-21'),
('2018-01-22'),
('2018-01-23'),
('2018-01-24'),
('2018-01-25'),
('2018-01-26'),
('2018-01-27'),
('2018-01-28'),
('2018-01-29'),
('2018-01-30'),
('2018-01-31');
create table ledger(id int,total_sale int, cancel_amount int, date date);
insert into ledger values
(1,3000,0,'2018-01-20'),
(2,3000,0,'2018-01-29'),
(3,5000,0,'2018-01-30'),
(4,10000,500,'2018-01-30'),
(5,2000,100,'2018-01-31'),
(6,2000,0,'2018-01-31');
create table voucher(id int, expenditure int, date date);
insert into voucher values
(1,500,'2018-01-20'),
(2,800,'2018-01-30'),
(3,1000,'2018-01-30'),
(4,200,'2018-01-31'),
(5,300,'2018-01-31');
SQL Script (for solution):
select l.date,total_sale,
total_expenditure
from
(select d.date date,sum(total_sale)total_sale
from ledger l right join date_dim d on l.date=d.date
where d.date between '2018-01-29' and '2018-01-31'
group by 1)l
join
(select d.date date,sum(expenditure)total_expenditure
from voucher v right join date_dim d on v.date=d.date
where d.date between '2018-01-29' and '2018-01-31'
group by 1)v
on l.date=v.date
group by 1,2,3;
Resultset:
date total_sale total_expenditure
2018-01-29 3000 (null)
2018-01-30 15000 1800
2018-01-31 4000 500
Check the solution at SQL Fiddle
You're hoping to present three different aggregates as columns of your result set.
distinct dates
net sales (total less canceled) (I guess that's what you want.)
expenditures.
You need to create three subqueries and join them.
The dates: (http://sqlfiddle.com/#!9/666dcb/1/0)
SELECT DISTINCT DATE(date) date FROM Ledger
UNION
SELECT DISTINCT DATE(date) date FROM Voucher
The net sales: (http://sqlfiddle.com/#!9/666dcb/2/0)
SELECT DATE(date) date,
SUM(total_sale) - SUM(cancel_amount) total_sale
FROM Ledger
GROUP BY DATE(date)
The expenditures: (http://sqlfiddle.com/#!9/666dcb/3/0)
SELECT DATE(date) date,
SUM(expenditure) total_expenditures
FROM Voucher
GROUP BY DATE(date)
Then you need to join them together on date. (http://sqlfiddle.com/#!9/666dcb/5/0)
SELECT d.date, s.total_sale, e.total_expenditures
FROM (
SELECT DISTINCT DATE(date) date FROM Ledger
UNION
SELECT DISTINCT DATE(date) date FROM Voucher
) d
LEFT JOIN (
SELECT DATE(date) date,
SUM(total_sale) - SUM(cancel_amount) total_sale
FROM Ledger
GROUP BY DATE(date)
) s ON d.date = s.date
LEFT JOIN (
SELECT DATE(date) date,
SUM(expenditure) total_expenditures
FROM Voucher
GROUP BY DATE(date)
) e ON d.date = e.date
WHERE d.date >= somedate AND d.date <= anotherdate
ORDER BY d.date
Why do the first subquery--the one with the dates? It makes sure you get a row in your final result set for dates that don't have any sales or any expenditures.
Why do separate subqueries? Because you want to end up joining three virtual tables--three subqueries--that have either zero or one row per date. If you join tables with more than one row per date, you'll get combinatorial explosion of sales and expenditures, which will exaggerate your sums.
Why use DATE(date)? Because that will allow the date columns in your detail tables to contain date/time values in case you want that.
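The combinatorial explosion mentioned above is easy to reproduce. The following is a minimal sketch using Python's sqlite3 module with only the 2018-01-30 rows from the question; it compares a direct join of the detail tables with a join of the pre-aggregated subqueries. SQLite is used purely to keep the example self-contained, and cancel_amount is left out of the sums to match the totals the question expects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ledger (id INTEGER, total_sale INTEGER, cancel_amount INTEGER, date TEXT);
    CREATE TABLE voucher (id INTEGER, expenditure INTEGER, date TEXT);
    -- Only the 2018-01-30 rows from the question: two per table.
    INSERT INTO ledger VALUES
        (3, 5000, 0, '2018-01-30'), (4, 10000, 500, '2018-01-30');
    INSERT INTO voucher VALUES
        (2, 800, '2018-01-30'), (3, 1000, '2018-01-30');
""")

# Joining the detail rows directly pairs every sale with every voucher (2 x 2 rows).
naive = conn.execute("""
    SELECT l.date, SUM(l.total_sale), SUM(v.expenditure)
    FROM ledger l JOIN voucher v ON l.date = v.date
    GROUP BY l.date
""").fetchone()

# Aggregating first leaves at most one row per date on each side of the join.
pre_aggregated = conn.execute("""
    SELECT s.date, s.total_sale, e.total_expenditure
    FROM (SELECT date, SUM(total_sale) AS total_sale
          FROM ledger GROUP BY date) s
    JOIN (SELECT date, SUM(expenditure) AS total_expenditure
          FROM voucher GROUP BY date) e
      ON s.date = e.date
""").fetchone()

print("naive:         ", naive)
print("pre-aggregated:", pre_aggregated)
```

The direct join doubles both sums (30000 and 3600) because each of the two ledger rows is matched with each of the two voucher rows, while the pre-aggregated join returns 15000 and 1800.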
select l.dates,l.total_sale,
v.total_expenditure
from
(select l.dates dates,sum(total_sale)total_sale
from ledger l
where l.dates between '2018-01-29' and '2018-01-31'
group by l.dates)l
left join
(select v.dates,sum(expenditure)total_expenditure
from voucher v
where v.dates between '2018-01-29' and '2018-01-31'
group by v.dates)v
on l.dates=v.dates;

Calculating aggregated number of days in each month in sql

I've got a table with multiple columns and two of the columns are start_date and end_date.
I need to calculate the number of days in each month. Let's assume I have following data in my table
id | start_date | end_date
1 04.01.2016 15.02.2016
2 07.01.2016 22.01.2016
3 16.05.2016 11.07.2016
I want an output as follows
Month | numberOfTravelDays
January 51
February 15
May 15
June 31
July 11
This output I am expecting is the number of total travel days each month has been utilized. I am having trouble constructing the sql query for this. Can someone assist me on this?
This is what I have for now. And it's not doing the job. The below query also filters only this year's records(but ignore that).
select MONTH(start_date) as month,
COUNT(DATEDIFF(start_date, end_date)) as numberOfTravelDays
from travel
where YEAR(start_date) = YEAR(CURDATE())
group by MONTH(start_date),
MONTH(end_date)
Use a derived table:
select monstart,
sum(datediff(least(m.monend, t.end_date) + interval 1 day,
greatest(m.monstart, t.start_date)
)
) as days_worked
from travel t join
(select date('2016-01-01') as monstart, date('2016-01-31') as monend union all
select date('2016-02-01') as monstart, date('2016-02-29') as monend union all
. . .
) m
on t.end_date >= m.monstart and t.start_date <= m.monend
group by monstart;
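The LEAST/GREATEST expression clips each trip to the month being counted. A minimal Python sketch of the same clipping, using the question's sample rows and counting both the start and the end day (as the + interval 1 day in the query does); the function names are made up for illustration, and the month boundaries are generated rather than listed by hand:

```python
import calendar
from collections import Counter
from datetime import date

trips = [
    (date(2016, 1, 4), date(2016, 2, 15)),
    (date(2016, 1, 7), date(2016, 1, 22)),
    (date(2016, 5, 16), date(2016, 7, 11)),
]

def month_bounds(year, month):
    """Return the first and last day of the given month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)

def travel_days_per_month(trips, year):
    """Sum, per month, the days of each trip that fall inside that month."""
    totals = Counter()
    for month in range(1, 13):
        monstart, monend = month_bounds(year, month)
        for start, end in trips:
            if end >= monstart and start <= monend:  # same test as the ON clause
                # Clip the trip to the month, then count both end points.
                overlap = (min(monend, end) - max(monstart, start)).days + 1
                totals[calendar.month_name[month]] += overlap
    return dict(totals)

for month, days in travel_days_per_month(trips, 2016).items():
    print(month, days)
```

With inclusive day counting, the sample rows give 44 days for January (28 from the first trip plus 16 from the second).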