I have a table with created_at and deleted_at timestamps. I need to know, for each week, how many records existed at any point that week:
week
records
2022-01
4
2022-02
5
...
...
Essentially, records that were created before the end of the week and deleted after the beginning of the week.
I've tried various variations of the following but it's under-reporting and I can't work out why:
SELECT
DATE_FORMAT(created_at, '%Y-%U') AS week,
COUNT(*)
FROM records
WHERE
deleted_at > DATE_SUB(deleted_at, INTERVAL (WEEKDAY(deleted_at)+1) DAY)
AND created_at < DATE_ADD(created_at, INTERVAL 7 - WEEKDAY(created_at) DAY)
GROUP BY week
ORDER BY week
Any help would be massively appreciated!
I would create a table wktable that looks like so (for the last 5 weeks of last year):
yrweek | wkstart | wkstart
-------+------------+------------
202249 | 2022-11-27 | 2022-12-03
202250 | 2022-12-04 | 2022-12-10
202251 | 2022-12-11 | 2022-12-17
202252 | 2022-12-18 | 2022-12-24
202253 | 2022-12-25 | 2022-12-31
To get there, find a way to create 365 consecutive integers, make all the dates of 2022 out of that, and group them by year-week.
This is an example:
CREATE TABLE wk AS
WITH units(units) AS (
SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION
SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
)
,tens AS(SELECT units * 10 AS tens FROM units )
,hundreds AS(SELECT tens * 10 AS hundreds FROM tens )
,
i(i) AS (
SELECT hundreds +tens +units
FROM units
CROSS JOIN tens
CROSS JOIN hundreds
)
,
dt(dt) AS (
SELECT
DATE_ADD(DATE '2022-01-01', INTERVAL i DAY)
FROM i
WHERE i < 365
)
SELECT
YEAR(dt)*100 + WEEK(dt) AS yrweek
, MIN(dt) AS wkstart
, MAX(dt) AS wkend
FROM dt
GROUP BY yrweek
ORDER BY yrweek;
With that table, go:
SELECT
yrweek
, COUNT(*) AS records
FROM wk
JOIN input_table ON wk.wkstart < input_table.deleted_at
AND wk.wkend > input_table.created_at
GROUP BY
yrweek
;
I first build a list with the records, their open count, and the closed count
SELECT
created_at,
deleted_at,
(SELECT COUNT(*)
from records r2
where r2.created_at <= r1.created_at ) as new,
(SELECT COUNT(*)
from records r2
where r2.deleted_at <= r1.created_at) as closed
FROM records r1
ORDER BY r1.created_at;
After that it's just adding a GROUP BY:
SELECT
date_format(created_at,'%Y-%U') as week,
MAX((SELECT COUNT(*)
from records r2
where r2.created_at <= r1.created_at )) as new,
MAX((SELECT COUNT(*)
from records r2
where r2.deleted_at <= r1.created_at)) as closed
FROM records r1
GROUP BY week
ORDER BY week;
see: DBFIDDLE
NOTE: Because I use random times, the results will change when re-run. A sample output is:
week
new
closed
2022-00
31
0
2022-01
298
64
2022-02
570
212
2022-03
800
421
I have a system that stores the data only when they are changed. So, the dataset looks like below.
data_type_id
data_value
inserted_at
2
240
2022-01-19 17:20:52
1
30
2022-01-19 17:20:47
2
239
2022-01-19 17:20:42
1
29
2022-01-19 17:20:42
My data frequency is every 5 seconds. So, whether there's any timestamp or not I need to get the result by assuming in this 5th-second data value the same as the previous value.
As I am storing the data that are only changed, indeed the dataset should be like below.
data_type_id
data_value
inserted_at
2
240
2022-01-19 17:20:52
1
30
2022-01-19 17:20:52
2
239
2022-01-19 17:20:47
1
30
2022-01-19 17:20:47
2
239
2022-01-19 17:20:42
1
29
2022-01-19 17:20:42
I don't want to insert into my table, I just want to retrieve the data like this on the SELECT statement.
Is there any way I can create this query?
PS. I have many data_types hence when the OP makes a query, it usually gets around a million rows.
EDIT:
Information about server Server version: 10.3.27-MariaDB-0+deb10u1 Debian 10
The User is going to determine the SELECT DateTime. So, there's no certain between time.
As #Akina mentioned, sometimes there're some gaps between the inserted_at. The difference might be ~4seconds or ~6seconds instead of a certain 5seconds. Since it's not going to happen so frequently, It is okay to generate by ignoring this fact.
With the help of a query that gets you all the combinations of data_type_id and the 5-second moments you need, you can achieve the result you need using a subquery that gets you the closest data_value:
with recursive u as
(select '2022-01-19 17:20:42' as d
union all
select DATE_ADD(d, interval 5 second) from u
where d < '2022-01-19 17:20:52'),
v as
(select * from u cross join (select distinct data_type_id from table_name) t)
select v.data_type_id,
(select data_value from table_name where inserted_at <= d and data_type_id = v.data_type_id
order by inserted_at desc limit 1) as data_value,
d as inserted_at
from v
Fiddle
You can replace the recursive CTE with any query that gets you all the 5-second moments you need.
WITH RECURSIVE
cte1 AS ( SELECT #start_datetime dt
UNION ALL
SELECT dt + INTERVAL 5 SECOND FROM cte1 WHERE dt < #end_datetime),
cte2 AS ( SELECT *,
ROW_NUMBER() OVER (PARTITION BY test.data_type_id, cte1.dt
ORDER BY test.inserted_at DESC) rn
FROM cte1
LEFT JOIN test ON FIND_IN_SET(test.data_type_id, #data_type_ids)
AND cte1.dt >= test.inserted_at )
SELECT *
FROM cte2
WHERE rn = 1
https://dbfiddle.uk/?rdbms=mariadb_10.3&fiddle=380ad334de0c980a0ddf1b49bb6fa38e
I'm currently stuck on how to create a statement that shows daily overdraft statements for a particular council.
I have the following, councils, users, markets, market_transactions, user_deposits.
market_transaction run daily reducing user's account balance. When the account_balance is 0 the users go into overdraft (negative). When users make a deposit their account balance increases.
I Have put the following tables to show how transactions and deposits are stored.
if I reverse today's transactions I'm able to get what account balance a user had yesterday but to formulate a query to get the daily OD amount is where the problem is.
USERS
user_id
name
account_bal
1
Wells
-5
2
James
100
3
Joy
10
4
Mumbi
-300
DEPOSITS
id
user_id
amount
date
1
1
5
2021-04-26
2
3
10
2021-04-26
3
3
5
2021-04-25
4
4
5
2021-04-25
TRANSACTIONS
id
user_id
amount_tendered
date
1
1
5
2021-04-27
2
2
10
2021-04-26
3
3
15
2021-04-26
4
4
50
2021-04-25
The Relationships are as follows,
COUNCILS
council_id
name
1
a
2
b
3
c
MARKETS
market_id
name
council_id
1
x
3
2
y
1
3
z
2
MARTKET_USER_LINK
id
market_id
user_id
1
1
3
2
2
2
3
3
1
I'm running this SQL query to get the total amount users have spent and subtracting with the current user account balance.
Don't know If I can use this to figure out the account_balance for each day.
SELECT u.user_id, total_spent, total_deposits,m.council_id
FROM users u
JOIN market_user_link ul ON ul.user_id= u.user_id
LEFT JOIN markets m ON ul.market_id =m.market_id
LEFT JOIN councils c ON m.council_id =c.council_id
LEFT JOIN (
SELECT user_id, SUM(amount_tendered) AS total_spent
FROM transactions
WHERE DATE(date) BETWEEN DATE('2021-02-01') AND DATE(NOW())
GROUP BY user_id
) t ON t.user_id= u.user_id
ORDER BY user_id, total_spent ASC
// looks like this when run
| user_id | total_spent | council_id |
|-------------|----------------|------------|
| 1 | 50.00 | 1 |
| 2 | 2.00 | 3 |
I was hoping to reverse transactions and deposits done to get the account balance for a day then get the sum of users with an account balance < 0... But this has just failed to work.
The goal is to produce a query that shows daily overdraft (Only SUM the total account balance of users with account balance below 0 ) for a particular council.
Expected Result
date
council_id
o_d_amount
2021-04-24
1
-300.00
2021-04-24
2
-60.00
2021-04-24
3
-900.00
2021-04-25
1
-600.00
2021-04-25
2
-100.00
2021-04-25
3
-1200.00
This is actually not that hard, but the way you asked makes it hard to follow.
Also, your expected result should match the data you provided.
Edited: Previous solution was wrong - It counted withdraws and deposits more than once if you have more than one event for each user/date.
Start by having the total exchanged on each day, like
select user_id, date, sum(amount) exchanged_on_day from (
select user_id, date, amount amount from deposits
union all select user_id, date, -amount_tendered amount from transactions
) d
group by user_id, date
order by user_id, date;
What follows gets the state of the account only on days that had any deposits or withdraws.
To get the results of all days (and not just those with account movement) you just have to change the cross join part to get a table with all dates you want (like Get all dates between two dates in SQL Server) but I digress...
select dates.date, c.council_id, u.name username
, u.account_bal - sum(case when e.date >= dates.date then e.exchanged_on_day else 0 end) as amount_on_start_of_day
, u.account_bal - sum(case when e.date > dates.date then e.exchanged_on_day else 0 end) as amount_on_end_of_day
from councils c
inner join markets m on c.council_id=m.council_id
inner join market_user_link mul on m.market_id=mul.market_id
inner join users u on mul.user_id=u.user_id
left join (
select user_id, date, sum(amount) exchanged_on_day from (
select user_id, date, amount amount from deposits
union all select user_id, date, -amount_tendered amount from transactions
) d group by user_id, date
) e on u.user_id=e.user_id --exchange on each Day
cross join (select distinct date from (select date from deposits union select date from transactions) datesInternal) dates --all days that had a transaction
group by dates.date, c.council_id, u.name, u.account_bal
order by dates.date desc, c.council_id, u.name;
From there you can rearrange to get the result you want.
select date, council_id
, sum(case when amount_on_start_of_day<0 then amount_on_start_of_day else 0 end) o_d_amount_start
, sum(case when amount_on_end_of_day<0 then amount_on_end_of_day else 0 end) o_d_amount_end
from (
select dates.date, c.council_id, u.name username
, u.account_bal - sum(case when e.date >= dates.date then e.exchanged_on_day else 0 end) as amount_on_start_of_day
, u.account_bal - sum(case when e.date > dates.date then e.exchanged_on_day else 0 end) as amount_on_end_of_day
from councils c
inner join markets m on c.council_id=m.council_id
inner join market_user_link mul on m.market_id=mul.market_id
inner join users u on mul.user_id=u.user_id
left join (
select user_id, date, sum(amount) exchanged_on_day from (
select user_id, date, amount amount from deposits
union all select user_id, date, -amount_tendered amount from transactions
) d group by user_id, date
) e on u.user_id=e.user_id --exchange on each Day
cross join (select distinct date from (select date from deposits union select date from transactions) datesInternal) dates --all days that had a transaction
group by dates.date, c.council_id, u.name, u.account_bal
) result
group by date, council_id
order by date;
You can check it on https://www.db-fiddle.com/f/msScT6B5F7FjU2aQXVr2da/6
Basically the query maps users to councils, caculates periods of overdrafts for users, them aggregates over councils. I assume that starting balance is dated start of the month '2021-04-01' (it could be ending balance as well, see below), change it as needed. Also that negative starting balance counts as an overdraft. For simplicity and debugging the query is divided into a number of steps.
with uc as (
select distinct m.council_id, mul.user_id
from markets m
join market_user_link mul on m.market_id = mul.market_id
),
user_running_total as (
select user_id, date,
coalesce(lead(date) over(partition by user_id order by date) - interval 1 day, date) nxt,
sum(sum(s)) over(partition by user_id order by date) rt
from (
select user_id, date, -amount_tendered s
from transactions
union all
select user_id, date, amount
from deposits
union all
select user_id, se.d, se.s
from users
cross join lateral (
select date(NOW() + interval 1 day) d, 0 s
union all
select '2021-04-01' d, account_bal
) se
) t
group by user_id, date
),
user_overdraft as (
select user_id, date, nxt, least(rt, 0) ovd
from user_running_total
where date <= date(NOW())
),
dates as (
select date
from user_overdraft
union
select nxt
from user_overdraft
),
council__overdraft as (
select uc.council_id, d.date, sum(uo.ovd) total_overdraft, lag(sum(uo.ovd), 1, sum(uo.ovd) - 1) over(partition by uc.council_id order by d.date) prev_ovd
from uc
cross join dates d
join user_overdraft uo on uc.user_id = uo.user_id and d.date between uo.date and uo.nxt
group by uc.council_id, d.date
)
select council_id, date, total_overdraft
from council__overdraft
where total_overdraft <> prev_ovd
order by date, council_id
Really council__overdraft is quite usable, the last step just compacts output excluding intermidiate dates when overdraft is not changed.
With following sample data:
users
user_id name account_bal
1 Wells -5
2 James 100
3 Joy 10
4 Mumbi -300
deposits, odered by date, extra row added for the last date
id user_id amount date
3 3 5 2021-04-25
4 4 5 2021-04-25
1 1 5 2021-04-26
2 3 10 2021-04-26
5 3 73 2021-05-06
transactions, odered by date (note the added row, to illustrate running total in action)
id user_id amount_tendered date
5 4 50 2021-04-25
2 2 10 2021-04-26
3 3 15 2021-04-26
1 1 5 2021-04-27
4 3 17 2021-04-27
councils
council_id name
1 a
2 b
3 c
markets
market_id name council_id
1 x 3
2 y 1
3 z 2
market_user_link
id market_id user_id
1 1 3
2 2 2
3 3 1
4 3 4
the query ouput is
council_id
date
overdraft
1
2021-04-01
0
2
2021-04-01
-305
3
2021-04-01
0
2
2021-04-25
-350
2
2021-04-26
-345
2
2021-04-27
-350
3
2021-04-27
-7
3
2021-05-06
0
Alternatively, provided the users table is holding a closing (NOW()) balance, replace user_running_total CTE with the following code
user_running_total as (
select user_id, date,
coalesce(lead(date) over(partition by user_id order by date) - interval 1 day, date) nxt,
coalesce(sum(sum(s)) over(partition by user_id order by date desc
rows between unbounded preceding and 1 preceding), sum(s)) rt
from (
select user_id, date, amount_tendered s
from transactions
union all
select user_id, date, -amount
from deposits
union all
select user_id, se.d, se.s
from users
cross join lateral (
select date(NOW() + interval 1 day) d, account_bal s
union all
select '2021-04-01' d, 0
) se
) t
where DATE(date) between date '2021-04-01' and date(NOW() + interval 1 day)
group by user_id, date
),
This way the query starts with closing balance dated next date after now and rollouts a running total in the reverse order till '2021-04-01' as a starting date.
Output
council_id
date
overdraft
1
2021-04-01
0
2
2021-04-01
-260
3
2021-04-01
-46
2
2021-04-25
-305
3
2021-04-25
-41
2
2021-04-26
-300
3
2021-04-26
-46
2
2021-04-27
-305
3
2021-04-27
-63
3
2021-05-06
0
db-fiddle both versions
I have an orders table
Order_id User_id Order_date
1 32 2020-07-19
2 24 2020-07-21
3 27 2020-07-27
4 24 2020-08-14
5 32 2020-08-18
6 32 2020-08-19
7 58 2020-08-20
Now I want to find how many of the users ordered in 1st month also ordered in the next month. In this case, user_id's 32,24,27 ordered in 7th month but only 24 and 32 ordered in the next month.
I want the result to be like :
Date Retained_Users Total_users
2020-07 Null 3
2020-08 2 3
I'm lost here. Can someone please help me with this?
In MySQL 8.0, you can do this with window functions:
select
order_month,
count(distinct case when cnt_orders_last_month > 0 then user_id end) retained_users,
count(distinct user_id) total_users
from (
select
user_id,
date_format(order_date, '%Y-%m-01') as order_month,
count(*) over(
partition by user_id
order by date(date_format(order_date, '%Y-%m-01'))
range between interval 1 month preceding and interval 1 day preceding
) cnt_orders_last_month
from mytable
) t
group by order_month
The logic lies in the range specification of the window function; it orders record by month, and counts how many orders the customer placed last month. Then all that is left to do is aggregate and count distinct users.
Demo on DB Fiddle
I have a table of available date blocks (7 days in my case) which may or may not be consecutive:
start_date end_date booked id room_id
2012-07-14 2012-07-21 0 1 6
2012-07-21 2012-07-28 0 2 6
2012-07-28 2012-08-04 1 3 6
2012-08-04 2012-08-11 0 4 6
What I'd like to do is be able to get a result set that gives me one row per X weeks of consecutive unbooked dates, within a date range.
So, for 2 week blocks starting on the 14th of July and using the above table data, I would expect the following:
start_date end_date booked
2012-07-14 2012-07-28 0
The second block of 2 weeks would not be returned as one of the component weeks is booked.
Here are a few ideas I've tried already:
SELECT
MIN(start_date) AS start_date_min,
MAX(end_date) AS end_date_max,
CAST(GROUP_CONCAT(id) AS CHAR) AS ids,
SUM(booked) AS booked
FROM
available_dates
WHERE
(start_date>=20120714 AND end_date<=DATE_ADD(20120714, INTERVAL 14 DAY))
GROUP BY
room_id
HAVING
end_date_max=DATE_ADD(20120714, INTERVAL 14 DAY)
This gets me part of the way, however doesn't get me the consecutive results - that is the important part. It also only returns a single result (probably because of the HAVING clause) when I widen the test data.
Can anyone point me in the right direction?
If you have a calendar or a numbers table:
CREATE TABLE num
( i INT NOT NULL
, PRIMARY KEY (i)
) ;
INSERT INTO num
(i)
VALUES
(0), (1), (2), ..., (1000) ;
You could use something like this:
SELECT
avail.room_id,
MIN(avail.start_date) AS start_date_min,
MAX(avail.end_date) AS end_date_max,
CAST(GROUP_CONCAT(avail.id) AS CHAR) AS ids,
SUM(avail.booked) AS booked
FROM
available_dates AS avail
CROSS JOIN
( SELECT DATE('2012-07-14') AS start_date_check
, 52 AS max_week_check
) AS param
JOIN
num
ON avail.start_date = param.start_date_check + INTERVAL num.i WEEK
AND num.i < param.max_week_check
WHERE
avail.booked = 0
GROUP BY
avail.room_id,
( num.i / 2 )
HAVING
COUNT(*) = 2
You could also have this:
WHERE
1 =1 --- no WHERE condition
GROUP BY
avail.room_id,
( num.i / 2 )
HAVING --- and optionally
SUM(avail.booked) = 0 --- this