Compare the same row - mysql

I have a table, user_tracking, which stores the user_id, purchase sku, and event time_created. Each time a user returns to make a purchase, a new row is added with the original user_id and a new timestamp:
User_ID Sku Time_Created
1 1234 2012-10-01 01:00:00
2 2345 2012-10-02 02:00:00
3 6789 2012-10-02 01:00:00
2 5432 2012-10-04 04:00:00
I want to measure return-customer usage, but only for customers who returned within 7 to 60 days of their initial purchase. Currently my query looks something like this:
SELECT
total_purchases.user_id as user_1_id,
total_purchases.time_created as time_1_created,
total_purchases.total_purchases as total_original_purchases,
total_return.user_id as user_2_id,
total_return.time_created as time_2_created,
total_return.total_return_purchases as total_return_purchases
FROM (SELECT
user_tracking.user_id as user_id,
user_tracking.time_created as time_created,
COUNT(DISTINCT user_tracking.sku) as total_purchases
FROM user_tracking
WHERE user_tracking.time_created BETWEEN "2012-10-01 00:00:00"
AND "2012-10-15 00:00:00") AS total_purchases
LEFT JOIN (SELECT
user_tracking.user_id as user_id,
user_tracking.time_created as time_created,
COUNT(DISTINCT user_tracking.sku) as total_return_purchases
FROM user_tracking
WHERE user_tracking.time_created BETWEEN "2012-10-01 00:00:00"
and "2012-12-15 00:00:00") AS total_return
ON total_purchases.user_id = total_return.user_id
How can I ensure I'm only measuring purchases within 7-60 days with the original user?

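You can use INTERVAL arithmetic in the join condition, comparing the return purchase against the original one (note that the unit is DAY, not DAYS):
AND total_return.time_created BETWEEN total_purchases.time_created + INTERVAL 7 DAY AND total_purchases.time_created + INTERVAL 60 DAY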
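A minimal sketch of the same idea, run against an in-memory SQLite database through Python's sqlite3 module rather than MySQL. SQLite's `datetime(x, '+7 days')` stands in for MySQL's `x + INTERVAL 7 DAY`. It assumes "initial purchase" means each user's earliest row. Rather than the question's two subqueries, it computes that earliest row in a CTE and self-joins on later purchases in the window. The function and constant names are illustrative, and the 2012-10-20 row for user 1 is hypothetical, added so that something falls inside the 7-60 day window.

```python
import sqlite3

# Rows from the question, plus one HYPOTHETICAL return purchase for user 1
# (2012-10-20) so that something falls inside the 7-60 day window.
SAMPLE_ROWS = [
    (1, 1234, "2012-10-01 01:00:00"),
    (2, 2345, "2012-10-02 02:00:00"),
    (3, 6789, "2012-10-02 01:00:00"),
    (2, 5432, "2012-10-04 04:00:00"),
    (1, 9999, "2012-10-20 01:00:00"),  # hypothetical
]


def returns_within_window(rows, min_days=7, max_days=60):
    """Count distinct SKUs each user bought min_days..max_days after their first purchase."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE user_tracking (user_id INTEGER, sku INTEGER, time_created TEXT)")
    con.executemany("INSERT INTO user_tracking VALUES (?, ?, ?)", rows)
    sql = """
        WITH first_purchase AS (
            SELECT user_id, MIN(time_created) AS first_time
            FROM user_tracking
            GROUP BY user_id
        )
        SELECT f.user_id, f.first_time, COUNT(DISTINCT r.sku) AS return_skus
        FROM first_purchase AS f
        JOIN user_tracking AS r
          ON r.user_id = f.user_id
         -- the window is anchored to the first purchase, not to r itself
         AND r.time_created BETWEEN datetime(f.first_time, '+' || ? || ' days')
                                AND datetime(f.first_time, '+' || ? || ' days')
        GROUP BY f.user_id, f.first_time
        ORDER BY f.user_id
    """
    result = con.execute(sql, (min_days, max_days)).fetchall()
    con.close()
    return result


print(returns_within_window(SAMPLE_ROWS))
```

With only the question's four rows the result is empty, because user 2's second purchase comes two days after the first.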

How to SELECT the last 30 days records from SQL, including days with zero?

I want to SELECT the last 30 days of records from my table. My queries look like this:
SELECT DATE(o_date) as date, count(id) AS sum FROM customers WHERE o_date BETWEEN DATE_SUB(CURDATE(), INTERVAL 30 DAY) AND NOW() GROUP BY o_date
Or this:
SELECT DATE(o_date) AS date, COUNT(id) AS sum FROM customers WHERE o_date >= DATE(NOW()) + INTERVAL -30 DAY GROUP BY DATE(o_date)
I want to create a list of dates with the count of ids for each one.
But on days where I don't have any records, the query just skips that date. I want a zero there instead.
Example:
id o_date
1 2021-11-23
2 2021-11-22
3 2021-11-20
4 2021-11-20
5 2021-11-19
6 2021-11-18
7 2021-11-18
The result will be this:
date sum
2021-11-23 1
2021-11-22 1
2021-11-20 2
2021-11-19 1
2021-11-18 2
But for days with no records, like 2021-11-21 in this example, how can I return a sum of 0?
Thank you!
UPDATE:
I need this query for MariaDB.
For MariaDB:
SELECT DATE(o_date) AS date, COUNT(id) AS sum FROM customers WHERE o_date BETWEEN DATE_SUB(NOW(), INTERVAL 30 DAY)
AND NOW() GROUP BY DATE(o_date);
For SQL Server:
SELECT CAST(o_date AS DATE) AS date, COUNT(id) AS sum FROM customers WHERE DATEDIFF(day, o_date, GETDATE()) < 31 GROUP BY CAST(o_date AS DATE);
or
SELECT CAST(o_date AS DATE) AS date, COUNT(id) AS sum FROM customers WHERE DATEDIFF(day, o_date, GETDATE()) BETWEEN 0 AND 30 GROUP BY CAST(o_date AS DATE);
From what I could gather, it should be:
SELECT * FROM customers WHERE o_date BETWEEN DATE_SUB(NOW(), INTERVAL 30 DAY) AND NOW();
Link to an almost 10-year-old post:
MySQL Query - Records between Today and Last 30 Days
Try this query:
SELECT DATE(o_date) AS date, COUNT(id) AS sum FROM customers WHERE o_date >= DATE_ADD(NOW(), INTERVAL -30 DAY) GROUP BY DATE(o_date)
Your real question seems to be about how to show all 30 days, even days with a zero value.
Since you are using MariaDB 10.0 or newer, there is a nifty trick to give all the days in a range:
MariaDB [test]> SELECT '2019-01-01' + INTERVAL seq-1 DAY AS dates FROM seq_1_to_31;
+-----------------------------------+
| dates |
+-----------------------------------+
| 2019-01-01 |
| 2019-01-02 |
| 2019-01-03 |
| 2019-01-04 |
| 2019-01-05 |
| 2019-01-06 | etc.
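So, what you do is LEFT JOIN your table to the generated dates, so that days with no rows are kept and COUNT(yours.id) returns 0 for them:
SELECT dates.dy AS date, COUNT(yours.id) AS sum
FROM ( SELECT CURDATE() - INTERVAL seq DAY AS dy FROM seq_0_to_29 ) AS dates
LEFT JOIN customers AS yours ON DATE(yours.o_date) = dates.dy
GROUP BY dates.dy
ORDER BY dates.dy;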
Your secondary question about how to ask for a date range -- both of your attempts give the same result with the same performance.
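MariaDB's `seq_1_to_31` tables come from its Sequence storage engine, so the sketch below swaps in a recursive CTE to build the calendar. It runs on SQLite through Python's sqlite3 module. The fixed start date and six-day range are assumptions chosen to cover the question's sample rows, and the function and constant names are illustrative. The part that carries over is the LEFT JOIN plus `COUNT(c.id)`: COUNT of a column skips the NULLs produced for unmatched days, so empty days come out as 0.

```python
import sqlite3

# Sample rows from the question: (id, o_date)
ORDERS = [
    (1, "2021-11-23"), (2, "2021-11-22"), (3, "2021-11-20"), (4, "2021-11-20"),
    (5, "2021-11-19"), (6, "2021-11-18"), (7, "2021-11-18"),
]


def daily_counts(rows, start_date, num_days):
    """Return (day, total) for every day in the range, including days with no rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customers (id INTEGER, o_date TEXT)")
    con.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    sql = """
        WITH RECURSIVE calendar(dy, n) AS (
            SELECT date(?), 1
            UNION ALL
            SELECT date(dy, '+1 day'), n + 1 FROM calendar WHERE n < ?
        )
        SELECT cal.dy AS day, COUNT(c.id) AS total  -- COUNT(c.id) ignores NULLs from unmatched days
        FROM calendar AS cal
        LEFT JOIN customers AS c ON date(c.o_date) = cal.dy
        GROUP BY cal.dy
        ORDER BY cal.dy
    """
    result = con.execute(sql, (start_date, num_days)).fetchall()
    con.close()
    return result


for day, total in daily_counts(ORDERS, "2021-11-18", 6):
    print(day, total)
```

For "the last 30 days" you would start at `date('now', '-29 days')` and generate 30 rows.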

Get occupancy per every 15-minute slot

We have a room where we can only have XX number of people inside due to current limitations. They come at different times and stay for different lengths of time.
I'm trying to get a sum of people currently inside for each 15-min period for a specific date. The server is MySQL 8.0.21 deployed on AWS RDS.
MySQL 8.0 Table: Booking
ID Name PartySize Date BookedFrom BookedTo
1 John 2 2021-01-01 2021-01-01 08:30:00 2021-01-01 10:00:00
2 Mary 4 2021-01-01 2021-01-01 09:00:00 2021-01-01 11:00:00
3 Nick 3 2021-01-01 2021-01-01 10:30:00 2021-01-01 12:30:00
I also have a "helper table" with one row for each 15-minute slot of the 24-hour day:
MySQL Table: Timeslot
ID Time
1 00:00:00
2 00:15:00
3 00:30:00
...
35 08:30:00
37 09:00:00
38 09:15:00
For example, when I run the query below, I get the correct count (6 people) for 09:15. What is the most efficient way to get this result for each 15-minute slot? Please note that while the BookedTo (datetime field) value may be past midnight, I will only ever query for a specific date.
SELECT
t.id, b.date, t.time, SUM(b.partysize) AS total
FROM
booking b,
timeslot t
WHERE
b.date = '2021-01-01'
AND t.time = '09:15:00'
AND b.bookedfrom <= '2021-01-01 09:15:00'
AND b.bookedto >= '2021-01-01 09:15:00'
GROUP BY t.id, b.date, t.time
Looking for this output for all times (including zeros)
Slot_ID Date Time Total
33 2021-01-01 08:00:00 0
34 2021-01-01 08:15:00 0
35 2021-01-01 08:30:00 2
36 2021-01-01 08:45:00 2
37 2021-01-01 09:00:00 6
38 2021-01-01 09:15:00 6
SELECT
t.id as slot_id,
coalesce(b.date, '2021-01-01') as date,
t.time,
coalesce(sum(b.partysize),0) as total
FROM
timeslot t
LEFT JOIN booking b
ON t.time >= TIME(b.bookedfrom) AND t.time < TIME(b.bookedto) AND b.date = '2021-01-01'
WHERE
t.time BETWEEN '08:00:00' AND '17:00:00'
GROUP BY
t.id,
b.date,
t.time
This works because a single booking row matches multiple timeslot rows: one for every slot its time range covers.
Some of your other requirements are confusing, though. You say it's only for 8am to 5pm, but also that bookings might extend to the next day. Does that mean a booking could start at 4pm and finish at 9am the next day? In that case you might need to change AND b.date = '2021-01-01' to something like AND (DATE(b.bookedfrom) = '2021-01-01' OR DATE(b.bookedto) = '2021-01-01').
Use a CTE that returns the specific date you want results for (which may not be the same as the Date column in Booking) and CROSS JOIN it to Timeslot.
Then LEFT JOIN the result to Booking and aggregate:
WITH cte(Date) AS (SELECT '2021-01-01')
SELECT t.ID, t.time, c.Date,
COALESCE(SUM(b.PartySize), 0) Total
FROM cte c CROSS JOIN Timeslot t
LEFT JOIN Booking b
ON b.BookedFrom <= CONCAT(c.Date, ' ', t.time)
AND b.BookedTo >= CONCAT(c.Date, ' ', ADDTIME(t.time, '00:15:00'))
WHERE t.time BETWEEN '08:00:00' AND '17:00:00'
GROUP BY t.ID, c.Date, t.time
Since BookedFrom and BookedTo may not contain the same date, it is not safe to compare only the time parts of the 2 columns to the column time of Timeslot.
This is why all these conditions in the ON clause are needed.
See the demo.
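The slot-by-slot LEFT JOIN can be sketched on SQLite through Python's sqlite3 module. Like the CTE answer, this version compares full datetimes. Unlike it, a booking counts in a slot when it is in the room at the slot's start time (`BookedFrom <= slot start < BookedTo`), the half-open rule the first answer uses. That rule, generating the 96 timeslot rows in Python, and the lowercase table and function names are assumptions of the sketch.

```python
import sqlite3

# Booking rows from the question.
BOOKINGS = [
    (1, "John", 2, "2021-01-01", "2021-01-01 08:30:00", "2021-01-01 10:00:00"),
    (2, "Mary", 4, "2021-01-01", "2021-01-01 09:00:00", "2021-01-01 11:00:00"),
    (3, "Nick", 3, "2021-01-01", "2021-01-01 10:30:00", "2021-01-01 12:30:00"),
]


def occupancy(bookings, day, first_slot, last_slot):
    """Return (slot_id, time, people) for each 15-minute slot of `day`, zeros included."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE booking (id INTEGER, name TEXT, partysize INTEGER,"
        " date TEXT, bookedfrom TEXT, bookedto TEXT)"
    )
    con.executemany("INSERT INTO booking VALUES (?, ?, ?, ?, ?, ?)", bookings)
    con.execute("CREATE TABLE timeslot (id INTEGER, time TEXT)")
    # 96 slots per day: id 1 = 00:00:00, id 2 = 00:15:00, ..., id 96 = 23:45:00
    slots = [(i + 1, f"{i // 4:02d}:{(i % 4) * 15:02d}:00") for i in range(96)]
    con.executemany("INSERT INTO timeslot VALUES (?, ?)", slots)
    sql = """
        SELECT t.id, t.time, COALESCE(SUM(b.partysize), 0) AS total
        FROM timeslot AS t
        LEFT JOIN booking AS b
          -- compare full datetimes so bookings that run past midnight still match
          ON b.bookedfrom <= (? || ' ' || t.time)
         AND b.bookedto   >  (? || ' ' || t.time)
        WHERE t.time BETWEEN ? AND ?
        GROUP BY t.id, t.time
        ORDER BY t.id
    """
    result = con.execute(sql, (day, day, first_slot, last_slot)).fetchall()
    con.close()
    return result


for row in occupancy(BOOKINGS, "2021-01-01", "08:00:00", "09:15:00"):
    print(row)
```

With the strict `<` on BookedTo, John's party no longer counts at 10:00. With BETWEEN on both ends, as in the last answer, it would. Which edge convention is right depends on whether a booking ending at 10:00 should still occupy the 10:00 slot.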
This query works great. If you want all dates for all slots, you will need a date table too (ideally cross join the dates and the timeslots).
Use an inner join if you only want matching dates and timeslots.
SELECT t.id as slot_id
, b.date
, t.time as slot
, sum(ifnull(party_size,0)) as total
FROM test.timeslot t
LEFT JOIN test.booking b
ON t.time BETWEEN time(b.booked_from) AND time(b.booked_to)
GROUP BY t.id
, b.date
, t.time;
for all timeslots and selected dates:
https://www.db-fiddle.com/f/gLt2Fs8HTDUakMahZHxcTi/0
for matching timeslots and dates:
SELECT t.id as slot_id
, b.date
, t.time as slot
, sum(ifnull(party_size,0)) as total
FROM test.timeslot t
JOIN test.booking b
ON t.time BETWEEN time(b.booked_from) AND time(b.booked_to)
GROUP BY t.id
, b.date
, t.time;

How to include some data and other data in SQL conditioned to date?

I have a query for a management platform that selects users and some of their data, with some filters (such as the group they are in), together with the amount they produced (in $) last month. It gets the last existing record from last month using MAX(created_date), so the platform can show how much each user produced this month and in the previous month (us.amount_produced and up.amount_produced last_month_amount).
The problem is that it doesn't select users that are new (that haven't produced any amount last month), and I need those to be returned too.
Any help is appreciated, thanks.
(I was thinking about doing a JOIN or even two queries, but I'm not sure about the best approach.)
Note in the examples below that user #3 didn't have any logs in the User_Performance table before February; he was created in February. So the query below won't return him (I need it to return him).
User table structure:
Users
id email login amount_produced created_date
---------------------------------------------
1 foo@bar.com foo 1000 2019-12-20 22:30:01
2 jack@gmail.com jack 0 2019-12-20 22:30:01
3 john@gmail.com john 2000 2020-02-01 00:00:01
User_Group_Config table structure:
User_Group_Config
user_id group_id
---------------------------------------------
1 4
2 1
3 4
User_Performance table structure (a log table; a job inserts data into it every hour, calculating each user's productivity):
User_Performance
user_id amount_produced created_date
---------------------------------------------
1 500 2020-01-31 22:30:01
2 0 2020-01-31 22:30:01
1 500 2020-01-31 23:30:01
2 0 2020-01-31 23:30:01
1 1000 2020-02-01 00:30:01
2 0 2020-02-01 00:30:01
3 0 2020-02-01 00:30:01
SELECT
us.id,
us.email,
us.login,
ugc.group_id,
up.user_id,
up.amount_produced last_month_amount
FROM
db.User_Performance AS up,
db.User_Group_Config ugc,
db.User AS us
WHERE
up.created_date IN (SELECT
MAX(created_date)
FROM
User_Performance
WHERE
/* Here it filters only users that have data last month, I need these AND the ones that have no data to return zero here or null or undefined at this row)*/
MONTH(created_date) = MONTH(CURRENT_DATE - INTERVAL 1 MONTH)
GROUP BY user_id)
AND ugc.group_id = 4
AND up.user_id = ugc.user_id
AND us.id = up.user_id;
Desired results (note that user #2 wasn't selected since his group_id is 1):
Results
(current month) (previous month)
id email login amount_produced last_month_amount
---------------------------------------------
1 foo@bar.com foo 1000 500
3 john@gmail.com john 0 null or 0
Test
SELECT
us.id,
us.contact_phone,
us.email,
us.first_name,
us.last_name,
us.login,
ugc.group_id,
us.create_date,
us.expire_date,
us.profile_photo,
us.dashboard_enabled,
us.general_rating,
us.rework_rating,
us.amount_produced,
us.amount_spent,
up.user_id,
up.amount_produced last_month_amount
FROM db.User_Performance AS up
LEFT JOIN db.User_Group_Config ugc ON up.user_id = ugc.user_id AND ugc.group_id = 4
LEFT JOIN db.User us ON us.id = up.user_id
WHERE
up.created_date IN (SELECT
MAX(created_date)
FROM
User_Performance
WHERE
/* Here it filters only users that have data last month, I need these AND the ones that have no data to return zero here or null or undefined at this row)*/
MONTH(created_date) = MONTH(CURRENT_DATE - INTERVAL 1 MONTH)
GROUP BY user_id);
Solved using this, with subquery and JOIN (not the best solution, but a solution):
SELECT
us.id,
us.email,
us.login,
ugc.group_id,
us.amount_produced,
(
SELECT
perf.amount_produced
FROM
User_Performance perf
WHERE
perf.user_id = us.id AND
perf.created_date BETWEEN DATE_FORMAT(CURRENT_DATE - INTERVAL 1 MONTH, '%Y-%m-01 00:00:00') and CONCAT(LAST_DAY(CURRENT_DATE - INTERVAL 1 MONTH), " 23:59:59")
ORDER BY
perf.created_date DESC
LIMIT 1
) as amount_produced_last_month
FROM
User AS us
INNER JOIN
User_Group_Config ugc ON ugc.user_id = us.id
WHERE
ugc.group_id = 4;
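A sketch of why the self-answer works, run on SQLite through Python's sqlite3 module. The last-month lookup sits in a correlated subquery in the SELECT list instead of filtering User_Performance in WHERE, so a user with no rows that month still appears, with NULL. Two assumptions keep the sketch time-independent. First, the month is passed as a half-open range [month_start, next_month_start) instead of `CURRENT_DATE - INTERVAL 1 MONTH`. Second, that range replaces `BETWEEN ... 23:59:59`, which would also avoid missing rows stamped with fractional seconds after 23:59:59. The email and date columns of User are omitted, and the function and constant names are illustrative.

```python
import sqlite3

USERS = [(1, "foo"), (2, "jack"), (3, "john")]            # (id, login)
GROUPS = [(1, 4), (2, 1), (3, 4)]                         # (user_id, group_id)
PERFORMANCE = [                                           # (user_id, amount_produced, created_date)
    (1, 500, "2020-01-31 22:30:01"), (2, 0, "2020-01-31 22:30:01"),
    (1, 500, "2020-01-31 23:30:01"), (2, 0, "2020-01-31 23:30:01"),
    (1, 1000, "2020-02-01 00:30:01"), (2, 0, "2020-02-01 00:30:01"),
    (3, 0, "2020-02-01 00:30:01"),
]


def last_month_amounts(month_start, next_month_start, group_id=4):
    """Return (id, login, last_month_amount); the amount is None if the user has no rows that month."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE users (id INTEGER, login TEXT)")
    con.execute("CREATE TABLE user_group_config (user_id INTEGER, group_id INTEGER)")
    con.execute("CREATE TABLE user_performance (user_id INTEGER, amount_produced INTEGER, created_date TEXT)")
    con.executemany("INSERT INTO users VALUES (?, ?)", USERS)
    con.executemany("INSERT INTO user_group_config VALUES (?, ?)", GROUPS)
    con.executemany("INSERT INTO user_performance VALUES (?, ?, ?)", PERFORMANCE)
    sql = """
        SELECT us.id, us.login,
               (SELECT perf.amount_produced
                  FROM user_performance AS perf
                 WHERE perf.user_id = us.id
                   AND perf.created_date >= ?
                   AND perf.created_date <  ?
                 ORDER BY perf.created_date DESC
                 LIMIT 1) AS last_month_amount   -- latest row of the month, or NULL
        FROM users AS us
        JOIN user_group_config AS ugc ON ugc.user_id = us.id
        WHERE ugc.group_id = ?
        ORDER BY us.id
    """
    result = con.execute(sql, (month_start, next_month_start, group_id)).fetchall()
    con.close()
    return result


print(last_month_amounts("2020-01-01", "2020-02-01"))
```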

Amazon Redshift query to get delinquent amount and days past due at the end of month

Changing the question because of a misunderstanding in use case.
Amazon Redshift Query for the following problem statement.
The data structure:
id - primary key
acc_id - id unique to a loan account (this id will be the same for all EMIs of a particular loan account, so it may be repeated 6 or 12 times depending on the loan tenure, which can be 6 or 12 months)
status - PAID or UNPAID (unpaid EMIs are followed by unpaid EMIs only)
s_id - a scheduling id; consecutive numbers for a particular loan id
due_date - the due date for that particular EMI
principal - amount that is due
The table:
id acc_id status s_id due_date principal
9999957 10003 PAID 102 2018-07-02 12:00:00 4205
9999958 10003 UNPAID 103 2018-08-02 12:00:00 4100
9999959 10003 UNPAID 104 2018-09-02 12:00:00 4266
9999960 10003 UNPAID 105 2018-10-02 12:00:00 4286
9999962 10004 PAID 106 2018-07-02 12:00:00 3200
9999963 10004 PAID 107 2018-08-02 12:00:00 3100
9999964 10004 UNPAID 108 2018-09-02 12:00:00 3266
9999965 10004 UNPAID 109 2018-10-02 12:00:00 3286
The use case -
The unpaid amount becomes delinquent (overdue) after the due_date.
So I need to calculate the delinquent amount at the end of every month, from the first due_date (in this case 2nd July) to the last due_date (assume it to be 2nd November, which is the current month).
I also need to calculate days past due at the end of that month.
Illustration from the above data:
From the sample data provided, no EMI is overdue at the end of July, so the delinquent amount is 0.
But at the end of August, id 9999958 is overdue: as of 31st August the delinquent amount is 4100 and days past due is 29 (31st August minus 2nd August).
The catch: I need to calculate this for the loan (acc_id) and not the emi.
To explain further: the first EMI will be 29 days past due in the first month and 59 days past due in the second month, while the second EMI will be 29 days past due in the second month. But I need this at the loan level (acc_id).
Continuing the same example for 30th September: acc_id 10003 has been overdue since 2nd August, so as of 30th September the due amount is 8366 (4100 + 4266) and DPD (days_past_due) is 59 (29 + 30).
Also, acc_id 10004 has 3266 due (id 9999964) and DPD is 28 (30th September - 2nd September).
The final output would be something like this:
Month_End DPD_Band Amount
2018/08/31 0-29 4100
2018/08/31 30-59 0
2018/08/31 60-89 0
2018/08/31 90+ 0
2018/09/30 0-29 3266
2018/09/30 30-59 8366
2018/09/30 60-89 0
2018/09/30 90+ 0
Query attempt: DPD bands can be created with CASE statements on the delinquent days. What I really need help with is first generating the month-end dates and then finding the portfolio-level amounts for the different delinquency bands, as explained above.
Edited to be Redshift-compatible after the OP clarified which RDBMS. (MySQL would need a different answer.)
The following creates one record for each month between your first record, and the end of last month.
It then joins on to your unpaid records, and the aggregation chooses which bracket to put the results in to.
WITH
first_month AS
(
SELECT LAST_DAY(MIN(due_date)) AS end_date FROM yourTable
),
months AS
(
SELECT
LAST_DAY(ADD_MONTHS(first_month.end_date, s.id)) AS end_date
FROM
first_month
CROSS JOIN
generate_series(
1,
DATEDIFF(month, (SELECT end_date FROM first_month), CURRENT_DATE)
)
AS s(id)
),
monthly_delinquents AS
(
SELECT
yourTable.*,
months.end_date AS month_end_date,
DATEDIFF(DAY, yourTable.due_date, months.end_date) AS days_past_due
FROM
months
LEFT JOIN
yourTable
ON yourTable.status = 'UNPAID'
AND yourTable.due_date < months.end_date
)
SELECT
month_end_date,
SUM(CASE WHEN days_past_due >= 00 AND days_past_due < 30 THEN principal ELSE 0 END) AS dpd_00_29,
SUM(CASE WHEN days_past_due >= 30 AND days_past_due < 60 THEN principal ELSE 0 END) AS dpd_30_59,
SUM(CASE WHEN days_past_due >= 60 AND days_past_due < 90 THEN principal ELSE 0 END) AS dpd_60_89,
SUM(CASE WHEN days_past_due >= 90 THEN principal ELSE 0 END) AS dpd_90plus
FROM
monthly_delinquents
GROUP BY
month_end_date
ORDER BY
month_end_date
That said, pivoting results like this is normally a bad idea. What happens when something is a year past due? It just sits in the 90plus category and never moves. And if you want to expand the bands, you need to change the query and any other query you ever write that depends on it.
Instead, you could normalise your output...
WITH
first_month AS
(
SELECT LAST_DAY(MIN(due_date)) AS end_date FROM yourTable
),
months AS
(
SELECT
LAST_DAY(ADD_MONTHS(first_month.end_date, s.id)) AS end_date
FROM
first_month
CROSS JOIN
generate_series(
1,
DATEDIFF(month, (SELECT end_date FROM first_month), CURRENT_DATE)
)
AS s(id)
),
monthly_delinquents AS
(
SELECT
yourTable.*,
months.end_date AS month_end_date,
DATEDIFF(DAY, yourTable.due_date, months.end_date) AS days_past_due
FROM
months
LEFT JOIN
yourTable
ON yourTable.status = 'UNPAID'
AND yourTable.due_date < months.end_date
)
SELECT
month_end_date,
(days_past_due / 30) * 30 AS days_past_due_band,
SUM(principal) AS total_principal,
COUNT(*) AS total_rows
FROM
monthly_delinquents
GROUP BY
month_end_date,
(days_past_due / 30) * 30
ORDER BY
month_end_date,
(days_past_due / 30) * 30
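Neither query above groups by acc_id, so each EMI is banded by its own age rather than by the loan's oldest unpaid due date, which is the "catch" in the question. The sketch below does that loan-level step first and then bands the loans. It runs on SQLite through Python's sqlite3 module, which changes the date functions:

- A recursive CTE with `date(..., 'start of month', ...)` replaces Redshift's `generate_series` and `LAST_DAY`.
- `julianday()` replaces `DATEDIFF(day, ...)`.

Like the answers above, it treats today's UNPAID status as if it had held at every past month end. It emits only bands that contain loans, so the zero rows in the desired output would need a cross join against a list of bands. The table name `emi`, the `last_month_end` parameter, and the function name are illustrative.

```python
import sqlite3

# EMI rows from the question: (id, acc_id, status, s_id, due_date, principal)
EMI_ROWS = [
    (9999957, 10003, "PAID", 102, "2018-07-02 12:00:00", 4205),
    (9999958, 10003, "UNPAID", 103, "2018-08-02 12:00:00", 4100),
    (9999959, 10003, "UNPAID", 104, "2018-09-02 12:00:00", 4266),
    (9999960, 10003, "UNPAID", 105, "2018-10-02 12:00:00", 4286),
    (9999962, 10004, "PAID", 106, "2018-07-02 12:00:00", 3200),
    (9999963, 10004, "PAID", 107, "2018-08-02 12:00:00", 3100),
    (9999964, 10004, "UNPAID", 108, "2018-09-02 12:00:00", 3266),
    (9999965, 10004, "UNPAID", 109, "2018-10-02 12:00:00", 3286),
]


def dpd_bands(rows, last_month_end):
    """Return (month_end, band, total) with DPD measured per loan (acc_id)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE emi (id INTEGER, acc_id INTEGER, status TEXT,"
        " s_id INTEGER, due_date TEXT, principal INTEGER)"
    )
    con.executemany("INSERT INTO emi VALUES (?, ?, ?, ?, ?, ?)", rows)
    sql = """
        WITH RECURSIVE month_ends(end_date) AS (
            -- last day of the month containing the first due_date
            SELECT date(MIN(due_date), 'start of month', '+1 month', '-1 day') FROM emi
            UNION ALL
            -- step to the last day of the following month
            SELECT date(end_date, '+1 day', '+1 month', '-1 day')
            FROM month_ends WHERE end_date < ?
        ),
        loan_status AS (
            -- one row per loan per month end: overdue unpaid principal and
            -- days since the loan's OLDEST unpaid due date
            SELECT m.end_date, e.acc_id,
                   SUM(e.principal) AS amount,
                   CAST(julianday(m.end_date) - julianday(date(MIN(e.due_date))) AS INTEGER) AS dpd
            FROM month_ends AS m
            JOIN emi AS e ON e.status = 'UNPAID' AND date(e.due_date) < m.end_date
            GROUP BY m.end_date, e.acc_id
        ),
        banded AS (
            SELECT end_date, amount,
                   CASE WHEN dpd < 30 THEN '0-29'
                        WHEN dpd < 60 THEN '30-59'
                        WHEN dpd < 90 THEN '60-89'
                        ELSE '90+' END AS band
            FROM loan_status
        )
        SELECT end_date, band, SUM(amount) AS total
        FROM banded
        GROUP BY end_date, band
        ORDER BY end_date, band
    """
    result = con.execute(sql, (last_month_end,)).fetchall()
    con.close()
    return result


for row in dpd_bands(EMI_ROWS, "2018-09-30"):
    print(row)
```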

Cumulative Counts by Date Issue

I have a table that lists, for each date, the ids of customers who were active on that day. So a date can include ids that are also present on other dates.
bdate customer_id
2012-01-12 111
2012-01-13 222
2012-01-13 333
2012-01-14 111
2012-01-14 333
2012-01-14 666
2012-01-14 777
I am looking to write a query which calculates the total number of unique ids between two dates - the starting date is the row date and the ending date is a particular date in the future.
My query looks like this:
select
bdate,
count(distinct customer_id) as cts
from users
where bdate between bdate and current_date
group by 1
order by 1
But this produces a count of unique users for each date on its own, like this:
bdate cts
2012-01-12 1
2012-01-13 2
2012-01-14 4
My desired result (a count of users between the row's date and 2012-01-14) is:
bdate cts
2012-01-12 5 - includes (111,222,333,666,777)
2012-01-13 5 - includes (222,333,111,666,777)
2012-01-14 4 - includes (111,333,666,777)
Like @Strawberry said, you can use a self-join like this:
select
t1.bdate,
count(distinct t2.customer_id) as cts
from users t1
join users t2 on t2.bdate between t1.bdate and current_date
group by t1.bdate
order by t1.bdate
Joining to t2 gives you all the users between each row's date and current_date; then count the distinct t2.customer_id values, and that's it.
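The self-join can be checked against the question's rows with a small SQLite sketch through Python's sqlite3 module. It takes the end date as a parameter instead of `current_date`, so the sample's 2012-01-14 cut-off can be used. It also reduces t1 to one row per distinct bdate before joining, which avoids repeating the same join work for every customer row on a date. The function and constant names are illustrative.

```python
import sqlite3

# Activity rows from the question: (bdate, customer_id)
ACTIVITY = [
    ("2012-01-12", 111), ("2012-01-13", 222), ("2012-01-13", 333),
    ("2012-01-14", 111), ("2012-01-14", 333), ("2012-01-14", 666), ("2012-01-14", 777),
]


def distinct_customers_until(rows, end_date):
    """For each activity date d, count distinct customers active between d and end_date."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE users (bdate TEXT, customer_id INTEGER)")
    con.executemany("INSERT INTO users VALUES (?, ?)", rows)
    sql = """
        SELECT t1.bdate, COUNT(DISTINCT t2.customer_id) AS cts
        FROM (SELECT DISTINCT bdate FROM users) AS t1   -- one row per start date
        JOIN users AS t2 ON t2.bdate BETWEEN t1.bdate AND ?
        GROUP BY t1.bdate
        ORDER BY t1.bdate
    """
    result = con.execute(sql, (end_date,)).fetchall()
    con.close()
    return result


print(distinct_customers_until(ACTIVITY, "2012-01-14"))
```

Start dates later than the end date drop out, because the inner join finds no t2 rows for them.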