MySQL query calculate user retention in a certain date-range - mysql

I'm trying to count the users retained over a certain date range. That works with this query and the table below:
+----------+-------------+
| Field | Type |
+----------+-------------+
| id | varchar(17) |
| log_date | date |
+----------+-------------+
SELECT last_day.log_date, COUNT(distinct last_day.id) as users_num
FROM (SELECT DISTINCT log_date, id
FROM `userActivity`) this_day
JOIN (SELECT DISTINCT log_date, id
FROM `userActivity`) last_day
ON this_day.id = last_day.id
AND this_day.log_date = "2018-10-01"
AND last_day.log_date BETWEEN "2018-10-01" AND "2018-10-30"
GROUP BY log_date;
But the problem I'm facing is that I want to treat every day of the date range as day 0 (similar to the following example):
Note that the first row in the picture is the average of the results below it, which I also need to calculate. Does anyone have an idea how I can enhance my query to get a result like the picture?

This solution works only on MySQL 8.x, since it requires CTEs (Common Table Expressions):
with digits as (
select 0 as n union select 1 union select 2 union select 3 union select 4
union select 5 union select 6 union select 7 union select 8 union select 9
),
series as (
select d1.n * 100 + d2.n * 10 + d3.n as n -- number series from 0 to 999
from digits d1
cross join digits d2
cross join digits d3
)
SELECT last_day.log_date, COUNT(distinct last_day.id) as users_num,
date_add("2018-10-01", interval s.n day) as current_start
FROM (SELECT DISTINCT log_date, id
FROM `userActivity`) this_day
JOIN (SELECT DISTINCT log_date, id
FROM `userActivity`) last_day
ON this_day.id = last_day.id
cross join series s
WHERE s.n <= 30
AND this_day.log_date = date_add("2018-10-01", interval s.n day)
AND last_day.log_date BETWEEN date_add("2018-10-01", interval s.n day)
AND date_add("2018-10-30", interval s.n day)
GROUP BY last_day.log_date, date_add("2018-10-01", interval s.n day);
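To make the "every day is day 0" idea concrete, here is a small Python sketch of the same logic, not a replacement for the SQL. The function name `retention_by_start_day` and the sample rows are made up for illustration. For each start day it takes the users active on that day as the cohort and counts how many of them show up again on each following day.

```python
from collections import defaultdict
from datetime import date, timedelta


def retention_by_start_day(activity, first_start, last_start, window_days):
    """For each start day, count day-0 users seen again N days later.

    activity: iterable of (user_id, log_date) pairs, duplicates allowed.
    Returns {start_day: [count at offset 0, count at offset 1, ...]}.
    """
    users_on = defaultdict(set)  # log_date -> distinct user ids (the DISTINCT step)
    for user_id, log_date in activity:
        users_on[log_date].add(user_id)

    result = {}
    start = first_start
    while start <= last_start:
        cohort = users_on.get(start, set())  # users active on this day 0
        result[start] = [
            len(cohort & users_on.get(start + timedelta(days=offset), set()))
            for offset in range(window_days + 1)
        ]
        start += timedelta(days=1)
    return result


# Hypothetical sample data: (user id, log_date)
rows = [
    ("a", date(2018, 10, 1)), ("b", date(2018, 10, 1)),
    ("a", date(2018, 10, 2)), ("c", date(2018, 10, 2)),
    ("a", date(2018, 10, 3)), ("c", date(2018, 10, 3)),
]
table = retention_by_start_day(rows, date(2018, 10, 1), date(2018, 10, 2), 2)
```

Averaging each offset column across the start days would give the summary row the question asks about.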

Related

MySQL query for records that existed at any point each week

I have a table with created_at and deleted_at timestamps. I need to know, for each week, how many records existed at any point that week:
week | records
2022-01 | 4
2022-02 | 5
... | ...
Essentially, records that were created before the end of the week and deleted after the beginning of the week.
I've tried various variations of the following but it's under-reporting and I can't work out why:
SELECT
DATE_FORMAT(created_at, '%Y-%U') AS week,
COUNT(*)
FROM records
WHERE
deleted_at > DATE_SUB(deleted_at, INTERVAL (WEEKDAY(deleted_at)+1) DAY)
AND created_at < DATE_ADD(created_at, INTERVAL 7 - WEEKDAY(created_at) DAY)
GROUP BY week
ORDER BY week
Any help would be massively appreciated!
I would create a table wk that looks like so (for the last 5 weeks of last year):
yrweek | wkstart | wkend
-------+------------+------------
202249 | 2022-11-27 | 2022-12-03
202250 | 2022-12-04 | 2022-12-10
202251 | 2022-12-11 | 2022-12-17
202252 | 2022-12-18 | 2022-12-24
202253 | 2022-12-25 | 2022-12-31
To get there, find a way to create 365 consecutive integers, make all the dates of 2022 out of that, and group them by year-week.
This is an example:
CREATE TABLE wk AS
WITH units(units) AS (
SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION
SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
)
,tens AS(SELECT units * 10 AS tens FROM units )
,hundreds AS(SELECT tens * 10 AS hundreds FROM tens )
,
i(i) AS (
SELECT hundreds +tens +units
FROM units
CROSS JOIN tens
CROSS JOIN hundreds
)
,
dt(dt) AS (
SELECT
DATE_ADD(DATE '2022-01-01', INTERVAL i DAY)
FROM i
WHERE i < 365
)
SELECT
YEAR(dt)*100 + WEEK(dt) AS yrweek
, MIN(dt) AS wkstart
, MAX(dt) AS wkend
FROM dt
GROUP BY yrweek
ORDER BY yrweek;
With that table, go:
SELECT
yrweek
, COUNT(*) AS records
FROM wk
JOIN input_table ON wk.wkstart < input_table.deleted_at
AND wk.wkend > input_table.created_at
GROUP BY
yrweek
;
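The overlap test behind that join can be sketched in Python as follows. This is only an illustration under assumptions: the helper names and sample records are invented, plain dates are used instead of timestamps, and the comparisons are inclusive at both ends.

```python
from datetime import date, timedelta


def week_bounds(first_day, weeks):
    """Yield (week_start, week_end) pairs of 7 consecutive days each."""
    for k in range(weeks):
        start = first_day + timedelta(days=7 * k)
        yield start, start + timedelta(days=6)


def records_per_week(records, first_day, weeks):
    """Count records whose [created, deleted] span touches each week."""
    counts = {}
    for wk_start, wk_end in week_bounds(first_day, weeks):
        counts[wk_start] = sum(
            1 for created, deleted in records
            # created before the week ends AND deleted after it begins
            if created <= wk_end and deleted >= wk_start
        )
    return counts


# Hypothetical records: (created, deleted)
records = [
    (date(2022, 11, 28), date(2022, 12, 5)),
    (date(2022, 12, 6), date(2022, 12, 7)),
    (date(2022, 12, 20), date(2022, 12, 30)),
]
per_week = records_per_week(records, date(2022, 11, 27), 3)
```

Unlike the inner join above, this sketch also reports weeks with zero records; in SQL that would take a LEFT JOIN from the week table.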
I first build a list with the records, their open count, and the closed count
SELECT
created_at,
deleted_at,
(SELECT COUNT(*)
from records r2
where r2.created_at <= r1.created_at ) as new,
(SELECT COUNT(*)
from records r2
where r2.deleted_at <= r1.created_at) as closed
FROM records r1
ORDER BY r1.created_at;
After that it's just adding a GROUP BY:
SELECT
date_format(created_at,'%Y-%U') as week,
MAX((SELECT COUNT(*)
from records r2
where r2.created_at <= r1.created_at )) as new,
MAX((SELECT COUNT(*)
from records r2
where r2.deleted_at <= r1.created_at)) as closed
FROM records r1
GROUP BY week
ORDER BY week;
see: DBFIDDLE
NOTE: Because I use random times, the results will change when re-run. A sample output is:
week | new | closed
2022-00 | 31 | 0
2022-01 | 298 | 64
2022-02 | 570 | 212
2022-03 | 800 | 421
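As a rough cross-check of what the grouped query computes, here is a Python sketch under assumptions (the function name and sample records are invented). Because both running counts only grow with time, the MAX per week equals the counts taken at the latest created_at inside that week.

```python
from datetime import datetime


def weekly_new_and_closed(records):
    """records: list of (created_at, deleted_at or None) datetimes.

    For each week label (same shape as MySQL's '%Y-%U'), return the
    running totals of created and deleted records as of the latest
    created_at that falls in that week.
    """
    latest = {}
    for created, _ in records:
        week = created.strftime("%Y-%U")
        latest[week] = max(latest.get(week, created), created)

    out = {}
    for week, t in sorted(latest.items()):
        new = sum(1 for c, _ in records if c <= t)
        closed = sum(1 for _, d in records if d is not None and d <= t)
        out[week] = (new, closed)
    return out


# Hypothetical records: (created_at, deleted_at)
sample = [
    (datetime(2022, 1, 1, 10, 0), datetime(2022, 1, 3, 9, 0)),
    (datetime(2022, 1, 2, 8, 0), None),
    (datetime(2022, 1, 5, 12, 0), datetime(2022, 1, 10, 0, 0)),
    (datetime(2022, 1, 11, 15, 0), None),
]
totals = weekly_new_and_closed(sample)
```

Subtracting closed from new gives the number of records still open at that point in the week.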

create calendar table and join it to my table - lost as can be

I am not great at MySQL but am trying to learn. I need to create a calendar table and LEFT JOIN it to my data table. The reason is that I am counting bookings each week and need it to show "0" if there are no bookings for that week. I don't think I can use a standard calendar table, because I don't want to have to manually fill in each week for all of eternity. I am looking for a solution that, once set up, keeps adding future dates without human oversight. It should be dynamic and never-ending, unlike some examples I have found. I am on MYSQL 5.4
For the calendar table I have cobbled together this but don't think it is right:
SELECT
DATE(booking_date + INTERVAL (6 - WEEKDAY(booking_date)) DAY) EndOfWeekDate
FROM my_table
group by EndOfWeekDate
ORDER BY EndOfWeekDate DESC
and am trying to LEFT JOIN the above to:
SELECT DATE_FORMAT(booking_date, "%M %d %Y") AS week_Ending, CONCAT(YEAR(booking_date), '/', WEEK(booking_date)) AS week_name,
DATE(booking_date + INTERVAL (6 - WEEKDAY(booking_date)) DAY) EndOfWeekDate,
YEAR(booking_date), WEEK(booking_date), COUNT(*)
FROM my_table
GROUP BY EndOfWeekDate
ORDER BY EndOfWeekDate DESC
I am nowhere close to being able to get this right after toying for about 2 hours. Would someone a lot more experienced than me be able to illuminate where I should be going in order to make this happen?
Desired Display:
June 30 2020 | 2020/26 | 2020 | 26 | 5
July 6th 2020 | 2020/27 | 2020 | 27 | 0
July 13 2020 | 2020/28 | 2020 | 28 | 0
July 20 2020 | 2020/29 | 2020 | 29 | 2
Sample data
ID | Date | NAME
12 | 2020-08-24 | Bob Smith
@Ted Basically, what you have to do is create a derived table and LEFT JOIN your existing table to it, so that every day in the month is listed and you can display data from your table for the days where it is present. This query is copy/paste ready; all you need to do is change the references to my_table to the name of the table you want to get the data from.
Here's the example query:
SELECT `dateList`.`Date` AS `Date`,
CONCAT(YEAR(`dateList`.`Date`),'/',DAY(`dateList`.`Date`)) AS `year_month`,
YEAR(`dateList`.`Date`) AS `year`,
DAY(`dateList`.`Date`) AS `day`,
'Bob Smith' AS `name`,
COUNT(`mt`.`id`) AS `amt` -- COUNT skips NULLs, so days without a match yield 0
FROM
-- ---------------------------------------------------------------------------------------------------------------------------
-- this is the part you can copy/paste to be used to left join in a table of your choice ----------------------------------
(
SELECT `a`.`Date`
FROM (
SELECT LAST_DAY('2020-08-01') - INTERVAL (`a`.`a` + (10 * `b`.`a`) + (100 * `c`.`a`)) DAY AS `Date`
FROM (SELECT 0 AS `a` UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS `a`
CROSS JOIN (SELECT 0 AS `a` UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS `b`
CROSS JOIN (SELECT 0 AS `a` UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS `c`
) AS `a`
WHERE `a`.`Date` BETWEEN '2020-08-01' AND LAST_DAY('2020-08-01')
) AS `dateList`
-- ---------------------------------------------------------------------------------------------------------------------------
LEFT JOIN `my_table` AS `mt` ON `dateList`.`Date` = DATE(`mt`.`date`)
AND `mt`.`name` = 'Bob Smith'
GROUP BY `dateList`.`Date`
ORDER BY `dateList`.`Date` ASC
Here is a working example in sql fiddle.
Let me know if this doesn't make sense or if you need something different.
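Since the question asks for weeks rather than days, here is a Python sketch of the same zero-filling idea at week granularity. It is only an illustration: the function names and the sample bookings are assumptions, and the week-ending date follows the question's own formula (Monday-to-Sunday weeks, as with WEEKDAY()).

```python
from collections import Counter
from datetime import date, timedelta


def week_ending(d):
    """Sunday that closes the Monday-to-Sunday week containing d."""
    return d + timedelta(days=6 - d.weekday())


def bookings_per_week(booking_dates, first, last):
    """Return [(week_ending_date, count)] for every week from first to last."""
    counts = Counter(week_ending(d) for d in booking_dates)
    rows = []
    current = week_ending(first)
    while current <= week_ending(last):
        rows.append((current, counts.get(current, 0)))  # 0 for empty weeks
        current += timedelta(days=7)
    return rows


# Hypothetical bookings
bookings = [date(2020, 8, 24), date(2020, 8, 26), date(2020, 9, 8)]
weekly = bookings_per_week(bookings, date(2020, 8, 24), date(2020, 9, 13))
```

Passing today's date as `last` keeps the list growing without anyone maintaining a calendar by hand.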

Counting reservations for each day, where res could span multiple days and should count for each day

I have a table with reservations in it. Each row is a reservation and has a start & end datetime field.
I want to construct a query which gives me the count of reservations on each day in a certain time interval, e.g. April 2018.
Selecting all the reservations within the given interval is fairly simple:
SELECT * FROM reservation
WHERE start <= '2018-05-01 00:00:00'
AND end >= '2018-04-01 00:00:00'
But then the 'trouble' starts.
I want to display a 'count' of reservations on each day in the interval. But a reservation could span multiple days. So grouping them on DAY(start) is not correct.
I don't want to query each day in the interval separately as this would be very server-intensive.
Is there a way to do this through a MySQL query?
Sample data:
id | start | end
2 | 2018-04-01 12:00:00 | 2018-04-03 09:00:00
3 | 2018-04-01 09:00:00 | 2018-04-01 11:00:00
4 | 2018-04-06 13:00:00 | 2018-05-20 09:00:00
Result for 2018-04-01 to 2018-04-06:
2018-04-01 | 2 (2/3)
2018-04-02 | 1 (2)
2018-04-03 | 1 (2)
2018-04-04 | 0
2018-04-05 | 0
2018-04-06 | 1 (4)
in a sqlfiddle: http://sqlfiddle.com/#!9/e62ffa/2/0
First we will reuse the answer from DBA StackExchange. (You can use the accepted answer if you want, you would just need to create a dedicated table for that).
We will just modify the query a bit by using the condition that you need.
Your condition:
SELECT * FROM reservation
WHERE start <= '2018-05-01 00:00:00'
AND end >= '2018-04-01 00:00:00'
Modified answer from DBA Stackexchange:
SELECT date_field
FROM
(
SELECT
MAKEDATE(YEAR(NOW()),1) +
INTERVAL (MONTH(NOW())-1) MONTH +
INTERVAL daynum DAY date_field
FROM
(
SELECT t * 10 + u daynum
FROM
(SELECT 0 t UNION SELECT 1 UNION SELECT 2 UNION SELECT 3) A,
(SELECT 0 u UNION SELECT 1 UNION SELECT 2 UNION SELECT 3
UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7
UNION SELECT 8 UNION SELECT 9) B
ORDER BY daynum
) AA
) AAA
/*WHERE MONTH(date_field) = MONTH(NOW())*/
WHERE date_field BETWEEN '2018-04-01' AND '2018-05-01'
Take note that I only changed the WHERE Clause.
Now using that query as a DERIVED TABLE, we will include your Reservations table using LEFT JOIN.
SELECT D.date_field
, COUNT(R.Id)
FROM (
/* The query from above goes here */
) D
LEFT JOIN Reservations R ON D.date_field BETWEEN DATE(R.StartDate) AND DATE(R.EndDate)
GROUP BY D.date_field
Notice again that we used the DATE function to truncate the time part of StartDate and EndDate. A bare date such as 2018-04-01 stands for the whole day, but in a comparison with DATETIME values it is treated as 2018-04-01 00:00:00, so it does not fall between 2018-04-01 09:00:00 and 2018-04-01 11:00:00.
Here is a SQL Fiddle Demo of the result.
If someone can shed more light on this, please do. SELECT '2018-04-02' BETWEEN '2018-04-01 23:59:59' AND '2018-04-02 00:00:00' results in 1 (TRUE). It seems that by default a DATE gets a time part of 00:00:00.
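The effect of truncating to dates can be reproduced outside the database. The following Python sketch uses the sample rows from the question; the function name is an assumption. It counts reservations per calendar day and also shows why midnight of 2018-04-01 is not inside a 09:00 to 11:00 reservation on that day.

```python
from datetime import date, datetime, timedelta


def reservations_per_day(reservations, first_day, last_day):
    """Count reservations covering each calendar day, both ends inclusive."""
    counts = {}
    day = first_day
    while day <= last_day:
        counts[day] = sum(
            1 for start, end in reservations
            # same idea as BETWEEN DATE(start) AND DATE(end)
            if start.date() <= day <= end.date()
        )
        day += timedelta(days=1)
    return counts


# Sample data from the question: (start, end)
reservations = [
    (datetime(2018, 4, 1, 12, 0), datetime(2018, 4, 3, 9, 0)),
    (datetime(2018, 4, 1, 9, 0), datetime(2018, 4, 1, 11, 0)),
    (datetime(2018, 4, 6, 13, 0), datetime(2018, 5, 20, 9, 0)),
]
counts = reservations_per_day(reservations, date(2018, 4, 1), date(2018, 4, 6))

# Without truncation, midnight is earlier than the 09:00 start,
# so the day would not match the second reservation.
midnight = datetime(2018, 4, 1)
inside = reservations[1][0] <= midnight <= reservations[1][1]
```

The per-day counts match the result listed in the question for 2018-04-01 to 2018-04-06.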
Update for More Flexible Date Range (2018-04-11)
The query above from DBA StackExchange only lists the days of the current month. I searched a bit and found another good answer on Stack Overflow. Here is a part of the query:
SELECT CURDATE() - INTERVAL (A.A+ (10 * B.A)) DAY AS Date
FROM (
SELECT 0 AS A UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
) AS A
CROSS JOIN (
SELECT 0 AS A UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
) AS B
The query above generates the numbers 0 to 99 using a CROSS JOIN and subtracts each from the current date, so you get dates from today back to 99 days ago. You can add another CROSS JOIN of digits to generate 1000 numbers if necessary.
I assume you will have StartDate and EndDate in your stored procedure or somewhere. We can replace CURDATE() with EndDate and then we will have the 100 days ending at EndDate. We then add a WHERE clause, wrapping the query in a subquery/derived table, to keep only the dates that we need.
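A quick way to convince yourself of the range is to reproduce the digit cross join in Python. This is just a sketch; `days_back` and its parameters are made-up names. Two digit tables give the offsets 0 through 99, and a third extends that to 0 through 999.

```python
from datetime import date, timedelta
from itertools import product

DIGITS = range(10)  # one "SELECT 0 UNION SELECT 1 ... SELECT 9" table


def days_back(anchor, digit_tables=2):
    """Dates from anchor going back 10**digit_tables - 1 days, newest first."""
    offsets = sorted(
        # A.A + 10 * B.A (+ 100 * C.A ...) for every combination of digits
        sum(d * 10 ** place for place, d in enumerate(combo))
        for combo in product(DIGITS, repeat=digit_tables)
    )
    return [anchor - timedelta(days=n) for n in offsets]


anchor = date(2018, 4, 11)
dates = days_back(anchor)            # 100 dates, anchor included
more_dates = days_back(anchor, 3)    # 1000 dates
```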
SELECT D.Date
FROM (
SELECT @endDate - INTERVAL (A.A + (10 * B.A)) DAY AS Date
FROM (
SELECT 0 AS A UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
) AS A
CROSS JOIN (
SELECT 0 AS A UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
) AS B
) AS D
WHERE D.Date BETWEEN @startDate AND @endDate
We can now use LEFT JOIN to include the Reservations table.
Here is another SQL Fiddle Demo for that. This also includes the Start and End Date variables, and a sample date range spanning from a previous year to the current year.
Again if you need more than 100 days of range, we will just need to add another CROSS JOIN of numbers, let's name that as C:
CROSS JOIN (
SELECT 0 AS A UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
) AS C
And then add it to the calculation of past days in the SELECT statement.
SELECT CURDATE() - INTERVAL (A.A + (10 * B.A) + (100 * C.A)) DAY AS Date

non available row in result group by

I used the following query
select month(SubmittedDate), count(policyid) from tblpolicy p join tlkppolicystatus s on p.StatusID=s.StatusID where SubmittedDate between
'2017-01-01' and sysdate() and s.StatusID=1 group by month(SubmittedDate);
This returns the following output which is correct as month number 3 and 4 don't have any data.
Month Total
-----|-----
1 | 62
2 | 34
5 | 1
But I want the output to be like
Month Total
-----|-----
1 | 62
2 | 34
3 | 0
4 | 0
5 | 1
That is, a month that doesn't have any data should still be shown, with a value of 0.
Thanks
If you have data for all months, but none of the data has a status of 1, then the simplest method is probably to use conditional aggregation:
select month(SubmittedDate), sum(s.StatusID = 1)
from tblpolicy p join
tlkppolicystatus s
on p.StatusID=s.StatusID
where SubmittedDate between '2017-01-01' and sysdate()
group by month(SubmittedDate);
Of course, if those conditions don't hold, then the left join with a derived table is the best solution.
Try this
select t2.month as month, coalesce(t1.count,0) as count from
(
select month(SubmittedDate) as month, count(policyid) as count
from tblpolicy p join tlkppolicystatus s on p.StatusID=s.StatusID
where SubmittedDate between
'2017-01-01' and sysdate() and s.StatusID=1 group by month(SubmittedDate)
) as t1 right join
(
select 1 as month union all
select 2 as month union all
select 3 as month union all
select 4 as month union all
select 5 as month union all
select 6 as month union all
select 7 as month union all
select 8 as month union all
select 9 as month union all
select 10 as month union all
select 11 as month union all
select 12 as month
) as t2 on t1.month = t2.month
order by t2.month;
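To try the month-list join without a MySQL server, here is a runnable sketch that uses Python's built-in sqlite3 module instead. SQLite is swapped in purely for convenience, so the month extraction uses SQLite's strftime rather than MySQL's MONTH(); the sample rows are invented, and the lookup table tlkppolicystatus is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblpolicy (policyid INTEGER, StatusID INTEGER, SubmittedDate TEXT);
    INSERT INTO tblpolicy VALUES
        (1, 1, '2017-01-15'), (2, 1, '2017-01-20'),
        (3, 1, '2017-02-03'), (4, 2, '2017-03-09'),
        (5, 1, '2017-05-30');
""")

# Start from the list of months and LEFT JOIN the data onto it, so months
# without a matching policy still produce a row with COUNT = 0.
rows = conn.execute("""
    WITH months(month) AS (VALUES (1), (2), (3), (4), (5))
    SELECT m.month, COUNT(p.policyid) AS total
    FROM months AS m
    LEFT JOIN tblpolicy AS p
           ON CAST(strftime('%m', p.SubmittedDate) AS INTEGER) = m.month
          AND p.StatusID = 1
    GROUP BY m.month
    ORDER BY m.month
""").fetchall()
```

The status filter sits in the ON clause on purpose: placing it in WHERE would discard the NULL-extended rows and the empty months with them.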

Count/group rows based on date including missing

I have a bunch of rows in my db that signify orders, i.e.
id | date
---------------------
1 | 2013-09-01
2 | 2013-09-01
3 | 2013-09-02
4 | 2013-09-04
5 | 2013-09-04
What I'd like is to display the count of rows per day, including missing days, so the output would be:
2013-09-01 | 2
2013-09-02 | 1
2013-09-03 | 0
2013-09-04 | 2
I've seen examples of having 2 tables, one with the records and the other with dates, but I'd ideally like to have a single table for this.
I can currently find the rows that have a record, but not days that do not.
Does anyone have an idea on how to do this?
Thanks
If you want to get data for last 7 days, you can generate your pseudo-table via UNION, like:
SELECT
COUNT(t.id),
fixed_days.fixed_date
FROM t
RIGHT JOIN
(SELECT CURDATE() as fixed_date
UNION ALL SELECT CURDATE() - INTERVAL 1 day
UNION ALL SELECT CURDATE() - INTERVAL 2 day
UNION ALL SELECT CURDATE() - INTERVAL 3 day
UNION ALL SELECT CURDATE() - INTERVAL 4 day
UNION ALL SELECT CURDATE() - INTERVAL 5 day
UNION ALL SELECT CURDATE() - INTERVAL 6 day) AS fixed_days
ON t.`date` = `fixed_days`.`fixed_date`
GROUP BY
`fixed_days`.`fixed_date`
See this fiddle demo. Note that if your fields are of the DATETIME type, you'll need to apply DATE() first:
SELECT
COUNT(t.id),
fixed_days.fixed_date
FROM t
RIGHT JOIN
(SELECT CURDATE() as fixed_date
UNION ALL SELECT CURDATE() - INTERVAL 1 day
UNION ALL SELECT CURDATE() - INTERVAL 2 day
UNION ALL SELECT CURDATE() - INTERVAL 3 day
UNION ALL SELECT CURDATE() - INTERVAL 4 day
UNION ALL SELECT CURDATE() - INTERVAL 5 day
UNION ALL SELECT CURDATE() - INTERVAL 6 day) AS fixed_days
ON DATE(t.`date`) = `fixed_days`.`fixed_date`
GROUP BY
`fixed_days`.`fixed_date`
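The same idea, a fixed list of recent days with the counts attached to it, can be sketched in Python. The function name `orders_per_day` is an assumption; the sample rows are the ones from the question, with "today" pinned to 2013-09-04 so the output is reproducible.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def orders_per_day(order_times, today, days=7):
    """Zero-filled counts for the `days` most recent days ending at `today`."""
    counts = Counter(
        # Truncate DATETIME-like values to their date, like DATE() in SQL
        t.date() if isinstance(t, datetime) else t
        for t in order_times
    )
    window = [today - timedelta(days=n) for n in range(days)]
    return [(d, counts.get(d, 0)) for d in sorted(window)]


orders = [
    date(2013, 9, 1), date(2013, 9, 1),
    date(2013, 9, 2),
    datetime(2013, 9, 4, 8, 30), date(2013, 9, 4),
]
per_day = orders_per_day(orders, today=date(2013, 9, 4), days=4)
```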
Try Something like this!
SELECT
c1,
GROUP_CONCAT(c2 ORDER BY c2) AS 'C2 values'
FROM table
GROUP BY c1;
To retrieve a list of c1 values for which there exist specific values in another column c2, you need an IN clause specifying the c2 values and a HAVING clause specifying the required number of different items in the list ...
SELECT c1
FROM table
WHERE c2 IN (1,2,3,4)
GROUP BY c1
HAVING COUNT(DISTINCT c2)=4;
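What the IN plus HAVING COUNT(DISTINCT ...) combination does can be shown with a few lines of Python. This is a sketch with invented names and rows: keep only the wanted c2 values, collect the distinct ones per c1, and accept a group only if it has all of them.

```python
from collections import defaultdict


def groups_with_all(rows, required):
    """rows: iterable of (c1, c2). Return c1 values that have every required c2."""
    required = set(required)
    seen = defaultdict(set)
    for c1, c2 in rows:
        if c2 in required:        # WHERE c2 IN (...)
            seen[c1].add(c2)      # DISTINCT c2 within the group
    # HAVING COUNT(DISTINCT c2) = number of required values
    return sorted(c1 for c1, vals in seen.items() if len(vals) == len(required))


rows = [
    ("x", 1), ("x", 2), ("x", 3), ("x", 4), ("x", 4),
    ("y", 1), ("y", 2), ("y", 5),
]
matches = groups_with_all(rows, [1, 2, 3, 4])
```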
For more help, see this related question:
Counting all rows with specific columns and grouping by week
Use the following stored procedure. It lets you get results for more than 7 days; just pass the number of days you want:
DELIMITER $$
CREATE DEFINER=`server`@`%` PROCEDURE `test`(d INT)
BEGIN
-- Temporary calendar holding one row per requested day
CREATE TEMPORARY TABLE dates
(
f_day DATE
);
WHILE d > 0 DO
-- Use date arithmetic; DATE(NOW())-d would subtract from the number YYYYMMDD
INSERT INTO dates SELECT CURDATE() - INTERVAL d DAY;
SET d = d - 1;
END WHILE;
-- COUNT(t.id) is 0 for days without a matching row
SELECT
COUNT(t.`id`),
`fixed_days`.`f_day`
FROM t
RIGHT JOIN
dates AS `fixed_days`
ON DATE(t.`date`) = `fixed_days`.`f_day`
GROUP BY
`fixed_days`.`f_day`;
DROP TEMPORARY TABLE dates;
END$$
DELIMITER ;