Calculate active users by day, week and month - mysql

I have a temp_table with user_id and date columns, and I want to find the DAU, WAU and MAU, where:
DAU - Count of active users for that day
WAU - Count of active users in last 7 days
MAU - Count of active users in last 30 days
The dates start from a fixed date given in the data, so there can be no comparison against current_date. This is the query I am using:
dau as (Select casted_date, count(user_id) as dau
from temp table
group by casted_date)
select casted date, dau,
sum(dau) over (order by casted_date rows between -6 preceding and current row) as wau,
sum(dau) over (order by casted_date rows between -29 preceding and current row) as mau
from dau;
but the query is giving me an error like this :
syntax error at or near "-".
PS: I am writing the query in mysql

I don't know whether your query logic is completely correct, but the syntax error you are currently seeing comes from the frame clauses of the window function calls. Consider this corrected version:
sum(dau) over (order by casted_date rows between 6 preceding and current row) as wau,
sum(dau) over (order by casted_date rows between 29 preceding and current row) as mau
There is no need to write -6 to refer to the previous 6 rows: a frame offset cannot be negative, and 6 preceding already means the 6 rows before the current one.
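One caveat about the corrected frame: rows between 6 preceding counts rows, not calendar days, and summing daily counts adds a user once for every day they were active. The sketch below is plain Python rather than MySQL, and the sample data and the function name rolling_active_users are made up for illustration. It contrasts the sum of daily counts with a distinct count over a true 7-day window.

```python
from datetime import date, timedelta

# Hypothetical sample activity: (user_id, casted_date)
events = [
    (1, date(2020, 2, 1)),
    (1, date(2020, 2, 2)),
    (2, date(2020, 2, 2)),
    (1, date(2020, 2, 7)),
    (3, date(2020, 2, 7)),
]

def rolling_active_users(events, days):
    """For each active date d, return (sum of daily counts, distinct users)
    over the calendar window [d - (days - 1), d]."""
    result = {}
    for d in sorted({day for _, day in events}):
        start = d - timedelta(days=days - 1)
        in_window = [(u, day) for u, day in events if start <= day <= d]
        # Group users by day, then add up the per-day counts.
        daily = {}
        for u, day in in_window:
            daily.setdefault(day, set()).add(u)
        summed = sum(len(users) for users in daily.values())
        # Count each user once for the whole window.
        distinct = len({u for u, _ in in_window})
        result[d] = (summed, distinct)
    return result

wau = rolling_active_users(events, 7)
for d, (summed, distinct) in wau.items():
    print(d, summed, distinct)
```

In this sample, user 1 is active on three days of the same week, so the summed figure for 2020-02-07 is larger than the number of distinct users. If WAU is meant to be a count of unique users, the distinct figure is probably the one you want.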

First, you have a syntax error: you wrote casted date where you should have casted_date. I would also avoid using date as a column name or alias without escaping it, since it is a MySQL keyword.
You can use self-joins to achieve your goal:
select
td.casted_date,
count(distinct td.id) as DAU,
count(distinct tw.id) as WAU,
count(distinct tm.id) as MAU
from temp td
left join temp tw
on tw.casted_date between date_sub(td.casted_date, interval 6 day) and td.casted_date -- 7 days including the current day
left join temp tm
on tm.casted_date between date_sub(td.casted_date, interval 29 day) and td.casted_date -- 30 days including the current day
group by td.casted_date;
Tested with this schema:
create table temp(
id int primary key auto_increment,
casted_date date
);
insert into temp(casted_date)
values
('2020-02-07'),
('2020-02-07'),
('2020-02-07'),
('2020-02-06'),
('2020-02-06'),
('2020-02-06'),
('2020-01-16'),
('2020-01-16'),
('2020-01-16');
Fiddle can be found here: http://sqlfiddle.com/#!9/441aaa/10

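You can use built-in date functions to partition the window functions. Note that this gives running totals within each calendar week and calendar month, not rolling 7-day and 30-day windows.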
create table temp(
id int primary key auto_increment,
casted_date date,
user_id int
);
insert into temp(casted_date,user_id)
values
('2020-02-07',1),
('2020-02-07',2),
('2020-02-07',3),
('2020-02-06',1),
('2020-02-06',2),
('2020-02-06',4),
('2020-01-16',1),
('2020-01-16',2),
('2020-01-16',1);
WITH
dau as (Select casted_date, count(user_id) as dau
from temp
group by casted_date)
select casted_date, dau,
sum(dau) over (PARTITION BY YEAR(casted_date) ,WEEK(casted_date) order by casted_date ) as wau,
sum(dau) over (PARTITION BY YEAR(casted_date),MONTH(casted_date) order by casted_date ) as mau
from dau
ORDER BY casted_date ASC;
casted_date dau wau mau
2020-01-16 3 3 3
2020-02-06 3 3 3
2020-02-07 3 6 6

Related

Getting the number of users for this year and last year in SQL

My table is like this:
root_tstamp userId
2022-01-26T00:13:24.725+00:00 d2212
2022-01-26T00:13:24.669+00:00 ad323
2022-01-26T00:13:24.629+00:00 adfae
2022-01-26T00:13:24.573+00:00 adfa3
2022-01-26T00:13:24.552+00:00 adfef
... ...
2021-01-26T00:12:24.725+00:00 d2212
2021-01-26T00:15:24.669+00:00 daddfe
2021-01-26T00:14:24.629+00:00 adfda
2021-01-26T00:12:24.573+00:00 466eff
2021-01-26T00:12:24.552+00:00 adfafe
I want to get the number of users in the current year and in previous year like below using SQL.
Date Users previous_year
2022-01-01 10 5
2022-01-02 20 15
The code is written as follows.
select CAST(root_tstamp as DATE) as Date,
count(DISTINCT userid) as users,
count(Distinct case when CAST(root_tstamp as DATE) = dateadd(MONTH,-12,CAST(root_tstamp as DATE)) then userid end) as previous_year
FROM table1
But it returns 0 for previous_year values.
How can I fix that?
Possible solution for SQL Server:
WITH cte AS ( SELECT 2022 [year]
UNION ALL
SELECT 2021 )
SELECT cte.[year],
COUNT(DISTINCT test.userId) current_users_amount,
COUNT(DISTINCT CASE WHEN YEAR(test.root_tstamp) < cte.[year]
THEN test.userId
END) previous_users_amount
FROM test
JOIN cte ON YEAR(test.root_tstamp) <= cte.[year]
GROUP BY cte.[year]
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=88b78aad9acd965bdbac4c85a0b81927
This query (for MySQL) returns, by day, the number of unique userIds whose root_tstamp is in the current year, together with the number of unique userIds for the same day last year. If there is no record for a day in the current year, nothing is displayed for that day. If there are rows for the current year but none for the same day last year, the last-year column shows 0, because COUNT(DISTINCT ...) ignores the NULLs produced by the left join.
SELECT cast(ty.root_tstamp as date) as Dte,
COUNT(DISTINCT ty.userId) as users_this_day,
count(distinct lysd.userid) as users_sameday_lastyear
FROM test ty
left join
test lysd
on cast(lysd.root_tstamp as date)=date_add(cast(ty.root_tstamp as date), interval -1 year)
WHERE YEAR(ty.root_tstamp) = year(current_date())
GROUP BY Dte
If you want output rows for calendar days even when there are no rows in the current year and/or last year, you also need to introduce a calendar table (let's hope that is not what you need).
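To make the expected output of the same-day-last-year join easier to check, here is a small Python sketch of the same logic (not MySQL). The rows are hypothetical, loosely based on the question's sample, and the helper names same_day_last_year and users_vs_last_year are invented for this illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (root_tstamp truncated to a date, userId)
rows = [
    (date(2022, 1, 26), "d2212"),
    (date(2022, 1, 26), "ad323"),
    (date(2022, 1, 26), "d2212"),  # repeat visit, counted once
    (date(2022, 1, 27), "adfae"),
    (date(2021, 1, 26), "d2212"),
    (date(2021, 1, 26), "daddfe"),
    (date(2021, 1, 26), "adfda"),
]

def same_day_last_year(day):
    """Shift a date back one year; Feb 29 falls back to Feb 28."""
    try:
        return day.replace(year=day.year - 1)
    except ValueError:
        return day.replace(year=day.year - 1, day=28)

def users_vs_last_year(rows, year):
    """Map each day of `year` that has rows to
    (distinct users that day, distinct users on the same day last year)."""
    users_by_day = defaultdict(set)
    for day, user in rows:
        users_by_day[day].add(user)
    report = {}
    for day in sorted(d for d in users_by_day if d.year == year):
        previous = users_by_day.get(same_day_last_year(day), set())
        report[day] = (len(users_by_day[day]), len(previous))
    return report

report = users_vs_last_year(rows, 2022)
for day, (users, previous_year) in report.items():
    print(day, users, previous_year)
```

Days that have no rows in the chosen year do not appear in the report at all, which mirrors the behaviour described for the query; filling those gaps is where a calendar table would come in.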

Grafana with MySQL exponential growth graph

I have a MySQL table to record the number of users
id email name created
1 user1@example.com John 2019-02-05 18:53:50.000000
2 user2@example.com Rock 2019-02-06 18:53:50.000000
3 user3@example.com Sena 2019-02-08 18:53:50.000000
4 user4@example.com Anny 2019-02-08 18:53:50.000000
I want to get the exponential growth in count per day
date count
2019-02-05 1
2019-02-06 2
2019-02-07 2
2019-02-08 4
And draw a similar graph on the Grafana portal
I tried using count() but it gives the count of data per day
The query generated by Grafana is
SELECT
UNIX_TIMESTAMP(created) DIV 86400 * 86400 AS "time",
count(id) AS "Verified"
FROM custom_domain_customdomain
WHERE
is_cname_verified = '1' AND
is_txt_verified = '1'
GROUP BY 1
ORDER BY UNIX_TIMESTAMP(created) DIV 86400 * 86400
Here is an example of the steps to create your cumulative count based on a contiguous date range.
The date_range CTE is based on one of the RECURSIVE CTE examples in the MySQL docs
The second CTE is a fairly straightforward LEFT JOIN from the date_range to your users table to do the count of new users per day.
The final SELECT query uses SUM() as a window function to give the cumulative user count.
WITH RECURSIVE
`date_range` (`date`) AS (
-- retrieve minimum date from users for start of date_range
SELECT DATE(MIN(`users`.`created`)) FROM `users`
UNION ALL
SELECT `date` + INTERVAL 1 DAY
FROM `date_range`
-- retrieve maximum date from users for end of date_range
WHERE `date` + INTERVAL 1 DAY <= (SELECT DATE(MAX(`users`.`created`)) FROM `users`)
),
`users_per_day` AS (
SELECT
`date_range`.`date` AS `created_date`,
COUNT(`users`.`id`) AS `new_user_count`
FROM `date_range`
LEFT JOIN `users` ON `users`.`created` BETWEEN `date_range`.`date` AND (`date_range`.`date` + INTERVAL 1 DAY - INTERVAL 1 SECOND)
GROUP BY `date_range`.`date`
)
SELECT
`created_date`,
`new_user_count`,
SUM(`new_user_count`) OVER (ORDER BY `created_date`) as `cumulative_count`
FROM `users_per_day`;
The users_per_day CTE and final SELECT can be combined but I have left them separate as it clearly shows the steps used and the overhead is negligible.
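The three steps (contiguous date range, new users per day, running total) can also be sketched in Python to check the numbers against the table in the question. This is only an illustration of the logic, not a replacement for the query; the function name cumulative_counts is made up, and the dates are the created values from the question truncated to days.

```python
from collections import Counter
from datetime import date, timedelta
from itertools import accumulate

# created dates of the four sample users from the question
created = [
    date(2019, 2, 5),
    date(2019, 2, 6),
    date(2019, 2, 8),
    date(2019, 2, 8),
]

def cumulative_counts(created_dates):
    """Return (day, cumulative user count) for every day between the
    first and last created date, including days without new users."""
    per_day = Counter(created_dates)
    start, end = min(created_dates), max(created_dates)
    # Step 1: contiguous date range, like the recursive CTE.
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    # Step 2: new users per day (0 for days with no sign-ups).
    new_counts = [per_day[d] for d in days]
    # Step 3: running total, like SUM() OVER (ORDER BY created_date).
    return list(zip(days, accumulate(new_counts)))

growth = cumulative_counts(created)
for day, total in growth:
    print(day, total)
```

The day without sign-ups (2019-02-07) keeps the previous total, which is the reason the date range has to be generated first instead of grouping the users table directly.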

SQL Find the average of 3 day closest

I have an SQL structure like this:
Create Table Transactions (
Id integer primary key not null auto_increment,
ResourceId varchar(255),
Price Integer,
TransactionTime date
);
I would like to get the time (TransactionTime) along with the average of 3 days price. For example, the 3 day average of the 22nd will be the average of the 20th, 21st, and 22nd.
Thanks so much.
Presumably, you want this information on each row and for a given resource. If so:
select t.*,
avg(price) over (partition by resourceid
order by transactiontime
range between interval 2 day preceding and current row
) as avg_3
from transactions t;
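The frame range between interval 2 day preceding and current row selects by date value, so a gap in the dates simply leaves fewer rows in the average. The Python sketch below mimics that behaviour for illustration; the sample transactions and the function name avg_3_day are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical transactions: (ResourceId, Price, TransactionTime)
transactions = [
    ("r1", 10, date(2021, 3, 20)),
    ("r1", 20, date(2021, 3, 21)),
    ("r1", 30, date(2021, 3, 22)),
    ("r1", 60, date(2021, 3, 25)),  # no rows on the 23rd or 24th
    ("r2", 5, date(2021, 3, 22)),
]

def avg_3_day(transactions):
    """For each row, average the prices of the same resource whose
    date lies in [TransactionTime - 2 days, TransactionTime]."""
    out = []
    for resource, _, day in transactions:
        start = day - timedelta(days=2)
        window = [
            p for r, p, d in transactions
            if r == resource and start <= d <= day
        ]
        out.append((resource, day, sum(window) / len(window)))
    return out

averages = avg_3_day(transactions)
for resource, day, avg_3 in averages:
    print(resource, day, avg_3)
```

For the row on the 22nd the average covers the 20th, 21st and 22nd, as asked in the question. The row on the 25th only sees itself, because the two preceding days have no transactions.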
For SQL server:
SELECT AVG(Price), MAX(TransactionTime) FROM Transactions GROUP BY FLOOR(DATEDIFF(DAY, GETDATE(), TransactionTime) / 3);
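You can use a nested select:
select t.TransactionTime,
(select avg(t1.Price) -- avg() rather than sum()/3, so days without rows do not skew the result
from Transactions t1
-- rows from the two previous days up to the current row's day
where t1.TransactionTime between date_sub(t.TransactionTime, interval 2 day) and t.TransactionTime) as avg3
from Transactions t;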

mySQL window function with COUNT(DISTINCT)

I want to write in MySQL a window function which gives a 30 day roll, counting unique id's. To be more precise, my database has many entries per day as a timestamp, for many different id's. I want to count each day how many different id's connect, and also to get each day the total number of id's that have been online in the last 30 days.
Consider the following table:
CREATE TABLE `my_database` (
`timestamp` BIGINT(20) UNSIGNED NOT NULL,
`id` VARCHAR(32) NOT NULL);
INSERT INTO my_database (timestamp,id) VALUES (CURDATE(),1);
INSERT INTO my_database (timestamp,id) VALUES (DATE_SUB(CURDATE(), INTERVAL 1 DAY),2);
INSERT INTO my_database (timestamp,id) VALUES (DATE_SUB(CURDATE(), INTERVAL 2 DAY),1);
INSERT INTO my_database (timestamp,id) VALUES (DATE_SUB(CURDATE(), INTERVAL 2 DAY),3);
INSERT INTO my_database (timestamp,id) VALUES (DATE_SUB(CURDATE(), INTERVAL 29 DAY),4);
INSERT INTO my_database (timestamp,id) VALUES (DATE_SUB(CURDATE(), INTERVAL 300 DAY),2);
INSERT INTO my_database (timestamp,id) VALUES (DATE_SUB(CURDATE(), INTERVAL 1000 DAY),5);
Which looks like:
timestamp id
20190730 1
20190729 2
20190728 1
20190728 3
20190701 4
20181003 2
20161102 5
The result I want to get is the following:
date count_day count_30day
2019-07-30 1 4
2019-07-29 1 4
2019-07-28 2 3
2019-07-01 1 1
2018-10-03 1 1
2016-11-02 1 1
I don't know how to get the count_30day column. So far I have written the following:
SELECT DATE(a.`timestamp`) AS 'date',
COUNT(DISTINCT a.id) AS 'count_day',
COUNT(DISTINCT a.id) OVER (ORDER BY DATE(a.`timestamp`) ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) AS 'count_30day'
FROM my_database AS a
GROUP
BY DATE(a.`timestamp`)
ORDER
BY DATE(a.`timestamp`) DESC
However that does not work for the count_30day column. I have been looking at other questions and the documentation and the syntax for the window functions seems to be correct as far as I have seen, but clearly is not as this does not work. How should I write the window function properly? Is there a better way to do this other than COUNT(DISTINCT)? Thanks!!
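ROWS ... PRECEDING counts rows, not days, so a frame of 30 preceding rows only corresponds to 30 days if there is exactly one row per day. MySQL also does not allow DISTINCT inside an aggregate that is used as a window function.
You need a subquery: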
SELECT DATE(a.`timestamp`) AS 'date',
COUNT(DISTINCT a.id) AS 'count_day',
MAX( (SELECT COUNT(DISTINCT ID)
FROM my_database db2
WHERE db2.timestamp between DATE_SUB(a.timestamp, INTERVAL 30 DAY)
and a.timestamp
)
) as count30
FROM my_database AS a
GROUP
BY DATE(a.`timestamp`)
ORDER
BY DATE(a.`timestamp`) DESC
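To check the subquery's logic against the expected table in the question, here is a Python sketch of the same computation (illustrative only, not MySQL). The rows are the dates shown in the question's sample output, and the function name rolling_distinct is made up. As in the query, the window runs from 30 days before the day up to and including the day itself.

```python
from datetime import date, timedelta

# (date, id) rows as displayed in the question
rows = [
    (date(2019, 7, 30), "1"),
    (date(2019, 7, 29), "2"),
    (date(2019, 7, 28), "1"),
    (date(2019, 7, 28), "3"),
    (date(2019, 7, 1), "4"),
    (date(2018, 10, 3), "2"),
    (date(2016, 11, 2), "5"),
]

def rolling_distinct(rows, days=30):
    """Return (day, distinct ids that day, distinct ids in the window
    [day - days, day]) for each day that has rows, newest first."""
    report = []
    for day in sorted({d for d, _ in rows}, reverse=True):
        start = day - timedelta(days=days)
        ids_today = {i for d, i in rows if d == day}
        ids_window = {i for d, i in rows if start <= d <= day}
        report.append((day, len(ids_today), len(ids_window)))
    return report

report = rolling_distinct(rows)
for day, count_day, count_30day in report:
    print(day, count_day, count_30day)
```

Like the correlated subquery, this rescans the rows for every day, which is fine for a sanity check but can become slow on a large table.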

generate_series() equivalent in MySQL

I need to do a query and join with all days of the year but in my db there isn't a calendar table.
After google-ing I found generate_series() in PostgreSQL. Does MySQL have anything similar?
My actual table has something like:
date qty
1-1-11 3
1-1-11 4
4-1-11 2
6-1-11 5
But my query has to return:
1-1-11 7
2-1-11 0
3-1-11 0
4-1-11 2
and so on ..
This is how I do it. It creates a range of dates from 2011-01-01 to 2011-12-31:
select
date_format(
adddate('2011-1-1', @num:=@num+1),
'%Y-%m-%d'
) date
from
any_table,
(select @num:=-1) num
limit
365
-- use limit 366 for leap years if you're putting this in production
The only requirement is that the number of rows in any_table should be greater or equal to the size of the needed range (>= 365 rows in this example). You will most likely use this as a subquery of your whole query, so in your case any_table can be one of the tables you use in that query.
Enhanced version of the solution from @Karolis that ensures it works for any year (including leap years):
select date from (
select
date_format(
adddate('2011-1-1', @num:=@num+1),
'%Y-%m-%d'
) date
from
any_table,
(select @num:=-1) num
limit
366
) as dt
where year(date)=2011
I was looking for this solution without the hard-coded date, and came up with this version, which is valid for the current year (with help from the answers above).
Please note that the
where year(date)=2011
filter is dropped here. In a non-leap year, limit 366 therefore also returns January 1 of the following year, so use limit 365 or filter on year(now()) if that matters. It also does not matter which table is used (as long as it has at least 366 rows, as stated before), because the date is calculated at runtime.
select date from (
select
date_format(
adddate(MAKEDATE(year(now()),1), @num:=@num+1),
'%Y-%m-%d'
) date
from
your_table,
(select @num:=-1) num
limit
366 ) as dt
Just in case someone is looking for generate_series() to generate a series of dates or ints as a temp table in MySQL.
With MySQL 8 (version 8.0.27 here) you can simulate it with a recursive CTE like this:
WITH RECURSIVE nrows(date) AS (
SELECT MAKEDATE(2021,333) UNION ALL
SELECT DATE_ADD(date,INTERVAL 1 day) FROM nrows WHERE date<=CURRENT_DATE
)
SELECT date FROM nrows;
Result:
2021-11-29
2021-11-30
2021-12-01
2021-12-02
2021-12-03
2021-12-04
2021-12-05
2021-12-06
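For the original question (a row for every day, with 0 where there is no data), the generated series still has to be left-joined to the table and summed. The Python sketch below illustrates that step. It assumes the question's dates are day-month-year, and the names date_series and qty_per_day are invented for the example.

```python
from collections import defaultdict
from datetime import date, timedelta

# Rows from the question, assuming "4-1-11" means 4 January 2011
rows = [
    (date(2011, 1, 1), 3),
    (date(2011, 1, 1), 4),
    (date(2011, 1, 4), 2),
    (date(2011, 1, 6), 5),
]

def date_series(start, end):
    """Yield every date from start to end inclusive, one day apart."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)

def qty_per_day(rows, start, end):
    """Sum qty per day and fill days without rows with 0
    (the equivalent of a left join from the series to the table)."""
    totals = defaultdict(int)
    for day, qty in rows:
        totals[day] += qty
    return [(day, totals.get(day, 0)) for day in date_series(start, end)]

report = qty_per_day(rows, date(2011, 1, 1), date(2011, 1, 6))
for day, qty in report:
    print(day, qty)
```

Because the series is bounded by explicit start and end dates, a full year comes out with 365 or 366 entries without any hard-coded limit.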