I want to write a window function in MySQL that gives a rolling 30-day count of unique IDs. More precisely, my table has many entries per day, each with a timestamp, for many different IDs. For each day I want to count how many different IDs connected, and also the total number of distinct IDs that have been online in the last 30 days.
Consider the following table:
CREATE TABLE `my_database` (
`timestamp` BIGINT(20) UNSIGNED NOT NULL,
`id` VARCHAR(32) NOT NULL);
INSERT INTO my_database (timestamp,id) VALUES (CURDATE(),1);
INSERT INTO my_database (timestamp,id) VALUES (DATE_SUB(CURDATE(), INTERVAL 1 DAY),2);
INSERT INTO my_database (timestamp,id) VALUES (DATE_SUB(CURDATE(), INTERVAL 2 DAY),1);
INSERT INTO my_database (timestamp,id) VALUES (DATE_SUB(CURDATE(), INTERVAL 2 DAY),3);
INSERT INTO my_database (timestamp,id) VALUES (DATE_SUB(CURDATE(), INTERVAL 29 DAY),4);
INSERT INTO my_database (timestamp,id) VALUES (DATE_SUB(CURDATE(), INTERVAL 300 DAY),2);
INSERT INTO my_database (timestamp,id) VALUES (DATE_SUB(CURDATE(), INTERVAL 1000 DAY),5);
Which looks like:
timestamp id
20190730 1
20190729 2
20190728 1
20190728 3
20190701 4
20181003 2
20161102 5
The result I want to get is the following:
date count_day count_30day
2019-07-30 1 4
2019-07-29 1 4
2019-07-28 2 3
2019-07-01 1 1
2018-10-03 1 1
2016-11-02 1 1
I don't know how to get the count_30day column. So far I have written the following:
SELECT DATE(a.`timestamp`) AS 'date',
COUNT(DISTINCT a.id) AS 'count_day',
COUNT(DISTINCT a.id) OVER (ORDER BY DATE(a.`timestamp`) ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) AS 'count_30day'
FROM my_database AS a
GROUP
BY DATE(a.`timestamp`)
ORDER
BY DATE(a.`timestamp`) DESC
However that does not work for the count_30day column. I have been looking at other questions and the documentation and the syntax for the window functions seems to be correct as far as I have seen, but clearly is not as this does not work. How should I write the window function properly? Is there a better way to do this other than COUNT(DISTINCT)? Thanks!!
ROWS ... PRECEDING counts rows, not days, so a frame of 30 preceding rows is not a 30-day window.
You need a subquery:
SELECT DATE(a.`timestamp`) AS 'date',
       COUNT(DISTINCT a.id) AS 'count_day',
       -- distinct ids seen between 30 days before this row's timestamp and the timestamp itself
       MAX( (SELECT COUNT(DISTINCT id)
             FROM my_database db2
             WHERE db2.timestamp BETWEEN DATE_SUB(a.timestamp, INTERVAL 30 DAY)
                                     AND a.timestamp
            )
          ) AS count30
FROM my_database AS a
GROUP BY DATE(a.`timestamp`)
ORDER BY DATE(a.`timestamp`) DESC
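As a cross-check of what the subquery is meant to compute, here is a minimal Python sketch (not MySQL) run against the sample rows from the question. The function name `rolling_distinct` and the hard-coded dates (used instead of CURDATE()) are assumptions made for illustration; the window mirrors the inclusive `BETWEEN DATE_SUB(..., INTERVAL 30 DAY) AND ...` condition.

```python
from datetime import date, timedelta

# Sample rows from the question as (date, id); dates are fixed here instead of CURDATE().
rows = [
    (date(2019, 7, 30), "1"),
    (date(2019, 7, 29), "2"),
    (date(2019, 7, 28), "1"),
    (date(2019, 7, 28), "3"),
    (date(2019, 7, 1), "4"),
    (date(2018, 10, 3), "2"),
    (date(2016, 11, 2), "5"),
]


def rolling_distinct(rows, days=30):
    """For each date present, return (date, distinct ids that day,
    distinct ids between date - `days` and date, both ends inclusive)."""
    result = []
    for day in sorted({d for d, _ in rows}, reverse=True):
        start = day - timedelta(days=days)
        ids_day = {i for d, i in rows if d == day}
        ids_window = {i for d, i in rows if start <= d <= day}
        result.append((day, len(ids_day), len(ids_window)))
    return result


for day, count_day, count_30day in rolling_distinct(rows):
    print(day, count_day, count_30day)
```

On this sample the printed rows agree with the result table the question asks for.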
Related
I have a temp_table with user_id and date, and I want to find the DAU, WAU and MAU, where:
DAU - Count of active users for that day
WAU - Count of active users in last 7 days
MAU - Count of active users in last 30 days
The dates start from a date given in the data, so there can be no current_date comparison. This is the query I am running:
dau as (Select casted_date, count(user_id) as dau
from temp table
group by casted_date)
select casted date, dau,
sum(dau) over (order by casted_date rows between -6 preceding and current row) as wau,
sum(dau) over (order by casted_date rows between -29 preceding and current row) as mau
from dau;
but the query is giving me an error like this :
syntax error at or near "-".
PS: I am writing the query in mysql
I don't know if your query logic is completely correct, but the syntax error you are currently seeing comes from the window function calls. Consider this corrected version:
sum(dau) over (order by casted_date rows between 6 preceding and current row) as wau,
sum(dau) over (order by casted_date rows between 29 preceding and current row) as mau
There is no need to use -6 to refer to the previous 6 rows, as 6 preceding already means this.
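To make the frame semantics concrete, here is a small Python sketch of what `ROWS BETWEEN n PRECEDING AND CURRENT ROW` sums. The helper name `sum_rows_preceding` and the daily counts are hypothetical; the list is assumed to be already ordered by date, one value per row.

```python
def sum_rows_preceding(values, n):
    """Sliding sum over the current row and up to n rows before it, like
    SUM(x) OVER (ORDER BY ... ROWS BETWEEN n PRECEDING AND CURRENT ROW)."""
    return [sum(values[max(0, i - n): i + 1]) for i in range(len(values))]


# Hypothetical daily counts, already ordered by date (one value per row).
dau = [3, 1, 4, 1, 5, 9, 2, 6]
print(sum_rows_preceding(dau, 6))  # [3, 4, 8, 9, 14, 23, 25, 28]
```

Two things follow from this model: the frame counts rows, so a missing date makes it span more than 7 calendar days, and summing daily counts counts a user once for every day on which they were active.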
First, you have a syntax error: you wrote casted date where you should have casted_date. I would also avoid using date as an alias without escaping it, since it is a MySQL keyword.
You can use self-joins to achieve your goal:
select
td.casted_date,
count(distinct td.id) as DAU,
count(distinct tw.id) as WAU,
count(distinct tm.id) as MAU
from temp td
left join temp tw
on tw.casted_date between date_sub(td.casted_date, interval 7 day) and td.casted_date
left join temp tm
on tm.casted_date between date_sub(td.casted_date, interval 30 day) and td.casted_date
group by td.casted_date;
Tested with this schema:
create table temp(
id int primary key auto_increment,
casted_date date
);
insert into temp(casted_date)
values
('2020-02-07'),
('2020-02-07'),
('2020-02-07'),
('2020-02-06'),
('2020-02-06'),
('2020-02-06'),
('2020-01-16'),
('2020-01-16'),
('2020-01-16');
Fiddle can be found here: http://sqlfiddle.com/#!9/441aaa/10
You can use built-in date functions to partition the window functions:
create table temp(
id int primary key auto_increment,
casted_date date,
user_id int
);
insert into temp(casted_date,user_id)
values
('2020-02-07',1),
('2020-02-07',2),
('2020-02-07',3),
('2020-02-06',1),
('2020-02-06',2),
('2020-02-06',4),
('2020-01-16',1),
('2020-01-16',2),
('2020-01-16',1);
Records: 9 Duplicates: 0 Warnings: 0
WITH
dau as (Select casted_date, count(user_id) as dau
from temp
group by casted_date)
select casted_date, dau,
sum(dau) over (PARTITION BY YEAR(casted_date) ,WEEK(casted_date) order by casted_date ) as wau,
sum(dau) over (PARTITION BY YEAR(casted_date),MONTH(casted_date) order by casted_date ) as mau
from dau
ORDER BY casted_date ASC;
casted_date dau wau mau
2020-01-16 3 3 3
2020-02-06 3 3 3
2020-02-07 3 6 6
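For comparison, a minimal Python sketch of what a running sum partitioned by calendar month produces, using the three daily counts from the result above. The function name `month_to_date` is an assumption for illustration.

```python
from collections import defaultdict
from datetime import date

# Daily counts from the result above: (casted_date, dau).
daily = [
    (date(2020, 1, 16), 3),
    (date(2020, 2, 6), 3),
    (date(2020, 2, 7), 3),
]


def month_to_date(daily):
    """Running sum of dau that restarts in every (year, month) partition."""
    totals = defaultdict(int)
    out = []
    for day, dau in sorted(daily):
        key = (day.year, day.month)
        totals[key] += dau
        out.append((day, dau, totals[key]))
    return out


for row in month_to_date(daily):
    print(*row)
```

This reproduces the mau column, and it also shows that the figure is a calendar month-to-date total rather than a trailing 30 days: 2020-01-16 is only 22 days before 2020-02-07, yet it is not included in the February total.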
I need to extract data from a MySQL table, but am not allowed to include a record if there's a previous record less than a year old.
Given the following records, only the records 1, 3 and 5 should be included (because record 2 was created 1 month after record 1, and record 4 was created 1 month after record 3):
1 2019-12-21
2 2020-01-21
3 2021-12-21
4 2022-01-21
5 2023-12-21
I came up with the following non-functional solution:
SELECT
*
FROM
table t
WHERE
(created_at > DATE_ADD(
(SELECT
created_at
FROM
table t2
WHERE
t2.created_at < t.created_at
ORDER BY
t2.created_at
DESC LIMIT 1), INTERVAL 1 YEAR)
But this only returns the first and the last record, but not the third:
1 2019-12-21
5 2023-12-21
I know why: the third record gets excluded because record 2 is less than a year old. But record 2 shouldn't be taken into account, because it won't make the list itself.
How can I solve this?
Using LAG, assuming your MySQL version supports it, you can calculate the difference in months with PERIOD_DIFF:
with d as (
select * ,
period_diff(extract(year_month FROM date),
extract(year_month from lag(date,1,date) over (order by date))
) as m
from t
)
select id, date
from d
where m=0 or m>12
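Note that LAG always compares a row with the immediately preceding row, whether or not that row ends up in the result, while the question asks to ignore rows that were themselves excluded. A minimal Python sketch of that rule (not SQL) follows; the helper names are hypothetical, and the strict "more than one year" comparison is an assumption taken from the `created_at > DATE_ADD(..., INTERVAL 1 YEAR)` condition in the question.

```python
from datetime import date


def add_one_year(d):
    """Same calendar day one year later; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def keep_yearly(records):
    """Keep a record only if it is more than one year after the last kept record."""
    kept = []
    last_kept = None
    for rec_id, created in sorted(records, key=lambda r: r[1]):
        if last_kept is None or created > add_one_year(last_kept):
            kept.append(rec_id)
            last_kept = created
    return kept


# Records from the question as (id, created_at).
records = [
    (1, date(2019, 12, 21)),
    (2, date(2020, 1, 21)),
    (3, date(2021, 12, 21)),
    (4, date(2022, 1, 21)),
    (5, date(2023, 12, 21)),
]
print(keep_yearly(records))  # [1, 3, 5]
```

The two rules can disagree: for dates 2020-01-01, 2020-07-01 and 2021-03-01, this sketch keeps the first and third, whereas a comparison against the immediately preceding row would drop the third because it is only 8 months after the second.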
I have a MySQL table to record the number of users:
id email name created
1 user1#example.com John 2019-02-05 18:53:50.000000
2 user2#example.com Rock 2019-02-06 18:53:50.000000
3 user3#example.com Sena 2019-02-08 18:53:50.000000
4 user4#example.com Anny 2019-02-08 18:53:50.000000
I want to get the growth in user count per day, i.e. the cumulative count:
date count
2019-02-05 1
2019-02-06 2
2019-02-07 2
2019-02-08 4
And draw a similar graph on the Grafana portal
I tried using count() but it gives the count of data per day
The query generated in Grafana is:
SELECT
UNIX_TIMESTAMP(created) DIV 86400 * 86400 AS "time",
count(id) AS "Verified"
FROM custom_domain_customdomain
WHERE
is_cname_verified = '1' AND
is_txt_verified = '1'
GROUP BY 1
ORDER BY UNIX_TIMESTAMP(created) DIV 86400 * 86400
Here is an example of the steps to create your cumulative count based on a contiguous date range.
The date_range CTE is based on one of the RECURSIVE CTE examples in the MySQL docs.
The second CTE is a fairly straightforward LEFT JOIN from the date_range to your users table to do the count of new users per day.
The final SELECT query uses SUM() as a window function to give the cumulative user count.
WITH RECURSIVE
`date_range` (`date`) AS (
-- retrieve minimum date from users for start of date_range
SELECT DATE(MIN(`users`.`created`)) FROM `users`
UNION ALL
SELECT `date` + INTERVAL 1 DAY
FROM `date_range`
-- retrieve maximum date from users for end of date_range
WHERE `date` + INTERVAL 1 DAY <= (SELECT DATE(MAX(`users`.`created`)) FROM `users`)
),
`users_per_day` AS (
SELECT
`date_range`.`date` AS `created_date`,
COUNT(`users`.`id`) AS `new_user_count`
FROM `date_range`
-- half-open range: also catches rows with fractional seconds in the last second of the day
LEFT JOIN `users` ON `users`.`created` >= `date_range`.`date` AND `users`.`created` < `date_range`.`date` + INTERVAL 1 DAY
GROUP BY `date_range`.`date`
)
SELECT
`created_date`,
`new_user_count`,
SUM(`new_user_count`) OVER (ORDER BY `created_date`) as `cumulative_count`
FROM `users_per_day`;
The users_per_day CTE and final SELECT can be combined but I have left them separate as it clearly shows the steps used and the overhead is negligible.
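The same three steps (contiguous date range, new users per day, running total) can be sketched in Python to check the expected numbers. This is only a model of the query's logic, using the `created` values from the question's sample table; the function name `cumulative_per_day` is an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta
from itertools import accumulate

# `created` values from the sample table in the question.
created = [
    datetime(2019, 2, 5, 18, 53, 50),
    datetime(2019, 2, 6, 18, 53, 50),
    datetime(2019, 2, 8, 18, 53, 50),
    datetime(2019, 2, 8, 18, 53, 50),
]


def cumulative_per_day(created):
    """Return (day, new users, cumulative users) for every day from min to max."""
    per_day = Counter(ts.date() for ts in created)
    first, last = min(per_day), max(per_day)
    # Step 1: contiguous range of days, including days without any new user.
    days = [first + timedelta(days=n) for n in range((last - first).days + 1)]
    # Step 2: new users per day (Counter returns 0 for missing days).
    new_counts = [per_day[d] for d in days]
    # Step 3: running total, like SUM(...) OVER (ORDER BY created_date).
    return list(zip(days, new_counts, accumulate(new_counts)))


for day, new, total in cumulative_per_day(created):
    print(day, new, total)
```

The cumulative column comes out as 1, 2, 2, 4, which is the result requested in the question, including the row for 2019-02-07 on which nobody signed up.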
What I am trying to do is to get the record count for each day of the last 7 days,
Let's say I have 3 records today, 4 records yesterday, 2 records two days ago, etc.
I'd like to have something like that:
[12/06/2021] 1
[11/06/2021] 4
[10/06/2021] 3
[09/06/2021] 6
[08/06/2021] 7
[07/06/2021] 2
[06/06/2021] 7
(Or get only the count, it's OK too.)
I have a field - message_datetime that saves the datetime.
Is there a way to do this in one query?
What I've done:
select CAST(message_datetime AS DATE),count(message_datetime) from messages group by CAST(message_datetime AS DATE) WHERE message_datetime
It worked but I wanted the last 7 days. Thanks
I was waiting for you to post your own effort, but since somebody has already "jumped the gun" and started to post answers, here goes.
Assuming the name of the table is my_table, then try:
select date(message_datetime) as message_date, count(*) as cnt from my_table
where datediff(curdate(), date(message_datetime)) < 7
group by date(message_datetime)
order by date(message_datetime) desc
Update
Following Strawberry's suggestion, here is an updated query that should be more performant if message_datetime is an indexed column:
select date(message_datetime) as message_date, count(*) as cnt from my_table
where message_datetime >= date_sub(curdate(), interval 6 day)
group by date(message_datetime)
order by date(message_datetime) desc
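A small Python model of the window this query selects may help: going back 6 days from today covers 7 calendar days including today. The function name, the sample timestamps and the explicit `today` argument (standing in for CURDATE()) are all hypothetical.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def counts_last_7_days(timestamps, today):
    """Count rows per calendar day, starting 6 days before `today`."""
    start = today - timedelta(days=6)
    per_day = Counter(ts.date() for ts in timestamps if ts.date() >= start)
    return sorted(per_day.items(), reverse=True)


# Hypothetical message timestamps; `today` is passed in instead of CURDATE().
messages = [
    datetime(2021, 6, 12, 9, 0),
    datetime(2021, 6, 11, 8, 30),
    datetime(2021, 6, 11, 17, 45),
    datetime(2021, 6, 6, 0, 0),
    datetime(2021, 6, 5, 23, 59),  # 7 days before today: outside the window
]
for day, cnt in counts_last_7_days(messages, date(2021, 6, 12)):
    print(day, cnt)
```

As with GROUP BY, days without any rows simply do not appear in the output.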
I have this table:
desc t1;
with this data:
mysql> select* from t1;
Now I run this:
mysql> select day_date,count(day_date) from t1 group by (date(day_date));
The MySQL table is simple:
id | name | create_time
Every time the user makes an operation, it inserts one record into this table, like
1 | "do ABC" | 2011-12-05
2 | "do BCD" | 2011-12-05
I want to get the top 50 operations from this table during the last 7 days. How can I write the SQL?
Try something like this:
SELECT `name`, COUNT(id) as operations
FROM myTable
WHERE create_time BETWEEN DATE_SUB(NOW(), INTERVAL 7 DAY) AND NOW()
GROUP BY `name`
ORDER BY COUNT(id) DESC
LIMIT 50;
This will count the number of operations for each operation name, sort the results by the count and only return 50 records. Please note that this is exactly seven days of history, in other words if you run the query at noon, the beginning of the range will be noon 7 days ago.
See LIMIT, COUNT, GROUP BY, DATE_SUB
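The logic (filter to the last 7 days, count per name, sort by count, keep the top N) can be sketched in Python as follows. The function name, the sample rows and the explicit `now` argument (standing in for NOW()) are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_operations(rows, now, limit=50, days=7):
    """Most frequent operation names among rows created in the last `days` days."""
    cutoff = now - timedelta(days=days)
    counts = Counter(name for name, created in rows if cutoff <= created <= now)
    return counts.most_common(limit)


# Hypothetical rows as (name, create_time).
now = datetime(2011, 12, 5, 12, 0)
rows = [
    ("do ABC", datetime(2011, 12, 5, 9, 0)),
    ("do BCD", datetime(2011, 12, 5, 10, 0)),
    ("do ABC", datetime(2011, 12, 1, 8, 0)),
    ("do BCD", datetime(2011, 11, 28, 11, 0)),  # just over 7 days old: excluded
]
print(top_operations(rows, now))  # [('do ABC', 2), ('do BCD', 1)]
```

The excluded row shows the "exactly seven days" behaviour: with `now` at noon, a row from 11:00 seven days earlier falls just outside the range.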
select count(id), name
from tablename
where DATE_SUB(NOW(), INTERVAL 7 DAY) < create_time
group by name
order by count(id) desc
limit 50;
SELECT *
FROM `table`
WHERE `create_time` >= DATE_SUB(now(), INTERVAL 7 DAY)
LIMIT 50;
You can get the result with this query:
select id, name, create_time
from tablename
where DATEDIFF(NOW(), create_time) <= 7
order by create_time
limit 50;