I have a MySQL table to record the number of users
| id | email             | name | created                    |
| -- | ----------------- | ---- | -------------------------- |
| 1  | user1@example.com | John | 2019-02-05 18:53:50.000000 |
| 2  | user2@example.com | Rock | 2019-02-06 18:53:50.000000 |
| 3  | user3@example.com | Sena | 2019-02-08 18:53:50.000000 |
| 4  | user4@example.com | Anny | 2019-02-08 18:53:50.000000 |
I want to get the cumulative count per day (the running total of users):
| date       | count |
| ---------- | ----- |
| 2019-02-05 | 1     |
| 2019-02-06 | 2     |
| 2019-02-07 | 2     |
| 2019-02-08 | 4     |
And draw a similar graph on the Grafana portal
I tried using count(), but it only gives the number of rows created on each day.
The query generated in Grafana is
SELECT
UNIX_TIMESTAMP(created) DIV 86400 * 86400 AS "time",
count(id) AS "Verified"
FROM custom_domain_customdomain
WHERE
is_cname_verified = '1' AND
is_txt_verified = '1'
GROUP BY 1
ORDER BY UNIX_TIMESTAMP(created) DIV 86400 * 86400
Here is an example of the steps to create your cumulative count based on a contiguous date range.
The date_range CTE is based on one of the RECURSIVE CTE examples in the MySQL docs
The second CTE is a fairly straightforward LEFT JOIN from the date_range to your users table to do the count of new users per day.
The final SELECT query uses SUM() as a window function to give the cumulative user count.
WITH RECURSIVE
`date_range` (`date`) AS (
-- retrieve minimum date from users for start of date_range
SELECT DATE(MIN(`users`.`created`)) FROM `users`
UNION ALL
SELECT `date` + INTERVAL 1 DAY
FROM `date_range`
-- retrieve maximum date from users for end of date_range
WHERE `date` + INTERVAL 1 DAY <= (SELECT DATE(MAX(`users`.`created`)) FROM `users`)
),
`users_per_day` AS (
SELECT
`date_range`.`date` AS `created_date`,
COUNT(`users`.`id`) AS `new_user_count`
FROM `date_range`
-- half-open range, so timestamps with fractional seconds late in the day are not missed
LEFT JOIN `users` ON `users`.`created` >= `date_range`.`date`
AND `users`.`created` < `date_range`.`date` + INTERVAL 1 DAY
GROUP BY `date_range`.`date`
)
SELECT
`created_date`,
`new_user_count`,
SUM(`new_user_count`) OVER (ORDER BY `created_date`) as `cumulative_count`
FROM `users_per_day`;
The users_per_day CTE and final SELECT can be combined but I have left them separate as it clearly shows the steps used and the overhead is negligible.
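If it helps to check the expected numbers outside the database, here is a minimal Python sketch of the same three steps (contiguous date range, per-day count, running sum). It is not part of the SQL answer; the function name `cumulative_counts` and the `sample` list are made up for illustration and only reuse the `created` values from the question.

```python
from collections import Counter
from datetime import datetime, timedelta
from itertools import accumulate


def cumulative_counts(created):
    """Return [(day, running_total)] for every day from the first to the last timestamp."""
    if not created:
        return []
    per_day = Counter(ts.date() for ts in created)  # new users per calendar day
    first, last = min(per_day), max(per_day)
    # Contiguous range, so days without sign-ups still get a row.
    days = [first + timedelta(days=n) for n in range((last - first).days + 1)]
    # A Counter returns 0 for days that have no entries.
    totals = accumulate(per_day[d] for d in days)
    return list(zip(days, totals))


# The `created` values from the question's sample table.
sample = [
    datetime(2019, 2, 5, 18, 53, 50),
    datetime(2019, 2, 6, 18, 53, 50),
    datetime(2019, 2, 8, 18, 53, 50),
    datetime(2019, 2, 8, 18, 53, 50),
]

for day, total in cumulative_counts(sample):
    print(day, total)
```

For the sample data this prints one row per day from 2019-02-05 to 2019-02-08, including 2019-02-07, which has no new users and therefore repeats the previous total.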
Related
I need to extract data from a MySQL table, but am not allowed to include a record if there's a previous record less than a year old.
Given the following records, only the records 1, 3 and 5 should be included (because record 2 was created 1 month after record 1, and record 4 was created 1 month after record 3):
1 2019-12-21
2 2020-01-21
3 2021-12-21
4 2022-01-21
5 2023-12-21
I came up with the following non-functional solution:
SELECT
*
FROM
table t
WHERE
created_at > DATE_ADD(
(SELECT
created_at
FROM
table t2
WHERE
t2.created_at < t.created_at
ORDER BY
t2.created_at
DESC LIMIT 1), INTERVAL 1 YEAR)
However, this returns only the first and the last record, not the third:
1 2019-12-21
5 2023-12-21
I know why: the third record gets excluded because record 2 is less than a year old. But record 2 shouldn't be taken into account, because it won't make the list itself.
How can I solve this?
Using LAG() (a window function, so this assumes MySQL 8.0 or later), you can calculate the difference in months from the previous row with PERIOD_DIFF():
with d as (
select * ,
period_diff(extract(year_month FROM date),
extract(year_month from lag(date,1,date) over (order by date))
) as m
from t
)
select id, date
from d
where m=0 or m>12
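One thing to watch: the question asks to compare each record with the last record that was actually kept, which is a sequential rule. Here is a minimal Python sketch of that rule, useful for checking results or for post-processing in application code. The names `add_one_year`, `filter_by_last_kept`, `records` and `every_seven_months` are hypothetical, and the strict "more than one year" comparison is an assumption taken from the `>` in the question's query.

```python
from datetime import date


def add_one_year(d):
    """Same day next year; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def filter_by_last_kept(rows):
    """Keep a row only if it is more than one year after the last *kept* row."""
    kept = []
    for row_id, created_at in sorted(rows, key=lambda r: r[1]):
        if not kept or created_at > add_one_year(kept[-1][1]):
            kept.append((row_id, created_at))
    return kept


# Sample rows from the question.
records = [
    (1, date(2019, 12, 21)),
    (2, date(2020, 1, 21)),
    (3, date(2021, 12, 21)),
    (4, date(2022, 1, 21)),
    (5, date(2023, 12, 21)),
]
print([row_id for row_id, _ in filter_by_last_kept(records)])

# Made-up rows spaced seven months apart: row 3 is more than a year after
# row 1 (the last kept row), even though it is only seven months after row 2.
every_seven_months = [
    (1, date(2019, 1, 1)),
    (2, date(2019, 8, 1)),
    (3, date(2020, 3, 1)),
]
print([row_id for row_id, _ in filter_by_last_kept(every_seven_months)])
```

On the question's data this keeps records 1, 3 and 5. The second data set is where comparing against the immediately preceding row and comparing against the last kept row give different answers, so it is worth testing any SQL solution against a case like that.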
I have the following database schema
ID creation_date
1 2019-06-03
2 2019-06-04
3 2019-06-04
4 2019-06-10
5 2019-06-11
I need to find the total size of the table, grouped by week. The output I am looking for is something like
year week number_of_records
2019 23 3
2019 24 5
I wrote the following query, which only gives me the number of records created in each week
select year(creation_date) as year, weekofyear(creation_date) as week,
count(id) from input group by year, week;
Output I get is
year week number_of_records
2019 23 3
2019 24 2
Take a look at window (or analytic) functions.
Unlike aggregate functions, window functions keep every row of the result and compute a value over a set of related rows. When you use ORDER BY in the OVER clause, the default window runs from the first row to the current row in the specified order, which is exactly what you need.
select year, week, sum(number_of_records) over (order by year, week)
from (
select year(creation_date) as year, weekofyear(creation_date) as week,
count(id) as number_of_records
from input group by year, week
) your_sql
I guess you will also need to reset the sum for each year, which I leave as an exercise for you (hint: the PARTITION BY clause).
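To see what "resetting the sum for each year" should produce, here is a small Python sketch of a running total that restarts per year. It is only an illustration of the expected result, not the SQL itself. The function name and the extra 2020 date are invented, and it groups by ISO year and ISO week, which is an assumption (the query above uses year() and weekofyear()).

```python
from collections import Counter
from datetime import date


def running_total_reset_per_year(dates):
    """Return [(iso_year, iso_week, running_total)], restarting the total each year."""
    # Key each date by (ISO year, ISO week) and count rows per week.
    per_week = Counter(d.isocalendar()[:2] for d in dates)
    totals = {}  # running total per year
    result = []
    for year, week in sorted(per_week):
        totals[year] = totals.get(year, 0) + per_week[(year, week)]
        result.append((year, week, totals[year]))
    return result


# The five dates from the question plus one invented date in 2020.
creation_dates = [
    date(2019, 6, 3),
    date(2019, 6, 4),
    date(2019, 6, 4),
    date(2019, 6, 10),
    date(2019, 6, 11),
    date(2020, 1, 6),
]

for row in running_total_reset_per_year(creation_dates):
    print(row)
```

The 2019 rows accumulate to 3 and then 5, matching the output asked for in the question, and the 2020 row starts again from 1.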
For versions prior to 8.0...
Schema (MySQL v5.7)
CREATE TABLE my_table
(ID SERIAL PRIMARY KEY
,creation_date DATE NOT NULL
);
INSERT INTO my_table VALUES
(1 , '2019-06-03'),
(2 , '2019-06-04'),
(3 , '2019-06-04'),
(4 ,'2019-06-10'),
(5 ,'2019-06-11');
Query #1
SELECT a.yearweek
, @i:=@i+a.total running
FROM
(SELECT DATE_FORMAT(x.creation_date,'%x-%v') yearweek
, COUNT(*) total
FROM my_table x
GROUP BY yearweek
)a
JOIN (SELECT @i:=0) vars
ORDER BY a.yearweek;
| yearweek | running |
| -------- | ------- |
| 2019-23 | 3 |
| 2019-24 | 5 |
You seem to want a cumulative sum. You can do this with window functions directly in an aggregation query:
select year(creation_date) as year, weekofyear(creation_date) as week,
count(*) as number_of_starts,
sum(count(*)) over (order by min(creation_date)) as number_of_records
from input
group by year, week;
I need an Amazon Redshift SQL query to count how many times a particular day of the month falls between two dates.
Date format: YYYY-MM-DD
For example: start date = 2019-06-14, end date = 2019-10-09, day = the 2nd of every month
I want to count how many times the 2nd of the month falls between 2019-06-14 and 2019-10-09.
The result for this example should be 4, since the 2nd falls between 2019-06-14 and 2019-10-09 four times.
I tried Redshift's DATEDIFF and MONTHS_BETWEEN functions, but could not work out what the calculation should be.
It seems to me that you want to select from a calendar table. That's how you can solve your problem. You'll notice that the query looks a little hacky because Redshift does not support any functions to generate sequences, which leaves you with creating sequence tables yourself (see seq_10 and seq_1000). Once you have a sequence, you can easily create a calendar with all the information you need (e.g. day_of_month).
That's the query answering your question:
WITH seq_10 as (
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1
), seq_1000 as (
select
row_number() over () - 1 as n
from
seq_10 a cross join
seq_10 b cross join
seq_10 c
), calendar as (
select '2018-01-01'::date + n as date,
extract(day from date) as day_of_month,
extract(dow from date) as day_of_week
from seq_1000
)
select count(*) from calendar
where day_of_month = 2
and date between '2019-06-14' and '2019-10-09'
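As a quick sanity check of the expected result, the same "walk a calendar and filter" idea can be sketched in a few lines of Python. This is only a cross-check under the assumption that both end dates are inclusive (as with BETWEEN); `count_day_of_month` is a made-up name.

```python
from datetime import date, timedelta


def count_day_of_month(start, end, day_of_month):
    """Count dates in [start, end] (inclusive) whose day-of-month equals day_of_month."""
    n_days = (end - start).days + 1  # zero or negative if end is before start
    return sum(
        1
        for n in range(n_days)
        if (start + timedelta(days=n)).day == day_of_month
    )


# The example from the question: the 2nd of each month between the two dates.
print(count_day_of_month(date(2019, 6, 14), date(2019, 10, 9), 2))  # 4
```

The four matches are 2019-07-02, 2019-08-02, 2019-09-02 and 2019-10-02; 2019-06-02 is before the start date.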
I want to visualize my entries by counting how many were created on the same day.
SELECT dayname(created_at), count(*) FROM logs
group by day(created_at)
ORDER BY created_at desc
LIMIT 7
So I get something like:
Thursday 4
Wednesday 12
Monday 4
Sunday 1
Saturday 20
Friday 23
Thursday 10
But I also want Tuesday in there with 0, so that I have a full week.
Is there a way to do this purely in MySQL, or do I need to post-process the result before passing it to the chart?
EDIT:
This is the final query:
SELECT
DAYNAME(date_add(NOW(), interval days.id day)) AS day,
count(logs.id) AS amount
FROM days LEFT OUTER JOIN
(SELECT *
FROM logs
WHERE TIMESTAMPDIFF(DAY,DATE(created_at),now()) < 7) logs
on datediff(created_at, NOW()) = days.id
GROUP BY days.id
ORDER BY days.id desc;
The table days includes numbers from 0 to -6
You only need a table of offsets which could be a real table or something built on the fly like select 0 ofs union all select -1 ....
create table days (ofs int);
insert into days (ofs) values
(0), (-1), (-2), (-3),
(-4), (-5), (-6), (-7);
select
date_add('20160121', interval days.ofs day) as created_at,
count(data.id) as cnt
from days left outer join logs data
on datediff(data.created_at, '20160121') = days.ofs
group by days.ofs
order by days.ofs;
http://sqlfiddle.com/#!9/3e6bc7/1
For performance it would probably be better to limit the search in the data (logs) table:
select
date_add('20160121', interval days.ofs day) as created_at,
count(data.id) as cnt
from days left outer join
(select * from logs where created_at between <start> and <end>) data
on datediff(data.created_at, '20160121') = days.ofs
group by days.ofs
order by days.ofs;
One downside is that you do have to parameterize this with a fixed anchor date in a couple of expressions. It might be better to have a table of real dates sitting in a table somewhere so you don't have to do the calculations.
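If you end up post-processing in application code instead, the offsets idea translates directly. Here is a minimal Python sketch, assuming the log timestamps have already been fetched; `counts_for_last_week` and the sample `logs` list are hypothetical.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def counts_for_last_week(created_at_values, anchor):
    """Return [(day, count)] for the anchor date and the six days before it, newest first."""
    per_day = Counter(ts.date() for ts in created_at_values)
    offsets = range(0, -7, -1)  # 0, -1, ..., -6, like the offsets table
    days = [anchor + timedelta(days=ofs) for ofs in offsets]
    # A Counter returns 0 for days with no log rows, which fills the gaps.
    return [(day, per_day[day]) for day in days]


# Invented sample data: two entries on the anchor date, one two days earlier.
logs = [
    datetime(2016, 1, 21, 9, 0),
    datetime(2016, 1, 21, 17, 30),
    datetime(2016, 1, 19, 8, 15),
]

for day, cnt in counts_for_last_week(logs, date(2016, 1, 21)):
    print(day, cnt)
```

Every one of the seven days gets a row, so a chart fed from this list shows a zero for days without entries instead of skipping them.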
Use a RIGHT JOIN to a dates table so that you get a row for every day, whether or not it has data; days without data will simply show as zero or NULL.
You can create a dates table, some sort of calendar table.
id_day | day_date |
--------------------
1 | 2016-01-01 |
2 | 2016-01-02 |
.
.
365 | 2016-12-31 |
With this table, you can join on the date and then extract the day, month, week, or whatever you want with the MySQL date and time functions.
SELECT DAYNAME(t2.day_date), COUNT(t1.created_at) FROM logs t1 RIGHT JOIN dates_table t2 ON DATE(t1.created_at) = t2.day_date GROUP BY t2.day_date ORDER BY t2.day_date DESC LIMIT 7
Data:
values date
14 1.1.2010
20 1.1.2010
10 2.1.2010
7 4.1.2010
...
A sample query for January 2010 should return 31 rows, one for every day, with the values for each day added up. Right now I could do this with 31 queries, but I would like to do it with one. Is it possible?
results:
1. 34
2. 10
3. 0
4. 7
...
This is actually surprisingly difficult to do in SQL. One way to do it is to have a long select statement with UNION ALLs to generate the numbers from 1 to 31. This demonstrates the principle but I stopped at 4 for clarity:
SELECT MonthDate.Date, COALESCE(SUM(`values`), 0) AS Total
FROM (
SELECT 1 AS Date UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
--
SELECT 28 UNION ALL
SELECT 29 UNION ALL
SELECT 30 UNION ALL
SELECT 31) AS MonthDate
LEFT JOIN Table1 AS T1
ON MonthDate.Date = DAY(T1.Date)
AND MONTH(T1.Date) = 1 AND YEAR(T1.Date) = 2010
WHERE MonthDate.Date <= DAY(LAST_DAY('2010-01-01'))
GROUP BY MonthDate.Date
It might be better to use a table to store these values and join with it instead.
Result:
1, 34
2, 10
3, 0
4, 7
Given that for some dates you have no data, you'll need to fill in the gaps. One approach to this is to have a calendar table prefilled with all dates you need, and join against that.
If you want the results to show day numbers as you have showing in your question, you could prepopulate these in your calendar too as labels.
You would join your data table date field to the date field of the calendar table, group by that field, and sum values. You might want to specify limits for the range of dates covered.
So you might have:
CREATE TABLE Calendar (
label varchar(20),
cal_date date,
primary key ( cal_date )
)
Query:
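SELECT
c.label,
COALESCE( SUM( d.`values` ), 0 ) AS total
FROM
Calendar c
-- LEFT JOIN keeps calendar days that have no rows in the data table
LEFT JOIN
Data_table d
ON d.date_field = c.cal_date
WHERE
c.cal_date BETWEEN '2010-01-01' AND '2010-01-31'
GROUP BY
c.cal_date, c.label
ORDER BY
c.cal_date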
Update:
I see you have datetimes rather than dates. You could just use the MySQL DATE() function in the join, but that would probably not be optimal. Another approach would be to have start and end times in the Calendar table defining a 'time bucket' for each day.
This works for me. It's a modification of a query I found on another site. The "INTERVAL 1 MONTH" clause ensures I get the current month's data, including zeros for days that have no hits. Change this to "INTERVAL 2 MONTH" to get last month's data, etc.
I have a table called "payload" with a column "timestamp". I'm then joining the timestamp column onto the dynamically generated dates, casting it so that the dates match in the ON clause.
SELECT `calendarday`,COUNT(P.`timestamp`) AS `cnt` FROM
(SELECT @tmpdate := DATE_ADD(@tmpdate, INTERVAL 1 DAY) `calendarday`
FROM (SELECT @tmpdate :=
LAST_DAY(DATE_SUB(CURDATE(),INTERVAL 1 MONTH)))
AS `dynamic`, `payload`) AS `calendar`
LEFT JOIN `payload` P ON DATE(P.`timestamp`) = `calendarday`
GROUP BY `calendarday`
To dynamically get the dates within a date range using SQL, you can do this (example in MySQL):
Create a table to hold the numbers 0 through 9.
CREATE TABLE ints ( i tinyint(4) );
insert into ints (i)
values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
Run a query like so:
select ((curdate() - interval 2 year) + interval (t.i * 100 + u.i * 10 + v.i) day) AS Date
from
ints t
join ints u
join ints v
having Date between '2015-01-01' and '2015-05-01'
order by t.i, u.i, v.i
This will generate all dates between Jan 1, 2015 and May 1, 2015.
Output
2015-01-01
2015-01-02
2015-01-03
2015-01-04
2015-01-05
2015-01-06
...
2015-05-01
The query joins the table ints 3 times and gets an incrementing number (0 through 999). It then adds this number as a day interval starting from a certain date, in this case a date 2 years ago. Any date range from 2 years ago and 1,000 days ahead can be obtained with the example above.
To generate a query that generates dates for more than 1,000 days simply join the ints table once more to allow for up to 10,000 days of range, and so forth.
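The digit-table trick may be easier to follow outside SQL. Here is a rough Python sketch of the same idea, where three copies of the digits 0 through 9 are combined into the offsets 0 through 999 and added to a base date. The function name `dates_between` and the fixed base date are illustrative assumptions (the query above uses a date two years before the current date instead).

```python
from datetime import date, timedelta
from itertools import product


def dates_between(base, start, end):
    """Generate base + 0..999 days and keep the dates in [start, end]."""
    digits = range(10)  # plays the role of the ints table
    # Cross join of three digit sets: hundreds, tens and units.
    offsets = (h * 100 + t * 10 + u for h, t, u in product(digits, repeat=3))
    candidates = (base + timedelta(days=n) for n in offsets)
    return [d for d in candidates if start <= d <= end]


# Fixed base date chosen so the whole requested range lies within 999 days of it.
dates = dates_between(date(2013, 6, 1), date(2015, 1, 1), date(2015, 5, 1))
print(dates[0], dates[-1], len(dates))
```

As with the SQL version, any part of the requested range that lies more than 999 days after the base date is silently cut off, so the base date and the number of joined digit sets have to be chosen to cover the range.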
If I'm understanding the rather vague question correctly, you want to know the number of records for each date within a month. If that's true, here's how you can do it:
SELECT COUNT(value_column) FROM table WHERE date_column LIKE '2010-01-%' GROUP BY date_column