Check existence of rows for last n days - mysql

I want to perform a check "is there an entry for each of the last 100 days in a table" where the table has something like a reference date column and was thinking about joining with a subquery that returns sysdate - 0, sysdate - 1, ... sysdate - 100.
Updates (for clarification):
I need to know which dates are missing in the last n days
I want to avoid additional tables (also temp tables)
Is this a good approach?

Assuming your Oracle table looks like this...
CREATE TABLE DATE_TABLE (
D DATE,
-- And other fields, PK etc...
)
...and assuming D contains "round" dates (i.e. no time-of-day) and no NULLs (NOT IN returns no rows if the subquery yields a NULL), the following query will give you all the missing dates from :min_date through :min_date + :day_count - 1:
SELECT *
FROM (
SELECT (TO_DATE(:min_date) + LEVEL - 1) GENERATED_DATE
FROM DUAL
CONNECT BY LEVEL <= :day_count
)
WHERE
GENERATED_DATE NOT IN (SELECT D FROM DATE_TABLE)
In plain English:
Generate all dates in given interval (the sub-query).
Check if any of them is missing from the table (the super-query).
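The two steps above can be sketched outside the database as well. This is only an illustration in Python, not the Oracle query itself; the function name missing_dates and the sample rows are made up for the example.

```python
from datetime import date, timedelta


def missing_dates(present, min_date, day_count):
    """Return the dates in [min_date, min_date + day_count) that are absent from `present`."""
    present = set(present)
    # Step 1: generate every date in the interval (the sub-query).
    generated = (min_date + timedelta(days=i) for i in range(day_count))
    # Step 2: keep only those not found in the table (the super-query).
    return [d for d in generated if d not in present]


# Hypothetical table contents: 3 January and 5 January are missing.
rows = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4)]
print(missing_dates(rows, date(2024, 1, 1), 5))
```

Like the query, the interval holds exactly day_count dates, starting at min_date.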

This is what you are looking for:
Select seqnum.date, count(issues.id)
from
(
SELECT
Curdate() - interval (TENS.SeqValue + ONES.SeqValue) day Date
FROM
(
SELECT 0 SeqValue
UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5
UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
) ONES
CROSS JOIN
(
SELECT 0 SeqValue
UNION SELECT 10 UNION SELECT 20 UNION SELECT 30 UNION SELECT 40 UNION SELECT 50
UNION SELECT 60 UNION SELECT 70 UNION SELECT 80 UNION SELECT 90
) TENS
) seqnum
left join issues on (cast(issues.created_on as date) = seqnum.date)
group by seqnum.date
I ran it against a Redmine instance to see how many issues were created in the last 100 days, day by day, including days where no issue was created. Adjust to your data structures accordingly. Enjoy :)
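For anyone who wants to try the tens-and-ones trick without a MySQL server, here is a rough sketch using Python's built-in sqlite3 module. SQLite is only a stand-in: its date() function replaces Curdate() - interval ... day, the reference date is passed in as a parameter so the result is reproducible, and the issues rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issues (id INTEGER PRIMARY KEY, created_on TEXT)")
conn.executemany(
    "INSERT INTO issues (created_on) VALUES (?)",
    [("2024-03-10 09:15:00",), ("2024-03-10 17:40:00",), ("2024-03-08 11:00:00",)],
)

# Ten single-row SELECTs glued together: the digits 0..9.
DIGITS = " UNION ALL ".join(f"SELECT {d} AS n" for d in range(10))

# ones x tens yields the offsets 0..99; each offset becomes one calendar day.
query = f"""
SELECT date(:today, '-' || (tens.n * 10 + ones.n) || ' day') AS day,
       count(issues.id) AS issue_count
FROM ({DIGITS}) AS ones
CROSS JOIN ({DIGITS}) AS tens
LEFT JOIN issues
       ON date(issues.created_on) = date(:today, '-' || (tens.n * 10 + ones.n) || ' day')
GROUP BY day
ORDER BY day DESC
"""
result = conn.execute(query, {"today": "2024-03-10"}).fetchall()
print(len(result), result[:3])
```

Days without issues still show up, with a count of 0, because count(issues.id) ignores the NULLs that the left join produces.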

This might help you out:
create table your_table (a_date date not null);
insert into your_table values (date(now()-interval 0 day));
insert into your_table values (date(now()-interval 1 day));
insert into your_table values (date(now()-interval 2 day));
insert into your_table values (date(now()-interval 3 day));
insert into your_table values (date(now()-interval 5 day));
insert into your_table values (date(now()-interval 6 day));
insert into your_table values (date(now()-interval 9 day));
insert into your_table values (date(now()-interval 10 day));
insert into your_table values (date(now()-interval 50 day));
insert into your_table values (date(now()-interval 60 day));
insert into your_table values (date(now()-interval 70 day));
insert into your_table values (date(now()-interval 80 day));
insert into your_table values (date(now()-interval 90 day));
select a_date,ifnull(datediff(next_date ,a_date)-1,0) as number_of_days_missing_after_date
from
(
select dt.a_date,
(select min(a_date) from your_table dtp where dtp.a_date > dt.a_date and dtp.a_date >= date(now()-interval 100 day)) as next_date
from
(select a_date
from your_table
union select date(now()-interval 100 day)) dt
where dt.a_date >= date(now()-interval 100 day)
) a;
Should give you an indication as to which dates are missing.
I have created table your_table in place of the table that you are planning to check for missing dates (so that I could illustrate my example more clearly to you).
Perhaps the only problem with this solution is that it will not give you a list of the missing dates - though they can be easily derived by looking at the (in the example) a_date column and the number of days missing after a_date.
Hope it helps & good luck!
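To show how the missing dates can be derived from that output, here is a small Python sketch. It mirrors the logic of the query rather than running it; gaps_after and expand_gaps are hypothetical helper names, and the sample dates are invented.

```python
from datetime import date, timedelta


def gaps_after(dates, window_start):
    """Pair each date in the window with the number of missing days before the next date."""
    # Like the query, add the window start itself so a leading gap is visible.
    points = sorted({d for d in dates if d >= window_start} | {window_start})
    result = []
    for current, nxt in zip(points, points[1:] + [None]):
        missing = (nxt - current).days - 1 if nxt else 0
        result.append((current, missing))
    return result


def expand_gaps(gaps):
    """Turn (date, days_missing_after) pairs into the explicit list of missing dates."""
    return [start + timedelta(days=k) for start, n in gaps for k in range(1, n + 1)]


today = date(2024, 3, 10)
present = [today - timedelta(days=k) for k in (0, 1, 2, 3, 5, 6)]
gaps = gaps_after(present, today - timedelta(days=8))
print(expand_gaps(gaps))
```

As in the SQL, the window start is added artificially, so it is never reported as missing even when the table has no row for it.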

If there aren’t any gaps (e.g., weekends), a trick you can use is
SELECT COUNT(DISTINCT somedate)=100 FROM sometable -- evaluates to boolean
WHERE somedate>=sysdate-100;
Depending on your RDBMS, a GROUP BY may work faster than COUNT DISTINCT.
[Post clarification comment]
As you have explained it now, I think you want a LEFT JOIN between a dense date list and your table, then COUNT and GROUP BY. If you are using MySQL, the way to generate the date list in a subquery is covered by this earlier Stack Overflow question (I don't know MySQL nearly well enough to have thought of the accepted answer). It is easier in most other DB systems.
The solution mentioned at that link of a permanent calendar table is not so bad, either.

In MySQL:
SELECT * FROM `table_name` WHERE `date_column` >= DATE_SUB(CURDATE(), INTERVAL 99 DAY)
See MySQL Date Arithmetic

Related

Getting all previous records of table by date MySQL

My table currently has 21000 records; it is updated daily, with almost 300 entries inserted each day. What I want is a query that fetches the number of rows my table had on each of the previous 10 days, so it returns:
26000
21300
21000
etc
Right now, I wrote this:
"SELECT COUNT(*) from tbl_task where `task_start_time` < '2020-12-01'"
It returns 21000, but only for a single day. I want my query to return the counts for each of the last 10 days.
edit : database flavor is mysql and date column is date not datetime
The most efficient method may be aggregation and cumulative sums:
select date(task_start_time) as dte, count(*) as cnt_on_day,
sum(count(*)) over (order by date(task_start_time)) as running_cnt
from tbl_task
group by dte
order by dte desc
limit 10;
This returns the last 10 days in the data. You can easily adjust to more days if you like -- in fact all of them -- without much trouble.
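The idea behind that query (count per day, then a cumulative sum) can be sketched in plain Python. This is an illustration of the technique, not a replacement for the SQL; running_counts is a made-up name and the sample dates are invented.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def running_counts(start_dates, last_n=10):
    """Return (day, rows_on_day, rows_up_to_and_including_day) for the latest days."""
    per_day = sorted(Counter(start_dates).items())       # GROUP BY date
    totals = accumulate(count for _, count in per_day)   # SUM(COUNT(*)) OVER (ORDER BY date)
    rows = [(day, count, total) for (day, count), total in zip(per_day, totals)]
    return rows[::-1][:last_n]                           # ORDER BY date DESC LIMIT n


tasks = [date(2020, 11, 28)] * 3 + [date(2020, 11, 29)] * 2 + [date(2020, 11, 30)]
for row in running_counts(tasks, last_n=2):
    print(row)
```

Note that the running total includes the day itself, whereas the query in the question used a strict < comparison, so the numbers are shifted by one day relative to it.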
I don't know if I'm wrong, but could you not simply add a GROUP BY clause? Like:
"SELECT COUNT(*) from tbl_task where `task_start_time` < '2020-12-01' GROUP BY task_start_time"
EDIT:
This should only work if task_start_time is a date, not if it is a datetime
EDIT2:
If it is a datetime you could use the date function:
SELECT COUNT(*) from tbl_task where `task_start_time` < '2020-12-01' GROUP BY DATE(task_start_time)
You can use UNION ALL and date arithmetic.
SELECT count(*)
FROM tbl_task
WHERE task_start_time < current_date
UNION ALL
SELECT count(*)
FROM tbl_task
WHERE task_start_time < date_sub(current_date, INTERVAL 1 DAY)
...
UNION ALL
SELECT count(*)
FROM tbl_task
WHERE task_start_time < date_sub(current_date, INTERVAL 9 DAY);
Edit:
You might also join a derived table that uses FROM-less SELECTs and UNION ALL to get the days to look back and then aggregate. This might be a little easier to construct dynamically. (But it may be slower I suspect.)
SELECT count(*)
FROM (SELECT 0 x
UNION ALL
SELECT 1
...
UNION ALL
SELECT 9) x
INNER JOIN tbl_task t
ON t.task_start_time < date_sub(current_date, INTERVAL x.x DAY)
GROUP BY x.x;
In MySQL version 8+ you can even use a recursive CTE to construct the table with the days.
WITH RECURSIVE x
AS
(
SELECT 0 x
UNION ALL
SELECT x + 1
FROM x
WHERE x + 1 < 10
)
SELECT count(*)
FROM x
INNER JOIN tbl_task t
ON t.task_start_time < date_sub(current_date, INTERVAL x.x DAY)
GROUP BY x.x;
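Here is a rough way to try the recursive-CTE variant locally, using Python's sqlite3 module as a stand-in for MySQL 8. The assumptions: SQLite's date() replaces date_sub(), the reference date is a parameter instead of current_date, the CTE is renamed to days(n) for readability, and the rows are invented. It also uses a LEFT JOIN, a small variation so that a cutoff with no earlier rows still appears with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_task (task_start_time TEXT)")
conn.executemany(
    "INSERT INTO tbl_task VALUES (?)",
    [("2020-11-27",), ("2020-11-28",), ("2020-11-28",), ("2020-11-30",)],
)

query = """
WITH RECURSIVE days(n) AS (
    SELECT 0
    UNION ALL
    SELECT n + 1 FROM days WHERE n + 1 < :day_count
)
SELECT date(:today, '-' || days.n || ' day') AS cutoff,
       count(t.task_start_time) AS rows_before_cutoff
FROM days
LEFT JOIN tbl_task AS t
       ON t.task_start_time < date(:today, '-' || days.n || ' day')
GROUP BY days.n
ORDER BY days.n
"""
rows = conn.execute(query, {"today": "2020-12-01", "day_count": 4}).fetchall()
for cutoff, total in rows:
    print(cutoff, total)
```

Each output row is the table size as it stood just before the given cutoff date.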

How do I select SQL data in buckets when data doesn't exist for one bucket?

I'm trying to get a complete set of buckets for a given dataset, even if no records exist for some buckets.
For example, I want to display totals by day of week, with zero total for days with no records.
SELECT
WEEKDAY(transaction_date) AS day_of_week,
SUM(sales) AS total_sales
FROM table1
GROUP BY day_of_week
If I have sales every day, I'll get 7 rows in my result representing total sales on days 0-6.
If I don't have sales on Day 2, I get no result for Day 2.
What's the most efficient way to force a zero value for day 2?
Should I join to a temporary table or array of defined buckets? ['0','1','2','3','4','5','6']
Or is it better to insert zeros outside of MySQL, after I've done the query?
I am using MySQL, but this is a general SQL question.
In MySQL, you could simply use a derived table of numbers from 0 to 6 (the values WEEKDAY() returns, with 0 = Monday), left join it with the table, then aggregate:
select d.day_of_week, sum(sales) AS total_sales
from (
select 0 day_of_week union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6
) d
left join table1 t1 on weekday(t1.transaction_date) = d.day_of_week
group by day_of_week
MySQL 8.0.19 and later have the values(row...) syntax, which shortens the query:
select d.day_of_week, sum(sales) AS total_sales
from (values row(0), row(1), row(2), row(3), row(4), row(5), row(6)) d(day_of_week)
left join table1 t1 on weekday(t1.transaction_date) = d.day_of_week
group by day_of_week
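If you end up filling the buckets outside of MySQL instead (the other option raised in the question), the same "start from the full set of buckets" idea looks roughly like this in Python. The function name and sample rows are made up; Python's date.weekday() happens to use the same numbering as MySQL's WEEKDAY() (0 = Monday).

```python
from datetime import date


def sales_by_weekday(transactions):
    """Sum sales per weekday (0 = Monday .. 6 = Sunday), keeping empty days at zero."""
    totals = {day: 0 for day in range(7)}  # the full set of buckets, pre-filled
    for transaction_date, sales in transactions:
        totals[transaction_date.weekday()] += sales
    return totals


# Hypothetical rows: two sales on a Monday, one on a Wednesday.
transactions = [
    (date(2024, 3, 4), 10),
    (date(2024, 3, 4), 5),
    (date(2024, 3, 6), 7),
]
print(sales_by_weekday(transactions))
```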
Basically, you want the answer to be 0 when the data is actually NULL for that bucket, i.e. something like max(NULL, 0). MAX doesn't work with NULL in this way; however, you can use COALESCE to force it:
COALESCE(SUM(sales), 0)
as suggested by this answer
First off, you need a calendar table; something like this or this. Or create a calendar subset on the fly. I am not sure of the MySQL syntax, but here is what it would look like in SQL Server.
DECLARE
@FromDate DATE
, @ToDate DATE
-- set these variables to appropriate values
SET @FromDate = '2020-03-01';
SET @ToDate = '2020-03-31';
;WITH cteCalendar (MyDate) AS
(
SELECT CONVERT(DATE, @FromDate) AS MyDate
UNION ALL
SELECT DATEADD(DAY, 1, MyDate)
FROM cteCalendar
WHERE DATEADD(DAY, 1, MyDate) <= @ToDate
)
SELECT WEEKDAY(cte.MyDate) AS day_of_week,
SUM(sales) AS total_sales
FROM cteCalendar cte
LEFT JOIN table1 t1 ON cte.MyDate = t1.transaction_date
GROUP BY day_of_week

Function to identify records that are within 5 minute intervals

I have a MYSQL table with many records as shown in the image below.
I need to identify rows that fall within the same 5-minute interval, for example, and mark each such row in a new column. See the example output.
How can I do this through a function?
Test
WITH RECURSIVE
cte AS ( SELECT MIN(`datetime`) AS dt_start,
MIN(`datetime`) + INTERVAL 5 MINUTE AS dt_end,
1 AS group_num
FROM sourcetable
UNION ALL
SELECT dt_end,
dt_end + INTERVAL 5 MINUTE,
group_num + 1
FROM cte
WHERE dt_end <= ( SELECT MAX(`datetime`)
FROM sourcetable )
)
SELECT sourcetable.*, cte.group_num
FROM sourcetable
JOIN cte ON `datetime` >= dt_start
AND `datetime` < dt_end
If your 5-minute intervals are based on calendar time, then the most efficient method would use window functions:
select t.*,
dense_rank() over (order by floor(to_seconds(data) / 300)) as tr
from t;
If it is based on when the first record within each group starts, then you need recursive CTEs as Akina suggests.
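The calendar-time variant (floor the timestamp to a 300-second bucket, then dense-rank the buckets) can be sketched in Python as follows. This is only an illustration of the arithmetic; calendar_buckets and the sample timestamps are invented, and seconds are counted from 1970-01-01 rather than from year 0 as TO_SECONDS() does, which does not change the 5-minute alignment.

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)


def calendar_buckets(timestamps, width_seconds=300):
    """Give each timestamp the dense rank of its fixed, calendar-aligned bucket."""
    # floor(seconds / 300): every timestamp in the same 5-minute slot gets the same key.
    keys = [int((ts - EPOCH).total_seconds()) // width_seconds for ts in timestamps]
    # Dense rank: number the distinct keys 1, 2, 3, ... in ascending order.
    rank = {key: i + 1 for i, key in enumerate(sorted(set(keys)))}
    return [rank[key] for key in keys]


stamps = [
    datetime(2021, 5, 1, 10, 0, 30),
    datetime(2021, 5, 1, 10, 4, 59),
    datetime(2021, 5, 1, 10, 5, 0),
    datetime(2021, 5, 1, 10, 17, 10),
]
print(calendar_buckets(stamps))
```

Empty 5-minute slots do not consume a group number here, which is the difference between dense ranking and the consecutive group_num of the recursive version.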

MySQL: How to do date between by year and month (no date)

Suppose there are 2 columns in a MySQL database, year and month, and I want to do a sum calculation based on a year-month range without specifying the day. Let's say 2010-11 to 2011-07; how can I do it?
SELECT * FROM TT WHERE F1 BETWEEN '2010-11' AND '2011-07'
It doesn't work.
If you want to take all rows from 2010-11 to 2011-07, until the first day of August:
SELECT * FROM `table`
WHERE `date_column` BETWEEN '2010-11-01' AND '2011-08-01'
Use this query if you want to get all rows from the full months of January to June:
SELECT * FROM `table`
WHERE YEAR(`date_column`)=2011 AND MONTH(`date_column`) BETWEEN 1 AND 6
If the range spans different years, write a separate condition for each year:
SELECT * FROM `table`
WHERE
(YEAR(`date_column`)=2010 AND MONTH(`date_column`) BETWEEN 11 AND 12) OR
(YEAR(`date_column`)=2011 AND MONTH(`date_column`) BETWEEN 1 AND 7)
Try this, if F1 is of type date
SELECT * FROM TT WHERE F1 BETWEEN '2010-11-01' AND '2011-07-31'
and this if F1 is of type datetime
SELECT * FROM TT WHERE F1 BETWEEN '2010-11-01 00:00:00' AND '2011-07-31 23:59:59'
if year and month are saved in different columns then use this
SELECT * FROM TT WHERE DATE(CONCAT(year_column, '-', month_column, '-01'))
BETWEEN '2010-11-01' AND '2011-07-31'
I came across the same situation and managed it like this (LPAD zero-pads the month, so the string comparison also works for single-digit months):
SELECT * FROM `TT` WHERE CONCAT(year_column, LPAD(month_column, 2, '0')) BETWEEN '201011' AND '201107';
Hope it helps others too.
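The reason a combined year-month key needs a fixed-width month can be shown with a few lines of Python. This is a sketch of the comparison logic only; in_month_range is a made-up helper, and year * 100 + month is the numeric equivalent of the zero-padded string key.

```python
def in_month_range(year, month, start, end):
    """True if (year, month) lies between the (year, month) pairs start and end, inclusive."""
    key = year * 100 + month  # e.g. 2011, 1 -> 201101
    return start[0] * 100 + start[1] <= key <= end[0] * 100 + end[1]


print(in_month_range(2011, 1, (2010, 11), (2011, 7)))   # inside the range
print(in_month_range(2011, 8, (2010, 11), (2011, 7)))   # just past the end

# Without zero-padding, January 2011 becomes the string '20111',
# which sorts after '201107' and is wrongly excluded.
print("201011" <= "2011" + "1" <= "201107")
```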

How to set value of table1.column1 to an average of table2.column2

I've looked all over and I can't seem to find a clear solution. Sorry if I missed it.
My Problem:
I have data that is accumulated hourly and placed in table1
I want table2.metric_ to contain an average of table1.metric_ for that day.
table1 : record_key_, id_, metric_, date_
table2 : record_key_, id_, metric_, date_
to get the list of records for the day I use:
SELECT * FROM table1 t WHERE t.date_ >= DATE_SUB(NOW(),INTERVAL 1 DAY) AND t.id_='id1';
what does the INSERT query for table2 look like so that table2.metric_ is an average of the values from the table1.metric_ column for all records of id1 in table1 returned by the previous SELECT statement?
insert into table2 SELECT avg(total) FROM (SELECT count(*) as Total FROM table1 t WHERE t.col= DATE_SUB(NOW(),INTERVAL 1 DAY) group by colname) as a
INSERT INTO table2 (metric_)
SELECT AVG(t.metric_) FROM table1 t WHERE t.date_ >= DATE_SUB(NOW(),INTERVAL 1 DAY) AND t.id_='id1';
should be the solution
You can add the date_ formatted as you like in the SELECT statement
Did not test the query as I don't have MySQL test databases.
Source: INSERT-SELECT in MySQL reference
(I cannot comment on the accepted answer, but I am quite sure it is wrong: it counts how many entries there are and then averages those counts, which effectively answers "how many hours does a day have?")
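Since the INSERT ... SELECT answer above was posted untested, here is a rough way to check the idea locally with Python's sqlite3 module. SQLite stands in for MySQL here: datetime(:now, '-1 day') replaces DATE_SUB(NOW(), INTERVAL 1 DAY), the current time is passed as a parameter so the result is reproducible, and the rows and column types are assumptions based on the column names in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (record_key_ INTEGER PRIMARY KEY, id_ TEXT, metric_ REAL, date_ TEXT);
CREATE TABLE table2 (record_key_ INTEGER PRIMARY KEY, id_ TEXT, metric_ REAL, date_ TEXT);
""")
conn.executemany(
    "INSERT INTO table1 (id_, metric_, date_) VALUES (?, ?, ?)",
    [
        ("id1", 10.0, "2024-03-10 01:00:00"),
        ("id1", 20.0, "2024-03-10 02:00:00"),
        ("id2", 99.0, "2024-03-10 01:00:00"),  # other id: ignored
        ("id1", 50.0, "2024-03-08 01:00:00"),  # older than one day: ignored
    ],
)

# Average the last day's readings for one id and store them as a single row.
conn.execute(
    """
    INSERT INTO table2 (id_, metric_, date_)
    SELECT t.id_, AVG(t.metric_), date(:now)
    FROM table1 AS t
    WHERE t.date_ >= datetime(:now, '-1 day') AND t.id_ = :id
    GROUP BY t.id_
    """,
    {"now": "2024-03-10 12:00:00", "id": "id1"},
)
summary = conn.execute("SELECT id_, metric_, date_ FROM table2").fetchall()
print(summary)
```

The GROUP BY also means that no row is inserted at all when the id has no readings in the window, rather than a row with a NULL average.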