How to accelerate time ranged query? - mysql

I have a MySQL data table with around 30 million records, one of the fields is create_time, which is a timestamp.
I want to count the records in a specified time range, and use the DISTINCT keyword to get the unique user count.
The SQL statement is:
select count(distinct owner_id) from code_orange_checkpointrecord
where create_time between '2023-01-01 00:00:00' and '2023-01-30 23:59:59';
But the query is very slow. I tried to create an index on create_time, but the result is a lot slower.
I think I am doing something wrong?

The precise query you showed us will be accelerated by this multi-column index.
CREATE INDEX create_owner
ON code_orange_checkpointrecord (create_time, owner_id);
And, by the way, it's best to avoid BETWEEN for time ranges. Use this instead to get all the records for January. 23:59:59 might not be the precise last timestamp in the month, so it's better to use < and the first timestamp you don't want.
where create_time >= '2023-01-01 00:00:00'
and create_time < '2023-02-01 00:00:00';
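To see why the half-open range is safer, here is a minimal sketch. It assumes a hypothetical table demo_events whose create_time column keeps fractional seconds (DATETIME(3)); neither the table nor the sample rows come from the question.

```sql
-- Hypothetical demo table (not from the question); DATETIME(3) stores milliseconds.
CREATE TABLE demo_events (
  id INT PRIMARY KEY,
  create_time DATETIME(3) NOT NULL
);

INSERT INTO demo_events (id, create_time) VALUES
  (1, '2023-01-31 23:59:59.500'),  -- still January, but later than 23:59:59
  (2, '2023-02-01 00:00:00.000');  -- first instant of February

-- BETWEEN with a 23:59:59 upper bound skips row 1.
SELECT COUNT(*) AS between_count
FROM demo_events
WHERE create_time BETWEEN '2023-01-01 00:00:00' AND '2023-01-31 23:59:59';

-- The half-open range counts row 1 and still excludes row 2.
SELECT COUNT(*) AS half_open_count
FROM demo_events
WHERE create_time >= '2023-01-01 00:00:00'
  AND create_time <  '2023-02-01 00:00:00';
```

The half-open form also stays correct no matter how many days the month has or what precision the column stores.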

Related

SQL date_sub vs datediff performance in looking over a date window

I'm trying to look at the number of active users of a product (toy example) over the last 30 days.
I'm considering two approaches.
One, date_sub is used to find the date 29 days before (the interval is 30 days inclusive of the start date) an end date. The where window is then defined by that earlier date and the end date.
That is this example:
SELECT
activity_date AS day,
COUNT(DISTINCT user_id) AS active_users
FROM Activity
WHERE
activity_date >= DATE_SUB("2019-07-27", INTERVAL 29 DAY)
AND
activity_date <= "2019-07-27"
A second approach is to calculate the datediff from a start date, then restrict the where clause to the previous time period.
SELECT
activity_date as day,
COUNT(DISTINCT user_id) AS active_users
FROM Activity
WHERE
datediff('2019-07-27', activity_date) < 30
AND
activity_date <= '2019-07-27'
I have no insight into which is the better option. I'd love for others to weigh in.
Use the first option:
activity_date
BETWEEN DATE_SUB(DATE("2019-07-27"), INTERVAL 29 DAY)
AND DATE("2019-07-27")
This compares the stored value directly to date literals. Such an expression can take advantage of an index on the date column.
In contrast, the second expression applies the date function datediff() to the date column. This makes the expression non-SARGable, meaning that it will not benefit from an index:
datediff('2019-07-27', activity_date) < 30
and activity_date <= '2019-07-27'
Note that the first expression could be simply phrased:
activity_date >= '2019-07-27' - interval 29 day
and activity_date <= '2019-07-27'
I am unsure whether the second comparison should be <= rather than <. A reason why <= would make sense is that activity_date has no time component. But I would recommend using <, because it works in both cases; if you want data up to and including '2019-07-27', you can do:
activity_date >= '2019-07-27' - interval 29 day
and activity_date < '2019-07-28'
I would definitely use the first query, if you have an index on the activity_date column.
When you do DATE_SUB() or DATE() on constant values, MySQL only needs to do that calculation once before it begins examining rows. The result of the expression is a constant.
When you compare an indexed column BETWEEN two constant values, MySQL can use that index to locate the matching rows efficiently, using a range search.
Whereas if you put your column inside the call to DATEDIFF(), it has to re-calculate the result on every row examined, and it can't use the index. It will be forced to examine every row in the table. This is called a table-scan.
You can use EXPLAIN to confirm this. The first query will show type: range but the second query will show type: ALL, and the rows column of the EXPLAIN will show an estimate roughly equal to the size of the table.
FWIW, this is generally true: any expression where you put a column inside a function call spoils any benefit of an index on that column. Indexes work because they're stored in sorted order, but MySQL can't use an index on a column inside an expression or function, because it doesn't do any analysis to determine if the result of the expression has the same sort order as the column itself.
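A sketch of that EXPLAIN check, assuming a single-column index named idx_activity_date (a hypothetical name) on Activity.activity_date. The access types in the comments are the ones described above; the optimizer may still choose a full scan for the first query when the range covers a large share of the table.

```sql
-- Assumption: single-column index on the date column; the index name is made up.
CREATE INDEX idx_activity_date ON Activity (activity_date);

-- Bare column compared against constants: eligible for a range scan (type: range).
EXPLAIN
SELECT COUNT(DISTINCT user_id) AS active_users
FROM Activity
WHERE activity_date >= '2019-07-27' - INTERVAL 29 DAY
  AND activity_date <= '2019-07-27';

-- Column wrapped in DATEDIFF(): this condition cannot use the index.
-- Only the DATEDIFF() condition is kept so that nothing sargable remains.
EXPLAIN
SELECT COUNT(DISTINCT user_id) AS active_users
FROM Activity
WHERE DATEDIFF('2019-07-27', activity_date) < 30;
```

Comparing the type and rows columns of the two plans shows whether the index is actually being used on your data.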

Select rows from yesterday's date

I have a query function that selects all rows from the previous days. However, I need it to only select the rows with yesterdays date but am unsure how to include just the previous day.
My current query is:
SELECT pdate FROM table 1
WHERE pdate < Date(NOW()) + INTERVAL 1 DAY
I would imagine it would look something like this, which has the advantage of using indexes (if you have them implemented)
SELECT pdate FROM table 1
WHERE pdate >= Date(NOW()) - INTERVAL 1 DAY
AND pdate < Date(NOW())
You can use the DATE_SUB() function to get yesterday's date, and then use a WHERE clause condition to look for just that date, like this:
SELECT *
FROM myTable
WHERE DATE(pDate) = DATE_SUB(CURDATE(), INTERVAL 1 DAY);
Here is a list of MySQL's Date and Time Functions which may help you.
NOTE: This will work, but because it applies a function to the pDate column in the WHERE clause, it cannot take advantage of any index on that column (see comments below). Brian Driscoll has given an answer that will perform better. I am leaving this answer up because, while it is less efficient, I believe it is slightly more readable: the WHERE clause is very explicit about what it checks. Whether the trade-off is worth it is up to the developer.

Select between date range and specific time range

Is there a way to select records between dates and within a specific time range? For example, all the records from 2013-11-01 to 2013-11-30 between the hours 05:00 and 15:00. Here is what I have so far.
select count(*) as total from tradingaccounts accounts
inner join tradingaccounts_audit audit on audit.parent_id = accounts.id
where date(audit.date_created) between date('2013-11-01 00:00:00') AND date('2013-11-01 23:59:59')
But how am I going to set the specific time range?
You can use the HOUR function to add an additional condition on the hours:
select count(*) as total from tradingaccounts accounts
inner join tradingaccounts_audit audit on audit.parent_id = accounts.id
where date(audit.date_created) between date('2013-11-01 00:00:00') AND date('2013-11-01 23:59:59')
AND HOUR (audit.date_created) BETWEEN 5 AND 15
As others answered, HOUR() could help you.
But HOUR() and DATE() applied to a column prevent the use of an index on it. To make the query faster, I suggest adding a time_created TIME column that stores only the time part, then running ADD INDEX(date_created, time_created). With the query below, you can then retrieve rows quickly.
where audit.date_created between '2013-11-01 00:00:00' AND '2013-11-01 23:59:59'
AND audit.time_created BETWEEN '05:00:00' AND '15:00:00'
Replace your DATE function; it discards the time part. Use TIMESTAMP instead.
Add the following line to the query and set your hours in the BETWEEN:
and time(audit.date_created) between '00:00:01' AND '01:00:00'
If you use date(audit.date_created), the index on the date_created field cannot be used.
Simply use where audit.date_created >= 'xx-xx-xx' and audit.date_created < 'xx-xx-xx'
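Putting these answers together, one possible sketch: keep date_created bare in a half-open range so an index on it can narrow the rows, then filter the time of day on what remains. It assumes date_created is a DATETIME or TIMESTAMP column and that the wanted window is 05:00:00 through 15:00:00 inclusive.

```sql
SELECT COUNT(*) AS total
FROM tradingaccounts accounts
INNER JOIN tradingaccounts_audit audit ON audit.parent_id = accounts.id
-- Sargable part: all of November 2013 as a half-open range.
WHERE audit.date_created >= '2013-11-01 00:00:00'
  AND audit.date_created <  '2013-12-01 00:00:00'
  -- Non-sargable part: when the index is used, this only has to be
  -- evaluated for rows that fall inside the range above.
  AND TIME(audit.date_created) >= '05:00:00'
  AND TIME(audit.date_created) <= '15:00:00';
```

Note that HOUR(audit.date_created) BETWEEN 5 AND 15 would also match times from 15:00:01 to 15:59:59, which is why the sketch compares the full time of day instead.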

Query to limit number of requests per day

I am trying to limit each user to only 5 requests per day. To do that, I am counting the number of requests already performed during the same day using the query below:
SELECT COUNT(*) FROM table_name WHERE date_field >= CURDATE()
This gives me requests already performed on previous days. All I need is the number of requests performed in the course of today. Any ideas, please?
SELECT COUNT(*) FROM table_name WHERE date_field >= CURDATE()
... will return the number of rows where date_field is greater than or equal to today, provided date_field is a date datatype (that is, time is not a component). It certainly cannot return "requests already performed days before" unless there is a bug in the values stored in the date_field column.
https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html
Try:
SELECT COUNT(*) FROM table_name WHERE date(date_field) = CURDATE()
Assuming date_field is a timestamp/datetime field, date() will extract just the date part and evaluate it against the current date. You can read more about it on MySQL's Documentation
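If date_field is an indexed DATETIME or TIMESTAMP column, a variant that leaves the column bare might look like the sketch below. The user_id column and the value 42 are assumptions added to illustrate per-user counting; they do not appear in the question.

```sql
SELECT COUNT(*) AS requests_today
FROM table_name
WHERE user_id = 42                               -- hypothetical per-user filter
  AND date_field >= CURDATE()                    -- midnight at the start of today
  AND date_field <  CURDATE() + INTERVAL 1 DAY;  -- midnight at the start of tomorrow
```

Because no function wraps date_field, an index on it (or on user_id, date_field) remains usable for the range.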

SQL SELECT WHERE with date and time

I have a SELECT query where I want to find all rows whose DATE and TIME are between 2011-12-11 23:00:00 and 2011-12-12 23:00:00. I tried to do it with WHERE, but the result is empty:
WHERE (date >= '2011-12-11' AND time > '23:00:00' )
AND (date < '2011-12-12' AND time < '23:00:00' )
Any good suggestions on how to change this?
You could use:
SELECT * FROM table WHERE DATETIME(date) BETWEEN '2011-11-11 23:00:00' AND '2011-12-13 23:00:00'
or separate:
SELECT * FROM table WHERE DATETIME(date) > '2011-12-11 23:00:00' AND DATETIME(date) < '2011-12-13 23:00:00'
EDIT:
I am not sure I understand what you are trying to achieve here or how your DB is laid out but assuming date and time are separate fields:
SELECT * FROM table WHERE DATETIME(concat(DATE(date),' ',TIME(time))) BETWEEN '2011-11-11 23:00:00' AND '2011-12-13 23:00:00'
I haven't tested but this may work.
Yup, that's pretty much not going to work. You are asking for all rows where time is greater than 11 pm and time is less than 11 pm. Are time and date different fields?
You'll have to be a little more clever building up the query:
WHERE (date = '2011-12-11' AND time > '23:00:00' )
or ( date = '2011-12-12' AND time < '23:00:00' )
For a 24-hour window, you just need two clauses. If you want more than a 24-hour window, you'll need three clauses: one for the start date, one for the end date, and one for all the dates in between:
WHERE (date = '2011-12-11' AND time > '23:00:00' )
or ( date = '2011-12-13' AND time < '23:00:00' )
or (date >='2011-12-12' and date < '2011-12-13')
Ha, and I have a solution that doesn't require rebuilding the database, and it's working :))
WHERE
CONCAT(date,' ',time) >= '2011-12-11 23:00:00'
AND
CONCAT(date,' ',time) < '2011-12-12 23:00:00'
Maybe it helps someone.
Thanks to everyone who helped, best regards.
Hard to tell without the complete query. Also, assuming that the date column is actually a date type(?), you would usually do something like TO_DATE('2012-12-11','yyyy-mm-dd') to convert to date types in the comparison.
Let's make sure of certain things
You need to get rid of the idea of separate date and time fields when searching
You need to create an additional column in your table called date_time (type DATETIME) which combines the two fields.
You should probably ditch the separate date and time fields and have just date_time
You can then create an index on date_time
Here is the command to do that
ALTER TABLE yourtable ADD INDEX date_time (date_time);
Once you do these things, THEN you can create a query with a WHERE clause that looks like this:
WHERE date_time >= '2011-12-11 23:00:00'
AND date_time < '2011-12-12 23:00:00'
If you cannot combine the date and time fields, you can still create an index
ALTER TABLE yourtable ADD INDEX date_time (date,time);
Given that situation, you can create a query with a WHERE clause that looks like this:
WHERE (date = '2011-12-11' AND time >= '23:00:00')
OR (date = '2011-12-12' AND time < '23:00:00')
The EXPLAIN plan for either situation should result in a fast execution of the query with the use of the date_time index.
Give it a Try !!!
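One way the combined date_time column could be populated, as a sketch only: it assumes MySQL, that the existing columns are literally named date (type DATE) and time (type TIME), and it uses the two-argument form of TIMESTAMP() to add the time to the date.

```sql
-- Add the combined column (column name taken from the answer above).
ALTER TABLE yourtable ADD COLUMN date_time DATETIME;

-- TIMESTAMP(date_expr, time_expr) returns the date plus the time as a DATETIME.
-- Backticks avoid any ambiguity with the DATE and TIME keywords.
UPDATE yourtable
SET date_time = TIMESTAMP(`date`, `time`);
```

New rows would also need date_time filled in on insert, otherwise range queries on it will miss them.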