What is the difference between using the DATEDIFF() function and subtracting an INTERVAL of days directly?
SELECT * FROM table WHERE DATEDIFF(CURDATE(), publish_date) <= 3
SELECT * FROM table WHERE publish_date >= CURDATE() - INTERVAL 3 DAY
The results are the same, but the second way seems to be a bit faster. Why?
The first expression, which uses DATEDIFF(), requires applying the date function to each and every row before the filtering can happen.
By contrast, the second expression does not imply such pre-processing: CURDATE() - INTERVAL 3 DAY is computed just once, and then compared directly against the value of publish_date. This predicate can take advantage of an index on the date column. This is the right way to do it.
In technical terms, we say that the second predicate is sargable, while the first one isn't; sargable stands for Search ARGument ABLE.
As a rule of thumb: do not apply functions to the column you filter on if there is a way around it.
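A self-contained snippet can't spin up MySQL, so this minimal sketch uses Python's built-in sqlite3 module as a stand-in engine to show the same effect. SQLite has no DATEDIFF() or INTERVAL, so julianday() plays the part of the function-wrapped column, and a cutoff computed once in Python plays the part of CURDATE() - INTERVAL 3 DAY. The table posts, its columns and the index idx_publish_date are hypothetical, and the plan text is SQLite's EXPLAIN QUERY PLAN format, not MySQL's EXPLAIN.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, publish_date TEXT)")
conn.execute("CREATE INDEX idx_publish_date ON posts (publish_date)")


def plan(sql, params=()):
    """Return the 'detail' text of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Non-sargable: the column is wrapped in a function,
# like DATEDIFF(CURDATE(), publish_date) <= 3
wrapped = plan(
    "SELECT * FROM posts WHERE julianday('now') - julianday(publish_date) <= 3"
)

# Sargable: the cutoff is computed once, like CURDATE() - INTERVAL 3 DAY,
# and the bare column is compared against it
cutoff = (date.today() - timedelta(days=3)).isoformat()
bare = plan("SELECT * FROM posts WHERE publish_date >= ?", (cutoff,))

print(wrapped)  # a full scan of posts
print(bare)     # an index search using idx_publish_date
```

On MySQL itself, running EXPLAIN on the two original queries is the way to confirm which one uses the index.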
Related
I'm trying to look at the number of active users of a product (toy example) over the last 30 days.
I'm considering two approaches.
First, DATE_SUB() is used to find the date 29 days before an end date (the interval is 30 days, inclusive of the start date). The WHERE window is then defined by that earlier date and the end date.
For example:
SELECT
activity_date AS day,
COUNT(DISTINCT user_id) AS active_users
FROM Activity
WHERE
activity_date >= DATE_SUB("2019-07-27", INTERVAL 29 DAY)
AND
activity_date <= "2019-07-27"
GROUP BY activity_date
A second approach is to calculate DATEDIFF() relative to the end date, then restrict the WHERE clause to the preceding time period.
SELECT
activity_date as day,
COUNT(DISTINCT user_id) AS active_users
FROM Activity
WHERE
datediff('2019-07-27', activity_date) < 30
AND
activity_date <= '2019-07-27'
GROUP BY activity_date
I have no insight into which is the better option. I'd love for others to weigh in.
Use the first option:
activity_date
BETWEEN DATE_SUB(DATE("2019-07-27"), INTERVAL 29 DAY)
AND DATE("2019-07-27")
This compares the stored value directly to date literals. Such an expression can take advantage of an index on the date column.
In contrast, the second expression applies the date function datediff() to the date column. This makes the expression non-SARGable, meaning that it cannot benefit from an index:
datediff('2019-07-27', activity_date) < 30
and activity_date <= '2019-07-27'
Note that the first expression could be phrased more simply as:
activity_date >= '2019-07-27' - interval 29 day
and activity_date <= '2019-07-27'
I am unsure whether the second comparison should be <= rather than <. A reason why <= would make sense is that activity_date has no time component. But I would recommend using <, because it works in both cases; if you want data up to and including '2019-07-27', you can do:
activity_date >= '2019-07-27' - interval 29 day
and activity_date < '2019-07-28'
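As a quick sanity check of the window arithmetic, this sketch uses Python's datetime module (not MySQL) to confirm that, for DATE values with no time component, the half-open range above selects exactly the same days as the DATEDIFF() predicate from the question. The helper names are hypothetical.

```python
from datetime import date, timedelta

END = date(2019, 7, 27)
START = END - timedelta(days=29)     # '2019-07-27' - INTERVAL 29 DAY
END_EXCL = END + timedelta(days=1)   # '2019-07-28'


def in_half_open_window(d: date) -> bool:
    # activity_date >= START AND activity_date < END_EXCL
    return START <= d < END_EXCL


def in_datediff_window(d: date) -> bool:
    # DATEDIFF('2019-07-27', activity_date) < 30 AND activity_date <= '2019-07-27'
    return (END - d).days < 30 and d <= END


# Compare both predicates over days well beyond the window on each side
days = [END - timedelta(days=n) for n in range(-10, 60)]
assert all(in_half_open_window(d) == in_datediff_window(d) for d in days)

selected = [d for d in days if in_half_open_window(d)]
print(START, len(selected))  # 2019-06-28 30
```

Both predicates agree, so the choice between them comes down to index use, not results.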
I would definitely use the first query, if you have an index on the activity_date column.
When you do DATE_SUB() or DATE() on constant values, MySQL only needs to do that calculation once before it begins examining rows. The result of the expression is a constant.
When you compare an indexed column BETWEEN two constant values, MySQL can use the index to locate the matching rows efficiently, using a range search.
Whereas if you put your column inside the call to DATEDIFF(), it has to re-calculate the result on every row examined, and it can't use the index. It will be forced to examine every row in the table. This is called a table-scan.
You can use EXPLAIN to confirm this. The first query will show type: range, but the second query will show type: ALL, and the rows column of the EXPLAIN output will show an estimate roughly equal to the size of the table.
FWIW, this is generally true: any expression where you put a column inside a function call spoils any benefit of an index on that column. Indexes work because they're stored in sorted order, but MySQL can't use an index on a column inside an expression or function, because it doesn't do any analysis to determine if the result of the expression has the same sort order as the column itself.
In my database, I have a table called 'fine' with three fields: issue_date, expiry_date and fine_amount. I want expiry_date to be computed from issue_date: the expiry date should always be 20 days after the issue_date. So I wrote the query as:
ALTER TABLE fine ADD
expiry_date AS DATE_ADD(CURRENT_DATE,INTERVAL 20 DAY)
But there is a syntax error. I can't seem to find the solution.
I also want fine_amount to be 10 * (the difference in days between the current date and the expiry date), if the current date is past the expiry date. How do I go about doing that?
You can't implement the fine logic using a computed column, because the formula involves the current time, which is non-deterministic. From the MySQL documentation:
Literals, deterministic built-in functions, and operators are permitted. A function is deterministic if, given the same data in tables, multiple invocations produce the same result, independently of the connected user. Examples of functions that fail this definition: CONNECTION_ID(), CURRENT_USER(), NOW().
So your best bet probably is to just compute values for these columns at the time you actually select. For example:
SELECT issue_date,
DATE_ADD(issue_date, INTERVAL 20 DAY) AS expiry_date,
CASE WHEN NOW() > DATE_ADD(issue_date, INTERVAL 20 DAY)
THEN 10*DATEDIFF(NOW(), DATE_ADD(issue_date, INTERVAL 20 DAY))
ELSE 0 END AS fine_amount
FROM fine
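The same rule can be mirrored in application code. This is a minimal sketch, assuming the fine is 10 per whole day past the expiry date (the question names no currency or rounding rule), with the "current date" passed in explicitly instead of read from the clock, which keeps the function deterministic and easy to test. The function name fine_for is hypothetical.

```python
from datetime import date, timedelta

GRACE_DAYS = 20     # expiry_date = issue_date + INTERVAL 20 DAY
RATE_PER_DAY = 10   # fine_amount = 10 * days past expiry


def fine_for(issue_date: date, today: date) -> tuple[date, int]:
    """Return (expiry_date, fine_amount) for an issue date, as of `today`."""
    expiry_date = issue_date + timedelta(days=GRACE_DAYS)
    # Like DATEDIFF(today, expiry_date): whole days, counted only once overdue
    days_overdue = (today - expiry_date).days
    fine_amount = RATE_PER_DAY * days_overdue if days_overdue > 0 else 0
    return expiry_date, fine_amount


print(fine_for(date(2024, 1, 1), date(2024, 1, 15)))  # not yet past expiry: fine is 0
print(fine_for(date(2024, 1, 1), date(2024, 1, 25)))  # 4 days past expiry: fine is 40
```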
I have a MySQL DB table with multiple date type fields. I need to do different SELECT queries on this table but I am not sure which way is the best to find records from the same month.
I know I can do the following:
SELECT *
FROM table
WHERE MONTH(somedate) = 5
AND YEAR(somedate) = 2015
But I keep reading that isn't efficient and that I should go with using actual dates, i.e.
SELECT *
FROM table
WHERE somedate BETWEEN '2015-05-01' AND '2015-05-31'
However, all I have are the month and the year, as variables coming in from PHP. How do I easily and quickly calculate the last day of the month if I go with the second option?
Don't calculate the last day of the month. Calculate the first day of the next month instead.
Your query can look like this:
WHERE t.mydatetimecol >= '2015-05-01'
AND t.mydatetimecol < '2015-05-01' + INTERVAL 1 MONTH
Note that we're doing a less than comparison, not a "less than or equal to"... this is very convenient for comparing TIMESTAMP and DATETIME columns, which can include a time portion.
Note that a BETWEEN comparison is a "less than or equal to". To get a comparison equivalent to the query above, we'd need to do
WHERE t.mydatetimecol
BETWEEN '2015-05-01' AND '2015-05-01' + INTERVAL 1 MONTH + INTERVAL -1 SECOND
(This assumes that the resolution of DATETIME and TIMESTAMP is down to a second. In other databases, such as SQL Server, the resolution is finer than a second, so there we'd have the potential of missing a row with a value of '2015-05-31 23:59:59.997'. We don't have that problem with the "less than the first day of the next month" comparison, < '2015-06-01'.)
No need to do the month or date math yourself; let MySQL do it for you. If you muck with adding 1 to the month, you have to handle the rollover from December to January and increment the year. MySQL has all of that built in.
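To see concretely why the question's BETWEEN '2015-05-01' AND '2015-05-31' can miss rows in a DATETIME column, here is a small Python sketch that mimics how a date-only literal stands for midnight at the start of that day. It is an illustration using Python's datetime, not MySQL itself, and the sample timestamps are made up.

```python
from datetime import datetime

# A date-only bound like '2015-05-31' means midnight at the start of that day
LOW = datetime(2015, 5, 1)
HIGH_INCLUSIVE = datetime(2015, 5, 31)   # BETWEEN '2015-05-01' AND '2015-05-31'
NEXT_MONTH = datetime(2015, 6, 1)        # '2015-05-01' + INTERVAL 1 MONTH

samples = [
    datetime(2015, 5, 1, 0, 0, 0),
    datetime(2015, 5, 15, 12, 0, 0),
    datetime(2015, 5, 31, 18, 45, 0),    # afternoon of the last day
    datetime(2015, 6, 1, 0, 0, 0),       # first instant of June
]

between_hits = [t for t in samples if LOW <= t <= HIGH_INCLUSIVE]
half_open_hits = [t for t in samples if LOW <= t < NEXT_MONTH]

print(len(between_hits), len(half_open_hits))  # 2 3
```

The afternoon of May 31st falls outside the BETWEEN range but inside the half-open one.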
date('t', strtotime("$year-$month-01")) will give the number of days in the month.
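For comparison only, Python's standard library has an equivalent of the PHP date('t', ...) lookup: calendar.monthrange(year, month) returns the weekday of the first day and the number of days in the month. The helper name days_in_month is hypothetical.

```python
import calendar


def days_in_month(year: int, month: int) -> int:
    # monthrange returns (weekday_of_first_day, number_of_days)
    return calendar.monthrange(year, month)[1]


print(days_in_month(2015, 5))  # 31
print(days_in_month(2016, 2))  # 29 (leap year)
```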
I am trying to get data from a database between 8PM (say, today) and 2AM tomorrow.
I have been using clauses such as where hour(date_field)>=20 and hour(date_field) <23 to obtain data in the same day.
Here the date_field is datetime
All I want is to be able to tell SQL to get data after 8PM today, increment the datefield and then get data till 2AM tomorrow.
Any help will be appreciated.
The normal pattern for retrieving rows based on a datetime range is to perform comparisons on the bare column, comparing the column value to constants derived from expressions.
To get rows for a single contiguous range, 8PM today to 2AM tomorrow, for example:
WHERE t.date_column >= DATE(NOW()) + INTERVAL 20 HOUR
AND t.date_column < DATE(NOW()) + INTERVAL 26 HOUR
To unpack that a little bit: NOW() returns current datetime, the DATE() function truncates the time portion to midnight, then we add back in enough hours to get '8PM today', or enough hours to get '2AM tomorrow'.
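Because these bounds are constants, they could equally be computed in application code and passed in as parameters. A minimal Python sketch of the same arithmetic, using a fixed, made-up "now" so the result is reproducible; the function name evening_window is hypothetical.

```python
from datetime import datetime, time, timedelta


def evening_window(now: datetime) -> tuple[datetime, datetime]:
    """Bounds from 8PM on now's date up to (not including) 2AM the next day."""
    midnight = datetime.combine(now.date(), time.min)  # like DATE(NOW())
    start = midnight + timedelta(hours=20)             # + INTERVAL 20 HOUR
    end = midnight + timedelta(hours=26)               # + INTERVAL 26 HOUR
    return start, end


start, end = evening_window(datetime(2014, 8, 1, 15, 30))
print(start, end)  # 2014-08-01 20:00:00 2014-08-02 02:00:00
```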
If you mean to retrieve multiple "8PM to 2AM" periods, for a whole series of days:
First, you'd want an upper and lower bound of the date_column to be retrieved (unless you want every possible date)
WHERE t.date_column >= '2014-08-01 20:00:00'
AND t.date_column < '2014-10-02 02:00:00'
From that, we need to filter out all of the rows that aren't between 8PM and 2AM. One convenient way to do that is to "subtract" two hours from the datetime column and check for an hour of 18 (6PM) or later:
AND HOUR(t.date_column + INTERVAL -2 HOUR) >= 18
Note that the expression involving date_column will need to be evaluated for EVERY row in the table, unless there are some other predicates that filter rows out. With a suitable index available, MySQL can use an index range scan operation for predicates of the form date_column >= const and date_column < const. (It can't do that when the column is wrapped in a function or expression.)
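To check which times the shifted-hour filter keeps, here is the same expression re-created with Python's datetime (an illustration, not MySQL; the function name in_8pm_to_2am is hypothetical).

```python
from datetime import datetime, timedelta


def in_8pm_to_2am(t: datetime) -> bool:
    # Mirrors HOUR(t + INTERVAL -2 HOUR) >= 18: shifting back two hours maps
    # 20:00-01:59 onto 18:00-23:59 of a single calendar day
    return (t - timedelta(hours=2)).hour >= 18


# Try every hour of one day and keep the ones that pass the filter
day = datetime(2014, 8, 1)
kept = sorted(
    (day + timedelta(hours=h)).hour
    for h in range(24)
    if in_8pm_to_2am(day + timedelta(hours=h))
)
print(kept)  # [0, 1, 20, 21, 22, 23]
```

The boundary behaves as intended: 01:59:59 shifts to 23:59:59 and is kept, while 02:00:00 shifts to midnight and is dropped.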
I wish to query for
MyDate= '2013-07-08'
From the following records
MyDate
2013-07-08 09:15:21
2013-07-08 09:15:48
2013-07-09 09:20:39
I have come up with some ugly stuff :
MyDate > '2013-07-07 23:59:59' AND MyDate < '2013-07-09 00:00:01'
Is there a better/simple/elegant way to do this?
Use DATE() to isolate the date portion of the datetime expression.
WHERE DATE(MyDate) = '2013-07-08'
If you're trying to compare dates, use this. If not, disregard.
This may not be the most perfect way, but I have used this in the past. Basically, I would format both dates so they can be used with a greater than or equal to statement (YEAR/MONTH/DAY).
SELECT * FROM table
WHERE MyDate >= DATE_FORMAT('2013-07-08', '%Y%m%d')
AND MyDate < DATE_FORMAT('2013-07-09', '%Y%m%d')
The normative pattern for matching the date portion of a DATETIME in a predicate (e.g. a WHERE clause) is:
WHERE MyDate >= '2013-07-08'
AND MyDate < '2013-07-08' + INTERVAL 1 DAY
When no time component is supplied, MySQL uses midnight as the time component, so there's no need to supply a time component of midnight. The bare column references in the predicate allow for MySQL to consider making efficient range scan on an index on the MyDate column.
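Applied to the three records from the question, the half-open predicate picks out exactly the two rows from 2013-07-08. This Python sketch only mirrors the comparison logic (a date-only literal standing for midnight); it does not run against MySQL.

```python
from datetime import datetime, timedelta

records = [
    datetime(2013, 7, 8, 9, 15, 21),
    datetime(2013, 7, 8, 9, 15, 48),
    datetime(2013, 7, 9, 9, 20, 39),
]

day_start = datetime(2013, 7, 8)         # '2013-07-08' means midnight
day_end = day_start + timedelta(days=1)  # '2013-07-08' + INTERVAL 1 DAY

matches = [r for r in records if day_start <= r < day_end]
print(len(matches))  # 2
```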
For completeness, we'll note that it's also possible to use a BETWEEN operator. But because the "high side" comparison of BETWEEN is a "less than or equal to", to get just the values whose date component falls on a single day, we'd need to back up by the smallest unit of time, which for a DATETIME with no fractional seconds is a single second:
WHERE MyDate BETWEEN '2013-07-08'
AND '2013-07-08' + INTERVAL 1 DAY + INTERVAL -1 SECOND
(If we had a datatype that had a finer resolution, we'd want to step back from the next day by that smallest unit of resolution.)
To avoid that issue of how fine the resolution of a given datetime/timestamp datatype is (more of an issue with other databases, such as SQL Server, than with MySQL), I just prefer the former pattern, using a predicate like:
dateexpr >= midnight and dateexpr < midnight of next day
That's unambiguous: there's no way for a time value of 23:59:59.997 to be missed, and no possibility of a value of exactly midnight of the next day being included.
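The difference only shows up when stored values carry fractional seconds (for example a MySQL DATETIME(3) column, or SQL Server's datetime). A Python sketch comparing the two upper bounds against a made-up 23:59:59.997 value:

```python
from datetime import datetime, timedelta

day_start = datetime(2013, 7, 8)
next_midnight = day_start + timedelta(days=1)
between_high = next_midnight - timedelta(seconds=1)  # ... + INTERVAL -1 SECOND

late = datetime(2013, 7, 8, 23, 59, 59, 997000)      # 23:59:59.997

print(day_start <= late <= between_high)            # False: missed by BETWEEN
print(day_start <= late < next_midnight)            # True: caught by the half-open range
print(day_start <= next_midnight < next_midnight)   # False: next midnight excluded
```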
Because the default time component, when none is supplied, is midnight, the first query predicate is equivalent to:
WHERE MyDate >= '2013-07-08 00:00:00'
AND MyDate < '2013-07-08 00:00:00' + INTERVAL 1 DAY
I think all those extra zeros to explicitly specify a time value of midnight are unnecessary clutter.