What is better: select date with trunc date or between - mysql

I need to create a query to select some data from my MySQL database based on a date, but in my WHERE clause I have two options:
1 - trunc the date:
select count(*) from mailing_user where date_format(create_date, '%Y-%m-%d')='2013-11-05';
2 - use between
select count(*) from mailing_user where create_date between '2013-11-05 00:00:00' and '2013-11-05 23:59:59';
Both queries work, but which one is better? What is recommended, and why?

Here is an article to read.
http://willem.stuursma.name/2009/01/09/mysql-performance-with-date-functions/
If your create_date column is indexed, the 2nd query will be faster.
But if the column is not indexed and this is your defined date format, you can use the following query.
select count(*) from mailing_user where DATE(create_date) = '2013-11-05';
I use DATE instead of DATE_FORMAT because DATE natively returns the value in this format ('2013-11-05').

From your question it seems you want to select records from one day. According to the documentation, a DATETIME or TIMESTAMP value can include a trailing fractional seconds part with up to microsecond (6 digits) precision.
This means your second query can miss records inserted during the very last second of that day (after 23:59:59.000000 but before midnight) if the column stores fractional seconds. That is why I would say the first one is more precise and always gets you the correct result.
The downside is that an ordinary index on create_date cannot be used when the column is wrapped in date_format(), so the function has to be evaluated for every row.
If you don't want to use date_format and get around the precision issue you would change
where create_date between '2013-11-05 00:00:00' and '2013-11-05 23:59:59'
into
where create_date >= '2013-11-05 00:00:00' and create_date < '2013-11-06 00:00:00'
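To illustrate the precision issue, here is a minimal Python sketch (not MySQL; the sample timestamps are made up) that compares the closed BETWEEN bounds with a half-open range ending at the next midnight, for a row stored with fractional seconds:

```python
from datetime import datetime

# Hypothetical create_date values; the second one has fractional seconds.
rows = [
    datetime(2013, 11, 5, 12, 0, 0),
    datetime(2013, 11, 5, 23, 59, 59, 500000),  # 23:59:59.5
    datetime(2013, 11, 6, 0, 0, 0),             # next day, must be excluded
]

day_start = datetime(2013, 11, 5)
last_second = datetime(2013, 11, 5, 23, 59, 59)
next_day = datetime(2013, 11, 6)

# Same logic as: create_date BETWEEN '2013-11-05 00:00:00' AND '2013-11-05 23:59:59'
between_count = sum(day_start <= r <= last_second for r in rows)

# Same logic as: create_date >= '2013-11-05 00:00:00' AND create_date < '2013-11-06 00:00:00'
half_open_count = sum(day_start <= r < next_day for r in rows)

print(between_count, half_open_count)
```

The closed range skips the 23:59:59.5 row, while the half-open range counts it and still excludes midnight of the following day.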

Number 2 will be faster if you have an index on create_date, because number 1 cannot use the index to narrow down the rows.
Without an index I imagine they would be similar in speed. The second might still be slightly faster because it compares datetimes directly rather than converting each value to a string and comparing strings, but I doubt the difference would be significant.

Related

SQL: Reuse function result in query without using sub-query

In a MySQL DB table that stores sale orders, I have a LastReviewed column that holds the last date and time when the sale order was modified (type timestamp, default value CURRENT_TIMESTAMP). I'd like to plot the number of sales that were modified each day, for the last 90 days, for a particular user.
I'm trying to craft a SELECT that returns the number of days since LastReviewed date, and how many records fall within that range. Below is my query, which works just fine:
SELECT DATEDIFF(CURDATE(), LastReviewed) AS days, COUNT(*) AS number FROM sales
WHERE UserID=123 AND DATEDIFF(CURDATE(),LastReviewed)<=90
GROUP BY days
ORDER BY days ASC
Notice that I am computing DATEDIFF() as well as CURDATE() multiple times for each record. This seems really inefficient, so I'd like to know how I can reuse the result of the previous computation. The first thing I tried was:
SELECT DATEDIFF(CURDATE(), LastReviewed) AS days, COUNT(*) AS number FROM sales
WHERE UserID=123 AND days<=90
GROUP BY days
ORDER BY days ASC
Error: Unknown column 'days' in 'where clause'. So I started to look around the net. Based on another discussion (Can I reuse a calculated field in a SELECT query?), I next tried the following:
SELECT DATEDIFF(CURDATE(), LastReviewed) AS days, COUNT(*) AS number FROM sales
WHERE UserID=123 AND (SELECT days)<=90
GROUP BY days
ORDER BY days ASC
Error: Unknown column 'days' in 'field list'. I also tried the following:
SELECT @days := DATEDIFF(CURDATE(), LastReviewed) AS days,
COUNT(*) AS number FROM sales
WHERE UserID=123 AND @days <=90
GROUP BY days
ORDER BY days ASC
The query returns zero results, so @days<=90 seems to evaluate to false, even though when I put it in the SELECT clause and remove the WHERE clause I can see some results with @days values below 90.
I've gotten things to work by using a sub-query:
SELECT * FROM (
SELECT DATEDIFF(CURDATE(),LastReviewed) AS days,
COUNT(*) AS number FROM sales
WHERE UserID=123
GROUP BY days
) AS t
WHERE days<=90
ORDER BY days ASC
However, I don't know whether it's the most efficient way. Not to mention that even this solution computes CURDATE() once per record even though its value will be the same from the start to the end of the query. Isn't that wasteful? Am I overthinking this? Help would be welcome.
Note: Mods, should this be on CodeReview? I posted here because the code I'm trying to use doesn't actually work
There are actually two problems with your question.
First, you're overlooking the fact that WHERE is evaluated before SELECT, which is why the alias days is not visible in the WHERE clause. When the server evaluates WHERE <expression>, it then already knows the value of the calculations done to evaluate <expression> and can use those for SELECT.
Worse than that, though, you should almost never write a WHERE condition that uses a column as an argument to a function, since that usually requires the server to evaluate the expression for each row and prevents it from using an index on that column.
Instead, you should use this:
WHERE LastReviewed >= DATE_SUB(CURDATE(), INTERVAL 90 DAY)
The optimizer will see this and get all excited, because DATE_SUB(CURDATE(), INTERVAL 90 DAY) can be resolved to a constant, which can be used on one side of a >= comparison, which means that if an index exists with LastReviewed as the leftmost relevant column, then the server can immediately eliminate all of the rows with LastReviewed < that constant value, using the index.
Then DATEDIFF(CURDATE(), LastReviewed) AS days (still needed for SELECT) will only be evaluated against the rows we already know we want.
Add a single index on (UserID, LastReviewed) and the server will be able to pinpoint exactly the relevant rows extremely quickly.
Builtin functions are much less costly than, say, fetching rows.
You could get a lot more performance improvement with the following 'composite' index:
INDEX(UserID, LastReviewed)
and change to
WHERE UserID=123
AND LastReviewed >= CURRENT_DATE() - INTERVAL 90 DAY
Your formulation is 'hiding' LastReviewed in a function call, making it unusable in an index.
If you are still not satisfied with that improvement, then consider a nightly query that computes yesterday's statistics and puts them in a "Summary table". From there, the SELECT you mentioned can run even faster.
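As a sanity check on the rewrite suggested in the answers, here is a small Python sketch (not MySQL; the fixed "today" and the sample timestamps are assumptions for illustration) confirming that the function-wrapped condition and the bare-column range condition select the same rows:

```python
from datetime import date, datetime, time, timedelta

def datediff_filter(today: date, last_reviewed: datetime) -> bool:
    # Mirrors DATEDIFF(CURDATE(), LastReviewed) <= 90: only the date parts count.
    return (today - last_reviewed.date()).days <= 90

def range_filter(today: date, last_reviewed: datetime) -> bool:
    # Mirrors LastReviewed >= CURDATE() - INTERVAL 90 DAY, a bare-column comparison.
    cutoff = datetime.combine(today - timedelta(days=90), time.min)
    return last_reviewed >= cutoff

today = date(2024, 6, 1)  # stand-in for CURDATE()
# One sample every 7 hours, reaching back roughly 200 days from "today".
samples = [datetime(2024, 6, 1, 12, 0) - timedelta(hours=7 * i) for i in range(700)]
mismatches = [s for s in samples if datediff_filter(today, s) != range_filter(today, s)]
print(len(mismatches))
```

Both predicates agree on every sample here, so the only practical difference is that the second form leaves LastReviewed bare and therefore usable with the (UserID, LastReviewed) index.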

STR_TO_DATE() vs CONSTANT when comparing DATETIME field

There is a table with a DATETIME field named 'created_at'.
I tried executing two queries like this:
SELECT * FROM myTable WHERE created_at BETWEEN '2015-03-15 10:25:00' AND '2015-03-25 10:30:00';
SELECT * FROM myTable WHERE created_at BETWEEN
STR_TO_DATE('2015-03-15 10:25:00', '%Y-%m-%d %H:%i:%s') AND STR_TO_DATE('2015-03-25 10:30:00', '%Y-%m-%d %H:%i:%s');
I have always used the first query, but I recently came across an article claiming that the second approach is the best way to compare DATETIME values. Unfortunately, it does not explain why.
Now, I have some questions:
Is there any difference between these two approaches?
Which way is more preferable?
Thanks!
I much prefer to put the constants in directly. I believe that MySQL will process the str_to_date() function only once for the query, because the arguments are constants. However, I don't like to depend on this optimization.
The advantage to str_to_date() is that it should be independent of internationalization settings so the result should be unambiguous. However, the use of ISO standard formats should be equivalent, and that is the structure of your constants.
However, that aside, a better way to write the query is:
SELECT *
FROM myTable
WHERE created_at >= '2015-03-15 10:25:00' AND
created_at < '2015-03-25 10:30:00'
I am guessing that you don't really want 10 days, five minutes and one second in the interval, but want exactly 10 days and five minutes. In any case, the use of between with dates and datetimes can cause unexpected results, particularly when you do:
where datetime between '2015-03-15' and '2015-03-16'
If you think you are getting two dates, you are wrong. You are getting all date times on the first day plus midnight on the second.
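The following Python sketch (not MySQL; the rows are invented) mimics that comparison to show which rows a BETWEEN on two bare dates keeps, next to a half-open range that really covers two full days:

```python
from datetime import datetime

# Hypothetical created_at values around the two days.
rows = [
    datetime(2015, 3, 15, 9, 30),  # first day
    datetime(2015, 3, 16, 0, 0),   # exactly midnight on the second day
    datetime(2015, 3, 16, 9, 30),  # later on the second day
]

low = datetime(2015, 3, 15)   # '2015-03-15' compares as 2015-03-15 00:00:00
high = datetime(2015, 3, 16)  # '2015-03-16' compares as 2015-03-16 00:00:00

# Same logic as: created_at BETWEEN '2015-03-15' AND '2015-03-16'
between_rows = [r for r in rows if low <= r <= high]

# Same logic as: created_at >= '2015-03-15' AND created_at < '2015-03-17'
two_full_days = [r for r in rows if low <= r < datetime(2015, 3, 17)]

print(len(between_rows), len(two_full_days))
```

The BETWEEN version keeps the midnight row but drops the 09:30 row on the second day, while the half-open version keeps all three.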

MySQL fastest way to search by DATE if a record exist

I have found many ways to search a MySQL record by DATE:
Method 1:
SELECT id FROM table WHERE datetime LIKE '2015-01-01%' LIMIT 1
Method 2 (same as method 1 + ORDER BY):
SELECT id FROM table WHERE datetime LIKE '2015-01-01%' ORDER BY datetime DESC LIMIT 1
Method 3:
SELECT id FROM table WHERE datetime BETWEEN '2015-01-01' AND '2015-01-01 23:59:59' LIMIT 1
Method 4:
SELECT id FROM table WHERE DATE_FORMAT( datetime, '%y.%m.%d' ) = DATE_FORMAT( '2015-01-01', '%y.%m.%d' )
Method 5 (I think is the slowest):
SELECT id FROM table WHERE DATE(`datetime`) = '2015-01-01' LIMIT 1
What is the fastest?
In my case the table has 1 million rows, and the date to search is always recent.
The fastest of the methods you've mentioned is
SELECT id
FROM table
WHERE datetime BETWEEN '2015-01-01' AND '2015-01-01 23:59:59'
LIMIT 1
This is made fast when you create an index on the datetime column. The index can be random-accessed to find the first matching row, and then scanned until the last matching row. So it's not necessary to read the whole table, or even the whole index. And, when you use LIMIT 1, it just reads the single row. Very fast, even on an enormous table.
Your other means of search apply a function to each row:
datetime LIKE '2015-01-01%' casts datetime to a string for each row.
Methods 4 and 5 use explicit functions, DATE_FORMAT() and DATE(), on the contents of each row.
The use of these functions defeats the use of indexes to find your data.
Pro tip: Don't use BETWEEN for date arithmetic because it handles the ending condition poorly. Instead use
WHERE datetime >= '2015-01-01'
AND datetime < '2015-01-02'
This performs just as well as BETWEEN and gets you out of having to write the last moment of 2015-01-01 explicitly as 23:59:59. That isn't correct with higher precision timestamps anyway.
The fastest way, assuming there's an index on the datetime column, is a variant of method 3 in which both range values are full datetime literals:
SELECT id FROM table
WHERE datetime BETWEEN '2015-01-01 00:00:00' AND '2015-01-01 23:59:59'
LIMIT 1
Using literal of the same type as the column means there won't be any casting of the column to perform comparison, giving the best chance of using an index on the column. I have used this in production to great effect.
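The effect of wrapping the column in a function can be observed in a query plan. The sketch below uses Python's built-in sqlite3 module rather than MySQL, so it only illustrates the general principle; the table name events, the column created, and the index name are hypothetical stand-ins, and MySQL's own EXPLAIN output looks different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; "events" and "created" stand in for `table` and `datetime`.
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created TEXT)")
conn.execute("CREATE INDEX idx_created ON events (created)")
conn.executemany(
    "INSERT INTO events (created) VALUES (?)",
    [("2015-01-%02d 10:00:00" % day,) for day in range(1, 29)],
)

def plan(sql: str) -> str:
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

range_plan = plan(
    "SELECT id FROM events "
    "WHERE created >= '2015-01-01' AND created < '2015-01-02' LIMIT 1"
)
function_plan = plan(
    "SELECT id FROM events WHERE date(created) = '2015-01-01' LIMIT 1"
)

print(range_plan)     # expected to describe a SEARCH using idx_created
print(function_plan)  # expected to describe a SCAN (every entry is visited)
```

In MySQL the equivalent check would be to run EXPLAIN on each of the five methods and compare the type and key columns.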

WHERE clause to filter times that are under an hour

SELECT
name,
start_time,
TIME(cancelled_date) AS cancelled_time,
TIMEDIFF(start_time, TIME(cancelled_date)) AS difference
FROM
bookings
I'm trying to get from the database a list of bookings which were cancelled with less than an hour's notice. The start time and the cancellation time are both compared in TIME format (I know a timestamp would have made this easier). Above, I've calculated the time difference between the two values, and now I need to add a WHERE clause to restrict the result to only those records with a difference of under 1:00:00. Obviously this isn't a number, it's a time, so a simple bit of maths won't do it.
start_time is a TIME
cancelled_date is a DATETIME but I'm converting it to TIME in the query to then calculate cancelled_time and difference.
I would be inclined to do this by adding an hour to the notice, something like this:
WHERE start_time > date_add(cancelled_date, interval 1 hour)
I can't quite tell what the right logic is from the question, because your column names don't match the description.
In this case, doing the subtraction and doing the comparison perform similarly. But if you had a constant instead of cancelled_date, then there is a difference. The following:
WHERE start_time < date_add(now(), interval -1 hour)
allows the engine to use an index on start_time.
You can use: having difference < time('1:00')
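For reference, the filter being asked for can be sketched in Python as follows. This is not MySQL, and the booking rows are invented; it only shows the comparison the question describes (time-of-day difference under one hour):

```python
from datetime import date, datetime, time, timedelta

# Hypothetical bookings: (name, start_time as a TIME, cancelled_date as a DATETIME)
bookings = [
    ("alice", time(14, 0), datetime(2024, 5, 1, 13, 30)),  # 30 minutes' notice
    ("bob",   time(14, 0), datetime(2024, 5, 1, 11, 0)),   # 3 hours' notice
    ("carol", time(9, 15), datetime(2024, 5, 1, 8, 15)),   # exactly 1 hour
]

def notice(start: time, cancelled: datetime) -> timedelta:
    # Same idea as TIMEDIFF(start_time, TIME(cancelled_date)): compare only the
    # time-of-day parts by placing both on the same (arbitrary) date.
    anchor = date(2000, 1, 1)
    return datetime.combine(anchor, start) - datetime.combine(anchor, cancelled.time())

short_notice = [name for name, start, cancelled in bookings
                if notice(start, cancelled) < timedelta(hours=1)]
print(short_notice)
```

Note that a cancellation logged after the start time gives a negative difference, which also passes the "under one hour" test; add a lower bound if those rows should be excluded.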

Getting week started date using MySQL

If I have a MySQL query like this, summing word frequencies per week:
SELECT
SUM(`city`),
SUM(`officers`),
SUM(`uk`),
SUM(`wednesday`),
DATE_FORMAT(`dateTime`, '%d/%m/%Y')
FROM myTable
WHERE dateTime BETWEEN '2011-09-28 18:00:00' AND '2011-10-29 18:59:00'
GROUP BY WEEK(dateTime)
The results given by MySQL take the first value of column dateTime, in this case 28/09/2011, which happens to be a Wednesday.
Is it possible to adjust the query in MySQL to show the date upon which the week commences, even if there is no data available, so that for the above, 2011-09-28 would be replaced with 2011/09/26 instead? That is, the date of the start of the week, being a Monday. Or would it be better to adjust the dates programmatically after the query has run?
The dateTime column is in format 2011/10/02 12:05:00
It is possible to do it in SQL, but it would be better to do it in your program code, as it would be more efficient and easier.
Also, while MySQL accepts your query, it doesn't quite make sense: you have DATE_FORMAT(dateTime, '%d/%m/%Y') in the select list while you group by WEEK(dateTime). This means the DB engine has to pick an arbitrary date from the current group (week) for each row. For example, suppose you have records for 27.09.2011, 28.09.2011 and 29.09.2011. They all fall in the same week, so the final result set contains only one row for those three records. Which of the three dates should be picked for the DATE_FORMAT() call?
The answer would be somewhat simpler if there were an ORDER BY in the query, but it still doesn't quite make sense to use fields or expressions in the select list that are neither in the GROUP BY nor aggregates. You should really return the week number in the select list (instead of the DATE_FORMAT call) and then calculate the start and end dates from it in your code.
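Doing the adjustment in program code could look like the minimal Python sketch below. It is a variation on the suggestion above: it assumes the program has some date from each group (for example the one the query already returns) rather than a week number, and that weeks start on Monday. The helper name week_start is made up for illustration.

```python
from datetime import date, timedelta

def week_start(d: date) -> date:
    # date.weekday() is 0 for Monday, so subtracting it lands on that week's Monday.
    return d - timedelta(days=d.weekday())

print(week_start(date(2011, 9, 28)))  # 2011-09-26
```

Working from a date avoids having to match the week-numbering mode used by WEEK() when converting a week number back to a date.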