Let's say I have a table:
+------------+-----------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-----------+------+-----+-------------------+-----------------------------+
| id | int(10) | NO | PRI | | AUTOINCREMENT |
| id_action | int(10) | NO | IDX | | |
| a_date | date | NO | IDX | | |
| a_datetime | datetime | NO | IDX | | |
+------------+-----------+------+-----+-------------------+-----------------------------+
Each row has some id_action, and the a_date and a_datetime when it was executed on the website.
My question is: when I want to return the COUNT() of each id_action grouped by a_date, are these two SELECTs the same, or do they differ in speed? Thanks for any explanation.
SELECT COUNT(id_action), id_action, a_date
FROM my_table
GROUP BY a_date
ORDER BY a_date DESC
and
SELECT COUNT(id_action), id_action, DATE_FORMAT(a_datetime, '%Y-%m-%d') AS `a_date`
FROM my_table
GROUP BY DATE_FORMAT(a_datetime, '%Y-%m-%d')
ORDER BY a_date DESC
In other words: each action has its datetime, so do I really need the column a_date, or do I get the same result by applying the DATE_FORMAT function to column a_datetime, in which case I don't need a_date?
I ran both queries on a similar table on MySQL 5.5.
The table has 10634079 rows.
The first query took 10.66 secs initially and consistently takes approx. 10 secs on further attempts.
The second query takes 1.25 mins to execute the first time; on the second, third, ... attempts it takes 22.091 secs.
So in my view, if you are looking for performance, you should keep the a_date column, as the query takes about half the time when executed without DATE_FORMAT.
If performance is not the primary concern (data redundancy might be, for example), then the a_datetime column alone will serve all other date/datetime related purposes.
DATE: The DATE type is used for values with a date part but no time part.
DATETIME: The DATETIME type is used for values that contain both date and time parts.
So if you have a DATETIME you can always derive a DATE from it, but you cannot get a DATETIME from a DATE.
And for your SQL there will not be a major difference.
It would be better not to have a_date, because you already have a_datetime.
But in general, if you can use TIMESTAMP you should, because it is more space-efficient than DATETIME.
Using a_date to group by day will be more efficient than a_datetime because of your conversion. In T-SQL I use a combination of DATEADD() and DATEDIFF() to get the date only from DATETIME since math is more efficient than data conversion. For example (again, using T-SQL though I'm sure there's something similar for MySQL):
SELECT COUNT(id_action), id_action,
DATEADD(DD,DATEDIFF(DD,0,a_datetime),0) AS [a_date]
FROM my_table
GROUP BY DATEADD(DD,DATEDIFF(DD,0,a_datetime),0)
ORDER BY a_date DESC
This finds the number of days between day 0 and a_datetime, then adds that number of days to day 0 again. (Day 0 is just an arbitrary date chosen for its simplicity.)
Perhaps the MySQL version of that would be:
DATE_ADD('2014-01-01', INTERVAL DATEDIFF(a_datetime, '2014-01-01') DAY)
Sorry I don't have MySQL installed or I would try that myself. I'd expect it to be more efficient than casting/formatting but less efficient than using a_date.
If you apply a function in your GROUP BY clause, e.g. "GROUP BY DATE_FORMAT(a_datetime, '%Y-%m-%d')", you will not be leveraging your index on "a_datetime".
As for speed, I believe there will be no noticeable difference between indexing on datetime vs date (but it's always easy to test with EXPLAIN).
Lastly, you can always read a datetime as a date (using cast functions if need be). Your schema is not normalized if you have both an a_date and an a_datetime. You should consider removing one of them. If date provides enough granularity for your application, then get rid of a_datetime. Otherwise, get rid of a_date and cast as required.
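As a rough way to check the index point above, one could compare the two plans with EXPLAIN. This is only a sketch against the my_table schema from the question; the alias a_day is made up here, and the actual plan depends on the MySQL version, the indexes that really exist, and the data.

```sql
-- Grouping on the bare column: the optimizer can consider the index on a_date.
EXPLAIN
SELECT a_date, COUNT(id_action)
FROM my_table
GROUP BY a_date;

-- Grouping on an expression: the index on a_datetime cannot drive the grouping,
-- so the Extra column may show additional work (for example a temporary table).
EXPLAIN
SELECT DATE_FORMAT(a_datetime, '%Y-%m-%d') AS a_day, COUNT(id_action)
FROM my_table
GROUP BY DATE_FORMAT(a_datetime, '%Y-%m-%d');
```

Comparing the key and Extra columns of the two outputs is usually enough to see whether an index is being used for the grouping.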
As already mentioned, the performance of any function(a_datetime) will be worse than just a_date. The choice depends on your needs: if there is no need for DATETIME, use a DATE and that's it.
If you still need a function to convert, then I advise you to use DATE().
See also How to cast DATETIME as a DATE in mysql?
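For illustration, the DATE() suggestion applied to the table from the question might look like the sketch below. Grouping by id_action as well as by day is an assumption about what "COUNT() of each id_action" is meant to return, and the aliases a_day and actions are made up here.

```sql
-- Count actions per day and per action type, deriving the day from a_datetime.
SELECT DATE(a_datetime) AS a_day,
       id_action,
       COUNT(*) AS actions
FROM my_table
GROUP BY DATE(a_datetime), id_action
ORDER BY a_day DESC;
```

This still evaluates a function per row, so it would be expected to behave like the DATE_FORMAT variant rather than like the plain a_date column.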
Put the two statements in the SQL editor and execute them with statistics (CTRL-L).
https://technet.microsoft.com/en-us/library/ms178071%28v=sql.105%29.aspx
https://msdn.microsoft.com/pt-br/library/ms190287.aspx?f=255&MSPPError=-2147217396
Related
I don't know what is wrong in my MySQL query:
SELECT * FROM package_customers pc
left join installations ins on ins.package_customer_id = pc.id
WHERE pc.status = 'scheduled'
AND CAST(ins.schedule_date as DATE) >='10-27-2017'
The fields are:
status data type enum
schedule_date data type varchar
In the column schedule_date, the data is like this: 10-27-2017 12AM 12PM
I am trying to find date-wise data.
The CAST function works only if the source data is in an acceptable format.
There are some conditions that valid date and time formats must meet.
Your schedule_date column values do not match them, and hence the cast failed.
Please read the documentation on Date and Time Types.
You should think of redesigning the table to include schedule_start and schedule_end columns with the datetime data type. MySQL has various date and time functions to work with such data fields.
For the time being, your varchar date data can be handled in the following way.
mysql> SELECT @dt_string:='10-27-2017 12AM 12PM' AS dt_string
-> , @dtime:=STR_TO_DATE( @dt_string, '%m-%d-%Y %h%p %h%p' ) AS date_time
-> , DATE( @dtime ) AS my_date;
+-----------------------+---------------------+------------+
| dt_string | date_time | my_date |
+-----------------------+---------------------+------------+
| 10-27-2017 12AM 12PM | 2017-10-27 12:00:00 | 2017-10-27 |
+-----------------------+---------------------+------------+
1 row in set (0.00 sec)
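Applied to the query from the question, the same idea could be sketched as follows. It assumes every schedule_date value starts with an mm-dd-yyyy date followed by a space, as in the sample value; rows that do not follow that pattern would yield NULL and be filtered out.

```sql
SELECT pc.*, ins.*
FROM package_customers pc
LEFT JOIN installations ins ON ins.package_customer_id = pc.id
WHERE pc.status = 'scheduled'
  -- Take the part before the first space ('10-27-2017') and parse it as a date.
  AND STR_TO_DATE(SUBSTRING_INDEX(ins.schedule_date, ' ', 1), '%m-%d-%Y') >= '2017-10-27';
```

Note that the comparison value is written in MySQL's own yyyy-mm-dd format. Because the filter is on a column of the joined table, customers without a matching installation row are dropped, just as in the original query.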
There's a date (it's a varchar(30)) in the format d.m.y, where
d = a day without leading zeros
m = a month with leading zeros
y = the last two digits of a year
And a table that looks like this:
id | date | price
1 | 7.04.14 | 10
2 | 8.04.14 | 20
3 | 9.04.14 | 30
And when a query is executed,
SELECT `price` FROM `table` WHERE `date` BETWEEN '7.04.14' AND '9.04.14';
it returns nothing.
The thing is, I cannot change the date format, and I have to get prices between two dates. Is there an easy way of doing this?
Just parse the dates.
SELECT price
FROM `table`
WHERE STR_TO_DATE(`date`, '%d.%m.%y')
BETWEEN STR_TO_DATE(...) AND STR_TO_DATE(...)
Also, consider taking a look at the manual page for STR_TO_DATE.
But as @juergen d writes, it is far better to use date types.
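Spelled out with the two boundary values from the question, the parsing approach might look like this sketch. The table and column names are the ones from the question; the same format string is assumed for both the column and the literals.

```sql
SELECT `price`
FROM `table`
WHERE STR_TO_DATE(`date`, '%d.%m.%y')
      BETWEEN STR_TO_DATE('7.04.14', '%d.%m.%y')
          AND STR_TO_DATE('9.04.14', '%d.%m.%y');
```

Here %d accepts a day with or without a leading zero and %y reads the two-digit year. Since the column is wrapped in a function, an index on `date` would not help this filter.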
Running the following statement, MySQL seems to mix things up:
select now(), if(false, date(now()), time(now()));
| 2013-07-24 10:06:21 | 2010-06-21 00:00:00 |
If replacing the second argument of the if with a literal string, the statement behaves correctly:
select now(), if(false, 'Banana', time(now()));
| 2013-07-24 10:06:21 | 10:06:21 |
Is this a bug or some really strange quirk?
The return type of IF has to be a datatype that includes the types of both arguments. So if one of the arguments is a DATE and the other is a TIME, the type of IF will be DATETIME.
This doesn't seem necessary in the trivial example query, but consider something like:
SELECT IF(col1, date(col2), time(col2)) AS dt
FROM Table
All the rows of the result have to have the same datatype in the dt column, even though the specific data will depend on what's in that row.
If you want just the date or time, convert it to a string.
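One possible way to do that conversion is to format each branch as a string, so the IF result is a character type instead of DATETIME. This is a sketch using the placeholder names col1, col2 and Table from the example above; the table name is backquoted because TABLE is a reserved word.

```sql
SELECT IF(col1,
          DATE_FORMAT(col2, '%Y-%m-%d'),   -- date part as a string
          DATE_FORMAT(col2, '%H:%i:%s'))   -- time part as a string
       AS dt
FROM `Table`;
```

The dt column then holds strings, so any later date arithmetic on it would need an explicit conversion back.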
I have a table in my DB, something like this:
----------------------------------------------------------
| event_id | date | start_time | end_time | duration |
----------------------------------------------------------
| 1 | 2011-05-13 | 01:00:00 | 04:00:00 | 10800 |
| 2 | 2011-05-12 | 17:00:00 | 01:00:00 | 28800 |
| 3 | 2011-05-11 | 11:00:00 | 14:00:00 | 10800 |
----------------------------------------------------------
This sample data doesn't give a totally accurate picture; there are typically events covering every hour of every day.
The date always refers to the start_time, as the end_time can sometimes be the following day.
The duration is in seconds.
SELECT *
FROM event_schedules
WHERE (
date = CURDATE() -- today
OR
date = DATE_SUB(CURDATE(), INTERVAL 1 DAY) -- yesterday
)
-- and ended before NOW()
AND DATE_ADD(CONCAT(date, ' ', start_time), INTERVAL duration SECOND) < NOW()
ORDER BY CONCAT(date, ' ', start_time) DESC
LIMIT 1
I have a clause in there, the OR'ed clause in brackets, that is logically unnecessary. I hoped that it might improve the query time by first filtering out any "events" that do not start today or yesterday. The only way to find the most recent "event" is by ordering the records and taking the first. By adding this extra clause, am I actually reducing the list of records that need to be ordered? If so, I can't imagine the optimizer being able to make this optimization on its own, even though most other questions similar to this one talk about the optimizer.
Be careful when adding filters to your WHERE clause for performance. While a filter can reduce the overall number of rows that need to be searched, the filter itself can add cost if it has to check a ton of records without using an index. In your case, if the column date is indexed, you'll probably get better performance, because the OR part can use the index, whereas the other conditions can't because the columns are wrapped in function calls. Also, can you have future dates? If not, why don't you change the OR to
date >= DATE_SUB(CURDATE(), INTERVAL 1 DAY)
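Put together, the query with a single range condition might look like the sketch below. It assumes there are no future-dated rows, uses >= so that yesterday is included, and swaps the CONCAT for the TIMESTAMP(date, time) function, which is a substitution made here rather than something from the question.

```sql
SELECT *
FROM event_schedules
WHERE `date` >= CURDATE() - INTERVAL 1 DAY   -- started yesterday or today
  -- start moment plus duration must already be in the past
  AND TIMESTAMP(`date`, start_time) + INTERVAL duration SECOND < NOW()
ORDER BY `date` DESC, start_time DESC
LIMIT 1;
```

Only the first condition is on a bare column, so it is the one an index on `date` could serve; the second is evaluated on the rows that survive it.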
The order of the WHERE clause does affect the way the SQL engine gets the results.
Many engines have a way to view what they do with a query. If you're using SQL Server, look for "Show Estimated Execution Plan" in your client tool. Some have a verb like EXPLAIN that can be used to show how the engine treats a query.
Well, the optimizer in the query engine is a big part of any query's performance, or the relative performance of two equivalent statements.
You didn't tell us whether you ran the query with and without the extra WHERE condition. There may be a performance difference, there may not.
My guess is that the LIMIT has a lot to do with it. The engine knows this is a "one and done" operation. Without the WHERE, sorting is an N log N operation, which in this special case can be made linear with a simple scan of the dates to find the most recent.
With the WHERE, you're actually increasing the number of steps it has to perform: either it has to fully order the table (N log N) and then scan that list for the first record that matches the WHERE clause (linear worst case, constant best case), or it has to filter by the WHERE (linear), then scan those records again to find the max date (linear again). Whichever one turns out faster, they're both slower than one linear scan of the list for the most recent date.
What's the best way to store a date value for which in many cases only the year may be known?
MySQL allows zeros in date parts unless the NO_ZERO_IN_DATE SQL mode is enabled, which it isn't by default. Is there any reason not to use a date field where the month and day may be zero, rather than splitting it up into 3 different fields for year, month and day (year(4), tinyint, tinyint)?
A better way is to split the date into 3 fields. Year, Month, Day. This gives you full flexibility for storing, sorting, and searching.
Also, it's pretty trivial to put the fields back together into a real date field when necessary.
Finally, it's portable across DBMSs. I don't think any other DBMS supports 0 as a valid part of a date value.
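As a sketch of putting the three fields back together, assume a hypothetical table events_ymd with columns id, y, m and d, where m and d are NULL when unknown. Defaulting unknown parts to 1 is just one possible convention, not something the answer above prescribes.

```sql
-- Build a real DATE from separate year/month/day columns,
-- substituting 1 for an unknown month or day.
SELECT id,
       STR_TO_DATE(CONCAT(y, '-', COALESCE(m, 1), '-', COALESCE(d, 1)),
                   '%Y-%c-%e') AS approx_date
FROM events_ymd;
```

A row with y = 2007 and no month or day would come back as 2007-01-01, so the query loses the information that those parts were unknown unless m and d are selected alongside it.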
Unless portability across DBMS is important, I would definitely be inclined to use a single date field. If you require even moderately complex date related queries, having your day, month and year values in separate fields will become a chore.
MySQL has a wealth of date related functions - http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html. Use YEAR(yourdatefield) if you want to return just the year value, or the same if you want to include it in your query's WHERE clause.
You can use a single date field in MySQL to do this. In the example below, field has the date data type.
mysql> select * from test;
+------------+------+
| field | id |
+------------+------+
| 2007-00-00 | 1 |
+------------+------+
1 row in set (0.00 sec)
mysql> select * from test where YEAR(field) = 2007;
+------------+------+
| field | id |
+------------+------+
| 2007-00-00 | 1 |
+------------+------+
I would use one field; it will make the queries easier.
Yes, using the Date and Time functions would be better.
Thanks BrynJ
You could try a LIKE operator. Such as:
SELECT * FROM `table` WHERE date_field LIKE '2009%';
It depends on how you use the resulting data. A simple answer would be to simply store those dates where only the year is known as January 1. This approach is really simple and allows you to aggregate by year using all the standard built in date functions.
The problem arises if the month or day is significant, for example if you are trying to determine the age of a record in days, weeks or months, or if you want to show the distribution at this finer level of granularity. This problem exists either way, though. If you have some full dates and some with only a year, how do you want to represent them in such instances?