Storing a date where only the year may be known - mysql

What's the best way to store a date value for which in many cases only the year may be known?
MySQL allows zeros in date parts unless the NO_ZERO_IN_DATE SQL mode is enabled, which it isn't by default in older versions (it is included in the default sql_mode from MySQL 5.7). Is there any reason not to use a date field where the month and day may be zero, rather than splitting it into 3 separate fields for year, month and day (year(4), tinyint, tinyint)?

A better way is to split the date into 3 fields. Year, Month, Day. This gives you full flexibility for storing, sorting, and searching.
Also, it's pretty trivial to put the fields back together into a real date field when necessary.
Finally, it's portable across DBMSs. I don't think any other DBMS supports a 0 as a valid part of a date value.
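To illustrate "putting the fields back together", here is a minimal Python sketch, assuming the three columns come back as integers with None for an unknown month or day. The helper names `to_date` and `sort_key` are made up for this example; substituting 1 for a missing part is just one possible convention.

```python
from datetime import date
from typing import Optional, Tuple


def to_date(year: int, month: Optional[int] = None, day: Optional[int] = None) -> date:
    """Build a real date, substituting 1 for an unknown month or day."""
    return date(year, month or 1, day or 1)


def sort_key(year: int, month: Optional[int] = None, day: Optional[int] = None) -> Tuple[int, int, int]:
    """Sort rows with unknown parts before fully known dates in the same year."""
    return (year, month or 0, day or 0)


# (year, month, day) rows as they might come back from the three columns
rows = [(2007, 6, 20), (2007, None, None), (2006, 11, None)]
ordered = sorted(rows, key=lambda r: sort_key(*r))
print(ordered)
print(to_date(2007))
```

Whether unknown parts should sort first or last is an application decision; the key function is the only place that would need to change.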

Unless portability across DBMS is important, I would definitely be inclined to use a single date field. If you require even moderately complex date related queries, having your day, month and year values in separate fields will become a chore.
MySQL has a wealth of date related functions - http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html. Use YEAR(yourdatefield) if you want to return just the year value, or the same if you want to include it in your query's WHERE clause.

You can use a single date field in MySQL to do this. In the example below, field has the DATE data type.
mysql> select * from test;
+------------+------+
| field      | id   |
+------------+------+
| 2007-00-00 |    1 |
+------------+------+
1 row in set (0.00 sec)
mysql> select * from test where YEAR(field) = 2007;
+------------+------+
| field      | id   |
+------------+------+
| 2007-00-00 |    1 |
+------------+------+
I would use one field; it will make the queries easier.
Yes using the Date and Time functions would be better.
Thanks BrynJ

You could try the LIKE operator, such as:
SELECT * FROM your_table WHERE date_field LIKE '2009%';

It depends on how you use the resulting data. A simple answer would be to simply store those dates where only the year is known as January 1. This approach is really simple and allows you to aggregate by year using all the standard built in date functions.
The problem arises if the month or day is significant. For example, if you are trying to determine the age of a record in days, weeks or months, or if you want to show the distribution at this finer level of granularity. This problem exists anyway, though. If you have some full dates and some with only a year, how do you want to represent them in such instances?
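To put a number on that uncertainty, here is a small Python sketch. It assumes the only thing known is a past year, and the function name `age_in_days_bounds` is invented for the example. It shows how far an "age in days" can be off when January 1 is used as a stand-in.

```python
from datetime import date
from typing import Tuple


def age_in_days_bounds(year: int, today: date) -> Tuple[int, int]:
    """Return (minimum, maximum) age in days for a record known only by year.

    Assumes `year` is not in the future relative to `today`.
    """
    earliest = date(year, 1, 1)                # what a January 1 placeholder claims
    latest = min(date(year, 12, 31), today)    # the real date could be this late
    return (today - latest).days, (today - earliest).days


low, high = age_in_days_bounds(2007, date(2009, 6, 15))
print(low, high)
```

A January 1 placeholder always reports the upper bound, so any report at day or month granularity would overstate the age by up to a year minus a day.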

Related

SQL Like Search

I am trying to find the devices that did not contain 01: in the past 7 days.
I have tried "Where column Not Like '%01:%'" but it just removes the 01: and still shows the machine that had the 01: in the past 7 days.
I have a table called devices. Each location has a unique ID number. Each device runs a job at 1am and 7pm. Devices should have 1 entry for 01:00:00 per week then 3 entries for 19:00:00 per week. Ex of cell data is 2017-10-23 19:00:02.
So I begin with
Select * From devices
Where locationid=##
AND jobdate < DATE_SUB(NOW(), INTERVAL 7 DAY)
AND jobdate not like '%01:%'
What I get in result is the machine that did run at 01:00 2 days ago. The job date shows 19:00 so it sounds like it just removed the 01:.
I am thinking of grouping the job data then say list the computer that did not have 2017-10-23 01:00:02 .
There is a good deal of intuition in the following suggestion, more on that later.
Most databases don't actually store date/time information in a WYSIWYG fashion. Indeed, if you think about it long enough you will understand that date/times are really "sets of numbers". That is why we can do things like calculate the number of days from date1 to date2 etc. So, IF the data is stored as a datetime data type, don't attempt to use LIKE (which is for text) against a datetime column. Instead look for date and time related functions that may apply to your situation. Here you are looking for not equal to a specific time of day (I think). So, to remove "date" from consideration convert it to "time", and then you can filter on that.
So below, I introduce a new column jobtime which is the time portion of jobdate, and then I look for any times not equal to a given value.
SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE Devices
(`locationid` varchar(2), `jobdate` datetime)
;
INSERT INTO Devices
(`locationid`, `jobdate`)
VALUES
('##', '2017-10-23 01:00:00'),
('##', '2017-10-23 19:00:02')
;
Query 1:
select
*
from (
select locationid, cast(jobdate as time) jobtime, jobdate
from devices
) d
where locationid = '##'
and jobtime <> '01:00:00'
;
Results:
| locationid | jobtime  | jobdate              |
|------------|----------|----------------------|
| ##         | 19:00:02 | 2017-10-23T19:00:02Z |
...
Why is there "intuition" above? (the "more on that later")
It is remarkably frustrating to not know which database is in use because the syntax differs so much between the vendors. It is also essential to know the EXACT data type of the jobdate column - because if it is varchar for example I have just made a complete fool of myself in the query above. In other words we are not likely to answer because key facts are missing.
Finally, you have data! It's in your table(s) already. Why not make it easy on everyone by sharing a few bits of it? Provide "sample data" with your question, and the "expected result" too (i.e. provide 2 things, not one without the other, and do not use images of data!!!). Hopefully you can see from the example above how useful sample data & result is. For example, if my intuition is way off, you can tell in an instant that it is - even if you don't read the SQL.
Rant over, not all points raised here apply to this question.
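If the real goal is "which locations had no 1am job in the window", a row filter is not enough; the check has to be made per location. Below is a rough Python sketch using the standard sqlite3 module as a stand-in for MySQL (an assumption: SQLite's strftime plays the role of MySQL's HOUR() or TIME() here, and the sample rows and location ids "A" and "B" are invented).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE devices (locationid TEXT, jobdate TEXT)")
conn.executemany(
    "INSERT INTO devices VALUES (?, ?)",
    [
        ("A", "2017-10-23 01:00:02"),  # A did run its 1am job
        ("A", "2017-10-23 19:00:02"),
        ("B", "2017-10-22 19:00:01"),  # B only has 7pm jobs
        ("B", "2017-10-23 19:00:02"),
    ],
)


def locations_missing_1am(conn, window_start, window_end):
    """Locations with at least one job in the window but none in the 01 hour."""
    rows = conn.execute(
        """
        SELECT locationid
        FROM devices
        WHERE jobdate >= ? AND jobdate < ?
        GROUP BY locationid
        HAVING SUM(strftime('%H', jobdate) = '01') = 0
        ORDER BY locationid
        """,
        (window_start, window_end),
    ).fetchall()
    return [r[0] for r in rows]


missing = locations_missing_1am(conn, "2017-10-17 00:00:00", "2017-10-24 00:00:00")
print(missing)
```

Note that a location with no rows at all in the window would not show up; catching those would need a join against a list of all locations.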

Storing date periods in database

I would like to discuss the "best" way to store date periods in a database. Let's talk about SQL/MySQL, but this question may apply to any database. I have the feeling I have been doing something wrong for years...
In English, the information I have is:
-In year 2014, value is 1000
-In year 2015, value is 2000
-In year 2016, there is no value
-In year 2017 (and go on), value is 3000
Someone may store as:
BeginDate   EndDate     Value
2014-01-01  2014-12-31  1000
2015-01-01  2015-12-31  2000
2017-01-01  NULL        3000
Others may store as:
Date        Value
2014-01-01  1000
2015-01-01  2000
2016-01-01  NULL
2017-01-01  3000
The validation rules for the first method look like mayhem to develop if holes and overlaps are to be avoided.
In the second method, the problem seems to be finding the period that contains one specific date.
What do my colleagues prefer? Any other suggestions?
EDIT: I used full years only as an example; my data usually changes with day granularity.
EDIT 2: I thought about using the stored "Date" as "BeginDate", ordering rows by Date, then taking the "EndDate" from the next (or previous) row. Storing "BeginDate" and "Interval" would lead to the same hole/overlap problem as method one, which needs a complex validation rule to avoid.
It mostly depends on the way you will be using this information - I'm assuming you do more than just store values for a year in your database.
Lots of guesses here, but I guess you have other tables with time-bounded data, and that you need to compare the dates to find matches.
For instance, in your current schema:
select *
from other_table ot
inner join year_table yt on ot.transaction_date between yt.year_start and yt.year_end
That should be an easy query to optimize - it's a straight data comparison, and if the table is big enough, you can add indexes to speed it up.
In your second schema suggestion, it's not as easy:
select *
from other_table ot
inner join year_table yt
on ot.transaction_date between yt.year_start
and yt.year_start + INTERVAL 1 YEAR
Crucially - this is harder to optimize, as every comparison needs to execute a scalar function. It might not matter - but with a large table, or a more complex query, it could be a bottleneck.
You can also store the year as an integer (as some of the commenters recommend).
select *
from other_table ot
inner join year_table yt on year(ot.transaction_date) = yt.year
Again - this is likely to have a performance impact, as every comparison requires a function to execute.
The purist in me doesn't like to store this as an integer - so you could also use MySQL's YEAR datatype.
So, assuming data size isn't an issue you're optimizing for, the solution really would lie in the way your data in this table relates to the rest of your schema.
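For the second layout in the question (one row per start date, with the end implied by the next row), the lookup "which value applies on day X" can be sketched in application code as below. This is a Python illustration under the assumption that the rows are loaded sorted by start date; `value_on` is a made-up name, not something from the question.

```python
from bisect import bisect_right
from datetime import date

# (start date, value) rows, sorted by start date; None marks "no value"
periods = [
    (date(2014, 1, 1), 1000),
    (date(2015, 1, 1), 2000),
    (date(2016, 1, 1), None),
    (date(2017, 1, 1), 3000),
]
starts = [start for start, _ in periods]


def value_on(day):
    """Return the value of the latest period starting on or before `day`."""
    i = bisect_right(starts, day) - 1
    if i < 0:
        return None  # `day` is before the first known period
    return periods[i][1]


print(value_on(date(2015, 6, 30)))
print(value_on(date(2016, 3, 1)))
print(value_on(date(2020, 1, 1)))
```

Because each period ends where the next begins, holes and overlaps cannot occur by construction; the price is that the end date has to be derived on every lookup.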

Difference between DATE and DATETIME in WHERE clause

Let's say I have a table:
+------------+----------+------+-----+---------+---------------+
| Field      | Type     | Null | Key | Default | Extra         |
+------------+----------+------+-----+---------+---------------+
| id         | int(10)  | NO   | PRI |         | AUTOINCREMENT |
| id_action  | int(10)  | NO   | IDX |         |               |
| a_date     | date     | NO   | IDX |         |               |
| a_datetime | datetime | NO   | IDX |         |               |
+------------+----------+------+-----+---------+---------------+
Each row has some id_action, and the a_date and a_datetime when it was executed on the website.
My question is: when I want to return the COUNT() of each id_action grouped by a_date, are these two selects equivalent, or do they differ in speed? Thanks for any explanation.
SELECT COUNT(id_action), id_action, a_date
FROM my_table
GROUP BY a_date
ORDER BY a_date DESC
and
SELECT COUNT(id_action), id_action, DATE_FORMAT(a_datetime, '%Y-%m-%d') AS `a_date`
FROM my_table
GROUP BY DATE_FORMAT(a_datetime, '%Y-%m-%d')
ORDER BY a_date DESC
In other words: each action already has its datetime, so do I really need the a_date column, or can I get the same result with the DATE_FORMAT function on a_datetime and drop a_date?
I ran both the queries on similar table on MySQL 5.5.
The table has 10634079 rows.
The first one took 10.66 secs initially and consistently takes approx 10 secs on further attempts.
The second query took 1.25 mins to execute the first time; on the second, third and later attempts it takes 22.091 secs.
So in my view, if you are looking for performance, then you should keep the a_date column, as the query takes about half the time when executed without DATE_FORMAT.
If performance is not the primary concern (and data redundancy is), then the a_datetime column will serve all other date/datetime related purposes.
DATE : The DATE type is used for values with a date part but no time part.
DATETIME: The DATETIME type is used for values that contain both date and time parts.
So if you have a DATETIME you can always derive the DATE from it, but you cannot get a DATETIME from a DATE.
And as per your SQL there will not be a major difference.
It would be better not to have a_date because you already have a_datetime.
But in general, if you can use TIMESTAMP you should, because it is more space-efficient than DATETIME (keeping in mind that TIMESTAMP only covers 1970 through early 2038).
Using a_date to group by day will be more efficient than a_datetime because of your conversion. In T-SQL I use a combination of DATEADD() and DATEDIFF() to get the date only from DATETIME since math is more efficient than data conversion. For example (again, using T-SQL though I'm sure there's something similar for MySQL):
SELECT COUNT(id_action), id_action,
DATEADD(DD,DATEDIFF(DD,0,a_datetime),0) AS [a_date]
FROM my_table
GROUP BY id_action, DATEADD(DD,DATEDIFF(DD,0,a_datetime),0)
ORDER BY a_date DESC
This will find the number of days between day 0 and a_datetime, then add that number of days to day 0 again. (Day 0 is just an arbitrary date chosen for its simplicity.)
Perhaps the MySQL version of that would be:
DATE_ADD('2014-01-01', INTERVAL DATEDIFF(a_datetime,'2014-01-01') DAY)
Sorry I don't have MySQL installed or I would try that myself. I'd expect it to be more efficient than casting/formatting but less efficient than using a_date.
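The "count whole days from day 0, then add them back" idea is easy to check outside the database. Here is a minimal Python sketch of the same arithmetic; `DAY_ZERO` and `truncate_to_day` are names invented for the illustration, and 1900-01-01 is just one arbitrary choice of day 0.

```python
from datetime import datetime, timedelta

DAY_ZERO = datetime(1900, 1, 1)  # arbitrary reference date ("day 0")


def truncate_to_day(value: datetime) -> datetime:
    """Drop the time part using only date arithmetic, no string formatting."""
    whole_days = (value - DAY_ZERO).days       # analogous to DATEDIFF(DD, 0, value)
    return DAY_ZERO + timedelta(days=whole_days)  # analogous to DATEADD(DD, n, 0)


print(truncate_to_day(datetime(2014, 3, 5, 16, 30, 45)))
```

This only mirrors the logic; it says nothing about how the database engines compare in speed.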
If you are doing a function in your group by clause: "GROUP BY DATE_FORMAT(a_datetime, '%Y-%m-%d')", you will not be leveraging your index: "a_datetime".
As for speed, I believe there will be no noticeable difference between indexing on datetime vs date (but it's always easy to test with 'explain')
Lastly, you can always read a datetime as a date (using cast functions if need be). Your schema is not normalized if you have both a_date and a_datetime. You should consider removing one of them. If date provides enough granularity for your application, then get rid of a_datetime. Otherwise, get rid of a_date and cast as required.
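As a sanity check that dropping a_date loses no information, the sketch below groups the same rows once by the stored date and once by a date derived from the datetime. It uses Python's sqlite3 module as a stand-in for MySQL (so `date()` here is SQLite's function, and the sample rows are invented), and it groups by id_action as well so that the select list is unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id_action INTEGER, a_date TEXT, a_datetime TEXT)"
)
samples = [
    (1, "2014-03-05 08:15:00"),
    (1, "2014-03-05 21:40:00"),
    (2, "2014-03-05 09:00:00"),
    (1, "2014-03-06 10:30:00"),
]
# a_date is the redundant copy of the date part of a_datetime
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(action, stamp[:10], stamp) for action, stamp in samples],
)

by_column = conn.execute(
    "SELECT a_date AS day, id_action, COUNT(*) FROM my_table "
    "GROUP BY a_date, id_action ORDER BY day DESC, id_action"
).fetchall()

by_function = conn.execute(
    "SELECT date(a_datetime) AS day, id_action, COUNT(*) FROM my_table "
    "GROUP BY date(a_datetime), id_action ORDER BY day DESC, id_action"
).fetchall()

print(by_column == by_function)
print(by_function)
```

The results match; what differs in a real database is the cost, since the derived version has to evaluate a function for every row.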
As already mentioned, the performance of any function(a_datetime) will be worse than just a_date. The choice depends on your needs: if there is no need for DATETIME, use DATE and that is it.
If you still need a function to convert, then I advise you to use DATE().
See also How to cast DATETIME as a DATE in mysql?
Put the two statements in the SQL editor and execute them with statistics (CTRL-L).
https://technet.microsoft.com/en-us/library/ms178071%28v=sql.105%29.aspx
https://msdn.microsoft.com/pt-br/library/ms190287.aspx?f=255&MSPPError=-2147217396

Datetime vs Date and Time Mysql

I generally use a datetime field to store the created_time and updated_time of data within an application.
But now I have come across a database table where the date and time are kept in separate fields.
So in which kinds of schema should each of these be used, and why?
What are the pros and cons of each?
There is a huge difference in performance when using a DATE field over a DATETIME field. I have a table with more than 4.000.000 records and for testing purposes I added 2 fields, each with its own index. One uses DATETIME and the other uses DATE.
I disabled MySQL query cache to be able to test properly and looped over the same query for 1000x:
SELECT * FROM `logs` WHERE `dt` BETWEEN '2015-04-01' AND '2015-05-01' LIMIT 10000,10;
DATETIME INDEX:
197.564 seconds.
SELECT * FROM `logs` WHERE `d` BETWEEN '2015-04-01' AND '2015-05-01' LIMIT 10000,10;
DATE INDEX:
107.577 seconds.
Using a date indexed field has a performance improvement of: 45.55%!!
So I would say that if you are expecting a lot of data in your table, please consider separating the date from the time, each with its own index.
I tend to think there are basically no advantages to storing the date and time in separate fields. MySQL offers very convenient functions for extracting the date and time parts of a datetime value.
Okay. There can be some efficiency reasons. In MySQL, you can put separate indexes on the fields. So, if you want to search for particular times, for instance, then a query that counts by hours of the day (for instance) can use an index on the time field. An index on a datetime field would not be used in this case. A separate date field might make it easier to write a query that will use the date index, but, strictly speaking, a datetime should also work.
The one time where I've seen dates and times stored separately is in a trading system. In this case, the trade has a valuation date. The valuation time is something like "NY Open" or "London Close" -- this is not a real time value. It is a description of the time of day used for valuation.
The tricky part is when you have to do date arithmetic on a time value and you do not want a date portion coming into the mix. Ex:
myapptdate = 2014-01-02 09:00:00
Select such and such where myapptdate between 2014-01-02 07:00:00 and 2014-01-02 13:00:00
1900-01-02 07:00:00
2014-01-02 07:00:00
One difference I found is using BETWEEN for dates with non-zero time.
Imagine a search with "between dates" filter. Standard user's expectation is it will return records from the end day as well, so using DATETIME you have to always add an extra day for the BETWEEN to work as expected, while using DATE you only pass what user entered, with no extra logic needed.
So query
SELECT * FROM mytable WHERE mydate BETWEEN '2020-06-24' AND '2020-06-25'
will return a record for 2020-06-25 16:30:00, while query:
SELECT * FROM mytable WHERE mydatetime BETWEEN '2020-06-24' AND '2020-06-25'
won't - you'd have to add an extra day:
SELECT * FROM mytable WHERE mydatetime BETWEEN '2020-06-24' AND '2020-06-26'
But as victor diaz mentioned, doing datetime calculations with date+time would be a super inefficient nightmare and far worse, than just adding a day to the second datetime. Therefore I'd only use DATE if the time is irrelevant, or as a "cache" for speeding queries up for date lookups (see Elwin's answer).
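One way to keep the "add an extra day" logic in a single place is to turn the user's inclusive date range into a half-open range (>= start, < day after end) before querying. Below is a rough Python sketch using sqlite3 with ISO-formatted text as a stand-in for a MySQL DATETIME column; `rows_between` is a hypothetical helper and the sample rows are invented.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (mydatetime TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [
        ("2020-06-24 09:00:00",),
        ("2020-06-25 16:30:00",),
        ("2020-06-26 00:00:00",),
    ],
)


def rows_between(conn, first_day: date, last_day: date):
    """Return rows from first_day through the whole of last_day (inclusive)."""
    upper = last_day + timedelta(days=1)  # exclusive upper bound
    rows = conn.execute(
        "SELECT mydatetime FROM mytable "
        "WHERE mydatetime >= ? AND mydatetime < ? ORDER BY mydatetime",
        (first_day.isoformat(), upper.isoformat()),
    ).fetchall()
    return [r[0] for r in rows]


found = rows_between(conn, date(2020, 6, 24), date(2020, 6, 25))
print(found)
```

The strict `<` on the upper bound keeps a row stamped exactly at midnight of the following day out of the result, which BETWEEN with the extra day would let in.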

Sort Date in Mysql table in DESC order

I want to show a date column in DESC order, where the date is stored as VARCHAR in the format 20-JUN-2007. I have already used ORDER BY RIGHT(vPublishedDate, 4), but it doesn't affect the month and day.
Here is one way to do it using STR_TO_DATE (take into account the other answers about converting the column to date, although you may not have control over the database):
SELECT ...
FROM ...
ORDER BY STR_TO_DATE(vPublishedDate,'%d-%b-%Y') DESC
As an example:
SELECT STR_TO_DATE('20-JUN-2007','%d-%b-%Y') as Date;
+------------+
| Date       |
+------------+
| 2007-06-20 |
+------------+
Why are you using a VARCHAR to store a DATE? Use a DATE to store a DATE and then, as if by magic, sorting works all on its own.
You really should be storing dates as dates, not character-type fields. Then you wouldn't need to worry about this sort of "SQL gymnastics" (as I like to call it).
Databases are for storing data, not formatting.
By forcing yourself to manipulate sub-columns, you basically prevent the database from performing any useful optimisations.
In order to do what you want with the data you have you have to do something like:
use substring to extract individual sub-column information to get them in the order you want; and
use some sort of lookup to turn a string like "NOV" into 11 (since the month names will sort as APR, AUG, DEC, FEB, JAN, JUL, JUN, MAR, MAY, NOV, OCT, SEP).
And this would be a serious performance killer. Now there may be a function which can turn that particular date format into a proper date but I urge you: don't use it.
Set up or change your database to use an intelligent schema and all these problems will magically disappear.
It's a lot easier to turn a date column into any sort of output format than to do the same with a character column.
Change that VARCHAR to a DATE type column, if you can.
You can also try this, although this is NOT the RIGHT approach.
Select STR_TO_DATE(your_date_column,'%d-%b-%Y') AS your_new_date from your_table order by your_new_date DESC
Try converting the varchar to date using str_to_date and then you can apply the sorting logic.
I would suggest you change the column type to DATE.
Then run a script which converts your dates to the correct DB format.
Sorting would then be just as simple as sorting ids in MySQL.
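The conversion script could look roughly like the Python sketch below. It assumes every stored value follows the DD-MON-YYYY pattern from the question with English month abbreviations; `MONTHS` and `parse_published` are names invented here, and the sample values are made up. An explicit lookup table is used so that the result does not depend on the machine's locale.

```python
from datetime import date

# Explicit month lookup, independent of the system locale
MONTHS = {
    name: number
    for number, name in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}


def parse_published(text: str) -> date:
    """Parse a string such as '20-JUN-2007' into a real date."""
    day, month, year = text.strip().split("-")
    return date(int(year), MONTHS[month.upper()], int(day))


values = ["20-JUN-2007", "03-NOV-2006", "15-JAN-2008"]
newest_first = sorted(values, key=parse_published, reverse=True)
iso_values = [parse_published(v).isoformat() for v in newest_first]
print(newest_first)
print(iso_values)
```

The ISO strings (YYYY-MM-DD) are what a DATE column accepts directly, so they could be written back in an UPDATE before the column type is changed. Malformed values would raise an error here, which is probably what you want during a one-off migration.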