Better to use two columns or DATETIME - mysql

I'm working on a MySQL database which will create a "Today at" list and send it to subscribers. I'm wondering if it's better to use the DATETIME data type on the start and end fields, or to have two columns, startDate and startTime (with the appropriate data types). My first thought was to use DATETIME, but that makes subsequent use of the system a bit awkward, since you can no longer write:
SELECT * FROM event_list WHERE startAt='2009-04-20';
Instead, the best I found was:
SELECT * FROM event_list WHERE startAt LIKE '2009-04-20%';
and I don't like the hack or its potential impact on performance.

Just use the DATE() function.
SELECT * FROM event_list WHERE DATE(startAt) = '2009-04-20'

SELECT * FROM event_list WHERE startAt >= '2009-04-20' AND startAt < '2009-04-21'
This will use an index on startAt efficiently and handle the boundary conditions correctly. (A WHERE clause that wraps the column in a function generally won't be able to use an index on that column: the optimizer has no way to know that the expression result has the same ordering as the column values.)
Using two columns is a bit like having columns for the integer and decimal parts of real numbers. If you don't need the time, just don't save it in the first place.
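The half-open range from this answer can also be built in application code and passed as bound parameters. The following is only a minimal sketch: the helper names day_bounds, events_on and demo are hypothetical, and Python's bundled SQLite stands in for MySQL purely so the example is self-contained (it stores the datetimes as ISO text rather than as a DATETIME type).

```python
import sqlite3
from datetime import date, timedelta


def day_bounds(day):
    """Return (start, end) such that start <= startAt < end covers one calendar day."""
    d = date.fromisoformat(day)
    return d.isoformat(), (d + timedelta(days=1)).isoformat()


def events_on(conn, day):
    """Fetch event names for one day using a half-open range on startAt."""
    start, end = day_bounds(day)
    rows = conn.execute(
        "SELECT name FROM event_list "
        "WHERE startAt >= ? AND startAt < ? ORDER BY startAt",
        (start, end),
    )
    return [name for (name,) in rows]


def demo():
    # Assumption: SQLite stands in for MySQL; datetimes are stored as ISO text.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE event_list (name TEXT, startAt TEXT)")
    conn.execute("CREATE INDEX idx_startAt ON event_list (startAt)")
    conn.executemany(
        "INSERT INTO event_list VALUES (?, ?)",
        [
            ("previous evening", "2009-04-19 23:59:59"),
            ("midnight", "2009-04-20 00:00:00"),
            ("late", "2009-04-20 23:59:59"),
            ("next day", "2009-04-21 00:00:00"),
        ],
    )
    return events_on(conn, "2009-04-20")
```

Because the upper bound is exclusive, the event at midnight of the following day is left out while everything up to 23:59:59 is kept, without having to spell out the last second of the day.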

You can try something like this:
select * from event_list where date(startAt) = '2009-04-20'

How about the best of both worlds -- have a table that uses a single datetime column and a view of that table that gives you both date and time fields.
create view vw_event_list
as select ..., date(startAt) as startDate, time(startAt) as startTime
select * from vw_event_list where startDate = '2009-04-20'

The real consideration between separate date and time fields or 1 datetime field is indexing. You do not want to do this:
select * from event_list where date(startAt) = '2009-04-20'
on a datetime field because it won't use an index. MySQL will convert the startAt data to a date in order to compare it, which means it can't use the index.
You want to do this:
select * from event_list where startAt BETWEEN '2009-04-20 00:00:00' AND '2009-04-20 23:59:59'
The problem with a datetime field is that you can't really use it in a compound index, since the value is fairly unique. For example, a compound index on startAt+event isn't going to allow you to search on date+event, only datetime+event.
But if you split the data between date and time fields, you can index startDate+event and search on it efficiently.
That's just an example for discussion purposes, you could obviously index on event+startAt instead and it would work. But you may find yourself wanting to search/summarize based on date plus another field. Creating a compound index on that data would make it very efficient.

Just one more thing to add: beware of time zones. If you're offering an online service, they'll come up sooner or later, and they're really difficult to handle retroactively.
Daylight saving time is especially bad.
(DAMHIK)

Related

SQL SELECT - order dates with wrong format

I was tasked with ordering some entries in our web application. The current solution, written by someone else 10 years ago, runs a select on the database and then iterates over the result to build a table.
The problem is that the date is stored in dd-mm-yyyy format in a varchar column.
I'm not really sure I'm brave enough to make changes to the database.
So is there some way to order it anyway within the select, maybe by ordering on the end of the string? Or is changing the database the only way that avoids writing some gruesome function in code?
You can use the STR_TO_DATE() function for this. Try
ORDER BY STR_TO_DATE(varcharDateColumn, '%d-%m-%Y')
It converts your character-string dates to the DATE datatype where ordering works without trouble.
As of MySQL 5.7 or later, you can add a so-called generated column to your table without touching the other data.
ALTER TABLE tbl
ADD COLUMN goodDate
AS (STR_TO_DATE(varcharDateColumn, '%d-%m-%Y'))
STORED;
You can even put an index on that column if you need to use it for searching.
ALTER TABLE tbl ADD INDEX goodDate(goodDate);
You can use the STR_TO_DATE function, but this will only work well for small tables (maybe thousands of records). On large data sets you will face performance problems, because every row has to be converted before it can be sorted:
SELECT *
FROM (
SELECT '01-5-2013' AS Date
UNION ALL
SELECT '02-6-2013' AS Date
UNION ALL
SELECT '01-6-2013' AS Date
) AS t1
ORDER BY STR_TO_DATE(Date,'%d-%m-%Y')
The long-term solution should be to convert that column to a proper date type.
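To see why the varchar column sorts wrongly as text and what the STR_TO_DATE conversion buys you, here is a small sketch in application code. The sample values and the function name sort_chronologically are made up for illustration; Python's %d-%m-%Y format here mirrors the MySQL format string used in the answers.

```python
from datetime import datetime


def sort_chronologically(values):
    """Sort dd-mm-yyyy strings by the date they represent, not by their characters."""
    return sorted(values, key=lambda s: datetime.strptime(s, "%d-%m-%Y"))


# Hypothetical sample data in the dd-mm-yyyy layout from the question.
SAMPLE = ["20-04-2009", "01-05-2008", "15-01-2010"]

# Plain string ordering compares the day digits first, so years end up interleaved.
AS_TEXT = sorted(SAMPLE)

# Parsing first gives the real chronological order.
AS_DATES = sort_chronologically(SAMPLE)
```

The text ordering puts 15-01-2010 before 20-04-2009 because "15" sorts before "20", which is the same mistake a plain ORDER BY on the varchar column makes.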

Creating an index for YEAR() function

I'm trying to speed up a query that checks whether the year is '2017'. I'm comparing the cost of using LIKE = '2017%', YEAR (date) = '2017' and date BETWEEN '2017-1-1' AND '2017-12-31'.
I would like to create an index on the column date, but using the function YEAR, something similar to:
CREATE INDEX indexDATE ON
table (YEAR(date));
Is this possible?
The right way to express the logic is:
date BETWEEN '2017-01-01' AND '2017-12-31'
or as I prefer:
date >= '2017-01-01' AND date < '2018-01-01'
This can use an index on (date).
The expression LIKE = '2017%' is simply bad coding. You are using string functions on a date/time column. That is a really bad idea and it precludes the use of indexes.
The expression YEAR (date) = 2017 is logically ok -- once you remove the single quotes around the number. However, the use of the function on the column precludes the use of an index.
Finally, in most data sets, years are not very selective. That is, you are still going to be selecting a significant portion of the rows. Indexes are less useful for such queries.
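If the year comes from application code, the half-open bounds preferred above can be computed there and bound as parameters. This is a minimal sketch; year_bounds is a hypothetical helper name, not part of any library.

```python
from datetime import date


def year_bounds(year):
    """Return (start, end) so that start <= date < end covers the whole calendar year."""
    return date(year, 1, 1).isoformat(), date(year + 1, 1, 1).isoformat()


# Bounds for: date >= %s AND date < %s
START_2017, END_2017 = year_bounds(2017)
```

If the column holds DATETIME values rather than plain dates, the exclusive upper bound also covers rows stamped during the day on December 31, which BETWEEN ... AND '2017-12-31' would miss.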

What keys should be indexed here to make this query optimal

I have a query that looks like the following:
SELECT * from foo
WHERE days >= DATEDIFF(CURDATE(), last_day)
In this case, days is an INT. last_day is a DATE column.
So do I need two individual indexes here, one for days and one for last_day?
This query predicate, days >= DATEDIFF(CURDATE(), last_day), is inherently not sargable.
If you keep the present table design you'll probably benefit from a compound index on (last_day, days). Nevertheless, satisfying the query will require a full scan of that index.
Single-column indexes on either one of those columns, or both, will be useless or worse for improving this query's performance.
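If you must have this query perform very well, you need to reorganize your table a bit. Let's figure that out. It looks like you are trying to exclude "overdue" records, that is, rows where expiration_date < CURDATE(). Put the other way round, the rows you want satisfy expiration_date >= CURDATE(), where expiration_date is last_day plus days. That is a sargable search predicate.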
So if you added a new column expiration_date to your table, and then set it as follows:
UPDATE foo SET expiration_date = last_day + INTERVAL days DAY
and then indexed it, you'd have a well-performing query.
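As a sanity check on the rewrite, the sketch below models both predicates with plain date arithmetic and compares them. It assumes DATEDIFF(a, b) is the number of days from b to a; the function names are hypothetical. Rows with expiration_date < CURDATE() are the overdue ones that the original query filters out, so the original predicate should agree with expiration_date >= CURDATE().

```python
from datetime import date, timedelta


def matches_original(days, last_day, today):
    """Model of: days >= DATEDIFF(CURDATE(), last_day)."""
    return days >= (today - last_day).days


def expiration_date(last_day, days):
    """Model of: last_day + INTERVAL days DAY."""
    return last_day + timedelta(days=days)


def matches_rewritten(days, last_day, today):
    """Model of: expiration_date >= CURDATE()."""
    return expiration_date(last_day, days) >= today


def predicates_agree(today):
    """Compare both predicates over a small grid of days / last_day values."""
    for days in range(0, 10):
        for back in range(-5, 15):
            last_day = today - timedelta(days=back)
            if matches_original(days, last_day, today) != matches_rewritten(days, last_day, today):
                return False
    return True
```

The difference for the database is that the rewritten form compares a stored, indexable column against a constant, instead of computing an expression over two columns for every row.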
You must be careful with indexes: they can speed up reads, but they can slow down inserts.
You may consider partitioning the table on the last_day field.
I would try creating an index on the last_day field only, but I think the best approach is to run some performance tests with different configurations.
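Since you are using an expression in the WHERE criteria, MySQL will not be able to use indexes on either of the two fields. If you use this expression regularly and you have at least MySQL 5.7.8, you can create a generated column and create an index on it. Note that a generated column cannot call a nondeterministic function such as CURDATE(), so it would have to hold the deterministic part of the expression, for example last_day + INTERVAL days DAY.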
The other option is to create a regular column and set its value to the result of this expression and index this column. You will need triggers to keep it updated.

Why does MySQL drop my index when using DATE(`table`.`column`)

I have a MySQL innodb table with a few columns.
one of them is named "dateCreated" which is a DATETIME column and it is indexed.
My query:
SELECT
*
FROM
`table1`
WHERE
DATE(`dateCreated`) BETWEEN '2014-8-7' AND '2013-8-7'
MySQL for some reason refuses to use the index on the dateCreated column (even with USE INDEX or FORCE INDEX).
However, if I change the query to this:
SELECT
*
FROM
`table1`
WHERE
`dateCreated` BETWEEN '2014-8-7' AND '2013-8-7'
note the DATE(...) removal
MySQL uses the index just fine.
I could manage without using the DATE() function, but this is just weird to me.
I understand that maybe MySQL indexes the full date and time and when searching only a part of it, it gets confused or something. But there must be a way to use a partial date (lets say MONTH(...) or DATE(...)) and still benefit from the indexed column and avoid the full table scan.
Any thoughts..?
Thanks.
As you have observed, once you apply a function to that field you lose access to the index.
It will also help if you don't use BETWEEN. The rationale for applying the function to the data is to get the data to match the parameters. But there are just 2 parameter dates and several hundred? thousand? million? rows of data. Why not reverse this and change the parameters to suit the data (making it a "sargable" predicate)?
SELECT
*
FROM
`table1`
WHERE
( `dateCreated` >= '2013-08-07' AND `dateCreated` < '2014-08-07' )
;
Note that 2013-08-07 is used first, and the earlier date must also come first if you use BETWEEN. You will not get any results from BETWEEN if the first date is later than the second date.
Also note that exactly 12 months of data is contained >= '2013-08-07' AND < '2014-08-07', I presume this is what you are seeking.
Using the combination of date(dateCreated) and between would include 1 too many days as all events during '2014-08-07' would be included. If you deliberately wanted one year and 1 day then add 1 day to the higher date i.e. so it would be < '2014-08-08'
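The two points above, putting the earlier date first and adding one day to the higher date when the last day should be included in full, can be wrapped in a small helper on the application side. This is only a sketch; half_open is a hypothetical name.

```python
from datetime import date, timedelta


def half_open(first_day, last_day):
    """Turn an inclusive [first_day, last_day] range of dates into bounds for
    dateCreated >= start AND dateCreated < end."""
    first = date.fromisoformat(first_day)
    last = date.fromisoformat(last_day)
    if first > last:
        # A reversed range would silently match nothing, so fail loudly instead.
        raise ValueError("first_day must not be later than last_day")
    return first.isoformat(), (last + timedelta(days=1)).isoformat()
```

For example, half_open('2013-08-07', '2014-08-07') gives the "one year and 1 day" variant with an upper bound of '2014-08-08'; for exactly 12 months, pass '2014-08-06' as the last day.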

Timestamp as int field, query performance

I'm storing a timestamp as an int field. On a large table it takes too long to get the rows inserted on a given date, because I'm using the MySQL function FROM_UNIXTIME.
SELECT * FROM table WHERE FROM_UNIXTIME(timestamp_field, '%Y-%m-%d') = '2010-04-04'
Is there any way to speed up this query? Maybe I should query for rows using timestamp_field >= x AND timestamp_field < y?
Thank you
EDITED: This query works great, but make sure there is an index on timestamp_field.
SELECT * FROM table WHERE
timestamp_field >= UNIX_TIMESTAMP('2010-04-14 00:00:00')
AND timestamp_field <= UNIX_TIMESTAMP('2010-04-14 23:59:59')
Use UNIX_TIMESTAMP on the constant instead of FROM_UNIXTIME on the column:
SELECT * FROM table
WHERE timestamp_field
BETWEEN UNIX_TIMESTAMP('2010-04-14 00:00:00')
AND UNIX_TIMESTAMP('2010-04-14 23:59:59')
This can be faster because it allows the database to use an index on the column timestamp_field, if one exists. It is not possible for the database to use the index when you use a non-sargable function like FROM_UNIXTIME on the column.
If you don't have an index on timestamp_field then add one.
Once you have done this you can also try to further improve performance by selecting the columns you need instead of using SELECT *.
If you're able to, it would be faster to either store the date as a proper datetime field, or, in the code running the query, to convert the date you're after to a unix timestamp before sending it to the query.
The FROM_UNIXTIME would have to convert every record in the table before it can check it which, as you can see, has performance issues. Using a native datatype that is closest to what you're actually using in your queries, or querying with the column's data type, is the fastest way.
So, if you need to continue using an int field for your time, then yes, comparing the column against plain integers with >= and < would boost performance greatly. A range is needed when you store values to the second; if you only stored the timestamp for midnight of each day, a single equality check would do.
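Converting the date to Unix timestamps in the code that runs the query could look like the sketch below. It assumes the stored values are seconds since the epoch and that a "day" means a UTC day; MySQL's UNIX_TIMESTAMP() interprets its argument in the session time zone, so the numbers only line up if that is UTC as well. The helper names and the %s placeholder style (common in Python MySQL drivers) are assumptions.

```python
from datetime import datetime, timedelta, timezone


def day_epoch_bounds(day):
    """Return (start, end) epoch seconds so that start <= ts < end covers one UTC day."""
    start = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())


def build_query(day):
    """Build the SQL text and parameters; the column is compared to plain integers."""
    lo, hi = day_epoch_bounds(day)
    sql = (
        "SELECT * FROM `table` "
        "WHERE timestamp_field >= %s AND timestamp_field < %s"
    )
    return sql, (lo, hi)
```

The exclusive upper bound avoids having to write 23:59:59, and no function is applied to timestamp_field, so an index on that column remains usable.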