Why does MySQL drop my index when using DATE(`table`.`column`)?

I have a MySQL InnoDB table with a few columns.
One of them is named "dateCreated"; it is a DATETIME column and it is indexed.
My query:
SELECT
*
FROM
`table1`
WHERE
DATE(`dateCreated`) BETWEEN '2014-8-7' AND '2013-8-7'
MySQL for some reason refuses to use the index on the dateCreated column (even with USE INDEX or FORCE INDEX).
However, if I change the query to this:
SELECT
*
FROM
`table1`
WHERE
`dateCreated` BETWEEN '2014-8-7' AND '2013-8-7'
(Note the removal of DATE(...).)
MySQL uses the index just fine.
I could manage without using the DATE() function, but this is just weird to me.
I understand that maybe MySQL indexes the full date and time, and when searching on only a part of it, it gets confused or something. But there must be a way to use a partial date (let's say MONTH(...) or DATE(...)) and still benefit from the indexed column and avoid the full table scan.
Any thoughts..?
Thanks.

As you have observed, once you apply a function to that field you lose access to the index.
It will also help if you don't use BETWEEN. The rationale for applying the function to the data is so you can get the data to match the parameters. But there are just 2 parameter dates and several hundred? thousand? million? rows of data. Why not reverse this and change the parameters to suit the data, making it a "sargable" predicate?
SELECT
*
FROM
`table1`
WHERE
( `dateCreated` >= '2013-08-07' AND `dateCreated` < '2014-08-07' )
;
Note that 2013-08-07 is used first; the earlier date must also come first when using BETWEEN. You will not get any results from BETWEEN if the first date is later than the second date.
Also note that exactly 12 months of data is contained in >= '2013-08-07' AND < '2014-08-07'; I presume this is what you are seeking.
Using the combination of DATE(dateCreated) and BETWEEN would include one day too many, as all events during '2014-08-07' would be included. If you deliberately wanted one year and 1 day, then add 1 day to the higher date, i.e. so it would be < '2014-08-08'.
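One way to confirm the difference is to run EXPLAIN on both forms of the predicate. This is only a sketch against the table1 and dateCreated names from the question; the exact plan depends on the MySQL version and on how much of the table falls inside the range.

```sql
-- Function wrapped around the column: the index on dateCreated
-- cannot be used for a range lookup.
EXPLAIN
SELECT * FROM `table1`
WHERE DATE(`dateCreated`) BETWEEN '2013-08-07' AND '2014-08-06';

-- Bare column compared against a half-open range covering the same
-- 12 months: the index on dateCreated is now a candidate.
EXPLAIN
SELECT * FROM `table1`
WHERE `dateCreated` >= '2013-08-07'
  AND `dateCreated` <  '2014-08-07';
```

Compare the type and key columns of the two results. The first form typically reports a full scan, while the second can report a range access on the dateCreated index when the optimizer judges it worthwhile.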

Related

SQL SELECT - order dates with wrong format

I was tasked with ordering some entries in our web application. The current solution, made by some other guy 10 years ago, runs a select on the db, then iterates over the result and builds a table.
The problem is that the date is in dd-mm-yyyy format and stored as varchar data.
And I'm not really sure I am brave enough to make changes to the database.
So is there some way to order it anyway within a select, maybe some way to order it by the end of the string? Or is changing the db the only way that avoids writing some gruesome function in code?
You can use the STR_TO_DATE() function for this. Try
ORDER BY STR_TO_DATE(varcharDateColumn, '%d-%m-%Y')
It converts your character-string dates to the DATE datatype where ordering works without trouble.
In MySQL 5.7 or later, you can add a so-called generated column to your table without touching the other data.
ALTER TABLE tbl
ADD COLUMN goodDate DATE
AS (STR_TO_DATE(varcharDateColumn, '%d-%m-%Y'))
STORED;
You can even put an index on that column if you need to use it for searching.
ALTER TABLE tbl ADD INDEX goodDate(goodDate);
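With the generated column in place, queries can sort and filter on it directly. A brief sketch, assuming the tbl and goodDate names from the statements above; the date range is made up for illustration.

```sql
-- Sort by the real DATE value instead of the dd-mm-yyyy string.
SELECT *
FROM tbl
ORDER BY goodDate;

-- Range filter that can use the index on goodDate
-- (the boundary dates are illustrative only).
SELECT *
FROM tbl
WHERE goodDate >= '2013-01-01'
  AND goodDate <  '2014-01-01'
ORDER BY goodDate;
```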
You can use the STR_TO_DATE function, but this will work well only for small tables (maybe thousands of records); on large data sets you will face performance problems:
SELECT *
FROM (
SELECT '01-5-2013' AS Date
UNION ALL
SELECT '02-6-2013' AS Date
UNION ALL
SELECT '01-6-2013' AS Date
) AS t1
ORDER BY STR_TO_DATE(Date,'%d-%m-%Y')
The long-term solution should be converting that column to a proper date type.

What keys should be indexed here to make this query optimal

I have a query that looks like the following:
SELECT * from foo
WHERE days >= DATEDIFF(CURDATE(), last_day)
In this case, days is an INT. last_day is a DATE column.
So do I need two individual indexes here, one for days and one for last_day?
This query predicate, days >= DATEDIFF(CURDATE(), last_day), is inherently not sargable.
If you keep the present table design you'll probably benefit from a compound index on (last_day, days). Nevertheless, satisfying the query will require a full scan of that index.
Single-column indexes on either one of those columns, or both, will be useless or worse for improving this query's performance.
If you must have this query perform very well, you need to reorganize your table a bit. Let's figure that out. It looks like you are trying to exclude "overdue" records: since days >= DATEDIFF(CURDATE(), last_day) is equivalent to last_day + INTERVAL days DAY >= CURDATE(), you want expiration_date >= CURDATE(). That is a sargable search predicate.
So if you added a new column expiration_date to your table, and then set it as follows:
UPDATE foo SET expiration_date = last_day + INTERVAL days DAY
and then indexed it, you'd have a well-performing query.
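Spelled out, those steps might look like the following sketch. The expiration_date column name comes from the answer; the index name is hypothetical. The final predicate follows from rewriting days >= DATEDIFF(CURDATE(), last_day) as last_day + INTERVAL days DAY >= CURDATE().

```sql
-- Add the precomputed column.
ALTER TABLE foo ADD COLUMN expiration_date DATE;

-- Backfill it from the existing columns.
UPDATE foo SET expiration_date = last_day + INTERVAL days DAY;

-- Index it (index name is hypothetical).
ALTER TABLE foo ADD INDEX idx_expiration_date (expiration_date);

-- The original predicate, rewritten against the bare indexed column.
SELECT * FROM foo
WHERE expiration_date >= CURDATE();
```

The column has to be kept in step whenever last_day or days changes, either in application code or with triggers.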
You must be careful with indexes: they can help with reads, but they can reduce insert performance.
You may consider partitioning the table on the last_day field.
I would try an index on the last_day field only, but I think the best approach is to run some performance tests with different configurations.
Since you are using an expression in the WHERE criteria, MySQL will not be able to use indexes on either of the two fields. If you use this expression regularly and you have at least MySQL v5.7.8, then you can create a generated column and create an index on it.
The other option is to create a regular column and set its value to the result of this expression and index this column. You will need triggers to keep it updated.
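For the generated-column route, note that the column expression cannot contain CURDATE(), because generated columns allow only deterministic expressions. A possible sketch is to generate the date arithmetic part instead and compare it to CURDATE() in the query. The names expiration_date and idx_expiration_date are assumptions, not part of the original schema.

```sql
-- Generated column holding last_day + days; MySQL keeps it up to date
-- automatically, so no triggers are needed.
ALTER TABLE foo
  ADD COLUMN expiration_date DATE
    AS (last_day + INTERVAL days DAY) STORED;

ALTER TABLE foo ADD INDEX idx_expiration_date (expiration_date);

-- Same rows as: days >= DATEDIFF(CURDATE(), last_day)
SELECT * FROM foo
WHERE expiration_date >= CURDATE();
```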

Two(with subquery) or one query to select max(date) in where clause. MySQL

I need to create a table and store there cached status of some events. So I will have to do only two operations:
1) Insert the id of the event, its status, and the time when this record was stored in the db;
2) Get last record with certain event id.
There are several methods to get the result (status):
Method 1:
SELECT status FROM status_log a
WHERE a.event_id = 1
ORDER BY a.update_date DESC
LIMIT 1
Method 2:
SELECT status FROM status_log a
WHERE a.update_date = (
SELECT max(b.update_date) FROM status_log b
WHERE b.event_id = 1
) AND a.event_id = 1
So I have two questions:
Which query to use?
Which field type to set for the update_date field (int or timestamp)?
Actually, your second query does not answer the question "find the record with the latest update date for event #1", because there could be many different events with the same latest update_date. So, in terms of semantics, you should use the first query. (After your edit this is fixed.)
The first query will be efficient if you create an index on event_id and this column has good cardinality (i.e. the WHERE clause filters down to few enough rows by using that index). This can be improved further by adding the update_date column to the index, but that makes sense only if there are many rows with the same event_id (enough for MySQL to use the second index part), and again with good cardinality inside the first index part.
But my advice is just theory; in practice you'll have to figure it out with EXPLAIN and your own measurements on real data.
As for the data type, common practice is to use the proper data type (i.e. datetime/timestamp for something that represents a point in time).
Which query to use
I believe the first one should be faster. Anyway just run an EXPLAIN on them and you'll find out yourself.
The index you should be using will be:
ALTER TABLE status_log ADD INDEX(event_id, update_date)
Now... did you notice that those queries are NOT equivalent? The second one will return all status from all event_id that have a maximum date.
Which field type to set to update_date field (int or timestamp)
If you have a field named update_date I just can't imagine why an int would serve the same purpose. Rephrasing the question to choose between datetime or timestamp, then the answer is up to the requirements. If you just want to know when a record in the DB was updated use a timestamp. If the update_date refers to an entity in your domain model go for a datetime. You will most likely need to perform calculations on the date (add time, remove time, extract a month, etc) so using a unix timestamp (which I'd say should be almost write-only) will result in extra calculation time because you'll have to convert the timestamp to a datetime and then perform the function over that result.
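Putting the suggestions together, a minimal sketch of the table might look like this. The column types, the id column and the index name are assumptions; only event_id, status and update_date come from the question.

```sql
CREATE TABLE status_log (
  id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, -- assumed
  event_id    INT UNSIGNED    NOT NULL,
  status      VARCHAR(32)     NOT NULL,                            -- assumed type
  update_date DATETIME        NOT NULL,
  INDEX idx_event_update (event_id, update_date)
) ENGINE=InnoDB;

-- Method 1: the compound index lets MySQL find the rows for event_id = 1
-- and read the newest update_date from the end of that index range.
EXPLAIN
SELECT status FROM status_log
WHERE event_id = 1
ORDER BY update_date DESC
LIMIT 1;
```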

Where datetime=date without full table scan?

Possible duplicate of: How to select date from datetime column?
But the problem with the accepted answer is that it will perform a full table scan.
I want to do something like this:
UPDATE records SET earnings=(SELECT SUM(rate)
FROM leads
WHERE records.user_id=leads.user_id
AND DATE(leads.datetime)=records.date)
Notice the last portion: DATE(leads.datetime)=records.date. This does exactly what it needs to do, but it has to scan every row. Some users have thousands of leads so it can take a while.
The leads table has an INDEX on user_id,datetime.
I know you can use interval functions and do something like WHERE datetime BETWEEN date AND interval + days or something like that.
What is the most efficient and accurate way to do this?
I'm not familiar with date functions in MySQL, but try changing it to
UPDATE records SET earnings=
(SELECT SUM(rate)
FROM leads
WHERE records.user_id=leads.user_id
AND leads.datetime >= records.date
And leads.datetime < records.date [+ one day]) -- however you do that in MySQL
You are getting a complete table scan because the expression DATE(leads.datetime) is not sargable. It is a function that has to operate on the value stored in a column of the table, and it is that raw column value which is stored in any index on the column. The function's result is not pre-computed and stored in the index, only the actual column value, so no index search can identify which rows will meet the criteria in the WHERE clause predicate after the function has been executed on them. Changing the expression so that the column value stands by itself on one side of the WHERE clause operator (equal sign or whatever) allows the column values in the index to be searched based on a single expression.
You can try this:
UPDATE records
SET earnings = (SELECT SUM(rate)
FROM leads
WHERE records.user_id=leads.user_id AND
leads.datetime >= records.date and
leads.datetime < date_add(records.date, interval 1 day)
);
You need an index on leads(user_id, datetime) for this to work.

Better to use two columns or DATETIME

I'm working on a MySQL database which will create a "Today at" list and send it to subscribers. I'm wondering if it's better to use the DATETIME data type on the start and end fields, or to have two columns, startDate and startTime (with the appropriate data types). My first thought was to use DATETIME, but that makes subsequent use of the system a bit awkward, since you can no longer write:
SELECT * FROM event_list WHERE startAt='2009-04-20';
Instead, the best I found was:
SELECT * FROM event_list WHERE startAt LIKE '2009-04-20%';
and I don't like the hack or its potential impact on performance.
Just use the DATE() function.
SELECT * FROM event_list WHERE DATE(startAt) = '2009-04-20'
SELECT * FROM event_list WHERE startAt >= '2009-04-20' AND startAt < '2009-04-21'
This will use an index on startAt efficiently and handle the boundary conditions correctly. (Any WHERE clause that wraps the column in a function won't be able to use an index, because MySQL has no way to know that the expression result has the same ordering as the column values.)
Using two columns is a bit like having columns for the integer and decimal parts of real numbers. If you don't need the time, just don't save it in the first place.
You can try something like this:
select * from event_list where date(startAt) = '2009-04-20'
How about the best of both worlds -- have a table that uses a single datetime column and a view of that table that gives you both date and time fields.
create view vw_event_list
as select ..., date(startAt) as startDate, time(startAt) as startTime
select * from vw_event_list where startDate = '2009-04-20'
The real consideration between separate date and time fields or 1 datetime field is indexing. You do not want to do this:
select * from event_list where date(startAt) = '2009-04-20'
on a datetime field because it won't use an index. MySQL will convert the startAt data to a date in order to compare it, which means it can't use the index.
You want to do this:
select * from event_list where startAt BETWEEN '2009-04-20 00:00:00' AND '2009-04-20 23:59:59'
The problem with a datetime field is that you can't really use it in a compound index, since the value is fairly unique. For example, a compound index on startAt+event isn't going to allow you to search on date+event, only datetime+event.
But if you split the data between date and time fields, you can index startDate+event and search on it efficiently.
That's just an example for discussion purposes, you could obviously index on event+startAt instead and it would work. But you may find yourself wanting to search/summarize based on date plus another field. Creating a compound index on that data would make it very efficient.
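As a rough sketch of the event+startAt alternative mentioned above: putting the equality column first lets a single compound index serve a "this event on this date" lookup even with one DATETIME column. The event column is taken from the discussion; the index name and the 'concert' value are hypothetical.

```sql
-- Equality column first, range column second.
CREATE INDEX idx_event_startAt ON event_list (`event`, startAt);

-- Equality on event plus a half-open range on startAt can use both
-- parts of the index.
SELECT *
FROM event_list
WHERE `event` = 'concert'
  AND startAt >= '2009-04-20'
  AND startAt <  '2009-04-21';
```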
Just one more thing to add: Beware time zones, if you're offering an online service it'll come up sooner or later and it's really difficult to do retroactively.
Daylight Savings Time is especially bad.
(DAMHIK)