SQL query using a date field to ensure I use my index - mysql

I have a table with date column named day. The way I have it indexed is using multiple keys:
KEY user_id (user_id,day)
I want to make sure I use the index properly when I make a query that selects every row for a user_id from the beginning of the month to a given day in the month. For example, let's say I want to query for every day since the beginning of the month until today, what's the best way to write my query to ensure that I hit the index, here's what I have so far:
select * from table_name
WHERE user_id = 1
AND (day between DATE_FORMAT(NOW() ,'%Y-%m-01') AND NOW() )

use Explain to ensure if your query using the index you've created
For example
explain
select * from table_name
WHERE user_id = 1
AND (day between DATE_FORMAT(NOW() ,'%Y-%m-01') AND NOW() )
should give you the details of the query plan by the mysql optimizer
In the query plan in possible keys there user_id(index) should be present --- > which states that the possible indexes that can be used to fetch the result set from a particular table
keys field user_id(index) should be present --- > which shows that the index that is being used to get the results
Extra ---> Additional information (using index--the whole result set can be fetched from the index file itself, using where ---> query uses index for filter the criteria and so on...)
In Your query you have mentioned user_id = 1 constant which will help more in reducing the number of records that are to be scanned, even though there is a range check on day, so the query will use index provided that you have less percentage of duplicate values in user_id column

Related

Will adding an index to a column improve the select query (without where) performance in SQL?

I have a MySQL table that contains 20 000 000 rows, and columns like (user_id, registered_timestamp, etc). I have written a below query to get a count of users registered day wise. The query was taking a long time to execute. Will adding an index to the registered_timestamp column improve the execution time?
select date(registered_timestamp), count(userid) from table group by 1
Consider using this query to get a list of dates and the number of registrations on each date.
SELECT date(registered_timestamp) date, COUNT(*)
FROM table
GROUP BY date(registered_timestamp)
Then an index on table(registered_timestamp) will help a little because it's a covering index.
If you adapt your query to return dates from a limited range, for example.
SELECT date(registered_timestamp) date, COUNT(*)
FROM table
WHERE registered_timestamp >= CURDATE() - INTERVAL 8 DAY
AND registered_timestamp < CURDATE()
GROUP BY date(registered_timestamp)
the index will help. (This query returns results for the week ending yesterday.) However, the index will not help this query.
SELECT date(registered_timestamp) date, COUNT(*)
FROM table
WHERE DATE(registered_timestamp) >= CURDATE() - INTERVAL 8 DAY /* slow! */
GROUP BY date(registered_timestamp)
because the function on the column makes the query unsargeable.
You probably can address this performance issue with a MySQL generated column. This command:
ALTER TABLE `table`
ADD registered_date DATE
GENERATED ALWAYS AS DATE(registered_timestamp)
STORED;
Then you can add an index on the generated column
CREATE INDEX regdate ON `table` ( registered_date );
Then you can use that generated (derived) column in your query, and get a lot of help from that index.
SELECT registered_date, COUNT(*)
FROM table
GROUP BY registered_date;
But beware, creating the generated column and its index will take a while.
select date(registered_timestamp), count(userid) from table group by 1
Would benefit from INDEX(registered_timestamp, userid) but only because such an index is "covering". The query will still need to read every row of the index, and do a filesort.
If userid is the PRIMARY KEY, then this would give you the same answers without bothering to check each userid for being NOT NULL.
select date(registered_timestamp), count(*) from table group by 1
And INDEX(registered_timestamp) would be equivalent to the above suggestion. (This is because InnoDB implicitly tacks on the PK.)
If this query is common, then you could build and maintain a "summary table", which collects the count every night for the day's registrations. Then the query would be a much faster fetch from that smaller table.

Why does MySQL drops my index when using DATE(`table`.`column`)

I have a MySQL innodb table with a few columns.
one of them is named "dateCreated" which is a DATETIME column and it is indexed.
My query:
SELECT
*
FROM
`table1`
WHERE
DATE(`dateCreated`) BETWEEN '2014-8-7' AND '2013-8-7'
MySQL for some reason refuses to use the index on the dateCreated column (even with USE INDEX or FORCE INDEX.
However, if I change the query to this:
SELECT
*
FROM
`table1`
WHERE
`dateCreated` BETWEEN '2014-8-7' AND '2013-8-7'
note the DATE(...) removal
MySQL uses the index just fine.
I could manage without using the DATE() function, but this is just weird to me.
I understand that maybe MySQL indexes the full date and time and when searching only a part of it, it gets confused or something. But there must be a way to use a partial date (lets say MONTH(...) or DATE(...)) and still benefit from the indexed column and avoid the full table scan.
Any thoughts..?
Thanks.
As you have observed once you apply a function to that field you destroy access to the index. So,
It will help if you don't use between. The rationale for applying the function to the data is so you can get the data to match the parameters. There are just 2 parameter dates and several hundred? thousand? million? rows of data. Why not reverse this, change the parameters to suit the data? (making it a "sargable" predicate)
SELECT
*
FROM
`table1`
WHERE
( `dateCreated` >= '2013-08-07' AND `dateCreated` < '2014-08-07' )
;
Note 2013-08-07 is used first, and this needs to be true if using between also. You will not get any results using between if the first date is younger than the second date.
Also note that exactly 12 months of data is contained >= '2013-08-07' AND < '2014-08-07', I presume this is what you are seeking.
Using the combination of date(dateCreated) and between would include 1 too many days as all events during '2014-08-07' would be included. If you deliberately wanted one year and 1 day then add 1 day to the higher date i.e. so it would be < '2014-08-08'

Two(with subquery) or one query to select max(date) in where clause. MySQL

I need to create a table and store there cached status of some events. So I will have to do only two operations:
1) Insert id of event, it's status, and time of when this record was stored in db;
2) Get last record with certain event id.
There are several methods to get the result (status):
Method 1:
SELECT status FROM status_log a
WHERE a.event_id = 1
ORDER BY a.update_date DESC
LIMIT 1
Method 2:
SELECT status FROM status_log a
WHERE a.update_date = (
SELECT max(b.update_date) FROM status_log b
WHERE b.event_id = 1
) AND a.event_id = 1
So I have two questions:
Which query to use
Which field type to set to update_date field (int or timestamp)
Actually, your second query does not resolve question 'find record with greatest date of update for event #1' - because there could be many different events with same latest update_date. So, in terms of semantics - you should use first query. (after your edit this is fixed)
First query will be effective if you'll create an index by event_id index and this column will have good cardinality (i.e. WHERE clause will filter few enough rows by using that index). However, this can be improved by adding column update_date to index - but that makes sense only if there will be many rows with same event_id (many enough for MySQL to use second index part) - and again with good cardinality inside first index part.
But in practice - my advice is just a theory, you'll have to figure it out with EXPLAIN syntax and your own measures on real data.
As for data type - common practice is to use proper data type (i.e. datetime/timestamp for something which means time point)
Which query to use
I believe the first one should be faster. Anyway just run an EXPLAIN on them and you'll find out yourself.
The index you should be using will be:
ALERT TABLE status_log ADD INDEX(event_id, update_date)
Now... did you notice that those queries are NOT equivalent? The second one will return all status from all event_id that have a maximum date.
Which field type to set to update_date field (int or timestamp)
If you have a field named update_date I just can't imagine why an int would serve the same purpose. Rephrasing the question to choose between datetime or timestamp, then the answer is up to the requirements. If you just want to know when a record in the DB was updated use a timestamp. If the update_date refers to an entity in your domain model go for a datetime. You will most likely need to perform calculations on the date (add time, remove time, extract a month, etc) so using a unix timestamp (which I'd say should be almost write-only) will result in extra calculation time because you'll have to convert the timestamp to a datetime and then perform the function over that result.

Maximize efficiency of SQL SELECT statement

Assum that we have a vary large table. for example - 3000 rows of data.
And we need to select all the rows that thire field status < 4.
We know that the relevance rows will be maximum from 2 months ago (of curse that each row has a date column).
does this query is the most efficient ??
SELECT * FROM database.tableName WHERE status<4
AND DATE< '".date()-5259486."' ;
(date() - php , 5259486 - two months.)...
Assuming you're storing dates as DATETIME, you could try this:
SELECT * FROM database.tableName
WHERE status < 4
AND DATE < DATE_SUB(NOW(), INTERVAL 2 MONTHS)
Also, for optimizing search queries you could use EXPLAIN ( http://dev.mysql.com/doc/refman/5.6/en/explain.html ) like this:
EXPLAIN [your SELECT statement]
Another point where you can tweak response times is by carefully placing appropriate indexes.
Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data.
Here are some explanations & tutorials on MySQL indexes:
http://www.tutorialspoint.com/mysql/mysql-indexes.htm
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
However, keep in mind that using TIMESTAMP instead of DATETIME is more efficient; the former is 4 bytes; the latter is 8. They hold equivalent information (except for timezone issues).
3,000 rows of data is not large for a database. In fact, it is on the small side.
The query:
SELECT *
FROM database.tableName
WHERE status < 4 ;
Should run pretty quickly on 3,000 rows, unless each row is very, very large (say 10k). You can always put an index on status to make it run faster.
The query suggested by cassi.iup makes more sense:
SELECT *
FROM database.tableName
WHERE status < 4 AND DATE < DATE_SUB(NOW(), INTERVAL 2 MONTHS);
It will perform better with a composite index on status, date. My question is: do you want all rows with a status of 4 or do you want all rows with a status of 4 in the past two months? In the first case, you would have to continually change the query. You would be better off with:
SELECT *
FROM database.tableName
WHERE status < 4 AND DATE < date('2013-06-19');
(as of the date when I am writing this.)

Where datetime=date without full table scan?

Possible duplicate of: How to select date from datetime column?
But the problem with the accepted answer is it will preform a full table scan.
I want to do something like this:
UPDATE records SET earnings=(SELECT SUM(rate)
FROM leads
WHERE records.user_id=leads.user_id
AND DATE(leads.datetime)=records.date)
Notice the last portion: DATE(leads.datetime)=records.date. This does exactly what it needs to do, but it has to scan every row. Some users have thousands of leads so it can take a while.
The leads table has an INDEX on user_id,datetime.
I know you can use interval functions and do something like WHERE datetime BETWEEN date AND interval + days or something like that.
What is the most efficient and accurate way to do this?
I'm not familiar with date functions in MySQL, but try changing it to
UPDATE records SET earnings=
(SELECT SUM(rate)
FROM leads
WHERE records.user_id=leads.user_id
AND leads.datetime >= records.date
And leads.datetime < records.date [+ one day]) -- however you do that in MySQL
You are getting a complete table scan because the expression DATE(leads.datetime) is not Sargable. This is because it is a function which needs to operate on the value stored in a column of the table, and which is also stored in any index on that column. The function's value, obviously, cannot be pre-computed and stored in any index, only the actual column value, so no index search can identify which rows will, after having the function executed on them, meet the criteria expressed in the Where clause predicate. Changing the expression so that the column value is, by itself on one side or the other of the where clause operator, (equal sign or whatever), allows the column values in the index to be searched based on a single expression.
You can try this:
UPDATE records
SET earnings = (SELECT SUM(rate)
FROM leads
WHERE records.user_id=leads.user_id AND
leads.datetime >= records.date and
leads.datetime < date_add(records.date, interval 1 day)
);
You need an index on leads(user_id, datetime) for this to work.