Suppose I have a table 'Tasks' with a DATETIME column approve_time. I have an index on said column.
If I were to write a query of the form:
SELECT task_id, task_desc, task_owner, approve_time
FROM Tasks
WHERE DATE(approve_time) = '2011-08-31'
My question is about the performance of such a query:
Does MYSQL index DATETIME columns in a way that allows constraining by the date component to be fast?
Or does MYSQL know how to optimize the query into something like the following?
WHERE approve_time >= '2011-08-31 00:00:00'
AND approve_time < '2011-09-01 00:00:00'
Or does the query incur a tablescan?
Does MYSQL index DATETIME columns in a way that allows constraining by the date component to be fast?
NO
Or does MYSQL know how to optimize the query into something like the following?
YES
the second query will lead to range filter,
try
explain extended query_1; <--- number of rows sent for scan is more,
which should be aLL rows
vs
explain extended query_2;
the value of DATE(approve_time) only can determined after the function applied to column approve_time in all the row, which mean there is not going to make use on index
Related
I have MariaDB 10.1.14, For a long time I'm doing the following query without problems (it tooks about 3 seconds):
SELECT
sum(transaction_total) as sum_total,
count(*) as count_all,
transaction_currency
FROM
transactions
WHERE
DATE(transactions.created_at) = DATE(CURRENT_DATE)
AND transaction_type = 1
AND transaction_status = 2
GROUP BY
transaction_currency
Suddenly, I'm not sure exactly why, this query take about 13 seconds.
This is the EXPLAIN:
And those are the all indexes of transactions table:
What is the reason for the sudden query time increase? and how can I decrease it?
If you are adding more data to your table the query time will increase.
But you can do a few things to improve the performance.
Create a composite index for ( transaction_type, transaction_status, created_at)
Remove the DATE() functions (or any function) from your fields, because that doesn't allow engine use the index. CURRENT_DATE is a constant so there doesn't matter, but isn't necessary because already return DATE
if created_at isnt date you can use
created_at >= CURRENT_DATE and created_at < CURRENT_DATE + 1
or create a different field to only save the date part.
+1 to answer from #JuanCarlosOropeza, but you can go a little further with the index.
ALTER TABLE transactions ADD INDEX (
transaction_type,
transaction_status,
created_at,
transaction_currency,
transaction_total
);
As #RickJames mentioned in comments, the order of columns is important.
First, columns in equality comparisons
Next, you can index one column that is used for a range comparison (which is anything besides equality), or GROUP BY or ORDER BY. You have both range comparison and GROUP BY, but you can only get the index to help with one of these.
Last, other columns needed for the query, if you think you can get a covering index.
I describe more detail about index design in my presentation How to Design Indexes, Really (video: https://www.youtube.com/watch?v=ELR7-RdU9XU).
You're probably stuck with the "using temporary" since you have a range condition and also a GROUP BY referencing different columns. But you can at least eliminate the "using filesort" by this trick:
...
GROUP BY
transaction_currency
ORDER BY NULL
Supposing that it's not important to you which order the rows of the query results return in.
I don't know what has made your query slower. More data? Fragmentation? New DB version?
However, I am surprised to see that there is no index really supporting the query. You should have a compound index starting with the column with highest cardinality (the date? well, you can try different column orders and see which index the DBMS picks for the query).
create index idx1 on transactions(created_at, transaction_type, transaction_status);
If created_at contains a date part, then you may want to create a computed column created_on only containing the date and index that instead.
You can even extend this index to a covering index (where clause fields followed by group by clause fields followed by select clause fields):
create index idx2 on transactions(created_at, transaction_type, transaction_status,
transaction_currency, transaction_total);
Possible duplicate of: How to select date from datetime column?
But the problem with the accepted answer is it will preform a full table scan.
I want to do something like this:
UPDATE records SET earnings=(SELECT SUM(rate)
FROM leads
WHERE records.user_id=leads.user_id
AND DATE(leads.datetime)=records.date)
Notice the last portion: DATE(leads.datetime)=records.date. This does exactly what it needs to do, but it has to scan every row. Some users have thousands of leads so it can take a while.
The leads table has an INDEX on user_id,datetime.
I know you can use interval functions and do something like WHERE datetime BETWEEN date AND interval + days or something like that.
What is the most efficient and accurate way to do this?
I'm not familiar with date functions in MySQL, but try changing it to
UPDATE records SET earnings=
(SELECT SUM(rate)
FROM leads
WHERE records.user_id=leads.user_id
AND leads.datetime >= records.date
And leads.datetime < records.date [+ one day]) -- however you do that in MySQL
You are getting a complete table scan because the expression DATE(leads.datetime) is not Sargable. This is because it is a function which needs to operate on the value stored in a column of the table, and which is also stored in any index on that column. The function's value, obviously, cannot be pre-computed and stored in any index, only the actual column value, so no index search can identify which rows will, after having the function executed on them, meet the criteria expressed in the Where clause predicate. Changing the expression so that the column value is, by itself on one side or the other of the where clause operator, (equal sign or whatever), allows the column values in the index to be searched based on a single expression.
You can try this:
UPDATE records
SET earnings = (SELECT SUM(rate)
FROM leads
WHERE records.user_id=leads.user_id AND
leads.datetime >= records.date and
leads.datetime < date_add(records.date, interval 1 day)
);
You need an index on leads(user_id, datetime) for this to work.
I need to calculate the num of rows created on a daily basis for a huge Table in mysql. I'm currently using
select count(1) from table_name group by Date
THe query is taking more 2000sec and counting. I was wondering if there's any optimized query or a way to optimize my query.
If you're only interested in items that were created on those dates, you could calculate the count at end-of-day and store it another table.
That lets you run the COUNT query on a much smaller data set (Use WHERE DATE(NOW()) = Date and drop the GROUP BY)
Then then query the new table when you need the data.
Make sure that "date" field is of "date" type, not datetime nor timestamp
Index that column
If you need it for one day, add a "where" statement. i.e. WHERE date="2013-07-10"
Add an index on the Date column, there's no other way to optimize this query that I can think of.
CREATE INDEX ix_date
ON table_name (Date);
I'm storing timestamp as int field. And on large table it takes too long to get rows inserted at date because I'm using mysql function FROM_UNIXTIME.
SELECT * FROM table WHERE FROM_UNIXTIME(timestamp_field, '%Y-%m-%d') = '2010-04-04'
Is there any ways to speed this query? Maybe I should use query for rows using timestamp_field >= x AND timestamp_field < y?
Thank you
EDITED This query works great, but you should take care of index on timestamp_field.
SELECT * FROM table WHERE
timestamp_field >= UNIX_TIMESTAMP('2010-04-14 00:00:00')
AND timestamp_field <= UNIX_TIMESTAMP('2010-04-14 23:59:59')
Use UNIX_TIMESTAMP on the constant instead of FROM_UNIXTIME on the column:
SELECT * FROM table
WHERE timestamp_field
BETWEEN UNIX_TIMESTAMP('2010-04-14 00:00:00')
AND UNIX_TIMESTAMP('2010-04-14 23:59:59')
This can be faster because it allows the database to use an index on the column timestamp_field, if one exists. It is not possible for the database to use the index when you use a non-sargable function like FROM_UNIXTIME on the column.
If you don't have an index on timestamp_field then add one.
Once you have done this you can also try to further improve performance by selecting the columns you need instead of using SELECT *.
If you're able to, it would be faster to either store the date as a proper datetime field, or, in the code running the query, to convert the date you're after to a unix timestamp before sending it to the query.
The FROM_UNIXTIME would have to convert every record in the table before it can check it which, as you can see, has performance issues. Using a native datatype that is closest to what you're actually using in your queries, or querying with the column's data type, is the fastest way.
So, if you need to continue using an int field for your time, then yes, using < and > on a strict integer would boost performance greatly, assuming you store things to the second, rather than the timestamp that would be for midinight of that day.
I would like to speed a MySQL query that basically retrieve a page of data following the pattern below
select
my_field_A,
my_field_B
where
time_id >= UNIX_TIMESTAMP('1901-01-01 00:00:00') AND
time_id < UNIX_TIMESTAMP('2009-01-16 00:00:00')
The field time_id is an MySQL index, yet, the query behaves as if the entire database was read at each query (retrieving a couple of lines being already quite slow). I not an expert in MySQL. Can someone guess what I am doing wrong?
Make sure you have an index (B-tree) on time_id, this should be efficient for range queries. Also make sure that time_id is in the appropriate time format.
If you really want to understand what mysql is doing you can add the keyword 'explain' infront of the query and run it in your mysql client. This will show some information about what mysql is doing and what kind of scans are performed.
http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
As there are probably lots of time_id's falling under these criteria, MySQL may think that the full table scan is better.
Try forcing the index:
SELECT
my_field_A,
my_field_B
FROM mytable FORCE INDEX (index_name_on_time_id)
WHERE
time_id >= UNIX_TIMESTAMP('1901-01-01 00:00:00') AND
time_id < UNIX_TIMESTAMP('2009-01-16 00:00:00')
Do you need the lower range? Are there any entries earlier than 1901?
How is the time_id column generated? If the time_id is always greater with each new entry being added into DB, you may want to consider finding ID with closest entry to 2009-01-16 and then select by ID
select my_field_A, my_field_B
FROM
mytable
WHERE
id <= ?
If that is not the case, try checking out partitioning in available from MySQL 5.1 and break down the table by years, that should increase speed dramatically.