I have MariaDB 10.1.14. For a long time I have been running the following query without problems (it took about 3 seconds):
SELECT
sum(transaction_total) as sum_total,
count(*) as count_all,
transaction_currency
FROM
transactions
WHERE
DATE(transactions.created_at) = DATE(CURRENT_DATE)
AND transaction_type = 1
AND transaction_status = 2
GROUP BY
transaction_currency
Suddenly, and I'm not sure exactly why, this query takes about 13 seconds.
This is the EXPLAIN:
And these are all the indexes on the transactions table:
What is the reason for the sudden increase in query time, and how can I decrease it?
If you keep adding data to the table, the query time will increase.
But you can do a few things to improve the performance.
Create a composite index on (transaction_type, transaction_status, created_at).
Remove the DATE() function (or any other function) from your columns, because wrapping a column in a function prevents the engine from using an index on it. CURRENT_DATE is a constant, so wrapping it doesn't matter, but DATE(CURRENT_DATE) is unnecessary because CURRENT_DATE already returns a DATE.
If created_at isn't a DATE, you can use
created_at >= CURRENT_DATE AND created_at < CURRENT_DATE + INTERVAL 1 DAY
or create a separate column that stores only the date part.
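Putting these suggestions together, the original query might be rewritten roughly as follows. This is only a sketch: it assumes created_at is a DATETIME or TIMESTAMP column and that a composite index on (transaction_type, transaction_status, created_at) has been added.

```sql
-- Sketch only: assumes created_at is DATETIME/TIMESTAMP and that a composite
-- index on (transaction_type, transaction_status, created_at) exists.
SELECT
    SUM(transaction_total) AS sum_total,
    COUNT(*)               AS count_all,
    transaction_currency
FROM transactions
WHERE transaction_type = 1
  AND transaction_status = 2
  -- Half-open range covering "today". No function is applied to the column,
  -- so the optimizer is free to use an index on created_at.
  AND created_at >= CURRENT_DATE
  AND created_at <  CURRENT_DATE + INTERVAL 1 DAY
GROUP BY transaction_currency;
```

Running EXPLAIN on this form should show whether the index is actually chosen.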
+1 to the answer from @JuanCarlosOropeza, but you can go a little further with the index.
ALTER TABLE transactions ADD INDEX (
transaction_type,
transaction_status,
created_at,
transaction_currency,
transaction_total
);
As @RickJames mentioned in the comments, the order of columns is important.
First, columns used in equality comparisons.
Next, one column that is used for a range comparison (which is anything besides equality), or for GROUP BY or ORDER BY. You have both a range comparison and a GROUP BY, but the index can help with only one of these.
Last, any other columns needed by the query, if you think you can get a covering index.
I describe index design in more detail in my presentation How to Design Indexes, Really (video: https://www.youtube.com/watch?v=ELR7-RdU9XU).
You're probably stuck with the "using temporary" since you have a range condition and also a GROUP BY referencing different columns. But you can at least eliminate the "using filesort" by this trick:
...
GROUP BY
transaction_currency
ORDER BY NULL
This assumes that the order in which the result rows are returned is not important to you.
I don't know what has made your query slower. More data? Fragmentation? New DB version?
However, I am surprised to see that there is no index really supporting the query. You should have a compound index starting with the column with highest cardinality (the date? well, you can try different column orders and see which index the DBMS picks for the query).
create index idx1 on transactions(created_at, transaction_type, transaction_status);
If created_at contains a time part, then you may want to create a computed column created_on containing only the date and index that instead.
You can even extend this index to a covering index (where clause fields followed by group by clause fields followed by select clause fields):
create index idx2 on transactions(created_at, transaction_type, transaction_status,
transaction_currency, transaction_total);
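The computed-column idea mentioned above could look roughly like this. It is a sketch that assumes MariaDB 10.1 syntax, where a computed column has to be PERSISTENT before it can be indexed; the names created_on and idx_created_on are illustrative only.

```sql
-- Assumption: MariaDB 10.1 syntax; created_on and idx_created_on are
-- illustrative names, not taken from the question.
ALTER TABLE transactions
    ADD COLUMN created_on DATE AS (DATE(created_at)) PERSISTENT;

CREATE INDEX idx_created_on
    ON transactions (created_on, transaction_type, transaction_status);

-- The WHERE clause can then compare the stored date directly:
--   WHERE created_on = CURRENT_DATE
--     AND transaction_type = 1
--     AND transaction_status = 2
```

With the date stored in its own column, the equality test on created_on replaces the DATE(...) expression, so all three conditions are plain equality comparisons.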
I have a report that needs to show the monthly and yearly profit from my transactions. The query I wrote works but is very slow, and I can't figure out how to change it so that it takes less time to load.
SELECT MONTH(MT4_TRADES.CLOSE_TIME) as MONTH
, YEAR(MT4_TRADES.CLOSE_TIME) as YEAR
, SUM(MT4_TRADES.SWAPS) as SWAPS
, SUM(MT4_TRADES.VOLUME)/100 as VOLUME
, SUM(MT4_TRADES.PROFIT) AS PROFIT
FROM MT4_TRADES
JOIN MT4_USERS
ON MT4_TRADES.LOGIN = MT4_USERS.LOGIN
WHERE MT4_TRADES.CMD < 2
AND MT4_TRADES.CLOSE_TIME <> "1970-01-01 00:00:00"
AND MT4_USERS.AGENT_ACCOUNT <> "1"
GROUP
BY YEAR(MT4_TRADES.CLOSE_TIME)
, MONTH(MT4_TRADES.CLOSE_TIME)
ORDER
BY YEAR
This is the full query. Any suggestions would be highly appreciated.
This is the result of explain:
Echoing the comment from @Barmar, look at the EXPLAIN output to see the query execution plan. Verify that suitable indexes are being used.
Likely the big rock in terms of performance is the "Using filesort" operation.
To get around that, we would need a suitable index available, and that would require some changes to the table. (The typical question on the "improve query performance" topic on SO comes with the restriction that we "can't add indexes or make any changes to the table".)
I'd be looking at a functional index (a feature added in MySQL 8.0.13). For MySQL 5.7, I'd be looking at adding generated columns and including those generated columns in a secondary index (generated columns were added in MySQL 5.7).
CREATE INDEX `MT4_TRADES_ix2` ON MT4_TRADES ((YEAR(close_time)),(MONTH(close_time)))
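For MySQL 5.7, the generated-column variant mentioned above might be sketched like this. The column names close_year and close_month and the index name are made up for illustration, and the sketch assumes an InnoDB table.

```sql
-- Assumption: MySQL 5.7+, InnoDB. close_year, close_month and the index
-- name are hypothetical, chosen only for this example.
ALTER TABLE MT4_TRADES
    ADD COLUMN close_year  SMALLINT UNSIGNED
        GENERATED ALWAYS AS (YEAR(CLOSE_TIME)) VIRTUAL,
    ADD COLUMN close_month TINYINT UNSIGNED
        GENERATED ALWAYS AS (MONTH(CLOSE_TIME)) VIRTUAL;

-- Secondary index with the generated columns as the leading columns.
CREATE INDEX MT4_TRADES_ix_ym ON MT4_TRADES (close_year, close_month);
```

The query would then use GROUP BY close_year, close_month instead of repeating the YEAR() and MONTH() expressions.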
I'd be tempted to go with a covering index, and also change the grouping to a single expression, e.g. DATE_FORMAT(close_time,'%Y-%m')
CREATE INDEX `MT4_TRADES_ix3` ON MT4_TRADES ((DATE_FORMAT(close_time,'%Y-%m'))
,swaps,volume,profit,login,cmd,close_time)
From the query, it looks like login is going to be UNIQUE in the MT4_USERS table. Likely that's the PRIMARY KEY or a UNIQUE KEY, so an index is going to be available, but we're just guessing...
With suitable indexes available, we could do something like this:
SELECT DATE_FORMAT(close_time,'%Y-%m') AS close_year_mo
, SUM(IF(t.cmd < 2 AND t.close_time <> '1970-01-01', t.swaps ,NULL)) AS swaps
, SUM(IF(t.cmd < 2 AND t.close_time <> '1970-01-01', t.volume ,NULL))/100 AS volume
, SUM(IF(t.cmd < 2 AND t.close_time <> '1970-01-01', t.profit ,NULL)) AS profit
FROM MT4_TRADES t
JOIN MT4_USERS u
ON u.login = t.login
AND u.agent_account <> '1'
GROUP BY close_year_mo
ORDER BY close_year_mo
and we'd expect MySQL to do a loose index scan, with the EXPLAIN output showing "Using index for group-by" and not showing "Using filesort".
EDIT
For versions of MySQL before 5.7, we could create new columns, e.g. year_close and month_close, and populate them with the results of the expressions YEAR(close_time) and MONTH(close_time). (We could create BEFORE INSERT and BEFORE UPDATE triggers to handle that automatically for us.)
Then we could create an index with those columns as the leading columns
CREATE INDEX ... ON MT4_TRADES ( year_close, month_close, ... )
And then reference the new columns in the query
SELECT t.year_close AS `YEAR`
, t.month_close AS `MONTH`
FROM MT4_TRADES t
JOIN ...
WHERE ...
GROUP
BY t.year_close
, t.month_close
Ideally, include in the index all of the referenced columns from MT4_TRADES, to make a covering index for the query.
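The triggers mentioned above could be sketched as follows. The trigger names are hypothetical, and the sketch assumes year_close and month_close have already been added as ordinary columns. Note that older MySQL versions allow only one trigger per table for each timing and event, so existing triggers would need to be merged with these.

```sql
-- Assumes year_close and month_close already exist as plain columns.
-- Trigger names are illustrative only.
CREATE TRIGGER MT4_TRADES_bi BEFORE INSERT ON MT4_TRADES
FOR EACH ROW
    SET NEW.year_close  = YEAR(NEW.CLOSE_TIME),
        NEW.month_close = MONTH(NEW.CLOSE_TIME);

CREATE TRIGGER MT4_TRADES_bu BEFORE UPDATE ON MT4_TRADES
FOR EACH ROW
    SET NEW.year_close  = YEAR(NEW.CLOSE_TIME),
        NEW.month_close = MONTH(NEW.CLOSE_TIME);

-- One-time backfill for rows that existed before the triggers were created.
UPDATE MT4_TRADES
   SET year_close  = YEAR(CLOSE_TIME),
       month_close = MONTH(CLOSE_TIME);
```

Each trigger body is a single SET statement, so no DELIMITER change is needed when running this from the mysql client.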
Long time lurker, first time questioner. ;-)
Using PHP 5.6 and MySQL Ver 14.14 Distrib 5.6.41, for Win64 (x86_64). Yeah, I know, a little behind the times, and we're working on updating. But that's where we are now. ;-)
Updates for questions asked:
The index is on the CreateDate. I thought there might be an issue with that column being a DateTime so I created another column that was just a date, set an index on that and retried, but it didn't have any effect.
ulc has 8965 rows total. With the index, it searches 3787.
et has 9530 rows. In the query that doesn't use the index, it searches just one row, as it's searching on the primary key from the first query.
The formatting of the comparison date doesn't seem to matter. I've tried all sorts of formats, including just straight "2018-01-01 {00:00:00}". No change.
I've got what I consider a weird one, but I suspect for someone here it's going to be a "duh!" one. I've got a query that includes a date range for the primary table and then goes to get other bits of data from other tables based on a set of unique ids from the first table. Don't worry, I'll have examples below. When I do the search on just the primary table, the range index works as expected and only searches the relevant rows. However, when I add in the next table with the ON clause, it ignores the index and searches all of the rows of the primary table. If I leave off the on clause, it goes back to using the index correctly. I tried using the FORCE INDEX (USE is ignored) and while that makes it use the index, it slows the query way down. Anyway, here are the queries:
Works:
select CreateDate
from ulc
Inner Join et
WHERE ulc.CreateDate >= STR_TO_DATE("01/01/2018", "%m/%d/%Y")
AND ulc.CreateDate <= STR_TO_DATE("08/02/2018", "%m/%d/%Y")
id | select_type | table | type  | possible_keys    | key                        | key_len | ref  | rows | Extra
1  | SIMPLE      | ulc   | range | index_CreateDate | index_CreateDate           | 5       | NULL | 3787 | Using where; Using index
1  | SIMPLE      | et    | index | NULL             | index_BankProcessorProfile | 5       | NULL | 9530 | Using index; Using join buffer (Block Nested Loop)
Doesn't work:
select CreateDate
from ulc
Inner Join et on et.TranID = ulc.TranID
WHERE ulc.CreateDate >= STR_TO_DATE("01/01/2018", "%m/%d/%Y")
AND ulc.CreateDate <= STR_TO_DATE("08/02/2018", "%m/%d/%Y")
id | select_type | table | type   | possible_keys           | key     | key_len | ref                | rows | Extra
1  | SIMPLE      | ulc   | ALL    | TranID,index_CreateDate | NULL    | NULL    | NULL               | 8965 | Using where
1  | SIMPLE      | et    | eq_ref | PRIMARY                 | PRIMARY | 8       | showpro.ulc.TranID | 1    | Using index
For the second one I just added the on et.TranID = ulc.TranID
Additionally, if I change it from a range to a specific date, the index works as well.
Just guessing here without more data, but adding a new table to the JOIN changes the data distribution.
So if in the first case the WHERE condition probably returns a relatively small percentage of the data, in your second case the optimizer decides you'll get faster results without using the index, since the same conditions might not be as selective for the new set of data.
Add the table definitions and row counts, both total and as filtered by your queries, for a better answer.
If you are comparing against a DATETIME in your query, it is suggested to use the format "YYYY-MM-DD HH:MM:SS" in the WHERE clause.
If you are comparing against a DATE, it is suggested to use the format "YYYY-MM-DD" in the WHERE clause. You have used STR_TO_DATE("01/01/2018", "%m/%d/%Y"), which evaluates to '2018-01-01', so that seems fine.
Try examining how the query is executed using EXPLAIN:
explain select CreateDate
from ulc
Inner Join et on et.TranID = ulc.TranID
WHERE ulc.CreateDate >= STR_TO_DATE("01/01/2018", "%m/%d/%Y")
AND ulc.CreateDate <= STR_TO_DATE("08/02/2018", "%m/%d/%Y")
You can also check whether et.TranID and ulc.TranID have proper indexes.
(I'm going to have to guess at some things, since you have not provided SHOW CREATE TABLE. As a 'long time lurker', you should have realized this.)
First guess is that TranID is not the PRIMARY KEY of ulc?
The solution is to add a "composite" INDEX(CreateDate, TranID) to ulc. (Actually, you should replace the existing INDEX(CreateDate). Second guess is that you have that index now.)
Now I will try to explain why the first query was happy with INDEX(CreateDate) but the second was not.
In the first query, INDEX(CreateDate) is a "covering" index. That is, this index contains all the columns of ulc that are needed by the SELECT. So, it is almost guaranteed that using the index would be better than scanning the table. It will be a "range index scan" of that index.
The second query needs both CreateDate and TranID, so your index won't be "covering". There are two ways to perform the first part of the query. But first, note that (in InnoDB) a secondary index has all the columns of the PRIMARY KEY (third guess: it is (id)).
One way is a range scan of the index. But, in order to get TranID, it first gets id, then does a lookup in the PRIMARY KEY/data to get TranID. This process is more costly than simply staying in the index, so the Optimizer does not want to do it unless the estimated number of rows is 'small'.
The other way is a full table scan. Since 3787/8965 is not "small", the Optimizer decides that it is probably faster to scan ALL 8965 rows, filtering out the ones not needed.
My proposed index is 'covering', thereby avoiding the bouncing back and forth between index and data. So, a range index scan is efficient.
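The proposed change might be written as below. This is a sketch under the guesses already stated: index_CreateDate is the name shown in the EXPLAIN output, while index_CreateDate_TranID is a hypothetical name for the replacement.

```sql
-- Replace the single-column index with a composite, covering one.
-- index_CreateDate_TranID is an illustrative name.
ALTER TABLE ulc
    DROP INDEX index_CreateDate,
    ADD INDEX index_CreateDate_TranID (CreateDate, TranID);

-- Re-check the plan: ulc would ideally show type = range and
-- "Using index" in Extra instead of type = ALL.
EXPLAIN
SELECT ulc.CreateDate
FROM ulc
INNER JOIN et ON et.TranID = ulc.TranID
WHERE ulc.CreateDate >= '2018-01-01'
  AND ulc.CreateDate <= '2018-08-02';
```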
Your observation that switching to a single date made use of the index -- Well, 1 row out of 8965 is 'small', so the index (and the bouncing) is deemed to be the faster way.
As for formatting of the date -- True, it does not matter. This is because the parser notices that STR_TO_DATE("01/01/2018", "%m/%d/%Y") is a constant that can be evaluated once, and does so.
My cookbook should take you directly to the composite index without having to scratch your head over this Question.
Your first query is a "cross join" since it has no ON clause to relate the tables together, and it will return about 36 million rows (9530*3787). The second query will have about 3787 rows, maybe fewer (if some of the joins fail to find a match).
"how little changes between the two queries" -- Never think that! The Optimizer will latch on to seemingly insignificant differences. SELECT CreateDate versus SELECT * -- a huge difference. Most of what I said about the 'first query' would be thrown out. Even changing to SELECT ChangeDate, x would be enough to make a big wrinkle. If the datatypes of TranID in the two tables differed enough, the indexes become useless. Etc, etc.
I have InnoDB table Items with multi-column non-unique index (group_id, type_id, expiry_date).
When I run the query
SELECT * FROM Items WHERE group_id = 1 AND type_id IN (1,2,3) AND expiry_date BETWEEN '2017-01-01' AND '2018-01-01'
will the index work well, given that I'm using IN on the second column of the index and additionally a range on the third? Or would I benefit from splitting it into the following?
SELECT * FROM Items WHERE group_id = 1 AND type_id = 1 AND expiry_date BETWEEN '2017-01-01' AND '2018-01-01'
UNION
SELECT * FROM Items WHERE group_id = 1 AND type_id = 2 AND expiry_date BETWEEN '2017-01-01' AND '2018-01-01'
UNION
SELECT * FROM Items WHERE group_id = 1 AND type_id = 3 AND expiry_date BETWEEN '2017-01-01' AND '2018-01-01'
EXPLAIN shows identical query plans for both queries, but I have quite a small table for testing and I'm not sure the query optimiser will act the same way on a large amount of data.
And in general, how does an index work when IN/OR/BETWEEN is used on 2 consecutive columns of the index?
For your second query, use union all rather than union. You always want union all, unless you want to incur overhead for removing duplicates.
I would guess that you would benefit from the second query on larger data. I don't think MySQL supports skip-scans on indexes, so the index is only being used for group_id and type_id, but not directly for the date.
What version of MySQL/MariaDB? There have been optimizations recently; I don't know if they would help here.
You have a possible bug: AND expiry_date BETWEEN '2017-01-01' AND '2018-01-01' includes an extra day. Change to
AND expiry_date >= '2017-01-01'
AND expiry_date < '2017-01-01' + INTERVAL 1 YEAR
(This counts as a single 'range' test. BETWEEN is also a range test, but it is 'inclusive', hence the 'bug'.)
I would simply have two composite indexes (if I could not find the real answer to your Question):
(group_id, type_id, expiry_date)
(group_id, expiry_date)
Case 1: The Optimizer can get past IN: then the first index works.
Case 2: The optimizer cannot get past IN: Then one of these happens:
The IN list has only one item. Then it is converted from IN to =, and the first index is optimal, with all 3 columns used.
The optimizer decides the first index is better -- small IN list, large date range.
The optimizer decides that the date range is better (smaller range) and picks the second index.
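To try the two-index approach, something like the following could be used. It is only a sketch: idx_group_expiry is a hypothetical name, and the first index (group_id, type_id, expiry_date) is assumed to exist already, as described in the question.

```sql
-- Add the second composite index next to the existing
-- (group_id, type_id, expiry_date) one. The index name is illustrative.
ALTER TABLE Items
    ADD INDEX idx_group_expiry (group_id, expiry_date);

-- Let the optimizer choose, then inspect the key and key_len columns
-- to see which index was picked and how many of its columns were used.
EXPLAIN
SELECT *
FROM Items
WHERE group_id = 1
  AND type_id IN (1, 2, 3)
  AND expiry_date >= '2017-01-01'
  AND expiry_date <  '2017-01-01' + INTERVAL 1 YEAR;
```

The result is only meaningful on a table of realistic size, since the optimizer's choice depends on its row estimates.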
The UNION approach may or may not be better in this situation. There is a lot of overhead in gathering the data into a temp table. The temp table was recently eliminated, but only for certain cases of UNION ALL.
Yes, use UNION ALL. That eliminates a sort and possibly an extra temp table.
Test with a large dataset. For under 1K rows, the performance is not likely to matter.
Rule of Thumb in ordering columns in an index:
= test(s)
IN, if any
one 'range' (BETWEEN, <, etc), if any
Consider making a "covering" index.
My Cookbook
There are other optimizations that depend on what is in the * in SELECT *.
Suppose you have a table with the following columns:
id
date
col1
I would like to be able to query this table with a specific id and date, and also order by another column. For example,
SELECT * FROM TABLE WHERE id = ? AND date > ? ORDER BY col1 DESC
According to this range documentation, an index will stop being used after it hits the > operator. But according to this order by documentation, an index can only be used to optimize the order by clause if it is ordering by the last column in the index. Is it possible to get an indexed lookup on every part of this query, or can you only get 2 of the 3? Can I do any better than index (id, date)?
Plan A: INDEX(id, date) -- works best when it filters out a lot of rows, making the subsequent "filesort" not very costly.
Plan B: INDEX(col1), which may work best if very few rows are filtered by the WHERE clause. This avoids the filesort, but is not necessarily faster than the other choices here.
Plan C: INDEX(id, date, col1) -- This is a "covering" index if the query does not reference any other fields. The potential advantage here is to look only at the index, and not have to touch the data. If it applies, Plan C is better than Plan A.
You have not provided enough information to say which of these INDEXes will work best. I suggest you add C and B if "covering" applies; otherwise add A and B. Then see which index the Optimizer picks. (There is still a chance that the Optimizer will not pick 'right'.)
(These three indexes are what my Index blog recommends.)
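As a sketch of how to try this when "covering" applies, the following adds Plans B and C and then asks the optimizer which one it prefers. The table name tbl, the index names, and the literal values are placeholders, not taken from the question.

```sql
-- tbl, idx_plan_b, idx_plan_c and the literal values are placeholders.
ALTER TABLE tbl
    ADD INDEX idx_plan_b (col1),
    ADD INDEX idx_plan_c (id, `date`, col1);

-- The key column shows the index chosen; Extra shows whether
-- "Using filesort" and/or "Using index" (covering) apply.
EXPLAIN
SELECT *
FROM tbl
WHERE id = 42
  AND `date` > '2020-01-01'
ORDER BY col1 DESC;
```

If the table has columns beyond id, date and col1, SELECT * is no longer covered by Plan C, and Plan A would take its place in the test.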
I have a SELECT statement which I would like to optimize. The MySQL ORDER BY optimization documentation says that in some cases an index cannot be used to optimize the ORDER BY. Specifically, the point:
You use ORDER BY on nonconsecutive parts of a key
SELECT * FROM t1 WHERE key2=constant ORDER BY key_part2;
makes me think that this could be the case. I'm using the following indexes:
UNIQUE KEY `met_value_index1` (`RTU_NB`,`DATETIME`,`MP_NB`),
KEY `met_value_index` (`DATETIME`,`RTU_NB`)
With the following SQL statement:
SELECT * FROM met_value
WHERE rtu_nb=constant
AND mp_nb=constant
AND datetime BETWEEN constant AND constant
ORDER BY mp_nb, datetime
Would it be enough to delete the index met_value_index1 and re-create it with the new column order RTU_NB, MP_NB, DATETIME?
Do I have to include RTU_NB in the ORDER BY clause?
Outcome: I tried what @meriton suggested and added the index met_value_index2. The SELECT now completes in 1.2 seconds; previously it took 5.06 seconds. As a side note that doesn't belong to the question: after some other attempts I switched the engine from MyISAM to InnoDB, with rtu_nb, mp_nb, datetime as the primary key, and the statement completed in 0.13 seconds!
I don't get your query. If a row must match mp_nb = constant to be returned, all rows returned will have the same mp_nb, so including mp_nb in the order by clause has no effect. I recommend you use the semantically equivalent statement:
SELECT * FROM met_value
WHERE rtu_nb=constant
AND mp_nb=constant
AND datetime BETWEEN constant AND constant
ORDER BY datetime
to avoid needlessly confusing the query optimizer.
Now, to your question: A database can implement an order by clause without sorting if it knows that the underlying access will return the rows in proper order. In the case of indexes, that means that an index can assist with sorting if the rows matched by the where clause appear in the index in the order requested by the order by clause.
That is the case here, so the database could actually do an index range scan over met_value_index1 for the rows where rtu_nb=constant AND datetime BETWEEN constant AND constant, and then check whether mp_nb=constant for each of these rows, but that would amount to checking far more rows than necessary if mp_nb=constant has high selectivity. Put differently, an index is most useful if the matching rows are contiguous in the index, because that means the index range scan will only touch rows that actually need to be returned.
The following index will therefore be more helpful for this query:
UNIQUE KEY `met_value_index2` (`RTU_NB`,`MP_NB`, `DATETIME`),
as all matching rows will be right next to each other in the index, and the rows appear in the index in the order the order by clause requests. I cannot say whether the query optimizer is smart enough to take advantage of that, so you should check the execution plan.
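Checking the execution plan for this index could look like the sketch below. The literal values stand in for the constants in the question and are purely illustrative.

```sql
-- Add the reordered index (this is the definition given above).
ALTER TABLE met_value
    ADD UNIQUE KEY met_value_index2 (RTU_NB, MP_NB, `DATETIME`);

-- Literal values are placeholders for the question's constants.
-- In the output, key should name met_value_index2 and Extra should
-- not contain "Using filesort" if the index also serves the ORDER BY.
EXPLAIN
SELECT *
FROM met_value
WHERE rtu_nb = 1
  AND mp_nb = 2
  AND `datetime` BETWEEN '2011-01-01 00:00:00' AND '2011-01-31 23:59:59'
ORDER BY `datetime`;
```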
I do not think it will use any index for the ORDER BY. But you should look at the execution plan. Or here.
The order of the fields as they appear in the WHERE clause must match the order in the index. So with your current query you need one index with the fields in order of rtu_nb, mp_nb, datetime.