I have a SELECT statement which I would like to optimize. The mysql - order by optimization says that in some cases the index cannot be used to optimize the ORDER BY. Specifically the point:
You use ORDER BY on nonconsecutive parts of a key
SELECT * FROM t1 WHERE key2=constant ORDER BY key_part2;
makes me thinking, that this could be the case. I'm using following indexes:
UNIQUE KEY `met_value_index1` (`RTU_NB`,`DATETIME`,`MP_NB`),
KEY `met_value_index` (`DATETIME`,`RTU_NB`)
With following SQL-statement:
SELECT * FROM met_value
WHERE rtu_nb=constant
AND mp_nb=constant
AND datetime BETWEEN constant AND constant
ORDER BY mp_nb, datetime
Would it be enough delete the index met_value_index1 and create it with the new ordering RTU_NB, MP_NB, DATETIME?
Do I have to include RTU_NB into the ORDER BY clause?
Outcome: I have tried what #meriton suggested and added the index met_value_index2. The SELECT completed after 1.2 seconds, previously it completed after 5.06 seconds. The following doesn't belong to the question but as a side note: After some other tries I switched the engine from MyISAM to InnoDB – with rtu_nb, mp_nb, datetime as primary key – and the statement completed after 0.13 seconds!
I don't get your query. If a row must match mp_np = constant to be returned, all rows returned will have the same mp_nb, so including mp_nb in the order by clause has no effect. I recommend you use the semantically equivalent statement:
SELECT * FROM met_value
WHERE rtu_nb=constant
AND mp_nb=constant
AND datetime BETWEEN constant AND constant
ORDER BY datetime
to avoid needlessly confusing the query optimizer.
Now, to your question: A database can implement an order by clause without sorting if it knows that the underlying access will return the rows in proper order. In the case of indexes, that means that an index can assist with sorting if the rows matched by the where clause appear in the index in the order requested by the order by clause.
That is the case here, so the database could actually do an index range scan over met_value_index1 for the rows where rtu_nb=constant AND datetime BETWEEN constant AND constant, and then check whether mp_nb=constant for each of these rows, but that would amount to checking far more rows than necessary if mp_nb=constant has high selectivity. Put differently, an index is most useful if the matching rows are contiguous in the index, because that means the index range scan will only touch rows that actually need to be returned.
The following index will therefore be more helpful for this query:
UNIQUE KEY `met_value_index2` (`RTU_NB`,`MP_NB`, `DATETIME`),
as all matching rows will be right next to each other in the index and the rows appear in the index in the order the order by clause requests. I can not say whether the query optimizer is smart enough to get that, so you should check the execution plan.
I do not think it will use any index for the ORDER BY. But you should look at the execution plan. Or here.
The order of the fields as they appear in the WHERE clause must match the order in the index. So with your current query you need one index with the fields in order of rtu_nb, mp_nb, datetime.
Related
I have problem with MySQL ORDER BY, it slows down query and I really don't know why, my query was a little more complex so I simplified it to a light query with no joins, but it stills works really slow.
Query:
SELECT
W.`oid`
FROM
`z_web_dok` AS W
WHERE
W.`sent_eRacun` = 1 AND W.`status` IN(8, 9) AND W.`Drzava` = 'BiH'
ORDER BY W.`oid` ASC
LIMIT 0, 10
The table has 946,566 rows, with memory taking 500 MB, those fields I selecting are all indexed as follow:
oid - INT PRIMARY KEY AUTOINCREMENT
status - INT INDEXED
sent_eRacun - TINYINT INDEXED
Drzava - VARCHAR(3) INDEXED
I am posting screenshoots of explain query first:
The next is the query executed to database:
And this is speed after I remove ORDER BY.
I have also tried sorting with DATETIME field which is also indexed, but I get same slow query as with ordering with primary key, this started from today, usually it was fast and light always.
What can cause something like this?
The kind of query you use here calls for a composite covering index. This one should handle your query very well.
CREATE INDEX someName ON z_web_dok (Drzava, sent_eRacun, status, oid);
Why does this work? You're looking for equality matches on the first three columns, and sorting on the fourth column. The query planner will use this index to satisfy the entire query. It can random-access the index to find the first row matching your query, then scan through the index in order to get the rows it needs.
Pro tip: Indexes on single columns are generally harmful to performance unless they happen to match the requirements of particular queries in your application, or are used for primary or foreign keys. You generally choose your indexes to match your most active, or your slowest, queries. Edit You asked whether it's better to create specific indexes for each query in your application. The answer is yes.
There may be an even faster way. (Or it may not be any faster.)
The IN(8, 9) gets in the way of easily handling the WHERE..ORDER BY..LIMIT completely efficiently. The possible solution is to treat that as OR, then convert to UNION and do some tricks with the LIMIT, especially if you might also be using OFFSET.
( SELECT ... WHERE .. = 8 AND ... ORDER BY oid LIMIT 10 )
UNION ALL
( SELECT ... WHERE .. = 9 AND ... ORDER BY oid LIMIT 10 )
ORDER BY oid LIMIT 10
This will allow the covering index described by OJones to be fully used in each of the subqueries. Furthermore, each will provide up to 10 rows without any temp table or filesort. Then the outer part will sort up to 20 rows and deliver the 'correct' 10.
For OFFSET, see http://mysql.rjweb.org/doc.php/index_cookbook_mysql#or
I'm trying to optimize a query that looks something like
SELECT DISTINCT(some_attribute)
FROM some_table
WHERE soft_deleted=0
I already have indices on some_attribute and soft_deleted individually.
The table from which I am pulling from is relatively large(>100GB), so this query can take tens of minutes. Would a multi-column index on some_attribute and soft_deleted make a significant impact or are there some other optimizations that I can make?
We are going to assume this table is using InnoDB storage engine, and assume that soft_deleted column is integer-ish datatype, and that some_attribute column is a smallish datatype column.
For the exact query text shown in the question, optimal execution plan will likely make use of an index with soft_deleted and some_attribute as the leading columns in that order, i.e.
... ON some_table (soft_deleted, some_attribute, ...)
The index will also contain the columns from the cluster index (even if they aren't listed), so we could also include the names of those columns in the index following the two leading columns. MySQL will also be able to make use of an index that includes additional columns, again, following the two leading columns.
Use EXPLAIN to see the execution plan.
I expect the optimal execution plan will include "Using index for GROUP BY" in the Extra column, and avoid a "Using filesort" operation.
With the index suggested above, compare the execution plan for this query:
SELECT t.some_attribute
FROM some_table t
WHERE t.soft_deleted = 0
GROUP
BY t.soft_deleted
, t.some_attribute
ORDER
BY NULL
If we already have an index defined with some_attribute as the leading column, and also including the soft_deleted column, e.g.
... ON some_table (some_attribute, soft_deleted, ... )
(an index on just the some_attribute column would be redundant, and could be dropped)
we might re-write the SQL and check the EXPLAIN output for a query like this:
SELECT t.some_attribute
FROM some_table t
GROUP
BY t.some_attribute
, IF(t.soft_deleted = 0,1,0)
HAVING t.soft_deleted = 0
ORDER
BY NULL
If we have a guarantee that soft_deleted only has two distinct values, then we could simplify to just
SELECT t.some_attribute
FROM some_table t
GROUP
BY t.some_attribute
, t.soft_deleted
HAVING t.soft_deleted = 0
ORDER
BY NULL
Optimal performance of a query against this table, to return the specified resultset, is likely going to be found in an execution plan that avoids a "Using filesort" operation and using an index to satisfy the DISTINCT/GROUP BY operation.
Note that DISTINCT is a keyword not a function. The parens around some_attribute have no effect, and can be omitted. (Including the spurious parens almost makes it look like we think DISTINCT is a function.)
I have MariaDB 10.1.14, For a long time I'm doing the following query without problems (it tooks about 3 seconds):
SELECT
sum(transaction_total) as sum_total,
count(*) as count_all,
transaction_currency
FROM
transactions
WHERE
DATE(transactions.created_at) = DATE(CURRENT_DATE)
AND transaction_type = 1
AND transaction_status = 2
GROUP BY
transaction_currency
Suddenly, I'm not sure exactly why, this query take about 13 seconds.
This is the EXPLAIN:
And those are the all indexes of transactions table:
What is the reason for the sudden query time increase? and how can I decrease it?
If you are adding more data to your table the query time will increase.
But you can do a few things to improve the performance.
Create a composite index for ( transaction_type, transaction_status, created_at)
Remove the DATE() functions (or any function) from your fields, because that doesn't allow engine use the index. CURRENT_DATE is a constant so there doesn't matter, but isn't necessary because already return DATE
if created_at isnt date you can use
created_at >= CURRENT_DATE and created_at < CURRENT_DATE + 1
or create a different field to only save the date part.
+1 to answer from #JuanCarlosOropeza, but you can go a little further with the index.
ALTER TABLE transactions ADD INDEX (
transaction_type,
transaction_status,
created_at,
transaction_currency,
transaction_total
);
As #RickJames mentioned in comments, the order of columns is important.
First, columns in equality comparisons
Next, you can index one column that is used for a range comparison (which is anything besides equality), or GROUP BY or ORDER BY. You have both range comparison and GROUP BY, but you can only get the index to help with one of these.
Last, other columns needed for the query, if you think you can get a covering index.
I describe more detail about index design in my presentation How to Design Indexes, Really (video: https://www.youtube.com/watch?v=ELR7-RdU9XU).
You're probably stuck with the "using temporary" since you have a range condition and also a GROUP BY referencing different columns. But you can at least eliminate the "using filesort" by this trick:
...
GROUP BY
transaction_currency
ORDER BY NULL
Supposing that it's not important to you which order the rows of the query results return in.
I don't know what has made your query slower. More data? Fragmentation? New DB version?
However, I am surprised to see that there is no index really supporting the query. You should have a compound index starting with the column with highest cardinality (the date? well, you can try different column orders and see which index the DBMS picks for the query).
create index idx1 on transactions(created_at, transaction_type, transaction_status);
If created_at contains a date part, then you may want to create a computed column created_on only containing the date and index that instead.
You can even extend this index to a covering index (where clause fields followed by group by clause fields followed by select clause fields):
create index idx2 on transactions(created_at, transaction_type, transaction_status,
transaction_currency, transaction_total);
1 - PRIMARY used in a secondary index, e.g. secondary index on (PRIMARY,column1)
2 - I'm aware mysql cannot continue using the rest of an index as soon as one part was used for a range scan, however: IN (...,...,...) is not considered a range, is it? Yes, it is a range, but I've read on mysqlperformanceblog.com that IN behaves differently than BETWEEN according to the use of index.
Could anyone confirm those two points? Or tell me why this is not possible? Or how it could be possible?
UPDATE:
Links:
http://www.mysqlperformanceblog.com/2006/08/10/using-union-to-implement-loose-index-scan-to-mysql/
http://www.mysqlperformanceblog.com/2006/08/14/mysql-followup-on-union-for-query-optimization-query-profiling/comment-page-1/#comment-952521
UPDATE 2: example of nested SELECT:
SELECT * FROM user_d1 uo
WHERE EXISTS (
SELECT 1 FROM `user_d1` ui
WHERE ui.birthdate BETWEEN '1990-05-04' AND '1991-05-04'
AND ui.id=uo.id
)
ORDER BY uo.timestamp_lastonline DESC
LIMIT 20
So, the outer SELECT uses timestamp_lastonline for sorting, the inner either PK to connect with the outer or birthdate for filtering.
What other options rather than this query are there if MySQL cannot use index on a range scan and for sorting?
The column(s) of the primary key can certainly be used in a secondary index, but it's not often worthwhile. The primary key guarantees uniqueness, so any columns listed after it cannot be used for range lookups. The only time it will help is when a query can use the index alone
As for your nested select, the extra complication should not beat the simplest query:
SELECT * FROM user_d1 uo
WHERE uo.birthdate BETWEEN '1990-05-04' AND '1991-05-04'
ORDER BY uo.timestamp_lastonline DESC
LIMIT 20
MySQL will choose between a birthdate index or a timestamp_lastonline index based on which it feels will have the best chance of scanning fewer rows. In either case, the column should be the first one in the index. The birthdate index will also carry a sorting penalty, but might be worthwhile if a large number of recent users will have birth dates outside of that range.
If you wish to control the order, or potentially improve performance, a (timestamp_lastonline, birthdate) or (birthdate, timestamp_lastonline) index might help. If it doesn't, and you really need to select based on the birthdate first, then you should select from the inner query instead of filtering on it:
SELECT * FROM (
SELECT * FROM user_d1 ui
WHERE ui.birthdate BETWEEN '1990-05-04' AND '1991-05-04'
) as uo
ORDER BY uo.timestamp_lastonline DESC
LIMIT 20
Even then, MySQL's optimizer might choose to rewrite your query if it finds a timestamp_lastonline index but no birthdate index.
And yes, IN (..., ..., ...) behaves differently than BETWEEN. Only the latter can effectively use a range scan over an index; the former would look up each item individually.
2.IN will obviously differ from BETWEEN. If you have an index on that column, BETWEEN will need to get the starting point and it's all done. If you have IN, it will look for a matching value in the index value by value thus it will look for the values as many times as there are values compared to BETWEEN's one time look.
yes #Andrius_Naruševičius is right the IN statement is merely shorthand for EQUALS OR EQUALS OR EQUALS has no inherent order whatsoever where as BETWEEN is a comparison operator with an implicit greater than or less than and therefore absolutely loves indexes
I honestly have no idea what you are talking about, but it does seem you are asking a good question I just have no notion what it is :-). Are you saying that a primary key cannot contain a second index? because it absolutely can. The primary key never needs to be indexed because it is ALWAYS indexed automatically, so if you are getting an error/warn (I assume you are?) about supplementary indices then it's not the second, third index causing it it's the PRIMARY KEY not needing it, and you mentioning that probably is the error. Having said that I have no idea what question you asked - it's my answer to my best guess as to your actual question.
I want to run a simple query to get the "n" oldest records in the table. (It has a creation_date column).
How can i get that without using "order-by". It is a very big table and using order by on entire table to get only "n" records is not so convincing.
(Assume n << size of table)
When you are concerned about performance, you should probably not discard the use of order by too early.
Queries like that can be implemende as Top-N query supported by an appropriate index, that's running very fast because it doesn't need to sort the entire table, not even the selecte rows, because the data is already sorted in the index.
example:
select *
from table
where A = ?
order by creation_date
limit 10;
without appropriate index it will be slow if you are having lot's of data. However, if you create an index like that:
create index test on table (A, creation_date );
The query will be able to start fetching the rows in the correct order, without sorting, and stop when the limit is reached.
Recipe: put the where columns in the index, followed by the order by columns.
If there is no where clause, just put the order by into the index. The order by must match the index definition, especially if there are mixed asc/desc orders.
The indexed Top-N query is the performance king--make sure to use them.
I few links for further reading (all mine):
How to use index efficienty in mysql query
http://blog.fatalmind.com/2010/07/30/analytic-top-n-queries/ (Oracle centric)
http://Use-The-Index-Luke.com/ (not yet covering Top-N queries, but that's to come in 2011).
I haven't tested this concept before but try and create an index on the creation_date column. Which will automatically sort the rows is ascending order. Then your select query can use the orderby creation_date desc with the Limit 20 to get the first 20 records. The database engine should realize the index has already done the work sorting and wont actually need to sort, because the index has already sorted it on save. All it needs to do is read the last 20 records from the index.
Worth a try.
Create an index on creation_date and query by using order by creation_date asc|desc limit n and the response will be very fast (in fact it cannot be faster). For the "latest n" scenario you need to use desc.
If you want more constraints on this query (e.g where state='LIVE') then the query may become very slow and you'll need to reconsider the indexing strategy.
You can use Group By if your grouping some data and then Having clause to select specific records.