I have the following query:
SELECT * FROM items
WHERE collection_id = 10
ORDER BY item_order ASC,id DESC
LIMIT 25
Right now I have two indexes, one on collection_id,id and another on collection_id,item_order.
item_order can be null if the user has not specified an order for the items, in which case I want them sorted by id.
Is my index setup optimal, or is there a way to have one three column index that handles both sorting by id and item_order? It seem redundant to index the "collection_id" column two times..
The optimal index for this query is (collection_id,id,item_order).
MySQL will only use one index per table per query, and it looks for matching indexes by order of columns in the query. The easiest way to determine what an index should look like for this query is by looking at the WHERE conditions followed by the ORDER BY conditions.
When in doubt, use EXPLAIN liberally and make sure it's not unnecessarily creating temporary tables or using filesort.
Using EXPLAIN before a select statement will tell you which of your indexes it is using. The official documentation is here:
MySQL 5: Using EXPLAIN
A good tutorial is here:
Optimizing MySQL Queries and Indexes
For the query above, the ideal index will be along the lines of (collection_id, item_order, id).
Indexing the same column multiple times is by no means a waste of time - so long as you don't end up with two identical indexes, or indexes which are never used.
Related
Let's say I have transactions table in a mysql database, I want to create a multi-column index on 3 columns reference, kind and status.
I have this request that I am trying to speed up :
Transaction.where(parent_ref: merchant_ref, kind: 'OFFER',status: 1) which performs the following SQL :
SELECT `merchant_transactions`.* FROM `merchant_transactions`
WHERE `merchant_transactions`.`parent_ref` = '1-0001'
AND `merchant_transactions`.`kind` = 'BATCH_BET'
AND `merchant_transactions`.`status` = 1
The parent_ref column can take a really wide variety of values so if I have 1M records in that table I will have 500K different references. Status can only take 6 different values and kind only 3.
What will be the best order for the columns in my index for optimal performance.
Does the spread of values in my columns have an impact ? intuitively I would say that I would need to start with the column with the lowest spread of values. In that example I would thus do index(kind, status, reference).
Are there any other factors related to the values in my tables to take into account when figuring out the order of columns for my index ?
Okay, now that you've shared the query, we can see that you reference all three columns in your WHERE clause, all three predicates are doing equality comparisons, and the expression in the WHERE clause uses only AND operations.
There are no more exotic parts of the query like JOIN, GROUP BY, ORDER BY, DISTINCT, etc. to complicate the optimization of this query.
Given these conditions, my experience is that the order of columns hardly matters. If there's any difference, it's barely perceptible.
I'd put the column that is unique first, based on some assumption that it's most selective and therefore narrows down the search most effectively. But I'm not sure it would make any noticeable difference either way.
In your example, each of 3 columns is tested with =, and they are ANDd together. So build a 3-column composite with those 3 columns. The order of the columns will not matter for this query. Contrary to what others may say, "cardinality" of the individual columns does not matter in a composite INDEX.
See my indexing cookbook
SELECT * FROM messages_messages WHERE (from_user_id=? AND to_user_id=?) OR (from_user_id=? AND to_user_id=?) ORDER BY created_at DESC
I have another query, which is this:
SELECT COUNT(*) FROM messages_messages WHERE from_user_id=? AND to_user_id=? AND read_at IS NULL
I want to index both of these queries, but I don't want to create 2 separate indexes.
Right now, I'm using 2 indexes:
[from_user_id, to_user_id, created_at]
[from_user_id, to_user_id, read_at]
I was wondering if I could do this with one index instead of 2?
These are the only 2 queries I have for this table.
The docs explain fairly completely how MySQL uses indices. In particular, its optimizer can use any left prefix of a multi-column index. Therefore, you could drop either of your two existing indices, and the other would be eligible for use in both queries, though it would be more selective / useful for one than for the other.
In principle, it could be more beneficial to keep your first index, provided that the created_at column was indexed in descending order. In practice, MySQL allows you to specify index column order, but in fact implements only ascending order. Therefore, having created_at in your index probably doesn't help very much.
No, you need both indexes for these two queries if you want to optimize fully.
Once you reach the column used for either sorting or range comparison (IS [NOT] NULL counts as a range predicate for this purpose), you don't get any benefit from putting more columns in the index. In other words, your index can have:
Some columns that are used in equality predicates
One column that is used either in a range predicate, or to avoid a filesort -- but not both.
Extra columns used in neither searching nor sorting, but only for the sake of a covering index.
So you cannot make a four-column index that serves both queries.
The only way you can reduce this to one index, as #JohnBollinger says, is to make an index that optimizes for one query, and uses a subset of the index for the second query. But that won't work as well.
I have the following query:
SELECT *
FROM shop_user_member_spots
WHERE delete_flag = 0
ORDER BY date_spotted desc
LIMIT 10
When run, it takes a few minutes. The table is around 2.5 million rows.
Here is the table (not designed by me but I am able to make some changes):
And finally, here are the indexes (again, not made by me!):
I've been attempting to get this query running fast for hours now, to no avail.
Here is the output of EXPLAIN:
Any help / articles are much appreciated.
Based on your query, it seems the index you would want would be on (delete_flag, date_spotted). You have an index that has the two columns, but the id column is in between them, which would make the index unhelpful in sorting based on date_spotted. Now whether mysql will use the index based on Zohaib's answer I can't say (sorry, I work most often with SQL Server).
The problem that I see in the explain plan is that the index on spotted date is not being used, insted filesort mechanism is being used to sort (as far as index on delete flag is concerned, we actually gain performance benefit of index if the column on which index is being created contains unique values)
the mysql documentation says
Index will not used for order by clause if
The key used to fetch the rows is not the same as the one used in the ORDER BY:
SELECT * FROM t1 WHERE key2=constant ORDER BY key1;
http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
I guess same is the case here. Although you can try using Force Index
I want to run a simple query to get the "n" oldest records in the table. (It has a creation_date column).
How can i get that without using "order-by". It is a very big table and using order by on entire table to get only "n" records is not so convincing.
(Assume n << size of table)
When you are concerned about performance, you should probably not discard the use of order by too early.
Queries like that can be implemende as Top-N query supported by an appropriate index, that's running very fast because it doesn't need to sort the entire table, not even the selecte rows, because the data is already sorted in the index.
example:
select *
from table
where A = ?
order by creation_date
limit 10;
without appropriate index it will be slow if you are having lot's of data. However, if you create an index like that:
create index test on table (A, creation_date );
The query will be able to start fetching the rows in the correct order, without sorting, and stop when the limit is reached.
Recipe: put the where columns in the index, followed by the order by columns.
If there is no where clause, just put the order by into the index. The order by must match the index definition, especially if there are mixed asc/desc orders.
The indexed Top-N query is the performance king--make sure to use them.
I few links for further reading (all mine):
How to use index efficienty in mysql query
http://blog.fatalmind.com/2010/07/30/analytic-top-n-queries/ (Oracle centric)
http://Use-The-Index-Luke.com/ (not yet covering Top-N queries, but that's to come in 2011).
I haven't tested this concept before but try and create an index on the creation_date column. Which will automatically sort the rows is ascending order. Then your select query can use the orderby creation_date desc with the Limit 20 to get the first 20 records. The database engine should realize the index has already done the work sorting and wont actually need to sort, because the index has already sorted it on save. All it needs to do is read the last 20 records from the index.
Worth a try.
Create an index on creation_date and query by using order by creation_date asc|desc limit n and the response will be very fast (in fact it cannot be faster). For the "latest n" scenario you need to use desc.
If you want more constraints on this query (e.g where state='LIVE') then the query may become very slow and you'll need to reconsider the indexing strategy.
You can use Group By if your grouping some data and then Having clause to select specific records.
A quick question. I have a simple table with this structure:
USERS_TABLE = {id}, name, number_field_1, number_field_2, number_field_3, number_field_4, number_field_5
Sorting is a major feature, one field at a time at the moment. Table can have up to 50 million records. Most selects are done using the "id" field, which is the primary key and it's indexed.
A sample query:
SELECT * FROM USERS_TABLE WHERE id=10 ORDER BY number_field_1 DESC LIMIT 100;
The question:
Do I create separate indexes for each "number_field_*" to optimize ORDER BY statements, or is there a better way to optimize? Thanks
There's no silver bullet here, you will have to test this for your data and your queries.
If all your queries (e.g. WHERE id=10) returns few rows, it might not be worth indexing the order by columns. On the other hand, if they return a lot of rows, it might help greatly.
If you always order by atleast one of the fields, consider creating an index on that column, or a compound index on some of these columns - keep in mind that if you have a compound index on (num_field_1,num_field_2) and you order only by num_field_2 , that index will not be used. You have to include the leftmost fields of the index to make use of it.
On the other hand, you seem to have a lot of different fields you order by, the drawback of creating an index on each an every one of them is your indexes will be much larger, and your inserts/deletes/updates might start to get slower.
Basically - there is no shortcut. You'll have to play around with which indexes works best for your queries and tune accordingly, while carefully analyzing your queries with EXPLAIN