I've got a query that I use often:
Site.where("mobile_visible = true AND (created_at > :date OR updated_at > :date)", :date => "12-04-30")
It produces this SQL:
SELECT `sites`.* FROM `sites` WHERE (mobile_visible = true AND (created_at > '12-04-30' OR updated_at > '12-04-30'))
I want to add one or more indexes to make this query more efficient. Should I add three separate single-column indexes, or one index that covers all three columns?
The best approach is to construct an index that covers all elements of your WHERE clause, not just one.
Databases generally can't use more than one index at a time for a particular portion of a query. If you add three indexes, the database will try to determine which one gives the greatest benefit and it will pick that one. It may or may not choose the best one depending on how the query execution plan is determined.
For this situation, I'd recommend adding the index:
add_index :sites, [:mobile_visible, :created_at, :updated_at]
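Here is a minimal sketch of this advice. It uses Python's built-in sqlite3 module as a stand-in for MySQL. SQLite's planner is not MySQL's, so treat the plan output as illustrative only. The index name idx_sites_visible_dates is hypothetical, and mobile_visible is stored as an integer because SQLite has no real BOOLEAN type. EXPLAIN QUERY PLAN should name the composite index. Typically only the leading mobile_visible equality is used to seek, and the OR'ed date conditions are checked against the entries it finds.

```python
import sqlite3

# SQLite stands in for MySQL here; plans are illustrative, not MySQL's.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sites (
    id INTEGER PRIMARY KEY,
    mobile_visible INTEGER,   -- no real BOOLEAN in SQLite; 1 means true
    created_at TEXT,
    updated_at TEXT)""")
# The composite index recommended above (hypothetical name).
conn.execute("CREATE INDEX idx_sites_visible_dates "
             "ON sites (mobile_visible, created_at, updated_at)")

query = ("SELECT * FROM sites WHERE mobile_visible = 1 "
         "AND (created_at > :date OR updated_at > :date)")
# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
details = [row[-1] for row in
           conn.execute("EXPLAIN QUERY PLAN " + query, {"date": "2012-04-30"})]
for d in details:
    print(d)
```

Running EXPLAIN against the real MySQL table is still the real test, because the two optimizers can make different choices.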
Suppose you have a table with the following columns:
id
date
col1
I would like to be able to query this table with a specific id and date, and also order by another column. For example,
SELECT * FROM TABLE WHERE id = ? AND date > ? ORDER BY col1 DESC
According to this range documentation, an index will stop being used after it hits the > operator. But according to this order by documentation, an index can only be used to optimize the order by clause if it is ordering by the last column in the index. Is it possible to get an indexed lookup on every part of this query, or can you only get 2 of the 3? Can I do any better than index (id, date)?
Plan A: INDEX(id, date) -- works best when it filters out a lot of rows, making the subsequent "filesort" not very costly.
Plan B: INDEX(col1), which may work best if very few rows are filtered by the WHERE clause. This avoids the filesort, but is not necessarily faster than the other choices here.
Plan C: INDEX(id, date, col1) -- This is a "covering" index if the query does not reference any other fields. The potential advantage here is to look only at the index, and not have to touch the data. If it applies, Plan C is better than Plan A.
You have not provided enough information to say which of these INDEXes will work best. I suggest you add C and B if "covering" applies; otherwise add A and B. Then see which index the Optimizer picks. (There is still a chance that the Optimizer will not pick the 'right' one.)
(These three indexes are what my Index blog recommends.)
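The sketch below compares the three plans without real data. It builds Plans A, B, and C one at a time in SQLite through Python's sqlite3, as a stand-in for MySQL. SQLite's optimizer differs, so this only mirrors the "see which index the Optimizer picks" step. The SELECT list is assumed to be just id, date, col1 so that Plan C can be covering. The column `other` is hypothetical and the query never touches it. In SQLite's output, "USE TEMP B-TREE FOR ORDER BY" plays the role of MySQL's filesort.

```python
import sqlite3

# SQLite stands in for MySQL; its optimizer is not MySQL's, so the plans
# are illustrative only. Table/column names follow the question.
QUERY = ("SELECT id, date, col1 FROM t "
         "WHERE id = ? AND date > ? ORDER BY col1 DESC")

def plan_with_index(index_columns):
    """Create a fresh table with a single index and return the plan details."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, date TEXT, col1 INTEGER, other TEXT)")
    conn.execute(f"CREATE INDEX idx ON t ({', '.join(index_columns)})")
    details = [row[-1] for row in
               conn.execute("EXPLAIN QUERY PLAN " + QUERY, (42, "2020-01-01"))]
    conn.close()
    return details

plans = {
    "A": plan_with_index(["id", "date"]),          # filter, then filesort
    "B": plan_with_index(["col1"]),                # may avoid the sort
    "C": plan_with_index(["id", "date", "col1"]),  # covering for this SELECT list
}
for name, details in plans.items():
    print(name, details)
```

Plan C still needs the sort because col1 comes after the range column date. Its gain is only that the table itself is never read.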
I have this query:
SELECT * FROM messages_messages WHERE (from_user_id=? AND to_user_id=?) OR (from_user_id=? AND to_user_id=?) ORDER BY created_at DESC
I have another query, which is this:
SELECT COUNT(*) FROM messages_messages WHERE from_user_id=? AND to_user_id=? AND read_at IS NULL
I want to index both of these queries, but I don't want to create 2 separate indexes.
Right now, I'm using 2 indexes:
[from_user_id, to_user_id, created_at]
[from_user_id, to_user_id, read_at]
I was wondering if I could do this with one index instead of 2?
These are the only 2 queries I have for this table.
The docs explain fairly completely how MySQL uses indices. In particular, its optimizer can use any left prefix of a multi-column index. Therefore, you could drop either of your two existing indices, and the other would be eligible for use in both queries, though it would be more selective / useful for one than for the other.
In principle, it could be more beneficial to keep your first index, provided that the created_at column was indexed in descending order. In practice, MySQL before version 8.0 allows you to specify index column order but implements only ascending order. Therefore, having created_at in your index probably doesn't help very much.
No, you need both indexes for these two queries if you want to optimize fully.
Once you reach the column used for either sorting or range comparison (IS [NOT] NULL counts as a range predicate for this purpose), you don't get any benefit from putting more columns in the index. In other words, your index can have:
Some columns that are used in equality predicates
One column that is used either in a range predicate, or to avoid a filesort -- but not both.
Extra columns used in neither searching nor sorting, but only for the sake of a covering index.
So you cannot make a four-column index that serves both queries.
The only way you can reduce this to one index, as @JohnBollinger says, is to make an index that optimizes for one query and uses a subset of the index for the second query. But that won't work as well.
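The next sketch illustrates the left-prefix point. SQLite, through Python's sqlite3, again stands in for MySQL, so the plans are only indicative. It keeps just the (from_user_id, to_user_id, created_at) index. The COUNT query should still seek on the from_user_id/to_user_id prefix and filter read_at IS NULL row by row. The id and body columns and the index name are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; the planner differs, so this only illustrates
# the left-prefix rule. id, body and the index name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE messages_messages (
    id INTEGER PRIMARY KEY,
    from_user_id INTEGER, to_user_id INTEGER,
    created_at TEXT, read_at TEXT, body TEXT)""")
# Keep only the first of the two indexes from the question.
conn.execute("CREATE INDEX idx_from_to_created "
             "ON messages_messages (from_user_id, to_user_id, created_at)")

def plan(sql, params):
    """Return the 'detail' strings from EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

count_sql = ("SELECT COUNT(*) FROM messages_messages "
             "WHERE from_user_id = ? AND to_user_id = ? AND read_at IS NULL")
count_plan = plan(count_sql, (1, 2))
# Expected: a search on the (from_user_id, to_user_id) prefix of the index;
# read_at is not in this index, so it is checked on each matching row.
print(count_plan)
```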
I have the following query:
SELECT dt_stamp
FROM claim_notes
WHERE type_id = 0
AND dt_stamp >= :dt_stamp
AND DATE( dt_stamp ) = :date
AND user_id = :user_id
AND note LIKE :click_to_call
ORDER BY dt_stamp
LIMIT 1
The claim_notes table has about half a million rows, so this query runs very slowly since it has to search against the unindexed note column (which I can't do anything about). I know that when the type_id, dt_stamp, and user_id conditions are applied, I'll be searching against about 60 rows instead of half a million. But MySQL doesn't seem to apply these in order. What I'd like to do is to see if there's a way to tell MySQL to only apply the note LIKE :click_to_call condition to the rows that meet the former conditions so that it's not searching all rows with this condition.
What I've come up with is this:
SELECT dt_stamp
FROM (
SELECT *
FROM claim_notes
WHERE type_id = 0
AND dt_stamp >= :dt_stamp
AND DATE( dt_stamp ) = :date
AND user_id = :user_id
) AS recent_notes
WHERE note LIKE :click_to_call
ORDER BY dt_stamp
LIMIT 1
This works and is extremely fast. I'm just wondering if this is the right way to do this, or if there is a more official way to handle it.
It shouldn't be necessary to do this. The MySQL optimizer can handle it if you have multiple terms in your WHERE clause separated by AND. Basically, it knows how to do "apply all the conditions you can using indexes, then apply unindexed expressions only to the remaining rows."
But choosing the right index is important. A multi-column index is better for a series of AND terms than individual indexes. MySQL can apply index intersection, but that's much less effective than finding the same rows with a single index.
A few logical rules apply to creating multi-column indexes:
Conditions on unique columns are preferred over conditions on non-unique columns.
Equality conditions (=) are preferred over ranges (>=, IN, BETWEEN, !=, etc.).
After the first column in the index used for a range condition, subsequent columns won't use an index.
Most of the time, searching the result of a function on a column (e.g. DATE(dt_stamp)) won't use an index. It'd be better in that case to store a DATE data type and use = instead of >=.
If the condition matches > 20% of the table, MySQL probably will decide to skip the index and do a table-scan anyway.
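Here is a sketch of these rules applied to the claim_notes query, using SQLite (via Python's sqlite3) as a stand-in for MySQL. The index idx_user_type_stamp and the sample rows are hypothetical. DATE(dt_stamp) = :date has been replaced by a half-open range on dt_stamp. That is our substitution, not the separate DATE column suggested above. The plan should show the index serving the equality and range conditions and the ORDER BY. The LIKE is then applied only to the rows the index returns, with no derived table.

```python
import sqlite3
from datetime import date, timedelta

# SQLite stands in for MySQL here; treat the plan as illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE claim_notes (
    id INTEGER PRIMARY KEY, type_id INTEGER, user_id INTEGER,
    dt_stamp TEXT, note TEXT)""")
# Hypothetical index: equality columns first, then the range/sort column.
conn.execute("CREATE INDEX idx_user_type_stamp "
             "ON claim_notes (user_id, type_id, dt_stamp)")
conn.executemany(
    "INSERT INTO claim_notes (type_id, user_id, dt_stamp, note) VALUES (?, ?, ?, ?)",
    [(0, 7, "2013-05-01 08:00:00", "called back"),            # before :dt_stamp
     (0, 7, "2013-05-01 10:00:00", "click to call placed"),
     (0, 7, "2013-05-01 11:00:00", "click to call placed"),
     (0, 7, "2013-05-02 10:00:00", "click to call placed"),   # wrong day
     (1, 7, "2013-05-01 09:30:00", "click to call placed")])  # wrong type_id

day = date(2013, 5, 1)  # plays the role of :date
# dt_stamp >= :day_start AND dt_stamp < :day_end replaces DATE(dt_stamp) = :date,
# so no function is wrapped around the indexed column.
sql = """SELECT dt_stamp FROM claim_notes
         WHERE type_id = 0
           AND user_id = :user_id
           AND dt_stamp >= :dt_stamp
           AND dt_stamp >= :day_start AND dt_stamp < :day_end
           AND note LIKE :click_to_call
         ORDER BY dt_stamp
         LIMIT 1"""
params = {"user_id": 7, "dt_stamp": "2013-05-01 09:00:00",
          "day_start": day.isoformat(),
          "day_end": (day + timedelta(days=1)).isoformat(),
          "click_to_call": "%click to call%"}

details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]
first_hit = conn.execute(sql, params).fetchone()
print(details)
print(first_hit)
```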
Here are some webinars by myself and my colleagues at Percona to help explain index design:
Tools and Techniques for Index Design
MySQL Indexing: Best Practices
Advanced MySQL Query Tuning
Really Large Queries: Advanced Optimization Techniques
You can get the slides for these webinars for free, and view the recording for free, but the recording requires registration.
Don't go for the derived table solution, as it does not perform well. I'm surprised that MySQL evaluates the LIKE first when = and >= conditions are available.
Anyway, I'd say you could try adding some indexes on those fields and see what happens:
ALTER TABLE claim_notes ADD INDEX(type_id, user_id);
ALTER TABLE claim_notes ADD INDEX(dt_stamp);
The latter index won't really improve the filtering; it mainly helps with sorting the results.
Of course, having an EXPLAIN of the query would help.
If I'm trying to increase the performance of a query that uses 4 different columns from a specific table, should I create 4 different indexes (one with each column individually) or should I create 1 index with all columns included?
In my experience, one index on all 4 columns is the fastest. If you use a WHERE clause, try to put the columns in an order that makes the index useful for it.
An index with all four columns: the columns used in the WHERE should go first, and those compared with = should go first of all.
Sometimes, giving priority to integer columns gives better results; YMMV.
So for example,
SELECT title, count(*) FROM table WHERE class = 'post' AND topic_id = 17
AND date > ##BeginDate and date < ##EndDate;
would have an index on: topic_id, class, date, and title, in this order.
The "title" in the index is only used so that the DB may find the value of "title" for those records matching the query, without the extra access to the data table.
The more evenly the records are distributed across values of the first fields, the better results you will get. In this example, if, say, 10% of the rows have topic_id = 17, you would discard the other 90% without ever having to run a string comparison with 'post' (not that string comparisons are particularly costly). Depending on the data, you might find it better to index date first and class later, or even to use date as a MySQL PARTITION key.
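Here is the same example as a runnable sketch in SQLite via Python's sqlite3, used as a stand-in for MySQL. The table name posts and the body column are hypothetical. Every column the query needs is in the index, so the plan should report a covering index. This corresponds to the point about title above.

```python
import sqlite3

# SQLite stands in for MySQL; "posts" and "body" are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, class TEXT, "
             "topic_id INTEGER, date TEXT, title TEXT, body TEXT)")
# Equality columns, then the range column, then title so the index covers the query.
conn.execute("CREATE INDEX idx_topic_class_date_title "
             "ON posts (topic_id, class, date, title)")

sql = ("SELECT title, count(*) FROM posts "
       "WHERE class = 'post' AND topic_id = 17 AND date > ? AND date < ?")
details = [row[-1] for row in
           conn.execute("EXPLAIN QUERY PLAN " + sql, ("2011-01-01", "2011-12-31"))]
print(details)
```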
A single index is usually more effective than an index merge, so if you have a condition like f1 = 1 AND f2 = 2 AND f3 = 3 AND f4 = 4, a single index would be the right decision.
To achieve the best performance, list the index fields in descending order of cardinality (count of distinct values); this helps reduce the number of rows analyzed.
An index on fewer than 4 fields can be more effective, as it requires less memory.
http://www.mysqlperformanceblog.com/2008/08/22/multiple-column-index-vs-multiple-indexes/
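One way to apply the cardinality heuristic is to measure COUNT(DISTINCT ...) for each column and order the index accordingly. Below is a minimal sketch in Python, with sqlite3 as a stand-in database. The table t, columns f1 to f4, the sample data, and the index name idx_f are hypothetical. The heuristic is the answerer's, stated for all-equality conditions like the f1 = 1 AND ... example.

```python
import sqlite3

def columns_by_cardinality(conn, table, columns):
    """Return `columns` sorted by COUNT(DISTINCT column), highest first."""
    counts = {}
    for col in columns:
        # Identifiers cannot be bound as parameters; names are assumed trusted.
        (counts[col],) = conn.execute(
            f"SELECT COUNT(DISTINCT {col}) FROM {table}").fetchone()
    return sorted(columns, key=lambda c: counts[c], reverse=True)

# Hypothetical table with four columns of very different cardinality.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (f1 INTEGER, f2 INTEGER, f3 INTEGER, f4 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)",
                 [(i % 2, i % 10, i, 0) for i in range(100)])

order = columns_by_cardinality(conn, "t", ["f1", "f2", "f3", "f4"])
conn.execute(f"CREATE INDEX idx_f ON t ({', '.join(order)})")
print(order)
```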
I want to run a simple query to get the "n" oldest records in the table. (It has a creation_date column).
How can I get that without using ORDER BY? It is a very big table, and using ORDER BY on the entire table to get only n records doesn't seem efficient.
(Assume n << size of table)
When you are concerned about performance, you should probably not discard the use of order by too early.
Queries like that can be implemented as a Top-N query supported by an appropriate index. That runs very fast because it doesn't need to sort the entire table, not even the selected rows, because the data is already sorted in the index.
example:
select *
from table
where A = ?
order by creation_date
limit 10;
Without an appropriate index, it will be slow if you have a lot of data. However, if you create an index like this:
create index test on table (A, creation_date );
The query will be able to start fetching the rows in the correct order, without sorting, and stop when the limit is reached.
Recipe: put the where columns in the index, followed by the order by columns.
If there is no where clause, just put the order by into the index. The order by must match the index definition, especially if there are mixed asc/desc orders.
The indexed Top-N query is the performance king--make sure to use them.
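The recipe can be checked with a small sketch in SQLite through Python's sqlite3, standing in for MySQL. The table events, the payload column, and the sample dates are hypothetical, and column A is written as a. With the index test on (a, creation_date), the plan should contain no "USE TEMP B-TREE FOR ORDER BY" step. That means the rows come out of the index already in order.

```python
import sqlite3

# SQLite stands in for MySQL; "events", "payload" and the data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, a INTEGER, "
             "creation_date TEXT, payload TEXT)")
# WHERE column first, then the ORDER BY column.
conn.execute("CREATE INDEX test ON events (a, creation_date)")
conn.executemany("INSERT INTO events (a, creation_date, payload) VALUES (?, ?, ?)",
                 [(i % 3, f"2010-01-{i:02d}", "x") for i in range(31, 0, -1)])

sql = "SELECT * FROM events WHERE a = ? ORDER BY creation_date LIMIT 3"
details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (1,))]
oldest = [row[2] for row in conn.execute(sql, (1,))]  # column 2 is creation_date
print(details)
print(oldest)
```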
A few links for further reading (all mine):
How to use index efficiently in mysql query
http://blog.fatalmind.com/2010/07/30/analytic-top-n-queries/ (Oracle centric)
http://Use-The-Index-Luke.com/ (not yet covering Top-N queries, but that's to come in 2011).
I haven't tested this concept before, but try creating an index on the creation_date column, which keeps the entries sorted in ascending order. Then your SELECT query can use ORDER BY creation_date ASC with LIMIT 20 to get the 20 oldest records. The database engine should realize the index has already done the sorting work on save, so it won't actually need to sort. All it needs to do is read the first 20 entries from the index.
Worth a try.
Create an index on creation_date and query by using order by creation_date asc|desc limit n and the response will be very fast (in fact it cannot be faster). For the "latest n" scenario you need to use desc.
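Here is a sketch of this suggestion, again using SQLite via Python's sqlite3 as a stand-in for MySQL. The names events, payload, and idx_creation_date are hypothetical. Both directions are read straight from the index, so neither plan should contain a sort step. The queries select only id and creation_date so that the index is covering. With SELECT *, each hit would also need a table lookup.

```python
import sqlite3

# SQLite stands in for MySQL; "events", "payload" and the index name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, creation_date TEXT, payload TEXT)")
conn.execute("CREATE INDEX idx_creation_date ON events (creation_date)")
conn.executemany("INSERT INTO events (creation_date, payload) VALUES (?, ?)",
                 [(f"2010-01-{d:02d}", "x") for d in (17, 3, 25, 9, 30, 1)])

def plan(sql):
    """Return the 'detail' strings from EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# id is the rowid, so selecting id and creation_date keeps the index covering.
oldest_sql = "SELECT id, creation_date FROM events ORDER BY creation_date ASC LIMIT 2"
latest_sql = "SELECT id, creation_date FROM events ORDER BY creation_date DESC LIMIT 2"
oldest = [row[1] for row in conn.execute(oldest_sql)]
latest = [row[1] for row in conn.execute(latest_sql)]
print(oldest, latest)
print(plan(oldest_sql), plan(latest_sql))
```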
If you want more constraints on this query (e.g where state='LIVE') then the query may become very slow and you'll need to reconsider the indexing strategy.
You can use GROUP BY if you're grouping some data, and then a HAVING clause to select specific records.