When I manually create tables in MySQL, I add indexes one at a time for each field that I think I will use for queries.
When I use phpMyAdmin to create tables for me and select my indexes in the create-table form, I see that phpMyAdmin combines them into one composite index (plus my primary key).
What's the difference? Is one better than the other? In which case?
Thanks!
Neither is a particularly good strategy, but if I had to choose I'd pick the multiple single indexes.
The reason is that an index can only be used for a query that filters on a complete leftmost prefix of the index's columns. If you have an index (a, b, c, d, e, f), it works fine for a query that filters on a, or for a query that filters on both a and b, but it is useless for a query filtering only on c.
There's no easy rule that always works for choosing the best indexes. You need to look at the types of queries you are making and choose the indexes that would speed up those particular queries. If you think carefully about the order of the columns, you can find a small number of indexes that will be useful for multiple different queries. For example, if one query filters on both a and b, and another query filters on only b, then an index on (b, a) will be usable by both queries, but an index on (a, b) will not.
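The leftmost-prefix rule can be modelled in a few lines. This is a minimal sketch, assuming every filter is an equality test; the helper name `usable_prefix` is hypothetical and only mimics the rule described above, not MySQL's optimizer. An empty result means the index cannot be used to look rows up (MySQL might still scan it in full).

```python
# Sketch of the leftmost-prefix rule for equality filters only.
# Assumption: every filtered column is tested with "=", so the leading
# index columns are usable, in order, as long as each appears in the query.

def usable_prefix(index_columns, filtered_columns):
    """Return the leading index columns that a query filtering on
    `filtered_columns` (with =) can use for a lookup."""
    used = []
    for column in index_columns:
        if column not in filtered_columns:
            break  # gap in the prefix: later columns cannot narrow the seek
        used.append(column)
    return tuple(used)


query_ab = {"a", "b"}  # WHERE a = ? AND b = ?
query_b = {"b"}        # WHERE b = ?

for index in [("a", "b"), ("b", "a")]:
    print(index, usable_prefix(index, query_ab), usable_prefix(index, query_b))
```

Running it shows that (b, a) gives both queries a usable prefix, while (a, b) gives the b-only query nothing.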
This actually depends on your queries. Some queries make better use of multicolumn indexes, some not.
EXPLAIN is your friend.
http://dev.mysql.com/doc/refman/5.6/en/explain.html
Also a very good resource is here:
http://dev.mysql.com/doc/refman/5.6/en/optimization-indexes.html
Suppose my table has columns A, B, C, D, E, F, G and H. I create a composite index on ABCDE because those columns appear in a WHERE clause. Then another query needs a composite index on ABCEF, so I create that one on the same table as well, and a third query needs ABCGH, so I add another composite index for it.
So my question is: can we keep creating composite indexes as our requirements demand? Sometimes the columns in our WHERE clauses change and we need those queries to perform well. Is there another way to optimise such queries, or is it fine to create multiple composite indexes as needed?
I have tried multiple composite indexes and they have not caused any problems so far, but I want to know whether they will cause problems in the future, and whether it is OK to use only composite indexes and no single-column indexes.
Your answers will help me a lot. I'm waiting for your replies.
Thanks in advance.
You can have as many as you want. However, each additional index has a cost when updating, inserting or deleting. The trick is to find common segments and make indexes for those.
Or create them as required when queries are too slow.
As an example, if you are "needing" indexes for ABCDE, ABDEF, and ABGIH, then create an index on just AB.
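Finding the "common segment" is just finding the longest shared leading run of columns. A minimal sketch, assuming each wished-for index is written as a string of one-letter column names as in the example; `common_leading_columns` is a hypothetical helper, not a MySQL feature.

```python
# Hypothetical helper: the longest leading run of columns shared by every
# wished-for composite index. Assumption: each letter names one column.

def common_leading_columns(wished_indexes):
    """Return the columns that all candidate indexes start with, in order."""
    shared = []
    for columns in zip(*wished_indexes):  # zip stops at the shortest index
        if len(set(columns)) != 1:
            break  # first position where the candidates disagree
        shared.append(columns[0])
    return "".join(shared)


print(common_leading_columns(["ABCDE", "ABDEF", "ABGIH"]))  # AB
print(common_leading_columns(["ABCDE", "ABCEF", "ABCGH"]))  # ABC
```

The second call uses the three indexes from the question, whose shared prefix is ABC rather than AB.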
InnoDB supports up to 64 secondary indexes per table (cf. https://dev.mysql.com/doc/refman/8.0/en/innodb-limits.html).
If you try to create a composite index for every permutation of N columns, you would need N-factorial indexes. So for 8 columns, you would need 40,320 indexes. Clearly this is more than InnoDB supports.
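The factorial arithmetic is easy to check against the 64-index limit. A small sketch; the constant name is an assumption for illustration and simply holds the limit quoted above.

```python
import math

INNODB_MAX_SECONDARY_INDEXES = 64  # per-table limit quoted in this thread


def full_permutation_indexes(n_columns):
    """Composite indexes needed to cover every ordering of all n columns: n!."""
    return math.factorial(n_columns)


for n in range(3, 9):
    needed = full_permutation_indexes(n)
    print(n, needed, needed <= INNODB_MAX_SECONDARY_INDEXES)
```

Already at 5 columns (120 orderings) the count exceeds the limit, and 8 columns gives the 40,320 mentioned above.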
You probably don't need that many indexes. In practice, I've rarely seen more than 6 indexes in a given table, and every query that needs optimizing is served by one of those.
I understand you said sometimes you change the terms in your query's WHERE clause, so it might need a composite index with different columns.
You can rely on indexes that have a subset of all the columns that would be optimal. That won't be 100% optimized, but it will be better than no index.
You can't predict the optimal set of indexes for a given query until you write that query.
There is a limit of 64 secondary indexes (at least in InnoDB).
Order the columns in each index so that columns being tested with = come first. (The order of those columns among themselves in the INDEX does not matter.)
The leftmost columns in an index are the most important.
There is little or no use in including more than one column that will be tested with a range.
Study your likely queries, and find the most common combinations of 2 or 3 columns; build indexes starting with those.
In your two examples (ABCDEFGH and ABCEF), ABC would work for both (assuming at least A and B are tested with =). If you do throw on more columns, that one INDEX can still be used for both cases.
Maybe you would want to declare both ABCDEFGH and BCEFA; this handles your ABCDEF case, plus cases that have B but not A. (Remember 'leftmost'.)
Use the slow query log to find the slowest queries and build better indexes for them.
More on indexing: Index Cookbook
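The "= columns first, then at most one range column" rules can be sketched as a function that counts how many leading index columns narrow the scan. This is a rough model under stated assumptions (it ignores IN lists, index condition pushdown and the optimizer's cost decisions); `usable_key_parts` is a hypothetical name, and one-letter strings stand for columns.

```python
# Rough model (an assumption, not MySQL's optimizer code) of how many
# leading columns of a composite index can narrow a range scan:
# "=" columns are consumed in any order, and one range column ends it.

def usable_key_parts(index_columns, equality_columns, range_columns):
    used = []
    for column in index_columns:
        if column in equality_columns:
            used.append(column)
        elif column in range_columns:
            used.append(column)  # the range column itself is used...
            break                # ...but nothing after it narrows the scan
        else:
            break                # column not filtered: the prefix ends here
    return "".join(used)


# WHERE A = ? AND B = ? AND C > ? AND D = ?
print(usable_key_parts("ABCD", {"A", "B", "D"}, {"C"}))  # ABC: D is wasted
print(usable_key_parts("ABDC", {"A", "B", "D"}, {"C"}))  # ABDC: = columns first

# WHERE A = ? AND B = ? AND C = ? AND D < 100 AND E = ? on index ABCDE
print(usable_key_parts("ABCDE", {"A", "B", "C", "E"}, {"D"}))  # ABCD
```

Putting the range column last lets every equality column take part, which is why the order among the = columns themselves does not matter.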
Each index requires disk space to be stored, and time to be updated every time you insert, update or delete an indexed column value.
So as long as you don't run out of storage, and writes don't become too slow because too many indexes have to be updated, nothing stops you from creating as many specific indexes as you want.
This depends on your use case and should be measured with production-like data.
A common solution would be to create one index specific to your most important query, e.g. ABCDE in your case.
Other queries can still use as many of its columns, from left to right, as match up to the first difference. For example, a query searching on ABCEF could still use ABC of the previously mentioned index.
To also utilise column E, you could add a condition on D to your query that you know matches all values, e.g. D < 100 if you know D only holds values 1-99. Be aware, though, that D < 100 is a range condition: MySQL stops narrowing the range scan at the first range key part, so E can then only be used to filter index entries (for example via Index Condition Pushdown), not to shrink the range that is scanned.
Let's say you have a table with columns A and B, among others. You create a multi-column index (A, B) on the table.
Does your query have to take the order of the indexed columns into account? For example,
select * from MyTable where B=? and A in (?, ?, ?);
In the query we put B first and A second. But the index is (A, B). Does the order matter?
Update: I do know that the order of columns in an index matters significantly because of the leftmost prefix rule. However, does it matter which column comes first in the query itself?
In this case, no, but I recommend using the EXPLAIN keyword so you can see which optimizations MySQL will (or won't) use.
The order of columns in the index can affect the way the MySQL optimiser uses the index. Specifically, MySQL can use your compound index for queries on column A because it's the first part of the compound index, but not for queries that filter on column B alone.
However, your question refers to the order of column references in the query. Here, the optimiser will take care of the references appropriately, and the order is unimportant. The different clauses must come in a particular order to satisfy syntax rules, so you have little control anyway.
The MySQL reference on multiple-column index optimisation is here
You can test specific queries if you think they are problems, but otherwise I wouldn't worry about this optimization. Your query will most likely be transformed from its original form by the query planner. That is to say, MySQL should do a good job of planning how it will use the indices to optimize speed. This may require the conditions to be in a different order, but I doubt it. If MySQL actually did have to reorder the conditions for optimization, it would be a very minor cost relative to the execution of the query (at least if the result set is large).
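One way to see that condition order is irrelevant is to ask the planner for both versions of the query and compare. The sketch below swaps in SQLite (via Python's built-in `sqlite3`) purely because it is self-contained; its planner and EXPLAIN output differ from MySQL's, so treat this as an illustration of the idea, and run `EXPLAIN` on MySQL itself for real answers. The index name `idx_ab` and column `C` are assumptions.

```python
import sqlite3

# SQLite stands in for MySQL here; the planner differs, but both treat the
# WHERE clause as a set of conditions rather than an evaluation order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, A INT, B INT, C TEXT)")
conn.execute("CREATE INDEX idx_ab ON MyTable (A, B)")  # hypothetical index name


def plan(sql):
    """Return the detail strings of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (1, 2, 3, 4)).fetchall()
    return [row[-1] for row in rows]  # the detail text is the last column


b_first = plan("SELECT * FROM MyTable WHERE B = ? AND A IN (?, ?, ?)")
a_first = plan("SELECT * FROM MyTable WHERE A IN (?, ?, ?) AND B = ?")

print(b_first)
print(b_first == a_first)
```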
I very often search the table posts for values in the columns user+status and user+time.
SELECT * FROM `posts` WHERE `user`='xxx' and `status`='active'
SELECT * FROM `posts` WHERE `user`='xxx' and `time`>...
Thus I have set up two indices (user, status) and (user, time)
I'm aware that writes slow down as more indices need to be updated. But I think both indices are useful in this case, since reads outnumber writes by far.
Anyway, phpMyAdmin gives a warning saying "More than one index has been created for the column user". Can I just ignore this warning? I checked the WordPress DB tables and saw that, when a column already had its own index, they put it in the second position of the composite index.
comment_approved_date_gmt = INDEX(comment_approved, comment_date_gmt)
comment_date_gmt = INDEX(comment_date_gmt)
Why don't they use a single two-column index, INDEX(comment_date_gmt, comment_approved), which would make INDEX(comment_date_gmt) unnecessary? And why is it disadvantageous to have two indices starting with the same column?
Is there a general rule for which column should go first in my query? For example, first the one with the fewest distinct values (e.g. status), then the one with more distinct values (e.g. user names)?
Yes, the order of columns in an index matters.
Think of an analogy to a telephone book. It's like an index on (last_name, first_name). Looking up a person by last name, you use the sorted order of the phone book to help you find them quickly.
But if you only know the person's first name, they are scattered throughout the book. To find one, you'd have to search the book page by page.
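The phone-book analogy maps directly onto a sorted list of (last_name, first_name) tuples. A minimal sketch with made-up names: a last-name lookup can binary-search to the first match with `bisect`, while a first-name lookup has no usable prefix and must check every entry.

```python
from bisect import bisect_left

# A phone book modelled as entries sorted by (last_name, first_name),
# like a composite index on those two columns. Names are made up.
book = sorted([
    ("Smith", "Anna"), ("Jones", "Bob"), ("Smith", "Carl"),
    ("Brown", "Anna"), ("Jones", "Dave"),
])


def find_by_last_name(book, last):
    """Seek: binary search to the first match, then read forward."""
    i = bisect_left(book, (last,))  # ("Smith",) sorts before ("Smith", anything)
    matches = []
    while i < len(book) and book[i][0] == last:
        matches.append(book[i])
        i += 1
    return matches


def find_by_first_name(book, first):
    """No usable prefix: every entry must be checked (a full scan)."""
    return [entry for entry in book if entry[1] == first]


print(find_by_last_name(book, "Smith"))  # [('Smith', 'Anna'), ('Smith', 'Carl')]
print(find_by_first_name(book, "Anna"))  # [('Brown', 'Anna'), ('Smith', 'Anna')]
```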
Yes, indexes can be redundant.
Any query that is searching for last_name can use a single-column index on (last_name), or it can get the same benefit from a two-column index on (last_name, first_name). So why create both indexes?
There's a tool pt-duplicate-key-checker that can help you identify redundant indexes. I've never come across a database that didn't have at least a few such indexes.
phpMyAdmin is wrong.
If phpMyAdmin is warning about the indexes (user, status) and (user, time), then it's being over-zealous, because these indexes are not redundant with respect to each other. Basically, an index is redundant if its columns comprise a left-prefix of the columns in another index. So an index (A) is redundant with respect to an index (A, B), but an index (A, C) is distinct from (A, B) and both may be used by different queries.
PS: I cover these points and more in my presentation How to Design Indexes, Really.
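The redundancy rule above (an index is redundant if its columns are a leftmost prefix of another index's columns) can be checked mechanically. This is a minimal sketch of that core rule only; tools like pt-duplicate-key-checker apply more checks, and `redundant_indexes` is a hypothetical helper. Exact duplicates are reported in both directions, and dropping either copy suffices.

```python
# Core rule only: an index is redundant if its column list is a leftmost
# prefix of another index's column list.

def redundant_indexes(indexes):
    """Return (redundant_index, covered_by) pairs."""
    pairs = []
    for name, cols in indexes.items():
        for other, other_cols in indexes.items():
            if other != name and other_cols[:len(cols)] == cols:
                pairs.append((name, other))
                break
    return pairs


posts = {"user_status": ("user", "status"), "user_time": ("user", "time")}
phone_book = {"last": ("last_name",), "last_first": ("last_name", "first_name")}
wordpress = {
    "comment_approved_date_gmt": ("comment_approved", "comment_date_gmt"),
    "comment_date_gmt": ("comment_date_gmt",),
}

print(redundant_indexes(posts))       # []
print(redundant_indexes(phone_book))  # [('last', 'last_first')]
print(redundant_indexes(wordpress))   # []
```

This matches the answer: the two posts indexes and the two WordPress indexes are distinct, while (last_name) is covered by (last_name, first_name).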
I feel that the ordering of columns in a SQL query is a premature optimisation, which, according to Knuth, is the root of all evil. You should program for maintenance, not for optimisation, and let the optimiser take care of the speed.
Consider fetching data with
SELECT * FROM table WHERE column1='XX' && column2='XX'
MySQL will filter the results matching the first part of the WHERE clause, then the second part. Am I right?
Imagine the first part matches 10 records, and adding the second part filters that down to 5. Do I need to INDEX the second column too?
You are talking about short-circuit evaluation. A DBMS has a cost-based optimizer, so there is no guarantee which of the two conditions will be evaluated first.
To apply that to your question: yes, it might be beneficial to index your second column. Ask yourself:
Is it used regularly in searches?
What does the execution plan tell you?
Is the access pattern going to change in the near future?
How many records does the table contain?
Would a covering index be a better choice?
...
Indexes are optional in MySQL, but they can increase performance.
Currently, MySQL can only use one index per table in a SELECT, so with the given query, if you have an index on both column1 and column2, MySQL will try to determine the index that will be the most beneficial, and use only one.
The general solution, if the speed of the select is of utmost importance, is to create a multi-column index that includes both columns.
This way, even though MySQL could only use one index for the table, it would use the multi-column index that has both columns indexed, allowing MySQL to quickly filter on both criteria in the WHERE clause.
In the multi-column index, you would put the column with the highest cardinality (the highest number of distinct values) first.
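Following this answer's heuristic, cardinality can be measured and used to order the columns. A sketch under assumptions: the rows are made-up data held in Python dicts, and `order_by_cardinality` is a hypothetical helper; in MySQL you would measure with something like `SELECT COUNT(DISTINCT column1) FROM table`.

```python
# Applies this answer's heuristic: order composite-index columns by
# descending cardinality (number of distinct values). Sample rows are
# made-up data for illustration only.

def order_by_cardinality(rows, columns):
    cardinality = {c: len({row[c] for row in rows}) for c in columns}
    ordered = sorted(columns, key=lambda c: cardinality[c], reverse=True)
    return ordered, cardinality


rows = [
    {"column1": "u1", "column2": "active"},
    {"column1": "u2", "column2": "active"},
    {"column1": "u3", "column2": "deleted"},
    {"column1": "u4", "column2": "active"},
]

order, card = order_by_cardinality(rows, ["column2", "column1"])
print(card)   # {'column2': 2, 'column1': 4}
print(order)  # ['column1', 'column2']
```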
For even further optimization, "covering" indexes can be applied in some cases.
Note that indexes can increase performance, but with some cost. Indexes increase memory and storage requirements. Also, when updating or inserting records into a table, the corresponding indexes require maintenance. All of these factors must be considered when implementing indexes.
Update: MySQL 5.0 and later can use more than one index on a table by merging the results from each index, with a few caveats.
The following query is a good candidate for Index Merge Optimization:
SELECT * FROM t1 WHERE key1=1 AND key2=1
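Conceptually, an Index Merge intersection for this query looks up the row ids matching each condition in its own single-column index and intersects the two sets. The toy model below is only an illustration of that idea with hypothetical data, not MySQL's implementation (which works on sorted row-id streams).

```python
from collections import defaultdict

# Toy model of an Index Merge intersection: each single-column index maps
# a value to the ids of rows holding it. Data is hypothetical.
rows = {1: (1, 1), 2: (1, 2), 3: (2, 1), 4: (1, 1), 5: (3, 1)}  # id -> (key1, key2)


def build_index(rows, position):
    """Map each value of the column at `position` to a set of row ids."""
    index = defaultdict(set)
    for row_id, values in rows.items():
        index[values[position]].add(row_id)
    return index


key1_index = build_index(rows, 0)
key2_index = build_index(rows, 1)

# WHERE key1 = 1 AND key2 = 1: intersect the ids from both indexes
matches = key1_index[1] & key2_index[1]
print(sorted(matches))  # [1, 4]
```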
When processing such a query, the RDBMS will typically use only one index. Having separate indices on both columns lets it choose whichever one will be faster.
Whether it's needed depends on your specific situation.
Is the query slow as it is now?
Would it be faster with index on another column?
Would it be faster with one index containing both columns?
You may need to try and measure several approaches.
You don't have to INDEX the second column, but it may speed up your SELECT.
If my User table has several fields that are queryable (say DepartmentId, GroupId, RoleId) will it make any speed difference if I create an index for each combination of those fields?
By "queryable", I'm referring to a query screen where the end user can select records based on Department, Group or Role by selecting from a drop-down.
At the moment, I have a index on DepartmentId, GroupId and RoleId. That's a single non-unique index per field.
If an end user selects "anyone in Group B", the SQL looks like:
select * from User where GroupId = 2
Having an index on GroupId should speed that up.
But if the end user select "anyone in Group B and in Role C", the SQL would look like this:
select * from User where GroupId = 2 and RoleId = 3
Having indexes on GroupId and RoleId individually may not make any difference, right?
A better index for that search would be if I had one index spanning both GroupId and RoleId.
But if that's the case, then I would need an index for every combination of queryable fields. So I would need all these indexes:
DepartmentId
GroupId
RoleId
DepartmentId and GroupId
DepartmentId and RoleId
GroupId and RoleId
DepartmentId, GroupId and RoleId
Can anyone shed some light on this? I'm using MySQL if that makes a difference.
A multi-column index can be used for any left prefix of that index. So, an index on (A, B, C) can be used for queries on (A), (A, B) and (A, B, C), but it cannot, for example, be used for queries on (B) or (B, C).
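Because of the left-prefix rule, the seven combinations in the question do not need seven indexes. A sketch under assumptions: every filter is an equality test, and the three composite indexes below are a hypothetical choice, picked so that each combination is fully served by some index's leftmost prefix. Whether this beats single-column indexes plus Index Merge still depends on the workload.

```python
from itertools import combinations

# Assumption: every filter is "=", so an index fully serves a query when the
# query's columns are exactly a leftmost prefix of the index (WHERE order
# does not matter, hence the set comparison).


def serves(index, query_columns):
    return set(index[:len(query_columns)]) == set(query_columns)


columns = ["DepartmentId", "GroupId", "RoleId"]
all_queries = [set(c) for k in range(1, 4) for c in combinations(columns, k)]

# Hypothetical choice of three composite indexes.
indexes = [
    ("DepartmentId", "GroupId", "RoleId"),
    ("GroupId", "RoleId"),
    ("RoleId", "DepartmentId"),
]

uncovered = [q for q in all_queries if not any(serves(i, q) for i in indexes)]
print(len(all_queries), uncovered)  # 7 []
```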
If the columns are all indexed individually, MySQL (5.0 or later) may also use Index Merge Optimization.
Generally speaking, indexes will increase query speed, but decrease insert/update speed, and increase disk space/overhead. So asking if you should index each combination of columns is like asking if you should optimize every function in your code. It may make some things faster, or it may barely help, and it might just hurt more than it helps.
The effectiveness of indexes depends on:
Percentage of SELECTs vs. INSERTs and UPDATEs
The specifics of the SELECT queries, and whether they use JOINs
Size of table being indexed
RAM and processor speed
MySQL settings for how much RAM to use, etc.
So, it's hard to give a general answer. The basic sound advice would be: Add indexes if queries are too slow. And remember to use EXPLAIN to see which indexes to add. Note that this is kind of like the database version of the general advice: Profile your app before spending time on optimization.
My experience is with SQL Server rather than MySQL, and it is possible that this makes a difference. However, in general, the engine can use multiple indexes on a single query. While there are certainly benefits to having a more comprehensive single index (it provides a greater boost, especially if it forms a covering index), you will still benefit from having an index on each field of the query.
Furthermore, keep in mind that each index must be maintained separately, so you will suffer a performance reduction on write operations as your number of indexes grow.
Create indexes carefully!
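I would suggest collecting query statistics to decide which column is used most often in searches, so you can create a clustered index on that particular column. (A table can have only one clustered index, because its rows can be physically ordered in only one way, even when you create indexes on multiple columns.) Note that InnoDB always clusters rows by the primary key (or, without one, by a unique NOT NULL index or a hidden row ID), so this advice applies more directly to engines such as SQL Server.
Also, please be aware that a clustered index can significantly decrease the performance of UPDATE/INSERT/DELETE queries, because it causes physical reordering of the data.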
What I have found is that it's best to index anything the user will search on. I have actually found better performance by creating indexes with multiple columns if a search for those columns will be executed.
For instance, if someone can search on both roleid and groupid at the same time, having an index with both of those columns will actually be a little faster than having just one index on each. However, having an index on each queryable column can still be good, since you may miss a combination of columns.
A key consideration is to see how much space the indexes will take up. Since these columns are integer fields, it shouldn't be a big deal. A little time creating indexes could reap significant benefits.
The best thing to do will be to experiment. Do a search on multiple columns and time it, then add a combined index and rerun it.
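The experiment can be scripted. This sketch swaps in SQLite through Python's built-in `sqlite3` so it runs anywhere; absolute timings and plans will differ from MySQL, the table contents are synthetic, and the index name `idx_group_role` is an assumption. It checks that the result is unchanged and prints both timings for comparison.

```python
import sqlite3
import time

# SQLite stands in for MySQL so the sketch is self-contained; timings will
# differ on a real server, and the data below is synthetic.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE User (id INTEGER PRIMARY KEY, GroupId INT, RoleId INT)")
conn.executemany(
    "INSERT INTO User (GroupId, RoleId) VALUES (?, ?)",
    ((i % 50, i % 7) for i in range(100_000)),
)

QUERY = "SELECT COUNT(*) FROM User WHERE GroupId = ? AND RoleId = ?"


def timed(params, repeats=20):
    """Run QUERY `repeats` times; return (row count, elapsed seconds)."""
    start = time.perf_counter()
    for _ in range(repeats):
        result = conn.execute(QUERY, params).fetchone()[0]
    return result, time.perf_counter() - start


before = timed((2, 3))
conn.execute("CREATE INDEX idx_group_role ON User (GroupId, RoleId)")
after = timed((2, 3))

print("rows:", before[0], after[0])
print(f"no index: {before[1]:.4f}s, composite index: {after[1]:.4f}s")
```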
Remove all indexes and run your CRUD statements against the table using a free tool called SQL Sentry Plan Explorer.
It will show you which indexes are necessary.
Indexes should be designed around your CRUD workload, not around the table by itself.