Too many composite(multi column )indexing is ok? - mysql

Consider we have A,B,C,D,E,F,G,H columns in my table and if I make composite indexing on column ABCDE because these column are coming in where clause and then I want composite indexing on ABCEF then I create new composite indexing on ABCEF in same table but in a different query, we want indexing on column ABCGH for that i make another composite indexing in same table,
So my question is can we make too many composite indexes as per our requirements because sometimes our column in where clause gets change and we have to increase its performance so tell me is there any other way to optimise the query or we can make multiple composite indexes as per the requirements.
I tried multiple composite indexing and it did not give me any problem till now but still i want to know from all of you that will it give me any problem in future or is it ok to use only composite indexes and not single index.
Your answers will help me alot waiting for your replies.
Thanks in advance.

You can have as many as you want. However, each additional index has a cost when updating, inserting or deleting. The trick is to
find common segments and make indexes for those.
Or create them as required when queries are too slow.
As an example, if you are "needing" indexes for ABCDE, ABDEF, and ABGIH then create an index on just AB

InnoDB supports up to 64 indexes per table (cf. https://dev.mysql.com/doc/refman/8.0/en/innodb-limits.html).
If you try to create a composite index for every permutation of N columns, you would need N-factorial indexes. So for 8 columns, you would need 40,320 indexes. Clearly this is more than InnoDB supports.
You probably don't need that many indexes. In practice, I've rarely seen more than 6 indexes in a given table. All queries that are needed are optimized by one of those.
I understand you said sometimes you change the terms in your query's WHERE clause, so it might need a composite index with different columns.
You can rely on indexes that have a subset of all the columns that would be optimal. That won't be 100% optimized, but it will be better than no index.
You can't predict the optimal set of indexes for a given query until you write that query.

There is a limit of 64 secondary indexes (at least in InnoDB).
Order the indexes so that columns being tested with = come first. (The order of those columns in the INDEX does not matter.)
The leftmost columns in an index are the most important.
There is little or now use in including more than one column that will be searched by a range.
Study your likely queries, and find the most common combinations of 2 or 3 columns; build indexes starting with those.
In your two examples (ABCDEFGH and ABCEF), ABC would work for both (assuming at least A and B are tested with =). If you do throw on more columns, that one INDEX can still be used for both cases.
Maybe you would what to declare both ABCDEFGH and BCEFA; This handles your ABCDEF case, plus cases that have B, but not A. (Remember 'leftmost'.)
Use the SlowLog to find the slowest queries and make better indexes for them.
More on indexing: Index Cookbook

Each index requires space on the disk to be stored, and time to be updated every time you update(/insert/delete) an indexed column value.
So as long as you don't run out of storage or write operations are too slow, because you have to update too many indexes, you are not limited to create as many specific indexes as you want.
This depends on your use case and should be measured with production like data.
A common solution would be to create one index specific for your most important query e.g. in your case ABCDE.
Other queries can still use the as many columns from left to right until there is a first difference. e.g. a query searching for ABCEF could still use ABC on the previous mentioned index.
To also utilise column E you could add a where condition to D to your query in a way you know it matches all values e.g. D < 100 if you know there are only values 1-99.

Related

Improve Mysql Select Query Performance [duplicate]

I've been using indexes on my MySQL databases for a while now but never properly learnt about them. Generally I put an index on any fields that I will be searching or selecting using a WHERE clause but sometimes it doesn't seem so black and white.
What are the best practices for MySQL indexes?
Example situations/dilemmas:
If a table has six columns and all of them are searchable, should I index all of them or none of them?
What are the negative performance impacts of indexing?
If I have a VARCHAR 2500 column which is searchable from parts of my site, should I index it?
You should definitely spend some time reading up on indexing, there's a lot written about it, and it's important to understand what's going on.
Broadly speaking, an index imposes an ordering on the rows of a table.
For simplicity's sake, imagine a table is just a big CSV file. Whenever a row is inserted, it's inserted at the end. So the "natural" ordering of the table is just the order in which rows were inserted.
Imagine you've got that CSV file loaded up in a very rudimentary spreadsheet application. All this spreadsheet does is display the data, and numbers the rows in sequential order.
Now imagine that you need to find all the rows that have some value "M" in the third column. Given what you have available, you have only one option. You scan the table checking the value of the third column for each row. If you've got a lot of rows, this method (a "table scan") can take a long time!
Now imagine that in addition to this table, you've got an index. This particular index is the index of values in the third column. The index lists all of the values from the third column, in some meaningful order (say, alphabetically) and for each of them, provides a list of row numbers where that value appears.
Now you have a good strategy for finding all the rows where the value of the third column is "M". For instance, you can perform a binary search! Whereas the table scan requires you to look N rows (where N is the number of rows), the binary search only requires that you look at log-n index entries, in the very worst case. Wow, that's sure a lot easier!
Of course, if you have this index, and you're adding rows to the table (at the end, since that's how our conceptual table works), you need to update the index each and every time. So you do a little more work while you're writing new rows, but you save a ton of time when you're searching for something.
So, in general, indexing creates a tradeoff between read efficiency and write efficiency. With no indexes, inserts can be very fast -- the database engine just adds a row to the table. As you add indexes, the engine must update each index while performing the insert.
On the other hand, reads become a lot faster.
Hopefully that covers your first two questions (as others have answered -- you need to find the right balance).
Your third scenario is a little more complicated. If you're using LIKE, indexing engines will typically help with your read speed up to the first "%". In other words, if you're SELECTing WHERE column LIKE 'foo%bar%', the database will use the index to find all the rows where column starts with "foo", and then need to scan that intermediate rowset to find the subset that contains "bar". SELECT ... WHERE column LIKE '%bar%' can't use the index. I hope you can see why.
Finally, you need to start thinking about indexes on more than one column. The concept is the same, and behaves similarly to the LIKE stuff -- essentially, if you have an index on (a,b,c), the engine will continue using the index from left to right as best it can. So a search on column a might use the (a,b,c) index, as would one on (a,b). However, the engine would need to do a full table scan if you were searching WHERE b=5 AND c=1)
Hopefully this helps shed a little light, but I must reiterate that you're best off spending a few hours digging around for good articles that explain these things in depth. It's also a good idea to read your particular database server's documentation. The way indices are implemented and used by query planners can vary pretty widely.
Check out presentations like More Mastering the Art of Indexing.
Update 12/2012: I have posted a new presentation of mine: How to Design Indexes, Really. I presented this in October 2012 at ZendCon in Santa Clara, and in December 2012 at Percona Live London.
Designing the best indexes is a process that has to match the queries you run in your app.
It's hard to recommend any general-purpose rules about which columns are best to index, or whether you should index all columns, no columns, which indexes should span multiple columns, etc. It depends on the queries you need to run.
Yes, there is some overhead so you shouldn't create indexes needlessly. But you should create the indexes that give benefit to the queries you need to run quickly. The overhead of an index is usually far outweighed by its benefit.
For a column that is VARCHAR(2500), you probably want to use a FULLTEXT index or a prefix index:
CREATE INDEX i ON SomeTable(longVarchar(100));
Note that a conventional index can't help if you're searching for words that may be in the middle of that long varchar. For that, use a fulltext index.
I won't repeat some of the good advice in other answers, but will add:
Compound Indices
You can create compound indices - an index that includes multiple columns. MySQL can use these from left to right. So if you have:
Table A
Id
Name
Category
Age
Description
if you have a compound index that includes Name/Category/Age in that order, these WHERE clauses would use the index:
WHERE Name='Eric' and Category='A'
WHERE Name='Eric' and Category='A' and Age > 18
but
WHERE Category='A' and Age > 18
would not use that index because everything has to be used from left to right.
Explain
Use Explain / Explain Extended to understand what indices are available to MySQL and which one it actually selects. MySQL will only use ONE key per query.
EXPLAIN EXTENDED SELECT * from Table WHERE Something='ABC'
Slow Query Log
Turn on the slow query log to see which queries are running slow.
Wide Columns
If you have a wide column where MOST of the distinction happens in the first several characters, you can use only the first N characters in your index. Example: We have a ReferenceNumber column defined as varchar(255) but 97% of the cases, the reference number is 10 characters or less. I changed the index to only look at the first 10 characters and improved performance quite a bit.
If a table has six columns and all of them are searchable, should i index all of them or none of them
Are you searching on a field by field basis or are some searches using multiple fields?
Which fields are most being searched on?
What are the field types? (Index works better on INTs than on VARCHARs for example)
Have you tried using EXPLAIN on the queries that are being run?
What are the negetive performance impacts of indexing
UPDATEs and INSERTs will be slower. There's also the extra storage space requirments, but that's usual unimportant these days.
If i have a VARCHAR 2500 column which is searchable from parts of my site, should i index it
No, unless it's UNIQUE (which means it's already indexed) or you only search for exact matches on that field (not using LIKE or mySQL's fulltext search).
Generally I put an index on any fields that i will be searching or selecting using a WHERE clause
I'd normally index the fields that are the most queried, and then INTs/BOOLEANs/ENUMs rather that fields that are VARCHARS. Don't forget, often you need to create an index on combined fields, rather than an index on an individual field. Use EXPLAIN, and check the slow log.
Load Data Efficiently: Indexes speed up retrievals but slow down inserts and deletes, as well as updates of values in indexed columns. That is, indexes slow down most operations that involve writing. This occurs because writing a row requires writing not only the data row, it requires changes to any indexes as well. The more indexes a table has, the more changes need to be made, and the greater the average performance degradation. Most tables receive many reads and few writes, but for a table with a high percentage of writes, the cost of index updating might be significant.
Avoid Indexes: If you don’t need a particular index to help queries perform better, don’t create it.
Disk Space: An index takes up disk space, and multiple indexes take up correspondingly more space. This might cause you to reach a table size limit more quickly than if there are no indexes. Avoid indexes wherever possible.
Takeaway: Don't over index
In general, indices help speedup database search, having the disadvantage of using extra disk space and slowing INSERT / UPDATE / DELETE queries. Use EXPLAIN and read the results to find out when MySQL uses your indices.
If a table has six columns and all of them are searchable, should i index all of them or none of them?
Indexing all six columns isn't always the best practice.
(a) Are you going to use any of those columns when searching for specific information?
(b) What is the selectivity of those columns (how many distinct values are there stored, in comparison to the total amount of records on the table)?
MySQL uses a cost-based optimizer, which tries to find the "cheapest" path when performing a query. And fields with low selectivity aren't good candidates.
What are the negetive performance impacts of indexing?
Already answered: extra disk space, lower performance during insert - update - delete.
If i have a VARCHAR 2500 column which is searchable from parts of my site, should i index it?
Try the FULLTEXT Index.
1/2) Indexes speed up certain select operations but they slow down other operations like insert, update and deletes. It can be a fine balance.
3) use a full text index or perhaps sphinx

Indexing mysql table for selects

I'm looking to add some mysql indexes to a database table. Most of the queries are for selects. Would it be best to create a separate index for each column, or add an index for province, province and number, and name. Does it even make sense to index province since there are only about a dozen options?
select * from employees where province = 'ab' and number = 'v45g';
select * from employees where province = 'ab';
If the usage changed to more inserts should I remove all the indexes except for the number?
An index is a data structure that maps the values of a column into a fast searchable tree. This tree contains the index of rows which the DB can use to find rows fast. One thing to know, some DB engines read plus or minus a bunch of rows to take advantage of disk read ahead. So you may actually read 50 or 100 rows per index read, and not just one. Hence, if you access 30% of a table through an index, you may wind up reading all table data multiple times.
Rule of thumb:
- index the more unique values, a tree with 2 branches and half of your table on either side is not too useful for narrowing down a search
- use as few index as possible
- use real world examples numbers as much as possible. Performance can change dynamically based on data or the whim of the DB engine, so it's very important to try and track how fast your queries are running consistently (ie: log this in case a query ever gets slow). But from this data you can add indexes without being blind
Okay, so there are multiple kinds of index, single and multiple column. You want multiple indexes when it makes sense for indexes to access each other, multiple columns typically when you are refining with a where clause. Think of the first as good when you want joins, or you have "or" conditions. The second is better when you have and conditions and successively filter rows.
In your case name does not make sense since like does not use index. city and number do make sense, probably as a multi-column index. Province could help as well as the last index.
So an index with these columns would likely help:
(number,city,province)
Or try as well just:
(number,city)
You should index fields that are searched upon and have high selectivity / cardinality. Indexes make writes slower.
Other thing is that indexes can be added and dropped at any time so maybe you should let this for a later review of the database and optimization of querys.
That being said one index that you can be sure to add is in the column that holds the name of a person. That's almost always used in searching.
According to MySQL documentation found here:
You can create multiple column indexes and the first column mentioned in the index declaration uses index when searched alone but not the others.
Documentation also says that if you create a hash of the columns and save in another column and index the hashed column the search could be faster then multiple indexes.
SELECT * FROM tbl_name
WHERE hash_col=MD5(CONCAT(val1,val2))
AND col1=val1 AND col2=val2;
You could use an unique index on province.

Is it needed to INDEX second column for use in WHERE clause?

Consider fetching data with
SELECT * FROM table WHERE column1='XX' && column2='XX'
Mysql will filter the results matching with the first part of WHERE clause, then with the second part. Am I right?
Imagine the first part match 10 records, and adding the second part filters 5 records. Is it needed to INDEX the second column too?
You are talking about short circuit evaluation. A DBMS has cost-based optimizer. There is no guarantee wich of both conditions will get evaluated first.
To apply that to your question: Yes, it might be benificial to index your second column.
Is it used regulary in searches?
What does the execution plan tell you?
Is the access pattern going to change in the near future?
How many records does the table contain?
Would a Covering Index would be a better choice?
...
Indexes are optional in MySQL, but they can increase performance.
Currently, MySQL can only use one index per table select, so with the given query, if you have an index on both column1 and column2, MySQL will try to determine the index that will be the most beneficial, and use only one.
The general solution, if the speed of the select is of utmost importance, is to create a multi-column index that includes both columns.
This way, even though MySQL could only use one index for the table, it would use the multi-column index that has both columns indexed, allowing MySQL to quickly filter on both criteria in the WHERE clause.
In the multi-column index, you would put the column with the highest cardinality (the highest number of distinct values) first.
For even further optimization, "covering" indexes can be applied in some cases.
Note that indexes can increase performance, but with some cost. Indexes increase memory and storage requirements. Also, when updating or inserting records into a table, the corresponding indexes require maintenance. All of these factors must be considered when implementing indexes.
Update: MySQL 5.0 can now use an index on more than one column by merging the results from each index, with a few caveats.
The following query is a good candidate for Index Merge Optimization:
SELECT * FROM t1 WHERE key1=1 AND key2=1
When processing such query RDBMS will use only one index. Having separate indices on both colums will allow it to choose one that will be faster.
Whether it's needed depends on your specific situation.
Is the query slow as it is now?
Would it be faster with index on another column?
Would it be faster with one index containing both
columns?
You may need to try and measure several approaches.
You don't have to INDEX the second column but it may speed up your SELECT

mysql - good pratice: table with more then one index?

Sorry all,
I do have this mini table with 3 columns but, I will list the values either by one or other column.
1) Is it ok to have two of those columns with indexes?
2) Should we have only one index per table?
3) At the limit, if we have a table with 100 columns for example, and we have 50 of them with indexes, is this ok?
Thanks,
MEM
It's fine to have many indexes, even multiple indexes on one table, as long as you are using them.
Run EXPLAIN on your queries and check which indexes are being used. If there are indexes that are not being used by any queries then they aren't giving you any benefit and are just slowing down modifications to the table. These unused indexes should be removed.
You may also want to consider multiple-column indexes if you have not already done so.
To quickly answer your questions, I think:
Yes, it's OK to have two or even more columns with indexes
Not necessarily. You can have more than one index per table.
It might be OK, it might be not OK. It depends. The thing with indexes is that they take space (in DISK) and they make operations that modify your data (INSERT/UPDATE/DELETE) slower since for each and everyone of them, there will be a TABLE and INDEX update involved.
Your index creation should be driven by your queries. See what queries are you going to have on your system and create indexes accordingly.
There is no problem with having more than one index per table, and no, one index per table isn't really a guideline.
Having too many indexes is inefficient because
It requires additional storage
MySQL needs to determine which index to use for a particular query
Edit : As per Pablo and Mark, you need to understand how your data is being accessed in order for you to build effective indices. You can then refine and reduce the indexes.

MySQL indexes - what are the best practices?

I've been using indexes on my MySQL databases for a while now but never properly learnt about them. Generally I put an index on any fields that I will be searching or selecting using a WHERE clause but sometimes it doesn't seem so black and white.
What are the best practices for MySQL indexes?
Example situations/dilemmas:
If a table has six columns and all of them are searchable, should I index all of them or none of them?
What are the negative performance impacts of indexing?
If I have a VARCHAR 2500 column which is searchable from parts of my site, should I index it?
You should definitely spend some time reading up on indexing, there's a lot written about it, and it's important to understand what's going on.
Broadly speaking, an index imposes an ordering on the rows of a table.
For simplicity's sake, imagine a table is just a big CSV file. Whenever a row is inserted, it's inserted at the end. So the "natural" ordering of the table is just the order in which rows were inserted.
Imagine you've got that CSV file loaded up in a very rudimentary spreadsheet application. All this spreadsheet does is display the data, and numbers the rows in sequential order.
Now imagine that you need to find all the rows that have some value "M" in the third column. Given what you have available, you have only one option. You scan the table checking the value of the third column for each row. If you've got a lot of rows, this method (a "table scan") can take a long time!
Now imagine that in addition to this table, you've got an index. This particular index is the index of values in the third column. The index lists all of the values from the third column, in some meaningful order (say, alphabetically) and for each of them, provides a list of row numbers where that value appears.
Now you have a good strategy for finding all the rows where the value of the third column is "M". For instance, you can perform a binary search! Whereas the table scan requires you to look N rows (where N is the number of rows), the binary search only requires that you look at log-n index entries, in the very worst case. Wow, that's sure a lot easier!
Of course, if you have this index, and you're adding rows to the table (at the end, since that's how our conceptual table works), you need to update the index each and every time. So you do a little more work while you're writing new rows, but you save a ton of time when you're searching for something.
So, in general, indexing creates a tradeoff between read efficiency and write efficiency. With no indexes, inserts can be very fast -- the database engine just adds a row to the table. As you add indexes, the engine must update each index while performing the insert.
On the other hand, reads become a lot faster.
Hopefully that covers your first two questions (as others have answered -- you need to find the right balance).
Your third scenario is a little more complicated. If you're using LIKE, indexing engines will typically help with your read speed up to the first "%". In other words, if you're SELECTing WHERE column LIKE 'foo%bar%', the database will use the index to find all the rows where column starts with "foo", and then need to scan that intermediate rowset to find the subset that contains "bar". SELECT ... WHERE column LIKE '%bar%' can't use the index. I hope you can see why.
Finally, you need to start thinking about indexes on more than one column. The concept is the same, and behaves similarly to the LIKE stuff -- essentially, if you have an index on (a,b,c), the engine will continue using the index from left to right as best it can. So a search on column a might use the (a,b,c) index, as would one on (a,b). However, the engine would need to do a full table scan if you were searching WHERE b=5 AND c=1)
Hopefully this helps shed a little light, but I must reiterate that you're best off spending a few hours digging around for good articles that explain these things in depth. It's also a good idea to read your particular database server's documentation. The way indices are implemented and used by query planners can vary pretty widely.
Check out presentations like More Mastering the Art of Indexing.
Update 12/2012: I have posted a new presentation of mine: How to Design Indexes, Really. I presented this in October 2012 at ZendCon in Santa Clara, and in December 2012 at Percona Live London.
Designing the best indexes is a process that has to match the queries you run in your app.
It's hard to recommend any general-purpose rules about which columns are best to index, or whether you should index all columns, no columns, which indexes should span multiple columns, etc. It depends on the queries you need to run.
Yes, there is some overhead so you shouldn't create indexes needlessly. But you should create the indexes that give benefit to the queries you need to run quickly. The overhead of an index is usually far outweighed by its benefit.
For a column that is VARCHAR(2500), you probably want to use a FULLTEXT index or a prefix index:
CREATE INDEX i ON SomeTable(longVarchar(100));
Note that a conventional index can't help if you're searching for words that may be in the middle of that long varchar. For that, use a fulltext index.
I won't repeat some of the good advice in other answers, but will add:
Compound Indices
You can create compound indices - an index that includes multiple columns. MySQL can use these from left to right. So if you have:
Table A
Id
Name
Category
Age
Description
if you have a compound index that includes Name/Category/Age in that order, these WHERE clauses would use the index:
WHERE Name='Eric' and Category='A'
WHERE Name='Eric' and Category='A' and Age > 18
but
WHERE Category='A' and Age > 18
would not use that index because everything has to be used from left to right.
Explain
Use Explain / Explain Extended to understand what indices are available to MySQL and which one it actually selects. MySQL will only use ONE key per query.
EXPLAIN EXTENDED SELECT * from Table WHERE Something='ABC'
Slow Query Log
Turn on the slow query log to see which queries are running slow.
Wide Columns
If you have a wide column where MOST of the distinction happens in the first several characters, you can use only the first N characters in your index. Example: We have a ReferenceNumber column defined as varchar(255) but 97% of the cases, the reference number is 10 characters or less. I changed the index to only look at the first 10 characters and improved performance quite a bit.
If a table has six columns and all of them are searchable, should i index all of them or none of them
Are you searching on a field by field basis or are some searches using multiple fields?
Which fields are most being searched on?
What are the field types? (Index works better on INTs than on VARCHARs for example)
Have you tried using EXPLAIN on the queries that are being run?
What are the negetive performance impacts of indexing
UPDATEs and INSERTs will be slower. There's also the extra storage space requirments, but that's usual unimportant these days.
If i have a VARCHAR 2500 column which is searchable from parts of my site, should i index it
No, unless it's UNIQUE (which means it's already indexed) or you only search for exact matches on that field (not using LIKE or mySQL's fulltext search).
Generally I put an index on any fields that i will be searching or selecting using a WHERE clause
I'd normally index the fields that are the most queried, and then INTs/BOOLEANs/ENUMs rather that fields that are VARCHARS. Don't forget, often you need to create an index on combined fields, rather than an index on an individual field. Use EXPLAIN, and check the slow log.
Load Data Efficiently: Indexes speed up retrievals but slow down inserts and deletes, as well as updates of values in indexed columns. That is, indexes slow down most operations that involve writing. This occurs because writing a row requires writing not only the data row, it requires changes to any indexes as well. The more indexes a table has, the more changes need to be made, and the greater the average performance degradation. Most tables receive many reads and few writes, but for a table with a high percentage of writes, the cost of index updating might be significant.
Avoid Indexes: If you don’t need a particular index to help queries perform better, don’t create it.
Disk Space: An index takes up disk space, and multiple indexes take up correspondingly more space. This might cause you to reach a table size limit more quickly than if there are no indexes. Avoid indexes wherever possible.
Takeaway: Don't over index
In general, indices help speedup database search, having the disadvantage of using extra disk space and slowing INSERT / UPDATE / DELETE queries. Use EXPLAIN and read the results to find out when MySQL uses your indices.
If a table has six columns and all of them are searchable, should i index all of them or none of them?
Indexing all six columns isn't always the best practice.
(a) Are you going to use any of those columns when searching for specific information?
(b) What is the selectivity of those columns (how many distinct values are there stored, in comparison to the total amount of records on the table)?
MySQL uses a cost-based optimizer, which tries to find the "cheapest" path when performing a query. And fields with low selectivity aren't good candidates.
What are the negetive performance impacts of indexing?
Already answered: extra disk space, lower performance during insert - update - delete.
If i have a VARCHAR 2500 column which is searchable from parts of my site, should i index it?
Try the FULLTEXT Index.
1/2) Indexes speed up certain select operations but they slow down other operations like insert, update and deletes. It can be a fine balance.
3) use a full text index or perhaps sphinx