Do mysql composite indexes make some other indexes completely redundant? - mysql

If I have an a composite index on (a, b) I understand that queries only concerned with 'a' will still use the composite index (but not queries concerned with 'b')
My question is whether there is any valid reason to have a single-column index on 'a' if I have the (a, b) index? What I've read has seemed vague as to whether the (a,b) index was a complete substitute for a, or merely a "better than nothing" index.
This assumes that I do filtering by both a and a,b. I have a table with way too many indexes that is hurting write performance and want to double check before dropping indexes that I'm only fairly sure are not doing any good.
Also, does this answer change depending on whether I am using InnoDb or MyISAM? The table concerned is MyISAM, but most of our tables are InnoDb.

Your (a,b) index will also handle queries involving only 'a' and there is no need for an index on (a) alone.
From the documentation:
If the table has a multiple-column
index, any leftmost prefix of the
index can be used by the optimizer to
find rows.
For example, if you have a
three-column index on (col1, col2,
col3), you have indexed search
capabilities on (col1), (col1, col2),
and (col1, col2, col3).

My question is whether there is any valid reason to have a single-column index on 'a' if I have the (a, b) index?
No, there is no reason to have an index on (a) if you have one on (a,b)

Related

Is a single-column index needed when having multicolumn index?

I've been dropped in a system that was poorly designed. Now I'm doing DBA on their DB, and I have a lot of situation like the following (Pseudo-code):
Table t1{
c1;
c2;
c3;
c4;
key(c1);
key(c2);
key(c1,c2);
key(c1,c2,c3);}
Are the single column indexes really necessary, since I already have a multicolumn one containing those columns?
Or on the other hand - is the multiline column needed since I already have the single column ones?
A short excerpt from the documentation page about how MySQL uses indexes:
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3). For more information, see Section 8.3.5, “Multiple-Column Indexes”.
You better remove the indexes on (c1) and (c1,c2). They are not used but they use storage space and consume processor power to be kept up-to-date when the table data changes.
The single column index on c1 is redundant. The two column index is redundant with the three column index as well.
The only two indexes you need are on (c2) and (c1, c2, c3).
MySQL has pretty good documentation on composite indexes.

MySQL how are indexes used in this example?

This is probably in the MySQL documentation, but I have not been able to find it. So I know that if I'm selecting a record from a database, the fastest results are when the fields I'm selecting and the fields in the WHERE clause are parts of an index. Say that I have a statement like this:
SELECT a FROM t1 WHERE b=X AND c=Y
What key or combination of keys would give me the fastest result?
Option 1: one key that's (a, b, c).
Option 2: one key that's (b, c) because those are in the where statement.
Option 3: one key that's (b, c, a) because b and c are in the where statement, and a is the value that ultimately needs to be looked up. (Seems logical to me, but I have no idea if this makes any MySQL sense...)
Options 4: two keys, one that's (b, c) and one that is just (a).
Sorry, I'm a really MySQL newbie...
In your case a composite index on (b,c) should do the job. You do not need an index on a since it is not in your WHERE clause. Its presence in the SELECT list doesn't affect how the rest of the query has to be indexed.
You could also use (b,c,a) in that order since MySQL will use column combinations in composite indexes starting from left to right. That isn't necessary for this use case but could future-proof your code if you ever did need to query all three columns Indexing (a,b,c) would not work in this query for that reason.
WHERE b='X' AND c='Y' AND z='Z'
From the MySQL docs on index usage
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).
As always, when in doubt, check the query's execution plan after creating your index to verify that it can be used as intended.
EXPLAIN SELECT a FROM t1 WHERE b='X' AND c='Y'

Efficiency of multicolumn indexes in MySQL

If I have a MyISAM table with a 3-column index, something like
create table t (
a int,
b int,
c int,
index abc (a, b, c)
) engine=MyISAM;
the question is, can the following query fully utilize the index:
select * from t where a=1 and c=2;
in other words, considering that an index is a b-tree, can MySQL skip the column in the middle and still do a quick search on first and last columns?
EXPLAIN does seem to be showing that the index will be used, however, the Extra says: Using where; Using index and I have no idea what this really means.
The answer is "no".
The MySQL documentation is quite clear on how indexes are used:
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3). (http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html.)
What happens is that the index gets used for "a=1". All records that match are loaded, to see if "c=2" is true. The filter ends up using a combination of indexes and explicit record filtering.
By the way, if you want to handle all combinations of two columns, you need several indexes:
(a, b, c)
(b, a, c)
(c, b, a)
Even if you are using an index for all parts of a WHERE clause, you
may see Using where if the column can be NULL.
As per MySQL documentation, the above statement clarifies that the column in your table could be null and hence it says using where as well though it has covering index for fields in where clause.
http://dev.mysql.com/doc/refman/5.1/en/explain-output.html#explain-extra-information

When is one index better than two in MYSQL

I have read about when you are making a multicolumn index that the order matters and that typically you want the columns which will appear in the WHERE clauses first before others that would be in ORDER BY etc. However, won't you also get a speed up if you just index each one separately? (apparently not as my own experiments show me that the combined index behavior can be much faster than simply having each one separately indexed). When should you use a multicolumn index and what types of queries does it give a boost to?
A multi-column index will be the most effective for situations where all criteria are part of the multi-column index - even more so than having multiple single indexes.
Also, keep in mind, if you have a multi-column index, but don't utilize the first column indexed by the multi-column index (or don't otherwise start from the beginning and stay adjacent to previously-used indexes), the multi-column index won't have any benefit. (For example, if I have an index over columns [B, C, D], but have a WHERE that only uses [C, D], this multi-column index will have no benefit.)
Reference: http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html :
MySQL can use multiple-column indexes for queries that test all the
columns in the index, or queries that test just the first column, the
first two columns, the first three columns, and so on. If you specify
the columns in the right order in the index definition, a single
composite index can speed up several kinds of queries on the same
table.
The short answer is, "it depends."
The long answer is this: if you sometimes do
WHERE COL1 = value1 and COL2 = value2
and sometimes do
WHERE COL1 = value1
but never, or almost never, do
WHERE COL2 = value2
then a compound index on (COL1, COL2) will be your best choice. (True for mySQL as well as other makes and models of DBMS.)
More indexes slow down INSERT and UPDATE operations, so they aren't free.
A compound index (c1,c2) is very useful in the following cases:
The obvious case where c1=v1 AND c2=v2
If you do WHERE c1=v1 AND c2 BETWEEN v2 and v3 - this does a "range scan" on the composite index
SELECT ... FROM t WHERE c1=v1 ORDER BY c2 - in this case it can use the index to sort the results.
In some cases, as a "covering index", e.g. "SELECT c1,c2 FROM t WHERE c1=4" can ignore the table contents, and just fetch a (range) from the index.

Mysql covering vs composite vs column index

In the following query
SELECT col1,col2
FROM table1
WHERE col3='value1'
AND col4='value2'
If I have 2 separate indexes one on col3 and the other on col4, Which one of them will be used in this query ?
I read somewhere that for each table in the query only one index is used. Does that mean that there is no way for the query to use both indexes ?
Secondly, If I created a composite index using both col3 and col4 together but used only col3 in the WHERE clause will that be worse for the performance?
example:
SELECT col1,col2
FROM table1
WHERE col3='value1'
Lastly, Is it better to just use Covering indexes in all cases ? and does it differ between MYISAM and innodb storage engines ?
A covering index is not the same as a composite index.
If I have 2 separate indexes one on col3 and the other on col4, Which one of them will be used in this query ?
The index with the highest cardinality.
MySQL keeps statistics on which index has what properties.
The index that has the most discriminating power (as evident in MySQL's statistics) will be used.
I read somewhere that for each table in the query only one index is used. Does that mean that there is no way for the query to used both indexes ?
You can use a subselect.
Or even better use a compound index that includes both col3 and col4.
Secondly, If I created a composite index using both col3 and col4 together but used only col3 in the WHERE clause will that be worse for the performance? example:
Compound index
The correct term is compound index, not composite.
Only the left-most part of the compound index will be used.
So if the index is defined as
index myindex (col3, col4) <<-- will work with your example.
index myindex (col4, col3) <<-- will not work.
See: http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html
Note that if you select a left-most field, you can get away with not using that part of the index in your where clause.
Imagine we have a compound index
Myindex(col1,col2)
SELECT col1 FROM table1 WHERE col2 = 200 <<-- will use index, but not efficiently
SELECT * FROM table1 where col2 = 200 <<-- will NOT use index.
The reason this works is that the first query uses the covering index and does a scan on that.
The second query needs to access the table and for that reason scanning though the index does not make sense.
This only works in InnoDB.
What's a covering index
A covering index refers to the case when all fields selected in a query are covered by an index, in that case InnoDB (not MyISAM) will never read the data in the table, but only use the data in the index, significantly speeding up the select.
Note that in InnoDB the primary key is included in all secondary indexes, so in a way all secondary indexes are compound indexes.
This means that if you run the following query on InnoDB:
SELECT indexed_field FROM table1 WHERE pk = something
MySQL will always use a covering index and will not access the actual table. Although it could use a covering index, it will prefer the PRIMARY KEY because it only needs to hit a single row.
I upvoted Johan's answer for completeness, but I think the following statement he makes regarding secondary indexes is incorrect and/or confusing;
Note that in InnoDB the primary key is included in all secondary indexes,
so in a way all secondary indexes are compound indexes.
This means that if you run the following query on InnoDB:
SELECT indexed_field FROM table1 WHERE pk = something
MySQL will always use a covering index and will not access the actual table.
While I agree the primary key is INCLUDED in the secondary index, I do not agree MySQL "will always use a covering index" in the SELECT query specified here.
To see why, note that a full index "scan" is always required in this case. This is not the same as a "seek" operation, but is instead a 100% scan of the secondary index contents. This is due to the fact the secondary index is not ordered by the primary key; it is ordered by "indexed_field" (otherwise it would not be much use as an index!).
In light of this latter fact, there will be cases where it is more efficient to "seek" the primary key, and then extract indexed_field "from the actual table," not from the secondary index.
This is a question I hear a lot and there is a lot of confusion around the issues due to:
The differences in mySQL over the years.
Indexes and multiple index support changed over the years (towards being supported)
the InnoDB / myISAM differences
There are some key differences (below) but I do not believe multiple indexes are one of them
MyISAM is older but proven. Data in MyISAM tables is split between three different files for:- table format, data, and indexes.
InnoDB is relatively newer than MyISAM and is transaction safe. InnoDB also provides row-locking as opposed to table-locking which increases multi-user concurrency and performance. InnoDB also has foreign-key constraints.
Because of its row-locking feature InnoDB is well suited to high load environments.
To be sure about things, make sure to use explain_plan to analyze the query execution.
Compound index is not the same as a composite index.
Composite index covers all the columns in your filter, join and select criteria. All of these columns are stored on all of the index pages accordingly throughout the index B-tree.
Compound index covers all the filter and join key columns in the B-tree, but keeps the select columns only on the leaf pages as they will not be searched, rather only extracted!
This saves space and consequently creates less index pages, hence faster I/O.