Is a single-column index needed when having multicolumn index? - mysql

I've been dropped in a system that was poorly designed. Now I'm doing DBA on their DB, and I have a lot of situation like the following (Pseudo-code):
Table t1{
c1;
c2;
c3;
c4;
key(c1);
key(c2);
key(c1,c2);
key(c1,c2,c3);}
Are the single column indexes really necessary, since I already have a multicolumn one containing those columns?
Or on the other hand - is the multiline column needed since I already have the single column ones?

A short excerpt from the documentation page about how MySQL uses indexes:
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3). For more information, see Section 8.3.5, “Multiple-Column Indexes”.
You better remove the indexes on (c1) and (c1,c2). They are not used but they use storage space and consume processor power to be kept up-to-date when the table data changes.

The single column index on c1 is redundant. The two column index is redundant with the three column index as well.
The only two indexes you need are on (c2) and (c1, c2, c3).
MySQL has pretty good documentation on composite indexes.

Related

Unique first column in multi-column index

I have multi-column index for 2 columns. Can I make first column unique without making separate index for that?
If I understand correctly mysql can use only first column in this index for lookups, so can it use it to detect uniqueness?
The short answer is "No". Because it doesn't make much sense.
Indeed, MySQL is able to use a multiple-column index for operations that use only the leftmost "n" columns from the index definition.
Let's say you have an index on columns (col1, col2). MySQL can use it to find records matching conditions on both col1 and col2, GROUP BY col1, col2 or ORDER BY col1, col2. It is important to notice that col1 and col2 needs to used in this order in the GROUP BY or ORDER BY clause. Their order doesn't matter on WHERE or ON clauses as long as both are used.
MySQL can also use the same index for WHERE or ON conditions and GROUP BY or ORDER BY clauses that contain only col1. It cannot, however, use the index if col2 appears without col1.
What happens when you have an index on columns (col1, col2) and all the rows have distinct values in column col1?
Let's assume we have a table that have distinct values in column col1 and it has an index on columns (col1, col2). When MySQL needs to find the rows that match WHERE col1 = val1 AND col2 = val2, by consulting the index it can find the row that have col1 = val1. It doesn't need to use the index to refine the list of candidate rows because there is no list: there is at most one row having col1 = val1.
Sure, most of the times MySQL will use the index to check if col2 = val2 but having col2 in this index doesn't bring more useful information to the index. The storage space it takes and the processing power it uses on table data updates are too big for the tiny contribution it adds to rows searching.
The whole purpose of having indexes on multiple columns is to help searching by shrinking the list of matching rows for a given set of values when the columns included in a multiple-column index cannot be used individually because they don't contain enough distinct values.
Technically speaking, there is no way to tell MySQL you want to have a multiple-column index on (col1, col2) that must have unique values on col1. Create an UNIQUE INDEX on col1 instead. Then think about the data you have in the table and the queries you run against it and decide if another index on col2 only isn't better than the multiple-column index on (col1, col2).
In order to decide you can create the new indexes (UNIQUE on col1, INDEX on col2), put EXPLAIN in front of the most frequent queries you run on the table and check what index will pick MySQL up for use.
You need to have enough data (thousands of rows, at least, more is better) in the table to get accurate results.
You asked.
I have multi-column index for 2 columns. Can I make first column unique without making separate index for that?
The answer is no. You need a separate unique index on the first column to enforce a uniqueness constraint.

MySQL how are indexes used in this example?

This is probably in the MySQL documentation, but I have not been able to find it. So I know that if I'm selecting a record from a database, the fastest results are when the fields I'm selecting and the fields in the WHERE clause are parts of an index. Say that I have a statement like this:
SELECT a FROM t1 WHERE b=X AND c=Y
What key or combination of keys would give me the fastest result?
Option 1: one key that's (a, b, c).
Option 2: one key that's (b, c) because those are in the where statement.
Option 3: one key that's (b, c, a) because b and c are in the where statement, and a is the value that ultimately needs to be looked up. (Seems logical to me, but I have no idea if this makes any MySQL sense...)
Options 4: two keys, one that's (b, c) and one that is just (a).
Sorry, I'm a really MySQL newbie...
In your case a composite index on (b,c) should do the job. You do not need an index on a since it is not in your WHERE clause. Its presence in the SELECT list doesn't affect how the rest of the query has to be indexed.
You could also use (b,c,a) in that order since MySQL will use column combinations in composite indexes starting from left to right. That isn't necessary for this use case but could future-proof your code if you ever did need to query all three columns Indexing (a,b,c) would not work in this query for that reason.
WHERE b='X' AND c='Y' AND z='Z'
From the MySQL docs on index usage
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).
As always, when in doubt, check the query's execution plan after creating your index to verify that it can be used as intended.
EXPLAIN SELECT a FROM t1 WHERE b='X' AND c='Y'

Efficiency of multicolumn indexes in MySQL

If I have a MyISAM table with a 3-column index, something like
create table t (
a int,
b int,
c int,
index abc (a, b, c)
) engine=MyISAM;
the question is, can the following query fully utilize the index:
select * from t where a=1 and c=2;
in other words, considering that an index is a b-tree, can MySQL skip the column in the middle and still do a quick search on first and last columns?
EXPLAIN does seem to be showing that the index will be used, however, the Extra says: Using where; Using index and I have no idea what this really means.
The answer is "no".
The MySQL documentation is quite clear on how indexes are used:
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3). (http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html.)
What happens is that the index gets used for "a=1". All records that match are loaded, to see if "c=2" is true. The filter ends up using a combination of indexes and explicit record filtering.
By the way, if you want to handle all combinations of two columns, you need several indexes:
(a, b, c)
(b, a, c)
(c, b, a)
Even if you are using an index for all parts of a WHERE clause, you
may see Using where if the column can be NULL.
As per MySQL documentation, the above statement clarifies that the column in your table could be null and hence it says using where as well though it has covering index for fields in where clause.
http://dev.mysql.com/doc/refman/5.1/en/explain-output.html#explain-extra-information

Mysql covering vs composite vs column index

In the following query
SELECT col1,col2
FROM table1
WHERE col3='value1'
AND col4='value2'
If I have 2 separate indexes one on col3 and the other on col4, Which one of them will be used in this query ?
I read somewhere that for each table in the query only one index is used. Does that mean that there is no way for the query to use both indexes ?
Secondly, If I created a composite index using both col3 and col4 together but used only col3 in the WHERE clause will that be worse for the performance?
example:
SELECT col1,col2
FROM table1
WHERE col3='value1'
Lastly, Is it better to just use Covering indexes in all cases ? and does it differ between MYISAM and innodb storage engines ?
A covering index is not the same as a composite index.
If I have 2 separate indexes one on col3 and the other on col4, Which one of them will be used in this query ?
The index with the highest cardinality.
MySQL keeps statistics on which index has what properties.
The index that has the most discriminating power (as evident in MySQL's statistics) will be used.
I read somewhere that for each table in the query only one index is used. Does that mean that there is no way for the query to used both indexes ?
You can use a subselect.
Or even better use a compound index that includes both col3 and col4.
Secondly, If I created a composite index using both col3 and col4 together but used only col3 in the WHERE clause will that be worse for the performance? example:
Compound index
The correct term is compound index, not composite.
Only the left-most part of the compound index will be used.
So if the index is defined as
index myindex (col3, col4) <<-- will work with your example.
index myindex (col4, col3) <<-- will not work.
See: http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html
Note that if you select a left-most field, you can get away with not using that part of the index in your where clause.
Imagine we have a compound index
Myindex(col1,col2)
SELECT col1 FROM table1 WHERE col2 = 200 <<-- will use index, but not efficiently
SELECT * FROM table1 where col2 = 200 <<-- will NOT use index.
The reason this works is that the first query uses the covering index and does a scan on that.
The second query needs to access the table and for that reason scanning though the index does not make sense.
This only works in InnoDB.
What's a covering index
A covering index refers to the case when all fields selected in a query are covered by an index, in that case InnoDB (not MyISAM) will never read the data in the table, but only use the data in the index, significantly speeding up the select.
Note that in InnoDB the primary key is included in all secondary indexes, so in a way all secondary indexes are compound indexes.
This means that if you run the following query on InnoDB:
SELECT indexed_field FROM table1 WHERE pk = something
MySQL will always use a covering index and will not access the actual table. Although it could use a covering index, it will prefer the PRIMARY KEY because it only needs to hit a single row.
I upvoted Johan's answer for completeness, but I think the following statement he makes regarding secondary indexes is incorrect and/or confusing;
Note that in InnoDB the primary key is included in all secondary indexes,
so in a way all secondary indexes are compound indexes.
This means that if you run the following query on InnoDB:
SELECT indexed_field FROM table1 WHERE pk = something
MySQL will always use a covering index and will not access the actual table.
While I agree the primary key is INCLUDED in the secondary index, I do not agree MySQL "will always use a covering index" in the SELECT query specified here.
To see why, note that a full index "scan" is always required in this case. This is not the same as a "seek" operation, but is instead a 100% scan of the secondary index contents. This is due to the fact the secondary index is not ordered by the primary key; it is ordered by "indexed_field" (otherwise it would not be much use as an index!).
In light of this latter fact, there will be cases where it is more efficient to "seek" the primary key, and then extract indexed_field "from the actual table," not from the secondary index.
This is a question I hear a lot and there is a lot of confusion around the issues due to:
The differences in mySQL over the years.
Indexes and multiple index support changed over the years (towards being supported)
the InnoDB / myISAM differences
There are some key differences (below) but I do not believe multiple indexes are one of them
MyISAM is older but proven. Data in MyISAM tables is split between three different files for:- table format, data, and indexes.
InnoDB is relatively newer than MyISAM and is transaction safe. InnoDB also provides row-locking as opposed to table-locking which increases multi-user concurrency and performance. InnoDB also has foreign-key constraints.
Because of its row-locking feature InnoDB is well suited to high load environments.
To be sure about things, make sure to use explain_plan to analyze the query execution.
Compound index is not the same as a composite index.
Composite index covers all the columns in your filter, join and select criteria. All of these columns are stored on all of the index pages accordingly throughout the index B-tree.
Compound index covers all the filter and join key columns in the B-tree, but keeps the select columns only on the leaf pages as they will not be searched, rather only extracted!
This saves space and consequently creates less index pages, hence faster I/O.

Do mysql composite indexes make some other indexes completely redundant?

If I have an a composite index on (a, b) I understand that queries only concerned with 'a' will still use the composite index (but not queries concerned with 'b')
My question is whether there is any valid reason to have a single-column index on 'a' if I have the (a, b) index? What I've read has seemed vague as to whether the (a,b) index was a complete substitute for a, or merely a "better than nothing" index.
This assumes that I do filtering by both a and a,b. I have a table with way too many indexes that is hurting write performance and want to double check before dropping indexes that I'm only fairly sure are not doing any good.
Also, does this answer change depending on whether I am using InnoDb or MyISAM? The table concerned is MyISAM, but most of our tables are InnoDb.
Your (a,b) index will also handle queries involving only 'a' and there is no need for an index on (a) alone.
From the documentation:
If the table has a multiple-column
index, any leftmost prefix of the
index can be used by the optimizer to
find rows.
For example, if you have a
three-column index on (col1, col2,
col3), you have indexed search
capabilities on (col1), (col1, col2),
and (col1, col2, col3).
My question is whether there is any valid reason to have a single-column index on 'a' if I have the (a, b) index?
No, there is no reason to have an index on (a) if you have one on (a,b)