Efficiency of multicolumn indexes in MySQL - mysql

If I have a MyISAM table with a 3-column index, something like
create table t (
a int,
b int,
c int,
index abc (a, b, c)
) engine=MyISAM;
the question is, can the following query fully utilize the index:
select * from t where a=1 and c=2;
in other words, considering that an index is a b-tree, can MySQL skip the column in the middle and still do a quick search on first and last columns?
EXPLAIN does seem to be showing that the index will be used, however, the Extra says: Using where; Using index and I have no idea what this really means.

The answer is "no".
The MySQL documentation is quite clear on how indexes are used:
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3). (http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html.)
What happens is that the index gets used for "a=1". All records that match are loaded, to see if "c=2" is true. The filter ends up using a combination of indexes and explicit record filtering.
By the way, if you want to handle all combinations of two columns, you need several indexes:
(a, b, c)
(b, a, c)
(c, b, a)

Even if you are using an index for all parts of a WHERE clause, you
may see Using where if the column can be NULL.
As per MySQL documentation, the above statement clarifies that the column in your table could be null and hence it says using where as well though it has covering index for fields in where clause.
http://dev.mysql.com/doc/refman/5.1/en/explain-output.html#explain-extra-information

Related

Is a single-column index needed when having multicolumn index?

I've been dropped in a system that was poorly designed. Now I'm doing DBA on their DB, and I have a lot of situation like the following (Pseudo-code):
Table t1{
c1;
c2;
c3;
c4;
key(c1);
key(c2);
key(c1,c2);
key(c1,c2,c3);}
Are the single column indexes really necessary, since I already have a multicolumn one containing those columns?
Or on the other hand - is the multiline column needed since I already have the single column ones?
A short excerpt from the documentation page about how MySQL uses indexes:
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3). For more information, see Section 8.3.5, “Multiple-Column Indexes”.
You better remove the indexes on (c1) and (c1,c2). They are not used but they use storage space and consume processor power to be kept up-to-date when the table data changes.
The single column index on c1 is redundant. The two column index is redundant with the three column index as well.
The only two indexes you need are on (c2) and (c1, c2, c3).
MySQL has pretty good documentation on composite indexes.

Unique first column in multi-column index

I have multi-column index for 2 columns. Can I make first column unique without making separate index for that?
If I understand correctly mysql can use only first column in this index for lookups, so can it use it to detect uniqueness?
The short answer is "No". Because it doesn't make much sense.
Indeed, MySQL is able to use a multiple-column index for operations that use only the leftmost "n" columns from the index definition.
Let's say you have an index on columns (col1, col2). MySQL can use it to find records matching conditions on both col1 and col2, GROUP BY col1, col2 or ORDER BY col1, col2. It is important to notice that col1 and col2 needs to used in this order in the GROUP BY or ORDER BY clause. Their order doesn't matter on WHERE or ON clauses as long as both are used.
MySQL can also use the same index for WHERE or ON conditions and GROUP BY or ORDER BY clauses that contain only col1. It cannot, however, use the index if col2 appears without col1.
What happens when you have an index on columns (col1, col2) and all the rows have distinct values in column col1?
Let's assume we have a table that have distinct values in column col1 and it has an index on columns (col1, col2). When MySQL needs to find the rows that match WHERE col1 = val1 AND col2 = val2, by consulting the index it can find the row that have col1 = val1. It doesn't need to use the index to refine the list of candidate rows because there is no list: there is at most one row having col1 = val1.
Sure, most of the times MySQL will use the index to check if col2 = val2 but having col2 in this index doesn't bring more useful information to the index. The storage space it takes and the processing power it uses on table data updates are too big for the tiny contribution it adds to rows searching.
The whole purpose of having indexes on multiple columns is to help searching by shrinking the list of matching rows for a given set of values when the columns included in a multiple-column index cannot be used individually because they don't contain enough distinct values.
Technically speaking, there is no way to tell MySQL you want to have a multiple-column index on (col1, col2) that must have unique values on col1. Create an UNIQUE INDEX on col1 instead. Then think about the data you have in the table and the queries you run against it and decide if another index on col2 only isn't better than the multiple-column index on (col1, col2).
In order to decide you can create the new indexes (UNIQUE on col1, INDEX on col2), put EXPLAIN in front of the most frequent queries you run on the table and check what index will pick MySQL up for use.
You need to have enough data (thousands of rows, at least, more is better) in the table to get accurate results.
You asked.
I have multi-column index for 2 columns. Can I make first column unique without making separate index for that?
The answer is no. You need a separate unique index on the first column to enforce a uniqueness constraint.

Are the MySQL query performance benefits of indices retained if a subset of the index columns are used in a query?

Do I get to keep the performance and efficiency advantages of having an index setup for multiple columns on a MySQL table if I run a SELECT statement that queries some subset of those columns in the index?
So, if I have an index setup on columns A, B and C but my statement only queries for columns A and B, is that the same as having no index setup at all. Do I need to have another index setup exclusively for A and B to gain any performance benefits with queries?
Short answer to a generic question: It's depends
Long answer:
The DB build the explain plan based on the statistics of the table. basically the DB engine estimates how much it "effort" it takes for every operation the two main factors in this case are the indexed data size and distribution of the indexed data.
Data distribution
If the first two columns data granularity is low (a few possible value for example values column A stands for gender column B stands for age) then there is a good chance that the optimizer will prefer to read the entire table rather then using the index. ** At this case adding an index only on A,B won't be useful either**
** Indexed data size **
Another factor is the size of data in column C. the size of data in column C effects directly on the index size. since reading the index tree also requires IO the bigger the index the so is the cost.
lets assume that the data in column C is comment and the average comment size is 500 chars. the data may have lot's of possible values but the index is going to be a very large index. This may also cause the DB to prefer reading the entire table rather then using the index. ** At this case adding an index on A,B is useful **
See this answer: https://stackoverflow.com/a/20939127/2520738
Basically:
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).
So basically, yes, if your index reads A, B, C from left to right, you can search on A, A and B, A and B and C. If you don't have single column indexes on B or C then no index will be used when they are searched individually.

MySQL how are indexes used in this example?

This is probably in the MySQL documentation, but I have not been able to find it. So I know that if I'm selecting a record from a database, the fastest results are when the fields I'm selecting and the fields in the WHERE clause are parts of an index. Say that I have a statement like this:
SELECT a FROM t1 WHERE b=X AND c=Y
What key or combination of keys would give me the fastest result?
Option 1: one key that's (a, b, c).
Option 2: one key that's (b, c) because those are in the where statement.
Option 3: one key that's (b, c, a) because b and c are in the where statement, and a is the value that ultimately needs to be looked up. (Seems logical to me, but I have no idea if this makes any MySQL sense...)
Options 4: two keys, one that's (b, c) and one that is just (a).
Sorry, I'm a really MySQL newbie...
In your case a composite index on (b,c) should do the job. You do not need an index on a since it is not in your WHERE clause. Its presence in the SELECT list doesn't affect how the rest of the query has to be indexed.
You could also use (b,c,a) in that order since MySQL will use column combinations in composite indexes starting from left to right. That isn't necessary for this use case but could future-proof your code if you ever did need to query all three columns Indexing (a,b,c) would not work in this query for that reason.
WHERE b='X' AND c='Y' AND z='Z'
From the MySQL docs on index usage
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).
As always, when in doubt, check the query's execution plan after creating your index to verify that it can be used as intended.
EXPLAIN SELECT a FROM t1 WHERE b='X' AND c='Y'

Do mysql composite indexes make some other indexes completely redundant?

If I have an a composite index on (a, b) I understand that queries only concerned with 'a' will still use the composite index (but not queries concerned with 'b')
My question is whether there is any valid reason to have a single-column index on 'a' if I have the (a, b) index? What I've read has seemed vague as to whether the (a,b) index was a complete substitute for a, or merely a "better than nothing" index.
This assumes that I do filtering by both a and a,b. I have a table with way too many indexes that is hurting write performance and want to double check before dropping indexes that I'm only fairly sure are not doing any good.
Also, does this answer change depending on whether I am using InnoDb or MyISAM? The table concerned is MyISAM, but most of our tables are InnoDb.
Your (a,b) index will also handle queries involving only 'a' and there is no need for an index on (a) alone.
From the documentation:
If the table has a multiple-column
index, any leftmost prefix of the
index can be used by the optimizer to
find rows.
For example, if you have a
three-column index on (col1, col2,
col3), you have indexed search
capabilities on (col1), (col1, col2),
and (col1, col2, col3).
My question is whether there is any valid reason to have a single-column index on 'a' if I have the (a, b) index?
No, there is no reason to have an index on (a) if you have one on (a,b)