MySQL: how are indexes used in this example?

This is probably in the MySQL documentation, but I have not been able to find it. I know that when selecting a record from a database, the fastest results come when the fields I'm selecting and the fields in the WHERE clause are part of an index. Say that I have a statement like this:
SELECT a FROM t1 WHERE b=X AND c=Y
What key or combination of keys would give me the fastest result?
Option 1: one key that's (a, b, c).
Option 2: one key that's (b, c) because those are in the where statement.
Option 3: one key that's (b, c, a) because b and c are in the where statement, and a is the value that ultimately needs to be looked up. (Seems logical to me, but I have no idea if this makes any MySQL sense...)
Option 4: two keys, one that's (b, c) and one that is just (a).
Sorry, I'm really new to MySQL...

In your case a composite index on (b,c) should do the job. You do not need an index on a since it is not in your WHERE clause. Its presence in the SELECT list doesn't affect how the rest of the query has to be indexed.
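You could also use (b,c,a), in that order, since MySQL uses the columns of a composite index starting from the left. That isn't required for this query, but it could future-proof your code if you ever need to filter on all three columns:
WHERE b='X' AND c='Y' AND a='Z'
As a side benefit, (b,c,a) is a covering index for the current query: every column the query references is in the index, so MySQL can answer it from the index alone (EXPLAIN shows "Using index" in the Extra column). An index on (a,b,c), on the other hand, could not be used to look up rows by b and c, because (b,c) is not a leftmost prefix of it.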
From the MySQL docs on index usage:
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).
As always, when in doubt, check the query's execution plan after creating your index to verify that it can be used as intended.
EXPLAIN SELECT a FROM t1 WHERE b='X' AND c='Y'
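If you want to try the leftmost-prefix rule without a MySQL server, the sketch below is one rough way to do it. It is only an analogy: it uses Python's bundled sqlite3 module and SQLite's EXPLAIN QUERY PLAN, whose optimizer and output differ from MySQL's, and the helper plan_for_index and the index name idx are made up for illustration. It creates one candidate index at a time and returns the plan for the query from the question.

```python
import sqlite3

QUERY = "SELECT a FROM t1 WHERE b = ? AND c = ?"

def plan_for_index(index_cols):
    """Create t1 with a single index on index_cols and return SQLite's plan text for QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, a INT, b INT, c INT)")
        conn.execute(f"CREATE INDEX idx ON t1 ({', '.join(index_cols)})")
        rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1, 2)).fetchall()
        # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
        return " | ".join(str(row[-1]) for row in rows)
    finally:
        conn.close()

if __name__ == "__main__":
    for cols in (("a", "b", "c"), ("b", "c"), ("b", "c", "a")):
        print(cols, "->", plan_for_index(cols))
```

In the output, SEARCH means the index was used to seek on b and c, while SCAN means every row or index entry is read. For (b, c, a) SQLite should also report a covering index, because a can be read straight from the index. Check the real MySQL EXPLAIN output before settling on an index.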

Related

SQLite & MySQL Compound Index vs Single index

I have a table with two fields: a,b
Both fields are indexed separately -- no compound index.
When running a SELECT query on both fields:
select * from table where a=<sth> and b=<sth>
it took over 400 ms, while
select * from table where a=<sth>
took only 30 ms.
Do I need to create a compound index on (a,b)?
Surely, if I have indexes on both a and b, queries on a AND b like the one above should be fast, right?
For this query:
select *
from table
where a = <sth> and b = <sth>;
The best index is on table(a, b). It can also be used for your second query.
Usually (but not always).
In your case, the number of distinct values in a (and b) and the columns you list in your SELECT can change how the database decides between using the index and reading the table.
For example,
if the table has, say, 100,000 records and 80,000 of them have the same value for a, then for a query like:
SELECT * FROM table WHERE a=<your value>
the database engine could decide to scan the table directly without using the index, whereas if you query
SELECT a, b FROM table WHERE a=<your value>
and column b is also in the index (either as a key column or, in databases that support it such as SQL Server or PostgreSQL, through an INCLUDE clause; MySQL does not), the engine will quite probably use the index.
Look online for indexing tips, and also have a look at How can I index these queries?
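A toy model can make the SELECT * versus SELECT a, b difference concrete by counting how often the base table has to be visited. Everything here is an assumption for illustration: the data (100,000 rows, 80,000 with a = 1, matching the example), the index stored as a sorted list of (a, b, row id) tuples, and the names TABLE, INDEX_AB and lookup. Real optimizers weigh page reads and random I/O rather than counting rows, and InnoDB secondary indexes store primary-key values rather than row ids.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data: 100,000 rows, 80,000 of which share a = 1.
TABLE = [{"id": i, "a": 1 if i < 80_000 else i, "b": i % 7, "c": "wide payload"}
         for i in range(100_000)]

# Toy secondary index on (a, b): sorted entries that point back to the row's id.
INDEX_AB = sorted((row["a"], row["b"], row["id"]) for row in TABLE)
A_KEYS = [entry[0] for entry in INDEX_AB]  # leading column, already in sorted order

def lookup(a_value, columns):
    """Simulate SELECT <columns> ... WHERE a = a_value using INDEX_AB.

    Returns (result_rows, table_fetches); table_fetches counts visits to the
    base table for columns that are missing from the index.
    """
    lo, hi = bisect_left(A_KEYS, a_value), bisect_right(A_KEYS, a_value)
    covering = set(columns) <= {"a", "b"}
    results, fetches = [], 0
    for a, b, row_id in INDEX_AB[lo:hi]:
        if covering:
            entry = {"a": a, "b": b}  # answered from the index entry alone
            results.append({col: entry[col] for col in columns})
        else:
            row = TABLE[row_id]  # row ids equal list positions in this toy table
            fetches += 1
            results.append({col: row[col] for col in columns})
    return results, fetches

if __name__ == "__main__":
    rows, fetches = lookup(1, ("id", "a", "b", "c"))
    print(len(rows), fetches)  # 80000 80000
    rows, fetches = lookup(1, ("a", "b"))
    print(len(rows), fetches)  # 80000 0
```

Both queries find the same 80,000 index entries, but only SELECT * needs 80,000 extra trips to the table, which is the kind of cost that can push an optimizer toward a plain table scan.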
The SQLite documentation explains how index lookups work.
Once the database has used an index to look up some rows, the other index is no longer efficient to use (there is no easy method to filter the results of the first lookup because the other index refers to rows in the original table, not to entries in the first index). See Multiple AND-Connected WHERE-Clause Terms.
To make index lookups on two columns as fast as possible, you need Multi-Column Indices.
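The effect the SQLite documentation describes can be observed with Python's bundled sqlite3 module. This sketch is hedged: the table t, the index names and the helper plan are hypothetical, and the plan text varies a little between SQLite versions. With two single-column indexes the plan should show a SEARCH on only one of them (a=? or b=?), with the other condition checked row by row; with the compound index it should show a=? AND b=?.

```python
import sqlite3

QUERY = "SELECT * FROM t WHERE a = ? AND b = ?"

def plan(index_defs):
    """Build table t with one index per column tuple in index_defs; return SQLite's plan for QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INT, b INT, c TEXT)")
        for n, cols in enumerate(index_defs):
            conn.execute(f"CREATE INDEX idx{n} ON t ({', '.join(cols)})")
        rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1, 2)).fetchall()
        return " | ".join(str(row[-1]) for row in rows)
    finally:
        conn.close()

separate = plan([("a",), ("b",)])   # two single-column indexes
compound = plan([("a", "b")])       # one composite index

if __name__ == "__main__":
    print(separate)
    print(compound)
```

MySQL's optimizer differs here: it can sometimes combine two single-column indexes with an index merge intersection (EXPLAIN shows Using intersect(...)), but a composite index on (a, b) is usually the more dependable choice.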

Are the MySQL query performance benefits of indices retained if a subset of the index columns are used in a query?

Do I get to keep the performance and efficiency advantages of an index on multiple columns of a MySQL table if I run a SELECT statement that queries only a subset of the columns in that index?
So, if I have an index set up on columns A, B and C but my statement only queries columns A and B, is that the same as having no index at all? Do I need another index set up exclusively on A and B to gain any performance benefit?
Short answer to a generic question: it depends.
Long answer:
The database builds the execution plan from the table's statistics: the engine estimates how much effort each possible operation will take. The two main factors in this case are the size of the indexed data and the distribution of the indexed data.
Data distribution
If the data in the first two columns has low granularity (only a few possible values; for example, column A stands for gender and column B for age), there is a good chance that the optimizer will prefer to read the entire table rather than use the index. In this case, adding an index on only A,B won't be useful either.
Indexed data size
Another factor is the size of the data in column C, which directly affects the size of the index. Since reading the index tree also requires I/O, the bigger the index, the higher the cost.
Let's assume that column C holds comments with an average length of 500 characters. The data may have lots of distinct values, but the index is going to be very large, which may also cause the database to prefer reading the entire table over using the index. In this case, adding an index on A,B is useful.
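The index-size argument is mostly arithmetic: the wider each index entry, the fewer entries fit on a page, and the more pages must be read. The sketch below assumes InnoDB's default 16 KB page size, 4-byte INT columns for A and B (the question does not give types) and single-byte characters in C, and it ignores record headers, page overhead and fill factor, so the numbers are rough upper bounds for illustration only. NARROW, WIDE and the function names are hypothetical.

```python
PAGE_BYTES = 16 * 1024  # InnoDB's default page size (innodb_page_size)

def entries_per_page(entry_bytes):
    """Upper bound on index entries per page, ignoring record and page overhead."""
    return PAGE_BYTES // entry_bytes

def pages_to_read(n_entries, entry_bytes):
    """Pages needed to hold n_entries entries of entry_bytes each (ceiling division)."""
    return -(-n_entries // entries_per_page(entry_bytes))

NARROW = 4 + 4        # index on (A, B): two assumed 4-byte INT columns
WIDE = 4 + 4 + 500    # index on (A, B, C) with an assumed 500-byte comment in C

if __name__ == "__main__":
    print(entries_per_page(NARROW), entries_per_page(WIDE))                 # 2048 32
    print(pages_to_read(1_000_000, NARROW), pages_to_read(1_000_000, WIDE))  # 489 31250
```

Under these assumptions, scanning a million entries of the narrow (A, B) index touches about 489 pages against 31,250 for the wide (A, B, C) index, which is why a large column C can make the optimizer shy away from the wider index.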
See this answer: https://stackoverflow.com/a/20939127/2520738
Basically:
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).
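So yes: if your index reads A, B, C from left to right, you can search on A; on A and B; or on A, B and C. If you don't have single-column indexes on B or C, the index cannot be used to seek on them when they are searched individually. (MySQL 8.0.13 and later can sometimes work around this with a skip scan, but only under restrictive conditions.)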
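A sorted list of tuples is a handy stand-in for the leaf level of a composite B-tree index and shows why only leftmost prefixes can be searched. This is a toy model with made-up data and hypothetical names (INDEX, prefix_search, full_scan), not how MySQL physically stores indexes: a leftmost prefix can be located with a binary search, whereas a condition on B alone must examine every entry because B values are not sorted across the whole index.

```python
from bisect import bisect_left

# Toy composite index on (A, B, C): entries kept sorted, like a B-tree's leaf level.
INDEX = sorted([
    (1, 10, "x"), (1, 10, "y"), (1, 20, "x"),
    (2, 10, "x"), (2, 30, "z"), (3, 10, "y"),
])

def prefix_search(prefix):
    """Seek all entries whose leading columns equal `prefix` (a leftmost prefix)."""
    k = len(prefix)
    # A shorter tuple sorts before every tuple it prefixes, so this lands on the first match.
    i = bisect_left(INDEX, prefix)
    matches = []
    while i < len(INDEX) and INDEX[i][:k] == prefix:
        matches.append(INDEX[i])
        i += 1
    return matches

def full_scan(predicate):
    """Without a usable leftmost prefix, every entry has to be examined."""
    return [entry for entry in INDEX if predicate(entry)]

if __name__ == "__main__":
    print(prefix_search((1,)))        # seek on A
    print(prefix_search((1, 10)))     # seek on A and B
    print(full_scan(lambda e: e[1] == 10))  # B alone: no seek possible
```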

Efficiency of multicolumn indexes in MySQL

If I have a MyISAM table with a 3-column index, something like
create table t (
a int,
b int,
c int,
index abc (a, b, c)
) engine=MyISAM;
The question is, can the following query fully utilize the index:
select * from t where a=1 and c=2;
In other words, considering that an index is a B-tree, can MySQL skip the middle column and still do a quick search on the first and last columns?
EXPLAIN does seem to show that the index will be used; however, the Extra column says Using where; Using index, and I have no idea what this really means.
The answer is "no".
The MySQL documentation is quite clear on how indexes are used:
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3). (http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html)
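What happens is that the index is used for "a=1". Every index entry that matches is then checked to see whether "c=2" is true. Because this index contains every column of t, that check can run on the index entries themselves without reading the table rows (that is what "Using index" means), while "Using where" reports the extra filtering step. So the query is answered by a combination of an index lookup and explicit filtering.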
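The same kind of toy model (a sorted list standing in for the (a, b, c) index, with hypothetical data and a hypothetical function name) shows why the middle column matters: entries with a = 1 are ordered by b first, so those with c = 2 are scattered through the a = 1 range. The seek can only narrow the search to that range, and c has to be tested entry by entry.

```python
from bisect import bisect_left

# Toy (a, b, c) index: every combination of a in 1..3, b in 1..5, c in 1..3, kept sorted.
INDEX = sorted((a, b, c) for a in range(1, 4) for b in range(1, 6) for c in range(1, 4))

def seek_a_then_filter_c(a_value, c_value):
    """Use the index for a = a_value only, then test c on each entry in that range."""
    i = bisect_left(INDEX, (a_value,))  # jump to the first entry with this a
    examined, matched = 0, []
    while i < len(INDEX) and INDEX[i][0] == a_value:
        examined += 1
        # b is unconstrained, so c cannot narrow the seek; it is checked per entry.
        if INDEX[i][2] == c_value:
            matched.append(INDEX[i])
        i += 1
    return matched, examined

if __name__ == "__main__":
    matched, examined = seek_a_then_filter_c(1, 2)
    print(len(matched), examined)  # 5 15
```

Here 15 entries with a = 1 are examined to return the 5 that also have c = 2.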
By the way, if you want equality searches on every combination of two columns to use an index, you need several indexes, for example:
(a, b, c)
(b, c, a)
(c, a, b)
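Whether a set of indexes really covers every combination is easy to get wrong, so a small checker helps. This is a sketch under simplifying assumptions: only equality conditions (so the leading columns may appear in any order), and no range conditions, ORDER BY or optimizer cost decisions. The helper names are hypothetical. It reports that a set such as (a, b, c), (b, a, c), (c, b, a) leaves the pair (a, c) without a usable prefix, while rotating the columns as (a, b, c), (b, c, a), (c, a, b) serves every pair and every single column.

```python
from itertools import combinations

def usable_for(index, cols):
    """True if equality tests on `cols` can use a leftmost prefix of `index`."""
    return set(index[:len(cols)]) == set(cols)

def uncovered(indexes, columns, size):
    """Column combinations of the given size that no index in `indexes` can serve."""
    return [combo for combo in combinations(columns, size)
            if not any(usable_for(index, combo) for index in indexes)]

SET_1 = [("a", "b", "c"), ("b", "a", "c"), ("c", "b", "a")]
SET_2 = [("a", "b", "c"), ("b", "c", "a"), ("c", "a", "b")]

if __name__ == "__main__":
    print(uncovered(SET_1, "abc", 2))  # [('a', 'c')]
    print(uncovered(SET_2, "abc", 2))  # []
    print(uncovered(SET_2, "abc", 1))  # []
```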
Even if you are using an index for all parts of a WHERE clause, you may see Using where if the column can be NULL.
Per the MySQL documentation, then, Using where can appear simply because a column in your table can be NULL, even though a covering index serves every column in the WHERE clause.
http://dev.mysql.com/doc/refman/5.1/en/explain-output.html#explain-extra-information

When is one index better than two in MYSQL

I have read that when you make a multicolumn index the order matters, and that typically you want the columns that will appear in WHERE clauses first, before others that would be in ORDER BY etc. However, won't you also get a speedup if you just index each one separately? (Apparently not, as my own experiments show that the combined index can be much faster than having each column indexed separately.) When should you use a multicolumn index, and what types of queries does it speed up?
A multi-column index will be the most effective for situations where all criteria are part of the multi-column index - even more so than having multiple single indexes.
Also, keep in mind that if you have a multi-column index but don't use its first column (or otherwise don't start from the beginning and stay adjacent to the previously used columns), the multi-column index won't have any benefit. (For example, if I have an index over columns [B, C, D] but a WHERE clause that only uses [C, D], this multi-column index will have no benefit.)
Reference: http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html
MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.
The short answer is, "it depends."
The long answer is this: if you sometimes do
WHERE COL1 = value1 and COL2 = value2
and sometimes do
WHERE COL1 = value1
but never, or almost never, do
WHERE COL2 = value2
then a compound index on (COL1, COL2) will be your best choice. (True for MySQL as well as other makes and models of DBMS.)
More indexes slow down INSERT and UPDATE operations, so they aren't free.
A compound index (c1,c2) is very useful in the following cases:
The obvious case where c1=v1 AND c2=v2
If you do WHERE c1=v1 AND c2 BETWEEN v2 and v3 - this does a "range scan" on the composite index
SELECT ... FROM t WHERE c1=v1 ORDER BY c2 - in this case it can use the index to sort the results.
In some cases, as a "covering index", e.g. "SELECT c1,c2 FROM t WHERE c1=4" can ignore the table contents, and just fetch a (range) from the index.
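The cases in this list can be checked against SQLite's EXPLAIN QUERY PLAN through Python's bundled sqlite3 module. Treat it as an analogy only: SQLite is not MySQL, plan text varies slightly between SQLite versions, and the table t, the index name idx_c1_c2 and the helper plan are hypothetical. For the four cases, the expected signs are a SEARCH on c1=? AND c2=?, a range on c2, no USE TEMP B-TREE FOR ORDER BY step, and a COVERING INDEX, respectively.

```python
import sqlite3

def plan(conn, sql, params):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, c1 INT, c2 INT, other TEXT)")
conn.execute("CREATE INDEX idx_c1_c2 ON t (c1, c2)")

cases = {
    "equality": ("SELECT * FROM t WHERE c1 = ? AND c2 = ?", (1, 2)),
    "range": ("SELECT * FROM t WHERE c1 = ? AND c2 BETWEEN ? AND ?", (1, 2, 3)),
    "order_by": ("SELECT * FROM t WHERE c1 = ? ORDER BY c2", (1,)),
    "covering": ("SELECT c1, c2 FROM t WHERE c1 = ?", (4,)),
}
plans = {name: plan(conn, sql, params) for name, (sql, params) in cases.items()}
conn.close()

if __name__ == "__main__":
    for name, text in plans.items():
        print(f"{name:9} {text}")
```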

Do mysql composite indexes make some other indexes completely redundant?

If I have a composite index on (a, b), I understand that queries only concerned with 'a' will still use the composite index (but not queries concerned with 'b').
My question is whether there is any valid reason to have a single-column index on 'a' if I have the (a, b) index? What I've read has seemed vague as to whether the (a,b) index was a complete substitute for a, or merely a "better than nothing" index.
This assumes that I do filtering by both a and a,b. I have a table with way too many indexes that is hurting write performance and want to double check before dropping indexes that I'm only fairly sure are not doing any good.
Also, does this answer change depending on whether I am using InnoDB or MyISAM? The table concerned is MyISAM, but most of our tables are InnoDB.
Your (a,b) index will also handle queries involving only 'a' and there is no need for an index on (a) alone.
From the documentation:
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).
My question is whether there is any valid reason to have a single-column index on 'a' if I have the (a, b) index?
No, there is no reason to have an index on (a) if you have one on (a,b)
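To find candidates like the (a) index among many indexes, a rough check is whether one index's columns are a leftmost prefix of another index's columns. The function below is a hypothetical sketch working on a plain name-to-columns mapping; it ignores details that can make an index worth keeping, such as a UNIQUE constraint on (a) that (a, b) would not enforce. In MySQL 5.7 and later, the sys schema view sys.schema_redundant_indexes reports similar findings, and Percona Toolkit's pt-duplicate-key-checker does the same job.

```python
def redundant_indexes(indexes):
    """Return (redundant, kept) name pairs where `redundant` is a leftmost prefix of `kept`.

    `indexes` maps hypothetical index names to column tuples in index order.
    Exact duplicates are reported once, keeping the alphabetically first name.
    """
    found = []
    for name, cols in indexes.items():
        for other, other_cols in indexes.items():
            if name == other or other_cols[:len(cols)] != cols:
                continue  # not a leftmost prefix of `other`
            if len(cols) < len(other_cols) or name > other:
                found.append((name, other))
                break
    return found

EXAMPLE = {"idx_a": ("a",), "idx_ab": ("a", "b"), "idx_b": ("b",)}

if __name__ == "__main__":
    print(redundant_indexes(EXAMPLE))  # [('idx_a', 'idx_ab')]
```

Note that idx_b is not reported: as the question says, (a, b) does not help queries on b alone.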