I have read that when you create a multicolumn index the order of the columns matters, and that you typically want the columns that appear in WHERE clauses first, before those used in ORDER BY and so on. However, won't you also get a speedup if you just index each column separately? (Apparently not: my own experiments show that a combined index can be much faster than separate single-column indexes.) When should you use a multicolumn index, and what types of queries does it speed up?
A multi-column index will be the most effective for situations where all criteria are part of the multi-column index - even more so than having multiple single indexes.
Also, keep in mind that if you have a multi-column index but your query does not use its first column (or does not otherwise start from the beginning of the index and continue through adjacent columns), the multi-column index won't provide any benefit. (For example, if I have an index over columns [B, C, D] but my WHERE clause only uses [C, D], this multi-column index will not help.)
Reference: http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html :
MySQL can use multiple-column indexes for queries that test all the
columns in the index, or queries that test just the first column, the
first two columns, the first three columns, and so on. If you specify
the columns in the right order in the index definition, a single
composite index can speed up several kinds of queries on the same
table.
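The leftmost-prefix rule quoted above can be observed in a small experiment. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in because it runs in-process, so the plan text is SQLite's EXPLAIN QUERY PLAN output and not MySQL's EXPLAIN. The table t, its columns a to d, and the index name idx_bcd are hypothetical.

```python
import sqlite3

def plan(conn, sql):
    """Return the plan detail text SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of every plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
# Hypothetical table and index mirroring the [B, C, D] example above.
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")
conn.execute("CREATE INDEX idx_bcd ON t (b, c, d)")

prefix_one = plan(conn, "SELECT * FROM t WHERE b = 1")
prefix_two = plan(conn, "SELECT * FROM t WHERE b = 1 AND c = 2")
no_prefix = plan(conn, "SELECT * FROM t WHERE c = 2 AND d = 3")

for label, detail in [("b", prefix_one), ("b, c", prefix_two), ("c, d", no_prefix)]:
    print(f"WHERE on {label}: {detail}")
```

In this setup the first two queries are reported as searches on idx_bcd, while the query that skips column b falls back to scanning the table. On MySQL, run EXPLAIN on the real queries instead; the exact wording and the optimizer's choices will differ.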
The short answer is, "it depends."
The long answer is this: if you sometimes do
WHERE COL1 = value1 and COL2 = value2
and sometimes do
WHERE COL1 = value1
but never, or almost never, do
WHERE COL2 = value2
then a compound index on (COL1, COL2) will be your best choice. (True for MySQL as well as other makes and models of DBMS.)
More indexes slow down INSERT and UPDATE operations, so they aren't free.
A compound index (c1,c2) is very useful in the following cases:
The obvious case where c1=v1 AND c2=v2
If you do WHERE c1=v1 AND c2 BETWEEN v2 and v3 - this does a "range scan" on the composite index
SELECT ... FROM t WHERE c1=v1 ORDER BY c2 - in this case it can use the index to sort the results.
In some cases, as a "covering index", e.g. "SELECT c1,c2 FROM t WHERE c1=4" can ignore the table contents, and just fetch a (range) from the index.
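To see why these four cases work, it can help to picture the compound index as a list of (c1, c2, row id) entries kept in sorted order. The following is a toy model in plain Python, not how MySQL stores a B-tree; the sample rows and the function names are made up for illustration.

```python
from bisect import bisect_left, bisect_right

NEG, POS = float("-inf"), float("inf")

# Toy rows: (row_id, c1, c2). The "index" holds (c1, c2, row_id) in sorted order.
rows = [(1, 4, 30), (2, 7, 10), (3, 4, 10), (4, 4, 20), (5, 2, 99), (6, 4, 25)]
index = sorted((c1, c2, row_id) for row_id, c1, c2 in rows)

def index_slice(low_key, high_key):
    """Return the contiguous run of index entries between two keys."""
    return index[bisect_left(index, low_key):bisect_right(index, high_key)]

def equal_both(v1, v2):
    """c1 = v1 AND c2 = v2: a single narrow slice."""
    return [rid for _, _, rid in index_slice((v1, v2, NEG), (v1, v2, POS))]

def range_on_second(v1, low, high):
    """c1 = v1 AND c2 BETWEEN low AND high: a range scan over one slice."""
    return [rid for _, _, rid in index_slice((v1, low, NEG), (v1, high, POS))]

def ordered_by_second(v1):
    """c1 = v1 ORDER BY c2: the slice is already sorted by c2."""
    return [(c2, rid) for _, c2, rid in index_slice((v1, NEG, NEG), (v1, POS, POS))]

def covering(v1):
    """SELECT c1, c2 WHERE c1 = v1: answered from index entries alone."""
    return [(c1, c2) for c1, c2, _ in index_slice((v1, NEG, NEG), (v1, POS, POS))]

print(equal_both(4, 20))           # [4]
print(range_on_second(4, 15, 27))  # [4, 6]
print(ordered_by_second(4))        # [(10, 3), (20, 4), (25, 6), (30, 1)]
print(covering(4))                 # [(4, 10), (4, 20), (4, 25), (4, 30)]
```

Each case reads one contiguous run of entries, which is what makes the compound index cheap for these query shapes. A filter on c2 alone would not map to a contiguous run, because entries are ordered by c1 first.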
I have a table with two partitions. Partitions are pactive = 1 and pinactive = 0. I understand that two partitions does not make so much of a gain, but I have used it to truncate and load in one partition and plain inserts in another partition.
The problem comes when I create indexes.
Query goes this way
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
Created index -
create index idx_try on customformattributes(partitionflag,companyid,activityname,completiondate,attributename,isclosed)
There are around 200000 records that will be retrieved by the above query, but the query, with the index mentioned, takes 30+ seconds. What is the reason for such a long time? Also, if remove the partitionflag from the mentioned index, the index is not even used.
And is the following understanding correct?
Even with the partitions available, the optimizer needs the partitioning column to be part of the index definition so that it only hits the required partition.
Any ideas on understanding this would be very helpful
You can optimize your index by reordering its columns. Usually the columns in an index are ordered by cardinality (starting from the highest and going down to the lowest). Cardinality is the number of distinct values in a given column. So in your case I suppose there are many different values of companyid in the customformattributes table, while partitionflag will have a cardinality of 2 (if the only values for this column are 1 and 0).
Your query will first filter all the rows with partitionflag=0, then it will filter by company id and so on.
When you removed partitionflag from the index, the query did not use the index, possibly because the optimizer decided that a full table scan would be faster than using the index (in most cases the optimizer is right).
For the given query:
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
the following index may be better:
create index idx_try on customformattributes(companyid,activityname, completiondate,attributename, partitionflag, isclosed)
For the query to use the index, the following rule must be met: the leftmost column of the index should be present in the WHERE clause. Depending on the MySQL version you are using, there may be additional requirements. For example, with an old version of MySQL you may need to list the columns in the WHERE clause in the same order as they appear in the index. In recent versions of MySQL the query optimizer takes care of ordering the WHERE conditions correctly.
Your SELECT query took 30+ seconds because it returns 200k rows and because the index might not be optimal for the given query.
For the second question, about partitioning: the common rule is that the column you partition by must be part of every UNIQUE key in the table (the primary key is also a unique key by definition, so the column must be added to the PK as well). If the table structure and logic allow you to add the partitioning column to all the UNIQUE indexes in the table, then you add it and partition the table.
When the partitioning is done correctly you can take advantage of partition pruning: the SELECT query searches only the partitions where the requested data is stored (otherwise it looks in all partitions).
You can read more about partitioning here:
https://dev.mysql.com/doc/refman/5.6/en/partitioning-overview.html
The query is slow simply because disks are slow.
Cardinality is not important when designing an index.
The optimal index for that query is
INDEX(companyid, activityname, partitionflag) -- in any order
It is "covering" since it includes all the columns mentioned anywhere in the SELECT. This is indicated by "Using index" in the EXPLAIN.
Leaving off the other 3 columns makes the query faster because it will have to read less off the disk.
If you make any changes to the query (add columns, change from '=' to '>', add ORDER BY, etc), then the index may no longer be optimal.
"Also, if remove the partitionflag from the mentioned index, the index is not even used." -- That is because it was no longer "covering".
Keep in mind that there are two ways an index may be used -- "covering" versus being a way to look up the data. When you don't have a "covering" index, the optimizer chooses between using the index and bouncing between the index and the data versus simply ignoring the index and scanning the table.
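The difference between a covering use of the index and a lookup use can be made visible with a small experiment. This is a hedged sketch, not a reproduction of the original setup: it runs on Python's built-in sqlite3 module as a stand-in for MySQL, the column types are guesses, and idx_cover is a hypothetical index name. SQLite marks the covering case with the words COVERING INDEX, where MySQL's EXPLAIN would show "Using index".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names come from the question; the types are assumptions.
conn.execute(
    "CREATE TABLE customformattributes ("
    "partitionflag INTEGER, companyid INTEGER, activityname TEXT, "
    "completiondate TEXT, attributename TEXT, isclosed INTEGER)"
)
conn.execute(
    "CREATE INDEX idx_cover ON customformattributes "
    "(companyid, activityname, partitionflag)"
)

def plan(sql):
    """Return the plan detail text SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

where = (
    " FROM customformattributes"
    " WHERE companyid = 47 AND activityname = 'Activity 1' AND partitionflag = 0"
)

# Every selected column lives in the index, so the table is never touched.
covered = plan("SELECT partitionflag, companyid, activityname" + where)
# attributename is not in the index, so each match needs a table lookup.
not_covered = plan(
    "SELECT partitionflag, companyid, activityname, attributename" + where
)

print(covered)
print(not_covered)
```

Adding a single column that is outside the index is enough to turn the covering read into an index search followed by row lookups, which is the situation where an optimizer has to weigh the index against a plain table scan.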
Do I get to keep the performance and efficiency advantages of having an index setup for multiple columns on a MySQL table if I run a SELECT statement that queries some subset of those columns in the index?
So, if I have an index setup on columns A, B and C but my statement only queries for columns A and B, is that the same as having no index setup at all. Do I need to have another index setup exclusively for A and B to gain any performance benefits with queries?
Short answer to a generic question: it depends.
Long answer:
The DB builds the execution plan based on the statistics of the table. Basically, the DB engine estimates how much "effort" each operation takes. The two main factors in this case are the size of the indexed data and the distribution of the indexed data.
Data distribution
If the first two columns have low granularity (only a few possible values, for example column A holds gender and column B holds age), there is a good chance that the optimizer will prefer to read the entire table rather than use the index. In this case, adding an index on only A,B won't be useful either.
Indexed data size
Another factor is the size of the data in column C, which directly affects the size of the index. Since reading the index tree also requires I/O, the bigger the index, the higher the cost.
Let's assume that column C holds a comment and the average comment is 500 characters long. The data may have lots of possible values, but the index is going to be very large. This may also cause the DB to prefer reading the entire table rather than using the index. In this case, adding an index on A,B is useful.
See this answer: https://stackoverflow.com/a/20939127/2520738
Basically:
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).
So basically, yes, if your index reads A, B, C from left to right, you can search on A, A and B, A and B and C. If you don't have single column indexes on B or C then no index will be used when they are searched individually.
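The rule above can be written down as a few lines of Python. This is a simplified sketch that only models equality filters; the function name usable_prefix is made up, and a real optimizer also weighs statistics, range conditions and covering reads before it picks a plan.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that equality filters can use.

    Walks the index from left to right and stops at the first column
    the query does not filter on, since later columns cannot narrow
    the search once the chain is broken.
    """
    usable = []
    for column in index_columns:
        if column not in equality_columns:
            break
        usable.append(column)
    return usable

index_abc = ["A", "B", "C"]
print(usable_prefix(index_abc, {"A", "B"}))  # ['A', 'B']
print(usable_prefix(index_abc, {"A", "C"}))  # ['A']
print(usable_prefix(index_abc, {"B", "C"}))  # []
```

An empty result corresponds to the case where the index on (A, B, C) cannot be used to look up rows at all, so a separate index would be needed for searches on B or C alone.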
This is probably in the MySQL documentation, but I have not been able to find it. So I know that if I'm selecting a record from a database, the fastest results are when the fields I'm selecting and the fields in the WHERE clause are parts of an index. Say that I have a statement like this:
SELECT a FROM t1 WHERE b=X AND c=Y
What key or combination of keys would give me the fastest result?
Option 1: one key that's (a, b, c).
Option 2: one key that's (b, c) because those are in the where statement.
Option 3: one key that's (b, c, a) because b and c are in the where statement, and a is the value that ultimately needs to be looked up. (Seems logical to me, but I have no idea if this makes any MySQL sense...)
Options 4: two keys, one that's (b, c) and one that is just (a).
Sorry, I'm a really MySQL newbie...
In your case a composite index on (b,c) should do the job. You do not need an index on a since it is not in your WHERE clause. Its presence in the SELECT list doesn't affect how the rest of the query has to be indexed.
You could also use (b,c,a) in that order, since MySQL uses the columns of a composite index starting from the left. That isn't necessary for this use case, but it could future-proof your code if you ever needed to filter on all three columns:
WHERE b='X' AND c='Y' AND a='Z'
Indexing (a,b,c) would not work for this query for the same reason: b and c do not form a leftmost prefix of that index.
From the MySQL docs on index usage
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).
As always, when in doubt, check the query's execution plan after creating your index to verify that it can be used as intended.
EXPLAIN SELECT a FROM t1 WHERE b='X' AND c='Y'
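The same kind of check can be scripted to compare the index options from the question side by side. The sketch below is an approximation under stated assumptions: it uses Python's built-in sqlite3 module in place of MySQL, the id and payload columns and the index name idx are invented so that the table has more columns than the index, and SQLite's plan wording differs from MySQL's EXPLAIN output.

```python
import sqlite3

QUERY = "SELECT a FROM t1 WHERE b = 'X' AND c = 'Y'"

def plan_with_index(index_columns):
    """Build an empty t1 with one index and return SQLite's plan for QUERY."""
    conn = sqlite3.connect(":memory:")
    # id and payload are hypothetical extra columns, not from the question.
    conn.execute(
        "CREATE TABLE t1 (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, payload TEXT)"
    )
    conn.execute(f"CREATE INDEX idx ON t1 ({index_columns})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    conn.close()
    return " | ".join(row[-1] for row in rows)

# Options 1, 2 and 3 from the question.
plans = {cols: plan_with_index(cols) for cols in ("a, b, c", "b, c", "b, c, a")}

for cols, detail in plans.items():
    print(f"index ({cols}): {detail}")
```

In this stand-in, (a, b, c) cannot be searched and is at best scanned in full, (b, c) is searched and then the row is fetched to read a, and (b, c, a) is searched as a covering index. Whether the extra column is worth the larger index on a real MySQL table is something to confirm with EXPLAIN and measurements on real data.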
I have an index structured as so:
BTREE merchant_id
BTREE flag
BTREE test (merchant_id, flag)
I do a SELECT query as such :
SELECT badge_id, merchant_id, badge_title, badge_class
FROM badges WHERE merchant_id = 1 AND flag = 1
Which index would be better? Does it matter if they are in separate indexes?
To answer questions such as "Which column would be better to index?", and "Is the query planner using a certain index to execute the query?", you can use the EXPLAIN statement. See the excellent article Analyzing Queries for Speed with EXPLAIN for a comprehensive overview of the use of EXPLAIN in optimizing queries and schema.
In general, where a query can be optimized by indexing one of several columns, a helpful rule of thumb is to index the column that is "most unique" or "most selective" over all records; that is, the column that has the greatest number of distinct values over all rows. I am guessing that in your case the merchant_id column contains the greatest number of unique values, so it should probably be indexed. You can verify that an index choice is optimal by running EXPLAIN on the query for all variations.
Note that the rule of thumb "index the most selective column" does not necessarily apply to the choice of the first column of a composite (also called compound or multi-column) index. It depends on your queries. If, for example, employee_id is the most selective column, but you need to execute queries like SELECT * FROM badges WHERE flag = 17, then a composite index (employee_id, flag) as the only index on table badges would mean that the query results in a full table scan.
Out of 3 indices you don't really need a separate merchant_id index, since merchant_id look-ups can use your "test" index.
More details:
http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html
In the following query
SELECT col1,col2
FROM table1
WHERE col3='value1'
AND col4='value2'
If I have 2 separate indexes one on col3 and the other on col4, Which one of them will be used in this query ?
I read somewhere that for each table in the query only one index is used. Does that mean that there is no way for the query to use both indexes ?
Secondly, If I created a composite index using both col3 and col4 together but used only col3 in the WHERE clause will that be worse for the performance?
example:
SELECT col1,col2
FROM table1
WHERE col3='value1'
Lastly, is it better to just use covering indexes in all cases? And does it differ between the MyISAM and InnoDB storage engines?
A covering index is not the same as a composite index.
If I have 2 separate indexes one on col3 and the other on col4, Which one of them will be used in this query ?
The index with the highest cardinality.
MySQL keeps statistics on which index has what properties.
The index that has the most discriminating power (as evident in MySQL's statistics) will be used.
I read somewhere that for each table in the query only one index is used. Does that mean that there is no way for the query to use both indexes ?
You can use a subselect.
Or even better use a compound index that includes both col3 and col4.
Secondly, If I created a composite index using both col3 and col4 together but used only col3 in the WHERE clause will that be worse for the performance? example:
Compound index
The terms compound index and composite index are used interchangeably; the MySQL manual calls these multiple-column (composite) indexes.
Only the left-most part of the compound index will be used.
So if the index is defined as
index myindex (col3, col4) <<-- will work with your example.
index myindex (col4, col3) <<-- will not work.
See: http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html
Note that if the query selects only columns that are in the index, such as the left-most column, the index can still be used even when the WHERE clause does not filter on the left-most column.
Imagine we have a compound index
Myindex(col1,col2)
SELECT col1 FROM table1 WHERE col2 = 200 <<-- will use index, but not efficiently
SELECT * FROM table1 where col2 = 200 <<-- will NOT use index.
The reason this works is that the first query uses the covering index and does a scan on that.
The second query needs to access the table, and for that reason scanning through the index does not make sense.
This only works in InnoDB.
What's a covering index
A covering index refers to the case when all fields selected in a query are covered by an index, in that case InnoDB (not MyISAM) will never read the data in the table, but only use the data in the index, significantly speeding up the select.
Note that in InnoDB the primary key is included in all secondary indexes, so in a way all secondary indexes are compound indexes.
This means that if you run the following query on InnoDB:
SELECT indexed_field FROM table1 WHERE pk = something
MySQL will always use a covering index and will not access the actual table. Although it could use a covering index, it will prefer the PRIMARY KEY because it only needs to hit a single row.
I upvoted Johan's answer for completeness, but I think the following statement he makes regarding secondary indexes is incorrect and/or confusing;
Note that in InnoDB the primary key is included in all secondary indexes,
so in a way all secondary indexes are compound indexes.
This means that if you run the following query on InnoDB:
SELECT indexed_field FROM table1 WHERE pk = something
MySQL will always use a covering index and will not access the actual table.
While I agree the primary key is INCLUDED in the secondary index, I do not agree MySQL "will always use a covering index" in the SELECT query specified here.
To see why, note that a full index "scan" is always required in this case. This is not the same as a "seek" operation, but is instead a 100% scan of the secondary index contents. This is due to the fact the secondary index is not ordered by the primary key; it is ordered by "indexed_field" (otherwise it would not be much use as an index!).
In light of this latter fact, there will be cases where it is more efficient to "seek" the primary key, and then extract indexed_field "from the actual table," not from the secondary index.
This is a question I hear a lot and there is a lot of confusion around the issues due to:
The differences in MySQL over the years.
Indexes and multiple-index support changed over the years (towards being supported).
The InnoDB / MyISAM differences.
There are some key differences (below) but I do not believe multiple indexes are one of them
MyISAM is older but proven. Data in MyISAM tables is split between three different files for:- table format, data, and indexes.
InnoDB is relatively newer than MyISAM and is transaction safe. InnoDB also provides row-locking as opposed to table-locking which increases multi-user concurrency and performance. InnoDB also has foreign-key constraints.
Because of its row-locking feature InnoDB is well suited to high load environments.
To be sure about things, use EXPLAIN to analyze the query execution.
Compound index is not the same as a composite index.
Composite index covers all the columns in your filter, join and select criteria. All of these columns are stored on all of the index pages accordingly throughout the index B-tree.
Compound index covers all the filter and join key columns in the B-tree, but keeps the select columns only on the leaf pages as they will not be searched, rather only extracted!
This saves space and consequently creates fewer index pages, hence faster I/O.