Mysql covering vs composite vs column index - mysql

In the following query
SELECT col1,col2
FROM table1
WHERE col3='value1'
AND col4='value2'
If I have 2 separate indexes one on col3 and the other on col4, Which one of them will be used in this query ?
I read somewhere that for each table in the query only one index is used. Does that mean that there is no way for the query to use both indexes ?
Secondly, If I created a composite index using both col3 and col4 together but used only col3 in the WHERE clause will that be worse for the performance?
example:
SELECT col1,col2
FROM table1
WHERE col3='value1'
Lastly, Is it better to just use Covering indexes in all cases ? and does it differ between MYISAM and innodb storage engines ?

A covering index is not the same as a composite index.
If I have 2 separate indexes one on col3 and the other on col4, Which one of them will be used in this query ?
The index with the highest cardinality.
MySQL keeps statistics on which index has what properties.
The index that has the most discriminating power (as evident in MySQL's statistics) will be used.
I read somewhere that for each table in the query only one index is used. Does that mean that there is no way for the query to used both indexes ?
You can use a subselect.
Or even better use a compound index that includes both col3 and col4.
Secondly, If I created a composite index using both col3 and col4 together but used only col3 in the WHERE clause will that be worse for the performance? example:
Compound index
The correct term is compound index, not composite.
Only the left-most part of the compound index will be used.
So if the index is defined as
index myindex (col3, col4) <<-- will work with your example.
index myindex (col4, col3) <<-- will not work.
See: http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html
Note that if you select a left-most field, you can get away with not using that part of the index in your where clause.
Imagine we have a compound index
Myindex(col1,col2)
SELECT col1 FROM table1 WHERE col2 = 200 <<-- will use index, but not efficiently
SELECT * FROM table1 where col2 = 200 <<-- will NOT use index.
The reason this works is that the first query uses the covering index and does a scan on that.
The second query needs to access the table and for that reason scanning though the index does not make sense.
This only works in InnoDB.
What's a covering index
A covering index refers to the case when all fields selected in a query are covered by an index, in that case InnoDB (not MyISAM) will never read the data in the table, but only use the data in the index, significantly speeding up the select.
Note that in InnoDB the primary key is included in all secondary indexes, so in a way all secondary indexes are compound indexes.
This means that if you run the following query on InnoDB:
SELECT indexed_field FROM table1 WHERE pk = something
MySQL will always use a covering index and will not access the actual table. Although it could use a covering index, it will prefer the PRIMARY KEY because it only needs to hit a single row.

I upvoted Johan's answer for completeness, but I think the following statement he makes regarding secondary indexes is incorrect and/or confusing;
Note that in InnoDB the primary key is included in all secondary indexes,
so in a way all secondary indexes are compound indexes.
This means that if you run the following query on InnoDB:
SELECT indexed_field FROM table1 WHERE pk = something
MySQL will always use a covering index and will not access the actual table.
While I agree the primary key is INCLUDED in the secondary index, I do not agree MySQL "will always use a covering index" in the SELECT query specified here.
To see why, note that a full index "scan" is always required in this case. This is not the same as a "seek" operation, but is instead a 100% scan of the secondary index contents. This is due to the fact the secondary index is not ordered by the primary key; it is ordered by "indexed_field" (otherwise it would not be much use as an index!).
In light of this latter fact, there will be cases where it is more efficient to "seek" the primary key, and then extract indexed_field "from the actual table," not from the secondary index.

This is a question I hear a lot and there is a lot of confusion around the issues due to:
The differences in mySQL over the years.
Indexes and multiple index support changed over the years (towards being supported)
the InnoDB / myISAM differences
There are some key differences (below) but I do not believe multiple indexes are one of them
MyISAM is older but proven. Data in MyISAM tables is split between three different files for:- table format, data, and indexes.
InnoDB is relatively newer than MyISAM and is transaction safe. InnoDB also provides row-locking as opposed to table-locking which increases multi-user concurrency and performance. InnoDB also has foreign-key constraints.
Because of its row-locking feature InnoDB is well suited to high load environments.
To be sure about things, make sure to use explain_plan to analyze the query execution.

Compound index is not the same as a composite index.
Composite index covers all the columns in your filter, join and select criteria. All of these columns are stored on all of the index pages accordingly throughout the index B-tree.
Compound index covers all the filter and join key columns in the B-tree, but keeps the select columns only on the leaf pages as they will not be searched, rather only extracted!
This saves space and consequently creates less index pages, hence faster I/O.

Related

My SQL indexing on int field doesn't work

I have a table with index on a int column.
Create table sample(
col1 varchar,
col2 int)
Create index idx1 on sample(col2);
When I explain the following query
Select * from sample where col2>2;
It does a full table scan.
Why doesn't the indexing work here?
How can i optimize such queries when table has around 20 million records?
Just because you create an index, does not mean MySQL will always use it. According to the docs, here are several reasons why it may choose to use a full table scan over the index:
The table is so small that it is faster to perform a table scan than to bother with a key lookup. This is common for tables with fewer than 10 rows and a short row length.
There are no usable restrictions in the ON or WHERE clause for indexed columns.
You are comparing indexed columns with constant values and MySQL has calculated (based on the index tree) that the constants cover too large a part of the table and that a table scan would be faster. See Section 8.2.1.1, “WHERE Clause Optimization”.
You are using a key with low cardinality (many rows match the key value) through another column. In this case, MySQL assumes that by using the key it probably will do many key lookups and that a table scan would be faster.
You can use FORCE INDEX to ensure your query uses the index instead of allowing the optimizer to determine the appropriate path, although usually MySQL will take the most efficient approach.
SELECT * FROM t1, t2 FORCE INDEX (index_for_column) WHERE t1.col_name=t2.col_name;
Reference: https://dev.mysql.com/doc/refman/8.0/en/table-scan-avoidance.html

MySQL covering index optimization? [duplicate]

I've just heard the term covered index in some database discussion - what does it mean?
A covering index is an index that contains all of, and possibly more, the columns you need for your query.
For instance, this:
SELECT *
FROM tablename
WHERE criteria
will typically use indexes to speed up the resolution of which rows to retrieve using criteria, but then it will go to the full table to retrieve the rows.
However, if the index contained the columns column1, column2 and column3, then this sql:
SELECT column1, column2
FROM tablename
WHERE criteria
and, provided that particular index could be used to speed up the resolution of which rows to retrieve, the index already contains the values of the columns you're interested in, so it won't have to go to the table to retrieve the rows, but can produce the results directly from the index.
This can also be used if you see that a typical query uses 1-2 columns to resolve which rows, and then typically adds another 1-2 columns, it could be beneficial to append those extra columns (if they're the same all over) to the index, so that the query processor can get everything from the index itself.
Here's an article: Index Covering Boosts SQL Server Query Performance on the subject.
Covering index is just an ordinary index. It's called "covering" if it can satisfy query without necessity to analyze data.
example:
CREATE TABLE MyTable
(
ID INT IDENTITY PRIMARY KEY,
Foo INT
)
CREATE NONCLUSTERED INDEX index1 ON MyTable(ID, Foo)
SELECT ID, Foo FROM MyTable -- All requested data are covered by index
This is one of the fastest methods to retrieve data from SQL server.
Covering indexes are indexes which "cover" all columns needed from a specific table, removing the need to access the physical table at all for a given query/ operation.
Since the index contains the desired columns (or a superset of them), table access can be replaced with an index lookup or scan -- which is generally much faster.
Columns to cover:
parameterized or static conditions; columns restricted by a parameterized or constant condition.
join columns; columns dynamically used for joining
selected columns; to answer selected values.
While covering indexes can often provide good benefit for retrieval, they do add somewhat to insert/ update overhead; due to the need to write extra or larger index rows on every update.
Covering indexes for Joined Queries
Covering indexes are probably most valuable as a performance technique for joined queries. This is because joined queries are more costly & more likely then single-table retrievals to suffer high cost performance problems.
in a joined query, covering indexes should be considered per-table.
each 'covering index' removes a physical table access from the plan & replaces it with index-only access.
investigate the plan costs & experiment with which tables are most worthwhile to replace by a covering index.
by this means, the multiplicative cost of large join plans can be significantly reduced.
For example:
select oi.title, c.name, c.address
from porderitem poi
join porder po on po.id = poi.fk_order
join customer c on c.id = po.fk_customer
where po.orderdate > ? and po.status = 'SHIPPING';
create index porder_custitem on porder (orderdate, id, status, fk_customer);
See:
http://literatejava.com/sql/covering-indexes-query-optimization/
Lets say you have a simple table with the below columns, you have only indexed Id here:
Id (Int), Telephone_Number (Int), Name (VARCHAR), Address (VARCHAR)
Imagine you have to run the below query and check whether its using index, and whether performing efficiently without I/O calls or not. Remember, you have only created an index on Id.
SELECT Id FROM mytable WHERE Telephone_Number = '55442233';
When you check for performance on this query you will be dissappointed, since Telephone_Number is not indexed this needs to fetch rows from table using I/O calls. So, this is not a covering indexed since there is some column in query which is not indexed, which leads to frequent I/O calls.
To make it a covered index you need to create a composite index on (Id, Telephone_Number).
For more details, please refer to this blog:
https://www.percona.com/blog/2006/11/23/covering-index-and-prefix-indexes/

Index on mysql partitioned tables

I have a table with two partitions. Partitions are pactive = 1 and pinactive = 0. I understand that two partitions does not make so much of a gain, but I have used it to truncate and load in one partition and plain inserts in another partition.
The problem comes when I create indexes.
Query goes this way
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
Created index -
create index idx_try on customformattributes(partitionflag,companyid,activityname,completiondate,attributename,isclosed)
there are around 200000 records that will be retreived from the above query. But the query along with the mentioned index takes 30+ seconds. What is the reason for such a long time? Also, if remove the partitionflag from the mentioned index, the index is not even used.
And is the understanding that,
Even with the partitions available, the optimizer needs to have the required partition mentioned in the index definition, so that it only hits the required partition ---- Correct?
Any ideas on understanding this would be very helpful
You can optimize your index by reordering the columns in it. Usually the columns in the index are ordered by its cardinality (starting from the highest and go down to the lowest). Cardinality is the uniqueness of data in the given column. So in your case I suppose there are many variations of companyid in customformattributes table while partitionflag will have cardinality of 2 (if all the options for this column are 1 and 0).
Your query will first filter all the rows with partitionflag=0, then it will filter by company id and so on.
When you remove partitionflag from the index the query did not used the index because may be the optimizer decides that it will be faster to make full table scan instead of using the index (in most of the cases the optimizer is right)
For the given query:
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
the following index may be would be better (but of course :
create index idx_try on customformattributes(companyid,activityname, completiondate,attributename, partitionflag, isclosed)
For the query to use index the following rule must be met - the left most column in the index should be present in the where clause ... and depending on the mysql version you are using additional query requirements may be needed. For example if you are using old version of mysql - you may need to order the columns in the where clause in the same order they are listed in the index. In the last versions of mysql the query optimizer is responsible for ordering the columns in the where clause in the correct order.
Your SELECT query took 30+ seconds because it returns 200k rows and because the index might not be the optimal for the given query.
For the second question about the partitioning: the common rule is that the column you are partitioning by must be part of all the UNIQUE keys in a table (Primary key is also unique key by definition so the column should be added to the PK also). If table structure and logic allows you to add the partitioning column to all the UNIQUE indexes in the table then you add it and partition the table.
When the partitioning is made correctly you can take the advantage of partitioning pruning - this is when SELECT query searches the data only in the partitions where given data is stored (otherwise it looks in all partitions)
You can read more about partitioning here:
https://dev.mysql.com/doc/refman/5.6/en/partitioning-overview.html
The query is slow simply because disks are slow.
Cardinality is not important when designing an index.
The optimal index for that query is
INDEX(companyid, activityname, partitionflag) -- in any order
It is "covering" since it includes all the columns mentioned anywhere in the SELECT. This is indicated by "Using index" in the EXPLAIN.
Leaving off the other 3 columns makes the query faster because it will have to read less off the disk.
If you make any changes to the query (add columns, change from '=' to '>', add ORDER BY, etc), then the index may no longer be optimal.
"Also, if remove the partitionflag from the mentioned index, the index is not even used." -- That is because it was no longer "covering".
Keep in mind that there are two ways an index may be used -- "covering" versus being a way to look up the data. When you don't have a "covering" index, the optimizer chooses between using the index and bouncing between the index and the data versus simply ignoring the index and scanning the table.

Overhead of Composite Indexes

I have many tables where I have indexes on foreign keys, and clustered indexes which include those foreign keys. For example, I have a table like the following:
TABLE: Item
------------------------
id PRIMARY KEY
owner FOREIGN KEY
status
... many more columns
MySQL generates indexes for primary and foreign keys, but sometimes, I want to improve query performance so I'll create clustered or covering indexes. This leads to have indexes with overlapping columns.
INDEXES ON: Item
------------------------
idx_owner (owner)
idx_owner_status (owner, status)
If I dropped idx_owner, future queries that would normally use idx_owner would just use idx_owner_status since it has owner as the first column in the index.
Is it worth keeping idx_owner around? Is there an additional I/O overhead to use idx_owner_status even though MySQL only uses part of the index?
Edit: I am really only interested in the way InnoDB behaves regarding indexes.
Short Answer
Drop the shorter index.
Long Anwser
Things to consider:
Drop it:
Each INDEX is a separate BTree that resides on disk, so it takes space.
Each INDEX is updated (sooner or later) when you INSERT a new row or an UPDATE modifies an indexed column. This takes some CPU and I/O and buffer_pool space for the 'change buffer'.
Any functional use (as opposed to performance) for the shorter index can be performed by the longer one.
Don't drop it:
The longer index is bulkier than the shorter one. So it is less cacheable. So (in extreme situations) using the bulkier one in place of the shorter one could cause more I/O. A case that aggravates this: INDEX(int, varchar255).
It is very rare that the last item really overrides the other items.
Bonus
A "covering" index is one that contains all the columns mentioned in a SELECT. For example:
SELECT status FROM tbl WHERE owner = 123;
This will touch only the BTree for INDEX(owner, status), thereby being noticeably faster than
SELECT status, foo FROM tbl WHERE owner = 123;
If you really need that query to be faster, then replace both of your indexes with INDEX(owner, status, foo).
PK in Secondary key
One more tidbit... In InnoDB, the columns of the PRIMARY KEY are implicitly appended to every secondary key. So, the three examples are really
INDEX(owner, id)
INDEX(owner, status, id)
INDEX(owner, status, foo, id)
More discussion in my blogs on composite indexes and index cookbook.

When is one index better than two in MYSQL

I have read about when you are making a multicolumn index that the order matters and that typically you want the columns which will appear in the WHERE clauses first before others that would be in ORDER BY etc. However, won't you also get a speed up if you just index each one separately? (apparently not as my own experiments show me that the combined index behavior can be much faster than simply having each one separately indexed). When should you use a multicolumn index and what types of queries does it give a boost to?
A multi-column index will be the most effective for situations where all criteria are part of the multi-column index - even more so than having multiple single indexes.
Also, keep in mind, if you have a multi-column index, but don't utilize the first column indexed by the multi-column index (or don't otherwise start from the beginning and stay adjacent to previously-used indexes), the multi-column index won't have any benefit. (For example, if I have an index over columns [B, C, D], but have a WHERE that only uses [C, D], this multi-column index will have no benefit.)
Reference: http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html :
MySQL can use multiple-column indexes for queries that test all the
columns in the index, or queries that test just the first column, the
first two columns, the first three columns, and so on. If you specify
the columns in the right order in the index definition, a single
composite index can speed up several kinds of queries on the same
table.
The short answer is, "it depends."
The long answer is this: if you sometimes do
WHERE COL1 = value1 and COL2 = value2
and sometimes do
WHERE COL1 = value1
but never, or almost never, do
WHERE COL2 = value2
then a compound index on (COL1, COL2) will be your best choice. (True for mySQL as well as other makes and models of DBMS.)
More indexes slow down INSERT and UPDATE operations, so they aren't free.
A compound index (c1,c2) is very useful in the following cases:
The obvious case where c1=v1 AND c2=v2
If you do WHERE c1=v1 AND c2 BETWEEN v2 and v3 - this does a "range scan" on the composite index
SELECT ... FROM t WHERE c1=v1 ORDER BY c2 - in this case it can use the index to sort the results.
In some cases, as a "covering index", e.g. "SELECT c1,c2 FROM t WHERE c1=4" can ignore the table contents, and just fetch a (range) from the index.