Overhead of Composite Indexes

I have many tables where I have indexes on foreign keys, and clustered indexes which include those foreign keys. For example, I have a table like the following:
TABLE: Item
------------------------
id PRIMARY KEY
owner FOREIGN KEY
status
... many more columns
MySQL generates indexes for primary and foreign keys, but sometimes I want to improve query performance, so I'll create clustered or covering indexes. This leads to indexes with overlapping columns.
INDEXES ON: Item
------------------------
idx_owner (owner)
idx_owner_status (owner, status)
If I dropped idx_owner, future queries that would normally use idx_owner would just use idx_owner_status since it has owner as the first column in the index.
Is it worth keeping idx_owner around? Is there an additional I/O overhead to use idx_owner_status even though MySQL only uses part of the index?
Edit: I am really only interested in the way InnoDB behaves regarding indexes.

Short Answer
Drop the shorter index.
Long Answer
Things to consider:
Drop it:
Each INDEX is a separate BTree that resides on disk, so it takes space.
Each INDEX is updated (sooner or later) when you INSERT a new row or an UPDATE modifies an indexed column. This takes some CPU and I/O and buffer_pool space for the 'change buffer'.
Any functional use (as opposed to performance) for the shorter index can be performed by the longer one.
Don't drop it:
The longer index is bulkier than the shorter one. So it is less cacheable. So (in extreme situations) using the bulkier one in place of the shorter one could cause more I/O. A case that aggravates this: INDEX(int, varchar255).
It is very rare that the last item really overrides the other items.
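A minimal sketch of dropping the shorter index, using the table and index names from the question. The EXPLAIN query is only an illustration. The expectation that its key column shows idx_owner_status is an assumption about the optimizer's choice, so verify it on your own data.

```sql
-- List the current indexes to confirm which ones overlap.
SHOW INDEX FROM Item;

-- Drop the redundant prefix index. idx_owner_status (owner, status)
-- starts with owner, so it can still serve lookups by owner
-- (and, as far as I know, can also back the foreign key on owner).
ALTER TABLE Item DROP INDEX idx_owner;

-- Check which index a lookup by owner uses now.
-- The "key" column of the output would be expected to show idx_owner_status.
EXPLAIN SELECT id FROM Item WHERE owner = 123;
```

If the longer index turns out to be noticeably slower for an important query, the shorter one can be added back with ALTER TABLE Item ADD INDEX idx_owner (owner).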
Bonus
A "covering" index is one that contains all the columns mentioned in a SELECT. For example:
SELECT status FROM tbl WHERE owner = 123;
This will touch only the BTree for INDEX(owner, status), thereby being noticeably faster than
SELECT status, foo FROM tbl WHERE owner = 123;
If you really need that query to be faster, then replace both of your indexes with INDEX(owner, status, foo).
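As a rough sketch of the covering-index idea above: the index name idx_owner_status_foo is hypothetical, and tbl and foo are the placeholder names from the example queries. One way to check whether a query is covered is to look for "Using index" in the Extra column of EXPLAIN.

```sql
-- Hypothetical index name; the columns come from the example queries.
ALTER TABLE tbl ADD INDEX idx_owner_status_foo (owner, status, foo);

-- Both queries read only columns that are in the index.
-- If the index is used as a covering index, the Extra column
-- of the EXPLAIN output should contain "Using index".
EXPLAIN SELECT status FROM tbl WHERE owner = 123;
EXPLAIN SELECT status, foo FROM tbl WHERE owner = 123;
```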
PK in Secondary key
One more tidbit... In InnoDB, the columns of the PRIMARY KEY are implicitly appended to every secondary key. So, the three examples are really
INDEX(owner, id)
INDEX(owner, status, id)
INDEX(owner, status, foo, id)
More discussion in my blogs on composite indexes and index cookbook.

Related

Mysql select by auto increment primary key while partitioned by date

I was wondering how MySQL would act if I partition a table by date and then run some SELECT or UPDATE queries by primary key.
Is it going to search all partitions, or does the query optimizer know in which partition the row is saved?
What about other unique and non-unique indexed columns?

Background
Think of a PARTITIONed table as a collection of virtually independent tables, each with its own data BTree and index BTree(s).
All UNIQUE keys, including the PRIMARY KEY, must include the "partition key".
If the partition key is available in the query, the query will first try to do "partition pruning" to limit the number of partitions to actually look at. Without that info, it must look at all partitions.
After the "pruning", the processing goes to each of the possible partitions, and performs the query.
Select, Update
A SELECT logically does a UNION ALL of whatever was found in the non-pruned partitions.
An UPDATE applies its action to each non-pruned partition. No harm is done (except to performance) by the updates that did nothing.
Opinion
In my experience, PARTITIONing often slows things down due to things such as the above. There are a small number of use cases for partitioning: http://mysql.rjweb.org/doc.php/partitionmaint
Your specific questions
partition a table by date and then have some select or update queries by primary key ?
All partitions will be touched. The SELECT combines the one result with N-1 empty results. The UPDATE will do one update, plus N-1 useless attempts to update.
An AUTO_INCREMENT column must be the first column in some index (not necessarily the PK, not necessarily alone). So, using the id is quite efficient in each partition. But that means that it is N times as much effort as in a non-partitioned table. (This is a performance drag for partitioning.)
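To see the pruning behavior described above, here is a small sketch. The table events, its columns, and the partition names are all hypothetical. In MySQL 5.7 and later the EXPLAIN output includes a partitions column; older versions need EXPLAIN PARTITIONS instead.

```sql
-- Hypothetical table partitioned by date. The PRIMARY KEY must include
-- the partition key (created), and the AUTO_INCREMENT column comes first in it.
CREATE TABLE events (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  created DATE NOT NULL,
  payload VARCHAR(100),
  PRIMARY KEY (id, created)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(created)) (
  PARTITION p2023 VALUES LESS THAN (TO_DAYS('2024-01-01')),
  PARTITION p2024 VALUES LESS THAN (TO_DAYS('2025-01-01')),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Lookup by id only: no partition key in the WHERE clause,
-- so the partitions column is expected to list every partition.
EXPLAIN SELECT * FROM events WHERE id = 42;

-- Lookup by id plus the partition key: pruning can narrow this
-- down to a single partition.
EXPLAIN SELECT * FROM events WHERE id = 42 AND created = '2024-06-15';
```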

How to Create Multi-level Index in MySql and how to find out a column is Multi-level indexed?

I have a table 'activities' with around 1 million records. Its columns are:
id(PK), u_id(FK), cl_id(FK), activity_log
By default id(PK) is indexed, and I have created my own index on u_id and cl_id using:
ALTER TABLE activities ADD INDEX(u_id,cl_id);
Now I would like to create a multi-level index on cl_id(FK) or id(PK), or on both columns if possible. How do I create a multi-level index using a query?
How do I find out which columns in a table are multi-level indexed? I have tried this query, but it shows only the columns that are indexed:
SHOW indexes FROM activities;
Are a multi-level index and a non-clustered index the same thing?

I have no idea what a "multi-level" index is. But there are "composite" indexes such as
INDEX(last_name, first_name)
which is very useful for
WHERE last_name = 'James'
AND first_name = 'Rick'
or even
WHERE last_name = 'James'
AND first_name LIKE 'R%'
In MySQL (InnoDB in particular), the PRIMARY KEY is always a unique index and it is "clustered" with the data. That is, looking up a row by the PK is very efficient. The structure is always BTree.
"Secondary keys" are also BTrees, but the leaf node contains the PK. So, a second lookup is needed to complete the query. (This distinction is rarely worth noting.)
The PK and/or secondary keys can be "composite".
Declaring a FOREIGN KEY adds a secondary index if there is not already some suitable (PRIMARY or secondary) index that starts with the column in the FK.
The following is redundant, and should be avoided:
PRIMARY KEY(id),
INDEX(id) -- DROP this
Indexed Sequential
Ouch! I have not heard of that antiquated indexing method in nearly two decades.
The Question gives a link to such a definition of "multi-level" indexing. Run from it! Change schools. Or at least understand that IS is no longer considered viable.
MySQL uses "BTrees" for most indexing. It was invented decades ago, and essentially wiped out Indexed Sequential. And, to a large extent, has wiped out Hashing as an indexing technique on disk.
MySQL's BTrees can handle multiple columns, as in my example, above.
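For the part of the question about finding out which columns belong to a multi-column index, a sketch along these lines may help. The index name idx_u_cl is hypothetical. In the SHOW INDEX output, rows that share a Key_name and have Seq_in_index values 1, 2, ... belong to the same composite index.

```sql
-- Composite index on the two FK columns (hypothetical name idx_u_cl).
ALTER TABLE activities ADD INDEX idx_u_cl (u_id, cl_id);

-- One row per indexed column; Seq_in_index gives the column's
-- position within its index.
SHOW INDEX FROM activities;

-- The same information from information_schema, sorted so that the
-- columns of each index appear together in order.
SELECT INDEX_NAME, SEQ_IN_INDEX, COLUMN_NAME
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'activities'
ORDER BY INDEX_NAME, SEQ_IN_INDEX;
```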

Is the primary key stored implicitly in other keys in mysql myisam engine?

My problem: imagine a table with millions of rows, like
CREATE TABLE a (
id INT PRIMARY KEY,
column2..,
column3..,
many other columns..
..
INDEX (column2)
);
and a query like this:
SELECT id FROM a WHERE column2 > 10000 LIMIT 1000 OFFSET 5000;
My question: does mysql only use the index "column2" (so the primary key id is implicitly stored as a reference in other indexes), or does it have to fetch all rows to get also the id, which is selected for output? In that case the query should be much faster with a key declared as:
INDEX column2(column2, id)

Short answer: No.
Long answer:
MyISAM, unlike InnoDB, has a "pointer" to the data in the leaf node of each index, including the PRIMARY KEY.
So, INDEX(col2) is essentially INDEX(col2, ptr). Ditto for INDEX(id) being INDEX(id, ptr).
The "pointer" is either a byte offset into the .MYD file (for DYNAMIC) or record number (for FIXED). In either case, the pointer leads to a "seek" into the .MYD file.
The pointer defaults to a 6-byte number, allowing for a huge number of rows. It can be changed by a setting, either for saving space or allowing an even bigger number of rows.
For your particular query, INDEX(col2, id) is optimal and "covering". It is better than INDEX(col2) for MyISAM, but they are equivalent for InnoDB, since InnoDB implicitly has the PK in each secondary index.
The query will have to scan at least 5000+1000 rows, at least in the index's BTree.
Note that InnoDB's PRIMARY KEY is clustered with the data, but MyISAM's PRIMARY KEY is a separate BTree, just like other secondary indexes.
You really should consider moving to InnoDB; there is virtually no reason to use MyISAM today.

An index on column2 is required. Your suggestion with id in the index will prevent table scans and should be very efficient.
Furthermore, assuming that column2 is a continuous sequence, it is faster to do this:
SELECT id FROM a WHERE column2 > 15000 LIMIT 1000;
This is because, to honor the OFFSET, MySQL has to scan and discard the first 5000 matching records before returning anything (MySQL does not realize that you are actually offsetting column2).
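When column2 is not a continuous sequence, a related approach is to remember where the previous page ended instead of using OFFSET. The following is only a sketch under that assumption. The user variables @last_column2 and @last_id are hypothetical and would be set from the last row of the previous page, and it relies on an index that starts with (column2, id).

```sql
-- Values taken from the last row of the previous page (hypothetical).
SET @last_column2 = 10000, @last_id = 0;

-- Fetch the next page starting right after that row.
-- The id comparison breaks ties between rows with equal column2.
SELECT id, column2
FROM a
WHERE column2 > @last_column2
   OR (column2 = @last_column2 AND id > @last_id)
ORDER BY column2, id
LIMIT 1000;
```

The cost of each page then no longer grows with the number of rows skipped, at the price of only being able to step forward page by page.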

Mysql covering vs composite vs column index

In the following query
SELECT col1,col2
FROM table1
WHERE col3='value1'
AND col4='value2'
If I have 2 separate indexes one on col3 and the other on col4, Which one of them will be used in this query ?
I read somewhere that for each table in the query only one index is used. Does that mean that there is no way for the query to use both indexes ?
Secondly, If I created a composite index using both col3 and col4 together but used only col3 in the WHERE clause will that be worse for the performance?
example:
SELECT col1,col2
FROM table1
WHERE col3='value1'
Lastly, Is it better to just use Covering indexes in all cases ? and does it differ between MYISAM and innodb storage engines ?

A covering index is not the same as a composite index.
If I have 2 separate indexes one on col3 and the other on col4, Which one of them will be used in this query ?
The index with the highest cardinality.
MySQL keeps statistics on which index has what properties.
The index that has the most discriminating power (as evident in MySQL's statistics) will be used.
I read somewhere that for each table in the query only one index is used. Does that mean that there is no way for the query to use both indexes ?
You can use a subselect.
Or even better use a compound index that includes both col3 and col4.
Secondly, If I created a composite index using both col3 and col4 together but used only col3 in the WHERE clause will that be worse for the performance? example:
Compound index
"Compound index" and "composite index" are two names for the same thing; the MySQL manual calls them multiple-column (composite) indexes.
Only the left-most part of the compound index will be used.
So if the index is defined as
index myindex (col3, col4) <<-- will work with your example.
index myindex (col4, col3) <<-- will not work.
See: http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html
Note that if you select a left-most field, you can get away with not using that part of the index in your where clause.
Imagine we have a compound index
Myindex(col1,col2)
SELECT col1 FROM table1 WHERE col2 = 200 <<-- will use index, but not efficiently
SELECT * FROM table1 where col2 = 200 <<-- will NOT use index.
The reason this works is that the first query uses the covering index and does a scan on that.
The second query needs to access the table, and for that reason scanning through the index does not make sense.
This works in both InnoDB and MyISAM, since either engine can answer a query from the index alone.
What's a covering index
A covering index refers to the case where all fields selected in a query are covered by an index. In that case MySQL never reads the data in the table and uses only the data in the index, significantly speeding up the select.
Note that in InnoDB the primary key is included in all secondary indexes, so in a way all secondary indexes are compound indexes.
This means that if you run the following query on InnoDB:
SELECT indexed_field FROM table1 WHERE pk = something
MySQL will always use a covering index and will not access the actual table. Although it could use a covering index, it will prefer the PRIMARY KEY because it only needs to hit a single row.

I upvoted Johan's answer for completeness, but I think the following statement he makes regarding secondary indexes is incorrect and/or confusing;
Note that in InnoDB the primary key is included in all secondary indexes,
so in a way all secondary indexes are compound indexes.
This means that if you run the following query on InnoDB:
SELECT indexed_field FROM table1 WHERE pk = something
MySQL will always use a covering index and will not access the actual table.
While I agree the primary key is INCLUDED in the secondary index, I do not agree MySQL "will always use a covering index" in the SELECT query specified here.
To see why, note that a full index "scan" is always required in this case. This is not the same as a "seek" operation, but is instead a 100% scan of the secondary index contents. This is due to the fact the secondary index is not ordered by the primary key; it is ordered by "indexed_field" (otherwise it would not be much use as an index!).
In light of this latter fact, there will be cases where it is more efficient to "seek" the primary key, and then extract indexed_field "from the actual table," not from the secondary index.

This is a question I hear a lot and there is a lot of confusion around the issues due to:
Differences between MySQL versions over the years.
Index support, including the use of multiple indexes, changed over the years (towards being supported).
The InnoDB / MyISAM differences.
There are some key differences (below) but I do not believe multiple indexes are one of them
MyISAM is older but proven. Data in MyISAM tables is split between three different files for:- table format, data, and indexes.
InnoDB is relatively newer than MyISAM and is transaction safe. InnoDB also provides row-locking as opposed to table-locking which increases multi-user concurrency and performance. InnoDB also has foreign-key constraints.
Because of its row-locking feature InnoDB is well suited to high load environments.
To be sure about things, use EXPLAIN to analyze the query execution plan.
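For the queries in the question, checking the plan might look like the sketch below. The index name idx_col3_col4 is hypothetical, and which index the optimizer actually picks depends on the data, so treat the comments as expectations rather than guarantees.

```sql
-- Hypothetical composite index covering both filter columns.
ALTER TABLE table1 ADD INDEX idx_col3_col4 (col3, col4);

-- Filters on both columns: the "key" column of the output shows
-- which index was chosen, and "rows" the estimated rows examined.
EXPLAIN SELECT col1, col2
FROM table1
WHERE col3 = 'value1'
  AND col4 = 'value2';

-- Filters on col3 only: the composite index is still usable,
-- because col3 is its left-most column.
EXPLAIN SELECT col1, col2
FROM table1
WHERE col3 = 'value1';
```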

Compound index is not the same as a composite index.
Composite index covers all the columns in your filter, join and select criteria. All of these columns are stored on all of the index pages accordingly throughout the index B-tree.
Compound index covers all the filter and join key columns in the B-tree, but keeps the select columns only on the leaf pages as they will not be searched, rather only extracted!
This saves space and consequently creates fewer index pages, hence faster I/O.

MySql Query very slow

I run the following query on my database :
SELECT e.id_dernier_fichier
FROM Enfants e JOIN FichiersEnfants f
ON e.id_dernier_fichier = f.id_fichier_enfant
And the query runs fine. If I modify the query like this:
SELECT e.codega
FROM Enfants e JOIN FichiersEnfants f
ON e.id_dernier_fichier = f.id_fichier_enfant
The query becomes very slow! The problem is that I want to select many columns from tables e and f, and the query can take up to 1 minute! I tried different modifications but nothing works. I have indexes on id_* and also on e.codega. Enfants has 9000 rows and FichiersEnfants has 20000 rows. Any suggestions?
Here is the requested info (sorry for not showing it from the beginning):

The difference in performance is possibly due to e.id_dernier_fichier being in the index used for the JOIN, but e.codega not being in that index.
Without a full definition of both tables, and all of their indexes, it's not possible to tell for certain. Also, including the EXPLAIN output for the two queries would help.
For now, however, I can elaborate on a couple of things...
If an INDEX is CLUSTERED (this also applies to PRIMARY KEYs), the data is actually physically stored in the order of the INDEX. This means that knowing you want position x in the INDEX also implicitly means you want position x in the TABLE.
If the INDEX is not clustered, however, the INDEX is just providing a lookup for you. Effectively saying position x in the INDEX corresponds to position y in the TABLE.
The importance here is when accessing fields not specified in the INDEX. Doing so means you have to actually go to the TABLE to get the data. In the case of a CLUSTERED INDEX, you're already there, so the overhead of finding that field is pretty low. If the INDEX isn't clustered, however, you effectively have to JOIN the TABLE to the INDEX, then find the field you're interested in.
Note: Having a composite index on (id_dernier_fichier, codega) is very different from having one index on just (id_dernier_fichier) and a separate index on just (codega).
In the case of your query, I don't think you need to change the code at all. But you may benefit from changing the indexes.
You mention that you want to access many fields. Putting all those fields in a composite index is probably not the best solution. Instead you may want to create a CLUSTERED INDEX on (id_dernier_fichier). This will mean that once the id_dernier_fichier has been located, you're already in the right place to get all the other fields as well.
EDIT Note About MySQL and CLUSTERED INDEXes
13.2.10.1. Clustered and Secondary Indexes
Every InnoDB table has a special index called the clustered index where the data for the rows is stored:
If you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index.
If you do not define a PRIMARY KEY for your table, MySQL picks the first UNIQUE index that has only NOT NULL columns as the primary key and InnoDB uses it as the clustered index.
If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.
