Multiple Column Index vs Multiple Indexes/Index Merge - mysql

Let's assume we have a table with 4 columns: A, B, C, and D
Let's assume we have a few queries that will join or perform a clause against these columns:
Q1: WHERE A = ?
Q2: WHERE A = ? AND B = ?
Q3: WHERE A = ? AND B = ? AND C = ?
Since we know we will use these columns in three different contexts, is it best to create three single-column indexes (and rely on index merge)? Or three multiple-column indexes?
Index Merge:
Idx1: Create index A_idx ON table (A)
Idx2: Create index B_idx ON table (B)
Idx3: Create index C_idx ON table (C)
Multiple-Column Index:
Idx1: Create index A_idx ON table(A)
Idx2: Create index AB_idx ON table(A,B)
Idx3: Create index ABC_idx ON table(A,B,C)
This is a simplified case. Let's assume we have 10-15 columns that will be joined or where'd in different ways and combinations. Is it best to create a multiple-column index for each combination that occurs? Or just find the smallest set of columns that are most frequently used together, build a multiple-column index on those, and then create individual indexes for the rest?

A composite index on (A,B,C) will serve all 3 queries, because each of them filters on a leftmost prefix of the index, so you don't need separate indexes on (A) and on (A,B). It's also faster than index_merge.
The only reason to have more than one index is if some queries can't use that index (for example, they filter on B and C but not on A).
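To check the leftmost-prefix behaviour for yourself, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module as a stand-in (MySQL's optimizer and its EXPLAIN output differ), and the names t and abc_idx are made up for illustration.

```python
import sqlite3

# Hypothetical table and index names; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INT, b INT, c INT, d INT)")
conn.execute("CREATE INDEX abc_idx ON t (a, b, c)")


def plan(sql):
    """Return the planner's description of how it would run sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


q1 = plan("SELECT * FROM t WHERE a = 1")
q2 = plan("SELECT * FROM t WHERE a = 1 AND b = 2")
q3 = plan("SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3")
# No condition on A: this query skips the leading column of the index.
no_a = plan("SELECT * FROM t WHERE b = 2 AND c = 3")

for label, text in [("Q1", q1), ("Q2", q2), ("Q3", q3), ("B,C only", no_a)]:
    print(label, "->", text)
```

The first three plans should mention abc_idx, while the last one should fall back to reading the table. On MySQL you would run EXPLAIN on the same statements instead.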
Also keep in mind that one of the most important characteristics of a column, when deciding whether it should be included in the index, is not whether it's used in a query but its cardinality. If a condition on this column won't exclude a lot of the rows, you should not include it in the index.
Let's say you have A, B, C.
For a given value of A you get 20% of the rows. Of those rows, a given value of B matches 1%. Let's say the conditions on (A,B) together leave 1000 rows from the table. After applying C, you still get 850 rows. An index on C is not effective here, and (A,B) is the best index for this query.
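The arithmetic behind that example can be spelled out in a few lines. This is only a sketch using the numbers quoted above; the implied table size is derived from them and is not something the answer states.

```python
from fractions import Fraction

# Numbers taken from the example above.
a_fraction = Fraction(20, 100)   # share of rows matching a given value of A
b_fraction = Fraction(1, 100)    # of those, share also matching B
rows_after_ab = 1000             # rows left after both conditions
rows_after_abc = 850             # rows left after also applying C

# Derived, not stated in the answer: the table size these numbers imply.
implied_table_rows = rows_after_ab / (a_fraction * b_fraction)

# Share of the (A,B) rows that the condition on C throws away.
c_excluded = 1 - Fraction(rows_after_abc, rows_after_ab)

print(int(implied_table_rows))   # 500000
print(float(c_excluded))         # 0.15
```

So (A,B) narrows the table to 0.2% of its rows, while C only removes a further 15% of what is left, which is why adding it to the index buys very little.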

Related

SQLite & MySQL Compound Index vs Single index

I have a table with two fields: a,b
Both fields are indexed separately -- no compound index.
Running a select query on both fields:
select * from table where a=<sth> and b=<sth>
took over 400ms, while
select * from table where a=<sth>
took only 30ms.
Do I need to set up a compound index on (a,b)?
Surely, if I have indexes on both a and b, queries on a AND b like the one above should be fast, right?
For this query:
select *
from table
where a = <sth> and b = <sth>;
The best index is on table(a, b). The same index can also be used for your second query.
Usually (but not always).
In your case, the number of distinct values in a (and b) and the number of columns in your select list can change how the db decides between using the index and reading the table.
For example,
if the table has, say, 100,000 records and 80,000 of them have the same value for a, then when you query:
SELECT * FROM table WHERE a=<your value>
the db engine could decide to scan the table directly without using the index, while if you query
SELECT a, b FROM table WHERE a=<your value>
and the index also contains column b (as a key column or, where the database supports it, via INCLUDE), it's quite probable that the db engine will use the index.
Try searching the internet for index tips, and also take a look at How can I index these queries?
The SQLite documentation explains how index lookups work.
Once the database has used an index to look up some rows, the other index is no longer efficient to use (there is no easy method to filter the results of the first lookup because the other index refers to rows in the original table, not to entries in the first index). See Multiple AND-Connected WHERE-Clause Terms.
To make index lookups on two columns as fast as possible, you need Multi-Column Indices.
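A small sketch of that difference, assuming Python's sqlite3 module and made-up names (t, only_a, only_b, both_ab). It only prints the planner's choice and measures nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INT, b INT, payload TEXT)")
conn.execute("CREATE INDEX only_a ON t (a)")
conn.execute("CREATE INDEX only_b ON t (b)")


def plan(sql):
    """Return the planner's description of how it would run sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT * FROM t WHERE a = 1 AND b = 2"

# With two single-column indexes, only one of them drives the lookup;
# the other condition is checked row by row afterwards.
plan_separate = plan(query)

# With a two-column index, both conditions are resolved inside the index.
conn.execute("CREATE INDEX both_ab ON t (a, b)")
plan_compound = plan(query)

print(plan_separate)
print(plan_compound)
```

The first plan should name exactly one of only_a or only_b; the second should name both_ab with both columns in the search condition.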

MySQL covering index optimization? [duplicate]

I've just heard the term covered index in some database discussion - what does it mean?
A covering index is an index that contains all of the columns you need for your query, and possibly more.
For instance, this:
SELECT *
FROM tablename
WHERE criteria
will typically use indexes to speed up the resolution of which rows to retrieve using criteria, but then it will go to the full table to retrieve the rows.
However, if the index contained the columns column1, column2 and column3, then for this SQL:
SELECT column1, column2
FROM tablename
WHERE criteria
provided that particular index could be used to speed up the resolution of which rows to retrieve, the index already contains the values of the columns you're interested in, so the database won't have to go to the table to retrieve the rows, but can produce the results directly from the index.
This also works the other way around: if you see that a typical query uses 1-2 columns to resolve which rows to retrieve and then typically returns another 1-2 columns, it could be beneficial to append those extra columns (if they're the same across your queries) to the index, so that the query processor can get everything from the index itself.
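Here is a minimal sketch of the difference, assuming SQLite via Python's sqlite3 module rather than MySQL or SQL Server. The index name cover_idx and the extra column other are hypothetical, and criteria is replaced by a simple condition on column1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename ("
    "id INTEGER PRIMARY KEY, column1 INT, column2 INT, column3 INT, other TEXT)"
)
conn.execute("CREATE INDEX cover_idx ON tablename (column1, column2, column3)")


def plan(sql):
    """Return the planner's description of how it would run sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Every selected column lives in the index, so the table is never touched.
narrow = plan("SELECT column1, column2 FROM tablename WHERE column1 = 1")

# SELECT * also needs "other", so each match costs an extra table lookup.
wide = plan("SELECT * FROM tablename WHERE column1 = 1")

print(narrow)
print(wide)
```

SQLite reports the first case as USING COVERING INDEX and the second as plain USING INDEX; MySQL shows the equivalent as "Using index" in the Extra column of EXPLAIN.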
Here's an article: Index Covering Boosts SQL Server Query Performance on the subject.
A covering index is just an ordinary index. It's called "covering" if it can satisfy a query without having to read the table data.
Example:
CREATE TABLE MyTable
(
    ID INT IDENTITY PRIMARY KEY,
    Foo INT
);
CREATE NONCLUSTERED INDEX index1 ON MyTable(ID, Foo);
SELECT ID, Foo FROM MyTable; -- All requested data are covered by index
This is one of the fastest methods to retrieve data from SQL Server.
Covering indexes are indexes which "cover" all columns needed from a specific table, removing the need to access the physical table at all for a given query/ operation.
Since the index contains the desired columns (or a superset of them), table access can be replaced with an index lookup or scan -- which is generally much faster.
Columns to cover:
parameterized or static conditions; columns restricted by a parameterized or constant condition.
join columns; columns dynamically used for joining
selected columns; to answer selected values.
While covering indexes can often provide good benefit for retrieval, they do add somewhat to insert/ update overhead; due to the need to write extra or larger index rows on every update.
Covering indexes for Joined Queries
Covering indexes are probably most valuable as a performance technique for joined queries. This is because joined queries are more costly and more likely than single-table retrievals to suffer serious performance problems.
in a joined query, covering indexes should be considered per-table.
each 'covering index' removes a physical table access from the plan & replaces it with index-only access.
investigate the plan costs & experiment with which tables are most worthwhile to replace by a covering index.
by this means, the multiplicative cost of large join plans can be significantly reduced.
For example:
select poi.title, c.name, c.address
from porderitem poi
join porder po on po.id = poi.fk_order
join customer c on c.id = po.fk_customer
where po.orderdate > ? and po.status = 'SHIPPING';
create index porder_custitem on porder (orderdate, id, status, fk_customer);
See:
http://literatejava.com/sql/covering-indexes-query-optimization/
Let's say you have a simple table with the columns below, and you have only indexed Id:
Id (Int), Telephone_Number (Int), Name (VARCHAR), Address (VARCHAR)
Imagine you have to run the query below and check whether it uses an index and whether it runs efficiently without extra I/O calls. Remember, you have only created an index on Id.
SELECT Id FROM mytable WHERE Telephone_Number = '55442233';
When you check the performance of this query you will be disappointed: since Telephone_Number is not indexed, the rows have to be fetched from the table using I/O calls. So this is not a covering index, since a column in the query is not indexed, which leads to frequent I/O calls.
To make it a covered query you need a composite index that contains both columns. Putting the filtered column first, as in (Telephone_Number, Id), also lets the database search the index instead of scanning all of it.
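Column order matters for that composite index. The sketch below assumes SQLite through Python's sqlite3 module (MySQL/InnoDB behaves differently in some details, for example it stores the primary key in every secondary index), and the index names id_only, id_tel and tel_id are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable "
    "(Id INT, Telephone_Number INT, Name TEXT, Address TEXT)"
)
conn.execute("CREATE INDEX id_only ON mytable (Id)")


def plan(sql):
    """Return the planner's description of how it would run sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT Id FROM mytable WHERE Telephone_Number = 55442233"

# Only Id is indexed: the filter column is not, so every row is read.
plan_no_index = plan(query)

# Both columns are in the index, but it is ordered by Id, so the planner
# cannot seek to one telephone number and has to read through everything.
conn.execute("CREATE INDEX id_tel ON mytable (Id, Telephone_Number)")
plan_id_first = plan(query)

# Filter column first: the planner can seek, and Id is read from the index.
conn.execute("CREATE INDEX tel_id ON mytable (Telephone_Number, Id)")
plan_tel_first = plan(query)

for text in (plan_no_index, plan_id_first, plan_tel_first):
    print(text)
```

Only the last plan should be a SEARCH on a covering index; the first two are scans.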
For more details, please refer to this blog:
https://www.percona.com/blog/2006/11/23/covering-index-and-prefix-indexes/

MySQL - Does SELECT * need an index of all table fields?

I would like to know if it is necessary to create an index for all fields within a table if one of your queries will use SELECT *.
To explain, if we had a table that has 10M records and we did a SELECT * query on it, would the query run faster if we had created an index for all fields within the table, or does MySQL handle SELECT * differently from SELECT first_field, a_field, last_field?
To my understanding, if I had a query that did SELECT first_field, a_field FROM table then it would bring performance benefits if we created an index on first_field, a_field but if we use SELECT * is there even a benefit from creating an index for all fields?
Performing a SELECT * FROM mytable query would have to read all the data from the table. This could, theoretically, be done from an index if you have an index on all the columns, but it would be just faster for the database to read the table itself.
If you have a where clause, having an index on (some of) the columns you have conditions on may dramatically improve the query's performance. It's a gross simplification, but what basically happens is the following:
The appropriate rows are filtered according to the where clause. It's much faster to search for these rows in an index (which is, essentially, a sorted tree) than a table (which is an unordered set of rows).
For the columns that were in the index used in the previous step, the values are returned.
For the columns that aren't, the table is accessed (according to a pointer kept in the index).
Indexing a column of a MySQL table improves performance when you need to search for or edit a row/record based on that column.
For example, if there is an 'id' column and it is the primary key, and you want to look up a record with a where clause on 'id', then you don't need to create an index on it, because the primary key column already acts as an indexed column.
In another case, if there is a 'pid' column in the table and it is not a primary key, then to search on 'pid' efficiently it is better to create an index on the 'pid' column. That will make the query faster at finding the expected record.
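The cases discussed in these answers can be tried out with a short sketch. It assumes SQLite via Python's sqlite3 module as a stand-in for MySQL (in SQLite an INTEGER PRIMARY KEY is the row id itself, while InnoDB uses a clustered primary key, but the point is the same), and the names mytable and pid_idx are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, pid INT, name TEXT)")


def plan(sql):
    """Return the planner's description of how it would run sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# No WHERE clause: every row is needed, so the table is simply read.
full_read = plan("SELECT * FROM mytable")

# The primary key can be searched without creating any extra index.
by_id = plan("SELECT * FROM mytable WHERE id = 1")

# A non-key column is scanned until it gets an index of its own.
pid_before = plan("SELECT * FROM mytable WHERE pid = 1")
conn.execute("CREATE INDEX pid_idx ON mytable (pid)")
pid_after = plan("SELECT * FROM mytable WHERE pid = 1")

for text in (full_read, by_id, pid_before, pid_after):
    print(text)
```

Note that SELECT * is served fine by an index on the filtered column alone; indexing every field would not help the unfiltered query.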

Multi-column index combined with unique index efficiency

We have a table that has multiple columns, with a UNIQUE index on one of them (let's call it GBID) and another column (let's call it flag) that has no index. This table can be quite large. We query WHERE gbid IN () AND flag = 1 a lot, occasionally query WHERE gbid = "XXX", and rarely query WHERE flag = 1.
Which is more efficient when it comes to indexes:
Have gbid as UNIQUE and flag with no index
Have gbid as UNIQUE and have a multi column index for (gbid, flag)
Have gbid as UNIQUE and have a multi column index for (flag, gbid)
It depends on the % of rows with flag=1, and on how many rows you select (how many gbid's you have in the IN clause).
If the percentage of rows with flag=1 is low (1-2%) and you are selecting a lot of gbid's, options 2 and 3 might be faster (I think option 3 will be better in that case).
If you have a more even distribution of flag values, having it in the index won't make a difference.
If you want to be sure you should benchmark it with a sample of real data.

MySQL multiple index types?

I've noticed that in PHPMyAdmin I can individually index columns, or I can use checkboxes to select fields and then click index, and they're indexed in a different way. Does this mean that if for a given table I have 2 columns that together define each row as unique (instead of just a simple single column id) I should index those together to increase performance?
A multiple-column index can be considered a sorted array containing values that are created by concatenating the values of the indexed columns.
MySQL uses multiple-column indexes in such a way that queries are fast when you specify a known quantity for the first column of the index in a WHERE clause, even if you do not specify values for the other columns.
If you have two columns named last_name and first_name and you create an index INDEX name (last_name,first_name), the index can be used for queries that specify values in a known range for last_name, or for both last_name and first_name.
Source: http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html
So, it may not be helpful in your particular case, because if you query only on the later columns (for example: SELECT * FROM test WHERE first_name='Michael' or SELECT * FROM test WHERE last_name='Widenius' OR first_name='Michael'), the index will not be used and the queries will be slower.
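The "sorted array of concatenated values" picture quoted above can be modelled directly. This is a toy sketch in plain Python, not how MySQL stores its B-trees; the sample rows and function names are invented for illustration.

```python
import bisect

# Hypothetical rows, stored as (last_name, first_name).
people = [
    ("Widenius", "Michael"),
    ("Smith", "Anna"),
    ("Widenius", "Monty"),
    ("Jones", "Michael"),
]

# The "index": entries sorted by last_name, then first_name.
index = sorted(people)


def by_last_name(last):
    """Binary search works because last_name is the leading sort key."""
    lo = bisect.bisect_left(index, (last,))
    hi = bisect.bisect_left(index, (last + "\0",))
    return index[lo:hi]


def by_first_name(first):
    """Matches are scattered through the array, so every entry is checked."""
    return [entry for entry in index if entry[1] == first]


print(by_last_name("Widenius"))   # two adjacent entries
print(by_first_name("Michael"))   # found only by looking at all four
```

All entries for one last_name sit next to each other, so they can be found with a couple of binary searches; entries sharing a first_name are spread out, which is why a condition on first_name alone cannot use the index efficiently.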