Alternative way to index a MySQL table for optimal queries? - mysql

I am confused about how best to index a table in MySQL and need help choosing the right type of index. Currently I am using a unique-key index on this table, but I do not know if this is the best approach, and in some situations I cannot use this type of index due to MySQL limitations.
The table consists of a primary key and n columns; to keep it simple, n = 4 in this scenario. So the table looks like this: pk, col1, col2, col3, col4
The values in col1 to coln are VARCHARs, typically between 1 and 4 characters long. The primary key is a concatenation of the column values. So typical rows could look like the following:
A:B:C:D, A, B, C, D
A:B:C:E, A, B, C, E
A:B:F:F, A, B, F, F
Where the first element is the primary key, and subsequent elements are col1, col2, etc.
The table needs to be optimised for queries, not inserts. The queries that I wish to perform will have a WHERE clause where we know some of the values in columns 1-4. So for example I might want to find all rows where the second column is 'B' or 'C'. Once I have the primary key I use this to JOIN another table.
I was creating a unique key on col1-4 (as they are unique). The problem is that as soon as n becomes large (>16), I can no longer create a unique key index (MySQL allows at most 16 columns in an index). This is not a problem for uniqueness, as the primary key ensures it. However, I am unsure of two things:
a) Is the unique key a good index to use in order to optimise the speed of the queries?
b) When I cannot use a unique key, what index should I use?
I have the following options, and I’m not sure which (if any) is the best:
a) Create a single index on (col1, col2, col3, col4)
b) Create an index per column (col1), (col2)…(col-n)
c) Create an index per col, with the pk included (pk, col1), (pk, col2), (pk, col-n)
Any help you can provide is greatly appreciated.
Thanks
Phil

An index on (col1, col2, col3, col4) can only be used if the WHERE clause contains a condition on its leftmost column(s). That means that if the query does not contain a condition on col1, the index cannot be used at all (see Multiple-Column Indexes). If you have such queries, additional indices should be defined. These might be (col2, col3, col4), (col3, col4) and (col4).
On the other hand, separate indices on (col1), (col2), (col3) and (col4) are also a good choice. In that case, there is no need to include the primary key in the indices (with InnoDB, every secondary-index entry already contains the primary key). I'd prefer this solution over the one mentioned above.
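To see why the leftmost column matters, it can help to model the composite index as a sorted list of tuples, which is roughly how its B-tree orders the entries. The sketch below is a toy Python model, not MySQL's implementation. The rows are the example from the question, and `prefix_lookup` and `full_scan` are made-up names for illustration. A lookup on a leftmost prefix can binary-search to the first match and scan forward. A condition on col2 alone has no sorted position to jump to, so every entry has to be checked.

```python
from bisect import bisect_left

# Example rows from the question: (pk, col1, col2, col3, col4).
rows = [
    ("A:B:C:D", "A", "B", "C", "D"),
    ("A:B:C:E", "A", "B", "C", "E"),
    ("A:B:F:F", "A", "B", "F", "F"),
]

# Toy composite index on (col1, col2, col3, col4): entries sorted by those
# columns, with the primary key carried along at the end of each entry.
index = sorted((c1, c2, c3, c4, pk) for pk, c1, c2, c3, c4 in rows)


def prefix_lookup(index, prefix):
    """Equality lookup on a leftmost prefix such as (col1,) or (col1, col2)."""
    n = len(prefix)
    pos = bisect_left(index, prefix)  # jump to the first entry >= prefix
    pks = []
    while pos < len(index) and index[pos][:n] == prefix:
        pks.append(index[pos][-1])
        pos += 1
    return pks


def full_scan(index, position, value):
    """A condition on a non-leading column (position 1 = col2, 2 = col3, ...)
    cannot use the sort order, so every entry is examined."""
    return [entry[-1] for entry in index if entry[position] == value]


print(prefix_lookup(index, ("A", "B", "F")))  # ['A:B:F:F']
print(full_scan(index, 1, "B"))               # ['A:B:C:D', 'A:B:C:E', 'A:B:F:F']
```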
I find your choice of primary key strange. If (col1, col2, col3, col4) is unique, use that as a primary key. If you do not want a primary key on four columns (most people don't), the next choice is often a surrogate key (i.e. an auto_increment column in MySQL). In that case, a unique key on (col1, col2, col3, col4) enforces data integrity.

MySQL is able to merge several indexes on a single table by their primary-key values (index merge), as long as you are searching for exact key values (not ranges).
So if you create separate indexes on col1 to colN, you may run this query:
SELECT *
FROM mytable
WHERE col2 = 'B'
OR col3 = 'C'
which will cause the indexes on col2 and col3 to be merged (in the EXPLAIN output you will see type index_merge and Using union(col2,col3) in the Extra column).
To ensure uniqueness, it's enough to declare your first column the PRIMARY KEY. As long as you keep the data consistent (the PK value really is the col* values concatenated with a separator), uniqueness will be enforced by the PK.
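The union idea can be pictured with a toy Python model. Each single-column index maps a value to the set of primary keys that have it. An OR across two columns takes the union of the two pk sets, and only then are the full rows fetched. The sample rows and function names are hypothetical. This shows the idea only, and MySQL's actual index-merge algorithm differs in detail.

```python
from collections import defaultdict

# Hypothetical sample rows: pk -> (col1, col2, col3, col4).
rows = {
    "A:B:C:D": ("A", "B", "C", "D"),
    "A:B:C:E": ("A", "B", "C", "E"),
    "A:B:F:F": ("A", "B", "F", "F"),
    "X:Y:C:Z": ("X", "Y", "C", "Z"),
    "Q:R:S:T": ("Q", "R", "S", "T"),
}


def build_index(rows, position):
    """Toy single-column index: column value -> set of primary keys."""
    idx = defaultdict(set)
    for pk, cols in rows.items():
        idx[cols[position]].add(pk)
    return idx


idx_col2 = build_index(rows, 1)
idx_col3 = build_index(rows, 2)


def union_lookup(*probes):
    """Each probe is (index, value). The pk sets found in each index are
    merged with a set union, mirroring WHERE col2 = 'B' OR col3 = 'C'."""
    pks = set()
    for idx, value in probes:
        pks |= idx.get(value, set())  # .get does not add missing keys
    return sorted(pks)


matches = union_lookup((idx_col2, "B"), (idx_col3, "C"))
# Only now are the full rows fetched, once per merged primary key.
result = [rows[pk] for pk in matches]
```

The row "Q:R:S:T" matches neither condition, so it never appears in `matches` and is never fetched.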
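The consistency condition above has one subtle part: the separator must never occur inside a column value. Otherwise, two different column tuples can concatenate to the same key. A minimal Python sketch, assuming the ":" separator from the question's example rows. `make_pk` and `is_consistent` are hypothetical helpers, not part of MySQL.

```python
SEP = ":"  # separator used in the question's example keys


def make_pk(*cols):
    """Build the concatenated primary key from the column values.

    A value containing the separator is rejected. Otherwise two different
    column tuples could map to the same pk, e.g. ("A:B", "C") and ("A", "B:C").
    """
    for value in cols:
        if SEP in value:
            raise ValueError(f"value {value!r} contains the separator {SEP!r}")
    return SEP.join(cols)


def is_consistent(pk, *cols):
    """True if the stored pk really is the separated concatenation of cols."""
    try:
        return pk == make_pk(*cols)
    except ValueError:
        return False
```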

Related

Optimize a SQL query with INDEX

I have a very simple query
SELECT col1, col2, col3, col4 FROM `table` FORCE INDEX (col2)
WHERE col2 IN ('there', 'are', 'around', 'six', 'values', 'here')
with an index named col2 on col2. My table has around 10 million rows. I used FORCE INDEX here because there are other indices on my table and MySQL picks one of them instead of index col2. That other index is very slow for this query.
List of all indices in my table:
INDEX col2 (col2)
UNIQUE INDEX ind1 (col1, col2)
INDEX ind2 (col1, col2)
INDEX ind3 (col2, col1)
This query (with FORCE INDEX) is not slow (it takes 6 seconds on the AWS RDS free tier), but it needs to be as fast as possible. Is there anything else I could do to speed it up?
First, try not forcing the index on col2, and instead just look at the explain plan. It is likely that the single-column index on col2 would be used here. However, you can also try adding the following composite covering index to your table:
CREATE INDEX idx ON yourTable (col2, col1, col3, col4);
This index would cover the WHERE clause, and also includes the other columns which appear in the SELECT clause. If it chooses, MySQL could use this index to completely cover the entire query without needing to seek back to the clustered index (i.e. the original table).
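The benefit of covering can be put as a toy cost model. The model assumes InnoDB (the default engine, and likely on RDS, though the question doesn't say), where each secondary-index entry also stores the primary key. `id` is a hypothetical primary-key name, since the question doesn't show one. The figure of 10,000 matching entries is arbitrary. If every column the query needs is in the index entry, no trip back to the clustered index is needed. Otherwise there is roughly one trip per matching entry.

```python
def clustered_lookups(index_cols, pk_cols, needed_cols, matching_entries):
    """Toy cost model: how many trips back to the clustered index (the table)
    a query needs after reading `matching_entries` secondary-index entries.

    Assumes InnoDB-style secondary indexes, whose entries also carry the
    primary-key columns.
    """
    available = set(index_cols) | set(pk_cols)
    if set(needed_cols) <= available:
        return 0                # covering index: answered from the index alone
    return matching_entries     # one row lookup per matching index entry


PK = ["id"]                                # hypothetical primary-key column
NEEDED = ["col1", "col2", "col3", "col4"]  # SELECT list plus the WHERE column

single = clustered_lookups(["col2"], PK, NEEDED, 10_000)
covering = clustered_lookups(["col2", "col1", "col3", "col4"], PK, NEEDED, 10_000)
```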
INDEX col2 (col2)
UNIQUE INDEX ind1 (col1, col2)
INDEX ind2 (col1, col2)
INDEX ind3 (col2, col1)
Some of these indexes are redundant. MySQL can use (col2, col1) for searches on col2 as well as searches on both col2 and col1. And ind2 is fully redundant with ind1.
The redundancy might be confusing the optimizer.
To cover all combinations of col1 and col2, as well as enforce uniqueness, you only need...
INDEX col2 (col2)
UNIQUE INDEX ind1 (col1, col2)
Removing the redundant indexes will speed up inserts and save space.
See 8.3.6 Multiple-Column Indexes.
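The claim that (col2) plus the unique (col1, col2) serve every equality combination of col1 and col2 can be checked with the leftmost-prefix rule. The Python sketch below assumes only that rule (no index merge, no skip scan). The function names are hypothetical.

```python
from itertools import combinations


def usable_prefix_len(index_cols, where_cols):
    """How many leading index columns have an equality condition in WHERE."""
    n = 0
    for col in index_cols:
        if col not in where_cols:
            break
        n += 1
    return n


def fully_served(indexes, where_cols):
    """True if some index has all of where_cols as its leading columns."""
    return any(usable_prefix_len(cols, where_cols) == len(where_cols)
               for cols in indexes.values())


def all_combinations_served(indexes, columns):
    """Check every non-empty subset of `columns` used as equality conditions."""
    return all(fully_served(indexes, set(combo))
               for r in range(1, len(columns) + 1)
               for combo in combinations(columns, r))


kept = {"col2": ("col2",), "ind1": ("col1", "col2")}
```

`all_combinations_served(kept, ("col1", "col2"))` is true. Dropping the col2 index as well would leave queries on col2 alone without a usable prefix.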
The query planner makes its guesses based on table statistics. Sometimes those statistics are out of date. Try running ANALYZE TABLE to update them.

How to create Multiple-Column Indexes efficiently?

Let's say I've created this index:
create index use_index on tbl_nm (col2, col3, col4, col5);
Would use_index be used in this query?
select * from tbl_nm where col2 = 'something' and col5 = 'something' and col3 = 'something';
Also, should the index be ordered with the most unique (most selective) column on the left and the most common on the right?
And if I want to order the query result, should I add that column to the index too?
The index should list the columns used for equality lookups (refs) in common queries first, followed by the columns searched by range.
So in your example, the conditions on col2 and col3 will use this index, but because there is no condition on col4, the search on col5 won't be as quick: all index entries with col2 and col3 matching "something" (for every value of col4) will be scanned for a matching col5.
If you were searching on col4 rather than col5, the index could go straight to the required entries (a binary search through the B-tree).
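The cost of the missing col4 condition can be counted in a toy model. The index is a sorted list of (col2, col3, col4, col5) tuples, and a lookup seeks to the first entry matching the usable prefix, then walks forward. The data is hypothetical: two values each for col2, col3 and col5, and 100 values for col4. This is a Python illustration of the B-tree behavior described above, not MySQL code.

```python
from bisect import bisect_left
from itertools import product

# Hypothetical data for use_index on (col2, col3, col4, col5): every combination
# of two values for col2, col3 and col5 and 100 values for col4, kept sorted
# like the index entries.
index = sorted(product(("s", "t"), ("s", "t"), range(100), ("s", "t")))


def seek_and_scan(index, prefix, predicate):
    """Binary-search to the first entry >= prefix, walk forward while the
    prefix still matches, and apply `predicate` to each entry walked over.
    Returns (matching entries, number of entries examined)."""
    n = len(prefix)
    pos = bisect_left(index, prefix)
    matches, examined = [], 0
    while pos < len(index) and index[pos][:n] == prefix:
        examined += 1
        if predicate(index[pos]):
            matches.append(index[pos])
        pos += 1
    return matches, examined


# WHERE col2 = 's' AND col3 = 's' AND col5 = 't': col4 is missing, so only
# (col2, col3) is used for the seek and col5 is filtered entry by entry.
gap_matches, gap_examined = seek_and_scan(index, ("s", "s"), lambda e: e[3] == "t")

# WHERE col2 = 's' AND col3 = 's' AND col4 = 42 AND col5 = 't': all four
# columns form a leftmost prefix, so the seek lands on the single match.
full_matches, full_examined = seek_and_scan(index, ("s", "s", 42, "t"), lambda e: True)
```

In this data the first query walks 200 entries to find 100 matches, while the second examines exactly one.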
Use EXPLAIN {query} to show what the index usage is.
Whether the most unique or the least unique column comes first isn't really a consideration when ordering the index.

Unique first column in multi-column index

I have a multi-column index on 2 columns. Can I make the first column unique without creating a separate index for it?
If I understand correctly, MySQL can use just the first column of this index for lookups, so can it use it to detect uniqueness?
The short answer is "No", because it doesn't make much sense.
Indeed, MySQL is able to use a multiple-column index for operations that use only the leftmost "n" columns from the index definition.
Let's say you have an index on columns (col1, col2). MySQL can use it to find records matching conditions on both col1 and col2, and for GROUP BY col1, col2 or ORDER BY col1, col2. It is important to notice that col1 and col2 need to be used in this order in the GROUP BY or ORDER BY clause. Their order doesn't matter in WHERE or ON clauses as long as both are used.
MySQL can also use the same index for WHERE or ON conditions and GROUP BY or ORDER BY clauses that contain only col1. It cannot, however, use the index if col2 appears without col1.
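The sort-order part of this argument can be shown in a few lines of Python. Reading an index on (col1, col2) in order already yields rows sorted by col1, col2 (and by col1 alone), but not by col2, col1 or by col2 alone, so those would need a separate sort. The entries are hypothetical, and `already_sorted` is an illustrative helper, not a MySQL feature.

```python
# Toy index on (col1, col2): hypothetical entries kept in sorted order.
index = sorted([(3, "b"), (1, "z"), (2, "a"), (1, "c"), (3, "a")])


def already_sorted(entries, key):
    """True if reading the entries in index order already yields `key` order,
    i.e. the ORDER BY could be satisfied without an extra sort."""
    keys = [key(e) for e in entries]
    return keys == sorted(keys)


order_col1_col2 = already_sorted(index, key=lambda e: (e[0], e[1]))  # ORDER BY col1, col2
order_col1 = already_sorted(index, key=lambda e: e[0])               # ORDER BY col1
order_col2_col1 = already_sorted(index, key=lambda e: (e[1], e[0]))  # ORDER BY col2, col1
order_col2 = already_sorted(index, key=lambda e: e[1])               # ORDER BY col2
```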
What happens when you have an index on columns (col1, col2) and all the rows have distinct values in column col1?
Let's assume we have a table that has distinct values in column col1 and an index on columns (col1, col2). When MySQL needs to find the rows that match WHERE col1 = val1 AND col2 = val2, it can find the row that has col1 = val1 by consulting the index. It doesn't need to use the index to refine the list of candidate rows because there is no list: there is at most one row having col1 = val1.
Sure, most of the time MySQL will use the index to check if col2 = val2, but having col2 in this index doesn't add any useful information to it. The storage space it takes and the processing power it uses on table data updates are too big for the tiny contribution it makes to row searches.
The whole purpose of having indexes on multiple columns is to help searching by shrinking the list of matching rows for a given set of values when the columns included in a multiple-column index cannot be used individually because they don't contain enough distinct values.
Technically speaking, there is no way to tell MySQL you want a multiple-column index on (col1, col2) that must have unique values in col1. Create a UNIQUE INDEX on col1 instead. Then think about the data you have in the table and the queries you run against it, and decide whether a separate index on col2 alone would serve you better than the multiple-column index on (col1, col2).
To decide, you can create the new indexes (UNIQUE on col1, INDEX on col2), put EXPLAIN in front of the most frequent queries you run on the table and check which index MySQL picks.
You need to have enough data (thousands of rows, at least, more is better) in the table to get accurate results.
You asked:
I have a multi-column index on 2 columns. Can I make the first column unique without creating a separate index for it?
The answer is no. You need a separate unique index on the first column to enforce a uniqueness constraint.
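Why the composite unique index can't stand in for a unique first column is easy to see in a toy Python model. A unique index on (col1, col2) only rejects repeated (col1, col2) pairs, so two rows sharing col1 but differing in col2 both get in. A unique index on col1 alone rejects the second row. `ToyUniqueIndex` and `rows_accepted` are hypothetical names for this sketch.

```python
class ToyUniqueIndex:
    """Toy unique index: rejects a row whose key columns repeat an existing key."""

    def __init__(self, *cols):
        self.cols = cols
        self.keys = set()

    def insert(self, row):
        key = tuple(row[c] for c in self.cols)
        if key in self.keys:
            raise ValueError(f"duplicate key {key} on {self.cols}")
        self.keys.add(key)


def rows_accepted(index, rows):
    """Insert rows in order and return how many succeed before the first duplicate."""
    for i, row in enumerate(rows):
        try:
            index.insert(row)
        except ValueError:
            return i
    return len(rows)


rows = [{"col1": 1, "col2": "a"}, {"col1": 1, "col2": "b"}]

composite = rows_accepted(ToyUniqueIndex("col1", "col2"), rows)  # duplicate col1 slips through
first_col = rows_accepted(ToyUniqueIndex("col1"), rows)          # second row is rejected
```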

Efficiency of multicolumn indexes in MySQL

If I have a MyISAM table with a 3-column index, something like
create table t (
a int,
b int,
c int,
index abc (a, b, c)
) engine=MyISAM;
the question is, can the following query fully utilize the index:
select * from t where a=1 and c=2;
in other words, considering that an index is a b-tree, can MySQL skip the column in the middle and still do a quick search on first and last columns?
EXPLAIN does seem to show that the index will be used; however, the Extra column says Using where; Using index, and I have no idea what this really means.
The answer is "no".
The MySQL documentation is quite clear on how indexes are used:
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3). (http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html.)
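What happens is that the index is used for "a=1". All entries that match are then checked to see if "c=2" is true, so the filter ends up combining an index lookup with explicit filtering (Using where). Because the index abc contains every column of t, those checks can be done on the index entries alone without reading the table rows, which is what Using index means.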
By the way, if you want to handle all combinations of two columns, you need several indexes:
(a, b, c)
(b, c, a)
(c, a, b)
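Which sets of columns a group of indexes can serve can be checked mechanically. Under the leftmost-prefix rule, a set of columns is fully served only if it equals the leading columns of some index. The Python sketch below enumerates every non-empty subset of {a, b, c}. The three rotations (a, b, c), (b, c, a) and (c, a, b) cover all of them. A set such as (a, b, c), (b, a, c), (c, b, a) leaves the pair {a, c} without a usable two-column prefix. The helper names are hypothetical.

```python
from itertools import combinations


def prefix_sets(indexes):
    """Every set of columns that forms a leftmost prefix of some index."""
    return {frozenset(cols[:k]) for cols in indexes for k in range(1, len(cols) + 1)}


def uncovered(indexes, columns):
    """Column subsets (of any size) that no index serves as a full leftmost prefix."""
    covered = prefix_sets(indexes)
    return [set(combo)
            for r in range(1, len(columns) + 1)
            for combo in combinations(columns, r)
            if frozenset(combo) not in covered]


rotations = [("a", "b", "c"), ("b", "c", "a"), ("c", "a", "b")]
other = [("a", "b", "c"), ("b", "a", "c"), ("c", "b", "a")]

print(uncovered(rotations, ("a", "b", "c")))  # []
print(uncovered(other, ("a", "b", "c")))      # [{'a', 'c'}] (set display order may vary)
```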
Even if you are using an index for all parts of a WHERE clause, you may see Using where if the column can be NULL.
Per the MySQL documentation, this means that if the column in your table can be NULL, EXPLAIN may report Using where even though the index covers the columns in the WHERE clause.
http://dev.mysql.com/doc/refman/5.1/en/explain-output.html#explain-extra-information

What happens if I drop a MySQL column without dropping its index first?

With one of my MySQL tables, I dropped column col1 before dropping it from a unique index (col0, col1, col2, col3) that contains it.
Is it automatically taken care of by MySQL? It seems the unique index that was previously (col0, col1, col2, col3) was automatically changed to (col0, col2, col3) after I deleted the column col1.
Is it going to be a problem or do I have to drop the unique index and re-create it as (col0, col2, col3)?
According to the MySQL 5.1 Reference Manual:
If columns are dropped from a table, the columns are also removed from any index of which they are a part. If all columns that make up an index are dropped, the index is dropped as well. If you use CHANGE or MODIFY to shorten a column for which an index exists on the column, and the resulting column length is less than the index length, MySQL shortens the index automatically.
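The first two rules in the excerpt transform index definitions in a simple way, sketched below as a toy Python model. Only the definitions are modeled, not the table data. The index names are hypothetical, and `uniq` mirrors the question's unique index. Dropping col1 turns (col0, col1, col2, col3) into (col0, col2, col3), matching what the asker observed, and an index made only of col1 disappears.

```python
def drop_column(indexes, column):
    """Toy model of the documented DROP COLUMN behaviour for index definitions:
    the column is removed from every index that contains it, and an index left
    with no columns is dropped entirely."""
    result = {}
    for name, cols in indexes.items():
        remaining = tuple(c for c in cols if c != column)
        if remaining:
            result[name] = remaining
    return result


# Hypothetical index names; uniq mirrors the question's unique index.
before = {"uniq": ("col0", "col1", "col2", "col3"), "only_col1": ("col1",)}
after = drop_column(before, "col1")
```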