single-column or composite index - mysql

For a table like this
[ col1 - col2 - col3 - col4 ]
[ 1 - 2 - 3 - 4 ]
I'm going to use two types of queries in two cases
One is SELECT * FROM table WHERE col1 = 1 AND col2 = 2 AND col3 = 3;
Another is SELECT * FROM table WHERE col1 = 1 AND col2 = 2 AND col4 = 4;
In this case, do I make
a composite index on col1 and col2 only, plus single-column indexes on col3 and col4,
or do I give
every column its own single-column index,
or do I put
all the columns in one composite index?
Side question: Do I have to name the Index? And what is the Index size?

Have these two:
INDEX col123 (col1, col2, col3),
INDEX col214 (col2, col1, col4)
Notes:
For the 2 queries given, it does not matter which order the 3 columns are in within each composite index.
I did col1 and col2 in different orders just in case some other query needs col2 without col1.
Having INDEX(col3) (single-column) is less useful.
With INDEX(col1, col2), INDEX(col3) -- The optimizer will pick one index and not use the other. This is less good than having an index with all three columns.
Luke is good; my index cookbook might be better?
The "rules of thumb" are aimed at Postgres. Do not use them; there are too many things that are incorrect for MySQL.
The "query tuning" link is aimed at DB2; it mostly applies to MySQL.
INSERTs do take a little time to update the index(es), but most of that work is delayed (see "Change buffering") for non-UNIQUE indexes. Don't let that stop you from adding an index. The benefit on a SELECT usually far outweighs the cost in INSERT.
Index names are optional in MySQL, but the default for a composite index can be misleading.
Another way to compare queries and/or indexes, even with too few rows to get reliable timings:
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
Big numbers = bad; small numbers = smile.
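As a rough sketch of the two indexes recommended above, assuming a hypothetical table named t (the question's "table" is a reserved word) with four INT NOT NULL columns. The column types are an assumption; the index names are the ones from the answer.

```sql
-- Hypothetical table; the column types are assumed, not given in the question.
CREATE TABLE t (
    col1 INT NOT NULL,
    col2 INT NOT NULL,
    col3 INT NOT NULL,
    col4 INT NOT NULL,
    INDEX col123 (col1, col2, col3),
    INDEX col214 (col2, col1, col4)
);

-- Each query tests all three columns of exactly one of the indexes.
EXPLAIN SELECT * FROM t WHERE col1 = 1 AND col2 = 2 AND col3 = 3;
EXPLAIN SELECT * FROM t WHERE col1 = 1 AND col2 = 2 AND col4 = 4;
```

On a populated table, the key column of the EXPLAIN output names the chosen index. With three INT NOT NULL columns, a key_len of 12 would indicate that all three parts of the index are being used.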

A compound index on col1 and col2, with single-column indexes on col3 and col4 is probably going to work best. But the way to tell for certain is to build a table for testing, and populate it with sample data. If possible, insert roughly the same amount of data you expect in production.
Then build indexes, run the queries, and read the execution plans. Drop the indexes, build them a different way, run the queries, and read the execution plans.
You should also think about what other queries need to use this table, and how indexes affect those queries. Think about INSERT and DELETE queries as well as SELECT statements.
Whether you have to name the index depends on the dbms. Most of them will supply a system-generated name if you leave it out. The size of the index depends on the dbms; there's usually a way to figure it out if the dbms doesn't supply explicit functions or stored procedures to do that.
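For MySQL specifically, one way to see index names and approximate sizes is sketched below. It assumes an InnoDB table with persistent statistics enabled (the default), read access to the mysql schema, and a hypothetical table name t.

```sql
-- List the indexes on a table, including any system-generated names.
SHOW INDEX FROM t;

-- Approximate size of each InnoDB index, in bytes.
-- For stat_name = 'size', stat_value is a page count.
SELECT index_name,
       stat_value * @@innodb_page_size AS size_bytes
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name = 't'
  AND stat_name = 'size';
```

The figures are estimates taken from the optimizer statistics, so they may lag behind recent changes until the statistics are recalculated.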

Related

MySQL queries - defining compound and single indexes across multiple queries. How to prevent the indexes from conflicting and creating slow queries?

I am struggling to work out which columns are best to put my indexes on, when it seems adding additional indexes can have a detrimental effect on the query performance.
For example, I have the following query on a table with around 5m rows;
SELECT col1, col2 FROM table WHERE col1 = 'a' AND col2 = 'b' AND col3 = 'c';
Running this with no indexes takes 12 seconds!
I add a compound index on all 3 columns - table_col1_col2_col3_index;
My query now drops down to 2 seconds - great!
I now have another query on the same table (with no indexes on any column):
SELECT col1, col2 FROM table WHERE col1 = 'a';
Running this on its own and the query takes 4 seconds - still pretty slow!
So now I add a single column index to col1 table_col1_index
My query reduces down to 0.2 seconds. This is great; however, I now run the original query again and notice that it is using this index as opposed to the one I specified earlier. The original query is now back up at 6 seconds.
I am unsure how to go about ensuring that both queries can be optimised at the same time.
You can create indexes with the most used or most selective column on the left, and organize them, where possible, so that the same index can serve more than one query.
Furthermore, you can always tell MySQL which index you think is best suited by using FORCE INDEX (or IGNORE INDEX): https://dev.mysql.com/doc/refman/8.0/en/index-hints.html
SELECT * FROM table1 FORCE INDEX (col3_index)
WHERE col1=1 AND col2=2 AND col3=3;
Turn off the query cache, or use SELECT SQL_NO_CACHE ... when doing timing. (This applies to MySQL versions before 8.0; the query cache was removed in 8.0.)
Run each timing test twice. The first may spend extra time fetching data and/or index blocks from disk. The second timing is better for comparisons. (And is closer to the way it would be in a "production" server.)
How many rows are being returned? That could have an impact. The 2nd query may be returning many times as many rows.
Please provide SHOW CREATE TABLE -- there could be subtle issues. (datatypes, column sizes, collations, who-knows-what)
Please provide EXPLAIN SELECT ... -- As written, each of your examples should say "Using index", meaning that the index "covers" the query, which means that all the columns in the SELECT exist in the INDEX being used.
Do not use "index hints" -- while it may help a query today, it may hurt it more tomorrow.
All of your examples would ('should') benefit from INDEX(col1, col2, col3), in that order; I would not add any others.
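A minimal sketch of that suggestion, assuming a hypothetical table name t and assuming the existing composite index was defined as (col1, col2, col3) in that order. The index name is the one from the question.

```sql
-- Remove the single-column index; the composite index already starts with col1.
ALTER TABLE t DROP INDEX table_col1_index;

-- Both queries can now use the composite index. Because col1 and col2
-- are both stored in it, Extra should include "Using index".
EXPLAIN SELECT col1, col2 FROM t WHERE col1 = 'a' AND col2 = 'b' AND col3 = 'c';
EXPLAIN SELECT col1, col2 FROM t WHERE col1 = 'a';
```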

How to create Multiple-Column Indexes efficiently?

Let's say I've coded.
create index use_index on tbl_nm (col2 ,col3 ,col4 ,col5);
would use_index be used in
select * from tbl_nm where col2 = "something" and col5 = "something" and col3 = "something";
Also, we should have created index by ordering the most unique on the left and the most common on the right. right?
And if I would like to order the query result should I add that column into the index too?
The index should list the columns tested for equality (refs) in your common queries first, followed by the columns searched by range.
So in your example, the col2 and col3 conditions will use this index, but because there is no condition on col4, the search on col5 won't be as quick: all index entries with col2 and col3 matching "something" will be scanned for a matching col5.
If you were searching for col4 rather than col5, it would be a binary search to the required item.
Use EXPLAIN {query} to show what the index usage is.
Ordering the columns from most unique to least unique isn't really a consideration.
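The point about the gap at col4 can be sketched as follows. The index use_index is the one from the question; the second index and its name use_index_235 are hypothetical, added only to show an ordering whose leading columns match the query.

```sql
-- Index from the question: col4 sits between col3 and col5.
CREATE INDEX use_index ON tbl_nm (col2, col3, col4, col5);

-- Only the col2 and col3 parts can narrow the lookup here;
-- col5 is checked against every entry that matches col2 and col3.
EXPLAIN SELECT * FROM tbl_nm
WHERE col2 = 'something' AND col3 = 'something' AND col5 = 'something';

-- Hypothetical alternative whose leading columns match the query.
CREATE INDEX use_index_235 ON tbl_nm (col2, col3, col5);
```

Comparing key_len between the two EXPLAIN runs, before and after adding the second index, should show how many index parts are actually used for the lookup.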

Unique first column in multi-column index

I have multi-column index for 2 columns. Can I make first column unique without making separate index for that?
If I understand correctly mysql can use only first column in this index for lookups, so can it use it to detect uniqueness?
The short answer is "No". Because it doesn't make much sense.
Indeed, MySQL is able to use a multiple-column index for operations that use only the leftmost "n" columns from the index definition.
Let's say you have an index on columns (col1, col2). MySQL can use it to find records matching conditions on both col1 and col2, GROUP BY col1, col2 or ORDER BY col1, col2. It is important to notice that col1 and col2 need to be used in this order in the GROUP BY or ORDER BY clause. Their order doesn't matter in WHERE or ON clauses as long as both are used.
MySQL can also use the same index for WHERE or ON conditions and GROUP BY or ORDER BY clauses that contain only col1. It cannot, however, use the index if col2 appears without col1.
What happens when you have an index on columns (col1, col2) and all the rows have distinct values in column col1?
Let's assume we have a table that has distinct values in column col1 and an index on columns (col1, col2). When MySQL needs to find the rows that match WHERE col1 = val1 AND col2 = val2, by consulting the index it can find the row that has col1 = val1. It doesn't need to use the index to refine the list of candidate rows because there is no list: there is at most one row having col1 = val1.
Sure, most of the time MySQL will use the index to check if col2 = val2, but having col2 in this index doesn't add useful information to the index. The storage space it takes and the processing power it uses on table data updates are too big for the tiny contribution it makes to row searching.
The whole purpose of having indexes on multiple columns is to help searching by shrinking the list of matching rows for a given set of values when the columns included in a multiple-column index cannot be used individually because they don't contain enough distinct values.
Technically speaking, there is no way to tell MySQL you want to have a multiple-column index on (col1, col2) that must have unique values on col1. Create an UNIQUE INDEX on col1 instead. Then think about the data you have in the table and the queries you run against it and decide if another index on col2 only isn't better than the multiple-column index on (col1, col2).
In order to decide, you can create the new indexes (UNIQUE on col1, INDEX on col2), put EXPLAIN in front of the most frequent queries you run on the table, and check which index MySQL picks.
You need to have enough data (thousands of rows, at least, more is better) in the table to get accurate results.
You asked.
I have multi-column index for 2 columns. Can I make first column unique without making separate index for that?
The answer is no. You need a separate unique index on the first column to enforce a uniqueness constraint.
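A minimal sketch of the separate unique index, assuming a hypothetical table t that already has a composite index on (col1, col2) and whose other columns, if any, have defaults. The index name uq_col1 is made up for illustration.

```sql
-- Uniqueness of col1 needs its own index; the composite index cannot enforce it.
ALTER TABLE t ADD UNIQUE INDEX uq_col1 (col1);

INSERT INTO t (col1, col2) VALUES (1, 10);
-- This statement is rejected with a duplicate-key error,
-- because a row with col1 = 1 already exists.
INSERT INTO t (col1, col2) VALUES (1, 20);
```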

Understanding the order of indexing in MySQL

There are 3 columns in a table with 10 million entries. col1, col2, col3. col1 stores numbers with at most 2 digits, col2 stores numbers with at most 9 digits and col3 stores either 0 or 1.
Now, when I create a compound index in the order (col1,col2,col3), I get results in around 0.5 seconds (for select operations with all 3 columns in the WHERE condition: exact values of col1 and col3 are specified, and a range for col2), while if I order it as (col3,col1,col2) the same query takes around 10 seconds.
From what I understand, indexing in mysql concatenates the values in the 3 columns appropriately in the order I specify them and runs a binary search while querying after an initial sort. According to this understanding, mentioning col3 in the very beginning should be equivalent if not superior to writing it in the order (col1,col2,col3) since if I specify col3=1 or col3=0 it narrows the search by half.
Please explain the anomaly!
Well, it's tough to make a decision like this, but personally I would go with indexing
INDEX `compound_index`(col1,col2,col3);
If I didn't have range scans to do, I would have created
INDEX `compound_index`(col2,col1,col3);
as col2 most likely has better cardinality.
Generally speaking, if no range scan is involved, the column with the better cardinality should become the first column of the index, and so on.
If you do have a range scan, a loose index scan works better than a covering index:
http://www.arubin.org/blog/2010/11/18/loose-index-scan-vs-covered-indexes-in-mysql/
If your WHERE clause gives a range of values for col2, then anything after col2 in the index is not very useful.
If that's not clear, suppose you index on (col1, col2, col3) and your where clause is "where col1=5 and col2 between 2 and 4 and col3=1". So at best, the SQL engine can go to the place in the index beginning at col1=5, col2=2, and col3=1. Theoretically, it could say that when it gets to the end of col2=2, when it sees the first col2=3, col3=0, it could skip ahead to col2=3, col3=1. Similarly when it gets to col2=4, col3=0, it could skip ahead to col2=4, col3=1. But in practice skipping around in the index is relatively slow. The engine reads the index in blocks, so once it gets a block, if it sequentially searches that block, it already has it all in memory. But to skip it may have to read another block, which means additional I/O operations. I think most SQL engines say that once you give a range, everything after that in the index is not used. So most likely the engine would scan all records from 5,2 to 5,4 and pick out the col3=1 as it went, rather than skipping around in the index.
Given that, while you say that col3 is always 0 or 1. I take it that col1 and col2 have a wider range of values? Let's suppose for the sake of discussion that they each have 10 possible values, and that your range on col2 covers 3 values. And let's assume a relatively even distribution across all values -- there are just as many 1's as 2's, etc.
Then if you index on (col1, col2, col3), the engine can use col1 to immediately narrow the search to just 10% of the index, and col2 to narrow to 30% of that or 3% of the total.
If you index on (col3, col2, col1), then the engine can use col3 to narrow the search to 50% of the index, and col2 to 30% of that, or 15%.
The second index has the engine searching 5 times as much of the index as the first (15% versus 3%). So yes, it would be slower.
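Following the point above that columns after a range column are of little use, one ordering worth testing puts both equality columns first and the range column last. This is a sketch only, using a hypothetical table name t and a made-up index name. Under the same even-distribution assumptions as the example, it would narrow the search to 10% x 50% x 30% = 1.5% of the index.

```sql
-- Equality columns lead; the range column comes last.
ALTER TABLE t ADD INDEX idx_col1_col3_col2 (col1, col3, col2);

EXPLAIN SELECT * FROM t
WHERE col1 = 5 AND col3 = 1 AND col2 BETWEEN 2 AND 4;
```

Whether this beats (col1, col2, col3) on real data is something to confirm with EXPLAIN and timings rather than assume.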

When is one index better than two in MYSQL

I have read about when you are making a multicolumn index that the order matters and that typically you want the columns which will appear in the WHERE clauses first before others that would be in ORDER BY etc. However, won't you also get a speed up if you just index each one separately? (apparently not as my own experiments show me that the combined index behavior can be much faster than simply having each one separately indexed). When should you use a multicolumn index and what types of queries does it give a boost to?
A multi-column index will be the most effective for situations where all criteria are part of the multi-column index - even more so than having multiple single indexes.
Also, keep in mind that if you have a multi-column index but don't use its first column (or don't otherwise start from the beginning and stay adjacent to the previously used columns), the multi-column index won't have any benefit. (For example, if I have an index over columns [B, C, D], but have a WHERE that only uses [C, D], this multi-column index will have no benefit.)
Reference: http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html :
MySQL can use multiple-column indexes for queries that test all the
columns in the index, or queries that test just the first column, the
first two columns, the first three columns, and so on. If you specify
the columns in the right order in the index definition, a single
composite index can speed up several kinds of queries on the same
table.
The short answer is, "it depends."
The long answer is this: if you sometimes do
WHERE COL1 = value1 and COL2 = value2
and sometimes do
WHERE COL1 = value1
but never, or almost never, do
WHERE COL2 = value2
then a compound index on (COL1, COL2) will be your best choice. (True for MySQL as well as other makes and models of DBMS.)
More indexes slow down INSERT and UPDATE operations, so they aren't free.
A compound index (c1,c2) is very useful in the following cases:
The obvious case where c1=v1 AND c2=v2
If you do WHERE c1=v1 AND c2 BETWEEN v2 and v3 - this does a "range scan" on the composite index
SELECT ... FROM t WHERE c1=v1 ORDER BY c2 - in this case it can use the index to sort the results.
In some cases, as a "covering index", e.g. "SELECT c1,c2 FROM t WHERE c1=4" can ignore the table contents, and just fetch a (range) from the index.
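The four cases above can be sketched against a hypothetical table t with columns c1 and c2. The index name and the literal values are made up for illustration.

```sql
-- Compound index on (c1, c2).
CREATE INDEX idx_c1_c2 ON t (c1, c2);

-- 1. Equality on both columns.
SELECT * FROM t WHERE c1 = 4 AND c2 = 7;

-- 2. Equality on c1 plus a range on c2 (range scan on the index).
SELECT * FROM t WHERE c1 = 4 AND c2 BETWEEN 7 AND 9;

-- 3. Rows for c1 = 4 are already stored in c2 order, so no extra sort is needed.
SELECT * FROM t WHERE c1 = 4 ORDER BY c2;

-- 4. Covering index: only indexed columns are selected.
SELECT c1, c2 FROM t WHERE c1 = 4;
```

Prefixing any of these with EXPLAIN shows whether the index is chosen. For the last query, Extra should include "Using index".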