Unique first column in multi-column index - mysql

I have a multi-column index on 2 columns. Can I make the first column unique without creating a separate index for it?
If I understand correctly, MySQL can use only the first column of this index for lookups, so can it also use it to detect uniqueness?

The short answer is "No", because it doesn't make much sense.
Indeed, MySQL is able to use a multiple-column index for operations that use only the leftmost "n" columns from the index definition.
Let's say you have an index on columns (col1, col2). MySQL can use it to find records matching conditions on both col1 and col2, and for GROUP BY col1, col2 or ORDER BY col1, col2. It is important to notice that col1 and col2 need to be used in this order in the GROUP BY or ORDER BY clause. Their order doesn't matter in WHERE or ON clauses as long as both are used.
MySQL can also use the same index for WHERE or ON conditions and GROUP BY or ORDER BY clauses that contain only col1. It cannot, however, use the index if col2 appears without col1.
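As a rough illustration of the leftmost-prefix behaviour described above, here is a minimal sketch. The table t, its columns and the index name idx_col1_col2 are hypothetical; the extra column col3 is there only so that the index does not cover the whole row. What EXPLAIN reports depends on your data and MySQL version.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE t (
  id   INT PRIMARY KEY,
  col1 INT NOT NULL,
  col2 INT NOT NULL,
  col3 INT NOT NULL,
  INDEX idx_col1_col2 (col1, col2)
);

-- Candidates for idx_col1_col2: both columns, or the leftmost column alone.
EXPLAIN SELECT * FROM t WHERE col2 = 7 AND col1 = 3;   -- order inside WHERE does not matter
EXPLAIN SELECT * FROM t WHERE col1 = 3;
EXPLAIN SELECT * FROM t WHERE col1 = 3 ORDER BY col2;

-- Not a candidate for an index lookup: col2 appears without col1.
EXPLAIN SELECT * FROM t WHERE col2 = 7;
```

The "key" column of each EXPLAIN row shows which index, if any, the optimizer actually chose.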
What happens when you have an index on columns (col1, col2) and all the rows have distinct values in column col1?
Let's assume we have a table that has distinct values in column col1 and an index on columns (col1, col2). When MySQL needs to find the rows that match WHERE col1 = val1 AND col2 = val2, by consulting the index it can find the row that has col1 = val1. It doesn't need to use the index to refine the list of candidate rows because there is no list: there is at most one row having col1 = val1.
Sure, most of the time MySQL will use the index to check if col2 = val2, but having col2 in this index doesn't add useful information to it. The storage space it takes and the processing power it uses on table data updates are too big for the tiny contribution it makes to row searching.
The whole purpose of having indexes on multiple columns is to help searching by shrinking the list of matching rows for a given set of values when the columns included in a multiple-column index cannot be used individually because they don't contain enough distinct values.
Technically speaking, there is no way to tell MySQL you want a multiple-column index on (col1, col2) that must have unique values in col1. Create a UNIQUE INDEX on col1 instead. Then think about the data you have in the table and the queries you run against it, and decide whether another index on col2 alone would be better than the multiple-column index on (col1, col2).
In order to decide, you can create the new indexes (UNIQUE on col1, INDEX on col2), put EXPLAIN in front of the most frequent queries you run on the table, and check which index MySQL picks.
You need to have enough data (thousands of rows, at least, more is better) in the table to get accurate results.
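A minimal sketch of that procedure, assuming a hypothetical table named t and made-up index names (uq_col1, idx_col2); substitute your own table and your own frequent queries:

```sql
-- Enforce uniqueness on col1 and index col2 separately.
-- Table and index names are assumptions for this example.
ALTER TABLE t
  ADD UNIQUE INDEX uq_col1 (col1),
  ADD INDEX idx_col2 (col2);

-- Put EXPLAIN in front of a typical query and read the "key" column
-- to see which index the optimizer picked.
EXPLAIN SELECT * FROM t WHERE col1 = 3 AND col2 = 7;
```

Adding the UNIQUE index fails if col1 already contains duplicate values, so it also serves as a check on the existing data.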

You asked:
I have multi-column index for 2 columns. Can I make first column unique without making separate index for that?
The answer is no. You need a separate unique index on the first column to enforce a uniqueness constraint.

Related

single-column or composite index

For a table like this
[ col1 - col2 - col3 - col4 ]
[ 1 - 2 - 3 - 4 ]
I'm going to use two types of queries in two cases
One is SELECT * FROM table WHERE col1 = 1 AND col2 = 2 AND col3 = 3;
Another is SELECT * FROM table WHERE col1 = 1 AND col2 = 2 AND col4 = 4;
In this case, do I make a composite index on col1 and col2 only plus single-column indexes on col3 and col4,
or do I give every column its own single-column index,
or do I put all the columns in one composite index?
Side question: Do I have to name the Index? And what is the Index size?
Have these two:
INDEX col123 (col1, col2, col3),
INDEX col214 (col2, col1, col4)
Notes:
For the 2 queries given, it does not matter which order the 3 columns are in within each composite index.
I did col1 and col2 in different orders just in case some other query needs col2 without col1.
Having INDEX(col3) (single-column) is less useful.
With INDEX(col1, col2), INDEX(col3) -- The optimizer will pick one index and not use the other. This is less good than having an index with all three columns.
Luke is good; my index cookbook might be better?
The "rules of thumb" are aimed at Postgres. Do not use them; there are too many things that are incorrect for MySQL.
The "query tuning" link is aimed at DB2; it mostly applies to MySQL.
INSERTs do take a little time to update the index(es), but most of that work is delayed (see "Change buffering") for non-UNIQUE indexes. Don't let that stop you from adding an index. The benefit on a SELECT usually far outweighs the cost in INSERT.
Index names are optional in MySQL, but the default for a composite index can be misleading.
Another way to compare queries and/or indexes, even with too few rows to get reliable timings:
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
Big numbers = bad; small numbers = smile.
A compound index on col1 and col2, with single-column indexes on col3 and col4 is probably going to work best. But the way to tell for certain is to build a table for testing, and populate it with sample data. If possible, insert roughly the same amount of data you expect in production.
Then build indexes, run the queries, and read the execution plans. Drop the indexes, build them a different way, run the queries, and read the execution plans.
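One possible way to run that comparison is sketched below. The table name tbl and the index names are placeholders (the question's table name is not usable as-is because TABLE is a reserved word).

```sql
-- Layout A: compound index on (col1, col2) plus single-column indexes.
ALTER TABLE tbl
  ADD INDEX idx_col12 (col1, col2),
  ADD INDEX idx_col3 (col3),
  ADD INDEX idx_col4 (col4);

EXPLAIN SELECT * FROM tbl WHERE col1 = 1 AND col2 = 2 AND col3 = 3;
EXPLAIN SELECT * FROM tbl WHERE col1 = 1 AND col2 = 2 AND col4 = 4;

-- Drop them and try layout B: two three-column compound indexes.
ALTER TABLE tbl
  DROP INDEX idx_col12,
  DROP INDEX idx_col3,
  DROP INDEX idx_col4;

ALTER TABLE tbl
  ADD INDEX idx_col123 (col1, col2, col3),
  ADD INDEX idx_col214 (col2, col1, col4);

-- Re-run the same two EXPLAIN statements and compare the "key" and "rows" columns.
EXPLAIN SELECT * FROM tbl WHERE col1 = 1 AND col2 = 2 AND col3 = 3;
EXPLAIN SELECT * FROM tbl WHERE col1 = 1 AND col2 = 2 AND col4 = 4;
```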
You should also think about what other queries need to use this table, and how indexes affect those queries. Think about INSERT and DELETE queries as well as SELECT statements.
Whether you have to name the index depends on the dbms. Most of them will supply a system-generated name if you leave it out. The size of the index depends on the dbms; there's usually a way to figure it out if the dbms doesn't supply explicit functions or stored procedures to do that.
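For MySQL specifically, a sketch of how the index names and an approximate index size could be inspected, assuming a hypothetical table named tbl in the current database:

```sql
-- Lists every index on the table, including any system-generated names.
SHOW INDEX FROM tbl;

-- Approximate space used by the table's indexes, in bytes.
-- For InnoDB this figure covers the secondary indexes and is an estimate.
SELECT TABLE_NAME, INDEX_LENGTH
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'tbl';
```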

Clarification of MySQL index required to be "spanning" of all "and groups"

From the MySQL documentation here
Any index that does not span all AND levels in the WHERE clause is not used to optimize the query. In other words, to be able to use an index, a prefix of the index must be used in every AND group.
What exactly does this mean? Does it mean that for an index to be used, every component of the AND query must refer to that index?
So let's say we have a Person table with SID (primary), first_name (index), last_name.
Does that mean that the following query
Select * from Person where first_name='foo' and last_name='bar'
will not use the index on first_name?
An AND group is a set of comparisons that are combined with AND. A WHERE clause has multiple AND groups if it uses OR to combine several of these, e.g.
WHERE (col1 = 1 AND col2 = 2 AND col6 = 10) OR (col1 = '5' AND col4 = 'B' AND col2 = 16)
has two AND groups. There's one group that tests col1, col2, and col6, and another group that tests col1, col4, and col2.
So an index can be used if it has a prefix that's tested in every one of these groups. For instance, an index on (col1, col2, col3) could be used because the prefix (col1, col2) spans both groups.
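To try this out, here is a minimal sketch. The table and_groups_demo, its column types and the index name are assumptions made up for the example; on a small table the optimizer may still prefer a full scan.

```sql
-- Hypothetical table with an index on (col1, col2, col3).
CREATE TABLE and_groups_demo (
  id   INT PRIMARY KEY,
  col1 INT,
  col2 INT,
  col3 INT,
  col4 CHAR(1),
  col6 INT,
  INDEX idx_123 (col1, col2, col3)
);

-- Two AND groups joined by OR; each group tests col1 and col2,
-- so the prefix (col1, col2) of idx_123 appears in both groups.
EXPLAIN
SELECT *
FROM and_groups_demo
WHERE (col1 = 1 AND col2 = 2 AND col6 = 10)
   OR (col1 = 5 AND col4 = 'B' AND col2 = 16);
```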
That statement in the documentation is rather misleading. It seems to directly contradict the first example given under it.
The following WHERE clauses use indexes:
... WHERE index_part1=1 AND index_part2=2 AND other_column=3
Here it's clearly stated that the index is used even though other_column is not a part of the index. The confusion then, is caused by what exactly is an 'AND Group'. Bamar has explained that really well in his answer so I will not go into that here. But suffice to say
Select * from Person where first_name='foo' and last_name='bar'
will use an index, provided that the number of rows with first_name = 'foo' is much smaller than the total number of rows in the table.
The statement you quoted here is referring to the multiple column indexing or compound indexes.
It indicates that if you have created an index on multiple columns, the columns tested in each AND group must include a leftmost prefix of that index.
If you have created an index on col1, col2, col3
AND groups can be
col1=1 and col2=2 and col3=3
You can also have
col1=1 and col2=2
But you cannot have
col2=1 and col3=3
because it is not a prefix of the index.

SQL query the last created item performance

I am trying to fetch the last created item in a large table like this:
SELECT `raw_detection`.* FROM `raw_detection`
WHERE `raw_detection`.`duplicated` = 0
AND `raw_detection`.`audio_source_id` = 100
ORDER BY created_at desc LIMIT 1
But this query takes a long time to run (more than 2 seconds).
I have this index:
KEY `index_raw_detections_audio_source`(`audio_source_id`,`duplicated`,`created_at`)
Is there any better way to fetch the last created item for a specific audio source?
Your key references three columns. It cannot be used to speed up queries using only the created_at portion of the key. Try creating an additional key for just created_at.
For reference, from the MySQL doc:
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on
(col1), (col1, col2), and (col1, col2, col3).
MySQL cannot use the index to perform lookups if the columns do not form a leftmost prefix of the index.
https://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html
In your select, specify the columns needed rather than doing select *
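A sketch of how the query could be checked, with the column list narrowed as suggested. The id column is an assumption (the question does not list the table's columns); the other names come from the question.

```sql
-- Read the "key" column to see which index, if any, is chosen,
-- and the "Extra" column for notes such as "Using filesort".
EXPLAIN
SELECT id, created_at            -- id is a hypothetical column name
FROM raw_detection
WHERE duplicated = 0
  AND audio_source_id = 100
ORDER BY created_at DESC
LIMIT 1;
```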

Understanding the order of indexing in MySQL

There are 3 columns in a table with 10 million entries. col1, col2, col3. col1 stores numbers with at most 2 digits, col2 stores numbers with at most 9 digits and col3 stores either 0 or 1.
Now, when I create a compound index in the order (col1,col2,col3), I get results in around 0.5 seconds (for some select operations with all 3 columns involved in the where condition: exact values of col1 and col3 are specified, and a range for col2), while if I order it as (col3,col1,col2) the same query takes around 10 secs.
From what I understand, indexing in mysql concatenates the values in the 3 columns appropriately in the order I specify them and runs a binary search while querying after an initial sort. According to this understanding, mentioning col3 in the very beginning should be equivalent if not superior to writing it in the order (col1,col2,col3) since if I specify col3=1 or col3=0 it narrows the search by half.
Please explain the anomaly!
Well, it's tough to make a decision like this, but personally I would go with indexing
INDEX `compound_index`(col1,col2,col3);
If I didn't have range scans to do, I would have created
INDEX `compound_index`(col2,col1,col3);
as col2 most likely has better cardinality.
Generally speaking, if you don't have a range scan on the table, the column with the best cardinality should become the first column of the index, and so on.
If you do have a range scan, a loose index scan works better than a covering index:
http://www.arubin.org/blog/2010/11/18/loose-index-scan-vs-covered-indexes-in-mysql/
If your WHERE clause gives a range of values for col2, then anything after col2 in the index is not very useful.
If that's not clear, suppose you index on (col1, col2, col3) and your where clause is "where col1=5 and col2 between 2 and 4 and col3=1". So at best, the SQL engine can go to the place in the index beginning at col1=5, col2=2, and col3=1. Theoretically, when it gets to the end of col2=2 and sees the first col2=3, col3=0, it could skip ahead to col2=3, col3=1. Similarly, when it gets to col2=4, col3=0, it could skip ahead to col2=4, col3=1. But in practice skipping around in the index is relatively slow. The engine reads the index in blocks, so once it gets a block, if it sequentially searches that block, it already has it all in memory. But to skip, it may have to read another block, which means additional I/O operations. I think most SQL engines say that once you give a range, everything after that in the index is not used. So most likely the engine would scan all records from 5,2 to 5,4 and pick out the col3=1 as it went, rather than skipping around in the index.
Given that, you say that col3 is always 0 or 1; I take it that col1 and col2 have a wider range of values? Let's suppose for the sake of discussion that they each have 10 possible values, and that your range on col2 covers 3 values. And let's assume a relatively even distribution across all values -- there are just as many 1's as 2's, etc.
Then if you index on (col1, col2, col3), the engine can use col1 to immediately narrow the search to just 10% of the index, and col2 to narrow to 30% of that or 3% of the total.
If you index on (col3, col2, col1), then the engine can use col3 to narrow the search to 50% of the index, and col2 to 30% of that, or 15%.
The second option has the engine searching 5 times as much of the index as the first (15% versus 3%). So yes, it would be slower.
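One way to check this estimate on real data is to create both indexes and force each in turn. This is only a sketch: the table name t and the index names are hypothetical, and the percentages in the comments are the ones from the worked example above, not measurements.

```sql
-- Both candidate orderings side by side (names are assumptions).
ALTER TABLE t
  ADD INDEX idx_c1_c2_c3 (col1, col2, col3),
  ADD INDEX idx_c3_c2_c1 (col3, col2, col1);

-- Expected from the example: 10% * 30% = 3% of the index is scanned.
EXPLAIN
SELECT * FROM t FORCE INDEX (idx_c1_c2_c3)
WHERE col1 = 5 AND col2 BETWEEN 2 AND 4 AND col3 = 1;

-- Expected from the example: 50% * 30% = 15% of the index is scanned.
EXPLAIN
SELECT * FROM t FORCE INDEX (idx_c3_c2_c1)
WHERE col1 = 5 AND col2 BETWEEN 2 AND 4 AND col3 = 1;
```

Comparing the "rows" estimate in the two EXPLAIN outputs gives a rough idea of how much of each index would be examined.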

Would WHERE col1 and ORDER BY col2 use a composite key on (col1,col2)?

I have a database table (potentially huge, with hundreds of millions of records in the future) on which I would execute the following query very often:
select *
from table1
where col1 = [some number]
order by col2
Obviously having an index on "col1" would make it run fast. col1 is not unique, so many rows (2000+ I expect) would be returned.
Does it make sense to create an index on (col1, col2)? Would MySQL use it for this query?
Also, if I just query without "order by" part, would this index be used as well for the "where" part?
Yes, it will help. MySQL will use the composite index, with the first part serving the WHERE clause and the second part serving the ORDER BY. You can read about ORDER BY optimization here: http://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html
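A minimal sketch for verifying this on your own data. The index name idx_col1_col2 and the literal 42 are placeholders; table1, col1 and col2 are from the question.

```sql
ALTER TABLE table1 ADD INDEX idx_col1_col2 (col1, col2);

-- If the index also satisfies the ORDER BY, the "Extra" column
-- should not mention "Using filesort".
EXPLAIN SELECT * FROM table1 WHERE col1 = 42 ORDER BY col2;

-- Without ORDER BY, the same index is still a candidate,
-- because col1 is its leftmost column.
EXPLAIN SELECT * FROM table1 WHERE col1 = 42;
```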