MySQL index question - mysql

I've been reading about indexes in MySQL recently, and some of the principles are quite straightforward but one concept is still bugging me: basically, if in a hypothetical table with, let's say, 10 columns, we have two single-column indexes (for column01 and column02 respectively), plus a primary key column (some other column), then are they going to be used in a simple SELECT query like this one or not:
SELECT * FROM table WHERE column01 = 'aaa' AND column02 = 'bbb'
Looking at it, my first instinct is telling me that the first index is going to retrieve a set of rows (or primary keys in InnoDB, if I got the idea right) that satisfy the first condition, and the second index will get another set. And the final result set will be just the intersection of these two. In the books that I've been going through I cannot find anything about this particular scenario. Of course, for this particular query one index on both columns seems like the best option, but I am struggling with understanding the real process behind this whole thing if I try to use two indexes that I described above.

Its only going to use a single index. You need to create a composite index of multiple columns if you want it to be able to index off of each column you are testing. You may want to read the manual to find out how MySQL uses each type of index, and how to order your composite indexes correctly to get the best utilization of it.

It's actually the most common question
about indexing at all: is it better to
have one index with all columns or one
individual index for every column?
http://use-the-index-luke.com/sql/where-clause/searching-for-ranges/index-combine-performance

Related

Too many composite(multi column )indexing is ok?

Consider we have A,B,C,D,E,F,G,H columns in my table and if I make composite indexing on column ABCDE because these column are coming in where clause and then I want composite indexing on ABCEF then I create new composite indexing on ABCEF in same table but in a different query, we want indexing on column ABCGH for that i make another composite indexing in same table,
So my question is can we make too many composite indexes as per our requirements because sometimes our column in where clause gets change and we have to increase its performance so tell me is there any other way to optimise the query or we can make multiple composite indexes as per the requirements.
I tried multiple composite indexing and it did not give me any problem till now but still i want to know from all of you that will it give me any problem in future or is it ok to use only composite indexes and not single index.
Your answers will help me alot waiting for your replies.
Thanks in advance.
You can have as many as you want. However, each additional index has a cost when updating, inserting or deleting. The trick is to
find common segments and make indexes for those.
Or create them as required when queries are too slow.
As an example, if you are "needing" indexes for ABCDE, ABDEF, and ABGIH then create an index on just AB
InnoDB supports up to 64 indexes per table (cf. https://dev.mysql.com/doc/refman/8.0/en/innodb-limits.html).
If you try to create a composite index for every permutation of N columns, you would need N-factorial indexes. So for 8 columns, you would need 40,320 indexes. Clearly this is more than InnoDB supports.
You probably don't need that many indexes. In practice, I've rarely seen more than 6 indexes in a given table. All queries that are needed are optimized by one of those.
I understand you said sometimes you change the terms in your query's WHERE clause, so it might need a composite index with different columns.
You can rely on indexes that have a subset of all the columns that would be optimal. That won't be 100% optimized, but it will be better than no index.
You can't predict the optimal set of indexes for a given query until you write that query.
There is a limit of 64 secondary indexes (at least in InnoDB).
Order the indexes so that columns being tested with = come first. (The order of those columns in the INDEX does not matter.)
The leftmost columns in an index are the most important.
There is little or now use in including more than one column that will be searched by a range.
Study your likely queries, and find the most common combinations of 2 or 3 columns; build indexes starting with those.
In your two examples (ABCDEFGH and ABCEF), ABC would work for both (assuming at least A and B are tested with =). If you do throw on more columns, that one INDEX can still be used for both cases.
Maybe you would what to declare both ABCDEFGH and BCEFA; This handles your ABCDEF case, plus cases that have B, but not A. (Remember 'leftmost'.)
Use the SlowLog to find the slowest queries and make better indexes for them.
More on indexing: Index Cookbook
Each index requires space on the disk to be stored, and time to be updated every time you update(/insert/delete) an indexed column value.
So as long as you don't run out of storage or write operations are too slow, because you have to update too many indexes, you are not limited to create as many specific indexes as you want.
This depends on your use case and should be measured with production like data.
A common solution would be to create one index specific for your most important query e.g. in your case ABCDE.
Other queries can still use the as many columns from left to right until there is a first difference. e.g. a query searching for ABCEF could still use ABC on the previous mentioned index.
To also utilise column E you could add a where condition to D to your query in a way you know it matches all values e.g. D < 100 if you know there are only values 1-99.

After defining an index, will MySQL do the rest of the work? [duplicate]

I am really interested in how MySQL indexes work, more specifically, how can they return the data requested without scanning the entire table?
It's off-topic, I know, but if there is someone who could explain this to me in detail, I would be very, very thankful.
Basically an index on a table works like an index in a book (that's where the name came from):
Let's say you have a book about databases and you want to find some information about, say, storage. Without an index (assuming no other aid, such as a table of contents) you'd have to go through the pages one by one, until you found the topic (that's a full table scan).
On the other hand, an index has a list of keywords, so you'd consult the index and see that storage is mentioned on pages 113-120,231 and 354. Then you could flip to those pages directly, without searching (that's a search with an index, somewhat faster).
Of course, how useful the index will be, depends on many things - a few examples, using the simile above:
if you had a book on databases and indexed the word "database", you'd see that it's mentioned on pages 1-59,61-290, and 292 to 400. In such case, the index is not much help and it might be faster to go through the pages one by one (in a database, this is "poor selectivity").
For a 10-page book, it makes no sense to make an index, as you may end up with a 10-page book prefixed by a 5-page index, which is just silly - just scan the 10 pages and be done with it.
The index also needs to be useful - there's generally no point to index e.g. the frequency of the letter "L" per page.
The first thing you must know is that indexes are a way to avoid scanning the full table to obtain the result that you're looking for.
There are different kinds of indexes and they're implemented in the storage layer, so there's no standard between them and they also depend on the storage engine that you're using.
InnoDB and the B+Tree index
For InnoDB, the most common index type is the B+Tree based index, that stores the elements in a sorted order. Also, you don't have to access the real table to get the indexed values, which makes your query return way faster.
The "problem" about this index type is that you have to query for the leftmost value to use the index. So, if your index has two columns, say last_name and first_name, the order that you query these fields matters a lot.
So, given the following table:
CREATE TABLE person (
last_name VARCHAR(50) NOT NULL,
first_name VARCHAR(50) NOT NULL,
INDEX (last_name, first_name)
);
This query would take advantage of the index:
SELECT last_name, first_name FROM person
WHERE last_name = "John" AND first_name LIKE "J%"
But the following one would not
SELECT last_name, first_name FROM person WHERE first_name = "Constantine"
Because you're querying the first_name column first and it's not the leftmost column in the index.
This last example is even worse:
SELECT last_name, first_name FROM person WHERE first_name LIKE "%Constantine"
Because now, you're comparing the rightmost part of the rightmost field in the index.
The hash index
This is a different index type that unfortunately, only the memory backend supports. It's lightning fast but only useful for full lookups, which means that you can't use it for operations like >, < or LIKE.
Since it only works for the memory backend, you probably won't use it very often. The main case I can think of right now is the one that you create a temporary table in the memory with a set of results from another select and perform a lot of other selects in this temporary table using hash indexes.
If you have a big VARCHAR field, you can "emulate" the use of a hash index when using a B-Tree, by creating another column and saving a hash of the big value on it. Let's say you're storing a url in a field and the values are quite big. You could also create an integer field called url_hash and use a hash function like CRC32 or any other hash function to hash the url when inserting it. And then, when you need to query for this value, you can do something like this:
SELECT url FROM url_table WHERE url_hash=CRC32("http://gnu.org");
The problem with the above example is that since the CRC32 function generates a quite small hash, you'll end up with a lot of collisions in the hashed values. If you need exact values, you can fix this problem by doing the following:
SELECT url FROM url_table
WHERE url_hash=CRC32("http://gnu.org") AND url="http://gnu.org";
It's still worth to hash things even if the collision number is high cause you'll only perform the second comparison (the string one) against the repeated hashes.
Unfortunately, using this technique, you still need to hit the table to compare the url field.
Wrap up
Some facts that you may consider every time you want to talk about optimization:
Integer comparison is way faster than string comparison. It can be illustrated with the example about the emulation of the hash index in InnoDB.
Maybe, adding additional steps in a process makes it faster, not slower. It can be illustrated by the fact that you can optimize a SELECT by splitting it into two steps, making the first one store values in a newly created in-memory table, and then execute the heavier queries on this second table.
MySQL has other indexes too, but I think the B+Tree one is the most used ever and the hash one is a good thing to know, but you can find the other ones in the MySQL documentation.
I highly recommend you to read the "High Performance MySQL" book, the answer above was definitely based on its chapter about indexes.
Basically an index is a map of all your keys that is sorted in order. With a list in order, then instead of checking every key, it can do something like this:
1: Go to middle of list - is higher or lower than what I'm looking for?
2: If higher, go to halfway point between middle and bottom, if lower, middle and top
3: Is higher or lower? Jump to middle point again, etc.
Using that logic, you can find an element in a sorted list in about 7 steps, instead of checking every item.
Obviously there are complexities, but that gives you the basic idea.
Take a look at this link: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
How they work is too broad of a subject to cover in one SO post.
Here is one of the best explanations of indexes I have seen. Unfortunately it is for SQL Server and not MySQL. I'm not sure how similar the two are...
In MySQL InnoDB, there are two types of index.
Primary key which is called clustered index. Index key words are stored with
real record data in the B+Tree leaf node.
Secondary key which is non clustered index. These index only store primary key's key words along with their own index key words in the B+Tree leaf node. So when searching from secondary index, it will first find its primary key index key words and scan the primary key B+Tree to find the real data records. This will make secondary index slower compared to primary index search. However, if the select columns are all in the secondary index, then no need to look up primary index B+Tree again. This is called covering index.
Take at this videos for more details about Indexing
Simple Indexing
You can create a unique index on a table. A unique index means that two rows cannot have the same index value. Here is the syntax to create an Index on a table
CREATE UNIQUE INDEX index_name
ON table_name ( column1, column2,...);
You can use one or more columns to create an index. For example, we can create an index on tutorials_tbl using tutorial_author.
CREATE UNIQUE INDEX AUTHOR_INDEX
ON tutorials_tbl (tutorial_author)
You can create a simple index on a table. Just omit UNIQUE keyword from the query to create simple index. Simple index allows duplicate values in a table.
If you want to index the values in a column in descending order, you can add the reserved word DESC after the column name.
mysql> CREATE UNIQUE INDEX AUTHOR_INDEX
ON tutorials_tbl (tutorial_author DESC)
Adding some visual representation to the list of answers.
MySQL uses an extra layer of indirection: secondary index records point to primary index records, and the primary index itself holds the on-disk row locations. If a row offset changes, only the primary index needs to be updated.
Caveat: Disk data structure looks flat in the diagram but actually is a
B+ tree.
Source: link
I want to add my 2 cents. I am far from being a database expert, but I've recently read up a bit on this topic; enough for me to try and give an ELI5. So, here's may layman's explanation.
I understand it as such that an index is like a mini-mirror of your table, pretty much like an associative array. If you feed it with a matching key then you can just jump to that row in one "command".
But if you didn't have that index / array, the query interpreter must use a for-loop to go through all rows and check for a match (the full-table scan).
Having an index has the "downside" of extra storage (for that mini-mirror), in exchange for the "upside" of looking up content faster.
Note that (in dependence of your db engine) creating primary, foreign or unique keys automatically sets up a respective index as well. That same principle is basically why and how those keys work.
Let's suppose you have a book, probably a novel, a thick one with lots of things to read, hence lots of words.
Now, hypothetically, you brought two dictionaries, consisting of only words that are only used, at least one time in the novel. All words in that two dictionaries are stored in typical alphabetical order. In hypothetical dictionary A, words are printed only once while in hypothetical dictionary B words are printed as many numbers of times it is printed in the novel. Remember, words are sorted alphabetically in both the dictionaries.
Now you got stuck at some point while reading a novel and need to find the meaning of that word from anyone of those hypothetical dictionaries. What you will do? Surely you will jump to that word in a few steps to find its meaning, rather look for the meaning of each of the words in the novel, from starting, until you reach that bugging word.
This is how the index works in SQL. Consider Dictionary A as PRIMARY INDEX, Dictionary B as KEY/SECONDARY INDEX, and your desire to get for the meaning of the word as a QUERY/SELECT STATEMENT.
The index will help to fetch the data at a very fast rate. Without an index, you will have to look for the data from the starting, unnecessarily time-consuming costly task.
For more about indexes and types, look this.
Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. This is much faster than reading every row sequentially.
Indexing adds a data structure with columns for the search conditions and a pointer
The pointer is the address on the memory disk of the row with the
rest of the information
The index data structure is sorted to optimize query efficiency
The query looks for the specific row in the index; the index refers to the pointer which will find the rest of the information.
The index reduces the number of rows the query has to search through from 17 to 4.

Indexing mysql table for selects

I'm looking to add some mysql indexes to a database table. Most of the queries are for selects. Would it be best to create a separate index for each column, or add an index for province, province and number, and name. Does it even make sense to index province since there are only about a dozen options?
select * from employees where province = 'ab' and number = 'v45g';
select * from employees where province = 'ab';
If the usage changed to more inserts should I remove all the indexes except for the number?
An index is a data structure that maps the values of a column into a fast searchable tree. This tree contains the index of rows which the DB can use to find rows fast. One thing to know, some DB engines read plus or minus a bunch of rows to take advantage of disk read ahead. So you may actually read 50 or 100 rows per index read, and not just one. Hence, if you access 30% of a table through an index, you may wind up reading all table data multiple times.
Rule of thumb:
- index the more unique values, a tree with 2 branches and half of your table on either side is not too useful for narrowing down a search
- use as few index as possible
- use real world examples numbers as much as possible. Performance can change dynamically based on data or the whim of the DB engine, so it's very important to try and track how fast your queries are running consistently (ie: log this in case a query ever gets slow). But from this data you can add indexes without being blind
Okay, so there are multiple kinds of index, single and multiple column. You want multiple indexes when it makes sense for indexes to access each other, multiple columns typically when you are refining with a where clause. Think of the first as good when you want joins, or you have "or" conditions. The second is better when you have and conditions and successively filter rows.
In your case name does not make sense since like does not use index. city and number do make sense, probably as a multi-column index. Province could help as well as the last index.
So an index with these columns would likely help:
(number,city,province)
Or try as well just:
(number,city)
You should index fields that are searched upon and have high selectivity / cardinality. Indexes make writes slower.
Other thing is that indexes can be added and dropped at any time so maybe you should let this for a later review of the database and optimization of querys.
That being said one index that you can be sure to add is in the column that holds the name of a person. That's almost always used in searching.
According to MySQL documentation found here:
You can create multiple column indexes and the first column mentioned in the index declaration uses index when searched alone but not the others.
Documentation also says that if you create a hash of the columns and save in another column and index the hashed column the search could be faster then multiple indexes.
SELECT * FROM tbl_name
WHERE hash_col=MD5(CONCAT(val1,val2))
AND col1=val1 AND col2=val2;
You could use an unique index on province.

MySQL: a huge table. can't query, even a simple select!

i have a table with about 200,000 records.
it takes a long time to do a simple select query. i am confiused because i am running under a 4 core cpu and 4GB of ram.
how should i write my query?
or is there anything to do with INDEXING?
important note: my table is static (it's data wont change).
what's your solutions?
PS
1 - my table has a primary key id
2 - my table has a unique key serial
3 - i want to query over the other fields like where param_12 not like '%I.S%'
or where param_13 = '1'
4 - 200,000 is not big and this is exactly why i am surprised.
5 - i even have problem when adding a simple field: my question
6 - can i create an INDEX for BOOL fields? (or is it usefull)
PS and thanks for answers
7 - my select shoudl return the fields that has specified 'I.S' or has not.
select * from `table` where `param_12` like '%I.S%'
this is all i want. it seems no Index helps here. ham?
Indexing will help. Please post table definition and select query.
Add index for all "=" columns in where clause.
Yes, you'll want/need to index this table and partitioning would be helpful as well. Doing this properly is something you will need to provide more information for. You'll want to use EXPLAIN PLAN and look over your queries to determine which columns and how you should index them.
Another aspect to consider is whether or not your table normalized. Normalized tables tend to give better performance due to lowered I/O.
I realize this is vague, but without more information that's about as specific as we can be.
BTW: a table of 200,000 rows is relatively small.
Here is another SO question you may find useful
1 - my table has a primary key id: Not really usefull unless you use some scheme which requires a numeric primary key
2 - my table has a unique key serial: The id is also unique by definition; why not use serial as the primary? This one is automatically indexed because you defined it as unique.
3 - i want to query over the other fields like where param_12 not like '%I.S%' or where param_13 = '1': A like '%something%' query can not really use an index; is there some way you can change param12 to param 12a which is the first %, and param12b which is 'I.S%'? An index can be used on a like statement if the starting string is known.
4 - 200,000 is not big and this is exactly why i am surprised: yep, 200.000 is not that much. But without good indexes, queries and/or cache size MySQL will need to read all data from disk for comparison, which is slow.
5 - i even have problem when adding a simple field: my question
6 - can i create an INDEX for BOOL fields? Yes you can, but an index which matches half of the time is fairly useless, an index is used to limit the amount of records MySQL has to load fully as much as possible; if an index does not dramatically limit that number, as is often the case with boolean (in a 50-50 distribution), using an index only requires more disk IO and can slow searching down. So unless you expect something like an 80-20 distribution or better creating an index will cost time, and not win time.
Index on param_13 might be used, but not the one on param_12 in this example, since the use of LIKE '% negate the index use.
If you're querying data with LIKE '%asdasdasd%' then no index can help you. It will have to do a full scan every time. The problem here is the leading % because that means that the substring you are looking for can be anywhere in the field - so it has to check it all.
Possibly you might look into full-text indexing, but depending on your needs that might not be appropriate.
Firstly, ensure your table have a primary key.
To answer in any more detail than that you'll need to provide more information about the structure of the table and the types of queries you are running.
I don't believe that the keys you have will help. You have to index on the columns used in WHERE clauses.
I'd also wonder if the LIKE requires table scans regardless of indexes. The minute you use a function like that you lose the value of the index, because you have to check each and every row.
You're right: 200K isn't a huge table. EXPLAIN PLAN will help here. If you see TABLE SCAN, redesign.

Optimizing single-row queries from large tables in MySQL

I am dealing with MySQL tables that are essentially results of raytracing simulations on a simulated office room with a single venetian blind. I usually need to retrieve the simulation's result for a unique combination of time and blind's settings. So I end up doing a lot of
SELECT result FROM results WHERE timestamp='2005-05-05 12:30:25' \
AND opening=40 AND slatangle=60
This looks suspiciously optimizable, since this query should never ever return more than one row. Does it make sense to define an index on the three columns that uniquely identify each row? Or are there other techniques I can use?
The answer is most definately a yes. If you define a unique index on timestamp, opening and slatangle MySQL should be able to find your row with very few disc seeks.
You might experiment with creating an index on timestamp, opening, slateangle and result. MySQL may be able to fetch your data from the index without touching the datafile at all.
The MySQL Manual has a section about optimzing queries.
I would suggest adding
LIMIT 1;
to the end of the query.
William
I wouldn't suggest adding 3 indexes. An index using all three columns may be better and even setting the primary key unique on that combination would be best - only if you're sure that it unique.
Yes, create an index of multiple columns helps. Also you should test the performance of different column order, ie O(c1, c2, c3) != O(c2, c1, c3)
Have a look
http://joekuan.wordpress.com/2009/01/23/mysql-optimize-your-query-to-be-more-scalable-part-12/
http://joekuan.wordpress.com/2009/01/23/mysql-optimize-your-query-to-be-more-scalable-part-22/