I have several tables and I'm wondering if my composite index is helpful or not. I am using MySQL 5+ but I guess this would apply to any database (or not?).
Anyway, say I the following table:
username active
-----------------------------------
Moe.Howard 1
Larry.Fine 0
Shemp.Howard 1
So I normally select like:
select * from users where username = 'shemp.howard' and active = 1;
The active=1 is used in many of our tables. Normally, my index would be on the username column but I'm thinking of added the active flag as well (to the same index).
My logic is that as the query engine is scanning through the index, it would be scanning against an index like:
moe.howard,1
shemp.howard,1
larry.fine,0
and find Shemp before it hits the inactive users (Larry).
Now, our active columns are usually TINYINTS and Unsigned. But I'm concerned the index might be backward!
larry.fine,0
moe.howard,1
shemp.howard,1
How should I best handle this and make sure my indexes are correct? Should I not add the active column to the same index as username? Or should I create a separate index for the active and make it descending?
Thanks.
If you combine those two fields in a composite index with the active flag as the second part of the key, then the index order will only depend on that value when (iff) the name field for two or more rows is identical (which seems unlikely in this situation based on the assumption that one would want user names in a system to be unique). The first key in the composite index will define the order of the keys whenever they are different. In other words, if the user name is unique, then adding the active flag as the second segment of a composite index will not change the order of the index.
Also, note that for the example query, the database won't "scan" the index to find the value. Rather it will seek to the first matching entry, which in the example given consists of a single match. The "scan" would happen if multiple entries pass the WHERE clause.
Having said that, unless there are lots of cases where you have duplicate names, my initial reaction would be to not create the composite key. If the names are "generally" unique, then you would not be buying a lot of savings with the composite key. On the other hand, if there are generally quite a few duplicate names with differing active flag values, it could help. At that point, you may need to just test.
Really we can only second guess what the query optimiser will try and do, however it is commonly recommended that if the selectivity of an index over 20% then a full table scan is preferable over an index access. This would mean it is very likely that even if you index active an index won't actually be used asuming you have many more active than non-active users.
MySQL can only use the index in order, so if you create a composite index of username,active that is entirely pointless as you're not going to have multiple users with the same username.
You really need to analyse your query requirements and then you can design an indexing plan to suite them. Profile each query and don't try to over optimize everything as this can have a negative result.
An index should be added only if the values you expect it to help you filter in/out are representative, statistically speaking.
What does that mean?
If say, the filter in your WHERE clause, on the column you're indexing, is helping you out retrieving 20% of the rows, you should add an index in it. This percent number depends on your special case and should be tryed out but that's the idea.
In your case, just by the name, you would have 100% of exclusion. Adding an index on the active column would be then useless because it wouldn't help reducing the final recordset (except if you have possibly n times the same name but only one active?)
The situation would be different if you decided to filter ONLY active users, not caring about the name.
Related
I am really interested in how MySQL indexes work, more specifically, how can they return the data requested without scanning the entire table?
It's off-topic, I know, but if there is someone who could explain this to me in detail, I would be very, very thankful.
Basically an index on a table works like an index in a book (that's where the name came from):
Let's say you have a book about databases and you want to find some information about, say, storage. Without an index (assuming no other aid, such as a table of contents) you'd have to go through the pages one by one, until you found the topic (that's a full table scan).
On the other hand, an index has a list of keywords, so you'd consult the index and see that storage is mentioned on pages 113-120,231 and 354. Then you could flip to those pages directly, without searching (that's a search with an index, somewhat faster).
Of course, how useful the index will be, depends on many things - a few examples, using the simile above:
if you had a book on databases and indexed the word "database", you'd see that it's mentioned on pages 1-59,61-290, and 292 to 400. In such case, the index is not much help and it might be faster to go through the pages one by one (in a database, this is "poor selectivity").
For a 10-page book, it makes no sense to make an index, as you may end up with a 10-page book prefixed by a 5-page index, which is just silly - just scan the 10 pages and be done with it.
The index also needs to be useful - there's generally no point to index e.g. the frequency of the letter "L" per page.
The first thing you must know is that indexes are a way to avoid scanning the full table to obtain the result that you're looking for.
There are different kinds of indexes and they're implemented in the storage layer, so there's no standard between them and they also depend on the storage engine that you're using.
InnoDB and the B+Tree index
For InnoDB, the most common index type is the B+Tree based index, that stores the elements in a sorted order. Also, you don't have to access the real table to get the indexed values, which makes your query return way faster.
The "problem" about this index type is that you have to query for the leftmost value to use the index. So, if your index has two columns, say last_name and first_name, the order that you query these fields matters a lot.
So, given the following table:
CREATE TABLE person (
last_name VARCHAR(50) NOT NULL,
first_name VARCHAR(50) NOT NULL,
INDEX (last_name, first_name)
);
This query would take advantage of the index:
SELECT last_name, first_name FROM person
WHERE last_name = "John" AND first_name LIKE "J%"
But the following one would not
SELECT last_name, first_name FROM person WHERE first_name = "Constantine"
Because you're querying the first_name column first and it's not the leftmost column in the index.
This last example is even worse:
SELECT last_name, first_name FROM person WHERE first_name LIKE "%Constantine"
Because now, you're comparing the rightmost part of the rightmost field in the index.
The hash index
This is a different index type that unfortunately, only the memory backend supports. It's lightning fast but only useful for full lookups, which means that you can't use it for operations like >, < or LIKE.
Since it only works for the memory backend, you probably won't use it very often. The main case I can think of right now is the one that you create a temporary table in the memory with a set of results from another select and perform a lot of other selects in this temporary table using hash indexes.
If you have a big VARCHAR field, you can "emulate" the use of a hash index when using a B-Tree, by creating another column and saving a hash of the big value on it. Let's say you're storing a url in a field and the values are quite big. You could also create an integer field called url_hash and use a hash function like CRC32 or any other hash function to hash the url when inserting it. And then, when you need to query for this value, you can do something like this:
SELECT url FROM url_table WHERE url_hash=CRC32("http://gnu.org");
The problem with the above example is that since the CRC32 function generates a quite small hash, you'll end up with a lot of collisions in the hashed values. If you need exact values, you can fix this problem by doing the following:
SELECT url FROM url_table
WHERE url_hash=CRC32("http://gnu.org") AND url="http://gnu.org";
It's still worth to hash things even if the collision number is high cause you'll only perform the second comparison (the string one) against the repeated hashes.
Unfortunately, using this technique, you still need to hit the table to compare the url field.
Wrap up
Some facts that you may consider every time you want to talk about optimization:
Integer comparison is way faster than string comparison. It can be illustrated with the example about the emulation of the hash index in InnoDB.
Maybe, adding additional steps in a process makes it faster, not slower. It can be illustrated by the fact that you can optimize a SELECT by splitting it into two steps, making the first one store values in a newly created in-memory table, and then execute the heavier queries on this second table.
MySQL has other indexes too, but I think the B+Tree one is the most used ever and the hash one is a good thing to know, but you can find the other ones in the MySQL documentation.
I highly recommend you to read the "High Performance MySQL" book, the answer above was definitely based on its chapter about indexes.
Basically an index is a map of all your keys that is sorted in order. With a list in order, then instead of checking every key, it can do something like this:
1: Go to middle of list - is higher or lower than what I'm looking for?
2: If higher, go to halfway point between middle and bottom, if lower, middle and top
3: Is higher or lower? Jump to middle point again, etc.
Using that logic, you can find an element in a sorted list in about 7 steps, instead of checking every item.
Obviously there are complexities, but that gives you the basic idea.
Take a look at this link: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
How they work is too broad of a subject to cover in one SO post.
Here is one of the best explanations of indexes I have seen. Unfortunately it is for SQL Server and not MySQL. I'm not sure how similar the two are...
In MySQL InnoDB, there are two types of index.
Primary key which is called clustered index. Index key words are stored with
real record data in the B+Tree leaf node.
Secondary key which is non clustered index. These index only store primary key's key words along with their own index key words in the B+Tree leaf node. So when searching from secondary index, it will first find its primary key index key words and scan the primary key B+Tree to find the real data records. This will make secondary index slower compared to primary index search. However, if the select columns are all in the secondary index, then no need to look up primary index B+Tree again. This is called covering index.
Take at this videos for more details about Indexing
Simple Indexing
You can create a unique index on a table. A unique index means that two rows cannot have the same index value. Here is the syntax to create an Index on a table
CREATE UNIQUE INDEX index_name
ON table_name ( column1, column2,...);
You can use one or more columns to create an index. For example, we can create an index on tutorials_tbl using tutorial_author.
CREATE UNIQUE INDEX AUTHOR_INDEX
ON tutorials_tbl (tutorial_author)
You can create a simple index on a table. Just omit UNIQUE keyword from the query to create simple index. Simple index allows duplicate values in a table.
If you want to index the values in a column in descending order, you can add the reserved word DESC after the column name.
mysql> CREATE UNIQUE INDEX AUTHOR_INDEX
ON tutorials_tbl (tutorial_author DESC)
Adding some visual representation to the list of answers.
MySQL uses an extra layer of indirection: secondary index records point to primary index records, and the primary index itself holds the on-disk row locations. If a row offset changes, only the primary index needs to be updated.
Caveat: Disk data structure looks flat in the diagram but actually is a
B+ tree.
Source: link
I want to add my 2 cents. I am far from being a database expert, but I've recently read up a bit on this topic; enough for me to try and give an ELI5. So, here's may layman's explanation.
I understand it as such that an index is like a mini-mirror of your table, pretty much like an associative array. If you feed it with a matching key then you can just jump to that row in one "command".
But if you didn't have that index / array, the query interpreter must use a for-loop to go through all rows and check for a match (the full-table scan).
Having an index has the "downside" of extra storage (for that mini-mirror), in exchange for the "upside" of looking up content faster.
Note that (in dependence of your db engine) creating primary, foreign or unique keys automatically sets up a respective index as well. That same principle is basically why and how those keys work.
Let's suppose you have a book, probably a novel, a thick one with lots of things to read, hence lots of words.
Now, hypothetically, you brought two dictionaries, consisting of only words that are only used, at least one time in the novel. All words in that two dictionaries are stored in typical alphabetical order. In hypothetical dictionary A, words are printed only once while in hypothetical dictionary B words are printed as many numbers of times it is printed in the novel. Remember, words are sorted alphabetically in both the dictionaries.
Now you got stuck at some point while reading a novel and need to find the meaning of that word from anyone of those hypothetical dictionaries. What you will do? Surely you will jump to that word in a few steps to find its meaning, rather look for the meaning of each of the words in the novel, from starting, until you reach that bugging word.
This is how the index works in SQL. Consider Dictionary A as PRIMARY INDEX, Dictionary B as KEY/SECONDARY INDEX, and your desire to get for the meaning of the word as a QUERY/SELECT STATEMENT.
The index will help to fetch the data at a very fast rate. Without an index, you will have to look for the data from the starting, unnecessarily time-consuming costly task.
For more about indexes and types, look this.
Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. This is much faster than reading every row sequentially.
Indexing adds a data structure with columns for the search conditions and a pointer
The pointer is the address on the memory disk of the row with the
rest of the information
The index data structure is sorted to optimize query efficiency
The query looks for the specific row in the index; the index refers to the pointer which will find the rest of the information.
The index reduces the number of rows the query has to search through from 17 to 4.
I'm looking to add some mysql indexes to a database table. Most of the queries are for selects. Would it be best to create a separate index for each column, or add an index for province, province and number, and name. Does it even make sense to index province since there are only about a dozen options?
select * from employees where province = 'ab' and number = 'v45g';
select * from employees where province = 'ab';
If the usage changed to more inserts should I remove all the indexes except for the number?
An index is a data structure that maps the values of a column into a fast searchable tree. This tree contains the index of rows which the DB can use to find rows fast. One thing to know, some DB engines read plus or minus a bunch of rows to take advantage of disk read ahead. So you may actually read 50 or 100 rows per index read, and not just one. Hence, if you access 30% of a table through an index, you may wind up reading all table data multiple times.
Rule of thumb:
- index the more unique values, a tree with 2 branches and half of your table on either side is not too useful for narrowing down a search
- use as few index as possible
- use real world examples numbers as much as possible. Performance can change dynamically based on data or the whim of the DB engine, so it's very important to try and track how fast your queries are running consistently (ie: log this in case a query ever gets slow). But from this data you can add indexes without being blind
Okay, so there are multiple kinds of index, single and multiple column. You want multiple indexes when it makes sense for indexes to access each other, multiple columns typically when you are refining with a where clause. Think of the first as good when you want joins, or you have "or" conditions. The second is better when you have and conditions and successively filter rows.
In your case name does not make sense since like does not use index. city and number do make sense, probably as a multi-column index. Province could help as well as the last index.
So an index with these columns would likely help:
(number,city,province)
Or try as well just:
(number,city)
You should index fields that are searched upon and have high selectivity / cardinality. Indexes make writes slower.
Other thing is that indexes can be added and dropped at any time so maybe you should let this for a later review of the database and optimization of querys.
That being said one index that you can be sure to add is in the column that holds the name of a person. That's almost always used in searching.
According to MySQL documentation found here:
You can create multiple column indexes and the first column mentioned in the index declaration uses index when searched alone but not the others.
Documentation also says that if you create a hash of the columns and save in another column and index the hashed column the search could be faster then multiple indexes.
SELECT * FROM tbl_name
WHERE hash_col=MD5(CONCAT(val1,val2))
AND col1=val1 AND col2=val2;
You could use an unique index on province.
So let's say for example I have 2 tables: Users > Items
Users can have favorite Items, and a Item can have multiple users that see it as a favorite, so I'll be using a linking table.
Now my linking table would contain something like:
id (int 11 AI)
user_id (int 11)
item_id (int 11)
Now would it be necessary / usefull to put a index on user_id and item_id since this table will contain a lot of records over time.
I'm not a 100% sure when to use indexes. My idea of when to use them(Might be completely incorrect though) is when you have big database and need to search/filter on a column then you index it. If this is incorrect I'm sorry, it's just what I've always been told.
Basically, yes, that's how it goes.
In this case, I'd say that an index on the user_id column would be useful, because you will display to the user a list of their favorites, right?
An index on the item_id might be less useful, because I doubt you're going to display a list of users that have favorited a specific item. Although you might care about the count ("100 users like this item"), so you might add that index after all. Or you might de-normalize and keep the count in the items table. That would give a better performance, although you'll need to write extra code to maintain that number.
Last but not least - in a link table, you can do away with the id column. Just add the primary key index on both columns (user_id and item_id in that order). This will make sure that you cannot enter duplicate rows, and since user_id is the first column in the index, you'll be able to use it in search queries. No need anymore to add a separate index on just the user_id column.
However this also depends on the code you're using. If you're using some kind of framework (ORM?) that REQUIRES an id column for every table, then this trick is useless.
As requested by the author, here's a quick intro on what indexes are.
Suppose you have a DB table which is just a bunch of rows in no particular order. Let's say we have a table people with the columns name, surname, age.
Now, when you want to find the age for John Smith you probably make a query like this:
select age from people where name='John' and surname='Smith'
When you do this, the DB engine can do only one thing - it has to go through ALL the rows and look for the ones that match. If there's 100,000 rows, it will be slow.
Now there's a faster way of doing this. Think about a phonebook (the classical paper edition). On it's thousand yellow pages there are phone numbers for hundreds of people. Yet you can find the number you seek very quickly even if you're a human being. That's because the numbers are sorted alphabetically by name and surname. You open a random page and you can immediately see whether the number you're looking for is before or after the page you opened. Repeat a couple of times and you've found it.
This kind of searching is called a "binary search". Your DB engine could do this too, if the records were sorted by name and surname. So this is what a Primary Key is - it tells the DB to store the records not in some random order, but sorted by some columns. When a new record comes, it can quickly find its rightful place and push it in there, thus keeping the table forever sorted.
There are a few things to note here already.
First, you can make it sort by one or more columns, but, just like in a phonebook, the order is important. If you sort by name first and then by surname, then that's the order the records will be in. So you'll be able to quickly find all the records where name='John' or name='John' and surname='Smith', but it won't help you at all if you need to find just surname='Smith'. Just like in a phonebook.
Second, pushing a record somewhere in the middle is also somewhat slow. Not criminally so, but still. Appending a record at the end is faster. Therefore people tend to use auto_increment columns for their Primary Keys, because then every new row will be placed at the end.
Third, in most DBs Primary Key is not only also used to search quickly, but also uniquely identify the row. Which means that the DB will not be happy if there are two rows that have equal values for the Primary Key columns. In that case, it cannot determine which has to go first, and which last, and it's also not unique. Another reason to use auto_increment. Note that if the PK index has multiple columns in it, then their combination must be unique - every column individually may be non-unique. In our case that means that there can be many Johns and many Smiths, but only one John Smith.
But we still have a problem. What if we want to quickly find rows both by just the name, and just the surname? A PK index can only do one of those things, not both at the same time.
This is where other non-PK indexes come in play. You can add as many of those as you want to the table. In our case, we could create another index to hold just the surname column.
When we do so, the DB creates another hidden table (OK, not true, but you can think of it this way) which is a copy of the original table, but only with the surname column and a special link back to the rows in the original table. This hidden index table is sorted by the surname column. So when you now need to find a row by specifying just the surname, the DB engine can look it up in the hidden index table, and then follow the links back to the original rows and get the data from them. Much faster.
These non-PK indexes also typically come in a few flavors. There's the standard "index" which places no restrictions at all - you can have duplicate values in the columns, nulls, etc. There's a "unique" index, which enforces that all the values in the index need to be unique; and then there are sometimes speciality indexes like FullText, Spatial, etc. Indexes also tend to have some technical options, but you'll have to read the documentation of your DB for those.
One last important thing to note is - indexes make it fast to find things in a table, but they come at a cost. Modifications to the table (insert, update, delete) become slower, because the indexes need to be updated as well. Keep that in mind and only add them where necessary.
Except for Primary Keys. ALWAYS add Primary Keys. That's an order! :)
In short, yes.
Imagine how well joins would work if, each time you needed to match a primary key value to a foreign key in another table, the DBMS had to search the entire table for the matching keys.
I am storing the match odds of sports matches/events in a MYSQl table.
The structure is as follows:
odds_id
match_id
outcome_id
bookmaker_id
odds
date
I can uniquely identify each row by match_id, outcome_id and bookmaker_id. So do I actually need the odds_id. For purposes of order and logic it seems as if I should have one, but the odds_id is meaningless to the user. The table will store a high volume of rows and although the odds_id is set to be auto increment if any odds are deleted the sequence will be thrown.
Will the presence of this index impact upon performance or will it be negligible and therefore ok to leave it in place.
Thank you in advance.
Alan.
If you don't use this table in any joins or whatever, and you usually need the combination between match outcome and bookmaker you can cut the odds index and add an index on the combination (well... not the best word but can't think at any other) of the 3 columns.
If you need to join this table with something else by using the odds_id... you should index it separately.
About the autoincrement... you can set it only on the primary key, witch is already an index. So if you keep the AI you add the index regardless of what you want.
The index impact is almost never negligible so is better to do it right.
I am really interested in how MySQL indexes work, more specifically, how can they return the data requested without scanning the entire table?
It's off-topic, I know, but if there is someone who could explain this to me in detail, I would be very, very thankful.
Basically an index on a table works like an index in a book (that's where the name came from):
Let's say you have a book about databases and you want to find some information about, say, storage. Without an index (assuming no other aid, such as a table of contents) you'd have to go through the pages one by one, until you found the topic (that's a full table scan).
On the other hand, an index has a list of keywords, so you'd consult the index and see that storage is mentioned on pages 113-120,231 and 354. Then you could flip to those pages directly, without searching (that's a search with an index, somewhat faster).
Of course, how useful the index will be, depends on many things - a few examples, using the simile above:
if you had a book on databases and indexed the word "database", you'd see that it's mentioned on pages 1-59,61-290, and 292 to 400. In such case, the index is not much help and it might be faster to go through the pages one by one (in a database, this is "poor selectivity").
For a 10-page book, it makes no sense to make an index, as you may end up with a 10-page book prefixed by a 5-page index, which is just silly - just scan the 10 pages and be done with it.
The index also needs to be useful - there's generally no point to index e.g. the frequency of the letter "L" per page.
The first thing you must know is that indexes are a way to avoid scanning the full table to obtain the result that you're looking for.
There are different kinds of indexes and they're implemented in the storage layer, so there's no standard between them and they also depend on the storage engine that you're using.
InnoDB and the B+Tree index
For InnoDB, the most common index type is the B+Tree based index, that stores the elements in a sorted order. Also, you don't have to access the real table to get the indexed values, which makes your query return way faster.
The "problem" about this index type is that you have to query for the leftmost value to use the index. So, if your index has two columns, say last_name and first_name, the order that you query these fields matters a lot.
So, given the following table:
CREATE TABLE person (
last_name VARCHAR(50) NOT NULL,
first_name VARCHAR(50) NOT NULL,
INDEX (last_name, first_name)
);
This query would take advantage of the index:
SELECT last_name, first_name FROM person
WHERE last_name = "John" AND first_name LIKE "J%"
But the following one would not
SELECT last_name, first_name FROM person WHERE first_name = "Constantine"
Because you're querying the first_name column first and it's not the leftmost column in the index.
This last example is even worse:
SELECT last_name, first_name FROM person WHERE first_name LIKE "%Constantine"
Because now, you're comparing the rightmost part of the rightmost field in the index.
The hash index
This is a different index type that unfortunately, only the memory backend supports. It's lightning fast but only useful for full lookups, which means that you can't use it for operations like >, < or LIKE.
Since it only works for the memory backend, you probably won't use it very often. The main case I can think of right now is the one that you create a temporary table in the memory with a set of results from another select and perform a lot of other selects in this temporary table using hash indexes.
If you have a big VARCHAR field, you can "emulate" the use of a hash index when using a B-Tree, by creating another column and saving a hash of the big value on it. Let's say you're storing a url in a field and the values are quite big. You could also create an integer field called url_hash and use a hash function like CRC32 or any other hash function to hash the url when inserting it. And then, when you need to query for this value, you can do something like this:
SELECT url FROM url_table WHERE url_hash=CRC32("http://gnu.org");
The problem with the above example is that since the CRC32 function generates a quite small hash, you'll end up with a lot of collisions in the hashed values. If you need exact values, you can fix this problem by doing the following:
SELECT url FROM url_table
WHERE url_hash=CRC32("http://gnu.org") AND url="http://gnu.org";
It's still worth to hash things even if the collision number is high cause you'll only perform the second comparison (the string one) against the repeated hashes.
Unfortunately, using this technique, you still need to hit the table to compare the url field.
Wrap up
Some facts that you may consider every time you want to talk about optimization:
Integer comparison is way faster than string comparison. It can be illustrated with the example about the emulation of the hash index in InnoDB.
Maybe, adding additional steps in a process makes it faster, not slower. It can be illustrated by the fact that you can optimize a SELECT by splitting it into two steps, making the first one store values in a newly created in-memory table, and then execute the heavier queries on this second table.
MySQL has other indexes too, but I think the B+Tree one is the most used ever and the hash one is a good thing to know, but you can find the other ones in the MySQL documentation.
I highly recommend you to read the "High Performance MySQL" book, the answer above was definitely based on its chapter about indexes.
Basically an index is a map of all your keys that is sorted in order. With a list in order, then instead of checking every key, it can do something like this:
1: Go to middle of list - is higher or lower than what I'm looking for?
2: If higher, go to halfway point between middle and bottom, if lower, middle and top
3: Is higher or lower? Jump to middle point again, etc.
Using that logic, you can find an element in a sorted list in about 7 steps, instead of checking every item.
Obviously there are complexities, but that gives you the basic idea.
Take a look at this link: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
How they work is too broad of a subject to cover in one SO post.
Here is one of the best explanations of indexes I have seen. Unfortunately it is for SQL Server and not MySQL. I'm not sure how similar the two are...
In MySQL InnoDB, there are two types of index.
Primary key which is called clustered index. Index key words are stored with
real record data in the B+Tree leaf node.
Secondary key which is non clustered index. These index only store primary key's key words along with their own index key words in the B+Tree leaf node. So when searching from secondary index, it will first find its primary key index key words and scan the primary key B+Tree to find the real data records. This will make secondary index slower compared to primary index search. However, if the select columns are all in the secondary index, then no need to look up primary index B+Tree again. This is called covering index.
Take at this videos for more details about Indexing
Simple Indexing
You can create a unique index on a table. A unique index means that two rows cannot have the same index value. Here is the syntax to create an Index on a table
CREATE UNIQUE INDEX index_name
ON table_name ( column1, column2,...);
You can use one or more columns to create an index. For example, we can create an index on tutorials_tbl using tutorial_author.
CREATE UNIQUE INDEX AUTHOR_INDEX
ON tutorials_tbl (tutorial_author)
You can create a simple index on a table. Just omit UNIQUE keyword from the query to create simple index. Simple index allows duplicate values in a table.
If you want to index the values in a column in descending order, you can add the reserved word DESC after the column name.
mysql> CREATE UNIQUE INDEX AUTHOR_INDEX
ON tutorials_tbl (tutorial_author DESC)
Adding some visual representation to the list of answers.
MySQL uses an extra layer of indirection: secondary index records point to primary index records, and the primary index itself holds the on-disk row locations. If a row offset changes, only the primary index needs to be updated.
Caveat: Disk data structure looks flat in the diagram but actually is a
B+ tree.
Source: link
I want to add my 2 cents. I am far from being a database expert, but I've recently read up a bit on this topic; enough for me to try and give an ELI5. So, here's may layman's explanation.
I understand it as such that an index is like a mini-mirror of your table, pretty much like an associative array. If you feed it with a matching key then you can just jump to that row in one "command".
But if you didn't have that index / array, the query interpreter must use a for-loop to go through all rows and check for a match (the full-table scan).
Having an index has the "downside" of extra storage (for that mini-mirror), in exchange for the "upside" of looking up content faster.
Note that (in dependence of your db engine) creating primary, foreign or unique keys automatically sets up a respective index as well. That same principle is basically why and how those keys work.
Let's suppose you have a book, probably a novel, a thick one with lots of things to read, hence lots of words.
Now, hypothetically, you brought two dictionaries, consisting of only words that are only used, at least one time in the novel. All words in that two dictionaries are stored in typical alphabetical order. In hypothetical dictionary A, words are printed only once while in hypothetical dictionary B words are printed as many numbers of times it is printed in the novel. Remember, words are sorted alphabetically in both the dictionaries.
Now you got stuck at some point while reading a novel and need to find the meaning of that word from anyone of those hypothetical dictionaries. What you will do? Surely you will jump to that word in a few steps to find its meaning, rather look for the meaning of each of the words in the novel, from starting, until you reach that bugging word.
This is how the index works in SQL. Consider Dictionary A as PRIMARY INDEX, Dictionary B as KEY/SECONDARY INDEX, and your desire to get for the meaning of the word as a QUERY/SELECT STATEMENT.
The index will help to fetch the data at a very fast rate. Without an index, you will have to look for the data from the starting, unnecessarily time-consuming costly task.
For more about indexes and types, look this.
Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. This is much faster than reading every row sequentially.
Indexing adds a data structure with columns for the search conditions and a pointer
The pointer is the address on the memory disk of the row with the
rest of the information
The index data structure is sorted to optimize query efficiency
The query looks for the specific row in the index; the index refers to the pointer which will find the rest of the information.
The index reduces the number of rows the query has to search through from 17 to 4.