MySQL indexes and when to group them

I'm still trying to get my head around the best way to use INDEXES in MySQL. How do you know when to merge them together and when to have them separate?
Below are the indexes from the WordPress posts table. See how post_name, post_parent and post_author are separate entries? And then there is type_status_date, which is a combination of 4 fields?
http://img215.imageshack.us/img215/5976/screenshot20120426at431.png
I don't understand the logic behind this. Can anyone enlighten me?

This is going to be a bit of a long answer, but here we go. Please note that I am not going to deal with the differences between storage engines here (MyISAM and InnoDB implement what I am about to describe in different ways; the description below follows InnoDB).
The first thing you have to understand about an index is that it is a separate data structure stored on disk. Normally this is a B-tree containing the column(s) that you have indexed, plus a pointer to the row in the table (in InnoDB, this pointer is the primary key value).
The only index that is stored with the data is the primary key index. Thus the primary key index IS the table (InnoDB calls this the clustered index).
Let's assume you have the following table definition:
CREATE TABLE `Student` (
`StudentNumber` INT NOT NULL ,
`Name` VARCHAR(32) NULL ,
`Surname` VARCHAR(32) NULL ,
`StudentEmail` VARCHAR(32) NULL ,
PRIMARY KEY (`StudentNumber`) );
Since we have a primary key on StudentNumber, there will be an index containing the primary key and the other columns of the row. If you looked at the data in that index, you would probably see something like this:
1, John, Doe, Jdoe@gmail.com
As you can see, this is the table data, once again showing that the primary key index IS the table.
The StudentNumber column is indexed, which allows you to search on it efficiently, and the rest of the data is stored with the key. So if I ran the following query:
SELECT * FROM Student WHERE StudentNumber=1
MySQL would use the primary key index to quickly find the row and then read the data stored with the indexed column. Because there is an index, MySQL can do an efficient seek down the B-tree (similar in spirit to a binary search) instead of reading every row.
Also, when it comes to retrieving the data after the search, MySQL can read it straight from the index, so we need only one index operation to retrieve the data. Now suppose I ran the following query:
SELECT * FROM Student WHERE Name ='Joe'
MySQL would check whether there is an index it could use to speed the query up. However, in my case there is no index on Name, so MySQL would read the table sequentially, one row at a time, from the first row to the last.
At each row it would evaluate the row against the WHERE clause and return the matching rows. So basically it reads the primary key index from top to bottom. Remember, the primary key index is the table.
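As a rough illustration (a Python model, not how InnoDB is actually implemented), the clustered primary key index can be pictured as a sorted list of (StudentNumber, row) pairs, with `bisect` standing in for the B-tree seek. The sample rows and the function names `seek_by_pk` and `scan_by_name` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical model of the clustered primary key index: entries sorted by
# StudentNumber, with the rest of the row stored right next to the key.
clustered = [
    (1, {"Name": "John", "Surname": "Doe"}),
    (2, {"Name": "Jane", "Surname": "Roe"}),
    (3, {"Name": "Joe", "Surname": "Bloggs"}),
]
keys = [pk for pk, _ in clustered]  # sorted keys, used for seeking


def seek_by_pk(student_number):
    """WHERE StudentNumber = ?: one seek, and the row comes with the key."""
    i = bisect_left(keys, student_number)
    if i < len(keys) and keys[i] == student_number:
        return clustered[i][1]
    return None


def scan_by_name(name):
    """WHERE Name = ? with no index on Name: check every row, first to last.
    Returns the matching primary keys and the number of rows examined."""
    examined = 0
    matches = []
    for pk, row in clustered:
        examined += 1
        if row["Name"] == name:
            matches.append(pk)
    return matches, examined
```

`seek_by_pk(1)` lands directly on the entry, while `scan_by_name("Joe")` has to examine all three rows even though only one of them matches.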
Now suppose I ran the following statements:
ALTER TABLE `TimLog`.`Student`
ADD INDEX `ix_name` (`Name` ASC) ;
ALTER TABLE `TimLog`.`Student`
ADD INDEX `ix_surname` (`Surname` ASC) ;
MySQL would create two new indexes on the Student table. These are stored separately from the table on disk, and the data inside them would look something like this:
Data in ix_name:
John, 1   <-- primary key value
Data in ix_surname:
Doe, 1   <-- primary key value
Notice that the data in the ix_name index is the name plus the primary key value. So if I ran the previous SELECT statement, MySQL would read the ix_name index to get the primary key values of the matching entries, and then use the primary key index to get the rest of the data.
So retrieving the data now takes two operations: the matching entries are found in the secondary index, and then a lookup on the primary key gets the row data out.
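The two-step lookup through a secondary index can be sketched the same way. This is a simplified Python model under assumed sample data: `ix_name` holds only (Name, primary key) pairs, a dict stands in for the clustered index, and `select_by_name` is a hypothetical helper that counts the primary key lookups.

```python
from bisect import bisect_left

# Hypothetical clustered index: primary key -> full row. A dict keeps the
# sketch short; in InnoDB this step is another B-tree seek.
clustered = {
    1: {"Name": "John", "Surname": "Doe"},
    2: {"Name": "Jane", "Surname": "Roe"},
    3: {"Name": "John", "Surname": "Smith"},
}

# ix_name: only the indexed column plus the primary key, sorted.
ix_name = sorted((row["Name"], pk) for pk, row in clustered.items())


def select_by_name(name):
    """Step 1: seek in ix_name. Step 2: one primary key lookup per match."""
    pk_lookups = 0
    rows = []
    i = bisect_left(ix_name, (name,))  # first entry with this name
    while i < len(ix_name) and ix_name[i][0] == name:
        pk = ix_name[i][1]
        rows.append(clustered[pk])     # go back to the table for the rest
        pk_lookups += 1
        i += 1
    return rows, pk_lookups
```

Each match costs one extra trip to the primary key index, which is the second operation described above.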
Now consider the following query:
SELECT * FROM Student WHERE Name='John' AND surname ='Doe'
Here MySQL will normally not use both indexes, because that would waste operations. If MySQL did use both indexes for this query, something like the following would happen (this should not happen):
1 Find in ix_name the entries with the value John
2 Read the matching primary keys to get the row data
3 Store the matching results
4 Find in ix_surname the entries with the value Doe
5 Read the matching primary keys to get the row data
6 Store the matching results
7 Take the Name results and the Surname results and merge them
8 Return the query results
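This is really a waste of IO, as MySQL would read the table twice. Using one index is better than trying to use two (I will explain why in a moment), so MySQL will choose one index for this simple query. (MySQL does have an Index Merge optimization that can combine several indexes on the same table, but for a simple query like this it will normally pick just one.)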
So how does MySQL decide on which index to use?
MySQL keeps internal statistics about each index. Basically, these statistics tell MySQL how unique (selective) an index is. For the sake of argument, let's say the surname index (ix_surname) is more unique than the name index (ix_name); MySQL would then use the surname index (ix_surname).
Query retrieval would then look like this:
1 Use ix_surname to find the entries that match the value Doe
2 Look up each matching primary key to read the row, and apply the filter for the value John to the actual column data in the row
3 Return the matched rows
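Here is a hedged Python sketch of that decision. It assumes the number of distinct values per index is a good enough proxy for the statistics MySQL keeps (the real optimizer uses cost estimates; `SHOW INDEX` reports its estimate in the Cardinality column). The data and the `cardinality` and `select_with_best_index` helpers are hypothetical.

```python
from bisect import bisect_left

# Hypothetical rows keyed by primary key.
clustered = {
    1: {"Name": "John", "Surname": "Doe"},
    2: {"Name": "John", "Surname": "Smith"},
    3: {"Name": "John", "Surname": "Brown"},
    4: {"Name": "Jane", "Surname": "Doe"},
}

# One secondary index per column: sorted (value, primary key) pairs.
indexes = {
    col: sorted((row[col], pk) for pk, row in clustered.items())
    for col in ("Name", "Surname")
}


def cardinality(col):
    """Crude stand-in for index statistics: count of distinct values."""
    return len({value for value, _ in indexes[col]})


def select_with_best_index(filters):
    """Use only the most selective index, then filter the remaining columns."""
    best = max(filters, key=cardinality)  # more distinct values = more selective
    value = filters[best]
    entries = indexes[best]
    result = []
    i = bisect_left(entries, (value,))
    while i < len(entries) and entries[i][0] == value:
        row = clustered[entries[i][1]]    # primary key lookup
        if all(row[c] == v for c, v in filters.items()):  # residual filter
            result.append(entries[i][1])
        i += 1
    return best, result
```

With this data Surname has three distinct values and Name only two, so `select_with_best_index({"Name": "John", "Surname": "Doe"})` seeks `ix_surname`, reads two rows, and keeps only primary key 1.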
As you can see, this search takes far fewer operations. I have oversimplified a lot of the technical detail. Indexing is an interesting thing to master, but you have to look at it from the perspective of how to get the data with the minimal amount of IO.
Hope it is as clear as mud now!

MySQL normally cannot use more than one index per table in a query. That means, for instance, that when a query filters or sorts on two columns, you put them both into the same index.
WordPress likely has a common query that filters and/or sorts on post_type, post_status and post_date. Making an educated guess as to what they stand for, this would likely be the core query for WordPress's Post listing pages. So the three fields are put into the same index.
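A minimal Python sketch of why this helps. It assumes the index columns are (post_type, post_status, post_date, ID), which matches the question's note that type_status_date has four fields, but check it against your own schema. The sample posts and `latest_posts` are hypothetical. Because the index is sorted on those columns, the two equality filters select one contiguous range that is already in post_date order, so no separate sort step is needed.

```python
from bisect import bisect_left
from datetime import date

# Hypothetical wp_posts rows: (ID, post_type, post_status, post_date).
posts = [
    (1, "post", "publish", date(2012, 4, 1)),
    (2, "page", "publish", date(2012, 4, 2)),
    (3, "post", "draft", date(2012, 4, 3)),
    (4, "post", "publish", date(2012, 4, 20)),
    (5, "post", "publish", date(2012, 3, 15)),
]

# Composite index modeled as sorted (post_type, post_status, post_date, ID).
type_status_date = sorted((t, s, d, pid) for pid, t, s, d in posts)


def latest_posts(post_type, post_status, limit):
    """WHERE post_type = ? AND post_status = ? ORDER BY post_date DESC LIMIT ?"""
    prefix = (post_type, post_status)
    lo = bisect_left(type_status_date, prefix)  # start of the matching range
    hi = lo
    while hi < len(type_status_date) and type_status_date[hi][:2] == prefix:
        hi += 1
    # The range [lo, hi) is already ordered by post_date; walk it backwards
    # for descending order instead of sorting.
    return [entry[3] for entry in reversed(type_status_date[lo:hi])][:limit]
```

`latest_posts("post", "publish", 2)` returns the IDs of the two newest published posts straight from the index range.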

Related

use Foreign key indexes in multi index mysql

I have a table with almost 20 fields, several of which are foreign keys that MySQL has already indexed. Now I want to create a multi-column index that contains 3 of those FK fields.
My first attempt was based on the columns:
ALTER TABLE `Add`
Add INDEX `IX_Add_ON_IDCat_IDStatus_IDModeration_DateTo_DateAdded`
(`IDCategory`,`IDStatus`,`IDModeration`,`DateTo`,`DateAdded`);
But I thought it would be better to build the index on the existing indexes instead of on the columns, and my next attempt failed with this error: Error Code: 1072. Key column 'FK_Add_Category' doesn't exist in table
ALTER TABLE `Add`
Add INDEX `IX_Add_ON_IDCat_IDStatus_IDModeration_DateTo_DateAdded`
(`FK_Add_Category`,`FK_Add_AddStatus`,`FK_Add_AddModeration`,
`IX_Add_DateTo`,`IX_Add_DateAdded`);
My question is: is it possible to add an index on existing indexes (the FK indexes in my case), or is the only way to create an index on columns? If it is possible, how do I create it?
An index is an ordered list of values. It is used to make it more efficient to find rows in the table.
Think about the common, real-life, example of INDEX(last_name, first_name). It makes it easy to look up someone if you have their last name and first name. And sort of easy if you have only their last name.
But it is useless if all you have is their first name.
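The phone-book analogy can be modeled with sorted tuples in Python. This sketch uses hypothetical names; it only shows that a composite index supports lookups on its leftmost column(s), while a lookup on the second column alone degenerates into a full scan.

```python
from bisect import bisect_left

# Hypothetical INDEX(last_name, first_name), modeled as sorted tuples.
phone_book = sorted([
    ("Smith", "Alice"),
    ("Jones", "Bob"),
    ("Smith", "Carol"),
    ("Brown", "Alice"),
])


def find_by_last_name(last):
    """Leftmost prefix: one seek, then read the contiguous run of matches."""
    i = bisect_left(phone_book, (last,))
    out = []
    while i < len(phone_book) and phone_book[i][0] == last:
        out.append(phone_book[i])
        i += 1
    return out


def find_by_first_name(first):
    """first_name alone is not a prefix, so matches are scattered: scan all."""
    return [entry for entry in phone_book if entry[1] == first]
```

All the Smiths sit next to each other, so `find_by_last_name` can stop as soon as the name changes; the Alices are spread through the list, so `find_by_first_name` has to look at every entry.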
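FOREIGN KEYs need an index for their lookups. Apparently you have a FK to AddStatus, since I see FK_Add_AddStatus. For that FK, MySQL created an index on the referencing column (if a suitable one did not already exist). Think of that as being like a separate index on first_name. It is totally separate from the index on last_name & first_name. Indexes are always built from table columns, never from other indexes, which is why your second ALTER TABLE failed with error 1072.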
5 columns is usually too many to put into a single index.
MySQL usually uses only one index per table for a given SELECT.
So now I ask: what SELECT might use that 5-column index? Please show it to us. We can discuss whether it is useful, and whether the columns are in the optimal order.

Best way to index a table with a unique multi-column?

I am creating a table that will store around 100 million rows in MySQL 5.6, using the InnoDB storage engine. This table will have a foreign key that links to another table with around 5 million rows.
Current Table Structure:
`pid`: [Foreign key from another table]
`price`: [decimal(9,2)]
`date`: [date field]
Every pid should have only one record per date.
What is the best way to create indexes on this table?
Option #1: Create a primary key on the two columns pid and date
Option #2: Add another column, id, with AUTO_INCREMENT as the primary key, and create a unique index on the columns pid and date
Or any other option?
The only SELECT query I will be running on this table is:
SELECT pid, price, date FROM `table` WHERE pid = 123
Based on what you said (100M; the only query is...; InnoDB; etc):
PRIMARY KEY(pid, date);
and no other indexes
Some notes:
Since it is InnoDB, all the rest of the fields are "clustered" with the PK, so a lookup by pid acts as if price were part of the PK. Also, WHERE pid=123 ORDER BY date would be very efficient.
No need for INDEX(pid, date, price)
Adding an AUTO_INCREMENT gains nothing (except a hint of ordering). If you needed ordering, then an index starting with date might be best.
Extra indexes slow down inserts, especially UNIQUE ones.
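A simplified Python model of that recommendation, assuming rows are kept sorted by (pid, date) with the price stored next to the key, the way InnoDB clusters data with the primary key. The `insert` and `prices_for` helpers and the sample values are hypothetical.

```python
from bisect import bisect_left
from datetime import date
from decimal import Decimal

# Hypothetical model of a table with PRIMARY KEY(pid, date): entries are
# (pid, date, price) tuples kept sorted by (pid, date), so the price lives
# right next to the key, as it would in a clustered index.
table = []


def insert(pid, day, price):
    """Insert a row; return False for a duplicate (pid, date), which the
    primary key would reject."""
    i = bisect_left(table, (pid, day))
    if i < len(table) and table[i][:2] == (pid, day):
        return False
    table.insert(i, (pid, day, price))
    return True


def prices_for(pid):
    """SELECT pid, price, date ... WHERE pid = ?: one seek plus a contiguous
    range that is already in date order."""
    i = bisect_left(table, (pid,))
    out = []
    while i < len(table) and table[i][0] == pid:
        p, day, price = table[i]
        out.append((p, price, day))
        i += 1
    return out


insert(123, date(2013, 1, 2), Decimal("10.50"))
insert(123, date(2013, 1, 1), Decimal("10.00"))
insert(7, date(2013, 1, 1), Decimal("3.00"))
rows = prices_for(123)  # both rows for pid 123, oldest date first
```

The duplicate check plays the role of the uniqueness that the composite primary key already enforces, which is why no separate UNIQUE index is needed in this design.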
Either method is fine. I prefer having synthetic primary keys (that is, the auto-incremented version with the additional unique index). I find that this is useful for several reasons:
You can have a foreign key relationship to the table.
You have an indicator of the order of insertion.
You can change requirements: if some pids are later allowed two values per day, or only one per week, the table can support that.
That said, there is additional overhead for such a column. This overhead adds space and a small amount of time when you are accessing the data. You have a pretty large table, so you might want to avoid this additional effort.
I would try an index that covers the query, in the hope that MySQL only has to access the index to get the result set.
ALTER TABLE `table` ADD INDEX `pid_date_price` (`pid` , `date`, `price`);
or
ALTER TABLE `table` ADD INDEX `pid_price_date` (`pid` , `price`, `date`);
Choose the first one if you think you may need to filter on pid and date in the future, or the second one if you think future conditions are more likely to be on pid and price.
This way, the index holds all the data the query needs (pid, price and date), and its leading column is the right one (pid).
By the way, always use EXPLAIN to see whether the query planner will really use the whole index (look at the key and key_len columns of the output).

Do i need separate index in addition to primary key in a MySQL table?

My table schema is:
CREATE TABLE ITEMS (Time TIMESTAMP, Name VARCHAR(255), Token VARCHAR(255), PRIMARY KEY (Time, Name)); -- column types assumed, only the column names were given
Here Time is the timestamp at which the item was created. When I run the following query:
SELECT Name, Token FROM ITEMS WHERE Name = 'shoes'
it takes a while to load the data, as my table has more than a million rows.
Do I need to add an INDEX for faster retrieval of data? I already have an INDEX on this table, since there is a PRIMARY KEY.
You need a separate index for name. The primary key index can handle name, but only in conjunction with time.
If you defined it instead as:
PRIMARY KEY (Name, Time)
Then your query could take advantage of the index.
The MySQL Reference Manual has pretty good documentation on composite (multiple-column) indexes.
When you create the index with PRIMARY KEY (Time, Name), the entries are sorted by Time first and by Name only within equal Time values, as if the two values were concatenated. So MySQL cannot use this index to seek by Name alone.
BTW, you can get a lot of useful hints from the query optimizer if you put the EXPLAIN keyword in front of your query, like this:
EXPLAIN SELECT Name, Token FROM ITEMS WHERE Name = 'shoes'
Keep an eye on the rows column, which estimates how many rows MySQL needs to examine, and on "Using where" in the Extra column, which means the WHERE condition is being applied to rows after they are fetched. No need to wait or test blindly.

Does using "LIMIT 1" speed up a query on a primary key?

If I have a primary key, say id, and I run a simple query on the key, such as:
SELECT id FROM myTable WHERE id = X
Will it find one row and then stop looking, since it is a primary key, or would it be better to tell MySQL to limit its select with LIMIT 1? For instance:
SELECT id FROM myTable WHERE id = X LIMIT 1
Does using “LIMIT 1” speed up a query on a primary key?
No. It's already as fast as can be without LIMIT 1. LIMIT 1 is effectively implied anyway.
Will it find one row and then stop looking as it is a primary key
Yes.
No table scan should be necessary at all here: it's a key-based lookup. The matching row is found and that's the end of the procedure.
There is no need to worry about this sort of "optimization".
Only one row will be fetched (per the unique index contract), and the database is able to find all the (1) rows very quickly. It can do this because the underlying structures that back the index (or primary key) support fast seeking by value. There is no table scan involved. (Generally a variant of a B-tree is used, but it might be hash-based, etc. I suppose a smart query optimizer might also be able to pass down additional hints based on the unique constraint in effect, but I don't know enough about this.)

MYSQL: Index keyword in create table and when to use it

What does the INDEX keyword mean, and what function does it serve? I understand that it is meant to speed up querying, but I am not sure how it does this.
How do I choose which columns to index?
Below is a sample of INDEX keyword usage in a CREATE TABLE query:
CREATE TABLE `blog_comment`
(
`id` INTEGER NOT NULL AUTO_INCREMENT,
`blog_post_id` INTEGER,
`author` VARCHAR(255),
`email` VARCHAR(255),
`body` TEXT,
`created_at` DATETIME,
PRIMARY KEY (`id`),
INDEX `blog_comment_FI_1` (`blog_post_id`),
CONSTRAINT `blog_comment_FK_1`
FOREIGN KEY (`blog_post_id`)
REFERENCES `blog_post` (`id`)
) ENGINE=MyISAM
;
I'd recommend reading How MySQL Uses Indexes from the MySQL Reference Manual. It states that indexes are used...
To find the rows matching a WHERE clause quickly.
To eliminate rows from consideration.
To retrieve rows from other tables when performing joins.
To find the MIN() or MAX() value for a specific indexed column.
To sort or group a table (under certain conditions).
To optimize queries using only indexes without consulting the data rows.
Indexes in a database work like an index in a book. You can find what you're looking for in a book more quickly because the index is listed alphabetically. Instead of an alphabetical list, MySQL uses B-trees to organize its indexes, which is quicker for its purposes (but would take a lot longer for a human).
Using more indexes means using up more space (as well as the overhead of maintaining the index), so it's only really worth using indexes on columns that fulfil the above usage criteria.
In your example, the id and blog_post_id columns both use indexes (a PRIMARY KEY is an index too) so that the application can find them quickly. In the case of id, this likely allows users to modify or delete a comment quickly, and in the case of blog_post_id, it lets the application quickly find all comments for a given post.
You'll notice that there is no index on the email column. This means that searching for all comments by a particular e-mail address would probably take quite a long time. If searching for all comments by a particular e-mail address is something you'd want to add, it might make sense to add an index on that column too.
This keyword means that you are creating an index on column blog_post_id along with the table.
Queries like this one:
SELECT *
FROM blog_comment
WHERE blog_post_id = #id
will use this index to search on this field and run faster.
Also, there is a foreign key on this column.
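When you decide to delete a blog post, the database needs to check this table to make sure no comments would be orphaned (this only happens with an engine that enforces foreign keys, such as InnoDB; MyISAM parses FOREIGN KEY clauses but ignores them). The index will also speed up this check, so queries like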
DELETE
FROM blog_post
WHERE ...
will also run faster.
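A rough Python model of that check, assuming an engine that enforces the constraint with the default behavior (NO ACTION, which InnoDB treats like RESTRICT, so the delete is refused while comments still reference the post). The sample comments and helper functions are hypothetical; the point is that the index turns "are there any comments for this post?" into one seek instead of a scan.

```python
from bisect import bisect_left

# Hypothetical blog_comment rows: (id, blog_post_id).
comments = [(10, 1), (11, 2), (12, 1), (13, 3)]

# blog_comment_FI_1 modeled as sorted (blog_post_id, id) pairs.
blog_comment_fi_1 = sorted((post_id, cid) for cid, post_id in comments)


def has_comments_indexed(post_id):
    """With the index: one seek answers whether any comment references it."""
    i = bisect_left(blog_comment_fi_1, (post_id,))
    return i < len(blog_comment_fi_1) and blog_comment_fi_1[i][0] == post_id


def has_comments_scan(post_id):
    """Without the index: potentially every comment row must be read."""
    return any(p == post_id for _, p in comments)


def can_delete_post(post_id):
    """Refuse the delete if it would leave orphaned comments."""
    return not has_comments_indexed(post_id)
```

Both lookups give the same answer; the indexed one only ever inspects the single entry where the post id would sit in sorted order.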