Need a little clarification on MySQL Indexes - mysql

I've been thinking about my database indexes lately. In the past I nonchalantly threw them in as an afterthought and never put much thought into whether they were correct or even helping. I've read conflicting information: some say that more indexes are better, others that too many indexes are bad. I'm hoping to get some clarification and learn a bit here.
Let's say I have this hypothetical table:
CREATE TABLE widgets (
widget_id INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
widget_name VARCHAR(50) NOT NULL,
widget_part_number VARCHAR(20) NOT NULL,
widget_price FLOAT NOT NULL,
widget_description TEXT NOT NULL
);
I would typically add an index for fields that will be joined and fields that will be sorted on most often:
ALTER TABLE widgets ADD INDEX widget_name_index(widget_name);
So now, in a query such as:
SELECT w.* FROM widgets AS w ORDER BY w.widget_name ASC
The widget_name_index is used to sort the resultset.
Now if I add a search parameter:
SELECT w.* FROM widgets AS w
WHERE w.widget_price > 100.00
ORDER BY w.widget_name ASC
I guess I need a new index.
ALTER TABLE widgets ADD INDEX widget_price_index(widget_price);
But, will it use both indexes? As I understand it it won't...
ALTER TABLE widgets ADD INDEX widget_price_name_index(widget_price, widget_name);
Now widget_price_name_index will be used to both select and order the records. But what if I want to turn it around and do this:
SELECT w.* FROM widgets AS w
WHERE w.widget_name LIKE '%foobar%'
ORDER BY w.widget_price ASC
Will widget_price_name_index be used for this? Or do I need a widget_name_price_index also?
ALTER TABLE widgets ADD INDEX widget_name_price_index(widget_name, widget_price);
Now what if I have a search box that searches widget_name, widget_part_number and widget_description?
ALTER TABLE widgets
ADD INDEX widget_search(widget_name, widget_part_number, widget_description);
And what if end users can sort by any column? It's easy to see how I could end up with more than a dozen indexes for a mere 5 columns.
If we add another table:
CREATE TABLE specials (
special_id INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
widget_id INT UNSIGNED NOT NULL,
special_title VARCHAR(100) NOT NULL,
special_discount FLOAT NOT NULL,
special_date DATE NOT NULL
);
ALTER TABLE specials ADD INDEX specials_widget_id_index(widget_id);
ALTER TABLE specials ADD INDEX special_title_index(special_title);
SELECT w.widget_name, s.special_title
FROM widgets AS w
INNER JOIN specials AS s ON w.widget_id=s.widget_id
ORDER BY w.widget_name ASC, s.special_title ASC
I am assuming this will use specials_widget_id_index and the widgets.widget_id primary key index for the join, but what about the sorting? Will it use both widget_name_index and special_title_index?
I don't want to ramble on too long; there is an endless number of scenarios I could conjure up. Obviously this can get much more complex in real-world scenarios than with a couple of simple tables. Any clarification would be appreciated.

As a best practice, you do not have to create every index while defining the table schema. It is usually better to add indexes as you write the queries in your application. In most cases, you will start with a single-column index to satisfy a query. If a query uses several columns, you can create a multi-column (composite) index.
A covering index is an index that contains every column a query needs. If the index satisfies all the column requirements of a query, the storage engine can return the results from the index alone instead of also reading the table rows. So, when writing a query that uses more columns, you can either create a new index covering all the required columns or extend an existing index to include more columns.
Either approach needs some care. MySQL considers an index only when the left-most column of the index can be used in the query. Otherwise, it scans the whole table to fetch the results. So if you can extend an existing index without hurting the queries that already use it, that is the wise choice. Otherwise, create a new index for the new query. Sometimes the queries can be adjusted to fit the index structure.
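A minimal sketch of the left-most column rule, assuming the widgets table from the question already exists. The index name widget_name_price_idx is made up for illustration, and the plan MySQL actually picks depends on the version and the data, so treat the comments as what to look for in the EXPLAIN output rather than as guaranteed results.

```sql
-- Hypothetical composite index; widget_name is the left-most column.
ALTER TABLE widgets ADD INDEX widget_name_price_idx (widget_name, widget_price);

-- Filters on the left-most column, so the index is a candidate.
-- Only indexed columns are selected, so the index can also cover the query
-- (look for "Using index" in the Extra column).
EXPLAIN SELECT widget_name, widget_price
FROM widgets
WHERE widget_name = 'foobar';

-- Filters only on the second column and reads a non-indexed column:
-- the index cannot be used to look up rows by widget_price alone,
-- so expect a full table scan (type = ALL).
EXPLAIN SELECT widget_name, widget_description
FROM widgets
WHERE widget_price = 9.99;
```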

An index speeds up selects, but slows down inserts and updates. You don't need to create an index for every possible combination of columns you can imagine. I usually just create the obvious indexes that I know I will be using often, and only add more if I can see that they are needed after taking performance measurements. The database can still use an index even if it doesn't cover all the columns in the query.

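MySQL generally uses only one index per table in a query (MySQL 5.0 and later can sometimes combine several with an index merge, but don't count on it). Fortunately, you can create an index covering multiple columns: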
ALTER TABLE widgets ADD INDEX name_and_price_index(widget_name, widget_price);
The above index will be used if you SELECT by widget_name or widget_name + widget_price (but not just widget_price).
As MitMaro points out, use EXPLAIN on a query to see what indexes MySQL has to choose from, as well as what index it ends up using. See here for even more details.
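Applied to the queries in the question, the check might look like the sketch below. It assumes the widgets table and the widget_price_name_index(widget_price, widget_name) index from the question exist. The comments follow MySQL's documented ORDER BY optimization rules, but the optimizer's actual choice depends on version and data, so verify with EXPLAIN on your own tables.

```sql
-- Range condition on the first index column, sort on the second.
-- The index can narrow the rows by price, but they come back ordered by
-- (price, name), not by name alone, so expect "Using filesort" in Extra.
EXPLAIN SELECT w.* FROM widgets AS w
WHERE w.widget_price > 100.00
ORDER BY w.widget_name ASC;

-- Equality on the first column: rows with the same price are already
-- stored in name order, so the index can serve both the filter and the sort.
EXPLAIN SELECT w.* FROM widgets AS w
WHERE w.widget_price = 100.00
ORDER BY w.widget_name ASC;

-- Leading wildcard: a B-tree index on widget_name cannot be used to
-- look up '%foobar%', whichever column order the index has.
EXPLAIN SELECT w.* FROM widgets AS w
WHERE w.widget_name LIKE '%foobar%'
ORDER BY w.widget_price ASC;
```

In the output, possible_keys lists the indexes MySQL considered and key shows the one it chose.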

Related

Use an index on a join without "where"

I'm trying to understand if it's possible to use an index on a join if there is no limiting where on the first table.
Note: this is not a line-by-line real-world case, just something I drafted for the sake of understanding. Please don't point out the obvious "what are you trying to obtain with this schema?", "you should use UNSIGNED" or the like, because that's not the question.
Note 2: the question "MySQL JOINS without where clause" is somewhat related but not the same.
Schema:
CREATE TABLE posts (
id_post INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
text VARCHAR(100)
);
CREATE TABLE related (
id_relation INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
id_post1 INT NOT NULL,
id_post2 INT NOT NULL
);
CREATE INDEX related_join_index ON related(id_post1) using BTREE;
Query:
EXPLAIN SELECT * FROM posts FORCE INDEX FOR JOIN(PRIMARY) INNER JOIN related ON id_post=id_post1 LIMIT 0,10;
SQL Fiddle: http://sqlfiddle.com/#!2/84597/3
As you can see, the index is being used on the second table, but the engine is doing a full table scan on the first one (the FORCE INDEX is there just to highlight the general question).
I'd like to understand if it's possible to get a "ref" on the left side too.
Thanks!
Update: if the first table has significantly more records than the second, things swap: the engine uses an index for the first one and a full table scan for the second http://sqlfiddle.com/#!2/3a3bb/1 Still, there is no way to get indexes used on both.
The DBMS has an optimizer to figure out the best plan to execute a query. It's up to the optimizer to decide whether to use an index or simply read the table directly.
An index makes sense when the DBMS expects to read only a few records from a table (say 1% of all rows). But once it expects to read many records (say 99% of all rows), it will not use the index. The threshold may lie as low as 5% (i.e. <= 5% -> index; > 5% -> table scan).
There are exceptions. One is when an index holds all the columns needed. Then the table itself doesn't have to be read at all. Another may be when the optimizer thinks index access will be faster in spite of having to read many rows. It's also always possible that the optimizer simply guesses wrong.
There is a page on the MySQL documentation about this subject.
Regarding the possibility to get a ref on the first table from the query, the short answer is NO.
The reason is obvious: because there is no WHERE clause ALL the rows from table posts are analyzed because they could be included in the result set. There is no reason to use an index for that, a full table scan is better because it gets all the rows; and because the order doesn't matter, the access is (more or less) sequential. Using an index requires reading more information from the storage (index and data).
MySQL will use the join type index if all the columns that appear in the SELECT clause are present in an index. In this case MySQL will perform a full index scan (join type index) instead of a full table scan (join type ALL) because it requires reading less information from the storage (an index is usually smaller than the entire table data).
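A sketch of the difference, assuming the posts and related tables and the related_join_index from the question. Whether the optimizer actually picks a full index scan depends on the storage engine, the MySQL version and the table statistics, so the comments describe what to look for rather than guaranteed output.

```sql
-- Every column of both tables is requested, so the table rows themselves
-- must be read for whichever table is scanned first (type = ALL).
EXPLAIN SELECT *
FROM posts
INNER JOIN related ON id_post = id_post1;

-- Only indexed columns are requested: id_post is the primary key and
-- id_post1 is in related_join_index. The scan can then run over an index
-- instead of the table data: look for type = index and "Using index".
EXPLAIN SELECT id_post, id_post1
FROM posts
INNER JOIN related ON id_post = id_post1;
```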

MySQL performance issues in SELECT

A simple database:
CREATE TABLE data (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
code VARCHAR(50),
value VARCHAR(10)
);
There are currently ~2 million rows.
Query like:
SELECT value FROM data WHERE `code`='12345';
executes for 10-12 seconds.
What is the best way to improve the performance of simple SELECT queries?
Create an index on the code column:
ALTER TABLE data ADD INDEX (code)
Add index to code. Also, is code always numeric (make it type int) and/or is it unique (make it unique or primary key)?
In some cases you may find that adding an index on the code column does not suffice, so if that doesn't work for you you would need to add a (single) index for both the code and value columns.
You can use EXPLAIN SELECT ... to ask MySQL for information about how it's going to execute your query. This would tell you that it needs to check every row. Improving this is a matter of adding an index on the code column which is used in your WHERE clause.
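The before-and-after check could look like this sketch, which assumes the data table from the question. The index names idx_code and idx_code_value are made up for illustration, and the expected values in the comments are what you would typically see, not guaranteed output.

```sql
-- Before: with no index on code, expect type = ALL (full table scan)
-- and a rows estimate close to the table size.
EXPLAIN SELECT value FROM data WHERE code = '12345';

-- Hypothetical index name; any unused name works.
ALTER TABLE data ADD INDEX idx_code (code);

-- After: expect type = ref and key = idx_code.
-- Keep the quotes around '12345': code is a VARCHAR, and comparing it
-- with a number would stop MySQL from using the index.
EXPLAIN SELECT value FROM data WHERE code = '12345';

-- Optional: an index holding both columns lets this query be answered
-- from the index alone (look for "Using index" in Extra).
ALTER TABLE data ADD INDEX idx_code_value (code, value);
EXPLAIN SELECT value FROM data WHERE code = '12345';
```

If you keep idx_code_value, idx_code becomes redundant, because the two-column index already starts with code.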
EXPLAIN
http://dev.mysql.com/doc/refman/5.0/en/explain-output.html
CREATE INDEX
http://dev.mysql.com/doc/refman/5.0/en/create-index.html

Fastest way to retrieve records from multiple tables

I need to retrieve columns from two tables and I have used an INNER JOIN. But it's taking a lot of time when loading the page. Is there a better and faster way to achieve the same?
Select P.Col1, P.Col2, P.Col3, P.Col4, P.Col5, C.Col1, C.Col2, C.Col3 from Pyalers P inner join Customers C on C.Col1 = P.Col1 where P.Col2 = 5
Thanks in Advance.
Without knowing your DDL, there's no way to say.
But conceptually this is OK; just be sure you have the proper indexes set.
For instance: (is your table name really 'Pyalers'? Assuming 'players')
CREATE INDEX idx_players ON `players` (col1);
CREATE INDEX idx_customers ON `customers` (col1);
Use the columns you need for joining the two tables.
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
You're doing it the right way, but if you don't have indexes on your tables on the correct columns, it's not going to be very fast for tables of any size. Do Pyalers.col1 and Customers.col1 both have indexes on them?
Show us how the tables are defined.
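Since the table definitions were never posted, the following is only a sketch under assumptions: the table is really called players, the column names are the ones in the query, and neither Col1 column is already a primary key. The index names are made up for illustration.

```sql
-- The WHERE clause filters players on Col2 and the join reads Col1,
-- so one composite index with Col2 first can serve both.
CREATE INDEX idx_players_col2_col1 ON players (Col2, Col1);

-- The join looks up customers by Col1.
-- Skip this if customers.Col1 is already the primary key or otherwise indexed.
CREATE INDEX idx_customers_col1 ON customers (Col1);

-- Check which indexes the optimizer actually picks.
EXPLAIN
SELECT P.Col1, P.Col2, P.Col3, P.Col4, P.Col5, C.Col1, C.Col2, C.Col3
FROM players P
INNER JOIN customers C ON C.Col1 = P.Col1
WHERE P.Col2 = 5;
```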
Be sure your table has the needed indexes. As a rule of thumb, every field that is used for searching (WHERE) or for joining data (INNER JOIN, LEFT JOIN, RIGHT JOIN) should be indexed.
Example: If you are creating a table, you can add your indexes at that time (notice that your tables should always have a primary key):
CREATE TABLE myTable (
myId int unsigned not null,
someField varchar(50),
primary key (myId),
index someIdx(someField)
);
If your table already exists, and you want to add indexes, you need to use the ALTER statement:
ALTER TABLE myTable
ADD INDEX someIdx(someField),
ADD PRIMARY KEY (myId);
Rules:
To define an index, specify the fields it includes and, ideally, a name that is unique within the table (MySQL generates one if you omit it): INDEX myIndex(field1, field2, ...)
There are different types of indexes: PRIMARY KEY is used for primary keys (that's obvious, huh?); INDEX is an 'ordinary index', just used to speed up search and join operations; UNIQUE INDEX is an index that prevents duplicate values.
Recommendations:
Whenever you can, index all numeric and date fields that are relevant (ids, birth date, etc.). Avoid creating indexes on fields that contain 'double' values.
Don't overuse indexes, because that can create very large index files.
Tips:
If you want to see how your query will be executed, you can use the EXPLAIN statement:
EXPLAIN SELECT a.*, b.* FROM a INNER JOIN b ON a.myId = b.otherId
This statement shows you the execution plan of the query. If in the last column (Extra) you see 'Using filesort' or 'Using temporary', you may (just may) need additional indexes (notice that if you use GROUP BY you will almost always get the 'Using temporary' message).
Hope this helps.

MySQL INDEXES - Adding multiple columns to one index

I am still getting my head around MySQL INDEXES... A quick question...
I have a table that stores a member's locations. It has member_id and location_id columns... I run a MySQL query to find all the locations for a specific member...
Would it be better to setup an INDEX like this:
ALTER TABLE `members_locations` ADD INDEX `member_location` ( `member_id` , `location_id` )
Or should I separate them like this:
ALTER TABLE `members_locations` ADD INDEX `member_id` ( `member_id` );
ALTER TABLE `members_locations` ADD INDEX `location_id` ( `location_id` );
Does it make any difference?
This article should be helpful.
Here's an example from it:
ALTER TABLE buyers ADD INDEX idx_name_age(first_name,last_name,age);
Here's another article showing the difference between using a multi-column index and several single-column indexes.
Well,
I guess it would be better to have one index, but it actually depends on how you query it.
If you have both columns (member_id, location_id) in the where clause, they must definitely go into one index.
If you query them independently, e.g. sometimes by member_id and sometimes by location_id only, you might consider two indexes. However, even in that case, one of those indexes should probably include the second column as well to support queries where both columns are present.
At the end, it all depends what queries you would like to tune.
Although not for MySQL, but for Oracle, my new Web-Book "Use The Index, Luke" describes this in detail. AFAIK all databases are rather similar in that respect.
http://use-the-index-luke.com/where-clause/the-equals-operator/concatenated-keys
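A sketch of both cases, assuming the members_locations table from the question with member_id and location_id columns. The literal ids are arbitrary, and the comments describe what to look for in EXPLAIN rather than guaranteed output.

```sql
-- Composite index from the question: member_id is the left-most column.
ALTER TABLE `members_locations` ADD INDEX `member_location` (`member_id`, `location_id`);

-- "All locations for a member": filters on the left-most column, and
-- location_id is in the same index, so the index alone can answer it
-- (look for key = member_location and "Using index").
EXPLAIN SELECT location_id FROM members_locations WHERE member_id = 42;

-- A lookup by location_id alone cannot seek into member_location,
-- so add a second index only if that query is actually common.
ALTER TABLE `members_locations` ADD INDEX `location_id` (`location_id`);
EXPLAIN SELECT member_id FROM members_locations WHERE location_id = 7;
```

With the composite index in place, a separate single-column index on member_id would be redundant.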

MYSQL: Index keyword in create table and when to use it

What does index keyword mean and what function it serves? I understand that it is meant to speed up querying, but I am not very sure how this can be done.
When and how should I choose the columns to be indexed?
A sample of index keyword usage is shown below in create table query:
CREATE TABLE `blog_comment`
(
`id` INTEGER NOT NULL AUTO_INCREMENT,
`blog_post_id` INTEGER,
`author` VARCHAR(255),
`email` VARCHAR(255),
`body` TEXT,
`created_at` DATETIME,
PRIMARY KEY (`id`),
INDEX `blog_comment_FI_1` (`blog_post_id`),
CONSTRAINT `blog_comment_FK_1`
FOREIGN KEY (`blog_post_id`)
REFERENCES `blog_post` (`id`)
)Type=MyISAM
;
I'd recommend reading How MySQL Uses Indexes from the MySQL Reference Manual. It states that indexes are used...
To find the rows matching a WHERE clause quickly.
To eliminate rows from consideration.
To retrieve rows from other tables when performing joins.
To find the MIN() or MAX() value for a specific indexed column.
To sort or group a table (under certain conditions).
To optimize queries using only indexes without consulting the data rows.
Indexes in a database work like an index in a book. You can find what you're looking for in a book quicker, because the index is listed alphabetically. Instead of an alphabetical list, MySQL uses B-trees to organize its indexes, which is quicker for its purposes (but would take a lot longer for a human).
Using more indexes means using up more space (as well as the overhead of maintaining the index), so it's only really worth using indexes on columns that fulfil the above usage criteria.
In your example, the id and blog_post_id columns both use indexes (PRIMARY KEY is an index too) so that the application can find rows by them quicker. In the case of id, this likely allows users to modify or delete a comment quickly, and in the case of blog_post_id, it lets the application quickly find all comments for a given post.
You'll notice that there is no index for the email column. This means that searching for all comments by a particular e-mail address would probably take quite a long time. If that kind of search is something you'd want to add, it might make sense to add an index on that column too.
This keyword means that you are creating an index on column blog_post_id along with the table.
Queries like this:
SELECT *
FROM blog_comment
WHERE blog_post_id = @id
will use this index to search on this field and run faster.
Also, there is a foreign key on this column.
When you decide to delete a blog post, the database will need to check this table to make sure no orphan comments are left. The index will also speed up this check, so queries like
DELETE
FROM blog_post
WHERE ...
will also run faster.