I have the following table:
CREATE TABLE Test (
device varchar(12),
pin varchar(4),
authToken varchar(32),
Primary Key (device)
);
At different points in the application I need to query this table by a different single-column clause. Meaning I have the following queries:
SELECT * FROM Test WHERE device = ?;
SELECT * FROM Test WHERE authToken = ?;
SELECT * FROM Test WHERE pin = ?;
As I understand it, in this scenario a combined index of (device, authToken, pin) makes no sense, because that would only speed up the first query, not the second or third.
Reading speed is more important than writing for this table, so would simply indexing each column individually be the optimal solution here?
The straightforward answer is to create separate single-column indexes for each query:
create index ix1 on Test (device); -- no need to create it since device is already the PK.
create index ix2 on Test (pin);
create index ix3 on Test (authToken);
The first query is served by the primary (clustered) index. The second and third ones can be slower since they suffer from "secondary index" slowness: the engine always needs to look up the secondary index first, then access the primary index to fetch the row; this can become slow if you are selecting a high number of rows.
Now, if you want to go overboard in terms of SELECT speed at the expense of slowness on modifications (INSERT, UPDATE, and DELETE), you can use "covering indexes" tailored to each query. These should look like:
create index ix4 on Test (device, pin, authToken); -- [not needed] optimal for WHERE device = ?
create index ix5 on Test (authToken, device, pin); -- optimal for WHERE authToken = ?
create index ix6 on Test (pin, device, authToken); -- optimal for WHERE pin = ?
Note: as indicated by Rick James, ix4 is redundant with the clustered primary key index that InnoDB tables already have. There's no need to create it; it's listed here only for completeness.
These "covering indexes" resolve the query from the secondary index alone, without accessing the primary index at all. They are much faster when a high number of rows is retrieved.
You don't need to index the device column, as it's already indexed (it's the primary key). For the other two columns (pin and authToken), yes: given the queries you shared, it's better to index each of them individually.
Please note that the performance improvement becomes significant when a high number of such queries hits the server and the table holds a large dataset.
To answer:
"How to index table for different single-column clauses?"
CREATE INDEX Test_device_index ON Test(device);
CREATE INDEX Test_authToken_index ON Test(authToken DESC);
CREATE INDEX Test_pin_index ON Test(pin);
Here's the schema I'd suggest:
CREATE TABLE Test (
id SERIAL PRIMARY KEY,
device VARCHAR(255),
pin VARCHAR(255),
authToken VARCHAR(255),
UNIQUE KEY index_authToken (authToken),
UNIQUE KEY index_device (device),
KEY index_pin (pin)
);
Here you have a surrogate id column that's not associated with any particular data, and you have UNIQUE constraints on authToken and device.
Remember to index any column used in a WHERE clause, and test your coverage with things like:
EXPLAIN SELECT ... FROM Test WHERE pin=?
If the plan shows a full table scan (type: ALL), you're missing an index.
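As a concrete sketch against the suggested schema (the literal value is only for illustration), the columns to watch in the plan are type and key:
EXPLAIN SELECT * FROM Test WHERE pin = '1234';
-- Expected with index_pin in place: type = ref, key = index_pin.
-- Without a usable index you would instead see type = ALL (full table scan) and key = NULL.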
It's also a good idea to use VARCHAR(255) as a default unless you have a very compelling reason to restrict it. Enforce length restrictions in your application layer, where they can easily be relaxed later. For example, changing from a 4-digit to a 6-digit PIN is a simple code change and can even be rolled out incrementally; it's not a schema alteration.
Related
I have a table:
Orders
* id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
* userid INT NOT NULL
* is_open TINYINT NOT NULL DEFAULT 1
* amount INT NOT NULL
* desc VARCHAR(255)
and the query SELECT * FROM orders WHERE userid = ? AND is_open = 1; that I run frequently. I would like to optimize the database for this query, and I currently have two options:
Move closed orders (is_open = 0) to a different table since currently open orders will be relatively smaller than closed orders thereby minimizing rows to scan on lookup
Set a unique key constraint: ALTER TABLE orders ADD CONSTRAINT UNIQUE KEY(id, userid);
I don't know how the latter will perform. I know the former will help performance, but I don't know whether it's a good approach in terms of best practices.
Any other ideas would be appreciated.
The table is of orders; there can be multiple open/closed orders for each userid.
WHERE userid = ? AND is_open = 1 would benefit from either of these 'composite' indexes: INDEX(userid, is_open) or INDEX(is_open, userid). The choice of which is better depends on what other queries might benefit from one more than the other.
Moving "closed" orders to another table is certainly a valid option. And it will help performance. (I usually don't recommend it, only because of the clumsy code needed to move rows and/or to search both tables in the few cases where that is needed.)
I see no advantage with UNIQUE(id, userid). Presumably id is already "unique" because of being the PRIMARY KEY? Also, in a composite index, the first column will be checked first; that is what the PK is already doing.
Another approach... The AUTO_INCREMENT PK leads to the data BTree being roughly chronological. But you usually reach into the table by userid? To make that more efficient, change PRIMARY KEY(id), INDEX(userid) to PRIMARY KEY(userid, id), INDEX(id). (However... without knowing the other queries touching this table, I can't say whether this will provide much overall improvement.)
This might be even better:
PRIMARY KEY(userid, is_open, id), -- to benefit many queries
INDEX(id) -- to keep AUTO_INCREMENT happy
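A sketch of that change as a single ALTER, assuming the current definition is PRIMARY KEY(id) with AUTO_INCREMENT (note that InnoDB rebuilds the whole table for this):
ALTER TABLE orders
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (userid, is_open, id),
    ADD INDEX (id);   -- id must remain indexed for AUTO_INCREMENT to keep working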
The cost of an additional index (on the performance of write operations) is usually more than compensated for by the speedup of Selects.
Setting a unique index on id and userid will gain you nothing, since id is already uniquely indexed as the primary key and doesn't feature in your query anyway.
Moving closed orders to a different table will give some performance improvement, but since the closed orders are probably distributed throughout the table, that performance improvement won't be as great as you might expect. It also carries an administrative overhead, requiring that orders be moved periodically, and additional complications with reporting.
Your best solution is likely to be to add an index on userid so that MySQL can go straight to the required userid and search only those rows. You might get a further boost by indexing on userid and is_open instead, but the additional benefit is likely to be small.
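A sketch of those two options (index names are illustrative):
ALTER TABLE orders ADD INDEX idx_userid (userid);
-- or, to also cover the is_open filter:
ALTER TABLE orders ADD INDEX idx_userid_open (userid, is_open);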
Bear in mind that each additional index incurs a performance penalty on every table update. This won't be a problem if your table is not busy.
Reading the MySQL docs we see this example table with multiple-column index name:
CREATE TABLE test (
id INT NOT NULL,
last_name CHAR(30) NOT NULL,
first_name CHAR(30) NOT NULL,
PRIMARY KEY (id),
INDEX name (last_name,first_name)
);
It is explained with examples in which cases the index will or will not be utilized. For example, it will be used for such query:
SELECT * FROM test
WHERE last_name='Widenius' AND first_name='Michael';
My question is, would it work for this query (which is effectively the same):
SELECT * FROM test
WHERE first_name='Michael' AND last_name='Widenius';
I couldn't find any word about that in the documentation - does MySQL try to swap columns to find appropriate index or is it all up to the query?
It should be the same, because (from the MySQL docs) the query optimizer works by looking at:
Each table index is queried, and the best index is used unless the
optimizer believes that it is more efficient to use a table scan. At
one time, a scan was used based on whether the best index spanned more
than 30% of the table, but a fixed percentage no longer determines the
choice between using an index or a scan. The optimizer now is more
complex and bases its estimate on additional factors such as table
size, number of rows, and I/O block size.
http://dev.mysql.com/doc/refman/5.7/en/where-optimizations.html
In some cases, MySQL can read rows from the index without even
consulting the data file.
and this should be your case
Without ICP, the storage engine traverses the index to locate rows in
the base table and returns them to the MySQL server which evaluates
the WHERE condition for the rows. With ICP enabled, and if parts of
the WHERE condition can be evaluated by using only fields from the
index, the MySQL server pushes this part of the WHERE condition down
to the storage engine. The storage engine then evaluates the pushed
index condition by using the index entry and only if this is satisfied
is the row read from the table. ICP can reduce the number of times the
storage engine must access the base table and the number of times the
MySQL server must access the storage engine.
http://dev.mysql.com/doc/refman/5.7/en/index-condition-pushdown-optimization.html
For the two queries you stated, it will work the same.
However, for queries which have only one of the columns, the order of the index matters.
For example, this will use the index:
SELECT * FROM test WHERE last_name='Widenius';
But this won't:
SELECT * FROM test WHERE first_name='Michael';
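You can check this yourself with EXPLAIN; the key column of the plan shows whether the name index is chosen:
EXPLAIN SELECT * FROM test WHERE last_name='Widenius';   -- expected: key = name
EXPLAIN SELECT * FROM test WHERE first_name='Michael';   -- expected: key = NULL (full table scan)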
I'm trying to understand whether it's possible to use an index on a join if there is no limiting WHERE clause on the first table.
Note: this is not a line-by-line real-world case, just something I drafted together for understanding purposes. Don't point out the obvious "what are you trying to obtain with this schema?", "you should use UNSIGNED" or the like, because that's not the question.
Note 2: the question "MySQL JOINS without where clause" is somewhat related, but not the same.
Schema:
CREATE TABLE posts (
id_post INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
text VARCHAR(100)
);
CREATE TABLE related (
id_relation INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
id_post1 INT NOT NULL,
id_post2 INT NOT NULL
);
CREATE INDEX related_join_index ON related(id_post1) using BTREE;
Query:
EXPLAIN SELECT * FROM posts FORCE INDEX FOR JOIN(PRIMARY) INNER JOIN related ON id_post=id_post1 LIMIT 0,10;
SQL Fiddle: http://sqlfiddle.com/#!2/84597/3
As you can see, the index is being used on the second table, but the engine is doing a full table scan on the first one (the FORCE INDEX is there just to highlight the general question).
I'd like to understand if it's possible to get a "ref" on the left side too.
Thanks!
Update: if the first table has significantly more records than the second, things swap: the engine uses an index for the first one and a full table scan for the second (http://sqlfiddle.com/#!2/3a3bb/1). Still, there is no way to get indexes used on both.
The DBMS has an optimizer to figure out the best plan to execute a query. It's up to the optimizer to decide whether to use an index or simply read the table directly.
An index makes sense when the DBMS expects to read only a few records from a table (say, 1% of all rows). But once it expects to read many records (say, 99% of all rows), it will not use the index. The threshold may lie as low as 5% (i.e. <= 5% -> index; > 5% -> table scan).
There are exceptions. One is when an index holds all the columns needed; then the table itself doesn't have to be read at all. Another is when the optimizer thinks an index access may turn out faster in spite of having to read many rows. It's also always possible that the optimizer simply guesses wrong.
There is a page in the MySQL documentation about this subject.
Regarding the possibility of getting a "ref" on the first table for this query, the short answer is no.
The reason is simple: because there is no WHERE clause, ALL the rows from the posts table have to be examined, since any of them could be included in the result set. There is no reason to use an index for that; a full table scan is better because it reads all the rows anyway, and because the order doesn't matter the access is (more or less) sequential. Using an index would require reading more information from storage (index and data).
MySQL will use the join type "index" if all the columns the query needs from a table are present in an index. In that case MySQL performs a full index scan (join type "index") instead of a full table scan (join type "ALL"), because it requires reading less information from storage (an index is usually smaller than the entire table data).
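As an illustration against the example schema (the index name is made up, and whether the optimizer actually prefers the index scan still depends on table statistics), an index holding every posts column the query needs makes the "index" join type possible:
-- Covers both columns of posts that the query reads:
CREATE INDEX posts_covering ON posts(id_post, text);
-- The posts side of the plan can now be satisfied by a full index scan ("index")
-- instead of a full table scan ("ALL"):
EXPLAIN SELECT id_post, text, id_post1, id_post2
FROM posts INNER JOIN related ON id_post = id_post1 LIMIT 0,10;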
I've been thinking about my database indexes lately. In the past I just kind of nonchalantly threw them in as an afterthought, and never put much thought into whether they are correct or even helping. I've read conflicting information: some say that more indexes are better and others that too many indexes are bad, so I'm hoping to get some clarification and learn a bit here.
Let's say I have this hypothetical table:
CREATE TABLE widgets (
widget_id INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
widget_name VARCHAR(50) NOT NULL,
widget_part_number VARCHAR(20) NOT NULL,
widget_price FLOAT NOT NULL,
widget_description TEXT NOT NULL
);
I would typically add an index for fields that will be joined and fields that will be sorted on most often:
ALTER TABLE widgets ADD INDEX widget_name_index(widget_name);
So now, in a query such as:
SELECT w.* FROM widgets AS w ORDER BY w.widget_name ASC
The widget_name_index is used to sort the resultset.
Now if I add a search parameter:
SELECT w.* FROM widgets AS w
WHERE w.widget_price > 100.00
ORDER BY w.widget_name ASC
I guess I need a new index.
ALTER TABLE widgets ADD INDEX widget_price_index(widget_price);
But will it use both indexes? As I understand it, it won't...
ALTER TABLE widgets ADD INDEX widget_price_name_index(widget_price, widget_name);
Now widget_price_name_index will be used to both select and order the records. But what if I want to turn it around and do this:
SELECT w.* FROM widgets AS w
WHERE w.widget_name LIKE '%foobar%'
ORDER BY w.widget_price ASC
Will widget_price_name_index be used for this? Or do I need a widget_name_price_index also?
ALTER TABLE widgets ADD INDEX widget_name_price_index(widget_name, widget_price);
Now what if I have a search box that searches widget_name, widget_part_number and widget_description?
ALTER TABLE widgets
ADD INDEX widget_search(widget_name, widget_part_number, widget_description);
And what if end users can sort by any column? It's easy to see how I could end up with more than a dozen indexes for a mere 5 columns.
If we add another table:
CREATE TABLE specials (
special_id INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
widget_id INT UNSIGNED NOT NULL,
special_title VARCHAR(100) NOT NULL,
special_discount FLOAT NOT NULL,
special_date DATE NOT NULL
);
ALTER TABLE specials ADD INDEX specials_widget_id_index(widget_id);
ALTER TABLE specials ADD INDEX special_title_index(special_title);
SELECT w.widget_name, s.special_title
FROM widgets AS w
INNER JOIN specials AS s ON w.widget_id=s.widget_id
ORDER BY w.widget_name ASC, s.special_title ASC
I am assuming this will use specials_widget_id_index and the widgets.widget_id primary key index for the join, but what about the sorting? Will it use both widget_name_index and special_title_index?
I don't want to ramble on too long; there are an endless number of scenarios I could conjure up. Obviously this can get much more complex with real-world scenarios rather than a couple of simple tables. Any clarification would be appreciated.
As a best practice, you do not have to create indexes while defining the table schema. It is usually better to create an index as you write the queries in your application. In most cases, you will start with a single-column index to satisfy a query. If a query needs many columns, you can create a covering index.
A covering index is an index that contains all the columns a query needs (it usually has two or more columns). If the index satisfies all the column requirements of a query, the storage engine can obtain the entire result from the index instead of having to read the table rows as well. So, when writing a query that uses more columns, you can either create a new index covering all the required columns, or extend an existing index to include more columns.
You have to take a few things into consideration when doing either of the above. MySQL can use a multi-column index only when the left-most column of the index is used in the query; otherwise it falls back to scanning the whole table. So if you can extend an existing index without hurting the queries that already use it, that is the wiser choice; otherwise, go ahead and create a new index for the new query. Sometimes the queries themselves can be adjusted to fit the index structure.
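For example, since an index can't be widened in place, "extending" it really means replacing it with a broader one whose left-most column stays the same (index names here follow the ones used in the question):
ALTER TABLE widgets
    DROP INDEX widget_name_index,
    ADD INDEX widget_name_price_index (widget_name, widget_price);
-- The new index still serves queries that filter or sort on widget_name alone,
-- and additionally covers queries that use widget_name together with widget_price.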
An index speeds up selects, but slows down inserts and updates. You don't need to create an index for every possible combination of columns you can imagine. I usually just create the obvious indexes that I know I will be using often, and only add more if I can see that they are needed after taking performance measurements. The database can still use an index even if it doesn't cover all the columns in the query.
MySQL generally uses only one index per table in a query. Fortunately, you can create an index covering multiple columns:
ALTER TABLE widgets ADD INDEX name_and_price_index(widget_name, widget_price);
The above index will be used if you SELECT by widget_name or widget_name + widget_price (but not just widget_price).
As MitMaro points out, use EXPLAIN on a query to see what indexes MySQL has to choose from, as well as what index it ends up using. See here for even more details.
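For instance, for the filtered and sorted query from the question, the plan shows both the candidates and the final choice:
EXPLAIN SELECT w.* FROM widgets AS w
WHERE w.widget_price > 100.00
ORDER BY w.widget_name ASC;
-- possible_keys lists the indexes the optimizer considered, key shows the one it picked,
-- and "Using filesort" in Extra means the ORDER BY is not being satisfied by an index.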
What does the INDEX keyword mean, and what function does it serve? I understand that it is meant to speed up querying, but I am not very sure how this is done.
And how do I choose which columns to index?
A sample of INDEX keyword usage is shown below in a CREATE TABLE query:
CREATE TABLE `blog_comment`
(
`id` INTEGER NOT NULL AUTO_INCREMENT,
`blog_post_id` INTEGER,
`author` VARCHAR(255),
`email` VARCHAR(255),
`body` TEXT,
`created_at` DATETIME,
PRIMARY KEY (`id`),
INDEX `blog_comment_FI_1` (`blog_post_id`),
CONSTRAINT `blog_comment_FK_1`
FOREIGN KEY (`blog_post_id`)
REFERENCES `blog_post` (`id`)
) ENGINE=MyISAM
;
I'd recommend reading How MySQL Uses Indexes from the MySQL Reference Manual. It states that indexes are used...
To find the rows matching a WHERE clause quickly.
To eliminate rows from consideration.
To retrieve rows from other tables when performing joins.
To find the MIN() or MAX() value for a specific indexed column.
To sort or group a table (under certain conditions).
To optimize queries using only indexes without consulting the data rows.
Indexes in a database work like the index in a book. You can find what you're looking for in a book more quickly because the index is sorted alphabetically. Instead of an alphabetical list, MySQL uses B-trees to organize its indexes, which is quicker for its purposes (but would take a lot longer for a human).
Using more indexes means using up more space (as well as the overhead of maintaining the index), so it's only really worth using indexes on columns that fulfil the above usage criteria.
In your example, the id and blog_post_id columns both use indexes (a PRIMARY KEY is an index too) so that the application can find rows by them quickly. In the case of id, this likely allows users to modify or delete a comment quickly, and in the case of blog_post_id, it lets the application quickly find all comments for a given post.
You'll notice that there is no index on the email column. This means that searching for all comments by a particular e-mail address would probably take quite a long time. If searching comments by e-mail address is something you'd want to add, it might make sense to add an index on that column too.
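If that lookup does become common, the addition could look like this (the index name is illustrative):
ALTER TABLE blog_comment ADD INDEX blog_comment_email_index (email);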
This keyword means that you are creating an index on the blog_post_id column together with the table.
Queries like this:
SELECT *
FROM blog_comment
WHERE blog_post_id = #id
will use this index to search on this field and run faster.
Also, there is a foreign key on this column.
When you decide to delete a blog post, the database will need to check this table to make sure there are no orphaned comments. The index also speeds up that check, so queries like
DELETE
FROM blog_post
WHERE ...
will also run faster.