mysql multiple index question - mysql

I have a table (users) with these columns:
id INT AUTO_INCREMENT PRIMARY KEY
uid INT index
email CHAR(128) UNIQUE
activated TINYINT
And I'll need to query this table like this:
SELECT * FROM users WHERE uid = ? AND activated = 1
My question is: since there's an index on the 'uid' column, do I need to add another index on the 'activated' column to get the best performance for the above query? This table (which will be a big one) will be heavily accessed by 'INSERT' and 'UPDATE' statements as well as 'SELECT' ones.
I've learned from other sources that indexes work against 'INSERT' and 'UPDATE' performance, so if the index on the uid column is enough for the query above, I'd rather not add another index on activated, for the sake of insert and update performance.

MySQL will usually use only one index per table in a query (the index merge optimization is the exception), so a separate index on activated is unlikely to help.
However, if you want really optimal performance, define your index on both columns in this order (i.e. one index across two columns):
index_name (uid, activated)
That will allow optimized lookups of just uid, or uid AND activated.
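A minimal sketch of the leftmost-prefix behaviour described above. It uses Python's built-in sqlite3 module purely as a stand-in so it runs anywhere; MySQL's optimizer and its EXPLAIN output differ in detail, and the index name idx_uid_activated is made up for the illustration.

```python
import sqlite3

# SQLite is only a stand-in here; MySQL's EXPLAIN output looks different.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " uid INTEGER, email TEXT UNIQUE, activated INTEGER)"
)
# Hypothetical index name; the column order (uid, activated) is the point.
conn.execute("CREATE INDEX idx_uid_activated ON users (uid, activated)")


def plan(sql, params=()):
    """Return the detail strings of SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


# Both columns: the composite index can be searched on uid and activated.
both = plan("SELECT * FROM users WHERE uid = ? AND activated = 1", (42,))
# Leading column only: the same index still applies.
uid_only = plan("SELECT * FROM users WHERE uid = ?", (42,))
# Trailing column only: the index cannot be used for a lookup.
activated_only = plan("SELECT * FROM users WHERE activated = 1")

print(both, uid_only, activated_only, sep="\n")
```

The third plan is the reason the column order matters: a query that filters on activated alone gets no lookup benefit from this index.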

It depends upon your data distribution and the selectivity of uid versus the selectivity of uid and activated together. If you have lots of unique values of uid, so that searching for uid = x only returns a few rows, then including activated in the index would provide little value. Whereas if uid = x returns lots of rows and uid = x AND activated = 1 returns few rows, then there's value in the index.
It's hard to provide a specific answer without knowing the data distribution.
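One rough way to check this on your own data is to count how many rows each predicate matches. The sketch below runs against an in-memory SQLite table with a made-up distribution (uid 1 has 1000 rows, 10 of them activated; every other uid has 2 rows); the same COUNT(*) queries can be run against MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, uid INTEGER, activated INTEGER)"
)

# Invented data: uid 1 is a "heavy" uid, uids 2..101 have two rows each.
rows = [(1, 1 if i < 10 else 0) for i in range(1000)]
rows += [(uid, flag) for uid in range(2, 102) for flag in (0, 1)]
conn.executemany("INSERT INTO users (uid, activated) VALUES (?, ?)", rows)


def matches(uid):
    """Return (rows matching uid, rows matching uid AND activated = 1)."""
    by_uid = conn.execute(
        "SELECT COUNT(*) FROM users WHERE uid = ?", (uid,)
    ).fetchone()[0]
    by_both = conn.execute(
        "SELECT COUNT(*) FROM users WHERE uid = ? AND activated = 1", (uid,)
    ).fetchone()[0]
    return by_uid, by_both


print(matches(1))  # large gap: adding activated to the index narrows a lot
print(matches(2))  # small gap: uid alone is already selective
```

If most uids look like the second case, the extra column in the index buys little; if they look like the first, it is worth having.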

Creating the index won't make your selects slower.
However, it will make them significantly faster only if you search for the unlikely value.
This index will only be useful if the majority of your accounts are activated and you search for non-activated ones, or the other way round: the majority of your accounts are non-activated and you search for activated ones.
Creating this index will also improve UPDATE and DELETE concurrency: without this index, all accounts (both activated and not-activated) for a given uid will be locked for the duration of UPDATE operation in InnoDB.
However, an additional index will of course hamper the DML performance.

Related

mysql query optimizer uses index with select id, but not with select *

I have the following query:
select * from `tracked_employments`
where `tracked_employments`.`file_id` = 10006000
and `tracked_employments`.`user_id` = 1003230
and `tracked_employments`.`can_be_sent` = 1
and `tracked_employments`.`type` = 'jobchange'
and `tracked_employments`.`file_type` = 'file'
order by `tracked_employments`.`id` asc
limit 1000
offset 2000;
and this index
explain tells me that it does not use the index, but when I replace * with id it does use it. Why does it make a difference what columns I select?
Both you and Akina have misconceptions about how InnoDB indexing works.
Let me explain the two ways that that query may be executed.
Case 1. Index is used.
This assumes the datatypes, etc, all match the 5-column composite index that seems to exist on the table. Note: because all the tests are for =, the order of the columns in the WHERE clause and the INDEX does not matter.
In InnoDB, id (or whatever column(s) are in the PRIMARY KEY) is implicitly added onto the end of each secondary index.
The lookup will go directly (in the Index's BTree) to the first row that matches all 5 tests. From there, it will scan forward. Each 'row' in the index has the PK, so it can reach over into the data's BTree to find any other columns needed for * (cf SELECT *).
But, it must skip over 2000 rows before delivering the 1000 that are desired. This is done by actually stepping over each one, one at a time. That is, OFFSET is not necessarily fast.
Case 2. Don't bother with the index.
This happens based on some nebulous analysis of the 3000 rows that need to be touched and the size of the table.
The rationale behind possibly scanning the table without using the index is that the bouncing between the index BTree and the data BTree may be more costly than simply scanning the data BTree. Note that the data BTree is already in the desired order -- namely by id. (Assuming that is the PK.) That avoids a sort of up to 1000 rows.
Also, certain datatype issues may prevent the use of the index.
I do need to ask what the client will do with 1000 rows all at once. If it is a web page, that seems awfully big.
Case 3 -- Just SELECT id
In this case, all the info is available in the index, so there is no need to reach into the data's BTree.
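The difference between Case 1 and Case 3 can be seen in a small experiment. This is only a sketch: it uses SQLite (via Python's sqlite3) instead of MySQL/InnoDB, the index name idx_five and the payload column are invented, and the five-column index is an assumption since the original index definition is not shown. SQLite, like InnoDB, stores the row's key in every secondary index entry, so SELECT id can be answered from the index alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE tracked_employments (
           id INTEGER PRIMARY KEY,
           file_id INTEGER, user_id INTEGER, can_be_sent INTEGER,
           "type" TEXT, file_type TEXT,
           payload TEXT)"""  # payload is a made-up non-indexed column
)
# Assumed composite index on the five filtered columns.
conn.execute(
    """CREATE INDEX idx_five ON tracked_employments
           (file_id, user_id, can_be_sent, "type", file_type)"""
)

WHERE = (
    "WHERE file_id = 10006000 AND user_id = 1003230 AND can_be_sent = 1 "
    "AND \"type\" = 'jobchange' AND file_type = 'file'"
)


def plan(select_list):
    """Return SQLite's plan detail strings for the given select list."""
    sql = f"EXPLAIN QUERY PLAN SELECT {select_list} FROM tracked_employments {WHERE}"
    return [row[3] for row in conn.execute(sql)]


id_plan = plan("id")   # everything needed lives in the index
star_plan = plan("*")  # payload forces a visit to the table rows

print(id_plan)
print(star_plan)
```

With SELECT * the engine has to weigh the cost of bouncing between the index and the table for every row, which is where the optimizer's choice in Case 2 comes from.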

MySQL covering index optimization? [duplicate]

I've just heard the term covered index in some database discussion - what does it mean?
A covering index is an index that contains all of, and possibly more, the columns you need for your query.
For instance, this:
SELECT *
FROM tablename
WHERE criteria
will typically use indexes to speed up the resolution of which rows to retrieve using criteria, but then it will go to the full table to retrieve the rows.
However, if the index contained the columns column1, column2 and column3, then this sql:
SELECT column1, column2
FROM tablename
WHERE criteria
can be answered from the index alone. Provided that particular index can be used to resolve which rows to retrieve, it already contains the values of the columns you're interested in, so the engine won't have to go to the table to retrieve the rows and can produce the results directly from the index.
This also applies when a typical query uses one or two columns to resolve which rows to fetch and then reads another one or two columns: it can be beneficial to append those extra columns to the index (if the same ones are needed throughout), so that the query processor can get everything from the index itself.
Here's an article: Index Covering Boosts SQL Server Query Performance on the subject.
A covering index is just an ordinary index. It's called "covering" if it can satisfy a query without having to read the table data.
example:
CREATE TABLE MyTable
(
ID INT IDENTITY PRIMARY KEY,
Foo INT
)
CREATE NONCLUSTERED INDEX index1 ON MyTable(ID, Foo)
SELECT ID, Foo FROM MyTable -- All requested data are covered by index
This is one of the fastest methods to retrieve data from SQL server.
Covering indexes are indexes which "cover" all columns needed from a specific table, removing the need to access the physical table at all for a given query/ operation.
Since the index contains the desired columns (or a superset of them), table access can be replaced with an index lookup or scan -- which is generally much faster.
Columns to cover:
parameterized or static conditions; columns restricted by a parameterized or constant condition.
join columns; columns dynamically used for joining
selected columns; to answer selected values.
While covering indexes can often provide good benefit for retrieval, they do add somewhat to insert/ update overhead; due to the need to write extra or larger index rows on every update.
Covering indexes for Joined Queries
Covering indexes are probably most valuable as a performance technique for joined queries. This is because joined queries are more costly and more likely than single-table retrievals to suffer serious performance problems.
in a joined query, covering indexes should be considered per-table.
each 'covering index' removes a physical table access from the plan & replaces it with index-only access.
investigate the plan costs & experiment with which tables are most worthwhile to replace by a covering index.
by this means, the multiplicative cost of large join plans can be significantly reduced.
For example:
select poi.title, c.name, c.address
from porderitem poi
join porder po on po.id = poi.fk_order
join customer c on c.id = po.fk_customer
where po.orderdate > ? and po.status = 'SHIPPING';
create index porder_custitem on porder (orderdate, id, status, fk_customer);
See:
http://literatejava.com/sql/covering-indexes-query-optimization/
Let's say you have a simple table with the columns below, and you have only indexed Id:
Id (Int), Telephone_Number (Int), Name (VARCHAR), Address (VARCHAR)
Imagine you have to run the query below and check whether it uses an index and whether it runs efficiently, without extra I/O calls. Remember, you have only created an index on Id.
SELECT Id FROM mytable WHERE Telephone_Number = '55442233';
When you check the performance of this query you will be disappointed: since Telephone_Number is not indexed, the rows have to be fetched from the table using I/O calls. So this is not a covering index, since a column in the query is not indexed, which leads to frequent I/O calls.
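To make it a covering index you need to create a composite index that leads with the filtered column, for example (Telephone_Number, Id); with (Id, Telephone_Number) the index could not be used to seek on Telephone_Number.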
For more details, please refer to this blog:
https://www.percona.com/blog/2006/11/23/covering-index-and-prefix-indexes/

MySQL database performance select on large/huge table

I'm using a MySQL database and have to perform some select queries on large/huge tables (e.g. 267,736 rows and 30 columns).
Query details:
Only select queries (the data in the table is fixed, never an update, insert or delete)
Select query on all the columns (business requirement)
Mostly limit the number of rows (LIMIT 10 to all rows -> user can choose)
Could be ordered by one or multiple columns (creation of indexes here will not help since the user can order by any column he likes)
Could be filtered by a value the user chooses (where filter on one or more columns)
Currently the queries take up to 2 seconds, which is too long.
Is there a way to speed them up?
Which storage engine should I use: InnoDB/MyISAM/...
Should I have a primary key, even if I will never use it?
...?
You should (must actually) use indexes.
Create indexes on all columns with which WHERE or ORDER BY is going to be used. Also study and use EXPLAIN to see the impact of the indexes and to optimize your queries.
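To see what an index does for ORDER BY, compare the plan before and after creating it. The sketch below uses SQLite through Python's sqlite3 only because it is self-contained; in MySQL you would run EXPLAIN on the same query and look for "Using filesort" disappearing. The table and column names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the large read-only table.
conn.execute(
    "CREATE TABLE big_table (id INTEGER PRIMARY KEY, created INTEGER, label TEXT)"
)

QUERY = "SELECT * FROM big_table ORDER BY created LIMIT 10"


def plan(sql):
    """Return SQLite's plan detail strings for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = plan(QUERY)  # no index: rows must be sorted for the ORDER BY
conn.execute("CREATE INDEX idx_created ON big_table (created)")
after = plan(QUERY)   # index on created: rows are read in sorted order

print(before)
print(after)
```

With LIMIT 10 the indexed version can stop after ten index entries instead of sorting the whole table, which is usually where most of the time goes.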
You don't have to create a primary key if there is no column with unique data in your table, but it is very likely that you do have such a column (id, time...). In this case you should use primary key to filter your queries.
Number of columns in the query has close to no impact on SELECT speed.
As long as you make "Only select queries" storage engine does not matter either. MyISAM might be a bit faster, but InnoDB has many features you will need when you decide that your "Only select queries" rule must be broken.

Ensure certain default sort order in MySql table

I have a large MySql table with over 11 million rows. This is just a huge data set and my task is to be able to analyze the dataset based on certain rules.
Each row belongs to a certain category. There are 2 million different categories. I want to get all rows for a category and perform operations on that.
So currently, I do the following:
1. Select distinct categories from the table.
2. For each category: Select fields from table WHERE category=category
Even though my category column is indexed, it takes a really long time to execute Step 2. This is mainly because of the huge data set.
Alternatively, I can use GROUP BY clause, however I am not sure if it will be as fast since GROUP BY on such a huge dataset may be expensive, especially when considering that I will be running my analysis several times on parts of the dataset. A way to permanently ensure a sorted table would be useful.
Therefore as an alternative, I can speed up my queries if only my table is pre-sorted by category. Now I can just read the table row by row and perform the same operations in a much faster time, as all rows of one category will be fetched consecutively.
The dataset (MySql table) is fixed and no update, delete, or insert operations will be performed on it, so I want a way to maintain a default sort order by category. Can anyone suggest a trick to ensure the default sort order of the rows?
Maybe read all rows and rewrite them to a new table or add a new primary key which ensures this order?
Even though my category column is indexed
Indexed by a secondary index? If so, you can encounter the following performance problems:
InnoDB tables are always clustered and the secondary index in clustered table can require a double-lookup (see the "Disadvantages of clustering" in this article).
Indexed rows can be scattered all over the place (index can have bad clustering factor - the link is for Oracle but the principle is the same). If so, an index range scan (such as WHERE category = whatever) can end-up loading many table pages, even though the index is actually used and only a small subset of rows is actually selected. This can destroy the range scan performance.
As an alternative to the secondary index, consider using a natural primary key, which in InnoDB tables also acts as a clustering key. A primary/clustering key such as {category, no} will keep the rows of the same category physically close together, making both of your queries (and especially the second one) maximally efficient.
OTOH, if you want to keep the secondary index, consider covering all the fields that you query, so the primary B-Tree doesn't have to be touched at all.
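As a rough illustration of the clustering-key idea, here is a sketch using SQLite's WITHOUT ROWID tables, which store rows in primary-key order much like an InnoDB clustered index does. This is an analogy, not InnoDB itself; the column is called item_no instead of no, and the payload column is made up.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID: the table itself is a B-tree ordered by (category, item_no).
conn.execute(
    "CREATE TABLE items ("
    " category INTEGER NOT NULL, item_no INTEGER NOT NULL, payload TEXT,"
    " PRIMARY KEY (category, item_no)) WITHOUT ROWID"
)

# Insert rows in random order; storage order follows the key, not arrival.
rows = [(c, n, f"row {c}-{n}") for c in range(3) for n in range(4)]
random.Random(0).shuffle(rows)
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)


def rows_for(category):
    """Fetch one category; its rows sit next to each other in the B-tree."""
    cursor = conn.execute(
        "SELECT item_no FROM items WHERE category = ? ORDER BY item_no",
        (category,),
    )
    return [row[0] for row in cursor]


print(rows_for(1))
```

In MySQL the equivalent would be declaring PRIMARY KEY (category, no) on an InnoDB table, so a read of one category touches a contiguous run of pages instead of pages scattered across the table.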
You can do this in one step regardless of indexing by doing something like (pseudo code):
Declare #LastCategory int = Null
Declare #Category int
For Each Row In
    Select
        #Category = Category,
        ...
    From
        Table
    Order By
        Category
    If #LastCategory Is Null Or #LastCategory != #Category
        Do any "New Category Steps"
        Set #LastCategory = #Category
    End
    Process Row
End For
With the index on category I'd expect this to perform OK. Your performance issues may be down to what you are doing when processing each row.
Here's an example: http://sqlfiddle.com/#!2/e53c98/1
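The single-pass idea can also be written client-side. A minimal sketch in Python, assuming the per-category work is a simple sum (the items table, its value column, and the sample rows are invented): one ordered query is streamed through itertools.groupby, which starts a new group each time the category changes.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (category INTEGER, value INTEGER)")
conn.execute("CREATE INDEX idx_category ON items (category)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(2, 10), (1, 5), (2, 7), (3, 1), (1, 4)],
)


def totals_per_category(connection):
    """Process all rows in one ordered pass instead of one query per category."""
    cursor = connection.execute(
        "SELECT category, value FROM items ORDER BY category"
    )
    result = {}
    # groupby relies on the rows arriving sorted by category.
    for category, rows in groupby(cursor, key=itemgetter(0)):
        result[category] = sum(value for _, value in rows)
    return result


totals = totals_per_category(conn)
print(totals)
```

This replaces millions of small per-category queries with one ordered scan, and only one category's rows need to be held at a time.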

SELECT vs UPDATE performance with index

If I SELECT IDs then UPDATE using those IDs, then the UPDATE query is faster than if I would UPDATE using the conditions in the SELECT.
To illustrate:
SELECT id FROM table WHERE a IS NULL LIMIT 10; -- 0.00 sec
UPDATE table SET field = value WHERE id IN (...); -- 0.01 sec
The above is about 100 times faster than an UPDATE with the same conditions:
UPDATE table SET field = value WHERE a IS NULL LIMIT 10; -- 0.91 sec
Why?
Note: the a column is indexed.
Most likely the second UPDATE statement locks many more rows, while the first one uses a unique key and locks only the rows it's going to update.
The two queries are not identical. You only know that the IDs are unique in the table.
UPDATE ... LIMIT 10 will update at most 10 records.
UPDATE ... WHERE id IN (SELECT ... LIMIT 10) may update more than 10 records if there are duplicate ids.
I don't think there can be one straightforward answer to your "why?" without doing some sort of analysis and research.
SELECT queries may be cached (by the query cache, which exists in MySQL versions before 8.0), which means that if you run the same SELECT query multiple times, the first execution normally takes longer than the following ones. Note that this is only noticeable where the SELECT is heavy, not where even the first SELECT is fast. So, in your example it might be that the SELECT took 0.00s because of caching. The UPDATE queries use different WHERE clauses, so it is likely that their execution times differ.
Although column a is indexed, MySQL does not necessarily use the index when doing the SELECT or the UPDATE. Please study the EXPLAIN outputs. Also, see the output of SHOW INDEX and check whether the "Comment" column reads "disabled" for any index. You may read more here - http://dev.mysql.com/doc/refman/5.0/en/show-index.html and http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html.
Also, if we ignore the SELECT for a while and focus only on the UPDATE queries, it is obvious that they aren't both using the same WHERE condition - the first one runs on the id column and the latter on a. Although both columns are indexed, that does not necessarily mean that all the table's indexes perform alike. It is possible that one index is more efficient than another depending on the size of the index, the datatype of the indexed column, or whether it is a single- or multiple-column index. There sure might be other reasons but I ain't an expert on it.
Also, I think that the second UPDATE is doing more work in the sense that it might be putting more row-level locks compared to the first UPDATE. It is true that both UPDATES are finally updating the same number of rows. But where in the first update, it is 10 rows that are locked, I think in the second UPDATE, all rows with a as NULL (which is more than 10) are locked before doing the UPDATE. Perhaps MySQL first applies the locking and then runs the LIMIT clause to update only limited records.
Hope the above explanation makes sense!
Do you have a composite index or separate indexes?
If it is a composite index of id and a columns,
In the 2nd UPDATE statement the a column's index would not be used. The reason is that only the leftmost prefix of a composite index can be used (unless a is the PRIMARY KEY).
So if you want the a column's index to be used, you need to include id in your WHERE clause as well, with id first, then a.
Also it depends on what storage engine you are using since MySQL does indexes at the engine level, not server.
You can try this:
UPDATE table SET field = value WHERE id IN (...) AND a IS NULL LIMIT 10;
By doing this id is in the left most index followed by a
Also from your comments, the lookups are much faster because if you are using InnoDB, updating columns would mean that the InnoDB storage engine would have to move indexes to a different page node, or have to split a page if the page is already full, since InnoDB stores indexes in sequential order. This process is VERY slow and expensive, and gets even slower if your indexes are fragmented, or if your table is very big
The comment by Michael J.V is the best description. This answer assumes a is a column that is not indexed and 'id' is.
The WHERE clause in the first UPDATE command is working off the primary key of the table, id
The WHERE clause in the second UPDATE command is working off a non-indexed column. This makes finding the rows to be updated significantly slower.
Never underestimate the power of indexes. A table will perform better if the indexes are used correctly than a table a tenth the size with no indexing.
Regarding "MySQL doesn't support updating the same table you're selecting from"
UPDATE table SET field = value
WHERE id IN (SELECT id FROM table WHERE a IS NULL LIMIT 10);
Just do this:
UPDATE table SET field = value
WHERE id IN (select id from (SELECT id FROM table WHERE a IS NULL LIMIT 10) AS tmp);
The accepted answer seems right but is incomplete; there are major differences.
As far as I understand, and I'm not a SQL expert:
In the first query you SELECT N rows and UPDATE them using the primary key.
That's very fast as you have direct access to all rows based on the fastest possible index.
In the second query you UPDATE N rows using LIMIT.
That will lock all rows and release them again after the update is finished.
The big difference is that you have a RACE CONDITION in case 1) and an atomic UPDATE in case 2)
If you have two or more simultaneous calls of the case 1) query, you can end up selecting the SAME ids from the table.
Both calls will update the same IDs simultaneously, overwriting each other.
This is called "race condition".
The second case avoids that issue: MySQL will lock all rows during the update.
If a second session is doing the same command it will have a wait time until the rows are unlocked.
So no race condition is possible at the expense of lost time.
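If you want to keep the fast select-then-update pattern without the race, one option is to run both steps inside a single transaction and re-check the condition in the UPDATE. The sketch below is an assumption-laden illustration: it uses SQLite (BEGIN IMMEDIATE takes the write lock up front), a made-up jobs table, and it sets the a column itself so that claimed rows stop matching. In MySQL/InnoDB the usual counterpart is SELECT ... FOR UPDATE inside a transaction.

```python
import sqlite3

# isolation_level=None: no implicit transactions, we issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, a INTEGER)")
conn.execute("CREATE INDEX idx_a ON jobs (a)")
conn.executemany("INSERT INTO jobs (a) VALUES (?)", [(None,)] * 25)


def claim_batch(value, batch_size=10):
    """Select up to batch_size unclaimed ids and update them in one transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM jobs WHERE a IS NULL LIMIT ?", (batch_size,)
            )
        ]
        if ids:
            placeholders = ",".join("?" * len(ids))
            # Re-checking "a IS NULL" keeps the UPDATE safe even if the
            # id list were stale.
            conn.execute(
                f"UPDATE jobs SET a = ? WHERE id IN ({placeholders}) AND a IS NULL",
                [value, *ids],
            )
        conn.execute("COMMIT")
        return ids
    except Exception:
        conn.execute("ROLLBACK")
        raise


first = claim_batch(1)
second = claim_batch(2)
third = claim_batch(3)
print(len(first), len(second), len(third))
```

Because the read and the write happen under the same lock, two callers cannot pick up the same ids, while the UPDATE still goes through the primary key.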