Suppose a table has only one index: idx_goods (is_deleted, price),
and the is_deleted column is either 0 or 1.
My query looks like this: where price < 10. Which of the following behaviors applies in this case?
1 MySQL does a full scan of the secondary index for price matches
2 MySQL partially scans the secondary index: it starts at is_deleted=0, reads until price=10, then jumps to is_deleted=1 and continues from there
3 MySQL ignores the secondary index and scans the base table for price matches. In other words, a condition on a column that is not in the index's key prefix is matched against the base table rather than the secondary index, even though the column is part of the index key
To use an index built on multiple fields, the WHERE condition must generally use fields from the beginning of the index. So, if the index is on (field1, field2, field3), the condition must contain field1.
More about MySQL index optimisation
Some DB systems can use an index even if its first part is omitted from the condition, provided that part has a very limited range of values.
The query plan can be checked with EXPLAIN SELECT ..... documentation
For tables with a small number of rows (fewer than 10.. from documentation) indexes may not be used, so it is better to prepare more data for real tests.
As seen in this example, the index is not used for the query
explain select * from test where price < 10
but for the query below the index is used, with good results.
explain select * from test where is_deleted in (0,1) and price < 10
If is_deleted can be null, the condition should be changed to (is_deleted in (0,1) or is_deleted is null); the index is still used.
And, as #Luuk mentioned, in this case (a condition only on the price field) an index on price would be the best option.
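To make the difference between the two queries concrete, here is a rough Python sketch. It is only a model: the composite index is treated as a sorted list of (is_deleted, price) tuples, the data is made up, and the function names are hypothetical. It is not how MySQL is implemented, but it shows why naming every is_deleted value lets each price range be located directly.

```python
import bisect

# Hypothetical model: the index on (is_deleted, price) is a sorted list of
# (is_deleted, price) tuples. A real B-tree differs in detail.
index = sorted((d, p) for d in (0, 1) for p in range(1, 101))

def full_index_scan(limit):
    """No condition on is_deleted: every index entry has to be inspected."""
    matches = [entry for entry in index if entry[1] < limit]
    return matches, len(index)

def prefix_range_scan(limit, prefixes=(0, 1)):
    """is_deleted IN (0, 1) AND price < limit: one short range per prefix value."""
    matches = []
    for d in prefixes:
        lo = bisect.bisect_left(index, (d,))        # first entry with this prefix
        hi = bisect.bisect_left(index, (d, limit))  # first entry with price >= limit
        matches.extend(index[lo:hi])
    return matches, len(matches)

slow, slow_examined = full_index_scan(10)
fast, fast_examined = prefix_range_scan(10)
print(sorted(slow) == sorted(fast), slow_examined, fast_examined)
```

Both functions return the same entries; the second one just reads far fewer of them, which is the effect the is_deleted in (0,1) rewrite is after.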
The problem I have is the following:
I have a table that contains about 100000000 rows
it has 22 fields - some numeric, some text
it has a primary key id (auto-incremented integer)
it has a field another_id of type bigint, and a unique key on it
it has a field called state that can take only 4 integer values (0 to 3)
I need queries of the following form to execute as fast as possible:
SELECT COUNT(*)
FROM my_table
WHERE another_id IN ( <about 100 values> )
AND state = ...
for different values of state.
What should the index look like? I was thinking about two options:
KEY another_id:state (another_id, state)
KEY state:another_id (state, another_id)
Is there any difference in performance between those two variants? Is there anything else to consider?
Edit: engine is InnoDB
For the query you show, you should create the index with state, another_id in that order.
Define the index with any columns referenced in equality conditions first; after them, add one column referenced in a range condition or ORDER BY or GROUP BY.
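A small Python sketch of the "equality columns first, then one range column" rule. Everything here is an assumption made for illustration: the indexes are modeled as sorted lists of tuples, the data is invented, and a range on another_id stands in for the IN list from the question, because a range shows the effect of column order most clearly.

```python
import bisect

# Hypothetical data: 4 states x 10,000 ids. Each index is a sorted list of key
# tuples, which only approximates a real B-tree.
rows = [(state, another_id) for state in range(4) for another_id in range(10_000)]
idx_state_first = sorted(rows)                  # models KEY (state, another_id)
idx_id_first = sorted((a, s) for s, a in rows)  # models KEY (another_id, state)

def count_state_first(state, lo_id, hi_id):
    """Equality column leads: the matching entries form one contiguous slice."""
    lo = bisect.bisect_left(idx_state_first, (state, lo_id))
    hi = bisect.bisect_right(idx_state_first, (state, hi_id))
    return hi - lo, hi - lo  # (matches, entries examined)

def count_id_first(state, lo_id, hi_id):
    """Range column leads: the whole id range is read, then filtered by state."""
    lo = bisect.bisect_left(idx_id_first, (lo_id,))
    hi = bisect.bisect_left(idx_id_first, (hi_id + 1,))
    examined = idx_id_first[lo:hi]
    return sum(1 for _, s in examined if s == state), len(examined)

a = count_state_first(2, 1000, 1099)
b = count_id_first(2, 1000, 1099)
print(a, b)
```

Both orders return the same count, but with the range column first the model has to read entries for every state inside the id range and discard most of them.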
You may also like my answer to Does Order of Fields of Multi-Column Index in MySQL Matter or my presentation How to Design Indexes, Really, or the video.
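I agree with the answer above. One clarification, though: for pure equality lookups a hash index can be faster than a B-tree index. A hash index would not work well with any queries that involve inequality, such as <=. Note, however, that InnoDB (the engine in the question) does not support user-defined hash indexes, so this applies only to engines that do, such as MEMORY.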
I have one table, student_homework, and one of its composite indexes is uk_sid_lsnid_version(student_id, lesson_id, curriculum_version, type):
student_homework 0 uk_sid_lsnid_version 1 student_id A 100 BTREE
student_homework 0 uk_sid_lsnid_version 2 lesson_id A 100 BTREE
student_homework 0 uk_sid_lsnid_version 3 curriculum_version A 100 BTREE
student_homework 0 uk_sid_lsnid_version 4 type A 100 BTREE
Now I have this SQL:
select * from student_homework where student_id=100 and type=1 and the explain result looks like:
1 SIMPLE student_homework ref uk_sid_lsnid_version,idx_student_id_update_time uk_sid_lsnid_version 4 const 20 10.0 Using index condition
The execution plan uses the uk_sid_lsnid_version index.
My question is: how does the condition on type work here? Does the DB engine scan all of the (narrowed) records for it? In my understanding, the tree hierarchy of the index is:
               student_id
               /        \
        lesson_id      lesson_id
        /       \
curriculum_version   curriculum_version
    /      \
  type     type
For the query condition (student_id, type), student_id matches the root of the index tree. But since type does not match the next index column, lesson_id, the DB engine would apply the type condition to all records that have been filtered by student_id.
Is my understanding correct? If the subset of records for a student_id is large, the query is still expensive.
There is no difference between the query conditions "student_id = 100 and type = 0" and "type = 0 and student_id = 100".
To make full use of a composite index, would it be better to add a new composite index (student_id, type)?
Yes, your understanding is correct: MySQL will use the uk_sid_lsnid_version index to match on student_id only, while filtering on type will be done on the reduced set of rows that match on student_id.
The hint is in the extra column of the explain result: Using index condition
Using index condition (JSON property: using_index_condition)
Tables are read by accessing index tuples and testing them first to determine whether to read full table rows. In this way, index information is used to defer (“push down”) reading full table rows unless it is necessary. See Section 8.2.1.6, “Index Condition Pushdown Optimization”.
Section 8.2.1.6, “Index Condition Pushdown Optimization” describes the steps of this technique as:
Get the next row's index tuple (but not the full table row).
Test the part of the WHERE condition that applies to this table and can be checked using only index columns. If the condition is not satisfied, proceed to the index tuple for the next row.
If the condition is satisfied, use the index tuple to locate and read the full table row.
Test the remaining part of the WHERE condition that applies to this table. Accept or reject the row based on the test result.
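The four steps above can be sketched in Python. This is a toy model under stated assumptions, not MySQL's implementation: the table is a dict, the index is a sorted list of tuples that carry a row id, the data is invented, and the lookup function and its pushdown flag are hypothetical. A real engine would also seek to the first matching key instead of walking the whole index.

```python
# Hypothetical table: row_id -> full row. Column names follow the question;
# the data itself is made up for illustration.
table = {
    rid: {"student_id": 100 if rid < 20 else 200,
          "lesson_id": rid, "curriculum_version": 1,
          "type": rid % 10, "payload": "..."}
    for rid in range(40)
}

# Index tuples (student_id, lesson_id, curriculum_version, type, row_id), sorted.
index = sorted(
    (r["student_id"], r["lesson_id"], r["curriculum_version"], r["type"], rid)
    for rid, r in table.items()
)

def lookup(student_id, type_, pushdown):
    """Return (matching row ids, number of full-row reads)."""
    row_reads, result = 0, []
    for sid, _lesson, _ver, typ, rid in index:  # step 1: next index tuple
        if sid != student_id:          # outside the range matched by the key prefix
            continue
        if pushdown and typ != type_:  # step 2: test using index columns only
            continue
        row = table[rid]               # step 3: read the full table row
        row_reads += 1
        if row["type"] == type_:       # step 4: test the remaining WHERE part
            result.append(rid)
    return result, row_reads

with_icp = lookup(100, 1, pushdown=True)
without_icp = lookup(100, 1, pushdown=False)
print(with_icp, without_icp)
```

In this model both calls return the same rows, but with pushdown the full rows are read only for index tuples whose type already matches.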
Whether it would be better to add another composite index on student_id, type is a question that cannot be objectively answered by us, you need to test it.
If the speed of the query with the current index is fine, then you probably do not need a new index. You also need to weigh how many other queries would use that index; there is not much point in creating an index just for one query. You also need to consider how selective the type field is. Type fields with a limited list of values are often not selective enough. MySQL may still decide to use index condition pushdown, since a (student_id, type) index is not a covering index and MySQL would have to fetch the full row anyway.
I would like to know if it is necessary to create an index for all fields within a table if one of your queries will use SELECT *.
To explain: if we had a table with 10M records and ran a SELECT * query on it, would the query run faster if we had created an index for all fields within the table, or does MySQL handle SELECT * differently from SELECT first_field, a_field, last_field?
To my understanding, if I had a query that did SELECT first_field, a_field FROM table then it would bring performance benefits if we created an index on first_field, a_field but if we use SELECT * is there even a benefit from creating an index for all fields?
Performing a SELECT * FROM mytable query would have to read all the data from the table. This could, theoretically, be done from an index if you have an index on all the columns, but it would simply be faster for the database to read the table itself.
If you have a where clause, having an index on (some of) the columns you have conditions on may dramatically improve the query's performance. It's a gross simplification, but what basically happens is the following:
The appropriate rows are filtered according to the where clause. It's much faster to search for these rows in an index (which is, essentially, a sorted tree) than a table (which is an unordered set of rows).
For the columns that were in the index used in the previous step, the values are returned.
For the columns that aren't, the table is accessed (according to a pointer kept in the index).
Indexing a column of a MySQL table improves performance when you need to search for or edit a row based on that column.
For example, if there is an 'id' column and it is the primary key, and you want to search for a record using a where clause on that 'id' column, you don't need to create an index for it, because the primary key column already acts as an indexed column.
In another case, if there is a 'pid' column in the table and it is not the primary key, then to improve the performance of searches based on 'pid' it is better to create an index on that column. That will make the query find the expected record quickly.
I have indexes structured like so:
BTREE merchant_id
BTREE flag
BTREE test (merchant_id, flag)
I run a SELECT query like this:
SELECT badge_id, merchant_id, badge_title, badge_class
FROM badges WHERE merchant_id = 1 AND flag = 1
Which index would be better? Does it matter if the columns are in separate indexes?
To answer questions such as "Which column would be better to index?", and "Is the query planner using a certain index to execute the query?", you can use the EXPLAIN statement. See the excellent article Analyzing Queries for Speed with EXPLAIN for a comprehensive overview of the use of EXPLAIN in optimizing queries and schema.
In general, where a query can be optimized by indexing one of several columns, a helpful rule of thumb is to index the column that is "most unique" or "most selective" over all records; that is, index the column that has the largest number of distinct values over all rows. I am guessing that in your case, the merchant_id column contains the largest number of unique values, so it should probably be indexed. You can verify that an index choice is optimal using EXPLAIN on the query for all variations.
Note that the rule of thumb "index the most selective column" does not necessarily apply to the choice of the first column of a composite (also called compound or multi-column) index. It depends on your queries. If, for example, employee_id is the most selective column, but you need to execute queries like SELECT * FROM badges WHERE flag = 17, then having as the only index on table badges the composite index (employee_id, flag) would mean that the query results in a full table scan.
Out of 3 indices you don't really need a separate merchant_id index, since merchant_id look-ups can use your "test" index.
More details:
http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html
I have a query that looks like the following:
select count(*) from `foo` where expires_at < now()
Since expires_at is indexed, the query hits the index with no problem. However, with the following query:
select count(*) from `foo` where expires_at < now() and some_id != 5
the index never gets hit.
Both expires_at and some_id are indexed.
Is my index not properly created?
This query:
SELECT COUNT(*)
FROM foo
WHERE expires_at < NOW()
can be satisfied by the index only, without referring to the table itself. You may see it from the using index in the plan.
This query:
SELECT COUNT(*)
FROM foo
WHERE expires_at < NOW()
AND some_id <> 5
needs to look into the table to find the value of some_id.
Since the table lookup is quite an expensive thing, it is more efficient to use the table scan and filter the records.
If you had a composite index on expires_at, some_id, the query would probably use the index both for ranging on expires_at and filtering on some_id.
SQL Server even offers a feature known as included fields for this. This command
CREATE INDEX ix_foo_expires__someid ON foo (expires_at) INCLUDE (some_id)
would create an index on expires_at which would additionally store some_id in the leaf entries (without the overhead of sorting).
MySQL, unfortunately, does not support it.
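As a rough illustration of why a composite index on (expires_at, some_id) lets the count be answered without touching the table, here is a Python sketch. It is a model only: the rows, the NOW stand-in and both function names are invented, and the index is represented as a sorted list of (expires_at, some_id) tuples.

```python
import bisect

# Hypothetical rows (id, expires_at, some_id); timestamps are plain integers.
rows = [(i, i * 10, i % 7) for i in range(1, 101)]
NOW = 505  # stand-in for NOW()

# Composite index (expires_at, some_id) modeled as a sorted list of tuples.
idx = sorted((expires_at, some_id) for _, expires_at, some_id in rows)

def count_from_index(now, excluded_id):
    """Range on expires_at via the index; some_id is filtered from the same entries."""
    hi = bisect.bisect_left(idx, (now,))  # entries before hi have expires_at < now
    return sum(1 for _, some_id in idx[:hi] if some_id != excluded_id)

def count_from_table(now, excluded_id):
    """Reference answer computed by scanning every row of the table."""
    return sum(1 for _, e, s in rows if e < now and s != excluded_id)

print(count_from_index(NOW, 5), count_from_table(NOW, 5))
```

The index-only count never reads rows, because both columns used in the WHERE clause are present in each index entry.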
Probably what's happening is that for the first query, the index can be used to count the rows satisfying the WHERE clause. In other words, the query would result in a table scan, but happily all the columns involved in the WHERE condition are in an index, so the index is scanned instead.
In the second query though, there's no single index that contains all the columns in the WHERE clause. So MySQL resorts to a full table scan. In the case of the first query, it was using your index, but not to find the rows to check - in the special case of a COUNT() query, it could use the index to count rows. It was doing the equivalent of a table scan, but on the index instead of the table.
1) It seems you have two single-column indices. You can try to create a multi-column index.
For a detailed explanation why this is different than multiple single column indices, see the following:
http://www.mysqlfaqs.net/mysql-faqs/Indexes/When-does-multi-column-index-come-into-use-in-MySQL
2) Do you have a B-tree index on the expires_at column? Since you are doing a range query (<), that might give better performance.
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
Best of luck!