I have one table, student_homework, and one of its composite indexes is uk_sid_lsnid_version(student_id, lesson_id, curriculum_version, type). The index metadata (SHOW INDEX output, abridged) looks like:

Table            Non_unique Key_name             Seq_in_index Column_name        Collation Cardinality Index_type
student_homework 0          uk_sid_lsnid_version 1            student_id         A         100         BTREE
student_homework 0          uk_sid_lsnid_version 2            lesson_id          A         100         BTREE
student_homework 0          uk_sid_lsnid_version 3            curriculum_version A         100         BTREE
student_homework 0          uk_sid_lsnid_version 4            type               A         100         BTREE
Now I have a SQL query:

select * from student_homework where student_id=100 and type=1

and the EXPLAIN result looks like:

id select_type table            type possible_keys                                    key                  key_len ref   rows filtered Extra
1  SIMPLE      student_homework ref  uk_sid_lsnid_version,idx_student_id_update_time  uk_sid_lsnid_version 4       const 20   10.0     Using index condition
The chosen index is uk_sid_lsnid_version.
My question is: how does the query condition on type work here? Does the DB engine scan all the (narrowed) records for it? In my understanding, the tree hierarchy of the index is:
student_id
/ \
lesson_id lesson_id
/ \
curriculum_version curriculum_version
/ \
type type
For the query condition (student_id, type), student_id matches the root of the index tree. But since type does not match the next index column, lesson_id, the DB engine would have to apply the type condition to all records that have already been filtered by student_id.
Is my understanding correct? If the subset of records for a given student_id is large, the query is still expensive.
There is no difference between the query conditions "student_id = 100 and type = 0" and "type = 0 and student_id = 100".
To make full use of a composite index, would it be better to add a new composite index (student_id, type)?
Yes, your understanding is correct: MySQL will use the uk_sid_lsnid_version index to match on student_id only, while the filtering on type will be done on the reduced set of rows that matched on student_id.
The hint is in the Extra column of the EXPLAIN result: Using index condition.
Using index condition (JSON property: using_index_condition)
Tables are read by accessing index tuples and testing them first to determine whether to read full table rows. In this way, index information is used to defer (“push down”) reading full table rows unless it is necessary. See Section 8.2.1.6, “Index Condition Pushdown Optimization”.
Section 8.2.1.6, “Index Condition Pushdown Optimization” describes the steps of this technique as:
1. Get the next row's index tuple (but not the full table row).
2. Test the part of the WHERE condition that applies to this table and can be checked using only index columns. If the condition is not satisfied, proceed to the index tuple for the next row.
3. If the condition is satisfied, use the index tuple to locate and read the full table row.
4. Test the remaining part of the WHERE condition that applies to this table. Accept or reject the row based on the test result.
Whether it would be better to add another composite index on (student_id, type) is a question that cannot be objectively answered from here; you need to test it.
If the speed of the query with the current index is fine, then you probably do not need a new index. You also need to weigh how many other queries would use that index - there is not much point in creating an index just for one query. You also need to weigh how selective the type field is: type fields with a limited list of values are often not selective enough. MySQL may still decide to use index condition pushdown, since a (student_id, type) index is not a covering index for select * and MySQL would have to fetch the full row anyway.
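If you do want to test it, a minimal sketch might look like this (the index name idx_sid_type is made up for illustration):

```sql
-- Add the proposed composite index (hypothetical name)
ALTER TABLE student_homework ADD INDEX idx_sid_type (student_id, type);

-- Re-run the EXPLAIN and compare the chosen key, rows, and Extra columns
EXPLAIN SELECT * FROM student_homework WHERE student_id = 100 AND type = 1;

-- If it is not chosen, or does not measurably help, drop it again
ALTER TABLE student_homework DROP INDEX idx_sid_type;
```

The decisive comparison is the rows estimate and the actual query time before and after, not just which key the optimizer names.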
Related
Suppose a table has only one index: idx_goods (is_deleted, price), and the is_deleted column is either 0 or 1.
My query looks like this: where price < 10. Which of the following behaviors applies in this case:
1 MySQL does a full scan of the secondary index for the price match.
2 MySQL partially scans the secondary index for the price match: it starts with is_deleted=0 in the secondary index, reaches price=10, then jumps to is_deleted=1 in the secondary index and continues there.
3 MySQL ignores the secondary index and scans the base table for the price match; in other words, a condition field that is not in the query's index key prefix is not matched against the secondary index but against the base table, even though the condition field is part of the index key.
To utilize an index on multiple fields, the condition must use fields from the beginning of the index. So, if the index is on fields (field1, field2, field3), then the condition must contain field1.
More about MySQL index optimisation
There are some DB systems that can use an index even if its first part is omitted in the condition, but only when that first part has a very limited range of values.
The query plan can be checked by using EXPLAIN SELECT ... (see the documentation).
For tables with a small number of rows (fewer than about 10, per the documentation), indexes may not be used, so it is better to prepare more data for real tests.
And as seen in this example, the index is not used for the query
explain select * from test where price < 10
but for the query below the index is used, with good results:
explain select * from test where is_deleted in (0,1) and price < 10
If is_deleted can be NULL, then the condition should be modified to (is_deleted in (0,1) or is_deleted is null); it still uses the index.
And, as @Luuk mentioned, in this case (a condition only on the price field) an index on price will be the best option.
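A minimal sketch of that experiment, assuming a table shape for test that the question does not spell out:

```sql
-- Hypothetical table matching the scenario above
CREATE TABLE test (
  id INT PRIMARY KEY AUTO_INCREMENT,
  is_deleted TINYINT NOT NULL,
  price DECIMAL(10,2) NOT NULL,
  INDEX idx_goods (is_deleted, price)
);

-- price alone is not a leftmost prefix of idx_goods: expect no index use
EXPLAIN SELECT * FROM test WHERE price < 10;

-- Enumerating the few is_deleted values restores the prefix, so the index applies
EXPLAIN SELECT * FROM test WHERE is_deleted IN (0,1) AND price < 10;
```

The IN (0,1) trick works precisely because is_deleted has only two possible values; it would not scale to a leading column with many distinct values.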
MySQL table structure: it confuses me that the query range influences whether MySQL uses an index.
That is what happens. And it is actually an optimization.
When using a secondary key (such as INDEX(teacher_id)), the processing goes like this:
Reach into the index, which is a B+Tree. In such a structure, it is quite efficient to find a particular value (such as 1) and then scan forward (until 5000 or 10000).
For each entry, reach over into the data to fetch the row (SELECT *). This uses the PRIMARY KEY, a copy of which is in the secondary key. The PK and the data are clustered together; each lookup by one PK value is efficient (again, a BTree), but you need to do 5000 or 10000 of them. So the cost (time taken) adds up.
A "table scan" (ie, not using any INDEX) goes like this:
Start at the beginning of the table, walk through the B+Tree for the table (in PK order) until the end.
For each row, check the WHERE clause (a range on teacher_id).
If more than something like 20% of the table needs to be looked at, a table scan is actually faster than bouncing back and forth between the secondary index and the data.
So, "large" is somewhere around 20%. The actual value depends on table statistics, etc.
Bottom line: Just let the Optimizer do its thing; most of the time it knows best.
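If you want to see both plans for yourself, you can pin the optimizer's choice with an index hint (a sketch using the table and index names from the question; timing both versions is the real test):

```sql
-- Let the optimizer choose (likely a table scan for the wide range)
SELECT * FROM t_teacher_course_info
WHERE teacher_id > 1 AND teacher_id < 10000;

-- Force the secondary index, then compare wall-clock time of both versions
SELECT * FROM t_teacher_course_info FORCE INDEX (idx_teacher_id_last_update_time)
WHERE teacher_id > 1 AND teacher_id < 10000;
```

If the forced-index version is slower, that confirms the optimizer's table-scan choice was right for this range.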
In brief, I use a MySQL database. Executing

EXPLAIN
SELECT * FROM t_teacher_course_info
WHERE teacher_id > 1 AND teacher_id < 5000

will use the index `idx_teacher_id_last_update_time` (`teacher_id`, `last_update_time`),
but if I change the range:
EXPLAIN
SELECT * FROM t_teacher_course_info
WHERE teacher_id >1 and teacher_id < 10000
id select_type table                 type possible_keys           key  key_len ref  rows   Extra
1  SIMPLE      t_teacher_course_info ALL  idx_teacher_update_time NULL NULL    NULL 671082 Using where
it scans the whole table and does not use the index. Is there any MySQL config for this? Maybe the estimated scan row count determines whether the index is used.
I am using MySQL 5.1, and I have a table which has about 15 lakh (1.5 million) records. This table holds records for different entities, i.e. child records for all master entities.
There are 8 columns in this table, out of which 6 columns are clubbed to make a primary key. These columns could be individual foreign keys, but due to performance we have made this change.
Even a simple select statement with two conditions is taking 6-8 seconds. Below is the explain plan for the same.
Query
explain extended
select distinct location_code, Max(trial_number) as replication
from status_trait t
where t.status_id='N02'
and t.trial_data='orange'
group by location_code
The results of EXPLAIN EXTENDED
id select_type table type  possible_keys key                       key_len ref  rows    filtered Extra
1  SIMPLE      t     index NULL          FK_HYBRID_EXP_TRAIT_DTL_2 5       NULL 1481572 100.00   Using where; Using index
I have these questions:
How to handle tables with large data
Is indexing fine for this table
Two things might help you here.
First, SELECT DISTINCT is pointless in an aggregating query. Just use SELECT.
Second, you didn't disclose the indexes you have created. However, to satisfy this query efficiently, the following compound covering index will probably help a great deal.
(status_id, trial_data, location_code, trial_number)
Why is this the right index? Because MySQL indexes are organized as BTREE. This organization allows the server to random-access the index to find particular values. In your case you want particular values of status_id and trial_data. Once the server has random-accessed the index, it can then scan sequentially. In this case you hope to scan for various values of location_code. The server knows it will find those different values already in order. Finally, the server needs to pluck out values of trial_number to use in your MAX() function. Lo and behold, there they are in the index ready for the plucking.
(If you're doing a lot of aggregation and querying of large tables, it makes sense for you to learn how compound and covering indexes work.)
There's a cost to adding an index: when you INSERT or UPDATE rows, you have to update your index as well. But this kind of index will greatly accelerate your retrieval.
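Putting both suggestions together, a sketch might be (the index name is invented; everything else comes from the question):

```sql
-- Hypothetical covering index: equality columns first, then the GROUP BY
-- column, then the aggregated column
ALTER TABLE status_trait
  ADD INDEX idx_status_trial_loc (status_id, trial_data, location_code, trial_number);

-- DISTINCT removed: GROUP BY location_code already returns one row per group
SELECT location_code, MAX(trial_number) AS replication
FROM status_trait
WHERE status_id = 'N02'
  AND trial_data = 'orange'
GROUP BY location_code;
```

With this column order, the two equality predicates pin down a contiguous index range, and the GROUP BY and MAX() can both be satisfied inside that range without touching the base table.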
I know there are similar questions on this, but I've got a specific question around why this query
EXPLAIN SELECT DISTINCT RSubdomain FROM R_Subdomains WHERE EmploymentState IN (0,1) AND RPhone='7853932120'
gives me this output from EXPLAIN

id select_type table       type  possible_keys key        key_len ref  rows Extra
1  SIMPLE      RSubdomains index NULL          RSubdomain 767     NULL 3278 Using where

with an index on RSubdomain,
but if I add in a composite index on EmploymentState/RPhone, I get this output from EXPLAIN:

id select_type table       type  possible_keys   key             key_len ref  rows Extra
1  SIMPLE      RSubdomains range EmploymentState EmploymentState 67      NULL 2    Using where; Using temporary
If I take away the DISTINCT on RSubdomain, it drops the "Using temporary" from the EXPLAIN output. What I don't get is why, when I add in the composite key (keeping the key on RSubdomain), the DISTINCT ends up using a temp table, and which index schema is better here? I see that the number of rows scanned with the combined key is far less, but the query is of type range and it's also slower.
Q: why ... does the distinct end up using a temp table?
MySQL is doing a range scan on the index (i.e. reading index blocks) to locate the rows that satisfy the predicates (WHERE clause). Then MySQL has to look up the value of the RSubdomain column from the underlying table (it's not available in the index). To eliminate duplicates, MySQL needs to scan the values of RSubdomain that were retrieved. The "Using temporary" indicates that MySQL is materializing a resultset, which is processed in a subsequent step. (Likely, that's the set of RSubdomain values that was retrieved; given the DISTINCT, it's likely that MySQL is actually creating a temporary table with RSubdomain as a primary or unique key, and only inserting non-duplicate values.)
In the first case, it looks like the rows are being retrieved in order by RSubdomain (likely, that's the first column in the cluster key). That means that MySQL needn't compare all of the RSubdomain values; it only needs to check whether the last retrieved value matches the currently retrieved value to determine whether the value can be "skipped."
Q: which index schema is better here?
The optimum index for your query is likely a covering index:
... ON R_Subdomains (RPhone, EmploymentState, RSubdomain)
But with only 3278 rows, you aren't likely to see any performance difference.
FOLLOWUP
Unfortunately, MySQL does not provide the type of instrumentation provided in other RDBMS (like the Oracle event 10046 sql trace, which gives actual timings for resources and waits.)
Since MySQL is choosing to use the index when it is available, that is probably the most efficient plan. For the best efficiency, I'd perform an OPTIMIZE TABLE operation (for InnoDB tables and MyISAM tables with dynamic format, if there have been a significant number of DML changes, especially DELETEs and UPDATEs that modify the length of the row...) At the very least, it would ensure that the index statistics are up to date.
You might want to compare the plan of an equivalent statement that does a GROUP BY instead of a DISTINCT, i.e.
SELECT r.RSubdomain
FROM R_Subdomains r
WHERE r.EmploymentState IN (0,1)
AND r.RPhone='7853932120'
GROUP
BY r.RSubdomain
For optimum performance, I'd go with a covering index with RPhone as the leading column; that's based on an assumption about the cardinality of the RPhone column (close to unique values), opposed to only a few different values in the EmploymentState column. That covering index will give the best performance... i.e. the quickest elimination of rows that need to be examined.
But again, with only a couple thousand rows, it's going to be hard to see any performance difference. If the query was examining millions of rows, that's when you'd likely see a difference, and the key to good performance will be limiting the number of rows that need to be inspected.
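Spelled out, the covering index suggested above might be created like this (the index name is invented):

```sql
-- Equality column first, then the IN-list column, then the selected column,
-- so the whole query can be answered from the index alone
CREATE INDEX idx_rphone_empstate_rsub
  ON R_Subdomains (RPhone, EmploymentState, RSubdomain);
```

Leading with RPhone means the near-unique equality predicate eliminates almost all rows in one descent of the B-tree, before the low-cardinality EmploymentState IN (0,1) is even considered.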
I have indexes structured like so:
BTREE merchant_id
BTREE flag
BTREE test (merchant_id, flag)
I do a SELECT query as such:
SELECT badge_id, merchant_id, badge_title, badge_class
FROM badges WHERE merchant_id = 1 AND flag = 1
Which index would be better? Does it matter if they are in separate indexes?
To answer questions such as "Which column would be better to index?", and "Is the query planner using a certain index to execute the query?", you can use the EXPLAIN statement. See the excellent article Analyzing Queries for Speed with EXPLAIN for a comprehensive overview of the use of EXPLAIN in optimizing queries and schema.
In general, where a query can be optimized by indexing one of several columns, a helpful rule of thumb is to index the column that is "most unique" or "most selective" over all records; that is, index the column that has the greatest number of distinct values over all rows. I am guessing that in your case, the merchant_id column contains the most unique values, so it should probably be indexed. You can verify that an index choice is optimal using EXPLAIN on the query for all variations.
Note that the rule of thumb "index the most selective column" does not necessarily apply to the choice of the first column of a composite (also called compound or multi-column) index. It depends on your queries. If, for example, merchant_id is the most selective column, but you need to execute queries like SELECT * FROM badges WHERE flag = 17, then having the composite index (merchant_id, flag) as the only index on table badges would mean that the query results in a full table scan.
Out of the 3 indexes, you don't really need the separate merchant_id index, since merchant_id look-ups can use your "test" index.
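A quick way to confirm that redundancy (the index name merchant_id is assumed here; check SHOW INDEX FROM badges for the real name before dropping anything):

```sql
-- Both lookups can be served by the composite "test" index via the
-- leftmost-prefix rule, so the single-column merchant_id index is redundant
EXPLAIN SELECT badge_id, merchant_id, badge_title, badge_class
FROM badges WHERE merchant_id = 1 AND flag = 1;

EXPLAIN SELECT * FROM badges WHERE merchant_id = 1;

-- Hypothetical cleanup, once EXPLAIN confirms "test" is chosen in both cases
ALTER TABLE badges DROP INDEX merchant_id;
```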
More details:
http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html