Does MySQL use only one index per query/sub-query? - mysql

Please assume the following query:
select dic.*, d.syllable
from dictionary dic
join details d on d.word = dic.word
As you know (and I heard), MySQL uses only one index per query. Now I want to know, in query above, which index would be better?
dictionary(word)
details(word)
In another word, when there is a join (two tables are involved), the index of which one would be affected? Should I create both of them (on the columns on the on clause) and MySQL itself decides using which one is better?

As you know (and I heard), MySQL uses only one index per query.
In general, most databases will only use one index per table, per query. This isn't always the case, but is at least a decent rule of thumb. For your particular example, you can rely on this.
Now I want to know, in query above, which index would be better?
The query you wrote is actually an inner join. This means that either of the two tables could appear on the left side of the join, and the result sets would be logically equivalent. As a result of this, MySQL is therefore free to write the join in any order it chooses. The plan that gets chosen will likely place the larger table on the left hand side, and the smaller table on the right hand side. If you knew the actual execution order of the tables, then you would just index the right table. Given that you may not know this, then both of your suggested indices are reasonable:
CREATE INDEX dict_idx ON dictionary (word);
CREATE INDEX details_idx ON details (word);
We could even try to improve on the above indices by covering the columns which appear in the select clause. For example, the index on details could be expanded to:
CREATE INDEX details_idx ON details (word, syllable);
This would let MySQL use the above index exclusively to satisfy the query plan, without requiring a seek back to the original table. You select dictionary.*, so covering this with a single index might not be possible or desirable, but at least this gets the point across.

MySQL would use the most selective index (the one giving the fewest rows). This means it depends on the data, and also optimizations like this could change between versions of the database.

Related

Indexing mysql table for selects

I'm looking to add some mysql indexes to a database table. Most of the queries are for selects. Would it be best to create a separate index for each column, or add an index for province, province and number, and name. Does it even make sense to index province since there are only about a dozen options?
select * from employees where province = 'ab' and number = 'v45g';
select * from employees where province = 'ab';
If the usage changed to more inserts should I remove all the indexes except for the number?
An index is a data structure that maps the values of a column into a fast searchable tree. This tree contains the index of rows which the DB can use to find rows fast. One thing to know, some DB engines read plus or minus a bunch of rows to take advantage of disk read ahead. So you may actually read 50 or 100 rows per index read, and not just one. Hence, if you access 30% of a table through an index, you may wind up reading all table data multiple times.
Rule of thumb:
- index the more unique values, a tree with 2 branches and half of your table on either side is not too useful for narrowing down a search
- use as few index as possible
- use real world examples numbers as much as possible. Performance can change dynamically based on data or the whim of the DB engine, so it's very important to try and track how fast your queries are running consistently (ie: log this in case a query ever gets slow). But from this data you can add indexes without being blind
Okay, so there are multiple kinds of index, single and multiple column. You want multiple indexes when it makes sense for indexes to access each other, multiple columns typically when you are refining with a where clause. Think of the first as good when you want joins, or you have "or" conditions. The second is better when you have and conditions and successively filter rows.
In your case name does not make sense since like does not use index. city and number do make sense, probably as a multi-column index. Province could help as well as the last index.
So an index with these columns would likely help:
(number,city,province)
Or try as well just:
(number,city)
You should index fields that are searched upon and have high selectivity / cardinality. Indexes make writes slower.
Other thing is that indexes can be added and dropped at any time so maybe you should let this for a later review of the database and optimization of querys.
That being said one index that you can be sure to add is in the column that holds the name of a person. That's almost always used in searching.
According to MySQL documentation found here:
You can create multiple column indexes and the first column mentioned in the index declaration uses index when searched alone but not the others.
Documentation also says that if you create a hash of the columns and save in another column and index the hashed column the search could be faster then multiple indexes.
SELECT * FROM tbl_name
WHERE hash_col=MD5(CONCAT(val1,val2))
AND col1=val1 AND col2=val2;
You could use an unique index on province.

mysql: stored function call + left join = very very slow

I have two tables:
module_339 (id,name,description,etc)
module_339_schedule(id,itemid,datestart,dateend,timestart,timeend,days,recurrent)
module_339_schedule.itemid points to module_339
fist table holds conferences
second one keeps the schedules of the conferences
module_339 has 3 items
module_339_schedule has 4000+ items - almost evenly divided between the 3 conferences
I have a stored function - "getNextDate_module_339" - which will compute the "next date" for a specified conference, in order to be able to display it, and also sort by it - if the user wants to. This stored procedure will just take all the schedule entries of the specified conference and loop through them, comparing dates and times. So it will do one simple read from module_339_schedule, then loop through the items and compare dates and times.
The problem: this query is very slow:
SELECT
distinct(module_339.id)
,min( getNextDate_module_339(module_339.id,1,false)) AS ND
FROM
module_339
LEFT JOIN module_339_schedule on module_339.id=module_339_schedule.itemid /* standard schedule adding */
WHERE 1=1 AND module_339.is_system_preview<=0
group by
module_339.id
order by
module_339.id asc
If I remove either the function call OR the LEFT JOIN, it is fast again.
What am I doing wrong here? Seems to be some kind of "collision" between the function call and the left join.
I think the group by part can be removed from this query, thus enabling you to remove the min function as well. Also, there is not much point of WHERE 1=1 AND..., so I've changed that as well.
Try this:
SELECT DISTINCT module_339.id
,getNextDate_module_339(module_339.id,1,false) AS ND
FROM module_339
LEFT JOIN module_339_schedule ON module_339.id=module_339_schedule.itemid /* standard schedule adding */
WHERE module_339.is_system_preview<=0
ORDER BY module_339.id
Note that this might not have a lot of impact on performance.
I think that the worst part performance-wise is probably the getNextDate_module_339 function.
If you can find a way to get it's functionallity without using a function as a sub query, your sql statement will probably run alot faster then now, with or without the left join.
If you need help doing this, please edit your question to include the function and hopefully I (or someone else) might be able to help you with that.
From the MySQL reference manual:
The best way to improve the performance of SELECT operations is to create indexes on one or more of the columns that are tested in the query. The index entries act like pointers to the table rows, allowing the query to quickly determine which rows match a condition in the WHERE clause, and retrieve the other column values for those rows. All MySQL data types can be indexed.
Although it can be tempting to create an indexes for every possible column used in a query, unnecessary indexes waste space and waste time for MySQL to determine which indexes to use. Indexes also add to the cost of inserts, updates, and deletes because each index must be updated. You must find the right balance to achieve fast queries using the optimal set of indexes.
As a first step I suggest checking that the joined columns are both indexed. Since primary keys are always indexed by default, we can assume that module_339 is already indexed on the id column, so first verify that module_339_schedule is indexed on the itemid column. You can check the indexes on that table in MySQL using:
SHOW INDEX FROM module_339_schedule;
If the table does not have an index on that column, you can add one using:
CREATE INDEX itemid_index ON module_339_schedule (itemid);
That should speed up the join component of the query.
Since your query also references module_339.is_system_preview you might also consider adding an index to that column using:
CREATE INDEX is_system_preview_index ON module_339 (is_system_preview);
You might also be able to optimize the stored procedure, but you haven't included it in your question.

Is it needed to INDEX second column for use in WHERE clause?

Consider fetching data with
SELECT * FROM table WHERE column1='XX' && column2='XX'
Mysql will filter the results matching with the first part of WHERE clause, then with the second part. Am I right?
Imagine the first part match 10 records, and adding the second part filters 5 records. Is it needed to INDEX the second column too?
You are talking about short circuit evaluation. A DBMS has cost-based optimizer. There is no guarantee wich of both conditions will get evaluated first.
To apply that to your question: Yes, it might be benificial to index your second column.
Is it used regulary in searches?
What does the execution plan tell you?
Is the access pattern going to change in the near future?
How many records does the table contain?
Would a Covering Index would be a better choice?
...
Indexes are optional in MySQL, but they can increase performance.
Currently, MySQL can only use one index per table select, so with the given query, if you have an index on both column1 and column2, MySQL will try to determine the index that will be the most beneficial, and use only one.
The general solution, if the speed of the select is of utmost importance, is to create a multi-column index that includes both columns.
This way, even though MySQL could only use one index for the table, it would use the multi-column index that has both columns indexed, allowing MySQL to quickly filter on both criteria in the WHERE clause.
In the multi-column index, you would put the column with the highest cardinality (the highest number of distinct values) first.
For even further optimization, "covering" indexes can be applied in some cases.
Note that indexes can increase performance, but with some cost. Indexes increase memory and storage requirements. Also, when updating or inserting records into a table, the corresponding indexes require maintenance. All of these factors must be considered when implementing indexes.
Update: MySQL 5.0 can now use an index on more than one column by merging the results from each index, with a few caveats.
The following query is a good candidate for Index Merge Optimization:
SELECT * FROM t1 WHERE key1=1 AND key2=1
When processing such query RDBMS will use only one index. Having separate indices on both colums will allow it to choose one that will be faster.
Whether it's needed depends on your specific situation.
Is the query slow as it is now?
Would it be faster with index on another column?
Would it be faster with one index containing both
columns?
You may need to try and measure several approaches.
You don't have to INDEX the second column but it may speed up your SELECT

query optimizer operator choice - nested loops vs hash match (or merge)

One of my stored procedures was taking too long execute. Taking a look at query execution plan I was able to locate the operation taking too long. It was a nested loop physical operator that had outer table (65991 rows) and inner table (19223 rows). On the nested loop it showed estimated rows = 1,268,544,993 (multiplying 65991 by 19223) as below:
I read a few articles on physical operators used for joins and got a bit confused whether nested loop or hash match would have been better for this case. From what i could gather:
Hash Match - is used by optimizer when no useful indexes are available, one table is substantially smaller than the other, tables are not sorted on the join columns. Also hash match might indicate more efficient join method (nested loops or merge join) could be used.
Question: Would hash match be better than nested loops in this scenario?
Thanks
ABSOLUTELY. A hash match would be a huge improvement. Creating the hash on the smaller 19,223 row table then probing into it with the larger 65,991 row table is a much smaller operation than the nested loop requiring 1,268,544,993 row comparisons.
The only reason the server would choose the nested loops is that it badly underestimated the number of rows involved. Do your tables have statistics on them, and if so, are they being updated regularly? Statistics are what enable the server to choose good execution plans.
If you've properly addressed statistics and are still having a problem you could force it to use a HASH join like so:
SELECT *
FROM
TableA A -- The smaller table
LEFT HASH JOIN TableB B -- the larger table
Please note that the moment you do this it will also force the join order. This means you have to arrange all your tables correctly so that their join order makes sense. Generally you would examine the execution plan the server already has and alter the order of your tables in the query to match. If you're not familiar with how to do this, the basics are that each "left" input comes first, and in graphical execution plans, the left input is the lower one. A complex join involving many tables may have to group joins together inside parentheses, or use RIGHT JOIN in order to get the execution plan to be optimal (swap left and right inputs, but introduce the table at the correct point in the join order).
It is generally best to avoid using join hints and forcing join order, so do whatever else you can first! You could look into the indexes on the tables, fragmentation, reducing column sizes (such as using varchar instead of nvarchar where Unicode is not required), or splitting the query into parts (insert to a temp table first, then join to that).
I would not recommend trying to "fix" the plan by forcing the hints in one direction or another. Instead, you need to look to your indexes, statistics and the TSQL code to understand why you have a Table spool loading up 1.2billion rows from 19000.

MySQL index question

I've been reading about indexes in MySQL recently, and some of the principles are quite straightforward but one concept is still bugging me: basically, if in a hypothetical table with, let's say, 10 columns, we have two single-column indexes (for column01 and column02 respectively), plus a primary key column (some other column), then are they going to be used in a simple SELECT query like this one or not:
SELECT * FROM table WHERE column01 = 'aaa' AND column02 = 'bbb'
Looking at it, my first instinct is telling me that the first index is going to retrieve a set of rows (or primary keys in InnoDB, if I got the idea right) that satisfy the first condition, and the second index will get another set. And the final result set will be just the intersection of these two. In the books that I've been going through I cannot find anything about this particular scenario. Of course, for this particular query one index on both columns seems like the best option, but I am struggling with understanding the real process behind this whole thing if I try to use two indexes that I described above.
Its only going to use a single index. You need to create a composite index of multiple columns if you want it to be able to index off of each column you are testing. You may want to read the manual to find out how MySQL uses each type of index, and how to order your composite indexes correctly to get the best utilization of it.
It's actually the most common question
about indexing at all: is it better to
have one index with all columns or one
individual index for every column?
http://use-the-index-luke.com/sql/where-clause/searching-for-ranges/index-combine-performance