mysql IN clause not using possible keys

I have a fairly simple mysql query which contains a few inner joins and then a where clause. I have created indexes for all the columns that are used in the joins, as well as the primary keys. I also have a where clause which contains an IN operator. When only 5 or fewer ids are passed into the IN clause, the query optimizer uses one of my indexes and runs the query in a reasonable amount of time. When I use explain I see that the type is range and the key is PRIMARY. My issue is that if I use more than 5 ids in the IN clause, the optimizer ignores all the available indexes and the query runs extremely slowly. When I use explain I see that the type is ALL and the key is NULL.
Could someone please shed some light on what is happening here and how I could fix it?
Thanks

Regardless of the "primary key" indexes on the tables that optimize the JOINs, you should also have an index on the common criteria you apply a WHERE against. More information about the columns in your query is needed, but you should have an index on your WHERE criteria too.

You could also try using MySQL index hints. They let you specify which index should be used during query execution.
Examples:
SELECT * FROM table1 USE INDEX (col1_index,col2_index)
WHERE col1=1 AND col2=2 AND col3=3;
-
SELECT * FROM table1 IGNORE INDEX (col3_index)
WHERE col1=1 AND col2=2 AND col3=3;
More information here:
MySQL Index Hints
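MySQL also has a FORCE INDEX hint, which works like USE INDEX but additionally makes a table scan look very expensive to the optimizer. A sketch using the same hypothetical table and index names as the examples above:
SELECT * FROM table1 FORCE INDEX (col1_index)
WHERE col1=1 AND col2=2 AND col3=3;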

Found this while checking up on a similar problem I am having. Thought my findings might help anyone else with a similar issue in future.
I have a MyISAM table with about 30 rows (it contains common typos of similar words for a search where both the original typo and the alternative word may be valid spellings; the table will slowly grow in size). The cutoff for me is that with 4 items in the IN clause the index is used, but with 5 items the index is ignored (note I haven't tried alternative words, so the actual items in the IN clause might be a factor). So it's similar to the OP's case, but with a different number of values.
USE INDEX would not work and the index would still be ignored. FORCE INDEX does work, although I would prefer to avoid specifying indexes (just in case someone deletes the index).
For some testing I padded out the table with an extra 1000 random unique rows; after that the query would use the relevant index even with 80 items in the IN clause.
So it seems MySQL decides whether to use the index based on the number of items in the IN clause relative to the number of rows in the table (though other factors are probably at play too).

Related

Is it OK to have too many composite (multi-column) indexes?

Suppose my table has columns A, B, C, D, E, F, G, H. I create a composite index on (A, B, C, D, E) because those columns appear in the WHERE clause of one query. Then, for a different query, I want an index on (A, B, C, E, F), so I create another composite index on the same table. For yet another query I want an index on (A, B, C, G, H), so I create another composite index on the same table again.
So my question is: can we create as many composite indexes as our requirements demand? Sometimes the columns in our WHERE clause change and we have to improve performance, so is there any other way to optimise the query, or can we keep adding composite indexes as needed?
I have tried multiple composite indexes and they have not caused me any problems so far, but I would like to know from all of you whether they will cause problems in the future, and whether it is OK to use only composite indexes and no single-column indexes.
Your answers will help me a lot; thanks in advance.
You can have as many as you want. However, each additional index has a cost when updating, inserting, or deleting. The trick is to find common segments and make indexes for those, or to create them as required when queries are too slow.
As an example, if you are "needing" indexes for ABCDE, ABDEF, and ABGIH, then create an index on just AB.
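As a rough sketch of that idea (the table and index names are placeholders; the column letters are the ones from the question):
ALTER TABLE my_table ADD INDEX idx_ab (A, B);
-- Queries filtering on A and B plus other columns
-- (the ABCDE, ABDEF and ABGIH cases above) can all use this one index as a starting point.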
InnoDB supports up to 64 indexes per table (cf. https://dev.mysql.com/doc/refman/8.0/en/innodb-limits.html).
If you try to create a composite index for every permutation of N columns, you would need N-factorial indexes. So for 8 columns, you would need 40,320 indexes. Clearly this is more than InnoDB supports.
You probably don't need that many indexes. In practice, I've rarely seen more than 6 indexes in a given table. All queries that are needed are optimized by one of those.
I understand you said sometimes you change the terms in your query's WHERE clause, so it might need a composite index with different columns.
You can rely on indexes that have a subset of all the columns that would be optimal. That won't be 100% optimized, but it will be better than no index.
You can't predict the optimal set of indexes for a given query until you write that query.
There is a limit of 64 secondary indexes (at least in InnoDB).
Order the columns in each index so that the columns being tested with = come first. (The relative order of those = columns within the INDEX does not matter.)
The leftmost columns in an index are the most important.
There is little or no use in including more than one column that will be searched by a range.
Study your likely queries, and find the most common combinations of 2 or 3 columns; build indexes starting with those.
In your two examples (ABCDEFGH and ABCEF), ABC would work for both (assuming at least A and B are tested with =). If you do throw on more columns, that one INDEX can still be used for both cases.
Maybe you would want to declare both ABCDEFGH and BCEFA; this handles your ABCDEF case, plus cases that have B, but not A. (Remember 'leftmost'.)
Use the SlowLog to find the slowest queries and make better indexes for them.
More on indexing: Index Cookbook
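A sketch of the 'leftmost prefix' point above, with one composite index from the question and one from the suggestion above (table and index names are placeholders; the column letters are the ones from the question):
ALTER TABLE my_table
  ADD INDEX idx_abcde (A, B, C, D, E),
  ADD INDEX idx_bcefa (B, C, E, F, A);
-- A query testing A, B and C with = can use the leftmost (A, B, C) part of idx_abcde.
-- A query that tests B and C but not A cannot use idx_abcde, but can use idx_bcefa.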
Each index requires disk space to be stored, and time to be updated every time you update (or insert/delete) an indexed column value.
So as long as you don't run out of storage, and write operations don't become too slow because too many indexes have to be updated, you are not limited in how many specific indexes you can create.
This depends on your use case and should be measured with production like data.
A common solution would be to create one index specific for your most important query e.g. in your case ABCDE.
Other queries can still use the leading columns of that index, from left to right, up to the first column they do not filter on; e.g. a query searching on A, B, C, E, F could still use the A, B, C part of the index mentioned above.
To also utilise column E, you could add a WHERE condition on D to your query in a way that you know matches all values, e.g. D < 100 if you know there are only values 1-99.
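A sketch of that trick, assuming the index is on (A, B, C, D, E) as above and that D is known to hold only values 1-99 (table name and constants are placeholders):
SELECT A, B, C, E, F
FROM my_table
WHERE A = 1 AND B = 2 AND C = 3
  AND D < 100   -- dummy range known to match every row; intended to let the condition on E be applied inside the index too
  AND E = 5;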

MySQL Index is NULL but there are available Keys

I have the following problem when running a mysql query:
The query is very slow, and when I use EXPLAIN the key is NULL even though possible_keys are available and the order is correct. I also tried adding independent indexes for each column, but key was still NULL.
You can see table, index and mysql explain here: https://snag.gy/vcChl6.jpg
The optimizer likely has just decided that there is no reason to use the index.
Since you are using SELECT *, if it used the index it would have to take the primary key from the index and then go back and look up all the necessary data in the clustered index. That is referred to as a double lookup, and is generally bad for performance. As there are so few records in this table, the optimizer likely decided that it can easily do a full table scan instead and get your result faster.
In short, this is expected behavior.
If you want to SELECT just some columns, add them to the t1 index and then just SELECT only the columns you need, with that given WHERE clause. It should use the index then. As your table grows in size, it may start using the index as well, once it estimates that the double lookup is cheaper than the full table scan.
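A sketch of that suggestion, assuming the composite index is the t1 (id_project, id_lang) index mentioned below, and using my_table and title as placeholder names for the table and the column you actually need:
ALTER TABLE my_table
  DROP INDEX t1,
  ADD INDEX t1 (id_project, id_lang, title);
SELECT id_project, id_lang, title
FROM my_table
WHERE id_project = 1 AND id_lang = 2;   -- the index now covers the whole query, so no double lookup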
A guess: Most rows are of that 'project' and that 'lang'.
The Optimizer does not understand that fact, so it takes the index that is obviously the best:
(id_project, id_lang)
This one would be equally good: (id_lang, id_project).
No fair... The EXPLAIN mentions indexes named id_project and id_lang (not useful), but the list of indexes shows a composite index t1(id_project, id_lang) (useful).
Then, as Willem suggests, it has to bounce between the index and the table. Normally (that is, when it has adequate statistics), the Optimizer will say "Oh, more than ~20% of the table is being referenced; let's ignore any index."
Things you can do:
Get rid of that index.
Change * to a list of just the columns you need. In particular, if you avoid the 3 TEXT columns, two optimizations kick in. Alternatively, any that will never be longer than 255 characters can be changed to VARCHAR(255) (see the sketch after this list).
Use some other filtering, ordering, limiting, etc. If this is a web application, do you really want to get ~534 rows?
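A sketch of the column-list suggestion from the list above (the column names other than id_project and id_lang are hypothetical, since the real ones are only visible in the linked screenshot):
-- Name only the columns you need instead of SELECT *, skipping the TEXT columns:
SELECT id, id_project, id_lang, title
FROM my_table
WHERE id_project = 1 AND id_lang = 2;
-- If a TEXT column never holds more than 255 characters, it can be shrunk:
ALTER TABLE my_table MODIFY description VARCHAR(255);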

Range query is not using indexes in mysql

I am trying to optimize the query which we are using to generate reports.
I think I did quite well optimizing it to some extent.
Below was the original query:
select trat.asset_name as group_name,trat.name as sub_group_name,
trat.asset_id as group_id,
if(trat.cause_task_type='AccessRequest',true,false) as is_request_task,
'' as grouped_on,
concat(trat.asset_name,' - {0} (',count(*),')') as table_heading
from t_remote_agent_tasks trat
where trat.status in ('completed','failedredundant')
and trat.name not in ('collect-data','update-conn-params')
group by trat.asset_name, trat.name
order by field(trat.name,'create-account','change-attr',
'add-member-to-group',
'grant-access','disable-account','revoke-access',
'remove-member-from-group','update-license')
When I look at the execution plan, the Extra column says Using where; Using temporary; Using filesort.
So I optimized the query like this:
select trat.asset_name as group_name,trat.name as sub_group_name,
trat.asset_id as group_id,
if(trat.cause_task_type='AccessRequest',true,false) as is_request_task,
'' as grouped_on,
concat(trat.asset_name,' - {0} (',count(*),')') as table_heading
from t_remote_agent_tasks trat
where trat.status in ('completed','failedredundant')
and trat.name not in ('collect-data','update-conn-params')
group by trat.asset_name,trat.name
order by null
This gives me an execution plan of Using where; Using temporary. So the filesort is no longer used and there is no extra overhead, since the optimizer doesn't have to sort; that is taken care of during the GROUP BY.
I then went further and created an index on the GROUP BY columns, in the same order as they appear in the GROUP BY (this is important or the optimization won't happen), i.e. an index on (trat.asset_name, trat.name).
Now this optimization gives me only Using where in the Extra column. The query execution time also got reduced by almost half (earlier it was 0.568 sec and now 0.345 sec; not exact, as it varies on every run, but more or less in this range).
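For reference, the index from that step can be written as follows (the index name is my own choice):
CREATE INDEX idx_asset_name_name
ON t_remote_agent_tasks (asset_name, name);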
Now I want to optimize the range part of the query, i.e. this part:
trat.status in ('completed','failedredundant')
and trat.name not in ('collect-data','update-conn-params')
I am reading the MySQL reference guide on optimizing range queries, which says NOT IN is not a range condition. So I modified the query like this:
trat.status in ('completed','failedredundant')
and trat.name in ('add-member-to-group','change-attr','create-account',
'disable-account','grant-access', 'remove-member-from-group',
'update-license')
But it doesn't show any improvement in Extra (I expected Using index to appear there; it still shows Using where).
I also tried splitting both IN parts into UNIONs (that changes the query result, but there was still no improvement in the execution plan).
I would like some help on how to optimize this query further, mostly the range part (the IN part).
Any other optimization if I need to make on this?
I appreciate your time, thanks in advance.
EDIT 1: I forgot to mention that I also have an index on trat.status, so below are the indexes:
(trat.asset_name,trat.name)
(trat.status)
In virtually all cases, only one index per table is used in a SELECT. So, the best one must be available.
Both of the first two queries will probably benefit most from the same 'composite' index:
INDEX(asset_name, name)
Normally, one would try to handle the WHERE conditions in the index, but they do not look amenable to an index. (More discussion below.) Second choice is the GROUP BY, which I am recommending. But, since (in the first case) the ORDER BY and the GROUP BY are different, there will necessarily be a tmp table created for the output of the GROUP BY so that it can be sorted according to the ORDER BY. (There may also be a tmp and sort for the GROUP BY; I cannot tell.)
"Using index" means that a "covering" index was used. A "covering" index is a composite index that includes all of the columns used anywhere in the SELECT. That would be about 5 columns, and probably not wise to attempt. (More below.)
Another thing to note is that even something this simple:
WHERE x IN (11,22)
GROUP BY y
cannot use any index to handle both the WHERE and GROUP BY. So, there is no way for your query to consume both (except by 'covering').
A covering index, when used, is only partially useful. It says that all the work is done just in the BTree of the index. But that could include a full index scan -- which is not that much faster than a full table scan. This is another argument against recommending 'covering'.
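For completeness, a covering index for the second query above would have to include every column the query touches; a sketch (not necessarily recommended, for the reasons just given; the index name is my own):
ALTER TABLE t_remote_agent_tasks
  ADD INDEX ix_cover (asset_name, name, status, cause_task_type, asset_id);
-- All five columns used by the query are in the index, so EXPLAIN could show Using index (a covering index).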
In some situations, IN or OR can be sped up by turning it into UNION:
( SELECT ... WHERE status in ('completed') )
UNION ALL
( SELECT ... WHERE status in ('failedredundant') )
but this will only cause you to stumble into the NOT IN(...) clause, which is worse than an IN.
The goal of finding the best index is to find one that has the rows (in the index and/or in the table) consecutively sitting in the BTree.
To make any further improvements on this query will probably require re-thinking the schema -- it seems to be forcing you to have IN, NOT IN, FIELD and other hard-to-optimize constructs.

Fulltext and composite indexes and how they affect the query

Just say I had a query like the one below:
SELECT
name,category,address,city,state
FROM
table
WHERE
MATCH(name,subcategory,category,tag1) AGAINST('education')
AND
city='Oakland'
AND
state='CA'
LIMIT
0, 10;
...and I had a fulltext index on (name, subcategory, category, tag1) and a composite index on (city, state); is this good enough for this query? Just wondering if something extra is needed when mixing additional ANDs with the fulltext MATCH/AGAINST.
Edit: What I am trying to understand is, what happens with the additional columns that are within the query but are not indexed in the chosen index (the fulltext index), the above example being city and state. How does MySQL now find the matching rows for these since it can't use two indexes (or can it?) - so, basically, I'm trying to understand how MySQL goes about finding the data optimally for the columns NOT in the chosen fulltext index and if there is anything I can or should do to optimize the query.
If I understand your question, you know that the MATCH AGAINST uses your FULLTEXT index and you're wondering how MySQL goes about applying the rest of the WHERE clause (i.e. does it do a tablescan or an indexed lookup).
Here's what I'm assuming about your table: it has a PRIMARY KEY on some id column and the FULLTEXT index.
So first off, MySQL will never use the FULLTEXT index for the city/state WHERE clause. Why? Because FULLTEXT indexes only apply with MATCH AGAINST. See here in the paragraph after the first set of bullets (not the Table of Contents bullets).
EDIT: In your case, assuming your table doesn't only have like 10 rows, MySQL will apply the FULLTEXT index for your MATCH AGAINST, then do a tablescan on those results to apply the city/state WHERE.
So what if you add a BTREE index onto city and state?
CREATE INDEX city__state ON table (city(10),state(2)) USING BTREE;
Well, MySQL can only use one index for this query since it's a simple select. It will either use the FULLTEXT or the BTREE. Note that when I say one index, I mean one index definition, not one column in a multi-part index. Anyway, this then begs the question: which one does it use?
That depends on the table analysis. MySQL will attempt to estimate (based on table stats from the last OPTIMIZE TABLE) which index will prune the most records. If the city/state WHERE gets you down to 10 records while the MATCH AGAINST only gets you down to 100, then MySQL will use the city__state index first for the city/state WHERE and then do a tablescan for the MATCH AGAINST.
On the other hand, if the MATCH AGAINST gets you down to 10 records while the city/state WHERE gets you down to only 1000, then MySQL will apply the FULLTEXT index first and tablescan for city and state.
The bottom line is the cardinality of your index. Essentially, how unique are the values that will go into your index? If every record in your table has city set to Oakland, then it's not a very unique key and so having city = 'Oakland' doesn't really reduce the number of records all that much for you. In that case, we say your city__state index has a low cardinality.
Consequently if 90% of the words in your FULLTEXT index are "John", then that doesn't really help you much either for the exact same reasons.
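If you want to see the cardinality the optimizer is working with for the regular (BTREE) indexes, SHOW INDEX reports it (here `table` is the placeholder name from the question):
SHOW INDEX FROM `table`;
-- The Cardinality column is an estimate of the number of distinct values in each indexed key part.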
If you can afford the space and the UPDATE/DELETE/INSERT overhead, I would recommend adding the BTREE index and letting MySQL decide which index he wants to use. In my experience, he usually does a very good job of picking the right one.
I hope that answers your question.
EDIT: On a side note, make sure you pick the right size for your BTREE index (in my example I picked the first 10 chars of city). This obviously makes a huge difference to cardinality. If you picked city(1), then obviously you'll get a lower cardinality than if you did city(10).
EDIT2: MySQL's query plan (estimation) for which index prunes the most records is what you see in EXPLAIN.
I think you can easily determine which index gets used by using EXPLAIN on your query. Please check the accepted answer for this question, which provides some good resources on how to interpret the output of EXPLAIN.
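For example, with the query from the question (`table` being the placeholder name used there):
EXPLAIN SELECT name, category, address, city, state
FROM `table`
WHERE MATCH(name, subcategory, category, tag1) AGAINST('education')
  AND city = 'Oakland'
  AND state = 'CA'
LIMIT 0, 10;
-- The key column shows whether the fulltext index or the city__state index was chosen.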
How does MySQL now find the matching rows for these since it can't use two indexes
Yes it can: Can MySQL use multiple indexes for a single query? Also, you should read the documentation: How MySQL Uses Indexes
I had a similar task some time ago, and I noticed that MySQL can use either the FULLTEXT index or any other index/indexes in one query, but not both; I wasn't able to mix FULLTEXT with any other index. Any selection with a fulltext search will work in the following way:
select subset using FULLTEXT search
select records matching other criteria from that subset 'Using where'
So you can use either the fulltext index or any other index (I wasn't able to use both indexes with FORCE INDEX or anything else).
I suggest trying both ways, using the fulltext index and using the other index (i.e. on the City and State columns), and comparing the results; they may vary depending on the actual content in your database.
In my case I discovered that forcing the regular (non-fulltext) index in such a query produced better performance (since I had a very large number of rows, about 300,000, and the non-fulltext criteria matched about 1,000 of them).
I was using MySQL 5.5.24

MySQL: Optimize query with DISTINCT

In my Java application I have found a small performance issue, which is caused by such simple query:
SELECT DISTINCT a
FROM table
WHERE checked = 0
LIMIT 10000
I have index on the checked column.
In the beginning, the query is very fast (i.e. where almost all rows have checked = 0). But as I mark more and more rows as checked, the query becomes greatly inefficient (up to several minutes).
How can I improve the performance of this query? Should I add a compound index on (a, checked), or rather on (checked, a)?
My table has many millions of rows, which is why I do not want to test it manually and hope for a lucky guess.
I would add an index on (checked, a). This means the value you're returning has already been found in the index and there's no need to re-access the table to find it. Secondly, if you're doing lots of individual updates of the table, there's a good chance both the table and the index have become fragmented on disk. Rebuilding (compacting) a table and index can significantly increase performance.
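In DDL form that would be something like this (the index name is my own choice; `table` is the placeholder name from the question):
ALTER TABLE `table` ADD INDEX idx_checked_a (checked, a);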
You can also use the query rewritten as (just in case the optimizer does not understand that it's equivalent):
SELECT a
FROM table
WHERE checked = 0
GROUP BY a
LIMIT 10000
Add a compound index on the DISTINCT column (a in this case). MySQL is able to use this index for the DISTINCT.
MySQL may also take advantage of a compound index on (a, checked) (the order matters; the DISTINCT column has to be at the start of the index). Try both and compare the results with your data and your queries.
(After adding this index you should see Using index for group-by in the EXPLAIN output.)
See GROUP BY optimization on the manual. (A DISTINCT is very similar to a GROUP BY.)
The most efficient way to process GROUP BY is when an index is used to directly retrieve the grouping columns. With this access method, MySQL uses the property of some index types that the keys are ordered (for example, BTREE). This property enables use of lookup groups in an index without having to consider all keys in the index that satisfy all WHERE conditions.
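A quick way to compare the two candidates, assuming the table and columns from the question (index names are my own):
ALTER TABLE `table`
  ADD INDEX idx_a (a),
  ADD INDEX idx_a_checked (a, checked);
EXPLAIN SELECT DISTINCT a FROM `table` WHERE checked = 0 LIMIT 10000;
-- Look for "Using index for group-by" in the Extra column.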
My table has a lot of millions of rows <...> where almost all rows have checked = 0
In this case it seems that the best index would be a simple (a).
UPDATE:
It was not clear how many rows get checked. From your comment below the question:
At the beginning 0 is in 100% of rows, but at the end of the day it will become 0%
This changes everything. So @Ben has the correct answer.
I have found a completely different solution which would do the trick. I will simply create a new table with all possible unique "a" values. This will allow me to avoid DISTINCT.
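A sketch of that idea; the column type is assumed here since the question does not state it, and `table` is the placeholder name from the question:
CREATE TABLE a_values (
  a INT NOT NULL PRIMARY KEY   -- match the type of `a` in the original table
);
INSERT IGNORE INTO a_values (a)
SELECT a FROM `table` WHERE checked = 0;   -- the PRIMARY KEY keeps the values unique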
You don't state it, but are you updating the index regularly? As changes occur to the underlying data, the index becomes less and less accurate and processing gets worse and worse. If you have an index on checked, and checked is being updated over time, you need to make sure your index is updated accordingly on a regular basis.