I have one query that is preventing me from going live with this application, because it can take up to 7 seconds to complete when it isn't cached.
SELECT attribute1
FROM `product_applications`
WHERE `product_applications`.`brand_id` IN (.. like 500 ids...)
GROUP BY attribute1
I have the brand_id indexed. I used to have this doing a SELECT DISTINCT, but opted for the GROUP BY and performance has improved slightly.
This table is using InnoDB, and has about 2.3 million rows. I have run an EXPLAIN on it and it uses the index, it just takes forever.
I know there are a lot of variables to getting something like this to perform. The db is on an Amazon EC2 instance.
Is there some sort of table splitting I could do to get the query to perform better? I really appreciate any help anybody can offer.
EDIT:
Here are the results of my EXPLAIN, from New Relic:
Id 1
Select Type SIMPLE
Table product_applications
Type range
Possible Keys brand_search_index_1,brand_search_index_2,brand_search_index_3,brand_search_index_4,brand_sarch_index_5
Key brand_search_index_1
Key Length 5
Ref
Rows 843471
Extra Using where; Using index; Using temporary; Using filesort
See, it's using the index. But it's also using a temp table and filesort. How can I overcome that stuff?
EDIT:
Since the time I opened this question, I changed the engine on this table from InnoDB to MyISAM. I also vertically partitioned the table by moving attributes 5 through 60 to another table. But this select statement STILL TAKES BETWEEN 2 AND 3 SECONDS!!!! The poor performance of this query is absolutely maddening.
A different approach, if there are very few distinct values of attribute1, is to try an index on attribute1 to take advantage of the loose index scan.
Please refer to the following answer:
Rewriting mysql select to reduce time and writing tmp to disk
According to that answer, IN should be very fast when the list contains constants; otherwise type conversion happens, which can slow things down.
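A minimal sketch of the loose-index-scan idea, assuming the table and column names from the question (the index name is invented). Whether EXPLAIN ends up reporting "Using index for group-by" depends on your MySQL version and data distribution, so verify rather than assume:
-- attribute1 first is what makes a loose index scan possible at all
ALTER TABLE product_applications ADD INDEX idx_attr1_brand (attribute1, brand_id);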
I would also try a covering index with brand_id as the first column and attribute1 as the second. That should speed things up, because your table won't need to be accessed anymore.
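A sketch of that covering index (the index name is invented):
-- brand_id first so the IN list can be resolved in the index,
-- attribute1 second so the GROUP BY never touches the table rows
ALTER TABLE product_applications ADD INDEX idx_brand_attr1 (brand_id, attribute1);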
EDIT:
About the temporary/filesort, I suspect they are caused by your list of 500+ ids. Could you try EXPLAIN on the query with only one id in the IN operator?
If you can reduce the size of your rows, that might help. Make as many columns as possible NOT NULL. If you can remove all VARCHAR columns, that could help as well.
What exactly does the index it is using cover? Possibly try making the index cover fewer or more columns.
Have you run ANALYZE TABLE recently? That may cause the optimizer to pick another index. You could also try forcing certain indexes.
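For example, along these lines (the id list is shortened here):
-- refresh the index statistics so the optimizer has current estimates
ANALYZE TABLE product_applications;
-- or override the optimizer's choice outright
SELECT attribute1
FROM product_applications FORCE INDEX (brand_search_index_1)
WHERE brand_id IN (1, 2, 3)
GROUP BY attribute1;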
Is there a possibility of reducing the number of ids in the IN clause? What about using a range, if they are always sequential ids?
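If the ids do happen to be contiguous, the rewrite could look like this (the bounds are invented):
SELECT attribute1
FROM product_applications
WHERE brand_id BETWEEN 100 AND 599
GROUP BY attribute1;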
Related
Reading this I now understand when to use indexes and when not to use them. But I have a question: would using an index on a column with a limited number of possible values help speed up queries (SELECT-ing)? Consider the following:
Table "companies": id, district_id, name
Table "districts": id, name
The number of districts would never exceed 5 entries. Should I use an index on companies.district_id then, or not? I read somewhere (can't find the link :( ) that it won't help since there are so few distinct values, and that it would actually slow down the query in many cases.
PS: both tables are MyISAM
Almost never is an INDEX on a low-cardinality column used by the optimizer.
On the other hand, a "compound index" may be useful. For example, does INDEX(district_id, name) have any use?
Having INDEX(district_id) will slow down INSERTs, because the index must be updated whenever a row is inserted. It will not slow down SELECTs, other than the minor amount of time the Optimizer needs to notice the index and reject it.
(My statements apply to both MyISAM and InnoDB.)
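To illustrate the compound-index point with the tables from the question (the index name is invented):
-- INDEX(district_id) alone would likely be ignored, but this index can
-- answer the query below entirely from the index
ALTER TABLE companies ADD INDEX idx_district_name (district_id, name);
SELECT name FROM companies WHERE district_id = 3;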
More discussion of this answer:
MySQL: Building the best INDEX for a given SELECT: Flags and Low Cardinality
I have a fairly simple MySQL query which contains a few inner joins and then a WHERE clause. I have created indexes for all the columns that are used in the joins, as well as the primary keys. The WHERE clause contains an IN operator. When 5 or fewer ids are passed into the IN clause, the query optimizer uses one of my indexes and the query runs in a reasonable amount of time; EXPLAIN shows type range and key PRIMARY. My issue is that if I use more than 5 ids in the IN clause, the optimizer ignores all the available indexes and the query runs extremely slowly; EXPLAIN then shows type ALL and key NULL.
Could someone please shed some light on what is happening here and how I could fix it?
Thanks
Beyond the primary-key indexes on the tables that optimize the JOINs, you should also have an index based on the criteria you commonly apply a WHERE against. More info is needed on the columns of your query, but the point stands: you should have an index on your WHERE criteria too.
You could also try using MySQL index hints. They let you specify which index should be used during query execution.
Examples:
SELECT * FROM table1 USE INDEX (col1_index,col2_index)
WHERE col1=1 AND col2=2 AND col3=3;
-
SELECT * FROM table1 IGNORE INDEX (col3_index)
WHERE col1=1 AND col2=2 AND col3=3;
More Information here:
Mysql Index Hints
Found this while checking up on a similar problem I am having. Thought my findings might help anyone else with a similar issue in future.
I have a MyISAM table with about 30 rows (it contains common typos of similar words, for a search where both the original typo and the alternative word may be valid spellings; the table will slowly grow over time). The cutoff for me is that with 4 items in the IN clause the index is used, but with 5 items it is ignored (note that I haven't tried alternative words, so the actual items in the IN clause might be a factor). So similar to the OP, but with a different number of items.
USE INDEX would not work; the index would still be ignored. FORCE INDEX does work, although I would prefer to avoid specifying indexes (just in case someone deletes the index).
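For illustration, with hypothetical table, column and index names, since the post doesn't give the real ones:
-- USE INDEX is only a hint the optimizer may ignore; FORCE INDEX makes a
-- table scan look very expensive, so the index gets used
SELECT alternative FROM typo_words FORCE INDEX (idx_word)
WHERE word IN ('teh', 'recieve', 'seperate', 'definately', 'occured');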
For some testing I padded out the table with an extra 1,000 random unique rows; after that, the query would use the relevant index even with 80 items in the IN clause.
So it seems MySQL decides whether to use the index based on the number of items in the IN clause relative to the number of rows in the table (probably with some other factors at play too).
Sample table
field 0 : no(PK)
field 1 : title
field 2 : description
field 3 : category1(INDEX)
field 4 : category2(INDEX)
field 5 : category3(INDEX)
field 6 : category4(INDEX)
field 7 : category5(INDEX)
Above is a sample table that I will use on my website; each category field has its own index.
Suppose I execute a command like the one below:
select * from table where category1=1 and category2=2 and category3=3 and category4=4 and category5=5
I want to compare a table that has only one category field with a table that has many category fields, like the one above. Which one is better?
I figured that, of course, a table which has only one category field is the better choice.
But I really don't have deep knowledge of how index costs are calculated.
I have to explain the difference between them to my boss!
So I would like some information, ideally a sample with index costs, sample data, and the calculation process, or anything else that will be useful for understanding how index calculation works.
In general, if you have a query with more than one WHERE constraint, the best index to have is a compound index which contains all the constrained fields. In your case it would be an index on (category1, category2, category3, category4, category5).
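A sketch of that index (the table and index names are placeholders):
ALTER TABLE sample_table
ADD INDEX idx_cat_all (category1, category2, category3, category4, category5);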
However, in practice it is really wasteful to have so many compound indexes. Also, an index is only useful if it has high selectivity. For example, if you have a field which may hold the values 0 or 1 with equal probability (selectivity 1/2), it is almost always NOT worth creating an index on such a field, or even including it in a compound index.
At any rate, always try running EXPLAIN (or EXPLAIN ANALYZE, available in MySQL 8.0.18+) to get an idea of what the query planner is thinking and which index it will choose. A full table scan may be reason to worry, but not always (for example, using a low-selectivity index may not be worth it to the planner).
You can analyze what the execution engine will do using the EXPLAIN EXTENDED query-phrase. The best-case scenario is that MySQL will use an index merge: it selects by each condition via its own index, then merges the result sets without further index help. Usually a composite index is much faster, but that may depend on the number of records and the usage scenario (high or low turnover of records).
As mvp already wrote, use the EXPLAIN syntax to see how the query optimizer will handle your query. In general, MySQL uses one index per table accessed to fetch the data you are looking for. The optimizer also tries to find the one with the highest selectivity when several indexes are possible.
E.g. you might have a query like yours:
SELECT * FROM table WHERE category1=1 AND category2=2 AND category3=3 AND category4=4 AND category5=5
It would be possible to use a combined index that contains category1, category2, category3, category4 and category5 or also a combined index that contains only category1 and category2. The optimizer would decide at runtime which one it would take.
Another common example would be:
SELECT * FROM table WHERE category1=1 OR category2=2
The query optimizer can use an index for either category1 OR category2, but not both! At least, this is what MySQL's EXPLAIN returned. It might be possible for other databases to run both selections in parallel and simply join the two results, removing duplicates.
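One classic workaround for the OR case, not mentioned above but worth knowing: rewrite it as a UNION so each branch can use its own index, with UNION itself removing duplicates:
SELECT * FROM `table` WHERE category1 = 1
UNION
SELECT * FROM `table` WHERE category2 = 2;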
Before you start adding lots of indexes, remember the overhead they produce. If you have many more read accesses than write accesses, it might work out. But if you also have many INSERT or UPDATE operations, the indexes need to be adjusted every time, which causes additional load and increases query execution time.
For your follow-up I recommend this MySQL manual chapter: How MySQL Uses Indexes
In my Java application I have found a small performance issue, caused by this simple query:
SELECT DISTINCT a
FROM table
WHERE checked = 0
LIMIT 10000
I have index on the checked column.
In the beginning, the query is very fast (i.e. where almost all rows have checked = 0). But as I mark more and more rows as checked, the query becomes greatly inefficient (up to several minutes).
How can I improve the performance of this query? Should I add a composite index (a, checked), or rather (checked, a)?
My table has many millions of rows, which is why I do not want to test this manually and hope for a lucky guess.
I would add an index on (checked, a). This means the value you're returning has already been found in the index, and there's no need to re-access the table to find it. Secondly, if you're doing lots of individual updates to the table, there's a good chance both the table and the index have become fragmented on disk. Rebuilding (compacting) the table and index can significantly increase performance.
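A sketch of both suggestions, using the table name from the question (the index name is invented):
ALTER TABLE `table` ADD INDEX idx_checked_a (checked, a);
-- rebuilds the table and its indexes, compacting fragmented storage
OPTIMIZE TABLE `table`;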
You can also use the query rewritten as (just in case the optimizer does not understand that it's equivalent):
SELECT a
FROM table
WHERE checked = 0
GROUP BY a
LIMIT 10000
Add an index on the DISTINCT column (a in this case). MySQL is able to use this index for the DISTINCT.
MySQL may also take advantage of a compound index on (a, checked) (the order matters: the DISTINCT column has to be at the start of the index). Try both and compare the results with your data and your queries.
(After adding this index you should see Using index for group-by in the EXPLAIN output.)
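For instance, a quick way to check the (a, checked) variant (the index name is invented):
ALTER TABLE `table` ADD INDEX idx_a_checked (a, checked);
EXPLAIN SELECT DISTINCT a FROM `table` WHERE checked = 0 LIMIT 10000;
-- look for "Using index for group-by" in the Extra column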
See GROUP BY optimization on the manual. (A DISTINCT is very similar to a GROUP BY.)
The most efficient way to process GROUP BY is when an index is used to directly retrieve the grouping columns. With this access method, MySQL uses the property of some index types that the keys are ordered (for example, BTREE). This property enables use of lookup groups in an index without having to consider all keys in the index that satisfy all WHERE conditions.
My table has a lot of millions of rows <...> where almost all rows have
checked=0
In this case it seems that the best index would be a simple (a).
UPDATE:
It was not clear how many rows get checked. From your comment below the question:
At the beginning 0 is in 100% rows, but at the end of the day it will
become 0%
This changes everything. So @Ben has the correct answer.
I have found a completely different solution which does the trick. I will simply create a new table with all possible unique "a" values. This allows me to avoid DISTINCT altogether.
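A sketch of that lookup-table idea (all names and the column type are assumptions); keeping it current on new writes would be up to the application or a trigger:
CREATE TABLE a_values (a INT NOT NULL PRIMARY KEY);
INSERT IGNORE INTO a_values SELECT DISTINCT a FROM `table`;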
You don't state it, but are you refreshing the index statistics regularly? As changes occur to the underlying data, the statistics become less and less accurate and the optimizer's choices get worse and worse. If you have an index on checked, and checked is being updated over time, you need to make sure those statistics are updated accordingly on a regular basis.
I have a table with about 200,000 records.
It takes a long time to do a simple SELECT query. I am confused, because I am running on a 4-core CPU with 4 GB of RAM.
How should I write my query?
Or is there anything to do with INDEXING?
Important note: my table is static (its data won't change).
What are your solutions?
PS
1 - My table has a primary key id.
2 - My table has a unique key serial.
3 - I want to query over the other fields, like where param_12 not like '%I.S%'
or where param_13 = '1'.
4 - 200,000 is not big, and this is exactly why I am surprised.
5 - I even have a problem when adding a simple field: my question
6 - Can I create an INDEX for BOOL fields? (And is it useful?)
PS, and thanks for the answers:
7 - My select should return the rows that contain the specified 'I.S' (or those that don't).
select * from `table` where `param_12` like '%I.S%'
This is all I want. It seems no index helps here, hm?
Indexing will help. Please post your table definition and SELECT query.
Add an index for all "=" columns in the WHERE clause.
Yes, you'll want/need to index this table, and partitioning would be helpful as well. Doing this properly is something you will need to provide more information for. You'll want to use EXPLAIN and look over your queries to determine which columns to index and how.
Another aspect to consider is whether or not your table is normalized. Normalized tables tend to give better performance due to lowered I/O.
I realize this is vague, but without more information that's about as specific as we can be.
BTW: a table of 200,000 rows is relatively small.
Here is another SO question you may find useful
1 - My table has a primary key id: Not really useful unless you use some scheme which requires a numeric primary key.
2 - My table has a unique key serial: The id is also unique by definition; why not use serial as the primary key? This column is automatically indexed because you defined it as unique.
3 - I want to query over the other fields, like where param_12 not like '%I.S%' or where param_13 = '1': A LIKE '%something%' query cannot really use an index. Is there some way you can split param_12, say into a param_12a holding everything before the match and a param_12b starting at it, so the condition becomes LIKE 'I.S%'? An index can be used for a LIKE when the start of the string is known.
4 - 200,000 is not big, and this is exactly why I am surprised: Yep, 200,000 is not that much. But without good indexes, good queries and/or enough cache, MySQL will need to read all data from disk for comparison, which is slow.
5 - I even have a problem when adding a simple field: my question
6 - Can I create an INDEX for BOOL fields? Yes you can, but an index which matches half of the rows is fairly useless. An index is used to limit, as much as possible, the number of records MySQL has to load fully; if it does not dramatically limit that number, as is often the case with booleans (a 50-50 distribution), using the index only adds disk I/O and can actually slow the search down. So unless you expect something like an 80-20 distribution or better, creating the index will cost time rather than win time.
The index on param_13 might be used, but not the one on param_12 in this example, since the leading '%' in the LIKE negates the index use.
If you're querying data with LIKE '%asdasdasd%' then no index can help you. It will have to do a full scan every time. The problem here is the leading % because that means that the substring you are looking for can be anywhere in the field - so it has to check it all.
Possibly you might look into full-text indexing, but depending on your needs that might not be appropriate.
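For reference, the full-text route would look roughly like this (MyISAM supports FULLTEXT natively; InnoDB only since MySQL 5.6). The caveat: by default, words shorter than ft_min_word_len (4 on MyISAM) are not indexed, and punctuation splits tokens, so a term like 'I.S' would probably not benefit:
ALTER TABLE `table` ADD FULLTEXT INDEX ft_param_12 (param_12);
SELECT * FROM `table` WHERE MATCH(param_12) AGAINST('I.S' IN BOOLEAN MODE);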
Firstly, ensure your table has a primary key.
To get any more detailed answer than that, you'll need to provide more information about the structure of the table and the types of queries you are running.
I don't believe that the keys you have will help. You have to index the columns used in your WHERE clauses.
I'd also point out that a LIKE with a leading wildcard requires a table scan regardless of indexes. The minute you use a pattern like that, you lose the value of the index, because you have to check each and every row.
You're right: 200K isn't a huge table. EXPLAIN will help here. If you see type ALL (a full table scan), redesign.