what index should I use? - mysql

I'm getting pretty slow performance for a pretty simple statement (3-4 seconds):
SELECT col1,col2,col3 FROM table WHERE fname LIKE '%D%' OR fname LIKE '%S%' ORDER BY DATE(bday) DESC LIMIT 0,100
I have an index on fname, another on bday, and even a composite index on fname and bday. Here's my EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table ALL NULL NULL NULL NULL 95856 Using where; Using filesort

No index will help you here.
When you use LIKE '%something%', MySQL has to look at every row and do the string matching, because a B-tree index can only be searched by the leading characters of a value.
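To see the difference in a query plan, one could compare a leading-wildcard pattern with a prefix pattern. This is only a minimal sketch: the people table, its column types, and the index name idx_fname are assumptions made for illustration, not the asker's actual schema.

```sql
-- Hypothetical table standing in for the one in the question.
CREATE TABLE people (
  id INT NOT NULL AUTO_INCREMENT,
  fname VARCHAR(50) NOT NULL,
  bday DATETIME NOT NULL,
  PRIMARY KEY (id),
  KEY idx_fname (fname)
);

-- Leading wildcard: every fname value has to be examined,
-- so idx_fname cannot be used to narrow the search.
EXPLAIN SELECT id, fname FROM people WHERE fname LIKE '%D%';

-- Prefix pattern: the leading 'D' is known, so the optimizer
-- can consider a range scan on idx_fname.
EXPLAIN SELECT id, fname FROM people WHERE fname LIKE 'D%';
```

On an empty or very small table the optimizer may pick a scan in both cases, so the comparison is only meaningful with realistic data.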

As stated, no index will help. Personally, I'd consider full-text search overkill for this; I'd add another column to the table, fill it in on insert and update, and index that.
LIKE is for one-offs.

Related

MySQL Array Variable (No stored procedure, No temporary table)

As mentioned in the title, I would like to know of any solution for this that does not use a stored procedure, temporary table, etc.
Compare Query #1 and Query #3: Query #3 performs worse. Is there a workaround that keeps the list in a variable without hurting performance?
Schema (MySQL v5.7)
create table `order`(
id BIGINT(20) not null auto_increment,
transaction_no varchar(20) not null,
primary key (`id`),
unique key uk_transaction_no (`transaction_no`)
);
insert into `order` (`id`,`transaction_no`)
value (1,'10001'),(2,'10002'),(3,'10003'),(4,'10004');
Query #1
explain select * from `order` where transaction_no in ('10001','10004');
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE order NULL range uk_transaction_no uk_transaction_no 22 NULL 2 100 Using where; Using index
Query #2
set @transactionNos = "10001,10004";
There are no results to be displayed.
Query #3
explain select * from `order` where find_in_set(transaction_no, @transactionNos);
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE order NULL index NULL uk_transaction_no 22 NULL 4 100 Using where; Using index
Short Answer: See sargability
Long Answer:
MySQL makes no attempt to optimize an expression when an indexed column is hidden inside a function call such as FIND_IN_SET(), DATE(), etc. Your Query 3 will always be performed as a full table scan or a full index scan.
So the only way to optimize what you are doing is to construct
IN ('10001','10004')
That is often difficult to achieve when "binding" a list of values. IN(?) will not work in most APIs to MySQL.
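One way to keep the list in a variable and still end up with a literal IN (...) list, without a stored procedure or temporary table, is a server-side prepared statement built from a string. This is a sketch under the assumption that the values come from a trusted source, since concatenating SQL text is otherwise open to injection. The variable name @sql and the statement name stmt are illustrative.

```sql
-- Build the quoted, comma-separated list; QUOTE() escapes each value.
SET @transactionNos = CONCAT_WS(',', QUOTE('10001'), QUOTE('10004'));

-- Assemble the statement text so the optimizer sees a literal IN list.
SET @sql = CONCAT(
  'SELECT * FROM `order` WHERE transaction_no IN (', @transactionNos, ')'
);

-- Prepare, run, and release the dynamically built statement.
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

Because the final statement contains a plain IN list, it can be planned the same way as Query #1 rather than Query #3.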

How to reduce rows lookup when using LIMIT MySQL

I have the following table with Index on id and Foreign Key on activityID:
comment (id, activityID, text)
and the following query:
SELECT <cols> FROM `comment` WHERE `comment`.`activityID` = 1257 ORDER BY `id` DESC LIMIT 20;
I basically want to get only the first 20 comments for this activity, which has 1165 comments; however, this is the result of a DESCRIBE:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE comment ref activityID activityID 4 const 1165 NULL
Essentially, it is looking through all comments for this activity before deciding to limit it.
We tested this query under high load: when an activity has 200,000 comments the query takes 5+ seconds, whereas under the same load an activity with 30 comments takes a couple of ms.
PS: If I remove the WHERE clause, EXPLAIN says it will look up only a single row (I don't know if that's really the case):
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE comment index NULL PRIMARY 4 NULL 1 NULL
Is it possible to optimize this kind of query in any way?
Thank you.
The ordering is causing the slowness.
The query uses the activityID index to find all the rows with that ID. Then it has to read all 200,000 comments and sort them by id to find the last 20.
Add a composite index so it can use an index for the ordering:
ALTER TABLE comment ADD INDEX (activityID, id);
Note that you will no longer need the index on activityID by itself, since it's a prefix of this new index.
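A sketch of how the suggestion could be checked. The column types are assumptions (the question gives only the column names, though the key_len of 4 in the EXPLAIN output suggests a non-nullable INT for activityID), and the index name idx_activity_id is hypothetical.

```sql
-- Assumed shape of the table; only the column names come from the question.
CREATE TABLE `comment` (
  id INT NOT NULL AUTO_INCREMENT,
  activityID INT NOT NULL,
  `text` TEXT,
  PRIMARY KEY (id),
  KEY activityID (activityID)
);

-- Composite index: equality column first, ordering column second.
ALTER TABLE `comment` ADD INDEX idx_activity_id (activityID, id);

-- Re-run EXPLAIN and compare the key, rows, and Extra columns
-- with the plan shown in the question.
EXPLAIN SELECT id, activityID, `text`
FROM `comment`
WHERE activityID = 1257
ORDER BY id DESC
LIMIT 20;
```

Keep in mind that the rows column in EXPLAIN is an estimate and may not reflect how many rows are actually read once LIMIT stops the scan, so timing the query on realistic data is the more reliable check.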
Use offset
SELECT <cols> FROM `comment` WHERE `comment`.`activityID` = 1257 ORDER BY `id` DESC LIMIT 0,20;
In the LIMIT clause, add 0 as the offset to get only the first 20 comments.
Just add two separate indexes on activityID and id. That should help with the ORDER BY too. There are no hard-and-fast rules in optimization; you need to try various methods.
Do it this way:
ALTER TABLE comment ADD INDEX (id);
ALTER TABLE comment ADD INDEX (activityID);
I think this will help.

Index is not used when replace function is used on indexed column

I have an SQL query on a table with a mobileno column. The mobileno values contain "-", so in the WHERE clause I need to use replace(mobileno,"-","") for the comparison.
With EXPLAIN I checked that the index is not used when a function is applied to the indexed column. How can I force MySQL to use the index, or is there an alternative that would improve query performance?
Are you sure the index is not used because of the REPLACE function? Maybe there's not enough rows to make using an index worthwhile. Or some other reason.
I ran a test with 4256 rows and to my surprise, an index was used, even with REPLACE:
EXPLAIN EXTENDED SELECT mobileno FROM test WHERE REPLACE(mobileno, '(', '') = '123) 456-789'
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE test index NULL mobileno 768 NULL 4256 100 "Using where; Using index"
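Note that type index in the plan above means the whole index is scanned, not that the index is used to seek directly to the matching rows. If the server is MySQL 5.7 or later (the question does not say), one option is a stored generated column that holds the normalized number and has its own index. This is only a sketch: the table definition, the column name mobileno_digits, and the index name are assumptions.

```sql
-- Hypothetical minimal version of the table from the test above.
CREATE TABLE test (
  mobileno VARCHAR(255),
  KEY mobileno (mobileno)
);

-- Generated column that strips the dashes once, at write time,
-- plus an index on the stripped value.
ALTER TABLE test
  ADD COLUMN mobileno_digits VARCHAR(255)
    AS (REPLACE(mobileno, '-', '')) STORED,
  ADD INDEX idx_mobileno_digits (mobileno_digits);

-- The comparison is now against a plain indexed column,
-- with no function wrapped around it.
EXPLAIN SELECT mobileno
FROM test
WHERE mobileno_digits = '123456789';
```

On older versions the same idea works with an ordinary column kept up to date by the application or by triggers.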

DISTINCT causing full table scan

I have a table in MySQL (5.5.31) which has about 20M rows. The following query:
SELECT DISTINCT mytable.name name FROM mytable
LEFT JOIN mytable_c ON mytable_c.id_c = mytable.id
WHERE mytable.deleted = 0 ORDER BY mytable.date_modified DESC LIMIT 0,21
is causing full table scan, with explain saying type is ALL and extra info is Using where; Using temporary; Using filesort. Explain results:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE mytable ALL NULL NULL NULL NULL 19001156 Using where; Using temporary; Using filesort
1 SIMPLE mytable_c eq_ref PRIMARY PRIMARY 108 mytable.id 1 Using index
Without the join explain looks like:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE mytable index NULL mytablemod 9 NULL 21 Using where; Using temporary
id_c is the primary key for mytable_c and mytable_c does not have more than one row for every row in mytable. date_modified is indexed. But looks like MySQL does not understand that. If I remove the DISTINCT clause, then explain uses index and touches only 21 rows just as expected. If I remove the join it also does this. Is there any way to make it work without the full table scan with the join? explain shows mysql knows it needs only one row from mytable_c and it is using the primary key, but still does full scan on mytable.
The reason DISTINCT is there is that the query is generated by an ORM system, in which JOINs may produce multiple rows but the values of the SELECT fields will always be unique (i.e. if the JOIN is against a multi-value link, only fields that are the same in every joined row will be in the SELECT).
These are just generic comments, not mysql specific.
To find all the possible name values from mytable a full scan of either the table or an index needs to happen. Possible options:
full table scan
full index scan of an index starting with deleted (take advantage of the filter)
full index scan of an index starting with name (only column of concern for output)
If there was an index on deleted, the server could find all the deleted = 0 index entries and then look up the corresponding name value from the table. But if deleted has low cardinality or the statistics aren't there to say differently, it could be more expensive to do the double reads of first the index then the corresponding data item. In that case, just scan the table.
If there was an index on name, an index scan could be sufficient, but then the table needs to be checked for the filter. Again frequent hopping from index to table.
The join columns also need to be considered in a similar manner.
If you forget about the join part and had a multi-part index on columns name, deleted then an index scan would probably happen.
Update
To me the DISTINCT and ORDER BY parts are a bit confusing. Which name record's date_modified is to be used for sorting? I think something like this would be a bit clearer:
SELECT mytable.name name -- , MIN(mytable.date_modified)
FROM mytable
LEFT JOIN mytable_c ON mytable_c.id_c = mytable.id
WHERE mytable.deleted = 0
GROUP BY mytable.name
ORDER BY MIN(mytable.date_modified) DESC LIMIT 0,21
Either way, once the ORDER BY comes into play, a full scan needs to be done to find the order. Without the ORDER BY, the first 21 found could suffice.
Why don't you try moving the condition mytable.deleted = 0 from the WHERE clause to the JOIN's ON clause? You can also try FORCE INDEX (mytablemod).

MySQL Index questions, simple query takes too long for a relatively small amount of indexed rows on a medium-large table

I have a relatively large table (5,208,387 rows, 400 MB data / 670 MB index);
all columns I search on are indexed.
name and type are VARCHAR(255) columns with BTREE indexes,
and sdate is an INTEGER column containing timestamps.
There are some issues I fail to understand.
First, this query is very slow (5 sec):
SELECT *
FROM `mytable`
WHERE `name` LIKE 'hello%my%big%text%thing%'
AND `type` LIKE '%'
ORDER BY `sdate` DESC LIMIT 3
EXPLAIN for the above:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE mytable range name name 257 NULL 5191 Using where
while this one is very fast (5msec):
SELECT *
FROM `mytable`
WHERE `name` LIKE 'hello.my%big%text%thing%'
AND `type` LIKE '%'
ORDER BY `sdate` DESC LIMIT 3
EXPLAIN for the above:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE mytable range name name 257 NULL 204 Using where
The difference in the number of rows scanned makes sense because of the indexes,
but having 5k indexed rows take 5 seconds seems way too much.
Also, ordering by name instead of sdate makes the queries very fast, but I need to order by the timestamp.
The second thing I do not understand is that before
adding the last column to the index,
the db had an index of 1.4 GB;
now, after running an OPTIMIZE/REPAIR, the size is just 670 MB.
The problem is that only the portion before the first % can take advantage of the index; the rest of the LIKE string has to be checked against all rows that match hello% or hello.my%, without the help of an index. Also, ordering by a column other than the one in the index used probably requires a second pass, or at least a scan rather than reading an already sorted index. Options to improve performance (they can be implemented independently of each other) are:
Using a full-text index on the name column and a MATCH() AGAINST() search rather than LIKE with %'s.
Adding sdate to a combined index (name, sdate), which could very well speed up sorting.
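A sketch of the two options, assuming a storage engine and version with full-text support for this table (MyISAM, or InnoDB from MySQL 5.6 on). The table definition and the index names ft_name and idx_name_sdate are hypothetical. Full-text search matches whole words in any order, so the original LIKE is kept as a second filter to preserve the pattern's meaning; rows where the terms appear only as parts of longer words would be missed. Words shorter than the configured minimum length or on the stopword list are not indexed, so the search terms below are illustrative only.

```sql
-- Assumed shape of the table, based on the description in the question.
CREATE TABLE mytable (
  id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(255) NOT NULL,
  `type` VARCHAR(255) NOT NULL,
  sdate INT NOT NULL,
  PRIMARY KEY (id),
  KEY name (name),
  KEY `type` (`type`),
  KEY sdate (sdate)
);

-- Option 1: full-text index to narrow the candidates,
-- with LIKE kept to verify the exact pattern.
ALTER TABLE mytable ADD FULLTEXT INDEX ft_name (name);

SELECT *
FROM mytable
WHERE MATCH(name) AGAINST('+text +thing' IN BOOLEAN MODE)
  AND name LIKE 'hello%my%big%text%thing%'
ORDER BY sdate DESC
LIMIT 3;

-- Option 2: combined index on the filter column and the sort column.
ALTER TABLE mytable ADD INDEX idx_name_sdate (name, sdate);
```

Comparing EXPLAIN output and timings before and after each change on realistic data would show whether either option actually helps for this workload.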