How come MySQL LIKE is faster than FULLTEXT?

I was trying to optimize my MySQL queries, but found out that I'm actually doing this wrong. I've changed my query from using
SELECT * FROM test WHERE tst_title LIKE '1%'
To:
SELECT * FROM `test` WHERE MATCH(tst_title) AGAINST("+1*" IN BOOLEAN MODE)
And the runtime for the FULLTEXT version was terrible. See the timings below:
USING LIKE:
Showing rows 0 - 24 (1960 total, Query took 0.0004 sec)
USING FULLTEXT:
Showing rows 0 - 24 (1960 total, Query took 0.0033 sec)
I've read many tutorials that explain why you should use FULLTEXT (since it actually searches via an index). So how can it be a slower way to retrieve data than the LIKE statement (which has to go through every single record to check whether it matches)?
I literally can't figure out why this is happening. Help on optimization would be appreciated a lot!

Unless you set the minimum word length (ft_min_word_len for MyISAM, innodb_ft_min_token_size for InnoDB) to a smaller number than the default, FULLTEXT cannot find all the values starting with 1.
If tst_title is a numeric column (INT, FLOAT, etc.), then both LIKE and FULLTEXT are terrible ways to do the query.
Given INDEX(tst_title) (and it being VARCHAR or TEXT), then LIKE is likely to run faster, since it only has to check all entries starting with 1.
The timings you list smell like the Query Cache took over. For timing purposes, use SELECT SQL_NO_CACHE ... to avoid the QC (see the sketch after these points).
If you use MATCH or LIKE without having FULLTEXT or INDEX, respectively, then the query has no choice but to scan all rows in the table.
Where did 1960 total come from? Does the timing include computing that?
Is the table MyISAM or InnoDB? There are differences in their FULLTEXT implementations that may be a factor here.
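A rough sketch of those two checks (the variable names are from the MySQL manual; the table and column names are taken from the question):
-- Minimum indexed word length: ft_min_word_len for MyISAM, innodb_ft_min_token_size for InnoDB
SHOW VARIABLES LIKE 'ft_min_word_len';
SHOW VARIABLES LIKE 'innodb_ft_min_token_size';
-- Bypass the Query Cache so the timings reflect real work
SELECT SQL_NO_CACHE * FROM test WHERE tst_title LIKE '1%';
SELECT SQL_NO_CACHE * FROM `test` WHERE MATCH(tst_title) AGAINST('+1*' IN BOOLEAN MODE);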

From what I've read, if you were using ...tst_title LIKE '%1%', this would be slower, as it has to perform a full table scan and cannot use an index. The pattern you currently have, with the wildcard only on the right, can use an index, and that is probably why it is faster than using FULLTEXT.
Not too sure on it myself, but this is what I read and hope it helps.
EDIT:
You may want to read this answer for a full explanation of FULLTEXT vs LIKE.
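One way to confirm which access path each query actually gets is EXPLAIN; a quick sketch using the queries from the question:
EXPLAIN SELECT * FROM test WHERE tst_title LIKE '1%';
EXPLAIN SELECT * FROM `test` WHERE MATCH(tst_title) AGAINST('+1*' IN BOOLEAN MODE);
-- The LIKE version should show a range access on the tst_title index; the MATCH version should show type=fulltext.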

Related

How does a MySQL `SELECT ... WHERE IN` query work? Why is a larger number of parameters faster?

I am doing a query with SELECT ... WHERE IN, and found something unexpected:
SELECT a,b,c FROM tbl1 where a IN (a1,a2,a3...,a5000) --> this query took 1.7ms
SELECT a,b,c FROM tbl1 where a IN (a1,a2,a3...,a20) --> this query took 6.4ms
What is the algorithm behind IN? Why is a larger number of parameters faster than a smaller one?
The following is a guess...
For every SQL query, the optimizer analyzes the query to choose which index to use.
For multi-valued range queries (like IN(...)), the optimizer performs an "index dive" for each value in the short list, trying to estimate whether it's a good idea to use the index. If you are searching for values that are too common, it's more efficient to just do a table-scan, so there's no need to use the index.
MySQL 5.6 introduced a special optimization, to skip the index dives if you use a long list. Instead, it just guesses that the index on your a column may be worth using, based on stored index statistics.
You can control how long of a list causes the optimizer to skip index dives with the eq_range_index_dive_limit option. The default is 10. Your example shows a list of length 20, so I'm not sure why it's more expensive.
Read the manual about this feature here: https://dev.mysql.com/doc/refman/5.7/en/range-optimization.html#equality-range-optimization
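For reference, a quick way to inspect and adjust that threshold (a sketch; a session-level change is shown):
SHOW VARIABLES LIKE 'eq_range_index_dive_limit';
-- Raising the limit makes the optimizer keep doing index dives for longer IN lists;
-- lowering it makes it fall back to index statistics sooner.
SET SESSION eq_range_index_dive_limit = 200;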

MySQL: determining which part of a query is the slowest

I've written a SELECT statement in MySQL. The duration is 50 seconds, and the fetch is 206 seconds. This is a long time. I'd like to understand WHICH part of my query is inefficient so I can improve its run time, but I'm not sure how to do that in MySQL.
My table has a little over 1,000,000 records. I have an index built in as well:
KEY `idKey` (`id`,`name`),
Here is my query:
SELECT name, id, alt_id, count(id), min(cost), avg(resale), code from
history where name like "%brian%" group by id;
I've looked at the MySQL execution plan, but I can't tell from it what is wrong:
If I hover over the "Full Index Scan" part of the plan diagram, I see this:
Access Type: Index
Full Index Scan
Key/Index:
Used Key Parts: id, name
Possible Keys: idKey, id-Key, nameKey
Attached Condition:
(`allhistory`.`history`.`name` LIKE '%brian%')
Rows Examined Per Scan: 1098181
Rows Produced Per Join: 1098181
Filter: 100%
I know I can just scan a smaller subset of data by adding LIMIT 100 to the query, and while that makes the time much shorter (28-second duration, 0.000 sec fetch), I also want to see all the records, so I don't really want to put a limit on it.
Can someone more knowledgeable on this topic suggest where my query, my index, or my methodology might be inefficient for what I'm trying to accomplish?
This question really only has a solution in MySQL's full-text search functionality.
I don't consider the use of LIKE a workable solution; table scans are not an option with millions of rows.
I wrote up an answer at this link; I hope you find a workable solution for your case with that reference and quick walk-through.
Here is one of the MySQL manual pages on full-text search.
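A sketch of what the full-text approach might look like for this table (table and column names are from the question; FULLTEXT requires MyISAM before MySQL 5.6, and note that MATCH looks for whole words rather than arbitrary substrings like '%brian%'):
ALTER TABLE history ADD FULLTEXT INDEX ft_name (name);
SELECT name, id, alt_id, COUNT(id), MIN(cost), AVG(resale), code
FROM history
WHERE MATCH(name) AGAINST('brian' IN BOOLEAN MODE)
GROUP BY id;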
I'm thinking your covering index may be backwards. Try switching the order to (name, id); that way the WHERE clause can take advantage of the index.

How does MySQL process indexes when a table has more than one index?

Sample table
field 0 : no(PK)
field 1 : title
field 2 : description
field 3 : category1(INDEX)
field 4 : category2(INDEX)
field 5 : category3(INDEX)
field 6 : category4(INDEX)
field 7 : category5(INDEX)
Above is a sample table that I will use on my website; each category field has its own index.
Suppose I execute a command like the one below:
select * from table where category1=1 and category2=2 and category3=3 and category4=4 and category5=5
I want to compare a table that has only one category field with a table that has many category fields, like the one above. Which one is better?
I figure that, of course, a table with only one category field is the better choice.
But I don't really know the details of how indexes are evaluated.
I have to explain the difference between them to my boss!
So I want some information, ideally with a sample covering index cost, sample data, and the evaluation process, or anything else that would help me understand how index calculation works.
In general, if you have a query with more than one WHERE constraint, the best index to have is a compound index that contains all the constrained fields; in your case that would be an index on (category1, category2, category3, category4, category5).
However, in practice it is really wasteful to have so many compound indexes. Also, an index is only useful if it has high selectivity. For example, if you have a field which may have the values 0 or 1 with equal probability (selectivity 1/2), it is almost always NOT worth creating an index on such a field or even including it in a compound index.
At any rate, always try running EXPLAIN (or EXPLAIN ANALYZE in MySQL 8.0.18+) to get an idea of what the query planner is thinking and which index it will choose. If you see a full table scan, it may be a reason to worry, but not always (for example, using a low-selectivity index may not be worth it for the planner).
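A sketch of that suggestion (the table name sample_table is hypothetical, since the question just calls it "table"):
-- Compound index covering all five constrained columns
ALTER TABLE sample_table ADD INDEX idx_cat_all (category1, category2, category3, category4, category5);
-- Then check which index the optimizer picks:
EXPLAIN SELECT * FROM sample_table WHERE category1=1 AND category2=2 AND category3=3 AND category4=4 AND category5=5;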
You can analyze what the execution engine will do using an EXPLAIN EXTENDED query. The best-case scenario is that MySQL will use an index merge: it selects the rows for each condition via that column's own index, then merges the result sets (the merge itself gets no index help). Usually, a composite index is much faster, but that might depend on the number of records and the usage scenario (high or low turnover of records).
As already written by mvp above, use EXPLAIN to see how the query optimizer would handle your query. In general, MySQL uses one index per table you access to fetch the data you are looking for. The optimizer also tries to pick the one with the highest selectivity if several indexes are possible.
E.g. you might have a query like yours:
SELECT * FROM table WHERE category1=1 AND category2=2 AND category3=3 AND category4=4 AND category5=5
It would be possible to use a combined index that contains category1, category2, category3, category4 and category5 or also a combined index that contains only category1 and category2. The optimizer would decide at runtime which one it would take.
Another common example would be:
SELECT * FROM table WHERE category1=1 OR category2=2
The query optimizer can only use an index for category1 or for category2, but not both! At least, that is what MySQL's EXPLAIN returned. It might be possible for other databases to run both selections in parallel and simply join the two results and remove duplicates.
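One common workaround for such OR conditions, sketched here with the same hypothetical table name, is to rewrite them as a UNION so each branch can use its own index (UNION also removes duplicates, matching the OR semantics):
SELECT * FROM sample_table WHERE category1 = 1
UNION
SELECT * FROM sample_table WHERE category2 = 2;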
Before you start adding lots of indexes, remember the overhead they produce. If you have many more read accesses than write accesses, it might work out. But if you also have many insert or update operations, the indexes need to be adjusted every time, which causes additional load and increases query execution time.
For your follow-up, I recommend the MySQL manual chapter How MySQL Uses Indexes.

MySQL: Optimizing Searches with LIKE or FULLTEXT

I am building a forum and I am looking for the proper way to build a search feature that finds users by their name or by the title of their posts. What I have come up with is this:
SELECT users.id, users.user_name, users.user_picture
FROM users, subject1, subject2
WHERE users.id = subject1.user_id
AND users.id = subject2.user_id
AND (users.user_name LIKE '%{$keywords}%'
OR subject1.title1 LIKE '%{$keywords}%'
OR subject2.title2 LIKE '%{$keywords}%')
ORDER BY users.user_name ASC
LIMIT 10
OFFSET {$offset}
The LIMIT and the OFFSET is for pagination. My question is, would doing a LIKE search through multiple tables greatly slow down performance when the number of rows reach a significant amount?
I have a few alternatives:
One, perhaps I can rewrite that query to have the LIKE searches done inside a subquery that only returns indexed user_ids. Then, I would find the remaining user information based on that. Would that increase performance by much?
Second, I suppose I can have the $keyword string appear before the first wildcard as in LIKE {$keyword}%. This way, I can index the user_name, title1, and title2 columns. However, since I will be trading accuracy for speed here, how much of a difference in performance would this make? Will it be worth sacrificing this much accuracy to index these columns?
Third, perhaps I can give users 3 search fields to choose from, and have each search through only one table. Would this increase performance by much?
Lastly, should I consider using a FULLTEXT search instead of LIKE? What are the performance differences between the two? Also, my tables are using the InnoDB storage engine, and I am not able to use the FULLTEXT index unless I switch to MyISAM. Will there be any major differences in switching to MyISAM?
Pagination is another performance issue I am worried about, because in order to do pagination, I would need to find the total number of results the query returns. At the moment, I am basically doing the query I just mentioned TWICE because the first time it is used only to COUNT the results.
There are two things in your query that prevent MySQL from using indexes. First, your patterns start with a wildcard (%); MySQL can't use an index to search for patterns that begin with a wildcard. Second, you have OR in your WHERE clause; you would need to rewrite the query using UNION to avoid the OR, which also prevents MySQL from using indexes. Without an index, MySQL has to do a full table scan every time, and the time needed for that grows linearly as the number of rows in your table grows. So yes, as you put it, it "would greatly slow down performance when the number of rows reach a significant amount", and I'd say your only real scalable option is to use FULLTEXT search.
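A sketch of that UNION rewrite, assuming each searched column is indexed and the pattern has no leading wildcard (the literal 'keyword%' stands in for the interpolated $keywords):
SELECT u.id, u.user_name, u.user_picture FROM users u WHERE u.user_name LIKE 'keyword%'
UNION
SELECT u.id, u.user_name, u.user_picture FROM users u JOIN subject1 s1 ON u.id = s1.user_id WHERE s1.title1 LIKE 'keyword%'
UNION
SELECT u.id, u.user_name, u.user_picture FROM users u JOIN subject2 s2 ON u.id = s2.user_id WHERE s2.title2 LIKE 'keyword%'
ORDER BY user_name ASC
LIMIT 10;
Note that the semantics differ slightly from the original three-table join, since a user no longer needs rows in both subject tables to appear in the result.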
Most of your questions are explained here: http://use-the-index-luke.com/sql/where-clause/searching-for-ranges/like-performance-tuning
InnoDB/fulltext indexing is announced for MySQL 5.6, but that will probably not help you right now.
How about starting with EXPLAIN <select-statement>? http://dev.mysql.com/doc/refman/5.6/en/explain.html
Switching to MyISAM should work seamlessly. The only downside is that MyISAM locks the whole table on inserts/updates, which can slow down tables with many more inserts than selects. A rule of thumb, in my opinion, is to use MyISAM when you don't need foreign keys and the table has far more selects than inserts, and to use InnoDB when the table has far more inserts/updates than selects (e.g. for a statistics table).
In your case I guess switching to MyISAM is the better choice as a fulltext index is way more powerful and faster.
It also gives you the possibility to use certain query modifiers, like excluding words ("cat -dog") and similar. But keep in mind that it's no longer possible to look for words ending with a given suffix, as you could with a LIKE search ("*bar" won't work); "foo*" will work, though.
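A minimal sketch of the FULLTEXT alternative on one of the title columns (table and column names are from the question; the index name is made up):
ALTER TABLE subject1 ADD FULLTEXT INDEX ft_title1 (title1);  -- requires MyISAM before MySQL 5.6
SELECT user_id, title1
FROM subject1
WHERE MATCH(title1) AGAINST('+cat -dog' IN BOOLEAN MODE);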

How can I make this MySQL query perform better?

I have one query that is preventing me from going live with this application, because it can take up to 7 seconds to complete when it isn't cached.
SELECT attribute1
FROM `product_applications`
WHERE `product_applications`.`brand_id` IN (.. like 500 ids...)
GROUP BY attribute1
I have the brand_id indexed. I used to have this doing a SELECT DISTINCT, but opted for the GROUP BY and performance has improved slightly.
This table is using InnoDB, and has about 2.3 million rows. I have run an EXPLAIN on it and it uses the index, it just takes forever.
I know there are a lot of variables to getting something like this to perform. The db is on an Amazon EC2 instance.
Is there some sort of table splitting I could do to get the query to perform better? I really appreciate any help anybody can offer.
EDIT:
Here are the results on my explain, from NewRelic:
Id 1
Select Type SIMPLE
Table product_applications
Type range
Possible Keys brand_search_index_1,brand_search_index_2,brand_search_index_3,brand_search_index_4,brand_sarch_index_5
Key brand_search_index_1
Key Length 5
Ref
Rows 843471
Extra Using where; Using index; Using temporary; Using filesort
See, it's using the index. But it's also using a temp table and filesort. How can I overcome that stuff?
EDIT:
Since the time I opened this question, I changed the engine on this table from InnoDB to MyISAM. I also vertically partitioned the table by moving attributes 5 through 60 to another table. But this select statement STILL TAKES BETWEEN 2 AND 3 SECONDS!!!! The poor performance of this query is absolutely maddening.
A different approach, if there are very few distinct values of attribute1, is to try an index on attribute1 to take advantage of the loose index scan.
Please refer to the following answer:
Rewriting mysql select to reduce time and writing tmp to disk
According to this answer, IN should be very fast in the case of constants; otherwise type conversion happens, which can slow things down.
I would also try a covering index with brand_id as the first column and attribute1 as the second. That should speed things up because the table rows won't need to be accessed anymore.
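A sketch of that covering-index idea (the index name is hypothetical; the table and column names are from the question):
ALTER TABLE product_applications ADD INDEX brand_attr (brand_id, attribute1);
-- With brand_id first and attribute1 second, the query can be answered from the index alone:
EXPLAIN SELECT attribute1 FROM product_applications WHERE brand_id IN (1, 2, 3) GROUP BY attribute1;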
EDIT:
About the temporary/filesort: I suspect they are caused by your list of 500+ ids. Could you try EXPLAIN on the query with only one id in the IN operator?
If you can reduce the size of your rows, that might help. Make as many columns as possible NOT NULL. If you can remove all VARCHAR columns, that could help as well.
What exactly does the index it is using cover? Try making the index cover fewer or more columns.
Have you run ANALYZE TABLE recently? That may cause it to pick another index. You could also try forcing certain indexes.
Is there a possibility of reducing the number of ids in the IN clause? What about using a range, if they are always sequential ids?
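A sketch of that range variant (only valid if the ids really are contiguous; 1 and 500 are placeholders for the actual first and last ids):
SELECT attribute1 FROM product_applications WHERE brand_id BETWEEN 1 AND 500 GROUP BY attribute1;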