MySQL MATCH() AGAINST() vs. REGEXP for matching whole words

I am trying to optimise searching in my dictionary (109,000 entries, MyISAM, FULLTEXT), and I am now comparing the performance of MATCH() AGAINST() with that of table.field REGEXP '[[:<:]]keyword1[[:>:]]' AND table.field REGEXP '[[:<:]]keyword2[[:>:]]'.
Using two keywords, I get (inside phpMyAdmin) 0.0000 or 0.0010 seconds for the MATCH() AGAINST() query vs. 0.1962 or 0.2190 seconds for the REGEXP query. Is speed the only indicator that matters here? Which query should I prefer (both appear to yield exactly the same results)? Is the answer the obvious one, the faster query?
Here are the full queries:
SELECT * FROM asphodel_dictionary_unsorted
JOIN asphodel_dictionary_themes ON asphodel_dictionary_unsorted.theme_id = asphodel_dictionary_themes.theme_id
LEFT JOIN asphodel_dictionary_definitions ON asphodel_dictionary_unsorted.term_id = asphodel_dictionary_definitions.term_id
WHERE MATCH (asphodel_dictionary_unsorted.english)
AGAINST ('+boiler +pump' IN BOOLEAN MODE)
and
SELECT * FROM asphodel_dictionary_unsorted
JOIN asphodel_dictionary_themes ON asphodel_dictionary_unsorted.theme_id = asphodel_dictionary_themes.theme_id
LEFT JOIN asphodel_dictionary_definitions ON asphodel_dictionary_unsorted.term_id = asphodel_dictionary_definitions.term_id
WHERE asphodel_dictionary_unsorted.english REGEXP '[[:<:]]boiler[[:>:]]'
AND asphodel_dictionary_unsorted.english REGEXP '[[:<:]]pump[[:>:]]'
ORDER BY asphodel_dictionary_unsorted.theme_id, asphodel_dictionary_unsorted.english
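The Python sketch below shows what the two REGEXP conditions test: each keyword must appear as a whole word. It uses Python's \b word boundary in place of the [[:<:]] and [[:>:]] markers.

Note that [[:<:]] and [[:>:]] come from the Henry Spencer regex library that MySQL used before 8.0.4. Since 8.0.4, MySQL uses ICU, which does not support them. There you write \\b instead, doubling the backslash inside the string literal.

The sample rows are made up. Case-insensitive matching is assumed, as with MySQL's usual case-insensitive collations.

```python
import re

def whole_word_and(text: str, keywords: list[str]) -> bool:
    """True if every keyword occurs in text as a whole word.

    Roughly mirrors:
      english REGEXP '[[:<:]]kw1[[:>:]]' AND english REGEXP '[[:<:]]kw2[[:>:]]'
    (case-insensitive matching is an assumption about the collation).
    """
    return all(
        re.search(r"\b" + re.escape(kw) + r"\b", text, re.IGNORECASE)
        for kw in keywords
    )

# Hypothetical dictionary entries
rows = [
    "boiler feed pump",
    "pump for boilers",        # 'boilers' is not the whole word 'boiler'
    "steam boiler",
    "Boiler circulation pump",
]

# Every row is tested against both patterns: this is a full scan.
matches = [r for r in rows if whole_word_and(r, ["boiler", "pump"])]
print(matches)  # ['boiler feed pump', 'Boiler circulation pump']
```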

The MATCH/AGAINST solution uses the FULLTEXT index, so it can look up the matching rows directly in the index.
The REGEXP solution cannot use an index. It always forces a table scan and tests every row against the regular expression. As your table grows, REGEXP query time grows linearly with the number of rows.
I gave a presentation, Full Text Search Throwdown, some years ago, in which I compared fulltext-indexed approaches against LIKE and REGEXP. With sample data of 7.4 million rows, the REGEXP query took 7 minutes, 57 seconds, whereas searching an InnoDB FULLTEXT index in boolean mode took 350 milliseconds. The MATCH/AGAINST query was 1,363 times faster.
The difference grows even larger the more rows you have.
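The toy model below contrasts the two approaches. A small inverted index (word to row ids) stands in for a FULLTEXT index, and '+boiler +pump' becomes an intersection of two posting lists. The REGEXP approach instead has to examine every row.

This is only a sketch, not MySQL's actual FULLTEXT implementation. For example, it ignores stopwords and minimum word length settings. The last two lines check the 1,363x figure quoted above.

```python
import re
from collections import defaultdict

def build_index(rows: dict[int, str]) -> dict[str, set[int]]:
    """Toy inverted index (word -> row ids), standing in for a FULLTEXT index."""
    index: dict[str, set[int]] = defaultdict(set)
    for row_id, text in rows.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(row_id)
    return index

def boolean_must(index: dict[str, set[int]], words: list[str]) -> set[int]:
    """Roughly '+w1 +w2' IN BOOLEAN MODE: intersect posting lists, no row is re-read."""
    postings = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*postings) if postings else set()

def regexp_scan(rows: dict[int, str], words: list[str]) -> tuple[set[int], int]:
    """REGEXP-style search: every row is examined, so cost grows with row count."""
    hits, examined = set(), 0
    for row_id, text in rows.items():
        examined += 1
        if all(re.search(rf"\b{re.escape(w)}\b", text, re.IGNORECASE) for w in words):
            hits.add(row_id)
    return hits, examined

# Hypothetical rows
rows = {1: "boiler feed pump", 2: "steam boiler", 3: "pump housing", 4: "boiler circulation pump"}
index = build_index(rows)
indexed_hits = boolean_must(index, ["boiler", "pump"])
scan_hits, examined = regexp_scan(rows, ["boiler", "pump"])
print(indexed_hits, scan_hits, examined)  # {1, 4} {1, 4} 4

# Speed-up quoted above: 7 min 57 s vs 350 ms
speedup = (7 * 60 + 57) / 0.350
print(round(speedup))  # 1363
```

Both approaches find the same rows. Only the scan's work grows with the table size, while the index lookup touches only the posting lists for the search words.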

Related

MySQL SELECT on multiple FULLTEXT indexes. Extremely slow results

10 million rows, MySQL server 5.7. Two FULLTEXT indexes, called "tagline" and "experience".
This statement takes < 1 second:
SELECT count(*) FROM pa
WHERE MATCH(tagline) AGAINST('"developer"' IN BOOLEAN MODE);
This statement also takes < 1 second:
SELECT count(*) FROM pa
WHERE MATCH(experience) AGAINST('"python"' IN BOOLEAN MODE);
This combined statement takes 30 seconds:
SELECT count(*) FROM pa
WHERE MATCH(tagline) AGAINST('"developer"' IN BOOLEAN MODE)
AND MATCH(experience) AGAINST('"python"' IN BOOLEAN MODE);
A similar problem is outlined here. Essentially, slight alterations to a fulltext MATCH make it useless:
https://medium.com/hackernoon/dont-waste-your-time-with-mysql-full-text-search-61f644a54dfa
Change the last query to
SELECT count(*) FROM pa
WHERE MATCH(tagline, experience) AGAINST('+developer +python' IN BOOLEAN MODE)
and add a FULLTEXT index covering both columns:
ALTER TABLE pa ADD FULLTEXT(tagline, experience);
(I am assuming you are using Engine=InnoDB.)
Be aware that when you use MATCH, it is performed first, before anything else. In your case, one MATCH was performed, and then the server struggled to perform the other; there is no way to run a second MATCH efficiently.
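One caveat about the suggested rewrite: MATCH(tagline, experience) searches the two columns as if they were one text. '+developer +python' is therefore satisfied even when both words appear only in tagline, which is looser than the original "developer in tagline AND python in experience".

The Python sketch below models that difference with simple word sets. The rows are hypothetical, and real FULLTEXT tokenizing differs in detail.

```python
import re

def words(s: str) -> set[str]:
    """Crude tokenizer standing in for the FULLTEXT parser."""
    return set(re.findall(r"\w+", s.lower()))

def per_column_and(row: dict) -> bool:
    # Original query: 'developer' in tagline AND 'python' in experience
    return "developer" in words(row["tagline"]) and "python" in words(row["experience"])

def combined_must(row: dict) -> bool:
    # MATCH(tagline, experience) AGAINST('+developer +python' IN BOOLEAN MODE):
    # both words anywhere in the combined text of the two columns
    combined = words(row["tagline"]) | words(row["experience"])
    return "developer" in combined and "python" in combined

rows = [
    {"id": 1, "tagline": "Senior developer", "experience": "Python and Django"},
    {"id": 2, "tagline": "Python developer", "experience": "Java and Go"},
    {"id": 3, "tagline": "Designer", "experience": "Python scripting"},
]

strict = [r["id"] for r in rows if per_column_and(r)]
combined = [r["id"] for r in rows if combined_must(r)]
print(strict, combined)  # [1] [1, 2]
```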
I went with Sphinx. https://www.youtube.com/watch?v=OP0c26k_iQc
It is a fairly easy way to upgrade the capabilities of MySQL without committing to a new stack.

MySQL fulltext or exact value search is very slow

I have this table of 9.5 million rows and I need to perform both fulltext and exact value search over the same column.
Although there are two indexes over this column, one BTREE and one FULLTEXT, the database engine uses neither and goes through all 9.5M rows.
select * from mytable
where match(document) against ('+111/05257' in boolean mode)
or document = '111/05257';
-- very slow, takes ~ 9 seconds
-- possible keys: both
-- used key: none :(
If I use only one type of search, queries run fast.
select * from mytable where document = '111/05257';
-- very fast, around 80 ms
-- used key: btree
select * from mytable where match(document) against ('+111/05257' in boolean mode);
-- very fast, around 100 ms
-- used key: fulltext
Given the poorly structured data in the document column, ranging from '1/XA' through '5778292019' to 'S:NXA/0001/XA2019/111/05257', I need both exact and partial (fulltext) search over this column.
Wildcard searches ('%111/05257%') also perform terribly with the BTREE index.
Any idea how to solve this?
Thank you all
Queries involving OR are notoriously hard to optimize. A common solution is to change them into two queries, and UNION the results:
select * from mytable
where match(document) against ('+111/05257' in boolean mode)
UNION
select * from mytable
where document = '111/05257';
Each of the respective queries should be free to use a different index. The UNION will eliminate any rows in common from the two results.
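The OR-to-UNION rewrite returns the same rows, and this can be checked without MySQL. The sketch below uses Python's built-in sqlite3 module.

SQLite has no MATCH() AGAINST(), so a LIKE on the trailing text stands in for the fulltext branch. That substitution is an assumption made purely to get a second predicate, and the table contents are made up.

The sketch also shows the de-duplication. UNION, which defaults to DISTINCT in MySQL as well, returns a row matching both branches once, whereas UNION ALL would return it twice. Because SELECT * includes the primary key, UNION cannot merge two genuinely different rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, document TEXT)")
conn.executemany(
    "INSERT INTO mytable (document) VALUES (?)",
    [("111/05257",), ("S:NXA/0001/XA2019/111/05257",), ("1/XA",), ("5778292019",)],
)
conn.execute("CREATE INDEX idx_document ON mytable (document)")

# Stand-in for the fulltext branch (sqlite3 has no MATCH() AGAINST())
partial = "document LIKE '%111/05257'"
exact = "document = '111/05257'"

or_rows = sorted(conn.execute(
    f"SELECT * FROM mytable WHERE {partial} OR {exact}").fetchall())
union_rows = sorted(conn.execute(
    f"SELECT * FROM mytable WHERE {partial} UNION SELECT * FROM mytable WHERE {exact}").fetchall())
union_all = conn.execute(
    f"SELECT * FROM mytable WHERE {partial} UNION ALL SELECT * FROM mytable WHERE {exact}").fetchall()

# Row 1 matches both branches: UNION keeps it once, UNION ALL twice.
print(or_rows == union_rows, len(union_rows), len(union_all))  # True 2 3
```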

How to optimize a MySQL/MyISAM full text search with many results

I have a MySQL MyISAM table with a full text index on the keywords column and 20 million rows. It works well when I search for rare words, for example:
SELECT count(*) FROM books WHERE MATCH(keywords) AGAINST ('+DUCK' IN BOOLEAN MODE)
(0.005s, 2k results)
But when I search for more common terms, it is much slower:
SELECT count(*) FROM books WHERE MATCH(keywords) AGAINST ('+YES' IN BOOLEAN MODE)
(5s, 2 million results)
That makes sense, because the last one returns many more rows, but then how can I pre-filter the rows before the text search? This doesn't work:
SELECT count(*) FROM books WHERE date > "2019-09-23" AND MATCH(keywords) AGAINST ('+YES' IN BOOLEAN MODE)
(5s, 0 results)
MyISAM's (and maybe InnoDB's) FULLTEXT will always do the MATCH first, then any other clauses. Hence, adding that extra filter does not help with speed.
Think of it this way: a FT index is built to test the entire table(s) for the MATCH clause, and it cannot apply any other filtering before it goes to work. So you are stuck with doing the FT search first and then filtering its results the other way, without the benefit of any indexes.
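The toy model below shows the consequence of "MATCH first, then filter". The work is proportional to the number of fulltext hits, however selective the date condition is.

It is a simplified sketch, not MySQL's executor. The row counts and dates are hypothetical; every row is dated before the cutoff, so the filter finds 0 results, as in the question.

```python
from datetime import date

N = 100_000
# Hypothetical data: every row predates the cutoff
row_dates = {row_id: date(2019, 1, 1) for row_id in range(N)}
# Posting lists from the (toy) FULLTEXT index: a common word and a rare one
postings = {"yes": set(range(0, N, 2)), "duck": {7, 42}}

def match_then_filter(word: str, cutoff: date) -> tuple[int, int]:
    """Step 1: take all rows matching `word`; step 2: check each one's date."""
    count = examined = 0
    for row_id in postings.get(word, set()):
        examined += 1                    # every MATCH hit is visited
        if row_dates[row_id] > cutoff:   # the date filter comes afterwards
            count += 1
    return count, examined

print(match_then_filter("yes", date(2019, 9, 23)))   # (0, 50000)
print(match_then_filter("duck", date(2019, 9, 23)))  # (0, 2)
```

Both searches return 0 rows. The common word still costs 50,000 checks because the date condition never narrows the MATCH step.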

Speed up a slow COUNT in a 2-million-row table

I have a MySQL table with 2 million rows.
Its size is 600 MB.
This query takes 2 seconds.
I don't know how to speed it up. The table already uses MyISAM.
I don't know if I have reached the limit of how fast a SELECT COUNT can be.
SELECT COUNT(video) FROM yvideos use index (PRIMARY) WHERE rate>='70' AND tags LIKE '%;car;%'
Thanks all
Yes, it can be optimised.
Firstly, you are doing a full scan with LIKE, because MySQL cannot use an index for a pattern that starts with a wildcard (it can for ';car;%', but not for '%;car;%').
Secondly, MySQL (in most cases) doesn't use more than one index for a SELECT, so if you have two separate indexes for rate and tags, only one will be used.
To deal with these issues, I advise you to:
1. use a fulltext index for the tags column,
2. replace the single query with two separate queries and "glue" the results together with an INNER JOIN (equivalent to WHERE ... AND ... in this case).
So in the end:
SELECT t1.* FROM
(SELECT * FROM yvideos WHERE rate >= 60) t1
INNER JOIN
(SELECT * FROM yvideos WHERE MATCH (tags) AGAINST ('+car +russia -usa' IN BOOLEAN MODE)) t2
USING (id);
Live example on SQLFiddle.
Run EXPLAIN on this query and look at the plan. There is no longer a full scan; all filtering is done using indexes.
For more information about boolean fulltext searches, you can read the documentation.
BTW, fulltext indexes are now supported in both InnoDB and MyISAM, so you are free to choose either engine.
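The sketch below checks only the semantics of the rewrite above: joining the two filtered derived tables on id returns the same rows as one query with WHERE ... AND .... It uses Python's sqlite3 module on made-up rows.

LIKE conditions stand in for MATCH (tags) AGAINST ('+car +russia -usa' IN BOOLEAN MODE), because SQLite has no MATCH() AGAINST(). Whether MySQL actually avoids a full scan must still be confirmed with EXPLAIN. The join also relies on id being unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yvideos (id INTEGER PRIMARY KEY, rate INTEGER, tags TEXT)")
conn.executemany(
    "INSERT INTO yvideos (rate, tags) VALUES (?, ?)",
    [(80, ";car;russia;"), (50, ";car;russia;"), (90, ";car;usa;"), (75, ";boat;")],
)

# Stand-in for '+car +russia -usa' IN BOOLEAN MODE
tag_filter = ("tags LIKE '%;car;%' AND tags LIKE '%;russia;%' "
              "AND tags NOT LIKE '%;usa;%'")

joined = sorted(conn.execute(f"""
    SELECT t1.* FROM
      (SELECT * FROM yvideos WHERE rate >= 60) t1
    INNER JOIN
      (SELECT * FROM yvideos WHERE {tag_filter}) t2
    USING (id)
""").fetchall())
single = sorted(conn.execute(
    f"SELECT * FROM yvideos WHERE rate >= 60 AND {tag_filter}").fetchall())

print(joined == single, joined)  # True [(1, 80, ';car;russia;')]
```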

Best way to use indexes on a large MySQL LIKE query

This MySQL query is run on a large MyISAM table (about 200,000 records, 41 columns):
select t1.* from table t1
where 1 and t1.inactive = '0'
and (t1.code like '%searchtext%' or t1.name like '%searchtext%' or t1.ext like '%searchtext%')
order by t1.id desc
LIMIT 0, 15
id is the primary key.
I tried adding a multiple-column index on all 3 searched (LIKE) columns. It works, but the results are served in an auto-filled AJAX table on a website, and the 2-second delay is a bit too slow.
I also tried adding separate indexes on all 3 columns, and a fulltext index on all 3 columns, without significant improvement.
What would be the best way to optimize this type of query? I would like to achieve under 1 sec performance, is it doable?
The best thing you can do is implement paging. No matter what you do, the IO cost is going to be huge. If you only return one page of records (10, 25, or whatever), that will help a lot.
As for the index, you need to check the plan to see whether your index is actually being used. A full text index might help, but that depends on how many rows you return and what you pass in. Patterns with % wildcards really drain performance. An index can still be used if the pattern ends with %, but not if it starts with %. If you put % on both sides of the text you are searching for, indexes can't help much.
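The sketch below illustrates why a trailing % can use an index but a leading % cannot. A sorted list stands in for a B-tree index: 'searchtext%' becomes a binary search followed by a short forward walk, while '%searchtext%' has to look at every key.

This is a simplified model with made-up keys. Real B-tree pages and collation rules are not modelled.

```python
import bisect

def prefix_range(sorted_keys: list[str], prefix: str) -> list[str]:
    """col LIKE 'prefix%': binary-search to the first candidate, then walk forward."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = lo
    while hi < len(sorted_keys) and sorted_keys[hi].startswith(prefix):
        hi += 1
    return sorted_keys[lo:hi]

def contains_scan(keys: list[str], needle: str) -> tuple[list[str], int]:
    """col LIKE '%needle%': key order does not help, so every key is examined."""
    hits, examined = [], 0
    for k in keys:
        examined += 1
        if needle in k:
            hits.append(k)
    return hits, examined

keys = sorted(["searchtext", "searchtext2", "mysearchtext", "other", "research"])
print(prefix_range(keys, "searchtext"))   # ['searchtext', 'searchtext2']
print(contains_scan(keys, "searchtext"))  # (['mysearchtext', 'searchtext', 'searchtext2'], 5)
```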
You can create a full-text index that covers the three columns code, name, and ext, and then perform a full-text query using the MATCH() AGAINST() function:
select t1.*
from table t1
where match(code, name, ext) against ('searchtext')
order by t1.id desc
limit 0, 15
If you omit the ORDER BY clause, the rows are sorted by relevance, that is, by the value returned by MATCH(), highest first. For more information, read the Full-Text Search Functions documentation.
As @Vulcronos notes, the query optimizer cannot use the index when the LIKE operator is used with a pattern that starts with the wildcard %.