10 million rows. MySQL server V. 5.7 Two indexes called "tagline" and "experience".
This statement takes < 1 second:
SELECT count(*) FROM pa
WHERE MATCH(tagline) AGAINST('"developer"' IN BOOLEAN MODE);
This statement also takes < 1 second:
SELECT count(*) FROM pa
WHERE MATCH(experience) AGAINST('"python"' IN BOOLEAN MODE);
This combined statement takes 30 seconds:
SELECT count(*) FROM pa
WHERE MATCH(tagline) AGAINST('"developer"' IN BOOLEAN MODE)
AND MATCH(experience) AGAINST('"python"' IN BOOLEAN MODE);
Similar problem outlined here. Essentially slight alterations to fulltext match make it useless:
https://medium.com/hackernoon/dont-waste-your-time-with-mysql-full-text-search-61f644a54dfa
Change the last one to
SELECT count(*) FROM pa
WHERE MATCH(tagline, experience) AGAINST('+developer +python' IN BOOLEAN MODE)
and add
FULLTEXT(tagline, experience)
(I am assuming you are using Engine=InnoDB.)
Be aware that when using MATCH, it is performed first; anything else. In your case, one MATCH was performed, then it struggled to perform the other, there is way to run a second MATCH efficiently.
Went with Sphinx. https://www.youtube.com/watch?v=OP0c26k_iQc
Fairly easy way of upgrading the capabilities of MySQL without committing to a new stack.
Related
I have a MySQL query where I want to use a MATCH AGAINST on the result of a WITH clause, like this:
WITH (select stuff from somewhere)
WHERE MATCH(somefield) AGAINST(' +"term_a" +"term_b"' IN BOOLEAN MODE)
However, that generates an error [HY000][1214] The used table type doesn't support FULLTEXT indexes
Is there a way to use MATCH against the results of a WITH clause? The MATCH/AGAINST does work against the tables themselves if I unroll the WITH, so that's not a problem. It's only a problem when I use the WITH.
I have table with quite many rows (~2M).
When i search it like
SELECT * FROM product WHERE
MATCH(name) AGAINST('+some' '+word' IN BOOLEAN MODE)
it works like charm and finds what i need in less than 0.5s.
But when i search for 2 sets of words like this
SELECT * FROM product WHERE
MATCH(name) AGAINST('+some' '-word' IN BOOLEAN MODE)
OR
MATCH(name) AGAINST('+something' '-other' IN BOOLEAN MODE)
search takes sometimes over minute.
I would expect it to work 2 times slower (it's 2 searches), maybe a bit more (you still have to compare results and remove duplicates, but if there are only few results it should not take long), but not so much longer. After adding OR it works slower, than LIKE "%...%" OR LIKE "%...%"
Anyone can tell me what am i doing wrong?
Unfortunately for you, fulltext indexes have some limitations, and not being able to properly merge the results of two independent fulltext searches is one them:
The Index Merge optimization algorithm has the following known limitations:
[...]
Index Merge is not applicable to full-text indexes.
Fortunately for you, fulltext searches can be more complex, so you can merge your searches yourself. Your second query can be written as a single search using:
SELECT * FROM product WHERE
MATCH(name) AGAINST('(+something -other) (+some -word)' IN BOOLEAN MODE)
This defines two search sets and is ok if either of the two (...) matches - which is an or.
Alternatively, you can use a union instead of an or, which allows MySQL to actually run two independent fulltext searches and then combine the two results, which is basically what you are thinking of:
SELECT * FROM product WHERE
MATCH(name) AGAINST('+some -word' IN BOOLEAN MODE)
UNION
SELECT * FROM product WHERE
MATCH(name) AGAINST('+something -other' IN BOOLEAN MODE)
This also works for more complicated situations, e.g. merging searches on different columns, but will not work that easy if you want to do something else than or.
I have a MySQL MyISAM table with a full text index on the keywords column and 20 millions rows. It works well when a search for rare words, for example:
SELECT count(*) FROM books WHERE MATCH(keywords) AGAINST ('+DUCK' IN BOOLEAN MODE)
(0.005s, 2k results)
But when I search for a more common terms it is much slowers:
SELECT count(*) FROM books WHERE MATCH(keywords) AGAINST ('+YES' IN BOOLEAN MODE)
(5s, 2millions results)
It makes sens because the last one returns much more rows, but then how can I pre-filter the rows before the text search? This doesn't work:
SELECT count(*) FROM books WHERE date > "2019-09-23" AND MATCH(keywords) AGAINST ('+YES' IN BOOLEAN MODE)
(5s, 0 result)
MyISAM's (and maybe InnoDB's) FULLTEXT will always do the MATCH first, then any other clauses. Hence, adding that extra filter does not help with speed.
Think of it this way... A FT index is constructed to test the entire table(s) for the MATCH clause. It is not ready to handle any filtering before it goes to work. So, you are stuck with FT first, then filter the results the other way but without benefit of any indexes.
I am trying to optimise searching in my dictionary (109,000 entries, MyISAM, FULLTEXT), and I am now comparing the performance of MATCH() AGAINST() with that of REGEXP '[[:<:]]keyword1[[:>:]]' AND table.field REGEXP '[[:<:]]keyword2[[:>:]]'.
Using two keywords, I get (inside PhpMyAdmin) 0.0000 seconds or 0.0010 seconds for the MATCH() AGAINST() query vs. 0.1962 seconds or 0.2190 seconds for the regex query. Is speed the only indicator that matters here? Which query should I prefer (both appear to yield the exact same results)? Is it the obvious – the faster one?
Here are the full queries:
SELECT * FROM asphodel_dictionary_unsorted
JOIN asphodel_dictionary_themes ON asphodel_dictionary_unsorted.theme_id = asphodel_dictionary_themes.theme_id
LEFT JOIN asphodel_dictionary_definitions ON asphodel_dictionary_unsorted.term_id = asphodel_dictionary_definitions.term_id
WHERE MATCH (asphodel_dictionary_unsorted.english)
AGAINST ('+boiler +pump' IN BOOLEAN MODE)
and
SELECT * FROM asphodel_dictionary_unsorted
JOIN asphodel_dictionary_themes ON asphodel_dictionary_unsorted.theme_id = asphodel_dictionary_themes.theme_id
LEFT JOIN asphodel_dictionary_definitions ON asphodel_dictionary_unsorted.term_id = asphodel_dictionary_definitions.term_id
WHERE asphodel_dictionary_unsorted.english REGEXP '[[:<:]]boiler[[:>:]]'
AND asphodel_dictionary_unsorted.english REGEXP '[[:<:]]pump[[:>:]]'
ORDER BY asphodel_dictionary_unsorted.theme_id, asphodel_dictionary_unsorted.english
The MATCH/AGAINST solution uses a FULLTEXT index, and it searches the index pretty efficiently.
The REGEXP solution cannot use an index. It always forces a table-scan and tests every row with the regular expression. As your table grows, it will take longer to do REGEXP queries in linear proportion to the number of rows.
I did a presentation Full Text Search Throwdown some years ago, where I compared fulltext-indexed approaches against LIKE and REGEXP. With sample data of 7.4 million rows, the REGEXP took 7 minutes, 57 seconds, whereas searching an InnoDB FULLTEXT index in boolean mode took 350 milliseconds — the MATCH/AGAINST query was 1,363 times faster.
The difference grows even larger the more rows you have.
i have a Mysql table which has 2 millions rows.
The size is 600Mb.
this query take 2 seconds.
I don't know how to speed it up. The table is already in a Myisam format.
I don't know if i reached the limit of the slowness of a select count.
SELECT COUNT(video) FROM yvideos use index (PRIMARY) WHERE rate>='70' AND tags LIKE '%;car;%'
Thanks all
Yes, it can be optimised.
Firstly, you are doing a full scan with LIKE, because MySQL can not use an index with variable left part (it's possible for ';car;%', but not for '%;car;%').
Secondly, MySQL (in most cases) doesn't use more than one index for a SELECT, so if you have two separate indexes for rate and tags, only one will be used.
So to deal with these things I advice to:
1. use a fulltext index for tags column,
2. replace one query with two separate queries and "glue" result with INNER JOIN (equals to WHERE ... AND ... in this case).
So in the end:
SELECT t1.* FROM
(SELECT * FROM yvideos WHERE rate >= 60) t1
INNER JOIN
(SELECT * FROM yvideos WHERE MATCH (tags) AGAINST ('+car +russia -usa' IN BOOLEAN MODE)) t2
USING (id);
Live example on SQLFiddle.
Execute EXPLAIN for this query and take a look at a plan. Now there is no full scan, all filtering are done using indexes.
For more information about boolean fulltext searches you can read a documetation.
BTW, fulltext indexes are supported in both InnoDB and MyISAM now, so you can decide about an engine.