MySQL MATCH() AGAINST() OR MATCH() AGAINST() is slow - mysql

I have a table with quite a lot of rows (~2M).
When I search it like this:
SELECT * FROM product WHERE
MATCH(name) AGAINST('+some' '+word' IN BOOLEAN MODE)
it works like a charm and finds what I need in less than 0.5 s.
But when I search for two sets of words like this:
SELECT * FROM product WHERE
MATCH(name) AGAINST('+some' '-word' IN BOOLEAN MODE)
OR
MATCH(name) AGAINST('+something' '-other' IN BOOLEAN MODE)
the search sometimes takes over a minute.
I would expect it to be about twice as slow (it is two searches), maybe a bit more (the results still have to be compared and duplicates removed, but with only a few results that should not take long), but not this much slower. After adding OR, it is even slower than LIKE "%...%" OR LIKE "%...%".
Can anyone tell me what I am doing wrong?

Unfortunately for you, fulltext indexes have some limitations, and not being able to properly merge the results of two independent fulltext searches is one of them:
The Index Merge optimization algorithm has the following known limitations:
[...]
Index Merge is not applicable to full-text indexes.
Fortunately for you, fulltext searches can be more complex, so you can merge your searches yourself. Your second query can be written as a single search using:
SELECT * FROM product WHERE
MATCH(name) AGAINST('(+something -other) (+some -word)' IN BOOLEAN MODE)
This defines two search groups, and a row matches if either of the two (...) groups matches, which is an OR.
Alternatively, you can use a UNION instead of an OR, which allows MySQL to actually run two independent fulltext searches and then combine the two results, which is basically what you had in mind:
SELECT * FROM product WHERE
MATCH(name) AGAINST('+some -word' IN BOOLEAN MODE)
UNION
SELECT * FROM product WHERE
MATCH(name) AGAINST('+something -other' IN BOOLEAN MODE)
This also works for more complicated situations, e.g. merging searches on different columns, but it will not be as easy if you want to combine the searches with something other than OR.
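As a rough sketch of the different-columns case, assume product also had a description column with its own FULLTEXT index. That column and index are hypothetical and not part of the question. Each branch can then use its own fulltext index, and UNION removes rows that both branches return:

```sql
-- Assumption: product.description exists and has its own FULLTEXT index
-- (hypothetical; the question only mentions the name column).
SELECT * FROM product
WHERE MATCH(name) AGAINST('+some -word' IN BOOLEAN MODE)
UNION
SELECT * FROM product
WHERE MATCH(description) AGAINST('+something -other' IN BOOLEAN MODE);
```

If you know the two branches cannot overlap, UNION ALL skips the duplicate-removal step.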

Related

MySQL Match Against for exact Phrase working partially

I have a table in which I created a FULLTEXT index on a column called item_desc.
Let's say the table contains three records in which item_desc includes "Sodium Chloride", as follows:
Solution Sodium Chloride standard
5425 Sodium Chloride 100u
QtySodium Chloride solution
I have the following MATCH ... AGAINST query, which is supposed to return rows matching the exact phrase. However, it returns only the first two rows for Sodium Chloride and does not find the phrase when it is concatenated with another word, as in QtySodium Chloride.
SELECT * FROM tblhugedata WHERE MATCH(Item_desc) AGAINST('"*Sodium Chloride*"' IN BOOLEAN MODE);
The following LIKE query returns the expected results, but I want to use only the FULLTEXT index.
SELECT * FROM tblhugedata WHERE Item_desc like '%SODIUM CHLORIDE%';
Is there any way to get such results using MATCH ... AGAINST?
Remove the asterisks. FULLTEXT does not allow for leading wildcards. That is, there is no way to get MATCH to match QtySodium against Sodium.
I would consider "QtySodium" to be "garbage in" and complain to the provider of the data.
Here is a kludge that will work in some cases:
WHERE MATCH(Item_desc) AGAINST('Sodium Chloride' IN BOOLEAN MODE)
AND Item_desc LIKE '%SODIUM CHLORIDE%'
That way, it will efficiently filter down to rows that have either "Sodium" or "Chloride", then check such rows for exactly the substring "Sodium Chloride". That will match your 3 examples, but perhaps not some other examples.
SELECT * FROM tblhugedata WHERE MATCH(Item_desc) AGAINST( 'Sodium Chloride' IN NATURAL LANGUAGE MODE);
InnoDB full-text search does not support the use of multiple operators on a single search word.

How to optimize a MySQL/MyISAM full text search with many results

I have a MySQL MyISAM table with a full text index on the keywords column and 20 million rows. It works well when I search for rare words, for example:
SELECT count(*) FROM books WHERE MATCH(keywords) AGAINST ('+DUCK' IN BOOLEAN MODE)
(0.005s, 2k results)
But when I search for more common terms, it is much slower:
SELECT count(*) FROM books WHERE MATCH(keywords) AGAINST ('+YES' IN BOOLEAN MODE)
(5s, 2 million results)
That makes sense because the second query returns many more rows, but then how can I pre-filter the rows before the text search? This doesn't work:
SELECT count(*) FROM books WHERE date > "2019-09-23" AND MATCH(keywords) AGAINST ('+YES' IN BOOLEAN MODE)
(5s, 0 results)
MyISAM's (and maybe InnoDB's) FULLTEXT will always do the MATCH first, then any other clauses. Hence, adding that extra filter does not help with speed.
Think of it this way... A FT index is constructed to test the entire table(s) for the MATCH clause. It is not ready to handle any filtering before it goes to work. So, you are stuck with FT first, then filter the results the other way but without benefit of any indexes.
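One way to check this on your own server is to look at the execution plan. This is only a sketch using the query from the question; the exact EXPLAIN output varies by MySQL version and storage engine:

```sql
-- If the `type` column of the plan shows `fulltext`, the FULLTEXT index is
-- driving the query and the date condition is applied to the matched rows.
EXPLAIN
SELECT COUNT(*)
FROM books
WHERE date > '2019-09-23'
  AND MATCH(keywords) AGAINST('+YES' IN BOOLEAN MODE);
```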

mysql: NOT MATCH vs NOT IN subquery for fulltext search

I want to find all rows that match a full-text search for one pair of columns but also do not match the same text in another column.
Both of these seem to work
SELECT * FROM docs WHERE MATCH(title, descript) AGAINST ('energy' IN BOOLEAN MODE) AND NOT MATCH(categories) AGAINST ('energy' IN BOOLEAN MODE);
Or using a subquery:
SELECT * FROM docs WHERE MATCH(title, descript) AGAINST ('energy' IN BOOLEAN MODE) AND id NOT IN (SELECT id FROM docs where MATCH(categories) AGAINST ('energy' IN BOOLEAN MODE));
The docs table has the relevant full-text indexes set up.
Any reason to prefer one over the other?
On the (small) database I'm using they are both very fast, too fast to measure reliably.
Thanks for any suggestions.

MySQL - Efficient search with partial word match and relevancy score (FULLTEXT)

How can I do a MySQL search which will match partial words but also provide accurate relevancy sorting?
SELECT name, MATCH(name) AGAINST ('math*' IN BOOLEAN MODE) AS relevance
FROM subjects
WHERE MATCH(name) AGAINST ('math*' IN BOOLEAN MODE)
The problem with boolean mode is the relevancy always returns 1, so the sorting of results isn't very good. For example, if I put a limit of 5 on the search results the ones returned don't seem to be the most relevant sometimes.
If I search in natural language mode, my understanding is that the relevancy score is useful but I can't match partial words.
Is there a way to perform a query which fulfils all of these criteria:
Can match partial words
Results are returned with accurate relevancy
Is efficient
The best I've got so far is:
SELECT name
FROM subjects
WHERE name LIKE 'mat%'
UNION ALL
SELECT name
FROM subjects
WHERE name LIKE '%mat%' AND name NOT LIKE 'mat%'
But I would prefer not to be using LIKE.
The new InnoDB full-text search feature in MySQL 5.6 helps in this case.
I use the following query:
SELECT MATCH(column) AGAINST('(word1* word2*) ("word1 word2")' IN BOOLEAN MODE) score, id, column
FROM table
HAVING score > 0
ORDER BY score DESC
LIMIT 10;
where ( ) groups words into a subexpression. The first group works like a word% prefix match; the second looks for the exact phrase. The score is returned as a float.
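Applied to the subjects table from the question, the same idea might look like the sketch below. It assumes MySQL 5.6 or later and an InnoDB FULLTEXT index on subjects.name, neither of which the question confirms. How far the exact word outranks prefix-only matches depends on the data, so the ordering is worth checking on real rows:

```sql
-- Assumption: subjects is an InnoDB table with a FULLTEXT index on name.
-- Group 1 matches any word starting with "math"; group 2 matches the word itself.
SELECT name,
       MATCH(name) AGAINST('(math*) ("math")' IN BOOLEAN MODE) AS score
FROM subjects
WHERE MATCH(name) AGAINST('(math*) ("math")' IN BOOLEAN MODE)
ORDER BY score DESC
LIMIT 5;
```

Filtering in WHERE rather than HAVING lets the fulltext index select the rows before they are sorted.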
I obtained a good solution in this (somewhat) duplicate question a year later:
MySQL - How to get search results with accurate relevance

Column that matches in a fulltext index spanning multiple columns

I am using the latest version of MySQL 5.5.
I have a fulltext index spanning multiple columns in a table generated specifically for fulltext search (the other tables in the database use InnoDB):
somedata_search
========
id
name
about
note
dislike
I have a fulltext index on all the columns except for ID. I am able to run fulltext searches using:
SELECT * FROM account_search WHERE MATCH(name, about, note, dislike) AGAINST('mykeyword*' IN BOOLEAN MODE);
This all works fine, but is there a way to determine which column the match originates from for each row? If there are matches across several columns in a row, I am happy to have just the first column.
I don't think there is any "native" way of getting it but it is possible to do it anyway.
I'm not sure this is fast but it returns the correct data
select text_test.*,
match(name) against ('dude' in boolean mode) as name_match,
match(info) against ('dude' in boolean mode) as info_match
from text_test
where match(name, info) against ('dude' in boolean mode);
http://sqlfiddle.com/#!2/5159c/1
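Since the question only needs the first matching column, the per-column checks could be folded into a single CASE expression. This is a sketch using the table and column names from the question, with matched_column as an illustrative alias. It assumes a MyISAM table, where boolean-mode MATCH on a single column can run without a dedicated index (slowly); on InnoDB each single-column MATCH would need its own FULLTEXT index:

```sql
-- CASE returns the label of the first branch whose MATCH score is non-zero.
SELECT s.*,
       CASE
         WHEN MATCH(name)    AGAINST('mykeyword*' IN BOOLEAN MODE) > 0 THEN 'name'
         WHEN MATCH(about)   AGAINST('mykeyword*' IN BOOLEAN MODE) > 0 THEN 'about'
         WHEN MATCH(note)    AGAINST('mykeyword*' IN BOOLEAN MODE) > 0 THEN 'note'
         WHEN MATCH(dislike) AGAINST('mykeyword*' IN BOOLEAN MODE) > 0 THEN 'dislike'
       END AS matched_column
FROM account_search AS s
WHERE MATCH(name, about, note, dislike) AGAINST('mykeyword*' IN BOOLEAN MODE);
```

The WHERE clause still uses the multi-column index, so the per-column checks only run on rows that already matched.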