MySQL boolean full text search not working as expected

I have a table of chemical substances, called substances. I'm trying to implement a search facility using MySQL's full-text natural language capabilities.
I have run the following to allow such commands to work on my substances table on the name column:
ALTER TABLE substances ADD FULLTEXT(`name`);
If I run the following command it gives me any results which contain either the word "Chromium" or "Trioxide" as expected:
SELECT * FROM substances WHERE MATCH (`name`) AGAINST ('Chromium Trioxide' IN NATURAL LANGUAGE MODE);
However what I want to do is find only rows that contain "Chromium Trioxide", even if there are characters in between them (e.g. "Chromium (VI) Trioxide"). My understanding is that using a + before each word would do this:
SELECT * FROM substances WHERE MATCH (`name`) AGAINST ('+Chromium +Trioxide' IN NATURAL LANGUAGE MODE);
But it gives me the same results as the original query, i.e. any row that contains either "Chromium" or "Trioxide", not only the rows that contain both.
Where am I going wrong? I've read up on Boolean Full Text searches (https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html) but the only info I found was to include the + before each keyword.
MySQL version is 5.7.9 and the table is MyISAM.

I don't know how to solve this using MySQL boolean or natural language full-text search,
but I think it can be done using REGEXP (see https://dev.mysql.com/doc/refman/5.7/en/regexp.html):
SELECT * FROM substances WHERE `name` REGEXP 'Chromium.*Trioxide'
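For the full-text route, the manual page linked in the question describes + as a boolean-mode operator, so it would only be expected to take effect with IN BOOLEAN MODE, not with IN NATURAL LANGUAGE MODE. Below is a minimal Python sketch of building such a search string. The helper name require_all_terms and the %s placeholder style are assumptions for illustration, not part of the original question.

```python
import re

def require_all_terms(search: str) -> str:
    """Prefix each word with '+' so that boolean mode requires all of them."""
    # Keep only runs of word characters; this drops operators such as + - ( ) " *
    words = re.findall(r"\w+", search)
    return " ".join("+" + w for w in words)

# Hypothetical parameterized query; table and column names come from the question.
SQL = "SELECT * FROM substances WHERE MATCH (`name`) AGAINST (%s IN BOOLEAN MODE)"

against = require_all_terms("Chromium Trioxide")
print(against)  # +Chromium +Trioxide
```

The resulting string would be passed as the query parameter. Words shorter than the configured minimum word length may not be indexed at all, so they might need to be left out of the required terms.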

Related

MyIsam fulltext search against multiple %term%

I have a field called filepath that I'm trying to search. Here is an example path:
/mnt/qfs-X/Asset_Management/XG_Marketing_/Episodic-SG_1001_1233.jpg
I would like to be able to search the following and get a match:
search = "qf episodic sg_1001 JPG"
How would I do this with a fulltext search in mysql/myisam? What I have now is:
SELECT * FROM x_files2 WHERE MATCH(path)
AGAINST('qf episodic sg_1001 JPG' in boolean mode)
But it is returning way too many results (it seems to return rows where any of the terms is found instead of only those where all are found).
Put + in front of each 'word':
AGAINST('+qf* +episodic +sg_1001* +JPG' in boolean mode)
Do you have the minimum word length (ft_min_word_len) set to 2? If not, there could be other troubles.
The + avoids "too many".
Consider switching to InnoDB, now that it has FULLTEXT.
You may have to abandon use of FULLTEXT and switch to LIKE:
WHERE path LIKE '%qf%episodic%sg_1001%JPG%'
If performance is an issue, consider something like
WHERE MATCH(path) AGAINST('...' IN BOOLEAN MODE) -- using some of the words
AND path LIKE '...' -- as above
The MATCH will run first, whittling down the number of possible rows considerably, then the LIKE takes care of details.
Note that middles of words cannot be used in AGAINST. Those could be left out, relying on LIKE to take care of them.
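The MATCH plus LIKE combination can be sketched in Python as follows. This is only an illustration under stated assumptions: build_path_search is a hypothetical helper, min_len=4 assumes the default minimum word length, every indexed term is treated as a prefix (trailing *), and LIKE wildcards are escaped with a backslash, which is MySQL's default LIKE escape character.

```python
import re

def build_path_search(search: str, min_len: int = 4):
    """Return (boolean-mode expression, LIKE pattern) for a space-separated search."""
    terms = search.split()
    # MATCH part: only terms made of word characters and long enough to be indexed;
    # each becomes a required prefix match such as +episodic*
    indexed = [t for t in terms if re.fullmatch(r"\w+", t) and len(t) >= min_len]
    against = " ".join(f"+{t}*" for t in indexed)

    # LIKE part: every term in order, with LIKE's own wildcards (% and _) escaped
    def esc(t: str) -> str:
        return re.sub(r"([\\%_])", r"\\\1", t)

    like = "%" + "%".join(esc(t) for t in terms) + "%"
    return against, like

against, like = build_path_search("qf episodic sg_1001 JPG")
print(against)  # +episodic* +sg_1001*
print(like)     # %qf%episodic%sg\_1001%JPG%
```

The two strings would then be bound to the AGAINST(... IN BOOLEAN MODE) and LIKE parameters respectively. The short terms (qf, JPG) are left to LIKE alone.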

Equivalent searching using "like" with "match()"

I have a query like this:
SELECT * FROM table WHERE col LIKE '%word'
What is the equivalent of the above query using a full-text index?
It should be noted that this does not work:
SELECT * FROM table WHERE MATCH(col) AGAINST('+word' IN BOOLEAN MODE)
EDIT
I want to select 'test' in this sentence:
it is a ttest.
I don't think MySQL directly supports this. You can speed the query by doing:
SELECT t.*
FROM (SELECT *
FROM table
WHERE MATCH(col) AGAINST('+word' IN BOOLEAN MODE)
) t
WHERE col LIKE '%word';
(I'm not 100% sure that the subquery is needed.)
If you know that this will be a common type of search, you can hack a solution. Modify each of the documents to be something like:
BEGINDOCMARKER blah blah blah ENDDOCMARKER
Then you can search for:
SELECT *
FROM table
WHERE MATCH(col) AGAINST('"word ENDDOCMARKER"' IN BOOLEAN MODE)
If you do take this approach, be careful with the marker. It is tempting to use ^ and $ (à la regular expressions), but these will not be indexed, due to the minimum word length and other reasons. Some uncommon jumble of characters is a good bet.
Also note that this makes the text longer and you will probably want to remove these words for presentation purposes. If space is not an issue, duplicate the field, one for searching and one for presenting. This can be useful for another reason: handling synonyms. Or, consider other text search alternatives. Other tools do allow positioning information in the query.

match against not making sense

This is my filter text:
Oliver used book
If I search for 'Oliver' it works, if I search for 'book' it works but if I search for 'used' it does not work.
Heater white fan HEOP1322
Heater -> works : white -> works : fan -> does not work : HEOP -> does not work : HEOP1322 -> works.
My query is like this:
SELECT * FROM table WHERE MATCH(filter) AGAINST ('fan' IN BOOLEAN MODE)
SELECT * FROM table WHERE MATCH(filter) AGAINST ('HEOP' IN BOOLEAN MODE)
SELECT * FROM table WHERE MATCH(filter) AGAINST ('used' IN BOOLEAN MODE)
Why d'hell does the word used not work and the word book works? They have the same length.
I also tried the suggestions in "Mysql search for string and number using MATCH() AGAINST()" without success.
Edit: Solved, follow these instructions:
XAMPP MySQL - Setting ft_min_word_len
"used" is one of the default MySQL full text stopwords: https://dev.mysql.com/doc/refman/5.1/en/fulltext-stopwords.html. Stopwords are words which are ignored because they are too frequent in the (English) language and would not positively contribute to the result of a full text search. If you're only querying for single words, a LIKE %..% query may be more suited than a full-blown full text search.

mysql boolean mode fulltext search with wildcards and literals

I'm pretty new to MySQL full-text searches and I ran into this problem today:
My company table has a record with "e-magazine AG" in the name column. I have a full-text index on the name column.
When I execute this query the record is not found:
SELECT id, name FROM company WHERE MATCH(name) AGAINST('+"e-magazi"*' IN BOOLEAN MODE);
I need to work with quotes because of the dash and to use the wildcard because I implement a "search as you type" functionality.
When I search for the whole term "e-magazine AG", the record is found.
Any ideas what I'm doing wrong here? I read about adding the dash to the list of word characters (config update needed) but I'm searching for a way to do this programmatically.
This clause
MATCH(name) AGAINST('+"e-magazi"*' IN BOOLEAN MODE);
will search for "e" AND NOT "magazi"; i.e. the - inside "e-magazi" is interpreted as a NOT operator even though it is inside quotation marks.
For this reason it will not work as expected.
A solution is to apply an extra having clause with a LIKE.
I know this having is slow, but it will only be applied to the results of the match, so not too many rows should be involved.
I suggest something like:
SELECT id, name
FROM company
WHERE MATCH(name) AGAINST('magazine' IN BOOLEAN MODE)
HAVING name LIKE '%e-magazi%';
MySQL full-text search treats the word e-magazine in a text as a phrase and not as a single word, so it yields the two words e and magazine. While it builds the search index, it does not add e to the index because of ft_min_word_len (default is 4 characters).
The same length limitation applies to the search query. That is why a search for e-magazine returns exactly the same results as a-magazine: a and - are ignored entirely.
But now you want to find the exact phrase e-magazine. For that you use quotes, which is the correct way to find phrases, but MySQL does not support operators for phrases, only for words:
https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html
With this modifier, certain characters have special meaning at the beginning or end of words in the search string
Some people would suggest to use the following query:
SELECT id, name
FROM company
WHERE MATCH(name) AGAINST('e-magazi*' IN BOOLEAN MODE)
HAVING name LIKE 'e-magazi%';
As I said, MySQL ignores the e- and searches for the wildcard word magazi*. After those results are obtained, it uses HAVING to additionally filter the results for e-magazi*, including the e-. That way you will find the phrase e-magazine AG. Of course, HAVING is only needed if the search phrase contains the wildcard operator, and you should never use quotes. This operator is used by your user and not you!
Note: As long as you do not surround the search phrase with %, it will find only fields that start with that word. And you do not want to surround it, because it would find bee-magazine as well. So you may need additional conditions in the HAVING clause, such as OR name LIKE '% e-magazi%' OR name LIKE '%\ne-magazi%', to make it usable inside texts.
Trick
But finally I prefer a trick so HAVING isn't needed at all:
If you add texts to your database table, add them additionally to a separate fulltext indexed column and replace words like up-to-date with up-to-date uptodate.
If a user searches for up-to-date replace it in the query with uptodate.
That way you can still find specific in user-specific, but you can also find up-to-date as a whole (and not only date).
Bonus
If a user searches for -well-known huge ports, MySQL treats that as "must not include well, may include known and huge". Of course you could solve that with another extra query variant as well, but with the trick above you remove the hyphen, so the search query simply looks like this:
SELECT id
FROM texts
WHERE MATCH(text) AGAINST('-wellknown huge ports' IN BOOLEAN MODE)
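The trick above can be sketched in Python. Both function names are hypothetical, and the regular expression is an assumption about what counts as a hyphenated word (runs of word characters joined by single hyphens).

```python
import re

# Assumption: a hyphenated word is runs of word characters joined by single hyphens.
HYPHENATED = re.compile(r"\b\w+(?:-\w+)+\b")

def expand_for_index(text: str) -> str:
    """Text for the extra indexed column: 'up-to-date' -> 'up-to-date uptodate'."""
    return HYPHENATED.sub(lambda m: m.group(0) + " " + m.group(0).replace("-", ""), text)

def normalize_query(query: str) -> str:
    """User query: 'up-to-date' -> 'uptodate'; a leading operator such as '-' is kept."""
    return HYPHENATED.sub(lambda m: m.group(0).replace("-", ""), query)

print(expand_for_index("an up-to-date list"))     # an up-to-date uptodate list
print(normalize_query("-well-known huge ports"))  # -wellknown huge ports
```

expand_for_index would run when a text is stored, and normalize_query when a user search is turned into the AGAINST string.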

mysql fulltext MATCH,AGAINST returning 0 results

I am trying to follow: http://dev.mysql.com/doc/refman/4.1/en/fulltext-natural-language.html
in an attempt to improve search queries, both in speed and the ability to order by score.
However, when using this SQL ("skitt" is used as a search term just so I can try to match Skittles):
SELECT
id,name,description,price,image,
MATCH (name,description)
AGAINST ('skitt')
AS score
FROM
products
WHERE
MATCH (name,description)
AGAINST ('skitt')
it returns 0 results. I am trying to find out why. I think I might have set my indexes up wrong, but I'm not sure; this is the first time I've strayed away from LIKE!
Here is my table structure and data:
Thank you!
By default certain words are excluded from the search. These are called stopwords. "a" is an example of a stopword. You could test your query by using a word that is not a stopword, or you can disable stopwords:
How can I write full search index query which will not consider any stopwords?
If you want to also match prefixes use the truncation operator in boolean mode:
*
The asterisk serves as the truncation (or wildcard) operator. Unlike the other operators, it should be appended to the word to be affected. Words match if they begin with the word preceding the * operator.
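As a minimal Python sketch of the truncation operator applied to the query from the question: the helper prefix_terms is hypothetical, as is the %s placeholder style, and the query assumes the same products table and columns as above, switched to boolean mode.

```python
import re

def prefix_terms(search: str) -> str:
    """Append the truncation operator to each word, e.g. 'skitt' -> 'skitt*'."""
    return " ".join(w + "*" for w in re.findall(r"\w+", search))

# Hypothetical parameterized query reusing the columns from the question.
SQL = (
    "SELECT id, name, description, price, image, "
    "MATCH (name, description) AGAINST (%s IN BOOLEAN MODE) AS score "
    "FROM products "
    "WHERE MATCH (name, description) AGAINST (%s IN BOOLEAN MODE)"
)

params = (prefix_terms("skitt"),) * 2
print(params[0])  # skitt*
```

The same search string is bound twice, once for the score column and once for the WHERE clause.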