mysql boolean mode fulltext search with wildcards and literals

I'm pretty new to MySQL full-text searches and I ran into this problem today:
My company table has a record with "e-magazine AG" in the name column. I have a full-text index on the name column.
When I execute this query the record is not found:
SELECT id, name FROM company WHERE MATCH(name) AGAINST('+"e-magazi"*' IN BOOLEAN MODE);
I need to work with quotes because of the dash and to use the wildcard because I implement a "search as you type" functionality.
When I search for the whole term "e-magazine AG", the record is found.
Any ideas what I'm doing wrong here? I read about adding the dash to the list of word characters (config update needed) but I'm searching for a way to do this programmatically.

This clause
MATCH(name) AGAINST('+"e-magazi"*' IN BOOLEAN MODE);
will search for "e" AND NOT "magazi"; i.e. the - inside "e-magazi" is interpreted as a NOT operator even though it is inside quotation marks.
For this reason it will not work as expected.
A solution is to add an extra HAVING clause with LIKE.
I know HAVING is slow, but it is only applied to the rows returned by MATCH, so not many rows should be involved.
I suggest something like:
SELECT id, name
FROM company
WHERE MATCH(name) AGAINST('magazi*' IN BOOLEAN MODE)
HAVING name LIKE '%e-magazi%';

MySQL fulltext treats e-magazine in a text as a phrase of two words rather than as one word: the hyphen is a word delimiter, so it is split into e and magazine. While building the index, MySQL does not add the e at all, because it is shorter than ft_min_word_len (default 4 characters for MyISAM; InnoDB uses innodb_ft_min_token_size, default 3).
The same length limit is applied to the search query. That is why a search for e-magazine returns exactly the same results as a search for a-magazine: the single letter and the - are ignored completely.
But you want to find the exact phrase e-magazine. Using quotes is the correct way to search for phrases, but MySQL supports operators only for words, not for phrases:
https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html
With this modifier, certain characters have special meaning at the beginning or end of words in the search string
Some people would suggest using the following query:
SELECT id, name
FROM company
WHERE MATCH(name) AGAINST('e-magazi*' IN BOOLEAN MODE)
HAVING name LIKE 'e-magazi%';
As I said, MySQL ignores the e- and searches for the wildcard word magazi*. After those results are obtained, HAVING additionally filters them for e-magazi%, including the e-. That way you find e-magazine AG. Of course HAVING is only needed if the search term contains the wildcard operator, and you should never add quotes yourself: operators are typed by your user, not by you!
Note: As long as you do not surround the LIKE pattern with %, it only finds values that start with that word. And you do not want to surround it, because then it would find bee-magazine as well. So to make it work inside longer texts you may need additional conditions such as HAVING name LIKE 'e-magazi%' OR name LIKE '% e-magazi%' OR name LIKE '%\ne-magazi%' (the word at the start, after a space, or after a newline).
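To see which values the combined LIKE conditions accept, here is a minimal Python sketch (not from the original answer) that approximates the same "a word starts with e-magazi" test with a regular expression. The function name starts_word_with is hypothetical; \s also accepts tabs, and re.IGNORECASE imitates MySQL's case-insensitive default collations, so treat it as an approximation of the SQL rather than an exact equivalent.

```python
import re

def starts_word_with(text: str, prefix: str) -> bool:
    """True if a word in `text` begins with `prefix`.

    A word start is the beginning of the value or any position right after
    whitespace, roughly like combining LIKE 'prefix%', LIKE '% prefix%'
    and LIKE '%\\nprefix%'.
    """
    # re.escape keeps "-" and other punctuation in the prefix literal.
    pattern = r"(?:^|\s)" + re.escape(prefix)
    return re.search(pattern, text, re.IGNORECASE) is not None

print(starts_word_with("e-magazine AG", "e-magazi"))       # True
print(starts_word_with("Our E-Magazine AG", "e-magazi"))   # True
print(starts_word_with("bee-magazine GmbH", "e-magazi"))   # False
```

The last case shows why the pattern is not simply '%e-magazi%': the e- inside bee-magazine is not at a word start.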
Trick
But finally I prefer a trick so HAVING isn't needed at all:
When you add texts to your database table, also store them in a separate FULLTEXT-indexed column in which words like up-to-date are replaced with up-to-date uptodate.
If a user searches for up-to-date, replace it in the query with uptodate.
This way a search for specific still finds user-specific, but a search for up-to-date finds up-to-date itself (and not every row that merely contains date).
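A minimal Python sketch of the index side of this trick, assuming you prepare the value for the extra search column in application code before the INSERT. The function name add_joined_forms and the regex that defines a "hyphenated word" are assumptions, not part of the original answer.

```python
import re

# A "hyphenated word": runs of word characters joined by single hyphens.
HYPHENATED = re.compile(r"\b\w+(?:-\w+)+\b")

def add_joined_forms(text: str) -> str:
    """Follow every hyphenated word with its hyphen-free form.

    "up-to-date" becomes "up-to-date uptodate"; the result is meant for the
    separate FULLTEXT-indexed search column, not for display.
    """
    return HYPHENATED.sub(
        lambda m: m.group(0) + " " + m.group(0).replace("-", ""), text
    )

print(add_joined_forms("An up-to-date, user-specific list"))
# An up-to-date uptodate, user-specific userspecific list
```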
Bonus
If a user searches for -well-known huge ports, MySQL treats that as: must not include well; could include known, huge and ports. Of course you could solve that with yet another query variant, but with the trick above you remove the hyphen, so the search query simply looks like this:
SELECT id
FROM texts
WHERE MATCH(text) AGAINST('-wellknown huge ports' IN BOOLEAN MODE)
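The query side can be sketched in Python as well. This is one assumed way to "remove the hyphen": only hyphens between two word characters are dropped, so a leading - that the user typed as the NOT operator survives. dehyphenate_query is a hypothetical helper name.

```python
import re

# A hyphen with a word character on both sides, e.g. in "well-known".
# A leading "-" (the boolean NOT operator) has no word character before it,
# so it is left untouched.
INNER_HYPHEN = re.compile(r"(?<=\w)-(?=\w)")

def dehyphenate_query(query: str) -> str:
    return INNER_HYPHEN.sub("", query)

print(dehyphenate_query("-well-known huge ports"))  # -wellknown huge ports
print(dehyphenate_query("+up-to-date"))            # +uptodate
```

The result is then passed, ideally as a bound parameter, to MATCH(text) AGAINST(... IN BOOLEAN MODE).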

Related

use full text search to search incomplete words in mysql

I am making a library management system.
I have a problem in the search for a book from mysql database.
For searching data in MySQL we use full-text search.
But it only works if a full word is given. If the user enters an incomplete word instead of the actual word, is there any function to search?
Example: if the book name is calculus and the user types calc, the book should still be found.
You can try using fulltext search with boolean mode, which allows a few extra operators. You will be interested in the truncation operator (*):
The asterisk serves as the truncation (or wildcard) operator. Unlike the other operators, it is appended to the word to be affected. Words match if they begin with the word preceding the * operator.
If a word is specified with the truncation operator, it is not stripped from a boolean query, even if it is too short or a stopword. Whether a word is too short is determined from the innodb_ft_min_token_size setting for InnoDB tables, or ft_min_word_len for MyISAM tables. These options are not applicable to FULLTEXT indexes that use the ngram parser.
The wildcarded word is considered as a prefix that must be present at the start of one or more words. If the minimum word length is 4, a search for '+word +the*' could return fewer rows than a search for '+word +the', because the second query ignores the too-short search term the.
Please note that you cannot start an expression with the * operator (there is no leading wildcard), so the results cannot include a book whose title merely contains 'calc' inside a word, only books whose title has a word that starts with 'calc'.
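For search-as-you-type, the user's partial input has to be turned into a boolean-mode string such as '+calc*'. A minimal Python sketch, assuming every typed word should be required and treated as a prefix; the function name and the choice to strip characters that boolean mode treats as operators are assumptions of this sketch.

```python
import re

# Characters with a special meaning in boolean mode: + - < > ( ) ~ * " @
BOOLEAN_OPERATORS = re.compile(r'[+\-<>()~*"@]')

def prefix_boolean_query(user_input: str) -> str:
    """'intro calc' -> '+intro* +calc*' (every word required, prefix match)."""
    words = BOOLEAN_OPERATORS.sub(" ", user_input).split()
    return " ".join(f"+{word}*" for word in words)

print(prefix_boolean_query("calc"))        # +calc*
print(prefix_boolean_query("intro calc"))  # +intro* +calc*
```

The result would be passed as a parameter, e.g. WHERE MATCH(title) AGAINST(%s IN BOOLEAN MODE) with a DB-API driver that uses %s placeholders (the title column is hypothetical).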
You can use the LIKE operator with the "%" wildcard.
With LIKE you can use the following two wildcard characters in the pattern:
% matches any number of characters, even zero characters.
_ matches exactly one character.
For example:
SELECT * FROM <Table> WHERE book LIKE '%calc%';
http://dev.mysql.com/doc/refman/5.7/en/string-comparison-functions.html
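If the search term comes from user input, any % and _ it contains should be escaped so they are matched literally. A minimal Python sketch, assuming MySQL's default LIKE escape character (the backslash) and that the pattern is sent as a bound parameter rather than pasted into the SQL string; the helper names are hypothetical.

```python
def escape_like(term: str) -> str:
    """Escape \\, % and _ so LIKE matches them literally (default escape char \\)."""
    # Escape the backslash first so the escapes added below are not doubled.
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

def contains_pattern(term: str) -> str:
    """Pattern for 'term appears anywhere', e.g. 'calc' -> '%calc%'."""
    return "%" + escape_like(term) + "%"

print(contains_pattern("calc"))      # %calc%
print(contains_pattern("100%_off"))  # %100\%\_off%
```

Used e.g. as cursor.execute("SELECT * FROM books WHERE title LIKE %s", (contains_pattern(user_input),)), where books and title are hypothetical names.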

mysql fulltext boolean search with asterisk

I have a query like below:
SELECT prd_id FROM products WHERE MATCH (prd_search_field)
AGAINST ('+gul* +yetistiren* +adam*' in boolean mode);
This doesn't return the rows including 'gul'.
http://dev.mysql.com/doc/refman/5.0/en/fulltext-boolean.html
The document says this.
Then a search for '+word +the*' will likely return fewer rows than a
search for '+word +the':
The former query remains as is and requires both word and the* (a word starting with the) to be present in the document.
The latter query is transformed to +word (requiring only word to be present). the is both too short and a stopword, and either condition is enough to cause it to be ignored.
So, as I understand it, the too-short-word condition should not apply in my situation, since I use * after each word. What's wrong with this?
As a workaround I use the query below, but since it's slow, I need to find another solution. Any ideas would be appreciated.
SELECT prd_id FROM products WHERE 1 AND MATCH (prd_search_field)
AGAINST ('+yetistiren* +adam*' in boolean mode) AND prd_search_field
LIKE '%gul%';
As a note, ft_min_word_len=4 is the default in all shared hosting environments, and I cannot change it.
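The workaround above can be generated automatically. With ft_min_word_len=4, a three-letter word like gul is never in the FULLTEXT index, so gul* can only match longer indexed words that start with gul, never gul itself. A minimal Python sketch that routes short terms to LIKE and the rest to MATCH; it does not make the query faster, it only builds it. The function name, the %s placeholder style and the fixed column name are assumptions.

```python
def build_search(terms, min_len=4):
    """Return (where_clause, params) for a driver using %s placeholders.

    Terms with at least `min_len` characters (ft_min_word_len) go into one
    boolean-mode MATCH as required prefixes; shorter terms, which are not in
    the FULLTEXT index, fall back to LIKE '%term%'.
    Note: % and _ inside a term are not escaped here.
    """
    long_terms = [t for t in terms if len(t) >= min_len]
    short_terms = [t for t in terms if len(t) < min_len]
    clauses, params = [], []
    if long_terms:
        clauses.append("MATCH(prd_search_field) AGAINST(%s IN BOOLEAN MODE)")
        params.append(" ".join(f"+{t}*" for t in long_terms))
    for t in short_terms:
        clauses.append("prd_search_field LIKE %s")
        params.append(f"%{t}%")
    return " AND ".join(clauses), params

where, params = build_search(["gul", "yetistiren", "adam"])
print(where)
# MATCH(prd_search_field) AGAINST(%s IN BOOLEAN MODE) AND prd_search_field LIKE %s
print(params)
# ['+yetistiren* +adam*', '%gul%']
```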

Improving Mysql Match Against search

I've been looking into MySQL's MATCH ... AGAINST search. The results are strange. For example, if I have a table attribute with an entry "education" and search (using MATCH ... AGAINST) for "edu", it finds it. But if I search for "educ", no results are returned, and the same goes for everything up to "educatio". So it only matches whole words, or when 3 letters or fewer match in a word.
Is there a way to improve it so that results are returned when a search term is a subset of a word in the attribute? E.g. using the example above, searching "educat" would return rows containing "Education"
You can do exactly what you want by matching IN BOOLEAN MODE and using the * operator.
For example:
... MATCH(thing) AGAINST ('+educat*' IN BOOLEAN MODE)...
The + tells MATCH to include only rows whose thing value contains the term, which in this case means any indexed word beginning with "educat" (see the MySQL manual page on boolean full-text searches for how boolean mode works in detail).
As an aside, Fulltext search in MySQL does not index words of 3 or fewer characters by default, so I suspect your match with "edu" is not working the way you think. Look at the value of your ft_min_word_len variable to see if that's the case.
With LIKE you can use the % wildcard (a = your word or letter):
'a%' finds values that start with the word or letter
'%a%' finds the word or letter anywhere, at the start or in the middle of a word
'%a' finds values that end with the word or letter
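To check which values each of these patterns accepts, here is a small Python model of LIKE matching. It is a sketch for illustration only: it handles % and _ but not escape characters, and it assumes a case-insensitive collation such as MySQL's default _ci ones.

```python
import re

def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a LIKE pattern (% and _ only, no escapes) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any number of characters, even zero
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    # DOTALL lets % span newlines; IGNORECASE mimics a _ci collation.
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

def like(value: str, pattern: str) -> bool:
    # LIKE has to match the whole value, hence fullmatch.
    return like_to_regex(pattern).fullmatch(value) is not None

print(like("Education", "educ%"))  # True  (starts with)
print(like("Education", "%cat%"))  # True  (contains)
print(like("Education", "%tion"))  # True  (ends with)
print(like("Education", "%educ"))  # False
```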

How to allow fulltext searching with hyphens in the search query

I have keywords like "some-or-other" where the hyphens matter in the search through my mysql database. I'm currently using the fulltext function.
Is there a way to escape the hyphen character?
I know that one option is to comment out #define HYPHEN_IS_DELIM in the myisam/ftdefs.h file, but unfortunately my host does not allow this. Is there another option out there?
Here's the code I have right now:
$search_input = $_GET['search_input'];
$keyword_safe = mysql_real_escape_string($search_input);
$keyword_safe_fix = "*'\"" . $keyword_safe . "\"'*";
$sql = "
SELECT *,
MATCH(coln1, coln2, coln3) AGAINST('$keyword_safe_fix') AS score
FROM table_name
WHERE MATCH(coln1, coln2, coln3) AGAINST('$keyword_safe_fix')
ORDER BY score DESC
";
From here: http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
One solution to find a word with dashes or hyphens in it is to use FULLTEXT search IN BOOLEAN MODE and to enclose the word with the hyphen / dash in double quotes.
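A minimal Python sketch of building such a double-quoted term from user input. phrase_query is a hypothetical helper, and dropping embedded double quotes (so the phrase cannot be closed early) is an assumption of this sketch. MySQL documents * as a word operator, so the sketch does not append it to the phrase.

```python
def phrase_query(term: str) -> str:
    """Wrap a term such as some-or-other in double quotes for IN BOOLEAN MODE."""
    # An embedded double quote would end the phrase early, so drop it
    # and collapse the leftover whitespace.
    cleaned = " ".join(term.replace('"', " ").split())
    return f'"{cleaned}"'

print(phrase_query("some-or-other"))  # "some-or-other"

# Possible use with a DB-API driver (table and column names from the question):
# cursor.execute(
#     "SELECT * FROM table_name"
#     " WHERE MATCH(coln1, coln2, coln3) AGAINST(%s IN BOOLEAN MODE)",
#     (phrase_query(search_input),),
# )
```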
Or from here http://bugs.mysql.com/bug.php?id=2095
There is another workaround. It was recently added to the manual:
"Modify a character set file: This requires no recompilation. The true_word_char() macro uses a “character type” table to distinguish letters and numbers from other characters. You can edit the contents in one of the character set XML files to specify that '-' is a “letter.” Then use the given character set for your FULLTEXT indexes."
Have not tried it on my own.
Edit: here is some additional info from http://dev.mysql.com/doc/refman/5.0/en/fulltext-boolean.html
A phrase that is enclosed within double quote (“"”) characters matches only rows that contain the phrase literally, as it was typed. The full-text engine splits the phrase into words and performs a search in the FULLTEXT index for the words. Prior to MySQL 5.0.3, the engine then performed a substring search for the phrase in the records that were found, so the match must include nonword characters in the phrase. As of MySQL 5.0.3, nonword characters need not be matched exactly: Phrase searching requires only that matches contain exactly the same words as the phrase and in the same order. For example, "test phrase" matches "test, phrase" in MySQL 5.0.3, but not before.
If the phrase contains no words that are in the index, the result is empty. For example, if all words are either stopwords or shorter than the minimum length of indexed words, the result is empty.
Some people would suggest using the following query:
SELECT id
FROM texts
WHERE MATCH(text) AGAINST('well-known' IN BOOLEAN MODE)
HAVING text LIKE '%well-known%';
But that way you need many query variants depending on which fulltext operators are used. Try building such a query for +well-known +(>35-hour <39-hour) working week*. Too complex!
And do not forget the default ft_min_word_len: a search for up-to-date effectively searches only for date, because up and to are too short to be indexed.
Trick
Because of that I prefer a trick so constructions with HAVING etc aren't needed at all:
Instead of adding only the following text to your database table: "The Up-to-Date Sorcerer" is a well-known science fiction short story. copy the hyphenated words without hyphens to the end of the text inside a comment: "The Up-to-Date Sorcerer" is a well-known science fiction short story.<!-- UptoDate wellknown -->
If the user searches for up-to-date, remove the hyphens in the SQL query:
MATCH(text) AGAINST('uptodate' IN BOOLEAN MODE)
This way your user finds up-to-date as one word instead of getting all results that merely contain date (because ft_min_word_len removes up and to).
Of course before you echo the texts you should remove the <!-- ... --> comments.
Advantages
the query is simpler
the user is able to use all fulltext operators as usual
the query is faster.
If a user searches for -well-known +science, MySQL treats that as: must not include well, could include known, must include science. This isn't what the user expected. The trick solves that, too (as the SQL query searches for -wellknown +science).
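A minimal Python sketch of the storage side of this trick. The helper names and the regex for hyphenated words are assumptions; only the <!-- ... --> comment format comes from the answer. Note that for_display as written would also remove any other HTML comments already in the text.

```python
import re

HYPHENATED = re.compile(r"\b\w+(?:-\w+)+\b")
HIDDEN_TERMS = re.compile(r"<!--.*?-->", re.DOTALL)

def with_hidden_terms(text: str) -> str:
    """Append hyphen-free copies of hyphenated words inside an HTML comment."""
    joined = [m.group(0).replace("-", "") for m in HYPHENATED.finditer(text)]
    if not joined:
        return text
    return text + "<!-- " + " ".join(joined) + " -->"

def for_display(stored: str) -> str:
    """Remove the hidden comment before echoing the text."""
    return HIDDEN_TERMS.sub("", stored)

original = '"The Up-to-Date Sorcerer" is a well-known science fiction short story.'
stored = with_hidden_terms(original)
print(stored)
# "The Up-to-Date Sorcerer" is a well-known science fiction short story.<!-- UptoDate wellknown -->
print(for_display(stored) == original)  # True
```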
Maybe simpler to use the Binary operator.
SELECT *
FROM your_table_name
WHERE BINARY your_column = BINARY "Foo-Bar%AFK+LOL"
http://dev.mysql.com/doc/refman/5.0/en/cast-functions.html#operator_binary
The BINARY operator casts the string following it to a binary string. This is an easy way to force a column comparison to be done byte by byte rather than character by character. This causes the comparison to be case sensitive even if the column is not defined as BINARY or BLOB. BINARY also causes trailing spaces to be significant.
My preferred solution to this is to remove the hyphen from the search term and from the data being searched. I keep two columns in my full-text table - search and return. search contains sanitised data with various characters removed, and is what the users' search terms are compared to, after my code has sanitised those as well.
Then I display the return column.
It does mean I have two copies of the data in my database, but for me that trade-off is well worth it. My FT table is only ~500k rows, so it's not a big deal in my use case.

Fulltext search with <> characters

Having problems getting the following query to work. I want to match the actual string "<foo><![CDATA[1850]" rather than have the < and > characters act as operators that control word relevance.
SELECT * FROM (table)
WHERE MATCH (field) AGAINST ("+<foo><![CDATA[1850]" IN BOOLEAN MODE)
When I run this it returns almost all records in the database, not just those which match the exact string.
AFAIK you cannot use special characters in full-text search indexes. They are limited to text (words, to be exact; for example, you can have a list of the most common words to be excluded from the index). You have to use LIKE if you are searching for pieces of code with special characters.