Interesting MATCH AGAINST issue: not searching some words - mysql

I have a table like this:
|====brand=====|======title================|
|....Apple.....|...iPhone 5 32 GB..........|
|....Sony......|...Bluetooth Headset.......|
I am using full-text search, basically like this:
SELECT
*,
MATCH(brand,title) AGAINST ('some words') as score
FROM
table
When I search for headset, MySQL correctly returns a score of 1.812121.
But when I search for iphone, MySQL returns a score of 0.
P.S.: ft_min_word_len = 2

From http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
A natural language search interprets the search string as a phrase in natural human language (a phrase in free text). There are no special operators. The stopword list applies. In addition, words that are present in 50% or more of the rows are considered common and do not match. Full-text searches are natural language searches if no modifier is given.
So the bit about common words sounds likely.
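A minimal sketch of the rule quoted above, to check whether a term might be hitting the 50% threshold. It only illustrates the arithmetic; it is not MySQL's tokenizer, and the sample rows and the function name too_common_terms are made up for the example.

```python
def too_common_terms(rows, threshold=0.5):
    """Return the lowercase words that occur in at least `threshold` of the rows.

    Illustration only: words are split on whitespace, which is simpler
    than what MySQL's full-text parser does.
    """
    counts = {}
    for text in rows:
        # Count each word once per row, however often it appears in that row.
        for word in set(text.lower().split()):
            counts[word] = counts.get(word, 0) + 1
    return {word for word, n in counts.items() if n / len(rows) >= threshold}


# Hypothetical data: "iphone" appears in 2 of 4 rows, "headset" in 1 of 4.
sample_rows = [
    "Apple iPhone 5 32 GB",
    "Apple iPhone 4 16 GB",
    "Sony Bluetooth Headset",
    "Apple iPad",
]
common = too_common_terms(sample_rows)
```

With these sample rows, iphone lands in the "too common" set and headset does not, which would match the behaviour described in the question.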

Related

MATCH AGAINST in MySQL don't work

I have a problem with FULLTEXT search in MySQL.
I created this query:
SELECT searchTag, MATCH (searchTag) AGAINST ('after party') as score FROM post WHERE MATCH (searchTag) AGAINST ('after party') ORDER BY score DESC
Its result:
1. we,like,to,party 3.6987853050231934
2. f,w,g,party 3.6987853050231934
3. after,party,tooka 3.657205581665039
Why does number 3 have a lower score when it contains both search words?
after is a stop word. It is ignored by a FULLTEXT MATCH query.
Basically, the word "after" is so common in the English language that including it in a query is semantically meaningless.
Think of it this way: imagine a query against the word "a". There are so many sentences which use the word "a", that a match against them really won't provide you with anything useful.
In this post, all of the sentences reference the word "a".
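To see which words of a query actually take part in the match, you can filter the query against a stopword list before looking at the scores. This is a rough sketch: STOPWORDS_SUBSET is a tiny illustrative subset, not the real list (which depends on the storage engine and server configuration), and effective_terms is a hypothetical helper name.

```python
# Illustrative subset only; the real stopword list is much longer.
STOPWORDS_SUBSET = {"a", "after", "the"}


def effective_terms(query, stopwords=STOPWORDS_SUBSET):
    """Return the query words that are not stopwords, lowercased."""
    return [word for word in query.lower().split() if word not in stopwords]


terms = effective_terms("after party")
```

For 'after party' only party is left, so all three rows are matched on that single word.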

How to improve mysql NATURAL LANGUAGE MODE search query?

This is my query
SELECT * FROM myTable WHERE MATCH (name) AGAINST ("Apple M1" IN NATURAL LANGUAGE MODE)
If I search for Apple M1, I get Orange M1 in the results, and only in third position or lower do I get Apple M-1, which is the value I stored and which I assumed would come first!
My question is: is there a way to fine-tune the MySQL search?
The best way to improve MySQL Natural Language Mode search is to use Boolean Full-Text Searches instead. It will do the same as Natural Language Mode search, but you can use additional modifiers to fine-tune your results, e.g. by
> <
These two operators are used to change a word's contribution to the relevance value that is assigned to a row. The > operator increases the contribution and the < operator decreases it.
There is one minor difference: boolean mode search will not automatically order the results by relevance, so you have to order them yourself.
SELECT * FROM myTable
WHERE MATCH (name) AGAINST (">Apple M1" IN BOOLEAN MODE)
ORDER BY MATCH (name) AGAINST (">Apple M1" IN BOOLEAN MODE) desc
And a remark: neither version of full-text search will find M-1 if you match against M1 (even with a minimum word length setting of 2). It only looks for exact (usually case-insensitive) word matches; it does not look for similar words (unless you use *). It will "just" weigh the combination of (exact) words by some algorithm and, if you use them, the modifiers.
Update: some additional clarification in response to the comments:
If you match against Apple M1, it returns rows that contain (case-insensitive) Apple or M1 in any order, so e.g. M1 apple, Apple M4, Apple M-1 and Orange M1. It will not find Apples M4 or Orange M-1, because they do not contain exactly those words. Likewise, like '%M-1%' wouldn't find Apple M1 either. If you like, you can match against Apple* to find Apple and Apples, but the wildcard is always at the end of the word; *Apple* is not possible, and you would have to use like '%Apple%' for that.
These rows are then ordered by the scoring algorithm, that will basically score words that are less common in your texts higher than very common words. And if you add >Apple, it will give Apple a higher value. It will just be a number, you can add them to your select, e.g. select ..., MATCH (name) AGAINST (">Apple M1" IN BOOLEAN MODE) as score to get a feeling for that.
There are some other things to consider:
Only words that have a minimum length are added to the index. That length is given by innodb_ft_min_token_size for InnoDB or ft_min_word_len for MyISAM. So you should set it to e.g. 2 to include M1; otherwise, this word will not have any effect in your search. (Since you found Orange M1 in your example, I assume it is set correctly.)
The - character is usually treated as a hyphen, so M-1 in your text will be split up into two words, M and 1 (which may or may not be included depending on your minimum word length setting, so maybe set it to 1). You can change that behaviour by adding - to the character set (see Fine-Tuning MySQL Full-Text Search, the part beginning with Modify a character set file), but then a search for blue and/or green will no longer find blue-green.
The full-text search uses stopwords. These words are not included in your index. The list includes a and i, so even with a minimum word length of 1, you would not find them. You can edit that list.
Some ideas about your potential M1/M-1 problem. To adjust this to your exact requirements, you would have to add more information about your searches and data (and that would maybe be another question), but here are some ideas:
You can rewrite user input that contains - by including both versions in your search query: once with -, but enclosed in "", and once without. So if the user enters Apple M-1, you would create a search for Apple M1 "M-1" (that would work with or without a modified character set, but without a new character set, your minimum word length has to be 1). If the user enters M1, you should detect that and replace it with M1 "M-1" too.
Another alternative would be to store an additional column with clean, hyphenless words, add that column to the full-text index and then match (name, clean_name) against ("M1" ....
And you can of course combine like and match. E.g. if you detect a product number in your input, you can use something like where match(...) against(...) or product_id like 'M%1%', or where match(...) against(...) or product_id = 'M-1' or product_id = 'M1', or even where match(...) against(...) or name like '%M%1%', but the latter would probably be a lot slower and contain a lot of noise. And it might not score correctly, but at least the row will be in the result set.
But as I said, that would depend on your data and your requirements.
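The first idea above (searching for both the hyphenated and the hyphenless form) could be sketched like this. The assumption that a model number looks like letters followed by digits, with an optional hyphen in between, is mine and would have to be adapted to your data; expand_model_tokens is a hypothetical helper name.

```python
import re

# Assumed shapes of a "model number" token: M-1 (hyphenated) or M1 (joined).
HYPHENATED = re.compile(r"^([A-Za-z]+)-(\d+)$")
JOINED = re.compile(r"^([A-Za-z]+)(\d+)$")


def expand_model_tokens(user_input):
    """Rewrite M-1 or M1 as: M1 "M-1", leaving all other words untouched."""
    parts = []
    for token in user_input.split():
        match = HYPHENATED.match(token) or JOINED.match(token)
        if match:
            letters, digits = match.groups()
            parts.append(f"{letters}{digits}")      # hyphenless form
            parts.append(f'"{letters}-{digits}"')   # hyphenated form as a phrase
        else:
            parts.append(token)
    return " ".join(parts)


search_string = expand_model_tokens("Apple M-1")
```

The resulting string is meant to be passed to AGAINST (... IN BOOLEAN MODE) as a bound parameter rather than concatenated into the SQL text.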

use full text search to search incomplete words in mysql

I am making a library management system.
I have a problem in the search for a book from mysql database.
For searching data in MySQL we use full-text search.
But it only works if a full word is given. If the user enters an incomplete word instead of the whole word, is there any function to search for it?
Example: if the book name is calculus
and the user types calc, the book should still be found.
You can try using fulltext search with boolean mode, which allows a few extra operators. You will be interested in the truncation operator (*):
The asterisk serves as the truncation (or wildcard) operator. Unlike
the other operators, it is appended to the word to be affected. Words
match if they begin with the word preceding the * operator.
If a word is specified with the truncation operator, it is not
stripped from a boolean query, even if it is too short or a stopword.
Whether a word is too short is determined from the
innodb_ft_min_token_size setting for InnoDB tables, or ft_min_word_len
for MyISAM tables. These options are not applicable to FULLTEXT
indexes that use the ngram parser.
The wildcarded word is considered as a prefix that must be present at
the start of one or more words. If the minimum word length is 4, a
search for '+word +the*' could return fewer rows than a search for
'+word +the', because the second query ignores the too-short search
term the.
Please note that you cannot start an expression with the * operator, so the results cannot include a book whose title merely contains 'calc' inside a word, only books whose title has a word that starts with 'calc'.
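A small sketch of how user input could be turned into such a boolean-mode prefix query. It strips the boolean operator characters from the input first so that users cannot inject operators of their own; the helper name to_prefix_query is hypothetical, and requiring every word with + is a design choice for this example, not something the question asks for.

```python
import re

# Characters with a special meaning in BOOLEAN MODE searches.
BOOLEAN_OPERATORS = re.compile(r'[+\-<>()~*"@]')


def to_prefix_query(user_input):
    """Turn 'intro calc' into '+intro* +calc*' for use IN BOOLEAN MODE."""
    words = BOOLEAN_OPERATORS.sub(" ", user_input).split()
    return " ".join(f"+{word}*" for word in words)


query = to_prefix_query("calc")
```

The string would then be bound to a statement along the lines of ... WHERE MATCH(title) AGAINST (? IN BOOLEAN MODE), where title stands in for whatever column carries the FULLTEXT index.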
You can use the LIKE operator with the "%" wildcard
With LIKE you can use the following two wildcard characters in the pattern:
% matches any number of characters, even zero characters.
_ matches exactly one character.
for example
SELECT * FROM <Table> where book like "%calc%";
http://dev.mysql.com/doc/refman/5.7/en/string-comparison-functions.html
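If the search term comes from the user, it is worth escaping the wildcard characters before wrapping the term in %, otherwise a % or _ typed by the user acts as a wildcard. A minimal sketch, assuming the default backslash escape character of LIKE; like_contains_pattern is a hypothetical helper name.

```python
def like_contains_pattern(term):
    """Build a LIKE pattern that matches `term` anywhere in the value."""
    escaped = (
        term.replace("\\", "\\\\")  # escape the escape character first
        .replace("%", "\\%")
        .replace("_", "\\_")
    )
    return f"%{escaped}%"


pattern = like_contains_pattern("calc")
```

The pattern should be passed as a bound parameter (book LIKE ?), not concatenated into the statement.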

Improving Mysql Match Against search

I've been looking into MySQL's MATCH AGAINST search. The results are strange. For example, if I have a table attribute with an entry "education" and do a search (using MATCH AGAINST) for "edu", it finds it. But if I search for "educ", no results are returned, and the same goes all the way up to "educatio". So it only matches whole words, or if 3 letters or fewer of a word match.
Is there a way to improve it so that results are returned when a search term is a subset of a word in the attribute? E.g. using the example above, searching "educat" would return rows containing "Education"
You can do exactly what you want by matching IN BOOLEAN MODE and using the * operator.
For example:
... MATCH(thing) AGAINST ('+educat*' IN BOOLEAN MODE)...
The + tells the match to include only the rows whose thing value contains the match term, which in this case means any indexed word beginning with "educat" (see the MySQL documentation on Boolean full-text searches for how Boolean mode works in detail).
As an aside, full-text search on MyISAM tables does not index words of 3 or fewer characters by default (ft_min_word_len defaults to 4), so I suspect your match with "edu" is not working the way you think. Look at the value of your ft_min_word_len variable to see if that's the case.
You can use LIKE with the % wildcard. A pattern of the form a% (where a is your word or letter) finds values that start with that word or letter.
You can use %a% to find values that contain it anywhere: at the start, in the middle or at the end.
And finally, you can use %a to find values that end with that word or letter.

fuzzy matching an address using mysql's match against (if possible using weights for better results ranking)

I have a MyISAM table with a FULLTEXT index and am trying to do
SELECT
lk.id,
lk.address
FROM
lk
WHERE MATCH
(lk.address)
AGAINST('235 regent street, london w1b 2et');
I get results, but only rows that contain the word "london" or the word "street". I know that 3-character words aren't indexed because of ft_min_word_len, so "235", "w1b" and "2et" are ignored, but what about "regent"?
What is the STANDARD way of doing this, i.e. fuzzy matching an address?
thanks
The answer is to use MATCH ... AGAINST('...' IN BOOLEAN MODE), and to add + in front of every word.
Or use the other operators explained in:
http://dev.mysql.com/doc/refman/5.1/en/fulltext-boolean.html
It needs fine tuning depending on your searched text, and how you got it.
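As a rough sketch of the "+ in front of every word" suggestion, the address can be split into words and the ones shorter than the minimum word length reported separately, since they would not contribute to the match anyway. The default of 4 mirrors the MyISAM default for ft_min_word_len; the helper name required_terms_query is hypothetical.

```python
import re


def required_terms_query(text, min_word_len=4):
    """Return ('+word +word ...', skipped_words) for a BOOLEAN MODE search.

    Words shorter than min_word_len are left out of the query and
    returned in the second element so the caller can see what was dropped.
    """
    words = re.findall(r"[A-Za-z0-9]+", text)
    kept = [word for word in words if len(word) >= min_word_len]
    skipped = [word for word in words if len(word) < min_word_len]
    return " ".join(f"+{word}" for word in kept), skipped


query, skipped = required_terms_query("235 regent street, london w1b 2et")
```

Note that requiring every word makes the match stricter rather than fuzzier, so depending on the data it may be better to mark only some of the words with +.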