I'm trying to use a FULLTEXT index in order to facilitate searching for forum posts. It's not working in the way I expect, and I'm trying to understand why not.
For example, I know there is exactly one post which contains the phrase "haha and i got three", so I perform the query
select * from forum_posts where
match(message) against ('"haha and i got three"' in boolean mode);
and as I expect, I find the single post which includes this phrase. Hooray!
But then I perform the related query:
select * from forum_posts where
match(message) against ('"and i got three"' in boolean mode);
and get no results. In fact, simply searching for the word "three":
select * from forum_posts where
match(message) against ('three' in boolean mode);
yields no results either.
What could be going on?
I think you need to learn about stop words and minimum word length.
By default, MySQL ignores stopwords in the full-text index (the default stopword list is in the MySQL manual). "And I got three" consists entirely of stopwords.
In addition, by default, MySQL ignores words with fewer than four characters. This is controlled by the ft_min_word_len parameter (for MyISAM tables) and is explained in more detail in the MySQL manual.
It sounds like you will want to change the stopword list and the minimum word length, and then rebuild the index.
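The Python sketch below shows which words a MyISAM-style FULLTEXT index would keep, which explains why the searches behave this way. It is a simplified model, not MySQL's tokenizer. STOPWORDS is a small illustrative subset of MyISAM's default stopword list, MIN_WORD_LEN mirrors the default ft_min_word_len of 4, and indexed_words is a hypothetical helper.

```python
import re

# Illustrative subset of MyISAM's default stopword list (not the full list).
STOPWORDS = {"and", "i", "got", "three", "the", "on", "have", "some"}
# Mirrors the MyISAM default for ft_min_word_len.
MIN_WORD_LEN = 4


def indexed_words(text, stopwords=STOPWORDS, min_len=MIN_WORD_LEN):
    """Return the words a simplified MyISAM-style FULLTEXT index would keep."""
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if w not in stopwords and len(w) >= min_len]


if __name__ == "__main__":
    print(indexed_words("haha and i got three"))  # ['haha']
    print(indexed_words("three"))                 # []
    # An empty stopword list and a minimum length of 1 keep every word.
    print(indexed_words("and i got three", stopwords=set(), min_len=1))
```

Nothing in "and i got three" survives the filter, so the index holds nothing for either search to match. The index is built from the filtered words, which is why changing either setting only takes effect after a rebuild.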
I am working on an adult site, and for this site I have created an internal search engine.
For search I use this query:
SELECT SQL_CALC_FOUND_ROWS
id_photo, title, description, model, data_ins,
MATCH(title, description, model) AGAINST('".trim(strtolower(addslashes($_GET['q'])))."') as score
FROM ".$prefix."photo
WHERE MATCH(title, description, model) AGAINST('".trim(strtolower(addslashes($_GET['q'])))."')
ORDER BY score DESC LIMIT ".$start.", ".$step."
Everything works smoothly, without PHP or MySQL errors, but the client pointed out something strange to me.
For example:
searching for a word starting with "c" and ending with "ck" returns no results;
searching for a word starting with "d" and ending with "ck" returns the correct results.
I use something similar to this to verify if there are results:
$photo_query_id = $db->prepare("my query");
$photo_query_id->execute();
if($photo_query_id->rowCount() < 1){
//...
}
Both words are used hundreds of times in both titles and descriptions, so why does MySQL sometimes return no results?
Is there a list of bad words in some MySQL config file that is blocking queries? If so, where do I find it and how do I modify it?
Use a BOOLEAN MODE search or use the InnoDB database engine for your table. When you do a natural language search against a MyISAM full-text index, words that appear in more than 50% of the rows are treated as stopwords.
From the documentation:
The 50% threshold can surprise you when you first try full-text searching to see how it works, and makes InnoDB tables more suited to experimentation with full-text searches. If you create a MyISAM table and insert only one or two rows of text into it, every word in the text occurs in at least 50% of the rows. As a result, no search returns any results until the table contains more rows. Users who need to bypass the 50% limitation can build search indexes on InnoDB tables, or use the boolean search mode explained in Section 12.10.2, “Boolean Full-Text Searches”.
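The 50% rule can be illustrated with a small Python model. natural_language_search and boolean_search are hypothetical helpers. The model only checks whole-word presence and ignores MySQL's actual relevance weighting. It treats a word found in at least half the rows as common, following the one-or-two-rows example in the quoted documentation; the exact boundary is an assumption of this sketch.

```python
def rows_containing(rows, word):
    """Indexes of rows containing the word as a whole word (case-insensitive)."""
    word = word.lower()
    return [i for i, row in enumerate(rows) if word in row.lower().split()]


def natural_language_search(rows, word):
    """Simplified MyISAM natural-language mode: a word that occurs in at
    least half of the rows is treated as common and matches nothing."""
    hits = rows_containing(rows, word)
    if 2 * len(hits) >= len(rows):
        return []
    return hits


def boolean_search(rows, word):
    """Boolean mode applies no 50% threshold."""
    return rows_containing(rows, word)


rows = ["red car", "red bike", "blue boat"]
print(natural_language_search(rows, "red"))   # [] because 2 of 3 rows contain it
print(boolean_search(rows, "red"))            # [0, 1]
print(natural_language_search(rows, "blue"))  # [2]
```

This is why a word used in hundreds of titles can vanish from natural-language results while boolean mode still finds it.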
Search term: ['ISBN number on site']
The column sentence in my MySQL table contains many different sentences.
The sentence I want to find is:
"The AutoLink feature comes with Google's latest toolbar and provides links in a webpage to Amazon.com if it finds a book's ISBN number on the site."
However, when I use the following statement:
SELECT * FROM testtable
where Sentence like "%ISBN number on site%" ;
I do not get the result, because the search term ("ISBN number on site") is missing one word ("the") compared with the sentence.
How can I change my statement to get the sentence I want? Thanks.
Assume that we do not change the search term ('ISBN number on site').
This is not a simple question. Your best bet is to use some type of full-text search. Full-text search can be configured with stopwords (words that are omitted from the search, like the word "the") and with a minimum word length (words shorter than a certain number of characters are also omitted from the search).
However, if you simply use
SELECT * FROM testtable
WHERE MATCH (sentence)
AGAINST ('ISBN number on site');
Then MySQL will return not just the record with the value you are looking for, but also records that contain only some of the words, or the words in a different order. The one you showed will probably be among the highest-ranked results, but there is no guarantee that it will be the highest-ranked one.
You may want to use a Boolean full-text search and prepend + to every search word. This forces MySQL to return only the records that contain all the search words:
SELECT * FROM testtable
WHERE MATCH (sentence)
AGAINST ('+ISBN +number +on +site' IN BOOLEAN MODE);
However, "on" is either a stopword (it is on the default stopword lists) or shorter than the minimum word length, so it should be omitted from the search expression (otherwise you will not get back any results):
SELECT * FROM testtable
WHERE MATCH (sentence)
AGAINST ('+ISBN +number +site' IN BOOLEAN MODE);
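If the application builds the boolean expression itself, it can drop the words the index cannot contain before adding the + prefixes. That avoids the empty-result problem. Below is a minimal Python sketch. It assumes a small illustrative STOPWORDS subset and the MyISAM default minimum length of 4; build_boolean_query is a hypothetical helper. The resulting string should still be passed to MySQL as a bound query parameter rather than concatenated into the SQL.

```python
import re

# Illustrative subset of the default stopword list (assumption, not the full list).
STOPWORDS = {"the", "on", "a", "of", "in", "and"}
MIN_WORD_LEN = 4  # MyISAM default ft_min_word_len


def build_boolean_query(search_term):
    """Keep only indexable words and require each one with a + prefix."""
    words = re.findall(r"\w+", search_term)
    kept = [w for w in words
            if w.lower() not in STOPWORDS and len(w) >= MIN_WORD_LEN]
    return " ".join("+" + w for w in kept)


print(build_boolean_query("ISBN number on site"))  # +ISBN +number +site
```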
I know that this requires altering the search expression; however, it will get you the best results using MySQL's built-in functionality.
The alternative is to use another full-text search engine, such as Sphinx, to perform the search for you.
Try:
SELECT * FROM testtable where Sentence like '%ISBN number on%site%' ;
The wildcard can go in the middle of a string too.
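This works because % matches any run of characters, including " the ", wherever it appears in the pattern. The Python sketch below translates a LIKE pattern into a regular expression to show that matching rule. like_matches is a hypothetical helper and ignores LIKE escape sequences. It matches case-insensitively, on the assumption that the column uses a case-insensitive collation (MySQL's default).

```python
import re


def like_matches(value, pattern):
    """Approximate SQL LIKE: % is any sequence, _ is any single character.
    Escape sequences are not handled in this sketch."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, re.DOTALL | re.IGNORECASE) is not None


sentence = ("The AutoLink feature comes with Google's latest toolbar and provides "
            "links in a webpage to Amazon.com if it finds a book's ISBN number on the site.")
print(like_matches(sentence, "%ISBN number on site%"))   # False
print(like_matches(sentence, "%ISBN number on%site%"))   # True
```

The catch is that '%ISBN number on%site%' only works because the missing word sits where the wildcard was placed. It is not a general fix for arbitrary missing words.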
I use this query for a full-text search on my table:
SELECT Titel FROM cmsa WHERE MATCH(Titel) AGAINST ('+"Ort" +"Berlin"' IN BOOLEAN MODE)
but the result is empty.
If I use
SELECT Titel FROM cmsa WHERE Titel LIKE '%Berlin%'
the result would be (without quotes):
"Ort - Berlin"
Why didn't the full-text search find this result? The words "Ort" and "Berlin" are both in the Titel field of the entry.
Other full-text searches work great.
Any ideas?
I guess it is because of a MySQL server parameter, ft_min_word_len: the minimum length of a word to be included in a FULLTEXT index (for MyISAM tables). Its default value is 4, so your first word, Ort, is not included in the index. You should change this system variable, restart the server, and then rebuild all FULLTEXT indexes, for example:
REPAIR TABLE cmsa QUICK;
See also the related question "Change the full text index minimum word length with MySQL".
Try without double quotes, and make sure the table's MySQL engine is MyISAM:
SELECT
Titel
FROM
cmsa
WHERE
MATCH(Titel) AGAINST ('+Ort +Berlin' IN BOOLEAN MODE)
Some explanation
Boolean Mode Searches
SELECT headline, story FROM news
WHERE MATCH (headline,story)
AGAINST ('+Hurricane -Katrina' IN BOOLEAN MODE);
The above statement would match news stories about hurricanes but not those that mention Hurricane Katrina.
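The +/- semantics can be modelled in a few lines of Python. boolean_match is a hypothetical helper. It covers only the +, - and no-operator cases (not phrases, wildcards or grouping) and compares whole words case-insensitively. As in MySQL, "hurricanes" would not match "hurricane" without the * truncation operator.

```python
import re


def words(text):
    return set(re.findall(r"\w+", text.lower()))


def boolean_match(text, expr):
    """Tiny subset of boolean mode: +word required, -word excluded,
    bare words optional."""
    present = words(text)
    required, excluded, optional = [], [], []
    for term in expr.lower().split():
        if term.startswith("+"):
            required.append(term[1:])
        elif term.startswith("-"):
            excluded.append(term[1:])
        else:
            optional.append(term)
    if any(w in present for w in excluded):
        return False
    if required:
        return all(w in present for w in required)
    # With no +terms a row needs at least one optional word;
    # an expression made only of -terms matches nothing.
    return any(w in present for w in optional)


stories = ["Hurricane season starts early", "Hurricane Katrina hits the coast", "Local elections held"]
print([s for s in stories if boolean_match(s, "+Hurricane -Katrina")])
# ['Hurricane season starts early']
```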
Query Expansion
The blind query expansion (or automatic relevance feedback) feature can be used to widen the results of a search. It often adds much more noise and makes for a very fuzzy search.
In most cases you would use it when the user's query returned just a few results. You run the query again WITH QUERY EXPANSION, and MySQL adds words that are commonly found together with the words in the query.
SELECT headline, story FROM news
WHERE MATCH (headline,story)
AGAINST ('Katrina' WITH QUERY EXPANSION);
The above query might return all news stories about hurricanes, not just ones containing Katrina.
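The MySQL manual describes query expansion as running the search twice. The second search phrase is the original phrase concatenated with the few most relevant documents from the first search. The Python sketch below models that idea with a crude word-overlap score instead of MySQL's relevance ranking. search and expanded_search are hypothetical helpers, and top_n stands in for "the few most relevant documents".

```python
import re


def words(text):
    return set(re.findall(r"\w+", text.lower()))


def search(docs, query):
    """Rank docs by how many query words they share; drop docs sharing none."""
    q = words(query)
    scored = [(len(q & words(d)), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [d for score, d in scored if score > 0]


def expanded_search(docs, query, top_n=1):
    """Blind query expansion: append the top results to the query, search again."""
    first = search(docs, query)
    second_query = " ".join([query] + first[:top_n])
    return search(docs, second_query)


news = ["Katrina hurricane hits coast", "Hurricane season forecast", "Local bakery opens"]
print(search(news, "Katrina"))           # ['Katrina hurricane hits coast']
print(expanded_search(news, "Katrina"))  # also returns 'Hurricane season forecast'
```

Every word of the top documents joins the second query, including unrelated ones like "hits" and "coast", which is where the extra noise comes from.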
A couple of points about full-text searching in MySQL:
Searches are not case sensitive
Short words are ignored; the default minimum length is 4 characters. You can change the minimum and maximum word lengths with the variables ft_min_word_len and ft_max_word_len.
Words called stopwords are ignored. You can specify your own stopwords, but the defaults include the, have and some (see the default stopwords list).
You can disable stopwords by setting the variable ft_stopword_file to an empty string.
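Before MySQL 5.6, full-text searching was supported only by the MyISAM storage engine; from MySQL 5.6 onward, InnoDB tables support FULLTEXT indexes too.
If a word is present in more than 50% of the rows, it has a weight of zero (in natural-language searches on MyISAM tables). This has advantages on large datasets but can make testing difficult on small ones.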
Your query works perfectly for me. Please check whether your table is MyISAM, because full-text search works only with the MyISAM engine in MySQL versions before 5.6.
I would like to use the position/index found by the MATCH ... AGAINST full-text search in MySQL to return some text before and after the match in the field. Is this possible? In all the examples I have seen, MATCH ... AGAINST returns a score in the select instead of a location or position in the text field being searched.
SELECT
random_field,
MATCH ($search_fields)
AGAINST ('".mysql_real_escape_string(trim($keywords))."' IN BOOLEAN MODE)
AS score
FROM indexed_sites
WHERE
MATCH ($search_fields)
AGAINST ('".mysql_real_escape_string($keywords)."' IN BOOLEAN MODE)
ORDER BY score DESC;
This gives me a field and a score, but I would like an index/position instead of (or alongside) a score.
Fulltext searching is a scoring function; it's not a search-for-occurrence function. In other words, the highest-scoring result may not have a single starting position for the match, because its score may combine weighted results from different matches within the text. If you include query expansion, the searched-for word(s) may not even appear in the result!
http://dev.mysql.com/doc/refman/5.0/en/fulltext-query-expansion.html
I hope that makes some sense.
Anyway, your best bet is to take the results and then use a text-searching function to find the first occurrence of the first matching word. My guess is that this would be best suited to a text-processing language like Perl, or a more general language like PHP or whatever language you are using to run the query.
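Here is a minimal Python sketch of that post-processing step. Given a row returned by the fulltext query and the search words, it finds the first whole-word occurrence and returns its position together with some surrounding text. first_match_context is a hypothetical helper. Operators such as + or * would have to be stripped from the keywords first, and the context width is arbitrary.

```python
import re


def first_match_context(text, keywords, context=30):
    """Return (position, snippet) for the first whole-word, case-insensitive
    occurrence of any keyword, or None if none occurs."""
    if not keywords:
        return None
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    match = re.search(pattern, text, re.IGNORECASE)
    if match is None:
        return None
    start = max(0, match.start() - context)
    end = min(len(text), match.end() + context)
    return match.start(), text[start:end]


text = "The quick brown fox jumps over the lazy dog"
print(first_match_context(text, ["fox", "dog"], context=6))  # (16, 'brown fox jumps')
```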
DC
With MATCH ... AGAINST I am getting the correct results, so there is no problem there; what I want is to control the order of the results.
For example, for "computer graphics" I search with "+computer +graphics" and get results for "computer" alone, "computer graphics", "graphics", and so on.
I want the "computer graphics" results first, then the other single-word matches. How can I bring those to the top? Please help.
You should order by relevance. Put the same MATCH ... AGAINST expression you use in WHERE into the select list, alias it as Relevance, and then order by that field.
-- col1, col2 and your_table are placeholder names
SELECT *, MATCH(col1, col2) AGAINST ('...' IN BOOLEAN MODE) AS Relevance
FROM your_table
WHERE MATCH(col1, col2) AGAINST ('...' IN BOOLEAN MODE)
ORDER BY Relevance DESC
This can't be done in MySQL fulltext search without a bit of hoop jumping.
You basically need to run the search twice to get the results you want. First, run a boolean full-text search using double quotes to enclose the exact phrase being searched for; double quotes in boolean mode return exact phrase matches only. Once you have those results, run your normal natural-language search. It's the normal natural-language search that is giving you trouble with partial matches. You'll need to combine the two result sets manually.
While MySQL fulltext is decent for simple searching needs, it's not a great search solution. Consider something more powerful, like Sphinx, Solr / Lucene, or even Elasticsearch.
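The manual merge step can be sketched in Python: phrase matches first, then the natural-language matches that are not already listed. merge_phrase_first is a hypothetical helper. It assumes each result can be identified by a key, such as the row's primary key; the default key here is the row itself.

```python
def merge_phrase_first(phrase_hits, natural_hits, key=lambda row: row):
    """Exact-phrase results first, then the remaining results, no duplicates."""
    seen = set()
    merged = []
    for row in list(phrase_hits) + list(natural_hits):
        k = key(row)
        if k not in seen:
            seen.add(k)
            merged.append(row)
    return merged


phrase = ["what is computer graphics?"]
natural = ["what is graphics on computer?", "what is computer graphics?", "what is computer?"]
print(merge_phrase_first(phrase, natural))
# ['what is computer graphics?', 'what is graphics on computer?', 'what is computer?']
```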
Assuming we're talking about a full-text index:
... ORDER BY MATCH(some, columns) AGAINST ('computer graphics') DESC;
Entries in the table:
what is computer?
what is graphics on computer?
what is computer graphics?
what is graphics?
QUERY : select *,MATCH(field1,field2) AGAINST ("+computer +graphics" IN BOOLEAN MODE) as results from $table where MATCH(field1,field2) AGAINST ("+computer +graphics" IN BOOLEAN MODE) ORDER BY results ASC
It returns the exact match somewhere in the middle, with other results first, like this:
what is computer graphics?
what is graphics on computer?
what is computer?
what is graphics?
How can this be corrected?