MySQL Match Fulltext - mysql

I'm trying to do a fulltext search with MySQL to match a string. The problem is that it returns odd results.
For example, the string 'passat 2.0 tdi':
AND MATCH (
records_veiculos.titulo, records_veiculos.descricao
)
AGAINST (
'passat 2.0 tdi' WITH QUERY EXPANSION
)
is returning this as the first result (the others are fine) :
Volkswagen Passat Variant 1.9 TDI- ANO 2003
which is incorrect, since there's no "2.0" in this example.
What could it be?
edit: Also, since this will probably be a large database (expecting up to 500.000 records), is this search method the best choice, or would it be better to install another search engine such as Sphinx? If not, how can I get it to show relevant results?
edit2: For the record, despite the question being marked as answered, the problem with the MySQL delimiters persists, so if anyone has a suggestion on how to escape delimiters, it would be appreciated and worth the 500 points at stake. The solution I found to increase the result set was to replace WITH QUERY EXPANSION with IN BOOLEAN MODE, using operators to force the engine to match the words I needed, like:
AND MATCH (
records_veiculos.titulo, records_veiculos.descricao
)
AGAINST (
'+passat +2.0 +tdi' IN BOOLEAN MODE
)
It didn't solve the problem entirely, but at least the relevance of the results has changed significantly.

From the MySQL documentation on Fulltext search:
"The FULLTEXT parser determines where words start and end by looking for certain delimiter characters; for example, “ ” (space), “,” (comma), and “.” (period)."
This means that the period is delimiting the 2 and the 0. So it's not looking for '2.0'; it's looking for '2' and '0', and not finding them. WITH QUERY EXPANSION is probably causing related words to show up, so '2' and '0' no longer need to appear as individual words for a row to rank highly. A minimum word length may also be enforced, in which case one-character tokens such as '2' and '0' are dropped altogether.
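To make the splitting concrete, here is a minimal Python sketch of a simplified tokenizer. It is not the real FULLTEXT parser: it assumes that letters, digits, underscores and apostrophes are word characters, that everything else is a delimiter, and that the minimum word length is 4 (the ft_min_word_len default mentioned elsewhere in this thread). The name tokenize is made up for illustration.

```python
import re

# Assumed minimum word length (the ft_min_word_len default of 4).
MIN_WORD_LEN = 4

def tokenize(text, min_len=MIN_WORD_LEN):
    """Rough stand-in for the FULLTEXT parser: split on delimiters, drop short words."""
    # Anything that is not a letter, digit, underscore or apostrophe ends a word.
    words = re.split(r"[^0-9A-Za-z_']+", text.lower())
    return [w for w in words if len(w) >= min_len]

# With no length filter, the period splits "2.0" into two one-character tokens.
print(tokenize("passat 2.0 tdi", min_len=1))   # ['passat', '2', '0', 'tdi']

# With the assumed default, only "passat" is left to search for.
print(tokenize("passat 2.0 tdi"))              # ['passat']
```

Under these assumptions the search effectively becomes a search for "passat", which would explain why a 1.9 TDI row can rank first.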

By default, I believe MySQL only indexes and matches words with 4 or more characters. You could also try escaping the period. MySQL might be ignoring it or otherwise using it as a stop character.

What is the match rank that it returns for that row? Does the match have to contain all the "words"? My understanding was that it works like Google and only needs to match some of the words.
Having said that, bear in mind the effect of adding WITH QUERY EXPANSION: it automatically runs a second search for "related" words, which may not be what you typed but which the fulltext engine deems probably related.
Relevant Documentation: http://dev.mysql.com/doc/refman/5.1/en/fulltext-query-expansion.html

The "." is what's matching on 2003 in your query results.
If you're going to search on 3-character strings, you should set ft_min_word_len=3 in your MySQL config and restart MySQL. Otherwise, a search for "tdi" will return results with "TDI-" but not with just "TDI", because rows with "TDI-" will be indexed but "TDI" alone will not.
After making that config change, you'll have to rebuild your index on that table. (Warning: your index might be significantly larger now.)

Related

MySQL FULLTEXT decimal point treated as word separator

We sell lipo batteries that are 3.7v, 7.4v, 11.1v and the voltage is in a description field. It should be possible to FULLTEXT index that character based field with an FT_MIN_WORD_LEN of 4 and have it contain the tokens "3.7v" etc. and these to be found when searching. All my experiments show that when searching these tokens are missing from the index and I suspect this is because the decimal point is acting as a token separator and no tokens are long enough to meet min length.
What am I doing wrong? Why won't Match Against 3.7v find my entries? Does MySQL FULLTEXT understand the difference between a full stop and a decimal point?
Even if FULLTEXT were smart enough to recognize those two uses of ".", what about the 5 other uses? And what about other punctuation marks? When should "_" be part of a "word" and when not? Etc, etc.
Here's a suggestion for your situation (and many others).
Cleanse the data.
Put it in the table.
Similarly, cleanse the query to be fed into the AGAINST clause.
By "cleanse", I mean do any of several things to modify the data to work adequately with FULLTEXT's limitations.
In your one example, I suggest changing 3.7v or 3.7 v to 3_7v.
You may find that some "words" are shorter than min_word_length; for them, you could pad them or do some other kludge.
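A minimal Python sketch of the "cleanse" step, assuming it runs in application code both before rows are inserted and before the search string is passed to AGAINST. The function name cleanse and the exact rules are illustrative only; real data will likely need more cases.

```python
import re

def cleanse(text):
    """Rewrite voltages such as '3.7v' or '3.7 v' as '3_7v' so they index as one word."""
    # Join "3.7 v" into "3.7v" so both spellings end up identical.
    text = re.sub(r"(\d+\.\d+)\s+v\b", r"\1v", text, flags=re.IGNORECASE)
    # Replace a decimal point that sits between two digits with an underscore.
    return re.sub(r"(?<=\d)\.(?=\d)", "_", text)

print(cleanse("3.7v"))               # 3_7v
print(cleanse("3.7 V lipo pack."))   # 3_7v lipo pack.
```

The same function has to be applied to the stored text and to the query; otherwise a search for "3.7v" would still be split at the period while the table holds "3_7v".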
I recommend you use InnoDB, not MyISAM for all MySQL work. (And note that the setting there is innodb_ft_min_token_size, and it defaults to "3".)
I found a solution here...
https://dev.mysql.com/doc/refman/8.0/en/full-text-adding-collation.html
MySql documentation 12.9.7
Basically, there are XML files that control the behaviour of character sets, and I was able to change the behaviour of the "." character from punctuation to regular character. Given that the column contains part numbers, I reclassified most of the punctuation characters as regular characters, created a new collation with those settings, and used it for my part number column. It now works as required.

Mysql Fulltext search returns empty result while there are 100+ rows

The query I use to get rows:
SELECT * FROM `sentence` WHERE MATCH(text) AGAINST('hello')
MySQL returns an empty result when I run this query.
However, if I use the LIKE keyword to get the rows
SELECT * FROM `sentence` WHERE text LIKE '%hello%'
then MySQL returns 144 rows.
Stranger still, MySQL fulltext works properly for some words.
For example, when I search for the word 'killer', LIKE returns 44 rows and fulltext returns 20, which is okay for me.
This situation doesn't make sense to me. Please explain it and suggest a solution.
To answer your first case, i.e. hello returning nothing: hello is on the fulltext stopword list, so a search for that word matches nothing.
http://dev.mysql.com/doc/refman/5.6/en/fulltext-stopwords.html
You can set up your own stopword file: in the MySQL config file (on Debian, /etc/mysql/my.cnf), add ft_stopword_file='path/to/stopword_file.txt'
The file can be left blank if needed.
In the second case, killer returns fewer rows than expected, and this could be for various reasons:
whitespace in the word
a case-sensitivity issue
Also, fulltext search only matches complete words: AGAINST('killer') does no pattern matching, whereas LIKE '%killer%' matches anything that contains the string killer.
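The difference in row counts can be reproduced outside the database. The Python sketch below uses made-up sample rows and two hypothetical helpers: like_match imitates LIKE '%term%' as a case-insensitive substring test, and word_match is a rough stand-in for whole-word matching. It is not how MySQL computes either result.

```python
import re

# Made-up sample data for illustration.
rows = [
    "the killer whale",
    "painkillers on sale",
    "Killer app",
    "serial-killer novel",
]

def like_match(rows, term):
    """Imitates LIKE '%term%': the term may appear anywhere, even inside a longer word."""
    return [r for r in rows if term.lower() in r.lower()]

def word_match(rows, term):
    """Rough stand-in for fulltext matching: the term must be a complete word."""
    return [r for r in rows if term.lower() in re.findall(r"\w+", r.lower())]

print(len(like_match(rows, "killer")))   # 4
print(len(word_match(rows, "killer")))   # 3
```

"painkillers" contains the string killer but is a different word, so only the substring test counts it.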
To replicate the issue and see the cause of the problem, it would be helpful if you created a fiddle. But for the first case, the stopword list is why you are getting 0 rows.
Here is something similar which I faced before
mysql ft_min_word_len change on ubuntu does not work

MySQL fulltextsearch: no results

I use this query for a fulltext search in my table:
SELECT Titel FROM cmsa WHERE MATCH(Titel) AGAINST ('+"Ort" +"Berlin"' IN BOOLEAN MODE)
but the result is empty.
If I use
SELECT Titel FROM cmsa WHERE Titel LIKE '%Berlin%'
the result would be (without quotes):
"Ort - Berlin"
Why didn't the fulltext search find this result? The word "Ort" and the word "Berlin" are both in the field Titel of the entry.
Other fulltext searches work fine.
Any idea?
I guess it is because MySQL has a server parameter, ft_min_word_len, that sets the minimum length of a word to be included in a FULLTEXT index. Its default value is 4, so your first word, Ort, is not included in the index. You should change this system parameter, restart the server, and then rebuild all FULLTEXT indexes:
REPAIR TABLE cmsa QUICK;
Change the full text index minimum word length with MySQL
Try without double quotes and make sure the MySQL storage engine is MyISAM:
SELECT
Titel
FROM
cmsa
WHERE
MATCH(Titel) AGAINST ('+Ort +Berlin' IN BOOLEAN MODE)
Some explanation
Boolean Mode Searches
SELECT headline, story FROM news
WHERE MATCH (headline,story)
AGAINST ('+Hurricane -Katrina' IN BOOLEAN MODE);
The above statement would match news stories that mention Hurricane but exclude any that also mention Katrina.
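Expressions such as '+Hurricane -Katrina' or '+Ort +Berlin' can be assembled from user input. A minimal Python sketch, assuming the application builds the string and passes it to AGAINST as a bound parameter; the names words and boolean_query are made up, and the %s placeholder style is an assumption about the database driver in use.

```python
import re

def words(term):
    """Split a term into words, dropping characters that boolean mode treats as operators."""
    return re.sub(r'[+\-<>()~*"@]', " ", term).split()

def boolean_query(required=(), excluded=()):
    """Build a boolean-mode expression: +word must be present, -word must be absent."""
    parts = ["+" + w for t in required for w in words(t)]
    parts += ["-" + w for t in excluded for w in words(t)]
    return " ".join(parts)

# The expression is meant to be bound as a parameter, not pasted into the SQL text.
sql = ("SELECT headline, story FROM news "
       "WHERE MATCH (headline, story) AGAINST (%s IN BOOLEAN MODE)")

print(boolean_query(["Hurricane"], ["Katrina"]))   # +Hurricane -Katrina
print(boolean_query(["Ort", "Berlin"]))            # +Ort +Berlin
```

Stripping the operator characters keeps stray input such as a leading "-" from changing the meaning of the search. It does not help with words shorter than the minimum word length, which are still not indexed.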
Query Expansion
The Blind Query Expansion (or automatic relevance feedback) feature can be used to expand the results of the search. This often includes much more noise, and makes for a very fuzzy search.
In most cases you would use this when the user's query returned just a few results: you run it again WITH QUERY EXPANSION, and it adds words that are commonly found alongside the words in the query.
SELECT headline, story FROM news
WHERE MATCH (headline,story)
AGAINST ('Katrina' WITH QUERY EXPANSION);
The above query might return all news stories about hurricanes, not just ones containing Katrina.
A few points about full-text searching in MySQL:
Searches are not case sensitive.
Short words are ignored; the default minimum length is 4 characters. You can change the minimum and maximum word length with the variables ft_min_word_len and ft_max_word_len.
Words called stopwords are ignored. You can specify your own stopwords, but the defaults include the, have, some (see the default stopwords list).
You can disable stopwords by setting the variable ft_stopword_file to an empty string.
Before MySQL 5.6, full-text searching is only supported by the MyISAM storage engine; InnoDB has supported FULLTEXT indexes since 5.6.
If a word is present in more than 50% of the rows, it will have a weight of zero. This has advantages on large datasets, but can make testing difficult on small ones.
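The 50% rule is easy to trip over with a handful of test rows. The Python sketch below counts, for made-up sample rows, which words appear in more than half of them. It only illustrates the threshold and does not reproduce MySQL's actual weighting; over_threshold is a hypothetical helper.

```python
import re

def over_threshold(rows, threshold=0.5):
    """Return the words that appear in more than `threshold` of the rows."""
    doc_freq = {}
    for row in rows:
        # Count each word once per row, however often it repeats inside the row.
        for word in set(re.findall(r"\w+", row.lower())):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return sorted(w for w, n in doc_freq.items() if n / len(rows) > threshold)

# Made-up test table: every row contains "Ort".
rows = ["Ort - Berlin", "Ort - Hamburg", "Ort - Munich"]
print(over_threshold(rows))   # ['ort']
```

In a test table like this, a word that every row shares would carry no weight, which is one way a small dataset can give misleading results.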
Your query works perfectly for me. Please check whether your table is MyISAM, because full-text search only works with the MyISAM engine (in MySQL versions before 5.6).

SQL Server 2008 Containstable generate negative rank with weighted_term

I have a table with full text search enabled on the Title column. I try to make a weighted search with CONTAINSTABLE, but I get an arithmetic overflow for the Rank value. The query is as follows:
SELECT ID, CAST(Res_Tbl.RANK AS Decimal) AS Relevancy , Title
FROM table1 INNER JOIN
CONTAINSTABLE(table1,Title,'ISABOUT("pétoncle" weight (.8), "pétoncle" weight (.8), "PÉTONCLE" weight (.8))',LANGUAGE 1036 ) AS Res_Tbl
ON ID = Res_Tbl.[KEY]
When I execute this query I get: Arithmetic overflow error for type int, value = -83886083125.000076.
If I remove one of the two ';' in the ISABOUT function, the query completes successfully.
Note that the error only occurs when there are results; if there is no result, the query completes successfully.
Does anybody know how to solve this?
This question is also on dba.stackexchange.com
Qualifier: Since I can't recreate this, I'm unable to know for sure if this will fix the problem. However, these are some things that I'm seeing.
First off, the ampersand, pound sign, and semicolon are word-break characters. That means that a term written with an HTML entity, "p&#233;toncle", is not searched as the string "pétoncle"; what you're actually searching for is "p", "233", and "toncle". Clearly, that's not your intent.
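The effect of those word breaks can be shown with a small Python sketch. It assumes the search term reached the query in HTML-entity form (p&#233;toncle) and only imitates the splitting with a regular expression; it is not SQL Server's word breaker. Decoding the entity in application code before building the ISABOUT string is one possible way around the problem, offered here as an untested suggestion.

```python
import html
import re

# Assumed form of the term as it arrives in the query.
term = "p&#233;toncle"

# Treating "&", "#" and ";" as word breaks splits the entity apart.
pieces = [p for p in re.split(r"[&#;]+", term) if p]
print(pieces)                # ['p', '233', 'toncle']

# Decoding the entity first yields the single word that was intended.
decoded = html.unescape(term)
print(decoded)               # pétoncle
```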
I have to presume that you have the text "pétoncle" somewhere in your dataset. That means you need that entire string to be complete.
There are a few things you can do.
1) Turn off stopwords altogether. You can do that by altering the full text index to turn the stoplist off.
Note that you have to have your database set to SQL Server 2008 compatibility for this not to generate a syntax error:
ALTER FULLTEXT INDEX ON Table1 SET STOPLIST OFF;
2) Create a new stoplist
If you create an empty StopList, you might be able to add the stopwords that you want or copy the system stoplist and remove the stopwords that you don't want. (I would advise the second approach).
Having said that, I wasn't able to find the & or # in the system stoplist, so they may be hard coded. You may have to simply turn the stoplist off.
3) Change your search to ignore the "p&#233;toncle" case.
If you drop the "p&#233;toncle" terms from the ISABOUT and change them to "p toncle", it might work:
'ISABOUT("pétoncle" weight (.8), "p toncle" weight (.8))'
Those are just some ideas. Like I said, without being able to access the system or recreate the scenario, we won't be able to help much.
Some more information for your researching pleasure:
Stopwords and Stoplists
Alter Fulltext Index syntax
FullText search using Thesaurus file and special characters
For people who got to this page searching for negative rank results returned by SQL Server, as I did, it turns out that can happen if some of your match terms are too long (beyond some character limit). SQL Server will not actually complain or produce an error at query time, instead, the ranking will be mostly garbage, producing negative rank for some choices of weights (in my case, esp. with low weight values on the overlong terms). Limit token/word length and avoid this problem (probably a bug deep inside SQL Server 2008 fulltext search).

MySQL Fulltext search but using LIKE

I've recently been doing some string searches on a table with about 50k strings in it; fairly large, I'd say, but not that big. I was doing some nested queries for a 'search within results' kind of thing, using LIKE to match the searched keyword.
I came across MySQL's full-text search and tried it, so I added a fulltext index to my str column. I'm aware that full-text searches don't work on virtually created tables or even on views, so queries with sub-selects will not fit. As mentioned, I was doing nested queries, for example:
SELECT s2.id, s2.str
FROM
(
SELECT s1.id, s1.str
FROM
(
SELECT id, str
FROM strings
WHERE str LIKE '%term%'
) AS s1
WHERE s1.str LIKE '%another_term%'
) AS s2
WHERE s2.str LIKE '%a_much_deeper_term%';
This is not applied to any code yet; I was just running some tests. Also, searching strings like this can easily be done with Sphinx (performance-wise), but let's assume Sphinx is not available and I want to know how well this works in a pure SQL query. Running this query on a table without a fulltext index takes about 2.97 seconds (depending on the search term). However, running it on a table with a fulltext index on the str column finished in about 104 ms, which is fast (I think?).
My question is simple, is it valid to use LIKE or is it a good practice to use it at all in a table with Full-text added when normally we would use MATCH and AGAINST statements?
Thanks!
In this case you don't necessarily need subselects. You can simply use:
SELECT id, str
FROM item_strings
WHERE str LIKE '%term%'
AND str LIKE '%another_term%'
AND str LIKE '%a_much_deeper_term%'
... but this also raises a good question: the order in which you exclude rows. I guess MySQL is smart enough to assume that the longest term will be the most restrictive, so by starting with a_much_deeper_term it eliminates most of the records and then performs additional comparisons on only a few rows. By contrast, if you start with term, you will probably end up with many candidate records that you then have to compare against the rest of the terms.
The interesting part is that you can force the order in which the comparisons are made by using your original subselect example. This gives you the opportunity to decide which term is the most restrictive based on more than just the length, for example:
the ratio of consonants to vowels
the longest chain of consonants in the word
the most used vowel in the word
...etc. You can also apply some heuristics based on the type of textual information you are handling.
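The "longest term first" heuristic can be sketched in Python as a query builder. Everything here is illustrative: like_query and escape_like are made-up helpers, the table and column names come from the question, the %s placeholder style is an assumption about the driver, and nothing guarantees that MySQL evaluates the conditions in the order written.

```python
def escape_like(term):
    """Escape LIKE wildcards so that '%' and '_' inside a term match literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

def like_query(terms, table="strings", column="str"):
    """Build one AND-ed LIKE query, longest term first as a crude restrictiveness guess."""
    ordered = sorted(terms, key=len, reverse=True)
    where = " AND ".join(f"{column} LIKE %s" for _ in ordered)
    params = ["%" + escape_like(t) + "%" for t in ordered]
    return f"SELECT id, {column} FROM {table} WHERE {where}", params

sql, params = like_query(["term", "another", "deepest term"])
print(sql)      # SELECT id, str FROM strings WHERE str LIKE %s AND str LIKE %s AND str LIKE %s
print(params)   # ['%deepest term%', '%another%', '%term%']
```

Swapping key=len for another scoring function is where heuristics such as the consonant-to-vowel ratio would plug in.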
Edit:
This is just a hunch, but it could be possible to apply the LIKE to the words in the fulltext index itself, and then match the rows against the index as if you had searched for full words.
I'm not sure if this is actually done, but it would be a smart thing for the MySQL people to pull off. Also note that this theory can only be used if all possible occurrences are in fact in the fulltext index. For this you need that:
Your search pattern must be at least as long as the minimal word length. (If you are searching for %id%, for example, it can be part of a 3-letter word too, which is excluded from the FULLTEXT index by default.)
Your search pattern must not be a substring of any listed excluded word, for example: and, of etc.
Your pattern must not contain any special characters.
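The three conditions can be written down as a small Python check. This only restates the hunch above in code; it says nothing about what MySQL really does. The function name index_can_cover, the minimum length of 4 and the tiny stopword set are all assumptions for illustration.

```python
import re

MIN_WORD_LEN = 4                                    # assumed ft_min_word_len default
STOPWORDS = {"and", "of", "the", "have", "some"}    # tiny illustrative subset

def index_can_cover(pattern):
    """True if every match of LIKE `pattern` could, in theory, be found via indexed words."""
    term = pattern.strip("%")
    # 1. Shorter patterns can occur inside words too short to be indexed.
    if len(term) < MIN_WORD_LEN:
        return False
    # 2. The pattern could match inside a stopword, which is never indexed.
    if any(term.lower() in w for w in STOPWORDS):
        return False
    # 3. Only plain word characters; anything else would span a word boundary.
    return re.fullmatch(r"[0-9A-Za-z_]+", term) is not None

print(index_can_cover("%id%"))     # False
print(index_can_cover("%term%"))   # True
```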