Using FULLTEXT in MySQL, are wildcards expensive? - mysql

I am working on a MySQL search query where I would like incomplete strings to find matches such as:
Search term: "Fi Res", which should find: "Find Result".
I have found a few related questions that tend to answer "use Fulltext". Simply using fulltext doesn't match partial terms, but it seems to work with wildcards, so this (appears to) work:
SELECT * FROM `table` WHERE MATCH (field) AGAINST ('Fi*Res*' IN BOOLEAN MODE)
And a quick solution seems to be to just replace the spaces in a query with *.
I'm wondering if there's some reason not to do this, and if so, what (better) implementations should I be looking at?
Also, in this case it's a smaller database table (< 20,000 rows) and the strings to be searched are short (max 255 length).

Wildcards simply stand in for part of the value you are searching for; a Google search, for example, behaves somewhat like a wildcard search. So you can also use the '%' wildcard with LIKE for this kind of search.
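A minimal sketch of the "replace the spaces" idea from the question, done on the application side. It assumes the goal is for every typed word to be a required prefix, so it emits +word* terms instead of gluing the words together with *. The helper name to_prefix_query is hypothetical, and the result is meant to be passed to AGAINST (... IN BOOLEAN MODE) as a bound parameter.

```python
import re

# Characters that act as operators in MySQL boolean-mode searches.
# They are replaced by spaces so user input cannot change the query logic.
_BOOLEAN_OPERATORS = re.compile(r'[+\-<>()~*"@]')


def to_prefix_query(user_input: str) -> str:
    """Turn 'Fi Res' into '+Fi* +Res*' (hypothetical helper)."""
    cleaned = _BOOLEAN_OPERATORS.sub(" ", user_input)
    words = cleaned.split()
    # '+' makes the term required, the trailing '*' makes it a prefix match.
    return " ".join(f"+{word}*" for word in words)


print(to_prefix_query("Fi Res"))  # +Fi* +Res*
```

On a table of under 20,000 rows either approach should be cheap; the sketch is mostly about keeping operator characters typed by the user out of the boolean expression.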

Related

mysql full text wildcard search like regex

Up to now I have been using
WHERE col REGEXP 'IN (.*) WE TRUST'
But ever since I added the full-text index to this column, this query has been very slow.
I want to know how I could implement a wildcard search using full-text index searches.
These are the queries that I have been playing with, but I'm still getting lots of unexpected results and I'm not at all sure why my query is pulling those results.
WHERE MATCH (markIdentification) AGAINST ('IN (.*) WE TRUST')
WHERE MATCH (markIdentification) AGAINST ('+IN (.*) +WE +TRUST')
WHERE MATCH (markIdentification) AGAINST ('+IN * +WE +TRUST')
These are the only ones that seem to get even close.
Any suggestions?
Thank you
Update for question ref:
SELECT * from table
WHERE MATCH (col) AGAINST ('+IN * +WE +TRUST')
AND col LIKE '%IN (.*) WE TRUST%'
Is this correct?
If not then how would you do it?
With InnoDB's default settings, the FULLTEXT search engine ignores words shorter than three characters. You can set the innodb_ft_min_token_size option to change that, then regenerate your FULLTEXT indexes. (For MyISAM the corresponding option is ft_min_word_len, which defaults to 4.)
The + (and -) operators in AGAINST only work in boolean search mode. So to use + you need
WHERE MATCH (markIdentification) AGAINST ('+IN +WE +TRUST' IN BOOLEAN MODE )
BOOLEAN mode has lots of special characters to control searches, but * standing alone is not one of them. You can say 'TRUST*' to match trust, trustee, and trusted.
Taking Gordon's suggestion, you might try this:
WHERE MATCH (markIdentification) AGAINST ('+IN +WE +TRUST' IN BOOLEAN MODE )
AND markIdentification REGEXP 'IN (.*) WE TRUST'
This will use your FULLTEXT index to look for possible matches, and REGEXP to get more exact results. The expensive REGEXP operation, then, can run on many fewer rows.
(Beware IN NATURAL LANGUAGE MODE when your tables don't have many rows. It can give strange results. The indexer decides which words are too common to bother with, and if you have a small number of words, that decision gets distorted.)
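Since the engine ignores short words, one way to build the boolean prefilter is to keep only the words that are long enough to be indexed and leave the rest to REGEXP. This is only a sketch: the helper name build_prefilter is hypothetical, min_token_size=3 is an assumption mirroring the InnoDB default mentioned above, and stopwords are not handled.

```python
def build_prefilter(words, min_token_size=3):
    """Build a boolean-mode expression from the words long enough to be indexed.

    min_token_size is an assumption; it should mirror the server setting
    (innodb_ft_min_token_size for InnoDB, ft_min_word_len for MyISAM).
    """
    kept = [word for word in words if len(word) >= min_token_size]
    # Every remaining word becomes a required term.
    return " ".join(f"+{word}" for word in kept)


print(build_prefilter(["IN", "WE", "TRUST"]))  # +TRUST
```

The resulting string would go into MATCH ... AGAINST (... IN BOOLEAN MODE), with the REGEXP condition still applied afterwards to check the full pattern.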

MyIsam fulltext search against multiple %term%

I have a field called filepath that I'm trying to search. Here is an example path:
/mnt/qfs-X/Asset_Management/XG_Marketing_/Episodic-SG_1001_1233.jpg
I would like to be able to search the following and get a match:
search = "qf episodic sg_1001 JPG"
How would I do this with a fulltext search in mysql/myisam? What I have now is:
SELECT * FROM x_files2 WHERE MATCH(path)
AGAINST('qf episodic sg_1001 JPG' in boolean mode)
But it is returning way too many results (it seems to return rows where any of the terms are found instead of only those where all are found).
Put + in front of each 'word':
AGAINST('+qf* +episodic +sg_1001* +JPG' in boolean mode)
Do you have the minimum word length (ft_min_word_len for MyISAM) set to 2? If not, there could be other troubles.
The + avoids "too many".
Consider switching to InnoDB, now that it has FULLTEXT.
You may have to abandon use of FULLTEXT and switch to LIKE:
WHERE path LIKE '%qf%episodic%sg_1001%JPG%'
If performance is an issue, consider something like
WHERE MATCH(path) AGAINST('...' IN BOOLEAN MODE) -- using some of the words
AND path LIKE '...' -- as above
The MATCH will run first, whittling down the number of possible rows considerably, then the LIKE takes care of details.
Note that middles of words cannot be used in AGAINST. Those could be left out, relying on LIKE to take care of them.
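A small sketch of building the LIKE pattern from the search terms. Note that _ is itself a single-character wildcard in LIKE, so a term such as sg_1001 is escaped here. The helper name like_pattern is hypothetical, and the sketch assumes the default backslash escape character and that the pattern is passed as a bound parameter instead of being pasted into the SQL text.

```python
def like_pattern(terms):
    """Join terms into an ordered '%a%b%' pattern, escaping LIKE wildcards."""
    escaped = []
    for term in terms:
        term = term.replace("\\", "\\\\")  # escape the escape character first
        term = term.replace("%", "\\%").replace("_", "\\_")
        escaped.append(term)
    return "%" + "%".join(escaped) + "%"


# Prints the pattern with the underscore in sg_1001 escaped by a backslash.
print(like_pattern(["qf", "episodic", "sg_1001", "JPG"]))
```

As with the hand-written pattern above, the terms have to appear in the path in the same order as in the search string.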

How to use prefix wildcards like '*abc' with match-against

I have the following query :
SELECT * FROM `user`
WHERE MATCH (user_login) AGAINST ('supriya*' IN BOOLEAN MODE)
Which outputs all the records starting with 'supriya'.
Now I want something that will find all the records ending with e.g. 'abc'.
I know that * cannot be prepended (it doesn't work), and I have searched a lot but couldn't find anything regarding this.
If I give the query the string priya, it should return all records ending with priya.
How do I do this?
MATCH doesn't work with leading wildcards, so matching with *abc won't work. You will have to use LIKE to achieve this:
SELECT * FROM user WHERE user_login LIKE '%abc';
This will be very slow however.
If you really need to match for the ending of the string, and you have to do this often while the performance is killing you, a solution would be to create a separate column in which you reverse the strings, so you got:
user_login    user_login_rev
xyzabc        cbazyx
Then, instead of looking for '%abc', you can look for 'cba%' which is much faster if the column is indexed. And you can again use MATCH if you like to search for 'cba*'. You will just have to reverse the search string as well.
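The reversal itself is easy to do in application code. A minimal sketch, with hypothetical helper names, assuming the reversed value is written to user_login_rev whenever user_login changes (escaping of LIKE wildcards in the suffix is left out for brevity):

```python
def reverse_text(value: str) -> str:
    """Value to store in the reversed column, e.g. user_login_rev."""
    return value[::-1]


def suffix_like_pattern(suffix: str) -> str:
    """LIKE pattern for the reversed column: 'ends with abc' becomes 'cba%'."""
    return reverse_text(suffix) + "%"


def suffix_match_term(suffix: str) -> str:
    """Boolean-mode MATCH term for the reversed column: 'cba*'."""
    return reverse_text(suffix) + "*"


print(reverse_text("xyzabc"))        # cbazyx
print(suffix_like_pattern("abc"))    # cba%
print(suffix_match_term("abc"))      # cba*
```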
I believe the selection of FULL-TEXT Searching isn't relevant here. If you are interested in searching some fields based on wildcards like:
%word% ( word anywhere in the string)
word% ( starting with word)
%word ( ending with word)
the best option is to use a LIKE clause, as GolezTrol has mentioned.
However, if you are interested in advanced/text based searching, FULL-TEXT search is the option.
Limitations with LIKE:
There are some limitations with this clause. Suppose you use something like 'good%' (anything starting with good). It may return irrelevant results like goods or goody.
So make sure you understand what you are doing and what is required.

mysql boolean mode fulltext search with wildcards and literals

I'm pretty new to MySQL full-text searches and I ran into this problem today:
My company table has a record with "e-magazine AG" in the name column. I have a full-text index on the name column.
When I execute this query the record is not found:
SELECT id, name FROM company WHERE MATCH(name) AGAINST('+"e-magazi"*' IN BOOLEAN MODE);
I need to work with quotes because of the dash and to use the wildcard because I implement a "search as you type" functionality.
When I search for the whole term "e-magazine AG", the record is found.
Any ideas what I'm doing wrong here? I read about adding the dash to the list of word characters (config update needed) but I'm searching for a way to do this programmatically.
This clause
MATCH(name) AGAINST('+"e-magazi"*' IN BOOLEAN MODE);
will search for "e" AND NOT "magazi"; i.e. the - inside "e-magazi" will be interpreted as a NOT even though it is inside quotation marks.
For this reason it will not work as expected.
A solution is to apply an extra having clause with a LIKE.
I know this having is slow, but it will only be applied to the results of the match, so not too many rows should be involved.
I suggest something like:
SELECT id, name
FROM company
WHERE MATCH(name) AGAINST('magazine' IN BOOLEAN MODE)
HAVING name LIKE '%e-magazi%';
MySQL fulltext treats the word e-magazine in a text as a phrase and not as a word. Because of that it is split into the two words e and magazine. And while it builds the search index it does not add the e to the index because of ft_min_word_len (default is 4 chars).
The same length limitation is applied to the search query. That is the reason why a search for e-magazine returns exactly the same results as a-magazine: the a and the - are completely ignored.
But now you want to find the exact phrase e-magazine. For that you use the quotes, and that is the completely correct way to find phrases, but MySQL does not support operators for phrases, only for words:
https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html
With this modifier, certain characters have special meaning at the beginning or end of words in the search string
Some people would suggest to use the following query:
SELECT id, name
FROM company
WHERE MATCH(name) AGAINST('e-magazi*' IN BOOLEAN MODE)
HAVING name LIKE 'e-magazi%';
As I said, MySQL ignores the e- and searches for the wildcard word magazi*. After those results are obtained it uses HAVING to additionally filter the results for e-magazi* including the e-. That way you will find the phrase e-magazine AG. Of course HAVING is only needed if the search phrase contains the wildcard operator, and you should never use quotes. This operator is used by your user and not you!
Note: As long as you do not surround the search phrase with % it will find only fields that start with that word. And you do not want to surround it, because it would find bee-magazine as well. So maybe you need to extend the HAVING clause with OR name LIKE '% e-magazi%' OR name LIKE '%\ne-magazi%' to make it usable inside of texts.
Trick
But finally I prefer a trick so HAVING isn't needed at all:
If you add texts to your database table, add them additionally to a separate fulltext indexed column and replace words like up-to-date with up-to-date uptodate.
If a user searches for up-to-date replace it in the query with uptodate.
That way you can still find specific in user-specific, but also up-to-date as a whole (and not only date).
Bonus
If a user searches for -well-known huge ports, MySQL treats that as: must not include well, could include known and huge. Of course you could solve that with another extra query variant as well, but with the trick above you remove the hyphen so the search query simply looks like this:
SELECT id
FROM texts
WHERE MATCH(text) AGAINST('-wellknown huge ports' IN BOOLEAN MODE)
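The trick can be sketched in application code as two small functions, one applied to text before it is written to the extra indexed column and one applied to the user's query. The function names and the regular expression are assumptions for illustration only; they treat any run of word characters joined by hyphens as one hyphenated word.

```python
import re

# A hyphenated word such as up-to-date or well-known.
_HYPHENATED = re.compile(r"\b\w+(?:-\w+)+\b")


def expand_hyphenated(text: str) -> str:
    """Text for the extra indexed column: keep the word and append a joined copy."""
    return _HYPHENATED.sub(
        lambda m: m.group(0) + " " + m.group(0).replace("-", ""), text
    )


def normalize_query(query: str) -> str:
    """Search string: replace each hyphenated word by its joined form."""
    return _HYPHENATED.sub(lambda m: m.group(0).replace("-", ""), query)


print(expand_hyphenated("an up-to-date list"))    # an up-to-date uptodate list
print(normalize_query("up-to-date"))              # uptodate
print(normalize_query("-well-known huge ports"))  # -wellknown huge ports
```

A leading - in the query is left alone, so it keeps its meaning as the boolean NOT operator.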

Mysql Full-Text Search - Searching for part of a keyword

I've read about full-text search functions in MySQL. But with these methods you have to search for complete, correctly spelled words.
For example, if your text contains 'Bitdefender 2009' and you search for Bit, you get nothing:
SELECT * FROM logs WHERE MATCH (log) AGAINST ('Bit 09' IN BOOLEAN MODE);
So is there any solution for this?
(Is there a technique which would let you search for misspelled keywords as well? for example you search for Bitdefedner)
You could also turn to Lucene or other specialized search engine mentioned in https://stackoverflow.com/questions/553055/best-full-text-search-for-mysql
You probably need the help of LIKE, REGEXP, AGAINST or SOUNDEX functions.
Have a look at the following:
http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
http://dev.mysql.com/doc/refman/5.0/en/string-comparison-functions.html#operator_like
http://dev.mysql.com/doc/refman/5.0/en/regexp.html#operator_regexp
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_soundex
You could use the * wildcard: AGAINST ('Bit* 09' IN BOOLEAN MODE)
For misspelt keywords you need a separate spellcheck phase.
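One possible shape for such a spellcheck phase, sketched with Python's standard difflib module. The vocabulary list is an assumption (it could come from product names or other known keywords), the helper name suggest_keyword is hypothetical, and the cutoff value is arbitrary and would need tuning.

```python
import difflib


def suggest_keyword(word, vocabulary, cutoff=0.8):
    """Return the closest known keyword, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


known_keywords = ["Bitdefender", "Norton"]  # assumed vocabulary
print(suggest_keyword("Bitdefedner", known_keywords))  # Bitdefender
```

The corrected keyword would then be used in the MATCH ... AGAINST query in place of what the user typed.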