How to use fulltext matching for emails in MySQL?

I'm adding a "search by email" functionality on my contacts database table (MySQL, InnoDB engine) for our CRM system. I added the FULLTEXT index and partial searches like this
WHERE MATCH(cc.email_c) AGAINST('Joe' IN NATURAL LANGUAGE MODE)
are working fine. However, if I try to match against a full email address like this
WHERE MATCH(cc.email_c) AGAINST('Joe@gmail.com' IN NATURAL LANGUAGE MODE)
all email addresses with Gmail get returned. I've already ruled out BOOLEAN MODE because, from what I understand, it doesn't support the "@" symbol in a query. Is there any other way of using MATCH ... AGAINST while still being able to search by the exact full address?
I can always use an SQL with LIKE and some wildcards, but I would still prefer to have a full-text search.

Convert it to AGAINST('+Joe +gmail +com' IN BOOLEAN MODE)
But beware: short names and "stop words" should not have a '+' in front of them.
Then, because that won't be precise enough, combine it with a LIKE:
WHERE MATCH(email) AGAINST('+Joe +gmail +com' IN BOOLEAN MODE)
AND email LIKE '%Joe@gmail.com%'
The first part gives you speed; the second part then tests only the rows that pass the first test.
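The conversion described above could be done in application code before the query is sent. The following is only a sketch: the function name, the minimum token length of 3, and the small stopword set are assumptions for illustration, and the real values depend on the server configuration. Tokens that are too short or are stopwords are simply left out of the boolean string, since the LIKE part checks the full address anyway.

```python
import re

# Illustrative subset only (assumption); check the stopword list of your own server.
ASSUMED_STOPWORDS = {"com", "the", "www", "for"}


def email_search_terms(email, min_token_len=3, stopwords=ASSUMED_STOPWORDS):
    """Return (boolean_query, like_pattern) for a two-stage email search."""
    # Split the address on anything that is not a letter or digit.
    tokens = [t for t in re.split(r"[^0-9A-Za-z]+", email) if t]
    # Only tokens that are long enough and not stopwords get a leading '+'.
    required = [
        "+" + t
        for t in tokens
        if len(t) >= min_token_len and t.lower() not in stopwords
    ]
    # Escape LIKE wildcards so the address is matched literally.
    escaped = email.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return " ".join(required), "%" + escaped + "%"


query, pattern = email_search_terms("Joe@gmail.com")
print(query)    # +Joe +gmail
print(pattern)  # %Joe@gmail.com%
```

The two returned strings are meant to be passed as the values for AGAINST(... IN BOOLEAN MODE) and LIKE respectively, ideally as bound parameters rather than by string concatenation.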

Related

MyIsam fulltext search against multiple %term%

I have a field called filepath that I'm trying to search. Here is an example path:
/mnt/qfs-X/Asset_Management/XG_Marketing_/Episodic-SG_1001_1233.jpg
I would like to be able to search the following and get a match:
search = "qf episodic sg_1001 JPG"
How would I do this with a fulltext search in mysql/myisam? What I have now is:
SELECT * FROM x_files2 WHERE MATCH(path)
AGAINST('qf episodic sg_1001 JPG' in boolean mode)
But it is returning way too many results (it seems to return rows where any of the terms is found, instead of only those where all of them are found).
Put + in front of each 'word':
AGAINST('+qf* +episodic +sg_1001* +JPG' in boolean mode)
Do you have the minimum word length (ft_min_word_len for MyISAM) set to 2? If not, there could be other troubles.
The + avoids "too many".
Consider switching to InnoDB, now that it has FULLTEXT.
You may have to abandon use of FULLTEXT and switch to LIKE:
WHERE path LIKE '%qf%episodic%sg_1001%JPG%'
If performance is an issue, consider something like
WHERE MATCH(path) AGAINST('...' IN BOOLEAN MODE) -- using some of the words
AND path LIKE '...' -- as above
The MATCH will run first, whittling down the number of possible rows considerably, then the LIKE takes care of details.
Note that middles of words cannot be used in AGAINST. Those could be left out, relying on LIKE to take care of them.
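A rough sketch of how both strings could be generated from the user's search terms is shown below. The helper names are hypothetical, and which terms count as "prefix only" is something the caller has to decide. Note one difference from the LIKE shown above: here the underscore is escaped, so sg_1001 matches a literal underscore instead of any single character.

```python
def like_pattern_in_order(terms):
    """Build '%t1%t2%...%' so that the terms must appear in this order."""
    def escape(term):
        # Backslash is the default LIKE escape character in MySQL.
        return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return "%" + "%".join(escape(t) for t in terms) + "%"


def boolean_query(terms, prefix_terms=()):
    """Require every term; append '*' to terms that are only word prefixes."""
    return " ".join(
        "+" + t + ("*" if t in prefix_terms else "") for t in terms
    )


terms = ["qf", "episodic", "sg_1001", "JPG"]
print(boolean_query(terms, prefix_terms={"qf", "sg_1001"}))
# +qf* +episodic +sg_1001* +JPG
print(like_pattern_in_order(terms))
# %qf%episodic%sg\_1001%JPG%
```

Terms that only occur in the middle of words can be left out of the list given to boolean_query and kept in the list given to like_pattern_in_order.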

use full text search to search incomplete words in mysql

I am making a library management system.
I have a problem searching for a book in the MySQL database.
For searching data in MySQL we use full-text search.
But it only works if a full word is given. If the user enters an incomplete word instead of the full word, is there any function to search with it?
Example: if the book name is calculus
and the user types calc, that book should still be returned.
You can try using fulltext search with boolean mode, which allows a few extra operators. You will be interested in the truncation operator (*):
The asterisk serves as the truncation (or wildcard) operator. Unlike
the other operators, it is appended to the word to be affected. Words
match if they begin with the word preceding the * operator.
If a word is specified with the truncation operator, it is not
stripped from a boolean query, even if it is too short or a stopword.
Whether a word is too short is determined from the
innodb_ft_min_token_size setting for InnoDB tables, or ft_min_word_len
for MyISAM tables. These options are not applicable to FULLTEXT
indexes that use the ngram parser.
The wildcarded word is considered as a prefix that must be present at
the start of one or more words. If the minimum word length is 4, a
search for '+word +the*' could return fewer rows than a search for
'+word +the', because the second query ignores the too-short search
term the.
Please note that you cannot start an expression with the * operator, so the results will only include books whose title has a word starting with 'calc', not books where 'calc' appears in the middle of a word.
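For a search box, the truncation operator is usually appended by the application rather than typed by the user. A minimal sketch, assuming a hypothetical helper name and that every typed word should be required and prefix-matched:

```python
import re


def prefix_query(user_input):
    """Turn free text into a boolean-mode string that prefix-matches each word."""
    # Keep only word characters (letters, digits, underscore) so that boolean
    # operators typed by the user cannot change the meaning of the query.
    words = re.findall(r"\w+", user_input)
    return " ".join("+" + w + "*" for w in words)


print(prefix_query("calc"))        # +calc*
print(prefix_query("intro calc"))  # +intro* +calc*
```

The resulting string would then be used as the argument of AGAINST(... IN BOOLEAN MODE).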
You can use the LIKE operator with the "%" wildcard.
With LIKE you can use the following two wildcard characters in the pattern:
% matches any number of characters, even zero characters.
_ matches exactly one character.
For example:
SELECT * FROM <Table> where book like "%calc%";
http://dev.mysql.com/doc/refman/5.7/en/string-comparison-functions.html

mysql fulltext boolean search with asterisk

I have a query like below:
SELECT prd_id FROM products WHERE MATCH (prd_search_field)
AGAINST ('+gul* +yetistiren* +adam*' in boolean mode);
This doesn't return the rows including 'gul'.
http://dev.mysql.com/doc/refman/5.0/en/fulltext-boolean.html
The document says this.
Then a search for '+word +the*' will likely return fewer rows than a
search for '+word +the':
The former query remains as is and requires both word and the* (a word starting with the) to be present in the document.
The latter query is transformed to +word (requiring only word to be present). the is both too short and a stopword, and either condition is enough to cause it to be ignored.
So, as I understand it, the too-short-word condition should not apply in my situation, since I use * after each word. What's wrong with this?
As a workaround I use the query below, but since it's slow I need to find another solution. Any ideas would be appreciated. Thanks in advance.
SELECT prd_id FROM products WHERE 1 AND MATCH (prd_search_field)
AGAINST ('+yetistiren* +adam*' in boolean mode) AND prd_search_field
LIKE '%gul%';
As a note, ft_min_word_len=4 is the default in all shared hosting environments, and I cannot change it.

mysql boolean mode fulltext search with wildcards and literals

I'm pretty new to MySQL full-text searches and I ran into this problem today:
My company table has a record with "e-magazine AG" in the name column. I have a full-text index on the name column.
When I execute this query the record is not found:
SELECT id, name FROM company WHERE MATCH(name) AGAINST('+"e-magazi"*' IN BOOLEAN MODE);
I need to work with quotes because of the dash and to use the wildcard because I implement a "search as you type" functionality.
When I search for the whole term "e-magazine AG", the record is found.
Any ideas what I'm doing wrong here? I read about adding the dash to the list of word characters (config update needed) but I'm searching for a way to do this programmatically.
This clause
MATCH(name) AGAINST('+"e-magazi"*' IN BOOLEAN MODE);
will search for "e" AND NOT "magazi"; i.e. the - inside "e-magazi" is interpreted as a NOT operator even though it is inside quotation marks.
For this reason it will not work as expected.
A solution is to apply an extra HAVING clause with a LIKE.
I know HAVING is slow, but it is only applied to the results of the MATCH, so not too many rows should be involved.
I suggest something like:
SELECT id, name
FROM company
WHERE MATCH(name) AGAINST('magazine' IN BOOLEAN MODE)
HAVING name LIKE '%e-magazi%';
MySQL fulltext treats the word e-magazine in a text as a phrase and not as a word, so it is split into the two words e and magazine. While building the search index it does not add the e to the index because of ft_min_word_len (default is 4 chars).
The same length limitation applies to the search query. That is why a search for e-magazine returns exactly the same results as a-magazine: the a and the - are ignored completely.
But now you want to find the exact phrase e-magazine. For that you use quotes, which is exactly the right way to find phrases, but MySQL does not support operators for phrases, only for words:
https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html
With this modifier, certain characters have special meaning at the beginning or end of words in the search string
Some people would suggest to use the following query:
SELECT id, name
FROM company
WHERE MATCH(name) AGAINST('e-magazi*' IN BOOLEAN MODE)
HAVING name LIKE 'e-magazi%';
As I said, MySQL ignores the e- and searches for the wildcard word magazi*. After those results are obtained, it uses HAVING to additionally filter them for e-magazi*, including the e-. That way you will find the phrase e-magazine AG. Of course HAVING is only needed if the search phrase contains the wildcard operator, and you should never use quotes. This operator is used by your user and not you!
Note: As long as you do not surround the search phrase with %, it will only find fields that start with that word. And you do not want to surround it, because it would find bee-magazine as well. So maybe you need an additional OR HAVING name LIKE ' %e-magazi%' OR HAVING NAME LIKE '\\n%e-magazi%' to make it usable inside of texts.
Trick
But finally I prefer a trick so HAVING isn't needed at all:
When you add texts to your database table, also store them in a separate fulltext-indexed column in which words like up-to-date are replaced with up-to-date uptodate.
If a user searches for up-to-date, replace it in the query with uptodate.
That way you can still find specific in user-specific, and up-to-date as well (and not only date).
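The two replacements of this trick could be sketched as follows. The function names and the regular expression are assumptions for illustration; in particular, what counts as a "hyphenated word" here is simply word characters joined by single hyphens.

```python
import re

# One or more word-character runs joined by single hyphens, e.g. up-to-date.
HYPHENATED = re.compile(r"\b\w+(?:-\w+)+\b")


def expand_for_index(text):
    """Append a joined variant after each hyphenated word (for the indexed column)."""
    return HYPHENATED.sub(
        lambda m: m.group(0) + " " + m.group(0).replace("-", ""), text
    )


def normalize_query(query):
    """Replace each hyphenated word in the search string by its joined variant."""
    return HYPHENATED.sub(lambda m: m.group(0).replace("-", ""), query)


print(expand_for_index("an up-to-date list"))  # an up-to-date uptodate list
print(normalize_query("up-to-date"))           # uptodate
```

A leading hyphen that is meant as the boolean NOT operator is left untouched, because it is not preceded by a word character.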
Bonus
If a user searches for -well-known huge ports, MySQL treats that as: must not include well, could include known and huge. Of course you could solve that with another extra query variant as well, but with the trick above you remove the hyphen, so the search query simply looks like this:
SELECT id
FROM texts
WHERE MATCH(text) AGAINST('-wellknown huge ports' IN BOOLEAN MODE)

MySQL fulltext search isn't matching as expected

I have a pretty simple query that doesn't seem to be giving me the results I'd like. I'm trying to allow the user to search for a restaurant by its name, address, or city using a fulltext search.
Here is my query:
SELECT ESTAB_NAME, ESTAB_ADDRESS, ESTAB_CITY
FROM restaurant_restaurants rr
WHERE
MATCH (rr.ESTAB_NAME, rr.ESTAB_ADDRESS, rr.ESTAB_CITY)
AGAINST ('*new* *hua*' IN BOOLEAN MODE)
LIMIT 0, 500
New Hua is a restaurant that exists in the table, yet this query does not return it. However, when I search for 'ting ho' I get the results I would expect.
Does anyone have any idea what is going on?
I'm using a MyISAM storage engine on MySQL version 5.0.41
Most likely, the full-text index settings have set a minimum word length of 4 - I believe this is the default. You'll need to change these settings, even for BOOLEAN MODE (as per http://dev.mysql.com/doc/refman/5.1/en/fulltext-boolean.html). Take a look at http://dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.html for the settings to change.
I think Michael is right, but also, you probably want to remove the * characters unless they are actually in the name you're searching for. MATCH AGAINST doesn't require a "match all" type of parameter.
My guess: "new" is a MySQL default stopword. See Michael Madsen's second link for how to change the stopword list and get the restaurant back.