MySQL fulltext with stems - mysql

I am building a little search function for my site. I am taking my user's query, stemming the keywords and then running a fulltext MySQL search against the stemmed keywords.
The problem is that MySQL is treating the stems as literal. Here is the process that is happening:
user searches for a word like "baseballs"
my stemming algorithm (Porter Stemmer) turns "baseballs" into "basebal"
fulltext does not find anything matching "basebal", even though there SHOULD be matches for "baseball" and "baseballs"
How do I do the equivalent of LIKE 'basebal%' with fulltext?
EDIT:
Here is my current query:
SELECT MATCH (`title`,`body`) AGAINST ('basebal') AS `relevance`,`id` FROM `blogs` WHERE MATCH (`title`,`body`) AGAINST ('basebal') ORDER BY `relevance` DESC

I think it will work with an asterisk at the end: basebal*. See the * operator on this page for more info.

See this link. Stemming is not installed by default in MySQL, but you can install it yourself.
http://oksoft.blogspot.com/2009/05/stemming-words-in-mysql.html

IN NATURAL LANGUAGE MODE is the default mode, and it does not support the * wildcard, so a stem is matched as a literal word. Try IN BOOLEAN MODE with a trailing wildcard:
SELECT MATCH (`title`, `body`) AGAINST ('basebal*' IN BOOLEAN MODE) AS `relevance`, `id` FROM `blogs` WHERE MATCH (`title`, `body`) AGAINST ('basebal*' IN BOOLEAN MODE) ORDER BY `relevance` DESC
The example above is here for people stumbling onto this question 10 years after it was asked. The topic is still relevant and benefits from complete examples 😉
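If the stems come out of application code, the wildcard can be appended there before the query is sent. Here is a minimal Python sketch of that step. The names to_prefix_query and build_search are made up for illustration, and it assumes a MySQL driver that uses %s placeholders; it also strips boolean-mode operator characters from the stems so user input cannot change the meaning of the search.

```python
import re

# Characters that act as operators in MySQL boolean full-text mode.
_BOOLEAN_OPERATORS = re.compile(r'[+\-><()~*"@]')

# Same query as in the answer above, with the search string as a parameter.
SQL = (
    "SELECT MATCH (`title`, `body`) AGAINST (%s IN BOOLEAN MODE) AS `relevance`, `id` "
    "FROM `blogs` "
    "WHERE MATCH (`title`, `body`) AGAINST (%s IN BOOLEAN MODE) "
    "ORDER BY `relevance` DESC"
)


def to_prefix_query(stems):
    """Turn stemmed keywords into a boolean-mode string with trailing wildcards."""
    terms = []
    for stem in stems:
        # Drop operator characters so a stem cannot inject its own operators.
        cleaned = _BOOLEAN_OPERATORS.sub("", stem).strip()
        if cleaned:
            terms.append(cleaned + "*")
    return " ".join(terms)


def build_search(stems):
    """Return (sql, params); the search string is bound twice, once per MATCH."""
    query = to_prefix_query(stems)
    return SQL, (query, query)
```

With a DB-API cursor this would be run as cursor.execute(*build_search(["basebal"])), which binds 'basebal*' to both placeholders.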

Related

SQL MATCH ... AGAINST limits result to current year

I'm stumped by this:
Two commands:
SELECT Date,Sentence FROM exampletable;
SELECT Date,Sentence FROM exampletable WHERE MATCH (Sentence) AGAINST ("South" IN NATURAL LANGUAGE MODE );
The first gives me results for the entire database, beginning in 2013. I see that there's an entry sometime in 2018 that contains the word "South", so using a match against in the second command I know I should get at least one result pre 2020. However, the first result is from 2020-01-28 onwards.
This happens in all examples I try. Simply adding a match against limits my returned results to > 2020. Is there some database setting that I'm not aware of? Or just something plainly obvious?
Any help would be appreciated! (I'm using MySQL 5.7)
Your query has no order by. Have you tried something like this?
SELECT Date, Sentence
FROM exampletable
WHERE MATCH (Sentence) AGAINST ("South" IN NATURAL LANGUAGE MODE )
ORDER BY date;
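Perhaps the earlier dates are just later in the result set. Without an ORDER BY the row order is not guaranteed, and a natural-language MATCH in the WHERE clause tends to return rows by relevance rather than by date.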

Yii2 Sphinx search: greater-than / less-than condition or "BETWEEN"

$totalGeneralDownload = $squery->from('general_materials','general_materials_delta')
->match
(
(new MatchExpression())
->match([$param => $searchTitle])
->andfilterMatch(['and','download_count>=100','download_count<=100'])
)
->showMeta(true)
->search();
This is my Sphinx code. Can anyone tell me how to use BETWEEN in Sphinx?
I researched this and found that Sphinx only does the text search and does not give you BETWEEN-type clauses, so I think you should fetch the IDs and then apply the BETWEEN condition yourself.
Sphinx does not provide a BETWEEN clause, so we have to handle it in our own code.
Once you get the IDs from Sphinx, filter them in your code and then fetch the result.
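If the range check does end up in application code, as this answer suggests, the filtering step itself is short. A rough Python sketch, assuming the search results have already been loaded as a list of dicts; filter_between and the sample rows are invented for illustration and are not part of Yii2 or Sphinx:

```python
def filter_between(rows, field, low, high):
    """Keep rows whose field value lies in the inclusive range [low, high]."""
    return [row for row in rows if low <= row[field] <= high]


# Hypothetical rows, standing in for what the search returned.
rows = [
    {"id": 1, "download_count": 50},
    {"id": 2, "download_count": 100},
    {"id": 3, "download_count": 250},
]

matching_ids = [row["id"] for row in filter_between(rows, "download_count", 100, 200)]
```

Both bounds are inclusive here, which mirrors how BETWEEN behaves in SQL.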

CONTAINS(Oracle) vs. MATCH(MySql)

I want to know what I can do to find the same results as in Oracle with MySql DBMS.
For example I use this statement in Oracle:
Select *
FROM PEP INNER JOIN ZUSAMMEN ON PEP.ID = ZUSAMMEN.PEPID
WHERE CONTAINS(ZUSAMMEN.NAMEN, '%Angela% and %Merkel%',0 ) > 0;
So I've set a CONTEXT Index on the 'Namen' Column.
Now, in MySql the syntax looks like this:
SELECT *
FROM PEP INNER JOIN ZUSAMMEN ON PEP.ID = ZUSAMMEN.PEPID
WHERE MATCH(ZUSAMMEN.NAMEN) AGAINST ('Angela Merkel' IN BOOLEAN MODE);
The problem is that the MySQL statement finds more results than Oracle.
Oracle finds only the exact name (Angela Dorothea Merkel).
MySQL does not.
How can I write the MySQL query so that it finds the same results as Oracle?
According to mysql documentation on fulltext search in boolean mode:
In implementing this feature, MySQL uses what is sometimes referred to
as implied Boolean logic, in which
+: stands for AND
-: stands for NOT
[no operator]: implies OR
You have not indicated any operators, so mysql is searching for records that have either Angela or Merkel present in the namen field. Modify the search using the + operator to require both words to be present:
MATCH(ZUSAMMEN.NAMEN) AGAINST ('+Angela +Merkel' IN BOOLEAN MODE)
Please note that the MySQL search will still behave slightly differently. The % wildcards in the Oracle query match parts of words, so the Oracle search should return a record where the name is Angelas Merkels, while the MySQL search, which matches whole words, will not.
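If the name comes from user input, the + prefixes can be added in application code. A small Python sketch of that idea; require_all_terms is a made-up helper name, and it strips existing boolean-mode operator characters before adding its own:

```python
import re

# Characters that act as operators in MySQL boolean full-text mode.
_BOOLEAN_OPERATORS = re.compile(r'[+\-><()~*"@]')


def require_all_terms(phrase):
    """Prefix every word with + so boolean mode requires all of them (AND)."""
    words = _BOOLEAN_OPERATORS.sub(" ", phrase).split()
    return " ".join("+" + word for word in words)
```

require_all_terms("Angela Merkel") gives '+Angela +Merkel', which can then be bound as the AGAINST parameter.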

how to speed up mysql regex query

I want to develop a site for announcing jobs, but because I have a lot of conditions (title, category, tags, city...) I use a MySQL regex statement. However, it's very slow and sometimes results in a 500 Internal Server Error.
Here is one example :
select * from job
where
( LOWER(title) REGEXP 'dév|freelance|free lance| 3eme grade|inform|design|site|java|vb.net|poo '
or
LOWER(description) REGEXP 'dév|freelance|free lance| 3eme grade|inform|design|site|java|vb.net|poo '
or
LOWER(tags) REGEXP 'dév|freelance|free lance| 3eme grade|inform|design|site|java|vb.net|poo')
and
LOWER(ville) REGEXP LOWER('Agadir')
and
`date`<'2016-01-11'
order by `date` desc
Any advice?
You can't optimize a query based exclusively on regexes. Use full text indexing (or a dedicated search engine such as Mnogo) for text search and geospatial indexing for locations.
The big part of the WHERE, namely the OR of 3 REGEXPs cannot be optimized.
LOWER(ville) REGEXP LOWER('Agadir') can be turned into simply ville REGEXP 'Agadir' if your collation is ..._ci. Please provide SHOW CREATE TABLE job.
Then that can be optimized to ville = 'Agadir'.
But maybe this query is "generated" by your UI? And the users are allowed to use regexp thingies? (SECURITY WARNING: SQL injection is possible here!)
If it is "generated", then generate the "=" version if there are no regexp codes.
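A rough Python sketch of that generator step, assuming the query is assembled in application code and the driver uses %s placeholders. ville_condition is a hypothetical helper; binding the value as a parameter instead of concatenating it also deals with the injection concern:

```python
import re

# Any of these characters means the input is meant as a regular expression.
_REGEX_META = re.compile(r'[.^$*+?()\[\]{}|\\]')


def ville_condition(value):
    """Return (sql_fragment, params) for the ville filter.

    Plain text gets the "=" form, which can use INDEX(ville, date);
    input containing regexp codes falls back to REGEXP.
    """
    if _REGEX_META.search(value):
        return "ville REGEXP %s", (value,)
    return "ville = %s", (value,)
```

ville_condition("Agadir") yields the "=" form, while something like "Agadir|Rabat" keeps REGEXP.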
Provide these:
INDEX(ville, date) -- for cases when you can do `ville = '...'`
INDEX(date) -- for cases when you must have `ville REGEXP '...'`
The first will be used (and reasonably optimal) when appropriate. The second is better than nothing. (It depends on how many rows have that date range.)
It smells like there may be other SELECTs. Let's see some other variants. What I have provided here may or may not help with them.
See my indexing cookbook: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

mysql regexp for search using alias

I am not very good with regexp so I really would like some help to achieve my goal.
When searching in my db I use an alias for specific keywords.
Here is an example
keyword tets alias test (someone misspelled the word test)
keyword b.m.w alias bmw (if someone writes b.m.w instead of bmw)
etc.
So far if a user searches for "bmw 316" I use LIKE "%bmw%316%" to get the results.
Now if the user searches for "b.m.w 316" I must use
"%b.m.w%316%" OR
"%bmw%316%"
because b.m.w has alias bmw.
In the case of 6 words with 2-3 aliases there are too many combinations.
I am trying to achieve it with regexp.
In the scenario above it would be something like (bmw|b.m.w) 316.
How do I solve this problem?
You are not looking for REGEXP; you are looking for a thing called Levenshtein distance.
MySQL does not (yet) have native support for this (wonderful) concept, but you can download a UDF here:
http://joshdrew.com/
And here's a list so you've got something to choose from:
http://blog.lolyco.com/sean/2008/08/27/damerau-levenshtein-algorithm-levenshtein-with-transpositions/
You can also write your own function in MySQL, so you don't have to install a UDF.
http://www.supermind.org/blog/927/working-mysql-5-1-levenshtein-stored-procedure
Finally this question might help you out as well:
Implementation of Levenshtein distance for mysql/fuzzy search?
A query for the closest match would look something like:
SELECT * FROM atable a ORDER BY levenshtein(a.field, '$search') ASC LIMIT 10
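To show what such a levenshtein function computes, here is a plain Python sketch of the classic dynamic-programming version (insertions, deletions and substitutions only, no transpositions). It is an illustration of the concept, not the code behind any of the linked UDFs:

```python
def levenshtein(a, b):
    """Edit distance between a and b: insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def closest(candidates, search, limit=10):
    """Rank candidates by distance to the search term, nearest first."""
    return sorted(candidates, key=lambda c: levenshtein(c, search))[:limit]
```

With this plain variant, "tets" and "b.m.w" are each at distance 2 from "test" and "bmw". A Damerau-Levenshtein variant would count the swapped letters in "tets" as a single edit.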