Getting results that match a minimum number of keywords from fulltext search - MySQL

I have a text field that I'm searching using an array of keywords. Right now, I'm either searching for all of the keywords or for any of them.
My question is: is there a way to pull results that match a minimum number of keywords?
For example, I'm searching for 6 keywords, but I only need 50% of them to match, so I want the fulltext search to return only results that match at least 3 of the keywords.
Is this even possible?
Maybe by using a fulltext modifier?

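When you run a fulltext search with the IN BOOLEAN MODE modifier, you can select the value that MATCH() returns and filter on it with HAVING; in some setups that value tracks how many of the search words matched.
Example:
SELECT id, MATCH (text) AGAINST ('MySQL Fulltext' IN BOOLEAN MODE) AS matches
FROM table_name
HAVING matches > 2;
Treat that value as engine-dependent rather than as a guaranteed count: InnoDB, for example, computes boolean-mode relevance with a TF-IDF-based ranking, so check the values you actually get before relying on a threshold such as > 2.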
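Because that relevance value is engine-dependent, a more explicit alternative is to give each keyword its own boolean-mode MATCH() and add up the 0/1 results of `MATCH(...) > 0`, which counts how many distinct keywords a row contains. The Python sketch below only builds that SQL. The helper name `min_match_query`, the `id` column and the `%s` placeholder style (as used by drivers such as PyMySQL or mysql-connector-python) are assumptions; it also assumes a FULLTEXT index on the searched column and plain keywords without boolean operators.

```python
def min_match_query(table, column, keywords, min_matches):
    """Build SQL that keeps rows containing at least `min_matches` of `keywords`.

    Hypothetical helper. `table` and `column` must be trusted identifiers;
    the keywords and the threshold are passed as driver parameters (%s style).
    """
    if not 1 <= min_matches <= len(keywords):
        raise ValueError("min_matches must be between 1 and len(keywords)")
    # MATCH(...) > 0 evaluates to 1 or 0, so the sum counts matched keywords.
    term = f"(MATCH({column}) AGAINST(%s IN BOOLEAN MODE) > 0)"
    matched = " + ".join([term] * len(keywords))
    sql = (
        f"SELECT id, {matched} AS matched_keywords\n"
        f"FROM {table}\n"
        # Any-keyword prefilter so the FULLTEXT index narrows the candidate rows.
        f"WHERE MATCH({column}) AGAINST(%s IN BOOLEAN MODE)\n"
        f"HAVING matched_keywords >= %s"
    )
    params = [*keywords, " ".join(keywords), min_matches]
    return sql, params


if __name__ == "__main__":
    sql, params = min_match_query(
        "table_name", "text", ["alpha", "beta", "gamma", "delta", "omega", "sigma"], 3
    )
    print(sql)
    print(params)
```

With six keywords and `min_matches=3`, the query keeps rows that contain at least half of them; pass `sql` and `params` to `cursor.execute()`. A keyword that is a stopword or shorter than the minimum indexed word length never matches, so it can never count toward the threshold.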

Related

MySQL Match Against for exact Phrase working partially

I have a table with a FULLTEXT index on a column called item_desc.
Let's say the table contains three records whose item_desc includes "Sodium Chloride", as follows:
Solution Sodium Chloride standard
5425 Sodium Chloride 100u
QtySodium Chloride solution
I have the following MATCH ... AGAINST query, which is supposed to return rows that match the phrase exactly, but it returns only the first two rows for Sodium Chloride and ignores the phrase when it is concatenated with another word, as in QtySodium Chloride.
SELECT * FROM tblhugedata WHERE MATCH(Item_desc) AGAINST('"*Sodium Chloride*"' IN BOOLEAN MODE);
The following LIKE query returns the expected results, but I want to use only the FULLTEXT index.
SELECT * FROM tblhugedata WHERE Item_desc like '%SODIUM CHLORIDE%';
Is there any way to get these results with MATCH ... AGAINST?
Remove the asterisks. FULLTEXT does not allow leading wildcards; that is, there is no way to get MATCH to match QtySodium against Sodium.
I would consider "QtySodium" to be "garbage in" and complain to the provider of the data.
Here is a kludge that will work in some cases:
WHERE MATCH(Item_desc) AGAINST('Sodium Chloride' IN BOOLEAN MODE)
AND Item_desc LIKE '%SODIUM CHLORIDE%'
That way, MATCH efficiently filters down to rows that contain either "Sodium" or "Chloride" as a whole word, and LIKE then checks those rows for the exact substring "Sodium Chloride". That will match your 3 examples, but perhaps not some others.
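To see why QtySodium is missed, it helps to model the two operations separately: the fulltext index stores whole words, so a quoted phrase must appear as consecutive whole words, while LIKE scans raw characters. The Python sketch below is a simplified model (its regex tokenizer only approximates MySQL's fulltext parser, and the helper names are hypothetical) that applies the phrase search, the LIKE search and the kludge above to the three example rows.

```python
import re


def tokens(text):
    # Simplified stand-in for the fulltext parser: lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())


def phrase_match(text, phrase):
    """'"Sodium Chloride"' IN BOOLEAN MODE: the phrase's words, consecutive and whole."""
    words, wanted = tokens(text), tokens(phrase)
    n = len(wanted)
    return any(words[i:i + n] == wanted for i in range(len(words) - n + 1))


def like_match(text, phrase):
    """LIKE '%Sodium Chloride%' under a case-insensitive collation: a raw substring."""
    return phrase.lower() in text.lower()


def kludge_match(text, phrase):
    """MATCH prefilter (any phrase word present as a whole word) AND the LIKE check."""
    return bool(set(tokens(text)) & set(tokens(phrase))) and like_match(text, phrase)


ROWS = [
    "Solution Sodium Chloride standard",
    "5425 Sodium Chloride 100u",
    "QtySodium Chloride solution",
]

if __name__ == "__main__":
    for row in ROWS:
        print(f"{row!r}: phrase={phrase_match(row, 'Sodium Chloride')}, "
              f"like={like_match(row, 'Sodium Chloride')}, "
              f"kludge={kludge_match(row, 'Sodium Chloride')}")
```

In this model the phrase search finds only the first two rows, while LIKE and the kludge find all three. For a value such as "QtySodium Chloride3", neither search word is a whole word, so the MATCH prefilter drops the row even though LIKE alone would find it; that is the kind of case the kludge can miss.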
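Alternatively, drop the operators and use natural language mode, which returns rows containing either word, ranked by relevance:
SELECT * FROM tblhugedata WHERE MATCH(Item_desc) AGAINST('Sodium Chloride' IN NATURAL LANGUAGE MODE);
Also note that InnoDB full-text search does not support the use of multiple operators on a single search word.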

Match against query not working - incorrect parameters

I have a table with articles and a table with categories. Each category has a number of keywords and I want to use those keywords to determine if an article belongs to a certain category.
I'm using the query below:
SELECT
    path,
    title,
    description,
    keywords
FROM
(
    SELECT
        path,
        keywords,
        (SELECT title FROM article WHERE id = 164016) AS title,
        (SELECT description FROM article WHERE id = 164016) AS description
    FROM
        categories c
) AS x
WHERE
    MATCH (title, description) AGAINST ('my keywords' IN BOOLEAN MODE)
For some reason, this query fails with an error about incorrect parameters to MATCH, but I can't figure out why.
Use boolean fulltext search to enable exact matching on an expression (a quoted phrase). However, such searches have some limitations, as the MySQL documentation on boolean full-text searches describes:
A phrase that is enclosed within double quote (") characters matches only rows that contain the phrase literally, as it was typed. The full-text engine splits the phrase into words and performs a search in the FULLTEXT index for the words. Nonword characters need not be matched exactly: Phrase searching requires only that matches contain exactly the same words as the phrase and in the same order. For example, "test phrase" matches "test, phrase".
If the phrase contains no words that are in the index, the result is empty. The words might not be in the index because of a combination of factors: if they do not exist in the text, are stopwords, or are shorter than the minimum length of indexed words.
In practice, this means that a phrase whose words are all stopwords or shorter than the minimum indexed length returns no matches, and a phrase that includes such words may not match the way you expect.
SELECT path, keywords, MATCH (c.keywords) AGAINST ('"Here is some text"' IN BOOLEAN MODE) as relevance
FROM categories c
WHERE MATCH (c.keywords) AGAINST ('"Here is some text"' IN BOOLEAN MODE)
If you want completely exact matches, you cannot use fulltext search; you need the LIKE operator or the = operator.
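Before relying on a phrase search, it can help to check which words of the phrase could be missing from the index. The Python sketch below is a simplified model with a hypothetical helper name: it assumes InnoDB defaults (a minimum token size of 3 and the default stopword list, of which only a few entries are reproduced), and real tokenization and stopword settings depend on your server configuration.

```python
import re

# Assumption: a few entries from InnoDB's default stopword list, not the complete
# list (query INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD for the real one).
STOPWORDS_SAMPLE = {"a", "an", "in", "is", "of", "the", "to"}
# InnoDB's default innodb_ft_min_token_size; MyISAM's ft_min_word_len defaults to 4.
MIN_TOKEN_SIZE = 3


def unindexed_words(phrase, stopwords=STOPWORDS_SAMPLE, min_len=MIN_TOKEN_SIZE):
    """Return the words of `phrase` that a FULLTEXT index would not contain."""
    words = re.findall(r"\w+", phrase.lower())
    return [w for w in words if w in stopwords or len(w) < min_len]


if __name__ == "__main__":
    print(unindexed_words("Here is some text"))
```

For the example phrase "Here is some text", the model flags only "is", which is both a default InnoDB stopword and shorter than three characters.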
UPDATE
In this case, title is a calculated field (it comes from a subquery in a derived table), so it does not have a FULLTEXT index. The documentation also says:
AGAINST() takes a string to search for, and an optional modifier that indicates what type of search to perform. The search string must be a string value that is constant during query evaluation. This rules out, for example, a table column because that can differ for each row.
(Source: MySQL Reference Manual, Full-Text Search Functions)
If your actual query passes a column name such as keywords to AGAINST(), that is not allowed either.
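Since AGAINST() only accepts a constant, one way to test an article against every category is to loop over the categories in application code and bind each category's keywords as the search string of a separate query against the base table. The Python sketch below assumes a FULLTEXT index on article(title, description) and a DB-API cursor that uses `%s` placeholders; the function names are hypothetical.

```python
# Runs against the base table, so MATCH() can use its FULLTEXT(title, description) index.
ARTICLE_MATCH_SQL = (
    "SELECT id FROM article "
    "WHERE id = %s AND MATCH(title, description) AGAINST(%s IN BOOLEAN MODE)"
)


def category_queries(article_id, categories):
    """Yield (path, sql, params) for each (path, keywords) category row.

    The keywords are bound as a parameter, so each query sees a constant search
    string, which is what AGAINST() requires.
    """
    for path, keywords in categories:
        yield path, ARTICLE_MATCH_SQL, (article_id, keywords)


def matching_categories(cursor, article_id, categories):
    """Return the paths of the categories whose keywords match the article."""
    matched = []
    for path, sql, params in category_queries(article_id, categories):
        cursor.execute(sql, params)
        # fetchall() consumes the result fully before the next execute().
        if cursor.fetchall():
            matched.append(path)
    return matched
```

`categories` could be the rows of `SELECT path, keywords FROM categories`; with the article id from the question, `matching_categories(cursor, 164016, rows)` returns the matching paths. This issues one query per category, which is simple but may be slow for a large category table.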

MySQL - Efficient search with partial word match and relevancy score (FULLTEXT)

How can I do a MySQL search which will match partial words but also provide accurate relevancy sorting?
SELECT name, MATCH(name) AGAINST ('math*' IN BOOLEAN MODE) AS relevance
FROM subjects
WHERE MATCH(name) AGAINST ('math*' IN BOOLEAN MODE)
The problem with boolean mode is that the relevance is always 1, so the results aren't sorted well. For example, if I limit the search to 5 results, the rows returned sometimes don't seem to be the most relevant ones.
If I search in natural language mode, my understanding is that the relevance score is useful, but I can't match partial words.
Is there a way to perform a query that fulfils all of these criteria:
Can match partial words
Results are returned with accurate relevancy
Is efficient
The best I've got so far is:
SELECT name
FROM subjects
WHERE name LIKE 'mat%'
UNION ALL
SELECT name
FROM subjects
WHERE name LIKE '%mat%' AND name NOT LIKE 'mat%'
But I would prefer not to be using LIKE.
The new InnoDB full-text search feature in MySQL 5.6 helps in this case.
I use the following query:
SELECT MATCH(`column`) AGAINST('(word1* word2*) ("word1 word2")' IN BOOLEAN MODE) AS score, id, `column`
FROM `table`
HAVING score > 0
ORDER BY score DESC
LIMIT 10;
Here, ( ) groups words into a subexpression. The first group matches words that start with word1 or word2 (similar to LIKE 'word%', applied to each word); the second looks for the exact phrase. The score is returned as a float, so ORDER BY score gives a usable ranking.
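When the words come from user input, the search string above has to be assembled in application code. The Python sketch below turns free text into the same `(word1* word2*) ("word1 word2")` shape; the helper name and the choice to strip boolean-mode operator characters (rather than escape them) are assumptions, made so that stray input is not interpreted as operators.

```python
import re

# Characters with operator meaning in MySQL boolean-mode search strings.
BOOLEAN_OPERATORS = re.compile(r'[+\-<>()~*"@]')


def build_against(user_input):
    """Build '(w1* w2*) ("w1 w2")': prefix match on each word plus the exact phrase."""
    words = BOOLEAN_OPERATORS.sub(" ", user_input).split()
    if not words:
        raise ValueError("no searchable words in input")
    prefixes = " ".join(word + "*" for word in words)
    phrase = " ".join(words)
    return f'({prefixes}) ("{phrase}")'


if __name__ == "__main__":
    print(build_against("math stat"))
```

The result is meant to be bound as a parameter, for example `cursor.execute("SELECT MATCH(name) AGAINST(%s IN BOOLEAN MODE) AS score, id, name FROM subjects HAVING score > 0 ORDER BY score DESC LIMIT 10", (build_against("math"),))`.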
I found a good solution a year later in this (somewhat) duplicate question:
MySQL - How to get search results with accurate relevance

difference between these 2 queries

I'm running two queries that should return the same result.
First query:
SELECT count( id ) AS cv FROM table_name WHERE field_name LIKE '%êêê01, word02, word03%'
Second query:
SELECT count( id ) AS cv FROM table_name WHERE match(field_name) against('êêê01, word02, word03')
But the first returns more rows than the second. Can someone tell me why?
I have a FULLTEXT index on this field.
Thanks.
I did some quick research, and the following quote should answer your question:
One problem with MATCH on MySQL is that it seems to only match against whole words so a search for 'bla' won't match a column with a value of 'blah'.
It's also described in the documentation for MATCH():
By default, the MATCH() function performs a natural language search for a string against a text collection. A collection is a set of one or more columns included in a FULLTEXT index. The search string is given as the argument to AGAINST(). For each row in the table, MATCH() returns a relevance value; that is, a similarity measure between the search string and the text in that row in the columns named in the MATCH() list.
LIKE, meanwhile, is more "powerful" because it matches individual characters:
Per the SQL standard, LIKE performs matching on a per-character basis, thus it can produce results different from the = comparison operator:
That explains why LIKE returns more results than MATCH.
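One more factor is worth checking for this particular pair of queries. In '%êêê01, word02, word03%', word02 is surrounded by ", " and "," so every row the LIKE query finds also contains word02 as a whole word; whole-word matching alone therefore cannot make MATCH return fewer rows here. On MyISAM tables, natural language mode also ignores words that appear in 50% or more of the rows, which can shrink or even empty the result. The Python sketch below is a simplified model of that threshold (regex tokenizer, hypothetical helper names, made-up rows), not MySQL's actual implementation.

```python
import re


def tokens(text):
    # Simplified word splitter standing in for MySQL's fulltext parser.
    return re.findall(r"\w+", text.lower())


def like_count(rows, pattern):
    """Rows that contain `pattern` as a case-insensitive substring (LIKE '%pattern%')."""
    return sum(pattern.lower() in row.lower() for row in rows)


def natural_language_count(rows, query, threshold=0.5):
    """Rows containing any query word, after dropping words found in >= threshold of rows.

    Simplified model of the MyISAM 50% threshold; threshold=None disables it.
    """
    row_words = [set(tokens(row)) for row in rows]
    usable = {
        word for word in tokens(query)
        if threshold is None
        or sum(word in words for words in row_words) < threshold * len(rows)
    }
    return sum(bool(words & usable) for words in row_words)


ROWS = [
    "êêê01, word02, word03",
    "êêê01, word02, word03 and more",
    "word02 only",
    "nothing relevant",
]
QUERY = "êêê01, word02, word03"

if __name__ == "__main__":
    print("LIKE:", like_count(ROWS, QUERY))
    print("MATCH with 50% threshold:", natural_language_count(ROWS, QUERY))
    print("MATCH without threshold:", natural_language_count(ROWS, QUERY, threshold=None))
```

In this tiny table every search word appears in at least half of the rows, so the thresholded model returns no rows while LIKE returns two; without the threshold, the any-word search returns three, more than LIKE.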

Fulltext search with relevance - why the need for a multiple columns index?

I have to implement fulltext search across multiple columns, with results weighted by the relevance of certain columns/fields.
All the solutions I've come across seem to use single-column indexes for calculating relevance and one multiple-column index for the WHERE clause. See: https://stackoverflow.com/a/600915/168719 or https://stackoverflow.com/a/6305108/168719
Here's my query then:
SELECT MATCH(name) AGAINST (text) as relevance_name,
MATCH(description) AGAINST(text) as relevance_description,
MATCH(description_long) AGAINST (text) as relevance_description_long
FROM products WHERE
And I'm facing the choice between:
a)
MATCH(name, description, description_long) AGAINST (text) > 0
b)
MATCH(name) AGAINST (text) > 0
OR MATCH(description) AGAINST (text) > 0
OR MATCH(description_long) AGAINST (text) > 0
Then comes the sorting clause:
ORDER BY (relevance_name * 2 +
relevance_description * 3 +
relevance_description_long * 4) / 9 DESC
The question is: what is the advantage of a (apparently the preferred method) over b?
a requires creating another fulltext index (across all searchable columns), which obviously takes more disk space.
What are the advantages? Is it a matter of performance? Or search quality?
The manual page 12.9.1, Natural Language Full-Text Searches, says:
For each row in the table, MATCH() returns a relevance value; that is, a similarity measure between the search string and the text in that row in the columns named in the MATCH() list.
Therefore, MATCH(c1, c2, c3) returns a different value than MATCH(c1) + MATCH(c2) + MATCH(c3), and combining separate MATCH() calls with OR differs from the single multi-column MATCH() in the same way.
Relevance is computed based on the number of words in the row, the number of unique words in that row, the total number of words in the collection, and the number of documents (rows) that contain a particular word.
You should use approach b, because its filter uses the same per-column MATCH() expressions as your relevance columns.
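Approach b can be generated from a single column-to-weight mapping, which keeps the filter, the relevance columns and the ORDER BY weights consistent. The Python sketch below only builds the SQL; it assumes each listed column has its own FULLTEXT index, an `id` column, trusted column names and `%s`-style driver placeholders, and the helper name is hypothetical. It sorts with DESC so the most relevant rows come first.

```python
def weighted_search_query(table, weights, text):
    """Build approach (b): per-column MATCH() in both the filter and the weighted score.

    `weights` maps a column name to its weight. Every AGAINST(%s) receives the
    same search string, so the parameter list repeats `text` once per placeholder.
    """
    columns = list(weights)
    select = ",\n       ".join(
        f"MATCH({col}) AGAINST(%s) AS relevance_{col}" for col in columns
    )
    where = "\n   OR ".join(f"MATCH({col}) AGAINST(%s) > 0" for col in columns)
    score = " + ".join(f"relevance_{col} * {weight}" for col, weight in weights.items())
    total = sum(weights.values())
    sql = (
        f"SELECT id,\n       {select}\n"
        f"FROM {table}\n"
        f"WHERE {where}\n"
        f"ORDER BY ({score}) / {total} DESC"
    )
    return sql, [text] * (2 * len(columns))


if __name__ == "__main__":
    sql, params = weighted_search_query(
        "products", {"name": 2, "description": 3, "description_long": 4}, "red shoes"
    )
    print(sql)
```

With the weights from the question, the generated ORDER BY clause is `(relevance_name * 2 + relevance_description * 3 + relevance_description_long * 4) / 9 DESC`.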