MySQL Match Against for exact Phrase working partially - mysql

I have a table in which I created FullText index in a column called item_desc.
Let's say table contains three records in which column item_desc includes "Sodium Chloride" like following:
Solution Sodium Chloride standard
5425 Sodium Chloride 100u
QtySodium Chloride solution
I have a following (Match, Against) query which supposed to be return rows by exact matching the records but it is returning only first two rows against Sodium Chloride and doesn't consider the phrase if it is concatenated with another word like QtySodium Chloride.
SELECT * FROM tblhugedata WHERE MATCH(Item_desc) AGAINST('"*Sodium Chloride*"' IN BOOLEAN MODE);
Following LIKE query returns expected results but I want to use only FullText index.
SELECT * FROM tblhugedata WHERE Item_desc like '%SODIUM CHLORIDE%';
Is there anyway to extract such results by match, Against way.

Remove the asterisks. FULLTEXT does not allow for leading wildcards. That is, there is no way to get MATCH to match QtySodium against Sodium.
I would consider "QtySodium" to be "garbage in" and complain to the provider of the data.
Here is a kludge that will work in some cases:
WHERE WHERE MATCH(Item_desc) AGAINST('Sodium Chloride' IN BOOLEAN MODE)
AND Item_desc LIKE '%SODIUM CHLORIDE%'
That way, it will efficiently filter down to rows that have either "Sodium" or "Chloride", then check such rows for exactly the substring "Sodium Chloride". That will match your 3 examples, but perhaps not some other examples.

SELECT * FROM tblhugedata WHERE MATCH(Item_desc) AGAINST( 'Sodium Chloride' IN NATURAL LANGUAGE MODE);
InnoDB full-text search does not support the use of multiple operators on a single search word.

Related

mysql match() agains() OR match() against() SLOW

I have table with quite many rows (~2M).
When i search it like
SELECT * FROM product WHERE
MATCH(name) AGAINST('+some' '+word' IN BOOLEAN MODE)
it works like charm and finds what i need in less than 0.5s.
But when i search for 2 sets of words like this
SELECT * FROM product WHERE
MATCH(name) AGAINST('+some' '-word' IN BOOLEAN MODE)
OR
MATCH(name) AGAINST('+something' '-other' IN BOOLEAN MODE)
search takes sometimes over minute.
I would expect it to work 2 times slower (it's 2 searches), maybe a bit more (you still have to compare results and remove duplicates, but if there are only few results it should not take long), but not so much longer. After adding OR it works slower, than LIKE "%...%" OR LIKE "%...%"
Anyone can tell me what am i doing wrong?
Unfortunately for you, fulltext indexes have some limitations, and not being able to properly merge the results of two independent fulltext searches is one them:
The Index Merge optimization algorithm has the following known limitations:
[...]
Index Merge is not applicable to full-text indexes.
Fortunately for you, fulltext searches can be more complex, so you can merge your searches yourself. Your second query can be written as a single search using:
SELECT * FROM product WHERE
MATCH(name) AGAINST('(+something -other) (+some -word)' IN BOOLEAN MODE)
This defines two search sets and is ok if either of the two (...) matches - which is an or.
Alternatively, you can use a union instead of an or, which allows MySQL to actually run two independent fulltext searches and then combine the two results, which is basically what you are thinking of:
SELECT * FROM product WHERE
MATCH(name) AGAINST('+some -word' IN BOOLEAN MODE)
UNION
SELECT * FROM product WHERE
MATCH(name) AGAINST('+something -other' IN BOOLEAN MODE)
This also works for more complicated situations, e.g. merging searches on different columns, but will not work that easy if you want to do something else than or.

Optimizing search query

This might seem to be a redundant question but i can't find the right answer to this issue.
I have a TableA with more than 50 columns.I am implementing a search functionality for searching a query in about 10 columns of this table. TableA contains more than a million rows
For this I have created a composite index on these 10 columns.
index (col1,col_2,col_3,col_4,col_5,col_6,col_7,col_8,col_9,col_10)
Now i am splitting user's query using space as regex. i.e. $search_words = $search_query.split(' '); and using individual words to match in my search query. Example :
SELECT something FROM tableA
WHERE ( MATCH ( col_1, col_2,col_3,col_4,col_5,col_6,col_7,col_8,col_9,col_10 )
AGAINST ( ' +word1* +word2* +word3* +word4* ' IN BOOLEAN MODE ) )
This query works fine for general searches but if users searches for individual alphabets in query like A E I O Co. it takes too much time. What is the best way to optimise the query or another way to perform search in this situation?
If you feed a too-short string to InnoDB's FULLTEXT, it returns zero results. So... Filter out any strings that are shorter than innodb_ft_min_token_size.
If necessary, test for them separately using REGEXP '[[:<:]]A[[:>:]] to look for a 1-letter word A.
Or throw them together. This would check for the only 1-letter English words: REGEXP '[[:<:]][AI][[:>:]]

Mysql Match Against Ranking

Im currently using a query for an autocomplete box with like. However I want to use the match, against which should be faster but I'm running against some issues with the sorting.
I want to rank a query like this:
[query] %
[query]%
% [query]%
%[query]%
For now I use
SELECT * FROM table
WHERE name LIKE '%query%'
ORDER BY (case
WHEN name LIKE 'query %' THEN 1
WHEN name LIKE 'query%' THEN 2
WHEN name LIKE '% query%' THEN 3
ELSE 4 END) ASC
When I use...
SELECT * FROM table
WHERE MATCH(name) AGAINST('query*' IN BOOLEAN MODE)
...all results get the same 'ranking score'.
For example searching for Natio
returns Pilanesberg National Park and National Park Kruger with the same score while I want the second result as first becouse it starts with the query.
How can I achieve this?
I had your same problem and I had to approach it in a different way.
The documentation of MySQL says:
The term frequency (TF) value is the number of times that a word appears in a document. The inverse document frequency (IDF) value of a word is calculated using the following formula, where total_records is the number of records in the collection, and matching_records is the number of records that the search term appears in.
${IDF} = log10( ${total_records} / ${matching_records} )
When a document contains a word multiple times, the IDF value is multiplied by the TF value:
${TF} * ${IDF}
Using the TF and IDF values, the relevancy ranking for a document is calculated using this formula:
${rank} = ${TF} * ${IDF} * ${IDF}
And this is followed by an example where it explains the above declaration: it search for the word 'database' in different fields and returns a rank based upon the results.
In your example the words "Pilanesberg National Park", "National Park Kruger" will return the same rank against ('Natio' IN BOOLEAN MODE)* because the rank is based not on the common sense similarity of the word (or in this case you'd expected to tell the database what's meaning -for you- "similar to"), but is based on the above formula, related to the frequency.
And note also that the value of the freqency is affected by the type of index (InnoDB or MyISAM) and by the version of MySQL (in older version you cannot use Full-text indexes with InnoDB tables).
Regarding your problem, you can use MySQL user defined variables or functions or procedures in order to evaluate the rank basing upon your idea of rank. Examples here, here or here. And also here.
See also:
MySQL match() against() - order by relevance and column?
MYsql FULLTEXT query yields unexpected ranking; why?

difference between these 2 queries

Well i'm running 2 queries that should show me the same result,
First query:
SELECT count( id ) AS cv FROM table_name WHERE field_name LIKE '%êêê01, word02, word03%'
Second query:
SELECT count( id ) AS cv FROM table_name WHERE match(field_name) against('êêê01, word02, word03')
but the first show more rows than the second, someone could tell me why?
I'm using fulltext index on this field,
Thanks.
I did a quick research and the following quote should answer your question:
One problem with MATCH on MySQL is that it seems to only match against whole words so a search for 'bla' won't match a column with a value of 'blah'.
It's also described in the documentation for match
By default, the MATCH() function performs a natural language search for a string against a text collection. A collection is a set of one or more columns included in a FULLTEXT index. The search string is given as the argument to AGAINST(). For each row in the table, MATCH() returns a relevance value; that is, a similarity measure between the search string and the text in that row in the columns named in the MATCH() list.
Meanwhile like is more "powerful" as it can look upon individuals characters:
Per the SQL standard, LIKE performs matching on a per-character basis, thus it can produce results different from the = comparison operator:
Which explains why like returns more results than match.

Getting a minimum number of results from fulltext search

I have a text field that I'm searching against using an array of keywords and right now, I'm either searching for all of the keywords or any of the keywords.
My question is: is there a way to pull results with a minimum number of keywords?
For example, I'm searching for 6 keywords, but I only need 50% of them to match, so I want the fulltext search to only return results that have matched at least 3 of the keywords.
Is this even possible?
Maybe by using a FullText Modifier?
When you do a fulltext search using the IN BOOLEAN MODE modifier in your select statement, it can display the number of matches in the search.
Example:
SELECT id, MATCH (text) AGAINST ('MySQL Fulltext' IN BOOLEAN MODE) AS matches
FROM table_name
HAVING matches > 2;