Difference between these two queries - MySQL

I'm running two queries that should give me the same result.
First query:
SELECT count( id ) AS cv FROM table_name WHERE field_name LIKE '%êêê01, word02, word03%'
Second query:
SELECT count( id ) AS cv FROM table_name WHERE match(field_name) against('êêê01, word02, word03')
But the first returns a higher count than the second. Could someone tell me why?
I have a FULLTEXT index on this field.
Thanks.

I did some quick research, and the following quote should answer your question:
One problem with MATCH on MySQL is that it seems to only match against whole words so a search for 'bla' won't match a column with a value of 'blah'.
It's also described in the documentation for MATCH():
By default, the MATCH() function performs a natural language search for a string against a text collection. A collection is a set of one or more columns included in a FULLTEXT index. The search string is given as the argument to AGAINST(). For each row in the table, MATCH() returns a relevance value; that is, a similarity measure between the search string and the text in that row in the columns named in the MATCH() list.
Meanwhile, LIKE is more "powerful" because it can match on individual characters:
Per the SQL standard, LIKE performs matching on a per-character basis, thus it can produce results different from the = comparison operator.
This explains why LIKE returns more results than MATCH.
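To make the difference concrete, here is a minimal Python sketch that imitates the two behaviours. It is only a model, not MySQL's actual implementation: the helper names like_contains and matches_whole_word are made up for illustration, and the sketch ignores stopwords, minimum token length, and relevance scoring.

```python
import re

def like_contains(value: str, needle: str) -> bool:
    """Rough stand-in for LIKE '%needle%': case-insensitive substring test."""
    return needle.lower() in value.lower()

def matches_whole_word(value: str, word: str) -> bool:
    """Rough stand-in for full-text matching: compare against whole tokens only."""
    tokens = re.findall(r"\w+", value.lower())
    return word.lower() in tokens

# Hypothetical column values, echoing the 'bla' / 'blah' example above.
rows = ["bla", "blah", "some bla text"]

like_hits = [r for r in rows if like_contains(r, "bla")]
word_hits = [r for r in rows if matches_whole_word(r, "bla")]

print(like_hits)  # every row contains the substring 'bla'
print(word_hits)  # 'blah' is missing: it has no token equal to 'bla'
```

The substring test accepts 'blah' while the token-based test does not, which is the same reason the LIKE count can be higher than the MATCH count.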

Related

MySQL Match Against for exact Phrase working partially

I have a table in which I created a FULLTEXT index on a column called item_desc.
Let's say the table contains three records in which the item_desc column includes "Sodium Chloride", as follows:
Solution Sodium Chloride standard
5425 Sodium Chloride 100u
QtySodium Chloride solution
I have the following MATCH ... AGAINST query, which is supposed to return rows by matching the exact phrase, but it returns only the first two rows for Sodium Chloride and ignores the phrase when it is concatenated with another word, as in QtySodium Chloride.
SELECT * FROM tblhugedata WHERE MATCH(Item_desc) AGAINST('"*Sodium Chloride*"' IN BOOLEAN MODE);
The following LIKE query returns the expected results, but I want to use only the FULLTEXT index.
SELECT * FROM tblhugedata WHERE Item_desc like '%SODIUM CHLORIDE%';
Is there any way to get these results using MATCH ... AGAINST?
Remove the asterisks. FULLTEXT does not allow for leading wildcards. That is, there is no way to get MATCH to match QtySodium against Sodium.
I would consider "QtySodium" to be "garbage in" and complain to the provider of the data.
Here is a kludge that will work in some cases:
WHERE MATCH(Item_desc) AGAINST('Sodium Chloride' IN BOOLEAN MODE)
AND Item_desc LIKE '%SODIUM CHLORIDE%'
That way, it will efficiently filter down to rows that have either "Sodium" or "Chloride", then check such rows for exactly the substring "Sodium Chloride". That will match your 3 examples, but perhaps not some other examples.
SELECT * FROM tblhugedata WHERE MATCH(Item_desc) AGAINST( 'Sodium Chloride' IN NATURAL LANGUAGE MODE);
InnoDB full-text search does not support the use of multiple operators on a single search word.

MySQL Match Against w/ additional condition: Use subquery?

Suppose we have a table such as
CREATE TABLE test (
title VARCHAR(32),
city VARCHAR(32),
description TEXT
...
And in a query say we have
SELECT * FROM test WHERE MATCH(title, description) AGAINST('xyz' IN NATURAL LANGUAGE MODE) AND city = 'ABC';
Will MySQL know to use the "city" condition first, or should we be more explicit and use a subquery?
From looking at the code, the MATCH expression will be evaluated already in the query optimization phase. This means that all the rows containing 'xyz' will be identified before the condition on city is considered. (At least, this is how I understand it to work for InnoDB. I do not know the details of how this is implemented in MyISAM.) During query execution, when the WHERE clause is evaluated, expressions are evaluated from left to right. (This is the current implementation, and may change in future versions.) Since MATCH scores have already been computed, at this point one just evaluates whether they are non-zero.
If your city column is indexed, the query optimizer may choose to use this index to scan only rows from the given city, and compare the MATCH scores for just these rows. However, all rows containing 'xyz' are still first identified. The EXPLAIN output for the query will show if the index is used.
I doubt that using a subquery will help anything. If the subquery is correlated, you may even risk that the full-text search is performed multiple times.

Optimizing search query

This might seem to be a redundant question, but I can't find the right answer to this issue.
I have a TableA with more than 50 columns. I am implementing search functionality that matches a query against about 10 columns of this table. TableA contains more than a million rows.
For this I have created a composite index on these 10 columns.
index (col_1,col_2,col_3,col_4,col_5,col_6,col_7,col_8,col_9,col_10)
Now I am splitting the user's query on spaces, i.e. $search_words = $search_query.split(' ');, and using the individual words in my search query. Example:
SELECT something FROM tableA
WHERE ( MATCH ( col_1, col_2,col_3,col_4,col_5,col_6,col_7,col_8,col_9,col_10 )
AGAINST ( ' +word1* +word2* +word3* +word4* ' IN BOOLEAN MODE ) )
This query works fine for general searches, but if the user searches for individual letters, like A E I O Co., it takes too much time. What is the best way to optimise the query, or is there another way to perform the search in this situation?
If you feed a too-short string to InnoDB's FULLTEXT, it returns zero results. So... Filter out any strings that are shorter than innodb_ft_min_token_size.
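If necessary, test for them separately using REGEXP '[[:<:]]A[[:>:]]' to look for a 1-letter word A.
Or throw them together. This would check for the only 1-letter English words: REGEXP '[[:<:]][AI][[:>:]]'
On MySQL 8.0.4 and later, which use the ICU regular expression library, these word-boundary markers are not supported; use \\b instead.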
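The filtering step could be done in application code before the query is built. The following is a minimal Python sketch under a few assumptions: the minimum token size is hard-coded to 3 (InnoDB's default for innodb_ft_min_token_size; check your server's setting), and split_terms and boolean_query are hypothetical helper names, not part of any library.

```python
import re

MIN_TOKEN_SIZE = 3  # assumed: InnoDB's default innodb_ft_min_token_size

def split_terms(search_query, min_size=MIN_TOKEN_SIZE):
    """Split user input into words long enough to index and words that are too short."""
    # \w+ also drops punctuation and boolean operators such as + - * "
    words = re.findall(r"\w+", search_query)
    long_words = [w for w in words if len(w) >= min_size]
    short_words = [w for w in words if len(w) < min_size]
    return long_words, short_words

def boolean_query(words):
    """Build the '+word1* +word2*' string used with AGAINST(... IN BOOLEAN MODE)."""
    return " ".join(f"+{w}*" for w in words)

long_words, short_words = split_terms("A E I O Co. sodium chloride")
against = boolean_query(long_words)

print(against)      # only the words long enough for the full-text index
print(short_words)  # candidates for a separate REGEXP or LIKE check, or to be dropped
```

Only the long words go into the AGAINST string; the short ones can be checked separately or ignored, depending on how much they matter for the search.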

Mysql Match Against Ranking

I'm currently using a LIKE query for an autocomplete box. However, I want to use MATCH ... AGAINST, which should be faster, but I'm running into some issues with the sorting.
I want to rank a query like this:
[query] %
[query]%
% [query]%
%[query]%
For now I use
SELECT * FROM table
WHERE name LIKE '%query%'
ORDER BY (case
WHEN name LIKE 'query %' THEN 1
WHEN name LIKE 'query%' THEN 2
WHEN name LIKE '% query%' THEN 3
ELSE 4 END) ASC
When I use...
SELECT * FROM table
WHERE MATCH(name) AGAINST('query*' IN BOOLEAN MODE)
...all results get the same 'ranking score'.
For example, searching for Natio returns Pilanesberg National Park and National Park Kruger with the same score, while I want the second result first because it starts with the query.
How can I achieve this?
I had the same problem and had to approach it in a different way.
The MySQL documentation says:
The term frequency (TF) value is the number of times that a word appears in a document. The inverse document frequency (IDF) value of a word is calculated using the following formula, where total_records is the number of records in the collection, and matching_records is the number of records that the search term appears in.
${IDF} = log10( ${total_records} / ${matching_records} )
When a document contains a word multiple times, the IDF value is multiplied by the TF value:
${TF} * ${IDF}
Using the TF and IDF values, the relevancy ranking for a document is calculated using this formula:
${rank} = ${TF} * ${IDF} * ${IDF}
This is followed by an example that illustrates the above: it searches for the word 'database' in different fields and returns a rank based on the results.
In your example, "Pilanesberg National Park" and "National Park Kruger" return the same rank for AGAINST('Natio*' IN BOOLEAN MODE) because the rank is not based on a common-sense notion of similarity (for that, you would have to tell the database what "similar to" means for you) but on the above formula, which depends on frequency.
Note also that the frequency value is affected by the storage engine (InnoDB or MyISAM) and by the MySQL version (in older versions you cannot use full-text indexes with InnoDB tables).
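As a rough illustration of the quoted formula, here is a small Python sketch. The record counts are made-up numbers, and the function names idf and rank are chosen for this example only; the real score also depends on engine and version details not modelled here.

```python
import math

def idf(total_records: int, matching_records: int) -> float:
    """Inverse document frequency as quoted: log10(total_records / matching_records)."""
    return math.log10(total_records / matching_records)

def rank(tf: int, total_records: int, matching_records: int) -> float:
    """Relevancy ranking as quoted: TF * IDF * IDF."""
    value = idf(total_records, matching_records)
    return tf * value * value

# Hypothetical collection: 100 rows, 10 of which contain a word starting with 'Natio'.
pilanesberg = rank(tf=1, total_records=100, matching_records=10)  # 'Pilanesberg National Park'
kruger = rank(tf=1, total_records=100, matching_records=10)       # 'National Park Kruger'

print(pilanesberg == kruger)  # the position of the word in the name plays no role
```

Both names contain the term once, so TF and IDF are identical and so is the rank; nothing in the formula looks at where in the string the word occurs.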
Regarding your problem, you can use MySQL user-defined variables, functions, or procedures to compute the rank according to your own idea of ranking.
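One alternative, sketched here in Python rather than as a MySQL function or procedure, is to let the full-text search select the candidate rows and apply the custom ordering in application code. The sketch mirrors the CASE expression from the question; prefix_rank is a hypothetical helper name, and the list of names stands in for the rows returned by the query.

```python
def prefix_rank(name: str, query: str) -> int:
    """Mirror the CASE ordering from the question: lower values sort first."""
    n, q = name.lower(), query.lower()
    if n.startswith(q + " "):   # 'query %'
        return 1
    if n.startswith(q):         # 'query%'
        return 2
    if (" " + q) in n:          # '% query%'
        return 3
    return 4                    # '%query%'

# Rows as they might come back from the MATCH ... AGAINST query (illustrative data).
names = ["Pilanesberg National Park", "National Park Kruger"]
ordered = sorted(names, key=lambda n: prefix_rank(n, "Natio"))

print(ordered)
```

Because sorted is stable, rows with the same custom rank keep the order in which the database returned them, so a relevance ORDER BY in the query can still act as a tie-breaker.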
See also:
MySQL match() against() - order by relevance and column?
MYsql FULLTEXT query yields unexpected ranking; why?

Getting a minimum number of results from fulltext search

I have a text field that I'm searching against using an array of keywords and right now, I'm either searching for all of the keywords or any of the keywords.
My question is: is there a way to pull results with a minimum number of keywords?
For example, I'm searching for 6 keywords, but I only need 50% of them to match, so I want the fulltext search to only return results that have matched at least 3 of the keywords.
Is this even possible?
Maybe by using a FullText Modifier?
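When you do a full-text search using the IN BOOLEAN MODE modifier, MATCH() returns a relevance value for each row, which you can select and filter on. Note that this value is a relevance score rather than a strict count of matched keywords, so the threshold may need tuning for your engine and data.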
Example:
SELECT id, MATCH (text) AGAINST ('MySQL Fulltext' IN BOOLEAN MODE) AS matches
FROM table_name
HAVING matches > 2;
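If a strict count of matched keywords is needed, one possible variation is to test each keyword with its own MATCH() and add up the results, since a comparison such as (... > 0) evaluates to 1 or 0 in MySQL. The Python sketch below only builds such a statement; min_match_query is a hypothetical helper, the column and table names are taken from the example above, and %s is the placeholder style used by common Python MySQL drivers. Running one MATCH() per keyword may be slow on large tables, so treat this as a starting point to measure rather than a recommendation.

```python
def min_match_query(keywords, min_matches, column="text", table="table_name"):
    """Build a query that counts how many keywords match and filters on that count."""
    # One boolean-mode MATCH per keyword; each comparison yields 1 or 0 in MySQL.
    per_keyword = " + ".join(
        f"(MATCH ({column}) AGAINST (%s IN BOOLEAN MODE) > 0)" for _ in keywords
    )
    sql = (
        f"SELECT id, {per_keyword} AS matches "
        f"FROM {table} HAVING matches >= %s"
    )
    # Keywords are passed as parameters, never concatenated into the SQL text.
    params = list(keywords) + [min_matches]
    return sql, params

# Six keywords, at least half of which must match (illustrative values).
keywords = ["mysql", "fulltext", "search", "index", "boolean", "query"]
sql, params = min_match_query(keywords, min_matches=3)

print(sql)
print(params)
```

The resulting pair can be passed to a driver call such as cursor.execute(sql, params).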