Mysql fulltext index search returning weird result - mysql

I have an ingredient table, and I want all the recipes that contain certain ingredients. Below is my table structure.
Table (ingredient), with a fulltext index applied on the ingredient column.
------------------------------------------------------
ingredientID rcteID ingredient
310 1 Mint Leaves
311 1 Corriender Leaves
312 1 GreenChili
I am trying to fetch the above records with the fulltext search query below, but I am not getting them.
SELECT `Ingredient`.`ingredientID` , `Ingredient`.`rcteID`
FROM `ingredient` AS `Ingredient`
WHERE MATCH (`Ingredient`.`ingredient`)
AGAINST ('+Mint Leaves +Corriender Leaves +Greenchili' IN BOOLEAN MODE)
AND `Ingredient`.`rcteID`
IN ( 1 )
GROUP BY `Ingredient`.`rcteID`
Why is the above query not working for the above records?
When I tried the query below, it worked. I only changed the order of the search text.
SELECT `Ingredient`.`ingredientID` , `Ingredient`.`rcteID`
FROM `ingredient` AS `Ingredient`
WHERE MATCH (`Ingredient`.`ingredient`)
AGAINST ('+Greenchili +Mint Leaves +Corriender Leaves' IN BOOLEAN MODE)
AND `Ingredient`.`rcteID`
IN ( 1 )
GROUP BY `Ingredient`.`rcteID`
OUTPUT
--------------------
ingredientID rcteID
311 1
I don't understand what's going on. Why does the first query return no result while the second query returns one?

This is not a real explanation, but you can run this query to see the score.
SELECT MATCH (`Ingredient`.`ingredient`)
AGAINST ('+Mint Leaves +Corriender Leaves +Greenchili' IN BOOLEAN MODE)
FROM `ingredient` AS `Ingredient`
WHERE MATCH (`Ingredient`.`ingredient`)
AGAINST ('+Mint Leaves +Corriender Leaves +Greenchili' IN BOOLEAN MODE)
I believe your query means: find ingredient rows that each contain ALL of the required terms (Mint, Corriender, Greenchili). In boolean mode, + applies only to the word that immediately follows it, so Leaves is an optional term here. No single row in your data set contains all of the required words, so MySQL cannot find a match.
However, if you put the terms of your query into brackets, it is a different story:
SELECT `Ingredient`.`ingredientID` , `Ingredient`.`rcteID`
FROM `ingredient` AS `Ingredient`
WHERE MATCH (`Ingredient`.`ingredient`)
AGAINST ('(+Greenchili) (+Mint Leaves) (+Corriender Leaves)' IN BOOLEAN MODE)
AND `Ingredient`.`rcteID`
IN ( 1 )
GROUP BY `Ingredient`.`rcteID`
This query can be translated into: fetch the ingredients that contain AT LEAST one of Mint Leaves, Corriender Leaves, or Greenchili, and group them by rcteID.
UPDATED:
SELECT t1.rcteID FROM `ingredient` t1
JOIN `ingredient` t2 ON t2.rcteID = t1.rcteID
JOIN `ingredient` t3 ON t3.rcteID = t2.rcteID
WHERE
MATCH (t1.`ingredient`) AGAINST ('+Greenchili' IN BOOLEAN MODE)
AND
MATCH (t2.`ingredient`) AGAINST ('+Mint Leaves' IN BOOLEAN MODE)
AND
MATCH (t3.`ingredient`) AGAINST ('+Corriender Leaves' IN BOOLEAN MODE)
AND t1.`rcteID` IN ( 1 )
GROUP BY t1.`rcteID`
I think this query will work for you. It shares the same idea as yours, but it looks for the 3 keywords separately and only returns the rcteID values that contain all 3 ingredients.
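As an alternative to one self-join per keyword, the same "recipe must contain all of these ingredients" requirement can be sketched with GROUP BY and HAVING. This is only an illustration under stated assumptions: it runs against an in-memory SQLite database from Python, uses a case-insensitive equality test as a stand-in for MATCH ... AGAINST, and the rows for recipe 2 and the helper name recipes_with_all are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ingredient (
        ingredientID INTEGER PRIMARY KEY,
        rcteID       INTEGER NOT NULL,
        ingredient   TEXT    NOT NULL
    );
    INSERT INTO ingredient VALUES
        (310, 1, 'Mint Leaves'),
        (311, 1, 'Corriender Leaves'),
        (312, 1, 'GreenChili'),
        (313, 2, 'Mint Leaves'),   -- hypothetical recipe 2: only one match
        (314, 2, 'Salt');
""")

def recipes_with_all(conn, wanted):
    """Return rcteIDs of recipes that contain every ingredient in `wanted`."""
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT rcteID
        FROM ingredient
        WHERE LOWER(ingredient) IN ({placeholders})  -- stand-in for MATCH ... AGAINST
        GROUP BY rcteID
        HAVING COUNT(DISTINCT LOWER(ingredient)) = ?  -- every wanted item was found
        ORDER BY rcteID
    """
    params = [w.lower() for w in wanted] + [len(wanted)]
    return [row[0] for row in conn.execute(sql, params)]

matching = recipes_with_all(conn, ["Mint Leaves", "Corriender Leaves", "Greenchili"])
print(matching)
```

The number of joins stays constant however many ingredients are requested, because only the IN list and the count in HAVING change. In MySQL the WHERE clause would be replaced by whatever fulltext condition selects the candidate rows.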

Related

MySQL query with a subquery takes significantly longer when using a full text in a where, rather than an order by

I have a query which sometimes runs really fast and sometimes incredibly slowly depending on the number of results that match a full text boolean search within the query.
The query also contains a subquery.
Without the subquery the main query is always fast.
The subquery by itself is also always fast.
But together they are very slow.
Removing the full text search from a where clause and instead ordering by the full text search is really fast.
So it's only slow when using a full text search within a where.
That's the simple readable overview, exact queries are below.
I've included the schema at the bottom although it will be difficult to replicate without my dataset which unfortunately I can't share.
I've included the counts and increments in the example queries to give some indication of the data size involved.
I actually have a solution: simply accept a result that includes irrelevant data and then filter out that data in PHP. But I'd like to understand why my queries are performing poorly and how I might be able to resolve the issue in MySQL.
In particular, I'm confused about why it's fast with the full text search in an order by but not with it in the where.
The query I want (slow)
I've got a query that looks like this:
select
*,
MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score
from
`app_records`
where
`id` in (
select
distinct(app_record_parents.record_id)
from
`app_group_records`
inner join `app_record_parents`
on `app_record_parents`.`parent_id` = `app_group_records`.`record_id`
where
`group_id` = 3
)
and
MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE)
order by
`relevance_score` desc
limit
10;
This query takes 10 seconds.
This is too long for this sort of query, I need to be looking at milliseconds.
But the two queries run really fast when run by themselves.
The sub select by itself
select distinct(app_record_parents.record_id)
from
`app_group_records`
inner join
`app_record_parents`
on `app_record_parents`.`parent_id` = `app_group_records`.`record_id`
where
`group_id` = 3
The sub select by itself takes 7ms with 2600 results.
The main query without the sub select
select
*,
MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score
from
`app_records`
where
MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE)
order by
`relevance_score` desc
limit
10;
The main query without the sub select takes 6ms with 2971 possible results (obviously there's a limit 10 there).
It's faster with fewer results
The same query, but matching against "Old Traf" rather than "Old Tra", takes 300ms.
The number of results is obviously different when using "Old Traf" vs "Old Tra".
Results of full query
"Old Tra": 9
"Old Traf": 2
Records matching the full text search
"Old Tra": 2971
"Old Traf": 120
Removing the where solves the issue
Removing the where and returning all records sorted by the relevance score is really fast and still gives me the experience I'd like:
select
*,
MATCH (name) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score
from
`app_records`
where
`id` in (
select
distinct(app_record_parents.record_id)
from
`app_group_records`
inner join `app_record_parents`
on `app_record_parents`.`parent_id` = `app_group_records`.`record_id`
where
`group_id` = 3
)
order by
`relevance_score` desc
limit
10;
But then I need to filter out irrelevant results in code
I'm using this in PHP, so I can filter my results to remove any that have a relevance score of 0. (If there are only 2 matches, for instance, 8 random results with a relevance score of 0 will still be included, since I'm not using a where.)
array_filter($results, function($result) {
return $result->relevance_score > 0;
});
Obviously this is really quick so it's not really a problem.
But I still don't understand what's wrong with my queries.
So I do have a fix as outlined above. But I still don't understand why my queries are slow.
It's clear that the number of possible results from the full text search is causing an issue, but exactly why and how to get around this issue is beyond me.
Table Schema
Here are my tables
CREATE TABLE `app_records` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`type` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
FULLTEXT KEY `app_models_name_IDX` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=960004 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE `app_record_parents` (
`record_id` int(10) unsigned NOT NULL,
`parent_id` int(10) unsigned DEFAULT NULL,
KEY `app_record_parents_record_id_IDX` (`record_id`) USING BTREE,
KEY `app_record_parents_parent_id_IDX` (`parent_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE `app_group_records` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`group_id` int(10) unsigned NOT NULL,
`record_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=31 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
A note on what the queries are doing
The subquery is getting a list of record_id's that belong to group_id 3.
So while there are 960004 records in app_records, only 2600 belong to group 3, and it is against these 2600 that I'm trying to find names that match "Old Tra".
So the subquery gets the list of these 2600 record_ids, and then I'm doing a WHERE id IN <subquery> to get the relevant results from app_records.
EDIT: Using joins is equally slow
Just to add: using joins has the same issue. It takes 10 seconds for "Old Tra" and 400ms for "Old Traf", and it is very fast when the full text search is not in the where.
SELECT
app_records.*,
MATCH (NAME) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score
FROM
`app_records`
INNER JOIN app_record_parents ON app_records.id = app_record_parents.record_id
INNER JOIN app_group_records ON app_group_records.record_id = app_record_parents.parent_id
WHERE
`group_id` = 3
AND MATCH (NAME) AGAINST ('Old Tra*' IN BOOLEAN MODE)
GROUP BY
app_records.id
LIMIT
10;
app_record_parents
Has no PRIMARY KEY; hence may have unnecessary duplicate pairs.
Does not have optimal indexes.
See this for several tips.
Perhaps app_group_records is also many-many?
Are you searching for Old Tra* anywhere in name? If not, then why not use WHERE name LIKE 'Old Tra%'? In that case, add INDEX(name).
Note: When FULLTEXT is involved, it is picked first. Please provide EXPLAIN SELECT to confirm this.
This formulation may be faster:
select *,
MATCH (r.name) AGAINST ('Old Tra*' IN BOOLEAN MODE) AS relevance_score
from `app_records` AS r
WHERE MATCH (r.name) AGAINST ('Old Tra*' IN BOOLEAN MODE)
AND EXISTS ( SELECT 1
FROM app_group_records AS gr
JOIN app_record_parents AS rp ON rp.parent_id = gr.record_id
WHERE gr.group_id = 3
AND r.id = rp.record_id )
ORDER BY relevance_score DESC
LIMIT 10
Indexes:
gr: (group_id, record_id) -- in this order
r: nothing but the FULLTEXT will be used
rp: (record_id, parent_id) -- in this order
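The EXISTS formulation and the two composite indexes can be tried out on a toy data set. The following is a rough sketch only: it uses an in-memory SQLite database from Python, a prefix LIKE as a stand-in for MATCH ... AGAINST, and invented sample rows and index names, so it shows the shape of the query and indexes rather than MySQL's actual execution plan or timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_records (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE app_record_parents (record_id INTEGER NOT NULL, parent_id INTEGER);
    CREATE TABLE app_group_records (
        id INTEGER PRIMARY KEY,
        group_id INTEGER NOT NULL,
        record_id INTEGER NOT NULL);

    -- Composite indexes in the column order suggested above (names are made up).
    CREATE INDEX gr_group_record  ON app_group_records (group_id, record_id);
    CREATE INDEX rp_record_parent ON app_record_parents (record_id, parent_id);

    -- Invented sample data.
    INSERT INTO app_records VALUES
        (1, 'Old Trafford'), (2, 'Old Tram Depot'), (3, 'New Street'), (4, 'Old Trail');
    INSERT INTO app_group_records (group_id, record_id) VALUES (3, 100), (4, 200);
    INSERT INTO app_record_parents VALUES (1, 100), (2, 200), (3, 100), (4, 100);
""")

sql = """
    SELECT r.id, r.name
    FROM app_records AS r
    WHERE r.name LIKE ?              -- stand-in for MATCH ... AGAINST
      AND EXISTS (SELECT 1
                  FROM app_group_records AS gr
                  JOIN app_record_parents AS rp ON rp.parent_id = gr.record_id
                  WHERE gr.group_id = ?
                    AND rp.record_id = r.id)
    ORDER BY r.id
    LIMIT 10
"""
rows = conn.execute(sql, ("Old Tra%", 3)).fetchall()
print(rows)
```

Record 2 matches the text condition but belongs to group 4, and record 3 belongs to group 3 but fails the text condition, so neither is returned.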

FULLTEXT search on two columns of different tables in MySQL

Can anyone help me to find the query for fulltext search?
I have two tables, Product and Generic.
Table-Product:
1. ProductID (Integer)
2. GenericID (Integer)-FK
3. Product_Name (Varchar)
And in Table-Generic:
1. GenericID (Integer)
2. Generic_Name (Varchar)
What I need is to search the input string with the combined name of both Product_Name and Generic_Name.
My sample query is given below:
SELECT
prod.ProductID AS ID,
generic.Generic_Name AS genericName,
prod.Product_Name AS packageName
FROM
Product prod
INNER JOIN
Generic generic ON prod.GenericID = generic.GenericID
WHERE
MATCH (prod.Product_Name ,generic.Generic_Name) AGAINST('+acb* +ace* +serr* +para*' IN BOOLEAN MODE)
ORDER BY prod.Product_Name ASC
It doesn't work because the columns are in different tables.
FULLTEXT search operations each use a FULLTEXT index. That index can only be on one table.
So, you could try using two fulltext search operations...
WHERE (
match(prod.Product_Name) against('+acb* +ace* +serr* +para*' in boolean mode)
OR
match(generic.Generic_Name) against('+acb* +ace* +serr* +para*' in boolean mode)
)
Or, for best performance and result-set ranking, you could build a new name table like this
GenericId NOT a primary key
IsGeneric 1 or 0
Name either Product_Name or Generic_Name
You would construct this table from the union of the names in your other two tables. For example, it might contain
4321 0 Advil
4321 0 Motrin
4321 1 Ibuprofen
4322 0 Coumadin
4322 1 Warfarin
Then, a query like this would do the trick
select prod.ProductID AS ID,
generic.Generic_Name AS genericName,
prod.Product_Name AS packageName
FROM Product prod
INNER JOIN Generic generic ON prod.GenericID = generic.GenericID
INNER JOIN Name ON Name.GenericID = prod.GenericID
WHERE MATCH(Name.Name) AGAINST('+acb* +ace* +serr* +para*' in boolean mode)
ORDER BY prod.Product_Name ASC
The second alternative is more work to program. But, because it puts both tradenames and generic names into a single fulltext index, it will be faster and it is likely to give better results.
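To make the second alternative more concrete, here is a minimal sketch of building the combined name table from the union of the two existing tables and then searching it. It rests on several assumptions: it uses an in-memory SQLite database from Python, a prefix LIKE stands in for MATCH ... AGAINST, the ProductID values are invented, and search is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Generic (GenericID INTEGER PRIMARY KEY, Generic_Name TEXT NOT NULL);
    CREATE TABLE Product (
        ProductID    INTEGER PRIMARY KEY,
        GenericID    INTEGER NOT NULL REFERENCES Generic (GenericID),
        Product_Name TEXT NOT NULL);

    INSERT INTO Generic VALUES (4321, 'Ibuprofen'), (4322, 'Warfarin');
    INSERT INTO Product VALUES
        (1, 4321, 'Advil'), (2, 4321, 'Motrin'), (3, 4322, 'Coumadin');

    -- Combined lookup table: one row per trade name and one per generic name.
    CREATE TABLE Name (
        GenericID INTEGER NOT NULL,
        IsGeneric INTEGER NOT NULL,
        Name      TEXT NOT NULL);
    INSERT INTO Name (GenericID, IsGeneric, Name)
        SELECT GenericID, 0, Product_Name FROM Product
        UNION ALL
        SELECT GenericID, 1, Generic_Name FROM Generic;
""")

def search(conn, prefix):
    """Find products whose trade name or generic name starts with `prefix`."""
    sql = """
        SELECT DISTINCT prod.ProductID, generic.Generic_Name, prod.Product_Name
        FROM Product AS prod
        JOIN Generic AS generic ON generic.GenericID = prod.GenericID
        JOIN Name    AS n       ON n.GenericID = prod.GenericID
        WHERE n.Name LIKE ?          -- stand-in for MATCH(Name.Name) AGAINST(...)
        ORDER BY prod.Product_Name
    """
    return conn.execute(sql, (prefix + "%",)).fetchall()

by_generic = search(conn, "ibu")   # matches the generic name Ibuprofen
by_trade = search(conn, "cou")     # matches the trade name Coumadin
```

Because the join goes through GenericID, a hit on any name returns every product that shares that generic. The Name table has to be kept in sync when products or generics change, which is part of the extra programming work mentioned above.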

mysql full text search with union performing slowly

I have a database that I had to move from MS SQL over to MySQL. I was able to migrate it over and get things (mostly) up and running, but the performance has taken a big hit on the following query.
In MS SQL, the query would run in less than 2 seconds, but it's now taking 1-2 minutes in MySQL. In MS SQL it was using the CONTAINS operator rather than MATCH, but when I moved to MySQL I made the subject and note columns full text indexes.
There are two separate tables - both containing a "subject" and "full note" field. The database is quite large and I'm trying to do a free text search, looking for a match in either field in either table.
If someone can help me optimize the query, I'd really appreciate it.
SELECT cs.SWCaseID, cs.SWSubject, cs.SWNote
FROM tblSCasesSearch cs
WHERE cs.SWCaseID in
(SELECT cs.SWCaseID
FROM tblSCasesSearch cs
WHERE MATCH (cs.SWSubject, cs.SWNote) AGAINST ('SEARCH VALUE' IN BOOLEAN MODE)
union
SELECT csn.SWCaseID
FROM tblSCaseNotesSearch csn
WHERE MATCH (csn.SWSubject, csn.SWNote) AGAINST ('SEARCH VALUE' IN BOOLEAN MODE))
LIMIT 50
Try rewriting the query as explicit joins:
SELECT cs.SWCaseID, cs.SWSubject, cs.SWNote
FROM tblSCasesSearch cs left outer join
(SELECT distinct cs.SWCaseID
FROM tblSCasesSearch cs
WHERE MATCH (cs.SWSubject, cs.SWNote) AGAINST ('SEARCH VALUE' IN BOOLEAN MODE)
) scs
on cs.SWCaseId = scs.SWCaseId left outer join
(SELECT distinct cs.SWCaseID
FROM tblSCaseNotesSearch cs
WHERE MATCH (cs.SWSubject, cs.SWNote) AGAINST ('SEARCH VALUE' IN BOOLEAN MODE)
) scns
on cs.SWCaseId = scns.SWCaseId
WHERE scs.SWCaseId is not null or scns.SWCaseId is not null
limit 50;

MYSQL Search for empty fields in table

I'm searching through multiple tables.
SELECT DISTINCT cv.id, cv.tJobTitle, cv.tJobTitleAlt, cv.rEmployer, employee.firstName, employee.surname, cv.recentJobTitle, match ( cv.title, cv.recentJobTitle, cv.targetJobTitle, cv.targetJobTitleAlt ) AGAINST ('Desktop' IN BOOLEAN MODE) AS relevance
FROM cv AS cv, employee AS employee, country AS country
WHERE cv.showTo=1 AND cv.status=1 AND cv.employeeIDFK = employee.id AND cv.countryISO2FK='GB'
AND cv.countryISO2FK=country.iso2
AND match ( cv.title, cv.recentJobTitle, cv.targetJobTitle, cv.targetJobTitleAlt ) AGAINST ('Desktop' IN BOOLEAN MODE )
AND cv.salaryType='1' AND cv.salaryMax <=23088 OR cv.salaryMin is NUll
ORDER BY relevance DESC
I have a price value that I search for in my database, but I also have a tick box that says: if the price has not been set, show that record anyway.
So if the price field is empty, the record should still show in the result.
I have tried the above, but it's giving me more than 100 records, whereas my table only has 2 records.
Assuming country.iso2 is a unique field, I'm guessing that you have multiple CVs per employee, or vice versa.
NOTE: It's good advice to avoid using the comma notation for INNER JOINs. Also, this will only work where your field3 is really empty and not NULL.
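One more thing that may be worth checking in the question's query is operator precedence: in SQL, AND binds more tightly than OR, so an unparenthesized `... AND a OR b` is read as `(... AND a) OR b`. When the join conditions live in the WHERE clause (comma notation), the OR branch bypasses them and rows get paired with every row of the other tables. The sketch below illustrates this under assumptions: it uses an in-memory SQLite database from Python, a cut-down version of the cv and employee tables, and invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cv (id INTEGER PRIMARY KEY, employeeIDFK INTEGER,
                     salaryMin INTEGER, salaryMax INTEGER);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, firstName TEXT);
    -- Invented sample rows: cv 11 has no salary set.
    INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO cv VALUES (10, 1, 20000, 22000), (11, 2, NULL, NULL);
""")

# Comma join with an unparenthesized OR: the IS NULL branch ignores the
# join condition, so cv 11 is paired with every employee.
unparenthesized = conn.execute("""
    SELECT cv.id, employee.id
    FROM cv, employee
    WHERE cv.employeeIDFK = employee.id
      AND cv.salaryMax <= 23088 OR cv.salaryMin IS NULL
""").fetchall()

# Explicit JOIN plus parentheses around the salary alternatives.
parenthesized = conn.execute("""
    SELECT cv.id, employee.id
    FROM cv
    JOIN employee ON employee.id = cv.employeeIDFK
    WHERE (cv.salaryMax <= 23088 OR cv.salaryMin IS NULL)
    ORDER BY cv.id
""").fetchall()

print(len(unparenthesized), parenthesized)
```

With an explicit JOIN ... ON, the join condition can no longer be bypassed by an OR in the WHERE clause, which is one practical reason to prefer it over the comma notation.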

Search with relevance ranking using containstable and freetext

I've read that you can rank the results of a search using CONTAINSTABLE along with CONTAINS and FREETEXT under SQL Server 2008. I've just recently used FREETEXT for the first time. FREETEXT goes through the search words separately and compares each one to the indexed column. I want to be able to search for phrases first and then single words.
Let's say the description column is indexed. I'm using a stored procedure query like this:
SELECT id, description, item from table where (FREETEXT(description,@strsearch))
For example, if 3 rows contain words with apple in them and I search for 'apple cake', the row with id2 should be first, then the other two should follow:
id1 apple pie 4/01/2012
id2 apple cake 2/29/2011
id3 candy apple 5/9/2011
For example, if 4 rows contain words with food in them and I search for 'fast food restaurant', the row with id3 should be first, followed by id1 (not an exact match, but it has 'fast food' in the column), then the other two should follow:
id1 McDonalds fast food
id2 healthy food
id3 fast food restaurant
id4 Italian restaurant
Does this article help?
MSDN : Limiting Ranked Result Sets (Full-Text Search)
It implies, in part, that using an additional parameter will allow you to limit the result to the ones with the greatest relevance (which you can influence using WEIGHT) and also order by that relevance (RANK).
top_n_by_rank is an integer value, n, that specifies that only the n
highest ranked matches are to be returned, in descending order.
The doc doesn't have an example for FREETEXT; it only references CONTAINSTABLE. But it definitely implies that CONTAINSTABLE outputs a RANK column that you could use to ORDER BY.
I don't know if there is any way to enforce your own definition of relevance. It may make sense to pull out the top 10 relevant matches according to FTS, then apply your own ranking on the output, e.g. you can split up the search terms using a function, and order by how many of the words matched. For simplicity and easy repro in the following example I am not using Full-Text in the subquery but you can replace it with whatever you're actually doing. First create the function:
IF OBJECT_ID('dbo.SplitStrings') IS NOT NULL
DROP FUNCTION dbo.SplitStrings;
GO
CREATE FUNCTION dbo.SplitStrings(@List NVARCHAR(MAX))
RETURNS TABLE
AS
RETURN ( SELECT Item FROM
( SELECT Item = x.i.value('(./text())[1]', 'nvarchar(max)')
FROM ( SELECT [XML] = CONVERT(XML, '<i>'
+ REPLACE(@List, ' ', '</i><i>') + '</i>').query('.')
) AS a CROSS APPLY [XML].nodes('i') AS x(i) ) AS y
WHERE Item IS NOT NULL
);
GO
Then a simple script that shows how to perform the matching:
DECLARE @foo TABLE
(
id INT,
[description] NVARCHAR(450)
);
INSERT @foo VALUES
(1,N'McDonalds fast food'),
(2,N'healthy food'),
(3,N'fast food restaurant'),
(4,N'Italian restaurant'),
(5,N'Spike''s Junkyard Dogs');
DECLARE @searchstring NVARCHAR(255) = N'fast food restaurant';
SELECT x.id, x.[description]--, MatchCount = COUNT(s.Item)
FROM
(
SELECT f.id, f.[description]
FROM @foo AS f
-- pretend this actually does full-text search:
--where (FREETEXT(description,@strsearch))
-- and ignore how I actually matched:
INNER JOIN dbo.SplitStrings(@searchstring) AS s
ON CHARINDEX(s.Item, f.[description]) > 0
GROUP BY f.id, f.[description]
) AS x
INNER JOIN dbo.SplitStrings(@searchstring) AS s
ON CHARINDEX(s.Item, x.[description]) > 0
GROUP BY x.id, x.[description]
ORDER BY COUNT(s.Item) DESC, [description];
Results:
id description
-- -----------
3 fast food restaurant
1 McDonalds fast food
2 healthy food
4 Italian restaurant
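If the re-ranking step is done in application code rather than in T-SQL, the same idea (count how many search words occur in each description, then sort by that count) can be sketched in a few lines. This is an illustration only: the function name rank_by_matched_terms is made up, and the case-insensitive substring test is an assumption chosen to mirror the CHARINDEX matching above, not full-text matching.

```python
def rank_by_matched_terms(rows, search):
    """Order (id, description) rows by how many search words they contain.

    Rows that contain none of the words are dropped. Ties are broken
    alphabetically by description, as in the ORDER BY above.
    """
    terms = search.lower().split()
    scored = []
    for row_id, description in rows:
        text = description.lower()
        hits = sum(1 for term in terms if term in text)  # substring match
        if hits:
            scored.append((hits, row_id, description))
    scored.sort(key=lambda s: (-s[0], s[2].lower()))
    return [(row_id, description) for _, row_id, description in scored]

rows = [
    (1, "McDonalds fast food"),
    (2, "healthy food"),
    (3, "fast food restaurant"),
    (4, "Italian restaurant"),
    (5, "Spike's Junkyard Dogs"),
]
ranked = rank_by_matched_terms(rows, "fast food restaurant")
for row_id, description in ranked:
    print(row_id, description)
```

On the sample data this gives the same order as the results listed above: id 3, then 1, then 2 and 4.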