SQL Server Full-Text Search for exact match with fallback - sql-server-2008

First off, there seems to be no way to get an exact match using a full-text search. This is a much-discussed issue with full-text search, and there are lots of different solutions for achieving the desired result, but most seem very inefficient. Because the volume of my database forces me to use full-text search, I recently had to implement one of these solutions to get more accurate results.
I could not use the ranking results from the full-text search because of how ranking works. For instance, if you searched for a movie called Toy Story and there was also a movie called The Story Behind Toy Story, that would come up instead of the exact match, because it found the word Story twice as well as Toy.
I do track my own ranking, which I call "Popularity": each time a user accesses a record the number goes up. I use this data point to weight my results and help determine what the user might be looking for.
I also have cases where I need to fall back to a LIKE search because there is no exact match, e.g. searching for Goonies should return The Goonies (the most popular result).
So here is an example of my current stored procedure for achieving this:
DECLARE @Title varchar(255)
SET @Title = '"Toy Story"'
--need to remove quotes from parameter for LIKE search
DECLARE @Title2 varchar(255)
SET @Title2 = REPLACE(@Title, '"', '')
--get top 100 results using full-text search and sort them by popularity
SELECT TOP(100) id, title, popularity AS Weight INTO #TempTable FROM movies WHERE CONTAINS(title, @Title) ORDER BY [Weight] DESC
--check if exact match can be found
IF EXISTS(SELECT * FROM #TempTable WHERE Title = @Title2)
--return exact match
SELECT TOP(1) * FROM #TempTable WHERE Title = @Title2
ELSE
--no exact match found, try using like with wildcards
SELECT TOP(1) * FROM #TempTable WHERE Title LIKE '%' + @Title2 + '%'
DROP TABLE #TempTable
This stored procedure is executed about 5,000 times a minute and, crazily enough, it's not bringing my server to its knees. But I really want to know whether there is a more efficient approach. Thanks.

You should use full text search CONTAINSTABLE to find the top 100 (possibly 200) candidate results and then order the results you found using your own criteria.
It sounds like you'd like to ORDER BY, in this priority:
1) exact match of the phrase (=)
2) the fully matched phrase (LIKE)
3) higher value for the Popularity column
4) the Rank from the CONTAINSTABLE
But you can toy around with the exact order you prefer.
In SQL that looks something like:
DECLARE @title varchar(255)
SET @title = '"Toy Story"'
--need to remove quotes from parameter for LIKE search
DECLARE @title2 varchar(255)
SET @title2 = REPLACE(@title, '"', '')
SELECT
m.ID,
m.title,
m.Popularity,
k.Rank
FROM Movies m
INNER JOIN CONTAINSTABLE(Movies, title, @title, 100) as [k]
ON m.ID = k.[Key]
ORDER BY
--exact title first
CASE WHEN m.title = @title2 THEN 0 ELSE 1 END,
--then titles containing the whole phrase (without wildcards LIKE behaves like =)
CASE WHEN m.title LIKE '%' + @title2 + '%' THEN 0 ELSE 1 END,
m.popularity desc,
--a higher full-text rank means a better match
k.rank desc
See SQLFiddle
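The ordering priorities above can also be sketched outside the database, which makes it easy to experiment with the order of the criteria. This is only an illustration in Python: the `Candidate` type, the sample rows, and their popularity and rank numbers are invented, and the candidates are assumed to have already come back from the full-text query.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    title: str
    popularity: int  # site-specific counter, higher = accessed more often
    rank: int        # full-text RANK, higher = more relevant


def order_candidates(candidates, phrase):
    """Sort by: exact title, contains the phrase, popularity, full-text rank."""
    needle = phrase.strip('"').lower()

    def sort_key(c):
        title = c.title.lower()
        return (
            0 if title == needle else 1,  # exact title match first
            0 if needle in title else 1,  # then titles containing the phrase
            -c.popularity,                # then the most popular
            -c.rank,                      # then the best full-text rank
        )

    return sorted(candidates, key=sort_key)


# Invented sample data for illustration only.
movies = [
    Candidate("The Story Behind Toy Story", popularity=40, rank=90),
    Candidate("Toy Story", popularity=25, rank=60),
    Candidate("Toy Story 2", popularity=30, rank=55),
]
print(order_candidates(movies, '"Toy Story"')[0].title)
```

With these sample rows the exact title wins even though another title has a higher popularity and rank. When nothing matches exactly, the most popular title containing the phrase comes first.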

This will give you the movies that contain the exact phrase "Toy Story", ordered by their popularity.
SELECT
m.[ID],
m.[Popularity],
k.[Rank]
FROM [dbo].[Movies] m
INNER JOIN CONTAINSTABLE([dbo].[Movies], [Title], N'"Toy Story"') as [k]
ON m.[ID] = k.[Key]
ORDER BY m.[Popularity]
Note the above would also give you "The Goonies Return" if you searched "The Goonies".

I've got the feeling you don't really like the fuzzy part of the full-text search but you do like the performance part.
Maybe this is a path: if you insist on getting the EXACT match before a weighted match, you could try to hash the value. For example 'Toy Story' -> bring to lowercase -> toy story -> hash into 4de2gs5sa (with whatever hash you like) and perform a search on the hash.
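A minimal sketch of that idea, assuming SHA-256 as the hash and a plain dictionary standing in for an indexed hash column (both are illustrative choices, not part of the suggestion above):

```python
import hashlib


def title_key(title):
    """Normalise a title (case, surrounding and repeated spaces) and hash it."""
    normalised = " ".join(title.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


# Hypothetical in-memory stand-in for a hash column with an index on it.
titles = ["Toy Story", "The Story Behind Toy Story", "The Goonies"]
index = {}
for t in titles:
    index.setdefault(title_key(t), []).append(t)


def exact_lookup(query):
    """Return the titles whose normalised form equals the normalised query."""
    return index.get(title_key(query), [])


print(exact_lookup("toy  STORY"))
```

A lookup like this only ever finds whole-title matches, so a query such as Goonies would still need the full-text or LIKE fallback.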

In Oracle I've used UTL_MATCH for similar purposes. (http://docs.oracle.com/cd/E11882_01/appdev.112/e25788/u_match.htm)
Even though the Jaro Winkler algorithm, for instance, might take a while if you compare the title column of table 1 with that of table 2, you can improve performance if you only partially join the 2 tables. In some cases I have compared person names in table 1 with table 2 using Jaro Winkler, but limited the results not just to those above a certain Jaro Winkler threshold, but also to pairs of names where the first letter is the same. For instance I would compare Albert with Aden, Alfonzo, and Alberto using Jaro Winkler, but not Albert with Frank (limiting the number of situations where the algorithm needs to be run).
Jaro Winkler may actually be suitable for movie titles as well. Although you are using SQL Server (so you can't use the utl_match package), it looks like there is a free library called "SimMetrics" which has the Jaro Winkler algorithm among other string comparison metrics. You can find details and instructions here: http://anastasiosyal.com/POST/2009/01/11/18.ASPX?#simmetrics
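The first-letter restriction can be sketched as follows. Note the swap: to keep the example self-contained it uses `difflib.SequenceMatcher` from the Python standard library as the similarity measure, which is not Jaro Winkler, and the 0.8 threshold is an arbitrary assumption.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in metric: difflib's ratio, NOT Jaro Winkler.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def blocked_matches(left, right, threshold=0.8):
    """Compare only names that share a first letter; keep pairs above threshold."""
    by_initial = defaultdict(list)
    for name in right:
        if name:
            by_initial[name[0].lower()].append(name)

    matches = []
    for name in left:
        if not name:
            continue
        # Only this bucket is scored, so Albert is never compared with Frank.
        for other in by_initial[name[0].lower()]:
            score = similarity(name, other)
            if score >= threshold:
                matches.append((name, other, round(score, 3)))
    return matches


pairs = blocked_matches(["Albert"], ["Aden", "Alfonzo", "Alberto", "Frank"])
print(pairs)
```

The trade-off is that a typo in the first letter hides a pair completely, so the blocking rule should fit how the data tends to be wrong.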

Related

MYSQL REGEX search many words with no order condition

I'm trying to use a MySQL regex that matches whole words (using word boundaries) in a JSON array string, but the match must not depend on word order, because I don't know the order in advance.
I first wrote the regex on regex101 (https://regex101.com/r/wNVyaZ/1) and then tried to convert it for MySQL.
WHERE `Wish`.`services` REGEXP '^([^>].*[[:<:]]Hygiène[[:>:]])([^>].*[[:<:]]Radiothérapie[[:>:]]).+';
WHERE `Wish`.`services` REGEXP '^([^>].*[[:<:]]Hygiène[[:>:]])([^>].*[[:<:]]Andrologie[[:>:]]).+';
The first query returns a result because "Hygiène" comes before "Radiothérapie", but the second returns nothing because in the data "Andrologie" comes before "Hygiène", not after it as written in the query. The query is generated automatically from a list of services chosen in no particular order, and I want to match the whole words whenever they exist, whatever their order.
You can search for words in JSON like the following (I tested on MySQL 5.7):
select * from wish
where json_search(services, 'one', 'Hygiène') is not null
and json_search(services, 'one', 'Andrologie') is not null;
+------------------------------------------------------------+
| services |
+------------------------------------------------------------+
| ["Andrologie", "Angiologie", "Hygiène", "Radiothérapie"] |
+------------------------------------------------------------+
See https://dev.mysql.com/doc/refman/5.7/en/json-search-functions.html#function_json-search
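The same order-independent check can be expressed in application code, which may help when validating what the query should return. A small Python sketch, assuming the `services` column holds a JSON array of strings as in the output above:

```python
import json


def has_all_services(services_json, wanted):
    """True when every wanted service is in the JSON array, in any order."""
    return set(wanted) <= set(json.loads(services_json))


row = '["Andrologie", "Angiologie", "Hygiène", "Radiothérapie"]'
print(has_all_services(row, ["Hygiène", "Andrologie"]))
```

Because the comparison is on whole array elements, a partial word such as "Radio" does not match "Radiothérapie".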
If you can, use the JSON search queries (you need a MySQL with JSON support).
If it's advisable, consider changing the database structure and enter the various "words" as a related table. This would allow you much more powerful (and faster) queries.
JOIN has_service AS hh ON (hh.row_id = id)
JOIN services AS ss ON (hh.service_id = ss.id
AND ss.name IN ('Hygiène', 'Angiologie', ...))
Otherwise, in this context, consider that you're not really doing a regexp search, and you're doing a full table scan anyway (unless MySQL 8.0+ or PerconaDB 5.7+ (not sure) and an index on the full extent of the 'services' column), and several LIKE queries will actually cost you less:
WHERE (services LIKE '%"Hygiène"%'
OR services LIKE '%"Angiologie"%'
...)
or
IF(services LIKE '%"Hygiène"%', 1, 0)
+IF(services LIKE '%"Angiologie"%', 1, 0)
+ ... AS score
HAVING score > 0 -- or score=5 if you want only matches on all full five
ORDER BY score DESC;
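The scoring variant amounts to counting how many of the quoted names occur in the raw column text. A rough Python equivalent, with invented sample rows and a hypothetical `require_all` switch standing in for the choice between `score > 0` and a full match:

```python
def like_score(services_text, wanted):
    """Count how many wanted names occur, in double quotes, in the raw text."""
    return sum(1 for name in wanted if f'"{name}"' in services_text)


def rank_rows(rows, wanted, require_all=False):
    """Keep rows with at least one match (or all of them), best score first."""
    minimum = len(wanted) if require_all else 1
    scored = [(like_score(text, wanted), text) for text in rows]
    kept = [pair for pair in scored if pair[0] >= minimum]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


# Invented sample rows for illustration only.
rows = [
    '["Angiologie"]',
    '["Andrologie", "Angiologie", "Hygiène", "Radiothérapie"]',
    '["Cardiologie"]',
]
for score, text in rank_rows(rows, ["Hygiène", "Angiologie"]):
    print(score, text)
```

Including the double quotes in the pattern is what keeps a short name from matching inside a longer one.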

Searching Database with partial keyword

I'm trying to do a search of all the columns of a specific table and I want to return the result that contains certain characters. For example
Entered Search Value: "Josh"
Output Values: Josh, Joshua, Joshie, Rich Joshua
I want to return all values containing the characters Josh. I'm trying to use FreeTextTable however it only returns exact words like this
Entered Search Value: "Josh"
Output Values: Josh
I'm using the following code.
DECLARE @nameSearch NVARCHAR(100) = 'Josh';
SELECT MAX(KEY_TBL.RANK) as RANK, FT_TBL.ID
FROM Property FT_TBL
INNER JOIN (SELECT Rank, [KEY]
from FREETEXTTABLE(Property, *, @nameSearch)) AS KEY_TBL
ON FT_TBL.ID = KEY_TBL.[KEY]
GROUP BY FT_TBL.ID
I know this will be possible by using LIKE or CONTAINS but I have a lot of rows in that table and it would take time before it returns the result. So I need to use FreeTextTable to get the Rank and Key. However I can't achieve my goal here. I need help. Thanks!

MySQL - How can I perform this string search?

I have a table which features 37 columns, of various types, INTs and VARCHARS and 2 LONGTEXT columns. The client wants a search to search the table and find the rows that match.
However, I'm having trouble with this. Here is what I've done so far:
1) My initial view was to do a massive set of OR conditions. However, I was put off by the fact that I would need to supply the search data ~30 times, which is massive repetition, and I'm sure there's a better way than this.
Code:
SELECT
MemberId,
MemNameTitle,
MemSName,
MemFName,
MemPostcode,
MemEmail
FROM
MemberData
WHERE
MemFName LIKE CONCAT('%',?,'%') OR
MemSName LIKE CONCAT('%',?,'%') OR
MemAddr LIKE CONCAT('%',?,'%') OR
MemPostcode LIKE CONCAT('%',?,'%') OR
MemEmail LIKE CONCAT('%',?,'%')
...etc...
Etc. Etc. That's a massive set of OR's and really unwieldy.
2) I thought I'd try to rework it to place all the columns in brackets and then only supply the search value once. I saw a similar piece of code on SO; I'm not sure it was working correctly, but it was an inspiration, at least:
SELECT
MemberId,
MemNameTitle,
MemSName,
MemFName,
MemPostcode,
MemEmail
FROM
MemberData
WHERE
(MemNameTitle OR
MemFName OR
MemSName OR
MemAddr OR
MemPostcode OR
MemEmail OR
MemSkype OR
MemLinkedIn OR
MemFacebook OR
MemEmailTwo ...etc...) LIKE CONCAT('%',?,'%')
GROUP BY
MemberId
This code executes without apparent error but always returns no result, as in 0 rows returned. I can't see why from an initial look.
3) So, with some research on SO I found a rearrangement using the IN keyword, but from previous questions on here (Is it possible to use LIKE and IN for a WHERE statement?) it appeared not to work.
What I wanted to get was something like:
SELECT
MemberId,
MemNameTitle,
MemSName,
MemFName,
MemPostcode,
MemEmail
FROM
MemberData
WHERE
MemNameTitle,
MemFName,
MemSName,
MemAddr,
MemPostcode,
MemEmail,
MemSkype,
MemLinkedIn,
...etc ...
MemFax,
MemberStatus,
CommitteeNotes,
SecondAddr,
SecondAddrPostcode IN (LIKE CONCAT('%',?,'%') )
This is crude syntax, but I hope you get the idea: I want to search many fields for the same value using a LIKE % % clause. The fields are variously TEXT/VARCHAR types.
4) I then looked into MySQL full-text searches, but this quickly became useless as it appeared to apply only to TEXT columns rather than VARCHAR columns. I considered changing each VARCHAR column to a TEXT column before each search, but figured that would also be relatively processor intensive, and it seemed illogical for a search that many people must want to do.
So, I'm out of ideas. Can you help me search this way? Or suggest why my code in attempt 2 always returns zero rows?
Cheers
Additional Work:
5) I have been looking at rearranging the IN clause statement and came up with this:
SELECT *(lazy typing!) WHERE
CONCAT('%',?,'%') IN
(MemNameTitle,
MemFName,
MemSName,
MemAddr,
MemPostcode,
MemEmail,
MemSkype,
...etc...
CommitteeNotes,
SecondAddr,
SecondAddrPostcode)
GROUP BY MemberId
However this returns a result, but the result is always the last row of the table. This doesn't work.
Solution 1:
From Ravinder, using CONCAT_WS for all the fields - this works in my case, although something in my mind does find CONCATs somewhat ugly, but oh well.
SELECT * FROM MemberData WHERE
CONCAT_WS('<*!*>',
MemNameTitle, MemFName, MemSName,
MemAddr, MemPostcode, MemEmail,
MemSkype, MemLinkedIn,
...etc...
MemberStatus, CommitteeNotes, SecondAddr,
SecondAddrPostcode)
LIKE CONCAT('%',?,'%')
GROUP BY MemberId
The table will eventually have a few thousand rows, and I am a little worried that as this query will concat 24 columns for each row on the table for each search, that this could easily become quite expensive and inefficient (ie slow), so if anyone has any ways of either
i) searching without CONCAT columns or
ii) making this solution faster/ more efficient
please share!!
There is a workaround solution. But I feel this is too crude and performance may not be that good.
where
concat_ws( "<*!*>", col1, col2, col3, ... ) like concat( '%', ?, '%' )
Here, I used '<*!*>' just as an example separator.
The separator must be a pattern string that you are sure
* is not part of the placeholder value, and
* cannot appear in the generated string when 2 or more columns are concatenated.
Refer to the documentation:
MySQL: CONCAT_WS(separator,str1,str2,...)
It skips NULL values but does not skip empty strings.
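To see why the separator matters, here is a small Python emulation of the behaviour described above. The `concat_ws` helper is a hypothetical stand-in written for this sketch, not a MySQL API, and the sample row is invented.

```python
def concat_ws(separator, *values):
    """Join values with a separator, skipping None the way CONCAT_WS skips NULL."""
    return separator.join(str(v) for v in values if v is not None)


def row_matches(row, term, separator="<*!*>"):
    """Case-insensitive substring search over all columns of one row."""
    return term.lower() in concat_ws(separator, *row).lower()


# Invented row: title, first name, a NULL column, surname.
row = ("Mr", "Ann", None, "Smith")

print(row_matches(row, "smith"))               # found inside one column
print(row_matches(row, "nsm"))                 # spans Ann + Smith: rejected
print(row_matches(row, "nsm", separator=""))   # no separator: false positive
```

Without a separator, the end of one column and the start of the next run together, so a term can match text that exists in no single column.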
One rather ugly way to do it would be
SELECT
MemberId,
MemNameTitle,
MemSName,
MemFName,
MemPostcode,
MemEmail
FROM
MemberData
WHERE
CONCAT(
MemNameTitle,
MemFName,
MemSName,
MemAddr,
MemPostcode,
MemEmail,
MemSkype,
MemLinkedIn,
...etc ...
MemFax,
MemberStatus,
CommitteeNotes,
SecondAddr,
SecondAddrPostcode) LIKE CONCAT('%',?,'%')
so you first concatenate all the columns you want to search and then look in the resulting big string for your text.
But I guess you can see that this is far from performant and optimal. Then again, since you are using the % sign at the beginning and end of your search pattern, you couldn't use any indexes anyway.
Warning:
Be aware that this CONCAT may fail in case one of your columns contains a null value, because then the whole CONCAT will return null!

Searching for multiple keywords using SQL Server stored procedure

I'm going to search my database (SQL Server 2008) using a stored procedure. My users can enter keyword(s) in a textbox (keywords can be separated using , for instance).
Currently I'm using something like this:
keyword LIKE N'%' + @SearchQuery + N'%'
(keyword is an nvarchar column in my table, and @SearchQuery is the input to my stored procedure)
It works fine but what if user types several keywords: apple,orange, banana
Should I limit number of my keywords? How should I write my stored procedure if I have more than one keyword? How should I pass my user input to the stored procedure? I should pass apple, orange, banana as a whole phrase and then I should parse them in my stored procedure, or I should separate my keywords and send 3 keywords? How can I query these 3 keywords? A for loop?
What are best practices for performing such queries?
thanks
Do the parsing of the keywords in your application. SQL is not the best place for string manipulation.
Send the keywords as a table valued parameter (ie : http://www.mssqltips.com/sqlservertip/2112/table-value-parameters-in-sql-server-2008-and-net-c/ ) then you aren't limited to a fixed number of keywords.
Add the wildcards in the join condition when you filter your results by joining your source data to this table. A table-valued parameter is READONLY inside the procedure, so it cannot be updated in place to add them.
eg:
SELECT result
FROM source
INNER JOIN @keywords keywords
ON source.keyword LIKE '%' + keywords.keyword + '%'
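On the application side, the parsing step might look like the Python sketch below. The function names are made up for illustration. `matches_any` only mirrors what the LIKE join does for one value, so the logic can be tested without a database.

```python
def parse_keywords(raw):
    """Split 'apple,orange, banana' into clean, non-empty keywords."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def matches_any(value, keywords):
    """Mirror of the LIKE join: True if any keyword occurs inside the value."""
    lowered = value.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


keywords = parse_keywords("apple,orange, banana")
print(keywords)

# Invented sample values for illustration only.
source = ["Green apple pie", "Grape juice", "Banana bread"]
print([value for value in source if matches_any(value, keywords)])
```

Each parsed keyword would then become one row of the table-valued parameter.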
It depends on:
* How big your database is.
* How often users will search for something.
* How precise the results are that users expect.
LIKE is not fast, especially with a pattern that starts with %.
Maybe you should try full-text search?
If you would like to stay with LIKE (it will only work well for small tables), I would try something like this:
Split the input on the , character (inserting the tokens into a table, as podiluska suggested, is a good idea).
Build a query for each token and UNION all the results. Or run it in a loop for each token and insert the results into a temporary table.
If you need precise results (i.e. only records that match all 3 words), you can select the best-matching rows from the temporary results built above.
You could use XML to split the string of keywords into rows and then use the result as you like. The keyword list can even contain numbers or any characters, like %$<> or whatever you want; just remember that the comma is the string separator.
DECLARE @CommaSeparatorString VARCHAR(MAX),
@CommaSeparatorXML XML
DECLARE @handle INT
SELECT @CommaSeparatorString = 'apple,orange,banana'
SELECT @CommaSeparatorString = REPLACE(REPLACE(@CommaSeparatorString,'<','$^%'),'>','%^$')
SELECT @CommaSeparatorXML = CAST('<ROOT><i>' + REPLACE(@CommaSeparatorString, ',', '</i><i>') + '</i></ROOT>' AS XML)
SELECT REPLACE(REPLACE(c.value('.', 'VARCHAR(100)'),'$^%','<'),'%^$','>') AS ID
FROM (SELECT @CommaSeparatorXML AS CommaXML) a
CROSS APPLY CommaXML.nodes('//i') x(c)
Result:
ID
------
apple
orange
banana

SQL search result ranking

I have a table called objects with the following columns:
object_id,
name_english(vchar),
name_japanese(vchar),
name_french(vchar),
object_description
for each object.
When a user performs a search, they may enter English, Japanese or French, and my SQL statement is:
SELECT
o.object_id,
o.name_english,
o.name_japanese,
o.name_french,
o.object_description
FROM
objects AS o
WHERE
o.name_english LIKE CONCAT('%',:search,'%') OR
o.name_japanese LIKE CONCAT('%',:search,'%') OR
o.name_french LIKE CONCAT('%',:search,'%')
ORDER BY
o.name_english, o.name_japanese, o.name_french ASC
And some of the entries are like:
Tin spoon,
Tin Foil,
Doctor Martin Shoes,
Martini glass,
Cutting board,
Ting Soda.
So, when the user searches for the word "Tin" it returns all of these results. Instead I want either to return only the results that specifically include the term "Tin", or to display all the results ranked in order of relevance. How can I achieve that?
Thanks.
You can use MySQL FULLTEXT indices to do that. This requires a FULLTEXT index on (name_english, name_japanese, name_french, object_description) or whatever fields you want to search on, and the appropriate use of the MATCH ... AGAINST operator on exactly that set of columns. In MySQL 5.5 FULLTEXT indexes are only available on MyISAM tables; InnoDB supports them from MySQL 5.6.
See the manual at http://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html, and the examples on the following page http://dev.mysql.com/doc/refman/5.5/en/fulltext-natural-language.html
After running the query above, you will get all sorts of results, including ones you are not interested in, but you can then use regular expressions on the result set returned by the MySQL server to filter out what you need.
This should do the trick - you may have to filter out duplicates, but the basic idea is obvious.
SELECT
`object`.`object_id`,
`object`.`name_english`,
`object`.`name_japanese`,
`object`.`name_french`,
`object`.`object_info`, 1 as ranking
FROM `objects` AS `object`
WHERE `object`.`name_english` LIKE CONCAT(:search,'%') OR `object`.`name_japanese` LIKE CONCAT(:search,'%') OR `object`.`name_french` LIKE CONCAT(:search,'%')
union
SELECT
`object`.`object_id`,
`object`.`name_english`,
`object`.`name_japanese`,
`object`.`name_french`,
`object`.`object_info`, 10 as ranking
FROM `objects` AS `object`
WHERE `object`.`name_english` LIKE CONCAT('%',:search,'%') OR `object`.`name_japanese` LIKE CONCAT('%',:search,'%') OR `object`.`name_french` LIKE CONCAT('%',:search,'%')
ORDER BY ranking, `object`.`name_english`, `object`.`name_japanese`, `object`.`name_french` ASC
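The two-tier ranking can be sketched in Python to check what order it produces for the sample entries from the question. The tier values 1 and 10 follow the query above. Since a single pass gives each name only its best tier, the duplicates that the UNION can produce do not arise here.

```python
def rank_matches(names, term):
    """Rank 1 for names starting with the term, 10 for names containing it."""
    needle = term.lower()
    best = {}
    for name in names:
        lowered = name.lower()
        if lowered.startswith(needle):
            ranking = 1
        elif needle in lowered:
            ranking = 10
        else:
            continue  # no match at all: drop the row
        # Keep the lowest (best) ranking if a name appears more than once.
        best[name] = min(ranking, best.get(name, ranking))
    return sorted(best.items(), key=lambda item: (item[1], item[0]))


names = [
    "Tin spoon", "Tin Foil", "Doctor Martin Shoes",
    "Martini glass", "Cutting board", "Ting Soda",
]
for name, ranking in rank_matches(names, "Tin"):
    print(ranking, name)
```

Note that a prefix tier still places "Ting Soda" alongside the "Tin ..." entries. Separating whole-word matches would need a further tier, which the query above does not have.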