I have a table which features 37 columns, of various types, INTs and VARCHARS and 2 LONGTEXT columns. The client wants a search to search the table and find the rows that match.
However, I'm having trouble with this. Here is what I've done so far:
1) My initial view was to do a massive set of OR queries - however I was put off this by the fact I would need to supply the search data ~30 times, which is massive repetition and I'm sure there;s a better way than this.
Code:
SELECT
MemberId,
MemNameTitle,
MemSName,
MemFName,
MemPostcode,
MemEmail
FROM
MemberData
WHERE
MemFName LIKE CONCAT('%',?,'%') OR
MemSName LIKE CONCAT('%',?,'%') OR
MemAddr LIKE CONCAT('%',?,'%') OR
MemPostcode LIKE CONCAT('%',?,'%') OR
MemEmail LIKE CONCAT('%',?,'%')
...etc...
Etc. Etc. That's a massive set of OR's and really unwieldy.
2) I thought I'd try and rework it to place all the columns in brackets and then only ask the query once, I saw a similar piece of code on SO but not sure that was correctly working, but it was an insprition, at least:
SELECT
MemberId,
MemNameTitle,
MemSName,
MemFName,
MemPostcode,
MemEmail
FROM
MemberData
WHERE
(MemNameTitle OR
MemFName OR
MemSName OR
MemAddr OR
MemPostcode OR
MemEmail OR
MemSkype OR
MemLinkedIn OR
MemFacebook OR
MemEmailTwo ...etc...) LIKE CONCAT('%',?,'%')
GROUP BY
MemberId
This code executes without apparent error but fails as it always returns no result, as in 0 fields returned. I can't see why, from an initial view,
3) So, with some research on OS I found a rearrangement using the IN keyword, but from previous questions on here Is it possible to use LIKE and IN for a WHERE statment? it appeared not to work.
What I wanted to get was something like:
SELECT
MemberId,
MemNameTitle,
MemSName,
MemFName,
MemPostcode,
MemEmail
FROM
MemberData
WHERE
MemNameTitle,
MemFName,
MemSName,
MemAddr,
MemPostcode,
MemEmail,
MemSkype,
MemLinkedIn,
...etc ...
MemFax,
MemberStatus,
CommitteeNotes,
SecondAddr,
SecondAddrPostcode IN (LIKE CONCAT('%',?,'%') )
This is crudy syntax but I hope you get the idea I want to get, I want to search many fields for the same value using a LIKE % % clause. Fields are variously TEXT/VARCHAR types.
4) I then looked into MySQL full text searches but this quickly became useless as this is only applied to TEXT type rather than VARCHAR type searching. I considered before each search changing each VARCHAR column to a TEXT column but figured that was also be relatively processor intensive and seemed illogical for a search that many people must want to do?
So, I'm out of ideas..... Can you help me search this way? Or suggest why my code in attempt 2 always returns Zero rows?
Cheers
Additional Work:
5) I have been looking at rearranging the IN clause statement and came up with this:
SELECT *(lazy typing!) WHERE
CONCAT('%',?,'%') IN
(MemNameTitle,
MemFName,
MemSName,
MemAddr,
MemPostcode,
MemEmail,
MemSkype,
...etc...
CommitteeNotes,
SecondAddr,
SecondAddrPostcode)
GROUP BY MemberId
However this returns a result, but the result is always the last row of the table. This doesn't work.
Solution 1:
From Ravinder, using CONCAT_WS for all the fields - this works in my case, although something in my mind does find CONCATs somewhat ugly, but oh well.
SELECT * FROM MemberData WHERE
CONCAT_WS('<*!*>',
MemNameTitle, MemFName, MemSName,
MemAddr, MemPostcode, MemEmail,
MemSkype, MemLinkedIn,
...etc...
MemberStatus, CommitteeNotes, SecondAddr,
SecondAddrPostcode)
LIKE CONCAT('%',?,'%')
GROUP BY MemberId ";
The table will eventually have a few thousand rows, and I am a little worried that as this query will concat 24 columns for each row on the table for each search, that this could easily become quite expensive and inefficient (ie slow), so if anyone has any ways of either
i) searching without CONCAT columns or
ii) making this solution faster/ more efficient
please share!!
There is a workaround solution. But I feel this is too crude and performance may not be that good.
where
concat_ws( "<*!*>", col1, col2, col3, ... ) like concat( '%', ?, '%' )
Here, I used '<*!*>' just as an example separator.
You have to use a pattern string as separator which, you are sure that,
is not part of the place holder value or
is not part of the generated string when 2 or more columns are
concatenated
Refer to Documentation:
MySQL: CONCAT_WS(separator,str1,str2,...)
It won't skip empty column values but NULLs.
One rather ugly way to do it would be
SELECT
MemberId,
MemNameTitle,
MemSName,
MemFName,
MemPostcode,
MemEmail
FROM
MemberData
WHERE
CONCAT(
MemNameTitle,
MemFName,
MemSName,
MemAddr,
MemPostcode,
MemEmail,
MemSkype,
MemLinkedIn,
...etc ...
MemFax,
MemberStatus,
CommitteeNotes,
SecondAddr,
SecondAddrPostcode) LIKE CONCAT('%',?,'%')
so you first concatenate all the columns you want to search and then look in the resulting big string for your text.
But i guess you can see that this is far from performant and optimal. But since you are using the % sign in the beginning and end of your searches, you couldn't use any indexes anyway.
Warning:
Be aware that this CONCAT may fail in case one of your columns contains a null value, because then the whole CONCAT will return null!
Related
I tried something out. Here is a simple example in SQL Fiddle: Example
There is a column someNumbers (comma-seperated numbers) and I tried to get all the rows where this column contains a specific number. Problem is, the result only contains rows where someNumbers starts with the specific number.
The query SELECT * FROM myTable where 2 in ( someNumbers ) only returns the row with id 2 and not the row with id 1.
Any suggestions? Thank you all.
You are storing data in the wrong format! You should not be storing multiple values in a single string column. You should not be storing numbers as strings. Instead, you should have a junction table with one row per id and per number.
Sometimes, you just have no choice, because someone else created a really poorly designed database. For these situations, MySQL has the function find_in_set():
SELECT *
FROM myTable
WHERE find_in_set(2, someNumbers ) > 0;
The right solution, however, is to fix the data model.
While Gordon's answer is a good one, here is a way to do this with like
SELECT * FROM myTable where someNumbers like '2,%' or someNumbers like '%,2,%' or someNumbers like '%,2'
The first like checks if your array starts with the number you are looking for (2). The second one checks if 2 is within the array and the last like tests for appearance at the end.
Note that the commas are essential here, because something like '%2%' would also match ...,123,...
EDIT: As suggested by the OP it may happen that only a single value is present in the row. Consequently, the query must check this case by doing ... someNumbers = '2'
I would suggest this query :
SELECT * FROM myTable where someNumbers like '%2%'
It will select every entry where someNumbers contains '2'
Select * from table_name where coloumn_name IN(value,value,value)
you can use it
I have a field called 'areasCovered' in a MySQL database, which contains a string list of postcodes.
There are 2 rows that have similar data e.g:
Row 1: 'B*,PO*,WA*'
Row 2: 'BB*, SO*, DE*'
Note - The strings are not in any particular order and could change depending on the user
Now, if I was to use a query like:
SELECT * FROM technicians WHERE areasCovered LIKE '%B*%'
I'd like it to return JUST Row 1. However, it's returning Row 2 aswell, because of the BB* in the string.
How could I prevent it from doing this?
The key to using like in this case is to include delimiters, so you can look for delimited values:
SELECT *
FROM technicians
WHERE concat(', ', areasCovered, ', ') LIKE '%, B*, %'
In MySQL, you can also use find_in_set(), but the space can cause you problems so you need to get rid of it:
SELECT *
FROM technicians
WHERE find_in_set('B', replace(areasCovered, ', ', ',') > 0
Finally, though, you should not be storing these types of lists as strings. You should be storing them in a separate table, a junction table, with one row per technician and per area covered. That makes these types of queries easier to express and they have better performance.
You are searching wild cards at the start as well as end.
You need only at end.
SELECT * FROM technicians WHERE areasCovered LIKE 'B*%'
Reference:
Normally I hate REGEXP. But ho hum:
SELECT * FROM technicians
WHERE concat(",",replace(areasCovered,", ",",")) regexp ',B{1}\\*';
To explain a bit:
Get rid of the pesky space:
select replace("B*,PO*,WA*",", ",",");
Bolt a comma on the front
select concat(",",replace("B*,PO*,WA*",", ",","));
Use a REGEX to match "comma B once followed by an asterix":
select concat(",",replace("B*,PO*,WA*",", ",",")) regexp ',B{1}\\*';
I could not check it on my machine, but it's should work:
SELECT * FROM technicians WHERE areasCovered <> replace(areaCovered,',B*','whatever')
In case the 'B*' does not exist, the areasCovered will be equal to replace(areaCovered,',B*','whatever'), and it will reject that row.
In case the 'B*' exists, the areCovered will NOT be eqaul to replace(areaCovered,',B*','whatever'), and it will accept that row.
You can Do it the way Programming Student suggested
SELECT * FROM technicians WHERE areasCovered LIKE 'B*%'
Or you can also use limit on query
SELECT * FROM technicians WHERE areasCovered LIKE '%B*%' LIMIT 1
%B*% contains % on each side which makes it to return all the rows where value contains B* at any position of the text however your requirement is to find all the rows which contains values starting with B* so following query should do the work.
SELECT * FROM technicians WHERE areasCovered LIKE 'B*%'
First off there seems to be no way to get an exact match using a full-text search. This seems to be a highly discussed issue when using the full-text search method and there are lots of different solutions to achieve the desired result, however most seem very inefficient. Being I'm forced to use full-text search due to the volume of my database I recently had to implement one of these solutions to get more accurate results.
I could not use the ranking results from the full-text search because of how it works. For instance if you searched for a movie called Toy Story and there was also a movie called The Story Behind Toy Story that would come up instead of the exact match because it found the word Story twice and Toy.
I do track my own rankings which I call "Popularity" each time a user access a record the number goes up. I use this datapoint to weight my results to help determine what the user might be looking for.
I also have the issue where sometimes need to fall back to a LIKE search and not return an exact match. I.e. searching Goonies should return The Goonies (most popular result)
So here is an example of my current stored procedure for achieving this:
DECLARE #Title varchar(255)
SET #Title = '"Toy Story"'
--need to remove quotes from parameter for LIKE search
DECLARE #Title2 varchar(255)
SET #Title2 = REPLACE(#title, '"', '')
--get top 100 results using full-text search and sort them by popularity
SELECT TOP(100) id, title, popularity As Weight into #TempTable FROM movies WHERE CONTAINS(title, #Title) ORDER BY [Weight] DESC
--check if exact match can be found
IF EXISTS(select * from #TempTable where Title = #title2)
--return exact match
SELECT TOP(1) * from #TempTable where Title = #title2
ELSE
--no exact match found, try using like with wildcards
SELECT TOP(1) * from #TempTable where Title like '%' + #title2 + '%'
DROP TABLE #TEMPTABLE
This stored procedure is executed about 5,000 times a minute, and crazy enough it's not bringing my server to it's knees. But I really want to know if there was a more efficient approach to this? Thanks.
You should use full text search CONTAINSTABLE to find the top 100 (possibly 200) candidate results and then order the results you found using your own criteria.
It sounds like you'd like to ORDER BY
exact match of the phrase (=)
the fully matched phrase (LIKE)
higher value for the Popularity column
the Rank from the CONTAINSTABLE
But you can toy around with the exact order you prefer.
In SQL that looks something like:
DECLARE #title varchar(255)
SET #title = '"Toy Story"'
--need to remove quotes from parameter for LIKE search
DECLARE #title2 varchar(255)
SET #title2 = REPLACE(#title, '"', '')
SELECT
m.ID,
m.title,
m.Popularity,
k.Rank
FROM Movies m
INNER JOIN CONTAINSTABLE(Movies, title, #title, 100) as [k]
ON m.ID = k.[Key]
ORDER BY
CASE WHEN m.title = #title2 THEN 0 ELSE 1 END,
CASE WHEN m.title LIKE #title2 THEN 0 ELSE 1 END,
m.popularity desc,
k.rank
See SQLFiddle
This will give you the movies that contain the exact phrase "Toy Story", ordered by their popularity.
SELECT
m.[ID],
m.[Popularity],
k.[Rank]
FROM [dbo].[Movies] m
INNER JOIN CONTAINSTABLE([dbo].[Movies], [Title], N'"Toy Story"') as [k]
ON m.[ID] = k.[Key]
ORDER BY m.[Popularity]
Note the above would also give you "The Goonies Return" if you searched "The Goonies".
If got the feeling you don't really like the fuzzy part of the full text search but you do like the performance part.
Maybe is this a path: if you insist on getting the EXACT match before a weighted match you could try to hash the value. For example 'Toy Story' -> bring to lowercase -> toy story -> Hash into 4de2gs5sa (with whatever hash you like) and perform a search on the hash.
In Oracle I've used UTL_MATCH for similar purposes. (http://docs.oracle.com/cd/E11882_01/appdev.112/e25788/u_match.htm)
Even though using the Jaro Winkler algorithm, for instance, might take awhile if you compare the title column from table 1 and table 2, you can improve performance if you partially join the 2 tables. I have in some cases compared person names on table 1 with table 2 using Jaro Winkler, but limited results not just above a certain Jaro Winkler threshold, but also to names between the 2 tables where the first letter is the same. For instance I would compare Albert with Aden, Alfonzo, and Alberto, using Jaro Winkler, but not Albert and Frank (limiting the number of situations where the algorithm needs to be used).
Jaro Winkler may actually be suitable for movie titles as well. Although you are using SQL server (can't use the utl_match package) it looks like there is a free library called "SimMetrics" which has the Jaro Winkler algorithm among other string comparison metrics. You can find detail on that and instructions here: http://anastasiosyal.com/POST/2009/01/11/18.ASPX?#simmetrics
I have a table call objects which there are the columns:
object_id,
name_english(vchar),
name_japanese(vchar),
name_french(vchar),
object_description
for each object.
When a user perform a search, they may enter either english, japanese or french... and my sql statement is:
SELECT
o.object_id,
o.name_english,
o.name_japanese,
o.name_french,
o.object_description
FROM
objects AS o
WHERE
o.name_english LIKE CONCAT('%',:search,'%') OR
o.name_japanese LIKE CONCAT('%',:search,'%') OR
o.name_french LIKE CONCAT('%',:search,'%')
ORDER BY
o.name_english, o.name_japanese, o.name_french ASC
And some of the entries are like:
Tin spoon,
Tin Foil,
Doctor Martin Shoes,
Martini glass,
Cutting board,
Ting Soda.
So, when the user search the word "Tin" it will return all results of these, but instead I just want to return the results which specific include the term "Tin" or displaying the result and rank them by relevance order. How can I achieve that?
Thanks.
You can use MySQL FULLTEXT indices to do that. This requires the MyISAM table type, an index on (name_english, name_japanese, name_french, object_description) or whatever fields you want to search on, and the appropriate use of the MATCH ... AGAINST operator on exactly that set of columns.
See the manual at http://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html, and the examples on the following page http://dev.mysql.com/doc/refman/5.5/en/fulltext-natural-language.html
After running the query above , you will get all sort of results including ones that you are not interested, but you can then use regular expressions on the above results(returned by mysql server) set to filter out what u need.
This should do the trick - you may have to filter out duplicates, but the basic idea is obvious.
SELECT
`object`.`object_id`,
`object`.`name_english`,
`object`.`name_japanese`,
`object`.`name_french`,
`object`.`object_info`, 1 as ranking
FROM `objects` AS `object`
WHERE `object`.`name_english` LIKE CONCAT(:search,'%') OR `object`.`name_japanese` LIKE CONCAT(:search,'%') OR `object`.`name_french` LIKE CONCAT(:search,'%')
union
SELECT
`object`.`object_id`,
`object`.`name_english`,
`object`.`name_japanese`,
`object`.`name_french`,
`object`.`object_info`, 10 as ranking
FROM `objects` AS `object`
WHERE `object`.`name_english` LIKE CONCAT('%',:search,'%') OR `object`.`name_japanese` LIKE CONCAT('%',:search,'%') OR `object`.`name_french` LIKE CONCAT('%',:search,'%')
ORDER BY ranking, `object`.`name_english`, `object`.`name_japanese`, `object`.`name_french` ASC
I am trying to write a query to pull all the rows that contain a username from a large list of usernames in a field.
For example, the table contains a column called 'Worklog' which contains comments made by users and their username. I need to search that field for all user names that are contained in a list I have.
I have tried a few different things but can't get anything to work. So far, this is kind of what I have tried:
SELECT *
FROM `JULY2010`
WHERE `WorkLog`
IN (
SELECT CONCAT( '%', `UserName` , '%' )
FROM `OpsAnalyst`
)
The problem is I need to use LIKE because it is searching a large amount of text, but I also have a large list that it is pulling from, and that list needs to be dynamic because the people that work here are changing frequently. Any ideas?
SELECT *
FROM `JULY2010`
WHERE `WorkLog` REGEXP
(SELECT CONCAT( `UserName`, '|')
FROM `OpsAnalyst`)
I slightly modified this and used GROUP_CONCAT() and now my query looks like this:
SELECT *
FROM JULY2010
WHERE `WorkLog`
REGEXP (
SELECT GROUP_CONCAT(`UserName` SEPARATOR '|') FROM `OpsAnalyst`
)
I am now getting a result set, but it seems like it isn't as many results as I should be getting. I'm going to have to look into it a little more to figure out what the problem is