I'm trying to return all rows where some column contains all keywords. Keywords can be in any order and surrounded by anything. I've looked into FULL TEXT searches but none seem to give the kind of control I want. I can do something like this:
SELECT *
FROM articles
WHERE body LIKE '%term1%' AND body LIKE '%term2%' AND body LIKE '%term3%'...
But this gets messy with an arbitrary number of search terms. Is there a better way of doing this?
Have a look at Boolean full-text searches (on that page, search for "A Basic Boolean Searching Application").
Try this:
SELECT * FROM `articles` WHERE MATCH (body) AGAINST
('+term1 +term2 +term3 ...' IN BOOLEAN MODE)
Here '+keyword' is the equivalent of AND: every term prefixed with + must be present, so each required term needs its own +. To search for a keyword across several columns, use MATCH (col1, col2, col3) AGAINST ('keyword'). Remember that you need to define a FULLTEXT index that covers the columns listed in the MATCH (col1, col2, ...) part.
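For an arbitrary number of terms, the query text can be generated in application code instead of written by hand. This is only a rough sketch in Python: the function names are made up for illustration, and it assumes a MySQL driver that uses %s placeholders. It does not strip boolean-mode operators from user input, so terms containing characters like + or - would need extra handling.

```python
def build_like_query(terms):
    """Return (sql, params) requiring every term to appear somewhere in body."""
    if not terms:
        raise ValueError("at least one search term is required")
    # One "body LIKE %s" clause per term, joined with AND.
    clauses = " AND ".join(["body LIKE %s"] * len(terms))
    # Values are passed separately so the driver can quote them.
    params = ["%" + term + "%" for term in terms]
    return "SELECT * FROM articles WHERE " + clauses, params


def build_boolean_query(terms):
    """Return (sql, params) for a boolean-mode search where all terms are required."""
    if not terms:
        raise ValueError("at least one search term is required")
    # Prefix every term with + so that each one is mandatory.
    against = " ".join("+" + term for term in terms)
    sql = "SELECT * FROM articles WHERE MATCH (body) AGAINST (%s IN BOOLEAN MODE)"
    return sql, [against]
```

Either pair can then be handed to the driver's execute(sql, params) call.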
I'm building a word unscrambler using MySQL. Think of it like Scrabble: there is a string representing the letter tiles, and the query should return all words that can be constructed from those letters. I was able to achieve that with this query:
SELECT * FROM words
WHERE word REGEXP '^[hello]{2,}$'
AND NOT word REGEXP 'h(.*?h){1}|e(.*?e){1}|l(.*?l){2}|o(.*?o){1}'
The first part of the query makes sure that the output words are built only from the letter tiles; the second part limits how many times each letter may occur. The query above returns words like hello, hell, hole, etc.
My issue is with blank tiles (wildcards). For example, if the string is "he?lo", the "?" can be replaced with any letter, so the output should include helio and helot.
Can someone suggest a modification to the query that supports wildcards and still enforces the occurrence limits? (There can be up to 2 blank tiles.)
I've got something that comes close. With a single blank tile, use:
SELECT * FROM words
WHERE word REGEXP '^[acre]*.[acre]*$'
AND word NOT REGEXP 'a(.*?a){1}|r(.*?r){1}|c(.*?c){1}|e(.*?e){1}'
With 2 blank tiles, use:
SELECT * FROM words
WHERE word REGEXP '^[acre]*.[acre]*.[acre]*$'
AND word NOT REGEXP 'a(.*?a){1}|r(.*?r){1}|c(.*?c){1}|e(.*?e){1}'
The . in the first regexp allows a character that isn't one of the tiles with a letter on it.
The only problem with this is that the second regexp prevents duplicates of the lettered tiles, but a blank should be allowed to duplicate one of the letters. I'm not sure how to fix this. You could add 1 to the counts in {}, but then it would allow you to duplicate multiple letters even though you only have one blank tile.
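One way around the duplicate problem is to do the occurrence check outside the regexp, by counting letters. The following is a rough Python sketch, not a MySQL solution: can_build is a hypothetical helper, and it assumes the candidate words have already been fetched (for example with the first REGEXP above) and that "?" marks a blank tile.

```python
from collections import Counter


def can_build(word, tiles, blank="?"):
    """Return True if word can be spelled from tiles, using blanks as wildcards."""
    available = Counter(tiles)
    # Take the blanks out of the pool of lettered tiles.
    blanks = available.pop(blank, 0)
    needed = Counter(word)
    # Counter subtraction keeps only positive counts, i.e. the letters
    # (with multiplicity) that the lettered tiles cannot cover.
    shortfall = sum((needed - available).values())
    # Same minimum length of 2 as the {2,} in the original query.
    return len(word) >= 2 and shortfall <= blanks
```

With tiles "he?lo", the single blank can cover the second l in hello or the i in helio, but not both a missing l and a missing s in hellos.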
A possible starting point:
Sort the letters in the words; sort the letters in the tiles (eg, "ehllo", "acer", "aerr").
That will avoid some of the ORing, but still has other complexities.
If this is really Scrabble, what about the need to attach to an existing letter or letters? And do you primarily want to find a way to use all 7 letters?
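To make the sorting idea concrete, here is a small Python sketch (the helper names are invented for illustration, and blank tiles are ignored). Once both strings are sorted, a word fits the tiles exactly when its sorted letters form a subsequence of the sorted tiles.

```python
def letter_key(text):
    """Return the letters of text in sorted order, e.g. 'hello' -> 'ehllo'."""
    return "".join(sorted(text))


def fits_tiles(word, tiles):
    """Return True if the sorted word is a subsequence of the sorted tiles."""
    remaining = iter(letter_key(tiles))
    # "ch in remaining" advances the iterator, so each tile is used at most once.
    return all(ch in remaining for ch in letter_key(word))
```

Storing letter_key(word) in an extra column would also allow exact anagram lookups with a plain equality comparison.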
I store lyrics of songs and also allow chords to be added by putting them between square brackets (e.g: [Dm7]). Here's an example of lyrics stored in my database:
Left my fear [Dm7]by the side of the [B]road
Hear You[C] speak won't let[E] go
Fall to my knees
...
What I want to do is search for lyrics in songs. For example, I might want to search for the lyrics "fear by the side". The problem is that the [Dm7] in my example above prevents a simple LIKE search from matching.
Is it possible to do a search (REGEX?) that excludes text such as [Dm7] from a query? If so how? Please note that the chords between the square brackets can vary.
You might like to consider a fulltext index, and then use match() against() in your where clause. Example:
create fulltext index ftx on songs(lyrics);
select *
from songs
where match(lyrics) against('fear by the side');
The matching is a little fuzzy, and you can't use the boolean mode matching because the chords don't have whitespace on both sides, but the normal mode should be sufficient.
The 'fuzziness' of the match can be used to provide a match ranking; this works best on English text, which this seems to be. For example:
select match(lyrics) against('fear by the side') as match_rank,
lyrics from songs
where match(lyrics) against('fear by the side')
order by match(lyrics) against('fear by the side') desc;
This sorts the results by best match and also returns the match score. (The alias is match_rank rather than rank because RANK is a reserved word in MySQL 8.0.)
The fulltext index also has a boolean mode which, as the name suggests, can be used to force the results to include or exclude certain words, like so:
match(column) against('+word -otherword' in boolean mode) would return all rows for which column contains word but does not contain otherword.
Your fulltext index can also span multiple columns, if you desire.
Thanks to @SvenB and his suggestion of this post, this was my answer:
REPLACE(col, SUBSTRING(col, (LOCATE('[', col)), LOCATE(']', col) - (LOCATE('[', col)) + 1), '') LIKE '%fear by the side%'
It's a bit messy but works! Note that it strips only the first bracketed chord in each value. I think in the long term FULLTEXT search is the way to go, based on others' comments.
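If the filtering can happen in application code, removing every bracketed chord is a one-line regular expression. This is just a Python sketch with made-up helper names; it assumes chords never contain a closing bracket.

```python
import re

# A chord is anything between "[" and the next "]".
CHORD = re.compile(r"\[[^\]]*\]")


def strip_chords(lyrics):
    """Return the lyrics with every [chord] marker removed."""
    return CHORD.sub("", lyrics)


def lyrics_contain(lyrics, phrase):
    """Case-insensitive substring search that ignores chord markers."""
    return phrase.lower() in strip_chords(lyrics).lower()
```

The same function could also fill a second, chord-free column when a song is saved, so that LIKE or a fulltext index can run against clean text.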
I have created a full-text search index on the ClientReference column, but it doesn't work if I want to search for characters appearing anywhere in the string.
String = ' abcdef '
This won't work:
SELECT * FROM Proposals
WHERE CONTAINS([ClientReference], '"*bc*"')
But it works if I use a prefix:
SELECT * FROM Proposals
WHERE CONTAINS([ClientReference], '"a*"')
ADDED
Someone has just mentioned that "it is not possible. You can only search based on words, not on characters within a word."
So why does the following work and find '223' anywhere in the string?
select ClientReference1 from ClientReferences
where CONTAINS([ClientReference1], '"*223*"')
If you don't have lots of text and/or lots of rows (millions+), you may be better served just using LIKE instead of CONTAINS.
SELECT * FROM Proposals WHERE ClientReference LIKE '%bc%'
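If the search text comes from a user, characters such as % and _ act as wildcards inside LIKE and should be escaped first. Below is a rough Python sketch: the helper names are hypothetical, and it assumes a driver with ? placeholders and a backslash declared through LIKE's ESCAPE clause.

```python
def escape_like(term, escape="\\"):
    """Escape LIKE wildcard characters so they are matched literally."""
    # Escape the escape character first so it is not doubled later.
    for ch in (escape, "%", "_", "["):
        term = term.replace(ch, escape + ch)
    return term


def contains_pattern(term):
    """Build a 'match anywhere in the string' pattern for LIKE."""
    return "%" + escape_like(term) + "%"


# The pattern is passed as a parameter; ESCAPE names the escape character.
SQL = "SELECT * FROM Proposals WHERE ClientReference LIKE ? ESCAPE '\\'"
```

The query would then be run with something like cursor.execute(SQL, [contains_pattern(user_input)]).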
I am working on a project. I have a paragraph and some tags like C#, mysql, .net, ajax, etc. I want to check whether my paragraph contains these tags and, if so, which ones it contains and how many of them match. Depending on the number of tags matched, I have to give a score. I can't work out how to do this: I can't use an IN clause here, nor can I use FIND_IN_SET(). Please help me with how to achieve this.
You might want to look at the REGEXP feature of MySQL:
CREATE TABLE texts (
    paragraph TEXT,
    FULLTEXT INDEX (paragraph)
) ENGINE = MyISAM;
INSERT INTO texts ( paragraph ) values
( "this is a very uninteresting paragraph "),
( "MySQL can be fun and useful, with or without PHP!"),
(" I have misspelled phpone, but I didn't mean the programming language!");
select paragraph FROM texts
where paragraph REGEXP( "[[:<:]]php[[:>:]]");
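It's not as efficient as a FULLTEXT search but may better fit your needs, depending on your data. Note that the [[:<:]] and [[:>:]] word-boundary markers work only before MySQL 8.0.4; later versions use \b instead (written '\\b' in a string literal).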
MySQL's Full text search seems to be exactly what you're looking for, although by default it might have some troubles searching for C#, because of both the short length and the special character.
Alternatively, you can just use LIKE, to search for paragraph LIKE '%PHP%'.
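If the scoring can be done in application code, tags with special characters such as C# and .net are easier to handle there. Here is a minimal Python sketch; find_tags and tag_score are invented names, and it assumes a tag counts as matched when it is not directly surrounded by letters or digits.

```python
import re


def find_tags(paragraph, tags):
    """Return the tags that occur in paragraph as whole tokens (case-insensitive)."""
    found = []
    for tag in tags:
        # re.escape handles characters like "#" and "."; the lookarounds stop
        # "php" from matching inside "phpone".
        pattern = r"(?<![A-Za-z0-9])" + re.escape(tag) + r"(?![A-Za-z0-9])"
        if re.search(pattern, paragraph, flags=re.IGNORECASE):
            found.append(tag)
    return found


def tag_score(paragraph, tags):
    """Score a paragraph by the number of distinct tags it contains."""
    return len(find_tags(paragraph, tags))
```

For the paragraph "I use C#, MySQL and ajax daily, but I misspelled phpone." and the tags C#, mysql, .net, ajax and php, this matches C#, mysql and ajax, for a score of 3.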
I've got a field in my table for tags on some content, separated by spaces, and at the moment I'm using:
SELECT * FROM content WHERE tags LIKE '%$selectedtag%'
But if the selected tag is elephant, it will select content tagged with bigelephant and elephantblah etc...
How do I get it to just select what I want precisely?
SELECT * FROM content WHERE tags RLIKE '[[:<:]]elephant[[:>:]]'
If your table is MyISAM, you can use this:
SELECT * FROM content WHERE MATCH(tags) AGAINST ('+elephant' IN BOOLEAN MODE)
Performance can be drastically improved by creating a FULLTEXT index on the column:
CREATE FULLTEXT INDEX ix_content_tags ON content (tags)
but the query will work even without the index.
For this to work, you should adjust @@ft_min_word_len so that tags shorter than 4 characters are indexed, if you have any.
You could use MySQL's regular expressions.
SELECT * FROM content WHERE tags REGEXP '(^| )selectedtag($| )'
Be aware, though, that the use of regular expressions adds an overhead and might perform poorly in some circumstances.
Another simple way, if you can alter your database data, is to ensure that there is an empty space before the first tag and after the last one; A little like: " elephant animal ". That way you can use wildcards.
SELECT * FROM content WHERE tags LIKE '% selectedtag %'
I would consider a different design here. This constitutes a many-to-many relationship, so you could have a tags table and a join table. In general, atomicity of data saves you from a lot of headaches.
An added bonus of this approach is that you can rename a tag without having to edit every entry containing that tag.
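To show what that design looks like in practice, here is a small self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and all table names, column names and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
-- Join table: one row per (content, tag) pair.
CREATE TABLE content_tags (
    content_id INTEGER REFERENCES content(id),
    tag_id INTEGER REFERENCES tags(id),
    PRIMARY KEY (content_id, tag_id)
);
INSERT INTO content VALUES (1, 'Safari'), (2, 'Circus'), (3, 'Zoo');
INSERT INTO tags VALUES (1, 'elephant'), (2, 'bigelephant'), (3, 'animal');
INSERT INTO content_tags VALUES (1, 1), (1, 3), (2, 2), (3, 3);
""")


def content_with_tag(conn, tag):
    """Return titles of content carrying exactly this tag, ordered by id."""
    rows = conn.execute(
        """SELECT c.title
           FROM content c
           JOIN content_tags ct ON ct.content_id = c.id
           JOIN tags t ON t.id = ct.tag_id
           WHERE t.name = ?
           ORDER BY c.id""",
        (tag,),
    )
    return [row[0] for row in rows]
```

Because the comparison is a plain equality on tags.name, elephant no longer matches bigelephant, and renaming a tag is a single UPDATE on the tags table.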
A: Create a separate tags table that points to the content via contentid and contains a keyword, then:
select a.*
from content a, tags b
where a.id = b.contentid
and b.keyword = 'elephant'
group by a.id;
B: Put a comma between the tags and before and after them, like ",bigelephant,elephant,", then use LIKE '%,elephant,%'.
WHERE tags LIKE '% {$selectedtag} %'
OR tags LIKE '{$selectedtag} %'
OR tags LIKE '% {$selectedtag}'
OR tags = '{$selectedtag}'
The '%' is a wildcard in SQL. Remove the wildcards and you will get an exact match, though that only helps when the column holds a single tag.
Redo the table design but for now if you use spaces to delimit between tags you COULD do this:
SELECT * FROM content WHERE tags LIKE '% elephant %';
Just make sure that you lead and end with a space as well (or replace the spaces with commas if you're doing it that way)
Again though, the best option is to set up a many-to-many relationship in your database but I suspect you're looking for a quick and dirty one-off fix.