How to find words by partial strings - mysql

I have been trying to solve this problem for hours, but I don't know how to approach it, so I need a push in the right direction.
I want to create a page where users can find the appropriate word, by providing word length and characters.
For example, user wants to find all the 5 letter words, where the second letter is R and fourth V, like this:
_R_V_
I have a table with column WORDS with words "letter", "moon", "drive", "mrive" and the query should return: "drive" and "mrive".
Is it possible to do it in MySQL?
While looking for a direction I found that I should create a trie structure. I don't know how to do that, but I will learn it if there is no easier way.

Yes, you can use LIKE:
SELECT * FROM YourTable t
WHERE t.word_col LIKE '_R_V_'
The _ wildcard stands for any single character. This also forces the string to be exactly 5 characters long, since the % wildcard is not used.
You can find a great explanation about LIKE wildcards in the link above.
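For the page described in the question, the pattern has to be built from the user's input (word length plus known letters). A minimal sketch in Python, assuming a hypothetical helper named build_like_pattern and assuming the default backslash escape character for LIKE:

```python
def build_like_pattern(length, known_letters):
    """Build a LIKE pattern with '_' for every unknown position.

    known_letters maps 1-based positions to single characters
    (hypothetical input format, chosen for this sketch).
    """
    cells = ["_"] * length
    for position, letter in known_letters.items():
        if not 1 <= position <= length:
            raise ValueError(f"position {position} is outside 1..{length}")
        if len(letter) != 1:
            raise ValueError("each known letter must be a single character")
        # Escape LIKE metacharacters so user input cannot act as a wildcard.
        if letter in ("%", "_", "\\"):
            letter = "\\" + letter
        cells[position - 1] = letter
    return "".join(cells)


pattern = build_like_pattern(5, {2: "R", 4: "V"})
print(pattern)  # _R_V_
```

The resulting string would then be passed to the query as a bound parameter rather than concatenated into the SQL text.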

Related

MySql Specific Search - Replace String

I need to search words that contain multiple number prefixes.
Example:
0119
0129
0139
0149
But there are other prefixes, such as 0155859, 0128889
Etc.
If I search for 0%9 it comes up with results I don't want, because it also includes the 0155859 and 0128889 ones
I need to search and list ONLY the ones that have 0119, etc.
How do I do it?
0XX9 (where XX is any two characters, so 0119, 0129, etc. The % matches all other characters until a 9 appears, and I don't want that.)
I'm doing my best with my English, correct me if I didn't express myself right!
In a LIKE pattern, the _ character matches any single character. So you can do:
WHERE word LIKE '0__9%'
This matches a word that begins with 0, then any two characters, then 9, then anything after that.
My gut feeling at seeing your question was to consider using REGEXP, which is MySQL's regex matching operator. Try the following query:
SELECT *
FROM yourTable
WHERE word REGEXP '0[0-9][0-9]9'
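The pattern used would match any word containing a zero, followed by any two digits, followed by a 9. Because the pattern is not anchored, it matches that sequence anywhere in the word; to require it at the start of the word, anchor it with ^, as in '^0[0-9][0-9]9'.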
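To see how the patterns above differ, here is a rough Python emulation. The helper like_to_regex is hypothetical and ignores escape sequences and collation; it only models % and _. The value 5501195 is an extra sample added for illustration.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern into a regex (only % and _ are modelled)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any run of characters, possibly empty
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    # LIKE has to match the whole value, hence fullmatch.
    return like_to_regex(pattern).fullmatch(value) is not None


values = ["0119", "0129", "0155859", "0128889", "5501195"]

loose = [v for v in values if like(v, "0%9")]
strict = [v for v in values if like(v, "0__9%")]
unanchored = [v for v in values if re.search("0[0-9][0-9]9", v)]
anchored = [v for v in values if re.search("^0[0-9][0-9]9", v)]

print(loose)       # ['0119', '0129', '0155859', '0128889']
print(strict)      # ['0119', '0129']
print(unanchored)  # ['0119', '0129', '5501195']
print(anchored)    # ['0119', '0129']
```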

Performance of LIKE 'xyz%' v/s LIKE '%xyz'

I was wondering how the LIKE operator actually works.
Does it simply start from the first character of the string and try matching the pattern, moving one character to the right each time? Or does it look at the placement of the %, i.e. if it finds the % to be the first character of the pattern, does it start from the rightmost character and match from there, moving one character to the left on each successful match?
Not that I have any use case in my mind right now, just curious.
edit: made question narrow
If there is an index on the column, putting constant characters at the front of the pattern lets your DBMS use a more efficient seek instead of scanning every row. But even in the simplest form, the DBMS has to test characters. If it can tell early on that a value doesn't match, it can discard it and move on to the next one.
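Why a leading constant helps can be illustrated with a sorted list standing in for an index. This is only an analogy, not how any particular DBMS is implemented, and the sample names are made up:

```python
from bisect import bisect_left

# A sorted list plays the role of an index on the column.
names = sorted(["mickey", "mickey mouse", "minnie",
                "mighty mouse", "donald", "daisy"])


def prefix_search(sorted_names, prefix):
    """LIKE 'prefix%': binary-search to the first candidate,
    then read forward until the prefix stops matching."""
    start = bisect_left(sorted_names, prefix)
    matches = []
    for name in sorted_names[start:]:
        if not name.startswith(prefix):
            break
        matches.append(name)
    return matches


def suffix_search(sorted_names, suffix):
    """LIKE '%suffix': the ordering does not help,
    so every entry has to be inspected."""
    return [name for name in sorted_names if name.endswith(suffix)]


print(prefix_search(names, "mickey"))  # ['mickey', 'mickey mouse']
print(suffix_search(names, "mouse"))   # ['mickey mouse', 'mighty mouse']
```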
The LIKE search condition uses wildcards to search for patterns within a string. For example:
WHERE name LIKE 'Mickey%'
will locate all values that begin with 'Mickey', optionally followed by any number of characters. With a case-insensitive, accent-insensitive collation the comparison ignores case and accents, and you can use more than one % in a pattern, for example
WHERE name LIKE '%mouse%'
will return all values with 'mouse' (or 'Mouse' or 'mousé') in it.
The % is inclusive, meaning that
WHERE name like '%A%'
will return all values that start with an 'A', contain an 'A', or end with an 'A'.
You can use _ (underscore) for any character on a single position:
WHERE name LIKE '_at%'
will give you all values with 'a' as the second letter and 't' as the third. The first letter can be anything. For example: 'Batman'
In T-SQL, if you use [] you can find values in a range.
WHERE name LIKE '[c-f]%'
This finds any value beginning with a letter between c and f, inclusive, meaning it returns any value that starts with c, d, e or f. The [] syntax is T-SQL only. Use [^ ] to find values not in a range.
Finding all values that contain a number:
WHERE name LIKE '%[0-9]%'
returns everything that has a number in it. Example: 'Godfather2'
If you are looking for all values where the 3rd position is a '-' (dash), use two underscores:
WHERE name LIKE '__-%'
It will return for example: 'Lo-Res'
To find values whose names end in 'xyz', use:
WHERE name LIKE '%xyz'
returns anything that ends with 'xyz'
To find a % sign in a name, use brackets:
WHERE name LIKE '%[%]%'
will return for example: 'Top%Movies'
To search for [, put brackets around it:
WHERE name LIKE '%[[]%'
gives results as: 'New York [NY]'
The database collation determines both case sensitivity and the sort order used for character ranges. You can optionally use COLLATE to specify the collation used by the LIKE operator.
Usually the main performance bottleneck is IO. The efficiency of the LIKE operator itself only matters if the whole table fits in memory; otherwise IO will take most of the time.
AFAIK Oracle can use indexes for prefix matching (like 'abc%'), but these indexes cannot be used for more complex expressions.
Anyway, if you only have this kind of query, you should consider a simple index on the relevant column. (This is probably true for other RDBMSs as well.)
Otherwise the LIKE operator is generally slow, but most RDBMSs have some kind of full-text search solution. I think the main reason for the slowness is that LIKE is too general. Full-text indexes usually have lots of options that tell the database what you really want to search for, and with this additional information the DB can do its job more efficiently.
As a rule of thumb, if you want to search in a text field and you think performance can be an issue, consider your RDBMS's full-text search solution. If the real goal is not text searching and the search is only a "design side effect" (for example XML/JSON/statuses stored in a text field), then you should probably consider a more efficient way of storing the data (if there is one).

How do I assign a variable to each letter of a string in MySQL?

I am trying to figure out a way of doing an "anagram" function as a stored procedure on MySQL. Let's say I have a database containing all the words in the dictionary - I want to enter a parameter of some letters as a VARCHAR and get back a list of words which make up an anagram of those letters.
I guess what I'm sort of saying is, how do I run an SQL command to say "Select all words which are the same length as the parameter AND contain each of the letters in the parameter".
I have explored the string functions available (http://www.hscripts.com/tutorials/mysql/string-function.php). I'm sure these can be used in conjunction in some way but can't quite get the syntax right when it gets complicated.
I am new to SQL, and it just seems like the String functions available are very limited. Any help would be greatly appreciated :)
You don't; it's not a sensible thing to ask a relational database to do.
However, if someone was forcing me at gunpoint to implement anagram finding using a relational database, I would denormalize it like this:
word | sorted
-----|-------
bar | abr
bra | abr
keel | eekl
leek | eekl
Where "sorted" consists of all of the letters in "word", sorted using any rule you like as long as it's a total order. You would use something other than SQL to compute that part.
Then you could find anagrams with something like this:
SELECT w2.word AS anagram
FROM words w1
JOIN words w2 ON w1.sorted=w2.sorted
WHERE w1.word = 'leek'
AND w2.word <> w1.word
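The "sorted" column has to be computed outside SQL. A minimal sketch in Python, assuming alphabetical order as the total order and a hypothetical helper named signature; the grouping at the end mirrors what the self-join does:

```python
from collections import defaultdict


def signature(word):
    """Return the letters of the word in alphabetical order."""
    return "".join(sorted(word.lower()))


words = ["bar", "bra", "keel", "leek"]

# Rows as they would be inserted into the (word, sorted) table.
rows = [(word, signature(word)) for word in words]
print(rows)
# [('bar', 'abr'), ('bra', 'abr'), ('keel', 'eekl'), ('leek', 'eekl')]

# Group words by signature, then look up the anagrams of one word.
groups = defaultdict(list)
for word, sig in rows:
    groups[sig].append(word)

anagrams_of_leek = [w for w in groups[signature("leek")] if w != "leek"]
print(anagrams_of_leek)  # ['keel']
```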
SQL is probably not the right place to do this; you should do it on the front end.
First of all, consider the properties of an anagram: it has the same length as the word it is made from. You can start by retrieving the dictionary words of that length.
Instead of creating a variable per letter, consider using an array.
Each letter maps to an index (a=0, b=1, c=2, d=3, etc.). Each time you run into a letter, increase the value in its bucket, so for the word "dad" you'll end up with a structure that looks like this:
arr[0]=1, arr[1]=0, arr[2]=0, arr[3]=2, arr[4]=0 and so on...
Now you can just check whether a candidate word's array matches, item by item.
It is not impossible in SQL either: you can represent that kind of logic in the database, for example with another table that references the dictionary word and stores the array in each row, so you can retrieve all the words with the same values.
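The counting approach described above can be sketched as follows. The function names are hypothetical, and the sketch assumes plain a-z letters only (anything else is ignored):

```python
def letter_counts(word):
    """Count occurrences of each letter: index 0 is 'a', index 25 is 'z'."""
    counts = [0] * 26
    for ch in word.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1
    return counts


def is_anagram(candidate, letters):
    """Two words are anagrams when every bucket holds the same count."""
    return letter_counts(candidate) == letter_counts(letters)


print(letter_counts("dad")[:5])     # [1, 0, 0, 2, 0]
print(is_anagram("keel", "leek"))   # True
print(is_anagram("keel", "kelp"))   # False
```

Equal arrays imply equal length, so the length check mentioned above is only a cheap pre-filter.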

Punctuation insensitive search in mySQL

I have a database of phrases that users will search for from their own input. I want them to find the phrase regardless of what punctuation they use. For example if the phrase, "Hey, how are you?" is in the row, I want all of the following searches to return it:
"Hey! How are you?!"
"Hey how are you?"
"Hey :) How are you?"
Right now, I have the columns 'phrase' and 'phrase_search'. Phrase search is a stripped down version of phrase so our example would be 'hey-how-are-you'.
Is there any way to achieve this without storing the phrase twice?
Thank you!
-Nicky
What you've done is probably the most time-efficient way of doing it. Yes, it requires double the space, but is that an issue?
If it is an issue, a possible solution would be to convert your search string to use wildcards (e.g. %Hey%how%are%you%) and then filter the SQL results in your code by applying the same stripping function to both the database value and the search string and comparing them. The rationale behind this is that there should be relatively few matches with non-punctuation characters in between the words, so you're still getting MySQL to do the "heavy lifting" while your PHP/Perl/Python/whatever code can do a more fine-grained check on a relatively small number of rows.
(This assumes that you have some code calling this, rather than a user typing the SQL query from the command line, of course.)
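Both halves of that idea can be sketched in Python. The helper names are hypothetical, and the sketch assumes that only ASCII letters and digits count as word characters:

```python
import re


def normalize(phrase):
    """Stripping function: lowercase words joined by '-', punctuation dropped."""
    return "-".join(re.findall(r"[a-z0-9]+", phrase.lower()))


def wildcard_pattern(phrase):
    """Build the coarse LIKE pattern, e.g. %Hey%how%are%you%."""
    return "%" + "%".join(re.findall(r"[A-Za-z0-9]+", phrase)) + "%"


stored = "Hey, how are you?"
searches = ["Hey! How are you?!", "Hey how are you?", "Hey :) How are you?"]

print(wildcard_pattern("Hey how are you?"))  # %Hey%how%are%you%
print(normalize(stored))                     # hey-how-are-you

# Fine-grained check applied in code to the rows the LIKE query returns.
print(all(normalize(s) == normalize(stored) for s in searches))  # True
```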

Regexp MySql- Only strings containing two words

I have table with rows of strings.
I'd like to search for those strings that consist of only two words.
I tried a few ways with [[:space:]] etc., but MySQL was also returning three- and four-word strings.
try this:
select * from yourTable WHERE field REGEXP('^[[:alnum:]]+[[:blank:]]+[[:alnum:]]+$');
more details in link :
http://dev.mysql.com/doc/refman/5.1/en/regexp.html
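^\w+\s\w+$ should do well. Note that older MySQL versions (before the ICU-based regex engine introduced in 8.0) do not support \w and \s in REGEXP, so use POSIX classes such as [[:alnum:]] and [[:space:]] there; where they are supported, the backslashes must be doubled inside a string literal ('^\\w+\\s\\w+$').
Note: what I have noticed more and more lately is that almost nobody uses the ^ and $ anchors.
They are absolutely needed if you want to tell whether a string starts or ends with something, or want to match the whole string exactly, as you do. Unanchored patterns like the one you used (I assume something like \w[:space]\w) match anywhere in the string, which means they also match if the condition is true anywhere within a longer string!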
Keep that in mind and Regex will serve you well :)
REGEXP ('^[a-z0-9]+[[:space:]][a-z0-9]+$')
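The effect of the anchors can be checked quickly in Python. This uses Python's re module, whose syntax differs from MySQL's REGEXP, so it only illustrates the anchoring behaviour; the helper name is hypothetical:

```python
import re

TWO_WORDS = re.compile(r"\w+\s\w+")


def is_two_words(text):
    """Anchored check: the whole string must be word, one whitespace, word."""
    return TWO_WORDS.fullmatch(text) is not None


def contains_two_words(text):
    """Unanchored check: succeeds if the pattern occurs anywhere."""
    return TWO_WORDS.search(text) is not None


print(is_two_words("hello world"))          # True
print(is_two_words("one two three"))        # False
print(contains_two_words("one two three"))  # True
print(is_two_words("single"))               # False
```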