REGEXP in MySQL

I have a database of all English words. I want to find words that match a particular set of letters. For example, if I ask for "eta", it should return all the words that contain 'e', 'a' and 't' in any combination. But when I run the following code:
SELECT word FROM wordlist
WHERE
word REGEXP 'eta';
it returns only words that contain the exact sequence 'eta', such as 'beta', 'caretake' and 'detach'.
How do I use REGEXP properly in MySQL for this?
Thanks in advance.

SELECT word
FROM wordlist
WHERE word REGEXP 'e'
AND word REGEXP 'a'
AND word REGEXP 't';
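To check the logic of the three-condition query without a MySQL server, here is a minimal sketch in Python using the standard-library sqlite3 module. SQLite has no built-in REGEXP implementation, so the sketch registers one backed by Python's re module; that is an assumption standing in for MySQL's regex engine, and the table contents are made up for illustration.

```python
import re
import sqlite3


def regexp(pattern, value):
    """User function behind SQLite's REGEXP operator.

    SQLite evaluates "X REGEXP Y" as regexp(Y, X), so the pattern comes first.
    re.IGNORECASE approximates MySQL's default case-insensitive collations.
    """
    return value is not None and re.search(pattern, value, re.IGNORECASE) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.execute("CREATE TABLE wordlist (word TEXT)")
# Hypothetical sample data, not the asker's dictionary.
conn.executemany(
    "INSERT INTO wordlist (word) VALUES (?)",
    [(w,) for w in ["beta", "tea", "ate", "eat", "tree", "cat", "dog"]],
)

# One REGEXP test per required letter, combined with AND, so letter order is irrelevant.
rows = conn.execute(
    """
    SELECT word FROM wordlist
    WHERE word REGEXP 'e'
      AND word REGEXP 'a'
      AND word REGEXP 't'
    ORDER BY word
    """
).fetchall()
matches = [word for (word,) in rows]
print(matches)  # ['ate', 'beta', 'eat', 'tea']
```

The separate tests are what make the order irrelevant: a single pattern such as 'eta' only matches that exact sequence of characters.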

Try the following, which matches words containing at least one of 'e', 't' or 'a':
SELECT word FROM wordlist WHERE word REGEXP '(e|t|a)'

Related

MySQL 8 + Regex Word Boundaries

I want to search for the term 'ed' at the start or end of a word, but the following SQL statement only matches 'ed' as an exact word:
SELECT * FROM ul_product
where productname REGEXP '\\bed\\b'
If I do the following, it gets results where 'ed' is at the start or end of a word:
SELECT * FROM ul_product
where productname REGEXP '(\\bed)|(ed\\b)'
Is this how it's supposed to work?
The description of word boundary and examples online led me to believe statement 1 would produce the results of statement 2.
I can use the statements I've created as is for my 'exact' and 'partial' matching, but is this right?
Regex '\\bed\\b' searches for 'ed' surrounded by word boundaries; in other words, it searches for the whole word 'ed'.
On the other end, regex: '(\\bed)|(ed\\b)' searches for either '\\bed' or 'ed\\b' (the pipe character stands for "or" in regexes). So it matches on 'ed' at the beginning of a word or at the end of a word - which seems to be what you want.
Note that the parentheses are not necessary here. You could just write this as:
where productname REGEXP '\\bed|ed\\b'
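Python's re module treats \b essentially the same way for plain ASCII text (an assumption worth rechecking for non-ASCII data), so the difference between the two patterns can be checked with a small sketch; the product names are made up. In Python the regex is written once as a raw string, r"\b", whereas in a MySQL string literal the backslash is doubled ('\\b') because the string parser consumes one of them.

```python
import re

# Hypothetical product names.
names = ["ed", "Red Chair", "Edge Case", "bedroom", "Ed Sheeran mug", "pushed"]

# IGNORECASE mimics a case-insensitive MySQL collation.
exact = re.compile(r"\bed\b", re.IGNORECASE)       # 'ed' as a whole word
partial = re.compile(r"\bed|ed\b", re.IGNORECASE)  # 'ed' at the start OR the end of a word

exact_hits = [n for n in names if exact.search(n)]
partial_hits = [n for n in names if partial.search(n)]

print(exact_hits)    # ['ed', 'Ed Sheeran mug']
print(partial_hits)  # ['ed', 'Red Chair', 'Edge Case', 'Ed Sheeran mug', 'pushed']
```

'bedroom' matches neither pattern, because its 'ed' sits in the middle of a word.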

SQL Query to include/exclude characters

I have a dictionary table 'dictionary', where column 'word' contains the list of all English words.
I want to find all words that contain specific alphabet characters only and exclude the rest of the alphabet. I am able to do so (see the example below), but as you can see, it is downright ugly.
EXAMPLE
Currently, to find all words that contain the letters 'a', 'b', 'c', 'x', 'y', 'z' but exclude the rest of the alphabet, I do this:
SELECT word
FROM dictionary
WHERE (
word LIKE '%a%'
OR word LIKE '%b%'
OR word LIKE '%c%'
OR word LIKE '%x%'
OR word LIKE '%y%'
OR word LIKE '%z%'
) AND (
word NOT LIKE '%d%'
AND word NOT LIKE '%e%'
AND word NOT LIKE '%f%'
AND word NOT LIKE '%g%'
AND word NOT LIKE '%h%'
AND word NOT LIKE '%i%'
AND word NOT LIKE '%j%'
AND word NOT LIKE '%k%'
AND word NOT LIKE '%l%'
AND word NOT LIKE '%m%'
AND word NOT LIKE '%n%'
AND word NOT LIKE '%o%'
AND word NOT LIKE '%p%'
AND word NOT LIKE '%q%'
AND word NOT LIKE '%r%'
AND word NOT LIKE '%s%'
AND word NOT LIKE '%t%'
AND word NOT LIKE '%u%'
AND word NOT LIKE '%v%'
AND word NOT LIKE '%w%' )
Any way to accomplish this task using some form of regex or other optimization?
Any tricks or hints would be much appreciated.
You can achieve it using REGEXP:
SELECT `word`
FROM `dictionary`
WHERE `word` REGEXP '[abcxyzABCXYZ]'
AND `word` NOT REGEXP '[defghijklmnopqrstuvwDEFGHIJKLMNOPQRSTUVW]'
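This variant adds explicit .* wildcards and a $ anchor; for single-line values it returns the same rows, since REGEXP already matches anywhere in the string.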
SELECT word
FROM dictionary
WHERE word REGEXP '.*[abcxyzABCXYZ].*$'
AND word NOT REGEXP '.*[defghijklmnopqrstuvwDEFGHIJKLMNOPQRSTUVW].*$';
Check this:
SET @1 = 'a';
SET @2 = 'b';
SET @3 = 'c';
SET @4 = 'x';
SET @5 = 'y';
SET @6 = 'z';
SET @find = CONCAT(@1, @2, @3, @4, @5, @6);
SET @ALl = 'abcxyzdefghijklmnopqrstuvw'; -- all letters of the alphabet
SET @ALl = REPLACE(@ALl, @1, ''); -- remove each letter you want to keep
SET @ALl = REPLACE(@ALl, @2, '');
SET @ALl = REPLACE(@ALl, @3, '');
SET @ALl = REPLACE(@ALl, @4, '');
SET @ALl = REPLACE(@ALl, @5, '');
SET @ALl = REPLACE(@ALl, @6, '');
SELECT word
FROM dictionary
WHERE word RLIKE CONCAT('.*[', @find, '].*$')
AND word NOT RLIKE CONCAT('.*[', @ALl, '].*$');
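The variable juggling above just builds two bracket expressions from a list of wanted letters. Here is the same construction as a Python sketch; the function name letter_classes and the sample words are hypothetical, and string.ascii_lowercase supplies the full alphabet.

```python
import re
import string


def letter_classes(wanted):
    """Return (include, exclude) bracket expressions for the given letters.

    Mirrors the SQL above: @find holds the wanted letters and @ALl holds
    the alphabet with every wanted letter removed.
    """
    find = "".join(sorted(set(wanted)))
    rest = "".join(c for c in string.ascii_lowercase if c not in find)
    return f"[{find}]", f"[{rest}]"


include, exclude = letter_classes("abcxyz")
print(include)  # [abcxyz]
print(exclude)  # [defghijklmnopqrstuvw]

words = ["cab", "baby", "zax", "cat", "box"]
# Keep a word if it has at least one wanted letter and none of the others.
matches = [
    w for w in words
    if re.search(include, w, re.IGNORECASE) and not re.search(exclude, w, re.IGNORECASE)
]
print(matches)  # ['cab', 'baby', 'zax']
```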
If you can't use REGEXP for some reason, then this should work too:
SELECT word
FROM dictionary
WHERE Replace(Replace(Replace(Replace(Replace(Replace(
word, 'a', ''),
'b', ''),
'c', ''),
'x', ''),
'y', ''),
'z', '') = ''
I doubt it will be very fast though. Then again, I'm not sure anything will be fast for this requirement =)
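The nested-REPLACE trick deletes every allowed letter and keeps the word only if nothing is left. A Python sketch of the same idea uses str.translate with a deletion table (the names ALLOWED and only_allowed are illustrative). Like MySQL's REPLACE(), it is case-sensitive, so an uppercase letter survives and rejects the word, and an empty string passes trivially.

```python
# Deletion table: every allowed letter maps to None (i.e. is removed).
ALLOWED = "abcxyz"
DELETE_ALLOWED = str.maketrans("", "", ALLOWED)


def only_allowed(word):
    # Same test as REPLACE(REPLACE(...(word, 'a', '')..., 'z', '') = ''
    return word.translate(DELETE_ALLOWED) == ""


words = ["cab", "baby", "Cab", "cat", ""]
kept = [w for w in words if only_allowed(w)]
print(kept)  # ['cab', 'baby', '']
```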
The test is simply:
word REGEXP '^[abcxyz]*$'
That says that everything from start (^) to end ($) must be a string of zero or more (*) of the characters ([]) abcxyz.
I did not include both lower and upper case on the presumption that you are using a case-insensitive collation.
There is no need for the AND .. NOT.
There is a possible problem: If you want to allow notw, do you want to match won't? That is, what should be done about punctuation?
(There may be other edge cases that are not adequately specified in the Question.)
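The anchored test can be checked the same way. In this Python sketch, re.search with ^...$ plays the role of REGEXP and re.IGNORECASE stands in for a case-insensitive collation; the word list is made up. Note that * also accepts the empty string, so + is the variant to use if at least one letter is required.

```python
import re

words = ["cab", "baby", "cat", "", "Cab"]

# ^ and $ anchor the class to the whole string: every character must be allowed.
star = re.compile(r"^[abcxyz]*$", re.IGNORECASE)  # also accepts ''
plus = re.compile(r"^[abcxyz]+$", re.IGNORECASE)  # requires at least one letter

star_hits = [w for w in words if star.search(w)]
plus_hits = [w for w in words if plus.search(w)]
print(star_hits)  # ['cab', 'baby', '', 'Cab']
print(plus_hits)  # ['cab', 'baby', 'Cab']
```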
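WHERE Field NOT LIKE '%[^a-z0-9 .]%'
(Bracket classes inside LIKE are SQL Server syntax. MySQL's LIKE only understands % and _, so in MySQL the equivalent test is WHERE Field NOT REGEXP '[^a-z0-9 .]'.)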

MySQL regex for word boundary containing '#'

I'm trying to search for an example phrase: '#test123' using regex like:
SELECT (...) WHERE x RLIKE '[[:<:]]#test123[[:>:]]'
It returns nothing. Probably the word-boundary marker '[[:<:]]' does not treat '#' as part of a word.
How can I achieve this? Is there a way to use a word-boundary marker in MySQL regex, but with exceptions?
MySQL 5.7 Reference Manual / ... / Regular Expressions:
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and
end of words, respectively. A word is a sequence of word characters
that is not preceded by or followed by word characters. A word
character is an alphanumeric character in the alnum class or an
underscore (_).
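So '#' is not a word character, and '[[:<:]]' (which only matches where a word begins, i.e. right before a word character) can never match immediately before it. We need to expand the set of "word characters" to include '#'. The simplest way is to enumerate the custom word characters directly, a-z0-9_#: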
SELECT * FROM
(
SELECT '#test123' AS x UNION ALL
SELECT 'and #test123 too' UNION ALL
SELECT 'not#test123not' UNION ALL
SELECT 'not#test123' UNION ALL
SELECT '#test123not' UNION ALL
SELECT 'not # test123' UNION ALL
SELECT 'test123' UNION ALL
SELECT '#west123'
) t
WHERE x RLIKE '([^a-z0-9_#]|^)#test123([^a-z0-9_#]|$)';
Result:
x
----------------
#test123
and #test123 too
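The custom-boundary pattern is plain enough to run unchanged in Python's re module, which, like MySQL 8.0's ICU-based engine, has no [[:<:]] marker. This sketch replays the test rows from the query above; re.IGNORECASE is an assumption standing in for a case-insensitive collation.

```python
import re

rows = [
    "#test123",
    "and #test123 too",
    "not#test123not",
    "not#test123",
    "#test123not",
    "not # test123",
    "test123",
    "#west123",
]

# '#' is treated as a word character, so the match must be preceded and
# followed either by a character outside [a-z0-9_#] or by the string start/end.
pattern = re.compile(r"([^a-z0-9_#]|^)#test123([^a-z0-9_#]|$)", re.IGNORECASE)

hits = [x for x in rows if pattern.search(x)]
print(hits)  # ['#test123', 'and #test123 too']
```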
I think you can use below expression instead:
'[.#.][[:<:]]test123[[:>:]]'
Note: don't put non-word literals between [[:<:]] and [[:>:]]; use [..] for such characters instead.
Or (with thanks to @Y.B.)
'(^|.*[^a-zA-Z0-9_])[.#.][[:<:]]test123[[:>:]]'

MySQL REGEXP - Select certain pattern of numbers and characters

Does anyone have a clue how I could select values that follow a certain pattern of numbers and have a 1 at the end?
For example:
SELECT pattern FROM table WHERE pattern REGEXP '1_2+2_2+3_2+4_2&2016-06-09&1';
or
SELECT pattern FROM table WHERE pattern REGEXP '2_1&2016-06-09&1';
The values use the same format (number-underscore-number, ampersand, date, ampersand, number), and I want them as long as the final number is 1.
EDIT:
Actually, let me phrase it better. How do I use REGEXP to select an ampersand and the number 1 at the end of a string?
You don't need regex. Just use LIKE:
LIKE '%&1'
The % makes it not be anchored to the start of the string. LIKE is not regex, but closer to a glob syntax. It may be faster than regex, too.
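If a regex is still wanted, '&1$' is the REGEXP counterpart of LIKE '%&1', and the whole shape from the question can be spelled out as well. The Python sketch below checks both against the question's two sample values plus a made-up non-match; [+] is used instead of \+ so the same pattern text could be pasted into a MySQL string literal without extra backslash escaping.

```python
import re

values = [
    "1_2+2_2+3_2+4_2&2016-06-09&1",  # from the question
    "2_1&2016-06-09&1",              # from the question
    "2_1&2016-06-09&12",             # hypothetical: ends in 12, not 1
]

# Only the ending: an ampersand followed by 1 at the very end.
ends_with_amp_1 = re.compile(r"&1$")

# Whole shape: n_n pairs joined by '+', then &YYYY-MM-DD, then &1.
full_shape = re.compile(
    r"^[0-9]+_[0-9]+([+][0-9]+_[0-9]+)*&[0-9]{4}-[0-9]{2}-[0-9]{2}&1$"
)

ending = [bool(ends_with_amp_1.search(v)) for v in values]
shape = [bool(full_shape.search(v)) for v in values]
like = [v.endswith("&1") for v in values]  # what LIKE '%&1' tests
print(ending)  # [True, True, False]
print(shape)   # [True, True, False]
print(like)    # [True, True, False]
```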
The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.
SELECT column_name
FROM table
WHERE column_name LIKE '%&1';
Note:
You can also use the LIKE operator to match at the start of a string, not only at the end.
Here is an example:
SELECT column_name
FROM table
WHERE column_name LIKE '&1%';

More efficient word boundary query in MySQL

I have a table with 1/2 million phrases and I am doing word matching using this query:
SELECT * FROM `searchIndex` WHERE `indexData` RLIKE '[[:<:]]Hirt'
The indexData field has a FULLTEXT index and is datatype longtext.
I want to match on items like
"Alois M. Hirt"
"Show Biz - Al Hirt, in a new role, ..."
"Al Hirt's Sinatraville open 9 p..."
"Hirt will be playing..."
and not on "shirt" or "thirteen" or "thirty" etc.
The query succeeds, but it frequently takes 3 seconds to return. Is there a better, more efficient way of doing this word-boundary match?
If I were to add another index to indexData, what would be the correct key length to use?
TIA
A FULLTEXT index is not needed for this (RLIKE cannot use it anyway). MySQL has special markers for word boundaries. From the MySQL doc:
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]'; -> 1
mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]'; -> 0
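Which of the question's sample phrases a start-of-word boundary accepts can be sketched in Python, where \b before a letter plays the role of [[:<:]] (and of \b in MySQL 8.0). This only illustrates which rows match; it says nothing about speed, since a regex test like this, in MySQL or here, has to examine every row.

```python
import re

phrases = [
    "Alois M. Hirt",
    "Show Biz - Al Hirt, in a new role, ...",
    "Al Hirt's Sinatraville open 9 p...",
    "Hirt will be playing...",
    "shirt",
    "thirteen",
    "thirty",
]

# \b before 'H' requires that 'Hirt' starts a word. IGNORECASE mimics a
# case-insensitive collation, so 'shirt' is rejected only by the boundary.
start_of_word = re.compile(r"\bHirt", re.IGNORECASE)

hits = [p for p in phrases if start_of_word.search(p)]
print(len(hits))  # 4
```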
setsuna's answer worked very well:
SELECT * FROM searchIndex WHERE MATCH (indexData) AGAINST ('Hirt*' IN BOOLEAN MODE);