MySQL substring match using regular expression; substring contain 'man' not 'woman' - mysql

I have an issue while I fetch data from database using regular expression. While I search for 'man' in tags it returns tags contains 'woman' too; because its substring.
SELECT '#hellowomanclothing' REGEXP '^(.)*[^wo]man(.)*$'; # returns 0 correct, it contains 'woman'
SELECT '#helloowmanclothing' REGEXP '^(.)*[^wo]man(.)*$'; # returns 0 incorrect, it can contain anything other than 'woman'
SELECT '#stylemanclothing' REGEXP '^(.)*[^wo]man(.)*$'; # returns 1 correct
How can I update the regular expression, when I search for 'man' it should return only the tag contains 'man' not 'woman'?

You can use two expressions. I think like is sufficient:
SELECT ('#stylemanclothing' like '%man%' and '#stylemanclothing' not like '%woman%')
Although you can express this in a regular expression, this is probably the easier solution.

Use this:
SELECT '#helloowmanclothing' REGEXP '^(.)*([^o]|[^w]o)man(.)*$'
In your pattern [^wo] stands for "one character except for w and o", while you need to exclude two consecutive characters - w and then o.
Therefore above pattern allows for o before man only if o is preceeded by character other than w.

A variant of n-dru pattern since you don't need to describe all the string:
SELECT '#hellowomanclothing' REGEXP '(^#.|[^o]|[^w]o)man';
Note: if a tag contains 'man' and 'woman' this pattern will return 1. If you don't want that Gordon Linoff solution is what you are looking for.

Related

MySQL regex matching rows containing two or more spaces in a row

I am trying to write a MySQL statement which finds and returns the book registrations that contain 2 or more spaces in a row.
The statement below is wrong.
SELECT * FROM book WHERE titles REGEXP '[:space]{2,}';
Since the 2 spaces already meet your condition, you really do not need to check if there are more than 2. Moreover, if you need to match a regular ASCII space (decimal code 32), you do not need a REGEXP operator, you can safely use
SELECT * FROM book WHERE titles LIKE '% %';
LIKE is preferred in all cases where you can use it instead of REGEXP (see MySQL | REGEXP VS Like)
When you need to match numerous whitespace symbols, you can use WHERE titles REGEXP '[[:space:]]{2}' (it will match [ \t\r\n\v\f]), and if you only plan to match tabs and spaces, use WHERE titles REGEXP '[[:blank:]]{2}'. For more details, see POSIX Bracket Expressions.
Note that [:class_name:] should only be used inside a character class (i.e. inside another pair of [...], otherwise, they are not recognized.
Your POSIX class must be,
SELECT * FROM book WHERE titles REGEXP '[[:space:]]{2,}';
No need for ,
SELECT * FROM book WHERE titles REGEXP '[[:space:]]{2}';
You may also use [[:blank:]]
SELECT * FROM book WHERE titles REGEXP '[[:blank:]]{2}';
If you mean just the space character: REGEXP ' '. Or you could use LIKE "% %", which would be faster. (Note: there are 2 blanks in those.)
Otherwise, see http://dev.mysql.com/doc/refman/5.6/en/regexp.html for blank and space.

Using REGEXP within MySQL to find a certain number within a comma separated list

I have a list of numbers in some fields in a table, for example something like this:
2033,1869,1914,1913,19120,1911,1910,1909,1908,1907,1866,1921,1922,1923
Now, I'm trying to do a query to check if a number is found in the row, however, I can't use LIKE as then it may return false positives as if I did a search for 1912 in the above field I would get a result returned because of the number 19120, obviously we don't want that - we can't append or prepend a comma as the start/end numbers don't have them.
So, onto using REGEXP I go... I tried this, but it doesn't work (it returns a result):
SELECT * FROM cat_listing WHERE cats REGEXP '[^0-9]*1912[^0-9]*';
I imagine why it still finds something is because of the * quantifier; it found [^0-9] 0 times AFTER 1912 so it considers it a match.
I'm not sure how to modify it to do what I want.
In your case, it seems word boundaries are necessary:
SELECT * FROM cat_listing WHERE cats REGEXP '[[:<:]]1912[[:>:]]';
[[:<:]] is the beginning of a word and [[:>:]] is the end. See reference:
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and end of >words, respectively. A word is a sequence of word characters that is not >preceded by or followed by word characters. A word character is an alphanumeric >character in the alnum class or an underscore (_).
You have another option called find_in_set()
SELECT * FROM cat_listing WHERE find_in_set('1912', cats) <> 0;
Returns 0 if str is not in strlist or if strlist is the empty string. Returns NULL if either argument is NULL. This function does not work properly if the first argument contains a comma (“,”) character.
No need to use a regex just because the column value has no comma at either end:
SELECT
cats
FROM cat_listing
WHERE INSTR(CONCAT(',', cats, ','), ',1912,')
;
See it in action: SQL Fiddle.
Please comment if adjustment / further detail is required.

How do you find words with hyphens in a MYSQL REGEXP query using word boundries?

I have a MYSQL query to try to find words with hyphens. I am using the MYSQL word boundary.
SELECT COUNT(id)
AS count
FROM table
WHERE (name REGEXP '^[[<:]]some-words-with-hyphens[[:>:]]/')
This seems to work, although the following does not (see the - after the word "hyphens"):
SELECT COUNT(id)
AS count
FROM table
WHERE (words REGEXP '^[[<:]]some-words-with-hyphens-[[:>:]]/')
I tried to escape the -'s with \- but that did not seem to change the result. I also tried to put the - in brackets like [-], but that did not seem to change the result.
What would be the proper way to write this query with the understanding that hyphens will be within and possibly at the end of the "word"?
As documented under Regular Expressions:
A regular expression for the REGEXP operator may use any of the following special characters and constructs:
[ deletia ]
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]'; -> 1
mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]'; -> 0
Since - and / are both non-word characters, the [[:>:]] construct does not match the point between them.
It's not clear why you're using these constructs at all, as the following ought to do the trick:
words REGEXP '^some-words-with-hyphens-/'
Indeed, it's not clear why you're even using regular expressions in this case, as simple pattern matching can achieve the same:
words LIKE 'some-words-with-hyphens-/%'
Assuming that some-words-with-hyphens is actually a regex and not some verbatim text, you could simply add an optional - at the end of the regex in order to match a trailing dash if it's present:
WHERE (name REGEXP '^[[<:]]some-words-with-hyphens[[:>:]]-?/')
#eggyal has already explained why the word boundary matches before that hyphen.

Regex : string must contain a-z and A-Z and 0-9

I'm using a stored procedure to validate the input parameter. The input parameter must contain a-z and A-Z and 0-9.
for Example:
aS78fhE0 -> Correct
76AfbRZt -> Correct
76afbrzt -> Incorrect(doesn't contain Upper Case A-Z)
asAfbRZt -> Incorrect(doesn't contain Numeric 0-9)
4QA53RZJ -> Incorrect(doesn't contain Lower Case a-z)
what Regular Expression that can validate the input parameter like above example,.?
Many Thanks,Praditha
UPDATEOthers character except Alphanumeric are not allowedI'm Using MySQL version 5
Further from Johns Post and subsequent comments:
The MySql you require would be
SELECT * FROM mytable WHERE mycolumn REGEXP BINARY '[a-z]'
AND mycolumn REGEXP BINARY '[A-Z]'
AND mycolumn REGEXP BINARY '[0-9]'
Add additional
AND mycolum REGEXP BINARY '^[a-zA-Z0-9]+$'
If you only want Alphanumerics in the string
With look-ahead assertion you could do like this:
/^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9]).*$/
update: It seems mysql doesn't support look around assertions.
You could split it up into 3 separate regex to test for each case.
[a-z], [A-Z], and [0-9]
and the results of those matches together, and you can achieve the result you're looking for.
EDIT:
if you're only looking to match alphanumerics, you should do ^[a-zA-Z0-9]+$ as suggested by Ed Head in the comments
My solution is leads to a long expression becuase i will permutate over all 6 possibilities the found capital letter, small letter and the needed number can be arranged in the string:
^(.*[a-z].*[A-Z].*[0-9].*|
.*[a-z].*[0-9].*[A-Z].*|
.*[A-Z].*[a-z].*[0-9].*|
.*[A-Z].*[0-9].*[a-z].*|
.*[0-9].*[a-z].*[A-Z].*|
.*[0-9].*[A-Z].*[a-z].*)$
Edit: Forgot the .* at the end and at the beginning.
Unfortunately, MySQL does not support lookaround assertions, therefore you'll have to spell it out for the regex engine (assuming that only those characters are legal):
^(
[A-Za-z0-9]*[a-z][A-Za-z0-9]*[A-Z][A-Za-z0-9]*[0-9][A-Za-z0-9]*|
[A-Za-z0-9]*[a-z][A-Za-z0-9]*[0-9][A-Za-z0-9]*[A-Z][A-Za-z0-9]*|
[A-Za-z0-9]*[A-Z][A-Za-z0-9]*[a-z][A-Za-z0-9]*[0-9][A-Za-z0-9]*|
[A-Za-z0-9]*[A-Z][A-Za-z0-9]*[0-9][A-Za-z0-9]*[a-z][A-Za-z0-9]*|
[A-Za-z0-9]*[0-9][A-Za-z0-9]*[a-z][A-Za-z0-9]*[A-Z][A-Za-z0-9]*|
[A-Za-z0-9]*[0-9][A-Za-z0-9]*[A-Z][A-Za-z0-9]*[a-z][A-Za-z0-9]*
)$
or, in MySQL:
SELECT * FROM mytable WHERE mycolumn REGEXP BINARY "^([A-Za-z0-9]*[a-z][A-Za-z0-9]*[A-Z][A-Za-z0-9]*[0-9][A-Za-z0-9]*|[A-Za-z0-9]*[a-z][A-Za-z0-9]*[0-9][A-Za-z0-9]*[A-Z][A-Za-z0-9]*|[A-Za-z0-9]*[A-Z][A-Za-z0-9]*[a-z][A-Za-z0-9]*[0-9][A-Za-z0-9]*|[A-Za-z0-9]*[A-Z][A-Za-z0-9]*[0-9][A-Za-z0-9]*[a-z][A-Za-z0-9]*|[A-Za-z0-9]*[0-9][A-Za-z0-9]*[a-z][A-Za-z0-9]*[A-Z][A-Za-z0-9]*|[A-Za-z0-9]*[0-9][A-Za-z0-9]*[A-Z][A-Za-z0-9]*[a-z][A-Za-z0-9]*)$";
[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*|[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*|[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*|[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*|[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*|[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*

Querying a mysql database fetching words with a regexp

I'm using a regexp for fetching a set of words that accomplish the next syntax:
SELECT * FROM words WHERE word REGEXP '^[dcqaahii]{5}$'
My first impression gave me the sensation that it was good till I realized that some letters were used more than contained in the regexp.
The question is that I want to get all words (i.e. of 5 letters) that can be formed with the letters within the brackets, so if I have two 'a' resulting words can have no 'a', one 'a' or even two 'a', but no more.
What should i add to my regexp for avoiding this?
Thanks in advance.
It would probably be better to retrieve all candidates first and post-process, as others have suggested:
SELECT * FROM words WHERE word REGEXP '^[dcqahi]{5}$'
However, nothing is stopping you from doing multiple REGEXPs. You can select 0, 1, or 2 incidences of the letter 'a' with this grungy expression:
'^[^a]*a?[^a]*a?[^a]*$'
So do the pre-filter first and then combine additional REGEXP requirements with AND:
SELECT * FROM words
WHERE word REGEXP '^[dcqahi]{5}$'
AND word REGEXP '^[^a]*a?[^a]*a?[^a]*$'
AND word REGEXP '^[^i]*i?[^i]*i?[^i]*$'
[edit] As an afterthought, I have inferred that for the non-vowels you also want to restrict to 0 or 1 occurrance. So if that's the case, you'd keep going...
AND word REGEXP '^[^d]*d?[^d]*$'
AND word REGEXP '^[^c]*c?[^c]*$'
AND word REGEXP '^[^q]*q?[^q]*$'
AND word REGEXP '^[^h]*h?[^h]*$'
Yuck.
Only solution I can think of would be to use the above SQL you have to get an initial filtered set of data but then loop through it and further filter with some server side code (PHP etc.) which is better suited to doing that kind of logic.
In regular expressions, square brackets [] are merely a character class, like a list of allowed characters. Specifying the same letter twice within the brackets is therefore redundant.
For example the pattern [sed] will match sed, and seed because e is part of the allowed characters. Specifying a character count afterward in braces {} is merely a total count of characters previously allowed by the character class.
The pattern [sed]{3} therefore will match sed but not seed.
I would recommend moving the logic for testing the validity of words from SQL into your program.