MySQL: REGEX Replace StackOverflow-style Comments (BOLD text) [duplicate]

I tried matching keywords with REGEXP in MySQL as follows:
-- Match "fitt*", the asterisk "*" is expected to be matched as-is
> select 'aaaa fitt* bbb' regexp '[[:<:]]fitt\*[[:>:]]'; -- return 1, ok
> select 'aaaa fitttttt* bbb' regexp '[[:<:]]fitt\*[[:>:]]'; -- return 1 as well, but should return 0
> select 'aaaa fitt* bbb' regexp '[[:<:]]fitt\\*[[:>:]]'; -- return 0, failed
How do I escape the asterisk (*) so that it matches a literal * character?

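\\* is the correct way to match a literal asterisk. With a single backslash, \* is not a recognized escape sequence in a MySQL string literal, so MySQL drops the backslash and the regex engine sees fitt*, i.e. "fit" followed by zero or more t characters. That is why fitttttt* matched. However, [[:>:]] won't match after \\*, because it only matches at the end of a word, between a word character and a non-word character, and * is not a word character. Instead, you need to match a non-word character there explicitly. You also need an alternative for the end of the string, since that is the other place where a word can end.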
> select 'aaaa fitt* bbb' regexp '[[:<:]]fitt\\*([^[:alnum:]]|$)'; -- returns 1
> select 'aaaa fitttttt* bbb' regexp '[[:<:]]fitt\\*([^[:alnum:]]|$)'; -- returns 0
Another way to match the asterisk explicitly is by putting it in a character class.
> select 'aaaa fitt* bbb' regexp '[[:<:]]fitt[*]([^[:alnum:]]|$)'; -- returns 1
> select 'aaaa fitttttt* bbb' regexp '[[:<:]]fitt[*]([^[:alnum:]]|$)'; -- returns 0
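To check the reasoning above without a MySQL server, the sketch below runs roughly equivalent patterns through Python's re module. This is an assumption-laden stand-in, not MySQL's regex engine. Python's \b replaces [[:<:]] because it sits right before the word character f. It also replaces [[:>:]] because it sits right after the word character t. [^A-Za-z0-9] (ASCII only) replaces [^[:alnum:]]. The hypothetical helper regexp() mimics REGEXP, which returns 1 when the pattern matches anywhere. WHAT_MYSQL_SAW is the question's first pattern after MySQL has dropped the backslash from \*.

```python
import re


def regexp(subject: str, pattern: str) -> int:
    """Mimic MySQL's `subject REGEXP pattern`: 1 if the pattern matches anywhere."""
    return 1 if re.search(pattern, subject) else 0


# Question, first query: '[[:<:]]fitt\*[[:>:]]' after MySQL drops the backslash,
# so t* means "zero or more t". \b stands in for both word markers here.
WHAT_MYSQL_SAW = r"\bfitt*\b"
# Answer: escaped asterisk, then a non-alphanumeric character or end of string.
ESCAPED = r"\bfitt\*(?:[^A-Za-z0-9]|$)"
# Answer: asterisk inside a character class instead of a backslash.
CHAR_CLASS = r"\bfitt[*](?:[^A-Za-z0-9]|$)"

for subject in ["aaaa fitt* bbb", "aaaa fitttttt* bbb"]:
    print(subject, regexp(subject, WHAT_MYSQL_SAW),
          regexp(subject, ESCAPED), regexp(subject, CHAR_CLASS))
# aaaa fitt* bbb 1 1 1
# aaaa fitttttt* bbb 1 0 0
```

(?:...) is Python's non-capturing group. The MySQL versions above use a plain group, which behaves the same way for a 0/1 result.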

There could be 3 problems:
Item 1: The answer to the title question is either of these:
\\* (in the regexp)
[*]
Item 2: \\\\* may be needed if your client (for example, a string literal in your application code) unescapes backslashes once before sending the query to MySQL, since MySQL itself still has to receive \\* in the literal. However, typed directly into MySQL (without any client code), '\\\\*' reaches the regex engine as \\*, which means zero or more backslashes.
Item 3: Barmar's answer focused on why [[:>:]] is incorrect.
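Items 1 and 2 come down to two layers of backslash processing. MySQL first parses the string literal, then hands the result to the regex engine. The sketch below models the first layer with a hypothetical unescape_mysql_literal() function. It is based on MySQL's documented escape sequences and assumes the default SQL mode (NO_BACKSLASH_ESCAPES off). Treat it as a simplified approximation, not MySQL's parser. Python's re then plays the regex engine. Calling the function twice models a client that unescapes once before MySQL does.

```python
import re

# Simplified model of MySQL string-literal escapes (NO_BACKSLASH_ESCAPES off).
ESCAPES = {"0": "\0", "'": "'", '"': '"', "b": "\b", "n": "\n",
           "r": "\r", "t": "\t", "Z": "\x1a", "\\": "\\"}


def unescape_mysql_literal(body: str) -> str:
    """Return the string MySQL builds from the text between the quotes."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in ESCAPES:
                out.append(ESCAPES[nxt])
            elif nxt in "%_":
                out.append("\\" + nxt)  # \% and \_ keep their backslash
            else:
                out.append(nxt)  # unrecognized escape: backslash is dropped
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


for literal in [r"fitt\*", r"fitt\\*", r"fitt\\\\*"]:
    print(f"'{literal}' -> regex {unescape_mysql_literal(literal)}")
# 'fitt\*' -> regex fitt*        (t repeated zero or more times)
# 'fitt\\*' -> regex fitt\*      (literal asterisk)
# 'fitt\\\\*' -> regex fitt\\*   (zero or more backslashes)

four = unescape_mysql_literal(r"fitt\\\\*")
print(bool(re.fullmatch(four, "fitt\\\\\\")))  # True: "fitt" plus three backslashes
print(bool(re.fullmatch(four, "fitt*")))        # False: no literal asterisk
# A client that unescapes once, followed by MySQL: prints fitt\* again.
print(unescape_mysql_literal(unescape_mysql_literal(r"fitt\\\\*")))
```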

Related

MySQL RegularExpression (Limiter not working)

NOTICE TO THE MODS: DON'T DELETE/ DON'T CLOSE
I asked this question earlier and the mods closed it because they thought it was similar to a question by another user. I have looked at the thread that they referred me to and it doesn't contain the kind of numeric problems that I have. That thread is How do I match an entire string with a regex?
My Question/Issue:
REGEXP is returning a false positive.
SELECT '123456' REGEXP '[0-9]{1,4}' AS Test;
Based on what I've read, the {1,4} part in curly braces means a minimum of 1 occurrence and a maximum of 4. But in the query above, the range [0-9] occurs more than 4 times, and yet the query returns 1 instead of 0. I've attached a screenshot. What am I missing? Thanks.
[Screenshot: the example query in MySQL Workbench]
SELECT '123456' REGEXP '^[0-9]{1,4}$' AS Test;
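Without anchors, REGEXP returns 1 if the pattern matches any part of the string, and the first four digits, 1234, already satisfy [0-9]{1,4}. By "anchoring" with ^ and $, you are asking it to match the entire string. The query above therefore fails (returns 0), because six digits exceed the limit of 4.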
SELECT '123456' REGEXP '^[0-9]{1,}$' AS Test;
Passes because {1,} allows one or more digits, with no upper limit.
SELECT 'zzz123456' REGEXP '^[0-9]{1,}$' AS Test; -- Fail
SELECT '123456' REGEXP '^[0-9]*$' AS Test; -- pass
SELECT '' REGEXP '^[0-9]{1,}$' AS Test; -- fail (too short)
SELECT '' REGEXP '^[0-9]+$' AS Test; -- fail, same as {1,}
SELECT 'abc123456def' REGEXP '[0-9]{1,4}' AS Test; -- pass (no anchor)
SELECT 'abc123456def' REGEXP '^[^0-9]+[0-9]{1,4}[^0-9]+$' AS Test; -- fail
SELECT 'abc123456def' REGEXP '[^0-9]*[0-9]+[^0-9]*' AS Test; -- pass
Those last two include [^0-9], which means "any character except 0-9".
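You can double-check the results above by replaying them with Python's re module. The sketch uses a hypothetical regexp() helper built on re.search, which, like REGEXP, succeeds if the pattern matches anywhere in the string. Python's engine is not MySQL's. Still, for these anchors, bracket expressions and bounded quantifiers, the results it gives agree with the ones stated above.

```python
import re


def regexp(subject: str, pattern: str) -> int:
    """Rough stand-in for MySQL's `subject REGEXP pattern` (match anywhere)."""
    return 1 if re.search(pattern, subject) else 0


# (subject, pattern, result stated in this thread)
CASES = [
    ("123456", r"[0-9]{1,4}", 1),  # the question: unanchored
    ("123456", r"^[0-9]{1,4}$", 0),
    ("123456", r"^[0-9]{1,}$", 1),
    ("zzz123456", r"^[0-9]{1,}$", 0),
    ("123456", r"^[0-9]*$", 1),
    ("", r"^[0-9]{1,}$", 0),
    ("", r"^[0-9]+$", 0),
    ("abc123456def", r"[0-9]{1,4}", 1),
    ("abc123456def", r"^[^0-9]+[0-9]{1,4}[^0-9]+$", 0),
    ("abc123456def", r"[^0-9]*[0-9]+[^0-9]*", 1),
]

for subject, pattern, expected in CASES:
    got = regexp(subject, pattern)
    print(f"{subject!r:16} REGEXP {pattern!r:32} -> {got} (expected {expected})")

# Why the question's query returns 1: the unanchored pattern finds a substring.
print(re.search(r"[0-9]{1,4}", "123456").group())  # 1234
```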
Elaboration on ^
At the beginning of the regexp, ^ "anchors" the processing at the beginning: REGEXP "^x" means "starts with x"; REGEXP "x" succeeds if "x" is anywhere in the string.
At the beginning of a "character set", ^ means "not": REGEXP "x[0-9]" looks for x followed immediately by a digit; REGEXP "x[^0-9]" looks for x followed immediately by a character that is not a digit (so an x at the very end of the string does not match).
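The two meanings of ^ can be seen side by side in this sketch. It again assumes Python's re.search as a stand-in for REGEXP, through a hypothetical regexp() helper. Note that x[^0-9] needs some character after the x, so a string that ends in x does not match it.

```python
import re


def regexp(subject: str, pattern: str) -> int:
    """Rough stand-in for MySQL's `subject REGEXP pattern`."""
    return 1 if re.search(pattern, subject) else 0


PATTERNS = ["^x", "x", "x[0-9]", "x[^0-9]"]

for subject in ["x1", "xa", "ax", "x"]:
    results = [regexp(subject, p) for p in PATTERNS]
    print(f"{subject!r:5}", dict(zip(PATTERNS, results)))
# 'x1' -> ^x 1, x 1, x[0-9] 1, x[^0-9] 0
# 'xa' -> 1, 1, 0, 1
# 'ax' -> 0, 1, 0, 0   (x present, but not at the start)
# 'x'  -> 1, 1, 0, 0   (x[^0-9] needs a character after the x)
```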

Regex Error in MYSQL

I want to select cities that start with a, e, i, o or u and also end with a, e, i, o or u in MySQL (case doesn't matter).
Query1
SELECT CITY FROM STATION WHERE CITY REGEXP '^[AEIOU]' and CITY REGEXP '[AEIOU]$';
Query2
SELECT CITY FROM STATION WHERE CITY REGEXP '^[AEIOU]*[AEIOU]$';
Why is Query2 giving me an error, although Query1 is correct?
With your first query, you fetch entries that start and end with vowels. The second one only matches entries that consist entirely of vowels: zero or more vowels followed by a final vowel (so you will only get results like a or Aou).
You might try using
SELECT CITY FROM STATION WHERE CITY REGEXP '^[AEIOU].*[AEIOU]$'
The .* pattern matches any 0+ characters, as many as possible, so the whole regex matches any string of two or more characters that starts AND ends with a vowel.
However, WHERE CITY REGEXP '^[AEIOU]' and CITY REGEXP '[AEIOU]$' also fetches entries consisting of just 1 vowel, and the above will not match a record like A (a one-vowel string). To match those, use an optional group:
SELECT CITY FROM STATION WHERE CITY REGEXP '^[AEIOU](.*[AEIOU])?$'
Here, (...)? is a capturing group (the Henry Spencer regex library that MySQL used before 8.0.4 does not support non-capturing ones) that matches its contents once or not at all (due to the ? quantifier).
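The sketch below compares the four approaches on a few made-up city names, with Python's re standing in for MySQL. re.IGNORECASE is an assumption meant to mirror MySQL's usual case-insensitive comparison for non-binary strings. The sample names and the matches() helper are hypothetical.

```python
import re

# Made-up sample rows standing in for STATION.CITY.
CITIES = ["A", "Aou", "Atlanta", "Omaha", "Irvine", "Eagle", "Boise", "Akron"]


def matches(city: str, pattern: str) -> bool:
    # re.IGNORECASE approximates MySQL's case-insensitive comparison.
    return re.search(pattern, city, re.IGNORECASE) is not None


query1 = [c for c in CITIES if matches(c, "^[AEIOU]") and matches(c, "[AEIOU]$")]
query2 = [c for c in CITIES if matches(c, "^[AEIOU]*[AEIOU]$")]
dot_star = [c for c in CITIES if matches(c, "^[AEIOU].*[AEIOU]$")]
optional = [c for c in CITIES if matches(c, "^[AEIOU](.*[AEIOU])?$")]

print("Query1:  ", query1)    # starts and ends with a vowel, including "A"
print("Query2:  ", query2)    # only all-vowel names: ['A', 'Aou']
print(".*:      ", dot_star)  # like Query1 but misses the one-letter "A"
print("(...)?:  ", optional)  # same rows as Query1
```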
A couple of notes on the regex:
^[AEIOU].*[AEIOU]$ - matches a whole string of at least two characters that starts and ends with a vowel, in a case-insensitive way (REGEXP is not case sensitive, except when used with binary strings)
^ - matches the start of input
[AEIOU] - a single vowel from the set
.* - any 0+ chars, as many as possible (the POSIX regex library MySQL used before 8.0.4 does not support lazy quantifiers, and its . matches any character, including line breaks)
[AEIOU] - a vowel
$ - end of input.
^ matches the beginning of a string and $ matches the end of a string, so you can try both regexes above. Using LIKE with the % wildcard may also be helpful.

REGEXP not working in mysql

I am trying to find records with names that contain non-alphanumeric characters.
I thought that I could do it with REGEXP
http://dev.mysql.com/doc/refman/5.1/en/regexp.html
Then I referred to another SO question:
How can I find non-ASCII characters in MySQL?
I found I could use this query:
SELECT * FROM tableName WHERE NOT columnToCheck REGEXP '[A-Za-z0-9]';
But it returns zero rows. If I change the command to:
SELECT * FROM tableName WHERE columnToCheck REGEXP '[A-Za-z0-9]';
it returns all the rows!
I tried some basic commands :
SELECT 'justffalnums' REGEXP '[[:alnum:]]'; returns 1
which is correct but
SELECT 'justff?alnums ' REGEXP '[[:alnum:]]'; also returns 1
I don't understand why it returns 1. It should return 0, as it has a space and also a '?'.
Is there anything I need to enable in MySQL for REGEXP to work?
I am using mysql 5.0 and tried with 5.1 too .
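Without anchors, the pattern succeeds as soon as a single alphanumeric character appears anywhere in the string. You need to add ^ (string begins) and $ (string ends), as well as a quantifier saying how many alphanumeric characters to allow. Below I used +, which means one or more.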
SELECT 'justff?alnums ' REGEXP '^[[:alnum:]]+$';
-- contains only alphanumerics => 0
SELECT 'justff?alnums ' REGEXP '^[[:alnum:]]+';
-- just begins with an alphanumeric => 1
SELECT 'justff?alnums ' REGEXP '[[:alnum:]]+$';
-- just ends with an alphanumeric => 0 (the string ends with a space)
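The same checks can be replayed in Python as a sketch. Python's re has no [[:alnum:]] class, so [A-Za-z0-9] is used as an ASCII-only stand-in. regexp() is a hypothetical helper that, like REGEXP, succeeds when the pattern matches anywhere.

```python
import re


def regexp(subject: str, pattern: str) -> int:
    """Rough stand-in for MySQL's `subject REGEXP pattern`."""
    return 1 if re.search(pattern, subject) else 0


ALNUM = "[A-Za-z0-9]"  # ASCII-only stand-in for MySQL's [[:alnum:]]

for subject in ["justffalnums", "justff?alnums "]:
    print(
        repr(subject),
        regexp(subject, ALNUM),          # some alphanumeric anywhere
        regexp(subject, f"^{ALNUM}+$"),  # entirely alphanumeric
        regexp(subject, f"^{ALNUM}+"),   # begins with an alphanumeric
        regexp(subject, f"{ALNUM}+$"),   # ends with an alphanumeric
    )
# 'justffalnums' 1 1 1 1
# 'justff?alnums ' 1 0 1 0
```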
The regex that you've given does not require the entire field to consist of the characters in question. Alternatively, you can put the negation character ^ at the beginning of a character set to look for any character outside it:
SELECT 'justff?alnums' REGEXP '[^A-Za-z0-9]'; returns 1
SELECT 'justffalnums' REGEXP '[^A-Za-z0-9]'; returns 0
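The difference between the question's filter and the negated character class shows up clearly on a few hypothetical column values. This Python sketch uses re.search as a stand-in for REGEXP. It filters a list the way the two WHERE clauses would filter rows.

```python
import re


def regexp(subject: str, pattern: str) -> int:
    """Rough stand-in for MySQL's `subject REGEXP pattern`."""
    return 1 if re.search(pattern, subject) else 0


# Hypothetical values of columnToCheck.
NAMES = ["justffalnums", "justff?alnums", "O'Brien", "Anne Marie", "???"]

# Answer: at least one character outside A-Z, a-z, 0-9.
has_special = [n for n in NAMES if regexp(n, "[^A-Za-z0-9]")]

# Question: NOT REGEXP '[A-Za-z0-9]' keeps only values with no alphanumerics at all.
no_alnum_at_all = [n for n in NAMES if not regexp(n, "[A-Za-z0-9]")]

print(has_special)      # ['justff?alnums', "O'Brien", 'Anne Marie', '???']
print(no_alnum_at_all)  # ['???']
```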

MySQL REGEXP: matching blank entries

I have this SQL condition that is supposed to retrieve all rows that satisfy the given regexp condition:
country REGEXP ('^(USA|Italy|France)$')
However, I also need to retrieve all rows with a blank country value. Currently I am using this condition:
country REGEXP ('^(USA|Italy|France)$') OR country = ""
How can I achieve the same effect without the OR clause?
Thanks,
Erwin
This should work:
country REGEXP ('^(USA|Italy|France|)$')
However, from a performance point of view, you may want to use the IN syntax:
country IN ('USA','Italy','France', '')
The latter should be faster, as REGEXP can be quite slow.
There's no reason you can't use $ (match end of string) to work around the "empty subexpression" issue.
It looks a little weird, but country REGEXP ('^(USA|Italy|France|$)$') will actually work.
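As a sanity check on the logic of the two regex forms above, this Python sketch compares them with the IN list on a few sample values. Python's re accepts an empty alternative. The Henry Spencer library that MySQL used before 8.0.4 rejects an empty subexpression, so this checks the logic only, not MySQL's behaviour. The sample values and the agrees() helper are hypothetical.

```python
import re

IN_LIST = {"USA", "Italy", "France", ""}  # country IN ('USA','Italy','France','')
EMPTY_ALT = r"^(USA|Italy|France|)$"      # empty last alternative
DOLLAR_ALT = r"^(USA|Italy|France|$)$"    # $ as the last alternative

SAMPLES = ["USA", "France", "", " ", "abc", "USA2"]


def agrees(value: str) -> bool:
    """True if IN, EMPTY_ALT and DOLLAR_ALT all give the same verdict for value."""
    in_list = value in IN_LIST
    return in_list == bool(re.search(EMPTY_ALT, value)) == bool(re.search(DOLLAR_ALT, value))


for value in SAMPLES:
    print(repr(value), value in IN_LIST, agrees(value))
```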
You could try:
country REGEXP ('^(USA|Italy|France|)$')
I just added another | after France, which should basically tell it to also match ^$, which is the same as country = ''.
Update: since this method doesn't work, I would recommend you use this regex:
country REGEXP ('^(USA|Italy|France)$|^$')
Note that you can't use the regex ^(USA|Italy|France|.{0})$, because MySQL will complain that there is an empty subexpression, although ^(USA|Italy|France)$|^.{0}$ would work.
Here are some examples of the return value of this regex:
select '' regexp '^(USA|Italy|France)$|^$'
> 1
select 'abc' regexp '^(USA|Italy|France)$|^$'
> 0
select 'France' regexp '^(USA|Italy|France)$|^$'
> 1
select ' ' regexp '^(USA|Italy|France)$|^$'
> 0
As you can see, it returns exactly what you want.
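The four examples above can be replayed with Python's re module as a rough stand-in for MySQL, through a hypothetical regexp() helper. The ^.{0}$ variant mentioned just before them is included too.

```python
import re


def regexp(subject: str, pattern: str) -> int:
    """Rough stand-in for MySQL's `subject REGEXP pattern`."""
    return 1 if re.search(pattern, subject) else 0


PATTERN = r"^(USA|Italy|France)$|^$"
ZERO_WIDTH = r"^(USA|Italy|France)$|^.{0}$"

for value in ["", "abc", "France", " "]:
    print(repr(value), regexp(value, PATTERN), regexp(value, ZERO_WIDTH))
# '' 1 1
# 'abc' 0 0
# 'France' 1 1
# ' ' 0 0
```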
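If you want to treat all whitespace-only values as blank (e.g. 0 spaces and 5 spaces both count as blank), use a whitespace class. Note that \s does not work inside a MySQL string literal: MySQL drops the backslash of unrecognized escapes, so '\s*' reaches the regex engine as s*. The POSIX class [[:space:]] avoids that:
country REGEXP ('^(USA|Italy|France|[[:space:]]*)$')
This will cause the last row in the previous example to behave differently, i.e.:
select ' ' regexp '^(USA|Italy|France|[[:space:]]*)$'
> 1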
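One caveat about writing \s in MySQL, sketched in Python. Python's re reads \s as "any whitespace". Inside a MySQL string literal, however, \s is not a recognized escape, so MySQL drops the backslash and the regex engine receives s* instead. AS_MYSQL_SEES models that. It is an assumption based on MySQL's documented escape handling, and regexp() is a hypothetical helper. A POSIX class such as [[:space:]] avoids the problem in MySQL, but Python's re cannot test it.

```python
import re


def regexp(subject: str, pattern: str) -> int:
    """Rough stand-in for MySQL's `subject REGEXP pattern`."""
    return 1 if re.search(pattern, subject) else 0


PYTHON_WS = r"^(USA|Italy|France|\s*)$"     # \s = any whitespace in Python's re
AS_MYSQL_SEES = r"^(USA|Italy|France|s*)$"  # '\s' after MySQL drops the backslash

for value in ["", " ", "     ", "France", "abc", "sss"]:
    print(repr(value), regexp(value, PYTHON_WS), regexp(value, AS_MYSQL_SEES))
# ''      1 1
# ' '     1 0
# '     ' 1 0
# 'France' 1 1
# 'abc'   0 0
# 'sss'   0 1
```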