NOTICE TO THE MODS: DON'T DELETE/ DON'T CLOSE
I asked this question earlier and the mods closed it because they thought it was similar to a question by another user. I have looked at the thread that they referred me to, and it doesn't contain the kind of numeric problems that I have. That thread is "How do I match an entire string with a regex?"
My Question/Issue:
REGEXP is returning a false positive.
SELECT '123456' REGEXP '[0-9]{1,4}' AS Test;
Based on what I've read, the curly-brace part {1,4} means a minimum of 1 occurrence and no more than 4. But in the example above, the string contains more than 4 characters from the range [0-9], and yet the query returns 1 instead of 0. I've attached a screenshot. What am I missing? Thanks.
SELECT '123456' REGEXP '^[0-9]{1,4}$' AS Test;
By "anchoring" with ^ and $, you are asking to match the entire string. The above returns 0 because the string has 6 digits and the limit is 4.
SELECT '123456' REGEXP '^[0-9]{1,}$' AS Test;
Passes because {1,} allows one or more digits, with no upper limit.
SELECT 'zzz123456' REGEXP '^[0-9]{1,}$' AS Test; -- fail
SELECT '123456' REGEXP '^[0-9]*$' AS Test; -- pass
SELECT '' REGEXP '^[0-9]{1,}$' AS Test; -- fail (too short)
SELECT '' REGEXP '^[0-9]+$' AS Test; -- same as {1,}
SELECT 'abc123456def' REGEXP '[0-9]{1,4}' AS Test; -- pass (no anchor)
SELECT 'abc123456def' REGEXP '^[^0-9]+[0-9]{1,4}[^0-9]+$' AS Test; -- fail
SELECT 'abc123456def' REGEXP '[^0-9]*[0-9]+[^0-9]*' AS Test; -- pass
Those last two include [^0-9], which means "anything except 0-9".
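If you want to try the anchoring behaviour outside the database, here is a minimal sketch in Python. It assumes that Python's re.search behaves like REGEXP for these simple patterns (both look for a match anywhere in the string unless the pattern is anchored). The helper name regexp_test is made up for illustration and is not a MySQL function.

```python
import re

def regexp_test(subject: str, pattern: str) -> int:
    """Hypothetical stand-in for `subject REGEXP pattern`.

    Returns 1 if the pattern matches anywhere in the subject, else 0.
    This uses Python's regex engine, not MySQL's.
    """
    return 1 if re.search(pattern, subject) else 0

# Unanchored: "1234" inside "123456" already satisfies {1,4}.
print(regexp_test("123456", "[0-9]{1,4}"))
# Anchored at both ends: the whole string must be 1 to 4 digits.
print(regexp_test("123456", "^[0-9]{1,4}$"))
print(regexp_test("1234", "^[0-9]{1,4}$"))
# Empty string: + needs at least one digit, * accepts none.
print(regexp_test("", "^[0-9]+$"))
print(regexp_test("", "^[0-9]*$"))
```

The first call is the "false positive" from the question: without anchors, any run of 1 to 4 digits anywhere in the string is enough.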
Elaboration on ^
At the beginning of the regexp, ^ "anchors" the processing at the beginning: REGEXP "^x" means "starts with x"; REGEXP "x" succeeds if "x" is anywhere in the string.
At the beginning of a "character set", ^ means "not": REGEXP "x[0-9]" looks for x followed immediately by a digit; REGEXP "x[^0-9]" looks for x followed immediately by a character that is not a digit.
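The two roles of ^ can be checked with a short Python sketch. Python's re module stands in for MySQL's engine here, which I assume is fine for patterns this simple; the helper found is a hypothetical name.

```python
import re

def found(pattern: str, subject: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in subject."""
    return re.search(pattern, subject) is not None

# ^ outside brackets anchors the match at the start of the string.
print(found("^x", "xyz"))
print(found("^x", "axyz"))

# ^ as the first character inside brackets negates the set.
print(found("x[^0-9]", "ax-b"))   # x is followed by "-"
print(found("x[^0-9]", "ax1b"))   # x is followed by a digit

# A negated set still has to consume one character,
# so an x at the very end of the string does not match.
print(found("x[^0-9]", "abx"))
```

The last case is worth remembering: [^0-9] means "some character that is not a digit", not "no digit here".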
Related
I tried matching keywords with REGEXP in MySQL as following:
-- Match "fitt*", the asterisk "*" is expected to be matched as-is
> select 'aaaa fitt* bbb' regexp '[[:<:]]fitt\*[[:>:]]'; -- return 1, ok
> select 'aaaa fitttttt* bbb' regexp '[[:<:]]fitt\*[[:>:]]'; -- return 1 as well, but should return 0
> select 'aaaa fitt* bbb' regexp '[[:<:]]fitt\\*[[:>:]]'; -- return 0, failed
How to escape the asterisk (*) in order to exactly match the character *?
\\* is the correct way to match the asterisk. But [[:>:]] won't match after it, because that only matches between a word character and a non-word character, and * is not a word character. Instead, you need to match a non-word character there explicitly. You also need an alternative for the end of line, since that's the other type of word boundary.
> select 'aaaa fitt* bbb' regexp '[[:<:]]fitt\\*([^[:alnum:]]|$)'; -- returns 1
> select 'aaaa fitttttt* bbb' regexp '[[:<:]]fitt\\*([^[:alnum:]]|$)'; -- returns 0
Another way to match the asterisk explicitly is by putting it in a character class.
> select 'aaaa fitt* bbb' regexp '[[:<:]]fitt[*]([^[:alnum:]]|$)'; -- returns 1
> select 'aaaa fitttttt* bbb' regexp '[[:<:]]fitt[*]([^[:alnum:]]|$)'; -- returns 0
There could be 3 problems:
Item 1: The answer to the title question is either of these:
\\* (in the regexp)
[*]
Item 2: \\\\* may be needed if you are coming from some client that first unescapes the backslash before feeding it to MySQL, which still needs a backslash. However, as written (without any client code), \\\\* is treated as zero or more backslashes.
Item 3: @Barmar's answer focused on why [[:>:]] is incorrect.
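Here is a rough Python sketch of the three cases discussed above. It is a translation, not MySQL itself: Python has no [[:<:]], [[:>:]] or [[:alnum:]], so I use \b and [A-Za-z0-9] as approximations. The raw strings show what the regex engine sees after MySQL has processed the string literal ('fitt\*' arrives as fitt*, and 'fitt\\*' arrives as fitt\*). The helper found is a hypothetical name.

```python
import re

def found(pattern: str, subject: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in subject."""
    return re.search(pattern, subject) is not None

# What the engine sees for 'fitt\*': the backslash is gone, so * is a quantifier.
unescaped = r"\bfitt*\b"
# What the engine sees for 'fitt\\*': a literal asterisk, but the trailing word
# boundary cannot match between "*" and " " (both are non-word characters).
escaped_with_boundary = r"\bfitt\*\b"
# Fix: require a non-alphanumeric character or the end of the string instead.
escaped_fixed = r"\bfitt\*([^A-Za-z0-9]|$)"
# Same idea with the asterisk inside a character class.
bracket_fixed = r"\bfitt[*]([^A-Za-z0-9]|$)"

for pattern in (unescaped, escaped_with_boundary, escaped_fixed, bracket_fixed):
    print(pattern,
          found(pattern, "aaaa fitt* bbb"),
          found(pattern, "aaaa fitttttt* bbb"))
```

Only the last two patterns accept "fitt*" and reject "fitttttt*", which matches the results reported in the answer.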
How would I do the following in mysql?
SELECT * FROM table WHERE search REGEXP '.+season\d+\s?.+' limit 10;
I want to match something like:
"hello this is season1 how are you?"
But not:
"hello this is season1episode1 how are you?"
\d and \s are not available in MySQL's older regex engine (before 8.0), so use character classes instead.
You can replace \d with [[:digit:]] or [0-9], and \s with [[= =]] or [ ]. Note that these match a literal space only, not tabs or other whitespace.
SELECT * FROM table WHERE search REGEXP '.+season[[:digit:]]+[[= =]].+' LIMIT 10
-- or...
SELECT * FROM table WHERE search REGEXP '.+season[0-9]+[ ].+' LIMIT 10
Before MySQL 8.0,
REGEXP "season[0-9]+[[:>:]]"
meaning "season", at least one digit, then a word boundary. Note that it will stop with punctuation.
REGEXP "season[0-9]+[^a-zA-Z]"
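Might work for you -- it says that the digits should be followed by a character that is not a letter. Note that it needs some character there, so it will not match when the digits are at the very end of the string.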
8.0 changes the word boundary to:
REGEXP "season[0-9]+\b"
(Caveat: the backslash may need to be doubled up.)
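The difference between the word-boundary version and the "not a letter" version can be seen in a small Python sketch. Python's \b plays the role of the 8.0-style \b here; that equivalence is my assumption for these inputs, and the helper found and the sample string at_end are made up for illustration.

```python
import re

def found(pattern: str, subject: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in subject."""
    return re.search(pattern, subject) is not None

wanted = "hello this is season1 how are you?"
unwanted = "hello this is season1episode1 how are you?"
at_end = "this is season1"   # extra case: digits end the string

boundary = r"season[0-9]+\b"                     # digits, then a word boundary
non_letter = r"season[0-9]+[^a-zA-Z]"            # digits, then one non-letter
non_letter_or_end = r"season[0-9]+([^a-zA-Z]|$)" # same, but end of string is OK

for pattern in (boundary, non_letter, non_letter_or_end):
    print(pattern,
          found(pattern, wanted),
          found(pattern, unwanted),
          found(pattern, at_end))
```

All three patterns accept the wanted string and reject the unwanted one; they differ only on the string that ends right after the digits, where the plain [^a-zA-Z] version fails.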
It is necessary to select rows from the table that contain one substring (or substrings) and do not contain others. It is important to make one expression.
Google says that regular expression like ^(?=.*subs1)(?!.*subs2)$ can work but it doesn't work for me (also tested on https://regexr.com/)
For example
SELECT * FROM TABLE WHERE target_string REGEXP "^(?=.*subs1)(?!.*subs2)$"
bla/subs1/bla/bla -> true (return as query result)
bla/subs1/bla/subs2 -> false
bla/bla/subs2/bla -> false
The 2 conditions do not work together in one expression (separately, each works).
Thanks for help!
WHERE foo LIKE '%123%'
AND foo NOT LIKE '%234%'
WHERE foo REGEXP '123[^4]' -- rejects '1234', but accepts '123x234' -- OK?
(This should work in all versions of MySQL or MariaDB.)
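As for why the lookahead pattern from the question never matches: lookaheads consume no characters, so ^(?=.*subs1)(?!.*subs2)$ asks for an empty string that also contains subs1. Adding .* before the $ is the usual repair. The Python sketch below illustrates this, under the assumption that your server's regex engine accepts lookaheads at all (older MySQL versions reject the (?= syntax, which is one more reason to prefer the LIKE version). The helper names are hypothetical.

```python
import re

def found(pattern: str, subject: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in subject."""
    return re.search(pattern, subject) is not None

def like_style(subject: str) -> bool:
    """Plain-substring equivalent of: LIKE '%subs1%' AND NOT LIKE '%subs2%'."""
    return "subs1" in subject and "subs2" not in subject

broken = r"^(?=.*subs1)(?!.*subs2)$"    # nothing is allowed between ^ and $
fixed = r"^(?=.*subs1)(?!.*subs2).*$"   # .* lets the rest of the string match

rows = ["bla/subs1/bla/bla", "bla/subs1/bla/subs2", "bla/bla/subs2/bla"]
for row in rows:
    print(row, found(broken, row), found(fixed, row), like_style(row))
```

The fixed pattern and the two substring checks agree on all three sample rows from the question.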
I am trying to find records whose names contain non-alphanumeric characters.
I thought that I could do it with REGEXP
http://dev.mysql.com/doc/refman/5.1/en/regexp.html
Then I referred another SO question
How can I find non-ASCII characters in MySQL?
I found I could use this query :
SELECT * FROM tableName WHERE NOT columnToCheck REGEXP '[A-Za-z0-9]';
But it returns zero rows. If I change the query to:
SELECT * FROM tableName WHERE columnToCheck REGEXP '[A-Za-z0-9]';
it returns all the rows!
I tried some basic commands :
SELECT 'justffalnums' REGEXP '[[:alnum:]]'; -- returns 1
which is correct, but
SELECT 'justff?alnums ' REGEXP '[[:alnum:]]'; -- also returns 1
I don't understand why it returns 1. It should return 0, as the string contains a space and also a '?'.
Is there anything that needs to be enabled in MySQL for the regexp to work?
I am using MySQL 5.0 and tried with 5.1 too.
You need to add ^ (string begins) and $ (string ends), as well as a quantifier saying how many alphanumeric characters to allow. Below I used +, which means one or more.
SELECT 'justff?alnums ' REGEXP '^[[:alnum:]]+$';
-- only contains alphanums => 0
SELECT 'justff?alnums ' REGEXP '^[[:alnum:]]+';
-- just begins with alphanum => 1
SELECT 'justff?alnums ' REGEXP '[[:alnum:]]+$';
-- just ends with alphanum => 0
The regex that you've given does not say that the entire field has to contain the characters in question. You can use the negation character ^ at the beginning of a character set.
SELECT 'justff?alnums' REGEXP '[^A-Za-z0-9]'; -- returns 1
SELECT 'justffalnums' REGEXP '[^A-Za-z0-9]'; -- returns 0
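To make the three possible questions explicit ("has some alphanumeric", "has some non-alphanumeric", "is entirely alphanumeric"), here is a small Python sketch. Python's re stands in for MySQL's engine, which I assume is fine for plain bracket expressions like these; the helper found and the sample value "???" are made up for illustration.

```python
import re

def found(pattern: str, subject: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in subject."""
    return re.search(pattern, subject) is not None

names = ["justffalnums", "justff?alnums ", "???"]

for name in names:
    has_alnum = found("[A-Za-z0-9]", name)        # the pattern from the question
    has_non_alnum = found("[^A-Za-z0-9]", name)   # negated set: what was wanted
    all_alnum = found("^[A-Za-z0-9]+$", name)     # anchored: entire value
    print(repr(name), has_alnum, has_non_alnum, all_alnum)
```

This also shows why NOT ... REGEXP '[A-Za-z0-9]' returned zero rows: it only selects values with no alphanumeric character at all, such as "???".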