Dot character negation in mysql regex search - mysql

I want to use a regular expression in a MySQL search that matches a string but NOT the same string followed by a dot (.) character.
As an example, I want to match the product code 3.4.5.1 but not 3.4.5.1.4. I have a workaround using a double LIKE:
SELECT * from mytable where mycolumn like '%3.4.5.1%' AND mycolumn NOT like '%3.4.5.1.%'....
Not a very elegant solution.
So I was trying to use REGEXP_LIKE(mycolumn,'(3.4.5.1)[^\.]')
In fact, I have tested that pattern and it works nicely in PHP and even online: https://regexr.com/66olc
but it seems that regexes work in their own way in MySQL, because I obtain an empty search result.
What am I doing wrong?

You can use
([^0-9]|^)3\.4\.5\.1([^.0-9]|$)
See the regex demo. Details:
([^0-9]|^) - a non-digit char ([^0-9]) or (|) start of string (^)
3\.4\.5\.1 - 3.4.5.1 literal string (dots in regex must be escaped with a literal backslash to match a . char, else, . matches any single char)
([^.0-9]|$) - any char other than a . and digit ([^.0-9]), or (|) end of string ($).
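The behaviour described above can be checked outside the database. The following is a minimal sketch that uses Python's re module as a stand-in for the regex engine (an assumption: it is not MySQL's engine, and the helper name has_code is made up for illustration). It also shows one plausible reason the pattern from the question returns nothing for a value that is exactly 3.4.5.1: the trailing [^.] requires one more character after the code.

```python
import re

# The answer's pattern, unchanged. Python's `re` is only used to check the logic.
ANSWER = re.compile(r"([^0-9]|^)3\.4\.5\.1([^.0-9]|$)")

# The question's pattern: unescaped dots match any character, and the
# trailing class demands one extra non-dot character after the code.
QUESTION = re.compile(r"(3.4.5.1)[^.]")

def has_code(text: str) -> bool:
    """Return True when the product code appears as a standalone code."""
    return ANSWER.search(text) is not None

for sample in ["3.4.5.1", "3.4.5.1.4", "13.4.5.1", "3.4.5.12", "code 3.4.5.1, rev B"]:
    print(f"{sample!r}: {has_code(sample)}")

# A bare "3.4.5.1" has nothing after the final 1, so the question's pattern fails.
print(QUESTION.search("3.4.5.1"))
```

In a MySQL string literal the backslashes in the answer's pattern would additionally need doubling (\\.), or the dots can be written as [.].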

"Groups of digits separated by dots":
([0-9]+[.])+[0-9]+
Explanations:
[0-9]+ -- 1 or more digit; you could change "+" to "{1,2}" to say "1 or 2"
[.] -- a single dot; this would work, too: \.
(...)+ -- 1 or more of what's between the parens
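As a quick check of the "groups of digits separated by dots" pattern, here is a small sketch in Python (used only as a stand-in for the regex engine; is_dotted_number is a hypothetical helper name). fullmatch plays the role that ^ and $ anchors would play in a MySQL REGEXP.

```python
import re

# One or more "digits + dot" groups, followed by a final run of digits.
DOTTED = re.compile(r"([0-9]+[.])+[0-9]+")

def is_dotted_number(text: str) -> bool:
    """True when the whole string is digit groups separated by single dots."""
    return DOTTED.fullmatch(text) is not None

for sample in ["3.4.5.1", "10.2", "7", "3.4.", ".3.4"]:
    print(f"{sample!r}: {is_dotted_number(sample)}")
```

Note that the pattern needs at least one dot, so a bare number such as 7 is rejected.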

Related

Regexp not working with single/double quotes

I have a field called myfield that contains this string:
{'Content-Language': 'en', 'X-Frame-Options': 'SAMEORIGIN', 'X-Generator': 'Drupal 7 (http://drupal.org)', 'Link': '<https://01.org/node>; rel="shortlink"', 'Some-Header-Key': 'max-age=31; addSomething', 'Content-Encoding': 'gzip'}
I want to capture the 'Some-Header-Key': 'max-age=31; addSomething' where:
1) 'Some-Header-Key', max-age are fixed values that should always be present.
2) The addSomething part is optional.
3) There may be one or more spaces around the colon and the equals sign.
4) The general format is 'key': 'value', with either single or double quotes.
5) The ([^""|'])* is meant to say: zero or more characters that are not single or double quotes. This is to capture addSomething.
I wrote this query:
select myfield
from mytable
where mycol regexp "('|"")Some-Header-Key('|"")\s*:\s*('|"")([^""|'])*max-age\s*=\s*[0-9]+([^""|'])*('|"")";
But it does not return anything, although myfield contains the above example string.
When I copied the field value into an external text file and ran the regexp in grep, the regexp captured the string correctly.
What is wrong in MySQL? I use MySQL Workbench 8.0 on Ubuntu 18.04.
Your problem is with the \s in your regular expression. Versions of MySQL prior to 8 do not support this notation; you need to use the character class [:blank:] instead, i.e.
where mycol regexp "('|"")Some-Header-Key('|"")[[:blank:]]*:[[:blank:]]*('|"")([^""|'])*max-age[[:blank:]]*=[[:blank:]]*[0-9]+([^""|'])*('|"")"
In MySQL 8, you can use \s but you need to escape the backslash as MySQL uses C-style escape syntax in strings, thus \s just translates to s. So change the \s to \\s and it should work:
where mycol regexp "('|"")Some-Header-Key('|"")\\s*:\\s*('|"")([^""|'])*max-age\\s*=\\s*[0-9]+([^""|'])*('|"")"
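To see why \s has to be written as \\s, it can help to model what the string parser does before the regex engine ever sees the pattern. The sketch below is a simplified Python model of MySQL's backslash handling in quoted string literals under the default SQL mode. The function name mysql_literal and the escape table are assumptions for illustration, not MySQL code.

```python
# Simplified model (an assumption, not MySQL source) of how a quoted string
# literal is read: known escapes are translated, \% and \_ are kept as-is,
# and for any other character the backslash is simply dropped.
KNOWN_ESCAPES = {
    "0": "\0", "b": "\b", "n": "\n", "r": "\r", "t": "\t",
    "Z": "\x1a", "\\": "\\", "'": "'", '"': '"',
}

def mysql_literal(text: str) -> str:
    """Return the string value the regex engine would receive."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt in "%_":
                out.append("\\" + nxt)  # kept for later use in LIKE patterns
            else:
                out.append(KNOWN_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

print(mysql_literal(r"\s*:\s*"))      # what the engine sees for a single backslash
print(mysql_literal(r"\\s*:\\s*"))    # what it sees when the backslash is doubled
```

With a single backslash the engine receives s*:s*, which looks for literal s characters; with the doubled form it receives \s*:\s* as intended.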
Demo on dbfiddle
Not single or double quotes: [^'"]
Zero or more of such: [^'"]*
Either a single quote or two double quotes: ('|"")
Either a double quote or two single quotes: ("|'')
One of either type of quote: ['"] or ('|")
A single-quoted string: '[^']*'
A double-quoted string: "[^"]*"
Either of the above: ('[^']*'|"[^"]*")
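A short sketch of the last building block, using Python's re module purely to exercise the pattern (the sample string is a shortened, made-up variant of the header value from the question):

```python
import re

# Either a single-quoted string or a double-quoted string.
QUOTED = re.compile("('[^']*'|\"[^\"]*\")")

sample = """{'Content-Encoding': "gzip", 'X-Generator': 'Drupal 7'}"""

# With one capturing group, findall returns the text of that group per match.
quoted_parts = QUOTED.findall(sample)
print(quoted_parts)
```

Each alternative only excludes its own quote character, so a double quote inside a single-quoted value (or the reverse) does not end the match early.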
Next problem: How to quote a regexp string: If it contains ' or ", escape that with a backslash:
my_json REGEXP "('[^']*'|\"[^\"]*\")"
If you use something that does "binding" for you, you don't need to do the escaping. PHP has mysqli_real_escape_string and addslashes.
But... if you are going to use JSON, you should upgrade to MySQL 5.7 or MariaDB 10.2 so you can use JSON functions instead of REGEXP.

MySQL regex matching at least 2 dots

Consider the following regex
#(.*\..*){2,}
Expected behaviour:
a#b doesn't match
a#b.c doesn't match
a#b.c.d matches
a#b.c.d.e matches
and so on
Testing in regexpal it works as expected.
Using it in a MySQL select doesn't work as expected. Query:
SELECT * FROM `users` where mail regexp '#(.*\..*){2,}'
is returning lines like
foo#example.com
that should not match the given regex. Why?
I think the answer to your question is here.
Because MySQL uses the C escape syntax in strings (for example, "\n"
to represent the newline character), you must double any "\" that you
use in your REGEXP strings.
MYSQL Reference
Because your middle dot wasn't properly escaped it was treated as just another wildcard and in the end your expression was effectively collapsed to #.{2,} or #..+
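The collapse described above can be reproduced by comparing the pattern as written with the pattern the engine receives once the single backslash has been dropped. This is a sketch in Python's re module (a stand-in, not MySQL's engine); the variable names are made up for illustration.

```python
import re

# The pattern as the author intended it: a literal dot in each repetition.
intended = re.compile(r"#(.*\..*){2,}")

# What the engine effectively receives after the string parser drops the
# lone backslash: the middle dot is now just another wildcard.
collapsed = re.compile(r"#(.*..*){2,}")

for sample in ["a#b", "a#b.c", "a#b.c.d", "foo#example.com"]:
    print(
        f"{sample!r}: intended={intended.search(sample) is not None}, "
        f"collapsed={collapsed.search(sample) is not None}"
    )
```

The collapsed form only needs two arbitrary characters after the #, which is why foo#example.com is returned.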
@anubhava's answer is probably a better substitute for what you tried to do, though I would note @dasblinkenlight's comment about using the character class [.], which makes it easy to drop in a regex you've already tested at RegexPal.
You can use:
SELECT * FROM `users` where mail REGEXP '([^.]*\\.){2}'
to enforce at least 2 dots in mail column.
I would match two dots in MySQL using like:
where col like '%#.%.%'
The problem with your code is that .* (match-everything dot) matches dot '.' character. Replacing it with [^.]* fixes the problem:
SELECT *
FROM `users`
where mail regexp '#([^.]*[.]){2,}'
Note the use of [.] in place of the equivalent \.. This syntax makes it easier to embed the regex into programming languages that use backslash as escape character in their string literals.
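The backslash-free variant can be checked against the four cases listed in the question. A minimal sketch, again using Python's re module only to exercise the pattern (has_two_dots is a hypothetical helper name):

```python
import re

# No backslashes at all, so the pattern survives any string-literal parsing.
TWO_DOTS = re.compile(r"#([^.]*[.]){2,}")

def has_two_dots(mail: str) -> bool:
    """True when at least two dots follow the # separator."""
    return TWO_DOTS.search(mail) is not None

for sample in ["a#b", "a#b.c", "a#b.c.d", "a#b.c.d.e", "foo#example.com"]:
    print(f"{sample!r}: {has_two_dots(sample)}")
```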
Demo.

Mysql regex error #1139 using literal -

I tried running this query:
SELECT column FROM table WHERE column REGEXP '[^A-Za-z\-\']'
but this returns
#1139 - Got error 'invalid character range' from regexp
which seems to me like the - in the character class is not being escaped and is instead read as an invalid range. Is there some other way that it's supposed to be escaped for MySQL to treat it as a literal -?
This regex works as expected outside of mysql, https://regex101.com/r/wE8vY5/1.
I came up with an alternative to that regex which is
SELECT column FROM table WHERE column NOT REGEXP '([:alpha:]|-|\')'
so the question isn't how do I get this to work. The question is why doesn't the first regex work?
Here's a SQL fiddle of the issue, http://sqlfiddle.com/#!9/f8a006/1.
Also, there is no language being used here, query is being run at DB level.
Regex in PHP: http://sandbox.onlinephpfunctions.com/code/10f5fe2939bdbbbebcc986c171a97c0d63d06e55
Regex in JS: https://jsfiddle.net/6ay4zmrb/
Just change the order.
SELECT column FROM table WHERE column REGEXP '[^-A-Za-z\']'
@Avinash Raj is correct: the - must be first (or last). The \ is not an escape character in POSIX bracket expressions, which is the syntax MySQL uses: https://dev.mysql.com/doc/refman/5.1/en/regexp.html.
One key syntactic difference is that the backslash is NOT a metacharacter in a POSIX bracket expression.
-http://www.regular-expressions.info/posixbrackets.html
What special characters must be escaped in regular expressions?
Inside character classes, the backslash is a literal character in POSIX regular expressions. You cannot use it to escape anything. You have to use "clever placement" if you want to include character class metacharacters as literals. Put the ^ anywhere except at the start, the ] at the start, and the - at the start or the end of the character class to match these literally
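The "clever placement" rule can be tried with the reordered class from the answer. The sketch below uses Python's re module as a stand-in. Note the assumption: Python, unlike POSIX, does accept backslash escapes inside a class, so this only demonstrates that the hyphen-first form needs no escaping at all.

```python
import re

# Hyphen placed first (right after the negating ^), so it is a literal hyphen
# and cannot be read as a range operator.
DISALLOWED = re.compile(r"[^-A-Za-z']")

def first_disallowed(text: str):
    """Return the first character that is not a letter, hyphen or apostrophe."""
    found = DISALLOWED.search(text)
    return found.group() if found else None

for sample in ["O'Neil-Smith", "R2-D2", "plain"]:
    print(f"{sample!r}: {first_disallowed(sample)!r}")
```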

MySQL REGEXP word boundaries [[:<:]] [[:>:]] and double quotes

I'm trying to match some whole-word-expressions with the MySQL REGEXP function. There is a problem, when there are double quotes involved.
The MySQL documentation says: "To use a literal instance of a special character in a regular expression, precede it by two backslash (\) characters."
But these queries all return 0:
SELECT '"word"' REGEXP '[[:<:]]"word"[[:>:]]'; -> 0
SELECT '"word"' REGEXP '[[:<:]]\"word\"[[:>:]]'; -> 0
SELECT '"word"' REGEXP '[[:<:]]\\"word\\"[[:>:]]'; -> 0
SELECT '"word"' REGEXP '[[:<:]] word [[:>:]]'; -> 0
SELECT '"word"' REGEXP '[[:<:]][[.".]]word[[.".]][[:>:]]'; -> 0
What else can I try to get a 1? Or is this impossible?
Let me quote the documentation first:
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and
end of words, respectively. A word is a sequence of word characters
that is not preceded by or followed by word characters. A word
character is an alphanumeric character in the alnum class or an
underscore (_).
From the documentation we can see the reason behind your problem, and it has nothing to do with escaping. You are trying to match the word boundary [[:<:]] right at the beginning of the string, which won't work: as the documentation says, a word boundary separates a word character from a non-word character, but in your case the first character is a ", which isn't a word character, so there is no word boundary there. The same goes for the last " and [[:>:]].
In order for this to work, you need to change your expression a bit to this one:
"[[:<:]]word[[:>:]]"
^^^^^^^ ^^^^^^^
Notice how the word boundary separates a non-word character " from a word character w in the beginning and a " from d at the end of the string.
EDIT: If you always want to use a word boundary at the start and end of the string without knowing if there will be an actual boundary then you might use the following expression:
([[:<:]]|^)"word"([[:>:]]|$)
This will either match a word boundary at the beginning or the start-of-string ^ and the same for the end of the word boundary or end-of-string. I really advise you to study the data you are trying to match and look for common patterns and don't use regular expressions if they are not the right tool for the job.
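The reasoning above can be illustrated with Python's \b, used here as an approximate stand-in for the boundary markers. This is an assumption: \b is direction-agnostic, whereas [[:<:]] and [[:>:]] mark the start and end of a word specifically, but for these three patterns and this input the outcome is the same as described in the answer.

```python
import re

text = '"word"'

# Boundary placed outside the quotes: no word character touches the string
# edges, so there is no boundary to match.
outside = re.search(r'\b"word"\b', text)

# Boundary placed between each quote and the word: quote is a non-word
# character, w and d are word characters, so both boundaries exist.
inside = re.search(r'"\bword\b"', text)

# Fallback form: accept either a boundary or the start/end of the string.
fallback = re.search(r'(\b|^)"word"(\b|$)', text)

print(outside is not None, inside is not None, fallback is not None)
```

For this input the fallback form succeeds only through ^ and $, which is worth keeping in mind when the quoted phrase can appear in the middle of a longer string.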
SQL Fiddle Demo
In MySQL 8.0.4 and later, use: \\bword\\b
ref. https://dev.mysql.com/doc/refman/8.0/en/regexp.html#regexp-compatibility
In MySQL 8 and above
Adding to Oleksiy Muzalyev's answer
https://dev.mysql.com/doc/refman/8.0/en/regexp.html#regexp-compatibility
In MySQL 8.0.4 and above, you have to use:
\bword\b
Where \b represents the ICU variant for word boundary. The previous Spencer library uses [[:<:]] to represent a word boundary.
When actually using this as part of a query, I've had to escape the escape character \ so my query actually looked like
SELECT * FROM table WHERE field RLIKE '\\bterm\\b'
When querying from PHP, use SINGLE quotes to do the same thing
$sql = 'SELECT * FROM table WHERE field RLIKE ?';
$args = ['\\bterm\\b'];
...
You need to be a little more sophisticated:
SELECT '"word"' REGEXP '"word"'; --> 1
SELECT '"This is" what I need' REGEXP '"This is" what I need[[:>:]]'; --> 1
That is,
If the test string begins/ends with a 'letter', then precede/follow the string with [[:<:]]/[[:>:]].
This is as opposed to blindly tacking those onto the string. After all, you are already inspecting the search string for special regexp characters to escape them. This is just another task in that vein. The definition of 'letter' should match whatever the word-boundary tokens look for.

Regex : string must contain a-z and A-Z and 0-9

I'm using a stored procedure to validate the input parameter. The input parameter must contain a-z and A-Z and 0-9.
for Example:
aS78fhE0 -> Correct
76AfbRZt -> Correct
76afbrzt -> Incorrect(doesn't contain Upper Case A-Z)
asAfbRZt -> Incorrect(doesn't contain Numeric 0-9)
4QA53RZJ -> Incorrect(doesn't contain Lower Case a-z)
What regular expression can validate the input parameter as in the examples above?
Many thanks, Praditha
UPDATE: Characters other than alphanumerics are not allowed. I'm using MySQL version 5.
Further to John's post and subsequent comments:
The MySQL query you require would be
SELECT * FROM mytable WHERE mycolumn REGEXP BINARY '[a-z]'
AND mycolumn REGEXP BINARY '[A-Z]'
AND mycolumn REGEXP BINARY '[0-9]'
Add an additional
AND mycolumn REGEXP BINARY '^[a-zA-Z0-9]+$'
if you only want alphanumerics in the string.
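The same "one test per requirement" idea can be sketched outside SQL to check it against the examples in the question. This uses Python's re module, which is case-sensitive by default, playing the role that BINARY plays in the query above; is_valid is a hypothetical helper name.

```python
import re

# One independent check per required character class.
REQUIRED = (r"[a-z]", r"[A-Z]", r"[0-9]")
ONLY_ALNUM = re.compile(r"[a-zA-Z0-9]+")

def is_valid(value: str) -> bool:
    """All three classes must be present, and nothing else may appear."""
    has_all = all(re.search(pattern, value) for pattern in REQUIRED)
    return has_all and ONLY_ALNUM.fullmatch(value) is not None

for sample in ["aS78fhE0", "76AfbRZt", "76afbrzt", "asAfbRZt", "4QA53RZJ", "aS7!"]:
    print(f"{sample}: {is_valid(sample)}")
```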
With look-ahead assertion you could do like this:
/^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9]).*$/
update: It seems mysql doesn't support look around assertions.
You could split it up into 3 separate regex to test for each case.
[a-z], [A-Z], and [0-9]
AND the results of those three matches together, and you get the result you're looking for.
EDIT:
if you're only looking to match alphanumerics, you should do ^[a-zA-Z0-9]+$ as suggested by Ed Head in the comments
My solution leads to a long expression because I permute over all 6 possible orders in which the required lowercase letter, uppercase letter and digit can appear in the string:
^(.*[a-z].*[A-Z].*[0-9].*|
.*[a-z].*[0-9].*[A-Z].*|
.*[A-Z].*[a-z].*[0-9].*|
.*[A-Z].*[0-9].*[a-z].*|
.*[0-9].*[a-z].*[A-Z].*|
.*[0-9].*[A-Z].*[a-z].*)$
Edit: Forgot the .* at the end and at the beginning.
Unfortunately, MySQL does not support lookaround assertions, therefore you'll have to spell it out for the regex engine (assuming that only those characters are legal):
^(
[A-Za-z0-9]*[a-z][A-Za-z0-9]*[A-Z][A-Za-z0-9]*[0-9][A-Za-z0-9]*|
[A-Za-z0-9]*[a-z][A-Za-z0-9]*[0-9][A-Za-z0-9]*[A-Z][A-Za-z0-9]*|
[A-Za-z0-9]*[A-Z][A-Za-z0-9]*[a-z][A-Za-z0-9]*[0-9][A-Za-z0-9]*|
[A-Za-z0-9]*[A-Z][A-Za-z0-9]*[0-9][A-Za-z0-9]*[a-z][A-Za-z0-9]*|
[A-Za-z0-9]*[0-9][A-Za-z0-9]*[a-z][A-Za-z0-9]*[A-Z][A-Za-z0-9]*|
[A-Za-z0-9]*[0-9][A-Za-z0-9]*[A-Z][A-Za-z0-9]*[a-z][A-Za-z0-9]*
)$
or, in MySQL:
SELECT * FROM mytable WHERE mycolumn REGEXP BINARY "^([A-Za-z0-9]*[a-z][A-Za-z0-9]*[A-Z][A-Za-z0-9]*[0-9][A-Za-z0-9]*|[A-Za-z0-9]*[a-z][A-Za-z0-9]*[0-9][A-Za-z0-9]*[A-Z][A-Za-z0-9]*|[A-Za-z0-9]*[A-Z][A-Za-z0-9]*[a-z][A-Za-z0-9]*[0-9][A-Za-z0-9]*|[A-Za-z0-9]*[A-Z][A-Za-z0-9]*[0-9][A-Za-z0-9]*[a-z][A-Za-z0-9]*|[A-Za-z0-9]*[0-9][A-Za-z0-9]*[a-z][A-Za-z0-9]*[A-Z][A-Za-z0-9]*|[A-Za-z0-9]*[0-9][A-Za-z0-9]*[A-Z][A-Za-z0-9]*[a-z][A-Za-z0-9]*)$";
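Writing the six branches by hand is error-prone, so one option is to generate the pattern. The sketch below builds the same alternation with itertools.permutations and checks it against the examples from the question. It uses Python's re module as a stand-in for the regex engine, and build_pattern is a hypothetical helper name.

```python
import re
from itertools import permutations

ANY = "[A-Za-z0-9]*"                      # filler: any run of allowed characters
CLASSES = ("[a-z]", "[A-Z]", "[0-9]")     # the three required character classes

def build_pattern() -> str:
    """Build one branch per ordering of the three required classes."""
    branches = [ANY + ANY.join(order) + ANY for order in permutations(CLASSES)]
    return "^(" + "|".join(branches) + ")$"

PATTERN = re.compile(build_pattern())

for sample in ["aS78fhE0", "76AfbRZt", "76afbrzt", "asAfbRZt", "4QA53RZJ"]:
    print(f"{sample}: {PATTERN.search(sample) is not None}")
```

The generated string contains no backslashes, so it could be pasted into a REGEXP BINARY condition without further escaping.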
[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*|[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*|[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*|[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*|[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*|[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*