REGEX alternative - mysql

Can anyone find an alternative way to rewrite the two regexes below without the question mark (?)?
^(?:2131|1800|35\d{3})\d{11}$
^4[0-9]{12}(?:[0-9]{3})?$
Or can you suggest how to write an SQL query that searches for VISA and JCB card patterns?
I just want to write a query that searches for card patterns in my database, and I am trying to use regular expressions to do this. Unfortunately, unlike PCRE (Perl Compatible Regular Expressions), POSIX regexes don't support the question mark ? as a non-greedy (lazy) modifier for the star and plus quantifiers. This means you can't use +? and *?.

In MySQL versions before 8.0, you need to use a POSIX ERE-like regex syntax, that is:
You can't use non-capturing groups
You can't use \d shorthand character class for digits, you need to use [[:digit:]] or [0-9]
You won't be able to use lazy quantifiers, but your patterns do not contain them. In some cases, they can be replaced with negated bracket expressions (e.g. a.*?b is better written as a[^ab]*b).
In your case, you need to replace (?: with ( and replace \d with [0-9]:
^(2131|1800|35[0-9]{3})[0-9]{11}$
^4[0-9]{12}([0-9]{3})?$

Drop the ?: after the opening parenthesis so that (?: becomes a normal capturing group.
If you also want to avoid the ? quantifier, write ){0,1} instead of )?.
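One way to sanity-check the rewrite is to run the original and the converted patterns side by side. The sketch below uses Python's re module as a stand-in, on the assumption that it handles these particular constructs the same way MySQL would; the sample values are made up for illustration and are not real card numbers.

```python
import re

# Original PCRE-style patterns from the question.
JCB_PCRE = r"^(?:2131|1800|35\d{3})\d{11}$"
VISA_PCRE = r"^4[0-9]{12}(?:[0-9]{3})?$"

# Rewritten patterns: plain groups, [0-9] instead of \d, {0,1} instead of ?.
JCB_ERE = r"^(2131|1800|35[0-9]{3})[0-9]{11}$"
VISA_ERE = r"^4[0-9]{12}([0-9]{3}){0,1}$"

# Made-up sample values for illustration only.
samples = [
    "213112345678901",   # 2131 + 11 digits
    "180012345678901",   # 1800 + 11 digits
    "3512345678901234",  # 35 + 3 digits + 11 digits
    "4123456789012",     # 4 + 12 digits
    "4123456789012345",  # 4 + 15 digits
    "41234567890123",    # 4 + 13 digits: neither VISA length
    "5123456789012345",  # wrong prefix
]

def same_result(pcre, ere, text):
    """Return True when both patterns agree on whether text matches."""
    return bool(re.search(pcre, text)) == bool(re.search(ere, text))

for s in samples:
    print(s, same_result(JCB_PCRE, JCB_ERE, s), same_result(VISA_PCRE, VISA_ERE, s))
```

If every line prints True twice, the rewritten patterns accept and reject the same samples as the originals.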

Related

MySQL 8.0.30 Regular Expression Word Matching with Special Characters

While there's a ton of "old" examples on the internet using the now unsupported '[[:<:]]word[[:>:]]' technique, I'm trying to find out how, in MySQL 8.0.30, to do exact word matching from our table with words that have special characters in them.
For example, we have a paragraph of text like:
"Senior software engineer and C++ developer with Unit Test and JavaScript experience. I also have .NET experience!"
We have a table of keywords to match against this and have been using the basic system of:
SELECT
sk.ID
FROM
sit_keyword sk
WHERE
var_text REGEXP CONCAT('\\b',sk.keyword,'\\b')
It works fine 90% of the time, but it completely fails on:
C#, C++, .NET, A+ or "A +" etc. So it's failing to match keywords with special characters in them.
I can't seem to find any recent documentation on how to address this since, as mentioned, nearly all of the examples I can find use the old unsupported techniques. Note I need to match these words (with special characters) anywhere in the source text, so it can be the first or last word, or somewhere in the middle.
Any advice on the best way to do this using REGEXP would be appreciated.
You need to escape special chars in the search phrase and use the construct that I call "adaptive dynamic word boundaries" instead of word boundaries:
var_text REGEXP CONCAT('(?!\\B\\w)',REGEXP_REPLACE(sk.keyword, '([-.^$*+?()\\[\\]{}\\\\|])', '\\$1'),'(?<!\\w\\B)')
The REGEXP_REPLACE(sk.keyword, '([-.^$*+?()\\[\\]{}\\\\|])', '\\$1') part finds the . ^ $ * + - ? ( ) [ ] { } \ | chars and adds a \ before them, while (?!\\B\\w) / (?<!\\w\\B) require word boundaries only when the search phrase starts/ends with a word char.
More details on adaptive dynamic word boundaries and demo in my YT video.
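The boundary logic can be tried out in isolation. The sketch below is in Python rather than MySQL, on the assumption that its re module treats \B, \w and the two lookarounds the same way for this ASCII text; keyword_pattern is a hypothetical helper name, and re.escape stands in for the REGEXP_REPLACE escaping step.

```python
import re

def keyword_pattern(keyword):
    """Build a pattern that matches keyword as a whole 'word', even when it
    starts or ends with a non-word character such as + or #."""
    # re.escape plays the role of the REGEXP_REPLACE escaping step.
    return r"(?!\B\w)" + re.escape(keyword) + r"(?<!\w\B)"

def contains_keyword(keyword, text):
    """Return True when keyword occurs in text with adaptive boundaries."""
    return re.search(keyword_pattern(keyword), text) is not None

text = ("Senior software engineer and C++ developer with Unit Test and "
        "JavaScript experience. I also have .NET experience!")

for kw in ["C++", ".NET", "JavaScript", "Java", "Unit Test"]:
    print(kw, contains_keyword(kw, text))
```

Here "Java" is the interesting case: it appears only inside "JavaScript", so the trailing lookbehind rejects it, while "C++" and ".NET" are accepted even though plain \b would fail next to + and the leading dot.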
Regular expressions treat several characters as metacharacters. These are documented in the manual on regular expression syntax: https://dev.mysql.com/doc/refman/8.0/en/regexp.html#regexp-syntax
If you need a metacharacter to be treated as the literal character, you need to escape it with a backslash.
This gets very complex. If you just want to search for substrings, perhaps you should just use LOCATE():
WHERE LOCATE(sk.keyword, var_text) > 0
This avoids all the trickery with metacharacters. It treats the string of sk.keyword as containing only literal characters.

Regex to SQL: repetition-operator operand invalid

I'm trying to use a regex to detect URLs in all the rows of my table, here's the regex
\b(([\w-]+:\/\/?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|\/)))
However, I invariably get the "repetition-operator operand invalid" error, which, after hours of search on the internet, still remains obscure.
Where have I gone wrong? What can I do to fix this? And alternatively, is there a better way to detect URLs in messages in SQL other than a regex?
Thank you.
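The error comes from the non-capturing group (?:...): the MySQL regex syntax here is POSIX-based and does not support it, so the ? right after ( has nothing to repeat. Use a plain group (...) instead. The ? quantifier itself is part of POSIX ERE; the pattern below writes \/\/* instead of \/\/?, which also accepts extra slashes. Also, \b in MySQL regex should be replaced with [[:<:]] (since this matches at the beginning of a word).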
Thus, I suggest using
[[:<:]](([a-zA-Z0-9-]+:\/\/*|www[.])[^ ()<>]+(\([a-zA-Z0-9_]+\)|([^ [:punct:]]|\/)))
I am expanding \w to [a-zA-Z0-9_] as it is exactly what \w is. Instead of \s, I am using a literal space. Instead of \d, I am using [0-9]. This is done for readability and better compatibility. If \w, \d and \s work for you, you can use them, but I do not see them among the supported entities in POSIX specs.
Also, instead of a literal space, you could use [:space:] inside a bracket expression; it matches whitespace such as space, tab, newline, and carriage return. In the same way, inside a bracket expression you can use [:alpha:] instead of a-zA-Z and [:digit:] instead of 0-9. Please also check this:
[[:<:]](([[:alpha:][:digit:]-]+:\/\/*|www[.])[^[:space:]()<>]+(\([[:alpha:][:digit:]_]+\)|([^[:space:][:punct:]]|\/)))

Converting PCRE to POSIX regular expression

I am working on a MySQL database and noticed that it doesn't natively support PCRE (requires a plugin).
I wish to use these three for some data validation (these are actually the values given to the pattern attribute):
^[A-z\. ]{3,36}
^[a-z\d\.]{3,24}$
^(?=^.{4,}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?!.*\s).*$
How do I do this?
I looked on the web but couldn't find any concrete examples or answers. Also there seem to exist no utilities that could do this automatically.
I am aware that sometimes such conversions are not exact and can produce differences, but I am willing to try.
The MySQL docs state that:
MySQL uses Henry Spencer's implementation of regular expressions, which is aimed at conformance with POSIX 1003.2. MySQL uses the extended version to support pattern-matching operations performed with the REGEXP operator in SQL statements.
Ok, so we're talking about POSIX ERE.
This page lists the differences between various regex flavors, so I'll use it as a cheatsheet.
^[A-z\. ]{3,36}
You're using:
Anchors: ^
Character classes: [...]
The range quantifier: {n,m}
All of these are supported out of the box in POSIX ERE, so you can use this expression as-is. But escaping the . in the character class is redundant, and A-z is most probably wrong in a character class (it also includes [ \ ] ^ _ `), so just write:
^[A-Za-z. ]{3,36}
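To see why A-z is suspicious, the range can be enumerated. This is a small Python sketch, assuming plain ASCII ordering, that lists the non-letter characters falling between A and z.

```python
import re

# Characters covered by the range A-z that are not ASCII letters.
extras = [chr(c) for c in range(ord("A"), ord("z") + 1) if not chr(c).isalpha()]
print(extras)

# The wide range accepts a caret; the two explicit ranges do not.
wide = re.fullmatch(r"[A-z]", "^") is not None
narrow = re.fullmatch(r"[A-Za-z]", "^") is not None
print(wide, narrow)
```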
^[a-z\d\.]{3,24}$
This one uses \d as well, which is unsupported in POSIX ERE. So you have to write:
^[a-z0-9.]{3,24}$
^(?=^.{4,}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?!.*\s).*$
Meh. You're using lookaheads. These are totally out of the scope for POSIX ERE, but you can work around this limitation by combining several SQL clauses for an equivalent logic:
WHERE LENGTH(foo) >= 4
AND foo REGEXP '[0-9]'
AND foo REGEXP '[a-z]'
AND foo REGEXP '[A-Z]'
AND NOT foo REGEXP '[ \t\r\n]'
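The equivalence between the lookahead pattern and the split conditions can be checked on a few values. The sketch below uses Python as a stand-in; split_check is a hypothetical helper that mirrors the WHERE clauses one by one, and the sample strings are made up. Python matches case-sensitively by default, which may not hold for a MySQL column depending on its collation.

```python
import re

# The original pattern with lookaheads, as given in the question.
LOOKAHEAD = re.compile(r"^(?=^.{4,}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?!.*\s).*$")

def split_check(value):
    """Mirror the combined WHERE clauses, one condition per lookahead."""
    return (
        len(value) >= 4
        and re.search(r"[0-9]", value) is not None
        and re.search(r"[a-z]", value) is not None
        and re.search(r"[A-Z]", value) is not None
        and re.search(r"[ \t\r\n]", value) is None
    )

samples = ["Abc1", "abc1", "ABC1", "Abcd", "Ab1", "Abc 1", "Passw0rd"]
for s in samples:
    print(repr(s), bool(LOOKAHEAD.search(s)), split_check(s))
```

Both columns should agree on every sample; a disagreement would point at a condition that was translated incorrectly.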

Regex for start with three alpha and four digits

I have written an SQL statement to retrieve data from a MySQL db, and I want to select rows where myId starts with three letters followed by four digits, for example: ABC1234K1D2
myId REGEXP '^[A-Z]{3}/d{4}'
but it gives me an empty result (the data is available in the DB). Could someone point me to the correct way?
In most regex variants the answer would be: /d matches a / followed by a d; I think you want \d which matches a digit.
However MySQL has a somewhat limited regex implementation (see documentation).
There is no shortcut to character sets like \d for any digit.
You need to either use a named character set ([[:digit:]]), or just use [0-9].
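The difference between /d and a digit class is easy to reproduce outside the database. This Python sketch, used here only as a stand-in for the regex behaviour, runs the pattern from the question and the bracket version against the sample id.

```python
import re

my_id = "ABC1234K1D2"

slash_d = r"^[A-Z]{3}/d{4}"     # pattern from the question: a slash, then four d's
bracket = r"^[A-Z]{3}[0-9]{4}"  # portable rewrite using a bracket expression

question_matches_id = bool(re.search(slash_d, my_id))
question_matches_slash = bool(re.search(slash_d, "ABC/dddd"))
rewrite_matches_id = bool(re.search(bracket, my_id))

print(question_matches_id, question_matches_slash, rewrite_matches_id)
```

The original pattern only matches text that literally contains "/dddd", which explains the empty result set.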
Try this:
[A-Z]{3}[0-9]{4}
If you want the letters to be matched case-insensitively, try this:
[a-zA-Z]{3}[0-9]{4}
First, in regular regular expressions, to match a digit, you have to use \d instead of /d (which makes you match / followed by d).
Then, I had never noticed, but I think \d (and the others like \w, etc.) don't seem to be available in MySQL. The doc lists the accepted special chars, and those generic classes don't appear. You could use [[:digit:]] instead, even if [0-9] is quite a bit shorter ;)
You are doing fine, just replace /d with \d. Final regex: ^[A-Z]{3}\d{4}
You could use the following pattern:
^[a-zA-Z]{3}\d{4}

Regular expression for search terms not preceded or followed by [a-z] and [A-Z]

Can someone supply me with a regex to match a search term that is not preceded or followed by [a-z] and [A-Z]? (Other characters are OK.) I.e., when searching for key, I don't want keyboard in my search results, but key. is okay.
Since you don't specify what regex engine you're using, I'll assume a baseline, in which case "[^A-Za-z]key[^A-Za-z]" would be sufficient.
If you also want to catch the string at the start and end of the line, you'll also have to use "^key[^A-Za-z]" and "[^A-Za-z]key$".
\bkey\b should do what you want.
\b is a word boundary
As this question is tagged with mysql I assume you are using MySQL regexps. Then [[:<:]]key[[:>:]] is what you want. See the documentation at dev.mysql.com for details.
Or the more concise [^\w]key[^\w]
If you're using Perl, what you need is \b, aka "word boundary":
m/\bkey\b/
No need for the ORs if you do it like this:
(^|[^A-Za-z])key([^A-Za-z]|$)
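The single-pattern version can be exercised on a handful of strings. This is a Python sketch with made-up samples, assuming the engine in use treats the alternation and anchors the same way.

```python
import re

# The search term must not have an ASCII letter on either side.
pattern = re.compile(r"(^|[^A-Za-z])key([^A-Za-z]|$)")

samples = ["key", "key.", "a key here", "key2", "keyboard", "monkey"]
results = {s: pattern.search(s) is not None for s in samples}

for s, matched in results.items():
    print(repr(s), matched)
```

Note that "key2" is accepted because only letters are excluded as neighbours, which matches the question's "other characters are OK".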