I have written an SQL statement to retrieve data from a MySQL database, and I want to select rows where myId starts with three letters followed by four digits, for example: ABC1234K1D2
myId REGEXP '^[A-Z]{3}/d{4}'
but it returns an empty result (matching data is available in the DB). Could someone point me to the correct way?
In most regex variants the answer would be: /d matches a / followed by a d; I think you want \d which matches a digit.
However, MySQL has a somewhat limited regex implementation (see documentation).
There is no shorthand such as \d for "any digit", at least not in the regex library used by MySQL versions before 8.0.
You need to either use a named character set ([[:digit:]]), or just use [0-9].
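As a quick sanity check of the /d versus digit-class point, here is a minimal sketch using Python's re module. This is only a stand-in: Python's engine is not MySQL's, so it illustrates the pattern logic rather than MySQL behavior. The sample value is the one from the question.

```python
import re

sample = "ABC1234K1D2"

# '/d' is a literal slash followed by the letter d, so this cannot match.
broken = re.compile(r"^[A-Z]{3}/d{4}")

# An explicit digit range is understood by practically every regex flavor.
fixed = re.compile(r"^[A-Z]{3}[0-9]{4}")

print(bool(broken.search(sample)))
print(bool(fixed.search(sample)))
```

The [0-9] form is used here because it does not depend on which shorthand classes a given engine supports.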
Try this:
[A-Z]{3}[0-9]{4}
If you want the match to be case-insensitive, try this:
[a-zA-Z]{3}[0-9]{4}
First, in standard regular expressions, to match a digit you have to use \d instead of /d (which matches a / followed by a d).
Then, I had never noticed it before, but \d (and similar classes such as \w) don't seem to be available in MySQL. The documentation lists the accepted special characters, and those generic classes don't appear. You could use [[:digit:]] instead, even if [0-9] is quite a bit shorter ;)
You are doing fine, just replace /d with \d. Final regex: ^[A-Z]{3}\d{4}
You could use the following pattern:
^[a-zA-Z]{3}\d{4}
While there's a ton of "old" examples on the internet using the now unsupported '[[:<:]]word[[:>:]]' technique, I'm trying to find out how, in MySQL 8.0.30, to do exact word matching from our table with words that have special characters in them.
For example, we have a paragraph of text like:
"Senior software engineer and C++ developer with Unit Test and JavaScript experience. I also have .NET experience!"
We have a table of keywords to match against this and have been using the basic system of:
SELECT
sk.ID
FROM
sit_keyword sk
WHERE
var_text REGEXP CONCAT('\\b',sk.keyword,'\\b')
It works fine 90% of the time, but it completely fails on:
C#, C++, .NET, A+ or "A +" etc. So it's failing to match keywords with special characters in them.
I can't seem to find any recent documentation on how to address this since, as mentioned, nearly all of the examples I can find use the old unsupported techniques. Note I need to match these words (with special characters) anywhere in the source text, so it can be the first or last word, or somewhere in the middle.
Any advice on the best way to do this using REGEXP would be appreciated.
You need to escape special chars in the search phrase and use the construct that I call "adaptive dynamic word boundaries" instead of word boundaries:
var_text REGEXP CONCAT('(?!\\B\\w)',REGEXP_REPLACE(sk.keyword, '([-.^$*+?()\\[\\]{}\\\\|])', '\\$1'),'(?<!\\w\\B)')
The REGEXP_REPLACE(sk.keyword, '([-.^$*+?()\\[\\]{}\\\\|])', '\\$1') part matches the . ^ $ * + - ? ( ) [ ] { } \ | chars and adds a \ before each of them, while (?!\\B\\w) / (?<!\\w\\B) require a word boundary only when the search phrase starts/ends with a word char.
More details on adaptive dynamic word boundaries and demo in my YT video.
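The same idea can be tried outside the database. The following is a minimal sketch in Python's re module, which also accepts these lookarounds. It is not MySQL's ICU engine, and keyword_pattern is a hypothetical helper name used only for illustration. The text is the sample paragraph from the question.

```python
import re

def keyword_pattern(keyword: str) -> re.Pattern:
    # re.escape makes +, ., # and similar characters literal.
    # (?!\B\w)  : if the keyword starts with a word char, require a boundary before it.
    # (?<!\w\B) : if the keyword ends with a word char, require a boundary after it.
    return re.compile(r"(?!\B\w)" + re.escape(keyword) + r"(?<!\w\B)")

text = ("Senior software engineer and C++ developer with Unit Test "
        "and JavaScript experience. I also have .NET experience!")

keywords = ["C++", ".NET", "Unit Test", "Java", "C#"]
matched = [k for k in keywords if keyword_pattern(k).search(text)]
print(matched)
```

Here "Java" is rejected because it only occurs inside "JavaScript", while "C++" and ".NET" are found even though they begin or end with non-word characters.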
Regular expressions treat several characters as metacharacters. These are documented in the manual on regular expression syntax: https://dev.mysql.com/doc/refman/8.0/en/regexp.html#regexp-syntax
If you need a metacharacter to be treated as the literal character, you need to escape it with a backslash.
This gets very complex. If you just want to search for substrings, perhaps you should just use LOCATE():
WHERE LOCATE(sk.keyword, var_text) > 0
This avoids all the trickery with metacharacters. It treats the string of sk.keyword as containing only literal characters.
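The trade-off of a plain substring search can be sketched as follows. The locate function below is a hypothetical Python stand-in that mimics the 1-based return value of LOCATE(); unlike MySQL with a case-insensitive collation, it is case-sensitive.

```python
def locate(substr: str, s: str) -> int:
    # 1-based position of the first occurrence, or 0 when absent.
    return s.find(substr) + 1

text = "C++ developer with JavaScript experience"

# No metacharacters to escape: '+' is just a character here.
print(locate("C++", text) > 0)

# But there is no notion of word boundaries either:
# "Java" is reported as found because it occurs inside "JavaScript".
print(locate("Java", text) > 0)
```

So this approach is simplest when partial-word hits are acceptable.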
Python pattern => ^(?=.\bABDUL\b)(?=.\bHAI\b.)(?=.\bMANSOOR\b).*$
I need the equivalent MySQL pattern.
Can you please help me out?
The regex in question is quite a strange way to match simple words. It is not clear what the expected input is; maybe the input justifies this approach.
^(?=.\bABDUL\b)(?=.\bHAI\b.)(?=.\bMANSOOR\b).*$
Which means: At the beginning there must be any character which is not a part of a word, then ABDUL, a non word character, HAI, a non word character, MANSOOR, a non word character or the end of the string.
^[^[:alnum:]]ABDUL[^[:alnum:]]HAI[^[:alnum:]]MANSOOR([^[:alnum:]]?.*)?$
Which is: At the beginning, not a number or alphabet character (alphanumerical), ABDUL, one non-alphanumerical, HAI, one non-alphanumerical, MANSOOR one non-alphanumerical or the end of the string.
I did not test it and did not intend to make it 100% the same as the first one, but it should be close enough.
For anyone who would like to copy it to their code:
Matching the first character is not very common and can be a bug in the original regexp.
(?=...) is a "lookahead assertion", which does not consume any characters. The POSIX version does not have it, but for simple string searching this may not matter.
Both versions should match strings like !ABDUL$HAI)MANSOOR - make sure that this is what you want.
For anyone who would like to understand the regular expressions I used:
https://dev.mysql.com/doc/refman/8.0/en/regexp.html for MySQL (POSIX syntax) and https://docs.python.org/3/library/re.html for Python (Perl-style syntax)
Can anyone find an alternative way to rewrite the two regexes below without the question mark (?)?
^(?:2131|1800|35\d{3})\d{11}$
^4[0-9]{12}(?:[0-9]{3})?$
Alternatively, can you suggest how to write an SQL query that searches for VISA and JCB card patterns?
I just want to write a query that searches for card patterns in my database, and I am trying to use regular expressions to do this. Unfortunately, POSIX regexes don't support using the question mark ? as a non-greedy (lazy) modifier to the star and plus quantifiers the way PCRE (Perl Compatible Regular Expressions) does. This means you can't use +? and *?.
In MySQL versions before 8.0, you need to use POSIX ERE-like regex syntax, that is:
You can't use non-capturing groups
You can't use \d shorthand character class for digits, you need to use [[:digit:]] or [0-9]
You won't be able to use lazy quantifiers, but your patterns do not contain them. In some cases, they can be replaced with negated bracket expressions (e.g. a.*?b is better written as a[^ab]*b).
In your case, you need to replace (?: with ( and replace \d with [0-9]:
^(2131|1800|35[0-9]{3})[0-9]{11}$
^4[0-9]{12}([0-9]{3})?$
Drop the ?: from (?: which turns it into a normal (capturing) group.
Instead of )?, use ){0,1}.
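To check that the rewrites accept the same inputs as the originals, here is a small sketch in Python's re module (a stand-in, not MySQL's engine). The sample numbers are made-up digit strings chosen only for their length and prefix, not real card numbers.

```python
import re

# Original patterns from the question.
jcb_pcre = re.compile(r"^(?:2131|1800|35\d{3})\d{11}$")
visa_pcre = re.compile(r"^4[0-9]{12}(?:[0-9]{3})?$")

# Rewrites: plain groups, [0-9] instead of \d, {0,1} instead of the trailing ?.
jcb_ere = re.compile(r"^(2131|1800|35[0-9]{3})[0-9]{11}$")
visa_ere = re.compile(r"^4[0-9]{12}([0-9]{3}){0,1}$")

samples = [
    "4111111111111111",  # 16 digits starting with 4
    "4111111111111",     # 13 digits starting with 4
    "41111111111111",    # 14 digits: not an accepted length
    "3530111333300000",  # 35 followed by 14 digits
    "213112345678901",   # 2131 followed by 11 digits
    "1234567890123456",  # no matching prefix
]

for s in samples:
    # Each rewrite must agree with its original on every sample.
    assert bool(jcb_pcre.match(s)) == bool(jcb_ere.match(s))
    assert bool(visa_pcre.match(s)) == bool(visa_ere.match(s))
```

The only behavioral difference is that the plain groups now capture, which does not matter for a yes/no REGEXP test.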
I am using the following RegEx in MySQL to match two consecutive digits that are the same anywhere in a string:
^.*([[:digit:]])\1+.*$
It matches correctly the following strings:
8831
5011
9931
but it also matches
9318
and it doesn't match
3449
Is the problem around .* or is it something else?
There's no way to check for the same thing twice directly; instead, you need to check for all the possibilities. Luckily, since you are only looking at 10 digits, it's relatively easy:
(11|22|33|44|55|66|77|88|99|00)
I don't think MySQL regular expressions have back references. You can do the more verbose:
where col regexp '00|11|22|33|44|55|66|77|88|99'
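The sketch below compares the intended backreference with the alternation rewrite, using Python's re module as a stand-in for illustration. The third pattern rests on an assumption: that the MySQL string parser dropped the backslash of '\1', so the engine actually saw 1+ (a digit followed by one or more literal 1s). That would be one plausible explanation for the results reported in the question.

```python
import re

samples = ["8831", "5011", "9931", "9318", "3449"]

# What the question intended: a digit followed by the same digit.
backref = re.compile(r"([0-9])\1")

# The verbose rewrite from the answers: 00|11|...|99.
alternation = re.compile("|".join(d * 2 for d in "0123456789"))

# Assumption: the backslash was lost, leaving '1+' instead of '\1+'.
backslash_lost = re.compile(r"^.*([0-9])1+.*$")

for s in samples:
    print(s,
          bool(backref.search(s)),
          bool(alternation.search(s)),
          bool(backslash_lost.search(s)))
```

On these samples the first two patterns agree with each other, and the third matches 9318 but not 3449, the same as the behavior described in the question.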
I am reading some open source code, and in it I found an SQL statement with this kind of filter:
select sometext from table1,table2 where table1.sometext LIKE
CONCAT('% ',table2.test_keyword,' %') AND table2.test_keyword NOT
REGEXP '__*';
What is that __* in this sql?
__* matches one _ followed by zero or more _s.
__*
^^^
||\__ (zero or more)   ^
|\___ underscore       |
\____ underscore, then |
_+ would have done the same job.
_+
^^
|\__ (one or more) ^
\___ underscore    |
It's simply one or more underscore characters.
The pattern is best read as:
'_', exactly one underscore,
'_*', followed by zero or more underscores.
Keep in mind that, without a start marker, that will match the pattern at any location in the string, so it basically means any string with an underscore in it (or, more accurately, since you're using NOT, a string without an underscore).
It's also needlessly complex, since you could achieve the same effect with AND table2.test_keyword NOT REGEXP '_'.
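The claimed equivalence is easy to check. A minimal sketch in Python's re module (a stand-in for illustration, with made-up sample strings) compares the three unanchored patterns with a plain "contains an underscore" test:

```python
import re

samples = ["foo", "foo_bar", "__init__", "_", ""]

for s in samples:
    a = bool(re.search(r"__*", s))  # one underscore, then zero or more
    b = bool(re.search(r"_+", s))   # one or more underscores
    c = bool(re.search(r"_", s))    # a single underscore anywhere
    # Without anchors, all three reduce to "the string contains an underscore".
    assert a == b == c == ("_" in s)
```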
See here for the latest MySQL documentation on regexes (5.6 at the time of this answer).