Regex : string must contain a-z and A-Z and 0-9 - mysql

I'm using a stored procedure to validate the input parameter. The input parameter must contain a-z and A-Z and 0-9.
For example:
aS78fhE0 -> Correct
76AfbRZt -> Correct
76afbrzt -> Incorrect(doesn't contain Upper Case A-Z)
asAfbRZt -> Incorrect(doesn't contain Numeric 0-9)
4QA53RZJ -> Incorrect(doesn't contain Lower Case a-z)
What regular expression can validate the input parameter as in the examples above?
Many thanks, Praditha
UPDATE: Characters other than alphanumerics are not allowed. I'm using MySQL version 5.

Following on from John's post and the subsequent comments:
The MySQL query you require would be
SELECT * FROM mytable WHERE mycolumn REGEXP BINARY '[a-z]'
AND mycolumn REGEXP BINARY '[A-Z]'
AND mycolumn REGEXP BINARY '[0-9]'
Add an additional condition
AND mycolumn REGEXP BINARY '^[a-zA-Z0-9]+$'
if you only want alphanumerics in the string.
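To sanity-check the logic of these four conditions outside the database, here is a minimal Python sketch. It uses Python's `re` module rather than MySQL's regex engine, and the function name `is_valid` is made up for illustration; the sample values come from the question.

```python
import re

def is_valid(value: str) -> bool:
    """Mirror the four REGEXP BINARY conditions ANDed together."""
    return all(
        re.search(pattern, value) is not None
        for pattern in (
            r"[a-z]",           # at least one lowercase letter
            r"[A-Z]",           # at least one uppercase letter
            r"[0-9]",           # at least one digit
            r"^[a-zA-Z0-9]+$",  # nothing but alphanumerics
        )
    )

for sample in ("aS78fhE0", "76AfbRZt", "76afbrzt", "asAfbRZt", "4QA53RZJ"):
    print(sample, is_valid(sample))
```

Python's `re` is case sensitive by default, which plays the role of BINARY in the MySQL query.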

With look-ahead assertion you could do like this:
/^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9]).*$/
update: It seems mysql doesn't support look around assertions.
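For engines that do support look-ahead, the idea can be tried out in Python. This is only a sketch: the trailing `[a-zA-Z0-9]+` (instead of `.*`) is an addition based on the question's update that only alphanumerics are allowed, and `is_valid_lookahead` is a hypothetical name.

```python
import re

# Each (?=...) look-ahead scans forward for one required character class
# without consuming input; the final part restricts the whole string to
# alphanumerics (an assumption taken from the question's update).
PATTERN = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])[a-zA-Z0-9]+$")

def is_valid_lookahead(value: str) -> bool:
    return PATTERN.match(value) is not None

for sample in ("aS78fhE0", "76afbrzt", "asAfbRZt", "4QA53RZJ"):
    print(sample, is_valid_lookahead(sample))
```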

You could split it up into 3 separate regexes, one to test for each case:
[a-z], [A-Z], and [0-9]
AND the results of those three matches together to get the result you're looking for.
EDIT:
If you're only looking to match alphanumerics, you should also test ^[a-zA-Z0-9]+$, as suggested by Ed Head in the comments.

My solution leads to a long expression because it enumerates all 6 orders in which the required lowercase letter, uppercase letter, and digit can appear in the string:
^(.*[a-z].*[A-Z].*[0-9].*|
.*[a-z].*[0-9].*[A-Z].*|
.*[A-Z].*[a-z].*[0-9].*|
.*[A-Z].*[0-9].*[a-z].*|
.*[0-9].*[a-z].*[A-Z].*|
.*[0-9].*[A-Z].*[a-z].*)$
Edit: Forgot the .* at the end and at the beginning.

Unfortunately, MySQL does not support lookaround assertions, therefore you'll have to spell it out for the regex engine (assuming that only those characters are legal):
^(
[A-Za-z0-9]*[a-z][A-Za-z0-9]*[A-Z][A-Za-z0-9]*[0-9][A-Za-z0-9]*|
[A-Za-z0-9]*[a-z][A-Za-z0-9]*[0-9][A-Za-z0-9]*[A-Z][A-Za-z0-9]*|
[A-Za-z0-9]*[A-Z][A-Za-z0-9]*[a-z][A-Za-z0-9]*[0-9][A-Za-z0-9]*|
[A-Za-z0-9]*[A-Z][A-Za-z0-9]*[0-9][A-Za-z0-9]*[a-z][A-Za-z0-9]*|
[A-Za-z0-9]*[0-9][A-Za-z0-9]*[a-z][A-Za-z0-9]*[A-Z][A-Za-z0-9]*|
[A-Za-z0-9]*[0-9][A-Za-z0-9]*[A-Z][A-Za-z0-9]*[a-z][A-Za-z0-9]*
)$
or, in MySQL:
SELECT * FROM mytable WHERE mycolumn REGEXP BINARY "^([A-Za-z0-9]*[a-z][A-Za-z0-9]*[A-Z][A-Za-z0-9]*[0-9][A-Za-z0-9]*|[A-Za-z0-9]*[a-z][A-Za-z0-9]*[0-9][A-Za-z0-9]*[A-Z][A-Za-z0-9]*|[A-Za-z0-9]*[A-Z][A-Za-z0-9]*[a-z][A-Za-z0-9]*[0-9][A-Za-z0-9]*|[A-Za-z0-9]*[A-Z][A-Za-z0-9]*[0-9][A-Za-z0-9]*[a-z][A-Za-z0-9]*|[A-Za-z0-9]*[0-9][A-Za-z0-9]*[a-z][A-Za-z0-9]*[A-Z][A-Za-z0-9]*|[A-Za-z0-9]*[0-9][A-Za-z0-9]*[A-Z][A-Za-z0-9]*[a-z][A-Za-z0-9]*)$";
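Typing six branches by hand is error-prone, so the pattern can also be generated. The following Python sketch builds the same alternation from the three required classes; `build_pattern`, `ANY`, and `REQUIRED` are illustrative names, and the check at the end runs in Python's `re`, not in MySQL.

```python
import itertools
import re

ANY = "[A-Za-z0-9]*"                    # filler allowed between required characters
REQUIRED = ("[a-z]", "[A-Z]", "[0-9]")  # one of each must appear

def build_pattern() -> str:
    """Join every ordering of the required classes into one anchored alternation."""
    branches = [
        ANY + ANY.join(order) + ANY
        for order in itertools.permutations(REQUIRED)
    ]
    return "^(" + "|".join(branches) + ")$"

pattern = build_pattern()
for sample in ("aS78fhE0", "76afbrzt", "4QA53RZJ"):
    print(sample, re.search(pattern, sample) is not None)
```

The generated string can be pasted into the REGEXP BINARY query as a literal.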

[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*|[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*|[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*|[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*|[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*|[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*

Related

Regex pattern equivalent of %word% in mysql

I need two case-insensitive regex patterns. One of them should be equivalent to SQL's % wildcard, as in %word%. My attempt at this was '^[a-zA-Z]*word[a-zA-Z]*$'.
Question 1: This seems to work, but I am wondering if this is the equivalent of %word%.
The second pattern should be similar to %word%, but must reject the string if there are 3 or more characters either before or after the word. For example, if the target word is word:
words = matched because it doesn't have 3 or more characters either before or after it.
swordfish = not matched because it has 3 or more characters after word
theword = not matched because it has 3 or more characters before it
mywordly = matched because it doesn't contain 3 or more characters before or after word.
miswordeds = not matched because it has 3 characters before it. (It also has 3 characters after it, but it had already met the criteria.)
Question 2: For the second regex, I am not very sure how to start. I will be using the regex in a MySQL query with the REGEXP function, for example:
SELECT 1
WHERE 'SWORDFISH' REGEXP '^[a-zA-Z]*word[a-zA-Z]*$'
First Question:
According to https://dev.mysql.com/doc/refman/8.0/en/string-comparison-functions.html#operator_like
With LIKE you can use the following two wildcard characters in the pattern:
% matches any number of characters, even zero characters.
_ matches exactly one character.
This means the regex '^[a-zA-Z]*word[a-zA-Z]*$' is not equivalent to %word%: it only allows letters around the word, whereas % matches any characters.
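The difference shows up as soon as the value contains anything other than letters. In the Python sketch below, `like_percent_word` is a rough, hypothetical stand-in for LIKE '%word%' under a case-insensitive collation, and the regex is the one from the question; an unanchored REGEXP 'word' would be the closer counterpart of %word%.

```python
import re

# The question's attempt: only letters are allowed around the word.
LETTERS_ONLY = re.compile(r"^[a-zA-Z]*word[a-zA-Z]*$", re.IGNORECASE)

def like_percent_word(value: str) -> bool:
    """Rough stand-in for: value LIKE '%word%' (case-insensitive collation)."""
    return "word" in value.lower()

for sample in ("SWORDFISH", "pass-word 123", "word"):
    print(sample, like_percent_word(sample), LETTERS_ONLY.match(sample) is not None)
```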
Second Question:
Change * to {0,2} so that at most 2 characters are matched before and after the word:
SELECT 1
WHERE 'SWORDFISH' REGEXP '^[a-zA-Z]{0,2}word[a-zA-Z]{0,2}$'
And to make it case insensitive:
SELECT 1 WHERE LOWER('SWORDFISH') REGEXP '^[a-z]{0,2}word[a-z]{0,2}$'
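The bounded pattern can be checked against the examples from the question with a short Python sketch (Python's `re` rather than MySQL; `near_word` is an illustrative name):

```python
import re

# At most two letters before and at most two letters after "word".
BOUNDED = re.compile(r"^[a-z]{0,2}word[a-z]{0,2}$")

def near_word(value: str) -> bool:
    # Lowercasing first mirrors LOWER(...) in the query above.
    return BOUNDED.match(value.lower()) is not None

for sample in ("words", "swordfish", "theword", "mywordly", "miswordeds"):
    print(sample, near_word(sample))
```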
Assuming
The test string (or column) has only letters. (Hence, I can use . instead of [a-z]).
Case folding and accents are not an issue (presumably handled by a suitable COLLATION).
Either of these works:
WHERE x LIKE '%word%' -- found the word
AND x NOT LIKE '%___word%' -- but fail if too many leading chars
AND x NOT LIKE '%word___%' -- or trailing
or:
WHERE x RLIKE '^.{0,2}word.{0,2}$'
I vote for RLIKE being slightly faster than LIKE -- only because there are fewer ANDs.
(MySQL 8.0 introduced incompatible regexp syntax; I think the syntax above works in all versions of MySQL/MariaDB. Note that I avoided word boundaries and character class shortcuts like \\w.)
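To compare the two formulations on the question's examples, here is a small Python sketch. The `like` helper is a simplified emulation of SQL LIKE written for this illustration: it ignores escape characters and collations, so it is an assumption about LIKE's behaviour rather than MySQL itself.

```python
import re

def like(value: str, pattern: str) -> bool:
    """Emulate SQL LIKE: % is any run of characters, _ is exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, re.DOTALL) is not None

def via_like(x: str) -> bool:
    return (
        like(x, "%word%")
        and not like(x, "%___word%")
        and not like(x, "%word___%")
    )

def via_rlike(x: str) -> bool:
    return re.search(r"^.{0,2}word.{0,2}$", x) is not None

for sample in ("words", "swordfish", "theword", "mywordly", "miswordeds"):
    print(sample, via_like(sample), via_rlike(sample))
```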

RegEx SQL, combine known pattern with variable numbers

I want to check whether there is a known pattern with variable numbers.
This column 'shortcut' has values like this
|shortcuts|
-----------
|ab1
|ab2
|ab23
|abc123
The only thing I've got for my SQL-statement is the alphabetical pattern e.g. 'ab'
So I started with
SELECT * FROM mytable WHERE shortcut LIKE 'ab%'
I only need ab1, ab2 and ab23 and NOT abc123.
Is there a way to modify my statement? There is at least one number, numbers always follow the known pattern and the pattern is the only known value.
You can use regular expressions:
where shortcut regexp '^ab[0-9]+'
This says that shortcut starts with "ab" and is followed by at least one digit.
You can use
SELECT * FROM mytable WHERE shortcut REGEXP '^ab[0-9]+$'
The ^ab[0-9]+$ regex matches:
^ - start of string
ab - an ab string (case insensitively, use BINARY after REGEXP to make it case sensitive)
[0-9]+ - one or more digits
$ - end of string.
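Since only the alphabetical prefix is known in advance, the pattern has to be assembled from it. A minimal Python sketch of that step, using the values from the question (`matching_shortcuts` is a hypothetical helper, and `re.escape` is added here in case the prefix ever contains regex metacharacters):

```python
import re

def matching_shortcuts(prefix: str, shortcuts: list[str]) -> list[str]:
    """Keep values that are exactly the prefix followed by one or more digits."""
    pattern = re.compile("^" + re.escape(prefix) + "[0-9]+$")
    return [s for s in shortcuts if pattern.match(s)]

print(matching_shortcuts("ab", ["ab1", "ab2", "ab23", "abc123"]))
```

Unlike the MySQL REGEXP described above, Python's `re` is case sensitive unless `re.IGNORECASE` is passed.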

How to match any letter in SQL?

I want to return rows where certain fields follow a particular pattern such as whether a particular character in a string is a letter or number. To test it out, I want to return fields where the first letter is any letter. I used this code.
SELECT * FROM sales WHERE `customerfirstname` like '[a-z]%';
It returns nothing. So I would think that the criteria is the first character is a letter and then any following characters do not matter.
The following code works, but limits rows where the first character is an a.
SELECT * FROM sales WHERE `customerfirstname` like 'a%';
Am I not understanding pattern matching? Isn't it [a-z] or [A-Z] or [0-9] for any letter or number?
Also if I wanted to run this test on the second character in a string, wouldn't I use
SELECT * FROM `sales` WHERE `customerfirstname` like '_[a-z]%'
This is for SQL and MySQL. I am doing this in phpmyadmin.
You want to use regular expressions:
SELECT s.*
FROM sales s
WHERE s.customerfirstname REGEXP '^[a-zA-Z]';
This can be achieved with a regular expression.
SELECT * FROM sales WHERE REGEXP_LIKE(customerfirstname, '^[[:alpha:]]');
^ denotes the start of the string, while the [:alpha:] character class matches any alphabetic character.
Just in case, here are a few other character classes that you may find useful:
alnum: alphanumeric characters
digit: digit characters
lower: lowercase alphabetic characters
upper: uppercase alphabetic characters
See the MySQL regexp docs for many more.
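The question also asked about testing the second character. In a regular expression, LIKE's single-character wildcard _ corresponds to a dot, so a pattern such as '^.[a-zA-Z]' should do it. The Python sketch below illustrates both positions; it uses explicit ranges because Python's `re` has no POSIX classes like [:alpha:], and the constant names are made up.

```python
import re

FIRST_IS_LETTER = re.compile(r"^[a-zA-Z]")    # first character is a letter
SECOND_IS_LETTER = re.compile(r"^.[a-zA-Z]")  # any first character, then a letter

for name in ("Anna", "4nna", "A1ex"):
    print(
        name,
        FIRST_IS_LETTER.match(name) is not None,
        SECOND_IS_LETTER.match(name) is not None,
    )
```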

Match beginning of words in Mysql for UTF8 strings

I'm trying to match the beginning of words in a MySQL column that stores strings as varchar. Unfortunately, REGEXP does not seem to work for UTF-8 strings, as mentioned here
So,
select * from names where name REGEXP '[[:<:]]Aandre';
does not work if I have name like Foobar Aándreas
However,
select * from names where name like '%andre%'
matches the row I need but does not guarantee beginning of words matches.
Is it better to do the like and filter it out on the application side ? Any other solutions?
A citation from the page you mentioned:
Warning
The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multi-byte safe and may produce unexpected results with multi-byte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.
select * from names where name like 'andre%'
select * from names where name like 'andre%' is not a solution for, e.g.,
name = 'richard andrew', because the string begins with richa... and not with andre...
For the moment, a temporary solution for finding words (words != string) that start with a given string is
select * from names where name REGEXP '[[:<:]]andre';
But it does not match accented words, e.g. ándrew.
Is there any other solution, using regular expressions in MySQL, to search in accented words?
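If the filtering is moved to the application side, as the question suggests, one possible approach is to strip accents before matching at a word boundary. The Python sketch below assumes that treating á as a is acceptable for the data; `strip_accents` and `word_starts_with` are hypothetical helpers, not part of MySQL.

```python
import re
import unicodedata

def strip_accents(text: str) -> str:
    """Decompose characters (NFD) and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def word_starts_with(name: str, prefix: str) -> bool:
    """True if any word in name begins with prefix, ignoring case and accents."""
    pattern = r"\b" + re.escape(strip_accents(prefix))
    return re.search(pattern, strip_accents(name), re.IGNORECASE) is not None

for name in ("richard ándrew", "Ándreas Foobar", "alexandre"):
    print(name, word_starts_with(name, "andre"))
```

A broad LIKE '%andre%' in SQL can narrow the rows first, with this check applied to the result.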

How can I find non-ASCII characters in MySQL?

I'm working with a MySQL database that has some data imported from Excel. The data contains non-ASCII characters (em dashes, etc.) as well as hidden carriage returns or line feeds. Is there a way to find these records using MySQL?
MySQL provides comprehensive character set management that can help with this kind of problem.
SELECT whatever
FROM tableName
WHERE columnToCheck <> CONVERT(columnToCheck USING ASCII)
The CONVERT(col USING charset) function turns the unconvertible characters into replacement characters. As a result, the converted and unconverted text will be unequal.
See this for more discussion. https://dev.mysql.com/doc/refman/8.0/en/charset-repertoire.html
You can use any character set name you wish in place of ASCII. For example, if you want to find out which characters won't render correctly in code page 1257 (Lithuanian, Latvian, Estonian) use CONVERT(columnToCheck USING cp1257)
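The same convert-and-compare idea can be illustrated in Python, where encoding with errors="replace" substitutes a question mark for anything outside the target character set. This is only an analogy to CONVERT(... USING ...), and `has_non_ascii` is an illustrative name.

```python
def has_non_ascii(text: str, charset: str = "ascii") -> bool:
    """Round-trip through the charset; replaced characters make the result differ."""
    converted = text.encode(charset, errors="replace").decode(charset)
    return converted != text

for sample in ("plain text", "em\u2014dash", "why?"):
    print(repr(sample), has_non_ascii(sample))
```

Carriage returns and line feeds are themselves ASCII, so this comparison does not flag them in the sketch.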
You can define ASCII as all characters that have a decimal value of 0 - 127 (0x00 - 0x7F) and find columns with non-ASCII characters using the following query:
SELECT * FROM TABLE WHERE NOT HEX(COLUMN) REGEXP '^([0-7][0-9A-F])*$';
This was the most comprehensive query I could come up with.
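The hex trick works because every ASCII byte, written as two hex digits, has a first digit between 0 and 7. A Python sketch of the same check, assuming the text is stored as UTF-8 (`is_pure_ascii` is a made-up name):

```python
import re

# Every byte must render as a hex pair whose first digit is 0-7 (value <= 0x7F).
ASCII_HEX = re.compile(r"^([0-7][0-9A-F])*$")

def is_pure_ascii(text: str) -> bool:
    hex_bytes = text.encode("utf-8").hex().upper()  # comparable to HEX(column)
    return ASCII_HEX.match(hex_bytes) is not None

for sample in ("abc\r\n", "na\u00efve", ""):
    print(repr(sample), is_pure_ascii(sample))
```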
It depends exactly what you're defining as "ASCII", but I would suggest trying a variant of a query like this:
SELECT * FROM tableName WHERE columnToCheck REGEXP '[^A-Za-z0-9]';
That query will return all rows where columnToCheck contains any non-alphanumeric characters. If you have other characters that are acceptable, add them to the character class in the regular expression. For example, if periods, commas, and hyphens are OK, change the query to:
SELECT * FROM tableName WHERE columnToCheck REGEXP '[^A-Za-z0-9.,-]';
The most relevant page of the MySQL documentation is probably 12.5.2 Regular Expressions.
This is probably what you're looking for:
select * from TABLE where COLUMN regexp '[^ -~]';
It should return all rows where COLUMN contains non-ASCII characters (or non-printable ASCII characters such as newline).
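The class [^ -~] means "anything outside the range from space (0x20) to tilde (0x7E)", which is the printable ASCII range. A short Python sketch of what such a pattern flags (Python's `re`, not MySQL; the names are illustrative):

```python
import re

# Matches any single character that is not printable ASCII (space through tilde).
OUTSIDE_PRINTABLE_ASCII = re.compile(r"[^ -~]")

def is_flagged(text: str) -> bool:
    return OUTSIDE_PRINTABLE_ASCII.search(text) is not None

for sample in ("Hello, world! ~", "em\u2014dash", "hidden\r\n"):
    print(repr(sample), is_flagged(sample))
```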
One missing character from everyone's examples above is the termination character (\0). This is invisible to the MySQL console output and is not discoverable by any of the queries heretofore mentioned. The query to find it is simply:
select * from TABLE where COLUMN like '%\0%';
Based on the correct answer, but taking into account ASCII control characters as well, the solution that worked for me is this:
SELECT * FROM `table` WHERE NOT `field` REGEXP "[\\x00-\\xFF]|^$";
It does the same thing: searches for violations of the ASCII range in a column, but lets you search for control characters too, since it uses hexadecimal notation for code points. Since there is no comparison or conversion (unlike @Ollie's answer), this should be significantly faster, too. (Especially if MySQL does early-termination on the regex query, which it definitely should.)
It also avoids returning fields that are zero-length. If you want a slightly-longer version that might perform better, you can use this instead:
SELECT * FROM `table` WHERE `field` <> "" AND NOT `field` REGEXP "[\\x00-\\xFF]";
It does a separate check for length to avoid zero-length results, without considering them for a regex pass. Depending on the number of zero-length entries you have, this could be significantly faster.
Note that if your default character set is something bizarre where 0x00-0xFF don't map to the same values as ASCII (is there such a character set in existence anywhere?), this would return a false positive. Otherwise, enjoy!
Try using this query to find records that contain special characters:
SELECT *
FROM tableName
WHERE fieldName REGEXP '[^a-zA-Z0-9#:. \'\-`,\&]'
@zende's answer was the only one that covered columns with a mix of ASCII and non-ASCII characters, but it also had that problematic hex thing. I used this:
SELECT * FROM `table` WHERE NOT `column` REGEXP '^[ -~]+$' AND `column` !=''
In Oracle, we can use the query below.
SELECT * FROM TABLE_A WHERE ASCIISTR(COLUMN_A) <> COLUMN_A;
For this question, we can also use this method.
Question from SQLZoo:
Find all details of the prize won by PETER GRÜNBERG
Non-ASCII characters
ans: select * from nobel where winner like 'P% GR%_%berg';