MySQL REGEXP not matching string - mysql

I have a table of messages. I am trying to find messages in the table that have an ID code which complies with a specific format. The regexp that I have below was written for matching these values in PHP, but I want to move it to a MySQL query.
It is looking for a specific format of an identifier code that looks like this:
[692370613-3CUWU]
The code has a consistent format:
starts and ends with hard brackets [ ]
two components inside,
first is an account number, min 9 digits, but could be higher
second component is an alphanumeric code, 5 characters long, which can include the digits 1-9 and capital letters excluding "O"
the complete code can occur anywhere in the message
I have a query that reads:
SELECT * FROM messages
WHERE
msgBody REGEXP '\\[(\d){9,}-([A-NP-Z1-9]){5}\\]'
OR
msgSubject REGEXP '\\[(\d){9,}-([A-NP-Z1-9]){5}\\]'
I created a test row in the table which has only the sample value above in the msgBody field for testing - but it does not return any results.
I am guessing that I am missing something in the conversion of PHP style regex vs. MySQL.
Help is greatly appreciated.
Thank you!

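Instead of \d, try [[:digit:]] or [0-9] (used below). Before MySQL 8.0 the regex engine has no \d shorthand, and inside a quoted string a lone backslash in front of d is dropped anyway, so the pattern ends up looking for a literal d.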

SELECT * FROM messages
WHERE
msgBody REGEXP '\\[([0-9]){9,}-([A-NP-Z1-9]){5}\\]'
OR
msgSubject REGEXP '\\[([0-9]){9,}-([A-NP-Z1-9]){5}\\]'
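To sanity-check the pattern logic outside the database, here is a minimal Python sketch. It assumes Python's re module as a stand-in (it is not MySQL's regex engine, so it only confirms that the character classes and quantifiers behave as described), and the helper name find_id_code is made up for illustration.

```python
import re

# Same pattern as the query above, written for Python's re module.
# Python is only a stand-in here; MySQL's engine and string escaping differ.
ID_CODE = re.compile(r"\[[0-9]{9,}-[A-NP-Z1-9]{5}\]")


def find_id_code(text):
    """Return the first bracketed ID code found in text, or None."""
    match = ID_CODE.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    samples = (
        "[692370613-3CUWU]",                    # code on its own
        "re: order [692370613-3CUWU] shipped",  # code inside a longer message
        "[12345-3CUWU]",                        # account number too short
    )
    for sample in samples:
        print(sample, "->", find_id_code(sample))
```

Once the pattern behaves as expected here, the remaining differences to watch for in MySQL are the doubled backslashes in the string literal and the collation-dependent case handling.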

Related

Why isn't MySQL REGEXP filtering out these values?

So I'm trying to find what "special characters" have been used in my customer names. I'm going through updating this query to find them all one-by-one, but it's still showing all customers with a - despite me trying to exclude that in the query.
Here's the query I'm using:
SELECT * FROM customer WHERE name REGEXP "[^\da-zA-Z\ \.\&\-\(\)\,]+";
This customer (and many others with a dash) are still showing in the query results:
Test-able Software Ltd
What am I missing? Based on that regexp, shouldn't that one be excluded from the query results?
Testing it on https://regex101.com/r/AMOwaj/1 shows there is no match.
Edit - So I want to FIND any which have characters other than the ones in the regex character set. Not exclude any which do have these characters.
Your code checks if the string contains any character that does not belong to the character class, while you want to ensure that none does belong to it.
You can use ^ and $ to check the whole string at once:
SELECT * FROM customer WHERE name REGEXP '^[^0-9a-zA-Z .&(),-]+$';
This would probably be simpler expressed with NOT, and without negating the character class:
SELECT * FROM customer WHERE name NOT REGEXP '[0-9a-zA-Z .&(),-]';
Note that you don't need to escape the characters within the character class. A single backslash is dropped by MySQL's string parser anyway, so \d becomes a plain d and \- no longer protects the hyphen; put the - where it cannot form a range, for example at the end.
Use [0-9] or [[:digit:]] to match digits irrespective of MySQL version.
Use the hyphen where it can't make part of a range construction.
Fix the expression as
SELECT * FROM customer WHERE name REGEXP "[^0-9a-zA-Z .&(),-]+";
If the entire text should match this pattern, enclose with ^ / $:
SELECT * FROM customer WHERE name REGEXP "^[^0-9a-zA-Z .&(),-]+$";
A - implies a range unless it comes first in the class (or, in a negated class, right after the ^).
So use
"[^-0-9a-zA-Z .&(),]"
I removed the + at the end because you don't really care how many; this way it will stop after finding one.
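Why the original class let the dash through can be reproduced outside MySQL. The sketch below assumes MySQL's default string-literal handling, where a backslash before an ordinary character is dropped, so the engine would receive [^da-zA-Z .&-(),]+. Python's re is used only as a stand-in, and the names AS_RECEIVED, CORRECTED and has_unexpected_chars are illustrative.

```python
import re

# What the regex engine likely receives once MySQL has dropped the
# backslashes from the original string literal (assumption: default SQL mode).
# "\d" has become a plain "d", and "&-(" now reads as a range from & to (.
AS_RECEIVED = re.compile(r"[^da-zA-Z .&-(),]+")

# Corrected class: explicit digits, hyphen moved to the end of the class.
CORRECTED = re.compile(r"[^0-9a-zA-Z .&(),-]+")


def has_unexpected_chars(pattern, name):
    """Return True if name contains a character outside the allowed set."""
    return pattern.search(name) is not None


if __name__ == "__main__":
    name = "Test-able Software Ltd"
    print("as received:", has_unexpected_chars(AS_RECEIVED, name))
    print("corrected:  ", has_unexpected_chars(CORRECTED, name))
```

With the as-received class neither the hyphen nor any digit is in the allowed set, which is consistent with dashed names showing up in the results.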

SQL conditional: using Regex formatter for "Like"

I have a record in a database like this: 1K-05, in a column called "DocXmtlNum"
The SQL statement to try to get it is like this:
"SELECT DISTINCT DocXmtlNum FROM table1 WHERE DocXmtlNum Like '#?[A-Z]*' ORDER BY DocXmtlNum Desc"
However, it does not grab any records. I am assuming that the "#?[A-Z]*" part is saying that it wants to get records that start with a number, followed by a letter, followed by any other characters. What's wrong with this? How would I write the regular expression to get a record that is a number followed by a letter, and followed by any character?
Note: The SQL statement was auto translated from VB6 to vb.net4, so there were errors introduced.
Is this what you want?
WHERE DocXmtlNum REGEXP '^[0-9]?[A-Z]-.+$'
This checks for:
An optional digit
A letter
A hyphen
At least one more character
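The four checks listed above can be tried out quickly in Python. This is only a sketch: Python's re stands in for MySQL's engine, Python matching is case-sensitive whereas MySQL's REGEXP follows the column's collation, and the function name looks_like_doc_number is hypothetical.

```python
import re

# Optional digit, one capital letter, a hyphen, then at least one more character.
DOC_NUM = re.compile(r"^[0-9]?[A-Z]-.+$")


def looks_like_doc_number(value):
    """Return True if value has the shape digit?-letter-hyphen-rest."""
    return DOC_NUM.match(value) is not None


if __name__ == "__main__":
    for value in ("1K-05", "K-05", "12K-05", "1K-"):
        print(value, "->", looks_like_doc_number(value))
```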

Isolate an email address from a string using MySQL

I am trying to isolate an email address from a block of free field text (column name is TEXT).
There are many different variations of preceding and succeeding characters in the free text field, i.e.:
email me! john@smith.com
e:john@smith.com m:555-555-5555
john@smith.com--personal email
I've tried variations of INSTR() and SUBSTRING_INDEX() to first isolate the "@" (probably the one reliable constant in finding an email...), extract the characters to its left (up until a space or non-qualifying character like "-" or ":"), and then do the same with the text following the @.
However, everything I've tried so far hasn't filtered out the noise to the level I need.
Obviously 100% accuracy isn't possible, but would someone mind taking a crack at how I can structure my select statement?
There is no easy way to do this within MySQL. However, you can do it easily with regular expressions after you have retrieved the text.
Here would be an example of how to use it in your case: Regex example
If you want it to select all e-mail addresses from one string: Regex Example
You can use a regex in MySQL to select the rows that contain an e-mail address, but that still doesn't extract the address from the string. That has to be done outside MySQL.
SELECT * FROM table
WHERE column RLIKE '\w*@\w*.\w*'
RLIKE only matches; you can also use REGEXP in the SELECT list, but it only returns 1 or 0 depending on whether it found a match.
If you do want to extract it in MySQL, maybe this other stackoverflow post helps you out. But it seems like a lot of work compared with doing it outside MySQL.
In MySQL 8.0 and later you can use REGEXP_SUBSTR to isolate just the email from a block of free text (the function does not exist in MySQL 5.x). The backslash before the dot is doubled so that it survives MySQL's string-literal escaping.
SELECT *, REGEXP_SUBSTR(`TEXT`, '([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+)\\.([a-zA-Z]{2,4})') AS Emails FROM `mytable`;
If you want to get just the records with emails and remove duplicates ...
SELECT DISTINCT REGEXP_SUBSTR(`TEXT`, '([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+)\\.([a-zA-Z]{2,4})') AS Emails FROM `mytable` WHERE `TEXT` REGEXP '([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+)\\.([a-zA-Z]{2,4})';
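For the route of extracting the address after the rows have been fetched, a minimal Python sketch could look like this. It assumes addresses use the usual @ separator, the pattern is modelled on the one passed to REGEXP_SUBSTR above and is deliberately loose, and the function name extract_emails and the sample strings are made up for illustration.

```python
import re

# Deliberately loose pattern: local part, "@", domain, a dot, and a
# 2-4 letter top-level domain. Real address validation is more involved.
EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}")


def extract_emails(text):
    """Return every substring of text that looks like an email address."""
    return EMAIL.findall(text)


if __name__ == "__main__":
    rows = (
        "email me! john@smith.com",
        "e:john@smith.com m:555-555-5555",
        "john@smith.com--personal email",
    )
    for row in rows:
        print(row, "->", extract_emails(row))
```

Because findall returns every match, the same helper covers the case of several addresses in one field.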

SQL Select Statement(REGEXP) to find special characters and numbers in an alpha only field

I am using MySQL to query a field called LastName. I am looking for any errors in the field, such as special characters or numbers. I am not terribly familiar with SQL, so this has been a challenge so far. I have written simple statements with REGEXP but have run into some issues. The REGEXP I was using was:
SELECT LastName FROM `DB`.`PLANNAME` where LastName REGEXP '^([0-9])'
This turned up results where a number was the first character in the string, and I realized it would not pick out a string that starts with a letter and has a number or special character somewhere in the middle.
To be clear, I just need to find the errors, not write code to clean them out.
Any help would be greatly appreciated
Thanks
Pete
Something like this should do it for you.
SELECT column FROM table WHERE column REGEXP '[^A-Za-z]'
This will return any rows that contain a character other than a letter. You might want to add a space and ' to the class, for O'Briens, von Lansing, etc. Any characters you think are acceptable should go in the character class [], see http://www.regular-expressions.info/charclass.html.
Demo: https://regex101.com/r/nC9cG7/1
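As a rough illustration of the negated character class with the extra allowed characters added, here is a small Python sketch. Python's re is only a stand-in for MySQL's engine, the hyphen in the allowed set is an extra assumption for double-barrelled names, and suspicious_last_names is a hypothetical helper.

```python
import re

# Anything that is NOT a letter, apostrophe, space or hyphen counts as an error.
# The hyphen sits at the end of the class so it cannot form a range.
NOT_ALLOWED = re.compile(r"[^A-Za-z' -]")


def suspicious_last_names(names):
    """Return the names containing at least one disallowed character."""
    return [name for name in names if NOT_ALLOWED.search(name)]


if __name__ == "__main__":
    sample = ["O'Brien", "von Lansing", "Sm1th", "Smith-Jones", "Pete!"]
    print(suspicious_last_names(sample))
```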
Maybe you are looking for something like this:
SELECT LastName FROM `DB`.`PLANNAME` WHERE NOT LastName REGEXP '[A-Za-z0-9]';
Here is the documentation on this:
Table 12.9 String Regular Expression Operators

Issue with RegExp matching a phone number format in query

I am trying to write a RegExp for MySQL that will only return the following 4 phone formats:
+1 000-000-0000 x0000
+1 000-000-0000
000-000-0000 x0000
000-000-0000
So basically:
+[any number of digits][space][any three digits]-[any three digits]-[any four digits][space][x][any number of digits]
The country code and the extension are optional. I am new to these but I would think this should return at least a number including both country code and extension options, but I get 0 results when I execute it.
\x2B[0-9]*\x20[0-9]{3}\-[0-9]{3}\-[0-9]{4}\x20x[0-9]+|
\x2B[0-9]*\x20[0-9]{3}\-[0-9]{3}\-[0-9]{4}|
[0-9]{3}\-[0-9]{3}\-[0-9]{4}
Can someone tell me why I am getting 0 results even though I have records like +1 555-555-5555 x5555 in my db? Also, what is the syntax to make the country code and the extension optional?
Please note I am using [0-9] because I am querying a text field and \d didn't seem to return anything even when my criteria was something simple like \d*
So, as a joint effort, turns out the answer is here:
SELECT * FROM table
WHERE field
REGEXP '(^[+][0-9]+\ )?([0-9]{3}\-[0-9]{3}\-[0-9]{4})(\ x[0-9]+$)?'
First was the fact that hex character escapes in the pattern (\x20, \x2B and the like) are not allowed in MySQL. The next important step was the use of parentheses and the "?" token to make the different sections optional.
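One thing to note about the pattern above is that ^ and $ sit inside the optional groups, so it can also match a number embedded in longer text. The Python sketch below anchors the whole expression instead, which is an assumption about the intent ("only return" the listed formats). Python's re is a stand-in for MySQL's engine and is_valid_phone is a hypothetical name.

```python
import re

# Anchored variant: optional "+<digits> " prefix, the 3-3-4 number,
# and an optional " x<digits>" extension, with nothing else allowed.
PHONE = re.compile(r"^([+][0-9]+ )?[0-9]{3}-[0-9]{3}-[0-9]{4}( x[0-9]+)?$")


def is_valid_phone(value):
    """Return True if value is exactly one of the four accepted formats."""
    return PHONE.match(value) is not None


if __name__ == "__main__":
    for value in ("+1 000-000-0000 x0000", "000-000-0000", "call 000-000-0000"):
        print(value, "->", is_valid_phone(value))
```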
============
I think the issue is the lack of parentheses to define the subexpressions.
This seems to work out for me, though it doesn't look real pretty:
SELECT '+1 000-000-0000 x0000' REGEXP '^(([+][0-9]*\[ ][0-9]{3}\-[0-9]{3}\-[0-9]{4}[ ]x[0-9]+)|())$'
as does
SELECT '' REGEXP '^(([+][0-9]*\[ ][0-9]{3}\-[0-9]{3}\-[0-9]{4}[ ]x[0-9]+)|())$'
as does (plain phone number):
SELECT '555-555-5555' REGEXP '^((([+][0-9]*\[ ])*[0-9]{3}\-[0-9]{3}\-[0-9]{4}([ ]x[0-9]+)*)|())$'
EDIT: (same regexp as above, but testing against version w/ country code and extension):
SELECT '+1 555-555-5555 x55' REGEXP '^((([+][0-9]*\[ ])*[0-9]{3}\-[0-9]{3}\-[0-9]{4}([ ]x[0-9]+)*)|())$'
When I try yours:
SELECT '+1 000-000-0000 x0000' REGEXP '\x2B[0-9]*\x20[0-9]{3}\-[0-9]{3}\-[0-9]{4}\x20x[0-9]+|'
I get:
1139 - Got error 'empty (sub)expression' from regexp