Issue with RegExp matching a phone number format in query - mysql

I am trying to write a RegExp for MySQL that will only return the following 3 phone formats:
+1 000-000-0000 x0000
+1 000-000-0000
000-000-0000 x0000
000-000-0000
So basically:
+[any number of digits][space][any three digits]-[any three digits]-[any four digits][space][x][any number of digits]
The country code and the extension are optional. I am new to these but I would think this should return at least a number including both country code and extension options, but I get 0 results when I execute it.
\x2B[0-9]*\x20[0-9]{3}\-[0-9]{3}\-[0-9]{4}\x20x[0-9]+|
\x2B[0-9]*\x20[0-9]{3}\-[0-9]{3}\-[0-9]{4}|
[0-9]{3}\-[0-9]{3}\-[0-9]{4}
Can someone tell me why I am getting 0 results even though I have records like +1 555-555-5555 x5555 in my db. Also what is the syntax to make the country code and the extension are optional
Please note I am using [0-9] because I am querying a text field and \d didn't seem to return anything even when my criteria was something simple like \d*

So, as a joint effort, turns out the answer is here:
SELECT * FROM table
WHERE field
REGEXP '(^[+][0-9]+\ )?([0-9]{3}\-[0-9]{3}\-[0-9]{4})(\ x[0-9]+$)?'
First was the fact that character codes in mysql (x20, x2B and the like) are not allowed. Next important step was the use of parenthesis and the "?" token to make the different sections optional.
============
I think the issue is the lack of parenthesis to define the subexpressions.
This seems to work out for me, though it doesn't look real pretty:
SELECT '+1 000-000-0000 x0000' REGEXP '^(([+][0-9]*\[ ][0-9]{3}\-[0-9]{3}\-[0-9]{4}[ ]x[0-9]+)|())$'
as does
SELECT '' REGEXP '^(([+][0-9]*\[ ][0-9]{3}\-[0-9]{3}\-[0-9]{4}[ ]x[0-9]+)|())$'
as does (plain phone number):
SELECT '555-555-5555' REGEXP '^(([+][0-9]*\[ ])*[0-9]{3}\-[0-9]{3}\-[0-9]{4}([ ]x[0-9]+)*)|())$'
EDIT: (same regexp as above, but testing against version w/ country code and extension):
SELECT '+1 555-555-5555 x55' REGEXP '^(([+][0-9]*\[ ])*[0-9]{3}\-[0-9]{3}\-[0-9]{4}([ ]x[0-9]+)*)|())$'
When I try yours:
SELECT '+1 000-000-0000 x0000' REGEXP '\x2B[0-9]*\x20[0-9]{3}\-[0-9]{3}\-[0-9]{4}\x20x[0-9]+|'
I get:
1139 - Got error 'empty (sub)expression' from regexp

Related

Why isn't MySQL REGEXP filtering out these values?

So I'm trying to find what "special characters" have been used in my customer names. I'm going through updating this query to find them all one-by-one, but it's still showing all customers with a - despite me trying to exlude that in the query.
Here's the query I'm using:
SELECT * FROM customer WHERE name REGEXP "[^\da-zA-Z\ \.\&\-\(\)\,]+";
This customer (and many others with a dash) are still showing in the query results:
Test-able Software Ltd
What am I missing? Based on that regexp, shouldn't that one be excluded from the query results?
Testing it on https://regex101.com/r/AMOwaj/1 shows there is no match.
Edit - So I want to FIND any which have characters other than the ones in the regex character set. Not exclude any which do have these characters.
Your code checks if the string contains any character that does not belong to the character class, while you want to ensure that none does belong to it.
You can use ^ and $ to check the while string at once:
SELECT * FROM customer WHERE name REGEXP '^[^\da-zA-Z .&\-(),]+$';
This would probably be simpler expressed with NOT, and without negating the character class:
SELECT * FROM customer WHERE name NOT REGEXP '[\da-zA-Z .&\-(),]';
Note that you don't need to escape all the characters within the character class, except probably for -.
Use [0-9] or [[:digit:]] to match digits irrespective of MySQL version.
Use the hyphen where it can't make part of a range construction.
Fix the expression as
SELECT * FROM customer WHERE name REGEXP "[^0-9a-zA-Z .&(),-]+";
If the entire text should match this pattern, enclose with ^ / $:
SELECT * FROM customer WHERE name REGEXP "^[^0-9a-zA-Z .&(),-]+$";
- implies a range except if it is first. (Well, after the "not" (^).)
So use
"[^-0-9a-zA-Z .&(),]"
I removed the + at the end because you don't really care how many; this way it will stop after finding one.

MySql regexp escaping apostrophe(’)

I can't find a proper way to escape apostrophe sign(’) in my mysql query. Regexp I have, works fine with online tools for regexp testing.
Problematic example is the string G’Schlössl.
I want to have optional apostrophe sign in the query in front of the s character G(’?)Schlö(’?)ssl for all the different cases which could occur in other strings. I am not sure if the problem is caused by incorrect sign escaping but I have tried many options like ’?, \’?, \’{0,1} which works for the first occurrence but doesn't for the second optional one and cause query to return nothing. Other possibilities like ’’?, [’]?, [\’]?, [\’]{0,1} does not work even for the first one.
select id, name from restaurant where name regexp '.*g\’?(s|ß|ss|sz)chl(o|ö|oe)\’?s.*';
When I remove the last \’? it works:
select id, name from restaurant where name regexp '.*g\’?(s|ß|ss|sz)chl(o|ö|oe)s.*';
When I replace the last \’? with x? it works as well:
select id, name from restaurant where name regexp '.*g\’?(s|ß|ss|sz)chl(o|ö|oe)x?s.*';
Any ideas where the problem is or what else to try?
This thread explains escaping normal single quote only, which seems not to work in my case.
Instead of \’?, try (’)?. I'm thinking that the ? may apply to only the last byte of ’. By using parentheses instead, the ? applies to the entire 3 bytes (hex E28099) of the "RIGHT SINGLE QUOTATION MARK".

MySQL Find similar strings

I have an InnoDB database table of 12 character codes which often need to be entered by a user.
Occasionally, a user will enter the code incorrectly (for example typing a lower case L instead of a 1 etc).
I'm trying to write a query that will find similar codes to the one they have entered but using LIKE '%code%' gives me way too many results, many of which contain only one matching character.
Is there a way to perform a more detailed check?
Edit - Case sensitive not required.
Any advice appreciated.
Thanks.
Have a look at soundex. Commonly misspelled strings have the same soundex code, so you can query for:
where soundex(Code) like soundex(UserInput)
use without wildcard % for that
SELECT `code` FROM table where code LIKE 'user_input'
thi wil also check the space
SELECT 'a' = 'a ', return 1 whereas SELCET 'a' LIKE 'a ' return 0
reference

Mysql RegExp question selecting from a list of codes

I am trying to match a list of motorcycle models to a series of ebay codes for listing motorcycles in ebay.
So we get a motorcycle model name that will be something like:
XL883C Sportster where the manufacturer is Harley Davidson
I have a list of ebay codes that look like this
MB-100-0 Other
MB-100-1 883
MB-100-2 1000
MB-100-3 1130
MB-100-4 1200
MB-100-5 1340
MB-100-6 1450
MB-100-7 Dyna
MB-100-8 Electra
MB-100-9 FLHR
MB-100-10 FLHT
MB-100-11 FLSTC
MB-100-12 FLSTR
MB-100-13 FXCW
MB-100-14 FXSTB
MB-100-15 Softail
MB-100-16 Sportster
MB-100-17 Touring
MB-100-18 VRSCAW
MB-100-19 VRSCD
MB-100-20 VRSCR
So I want to match the model name against the list above using a regExp pattern.
I have tried the following code:
SELECT modelID FROM tblEbayModelCodes WHERE
LOWER(makeName) = 'harley-davidson' AND fnmodel REGEXP '[883|1000|1130|1200|1340|1450|Dyna|Electra|FLHR|FLHT|FLSTC|FLSTR|FXCW|FXSTB|Softail|Sportster|Touring|VRSCAW|VRSCD|VRSCR].*' LIMIT 1
however when I run the query I would expect the code to match on either MB-100-1 for 883 or MB-100-16 for Sportster but when I run it the query returns MB-100-0 for Other.
I am guessing that I have the pattern incorrect, so can anybody suggest what I might need to do to correct this?
Many thanks
Graham
[chars] matches any of the characters 'c','h','a','r','s'
So by giving it such a long list, it will inevitably match just the first item (single character)
Try this instead
LOWER(makeName) = 'harley-davidson' AND fnmodel REGEXP '(883|1000|1130|1200|1340|1450|Dyna|Electra|FLHR|FLHT|FLSTC|FLSTR|FXCW|FXSTB|Softail|Sportster|Touring|VRSCAW|VRSCD|VRSCR).*' LIMIT 1
You might also consider not using REGEX and using FIND_IN_SET instead.
Not really fully tested, but it should be something like this:
REGEXP '^MB-[0-9]+-[0-9]+[[:space:]]+(883|1000|1130|1200|1340|1450|Dyna|Electra|FLHR|FLHT|FLSTC|FLSTR|FXCW|FXSTB|Softail|Sportster|Touring|VRSCAW|VRSCD|VRSCR)$'
In detail:
^MB- Starts with MB-
[0-9]+ One or more digits
- Dash
[0-9]+ One or more digits
[[:space:]]+ One or more white space
(883|1000|...)$ Ends with one of these
Here's the reference for the regexp dialect spoken by MySQL:
http://dev.mysql.com/doc/refman/5.1/en/regexp.html
Answer to comment:
If you want to match the Sportster row them remove all other conditions. And you may not even need regular expressions:
WHERE fnmodel LIKE '% Sportster'

MySQL query - select postcode matches

I need to make a selection based on the first 2 characters of a field, so for example
SELECT * from table WHERE postcode LIKE 'rh%'
But this would select any record that contains those 2 characters at any point in the "postcode" field right? I am in need of a query that just selects the first 2 characters. Any pointerS?
Thanks
Your query is correct. It searches for postcodes starting with "rh".
In contrast, if you wanted to search for postcodes containing the string "rh" anywhere in the field, you would write:
SELECT * from table WHERE postcode LIKE '%rh%'
Edit:
To answer your comment, you can use either or both % and _ for relatively simple searches. As you have noticed already, % matches any number of characters whereas _ matches a single character.
So, in order to match postcodes starting with "RHx " (where x is any character) your query would be:
SELECT * from table WHERE postcode LIKE 'RH_ %'
(mind the space after _). For more complex search patterns, you need to read about regular expressions.
Further reading:
http://dev.mysql.com/doc/refman/5.1/en/pattern-matching.html
http://dev.mysql.com/doc/refman/5.1/en/regexp.html
LIKE '%rh%' will return all rows with 'rh' anywhere
LIKE 'rh%' will return all rows with 'rh' at the beginning
LIKE '%rh' will return all rows with 'rh' at the end.
If you want to get only first two characters 'rh', use MySQL SUBSTR() function
http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_substr
Dave, your way seems correct to me (and works on my test data). Using a leading % as well will match anywhere in the string which obviously isn't desirable when dealing with postcodes.