MySQL REGEXP: matching blank entries - mysql

I have this SQL condition that is supposed to retrieve all rows that satisfy the given regexp condition:
country REGEXP ('^(USA|Italy|France)$')
However, I need to add a pattern for retrieving all blank country values. Currently I am using this condition
country REGEXP ('^(USA|Italy|France)$') OR country = ""
How can achieve the same effect without having to include the OR clause?
Thanks,
Erwin

This should work:
country REGEXP ('^(USA|Italy|France|)$')
However from a performance point of view, you may want to use the IN syntax
country IN ('USA','Italy','France', '')
The later should be faster as REGEXP can be quite slow.

There's no reason you can't use the $ (match end of string) to fill in your "empty subexpression" issue...
It looks a little weird but country REGEXP ('^(USA|Italy|France|$)$') will actually work

You could try:
country REGEXP ('^(USA|Italy|France|)$')
I just added another | after France, which should would basically tell it to also match ^$ which is the same as country = ''.
Update: since this method doesn't work, I would recommend you use this regex:
country REGEXP ('^(USA|Italy|France)$|^$')
Note that you can't use the regex: ^(USA|Italy|France|.{0})$ because it will complain that there is an empty sub expression. Although ^(USA|Italy|France)$|^.{0}$ would work.
Here are some examples of the return value of this regex:
select '' regexp '^(USA|Italy|France)$|^$'
> 1
select 'abc' regexp '^(USA|Italy|France)$|^$'
> 0
select 'France' regexp '^(USA|Italy|France)$|^$'
> 1
select ' ' regexp '^(USA|Italy|France)$|^$'
> 0
As you can see, it returns exactly what you want.
If you want to treat blank values the same (e.g. 0 spaces and 5 spaces both count as blank), you should use the regex:
country REGEXP ('^(USA|Italy|France|\s*)$')
This will cause the last row in the previous example to behave differently, i.e.:
select ' ' regexp '^(USA|Italy|France|\s*)$'
> 1

Related

How to find variable pattern in MySql with Regex?

I am trying to pull a product code from a long set of string formatted like a URL address. The pattern is always 3 letters followed by 3 or 4 numbers (ex. ???### or ???####). I have tried using REGEXP and LIKE syntax, but my results are off for both/I am not sure which operators to use.
The first select statement is close to trimming the URL to show just the code, but oftentimes will show a random string of numbers it may find in the URL string.
The second select statement is more rudimentary, but I am unsure which operators to use.
Which would be the quickest solution?
SELECT columnName, SUBSTR(columnName, LOCATE(columnName REGEXP "[^=\-][a-zA-Z]{3}[\d]{3,4}", columnName), LENGTH(columnName) - LOCATE(columnName REGEXP "[^=\-][a-zA-Z]{3}[\d]{3,4}", REVERSE(columnName))) AS extractedData FROM tableName
SELECT columnName FROM tableName WHERE columnName LIKE '%___###%' OR columnName LIKE '%___####%'
-- Will take a substring of this result as well
Example Data:
randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz123&hello_world=us&etc_etc
In this case, the desired string is "xyz123" and the location of said pattern is variable based on each entry.
EDIT
SELECT column, LOCATE(column REGEXP "([a-zA-Z]{3}[0-9]{3,4}$)", column), SUBSTR(column, LOCATE(column REGEXP "([a-zA-Z]{3}[0-9]{3,4}$)", column), LENGTH(column) - LOCATE(column REGEXP "^.*[a-zA-Z]{3}[0-9]{3,4}", REVERSE(column))) AS extractData From mainTable
This expression is still not grabbing the right data, but I feel like it may get me closer.
I suggest using
REGEXP_SUBSTR(column, '(?<=[&?]random_code=[^&#]{0,256}-)[a-zA-Z]{3}[0-9]{3,4}(?![^&#])')
Details:
(?<=[&?]random_code=[^&#]{0,256}-) - immediately on the left, there must be & or &, random_code=, and then zero to 256 chars other than & and # followed with a - char
[a-zA-Z]{3} - three ASCII letters
[0-9]{3,4} - three to four ASCII digits
(?![^&#]) - that are followed either with &, # or end of string.
See the online demo:
WITH cte AS ( SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz123&hello_world=us&etc_etc' val
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz4567&hello_world=us&etc_etc'
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz89&hello_world=us&etc_etc'
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz00000&hello_world=us&etc_etc'
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-aaaaa11111&hello_world=us&etc_etc')
SELECT REGEXP_SUBSTR(val,'(?<=[&?]random_code=[^&#]{0,256}-)[a-zA-Z]{3}[0-9]{3,4}(?![^&#])') output
FROM cte
Output:
I'd make use of capture groups:
(?<=[=\-\\])([a-zA-Z]{3}[\d]{3,4})(?=[&])
I assume with [^=\-] you wanted to capture string with "-","\" or "=" in front but not include those chars in the result. To do that use "positive lookbehind" (?<=.
I also added a lookahead (?= for "&".
If you'd like to fidget more with regex I recommend RegExr

MySQL - need to find records without a period in them

I've been to the regexp page on the MySQL website and am having trouble getting the query right. I have a list of links and I want to find invalid links that do not contain a period. Here's my code that doesn't work:
select * from `links` where (url REGEXP '[^\\.]')
It's returning all rows in the entire database. I just want it to show me the rows where 'url' doesn't contain a period. Thanks for your help!
SELECT c1 FROM t1 WHERE c1 NOT LIKE '%.%'
Your regexp matches anything that contains a character that isn't a period. So if it contains foo.bar, the regexp matches the f and succeeds. You can do:
WHERE url REGEXP '^[^.]*$'
The anchors and repetition operator make this check that every character is not a period. Or you can do:
WHERE LOCATE(url, '.') = 0
BTW, you don't need to escape . when it's inside [] in a regexp.
Using regexp seems like an overkill here. A simple like operator would do the trick:
SELECT * FROM `links` WHERE url NOT LIKE '%.%
EDIT:
Having said that, if you really want to negate regexp, just use not regexp:
SELECT * FROM `links` WHERE url NOT REGEXP '[\\.]';

How to use SQL to remove superfluous characters from names?

How do I remove all superfluous full-stop . and semi-colon ; characters from end of last name field values in SQL?
One way to check of the last character is a "full stop" or "semicolon" is to use a substring function to get the last character, and compare that to the characters you are looking for. (There are several ways to do this, for example, using LIKE or REGEXP operator.
If that last character matches, then lop off that last character. One way to do that is to use a substring function. (Use the CHAR_LENGTH function to return the number of characters in the string.)
For example, something like this:
UPDATE mytable t
SET t.last_name = SUBSTR(t.last_name,1,CHAR_LENGTH(t.last_name)-1)
WHERE SUBSTRING(t.last_name,CHAR_LENGTH(t.last_name),1) IN ('.',';')
But, I'd strongly recommend that you test those expressions using a SELECT statement, before running an UPDATE statement.
SELECT t.last_name AS old_val
, SUBSTR(t.last_name,1,CHAR_LENGTH(t.last_name)-1) AS new_val
FROM mytable t
WHERE SUBSTRING(t.last_name,CHAR_LENGTH(t.last_name),1) IN ('.',';')
Substring rows that have a semi-colon or dot :
update emp
set ename = substring(ename, 1, char_length(ename) - 1)
where ename REGEXP '[.;]$';

REGEXP not working in mysql

I am trying to find record with names which have non-alpha numeric characters.
I thought that I could do it with REGEXP
http://dev.mysql.com/doc/refman/5.1/en/regexp.html
Then I referred another SO question
How can I find non-ASCII characters in MySQL?
I found I could use this query :
SELECT * FROM tableName WHERE NOT columnToCheck REGEXP '[A-Za-z0-9]';
But it returns me zero rows . If I replaced the command to :
SELECT * FROM tableName WHERE columnToCheck REGEXP '[A-Za-z0-9]';
It returns me all the rows!!.
I tried some basic commands :
SELECT 'justffalnums' REGEXP '[[:alnum:]]'; returns 1
which is correct but
SELECT 'justff?alnums ' REGEXP '[[:alnum:]]'; also returns 1
I don't understand why it returs one. It should return 0 as it has space and also a '?' .
Is there anything to be enable in mysql for the regexp to work ?
I am using mysql 5.0 and tried with 5.1 too .
You need to add ^ (string begins) and $ (string ends) as well as an operator saying a certain number of alphanum's to use. Below I used + which means one or more.
SELECT 'justff?alnums ' REGEXP '^[[:alnum:]]+$';
-- only contains alphanumns => 0
SELECT 'justff?alnums ' REGEXP '^[[:alnum:]]+';
-- just begins with alphanum => 1
SELECT 'justff?alnums ' REGEXP '[[:alnum:]]+$';
-- just ends with alphanum => 0
The regex that you've given does not say that the entire field has to contain the characters in question. You can use the negation character ^ at the beginning of a character set.
SELECT 'justff?alnums' REGEXP '[^A-Za-z0-9]'; returns 1
SELECT 'justffalnums' REGEXP '[^A-Za-z0-9]'; returns 0

MySQL regex query case insensitive

In my table I have firstname and last name. Few names are upper case ( ABRAHAM ), few names are lower case (abraham), few names are character starting with ucword (Abraham).
So when i am doing the where condition using REGEXP '^[abc]', I am not getting proper records. How to change the names to lower case and use SELECT QUERY.
SELECT * FROM `test_tbl` WHERE cus_name REGEXP '^[abc]';
This is my query, works fine if the records are lower case, but my records are intermediate ,my all cus name are not lower case , all the names are like ucword.
So for this above query am not getting proper records display.
I think you should query your database making sure that the names are lowered, suppose that name is the name you whish to find out, and in your application you've lowered it like 'abraham', now your query should be like this:
SELECT * FROM `test_tbl` WHERE LOWER(cus_name) = name
Since i dont know what language you use, I've just placed name, but make sure that this is lowered and you should retrieve Abraham, ABRAHAM or any variation of the name!
Hepe it helps!
Have you tried:
SELECT * FROM `test_tbl` WHERE LOWER(cus_name) REGEXP '^[abc]';
I don't know since when, but nowadays MySql REGEXP is case insensitive.
https://dev.mysql.com/doc/refman/5.7/en/pattern-matching.html
You don't need regexp to search for names starting with a specific string or character.
SELECT * FROM `test_tbl` WHERE cus_name LIKE 'abc%' ;
% is wildcard char. The search is case insensitive unless you set the binary attribute for column cus_name or you use the binary operator
SELECT * FROM `test_tbl` WHERE BINARY cus_name LIKE 'abc%' ;
A few valid options already presented, but here's one more with just regex:
SELECT * FROM `test_tbl` WHERE cus_name REGEXP '^[abcABC]';