I am trying to find records whose names contain non-alphanumeric characters.
I thought that I could do it with REGEXP
http://dev.mysql.com/doc/refman/5.1/en/regexp.html
Then I referred to another SO question:
How can I find non-ASCII characters in MySQL?
I found I could use this query :
SELECT * FROM tableName WHERE NOT columnToCheck REGEXP '[A-Za-z0-9]';
But it returns zero rows. If I change the query to:
SELECT * FROM tableName WHERE columnToCheck REGEXP '[A-Za-z0-9]';
it returns all the rows!
I tried some basic commands :
SELECT 'justffalnums' REGEXP '[[:alnum:]]'; returns 1
which is correct but
SELECT 'justff?alnums ' REGEXP '[[:alnum:]]'; also returns 1
I don't understand why it returns 1. It should return 0, since the string contains a space and also a '?'.
Is there anything that needs to be enabled in MySQL for REGEXP to work?
I am using MySQL 5.0 and tried with 5.1 too.
You need to add ^ (start of string) and $ (end of string), as well as a quantifier saying how many alphanumeric characters to allow. Below I used +, which means one or more.
SELECT 'justff?alnums ' REGEXP '^[[:alnum:]]+$';
-- only contains alphanumerics => 0
SELECT 'justff?alnums ' REGEXP '^[[:alnum:]]+';
-- just begins with alphanum => 1
SELECT 'justff?alnums ' REGEXP '[[:alnum:]]+$';
-- just ends with alphanum => 0
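The same three cases can be checked outside MySQL. This is a minimal Python sketch, not MySQL itself: it assumes Python's re.search behaves like REGEXP for these simple patterns (a match anywhere in the string unless the pattern is anchored), and it uses [A-Za-z0-9] because Python's re module does not support POSIX classes such as [[:alnum:]]. The helper name regexp_like is made up for illustration.

```python
import re

def regexp_like(value: str, pattern: str) -> int:
    """Return 1 if pattern matches anywhere in value, else 0 (mimics REGEXP)."""
    return 1 if re.search(pattern, value) else 0

sample = "justff?alnums "  # contains a '?' and a trailing space

print(regexp_like(sample, r"[A-Za-z0-9]"))     # unanchored: one alphanumeric anywhere is enough
print(regexp_like(sample, r"^[A-Za-z0-9]+$"))  # the whole string must be alphanumeric
print(regexp_like(sample, r"^[A-Za-z0-9]+"))   # only the start must be alphanumeric
print(regexp_like(sample, r"[A-Za-z0-9]+$"))   # only the end must be alphanumeric
```

Only the fully anchored pattern rejects the string because of the '?' in the middle; the last one rejects it because of the trailing space.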
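The regex that you've given does not require the entire field to consist of the characters in question; it matches as soon as one such character appears anywhere in the value. To find values that contain a character outside the set, use the negation character ^ at the beginning of the character set.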
SELECT 'justff?alnums' REGEXP '[^A-Za-z0-9]'; returns 1
SELECT 'justffalnums' REGEXP '[^A-Za-z0-9]'; returns 0
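To see how the negated set answers the original question (finding names that contain a non-alphanumeric character), here is a small Python sketch of the same filter. The list of names and the helper has_non_alnum are invented for illustration; in MySQL the equivalent filter would be WHERE columnToCheck REGEXP '[^A-Za-z0-9]'.

```python
import re

# One character that is NOT a letter or a digit, anywhere in the string.
NON_ALNUM = re.compile(r"[^A-Za-z0-9]")

def has_non_alnum(name: str) -> bool:
    """True if name contains at least one non-alphanumeric character."""
    return NON_ALNUM.search(name) is not None

# Hypothetical sample data standing in for the column values.
names = ["justffalnums", "justff?alnums", "two words", "abc123"]
flagged = [n for n in names if has_non_alnum(n)]
print(flagged)
```

Note that an empty string is not flagged, because there is no character for the negated set to match.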
NOTICE TO THE MODS: DON'T DELETE/ DON'T CLOSE
I asked this question earlier and the mods closed it because they thought it was similar to a question by another user. I have looked at the thread that they referred me to and it doesn't contain the kind of numeric problems that I have. That thread is How do I match an entire string with a regex?
My Question/Issue:
REGEXP is returning a false positive.
SELECT '123456' REGEXP '[0-9]{1,4}' AS Test;
Based on what I've read, the curly-brace part {1,4} means a minimum of 1 occurrence and no more than 4. But in the query above, [0-9] occurs more than 4 times and yet the query returns 1 instead of 0. I've attached a screenshot. What am I missing? Thanks.
Screenshot of the example in Workbench
SELECT '123456' REGEXP '^[0-9]{1,4}$' AS Test;
By "anchoring" the pattern you are asking it to match the entire string. The query above returns 0 because the string has six digits and the limit is 4.
SELECT '123456' REGEXP '^[0-9]{1,}$' AS Test;
This passes because {1,} allows any number of digits, as long as there is at least one.
SELECT 'zzz123456' REGEXP '^[0-9]{1,}$' AS Test; -- Fail
SELECT '123456' REGEXP '^[0-9]*$' AS Test; -- pass
SELECT '' REGEXP '^[0-9]{1,}$' AS Test; -- fail (too short)
SELECT '' REGEXP '^[0-9]+$' AS Test; -- same as {1,}
SELECT 'abc123456def' REGEXP '[0-9]{1,4}' AS Test; -- pass (no anchor)
SELECT 'abc123456def' REGEXP '^[^0-9]+[0-9]{1,4}[^0-9]+$' AS Test; -- fail
SELECT 'abc123456def' REGEXP '[^0-9]*[0-9]+[^0-9]*' AS Test; -- pass
Those last two include [^0-9], which means "any character except 0-9".
Elaboration on ^
At the beginning of the regexp, ^ "anchors" the processing at the beginning: REGEXP "^x" means "starts with x"; REGEXP "x" succeeds if "x" is anywhere in the string.
At the beginning of a "character set", ^ means "not": REGEXP "x[0-9]" looks for x followed immediately by a digit; REGEXP "x[^0-9]" looks for x followed immediately by a character that is not a digit.
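It can help to look at which part of the string the pattern actually matched, which MySQL's REGEXP does not show. The following Python sketch does that for the patterns discussed above. It is an illustration under the assumption that Python's re treats these simple patterns the same way; matched_part is a made-up helper name.

```python
import re

def matched_part(value, pattern):
    """Return the text the pattern matched, or None if there is no match."""
    m = re.search(pattern, value)
    return m.group(0) if m else None

print(matched_part("123456", r"[0-9]{1,4}"))        # the first four digits already satisfy the pattern
print(matched_part("123456", r"^[0-9]{1,4}$"))      # anchored: six digits exceed the limit of 4
print(matched_part("abc123456def", r"[0-9]{1,4}"))  # unanchored match inside a longer string
print(matched_part("xy", r"x[^0-9]"))               # x followed by a non-digit character
print(matched_part("x7", r"x[^0-9]"))               # x followed by a digit: no match
print(matched_part("x", r"x[^0-9]"))                # nothing after x: no match either
```

The first call is the "false positive" from the question: the unanchored pattern is satisfied by the first four digits and simply ignores the rest.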
I have a bunch of URLs, each containing a string in one of these forms:
hotel + 4-digit number: hotel1234
or slash + 4 digits + .html: /1234.html
Is there a regex to extract the 4-digit number (like 1234), using either Python or MySQL?
I'm thinking of something like 'hotel'[0-9][0-9][0-9][0-9].
Thanks!
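You can try REGEXP to find the matching rows. As written, these patterns match only values that are exactly four digits long; remove ^ and $ to match four consecutive digits anywhere in the value. (Table is a reserved word in MySQL, so a placeholder name is used here.)
SELECT * FROM TableName WHERE ColumnName REGEXP '^[0-9]{4}$';
or
SELECT * FROM TableName WHERE ColumnName REGEXP '^[[:digit:]]{4}$';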
The following stackoverflow.com question might also be useful:
How to extract a substring from inside a string in Python?
Unfortunately, MySQL's REGEXP only reports whether the pattern matches (1 or 0); it does not return the matched text. I have found substring_index useful if you know the text surrounding the target:
select case when ColumnName like 'hotel____' then substring_index(ColumnName,'hotel',-1)
when ColumnName like '/____.html' then substring_index(substring_index(ColumnName,'/',-1),'.html',1)
else ColumnName
end digit_extraction
from TableName
where ...;
The CASE expression above isn't strictly necessary, because substring_index returns the entire string when the delimiter isn't found:
select substring_index(substring_index(substring_index(ColumnName,'hotel',-1),'/',-1),'.html',1)
from TableName
where ...;
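Since the question also allows Python, here is a minimal sketch of the extraction with the re module. The URLs in the usage lines are invented examples, and extract_id is a hypothetical helper name; the pattern assumes the number is exactly four digits, as described in the question.

```python
import re

# Either "hotel" followed by exactly four digits (and no fifth digit),
# or "/" followed by four digits and ".html".
FOUR_DIGITS = re.compile(r"hotel([0-9]{4})(?![0-9])|/([0-9]{4})\.html")

def extract_id(url: str):
    """Return the four-digit number as a string, or None if neither form is present."""
    m = FOUR_DIGITS.search(url)
    if m is None:
        return None
    # Exactly one of the two groups is filled, depending on which form matched.
    return m.group(1) or m.group(2)

# Hypothetical URLs for illustration only.
print(extract_id("http://example.com/hotel1234"))
print(extract_id("http://example.com/rooms/1234.html"))
print(extract_id("http://example.com/about"))
```

Unlike the substring_index approach, this returns None for strings that contain neither form instead of returning the input unchanged.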
How do I remove all superfluous full-stop . and semi-colon ; characters from end of last name field values in SQL?
One way to check if the last character is a "full stop" or "semicolon" is to use a substring function to get the last character, and compare that to the characters you are looking for. (There are several ways to do this, for example, using the LIKE or REGEXP operator.)
If that last character matches, then lop off that last character. One way to do that is to use a substring function. (Use the CHAR_LENGTH function to return the number of characters in the string.)
For example, something like this:
UPDATE mytable t
SET t.last_name = SUBSTR(t.last_name,1,CHAR_LENGTH(t.last_name)-1)
WHERE SUBSTRING(t.last_name,CHAR_LENGTH(t.last_name),1) IN ('.',';')
But I'd strongly recommend that you test those expressions with a SELECT statement before running the UPDATE statement:
SELECT t.last_name AS old_val
, SUBSTR(t.last_name,1,CHAR_LENGTH(t.last_name)-1) AS new_val
FROM mytable t
WHERE SUBSTRING(t.last_name,CHAR_LENGTH(t.last_name),1) IN ('.',';')
Trim the last character from rows that end with a semicolon or a dot:
update emp
set ename = substring(ename, 1, char_length(ename) - 1)
where ename REGEXP '[.;]$';
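For comparison, the same clean-up can be sketched in Python, for example to check values before touching the table. This is an illustration rather than a replacement for the UPDATE: the helper name strip_trailing_punct and the sample names are made up, and unlike the statements above (which remove one character per run) the + in the pattern removes a whole run of trailing dots and semicolons at once.

```python
import re

def strip_trailing_punct(last_name: str) -> str:
    """Remove any run of '.' or ';' characters at the end of the string."""
    return re.sub(r"[.;]+$", "", last_name)

# Hypothetical sample values for illustration only.
for name in ["Smith.", "Jones;.", "O.Neil", "Lee"]:
    print(repr(name), "->", repr(strip_trailing_punct(name)))
```

Dots or semicolons in the middle of a name are left alone, because the pattern is anchored to the end of the string with $.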
I have this SQL condition that is supposed to retrieve all rows that satisfy the given regexp condition:
country REGEXP ('^(USA|Italy|France)$')
However, I need to add a pattern for retrieving all blank country values. Currently I am using this condition
country REGEXP ('^(USA|Italy|France)$') OR country = ""
How can I achieve the same effect without having to include the OR clause?
Thanks,
Erwin
This should work:
country REGEXP ('^(USA|Italy|France|)$')
However from a performance point of view, you may want to use the IN syntax
country IN ('USA','Italy','France', '')
The latter should be faster, as REGEXP can be quite slow.
There's no reason you can't use the $ (match end of string) to fill in your "empty subexpression" issue...
It looks a little weird but country REGEXP ('^(USA|Italy|France|$)$') will actually work
You could try:
country REGEXP ('^(USA|Italy|France|)$')
I just added another | after France, which should basically tell it to also match ^$, which is the same as country = ''.
Update: since this method doesn't work, I would recommend you use this regex:
country REGEXP ('^(USA|Italy|France)$|^$')
Note that you can't use the regex: ^(USA|Italy|France|.{0})$ because it will complain that there is an empty sub expression. Although ^(USA|Italy|France)$|^.{0}$ would work.
Here are some examples of the return value of this regex:
select '' regexp '^(USA|Italy|France)$|^$'
> 1
select 'abc' regexp '^(USA|Italy|France)$|^$'
> 0
select 'France' regexp '^(USA|Italy|France)$|^$'
> 1
select ' ' regexp '^(USA|Italy|France)$|^$'
> 0
As you can see, it returns exactly what you want.
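The four results above can be reproduced with a short Python sketch. It assumes Python's re handles this pattern the same way as MySQL's REGEXP for these inputs; the helper name country_matches is invented for illustration.

```python
import re

# One of the three country names as the whole value, or an empty value.
COUNTRY_OR_EMPTY = re.compile(r"^(USA|Italy|France)$|^$")

def country_matches(value: str) -> int:
    """Return 1 if value is USA, Italy, France or the empty string, else 0."""
    return 1 if COUNTRY_OR_EMPTY.search(value) else 0

for value in ["", "abc", "France", " "]:
    print(repr(value), country_matches(value))
```

A value consisting of a single space is rejected because ^$ only matches a string with no characters at all.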
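If you want to treat all blank values the same (e.g. 0 spaces and 5 spaces both count as blank), use the POSIX whitespace class. The \s shorthand is not available in MySQL 5.0/5.1 regular expressions, and '\s' inside a MySQL string literal is read as a plain s:
country REGEXP ('^(USA|Italy|France|[[:space:]]*)$')
This will cause the last row in the previous example to behave differently, i.e.:
select ' ' regexp '^(USA|Italy|France|[[:space:]]*)$'
> 1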
I need a MySQL query w/ Regex to tell me if my string's first character is a number from 0 to 9.
The following query returns '1', since the REGEXP matches. You can adapt it for your purposes:
SELECT '123 this starts with a digit' REGEXP '^[[:digit:]]';
You can use it in a SELECT like this:
SELECT * FROM tbl WHERE field REGEXP '^[[:digit:]]';
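If the check ever has to happen in application code instead of the query, the same test can be sketched in Python. This is an illustration with a made-up helper name, starts_with_digit; re.match only matches at the start of the string, so it plays the role of the ^ anchor.

```python
import re

def starts_with_digit(value: str) -> bool:
    """True if the first character of value is a digit from 0 to 9."""
    return re.match(r"[0-9]", value) is not None

print(starts_with_digit("123 this starts with a digit"))
print(starts_with_digit("a12"))
print(starts_with_digit(""))
```

An empty string returns False, since there is no first character to test.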
Use this:
SELECT 'a12' REGEXP '^[0-9]';
=> 0
SELECT '4ab' REGEXP '^[0-9]';
=> 1