In MySQL (I'm using 5.1.48), the following regular expressions return true, i.e. 1.
SELECT '10-5' REGEXP '10-5' as temp;
SELECT '10/5' REGEXP '10/5' as temp;
SELECT '1*5' REGEXP '1*5' as temp;
The following expressions, however, return false, i.e. 0.
SELECT '10+5' REGEXP '10+5' as temp;
SELECT '10*5' REGEXP '10*5' as temp;
The documentation says:
To use a literal instance of a special character in a regular expression, precede it by two backslash (\) characters. The MySQL parser interprets one of the backslashes, and the regular expression library interprets the other.
Escaping + and * in the preceding two statements makes them return true, i.e. 1, as follows.
SELECT '10+5' REGEXP '10\\+5' as temp;
SELECT '10*5' REGEXP '10\\*5' as temp;
If this is the case, then why does * in the following statement (the last one in the first snippet) not need to be escaped?
SELECT '1*5' REGEXP '1*5' as temp;
It returns true, i.e. 1, without escaping *, while the following similar statement (the last one in the second snippet) returns false.
SELECT '10*5' REGEXP '10*5' as temp;
It requires * to be escaped. Why?
An unescaped asterisk, as you know, means "zero or more of the preceding character", so "1*5" means "any number of 1s, followed by a 5".
The key is this info from the doc:
A REGEXP pattern match succeeds if the pattern matches anywhere in the value being tested. (This differs from a LIKE pattern match, which succeeds only if the pattern matches the entire value.)
So, "1*5" ("any number of 1s, followed by a 5") will match the string "1*5" by only seeing the "5". "10*5" ("1, followed by any number of 0s, followed by a 5") won't match the string "10*5" because the "*" character breaks it up.
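The same behaviour can be reproduced outside MySQL. The following is only a sketch in Python, on the assumption that its re.search (which, like REGEXP, succeeds if the pattern matches anywhere in the value) treats these simple patterns the same way MySQL's regex library does. The helper name regexp is made up for the illustration.

```python
import re

def regexp(value, pattern):
    """Return 1 if pattern matches anywhere in value, else 0 (mimics REGEXP)."""
    return 1 if re.search(pattern, value) else 0

# "1*5" = zero or more 1s, then a 5: the lone "5" at the end is enough.
print(regexp('1*5', '1*5'))                # → 1
# "10*5" = a 1, zero or more 0s, then a 5: the literal "*" sits in the way.
print(regexp('10*5', '10*5'))              # → 0
# With the asterisk escaped, it is matched literally.
print(regexp('10*5', r'10\*5'))            # → 1
# The part of '1*5' that the unescaped pattern actually matched.
print(re.search('1*5', '1*5').group())     # → 5
```

The last line shows that the "successful" match in the first case never touched the 1 or the * at all.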
Hope that helps.
Related
I have tried:
....WHERE fieldname REGEXP '.*\(.*\).*';
But this returns every record in the table.
You should use double backslashes when escaping a regex special metacharacter in a REGEXP pattern. Also, since REGEXP also finds partial matches, you do not need the .* at the start/end.
So, you could fix the expression as
WHERE fieldname REGEXP '\\(.*\\)';
Or just use LIKE where % matches any amount of arbitrary chars (but must match the whole entry unlike REGEXP):
WHERE fieldname LIKE '%(%)%';
A MySQL fiddle:
DROP TABLE IF EXISTS t;
CREATE TABLE t (word varchar(255));
INSERT INTO t (word)
VALUES
('test (here)'),
('test (here) test'),
('(here) test'),
('test no'),
('no test');
SELECT * FROM t WHERE word REGEXP '\\(.*\\)';
SELECT * FROM t WHERE word LIKE '%(%)%';
To get entries like text ) and ( here, you may use
SELECT * FROM t WHERE word REGEXP '\\(.*\\)|\\).*\\(';
SELECT * FROM t WHERE word LIKE '%(%)%' OR word LIKE '%)%(%';
See another fiddle.
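To see why the pattern from the question returned every record: with a single backslash, the MySQL string parser removes the backslash before the regex library ever sees it, so '.*\(.*\).*' arrives as .*(.*).* and the parentheses become a plain group. A rough sketch in Python, using the rows from the fiddle above and assuming the default sql_mode (backslash escapes enabled) and that Python's re module behaves like MySQL's regex library for these patterns; the variable and function names are illustrative only.

```python
import re

rows = ['test (here)', 'test (here) test', '(here) test', 'test no', 'no test']

# What the regex library receives when the SQL literal is '.*\(.*\).*':
# the parser has already dropped the single backslashes.
seen_with_single_backslash = '.*(.*).*'
# What the regex library receives when the SQL literal is '\\(.*\\)'.
seen_with_double_backslash = r'\(.*\)'

def matching_rows(pattern):
    """Return the rows in which the pattern matches somewhere."""
    return [row for row in rows if re.search(pattern, row)]

print(matching_rows(seen_with_single_backslash))  # all five rows
print(matching_rows(seen_with_double_backslash))  # only the three rows with (...)
```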
LIKE would be more efficient than REGEXP, but to answer in REGEXP terms...
I suspect the OP wants parentheses to be somewhere in the middle of row text.
The problem with the REGEXP supplied is that expressions are greedy.
As soon as the .* finishes matching, it has already soaked up the entire record.
Try changing the first two .* expressions to [^(]* and [^)]* and removing the final .*, as it is superfluous.
'[^\\(]*\\([^\\)]*\\)'
Basically, this expression says
Look for zero or more non ( characters
then look for a single ( character
then look for zero or more non ) characters
then look for a single ) character
The () may be anywhere in the record and may contain zero or more characters inside the ().
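That reading of the pattern can be checked with a small Python sketch. Python's re module is only a stand-in for MySQL's regex library here, and the pattern is written the way the regex library would see it, without the backslashes inside the brackets, where ( and ) are already literal. The name matched_text is hypothetical.

```python
import re

PATTERN = r'[^(]*\([^)]*\)'

def matched_text(value):
    """Return the substring the pattern matched, or None if there is no match."""
    m = re.search(PATTERN, value)
    return m.group() if m else None

print(matched_text('test (here) test'))  # → test (here)
print(matched_text('a()b'))              # → a()
print(matched_text('test no'))           # → None
```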
Might I suggest fiddling in https://regex101.com/
Hope this helps.
I have a list of numbers in some fields in a table, for example something like this:
2033,1869,1914,1913,19120,1911,1910,1909,1908,1907,1866,1921,1922,1923
Now, I'm trying to write a query that checks whether a number is found in the row. However, I can't use LIKE, as it may return false positives: a search for 1912 in the above field would return a result because of the number 19120, which we obviously don't want. We also can't append or prepend a comma to the search term, as the first and last numbers don't have one.
So, onto using REGEXP I go... I tried this, but it doesn't work (it returns a result):
SELECT * FROM cat_listing WHERE cats REGEXP '[^0-9]*1912[^0-9]*';
I imagine it still finds something because of the * quantifier: it found [^0-9] zero times after 1912, so it considers it a match.
I'm not sure how to modify it to do what I want.
In your case, it seems word boundaries are necessary:
SELECT * FROM cat_listing WHERE cats REGEXP '[[:<:]]1912[[:>:]]';
[[:<:]] is the beginning of a word and [[:>:]] is the end. See reference:
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
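As a rough illustration of the difference, here is a Python sketch. Python has no [[:<:]] / [[:>:]] markers, so \b is used as the closest equivalent; that substitution and the function name has_number are assumptions for the example, not MySQL syntax.

```python
import re

cats = '2033,1869,1914,1913,19120,1911,1910,1909,1908,1907,1866,1921,1922,1923'

def has_number(number, csv):
    # \b stands in for MySQL's [[:<:]] and [[:>:]] word-boundary markers.
    return re.search(r'\b' + re.escape(number) + r'\b', csv) is not None

# The attempt from the question: [^0-9]* may match zero characters,
# so the "1912" inside "19120" is still found.
print(re.search('[^0-9]*1912[^0-9]*', cats) is not None)  # → True

print(has_number('1912', cats))   # → False
print(has_number('19120', cats))  # → True
print(has_number('2033', cats))   # → True (first item, no leading comma)
```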
You have another option called find_in_set()
SELECT * FROM cat_listing WHERE find_in_set('1912', cats) <> 0;
Returns 0 if str is not in strlist or if strlist is the empty string. Returns NULL if either argument is NULL. This function does not work properly if the first argument contains a comma (“,”) character.
No need to use a regex just because the column value has no comma at either end:
SELECT
cats
FROM cat_listing
WHERE INSTR(CONCAT(',', cats, ','), ',1912,')
;
See it in action: SQL Fiddle.
Please comment if adjustment / further detail is required.
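The two non-regex approaches above can be sketched in Python to show why neither trips over 19120. These helpers only imitate the return values of INSTR(CONCAT(...)) and FIND_IN_SET (a 1-based position, or 0 when nothing is found); the function names are invented for the example.

```python
cats = '2033,1869,1914,1913,19120,1911,1910,1909,1908,1907,1866,1921,1922,1923'

def instr_concat(number, csv):
    """Imitate INSTR(CONCAT(',', csv, ','), ',number,'): 1-based position, 0 if absent."""
    # str.find returns -1 when nothing is found, so adding 1 yields 0.
    return (',' + csv + ',').find(',' + number + ',') + 1

def find_in_set(number, csv):
    """Imitate FIND_IN_SET(number, csv): 1-based index of the item, 0 if absent."""
    items = csv.split(',')
    return items.index(number) + 1 if number in items else 0

print(instr_concat('1912', cats))  # → 0
print(find_in_set('1912', cats))   # → 0
print(find_in_set('19120', cats))  # → 5
print(instr_concat('2033', cats))  # → 1
```

Wrapping the list in commas gives every item, including the first and last, a delimiter on both sides, which is what makes the plain substring search safe.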
I have an issue when fetching data from the database using a regular expression. When I search for 'man' in tags, it also returns tags that contain 'woman', because 'man' is a substring of it.
SELECT '#hellowomanclothing' REGEXP '^(.)*[^wo]man(.)*$'; # returns 0 correct, it contains 'woman'
SELECT '#helloowmanclothing' REGEXP '^(.)*[^wo]man(.)*$'; # returns 0 incorrect, it can contain anything other than 'woman'
SELECT '#stylemanclothing' REGEXP '^(.)*[^wo]man(.)*$'; # returns 1 correct
How can I change the regular expression so that a search for 'man' returns only tags that contain 'man', not 'woman'?
You can use two expressions. I think like is sufficient:
SELECT ('#stylemanclothing' like '%man%' and '#stylemanclothing' not like '%woman%')
Although you can express this in a regular expression, this is probably the easier solution.
Use this:
SELECT '#helloowmanclothing' REGEXP '^(.)*([^o]|[^w]o)man(.)*$'
In your pattern, [^wo] stands for "one character other than w or o", while you need to exclude two consecutive characters: w followed by o.
Therefore the above pattern allows an o before man only if that o is preceded by a character other than w.
A variant of n-dru's pattern, since you don't need to describe the whole string:
SELECT '#hellowomanclothing' REGEXP '(^#.|[^o]|[^w]o)man';
Note: if a tag contains both 'man' and 'woman', this pattern will return 1. If you don't want that, Gordon Linoff's solution is what you are looking for.
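The difference between the regex and the LIKE approach can be checked with a short Python sketch. It assumes Python's re module handles this pattern like MySQL's regex library and ignores case-sensitivity differences (all sample tags are lower case). The function names and the fourth tag are made up for the illustration.

```python
import re

def regex_check(tag):
    """The pattern from the answer above: 'man' not directly preceded by 'wo'."""
    return 1 if re.search(r'^(.)*([^o]|[^w]o)man(.)*$', tag) else 0

def like_check(tag):
    """Imitate: tag LIKE '%man%' AND tag NOT LIKE '%woman%'."""
    return 1 if 'man' in tag and 'woman' not in tag else 0

for tag in ('#hellowomanclothing', '#helloowmanclothing',
            '#stylemanclothing', '#womanandmanclothing'):
    print(tag, regex_check(tag), like_check(tag))
```

The two checks agree on the three tags from the question and differ only on the last one, which contains both 'woman' and a separate 'man'.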
I would like to extract specific pattern from string in MySQL.
The column contains specific strings like xxx-atg168d and xxx-atg444-6x. From these strings, I want to extract atg168 and atg444 only. How can I perform this in MySQL?
**Input_column**
xxx-atg168d
xxx-atg444-6x
xxx-atg1689d
xxx-atg16507d
xxx-atg444d-6x
xxx-atg444c-6x
**Output_column**
atg168
atg444
atg1689
atg16507
atg444
atg444
Something like this may meet your specification:
SUBSTRING_INDEX(SUBSTR( t.col ,INSTR( t.col ,'-')+1),'-',1)
This assumes that you want to return the portion of the string following the first dash character, up to the next dash character (if present). If no dash characters exist within the string, the entire string will be returned.
EDIT
Ooops. That expression also includes the trailing "d". If it's just a trailing "d" character that needs to be removed...
TRIM(TRAILING 'd' FROM SUBSTRING_INDEX(SUBSTR( t.col ,INSTR( t.col ,'-')+1),'-',1))
In the more general case, to remove any "non-digit" character from the end (not just a "d"), things get pretty ugly. We need to check the rightmost character and see whether it matches a character we want to keep. If it doesn't, we shorten the string by one character.
IF( INSTR('0123456789',RIGHT(
@t := SUBSTRING_INDEX(SUBSTR( t.col ,INSTR( t.col ,'-')+1),'-',1)
,1))
, @t
, SUBSTRING( @t, 1, CHAR_LENGTH( @t )-1)
)
I made use of a user-defined variable here to avoid repeating the same expression multiple times. It's not required that we do that. The @t := assignment can be removed, and the other occurrences of @t can be replaced with the expression that was assigned to @t.
The literal '0123456789' in that expression is the set of characters that we don't want to remove from the end of the string.
Use the SUBSTRING function, like this:
select SUBSTRING(column_name ,5,6) from table_name;
Here 5 is the starting position and 6 is the length of the substring extracted from the string.
Thanks spencer for your suggestions. I edited your code to get the solution for my query. Here is the update query,
left(substring_index (substr(subid,instr(subid,'-')+1),'-',1) , char_length(substring_index (substr(subid,instr(subid,'-')+1),'-',1))-1)
I'm using a stored procedure to validate the input parameter. The input parameter must contain a-z and A-Z and 0-9.
For example:
aS78fhE0 -> Correct
76AfbRZt -> Correct
76afbrzt -> Incorrect(doesn't contain Upper Case A-Z)
asAfbRZt -> Incorrect(doesn't contain Numeric 0-9)
4QA53RZJ -> Incorrect(doesn't contain Lower Case a-z)
Which regular expression can validate the input parameter as in the examples above?
Many thanks, Praditha
UPDATE: Characters other than alphanumerics are not allowed. I'm using MySQL version 5.
Further to John's post and subsequent comments:
The MySQL you require would be
SELECT * FROM mytable WHERE mycolumn REGEXP BINARY '[a-z]'
AND mycolumn REGEXP BINARY '[A-Z]'
AND mycolumn REGEXP BINARY '[0-9]'
Additionally, add
AND mycolumn REGEXP BINARY '^[a-zA-Z0-9]+$'
if you only want alphanumerics in the string.
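The combined conditions can be tried against the examples from the question with a small Python sketch. Python's re module is case-sensitive by default, which is assumed here to correspond to REGEXP BINARY; the function name is_valid is invented for the example.

```python
import re

# One pattern per condition, mirroring the four REGEXP BINARY tests above.
CHECKS = ('[a-z]', '[A-Z]', '[0-9]', '^[a-zA-Z0-9]+$')

def is_valid(value):
    """True only if every pattern matches somewhere in the value."""
    return all(re.search(pattern, value) for pattern in CHECKS)

for value in ('aS78fhE0', '76AfbRZt', '76afbrzt', 'asAfbRZt', '4QA53RZJ', 'aS7!'):
    print(value, is_valid(value))
```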
With look-ahead assertion you could do like this:
/^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9]).*$/
update: It seems mysql doesn't support look around assertions.
You could split it up into 3 separate regex to test for each case.
[a-z], [A-Z], and [0-9]
AND the results of those matches together, and you can achieve the result you're looking for.
EDIT:
if you're only looking to match alphanumerics, you should do ^[a-zA-Z0-9]+$ as suggested by Ed Head in the comments
My solution leads to a long expression because it permutes over all 6 ways in which the required capital letter, small letter and number can be arranged in the string:
^(.*[a-z].*[A-Z].*[0-9].*|
.*[a-z].*[0-9].*[A-Z].*|
.*[A-Z].*[a-z].*[0-9].*|
.*[A-Z].*[0-9].*[a-z].*|
.*[0-9].*[a-z].*[A-Z].*|
.*[0-9].*[A-Z].*[a-z].*)$
Edit: Forgot the .* at the end and at the beginning.
Unfortunately, MySQL does not support lookaround assertions, therefore you'll have to spell it out for the regex engine (assuming that only those characters are legal):
^(
[A-Za-z0-9]*[a-z][A-Za-z0-9]*[A-Z][A-Za-z0-9]*[0-9][A-Za-z0-9]*|
[A-Za-z0-9]*[a-z][A-Za-z0-9]*[0-9][A-Za-z0-9]*[A-Z][A-Za-z0-9]*|
[A-Za-z0-9]*[A-Z][A-Za-z0-9]*[a-z][A-Za-z0-9]*[0-9][A-Za-z0-9]*|
[A-Za-z0-9]*[A-Z][A-Za-z0-9]*[0-9][A-Za-z0-9]*[a-z][A-Za-z0-9]*|
[A-Za-z0-9]*[0-9][A-Za-z0-9]*[a-z][A-Za-z0-9]*[A-Z][A-Za-z0-9]*|
[A-Za-z0-9]*[0-9][A-Za-z0-9]*[A-Z][A-Za-z0-9]*[a-z][A-Za-z0-9]*
)$
or, in MySQL:
SELECT * FROM mytable WHERE mycolumn REGEXP BINARY "^([A-Za-z0-9]*[a-z][A-Za-z0-9]*[A-Z][A-Za-z0-9]*[0-9][A-Za-z0-9]*|[A-Za-z0-9]*[a-z][A-Za-z0-9]*[0-9][A-Za-z0-9]*[A-Z][A-Za-z0-9]*|[A-Za-z0-9]*[A-Z][A-Za-z0-9]*[a-z][A-Za-z0-9]*[0-9][A-Za-z0-9]*|[A-Za-z0-9]*[A-Z][A-Za-z0-9]*[0-9][A-Za-z0-9]*[a-z][A-Za-z0-9]*|[A-Za-z0-9]*[0-9][A-Za-z0-9]*[a-z][A-Za-z0-9]*[A-Z][A-Za-z0-9]*|[A-Za-z0-9]*[0-9][A-Za-z0-9]*[A-Z][A-Za-z0-9]*[a-z][A-Za-z0-9]*)$";
[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*|[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*|[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*|[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*|[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*|[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*[a-z]+[a-zA-Z0-9]*[A-Z]+[a-zA-Z0-9]*
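Rather than typing the six alternatives by hand, the single-pattern form can be generated. The following Python sketch builds the same alternation as the REGEXP BINARY statement above from the three required character classes and tries it on the examples from the question. It assumes Python's case-sensitive re module behaves like REGEXP BINARY for this pattern; the names ANY, REQUIRED, PATTERN and is_valid are illustrative.

```python
import re
from itertools import permutations

ANY = '[A-Za-z0-9]*'                    # filler allowed between required characters
REQUIRED = ['[a-z]', '[A-Z]', '[0-9]']  # each class must occur at least once

# One branch per ordering of the three required classes: 3! = 6 branches.
branches = [ANY + ANY.join(order) + ANY for order in permutations(REQUIRED)]
PATTERN = '^(' + '|'.join(branches) + ')$'

def is_valid(value):
    """True if value is alphanumeric and has a lower-case letter, an upper-case letter and a digit."""
    return re.search(PATTERN, value) is not None

for value in ('aS78fhE0', '76AfbRZt', '76afbrzt', 'asAfbRZt', '4QA53RZJ'):
    print(value, is_valid(value))
```

The generated PATTERN string can be pasted into the query in place of the hand-written one.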