I am trying to build a query that checks whether a string contains at least 5 consecutive digits.
Query:
SELECT count(id) as gV FROM someTable WHERE ... title REGEXP '(\d{5,})' order by id limit 0,10
Sample data
Some text here 123456 (MATCH)
S0m3 t3xt h3r3
Some text 123 here 345
98765 Some text here (MATCH)
Some12345Text Around (MATCH)
Desired output
3 (Some text here 123456, 98765 Some text here, Some12345Text Around)
Are there any specific rules for regex in MySQL queries?
MySQL's regular expression engine (the POSIX-style implementation used before MySQL 8.0) does not implement the \d "digit" shorthand. Instead, you can represent a digit either as a character range like [0-9] or as the POSIX character class [[:digit:]]. The curly-brace repeat syntax {5,} is supported in the form you've attempted.
The available regular expression syntax is described in the manual.
So you can use either of the following forms:
title REGEXP '[0-9]{5,}'
title REGEXP '[[:digit:]]{5,}'
Examples:
Non matching:
> SELECT '123' REGEXP '[[:digit:]]{5,}';
+--------------------------------+
| '123' REGEXP '[[:digit:]]{5,}' |
+--------------------------------+
| 0 |
+--------------------------------+
> SELECT '1X345' REGEXP '[0-9]{5,}';
+----------------------------+
| '1X345' REGEXP '[0-9]{5,}' |
+----------------------------+
| 0 |
+----------------------------+
Matching examples:
> SELECT '98765 Some text here' REGEXP '[[:digit:]]{5,}';
+-------------------------------------------------+
| '98765 Some text here' REGEXP '[[:digit:]]{5,}' |
+-------------------------------------------------+
| 1 |
+-------------------------------------------------+
> SELECT 'Some text here 123456' REGEXP '[0-9]{5,}';
+--------------------------------------------+
| 'Some text here 123456' REGEXP '[0-9]{5,}' |
+--------------------------------------------+
| 1 |
+--------------------------------------------+
1 row in set (0.00 sec)
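To see the same check outside the database, here is a minimal sketch in Python. It assumes Python's re module as a stand-in (it is not MySQL's engine) and reuses the sample titles from the question; the names titles and five_digits are made up for illustration.

```python
import re

# Sample titles from the question.
titles = [
    "Some text here 123456",
    "S0m3 t3xt h3r3",
    "Some text 123 here 345",
    "98765 Some text here",
    "Some12345Text Around",
]

# [0-9]{5,} means: a run of five or more consecutive digits, anywhere.
five_digits = re.compile(r"[0-9]{5,}")

matching = [t for t in titles if five_digits.search(t)]
print(len(matching))
print(matching)
```

The count printed corresponds to what SELECT count(id) would report for the same rows, assuming the rest of the WHERE clause lets all five rows through.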
Related
Hello, I need a regular expression for my MySQL query to match the text
"SIP/(10 NUMBERS)"
for example
"SIP/1234567890"
where "SIP" is literal text
followed by 10 random digits (0-9).
UPDATE
The final text is SIP/0123456789-000001cc
where
"SIP/" is literal text
"0123456789" is always 10 digits
"-" is a literal character
"000001cc" is a random alphanumeric string
You can use this regex:
^SIP/[[:digit:]]{10}-
Examples:
mysql> select 'SIP/0123456789-000001cc' regexp '^SIP/[[:digit:]]{10}-';
+----------------------------------------------------------+
| 'SIP/0123456789-000001cc' regexp '^SIP/[[:digit:]]{10}-' |
+----------------------------------------------------------+
| 1 |
+----------------------------------------------------------+
1 row in set (0.00 sec)
mysql> select 'SIP/123456789-000001cc' regexp '^SIP/[[:digit:]]{10}-';
+----------------------------------------------------------+
| 'SIP/123456789-000001cc' regexp '^SIP/[[:digit:]]{10}-' |
+----------------------------------------------------------+
| 0 |
+----------------------------------------------------------+
1 row in set (0.00 sec)
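The anchoring and the exact repeat count can be checked with a small Python sketch. This assumes Python's re module, which has no [[:digit:]] class, so [0-9] is substituted; the names sip_prefix and is_sip are hypothetical.

```python
import re

# ^ anchors at the start, {10} demands exactly ten digits,
# and the trailing "-" stops an eleventh digit from slipping through.
sip_prefix = re.compile(r"^SIP/[0-9]{10}-")

def is_sip(value: str) -> bool:
    """Return True when value starts with SIP/, ten digits, then a dash."""
    return sip_prefix.search(value) is not None

print(is_sip("SIP/0123456789-000001cc"))   # ten digits
print(is_sip("SIP/123456789-000001cc"))    # only nine digits
print(is_sip("SIP/01234567890-000001cc"))  # eleven digits
```

Without the trailing dash, a value with eleven digits would still match, because {10} only requires that ten digits be present at that position.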
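Use \ to escape / if your regex flavor treats / as a delimiter; MySQL's REGEXP does not need this escape.
The following regex targets SIP followed by / and then 10 digit characters (where \d is unavailable, as in older MySQL versions, write [0-9] instead):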
SIP\/\d{10}
I would like to select data from a table having first character as * and second character having numbers from 0 to 9
I am using this code, which works and returns strings like *0123456* and *34567*:
SELECT * FROM `MyTable` WHERE SUBSTRING(MyColumn,1,1) = "*" AND
(SUBSTRING(MyColumn,2,1) ="0" OR SUBSTRING(MyColumn,2,1) BETWEEN 1 AND 10) ;
But when I shorten the query like this, it returns strings containing letters, which I do not want:
SELECT * FROM `MyTable` WHERE SUBSTRING(MyColumn,1,1) = "*" AND
(SUBSTRING(MyColumn,2,1) BETWEEN 0 AND 10) ;
Why is the 0 not working with BETWEEN in this query as expected?
You are seeing this problem because MySQL casts the character to a number before comparing it with the numeric bounds 0 and 10. Under MySQL's casting rules, a string that does not begin with a numeric value is converted to 0, so the condition BETWEEN 0 AND 10 is true for any letter.
-- The second character 'B' is equal to 0 after casting
> SELECT SUBSTRING('*BC', 2, 1) = 0;
+----------------------------+
| SUBSTRING('*BC', 2, 1) = 0 |
+----------------------------+
| 1 |
+----------------------------+
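The effect of the implicit cast can be imitated in Python. This is only a rough emulation under the assumption that the cast takes the leading numeric prefix of the string and falls back to 0; the helper names mysql_style_number, between_numeric and between_string are hypothetical and are not MySQL functions.

```python
import re

def mysql_style_number(text: str) -> float:
    """Rough emulation (assumption): leading numeric prefix, else 0."""
    prefix = re.match(r"\s*[+-]?[0-9]+(\.[0-9]+)?", text)
    return float(prefix.group(0)) if prefix else 0.0

def between_numeric(ch: str, low: float, high: float) -> bool:
    # Numeric bounds force the character through the cast first.
    return low <= mysql_style_number(ch) <= high

def between_string(ch: str, low: str, high: str) -> bool:
    # Quoted bounds compare character against character, no cast.
    return low <= ch <= high

print(between_numeric("B", 0, 10))     # 'B' casts to 0, inside 0..10
print(between_numeric("B", 1, 10))     # 0 is outside 1..10
print(between_string("B", "0", "9"))   # 'B' sorts after '9'
```

This is why the original query with ="0" OR BETWEEN 1 AND 10 excluded letters while BETWEEN 0 AND 10 let them in: the letter becomes 0 in both, but only the second range contains 0 as a number.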
Since you are matching a specific pattern, I would recommend doing this with a REGEXP match instead of chopping it up into substrings.
SELECT *
FROM `MyTable`
WHERE MyColumn REGEXP '(^\\*[0-9])'
Examples:
> SELECT '*123' REGEXP '^\\*[0-9]';
+---------------------------+
| '*123' REGEXP '^\\*[0-9]' |
+---------------------------+
| 1 |
+---------------------------+
> SELECT '*A23' REGEXP '^\\*[0-9]';
+---------------------------+
| '*A23' REGEXP '^\\*[0-9]' |
+---------------------------+
| 0 |
+---------------------------+
The pattern match here breaks down to:
^ start of the string
\\* match a literal '*', requiring an escaping backslash
[0-9] followed by a digit.
It is possible to use your method and avoid the casting if you use quoted strings in the BETWEEN, as in (SUBSTRING(MyColumn,2,1) BETWEEN '0' AND '9'):
> SELECT SUBSTRING('*BC', 2, 1) BETWEEN '0' AND '9';
+--------------------------------------------+
| SUBSTRING('*BC', 2, 1) BETWEEN '0' AND '9' |
+--------------------------------------------+
| 0 |
+--------------------------------------------+
> SELECT SUBSTRING('*99', 2, 1) BETWEEN '0' AND '9';
+--------------------------------------------+
| SUBSTRING('*99', 2, 1) BETWEEN '0' AND '9' |
+--------------------------------------------+
| 1 |
+--------------------------------------------+
But I prefer the REGEXP method because it expresses the whole pattern you wish to match as one condition. I find it much easier to read because the character positions are built into the expression, rather than needing to decode them from substring(). In either of these options, I expect MySQL will not utilize an index.
You're over-complicating things. In this specific case, if you say
would like to select data from a table having first character as * and second character having numbers from 0 to 9
then you want everything from '*0...' to '*9....'.
So you want,
WHERE MyColumn >= '*0' AND SUBSTR(MyColumn, 1, 2) <= '*9';
The query would be more efficient if you knew that no value of MyColumn will ever exceed, say, "*9ZZZZZZZZ". Then you would ask
WHERE MyColumn >= '*0' AND MyColumn <= '*9ZZZZZZZZ'
or, since 9 is usually followed lexicographically by ':', which you don't want,
WHERE MyColumn >= '*0' AND MyColumn < '*:'
which allows better use of an index on MyColumn.
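A quick way to convince yourself that the range test selects the same values as the pattern is to compare both on a few strings. The sketch below uses Python and assumes plain code-point ordering, which corresponds to a binary or ASCII-like collation; other collations may order characters differently. The names by_regex, by_range and samples are made up for the example.

```python
import re

def by_regex(value: str) -> bool:
    """First character '*', second character a digit."""
    return re.match(r"\*[0-9]", value) is not None

def by_range(value: str) -> bool:
    """Half-open range: from '*0' up to, but not including, '*:'."""
    return "*0" <= value < "*:"

samples = ["*0", "*0123456*", "*34567*", "*9ZZZ", "*A23", "*:", "0*12", "*", "**1"]

for s in samples:
    print(s, by_regex(s), by_range(s))
```

The range form compares the whole column value against two constants, which is the shape of condition an index range scan can serve.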
I am trying to fetch rows from my database by checking whether the JSON in one of their fields contains a specific id.
Example: col(kats): [2,4,7,9]
I am trying to do so by using the following query
SELECT column FROM table WHERE column REGEXP '(\[|\,)1(\]|\,)'
The Problem: MySQL returns 1 for every row in the table.
MySQL processes backslash escapes in the string literal before the regular expression engine ever sees the pattern, so any backslash meant for the regex must itself be escaped. Thus, you must write the escaped brackets as \\[ and \\].
From the docs:
Because MySQL uses the C escape syntax in strings (for example, “\n” to represent the newline character), you must double any “\” that you use in your REGEXP strings.
The rest of your pattern is basically correct, except that the comma , does not require escaping.
1 does not match:
> SELECT '[2,4,7,9]' REGEXP '(\\[|,)1(\\]|,)';
+--------------------------------------+
| '[2,4,7,9]' REGEXP '(\\[|,)1(\\]|,)' |
+--------------------------------------+
| 0 |
+--------------------------------------+
1 row in set (0.00 sec)
But 2 does match:
> SELECT '[2,4,7,9]' REGEXP '(\\[|,)2(\\]|,)';
+--------------------------------------+
| '[2,4,7,9]' REGEXP '(\\[|,)2(\\]|,)' |
+--------------------------------------+
| 1 |
+--------------------------------------+
1 row in set (0.00 sec)
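The reason every row matched can be reproduced in Python. The sketch assumes that the string parser simply drops a backslash placed before [, ] or a comma, so the single-escaped literal reaches the regex engine without backslashes; the names single_escaped and double_escaped are made up for the example, and Python's re module stands in for MySQL's engine.

```python
import re

row = "[2,4,7,9]"

# What the engine receives for the literal '(\[|\,)1(\]|\,)' once the
# backslashes are dropped: "[|,)1(]" is now a bracket expression, and the
# alternative after "|" is a bare comma.
single_escaped = "([|,)1(]|,)"

def double_escaped(n: int) -> str:
    """Pattern the engine receives for the literal '(\\\\[|,)N(\\\\]|,)'."""
    return rf"(\[|,){n}(\]|,)"

print(bool(re.search(single_escaped, row)))     # any comma satisfies it
print(bool(re.search(double_escaped(1), row)))  # 1 is not in the list
print(bool(re.search(double_escaped(2), row)))  # 2 is in the list
```

With the brackets properly escaped, the pattern requires a [ or comma before the id and a ] or comma after it, so an id such as 2 is not found inside 12.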
Why does this match? (It should match (44[0-9]) zero or more times.)
mysql> SELECT "tampampam" REGEXP "(44[0-9])*$";
+----------------------------------+
| "tampampam" REGEXP "(44[0-9])*$" |
+----------------------------------+
| 1 |
+----------------------------------+
1 row in set (0.00 sec)
And why does this not match? (It should match 44 followed by ([0-9]) zero or more times.)
mysql> SELECT "44tampampam" REGEXP "44([0-9])*$";
+------------------------------------+
| "44tampampam" REGEXP "44([0-9])*$" |
+------------------------------------+
| 0 |
+------------------------------------+
1 row in set (0.00 sec)
Well, it is a very strange regular expression.
As for the first case, (44[0-9])*$ means "match 44 followed by a digit from 0 to 9, repeated any number of times, up to the end of the string". Since "any number" includes zero, the empty match at the end of "tampampam" satisfies the pattern.
As for the second case, 44([0-9])*$ means "match 44, then any digit from 0 to 9, zero or more times, up to the end of the string". But 44 is followed by "tampampam", which is not a run of digits reaching the end of the string, so there is no match. Remove $, and you'll have a match.
You must also use a start anchor to make sure it doesn't match unwanted text:
SELECT "tampampam" REGEXP "^(44[0-9])*$";
+-----------------------------------+
| "tampampam" REGEXP "^(44[0-9])*$" |
+-----------------------------------+
| 0 |
+-----------------------------------+
The first query matches because matching something zero or more times means that not matching it (i.e. matching zero times) is also a match.
The second query does not match, because you have anchored the regular expression to the end of the string, because of the dollar-sign ($). As the end of the string is not the string 44 optionally followed by digits, it does not match.
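The zero-repetition behaviour is easy to observe in Python, whose re module treats * and $ the same way for these patterns. This is only a sketch with Python standing in for MySQL's engine; the helper name matches is made up.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """True when pattern is found anywhere in text (like REGEXP)."""
    return re.search(pattern, text) is not None

# Zero repetitions followed by end of string: an empty match at the end.
print(matches(r"(44[0-9])*$", "tampampam"))
print(re.search(r"(44[0-9])*$", "tampampam").span())

# A literal 44 is required, and only digits may follow up to the end.
print(matches(r"44([0-9])*$", "44tampampam"))

# Dropping $ lets the match stop right after 44.
print(matches(r"44([0-9])*", "44tampampam"))

# Anchoring both ends forces the whole string to fit the pattern.
print(matches(r"^(44[0-9])*$", "tampampam"))
```

The span of the first match has equal start and end positions, which shows that nothing from "tampampam" was actually consumed.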
I see no reason to use *$ in your case. Keep it simple:
SELECT "tampampam" REGEXP "44[0-9]";
=> 0
SELECT "t441ampampam" REGEXP "44[0-9]";
=> 1
SELECT "t441ampampam" REGEXP "^44[0-9]";
=> 0
SELECT "441tampampam" REGEXP "^44[0-9]";
=> 1
So if you need 44 to be the first characters in the string, use '^44[0-9]'.
If you don't care where it appears, it is as simple as '44[0-9]'.
I've been looking into REGEXP for filtering the entries in my database.
I have a column with values separated by commas, looking like this:
|----|-------------------|
| id | col A             |
|----|-------------------|
| 1  | P:1,P:2,P:5,P:7   |
| 2  | P:6,P:8,P:10,P:11 |
| 3  | P:4,P:3,P:1,P:0   |
| 4  | P:2,P:1           |
|----|-------------------|
Let's say I want the rows containing the value P:1. How can I design a REGEXP in the form:
SELECT * FROM `table` WHERE `col A` REGEXP '?'
so that I get rows 1, 3 and 4? My previous approach was simply to use:
SELECT * FROM `table` WHERE `col A` LIKE '%P:1%'
However, that would naturally also return row 2, because it technically contains P:1 (inside P:10 and P:11).
Any help would be appreciated. I think this problem is fairly simple for a regexp expert! Cheers, Andreas
You need to read up on word boundaries.
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
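Applied to the question, the end-of-word marker goes right after the value, which in the MySQL syntax quoted above would presumably be written P:1[[:>:]] (MySQL 8.0 and later use \\b instead). The sketch below uses Python's \b as a stand-in for that marker and assumes row 3 is meant to contain P:1; the names rows and p1_word are made up for the example.

```python
import re

# Rows from the question (id -> col A), assuming row 3 holds P:1.
rows = {
    1: "P:1,P:2,P:5,P:7",
    2: "P:6,P:8,P:10,P:11",
    3: "P:4,P:3,P:1,P:0",
    4: "P:2,P:1",
}

# \b after the 1 requires that the next character is not a word
# character, so P:10 and P:11 are rejected while "P:1," and a
# trailing "P:1" are accepted.
p1_word = re.compile(r"P:1\b")

matching_ids = [row_id for row_id, value in rows.items() if p1_word.search(value)]
print(matching_ids)
```

A marker before the P is not needed here, because the colon already separates the prefix from the number and no other value in the column ends in P.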