Use regular expression to match a substring of a pattern in mysql - mysql

I want to query all strings that match the following pattern in mysql.
At least one non-empty character,
followed by a literal dash character -,
then followed by at least one non-empty character,
then followed by the literal string in ('true')
The substring "and" cannot appear between - and in ('true').
For example:
segment-123 in ('true')
matches the above pattern.
content-foo and segment in ('true')
does not match the above pattern because it has the substring "and" in between - and in ('true').
Is this achievable using REGEXP in mysql? Any help is greatly appreciated.

mysql> select 'segment-123 in (\'true\')'
regexp '[^[:space:]]+-[^[:space:]]+ in \\(\'true\'\\)' as matched;
+---------+
| matched |
+---------+
| 1 |
+---------+
mysql> select 'content-foo and segment in (\'true\')'
regexp '[^[:space:]]+-[^[:space:]]+ in \\(\'true\'\\)' as matched;
+---------+
| matched |
+---------+
| 0 |
+---------+
See https://dev.mysql.com/doc/refman/8.0/en/regexp.html#regexp-syntax for more documentation on the regular expression syntax in MySQL.
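The pattern above works by forbidding whitespace on either side of the dash, so a separate word such as and can never sit between the dash and in ('true'). Here is a minimal sketch of the same logic in Python, assuming Python's re module as a stand-in for MySQL's regex engine. [^\s] plays the role of [^[:space:]], and the names PATTERN and matches are illustrative only.

```python
import re

# Same shape as the MySQL pattern: a run of non-space characters, a dash,
# another run of non-space characters, then the literal text " in ('true')".
PATTERN = re.compile(r"[^\s]+-[^\s]+ in \('true'\)")


def matches(text: str) -> bool:
    """Return True if the pattern occurs anywhere in text (like REGEXP)."""
    return PATTERN.search(text) is not None


print(matches("segment-123 in ('true')"))              # True
print(matches("content-foo and segment in ('true')"))  # False
# Caveat: "and" is only excluded as a separate word. It is still accepted
# when it is embedded in a token that contains no spaces.
print(matches("brand-band in ('true')"))               # True
```

If "and" must be rejected even inside a token, the whitespace-based pattern is not enough and a separate NOT LIKE '%and%' style check on the relevant part of the string would be needed.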

Related

MySQL RLIKE behaviour for numbered string

I am using RLIKE to find some email domains with mysql.
Here is the Query:
SELECT something
FROM table1
WHERE SUBSTRING_INDEX(table1.email, "#", -1) RLIKE "test1.com"|"test2.com"
This matched all the email domains with numbers in, example:
aaa#domain0.com
Any idea why?
EDIT: I also noticed that it finds email domains that have at least two successive digits.
Really strange.
The string supplied to RLIKE or REGEXP needs to be a quoted string, wherein the entire regular expression is single-quoted. What you have are two double-quoted strings separated by |, which is the bitwise OR operator.
That is causing the whole expression to be evaluated as 0, and that's why the domain aaa#domain0.com is matched:
# The unquoted | evaluates this to zero:
mysql> SELECT "string" | "string";
+---------------------+
| "string" | "string" |
+---------------------+
| 0 |
+---------------------+
# And zero matches domain0.com
mysql> SELECT 'domain0.com' RLIKE '0';
+-------------------------+
| 'domain0.com' RLIKE '0' |
+-------------------------+
| 1 |
+-------------------------+
Instead, you would need to use RLIKE with a single-quoted string, and escape the . character so it matches a literal dot. Because MySQL string literals treat the backslash as an escape character, the backslash itself has to be doubled. I'm also adding ^ and $ anchors so substrings are not matched.
WHERE SUBSTRING_INDEX(table1.email, "#", -1) RLIKE '^test1\\.com$|^test2\\.com$'
It could also be expressed as '^(test1\\.com|test2\\.com)$'. The trick is that | has very low precedence, so you need to ensure both ends are anchored for every possible string you want to match.
However, if you are just trying to match a list of domains, it is far easier to do it with IN () so you may merely list them:
WHERE SUBSTRING_INDEX(table1.email, "#", -1) IN ('test1.com', 'test2.com', 'test4.org')
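The anchoring and escaping points above can be checked outside the database. The sketch below assumes Python's re module as a stand-in for MySQL's engine; the variable names and the sample domains are made up for illustration.

```python
import re

# Each alternative carries its own anchors, as in the answer.
anchored = re.compile(r"^test1\.com$|^test2\.com$")
# Equivalent grouped form: one pair of anchors around the alternation.
grouped = re.compile(r"^(test1\.com|test2\.com)$")
# Anchors only at the outer ends: "|" splits this into
# "^test1\.com" OR "test2\.com$", so each side is anchored at one end only.
loose = re.compile(r"^test1\.com|test2\.com$")
# Unescaped dot: "." matches any character, not just a literal dot.
unescaped = re.compile(r"^test1.com$")

for domain in ["test1.com", "test1.com.example.org", "nottest2.com", "test1xcom"]:
    print(
        domain,
        bool(anchored.search(domain)),
        bool(grouped.search(domain)),
        bool(loose.search(domain)),
        bool(unescaped.search(domain)),
    )
```

The loose form accepts test1.com.example.org and nottest2.com, which is exactly the kind of substring match the anchors are meant to prevent.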

regular expression for MySQL always returns 1

I am trying to fetch rows from my database by checking if the json in one of their fields contains a specific id.
Example: col(kats): [2,4,7,9]
I am trying to do so by using the following query
SELECT column FROM table WHERE column REGEXP '(\[|\,)1(\]|\,)'
The Problem: MySQL returns 1 for every row in the table.
MySQL processes backslash escapes in string literals before the regular expression engine ever sees the pattern, so a single \ in front of [ or ] is consumed by the string parser and never reaches REGEXP. Thus, you must double-escape [] as \\[ and \\].
From the docs:
Because MySQL uses the C escape syntax in strings (for example, “\n” to represent the newline character), you must double any “\” that you use in your REGEXP strings.
The rest of your pattern is basically correct, except that the comma , does not require escaping.
1 does not match:
> SELECT '[2,4,7,9]' REGEXP '(\\[|,)1(\\]|,)';
+--------------------------------------+
| '[2,4,7,9]' REGEXP '(\\[|,)1(\\]|,)' |
+--------------------------------------+
| 0 |
+--------------------------------------+
1 row in set (0.00 sec)
But 2 does match
> SELECT '[2,4,7,9]' REGEXP '(\\[|,)2(\\]|,)';
+--------------------------------------+
| '[2,4,7,9]' REGEXP '(\\[|,)2(\\]|,)' |
+--------------------------------------+
| 1 |
+--------------------------------------+
1 row in set (0.00 sec)
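To see why the single-escaped pattern matched every row, it helps to compare the pattern the regex engine receives in each case. This is a sketch using Python's re module as a stand-in for MySQL's engine. It assumes the default SQL mode, where the string parser drops a lone backslash before [, ] and the comma. The names intended and stripped are illustrative.

```python
import re

# What the engine receives from the double-escaped literal '(\\[|,)1(\\]|,)'.
intended = re.compile(r"(\[|,)1(\]|,)")
# What the engine receives from '(\[|\,)1(\]|\,)' once the string parser has
# dropped the backslashes: "[|,)1(]" is now a bracket expression, so the
# whole pattern means "one of | , ) 1 (  OR  a comma".
stripped = re.compile(r"([|,)1(]|,)")

print(bool(intended.search("[2,4,7,9]")))  # False: no element equal to 1
print(bool(intended.search("[2,1]")))      # True
print(bool(intended.search("[11,4]")))     # False: 11 is not 1
print(bool(stripped.search("[2,4,7,9]")))  # True: any comma is enough
```

With the stripped pattern, every JSON-like list that contains a comma matches, which is consistent with the behavior reported in the question.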

Mysql regex quirks

Why does this match (it should match (44[0-9]) zero or more times)?
mysql> SELECT "tampampam" REGEXP "(44[0-9])*$";
+----------------------------------+
| "tampampam" REGEXP "(44[0-9])*$" |
+----------------------------------+
| 1 |
+----------------------------------+
1 row in set (0.00 sec)
And this does not (it should match 44 followed by ([0-9]) zero or more times)?
mysql> SELECT "44tampampam" REGEXP "44([0-9])*$";
+------------------------------------+
| "44tampampam" REGEXP "44([0-9])*$" |
+------------------------------------+
| 0 |
+------------------------------------+
1 row in set (0.00 sec)
Well, it is a very strange regular expression.
As for the first case, (44[0-9])*$ means "match 44 followed by a digit from 0 to 9, repeated any number of times, up to the end of the string". Since "any number" includes zero, the empty match at the very end of "tampampam" satisfies the pattern.
As for the second case, 44([0-9])*$ means "match 44, then a digit from 0 to 9 zero or more times, up to the end of the string". But after 44 comes "tampampam", so there is no match. Remove $, and you'll have a match.
You must also use a start anchor to make sure it doesn't match unwanted text:
SELECT "tampampam" REGEXP "^(44[0-9])*$";
+-----------------------------------+
| "tampampam" REGEXP "^(44[0-9])*$" |
+-----------------------------------+
| 0 |
+-----------------------------------+
The first query matches because matching something zero or more times means that not matching it at all (i.e. matching zero times) is also a match.
The second query does not match because the dollar sign ($) anchors the regular expression to the end of the string. As the end of the string is not 44 optionally followed by digits, it does not match.
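The zero-repetition behavior described above is not specific to MySQL. A small sketch with Python's re module (used here as a stand-in, with a hypothetical helper named found) reproduces all of the cases:

```python
import re


def found(pattern: str, text: str) -> bool:
    """Unanchored search, like MySQL's REGEXP operator."""
    return re.search(pattern, text) is not None


# Zero repetitions of (44[0-9]) followed by end of string: an empty match.
print(found(r"(44[0-9])*$", "tampampam"))            # True
print(re.search(r"(44[0-9])*$", "tampampam").span())  # (9, 9)

# 44 must be followed only by digits up to the end of the string.
print(found(r"44([0-9])*$", "44tampampam"))   # False
# Without $, the 44 at the start is enough.
print(found(r"44([0-9])*", "44tampampam"))    # True

# With both anchors, the whole string must consist of 44N groups.
print(found(r"^(44[0-9])*$", "tampampam"))    # False
print(found(r"^(44[0-9])*$", "441442"))       # True
```

The span (9, 9) shows that the first pattern matched nothing but the empty position at the end of the string.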
I see no reason to use *$ in your case. Keep it simple:
SELECT "tampampam" REGEXP "44[0-9]";
=> 0
SELECT "t441ampampam" REGEXP "44[0-9]";
=> 1
SELECT "t441ampampam" REGEXP "^44[0-9]";
=> 0
SELECT "441tampampam" REGEXP "^44[0-9]";
=> 1
So if you need 44 to be the first characters in the string, use '^44[0-9]'.
If you don't care where it appears, '44[0-9]' is enough.

MySQL/Regexp: Partial regexp match

I have a bunch of regular expressions in a MySQL table. I want to know whether a given string matches a part of any regular expression or not.
Eg:
+----+--------------------------------+
| id | regexps |
+----+--------------------------------+
| 1 | foo-[0-9]*\.example\.com |
| 2 | (bar|tux)-[0-9]*\.example\.com |
+----+--------------------------------+
(The regexps attribute is of VARCHAR type)
foo-11.example.com matches the first regexp.
I want a MySQL query that returns the first row with the given string as foo-11
This should do it in MySQL:
select * from `table` t where 'foo-11.example.com' rlike t.regexps;
There are other ways in PostgreSQL. Here's the link I referenced for this:
http://www.tutorialspoint.com/mysql/mysql-regexps.htm
Match a Query to a Regular Expression in SQL?
PS: Using * is tricky though!
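The idea of the query above is to reverse the usual roles: the stored value is the pattern and the constant is the subject. A rough Python sketch of the same lookup follows, with the two rows from the question held in a dictionary. The names stored and matching_ids are hypothetical, and Python's re module stands in for MySQL's engine.

```python
import re

# id -> stored regular expression, as in the question's table.
stored = {
    1: r"foo-[0-9]*\.example\.com",
    2: r"(bar|tux)-[0-9]*\.example\.com",
}


def matching_ids(text: str) -> list[int]:
    """Return the ids of all stored patterns that match somewhere in text."""
    return [row_id for row_id, pattern in stored.items() if re.search(pattern, text)]


print(matching_ids("foo-11.example.com"))  # [1]
print(matching_ids("tux-7.example.com"))   # [2]
# A fragment of a matching string does not itself match the pattern.
print(matching_ids("foo-11"))              # []
# [0-9]* allows zero digits, which is one way * can be surprising.
print(matching_ids("foo-.example.com"))    # [1]
```

Note that this only answers "which patterns match this full string". A fragment such as foo-11 does not match, so the partial-match requirement from the question is not covered by this approach.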

About mysql regex,how do I search and return string use mysql regex

The value of my table field is "<script type="text/javascript"src="http://localhost:8080/db/widget/10217EN/F"></script>".
I want to parse this string and fetch the id 10217. How can I do this with a MySQL regex?
I know that Python's regex group function can return the id 10217, but I'm not familiar with MySQL regex.
Please help. Thank you very much.
MySQL regular expressions do not support subpattern extraction. You will probably have better luck iterating over all of the rows in your database and storing the results in a new column.
As far as I know, you can't use MySQL's REGEXP operator for substring retrieval; it is designed for use in WHERE clauses and is limited to returning 0 or 1 to indicate failure or success at a match. (MySQL 8.0 adds a separate REGEXP_SUBSTR() function that returns the matched text.)
Since your pattern is pretty well defined, you can probably retrieve the id with a query that uses SUBSTR and LOCATE. It will be a bit of a mess since SUBSTR wants the start index and the length of the substring (it would be easier if it took the end index). Perhaps you could use TRIM to chop off the unwanted trailing part.
This query gets the id from the field:
SELECT substring_index(SUBSTRING_INDEX(testvar,'/',-3),'EN',1) from testtab;
where testtab is the table name and testvar is the field name.
The inner SUBSTRING_INDEX returns everything after the third-to-last /, which is:
mysql> SELECT SUBSTRING_INDEX(testvar,'/',-3) from testtab;
+----------------------------+
| SUBSTRING_INDEX(testvar,'/',-3) |
+----------------------------+
| 10217EN/F"> |
| 10222EN/F"> |
+----------------------------+
2 rows in set (0.00 sec)
The outer SUBSTRING_INDEX then keeps everything before the first EN:
mysql> SELECT substring_index(SUBSTRING_INDEX(testvar,'/',-3),'EN',1) from testtab;
+----------------------------------------------------+
| substring_index(SUBSTRING_INDEX(testvar,'/',-3),'EN',1) |
+----------------------------------------------------+
| 10217 |
| 10222 |
+----------------------------------------------------+
2 rows in set (0.00 sec)
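For readers who want to trace the two SUBSTRING_INDEX steps by hand, here is a small Python sketch. The substring_index helper is a hypothetical emulation written for this example (it is not a library function), and it only covers a non-empty delimiter. The last lines show the capture-group approach the question mentions, under the assumption that the id is always a run of digits followed by EN/.

```python
import re


def substring_index(text: str, delim: str, count: int) -> str:
    """Rough emulation of MySQL SUBSTRING_INDEX for a non-empty delimiter."""
    parts = text.split(delim)
    if count >= 0:
        # Everything before the count-th delimiter, counted from the left.
        return delim.join(parts[:count])
    # Everything after the |count|-th delimiter, counted from the right.
    return delim.join(parts[count:])


value = '<script type="text/javascript"src="http://localhost:8080/db/widget/10217EN/F"></script>'

inner = substring_index(value, "/", -3)
print(inner)                              # 10217EN/F"></script>
print(substring_index(inner, "EN", 1))    # 10217

# Capture-group alternative, done in application code rather than in SQL.
match = re.search(r"/(\d+)EN/", value)
print(match.group(1))                     # 10217
```

The string-splitting route depends on the number of / characters staying fixed, while the regex route depends only on the digits-then-EN/ shape, so which one is more robust depends on how the stored values vary.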