MySQL RLIKE behaviour for numbered string

I am using RLIKE to find some email domains with mysql.
Here is the Query:
SELECT something
FROM table1
WHERE SUBSTRING_INDEX(table1.email, "#", -1) RLIKE "test1.com"|"test2.com"
This matched every email domain containing a digit, for example:
aaa#domain0.com
Any idea why?
EDIT: I also noticed that it matches email domains that have at least two successive digits.
Really strange.

The pattern supplied to RLIKE or REGEXP needs to be one quoted string containing the entire regular expression. What you have are two double-quoted strings separated by |, which outside a string literal is the bitwise OR operator.
For the bitwise OR each string is cast to a number (0), so the right-hand side evaluates to 0, and RLIKE then uses 0 as its pattern. That's why the domain aaa#domain0.com is matched:
# The unquoted | evaluates this to zero:
mysql> SELECT "string" | "string";
+---------------------+
| "string" | "string" |
+---------------------+
|                   0 |
+---------------------+
# And zero matches domain0.com
mysql> SELECT 'domain0.com' RLIKE '0';
+-------------------------+
| 'domain0.com' RLIKE '0' |
+-------------------------+
|                       1 |
+-------------------------+
Instead, you would need to use RLIKE with a single quoted string and escape the . so that it matches a literal dot. Because MySQL string literals also treat the backslash as an escape character, the backslash has to be doubled (\\.). I'm also adding ^$ anchors so substrings are not matched.
WHERE SUBSTRING_INDEX(table1.email, "#", -1) RLIKE '^test1\\.com$|^test2\\.com$'
It could also be expressed as '^(test1\\.com|test2\\.com)$'. The trick is that | has very low precedence within a regular expression, so you need to ensure both ends are anchored for every alternative you want to match.
However, if you are just trying to match a list of domains, it is far easier to do it with IN (), where you simply list them:
WHERE SUBSTRING_INDEX(table1.email, "#", -1) IN ('test1.com', 'test2.com', 'test4.org')
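To check the anchored pattern without touching the table, it can be run against literal strings. This is a minimal sketch that assumes the default SQL mode (NO_BACKSLASH_ESCAPES not enabled), which is why the backslash is doubled inside the string literal. The test values are made up for illustration.

```sql
-- Hypothetical sample values, not taken from the table in the question.
SELECT 'test1.com'   RLIKE '^(test1\\.com|test2\\.com)$' AS exact_domain,    -- 1
       'test1xcom'   RLIKE '^(test1\\.com|test2\\.com)$' AS dot_is_literal,  -- 0: x is not a dot
       'xtest1.com'  RLIKE '^(test1\\.com|test2\\.com)$' AS anchored_start,  -- 0: ^ rejects the prefix
       'domain0.com' RLIKE '^(test1\\.com|test2\\.com)$' AS digit_domain;    -- 0: no longer matched
```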

Related

Use regular expression to match a substring of a pattern in mysql

I want to query all strings that match the following pattern in mysql.
At least one non-empty character,
followed by a literal dash character -,
then followed by at least one non-empty character,
then followed by the literal string in ('true')
the substring "and" cannot appear between - and in ('true').
For Example:
segment-123 in ('true')
matches the above pattern.
content-foo and segment in ('true')
does not match the above pattern because it has the substring "and" in between - and in ('true').
Is this achievable using REGEXP in mysql? Any help is greatly appreciated.
mysql> select 'segment-123 in (\'true\')'
regexp '[^[:space:]]+-[^[:space:]]+ in \\(\'true\'\\)' as matched;
+---------+
| matched |
+---------+
|       1 |
+---------+
mysql> select 'content-foo and segment in (\'true\')'
regexp '[^[:space:]]+-[^[:space:]]+ in \\(\'true\'\\)' as matched;
+---------+
| matched |
+---------+
|       0 |
+---------+
See https://dev.mysql.com/doc/refman/8.0/en/regexp.html#regexp-syntax for more documentation on the regular expression syntax in MySQL.

regular expression for MySQL always returns 1

I am trying to fetch rows from my database by checking if the json in one of their fields contains a specific id.
Example: col(kats): [2,4,7,9]
I am trying to do so by using the following query
SELECT column FROM table WHERE column REGEXP '(\[|\,)1(\]|\,)'
The Problem: MySQL returns 1 for every row in the table.
MySQL string literals treat the backslash \ as an escape character, so the backslashes you wrote to escape [ and ] are consumed before the pattern reaches the regular expression engine. The backslashes must therefore be escaped themselves: write \\[ and \\].
From the docs:
Because MySQL uses the C escape syntax in strings (for example, “\n” to represent the newline character), you must double any “\” that you use in your REGEXP strings.
The rest of your pattern is basically correct, except that the comma , does not require escaping.
1 does not match:
> SELECT '[2,4,7,9]' REGEXP '(\\[|,)1(\\]|,)';
+--------------------------------------+
| '[2,4,7,9]' REGEXP '(\\[|,)1(\\]|,)' |
+--------------------------------------+
|                                    0 |
+--------------------------------------+
1 row in set (0.00 sec)
But 2 does match
> SELECT '[2,4,7,9]' REGEXP '(\\[|,)2(\\]|,)';
+--------------------------------------+
| '[2,4,7,9]' REGEXP '(\\[|,)2(\\]|,)' |
+--------------------------------------+
|                                    1 |
+--------------------------------------+
1 row in set (0.00 sec)
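One way to see what the regular expression engine actually receives is to select the pattern as a plain string. A small sketch, assuming the default SQL mode in which the backslash is the string escape character:

```sql
-- The string literal parser runs first; REGEXP only sees what is left.
SELECT '(\[|\,)1(\]|\,)' AS single_backslash,  -- ([|,)1(]|,)    the backslashes are dropped
       '(\\[|,)1(\\]|,)' AS double_backslash;  -- (\[|,)1(\]|,)  one backslash survives
```

With single backslashes the brackets arrive unescaped, so the engine no longer treats them as literal characters.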

"Where" statement: match a single word (not substring)

I am using MySQL.
I have a car table in my database, and there is a name column in that table.
Suppose the name column of the table contain values:
+----------+
| name     |
+----------+
| AAA BB   |
| CC D BB  |
| OO kk BB |
| PP B CC  |
+----------+
I would like to search the table for rows where the name column contains the word "BB" (as a whole word, not a substring). What is the SQL command to achieve this?
I know LIKE , but it is used to match a contained substring, not for a word match.
P.S.
My table contains a large amount of data, so I probably need a more efficient way than using LIKE.
The values in the name column are random strings.
Please do not ask me to use IN (...), because the values in that column are unpredictable.
Try this WHERE clause:
WHERE name LIKE '% BB %'
OR name LIKE 'BB %'
OR name LIKE '% BB'
OR name = 'BB'
Note that this will not perform well if your table is large. You may also want to consider a full-text search if you need better performance.
You can use the REGEXP operator in MySQL:
SELECT *
FROM car
WHERE name REGEXP '[[:<:]]BB[[:>:]]'
It will match BB if it occurs as a single word. From the MySQL manual:
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]'; -> 1
mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]'; -> 0
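The [[:<:]] and [[:>:]] markers belong to the regular expression library used by older MySQL versions. From MySQL 8.0.4 onward the ICU library is used instead, and it does not support them; \b is the word-boundary marker there. A sketch for that case, reusing the car table and name column from the question (the literal test strings are illustrative):

```sql
-- MySQL 8.0.4+ only. The backslash is doubled because the string
-- literal parser consumes one level of escaping.
SELECT 'AAA BB'  REGEXP '\\bBB\\b' AS whole_word,   -- 1
       'AAA BBB' REGEXP '\\bBB\\b' AS inside_word;  -- 0

SELECT *
FROM car
WHERE name REGEXP '\\bBB\\b';
```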

MySQL concat() and lower() weirdness

Any idea why this works sensibly*:
mysql> select lower('AB100c');
+-----------------+
| lower('AB100c') |
+-----------------+
| ab100c          |
+-----------------+
1 row in set (0.00 sec)
But this doesn't?
mysql> select lower(concat('A', 'B', 100,'C'));
+----------------------------------+
| lower(concat('A', 'B', 100,'C')) |
+----------------------------------+
| AB100C                           |
+----------------------------------+
1 row in set (0.00 sec)
*sensibly = 'the way I think it should work.'
As stated on MySql String functions:
LOWER(str)
LOWER() is ineffective when applied to binary strings (BINARY, VARBINARY, BLOB).
CONCAT(str1,str2,...)
Returns the string that results from concatenating the arguments. May have one or more arguments. If all arguments are nonbinary strings, the result is a nonbinary string. If the arguments include any binary strings, the result is a binary string. A numeric argument is converted to its equivalent binary string form; if you want to avoid that, you can use an explicit type cast.
In your code you are passing 100 as a numeric argument, so CONCAT() returns a binary string, and LOWER() is ineffective when applied to binary strings. That is why the result is not converted to lowercase. To get the lowercase result, pass the number as a string:
select lower(concat('A', 'B', '100','C'));
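The explicit type cast mentioned in the quoted documentation is the other option, for instance when the number comes from a column or expression rather than a literal you can put in quotes. A minimal sketch:

```sql
-- CAST(... AS CHAR) yields a nonbinary string, so CONCAT() stays
-- nonbinary and LOWER() can do its job.
SELECT LOWER(CONCAT('A', 'B', CAST(100 AS CHAR), 'C'));  -- ab100c
```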
LOWER() converts strings to lowercase, but your value 100 is treated as numeric. If you still want the lowercase conversion, enclose the number in quotes like this:
select lower(concat('A', 'B', '100','C'));
I've tested this and it works fine.
And here is another example with CONCAT and LIKE:
LOWER(CONCAT(firstname, ' ', lastname)) LIKE LOWER('%my name%')

mySQL SELECT IN from string

Here is my table X:
id vals
---------------------
1 4|6|8|
Now table Y:
id name
--------------------
1 a
4 b
6 c
8 d
Now I want the following:
select * from Y where id IN (replace(select vals from X where id = '1'),'|',',')
But this does not seem to work. Any ideas why?
You can use FIND_IN_SET instead of IN; the plain IN keyword cannot search among comma-separated values within a single field.
For example:
mysql> select FIND_IN_SET(4, replace('4|6|8|','|',','));
+-------------------------------------------+
| FIND_IN_SET(4, replace('4|6|8|','|',',')) |
+-------------------------------------------+
|                                         1 |
+-------------------------------------------+
1 row in set (0.00 sec)
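Applied to the two tables from the question, this could look like the following sketch. It assumes X.vals always uses | as the separator; the trailing | only produces an empty last element, which does not match any id.

```sql
SELECT Y.*
FROM Y
JOIN X ON FIND_IN_SET(Y.id, REPLACE(X.vals, '|', ',')) > 0
WHERE X.id = 1;
-- For the sample data this returns the rows of Y with id 4, 6 and 8.
```

Because the join condition wraps Y.id in a function call, MySQL is unlikely to use an index on Y.id here, so this suits small tables better than large ones.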
REPLACE gives you a string back, but it is a single string value, not a piece of SQL text that becomes part of your query, so IN sees one value rather than a list.
What you can do is instead of using IN, use a REGEXP to match within your original string, for example:
vals REGEXP '[[:<:]]4[[:>:]]'
would be true only if there is a "4" in the original string that isn't part of a larger number (thus if you have 3|44|100 it wouldn't match on "4" but would match on "44").
The [[:<:]] and [[:>:]] are "left side of word" and "right side of word" respectively.
To generate that string, you can do something like...
CONCAT('[[:<:]]', CAST(id AS CHAR), '[[:>:]]')