Passing parameters to SQL regular expressions - mysql

I was wondering if it is possible to pass parameters to regular expressions as if they were literal strings in the MySQL REGEXP function. What I would like to do is the following:
SELECT ? REGEXP CONCAT('string', ?, 'string')
Now, when I pass a dot (".") as the second parameter, it matches any character, as expected. This means that strings like "stringastring" and "stringbstring" match the pattern. I wondered if it is possible to match only a literal dot, so that only "string.string" matches in this case. Is there a way to do this with a MySQL regular expression that does not involve explicitly escaping the parameter (which defeats the purpose of passing parameters in the first place)?

Try putting the dot inside square brackets, as in:
SELECT ? REGEXP CONCAT('string', '[.]', 'string')
See here: http://sqlfiddle.com/#!2/a3059/1

If I understand your question correctly I think you are looking for this:
SELECT ? REGEXP CONCAT('string', REPLACE(?, '.', '[.]'), 'string')
Using the REPLACE function, every dot is escaped to [.], but all other special characters are passed through unchanged, so they still act as regular-expression operators.
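A minimal sketch of the REPLACE approach, using Python's re.search as a stand-in for MySQL's REGEXP (both succeed if the pattern matches anywhere in the string). The helper name escape_dots_only is hypothetical; it mirrors REPLACE(?, '.', '[.]'). The last check shows the limitation: a parameter such as "a+" still behaves as a regex operator.

```python
import re

def escape_dots_only(param: str) -> str:
    # Mirrors REPLACE(?, '.', '[.]'): every dot becomes a one-character bracket expression
    return param.replace(".", "[.]")

def regexp(value: str, param: str) -> bool:
    # Mirrors: value REGEXP CONCAT('string', <escaped param>, 'string')
    pattern = "string" + escape_dots_only(param) + "string"
    return re.search(pattern, value) is not None

print(regexp("string.string", "."))     # True: the dot is now literal
print(regexp("stringastring", "."))     # False
print(regexp("stringaaastring", "a+"))  # True: '+' was not escaped, so it still means "one or more"
```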

For those using PHP: I found a related answer on Stack Overflow, though not specifically about MySQL queries. The idea is to use preg_quote() to escape regular-expression metacharacters. In your case, you have to apply it twice to your parameter, because the escaped value is embedded in a quoted SQL string literal, and MySQL strips one level of backslashes when it parses that literal, before the regular-expression engine sees the pattern:
$param = preg_quote(preg_quote($param));
Then:
SELECT ? REGEXP CONCAT('string', '$param', 'string')
Read this article and the comments to find out more
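To see why the escaping is applied twice, here is a rough Python model. regex_quote is a hypothetical stand-in for preg_quote() (its exact character set is an assumption), and mysql_unescape is a simplified model of how MySQL parses backslash escapes inside a quoted string literal, following the escape table in the manual's String Literals section. One level of backslashes is consumed by the literal parser, so only the doubly quoted value reaches the regex engine with its escape intact. If the value were instead bound to a placeholder, it would not be parsed as a literal, and under this model a single level of escaping should suffice.

```python
import re

# Assumed metacharacter set, similar in spirit to PHP's preg_quote()
REGEX_META = set(".\\+*?[^]$(){}=!<>|:-#")

def regex_quote(s: str) -> str:
    # Put a backslash before every regex metacharacter
    return "".join("\\" + c if c in REGEX_META else c for c in s)

# Escape sequences MySQL recognizes inside string literals
MYSQL_ESCAPES = {"0": "\0", "'": "'", '"': '"', "b": "\b", "n": "\n",
                 "r": "\r", "t": "\t", "Z": "\x1a", "\\": "\\"}

def mysql_unescape(body: str) -> str:
    # Simplified model of MySQL parsing the text between the quotes
    out, i = [], 0
    while i < len(body):
        c = body[i]
        if c == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in "%_":
                out.append("\\" + nxt)  # \% and \_ are kept as-is (for LIKE)
            else:
                out.append(MYSQL_ESCAPES.get(nxt, nxt))  # unknown escapes drop the backslash
            i += 2
        else:
            out.append(c)
            i += 1
    return "".join(out)

once = mysql_unescape(regex_quote("."))                # becomes .   (escape lost, matches anything)
twice = mysql_unescape(regex_quote(regex_quote(".")))  # becomes \.  (literal dot reaches the regex)
print(re.search("string" + twice + "string", "stringastring"))  # None
```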

Related

SQL - match last two characters in a string

I have a small MySQL database with a column whose values have the following format:
x_1_1,
x_1_2,
x_1_2,
x_2_1,
x_2_12,
x_3_1,
x_3_2,
x_3_11,
I want to extract the rows whose value ends with '_1'. So if I run a query on the sample dataset above, it should return
x_1_1,
x_2_1,
x_3_1,
This should not return x_2_12 or x_3_11.
I tried LIKE '%_1', but it returns x_3_11 as well.
Thank you!
A simple method is the right() function:
select t.*
from t
where right(field, 2) = '_1';
You can use LIKE, but you need to escape the _:
where field like '%$_1' escape '$'
Or use regular expressions:
where field regexp '_1$'
The underscore character has special significance in a LIKE clause. It acts as a wildcard and represents exactly one character, so you have to escape it with a backslash:
LIKE '%\_1'
RIGHT does the job too, but it requires that you provide the proper length for the string being sought and is thus less flexible.
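The approaches above can be compared with a small runnable sketch. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL, which is an assumption with a few consequences: SQLite has no RIGHT(), so substr(field, -2) takes its place; SQLite's LIKE has no default backslash escape, so only the explicit ESCAPE '$' form is shown; and the REGEXP operator needs a user-defined regexp() function, here backed by Python's re.

```python
import re
import sqlite3

# Sample values from the question (without the repeated x_1_2)
rows = ["x_1_1", "x_1_2", "x_2_1", "x_2_12", "x_3_1", "x_3_2", "x_3_11"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field TEXT)")
conn.executemany("INSERT INTO t (field) VALUES (?)", [(r,) for r in rows])
# SQLite evaluates "X REGEXP Y" as regexp(Y, X), so the pattern comes first
conn.create_function("regexp", 2, lambda pattern, value: re.search(pattern, value) is not None)

def query(where):
    sql = f"SELECT field FROM t WHERE {where} ORDER BY field"
    return [row[0] for row in conn.execute(sql)]

unescaped = query("field LIKE '%_1'")            # '_' is a wildcard: also matches x_3_11
escaped = query("field LIKE '%$_1' ESCAPE '$'")  # '$_' is a literal underscore
suffix = query("substr(field, -2) = '_1'")       # SQLite's equivalent of RIGHT(field, 2)
regex = query("field REGEXP '_1$'")              # '_' is an ordinary character in a regex

print(unescaped)  # ['x_1_1', 'x_2_1', 'x_3_1', 'x_3_11']
print(escaped)    # ['x_1_1', 'x_2_1', 'x_3_1']
```

The suffix and regex queries return the same three rows as the escaped LIKE.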
Duh, I found the answer.
Use RIGHT(col_name, 2) = '_1'
Thank you!

Types of Wildcards in MySQL

My query:
Select * From tableName Where columnName Like "[PST]%"
is not giving the expected result.
Why does this wildcard not work in MySQL?
If you want to filter on strings that contain any 'P', 'S', or 'T', then you can use a regex:
where col rlike '[PST]'
If you want strings that contain the substring 'PST', there is no need for square brackets, and LIKE is enough:
where col like '%PST%'
If you want the substring 'PST' at the start of the string, the regex solution looks like:
where col rlike '^PST'
And the like option would be:
where col like 'PST%'
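A quick way to compare these patterns, plus '^[PST]' (any one of the three letters at the start, which is what the original '[PST]%' was aiming for), is Python's re.search used as a stand-in for RLIKE. The sample values are made up, and matching here is case-sensitive, like a binary collation; MySQL's default collations are case-insensitive.

```python
import re

# Hypothetical sample values
values = ["Pear", "Sun", "Tomato", "PST zone", "Apple", "xPST"]

def rlike(pattern):
    # Like MySQL's RLIKE/REGEXP, re.search succeeds if the pattern matches anywhere
    return [v for v in values if re.search(pattern, v)]

any_of_three = rlike("[PST]")      # contains P, S or T anywhere
substring = rlike("PST")           # same rows as LIKE '%PST%'
starts_with = rlike("^PST")        # same rows as LIKE 'PST%'
starts_with_one = rlike("^[PST]")  # starts with P, S or T

print(starts_with_one)  # ['Pear', 'Sun', 'Tomato', 'PST zone']
```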
MySQL's LIKE syntax is documented here: https://dev.mysql.com/doc/refman/8.0/en/pattern-matching.html
Standard SQL from decades ago defined only two wildcards: % and _. These are the only wildcards an SQL product needs to support to claim SQL compliance for the LIKE predicate.
% matches zero or more characters of any kind. It's analogous to .* in regular expressions.
_ matches exactly one of any character. It's analogous to . in regular expressions.
Also if you want to match a literal '%' or '_', you need to escape it, i.e. put a backslash before it:
WHERE title LIKE 'The 7\% Solution'
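The two standard wildcards map directly onto regular expressions, which a short Python sketch can make concrete. like() is a hypothetical helper that translates a LIKE pattern (with a backslash escape, as MySQL uses by default) into a whole-string regex match; it is case-sensitive, unlike MySQL's default collations. It also shows why the original "[PST]%" query failed: brackets are ordinary characters in a standard LIKE pattern.

```python
import re

def like(value: str, pattern: str, escape: str = "\\") -> bool:
    # Translate a SQL LIKE pattern into a regex that must match the whole value
    parts, i = [], 0
    while i < len(pattern):
        c = pattern[i]
        if c == escape and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped character matches literally
            i += 2
            continue
        if c == "%":
            parts.append(".*")  # zero or more characters
        elif c == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(c))
        i += 1
    return re.fullmatch("".join(parts), value, re.DOTALL) is not None

print(like("The 7% Solution", "The 7\\% Solution"))  # True
print(like("The 75 Solution", "The 7\\% Solution"))  # False: \% is a literal percent sign
print(like("Pear", "[PST]%"))                        # False: [ and ] are literal here
print(like("[PST] notes", "[PST]%"))                 # True
```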
Microsoft SQL Server's LIKE syntax is documented here: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/like-transact-sql?view=sql-server-ver15
They support the % and _ wildcards (a literal wildcard is matched by enclosing it in brackets, such as [%], or by declaring an escape character with the ESCAPE clause), but they extend standard SQL with two other forms:
[a-z] matches one character, but only characters in the range inside the brackets. This is similar to regular expressions. The - is a range operator, unless it appears at the start or end of the list inside the brackets.
[^a-z] matches one character, which must not be one of the characters in the range inside the brackets. This is also the same as in regular expressions.
These are not standard forms of wildcards for the LIKE predicate, and other brands of SQL database don't support them.
Later versions of the SQL standard introduced a new predicate, SIMILAR TO, which supports much richer patterns and wildcards, since its right-hand operand is a string containing a regular expression. But because this predicate arrived in a later edition of the SQL standard, some implementations had already developed their own, nearly equivalent solutions.
MySQL called the operator REGEXP, and RLIKE is a synonym (https://dev.mysql.com/doc/refman/8.0/en/regexp.html).
It was requested in https://bugs.mysql.com/bug.php?id=746 to support SIMILAR TO syntax to help MySQL comply with the SQL standard, but the request was turned down, because SIMILAR TO has subtly different behavior from the existing REGEXP/RLIKE operator.
Microsoft SQL Server has partial support for regular-expression-style wildcards in the LIKE operator, and a regular-expression function such as dbo.RegexMatch() can be added as a user-defined (CLR) function.
SQLite has a GLOB operator, and so on.
Thanks everyone!
For this specific question, we need to use REGEXP:
Select * From tableName Where ColumnName Regexp "^[PST]";
For more detail on regular expressions (REGEXP):
https://www.youtube.com/watch?v=KoltE-JUY0c

Using MySQL LIKE operator for fields encoded in JSON

I've been trying to get a table row with this query:
SELECT * FROM `table` WHERE `field` LIKE "%\u0435\u0442\u043e\u0442%"
Field itself:
Field
--------------------------------------------------------------------
\u0435\u0442\u043e\u0442 \u0442\u0435\u043a\u0441\u0442 \u043d\u0430
However, I can't seem to get it working properly.
I've already tried experimenting with the backslash character:
LIKE "%\\u0435\\u0442\\u043e\\u0442%"
LIKE "%\\\\u0435\\\\u0442\\\\u043e\\\\u0442%"
But none of them seems to work either.
I'd appreciate it if someone could give a hint as to what I'm doing wrong.
Thanks in advance!
EDIT
Problem solved.
Solution: even after correcting the syntax of the query, it didn't return any results. After making the field BINARY, the query started working.
As documented under String Comparison Functions:
Note
Because MySQL uses C escape syntax in strings (for example, “\n” to represent a newline character), you must double any “\” that you use in LIKE strings. For example, to search for “\n”, specify it as “\\n”. To search for “\”, specify it as “\\\\”; this is because the backslashes are stripped once by the parser and again when the pattern match is made, leaving a single backslash to be matched against.
Therefore:
SELECT * FROM `table` WHERE `field` LIKE '%\\\\u0435\\\\u0442\\\\u043e\\\\u0442%'
See it on sqlfiddle.
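The backslash counting can be traced with a rough Python model: mysql_unescape is a simplified model of MySQL's string-literal parsing (the first pass), and like applies LIKE's own backslash escape (the second pass). Both helpers are hypothetical, and the model ignores collations, so it cannot reproduce the BINARY issue mentioned in the question's edit.

```python
import re

# Escape sequences MySQL recognizes inside string literals (simplified model)
MYSQL_ESCAPES = {"0": "\0", "'": "'", '"': '"', "b": "\b", "n": "\n",
                 "r": "\r", "t": "\t", "Z": "\x1a", "\\": "\\"}

def mysql_unescape(body: str) -> str:
    # First pass: the SQL parser turns the quoted literal into a string value
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # \% and \_ are kept as-is; unknown escapes drop the backslash
            out.append("\\" + nxt if nxt in "%_" else MYSQL_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)

def like(value: str, pattern: str) -> bool:
    # Second pass: LIKE uses backslash as its escape character
    parts, i = [], 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        parts.append(".*" if c == "%" else "." if c == "_" else re.escape(c))
        i += 1
    return re.fullmatch("".join(parts), value, re.DOTALL) is not None

field = r"\u0435\u0442\u043e\u0442 \u0442\u0435\u043a\u0441\u0442 \u043d\u0430"
two = r"%\\u0435\\u0442\\u043e\\u0442%"           # pattern text as typed in the query
four = r"%\\\\u0435\\\\u0442\\\\u043e\\\\u0442%"

print(like(field, mysql_unescape(two)))   # False: LIKE sees \u and treats it as a plain "u"
print(like(field, mysql_unescape(four)))  # True: LIKE sees \\u, i.e. a literal backslash
```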
This may be useful for those who use PHP; it works for me:
$where[] = 'organizer_info LIKE(CONCAT("%", :organizer, "%"))';
$bind['organizer'] = str_replace('"', '', quotemeta(json_encode($orgNameString)));
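A similar idea in Python, as a sketch: json.dumps produces the \uXXXX escapes stored in the column, and because the value is bound to a placeholder rather than parsed as a string literal, only LIKE's own escape level needs doubling. The helper name json_like_param is hypothetical, and it does not escape % or _ in the search text.

```python
import json

def json_like_param(text: str) -> str:
    # json.dumps escapes non-ASCII as \uXXXX (ensure_ascii=True is the default)
    encoded = json.dumps(text)[1:-1]  # drop the surrounding double quotes
    # A bound value skips string-literal parsing, so double each backslash once for LIKE
    return "%" + encoded.replace("\\", "\\\\") + "%"

param = json_like_param("\u0435\u0442\u043e\u0442")  # the Cyrillic word from the question
print(param)  # %\\u0435\\u0442\\u043e\\u0442%
```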

How to select records from mysql database by regex

I have a regexp to validate a user's email address.
/^(|(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*[A-Za-z0-9]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,})$/i
With the help of Active Record, I want to fetch all the users from the database whose email address doesn't match this regexp. I tried the following scope to achieve this, but all I get is an ActiveRecord::Relation.
scope :not_match_email_regex, :conditions => ["NOT email REGEXP ?'", /^(|(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*[A-Za-z0-9]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,})$/]
This gives me the following query:
SELECT `users`.* FROM `users` WHERE (email REGEXP '--- !ruby/regexp /^(|(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\\-+)|([A-Za-z0-9]+\\.+)|([A-Za-z0-9]+\\++))*[A-Za-z0-9]+@((\\w+\\-+)|(\\w+\\.))*\\w{1,63}\\.[a-zA-Z]{2,})$/\n...\n')
I also tried to define this scope in the following way with the same result:
scope :not_match_email_regex, :conditions => ["email REGEXP '(|(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*[A-Za-z0-9]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,})'"]
The query it generates is:
SELECT `users`.* FROM `users` WHERE (email REGEXP '(|(([A-Za-z0-9]+_+)|([A-Za-z0-9]+-+)|([A-Za-z0-9]+.+)|([A-Za-z0-9]+++))*[A-Za-z0-9]+@((w+-+)|(w+.))*w{1,63}.[a-zA-Z]{2,})')
How can I fetch all records that match or don't match the given regex?
EDIT 12-11-30: small corrections, partly based on the comment by @innocent_rifle
The regexp suggested here tries to make the same matches as the one in the original question.
1. When I first wrote my solution, I forgot that you must escape \ in strings, because I was testing directly in MySQL. Discussing regexps written inside strings is confusing, so I will use this form instead, e.g. /dot\./.source, which (in Ruby) gives "dot\\.".
2. REGEXP in MySQL (manual for 5.6, tested in 5.0.67) uses "C escape syntax in strings", so WHERE email REGEXP '\.' is still the same as WHERE email REGEXP '.'. To find the character ".", you must use WHERE email REGEXP '\\.', and to achieve that you must use the code .where([ 'email REGEXP ?', "\\\\."]). It's more readable to use .where([ 'email REGEXP ?', /\\./.source ]) (MySQL needs 2 escapes). However, I prefer to use .where([ 'email REGEXP ?', /[.]/.source ]), because then I don't have to worry about how many escapes are needed.
3. You don't need to escape "-" in a regexp, not even inside [], as long as it is the first or last character there.
Some errors I found: the first alternation "|" in your expression (the empty alternative in "(|"), and the regexp should be passed to the query as a String, or via Regexp#source, which I prefer. I think there was also an extra quote at the end.
Apart from that, are you really sure the regexp works? Have you tried it on a string in the console?
Also be aware that rows where email is NULL in the database won't be caught; in that case you must add (<your existing expr in parentheses>) OR email IS NULL
Regexp syntax in my MySQL version.
I also tested what @Olaf Dietsche wrote in his suggestion; it seems that change is not needed, but I strongly recommend following the standard syntax anyway (NOT (expr REGEXP pat) or expr NOT REGEXP pat).
I have done some checking, and these things must be changed: use [A-Za-z0-9_] instead of \w, and \+ is not valid: you must use \\+ ("\\\\+" in a string), or more simply [+] (in either a Regexp or a string).
This leads to the following REGEXP in MySQL:
'^(([A-Za-z0-9]+_+)|([A-Za-z0-9]+-+)|([A-Za-z0-9]+[.]+)|([A-Za-z0-9]+[+]+))*[A-Za-z0-9]+@(([A-Za-z0-9]+-+)|([A-Za-z0-9]+[.]))*[A-Za-z0-9]{1,63}[.][a-zA-Z]{2,}$'
Small change suggestions
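I don't fully understand your regexp, so these changes are meant only to simplify it, not to change what it finds (although the merged character classes are slightly more permissive; for example, they also accept mixed separators such as "a-.b").
First, change the whole string as described above.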
Then change
(([A-Za-z0-9]+_+)|([A-Za-z0-9]+-+)|([A-Za-z0-9]+[.]+)|([A-Za-z0-9]+[+]+))*
to
([A-Za-z0-9]+[-+_.]+)*
and
@(([A-Za-z0-9]+-+)|([A-Za-z0-9]+[.]))*
to
@([A-Za-z0-9]+[-.]+)*
Final code (change it to the ..., :conditions => ... syntax if you prefer). I tried to make this find the same strings as in the comment by @innocent_rifle, only adding "_" to the expressions to the right of the @.
.where([ 'NOT (email REGEXP ?)', /^([A-Za-z0-9]+[-+_.]+)*[A-Za-z0-9]+@([A-Za-z0-9]+[-._]+)*[A-Za-z0-9_]{1,63}[.][A-Za-z]{2,}$/.source ])
For validating email addresses, you might want to consider How to Find or Validate an Email Address. At least, this regexp looks a bit simpler.
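Following the advice to try the regexp on some strings first, here is a sketch that runs the final pattern, written with the @ separator of an email address, through SQLite via Python's sqlite3 as a stand-in for MySQL. The table, the sample addresses and the regexp() helper are assumptions; the pattern only uses bracket expressions, groups and bounded repetition, so Python's re should treat it the same way. It also illustrates the NULL caveat: for a NULL email, NOT (email REGEXP ?) evaluates to NULL, so that row is only returned with the extra OR email IS NULL.

```python
import re
import sqlite3

EMAIL_RE = r"^([A-Za-z0-9]+[-+_.]+)*[A-Za-z0-9]+@([A-Za-z0-9]+[-._]+)*[A-Za-z0-9_]{1,63}[.][A-Za-z]{2,}$"

def regexp(pattern, value):
    # SQL semantics: a NULL operand yields NULL (None), not false
    if value is None:
        return None
    return re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)  # SQLite calls regexp(pattern, value)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [("john.doe@example.com",), ("a+b@sub.example.org",),
                  ("not an email",), ("x@y.c",), (None,)])

def ids(where):
    sql = f"SELECT id FROM users WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, (EMAIL_RE,))]

invalid = ids("NOT (email REGEXP ?)")                          # NULL row is silently skipped
invalid_or_null = ids("NOT (email REGEXP ?) OR email IS NULL")

print(invalid, invalid_or_null)  # [3, 4] [3, 4, 5]
```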
According to MySQL - Regular Expressions the proper syntax is
expr REGEXP pat
for a match, and
expr NOT REGEXP pat or NOT (expr REGEXP pat)
for the opposite. Don't forget the parentheses in the second version.

How to use regex in MySQL?

I want to filter out the rows whose field is not like '%_[0-9]+',
but it turns out that MySQL doesn't treat it as a regex.
Is that possible in MySQL?
That happens because LIKE does not accept a regular expression as its pattern. There is REGEXP for such things; note that % is not a wildcard in a regular expression, so anchor the pattern with $ instead:
WHERE field NOT REGEXP '_[0-9]+$'
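In a regular expression, % is an ordinary character rather than a wildcard, so a pattern like '%_[0-9]+' only matches a literal "%_" followed by digits; "ends with an underscore and digits" is written '_[0-9]+$'. A small Python sketch, with re.search standing in for REGEXP and made-up sample values, shows the difference.

```python
import re

# Hypothetical sample values
values = ["item_12", "item_x", "item", "a%_3"]

def not_regexp(pattern):
    # Rows kept by: field NOT REGEXP pattern (re.search matches anywhere, like REGEXP)
    return [v for v in values if not re.search(pattern, v)]

print(not_regexp("%_[0-9]+"))  # ['item_12', 'item_x', 'item']: only 'a%_3' has a literal "%_"
print(not_regexp("_[0-9]+$"))  # ['item_x', 'item']
```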