How to select records from mysql database by regex - mysql

I have a regexp to validate a user's email address:
/^(|(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*[A-Za-z0-9]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,})$/i
With the help of Active Record, I want to fetch from the database all the users whose email address doesn't match this regexp. I tried the following scope to achieve the desired result, but all I get is an ActiveRecord::Relation.
scope :not_match_email_regex, :conditions => ["NOT email REGEXP ?'", /^(|(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*[A-Za-z0-9]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,})$/"]
This gives me the following query:
SELECT `users`.* FROM `users` WHERE (email REGEXP '--- !ruby/regexp /^(|(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\\-+)|([A-Za-z0-9]+\\.+)|([A-Za-z0-9]+\\++))*[A-Za-z0-9]+@((\\w+\\-+)|(\\w+\\.))*\\w{1,63}\\.[a-zA-Z]{2,})$/\n...\n')
I also tried to define this scope in the following way with the same result:
scope :not_match_email_regex, :conditions => ["email REGEXP '(|(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*[A-Za-z0-9]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,})'"]
The query it generates is:
SELECT `users`.* FROM `users` WHERE (email REGEXP '(|(([A-Za-z0-9]+_+)|([A-Za-z0-9]+-+)|([A-Za-z0-9]+.+)|([A-Za-z0-9]+++))*[A-Za-z0-9]+@((w+-+)|(w+.))*w{1,63}.[a-zA-Z]{2,})')
How can I fetch all records that match or don't match the given regex?

EDIT 12-11-30: small corrections, partly following the comment by @innocent_rifle.
The regexp suggested here tries to make the same matches as the one in the original question.
1. When I first wrote my solution I forgot that you must escape \ in strings, because I was testing directly in MySQL. Writing regexps inside strings is confusing when discussing them, so I will use this form instead: /dot\./.source, which (in Ruby) gives "dot\\.".
2. REGEXP in MySQL (manual for 5.6, tested in 5.0.67) uses "C escape syntax in strings", so WHERE email REGEXP '\.' is still the same as WHERE email REGEXP '.'. To find the character "." you must use WHERE email REGEXP '\\.', and to achieve that you must use the code .where([ 'email REGEXP ?', "\\\\."]). It's more readable to use .where([ 'email REGEXP ?', /\\./.source ]) (MySQL needs 2 escapes). However, I prefer .where([ 'email REGEXP ?', /[.]/.source ]), because then I don't have to worry about how many escapes are needed.
3. You don't need to escape "-" in a regexp, not even inside [] as long as it is the first or the last character there.
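To see what the Ruby expressions from point 2 actually contain, it can help to print their characters. This is a minimal sketch in plain Ruby with no database involved; the constant names are made up for illustration, and it only shows the Ruby side, not what MySQL finally receives.

```ruby
# Each constant is a plain Ruby String; we inspect its raw characters.
FROM_STRING = "\\\\."        # the string literal from point 2
FROM_REGEXP = /\\./.source   # the same characters, written as a Regexp
BRACKETED   = /[.]/.source   # no backslashes at all

puts FROM_STRING   # prints \\.
puts FROM_REGEXP   # prints \\.
puts BRACKETED     # prints [.]

# The first two are identical: two backslashes followed by a dot.
puts FROM_STRING == FROM_REGEXP   # prints true
puts FROM_STRING.length           # prints 3
```

The bracket form reads the same at every layer, which is why it avoids counting backslashes.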
Some errors I found: the first regexp-or "|" in your expression (the one right after the opening parenthesis), and the pattern should be passed to the query as a String, or via Regexp#source, which I prefer. I think there was also an extra quote at the end.
Apart from that, are you really sure the regexp works? Have you tried it in the console on a string?
Also be aware that you won't catch emails that are NULL in the database; in that case you must add (<your existing expr in parentheses>) OR email IS NULL
Regexp syntax in my MySQL version.
I also tested what @Olaf Dietsche wrote in his suggestion. It seems that it's not needed, but it's strongly recommended to follow the standard syntax anyway (NOT (expr REGEXP pat) or expr NOT REGEXP pat).
I have done some checking, and these things must be changed: use [A-Za-z0-9_] instead of \w, and \+ is not valid, so you must use \\+ ("\\\\+" in a string) or, more easily, [+] (in both a Regexp and a string).
This leads to the following REGEXP in MySQL:
'^(([A-Za-z0-9]+_+)|([A-Za-z0-9]+-+)|([A-Za-z0-9]+[.]+)|([A-Za-z0-9]+[+]+))*[A-Za-z0-9]+@(([A-Za-z0-9]+-+)|([A-Za-z0-9]+[.]))*[A-Za-z0-9]{1,63}[.][a-zA-Z]{2,}$'
Small change suggestions
I don't understand your regexp exactly, so this is only changing your regexp without changing what it will find.
First: change the whole string as I described above
Then change
(([A-Za-z0-9]+_+)|([A-Za-z0-9]+-+)|([A-Za-z0-9]+[.]+)|([A-Za-z0-9]+[+]+))*
to
([A-Za-z0-9]+[-+_.]+)*
and
@(([A-Za-z0-9]+-+)|([A-Za-z0-9]+[.]))*
to
@([A-Za-z]+[-.]+)*
Final code (change to the ..., :conditions => ... syntax if you prefer that). I tried to make this find the same strings as in the comment by @innocent_rifle, only adding "_" to the expressions to the right of @.
.where([ 'NOT (email REGEXP ?)', /^([A-Za-z0-9]+[-+_.]+)*[A-Za-z0-9]+@([A-Za-z0-9]+[-._]+)*[A-Za-z0-9_]{1,63}[.][A-Za-z]{2,}$/.source ])
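The answer asks whether the regexp has been tried in the console on a string. A rough way to do that is sketched below, assuming "@" separates the local part from the domain. The sample addresses are invented for the check, and Ruby's regexp engine only stands in for MySQL's, so details may differ.

```ruby
# Pattern from the final code, with "@" between local part and domain.
EMAIL_RE = /^([A-Za-z0-9]+[-+_.]+)*[A-Za-z0-9]+@([A-Za-z0-9]+[-._]+)*[A-Za-z0-9_]{1,63}[.][A-Za-z]{2,}$/

# Made-up sample addresses, not taken from the question.
SAMPLES = [
  "john.doe@example.com",
  "a+b@mail.example.org",
  ".leading@example.com",
  "no-at-sign.example.com"
]

SAMPLES.each do |address|
  verdict = EMAIL_RE.match?(address) ? "match" : "no match"
  puts "#{address}: #{verdict}"
end

# Regexp#source returns the pattern as a String, which is the form
# passed as the bind value in .where([...]).
puts EMAIL_RE.source.class   # prints String
```

The first two samples match and the last two do not, so a NOT (email REGEXP ?) condition built from this pattern would select only the last two.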

For validating email addresses, you might want to consider How to Find or Validate an Email Address. At the very least, the regexp there looks a bit simpler.
According to MySQL - Regular Expressions, the proper syntax is
expr REGEXP pat
for a match, and
expr NOT REGEXP pat or NOT (expr REGEXP pat)
for the opposite. Don't forget the parentheses in the second version.

Related

SQL Regex last character search not working

I'm using a regex for a specific search, but the last separator is being ignored.
It must search for |49213[A-Z]| but it searches for |49213[A-Z]
SELECT * FROM table WHERE (data REGEXP '/\|49213[A-Z]+\|/')
Why are you using | in the pattern? Why the +?
SELECT * FROM table WHERE (data REGEXP '\|49213[A-Z]\|')
If you want multiple:
SELECT * FROM table WHERE (data REGEXP '\|49213[A-Z]+\|')
or:
SELECT * FROM table WHERE (data REGEXP '[|]49213[A-Z][|]')
Aha. That is rather subtle.
\ escapes certain characters that have special meaning.
But it does not seem to do so for | ("or") or . ("any byte"), etc.
So, \| is the same as |.
But the regexp parser does not like having either side of "or" being empty. (I suspect this is a "bug"). Hence the error message.
https://dev.mysql.com/doc/refman/5.7/en/regexp.html says
To use a literal instance of a special character in a regular expression, precede it by two backslash (\) characters. The MySQL parser interprets one of the backslashes, and the regular expression library interprets the other. For example, to match the string 1+2 that contains the special + character, only the last of the following regular expressions is the correct one:
mysql> SELECT '1+2' REGEXP '1+2';    -> 0
mysql> SELECT '1+2' REGEXP '1\+2';   -> 0
mysql> SELECT '1+2' REGEXP '1\\+2';  -> 1
The best fix seems to be [|] or \\| instead of \| when you want the pipe character.
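The difference between a bare | and [|] can be checked outside the database. The sketch below uses Ruby's regexp engine as a stand-in, which is an assumption: Ruby accepts an empty alternative, whereas this thread reports that MySQL rejects it. The sample rows are invented.

```ruby
BARE_PIPES    = /|49213[A-Z]|/      # "|" is alternation; the outer branches are empty
BRACKET_PIPES = /[|]49213[A-Z][|]/  # "[|]" is a literal pipe character

ROW_WITH_CODE    = "a|49213B|c"
ROW_WITHOUT_CODE = "nothing relevant here"

puts BARE_PIPES.match?(ROW_WITHOUT_CODE)     # true: an empty branch matches anywhere
puts BRACKET_PIPES.match?(ROW_WITHOUT_CODE)  # false
puts BRACKET_PIPES.match?(ROW_WITH_CODE)     # true
puts BRACKET_PIPES.match?("a|49213B")        # false: the closing pipe is required
```

Because [|] contains no backslash, it survives the string parser unchanged, which is what makes it the less error-prone spelling.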
Someday, the REGEXP parser in MySQL will be upgraded to PCRE as in MariaDB. Then a lot more features will come, and this 'bug' may go away.

Passing parameters to SQL regular expressions

I was wondering if it is possible to pass parameters to regular expressions as if they were literal strings in the MySQL REGEXP function. What I would like to do is the following:
SELECT ? REGEXP CONCAT('string', ?, 'string')
Now when I pass a dot (".") to the second parameter, it automatically matches any character, as expected. This means that strings like "stringastring" and "stringbstring" match the pattern. I wondered if it is possible to match the literal dot only, so as to only match "string.string" in this case. Is there a way to do such a thing with a MySQL regular expression that does not involve explicitly escaping the parameter (which defeats the purpose of passing parameters in the first place)?
Try putting brackets, as in:
SELECT ? REGEXP CONCAT('string', '[.]', 'string')
See here: http://sqlfiddle.com/#!2/a3059/1
If I understand your question correctly I think you are looking for this:
SELECT ? REGEXP CONCAT('string', REPLACE(?, '.', '[.]'), 'string')
Using the REPLACE function, any dot is always escaped to [.], but all other special characters are passed through unchanged and keep their regexp meaning.
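The effect of REPLACE(?, '.', '[.]') can be mimicked in application code to see which inputs it protects. This is only a sketch: bracket_dots and build_pattern are hypothetical helpers, and Ruby's regexp engine stands in for MySQL's.

```ruby
# Hypothetical helper mirroring REPLACE(?, '.', '[.]'): only dots are neutralised.
def bracket_dots(param)
  param.gsub(".", "[.]")   # a String pattern is matched literally by gsub
end

# Hypothetical helper mirroring CONCAT('string', ..., 'string').
def build_pattern(param)
  "string" + bracket_dots(param) + "string"
end

DOT_PATTERN = Regexp.new(build_pattern("."))

puts build_pattern(".")                   # prints string[.]string
puts DOT_PATTERN.match?("string.string")  # prints true
puts DOT_PATTERN.match?("stringastring")  # prints false

# Other metacharacters are untouched, so "*" still acts as a quantifier.
puts build_pattern("*")                   # prints string*string
```

Covering every metacharacter would need one replacement per character, which is the gap the preg_quote() suggestion tries to close.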
For those who are using PHP: I found a related answer on Stack Overflow, though not directly for MySQL queries. The idea is to use preg_quote() to escape regexp metacharacters. In your case you have to apply it twice to your parameter:
$param = preg_quote(preg_quote($param));
Then
SELECT ? REGEXP CONCAT('string', $param, 'string')
Read this article and the comments to find out more

Using MySQL LIKE operator for fields encoded in JSON

I've been trying to get a table row with this query:
SELECT * FROM `table` WHERE `field` LIKE "%\u0435\u0442\u043e\u0442%"
Field itself:
Field
--------------------------------------------------------------------
\u0435\u0442\u043e\u0442 \u0442\u0435\u043a\u0441\u0442 \u043d\u0430
Although I can't seem to get it working properly.
I've already tried experimenting with the backslash character:
LIKE "%\\u0435\\u0442\\u043e\\u0442%"
LIKE "%\\\\u0435\\\\u0442\\\\u043e\\\\u0442%"
But none of them seems to work either.
I'd appreciate if someone could give a hint as to what I'm doing wrong.
Thanks in advance!
EDIT
Problem solved.
Solution: even after correcting the syntax of the query, it didn't return any results. After making the field BINARY the query started working.
As documented under String Comparison Functions:
Note
Because MySQL uses C escape syntax in strings (for example, “\n” to represent a newline character), you must double any “\” that you use in LIKE strings. For example, to search for “\n”, specify it as “\\n”. To search for “\”, specify it as “\\\\”; this is because the backslashes are stripped once by the parser and again when the pattern match is made, leaving a single backslash to be matched against.
Therefore:
SELECT * FROM `table` WHERE `field` LIKE '%\\\\u0435\\\\u0442\\\\u043e\\\\u0442%'
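The four backslashes come from two rounds of doubling, which can be reproduced mechanically. The sketch below assumes the SQL is written by hand as a string literal; like_literal is a hypothetical helper. With bind parameters the driver usually handles the string-literal layer itself, so only the LIKE-level doubling would remain, but that is an assumption to verify against your driver.

```ruby
# Hypothetical helper: text to place inside a hand-written SQL string literal
# so that LIKE ends up comparing against one literal backslash per original one.
def like_literal(text)
  # LIKE needs each backslash doubled, and the SQL string parser needs each
  # of those doubled again, so one backslash becomes four.
  text.gsub('\\') { '\\' * 4 }
end

# Single quotes: real backslashes, no Unicode decoding by Ruby.
STORED = '\u0435\u0442\u043e\u0442'

puts "LIKE '%#{like_literal(STORED)}%'"
# prints LIKE '%\\\\u0435\\\\u0442\\\\u043e\\\\u0442%'
```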
See it on sqlfiddle.
This can be useful for those who use PHP; it works for me:
$where[] = 'organizer_info LIKE(CONCAT("%", :organizer, "%"))';
$bind['organizer'] = str_replace('"', '', quotemeta(json_encode($orgNameString)));

How to remove fake names using regular expression in mysql query?

I want to remove the records that may have been registered with fake names, because the developer forgot to put validation on the registration form.
To check whether a name is fake, I am checking whether it contains any numbers.
This is the query I have written, but it's not working:
SELECT registration.regi_id, student.first_name,
student.cont_no, student.email_id,
registration.college,
registration.event_name,
registration.accomodation
FROM student, registration
WHERE student.stud_id = registration.stud_id
AND student.first_name NOT RLIKE '%[0-9]%'
How can I fix this problem?
Sorry for my language issues.
P.S.
There are many names in the "first_name" field like "asdfasdf12323"; I don't want that kind of name to be shown in the list.
Your column may also contain alphanumeric values. You need to filter out both purely numeric and alphanumeric names.
For alphanumeric values, try REGEXP '^[A-Za-z0-9]+$'
For numbers, try REGEXP '[0-9]'
As far as the regex is concerned, your expression only looks for a single digit, which is enough here. The problem is the % signs: they are LIKE wildcards, but RLIKE is a synonym for REGEXP, where % is an ordinary character, so '%[0-9]%' only matches a digit enclosed in literal percent signs. MySQL has support for regex, and your last clause would look like so: AND student.first_name NOT REGEXP '[0-9]' (without a trailing *, since '[0-9]*' also matches zero digits and therefore every name).
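The behaviour of the candidate patterns '%[0-9]%', '[0-9]*' and '[0-9]' can be compared on a few sample names. This is a sketch using Ruby's regexp engine as a stand-in for MySQL's (these simple constructs should behave alike, but treat that as an assumption); the names other than "asdfasdf12323" are invented.

```ruby
NAMES = ["asdfasdf12323", "Maria", "50%7%off"]

PERCENT_PATTERN = /%[0-9]%/   # "%" is an ordinary character in a regex
STAR_PATTERN    = /[0-9]*/    # "*" allows zero digits, so this matches anything
DIGIT_PATTERN   = /[0-9]/     # at least one digit somewhere

NAMES.each do |name|
  puts "#{name}: percent=#{PERCENT_PATTERN.match?(name)} " \
       "star=#{STAR_PATTERN.match?(name)} digit=#{DIGIT_PATTERN.match?(name)}"
end

# Names to keep: those without any digit, like NOT REGEXP '[0-9]'.
puts NAMES.reject { |name| DIGIT_PATTERN.match?(name) }.inspect   # prints ["Maria"]
```

Only the plain digit pattern separates "Maria" from the other two; the percent version misses "asdfasdf12323" and the starred version matches every name.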

SQL LIKE wildcard space character

Let's say I have a string in which the words are separated by one or more spaces, and I want to use that string in an SQL LIKE condition. How do I write my SQL to tell it to match one or more blank space characters in my string? Is there an SQL wildcard that I can use to do that?
Let me know
If you're just looking to get anything with at least one blank / whitespace, then you can do something like the following: WHERE myField LIKE '% %'
If your dialect allows it, use SIMILAR TO, which allows for more flexible matching, including the normal regular expression quantifiers '?', '*' and '+', with grouping indicated by '()'
where entry SIMILAR TO 'hello +there'
will match 'hello there' with any number of spaces between the two words.
I guess in MySQL this is
where entry RLIKE 'hello +there'
I know this is late, but I never found a solution to this in relation to a LIKE question.
There is no way to do what you're wanting within a SQL LIKE. What you would have to do is use REGEXP and [[:space:]] inside your expression.
So to find one or more spaces between two words..
WHERE col REGEXP 'firstword[[:space:]]+secondword'
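If the search phrase comes from user input, the [[:space:]]+ pattern can be assembled from its words. The sketch below is one possible way, with spaced_pattern as a hypothetical helper. Regexp.escape follows Ruby's escaping rules, so for words containing metacharacters the result may need adapting before it is sent to MySQL.

```ruby
# Hypothetical helper: turns a phrase into a pattern that tolerates any run
# of whitespace between its words.
def spaced_pattern(phrase)
  phrase.split.map { |word| Regexp.escape(word) }.join("[[:space:]]+")
end

SPACED_SOURCE = spaced_pattern("firstword   secondword")
puts SPACED_SOURCE   # prints firstword[[:space:]]+secondword

# Ruby also understands POSIX bracket classes, so the pattern can be tried locally.
SPACED_CHECK = Regexp.new(SPACED_SOURCE)
puts SPACED_CHECK.match?("firstword secondword")      # prints true
puts SPACED_CHECK.match?("firstword     secondword")  # prints true
puts SPACED_CHECK.match?("firstwordsecondword")       # prints false
```

The resulting string would be passed as the bind value of a col REGEXP ? condition.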
Another way to match a space is to use [], in dialects whose LIKE supports bracket character sets (SQL Server does; MySQL does not).
It's done like this.
LIKE '%[ ]%'
This will match any value that contains at least one space.
You can't do this using LIKE, but what you can do, if you know this condition can exist in your data, is to use regular expression matching as you're inserting the data into the table to detect it up front, and set a flag in a different column created for this purpose.
I just replace the whitespace chars with '%'. Let's say I want to do a LIKE query on a string like this: 'I want to query this string with a LIKE'
@search_string = 'I want to query this string with a LIKE'
@search_string = ("%"+@search_string+"%").tr(" ", "%")
@my_query = MyTable.find(:all, :conditions => ['my_column LIKE ?', @search_string])
First I add the '%' to the start and end of the string with
("%"+@search_string+"%")
and then replace the remaining whitespace chars with '%' like so
.tr(" ", "%")
http://www.techonthenet.com/sql/like.php
The patterns that you can choose from are:
% allows you to match any string of any length (including zero length)
_ allows you to match on a single character
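The two wildcards can be emulated to see what a pattern accepts, for instance when spaces have been replaced with % as suggested in this thread. The sketch below is a rough stand-in, not how any database implements LIKE: like_to_regexp is a hypothetical helper that ignores collation, case-insensitivity and escape characters.

```ruby
# Rough stand-in for LIKE: "%" becomes ".*", "_" becomes ".", everything else
# is treated literally. The whole value must match, as with LIKE.
def like_to_regexp(pattern)
  body = pattern.each_char.map do |ch|
    case ch
    when "%" then ".*"
    when "_" then "."
    else Regexp.escape(ch)
    end
  end.join
  Regexp.new("\\A#{body}\\z", Regexp::MULTILINE)
end

SPACES_AS_PERCENT = like_to_regexp("%hello%there%")

puts SPACES_AS_PERCENT.match?("hello     there")          # prints true
puts SPACES_AS_PERCENT.match?("hello, is anyone there?")  # prints true
puts like_to_regexp("h_llo").match?("hello")              # prints true
puts like_to_regexp("h_llo").match?("heello")             # prints false
```

The second line shows the trade-off of replacing spaces with %: the wildcard is not limited to spaces, so unrelated text between the words is accepted as well.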
I think that the question is not asking to match any spaces, but to match two strings, one following a pattern and the other with the wrong number of spaces because of typos.
In my case I have to check two fields from different tables, one preloaded and the other typed in by users, so sometimes they don't respect the pattern 100%.
The solution was to use LIKE in the join
Select table1.field
from table1
left join table2 on table1.field like('%' + replace(table2.field,' ','%')+'%')
If the condition
WHERE myField LIKE '%Hello world%'
doesn't work, try
WHERE myField LIKE '%Hello%'
and
WHERE myField LIKE '%world%'
This approach is helpful in a few specific use cases. Hope this helps.