MySQL REGEXP with square brackets

I am trying to match strings like '[sometext<someothertext>]' (i.e., left square bracket, text, left angle bracket, text, right angle bracket, right square bracket) within a column in MySQL. Originally I used the following query (note that because backslashes in regex patterns are processed twice in MySQL, you must use two backslashes where you would normally use one):
SELECT * FROM message WHERE msgtext REGEXP '\\[(.+)?<(.+)?>\\]'
This query ran without errors, but it returned rows I didn't want. Instead of (.+), I wanted [^\]] (match any character except a right square bracket). When I changed the query, I got the following error: "Got error 'repetition-operator operand invalid' from regexp"
The MySQL documentation states: "To include a literal ] character, it must immediately follow the opening bracket [." Since I want "^\]" rather than "]", is this even possible, given that the bracket cannot be the first character after the opening bracket? Below are some of the queries I have tried, all of which produce the error listed above:
SELECT * FROM message WHERE msgtext REGEXP '\\[([^\\]]+?)<([^\\]]+?)>\\]'
SELECT * FROM message WHERE msgtext REGEXP '\\[[^\\]]+?<[^\\]]+?>\\]'
SELECT * FROM message WHERE msgtext REGEXP '\\[[^[.right-square-bracket.]]]+?<[^[.right-square-bracket.]]]+?>\\]'
UPDATE:
The following query runs without errors, but does not return any rows, even though I know there are rows that match what I am looking for (based on my original query at the top):
SELECT * FROM message WHERE msgtext REGEXP '\\[([^\\]]+)?<([^\\]]+)?>\\]'

This works for me:
SELECT '[sometext<someothertext>]' REGEXP '\\[([^[.right-square-bracket.]]+)?<([^[.right-square-bracket.]]+)?>\\]$';

Your final regex looks correct and works in Firefox/JS once the backslashes are unescaped. It doesn't look like MySQL supports capture groups natively, though, so maybe that's the problem.
Perhaps this would be useful: http://mysqludf.com/lib_mysqludf_preg/
Also, you might try a * instead of +? for your negated right squares.
* means 0 or more repetitions (greedy)
+? means 1 or more repetitions (lazy)

Related

SQL Regex last character search not working

I'm using a regex for a specific search, but the last separator is being ignored.
It should search for |49213[A-Z]| but it searches for |49213[A-Z]
SELECT * FROM table WHERE (data REGEXP '/\|49213[A-Z]+\|/')
Why are you using | in the pattern? Why the +?
SELECT * FROM table WHERE (data REGEXP '\|49213[A-Z]\|')
If you want multiple:
SELECT * FROM table WHERE (data REGEXP '\|49213[A-Z]+\|')
or:
SELECT * FROM table WHERE (data REGEXP '[|]49213[A-Z][|]')
Aha. That is rather subtle.
\ escapes certain characters that have special meaning.
But it does not seem to do so for | ("or") or . ("any byte"), etc.
So, \| is the same as |.
But the regexp parser does not like having either side of "or" being empty. (I suspect this is a "bug"). Hence the error message.
https://dev.mysql.com/doc/refman/5.7/en/regexp.html says
To use a literal instance of a special character in a regular expression, precede it by two backslash (\) characters. The MySQL parser interprets one of the backslashes, and the regular expression library interprets the other. For example, to match the string 1+2 that contains the special + character, only the last of the following regular expressions is the correct one:
The best fix seems to be [|] or \\| instead of \| when you want the pipe character.
Someday, the REGEXP parser in MySQL will be upgraded to PCRE as in MariaDB. Then a lot more features will come, and this 'bug' may go away.
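To make the two layers of backslash handling concrete, here is a minimal Python sketch. The helper `mysql_unescape` is hypothetical: it is a rough model of how MySQL reads a quoted string literal under the default SQL mode (unknown escapes such as `\|` simply lose the backslash), not MySQL's actual code. Python's `re` module stands in for the regex library only to show that an escaped or bracketed pipe matches a literal `|`; it is not the engine MySQL uses.

```python
import re

def mysql_unescape(literal: str) -> str:
    """Rough model (an assumption, not MySQL source) of string-literal escapes."""
    special = {"0": "\0", "b": "\b", "n": "\n", "r": "\r", "t": "\t",
               "Z": "\x1a", "\\": "\\", "'": "'", '"': '"'}
    out = []
    i = 0
    while i < len(literal):
        ch = literal[i]
        if ch == "\\" and i + 1 < len(literal):
            nxt = literal[i + 1]
            if nxt in special:
                out.append(special[nxt])   # recognised escape sequence
            elif nxt in "%_":
                out.append("\\" + nxt)     # \% and \_ keep their backslash
            else:
                out.append(nxt)            # unknown escape: backslash is dropped
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

# What the regex library receives for each way of writing the SQL literal.
received_single = mysql_unescape(r"\|49213[A-Z]\|")
received_double = mysql_unescape(r"\\|49213[A-Z]\\|")
print(received_single)
print(received_double)

# Both suggested fixes treat the pipe as a literal character.
for pattern in (received_double, "[|]49213[A-Z][|]"):
    with_trailing_pipe = bool(re.search(pattern, "x|49213A|y"))
    without_trailing_pipe = bool(re.search(pattern, "x|49213A"))
    print(pattern, with_trailing_pipe, without_trailing_pipe)
```

With a single backslash the pattern reaches the regex library with bare `|` characters, which is why the trailing separator was not enforced.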

digit character class in regexp doesn't work in MySQL for me

The following query
SELECT "T2N1M0" REGEXP "^T[:digit:].*";
returns a single row with 0 for me.
I would expect it to return 1.
What am I doing wrong?
You are missing one level of square brackets []:
SELECT "T2N1M0" REGEXP "^T[[:digit:]].*";
You should have gotten this error message that hints at the problem:
Got error 'POSIX named classes are supported only within a class at offset ' from regexp
More on the syntax of regular expressions is given in the manual, section 13.5.2 Regular Expressions.
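A small Python sketch of why the single-bracket form returns 0. It assumes that an engine which does not reject `[:digit:]` reads it as an ordinary character set containing `:`, `d`, `i`, `g` and `t`, which is how Python's `re` treats it. Python has no POSIX named classes, so `[0-9]` plays the role of `[[:digit:]]` here.

```python
import re

value = "T2N1M0"

# Without the outer brackets this is just a set of the characters : d i g t.
bare = re.compile(r"^T[:digit:].*")
print(bool(bare.search(value)))    # '2' is not in the set
print(bool(bare.search("Tdose")))  # 'd' is in the set

# Stand-in for the corrected pattern ^T[[:digit:]].*
digits = re.compile(r"^T[0-9].*")
print(bool(digits.search(value)))
```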

SQL query to select strings that contain a "Unit Separator" character

I have a table and I want to get the records whose content contains a Unit Separator character.
I have tried many things without getting a result. I tried char(31), 0x1f and many other ways, but did not get the desired result. This is the query I tried:
SELECT * FROM `submissions_answers` WHERE `question_id`=90 AND `answer` like '%0x1f%'
How can I do this? Please help me.
Problem
The expression you tried won't work because answer LIKE '%0x1f%' is looking for a string with literally '0x1f' as part of it - it doesn't get converted to an ASCII code.
Solutions
Some alternatives to this part of the expression that ought to work are:
answer LIKE CONCAT('%', 0x1F, '%')
answer REGEXP 0x1F
INSTR(answer, 0x1F) > 0
Further consideration
If none of these work, there may be a further possibility. Are you sure the character seen in the strings is actually 0x1F? I only ask because the first thing I tried was to paste in ␟, but it turns out MySQL sees this as a decimal character code of 226 rather than 31. I'm not sure which client you are using, but if the 0x1F character is in the string, it might not actually appear in the output.
Demo
Some tests demonstrating the points above: SQL Fiddle demo
You can use:
SELECT * FROM submissions_answers WHERE question_id=90 AND instr(answer,char(31))>0
The key here is the MySQL INSTR function. It returns the position of the first occurrence of the substring (char(31)) in the string (answer).
Yet another way...
SELECT * FROM `submissions_answers`
WHERE `question_id`=90
AND HEX(`answer`) REGEXP '^(..)*1F'
Explanation of the regexp:
^ - start matching at the beginning (of the hex string of answer)
(..)* - match any number (*) of two-character pairs (..), each pair being the hex for one byte
1F - then match 1F, the hex for US.
You could convert the answer column into a HEX value, and then look for values containing that hex string.
SELECT * FROM `submissions_answers`
WHERE HEX(`answer`) LIKE '%E2909F%'
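The difference between the control character 0x1F and the visible symbol ␟, and the reason for the `(..)*` alignment in the hex regex, can be checked with a short Python sketch. The sample strings are made up for illustration, and Python's `re` is only a stand-in for MySQL's REGEXP.

```python
import re

unit_separator = chr(31)   # the real control character, U+001F
symbol = "\u241f"          # the visible glyph, SYMBOL FOR UNIT SEPARATOR

def hex_of(text: str) -> str:
    """Upper-case hex of the UTF-8 bytes, similar in spirit to MySQL's HEX()."""
    return text.encode("utf-8").hex().upper()

print(hex_of(unit_separator))     # one byte
print(hex_of(symbol))             # three bytes
print(symbol.encode("utf-8")[0])  # first byte as a decimal code

# Pair-aligned search: 1F only counts when it starts on a byte boundary.
aligned = re.compile(r"^(..)*1F")

real_hex = hex_of("a" + unit_separator + "b")   # contains a true 0x1F byte
decoy_hex = hex_of("a\U0001F600")               # '1F' only straddles two bytes
print(real_hex, bool(aligned.match(real_hex)))
print(decoy_hex, "1F" in decoy_hex, bool(aligned.match(decoy_hex)))
```

A plain substring search on the hex text would report the decoy as a hit, which is the case the `^(..)*` prefix guards against.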

MySQL regex matching at least 2 dots

Consider the following regex
#(.*\..*){2,}
Expected behaviour:
a#b doesn't match
a#b.c doesn't match
a#b.c.d matches
a#b.c.d.e matches
and so on
Testing in regexpal it works as expected.
Using it in a MySQL SELECT doesn't work as expected. Query:
SELECT * FROM `users` where mail regexp '#(.*\..*){2,}'
is returning lines like
foo#example.com
that should not match the given regex. Why?
I think the answer to your question is in the MySQL manual:
Because MySQL uses the C escape syntax in strings (for example, “\n”
to represent the newline character), you must double any “\” that you
use in your REGEXP strings.
MYSQL Reference
Because your middle dot wasn't properly escaped, it was treated as just another wildcard, and in the end your expression effectively collapsed to #.{2,} or #..+
@anubhava's answer is probably a better substitute for what you tried to do, though I would note @dasblinkenlight's comment about using the character class [.], which makes it easy to drop in a regex you've already tested at RegexPal.
You can use:
SELECT * FROM `users` where mail REGEXP '([^.]*\\.){2}'
to enforce at least 2 dots in mail column.
I would match two dots in MySQL using like:
where col like '%#%.%.%'
The problem with your code is that .* (the match-everything dot) also matches the literal dot '.' character. Replacing it with [^.]* fixes the problem:
SELECT *
FROM `users`
where mail regexp '#([^.]*[.]){2,}'
Note the use of [.] in place of the equivalent \. escape. This syntax makes it easier to embed the regex into programming languages that use backslash as an escape character in their string literals.
Demo.
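The behaviour discussed in this thread can be reproduced with a short Python sketch. Python's `re` is used only as a stand-in for MySQL's REGEXP, and the variable names are made up for illustration. The first pattern is what the engine effectively sees once the single backslash has been stripped from the SQL string; the second is the [^.]*[.] version.

```python
import re

samples = ["a#b", "a#b.c", "a#b.c.d", "a#b.c.d.e", "foo#example.com"]

# The dot is no longer escaped, so every dot is a wildcard.
collapsed = re.compile(r"#(.*..*){2,}")

# Each repetition must end in a literal dot and cannot swallow other dots.
fixed = re.compile(r"#([^.]*[.]){2,}")

for text in samples:
    print(text, bool(collapsed.search(text)), bool(fixed.search(text)))

collapsed_hits = [t for t in samples if collapsed.search(t)]
fixed_hits = [t for t in samples if fixed.search(t)]
```

The collapsed pattern only needs two arbitrary characters after the #, which is why foo#example.com was returned.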

MySql regexp escaping apostrophe(’)

I can't find a proper way to escape the apostrophe sign (’) in my MySQL query. The regexp I have works fine in online regexp testing tools.
Problematic example is the string G’Schlössl.
I want an optional apostrophe in the query in front of the s character, G(’?)Schlö(’?)ssl, to cover the different cases that could occur in other strings. I am not sure whether the problem is caused by incorrect escaping, but I have tried many options such as ’?, \’?, and \’{0,1}, which work for the first occurrence but not for the second optional one, causing the query to return nothing. Other possibilities such as ’’?, [’]?, [\’]?, and [\’]{0,1} do not work even for the first one.
select id, name from restaurant where name regexp '.*g\’?(s|ß|ss|sz)chl(o|ö|oe)\’?s.*';
When I remove the last \’? it works:
select id, name from restaurant where name regexp '.*g\’?(s|ß|ss|sz)chl(o|ö|oe)s.*';
When I replace the last \’? with x? it works as well:
select id, name from restaurant where name regexp '.*g\’?(s|ß|ss|sz)chl(o|ö|oe)x?s.*';
Any ideas where the problem is or what else to try?
This thread only explains how to escape a normal single quote, which does not seem to work in my case.
Instead of \’?, try (’)?. I'm thinking that the ? may apply to only the last byte of ’. By using parentheses instead, the ? applies to the entire 3 bytes (hex E28099) of the "RIGHT SINGLE QUOTATION MARK".
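The byte-level explanation can be illustrated with a Python sketch that matches on raw UTF-8 bytes, on the assumption that the regex engine works byte by byte as suggested above. The patterns are simplified versions of the query (no alternations, exact case), and Python's `re` is only a stand-in for MySQL's REGEXP.

```python
import re

APOSTROPHE = "’".encode("utf-8")   # RIGHT SINGLE QUOTATION MARK as UTF-8 bytes
O_UMLAUT = "ö".encode("utf-8")
name = "G’Schlössl".encode("utf-8")

print(APOSTROPHE.hex().upper())

# Byte-wise: each ? binds only to the last byte of the apostrophe, so the
# first two bytes stay mandatory wherever the apostrophe is absent.
bytewise = re.compile(
    b"G" + APOSTROPHE + b"?Schl" + O_UMLAUT + APOSTROPHE + b"?ssl"
)

# Grouped: each ? applies to all three bytes at once.
grouped = re.compile(
    b"G(?:" + APOSTROPHE + b")?Schl" + O_UMLAUT + b"(?:" + APOSTROPHE + b")?ssl"
)

print(bool(bytewise.search(name)))
print(bool(grouped.search(name)))
```

This is consistent with the symptoms in the question: the first optional apostrophe appears to work because the character is actually present there, while the second one fails because two of its bytes are still required.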