MySQL regex matching at least 2 dots - mysql

Consider the following regex
#(.*\..*){2,}
Expected behaviour:
a#b doesn't match
a#b.c doesn't match
a#b.c.d matches
a#b.c.d.e matches
and so on
Testing in regexpal it works as expected.
Using it in a MySQL SELECT doesn't work as expected. Query:
SELECT * FROM `users` where mail regexp '#(.*\..*){2,}'
is returning lines like
foo#example.com
that should not match the given regex. Why?

I think the answer to your question is here.
Because MySQL uses the C escape syntax in strings (for example, “\n”
to represent the newline character), you must double any “\” that you
use in your REGEXP strings.
MYSQL Reference
Because your middle dot wasn't properly escaped it was treated as just another wildcard and in the end your expression was effectively collapsed to #.{2,} or #..+
@anubhava's answer is probably a better substitute for what you tried to do, though I would also note @dasblinkenlight's comment about using the character class [.], which makes it easy to drop in a regex you've already tested at RegexPal.
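The collapse described above can be reproduced outside the database. The following is a minimal sketch in Python, assuming the default sql_mode; `mysql_unescape` is a hypothetical helper that only approximates MySQL's string-literal parsing, and Python's `re` module stands in for MySQL's regex engine, which agrees with it for these simple patterns.

```python
import re

def mysql_unescape(literal: str) -> str:
    """Approximate how MySQL reads the body of a quoted string literal."""
    named = {"n": "\n", "t": "\t", "r": "\r", "0": "\0", "b": "\b", "Z": "\x1a"}
    out, i = [], 0
    while i < len(literal):
        ch = literal[i]
        if ch == "\\" and i + 1 < len(literal):
            nxt = literal[i + 1]
            if nxt in "%_":
                out.append("\\" + nxt)  # \% and \_ are kept for LIKE
            else:
                out.append(named.get(nxt, nxt))  # unknown escapes lose the backslash
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

as_written = mysql_unescape(r"#(.*\..*){2,}")   # single backslash, as in the question
doubled = mysql_unescape(r"#(.*\\..*){2,}")     # backslash doubled, as the manual advises

print(as_written)  # the regex engine sees: #(.*..*){2,}
print(doubled)     # the regex engine sees: #(.*\..*){2,}

# The collapsed pattern only needs two characters after '#'.
print(bool(re.search(as_written, "foo#example.com")))
# The doubled pattern really requires two dots after '#'.
print(bool(re.search(doubled, "foo#example.com")))
print(bool(re.search(doubled, "a#b.c.d")))
```

With the single backslash the dot loses its escape before the regex engine ever sees it, which is why foo#example.com is returned.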

You can use:
SELECT * FROM `users` where mail REGEXP '([^.]*\\.){2}'
to enforce at least 2 dots in mail column.

I would match two dots in MySQL using like:
where col like '%#%.%.%'

The problem with your code is that .* (match-everything dot) matches dot '.' character. Replacing it with [^.]* fixes the problem:
SELECT *
FROM `users`
where mail regexp '#([^.]*[.]){2,}'
Note the use of [.] in place of the equivalent \.. This syntax makes it easier to embed the regex into programming languages that use backslash as escape character in their string literals.
Demo.
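As a quick check of the bracket-based pattern against the cases listed in the question, here is a small Python sketch. It assumes Python's `re` module treats this particular pattern the same way MySQL does (it uses no backslashes, so no string-literal escaping is involved); the sample strings are the ones from the question plus the reported false positive.

```python
import re

# Same pattern as in the query above; [.] is a literal dot in both dialects.
pattern = re.compile(r"#([^.]*[.]){2,}")

samples = ["a#b", "a#b.c", "a#b.c.d", "a#b.c.d.e", "foo#example.com"]
results = {s: bool(pattern.search(s)) for s in samples}

for s, matched in results.items():
    print(s, "matches" if matched else "doesn't match")
```

Only the strings with at least two dots after the # are reported as matches.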

How to use regex flags in Mariadb's regexp_replace?

I have a table with records. A record has a field content that contains some html like <p><img src=\"/pictures/image.jpg\" vspace=\"6\" hspace=\"6\" align=\"left\" alt=\"Alt text\" title=\"Title Text\" width=\"260\"> Some text content...
I need to remove <a></a> tags that are now placed around <img>. There can be multiple <a><img></a> occurrences in the string. I kinda made a corresponding regexp and learnt about REGEXP_REPLACE function. Ideally I expect something like
UPDATE table_name SET content = REGEXP_REPLACE(content, '/<a\shref=\\?"\/pictures\/.+">(<img.+">)<\/a>/gmU', '\\1') WHERE id=1
to work out, but it doesn't. I don't understand where to put flags gmU. Also in the articles/docs I found on the internet I don't see flags like g (global) and U (ungreedy). Is it global and ungreedy by default? How to make it all work?
10.3.15-MariaDB.
In MariaDB you pass flags to REGEXP_REPLACE by in-lining them in the regex using (?x) notation, where x is the flag. REGEXP_REPLACE by default replaces all occurrences of pattern in the string, so you don't need the g flag; nor in your case do you need the multi-line flag m as you are not attempting to use beginning/end of line anchors. You can use U though in place of the ? modifier to make + non-greedy.
There's a couple of issues with your regex:
MariaDB does not require regexes to be delimited by /
\s represents a literal s and needs to be \\s
To match a literal \ you need to use \\\\, not \\
This regex should give you the results you want:
(?U)<a\\s.*href=\\\\?"/pictures.+(<img.+>)</a>
In a query:
SELECT REGEXP_REPLACE(content, '(?U)<a\\s.*href=\\\\?"/pictures.+(<img.+>)</a>', '\\1')
FROM test
Demo on dbfiddle
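The same replacement can be tried outside the database. This is only a sketch: Python's `re` module has no (?U) flag, so the quantifiers are written as lazy (`.*?`, `.+?`) instead, which is the form the answer says U stands in for. The sample `content` string, including its anchor tags and file names, is made up for illustration.

```python
import re

# Hypothetical field value: quotes are stored with a preceding backslash.
content = (
    r'<p><a href=\"/pictures/big.jpg\"><img src=\"/pictures/image.jpg\" width=\"260\"></a>'
    r' Some text <a href=\"/pictures/big2.jpg\"><img src=\"/pictures/image2.jpg\"></a></p>'
)

# \\? matches the optional backslash before the quote; lazy quantifiers replace (?U).
pattern = re.compile(r'<a\s.*?href=\\?"/pictures.+?(<img.+?>)</a>')

# Like REGEXP_REPLACE, re.sub replaces every occurrence by default.
cleaned = pattern.sub(r'\1', content)
print(cleaned)
```

Both anchors are removed in one call and each <img> tag is kept through the \1 back-reference.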

SQL Regex last character search not working

I'm using a regex for a specific search, but the last separator is being ignored.
It must search for |49213[A-Z]| but it searches for |49213[A-Z]
SELECT * FROM table WHERE (data REGEXP '/\|49213[A-Z]+\|/')
Why are you using / in the pattern? Why the +?
SELECT * FROM table WHERE (data REGEXP '\|49213[A-Z]\|')
If you want multiple:
SELECT * FROM table WHERE (data REGEXP '\|49213[A-Z]+\|')
or:
SELECT * FROM table WHERE (data REGEXP '[|]49213[A-Z][|]')
Aha. That is rather subtle.
\ escapes certain characters that have special meaning.
But it does not seem to do so for | ("or") or . ("any byte"), etc.
So, \| is the same as |.
But the regexp parser does not like having either side of "or" being empty. (I suspect this is a "bug"). Hence the error message.
https://dev.mysql.com/doc/refman/5.7/en/regexp.html says
To use a literal instance of a special character in a regular expression, precede it by two backslash (\) characters. The MySQL parser interprets one of the backslashes, and the regular expression library interprets the other. For example, to match the string 1+2 that contains the special + character, only the last of the following regular expressions is the correct one:
The best fix seems to be [|] or \\| instead of \| when you want the pipe character.
Someday, the REGEXP parser in MySQL will be upgraded to PCRE as in MariaDB. Then a lot more features will come, and this 'bug' may go away.
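To see what the regex library actually receives, here is a rough Python sketch. `after_sql_parser` is a hypothetical helper that simplifies the SQL parser's pass to "a backslash followed by any character becomes that character" (it ignores named escapes such as \n). The three spellings of the 1+2 pattern are assumed here, since the quoted passage does not list them.

```python
import re

def after_sql_parser(literal: str) -> str:
    # Simplification: a backslash followed by any character collapses to that character.
    return re.sub(r"\\(.)", r"\1", literal)

# Three spellings of a pattern meant to match the string 1+2.
for literal in (r"1+2", r"1\+2", r"1\\+2"):
    seen = after_sql_parser(literal)
    print(repr(literal), "->", repr(seen), bool(re.search(seen, "1+2")))

# The pipe patterns from this question.
for literal in (r"\|49213[A-Z]\|", r"\\|49213[A-Z]\\|", r"[|]49213[A-Z][|]"):
    print(repr(literal), "->", repr(after_sql_parser(literal)))
```

With a single backslash the library is handed a bare |, that is, an alternation with empty branches; the doubled backslash and the [|] form both reach it as a literal pipe.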

SQL - How to use wildcard in LIKE as a normal character

if I have a column with some values that starts with "%" like this:
[ID]-----[VALUES]
1--------Amount
2--------Percentage
3--------%Amount
4--------%Percentage
how can I have only these two rows with a "select" query?:
[ID]-----[VALUES]
3--------%Amount
4--------%Percentage
I tried these queries but they don't work:
select * from TABLE where VALUES like '[%]%'
select * from TABLE where VALUES like '\%%'
I know that in Java, C and other languages, the backslash \ lets you use a special character as a normal one, like:
var s = "I called him and he said: \"Hi, there!\"";
Is there a similar character or function that does this in SQL?
All answers will be appreciated, thank you for reading the question!
Your query
select * from TABLE where VALUES like '\%%'
should work. If it doesn't, you may have NO_BACKSLASH_ESCAPES enabled, which makes \ a literal character.
A way to avoid it is using LIKE BINARY
select * from TABLE where VALUES like binary '%'
or with an escape character (can be any character you choose) specification.
select * from TABLE where VALUES like '~%%' escape '~'
Try this:
select * from TABLE where VALUES like '%[%]%'
There is an ESCAPE option on LIKE:
select *
from TABLE
where VALUES like '$%%' escape '$';
The character following the escape character is treated as a regular character. However, the default escape character is backslash, so the version with backslash should do what you want.
Of course, you could also use a regular expression (although that has no hope of using an index).
Note: ESCAPE is part of the ANSI standard, so it should work in any database.
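The ESCAPE clause is easy to try in a throwaway database. Below is a minimal sketch using Python's built-in `sqlite3` module rather than MySQL; the table name `items` and column name `label` are invented for the example (VALUES and TABLE are reserved words). Note that SQLite, unlike MySQL, has no default escape character, so the clause is required there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO items (id, label) VALUES (?, ?)",
    [(1, "Amount"), (2, "Percentage"), (3, "%Amount"), (4, "%Percentage")],
)

# '$%' is a literal percent sign; the second '%' is the usual wildcard.
rows = conn.execute(
    "SELECT id, label FROM items WHERE label LIKE '$%%' ESCAPE '$' ORDER BY id"
).fetchall()
print(rows)
conn.close()
```

Only the rows whose label starts with a literal % are returned.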
You're right that you'll need an escape character for this. In SQL you have to define the escape character.
SELECT * FROM TABLE where VALUES like '!%%' ESCAPE '!';
I'm pretty sure you can use whatever character you want.
Here's a link to a Microsoft explanation that goes into more detail.
Microsoft explanation
MySQL Explanation

Mysql regex error #1139 using literal -

I tried running this query:
SELECT column FROM table WHERE column REGEXP '[^A-Za-z\-\']'
but this returns
#1139 - Got error 'invalid character range' from regexp
which seems to me like the - in the character class is not being escaped, and is instead read as an invalid range. Is there some other way that it's supposed to be escaped for MySQL to treat it as the literal -?
This regex works as expected outside of mysql, https://regex101.com/r/wE8vY5/1.
I came up with an alternative to that regex which is
SELECT column FROM table WHERE column NOT REGEXP '([:alpha:]|-|\')'
so the question isn't how do I get this to work. The question is why doesn't the first regex work?
Here's a SQL fiddle of the issue, http://sqlfiddle.com/#!9/f8a006/1.
Also, there is no language being used here, query is being run at DB level.
Regex in PHP: http://sandbox.onlinephpfunctions.com/code/10f5fe2939bdbbbebcc986c171a97c0d63d06e55
Regex in JS: https://jsfiddle.net/6ay4zmrb/
Just change the order.
SELECT column FROM table WHERE column REGEXP '[^-A-Za-z\']'
@Avinash Raj is correct: the - must be first (or last). The \ is not an escape character in POSIX bracket expressions, which is what MySQL uses, https://dev.mysql.com/doc/refman/5.1/en/regexp.html.
One key syntactic difference is that the backslash is NOT a metacharacter in a POSIX bracket expression.
-http://www.regular-expressions.info/posixbrackets.html
What special characters must be escaped in regular expressions?
Inside character classes, the backslash is a literal character in POSIX regular expressions. You cannot use it to escape anything. You have to use "clever placement" if you want to include character class metacharacters as literals. Put the ^ anywhere except at the start, the ] at the start, and the - at the start or the end of the character class to match these literally
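The "clever placement" form is portable across dialects, which makes it easy to check elsewhere. Here is a short Python sketch; the sample names are invented. Python's `re` is not a POSIX engine (it does treat backslash as an escape inside a class, as the last lines show), so it cannot reproduce error 1139; it only confirms what the reordered class matches.

```python
import re

# Hyphen placed first, so no dialect can read it as a range operator.
portable = re.compile(r"[^-A-Za-z']")

names = ["O'Neil", "Anne-Marie", "R2D2", "J. Smith"]
flagged = [n for n in names if portable.search(n)]
print(flagged)  # names containing something other than letters, '-' or '

# Dialect difference: in Python the backslash inside [...] is an escape,
# so this class contains only the hyphen, not the backslash.
only_hyphen = re.findall(r"[\-]", "a\\-b")
print(only_hyphen)
```

Because the hyphen-first form needs no backslash at all, the MySQL string parser has nothing to strip and the class reaches the regex library unchanged.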

Using MySQL LIKE operator for fields encoded in JSON

I've been trying to get a table row with this query:
SELECT * FROM `table` WHERE `field` LIKE "%\u0435\u0442\u043e\u0442%"
Field itself:
Field
--------------------------------------------------------------------
\u0435\u0442\u043e\u0442 \u0442\u0435\u043a\u0441\u0442 \u043d\u0430
Although I can't seem to get it working properly.
I've already tried experimenting with the backslash character:
LIKE "%\\u0435\\u0442\\u043e\\u0442%"
LIKE "%\\\\u0435\\\\u0442\\\\u043e\\\\u0442%"
But none of them seems to work either.
I'd appreciate if someone could give a hint as to what I'm doing wrong.
Thanks in advance!
EDIT
Problem solved.
Solution: even after correcting the syntax of the query, it didn't return any results. After making the field BINARY the query started working.
As documented under String Comparison Functions:
Note
Because MySQL uses C escape syntax in strings (for example, “\n” to represent a newline character), you must double any “\” that you use in LIKE strings. For example, to search for “\n”, specify it as “\\n”. To search for “\”, specify it as “\\\\”; this is because the backslashes are stripped once by the parser and again when the pattern match is made, leaving a single backslash to be matched against.
Therefore:
SELECT * FROM `table` WHERE `field` LIKE '%\\\\u0435\\\\u0442\\\\u043e\\\\u0442%'
See it on sqlfiddle.
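The two stripping passes described in the note can be modelled roughly as follows. This is a simplified Python sketch, not MySQL's actual implementation: `sql_parser` and `like_matches` are hypothetical helpers, the first pass ignores named escapes such as \n and the special handling of \% and \_, and the field value is shortened from the one in the question.

```python
import re

def sql_parser(literal: str) -> str:
    # First pass (simplified): backslash + character collapses to the character.
    return re.sub(r"\\(.)", r"\1", literal)

def like_matches(pattern: str, value: str) -> bool:
    # Second pass (simplified): LIKE reads \x as literal x, % as any run, _ as one char.
    parts, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))
            i += 2
        elif ch == "%":
            parts.append(".*")
            i += 1
        elif ch == "_":
            parts.append(".")
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.fullmatch("".join(parts), value, re.DOTALL) is not None

field = r"\u0435\u0442\u043e\u0442 \u0442\u0435\u043a\u0441\u0442"  # literal backslashes

literals = [r"%\u0435\u0442%", r"%\\u0435\\u0442%", r"%\\\\u0435\\\\u0442%"]
results = [like_matches(sql_parser(lit), field) for lit in literals]
print(results)
```

In this model only the variant with four backslashes still carries a literal backslash into the comparison after both passes.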
This can be useful for those who use PHP; it works for me:
$where[] = 'organizer_info LIKE(CONCAT("%", :organizer, "%"))';
$bind['organizer'] = str_replace('"', '', quotemeta(json_encode($orgNameString)));