How to use regex flags in MariaDB's REGEXP_REPLACE?

I have a table with records. A record has a field, content, that contains some HTML like <p><img src=\"/pictures/image.jpg\" vspace=\"6\" hspace=\"6\" align=\"left\" alt=\"Alt text\" title=\"Title Text\" width=\"260\"> Some text content...
I need to remove the <a></a> tags that are now placed around <img>. There can be multiple <a><img></a> occurrences in the string. I put together a corresponding regexp and found the REGEXP_REPLACE function. Ideally I expect something like
UPDATE table_name SET content = REGEXP_REPLACE(content, '/<a\shref=\\?"\/pictures\/.+">(<img.+">)<\/a>/gmU', '\\1') WHERE id=1
to work, but it doesn't. I don't understand where to put the flags gmU. Also, in the articles and docs I found online I don't see flags like g (global) and U (ungreedy). Is it global and ungreedy by default? How do I make it all work?
MariaDB version: 10.3.15.

In MariaDB you pass flags to REGEXP_REPLACE by inlining them in the regex using (?x) notation, where x is the flag. REGEXP_REPLACE replaces all occurrences of the pattern in the string by default, so you don't need the g flag; nor, in your case, do you need the multi-line flag m, as you are not using beginning/end-of-line anchors. You can, however, use U in place of the ? modifier to make + non-greedy.
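There are a couple of issues with your regex:
MariaDB does not delimit regexes with /; a / in the pattern is just a literal slash, so a trailing /gmU is matched literally rather than read as flags
Inside a string literal, \s is read as a plain s, so it needs to be written \\s
To match a literal \ you need \\\\ (which reaches the regex engine as \\), not \\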
This regex (as written inside an SQL string literal) should give you the results you want:
(?U)<a\\s.*href=\\\\?"/pictures.+(<img.+>)</a>
In a query:
SELECT REGEXP_REPLACE(content, '(?U)<a\\s.*href=\\\\?"/pictures.+(<img.+>)</a>', '\\1')
FROM test
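A quick way to see why the U flag matters is a Python sketch with the re module. This is an illustration under assumptions, not MariaDB itself: Python's engine is not the PCRE library MariaDB uses and has no (?U) flag, so the lazy quantifiers *? and +? are written out by hand, which is what (?U) does to * and +. The pattern is the answer's, with the string-literal backslash doubling undone, run over a hypothetical row with two linked images. Like REGEXP_REPLACE, re.sub replaces every match by default.

```python
import re

# Hypothetical stored HTML: two <a><img></a> pairs, quotes stored as \" like in the question.
content = (r'<p><a href=\"/pictures/a.jpg\"><img src=\"/pictures/a.jpg\"></a> text '
           r'<a href=\"/pictures/b.jpg\"><img src=\"/pictures/b.jpg\"></a></p>')

# The answer's pattern after SQL string-literal unescaping:
# \\s -> \s, and \\\\? -> \\? (an optional literal backslash).
# Greedy version, i.e. without (?U):
greedy = r'<a\s.*href=\\?"/pictures.+(<img.+>)</a>'
# Lazy version: what (?U) does, spelled with explicit *? and +? for Python.
lazy = r'<a\s.*?href=\\?"/pictures.+?(<img.+?>)</a>'

# re.sub, like REGEXP_REPLACE, replaces all non-overlapping matches by default.
print(re.sub(greedy, r'\1', content))
print(re.sub(lazy, r'\1', content))
```

With greedy quantifiers the single match runs from the first <a to the last </a>, so the first image is lost; the lazy version stops at each </a> and keeps both images.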

Related

regexp mysql group

I'm trying to get the city name from the string '{"travelzoo_hotel_name":"Graduate Minneapolis","travelzoo_hotel_id":"223","city":"Minneapolis","country":"USA","sales_manager":"Stephen Conti"}'
I tried this regexp:
SELECT REGEXP_SUBSTR('{\"travelzoo_hotel_name\":\"Graduate Minneapolis\",\"travelzoo_hotel_id\":\"223\",\"city\":\"Minneapolis\",\"country\":\"USA\",\"sales_manager\":\"Stephen Conti\"}'
,'(?:.city...)([[:alnum:]]+)');
I get: '"city":"Minneapolis'
I need only the city name: Minneapolis.
How do I use groups in queries?
My example is on regex101.
Please help.
I assume you are using MySQL 8.x, which uses the ICU regex library.
It looks like the string you want to process is JSON. You may use JSON_EXTRACT with JSON_UNQUOTE and '$.city' as the JSON path:
JSON_UNQUOTE(JSON_EXTRACT('{"travelzoo_hotel_name":"Graduate Minneapolis","travelzoo_hotel_id":"223","city":"Minneapolis","country":"USA","sales_manager":"Stephen Conti"}', '$.city'))
will return Minneapolis.
In your regex, the non-capturing group pattern is still matched and included in the match value. "Non-capturing" only means no separate memory buffer is allotted to the text matched by the grouping construct. So you may fix it with the '(?<="city":")[^"]+' pattern, where (?<="city":") is a positive lookbehind that requires "city":" right before the match but does not put it into the match value. The only text in the output is then the one matched by [^"]+, one or more characters other than ".
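Both routes from the answer can be sketched in Python, with the json module standing in for JSON_EXTRACT/JSON_UNQUOTE and the re module standing in for MySQL's ICU engine (an assumption; the lookbehind and [^"]+ syntax used here means the same in both). The last lines show the point about non-capturing groups: the (?:...) text is still part of the overall match, and only a capturing group, read via group(1), isolates the name.

```python
import json
import re

row = ('{"travelzoo_hotel_name":"Graduate Minneapolis","travelzoo_hotel_id":"223",'
       '"city":"Minneapolis","country":"USA","sales_manager":"Stephen Conti"}')

# JSON route: parse the document and read the "city" key.
city_json = json.loads(row)["city"]

# Lookbehind route: "city":" must precede the match but is not part of it.
city_lookbehind = re.search(r'(?<="city":")[^"]+', row).group(0)

# Non-capturing group route: the (?:...) text still belongs to the overall match.
whole = re.search(r'(?:"city":")([^"]+)', row)

print(city_json, city_lookbehind)
print(whole.group(0))  # full match, still includes "city":"
print(whole.group(1))  # only the capturing group holds just the name
```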

Issue with Regexp with mySQL query

I'm trying to build a search query which searches for a word in a string and finds matches based on the following criteria:
The word is surrounded by a combination of either a space, period or comma
The word is at the start of the string and ends with a space, period or comma
The word is at the end of the string and is preceded by a space, period or comma
It's a full match, i.e. the entire string is just the word
For example, if the word is 'php' the following strings would be matches:
php
mysql, php, javascript
php.mysql
javascript php
But for instance it wouldn't match:
php5
I've tried the following query:
SELECT * FROM candidate WHERE skillset REGEXP '^|[., ]php[., ]|$'
However, that doesn't work: it returns every record as a match, which is wrong.
Without the ^| and |$ in there, i.e.
SELECT * FROM candidate WHERE skillset REGEXP '[., ]php[., ]'
It successfully finds matches where 'php' is somewhere in the string except the start and end of the string. So the problem must be with the ^| and |$ part of the regexp.
How can I add those conditions in to make it work as required?
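Try '\\bphp\\b' (MySQL 8.0.4 and later): \b is a word boundary and might be exactly what you need, because it looks for the whole word php. The backslash is doubled because in a MySQL string literal \b on its own is a backspace character.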
For MySQL before 8.0.4, word boundaries are written [[:<:]] and [[:>:]] instead of \b, so use the query '[[:<:]]php[[:>:]]'.
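The reason the first query returns every record is operator precedence: alternation binds loosest, so '^|[., ]php[., ]|$' means ^ OR [., ]php[., ] OR $, and ^ matches at the start of every string. A Python sketch with the re module (used only to check the pattern logic; MySQL's engine is different) compares that pattern with a grouped version, (^|[., ])php([., ]|$), and with \b, against the question's samples. The grouped version contains no backslashes, so it should be usable as-is inside REGEXP '...', though this sketch only checks it in Python.

```python
import re

samples_match = ["php", "mysql, php, javascript", "php.mysql", "javascript php"]
samples_no_match = ["php5"]

# As in the question: reads as "^" OR "[., ]php[., ]" OR "$".
ungrouped = r'^|[., ]php[., ]|$'
# Grouping keeps each alternative attached to "php": start-or-separator,
# then php, then separator-or-end.
grouped = r'(^|[., ])php([., ]|$)'
# Word-boundary version (\b in ICU and Python).
boundary = r'\bphp\b'

def matches(pattern, text):
    """True if the pattern matches anywhere in text (like REGEXP)."""
    return re.search(pattern, text) is not None

for p in (ungrouped, grouped, boundary):
    print(p, [matches(p, s) for s in samples_match + samples_no_match])
```

Note that \b is looser than the stated rules: it also accepts php next to characters such as - or /, while the grouped pattern only accepts the listed separators.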
Well, you can play around a bit with regex101.com
Something I found that works for you but doesn't exactly follow your rules is:
/(?=[" ".,]?php[" ".,]?)(?=php[\W])/
This chains two lookaheads, each written (?=pattern), to require both conditions at the same position (a logical AND).
The first portion of the regex is
[" ".,]?php[" ".,]?
This will match anything that has a space, period, or comma before or after the php, but at most one on each side.
The second portion of the regex is
php[\W]
This will match php followed by a non-word character. In other words, it will NOT match php followed by a letter, digit, or underscore.
It's not the perfect answer for your set of rules, but it does work with your sample data set. Play around on regex101.com and try to make a perfect one.
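Running the lookahead pattern over the question's samples with Python's re (older MySQL REGEXP has no lookahead support, so this only checks the logic) suggests it misses "php" and "javascript php": php[\W] needs one more character after php, so it can never match at the end of the string.

```python
import re

lookahead = r'(?=[" ".,]?php[" ".,]?)(?=php[\W])'
samples = ["php", "mysql, php, javascript", "php.mysql", "javascript php", "php5"]

# Map each sample to whether the pattern is found anywhere in it.
results = {s: re.search(lookahead, s) is not None for s in samples}
for s, found in results.items():
    print(repr(s), found)
```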

MySQL regex matching at least 2 dots

Consider the following regex
#(.*\..*){2,}
Expected behaviour:
a#b doesn't match
a#b.c doesn't match
a#b.c.d matches
a#b.c.d.e matches
and so on
Testing in regexpal it works as expected.
Using it in a MySQL SELECT doesn't work as expected. Query:
SELECT * FROM `users` where mail regexp '#(.*\..*){2,}'
is returning lines like
foo#example.com
that should not match the given regex. Why?
I think the answer to your question is in the MySQL manual:
Because MySQL uses the C escape syntax in strings (for example, "\n" to represent the newline character), you must double any "\" that you use in your REGEXP strings.
(MySQL Reference Manual)
Because your middle dot wasn't properly escaped, it was treated as just another wildcard, and in the end your expression effectively collapsed to #.{2,}, i.e. #..+
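The collapse can be reproduced with Python's re by feeding it the pattern MySQL's string parser actually passes on, where the unknown escape \. loses its backslash, next to the intended pattern. This assumes the default sql_mode, in which backslash escapes are processed in string literals.

```python
import re

# What the regex engine receives from '#(.*\..*){2,}': the backslash is dropped.
collapsed = r'#(.*..*){2,}'
# What the asker meant: the middle dot escaped for the regex engine.
intended = r'#(.*\..*){2,}'

for mail in ["a#b", "a#b.c", "a#b.c.d", "foo#example.com"]:
    print(mail, bool(re.search(collapsed, mail)), bool(re.search(intended, mail)))
```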
anubhava's answer is probably a better substitute for what you tried to do, though I would note dasblinkenlight's comment about using the character class [.], which makes it easy to drop in a regex you've already tested in RegexPal.
You can use:
SELECT * FROM `users` where mail REGEXP '([^.]*\\.){2}'
to enforce at least 2 dots in mail column.
I would match two dots in MySQL using LIKE:
where col like '%#%.%.%'
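LIKE has no regex syntax: only % and _ are wildcards and every other character, including ".", matches itself, so the position of each % matters. A quick check uses SQLite's LIKE through Python's standard sqlite3 module (an assumption: SQLite stands in for MySQL here; the two agree on % and _, though they differ on backslash escaping). It shows that a % is needed between # and the first dot for the pattern to mean "at least two dots after #"; without it, the pattern requires # to be immediately followed by a dot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def like(value, pattern):
    """Evaluate `value LIKE pattern` in SQLite and return a bool."""
    return conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()[0] == 1

for mail in ["a#b.c", "a#b.c.d", "foo#example.com", "a#.b.c"]:
    print(mail, like(mail, "%#%.%.%"), like(mail, "%#.%.%"))
```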
The problem with your code is that .* (match-everything dot) matches dot '.' character. Replacing it with [^.]* fixes the problem:
SELECT *
FROM `users`
where mail regexp '#([^.]*[.]){2,}'
Note the use of [.] in place of the equivalent \. escape. This syntax makes it easier to embed the regex into programming languages that use backslash as the escape character in their string literals.
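Python's re gives [.] and [^.] the same meaning, so it can check the [^.]* version against the question's expected results. Because [^.]* cannot run past a dot, each repetition of the group consumes exactly one literal dot.

```python
import re

pattern = r'#([^.]*[.]){2,}'

# Expected results taken from the question, plus the false positive it reported.
expected = {"a#b": False, "a#b.c": False, "a#b.c.d": True, "a#b.c.d.e": True,
            "foo#example.com": False}
for mail, want in expected.items():
    got = re.search(pattern, mail) is not None
    print(mail, got, "ok" if got == want else "MISMATCH")
```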

Match optional end of line

Hey I want to use a regular expression in MySQL to match rows.
It needs to match rows where the pattern is followed either by anything that's not a digit or by the end of the line.
This pattern works in Ruby /download:223(?:[\D]|$)/
In MySQL it doesn't match. I'm guessing it doesn't allow for optional matching of eol.
SELECT id FROM stories WHERE body REGEXP 'download:223(?:[\D]|$)'
I need to match the following (quotes just for clarity):
"download:223"
"download:223*"
"download:223 something"
"download:223 more text"
But NOT the following (again quotes just for clarity):
"download:2234"
"download:2234 more text"
"download:2234*"
"download:2234* even more"
Thanks!
This regex should work for you:
"download:223([^0-9]|$)"
The MySQL regex engine (before 8.0.4) doesn't support \D, \d, etc.
Non-capturing groups are not supported in MySQL regexes before 8.0.4. The rest should be fine: it definitely supports $ matching the end of the string. \D is not supported either, but you can use [^0-9].
Try this:
SELECT id FROM stories WHERE body REGEXP 'download:223([^0-9]|$)'
Before MySQL 8.0.4, groups don't capture anyway, so supporting non-capturing groups is unnecessary.
Reference source:
Using Non-Capturing Groups in MySQL REGEXP
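Python's re treats [^0-9] and $ the same way for single-line values like these, so it can serve as a quick check of the pattern against the question's two lists (an assumption: body holds single-line text; with embedded newlines, $ behavior can differ between engines).

```python
import re

pattern = r'download:223([^0-9]|$)'

should_match = ["download:223", "download:223*", "download:223 something",
                "download:223 more text"]
should_not = ["download:2234", "download:2234 more text", "download:2234*",
              "download:2234* even more"]

# After "223" the pattern needs a non-digit character or the end of the string.
for s in should_match + should_not:
    print(repr(s), re.search(pattern, s) is not None)
```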

What's wrong with this query? Paths

I'm trying to get info from a table using a browser path column in the table. This is what the query looks like:
select * from selwowscheduler sc
join browser b on sc.scheduledbrowser = b.browserid
where b.browserpath like '*iexplore C:\Program Files\Internet Explorer\iexplore.exe'
Thing is, this returns nothing. I can put %iexplore.exe instead of *iexplore C:\Program Files\Internet Explorer\iexplore.exe and that returns something (though more than I want).
I thought maybe it was the literals \ so I replaced the \ with \\, but that didn't work either (Still returns nothing).
Does anyone know why this isn't working?
Thanks.
EDIT: I know * is not a wildcard; it is part of the stored path. We use it to launch different browsers on different PCs.
You have to write each backslash as \\\\, because the string-literal parser and LIKE each strip one level of backslash escaping. Try:
where b.browserpath like '%iexplore C:\\\\Program Files\\\\Internet Explorer\\\\iexplore.exe'
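The two levels of escaping can be made explicit with a small Python helper. mysql_like_literal is a hypothetical name, and the sketch assumes MySQL's default sql_mode (NO_BACKSLASH_ESCAPES off) and LIKE's default escape character, the backslash: it first escapes \, % and _ for LIKE, then escapes \ and ' again for the string literal.

```python
def mysql_like_literal(text: str) -> str:
    """Quote `text` as a MySQL string literal that LIKE matches verbatim.

    Hypothetical helper; assumes default sql_mode and the default LIKE escape (\\).
    """
    # Level 1 (LIKE): \ is the escape character, % and _ are wildcards.
    like_pattern = text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    # Level 2 (string literal): the parser consumes one more level of backslashes.
    # Backslashes are doubled first so the \ added before ' is not doubled again.
    literal = like_pattern.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + literal + "'"

path = r"*iexplore C:\Program Files\Internet Explorer\iexplore.exe"
print(mysql_like_literal(path))
```

If the pattern is sent as a bound query parameter instead of being spliced into the SQL text, the string-literal step does not apply and only the first replace chain is needed.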
The problem is with the LIKE syntax: what is the * supposed to match? * is not a special character in LIKE; % matches any sequence of zero or more characters and _ matches exactly one character.
Also, if you cannot use LIKE to accomplish what you need, I would look into REGEXP, which matches against regular expressions and is usually more adaptable than simple LIKE comparisons.
Well, first off, * is not a valid wildcard in MySQL as far as I can tell, which is why the query returns nothing (it is looking for a path with a literal * in it). My guess, without knowing exactly what you are looking for, is that some variant of the % wildcard would work. It can be in the middle of the string, such as:
where b.browserpath like 'C:%iexplore'
This would return all paths on "C" that end in iexplore. This:
where b.browserpath like 'C:\\\\Program Files%.exe'
returns the paths to anything in "C:\Program Files" that ends in ".exe", and so on.