MySQL REGEXP_SUBSTR() escaping issue? - mysql

Please take the following example regex:
https://regexr.com/4ek7r
As you can see, the regex works great and matches the sizes (e.g. 3/16" etc) from the product descriptions.
I'm trying to implement this in MySQL 8.0.15 using REGEXP_SUBSTR()
As per the documentation I have doubled up the escape characters but the regex is not working.
Please see the following SQL fiddle:
https://www.db-fiddle.com/f/e6Ez3XCdU5Ahs91z6TQA8P/0
As you can see, REGEXP_SUBSTR() returns NULL
I presume this is an escaping issue, but I'm not 100% sure.
How can I ensure MySQL returns the 1st match per product (row) akin to the regexr.com example?
Cheers
Edit: 28/05/2019 - root cause
Wiktor's answer below solved my problem, and his regex was much cleaner and well worth the upvote. That said, I didn't understand why my original version stopped working after the port from SQL Server to MySQL. I finally noticed the problem this morning: it had nothing to do with the regex, it was a rookie error in string concatenation! Specifically, I was using UPPER(Description + ' ') (i.e. using +), which works fine in SQL Server, but in MySQL + is arithmetic only, so both operands are coerced to numbers. I was essentially running my regex against a 0! Replacing the + with CONCAT fixed my original query with the original regex. Just thought I'd share this in case it helps anyone else!
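The concatenation pitfall described in the edit can be reproduced with two literals. This is a minimal sketch, assuming a MySQL 8.0 session with the default SQL mode; the sample description is made up for illustration.

```sql
-- In MySQL, + is numeric addition: both strings are cast to numbers first.
-- 'Hex bolt 3/16"' has no leading digits, so it is cast to 0 (with a warning).
SELECT UPPER('Hex bolt 3/16"' + ' ') AS with_plus;

-- CONCAT() is the string concatenation function, so the text survives.
SELECT UPPER(CONCAT('Hex bolt 3/16"', ' ')) AS with_concat;
```

The first query yields 0 rather than the description, so any regex applied to it sees only that number. A description that starts with digits would be cast to its leading number instead.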

In MySQL 8.0.4 and later, which use the ICU regex library, you may use
SELECT Description, REGEXP_SUBSTR(Description, '(?im)(?=\\b(?:[0-9/]+(?:\\.[0-9/]+)?\\s*(?:[X-]|$)|[0-9/\\s]+(?:\\.[0-9/]+)?(?:[CM]?M|["”TH])))[0-9/\\s.]+(?:[CM]?M|["”TH])?(?:\\s*[/X-]\\s*[0-9/\\s.]+(?:[CM]?M|["”TH])?)?(?=[.\\s()]|$)') AS Size FROM tbl_Example
The main points:
The flags can be used as inline options: in (?im), m enables multiline mode, in which ^ and $ match at the start/end of each line, and i enables case-insensitive mode.
[$] matches a literal $ char. To match the end-of-line position, you need to move $ out of the character class and use an alternation instead: (?=[\.\s\(\)$]) -> (?=[.\s()]|$). Also, do not escape what does not have to be escaped.
To match the fractional part of a number, it is better to use a pattern like (?:\.[0-9/]+)? (it matches an optional sequence of . followed by 1 or more digits or / chars).
(C|M)? is better written as [CM]? (a character class is more efficient than an alternation of single characters).
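The point about $ inside a character class can be checked in isolation. This is a small sketch, assuming MySQL 8.0.4+ with ICU regex; the sample string is invented.

```sql
-- Inside a class, $ is just a character: this lookahead asks for whitespace
-- or a literal $ after the digits, so it has nothing to match at end of string.
SELECT REGEXP_SUBSTR('PIPE 10', '[0-9]+(?=[\\s$])') AS dollar_in_class;

-- Outside the class, $ is the end-of-string/line anchor.
SELECT REGEXP_SUBSTR('PIPE 10', '[0-9]+(?=\\s|$)') AS dollar_as_anchor;
```

Going by the explanation above, the first query should find nothing while the second should pick up the trailing 10.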

Related

How to make this REGEX below work for MySql?

I have written regex and tested it online, works fine. When I test in terminal, MySQL console, it doesn't match and I get an empty set. I believe MySQL regexp syntax is somehow different but I cannot find the right way.
This is data I use:
edu.ba;
medu.ba;
edu.ba;
med.edu.ba;
edu.com;
edu.ba
I should get only the edu.ba matches, including the ; if there is one. It works fine everywhere except in the actual query.
(\;+|^)\bedu.ba\b(\;+|$|\n)
Is there anything I could change to get the same results?
You want to match edu.ba in between semi-colons or start/end of string. The word boundaries are redundant here (although if you want to experiment, the MySQL regex before MySQL v8 used [[:<:]] / [[:>:]] word boundaries, and in MySQL v8+, you need to use double backslashes with \b - '\\b').
Use
(;|^)edu[.]ba(;|$)
Details
(;|^) - ; or start of string
edu[.]ba - edu.ba literal string (dot inside brackets always matches a literal dot)
(;|$) - ; or end of string.
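The suggested pattern can be tried against the sample values without creating a table. This sketch assumes each value from the question sits in its own row; the derived table and column names are made up for the example.

```sql
-- Each sample value from the question as its own row.
SELECT v, v REGEXP '(;|^)edu[.]ba(;|$)' AS is_match
FROM (
  SELECT 'edu.ba;' AS v
  UNION ALL SELECT 'medu.ba;'
  UNION ALL SELECT 'med.edu.ba;'
  UNION ALL SELECT 'edu.com;'
  UNION ALL SELECT 'edu.ba'
) AS sample;
-- is_match is 1 for 'edu.ba;' and 'edu.ba', and 0 for the other three rows.
```

The pattern uses no backslashes, so it should behave the same on both the older POSIX engine and the ICU engine.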

Extract Domain Name form url Using REGEXP_REPLACE in the Mysql Version 8

I am using the latest version: MySQL Community Server 8.0.11 (GPL).
Table structure is web_url < id, url >
Sample Data in the url column.
www.google.com
www.yahoo.com
How can I extract only the domain name, such as google and yahoo, using the REGEXP_REPLACE function? The idea is to replace the www. and .com parts with a space or with ''. If you have any other solution, you are welcome to post it.
I must be missing something, so please help.
I have written a regular expression, but I don't know how to use it properly in a query. Please give a query-based answer.
my query
select REGEXP_REPLACE(url,'^(www.)[a-zA-z0-9]*(.[a-zA-Z]{3})(\/[a-zA-Z09-]*)*','') from web_url;
That is the query I have so far; maybe it will help someone.
I also ran the query below to test whether the comparison happens at all.
Kindly ignore the last two lines, as they are only there to test the regexp_like function.
select regexp_like(url,'^(www.)(.*)(.com)$') from web_url;
If url is misleading and the column in fact holds an FQDN, as the comments suggest,
reverse(regexp_replace(reverse(regexp_replace(url, '\\.[^\\.]*\\.?$', '')), '\\..*', ''))
should give you the second label from the right (or the third if there is the optional empty label at the end), which seems to be what you want.
The reverse() calls are necessary since we don't know how many dots are in the string. Because of the reverse() after we've removed the last (or the last two) labels, we can simply remove everything from the first dot on.
If there were capture groups with regexp_replace() it would be easier, but I didn't find anything about that in the documentation. Please comment and point me to that, if I just missed that.
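On the capture group question: the ICU engine behind MySQL 8's REGEXP_REPLACE() is generally reported to accept $1-style group references in the replacement string, although the MySQL manual says little about it. The sketch below relies on that assumption and is worth verifying on your exact server version; the derived table is made up for the example.

```sql
-- Assumption: the replacement string understands $1 (ICU replacement syntax).
-- (?:.*\.)?  optional leading labels, ([^.]+) the label we want,
-- \.[^.]+    the last label, \.?$ an optional trailing dot.
SELECT url,
       REGEXP_REPLACE(url, '^(?:.*\\.)?([^.]+)\\.[^.]+\\.?$', '$1') AS label
FROM (
  SELECT 'www.google.com' AS url
  UNION ALL SELECT 'www.yahoo.com'
) AS sample;
```

If the group reference is honored, this is intended to produce google and yahoo without the double reverse().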

Ideas for Find and Replace character

I need to search address fields and change one character to upper case if there is an apartment number. So '521 Main St. #3b' would change to '521 Main St. #3B'.
The way I know to do this would be to write a program that loops through the recordset, looks at the address field for the last character to see if it's an alpha, then if the character before it is a numeric, change the case of the last char and update the record.
Is this something that would be quicker/simpler with regular expressions (which I have never used)?
If so, is this best done from within a programming environment or using a text editor such as TextMate or vi? The data is in MySQL and Excel, but I can export it to a text file.
Thanks.
I solved this using TextMate which, once I began to understand a little regex, was simple. (details here Regex Syntax for making the last character Uppercase in TextMate)
Still, I wonder if something like sed or awk, (which I started to try out) might be a better tool. And the SQL solution that Olexa provided works. I just don't know how to have it apply to the entire recordset.
If the data is stored in MySQL, then it is better to process it there:
UPDATE addresses
SET address = CONCAT(LEFT(address, CHAR_LENGTH(address) - 1), UPPER(RIGHT(address, 1)))
WHERE address REGEXP BINARY '#[[:digit:]]+[[:lower:]]{1}$'
;
I've added BINARY because otherwise REGEXP is not case-sensitive, but BINARY may need to be omitted to support multi-byte strings. In that case, rows whose last letter is already uppercase will be updated as well, but the result will still be correct.
P. S. An example on SQL Fiddle showing which values are affected, and how they are affected: http://sqlfiddle.com/#!2/b29326/1
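On MySQL 8.0, the same check can be written without BINARY by passing the 'c' (case-sensitive) match type to REGEXP_LIKE(). This is a sketch only: the sample rows are hypothetical stand-ins for the addresses table, and the query selects the would-be result instead of updating anything.

```sql
-- Hypothetical sample rows standing in for the addresses table.
SELECT address,
       CONCAT(LEFT(address, CHAR_LENGTH(address) - 1),
              UPPER(RIGHT(address, 1))) AS fixed
FROM (
  SELECT '521 Main St. #3b' AS address
  UNION ALL SELECT '12 Oak Ave. #7C'
  UNION ALL SELECT '9 Elm Rd.'
) AS sample
-- 'c' requests case-sensitive matching, so [a-z] only accepts lowercase.
WHERE REGEXP_LIKE(address, '#[0-9]+[a-z]$', 'c');
```

Only the first row qualifies, and its fixed value is '521 Main St. #3B'. The same WHERE clause can be attached to the UPDATE statement above.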

Regex Search in phpMyAdmin

Attempting to change the "files" folder location in a Drupal site from /files to /sites/default/files.
In order to avoid changing anything else such as
http://www.google.com/profiles/
I'm trying to use a basic regular expression with a word boundary.
\bfiles/
A quick check in regexpal works as expected, but when I enter the above in the phpMyAdmin search and tick the "as regular expression" checkbox, I don't get the expected result.
Two questions:
How should I write my expression with a word boundary so that it works in phpMyAdmin?
I'm really a newbie at SQL statements! Would it be possible to write a SQL query that would simply look for every occurrence of "files/" & replace it with "sites/default/files/"?
According to the MySQL docs, the regex flavour used (in versions before MySQL 8.0.4) is POSIX 1003.2. For this flavour of regex, word boundaries are as follows:
[[:<:]] (beginning) [[:>:]] (end)
so your regex would be:
[[:<:]]files/
If you want to use sql to search and replace all instances of [[:<:]]files/ from a specific field in a table, you could use a UDF such as the one found here
Also, you should be aware of the following while using regex with MySql:
Because MySQL uses the C escape syntax in strings (for example, “\n” to represent the newline character), you must double any “\” that you use in your REGEXP strings.
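For the second question, newer servers (MySQL 8.0.4+) ship a built-in REGEXP_REPLACE() and use \b for word boundaries instead of [[:<:]]. The sketch below runs against a made-up literal rather than a real Drupal table, so the sample text is an assumption.

```sql
-- MySQL 8.0.4+ (ICU): \b is the word boundary, doubled inside the SQL string.
SELECT REGEXP_REPLACE(
         '<img src="/files/a.png"> http://www.google.com/profiles/',
         '\\bfiles/',
         'sites/default/files/'
       ) AS rewritten;
-- /files/ is rewritten; profiles/ is left alone because there is no
-- word boundary between the "o" and the "f".
```

Note that the replacement text itself contains /files/, so running the same replacement twice would rewrite the path again. On servers older than 8.0.4 only the search side is available natively, with '[[:<:]]files/' as the pattern.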

MySQL: Find and Replace Between Certain Characters

In field post_content I have a string like this in nearly 800 rows:
http://somesite.com/">This is some site</a>
I need to remove everything from "> onwards so that it leaves just the URL. I can't do a straight find and replace because the text is unique.
Any clues? This is really my first foray into MySQL database modifications but I did do an extensive search before posting here.
Thanks,
~Kyle~
From this site: http://www.regular-expressions.info/mysql.html
LIB_MYSQLUDF_PREG
If you want more regular expression power in your database, you can consider using LIB_MYSQLUDF_PREG. This is an open source library of MySQL user functions that imports the PCRE library. LIB_MYSQLUDF_PREG is delivered in source code form only. To use it, you'll need to be able to compile it and install it into your MySQL server. Installing this library does not change MySQL's built-in regex support in any way. It merely makes the following additional functions available:
Here it comes...
PREG_CAPTURE extracts a regex match from a string. PREG_POSITION returns the position at which a regular expression matches a string. PREG_REPLACE performs a search-and-replace on a string. PREG_RLIKE tests whether a regex matches a string.
Sounds exactly what you're looking for.
All these functions take a regular expression as their first parameter. This regular expression must be formatted like a Perl regular expression operator. E.g. to test if regex matches the subject case insensitively, you'd use the MySQL code PREG_RLIKE('/regex/i', subject). This is similar to PHP's preg functions, which also require the extra // delimiters for regular expressions inside the PHP string.
See this post: How to do a regular expression replace in MySQL?
Either that, or you could just write a script in any language which goes through each record, does a regex replacement and then updates the field. For more info on regex, see here: http://www.regular-expressions.info/reference.html
There's a number of options. One might be to use SUBSTRING_INDEX():
UPDATE
table
SET field = SUBSTRING_INDEX( field, '">', 1 )
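To see what SUBSTRING_INDEX() keeps before touching any rows, the sample string from the question can be passed in as a literal. In the UPDATE above, table and field are placeholders for the real table and column names.

```sql
-- With a count of 1, everything before the first occurrence of "> is kept.
SELECT SUBSTRING_INDEX('http://somesite.com/">This is some site</a>', '">', 1) AS url_only;
-- returns http://somesite.com/
```

If the delimiter does not occur in a value, SUBSTRING_INDEX() returns the value unchanged, so rows without "> are not damaged.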
It's possible - there is a syntax for User Defined Functions which would let you pass in a regular expression pattern that matches the link and strips everything else.
However, this is quite complicated for somebody new to MySQL, and from your question, this sounds like a one-off. In which case - why not just use Excel and then reimport the data?
Great stuff!
All seems doable with a little bit of time and self education.
In the end, I exported that table as a CSV in Sequel Pro and did some nifty find and replace work in Coda. Not as sophisticated as your suggestions, but it worked.
Thanks again,
~Kyle~