MySQL: Select regex group [duplicate] - mysql

How do I reference a capture group in a regex in MySQL?
I tried:
REGEXP '^(.)\1$'
but it does not work.
How can I do this?

(Old question, but top search result)
For MySQL 8:
SELECT REGEXP_REPLACE('stackoverflow','(.{5})(.*)','$2$1');
-- "overflowstack"
You can create capture groups with (), and you can refer to them using $1, $2, etc.
For MariaDB, capture groups are referenced in REGEXP_REPLACE with \\1, \\2, etc.

In MySQL versions before 8.0 you can't: there is no way to reference regex capture groups.
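If the server's regex engine cannot handle backreferences, one fallback is to fetch the candidate values and filter them in the client. A minimal Python sketch using the standard re module; the function name and the sample values are made up for illustration:

```python
import re

# Same pattern as in the question: one character followed by itself.
DOUBLED = re.compile(r'^(.)\1$')

def is_doubled_char(value):
    """Return True when value is exactly one character repeated twice."""
    return DOUBLED.match(value) is not None

# Hypothetical values fetched from a column
rows = ['aa', 'ab', 'zz', 'aaa']
print([r for r in rows if is_doubled_char(r)])  # ['aa', 'zz']
```

This moves the filtering out of the database, so it only makes sense when the candidate set is small enough to fetch.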

You can solve this problem by nesting the function calls in your query. Say you have this string in your column:
'100 SOME ST,THE VILLAGES,FL 32163,USA'
and you want to capture the city name. A Capture Group like this would work if MySQL supported it (but it doesn't):
'^[0-9A-Z\s]+,\s*([a-zA-Z\s]*)'
You CAN nest function calls to strip off the part you don't want, and then grab the part you DO want like this:
SELECT REGEXP_SUBSTR(REGEXP_REPLACE(column_name, '^[0-9\\sA-Z]+,', ''), '^[0-9\\sA-Z]+') FROM table_name;
THE VILLAGES
...
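For comparison, here is roughly what the capture-group version looks like when the row is processed in client code instead. This is only a sketch in Python with the standard re module; extract_city is a hypothetical helper name:

```python
import re

# The capture-group pattern quoted above, unchanged.
CITY = re.compile(r'^[0-9A-Z\s]+,\s*([a-zA-Z\s]*)')

def extract_city(address):
    """Return the text of the first capture group, or None if no match."""
    m = CITY.match(address)
    return m.group(1) if m else None

print(extract_city('100 SOME ST,THE VILLAGES,FL 32163,USA'))  # THE VILLAGES
```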

Related

RegExp in mysql for field

I have the following query:
SELECT item from table
Which gives me:
<title>Titanic</title>
How would I extract the name "Titanic" from this? Something like:
SELECT re.find('\>(.+)\>, item) FROM table
What would be the correct syntax for this?
By default, MySQL does not provide functionality for extracting text using regular expressions. You can use REGEXP to find rows that match something like >.+<, but there is no straightforward way of extracting the captured group without some additional effort, such as:
using a library like lib_mysqludf_preg
writing your own MySQL function to extract matched text
performing regular string manipulation
using the regex functionality of whatever environment you're using MySQL from (e.g. PHP's preg_match)
reconsidering your need for regular expressions entirely. If you know that all your rows contain a <title> tag, for instance, it may be a better idea to simply use "normal" string functions such as SUBSTRING
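As an illustration of the "use the regex functionality of your environment" option, here is a minimal Python sketch (Python standing in for PHP's preg_match). The helper name extract_title is hypothetical, and the pattern assumes a single, well-formed <title> element per value:

```python
import re

# Non-greedy group so that only the text of the first <title> element is captured.
TITLE = re.compile(r'<title>(.+?)</title>')

def extract_title(item):
    """Return the text between <title> and </title>, or None if absent."""
    m = TITLE.search(item)
    return m.group(1) if m else None

print(extract_title('<title>Titanic</title>'))  # Titanic
```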
As pointed out in the informative answer by George Bahij, MySQL lacks this functionality, so the options are either to extend it using UDFs or to use the available string functions, in which case you could do:
SELECT
  SUBSTR(
    SUBSTRING_INDEX(
      SUBSTRING_INDEX(item, '<title>', 2),
      '</title>', 1)
    FROM 8
  )
FROM `table`;
Or, if the string you need to extract from is always in the format <title>item</title>, you could simply use replace: replace(replace(item, '<title>', ''), '</title>','')
This regex: <\w+>.+</\w+> will match content in tags (\w needs MySQL 8.0 or later, and the backslash must be doubled inside a SQL string literal).
Your query should be something like:
SELECT * FROM `table` WHERE `field` REGEXP '<\\w+>.+</\\w+>';
Then if you're using PHP or something similar you could use a function like strip_tags to extract the content between the tags.
XML shouldn't be parsed with regexes, and at any rate MySQL before 8.0 only supports regex matching, not replacement.
But MySQL supports XPath 1.0. You should be able to simply do this:
SELECT ExtractValue(item,'/title') AS item_title FROM table;
https://dev.mysql.com/doc/refman/5.6/en/xml-functions.html

Extract email address from mysql field

I have a longtext column "description" in my table that sometimes contains an email address. I need to extract this email address and add to a separate column for each row. Is this possible to do in MySQL?
Yes, you can use MySQL's REGEXP_SUBSTR (available from MySQL 8.0, which may postdate this question):
SELECT *, REGEXP_SUBSTR(`description`, '([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+)\\.([a-zA-Z]{2,4})') AS Emails FROM `mytable`;
You can use SUBSTRING_INDEX to capture email addresses.
The first SUBSTRING_INDEX captures the account.
The second SUBSTRING_INDEX captures the hostname. It is needed to pick the same email address in case there are multiple at signs (@) stored in the column.
select concat(substring_index(substring_index(description, '@', 1), ' ', -1),
              substring_index(substring_index(description,
                                              substring_index(description, '@', 1), -1),
                              ' ', 1))
You can't select only the matched part of a regular expression match using pure MySQL. You can use a MySQL extension (as stated in "Return matching pattern") or a scripting language (e.g. PHP).
MySQL does have regular expressions, but regular expressions are not the best way to match email addresses. I'd strongly recommend using your client language.
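A rough sketch of what the client-language route could look like, here in Python. The pattern is deliberately simplistic and will not cover every valid address; first_email is a hypothetical helper, not part of any library:

```python
import re

# Simplistic pattern: local part, "@", domain labels, and a final label of 2+ letters.
EMAIL = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')

def first_email(description):
    """Return the first email-like substring, or None if there is none."""
    m = EMAIL.search(description)
    return m.group(0) if m else None

print(first_email('Contact bob@example.com or al@x.org'))  # bob@example.com
```

The extracted value can then be written back to the separate column with an ordinary UPDATE issued from the same script.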
If you can install the lib_mysqludf_preg MySQL UDF, then you could do:
SET @regex = "/([a-z0-9!#\$%&'\*\+\/=\?\^_`\{\|\}~\-]+(?:\.[a-z0-9!#\$%&'\*\+\/=\?^_`{\|}~\-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|aero|arpa|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|post|pro|tel|travel|xxx))/i";
SELECT
  PREG_CAPTURE(@regex, description)
FROM
  example
WHERE
  PREG_CAPTURE(@regex, description) > '';
to extract the first email address from the description field.
I can't think of another solution, as the REGEXP operator simply returns 1 or 0, and not the location of where the regular expression matched.

MySql : query to format a specific column in database table

I have a column named phone_number in a database table. Right now the numbers are stored in a format like +91-852-9689568. I want to reformat them and keep only the digits.
How can I do this in MySQL? I have tried functions like REGEXP, but I get an error saying the function does not exist. And I don't want to use multiple REPLACE calls.
One option is to use MySQL's SUBSTRING (as long as the format doesn't change):
SELECT concat(SUBSTRING(pNo,2,2), SUBSTRING(pNo,5,3), SUBSTRING(pNo,9,7));
If you only want to format the value in the result set, use SELECT; you only need to call REPLACE twice, which is no problem.
SELECT REPLACE(REPLACE(columnName, '-', ''), '+', '')
FROM tableName
Otherwise, if you want to update the value permanently, use UPDATE:
UPDATE tableName
SET columnName = REPLACE(REPLACE(columnName, '-', ''), '+', '')
MySQL before 8.0 does not have a built-in function for pattern-matching replacement.
You'll be better off fetching the whole string back to your application, and then using a more flexible string-manipulation function on it. For instance, preg_replace() in PHP.
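A minimal sketch of that application-side approach, written in Python instead of PHP; digits_only is a hypothetical helper name:

```python
import re

def digits_only(phone):
    """Drop every character that is not a decimal digit."""
    return re.sub(r'\D', '', phone)

print(digits_only('+91-852-9689568'))  # 918529689568
```

Unlike the fixed-position SUBSTRING approach, this does not depend on the separators staying in the same places.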
Try the following and comment, please.
Select dbo.Regex('\d+',pNo);
Select dbo.Regex('[0-9]+',pNo);
Reference on Rubular.
MySQL is not like Oracle here, so you may just use a user-defined function to extract the numbers. This could get you going.

How to select all distinct filename extensions from table of filenames?

I have a table of ~20k filenames. How do I select a list of the distinct extensions? A filename extension can be considered the case insensitive string after the last .
You can use substring_index:
SELECT DISTINCT substring_index(column_containing_file_names,'.',-1) FROM table
-1 means it will start searching for the '.' from the right side.
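To make the count argument concrete, here is a small Python model of how SUBSTRING_INDEX behaves for positive and negative counts. It is a sketch of the documented behaviour, not MySQL's implementation; it assumes a non-empty delimiter, and the file names are invented:

```python
def substring_index(s, delim, count):
    """Rough model of MySQL's SUBSTRING_INDEX for a non-empty delimiter."""
    if count == 0:
        return ''
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter, counted from the left.
        return delim.join(parts[:count])
    # Everything to the right of the |count|-th delimiter, counted from the right.
    return delim.join(parts[count:])

def distinct_extensions(names):
    """Case-insensitive set of extensions; names without a '.' are skipped."""
    return {substring_index(n, '.', -1).lower() for n in names if '.' in n}

print(substring_index('www.mysql.com', '.', 2))   # www.mysql
print(substring_index('www.mysql.com', '.', -2))  # mysql.com
print(sorted(distinct_extensions(['a.TXT', 'b.txt', 'c.tar.gz', 'README'])))  # ['gz', 'txt']
```

Lower-casing the result is what makes the list case-insensitive, as the question asks; in SQL the same effect would come from wrapping the expression in LOWER().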
A very cool and powerful capability in MySQL and other databases is the ability to incorporate regular expression syntax when selecting data, for example:
SELECT something FROM table WHERE column REGEXP 'regexp'
See http://www.tech-recipes.com/rx/484/use-regular-expressions-in-mysql-select-statements/
so you can write a pattern to select what you want.
The answer given by @bnvdarklord is right, but it would also include file names that do not have extensions in the result set, so if you want only extensions use the query below.
SELECT DISTINCT substring_index(column_containing_file_names,'.',-1) FROM table where column_containing_file_names like '%.%';

mysql: replace \ (backslash) in strings

I am having the following problem:
I have a table T which has a column Name with names. The names have the following structure:
A\\B\C
You can create one yourself like this:
create table T ( Name varchar(10));
insert into T values ('A\\\\B\\C');
select * from T;
Now if I do this:
select Name from T where Name = 'A\\B\C';
That doesn't work, I need to escape the \ (backslash):
select Name from T where Name = 'A\\\\B\\C';
Fine.
But how do I do this automatically to a string Name?
Something like the following won't do it:
select replace('A\\B\C', '\\', '\\\\');
I get: A\\\BC
Any suggestions?
Many thanks in advance.
You have to use a "verbatim string". With it, your Replace call will look like this:
Replace(@"\", @"\\")
I hope this helps.
The literal A\\B\C must be coded as A\\\\B\\C, and the parameters of replace() need escaping too:
select 'A\\\\B\\C', replace('A\\\\B\\C', '\\', '\\\\');
output (see this running on SQLFiddle):
A\\B\C A\\\\B\\C
So there is little point in using replace. These two statements are equivalent:
select Name from T where Name = replace('A\\\\B\\C', '\\', '\\\\');
select Name from T where Name = 'A\\\\B\\C';
Using a regular expression will solve your problem.
The query below handles the given example.
1) S\\D\B
select * from T where Name REGEXP '[A-Z]\\\\\\\\[A-Z]\\\\[A-Z]$';
In case the names have more than one character per segment:
2) D\\B\ACCC
select * from T where Name REGEXP '[A-Z]{1,5}\\\\\\\\[A-Z]{1,5}\\\\[A-Z]{1,5}$';
Note: I have used 5 as the maximum number of characters per segment, considering the field size is 10 as given in the create table query.
This can be generalized further. If it still does not meet your needs, feel free to ask.
You're confusing what's IN the database with how you represent that data in SQL statements. When a string in the database contains a special character like \, you have to type \\ to represent that character, because \ is a special character in SQL syntax. You have to do this in INSERT statements, but you also have to do it in the parameters to the REPLACE function. There are never actually any double slashes in the data, they're just part of the UI.
Why do you think you need to double the slashes in the SQL expression? If you're typing queries, you should just double the slashes in your command line. If you're generating the query in a programming language, the best solution is to use prepared statements; the API will take care of proper encoding (prepared statements usually use a binary interface, which deals with the raw data). If, for some reason, you need to perform queries by constructing strings, the language should hopefully provide a function to escape the string. For instance, in PHP you would use mysqli_real_escape_string.
But you can't do it by SQL itself -- if you try to feed the non-escaped string to SQL, data is lost and it can't reconstruct it.
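The difference between the stored data and its SQL representation can be modelled in a few lines of Python. This is a simplified sketch of MySQL's default backslash handling in string literals (NO_BACKSLASH_ESCAPES not set); both function names are made up, and the escaping helper is for illustration only, not a replacement for prepared statements:

```python
# Escape sequences MySQL gives a special meaning inside a string literal.
SPECIAL = {'0': '\x00', "'": "'", '"': '"', 'b': '\b', 'n': '\n',
           'r': '\r', 't': '\t', 'Z': '\x1a', '\\': '\\'}

def mysql_unescape(body):
    """Model what the server stores for the text typed between the quotes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == '\\' and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in SPECIAL:
                out.append(SPECIAL[nxt])
            elif nxt in '%_':
                out.append('\\' + nxt)   # \% and \_ keep their backslash
            else:
                out.append(nxt)          # unknown escape: backslash is dropped
            i += 2
        else:
            out.append(ch)
            i += 1
    return ''.join(out)

def escape_for_literal(value):
    """Illustration only: double backslashes and escape single quotes."""
    return value.replace('\\', '\\\\').replace("'", "\\'")

print(mysql_unescape(r'A\\\\B\\C'))  # A\\B\C  (what the INSERT stores)
print(mysql_unescape(r'A\\B\C'))     # A\BC    (information already lost)
```

The second call shows why the fix cannot happen inside SQL: by the time REPLACE runs, the unescaped literal has already lost a backslash, so the escaping has to be applied before the statement reaches the server.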
You could use LIKE:
SELECT NAME FROM T WHERE NAME LIKE '%\\\\%';
I'm not exactly sure what you mean, but this should work when the NO_BACKSLASH_ESCAPES SQL mode is enabled (by default, '\' is an unterminated string literal):
select replace('A\\B\C', '\', '\\');
It replaces \ with \\ wherever it is encountered :)
Is this what you wanted?