Extract email address from MySQL field

I have a longtext column "description" in my table that sometimes contains an email address. I need to extract this email address and add to a separate column for each row. Is this possible to do in MySQL?

Yes, you can use MySQL's REGEXP_SUBSTR (it was added in MySQL 8.0, which may be after this question was posted):
SELECT *, REGEXP_SUBSTR(`description`, '([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+)\\.([a-zA-Z]{2,4})') AS Emails FROM `mytable`;

You can use substring_index to capture email addresses.
The first substring_index call captures the account (the text before the at sign).
The second substring_index call captures the hostname. It is anchored on the text before the first at sign, so that both halves come from the same email address in case there are multiple at signs (@) stored in the column.
select concat( substring_index(substring_index(description,'@',1),' ',-1)
, substring_index(substring_index( description,
substring_index(description,'@',1),-1),
' ',1)) as email
from mytable;
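The nested calls can be easier to follow when traced step by step. The following is a minimal Python sketch, assuming the delimiter is the at sign (@); the helper substring_index is a hypothetical stand-in for MySQL's SUBSTRING_INDEX and only approximates it (edge cases such as a count of 0 or an empty delimiter are simplified).

```python
def substring_index(s, delim, count):
    """Rough emulation of MySQL SUBSTRING_INDEX (hypothetical helper).

    Positive count: text left of the count-th delimiter from the left.
    Negative count: text right of the count-th delimiter from the right.
    Edge cases are simplified and may differ from MySQL.
    """
    if count == 0 or delim == "":
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


def extract_email(description):
    # Everything before the first '@'.
    prefix = substring_index(description, "@", 1)
    # Last space-separated word of that prefix: the account name.
    account = substring_index(prefix, " ", -1)
    # Everything after the prefix, i.e. starting at the first '@'.
    tail = substring_index(description, prefix, -1)
    # Up to the first space: '@' plus the hostname.
    host = substring_index(tail, " ", 1)
    return account + host


print(extract_email("contact bob@example.com today"))
```

Like the SQL version, this sketch only finds an address that is delimited by spaces, and it returns the first one in the text.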

In pure MySQL before 8.0 you can't select only the matched part of a regular expression match. You can use a MySQL extension (as stated in "Return matching pattern") or a scripting language (e.g. PHP).

MySQL does have regular expressions, but regular expressions are not the best way to match email addresses. I'd strongly recommend using your client language.
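For the client-language route, a minimal Python sketch might look like the following. The pattern is a deliberately loose approximation rather than a full address validator, and the name first_email is hypothetical.

```python
import re

# Loose approximation of an email address; not a full validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def first_email(description):
    """Return the first email-like substring, or None if there is none."""
    match = EMAIL_RE.search(description or "")
    return match.group(0) if match else None


print(first_email("Contact: jane.doe@example.com for details"))
```

The returned value can then be written back to the separate column with an ordinary parameterized UPDATE, one row at a time.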

If you can install the lib_mysqludf_preg MySQL UDF, then you could do:
SET @regex = "/([a-z0-9!#\$%&'\*\+\/=\?\^_`\{\|\}~\-]+(?:\.[a-z0-9!#\$%&'\*\+\/=\?^_`{\|}~\-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|aero|arpa|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|post|pro|tel|travel|xxx))/i";
SELECT
PREG_CAPTURE(@regex, description)
FROM
example
WHERE
PREG_CAPTURE(@regex, description) > '';
to extract the first email address from the description field.
I can't think of another solution, as the REGEXP operator simply returns 1 or 0, and not the location of where the regular expression matched.

Related

Excluding records using regex

I'm trying to exclude any email_id that has a name and ends with either @abcd.in or @abcd.live, and to include only the emails that contain mobile numbers. I'm not sure the regex I'm using is correct; can you help?
The statement I'm using to filter is below:
(NOT(lower(`table`.`user_email`) like '[a-z].*@Abcd.in$'|'[a-z].*@Abcd.live$')
If you want to do filtering based on a regular expression, you should be using REGEXP or REGEXP_LIKE (both are synonyms). Assuming you just want to exclude the two domains mentioned, you could use:
SELECT *
FROM yourTable
WHERE email NOT REGEXP '[a-z]+@Abcd\\.(in|live)$';
Assuming you wanted to enhance the above by also whitelisting certain email patterns, you could make another call to REGEXP.
I'll probably do something like this:
SELECT *
FROM mytable
WHERE SUBSTRING_INDEX(user_email,'@',-1) IN ('Abcd.live','Abcd.in')
AND SUBSTRING_INDEX(user_email,'@',1) REGEXP '[0-9]'
This uses SUBSTRING_INDEX() to separate the email name and domain, with @ as the delimiter. The first condition filters on the domain with IN, so any domain other than the ones listed is omitted. The second condition uses REGEXP to check whether the email name contains a digit.
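The same two-condition check can be sketched in Python to make the logic explicit. This is only an illustration: the function name keep_address is hypothetical, and lowercasing the domain assumes a case-insensitive comparison like the one a default MySQL collation would give.

```python
import re


def keep_address(user_email, domains=("abcd.live", "abcd.in")):
    """Mirror the two WHERE conditions: listed domain AND a digit in the name."""
    # Text after the last '@' (like SUBSTRING_INDEX(user_email, '@', -1)).
    domain = user_email.rsplit("@", 1)[-1]
    # Text before the first '@' (like SUBSTRING_INDEX(user_email, '@', 1)).
    local = user_email.split("@", 1)[0]
    # Assumption: domain comparison is case-insensitive.
    return domain.lower() in domains and re.search(r"[0-9]", local) is not None


emails = ["9876543210@Abcd.in", "john@abcd.live", "9876543210@gmail.com"]
print([e for e in emails if keep_address(e)])
```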

mysql replace function not expected results

select replace(lastname,'%%',firstname) as new1 from names;
When I run this, lastname is returned. Why? I expect it to search names.lastname for everything (%% wildcard) and return names.firstname.
All the syntax I have reviewed suggests I am doing this right, and it seems so simple...
Why?
The expression REPLACE(lastname,'%%',firstname) will return lastname, whenever lastname doesn't contain two contiguous percent sign characters. Why? Because that's the documented behavior of the REPLACE function.
The '%' is a wildcard when used with LIKE. It's not a wildcard in the REPLACE() function.
The expression in your question will search the value of lastname for occurrences of two contiguous percent sign characters, and replace those occurrences with the value in firstname.
For example:
SELECT REPLACE('fee%%fi%%fo','%%','-dah ') AS foo
foo
-------------------
fee-dah fi-dah fo
(I believe this answers the question you asked.)
What are you trying to achieve?
I'm pretty sure wildcards are not allowed in REPLACE, but the way you have written it suggests that you want to SELECT the lastname with all its characters replaced with the string firstname, which is the same as SELECT firstname.
If you need to change the lastname in the table, you will need to run an UPDATE:
UPDATE names SET lastname=firstname
If you simply want a concatenation of the two then SELECT CONCAT(lastname,' ', firstname) is your game.

MySQL: Select regex group [duplicate]

How to reference to a group using a regex in MySQL?
I tried:
REGEXP '^(.)\1$'
but it does not work.
How to do this?
(Old question, but top search result)
For MySQL 8:
SELECT REGEXP_REPLACE('stackoverflow','(.{5})(.*)','$2$1');
-- "overflowstack"
You can create capture groups with (), and you can refer to them using $1, $2, etc.
For MariaDB, capturing is done in REGEXP_REPLACE with \\1, \\2, etc. respectively.
You can't; there is no way to reference regex capturing groups in MySQL.
You can solve this problem by nesting the function calls in your query. Say you have this string in your column:
'100 SOME ST,THE VILLAGES,FL 32163,USA'
and you want to capture the city name. A Capture Group like this would work if MySQL supported it (but it doesn't):
'^[0-9A-Z\s]+,\s*([a-zA-Z\s]*)'
You CAN nest function calls to strip off the part you don't want, and then grab the part you DO want like this:
SELECT REGEXP_SUBSTR(REGEXP_REPLACE(column_name, '^[0-9\\sA-Z]+,', ''), '^[0-9\\sA-Z]+') FROM table_name;
THE VILLAGES
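To check that the nested strip-then-grab approach yields the same value a capture group would, here is a small Python sketch using the sample address from above. Python's re module is only a stand-in for MySQL's regex functions here, and the variable names are illustrative.

```python
import re

address = "100 SOME ST,THE VILLAGES,FL 32163,USA"

# Nested approach: strip the leading street part, then grab the next field.
remainder = re.sub(r"^[0-9\sA-Z]+,", "", address)
city_nested = re.match(r"^[0-9\sA-Z]+", remainder).group(0)

# Capture-group approach, for comparison.
city_group = re.match(r"^[0-9A-Z\s]+,\s*([a-zA-Z\s]*)", address).group(1)

print(city_nested, "|", city_group)
```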
...

How to remove fake names using regular expression in mysql query?

I want to remove the names which may be registered with fake names.
As the developer forgot to put validation on form registration.
Now i want to remove the fake names.
To check whether a name is fake, I am checking whether the name contains any numbers.
This is my query which i have written but its not working...
SELECT registration.regi_id, student.first_name,
student.cont_no, student.email_id,
registration.college,
registration.event_name,
registration.accomodation
FROM student, registration
WHERE student.stud_id = registration.stud_id
AND student.first_name NOT RLIKE '%[0-9]%'
How to fix this problem ?
Sorry for my language issues,
P.S.
There are many names in "first_name" field like "asdfasdf12323", i don't want that kind of names to be shown on list.
Your column may also contain alphanumeric characters. You need to filter both numbers and alphanumeric characters.
For Alphanumeric characters Try REGEXP '^[A-Za-z0-9]+$'
For numbers Try REGEXP '[0-9]'
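As far as the regex is concerned, RLIKE is a synonym for REGEXP, so your clause is already doing a regular expression match. The problem is the % signs: they are wildcards only for LIKE, and in a regular expression they match literal percent characters, so '%[0-9]%' only matches a digit surrounded by percent signs. Drop them, and don't add a * quantifier either, because '[0-9]*' also matches the empty string and would therefore match every name. Your last clause would look like this: AND student.first_name NOT REGEXP '[0-9]'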
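The difference between these patterns is easy to check outside the database. Below is a minimal Python sketch with made-up sample names; Python's re.search is used as a stand-in for MySQL's unanchored REGEXP match, which behaves the same way for these simple patterns.

```python
import re

names = ["alice", "asdfasdf12323", "bob7"]  # made-up sample data


def matching(pattern):
    """Names for which an unanchored regex search succeeds."""
    return [n for n in names if re.search(pattern, n)]


# '%' is a literal character in a regex, so nothing matches.
print(matching(r"%[0-9]%"))
# '[0-9]*' also matches the empty string, so every name matches.
print(matching(r"[0-9]*"))
# '[0-9]' matches only names that contain at least one digit.
print(matching(r"[0-9]"))

# Equivalent of NOT REGEXP '[0-9]': keep names without digits.
real_names = [n for n in names if not re.search(r"[0-9]", n)]
print(real_names)
```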

How to select all distinct filename extensions from table of filenames?

I have a table of ~20k filenames. How do I select a list of the distinct extensions? A filename extension can be considered the case insensitive string after the last .
You can use substring_index:
SELECT DISTINCT substring_index(column_containing_file_names,'.',-1) FROM table
-1 means it will start searching for the '.' from the right side.
A very useful capability in MySQL and other databases is the ability to use regular expression syntax when selecting data. For example:
SELECT something FROM table WHERE column REGEXP 'regexp'
see this http://www.tech-recipes.com/rx/484/use-regular-expressions-in-mysql-select-statements/
so you can write pattern to select what you want.
The answer given by @bnvdarklord is right, but its result set would also include file names that have no extension at all. If you want only real extensions, use the query below.
SELECT DISTINCT substring_index(column_containing_file_names,'.',-1) FROM table where column_containing_file_names like '%.%';
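The effect of the extra LIKE '%.%' filter can be seen in a small Python sketch. The file names are made up, rsplit stands in for substring_index(..., '.', -1), and the lowercasing reflects the case-insensitive requirement in the question.

```python
filenames = ["report.PDF", "photo.jpg", "archive.tar.gz", "README", "notes.pdf"]


def last_segment(name):
    """Text after the last '.', or the whole name if there is no '.'."""
    return name.rsplit(".", 1)[-1]


# Without the filter, a name with no dot is returned whole.
all_segments = sorted({last_segment(n).lower() for n in filenames})
print(all_segments)

# With the filter, only names that actually contain a dot are considered.
extensions = sorted({last_segment(n).lower() for n in filenames if "." in n})
print(extensions)
```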