Excluding records using regex - mysql

I'm trying to exclude every email_id that has a name and ends with either #abcd.in or #abcd.live, and only include the emails that contain mobile numbers. I'm not sure whether the regex I'm using is correct. Can you help?
The statement I'm using to filter is below:
(NOT(lower(`table`.`user_email`) like '[a-z].*#Abcd.in$'|'[a-z].*#Abcd.live$')

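LIKE only understands the % and _ wildcards, so it cannot evaluate a regular expression. To filter on a regular expression, use REGEXP or REGEXP_LIKE() (the two are equivalent; REGEXP_LIKE() requires MySQL 8.0). Assuming you just want to exclude the two domains mentioned, you could use the following (the dot is escaped with two backslashes because the MySQL string parser consumes the first one):
SELECT *
FROM yourTable
WHERE email NOT REGEXP '[a-z]+#Abcd\\.(in|live)$';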
Assuming you wanted to enhance the above by also whitelisting certain email patterns, you could make another call to REGEXP.

I'll probably do something like this:
SELECT *
FROM mytable
WHERE SUBSTRING_INDEX(user_email,'#',-1) IN ('Abcd.live','Abcd.in')
AND SUBSTRING_INDEX(user_email,'#',1) REGEXP '[0-9]'
This uses SUBSTRING_INDEX() with # as the delimiter to separate the email name from the domain. The first condition filters on the domain with IN, so rows with any other domain are omitted. The second condition uses REGEXP to check that the email name contains at least one digit.
Demo fiddle
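For anyone who wants to check the two conditions outside the database, here is a minimal Python sketch of the same logic. It is only an illustration under assumptions: the helper name `keep_row` and the sample addresses are made up, `#` is the separator exactly as in the query above, and the domain is lowercased to mimic a case-insensitive collation.

```python
import re

# Domains accepted by the IN (...) condition, lowercased for comparison.
ALLOWED_DOMAINS = {"abcd.live", "abcd.in"}

def keep_row(user_email: str) -> bool:
    """Hypothetical helper mirroring the two WHERE conditions."""
    # SUBSTRING_INDEX(user_email, '#', -1): text after the last '#'.
    domain = user_email.rsplit("#", 1)[-1]
    # SUBSTRING_INDEX(user_email, '#', 1): text before the first '#'.
    name = user_email.split("#", 1)[0]
    has_digit = re.search(r"[0-9]", name) is not None
    return domain.lower() in ALLOWED_DOMAINS and has_digit

# Made-up sample rows.
samples = ["9876543210#Abcd.in", "john#Abcd.live", "12345#other.com"]
kept = [s for s in samples if keep_row(s)]
print(kept)
```

Only the first sample passes both conditions: the second has no digit in the name and the third has a different domain.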

Related

Isolate an email address from a string using MySQL

I am trying to isolate an email address from a block of free field text (column name is TEXT).
There are many different variations of preceding and succeeding characters in the free text field, i.e.:
email me! john#smith.com
e:john#smith.com m:555-555-5555
john#smith.com--personal email
I've tried variations of INSTR() and SUBSTRING_INDEX() to first isolate the "#" (probably the one reliable constant in finding an email...) and extracting the characters to the left (up until a space or non-qualifying character like "-" or ":") and doing the same thing with the text following the #.
However - everything I've tried so far hasn't filtered out the noise to the level I need.
Obviously 100% accuracy isn't possible but would someone mind taking a crack at how I can structure my select statement?
There is no easy solution to do this within MySQL. However you can do this easily after you have retrieved it using regular expressions.
Here is an example of how to use it in your case: Regex example
If you want to select all e-mail addresses from one string: Regex Example
Within MySQL you can use a regex to select the rows that contain an e-mail address, but that does not extract the matched text from the string. The extraction has to be done outside MySQL.
SELECT * FROM `table`
WHERE `column` RLIKE '[[:alnum:]_]+#[[:alnum:]_]+\\.[[:alnum:]_]+'
RLIKE only tests for a match. You can also use REGEXP in the SELECT list, but it only returns 1 or 0 depending on whether a match was found.
If you do want to extract it in MySQL maybe this other stackoverflow post helps you out. But it seems like a lot of work instead of doing it outside MySQL
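As a rough sketch of the "do it outside MySQL" route, the following Python snippet pulls the addresses out of text that has already been fetched. The pattern and the function name `extract_emails` are assumptions for illustration, `#` is the separator as it appears in the sample text of the question, and the pattern is deliberately simple rather than a full address validator.

```python
import re

# Simple, permissive pattern: local part, '#', host labels, dot, 2-4 letter TLD.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}")

def extract_emails(text: str) -> list[str]:
    """Return every substring of text that looks like an address."""
    return EMAIL_RE.findall(text)

# The three sample values from the question.
rows = [
    "email me! john#smith.com",
    "e:john#smith.com m:555-555-5555",
    "john#smith.com--personal email",
]
for row in rows:
    print(extract_emails(row))
```

Each of the three sample rows yields a single match, `john#smith.com`, because the surrounding punctuation (`:`, space, `--`) either falls outside the character classes or cannot be followed by a valid dot-plus-letters ending.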
In MySQL 8.0 and later you can use REGEXP_SUBSTR to isolate just the email from a block of free text. The backslash before the dot is doubled because the MySQL string parser consumes one of them.
SELECT *, REGEXP_SUBSTR(`TEXT`, '([a-zA-Z0-9._%+-]+)#([a-zA-Z0-9.-]+)\\.([a-zA-Z]{2,4})') AS Emails FROM `mytable`;
If you want to get just the records with emails and remove duplicates:
SELECT DISTINCT REGEXP_SUBSTR(`TEXT`, '([a-zA-Z0-9._%+-]+)#([a-zA-Z0-9.-]+)\\.([a-zA-Z]{2,4})') AS Emails FROM `mytable` WHERE `TEXT` REGEXP '([a-zA-Z0-9._%+-]+)#([a-zA-Z0-9.-]+)\\.([a-zA-Z]{2,4})';

Extract email address from mysql field

I have a longtext column "description" in my table that sometimes contains an email address. I need to extract this email address and add to a separate column for each row. Is this possible to do in MySQL?
Yes, you can use MySQL's REGEXP_SUBSTR (available from MySQL 8.0, which may postdate this question):
SELECT *, REGEXP_SUBSTR(`description`, '([a-zA-Z0-9._%+-]+)#([a-zA-Z0-9.-]+)\\.([a-zA-Z]{2,4})') AS Emails FROM `mytable`;
You can use SUBSTRING_INDEX to capture email addresses.
The first pair of SUBSTRING_INDEX calls captures the account.
The second pair captures the hostname. Using the text before the first # as the delimiter ensures that both parts come from the same email address in case multiple at signs (#) are stored in the column.
SELECT CONCAT(
  SUBSTRING_INDEX(SUBSTRING_INDEX(description, '#', 1), ' ', -1),
  SUBSTRING_INDEX(SUBSTRING_INDEX(description, SUBSTRING_INDEX(description, '#', 1), -1), ' ', 1)
) AS email
FROM mytable; -- mytable is a placeholder table name
You can't select only the matched part of a regular expression match using pure MySQL. You can use a MySQL extension (as stated in Return matching pattern) or a scripting language (e.g. PHP).
MySQL does have regular expressions, but regular expressions are not the best way to match email addresses. I'd strongly recommend using your client language.
If you can install the lib_mysqludf_preg MySQL UDF, then you could do:
SET @regex = "/([a-z0-9!#\$%&'\*\+\/=\?\^_`\{\|\}~\-]+(?:\.[a-z0-9!#\$%&'\*\+\/=\?^_`{\|}~\-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|aero|arpa|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|post|pro|tel|travel|xxx))/i";
SELECT
PREG_CAPTURE(@regex, description)
FROM
example
WHERE
PREG_CAPTURE(@regex, description) > '';
to extract the first email address from the description field.
I can't think of another solution, as the REGEXP operator simply returns 1 or 0, and not the location of where the regular expression matched.

MySQL regex only returns a single row

I have been writing a REGEX in MySQL to identify those domains that have a .com TLD. The URLs are usually of the form
http://example.com/
The regex I came up with looks like this:
REGEXP '[[.colon.]][[.slash.]][[.slash.]]([:alnum:]+)[[...]]com[[./.]]'
The reason we match the :// is so that we don't pick up URLs such as http://example.com/error.com/wrong.com
Therefore my query is
SELECT DISTINCT name
FROM table
WHERE name REGEXP '[[.colon.]][[.slash.]][[.slash.]]([:alnum:]+)[[...]]com[[./.]]'"
However, this is returning only a single row when it should really be returning many more (upwards of a thousand). What mistake am I making with the query?
Not sure if that's the problem, but it should be [[:alnum:]], not [:alnum:]
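Your current pattern requires a slash immediately after .com, so a URL with nothing after the domain (for example http://example.com) does not match. Make the slash and everything after it optional, and anchor the end so that hosts such as example.community are not picked up. The character class also has to be written as [[:alnum:]]:
SELECT DISTINCT name
FROM table
WHERE name REGEXP '[[.colon.]][[.slash.]][[.slash.]]([[:alnum:]]+)[[...]]com([[./.]].*)?$';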
It might be clearer to split the URL rather than match it with a regex:
SELECT DISTINCT name FROM table
WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(name, '/', 3), '.', -1) = 'com';
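If the filtering can happen in client code instead, roughly the same "split, don't regex" idea can be sketched in Python with the standard library's URL parser. The function name `has_com_tld` and the sample URLs are assumptions for illustration; this is not a drop-in equivalent of the query.

```python
from urllib.parse import urlsplit

def has_com_tld(url: str) -> bool:
    """Hypothetical helper: True when the host part of url ends in .com."""
    # urlsplit isolates the host, so path segments such as /error.com are ignored.
    host = urlsplit(url).hostname or ""
    # The text after the last dot of the host is the TLD.
    return host.lower().rsplit(".", 1)[-1] == "com"

# Made-up sample URLs.
urls = [
    "http://example.com/",
    "http://example.org/error.com/wrong.com",
    "http://example.community/",
]
com_urls = [u for u in urls if has_com_tld(u)]
print(com_urls)
```

Only the first URL is kept: the second has `.com` in its path but not in its host, and the third merely starts with `com` in its last label.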

How to select all distinct filename extensions from table of filenames?

I have a table of ~20k filenames. How do I select a list of the distinct extensions? A filename extension can be considered the case insensitive string after the last .
You can use substring_index:
SELECT DISTINCT substring_index(column_containing_file_names,'.',-1) FROM table
-1 means it will start searching for the '.' from the right side.
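To make the meaning of the count argument concrete, here is a small Python emulation of SUBSTRING_INDEX. It is a sketch for illustration, not MySQL's implementation: the function name mirrors the SQL one, the file names are made up, and returning an empty string for a zero count or empty delimiter is my understanding of MySQL's behavior rather than something stated above.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Emulate SUBSTRING_INDEX(s, delim, count) for illustration."""
    # Assumption: a zero count or an empty delimiter yields an empty string.
    if count == 0 or delim == "":
        return ""
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter, counted from the left.
        return delim.join(parts[:count])
    # Everything to the right of the |count|-th delimiter, counted from the right.
    return delim.join(parts[count:])

# Made-up file names; the last one has no extension.
files = ["report.PDF", "archive.tar.gz", "notes.pdf", "README"]

# Distinct, case-insensitive extensions, skipping names without a dot.
extensions = sorted({substring_index(f, ".", -1).lower() for f in files if "." in f})
print(extensions)
```

Note that a name without any dot comes back unchanged (`substring_index("README", ".", -1)` is `"README"`), which is why the sketch filters on `"." in f` before collecting extensions.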
A very cool and powerful capability in MySQL and other databases is the ability to incorporate regular expression syntax when selecting data. For example:
SELECT something FROM table WHERE column REGEXP 'regexp'
See http://www.tech-recipes.com/rx/484/use-regular-expressions-in-mysql-select-statements/
That way you can write a pattern to select what you want.
The answer given by @bnvdarklord is right, but its result set would also include file names that have no extension. If you want only actual extensions, use the query below.
SELECT DISTINCT substring_index(column_containing_file_names,'.',-1) FROM table where column_containing_file_names like '%.%';

MySql pattern matching

I need a MySQL pattern to match a number, followed by a question mark.
I need something like
... like '%[0-9]?%'
but I have no idea how to create this regular expression.
http://dev.mysql.com/doc/refman/5.0/en/pattern-matching.html does not help.
Thanks!
You could try this:
SELECT * FROM YourTable WHERE YourField REGEXP '[0-9]\\?'
That will return rows where YourField contains a number followed by a ? anywhere in the value.
If you want it to match only when the whole field is a digit followed by a ? (e.g. 9?), you could use this regex instead:
^[0-9]\\?$
I guess you're looking for something like this:
select * from table
where field rlike '[0-9]\\?'
Remember to escape the question mark. Otherwise, it will make the number optional.
Source.
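The effect of the escape is easy to see in any regex engine. The Python sketch below is only an illustration with made-up strings. Python's raw strings need a single backslash, whereas a MySQL string literal needs two because the string parser consumes the first one.

```python
import re

# Escaped: a digit followed by a literal question mark.
escaped = re.compile(r"[0-9]\?")
# Unescaped: the ? is a quantifier, so the digit becomes optional.
unescaped = re.compile(r"[0-9]?")

with_qmark = "Is it 9?"
without_digit = "no digits here"

print(escaped.search(with_qmark).group())        # the matched text "9?"
print(escaped.search(without_digit))             # None: nothing to match
print(unescaped.search(without_digit) is not None)  # True: empty match succeeds
```

Because the unescaped pattern can match the empty string, it succeeds on every input, which is why a filter built on it would return every row.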