mySQL query and match entire e-mail address - mysql

In a mySQL query I use something like this to search for matches
SELECT * FROM clients WHERE email = keyword
When the e-mail contains no hyphen (-), like foo#domain.com, and the keyword is foo, MySQL returns the correct result.
But when the e-mail contains a hyphen, like foo-bar#domain.com, and the keyword is foo-bar, MySQL returns all entries containing foo. The same happens with #.
Is there a workaround available to query and match an entire e-mail address, including the - and # signs?

Try using the ASCII code from an HTML code table.

Related

Excluding records using regex

I'm trying to exclude any email_id that has a name and ends with either #abcd.in or #abcd.live, and to include only the emails that contain mobile numbers, but I'm not sure whether the regex I'm using is correct. Can you help?
The statement I'm using to filter is below:
(NOT(lower(`table`.`user_email`) like '[a-z].*#Abcd.in$'|'[a-z].*#Abcd.live$')
If you want to do filtering based on a regular expression, you should be using REGEXP or REGEXP_LIKE (both are synonyms). Assuming you just want to exclude the two domains mentioned, you could use:
SELECT *
FROM yourTable
WHERE email NOT REGEXP '[a-z]+#Abcd\\.(in|live)$';
Assuming you wanted to enhance the above by also whitelisting certain email patterns, you could make another call to REGEXP.
I'd probably do something like this:
SELECT *
FROM mytable
WHERE SUBSTRING_INDEX(user_email,'#',-1) IN ('Abcd.live','Abcd.in')
AND SUBSTRING_INDEX(user_email,'#',1) REGEXP '[0-9]'
This uses SUBSTRING_INDEX() to separate the email name and the domain, with # as the delimiter. The first condition filters on the domain with IN, so any domain other than the ones listed is omitted. The second condition uses REGEXP to check whether the email name contains any digits.
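To see what the two conditions do, here is a rough Python sketch of the same logic. The helper substring_index is only an illustrative stand-in for MySQL's SUBSTRING_INDEX(), the sample addresses are made up, and the lower-casing assumes a case-insensitive collation on the column.

```python
import re

# Domains from the query above, lower-cased (assumes a case-insensitive collation).
ALLOWED_DOMAINS = {"abcd.live", "abcd.in"}


def substring_index(text: str, delim: str, count: int) -> str:
    """Illustrative stand-in for MySQL SUBSTRING_INDEX(text, delim, count)."""
    parts = text.split(delim)
    if count >= 0:
        # Everything to the left of the count-th delimiter.
        return delim.join(parts[:count])
    # Everything to the right of the count-th delimiter from the end.
    return delim.join(parts[count:])


def matches_filter(user_email: str) -> bool:
    """Mirror the WHERE clause: domain is in the list AND the name has a digit."""
    domain = substring_index(user_email, "#", -1).lower()
    name = substring_index(user_email, "#", 1)
    return domain in ALLOWED_DOMAINS and re.search(r"[0-9]", name) is not None


# Hypothetical sample rows, not taken from the question.
rows = ["9876543210#Abcd.in", "john#Abcd.live", "9876543210#other.com"]
kept = [row for row in rows if matches_filter(row)]
```

With these sample rows only the first address is kept: the second has no digit in the name, and the third is on a domain that is not in the list.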

Why isn't MySQL REGEXP filtering out these values?

So I'm trying to find what "special characters" have been used in my customer names. I'm going through updating this query to find them all one by one, but it's still showing all customers with a - despite me trying to exclude that in the query.
Here's the query I'm using:
SELECT * FROM customer WHERE name REGEXP "[^\da-zA-Z\ \.\&\-\(\)\,]+";
This customer (and many others with a dash) are still showing in the query results:
Test-able Software Ltd
What am I missing? Based on that regexp, shouldn't that one be excluded from the query results?
Testing it on https://regex101.com/r/AMOwaj/1 shows there is no match.
Edit - So I want to FIND any which have characters other than the ones in the regex character set. Not exclude any which do have these characters.
Your code checks if the string contains any character that does not belong to the character class, while you want to ensure that none does belong to it.
You can use ^ and $ to check the whole string at once:
SELECT * FROM customer WHERE name REGEXP '^[^\da-zA-Z .&\-(),]+$';
This would probably be simpler expressed with NOT, and without negating the character class:
SELECT * FROM customer WHERE name NOT REGEXP '[\da-zA-Z .&\-(),]';
Note that you don't need to escape all the characters within the character class, except probably for -.
Use [0-9] or [[:digit:]] to match digits irrespective of MySQL version.
Use the hyphen where it can't make part of a range construction.
Fix the expression as
SELECT * FROM customer WHERE name REGEXP "[^0-9a-zA-Z .&(),-]+";
If the entire text should match this pattern, enclose with ^ / $:
SELECT * FROM customer WHERE name REGEXP "^[^0-9a-zA-Z .&(),-]+$";
A hyphen implies a range unless it comes first in the character class (that is, right after the negating ^) or last.
So use
"[^-0-9a-zA-Z .&(),]"
I removed the + at the end because you don't really care how many; this way it will stop after finding one.
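A likely reason the original query still returns names with a dash: in a MySQL string literal, a backslash before a character that is not a recognised escape is simply dropped (unless NO_BACKSLASH_ESCAPES is enabled), so the regex engine never sees \- or \d. The Python sketch below imitates that step with a simplified helper, mysql_unescape, which is an illustration and not a MySQL API, and then runs the resulting patterns with Python's re module as a stand-in for MySQL's engine.

```python
import re


def mysql_unescape(literal: str) -> str:
    """Drop the backslash from every escape pair, as MySQL does for
    unrecognised escapes. Simplified: \\n, \\t and friends are not handled."""
    return re.sub(r"\\(.)", r"\1", literal)


name = "Test-able Software Ltd"

# The pattern as typed in the question.
as_typed = r"[^\da-zA-Z\ \.\&\-\(\)\,]+"
# What the regex engine receives once the backslashes are gone.
as_seen = mysql_unescape(as_typed)  # "[^da-zA-Z .&-(),]+"

# "&-(" is now a range from "&" to "(", so "-" is no longer in the class
# and the negated class matches the dash in the name.
old_match = re.search(as_seen, name)

# With the hyphen last (or first) it is a literal, and nothing matches.
fixed = "[^0-9a-zA-Z .&(),-]+"
new_match = re.search(fixed, name)

# A name with a character outside the allowed set is still found.
other_match = re.search(fixed, "Acme #1 Ltd")
```

Here old_match finds the dash, new_match is None, and other_match finds the # in the made-up name "Acme #1 Ltd".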

Isolate an email address from a string using MySQL

I am trying to isolate an email address from a block of free field text (column name is TEXT).
There are many different variations of preceding and succeeding characters in the free text field, i.e.:
email me! john#smith.com
e:john#smith.com m:555-555-5555
john#smith.com--personal email
I've tried variations of INSTR() and SUBSTRING_INDEX() to first isolate the "#" (probably the one reliable constant in finding an email...) and extracting the characters to the left (up until a space or non-qualifying character like "-" or ":") and doing the same thing with the text following the #.
However - everything I've tried so far hasn't filtered out the noise to the level I need.
Obviously 100% accuracy isn't possible but would someone mind taking a crack at how I can structure my select statement?
There is no easy solution to do this within MySQL. However you can do this easily after you have retrieved it using regular expressions.
Here is an example of how to use it in your case: Regex example
If you want it to select all e-mail addresses from one string: Regex Example
Within MySQL you can use a regex to select the rows that contain an e-mail, but that does not extract the matched group from the string; the extraction has to be done outside MySQL.
SELECT * FROM `table`
WHERE `column` RLIKE '[[:alnum:]_]*#[[:alnum:]_]*\\.[[:alnum:]_]*'
RLIKE only tests for a match. You can also use REGEXP in the SELECT list, but it returns only 1 or 0 depending on whether a match was found.
If you do want to extract it in MySQL, maybe this other Stack Overflow post helps you out. But it seems like a lot of work compared with doing it outside MySQL.
In MySQL 8.0 and later (and in MariaDB) you can use REGEXP_SUBSTR to isolate just the email from a block of free text. MySQL 5.x does not provide this function.
SELECT *, REGEXP_SUBSTR(`TEXT`, '([a-zA-Z0-9._%+-]+)#([a-zA-Z0-9.-]+)\\.([a-zA-Z]{2,4})') AS Emails FROM `mytable`;
If you want to get just the records with emails and remove duplicates ...
SELECT DISTINCT REGEXP_SUBSTR(`TEXT`, '([a-zA-Z0-9._%+-]+)#([a-zA-Z0-9.-]+)\\.([a-zA-Z]{2,4})') AS Emails FROM `mytable` WHERE `TEXT` REGEXP '([a-zA-Z0-9._%+-]+)#([a-zA-Z0-9.-]+)\\.([a-zA-Z]{2,4})';
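If you end up doing the extraction outside MySQL, the same pattern can be tried quickly in Python. This is only a sketch run against the three sample strings from the question; first_email is a made-up helper name, and Python's re is not identical to MySQL's regex engine.

```python
import re

# Same pattern as in the REGEXP_SUBSTR query, written as a Python raw string.
EMAIL = re.compile(r"[a-zA-Z0-9._%+\-]+#[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}")


def first_email(text: str):
    """Return the first address-like substring, or None (like REGEXP_SUBSTR's NULL)."""
    match = EMAIL.search(text)
    return match.group(0) if match else None


# Sample strings from the question, plus one row without an address.
rows = [
    "email me! john#smith.com",
    "e:john#smith.com m:555-555-5555",
    "john#smith.com--personal email",
    "no address in this row",
]

extracted = [first_email(row) for row in rows]

# Rough equivalent of SELECT DISTINCT ... WHERE ... REGEXP: drop misses and duplicates.
distinct = list(dict.fromkeys(e for e in extracted if e is not None))
```

All three sample strings yield john#smith.com, so the distinct list has a single entry.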

Find email links in HTML using MySQL

The HTML is stored within MySQL. What I need to do is find out if there are href links containing an email AND do not have mailto: prefixed to the email. Can this be done in MySQL?
This should be found by the query:
... user1#example.com ...
but not this one:
... user2#example.com ...
Note: I can use PHP/Python and parse the HTML if required, but I'm hoping there is a faster/easier way to do this by only using MySQL.
Bonus Question:
Can you use the above query in an update to add the missing mailto?
You can use MySQL REGEXP to find if there are any emails without the mailto.
SELECT * FROM `table` WHERE `column` REGEXP 'href="[A-Z0-9._%+-]+#[A-Z0-9.-]+\\.[A-Z]{2,4}"'
I believe that regex should match anything in this format: href="asdf#asdf.com"
But it won't match: href="mailto:asdf#asdf.com"
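For the bonus question, if parsing in Python turns out to be acceptable, the same pattern can both detect and repair the links. This is a minimal sketch: the two HTML snippets are invented for illustration, and the case-insensitive flag stands in for MySQL's default case-insensitive matching. In MySQL 8.0+ REGEXP_REPLACE might play a similar role inside an UPDATE, but I have not tested that here.

```python
import re

# href value that is a bare address (no "mailto:" prefix); group 1 is the address.
BARE_EMAIL_HREF = re.compile(
    r'href="([A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4})"',
    re.IGNORECASE,
)


def has_bare_email_href(html: str) -> bool:
    """True if some href holds an address without the mailto: prefix."""
    return BARE_EMAIL_HREF.search(html) is not None


def add_missing_mailto(html: str) -> str:
    """Insert mailto: in front of every bare address found in an href."""
    return BARE_EMAIL_HREF.sub(r'href="mailto:\1"', html)


# Hypothetical snippets, not the ones from the question.
missing = '<a href="user1#example.com">user1</a>'
correct = '<a href="mailto:user2#example.com">user2</a>'

fixed = add_missing_mailto(missing)
```

Links that already carry mailto: are left alone, because the colon is not in the character class and the pattern cannot reach the # from the start of the href value.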

MySQL REGEXP not matching string

I have a table of messages. I am trying to find messages in the table that have an ID code which complies with a specific format. The regexp that I have below was written for matching these values in PHP, but I want to move it to a MySQL query.
It is looking for a specific format of an identifier code that looks like this:
[692370613-3CUWU]
The code has a consistent format:
starts and ends with hard brackets [ ]
two components inside,
first is an account number, min 9 digits, but could be higher
second component is an alphanumeric code, 5 characters, can include 1-9, and capital letters excluding "O"
the complete code can occur anywhere in the message
I have a query that reads:
SELECT * FROM messages
WHERE
msgBody REGEXP '\\[(\d){9,}-([A-NP-Z1-9]){5}\\]'
OR
msgSubject REGEXP '\\[(\d){9,}-([A-NP-Z1-9]){5}\\]'
I created a test row in the table which has only the sample value above in the msgBody field for testing - but it does not return any results.
I am guessing that I am missing something in the conversion of PHP style regex vs. MySQL.
Help is greatly appreciated.
Thank you!
Instead of \d, try using [0-9] or [[:digit:]]:
SELECT * FROM messages
WHERE
msgBody REGEXP '\\[([0-9]){9,}-([A-NP-Z1-9]){5}\\]'
OR
msgSubject REGEXP '\\[([0-9]){9,}-([A-NP-Z1-9]){5}\\]'
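Why \d fails here can be shown with a small Python sketch. It assumes MySQL's default handling of backslashes in string literals (\\ becomes a single backslash, and the backslash of an unrecognised escape such as \d is dropped); mysql_unescape is a simplified illustrative helper, and Python's re stands in for MySQL's regex engine.

```python
import re


def mysql_unescape(literal: str) -> str:
    """Collapse each backslash pair to its second character.
    Simplified: recognised escapes such as \\n are not handled."""
    return re.sub(r"\\(.)", r"\1", literal)


sample = "[692370613-3CUWU]"

# The pattern as typed in the original query.
with_d = mysql_unescape(r"\\[(\d){9,}-([A-NP-Z1-9]){5}\\]")
# The engine receives "(d){9,}": nine or more letter d's, not digits.
found_with_d = re.search(with_d, sample)

# The pattern from the answer, with an explicit digit class.
with_class = mysql_unescape(r"\\[([0-9]){9,}-([A-NP-Z1-9]){5}\\]")
found_with_class = re.search(with_class, sample)
```

The first search finds nothing, while the second matches the whole sample code.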