Find email links in HTML using MySQL

The HTML is stored in MySQL. I need to find out whether it contains href links that hold an email address but do NOT have mailto: prefixed to the address. Can this be done in MySQL?
This should be found by the query:
... user1@example.com ...
but not this one:
... user2@example.com ...
Note: I can use PHP/Python and parse the HTML if required, but I'm hoping there is a faster/easier way to do this using only MySQL.
Bonus question:
Can the above query be used in an UPDATE to add the missing mailto:?

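You can use MySQL REGEXP to find emails that lack the mailto: prefix. The backslash before the dot is doubled because the MySQL string literal consumes one level of escaping; = and " need no escaping inside a single-quoted string. REGEXP is case-insensitive for non-binary strings, so A-Z also matches lowercase letters.
SELECT * FROM `table` WHERE `column` REGEXP 'href="[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}"'
I believe this regex matches anything in this format: href="asdf@asdf.com"
But it won't match: href="mailto:asdf@asdf.com"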
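For the bonus question, one option is to do the rewrite in application code, as the asker mentions being able to do. The following is only a minimal sketch in Python, assuming addresses use the usual @ separator and that hrefs are double-quoted; the function name add_missing_mailto is made up for illustration. It reuses the pattern from the answer with a capture group around the address.

```python
import re

# Same pattern as the REGEXP in the answer, with a capture group around the
# address. Assumption: hrefs are double-quoted and contain only the address.
BARE_EMAIL_HREF = re.compile(
    r'href="([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})"',
    re.IGNORECASE,
)


def add_missing_mailto(html):
    """Prefix bare e-mail hrefs with mailto:, leaving other links untouched."""
    return BARE_EMAIL_HREF.sub(r'href="mailto:\1"', html)


if __name__ == "__main__":
    sample = (
        '<a href="user1@example.com">one</a> '
        '<a href="mailto:user2@example.com">two</a>'
    )
    print(add_missing_mailto(sample))
```

Links that already start with mailto: are not touched, because the colon is not part of the character class that must directly follow href=". MySQL 8.0 and later also provide a REGEXP_REPLACE function that could be used in an UPDATE; older versions have no regex-based replace.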

Related

mySQL query and match entire e-mail address

In a MySQL query I use something like this to search for matches
SELECT * FROM clients WHERE email = keyword
When the e-mail contains no hyphen, like foo@domain.com, and the keyword is foo, MySQL easily returns the correct result.
But when the e-mail contains a hyphen, like foo-bar@domain.com, and the keyword is foo-bar, MySQL returns all entries containing foo. The same happens with @.
Is there a workaround available to query and match an entire e-mail address, including the - and @ signs?
Try using the ASCII code (the HTML character code) for those characters.
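If the goal is to match the entire address exactly, passing the keyword as a bound string parameter avoids any special treatment of - and @. The sketch below is an illustration only: it uses Python's built-in sqlite3 module with an in-memory table instead of MySQL so that it runs on its own, and the sample rows and the find_by_email helper are invented for the example.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the MySQL clients table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (email TEXT)")
conn.executemany(
    "INSERT INTO clients (email) VALUES (?)",
    [("foo@domain.com",), ("foo-bar@domain.com",), ("bar@domain.com",)],
)


def find_by_email(keyword):
    """Return the e-mails that equal the keyword exactly."""
    # The keyword is bound as a string value, so '-' and '@' are plain characters.
    rows = conn.execute("SELECT email FROM clients WHERE email = ?", (keyword,))
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(find_by_email("foo-bar@domain.com"))
```

With a bound parameter the comparison is made against the whole string, so a keyword of foo alone matches nothing in this table.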

Does a RegEx Pattern need to be modified to be used with SQL in MySql?

I'm trying to write a SELECT statement to retrieve a list of usernames from my database. My pattern is /placeholder\d+/ig; I have already tested it and can confirm it works properly. I'm trying to retrieve every placeholder in the table.
I also tried escaping the \ after placeholder.
My SQL statement is: SELECT * FROM table WHERE (name REGEX '/placeholder\d+/ig') ... I tried different variations with backticks, etc., and LIKE instead of REGEX, but LIKE only has % and _ as wildcards.
Does my regex pattern need to be modified in order to work with MySQL?
Unlike most scripting languages, MySQL does not use the PCRE library for regular expression matching.
So yes, you need to modify your regex to make it work properly in MySQL:
SELECT * FROM table WHERE name REGEXP 'placeholder[0-9]+'
OR
SELECT * FROM table WHERE name REGEXP 'placeholder[[:digit:]]+'
There are no shorthand character classes like \d in MySQL's POSIX-style regex engine (the one used before version 8.0.4). Also, you do not use regex delimiters or trailing flags: "/../si" is just ".." in MySQL.
Read the documentation on regular expressions in MySQL for more information.
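The conversion described in this answer can be sketched as a small helper. This is a rough illustration, not a general translator: the name to_mysql_regexp and the shorthand table are assumptions made for the example, and it does not handle shorthands inside bracket expressions or escaped backslashes.

```python
import re

# Assumed mapping from PCRE shorthands to POSIX-style equivalents.
SHORTHAND = {
    r"\d": "[0-9]",
    r"\w": "[A-Za-z0-9_]",
    r"\s": "[[:space:]]",
}


def to_mysql_regexp(pattern):
    """Strip /.../flags delimiters and expand a few shorthand classes."""
    match = re.fullmatch(r"/(.*)/[a-z]*", pattern, re.DOTALL)
    body = match.group(1) if match else pattern
    for short, posix in SHORTHAND.items():
        body = body.replace(short, posix)
    return body


if __name__ == "__main__":
    print(to_mysql_regexp(r"/placeholder\d+/ig"))
```

The i flag is simply dropped here because REGEXP is already case-insensitive for non-binary strings.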

SQL - How to remove HTML tags on Select

I have a column in a MySQL table that contains some HTML tags.
I need to export this table to an Excel xls file without these tags.
SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(HTML_ROW,'<br>',''),'</font>',''),'</b>',''),'<font size="2pt" color="#676767">',' | '),'<font color="#00c9ff"><b>',''),'<font color="#009f9f"><b>',''),'<font color="#e25ac6"><b>',''),'<font color="#008cff"><b>',''),'<font color="#c60c9e"><b>','') FROM MYTABLE
With the REPLACE function it works, but if I change any record in the table, I need to rewrite the SELECT.
And, of course, I don't think this is the fastest or best way to do this.
Is there another way to remove all HTML tags in a SELECT?
MySQL does not support wildcards or regular expressions in the REPLACE function, AFAIK.
If you cannot enumerate all possible tags in your query, I would suggest you keep the result as is and then remove the tags in Excel.
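Another way to remove the tags outside the query is a small post-processing step on the exported values. The sketch below uses Python's standard html.parser module; the strip_tags name is invented for the example, and it simply drops every tag rather than mapping particular tags to separators such as ' | ' the way the REPLACE chain does.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects only the text between tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Return the text content of an HTML fragment with all tags removed."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()  # flushes any text still buffered at the end of the input
    return "".join(parser.parts)


if __name__ == "__main__":
    print(strip_tags('<font color="#00c9ff"><b>Name</b></font><br>Value'))
```

Because the parser handles any tag and attribute combination, the list of tags does not have to be maintained as the data changes.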

Regexp to validate URL in MySQL

I have tried several regex patterns (designed for use with PHP because I couldn't find any for MySQL) for URL validation, but none of them are working. Probably MySQL has a slightly different syntax.
I've also tried to come up with one, but no success.
So does anyone know a fairly good regex to use with MySQL for URL validation?
According to section 11.5.2, Regular Expressions, of MySQL's documentation, you can select rows with a regular expression using the following syntax
SELECT field FROM table WHERE field REGEXP pattern
To match simple URLs, you may use
SELECT field FROM table
WHERE field REGEXP "^(https?://|www\\.)[\.A-Za-z0-9\-]+\\.[a-zA-Z]{2,4}"
This will match most urls like
www.google.il
http://google.com/
http://ww.google.net/
www.google.com/index.php?test=data
https://yahoo.dk/as
http://goo.gle.com/
http://wt.a.x24-s.org/ye/
www.website.info
But not
htp://google.com
ww.google.com/
www-google.com
http://google.c
http://goo#.com
httpf://google.com
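The two lists above can be checked outside the database. This is only a sketch: it runs the same expression through Python's re module rather than MySQL's engine (written with single backslashes, since a Python raw string does not consume an escape level), and the looks_like_url name is made up for the example.

```python
import re

# The pattern from the answer, as the regex engine sees it after MySQL has
# processed the string literal.
URL_START = re.compile(r"^(https?://|www\.)[.A-Za-z0-9-]+\.[a-zA-Z]{2,4}")


def looks_like_url(text):
    """True if the text starts with something the pattern accepts as a URL."""
    return URL_START.search(text) is not None


if __name__ == "__main__":
    for candidate in ("www.google.il", "http://google.com/", "htp://google.com"):
        print(candidate, looks_like_url(candidate))
```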
Although the answer KBA posted works, there are some inconsistencies in the escaping.
The proper syntax should be as follows; it works in MySQL as well as in PHP, for example.
SELECT field FROM table
WHERE field REGEXP "^(https?:\/\/|www\.)[\.A-Za-z0-9\-]+\.[a-zA-Z]{2,4}"
The above code only matches if the content of 'field' starts with a URL. To match all rows where the field contains a URL (for example, surrounded by other text or content), simply use:
SELECT field FROM table
WHERE field REGEXP "(https?:\/\/|www\.)[\.A-Za-z0-9\-]+\.[a-zA-Z]{2,4}"

How do I search and replace using regex in MySQL?

I'm trying to update a field which contains HTML and I want to find all the rows that have forms in them and remove the form tags and anything in between them, however I'm running into problems with my select and the regex.
SELECT * FROM db.table WHERE body REGEXP "<form[^>].+?>.+?</form>";
and the error I get says:
'repetition-operator operand invalid' from regexp.
I was hoping to make that SELECT into a subselect for an update query but I'm stuck at this point.
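I think your problem is in your form expression. Lazy quantifiers such as +? are not supported by MySQL's POSIX-style regex engine, and they are what triggers this error, so use a greedy quantifier instead. Try the following:
"<form[^>]*>.+</form>"
Remember that MySQL supports only a limited set of regular expression features for matching and testing.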
See this document.
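Since REGEXP only tests for a match, the removal itself can be done in application code after selecting the affected rows. The following is a minimal sketch in Python; the remove_forms name is invented for the example, and a regex like this is only a rough tool that assumes form tags are not nested.

```python
import re

# Non-greedy match from an opening <form ...> tag to the nearest </form>.
# DOTALL lets '.' span line breaks; IGNORECASE also covers <FORM>.
FORM_BLOCK = re.compile(r"<form\b[^>]*>.*?</form>", re.IGNORECASE | re.DOTALL)


def remove_forms(body):
    """Remove every form element, including its contents, from the HTML."""
    return FORM_BLOCK.sub("", body)


if __name__ == "__main__":
    html = '<p>before</p><form action="/x">\n<input name="q">\n</form><p>after</p>'
    print(remove_forms(html))
```

The cleaned value can then be written back with an ordinary parameterized UPDATE for each affected row.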