Regexp to validate URL in MySQL

I have tried several regex patterns (designed for use with PHP because I couldn't find any for MySQL) for URL validation, but none of them are working. Probably MySQL has a slightly different syntax.
I've also tried to come up with one, but no success.
So does anyone know a fairly good regex to use with MySQL for URL validation?

According to section 11.5.2, Regular Expressions, of the MySQL documentation, you can select rows with a regular expression using the following syntax:
SELECT field FROM table WHERE field REGEXP pattern
To match simple URLs, you may use
SELECT field FROM table
WHERE field REGEXP "^(https?://|www\\.)[\.A-Za-z0-9\-]+\\.[a-zA-Z]{2,4}"
This will match most URLs, such as
www.google.il
http://google.com/
http://ww.google.net/
www.google.com/index.php?test=data
https://yahoo.dk/as
http://goo.gle.com/
http://wt.a.x24-s.org/ye/
www.website.info
But not
htp://google.com
ww.google.com/
www-google.com
http://google.c
http://goo#.com
httpf://google.com
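The two lists above can be checked mechanically. The sketch below uses Python's re module as a stand-in for MySQL's matcher, which is an assumption: the pattern only uses syntax that both engines share, and re.search mirrors the way REGEXP matches anywhere in the value unless the pattern is anchored. The pattern is written as the regex engine receives it, that is, after MySQL's string parser (under the default SQL mode) has removed one level of backslashes.

```python
import re

# The pattern as the regex engine sees it: "\\." in the SQL literal arrives
# as "\.", and the backslashes inside the brackets are simply dropped.
URL_PATTERN = re.compile(r"^(https?://|www\.)[.A-Za-z0-9-]+\.[a-zA-Z]{2,4}")

should_match = [
    "www.google.il",
    "http://google.com/",
    "http://ww.google.net/",
    "www.google.com/index.php?test=data",
    "https://yahoo.dk/as",
    "http://goo.gle.com/",
    "http://wt.a.x24-s.org/ye/",
    "www.website.info",
]

should_not_match = [
    "htp://google.com",
    "ww.google.com/",
    "www-google.com",
    "http://google.c",
    "http://goo#.com",
    "httpf://google.com",
]


def looks_like_url(text: str) -> bool:
    # search() rather than fullmatch(): like REGEXP, it succeeds as soon as
    # the pattern is found, and the leading ^ pins it to the start.
    return URL_PATTERN.search(text) is not None


matched = [u for u in should_match if looks_like_url(u)]
rejected = [u for u in should_not_match if not looks_like_url(u)]
print(len(matched), "of", len(should_match), "accepted")
print(len(rejected), "of", len(should_not_match), "rejected")
```

Because the pattern has no trailing anchor, anything after the top-level domain (paths, query strings) is ignored rather than validated.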

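Although the answer KBA posted works, there are some inconsistencies in the escaping: the dots outside the brackets use a doubled backslash, while the characters inside the brackets use a single one.
Under MySQL's default SQL mode, the string parser strips one level of backslashes before the regex engine sees the pattern, so a literal dot outside the brackets needs \\. (a single \. would arrive as a bare . and match any character). Inside a bracket expression the dot and the hyphen need no escaping, and neither do the forward slashes. A consistent version is:
SELECT field FROM table
WHERE field REGEXP "^(https?://|www\\.)[.A-Za-z0-9-]+\\.[a-zA-Z]{2,4}"
The above code will only match if the content of 'field' starts with a URL. To match all rows where the field contains a URL anywhere (for example, surrounded by other text or content), drop the leading ^:
SELECT field FROM table
WHERE field REGEXP "(https?://|www\\.)[.A-Za-z0-9-]+\\.[a-zA-Z]{2,4}"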

Related

MYSQL REGEXP with JSON array

I have a JSON string stored in the database and I need to run a SQL COUNT based on a WHERE condition on a value inside that JSON string. It needs to work on MySQL 5.5.
The only solution I have found that could work is to use REGEXP in the SQL query.
Here is my JSON string stored in the custom_data column:
{"language_display":["1","2","3"],"quantity":1500,"meta_display:":["1","2","3"]}
https://regex101.com/r/G8gfzj/1
I now need to create a SQL statement:
SELECT COUNT(..) WHERE custom_data REGEXP '[HELP_HERE]'
The condition I am looking for is that language_display contains 1, 2 or 3, or whatever value I define when I build the SQL statement.
So far I have come up with this regular expression, but it does not work:
(?:\"language_display\":\[(?:"1")\])
Where 1 is replaced with the value that I look for. I could in general look also for "1" (with quotes), but it will also be found in the meta_display array, that will have different values.
I am not good with REGEX! Any suggestions?
I used the following regex to get matches on your test string
\"language_display\":\[(:?\"[0-9]\"\,)*?\"3\"(:?\,\"[0-9]\")*?\]
https://regex101.com/ is a free online regex tester, and it seems to work great. Start small and build up.
Sorry it doesn't work for you. It is probably failing on the non-greedy '*?', which MySQL 5.5's regex engine does not support; try it without the '?'.
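A variant that avoids lazy quantifiers and non-capturing groups altogether might look like the sketch below. It assumes that MySQL 5.5 accepts only POSIX-style syntax, so it sticks to plain groups and a greedy '*'. Python's re module is used here only to exercise the pattern; language_display_pattern is a hypothetical helper, and the values are assumed to be digit strings.

```python
import re


def language_display_pattern(value: str) -> str:
    # Hypothetical helper: finds `value` only inside the language_display
    # array, with any number of quoted numbers before and after it.
    return f'"language_display":\\[("[0-9]+",)*"{value}"(,"[0-9]+")*\\]'


row = '{"language_display":["1","2","3"],"quantity":1500,"meta_display:":["1","2","3"]}'
# Here "3" appears only in meta_display, so it must not count as a hit.
other = '{"language_display":["1","2"],"quantity":1500,"meta_display:":["3","4"]}'

for value in ("1", "2", "3"):
    pattern = language_display_pattern(value)
    print(value, bool(re.search(pattern, row)), bool(re.search(pattern, other)))
```

When the pattern is pasted into a MySQL string literal, the backslashes before [ and ] would need to be doubled, because the string parser removes one level.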
Have a look at how to serialize this data, with an eye to serializing the language display fields.
How to store a list in a column of a database table
Even if you got this approach working, it would be very slow. You are better off processing each row once and generating something more easily searched via SQL. Even a field containing the comma-separated list would be better.
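As a rough illustration of that one-off processing step, the sketch below parses the stored JSON and produces the comma-separated list. The function name language_list is hypothetical, and how the result is written back to the table is left out.

```python
import json


def language_list(custom_data: str) -> str:
    """Return the language_display values as a comma-separated string."""
    data = json.loads(custom_data)
    # A missing key is treated as an empty list (an assumption of this sketch).
    return ",".join(data.get("language_display", []))


row = '{"language_display":["1","2","3"],"quantity":1500,"meta_display:":["1","2","3"]}'
print(language_list(row))  # 1,2,3
```

A column holding such a list could then be searched with MySQL's FIND_IN_SET instead of a regular expression.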

Does a RegEx Pattern need to be modified to be used with SQL in MySql?

I'm trying to write a SELECT statement to retrieve a list of usernames from my database. My pattern is /placeholder\d+/ig; I have tested it and can confirm it works. I'm trying to retrieve every placeholder in the table.
I also tried to escape the \ after placeholder.
My SQL statement is: SELECT * FROM table WHERE (name REGEX '/placeholder\d+/ig') ... I tried different variations with backticks, etc., or LIKE instead of REGEX, but LIKE only has % and _ as wildcards.
Does my RegEx pattern need to be modified in order to work with MySQL?
Unlike most scripting languages, MySQL does not use the PCRE library for regular expression matching.
So yes, you need to modify your regex to make it work properly in MySQL:
SELECT * FROM table WHERE name REGEXP 'placeholder[0-9]+'
OR
SELECT * FROM table WHERE name REGEXP 'placeholder[[:digit:]]+'
There are no shorthand character classes like \d in MySQL. Also, you do not use regex delimiters or trailing flags ("/../si" is just ".." in MySQL), and the operator is spelled REGEXP (or RLIKE), not REGEX.
Read the documentation on regular expressions in MySQL for more information.
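The two adjustments described above (dropping the delimiters and flags, and spelling out \d) can be sketched as a small conversion routine. pcre_literal_to_mysql is a hypothetical helper written for illustration; it covers only these two differences and does not handle every PCRE construct.

```python
import re


def pcre_literal_to_mysql(literal: str) -> str:
    # Drop the /.../flags wrapper if the input has one.
    wrapped = re.fullmatch(r"/(.*)/[a-z]*", literal, flags=re.DOTALL)
    body = wrapped.group(1) if wrapped else literal
    # Replace the \d shorthand with an explicit bracket expression.
    return body.replace(r"\d", "[0-9]")


print(pcre_literal_to_mysql(r"/placeholder\d+/ig"))  # placeholder[0-9]+
```

The result is the same pattern used in the first query above.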

RegExp in mysql for field

I have the following query:
SELECT item from table
Which gives me:
<title>Titanic</title>
How would I extract the name "Titanic" from this? Something like:
SELECT re.find('\>(.+)\>, item) FROM table
What would be the correct syntax for this?
By default, MySQL does not provide functionality for extracting text using regular expressions. You can use REGEXP to find rows that match something like >.+<, but there is no straightforward way of extracting the captured group without some additional effort, such as:
using a library like lib_mysqludf_preg
writing your own MySQL function to extract matched text
performing regular string manipulation
using the regex functionality of whatever environment you're using MySQL from (e.g. PHP's preg_match)
reconsidering your need for regular expressions entirely. If you know that all your rows contain a <title> tag, for instance, it may be a better idea to simply use "normal" string functions such as SUBSTRING
As pointed out in the informative answer by George Bahij, MySQL lacks this functionality, so the options are either to extend it using UDFs etc., or to use the available string functions, in which case you could do:
SELECT
  SUBSTR(
    SUBSTRING_INDEX(
      SUBSTRING_INDEX(item, '<title>', 2),
      '</title>', 1)
    FROM 8
  )
FROM table
Or, if the string you need to extract from is always in the format <title>item</title>, you could simply use replace: replace(replace(item, '<title>', ''), '</title>','')
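To see why the nested SUBSTRING_INDEX calls work, it can help to trace them one step at a time. The sketch below does that in Python with substring_index, a hypothetical stand-in that approximates MySQL's SUBSTRING_INDEX for positive counts only.

```python
def substring_index(text: str, delim: str, count: int) -> str:
    """Rough stand-in for MySQL SUBSTRING_INDEX (positive counts only):
    everything to the left of the count-th occurrence of delim."""
    return delim.join(text.split(delim)[:count])


item = "<title>Titanic</title>"

# There is only one "<title>", so asking for the part before the second
# occurrence returns the whole string.
step1 = substring_index(item, "<title>", 2)

# Everything before the first "</title>".
step2 = substring_index(step1, "</title>", 1)

# SUBSTR(... FROM 8) is 1-based, so it skips the 7 characters of "<title>".
title = step2[7:]

print(step1)
print(step2)
print(title)
```

The nested replace() alternative corresponds to item.replace("<title>", "").replace("</title>", "") in the same terms.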
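This regex: <[[:alnum:]_]+>.+</[[:alnum:]_]+> will match content in tags. (MySQL's regex engine has no \w shorthand, and a lone \w in a string literal is reduced to a plain w, so a POSIX character class is used instead.)
Your query should be something like:
SELECT * FROM `table` WHERE `field` REGEXP '<[[:alnum:]_]+>.+</[[:alnum:]_]+>';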
Then if you're using PHP or something similar you could use a function like strip_tags to extract the content between the tags.
XML shouldn't be parsed with regexes, and at any rate MySQL only supports matching, not replacement.
But MySQL supports XPath 1.0. You should be able to simply do this:
SELECT ExtractValue(item,'/title') AS item_title FROM table;
https://dev.mysql.com/doc/refman/5.6/en/xml-functions.html

Using MySQL LIKE operator for fields encoded in JSON

I've been trying to get a table row with this query:
SELECT * FROM `table` WHERE `field` LIKE "%\u0435\u0442\u043e\u0442%"
Field itself:
Field
--------------------------------------------------------------------
\u0435\u0442\u043e\u0442 \u0442\u0435\u043a\u0441\u0442 \u043d\u0430
Although I can't seem to get it working properly.
I've already tried experimenting with the backslash character:
LIKE "%\\u0435\\u0442\\u043e\\u0442%"
LIKE "%\\\\u0435\\\\u0442\\\\u043e\\\\u0442%"
But none of them seem to work either.
I'd appreciate it if someone could give a hint as to what I'm doing wrong.
Thanks in advance!
EDIT
Problem solved.
Solution: even after correcting the syntax of the query, it didn't return any results. After making the field BINARY the query started working.
As documented under String Comparison Functions:
Note
Because MySQL uses C escape syntax in strings (for example, “\n” to represent a newline character), you must double any “\” that you use in LIKE strings. For example, to search for “\n”, specify it as “\\n”. To search for “\”, specify it as “\\\\”; this is because the backslashes are stripped once by the parser and again when the pattern match is made, leaving a single backslash to be matched against.
Therefore:
SELECT * FROM `table` WHERE `field` LIKE '%\\\\u0435\\\\u0442\\\\u043e\\\\u0442%'
See it on sqlfiddle.
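The two rounds of backslash stripping described in the documentation note can be traced with a simplified model. In the sketch below, strip_one_level and seen_by_like are hypothetical helpers: each pass drops a backslash and keeps the character after it, and special sequences (newline, percent and underscore escapes) are deliberately ignored, so this is an approximation rather than MySQL's actual parser.

```python
import re


def strip_one_level(text: str) -> str:
    # Simplified model of one unescaping pass: drop each backslash and
    # keep the character that follows it.
    return re.sub(r"\\(.)", r"\1", text)


def seen_by_like(sql_literal: str) -> str:
    after_parser = strip_one_level(sql_literal)  # pass 1: SQL string parser
    return strip_one_level(after_parser)         # pass 2: LIKE pattern matching


# One, two and four backslashes, as tried in the question and the answer.
for attempt in (r"\u0435", r"\\u0435", r"\\\\u0435"):
    print(attempt, "->", seen_by_like(attempt))
```

Only the version with four backslashes leaves a single backslash in front of u0435, which is what the stored field actually contains.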
This can be useful for those who use PHP; it works for me:
$where[] = 'organizer_info LIKE(CONCAT("%", :organizer, "%"))';
$bind['organizer'] = str_replace('"', '', quotemeta(json_encode($orgNameString)));

Extracting information out of a column with regexs in MySQL

Is it possible to get the (first?) match of a regex and output it within a SELECT? It looks like the REGEXP operator only returns whether there has been a match or not. I want to be able to extract information from a varchar column without having to use complex nested SUBSTRING and LOCATE calls.
Any ideas?
See http://dev.mysql.com/doc/refman/5.1/en/regexp.html; that's all there is. You can't do more than pattern comparison.