Using MySQL REGEXP in a SELECT statement, not the WHERE clause

What I want to do is use a regex to select only certain parts of the data in a column.
Example:
SELECT URL REGEXP 'http://' FROM websitelist; -- websitelist is a table of URLs
If I run that against the table, it returns 1 for each row in which 'http://' was found. What I want is to return the string that matches the regexp.

The REGEXP operator performs a match and returns 0 or 1 depending on whether the string matched the expression. It does not provide a way to extract the matched portion, and before MySQL 8.0 there is no built-in function that does. (MySQL 8.0 added REGEXP_SUBSTR() for this.)

You could just use string functions if it's as simple as your example, which only removes http://:
SELECT REPLACE(URL, 'http://', '') AS url FROM websitelist;
This is probably faster too, as the regex engine adds overhead.

Related

Excluding records using regex

I'm trying to exclude the email_id values that have any name and end with either #abcd.in or #abcd.live, and only include the emails that contain mobile numbers, but I'm not sure the regex I'm using is correct. Can you help?
The statement I'm using to filter is below:
(NOT(lower(`table`.`user_email`) like '[a-z].*#Abcd.in$'|'[a-z].*#Abcd.live$')
If you want to filter based on a regular expression, you should be using REGEXP or REGEXP_LIKE (the two are synonyms). Assuming you just want to exclude the two domains mentioned, you could use:
SELECT *
FROM yourTable
WHERE email NOT REGEXP '[a-z]+#Abcd\\.(in|live)$';
Assuming you wanted to enhance the above by also whitelisting certain email patterns, you could make another call to REGEXP.
I would probably do something like this:
SELECT *
FROM mytable
WHERE SUBSTRING_INDEX(user_email,'#',-1) IN ('Abcd.live','Abcd.in')
AND SUBSTRING_INDEX(user_email,'#',1) REGEXP '[0-9]';
This uses SUBSTRING_INDEX() to separate the email name and the domain, with # as the delimiter. The first condition filters on the domain with IN, so any row whose domain is not in the list is omitted. The second condition uses REGEXP to check whether the email name contains any digits.

SQL Data Update Query About

I have a table named testlink and it has url and newtarget columns.
I would like to take the string that follows https://domain1.com/ in the url column and set the newtarget column to https://domain1.com/search/?q= followed by that string.
So, briefly:
a url value of https://domain1.com/topic1
should produce https://domain1.com/search/?q=topic1 in the newtarget column.
There are about 6 thousand different topics (rows).
Database: Mysql / Phpmyadmin.
Use REPLACE():
UPDATE testlink
SET newtarget = REPLACE(url,'https://domain1.com/','https://domain1.com/search/?q=');
MySQL REPLACE() replaces all occurrences of a substring within a string:
REPLACE(str, find_string, replace_with)
If you want to conditionally change the value, you can use string manipulations:
update testlink
set newtarget = concat(left(url, length(url) - length(substring_index(url, '/', -1))), 'search/?q=', substring_index(url, '/', -1))
where url like 'https://domain1.com/%';
This uses substring_index() to get the last part of the string (after the last /). It uses left() to get the first part (based on the length of the last part) and then concatenates the values you want.
Of course, test this logic using a SELECT before implementing an UPDATE.
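To make the split-and-rejoin idea concrete outside the database, here is a minimal Python sketch of the same string logic, aimed at the target format given in the question. The function name new_target is hypothetical and is used only for illustration; it is a model of the logic, not MySQL itself.

```python
def new_target(url: str) -> str:
    """Hypothetical helper: insert 'search/?q=' before the last path segment."""
    # Last path segment, comparable to SUBSTRING_INDEX(url, '/', -1).
    last = url.rsplit("/", 1)[-1]
    # Everything before it, comparable to LEFT(url, LENGTH(url) - LENGTH(last)).
    prefix = url[: len(url) - len(last)]
    return prefix + "search/?q=" + last


example = new_target("https://domain1.com/topic1")
```

Running the equivalent expression in a SELECT first, as suggested above, is still the safer way to check the real data.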
If you're using MySQL 8, you can do this with REGEXP_REPLACE.
For your example, this should work (note the doubled backslash, which is needed to get a literal dot through the string literal):
SELECT REGEXP_REPLACE('https://domain1.com/topic1','(https://domain1\\.com/)(.+)','$1search/?q=$2');

substring_index does not take exact prefix

I have a table that contains strings. Some of them start with https:// and some start with http://. I want to extract those that start with http:// (without the s). Please note that I do not want to use a LIKE statement, because it conflicts with another part of my plan for handling this string. So if I have the following items in a column called mycol in mytable:
https://111.com/
https://www.222.com/en-gb/
I make this query:
SELECT `mytable`.`mycol`, substring_index(`mytable`.`mycol`,'http://',-1)
I still get these strings in the results:
https://111.com/
https://www.222.com/en-gb/
Why? Since my query is looking for http://, not https://, why do I get results that start with https://? In this simple example it should return nothing, as there is no string that starts with http://.
want to extract the string
Use a regex. It's much simpler.
SELECT mycol FROM mytable WHERE mycol REGEXP '^http://.+';
You could add a check for the delimiter:
SELECT `mycol`,
IF(instr(mycol,'http://') > 0, substring_index(`mytable`.`mycol`,'http://',-1),NULL)
FROM mytable;
When the SUBSTRING_INDEX function cannot find the delimiter string, it returns the original string instead of NULL.
SELECT substring_index('abc','.',-1)
=>
abc
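The behaviour described above can be modelled in a few lines of Python. This is only a rough sketch of how SUBSTRING_INDEX treats a missing delimiter, not MySQL's implementation; the function below is a hypothetical stand-in, and returning an empty string for a zero count or an empty delimiter is an assumption about MySQL's behaviour.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Rough Python model of MySQL SUBSTRING_INDEX(s, delim, count)."""
    # Assumption: a zero count or an empty delimiter yields an empty string.
    if count == 0 or delim == "":
        return ""
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter.
        return delim.join(parts[:count])
    # Everything to the right of the count-th delimiter, counting from the end.
    return delim.join(parts[count:])


not_found = substring_index("https://111.com/", "http://", -1)
found = substring_index("http://111.com/", "http://", -1)
```

Because 'https://111.com/' does not contain 'http://', the split produces a single piece and the whole string comes back unchanged, which is why the https:// rows still appear in the result.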
I want to extract those that starts with http://
If you don't want to use LIKE, you can use left():
select right(columnname, length(columnname) - 7) from tablename
where left(columnname, 7) = 'http://'
You don't need to know the length of the string.
All you want is extract the part of the string after the 1st 7 chars.
The length of this part is:
length(columnname) - 7
So use right().

regexp mysql group

I am trying to get the name of the city from the string '{"travelzoo_hotel_name":"Graduate Minneapolis","travelzoo_hotel_id":"223","city":"Minneapolis","country":"USA","sales_manager":"Stephen Conti"}'
I tried this regexp:
SELECT REGEXP_SUBSTR('{\"travelzoo_hotel_name\":\"Graduate Minneapolis\",\"travelzoo_hotel_id\":\"223\",\"city\":\"Minneapolis\",\"country\":\"USA\",\"sales_manager\":\"Stephen Conti\"}'
,'(?:.city...)([[:alnum:]]+)');
I get: '"city":"Minneapolis'
I need only the name of the city: Minneapolis.
How do I use groups in queries?
My example in regex101
Help me Please
I assume you are using MySQL 8.x, which uses the ICU regex library.
It looks like the string you want to process is JSON. You may use JSON_EXTRACT with JSON_UNQUOTE and a '$.city' as JSON path then:
JSON_UNQUOTE(JSON_EXTRACT('{"travelzoo_hotel_name":"Graduate Minneapolis","travelzoo_hotel_id":"223","city":"Minneapolis","country":"USA","sales_manager":"Stephen Conti"}', '$.city'))
will return Minneapolis.
In your regex, the non-capturing group pattern is still matched and added to the match value. "Non-capturing" only means that no separate memory buffer is allotted to the text matched by the grouping construct. So you may fix it with the '(?<="city":")[^"]+' pattern, where (?<="city":") is a positive lookbehind that matches "city":" but does not put it into the match value. The only text you will have in the output is the text matched by [^"]+, that is, one or more characters other than ".
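The difference between a non-capturing group and a lookbehind can be checked quickly in Python's re module. This is only an illustrative sketch: Python's engine is not ICU, [[:alnum:]] is replaced by [A-Za-z0-9] because Python does not support POSIX classes, and the variable names are made up for the example.

```python
import json
import re

doc = ('{"travelzoo_hotel_name":"Graduate Minneapolis","travelzoo_hotel_id":"223",'
       '"city":"Minneapolis","country":"USA","sales_manager":"Stephen Conti"}')

# The non-capturing group is still part of the overall match.
m = re.search(r'(?:.city...)([A-Za-z0-9]+)', doc)
whole_match = m.group(0)
group_one = m.group(1)

# A lookbehind asserts the prefix without including it in the match.
lookbehind_match = re.search(r'(?<="city":")[^"]+', doc).group(0)

# Parsing the JSON avoids the regex entirely, like JSON_EXTRACT with '$.city'.
from_json = json.loads(doc)["city"]
```

Here whole_match still carries the "city":" prefix, while group_one, lookbehind_match and from_json all hold just the city name.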

MySQL regex only returns a single row

I have been writing a REGEX in MySQL to identify those domains that have a .com TLD. The URLs are usually of the form
http://example.com/
The regex I came up with looks like this:
REGEXP '[[.colon.]][[.slash.]][[.slash.]]([:alnum:]+)[[...]]com[[./.]]'
The reason we match the :// is so that we don't pick up URLs such as http://example.com/error.com/wrong.com
Therefore my query is
SELECT DISTINCT name
FROM table
WHERE name REGEXP '[[.colon.]][[.slash.]][[.slash.]]([:alnum:]+)[[...]]com[[./.]]'
However, this is returning only a single row when it should really be returning many more (upwards of a thousand). What mistake am I making with the query?
Not sure if that's the problem, but it should be [[:alnum:]], not [:alnum:]
Your current query only matches names that end with .com/ rather than .com followed by anything that starts with a slash. Try the following:
SELECT DISTINCT name
FROM table
WHERE name REGEXP '[[.colon.]][[.slash.]][[.slash.]]([[:alnum:]]+)[[...]]com([[./.]].*)?'
It might be clearer to split the URL rather than use a regex on it:
SELECT DISTINCT name FROM table
WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(name,'/',3),'.',-1)='com';
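The two nested splits can be traced step by step in Python. This is a hedged sketch of the same idea rather than MySQL code; has_com_tld is a hypothetical name, and the approach assumes URLs of the form scheme://host/path with no port number.

```python
def has_com_tld(url: str) -> bool:
    """Hypothetical check: does the host part of the URL end in '.com'?"""
    # Keep everything before the third '/', e.g. 'http://example.com'.
    host_part = "/".join(url.split("/")[:3])
    # Take the text after the last '.', e.g. 'com'.
    return host_part.split(".")[-1] == "com"


plain = has_com_tld("http://example.com/")
misleading = has_com_tld("http://example.org/error.com/wrong.com")
```

Cutting the string at the third slash first is what keeps a path such as /error.com/wrong.com from being mistaken for the domain.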