How to extract id from url in a query? - mysql

I have a table with a column containing many urls like this one:
https://myshop.com/lv/buitine-technika-elektronika/virtuves-iranga/virduliai/virdulys-elektrinis-virdulys-forme-fkg-147?id=22031685
I want to extract the id from the URL, but I have no idea how to do this. I could easily do this in Python later using this regex:
\?id\=(\d+)
But I'd like to have as much of the data as possible prepared in my query before moving to Python. I know how to use regex in a MySQL WHERE clause, but I have no idea how to use it anywhere else. Is there a way to do this?
The ID length can vary, and there might be other parameters after it.
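For comparison, the Python post-processing step the question mentions could be as small as the sketch below. It reuses the question's pattern with re.search. extract_id is a hypothetical helper name. Because the pattern requires a literal "?" before id, it only finds id when it is the first query parameter.

```python
import re
from typing import Optional

# Pattern from the question: a literal "?id=" followed by one or more digits (captured).
ID_PATTERN = re.compile(r"\?id=(\d+)")


def extract_id(url: str) -> Optional[str]:
    """Hypothetical helper: return the digits after '?id=', or None if absent."""
    match = ID_PATTERN.search(url)
    return match.group(1) if match else None


print(extract_id("https://myshop.com/lv/virdulys-forme-fkg-147?id=22031685"))  # 22031685
```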

If id is always the only (or at least the last) query parameter in each URL, we can use SUBSTRING_INDEX, which returns everything after the last occurrence of '?id=':
SELECT SUBSTRING_INDEX(url, '?id=', -1) AS id
FROM yourTable;
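To see what that call returns, and why it depends on id being the last parameter, here is a small Python emulation of MySQL's SUBSTRING_INDEX(str, delim, count) behaviour. substring_index is a hypothetical helper, not a MySQL or Python API, and it assumes a non-empty delimiter.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Hypothetical emulation of MySQL SUBSTRING_INDEX(str, delim, count).

    count > 0: everything left of the count-th delimiter, counting from the left.
    count < 0: everything right of the |count|-th delimiter, counting from the right.
    Assumes a non-empty delimiter (str.split raises ValueError otherwise).
    """
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


base = "https://myshop.com/lv/virdulys-forme-fkg-147"
print(substring_index(base + "?id=22031685", "?id=", -1))        # 22031685
print(substring_index(base + "?id=22031685&ref=x", "?id=", -1))  # 22031685&ref=x
print(substring_index(base, "?id=", -1))                         # no '?id=': whole URL
```

The second call shows the limitation: any parameters after id stay attached to the result.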
For a more general solution in MySQL 8+, we can use REGEXP_REPLACE with a capture group (REGEXP_SUBSTR always returns the whole match, not a single group):
SELECT REGEXP_REPLACE(url, '^.*[?&]id=([^&]+).*$', '$1') AS id
FROM yourTable;
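The regex-based approach can handle id appearing anywhere in the query string. If a URL has no id parameter at all, REGEXP_REPLACE returns it unchanged, so you may also want to filter with WHERE url REGEXP '[?&]id='.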
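The REGEXP_REPLACE idea (match the whole URL, capture the id value, replace everything with the capture) can be tried out in Python with re.sub. Python writes the back-reference as \1 rather than MySQL's $1. The pattern here is an assumption: it requires id to be preceded by ? or &, and it stops the value at the next &.

```python
import re

# Assumed pattern: anything, then "?id=" or "&id=", the value up to the next "&", then the rest.
ID_PATTERN = r"^.*[?&]id=([^&]+).*$"


def id_via_replace(url: str) -> str:
    # Like REGEXP_REPLACE(url, pattern, '$1'): the whole match is replaced by group 1.
    # A URL without an id parameter does not match and comes back unchanged.
    return re.sub(ID_PATTERN, r"\1", url)


print(id_via_replace("https://myshop.com/lv/x?id=22031685"))          # 22031685
print(id_via_replace("https://myshop.com/lv/x?ref=a&id=42&lang=lv"))  # 42
print(id_via_replace("https://myshop.com/lv/x"))                      # URL unchanged
```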

Related

MySql extract data from a database based on two columns

I am trying to write a MySQL query that selects rows based on some conditions on column 1 (USER) while at the same time excluding rows based on some conditions on column 2 (ADDRESS).
E.g.:
SELECT ADDRESS,USER
FROM Data1.Table1
WHERE FIELD(USER,'%AMIT%','%JOHN%','%SANDEEP%','%WARNE%')
AND ORIGINATING_ADDRESS NOT LIKE 'MUMBAI','CHINA','PAKISTAN'
This gives an error. Can someone please help?
Use NOT IN to exclude a list of values from the SELECT, assuming you only want to exclude exact matches:
ORIGINATING_ADDRESS NOT IN ('MUMBAI','CHINA','PAKISTAN')
If you want to exclude by pattern instead, combine the NOT LIKE conditions with AND (with OR, a row would be excluded only if it contained all three names):
ORIGINATING_ADDRESS NOT LIKE '%MUMBAI%' AND
ORIGINATING_ADDRESS NOT LIKE '%CHINA%' AND
ORIGINATING_ADDRESS NOT LIKE '%PAKISTAN%'
For a set of values, use NOT IN, instead of NOT LIKE.
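The AND requirement for the NOT LIKE exclusions follows from De Morgan's law: NOT (a OR b OR c) equals (NOT a) AND (NOT b) AND (NOT c). The Python sketch below compares the two combinations on a few invented addresses. not_like is a hypothetical stand-in for NOT LIKE '%word%'. It assumes a case-insensitive collation and ignores NULLs.

```python
EXCLUDED = ["MUMBAI", "CHINA", "PAKISTAN"]


def not_like(value: str, word: str) -> bool:
    # Stand-in for value NOT LIKE '%word%' under a case-insensitive collation.
    return word.upper() not in value.upper()


def keep_with_or(address: str) -> bool:
    # NOT LIKE ... OR NOT LIKE ...: kept unless the address contains every word.
    return any(not_like(address, word) for word in EXCLUDED)


def keep_with_and(address: str) -> bool:
    # NOT LIKE ... AND NOT LIKE ...: kept only if it contains none of the words.
    return all(not_like(address, word) for word in EXCLUDED)


addresses = ["MUMBAI CENTRAL", "Beijing, CHINA", "London"]  # invented sample values
print([a for a in addresses if keep_with_or(a)])   # ['MUMBAI CENTRAL', 'Beijing, CHINA', 'London']
print([a for a in addresses if keep_with_and(a)])  # ['London']
```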
You might find regular expressions simpler for this purpose:
SELECT ADDRESS,USER
FROM Data1.Table1
WHERE USER REGEXP 'AMIT|JOHN|SANDEEP|WARNE' AND
ORIGINATING_ADDRESS NOT REGEXP 'MUMBAI|CHINA|PAKISTAN';
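These REGEXP patterns are unanchored, so each alternative behaves like LIKE '%NAME%'. A rough Python analogue using re.search follows. re.IGNORECASE stands in for a case-insensitive collation (an assumption about the table), and the rows are invented sample data.

```python
import re

# Unanchored alternations, like the REGEXP patterns above; IGNORECASE is an assumption.
USER_PATTERN = re.compile(r"AMIT|JOHN|SANDEEP|WARNE", re.IGNORECASE)
ADDRESS_PATTERN = re.compile(r"MUMBAI|CHINA|PAKISTAN", re.IGNORECASE)

# Invented sample rows: (USER, ORIGINATING_ADDRESS)
rows = [("Amit Shah", "Delhi"), ("John Doe", "Mumbai"),
        ("Mary Jones", "London"), ("Shane Warne", "Melbourne")]

kept = [(user, address) for user, address in rows
        if USER_PATTERN.search(user) and not ADDRESS_PATTERN.search(address)]
print(kept)  # [('Amit Shah', 'Delhi'), ('Shane Warne', 'Melbourne')]
```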

RegExp in mysql for field

I have the following query:
SELECT item from table
Which gives me:
<title>Titanic</title>
How would I extract the name "Titanic" from this? Something like:
SELECT re.find('\>(.+)\>, item) FROM table
What would be the correct syntax for this?
Before MySQL 8.0 (which added REGEXP_SUBSTR and REGEXP_REPLACE), MySQL does not provide functionality for extracting text using regular expressions. You can use REGEXP to find rows that match something like >.+<, but there is no straightforward way of extracting the captured group without some additional effort, such as:
using a library like lib_mysqludf_preg
writing your own MySQL function to extract matched text
performing regular string manipulation
using the regex functionality of whatever environment you're using MySQL from (e.g. PHP's preg_match)
reconsidering your need for regular expressions entirely. If you know that all your rows contain a <title> tag, for instance, it may be a better idea to simply use "normal" string functions such as SUBSTRING
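As an illustration of doing the extraction in the client environment instead, here is a Python sketch using re.search with a non-greedy capture group. title_text is a hypothetical helper, and this approach is only reasonable for trivially simple markup like the value in the question.

```python
import re
from typing import Optional

# Non-greedy capture between the opening and closing tag; DOTALL lets a title span lines.
TITLE_PATTERN = re.compile(r"<title>(.*?)</title>", re.DOTALL)


def title_text(item: str) -> Optional[str]:
    """Hypothetical helper: return the first <title> content, or None if absent."""
    match = TITLE_PATTERN.search(item)
    return match.group(1) if match else None


print(title_text("<title>Titanic</title>"))  # Titanic
```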
As pointed out in the informative answer by George Bahij, MySQL lacks this functionality, so the options would be either to extend it using UDFs etc., or to use the available string functions, in which case you could do:
SELECT
SUBSTR(
SUBSTRING_INDEX(
SUBSTRING_INDEX(item,'<title>',2)
,'</title>',1)
FROM 8
)
from table
Or, if the string you need to extract from is always in the format <title>item</title>, you could simply use REPLACE: replace(replace(item, '<title>', ''), '</title>','')
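To trace what the nested SUBSTRING_INDEX / SUBSTR call and the REPLACE alternative do to '<title>Titanic</title>', here is a Python emulation. substring_index and substr_from are hypothetical helpers that mimic the MySQL functions only for the cases used here (non-empty delimiter, positive SUBSTR position).

```python
def substring_index(s: str, delim: str, count: int) -> str:
    # Mimics MySQL SUBSTRING_INDEX for a non-empty delimiter.
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


def substr_from(s: str, pos: int) -> str:
    # Mimics SUBSTR(s FROM pos) for a positive, 1-based position only.
    return s[pos - 1:]


item = "<title>Titanic</title>"

# SUBSTRING_INDEX(item, '<title>', 2): there is no second '<title>', so the whole string.
step1 = substring_index(item, "<title>", 2)
# SUBSTRING_INDEX(step1, '</title>', 1): everything before '</title>', i.e. '<title>Titanic'.
step2 = substring_index(step1, "</title>", 1)
# SUBSTR(step2 FROM 8): skip the 7 characters of '<title>'.
nested = substr_from(step2, 8)

# REPLACE(REPLACE(item, '<title>', ''), '</title>', '')
replaced = item.replace("<title>", "").replace("</title>", "")

print(nested, replaced)  # Titanic Titanic
```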
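This regex: <\w+>.+</\w+> will match content in tags. Inside a MySQL string literal, however, a lone backslash is dropped ('\w' is read as plain 'w'), so in the query it is safer to use the POSIX class [[:alnum:]]:
SELECT * FROM `table` WHERE `field` REGEXP '<[[:alnum:]]+>.+</[[:alnum:]]+>';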
Then, if you're using PHP or something similar, you could use a function like strip_tags to extract the content between the tags.
XML shouldn't be parsed with regexes, and at any rate MySQL before 8.0 only supports matching, not replacement.
But MySQL supports a subset of XPath 1.0, so you should be able to simply do this:
SELECT ExtractValue(item,'/title') AS item_title FROM table;
https://dev.mysql.com/doc/refman/5.6/en/xml-functions.html
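If the value ends up being processed in Python anyway, the standard library's xml.etree.ElementTree offers a similar XML-aware approach. This is an analogue, not a description of how ExtractValue works internally. extract_title is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def extract_title(item: str) -> str:
    """Hypothetical helper: text of a root <title> element, or '' otherwise."""
    root = ET.fromstring(item)  # raises ET.ParseError if item is not well-formed XML
    if root.tag != "title":
        return ""
    return root.text or ""


print(extract_title("<title>Titanic</title>"))  # Titanic
```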

More efficient than using lots of LIKE queries mysql

I need to query a database to find URLs containing a certain set of strings, for example "MY" AND "sand", in any order.
I am currently using url LIKE '%MY%' AND url LIKE '%Sand%'. Is there a better way of doing this?
You could try REGEXP, e.g.:
WHERE url REGEXP '(my.*sand|sand.*my)'
Or alternatively:
WHERE URL REGEXP 'my' AND url REGEXP 'sand'
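Not sure how the speed will compare. As with LIKE '%...%', neither REGEXP version can use an ordinary index, because the pattern can match anywhere in the string, so every row has to be checked.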
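The two REGEXP variants select the same rows, because "my" and "sand" cannot overlap within a string. The Python sketch below checks that on a few sample URLs. re.IGNORECASE stands in for MySQL's usual case-insensitive collation, which is an assumption about the table.

```python
import re


def one_pattern(url: str) -> bool:
    # Like: url REGEXP '(my.*sand|sand.*my)'
    return re.search(r"my.*sand|sand.*my", url, re.IGNORECASE) is not None


def two_patterns(url: str) -> bool:
    # Like: url REGEXP 'my' AND url REGEXP 'sand'
    return (re.search("my", url, re.IGNORECASE) is not None
            and re.search("sand", url, re.IGNORECASE) is not None)


samples = ["https://example.com/MY/Sand", "https://example.com/sand/my",
           "https://example.com/mysand", "https://example.com/my/beach",
           "https://example.com/sandbox"]
print([one_pattern(u) for u in samples])                 # [True, True, True, False, False]
print(all(one_pattern(u) == two_patterns(u) for u in samples))  # True
```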

MySQL: Select regex group

How do I reference a group in a MySQL regex?
I tried:
REGEXP '^(.)\1$'
but it does not work.
How can I do this?
(Old question, but top search result)
For MySQL 8:
SELECT REGEXP_REPLACE('stackoverflow','(.{5})(.*)','$2$1');
-- "overflowstack"
You can create capture groups with (), and refer to them in the replacement string using $1, $2, etc.
For MariaDB, captured groups are referenced in REGEXP_REPLACE with \\1, \\2, etc. instead.
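The same swap can be reproduced with Python's re.sub, which writes the group references as \1, \2 (in a raw string) rather than $1, $2. Python also supports a back-reference inside the pattern itself, which is what the question's ^(.)\1$ attempts. This sketch shows Python's behaviour only, not MySQL's.

```python
import re

# MySQL 8: REGEXP_REPLACE('stackoverflow', '(.{5})(.*)', '$2$1')
swapped = re.sub(r"(.{5})(.*)", r"\2\1", "stackoverflow")
print(swapped)  # overflowstack

# Back-reference inside the pattern: two characters that are the same.
same_pair = re.compile(r"(.)\1")
print(bool(same_pair.fullmatch("aa")), bool(same_pair.fullmatch("ab")))  # True False
```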
REGEXP_SUBSTR always returns the whole match; it cannot return a single capturing group.
You can work around this by nesting function calls in your query. Say you have this string in your column:
'100 SOME ST,THE VILLAGES,FL 32163,USA'
and you want to capture the city name. A capture group like this would do it if REGEXP_SUBSTR could return groups (it can't):
'^[0-9A-Z\s]+,\s*([a-zA-Z\s]*)'
You CAN nest function calls to strip off the part you don't want, and then grab the part you DO want like this:
SELECT REGEXP_SUBSTR(REGEXP_REPLACE(column_name, '^[0-9\\sA-Z]+,', ''), '^[0-9\\sA-Z]+') FROM table_name;
THE VILLAGES
...
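The nested REGEXP_REPLACE / REGEXP_SUBSTR workaround can be traced step by step in Python with re.sub and re.match. The patterns are the ones from the query, written with a single backslash because a Python raw string needs no SQL-level escaping.

```python
import re

value = "100 SOME ST,THE VILLAGES,FL 32163,USA"

# Step 1, like REGEXP_REPLACE(col, '^[0-9\\sA-Z]+,', ''): drop the street and its comma.
stripped = re.sub(r"^[0-9\sA-Z]+,", "", value)

# Step 2, like REGEXP_SUBSTR(..., '^[0-9\\sA-Z]+'): keep the leading run before the next comma.
match = re.match(r"^[0-9\sA-Z]+", stripped)
city = match.group(0) if match else None

print(stripped)  # THE VILLAGES,FL 32163,USA
print(city)      # THE VILLAGES
```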

MySQL regex only returns a single row

I have been writing a regex in MySQL to identify the domains that have a .com TLD. The URLs are usually of the form
http://example.com/
The regex I came up with looks like this:
REGEXP '[[.colon.]][[.slash.]][[.slash.]]([:alnum:]+)[[...]]com[[./.]]'
The reason we match the :// is so that we don't pick up URLs such as http://example.com/error.com/wrong.com
Therefore my query is
SELECT DISTINCT name
FROM table
WHERE name REGEXP '[[.colon.]][[.slash.]][[.slash.]]([:alnum:]+)[[...]]com[[./.]]';
However, this is returning only a single row when it should really be returning many more (upwards of a thousand). What mistake am I making with the query?
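Not sure if that's the whole problem, but it should be [[:alnum:]], not [:alnum:]. In POSIX bracket syntax, a bare [:alnum:] is an ordinary bracket expression matching only the characters :, a, l, n, u and m, so only host names made of those characters would match.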
Your current pattern requires a slash immediately after .com, so a name such as http://example.com with no trailing slash is missed. Making the slash optional handles that:
SELECT DISTINCT name
FROM table
WHERE name REGEXP '[[.colon.]][[.slash.]][[.slash.]]([[:alnum:]]+)[[...]]com([[./.]].*)?';
It might be clearer to split the URL rather than use a regex on it:
SELECT DISTINCT name FROM table
WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(name,'/',3),'.',-1)='com';
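The split-based check can be emulated in Python with a hypothetical substring_index helper that mimics MySQL's SUBSTRING_INDEX. As in the SQL, a host with a port (example.com:8080) would yield 'com:8080' and fail the comparison.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    # Hypothetical emulation of MySQL SUBSTRING_INDEX (non-empty delimiter assumed).
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


def host_tld(url: str) -> str:
    # SUBSTRING_INDEX(name, '/', 3) keeps "scheme://host"; then take the text after the last dot.
    return substring_index(substring_index(url, "/", 3), ".", -1)


for name in ["http://example.com/", "http://example.com/error.com/wrong.com",
             "http://example.org/wrong.com"]:
    print(name, host_tld(name) == "com")
```

The last URL shows why splitting first helps: the .com in the path is ignored because only the host part is inspected.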