Extract third level of folder structure stored in MySQL field - mysql

I'm using the following regex (https://regex101.com/r/Kt9sNj/1) in PHP to get all the files in the third level of a directory:
/^(\/[^\/]*){1,4}\/?$/m
Then if I have the following data:
/home/myuser/folder_example/first_file.txt
/home/myuser/folder_example/second_file.txt
/home/myuser/folder_example/third_file.txt
I get:
first_file.txt
second_file.txt
third_file.txt
I'm trying to use this in a MySQL query on a JSON object that contains an array of paths.
My Query is:
SELECT data->'$.files' AS File
FROM table
WHERE user = 'myuser';
And I get:
["/home/myuser/folder_example/first_file.txt","/home/myuser/folder_example/second_file.txt","/home/myuser/folder_example/third_file.txt"]
But when I use that regex on my sql query:
SELECT data->'$.files' AS File
FROM table
WHERE user = 'myuser'
AND data->'$.files' REGEXP '^(\/[^\/]*){1,4}\/?$';
it doesn't work. I need to get this (all files under that directory):
["first_file.txt","second_file.txt","third_file.txt"]
Do you know why the query fails?

REGEXP returns 1 if the pattern matches and 0 otherwise; it does not return the matched text, even though the pattern does match the example strings.
In your pattern you are repeating a capturing group, so group 1 holds only the value from the last iteration, and that value still contains the leading forward slash that you don't want in the output.
Instead, match the leading /, then use a non-capturing group with the quantifier {3} to match exactly three segments that each end in /.
Then capture the filename in group 1 and refer to that group as '$1' in the replacement argument of REGEXP_REPLACE:
^/(?:[^/]*/){3}(\S+\.[^.\s]+)$
Regex demo | Mysql with replace demo
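To see what the capture group yields, here is a rough sketch in Python rather than MySQL. It assumes Python's re module treats this pattern the same way as the engine in MySQL 8, and the helper name third_level_names is made up for illustration. It decodes the JSON array first and applies the pattern to each path.

```python
import json
import re
from typing import List

# Pattern from the answer: a leading "/", three "segment/" parts,
# then the file name captured in group 1.
THIRD_LEVEL = re.compile(r"^/(?:[^/]*/){3}(\S+\.[^.\s]+)$")


def third_level_names(files_json: str) -> List[str]:
    """Return group 1 for every path in the JSON array that matches."""
    names = []
    for path in json.loads(files_json):
        match = THIRD_LEVEL.match(path)
        if match:
            names.append(match.group(1))
    return names


files = json.dumps([
    "/home/myuser/folder_example/first_file.txt",
    "/home/myuser/folder_example/second_file.txt",
    "/home/myuser/folder_example/third_file.txt",
])
print(third_level_names(files))
```

Note that \S also matches /, so a deeper path such as /home/myuser/folder_example/sub/x.txt still matches and puts sub/x.txt in group 1.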

Related

SQL Data Update Query About

I have a table named testlink and it has url and newtarget columns.
For each row, I would like to take the part of the url column that follows https://domain1.com/ and store https://domain1.com/search/?q= followed by that part in the newtarget column.
In short:
a url value of https://domain1.com/topic1
should become https://domain1.com/search/?q=topic1 in the newtarget column.
There are about 6 thousand different topics (rows).
Database: MySQL / phpMyAdmin.
Use REPLACE:
UPDATE testlink
SET newtarget = REPLACE(url,'https://domain1.com/','https://domain1.com/search/?q=')
MySQL REPLACE() replaces all occurrences of a substring within a string.
REPLACE(str, find_string, replace_with)
If you want to conditionally change the value, you can use string manipulations:
update testlink
set newtarget = concat(left(url, length(url) - length(substring_index(url, '/', -1))), 'search/?q=', substring_index(url, '/', -1))
where url like 'https://domain1.com/%';
This uses substring_index() to get the last part of the string (after the last /). It uses left() to get everything before that last part (based on its length) and then concatenates the two pieces with 'search/?q=' in between.
Of course, test this logic using a SELECT before implementing an UPDATE.
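If it helps to check the string logic outside the database first, the same steps can be sketched in Python. This is only an illustration of the substring_index() / left() / concat() idea applied to the target format from the question; the function names substring_index_last and build_newtarget are made up.

```python
def substring_index_last(text: str, delim: str) -> str:
    """Mimic SUBSTRING_INDEX(text, delim, -1): the part after the last delimiter."""
    return text.rsplit(delim, 1)[-1]


def build_newtarget(url: str) -> str:
    """Insert 'search/?q=' in front of the last path segment of url."""
    last = substring_index_last(url, "/")
    # Same idea as LEFT(url, LENGTH(url) - LENGTH(last)).
    prefix = url[: len(url) - len(last)]
    # Same idea as CONCAT(prefix, 'search/?q=', last).
    return prefix + "search/?q=" + last


print(build_newtarget("https://domain1.com/topic1"))
```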
If you're using MySQL 8, you can do this with REGEXP_REPLACE.
For your example, this should work:
SELECT REGEXP_REPLACE('https://domain1.com/topic1','(https:\/\/domain1\.com\/)(.+)','$1search/?q=$2')

Making a SQL query via regex matching

Is it possible to make a SQL query by matching a pattern? I know SQL allows for wildcards, but I don't think they fit my use case.
Suppose I have a table that contains the following record (represented here in JSON):
Table Name: url_stuff
{
id: 2
path: \/user\/(.*)\/
value: "I am the user path"
}
Then suppose I had the following string representing a URL path:
/user/gandalfthewhite
I would like to make a query that returns this record.
SELECT * FROM url_stuff WHERE path LIKE '/user/gandalfthewhite'
Obviously this won't work, but perhaps there is some other way to use SQL such that /user/gandalfthewhite matches \/user\/(.*)\/ as it would with regex and returns the above record.
One solution is obviously to grab all records from the database and search via regex after the fact, but this would not be scalable for a large number of records. Ideally I would be able to grab all matching records directly with a query.
If I understand correctly, you can just use regexp:
SELECT *
FROM url_stuff
WHERE '/user/gandalfthewhite' REGEXP path
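The idea is that the stored value is the pattern and the incoming string is the subject. Here is a small Python sketch of that reversed match. It uses an in-memory list as a hypothetical stand-in for the url_stuff table and Python's re module rather than MySQL's engine, so treat it as an illustration only.

```python
import re

# Hypothetical in-memory stand-in for the url_stuff table from the question.
URL_STUFF = [
    {"id": 2, "path": r"\/user\/(.*)\/", "value": "I am the user path"},
]


def matching_rows(url_path, rows=URL_STUFF):
    """Return the rows whose stored pattern matches the given URL path."""
    return [row for row in rows if re.search(row["path"], url_path)]


print(matching_rows("/user/gandalfthewhite/"))
print(matching_rows("/user/gandalfthewhite"))
```

One thing this shows: the stored pattern ends in \/, so in this sketch only the path with a trailing slash matches. The stored pattern would have to make the final slash optional for /user/gandalfthewhite to match.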

regexp mysql group

I am trying to get the city name from the string '{"travelzoo_hotel_name":"Graduate Minneapolis","travelzoo_hotel_id":"223","city":"Minneapolis","country":"USA","sales_manager":"Stephen Conti"}'
I tried this regexp:
SELECT REGEXP_SUBSTR('{\"travelzoo_hotel_name\":\"Graduate Minneapolis\",\"travelzoo_hotel_id\":\"223\",\"city\":\"Minneapolis\",\"country\":\"USA\",\"sales_manager\":\"Stephen Conti\"}'
,'(?:.city...)([[:alnum:]]+)');
I get: '"city":"Minneapolis'
I need only the city name: Minneapolis.
How do I use groups in queries?
My example in regex101
Please help.
I assume you are using MySQL 8.x, which uses ICU regular expressions.
It looks like the string you want to process is JSON. You may use JSON_EXTRACT with JSON_UNQUOTE and a '$.city' as JSON path then:
JSON_UNQUOTE(JSON_EXTRACT('{"travelzoo_hotel_name":"Graduate Minneapolis","travelzoo_hotel_id":"223","city":"Minneapolis","country":"USA","sales_manager":"Stephen Conti"}', '$.city'))
will return Minneapolis.
In your regex, the text matched by the non-capturing group is still part of the overall match value. "Non-capturing" only means that no separate memory buffer is allotted to the text matched by the grouping construct. So, you may fix it with the '(?<="city":")[^"]+' pattern, where (?<="city":") is a positive lookbehind that matches "city":" but does not put it into the match value. The only text you will have in the output is the text matched by [^"]+, that is, one or more chars other than ".
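The difference between the three approaches can be checked quickly outside MySQL. The sketch below uses Python's re and json modules as stand-ins, so it only illustrates the concepts. Python has no [[:alnum:]] class, so [A-Za-z0-9] is substituted here as an assumption.

```python
import json
import re

doc = ('{"travelzoo_hotel_name":"Graduate Minneapolis","travelzoo_hotel_id":"223",'
       '"city":"Minneapolis","country":"USA","sales_manager":"Stephen Conti"}')

# Non-capturing group: its text is still part of the full match (group 0).
with_group = re.search(r'(?:.city...)([A-Za-z0-9]+)', doc)
full_match = with_group.group(0)
group_one = with_group.group(1)

# Lookbehind: "city":" must precede the match but is not included in it.
lookbehind = re.search(r'(?<="city":")[^"]+', doc).group(0)

# Parsing the JSON avoids the regex altogether.
parsed = json.loads(doc)["city"]

print(full_match, group_one, lookbehind, parsed)
```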

MySQL REGEXP Match / or End of Field

I've got a load of paths in a MySQL database (5.7.17-0ubuntu0.16.04.2). The user chooses a selection of paths, and I want to select all those paths and everything below them, but I've hit an issue.
Say the user wants "/root/K"; I need to do a select for:
a. /root/K%
b. /root/K
How can I get the REGEXP to match the end of a field or a / ?
I've tried the following:
original query:
where path REGEXP ('/root/K/|/root/J/J/') # this works but doesn't show the items in that path, only ones below
where path REGEXP '/root/K[/\z]' # does the same as above
where path REGEXP '/root/K(?=/|$)' # Get error 1139, repetition-operator invalid
I've also tried: Regex to match _ or end of string but that gives error 1139
Any other suggestions?
The regex engine in MySQL 5.7 supports neither lookarounds nor the \z anchor. You may use a normal capturing group:
WHERE path REGEXP '/root/K(/|$)'
It will match
/root/K - literal char sequence
(/|$) - either / or end of entry.
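The alternation can be sanity-checked with a few sample paths. Below is a minimal sketch in Python, assuming its re module handles this simple pattern the same way; the sample paths and the name is_selected are invented for illustration.

```python
import re

# "/root/K" followed by either "/" or the end of the string.
SELECTED = re.compile(r"/root/K(/|$)")


def is_selected(path: str) -> bool:
    """True when path is /root/K itself or something below it."""
    return SELECTED.search(path) is not None


for sample in ["/root/K", "/root/K/child", "/root/Kb", "/root/J"]:
    print(sample, is_selected(sample))
```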

Using mysql Regexp in a select statement not the where clause

What I want to do is use a regex to select only certain parts of the data in a column.
Example:
SELECT URL REGEXP 'http://' from websitelist -- (websitelist is a list of URLs)
If I run that against the table, it returns 1 for each row in which 'http://' was found. What I want is to return the string that matches the regexp.
The REGEXP operator performs a match and returns 0 or 1 based on whether the string matched the expression or not. It does not provide a way to extract the matched portion. I don't think there is any built-in way to extract the matched portion before MySQL 8.0, which adds REGEXP_SUBSTR() for that purpose.
You could just use string functions if it's as simple as your example, which only removes http://:
SELECT REPLACE(URL, 'http://', '') AS url FROM websitelist;
This is probably faster, as there is overhead for the regex engine.