I have a database with a lot of paths, and I need to find all the paths that have only 2 levels, not more not less.
For example, I need a query that will find string matching the following structure:
folder/folder/file.ext
But not:
folder/file.ext and not folder/folder/folder/file.ext or anything longer
My guess here is to use REGEX and match strings that precisely have 2 slashes / but I don't know how to formulate the expression, something like:
SELECT `name` FROM `table` WHERE `name` REGEXP '????'
In my case I need to find 2 slashes and is very specific but ideally this answer will be useful for anybody looking for 3 or X number of slashes or any other character repeated on the string.
The simplest method uses like:
where name like '%/%/%' and
name not like '%/%/%/%'
Doing this as a regular expression is tricky. But here is another method:
where length(name) - length(replace(name, '/', '')) = 2
As a regular expression:
where name regexp '^([^/]*[/]){2}[^/]*$'
So it is possible, although perhaps less scrutable.
WHERE LENGTH(name) = 2 +
LENGTH(REPLACE(name, '/', ''))
Related
I am trying to pull a product code from a long set of string formatted like a URL address. The pattern is always 3 letters followed by 3 or 4 numbers (ex. ???### or ???####). I have tried using REGEXP and LIKE syntax, but my results are off for both/I am not sure which operators to use.
The first select statement is close to trimming the URL to show just the code, but oftentimes will show a random string of numbers it may find in the URL string.
The second select statement is more rudimentary, but I am unsure which operators to use.
Which would be the quickest solution?
SELECT columnName, SUBSTR(columnName, LOCATE(columnName REGEXP "[^=\-][a-zA-Z]{3}[\d]{3,4}", columnName), LENGTH(columnName) - LOCATE(columnName REGEXP "[^=\-][a-zA-Z]{3}[\d]{3,4}", REVERSE(columnName))) AS extractedData FROM tableName
SELECT columnName FROM tableName WHERE columnName LIKE '%___###%' OR columnName LIKE '%___####%'
-- Will take a substring of this result as well
Example Data:
randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz123&hello_world=us&etc_etc
In this case, the desired string is "xyz123" and the location of said pattern is variable based on each entry.
EDIT
SELECT column, LOCATE(column REGEXP "([a-zA-Z]{3}[0-9]{3,4}$)", column), SUBSTR(column, LOCATE(column REGEXP "([a-zA-Z]{3}[0-9]{3,4}$)", column), LENGTH(column) - LOCATE(column REGEXP "^.*[a-zA-Z]{3}[0-9]{3,4}", REVERSE(column))) AS extractData From mainTable
This expression is still not grabbing the right data, but I feel like it may get me closer.
I suggest using
REGEXP_SUBSTR(column, '(?<=[&?]random_code=[^&#]{0,256}-)[a-zA-Z]{3}[0-9]{3,4}(?![^&#])')
Details:
(?<=[&?]random_code=[^&#]{0,256}-) - immediately on the left, there must be & or &, random_code=, and then zero to 256 chars other than & and # followed with a - char
[a-zA-Z]{3} - three ASCII letters
[0-9]{3,4} - three to four ASCII digits
(?![^&#]) - that are followed either with &, # or end of string.
See the online demo:
WITH cte AS ( SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz123&hello_world=us&etc_etc' val
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz4567&hello_world=us&etc_etc'
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz89&hello_world=us&etc_etc'
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz00000&hello_world=us&etc_etc'
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-aaaaa11111&hello_world=us&etc_etc')
SELECT REGEXP_SUBSTR(val,'(?<=[&?]random_code=[^&#]{0,256}-)[a-zA-Z]{3}[0-9]{3,4}(?![^&#])') output
FROM cte
Output:
I'd make use of capture groups:
(?<=[=\-\\])([a-zA-Z]{3}[\d]{3,4})(?=[&])
I assume with [^=\-] you wanted to capture string with "-","\" or "=" in front but not include those chars in the result. To do that use "positive lookbehind" (?<=.
I also added a lookahead (?= for "&".
If you'd like to fidget more with regex I recommend RegExr
I have a table with a varchar column that represents a path. I want to search for rows that have a path that follow a pattern like name.name[*] where name can be anything. I am looking for repeated strings contained anywhere in the path column that are separated by a period and have a square bracket after them.
This seems to call for Regexp, so through python I have something like https://regex101.com/r/apS20a/4
However, trying to implement this with MySQL Regexp is not working. I have been able to translate the shorthand into REGEXP '([A-Za-z_]+).(\1[[0-9]+])', but it seems that MySql Regex does not support capture groups. Is there a way to accomplish what I am trying to do with mysql regexp? Thank you
I don't think that MySQL supports capture groups. But if you only have one example of .name[ in the string between the first . and the first [, you can hack your way around it. This is not a general solution, just a specific approach in this case.
You can get the name with:
select substring_index(substring_index(url, '[', 1), '.', -1) as name
And then incorporate this into a regular expression:
select t.*
from (select t.*,
substring_index(substring_index(url, '[', 1), '.', -1) as name
from t
) t
where url like concat('%', name, '.', name, '[%');
This just uses like instead of regexp, because [ and . are regular expression wildcards. Of course, this assumes that name does not have _ or %.
EDIT:
Here is a method that actually identifies when this occurs -- and works even if there are multiple patterns.
The idea is to construct the regular expression based on what happens between the . and [ -- and then to apply it. Delightfully self-referential:
select t.*,
(url regexp regex)
from (select t.*,
substr(regexp_replace(url, '[^.]*[.]([^\\[]*)\\[[^.]*', '|$1[.]$1\\\\['), 2) as regex
from (select 'abcde.de[12345.345[ABC' as url union all
select 'abcdefdef[[[[..123.124['
) t
) t;
Here is the above in a db<>fiddle.
I want to find common parent folder in my table images with field path.
For example, in my table i have for this table 6 rows.
1 c:\images\site1\root\img\logo.png
2 c:\images\site1\root\resource\test1.png
3 c:\images\site1\root\resource\test2.png
4 c:\images\site1\root\resource\test3.png
5 c:\images\site1\root\images\background.png
And I want to have "root" folders:
c:\images\site1\root\img\
c:\images\site1\root\resource\
c:\images\site1\root\images\
What is the SQL request to get this, please?
Many thanks.
This can be a bit tricky. If you know that the path does not share the name of the file, you could replace the last part with an empty string:
select distinct replace(path, substring_index(path, '\\', -1), '')
However, this may not always be true.
What you want is everything before the last occurrence of \\. Here is a method using substring_index():
select distinct substring_index(path, '\\',
length(path) - length(replace(path, '\\', '')
)
The difference in lengths is simply counting the number of separator characters in the string. If the path has one separator (i.e. 'a\b'), then the argument to substring_index() is 1, which is what you want.
I've been to the regexp page on the MySQL website and am having trouble getting the query right. I have a list of links and I want to find invalid links that do not contain a period. Here's my code that doesn't work:
select * from `links` where (url REGEXP '[^\\.]')
It's returning all rows in the entire database. I just want it to show me the rows where 'url' doesn't contain a period. Thanks for your help!
SELECT c1 FROM t1 WHERE c1 NOT LIKE '%.%'
Your regexp matches anything that contains a character that isn't a period. So if it contains foo.bar, the regexp matches the f and succeeds. You can do:
WHERE url REGEXP '^[^.]*$'
The anchors and repetition operator make this check that every character is not a period. Or you can do:
WHERE LOCATE(url, '.') = 0
BTW, you don't need to escape . when it's inside [] in a regexp.
Using regexp seems like an overkill here. A simple like operator would do the trick:
SELECT * FROM `links` WHERE url NOT LIKE '%.%
EDIT:
Having said that, if you really want to negate regexp, just use not regexp:
SELECT * FROM `links` WHERE url NOT REGEXP '[\\.]';
I'm building a basic search functionality, using LIKE (I'd be using fulltext but can't at the moment) and I'm wondering if MySQL can, on searching for a keyword (e.g. WHERE field LIKE '%word%') return 20 words either side of the keyword, as well?
You can do it all in the query using SUBSTRING_INDEX
CONCAT_WS(
' ',
-- 20 words before
TRIM(
SUBSTRING_INDEX(
SUBSTRING(field, 1, INSTR(field, 'word') - 1 ),
' ',
-20
)
),
-- your word
'word',
-- 20 words after
TRIM(
SUBSTRING_INDEX(
SUBSTRING(field, INSTR(field, 'word') + LENGTH('word') ),
' ',
20
)
)
)
Use the INSTR() function to find the position of the word in the string, and then use SUBSTRING() function to select a portion of characters before and after the position.
You'd have to look out that your SUBSTRING instruction don't use negative values or you'll get weird results.
Try that, and report back.
I don't think its possible to limit the number of words returned, however to limit the number of chars returned you could do something like
SELECT SUBSTRING(field_name, LOCATE('keyword', field_name) - chars_before, total_chars) FROM table_name WHERE field_name LIKE "%keyword%"
chars_before - is the number of
chars you wish to select before the
keyword(s)
total_chars - is the
total number of chars you wish to
select
i.e. the following example would return 30 chars of data staring from 15 chars before the keyword
SUBSTRING(field_name, LOCATE('keyword', field_name) - 15, 30)
Note: as aryeh pointed out, any negative values in SUBSTRING() buggers things up considerably - for example if the keyword is found within the first [chars_before] chars of the field, then the last [chars_before] chars of data in the field are returned.
I think your best bet is to get the result via SQL query and apply a regular expression programatically that will allow you to retrieve a group of words before and after the searched word.
I can't test it now, but the regular expression should be something like:
.*(\w+)\s*WORD\s*(\w+).*
where you replace WORD for the searched word and use regex group 1 as before-words, and 2 as after-words
I will test it later when I can ask my RegexBuddy if it will work :) and I will post it here