Group by substring

Group by substring - mysql

I have a field with text like "/site/index?sid=18&sub=321333&tid=site.net&ukey=1234543254".
How can I group rows by part of that string (for example, the 'sid' URL parameter)?
The parameters may appear in a different order (sid at the end of the line, etc.).

Take a look at the MySQL string functions:
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html
Especially this looks helpful:
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_substring-index
UPDATE
This is exactly what you asked for:
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX("/site/index?sid=18&sub=321333&tid=site.net&ukey=1234543254", 'sid=', -1), '&', 1) AS this_will_be_grouped
Then use this_will_be_grouped in the GROUP BY clause of your query.
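To see how the nested calls behave, here is a minimal Python sketch that mimics SUBSTRING_INDEX for a non-empty delimiter and applies the same two steps to the sample URL. The helper names substring_index and extract_sid and the extra sample URLs are hypothetical, and the emulation is an approximation of the MySQL function, not a reference implementation.

```python
from collections import Counter


def substring_index(s: str, delim: str, count: int) -> str:
    """Approximate MySQL SUBSTRING_INDEX for a non-empty delimiter."""
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter (counted from the left).
        return delim.join(parts[:count])
    # Everything to the right of the |count|-th delimiter (counted from the right).
    return delim.join(parts[count:])


def extract_sid(url: str) -> str:
    # Same nesting as the SQL: take the text after the last 'sid=',
    # then cut it at the first '&'.
    return substring_index(substring_index(url, "sid=", -1), "&", 1)


urls = [
    "/site/index?sid=18&sub=321333&tid=site.net&ukey=1234543254",
    "/site/index?sub=321333&tid=site.net&sid=18",  # sid at the end
    "/site/index?tid=site.net&sid=7&sub=1",        # sid in the middle
]

# Rough equivalent of GROUP BY this_will_be_grouped with COUNT(*).
groups = Counter(extract_sid(u) for u in urls)
print(groups)
```

One caveat the sketch makes easy to check: a URL with no sid parameter is not skipped but yields the text before its first '&', and 'sid=' also matches the tail of a longer name such as 'xsid='. The expression is therefore safest when every row is known to carry a plain sid parameter.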

Related

Capture groups in mysql regexp

I have a table with a varchar column that represents a path. I want to search for rows whose path follows a pattern like name.name[*], where name can be anything. I am looking for repeated strings contained anywhere in the path column that are separated by a period and followed by a square bracket.
This seems to call for a regular expression, so in Python I have something like https://regex101.com/r/apS20a/4
However, implementing this with MySQL REGEXP is not working. I have been able to translate the shorthand into REGEXP '([A-Za-z_]+).(\1[[0-9]+])', but it seems that MySQL's REGEXP does not support capture groups. Is there a way to accomplish what I am trying to do with MySQL REGEXP? Thank you

I don't think that MySQL supports capture groups. But if you only have one example of .name[ in the string between the first . and the first [, you can hack your way around it. This is not a general solution, just a specific approach in this case.
You can get the name with:
select substring_index(substring_index(url, '[', 1), '.', -1) as name
And then incorporate this into a pattern match:
select t.*
from (select t.*,
             substring_index(substring_index(url, '[', 1), '.', -1) as name
      from t
     ) t
where url like concat('%', name, '.', name, '[%');
This just uses like instead of regexp, because [ and . are special characters in regular expressions. Of course, this assumes that name does not contain _ or %, which are wildcards in like.
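For comparison, the check the question describes can be sketched in Python, whose re module does support backreferences. The sketch also mirrors the LIKE workaround above so the two can be tried on the same inputs. The helper names and sample paths are made up for illustration, and the plain substring test ignores any collation-dependent case-insensitivity that LIKE may have.

```python
import re


def substring_index(s: str, delim: str, count: int) -> str:
    """Approximate MySQL SUBSTRING_INDEX for a non-empty delimiter."""
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


# Backreference version: a name, a period, the same name again, then '['.
REPEATED = re.compile(r"([A-Za-z_]+)\.\1\[")


def matches_with_backreference(path: str) -> bool:
    return REPEATED.search(path) is not None


def matches_with_like_workaround(path: str) -> bool:
    # name = the text between the last '.' before the first '[' and that '['.
    name = substring_index(substring_index(path, "[", 1), ".", -1)
    # Rough equivalent of: path LIKE CONCAT('%', name, '.', name, '[%')
    # (assuming name contains no LIKE wildcards).
    return f"{name}.{name}[" in path


for path in ["root.item.item[0]", "root.item.other[0]", "x.y[0].z.z[1]"]:
    print(path, matches_with_backreference(path), matches_with_like_workaround(path))
```

The third sample shows why the workaround is not a general solution: it only inspects the name in front of the first '[', so a repeated name later in the path is found by the backreference pattern but missed by the LIKE version.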
EDIT:
Here is a method that actually identifies when this occurs -- and works even if there are multiple patterns.
The idea is to construct the regular expression based on what happens between the . and [ -- and then to apply it. Delightfully self-referential:
select t.*,
       (url regexp regex)
from (select t.*,
             substr(regexp_replace(url, '[^.]*[.]([^\\[]*)\\[[^.]*', '|$1[.]$1\\\\['), 2) as regex
      from (select 'abcde.de[12345.345[ABC' as url union all
            select 'abcdefdef[[[[..123.124['
           ) t
     ) t;
Here is the above in a db<>fiddle.

GROUP BY multiple text matches within one column

Given data like:
URL
some_url.com
some_url.com
some_url.co.uk
some_other_url.com
some_other_url.co.uk
some_other_url.co.uk
some_other_url.org
Is there a way to construct a query that will result in:
some_url 3
some_other_url 4
Currently I'm either using a standard GROUP BY url or querying the aggregations one by one using LIKE.
Is there a way to do this in one query? (I'm using MySQL currently, but will be moving this data over to PostgreSQL.)
Would it be better practice to add a column to reflect this grouping (at insert time)? This feels redundant, but I guess it would perform best.
EDIT:
The data can contain www and non-www as well as http and https. I'll also have to do a similar thing on other columns that contain (free) text values.

This is ANSI SQL compliant and should probably work with both MySQL and PostgreSQL:
select url, count(*)
from
(
    select substring(url from 1 for position('.' in url) - 1) as url
    from tablename
) dt
group by url
This uses position() to find the first . character, substring() to take everything before it, and finally GROUP BY on the result.

Use SUBSTRING_INDEX in MySQL, which returns the part of a string before a specified number of occurrences of the delimiter.
select count(*) as cnt, SUBSTRING_INDEX(c,'.',1) as val from cte
group by SUBSTRING_INDEX(c,'.',1)

Since the values can contain http, https and www, and possibly a query string too, you will have to clean all such values before grouping them. This is adapted from an existing reference and modified to match your requirement.
SELECT
    SUBSTRING_INDEX(
        SUBSTRING_INDEX(
            SUBSTRING_INDEX(
                SUBSTRING_INDEX(
                    SUBSTRING_INDEX(
                        SUBSTRING_INDEX(url, '/', 3),
                    '://', -1),
                '/', 1),
            '?', 1),
        'www.', -1),
    '.', 1) AS domain,
    COUNT(1)
FROM tblname
GROUP BY domain;
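The chain of nested calls is easier to follow one step at a time. The Python sketch below mimics SUBSTRING_INDEX (for a non-empty delimiter) and applies the same six steps in order. The helper names and the sample URLs, which extend the question's data with schemes, www prefixes and paths, are assumptions made for illustration.

```python
from collections import Counter


def substring_index(s: str, delim: str, count: int) -> str:
    """Approximate MySQL SUBSTRING_INDEX for a non-empty delimiter."""
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


def domain_key(url: str) -> str:
    # Innermost call first, exactly as the SQL evaluates it.
    s = substring_index(url, "/", 3)     # first three '/'-separated pieces (scheme and host, if a scheme is present)
    s = substring_index(s, "://", -1)    # drop everything up to and including '://'
    s = substring_index(s, "/", 1)       # drop any remaining path
    s = substring_index(s, "?", 1)       # drop any query string
    s = substring_index(s, "www.", -1)   # drop everything up to and including the last 'www.'
    return substring_index(s, ".", 1)    # keep the text before the first '.'


urls = [
    "some_url.com",
    "http://some_url.com/",
    "https://www.some_url.co.uk/path?x=1",
    "some_other_url.com",
    "www.some_other_url.co.uk/a/b/c",
    "http://some_other_url.co.uk?ref=1",
    "https://some_other_url.org",
]

# Rough equivalent of GROUP BY domain with COUNT(1).
counts = Counter(domain_key(u) for u in urls)
print(counts)
```

With this sample the two groups come out as in the question (3 and 4 rows). Hosts with other subdomains, for example blog.some_url.com, would be grouped under the subdomain instead, so the approach fits data where www is the only prefix.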

This works in PostgreSQL:
select split_part(url,'.',1) g,count(*)
from url_table
group by g
order by g;

Getting unique entries from a columns generated by matching regexp in SQL

I am querying a table and selecting one column whose values match the regular expression (\/.+\/\?).
The content of the resulting column looks like:
/Anything here/?
Example output:
\abc\cdf\?....
\ab\?....
\abc\cdf\?....
\sb\?....
where '....' can be anything
The result I want is the unique values before \?, so that rows with duplicate regexp-matched content are shown only once (above, \abc\cdf\?.... appears twice instead of once):
\abc\cdf\?....
\ab\?....
\sb\?....
OR
\abc\cdf\?
\ab\?
\sb\?
I have searched a lot but couldn't find anything. There is regexp_substr in Oracle, but that does not work in SQL.
If someone could help me with the SQL query, that would be awesome.

If you want everything before the last \, then you can use substring_index() and some string manipulation:
select substring_index(col, '\\',
length(col) - length(replace(col, '\\', ''))
) as firstpart,
count(*)
from table t
group by substring_index(col, '\\',
length(col) - length(replace(col, '\\', ''))
);
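The trick in this query is that the difference in length before and after removing the backslashes equals the number of backslashes, which is then used as the count for substring_index(). A small Python sketch of the same idea, with hypothetical helper names and made-up row values in the shape shown in the question:

```python
from collections import Counter


def substring_index(s: str, delim: str, count: int) -> str:
    """Approximate MySQL SUBSTRING_INDEX for a non-empty delimiter."""
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


def before_last_backslash(col: str) -> str:
    # length(col) - length(replace(col, '\\', '')) = number of backslashes
    n = len(col) - len(col.replace("\\", ""))
    # Everything to the left of the n-th (that is, the last) backslash.
    return substring_index(col, "\\", n)


rows = [r"\abc\cdf\?x=1", r"\ab\?y=2", r"\abc\cdf\?z=3", r"\sb\?w"]

# Rough equivalent of GROUP BY firstpart with COUNT(*).
counts = Counter(before_last_backslash(r) for r in rows)
for firstpart, n in counts.items():
    print(firstpart, n)
```

In this sketch a value that contains no backslash at all yields an empty string, because the count passed to substring_index is then 0.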

query on FIND_IN_SET

The following query
SELECT ASSOCIATED_RISK
FROM PROJECT_ISSUES
WHERE FIND_IN_SET('98',ASSOCIATED_RISK);
returns output as
96,98
90,98
but if I use
SELECT ASSOCIATED_RISK
FROM PROJECT_ISSUES
WHERE FIND_IN_SET('96,98',ASSOCIATED_RISK);
it doesn't return anything. In this case I would like to retrieve the first row:
96,98

Use AND to combine two FIND_IN_SET() conditions, like this:
SELECT ASSOCIATED_RISK
FROM PROJECT_ISSUES
WHERE FIND_IN_SET('96',ASSOCIATED_RISK)
AND FIND_IN_SET('98',ASSOCIATED_RISK)
Your query is failing because FIND_IN_SET() does not work properly if the first argument contains a comma (",") character. Reference: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_find-in-set. In your case, the first argument is '96,98', so it fails.

Your comment:
is there any other way I can get it in single query instead of framing multiple find_in_set and concat them
As an alternative solution, you can use locate on your ASSOCIATED_RISK value.
Example:
locate( replace( '96,98', ',', '' ), replace( ASSOCIATED_RISK, ',', '' ) )
Edit:
As Aziz Shaikh's comment points out, this can return a true result even though the search string does not exist in the target string.
As an alternative, you can replace the search string in the target string with an empty string and compare the lengths. If the original string is longer than the replaced string, the search string was found.
Example:
-- evaluates to 1 (true) when '96,98' occurs in ASSOCIATED_RISK
length( ASSOCIATED_RISK ) > length( replace( ASSOCIATED_RISK, '96,98', '' ) )
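The differences between these approaches show up on a few edge cases. The Python sketch below approximates each one with plain string operations; all function names are hypothetical, and the last helper (wrapping both sides in commas) is an extra idea that does not come from the answers above.

```python
def find_in_set(needle: str, csv: str) -> int:
    """Approximate FIND_IN_SET: 1-based position of needle in a comma list, else 0."""
    if "," in needle or csv == "":
        return 0  # a comma in the first argument never matches a single item
    items = csv.split(",")
    return items.index(needle) + 1 if needle in items else 0


def has_all(needles: str, csv: str) -> bool:
    # Rough equivalent of FIND_IN_SET('96', col) AND FIND_IN_SET('98', col).
    return all(find_in_set(n, csv) > 0 for n in needles.split(","))


def contains_stripped(needles: str, csv: str) -> bool:
    # locate(replace(needles, ',', ''), replace(csv, ',', '')) > 0
    return needles.replace(",", "") in csv.replace(",", "")


def contains_by_length(needles: str, csv: str) -> bool:
    # length(csv) > length(replace(csv, needles, ''))
    return len(csv) > len(csv.replace(needles, ""))


def contains_delimited(needles: str, csv: str) -> bool:
    # Assumption, not from the answers: wrap both sides in commas so that
    # only whole, adjacent items can match.
    return f",{needles}," in f",{csv},"


for value in ["96,98", "90,98", "969,8", "196,981", "98,96"]:
    print(
        value,
        has_all("96,98", value),
        contains_stripped("96,98", value),
        contains_by_length("96,98", value),
        contains_delimited("96,98", value),
    )
```

In this sketch the comma-stripping variant wrongly accepts '969,8' and the length comparison wrongly accepts '196,981'. The chained FIND_IN_SET version rejects both and also accepts '98,96', since it does not depend on the order of the items.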

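Swapping the arguments gives a different result. This form matches rows whose ASSOCIATED_RISK is a single value found in the list '96,98' (that is, exactly 96 or 98), not the row containing '96,98':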
SELECT ASSOCIATED_RISK FROM PROJECT_ISSUES WHERE FIND_IN_SET(ASSOCIATED_RISK,'96,98');

MySQL remove final part of a string after specific character

I need to remove the last part of a string in a column named "path" that looks like:
images/prop/images/2034/22399_2034.JPG
I need everything after the last "/" to be deleted, in order to have
images/prop/images/2034/
instead of
images/prop/images/2034/22399_2034.JPG
I have no idea if this is possible. Thanks.

You can combine MySQL's TRIM() and SUBSTRING_INDEX() functions:
SELECT TRIM(TRAILING SUBSTRING_INDEX(path, '/', -1) FROM path)
FROM my_table
See it on sqlfiddle.
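The combination works in two steps: SUBSTRING_INDEX(path, '/', -1) isolates the text after the last '/', and TRIM(TRAILING ... FROM path) strips that text from the end. A minimal Python sketch of the same two steps, using hypothetical helper names:

```python
def trim_trailing(remstr: str, s: str) -> str:
    """Approximate TRIM(TRAILING remstr FROM s): strip repeated trailing copies of remstr."""
    if not remstr:
        # Assumption of this sketch: an empty remstr leaves s unchanged.
        return s
    while s.endswith(remstr):
        s = s[: -len(remstr)]
    return s


def directory_part(path: str) -> str:
    # Text after the last '/', like SUBSTRING_INDEX(path, '/', -1).
    filename = path.rsplit("/", 1)[-1]
    return trim_trailing(filename, path)


print(directory_part("images/prop/images/2034/22399_2034.JPG"))
```

In this sketch a path without any '/' is reduced to an empty string, because the whole value counts as the part after the last '/'.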
