Group by variable substring in MySQL

I have a table that contains multiple fields - let's say FieldA, FieldB etc. and finally Location. The Location field has values such as:
http://192.168.1.10/location?n=5
http://192.168.1.10/location?n=8
http://192.168.15.6/location?n=1
http://192.168.0.9/location?n=11
http://192.168.15.6/location?n=5
http://192.168.0.9/location?n=6
http://192.168.1.10/location?n=2
I need to get the unique values of the IP addresses in the Location field. In other words, from the above example data, I should get
http://192.168.1.10
http://192.168.15.6
http://192.168.0.9
Based on this answer, I am using the following SQL - without much luck
SELECT * FROM `table` WHERE FieldA = 'Example' GROUP BY (SELECT SUBSTRING_INDEX("`table`.Location", "/", 3))
The above gives me just a single record. What am I doing wrong?

Making judicious use of SUBSTRING_INDEX:
SELECT DISTINCT SUBSTRING_INDEX(SUBSTRING_INDEX(Location, '/', 3), '/', -1) AS distinct_ips
FROM yourTable;
To see how the above logic works, consider the Location value http://192.168.1.10/location?n=5. The inner call to SUBSTRING_INDEX returns http://192.168.1.10, which is everything to the left of the third forward slash. The outer call then returns everything to the right of the last forward slash in that result, which leaves us with just the IP address, 192.168.1.10.
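To make the two-step extraction concrete, here is a rough Python sketch that mimics it on the sample data from the question. The substring_index helper is a hypothetical stand-in written for this illustration (it is not a MySQL or Python API), and extract_ip is just a name chosen here.

```python
def substring_index(s, delim, count):
    """Rough Python stand-in for MySQL's SUBSTRING_INDEX (illustration only)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    # Positive count: keep everything left of the count-th delimiter.
    # Negative count: keep everything right of the count-th delimiter from the end.
    return delim.join(parts[:count] if count > 0 else parts[count:])


locations = [
    "http://192.168.1.10/location?n=5",
    "http://192.168.1.10/location?n=8",
    "http://192.168.15.6/location?n=1",
    "http://192.168.0.9/location?n=11",
    "http://192.168.15.6/location?n=5",
    "http://192.168.0.9/location?n=6",
    "http://192.168.1.10/location?n=2",
]


def extract_ip(location):
    inner = substring_index(location, "/", 3)  # e.g. 'http://192.168.1.10'
    return substring_index(inner, "/", -1)     # e.g. '192.168.1.10'


# De-duplicate the extracted values, as DISTINCT does. Keeping first-seen
# order is a detail of this sketch; DISTINCT itself promises no ordering.
distinct_ips = list(dict.fromkeys(extract_ip(loc) for loc in locations))
print(distinct_ips)
```

If the http:// prefix should be kept, as in the expected output of the question, the inner call alone already returns it.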

Related

Searching a string in a column

Can anyone help me? I have a MySQL database and need to search for a string in a particular column.
The field is a VARCHAR and contains several serial numbers separated by the "/" character.
Example:
613003593/8876572/TJMC49
The problem is searching within the string. Using LIKE works most of the time, but not always: LIKE '%13003593%' returns this row even though it should not, because the saved value is 613003593. How can I search the string correctly?
In the example there are 3 serial numbers separated by "/", and I need to search all of them.
Apologies for my English.
For the first part of the serial number,
SELECT * FROM table_name
WHERE SUBSTRING_INDEX(serial_number, '/', 1) = '613003593';
For the 2nd part,
SELECT * FROM table_name
WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(serial_number, '/', 2), '/', -1)='8876572';
For the last part,
SELECT * FROM table_name
WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(serial_number, '/', 3), '/', -1)='TJMC49';
Check How to split and search in comma-separated string in MySQL
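As a rough illustration of why the per-part comparison avoids the false positive from the question, here is a small Python sketch. Both substring_index and part are hypothetical helpers written for this example; they only imitate the nested SUBSTRING_INDEX calls above.

```python
def substring_index(s, delim, count):
    """Rough Python stand-in for MySQL's SUBSTRING_INDEX (illustration only)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


def part(serial, n):
    """n-th '/'-separated part (1-based), mirroring the nested calls."""
    return substring_index(substring_index(serial, "/", n), "/", -1)


serial = "613003593/8876572/TJMC49"

# LIKE '%13003593%' is a substring test, so it hits inside 613003593.
like_hit = "13003593" in serial

# Comparing each part for equality does not have that problem.
exact_hits = [part(serial, n) == "13003593" for n in (1, 2, 3)]

print(like_hit, exact_hits)
```

One thing to keep in mind: when n is larger than the number of parts, the nested expression falls back to the last part, so a row with only two serial numbers would be compared twice against its second one.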

Capture groups in mysql regexp

I have a table with a varchar column that represents a path. I want to search for rows that have a path that follow a pattern like name.name[*] where name can be anything. I am looking for repeated strings contained anywhere in the path column that are separated by a period and have a square bracket after them.
This seems to call for Regexp, so through python I have something like https://regex101.com/r/apS20a/4
However, trying to implement this with MySQL Regexp is not working. I have been able to translate the shorthand into REGEXP '([A-Za-z_]+).(\1[[0-9]+])', but it seems that MySql Regex does not support capture groups. Is there a way to accomplish what I am trying to do with mysql regexp? Thank you
I don't think that MySQL supports capture groups. But if you only have one example of .name[ in the string between the first . and the first [, you can hack your way around it. This is not a general solution, just a specific approach in this case.
You can get the name with:
select substring_index(substring_index(url, '[', 1), '.', -1) as name
And then incorporate this into a pattern match:
select t.*
from (select t.*,
substring_index(substring_index(url, '[', 1), '.', -1) as name
from t
) t
where url like concat('%', name, '.', name, '[%');
This just uses LIKE instead of REGEXP, because [ and . have a special meaning in regular expressions and would need escaping. Of course, this assumes that name does not contain _ or %, which are the LIKE wildcards.
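The same idea can be sketched in Python to check it against a few values. The helper names and the sample paths other than the first one are made up for this illustration; the substring test stands in for LIKE '%name.name[%' under the assumption that name contains no % or _.

```python
def substring_index(s, delim, count):
    """Rough Python stand-in for MySQL's SUBSTRING_INDEX (illustration only)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


def has_repeated_name(url):
    # Text between the last '.' before the first '[' and that '['.
    name = substring_index(substring_index(url, "[", 1), ".", -1)
    # Equivalent of LIKE CONCAT('%', name, '.', name, '[%') for plain names.
    return f"{name}.{name}[" in url


for sample in ["abcde.de[12345", "a.foo.foo[3]", "foo.bar[1]"]:
    print(sample, has_repeated_name(sample))
```

As in the SQL version, only the first [ in the value is considered, so a repeat that occurs later in the path is missed.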
EDIT:
Here is a method that actually identifies when this occurs -- and works even if there are multiple patterns.
The idea is to construct the regular expression based on what happens between the . and [ -- and then to apply it. Delightfully self-referential:
select t.*,
(url regexp regex)
from (select t.*,
substr(regexp_replace(url, '[^.]*[.]([^\\[]*)\\[[^.]*', '|$1[.]$1\\\\['), 2) as regex
from (select 'abcde.de[12345.345[ABC' as url union all
select 'abcdefdef[[[[..123.124['
) t
) t;
Here is the above in a db<>fiddle.
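For anyone who wants to step through the self-referential trick outside the database, here is a hedged Python sketch of it using the re module. The function names are invented for this example, and Python's \1 replaces the $1 of regexp_replace. Like the SQL version, it assumes the text captured between . and [ contains no regex metacharacters that would break the generated pattern.

```python
import re


def build_regex(url):
    # For every "<stuff>.<name>[<stuff>" chunk, emit "|name[.]name\[",
    # then drop the leading "|" (the substr(..., 2) step in the query).
    replaced = re.sub(r"[^.]*[.]([^\[]*)\[[^.]*", r"|\1[.]\1\\[", url)
    return replaced[1:]


def has_repeat(url):
    # Apply the generated alternation back to the value it came from.
    return re.search(build_regex(url), url) is not None


for sample in ["abcde.de[12345.345[ABC", "abcdefdef[[[[..123.124["]:
    print(sample, build_regex(sample), has_repeat(sample))
```

For the first sample the generated pattern has two alternatives, one per name found before a [, and the first of them matches de.de[ in the value.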

GROUP BY multiple text matches within one column

Given data like:
URL
some_url.com
some_url.com
some_url.co.uk
some_other_url.com
some_other_url.co.uk
some_other_url.co.uk
some_other_url.org
Is there a way to construct a query that will result in:
some_url 3
some_other_url 4
Currently I'm either using a standard GROUP BY url or I query the aggregations one by one using LIKE.
Is there a way to do this in one query? (I am using MySQL currently, but will be moving this data over to PostgreSQL.)
Would it be better practice to add a column to reflect this grouping (at insert time)? (This feels redundant, but would perform best, I guess.)
EDIT:
The data can contain www and non-www hosts as well as http and https. I'll also have to do a similar thing on other columns that contain (free) text values.
This is ANSI SQL compliant and should probably work with both MySQL and PostgreSQL:
select url, count(*)
from
(
select substring(url from 1 for position('.' in url) -1) as url
from tablename
) dt
group by url
This uses position() to find the first . character, substring() to take everything before it, and finally GROUP BY on the result.
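To check the grouping against the sample data from the question, the same steps can be sketched in Python. This is only an illustration: prefix_before_first_dot is a name made up here, and returning the whole value when there is no dot is a choice of this sketch, not something the query above defines.

```python
from collections import Counter

urls = [
    "some_url.com",
    "some_url.com",
    "some_url.co.uk",
    "some_other_url.com",
    "some_other_url.co.uk",
    "some_other_url.co.uk",
    "some_other_url.org",
]


def prefix_before_first_dot(url):
    pos = url.find(".")  # 0-based index of the first dot, -1 if there is none
    # Sketch-only choice: keep the whole value when no dot is present.
    return url[:pos] if pos != -1 else url


# Group by the extracted prefix and count the rows in each group.
counts = Counter(prefix_before_first_dot(u) for u in urls)
print(counts)
```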
In MySQL, use SUBSTRING_INDEX, which returns the part of a string before a specified number of occurrences of the delimiter.
select count(*) as cnt, SUBSTRING_INDEX(c,'.',1) as val from cte
group by SUBSTRING_INDEX(c,'.',1)
Since the values can have http, https and www, and may include a query string too, you will have to clean all such values before grouping. I took the reference from here and modified it to match your requirement.
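-- url itself is not selected: it is neither grouped nor aggregated,
-- which MySQL rejects when ONLY_FULL_GROUP_BY is enabled.
SELECT
  SUBSTRING_INDEX(
    SUBSTRING_INDEX(
      SUBSTRING_INDEX(
        SUBSTRING_INDEX(
          SUBSTRING_INDEX(
            SUBSTRING_INDEX(url, '/', 3),
          '://', -1),
        '/', 1),
      '?', 1),
    'www.', -1),
  '.', 1) AS domain,
  COUNT(1)
FROM tblname
GROUP BY domain;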
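The nesting is easier to follow when unrolled one step at a time. Below is a rough Python sketch of the same chain. The substring_index helper is a hypothetical stand-in for the MySQL function, and the sample URLs are invented for this illustration.

```python
from collections import Counter


def substring_index(s, delim, count):
    """Rough Python stand-in for MySQL's SUBSTRING_INDEX (illustration only)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


def domain_key(url):
    s = substring_index(url, "/", 3)    # cut after scheme://host when a scheme is present
    s = substring_index(s, "://", -1)   # drop the scheme
    s = substring_index(s, "/", 1)      # drop a path when there was no scheme
    s = substring_index(s, "?", 1)      # drop a query string
    s = substring_index(s, "www.", -1)  # drop a leading www.
    return substring_index(s, ".", 1)   # keep the label before the first dot


# Hypothetical sample values covering scheme, www, path and query string.
samples = [
    "https://www.some_url.com/path?x=1",
    "http://some_url.co.uk?x=1",
    "some_other_url.org",
    "www.some_other_url.co.uk/a/b/c/d",
]
counts = Counter(domain_key(u) for u in samples)
print(counts)
```

One caveat that follows from the logic: the 'www.' step splits on any occurrence of www., not only a leading one, so a host that contains www. in the middle would be cut there as well.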
This works in PostgreSQL:
select split_part(url,'.',1) g,count(*)
from url_table
group by g
order by g;
Best regards,
Bjarni

MySql how can i get specific value before specific character in where clause of specific column

I would like to get the value before a specific character from a specific column.
For example:
In the town column I want only the string before the - character. I mean I only need ABBEYARD from the town column.
I have used the following query, but it does not work.
SELECT * FROM `locations` WHERE town = SUBSTRING_INDEX('ABBOTSFORD','-',1)
Note: I only need this in the WHERE clause.
I think you want:
WHERE SUBSTRING_INDEX(town, '-', 1) = 'ABBOTSFORD'
However, I would recommend writing this as:
WHERE town LIKE 'ABBOTSFORD-%'
This can actually take advantage of an index.
Also, it looks like your data might have spaces around the '-'. If so, those should be in the comparison strings as well.
WHERE SUBSTRING_INDEX(town, ' - ', 1) = 'ABBOTSFORD'
WHERE town LIKE 'ABBOTSFORD -%'
You can repeat SUBSTRING_INDEX() in the SELECT to get only the town or the number that follows.
If you want to return only the town names, because the town column stores them concatenated with the postcode, then you need to modify the SELECT and not the WHERE part:
SELECT id, TRIM(SUBSTRING_INDEX(town, '-', 1)) townname, postcode, state FROM locations
and maybe add this where part:
WHERE TRIM(SUBSTRING_INDEX(town, '-', 1)) = 'ABBOTSFORD'
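As a quick sanity check of the two filtering styles discussed in this thread, here is a small Python sketch. The town values are hypothetical (the question's actual data is not shown), and substring_index and town_name are helper names made up for this example.

```python
def substring_index(s, delim, count):
    """Rough Python stand-in for MySQL's SUBSTRING_INDEX (illustration only)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


# Hypothetical rows: town name, spaces around the dash, then a number.
towns = ["ABBOTSFORD - 3067", "ABBEYARD - 3737", "ABBOTSFORD NORTH - 1234"]


def town_name(town):
    # Mirrors TRIM(SUBSTRING_INDEX(town, '-', 1)); TRIM removes spaces.
    return substring_index(town, "-", 1).strip(" ")


# Equality on the extracted name.
by_extraction = [t for t in towns if town_name(t) == "ABBOTSFORD"]

# Prefix test, the counterpart of town LIKE 'ABBOTSFORD -%'.
by_prefix = [t for t in towns if t.startswith("ABBOTSFORD -")]

print(by_extraction, by_prefix)
```

Both filters select the same row here; the prefix form is the one that can use an index on town, as noted above.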

If value is present in stored text string

I have a table in which one of the columns contains text values, some of which are comma-separated strings, like this:
Downtown, Market District, Warehouse District
I need to modify my query to see if a given value matches this column. I decided that using IN() is the best choice.
SELECT *
FROM t1
WHERE myValue IN (t1.nighborhood)
I am getting spotty results - sometimes I return records and sometimes not. If there's a value in t1.nighborhood that matches myValue, I do get data.
I checked and there are no MySQL errors. What am I missing?
You can use FIND_IN_SET() to search a comma-delimited list:
SELECT *
FROM t1
WHERE FIND_IN_SET(myValue, REPLACE(t1.nighborhood, ', ', ','));
The REPLACE() is necessary to remove the extra spaces.
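To see why IN() gives spotty results and why the REPLACE() matters, here is a rough Python imitation. The find_in_set function below is a hypothetical stand-in written for this sketch; it returns a 1-based position or 0, like the MySQL function it mimics.

```python
def find_in_set(needle, csv):
    """1-based position of needle in a comma-separated list, 0 if absent."""
    items = csv.split(",")
    return items.index(needle) + 1 if needle in items else 0


neighborhoods = "Downtown, Market District, Warehouse District"

# IN (column) with a single column is a plain equality test against the
# whole string, so it only matches rows that hold exactly one value.
in_matches = "Market District" == neighborhoods

# Without removing the spaces, the list items are ' Market District' etc.
raw_position = find_in_set("Market District", neighborhoods)

# After REPLACE(col, ', ', ',') the items line up with the search value.
normalized = neighborhoods.replace(", ", ",")
position = find_in_set("Market District", normalized)

print(in_matches, raw_position, position)
```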
Another solution is to use regex to match your search value surrounded by commas if necessary:
SELECT *
FROM t1
WHERE t1.nighborhood REGEXP CONCAT('(^|, )', myValue, '(, |$)');
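The regex variant can be tried out in Python as well. This is a sketch under one added assumption: the search value is escaped with re.escape so that characters such as . or ( in it are taken literally, which the SQL above does not do. The function name is made up for this example.

```python
import re

neighborhoods = "Downtown, Market District, Warehouse District"


def matches_value(value, csv):
    # The value must be preceded by the start of the string or ', ' and be
    # followed by ', ' or the end of the string.
    pattern = "(^|, )" + re.escape(value) + "(, |$)"
    return re.search(pattern, csv) is not None


for value in ["Downtown", "Market District", "Warehouse District", "Market"]:
    print(value, matches_value(value, neighborhoods))
```

The partial value Market is rejected because it is followed by a space and more text rather than by ', ' or the end of the string.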
In general, it's bad design to store distinct values in a single column. The data should be normalized into a related table with a foreign key.