Going through the position of a specific character using a MySQL SELECT statement - mysql

Sorry for the title.
My problem is how to get a specific part of a URL using a MySQL SELECT statement. For example, take the URLs
http://www.google.com/search?q=lpol&ie=utf-8&oe=utf-8&client=ubuntu&channel=fs
and
http://www.google.com/search?q=query+to+count+specific+character&ie=utf-8&oe=utf-8&client=ubuntu&channel=fs#hl=fil&client=ubuntu&channel=fs&sa=X&ei=J1knUPu9GsiUiAe3xYB4&ved=0CEQQvwUoAQ&q=mysql+query+to+go+through+specific+character+position&spell=1&bav=on.2,or.r_gc.r_pw.r_qf.&fp=c4fd06cd155ee554&biw=1014&bih=424
These two are different URLs, but both contain google.com. How can I extract google.com so that I can count these two URLs as one using a MySQL SELECT statement?

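The SUBSTRING_INDEX function should work:
SELECT SUBSTRING_INDEX('http://www.mysql.com/abcd/asd/', '/', 3); -> 'http://www.mysql.com'
Use this in combination with the column that you have:
SELECT SUBSTRING_INDEX(`column`, '/', 3) FROM `table`; -> each URL cut off before its third slash, i.e. scheme plus host
The count of 3 is correct: the third slash is the one that ends the host, whereas a count of 2 would return only 'http:/'. To drop the scheme as well, wrap the result in a second call, SUBSTRING_INDEX(..., '://', -1), which leaves 'www.mysql.com'.
Good luck.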
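To tie this back to the counting part of the question, here is a minimal sketch that groups URLs by host. The derived table sample_urls and the aliases host and url_count are made up for illustration, and the sample URLs are shortened versions of the ones in the question. Note that it yields www.google.com rather than google.com; stripping the www. prefix would need an extra step.

```sql
-- Sketch: count URLs per host (scheme removed, path and query string dropped).
-- sample_urls stands in for the real table; replace it with your own.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(url, '/', 3), '://', -1) AS host,
       COUNT(*) AS url_count
FROM (
    SELECT 'http://www.google.com/search?q=lpol&ie=utf-8' AS url
    UNION ALL
    SELECT 'http://www.google.com/search?q=query+to+count&ie=utf-8'
) AS sample_urls
GROUP BY host;
```

Both sample rows share the host www.google.com, so they collapse into a single group.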

Related

SQL Data Update Query About

I have a table named testlink and it has url and newtarget columns.
I would like to take the part of each value in the url column that follows https://domain1.com/ and set the newtarget column to https://domain1.com/search/?q= followed by that extracted string.
So briefly:
a url value of https://domain1.com/topic1
should become https://domain1.com/search/?q=topic1 in the newtarget column.
There are about 6 thousand different topics (rows).
Database: MySQL / phpMyAdmin.
Use REPLACE:
UPDATE testlink
SET newtarget = REPLACE(url,'https://domain1.com/','https://domain1.com/search/?q=');
MySQL REPLACE() replaces all the occurrences of a substring within a string.
REPLACE(str, find_string, replace_with)
If you want to conditionally change the value, you can use string manipulations:
update testlink
set newtarget = concat(left(url, char_length(url) - char_length(substring_index(url, '/', -1))), 'search/?q=', substring_index(url, '/', -1))
where url like 'https://domain1.com/%';
This uses substring_index() to get the last part of the string (after the last /). It uses left() to get the first part (based on the length of the last part) and then concatenates the values you want.
Of course, test this logic using a SELECT before implementing an UPDATE.
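A dry run along those lines might look like the sketch below. The derived table sample_rows and the alias newtarget_preview are illustrative only; against the real data you would select from testlink instead.

```sql
-- Sketch: preview the rewritten value before running any UPDATE.
SELECT url,
       REPLACE(url, 'https://domain1.com/', 'https://domain1.com/search/?q=') AS newtarget_preview
FROM (
    SELECT 'https://domain1.com/topic1' AS url
) AS sample_rows
WHERE url LIKE 'https://domain1.com/%';
```

If the preview column shows the expected values, the same expression can be moved into the SET clause of the UPDATE.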
If you're using MySQL 8, you can do this with REGEXP_REPLACE.
For your example, this should work:
SELECT REGEXP_REPLACE('https://domain1.com/topic1', '^(https://domain1\\.com/)(.+)$', '$1search/?q=$2');

How to pass multiple delimiters in substring_index

I want to extract the string between https:// or http:// and the first delimiter character that comes after it. For example, if the field contains:
https://google.com/en/
https://www.yahoo.com?en/
I want to get:
google.com
www.yahoo.com
My initial query, which handles only the / delimiter, nests two SUBSTRING_INDEX calls:
SELECT substring_index(substring_index(mycol,'/',3),'://',-1)
FROM mytable;
I have now found that the URLs may contain multiple delimiters. I want my statement to handle all of the possible delimiters, which are (each one is a separate character):
:/?#[]#!$&'()*+,;=
How can I do this in my statement? I tried this solution, but the command could not be executed because of a syntax error, even though I am sure I followed the solution. Can anyone help me construct the query correctly so that it handles all the delimiter characters listed above?
I use MySQL Workbench 6.3 on Ubuntu 18.04.
EDIT:
Some corrections made in the first example of URLs.
First, note that https://www.yahoo.com?en/ seems like an unlikely URL, because it has a path separator contained inside the query string. In any case, if you are using MySQL 8+, then consider using its regex functionality. The REGEXP_REPLACE function can be helpful here, using the following pattern:
https?://([A-Za-z_0-9.-]+).*
Sample query:
WITH yourTable AS (
SELECT 'https://www.yahoo.com?en/' AS url UNION ALL
SELECT 'no match'
)
SELECT
REGEXP_REPLACE(url, 'https?://([A-Za-z_0-9.-]+).*', '$1') AS url
FROM yourTable
WHERE url REGEXP 'https?://[^/]+';
The term $1 refers to the first capture group in the regex pattern. A capture group is the part of the pattern enclosed in parentheses. In this case, the capture group is marked below:
https?://([A-Za-z_0-9.-]+).*
          ^^^^^^^^^^^^^^^
That is, the capture group matches the host portion of the URL (domain, subdomain, etc.), not the path.
In MySQL 8+, this should work:
SELECT regexp_replace(regexp_substr(mycol, '://[a-zA-Z0-9_.]+[/:?]'), '[^a-zA-Z0-9_.]', '')
FROM (SELECT 'https://google.com/en' as mycol union all
SELECT 'https://www.yahoo.com?en'
) x
In older versions, this is much more challenging because there is no way to search for a character class.
One brute force method is:
select (case when substring_index(mycol, '://', -1) like '%/%'
then substring_index(substring_index(mycol, '://', -1), '/', 1)
when substring_index(mycol, '://', -1) like '%?%'
then substring_index(substring_index(mycol, '://', -1), '?', 1)
. . . -- and so on for each character
else substring_index(mycol, '://', -1)
end) as what_you_want
The [a-zA-Z0-9_.] is intended to be something like the valid character class for your domain names.
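Another possible MySQL 8+ sketch strips the scheme with SUBSTRING_INDEX and then keeps everything up to the first delimiter by using a negated character class. The class below lists only / : ? # as an assumption; more characters from the question's list could be added to it. The derived table sample_urls and the alias host are illustrative.

```sql
-- Sketch for MySQL 8+: REGEXP_SUBSTR is not available in earlier versions.
-- Step 1: SUBSTRING_INDEX(mycol, '://', -1) drops the scheme.
-- Step 2: '^[^/:?#]+' keeps the leading run of non-delimiter characters.
SELECT mycol,
       REGEXP_SUBSTR(SUBSTRING_INDEX(mycol, '://', -1), '^[^/:?#]+') AS host
FROM (
    SELECT 'https://google.com/en/' AS mycol
    UNION ALL
    SELECT 'https://www.yahoo.com?en/'
) AS sample_urls;
```

Unlike a pattern that requires a trailing delimiter, this form should also handle a bare URL such as https://google.com, because the match simply runs to the end of the string.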

substring_index does not take exact prefix

I have a table that contains strings. Some of them start with https:// and some start with http://. I want to extract those that start with http:// (without the s). Please note that I do not want to use a LIKE statement, because it conflicts with how I plan to process this string. Suppose I have the following items in a column called mycol in mytable:
https://111.com/
https://www.222.com/en-gb/
I run this query:
SELECT `mytable`.`mycol`, substring_index(`mytable`.`mycol`,'http://',-1)
FROM `mytable`;
I still get these strings in the results:
https://111.com/
https://www.222.com/en-gb/
Why? Since my query is looking for http://, not https://, why do I get results that start with https://? In this simple example, it should return nothing, as no string starts with http://.
want to extract the string
Use a regex; it is much simpler.
SELECT mycol FROM mytable WHERE mycol REGEXP '^http://.+';
You could add a check for the delimiter:
SELECT `mycol`,
IF(instr(mycol,'http://') > 0, substring_index(`mytable`.`mycol`,'http://',-1),NULL)
FROM mytable;
When the SUBSTRING_INDEX function cannot find the delimiter string, it returns the original string instead of NULL.
SELECT substring_index('abc','.',-1)
=>
abc
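Applied to the URLs in the question, that behaviour can be seen side by side in the following sketch (the column aliases are made up for illustration):

```sql
-- 'http://' does not occur inside 'https://111.com/', so the input comes back unchanged.
-- In the second call the delimiter is found, so only the part after it is returned.
SELECT SUBSTRING_INDEX('https://111.com/', 'http://', -1) AS delimiter_missing,  -- 'https://111.com/'
       SUBSTRING_INDEX('http://111.com/',  'http://', -1) AS delimiter_found;    -- '111.com/'
```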
I want to extract those that starts with http://
If you don't want to use LIKE, you can use left():
select right(columnname, length(columnname) - 7) from tablename
where left(columnname, 7) = 'http://'
You don't need to know the length of the string.
All you want is extract the part of the string after the 1st 7 chars.
The length of this part is:
length(columnname) - 7
So use right().
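As a possible alternative sketch, SUBSTRING with only a start position returns everything from that position onward, which avoids the length arithmetic. The derived table sample_rows and the alias rest are illustrative.

```sql
-- 'http://' is 7 characters long, so the remainder starts at position 8.
SELECT SUBSTRING(columnname, 8) AS rest
FROM (
    SELECT 'http://111.com/' AS columnname
    UNION ALL
    SELECT 'https://www.222.com/en-gb/'
) AS sample_rows
WHERE LEFT(columnname, 7) = 'http://';
```

Only the first sample row passes the filter, because the first 7 characters of the second row are 'https:/'.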

Replace a part of a file path in a string field with SQL

Hello, I have a table Gallery with a field url_immagine, and I would like to use a query to replace all values that look like upload/gallery/311/ge_c1966615153f6b2fcf5d84c1e389eea8.jpg with /ge_c1966615153f6b2fcf5d84c1e389eea8.jpg
Unfortunately, part of the string, the ID (311), is not always the same, so I cannot work out how to do it.
I tried the regular expression like this:
UPDATE gallery SET url_immagine = replace(url_immagine, 'upload/gallery/.*/', '/')
but it does not seem to work.
Combine CONCAT and SUBSTRING_INDEX, since SUBSTRING_INDEX with a count of -1 returns everything after the last "/":
UPDATE gallery
SET url_immagine = CONCAT('/', SUBSTRING_INDEX(url_immagine, '/', -1));
Run this first to confirm it does what you want:
SELECT CONCAT('/',SUBSTRING_INDEX(url_immagine, '/', -1))
FROM gallery
You can see documentation for the replace function and all other string functions in the mysql manual:
https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_replace
It does not mention that REPLACE handles regular expressions, so we can assume it does not: it matches the search string verbatim, so your .* is treated as the literal characters . and *.
You can also see that no single function does the whole job for you, so you must combine several. Mateo's idea is probably the right direction.
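The linked manual page is for MySQL 5.7, which has no regex replace function. On MySQL 8.0 or later, REGEXP_REPLACE could express the original idea directly. The sketch below assumes the ID segment is always numeric; the derived table sample_gallery and the alias new_value are illustrative.

```sql
-- Sketch for MySQL 8.0+ only: REGEXP_REPLACE does not exist in 5.7.
-- Replaces the leading 'upload/gallery/<digits>/' with a single '/'.
SELECT url_immagine,
       REGEXP_REPLACE(url_immagine, '^upload/gallery/[0-9]+/', '/') AS new_value
FROM (
    SELECT 'upload/gallery/311/ge_c1966615153f6b2fcf5d84c1e389eea8.jpg' AS url_immagine
) AS sample_gallery;
```

Once the preview looks right, the same expression could be used in an UPDATE on gallery, ideally with a WHERE url_immagine LIKE 'upload/gallery/%' filter so that other rows are left alone.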

MySQL regex only returns a single row

I have been writing a REGEX in MySQL to identify those domains that have a .com TLD. The URLs are usually of the form
http://example.com/
The regex I came up with looks like this:
REGEXP '[[.colon.]][[.slash.]][[.slash.]]([:alnum:]+)[[...]]com[[./.]]'
The reason we match the :// is so that we don't pick up URLs such as http://example.com/error.com/wrong.com
Therefore my query is
SELECT DISTINCT name
FROM table
WHERE name REGEXP '[[.colon.]][[.slash.]][[.slash.]]([:alnum:]+)[[...]]com[[./.]]'
However, this is returning only a single row when it should really be returning many more (upwards of a thousand). What mistake am I making with the query?
Not sure if that's the problem, but it should be [[:alnum:]], not [:alnum:]
Your current query only matches names that end with .com/ rather than .com followed by anything that starts with a slash. Try the following:
SELECT DISTINCT name
FROM table
WHERE name REGEXP '[[.colon.]][[.slash.]][[.slash.]]([:alnum:]+)[[...]]com([[./.]].*)?'
It might be clearer to split the URL rather than regexing it:
SELECT DISTINCT name FROM table
WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(name,'/',3),'.',-1) = 'com';
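To see how the split approach treats the problem URL from the question, here is a small self-contained sketch. The derived table sample_names and its rows are made up for illustration.

```sql
-- Inner call: keep everything before the third '/', i.e. scheme plus host.
-- Outer call: keep everything after the last '.', i.e. the TLD.
SELECT name
FROM (
    SELECT 'http://example.com/' AS name
    UNION ALL
    SELECT 'http://example.org/error.com/wrong.com'
    UNION ALL
    SELECT 'http://shop.example.com/cart'
) AS sample_names
WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(name, '/', 3), '.', -1) = 'com';
```

The example.org row is filtered out because its host ends in org, even though .com appears later in the path. One caveat: a host with an explicit port, such as example.com:8080, would yield 'com:8080' and would not match.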