Regex search with delimiters in MySQL - mysql

I'm trying to convert a regex that works fine in PHP to MySQL.
MySQL does not allow negative look-ahead (?!), so I need a solution or a workaround.
My DB column data is a string like this:
title:The Book Title¬#¬description:The Book Description¬#¬Price:$10.57
The regex I can use in PHP would be
(^|¬#¬)title:(((?!¬#¬).)*Book((?!¬#¬).)*)
but in MySQL I'm struggling. Does anybody have any advice or suggestions?

Before version 8.0, MySQL has no function for extracting a regex match from a column's content in the SELECT clause.
You can use the SUBSTRING_INDEX, LEFT and LOCATE functions to extract your content in this case.
SELECT
  SUBSTRING_INDEX(
    LEFT(content, LOCATE('¬#¬description', content) - 1), 'title:', -1) AS title,
  SUBSTRING_INDEX(
    LEFT(content, LOCATE('¬#¬Price', content) - 1), 'description:', -1) AS description,
  -- everything after the last 'Price:' is the price, so RIGHT() is not needed
  SUBSTRING_INDEX(content, 'Price:', -1) AS price
FROM test_table;
SQLFiddle
http://sqlfiddle.com/#!2/04e83/1

The solution was simple once I thought about splitting it like nhahtdh suggested.
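SELECT
  SUBSTRING_INDEX(SUBSTR(`table`.data, LOCATE('title:', `table`.data) + 6), '¬#¬', 1)
    REGEXP '[[:<:]]Book[[:>:]]' AS hasResult  -- on MySQL 8.0.4+ use '\\bBook\\b' instead
FROM
  `table`;  -- table is a reserved word, so the placeholder name needs backticks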
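For checking the split-then-match idea outside the database, here is a minimal Python sketch of the same logic. The helper names `field_value` and `title_has_word` are hypothetical, and the sample row is the one from the question. It assumes every field has the form `name:value` and that fields are separated by `¬#¬`.

```python
import re

DELIMITER = "¬#¬"  # field separator used in the question's column data


def field_value(record, name):
    """Return the value of the `name:` field, or None if the field is absent."""
    for part in record.split(DELIMITER):
        key, sep, value = part.partition(":")
        if sep and key == name:
            return value
    return None


def title_has_word(record, word):
    """True if the title field contains `word` as a whole word."""
    title = field_value(record, "title")
    if title is None:
        return False
    return re.search(rf"\b{re.escape(word)}\b", title) is not None


row = "title:The Book Title¬#¬description:The Book Description¬#¬Price:$10.57"
print(field_value(row, "title"))
print(title_has_word(row, "Book"))
```

Because the title is isolated first, the word test cannot match text in the description or price fields, which is what the negative look-ahead in the PHP pattern was guarding against.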

Related

How to pass multiple delimiters in substring_index

I want to query the string between https:// or http:// and the first delimiter character that comes after it. For example, if the field contains:
https://google.com/en/
https://www.yahoo.com?en/
I want to get:
google.com
www.yahoo.com
My initial query, which handles only the / delimiter, nests two substring_index calls as follows:
SELECT substring_index(substring_index(mycol,'/',3),'://',-1)
FROM mytable;
Now I found that the URLs may contain multiple delimiters. I want my statement to handle all the possible delimiters, which are (each one is a separate character):
:/?#[]#!$&'()*+,;=
How can I do this in my statement? I tried a suggested solution, but the command could not be executed due to a syntax error, even though I am sure I followed it. Can anyone help me construct the query correctly so that it captures all the delimiter characters listed above?
I use MySQL Workbench 6.3 on Ubuntu 18.04.
EDIT:
Some corrections made in the first example of URLs.
First, note that https://www.yahoo.com?en/ seems like an unlikely URL, because it has a path separator contained inside the query string. In any case, if you are using MySQL 8+, then consider using its regex functionality. The REGEXP_REPLACE function can be helpful here, using the following pattern:
https?://([A-Za-z_0-9.-]+).*
Sample query:
WITH yourTable AS (
SELECT 'https://www.yahoo.com?en/' AS url UNION ALL
SELECT 'no match'
)
SELECT
REGEXP_REPLACE(url, 'https?://([A-Za-z_0-9.-]+).*', '$1') AS url
FROM yourTable
WHERE url REGEXP 'https?://[^/]+';
The term $1 refers to the first capture group in the regex pattern. An explicit capture group is denoted by a quantity in parentheses. In this case, here is the capture group (highlighted below):
https?://([A-Za-z_0-9.-]+).*
         ^^^^^^^^^^^^^^^^^
That is, the capture group is the host portion of the URL, including domain, subdomain, etc.
In MySQL 8+, this should work:
SELECT regexp_replace(regexp_substr(mycol, '://[a-zA-Z0-9_.]+[/:?]'), '[^a-zA-Z0-9_.]', '')
FROM (SELECT 'https://google.com/en' as mycol union all
SELECT 'https://www.yahoo.com?en'
) x
In older versions, this is much more challenging because there is no way to search for a character class.
One brute force method is:
select (case when substring_index(mycol, '://', -1) like '%/%'
then substring_index(substring_index(mycol, '://', -1), '/', 1)
when substring_index(mycol, '://', -1) like '%?%'
then substring_index(substring_index(mycol, '://', -1), '?', 1)
. . . -- and so on for each character
else substring_index(mycol, '://', -1)
end) as what_you_want
The [a-zA-Z0-9_.] is intended to be something like the valid character class for your domain names.
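To test the "stop at the first delimiter" rule before writing the SQL, a small Python sketch can help. The function name `extract_host` is hypothetical, and the delimiter set is the one listed in the question; everything up to the first of those characters after the scheme is treated as the host.

```python
import re

# Delimiter characters listed in the question; the host ends at the first one.
DELIMITERS = ":/?#[]!$&'()*+,;="

# Scheme, then one or more characters that are not delimiters.
HOST_RE = re.compile(r"https?://([^" + re.escape(DELIMITERS) + r"]+)")


def extract_host(url):
    """Return the text between the scheme and the first delimiter, or None."""
    match = HOST_RE.search(url)
    return match.group(1) if match else None


for url in ("https://google.com/en/", "https://www.yahoo.com?en/", "no match"):
    print(url, "->", extract_host(url))
```

A negated character class covers all delimiters in one pass, which is the part that older MySQL versions cannot express with SUBSTRING_INDEX alone.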

How to split a column in two columns

I have an issue with a table called "movies": the date and the movie title are both stored in the title column.
I don't know how to deal with this kind of issue. I tried to adapt this code to MySQL, but it didn't work:
DataFrame(row.str.split(' ',-1).tolist(),columns = ['title','date'])
How do I split it in two columns (title, date)?
If you are using MySQL 8+, then we can try using REGEXP_REPLACE:
SELECT
REGEXP_REPLACE(title, '^(.*)\\s\\(.*$', '$1') AS title,
REGEXP_REPLACE(title, '^.*\\s\\((\\d+)\\)$', '$1') AS date
FROM yourTable;
Here is a general regex pattern which can match your title strings:
^(.*)\s\((\d+)\)$
Explanation:
^        from the start of the string
(.*)\s   match and capture anything, up to the last space before the year
\(       match a literal opening parenthesis
(\d+)    match and capture the year (any number of digits)
\)       match a literal closing parenthesis
$        end of string
I would simply do:
select left(title, length(title) - 7) as title,
replace(right(title, 5) ,')', '') as year
Regular expressions seem like overkill for this logic.
In Hive, you need to use substr() for this:
select substr(title, 1, length(title) - 7) as title,
substr(title, length(title) - 4, 4) as year
After struggling and searching I was able to build this command which works perfectly.
select
translate(substr(title,0,length(title) -6) ,'', '') as title,
translate(substr(title, -5) ,')', '') as date
from movies;
Thanks to everyone who answered, too!
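Since the question started from Python code, here is a minimal sketch of the same split using the regex discussed in the answers. The function name `split_title` and the sample value "Toy Story (1995)" are assumptions, because the original picture of the data is not available; the sketch assumes titles end with a year in parentheses.

```python
import re

# Title, one whitespace character, then a run of digits in parentheses at the end.
TITLE_YEAR_RE = re.compile(r"^(.*)\s\((\d+)\)$")


def split_title(raw):
    """Split 'Title (1995)' into ('Title', '1995'); return (raw, None) if no year."""
    match = TITLE_YEAR_RE.match(raw)
    if match is None:
        return raw, None
    return match.group(1), match.group(2)


print(split_title("Toy Story (1995)"))
print(split_title("No year here"))
```

Unlike the fixed-offset LEFT/RIGHT approach, the regex does not depend on the year having exactly four digits, and rows without a trailing year are returned unchanged instead of being truncated.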

Replacing substrings in MySQL

I have some sentences stored as strings in MySQL, and I need to replace substrings such as 'My' with 'my' when the word is not the first one in the sentence. How can I do this?
Look at CHAR, REPLACE, REPEAT, etc. I'd recommend reading the MySQL reference: http://dev.mysql.com/doc/refman/5.5/en/string-functions.html
If you just want to replace several words, you can replace them, using this approach:
UPDATE str_test SET str = REPLACE(str, ' My', ' my')
This works because words inside the text are preceded by a space. If you want a regexp-based replace, it is a more difficult task; see:
How to count words in MySQL / regular expression replacer?
https://dba.stackexchange.com/questions/15250/how-to-do-a-case-sensitive-search-in-where-clause
MySQL's support for string manipulation is very limited. A quick solution would be to use something like this:
SELECT
CONCAT(
LEFT(col, 1),
REPLACE(SUBSTRING(col, 2), 'My', 'my')
)
This replaces every occurrence of My with my, except one at the very start of the string.
Or maybe this:
SELECT
col,
RTRIM(REPLACE(CONCAT(col, ' '), ' My ', ' my '))
FROM
yourtable
This replaces every whole word My that has a space on both sides, so the first word of the string is left alone.
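If the replacement can be done in application code instead, the "not first in the sentence" rule is easier to state precisely. The following is a minimal Python sketch; the function name `lowercase_inner_word` is hypothetical, and it assumes a sentence starts either at the beginning of the text or after `.`, `!` or `?`.

```python
import re


def lowercase_inner_word(text, word="My"):
    """Lowercase whole-word occurrences of `word` that do not start a sentence."""

    def replace(match):
        before = text[:match.start()].rstrip()
        # Start of the text, or right after sentence-ending punctuation: keep as is.
        if not before or before[-1] in ".!?":
            return match.group(0)
        return match.group(0).lower()

    return re.sub(rf"\b{re.escape(word)}\b", replace, text)


print(lowercase_inner_word("My dog and My cat. My bird likes My hat"))
```

The word-boundary anchors keep words such as "Myself" untouched, which the plain REPLACE on 'My' would also change.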

mysql get whole word from string all words that have character in it

I have a table with a string field, and I want to extract the first word that contains a '-'.
For example, if the field holds "so I want to get th-is word", the query should return "th-is".
The basic answer is you should rely on your application code to parse this response, as MySQL does not have built-in string functions that would handle this efficiently.
Another option is to create your own MySQL stored function to handle this.
Otherwise, here is a select statement that would do what you want; however, I don't think I'd use it in production myself.
SELECT
CONCAT(
SUBSTRING_INDEX( SUBSTRING_INDEX( "so I want to get th-is word", '-', 1 ), ' ', -1 ),
'-',
SUBSTRING_INDEX( SUBSTRING_INDEX( "so I want to get th-is word", '-', -1 ), ' ', 1 )
) AS returnstring;
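For the application-code route recommended above, a minimal Python sketch could look like this. The function name `first_word_containing` is hypothetical, and words are assumed to be separated by whitespace.

```python
def first_word_containing(text, char="-"):
    """Return the first whitespace-separated word that contains `char`, or None."""
    for word in text.split():
        if char in word:
            return word
    return None


print(first_word_containing("so I want to get th-is word"))
```

Unlike the nested SUBSTRING_INDEX query, this also behaves predictably when the text contains more than one hyphenated word or none at all.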
The simplest way would be to enclose the word that contains punctuation characters within `back-ticks`
"so I want to get `th-is` word" would generate "so I want to get th-is word"

mysql natural sort

I have example numbers in this format:
1.1
1.1.4
1.1.5
2.1
2.1.10
2.1.23
3.1a
3.1b
4.1.5
4.2.6
4.7.12
How do I sort these in MySQL? I can do that easily with the sort command-line tool, but nothing seems to work in MySQL.
It may work if you split the string into pieces and order by each relevant piece.
SELECT data
FROM example
ORDER BY
CAST(SUBSTRING_INDEX(data, '.', 1) AS BINARY) ASC,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(data , '.', 2), '.', -1) AS BINARY) ASC,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(data , '.', -1), '.', 1) AS BINARY) ASC;
Can't say I support doing something like that in MySQL, but I guess it would get you where you need to be, at least with my test data. Just remember you'll need to edit the query if the number of elements in the string changes.
Try ordering by INET_ATON (for MySQL 3.23.15 and newer)
ORDER BY INET_ATON(some_field);
PS. It works for IP addresses; I don't know how it handles letters.
Was there something wrong with ORDER BY?
I tried:
CREATE TABLE example (data VARCHAR(30));
INSERT INTO example VALUES ('4.2.6'), ('1.1.5'), ('2.1.10'), ('3.1b'), ('2.1'), ('4.7.12'), ('1.1'), ('2.1.23'), ('1.1.4'), ('3.1a'), ('4.1.5');
SELECT * FROM example ORDER BY data;
... and it seemed to work as you'd like. (I can't guarantee that there isn't some corner case where your real data might not be ordered in what you'd consider a "natural" way. That seems to be a heuristic term rather than a precisely defined term of art.)
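One such corner case is a component with a different number of digits: a plain string sort puts 2.1.10 before 2.1.9. The Python sketch below contrasts the two orderings. The function name `natural_key` and the value 2.1.9 are assumptions added for illustration; the other values come from the question.

```python
import re


def natural_key(value):
    """Sort key that compares digit runs as integers and other runs as text."""
    parts = re.findall(r"\d+|\D+", value)
    # Tag each part so numbers and text are never compared with each other directly.
    return [(0, int(p), "") if p.isdigit() else (1, 0, p) for p in parts]


data = ["2.1.10", "2.1.9", "1.1", "3.1b", "3.1a", "1.1.4"]

print(sorted(data))                   # plain string order
print(sorted(data, key=natural_key))  # natural order
```

The same idea is what the SUBSTRING_INDEX answer approximates in SQL: split the value into pieces and compare each piece by its own type.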