Extracting matches of REGEXP in SQL [duplicate] - mysql

I would like a MySQL query like this:
select <second word in text> word, count(*) from table group by word;
All the regex examples in MySQL test whether the text matches the expression; none of them extract text from it. Is there such a syntax?

The following is a proposed solution for the OP's specific problem (extracting the second word of a string), but note that, as mc0e's answer states, extracting regex matches is not supported out of the box in MySQL before 8.0 (REGEXP_SUBSTR() was added in MySQL 8.0.4). If you really need this on an older version, your choices are basically to 1) do it in post-processing on the client, or 2) install a MySQL extension to support it.
BenWells has it very nearly correct. Working from his code, here's a slightly adjusted version:
SUBSTRING(
sentence,
LOCATE(' ', sentence) + CHAR_LENGTH(' '),
LOCATE(' ', sentence, LOCATE(' ', sentence) + 1) - ( LOCATE(' ', sentence) + CHAR_LENGTH(' ') )
)
As a working example, I used:
SELECT SUBSTRING(
sentence,
LOCATE(' ', sentence) + CHAR_LENGTH(' '),
LOCATE(' ', sentence, LOCATE(' ', sentence) + 1) - ( LOCATE(' ', sentence) + CHAR_LENGTH(' ') )
) as string
FROM (SELECT 'THIS IS A TEST' AS sentence) temp
This successfully extracts the word IS
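To check the start/length arithmetic without a server, the expression can be modeled in Python. locate and substring below are hypothetical stand-ins for MySQL's 1-based LOCATE() and SUBSTRING(), covering only the argument forms used here (pos >= 1); this is a sketch for reasoning about the off-by-one details, not MySQL itself.

```python
def locate(substr: str, s: str, pos: int = 1) -> int:
    """Model of MySQL LOCATE(substr, str, pos): 1-based position, or 0 if not found.

    Assumes pos >= 1; other values are not modeled.
    """
    idx = s.find(substr, pos - 1)  # str.find is 0-based, LOCATE is 1-based
    return idx + 1 if idx >= 0 else 0


def substring(s: str, pos: int, length: int) -> str:
    """Model of MySQL SUBSTRING(str, pos, len) for pos >= 1; len < 1 gives ''."""
    if length < 1:
        return ""
    return s[pos - 1 : pos - 1 + length]


def second_word(sentence: str, sep: str = " ") -> str:
    first = locate(sep, sentence)            # LOCATE(' ', sentence)
    start = first + len(sep)                 # ... + CHAR_LENGTH(' ')
    end = locate(sep, sentence, first + 1)   # LOCATE(' ', sentence, LOCATE(' ', sentence) + 1)
    return substring(sentence, start, end - start)


print(second_word("THIS IS A TEST"))  # IS
```

For a single-word value the second LOCATE returns 0, so the computed length is negative and SUBSTRING returns an empty string; the model behaves the same way.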

Shorter option to extract the second word in a sentence:
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('THIS IS A TEST', ' ', 2), ' ', -1) as FoundText
MySQL docs for SUBSTRING_INDEX
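The nested SUBSTRING_INDEX call works in two passes: keep everything left of the second space, then keep everything right of the last space in that result. A rough Python model makes the two passes visible; substring_index is a hypothetical helper that assumes a non-empty delimiter that cannot overlap itself (such as ' ' or ',').

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Rough model of MySQL SUBSTRING_INDEX(str, delim, count).

    count > 0: everything left of the count-th delimiter, counting from the left.
    count < 0: everything right of the count-th delimiter, counting from the right.
    If there are fewer delimiters than |count|, the whole string comes back.
    """
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


text = "THIS IS A TEST"
first_two = substring_index(text, " ", 2)         # 'THIS IS'
found_text = substring_index(first_two, " ", -1)  # 'IS'
print(found_text)  # IS
```

One caveat the model shows: for a single-word value, SUBSTRING_INDEX(..., ' ', 2) returns the whole string, so the outer call returns that one word rather than an empty string.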

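According to http://dev.mysql.com/, the SUBSTRING function takes a start position and then a length, so the expression for the second word would be:
SUBSTRING(sentence, LOCATE(' ', sentence), LOCATE(' ', sentence, LOCATE(' ', sentence) + 1) - LOCATE(' ', sentence))
Note that this keeps the leading space (' IS' for 'THIS IS A TEST'); start at LOCATE(' ', sentence) + 1 and subtract 1 from the length to drop it.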

No, there isn't a syntax for extracting text using regular expressions. You have to use the ordinary string manipulation functions.
Alternatively select the entire value from the database (or the first n characters if you are worried about too much data transfer) and then use a regular expression on the client.
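A minimal sketch of the client-side route, assuming the rows have already been fetched into a list of strings (texts below is hypothetical sample data) and that a word is any run of non-whitespace characters:

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical sample rows, standing in for text values already fetched from the table.
texts = ["THIS IS A TEST", "WHAT IS IT", "ONE", "MY WORD HERE"]

# Assumption: a "word" is any run of non-whitespace characters.
WORD = re.compile(r"\S+")


def second_word(text: str) -> Optional[str]:
    words = WORD.findall(text)
    return words[1] if len(words) > 1 else None


# Client-side equivalent of: select <second word> word, count(*) ... group by word
counts = Counter(w for w in map(second_word, texts) if w is not None)
print(counts.most_common())  # [('IS', 2), ('WORD', 1)]
```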

As others have said, MySQL does not provide regex tools for extracting substrings. That's not to say you can't have them, though, if you're prepared to extend MySQL with user-defined functions:
https://github.com/mysqludf/lib_mysqludf_preg
That may not help much if you distribute your software, since it adds an obstacle to installing it, but for an in-house solution it may be appropriate.

I used Brendan Bullen's answer as a starting point for a similar issue I had, which was to retrieve the value of a specific field in a JSON string. However, as I commented on his answer, it is not entirely accurate. If your left boundary isn't just a single space, as in the original question, the discrepancy increases.
Corrected solution:
SUBSTRING(
sentence,
LOCATE(' ', sentence) + 1,
LOCATE(' ', sentence, (LOCATE(' ', sentence) + 1)) - LOCATE(' ', sentence) - 1
)
The two differences are the +1 in the SUBSTRING index parameter and the -1 in the length parameter.
For a more general solution to "find the first occurrence of a string between two provided boundaries":
SUBSTRING(
haystack,
LOCATE('<leftBoundary>', haystack) + CHAR_LENGTH('<leftBoundary>'),
LOCATE(
'<rightBoundary>',
haystack,
LOCATE('<leftBoundary>', haystack) + CHAR_LENGTH('<leftBoundary>')
)
- (LOCATE('<leftBoundary>', haystack) + CHAR_LENGTH('<leftBoundary>'))
)
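The general formula translates directly to 0-based string indexes. This Python sketch mirrors it step by step; between is a hypothetical helper, and the JSON-like sample string is made up for illustration.

```python
from typing import Optional


def between(haystack: str, left: str, right: str) -> Optional[str]:
    """Text between the first `left` and the next `right` after it.

    Mirrors the SQL arithmetic with 0-based str.find instead of 1-based LOCATE.
    Returning None when a boundary is missing is a choice made for this sketch;
    the SQL expression has no such check and may return '' or unrelated text.
    """
    left_pos = haystack.find(left)
    if left_pos < 0:
        return None
    start = left_pos + len(left)       # LOCATE(left) + CHAR_LENGTH(left)
    end = haystack.find(right, start)  # LOCATE(right, haystack, start)
    if end < 0:
        return None
    return haystack[start:end]         # length = end - start


doc = '{"id":7,"name":"Ann","city":"Oslo"}'
print(between(doc, '"name":"', '"'))  # Ann
```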

I don't think such a thing is possible. You can use the SUBSTRING function to extract the part you want.

My home-grown regular expression replace function can be used for this.
Demo
See this DB-Fiddle demo, which returns the second word ("I") from a famous sonnet and the number of occurrences of it (1).
SQL
Assuming MySQL 8 or later is being used (to allow use of a Common Table Expression), the following will return the second word and the number of occurrences of it:
WITH cte AS (
SELECT digits.idx,
SUBSTRING_INDEX(SUBSTRING_INDEX(words, '~', digits.idx + 1), '~', -1) word
FROM
(SELECT reg_replace(UPPER(txt),
'[^''’a-zA-Z-]+',
'~',
TRUE,
1,
0) AS words
FROM tbl) delimited
INNER JOIN
(SELECT @row := @row + 1 as idx FROM
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t1,
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t2,
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t3,
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t4,
(SELECT @row := -1) t5) digits
ON LENGTH(REPLACE(words, '~' , '')) <= LENGTH(words) - digits.idx)
SELECT c.word,
subq.occurrences
FROM cte c
LEFT JOIN (
SELECT word,
COUNT(*) AS occurrences
FROM cte
GROUP BY word
) subq
ON c.word = subq.word
WHERE idx = 1; /* idx is zero-based so 1 here gets the second word */
Explanation
A few tricks are used in the SQL above, and some attribution is needed. First, the regular expression replacer replaces every contiguous run of non-word characters with a single tilde (~). Note: a different character could be chosen if there is any possibility of a tilde appearing in the text.
The technique from this answer is then used to transform a string of delimited values into separate rows. It's combined with the clever technique from this answer for generating a table of incrementing numbers: 0 to 9,999 in this case.
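The three steps can be sketched for a single text value in Python. This uses the standard re module in place of the author's reg_replace function (whose exact behavior is not shown here), and Python list indexing in place of the numbers-table join; the sample line and the helper name second_word_and_count are assumptions for illustration.

```python
import re
from collections import Counter
from typing import Tuple

# Same character class as the SQL pattern: letters, straight/curly apostrophes, hyphen.
NON_WORD = re.compile(r"[^'’a-zA-Z-]+")


def second_word_and_count(txt: str) -> Tuple[str, int]:
    """Assumes the text contains at least two words."""
    # Step 1: replace each run of non-word characters with a single '~'.
    delimited = NON_WORD.sub("~", txt.upper())
    # Step 2: split on '~'; list index 1 corresponds to idx = 1 in the SQL.
    words = delimited.split("~")
    second = words[1]
    # Step 3: count occurrences of that word among all words in the text.
    return second, Counter(words)[second]


line = "Shall I compare thee to a summer’s day? Thou art more lovely and more temperate:"
print(second_word_and_count(line))  # ('I', 1)
```

The SQL version counts occurrences across every row of tbl, while this sketch counts within one text.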

The field's value is:
"- DE-HEB 20% - DTopTen 1.2%"
SELECT ....
TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(DesctosAplicados, 'DE-HEB ', -1), '-', 1)) AS `DE-HEB`,
TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(DesctosAplicados, 'DTopTen ', -1), '-', 1)) AS `DTopTen`
FROM TABLA
Result is:
DE-HEB    DTopTen
20%       1.2%
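The inner call keeps everything after the label and the outer call cuts at the next '-'. A rough Python model of that pair of calls, using a hypothetical substring_index helper that assumes a non-empty, non-overlapping delimiter:

```python
def substring_index(s: str, delim: str, count: int) -> str:
    # Rough model of MySQL SUBSTRING_INDEX for a non-empty, non-overlapping delimiter.
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


def value_after(label: str, field: str) -> str:
    # Inner call: everything after the last occurrence of "<label> ".
    tail = substring_index(field, label + " ", -1)
    # Outer call: everything before the next '-'; strip() drops the padding space.
    return substring_index(tail, "-", 1).strip()


field = "- DE-HEB 20% - DTopTen 1.2%"
print(value_after("DE-HEB", field), value_after("DTopTen", field))  # 20% 1.2%
```

If a label is missing, SUBSTRING_INDEX returns the whole field, so the result becomes whatever precedes the first '-'; values that themselves contain '-' would also be cut short.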

Related

JOIN multiple unions using a loop

I'm facing the following issue:
I have a JSON array such as this:
[{"from":"09:25","to":"14:00"},{"from":"15:05","to":"16:10"},{"from":"17:40","to":"17:50"},{"from":"19:00","to":"19:10"},{"from":"19:30","to":"19:50"}]
And I want to have a MySQL query that returns a row for each of the intervals, containing the 'from' and 'to' as columns.
So far I have tried this:
SELECT
idx,
REPLACE(JSON_EXTRACT(JSON_EXTRACT(json, CONCAT('$[', idx, ']')), CONCAT('$.from')), '"', '') AS 'from',
REPLACE(JSON_EXTRACT(JSON_EXTRACT(json, CONCAT('$[', idx, ']')), CONCAT('$.to')), '"', '') AS 'to',
json
FROM test.json
JOIN (
SELECT 0 AS idx UNION
SELECT 1 AS idx UNION
SELECT 2 AS idx UNION
SELECT 3 AS idx UNION
SELECT 4
) AS indexes
And it does work: I get one row per interval, which is the desired output. The problem is the fixed number of SELECTs in the join.
The issue is that I have to do this:
SELECT 0 AS idx UNION
SELECT 1 AS idx UNION
SELECT 2 AS idx UNION
SELECT 3 AS idx UNION
SELECT 4
to insert as many 'idx' values as there are items in the JSON array. Is there any way to do this with a loop? The number of items is stored in a separate column, 'howmany', in the table that contains the JSON.
This is what the table I extract data from looks like:
I've tried iterating with a while:
declare counter int unsigned default 0;
SELECT
idx,
REPLACE(JSON_EXTRACT(JSON_EXTRACT(json, CONCAT('$[', idx, ']')), CONCAT('$.from')), '"', '') AS 'from',
REPLACE(JSON_EXTRACT(JSON_EXTRACT(json, CONCAT('$[', idx, ']')), CONCAT('$.to')), '"', '') AS 'to',
json
FROM test.json
JOIN (
(while counter < howmany do
SELECT counter AS idx UNION
set counter=counter+1;
end WHILE)
) AS indexes
and it fails. I am 100% certain that the way I tried is not the way to do it, but I am out of ideas.
Edit: I think it's worth mentioning that we cannot use JSON_TABLE, because our MariaDB version is slightly older than the release that introduced it.
Edit2: I'm using Apache XAMPP's MySQL server.
Database client version: libmysql - mysqlnd 5.0.12-dev - 20150407 - $Id: 7cc7cc96e675f6d72e5cf0f267f48e167c2abb23 $
MariaDB version 10.3.32
WITH RECURSIVE
cte AS (
SELECT id,
jsonvalue,
0 num,
JSON_UNQUOTE(JSON_EXTRACT(jsonvalue, CONCAT('$[', 0, '].from'))) `from`,
JSON_UNQUOTE(JSON_EXTRACT(jsonvalue, CONCAT('$[', 0, '].to'))) `to`
FROM test
UNION ALL
SELECT id,
jsonvalue,
1 + num,
JSON_UNQUOTE(JSON_EXTRACT(jsonvalue, CONCAT('$[', 1 + num, '].from'))),
JSON_UNQUOTE(JSON_EXTRACT(jsonvalue, CONCAT('$[', 1 + num, '].to')))
FROM cte
WHERE JSON_EXTRACT(jsonvalue, CONCAT('$[', 1 + num, '].from')) IS NOT NULL
)
SELECT id,
`from`,
`to`
FROM cte;
DEMO
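The recursive CTE starts at index 0 and keeps adding 1 until $[n].from is missing. The same loop in Python, as a sketch for reasoning about the stopping condition; intervals is a hypothetical helper and the sample is the array from the question.

```python
import json
from typing import List, Optional, Tuple

jsonvalue = (
    '[{"from":"09:25","to":"14:00"},{"from":"15:05","to":"16:10"},'
    '{"from":"17:40","to":"17:50"},{"from":"19:00","to":"19:10"},'
    '{"from":"19:30","to":"19:50"}]'
)


def intervals(doc: str) -> List[Tuple[int, Optional[str], Optional[str]]]:
    items = json.loads(doc)
    rows = []
    num = 0
    # Like the recursive member's WHERE: continue while $[num].from exists.
    while num < len(items) and "from" in items[num]:
        rows.append((num, items[num]["from"], items[num].get("to")))
        num += 1
    return rows


for row in intervals(jsonvalue):
    print(row)
```

One difference: the CTE's anchor member always emits index 0, so an empty array yields a row of NULLs there, whereas this loop returns no rows.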

Split values in SQL after a specific character [duplicate]

This question already has answers here:
SQL split values to multiple rows
(12 answers)
Closed 2 years ago.
I have a table with one column :
Val A
Val B
Val C,Val B,Val D
Val A,Val F,Val A
My question: how can I split the values after a specific character, in this case ",", so that I get only one value per row, like this:
Val A
Val B
Val C
Val B
Val D
Val A
Val F
Val A
I don't know if it's important, but I'm using MySQL Workbench.
Thanks in advance.
You can use substring_index(). One method is:
select substring_index(col, ',', 1)
from t
union all
select substring_index(substring_index(col, ',', 2), ',', -1)
from t
where col like '%,%'
union all
select substring_index(substring_index(col, ',', 3), ',', -1)
from t
where col like '%,%,%';
You need to add a separate subquery up to the maximum number of elements in any row.
EDIT:
I don't really like the answers in the duplicate. I would recommend a recursive CTE (the lev < 5 condition stops after five items per row, so raise it if a row can hold more):
with recursive cte as (
select col as part, concat(col, ',') as rest, 0 as lev
from t
union all
select substring_index(rest, ',', 1),
substr(rest, instr(rest, ',') + 1),
lev + 1
from cte
where rest <> '' and lev < 5
)
select part
from cte
where lev > 0;
Here is a db<>fiddle.
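The recursive CTE peels one item off rest per iteration until rest is empty or lev reaches 5. A Python model of those iterations; split_rows and max_levels are hypothetical names, with max_levels standing in for the lev < 5 guard.

```python
from typing import List


def split_rows(values: List[str], max_levels: int = 5) -> List[str]:
    parts = []
    for col in values:
        rest = col + ","  # anchor member: concat(col, ',')
        lev = 0
        # Recursive member: continue while rest <> '' and lev < max_levels.
        while rest != "" and lev < max_levels:
            # part = substring_index(rest, ',', 1); rest = text after the first comma
            part, _, rest = rest.partition(",")
            lev += 1
            parts.append(part)
    return parts


values = ["Val A", "Val B", "Val C,Val B,Val D", "Val A,Val F,Val A"]
print(split_rows(values))
# ['Val A', 'Val B', 'Val C', 'Val B', 'Val D', 'Val A', 'Val F', 'Val A']
```

With six items in one value, only the first five come back, which is why the guard must be raised for longer lists. Also note that the SQL returns its rows in no guaranteed order unless you add an ORDER BY.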

How to fix ORDER BY with item ids?

I have a table containing item IDs. Some examples are:
1
1:3
2:1
2:2
3
3:1
12:2
21:2
I want them sorted in the order listed above.
MySQL sorts them in the following order:
1
1:3
12:2
2:1
2:2
21:2
3
3:1
Does anyone have an idea how to fix this?
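Using SUBSTRING_INDEX() it is possible: sort by the number before the colon, then by the number after it (rows without a colon sort first):
SELECT *
FROM TestTable
ORDER BY CAST(SUBSTRING_INDEX(ColumnVal, ':', 1) AS UNSIGNED),
CASE WHEN LOCATE(':', ColumnVal) = 0 THEN 0
ELSE CAST(SUBSTRING_INDEX(ColumnVal, ':', -1) AS UNSIGNED) END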
Demo on db<>fiddle
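The ordering both queries aim for is "compare the numbers, not the characters". A Python sort key makes that rule explicit; item_key is a hypothetical helper for illustration, and it assumes every part of an ID is a non-negative integer.

```python
from typing import List, Tuple


def item_key(item_id: str) -> Tuple[int, ...]:
    # '12:2' -> (12, 2) and '3' -> (3,); tuples compare element by element,
    # and a shorter tuple that is a prefix of a longer one sorts first.
    return tuple(int(part) for part in item_id.split(":"))


ids: List[str] = ["12:2", "3", "1:3", "21:2", "2:2", "1", "3:1", "2:1"]
ordered = sorted(ids, key=item_key)
print(ordered)  # ['1', '1:3', '2:1', '2:2', '3', '3:1', '12:2', '21:2']
```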
Another way, using POSITION():
SELECT *
FROM TestTable
ORDER BY CAST(SUBSTRING_INDEX(ColumnVal, ':', 1) AS UNSIGNED),
POSITION(':' IN ColumnVal),
CAST(SUBSTRING(ColumnVal, POSITION(':' IN ColumnVal) + 1) AS UNSIGNED)
Demo on db<>fiddle
SELECT _table.*
# , RPAD(SUBSTRING_INDEX(_table._col, ':', 1), 3, 0)
FROM
(
SELECT
CAST('1' AS CHAR) AS _col
UNION
SELECT
'1:3'
UNION
SELECT
'2:1'
UNION
SELECT
'2:2'
UNION
SELECT
'3'
UNION
SELECT
'3:1'
UNION
SELECT
'12:2'
UNION
SELECT
'21:2') _table
ORDER BY LPAD(SUBSTRING_INDEX(_table._col, ':', 1), 3, '0'),
LPAD(SUBSTRING_INDEX(_table._col, ':', 2), 5, '0')
;
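You may use ABS() or CAST() if that is enough for you, as follows (both convert only the leading number, so '1' and '1:3' compare as equal and their relative order is not guaranteed):
SELECT * FROM your_table ORDER BY ABS(your_column);
SELECT * FROM your_table ORDER BY CAST(your_column AS DECIMAL);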

How to split the string in a single column and arrange the same in the same column in SQL

I have a column as below
Products
jeans,oil
jeans,shampoo
I want to split the strings and use it in the same column using SQL. The result I want is
Products count
jeans 2
oil 1
shampoo 1
Could you please guide me in getting this result
Thank you
You are storing CSV data in your SQL table, which is not a good thing. But it looks like you are trying to move away from that, which is a good thing. Here is one option using a union with SUBSTRING_INDEX, assuming each row holds at most two comma-separated items:
SELECT Products, COUNT(*) AS count
FROM
(
SELECT SUBSTRING_INDEX(Products, ',', 1) AS Products FROM yourTable
UNION ALL
SELECT SUBSTRING_INDEX(Products, ',', -1) FROM yourTable
WHERE Products LIKE '%,%'
) t
GROUP BY Products
ORDER BY
count DESC, Products;
Demo
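Client-side, the same split-and-count is a few lines of Python; unlike the two-branch UNION, this sketch handles any number of items per row (products is sample data taken from the question).

```python
from collections import Counter

products = ["jeans,oil", "jeans,shampoo"]

# Split every value on commas (any number of items), trim spaces, then count.
counts = Counter(item.strip() for value in products for item in value.split(","))
print(counts.most_common())  # [('jeans', 2), ('oil', 1), ('shampoo', 1)]
```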
This answer uses SQL Server syntax (CHARINDEX, bracketed aliases). First, split the data into two columns:
SELECT CASE
WHEN products LIKE '%,%' THEN LEFT(products, CHARINDEX(',', products) - 1)
ELSE products
END AS first_item,
CASE
WHEN products LIKE '%,%' THEN RIGHT(products, CHARINDEX(',', REVERSE(products)) - 1)
END AS second_item
FROM YourTable
Then union the two halves from the same table and count each value. The final code will look like this:
SELECT abc, COUNT(*) AS [count] FROM
(
SELECT CASE
WHEN PA_NAME LIKE '%,%' THEN LEFT(PA_NAME, CHARINDEX(',', PA_NAME) - 1)
ELSE PA_NAME
END AS [abc]
FROM phparty
UNION ALL
SELECT RIGHT(PA_NAME, CHARINDEX(',', REVERSE(PA_NAME)) - 1) AS [abc]
FROM phparty
WHERE PA_NAME LIKE '%,%'
) t GROUP BY abc
Replace PA_NAME and phparty with your own column and table names.

How to replace non-numeric characters in a string in MySQL

I need a better way to replace non-numeric characters in a string.
I have phone numbers like so
(888) 488-6655
888-555-8888
blah blah blah
I can return a clean string with nested REPLACE() calls, but I am looking for a better way, maybe a regular expression function, to remove any non-numeric character: space, slash, backslash, quote, and so on.
this is my current query
SELECT
a.account_id,
REPLACE(REPLACE(REPLACE(REPLACE(t.phone_number, '-', ''), ' ', ''), ')', ''),'(','') AS contact_number,
IFNULL(t.ext, '') AS extention,
CASE WHEN EXISTS (SELECT number_id FROM contact_numbers WHERE main_number = 1 AND account_id = a.account_id) THEN 0 ELSE 1 END AS main_number,
'2' AS created_by
FROM cvsnumbers t
INNER JOIN accounts a ON a.company_code = t.company_code
WHERE REPLACE(REPLACE(REPLACE(REPLACE(t.phone_number, '-', ''), ' ', ''), ')', ''),'(','') NOT IN(SELECT contact_number FROM contact_numbers WHERE account_id = a.account_id)
AND LENGTH(REPLACE(REPLACE(REPLACE(REPLACE(t.phone_number, '-', ''), ' ', ''), ')', ''),'(','') ) = 10
How can I change my query to use a regex to replace non-numeric values?
Thanks
This is a brute force approach.
The idea is to build a numbers table that indexes each character position in the phone number, keep the character only if it is a digit, and then concatenate the kept digits back together. Here is how it would work:
select t.phone_number,
group_concat(SUBSTRING(t.phone_number, n.n, 1) order by n.n separator ''
) as NumbersOnly
from cvsnumbers t cross join
(select 1 as n union all select 2 union all select 3
) n
where SUBSTRING(t.phone_number, n.n, 1) between '0' and '9'
group by t.phone_number;
This example only looks at the first 3 characters of the number. You would extend the subquery for n up to the maximum length of a phone number.
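The numbers-table idea can be modeled in Python to see what each clause contributes; digits_only and max_len are hypothetical names, and max_len plays the role of the largest n in the numbers subquery.

```python
def digits_only(phone: str, max_len: int = 20) -> str:
    """Model of the numbers-table query for one phone value."""
    kept = []
    # n plays the role of the numbers table: one value per character position (1-based).
    for n in range(1, max_len + 1):
        ch = phone[n - 1 : n]  # SUBSTRING(phone, n, 1); '' once n is past the end
        if "0" <= ch <= "9":   # WHERE ... BETWEEN '0' AND '9' (an empty string fails this)
            kept.append(ch)
    return "".join(kept)       # GROUP_CONCAT(... ORDER BY n SEPARATOR '')


print(digits_only("(888) 488-6655"))  # 8884886655
```

With max_len=3 only the first three characters are examined, matching the caveat above. One difference: for a value with no digits, the SQL query returns no row at all (every row is filtered out before grouping), whereas this function returns an empty string.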
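I don't know the MySQL regex flavour in depth, but I would give this a go (plain REPLACE() only matches a literal string, so this needs REGEXP_REPLACE(), available in MySQL 8.0.4+ and MariaDB 10.0.5+):
REGEXP_REPLACE(t.phone_number, '[^0-9]+', '')
[^0-9]+ means: 'Match one or more consecutive characters that are not a digit'.
If you write it as [^\d]+ instead, you might need to escape the backslash ([^\\d]+).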
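For client-side cleanup, or to try the character class before using it in the database, the same pattern works with Python's re module. This is a sketch; clean_phone is a hypothetical helper, and [^0-9] is used instead of \d because Python's \d also matches non-ASCII digits.

```python
import re

# Assumption: only ASCII digits count; Python's \d would also match other Unicode digits.
NON_DIGITS = re.compile(r"[^0-9]+")


def clean_phone(raw: str) -> str:
    # Remove every run of non-digit characters.
    return NON_DIGITS.sub("", raw)


raw_numbers = ["(888) 488-6655", "888-555-8888", "blah blah blah"]
# Keep only 10-digit results, like the LENGTH(...) = 10 filter in the question.
cleaned = [c for c in map(clean_phone, raw_numbers) if len(c) == 10]
print(cleaned)  # ['8884886655', '8885558888']
```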