What is the exact equivalent of MySQL's SUBSTRING_INDEX() in Snowflake?
I found SPLIT_PART() in Snowflake, but it does not behave exactly like SUBSTRING_INDEX().
For example,
SUBSTRING_INDEX('www.abc.com', '.', 2); returns www.abc,
i.e. everything to the left of the 2nd '.' delimiter,
but
SPLIT_PART('www.abc.com', '.', 2); returns abc,
because it splits the string first and then returns only the requested part.
How can I reproduce MySQL's SUBSTRING_INDEX() behavior in Snowflake?
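The difference is easiest to see with a small Python model of the two functions. This is a sketch, not either vendor's implementation. `substring_index` and `split_part` below are hypothetical helpers that mimic MySQL SUBSTRING_INDEX and Snowflake SPLIT_PART as documented, assuming a non-empty delimiter.

```python
def substring_index(s, delim, count):
    """Model of MySQL SUBSTRING_INDEX(s, delim, count)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter, counted from the left.
        return delim.join(parts[:count])
    # Everything to the right of the |count|-th delimiter, counted from the right.
    return delim.join(parts[count:])


def split_part(s, delim, part):
    """Model of Snowflake SPLIT_PART(s, delim, part)."""
    parts = s.split(delim)
    if part == 0:
        part = 1  # part 0 is treated as part 1
    idx = part - 1 if part > 0 else part  # negative parts count from the end
    if -len(parts) <= idx < len(parts):
        return parts[idx]
    return ""  # out-of-range part numbers give an empty string


print(substring_index("www.abc.com", ".", 2))  # www.abc
print(split_part("www.abc.com", ".", 2))       # abc
print(substring_index("www.abc.com", ".", 5))  # www.abc.com
```

SUBSTRING_INDEX returns a prefix or suffix that can span several parts, while SPLIT_PART always returns exactly one part. That is why changing arguments alone cannot make the two equivalent.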
A similar effect can be achieved with array operations:
SELECT s.c, ARRAY_TO_STRING(ARRAY_SLICE(STRTOK_TO_ARRAY(s.c, '.'), 0, 2), '.')
FROM (VALUES ('www.abc.com')) AS s(c);
How does it work?
STRTOK_TO_ARRAY - splits the string into an array of tokens
ARRAY_SLICE - takes the elements from index 0 up to, but not including, index n
ARRAY_TO_STRING - converts the array back into a string, using '.' as the delimiter
In steps:
SELECT
s.c,
STRTOK_TO_ARRAY(s.c, '.') AS arr,
ARRAY_SLICE(arr, 0, 2) AS slice,
ARRAY_TO_STRING(slice, '.') AS result
FROM (VALUES ('www.abc.com')) AS s(c);
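The same three steps can be modeled in Python. The model shows one subtle difference from MySQL: STRTOK_TO_ARRAY treats its second argument as a set of delimiter characters and drops empty tokens, so consecutive dots collapse. The helpers below are hypothetical Python stand-ins for the Snowflake functions, written only for illustration.

```python
def strtok_to_array(s, delimiters=" "):
    """Model of Snowflake STRTOK_TO_ARRAY: any char in `delimiters` splits; empty tokens are dropped."""
    tokens, current = [], []
    for ch in s:
        if ch in delimiters:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def array_slice(arr, start, stop):
    """Model of ARRAY_SLICE: 0-based positions, `stop` is exclusive."""
    return arr[start:stop]


def array_to_string(arr, sep):
    """Model of ARRAY_TO_STRING."""
    return sep.join(arr)


def snowflake_prefix(s, n):
    # ARRAY_TO_STRING(ARRAY_SLICE(STRTOK_TO_ARRAY(s, '.'), 0, n), '.')
    return array_to_string(array_slice(strtok_to_array(s, "."), 0, n), ".")


def mysql_prefix(s, n):
    # SUBSTRING_INDEX(s, '.', n) for n > 0: empty parts are kept.
    return ".".join(s.split(".")[:n])


print(snowflake_prefix("www.abc.com", 2))   # www.abc
print(snowflake_prefix("www..abc.com", 2))  # www.abc
print(mysql_prefix("www..abc.com", 2))      # www.
```

If your data can contain empty segments, check the array approach against SUBSTRING_INDEX's output before relying on it.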
You may use REGEXP_SUBSTR here:
SELECT REGEXP_SUBSTR('www.abc.com', '^[^.]+[.][^.]+');
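For comparison, here is the same pattern in Python's re module. It only illustrates the regex, and `first_two_parts` is a hypothetical helper. One behavioral difference is worth knowing: REGEXP_SUBSTR returns NULL when the pattern does not match, whereas SUBSTRING_INDEX returns the whole string when there are fewer delimiters than requested. The helper mirrors the NULL case by returning None.

```python
import re

# [.] matches a literal dot without needing a backslash escape.
FIRST_TWO = re.compile(r"^[^.]+[.][^.]+")


def first_two_parts(s):
    """Return the leading 'x.y' part of s, or None when there is no such part."""
    m = FIRST_TWO.search(s)
    return m.group(0) if m else None


print(first_two_parts("www.abc.com"))  # www.abc
print(first_two_parts("localhost"))    # None
```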
MySQL's SUBSTRING_INDEX function returns the entire string if the delimiter isn't found or if the requested count is greater than the number of occurrences. Assuming you want to preserve that behavior, and that it would also help to be able to extract non-contiguous parts of the string, consider this approach:
with cte as (select 'www.abc.com' as txt)
select a.txt, listagg(b.value,'.') within group (order by b.index)
from cte a, lateral split_to_table(a.txt, '.') b
where b.index <= 2 -- or e.g. b.index in (1,3) to get 'www.com'
group by a.txt;
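Here is a Python model of what the query computes, to make the row-level logic explicit. SPLIT_TO_TABLE produces one row per part with a 1-based INDEX, the WHERE clause filters those rows, and LISTAGG ... WITHIN GROUP (ORDER BY index) joins the remaining parts back together. `pick_parts` is a hypothetical helper, and its `keep` predicate stands in for the WHERE clause.

```python
def pick_parts(txt, delim, keep):
    # One (index, value) pair per part, like SPLIT_TO_TABLE's INDEX and VALUE columns.
    rows = enumerate(txt.split(delim), start=1)
    # Apply the WHERE clause, then join in INDEX order like LISTAGG.
    return delim.join(value for index, value in rows if keep(index))


print(pick_parts("www.abc.com", ".", lambda i: i <= 2))       # www.abc
print(pick_parts("www.abc.com", ".", lambda i: i in (1, 3)))  # www.com
print(pick_parts("localhost", ".", lambda i: i <= 2))         # localhost
```

Part 1 always passes `index <= n` for n >= 1, so a string with no delimiter comes back whole, matching SUBSTRING_INDEX.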
Related
I have a column named 'ConcatValue' that holds the details of an order line. An example value in this column is:
573856014/100/M00558640/OrderQty12
I want to extract the order quantity, which follows 'OrderQty'. I thought I had a solution with the following expression:
substr(ConcatValue, char_length(ConcatValue) - 1, char_length(ConcatValue))
This returns only the last 2 characters of ConcatValue. For the value above I get '12', which is the desired result.
But when the order quantity is below 10, for example with this ConcatValue:
573856014/100/M00558640/OrderQty3
I get the following result: y3
My question: is there a way to remove the 'y' when the value contains one, or to replace it with a 0? Or is there a way to select only the trailing digits of the ConcatValue string?
Use string functions.
With substring_index() you can get the last part of the string and with replace() remove 'OrderQty':
select replace(
substring_index(ConcatValue, '/', -1),
'OrderQty',
''
)
from tablename
Actually, the simplest method is:
select substring_index(ConcatValue, 'OrderQty', -1)
from tablename
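To see why the original expression fails and the SUBSTRING_INDEX versions work, here is a Python model of MySQL's 1-based SUBSTR and of SUBSTRING_INDEX, applied to both sample values. Both are hypothetical helpers and assume a non-empty delimiter.

```python
def substr(s, pos, length):
    """Model of MySQL SUBSTR(s, pos, length) for pos >= 1."""
    return s[pos - 1:pos - 1 + length]


def substring_index(s, delim, count):
    """Model of MySQL SUBSTRING_INDEX(s, delim, count)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


for value in ("573856014/100/M00558640/OrderQty12",
              "573856014/100/M00558640/OrderQty3"):
    last_two = substr(value, len(value) - 1, len(value))  # always the last 2 chars
    via_replace = substring_index(value, "/", -1).replace("OrderQty", "")
    simplest = substring_index(value, "OrderQty", -1)
    print(last_two, via_replace, simplest)
# 12 12 12
# y3 3 3
```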
I have a table with a varchar column that represents a path. I want to find rows whose path follows a pattern like name.name[*], where name can be anything. That is, I am looking for a repeated string, anywhere in the path column, whose two copies are separated by a period and followed by a square bracket.
This seems to call for a regular expression; in Python I have something like https://regex101.com/r/apS20a/4
However, implementing this with MySQL REGEXP is not working. I translated the pattern into REGEXP '([A-Za-z_]+).(\1[[0-9]+])', but it seems that MySQL regular expressions do not support capture groups. Is there a way to accomplish this with MySQL REGEXP?
I don't think MySQL supports capture groups. But if the string contains only one occurrence of .name[, where name is the text between the first [ and the . just before it, you can hack your way around it. This is not a general solution, just a specific approach for this case.
You can get the name with:
select substring_index(substring_index(url, '[', 1), '.', -1) as name
And then incorporate it into a pattern match:
select t.*
from (select t.*,
substring_index(substring_index(url, '[', 1), '.', -1) as name
from t
) t
where url like concat('%', name, '.', name, '[%');
This uses LIKE instead of REGEXP because [ and . are regular-expression metacharacters. Of course, this assumes that name does not contain _ or %.
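The workaround can be checked in Python, where a regex with a back-reference is available for comparison. `name_before_bracket` and `has_repeat` are hypothetical helpers. The first mirrors the nested SUBSTRING_INDEX call, and the second mirrors the LIKE test as plain substring containment. The back-reference pattern is a simplified version of the question's, written for this comparison. The last sample shows the stated limitation: only the name before the first [ is examined.

```python
import re


def substring_index(s, delim, count):
    """Model of MySQL SUBSTRING_INDEX(s, delim, count)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


def name_before_bracket(url):
    # substring_index(substring_index(url, '[', 1), '.', -1)
    return substring_index(substring_index(url, "[", 1), ".", -1)


def has_repeat(url):
    # url LIKE concat('%', name, '.', name, '[%'), assuming name has no _ or %
    name = name_before_bracket(url)
    return f"{name}.{name}[" in url


# Back-reference version: a name, a dot, the same name, then "[".
BACKREF = re.compile(r"([A-Za-z_]+)\.\1\[")

for url in ("abcde.de[12345", "foo.bar[1]", "a.x[1].b.b[2]"):
    print(url, has_repeat(url), bool(BACKREF.search(url)))
# abcde.de[12345 True True
# foo.bar[1] False False
# a.x[1].b.b[2] False True
```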
EDIT:
Here is a method that actually identifies when this occurs, and works even if there are multiple patterns.
The idea is to construct the regular expression from whatever appears between each . and the following [, and then to apply it. Delightfully self-referential:
select t.*,
(url regexp regex)
from (select t.*,
substr(regexp_replace(url, '[^.]*[.]([^\\[]*)\\[[^.]*', '|$1[.]$1\\\\['), 2) as regex
from (select 'abcde.de[12345.345[ABC' as url union all
select 'abcdefdef[[[[..123.124['
) t
) t;
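The regex-building step can be replayed with Python's re module to see what it produces. This is an approximation. MySQL 8 uses the ICU engine and writes group references as $1 in replacements, while Python writes \1, but for these particular patterns the behavior should be the same. `build_regex` is a hypothetical helper that mirrors the inner SELECT.

```python
import re

# The SQL pattern after string-literal unescaping: [^.]*[.]([^\[]*)\[[^.]*
SEGMENT = re.compile(r"[^.]*[.]([^\[]*)\[[^.]*")


def build_regex(url):
    # Replace each ".name[" segment with "|name[.]name\[", then drop the
    # leading "|" (the SQL does this with substr(..., 2)).
    return SEGMENT.sub(r"|\1[.]\1\\[", url)[1:]


for url in ("abcde.de[12345.345[ABC", "abcdefdef[[[[..123.124["):
    regex = build_regex(url)
    print(regex, bool(re.search(regex, url)))
# de[.]de\[|345[.]345\[ True
# .123.124[.].123.124\[ False
```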
I need to find the first and second "_" and extract whatever is between them.
Example data:
doc_856_abc_123
doc_876_xyz_999
So far I have the following SUBSTRING_INDEX query, but I need help:
select SUBSTRING_INDEX( column, '_', 2 )
It outputs:
doc_856
doc_876
How do I combine the above query with another substring call to get the desired results, which would be:
856
876
Just apply SUBSTRING_INDEX again to the resulting string, this time keeping the part after the last "_":
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(column, '_', 2 ), '_', -1)
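Here is a quick Python model of the nested call, using a hypothetical `substring_index` helper that mimics MySQL's semantics. The inner call keeps everything before the 2nd "_", and the outer call keeps everything after the last "_" of that result.

```python
def substring_index(s, delim, count):
    """Model of MySQL SUBSTRING_INDEX(s, delim, count)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


for value in ("doc_856_abc_123", "doc_876_xyz_999"):
    inner = substring_index(value, "_", 2)   # doc_856, doc_876
    print(substring_index(inner, "_", -1))  # 856, 876
```

Changing the inner count from 2 to 3 would extract the third field (abc, xyz) the same way.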
I have strings which have a JSON-like format, including:
..."id":"500", ..., "id":"600", ...
I need to parse the second id out of the column. I found lots of answers using substring_index; however, I need the string after the 2nd (of potentially n) occurrences, not the string before it, in order to parse out the ID.
Is there a nice solution?
To find the substring of a column "some_column" that occurs after the nth occurrence of a target string:
SELECT
SUBSTRING(some_column, CHAR_LENGTH(SUBSTRING_INDEX(some_column, <target_string>, <n>)) + <length of target string + 1>)
FROM some_table
-- or if you want to limit the length of your returned substring...
SELECT
SUBSTRING(some_column, CHAR_LENGTH(SUBSTRING_INDEX(some_column, <target_string>, <n>)) + <length of target string + 1>, <desired length>)
FROM some_table
For this question, the form would be:
SELECT
SUBSTRING(col, CHAR_LENGTH(SUBSTRING_INDEX(col, '"id":"', 2)) + 7)
FROM `table`
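The position arithmetic is easy to get off by one, so here is a Python model of it using hypothetical helpers. MySQL's SUBSTRING is 1-based, which is where the "+ 1" comes from: for '"id":"' (6 characters) the offset is 6 + 1 = 7. `after_nth` returns what the query returns. Wrapping it in one more SUBSTRING_INDEX on '"' isolates the id itself. The sample string is invented for illustration.

```python
def substring_index(s, delim, count):
    """Model of MySQL SUBSTRING_INDEX(s, delim, count)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


def after_nth(s, target, n):
    # SUBSTRING(s, CHAR_LENGTH(SUBSTRING_INDEX(s, target, n)) + len(target) + 1)
    pos = len(substring_index(s, target, n)) + len(target) + 1  # 1-based
    return s[pos - 1:]


col = '{"id":"500", "name":"a", "id":"600", "name":"b"}'
rest = after_nth(col, '"id":"', 2)
print(rest)                           # 600", "name":"b"}
print(substring_index(rest, '"', 1))  # 600
```

If there are fewer than n occurrences, SUBSTRING_INDEX returns the whole string and the result is an empty string.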
For now I have:
SELECT substring_index(
substr(col, locate('"id":"', col, locate('"id":"', col) + 6) + 6),
'"',
1)
FROM `table`
Would love to see a "nicer" answer :-)
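Here is the LOCATE-based version modeled in Python with hypothetical helpers. MySQL's LOCATE returns a 1-based position, or 0 when nothing is found. The model shows that the nested LOCATE calls do find the second id. It also exposes a caveat: when there is no second occurrence, the inner LOCATE returns 0, and the expression silently returns a fragment instead of NULL.

```python
def locate(sub, s, pos=1):
    """Model of MySQL LOCATE(sub, s, pos): 1-based index, 0 if not found."""
    return s.find(sub, pos - 1) + 1


def substr(s, pos):
    """Model of MySQL SUBSTR(s, pos) for pos >= 1."""
    return s[pos - 1:]


def substring_index(s, delim, count):
    """Model of MySQL SUBSTRING_INDEX(s, delim, count)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


def second_id(col):
    t = '"id":"'
    start = locate(t, col, locate(t, col) + 6) + 6
    return substring_index(substr(col, start), '"', 1)


print(second_id('{"id":"500", "id":"600"}'))  # 600
print(second_id('{"id":"500"}'))              # : (no second id, stray fragment)
```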
In Snowflake, this can be done as follows:
select
split_part([field_name], '{separator}', {n-counter})
from
[table]
Note: [field_name], [table], {separator} and {n-counter} are placeholders for user-supplied values. Snowflake requires single quotes around {separator}, since it is a string literal.
I'd like to extract the number between NUMBER= and ;. So far I can extract everything from the number onward, but I don't want anything after the number. E.g.,
SELECT
SUBSTRING(field, LOCATE('NUMBER=', field) + 7)
FROM table
Data field:
DATA:PASS=X12;NUMBER=331;FIELD=1
DATA:PASS=X12;NUMBER=2;FOO=BAR;FIELD=1
Desired Output:
331
2
You can use a combination of SUBSTRING_INDEX functions:
SELECT
SUBSTRING_INDEX(
SUBSTRING_INDEX(field, 'NUMBER=', -1),
';',
1)
FROM
tablename
The inner SUBSTRING_INDEX returns everything after the last NUMBER= in the string, and the outer one returns everything before the first ; in that result.
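Here is the same nesting modeled in Python, using a hypothetical `substring_index` helper that mimics MySQL's semantics. The last sample is invented for illustration and shows a caveat. If a row has no NUMBER= at all, the inner call returns the whole string, and the outer call then returns everything before the first ;.

```python
def substring_index(s, delim, count):
    """Model of MySQL SUBSTRING_INDEX(s, delim, count)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


def extract_number(field):
    # SUBSTRING_INDEX(SUBSTRING_INDEX(field, 'NUMBER=', -1), ';', 1)
    return substring_index(substring_index(field, "NUMBER=", -1), ";", 1)


for field in ("DATA:PASS=X12;NUMBER=331;FIELD=1",
              "DATA:PASS=X12;NUMBER=2;FOO=BAR;FIELD=1",
              "DATA:PASS=X12;FIELD=1"):
    print(extract_number(field))
# 331
# 2
# DATA:PASS=X12
```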