Parse text using substring in mysql - mysql

I want to parse a text using substring. The format we have for the text is like this:
N, Adele, A, 18
And the substring we do is like this:
SUBSTRING_INDEX(SUBSTRING_INDEX(text, ',', 2), ', ', -1) as 'Name',
SUBSTRING_INDEX(SUBSTRING_INDEX(text, ',', 4), ', ', -1) as 'Age',
The output we get is:
| Name | Age |
| Adele | 18 |
But we want to change the text format to:
N Adele, A 18
What would be the correct syntax so that I can parse the text in position 1 (N Adele), use a space as the delimiter, and get just Adele? And then the same for the next text (A 18)?
I tried doing
SUBSTRING_INDEX(SUBSTRING_INDEX(text, ' ', 1), ', ', -1) as 'Name',
But the output I got is just
| Name |
| N |
The output I was hoping for is like this:
| Name |
| Adele |

I presume that you want to change your original data format and still be able to get the same results out. The new format is:
N Adele, A 18 -- etc
With the potential to have multiple names as the name (space separated), my previous example is not correct.
You could trim off the N and A directly with their space, knowing that they will only ever be two characters long and that they will always be there, like this:
SUBSTRING(TRIM(SUBSTRING_INDEX(`text`, ',', 1)), 3) AS 'Name',
SUBSTRING(TRIM(SUBSTRING_INDEX(`text`, ',', -1)), 3) AS 'Age'
To get:
Name | Age
--------------------
Adele | 18
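For reference, the attempt in the question returns N because SUBSTRING_INDEX(text, ' ', 1) keeps everything before the first space. Below is a minimal sketch of the fixed-prefix approach, run against inline sample rows instead of a real table. The second row, with a two-word name, is made up for illustration.

```sql
-- Inline sample rows; 'N Mary Ann, A 42' is a hypothetical value used
-- only to show that names containing spaces survive intact.
SELECT
  -- everything before the first comma, minus the 2-character 'N ' prefix
  SUBSTRING(TRIM(SUBSTRING_INDEX(t.`text`, ',', 1)), 3)  AS `Name`,
  -- everything after the last comma, trimmed, minus the 'A ' prefix
  SUBSTRING(TRIM(SUBSTRING_INDEX(t.`text`, ',', -1)), 3) AS `Age`
FROM (
  SELECT 'N Adele, A 18' AS `text`
  UNION ALL
  SELECT 'N Mary Ann, A 42'
) AS t;
```

This should return Adele / 18 and Mary Ann / 42. It relies on every field starting with exactly one descriptor letter followed by one space.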

You can use
SELECT
-- start at position 3 to skip the leading 'N ' and stop just before the comma
SUBSTRING(text, 3, INSTR(text, ',') - 3) AS Name,
-- skip the comma, the space, the 'A' and the following space
SUBSTRING(text, INSTR(text, ',') + 4) AS Age
FROM your_table;
as the positions of the field descriptors (N and A) are fixed (relative to the start of the string and to the comma). You can check the working query in this fiddle.

Related

Splitting string based on only a specific delimiter

I'm trying to split a field (at some delimiter ';') and insert the results into a table.
There are at most 5 substrings delimited by ';', so there will never be more than 5 fruits.
Given only the fruits column, how can I split the string to get the separate fruits? If there are fewer than 5 fruits, the remaining columns should be null.
| fruits | fruit1 | fruit2 | fruit3 | fruit4 | fruit5 |
| apple; orange; banana | apple | orange | banana | -null- | -null- |
| apple; orange; pine-apple; dragon-fruit; banana | apple | orange | pine-apple | dragon-fruit | banana |
| pear/grape ; orange; banana; strawberry | pear/grape | orange | banana | strawberry | -null- |
| apple; blueberry; kiwi/lemon | apple | blueberry | kiwi/lemon | -null- | -null- |
I first created the new columns and set them all to null.
I have tried the following code, but it does not work: if there are fewer fruits than columns, the remaining columns take the value of the last fruit instead of null.
SELECT
fruits,
SUBSTRING_INDEX(fruits, ';', 1) AS 'fruit1',
CASE
WHEN LOCATE(';', fruits, LENGTH(fruit1)+1) = 0 THEN NULL
ELSE SUBSTRING_INDEX(SUBSTRING_INDEX(fruits, ';', 2), ';', -1)
END AS 'fruit2',
CASE
WHEN LOCATE(';', fruits, LENGTH(fruit1)+LENGTH(fruit2)+1) = 0 THEN NULL
WHEN LOCATE(';', fruits, (LOCATE(';', fruits, LENGTH(fruit1)) + 2)) = 0 THEN NULL
ELSE SUBSTRING_INDEX(SUBSTRING_INDEX(fruits, ';', 3), ';', -1)
END AS 'fruit3',
CASE
WHEN LOCATE(';', fruits, LENGTH(fruit1) + LENGTH(fruit2) + LENGTH(fruit3) + 3) = 0 THEN NULL
WHEN LOCATE(';', fruits, (LOCATE(';', fruits, LENGTH(fruit1) + LENGTH(fruit2) + LENGTH(fruit3)+2) + 1)) = 0 THEN NULL
ELSE SUBSTRING_INDEX(SUBSTRING_INDEX(fruits, ';', 4), ';', -1)
END AS 'fruit4'
FROM TABLENAME;
Is there a better way to split the string?
In MySQL 5.7 and 8.0, JSON functions are now supported. You could do some string-manipulation to turn this:
apple; orange; banana
into this:
["apple", "orange", "banana"]
Then use JSON functions to extract a specific array element by position.
mysql> set @s = 'apple; orange; banana';
mysql> select cast(concat('["', replace(@s, '; ', '","'), '"]') as json) as array;
+-------------------------------+
| array |
+-------------------------------+
| ["apple", "orange", "banana"] |
+-------------------------------+
mysql> select json_unquote(json_extract(
cast(concat('["', replace(@s, '; ', '","'), '"]') as json),
'$[1]')) as element;
+---------+
| element |
+---------+
| orange |
+---------+
Then you can extract '$[2]' or '$[3]' or any other element. You could use the ->> shortcut for extract-and-unquote.
SELECT
fruits,
fruits->>'$[0]' AS `fruit1`,
fruits->>'$[1]' AS `fruit2`,
fruits->>'$[2]' AS `fruit3`,
fruits->>'$[3]' AS `fruit4`
FROM (
SELECT CAST(CONCAT('["', REPLACE(fruits, '; ', '","'), '"]') AS JSON) AS fruits
FROM TABLENAME
) AS f;
You might consider storing the list as a JSON column, instead of your current semicolon-separated string format.
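Here is a sketch of how the JSON approach yields NULL for missing positions, assuming MySQL 5.7.8 or later. It splits on ';' alone and trims each element, because the sample data has inconsistent spacing around the delimiter (pear/grape ; orange). The inline rows stand in for the real table, and the alias arr is only illustrative. Values containing double quotes or backslashes would break the generated JSON and would need escaping first.

```sql
SELECT
  f.fruits,
  -- JSON_EXTRACT returns NULL for an index past the end of the array,
  -- and JSON_UNQUOTE / TRIM pass that NULL through unchanged.
  TRIM(JSON_UNQUOTE(JSON_EXTRACT(f.arr, '$[0]'))) AS fruit1,
  TRIM(JSON_UNQUOTE(JSON_EXTRACT(f.arr, '$[1]'))) AS fruit2,
  TRIM(JSON_UNQUOTE(JSON_EXTRACT(f.arr, '$[2]'))) AS fruit3,
  TRIM(JSON_UNQUOTE(JSON_EXTRACT(f.arr, '$[3]'))) AS fruit4,
  TRIM(JSON_UNQUOTE(JSON_EXTRACT(f.arr, '$[4]'))) AS fruit5
FROM (
  SELECT
    s.fruits,
    -- 'a; b' becomes '["a"," b"]'; the stray spaces are trimmed above
    CAST(CONCAT('["', REPLACE(s.fruits, ';', '","'), '"]') AS JSON) AS arr
  FROM (
    SELECT 'apple; orange; banana' AS fruits
    UNION ALL
    SELECT 'pear/grape ; orange; banana; strawberry'
  ) AS s
) AS f;
```

The first row should come back with fruit4 and fruit5 as NULL, and the second with only fruit5 as NULL.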

How To Substring and Add a Range

I have a URL that needs to be shortened. There are 2 formats: the first, like /item/10/0100-, stops at the first -; the second, like /item/12/0100-CAK, keeps 3 more characters after the -.
Below are some examples:
/item/10/0100-NAU1X010-10-A032 needs to be /item/10/0100-
/item/2/0888-ADBACS11101-2-A048 needs to be /item/2/0888-
/item/12/0100-CAK101827812018101-12-A034 needs to be /item/12/0100-CAK
/item/3/0110-MSS0016-T03-3-A034 needs to be /item/3/0110-MSS
I already tried this query:
CASE
WHEN Page LIKE "/item/10%" OR Page LIKE "/item/2/%" THEN CONCAT(SUBSTRING_INDEX(SUBSTR(Page, LOCATE('/', Page)+1), '-', 1), "-")
WHEN Page LIKE "/item/12%" OR Page LIKE "/item/3/%" THEN SUBSTRING_INDEX(SUBSTR(Page, LOCATE('/', Page)+1), '-', 1) + 4
ELSE Page
END
But it doesn't give me the right result. It seems simple but I really can't get over it. Please help me with this problem, thank you.
Use string functions in the CASE expression like this:
SELECT
page,
CASE
WHEN Page LIKE '/item/10%' OR Page LIKE '/item/2/%' THEN
CONCAT(SUBSTRING_INDEX(Page, '-', 1), '-')
WHEN Page LIKE '/item/12%' OR Page LIKE '/item/3/%' THEN
CONCAT(SUBSTRING_INDEX(Page, '-', 1), SUBSTR(Page, LOCATE('-', Page), 4))
ELSE Page
END short_Page
FROM tablename
See the demo.
Results:
> page | short_Page
> :--------------------------------------- | :----------------
> /item/10/0100-NAU1X010-10-A032 | /item/10/0100-
> /item/2/0888-ADBACS11101-2-A048 | /item/2/0888-
> /item/12/0100-CAK101827812018101-12-A034 | /item/12/0100-CAK
> /item/3/0110-MSS0016-T03-3-A034 | /item/3/0110-MSS
SELECT Path,
SUBSTRING(Path FROM 1 FOR LOCATE('-', Path) + 3 * (@format = 2)) AS shortened
FROM test;
The format is differentiated by the number after /item/
SELECT Path,
SUBSTRING(Path FROM 1 FOR LOCATE('-', Path) + 3 * (SUBSTRING_INDEX(SUBSTRING_INDEX(Path, '/', 3), '/', -1) IN (12, 3))) AS shortened,
(SUBSTRING_INDEX(SUBSTRING_INDEX(Path, '/', 3), '/', -1) IN (12, 3)) + 1 used_format
FROM test;
Adjust the values list for format 2.
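The query works because MySQL evaluates a comparison to 1 (true) or 0 (false), so multiplying it by 3 adds the three extra characters only for the second format. Here is a small sketch on two of the sample paths from the question, using LEFT(str, len) as an equivalent of SUBSTRING(str FROM 1 FOR len). The alias item_number is only illustrative.

```sql
SELECT
  t.Path,
  -- third '/'-separated piece, i.e. the number right after /item/
  SUBSTRING_INDEX(SUBSTRING_INDEX(t.Path, '/', 3), '/', -1) AS item_number,
  LEFT(
    t.Path,
    LOCATE('-', t.Path)  -- keep everything up to and including the first '-'
      + 3 * (SUBSTRING_INDEX(SUBSTRING_INDEX(t.Path, '/', 3), '/', -1) IN ('12', '3'))
  ) AS shortened
FROM (
  SELECT '/item/10/0100-NAU1X010-10-A032' AS Path
  UNION ALL
  SELECT '/item/12/0100-CAK101827812018101-12-A034'
) AS t;
```

This should give /item/10/0100- for the first row and /item/12/0100-CAK for the second.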

Apostrophe at beginning/end of search string not treated as part of a word by RegEx

We run a dictionary and have run into a problem with searches that contain an apostrophe at the start of a search string. In English words like 'twas are quite rare but in the language we're dealing with, ' is considered a word character and extremely common at the start of a phrase (for instance 's) and also at the end of words (for instance a').
Oddly enough, RegEx searches don't seem to struggle with this if it's in the middle (for example air a' bhòrd gets all the desired results) but ' at beginning or end of a search string is not treated as part of a word by RegEx.
We've ascertained this is part of the RegEx specification (only alphanumeric characters and _ are treated as part of a word) but we're wondering if it is possible to write a RegEx expression that also treats apostrophes as part of a word?
This is what we're currently getting:
-- Demonstration on MySQL 5.6.21 Community
Select ('cat''s' REGEXP CONCAT('[[:<:]]', 'cat''s', '[[:>:]]'));
-- returns 1
Select ('''cat''s' REGEXP CONCAT('[[:<:]]' ,'''cat''s' ,'[[:>:]]' ));
-- returns 0
Select ('_cat''s' REGEXP CONCAT('[[:<:]]' ,'_cat''s' ,'[[:>:]]' ));
-- returns 1
Select ('-cat''s' REGEXP CONCAT('[[:<:]]' ,'-cat''s' ,'[[:>:]]' ));
-- returns 0
Select (' cat''s' REGEXP CONCAT('[[:<:]]' ,' cat''s' ,'[[:>:]]' ));
-- returns 0
Select ('cat''' REGEXP CONCAT('[[:<:]]' ,'cat''' ,'[[:>:]]' ));
-- returns 0
Any suggestions greatly welcomed :)
I think that you should provide your own definition of what a word character is, instead of relying on the default word boundaries ([[:<:]], [[:>:]]). From the MySQL 5.6 documentation:
A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
That means the beginning of a word can be expressed as: '^|[^[:alnum:]_]'
^ -- the beginning of the string
| -- OR
[^ -- any character OTHER than
[:alnum:] -- an alphanumeric character
_ -- an underscore
]
And the end of a word would be: '[^[:alnum:]_]|$', where $ represents the end of the string.
You could just modify this to add the single quote in the character class, like :
beginning : '^|[^[:alnum:]_'']'
end : '[^[:alnum:]_'']|$'
Here is your regex :
SELECT (val REGEXP CONCAT('(^|[^[:alnum:]_''])', 'cat''s', '([^[:alnum:]_'']|$)'));
See the demo on dbfiddle
Schema (MySQL v5.6)
Query #1
Select ('cat''s'
REGEXP CONCAT('(^|[^[:alnum:]_''])', 'cat''s', '([^[:alnum:]_'']|$)')) res;
| res |
| --- |
| 1 |
Query #2
Select ('''cat''s'
REGEXP CONCAT('(^|[^[:alnum:]_''])', '''cat''s', '([^[:alnum:]_'']|$)' )) res;
| res |
| --- |
| 1 |
Query #3
Select ('_cat''s'
REGEXP CONCAT('(^|[^[:alnum:]_''])', '_cat''s' , '([^[:alnum:]_'']|$)' )) res;
| res |
| --- |
| 1 |
Query #4
Select ('-cat''s'
REGEXP CONCAT('(^|[^[:alnum:]_''])', '-cat''s' , '([^[:alnum:]_'']|$)' )) res;
| res |
| --- |
| 1 |
Query #5
Select (' cat''s'
REGEXP CONCAT('(^|[^[:alnum:]_''])', ' cat''s' , '([^[:alnum:]_'']|$)' )) res;
| res |
| --- |
| 1 |
Query #6
Select ('cat'''
REGEXP CONCAT('(^|[^[:alnum:]_''])', 'cat''' , '([^[:alnum:]_'']|$)' )) res;
| res |
| --- |
| 1 |
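As a further illustration, here is a sketch of how the custom boundaries tell a standalone 's apart from the 's inside cat's. The two inline strings are sample values, and the first is a made-up phrase rather than real dictionary data. If the search term can contain regex metacharacters such as . or *, it would need escaping before being concatenated into the pattern.

```sql
SELECT
  t.val,
  -- same boundary groups as above, wrapped around the search term 's
  t.val REGEXP CONCAT('(^|[^[:alnum:]_''])', '''s', '([^[:alnum:]_'']|$)') AS is_match
FROM (
  SELECT '''s e' AS val   -- hypothetical phrase that starts with the word 's
  UNION ALL
  SELECT 'cat''s'         -- here 's is preceded by a word character
) AS t;
```

The first row should return 1 and the second 0, since in cat's the apostrophe is preceded by t, which counts as a word character.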

MYSQL query average price

I have to calculate the average price of a house in Groningen.
However, the price is not stored as a number but as a string (with some additional information), and it uses a point ('.') as the thousands separator.
Price is stored as 'Vraagprijs' in Dutch.
The table results are:
€ 95.000 k.k.
€ 116.500 v.o.n.
€ 115.000 v.o.n.
and so on...
My query:
SELECT AVG(SUBSTRING(value,8,8)) AS AveragePrice_Groningen
FROM properties
WHERE name = 'vraagprijs'
AND EXISTS (SELECT *
FROM estate
WHERE pc_wp LIKE '%Groningen%'
AND properties.woid = estate.id);
The result is:
209.47509187620884
But it has to be:
20947509187620,884
How can I get this done?
The AVG(SUBSTRING(value,8,8)) doesn't work, as this sample shows:
MariaDB [yourSchema]> SELECT *,SUBSTRING(`value`,8,8), SUBSTRING_INDEX(SUBSTRING_INDEX(`value`, ' ', -2),' ',1) FROM properties;
+----+-----------------------+------------------------+----------------------------------------------------------+
| id | value | SUBSTRING(`value`,8,8) | SUBSTRING_INDEX(SUBSTRING_INDEX(`value`, ' ', -2),' ',1) |
+----+-----------------------+------------------------+----------------------------------------------------------+
| 1 | € 95.000 k.k. | 95.000 k | 95.000 |
| 2 | € 116.500 v.o.n. | 116.500 | 116.500 |
| 3 | € 115.000 v.o.n. | 115.000 | 115.000 |
+----+-----------------------+------------------------+----------------------------------------------------------+
3 rows in set (0.00 sec)
MariaDB [yourSchema]>
Change it to:
AVG(SUBSTRING_INDEX(SUBSTRING_INDEX(`value`, ' ', -2),' ',1))
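Note that MySQL still reads an extracted 95.000 as the decimal number 95 when averaging. One possible way around that, which is not part of the answer above, is to strip the thousands separators with REPLACE and cast the result to an integer. Here is a sketch on the three sample values from the question, with the inline rows standing in for the properties table:

```sql
SELECT
  AVG(
    CAST(
      REPLACE(
        -- second-to-last space-separated token, e.g. '95.000'
        SUBSTRING_INDEX(SUBSTRING_INDEX(p.`value`, ' ', -2), ' ', 1),
        '.', ''            -- drop the thousands separators: '95000'
      ) AS UNSIGNED
    )
  ) AS AveragePrice
FROM (
  SELECT '€ 95.000 k.k.' AS `value`
  UNION ALL
  SELECT '€ 116.500 v.o.n.'
  UNION ALL
  SELECT '€ 115.000 v.o.n.'
) AS p;
```

For these three rows the average should come out at roughly 108833.33 rather than about 108.83.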
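Try using CAST ... AS DECIMAL and a split function to get the right part of the string (SPLIT_STR is not a built-in MySQL function, so it has to be defined as a stored function first):
SELECT AVG(CAST(SPLIT_STR(value, ' ', 2) AS DECIMAL)) AS AveragePrice_Groningen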
FROM properties
WHERE name = 'vraagprijs'
AND EXISTS (SELECT *
FROM estate
WHERE pc_wp LIKE '%Groningen%'
AND properties.woid = estate.id);
Your data uses . as the thousands separator, which is normal in Dutch but not in English, where . is the decimal separator and , separates thousands. MySQL therefore reads 95.000 as 95.
Enter the data into your database as 215000.000, etc., and you should get normal values as the answer.

split characters and numbers in MySQL

I have a column in my table like this,
students
--------
abc23
def1
xyz567
......
and so on. Now I need the output to contain only the names, like this:
students
--------
abc
def
xyz
How can I get this in MySQL? Thanks in advance.
You can do it with string functions and some CAST() magic:
SELECT
SUBSTR(
name,
1,
CHAR_LENGTH(name) - CHAR_LENGTH(
IF(
@c:=CAST(REVERSE(name) AS UNSIGNED),
@c,
''
)
)
)
FROM
students
for example:
SET @name:='abc12345';
mysql> SELECT SUBSTR(@name, 1, CHAR_LENGTH(@name) - CHAR_LENGTH(IF(@c:=CAST(REVERSE(@name) AS UNSIGNED), @c, ''))) AS name;
+------+
| name |
+------+
| abc |
+------+
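One caveat with the CAST() trick is that it miscounts when the number ends in zeros. For abc120 the reversed digits 021 are cast to 21, so only two characters are removed. As a possible alternative, assuming MySQL 8.0 or a MariaDB version that provides REGEXP_REPLACE, the trailing digits can be stripped with a regular expression. The inline rows below stand in for the real table, and abc120 is a made-up value added to show the trailing-zero case.

```sql
SELECT
  s.students,
  -- remove one or more digits anchored at the end of the string
  REGEXP_REPLACE(s.students, '[0-9]+$', '') AS name_only
FROM (
  SELECT 'abc23' AS students
  UNION ALL SELECT 'def1'
  UNION ALL SELECT 'xyz567'
  UNION ALL SELECT 'abc120'   -- hypothetical row with a trailing zero
) AS s;
```

This should return abc, def, xyz and abc.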