Mysql split string containing different number of words [duplicate] - mysql

I have to create a report on some student completions. The students each belong to one client. Here are the tables (simplified for this question).
CREATE TABLE `clients` (
`clientId` int(10) unsigned NOT NULL auto_increment,
`clientName` varchar(100) NOT NULL default '',
`courseNames` varchar(255) NOT NULL default ''
)
The courseNames field holds a comma-delimited string of course names, eg "AB01,AB02,AB03"
CREATE TABLE `clientenrols` (
`clientEnrolId` int(10) unsigned NOT NULL auto_increment,
`studentId` int(10) unsigned NOT NULL default '0',
`courseId` tinyint(3) unsigned NOT NULL default '0'
)
The courseId field here is the index of the course name in the clients.courseNames field. So, if the client's courseNames are "AB01,AB02,AB03", and the courseId of the enrolment is 2, then the student is in AB03.
Is there a way that I can do a single select on these tables that includes the course name? Keep in mind that there will be students from different clients (and hence have different course names, not all of which are sequential,eg: "NW01,NW03")
Basically, if I could split that field and return a single element from the resulting array, that would be what I'm looking for. Here's what I mean in magical pseudocode:
SELECT e.`studentId`, SPLIT(",", c.`courseNames`)[e.`courseId`]
FROM ...

Until now, I wanted to keep those comma separated lists in my SQL db - well aware of all warnings!
I kept thinking that they have benefits over lookup tables (which provide a way to a normalized data base). After some days of refusing, I've seen the light:
Using lookup tables is NOT causing more code than those ugly string operations when using comma separated values in one field.
The lookup table allows for native number formats and is thus NOT bigger than those csv fields. It is SMALLER though.
The involved string operations are slim in high level language code (SQL and PHP), but expensive compared to using arrays of integers.
Databases are not meant to be human readable, and it is mostly stupid to try to stick to structures due to their readability / direct editability, as I did.
In short, there is a reason why there is no native SPLIT() function in MySQL.

MySQL's only string-splitting function is SUBSTRING_INDEX(str, delim, count). You can use this, to, for example:
Return the item before the first separator in a string:
mysql> SELECT SUBSTRING_INDEX('foo#bar#baz#qux', '#', 1);
+--------------------------------------------+
| SUBSTRING_INDEX('foo#bar#baz#qux', '#', 1) |
+--------------------------------------------+
| foo |
+--------------------------------------------+
1 row in set (0.00 sec)
Return the item after the last separator in a string:
mysql> SELECT SUBSTRING_INDEX('foo#bar#baz#qux', '#', -1);
+---------------------------------------------+
| SUBSTRING_INDEX('foo#bar#baz#qux', '#', -1) |
+---------------------------------------------+
| qux |
+---------------------------------------------+
1 row in set (0.00 sec)
Return everything before the third separator in a string:
mysql> SELECT SUBSTRING_INDEX('foo#bar#baz#qux', '#', 3);
+--------------------------------------------+
| SUBSTRING_INDEX('foo#bar#baz#qux', '#', 3) |
+--------------------------------------------+
| foo#bar#baz |
+--------------------------------------------+
1 row in set (0.00 sec)
Return the second item in a string, by chaining two calls:
mysql> SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('foo#bar#baz#qux', '#', 2), '#', -1);
+----------------------------------------------------------------------+
| SUBSTRING_INDEX(SUBSTRING_INDEX('foo#bar#baz#qux', '#', 2), '#', -1) |
+----------------------------------------------------------------------+
| bar |
+----------------------------------------------------------------------+
1 row in set (0.00 sec)
In general, a simple way to get the nth element of a #-separated string (assuming that you know it definitely has at least n elements) is to do:
SUBSTRING_INDEX(SUBSTRING_INDEX(your_string, '#', n), '#', -1);
The inner SUBSTRING_INDEX call discards the nth separator and everything after it, and then the outer SUBSTRING_INDEX call discards everything except the final element that remains.
If you want a more robust solution that returns NULL if you ask for an element that doesn't exist (for instance, asking for the 5th element of 'a#b#c#d'), then you can count the delimiters using REPLACE and then conditionally return NULL using IF():
IF(
LENGTH(your_string) - LENGTH(REPLACE(your_string, '#', '')) / LENGTH('#') < n - 1,
NULL,
SUBSTRING_INDEX(SUBSTRING_INDEX(your_string, '#', n), '#', -1)
)
Of course, this is pretty ugly and hard to understand! So you might want to wrap it in a function:
CREATE FUNCTION split(string TEXT, delimiter TEXT, n INT)
RETURNS TEXT DETERMINISTIC
RETURN IF(
(LENGTH(string) - LENGTH(REPLACE(string, delimiter, ''))) / LENGTH(delimiter) < n - 1,
NULL,
SUBSTRING_INDEX(SUBSTRING_INDEX(string, delimiter, n), delimiter, -1)
);
You can then use the function like this:
mysql> SELECT SPLIT('foo,bar,baz,qux', ',', 3);
+----------------------------------+
| SPLIT('foo,bar,baz,qux', ',', 3) |
+----------------------------------+
| baz |
+----------------------------------+
1 row in set (0.00 sec)
mysql> SELECT SPLIT('foo,bar,baz,qux', ',', 5);
+----------------------------------+
| SPLIT('foo,bar,baz,qux', ',', 5) |
+----------------------------------+
| NULL |
+----------------------------------+
1 row in set (0.00 sec)
mysql> SELECT SPLIT('foo###bar###baz###qux', '###', 2);
+------------------------------------------+
| SPLIT('foo###bar###baz###qux', '###', 2) |
+------------------------------------------+
| bar |
+------------------------------------------+
1 row in set (0.00 sec)

Seeing that it's a fairly popular question - the answer is YES.
For a column column in table table containing all of your coma separated values:
CREATE TEMPORARY TABLE temp (val CHAR(255));
SET #S1 = CONCAT("INSERT INTO temp (val) VALUES ('",REPLACE((SELECT GROUP_CONCAT( DISTINCT `column`) AS data FROM `table`), ",", "'),('"),"');");
PREPARE stmt1 FROM #s1;
EXECUTE stmt1;
SELECT DISTINCT(val) FROM temp;
Please remember however to not store CSV in your DB
Per #Mark Amery - as this translates coma separated values into an INSERT statement, be careful when running it on unsanitised data
Just to reiterate, please don't store CSV in your DB; this function is meant to translate CSV into sensible DB structure and not to be used anywhere in your code. If you have to use it in production, please rethink your DB structure

You can create a function for this:
/**
* Split a string by string (Similar to the php function explode())
*
* #param VARCHAR(12) delim The boundary string (delimiter).
* #param VARCHAR(255) str The input string.
* #param INT pos The index of the string to return
* #return VARCHAR(255) The (pos)th substring
* #return VARCHAR(255) Returns the [pos]th string created by splitting the str parameter on boundaries formed by the delimiter.
* #{#example
* SELECT SPLIT_STRING('|', 'one|two|three|four', 1);
* This query
* }
*/
DROP FUNCTION IF EXISTS SPLIT_STRING;
CREATE FUNCTION SPLIT_STRING(delim VARCHAR(12), str VARCHAR(255), pos INT)
RETURNS VARCHAR(255) DETERMINISTIC
RETURN
REPLACE(
SUBSTRING(
SUBSTRING_INDEX(str, delim, pos),
LENGTH(SUBSTRING_INDEX(str, delim, pos-1)) + 1
),
delim, ''
);
Converting the magical pseudocode to use this, you would have:
SELECT e.`studentId`, SPLIT_STRING(',', c.`courseNames`, e.`courseId`)
FROM...

Based on Alex answer above (https://stackoverflow.com/a/11022431/1466341) I came up with even better solution. Solution which doesn't contain exact one record ID.
Assuming that the comma separated list is in table data.list, and it contains listing of codes from other table classification.code, you can do something like:
SELECT
d.id, d.list, c.code
FROM
classification c
JOIN data d
ON d.list REGEXP CONCAT('[[:<:]]', c.code, '[[:>:]]');
So if you have tables and data like this:
CLASSIFICATION (code varchar(4) unique): ('A'), ('B'), ('C'), ('D')
MY_DATA (id int, list varchar(255)): (100, 'C,A,B'), (150, 'B,A,D'), (200,'B')
above SELECT will return
(100, 'C,A,B', 'A'),
(100, 'C,A,B', 'B'),
(100, 'C,A,B', 'C'),
(150, 'B,A,D', 'A'),
(150, 'B,A,D', 'B'),
(150, 'B,A,D', 'D'),
(200, 'B', 'B'),

Building on Alwin Kesler's solution, here's a bit of a more practical real world example.
Assuming that the comma separated list is in my_table.list, and it's a listing of ID's for my_other_table.id, you can do something like:
SELECT
*
FROM
my_other_table
WHERE
(SELECT list FROM my_table WHERE id = '1234') REGEXP CONCAT(',?', my_other_table.id, ',?');

I've resolved this kind of problem with a regular expression pattern. They tend to be slower than regular queries but it's an easy way to retrieve data in a comma-delimited query column
SELECT *
FROM `TABLE`
WHERE `field` REGEXP ',?[SEARCHED-VALUE],?';
the greedy question mark helps to search at the beggining or the end of the string.
Hope that helps for anyone in the future

It is possible to explode a string in a MySQL SELECT statement.
Firstly generate a series of numbers up to the largest number of delimited values you wish to explode. Either from a table of integers, or by unioning numbers together. The following generates 100 rows giving the values 1 to 100. It can easily be expanded to give larger ranges (add another sub query giving the values 0 to 9 for hundreds - hence giving 0 to 999, etc).
SELECT 1 + units.i + tens.i * 10 AS aNum
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
This can be cross joined against your table to give you the values. Note that you use SUBSTRING_INDEX to get the delimited value up to a certain value, and then use SUBSTRING_INDEX to get that value, excluding previous ones.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(clients.courseNames, ',', sub0.aNum), ',', -1) AS a_course_name
FROM clients
CROSS JOIN
(
SELECT 1 + units.i + tens.i * 10 AS aNum, units.i + tens.i * 10 AS aSubscript
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
) sub0
As you can see there is a slight issue here that the last delimited value is repeated many times. To get rid of this you need to limit the range of numbers based on how many delimiters there are. This can be done by taking the length of the delimited field and comparing it to the length of the delimited field with the delimiters changed to '' (to remove them). From this you can get the number of delimiters:-
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(clients.courseNames, ',', sub0.aNum), ',', -1) AS a_course_name
FROM clients
INNER JOIN
(
SELECT 1 + units.i + tens.i * 10 AS aNum
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
) sub0
ON (1 + LENGTH(clients.courseNames) - LENGTH(REPLACE(clients.courseNames, ',', ''))) >= sub0.aNum
In the original example field you could (for example) count the number of students on each course based on this. Note that I have changed the sub query that gets the range of numbers to bring back 2 numbers, 1 is used to determine the course name (as these are based on starting at 1) and the other gets the subscript (as they are based starting at 0).
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(clients.courseNames, ',', sub0.aNum), ',', -1) AS a_course_name, COUNT(clientenrols.studentId)
FROM clients
INNER JOIN
(
SELECT 1 + units.i + tens.i * 10 AS aNum, units.i + tens.i * 10 AS aSubscript
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
) sub0
ON (1 + LENGTH(clients.courseNames) - LENGTH(REPLACE(clients.courseNames, ',', ''))) >= sub0.aNum
LEFT OUTER JOIN clientenrols
ON clientenrols.courseId = sub0.aSubscript
GROUP BY a_course_name
As you can see, it is possible but quite messy. And with little opportunity to use indexes it is not going to be efficient. Further the range must cope with the greatest number of delimited values, and works by excluding lots of duplicates; if the max number of delimited values is very large then this will slow things down dramatically. Overall it is generally far better to just properly normalise the database.

There's an easier way, have a link table, i.e.:
Table 1: clients, client info, blah blah blah
Table 2: courses, course info, blah blah
Table 3: clientid, courseid
Then do a JOIN and you're off to the races.

SELECT
tab1.std_name, tab1.stdCode, tab1.payment,
SUBSTRING_INDEX(tab1.payment, '|', 1) as rupees,
SUBSTRING(tab1.payment, LENGTH(SUBSTRING_INDEX(tab1.payment, '|', 1)) + 2,LENGTH(SUBSTRING_INDEX(tab1.payment, '|', 2))) as date
FROM (
SELECT DISTINCT
si.std_name, hfc.stdCode,
if(isnull(hfc.payDate), concat(hfc.coutionMoneyIn,'|', year(hfc.startDtae), '-', monthname(hfc.startDtae)), concat(hfc.payMoney, '|', monthname(hfc.payDate), '-', year(hfc.payDate))) AS payment
FROM hostelfeescollection hfc
INNER JOIN hostelfeecollectmode hfm ON hfc.tranId = hfm.tranId
INNER JOIN student_info_1 si ON si.std_code = hfc.stdCode
WHERE hfc.tranId = 'TRAN-AZZZY69454'
) AS tab1

If you need get table from string with delimiters:
SET #str = 'function1;function2;function3;function4;aaa;bbbb;nnnnn';
SET #delimeter = ';';
SET #sql_statement = CONCAT('SELECT '''
,REPLACE(#str, #delimeter, ''' UNION ALL SELECT ''')
,'''');
SELECT #sql_statement;
SELECT 'function1' UNION ALL SELECT 'function2' UNION ALL SELECT 'function3' UNION ALL SELECT 'function4' UNION ALL SELECT 'aaa' UNION ALL SELECT 'bbbb' UNION ALL SELECT 'nnnnn'

Search in a column containing comma-separated values
MySQL has a dedicated function FIND_IN_SET() that returns field index if the value is found in a string containing comma-separated values.
For example, the following statement returns one-based index of value C in string A,B,C,D.
SELECT FIND_IN_SET('C', 'A,B,C,D') AS result;
+--------+
| result |
+--------+
| 3 |
+--------+
If the given value is not found, FIND_IN_SET() function returns 0.
SELECT FIND_IN_SET('Z', 'A,B,C,D') AS result;
+--------+
| result |
+--------+
| 0 |
+--------+

I used the above logic but modified it slightly.
My input is of format :
"apple:100|pinapple:200|orange:300" stored in a variable #updtAdvanceKeyVal
Here is the function block :
set #res = "";
set #i = 1;
set #updtAdvanceKeyVal = updtAdvanceKeyVal;
REPEAT
-- set r = replace(SUBSTRING(SUBSTRING_INDEX(#updtAdvanceKeyVal, "|", #i),
-- LENGTH(SUBSTRING_INDEX(#updtAdvanceKeyVal, "|", #i -1)) + 1),"|","");
-- wrapping the function in "replace" function as above causes to cut off a character from
-- the 2nd splitted value if the value is more than 3 characters. Writing it in 2 lines causes no such problem and the output is as expected
-- sample output by executing the above function :
-- orange:100
-- pi apple:200 !!!!!!!!strange output!!!!!!!!
-- tomato:500
set #r = SUBSTRING(SUBSTRING_INDEX(#updtAdvanceKeyVal, "|", #i),
LENGTH(SUBSTRING_INDEX(#updtAdvanceKeyVal, "|", #i -1)) + 1);
set #r = replace(#r,"|","");
if #r <> "" then
set #key = SUBSTRING_INDEX(#r, ":",1);
set #val = SUBSTRING_INDEX(#r, ":",-1);
select #key, #val;
end if;
set #i = #i + 1;
until #r = ""
END REPEAT;

I just had a similar issue with a field like that which I solved a different way. My use case was needing to take those ids in a comma separated list for use in a join.
I was able to solve it using a like, but it was made easier because in addition to the comma delimiter the ids were also quoted like so:
keys
"1","2","6","12"
Because of that, I was able to do a LIKE
SELECT twwf.id, jtwi.id joined_id
FROM table_with_weird_field twwf
INNER JOIN join_table_with_ids jtwi
ON twwf.delimited_field LIKE CONCAT("%\"", jtwi.id, "\"%")
This basically just looks to see if the id from the table you're trying to join appears in the set and at that point you can join on it easily enough and return your records. You could also just create a view from something like this.
It worked well for my use case where I was dealing with a Wordpress plugin that managed relations in the way described. The quotes really help though because otherwise you run the risk of partial matches (aka - id 1 within 18, etc).

You can do this with JSON in more recent MySQL versions. It's a blast. We will have a quick preparation to create a numbers table. Then first we create an intermediary table to convert the comma delimited strings into a json array then we will use json_extract to blast them apart. I am encapsulating the strings in quotes carefully escaping existing quotes because I had semicolon separated strings containing commas.
So to create the numbers table, hopefully you have more clients than courses, choose an adequately big table if not.
CREATE TABLE numbers (n int PRIMARY KEY);
INSERT INTO numbers
SELECT #row := #row + 1
FROM clients JOIN (select #row:=0) t2;
Add LIMIT 50 if you know you only have 50 courses. Now, that was easy, wasn't it? Now on to the real work, honestly it's the quotes that make it uglier but at least it's more generic that way:
CREATE TABLE json_coursenames
SELECT clientId,clientName,CONCAT('["', REPLACE(REPLACE(courseName,'"','\\"'), ',', '","'), '"]') AS a
FROM clients;
CREATE TABLE extracted
SELECT clientId,clientName,REPLACE(TRIM(TRIM('"' FROM JSON_EXTRACT(a, concat('$[', n, ']')))), '\\"', '"')
FROM json_coursenames
INNER JOIN numbers ON n < JSON_LENGTH(a);
Wheee!
The meat here are these two: the CONCAT('["', REPLACE(coursename, ',', '","'), '"]') (I dropped the second REPLACE to make it more visible) will convert foo,bar,bar into "foo","bar","baz". The other trick is JSON_EXTRACT(a, concat('$[', n, ']') will become JSON_EXTRACT(a, $[12]) and that's the 13th element in the array, see JSON Path syntax.

SQL:
SELECT
DISTINCT *
FROM
JSON_TABLE(
concat('["',replace('AB01,AB02,AB03',',','","'),'"]'),
'$[*]'
COLUMNS(
Value VARCHAR(50) PATH '$'
)
) data
WHERE Value != '';
RESULT:
| | Value |
| -------- | :-------------- |
| 1 | AB01 |
| 2 | AB02 |
| 3 | AB03 |

Here's how you do it for SQL Server. Someone else can translate it to MySQL. Parsing CSV Values Into Multiple Rows.
SELECT Author,
NullIf(SubString(',' + Phrase + ',' , ID , CharIndex(',' , ',' + Phrase + ',' , ID) - ID) , '') AS Word
FROM Tally, Quotes
WHERE ID <= Len(',' + Phrase + ',') AND SubString(',' + Phrase + ',' , ID - 1, 1) = ','
AND CharIndex(',' , ',' + Phrase + ',' , ID) - ID > 0
The idea is to cross join to a predefined table Tally which contains integer 1 through 8000 (or whatever big enough number) and run SubString to find the right ,word, position.

Here's what I've got so far (found it on the page Ben Alpert mentioned):
SELECT REPLACE(
SUBSTRING(
SUBSTRING_INDEX(c.`courseNames`, ',', e.`courseId` + 1)
, LENGTH(SUBSTRING_INDEX(c.`courseNames`, ',', e.`courseId`)
) + 1)
, ','
, ''
)
FROM `clients` c INNER JOIN `clientenrols` e USING (`clientId`)

Well, nothing I used worked, so I decided creating a real simple split function, hope it helps:
DECLARE inipos INTEGER;
DECLARE endpos INTEGER;
DECLARE maxlen INTEGER;
DECLARE item VARCHAR(100);
DECLARE delim VARCHAR(1);
SET delim = '|';
SET inipos = 1;
SET fullstr = CONCAT(fullstr, delim);
SET maxlen = LENGTH(fullstr);
REPEAT
SET endpos = LOCATE(delim, fullstr, inipos);
SET item = SUBSTR(fullstr, inipos, endpos - inipos);
IF item <> '' AND item IS NOT NULL THEN
USE_THE_ITEM_STRING;
END IF;
SET inipos = endpos + 1;
UNTIL inipos >= maxlen END REPEAT;

Related

Mysql select splitting the comma separated values [duplicate]

I have to create a report on some student completions. The students each belong to one client. Here are the tables (simplified for this question).
CREATE TABLE `clients` (
`clientId` int(10) unsigned NOT NULL auto_increment,
`clientName` varchar(100) NOT NULL default '',
`courseNames` varchar(255) NOT NULL default ''
)
The courseNames field holds a comma-delimited string of course names, eg "AB01,AB02,AB03"
CREATE TABLE `clientenrols` (
`clientEnrolId` int(10) unsigned NOT NULL auto_increment,
`studentId` int(10) unsigned NOT NULL default '0',
`courseId` tinyint(3) unsigned NOT NULL default '0'
)
The courseId field here is the index of the course name in the clients.courseNames field. So, if the client's courseNames are "AB01,AB02,AB03", and the courseId of the enrolment is 2, then the student is in AB03.
Is there a way that I can do a single select on these tables that includes the course name? Keep in mind that there will be students from different clients (and hence have different course names, not all of which are sequential,eg: "NW01,NW03")
Basically, if I could split that field and return a single element from the resulting array, that would be what I'm looking for. Here's what I mean in magical pseudocode:
SELECT e.`studentId`, SPLIT(",", c.`courseNames`)[e.`courseId`]
FROM ...
Until now, I wanted to keep those comma separated lists in my SQL db - well aware of all warnings!
I kept thinking that they have benefits over lookup tables (which provide a way to a normalized data base). After some days of refusing, I've seen the light:
Using lookup tables is NOT causing more code than those ugly string operations when using comma separated values in one field.
The lookup table allows for native number formats and is thus NOT bigger than those csv fields. It is SMALLER though.
The involved string operations are slim in high level language code (SQL and PHP), but expensive compared to using arrays of integers.
Databases are not meant to be human readable, and it is mostly stupid to try to stick to structures due to their readability / direct editability, as I did.
In short, there is a reason why there is no native SPLIT() function in MySQL.
MySQL's only string-splitting function is SUBSTRING_INDEX(str, delim, count). You can use this, to, for example:
Return the item before the first separator in a string:
mysql> SELECT SUBSTRING_INDEX('foo#bar#baz#qux', '#', 1);
+--------------------------------------------+
| SUBSTRING_INDEX('foo#bar#baz#qux', '#', 1) |
+--------------------------------------------+
| foo |
+--------------------------------------------+
1 row in set (0.00 sec)
Return the item after the last separator in a string:
mysql> SELECT SUBSTRING_INDEX('foo#bar#baz#qux', '#', -1);
+---------------------------------------------+
| SUBSTRING_INDEX('foo#bar#baz#qux', '#', -1) |
+---------------------------------------------+
| qux |
+---------------------------------------------+
1 row in set (0.00 sec)
Return everything before the third separator in a string:
mysql> SELECT SUBSTRING_INDEX('foo#bar#baz#qux', '#', 3);
+--------------------------------------------+
| SUBSTRING_INDEX('foo#bar#baz#qux', '#', 3) |
+--------------------------------------------+
| foo#bar#baz |
+--------------------------------------------+
1 row in set (0.00 sec)
Return the second item in a string, by chaining two calls:
mysql> SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('foo#bar#baz#qux', '#', 2), '#', -1);
+----------------------------------------------------------------------+
| SUBSTRING_INDEX(SUBSTRING_INDEX('foo#bar#baz#qux', '#', 2), '#', -1) |
+----------------------------------------------------------------------+
| bar |
+----------------------------------------------------------------------+
1 row in set (0.00 sec)
In general, a simple way to get the nth element of a #-separated string (assuming that you know it definitely has at least n elements) is to do:
SUBSTRING_INDEX(SUBSTRING_INDEX(your_string, '#', n), '#', -1);
The inner SUBSTRING_INDEX call discards the nth separator and everything after it, and then the outer SUBSTRING_INDEX call discards everything except the final element that remains.
If you want a more robust solution that returns NULL if you ask for an element that doesn't exist (for instance, asking for the 5th element of 'a#b#c#d'), then you can count the delimiters using REPLACE and then conditionally return NULL using IF():
IF(
LENGTH(your_string) - LENGTH(REPLACE(your_string, '#', '')) / LENGTH('#') < n - 1,
NULL,
SUBSTRING_INDEX(SUBSTRING_INDEX(your_string, '#', n), '#', -1)
)
Of course, this is pretty ugly and hard to understand! So you might want to wrap it in a function:
CREATE FUNCTION split(string TEXT, delimiter TEXT, n INT)
RETURNS TEXT DETERMINISTIC
RETURN IF(
(LENGTH(string) - LENGTH(REPLACE(string, delimiter, ''))) / LENGTH(delimiter) < n - 1,
NULL,
SUBSTRING_INDEX(SUBSTRING_INDEX(string, delimiter, n), delimiter, -1)
);
You can then use the function like this:
mysql> SELECT SPLIT('foo,bar,baz,qux', ',', 3);
+----------------------------------+
| SPLIT('foo,bar,baz,qux', ',', 3) |
+----------------------------------+
| baz |
+----------------------------------+
1 row in set (0.00 sec)
mysql> SELECT SPLIT('foo,bar,baz,qux', ',', 5);
+----------------------------------+
| SPLIT('foo,bar,baz,qux', ',', 5) |
+----------------------------------+
| NULL |
+----------------------------------+
1 row in set (0.00 sec)
mysql> SELECT SPLIT('foo###bar###baz###qux', '###', 2);
+------------------------------------------+
| SPLIT('foo###bar###baz###qux', '###', 2) |
+------------------------------------------+
| bar |
+------------------------------------------+
1 row in set (0.00 sec)
Seeing that it's a fairly popular question - the answer is YES.
For a column column in table table containing all of your coma separated values:
CREATE TEMPORARY TABLE temp (val CHAR(255));
SET #S1 = CONCAT("INSERT INTO temp (val) VALUES ('",REPLACE((SELECT GROUP_CONCAT( DISTINCT `column`) AS data FROM `table`), ",", "'),('"),"');");
PREPARE stmt1 FROM #s1;
EXECUTE stmt1;
SELECT DISTINCT(val) FROM temp;
Please remember however to not store CSV in your DB
Per #Mark Amery - as this translates coma separated values into an INSERT statement, be careful when running it on unsanitised data
Just to reiterate, please don't store CSV in your DB; this function is meant to translate CSV into sensible DB structure and not to be used anywhere in your code. If you have to use it in production, please rethink your DB structure
You can create a function for this:
/**
* Split a string by string (Similar to the php function explode())
*
* #param VARCHAR(12) delim The boundary string (delimiter).
* #param VARCHAR(255) str The input string.
* #param INT pos The index of the string to return
* #return VARCHAR(255) The (pos)th substring
* #return VARCHAR(255) Returns the [pos]th string created by splitting the str parameter on boundaries formed by the delimiter.
* #{#example
* SELECT SPLIT_STRING('|', 'one|two|three|four', 1);
* This query
* }
*/
DROP FUNCTION IF EXISTS SPLIT_STRING;
CREATE FUNCTION SPLIT_STRING(delim VARCHAR(12), str VARCHAR(255), pos INT)
RETURNS VARCHAR(255) DETERMINISTIC
RETURN
REPLACE(
SUBSTRING(
SUBSTRING_INDEX(str, delim, pos),
LENGTH(SUBSTRING_INDEX(str, delim, pos-1)) + 1
),
delim, ''
);
Converting the magical pseudocode to use this, you would have:
SELECT e.`studentId`, SPLIT_STRING(',', c.`courseNames`, e.`courseId`)
FROM...
Based on Alex answer above (https://stackoverflow.com/a/11022431/1466341) I came up with even better solution. Solution which doesn't contain exact one record ID.
Assuming that the comma separated list is in table data.list, and it contains listing of codes from other table classification.code, you can do something like:
SELECT
d.id, d.list, c.code
FROM
classification c
JOIN data d
ON d.list REGEXP CONCAT('[[:<:]]', c.code, '[[:>:]]');
So if you have tables and data like this:
CLASSIFICATION (code varchar(4) unique): ('A'), ('B'), ('C'), ('D')
MY_DATA (id int, list varchar(255)): (100, 'C,A,B'), (150, 'B,A,D'), (200,'B')
above SELECT will return
(100, 'C,A,B', 'A'),
(100, 'C,A,B', 'B'),
(100, 'C,A,B', 'C'),
(150, 'B,A,D', 'A'),
(150, 'B,A,D', 'B'),
(150, 'B,A,D', 'D'),
(200, 'B', 'B'),
Building on Alwin Kesler's solution, here's a bit of a more practical real world example.
Assuming that the comma separated list is in my_table.list, and it's a listing of ID's for my_other_table.id, you can do something like:
SELECT
*
FROM
my_other_table
WHERE
(SELECT list FROM my_table WHERE id = '1234') REGEXP CONCAT(',?', my_other_table.id, ',?');
I've resolved this kind of problem with a regular expression pattern. They tend to be slower than regular queries but it's an easy way to retrieve data in a comma-delimited query column
SELECT *
FROM `TABLE`
WHERE `field` REGEXP ',?[SEARCHED-VALUE],?';
the greedy question mark helps to search at the beggining or the end of the string.
Hope that helps for anyone in the future
It is possible to explode a string in a MySQL SELECT statement.
Firstly generate a series of numbers up to the largest number of delimited values you wish to explode. Either from a table of integers, or by unioning numbers together. The following generates 100 rows giving the values 1 to 100. It can easily be expanded to give larger ranges (add another sub query giving the values 0 to 9 for hundreds - hence giving 0 to 999, etc).
SELECT 1 + units.i + tens.i * 10 AS aNum
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
This can be cross joined against your table to give you the values. Note that you use SUBSTRING_INDEX to get the delimited value up to a certain value, and then use SUBSTRING_INDEX to get that value, excluding previous ones.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(clients.courseNames, ',', sub0.aNum), ',', -1) AS a_course_name
FROM clients
CROSS JOIN
(
SELECT 1 + units.i + tens.i * 10 AS aNum, units.i + tens.i * 10 AS aSubscript
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
) sub0
As you can see there is a slight issue here that the last delimited value is repeated many times. To get rid of this you need to limit the range of numbers based on how many delimiters there are. This can be done by taking the length of the delimited field and comparing it to the length of the delimited field with the delimiters changed to '' (to remove them). From this you can get the number of delimiters:-
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(clients.courseNames, ',', sub0.aNum), ',', -1) AS a_course_name
FROM clients
INNER JOIN
(
SELECT 1 + units.i + tens.i * 10 AS aNum
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
) sub0
ON (1 + LENGTH(clients.courseNames) - LENGTH(REPLACE(clients.courseNames, ',', ''))) >= sub0.aNum
In the original example field you could (for example) count the number of students on each course based on this. Note that I have changed the sub query that gets the range of numbers to bring back 2 numbers, 1 is used to determine the course name (as these are based on starting at 1) and the other gets the subscript (as they are based starting at 0).
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(clients.courseNames, ',', sub0.aNum), ',', -1) AS a_course_name, COUNT(clientenrols.studentId)
FROM clients
INNER JOIN
(
SELECT 1 + units.i + tens.i * 10 AS aNum, units.i + tens.i * 10 AS aSubscript
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
) sub0
ON (1 + LENGTH(clients.courseNames) - LENGTH(REPLACE(clients.courseNames, ',', ''))) >= sub0.aNum
LEFT OUTER JOIN clientenrols
ON clientenrols.courseId = sub0.aSubscript
GROUP BY a_course_name
As you can see, it is possible but quite messy. And with little opportunity to use indexes it is not going to be efficient. Further the range must cope with the greatest number of delimited values, and works by excluding lots of duplicates; if the max number of delimited values is very large then this will slow things down dramatically. Overall it is generally far better to just properly normalise the database.
There's an easier way, have a link table, i.e.:
Table 1: clients, client info, blah blah blah
Table 2: courses, course info, blah blah
Table 3: clientid, courseid
Then do a JOIN and you're off to the races.
SELECT
tab1.std_name, tab1.stdCode, tab1.payment,
SUBSTRING_INDEX(tab1.payment, '|', 1) as rupees,
SUBSTRING(tab1.payment, LENGTH(SUBSTRING_INDEX(tab1.payment, '|', 1)) + 2,LENGTH(SUBSTRING_INDEX(tab1.payment, '|', 2))) as date
FROM (
SELECT DISTINCT
si.std_name, hfc.stdCode,
if(isnull(hfc.payDate), concat(hfc.coutionMoneyIn,'|', year(hfc.startDtae), '-', monthname(hfc.startDtae)), concat(hfc.payMoney, '|', monthname(hfc.payDate), '-', year(hfc.payDate))) AS payment
FROM hostelfeescollection hfc
INNER JOIN hostelfeecollectmode hfm ON hfc.tranId = hfm.tranId
INNER JOIN student_info_1 si ON si.std_code = hfc.stdCode
WHERE hfc.tranId = 'TRAN-AZZZY69454'
) AS tab1
If you need get table from string with delimiters:
SET #str = 'function1;function2;function3;function4;aaa;bbbb;nnnnn';
SET #delimeter = ';';
SET #sql_statement = CONCAT('SELECT '''
,REPLACE(#str, #delimeter, ''' UNION ALL SELECT ''')
,'''');
SELECT #sql_statement;
SELECT 'function1' UNION ALL SELECT 'function2' UNION ALL SELECT 'function3' UNION ALL SELECT 'function4' UNION ALL SELECT 'aaa' UNION ALL SELECT 'bbbb' UNION ALL SELECT 'nnnnn'
Search in a column containing comma-separated values
MySQL has a dedicated function FIND_IN_SET() that returns field index if the value is found in a string containing comma-separated values.
For example, the following statement returns one-based index of value C in string A,B,C,D.
SELECT FIND_IN_SET('C', 'A,B,C,D') AS result;
+--------+
| result |
+--------+
| 3 |
+--------+
If the given value is not found, FIND_IN_SET() function returns 0.
SELECT FIND_IN_SET('Z', 'A,B,C,D') AS result;
+--------+
| result |
+--------+
| 0 |
+--------+
I used the above logic but modified it slightly.
My input is of format :
"apple:100|pinapple:200|orange:300" stored in a variable #updtAdvanceKeyVal
Here is the function block :
set #res = "";
set #i = 1;
set #updtAdvanceKeyVal = updtAdvanceKeyVal;
REPEAT
-- set r = replace(SUBSTRING(SUBSTRING_INDEX(#updtAdvanceKeyVal, "|", #i),
-- LENGTH(SUBSTRING_INDEX(#updtAdvanceKeyVal, "|", #i -1)) + 1),"|","");
-- wrapping the function in "replace" function as above causes to cut off a character from
-- the 2nd splitted value if the value is more than 3 characters. Writing it in 2 lines causes no such problem and the output is as expected
-- sample output by executing the above function :
-- orange:100
-- pi apple:200 !!!!!!!!strange output!!!!!!!!
-- tomato:500
set #r = SUBSTRING(SUBSTRING_INDEX(#updtAdvanceKeyVal, "|", #i),
LENGTH(SUBSTRING_INDEX(#updtAdvanceKeyVal, "|", #i -1)) + 1);
set #r = replace(#r,"|","");
if #r <> "" then
set #key = SUBSTRING_INDEX(#r, ":",1);
set #val = SUBSTRING_INDEX(#r, ":",-1);
select #key, #val;
end if;
set #i = #i + 1;
until #r = ""
END REPEAT;
I just had a similar issue with a field like that which I solved a different way. My use case was needing to take those ids in a comma separated list for use in a join.
I was able to solve it using a like, but it was made easier because in addition to the comma delimiter the ids were also quoted like so:
keys
"1","2","6","12"
Because of that, I was able to do a LIKE
SELECT twwf.id, jtwi.id joined_id
FROM table_with_weird_field twwf
INNER JOIN join_table_with_ids jtwi
ON twwf.delimited_field LIKE CONCAT("%\"", jtwi.id, "\"%")
This basically just looks to see if the id from the table you're trying to join appears in the set and at that point you can join on it easily enough and return your records. You could also just create a view from something like this.
It worked well for my use case where I was dealing with a Wordpress plugin that managed relations in the way described. The quotes really help though because otherwise you run the risk of partial matches (aka - id 1 within 18, etc).
You can do this with JSON in more recent MySQL versions. It's a blast. We will have a quick preparation to create a numbers table. Then first we create an intermediary table to convert the comma delimited strings into a json array then we will use json_extract to blast them apart. I am encapsulating the strings in quotes carefully escaping existing quotes because I had semicolon separated strings containing commas.
So to create the numbers table, hopefully you have more clients than courses, choose an adequately big table if not.
CREATE TABLE numbers (n int PRIMARY KEY);
INSERT INTO numbers
SELECT #row := #row + 1
FROM clients JOIN (select #row:=0) t2;
Add LIMIT 50 if you know you only have 50 courses. Now, that was easy, wasn't it? Now on to the real work, honestly it's the quotes that make it uglier but at least it's more generic that way:
CREATE TABLE json_coursenames
SELECT clientId,clientName,CONCAT('["', REPLACE(REPLACE(courseName,'"','\\"'), ',', '","'), '"]') AS a
FROM clients;
CREATE TABLE extracted
SELECT clientId,clientName,REPLACE(TRIM(TRIM('"' FROM JSON_EXTRACT(a, concat('$[', n, ']')))), '\\"', '"')
FROM json_coursenames
INNER JOIN numbers ON n < JSON_LENGTH(a);
Wheee!
The meat here are these two: the CONCAT('["', REPLACE(coursename, ',', '","'), '"]') (I dropped the second REPLACE to make it more visible) will convert foo,bar,bar into "foo","bar","baz". The other trick is JSON_EXTRACT(a, concat('$[', n, ']') will become JSON_EXTRACT(a, $[12]) and that's the 13th element in the array, see JSON Path syntax.
SQL:
SELECT
DISTINCT *
FROM
JSON_TABLE(
concat('["',replace('AB01,AB02,AB03',',','","'),'"]'),
'$[*]'
COLUMNS(
Value VARCHAR(50) PATH '$'
)
) data
WHERE Value != '';
RESULT:
| | Value |
| -------- | :-------------- |
| 1 | AB01 |
| 2 | AB02 |
| 3 | AB03 |
Here's how you do it for SQL Server. Someone else can translate it to MySQL. Parsing CSV Values Into Multiple Rows.
SELECT Author,
NullIf(SubString(',' + Phrase + ',' , ID , CharIndex(',' , ',' + Phrase + ',' , ID) - ID) , '') AS Word
FROM Tally, Quotes
WHERE ID <= Len(',' + Phrase + ',') AND SubString(',' + Phrase + ',' , ID - 1, 1) = ','
AND CharIndex(',' , ',' + Phrase + ',' , ID) - ID > 0
The idea is to cross join to a predefined table Tally which contains integer 1 through 8000 (or whatever big enough number) and run SubString to find the right ,word, position.
Here's what I've got so far (found it on the page Ben Alpert mentioned):
SELECT REPLACE(
SUBSTRING(
SUBSTRING_INDEX(c.`courseNames`, ',', e.`courseId` + 1)
, LENGTH(SUBSTRING_INDEX(c.`courseNames`, ',', e.`courseId`)
) + 1)
, ','
, ''
)
FROM `clients` c INNER JOIN `clientenrols` e USING (`clientId`)
Well, nothing I used worked, so I decided creating a real simple split function, hope it helps:
DECLARE inipos INTEGER;
DECLARE endpos INTEGER;
DECLARE maxlen INTEGER;
DECLARE item VARCHAR(100);
DECLARE delim VARCHAR(1);
SET delim = '|';
SET inipos = 1;
SET fullstr = CONCAT(fullstr, delim);
SET maxlen = LENGTH(fullstr);
REPEAT
SET endpos = LOCATE(delim, fullstr, inipos);
SET item = SUBSTR(fullstr, inipos, endpos - inipos);
IF item <> '' AND item IS NOT NULL THEN
USE_THE_ITEM_STRING;
END IF;
SET inipos = endpos + 1;
UNTIL inipos >= maxlen END REPEAT;

How to Explode string in mysql and get each value? [duplicate]

I have to create a report on some student completions. The students each belong to one client. Here are the tables (simplified for this question).
CREATE TABLE `clients` (
`clientId` int(10) unsigned NOT NULL auto_increment,
`clientName` varchar(100) NOT NULL default '',
`courseNames` varchar(255) NOT NULL default ''
)
The courseNames field holds a comma-delimited string of course names, eg "AB01,AB02,AB03"
CREATE TABLE `clientenrols` (
`clientEnrolId` int(10) unsigned NOT NULL auto_increment,
`studentId` int(10) unsigned NOT NULL default '0',
`courseId` tinyint(3) unsigned NOT NULL default '0'
)
The courseId field here is the index of the course name in the clients.courseNames field. So, if the client's courseNames are "AB01,AB02,AB03", and the courseId of the enrolment is 2, then the student is in AB03.
Is there a way that I can do a single select on these tables that includes the course name? Keep in mind that there will be students from different clients (and hence have different course names, not all of which are sequential,eg: "NW01,NW03")
Basically, if I could split that field and return a single element from the resulting array, that would be what I'm looking for. Here's what I mean in magical pseudocode:
SELECT e.`studentId`, SPLIT(",", c.`courseNames`)[e.`courseId`]
FROM ...
Until now, I wanted to keep those comma separated lists in my SQL db - well aware of all warnings!
I kept thinking that they have benefits over lookup tables (which provide a way to a normalized data base). After some days of refusing, I've seen the light:
Using lookup tables is NOT causing more code than those ugly string operations when using comma separated values in one field.
The lookup table allows for native number formats and is thus NOT bigger than those csv fields. It is SMALLER though.
The involved string operations are slim in high level language code (SQL and PHP), but expensive compared to using arrays of integers.
Databases are not meant to be human readable, and it is mostly stupid to try to stick to structures due to their readability / direct editability, as I did.
In short, there is a reason why there is no native SPLIT() function in MySQL.
MySQL's only string-splitting function is SUBSTRING_INDEX(str, delim, count). You can use this, to, for example:
Return the item before the first separator in a string:
mysql> SELECT SUBSTRING_INDEX('foo#bar#baz#qux', '#', 1);
+--------------------------------------------+
| SUBSTRING_INDEX('foo#bar#baz#qux', '#', 1) |
+--------------------------------------------+
| foo |
+--------------------------------------------+
1 row in set (0.00 sec)
Return the item after the last separator in a string:
mysql> SELECT SUBSTRING_INDEX('foo#bar#baz#qux', '#', -1);
+---------------------------------------------+
| SUBSTRING_INDEX('foo#bar#baz#qux', '#', -1) |
+---------------------------------------------+
| qux |
+---------------------------------------------+
1 row in set (0.00 sec)
Return everything before the third separator in a string:
mysql> SELECT SUBSTRING_INDEX('foo#bar#baz#qux', '#', 3);
+--------------------------------------------+
| SUBSTRING_INDEX('foo#bar#baz#qux', '#', 3) |
+--------------------------------------------+
| foo#bar#baz |
+--------------------------------------------+
1 row in set (0.00 sec)
Return the second item in a string, by chaining two calls:
mysql> SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('foo#bar#baz#qux', '#', 2), '#', -1);
+----------------------------------------------------------------------+
| SUBSTRING_INDEX(SUBSTRING_INDEX('foo#bar#baz#qux', '#', 2), '#', -1) |
+----------------------------------------------------------------------+
| bar |
+----------------------------------------------------------------------+
1 row in set (0.00 sec)
In general, a simple way to get the nth element of a #-separated string (assuming that you know it definitely has at least n elements) is to do:
SUBSTRING_INDEX(SUBSTRING_INDEX(your_string, '#', n), '#', -1);
The inner SUBSTRING_INDEX call discards the nth separator and everything after it, and then the outer SUBSTRING_INDEX call discards everything except the final element that remains.
If you want a more robust solution that returns NULL if you ask for an element that doesn't exist (for instance, asking for the 5th element of 'a#b#c#d'), then you can count the delimiters using REPLACE and then conditionally return NULL using IF():
IF(
LENGTH(your_string) - LENGTH(REPLACE(your_string, '#', '')) / LENGTH('#') < n - 1,
NULL,
SUBSTRING_INDEX(SUBSTRING_INDEX(your_string, '#', n), '#', -1)
)
Of course, this is pretty ugly and hard to understand! So you might want to wrap it in a function:
CREATE FUNCTION split(string TEXT, delimiter TEXT, n INT)
RETURNS TEXT DETERMINISTIC
RETURN IF(
(LENGTH(string) - LENGTH(REPLACE(string, delimiter, ''))) / LENGTH(delimiter) < n - 1,
NULL,
SUBSTRING_INDEX(SUBSTRING_INDEX(string, delimiter, n), delimiter, -1)
);
You can then use the function like this:
mysql> SELECT SPLIT('foo,bar,baz,qux', ',', 3);
+----------------------------------+
| SPLIT('foo,bar,baz,qux', ',', 3) |
+----------------------------------+
| baz |
+----------------------------------+
1 row in set (0.00 sec)
mysql> SELECT SPLIT('foo,bar,baz,qux', ',', 5);
+----------------------------------+
| SPLIT('foo,bar,baz,qux', ',', 5) |
+----------------------------------+
| NULL |
+----------------------------------+
1 row in set (0.00 sec)
mysql> SELECT SPLIT('foo###bar###baz###qux', '###', 2);
+------------------------------------------+
| SPLIT('foo###bar###baz###qux', '###', 2) |
+------------------------------------------+
| bar |
+------------------------------------------+
1 row in set (0.00 sec)
Seeing that it's a fairly popular question - the answer is YES.
For a column column in table table containing all of your coma separated values:
CREATE TEMPORARY TABLE temp (val CHAR(255));
SET #S1 = CONCAT("INSERT INTO temp (val) VALUES ('",REPLACE((SELECT GROUP_CONCAT( DISTINCT `column`) AS data FROM `table`), ",", "'),('"),"');");
PREPARE stmt1 FROM #s1;
EXECUTE stmt1;
SELECT DISTINCT(val) FROM temp;
Please remember however to not store CSV in your DB
Per #Mark Amery - as this translates coma separated values into an INSERT statement, be careful when running it on unsanitised data
Just to reiterate, please don't store CSV in your DB; this function is meant to translate CSV into sensible DB structure and not to be used anywhere in your code. If you have to use it in production, please rethink your DB structure
You can create a function for this:
/**
* Split a string by string (Similar to the php function explode())
*
* #param VARCHAR(12) delim The boundary string (delimiter).
* #param VARCHAR(255) str The input string.
* #param INT pos The index of the string to return
* #return VARCHAR(255) The (pos)th substring
* #return VARCHAR(255) Returns the [pos]th string created by splitting the str parameter on boundaries formed by the delimiter.
* #{#example
* SELECT SPLIT_STRING('|', 'one|two|three|four', 1);
* This query
* }
*/
DROP FUNCTION IF EXISTS SPLIT_STRING;
CREATE FUNCTION SPLIT_STRING(delim VARCHAR(12), str VARCHAR(255), pos INT)
RETURNS VARCHAR(255) DETERMINISTIC
RETURN
REPLACE(
SUBSTRING(
SUBSTRING_INDEX(str, delim, pos),
LENGTH(SUBSTRING_INDEX(str, delim, pos-1)) + 1
),
delim, ''
);
Converting the magical pseudocode to use this, you would have:
SELECT e.`studentId`, SPLIT_STRING(',', c.`courseNames`, e.`courseId`)
FROM...
Based on Alex answer above (https://stackoverflow.com/a/11022431/1466341) I came up with even better solution. Solution which doesn't contain exact one record ID.
Assuming that the comma separated list is in table data.list, and it contains listing of codes from other table classification.code, you can do something like:
SELECT
d.id, d.list, c.code
FROM
classification c
JOIN data d
ON d.list REGEXP CONCAT('[[:<:]]', c.code, '[[:>:]]');
So if you have tables and data like this:
CLASSIFICATION (code varchar(4) unique): ('A'), ('B'), ('C'), ('D')
MY_DATA (id int, list varchar(255)): (100, 'C,A,B'), (150, 'B,A,D'), (200,'B')
above SELECT will return
(100, 'C,A,B', 'A'),
(100, 'C,A,B', 'B'),
(100, 'C,A,B', 'C'),
(150, 'B,A,D', 'A'),
(150, 'B,A,D', 'B'),
(150, 'B,A,D', 'D'),
(200, 'B', 'B'),
Building on Alwin Kesler's solution, here's a bit of a more practical real world example.
Assuming that the comma separated list is in my_table.list, and it's a listing of ID's for my_other_table.id, you can do something like:
SELECT
*
FROM
my_other_table
WHERE
(SELECT list FROM my_table WHERE id = '1234') REGEXP CONCAT(',?', my_other_table.id, ',?');
I've resolved this kind of problem with a regular expression pattern. They tend to be slower than regular queries but it's an easy way to retrieve data in a comma-delimited query column
SELECT *
FROM `TABLE`
WHERE `field` REGEXP ',?[SEARCHED-VALUE],?';
the greedy question mark helps to search at the beggining or the end of the string.
Hope that helps for anyone in the future
It is possible to explode a string in a MySQL SELECT statement.
Firstly generate a series of numbers up to the largest number of delimited values you wish to explode. Either from a table of integers, or by unioning numbers together. The following generates 100 rows giving the values 1 to 100. It can easily be expanded to give larger ranges (add another sub query giving the values 0 to 9 for hundreds - hence giving 0 to 999, etc).
SELECT 1 + units.i + tens.i * 10 AS aNum
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
This can be cross joined against your table to give you the values. Note that you use SUBSTRING_INDEX to get the delimited value up to a certain value, and then use SUBSTRING_INDEX to get that value, excluding previous ones.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(clients.courseNames, ',', sub0.aNum), ',', -1) AS a_course_name
FROM clients
CROSS JOIN
(
SELECT 1 + units.i + tens.i * 10 AS aNum, units.i + tens.i * 10 AS aSubscript
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
) sub0
As you can see there is a slight issue here that the last delimited value is repeated many times. To get rid of this you need to limit the range of numbers based on how many delimiters there are. This can be done by taking the length of the delimited field and comparing it to the length of the delimited field with the delimiters changed to '' (to remove them). From this you can get the number of delimiters:-
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(clients.courseNames, ',', sub0.aNum), ',', -1) AS a_course_name
FROM clients
INNER JOIN
(
SELECT 1 + units.i + tens.i * 10 AS aNum
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
) sub0
ON (1 + LENGTH(clients.courseNames) - LENGTH(REPLACE(clients.courseNames, ',', ''))) >= sub0.aNum
In the original example field you could (for example) count the number of students on each course based on this. Note that I have changed the sub query that gets the range of numbers to bring back 2 numbers, 1 is used to determine the course name (as these are based on starting at 1) and the other gets the subscript (as they are based starting at 0).
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(clients.courseNames, ',', sub0.aNum), ',', -1) AS a_course_name, COUNT(clientenrols.studentId)
FROM clients
INNER JOIN
(
SELECT 1 + units.i + tens.i * 10 AS aNum, units.i + tens.i * 10 AS aSubscript
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
) sub0
ON (1 + LENGTH(clients.courseNames) - LENGTH(REPLACE(clients.courseNames, ',', ''))) >= sub0.aNum
LEFT OUTER JOIN clientenrols
ON clientenrols.courseId = sub0.aSubscript
GROUP BY a_course_name
As you can see, it is possible but quite messy. And with little opportunity to use indexes it is not going to be efficient. Further the range must cope with the greatest number of delimited values, and works by excluding lots of duplicates; if the max number of delimited values is very large then this will slow things down dramatically. Overall it is generally far better to just properly normalise the database.
There's an easier way, have a link table, i.e.:
Table 1: clients, client info, blah blah blah
Table 2: courses, course info, blah blah
Table 3: clientid, courseid
Then do a JOIN and you're off to the races.
SELECT
tab1.std_name, tab1.stdCode, tab1.payment,
SUBSTRING_INDEX(tab1.payment, '|', 1) as rupees,
SUBSTRING(tab1.payment, LENGTH(SUBSTRING_INDEX(tab1.payment, '|', 1)) + 2,LENGTH(SUBSTRING_INDEX(tab1.payment, '|', 2))) as date
FROM (
SELECT DISTINCT
si.std_name, hfc.stdCode,
if(isnull(hfc.payDate), concat(hfc.coutionMoneyIn,'|', year(hfc.startDtae), '-', monthname(hfc.startDtae)), concat(hfc.payMoney, '|', monthname(hfc.payDate), '-', year(hfc.payDate))) AS payment
FROM hostelfeescollection hfc
INNER JOIN hostelfeecollectmode hfm ON hfc.tranId = hfm.tranId
INNER JOIN student_info_1 si ON si.std_code = hfc.stdCode
WHERE hfc.tranId = 'TRAN-AZZZY69454'
) AS tab1
If you need get table from string with delimiters:
SET #str = 'function1;function2;function3;function4;aaa;bbbb;nnnnn';
SET #delimeter = ';';
SET #sql_statement = CONCAT('SELECT '''
,REPLACE(#str, #delimeter, ''' UNION ALL SELECT ''')
,'''');
SELECT #sql_statement;
SELECT 'function1' UNION ALL SELECT 'function2' UNION ALL SELECT 'function3' UNION ALL SELECT 'function4' UNION ALL SELECT 'aaa' UNION ALL SELECT 'bbbb' UNION ALL SELECT 'nnnnn'
Search in a column containing comma-separated values
MySQL has a dedicated function FIND_IN_SET() that returns field index if the value is found in a string containing comma-separated values.
For example, the following statement returns one-based index of value C in string A,B,C,D.
SELECT FIND_IN_SET('C', 'A,B,C,D') AS result;
+--------+
| result |
+--------+
| 3 |
+--------+
If the given value is not found, FIND_IN_SET() function returns 0.
SELECT FIND_IN_SET('Z', 'A,B,C,D') AS result;
+--------+
| result |
+--------+
| 0 |
+--------+
I used the above logic but modified it slightly.
My input is of format :
"apple:100|pinapple:200|orange:300" stored in a variable #updtAdvanceKeyVal
Here is the function block :
set #res = "";
set #i = 1;
set #updtAdvanceKeyVal = updtAdvanceKeyVal;
REPEAT
-- set r = replace(SUBSTRING(SUBSTRING_INDEX(#updtAdvanceKeyVal, "|", #i),
-- LENGTH(SUBSTRING_INDEX(#updtAdvanceKeyVal, "|", #i -1)) + 1),"|","");
-- wrapping the function in "replace" function as above causes to cut off a character from
-- the 2nd splitted value if the value is more than 3 characters. Writing it in 2 lines causes no such problem and the output is as expected
-- sample output by executing the above function :
-- orange:100
-- pi apple:200 !!!!!!!!strange output!!!!!!!!
-- tomato:500
set #r = SUBSTRING(SUBSTRING_INDEX(#updtAdvanceKeyVal, "|", #i),
LENGTH(SUBSTRING_INDEX(#updtAdvanceKeyVal, "|", #i -1)) + 1);
set #r = replace(#r,"|","");
if #r <> "" then
set #key = SUBSTRING_INDEX(#r, ":",1);
set #val = SUBSTRING_INDEX(#r, ":",-1);
select #key, #val;
end if;
set #i = #i + 1;
until #r = ""
END REPEAT;
I just had a similar issue with a field like that which I solved a different way. My use case was needing to take those ids in a comma separated list for use in a join.
I was able to solve it using a like, but it was made easier because in addition to the comma delimiter the ids were also quoted like so:
keys
"1","2","6","12"
Because of that, I was able to do a LIKE
SELECT twwf.id, jtwi.id joined_id
FROM table_with_weird_field twwf
INNER JOIN join_table_with_ids jtwi
ON twwf.delimited_field LIKE CONCAT("%\"", jtwi.id, "\"%")
This basically just looks to see if the id from the table you're trying to join appears in the set and at that point you can join on it easily enough and return your records. You could also just create a view from something like this.
It worked well for my use case where I was dealing with a Wordpress plugin that managed relations in the way described. The quotes really help though because otherwise you run the risk of partial matches (aka - id 1 within 18, etc).
You can do this with JSON in more recent MySQL versions. It's a blast. We will have a quick preparation to create a numbers table. Then first we create an intermediary table to convert the comma delimited strings into a json array then we will use json_extract to blast them apart. I am encapsulating the strings in quotes carefully escaping existing quotes because I had semicolon separated strings containing commas.
So to create the numbers table, hopefully you have more clients than courses, choose an adequately big table if not.
CREATE TABLE numbers (n int PRIMARY KEY);
INSERT INTO numbers
SELECT #row := #row + 1
FROM clients JOIN (select #row:=0) t2;
Add LIMIT 50 if you know you only have 50 courses. Now, that was easy, wasn't it? Now on to the real work, honestly it's the quotes that make it uglier but at least it's more generic that way:
CREATE TABLE json_coursenames
SELECT clientId,clientName,CONCAT('["', REPLACE(REPLACE(courseName,'"','\\"'), ',', '","'), '"]') AS a
FROM clients;
CREATE TABLE extracted
SELECT clientId,clientName,REPLACE(TRIM(TRIM('"' FROM JSON_EXTRACT(a, concat('$[', n, ']')))), '\\"', '"')
FROM json_coursenames
INNER JOIN numbers ON n < JSON_LENGTH(a);
Wheee!
The meat here are these two: the CONCAT('["', REPLACE(coursename, ',', '","'), '"]') (I dropped the second REPLACE to make it more visible) will convert foo,bar,bar into "foo","bar","baz". The other trick is JSON_EXTRACT(a, concat('$[', n, ']') will become JSON_EXTRACT(a, $[12]) and that's the 13th element in the array, see JSON Path syntax.
SQL:
SELECT
DISTINCT *
FROM
JSON_TABLE(
concat('["',replace('AB01,AB02,AB03',',','","'),'"]'),
'$[*]'
COLUMNS(
Value VARCHAR(50) PATH '$'
)
) data
WHERE Value != '';
RESULT:
| | Value |
| -------- | :-------------- |
| 1 | AB01 |
| 2 | AB02 |
| 3 | AB03 |
Here's how you do it for SQL Server. Someone else can translate it to MySQL. Parsing CSV Values Into Multiple Rows.
SELECT Author,
NullIf(SubString(',' + Phrase + ',' , ID , CharIndex(',' , ',' + Phrase + ',' , ID) - ID) , '') AS Word
FROM Tally, Quotes
WHERE ID <= Len(',' + Phrase + ',') AND SubString(',' + Phrase + ',' , ID - 1, 1) = ','
AND CharIndex(',' , ',' + Phrase + ',' , ID) - ID > 0
The idea is to cross join to a predefined table Tally which contains integer 1 through 8000 (or whatever big enough number) and run SubString to find the right ,word, position.
Here's what I've got so far (found it on the page Ben Alpert mentioned):
SELECT REPLACE(
SUBSTRING(
SUBSTRING_INDEX(c.`courseNames`, ',', e.`courseId` + 1)
, LENGTH(SUBSTRING_INDEX(c.`courseNames`, ',', e.`courseId`)
) + 1)
, ','
, ''
)
FROM `clients` c INNER JOIN `clientenrols` e USING (`clientId`)
Well, nothing I used worked, so I decided creating a real simple split function, hope it helps:
DECLARE inipos INTEGER;
DECLARE endpos INTEGER;
DECLARE maxlen INTEGER;
DECLARE item VARCHAR(100);
DECLARE delim VARCHAR(1);
SET delim = '|';
SET inipos = 1;
SET fullstr = CONCAT(fullstr, delim);
SET maxlen = LENGTH(fullstr);
REPEAT
SET endpos = LOCATE(delim, fullstr, inipos);
SET item = SUBSTR(fullstr, inipos, endpos - inipos);
IF item <> '' AND item IS NOT NULL THEN
USE_THE_ITEM_STRING;
END IF;
SET inipos = endpos + 1;
UNTIL inipos >= maxlen END REPEAT;

Split comma separated string into rows in mysql

When I have string list like 1, 2, 3...
I'd like to use this as one column
Ids
1
2
3
Is it possible by sql query?
ex) SELECT Ids from (1, 2, 3...) <- I know this is not working.
Use a subquery of arbitrary digits to split your string.Instead of vals you can use '1,2,3'.
SELECT
DISTINCT SUBSTRING_INDEX(SUBSTRING_INDEX(vals, ',', n.digit+1), ',', -1) val
FROM
tt1
INNER JOIN
(SELECT 0 digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6) n
ON LENGTH(REPLACE(vals, ',' , '')) <= LENGTH(vals)-n.digit;
See it working
For MySQL 8.0.4+
SELECT *
FROM
JSON_TABLE(
CONCAT('[', '1,2,3,4', ']'),
"$[*]"
COLUMNS(
ids BIGINT(20) PATH "$"
)
) AS tt
Concatenate square brackets ([]) around your string to make it into a JSON array. Then use JSON_TABLE to convert it into a table. See the MySQL JSON Table Functions for more info.
You can use below stored procedure to split string delimted by any character:
CREATE PROCEDURE `split_delimited` (
IN inputstr NVARCHAR(1000),
IN delimiter CHAR(1)
)
BEGIN
DROP TEMPORARY TABLE Items;
CREATE TEMPORARY TABLE Items(item NVARCHAR(50));
WHILE LOCATE(delimiter,inputstr) > 1 DO
INSERT INTO Items SELECT SUBSTRING_INDEX(inputstr,delimiter,1);
SET inputstr = REPLACE (inputstr, (SELECT LEFT(inputstr,LOCATE(delimiter,inputstr))),'');
END WHILE;
INSERT INTO Items(item) VALUES(inputstr);
select * from Items;
END
Input: 'a,b,c'
Output: a
b
c

MYSQL sort words in field

Edit 1 the code is just an example, I do not have suburb data, my real data is inherited and messy and could be fixed by creating code out of the database or creating reference data. The question should have been something like does anyone have a good cell sort function or other solution that can be reused?
Table
CREATE TABLE postcode (
`id` int NOT NULL AUTO_INCREMENT,
`suburbs` varchar(2000) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
COLLATE = utf8_unicode_ci;
Insert
INSERT INTO postcode (`suburbs`)
VALUES ( 'Zackville;Astor;Mary-town;Jackson' );
Want field to be sorted an result to be
Astor;Jackson;Mary-town;Zackville
All comments have very valid points and you should try to avoid working with delimited values in RDBMS.
That being said, if you're stuck with the existing database and you for some reason want to do it on the database level rather than with client code you can leverage tally(number) table and SUBSTRING_INDEX() to split delimited values into rows, then GROUP_CONCAT() to pivot data back in the ordered manner.
One of the several ways to create a tally table:
CREATE TABLE tally (n int not null primary key);
INSERT INTO tally (n)
SELECT a.N + b.N * 10 + 1 n
FROM
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
ORDER BY n
Number of rows in the tally table should be greater or equal to the maximum possible number of delimited values.
Now to resort delimited values:
UPDATE postcode p JOIN
(
SELECT id, GROUP_CONCAT(suburbs ORDER BY suburbs SEPARATOR ';') suburbs
FROM
(
SELECT t.id, SUBSTRING_INDEX(SUBSTRING_INDEX(t.suburbs, ';', n.n), ';', -1) suburbs
FROM postcode t CROSS JOIN tally n
WHERE n.n <= 1 + (LENGTH(t.suburbs) - LENGTH(REPLACE(t.suburbs, ';', '')))
) q
GROUP BY id
) s
ON p.id = s.id
SET p.suburbs = s.suburbs;
Output:
| ID | SUBURBS |
|----|-----------------------------------|
| 1 | Astor;Jackson;Mary-town;Zackville |
Here is a SQLFiddle demo

Can you split/explode a field in a MySQL query?

I have to create a report on some student completions. The students each belong to one client. Here are the tables (simplified for this question).
CREATE TABLE `clients` (
`clientId` int(10) unsigned NOT NULL auto_increment,
`clientName` varchar(100) NOT NULL default '',
`courseNames` varchar(255) NOT NULL default ''
)
The courseNames field holds a comma-delimited string of course names, eg "AB01,AB02,AB03"
CREATE TABLE `clientenrols` (
`clientEnrolId` int(10) unsigned NOT NULL auto_increment,
`studentId` int(10) unsigned NOT NULL default '0',
`courseId` tinyint(3) unsigned NOT NULL default '0'
)
The courseId field here is the index of the course name in the clients.courseNames field. So, if the client's courseNames are "AB01,AB02,AB03", and the courseId of the enrolment is 2, then the student is in AB03.
Is there a way that I can do a single select on these tables that includes the course name? Keep in mind that there will be students from different clients (and hence have different course names, not all of which are sequential,eg: "NW01,NW03")
Basically, if I could split that field and return a single element from the resulting array, that would be what I'm looking for. Here's what I mean in magical pseudocode:
SELECT e.`studentId`, SPLIT(",", c.`courseNames`)[e.`courseId`]
FROM ...
Until now, I wanted to keep those comma separated lists in my SQL db - well aware of all warnings!
I kept thinking that they have benefits over lookup tables (which provide a way to a normalized data base). After some days of refusing, I've seen the light:
Using lookup tables is NOT causing more code than those ugly string operations when using comma separated values in one field.
The lookup table allows for native number formats and is thus NOT bigger than those csv fields. It is SMALLER though.
The involved string operations are slim in high level language code (SQL and PHP), but expensive compared to using arrays of integers.
Databases are not meant to be human readable, and it is mostly stupid to try to stick to structures due to their readability / direct editability, as I did.
In short, there is a reason why there is no native SPLIT() function in MySQL.
MySQL's only string-splitting function is SUBSTRING_INDEX(str, delim, count). You can use this, to, for example:
Return the item before the first separator in a string:
mysql> SELECT SUBSTRING_INDEX('foo#bar#baz#qux', '#', 1);
+--------------------------------------------+
| SUBSTRING_INDEX('foo#bar#baz#qux', '#', 1) |
+--------------------------------------------+
| foo |
+--------------------------------------------+
1 row in set (0.00 sec)
Return the item after the last separator in a string:
mysql> SELECT SUBSTRING_INDEX('foo#bar#baz#qux', '#', -1);
+---------------------------------------------+
| SUBSTRING_INDEX('foo#bar#baz#qux', '#', -1) |
+---------------------------------------------+
| qux |
+---------------------------------------------+
1 row in set (0.00 sec)
Return everything before the third separator in a string:
mysql> SELECT SUBSTRING_INDEX('foo#bar#baz#qux', '#', 3);
+--------------------------------------------+
| SUBSTRING_INDEX('foo#bar#baz#qux', '#', 3) |
+--------------------------------------------+
| foo#bar#baz |
+--------------------------------------------+
1 row in set (0.00 sec)
Return the second item in a string, by chaining two calls:
mysql> SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('foo#bar#baz#qux', '#', 2), '#', -1);
+----------------------------------------------------------------------+
| SUBSTRING_INDEX(SUBSTRING_INDEX('foo#bar#baz#qux', '#', 2), '#', -1) |
+----------------------------------------------------------------------+
| bar |
+----------------------------------------------------------------------+
1 row in set (0.00 sec)
In general, a simple way to get the nth element of a #-separated string (assuming that you know it definitely has at least n elements) is to do:
SUBSTRING_INDEX(SUBSTRING_INDEX(your_string, '#', n), '#', -1);
The inner SUBSTRING_INDEX call discards the nth separator and everything after it, and then the outer SUBSTRING_INDEX call discards everything except the final element that remains.
If you want a more robust solution that returns NULL if you ask for an element that doesn't exist (for instance, asking for the 5th element of 'a#b#c#d'), then you can count the delimiters using REPLACE and then conditionally return NULL using IF():
IF(
LENGTH(your_string) - LENGTH(REPLACE(your_string, '#', '')) / LENGTH('#') < n - 1,
NULL,
SUBSTRING_INDEX(SUBSTRING_INDEX(your_string, '#', n), '#', -1)
)
Of course, this is pretty ugly and hard to understand! So you might want to wrap it in a function:
CREATE FUNCTION split(string TEXT, delimiter TEXT, n INT)
RETURNS TEXT DETERMINISTIC
RETURN IF(
(LENGTH(string) - LENGTH(REPLACE(string, delimiter, ''))) / LENGTH(delimiter) < n - 1,
NULL,
SUBSTRING_INDEX(SUBSTRING_INDEX(string, delimiter, n), delimiter, -1)
);
You can then use the function like this:
mysql> SELECT SPLIT('foo,bar,baz,qux', ',', 3);
+----------------------------------+
| SPLIT('foo,bar,baz,qux', ',', 3) |
+----------------------------------+
| baz |
+----------------------------------+
1 row in set (0.00 sec)
mysql> SELECT SPLIT('foo,bar,baz,qux', ',', 5);
+----------------------------------+
| SPLIT('foo,bar,baz,qux', ',', 5) |
+----------------------------------+
| NULL |
+----------------------------------+
1 row in set (0.00 sec)
mysql> SELECT SPLIT('foo###bar###baz###qux', '###', 2);
+------------------------------------------+
| SPLIT('foo###bar###baz###qux', '###', 2) |
+------------------------------------------+
| bar |
+------------------------------------------+
1 row in set (0.00 sec)
Seeing that it's a fairly popular question - the answer is YES.
For a column column in table table containing all of your coma separated values:
CREATE TEMPORARY TABLE temp (val CHAR(255));
SET #S1 = CONCAT("INSERT INTO temp (val) VALUES ('",REPLACE((SELECT GROUP_CONCAT( DISTINCT `column`) AS data FROM `table`), ",", "'),('"),"');");
PREPARE stmt1 FROM #s1;
EXECUTE stmt1;
SELECT DISTINCT(val) FROM temp;
Please remember however to not store CSV in your DB
Per #Mark Amery - as this translates coma separated values into an INSERT statement, be careful when running it on unsanitised data
Just to reiterate, please don't store CSV in your DB; this function is meant to translate CSV into sensible DB structure and not to be used anywhere in your code. If you have to use it in production, please rethink your DB structure
You can create a function for this:
/**
* Split a string by string (Similar to the php function explode())
*
* #param VARCHAR(12) delim The boundary string (delimiter).
* #param VARCHAR(255) str The input string.
* #param INT pos The index of the string to return
* #return VARCHAR(255) The (pos)th substring
* #return VARCHAR(255) Returns the [pos]th string created by splitting the str parameter on boundaries formed by the delimiter.
* #{#example
* SELECT SPLIT_STRING('|', 'one|two|three|four', 1);
* This query
* }
*/
DROP FUNCTION IF EXISTS SPLIT_STRING;
CREATE FUNCTION SPLIT_STRING(delim VARCHAR(12), str VARCHAR(255), pos INT)
RETURNS VARCHAR(255) DETERMINISTIC
RETURN
REPLACE(
SUBSTRING(
SUBSTRING_INDEX(str, delim, pos),
LENGTH(SUBSTRING_INDEX(str, delim, pos-1)) + 1
),
delim, ''
);
Converting the magical pseudocode to use this, you would have:
SELECT e.`studentId`, SPLIT_STRING(',', c.`courseNames`, e.`courseId`)
FROM...
Based on Alex answer above (https://stackoverflow.com/a/11022431/1466341) I came up with even better solution. Solution which doesn't contain exact one record ID.
Assuming that the comma separated list is in table data.list, and it contains listing of codes from other table classification.code, you can do something like:
SELECT
d.id, d.list, c.code
FROM
classification c
JOIN data d
ON d.list REGEXP CONCAT('[[:<:]]', c.code, '[[:>:]]');
So if you have tables and data like this:
CLASSIFICATION (code varchar(4) unique): ('A'), ('B'), ('C'), ('D')
MY_DATA (id int, list varchar(255)): (100, 'C,A,B'), (150, 'B,A,D'), (200,'B')
above SELECT will return
(100, 'C,A,B', 'A'),
(100, 'C,A,B', 'B'),
(100, 'C,A,B', 'C'),
(150, 'B,A,D', 'A'),
(150, 'B,A,D', 'B'),
(150, 'B,A,D', 'D'),
(200, 'B', 'B'),
Building on Alwin Kesler's solution, here's a bit of a more practical real world example.
Assuming that the comma separated list is in my_table.list, and it's a listing of ID's for my_other_table.id, you can do something like:
SELECT
*
FROM
my_other_table
WHERE
(SELECT list FROM my_table WHERE id = '1234') REGEXP CONCAT(',?', my_other_table.id, ',?');
I've resolved this kind of problem with a regular expression pattern. They tend to be slower than regular queries but it's an easy way to retrieve data in a comma-delimited query column
SELECT *
FROM `TABLE`
WHERE `field` REGEXP ',?[SEARCHED-VALUE],?';
the greedy question mark helps to search at the beggining or the end of the string.
Hope that helps for anyone in the future
It is possible to explode a string in a MySQL SELECT statement.
Firstly generate a series of numbers up to the largest number of delimited values you wish to explode. Either from a table of integers, or by unioning numbers together. The following generates 100 rows giving the values 1 to 100. It can easily be expanded to give larger ranges (add another sub query giving the values 0 to 9 for hundreds - hence giving 0 to 999, etc).
SELECT 1 + units.i + tens.i * 10 AS aNum
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
This can be cross joined against your table to give you the values. Note that you use SUBSTRING_INDEX to get the delimited value up to a certain value, and then use SUBSTRING_INDEX to get that value, excluding previous ones.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(clients.courseNames, ',', sub0.aNum), ',', -1) AS a_course_name
FROM clients
CROSS JOIN
(
SELECT 1 + units.i + tens.i * 10 AS aNum, units.i + tens.i * 10 AS aSubscript
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
) sub0
As you can see there is a slight issue here that the last delimited value is repeated many times. To get rid of this you need to limit the range of numbers based on how many delimiters there are. This can be done by taking the length of the delimited field and comparing it to the length of the delimited field with the delimiters changed to '' (to remove them). From this you can get the number of delimiters:-
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(clients.courseNames, ',', sub0.aNum), ',', -1) AS a_course_name
FROM clients
INNER JOIN
(
SELECT 1 + units.i + tens.i * 10 AS aNum
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
) sub0
ON (1 + LENGTH(clients.courseNames) - LENGTH(REPLACE(clients.courseNames, ',', ''))) >= sub0.aNum
In the original example field you could (for example) count the number of students on each course based on this. Note that I have changed the sub query that gets the range of numbers to bring back 2 numbers, 1 is used to determine the course name (as these are based on starting at 1) and the other gets the subscript (as they are based starting at 0).
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(clients.courseNames, ',', sub0.aNum), ',', -1) AS a_course_name, COUNT(clientenrols.studentId)
FROM clients
INNER JOIN
(
SELECT 1 + units.i + tens.i * 10 AS aNum, units.i + tens.i * 10 AS aSubscript
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
) sub0
ON (1 + LENGTH(clients.courseNames) - LENGTH(REPLACE(clients.courseNames, ',', ''))) >= sub0.aNum
LEFT OUTER JOIN clientenrols
ON clientenrols.courseId = sub0.aSubscript
GROUP BY a_course_name
As you can see, it is possible but quite messy. And with little opportunity to use indexes it is not going to be efficient. Further the range must cope with the greatest number of delimited values, and works by excluding lots of duplicates; if the max number of delimited values is very large then this will slow things down dramatically. Overall it is generally far better to just properly normalise the database.
There's an easier way, have a link table, i.e.:
Table 1: clients, client info, blah blah blah
Table 2: courses, course info, blah blah
Table 3: clientid, courseid
Then do a JOIN and you're off to the races.
SELECT
tab1.std_name, tab1.stdCode, tab1.payment,
SUBSTRING_INDEX(tab1.payment, '|', 1) as rupees,
SUBSTRING(tab1.payment, LENGTH(SUBSTRING_INDEX(tab1.payment, '|', 1)) + 2,LENGTH(SUBSTRING_INDEX(tab1.payment, '|', 2))) as date
FROM (
SELECT DISTINCT
si.std_name, hfc.stdCode,
if(isnull(hfc.payDate), concat(hfc.coutionMoneyIn,'|', year(hfc.startDtae), '-', monthname(hfc.startDtae)), concat(hfc.payMoney, '|', monthname(hfc.payDate), '-', year(hfc.payDate))) AS payment
FROM hostelfeescollection hfc
INNER JOIN hostelfeecollectmode hfm ON hfc.tranId = hfm.tranId
INNER JOIN student_info_1 si ON si.std_code = hfc.stdCode
WHERE hfc.tranId = 'TRAN-AZZZY69454'
) AS tab1
If you need get table from string with delimiters:
SET #str = 'function1;function2;function3;function4;aaa;bbbb;nnnnn';
SET #delimeter = ';';
SET #sql_statement = CONCAT('SELECT '''
,REPLACE(#str, #delimeter, ''' UNION ALL SELECT ''')
,'''');
SELECT #sql_statement;
SELECT 'function1' UNION ALL SELECT 'function2' UNION ALL SELECT 'function3' UNION ALL SELECT 'function4' UNION ALL SELECT 'aaa' UNION ALL SELECT 'bbbb' UNION ALL SELECT 'nnnnn'
Search in a column containing comma-separated values
MySQL has a dedicated function FIND_IN_SET() that returns field index if the value is found in a string containing comma-separated values.
For example, the following statement returns one-based index of value C in string A,B,C,D.
SELECT FIND_IN_SET('C', 'A,B,C,D') AS result;
+--------+
| result |
+--------+
| 3 |
+--------+
If the given value is not found, FIND_IN_SET() function returns 0.
SELECT FIND_IN_SET('Z', 'A,B,C,D') AS result;
+--------+
| result |
+--------+
| 0 |
+--------+
I used the above logic but modified it slightly.
My input is of format :
"apple:100|pinapple:200|orange:300" stored in a variable #updtAdvanceKeyVal
Here is the function block :
set #res = "";
set #i = 1;
set #updtAdvanceKeyVal = updtAdvanceKeyVal;
REPEAT
-- set r = replace(SUBSTRING(SUBSTRING_INDEX(#updtAdvanceKeyVal, "|", #i),
-- LENGTH(SUBSTRING_INDEX(#updtAdvanceKeyVal, "|", #i -1)) + 1),"|","");
-- wrapping the function in "replace" function as above causes to cut off a character from
-- the 2nd splitted value if the value is more than 3 characters. Writing it in 2 lines causes no such problem and the output is as expected
-- sample output by executing the above function :
-- orange:100
-- pi apple:200 !!!!!!!!strange output!!!!!!!!
-- tomato:500
set #r = SUBSTRING(SUBSTRING_INDEX(#updtAdvanceKeyVal, "|", #i),
LENGTH(SUBSTRING_INDEX(#updtAdvanceKeyVal, "|", #i -1)) + 1);
set #r = replace(#r,"|","");
if #r <> "" then
set #key = SUBSTRING_INDEX(#r, ":",1);
set #val = SUBSTRING_INDEX(#r, ":",-1);
select #key, #val;
end if;
set #i = #i + 1;
until #r = ""
END REPEAT;
I just had a similar issue with a field like that which I solved a different way. My use case was needing to take those ids in a comma separated list for use in a join.
I was able to solve it using a like, but it was made easier because in addition to the comma delimiter the ids were also quoted like so:
keys
"1","2","6","12"
Because of that, I was able to do a LIKE
SELECT twwf.id, jtwi.id joined_id
FROM table_with_weird_field twwf
INNER JOIN join_table_with_ids jtwi
ON twwf.delimited_field LIKE CONCAT("%\"", jtwi.id, "\"%")
This basically just looks to see if the id from the table you're trying to join appears in the set and at that point you can join on it easily enough and return your records. You could also just create a view from something like this.
It worked well for my use case where I was dealing with a Wordpress plugin that managed relations in the way described. The quotes really help though because otherwise you run the risk of partial matches (aka - id 1 within 18, etc).
You can do this with JSON in more recent MySQL versions. It's a blast. We will have a quick preparation to create a numbers table. Then first we create an intermediary table to convert the comma delimited strings into a json array then we will use json_extract to blast them apart. I am encapsulating the strings in quotes carefully escaping existing quotes because I had semicolon separated strings containing commas.
So to create the numbers table, hopefully you have more clients than courses, choose an adequately big table if not.
CREATE TABLE numbers (n int PRIMARY KEY);
INSERT INTO numbers
SELECT #row := #row + 1
FROM clients JOIN (select #row:=0) t2;
Add LIMIT 50 if you know you only have 50 courses. Now, that was easy, wasn't it? Now on to the real work, honestly it's the quotes that make it uglier but at least it's more generic that way:
CREATE TABLE json_coursenames
SELECT clientId,clientName,CONCAT('["', REPLACE(REPLACE(courseName,'"','\\"'), ',', '","'), '"]') AS a
FROM clients;
CREATE TABLE extracted
SELECT clientId,clientName,REPLACE(TRIM(TRIM('"' FROM JSON_EXTRACT(a, concat('$[', n, ']')))), '\\"', '"')
FROM json_coursenames
INNER JOIN numbers ON n < JSON_LENGTH(a);
Wheee!
The meat here are these two: the CONCAT('["', REPLACE(coursename, ',', '","'), '"]') (I dropped the second REPLACE to make it more visible) will convert foo,bar,bar into "foo","bar","baz". The other trick is JSON_EXTRACT(a, concat('$[', n, ']') will become JSON_EXTRACT(a, $[12]) and that's the 13th element in the array, see JSON Path syntax.
SQL:
SELECT
DISTINCT *
FROM
JSON_TABLE(
concat('["',replace('AB01,AB02,AB03',',','","'),'"]'),
'$[*]'
COLUMNS(
Value VARCHAR(50) PATH '$'
)
) data
WHERE Value != '';
RESULT:
| | Value |
| -------- | :-------------- |
| 1 | AB01 |
| 2 | AB02 |
| 3 | AB03 |
Here's how you do it for SQL Server. Someone else can translate it to MySQL. Parsing CSV Values Into Multiple Rows.
SELECT Author,
NullIf(SubString(',' + Phrase + ',' , ID , CharIndex(',' , ',' + Phrase + ',' , ID) - ID) , '') AS Word
FROM Tally, Quotes
WHERE ID <= Len(',' + Phrase + ',') AND SubString(',' + Phrase + ',' , ID - 1, 1) = ','
AND CharIndex(',' , ',' + Phrase + ',' , ID) - ID > 0
The idea is to cross join to a predefined table Tally which contains integer 1 through 8000 (or whatever big enough number) and run SubString to find the right ,word, position.
Here's what I've got so far (found it on the page Ben Alpert mentioned):
SELECT REPLACE(
SUBSTRING(
SUBSTRING_INDEX(c.`courseNames`, ',', e.`courseId` + 1)
, LENGTH(SUBSTRING_INDEX(c.`courseNames`, ',', e.`courseId`)
) + 1)
, ','
, ''
)
FROM `clients` c INNER JOIN `clientenrols` e USING (`clientId`)
Well, nothing I used worked, so I decided creating a real simple split function, hope it helps:
DECLARE inipos INTEGER;
DECLARE endpos INTEGER;
DECLARE maxlen INTEGER;
DECLARE item VARCHAR(100);
DECLARE delim VARCHAR(1);
SET delim = '|';
SET inipos = 1;
SET fullstr = CONCAT(fullstr, delim);
SET maxlen = LENGTH(fullstr);
REPEAT
SET endpos = LOCATE(delim, fullstr, inipos);
SET item = SUBSTR(fullstr, inipos, endpos - inipos);
IF item <> '' AND item IS NOT NULL THEN
USE_THE_ITEM_STRING;
END IF;
SET inipos = endpos + 1;
UNTIL inipos >= maxlen END REPEAT;