How to split / unpivot group_concat in mysql? - mysql

I currently have a view with a column called alt_email_contact, which I build with the group_concat function to collect the multiple emails associated with one contact. However, I want to be able to extract each email and create a separate column for each.
Example:
id email
1 SkyW@gmail.com, SW@gmail.com, WW@gmail.com, WalterW@gmail.com
The number of emails varies from one user to another, so there won't always be four emails per user. I want to create a new column for each email, like so:
id email_1 email_2 email_3 email_4
1 SkyW@gmail.com SW@gmail.com WW@gmail.com WalterW@gmail.com
(I am using phpmyadmin) I would like to be able to modify my view to contain the variable amount of emails per user.

You can use substring_index() to achieve what you want. It is not really pretty, but it will work:
select id,
substring_index(emails, ', ', 1) as email_1,
(case when length(emails) - length(replace(emails, ',', '')) >= 1
then substring_index(substring_index(emails, ', ', 2), ', ', -1)
end) as email_2,
(case when length(emails) - length(replace(emails, ',', '')) >= 2
then substring_index(substring_index(emails, ', ', 3), ', ', -1)
end) as email_3,
(case when length(emails) - length(replace(emails, ',', '')) >= 3
then substring_index(substring_index(emails, ', ', 4), ', ', -1)
end) as email_4,
(case when length(emails) - length(replace(emails, ',', '')) >= 4
then substring_index(substring_index(emails, ', ', 5), ', ', -1)
end) as email_5
from table t;
You can insert these values into another table, if you like.
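To see why the nested substring_index() calls pick out the n-th address, here is a small Python sketch of the same logic. The helper substring_index below is a hypothetical stand-in written for illustration (it is not part of any MySQL client library), and the sample addresses are made up.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Rough stand-in for MySQL's SUBSTRING_INDEX; count is assumed non-zero."""
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter.
        return delim.join(parts[:count])
    # Negative count: everything to the right of the count-th delimiter from the end.
    return delim.join(parts[count:])


def nth_email(emails: str, n: int):
    """Return the n-th address (1-based), or None when the list has fewer than n items."""
    # Same trick as the query: the length difference counts the commas.
    commas = len(emails) - len(emails.replace(",", ""))
    if commas < n - 1:
        return None
    # Inner call keeps the first n items, outer call keeps the last of those.
    return substring_index(substring_index(emails, ", ", n), ", ", -1)


emails = "a@example.com, b@example.com, c@example.com"
print([nth_email(emails, n) for n in range(1, 6)])
```

The guard on the comma count plays the role of the CASE expressions: without it, asking for a position past the end would simply return the last address again.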

Don't use that view, since GROUP_CONCAT() has already ruined the normalization. You want to simulate a pivot table using the limited SQL capabilities of MySQL.
Let's assume that your view is based on a Contacts table that looks like this:
CREATE TABLE Contacts
( id INTEGER NOT NULL
, alt_email_contact VARCHAR(256) NOT NULL
);
Create this helper view instead (which is basically RANK() OVER (PARTITION BY id ORDER BY alt_email_contact), except that MySQL versions before 8.0 don't support window functions such as RANK()):
CREATE VIEW NumberedContacts AS
SELECT c1.id, c1.alt_email_contact, COUNT(*) AS rank
FROM Contacts c1
INNER JOIN Contacts c2
ON c2.id = c1.id AND
c1.alt_email_contact >= c2.alt_email_contact
GROUP BY c1.id, c1.alt_email_contact;
Then you can write this query or view, which gives you up to 5 alternate e-mail addresses, ordered alphabetically:
CREATE VIEW ContactsForImport AS
SELECT c1.id
, c1.alt_email_contact AS email_1
, c2.alt_email_contact AS email_2
, c3.alt_email_contact AS email_3
, c4.alt_email_contact AS email_4
, c5.alt_email_contact AS email_5
FROM NumberedContacts AS c1
LEFT OUTER JOIN NumberedContacts AS c2
ON c1.id = c2.id AND c2.rank = 2
LEFT OUTER JOIN NumberedContacts AS c3
ON c1.id = c3.id AND c3.rank = 3
LEFT OUTER JOIN NumberedContacts AS c4
ON c1.id = c4.id AND c4.rank = 4
LEFT OUTER JOIN NumberedContacts AS c5
ON c1.id = c5.id AND c5.rank = 5
WHERE c1.rank = 1;
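The rank-then-pivot idea behind the two views can also be sketched outside the database. The Python below is only an illustration under assumptions: the function name pivot_emails and the sample rows are invented, and it assumes no duplicate addresses per id (so a plain sort behaves like the rank computed by the helper view).

```python
from collections import defaultdict


def pivot_emails(rows, max_cols=5):
    """Turn (id, email) pairs into {id: [email_1, ..., email_N]} padded with None."""
    by_id = defaultdict(list)
    for contact_id, email in rows:
        by_id[contact_id].append(email)

    pivoted = {}
    for contact_id, emails in by_id.items():
        # Alphabetical order stands in for the rank; extra addresses are dropped.
        ranked = sorted(emails)[:max_cols]
        # Missing positions become None, like the LEFT OUTER JOINs producing NULL.
        pivoted[contact_id] = ranked + [None] * (max_cols - len(ranked))
    return pivoted


rows = [(1, "b@example.com"), (1, "a@example.com"), (2, "c@example.com")]
print(pivot_emails(rows))
```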

Related

How to put values starting from the right side into columns? [duplicate]

I have a table with a single column in a Postgres 13.1 database. It consists of many rows with comma-separated values - around 20 elements at most.
I want to split the data into multiple columns. But I only have a limited number of columns, say 5, and some rows hold more than 5 CSV values, so excess values must be shifted to a new (next) row. How can I do this?
Example:
a1, b1, c1
a2, b2, c2, d2, e2, f2
a3, b3, c3, d3, e3, f3, g3, h3, i3, j3
a4
a5, b5, c5
'
'
'
Columns are only 5, so the output would be like:
c1 c2 c3 c4 c5
---------------
a1 b1 c1
a2 b2 c2 d2 e2
f2
a3 b3 c3 d3 e3
f3 g3 h3 i3 j3
a4
a5 b5 c5
'
'
'
It is typically bad design to store CSV values in a single column. If at all possible, use an array or a properly normalized design instead.
While stuck with your current situation ...
For known small maximum number of elements
A simple solution without trickery or recursion will do:
SELECT id, 1 AS rnk
, split_part(csv, ', ', 1) AS c1
, split_part(csv, ', ', 2) AS c2
, split_part(csv, ', ', 3) AS c3
, split_part(csv, ', ', 4) AS c4
, split_part(csv, ', ', 5) AS c5
FROM tbl
WHERE split_part(csv, ', ', 1) <> '' -- skip empty rows
UNION ALL
SELECT id, 2
, split_part(csv, ', ', 6)
, split_part(csv, ', ', 7)
, split_part(csv, ', ', 8)
, split_part(csv, ', ', 9)
, split_part(csv, ', ', 10)
FROM tbl
WHERE split_part(csv, ', ', 6) <> '' -- skip empty rows
-- three more blocks to cover a maximum "around 20"
ORDER BY id, rnk;
id being the PK of the original table.
This assumes ', ' as separator, obviously.
You can adapt easily.
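As a rough illustration of what each UNION ALL block above produces, here is a Python sketch. Both split_part and blocks are hypothetical helpers written for this example; split_part only imitates the behavior relied on here (1-based field access, empty string past the end).

```python
def split_part(s: str, delim: str, n: int) -> str:
    """Imitate the n-th field lookup: 1-based, empty string when out of range."""
    parts = s.split(delim)
    return parts[n - 1] if 1 <= n <= len(parts) else ""


def blocks(csv: str, width: int = 5, max_blocks: int = 4):
    """Return (rnk, [c1..c5]) tuples, one per non-empty block of `width` fields."""
    out = []
    for rnk in range(1, max_blocks + 1):
        first = (rnk - 1) * width + 1
        # Mirrors the WHERE clause: skip the block if its first field is empty.
        if split_part(csv, ", ", first) == "":
            continue
        out.append((rnk, [split_part(csv, ", ", first + i) for i in range(width)]))
    return out


print(blocks("a2, b2, c2, d2, e2, f2"))
```

With max_blocks=4 the sketch covers up to 20 elements, matching the "around 20" mentioned in the question.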
Related:
Split comma separated column data into additional columns
For unknown number of elements
There are various ways. One is to use regexp_replace() to replace every fifth separator before unnesting ...
-- for any number of elements
SELECT t.id, c.rnk
, split_part(c.csv5, ', ', 1) AS c1
, split_part(c.csv5, ', ', 2) AS c2
, split_part(c.csv5, ', ', 3) AS c3
, split_part(c.csv5, ', ', 4) AS c4
, split_part(c.csv5, ', ', 5) AS c5
FROM tbl t
, unnest(string_to_array(regexp_replace(csv, '((?:.*?,){4}.*?),', '\1;', 'g'), '; ')) WITH ORDINALITY c(csv5, rnk)
ORDER BY t.id, c.rnk;
This assumes that the chosen separator ; never appears in your strings. (Just like , can never appear.)
The regular expression pattern is the key: '((?:.*?,){4}.*?),'
(?:) ... “non-capturing” set of parentheses
() ... “capturing” set of parentheses
*? ... non-greedy quantifier
{4} ... sequence of exactly 4 matches
The replacement '\1;' contains the back-reference \1.
'g' as fourth function parameter is required for repeated replacement.
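The same pattern can be tried out in Python, whose re module accepts this syntax as well. This is only a sketch to make the regular expression easier to follow: split_into_rows is a made-up name, and re.sub replaces all matches by default, which corresponds to the 'g' flag.

```python
import re


def split_into_rows(csv: str, width: int = 5):
    """Replace every width-th comma with ';', then split into rows of fields."""
    # (width - 1) lazy "anything up to a comma" groups, one more lazy run, then
    # the comma to replace. Captured text is put back, followed by ';'.
    pattern = r"((?:.*?,){%d}.*?)," % (width - 1)
    marked = re.sub(pattern, r"\1;", csv)
    # Assumes ';' never appears in the data, as stated above.
    return [chunk.split(", ") for chunk in marked.split("; ")]


print(split_into_rows("a3, b3, c3, d3, e3, f3, g3, h3, i3, j3"))
```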
Further reading:
PostgreSQL & regexp_split_to_array + unnest
Apply `trim()` and `regexp_replace()` on text array
PostgreSQL unnest() with element number
Other ways to solve this include a recursive CTE or a set-returning function ...
Fill from right to left
(Like you added in How to put values starting from the right side into columns?)
Simply count down numbers like:
SELECT t.id, c.rnk
, split_part(c.csv5, ', ', 5) AS c1
, split_part(c.csv5, ', ', 4) AS c2
, split_part(c.csv5, ', ', 3) AS c3
, split_part(c.csv5, ', ', 2) AS c4
, split_part(c.csv5, ', ', 1) AS c5
FROM ...
CREATE UNLOGGED TABLE foo( x TEXT );
\copy foo FROM stdin
a1, b1, c1
a2, b2, c2, d2, e2, f2
a3, b3, c3, d3, e3, f3, g3, h3, i3, j3
a4
a5, b5, c5
\.
From lines to single column...
SELECT (ROW_NUMBER() OVER () - 1)/5 AS r, u FROM (SELECT unnest(string_to_array(x,', ')) u from foo) y;
r | u
---+----
0 | a1
0 | b1
0 | c1
0 | a2
0 | b2
1 | c2
1 | d2
...etc
...and back to lines of known length.
SELECT r,array_agg(u) a FROM (
SELECT (ROW_NUMBER() OVER () - 1)/5 AS r, u FROM (
SELECT unnest(string_to_array(x,', ')) u from foo) y) y1
GROUP BY r ORDER BY r;
r | a
---+------------------
0 | {a1,b1,c1,a2,b2}
1 | {c2,d2,e2,f2,a3}
2 | {b3,c3,d3,e3,f3}
3 | {g3,h3,i3,j3,a4}
4 | {a5,b5,c5}
After this you can insert it into a table using a[] for each column. What to do with the last line is left as an exercise to the reader...
Answer to related question: How to put values starting from the right side into columns?
The great accepted answer from @ErwinBrandstetter can easily be adapted to the required right-to-left output.
You just need to change the order of the split parts, so you don't return split parts 1-5 and 6-10 but 5-1 and 10-6:
SELECT id, 1 AS rnk
, split_part(csv, ', ', 5) AS c1
, split_part(csv, ', ', 4) AS c2
, split_part(csv, ', ', 3) AS c3
, split_part(csv, ', ', 2) AS c4
, split_part(csv, ', ', 1) AS c5
FROM tbl
WHERE split_part(csv, ', ', 1) <> '' -- skip empty rows
UNION ALL
SELECT id, 2
, split_part(csv, ', ', 10)
, split_part(csv, ', ', 9)
, split_part(csv, ', ', 8)
, split_part(csv, ', ', 7)
, split_part(csv, ', ', 6)
FROM tbl
WHERE split_part(csv, ', ', 6) <> '' -- skip empty rows
-- more?
ORDER BY id, rnk;
You need to do this in whatever backend layer you are using.
First, convert each CSV row to an array of strings.
Then use logic like the following to build the rows you add to the database:
// Needs: import java.util.ArrayList; import java.util.Arrays; import java.util.List;
final int MAX_COLUMNS = 5;
List<String[]> dbRows = new ArrayList<>(); // each entry becomes one database row
for (String line : rows) {
    // Convert the CSV row string to an array of trimmed values.
    String[] values = line.trim().split("\\s*,\\s*");
    // Cut the values into chunks of at most MAX_COLUMNS; each chunk is one output row.
    for (int start = 0; start < values.length; start += MAX_COLUMNS) {
        int end = Math.min(start + MAX_COLUMNS, values.length);
        dbRows.add(Arrays.copyOfRange(values, start, end));
    }
}

Get Values not in the second table using find_in_set

I have two tables and I need to get a list of all store IDs from one that are not in the other.
BusinessUnit Table    User Table
StoreId (varchar)     StoreId (varchar)
1                     1,2
2                     3,4
3                     1,5
4                     4,6
7                     4
How can I get the StoreId values 5 and 6, which are present in the User table but not in the BusinessUnit table? I tried several approaches using find_in_set and nothing works.
Use SUBSTRING_INDEX to get all the values from the CSV field. Since there can be up to 6 IDs in the CSV, you need to call it once for each position.
SELECT u.StoreId
FROM (
select substring_index(StoreId, ',', 1) AS StoreID
FROM User
UNION
select substring_index(substring_index(StoreId, ',', 2), ',', -1)
FROM User
UNION
select substring_index(substring_index(StoreId, ',', 3), ',', -1)
FROM User
UNION
select substring_index(substring_index(StoreId, ',', 4), ',', -1)
FROM User
UNION
select substring_index(substring_index(StoreId, ',', 5), ',', -1)
FROM User
UNION
select substring_index(substring_index(StoreId, ',', 6), ',', -1)
FROM User) AS u
LEFT JOIN BusinessUnit AS b ON u.StoreId = b.StoreID
WHERE b.StoreId IS NULL
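The query above boils down to "flatten the CSV values, then take a set difference". A minimal Python sketch of that logic, using the sample data from the question, might look like this; missing_store_ids is a hypothetical name, and sorting with int assumes the IDs are numeric strings.

```python
def missing_store_ids(user_rows, business_unit_ids):
    """Return IDs that occur in the User CSV values but not in BusinessUnit."""
    # Flatten every comma-separated value into one set (UNION also de-duplicates).
    in_user = {
        store_id.strip()
        for row in user_rows
        for store_id in row.split(",")
        if store_id.strip()
    }
    # The LEFT JOIN ... IS NULL filter is an anti-join, i.e. a set difference.
    return sorted(in_user - set(business_unit_ids), key=int)


user_rows = ["1,2", "3,4", "1,5", "4,6", "4"]
business_unit_ids = ["1", "2", "3", "4", "7"]
print(missing_store_ids(user_rows, business_unit_ids))
```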
If you know all the possible values (and their number is reasonably manageable), you can populate a new table with them (you can make it TEMPORARY or just DROP it afterwards) and do this:
SELECT *
FROM (
SELECT allIDs.Id
FROM allIDs
INNER JOIN `User` AS u
-- ON CONCAT(',', u.StoreID, ',') LIKE CONCAT('%,', allIDs.Id, ',%')
ON FIND_IN_SET(allIDs.Id, u.StoreID)
) AS IDsInUserTable
LEFT JOIN `BusinessUnit` AS b ON IDsInUserTable.Id = b.StoreID
HAVING b.StoreID IS NULL
;
In this example, allIDs is the aforementioned "possible values" table.

MySQL GROUP BY each comma separated value

Before anyone comments, I did not design this database with comma separated values :)
I have spent time trying to find the answer but all I could find was GROUP_CONCAT() which seemed to do the opposite of what I wanted.
I would like to GROUP BY each of the values within the comma separated value field.
SELECT round(avg(DATEDIFF( dateClosed , dateAded ) * 1.0), 2) AS avg, department
FROM tickets GROUP BY assignedto
The assignedto field is the comma-separated value field:
row1 54,69,555
row2 54,75,555
row3 75,555
DESIRED OUTPUT: an average rounded figure for each value in assignedto field grouped.
EDIT - TRYING TO TAKE THIS TO THE NEXT LEVEL:
I want to include the ticket answer table to get the first response for that ticket, use its datetime field to work out the average response time for each user.
SELECT a.id as theuser, round(avg(DATEDIFF( ta.dateAded , t.dateAded ) * 1.0), 2) as avg
FROM tickets t join
mdl_user a
on find_in_set(a.id, t.assignedto) > 0
INNER JOIN (SELECT MIN(ta.dateAded) as started FROM ticketanswer GROUP BY ta.ticketId) ta ON t.id = ta.ticketId
GROUP BY a.id ORDER BY avg ASC
Yuck. You can do this, assuming you know the maximum number of assignments. Here is an approach:
select substring_index(substring_index(assignedto, ',', n.n), ',', -1) as assignedto,
round(avg(DATEDIFF( dateClosed , dateAded ) * 1.0), 2) as avg
from tickets t join
(select 1 as n union all select 2 union all select 3) n
on length(assignedto) - length(replace(assignedto, ',', '')) >= n.n - 1
group by substring_index(substring_index(assignedto, ',', n.n), ',', -1);
Or, an easier way if you have a list of assigned values, say in an AssignedTo table:
select a.assignedto, round(avg(DATEDIFF( dateClosed , dateAded ) * 1.0), 2) as avg
from tickets t join
assignedto a
on find_in_set(a.assignedto, t.assignedto) > 0
group by a.assignedto;
I'm sorry you have to deal with this malformed database structure.
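The unpivot-then-group idea from the first query can be checked on a small example in Python. This is a sketch under assumptions: avg_days_per_assignee is an invented name, and the day counts stand in for DATEDIFF(dateClosed, dateAded) with made-up values.

```python
from collections import defaultdict


def avg_days_per_assignee(tickets):
    """tickets: iterable of (assignedto_csv, days_open). Returns {user: average}."""
    totals = defaultdict(lambda: [0, 0])  # user -> [sum of days, ticket count]
    for assignedto, days_open in tickets:
        # One ticket contributes to every user listed in its CSV field.
        for user in assignedto.split(","):
            totals[user][0] += days_open
            totals[user][1] += 1
    return {user: round(total / count, 2) for user, (total, count) in totals.items()}


tickets = [("54,69,555", 3), ("54,75,555", 5), ("75,555", 10)]
print(avg_days_per_assignee(tickets))
```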

mysql select substrings and group them by column

I am trying to split up the data in one of the tables in my MySQL database.
Column contains data like this:
de:"Sweatjacke*";en:"jacket*";pl:"bluza*";
de:"*";en:"*";pl:"bluza*";
fr:"*";de:"*";en:"*";pl:"dres junior*";cz:"*";
pl:"bluza";
I am trying to split all of the translations into separate columns. I already came up with a solution using this
SELECT
SUBSTRING_INDEX(SUBSTRING_INDEX(name, ';', 1), ';', -1) as tr1,
SUBSTRING_INDEX(SUBSTRING_INDEX(name, ';', 2), ';', -1) as tr2,
SUBSTRING_INDEX(SUBSTRING_INDEX(name, ';', 3), ';', -1) as tr3,
SUBSTRING_INDEX(SUBSTRING_INDEX(name, ';', 4), ';', -1) as tr4,
SUBSTRING_INDEX(SUBSTRING_INDEX(name, ';', 5), ';', -1) as tr5
FROM product;
statement, but that results in:
tr1 tr2 tr3 tr4 tr5
fr:"*" de:"*" en:"*" pl:"bluza*" cz:"*"
fr:"*" de:"Sweatjacke*" en:"jacket*" pl:"bluza*" cz:"*"
de:"Sweatjacke*" en:"jacket*" pl:"bluza*"
I want the results grouped by translation type (pl/de/en), so that each column holds one type of translation. For example, column1 = pl:, column2 = en:, etc.
Has anyone come across a similar problem and know a way to solve it?
You need to unpivot the data, then select the first and second part of each value and then re-aggregate it.
However, a better form for the data is really to have language/translation. The following produces this:
select substring_index(tr, ':', 1) as l, substring_index(tr, ':', -1) as t, name
from (select SUBSTRING_INDEX(SUBSTRING_INDEX(name, ';', n.n), ';', -1) as tr, n, name
from product p cross join
(select 1 as n union all select 2 union all select 3 union all select 4 union all
select 5
) n
) n
You would probably want an "id" column or "word" column to identify each row, rather than the name column.
You can now pivot this result to get what you want:
select max(case when l = 'en' then t end) as en,
max(case when l = 'fr' then t end) as fr,
max(case when l = 'de' then t end) as de,
max(case when l = 'pl' then t end) as pl,
max(case when l = 'cz' then t end) as cz
from (select substring_index(tr, ':', 1) as l, substring_index(tr, ':', -1) as t, name
from (select SUBSTRING_INDEX(SUBSTRING_INDEX(name, ';', n.n), ';', -1) as tr, n, name
from product p cross join
(select 1 as n union all select 2 union all select 3 union all select 4 union all
select 5
) n
) n
) lt
group by name;
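For comparison, the same unpivot and re-pivot steps are short to express in Python, which can help when checking what the query should return for a given row. The function names and the column order below are assumptions made for this sketch; the quotes around each translation are kept as they appear in the data.

```python
def parse_translations(name: str) -> dict:
    """Split 'de:"x";en:"y";' into {'de': '"x"', 'en': '"y"'}."""
    result = {}
    for item in name.split(";"):
        if not item:
            continue  # the trailing ';' leaves an empty last item
        # Split on the first ':' only: language code, then the translation.
        lang, _, text = item.partition(":")
        result[lang] = text
    return result


def as_columns(name: str, languages=("pl", "en", "de", "fr", "cz")):
    """Return one value per language in a fixed order, None where missing."""
    found = parse_translations(name)
    return [found.get(lang) for lang in languages]


print(as_columns('de:"Sweatjacke*";en:"jacket*";pl:"bluza*";'))
```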
I managed to solve it using some of the string functions:
SELECT
SUBSTRING_INDEX( SUBSTRING( name, LOCATE( "pl:", name ) , 150 ) , ';', 1 ) AS pl,
SUBSTRING_INDEX( SUBSTRING( name, LOCATE( "en:", name ) , 150 ) , ';', 1 ) AS en,
SUBSTRING_INDEX( SUBSTRING( name, LOCATE( "de:", name ) , 150 ) , ';', 1 ) AS de,
SUBSTRING_INDEX( SUBSTRING( name, LOCATE( "fr:", name ) , 150 ) , ';', 1 ) AS fr
FROM product
Thanks to everyone for help.
As far as I understand you want to UNPIVOT your data. There is no such function in MySQL, so you might want to export your data into MSSQL (you can use free MSSQL Express) and use UNPIVOT function: http://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx

SQL GROUP_CONCAT split in different columns

I searched a lot, but didn't find a proper solution to my problem.
What do I want to do?
I have 2 tables in MySQL:
- Country
- Currency
(I join them together via CountryCurrency --> due to many to many relationship)
See this for a working example: http://sqlfiddle.com/#!2/317d3/8/0
I want to link both tables together using a join, but I want to show just one row per country (some countries have multiple currencies, so that was the first problem).
I found the group_concat function:
SELECT country.Name, country.ISOCode_2, group_concat(currency.name) AS currency
FROM country
INNER JOIN countryCurrency ON country.country_id = countryCurrency.country_id
INNER JOIN currency ON currency.currency_id = countryCurrency.currency_id
GROUP BY country.name
This has the following result:
NAME ISOCODE_2 CURRENCY
Afghanistan AF Afghani
Åland Islands AX Euro
Albania AL Lek
Algeria DZ Algerian Dinar
American Samoa AS US Dollar,Kwanza,East Caribbean Dollar
But what I want now is to split the currencies in different columns (currency 1, currency 2, ...). I already tried functions like MAKE_SET() but this doesn't work.
You can do this with substring_index(). The following query uses yours as a subquery and then applies this logic:
select Name, ISOCode_2,
substring_index(currencies, ',', 1) as Currency1,
(case when numc >= 2 then substring_index(substring_index(currencies, ',', 2), ',', -1) end) as Currency2,
(case when numc >= 3 then substring_index(substring_index(currencies, ',', 3), ',', -1) end) as Currency3,
(case when numc >= 4 then substring_index(substring_index(currencies, ',', 4), ',', -1) end) as Currency4,
(case when numc >= 5 then substring_index(substring_index(currencies, ',', 5), ',', -1) end) as Currency5,
(case when numc >= 6 then substring_index(substring_index(currencies, ',', 6), ',', -1) end) as Currency6,
(case when numc >= 7 then substring_index(substring_index(currencies, ',', 7), ',', -1) end) as Currency7,
(case when numc >= 8 then substring_index(substring_index(currencies, ',', 8), ',', -1) end) as Currency8
from (SELECT country.Name, country.ISOCode_2, group_concat(currency.name) AS currencies,
count(*) as numc
FROM country
INNER JOIN countryCurrency ON country.country_id = countryCurrency.country_id
INNER JOIN currency ON currency.currency_id = countryCurrency.currency_id
GROUP BY country.name
) t
The expression substring_index(currencies, ',', 2) takes the list in currencies up to the second element. For American Samoa, that would be 'US Dollar,Kwanza'. The next call, with -1 as the argument, takes the last element of that list, 'Kwanza', which is the second element of currencies.
Also note that SQL queries return a well-defined set of columns. A query cannot have a variable number of columns (unless you are using dynamic SQL through a PREPARE statement).
Use this query to work out the number of currency columns you'll need:
SELECT MAX(c) FROM
((SELECT count(currency.name) AS c
FROM country
INNER JOIN countryCurrency ON country.country_id = countryCurrency.country_id
INNER JOIN currency ON currency.currency_id = countryCurrency.currency_id
GROUP BY country.name) as t)
Then dynamically build and execute a prepared statement that generates the result, combining Gordon Linoff's solution from this thread with the count returned by the query above.
You can use dynamic SQL, but you will have to wrap it in a stored procedure.