MySQL group by comma separated list unique [duplicate] - mysql

The column textfield holds comma-separated list values:
ID | textfield
1 | english,russian,german
2 | german,french
3 | english
4 | null
I'm trying to count the number of distinct languages in textfield. The default language is "English", so a null value should count as "English". The correct number of languages is 4 (english, russian, german, french).
Here is my query to attempt doing this:
SELECT SUM((length(`textfield`) - length(replace(`textfield`, ',', '')) + 1)) as my
FROM yourtable;
The result I get is 6; I don't know how to group the languages.
Here is a fiddle:
http://sqlfiddle.com/#!9/0e532/1
The desired result is 4. How do I solve this?

Identifying the source of error
Your query counts how many languages each row holds and adds those counts together, so it does not account for duplicates. English and German each appear twice in the table and are each counted twice, hence six in your example. A second issue is that your current code treats null as what null truly means: the absence of a value.
For example, if your table were
ID | textfield
---|----------
1 | null
you would also be arriving at incorrect results (more on this below).
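To make the difference concrete, here is a minimal Python sketch of the two computations, using the rows from the question. It only mirrors the arithmetic and is not MySQL code; `None` stands in for SQL NULL, and the variable names are made up for illustration.

```python
# Rows copied from the question; None stands in for SQL NULL.
rows = ["english,russian,german", "german,french", "english", None]

# What the original query computes: (number of commas + 1) per row, summed.
# SQL SUM() skips NULL, so the None row contributes nothing here.
per_row_total = sum(r.count(",") + 1 for r in rows if r is not None)

# What the question asks for: distinct languages, reading NULL as "english".
distinct_languages = {
    lang
    for r in rows
    for lang in (r if r is not None else "english").split(",")
}

print(per_row_total)            # 6
print(len(distinct_languages))  # 4
```

The first number double counts english and german; the second collapses them through the set.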
Solution
This returns the languages as a single comma-separated string with no duplicates:
SELECT
GROUP_CONCAT(DISTINCT SUBSTRING_INDEX(SUBSTRING_INDEX(textfield, ',', n.digit+1), ',', -1)) textfield
FROM
yourtable
INNER JOIN
(SELECT 0 digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6) n
ON LENGTH(REPLACE(textfield, ',', '')) <= LENGTH(textfield)-n.digit;
This query can serve as a subquery for what you were attempting in the question. In other words, instead of applying length(`textfield`) ... to the raw column, you would apply it to the column returned by this query.
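The nested SUBSTRING_INDEX calls are the core of the query above. As a rough illustration, the Python stand-in below imitates that function (for a non-empty delimiter) and shows how the nesting picks out the n-th item. The helper names are hypothetical, and `n` here plays the role of `n.digit+1` in the query.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Rough stand-in for MySQL's SUBSTRING_INDEX (non-empty delimiter)."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything before the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])   # everything after the count-th delimiter from the right
    return ""

def nth_item(s: str, n: int) -> str:
    # Keep the first n items, then take the last one of those.
    return substring_index(substring_index(s, ",", n), ",", -1)

text = "english,russian,german"
print([nth_item(text, n) for n in (1, 2, 3)])  # ['english', 'russian', 'german']
print(nth_item(text, 4))                        # 'german' again
```

Asking for a fourth item simply repeats the last one, which is why the join condition has to stop `n` at the number of items in each row.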
Null as in English
In my opinion, this logic should not be implemented at the database level. You can treat null entries as English if you want, but the downside is the example I gave above: when a query counts the total languages in the table and English appears only as a null value rather than explicitly, the query won't count English (it's null). And you can't simply add 1 to the language count every time, because English might already be listed explicitly.
Recommendations:
Avoid comma separated lists in databases by normalizing your data
Let a null field mean "no value" rather than a hidden default
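As a sketch of what these recommendations could look like, the Python below turns the question's rows into one (id, language) pair per language and applies the English default in application code. The layout and the names `item_languages`, `DEFAULT_LANGUAGE` and `languages_for` are assumptions for illustration, not part of the original schema.

```python
# Rows from the question, keyed by ID; None stands in for SQL NULL.
raw = {1: "english,russian,german", 2: "german,french", 3: "english", 4: None}

# Hypothetical normalized layout: one (id, language) row per language.
# A NULL source row simply produces no rows at all.
item_languages = [
    (row_id, lang)
    for row_id, text in raw.items()
    if text is not None
    for lang in text.split(",")
]

DEFAULT_LANGUAGE = "english"  # assumed application-level default

def languages_for(row_id):
    """Return the stored languages, falling back to the default in the application."""
    found = [lang for rid, lang in item_languages if rid == row_id]
    return found or [DEFAULT_LANGUAGE]

print(len(item_languages))                        # 6 rows
print(len({lang for _, lang in item_languages}))  # 4 distinct languages
print(languages_for(4))                           # ['english']
```

With this shape, counting distinct languages no longer needs any string splitting.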

For version 5.6 (like in the fiddle)
SELECT COUNT(DISTINCT SUBSTRING_INDEX(SUBSTRING_INDEX(languages.textfield, ',', numbers.num), ',', -1)) languages_count
FROM (SELECT COALESCE(textfield, 'english') textfield
FROM yourtable) languages
JOIN (SELECT 1 num UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) numbers
ON numbers.num <= LENGTH(languages.textfield) - LENGTH(REPLACE(languages.textfield, ',', '')) + 1;
fiddle
For version 8.x (as claimed in a comment)
SELECT COUNT(DISTINCT jsontable.value) languages_count
FROM yourtable
CROSS JOIN JSON_TABLE( CONCAT('["', REPLACE(COALESCE(textfield, 'english'), ',', '","'), '"]'),
"$[*]" COLUMNS( value VARCHAR(254) PATH "$" )
) AS jsontable;
fiddle
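The 8.x query works by rewriting each list into a JSON array string before handing it to JSON_TABLE. A small Python sketch of that string rewrite, with `json.loads` standing in for the JSON parsing step (the function name is made up for illustration):

```python
import json

def to_json_array(csv_text):
    # Mirrors CONCAT('["', REPLACE(COALESCE(textfield, 'english'), ',', '","'), '"]')
    value = csv_text if csv_text is not None else "english"
    return '["' + value.replace(",", '","') + '"]'

rows = ["english,russian,german", "german,french", "english", None]

arrays = [to_json_array(r) for r in rows]
print(arrays[1])  # ["german","french"]

languages = {lang for a in arrays for lang in json.loads(a)}
print(len(languages))  # 4
```

Note that this rewrite assumes the values contain no double quotes or backslashes; either would produce invalid JSON.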

Related

counting comma separated values mysql-postgre

I have a table called "feedback" with one column called "emotions". The values in that column vary in content and length, like
emotions
sad, happy
happy, angry, boring
boring
sad, happy, boring, laugh
and so on, with different values and different lengths.
So the question is: what query, in MySQL or Postgres, would produce this result:
emotion | count
happy | 3
angry | 1
sad | 2
boring | 3
laugh | 1
Based on "SQL: Count of items in comma-separated column in a table", we could try using
SELECT value as [Holiday], COUNT(*) AS [Count]
FROM OhLog
CROSS APPLY STRING_SPLIT([Holidays], ',')
GROUP BY value
but that won't help because it is for SQL Server, not MySQL or Postgres. Does anyone have an idea how to translate that SQL Server query to MySQL?
Thank you so much, I really appreciate it.
Using Postgres:
create table emotions(id integer, emotions varchar);
insert into emotions values (1, 'sad, happy');
insert into emotions values (2, 'happy, angry, boring');
insert into emotions values (3, 'boring');
insert into emotions values (4, 'sad, happy, boring, laugh');
select
emotion, count(*)
from
(select
trim(regexp_split_to_table(emotions, ',')) as emotion
from emotions) as t
group by
emotion;
emotion | count
---------+-------
happy | 3
sad | 2
boring | 3
laugh | 1
angry | 1
The regexp_split_to_table function (see String functions in the Postgres documentation) splits the string on ',' and returns the individual elements as rows. Because there are spaces between the ',' and the following word, trim is used to remove them. This generates a 'table' that is used as a sub-query. The outer query then groups by the emotion field and counts the rows.
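The same split, trim and group steps can be sketched in a few lines of Python, which also shows why the trim matters. This is only an illustration of the logic using the four sample rows, not Postgres code.

```python
from collections import Counter

rows = ["sad, happy", "happy, angry, boring", "boring", "sad, happy, boring, laugh"]

# Split on ',' then trim, as regexp_split_to_table + trim do in the query.
counts = Counter(part.strip() for row in rows for part in row.split(","))
print(counts["happy"], counts["sad"], counts["boring"], counts["laugh"], counts["angry"])
# 3 2 3 1 1

# Without the trim, ' happy' and 'happy' are grouped separately.
untrimmed = Counter(part for row in rows for part in row.split(","))
print(len(counts), len(untrimmed))  # 5 7
```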
Try the following using MySQL 8.0:
WITH recursive numbers AS
(
select 1 as n
union all
select n + 1 from numbers where n < 100
)
,
Counts as (
select trim(substring_index(substring_index(emotions, ',', n),',',-1)) as emotions
from Emotions
join numbers
on char_length(emotions) - char_length(replace(emotions, ',', '')) >= n - 1
)
select emotions,count(emotions) as counts from Counts
group by emotions
order by emotions
See a demo from db-fiddle.
The recursive query generates the numbers 1 to 100, on the assumption that no row holds more than 100 sub-strings; change this number as needed.
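The join against the numbers CTE can be pictured as pairing each row with every position it actually has. A minimal Python sketch of that pairing, using the same test as the ON clause (the variable names are illustrative):

```python
rows = ["sad, happy", "happy, angry, boring", "boring", "sad, happy, boring, laugh"]
numbers = range(1, 101)  # plays the role of the recursive CTE (1..100)

# Keep a (row, n) pair only when the row has at least n items,
# i.e. when its comma count is >= n - 1, as in the ON clause.
pairs = [(row, n) for row in rows for n in numbers if row.count(",") >= n - 1]

# Take the n-th item of each pair and trim it.
emotions = [row.split(",")[n - 1].strip() for row, n in pairs]

print(len(pairs))             # 10, one pair per list item
print(sorted(set(emotions)))  # ['angry', 'boring', 'happy', 'laugh', 'sad']
```

Numbers beyond a row's item count never pair with it, so the upper bound of 100 only matters for rows with more than 100 items.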
I've used MySQL 8.0; this query has no limit on the number of items per string. (Thanks to Ahmed for the idea of using a recursive clause.)
WITH RECURSIVE cte AS (
SELECT ( LENGTH(REGEXP_REPLACE(emotions, ' ?[A-z]+ ?', ''))+1) AS n, emotions AS subs
FROM feedback
UNION ALL
SELECT n-1 AS n, ( SUBSTRING_INDEX(subs, ', ', n-1) ) AS subs
FROM cte
HAVING n>0
)
SELECT SUBSTRING_INDEX(subs, ', ', -1) AS emotions, COUNT(subs) AS cnt
FROM cte
GROUP BY emotions
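As I read it, the recursive part keeps a shrinking prefix of each list, and the final SELECT takes the last item of every prefix. The Python sketch below traces that idea for one row; `peel` is a hypothetical helper, and the sketch does not reproduce the REGEXP_REPLACE counting step.

```python
from collections import Counter

def peel(text, sep=", "):
    """Yield the successive prefixes: all n items, then n-1, ... down to 1."""
    items = text.split(sep)
    for n in range(len(items), 0, -1):
        yield sep.join(items[:n])

prefixes = list(peel("sad, happy, boring, laugh"))
print(prefixes[0])   # sad, happy, boring, laugh
print(prefixes[-1])  # sad

# The last item of each prefix is one emotion, so every item appears exactly once.
last_items = [p.split(", ")[-1] for p in prefixes]
print(last_items)    # ['laugh', 'boring', 'happy', 'sad']

rows = ["sad, happy", "happy, angry, boring", "boring", "sad, happy, boring, laugh"]
counts = Counter(p.split(", ")[-1] for row in rows for p in peel(row))
print(counts["happy"])  # 3
```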

Find all numbers that are present more than 3 times in a column of CSVs

I basically want this: if a certain number is present >= 3 times, then do some action ...
My table's column is this:
As you can see, the number 38 is present >= 3 times in the absent_sids column, so I want to take some action on him, like a ban or something else. But I don't know what SQL query I should write because:
1. I am quite new to PHP/MySQL
2. The column holds comma-separated numbers, and it's quite difficult for me to search this column with a MySQL query and return the absent_sid values that occur >= 3 times in a given period of time/date.
Please help.
This is quite long, but it works. Steps: 1) convert the comma-separated list into rows using the CHAR_LENGTH and REPLACE functions; 2) use GROUP BY and HAVING COUNT to find the numbers that exist 3 or more times.
See demo here: http://sqlfiddle.com/#!9/39afc0/2
SELECT absent_sids
from (
SELECT
tablename.aid,
SUBSTRING_INDEX(
SUBSTRING_INDEX(
tablename.absent_sids, ',', numbers.n), ',', -1) as absent_sids
FROM
(select ORDINAL_POSITION as n
from INFORMATION_SCHEMA.COLUMNS
where table_name='COLUMNS'
and ORDINAL_POSITION <= (
select round(max(length(absent_sids))/2)
from tablename)) numbers
INNER JOIN tablename
ON CHAR_LENGTH(tablename.absent_sids)
-CHAR_LENGTH(REPLACE(tablename.absent_sids, ',', ''))
>= numbers.n-1) tab
GROUP BY absent_sids
HAVING COUNT(*) >= 3
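The two steps above (split into rows, then keep values seen three or more times) can be sketched in Python as follows. The sample rows are hypothetical, since the original table was only shown as an image; only the value 38 is taken from the question.

```python
from collections import Counter

# Hypothetical absent_sids values, one string per table row.
absent_sids_rows = ["38,41", "38", "12,38,41", "7"]

# Step 1: turn every comma-separated list into individual values.
all_sids = [sid for row in absent_sids_rows for sid in row.split(",")]

# Step 2: the GROUP BY ... HAVING COUNT(*) >= 3 part.
counts = Counter(all_sids)
frequent = sorted(sid for sid, c in counts.items() if c >= 3)

print(frequent)  # ['38']
```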

UNNEST function in MYSQL like POSTGRESQL

Is there a function in MySQL like "unnest" from PostgreSQL?
Query (PSQL):
select unnest('{1,2,3,4}'::int[])
Result (as table):
int |
_____|
1 |
_____|
2 |
_____|
3 |
_____|
4 |
_____|
Short answer
Yes, it is possible. From a technical viewpoint, you can achieve that with one query. But most probably you are trying to move some logic from the application into the data storage. Data storage is intended to store data, not to represent or format it or, even more so, to apply logic to it.
True, MySQL doesn't have an array data type, but in most cases that won't be a problem, and the architecture can be designed to fit that limitation. And in any case, even if you achieve it somehow (see below), you won't be able to work properly with that data later, since it will be just a result set. You may store it, of course, say, to index it later, but creating that import is again a task for an application.
Also, make sure that this is not a case of the Jaywalking antipattern, that is, storing delimiter-separated values and later trying to extract them.
Long answer
From a technical viewpoint, you can do it with a Cartesian product of two row sets. Then use a well-known formula:
$N = d_1 \times 10^1 + d_2 \times 10^2 + \dots$
Thus, you'll be able to create an "all-numbers" table and later iterate through it. That iteration, together with MySQL string functions, may lead you to something like this:
SELECT
data
FROM (
SELECT
@next:=LOCATE(@separator,@search, @current+1) AS next,
SUBSTR(SUBSTR(@search, @current, @next-@current), @length+1) AS data,
@next:=IF(@next, @next, NULL) AS marker,
@current:=@next AS current
FROM
(SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n1
CROSS JOIN
(SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n2
CROSS JOIN
(SELECT
-- set your separator here:
@separator := ',',
-- set your string here:
@data := '1,25,42,71',
-- and do not touch here:
@current := 1,
@search := CONCAT(@separator, @data, @separator),
@length := CHAR_LENGTH(@separator)) AS init
) AS joins
WHERE
marker IS NOT NULL
The corresponding fiddle would be here.
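To show what the query above is doing, here is a rough Python rendering of its two ingredients: the cross join of two digit tables, which only supplies 100 rows to iterate over, and the walk from one separator to the next. It uses 0-based string positions where the SQL uses 1-based ones, and the function name is made up for illustration.

```python
from itertools import product

# CROSS JOIN of two 0..9 tables: 10 x 10 = 100 rows to iterate over.
digits = range(10)
row_count = len(list(product(digits, digits)))
print(row_count)  # 100

def split_by_locate(data, separator=","):
    """Walk from separator to separator, like the LOCATE/SUBSTR pair in the query."""
    search = separator + data + separator            # CONCAT(@separator, @data, @separator)
    current = 0
    pieces = []
    while True:
        nxt = search.find(separator, current + 1)    # LOCATE(@separator, @search, @current+1)
        if nxt == -1:                                # no further separator: stop
            break
        pieces.append(search[current + len(separator):nxt])
        current = nxt
    return pieces

print(split_by_locate("1,25,42,71"))  # ['1', '25', '42', '71']
```

In the SQL version the loop is replaced by the 100 joined rows, and the `marker IS NOT NULL` filter discards the rows left over once the separators run out.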
You should also note that this is not a function. With functions (I mean user-defined ones created with a CREATE FUNCTION statement) it's impossible to get a row set as the result, since a function in MySQL cannot return a result set by definition. However, it's not true to say that it's completely impossible to perform the requested transformation in MySQL.
But remember: just because you are able to do something doesn't mean you should do it.
This sample fetches all "catchwords" from the table data, which are separated by ",".
The maximum number of values in the comma-separated list is 100.
WITH RECURSIVE num (n) AS (
SELECT 1
UNION ALL
SELECT n+1 FROM num WHERE n<100 -- change this, if more than 100 elements
)
SELECT DISTINCT substring_index(substring_index(catchwords, ',', n), ',', -1) as value
FROM data
JOIN num
ON char_length(catchwords) - char_length(replace(catchwords, ',', '')) >= n - 1
In newer versions of MySQL/MariaDB you can use JSON_TABLE if you can JOIN the elements:
-- the select list must use the alias given to json_table (words)
SELECT words.catchword, dat.*
FROM data dat
CROSS JOIN json_table(concat('[',dat.catchwords, ']')
, '$[*]' COLUMNS(
catchword VARCHAR(50) PATH '$'
)
) AS words

SQL query - tagging system, delimited string + unique values

I am working for a client that stores item tags in the MySQL DB like so (I know, I know - not ideal):
coats_and_jackets-Woven_Jacket-brand:Hobbs;
coats_and_jackets-Woven_Jacket-color:Black;
coats_and_jackets-Woven_Jacket-style:Boucle;
coats_and_jackets-Woven_Jacket-pattern:Plain;
dresses-Pinafore-brand:COS;
dresses-Pinafore-color:Blue _ Navy;
dresses-Pinafore-style:Wool;
dresses-Pinafore-pattern:Plain;
shoes-Ankle_Boot-brand:Topshop;
shoes-Ankle_Boot-color:Black;
shoes-Ankle_Boot-style:Leather;
shoes-Ankle_Boot-pattern:Plain;
bags-Tote-brand:Mulberry;
bags-Tote-color:Brown _ Tan;
bags-Tote-style:Leather;
bags-Tote-pattern:Plain;
shoes-Ballet_shoes-brand:Chanel;
shoes-Ballet_shoes-color:Black;
shoes-Ballet_shoes-style:Leather;
shoes-Ballet_shoes-pattern:Plain;
accessories-Scarf-brand:Zara;
accessories-Scarf-color:Brown _ Tan;
accessories-Scarf-style:Wool;
accessories-Scarf-pattern:Checked;
Each tag is broken down into 4 parts like so: category-type-brand, category-type-color, category-type-style, category-type-pattern
Not all 4 parts of a tag are required; any of them can be omitted from the DB.
I have been tasked with finding out how many tags an item has, so in this example 6 tags have been used, each with all 4 parts.
The query I have so far counts all the tag parts, in this example 24, but I cannot assume that each tag will have all 4 parts stored. So cannot divide the parts amount by 4 to get the amount of tags.
In this example, the 6 tags used are as follows:
Coats & Jackets (Woven Jacket)
Dresses (Pinafore)
Shoes (Ankle boot)
Bags (Tote)
Shoes (Ballet Shoes)
Accessories (Scarf)
Now I'm not concerned about the category, type or parts (brand, color, style, pattern) - I'm just concerned about fetching the total amount of tags for this item.
Also, the data example above would be stored in a db row that looks like:
+----------+-------------+----------------------------+
| ID | meta_key | meta_value |
+----------+-------------+----------------------------+
| 1 | tags | coats_and_jackets-wove... |
+----------+-------------+----------------------------+
| 2 | item_desc | Fashion editor |
+----------+-------------+----------------------------+
Help structuring this query would be much appreciated.
The tags use hyphen as a separator. Here is a method for finding the number of tags used by a given item:
select it.*, length(it.tags) - length(replace(it.tags, '-', ''))+1
from itemtags it
This replaces the hyphen with an empty string, and measures the difference in lengths.
Assuming I'm understanding your requirement correctly, how about something like this (with CTE used to demonstrate assumed table structure)
WITH CTE1(tag) AS(
select 'coats_and_jackets-Woven_Jacket-brand:Hobbs' union
-- ...
select 'accessories-Scarf-color:Brown _ Tan' union
select 'accessories-Scarf-style:Wool' union
select 'accessories-Scarf-pattern:Checked'
)
, CTE2(tag_prefix) AS(
select LEFT(tag, CHARINDEX('-', tag, CHARINDEX('-', tag) + 1) - 1) from CTE1
)
select tag_prefix, COUNT(*) from CTE2 group by tag_prefix
This will give you results of...
accessories-Scarf 4
bags-Tote 4
coats_and_jackets-Woven_Jacket 4
dresses-Pinafore 4
shoes-Ankle_Boot 4
shoes-Ballet_shoes 4
... which gives you the tag prefix and number of parts used. From there you can count the individual rows or sum the number of parts or whatever else you need...
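Since the query above is not MySQL, the same idea (group by everything before the second hyphen) is sketched here in Python, using the 24 tag parts from the question. The helper name `tag_prefix` is made up, and the sketch assumes every part contains at least two hyphens.

```python
from collections import Counter

parts = [
    "coats_and_jackets-Woven_Jacket-brand:Hobbs",
    "coats_and_jackets-Woven_Jacket-color:Black",
    "coats_and_jackets-Woven_Jacket-style:Boucle",
    "coats_and_jackets-Woven_Jacket-pattern:Plain",
    "dresses-Pinafore-brand:COS",
    "dresses-Pinafore-color:Blue _ Navy",
    "dresses-Pinafore-style:Wool",
    "dresses-Pinafore-pattern:Plain",
    "shoes-Ankle_Boot-brand:Topshop",
    "shoes-Ankle_Boot-color:Black",
    "shoes-Ankle_Boot-style:Leather",
    "shoes-Ankle_Boot-pattern:Plain",
    "bags-Tote-brand:Mulberry",
    "bags-Tote-color:Brown _ Tan",
    "bags-Tote-style:Leather",
    "bags-Tote-pattern:Plain",
    "shoes-Ballet_shoes-brand:Chanel",
    "shoes-Ballet_shoes-color:Black",
    "shoes-Ballet_shoes-style:Leather",
    "shoes-Ballet_shoes-pattern:Plain",
    "accessories-Scarf-brand:Zara",
    "accessories-Scarf-color:Brown _ Tan",
    "accessories-Scarf-style:Wool",
    "accessories-Scarf-pattern:Checked",
]
# Assumed storage format: parts joined with ';' plus a trailing ';'.
meta_value = ";".join(parts) + ";"

def tag_prefix(part):
    # "category-type", i.e. everything before the second hyphen.
    category, item_type, _rest = part.split("-", 2)
    return category + "-" + item_type

parts_per_tag = Counter(tag_prefix(p) for p in meta_value.split(";") if p)

print(len(parts_per_tag))                   # 6 tags
print(parts_per_tag["shoes-Ballet_shoes"])  # 4 parts
```

Counting the distinct prefixes gives the number of tags regardless of how many of the four parts each tag actually stores.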
I've just realised that my solution is completely pointless given that I missed the 'mysql' tag ;) but I'll post it up here anyway. Hopefully it can give you a pointer on how to proceed.
WITH CTE1(ID, meta_key, meta_value) AS(
select 1, 'tags', 'coats_and_jackets-Wo...' union all
select 2, 'item_desc', 'Fashion editor'
)
, TagsCTE AS(
select t.ID, x.Item as tag_and_value
from CTE1 t
cross apply dbo.fn_SplitString(t.meta_value, ';') x
where meta_key = 'tags' and LEN(x.Item) > 0
)
select ID, COUNT(parts_count) from (
select ID, COUNT(*) as parts_count
from TagsCTE
group by ID, LEFT(tag_and_value, CHARINDEX('-', tag_and_value, CHARINDEX('-', tag_and_value) + 1) - 1)
) a group by ID
This gives results of:
1 6
Good luck.

Counting comma separated values in TSQL

SCHEMA / DATA for TABLE :
SubscriberId NewsletterIdCsv
------------ ---------------
11 52,52,,52
We have this denormalized data, where I need to count the number of comma separated values, for which I am doing this :
SELECT SUM(len(newsletteridcsv) - len(replace(rtrim(ltrim(newsletteridcsv)), ',','')) +1) as SubscribersSubscribedtoNewsletterCount
FROM TABLE
WHERE subscriberid = 11
Result :
SubscribersSubscribedtoNewsletterCount
--------------------------------------
4
The problem is that some of our data has blanks / spaces between the comma-separated values. For the row above the expected result is 3 (one of the values is blank), but the query returns 4. How do I exclude the blank values in my query?
EDIT :
DATA :
SubscriberId NewsletterIdCsv
------------ ---------------
11 52,52,,52
12 22,23
I need a cumulative SUM rather than a per-row sum, so for the data above I need a single final count, i.e. 5 in this case, excluding the blank value.
Here's one solution, although there may be a more efficient way:
SELECT A.[SubscriberId],
SUM(CASE WHEN Split.a.value('.', 'VARCHAR(100)') = '' THEN 0 ELSE 1 END) cnt
FROM
(
SELECT [SubscriberId],
CAST ('<M>' + REPLACE(NewsletterIdCsv, ',', '</M><M>') + '</M>' AS XML) AS String
FROM YourTable
) AS A
CROSS APPLY String.nodes ('/M') AS Split(a)
GROUP BY A.[SubscriberId]
And the SQL Fiddle.
Basically it converts your NewsletterIdCsv field to XML and then uses CROSS APPLY to split the data. Finally, it uses CASE to check whether each value is blank and SUMs the non-blank values. Alternatively, you could probably build a UDF to do something similar.
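For comparison, the split-then-skip-blanks logic looks like this in Python, using the two rows from the question's edit. It shows both the per-subscriber counts that the query above returns and the single total the question asks for; the function name is illustrative.

```python
rows = {11: "52,52,,52", 12: "22,23"}

def non_blank_count(csv_text):
    # Count only values that are not empty or whitespace.
    return sum(1 for value in csv_text.split(",") if value.strip())

per_subscriber = {sid: non_blank_count(text) for sid, text in rows.items()}
total = sum(per_subscriber.values())

print(per_subscriber)  # {11: 3, 12: 2}
print(total)           # 5
```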