Counting comma separated values in TSQL - csv

SCHEMA / DATA for TABLE :
SubscriberId NewsletterIdCsv
------------ ---------------
11 52,52,,52
We have this denormalized data, where I need to count the number of comma separated values, for which I am doing this :
SELECT SUM(len(newsletteridcsv) - len(replace(rtrim(ltrim(newsletteridcsv)), ',','')) +1) as SubscribersSubscribedtoNewsletterCount
FROM TABLE
WHERE subscriberid = 11
Result :
SubscribersSubscribedtoNewsletterCount
--------------------------------------
4
The problem is that some of our data has blanks or spaces between the commas. For the row above, the expected result is 3 (one of the values is blank). How do I exclude the blank values in my query?
EDIT :
DATA :
SubscriberId NewsletterIdCsv
------------ ---------------
11 52,52,,52
12 22,23
I need a cumulative SUM across all rows rather than a sum per row, so for the data above I need a single final count, i.e. 5 in this case, excluding the blank value.

Here's one solution, although there may be a more efficient way:
SELECT A.[SubscriberId],
SUM(CASE WHEN Split.a.value('.', 'VARCHAR(100)') = '' THEN 0 ELSE 1 END) cnt
FROM
(
SELECT [SubscriberId],
CAST ('<M>' + REPLACE(NewsletterIdCsv, ',', '</M><M>') + '</M>' AS XML) AS String
FROM YourTable
) AS A
CROSS APPLY String.nodes ('/M') AS Split(a)
GROUP BY A.[SubscriberId]
And the SQL Fiddle.
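Basically, it converts your NewsletterIdCsv field to XML and then uses CROSS APPLY to split the data. Finally, it uses CASE to check whether each value is blank and SUMs the non-blank ones. To get a single total across all rows (5 for the sample data), drop SubscriberId from the SELECT list and remove the GROUP BY. Alternatively, you could probably build a UDF to do something similar.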
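To make the split-and-count logic concrete outside of any SQL dialect, here is a minimal Python sketch of the same idea. The function name count_non_blank and the in-memory rows list are assumptions made for illustration; they are not part of the original answer.

```python
def count_non_blank(csv_value):
    """Count comma-separated tokens that are not empty or whitespace-only."""
    if csv_value is None:
        return 0
    return sum(1 for token in csv_value.split(",") if token.strip())


# Sample rows from the question: (SubscriberId, NewsletterIdCsv)
rows = [(11, "52,52,,52"), (12, "22,23")]

# Per-row counts, mirroring the GROUP BY version of the query
per_subscriber = {sid: count_non_blank(csv) for sid, csv in rows}

# Single total across all rows, as requested in the EDIT
total = sum(per_subscriber.values())

print(per_subscriber)  # {11: 3, 12: 2}
print(total)           # 5
```

The strip() call is what treats a token containing only spaces the same as an empty token.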

Related

MySQL group by comma seperated list unique [duplicate]

The column textfield holds comma-separated list values:
ID | textfield
1 | english,russian,german
2 | german,french
3 | english
4 | null
I'm attempting to count the number of distinct languages in textfield. The default language is "English", so a null counts as "English". The correct number of languages is 4 (english, russian, german, french).
Here is my query to attempt doing this:
SELECT SUM((length(`textfield`) - length(replace(`textfield`, ',', '')) + 1)) as my
FROM yourtable;
The result I get is 6; I don't know how to group the languages.
Here is the fiddle:
http://sqlfiddle.com/#!9/0e532/1
The desired result is 4. How do I solve this?
Identifying the source of error
Your query counts how many languages appear in each row and adds the counts together. It does not account for duplicates: English and German each appear twice in the table, so each is counted twice, which is how your example arrives at six. A second issue is that your current code treats null as what null truly means, the absence of a value, rather than as English.
For example, if your database was
ID | textfield
---|----------
1 | null
you would also be arriving at incorrect results (more on this below).
Solution
This gets you a comma separated result of the languages, no duplicates.
SELECT
GROUP_CONCAT(DISTINCT SUBSTRING_INDEX(SUBSTRING_INDEX(textfield, ',', n.digit+1), ',', -1)) textfield
FROM
yourtable
INNER JOIN
(SELECT 0 digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6) n
ON LENGTH(REPLACE(textfield, ',', '')) <= LENGTH(textfield)-n.digit;
This query can serve as a subquery for what you were attempting in the question. In other words, instead of applying length(`textfield`) ... to the original column, you would apply it to the column returned by this query.
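The nested SUBSTRING_INDEX calls joined against a small numbers table can be hard to read, so here is a rough Python emulation of what the query does row by row. The helper substring_index is a hypothetical stand-in written for this sketch (it mimics MySQL's SUBSTRING_INDEX for non-empty delimiters), and max_items=7 mirrors the 0..6 digits in the derived table.

```python
def substring_index(text, delim, count):
    """Rough emulation of MySQL SUBSTRING_INDEX(text, delim, count)."""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything left of the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])   # everything right of the count-th delimiter from the end
    return ""


def distinct_items(rows, max_items=7):
    """Emulate the join against the digits table 0..max_items-1."""
    seen = []
    for text in rows:
        if text is None:
            continue  # a NULL textfield never satisfies the ON condition
        commas = text.count(",")
        for digit in range(max_items):
            if digit <= commas:  # equivalent to the ON condition in the query
                first_n = substring_index(text, ",", digit + 1)
                item = substring_index(first_n, ",", -1)
                if item not in seen:
                    seen.append(item)
    return seen


rows = ["english,russian,german", "german,french", "english", None]
items = distinct_items(rows)
print(",".join(items))  # english,russian,german,french
print(len(items))       # 4
```

The inner call keeps the first digit+1 items and the outer call keeps only the last of those, which is how one row becomes one item per digit.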
Null as in English
This logic should not be implemented at the database level, IMHO. If you want to go ahead and treat null entries as English, that is fine. The downside is the example I gave above. When a query counts the total languages in the database and English never appears explicitly, only as a null value, the query won't count English (it's null). But you can't just add 1 to the count every time either, because English might already be listed explicitly.
Recommendations:
Avoid comma separated lists in databases by normalizing your data
No value makes sense for a null field
For version 5.6 (like in the fiddle)
SELECT COUNT(DISTINCT SUBSTRING_INDEX(SUBSTRING_INDEX(languages.textfield, ',', numbers.num), ',', -1)) languages_count
FROM (SELECT COALESCE(textfield, 'english') textfield
FROM yourtable) languages
JOIN (SELECT 1 num UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) numbers
ON numbers.num <= LENGTH(languages.textfield) - LENGTH(REPLACE(languages.textfield, ',', '')) + 1;
fiddle
For version 8.x (as claimed in a comment)
SELECT COUNT(DISTINCT jsontable.value) languages_count
FROM yourtable
CROSS JOIN JSON_TABLE( CONCAT('["', REPLACE(COALESCE(textfield, 'english'), ',', '","'), '"]'),
"$[*]" COLUMNS( value VARCHAR(254) PATH "$" )
) AS jsontable;
fiddle
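The JSON_TABLE variant works by rewriting each CSV string into a JSON array literal and letting the JSON parser do the splitting. A small Python sketch of that string rewrite follows; the function name csv_to_json_array is an assumption for illustration. One caveat that applies to the string concatenation in general: a value containing a double quote or backslash would produce invalid JSON.

```python
import json


def csv_to_json_array(text, default="english"):
    """Build a JSON array literal from a CSV string, treating None as the default."""
    value = default if text is None else text
    return '["' + value.replace(",", '","') + '"]'


rows = ["english,russian,german", "german,french", "english", None]

languages = set()
for text in rows:
    # json.loads plays the role of JSON_TABLE: one element per array entry
    languages.update(json.loads(csv_to_json_array(text)))

print(csv_to_json_array("german,french"))  # ["german","french"]
print(len(languages))                      # 4
```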

(SQL) Get comma separated specific value total count

I need to get the total number of occurrences by separate server ID like below :
-----------------------------
logID serversID
-------------------------------
1 50,51,51,50
2 51,52
3 50,50
I want a result like this:
ServerID Count
------------ ---------------
50 4
51 3
52 1
Thank you for your help.
Fix your data model! A string is the wrong way to store multiple values. A string is the wrong way to store numbers. The correct way to represent this data is to use a second table, with one row per logid and serverid.
If you are stuck with this data model and you don't have a reference table for servers, you can split the values . . . painfully:
select substring_index(substring_index(t.serversid, ',', n.n), ',', -1) as server, count(*)
from (select 1 as n union all
select 2 union all
select 3 union all
. . . -- as many as the biggest list
) n join
t
on t.serversid like concat(repeat('%,', n.n - 1), '%')
group by server;
Here is a db<>fiddle.
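For comparison, once the values are split the counting itself is trivial. This is a minimal Python sketch of the expected result, assuming the rows are available in memory as (logID, serversID) tuples; count_server_ids is a name made up for this example.

```python
from collections import Counter


def count_server_ids(rows):
    """Count how often each server ID occurs across all comma-separated lists."""
    counts = Counter()
    for _log_id, servers in rows:
        # skip empty tokens so stray commas do not create a '' key
        counts.update(token for token in servers.split(",") if token)
    return counts


# Sample data from the question
rows = [(1, "50,51,51,50"), (2, "51,52"), (3, "50,50")]

counts = count_server_ids(rows)
for server_id, n in sorted(counts.items()):
    print(server_id, n)
# 50 4
# 51 3
# 52 1
```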

convert all JSON columns into new table

I currently have a table structured like:
customer_id name phoneNumbers
1 Adam [{'type':'home','number':'687-5309'} , {'type':'cell','number':'123-4567'}]
2 Bill [{'type':'home','number':'987-6543'}]
With the phoneNumbers column set as a JSON column type.
For simplicity's sake, though, I want to convert all the JSON phone numbers into a new, separate table.
Something like:
phone_id customer_id type number
1 1 home 687-5309
2 1 cell 123-4567
3 2 home 987-6543
It seems like it should be do-able with OPENJSON but so far I haven't had any luck in figuring out how to declare it correctly. Any help is appreciated.
Use a recursive CTE that starts at array index 0 and recurses up to json_length - 1 (JSON array indexes are zero-based).
SELECT c.*, JSON_LENGTH(c.phoneNumbers) as json_length
from customers c;
Then use CONCAT to pass that element index into the extract query:
(json_unquote(JSON_EXTRACT(phoneNumbers, CONCAT('$[',0,'].type'))), json_unquote(JSON_EXTRACT(phoneNumbers, CONCAT('$[',0,'].number'))))
(json_unquote(JSON_EXTRACT(phoneNumbers, CONCAT('$[',1,'].type'))), json_unquote(JSON_EXTRACT(phoneNumbers, CONCAT('$[',1,'].number'))))
-
-
-
(json_unquote(JSON_EXTRACT(phoneNumbers, CONCAT('$[',json_length - 1,'].type'))), json_unquote(JSON_EXTRACT(phoneNumbers, CONCAT('$[',json_length - 1,'].number'))))
You can do something like this:
SELECT id,
name,
JSON_UNQUOTE(JSON_EXTRACT(phone, CONCAT("$[", seq.i, "]", ".", "number"))) AS NUMBER,
JSON_UNQUOTE(JSON_EXTRACT(phone, CONCAT("$[", seq.i, "]", ".", "type"))) AS TYPE
FROM customer, (SELECT 0 AS I UNION ALL SELECT 1) AS seq
WHERE seq.i < json_length(phone)
The trick is (SELECT 0 AS i UNION ALL SELECT 1). Depending on the length of your JSON arrays, you may need to add more indexes. You can find the maximum length with:
SELECT MAX(JSON_LENGTH(phone)) FROM customer;
Please adjust the CTE definition syntax to your MySQL/MariaDB version.
WITH RECURSIVE cte_recurse_json AS
(
-- anchor: first array element (index 0) of every customer that has one
SELECT customer_id, phoneNumbers, 0 AS recurse, JSON_LENGTH(phoneNumbers) AS json_length,
json_unquote(JSON_EXTRACT(phoneNumbers, CONCAT('$[',0,'].type'))) AS type,
json_unquote(JSON_EXTRACT(phoneNumbers, CONCAT('$[',0,'].number'))) AS number
FROM customers
WHERE JSON_LENGTH(phoneNumbers) > 0
UNION ALL
-- recursive step: move on to the next index until the array is exhausted
SELECT ct.customer_id, ct.phoneNumbers, ct.recurse + 1, ct.json_length,
json_unquote(JSON_EXTRACT(ct.phoneNumbers, CONCAT('$[',ct.recurse + 1,'].type'))),
json_unquote(JSON_EXTRACT(ct.phoneNumbers, CONCAT('$[',ct.recurse + 1,'].number')))
FROM cte_recurse_json ct
WHERE ct.recurse + 1 < ct.json_length
)
SELECT customer_id, type, number FROM cte_recurse_json;
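As a cross-check of the target shape, here is a short Python sketch that flattens the same data into (phone_id, customer_id, type, number) rows. It assumes the column holds valid JSON with double quotes (the single quotes in the question's listing would not parse), and flatten_phone_numbers is a hypothetical helper name.

```python
import json

# Sample rows from the question, rewritten as valid JSON: (customer_id, name, phoneNumbers)
customers = [
    (1, "Adam", '[{"type":"home","number":"687-5309"},{"type":"cell","number":"123-4567"}]'),
    (2, "Bill", '[{"type":"home","number":"987-6543"}]'),
]


def flatten_phone_numbers(customers):
    """Emit one (phone_id, customer_id, type, number) row per JSON array element."""
    rows = []
    phone_id = 1
    for customer_id, _name, phone_json in customers:
        for entry in json.loads(phone_json):  # array order is preserved
            rows.append((phone_id, customer_id, entry["type"], entry["number"]))
            phone_id += 1
    return rows


phone_rows = flatten_phone_numbers(customers)
for row in phone_rows:
    print(row)
# (1, 1, 'home', '687-5309')
# (2, 1, 'cell', '123-4567')
# (3, 2, 'home', '987-6543')
```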

Populate column with number of substrings in another column

I have two tables "A" and "B". Table "A" has two columns "Body" and "Number." The column "Number" is empty, the purpose is to populate it.
Table A: Body / Number
ABABCDEF /
IJKLMNOP /
QRSTUVWKYZ /
Table "B" only has one column:
Table B: Values
AB
CD
QR
Here is what I am looking for as a result:
ABABCDEF / 3
IJKLMNOP / 0
QRSTUVWKYZ / 1
In other words, I want to create a query that looks up, for each string in the "Body" column, how many times the substrings in the "Values" column appear.
How would you advise me to do that?
Here's the finished query; explanation will follow:
SELECT
Body,
SUM(
CASE WHEN Value IS NULL THEN 0
ELSE (LENGTH(Body) - LENGTH(REPLACE(Body, Value, ''))) / LENGTH(Value)
END
) AS Val
FROM (
SELECT TableA.Body, TableB.Value
FROM TableA
LEFT JOIN TableB ON INSTR(TableA.Body, TableB.Value) > 0
) CharMatch
GROUP BY Body
There's a SQL Fiddle here.
Now for the explanation...
The inner query matches TableA strings with TableB substrings:
SELECT TableA.Body, TableB.Value
FROM TableA
LEFT JOIN TableB ON INSTR(TableA.Body, TableB.Value) > 0
Its results are:
BODY VALUE
-------------------- -----
ABABCDEF AB
ABABCDEF CD
IJKLMNOP
QRSTUVWKYZ QR
If you just count these you'll only get a value of 2 for the ABABCDEF string because it just looks for the existence of the substrings and doesn't take into consideration that AB occurs twice.
MySQL doesn't appear to have an OCCURS type function, so to count the occurrences I used the workaround of comparing the length of the string to its length with the target string removed, divided by the length of the target string. Here's an explanation:
REPLACE('ABABCDEF', 'AB', '') ==> 'CDEF'
LENGTH('ABABCDEF') ==> 8
LENGTH('CDEF') ==> 4
So removing all AB occurrences shortens the string by 8 - 4, or 4 characters. Divide the 4 by 2 (LENGTH('AB')) to get the number of AB occurrences: 2
String IJKLMNOP needs special handling. It doesn't contain any of the target values, so the LEFT JOIN leaves Value as NULL and the length arithmetic would return NULL instead of 0. The CASE inside the SUM protects against this.
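The length-difference trick is plain arithmetic, so it can be checked quickly outside the database. The following Python sketch reproduces it on the sample data; count_occurrences is a name chosen for this example, and the guard against an empty or missing value stands in for the CASE expression.

```python
def count_occurrences(body, value):
    """Count non-overlapping occurrences of value in body via the length difference."""
    if not value:
        return 0  # no value to look for, so nothing to divide by
    removed = len(body) - len(body.replace(value, ""))
    return removed // len(value)


bodies = ["ABABCDEF", "IJKLMNOP", "QRSTUVWKYZ"]
values = ["AB", "CD", "QR"]

totals = {body: sum(count_occurrences(body, v) for v in values) for body in bodies}
print(totals)  # {'ABABCDEF': 3, 'IJKLMNOP': 0, 'QRSTUVWKYZ': 1}
```

As with REPLACE in SQL, occurrences are counted without overlap, so 'AA' is found once in 'AAA'.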
You want an update query:
update A
set cnt = (select sum((length(a.body) - length(replace(a.body, b.value, ''))) / length(b.value))
from b
)
This uses a little trick for counting the number of occurrences of b.value in a given string. It replaces each occurrence with an empty string and measures the difference in length between the two strings. That difference is divided by the length of the string being replaced.
If you just wanted the number of matches (so the first value would be "2" instead of "3"):
update A
set cnt = (select count(*)
from b
where a.body like concat('%', b.value, '%')
)

Select data which have same letters

I'm having trouble with this SQL:
$sql = mysql_query("SELECT $menucompare ,
(COUNT($menucompare ) * 100 / (SELECT COUNT( $menucompare )
FROM data WHERE $ww = $button )) AS percentday FROM data WHERE $ww >0 ");
$menucompare is the name of whichever table field is selected; it contains the data below
$button is the selected week number (let's say week '6')
$ww is the name of the table field that holds the week number
For example, I have data in $menucompare like that:
123456bool
521478bool
122555heel
147788itoo
and I want to group the entries that end with the same word and compute a percentage for each group.
The output should be like that:
bool -- 50% (2 entries)
heel -- 25% (1 entry)
itoo -- 25% (1 entry)
Any clarification of my SQL would be much appreciated.
I didn't find anything like that around.
Well, keeping data in this format is probably not the best approach; if possible, split the field into two separate ones.
First, you need to extract the string part from the end of the field.
if the length of the string / numeric parts is fixed, then it's quite easy;
if not, you should use regular expressions which, unfortunately, are not there by default with MySQL. There's a solution, check this question: How to do a regular expression replace in MySQL?
I'll assume, that numeric part is fixed:
SELECT s.str, CAST(count(s.str) AS decimal) / t.cnt * 100 AS pct
FROM (SELECT substr(entry, 7) AS str FROM data) AS s
JOIN (SELECT count(*) AS cnt FROM data) AS t ON 1=1
GROUP BY s.str, t.cnt;
If you have a regexp_replace function, then substr(entry, 7) should be replaced with regexp_replace(entry, '^[0-9]*', '') to achieve the required result.
Variant with substr can be tested here.
When sorting out problems like this, I would do it in two steps:
Sort out the SQL independently of the presentation language (PHP?).
Sort out the parameterization of the query and the presentation of the results after you know you've got the correct query.
Since this question is tagged 'SQL', I'm only going to address the first step.
The first step is to unclutter the query:
SELECT menucompare,
(COUNT(menucompare) * 100 / (SELECT COUNT(menucompare) FROM data WHERE ww = 6))
AS percentday
FROM data
WHERE ww > 0;
This removes the $ signs from most of the variable bits, and substitutes 6 for the button value. That makes it a bit easier to understand.
Your desired output seems to need the last four characters of the string held in menucompare for grouping and counting purposes.
The data to be aggregated would be selected by:
SELECT SUBSTR(MenuCompare, -4) AS Last4
FROM Data
WHERE ww = 6
The divisor in the percentage is the count of such rows, but the sub-stringing isn't necessary to count them, so we can write:
SELECT COUNT(*) FROM Data WHERE ww = 6
This is exactly what you have anyway.
The dividend in the percentage will be the group count of each substring.
SELECT Last4, COUNT(Last4) * 100.0 / (SELECT COUNT(*) FROM Data WHERE ww = 6)
FROM (SELECT SUBSTR(MenuCompare, -4) AS Last4
FROM Data
WHERE ww = 6
) AS Week6
GROUP BY Last4
ORDER BY Last4;
When you've demonstrated that this works, you can re-parameterize the query and deal with the presentation of the results.
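To sanity-check the expected percentages before wiring the query back into PHP, the grouping can be reproduced in a few lines of Python. This sketch assumes the four sample entries all belong to the selected week and that the word is always the last four characters; suffix_percentages is a hypothetical helper name.

```python
from collections import Counter


def suffix_percentages(entries, suffix_len=4):
    """Group entries by their last suffix_len characters and compute each group's share."""
    counts = Counter(entry[-suffix_len:] for entry in entries)
    total = len(entries)
    return {
        suffix: (n, 100.0 * n / total)
        for suffix, n in sorted(counts.items())
    }


# Sample data from the question
entries = ["123456bool", "521478bool", "122555heel", "147788itoo"]

result = suffix_percentages(entries)
for suffix, (n, pct) in result.items():
    print(f"{suffix} -- {pct:.0f}% ({n})")
# bool -- 50% (2)
# heel -- 25% (1)
# itoo -- 25% (1)
```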