MySQL - count values from comma-separated field [duplicate]

I have to compute some statistics from a read-only database where values are stored in an odd format.
Example: I have 2 rows like
ID | text field
1 | 1001,1003,1004
2 | 1003, 1005
I need to count the total number of values, which here is 5.
I don't have write access, so I don't know how to read and count the values directly without creating a function or something similar.

Clever solution here on SO: How to count items in comma separated list MySQL
LENGTH(textfield) - LENGTH(REPLACE(textfield, ',', '')) + 1
EDIT
Yes, you can select it as an additional column, corrected to use CHAR_LENGTH as in @HamletHakobyan's answer:
SELECT
ID,
textfield,
(CHAR_LENGTH(textfield) - CHAR_LENGTH(REPLACE(textfield, ',', '')) + 1) as total
FROM table

SELECT SUM(LENGTH(textfield) - LENGTH(REPLACE(textfield, ',', '')) + 1)
FROM tablename

All of the answers share a small but significant omission: they work only if the column's character set encodes the comma in a single byte (utf8 and similar). The LENGTH function returns the number of bytes, not the number of characters. The right approach is to use CHAR_LENGTH, which returns the number of characters.
SELECT
SUM(CHAR_LENGTH(textfield) - CHAR_LENGTH(REPLACE(textfield, ',', '')) + 1) cnt
FROM yourTable
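The byte-versus-character point can be checked outside the database. The following is a minimal Python sketch, not MySQL itself: `count_items` is a hypothetical helper that mimics LENGTH when an encoding is passed (byte count) and CHAR_LENGTH when it is not (character count). UTF-16 is used here only as an assumed example of an encoding where the comma takes two bytes.

```python
from typing import Optional


def count_items(value: str, encoding: Optional[str] = None) -> int:
    """Apply the 'length minus length-without-commas plus 1' formula.

    With an encoding, lengths are measured in bytes (like LENGTH);
    without one, in characters (like CHAR_LENGTH).
    """
    def length(text: str) -> int:
        return len(text.encode(encoding)) if encoding else len(text)

    return length(value) - length(value.replace(",", "")) + 1


value = "1001,1003,1004"
by_chars = count_items(value)                   # character-based
by_utf8_bytes = count_items(value, "utf-8")     # comma is 1 byte
by_utf16_bytes = count_items(value, "utf-16-le")  # comma is 2 bytes
```

In this sketch the character-based and UTF-8 counts agree, while the two-byte encoding counts each comma twice, which is the failure the answer warns about.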

You could use something like this:
select sum(total) TotalWords
from
(
select length(`text field`) - length(replace(`text field`, ',', '')) + 1 total
from yourtable
) x

SELECT (LENGTH(column_name) - LENGTH(REPLACE(column_name, ',', '')) + 1) as value_count
FROM table_name
Here LENGTH(column_name) - LENGTH(REPLACE(column_name, ',', '')) gives the number of commas in each row's value. Adding 1 to that gives the number of comma-separated values in the row.
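The arithmetic can be traced on the two sample rows from the question. This is a small Python sketch of the same formula, with `count_items` as a hypothetical helper name; it is meant only to show why the per-row results add up to 5.

```python
def count_items(value: str) -> int:
    # Number of commas = total length minus length with commas removed.
    commas = len(value) - len(value.replace(",", ""))
    # N commas separate N + 1 values.
    return commas + 1


rows = ["1001,1003,1004", "1003, 1005"]  # sample data from the question
per_row = [count_items(value) for value in rows]
total = sum(per_row)
```

Note that the formula only counts separators, so in this sketch an empty string still yields 1.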

None of the above worked for me.
The only one that works is the one below:
SELECT (length(`textfield`) - length(replace(`textfield`, ',', '')) + 1) as my
FROM yourtable;
This is my fiddle
http://sqlfiddle.com/#!9/d5a8e1/10

If someone is looking for a solution that returns 0 for empty fields:
IF(LENGTH(column_name) > 0, LENGTH(column_name) - LENGTH(REPLACE(column_name, ',', '')) + 1, 0)
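A rough Python equivalent of that guard, assuming `None` stands in for SQL NULL (in MySQL, `LENGTH(NULL) > 0` evaluates to NULL, which IF treats as false, so the expression also returns 0 there). The function name is hypothetical.

```python
from typing import Optional


def count_items_or_zero(value: Optional[str]) -> int:
    # Empty or missing values contain no items at all.
    if value is None or len(value) == 0:
        return 0
    # Otherwise fall back to the usual comma-count-plus-one formula.
    return len(value) - len(value.replace(",", "")) + 1
```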

Related

How to natural sort “X-Y” string data, first by X and then by Y?

Given this data:
W18-40461
W19-1040
W20-4617
W20-100
I've tried several of the common natural sorting methods for MySQL, but they won't sort these in natural descending order, like this:
W20-4617
W20-100
W19-1040
W18-40461
For example:
select theID
from Table
where theID
order by lpad(theID, 9, 0) desc
Assuming the parts on either side of the - are limited to 2 digits and 5 digits respectively, you can extract the two numeric values using SUBSTR (and LOCATE to find the - between the two numbers) and then LPAD to pad each of those values out to 2 and 5 digits to allow them to be sorted numerically:
SELECT *
FROM data
ORDER BY LPAD(SUBSTR(id, 2, LOCATE('-', id) - 2), 2, '0') DESC,
LPAD(SUBSTR(id, LOCATE('-', id) + 1), 5, '0') DESC
Output (for my expanded sample):
id
W20-12457
W20-4617
W20-100
W19-1040
W18-40461
W4-2017
If the values can have more than 2 or 5 digits respectively, change the second argument of each LPAD call to suit.
I would do this as:
order by substring_index(col, '-', 1) desc,
substring_index(col, '-', -1) + 0 desc
This orders by the part before the hyphen as a string. And it converts the part after the hyphen to a number for sorting purposes.
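For comparison, here is the intended ordering as a Python sketch, using the expanded sample from the first answer. `natural_key` is a hypothetical helper and assumes every id has the form one letter, digits, hyphen, digits. Both parts are compared as numbers, which matches the LPAD approach; comparing the prefix as a plain string would place "W4" ahead of "W20".

```python
from typing import List, Tuple


def natural_key(identifier: str) -> Tuple[int, int]:
    # Split "W20-4617" into "W20" and "4617".
    prefix, suffix = identifier.split("-", 1)
    # Drop the leading letter and compare both parts numerically.
    return int(prefix[1:]), int(suffix)


ids: List[str] = [
    "W18-40461", "W19-1040", "W20-4617",
    "W20-100", "W20-12457", "W4-2017",
]
ordered = sorted(ids, key=natural_key, reverse=True)
```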

MySQL group by comma separated list unique [duplicate]

The column textfield holds comma-separated list values:
ID | textfield
1 | english,russian,german
2 | german,french
3 | english
4 | null
I'm attempting to count the number of distinct languages in textfield. The default language is "English", so a null value means "English". The correct number of languages is 4 (english, russian, german, french).
Here is my query to attempt doing this:
SELECT SUM((length(`textfield`) - length(replace(`textfield`, ',', '')) + 1)) as my
FROM yourtable;
The result I get is 6; I don't know how to group the languages.
Here is the fiddle:
http://sqlfiddle.com/#!9/0e532/1
The desired result is 4. How do I solve this?
Identifying the source of error
Your query counts the languages in each row and adds those counts together, without accounting for duplicates. English and German each appear twice in the table, so each is counted twice, which is why you get six. Another issue is that your code treats null as what it truly means, the absence of a value, rather than as English.
For example, if your database was
ID | textfield
---|----------
1 | null
you would also be arriving at incorrect results (more on this below).
Solution
This gets you a comma separated result of the languages, no duplicates.
SELECT
GROUP_CONCAT(DISTINCT SUBSTRING_INDEX(SUBSTRING_INDEX(textfield, ',', n.digit+1), ',', -1)) textfield
FROM
yourtable
INNER JOIN
(SELECT 0 digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6) n
ON LENGTH(REPLACE(textfield, ',', '')) <= LENGTH(textfield)-n.digit;
This query can serve as a subquery for what you were attempting in the question. In other words, instead of applying length(`textfield`) ... to the raw column, you would apply it to the column returned by this query.
Null as in English
This logic should not be implemented at the database level, IMHO. If you want to go ahead and consider null entries as English, that is fine. The downside is the example I provided for you before. When you have a query that solves for the total languages in the database, if English wasn't an explicitly stated language and instead just a null value, then the query wouldn't 'count' English (it's null). But you can't just add 1 every time you find the amount of languages because English might already be explicit.
Recommendations:
Avoid comma separated lists in databases by normalizing your data
No value makes sense for a null field
For version 5.6 (like in the fiddle)
SELECT COUNT(DISTINCT SUBSTRING_INDEX(SUBSTRING_INDEX(languages.textfield, ',', numbers.num), ',', -1)) languages_count
FROM (SELECT COALESCE(textfield, 'english') textfield
FROM yourtable) languages
JOIN (SELECT 1 num UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) numbers
ON numbers.num <= LENGTH(languages.textfield) - LENGTH(REPLACE(languages.textfield, ',', '')) + 1;
For version 8.x (as claimed in a comment)
SELECT COUNT(DISTINCT jsontable.value) languages_count
FROM yourtable
CROSS JOIN JSON_TABLE( CONCAT('["', REPLACE(COALESCE(textfield, 'english'), ',', '","'), '"]'),
"$[*]" COLUMNS( value VARCHAR(254) PATH "$" )
) AS jsontable;
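The logic shared by both queries (default null to 'english', split on commas, count distinct values) can be sketched in Python on the question's sample rows. `distinct_languages` is a hypothetical helper, `None` stands in for NULL, and the whitespace stripping is an extra assumption the SQL does not make.

```python
from typing import Iterable, Optional, Set


def distinct_languages(rows: Iterable[Optional[str]],
                       default: str = "english") -> Set[str]:
    seen: Set[str] = set()
    for value in rows:
        # COALESCE(textfield, 'english'): treat a missing value as the default.
        text = value if value is not None else default
        for item in text.split(","):
            seen.add(item.strip())
    return seen


rows = ["english,russian,german", "german,french", "english", None]
languages = distinct_languages(rows)
languages_count = len(languages)
```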

Find all numbers that are present 3 or more times in a column of CSVs

I basically want this: if certain number is present >= 3 times then do some action ...
My table's column is this:
As you can see, the number 38 is present >= 3 times in the absent_sids column, so I want to take some action on that student, such as a ban. But I don't know what SQL query to write because:
1. I am quite new to PHP/MySQL.
2. The column holds comma-separated numbers, and it is quite difficult for me to search it with a MySQL query and return each absent_sid that occurs >= 3 times in a given period of time/date.
Please help.
This is quite long, but it works. Steps: 1) convert the comma-separated list into rows using SUBSTRING_INDEX together with the CHAR_LENGTH and REPLACE functions; 2) use GROUP BY and HAVING COUNT to find numbers that occur 3 or more times.
See demo here: http://sqlfiddle.com/#!9/39afc0/2
SELECT absent_sids
from (
SELECT
tablename.aid,
SUBSTRING_INDEX(
SUBSTRING_INDEX(
tablename.absent_sids, ',', numbers.n), ',', -1) as absent_sids
FROM
(select ORDINAL_POSITION as n
from INFORMATION_SCHEMA.COLUMNS
where table_name='COLUMNS'
and ORDINAL_POSITION <= (
select round(max(length(absent_sids))/2)
from tablename)) numbers
INNER JOIN tablename
ON CHAR_LENGTH(tablename.absent_sids)
-CHAR_LENGTH(REPLACE(tablename.absent_sids, ',', ''))
>= numbers.n-1) tab
GROUP BY absent_sids
HAVING COUNT(*) >= 3
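The same two steps (explode each list into individual ids, then keep ids seen at least three times) look like this as a Python sketch. The question's actual data was not preserved, so the rows below are invented purely for illustration, and `frequent_ids` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Optional


def frequent_ids(rows: Iterable[Optional[str]], threshold: int = 3) -> List[str]:
    counts: Counter = Counter()
    for value in rows:
        if not value:
            continue  # skip NULL or empty lists
        # Step 1: turn the comma-separated list into individual ids.
        for item in value.split(","):
            item = item.strip()
            if item:
                counts[item] += 1
    # Step 2: GROUP BY id HAVING COUNT(*) >= threshold.
    return sorted(sid for sid, n in counts.items() if n >= threshold)


# Hypothetical absent_sids values, not taken from the question.
sample_rows = ["38,12", "38,40,41", "12,38", "40"]
flagged = frequent_ids(sample_rows)
```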

How to sum a comma separated string in SQL? [duplicate]

id value
1 1,2,3,4
2 2,3,4
So I want to get this result:
id sum
1 10
2 9
Can I do it in SQL(MySQL)?
With great effort, you can do this. Really, though, this is a very, very bad way to store data.
In the spirit that sometimes we have to use data whose format is not under our control:
select id,
(substring_index(value, ',', 1) +
substring_index(substring_index(concat(value, ',0'), ',', 2), ',', -1) +
substring_index(substring_index(concat(value, ',0'), ',', 3), ',', -1) +
substring_index(substring_index(concat(value, ',0'), ',', 4), ',', -1) +
substring_index(substring_index(concat(value, ',0'), ',', 5), ',', -1)
) as thesum
from t;
The nested calls to substring_index() fetch the nth value in the string. The concat(value, ',0') handles the case where there are fewer values than expressions. Without it, the nested substring_index() would return the last value for any n greater than the number of items in the list. Appending a 0 to the list ensures that those extra expressions contribute nothing to the sum.
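To see why the ',0' padding matters, here is a Python sketch that emulates the documented behaviour of MySQL's SUBSTRING_INDEX and replays the five-term sum. All function names are hypothetical, and `int()` stands in for MySQL's implicit string-to-number conversion.

```python
def substring_index(text: str, delim: str, count: int) -> str:
    """Return everything before the count-th delimiter (count > 0),
    or everything after the count-th delimiter from the right (count < 0)."""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""


def nth_item(value: str, n: int) -> str:
    # Pad with ',0' so positions past the end of the list yield '0'.
    return substring_index(substring_index(value + ",0", ",", n), ",", -1)


def sum_first_five(value: str) -> int:
    first = int(substring_index(value, ",", 1))
    return first + sum(int(nth_item(value, n)) for n in range(2, 6))
```

As in the SQL, this handles at most five items per list; a sixth value would be silently ignored.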
You can do it more dynamically by creating a function. Note that the code below uses SQL Server (T-SQL) syntax, not MySQL. Follow these steps:
Create a function that returns the sum of a comma-separated value:
CREATE FUNCTION GetToalOfCommaSeperatedVal
(
@commaSeperatedVal varchar(100)
)
RETURNS int
AS
BEGIN
declare @sum int
DECLARE @x XML
-- wrap each item in <A>...</A> so the list can be parsed as XML
SELECT @x = CAST('<A>'+ REPLACE(@commaSeperatedVal,',','</A><A>')+ '</A>' AS XML)
-- read every <A> node as an int and add them up
SELECT @sum=sum(t.value('.', 'int'))
FROM @x.nodes('/A') AS x(t)
return @sum
END
GO
Then run a simple SELECT as follows:
select id,dbo.GetToalOfCommaSeperatedVal(value) from YOUR_TABLE

Mysql + count all words in a Column

I have 2 columns in a table and I would like to roughly report on the total number of words.
Is it possible to run a MySQL query and find out the total number of words down a column.
It would basically be any text separated by a space or multiple space.
It doesn't need to be 100% accurate, as it's just a general guide.
Is this possible?
Try something like this:
SELECT COUNT(LENGTH(column) - LENGTH(REPLACE(column, ' ', '')) + 1)
FROM table
This counts the number of characters in your column and subtracts the number of characters left after removing all the spaces. That gives the number of spaces in each row and hence, roughly, the number of words. (It is rough because a double space counts as two words, but since you only want an estimate, this should suffice.)
Count simply gives you the number of found rows. You need to use SUM instead.
SELECT SUM(LENGTH(column) - LENGTH(REPLACE(column, ' ', '')) + 1) FROM table
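The difference between COUNT and SUM here can be illustrated with a short Python sketch. The rows are made-up sample text, not data from the question.

```python
# Hypothetical column values.
rows = ["the quick brown fox", "hello world", "one"]

# Per-row estimate: number of spaces plus one.
per_row = [len(text) - len(text.replace(" ", "")) + 1 for text in rows]

row_count = len(per_row)   # what COUNT(...) reports: how many rows there are
word_total = sum(per_row)  # what SUM(...) reports: the total word estimate
```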
A less rough count:
SELECT LENGTH(column) - LENGTH(REPLACE(column, SPACE(1), ''))
FROM
( SELECT CONCAT(TRIM(column), SPACE(1)) AS column
FROM
( SELECT REPLACE(column, SPACE(2), SPACE(1)) AS column
FROM
( SELECT REPLACE(column, SPACE(3), SPACE(1)) AS column
FROM
( SELECT REPLACE(column, SPACE(5), SPACE(1)) AS column
FROM
( SELECT REPLACE(column, SPACE(9), SPACE(1)) AS column
FROM
( SELECT REPLACE(column, SPACE(17), SPACE(1)) AS column
FROM
( SELECT REPLACE(column, SPACE(33), SPACE(1)) AS column
FROM tableX
) AS x
) AS x
) AS x
) AS x
) AS x
) AS x
) AS x
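The nested REPLACE calls collapse runs of spaces step by step, innermost first (33, then 17, 9, 5, 3, 2). The Python sketch below replays that sequence with hypothetical helper names. Working through the arithmetic suggests each step halves the longest run it can handle, so this particular chain fully collapses runs of up to 64 spaces; a run of 65 would be left as two spaces.

```python
def collapse_spaces(text: str) -> str:
    # Same order as the SQL: the innermost subquery (33 spaces) runs first.
    for width in (33, 17, 9, 5, 3, 2):
        text = text.replace(" " * width, " ")
    return text


def word_count(text: str) -> int:
    # TRIM, then append one space so every word is followed by exactly one.
    normalized = collapse_spaces(text).strip() + " "
    # The number of spaces now equals the number of words.
    return len(normalized) - len(normalized.replace(" ", ""))
```

Like the SQL, this sketch reports 1 for an empty string, because of the appended space.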
I stumbled upon this post while looking for an answer myself. I tested all of the answers here, and the closest one was @fikre's. However, I was concerned about data with leading spaces and/or extra spaces between words (trailing spaces didn't seem to affect fikre's query during my testing). So I looked for a way to find any extra spaces between words and remove them. While I found a few answers using advanced functions (which are beyond my skill set), I did find a very simple way to do it.
tl;dr > @fikre's answer is the only one that worked for me, but I made a minor tweak to get the most accurate word count.
Query 1 -- This will return 5 "Word Count"
SELECT SUM(LENGTH(input) - LENGTH(REPLACE(input, ' ', '')) + 1) AS "Word Count" FROM
(SELECT TRIM(REPLACE(REPLACE(REPLACE(input,' ','<>'),'><',''),'<>',' ')) AS input
FROM (SELECT ' too late to the party ' AS input) i) r;
Query 2 -- This will return 13 "Word Count"
SELECT SUM(LENGTH(input) - LENGTH(REPLACE(input, ' ', '')) + 1) AS "Word Count"
FROM (SELECT ' too late to the party ' AS input) i;
-- breakdown ' too late to the party '
1 leading space= 1 word count
2 spaces after the first space from the word 'too'= 2 word count
1 space after the first space from the word 'late'= 1 word count
4 spaces after the first space from the word 'the'= 4 word count
trailing space(s) wasn't counted at all.
Total spaces > 1+2+1+4=8 + 5 word count = 13
So, basically if the data row contains even a million spaces in between (disclaimer: an assumption. I've only tested 336,896 spaces), Query 1 will still return Word count=5.
Note: The mid part REPLACE(REPLACE(REPLACE(input,' ','<>'),'><',''),'<>',' ') I took from this answer https://stackoverflow.com/a/55476224/10910692
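The triple REPLACE can be traced in Python. The sample string below is rebuilt from the breakdown above (1 leading space, then 3, 2, 1 and 5 spaces between the words) and has no trailing space; that exact spacing is an assumption, since the original literal's whitespace was not preserved. Helper names are hypothetical.

```python
def squeeze_spaces(text: str) -> str:
    # ' ' -> '<>' marks every space; adjacent markers meet as '><' and are
    # removed, leaving one '<>' per run, which is then turned back into ' '.
    return text.replace(" ", "<>").replace("><", "").replace("<>", " ").strip()


def naive_word_count(text: str) -> int:
    return len(text) - len(text.replace(" ", "")) + 1


def word_count(text: str) -> int:
    return naive_word_count(squeeze_spaces(text))


# Assumed spacing, reconstructed from the breakdown in the answer.
sample = " too" + " " * 3 + "late" + " " * 2 + "to" + " " + "the" + " " * 5 + "party"
```

One caveat of this trick: text that already contains '<' or '>' characters could be altered by the marker replacement.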