MySQL: count all words in a column

I have 2 columns in a table and I would like a rough report of the total number of words.
Is it possible to run a MySQL query that finds the total number of words down a column?
A word would basically be any text separated by one or more spaces.
It doesn't need to be 100% accurate, as it's just a general guide.
Is this possible?

Try something like this:
SELECT COUNT(LENGTH(column) - LENGTH(REPLACE(column, ' ', '')) + 1)
FROM table
This counts the number of characters in your column and subtracts the number of characters left after removing all the spaces. That gives you the number of spaces in each row, and from that roughly the number of words (roughly, because a double space counts as two words, but you said a rough figure is enough, so this should suffice).

COUNT simply gives you the number of rows found. You need to use SUM instead.
SELECT SUM(LENGTH(column) - LENGTH(REPLACE(column, ' ', '')) + 1) FROM table
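The difference between COUNT and SUM can be checked outside the database. This is a minimal Python sketch rather than MySQL; the sample rows and the helper name rough_word_count are made up for illustration.

```python
# Hypothetical sample rows standing in for the column values.
rows = ["one two three", "four five", "six"]

def rough_word_count(text):
    # Mirrors LENGTH(col) - LENGTH(REPLACE(col, ' ', '')) + 1
    return len(text) - len(text.replace(" ", "")) + 1

per_row = [rough_word_count(r) for r in rows]

# COUNT(expr) only counts how many rows produced a non-NULL value.
count_result = len(per_row)

# SUM(expr) adds the per-row values, which is the total being asked for.
sum_result = sum(per_row)
```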

A less rough count:
SELECT LENGTH(column) - LENGTH(REPLACE(column, SPACE(1), ''))
FROM
  ( SELECT CONCAT(TRIM(column), SPACE(1)) AS column
    FROM
      ( SELECT REPLACE(column, SPACE(2), SPACE(1)) AS column
        FROM
          ( SELECT REPLACE(column, SPACE(3), SPACE(1)) AS column
            FROM
              ( SELECT REPLACE(column, SPACE(5), SPACE(1)) AS column
                FROM
                  ( SELECT REPLACE(column, SPACE(9), SPACE(1)) AS column
                    FROM
                      ( SELECT REPLACE(column, SPACE(17), SPACE(1)) AS column
                        FROM
                          ( SELECT REPLACE(column, SPACE(33), SPACE(1)) AS column
                            FROM tableX
                          ) AS x
                      ) AS x
                  ) AS x
              ) AS x
          ) AS x
      ) AS x
  ) AS x
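To see what the chain of nested REPLACE calls does, here is a small Python sketch of the same steps (an illustration only; collapse_spaces and word_count are hypothetical names). It applies the replacements in the same order as the subqueries, innermost first. In this sketch, runs of up to 64 consecutive spaces collapse to a single space.

```python
def collapse_spaces(text):
    # Same order as the nested subqueries: SPACE(33) first, SPACE(2) last.
    for n in (33, 17, 9, 5, 3, 2):
        text = text.replace(" " * n, " ")
    return text

def word_count(text):
    # TRIM the collapsed text, append one space, then count the spaces,
    # as the two outermost levels of the query do.
    cleaned = collapse_spaces(text).strip(" ") + " "
    return len(cleaned) - len(cleaned.replace(" ", ""))
```

Because one space is appended after trimming, the number of spaces equals the number of words for any non-empty value, so no "+ 1" is needed.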

I stumbled upon this post while looking for an answer myself. I tested all of the answers here, and the closest one was @fikre's. However, I was concerned about data with leading spaces and/or extra spaces between words (trailing spaces didn't seem to affect fikre's query in my testing). So I looked for a way to find the extra spaces between words and remove them. I found a few answers using advanced functions (which are beyond my skill set), but I also found a very simple way to do it.
tl;dr: @fikre's answer is the only one that worked for me, but I made a minor tweak to get a more accurate word count.
Query 1 -- This will return 5 "Word Count"
SELECT SUM(LENGTH(input) - LENGTH(REPLACE(input, ' ', '')) + 1) AS "Word Count" FROM
(SELECT TRIM(REPLACE(REPLACE(REPLACE(input,' ','<>'),'><',''),'<>',' ')) AS input
FROM (SELECT ' too late to the party ' AS input) i) r;
Query 2 -- This will return 13 "Word Count"
SELECT SUM(LENGTH(input) - LENGTH(REPLACE(input, ' ', '')) + 1) AS "Word Count"
FROM (SELECT ' too late to the party ' AS input) i;
-- breakdown ' too late to the party '
1 leading space = 1 word count
2 extra spaces after the first space following the word 'too' = 2 word count
1 extra space after the first space following the word 'late' = 1 word count
4 extra spaces after the first space following the word 'the' = 4 word count
Trailing spaces weren't counted at all.
Total extra spaces: 1+2+1+4 = 8, plus the 5 actual words = 13
So even if a row contains a million spaces between words (disclaimer: that is an assumption, I've only tested 336,896 spaces), Query 1 will still return a word count of 5.
Note: The mid part REPLACE(REPLACE(REPLACE(input,' ','<>'),'><',''),'<>',' ') I took from this answer https://stackoverflow.com/a/55476224/10910692
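The '<>' trick can be tried in Python as well. This is a rough sketch, not the MySQL query itself: the sample string's spacing is an assumption rebuilt from the breakdown above (one leading space, three spaces after 'too', two after 'late', five after 'the', no trailing spaces), and squeeze is a hypothetical helper name. It also assumes the text contains no '<' or '>' characters of its own.

```python
def squeeze(text):
    # REPLACE(REPLACE(REPLACE(input, ' ', '<>'), '><', ''), '<>', ' '), then TRIM
    return text.replace(" ", "<>").replace("><", "").replace("<>", " ").strip(" ")

def rough_word_count(text):
    # LENGTH(input) - LENGTH(REPLACE(input, ' ', '')) + 1
    return len(text) - len(text.replace(" ", "")) + 1

# Assumed spacing, built explicitly so the runs of spaces are unambiguous.
sample = " too" + " " * 3 + "late" + " " * 2 + "to the" + " " * 5 + "party"

naive = rough_word_count(sample)             # counts every space
cleaned = rough_word_count(squeeze(sample))  # counts one space per gap
```

Each space becomes '<>', so a run of spaces becomes '<><><>'; removing every '><' leaves a single '<>' per run, which is then turned back into one space.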

Related

Find all numbers that are present more than 3 times in a column of CSVs

I basically want this: if a certain number is present >= 3 times, then take some action ...
My table's column is this:
As you can see, the number 38 is present >= 3 times in the absent_sids column, so I want to take some action against that ID, like a ban or something else. But I don't know what SQL query I should write, because:
1. I am quite new to PHP/MySQL.
2. The column holds comma-separated numbers, and it's quite difficult for me to search this column with a MySQL query and return the absent_sid values that occur >= 3 times in a given period of time/date.
Please help.
This is quite long, but it works. Steps: 1) convert the comma-separated list into rows using the SUBSTRING_INDEX, CHAR_LENGTH and REPLACE functions. 2) Use GROUP BY and HAVING COUNT to find the numbers that occur 3 or more times.
See demo here: http://sqlfiddle.com/#!9/39afc0/2
SELECT absent_sids
from (
SELECT
tablename.aid,
SUBSTRING_INDEX(
SUBSTRING_INDEX(
tablename.absent_sids, ',', numbers.n), ',', -1) as absent_sids
FROM
(select ORDINAL_POSITION as n
from INFORMATION_SCHEMA.COLUMNS
where table_name='COLUMNS'
and ORDINAL_POSITION <= (
select round(max(length(absent_sids))/2)
from tablename)) numbers
INNER JOIN tablename
ON CHAR_LENGTH(tablename.absent_sids)
-CHAR_LENGTH(REPLACE(tablename.absent_sids, ',', ''))
>= numbers.n-1) tab
GROUP BY absent_sids
HAVING COUNT(*) >= 3
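The two steps (split each list into individual values, then group and filter by count) can be sketched in Python. This is only an illustration done outside the database; the sample rows and the name frequent_ids are invented, not taken from the linked demo.

```python
from collections import Counter

# Hypothetical contents of the absent_sids column, one string per row.
absent_rows = ["38,41", "12,38", "38,12,7", "41"]

def frequent_ids(rows, threshold=3):
    # Step 1: turn every comma-separated list into individual values.
    values = (sid.strip() for row in rows for sid in row.split(","))
    # Step 2: count each value and keep those seen at least `threshold` times
    # (the GROUP BY ... HAVING COUNT(*) >= 3 part).
    counts = Counter(v for v in values if v)
    return sorted(sid for sid, n in counts.items() if n >= threshold)
```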

MySQL Select Column With X Words

I am trying to come up with a query that will select a row if a column has at least X words in it. For example:
SELECT * FROM TABLE IF COLUMN <has >= 3 words>
Coming up empty though. Any ideas?
3 words need at least 2 spaces. You can count the spaces in your column
select * from your_table
where length(your_column) - length(replace(your_column, ' ', '')) > 1
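The same space-counting test, generalised to "at least X words", might look like this in Python (a sketch only; has_at_least_words is a hypothetical name, and like the query it is fooled by leading, trailing or doubled spaces).

```python
def has_at_least_words(text, x):
    # X words need at least X - 1 spaces between them.
    spaces = len(text) - len(text.replace(" ", ""))
    return spaces >= x - 1
```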

Count the frequency of each word

I've been trawling the internet and realize that MySQL is not the best tool for this, but I'm asking anyway. What query, function or stored procedure has anyone seen or used that will get the frequency of each word across a text column?
Example:
ID | comment
---|-------------------
1  | I love this burger
2  | I hate this burger
Desired result:
word   | count
-------|-------
burger | 2
I      | 2
this   | 2
love   | 1
hate   | 1
This solution seems to do the job (stolen almost verbatim from this page). It requires an auxiliary table filled with sequential numbers from 1 up to at least the largest number of words in any single comment. It is important to check that the auxiliary table is large enough, or the results will be wrong (with no error shown).
SELECT
SUBSTRING_INDEX(SUBSTRING_INDEX(maintable.comment, ' ', auxiliary.id), ' ', -1) AS word,
COUNT(*) AS frequency
FROM maintable
JOIN auxiliary ON
LENGTH(comment)>0 AND SUBSTRING_INDEX(SUBSTRING_INDEX(comment, ' ', auxiliary.id), ' ', -1)
<> SUBSTRING_INDEX(SUBSTRING_INDEX(comment, ' ', auxiliary.id-1), ' ', -1)
GROUP BY word
HAVING word <> ' '
ORDER BY frequency DESC;
This approach is about as inefficient as it gets, because it cannot use any index.
As an alternative, I would use a statistics table kept up to date with triggers, perhaps initialised with the query above.
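To make the numbers-table query easier to follow, here is a Python sketch that imitates it step by step. It is an approximation under stated assumptions: substring_index is a hand-written stand-in for MySQL's SUBSTRING_INDEX, max_words plays the role of the largest id in the auxiliary table, and string comparison here is case-sensitive, which may differ from the collation in use.

```python
from collections import Counter

def substring_index(s, delim, count):
    # Rough stand-in for MySQL's SUBSTRING_INDEX(s, delim, count).
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""

def word_frequencies(comments, max_words):
    freq = Counter()
    for comment in comments:
        for i in range(1, max_words + 1):  # i stands in for auxiliary.id
            word = substring_index(substring_index(comment, " ", i), " ", -1)
            prev = substring_index(substring_index(comment, " ", i - 1), " ", -1)
            # Join condition: skip positions past the last word, where the
            # extracted "word" just repeats the previous one.
            if len(comment) > 0 and word != prev:
                freq[word] += 1
    return freq

comments = ["I love this burger", "I hate this burger"]
full = word_frequencies(comments, 10)
too_small = word_frequencies(comments, 3)  # auxiliary range too short
```

With max_words set to 3, the fourth word of each comment is never reached, so it is silently missing from the result. That is the failure mode the warning about the auxiliary table size refers to.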
Something like this should work. Just make sure you don't pass in a zero-length string.
SET @searchString = 'burger';
SELECT
ID,
(LENGTH(comment) - LENGTH(REPLACE(comment, @searchString, ''))) / LENGTH(@searchString) AS count
FROM MyTable;
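The idea is that removing every occurrence of the search string shortens the text by (occurrences × length of the search string), so the subtraction has to be done before the division. A minimal Python sketch of that arithmetic (occurrences is a hypothetical helper name):

```python
def occurrences(comment, search):
    if not search:
        # Mirrors the warning above: a zero-length string would divide by zero.
        raise ValueError("search string must not be empty")
    removed = len(comment) - len(comment.replace(search, ""))
    return removed // len(search)
```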

Finding a number in a space-separated list with REGEXP

I am writing a SQL query to select rows where a field of space-separated numbers contains a given number, in this example 1.
Example fields:
"1 2 3 4 5 11 12 21" - match, contains number one
"2 3 4 6 11 101" - no match, does not contain number one
The best query so far is:
$sql = "SELECT * from " . $table . " WHERE brands REGEXP '[/^1$/]' ORDER BY name ASC;";
The problem is that this REGEXP also matches 11.
I have read many suggestions in other posts, for instance [\d]{1}, but the result is always the same.
Is it possible to accomplish what I want, and how?
You don't need regex: You can use LIKE if you add a space to the front and back of the column:
SELECT * from $table
WHERE CONCAT(' ', brands, ' ') LIKE '% 1 %'
ORDER BY name
Try:
WHERE brands REGEXP '[[:<:]]1[[:>:]]'
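[[:<:]] and [[:>:]] match word boundaries at the start and end of a word. Note that MySQL 8.0.4 and later use ICU regular expressions, which do not support these markers; use \\b there instead (for example REGEXP '\\b1\\b').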
Why not FIND_IN_SET() + REPLACE() ?
SELECT
*
FROM
`table`
WHERE
FIND_IN_SET(1, REPLACE(`brands`, ' ', ','))
ORDER BY
`name` ASC;
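The three suggestions (padding with spaces, word boundaries, and converting to a comma list) can be compared side by side in Python. This is a sketch of the matching logic only, with hypothetical function names, not a replacement for the SQL.

```python
import re

def contains_padded(brands, number):
    # CONCAT(' ', brands, ' ') LIKE '% 1 %'
    return f" {number} " in f" {brands} "

def contains_word_boundary(brands, number):
    # Word-boundary regex, the counterpart of [[:<:]]1[[:>:]]
    return re.search(rf"\b{re.escape(str(number))}\b", brands) is not None

def contains_in_set(brands, number):
    # FIND_IN_SET(1, REPLACE(brands, ' ', ','))
    return str(number) in brands.replace(" ", ",").split(",")
```

All three treat "11" as a different value from "1", which is what the original character-class pattern failed to do.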

Mysql - count values from comma-separated field [duplicate]

I have to compute some statistics from a read-only DB where values are stored in a weird form.
Example:
I have 2 rows like
ID | text field
1  | 1001,1003,1004
2  | 1003, 1005
I need to be able to count that this is "5".
I don't have write access, so I don't know how to read and count the values directly without creating a function or something like that.
Clever solution here on SO: How to count items in comma separated list MySQL
LENGTH(textfield) - LENGTH(REPLACE(textfield, ',', '')) + 1
EDIT
Yes, you can select it as an additional column, corrected to use CHAR_LENGTH as in @HamletHakobyan's answer:
SELECT
ID,
textfield,
(CHAR_LENGTH(textfield) - CHAR_LENGTH(REPLACE(textfield, ',', '')) + 1) as total
FROM table
SELECT SUM(LENGTH(textfield) - LENGTH(REPLACE(textfield, ',', '')) + 1)
FROM tablename
There is a small but significant omission in all the answers. They only work if the column's character set stores a comma in one byte (utf8, latin1 and similar), because the LENGTH function returns the number of bytes, not the number of characters. The right answer is to use CHAR_LENGTH, which returns the number of characters.
SELECT
SUM(CHAR_LENGTH(textfield) - CHAR_LENGTH(REPLACE(textfield, ',', '')) + 1) cnt
FROM yourTable
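The byte-versus-character difference is easy to reproduce in Python. In this sketch the encodings stand in for the column's character set: "utf-8" stores a comma in one byte, while "utf-16-be" is used as an assumed example of a two-byte-per-character set. The function names are hypothetical.

```python
def count_by_bytes(text, encoding):
    # LENGTH() analogue: measures encoded bytes.
    with_commas = len(text.encode(encoding))
    without_commas = len(text.replace(",", "").encode(encoding))
    return with_commas - without_commas + 1

def count_by_chars(text):
    # CHAR_LENGTH() analogue: measures characters.
    return len(text) - len(text.replace(",", "")) + 1

value = "1001,1003,1004"
utf8_result = count_by_bytes(value, "utf-8")
utf16_result = count_by_bytes(value, "utf-16-be")  # each comma costs 2 bytes
char_result = count_by_chars(value)
```

With the two-byte encoding every comma is counted twice, so the byte-based formula overshoots, while the character-based one is unaffected.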
You could use something like this:
select sum(total) TotalWords
from
(
select length(`text field`) - length(replace(`text field`, ',', '')) + 1 total
from yourtable
) x
SELECT (LENGTH(column_name) - LENGTH(REPLACE(column_name, ',', '')) + 1) as value_count
FROM table_name
Here LENGTH(column_name) - LENGTH(REPLACE(column_name, ',', '')) gives the number of commas in each row's value, and adding 1 gives the number of comma-separated values.
None of the other answers worked for me.
The only one that works is the one below:
SELECT (length(`textfield`) - length(replace(`textfield`, ',', '')) + 1) as my
FROM yourtable;
This is my fiddle
http://sqlfiddle.com/#!9/d5a8e1/10
In case someone is looking for a solution that returns 0 for empty fields:
IF(LENGTH(column_name) > 0, LENGTH(column_name) - LENGTH(REPLACE(column_name, ',', '')) + 1, 0)
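The empty-field guard matters because the "+ 1" would otherwise report one value for an empty string. A short Python sketch of the same rule (count_values is a hypothetical name, and the sample rows are the two from the question):

```python
def count_values(field):
    # IF(LENGTH(col) > 0, <commas> + 1, 0)
    if len(field) == 0:
        return 0
    return len(field) - len(field.replace(",", "")) + 1

rows = ["1001,1003,1004", "1003, 1005"]
total = sum(count_values(r) for r in rows)
```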