MySQL: Selecting rows ordered by word count

How can I order my query by word count? Is it possible?
I have a table whose rows contain text fields, and I want to order the rows by the word count of those fields.
A second problem is that I need to select only the rows that have, for example, a minimum of 10 words or a maximum of 20.

This will not perform very well, since the string calculation has to be performed for every row.
You can count the words in a column value like so: LENGTH(name) - LENGTH(REPLACE(name, ' ', '')) + 1 (provided that words are defined as "whatever is delimited by a single space"). Wrapping it in SUM(), as in SELECT SUM(LENGTH(name) - LENGTH(REPLACE(name, ' ', '')) + 1) FROM table, gives the total across all rows. For ordering and filtering you need the per-row expression without SUM(): an aggregate is not allowed in WHERE, and in ORDER BY it turns the query into an aggregate over the whole table.
Now, add this to your query:
SELECT
<fields>
FROM
<table>
WHERE
<condition>
ORDER BY LENGTH(<fieldWithWords>) - LENGTH(REPLACE(<fieldWithWords>, ' ', '')) + 1
Or, add it to the condition:
SELECT
<fields>
FROM
<table>
WHERE
LENGTH(<fieldWithWords>) - LENGTH(REPLACE(<fieldWithWords>, ' ', '')) + 1 BETWEEN 10 AND 20
ORDER BY <something>
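As a quick sanity check of the space-counting expression, here is a minimal sketch that runs it per row against an in-memory SQLite database from Python. SQLite is used only because it ships with Python and also provides LENGTH and REPLACE; the table `notes`, the column `body`, and the helper `rows_by_word_count` are made-up names, and words are assumed to be separated by exactly one space.

```python
import sqlite3

# Hypothetical table and data, used only to illustrate the expression.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO notes (body) VALUES (?)",
    [("one two three",), ("single",), ("a b c d e",), ("two words",)],
)

# Words = spaces + 1, assuming single spaces and no leading/trailing spaces.
WORD_COUNT = "(LENGTH(body) - LENGTH(REPLACE(body, ' ', '')) + 1)"


def rows_by_word_count(min_words, max_words):
    """Return (body, word_count) for rows in the range, fewest words first."""
    sql = (
        f"SELECT body, {WORD_COUNT} AS wc FROM notes "
        f"WHERE {WORD_COUNT} BETWEEN ? AND ? ORDER BY wc"
    )
    return conn.execute(sql, (min_words, max_words)).fetchall()


result = rows_by_word_count(2, 5)
```

Note that no aggregate is involved: the expression is evaluated once per row, both in the WHERE clause and in the ORDER BY.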

Maybe something like this:
SELECT Field1, SUM( LENGTH(Field2) - LENGTH(REPLACE(Field2, ' ', ''))+1)
AS cnt
FROM tablename
GROUP BY Field1
ORDER BY cnt
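Field2 is the string field in which you'd like to count words. Because of the GROUP BY, cnt is the total word count over all rows that share the same Field1 value; if Field1 is unique, that is simply the per-row count.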

Related

How to find the biggest word (in terms of LENGTH) in a MySQL column?

I have a table with a few mediumtext columns. Now I need to find the longest word in each row, in terms of word length.
The text column is for storing product description (like a paragraph in general understanding). So the column has multiple words. And I need to find what is the longest word in the column.
I tried with union all, but word count in rows is dynamic.
select sum(len) from (
SELECT LENGTH(description) - LENGTH(REPLACE(description, ' ', '')) + 1 as len
FROM test.city
union all
SELECT LENGTH(name) - LENGTH(REPLACE(name, ' ', '')) + 1 as len
FROM test.city
) as tablen;
I would do it by sorting the rows by the length of the column value and then picking the first row. Note that CHAR_LENGTH measures the whole value, so this finds the longest value rather than the longest individual word:
SELECT * FROM TEST ORDER BY CHAR_LENGTH(name) DESC LIMIT 1;
To rank rows by the combined length of both columns, you can use something like
SELECT *, CHAR_LENGTH(name) + CHAR_LENGTH(description) as len FROM TEST ORDER BY len DESC LIMIT 1;
Hope that solves your problem, if I understood it correctly.
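If the goal really is the longest individual word in each row, one option is to fetch the text and split it in application code. The following is only a sketch under that assumption: `longest_word` is a hypothetical helper, and the `rows` list stands in for (id, description) pairs that would come from the table.

```python
def longest_word(text):
    """Return the longest whitespace-separated word, or '' for empty text."""
    words = (text or "").split()
    # max() keeps the first word when several share the maximum length.
    return max(words, key=len, default="")


# Hypothetical rows standing in for (id, description) pairs from the table.
rows = [
    (1, "a compact rechargeable flashlight"),
    (2, "blue mug"),
    (3, ""),
]
longest_by_id = {row_id: longest_word(desc) for row_id, desc in rows}
```

Splitting on any whitespace avoids the single-space assumption that the SQL expressions rely on.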

How to get words from a varchar column and their frequency of occurrence in mysql

I have a varchar(255) column with FULLTEXT index. I need a query to get the most frequent words in the entire column as
Word Frequency
key1 4533
key2 4332
key3 2932
Note 1: I would prefer to skip common words such as prepositions, but it is not critical as I can filter them later. Just mentioned if it can speed up the query.
Note 2: It is a table with over a million rows. It is not a regular query but should be practically fast.
If you even give a hint how the query should look like, it will be a great help.
This is not really something that is easy to do in MySQL. The full text index is not available for querying. One thing you can do is extract words. This is a bit painful. The following assumes that words are separated by a single space and gets the frequencies of the first three words:
select substring_index(substring_index(t.words, ' ', n.n), ' ', -1) as word, count(*)
from t join
(select 1 as n union all select 2 union all select 3
) n
on n.n <= length(t.words) - length(replace(t.words, ' ', '')) + 1
group by substring_index(substring_index(t.words, ' ', n.n), ' ', -1)
order by count(*) desc;
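To see why the nested substring_index call picks out the n-th word, and why the join condition on the word count matters, here is a small Python sketch. The function `substring_index` below is only a rough stand-in for the MySQL function (it assumes a non-empty delimiter), and `nth_word` and `word_count` are hypothetical helper names.

```python
def substring_index(s, delim, count):
    """Rough Python stand-in for MySQL's SUBSTRING_INDEX (non-empty delim)."""
    parts = s.split(delim)
    if count >= 0:
        return delim.join(parts[:count])   # everything before the count-th delim
    return delim.join(parts[count:])       # everything after the count-th delim from the right


def nth_word(s, n):
    """n-th space-separated word (1-based), via the nested-call idiom."""
    return substring_index(substring_index(s, " ", n), " ", -1)


def word_count(s):
    """Spaces + 1, the same estimate the join condition uses."""
    return len(s) - len(s.replace(" ", "")) + 1


text = "alpha beta"
# Without the guard, asking for word 3 of a 2-word string repeats the last word.
unguarded = [nth_word(text, n) for n in (1, 2, 3)]
# With the guard (n <= word count), each word is produced exactly once.
guarded = [nth_word(text, n) for n in (1, 2, 3) if n <= word_count(text)]
```

The guard plays the role of the ON clause: it keeps only the (row, n) pairs for which an n-th word actually exists, so words are not counted twice.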

Query to count the distinct words of all values in a column

I have a mysql table "post" :
id Post
-----------------------------
1 Post Testing
2 Post Checking
3 My First Post
4 My first Post Check
I need to count the number of distinct words in all the values for the Post column.
Is there any way to get the following results using a single query?
post count
------------------
Post 4
Testing 1
checking 1
My 2
first 2
check 1
Not in an easy way. If you know the maximum number of words, then you can do something like this:
select substring_index(substring_index(p.post, ' ', n.n), ' ', -1) as word,
count(*)
from post p join
(select 1 as n union all select 2 union all select 3 union all select 4
) n
on n.n <= length(p.post) - length(replace(p.post, ' ', '')) + 1
group by word;
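The numbers-join approach can be tried out on the sample data with SQLite from Python. This is a sketch, not the MySQL query itself: SQLite has no SUBSTRING_INDEX, so a rough Python stand-in is registered under that name; the join condition is written as n <= spaces + 1; and LOWER() is added because SQLite groups case-sensitively, whereas MySQL's default collations usually do not.

```python
import sqlite3


def substring_index(s, delim, count):
    """Rough stand-in for MySQL's SUBSTRING_INDEX, registered in SQLite below."""
    if s is None:
        return None
    parts = s.split(delim)
    return delim.join(parts[:count] if count >= 0 else parts[count:])


conn = sqlite3.connect(":memory:")
conn.create_function("substring_index", 3, substring_index)
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, post TEXT)")
conn.executemany(
    "INSERT INTO post (id, post) VALUES (?, ?)",
    [
        (1, "Post Testing"),
        (2, "Post Checking"),
        (3, "My First Post"),
        (4, "My first Post Check"),
    ],
)

# Numbers 1..4 cover the longest post; each (row, n) pair yields the n-th word.
sql = """
SELECT LOWER(substring_index(substring_index(p.post, ' ', n.n), ' ', -1)) AS word,
       COUNT(*) AS cnt
FROM post p
JOIN (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) n
  ON n.n <= LENGTH(p.post) - LENGTH(REPLACE(p.post, ' ', '')) + 1
GROUP BY word
"""
counts = dict(conn.execute(sql).fetchall())
```

The derived table must contain at least as many numbers as the longest post has words; otherwise the trailing words are silently ignored.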
Note that this only works if the words are separated by single spaces. If you have a separate dictionary of all possible words, you can also use that. The following counts, for each dictionary word, the number of posts that contain it as a whole word:
select d.word, count(p.id)
from dictionary d left join
post p
on concat(' ', p.post, ' ') like concat('% ', d.word, ' %')
group by d.word;
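The dictionary join relies on padding both sides with spaces so that only whole words match. A minimal sketch of that idea, run in SQLite from Python (SQLite uses || where MySQL uses CONCAT, and its LIKE is case-insensitive for ASCII by default); the dictionary contents here are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, post TEXT)")
conn.execute("CREATE TABLE dictionary (word TEXT)")
conn.executemany(
    "INSERT INTO post (id, post) VALUES (?, ?)",
    [
        (1, "Post Testing"),
        (2, "Post Checking"),
        (3, "My First Post"),
        (4, "My first Post Check"),
    ],
)
# Hypothetical dictionary; "banana" is included to show a word with no matches.
conn.executemany(
    "INSERT INTO dictionary (word) VALUES (?)",
    [("post",), ("my",), ("check",), ("banana",)],
)

# ' ' || post || ' ' pads the text so '% word %' can only match whole words.
sql = """
SELECT d.word, COUNT(p.id)
FROM dictionary d
LEFT JOIN post p
  ON (' ' || p.post || ' ') LIKE ('% ' || d.word || ' %')
GROUP BY d.word
"""
posts_per_word = dict(conn.execute(sql).fetchall())
```

Because of the padding, "check" matches only the post that contains it as a separate word and not "Checking". The result is the number of posts containing each word, so a word repeated within one post is still counted once.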
You can use a FULLTEXT index.
First add a FULLTEXT index to your column like:
CREATE FULLTEXT INDEX ft_post
ON post(Post);
Then flush the index to disk using optimize table:
SET GLOBAL innodb_optimize_fulltext_only=ON;
OPTIMIZE TABLE post;
SET GLOBAL innodb_optimize_fulltext_only=OFF;
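Point innodb_ft_aux_table at the table, in db_name/table_name form:
SET GLOBAL innodb_ft_aux_table = '{yourDb}/post';
Now you can select the words and their counts. INNODB_FT_INDEX_TABLE holds one row per word position, and doc_count is the number of rows the word appears in, so use DISTINCT to get one line per word. Stopwords and words shorter than innodb_ft_min_token_size (3 by default) are not indexed, so a short word such as "My" will not appear:
SELECT DISTINCT word, doc_count FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE;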

mysql query where second words starts with a letter

I have a table with:
Name, Surname and Outputname ( = Name Surname )
Name and Surname should always have a value, but sometimes they are empty.
I need to select from the table by surname, and I can't use the Surname field.
I can't change the field values with a script either, because I get this table from an outside source and it can change at any time.
The database is MySQL 4.x.
Can I select by the second word in Outputname starting with some letter?
Something like:
SELECT Outputname FROM USERS
WHERE seconword(Outputname) LIKE 'A%' SORT BY seconword(Outputname) ASC
Try this:
SELECT
SUBSTRING_INDEX(Outputname, ' ', -1) as SecondWord
FROM USERS
WHERE SUBSTRING_INDEX(Outputname, ' ', -1) LIKE 'A%'
ORDER BY SecondWord ASC
One possible approach:
SELECT Surname
FROM (
SELECT SUBSTRING_INDEX(Outputname, ' ', -1)
AS Surname
FROM Users) AS S
WHERE Surname LIKE 'A%'
ORDER BY Surname;
This method assumes that Outputname always has the form 'FirstName LastName' (i.e., a single space is used as the delimiter, and it occurs exactly once).
I'm not sure I fully understood your question. You only have access to Outputname (composed of two words, name + surname) and you have to sort by the second word, right?
Try something like this (it worked for me):
SELECT
Outputname,
SUBSTRING_INDEX(Outputname, ' ', -1) as SecondWord
FROM USERS
ORDER BY SecondWord ASC
The expression SUBSTRING_INDEX(Outputname, ' ', -1) AS SecondWord returns the last word, i.e. everything after the final space. So if Outputname = 'Maria Callas' it returns 'Callas'; if Outputname = 'Sophia Cecilia Kalos' it returns 'Kalos'.
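The difference between "last word" and "second word" only shows up when a name has more than two parts. The sketch below mimics both in Python; `substring_index` is a rough stand-in for the MySQL function, and `last_word` and `second_word` are hypothetical helper names. The nested form used for `second_word` corresponds to SUBSTRING_INDEX(SUBSTRING_INDEX(Outputname, ' ', 2), ' ', -1).

```python
def substring_index(s, delim, count):
    """Rough Python stand-in for MySQL's SUBSTRING_INDEX (non-empty delim)."""
    parts = s.split(delim)
    return delim.join(parts[:count] if count >= 0 else parts[count:])


def last_word(name):
    """Mirrors SUBSTRING_INDEX(name, ' ', -1): text after the final space."""
    return substring_index(name, " ", -1)


def second_word(name):
    """Mirrors the nested call: keep the first two words, then take the last."""
    return substring_index(substring_index(name, " ", 2), " ", -1)


names = ["Maria Callas", "Sophia Cecilia Kalos", "Cher"]
by_last = {n: last_word(n) for n in names}
by_second = {n: second_word(n) for n in names}

# Filter and sort in the spirit of: WHERE word LIKE 'C%' ORDER BY word
starts_with_c = sorted(w for w in by_second.values() if w.startswith("C"))
```

For a single-word value both forms return the whole string, so rows with an empty Name or Surname may need separate handling.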

SQL: List unique substring from fields in table

I'm running a query to retrieve the first word from a string in a column, like so:
SELECT SUBSTRING_INDEX( `field` , ' ', 1 ) AS `field_first_word` FROM `your_table`
But this allows duplicates. What do I need to add to the statement to get a unique list of words?
Use DISTINCT.
For example:
SELECT DISTINCT SUBSTRING_INDEX( `field` , ' ', 1 ) AS `field_first_word` FROM `your_table`
Note: DISTINCT has performance implications. However, in your case there are likely few pure-MySQL alternatives.
To remove duplicates in a SELECT statement, change it to SELECT DISTINCT column. In this case:
SELECT DISTINCT SUBSTRING_INDEX( `field` , ' ', 1 ) AS `field_first_word` FROM `your_table`