How to find most popular word occurrences in MySQL? - mysql

I have a table called results with 5 columns.
I'd like to use the title column to find rows matching, say, WHERE title LIKE '%for sale%', and then list the most popular words in that column. One would be for and another would be sale, but I want to see what other words correlate with these.
Sample data:
title
cheap cars for sale
house for sale
cats and dogs for sale
iphones and androids for sale
cheap phones for sale
house furniture for sale
Results (single words):
for 6
sale 6
cheap 2
and 2
house 2
furniture 1
cars 1
etc...

You can extract words with some string manipulation. Assuming you have a numbers table and that words are separated by single spaces:
select substring_index(substring_index(r.title, ' ', n.n), ' ', -1) as word,
count(*)
from results r join
numbers n
on n.n <= length(title) - length(replace(title, ' ', '')) + 1
group by word;
If you don't have a numbers table, you can construct one manually using a subquery:
from results r join
(select 1 as n union all select 2 union all select 3 union all . . .
) n
. . .
The SQL Fiddle (courtesy of @GrzegorzAdamKowalski) is here.
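To make the nested substring_index call easier to follow, here is a minimal Python sketch of the same logic. The helper names (substring_index, nth_word, word_counts) are illustrative only, and the Python substring_index is a rough stand-in for the MySQL function that assumes a non-empty delimiter.

```python
from collections import Counter


def substring_index(text, delim, count):
    """Rough stand-in for MySQL's SUBSTRING_INDEX (non-empty delimiter assumed)."""
    parts = text.split(delim)
    if count > 0:
        # Everything before the count-th delimiter, counted from the left.
        return delim.join(parts[:count])
    if count < 0:
        # Everything after the count-th delimiter, counted from the right.
        return delim.join(parts[count:])
    return ""


def nth_word(title, n):
    # Same nesting as the query: keep the first n words, then take the last of those.
    return substring_index(substring_index(title, " ", n), " ", -1)


def word_counts(titles):
    counts = Counter()
    for title in titles:
        # Number of words = number of spaces + 1, as in the join condition.
        n_words = len(title) - len(title.replace(" ", "")) + 1
        for n in range(1, n_words + 1):
            counts[nth_word(title, n)] += 1
    return counts
```

Calling word_counts on the six sample titles and then most_common() on the result lists the words by frequency, which is what the GROUP BY does in the query.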

You can use ExtractValue in some interesting way. See SQL fiddle here: http://sqlfiddle.com/#!9/0b0a0/45
We need only one table:
CREATE TABLE text (`title` varchar(29));
INSERT INTO text (`title`)
VALUES
('cheap cars for sale'),
('house for sale'),
('cats and dogs for sale'),
('iphones and androids for sale'),
('cheap phones for sale'),
('house furniture for sale')
;
Now we construct a series of selects that extract whole words from the text converted to XML. Each select extracts the N-th word from the text.
select words.word, count(*) as `count` from
(select ExtractValue(CONCAT('<w>', REPLACE(title, ' ', '</w><w>'), '</w>'), '//w[1]') as word from `text`
union all
select ExtractValue(CONCAT('<w>', REPLACE(title, ' ', '</w><w>'), '</w>'), '//w[2]') from `text`
union all
select ExtractValue(CONCAT('<w>', REPLACE(title, ' ', '</w><w>'), '</w>'), '//w[3]') from `text`
union all
select ExtractValue(CONCAT('<w>', REPLACE(title, ' ', '</w><w>'), '</w>'), '//w[4]') from `text`
union all
select ExtractValue(CONCAT('<w>', REPLACE(title, ' ', '</w><w>'), '</w>'), '//w[5]') from `text`) as words
where length(words.word) > 0
group by words.word
order by `count` desc, words.word asc
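The same idea can be sketched outside MySQL. The Python below is only an illustration under assumptions: it uses the standard xml.etree.ElementTree module instead of ExtractValue, adds a root element <r> because ElementTree needs a single root, and the function names are made up for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def title_to_xml(title):
    # Wrap every space-separated word in <w>...</w>, escaping XML special characters.
    words = "".join(f"<w>{escape(word)}</w>" for word in title.split(" "))
    return f"<r>{words}</r>"


def nth_word_xml(title, n):
    """Return the n-th word (1-based), or '' when the title has fewer words."""
    root = ET.fromstring(title_to_xml(title))
    words = root.findall("w")
    if n > len(words):
        return ""
    return words[n - 1].text or ""
```

As in the query, a word position beyond the end of the title yields an empty string, which is why the outer select filters on length(words.word) > 0. Note that the query only looks at positions 1 to 5, so titles with more than five words would need more selects.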

This would give you single words (if I understand correctly what you mean by a single word):
select concat(val,' ',cnt) as result from(
select (substring_index(substring_index(t.title, ' ', n.n), ' ', -1)) val,count(*) as cnt
from results t cross join(
select a.n + b.n * 10 + 1 n
from
(select 0 as n union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6
union all select 7 union all select 8 union all select 9) a,
(select 0 as n union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6
union all select 7 union all select 8 union all select 9) b
order by n
) n
where n.n <= 1 + (length(t.title) - length(replace(t.title, ' ', '')))
group by val
order by cnt desc
) as x
The result should look like this:
Result
--------
for 6
sale 6
house 2
and 2
cheap 2
phones 1
iphones 1
dogs 1
furniture 1
cars 1
androids 1
cats 1
But if you need the words on a single row, like this:
result
-----------
for 6 sale 6 house 2 and 2 cheap 2 phones 1 iphones 1 dogs 1 furniture 1 cars 1 androids 1 cats 1
Just modify the query above to:
select group_concat(concat(val,' ',cnt) separator ' ') as result from( ...

Update
Idea taken from https://stackoverflow.com/a/17942691/98491
This query works on my machine (MySQL 5.7), however Sqlfiddle reports an error.
The basic idea is that you should either create a table with the numbers from 1 to the maximum number of words in your field (4 here) or, as I did, use a UNION of 1 .. 4 for simplicity.
CREATE TABLE products (
`id` int,
`name` varchar(45)
);
INSERT INTO products
(`id`, `name`)
VALUES
(1, 'for sale'),
(2, 'for me'),
(3, 'for you'),
(4, 'you and me')
;
SELECT name, COUNT(*) as count FROM
(
SELECT
product.id,
SUBSTRING_INDEX(SUBSTRING_INDEX(product.name, ' ', numbers.n), ' ', -1) name
FROM
(
SELECT 1 AS n
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
) AS numbers
INNER JOIN products product
ON CHAR_LENGTH(product.name)
-CHAR_LENGTH(REPLACE(product.name, ' ', ''))>=numbers.n-1
ORDER BY
id, n
)
AS result
GROUP BY name
ORDER BY count DESC
Result will be
for | 3
you | 2
me | 2
and | 1
sale| 1

SQL is not well suited to this task. While it is possible, there are limitations (the number of words, for example).
A quick PHP script to do the same task may be easier to use long term (and likely quicker too):
<?php
$rows = [
"cheap cars for sale",
"house for sale",
"cats and dogs for sale",
"iphones and androids for sale",
"cheap phones for sale",
"house furniture for sale",
];
//rows here should be replaced by the SQL result
$wordTotals = [];
foreach ($rows as $row) {
$words = explode(" ", $row);
foreach ($words as $word) {
if (isset($wordTotals[$word])) {
$wordTotals[$word]++;
continue;
}
$wordTotals[$word] = 1;
}
}
arsort($wordTotals);
foreach($wordTotals as $word => $count) {
echo $word . " " . $count . PHP_EOL;
}
Output
for 6
sale 6
and 2
cheap 2
house 2
phones 1
androids 1
furniture 1
cats 1
cars 1
dogs 1
iphones 1

Here is working SQL Fiddle: http://sqlfiddle.com/#!9/0b0a0/32
Let's start with two tables - one for texts and one for numbers:
CREATE TABLE text (`title` varchar(29));
INSERT INTO text
(`title`)
VALUES
('cheap cars for sale'),
('house for sale'),
('cats and dogs for sale'),
('iphones and androids for sale'),
('cheap phones for sale'),
('house furniture for sale')
;
CREATE TABLE iterator (`index` int);
INSERT INTO iterator
(`index`)
VALUES
(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),
(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30)
;
The second table, iterator, must contain the numbers from 1 to N, where N is at least one more than the length of the longest string in text (the query looks for the space appended after the last word).
Then, run this query:
select
words.word, count(*) as `count`
from
(select
substring(concat(' ', t.title, ' '), i.index+1, j.index-i.index) as word
from
text as t, iterator as i, iterator as j
where
substring(concat(' ', t.title), i.index, 1) = ' '
and substring(concat(t.title, ' '), j.index, 1) = ' '
and i.index < j.index
) AS words
where
length(words.word) > 0
and words.word not like '% %'
group by words.word
order by `count` desc, words.word asc
There are two selects. The outer one simply groups and counts single words (strings of length greater than 0 that contain no spaces). The inner one extracts every string that starts after one space character and ends before another, so these strings aren't necessarily words (despite the subquery being named words): they can contain spaces other than the starting and ending ones.
Results:
word count
for 6
sale 6
and 2
cheap 2
house 2
androids 1
cars 1
cats 1
dogs 1
furniture 1
iphones 1
phones 1
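For readers who find the double join on iterator hard to picture, here is a simplified Python restatement of the inner and outer filters. It is a sketch, not a translation: positions are 0-based, the title is padded once on both sides, and the function name is made up for this example.

```python
from collections import Counter
from itertools import combinations


def words_by_space_pairs(title):
    # Pad with a space on both sides so every word sits between two spaces.
    padded = f" {title} "
    spaces = [pos for pos, ch in enumerate(padded) if ch == " "]
    words = []
    # Every pair of space positions (start < end) delimits a candidate string,
    # like the i.index < j.index condition in the inner select.
    for start, end in combinations(spaces, 2):
        candidate = padded[start + 1:end]
        # Outer filter: keep only non-empty candidates without inner spaces.
        if candidate and " " not in candidate:
            words.append(candidate)
    return words
```

Feeding the words from all titles into a Counter gives the same counts as the outer group by. The number of candidate pairs grows quadratically with the number of spaces, which is one reason this approach gets slow on long strings.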

Related

SQL: How can I get the items value in one row using sql query?

How can I get the items value in one row using sql query like 20,10,15,40,15?
Table Name : ites
id | items
===============
1 | 20
2 | 10
3 | 15
4 | 40
5 | 15
As example we can get sum like this
Query
SELECT SUM(items) FROM `ites`
Result
SUM(items)
100
In same manner I want to get the value like 20,10,15,40,15 using a query?
You can use GROUP_CONCAT:
select GROUP_CONCAT(items) FROM ites
Ignoring the IDs of the items:
select GROUP_CONCAT(items) as myvalues FROM ites
With multiple occurrences of items per ID (SQL Server syntax):
SELECT ID
,STUFF((SELECT ', ' + CAST(items AS VARCHAR(10)) [text()]
FROM ites
WHERE ID = t.ID
FOR XML PATH(''), TYPE)
.value('.','NVARCHAR(MAX)'),1,2,' ') myvalues
FROM ites t
GROUP BY ID

Retrieve Distinct concat values from MySQL table

I have an SQL table advert
id name cat
11 abc ab
12 acb ab, bc
13 abb bcd
14 abcd ad
15 acbd de
16 abbd ad
On using DISTINCT function I am getting an output like this
Query:
SELECT DISTINCT cat FROM advert;
Output:
ab
ab, bc
bcd
ad
de
What changes do I need to make in my query to get output like this?
ab
bc
bcd
ad
de
select distinct trim(substring_index(substring_index(cat,',',n),',',-1)) as cat
from advert join (select 1 as n union all select 2 union all select 3) r
on cat like concat('%',repeat(',%',n-1))
I think you should change your table structure and make it like this.
tblName
id | name
11 abc
12 acb
13 abb
14 abcd
15 acbd
16 abbd
tblCat
id | name_id | cat
some ids* 11 ab
12 ab
12 bc
13 bcd
14 ad
15 de
16 ad
In this way you can easily query and manage your data in your tables.
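A small sketch of why the split-out category table makes the original question trivial. It runs against an in-memory SQLite database through Python's sqlite3 module rather than MySQL, the id column of tblCat is left out for brevity, and the function name is made up for this example.

```python
import sqlite3


def distinct_categories(pairs):
    """pairs: (name_id, cat) rows, one row per category of each advert."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblCat (name_id INTEGER, cat TEXT)")
    conn.executemany("INSERT INTO tblCat VALUES (?, ?)", pairs)
    # With one category per row, a plain DISTINCT is enough; no string splitting.
    cats = [row[0] for row in conn.execute(
        "SELECT DISTINCT cat FROM tblCat ORDER BY cat")]
    conn.close()
    return cats
```

With the rows from the tblCat listing above, this returns each category once, including bc, which the comma-separated column hid inside 'ab, bc'.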
You should fix your data structure so you are not storing comma-delimited lists in columns. That is the wrong way to store data in a relational database . . . as you can see by the problems for answering this simple question. What you want is a junction table.
Sometimes, we are stuck with other people's bad designs. If there are only a few values per row (the query below handles up to three), you can do:
select cat
from ((select substring_index(cat, ', ', 1) as cat
from advert
) union all
(select substring_index(substring_index(cat, ', ', 2), ', ', -1) as cat
from advert
where cat like '%, %'
) union all
(select substring_index(substring_index(cat, ', ', 3), ', ', -1) as cat
from advert
where cat like '%, %, %'
)
) c
group by cat;
First... I would create a statement that would turn all the rows into one big massive comma delimited list.
DECLARE @tmp VarChar(max)
SET @tmp = ''
SELECT @tmp = @tmp + ColumnA + ',' FROM TableA
Then use the table valued udf split described by this SO article to turn that massive string back into a table with a distinct clause to ensure that it's unique.
https://stackoverflow.com/a/2837662/261997
SELECT DISTINCT * FROM dbo.Split(',', @tmp)
Full code example:
if object_id('dbo.Split') is not null
drop function dbo.Split
go
CREATE FUNCTION dbo.Split (@sep char(1), @s varchar(512))
RETURNS table
AS
RETURN (
WITH Pieces(pn, start, stop) AS (
SELECT 1, 1, CHARINDEX(@sep, @s)
UNION ALL
SELECT pn + 1, stop + 1, CHARINDEX(@sep, @s, stop + 1)
FROM Pieces
WHERE stop > 0
)
SELECT pn,
SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop-start ELSE 512 END) AS s
FROM Pieces
)
go
declare @t table (colA varchar(max))
insert @t select '111, 223'
union all select '333'
union all select '444'
union all select '777,999';
select ltrim(rtrim(s.s)) as colC
from @t t
cross apply
dbo.split(',', t.colA) s

MySQL - count and group by - display all results in one row

I think this should be simple, but I can't figure it out or find a solution here.
I have a table ITEM_PROPERTIES
item_ID int
property_name char(20)
property_value char(20)
Sample data
5 Colour Black
5 Size M
6 Colour Blue
6 Size L
7 Colour Purple
7 Size M
8 Colour Blue
8 Size S
9 Colour Wine
9 Size M
If I do:
SELECT property_value, count(property_value)
FROM ITEM_PROPERTIES where property_name='Colour'
group by property_value
I get :
Black 1
Blue 2
Purple 1
Wine 1
How can I get this on one row like:
Black 1, Blue 2, Purple 1, Wine 1
I tried :
SELECT group_concat(property_value, count(property_value))
FROM ITEM_PROPERTIES where property_name='Colour'
group by property_value
SELECT group_concat(concat(property_value, count(property_value)))
FROM ITEM_PROPERTIES where property_name='Colour'
group by property_value
But both give an error Invalid use of group function
select group_concat(concat(property_value, ' ', cnt))
from (
SELECT property_value, count(property_value) as cnt
FROM ITEM_PROPERTIES
WHERE property_name = 'Colour'
GROUP BY property_value
) c
You can use two levels of aggregation:
SELECT group_concat(property_value, ' ', cnt separator ', ')
FROM (SELECT property_value, count(property_value) as cnt
FROM ITEM_PROPERTIES
WHERE property_name = 'Colour'
GROUP BY property_value
) ip;
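The two levels can be tried out quickly with Python's sqlite3 module. This is a sketch under assumptions: SQLite's group_concat takes the separator as a second argument instead of MySQL's SEPARATOR keyword, || replaces concat, and the function name is made up for this example.

```python
import sqlite3


def colours_on_one_row(rows):
    """rows: (item_id, property_name, property_value) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item_properties "
        "(item_id INTEGER, property_name TEXT, property_value TEXT)")
    conn.executemany("INSERT INTO item_properties VALUES (?, ?, ?)", rows)
    (result,) = conn.execute("""
        SELECT group_concat(property_value || ' ' || cnt, ', ')
        FROM (SELECT property_value, count(*) AS cnt   -- level 1: count per value
              FROM item_properties
              WHERE property_name = 'Colour'
              GROUP BY property_value) AS c            -- level 2: join the rows
    """).fetchone()
    conn.close()
    return result
```

The inner query produces one row per colour with its count; only the outer query concatenates, so no aggregate is nested inside another, which is what caused the Invalid use of group function error. SQLite does not guarantee the order of the concatenated items here.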
Hey, please check the following function (Oracle PL/SQL); it can be reused with any query later on:
CREATE OR REPLACE FUNCTION "JOIN"
(
p_cursor sys_refcursor
,p_del VARCHAR2:=','
)
RETURN VARCHAR2
IS
l_value VARCHAR2(32767);
l_result VARCHAR2(32767);
BEGIN
LOOP
FETCH p_cursor INTO l_value;
EXIT WHEN p_cursor%NOTFOUND;
IF l_result IS NOT NULL
THEN
l_result:=l_result || p_del;
END IF;
l_result:=l_result || l_value;
END LOOP;
close p_cursor;
RETURN l_result;
END JOIN;
To call this function, use the following sample:
SELECT Join(CURSOR(select column from tableName)) FROM DUAL;
Cheers

looking for a field name that contains specific data (mySql)

I have a table named "teachers_db" with these fields:
teacher_name
instrument1
instrument2
payment1
payment2
In my program I get a list of teachers and the instruments they teach,
e.g.:
Mathew - piano
Mathew - drums
Adam - guitar
I want to write a MySQL query that takes a teacher_name and an instrument (instrument1 = "piano" or instrument2 = "drums" and so on) and returns the payment for it (payment1 if it's piano, payment2 if it's drums).
I'm banging my head against this problem, please help :)
One way to get the result is to use two separate queries, combined with a UNION ALL operator. For example:
SELECT q.teacher_name
, q.instrument1 AS instrument
, q.payment1 AS payment
FROM teachers_db q
WHERE q.instrument1 <> ''
UNION ALL
SELECT r.teacher_name
, r.instrument2 AS instrument
, r.payment2 AS payment
FROM teachers_db r
WHERE r.instrument2 <> ''
ORDER BY 1,2
Another approach:
SELECT t.teacher_name
, CASE i.n WHEN 1 THEN t.instrument1 WHEN 2 THEN t.instrument2 END AS instrument
, CASE i.n WHEN 1 THEN t.payment1 WHEN 2 THEN t.payment2 END AS payment
FROM (SELECT 1 AS n UNION ALL SELECT 2) i
CROSS
JOIN teachers_db t
HAVING instrument <> ''
ORDER BY 1,2
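To answer the original question directly (payment for a given teacher and instrument), the UNION ALL result can be filtered. The sketch below uses Python's sqlite3 module rather than MySQL; the payment amounts and both function names are made up for illustration.

```python
import sqlite3


def make_demo_db():
    # Hypothetical sample rows; the payment amounts are invented for this example.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE teachers_db (teacher_name TEXT, instrument1 TEXT, "
        "instrument2 TEXT, payment1 INTEGER, payment2 INTEGER)")
    conn.executemany(
        "INSERT INTO teachers_db VALUES (?, ?, ?, ?, ?)",
        [("Mathew", "piano", "drums", 100, 80),
         ("Adam", "guitar", "", 90, None)])
    return conn


def payment_for(conn, teacher, instrument):
    """Return the payment for (teacher, instrument), or None if there is no match."""
    row = conn.execute("""
        SELECT payment
        FROM (SELECT teacher_name, instrument1 AS instrument, payment1 AS payment
              FROM teachers_db
              UNION ALL
              SELECT teacher_name, instrument2, payment2
              FROM teachers_db) AS u
        WHERE teacher_name = ? AND instrument = ?
    """, (teacher, instrument)).fetchone()
    return row[0] if row else None
```

The subquery turns each instrument/payment column pair into its own row, so the lookup no longer needs to know whether the instrument is stored in instrument1 or instrument2.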

MySQL - Complex Query w/ GROUP_CONCAT... Single Table

I have the following table:
id | billingno | location
-------------------------
1 | 9999999 | Toronto
2 | 9999999 | Toronto
3 | 7777777 | Toronto
4 | 7777777 | Quebec
I need a query that would generate me something that looked like this:
location | total | display
--------------------------
Toronto | 3 | 9999999 - 2, 7777777 - 1
Quebec | 1 | 7777777 - 1
So, it groups by location, displays the total number of billingno's for that location, and then the display column lists each billingno and how many times they were in that location. I have been trying to write this for some time, my closest attempt is this:
SELECT location, COUNT(*) AS total, GROUP_CONCAT(DISTINCT CAST(CONCAT(CONVERT(billingno,CHAR(16)), ' - ', THIS_COUNT_PART_FOR_EACH_LOCATION_IN_DISPLAY_DOESNT_WORK)AS CHAR)
SEPARATOR ' - ') AS display
FROM table GROUP BY location
ORDER BY COUNT(*) DESC
It gives me everything I need except I cannot for the life of me figure out how to count the number of each billingno's under display. If I use COUNT() it gives me an error about grouping. Please help!
Oh, I also had to use the convert to char so it would show up as text and not a BLOB in phpMyAdmin. Thanks again!
Sample data:
create table location (
id int,
billingno varchar(10),
location varchar(10)
);
insert into location
select 1, '9999999', 'Toronto' union
select 2, '9999999', 'Toronto' union
select 3, '7777777', 'Toronto' union
select 4, '7777777', 'Quebec' ;
Query:
select
location,
sum(qty) as total,
group_concat(concat(billingno, ' - ', cast(qty as char(7)))
order by qty desc separator ', '
) as display
from (
select billingno, location, count(*) as qty
from location
group by billingno, location
) t
group by location
order by total desc
Result:
location total display
Toronto 3 9999999 - 2, 7777777 - 1
Quebec 1 7777777 - 1
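The same nested aggregation can be checked with Python's sqlite3 module. This is only a sketch: the table is called billing here, SQLite's group_concat takes the separator as a second argument, and the ORDER BY inside group_concat is left out because older SQLite versions do not support it, so the order of entries in display is not guaranteed.

```python
import sqlite3


def billing_summary(rows):
    """rows: (id, billingno, location) tuples; returns (location, total, display) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE billing (id INTEGER, billingno TEXT, location TEXT)")
    conn.executemany("INSERT INTO billing VALUES (?, ?, ?)", rows)
    result = conn.execute("""
        SELECT location,
               sum(qty) AS total,
               group_concat(billingno || ' - ' || qty, ', ') AS display
        FROM (SELECT billingno, location, count(*) AS qty   -- count per pair first
              FROM billing
              GROUP BY billingno, location) AS t
        GROUP BY location                                   -- then roll up per location
        ORDER BY total DESC
    """).fetchall()
    conn.close()
    return result
```

The inner query counts each (billingno, location) pair; the outer query sums those counts per location and concatenates them, so the per-billingno count is an ordinary column by the time it reaches group_concat.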
How about this,
SELECT table.location,
SUM(LocationCount) AS Total,
GROUP_CONCAT(CAST(CONCAT(CONVERT(billingno,CHAR(16)), ' - ', THIS_COUNT_PART_FOR_EACH_LOCATION_IN_DISPLAY_DOESNT_WORK)AS CHAR)
SEPARATOR ' - ') AS display
FROM table
LEFT JOIN
(SELECT location , COUNT(id) AS LocationCount
FROM table
GROUP BY location) t on t.location = table.location
GROUP BY location
ORDER BY SUM(LocationCount) DESC
SELECT location, SUM( total ) AS total, GROUP_CONCAT( CONCAT( billingno, ' - ', billing_count ) ) AS display
FROM (
SELECT location, COUNT( billingno ) AS total, billingno, COUNT( billingno ) AS billing_count
FROM billing
GROUP BY location, billingno
ORDER BY COUNT( * ) DESC
) AS t
GROUP BY location