calculating the word occurrence in mysql table

calculating the word occurrence in mysql table - mysql

I am getting the reviews from different sites and storing into table. For each review I am getting adjective and noun list in separate column.
So for each review there are main 3 values here.
review, adjective_list, rate
Now I want to count number of times adjectives repeats. After that recommending only those review which has adjectives which repeats maximum time and having review 4-5.
Which is correct way to do this?
My thought about this:
Creating trigger which perform action when ever there is insert review operation.
This trigger will read column having adjectives, calculate the occurrence(don't know how?) and storing top adjectives with their occurrence.
While recommendation selecting adjective with maximum occurrence, and looking into 4-5 rated review.
I am not sure what is correct way. Any help is appreciable
Main table looks like this:

Not tested, but if I understand you requirement correctly you should be able to base a query on something like this to do the job:-
SELECT id, SUBSTRING_INDEX(SUBSTRING_INDEX(adj_noun, ',', aCnt + 1), ',', -1), COUNT(*)
FROM Main_Table
INNER JOIN
(
SELECT Units.i + Tends.i * 10 + Hundreds.i * 100 AS aCnt
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) Units
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) Tens
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) Hundreds
) Integers_Query
ON aCnt <= (LENGTH(adj_noun) - LENGTH(REPLACE(adj_noun, ',', '')))
GROUP BY id, SUBSTRING_INDEX(SUBSTRING_INDEX(adj_noun, ',', aCnt + 1), ',', -1)
This uses a subquery to get a range of numbers (0 to 999), and does a join of this against your table where the number is less than or equal to the number of time a comma appears in the adj_noun column (ie, subtract the length of adj_noun with all the commas removed from the full length of adj_noun). Then use SUBSTRING_INDEX to get the string up to the aCnt comma, and again use SUBSTRING_INDEX to get the string from that comma back to the previous comma (excludes the commas from the result).
The COUNT / GROUP BY should get you the number of times each word appears in the resulting list for each item.
Probably fairly inefficient. Only copes with 1000 comma separated words (easily extended, but will be slower).

Related

Separate field into different rows [duplicate]

This question already has answers here:
SQL split values to multiple rows
(12 answers)
Closed 3 years ago.
I have a table called tableA, and the id field may stored values with comma at one record.
When I use sql: "select id from tableA", I will get the below result.
I do not know how to separate a row record into several rows like below:
27
19
7
18
...
Is it any hints for the sql script? Thank you.

I'm not confident there isn't a nicer way, but maybe something like this is a good starting point:
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(tablea.id, ',', numbers.row), ',', -1)
FROM
tablea INNER JOIN
-- This numbers subquery is taken from #Unreason's answer in https://stackoverflow.com/questions/304461/generate-an-integer-sequence-in-mysql
(
SELECT #row := #row + 1 AS row FROM
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t,
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t2,
(SELECT #row:=0) n) numbers
on numbers.row <= LENGTH(tablea.id)-LENGTH(REPLACE(tablea.id, ',', ''))+1
Given tablea.id='27,19,8', SUBSTRING_INDEX(SUBSTRING_INDEX(tablea.id, ',', 1), ',', -1) will return the first element (27) and SUBSTRING_INDEX(SUBSTRING_INDEX(tablea.id, ',', 2), ',', -1) will return the second element (19), and so forth.
Therefore I join that with a list of numbers (in this case my list goes up to 100, so I am assuming a single tablea.id field never has more than 100 comma-separated values, although the details could be changed). The join condition uses LENGTH(tablea.id)-LENGTH(REPLACE(tablea.id, ',', ''))+1 which is a count of how many comma-separated values are in the field.
Here's a db fiddle of it working to play around with.

mysql find numbers in query that are NOT in table

Is there a simple way to compare a list of numbers in my query to a column in a table to return the ones that are NOT in the db?
I have a comma separated list of numbers (1,57, 888, 99, 76, 490, etc etc) that I need to compare to the number column in a table in my DB. SOME of those numbers are in the table, some are not. I need the query to return those that are in my comma separated list, but are NOT in the DB...

I would put the list of numbers to be checked in a table of their own, then use WHERE NOT EXISTS to check whether they exist in the table to be queried. See this SQLFiddle demo for an example of how this might be accomplished:
If you're comfortable with this syntax, you can even avoid putting into a temp table:
SELECT * FROM (
SELECT 1 AS mycolumn
UNION
SELECT 2
UNION
SELECT 3
UNION
SELECT 4
UNION
SELECT 5
UNION
SELECT 6
UNION
SELECT 7
) a
WHERE NOT EXISTS ( SELECT 1 FROM mytable b
WHERE b.mycolumn = a.mycolumn )
UPDATE per comments from OP
If you can insert your very long list of numbers into a table, then query as follows to get the numbers that are not found in the other table:
SELECT mynumber
FROM mytableof37000numbers a
WHERE NOT EXISTS ( SELECT 1 FROM myothertable b
WHERE b.othernumber = a.mynumber)
Alternately
SELECT mynumber
FROM mytableof37000numbers a
WHERE a.mynumber NOT IN ( SELECT b.othernumber FROM myothertable b )
Hope this helps.

May be this is what you are looking for.
Convert your CSV to rows using SUBSTRING_INDEX. Use NOT IN operator to find the values which is not present in DB
Then Convert the result back to CSV using Group_Concat.
select group_concat(value) from(
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.a, ',', n.n), ',', -1) value
FROM csv t CROSS JOIN
(
SELECT a.N + b.N * 10 + 1 n
FROM
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
ORDER BY n
) n
WHERE n.n <= 1 + (LENGTH(t.a) - LENGTH(REPLACE(t.a, ',', '')))) ou
where value not in (select a from db)
SQLFIDDLE DEMO
CSV TO ROWS referred from this ANSWER

You could use the 'IN' clause of MySQL. Maybe check this out IN clause tutorial

how to split a string to 4 characters sub strings then concat them by hyphen?

I need to write a mysql query that does the following:
I have list of serial numbers in a database table, the query should only update serial number rows where the length of a serial number is more than 15, then each serial number is divided to 4-digits sub-strings separated by a hyphen. example :
'10028641232912356' should become '1002-8641-1232-9123-56'
I started with this, which only inserts a hyphen after first 4 :
SELECT serial_no, CHAR_LENGTH(TRIM(serial_no)) as 'length',
CONCAT_WS('-',SUBSTRING(TRIM(serial_no),1,4),
SUBSTRING(TRIM(serial_no),5,4)) as result
FROM pos where
serial_no is not null and CHAR_LENGTH(TRIM(serial_no))>=15 ;
this is only the select statement, at first I just want to get (as in select) the new format of the serial no then i'll update it, but since I dont know the exact length of each serial number I need to figure out this part:
CONCAT_WS('-',SUBSTRING(TRIM(serial_no),1,4)
This is must only be done using mysql functions
Any help is appreciated

Introduction
Since we've figured out that your strings can have any length; your issue boils down to how to split strings with any length in to it's chunks of some length in MySQL. Unfortunately there is a serious problem in MySQL that prevent us from doing this in "native" way. And this problem is - MySQL does not support sequences. That means you can not use some sort of internal iterator like construct to loop over your string.
But there is always a way
Building sequence
We can use a trick to do this. First part of trick: use CROSS JOIN to produce the desired row sets. If you are not aware how it works then I'll remind you. It will produce a Cartesian product of the two row sets. The second part of the trick: use a well known formula.
N = d1x101 + d2x102 + ...
Actually you can do that with any base, not just 10. For this demonstration I will use 10. Usage:
SELECT
n1.i+10*n2.i AS num
FROM
(SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n1
CROSS JOIN
(SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n2
So we're using UNION ALL for the numbers 0..9 to produce a multiplication part (our di). With the query above you'll get consecutive 0..99 numbers. This is because we're using base powers till 2 only, and 102=100. You can check fiddle to make sure that it will work properly.
Iterating through string
Now with these "pseudo-generator" we can emulate iteration through the string. To do that, there are MySQL variables that will be our iterator. Of course separation of string piece is a work for SUBSTR(). So basic skel is:
SELECT
SUBSTR(#str, #i:=#i+#len, #len) as chunk
FROM
(SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n1
CROSS JOIN
(SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n2
CROSS JOIN
(SELECT #str:='ABCD1234EFGH5678IJKL90', #len:=4, #i:=1-#len) as init
LIMIT 6;
(fiddle for sample above is here). We are just iterating through sequence and then using the iterator to create the correct offset. All that is left to do now is gather our string and insert the hyphens. Hopefully there is GROUP_CONCAT() in MySQL for that:
SELECT
GROUP_CONCAT(chunk SEPARATOR '-') AS string
FROM
(SELECT
SUBSTR(#str, #i:=#i+#len, #len) as chunk
FROM
(SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n1
CROSS JOIN
(SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n2
CROSS JOIN
(SELECT #str:='ABCD1234EFGH5678IJKL90', #len:=4, #i:=1-#len) as init
) AS data
WHERE chunk!='';
(again, fiddle is here).
With whole table
Now it is a sample and you want to select that from a table. It will be more complicated:
SELECT
serial_no,
GROUP_CONCAT(chunk SEPARATOR '-') AS serial
FROM
(SELECT
SUBSTR(
IF(#str=serial_no, #str, serial_no),
#i:=IF(#str=serial_no, #i+#len, 1),
#len
) AS chunk,
#str:=serial_no AS serial_no
FROM
(SELECT
serial_no
FROM
pos
CROSS JOIN
(SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n1
CROSS JOIN
(SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n2
ORDER BY
serial_no) AS data
CROSS JOIN
(SELECT #len:=4, #i:=1-#len) AS init) AS seq
WHERE
chunk!=''
GROUP BY
serial_no;
Fiddle is available here. There is another trick to produce the row sets. We'll use CROSS JOIN now instead of initializing string. So we should pass the current pos value of the serial_no. In order to make sure all the rows will be iterated properly we have to order them (that is done with inner ORDER BY).
Limitations
Well as you already know, sequence is limited with 99; thus, with our #len defined as 4 we'll be able to split only by a 4000 length string maximum. Also this will use the whole sequence in any case. Even if your string is much shorter (chances are it is). Thus performance impact may have place.
Is it worth the effort?
My point is: mostly, no. It may be ok to use it once, maybe. But it won't be re-usable and it won't be readable. Thus there is little sense in doing such string operations with DBMS functions because we have applications for such things. It should be used for that and using it you actually can create re-usable/scalable/or/whatever code.
Another way may be to create stored procedure in which we do the desired thing (so, split the string by given length & concatenate it with given delimiter). But honestly, it's just an attempt to "hide the problem". Because even if it would be re-usable, it still will have the same weakness; performance impact. Even more so if we're going to create code for the DBMS. Then again, why don't we place that code in the application? In 99% of cases DBMS is the place for data storage and the application is the place for code (e.g. logic). Mixing these two things almost always ends in a bad result.

Making an assumption that the serial number is unique then the following should do it:-
SELECT serial_no, GROUP_CONCAT(SUBSTRING(serial_no, anInt, 4) ORDER BY anInt SEPARATOR '-')
FROM
(
SELECT serial_no, anInt
FROM pos
CROSS JOIN
(
SELECT (4 * (units.i + 10 * tens.i + 100 * hundreds.i)) + 1 AS anInt
FROM (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
CROSS JOIN (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) hundreds
) sub1
WHERE LENGTH(serial_no) >= anInt
AND LENGTH(serial_no) > 15
) sub2
GROUP BY serial_no
This will cope with a serial number up to ~4000 characters long.
This works by using a series of unioned fixed queries to get the numbers 0 to 9. This is cross joined against itself 3 times with the first being units, the next being tens and the last being hundreds (you can carry on adding more if you want). From these the numbers between 0 and 999 are generated, then multiplied by 4 and with 1 added (so giving 1 to 3997 in steps of 4), which is the starting position of each group of 4. The WHERE clause checks that this generated number is less than the length of serial_no (if it is you land up with duplicates), and that serial_no is longer than 15.
This will generate a list of all the numbers, each one repeated as many times as their are groups of 4 numbers (or partial groups), along with the start position of a group.
The outer SELECT then takes this list and uses substring to extract each group, and uses GROUP_CONCAT to join he results together again with '-' as the separator between each group. If also specifies the start position of each group as the order to join them again (would probably be fine without this, but I wouldn't guarantee it).
SQL fiddle here:-
http://www.sqlfiddle.com/#!2/eb2d0/2

Changing a Query with a numbered result set (with gaps,) to return result with no gaps, containing every number.

I have a select statement: select a, b, [...]; which returns the results:
a|b
---------
1|8688798
2|355744
4|457437
7|27834
I want it to return:
a|b
---------
1|8688798
2|355744
3|0
4|457437
5|0
6|0
7|27834
An example query that does not do what I would like, since it does not have the gap numbers:
select
sub.num_of_ratings,
count(sub.rater)
from
(
select
r.rater_id as rater,
count(r.id) as num_of_ratings
from ratings r
group by rater
) as sub
group by num_of_ratings;
Explanation of the query:
If a user rates another user, the rating is listed in the table ratings and the id of the rating user is kept in the field rater_id. Effectively I check for all users who are referred to in ratings and count how many ratings records I find for that user, which is rater / num_of_ratings, and then I use this result to find how many users have rated a given number of times.
At the end I know how many users rated once, how many users rated twice, etc. My problem is that the numbers for count(sub.rater) start fine from 1,2,3,4,5... However, for bigger numbers there are gaps. This is because there might be one user who rated 1028 times - but no user who rated 1027 times.
I don't want to apply stored procedures looping over the result or something like that. Is it possible to fill those gaps in the result without using stored procedures, looping, or creating temporary tables?

If you have a sequence of numbers, then you can do a JOIN with that table and fill in the gaps properly.
You can check out this questions on how to get the sequence:
generate an integer sequence in MySQL
Here is one of the answers posted that might be easily used with the limitation that generates numbers from 1 to 10,000:
SELECT #row := #row + 1 as row FROM
(select 0 union all select 1 union all select 3 union all select 4 union all select 5 union all select 6 union all select 6 union all select 7 union all select 8 union all select 9) t,
(select 0 union all select 1 union all select 3 union all select 4 union all select 5 union all select 6 union all select 6 union all select 7 union all select 8 union all select 9) t2,
(select 0 union all select 1 union all select 3 union all select 4 union all select 5 union all select 6 union all select 6 union all select 7 union all select 8 union all select 9) t3,
(select 0 union all select 1 union all select 3 union all select 4 union all select 5 union all select 6 union all select 6 union all select 7 union all select 8 union all select 9) t4,
(SELECT #row:=0) t5

Using a sequence of numbers, you can join your result set. For instance, assuming your number list is in a table called numbersList, with column number:
Select number, Count
from
numbersList left outer join
(select
sub.num_of_ratings,
count(sub.rater) as Count
from
(
select
r.rater_id as rater,
count(r.id) as num_of_ratings
from ratings r
group by rater
) as sub
group by num_of_ratings) as num
on num.num_of_ratings=numbersList.number
where numbersList.number<max(num.num_of_ratings)
Your numbers list must be larger than your largest value, obviously, and the restriction will allow it to not have all numbers up to the maximum. (If MySQL does not allow that type of where clause, you can either leave the where clause out to list all numbers up to the maximum, or modify the query in various ways to achieve the same result.)

#mazzucci: the query is too magical and you are not actually explaining the query.
#David: I cannot create a table for that purpose (as stated in the question)
Basically what I need is a select that returns a gap-less list of numbers. Then I can left join on that result set and treat NULL as 0.
What I need is an arbitrary table that keeps more records than the length of the final list. I use the table user for that in the following example:
select #row := #row + 1 as index
from (select #row := -1) r, users u
limit 101;
This query returns a set of the numbers von 0 to 100. Using it as a subquery in a left join finally fills the gap.
users is just a dummy to keep the relational engine going and hence producing the numbers incrementally.
select t1.index as a, ifnull(t2.b, 0) as b
from (
select #row := #row + 1 as index
from (select #row := 0) r, users u
limit 7
) as t1
left join (
select a, b [...]
) as t2
on t1.index = t2.a;
I didn't try this very query live, so have merci with me if there is a little flaw. but technically it works. you get my point.
EDIT:
just used this concept to gain a gapless list of dates to left join measures onto it:
select #date := date_add(#date, interval 1 day) as date
from (select #date := '2010-10-14') d, users u
limit 700
starts from 2010/10/15 and iterates 699 more days.

Counting word occurrences in a table column

I have a table with a varchar(255) field. I want to get (via a query, function, or SP) the number of occurences of each word in a group of rows from this table.
If there are 2 rows with these fields:
"I like to eat bananas"
"I don't like to eat like a monkey"
I want to get
word | count()
---------------
like 3
eat 2
to 2
i 2
a 1
Any idea? I am using MySQL 5.2.

#Elad Meidar, I like your question and I found a solution:
SELECT SUM(total_count) as total, value
FROM (
SELECT count(*) AS total_count, REPLACE(REPLACE(REPLACE(x.value,'?',''),'.',''),'!','') as value
FROM (
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.sentence, ' ', n.n), ' ', -1) value
FROM table_name t CROSS JOIN
(
SELECT a.N + b.N * 10 + 1 n
FROM
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
ORDER BY n
) n
WHERE n.n <= 1 + (LENGTH(t.sentence) - LENGTH(REPLACE(t.sentence, ' ', '')))
ORDER BY value
) AS x
GROUP BY x.value
) AS y
GROUP BY value
Here is the full working fiddle: http://sqlfiddle.com/#!2/17481a/1
First we do a query to extract all words as explained here by #peterm(follow his instructions if you want to customize the total number of words processed). Then we convert that into a sub-query and then we COUNT and GROUP BY the value of each word, and then make another query on top of that to GROUP BY not grouped words cases where accompanied signs might be present. ie: hello = hello! with a REPLACE

I would recommend not to do this in SQL at all. You're loading DB with something that it isn't best at. Selecting a group of rows and doing frequency calculation on the application side will be easier to implement, will work faster and will be maintained with less issues/headaches.

You can try this perverted-a-little way:
SELECT
(LENGTH(field) - LENGTH(REPLACE(field, 'word', ''))) / LENGTH('word') AS `count`
ORDER BY `count` DESC
This query can be very slow. Also, it looks pretty ugly.

I think you should do it like indexing, with additional table.
Whenever u create, update, or delete a row in your original table, you should update your indexing table. That indexing table should have the columns: word, and the number of occurrences.

I think you are trying to do too much with SQL if all the words are in one field of each row. I recommend to do any text processing/counting with your application after you grab the text fields from the db.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

calculating the word occurrence in mysql table - mysql

Related

Separate field into different rows [duplicate]

mysql find numbers in query that are NOT in table

how to split a string to 4 characters sub strings then concat them by hyphen?

Changing a Query with a numbered result set (with gaps,) to return result with no gaps, containing every number.

Counting word occurrences in a table column

Categories

Resources