MySQL count occurrences greater than 2 - mysql

I have the following table structure
+ id + word +
+------+--------+
The table gets filled with the lower-cased words of a given text, so the text
Hello bye hello
would result in
+ id + word +
+------+--------+
+ 1 + hello +
+------+--------+
+ 2 + bye +
+------+--------+
+ 3 + hello +
+------+--------+
I want to write a SELECT query that returns the number of words that appear at least twice in the table (like hello). My attempt:
SELECT COUNT(id) FROM words WHERE (SELECT COUNT(words.word))>1
which is of course wrong, and would be very slow on a big table. Any idea how to achieve this? For the example above, I would expect 1.

To get a list of the words that appear more than once together with how often they occur, use a combination of GROUP BY and HAVING:
SELECT word, COUNT(*) AS cnt
FROM words
GROUP BY word
HAVING cnt > 1
To find the number of words in the above result set, use that as a subquery and count the rows in an outer query:
SELECT COUNT(*)
FROM
(
SELECT NULL
FROM words
GROUP BY word
HAVING COUNT(*) > 1
) T1
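To try the counting query against the question's sample data, here is a minimal sketch in Python. It assumes the standard-library sqlite3 module as a stand-in for MySQL (this particular query uses nothing MySQL-specific); the table name and rows are taken from the question.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table and rows mirror the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT)")
conn.executemany("INSERT INTO words (word) VALUES (?)",
                 [("hello",), ("bye",), ("hello",)])

# Inner query: one row per word that occurs more than once.
# Outer query: count those rows.
repeated_count = conn.execute("""
    SELECT COUNT(*)
    FROM (
        SELECT word
        FROM words
        GROUP BY word
        HAVING COUNT(*) > 1
    ) AS t1
""").fetchone()[0]

print(repeated_count)  # 1
```

Only "hello" occurs more than once, so the result is 1, as the question expects.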

SELECT count(word) as count
FROM words
GROUP BY word
HAVING count >= 2;

SELECT word, COUNT(*) FROM words GROUP by word HAVING COUNT(*) > 1

The HAVING clause can be used for this purpose; the query should be:
SELECT word, COUNT(*) FROM words
GROUP BY word
HAVING COUNT(*) > 1;

Related

How to retrieve individual tags from TAGS column?

I have the following table in MySQL database:
id creation_date score tags
1 2016-02-09 07:24:59.097000+00:00 -1 html|javascript
2 2016-02-09 08:10:00.000000+00:00 0 xml|css
3 2016-02-10 08:00:15.000000+00:00 2 html|javascript
4 2016-02-11 07:00:45.000000+00:00 -5 html|css
I want to retrieve the tags and order them by scores. Then I want to sort the tags by the frequency of negative scores, so that the worst tags would appear on top.
The expected result for the above-given query would be:
TAG FREQUENCY
html 2
css 1
javascript 1
xml 0
I am stuck on retrieving the individual tags from the column. This is as far as I got:
SELECT tags, COUNT(*)
FROM my_table
WHERE score < 0
When you are stuck with such an awful data format, you can do something with it. A table of numbers can help, but here is an example that will extract up to the first 3 items:
select substring_index(substring_index(t.tags, '|', n.n), '|', -1) as tag, count(*)
from (select 1 as n union all
      select 2 as n union all
      select 3 as n
     ) n join
     my_table t
     on n.n <= length(t.tags) - length(replace(t.tags, '|', '')) + 1
group by tag;
What is this doing? The on clause is making sure there are at least n tags in the string, for a given value of n (larger values are filtered out).
The two substring_index() functions are extracting the nth tag from the list. And then there is aggregation.
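To see the nested calls at work outside the database, here is a small Python sketch. The helper substring_index below is a hypothetical stand-in that imitates MySQL's SUBSTRING_INDEX(); nth_tag and tag_count are illustrative names as well, not part of the answer's query.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Hypothetical Python imitation of MySQL's SUBSTRING_INDEX()."""
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter.
        return delim.join(parts[:count])
    if count < 0:
        # Everything to the right of the count-th delimiter from the end.
        return delim.join(parts[count:])
    return ""


def nth_tag(tags: str, n: int) -> str:
    # Inner call keeps the first n tags; outer call keeps the last of those.
    return substring_index(substring_index(tags, "|", n), "|", -1)


def tag_count(tags: str) -> int:
    # Same arithmetic as the ON clause: number of delimiters plus one.
    return len(tags) - len(tags.replace("|", "")) + 1


tags = "html|javascript"
extracted = [nth_tag(tags, n) for n in (1, 2, 3) if n <= tag_count(tags)]
print(extracted)  # ['html', 'javascript']
```

Without the n <= tag_count(tags) filter, n = 3 would return the last tag a second time, which is why the ON clause is needed.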

mysql : display multiple results with heading

I'm a complete newb so forgive me.
I'm trying to get the results to display 2 or more different headings.
SELECT sum(fare) AS totalfare, count(*) AS fare10
where fare>10
FROM tbl
I'm trying to get the WHERE condition to apply only to the count, not the sum, and have the result displayed under the headings "totalfare" and "fare10".
SELECT sum(fare) AS totalfare
FROM tbl
union
SELECT count(*) AS watever
FROM tbl
where fare > 10
I've tried this way, but the result grid would spit out the answers under 1 heading as totalfare. Is it possible to display it as totalfare | whatever?
Now that you have explained your question: UNION stacks result sets with the same number of columns on top of each other, so both values end up under the first query's heading. What you need is the query below, which selects directly from the two derived tables created by the sub-queries.
SELECT *
FROM (SELECT SUM(fare) AS totalfare
      FROM tbl) a,
     (SELECT COUNT(*) AS watever
      FROM tbl
      WHERE fare > 10) b
You will get results as one row
[ totalfare | watever ]
number number
You want conditional aggregation:
SELECT sum(fare) AS totalfare,
sum(case when fare > 10 then 1 else 0 end) as fare10
FROM tbl;
In MySQL you can also shorten this to:
SELECT sum(fare) AS totalfare,
sum(fare > 10) as fare10
FROM tbl;
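Both forms can be compared with a quick sketch in Python. It assumes the standard-library sqlite3 module as a stand-in for MySQL (SQLite also evaluates a comparison to 0 or 1), and the four fare values are hypothetical sample data, not from the question.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the fares are made-up sample values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (fare INTEGER)")
conn.executemany("INSERT INTO tbl (fare) VALUES (?)",
                 [(5,), (12,), (20,), (8,)])

# Explicit CASE: add 1 for every row with fare > 10, 0 otherwise.
case_row = conn.execute("""
    SELECT SUM(fare) AS totalfare,
           SUM(CASE WHEN fare > 10 THEN 1 ELSE 0 END) AS fare10
    FROM tbl
""").fetchone()

# Shorthand: the comparison itself evaluates to 1 or 0.
bool_row = conn.execute("""
    SELECT SUM(fare) AS totalfare,
           SUM(fare > 10) AS fare10
    FROM tbl
""").fetchone()

print(case_row, bool_row)  # (45, 2) (45, 2)
```

The sum covers all four rows, while the count only picks up the two fares above 10, and both come back in one row under separate headings.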

selecting the next m rows after row number n in a mysql table

In a mysql table with "R rows" I want to select the next "m rows" after "nth row" in such a way that if n+m>R it returns R-n rows from the end of table and m+n-R rows from the beginning of the table.
e.g in this table:
id firstname
1 john
2 robert
3 bob
4 adam
5 david
I want to get the next 4 rows after row number 3 (bob), in this fashion:
4 adam
5 david
1 john
2 robert
I have searched a lot and found that the following query just returns the last 2 rows.
SELECT * FROM table LIMIT 4 OFFSET 3;
I know that I can implement this specific query using PHP and a bunch of conditional statements, but I am curious whether it can be done in MySQL itself.
One approach is to use union all in a subquery. This allows you to "duplicate" the table, with a newly calculated id at the end of the table:
select t.*
from ((select t.*, id as newid from table t) union all
(select t.*, id + cnt as newid
from table t cross join
(select count(*) as cnt from table) cnt
)
) t
order by newid
limit 4 offset 3;
For small tables, this should be fine. For larger tables, you probably don't want to do this, because MySQL materializes the subquery, which adds overhead to the processing of the query.
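Here is the same idea as a runnable sketch in Python, assuming the standard-library sqlite3 module as a stand-in for MySQL. The table is called people here (a hypothetical name, since table is a reserved word), and the parentheses around the UNION ALL branches are dropped because SQLite does not accept them.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; `people` is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, firstname TEXT)")
conn.executemany(
    "INSERT INTO people (id, firstname) VALUES (?, ?)",
    [(1, "john"), (2, "robert"), (3, "bob"), (4, "adam"), (5, "david")],
)

# First branch: the table as it is. Second branch: a copy whose sort key
# is shifted by the row count, so it sorts after the original rows.
rows = conn.execute("""
    SELECT id, firstname
    FROM (
        SELECT p.id, p.firstname, p.id AS newid
        FROM people AS p
        UNION ALL
        SELECT p.id, p.firstname, p.id + c.cnt AS newid
        FROM people AS p
        CROSS JOIN (SELECT COUNT(*) AS cnt FROM people) AS c
    ) AS doubled
    ORDER BY newid
    LIMIT 4 OFFSET 3
""").fetchall()

print(rows)  # [(4, 'adam'), (5, 'david'), (1, 'john'), (2, 'robert')]
```

Note that shifting by the row count only keeps the copy behind the original rows when the ids run from 1 to R without gaps, as in the question's example.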

MySQL - select maximum sum

I have a MySQL table as below.
**AuthorID**, **PublicationName**, ReferenceCount, CitationCount
AuthorID and PublicationName act as the primary key. I need to find the maximum sum of ReferenceCount and CitationCount for all the authors. For example, the data is as below.
1 AAA 2 5
1 BBB 1 3
1 CCC 2 4
2 AAA 1 4
In this case, I need my output as,
1 AAA 7
2 AAA 5
I tried the below query.
SELECT AuthorID, PublicationName, Max(Sum(ReferenceCount + CitationCount))
from Author
Group by AuthorID, PublicationName
If I use max(sum(ReferenceCount + CitationCount)) with group by AuthorID, PublicationName, I get the error "Invalid use of Group function". I believe I should include a HAVING clause in my query, but I am not sure how to do that.
If I understand the question right, you want all the records for the publication that has the highest combined count. That publication and its count are given by:
SELECT PublicationName, Sum(ReferenceCount + CitationCount)
from Author
Group by PublicationName
order by Sum(ReferenceCount + CitationCount) desc
limit 1;
The order by and limit 1 give you the highest value.
If you want all records for the publication with the maximum sum:
select a.*
from Author a join
(SELECT PublicationName, Sum(ReferenceCount + CitationCount)
from Author
Group by PublicationName
order by Sum(ReferenceCount + CitationCount) desc
limit 1
) asum
on a.PublicationName = asum.PublicationName
Try this:
SELECT AuthorID, PublicationName, Max(ReferenceCount+CitationCount)
FROM Author
GROUP BY AuthorID
The problem with your query was that SUM() adds up a column's values across many rows. It cannot be used to add columns within a row the way you wanted. For that, just use the plus operator (+).
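One caveat with grouping by AuthorID alone: PublicationName is neither aggregated nor grouped, so MySQL either rejects the query (with ONLY_FULL_GROUP_BY enabled) or is free to return any publication of that author. A possible way around this, sketched below, is to compute the per-author maximum in a derived table and join it back. The sketch assumes Python's sqlite3 module as a stand-in for MySQL; the aliases total and max_total are illustrative names.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; rows mirror the question's example.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Author (
        AuthorID INTEGER,
        PublicationName TEXT,
        ReferenceCount INTEGER,
        CitationCount INTEGER,
        PRIMARY KEY (AuthorID, PublicationName)
    )
""")
conn.executemany(
    "INSERT INTO Author VALUES (?, ?, ?, ?)",
    [(1, "AAA", 2, 5), (1, "BBB", 1, 3), (1, "CCC", 2, 4), (2, "AAA", 1, 4)],
)

# Derived table m: highest per-row sum for each author.
# Joining back keeps only the row(s) that reach that maximum.
rows = conn.execute("""
    SELECT a.AuthorID, a.PublicationName,
           a.ReferenceCount + a.CitationCount AS total
    FROM Author AS a
    JOIN (
        SELECT AuthorID, MAX(ReferenceCount + CitationCount) AS max_total
        FROM Author
        GROUP BY AuthorID
    ) AS m
      ON m.AuthorID = a.AuthorID
     AND m.max_total = a.ReferenceCount + a.CitationCount
    ORDER BY a.AuthorID
""").fetchall()

print(rows)  # [(1, 'AAA', 7), (2, 'AAA', 5)]
```

If two publications of one author tie for the maximum, this returns both rows.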

How can I get the percentage of total rows with mysql for a group?

Below I have a query that gets the most common user agents for a site from a table of user agents and a linked table of IP addresses:
SELECT count(*) as num, string FROM `useragent_ip`
left join useragents on useragent_id = useragents.id
group by useragent_id
having num > 2
order by num desc, string
Sometimes it will show me something like
25 Firefox
22 IE
11 Chrome
3 Safari
1 Spider 1
1 Spider 2
1 Spider 3
My question: since the numbers on the left are parts of a whole and will grow over time, can the SQL statement also show each group's percentage of the whole? Then, instead of having num > 2, I could filter on the percentage of the total rows rather than on the raw number of rows.
Yes you can:
select num, string, 100 * num / total as percent
from (
select count(*) as num, string
from useragent_ip
left join useragents on useragent_id = useragents.id
group by useragent_id) x
cross join (
select count(*) as total
from useragent_ip
left join useragents on useragent_id = useragents.id) y
order by num desc, string;
I removed the having num > 2, because it didn't seem to make sense.
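A runnable sketch of this cross-join approach, assuming Python's sqlite3 module as a stand-in for MySQL. The column layout (in particular the ip column) is a guess, since the question does not show the schema, and the rows are generated to match the counts listed in the question. The sketch writes 100.0 rather than 100 because SQLite would otherwise do integer division, and it counts the total directly from useragent_ip, which gives the same number as the joined version.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the schema is guessed from the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE useragents (id INTEGER PRIMARY KEY, string TEXT)")
conn.execute("CREATE TABLE useragent_ip (useragent_id INTEGER, ip TEXT)")

# Counts copied from the question's sample output (64 rows in total).
counts = [("Firefox", 25), ("IE", 22), ("Chrome", 11), ("Safari", 3),
          ("Spider 1", 1), ("Spider 2", 1), ("Spider 3", 1)]
for agent_id, (name, hits) in enumerate(counts, start=1):
    conn.execute("INSERT INTO useragents VALUES (?, ?)", (agent_id, name))
    conn.executemany(
        "INSERT INTO useragent_ip VALUES (?, ?)",
        [(agent_id, f"10.0.{agent_id}.{i}") for i in range(hits)],
    )

# x: one row per user agent with its count; y: a single row with the total.
rows = conn.execute("""
    SELECT x.num, x.string, 100.0 * x.num / y.total AS percent
    FROM (
        SELECT COUNT(*) AS num, ua.string AS string
        FROM useragent_ip AS ui
        LEFT JOIN useragents AS ua ON ui.useragent_id = ua.id
        GROUP BY ui.useragent_id, ua.string
    ) AS x
    CROSS JOIN (SELECT COUNT(*) AS total FROM useragent_ip) AS y
    ORDER BY x.num DESC, x.string
""").fetchall()

print(rows[0])  # (25, 'Firefox', 39.0625)
```

To filter by share instead of raw count, wrapping this in an outer query with a condition on percent would be one option.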
If you add with rollup after your group by clause, then you will get a row where string is NULL and num is the total of all the browsers. You can then use that number to generate percentages.
I can't really imagine a single query doing the calculation and being more efficient than using with rollup.