How to count and group query to get proper results? - mysql

I have a problem, please see my database:
-------------------
| id | article_id |
-------------------
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 2 |
| 6 | 3 |
| 7 | 3 |
| 8 | 3 |
| 9 | 3 |
| 10 | 3 |
And I want to receive something like this (order by votes, from max to min):
---------------------------
| id | article_id | votes |
---------------------------
| 1 | 3 | 5 |
| 2 | 1 | 3 |
| 3 | 2 | 2 |
Could you please help me write a proper SQL query?

SET @currentRow = 0;
SELECT @currentRow := @currentRow + 1 AS id, t.article_id, t.c AS `votes`
FROM (
SELECT article_id, count(*) as `c`
FROM table_votes
GROUP BY article_id
) t
ORDER BY t.c DESC
Please note that you can't select the id column like this in a grouped query, so your "expected result" is incorrect as written. I adapted the query to match it as closely as possible by generating a row counter instead.
cheers

SELECT article_id, COUNT(article_id) AS votes
FROM votes_table
GROUP BY article_id
ORDER BY votes DESC;
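
The grouping approach from the answers above can be tried end to end with a small script. This is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the function name `ranked_vote_counts` is hypothetical, and the running id is numbered on the client side instead of with a user variable.

```python
import sqlite3


def ranked_vote_counts(article_ids):
    """Count votes per article and rank them from most to fewest votes."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.execute(
        "CREATE TABLE table_votes (id INTEGER PRIMARY KEY, article_id INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO table_votes (article_id) VALUES (?)",
        [(a,) for a in article_ids],
    )
    rows = conn.execute(
        """
        SELECT article_id, COUNT(*) AS votes
        FROM table_votes
        GROUP BY article_id
        ORDER BY votes DESC, article_id
        """
    ).fetchall()
    conn.close()
    # The "id" in the desired output is only a display rank, so number rows here.
    return [(rank, article_id, votes)
            for rank, (article_id, votes) in enumerate(rows, start=1)]


# Sample data from the question: one entry per vote row.
sample = [1, 1, 1, 2, 2, 3, 3, 3, 3, 3]
print(ranked_vote_counts(sample))
```

Numbering the rows outside the query avoids depending on the order in which a user variable is evaluated relative to ORDER BY.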

Related

How to query MIN value of MAX subquery with two distinct columns?

I have a table like this:
+---------------+--------------+------+-----+----------+
| Field | Type | Null | Key | Default |
+---------------+--------------+------+-----+----------+
| id | smallint(6) | NO | PRI | NULL |
| Book | tinyint(4) | NO | | NULL |
| Chapter | smallint(6) | NO | | NULL |
| Paragraph | smallint(6) | NO | | NULL |
| Text | text | YES | | NULL |
| RevisionNum | mediumint(9) | NO | PRI | NULL |
+---------------+--------------+------+-----+----------+
mysql> select id,Book,Chapter,Paragraph,RevisionNum FROM MyTable ORDER BY id LIMIT 11;
+-----+------+---------+-----------+-------------+
| id | Book | Chapter | Paragraph | RevisionNum |
+-----+------+---------+-----------+-------------+
| 1 | 1 | 1 | 1 | 0 |
| 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 2 |
| 2 | 1 | 2 | 2 | 0 |
| 2 | 1 | 2 | 2 | 1 |
| 2 | 1 | 2 | 2 | 2 |
| 2 | 1 | 2 | 2 | 3 |
| 3 | 1 | 2 | 3 | 0 |
| 4 | 1 | 2 | 4 | 0 |
| 4 | 1 | 2 | 4 | 1 |
| 5 | 1 | 3 | 5 | 0 |
+-----+------+---------+-----------+-------------+
To find a book or chapter which has no unrevised paragraph,
I wish to query either the minimum value of the maximums of
all the distinct id's for that chapter or book, or else in
some fashion determine that no id remains unedited (with a
MAX(RevisionNum) of zero).
Most of my attempts to date have ended in errors like this one:
SELECT DISTINCT Book,RecordNum FROM MyTable
-> WHERE 0 < ALL (SELECT DISTINCT RecordNum,MAX(RevisionNum)
FROM MyTable
WHERE MAX(RevisionNum) > 0);
ERROR 1111 (HY000): Invalid use of group function
...And I wasn't using the "GROUP BY" function at all!
The following query produces results, but simply
gives ALL id's, and does not actually show a unique
set of Book records, as requested. How could this happen?
SELECT DISTINCT Book,id,MAX(RevisionNum) FROM MyTable GROUP BY id LIMIT 5;
+------+----+------------------+
| Book | id | MAX(RevisionNum) |
+------+----+------------------+
| 1 | 1 | 30 |
| 1 | 2 | 16 |
| 1 | 3 | 15 |
| 1 | 4 | 10 |
| 1 | 5 | 9 |
+------+----+------------------+
What would the correct query be to give results more like this:
+------+-----+-----------------------+
| Book | id | MIN(MAX(RevisionNum)) |
+------+-----+-----------------------+
| 1 | 5 | 3 |
| 2 | 17 | 1 |
| 3 | 33 | 2 |
| 4 | 147 | 0 |
| 5 | 225 | 2 |
+------+-----+-----------------------+
Are you looking for two levels of aggregation?
select id, book, min(max_revisionnum)
from (select id, book, chapter, paragraph, max(revisionnum) as max_revisionnum
from mytable
group by id, book, chapter, paragraph
) t
group by id, book;
EDIT:
Based on your comment, you can use:
select *
from (select id, book, chapter, paragraph, max(revisionnum) as max_revisionnum,
row_number() over (partition by book order by max(revisionnum) desc) as seqnum
from mytable
group by id, book, chapter, paragraph
) t
where seqnum = 1;
Here is a db<>fiddle.
In older versions of MariaDB, you can use a correlated subquery:
select t.*
from mytable t
where (id, book, chapter, paragraph, revisionnum) = (select t2.id, t2.book, t2.chapter, t2.paragraph, t2.revisionnum
from mytable t2
where t2.book = t.book
order by t2.revisionnum desc
limit 1
);
For this query, try adding an index on (book, revisionnum desc).
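
As a variation on the two-level aggregation above, the sketch below takes the maximum revision per paragraph id and then the minimum of those maxima per chapter. It is an illustration under assumptions, not the answerer's exact query: it runs on Python's sqlite3 module instead of MySQL/MariaDB, groups the outer level by Book and Chapter, and the function name and the `fully_revised_only` flag are hypothetical. Filtering on an aggregate goes in HAVING, which is also why MAX() inside WHERE raised error 1111 in the question.

```python
import sqlite3

# (id, Book, Chapter, Paragraph, RevisionNum), copied from the sample rows above
ROWS = [
    (1, 1, 1, 1, 0), (1, 1, 1, 1, 1), (1, 1, 1, 1, 2),
    (2, 1, 2, 2, 0), (2, 1, 2, 2, 1), (2, 1, 2, 2, 2), (2, 1, 2, 2, 3),
    (3, 1, 2, 3, 0),
    (4, 1, 2, 4, 0), (4, 1, 2, 4, 1),
    (5, 1, 3, 5, 0),
]


def min_of_max_per_chapter(rows, fully_revised_only=False):
    """Return (Book, Chapter, MIN of per-paragraph MAX(RevisionNum))."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE MyTable (
            id INTEGER, Book INTEGER, Chapter INTEGER,
            Paragraph INTEGER, RevisionNum INTEGER,
            PRIMARY KEY (id, RevisionNum)
        )
        """
    )
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)", rows)
    sql = """
        SELECT Book, Chapter, MIN(max_rev) AS min_max_rev
        FROM (
            SELECT id, Book, Chapter, MAX(RevisionNum) AS max_rev
            FROM MyTable
            GROUP BY id, Book, Chapter
        ) AS per_paragraph
        GROUP BY Book, Chapter
    """
    if fully_revised_only:
        # Aggregates are filtered with HAVING, never with WHERE.
        sql += " HAVING MIN(max_rev) > 0"
    sql += " ORDER BY Book, Chapter"
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


print(min_of_max_per_chapter(ROWS))
print(min_of_max_per_chapter(ROWS, fully_revised_only=True))
```

A chapter whose minimum is 0 still contains at least one unrevised paragraph; with the flag set, only chapters in which every paragraph has been revised remain.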

mysql table ordering incorrect with group by and order by

table 1: forum_threads
+-----+------+-------+
| id | title| status|
+-----+------+-------+
| 1 | a | 1 |
| 2 | b | 1 |
| 3 | c | 1 |
| 4 | d | 1 |
| 5 | e | 1 |
| 6 | f | 1 |
+-----+------+-------+
table 2: forum_comments
+-----+----------+--------------------+
| id | thread_id| comment |
+-----+----------+--------------------+
| 1 | 4 | hai |
| 2 | 4 | hello |
| 3 | 2 | welcome |
| 4 | 2 | whats your name |
| 5 | 6 | how are you |
| 6 | 5 | how old are you |
| 7 | 5 | good |
+-----+----------+--------------------+
wanted output
+-----------+----------+-----------------+
| thread_id | title | comment_count |
+-----------+----------+-----------------+
| 5 | e | 2 |
| 6 | f | 1 |
| 2 | b | 2 |
| 4 | d | 2 |
+-----------+----------+-----------------+
my Query
SELECT forum_threads.*,forum_comments.*,count(forum_comments.id) as comment_count
FROM forum_comments
LEFT JOIN forum_threads ON forum_comments.thread_id = forum_threads.id
GROUP BY forum_threads.id
ORDER BY forum_comments.id desc
Here I am trying to get the thread titles ordered by their latest comment.
When I use ORDER BY forum_comments.id, the query returns the wrong order.
I need to order by the latest comment in the forum_comments table.
How can I solve this easily?
This query should give you the expected result:
select t2.thread_id, t1.title, t2.comment_count from forum_threads as t1,
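(SELECT max(id) as last_comment_id, thread_id, count(comment) as comment_count from forum_comments group by thread_id) as t2
where t1.id = t2.thread_id order by t2.last_comment_id desc;
The subquery takes max(id) per thread so that the ordering uses each thread's latest comment; a bare id in a query grouped by thread_id is not guaranteed to be the latest one.
Instead of selecting forum_threads.* and forum_comments.*, try listing specific column names.
If that doesn't work, try explicitly defining the primary and foreign keys.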
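
To check the ordering against the sample data, here is a minimal sketch that orders threads by the highest comment id in each group. It assumes Python's sqlite3 module as a stand-in for MySQL, and the function name `threads_by_latest_comment` is hypothetical.

```python
import sqlite3

THREADS = [(1, "a", 1), (2, "b", 1), (3, "c", 1), (4, "d", 1), (5, "e", 1), (6, "f", 1)]
COMMENTS = [
    (1, 4, "hai"), (2, 4, "hello"), (3, 2, "welcome"), (4, 2, "whats your name"),
    (5, 6, "how are you"), (6, 5, "how old are you"), (7, 5, "good"),
]


def threads_by_latest_comment():
    """Return (thread_id, title, comment_count), newest-commented thread first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE forum_threads (id INTEGER PRIMARY KEY, title TEXT, status INTEGER)")
    conn.execute("CREATE TABLE forum_comments (id INTEGER PRIMARY KEY, thread_id INTEGER, comment TEXT)")
    conn.executemany("INSERT INTO forum_threads VALUES (?, ?, ?)", THREADS)
    conn.executemany("INSERT INTO forum_comments VALUES (?, ?, ?)", COMMENTS)
    rows = conn.execute(
        """
        SELECT t.id AS thread_id, t.title, COUNT(c.id) AS comment_count
        FROM forum_threads t
        JOIN forum_comments c ON c.thread_id = t.id
        GROUP BY t.id, t.title
        ORDER BY MAX(c.id) DESC  -- latest comment per thread decides the order
        """
    ).fetchall()
    conn.close()
    return rows


print(threads_by_latest_comment())
```

The inner join drops threads without comments, which matches the wanted output; a LEFT JOIN from forum_threads would keep them with a count of 0.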

MySQL Query to get Similar likes

I am designing a simple architecture with a table that stores users and the elements they like. The table structure looks like this:
+---------+---------+
| user_id | like_id |
+---------+---------+
| 1 | 4 |
| 2 | 2 |
| 4 | 4 |
| 4 | 3 |
| 5 | 4 |
| 6 | 7 |
| 7 | 5 |
| 34 | 6 |
| 3 | 8 |
| 2 | 3 |
| 2 | 5 |
| 1 | 3 |
| 1 | 10 |
| 1 | 12 |
| 2 | 10 |
+---------+---------+
Given the id of any user (let's say user_id = 1), I want a query that returns all the other users who have likes in common with that user.
So the output for user_id = 1 would be:
+---------------------------+------------------------+----------------+
| users_with_common_likes | no_of_common_likes | common_likes |
+---------------------------+------------------------+----------------+
| 4 | 2 | 3,4 |
| 2 | 2 | 3,10 |
| 5 | 1 | 4 |
+---------------------------+------------------------+----------------+
What I have achieved :
I can do this using a sub-query as below :
SELECT user_id
FROM `user_likes`
WHERE `like_id`
IN (
SELECT GROUP_CONCAT( `like_id` )
FROM user_likes
WHERE user_id =1
)
AND user_id !=1
LIMIT 0 , 30
However, this query does not return all the users: it misses user_id = 2, which has like_id 3 in common with user_id = 1.
I also can't figure out how to produce the remaining two columns.
I also feel that this is not the best way to do this, as the table will contain thousands of rows and it may affect system performance.
I would like to do this with a single MySQL query.
This assumes a PK formed on user_id,like_id...
SELECT y.user_id
, GROUP_CONCAT(y.like_id) likes
, COUNT(*) total
FROM my_table x
JOIN my_table y
ON y.like_id = x.like_id
AND y.user_id <> x.user_id
WHERE x.user_id = 1
GROUP
BY y.user_id;
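
The self-join above can be exercised against the sample rows with the sketch below. Assumptions: Python's sqlite3 module stands in for MySQL, the table is named user_likes as in the question, and `users_with_common_likes` is a hypothetical helper. The question's IN (SELECT GROUP_CONCAT(...)) attempt most likely fails because GROUP_CONCAT returns a single string, so like_id is compared against one value instead of a list; the join compares row by row instead.

```python
import sqlite3

# (user_id, like_id) pairs from the question
LIKES = [
    (1, 4), (2, 2), (4, 4), (4, 3), (5, 4), (6, 7), (7, 5), (34, 6),
    (3, 8), (2, 3), (2, 5), (1, 3), (1, 10), (1, 12), (2, 10),
]


def users_with_common_likes(user_id):
    """Return (other_user, number_of_common_likes, sorted common like ids)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_likes (user_id INTEGER, like_id INTEGER, "
        "PRIMARY KEY (user_id, like_id))"
    )
    conn.executemany("INSERT INTO user_likes VALUES (?, ?)", LIKES)
    rows = conn.execute(
        """
        SELECT y.user_id, COUNT(*) AS total, GROUP_CONCAT(y.like_id) AS likes
        FROM user_likes x
        JOIN user_likes y
          ON y.like_id = x.like_id
         AND y.user_id <> x.user_id
        WHERE x.user_id = ?
        GROUP BY y.user_id
        ORDER BY total DESC, y.user_id
        """,
        (user_id,),
    ).fetchall()
    conn.close()
    # GROUP_CONCAT order is not guaranteed, so sort the ids on the client side.
    return [(uid, total, sorted(int(v) for v in likes.split(",")))
            for uid, total, likes in rows]


print(users_with_common_likes(1))
```

With a primary key on (user_id, like_id), each common like is counted once per user, which is the assumption the answer states.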

SQL, difficult fetching data query

Suppose I have such a table:
+-----+---------+-------+
| ID | TIME | DAY |
+-----+---------+-------+
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 3 | 1 |
| 1 | 1 | 2 |
| 2 | 2 | 2 |
| 3 | 3 | 2 |
| 1 | 1 | 3 |
| 2 | 2 | 3 |
| 3 | 3 | 3 |
| 1 | 1 | 4 |
| 2 | 2 | 4 |
| 3 | 3 | 4 |
| 1 | 1 | 5 |
| 2 | 2 | 5 |
| 3 | 3 | 5 |
+-----+---------+-------+
I want to fetch the 2 IDs with the largest sum of TIME within the last 3 days (that is, DAY values 3 to 5).
So the correct result would be:
+-----+---------+
| ID | SUM |
+-----+---------+
| 3 | 9 |
| 2 | 6 |
+-----+---------+
The original table is much larger and more complex, so I need a generic approach.
Thanks in advance.
And so I just learned that MySQL uses LIMIT instead of TOP...
fiddle
CREATE TABLE tbl (ID INT,tm INT,dy INT);
INSERT INTO tbl (id, tm, dy) VALUES
(1,1,1)
,(2,2,1)
,(3,3,1)
,(1,1,2)
,(1,1,1);
SELECT ID
,SUM(SumTimeForDay) SumTimeFromLastThreeDays
FROM (SELECT ID
,SUM(tm) SumTimeForDay
FROM tbl
WHERE dy > (SELECT MAX(dy) FROM tbl) - 3
GROUP BY ID, dy) a
GROUP BY id
ORDER BY SUM(SumTimeForDay) DESC
LIMIT 2;
select t1.`id`, sum(t1.`time`) as `sum`
from `table` t1
inner join ( select distinct `day` from `table` order by `day` desc limit 3 ) t2
on t2.`day` = t1.`day`
group by t1.`id`
order by sum(t1.`time`) desc
limit 2
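
Both answers can be checked against the full sample table with a short script. This is a sketch under assumptions: Python's sqlite3 module stands in for MySQL, the columns use the tm/dy names from the first answer, and `top_ids_last_days` is a hypothetical helper. It filters on the overall maximum day in WHERE, before grouping.

```python
import sqlite3

# Sample table from the question: IDs 1..3 with TIME equal to ID, on days 1..5.
ROWS = [(i, i, d) for d in range(1, 6) for i in range(1, 4)]


def top_ids_last_days(rows, n_days=3, limit=2):
    """Return the `limit` ids with the largest summed time over the last n_days."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER, tm INTEGER, dy INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT id, SUM(tm) AS total
        FROM tbl
        WHERE dy > (SELECT MAX(dy) FROM tbl) - ?  -- keep only the last n_days
        GROUP BY id
        ORDER BY total DESC, id
        LIMIT ?
        """,
        (n_days, limit),
    ).fetchall()
    conn.close()
    return result


print(top_ids_last_days(ROWS))
```

The MAX(dy) - n cutoff assumes day numbers without gaps; if days can be missing, joining against the three most recent distinct days, as in the second answer, is the safer choice.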

Advanced MySQL: Find correlations between poll responses

I've got four MySQL tables:
users (id, name)
polls (id, text)
options (id, poll_id, text)
responses (id, poll_id, option_id, user_id)
Given a particular poll and a particular option, I'd like to generate a table that shows which options from other polls are most strongly correlated.
Suppose this is our data set:
TABLE users:
+------+-------+
| id | name |
+------+-------+
| 1 | Abe |
| 2 | Bob |
| 3 | Che |
| 4 | Den |
+------+-------+
TABLE polls:
+------+-----------------------+
| id | text |
+------+-----------------------+
| 1 | Do you like apples? |
| 2 | What is your gender? |
| 3 | What is your height? |
| 4 | Do you like polls? |
+------+-----------------------+
TABLE options:
+------+----------+---------+
| id | poll_id | text |
+------+----------+---------+
| 1 | 1 | Yes |
| 2 | 1 | No |
| 3 | 2 | Male |
| 4 | 2 | Female |
| 5 | 3 | Short |
| 6 | 3 | Tall |
| 7 | 4 | Yes |
| 8 | 4 | No |
+------+----------+---------+
TABLE responses:
+------+----------+------------+----------+
| id | poll_id | option_id | user_id |
+------+----------+------------+----------+
| 1 | 1 | 1 | 1 |
| 2 | 1 | 2 | 2 |
| 3 | 1 | 2 | 3 |
| 4 | 1 | 2 | 4 |
| 5 | 2 | 3 | 1 |
| 6 | 2 | 3 | 2 |
| 7 | 2 | 3 | 3 |
| 8 | 2 | 4 | 4 |
| 9 | 3 | 5 | 1 |
| 10 | 3 | 6 | 2 |
| 11 | 3 | 5 | 3 |
| 12 | 3 | 6 | 4 |
| 13 | 4 | 7 | 1 |
| 14 | 4 | 7 | 2 |
| 15 | 4 | 7 | 3 |
| 16 | 4 | 7 | 4 |
+------+----------+------------+----------+
Given the poll ID 1 and the option ID 2, the generated table should be something like this:
+----------+------------+-----------------------+
| poll_id | option_id | percent_correlated |
+----------+------------+-----------------------+
| 4 | 7 | 100 |
| 2 | 3 | 66.66 |
| 3 | 6 | 66.66 |
| 2 | 4 | 33.33 |
| 3 | 5 | 33.33 |
| 4 | 8 | 0 |
+----------+------------+-----------------------+
So basically, we're identifying all of the users who responded to poll ID 1 and selected option ID 2, and we're looking through all the other polls to see what percentage of them also selected each other option.
I don't have an instance handy to test this; can you check whether it gives proper results?
select
poll_id,
option_id,
((psum - (sum1 * sum2 / n)) / sqrt((sum1sq - pow(sum1, 2.0) / n) * (sum2sq - pow(sum2, 2.0) / n))) AS r,
n
from
(
select
poll_id,
option_id,
SUM(score) AS sum1,
SUM(score_rev) AS sum2,
SUM(score * score) AS sum1sq,
SUM(score_rev * score_rev) AS sum2sq,
SUM(score * score_rev) AS psum,
COUNT(*) AS n
from
(
select
responses.poll_id,
responses.option_id,
CASE
WHEN user_resp.user_id IS NULL THEN 0
ELSE 1
END as score,
CASE
WHEN user_resp.user_id IS NULL THEN 1
ELSE 0
END as score_rev
from responses left outer join
(
select
user_id
from
responses
where
poll_id = 1 and
option_id = 2
)user_resp
ON (user_resp.user_id = responses.user_id)
) temp1
group by
poll_id,
option_id
)components
After a few hours of trial and error, I managed to put together a query that works correctly:
SELECT poll_id AS p_id,
option_id AS o_id,
COUNT(*) AS optCount,
(SELECT COUNT(*) FROM response WHERE option_id = o_id AND user_id IN
(SELECT user_id FROM response WHERE poll_id = '1' AND option_id = '2')) /
(SELECT COUNT(*) FROM response WHERE poll_id = p_id AND user_id IN
(SELECT user_id FROM response WHERE poll_id = '1' AND option_id = '2'))
AS percentage
FROM response
INNER JOIN
(SELECT user_id FROM response WHERE poll_id = '1' AND option_id = '2') AS user_ids
ON response.user_id = user_ids.user_id
WHERE poll_id != '1'
GROUP BY option_id DESC
ORDER BY percentage DESC, optCount DESC
Based on tests with a small data set, this query looks reasonably fast, but I'd like to modify it so the "IN" subquery is not repeated three times. Any suggestions?
This seems to give the right results for me:
select poll_stats.poll_id,
option_stats.option_id,
(100 * option_responses / poll_responses) as percent_correlated
from (select response.poll_id,
count(*) as poll_responses
from response selecting_response
join response on response.user_id = selecting_response.user_id
where selecting_response.poll_id = 1 and selecting_response.option_id = 2
group by response.poll_id) poll_stats
join (select options.poll_id,
options.id as option_id,
count(response.id) as option_responses
from options
left join response on response.poll_id = options.poll_id
and response.option_id = options.id
and exists (
select 1 from response selecting_response
where selecting_response.user_id = response.user_id
and selecting_response.poll_id = 1
and selecting_response.option_id = 2)
group by options.poll_id, options.id
) as option_stats
on option_stats.poll_id = poll_stats.poll_id
where poll_stats.poll_id <> 1
order by 3 desc, option_responses desc
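
One way to avoid repeating the cohort subquery is to name it once in a common table expression. The sketch below is an illustration under assumptions, not a drop-in replacement: it runs on Python's sqlite3 module instead of MySQL (CTEs need MySQL 8.0 or later), it uses the responses table name from the question's schema, and `correlated_options` is a hypothetical helper.

```python
import sqlite3

OPTIONS = [(1, 1, "Yes"), (2, 1, "No"), (3, 2, "Male"), (4, 2, "Female"),
           (5, 3, "Short"), (6, 3, "Tall"), (7, 4, "Yes"), (8, 4, "No")]
# (poll_id, option_id, user_id); the id column is left to autoincrement.
RESPONSES = [(1, 1, 1), (1, 2, 2), (1, 2, 3), (1, 2, 4),
             (2, 3, 1), (2, 3, 2), (2, 3, 3), (2, 4, 4),
             (3, 5, 1), (3, 6, 2), (3, 5, 3), (3, 6, 4),
             (4, 7, 1), (4, 7, 2), (4, 7, 3), (4, 7, 4)]

QUERY = """
WITH cohort AS (            -- users who picked the given option
    SELECT user_id FROM responses WHERE poll_id = :poll AND option_id = :opt
),
cohort_responses AS (       -- what those users answered in the other polls
    SELECT r.poll_id, r.option_id
    FROM responses r
    JOIN cohort c ON c.user_id = r.user_id
    WHERE r.poll_id <> :poll
),
poll_totals AS (            -- cohort answers per poll (the denominator)
    SELECT poll_id, COUNT(*) AS n FROM cohort_responses GROUP BY poll_id
)
SELECT o.poll_id, o.id AS option_id,
       ROUND(100.0 * COUNT(cr.option_id) / pt.n, 2) AS percent_correlated
FROM options o
JOIN poll_totals pt ON pt.poll_id = o.poll_id
LEFT JOIN cohort_responses cr ON cr.option_id = o.id
GROUP BY o.poll_id, o.id, pt.n
ORDER BY percent_correlated DESC, o.poll_id, o.id
"""


def correlated_options(poll_id, option_id):
    """Return (poll_id, option_id, percent) for every option of the other polls."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE options (id INTEGER PRIMARY KEY, poll_id INTEGER, text TEXT)")
    conn.execute(
        "CREATE TABLE responses (id INTEGER PRIMARY KEY, poll_id INTEGER, "
        "option_id INTEGER, user_id INTEGER)"
    )
    conn.executemany("INSERT INTO options VALUES (?, ?, ?)", OPTIONS)
    conn.executemany(
        "INSERT INTO responses (poll_id, option_id, user_id) VALUES (?, ?, ?)", RESPONSES
    )
    rows = conn.execute(QUERY, {"poll": poll_id, "opt": option_id}).fetchall()
    conn.close()
    return rows


for row in correlated_options(1, 2):
    print(row)
```

The LEFT JOIN keeps options that nobody in the cohort chose, so they show up with 0 percent instead of disappearing from the result.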