SQL syntax to retrieve votes strangely misses some results - mysql

I've looked at other posts to work out a MySQL query. I think I'm almost there, but for some reason I have a bug.
I have these tables:
POSTS [id_post (int) / activity (int) ]
VOTES [id_vote (int) / id_post (int) / user_id (int)]
ACTIVITIES [id_activity (int) / activity_name (varchar)]
I actually have more tables and fields than that, but these are the relevant ones for my problem. I created a vote system that adds a user's vote to the VOTES table, referring to the post being voted on and the account of the user who voted.
So the votes table may look like this:
id_vote | id_post | user_id
1 | 5 | 8
2 | 6 | 8
Every post belongs to an activity. I would like to list the posts that have the most votes for each activity.
This is where I am so far:
SELECT activities.id_activity, activities.activity_name, votes.id_post, votes.totalvotes AS allvotes
FROM ( SELECT votes.id_post, COUNT(*) as totalvotes
FROM `votes`
GROUP BY votes.id_post
) AS votes
JOIN posts ON posts.id_post = votes.id_post
JOIN activities ON posts.activity = activities.id_activity
GROUP BY activities.id_activity
HAVING allvotes = MAX(totalvotes)
This works well and retrieves what I want, except that if 2 posts in the same activity have the same number of votes, I have no idea which one will appear after grouping, or why it isn't the other one.
id_activity | activity_name | id_post | allvotes
5 | eating | 5 | 2
3 | sleeping | 6 | 1
More importantly, what is really bugging me is that some activities don't show up at all. In the example above, if post 5, which belongs to the eating activity, is indeed the one with the most votes, then the activity appears. But if the post with the most votes in the eating activity is not post 5 (the one MySQL happened to pick by default), then the whole row for the eating activity doesn't show up at all.
I've been on this for 2 days...
Any ideas?
Thanks a bunch.

It is a little complex to implement in a single query. It would be easy with some temporary tables. If you want it in a single query, you can try emulating a row number with user variables.
Example:
SELECT id_activity, activity_name, id_post, v_count FROM (
    SELECT id_activity, activity_name, id_post, v_count
        , @rownum := IF(@prev_value = id_activity, @rownum + 1, 1) AS RowNumber
        , @prev_value := id_activity
    FROM (
        SELECT a.id_activity, a.activity_name, p.id_post, v.v_count
        FROM activities a
        LEFT JOIN posts p ON p.activity = a.id_activity
        LEFT JOIN (
            SELECT id_post, COUNT(*) v_count FROM votes v1
            GROUP BY id_post
        ) v ON v.id_post = p.id_post
        ORDER BY a.id_activity, v.v_count DESC) AS tmp
    , (SELECT @rownum := 0) r
    , (SELECT @prev_value := '') y
) tmp2
WHERE RowNumber = 1

I guess it's because you group by the activity only. I wonder why MySQL allows this statement: in MSSQL you are not allowed to group only by activities.id_activity in this case, you have to group by the activity and/or id_post.
Try this:
SELECT activities.id_activity, activities.activity_name, votes.id_post, COUNT(*) as totalvotes
FROM votes
INNER JOIN posts ON posts.id_post = votes.id_post
INNER JOIN activities ON posts.activity = activities.id_activity
GROUP BY votes.id_post
HAVING totalvotes = MAX(totalvotes)
Let me know if it works :)
Or your statement, modified:
SELECT activities.id_activity, activities.activity_name, votes.id_post, votes.totalvotes AS allvotes
FROM ( SELECT votes.id_post, COUNT(*) as totalvotes
FROM `votes`
GROUP BY votes.id_post
) AS votes
JOIN posts ON posts.id_post = votes.id_post
JOIN activities ON posts.activity = activities.id_activity
GROUP BY votes.id_post
HAVING allvotes = MAX(totalvotes)
If I understand you correctly, it's because you have 2 posts with the same activity: as soon as you group by activity and not by post, one post disappears.
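One way to keep every top post per activity, ties included, is to count votes per post and then join those counts to the per-activity maximum. The following is a minimal sketch, not the original poster's query: it runs the SQL through Python's sqlite3 module as a stand-in for MySQL, and the sample rows (including post 7) are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activities (id_activity INTEGER PRIMARY KEY, activity_name TEXT);
CREATE TABLE posts (id_post INTEGER PRIMARY KEY, activity INTEGER);
CREATE TABLE votes (id_vote INTEGER PRIMARY KEY, id_post INTEGER, user_id INTEGER);
INSERT INTO activities VALUES (5, 'eating'), (3, 'sleeping');
INSERT INTO posts VALUES (5, 5), (7, 5), (6, 3);
-- Hypothetical votes: post 5 has 1, post 7 has 2, post 6 has 1.
INSERT INTO votes (id_post, user_id) VALUES (5, 8), (7, 8), (7, 9), (6, 8);
""")

# Vote count per post, tagged with the post's activity.
COUNTED = """
    SELECT p.activity, v.id_post, COUNT(*) AS totalvotes
    FROM votes v
    JOIN posts p ON p.id_post = v.id_post
    GROUP BY p.activity, v.id_post
"""

# Keep only the posts whose count equals the maximum count of their activity.
TOP_POSTS_SQL = f"""
SELECT a.id_activity, a.activity_name, pv.id_post, pv.totalvotes
FROM ({COUNTED}) AS pv
JOIN (
    SELECT activity, MAX(totalvotes) AS maxvotes
    FROM ({COUNTED}) AS counted
    GROUP BY activity
) AS best ON best.activity = pv.activity AND best.maxvotes = pv.totalvotes
JOIN activities a ON a.id_activity = pv.activity
ORDER BY a.id_activity, pv.id_post
"""

top_posts = conn.execute(TOP_POSTS_SQL).fetchall()
print(top_posts)
```

Because the comparison against the maximum happens in a join rather than in HAVING after a GROUP BY on the activity alone, no arbitrary post is picked first, so an activity cannot vanish from the result.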

Related

Selecting a count of rows having a max value

Working example: http://sqlfiddle.com/#!9/80995/20
I have three tables, a user table, a user_group table, and a link table.
The link table contains the dates that users were added to user groups. I need a query that returns the count of users currently in each group. The most recent date determines the group that the user is currently in.
SELECT
user_groups.name,
COUNT(l.name) AS ct,
GROUP_CONCAT(l.`name` separator ", ") AS members
FROM user_groups
LEFT JOIN
(SELECT MAX(added), group_id, name FROM link LEFT JOIN users ON users.id = link.user_id GROUP BY user_id) l
ON l.group_id = user_groups.id
GROUP BY user_groups.id
My question is if the query I have written could be optimized, or written better.
Thanks!
Ben
Your actual query is not giving you the answer you want, at least as far as I understand your question. John actually joined Group 2 on 2017-01-05, yet he appears in Group 1 (which he joined on 2017-01-01) in your results. Note also you're missing one Group 4.
Using standard SQL, I think the following query is what you're looking for. The comments in the query should clarify what each part is doing:
SELECT
user_groups.name AS group_name,
COUNT(u.name) AS member_count,
group_concat(u.name separator ', ') AS members
FROM
user_groups
LEFT JOIN
(
SELECT * FROM
(-- For each user, find most recent date s/he got into a group
SELECT
user_id AS the_user_id, MAX(added) AS last_added
FROM
link
GROUP BY
the_user_id
) AS u_a
-- Join back to the link table, so that the `group_id` can be retrieved
JOIN link l2 ON l2.user_id = u_a.the_user_id AND l2.added = u_a.last_added
) AS most_recent_group ON most_recent_group.group_id = user_groups.id
-- And get the users...
LEFT JOIN users u ON u.id = most_recent_group.the_user_id
GROUP BY
user_groups.id, user_groups.name
ORDER BY
user_groups.name ;
This can be written more compactly in MySQL (abusing the fact that older versions of MySQL do not enforce the SQL standard's GROUP BY restrictions).
That's what you'll get:
group_name | member_count | members
:--------- | -----------: | :-------------
Group 1 | 2 | Mikie, Dominic
Group 2 | 2 | John, Paddy
Group 3 | 0 | null
Group 4 | 1 | Nellie
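The "latest row per user, then join back" step can be tried on a small data set. This is only a sketch: it uses Python's sqlite3 module in place of MySQL, and apart from John's two dates (taken from the discussion above) the rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_groups (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE link (user_id INTEGER, group_id INTEGER, added TEXT);
INSERT INTO users VALUES (1, 'John'), (2, 'Nellie');
INSERT INTO user_groups VALUES (1, 'Group 1'), (2, 'Group 2'), (3, 'Group 3');
-- John moves from Group 1 to Group 2; Nellie's row is hypothetical.
INSERT INTO link VALUES
    (1, 1, '2017-01-01'),
    (1, 2, '2017-01-05'),
    (2, 1, '2017-01-02');
""")

CURRENT_MEMBERS_SQL = """
SELECT g.name, COUNT(u.name) AS member_count
FROM user_groups g
LEFT JOIN (
    -- Latest date per user, joined back to link to recover the group_id.
    SELECT l2.user_id, l2.group_id
    FROM (SELECT user_id, MAX(added) AS last_added
          FROM link
          GROUP BY user_id) AS u_a
    JOIN link l2 ON l2.user_id = u_a.user_id AND l2.added = u_a.last_added
) AS most_recent ON most_recent.group_id = g.id
LEFT JOIN users u ON u.id = most_recent.user_id
GROUP BY g.id, g.name
ORDER BY g.name
"""

member_counts = conn.execute(CURRENT_MEMBERS_SQL).fetchall()
print(member_counts)
```

John is counted only in Group 2, his most recent group, and Group 3 still appears with a count of 0 because of the outer joins.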
Note that this query can be simplified if you use a database with window functions (such as MariaDB 10.2). Then, you can use:
SELECT
user_groups.name AS group_name,
COUNT(u.name) AS member_count,
group_concat(u.name separator ', ') AS members
FROM
user_groups
LEFT JOIN
(
SELECT DISTINCT
user_id AS the_user_id,
-- Most recent group per user: first value when ordering by date descending
first_value(group_id) OVER (PARTITION BY user_id ORDER BY added DESC) AS group_id
FROM
link
) AS most_recent_group ON most_recent_group.group_id = user_groups.id
-- And get the users...
LEFT JOIN users u ON u.id = most_recent_group.the_user_id
GROUP BY
user_groups.id, user_groups.name
ORDER BY
user_groups.name ;

MySQL Frequency of frequency report

Given many users have many posts
I have a table of posts that has a foreign key user_id
I want to generate a report that shows the frequency of users against frequency of posts
e.g.
3 users wrote 2 posts each
2 users wrote 1 post each
1 user wrote 4 posts
Number of users | Number of posts
--------------- | ------------------
1 | 4
2 | 1
3 | 2
My attempt:
SELECT inner_table.frequency_posts,
Count(*) AS frequency_users
FROM posts
INNER JOIN (SELECT user_id,
Count(*) AS frequency_posts
FROM posts
GROUP BY user_id) AS inner_table
ON posts.user_id = inner_table.user_id
GROUP BY inner_table.frequency_posts
I think frequency_posts is working, but counting frequency_users isn't giving the right values: when I look at the inner select on its own and manually add up the posts, I don't get the same values.
You have to use GROUP BY twice, without joining back to posts:
SELECT
    COUNT(*) AS NumberOfUsers,
    foo.NumberOfPosts
FROM
    -- Inner query: one row per user with that user's post count
    (SELECT
        p.user_id AS UserId,
        COUNT(*) AS NumberOfPosts
    FROM
        posts AS p
    GROUP BY p.user_id) AS foo
GROUP BY foo.NumberOfPosts
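To check the double grouping against the example from the question (3 users with 2 posts, 2 users with 1 post, 1 user with 4 posts), here is a small sketch. It runs the SQL through Python's sqlite3 module rather than MySQL, and the user ids are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER)")

# Hypothetical users: 1, 2, 3 wrote 2 posts each; 4 and 5 wrote 1 each; 6 wrote 4.
user_ids = [1, 1, 2, 2, 3, 3, 4, 5, 6, 6, 6, 6]
conn.executemany("INSERT INTO posts (user_id) VALUES (?)", [(u,) for u in user_ids])

FREQUENCY_SQL = """
SELECT COUNT(*) AS number_of_users, per_user.number_of_posts
FROM (
    -- First grouping: posts per user.
    SELECT user_id, COUNT(*) AS number_of_posts
    FROM posts
    GROUP BY user_id
) AS per_user
-- Second grouping: users per post count.
GROUP BY per_user.number_of_posts
ORDER BY number_of_users
"""

report = conn.execute(FREQUENCY_SQL).fetchall()
print(report)
```

Joining the per-user counts back to posts, as in the question's attempt, repeats each user once per post, which is why the outer count there comes out inflated.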

Sql conditional count with join

I cannot find the answer to my problem here on stackoverflow. I have a query that spans 3 tables:
newsitem
+------+----------+----------+----------+--------+----------+
| Guid | Supplier | LastEdit | ShowDate | Title | Contents |
+------+----------+----------+----------+--------+----------+
newsrating
+----+----------+--------+--------+
| Id | NewsGuid | UserId | Rating |
+----+----------+--------+--------+
usernews
+----+----------+--------+----------+
| Id | NewsGuid | UserId | ReadDate |
+----+----------+--------+----------+
Newsitem obviously contains newsitems, newsrating contains ratings that users give to newsitems, and usernews contains the date when a user has read a newsitem.
In my query I want to get every newsitem, including the number of ratings for that newsitem and the average rating, and how many times that newsitem has been read by the current user.
What I have so far is:
select newsitem.guid, supplier, count(newsrating.id) as numberofratings,
avg(newsrating.rating) as rating,
count(case usernews.UserId when 3 then 1 else null end) as numberofreads from newsitem
left join newsrating on newsitem.guid = newsrating.newsguid
left join usernews on newsitem.guid = usernews.newsguid
group by newsitem.guid
I have created an sql fiddle here: http://sqlfiddle.com/#!9/c8add/8
Both count() calls don't return the numbers I want. numberofratings should return the total number of ratings for that newsitem (by all users). numberofreads should return the number of reads for the current user for that newsitem.
So, newsitem with guid d104c330-c319-40e8-8be3-a7c4f549d35c should have 2 ratings and 3 reads for the current user with userid = 3.
I have tried conditional counts and sums, but no success yet. How can this be accomplished?
The main problem that I see is that you're joining in both tables together, which means that you're going to effectively be multiplying out by both numbers, which is why your counts aren't going to be correct. For example, if the Newsitem has been read 3 times by the user and rated by 8 users then you're going to end up getting 24 rows, so it will look like it has been rated 24 times. You can add a DISTINCT to your COUNT of the ratings IDs and that should correct that issue. Average should be unaffected because the average of 1 and 2 is the same as the average of 1, 1, 2, & 2 (for example).
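The multiplication is easy to reproduce. The sketch below builds one news item with 8 ratings and 3 reads by user 3, then compares plain and DISTINCT counts. It uses Python's sqlite3 module instead of MySQL, and all rows (the guid 'item-1', the supplier, the dates) are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE newsitem (guid TEXT PRIMARY KEY, supplier TEXT);
CREATE TABLE newsrating (id INTEGER PRIMARY KEY, newsguid TEXT, userid INTEGER, rating INTEGER);
CREATE TABLE usernews (id INTEGER PRIMARY KEY, newsguid TEXT, userid INTEGER, readdate TEXT);
INSERT INTO newsitem VALUES ('item-1', 'acme');
""")

# 8 ratings from 8 different users.
conn.executemany(
    "INSERT INTO newsrating (newsguid, userid, rating) VALUES ('item-1', ?, ?)",
    [(uid, 1 + uid % 2) for uid in range(1, 9)],
)
# 3 reads, all by user 3.
conn.executemany(
    "INSERT INTO usernews (newsguid, userid, readdate) VALUES ('item-1', 3, ?)",
    [("2016-01-0%d" % day,) for day in (1, 2, 3)],
)

FANOUT_SQL = """
SELECT COUNT(*)              AS joined_rows,
       COUNT(r.id)           AS naive_ratings,
       COUNT(DISTINCT r.id)  AS number_of_ratings,
       COUNT(DISTINCT un.id) AS number_of_reads
FROM newsitem i
LEFT JOIN newsrating r ON r.newsguid = i.guid
LEFT JOIN usernews un ON un.newsguid = i.guid AND un.userid = 3
GROUP BY i.guid
"""

counts = conn.execute(FANOUT_SQL).fetchone()
print(counts)
```

The first two columns show the 8 x 3 = 24 joined rows, while the DISTINCT counts recover the real 8 ratings and 3 reads.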
You can then handle the reads by adding the userid to the JOIN condition (since it's an OUTER JOIN it shouldn't cause any loss of results) instead of in a CASE statement for your COUNT, then you can do a COUNT on distinct id values from Usernews. The resulting query would be:
SELECT
I.guid,
I.supplier,
COUNT(DISTINCT R.id) AS number_of_ratings,
AVG(R.rating) AS avg_rating,
COUNT(DISTINCT UN.id) AS number_of_reads
FROM
NewsItem I
LEFT OUTER JOIN NewsRating R ON R.newsguid = I.guid
LEFT OUTER JOIN UserNews UN ON
UN.newsguid = I.guid AND
UN.userid = @userid
GROUP BY
I.guid,
I.supplier
While that should work, you might get better results from a subquery, as the above needs to explode out the results and then aggregate them, perhaps unnecessarily. Also, some people might find the below to be a little clearer.
SELECT
I.guid,
I.supplier,
R.number_of_ratings,
R.avg_rating,
COUNT(UN.id) AS number_of_reads -- COUNT(*) would report 1 for items the user never read
FROM
NewsItem I
LEFT OUTER JOIN
(
SELECT
newsguid,
COUNT(*) AS number_of_ratings,
AVG(rating) AS avg_rating
FROM
NewsRating
GROUP BY
newsguid
) R ON R.newsguid = I.guid
LEFT OUTER JOIN UserNews UN ON UN.newsguid = I.guid AND UN.userid = @userid
GROUP BY
I.guid,
I.supplier,
R.number_of_ratings,
R.avg_rating
I'm with Tom: you should use a subquery to calculate the number of reads for the user.
SELECT NI.guid,
NI.supplier,
COUNT(NR.ID) as numberofratings,
AVG(NR.rating) as rating,
user_read as numberofreads
FROM newsitem NI
LEFT JOIN newsrating NR
ON NI.guid = NR.newsguid
LEFT JOIN (SELECT NewsGuid, COUNT(*) user_read
FROM usernews
WHERE UserId = 3 -- use a variable @user_id here
GROUP BY NewsGuid) UR
ON NI.guid = UR.NewsGuid
GROUP BY NI.guid,
NI.supplier,
numberofreads;

sql join query to get most viewed articles by country

I'm having a hard time pulling articles based on views by user country.
I have the following tables with their fields:
user: id, country_id
article: id
article_views: id, user_id, article_id
Each time a user views an article, I insert it into the article_views table, like this:
article_views.id | article_id | user_id
2 | 1 | 1
3 | 2 | 1
4 | 2 | 2
5 | 2 | 2
I want to pull the most viewed article for the current user's country_id. I imagine it will contain:
order by article_views.article_id DESC
Any suggestions?
This should work fine:
select article_id, count(*) as views
from article_views inner join user on article_views.user_id = user.id
group by user.country_id
order by views desc;
Edit: I forgot about grouping by article_id, as @Josien pointed out. The correct query is:
select country_id, article_id, count(*) as views
from article_views inner join user on article_views.user_id = user.id
where country_id = :countryID
group by user.country_id, article_id
order by views desc;
select av.article_id, count(*) as views
from article_views av inner join user u on av.user_id = u.id
where u.country_id = 'cc'
group by av.article_id
order by views desc;
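The grouped count per article within one country can be checked against the sample rows from the question. This is a sketch under assumptions: it uses Python's sqlite3 module rather than MySQL, the country ids 10 and 20 are invented, and the `most_viewed` helper and its tie-break on article_id are illustrative choices.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, country_id INTEGER);
CREATE TABLE article_views (id INTEGER PRIMARY KEY, article_id INTEGER, user_id INTEGER);
-- Hypothetical countries for the two users in the question's sample.
INSERT INTO user VALUES (1, 10), (2, 20);
-- View rows copied from the question.
INSERT INTO article_views VALUES (2, 1, 1), (3, 2, 1), (4, 2, 2), (5, 2, 2);
""")

def most_viewed(country_id):
    """Return (article_id, views) for the most viewed article, or None."""
    return conn.execute(
        """
        SELECT av.article_id, COUNT(*) AS views
        FROM article_views av
        JOIN user u ON u.id = av.user_id
        WHERE u.country_id = ?
        GROUP BY av.article_id
        ORDER BY views DESC, av.article_id
        LIMIT 1
        """,
        (country_id,),
    ).fetchone()

top_for_20 = most_viewed(20)
top_for_10 = most_viewed(10)
print(top_for_20, top_for_10)
```

Passing the country as a bound parameter avoids quoting a placeholder as a string literal, and LIMIT 1 returns only the top article instead of the whole ranking.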

Combine 2 SQL queries into one

I have an article table which contains all the articles of my website.
Design looks like this:
id | user_id | name | date
I also have a followers table, which contains all the followers of each article.
Its design looks like this:
id | article_id | user_id | date
Currently I have a SQL query which brings me 50 articles that exist ordered by date
SELECT id,user_id,name,date
FROM articles
ORDER BY date
LIMIT 50
I also have one more query which shows me 50 of the followers of an article
SELECT id, article_id, user_id, date
FROM followers
WHERE article_id = 5
LIMIT 50
How can I make this one query, so I don't have to run 2 queries? Running two seems wasteful and could take more time to load.
Try this solution:
SELECT a.*, b.*
FROM
(
SELECT *
FROM articles
ORDER BY date
LIMIT 50
) a
INNER JOIN
(
SELECT aa.*
FROM followers aa
LEFT JOIN followers bb ON aa.article_id = bb.article_id AND aa.id < bb.id
GROUP BY aa.id
HAVING COUNT(1) < 50
) b ON a.id = b.article_id
This gets the 50 oldest articles joined with the 50 latest followers for each of those articles.
It is also quite flexible:
Change aa.id < bb.id to aa.id > bb.id to get the 50 first followers.
Change ORDER BY date to ORDER BY date DESC to get the 50 latest articles.
Change n in LIMIT n to set the number of articles you want to retrieve.
Change n in HAVING COUNT(1) < n to set the number of followers you want to retrieve per article.
And you can mix all four up to facilitate any combination you want.
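The self-join that limits followers per article can be tried with a small n. This sketch runs on Python's sqlite3 module instead of MySQL, the follower rows are hypothetical, and it deliberately counts COUNT(bb.id) rather than COUNT(1), a variation on the query above so that the NULL row produced by the LEFT JOIN for the newest follower is not counted (which also makes n = 1 work).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE followers (id INTEGER PRIMARY KEY, article_id INTEGER, user_id INTEGER);
-- Hypothetical rows: article 5 has four followers, article 6 has one.
INSERT INTO followers VALUES
    (1, 5, 100), (2, 5, 101), (3, 5, 102), (4, 5, 103),
    (5, 6, 100);
""")

N = 2  # followers to keep per article

LATEST_FOLLOWERS_SQL = """
SELECT aa.id, aa.article_id
FROM followers aa
-- Pair each follower with every newer follower of the same article.
LEFT JOIN followers bb ON aa.article_id = bb.article_id AND aa.id < bb.id
GROUP BY aa.id, aa.article_id
-- Keep a follower only if fewer than N newer ones exist.
HAVING COUNT(bb.id) < ?
ORDER BY aa.article_id, aa.id DESC
"""

latest = conn.execute(LATEST_FOLLOWERS_SQL, (N,)).fetchall()
print(latest)
```

For article 5 only the two newest followers (ids 4 and 3) survive, while article 6 keeps its single follower.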
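SELECT a.id, a.user_id, a.name, a.date, f.id, f.article_id, f.user_id, f.date
FROM articles a
JOIN followers f ON f.article_id = a.id -- join condition; without it every article is paired with every follower row
WHERE f.article_id = 5
ORDER BY a.date
LIMIT 50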