MySQL query broke after adding join - mysql

I have a query that loads data for a loop in my CMS to display posts. The vote columns worked fine until I left-joined the comment table. Comments display OK, but the totalvotes, upvote and downvote counts are now wildly off. Let me know if you need to see the tables.
SELECT
count(DISTINCT comment.comment ) AS Comment,
idea.dateofcreation AS timestamp,
idea.userId AS userId,
idea.id AS ID,
idea.text AS Idea,
page.permalink AS Permalink,
user.name AS Username,
COUNT(CASE WHEN votelog.vote !="" THEN 1 END) AS 'totalvotes',
COUNT(CASE WHEN votelog.vote = '1' THEN 1 END) AS 'upvote',
COUNT(CASE WHEN votelog.vote = '-1' THEN 1 END) AS 'downvote'
FROM idea
LEFT JOIN votelog ON idea.id = votelog.ideaid
LEFT JOIN user ON idea.userId = user.id
LEFT JOIN page ON idea.id = page.ideaid
LEFT join comment ON comment.ideaid = idea.id
GROUP BY idea.id
ORDER BY totalvotes DESC

It seems you joined the comment table only to count how many comments there are.
But that join now produces one row per comment for every vote, so each vote is counted once per comment and the vote counts are inflated.
I would suggest using a scalar subquery to count the comments, and removing the join. Something like:
SELECT
(select count(DISTINCT comment) from comment c where c.ideaid = idea.id) AS Comment,
idea.dateofcreation AS timestamp,
idea.userId AS userId,
idea.id AS ID,
idea.text AS Idea,
page.permalink AS Permalink,
user.name AS Username,
COUNT(CASE WHEN votelog.vote !="" THEN 1 END) AS 'totalvotes',
COUNT(CASE WHEN votelog.vote = '1' THEN 1 END) AS 'upvote',
COUNT(CASE WHEN votelog.vote = '-1' THEN 1 END) AS 'downvote'
FROM idea
LEFT JOIN votelog ON idea.id = votelog.ideaid
LEFT JOIN user ON idea.userId = user.id
LEFT JOIN page ON idea.id = page.ideaid
GROUP BY idea.id
ORDER BY totalvotes DESC
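
To see the row multiplication concretely, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table contents are hypothetical (one idea with 2 votes and 3 comments), and only the columns needed for the demonstration are created.

```python
import sqlite3

# Hypothetical data: one idea with 2 votes and 3 comments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE idea    (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE votelog (ideaid INTEGER, vote INTEGER);
    CREATE TABLE comment (ideaid INTEGER, comment TEXT);
    INSERT INTO idea    VALUES (1, 'first idea');
    INSERT INTO votelog VALUES (1, 1), (1, -1);
    INSERT INTO comment VALUES (1, 'a'), (1, 'b'), (1, 'c');
""")

# Joining both child tables yields 2 x 3 = 6 rows, so each vote appears 3 times.
joined = conn.execute("""
    SELECT COUNT(CASE WHEN v.vote = 1 THEN 1 END) AS upvote,
           COUNT(DISTINCT c.comment)              AS comments
    FROM idea i
    LEFT JOIN votelog v ON v.ideaid = i.id
    LEFT JOIN comment c ON c.ideaid = i.id
    GROUP BY i.id
""").fetchone()

# Counting comments in a scalar subquery keeps exactly one row per vote.
scalar = conn.execute("""
    SELECT COUNT(CASE WHEN v.vote = 1 THEN 1 END) AS upvote,
           (SELECT COUNT(*) FROM comment c WHERE c.ideaid = i.id) AS comments
    FROM idea i
    LEFT JOIN votelog v ON v.ideaid = i.id
    GROUP BY i.id
""").fetchone()

print(joined)  # (3, 3): the single upvote is counted three times
print(scalar)  # (1, 3)
```

The comment count survives in the joined version only because of DISTINCT; the vote counts have no such protection.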

An idea can have more than one comment, so every vote row is repeated once per comment. Aggregate the comments before joining:
SELECT c.num_comments,
i.dateofcreation AS timestamp, i.userId AS userId, i.id AS ID, i.text AS Idea,
p.permalink AS Permalink, u.name AS Username,
SUM( vl.vote <> '' ) AS totalvotes,
SUM( vl.vote = 1 ) AS upvote,
SUM( vl.vote = -1 ) AS downvote
FROM idea i LEFT JOIN
votelog vl
ON i.id = vl.ideaid LEFT JOIN
user u
ON i.userId = u.id LEFT JOIN
page p
ON i.id = p.ideaid LEFT JOIN
(SELECT c.ideaid, COUNT(*) as num_comments
FROM comment c
GROUP BY c.ideaid
) c
ON c.ideaid = i.id
GROUP BY c.num_comments,
i.dateofcreation, i.userId, i.id, i.text,
p.permalink, u.name
ORDER BY totalvotes DESC;
Notes:
Table aliases make a query easier to write and to read.
It is a good practice to include all non-aggregated columns in the GROUP BY (and more recent versions of MySQL tend to enforce this).
I assume the votes are numbers. Compare them to numbers, not to strings.
MySQL has a nice shorthand for counting the number of times a boolean expression is true.
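
As a rough illustration of the last two notes, the sketch below runs a cut-down version of the query against SQLite through Python's sqlite3 module (SQLite, like MySQL, evaluates a comparison to 1 or 0). The data is hypothetical, and the COALESCE wrappers are an addition of this sketch so that an idea with no votes or comments shows 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE idea    (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE votelog (ideaid INTEGER, vote INTEGER);
    CREATE TABLE comment (ideaid INTEGER, comment TEXT);
    INSERT INTO idea    VALUES (1, 'popular'), (2, 'quiet');
    INSERT INTO votelog VALUES (1, 1), (1, 1), (1, -1);
    INSERT INTO comment VALUES (1, 'a'), (1, 'b');
""")

rows = conn.execute("""
    SELECT i.id,
           COALESCE(c.num_comments, 0)    AS num_comments,
           COALESCE(SUM(vl.vote = 1), 0)  AS upvote,    -- boolean sums to a count
           COALESCE(SUM(vl.vote = -1), 0) AS downvote
    FROM idea i
    LEFT JOIN votelog vl ON vl.ideaid = i.id
    LEFT JOIN (SELECT ideaid, COUNT(*) AS num_comments
               FROM comment
               GROUP BY ideaid) c ON c.ideaid = i.id   -- at most one row per idea
    GROUP BY i.id, c.num_comments
    ORDER BY i.id
""").fetchall()

print(rows)  # [(1, 2, 2, 1), (2, 0, 0, 0)]
```

Because the derived table has at most one row per idea, joining it cannot multiply the vote rows.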

Related

How to get correct SUM() when joining table with duplicated row data?

Currently I have 3 tables, the first table 'Users' contains id and user_name. The second table 'listings' contains refno and agent_id. And my third table 'logs' contains refno and status. Now I want to display the name of a person next to their status. So basically I want the count of status entries from logs and put their respective username next to it.
To do this, I have to reference refno of 'logs' to refno of 'listings' and the agent_id of 'listings' to id of 'Users'. For this I have used the following statement:
select SUM(CASE WHEN status = 'Draft' THEN 1 END) AS draft,
SUM(CASE WHEN status = 'Publish' THEN 1 END) AS publish,
u.name
from logs t
inner join listings l on t.refno = l.refno
inner join users u on l.agent_id=u.id
But this returns an output like:
Which is wrong, the output I want is like this:
Draft | Publish | Name
1 | 1 | Jason
0 | 1 | Jam
I've added a sqlfiddle with data to make the reference easier to understand: http://sqlfiddle.com/#!9/22b6e4/5
The glaring problem to overcome is the fact that you have non-unique data in your listings table -- this is skewing your sums.
You need to join only on unique rows so that you don't count a subsequently joined row more than once.
SELECT u.id,
u.name,
SUM(status = 'Draft') AS draft,
SUM(status = 'Publish') AS publish
FROM users AS u
JOIN (SELECT DISTINCT * FROM listings) AS l ON u.id = l.agent_id
JOIN logs AS t ON l.refno = t.refno
GROUP BY u.id
I prefer to include the id in the result set because names are often not unique.
http://sqlfiddle.com/#!9/22b6e4/48
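
A small sketch of the effect, run on SQLite via Python's sqlite3 module rather than MySQL. The rows below are made up for illustration and are not the data from the fiddle; the only point is that one listings row appears twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE listings (refno INTEGER, agent_id INTEGER);
    CREATE TABLE logs     (refno INTEGER, status TEXT);
    INSERT INTO users    VALUES (1, 'Jason'), (2, 'Jam');
    INSERT INTO listings VALUES (10, 1), (10, 1), (11, 2);  -- refno 10 is duplicated
    INSERT INTO logs     VALUES (10, 'Draft'), (10, 'Publish'), (11, 'Publish');
""")

# Same query twice; only the source of the listings rows changes.
QUERY = """
    SELECT u.name,
           SUM(t.status = 'Draft')   AS draft,
           SUM(t.status = 'Publish') AS publish
    FROM users u
    JOIN {listings} l ON l.agent_id = u.id
    JOIN logs t       ON t.refno = l.refno
    GROUP BY u.id
    ORDER BY u.id
"""

raw = conn.execute(QUERY.format(listings="listings")).fetchall()
deduped = conn.execute(
    QUERY.format(listings="(SELECT DISTINCT * FROM listings)")
).fetchall()

print(raw)      # [('Jason', 2, 2), ('Jam', 0, 1)]: Jason's logs are doubled
print(deduped)  # [('Jason', 1, 1), ('Jam', 0, 1)]
```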
Update:
Based on your comments below, you first need to create a pseudo table (a derived table) inside the FROM clause that contains all the data needed to get your desired results.
The subquery below creates that pseudo table.
SELECT u.id,
u.name,
t.status,
t.refno
FROM logs t
INNER JOIN listings l ON t.refno = l.refno
INNER JOIN users u ON l.agent_id = u.id
GROUP BY t.refno, u.id, u.name, t.status;
You simply need to wrap the above query as a subquery inside the FROM clause of your original query.
So here is the final query that gets your desired output.
SELECT SUM(CASE WHEN tab.status = 'Draft' THEN 1 ELSE 0 END) AS draft,
SUM(CASE WHEN tab.status = 'Publish' THEN 1 ELSE 0 END) AS publish,
tab.name
FROM (SELECT u.id,
u.name,
t.status,
t.refno
FROM logs t
INNER JOIN listings l ON t.refno = l.refno
INNER JOIN users u ON l.agent_id = u.id
GROUP BY t.refno, u.id, u.name, t.status) AS tab
GROUP BY tab.id, tab.name
ORDER BY tab.id;
Original Answer:
You need to add a GROUP BY clause to group the result according to your desired parameter(s).
Here you can either group by l.agent_id or u.id.
Another thing I noticed is that you should add an ELSE 0 branch to the CASE expression inside SUM(). Without it, SUM() returns NULL rather than 0 when no row in the group has the given status.
Something like this:
SUM(CASE WHEN status = 'Publish' THEN 1 ELSE 0 END)
So your final query becomes something like this:
SELECT SUM(CASE WHEN status = 'Draft' THEN 1 ELSE 0 END) AS draft,
SUM(CASE WHEN status = 'Publish' THEN 1 ELSE 0 END) AS publish,
u.name
FROM logs t
INNER JOIN listings l ON t.refno = l.refno
INNER JOIN users u ON l.agent_id=u.id
GROUP BY u.id;

How to properly join these three tables in SQL?

I'm currently creating a small application where users can post a text which can be commented and the post can also be voted (+1 or -1).
This is my database:
Now I want to select all information of all posts with status = 1 plus two extra columns: One column containing the count of comments and one column containing the sum (I call it score) of all votes.
I currently use the following query, which correctly adds the count of the comments:
SELECT *, COUNT(comments.fk_commented_post) as comments
FROM posts
LEFT JOIN comments
ON posts.id_post = comments.fk_commented_post
AND comments.status = 1
WHERE posts.status = 1
GROUP BY posts.id_post
Then I tried to additionally add the sum of the votes, using the following query:
SELECT *, COUNT(comments.fk_commented_post) as comments, SUM(votes_posts.type) as score
FROM posts
LEFT JOIN comments
ON posts.id_post = comments.fk_commented_post
AND comments.status = 1
LEFT JOIN votes_posts
ON posts.id_post = votes_posts.fk_voted_post
WHERE posts.status = 1
GROUP BY posts.id_post
The result is no longer correct for either the votes or the comments. Somehow some of the values seem to be getting multiplied...
This is probably simpler using correlated subqueries:
select p.*,
(select count(*)
from comments c
where c.fk_commented_post = p.id_post and c.status = 1
) as num_comments,
(select sum(vp.type)
from votes_posts vp
where vp.fk_voted_post = p.id_post
) as num_score
from posts p
where p.status = 1;
The problem with the joins is that comments and votes_posts are not related to each other, so for each post you get a Cartesian product of its comments and its votes, and the counts get multiplied.
You want to join the comment count and the vote score to the posts. So, aggregate each child table first, then join:
select
p.*,
coalesce(c.cnt, 0) as comments,
coalesce(v.score, 0) as score
from posts p
left join
(
select fk_commented_post as id_post, count(*) as cnt
from comments
where status = 1
group by fk_commented_post
) c on c.id_post = p.id_post
left join
(
select fk_voted_post as id_post, sum(type) as score
from votes_posts
group by fk_voted_post
) v on v.id_post = p.id_post
where p.status = 1
order by p.id_post;
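
The multiplication can be reproduced with a few rows. The sketch below uses Python's sqlite3 module in place of MySQL and hypothetical data (post 1 has 2 comments and 3 votes, post 2 has neither); the status columns are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts       (id_post INTEGER PRIMARY KEY);
    CREATE TABLE comments    (fk_commented_post INTEGER);
    CREATE TABLE votes_posts (fk_voted_post INTEGER, type INTEGER);
    INSERT INTO posts       VALUES (1), (2);
    INSERT INTO comments    VALUES (1), (1);
    INSERT INTO votes_posts VALUES (1, 1), (1, 1), (1, -1);
""")

# Two independent one-to-many joins: post 1 expands to 2 x 3 = 6 rows.
naive = conn.execute("""
    SELECT p.id_post, COUNT(c.fk_commented_post), SUM(v.type)
    FROM posts p
    LEFT JOIN comments c    ON c.fk_commented_post = p.id_post
    LEFT JOIN votes_posts v ON v.fk_voted_post = p.id_post
    GROUP BY p.id_post
    ORDER BY p.id_post
""").fetchall()

# Aggregate each child table to one row per post, then join.
fixed = conn.execute("""
    SELECT p.id_post, COALESCE(c.cnt, 0), COALESCE(v.score, 0)
    FROM posts p
    LEFT JOIN (SELECT fk_commented_post AS id_post, COUNT(*) AS cnt
               FROM comments GROUP BY fk_commented_post) c ON c.id_post = p.id_post
    LEFT JOIN (SELECT fk_voted_post AS id_post, SUM(type) AS score
               FROM votes_posts GROUP BY fk_voted_post) v ON v.id_post = p.id_post
    ORDER BY p.id_post
""").fetchall()

print(naive)  # [(1, 6, 2), (2, 0, None)]
print(fixed)  # [(1, 2, 1), (2, 0, 0)]
```

In the naive result the comment count is multiplied by the number of votes (2 x 3 = 6) and the score by the number of comments (1 x 2 = 2).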

Count elements from a table with different conditions in MySQL?

I want to count all the orders a user has and all the completed orders a user has. I came up with this, but it's not working:
select
count(a.id) as total,
count(b.id) as complete
from
user
join
orders a on user.id = a.user_id
join
orders b on user.id = b.user_id
where
a.id = 1
and
(b.id = 1 and b.complete = 'yes');
Any idea?
You could sum the orders where complete is 'yes' and count the distinct order ids, grouped by user:
select user.id, sum(if(a.complete ='yes',1,0)) as complete, count(distinct a.id) as total
from user
INNER join orders a on user.id = a.user_id
group by user.id
I believe you are looking to group by user (MySQL GROUP BY) and then count all the orders related to each user plus the completed ones. For this approach, you will need to:
(1) Join users with their orders.
(2) Use a GROUP BY clause on the user.id column.
(3) Count all orders related to each user with COUNT().
(4) Sum all orders related to each user that meet a specific condition with SUM(CASE WHEN <specific_condition> THEN 1 ELSE 0 END).
In summary, a query like the following should work:
SELECT
u.id,
COUNT(o.id) AS total_orders,
SUM(CASE WHEN o.complete = 'yes' THEN 1 ELSE 0 END) AS complete_orders
FROM
user AS u
INNER JOIN
orders AS o ON o.user_id = u.id
GROUP BY
u.id
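
Here is a runnable sketch of steps (1) to (4), using Python's sqlite3 module instead of MySQL and hypothetical rows. One deliberate difference from the query above: it uses LEFT JOIN rather than INNER JOIN, an assumption made here so that a user with no orders still appears with zero counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user   (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, complete TEXT);
    INSERT INTO user   VALUES (1), (2), (3);
    INSERT INTO orders VALUES (1, 1, 'yes'), (2, 1, 'no'), (3, 1, 'yes'), (4, 2, 'no');
""")

totals = conn.execute("""
    SELECT u.id,
           COUNT(o.id) AS total_orders,   -- ignores the NULL row of a user without orders
           SUM(CASE WHEN o.complete = 'yes' THEN 1 ELSE 0 END) AS complete_orders
    FROM user AS u
    LEFT JOIN orders AS o ON o.user_id = u.id
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()

print(totals)  # [(1, 3, 2), (2, 1, 0), (3, 0, 0)]
```

With INNER JOIN, as in the answer, user 3 would simply be absent from the result.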

Count votes in subquery or use join - which is faster?

I am working on a forum system (MySQL) and I'm not sure which approach performs better when a single query has to retrieve the posts, their up and down vote counts, and whether the current user has voted on each post.
The first option is this:
SELECT posts.post_id, post_content, display_name,
(SELECT COUNT(post_id) FROM post_votes WHERE post_votes.post_id=posts.post_id AND post_votes.user_id='+user_id+') voted,
(SELECT COUNT(post_id) FROM post_votes WHERE post_votes.post_id=posts.post_id AND up_vote=1) upvotes,
(SELECT COUNT(post_id) FROM post_votes WHERE post_votes.post_id=posts.post_id AND up_vote=0) downvotes
FROM posts JOIN users ON users.user_id=posts.user_id WHERE parent_id ='+parent_id+' ORDER BY post_id DESC
The second option is to replace all the count sub-queries with LEFT JOIN and count.
Are there any advantages to one method over the other?
Edit:
Since I'm looking to retrieve all posts rather than a single row that groups posts, I came up with this query (with some inspiration from here):
SELECT p.post_id, post_content, display_name,
COALESCE(v.upvotes, 0) AS upvotes,
COALESCE(v.downvotes, 0) AS downvotes,
COALESCE(v.voted, 0) AS voted
FROM posts p
LEFT JOIN (
SELECT post_id,
SUM(vt.up_vote = 1) AS upvotes,
SUM(vt.up_vote = 0) AS downvotes,
MAX(IF(vt.user_id = ' + user_id + ', vt.up_vote, NULL)) voted
FROM post_votes vt
GROUP BY vt.post_id
)
v ON v.post_id = p.post_id
JOIN users ON users.user_id=p.user_id
WHERE parent_id =' + parent_id + ' ORDER BY post_id DESC
I have run both solutions on my demo db (tiny at the moment, with fewer than 100 rows in each table) and the durations were identical.
The question is which one will be faster in the long term.
I can hardly think of anything where a subquery was faster than a join.
In this case you don't even need a subquery. Do it all in one query:
SELECT
p.post_id,
p.post_content,
u.display_name,
COALESCE(SUM(pv.user_id = '+user_id+'), 0) AS voted,
COALESCE(SUM(pv.up_vote = 1), 0) AS upvotes,
COALESCE(SUM(pv.up_vote = 0), 0) AS downvotes
FROM posts p
JOIN users u ON u.user_id = p.user_id
LEFT JOIN post_votes pv ON p.post_id = pv.post_id
WHERE p.parent_id ='+parent_id+'
GROUP BY p.post_id
ORDER BY p.post_id DESC
A comparison such as pv.up_vote = 1 inside SUM() evaluates to 1 (true) or 0 (false), so summing it counts the rows where the condition holds. The user filter has to sit inside SUM() rather than in the join condition; otherwise upvotes and downvotes would only count the current user's votes. COALESCE turns the NULL that SUM() returns for a post with no votes into 0.
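
The queries in this thread splice user_id and parent_id into the SQL string. A hedged sketch of the same single-query idea with bound parameters follows; it uses Python's sqlite3 module (whose placeholder is `?`; MySQL drivers typically use a different marker), a hypothetical helper named load_posts, made-up rows, and omits the users table for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts      (post_id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE post_votes (post_id INTEGER, user_id INTEGER, up_vote INTEGER);
    INSERT INTO posts      VALUES (1, 0), (2, 0);
    INSERT INTO post_votes VALUES (1, 7, 1), (1, 8, 1), (1, 9, 0);
""")

def load_posts(conn, user_id, parent_id):
    """Return (post_id, voted, upvotes, downvotes) per post; hypothetical helper."""
    return conn.execute("""
        SELECT p.post_id,
               COALESCE(SUM(pv.user_id = ?), 0) AS voted,
               COALESCE(SUM(pv.up_vote = 1), 0) AS upvotes,
               COALESCE(SUM(pv.up_vote = 0), 0) AS downvotes
        FROM posts p
        LEFT JOIN post_votes pv ON pv.post_id = p.post_id
        WHERE p.parent_id = ?
        GROUP BY p.post_id
        ORDER BY p.post_id DESC
    """, (user_id, parent_id)).fetchall()

print(load_posts(conn, 7, 0))   # [(2, 0, 0, 0), (1, 1, 2, 1)]
print(load_posts(conn, 99, 0))  # [(2, 0, 0, 0), (1, 0, 2, 1)]
```

Binding the values keeps the SQL text constant and avoids quoting problems with user-supplied input.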

LEFT JOIN after GROUP BY?

I have a table of "Songs", "Songs_Tags" (relating songs with tags) and "Songs_Votes" (relating songs with boolean like/dislike).
I need to retrieve the songs with a GROUP_CONCAT() of its tags and also the number of likes (true) and dislikes (false).
My query is something like that:
SELECT
s.*,
GROUP_CONCAT(st.id_tag) AS tags_ids,
COUNT(CASE WHEN v.vote=1 THEN 1 ELSE NULL END) as votesUp,
COUNT(CASE WHEN v.vote=0 THEN 1 ELSE NULL END) as votesDown
FROM Songs s
LEFT JOIN Songs_Tags st ON (s.id = st.id_song)
LEFT JOIN Votes v ON (s.id=v.id_song)
GROUP BY s.id
ORDER BY id DESC
The problem is that when a Song has more than 1 tag, it gets returned more than once, so when I do the COUNT(), it returns more results.
The best solution I could think is if it would be possible to do the last LEFT JOIN after the GROUP BY (so now there would be only one entry for each song). Then I'd need another GROUP BY m.id.
Is there a way to accomplish that? Do I need to use a subquery?
There've been some good answers so far, but I would adopt a slightly different method quite similar to what you described originally
SELECT
songsWithTags.*,
COALESCE(SUM(v.vote),0) AS votesUp,
COALESCE(SUM(1-v.vote),0) AS votesDown
FROM (
SELECT
s.*,
COALESCE(GROUP_CONCAT(st.id_tag),'') AS tags_ids
FROM Songs s
LEFT JOIN Songs_Tags st
ON st.id_song = s.id
GROUP BY s.id
) AS songsWithTags
LEFT JOIN Votes v
ON songsWithTags.id = v.id_song
GROUP BY songsWithTags.id
ORDER BY songsWithTags.id DESC
Here the subquery is responsible for collapsing songs and their tags to one row per song. The result is then joined to Votes afterwards. I also opted to simply sum the v.vote column, since you have indicated it is 1 or 0: SUM(v.vote) adds up 1+1+1+0+0 = 3, so 3 out of 5 are upvotes, while SUM(1-v.vote) adds up 0+0+0+1+1 = 2, so 2 out of 5 are downvotes.
If you had an index on votes with the columns (id_song,vote) then that index would be used for this so it wouldn't even hit the table. Likewise if you had an index on Songs_Tags with (id_song,id_tag) then that table wouldn't be hit by the query.
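
The arithmetic can be checked with the five example votes. This is only a sketch on SQLite via Python's sqlite3 module; the Votes rows and the helper name tally are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Votes (id_song INTEGER, vote INTEGER);
    INSERT INTO Votes VALUES (1, 1), (1, 1), (1, 1), (1, 0), (1, 0);
""")

def tally(id_song):
    """Return (votesUp, votesDown) for one song; hypothetical helper."""
    return conn.execute("""
        SELECT COALESCE(SUM(vote), 0),      -- 1+1+1+0+0
               COALESCE(SUM(1 - vote), 0)   -- 0+0+0+1+1
        FROM Votes
        WHERE id_song = ?
    """, (id_song,)).fetchone()

print(tally(1))  # (3, 2)
print(tally(2))  # (0, 0): COALESCE replaces the NULL from an empty SUM
```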
Edit: added a solution using COUNT:
SELECT
songsWithTags.*,
COUNT(CASE WHEN v.vote=1 THEN 1 END) as votesUp,
COUNT(CASE WHEN v.vote=0 THEN 1 END) as votesDown
FROM (
SELECT
s.*,
COALESCE(GROUP_CONCAT(st.id_tag),'') AS tags_ids
FROM Songs s
LEFT JOIN Songs_Tags st
ON st.id_song = s.id
GROUP BY s.id
) AS songsWithTags
LEFT JOIN Votes v
ON songsWithTags.id = v.id_song
GROUP BY songsWithTags.id
ORDER BY songsWithTags.id DESC
Try this:
SELECT
s.*,
GROUP_CONCAT(DISTINCT st.id_tag) AS tags_ids,
COUNT(DISTINCT CASE WHEN v.vote=1 THEN id_vote ELSE NULL END) AS votesUp,
COUNT(DISTINCT CASE WHEN v.vote=0 THEN id_vote ELSE NULL END) AS votesDown
FROM Songs s
LEFT JOIN Songs_Tags st ON (s.id = st.id_song)
LEFT JOIN Votes v ON (s.id=v.id_song)
GROUP BY s.id
ORDER BY id DESC
Your code results in a mini-Cartesian product because you are doing two Joins in 1-to-many relationships and the 1 table is on the same side of both joins.
Convert to 2 subqueries with groupings and then Join:
SELECT
s.*,
COALESCE(st.tags_ids, '') AS tags_ids,
COALESCE(v.votesUp, 0) AS votesUp,
COALESCE(v.votesDown, 0) AS votesDown
FROM
Songs AS s
LEFT JOIN
( SELECT
id_song,
GROUP_CONCAT(id_tag) AS tags_ids
FROM Songs_Tags
GROUP BY id_song
) AS st
ON s.id = st.id_song
LEFT JOIN
( SELECT
id_song,
COUNT(CASE WHEN vote=1 THEN id_vote END) AS votesUp,
COUNT(CASE WHEN vote=0 THEN id_vote END) AS votesDown
FROM Votes
GROUP BY id_song
) AS v
ON s.id = v.id_song
ORDER BY s.id DESC
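
A minimal way to try the two-subquery approach without a MySQL server is to run it on SQLite through Python's sqlite3 module, which also provides GROUP_CONCAT. The rows and the title column are hypothetical; song 1 has two tags and three votes, song 2 has neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Songs      (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE Songs_Tags (id_song INTEGER, id_tag INTEGER);
    CREATE TABLE Votes      (id_vote INTEGER PRIMARY KEY, id_song INTEGER, vote INTEGER);
    INSERT INTO Songs      VALUES (1, 'tagged'), (2, 'untagged');
    INSERT INTO Songs_Tags VALUES (1, 10), (1, 20);
    INSERT INTO Votes      VALUES (1, 1, 1), (2, 1, 1), (3, 1, 0);
""")

rows = conn.execute("""
    SELECT s.id,
           COALESCE(st.tags_ids, '') AS tags_ids,
           COALESCE(v.votesUp, 0)    AS votesUp,
           COALESCE(v.votesDown, 0)  AS votesDown
    FROM Songs AS s
    LEFT JOIN (SELECT id_song, GROUP_CONCAT(id_tag) AS tags_ids
               FROM Songs_Tags GROUP BY id_song) AS st ON st.id_song = s.id
    LEFT JOIN (SELECT id_song,
                      COUNT(CASE WHEN vote = 1 THEN id_vote END) AS votesUp,
                      COUNT(CASE WHEN vote = 0 THEN id_vote END) AS votesDown
               FROM Votes GROUP BY id_song) AS v ON v.id_song = s.id
    ORDER BY s.id
""").fetchall()

# GROUP_CONCAT order is not guaranteed, so compare the tags as a sorted list.
summary = [(sid, sorted(tags.split(",")) if tags else [], up, down)
           for sid, tags, up, down in rows]
print(summary)  # [(1, ['10', '20'], 2, 1), (2, [], 0, 0)]
```

Each derived table returns one row per song, so neither join can duplicate the other's rows.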