LEFT JOIN after GROUP BY? - mysql

I have a table of "Songs", "Songs_Tags" (relating songs with tags) and "Songs_Votes" (relating songs with boolean like/dislike).
I need to retrieve the songs with a GROUP_CONCAT() of its tags and also the number of likes (true) and dislikes (false).
My query is something like this:
SELECT
s.*,
GROUP_CONCAT(st.id_tag) AS tags_ids,
COUNT(CASE WHEN v.vote=1 THEN 1 ELSE NULL END) as votesUp,
COUNT(CASE WHEN v.vote=0 THEN 1 ELSE NULL END) as votesDown
FROM Songs s
LEFT JOIN Songs_Tags st ON (s.id = st.id_song)
LEFT JOIN Votes v ON (s.id=v.id_song)
GROUP BY s.id
ORDER BY id DESC
The problem is that when a song has more than one tag, it gets returned more than once, so when I do the COUNT(), it counts more rows than it should.
The best solution I could think of would be to do the last LEFT JOIN after the GROUP BY (so there would be only one entry for each song). Then I'd need another GROUP BY s.id.
Is there a way to accomplish that? Do I need to use a subquery?

There have been some good answers so far, but I would adopt a slightly different method, quite similar to what you described originally:
SELECT
songsWithTags.*,
COALESCE(SUM(v.vote),0) AS votesUp,
COALESCE(SUM(1-v.vote),0) AS votesDown
FROM (
SELECT
s.*,
COALESCE(GROUP_CONCAT(st.id_tag),'') AS tags_ids
FROM Songs s
LEFT JOIN Songs_Tags st
ON st.id_song = s.id
GROUP BY s.id
) AS songsWithTags
LEFT JOIN Votes v
ON songsWithTags.id = v.id_song
GROUP BY songsWithTags.id
ORDER BY songsWithTags.id DESC
Here the subquery is responsible for collapsing songs and their tags to one row per song. The result is then joined onto Votes afterwards. I also opted to simply sum the v.vote column, since you indicated it is 1 or 0: SUM(v.vote) adds up 1+1+1+0+0 = 3, so 3 out of 5 are upvotes, while SUM(1-v.vote) adds up 0+0+0+1+1 = 2, so 2 out of 5 are downvotes.
If you had an index on Votes with the columns (id_song, vote), that index would cover this query, so it wouldn't even hit the table. Likewise, if you had an index on Songs_Tags with (id_song, id_tag), that table wouldn't be hit by the query either.
Edit: added a solution using COUNT:
SELECT
songsWithTags.*,
COUNT(CASE WHEN v.vote=1 THEN 1 END) as votesUp,
COUNT(CASE WHEN v.vote=0 THEN 1 END) as votesDown
FROM (
SELECT
s.*,
COALESCE(GROUP_CONCAT(st.id_tag),'') AS tags_ids
FROM Songs s
LEFT JOIN Songs_Tags st
ON st.id_song = s.id
GROUP BY s.id
) AS songsWithTags
LEFT JOIN Votes v
ON songsWithTags.id = v.id_song
GROUP BY songsWithTags.id
ORDER BY songsWithTags.id DESC
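The SUM(v.vote) / SUM(1-v.vote) arithmetic from this answer can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows (three upvotes and two downvotes for song 1, no votes for song 2) are invented for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names follow the question,
# the sample data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Songs (id INTEGER PRIMARY KEY);
    CREATE TABLE Votes (id_song INTEGER, vote INTEGER);
    INSERT INTO Songs VALUES (1), (2);
    -- Song 1: three upvotes (1) and two downvotes (0). Song 2: no votes.
    INSERT INTO Votes VALUES (1, 1), (1, 1), (1, 1), (1, 0), (1, 0);
""")

rows = conn.execute("""
    SELECT s.id,
           COALESCE(SUM(v.vote), 0)     AS votesUp,
           COALESCE(SUM(1 - v.vote), 0) AS votesDown
    FROM Songs s
    LEFT JOIN Votes v ON v.id_song = s.id
    GROUP BY s.id
    ORDER BY s.id
""").fetchall()

print(rows)  # [(1, 3, 2), (2, 0, 0)]
```

The COALESCE matters for song 2: with no matching votes, SUM() sees only NULLs and returns NULL rather than 0.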

Try this:
SELECT
s.*,
GROUP_CONCAT(DISTINCT st.id_tag) AS tags_ids,
COUNT(DISTINCT CASE WHEN v.vote=1 THEN id_vote ELSE NULL END) AS votesUp,
COUNT(DISTINCT CASE WHEN v.vote=0 THEN id_vote ELSE NULL END) AS votesDown
FROM Songs s
LEFT JOIN Songs_Tags st ON (s.id = st.id_song)
LEFT JOIN Votes v ON (s.id=v.id_song)
GROUP BY s.id
ORDER BY id DESC

Your code results in a mini-Cartesian product because you are doing two Joins in 1-to-many relationships and the 1 table is on the same side of both joins.
Convert to 2 subqueries with groupings and then Join:
SELECT
s.*,
COALESCE(st.tags_ids, '') AS tags_ids,
COALESCE(v.votesUp, 0) AS votesUp,
COALESCE(v.votesDown, 0) AS votesDown
FROM
Songs AS s
LEFT JOIN
( SELECT
id_song,
GROUP_CONCAT(id_tag) AS tags_ids
FROM Songs_Tags
GROUP BY id_song
) AS st
ON s.id = st.id_song
LEFT JOIN
( SELECT
id_song,
COUNT(CASE WHEN vote=1 THEN id_vote END) AS votesUp,
COUNT(CASE WHEN vote=0 THEN id_vote END) AS votesDown
FROM Votes
GROUP BY id_song
) AS v
ON s.id = v.id_song
ORDER BY s.id DESC
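To see the mini-Cartesian product in numbers, here is a hedged sketch using Python's sqlite3 module in place of MySQL. The data is invented (one song with 2 tags and 3 votes), and a tag count replaces GROUP_CONCAT so the output does not depend on concatenation order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Songs (id INTEGER PRIMARY KEY);
    CREATE TABLE Songs_Tags (id_song INTEGER, id_tag INTEGER);
    CREATE TABLE Votes (id_vote INTEGER PRIMARY KEY, id_song INTEGER, vote INTEGER);
    INSERT INTO Songs VALUES (1);
    INSERT INTO Songs_Tags VALUES (1, 10), (1, 20);            -- 2 tags
    INSERT INTO Votes VALUES (1, 1, 1), (2, 1, 1), (3, 1, 0);  -- 2 up, 1 down
""")

# Joining both child tables directly yields 2 x 3 = 6 rows for song 1,
# so every vote is counted once per tag.
naive = conn.execute("""
    SELECT s.id,
           COUNT(CASE WHEN v.vote = 1 THEN 1 END) AS votesUp,
           COUNT(CASE WHEN v.vote = 0 THEN 1 END) AS votesDown
    FROM Songs s
    LEFT JOIN Songs_Tags st ON st.id_song = s.id
    LEFT JOIN Votes v ON v.id_song = s.id
    GROUP BY s.id
""").fetchone()

# Aggregating each child table first leaves at most one row per song
# on each side of the join, so nothing is multiplied.
fixed = conn.execute("""
    SELECT s.id,
           COALESCE(st.n_tags, 0),
           COALESCE(v.votesUp, 0),
           COALESCE(v.votesDown, 0)
    FROM Songs s
    LEFT JOIN (SELECT id_song, COUNT(*) AS n_tags
               FROM Songs_Tags GROUP BY id_song) st ON st.id_song = s.id
    LEFT JOIN (SELECT id_song,
                      COUNT(CASE WHEN vote = 1 THEN id_vote END) AS votesUp,
                      COUNT(CASE WHEN vote = 0 THEN id_vote END) AS votesDown
               FROM Votes GROUP BY id_song) v ON v.id_song = s.id
""").fetchone()

print(naive)  # (1, 4, 2): both vote counts doubled by the 2 tags
print(fixed)  # (1, 2, 2, 1): 2 tags, 2 upvotes, 1 downvote
```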

Related

How to properly join these three tables in SQL?

I'm currently creating a small application where users can post a text which can be commented and the post can also be voted (+1 or -1).
This is my database:
Now I want to select all information of all posts with status = 1 plus two extra columns: One column containing the count of comments and one column containing the sum (I call it score) of all votes.
I currently use the following query, which correctly adds the count of the comments:
SELECT *, COUNT(comments.fk_commented_post) as comments
FROM posts
LEFT JOIN comments
ON posts.id_post = comments.fk_commented_post
AND comments.status = 1
WHERE posts.status = 1
GROUP BY posts.id_post
Then I tried to additionally add the sum of the votes, using the following query:
SELECT *, COUNT(comments.fk_commented_post) as comments, SUM(votes_posts.type) as score
FROM posts
LEFT JOIN comments
ON posts.id_post = comments.fk_commented_post
AND comments.status = 1
LEFT JOIN votes_posts
ON posts.id_post = votes_posts.fk_voted_post
WHERE posts.status = 1
GROUP BY posts.id_post
The result is no longer correct for either the votes or the comments. Somehow some of the values seem to be getting multiplied...
This is probably simpler using correlated subqueries:
select p.*,
(select count(*)
from comments c
where c.fk_commented_post = p.id_post and c.status = 1
) as num_comments,
(select sum(vp.type)
from votes_posts vp
where vp.fk_voted_post = p.id_post
) as num_score
from posts p
where p.status = 1;
The problem with the join is that the counts get messed up because the two other tables are not related to each other, so you get a Cartesian product.
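You want to join the comment counts and the vote counts to the posts. So, aggregate to get the counts, then join (use sum(type) instead of count(*) in the votes subquery if you need the score rather than the number of votes):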
select
p.*,
coalesce(c.cnt, 0) as comments,
coalesce(v.cnt, 0) as votes
from posts p
left join
(
select fk_commented_post as id_post, count(*) as cnt
from comments
where status = 1
group by fk_commented_post
) c on c.id_post = p.id_post
left join
(
select fk_voted_post as id_post, count(*) as cnt
from votes_posts
group by fk_voted_post
) v on v.id_post = p.id_post
where p.status = 1
order by p.id_post;
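The multiplication described in this thread can be reproduced with a small sketch, again using Python's sqlite3 module as a stand-in for MySQL. The table and column names follow the question; the rows (post 1 with 2 comments and votes of +1, +1, -1, post 2 with nothing) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id_post INTEGER PRIMARY KEY, status INTEGER);
    CREATE TABLE comments (fk_commented_post INTEGER, status INTEGER);
    CREATE TABLE votes_posts (fk_voted_post INTEGER, type INTEGER);
    INSERT INTO posts VALUES (1, 1), (2, 1);
    INSERT INTO comments VALUES (1, 1), (1, 1);
    INSERT INTO votes_posts VALUES (1, 1), (1, 1), (1, -1);
""")

# Two independent LEFT JOINs: post 1 expands to 2 comments x 3 votes = 6 rows.
joined = conn.execute("""
    SELECT p.id_post, COUNT(c.fk_commented_post), SUM(v.type)
    FROM posts p
    LEFT JOIN comments c ON c.fk_commented_post = p.id_post AND c.status = 1
    LEFT JOIN votes_posts v ON v.fk_voted_post = p.id_post
    WHERE p.status = 1
    GROUP BY p.id_post
    ORDER BY p.id_post
""").fetchall()

# Correlated subqueries: each aggregate only ever sees its own table.
correlated = conn.execute("""
    SELECT p.id_post,
           (SELECT COUNT(*) FROM comments c
             WHERE c.fk_commented_post = p.id_post AND c.status = 1),
           (SELECT SUM(vp.type) FROM votes_posts vp
             WHERE vp.fk_voted_post = p.id_post)
    FROM posts p
    WHERE p.status = 1
    ORDER BY p.id_post
""").fetchall()

print(joined)      # [(1, 6, 2), (2, 0, None)]
print(correlated)  # [(1, 2, 1), (2, 0, None)]
```

Post 2 has no votes, so its score is NULL (None in Python) in both versions; wrap the sum in COALESCE if a 0 is preferred.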

How to limit record before group by for pagination?

I have this query that will LEFT JOIN and GROUP BY to get SUM of column.
SELECT
c.id,
SUM(
r.score
) AS score_sum,
SUM(
CASE WHEN r.is_active = '0' THEN r.negative ELSE 0 END
) AS negative_sum
FROM comments AS c
LEFT JOIN rates AS r ON (r.comment_id = c.id)
WHERE r.comment_id = c.id
GROUP BY c.id
DB Fiddle link:
https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=fadba795d8426f91471fa4db83845b6f
The query works, but if the comments table is large (10K records, for example), I need to implement pagination. How do I modify this query to limit the comments records first, before the GROUP BY?
In short:
1. Get the first 5 comments with a LIMIT of 5
2. LEFT JOIN the rates table
3. Get the SUM via GROUP BY
For example, show the sums for the first 4 comments.
Thanks
You can use subquery to "select c.id from comments limit N" in the FROM clause.
select c.id,
sum(r.score) as score_sum,
SUM(
CASE WHEN r.is_active = '0' THEN r.negative ELSE 0 END
) AS negative_sum
from ( select c.id from comments c limit 2) c
LEFT JOIN rates AS r ON (r.comment_id = c.id)
GROUP BY c.id;
You may apply order by in the subquery to determine order in which you want to select the comments (Top N).
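As a rough sketch of the limit-before-join idea, the following uses Python's sqlite3 module as a stand-in for MySQL. The helper name page_sums, the OFFSET parameter and the sample rows are all hypothetical additions for illustration; only the comments and rates table names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY);
    CREATE TABLE rates (comment_id INTEGER, score INTEGER);
    INSERT INTO comments VALUES (1), (2), (3);
    INSERT INTO rates VALUES (1, 5), (1, 3), (2, 4), (3, 7);
""")

def page_sums(limit, offset):
    """Pick one page of comment ids first, then join and aggregate only those."""
    return conn.execute("""
        SELECT c.id, SUM(r.score) AS score_sum
        FROM (SELECT id FROM comments ORDER BY id LIMIT ? OFFSET ?) AS c
        LEFT JOIN rates AS r ON r.comment_id = c.id
        GROUP BY c.id
        ORDER BY c.id
    """, (limit, offset)).fetchall()

print(page_sums(2, 0))  # [(1, 8), (2, 4)]
print(page_sums(2, 2))  # [(3, 7)]
```

The ORDER BY inside the derived table is what makes the pages stable; without it, which rows a LIMIT returns is not guaranteed.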
Try the following:
SELECT
c.id,
SUM(
r.score
) AS score_sum,
SUM(
CASE WHEN r.is_active = '0' THEN r.negative ELSE 0 END
) AS negative_sum
FROM comments AS c
LEFT JOIN rates AS r ON (r.comment_id = c.id)
WHERE r.comment_id = c.id
GROUP BY c.id
ORDER BY c.id ASC
LIMIT 5
The rationale behind the above query is that id is the primary key (hence indexed) in your comments table. Also, your GROUP BY and ORDER BY are on the same column, id, so MySQL can use the index on id to read only the first 5 rows (due to the LIMIT) and then JOIN with the other table and aggregate.
Give it a try. More details here: https://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html
You can confirm this by running EXPLAIN on the query.

MySQL query broke after adding join

I have a query that loads data for a loop in my CMS to display posts. Everything was working fine in the vote columns until I left-joined the comment column. Comments display OK, but the totalvote, upvote and downvote counts are wildly off. Let me know if you need to see the tables.
SELECT
count(DISTINCT comment.comment ) AS Comment,
idea.dateofcreation AS timestamp,
idea.userId AS userId,
idea.id AS ID,
idea.text AS Idea,
page.permalink AS Permalink,
user.name AS Username,
COUNT(CASE WHEN votelog.vote !="" THEN 1 END) AS 'totalvotes',
COUNT(CASE WHEN votelog.vote = '1' THEN 1 END) AS 'upvote',
COUNT(CASE WHEN votelog.vote = '-1' THEN 1 END) AS 'downvote'
FROM idea
LEFT JOIN votelog ON idea.id = votelog.ideaid
LEFT JOIN user ON idea.userId = user.id
LEFT JOIN page ON idea.id = page.ideaid
LEFT join comment ON comment.ideaid = idea.id
GROUP BY idea.id
ORDER BY totalvotes DESC
It seems you joined the comment table only to count how many comments there are.
But by doing so, you now produce multiple rows per idea, one per comment, and that throws all the other counts off.
I would suggest using a scalar subquery to count the comments, and removing the join. Something like:
SELECT
(select count(DISTINCT comment) from comment c where c.ideaid = idea.id) AS Comment,
idea.dateofcreation AS timestamp,
idea.userId AS userId,
idea.id AS ID,
idea.text AS Idea,
page.permalink AS Permalink,
user.name AS Username,
COUNT(CASE WHEN votelog.vote !="" THEN 1 END) AS 'totalvotes',
COUNT(CASE WHEN votelog.vote = '1' THEN 1 END) AS 'upvote',
COUNT(CASE WHEN votelog.vote = '-1' THEN 1 END) AS 'downvote'
FROM idea
LEFT JOIN votelog ON idea.id = votelog.ideaid
LEFT JOIN user ON idea.userId = user.id
LEFT JOIN page ON idea.id = page.ideaid
GROUP BY idea.id
ORDER BY totalvotes DESC
You have more than one comment. So, aggregate before joining:
SELECT c.num_comments,
i.dateofcreation AS timestamp, i.userId AS userId, i.id AS ID, i.text AS Idea,
p.permalink AS Permalink, u.name AS Username,
SUM( vl.vote <> '' ) AS totalvotes,
SUM( vl.vote = 1 ) AS upvote,
SUM( vl.vote = -1 ) AS downvote
FROM idea i LEFT JOIN
votelog vl
ON i.id = vl.ideaid LEFT JOIN
user u
ON i.userId = u.id LEFT JOIN
page p
ON i.id = p.ideaid LEFT JOIN
(SELECT c.ideaid, COUNT(*) as num_comments
FROM comment c
GROUP BY c.ideaid
) c
ON c.ideaid = i.id
GROUP BY c.num_comments,
i.dateofcreation, i.userId, i.id, i.text,
p.permalink, u.name
ORDER BY totalvotes DESC;
Notes:
Table aliases make a query easier to write and to read.
It is a good practice to include all non-aggregated columns in the GROUP BY (and more recent versions of MySQL tend to enforce this).
I assume the votes are numbers. Compare them to numbers, not to strings.
MySQL has a nice shorthand for counting the number of times a boolean expression is true.

mysql query to count no of rows in joining three tables and count rows of one table

I have a problem with a MySQL query.
I have three tables: voucher, client, and voucher_client.
The voucher_client table has a voucher id column that relates to the voucher table, and I want to count the related rows from the client table.
For example, if the voucher table has id 2 and that voucher has 2 clients, the query should check the age_group column of the client table, where age_group is Adult, Child or Infant.
Here are some pictures of the tables for more detail.
Please help me out.
Voucher table
Client table
Voucher client table
I am trying it like this:
SELECT `v`.*, `a`.`name` AS `agent_name`,
COUNT(CASE WHEN c.age_group = 'Adult' THEN c.id END) AS t_adult,
COUNT(CASE WHEN c.age_group = 'Child' THEN c.id END) AS t_child,
COUNT(CASE WHEN c.age_group = 'Infant' THEN c.id END) AS t_infant,
COUNT(c.id) AS total
FROM `voucher` `v`
JOIN `voucher_client` `vc` ON `vc`.`voucher_id`=`v`.`id`
JOIN `client` `c` ON `vc`.`client_id`=`c`.`id`
JOIN `tbl_users` `a` ON `a`.`userId`=`v`.`agent_id`
LEFT JOIN `voucher_hotel` `vh` ON `vh`.`voucher_id`=`v`.`id`
WHERE `v`.`isDeleted` =0
GROUP BY `v`.`id`
ORDER BY `v`.`id` DESC
The expected output looks like this:
voucher_id t_adult t_child t_infant
1 2 0 0
2 1 0 0
If you only want to show v.id in the result, then replace v.* with v.id in the query.
(By the way, most databases wouldn't even allow a * when there's a GROUP BY. MySQL deviates from the ANSI SQL standard in that respect.)
And if you need to join to an extra table with a 1-N relationship? Then you can count the distinct values, so that the totals only reflect the unique client ids.
SELECT
v.id AS voucher_id,
COUNT(DISTINCT CASE WHEN c.age_group = 'Adult' THEN c.id END) AS t_adult,
COUNT(DISTINCT CASE WHEN c.age_group = 'Child' THEN c.id END) AS t_child,
COUNT(DISTINCT CASE WHEN c.age_group = 'Infant' THEN c.id END) AS t_infant
-- , COUNT(*) as total
-- , COUNT(c.id) as total_clientid -- count on value doesn't count NULL's
-- , COUNT(DISTINCT c.id) as total_unique_clientid
FROM voucher v
JOIN voucher_client vc ON vc.voucher_id = v.id
JOIN client c ON c.id = vc.client_id
-- LEFT JOIN voucher_hotel vh ON vh.voucher_id = v.id
WHERE v.isDeleted = 0
-- AND c.age_group = 'Adult' -- uncomment this to only count the adults
GROUP BY v.id
ORDER BY v.id
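The effect of COUNT(DISTINCT ...) when an extra 1-N table is joined can be sketched as follows, with Python's sqlite3 module standing in for MySQL. The hotel column of voucher_hotel and all sample rows are assumptions for illustration: one voucher with two adults, one child and two hotels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE voucher (id INTEGER PRIMARY KEY);
    CREATE TABLE client (id INTEGER PRIMARY KEY, age_group TEXT);
    CREATE TABLE voucher_client (voucher_id INTEGER, client_id INTEGER);
    CREATE TABLE voucher_hotel (voucher_id INTEGER, hotel TEXT);  -- hypothetical column
    INSERT INTO voucher VALUES (1);
    INSERT INTO client VALUES (1, 'Adult'), (2, 'Adult'), (3, 'Child');
    INSERT INTO voucher_client VALUES (1, 1), (1, 2), (1, 3);
    INSERT INTO voucher_hotel VALUES (1, 'A'), (1, 'B');
""")

# Each client row is repeated once per hotel, so a plain COUNT doubles,
# while COUNT(DISTINCT ...) still counts every client id once.
row = conn.execute("""
    SELECT COUNT(CASE WHEN c.age_group = 'Adult' THEN c.id END)          AS adult_plain,
           COUNT(DISTINCT CASE WHEN c.age_group = 'Adult' THEN c.id END) AS adult_distinct,
           COUNT(CASE WHEN c.age_group = 'Child' THEN c.id END)          AS child_plain,
           COUNT(DISTINCT CASE WHEN c.age_group = 'Child' THEN c.id END) AS child_distinct
    FROM voucher v
    JOIN voucher_client vc ON vc.voucher_id = v.id
    JOIN client c ON c.id = vc.client_id
    LEFT JOIN voucher_hotel vh ON vh.voucher_id = v.id
    GROUP BY v.id
""").fetchone()

print(row)  # (4, 2, 2, 1)
```

This works for counts because ids are unique; it does not carry over to SUM(), where two different rows may legitimately hold the same value.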

Mysql SUM Float give wrong value [duplicate]

I'm looking for help using sum() in my SQL query:
SELECT links.id,
count(DISTINCT stats.id) as clicks,
count(DISTINCT conversions.id) as conversions,
sum(conversions.value) as conversion_value
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
GROUP BY links.id
ORDER BY links.created desc;
I use DISTINCT because I'm doing "group by" and this ensures the same row is not counted more than once.
The problem is that SUM(conversions.value) counts the "value" for each row more than once (due to the group by)
I basically want to do SUM(conversions.value) for each DISTINCT conversions.id.
Is that possible?
I may be wrong but from what I understand
conversions.id is the primary key of your table conversions
stats.id is the primary key of your table stats
Thus each conversions.id affects at most one links.id.
Your request is a bit like taking the Cartesian product of two sets:
[clicks]
SELECT *
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
[conversions]
SELECT *
FROM links
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
and for each link, you get sizeof([clicks]) x sizeof([conversions]) lines
As you noted the number of unique conversions in your request can be obtained via a
count(distinct conversions.id) = sizeof([conversions])
this distinct manages to remove all the [clicks] lines in the cartesian product
but clearly
sum(conversions.value) = sum([conversions].value) * sizeof([clicks])
In your case, since
count(*) = sizeof([clicks]) x sizeof([conversions])
count(*) = sizeof([clicks]) x count(distinct conversions.id)
you have
sizeof([clicks]) = count(*)/count(distinct conversions.id)
so I would test your request with
SELECT links.id,
count(DISTINCT stats.id) as clicks,
count(DISTINCT conversions.id) as conversions,
sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
GROUP BY links.id
ORDER BY links.created desc;
Keep me posted !
Jerome
Jerome's solution is actually wrong and can produce incorrect results:
sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value
let's assume the following table
conversions
id value
1 5
1 5
1 5
2 2
3 1
the correct sum of value for distinct ids would be 8.
Jerome's formula produces:
sum(conversions.value) = 18
count(distinct conversions.id) = 3
count(*) = 5
18*3/5 = 10.8 != 8
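The arithmetic of this counterexample can be replayed in a few lines of plain Python. The rows are the (id, value) pairs from the table above; the variable names are only illustrative.

```python
# (id, value) rows exactly as listed in the counterexample table.
rows = [(1, 5), (1, 5), (1, 5), (2, 2), (3, 1)]

total = sum(value for _, value in rows)           # sum(conversions.value)
distinct_ids = len({row_id for row_id, _ in rows})  # count(distinct conversions.id)
row_count = len(rows)                             # count(*)

# The formula under test: sum * count(distinct id) / count(*).
formula = total * distinct_ids / row_count

# Intended result: each id contributes its value exactly once.
correct = sum(dict(rows).values())

print(total, distinct_ids, row_count)  # 18 3 5
print(formula, correct)
```

The formula only balances out when every id is repeated the same number of times; here id 1 appears three times and the others once, so the correction factor is wrong.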
For an explanation of why you were seeing incorrect numbers, read this.
I think that Jerome has a handle on what is causing your error. Bryson's query would work, though having that subquery in the SELECT could be inefficient.
Use the following query:
SELECT links.id
, (
SELECT COUNT(*)
FROM stats
WHERE links.id = stats.parent_id
) AS clicks
, conversions.conversions
, conversions.conversion_value
FROM links
LEFT JOIN (
SELECT link_id
, COUNT(id) AS conversions
, SUM(conversions.value) AS conversion_value
FROM conversions
GROUP BY link_id
) AS conversions ON links.id = conversions.link_id
ORDER BY links.created DESC
I use a subquery to do this. It eliminates the problems with grouping.
So the query would be something like:
SELECT COUNT(DISTINCT conversions.id)
...
(SELECT SUM(conversions.value) FROM ....) AS Vals
How about something like this:
select t.id,
max(t.clicks) clicks,
count(t.conversions) conversions,
sum(t.conversion_value) conversion_value
from (SELECT l.id id, l.created created,
count(distinct s.id) clicks,
c.id conversions,
max(c.value) conversion_value
FROM links l
LEFT JOIN stats s ON l.id = s.parent_id
LEFT JOIN conversions c ON l.id = c.link_id
GROUP BY l.id, l.created, c.id) t
group by t.id, t.created
order by t.created
This will do the trick: just divide the sum by the number of duplicated rows for each conversion id.
SELECT a.id,
a.clicks,
SUM(a.conversion_value/a.conversions) AS conversion_value,
a.conversions
FROM (SELECT links.id,
COUNT(DISTINCT stats.id) AS clicks,
COUNT(conversions.id) AS conversions,
SUM(conversions.value) AS conversion_value
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
GROUP BY conversions.id,links.id
ORDER BY links.created DESC) AS a
GROUP BY a.id
Select x.id,
sum(x.value) as conversion_value,
max(x.clicks) as clicks,
sum(x.conversions) as conversions
FROM
(SELECT links.id, links.created,
count(DISTINCT stats.id) as clicks,
count(DISTINCT conversions.id) as conversions,
conversions.value
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
GROUP BY links.id, links.created, conversions.id, conversions.value) x
GROUP BY x.id, x.created
ORDER BY x.created desc;
I believe this will give you the answer that you are looking for.