SQL get aggregate breakdown by joined table - mysql

I have the following entities and relationships in a MySQL database:
Each Post has N Review
Each Review has 1 Comment and a state (is_accepted)
Each Comment has 1 User
Desired result:
I'm trying to get an aggregate report of reviews on a specific post, grouped by user:
+---------+--------------+-----------------------+---------------------+
| user_id | review_count | review_accepted_count | review_denied_count |
+---------+--------------+-----------------------+---------------------+
| 1 | 3 | 2 | 1 |
| 2 | 5 | 1 | 4 |
| 3 | 1 | 1 | 0 |
+---------+--------------+-----------------------+---------------------+
What I've tried:
SELECT
C.user_id,
COUNT(C.user_id) AS review_count,
(SELECT COUNT(*) FROM reviews WHERE `post_id` = R.post_id AND `user_id` = C.user_id AND `is_accepted` = 1) review_accepted_count,
(SELECT COUNT(*) FROM reviews WHERE `post_id` = R.post_id AND `user_id` = C.user_id AND `is_accepted` = 0) review_denied_count
FROM reviews R
INNER JOIN comments C ON C.id = R.comment_id
WHERE post_id = 1234
GROUP BY C.user_id
Actual result:
The returned review_accepted_count and review_denied_count columns are the total across all reviews, not grouped per user

Try this:
SELECT
C.user_id,
COUNT(C.user_id) AS review_count,
SUM(CASE WHEN `is_accepted` = 1 THEN 1 ELSE 0 END) AS review_accepted_count,
SUM(CASE WHEN `is_accepted` = 0 THEN 1 ELSE 0 END) AS review_denied_count
FROM reviews R
INNER JOIN comments C ON C.id = R.comment_id
WHERE post_id = 1234
GROUP BY C.user_id
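The subqueries for review_accepted_count and review_denied_count are not correctly correlated with the outer row; to keep them, you would have to match on the review primary key in their WHERE clause. You don't need subqueries to get this result, though: conditional aggregation computes all three counts in one pass over the joined rows, which is also faster.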
If you only have 1s and 0s in column is_accepted you can do:
SUM(`is_accepted`) AS review_accepted_count
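As a quick check of the conditional-aggregation idea, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here, and the table layout and sample rows are assumptions made for illustration, not the asker's real schema.

```python
import sqlite3

# Hypothetical minimal schema: reviews point at comments, comments carry the user.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE reviews  (id INTEGER PRIMARY KEY, post_id INTEGER,
                       comment_id INTEGER, is_accepted INTEGER);
INSERT INTO comments VALUES (1, 1), (2, 1), (3, 2), (4, 2);
INSERT INTO reviews  VALUES (1, 1234, 1, 1), (2, 1234, 2, 0),
                            (3, 1234, 3, 1), (4, 9999, 4, 0);
""")

# One pass over the joined rows; each SUM(CASE ...) counts only matching rows.
rows = conn.execute("""
SELECT C.user_id,
       COUNT(*) AS review_count,
       SUM(CASE WHEN R.is_accepted = 1 THEN 1 ELSE 0 END) AS review_accepted_count,
       SUM(CASE WHEN R.is_accepted = 0 THEN 1 ELSE 0 END) AS review_denied_count
FROM reviews R
INNER JOIN comments C ON C.id = R.comment_id
WHERE R.post_id = 1234
GROUP BY C.user_id
ORDER BY C.user_id
""").fetchall()

for row in rows:
    print(row)
```

The review on post 9999 is excluded by the WHERE clause before aggregation, so it does not leak into user 2's counts.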

Related

SQL selecting most recent row inside join

I have 2 tables companies and invoices
I want to select all companies with their most recent invoice price.
I don't seem to get it working.
This is what I tried:
SELECT *
FROM companies H INNER JOIN
invoices V
ON H.company_id = V.BC_ID
WHERE V.ISCOMMISSIE = 0 AND
V.DATE = (SELECT MAX(v2.DATE) FROM invoices v2 WHERE v2.BC_ID = V.BC_ID AND v2.ISCOMMISSIE = 0);
But the query takes a very long time to run, and I don't know why.
The structure looks like this:
companies
company_id | company_name |
1 | company 1 |
2 | company 2 |
invoices
invoice_id | BC_ID | DATE | ISCOMMISSIE | price |
1 | 2 | 2020-01-01 | 0 | 340,40 |
2 | 1 | 2020-01-11 | 0 | 240,40 |
3 | 1 | 2020-01-08 | 0 | 250,30 |
4 | 2 | 2020-01-18 | 0 | 150,30 |
5 | 2 | 2020-01-19 | 1 | 150,30 |
The BC_ID is the same as the company_id and ISCOMMISSIE should be 0.
I want to select the most recent date.
Does someone have an idea on how to do this and also make the query as fast as possible?
http://sqlfiddle.com/#!9/2fc3a/1
Try:
SELECT H.*, V.*
FROM companies H
INNER JOIN invoices V ON H.company_id = V.BC_ID
INNER JOIN ( SELECT v2.BC_ID, MAX(v2.DATE) DATE
FROM invoices v2
WHERE v2.ISCOMMISSIE = 0
GROUP BY v2.BC_ID ) v3 ON V.BC_ID = v3.BC_ID
AND V.DATE = v3.DATE
AND V.ISCOMMISSIE = 0
And the index invoices (ISCOMMISSIE, BC_ID, DATE) may help...
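The join against a grouped MAX can be tried on the sample rows from the question. The following is a sketch only: it uses Python's sqlite3 module as a stand-in for MySQL, stores prices with a decimal point, and the index name idx_invoices is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (company_id INTEGER PRIMARY KEY, company_name TEXT);
CREATE TABLE invoices  (invoice_id INTEGER PRIMARY KEY, BC_ID INTEGER,
                        DATE TEXT, ISCOMMISSIE INTEGER, price REAL);
INSERT INTO companies VALUES (1, 'company 1'), (2, 'company 2');
INSERT INTO invoices VALUES
  (1, 2, '2020-01-01', 0, 340.40),
  (2, 1, '2020-01-11', 0, 240.40),
  (3, 1, '2020-01-08', 0, 250.30),
  (4, 2, '2020-01-18', 0, 150.30),
  (5, 2, '2020-01-19', 1, 150.30);
-- Index with the column order suggested in the answer; the name is hypothetical.
CREATE INDEX idx_invoices ON invoices (ISCOMMISSIE, BC_ID, DATE);
""")

# v3 holds one row per company: its latest non-commission invoice date.
rows = conn.execute("""
SELECT H.company_name, V.invoice_id, V.DATE, V.price
FROM companies H
INNER JOIN invoices V ON H.company_id = V.BC_ID
INNER JOIN (SELECT v2.BC_ID, MAX(v2.DATE) AS max_date
            FROM invoices v2
            WHERE v2.ISCOMMISSIE = 0
            GROUP BY v2.BC_ID) v3
        ON V.BC_ID = v3.BC_ID AND V.DATE = v3.max_date
WHERE V.ISCOMMISSIE = 0
ORDER BY H.company_id
""").fetchall()

for row in rows:
    print(row)
```

Invoice 5 is newer than invoice 4 for company 2, but it is a commission invoice, so invoice 4 is the one returned.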
Your query is fine:
SELECT *
FROM companies H INNER JOIN
invoices V
ON H.company_id = V.BC_ID
WHERE V.ISCOMMISSIE = 0 AND
V.DATE = (SELECT MAX(v2.DATE)
FROM invoices v2
WHERE v2.BC_ID = V.BC_ID AND
v2.ISCOMMISSIE = 0
);
For performance, you want an index on invoices(BC_ID, ISCOMMISSIE, DATE).
A good alternative is to use window functions:
SELECT *
FROM companies H INNER JOIN
(SELECT V.*,
ROW_NUMBER() OVER (PARTITION BY BC_ID ORDER BY DATE DESC) as seqnum
FROM invoices V
WHERE V.ISCOMMISSIE = 0
) V
ON H.company_id = V.BC_ID
WHERE seqnum = 1;
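Depending on the columns you need, you might not need to join with the companies table. It is also often unnecessary to test iscommissie = 0 twice: testing it once in the subquery before joining is enough, provided no commission invoice shares the date of a company's latest non-commission invoice (otherwise keep the outer test as well).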
See the query below :
SELECT i.*
FROM invoices i
JOIN (
SELECT i.bc_id, MAX(date) AS max_date
FROM invoices i
WHERE iscommissie = 0
GROUP BY i.bc_id
) i_temp ON i.bc_id = i_temp.bc_id AND i.date = i_temp.max_date
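Whether a single test on iscommissie is enough depends on the data. The sketch below (Python with sqlite3 standing in for MySQL) adds a hypothetical commission invoice, invoice 6, dated the same day as the latest non-commission invoice, and compares the query with and without an outer filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, bc_id INTEGER,
                       date TEXT, iscommissie INTEGER);
INSERT INTO invoices VALUES
  (1, 2, '2020-01-01', 0),
  (4, 2, '2020-01-18', 0),
  (6, 2, '2020-01-18', 1);  -- hypothetical: commission invoice on the same day
""")

base = """
SELECT i.invoice_id
FROM invoices i
JOIN (SELECT bc_id, MAX(date) AS max_date
      FROM invoices
      WHERE iscommissie = 0
      GROUP BY bc_id) t
  ON i.bc_id = t.bc_id AND i.date = t.max_date
"""

# Filter only inside the derived table: the same-day commission invoice slips through.
inner_only = [r[0] for r in conn.execute(base + " ORDER BY i.invoice_id")]

# Filter in the derived table and on the outer rows: only invoice 4 remains.
both = [r[0] for r in conn.execute(base + " WHERE i.iscommissie = 0 ORDER BY i.invoice_id")]

print(inner_only, both)
```

With the question's own sample rows no such tie exists, so both variants would return the same invoices there.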
Another way to get the expected output:
select * from companies A join (
select * from invoices where (BC_ID, DATE) in (
select BC_ID as BC_ID, MAX(DATE) DATE from invoices where ISCOMMISSIE = 0 group by BC_ID
)) B on A.company_id = B.BC_ID;

Strange duplicate behavior from GROUP_CONCAT of two LEFT JOINs of GROUP_BYs

Here are all my table structures and the query (please focus on the last query, appended below). As you can see in the fiddle, this is the current output:
+---------+-----------+-------+------------+--------------+
| user_id | user_name | score | reputation | top_two_tags |
+---------+-----------+-------+------------+--------------+
| 1 | Jack | 0 | 18 | css,mysql |
| 4 | James | 1 | 5 | html |
| 2 | Peter | 0 | 0 | null |
| 3 | Ali | 0 | 0 | null |
+---------+-----------+-------+------------+--------------+
It's correct and everything is fine.
Now I have one more entity named "category". Each post can have only one category. I also want to get the top two categories for each user. Here is my new query. As you can see in the result, some duplicates appear:
+---------+-----------+-------+------------+--------------+------------------------+
| user_id | user_name | score | reputation | top_two_tags | top_two_categories |
+---------+-----------+-------+------------+--------------+------------------------+
| 1 | Jack | 0 | 18 | css,css | technology,technology |
| 4 | James | 1 | 5 | html | political |
| 2 | Peter | 0 | 0 | null | null |
| 3 | Ali | 0 | 0 | null | null |
+---------+-----------+-------+------------+--------------+------------------------+
See? css,css and technology,technology. Why are these duplicated? I've just added one more LEFT JOIN for categories, exactly like the one for tags. But it doesn't work as expected, and it even affects the tags.
Anyway, this is the expected result:
+---------+-----------+-------+------------+--------------+------------------------+
| user_id | user_name | score | reputation | top_two_tags | category |
+---------+-----------+-------+------------+--------------+------------------------+
| 1 | Jack | 0 | 18 | css,mysql | technology,social |
| 4 | James | 1 | 5 | html | political |
| 2 | Peter | 0 | 0 | null | null |
| 3 | Ali | 0 | 0 | null | null |
+---------+-----------+-------+------------+--------------+------------------------+
Does anybody know how I can achieve that?
CREATE TABLE users(id integer PRIMARY KEY, user_name varchar(5));
CREATE TABLE tags(id integer NOT NULL PRIMARY KEY, tag varchar(5));
CREATE TABLE reputations(
id integer PRIMARY KEY,
post_id integer /* REFERENCES posts(id) */,
user_id integer REFERENCES users(id),
score integer,
reputation integer,
date_time integer);
CREATE TABLE post_tag(
post_id integer /* REFERENCES posts(id) */,
tag_id integer REFERENCES tags(id),
PRIMARY KEY (post_id, tag_id));
CREATE TABLE categories(id INTEGER NOT NULL PRIMARY KEY, category varchar(10) NOT NULL);
CREATE TABLE post_category(
post_id INTEGER NOT NULL /* REFERENCES posts(id) */,
category_id INTEGER NOT NULL REFERENCES categories(id),
PRIMARY KEY(post_id, category_id)) ;
SELECT
q1.user_id, q1.user_name, q1.score, q1.reputation,
substring_index(group_concat(q2.tag ORDER BY q2.tag_reputation DESC SEPARATOR ','), ',', 2) AS top_two_tags,
substring_index(group_concat(q3.category ORDER BY q3.category_reputation DESC SEPARATOR ','), ',', 2) AS category
FROM
(SELECT
u.id AS user_Id,
u.user_name,
coalesce(sum(r.score), 0) as score,
coalesce(sum(r.reputation), 0) as reputation
FROM
users u
LEFT JOIN reputations r
ON r.user_id = u.id
AND r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
GROUP BY
u.id, u.user_name
) AS q1
LEFT JOIN
(
SELECT
r.user_id AS user_id, t.tag, sum(r.reputation) AS tag_reputation
FROM
reputations r
JOIN post_tag pt ON pt.post_id = r.post_id
JOIN tags t ON t.id = pt.tag_id
WHERE
r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
GROUP BY
user_id, t.tag
) AS q2
ON q2.user_id = q1.user_id
LEFT JOIN
(
SELECT
r.user_id AS user_id, c.category, sum(r.reputation) AS category_reputation
FROM
reputations r
JOIN post_category ct ON ct.post_id = r.post_id
JOIN categories c ON c.id = ct.category_id
WHERE
r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
GROUP BY
user_id, c.category
) AS q3
ON q3.user_id = q1.user_id
GROUP BY
q1.user_id, q1.user_name, q1.score, q1.reputation
ORDER BY
q1.reputation DESC, q1.score DESC ;
Your second query is of the form:
q1 -- PK user_id
LEFT JOIN (...
GROUP BY user_id, t.tag
) AS q2
ON q2.user_id = q1.user_id
LEFT JOIN (...
GROUP BY user_id, c.category
) AS q3
ON q3.user_id = q1.user_id
GROUP BY -- group_concats
The inner GROUP BYs result in (user_id, t.tag) & (user_id, c.category) being keys/UNIQUEs. Other than that I won't address those GROUP BYs.
TL;DR When you join (q1 JOIN q2) to q3 it is not on a key/UNIQUE of one of them so for each user_id you get a row for every possible combination of tag & category. So the final GROUP BY inputs duplicates per (user_id, tag) & per (user_id, category) and inappropriately GROUP_CONCATs duplicate tags & categories per user_id. Correct would be (q1 JOIN q2 GROUP BY) JOIN (q1 JOIN q3 GROUP BY) in which all joins are on common key/UNIQUE (user_id) & there is no spurious aggregation. Although sometimes you can undo such spurious aggregation.
A correct symmetrical INNER JOIN approach: LEFT JOIN q1 & q2--1:many--then GROUP BY & GROUP_CONCAT (which is what your first query did); then separately similarly LEFT JOIN q1 & q3--1:many--then GROUP BY & GROUP_CONCAT; then INNER JOIN the two results ON user_id--1:1.
A correct symmetrical scalar subquery approach: SELECT the GROUP_CONCATs from q1 as scalar subqueries each with a GROUP BY.
A correct cumulative LEFT JOIN approach: LEFT JOIN q1 & q2--1:many--then GROUP BY & GROUP_CONCAT; then LEFT JOIN that & q3--1:many--then GROUP BY & GROUP_CONCAT.
A correct approach like your 2nd query: You first LEFT JOIN q1 & q2--1:many. Then you LEFT JOIN that & q3--many:1:many. It gives a row for every possible combination of a tag & a category that appear with a user_id. Then after you GROUP BY you GROUP_CONCAT--over duplicate (user_id, tag) pairs and duplicate (user_id, category) pairs. That is why you have duplicate list elements. But adding DISTINCT to GROUP_CONCAT gives a correct result. (Per wchiquito's comment.)
Which you prefer is, as usual, an engineering tradeoff to be informed by query plans and timings on actual data, usage and statistics (including the expected amount of duplication). One issue is whether the extra rows of the many:1:many JOIN approach offset its saving of a GROUP BY.
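The row multiplication described above can be reproduced in a few lines. This is a sketch under simplifying assumptions: Python's sqlite3 stands in for MySQL, the hypothetical tables user_tags and user_categories play the role of the already-grouped q2 and q3, and the top-two ordering (ORDER BY inside GROUP_CONCAT plus substring_index, both MySQL features) is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE user_tags (user_id INTEGER, tag TEXT);            -- stands in for q2
CREATE TABLE user_categories (user_id INTEGER, category TEXT); -- stands in for q3
INSERT INTO users VALUES (1, 'Jack');
INSERT INTO user_tags VALUES (1, 'css'), (1, 'mysql');
INSERT INTO user_categories VALUES (1, 'technology'), (1, 'social');
""")

# Neither join is on a key of the right-hand side, so rows multiply: 2 tags x 2 categories.
joined = """
FROM users u
LEFT JOIN user_tags q2 ON q2.user_id = u.id
LEFT JOIN user_categories q3 ON q3.user_id = u.id
"""

row_count = conn.execute("SELECT COUNT(*) " + joined).fetchone()[0]

# Plain group_concat sees every tag and category twice.
naive = conn.execute(
    "SELECT group_concat(q2.tag), group_concat(q3.category) " + joined + " GROUP BY u.id"
).fetchone()

# DISTINCT collapses the spurious repeats again.
distinct = conn.execute(
    "SELECT group_concat(DISTINCT q2.tag), group_concat(DISTINCT q3.category) "
    + joined + " GROUP BY u.id"
).fetchone()

print(row_count, naive, distinct)
```

SQLite does not guarantee the order of concatenated elements, so only the multiset of elements is meaningful in this sketch.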
-- cumulative LEFT JOIN approach
SELECT
q1.user_id, q1.user_name, q1.score, q1.reputation,
top_two_tags,
substring_index(group_concat(q3.category ORDER BY q3.category_reputation DESC SEPARATOR ','), ',', 2) AS category
FROM
-- your 1st query (less ORDER BY) AS q1
(SELECT
q1.user_id, q1.user_name, q1.score, q1.reputation,
substring_index(group_concat(q2.tag ORDER BY q2.tag_reputation DESC SEPARATOR ','), ',', 2) AS top_two_tags
FROM
(SELECT
u.id AS user_Id,
u.user_name,
coalesce(sum(r.score), 0) as score,
coalesce(sum(r.reputation), 0) as reputation
FROM
users u
LEFT JOIN reputations r
ON r.user_id = u.id
AND r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
GROUP BY
u.id, u.user_name
) AS q1
LEFT JOIN
(
SELECT
r.user_id AS user_id, t.tag, sum(r.reputation) AS tag_reputation
FROM
reputations r
JOIN post_tag pt ON pt.post_id = r.post_id
JOIN tags t ON t.id = pt.tag_id
WHERE
r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
GROUP BY
user_id, t.tag
) AS q2
ON q2.user_id = q1.user_id
GROUP BY
q1.user_id, q1.user_name, q1.score, q1.reputation
) AS q1
-- finish like your 2nd query
LEFT JOIN
(
SELECT
r.user_id AS user_id, c.category, sum(r.reputation) AS category_reputation
FROM
reputations r
JOIN post_category ct ON ct.post_id = r.post_id
JOIN categories c ON c.id = ct.category_id
WHERE
r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
GROUP BY
user_id, c.category
) AS q3
ON q3.user_id = q1.user_id
GROUP BY
q1.user_id, q1.user_name, q1.score, q1.reputation
ORDER BY
q1.reputation DESC, q1.score DESC ;

Strange order of results when adding joins

I'm trying to build a commenting system on my website but having issues with ordering the comments correctly. This is a screenshot of what I had before it went wrong:
And this is the query before it went wrong:
SELECT
com.comment_id,
com.parent_id,
com.is_reply,
com.user_id,
com.comment,
com.posted,
usr.username
FROM
blog_comments AS com
LEFT JOIN
users AS usr ON com.user_id = usr.user_id
WHERE
com.article_id = :article_id AND com.moderated = 1 AND com.status = 1
ORDER BY
com.parent_id DESC;
I now want to include each comment's votes from my blog_comment_votes table, using a LEFT OUTER JOIN, and came up with this query, which works, but screws with the order of results:
SELECT
com.comment_id,
com.parent_id,
com.is_reply,
com.user_id,
com.comment,
com.posted,
usr.username,
IFNULL(c.cnt,0) votes
FROM
blog_comments AS com
LEFT JOIN
users AS usr ON com.user_id = usr.user_id
LEFT OUTER JOIN (
SELECT comment_id, COUNT(vote_id) as cnt
FROM blog_comment_votes
GROUP BY comment_id) c
ON com.comment_id = c.comment_id
WHERE
com.article_id = :article_id AND com.moderated = 1 AND com.status = 1
ORDER BY
com.parent_id DESC;
I now get this order, which is bizarre:
I tried adding a GROUP BY clause on com.comment_id, but that failed too. I can't understand how adding a simple join can alter the order of the results! Can anybody help me get back on the right path?
EXAMPLE TABLE DATA AND EXPECTED RESULTS
These are my relevant tables with example data:
[users]
user_id | username
--------|-----------------
1 | PaparazzoKid
[blog_comments]
comment_id | parent_id | is_reply | article_id | user_id | comment
-----------|-----------|----------|------------|---------|---------------------------
1 | 1 | | 1 | 1 | First comment
2 | 2 | 1 | 1 | 20 | Reply to first comment
3 | 3 | | 1 | 391 | Second comment
[blog_comment_votes]
vote_id | comment_id | article_id | user_id
--------|------------|------------|--------------
1 | 2 | 1 | 233
2 | 2 | 1 | 122
So the order should be
First comment
Reply to first comment +2
Second Comment
It's difficult to say without looking at your query results, but my guess is that it's because you are only ordering by parent id and not saying how to order when two records have the same parent id. Try changing your query to look like this:
SELECT
com.comment_id,
com.parent_id,
com.is_reply,
com.user_id,
com.comment,
com.posted,
usr.username,
COUNT(c.vote_id) votes
FROM
blog_comments AS com
LEFT JOIN
users AS usr ON com.user_id = usr.user_id
LEFT JOIN
blog_comment_votes c ON com.comment_id = c.comment_id
WHERE
com.article_id = :article_id AND com.moderated = 1 AND com.status = 1
GROUP BY
com.comment_id,
com.parent_id,
com.is_reply,
com.user_id,
com.comment,
com.posted,
usr.username
ORDER BY
com.parent_id DESC, com.comment_id;
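The effect of the extra sort key can be checked with a small sketch. It uses Python's sqlite3 as a stand-in for MySQL, and the rows are hypothetical: comments 1 and 2 are given the same parent_id so that parent_id alone cannot decide their relative order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blog_comments (comment_id INTEGER PRIMARY KEY, parent_id INTEGER, comment TEXT);
CREATE TABLE blog_comment_votes (vote_id INTEGER PRIMARY KEY, comment_id INTEGER);
-- Hypothetical data: comments 1 and 2 tie on parent_id.
INSERT INTO blog_comments VALUES (1, 1, 'First comment'),
                                 (2, 1, 'Reply to first comment'),
                                 (3, 3, 'Second comment');
INSERT INTO blog_comment_votes VALUES (1, 2), (2, 2);
""")

# COUNT(c.vote_id) ignores the NULLs produced by the LEFT JOIN, so unvoted comments get 0.
# The second ORDER BY key makes the order of tied parent_id rows deterministic.
rows = conn.execute("""
SELECT com.comment_id, com.comment, COUNT(c.vote_id) AS votes
FROM blog_comments AS com
LEFT JOIN blog_comment_votes AS c ON com.comment_id = c.comment_id
GROUP BY com.comment_id, com.parent_id, com.comment
ORDER BY com.parent_id DESC, com.comment_id
""").fetchall()

for row in rows:
    print(row)
```

Without the second key, the database is free to return rows with equal parent_id in any order, and that order can change when the query plan changes, for example after adding a join.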

MySQL syntax issue, multiple SELECT statements

I am working on a project that has two tables, users_fb and posts.
I spent 3 hours playing with the code and then gave up.
table: posts
+-----+---------+---------+---------+---------+-----------+
| id | by_user | by_page | votes | status | time |
+-----+---------+---------+---------+---------+-----------+
| 1 | 1 | 0 | 20 | 1 | 372041014 |
+-----+---------+---------+---------+---------+-----------+
table: users_fb
+-----+-----------+-------+---------+--------+-------+
| id | username | name | gender | fb_id | email |
+-----+-----------+-------+---------+--------+-------+
SELECT username,
(
SELECT COUNT(b.by_user)
FROM users_fb a LEFT JOIN posts b ON a.id = b.by_user
WHERE b.by_page = '0'
GROUP BY a.username
) AS totalCount ,
(
SELECT IFNULL(SUM(b.votes),0)
FROM users_fb a LEFT JOIN posts b ON a.id = b.by_user
GROUP BY users_fb.id
) AS total_votes
FROM users_fb ORDER BY total_votes DESC
The desired output
+-------------------+-------------+-------------+
| username | totalCount | total_votes |
+-------------------+-------------+-------------+
| user4 | 1 | 25 |
| user1 | 0 | 0 |
| user2 | 0 | 0 |
| user3 | 0 | 0 |
+-------------------+-------------+-------------+
UNFORTUNATELY: This is what I am getting
+-------------------+-------------+-------------+
| username | totalCount | total_votes |
+-------------------+-------------+-------------+
| user4 | 1 | 25 |
| user1 | 1 | 25 |
| user2 | 1 | 25 |
| user3 | 1 | 25 |
+-------------------+-------------+-------------+
If you need any further information, let me know. Thanks for your help.
You don't appear to have anything that ties your subqueries to the outer query, so nothing determines which posts and votes go with which user.
Something like this should do it
SELECT users_fb.username, IFNULL(Sub1.postcount, 0) AS postcount, Sub2.votecount
FROM users_fb
LEFT OUTER JOIN(
SELECT a.username, COUNT(*) AS postcount
FROM users_fb a
INNER JOIN posts b
ON a.id = b.by_user
WHERE b.by_page = '0'
GROUP BY a.username
) Sub1
ON users_fb.username = Sub1.username
LEFT OUTER JOIN(
SELECT a.id, IFNULL(SUM(b.votes),0) AS votecount
FROM users_fb a
LEFT JOIN posts b
ON a.id = b.by_user
GROUP BY a.id
) Sub2
ON users_fb.id = Sub2.id
Possibly simplified to
SELECT a.username, SUM(IF(b.by_page = '0', 1, 0)) AS postcount, IFNULL(SUM(b.votes),0) AS votecount
FROM users_fb a
LEFT JOIN posts b
ON a.id = b.by_user
GROUP BY a.username
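To see why the original query repeated the same numbers on every row, compare a scalar subquery that ignores the outer row with one that is correlated to it. This is a minimal sketch using Python's sqlite3 as a stand-in for MySQL, with two made-up users and a single post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users_fb (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, by_user INTEGER, by_page INTEGER, votes INTEGER);
INSERT INTO users_fb VALUES (1, 'user1'), (4, 'user4');
INSERT INTO posts VALUES (1, 4, 0, 25);
""")

# No reference to the outer row: the subquery yields the same value for every user.
uncorrelated = conn.execute("""
SELECT u.username,
       (SELECT IFNULL(SUM(b.votes), 0) FROM posts b) AS total_votes
FROM users_fb u
ORDER BY u.id
""").fetchall()

# Correlated on u.id: the subquery is evaluated per user.
correlated = conn.execute("""
SELECT u.username,
       (SELECT IFNULL(SUM(b.votes), 0) FROM posts b WHERE b.by_user = u.id) AS total_votes
FROM users_fb u
ORDER BY u.id
""").fetchall()

print(uncorrelated)
print(correlated)
```

A correlated subquery per column works, but a single LEFT JOIN with GROUP BY usually reads the posts table only once.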
Since you are doing no matching of the selects (i.e. there is no binding WHERE between them), MySQL has no way to tie them together row by row.
You should do something like this:
SELECT users_fb.username, totalCount.`count`, total_votesGROUPED.`sum`
FROM users_fb
LEFT JOIN (
SELECT COUNT(b.by_user) as `count`, a.username
FROM users_fb a LEFT JOIN posts b ON a.id = b.by_user AND b.by_page = '0'
GROUP BY a.username
) AS totalCount ON totalCount.username = users_fb.username
LEFT JOIN (
SELECT IFNULL(SUM(b.votes),0) as `sum`, a.id
FROM users_fb a LEFT JOIN posts b ON a.id = b.by_user
GROUP BY a.id
) AS total_votesGROUPED ON total_votesGROUPED.id = users_fb.id
ORDER BY total_votesGROUPED.`sum` DESC
If I had a bit more information, I could test it
There are quite a few problems; the biggest is that nothing joins your "main query" to your "subqueries".
Something like this should work better:
SELECT
a.username,
SUM(CASE WHEN b.by_page IS NOT NULL AND b.by_page = '0' THEN 1 ELSE 0 END) AS cnt,
SUM(IFNULL(b.votes, 0)) AS nbVotes
FROM users_fb a
LEFT JOIN posts b ON a.id = b.by_user
GROUP BY a.id, a.username
SELECT a.username,
COUNT(IF(b.by_page = '0', b.by_user, NULL)) totalCount,
SUM(IFNULL(b.votes,0)) total_votes
FROM users_fb a
LEFT JOIN posts b
ON a.id = b.by_user
GROUP BY a.id,a.username
SELECT a.username, IFNULL(b.totalCount, 0) totalCount, IFNULL(c.total_votes, 0) total_votes
FROM users_fb a
LEFT JOIN (SELECT by_user, COUNT(*) totalCount FROM posts WHERE by_page = 0 GROUP BY by_user) b
ON a.id = b.by_user
LEFT JOIN (SELECT by_user, SUM(votes) total_votes FROM posts GROUP BY by_user) c
ON a.id = c.by_user
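Several of the queries in this thread filter on b.by_page next to a LEFT JOIN, and where that filter sits changes the result. The sketch below, again using Python's sqlite3 as a stand-in for MySQL with made-up rows, shows that a WHERE condition on the right-hand table discards users without posts, while the same condition in the ON clause keeps them with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users_fb (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, by_user INTEGER, by_page INTEGER, votes INTEGER);
INSERT INTO users_fb VALUES (1, 'user1'), (4, 'user4');
INSERT INTO posts VALUES (1, 4, 0, 25);
""")

# Filter in WHERE: user1's row has b.by_page = NULL after the LEFT JOIN and is removed.
where_filter = conn.execute("""
SELECT a.username, COUNT(b.by_user)
FROM users_fb a
LEFT JOIN posts b ON a.id = b.by_user
WHERE b.by_page = 0
GROUP BY a.id, a.username
ORDER BY a.id
""").fetchall()

# Filter in ON: unmatched users survive the join and are counted as 0.
on_filter = conn.execute("""
SELECT a.username, COUNT(b.by_user)
FROM users_fb a
LEFT JOIN posts b ON a.id = b.by_user AND b.by_page = 0
GROUP BY a.id, a.username
ORDER BY a.id
""").fetchall()

print(where_filter)
print(on_filter)
```

Note that moving the filter into ON also restricts which posts feed any other aggregate over b, such as a vote total, so a conditional COUNT may be the better fit when the two aggregates need different filters.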

Trouble with join statement for selecting data across tables with MYSQL

I'm trying to display events that a user has created and events that he has signed up for. I have three tables for this.
//events
| event_id | event_title | event_details | event_timestamp | userid
1 title1 test 1234 1
2 title2 testing2 123 2
//registration_items : event_id references events.event_id
| id | event_id | task_name
1 2 task 1
//registration_signup : id references registration_items.id
| id | userid | timestamp
1 1 1234
Here's the current query I have. Right now it only displays the events the user created. It should display both the events he created and the ones he signed up for.
select events.*, registration_items.*, registration_signup.*, users.username from events
INNER JOIN users on users.userid = events.userid
LEFT JOIN registration_items ON registration_items.event_id = events.event_id
LEFT JOIN registration_signup ON registration_signup.id = registration_items.id
WHERE events.userid = '$user_id' OR registration_signup.userid = '$user_id' ORDER BY events.event_timestamp DESC
For userid1 the output should be
Title
title1 (the user created this)
title2 (the user signed up for this)
For userid2 the output should be
Title
title2
select events.*, registration_items.*, registration_signup.*, users.username
from events
INNER JOIN users on users.userid = events.userid
INNER JOIN registration_items ON registration_items.event_id = events.event_id
INNER JOIN registration_signup ON registration_signup.id = registration_items.id
WHERE registration_signup.userid = '$user_id'
union
select events.*, registration_items.*, registration_signup.*, users.username
from events
INNER JOIN users on users.userid = events.userid
LEFT JOIN registration_items ON registration_items.event_id = events.event_id
LEFT JOIN registration_signup ON registration_signup.id = registration_items.id
WHERE events.userid = '$user_id'
ORDER BY event_timestamp DESC
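The UNION idea can be tried on the sample rows from the question. This is a sketch under assumptions: Python's sqlite3 stands in for MySQL, only the title and timestamp are selected, the helper events_for is a name invented for the example, and the user id is passed as a bound parameter rather than interpolated into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (event_id INTEGER PRIMARY KEY, event_title TEXT,
                     event_timestamp INTEGER, userid INTEGER);
CREATE TABLE registration_items (id INTEGER PRIMARY KEY, event_id INTEGER, task_name TEXT);
CREATE TABLE registration_signup (id INTEGER, userid INTEGER, timestamp INTEGER);
INSERT INTO events VALUES (1, 'title1', 1234, 1), (2, 'title2', 123, 2);
INSERT INTO registration_items VALUES (1, 2, 'task 1');
INSERT INTO registration_signup VALUES (1, 1, 1234);
""")

def events_for(user_id):
    """Events the user created, plus events the user signed up for (hypothetical helper)."""
    return conn.execute("""
        SELECT e.event_title, e.event_timestamp
        FROM events e
        WHERE e.userid = :uid
        UNION
        SELECT e.event_title, e.event_timestamp
        FROM events e
        INNER JOIN registration_items ri ON ri.event_id = e.event_id
        INNER JOIN registration_signup rs ON rs.id = ri.id
        WHERE rs.userid = :uid
        ORDER BY event_timestamp DESC
    """, {"uid": user_id}).fetchall()

print(events_for(1))
print(events_for(2))
```

UNION removes duplicate rows, so an event the user both created and signed up for appears once. The ORDER BY applies to the combined result and therefore refers to the output column name without a table qualifier.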
I am guessing that you have a typo in your sample data, because your query works the moment you change the user id to 1 for both the created and the signed-up events. Please take a look at this reference demo and comment if it's not a typo.
SQLFIDDLE DEMO
query: (your query..)
select u.userid,e.event_id as id,
ri.event_id as evt, e.event_title,
ri.task_name,
rs.timestamp,
u.name from events e
INNER JOIN users u on
u.userid = e.userid
LEFT JOIN registration_items ri ON
ri.event_id = e.event_id
LEFT JOIN registration_signup rs ON
rs.id = ri.id
WHERE e.userid = '1' or
rs.userid = '1'
ORDER BY e.event_timestamp DESC
;
Results:
| USERID | ID | EVT | EVENT_TITLE | TASK_NAME | TIMESTAMP | NAME |
------------------------------------------------------------------
| 1 | 1 | 1 | title1 | task 1 | 1234 | john |
| 1 | 2 | 2 | title2 | task 1.2 | 3456 | john |