Count not giving right results with three joins - mysql

I have three tables (MySQL)
forum: each line in this table is a comment in the forum related to the match by static_id and related to the author by user_id
|match_static_id| date | time | comments | user_id |
matches: this table contains matches with all its information
| static_id | localteam_name | visitorteam_name | date | time |.......
iddaa : this table contains a code for each match (some matches do not have codes here)
|match_static_id| iddaa_code |
I make a query like following:
SELECT forum.match_static_id, forum.date, forum.time,
count(forum.comments) 'comments_no', matches.*, users.username, iddaa.iddaa_code
FROM forum
INNER JOIN matches ON forum.match_static_id = matches.static_id
INNER JOIN users on forum.user_id = users.id
LEFT JOIN iddaa on forum.match_static_id = iddaa.match_static_id
GROUP BY forum.match_static_id
ORDER BY forum.date DESC, forum.time DESC
the query work as I want (I get the match information, iddaa code for the match if there is one, and the author of the comment(last comment) ).
The problem is in the "count function" I should get the number of the comments related to the same match bur the query returned (double of each value)
for example if I have 5 comments for a match it returns 10
I want to know if all parts of my query is right and any help will be good?

Maybe it can be wrapped in a sub query? Its hard when i dont have the table def + data.
SELECT Sub.*, COUNT(1) 'comments_no'
FROM
(
SELECT forum.match_static_id, forum.date, forum.time,
matches.*, users.username, iddaa.iddaa_code
FROM forum
INNER JOIN matches ON forum.match_static_id = matches.static_id
INNER JOIN users on forum.user_id = users.id
GROUP BY forum.match_static_id
) Sub
LEFT JOIN iddaa on Sub.match_static_id = iddaa.match_static_id
ORDER BY forum.date DESC, forum.time DESC

Related

Understanding use of multiple SUMs with LEFT JOINS in mysql

Using the GROUP BY command, it is possible to LEFT JOIN multiple tables and still get the desired number of rows from the first table.
For example,
SELECT b.title
FROM books `b`
LEFT JOIN orders `o`
ON o.bookid = b.id
LEFT JOIN authors `a`
ON b.authorid = a.id
GROUP BY b.id
However, since behind the scenes MYSQL is doing a cartesian product on the tables, if you include more than one SUM command you get incorrect values based on all the hidden rows. (The problem is explained fairly well here.)
SELECT b.title,SUM(o.id) as sales,SUM(a.id) as authors
FROM books `b`
LEFT JOIN orders `o`
ON o.bookid = b.id
LEFT JOIN authors `a`
ON b.authorid = a.id
GROUP BY b.id
There are a number of answers on SO about this, most using sub-queries in the JOINS but I am having trouble applying them to this fairly simple case.
How can you adjust the above so that you get the correct SUMs?
Edit
Example
books
id|title|authorid
1|Huck Finn|1
2|Tom Sawyer|1
3|Python Cookbook|2
orders
id|bookid
1|1
2|1
3|2
4|2
5|3
6|3
authors
id|author
1|Twain
2|Beazley
2|Jones
The "correct answer" for total # of authors of the Python Cookbook is 2. However, because there are two joins and the overall dataset is expanded by the join on number of orders, SUM(a.id) will be 4.
You are correct that by joining multiple tables you would not get the expected results.
But in this case you should use COUNT() instead of SUM() and count the distinct orders or authors.
Also by your design you should count the names of the authors and not the ids of the table authors:
SELECT b.title,
COUNT(DISTINCT o.id) as sales,
COUNT(DISTINCT a.author) as authors
FROM books `b`
LEFT JOIN orders `o` ON o.bookid = b.id
LEFT JOIN authors `a` ON b.authorid = a.id
GROUP BY b.id, b.title
See the demo.
Results:
| title | sales | authors |
| --------------- | ----- | ------- |
| Huck Finn | 2 | 1 |
| Tom Sawyer | 2 | 1 |
| Python Cookbook | 2 | 2 |
When dealing with separate aggregates, it is good style to aggregate before joining.
Your data model is horribly confusing, making it look like a book is written by one author only (referenced by books.authorid), while this "ID" is not an author's ID at all.
Your main problem is: You don't count! We count with COUNT. But you are mistakenly adding up ID values with SUM.
Here is a proper query, where I am aggregating before joining and using alias names to fight confusion and thus enhance the query's readability and maintainability.
SELECT
b.title,
COALESCE(o.order_count, 0) AS sales,
COALESCE(a.author_count, 0) AS authors
FROM (SELECT title, id AS book_id, authorid AS author_group_id FROM books) b
LEFT JOIN
(
SELECT id as author_group_id, COUNT(*) as author_count
FROM authors
GROUP BY id
) a ON a.author_group_id = b.author_group_id
LEFT JOIN
(
SELECT bookid AS book_id, COUNT(*) as order_count
FROM orders
GROUP BY bookid
) o ON o.book_id = b.book_id
ORDER BY b.title;
i don't think that your query would work like you eexspected.
Assume one book could have 3 authors.
For Authors:
So you would have three rows for that book in your books table,each one for every Author.
So a
SUM(b.authorid)
gives you the correct answer in your case.
For Orders:
you must use a subselect like
LEFT JOIN (SELECT SUM(id) o_sum,bookid FROM orders GROUP BY bookid) `o`
ON o.bookid = b.id
You should really reconsider your approach with books and authors.

Selecting a count of rows having a max value

Working example: http://sqlfiddle.com/#!9/80995/20
I have three tables, a user table, a user_group table, and a link table.
The link table contains the dates that users were added to user groups. I need a query that returns the count of users currently in each group. The most recent date determines the group that the user is currently in.
SELECT
user_groups.name,
COUNT(l.name) AS ct,
GROUP_CONCAT(l.`name` separator ", ") AS members
FROM user_groups
LEFT JOIN
(SELECT MAX(added), group_id, name FROM link LEFT JOIN users ON users.id = link.user_id GROUP BY user_id) l
ON l.group_id = user_groups.id
GROUP BY user_groups.id
My question is if the query I have written could be optimized, or written better.
Thanks!
Ben
You actual query is not giving you the answer you want; at least, as far as I understand your question. John actually joined group 2 on 2017-01-05, yet it appears on group 1 (that he joined on 2017-01-01) on your results. Note also you're missing one Group 4.
Using standard SQL, I think the next query is what you're looking for. The comments in the query should clarify what each part is doing:
SELECT
user_groups.name AS group_name,
COUNT(u.name) AS member_count,
group_concat(u.name separator ', ') AS members
FROM
user_groups
LEFT JOIN
(
SELECT * FROM
(-- For each user, find most recent date s/he got into a group
SELECT
user_id AS the_user_id, MAX(added) AS last_added
FROM
link
GROUP BY
the_user_id
) AS u_a
-- Join back to the link table, so that the `group_id` can be retrieved
JOIN link l2 ON l2.user_id = u_a.the_user_id AND l2.added = u_a.last_added
) AS most_recent_group ON most_recent_group.group_id = user_groups.id
-- And get the users...
LEFT JOIN users u ON u.id = most_recent_group.the_user_id
GROUP BY
user_groups.id, user_groups.name
ORDER BY
user_groups.name ;
This can be written in a more compact way in MySQL (abusing the fact that, in older versions of MySQL, it doesn't follow the SQL standard for the GROUP BY restrictions).
That's what you'll get:
group_name | member_count | members
:--------- | -----------: | :-------------
Group 1 | 2 | Mikie, Dominic
Group 2 | 2 | John, Paddy
Group 3 | 0 | null
Group 4 | 1 | Nellie
dbfiddle here
Note that this query can be simplified if you use a database with window functions (such as MariaDB 10.2). Then, you can use:
SELECT
user_groups.name AS group_name,
COUNT(u.name) AS member_count,
group_concat(u.name separator ', ') AS members
FROM
user_groups
LEFT JOIN
(
SELECT
user_id AS the_user_id,
last_value(group_id) OVER (PARTITION BY user_id ORDER BY added) AS group_id
FROM
link
GROUP BY
user_id
) AS most_recent_group ON most_recent_group.group_id = user_groups.id
-- And get the users...
LEFT JOIN users u ON u.id = most_recent_group.the_user_id
GROUP BY
user_groups.id, user_groups.name
ORDER BY
user_groups.name ;
dbfiddle here

Sql conditional count with join

I cannot find the answer to my problem here on stackoverflow. I have a query that spans 3 tables:
newsitem
+------+----------+----------+----------+--------+----------+
| Guid | Supplier | LastEdit | ShowDate | Title | Contents |
+------+----------+----------+----------+--------+----------+
newsrating
+----+----------+--------+--------+
| Id | NewsGuid | UserId | Rating |
+----+----------+--------+--------+
usernews
+----+----------+--------+----------+
| Id | NewsGuid | UserId | ReadDate |
+----+----------+--------+----------+
Newsitem obviously contains newsitems, newsrating contains ratings that users give to newsitems, and usernews contains the date when a user has read a newsitem.
In my query I want to get every newsitem, including the number of ratings for that newsitem and the average rating, and how many times that newsitem has been read by the current user.
What I have so far is:
select newsitem.guid, supplier, count(newsrating.id) as numberofratings,
avg(newsrating.rating) as rating,
count(case usernews.UserId when 3 then 1 else null end) as numberofreads from newsitem
left join newsrating on newsitem.guid = newsrating.newsguid
left join usernews on newsitem.guid = usernews.newsguid
group by newsitem.guid
I have created an sql fiddle here: http://sqlfiddle.com/#!9/c8add/8
Both count() calls don't return the numbers I want. numberofratings should return the total number of ratings for that newsitem (by all users). numberofreads should return the number of reads for the current user for that newsitem.
So, newsitem with guid d104c330-c319-40e8-8be3-a7c4f549d35c should have 2 ratings and 3 reads for the current user with userid = 3.
I have tried conditional counts and sums, but no success yet. How can this be accomplished?
The main problem that I see is that you're joining in both tables together, which means that you're going to effectively be multiplying out by both numbers, which is why your counts aren't going to be correct. For example, if the Newsitem has been read 3 times by the user and rated by 8 users then you're going to end up getting 24 rows, so it will look like it has been rated 24 times. You can add a DISTINCT to your COUNT of the ratings IDs and that should correct that issue. Average should be unaffected because the average of 1 and 2 is the same as the average of 1, 1, 2, & 2 (for example).
You can then handle the reads by adding the userid to the JOIN condition (since it's an OUTER JOIN it shouldn't cause any loss of results) instead of in a CASE statement for your COUNT, then you can do a COUNT on distinct id values from Usernews. The resulting query would be:
SELECT
I.guid,
I.supplier,
COUNT(DISTINCT R.id) AS number_of_ratings,
AVG(R.rating) AS avg_rating,
COUNT(DISTINCT UN.id) AS number_of_reads
FROM
NewsItem I
LEFT OUTER JOIN NewsRating R ON R.newsguid = I.guid
LEFT OUTER JOIN UserNews UN ON
UN.newsguid = I.guid AND
UN.userid = #userid
GROUP BY
I.guid,
I.supplier
While that should work, you might get better results from a subquery, as the above needs to explode out the results and then aggregate them, perhaps unnecessarily. Also, some people might find the below to be a little clearer.
SELECT
I.guid,
I.supplier,
R.number_of_ratings,
R.avg_rating,
COUNT(*) AS number_of_reads
FROM
NewsItem I
LEFT OUTER JOIN
(
SELECT
newsguid,
COUNT(*) AS number_of_ratings,
AVG(rating) AS avg_rating
FROM
NewsRating
GROUP BY
newsguid
) R ON R.newsguid = I.guid
LEFT OUTER JOIN UserNews UN ON UN.newsguid = I.guid AND UN.userid = #userid
GROUP BY
I.guid,
I.supplier,
R.number_of_ratings,
R.avg_rating
I'm with Tom you should use a subquery to calculate the user count.
SQL Fiddle Demo
SELECT NI.guid,
NI.supplier,
COUNT(NR.ID) as numberofratings,
AVG(NR.rating) as rating,
user_read as numberofreads
FROM newsitem NI
LEFT JOIN newsrating NR
ON NI.guid = NR.newsguid
LEFT JOIN (SELECT NewsGuid, COUNT(*) user_read
FROM usernews
WHERE UserId = 3 -- use a variable #user_id here
GROUP BY NewsGuid) UR
ON NI.guid = UR.NewsGuid
GROUP BY NI.guid,
NI.supplier,
numberofreads;

MySql In an inner join does it matter which table comes first?

I'm selecting data to output posts a user has book marked.
The main table which holds the ids of the posts a user has bookMarked, is called bookMarks. This is the table based on which posts will be selected from the posts table, to display to the user.
bookMarks
id | postId | userId
--------------------------
1 | US01 | 1
2 | US02 | 1
3 | US01 | 2
4 | US02 | 2
posts
id | postId | postTitle
--------------------------
1 | US01 | Title 1
2 | US02 | Title 2
3 | US03 | Title 3
4 | US04 | Title 4
My sql is currently like this:
select a.postsTitle
from posts a
inner join bookmarks b
on b.userId = a.userId
and b.userId = :userId
Notice, I have the table posts put first before the table bookmarks. But, since I'm selecting based on whats there in bookmarks, is it necessary I declare the table bookmarks first instead of post in the sql statement? Will doing it the way I'm doing it cause and problems in data selection or efficiency?
Or should I do it like:
select b.postsTitle
from bookmarks a
inner join posts b
on a.userId = b.userId
and a.userId = :userId
Notice, I have table bookmarks put first here.
Instead of the following:
select a.postsTitle
from posts a
inner join bookmarks b
on b.userId = a.userId
and b.userId = :userId
You should consider formatting your JOIN in this format, using the WHERE clause, and proper capitalization:
SELECT p.postsTitle
FROM bookmarks b
INNER JOIN posts p
ON p.userId = b.userId
WHERE b.userId = :userId
While it makes no difference (performance wise) to MySQL which order you put the tables in with INNER JOIN (MySQL treats them as equal and will optimize them the same way), it's convention to put the table that you are applying the WHERE clause to first. In fact, assuming proper indexes, MySQL will most likely start with the table that has the WHERE clause because it narrows down the result set, and MySQL likes to start with the set that has the fewest rows.
It's also convention to put the joined table's column first in the ON clause. It just reads more logically. While you're at it, use logical table aliases.
The only caveat is if you don't name your columns and instead use SELECT * like the following:
SELECT *
FROM bookmarks b
INNER JOIN posts p
ON p.userId = b.userId
WHERE b.userId = :userId
You'll get the columns in the order they're listed in the query. In this case, you'll get the columns for bookmarks, followed by the columns for posts.
Most would say never use SELECT * in a production query, but if you really must return all columns, and you needed the columns from posts first, you could simply do the following:
SELECT p.*, b.*
FROM bookmarks b
INNER JOIN posts p
ON p.userId = b.userId
WHERE b.userId = :userId
It's always good to be explicit about the returned result set.
There is no effect on query performance or final resultset with respect to placement of table on either side of JOIN clause if INNER JOIN is used .
The only difference observed is in order of columns returned and that too only if SELECT * is used . Suppose you have tableA(aid,col1,col2) and tableB(bid,col3)
SELECT *
FROM tableA
INNER JOIN tableB
ON tableA.aid=tableB.bid
returns column in order
aid|col1|col2|bid|col3
On otherhand
SELECT *
FROM tableB
INNER JOIN tableA
ON tableA.aid=tableB.bid
returns column in order
bid|col3|aid|col1|col2|
But it matters in case of LEFT JOIN or RIGHT JOIN.

SQL Join and show results if join false

I have a table - comments. Users can post if not a member of the site but want to show their details if they are.
So if a user comments who is NOT a member I show their posts but don't link to their profile, because they don't have one.
So, in the following query I want to return the rows even if there is no join:
select wc.comment, wc.comment_by_name, wc.user_id, u.url from comments wc
join users u on wc.wag_uid = u.user_id
where id = '1237' group by wc.comment order by wc.dateadded desc
I want to return:
comment comment_by_name user_id url
------- --------------- ------- ----
hello dan 12 /dan
hey jane /jane
world jack 10 /jack
But the above does not return the data for jane as she does not have a user_id
Is there a way to return all data even if the join is null?
use LEFT JOIN instead
SELECT wc.comment, wc.comment_by_name, wc.user_id, u.url
FROM comments wc
LEFT JOIN users u
on wc.wag_uid = u.user_id
WHERE id = '1237'
GROUP BY wc.comment
ORDER BY wc.dateadded DESC
basically INNER JOIN only select records which a record from one table has atleast one match on the other table while LEFT JOIN select all rows from the left hand side table (in your case, it's comments) whether it has no match on the other table.