Understanding use of multiple SUMs with LEFT JOINS in mysql

Understanding use of multiple SUMs with LEFT JOINS in mysql - mysql

Using the GROUP BY command, it is possible to LEFT JOIN multiple tables and still get the desired number of rows from the first table.
For example,
SELECT b.title
FROM books `b`
LEFT JOIN orders `o`
ON o.bookid = b.id
LEFT JOIN authors `a`
ON b.authorid = a.id
GROUP BY b.id
However, since behind the scenes MYSQL is doing a cartesian product on the tables, if you include more than one SUM command you get incorrect values based on all the hidden rows. (The problem is explained fairly well here.)
SELECT b.title,SUM(o.id) as sales,SUM(a.id) as authors
FROM books `b`
LEFT JOIN orders `o`
ON o.bookid = b.id
LEFT JOIN authors `a`
ON b.authorid = a.id
GROUP BY b.id
There are a number of answers on SO about this, most using sub-queries in the JOINS but I am having trouble applying them to this fairly simple case.
How can you adjust the above so that you get the correct SUMs?
Edit
Example
books
id|title|authorid
1|Huck Finn|1
2|Tom Sawyer|1
3|Python Cookbook|2
orders
id|bookid
1|1
2|1
3|2
4|2
5|3
6|3
authors
id|author
1|Twain
2|Beazley
2|Jones
The "correct answer" for total # of authors of the Python Cookbook is 2. However, because there are two joins and the overall dataset is expanded by the join on number of orders, SUM(a.id) will be 4.

You are correct that by joining multiple tables you would not get the expected results.
But in this case you should use COUNT() instead of SUM() and count the distinct orders or authors.
Also by your design you should count the names of the authors and not the ids of the table authors:
SELECT b.title,
COUNT(DISTINCT o.id) as sales,
COUNT(DISTINCT a.author) as authors
FROM books `b`
LEFT JOIN orders `o` ON o.bookid = b.id
LEFT JOIN authors `a` ON b.authorid = a.id
GROUP BY b.id, b.title
See the demo.
Results:
| title | sales | authors |
| --------------- | ----- | ------- |
| Huck Finn | 2 | 1 |
| Tom Sawyer | 2 | 1 |
| Python Cookbook | 2 | 2 |

When dealing with separate aggregates, it is good style to aggregate before joining.
Your data model is horribly confusing, making it look like a book is written by one author only (referenced by books.authorid), while this "ID" is not an author's ID at all.
Your main problem is: You don't count! We count with COUNT. But you are mistakenly adding up ID values with SUM.
Here is a proper query, where I am aggregating before joining and using alias names to fight confusion and thus enhance the query's readability and maintainability.
SELECT
b.title,
COALESCE(o.order_count, 0) AS sales,
COALESCE(a.author_count, 0) AS authors
FROM (SELECT title, id AS book_id, authorid AS author_group_id FROM books) b
LEFT JOIN
(
SELECT id as author_group_id, COUNT(*) as author_count
FROM authors
GROUP BY id
) a ON a.author_group_id = b.author_group_id
LEFT JOIN
(
SELECT bookid AS book_id, COUNT(*) as order_count
FROM orders
GROUP BY bookid
) o ON o.book_id = b.book_id
ORDER BY b.title;

i don't think that your query would work like you eexspected.
Assume one book could have 3 authors.
For Authors:
So you would have three rows for that book in your books table,each one for every Author.
So a
SUM(b.authorid)
gives you the correct answer in your case.
For Orders:
you must use a subselect like
LEFT JOIN (SELECT SUM(id) o_sum,bookid FROM orders GROUP BY bookid) `o`
ON o.bookid = b.id
You should really reconsider your approach with books and authors.

Related

SQL: many to many relationships select where multiple criteria

given these tables :
id_article | title
1 | super article
2 | another article
id_tag | title
1 | great
2 | awesome
id_relation | id_article | id_tag
1 | 1 | 1
2 | 1 | 2
3 | 2 | 1
I'd like to be able to select all articles that are "great" AND "awesome" (eventually, I'll probably have to implement OR too)
And basically, if I do a select on articles the relation table joining on id_article: of course, I cant join two different values of id_tag. Only lead I had with concatenating IDs to test as a string, but that seems so lame, there has to be a prettier solution.
Oh and if it matters, I use a MySQL server.
EDIT: for ByWaleed, the typical sql select that would surely fail that I cited in my original question:
SELECT
a.id_article,
a.title
FROM articles a, relations r
WHERE
r.id_article = a.id_article and r.id_tag = 1 and r.id_tag = 2
wouldnt work because r.id_tag cant obviously be 1 and 2 on the same line. I doubt w3schools has an article on that. My search on google didnt yield any result, probably because I searched with the wrong keyword.

If you do all the joins as normal, then aggregate the rows to one group by article, then you can assert that they must have at least two different tags.
(Having already filtered to great and/or awesome, that means they have both.)
SELECT
a.id_article,
a.title
FROM
articles a
INNER JOIN
relations r
ON r.id_article = a.id_article
INNER JOIN
tags t
ON t.id_tag = r.id_tag
WHERE
t.title IN ('great', 'awesome')
GROUP BY
a.id_article,
a.title
HAVING
COUNT(DISTINCT t.id_tag) = 2
(The DISTINCT is to avoid the possibility of one article having 'great' twice, for example.)
To do OR, you just remove the HAVING clause.

One approach is to aggregate by article, and then assert that the article both the "great" and "awesome" tags:
SELECT
a.id_article,
a.title
FROM articles a
INNER JOIN relations r
ON a.id_article = r.id_article
INNER JOIN tags t
ON r.id_tag = t.id_tag
WHERE
t.title IN ('great', 'awesome')
GROUP BY
a.id_article,
a.title
HAVING
MIN(t.title) <> MAX(t.title);
Demo
The logic here is that we first limit records, for each article, to only those of the two targets tags. Then we assert, in the HAVING clause, that both tags appear. I use a MIN/MAX trick here, because if the min and max differ, then it implies that there are two distinct tags.

Step 1: Use a temp table to get all articles with titles.
Step 2: If an article occurs multiple times in your temp table, that means it has great and awesome as titles.
Try:
CREATE TEMPORARY TABLE MyTempTable (
select t1.id_article, t2.title
from table1 t1
inner join table3 t3 on t3.id_article = t1.id_article
inner join table2 t2 on t2.id_tag = t3.id_tag
)
select m.id_article
from MyTempTable m
group by m.id_article
having count(*)>1
Edit: This solution assumes there are two possible tags, great and awesome. If more, please add a "where" clause to the select query for creating the temp table like where t2.title in ('great','awesome')

Count comments and get average rating from mysql

I just can't figure out how to get average rating and count comments from my mysql database.
I have 3 tables (activity, rating, comments) activity contains the main data the "activities", rating holds the ratings and comments - of course, the ratings.
activity_table
id | title |short_desc | long_desc | address | lat | long |last_updated
rating_table
id | activityid | userid | rating
comment_table
id | activityid | userid | rating
I'm now trying to the data from activity plus the comment_counts and average_rating in one query.
SELECT activity.*, AVG(rating.rating) as average_rating, count(comments.activityid) as total_comments
FROM activity LEFT JOIN
rating
ON activity.aid = rating.activityid LEFT JOIN
comments
ON activity.aid = comments.activityid
GROUP BY activity.aid
...doesn't do the job. It gives me the right average_rating, but the wrong amount of comments.
Any ideas?
Thanks a lot!

You are aggregating along two different dimensions. The Cartesian product generated by the joins affects the aggregation.
So, you should aggregate before the joins:
SELECT a.*, r.average_rating, COALESCE(c.total_comments, 0) as total_comments
FROM activity a LEFT JOIN
(SELECT r.activityid, AVG(r.rating) as average_rating
FROM rating r
GROUP BY r.activityid
) r
ON a.aid = r.activityid LEFT JOIN
(SELECT c.activityid, COUNT(*) as total_comments
FROM comments c
GROUP BY c.activityid
) c
ON a.aid = c.activityid;
Notice that the outer GROUP BY is no longer needed.

MySql In an inner join does it matter which table comes first?

I'm selecting data to output posts a user has book marked.
The main table which holds the ids of the posts a user has bookMarked, is called bookMarks. This is the table based on which posts will be selected from the posts table, to display to the user.
bookMarks
id | postId | userId
--------------------------
1 | US01 | 1
2 | US02 | 1
3 | US01 | 2
4 | US02 | 2
posts
id | postId | postTitle
--------------------------
1 | US01 | Title 1
2 | US02 | Title 2
3 | US03 | Title 3
4 | US04 | Title 4
My sql is currently like this:
select a.postsTitle
from posts a
inner join bookmarks b
on b.userId = a.userId
and b.userId = :userId
Notice, I have the table posts put first before the table bookmarks. But, since I'm selecting based on whats there in bookmarks, is it necessary I declare the table bookmarks first instead of post in the sql statement? Will doing it the way I'm doing it cause and problems in data selection or efficiency?
Or should I do it like:
select b.postsTitle
from bookmarks a
inner join posts b
on a.userId = b.userId
and a.userId = :userId
Notice, I have table bookmarks put first here.

Instead of the following:
select a.postsTitle
from posts a
inner join bookmarks b
on b.userId = a.userId
and b.userId = :userId
You should consider formatting your JOIN in this format, using the WHERE clause, and proper capitalization:
SELECT p.postsTitle
FROM bookmarks b
INNER JOIN posts p
ON p.userId = b.userId
WHERE b.userId = :userId
While it makes no difference (performance wise) to MySQL which order you put the tables in with INNER JOIN (MySQL treats them as equal and will optimize them the same way), it's convention to put the table that you are applying the WHERE clause to first. In fact, assuming proper indexes, MySQL will most likely start with the table that has the WHERE clause because it narrows down the result set, and MySQL likes to start with the set that has the fewest rows.
It's also convention to put the joined table's column first in the ON clause. It just reads more logically. While you're at it, use logical table aliases.
The only caveat is if you don't name your columns and instead use SELECT * like the following:
SELECT *
FROM bookmarks b
INNER JOIN posts p
ON p.userId = b.userId
WHERE b.userId = :userId
You'll get the columns in the order they're listed in the query. In this case, you'll get the columns for bookmarks, followed by the columns for posts.
Most would say never use SELECT * in a production query, but if you really must return all columns, and you needed the columns from posts first, you could simply do the following:
SELECT p.*, b.*
FROM bookmarks b
INNER JOIN posts p
ON p.userId = b.userId
WHERE b.userId = :userId
It's always good to be explicit about the returned result set.

There is no effect on query performance or final resultset with respect to placement of table on either side of JOIN clause if INNER JOIN is used .
The only difference observed is in order of columns returned and that too only if SELECT * is used . Suppose you have tableA(aid,col1,col2) and tableB(bid,col3)
SELECT *
FROM tableA
INNER JOIN tableB
ON tableA.aid=tableB.bid
returns column in order
aid|col1|col2|bid|col3
On otherhand
SELECT *
FROM tableB
INNER JOIN tableA
ON tableA.aid=tableB.bid
returns column in order
bid|col3|aid|col1|col2|
But it matters in case of LEFT JOIN or RIGHT JOIN.

Count not giving right results with three joins

I have three tables (MySQL)
forum: each line in this table is a comment in the forum related to the match by static_id and related to the author by user_id
|match_static_id| date | time | comments | user_id |
matches: this table contains matches with all its information
| static_id | localteam_name | visitorteam_name | date | time |.......
iddaa : this table contains a code for each match (some matches do not have codes here)
|match_static_id| iddaa_code |
I make a query like following:
SELECT forum.match_static_id, forum.date, forum.time,
count(forum.comments) 'comments_no', matches.*, users.username, iddaa.iddaa_code
FROM forum
INNER JOIN matches ON forum.match_static_id = matches.static_id
INNER JOIN users on forum.user_id = users.id
LEFT JOIN iddaa on forum.match_static_id = iddaa.match_static_id
GROUP BY forum.match_static_id
ORDER BY forum.date DESC, forum.time DESC
the query work as I want (I get the match information, iddaa code for the match if there is one, and the author of the comment(last comment) ).
The problem is in the "count function" I should get the number of the comments related to the same match bur the query returned (double of each value)
for example if I have 5 comments for a match it returns 10
I want to know if all parts of my query is right and any help will be good?

Maybe it can be wrapped in a sub query? Its hard when i dont have the table def + data.
SELECT Sub.*, COUNT(1) 'comments_no'
FROM
(
SELECT forum.match_static_id, forum.date, forum.time,
matches.*, users.username, iddaa.iddaa_code
FROM forum
INNER JOIN matches ON forum.match_static_id = matches.static_id
INNER JOIN users on forum.user_id = users.id
GROUP BY forum.match_static_id
) Sub
LEFT JOIN iddaa on Sub.match_static_id = iddaa.match_static_id
ORDER BY forum.date DESC, forum.time DESC

How to count the number of instances of each foreign-key ID in a table?

Here's my simple SQL question...
I have two tables:
Books
-------------------------------------------------------
| book_id | author | genre | price | publication_date |
-------------------------------------------------------
Orders
------------------------------------
| order_id | customer_id | book_id |
------------------------------------
I'd like to create a query that returns:
--------------------------------------------------------------------------
| book_id | author | genre | price | publication_date | number_of_orders |
--------------------------------------------------------------------------
In other words, return every column for ALL rows in the Books table, along with a calculated column named 'number_of_orders' that counts the number of times each book appears in the Orders table. (If a book does not occur in the orders table, the book should be listed in the result set, but "number_of_orders" should be zero.
So far, I've come up with this:
SELECT
books.book_id,
books.author,
books.genre,
books.price,
books.publication_date,
count(*) as number_of_orders
from books
left join orders
on (books.book_id = orders.book_id)
group by
books.book_id,
books.author,
books.genre,
books.price,
books.publication_date
That's almost right, but not quite, because "number_of_orders" will be 1 even if a book is never listed in the Orders table. Moreover, given my lack of knowledge of SQL, I'm sure this query is very inefficient.
What's the right way to write this query? (For what it's worth, this needs to work on MySQL, so I can't use any other vendor-specific features).

Your query is almost right and it's the right way to do that (and the most efficient)
SELECT books.*, count(orders.book_id) as number_of_orders
from books
left join orders
on (books.book_id = orders.book_id)
group by
books.book_id
COUNT(*) could include NULL values in the count because it counts all the rows, while COUNT(orders.book_id) does not because it ignores NULL values in the given field.

SELECT b.book_id,
b.author,
b.genre,
b.price,
b.publication_date,
coalesce(oc.Count, 0) as number_of_orders
from books b
left join (
select book_id, count(*) as Count
from Order
group by book_id
) oc on (b.book_id = oc.book_id)

Change count(*) to count(orders.book_id)

You're counting the wrong thing. You want to count the non-null book_id's.
SELECT
books.book_id,
books.author,
books.genre,
books.price,
books.publication_date,
count(orders.book_id) as number_of_orders
from books
left join orders
on (books.book_id = orders.book_id)
group by
books.book_id,
books.author,
books.genre,
books.price,
books.publication_date

select author.aname,count(book.author_id) as "number_of_books"
from author
left join book
on(author.author_id=book.author_id)
GROUP BY author.aname;

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Understanding use of multiple SUMs with LEFT JOINS in mysql - mysql

Related

SQL: many to many relationships select where multiple criteria

Count comments and get average rating from mysql

MySql In an inner join does it matter which table comes first?

Count not giving right results with three joins

How to count the number of instances of each foreign-key ID in a table?

Categories

Resources