I've been looking for a solution to this. There are plenty of similar questions, but none of them has an answer that helped me solve the problem.
First up, my questions/problem:
I want to sum and count certain columns in a multiple join query
Is it not possible with multiple joins? Do I have to nest SELECT queries?
Here's a SQL dump of my database with sample data: http://pastie.org/private/vq7qkfer5mwyraudb5dh0a
This is the query I thought would do the trick:
SELECT firstname, lastname, sum(goal.goal), sum(assist.assist), sum(gw.gw), sum(win.win), count(played.idplayer) FROM player
LEFT JOIN goal USING (idplayer)
LEFT JOIN assist USING (idplayer)
LEFT JOIN gw USING (idplayer)
LEFT JOIN win USING (idplayer)
LEFT JOIN played USING (idplayer)
GROUP BY idplayer
What I'd like this to produce is a table where the goal, assist, gw, win and played columns hold the sum/count of every matching row in that table, like so (with the supplied sample data):
+-----------+-----------+------+--------+----+-----+--------+
| firstname | lastname  | goal | assist | gw | win | played |
+-----------+-----------+------+--------+----+-----+--------+
| Gandalf   | The White |   10 |      6 |  1 |   1 |      2 |
| Frodo     | Baggins   |   16 |      2 |  1 |   2 |      2 |
| Bilbo     | Baggins   |    7 |      3 |  0 |   0 |      2 |
+-----------+-----------+------+--------+----+-----+--------+
So, to reiterate the questions above: is this possible with one query and multiple joins?
If you provide solutions/queries, please explain them! I'm new to proper relational databases and had never used joins before this project. I'd also appreciate it if you avoided aliases unless they're necessary.
I have run the above query without the SUMs and the grouping, and I get a set of rows for each column I SELECT, which I suspect are then multiplied or added together. I was under the impression that grouping and/or SUM(TABLE.COLUMN) would solve that.
Also, I don't think SELECT DISTINCT or any other DISTINCT operation will work, since it would leave out some ("duplicate") results.
P.S. If it matters, my dev machine runs WAMP, but the release will be on Ubuntu/Apache/MySQL/PHP.
To understand why you're not getting the answers you expect, take a look at this query:
SELECT * FROM player LEFT JOIN goal USING (idplayer)
As you can see, each row on the left is repeated once for every matching row on the right. The same thing happens again at every join. Here's the raw data for your query:
SELECT * FROM player
LEFT JOIN goal USING (idplayer)
LEFT JOIN assist USING (idplayer)
LEFT JOIN gw USING (idplayer)
LEFT JOIN win USING (idplayer)
LEFT JOIN played USING (idplayer)
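To see the multiplication concretely, here is a minimal sketch using SQLite from Python as a stand-in for MySQL (the join behaviour is the same). The table and column names follow the question, but the data is invented: one player with two goal rows and three assist rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player (idplayer INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE goal   (idplayer INTEGER, goal INTEGER);
    CREATE TABLE assist (idplayer INTEGER, assist INTEGER);
    INSERT INTO player VALUES (1, 'Frodo', 'Baggins');
    INSERT INTO goal   VALUES (1, 5), (1, 3);          -- true total: 8 goals
    INSERT INTO assist VALUES (1, 1), (1, 1), (1, 0);  -- true total: 2 assists
""")

# Every goal row is paired with every assist row: 2 x 3 = 6 joined rows.
rows = conn.execute("""
    SELECT * FROM player
    LEFT JOIN goal USING (idplayer)
    LEFT JOIN assist USING (idplayer)
""").fetchall()
print(len(rows))  # 6

# The SUMs then see the repeated values: each goal counted 3 times,
# each assist counted 2 times.
totals = conn.execute("""
    SELECT SUM(goal.goal), SUM(assist.assist)
    FROM player
    LEFT JOIN goal USING (idplayer)
    LEFT JOIN assist USING (idplayer)
    GROUP BY idplayer
""").fetchone()
print(totals)  # (24, 4)
```

Adding more joined tables multiplies the repetition further. This is why GROUP BY alone doesn't help: it groups rows that have already been duplicated.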
Those repeated values are then used in the SUM calculations, so each total is inflated by the number of matching rows in the other joined tables. The SUMs need to be calculated before the rows are joined:
SELECT firstname, lastname, goals, assists, gws, wins, games_played
FROM player
INNER JOIN
(SELECT idplayer, SUM(goal) AS goals FROM goal GROUP BY idplayer) a
USING (idplayer)
INNER JOIN
(SELECT idplayer, SUM(assist) AS assists FROM assist GROUP BY idplayer) b
USING (idplayer)
INNER JOIN
(SELECT idplayer, SUM(gw) AS gws FROM gw GROUP BY idplayer) c
USING (idplayer)
INNER JOIN
(SELECT idplayer, SUM(win) AS wins FROM win GROUP BY idplayer) d
USING (idplayer)
INNER JOIN
(SELECT idplayer, COUNT(*) AS games_played FROM played GROUP BY idplayer) e
USING (idplayer)
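Here is the same pre-aggregation idea as a runnable sketch with invented data, again using SQLite from Python and showing only goal, assist and gw to keep it short. As a variation on the query above, it uses LEFT JOIN plus COALESCE instead of INNER JOIN. With INNER JOIN, a player who has no rows at all in one of the tables (Bilbo has no gw row here) would disappear from the result. With the variation, that player shows 0 instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player (idplayer INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE goal   (idplayer INTEGER, goal INTEGER);
    CREATE TABLE assist (idplayer INTEGER, assist INTEGER);
    CREATE TABLE gw     (idplayer INTEGER, gw INTEGER);
    INSERT INTO player VALUES (1, 'Frodo', 'Baggins'), (2, 'Bilbo', 'Baggins');
    INSERT INTO goal   VALUES (1, 5), (1, 3), (2, 7);
    INSERT INTO assist VALUES (1, 1), (1, 1), (1, 0), (2, 3);
    INSERT INTO gw     VALUES (1, 1);  -- Bilbo has no gw rows at all
""")

# Each derived table collapses its source table to one row per player
# *before* the join, so no value can be repeated by another table's rows.
result = conn.execute("""
    SELECT firstname, lastname,
           COALESCE(goals, 0), COALESCE(assists, 0), COALESCE(gws, 0)
    FROM player
    LEFT JOIN (SELECT idplayer, SUM(goal) AS goals FROM goal GROUP BY idplayer) a
        USING (idplayer)
    LEFT JOIN (SELECT idplayer, SUM(assist) AS assists FROM assist GROUP BY idplayer) b
        USING (idplayer)
    LEFT JOIN (SELECT idplayer, SUM(gw) AS gws FROM gw GROUP BY idplayer) c
        USING (idplayer)
    ORDER BY player.idplayer
""").fetchall()
for row in result:
    print(row)
# ('Frodo', 'Baggins', 8, 2, 1)
# ('Bilbo', 'Baggins', 7, 3, 0)
```

The derived-table aliases (a, b, c) are the one place where aliases are unavoidable. MySQL requires every derived table to have one.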
For a transaction listing, I need to provide the following columns:
log_out.timestamp
items.description
log_out.qty
category.name
storage.name
log_out.dnr (representing the user's ID)
The structure of the log_out table looks like this:
| id | timestamp | storageid | itemid | qty | categoryid | dnr |
|----|-----------|-----------|--------|-----|------------|-----|
| 1  | ........  | 2         | 23     | 3   | 999        | 123 |
As you might guess, I only store the corresponding IDs from other tables in this table. Note: log_out.id is the primary key of this table.
To get the corresponding strings, ints, or whatever back, I tried two queries.
Approach 1
SELECT i.description, c.name, s.name as sname, l.*
FROM items i, categories c, storages s, log_out l
WHERE l.itemid = i.id AND l.storageid = s.id AND l.categoryid = c.id
ORDER BY l.id DESC
Approach 2
SELECT log_out.id, items.description, storages.name, categories.name AS cat, timestamp, dnr, qty
FROM log_out
INNER JOIN items ON log_out.itemid = items.id
INNER JOIN storages ON log_out.storageid = storages.id
INNER JOIN categories ON log_out.categoryid = categories.id
ORDER BY log_out.id DESC
Both work fine on my development machine, which has about 99 dummy transactions stored in log_out. The database on the main server has about 1,100+ transactions in that table, and that's where the trouble begins. No matter which of these two approaches I run on the main machine, it always returns 0 rows without any error *sigh*.
At first I thought it was because the main machine uses MariaDB instead of MySQL. But after I imported the remote log_out table to my dev machine, it behaves the same way as the main machine: it returns 0 rows without an error.
Do you have any idea what's going on?
If the table has the data, the problem probably lies in the JOINs and the related records in the corresponding tables. I would start with the log_out table alone and add the other tables to the JOIN one at a time, e.g.:
SELECT *
FROM log_out;
SELECT *
FROM log_out
INNER JOIN items ON log_out.itemid = items.id;
SELECT *
FROM log_out
INNER JOIN items ON log_out.itemid = items.id
INNER JOIN storages ON log_out.storageid = storages.id;
SELECT *
FROM log_out
INNER JOIN items ON log_out.itemid = items.id
INNER JOIN storages ON log_out.storageid = storages.id
INNER JOIN categories ON log_out.categoryid = categories.id;
I would execute the queries one by one and see which one returns 0 records. The join added in that query is the one with the data discrepancy.
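The same step-by-step check can be scripted so that it reports a row count at each stage. This is a minimal sketch using SQLite from Python. The table layout follows the question, but all data is invented, and the categories table deliberately lacks the category the log_out row points to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items      (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE storages   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE log_out (id INTEGER PRIMARY KEY, timestamp INTEGER,
                          storageid INTEGER, itemid INTEGER, qty INTEGER,
                          categoryid INTEGER, dnr INTEGER);
    INSERT INTO items      VALUES (23, 'Widget');
    INSERT INTO storages   VALUES (2, 'Shelf A');
    INSERT INTO categories VALUES (1, 'Tools');  -- invented: no category 5
    INSERT INTO log_out VALUES (1, 0, 2, 23, 3, 5, 123);
""")

# Same sequence as the four queries above, but counting rows at each step.
joins = [
    "INNER JOIN items ON log_out.itemid = items.id",
    "INNER JOIN storages ON log_out.storageid = storages.id",
    "INNER JOIN categories ON log_out.categoryid = categories.id",
]
query = "SELECT COUNT(*) FROM log_out"
counts = [conn.execute(query).fetchone()[0]]
for join in joins:
    query += " " + join
    counts.append(conn.execute(query).fetchone()[0])

print(counts)  # [1, 1, 1, 0] -> the categories join drops the rows
```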
Your queries look fine to me, which makes me think it is probably something unexpected in the data. Most likely the IDs in your joins are not maintained correctly (do all of them have a foreign key constraint?). I would dig around in the data, e.g. with SELECT COUNT(*) FROM items WHERE id IN (SELECT itemid FROM log_out), and see whether the results make sense. Sorry I can't offer more advice, but I would be interested to hear whether the problem is in the data itself.
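Another way to dig around in the data is an anti-join, which lists the log_out rows whose itemid has no match in items. Here is a minimal SQLite sketch with invented data. The same pattern applies to storageid/storages and categoryid/categories.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items   (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE log_out (id INTEGER PRIMARY KEY, itemid INTEGER, qty INTEGER);
    INSERT INTO items   VALUES (23, 'Widget'), (24, 'Gadget');
    INSERT INTO log_out VALUES (1, 23, 3), (2, 99, 1), (3, 24, 2), (4, 99, 5);
""")

# LEFT JOIN keeps every log_out row; rows without a matching item get NULL
# in items.id, so filtering on IS NULL lists the orphaned references.
orphans = conn.execute("""
    SELECT log_out.id, log_out.itemid
    FROM log_out
    LEFT JOIN items ON log_out.itemid = items.id
    WHERE items.id IS NULL
    ORDER BY log_out.id
""").fetchall()
print(orphans)  # [(2, 99), (4, 99)]
```

An empty result here means every itemid resolves. Any rows it returns are the ones an INNER JOIN would silently drop.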
Because I'm working with a framework (Magento), I don't have direct control over the SQL that is actually executed. I can build various parts of the query, but in different contexts it's modified in different ways before it goes to the database.
Here is a simplified example of what I'm working with.
students
id | name
---+------
1  | paul
2  | james
3  | jo

enrolments
student_id | class
-----------+--------
1          | biology
1          | english
2          | maths
2          | english
2          | french
3          | physics
3          | maths
A query to show all students who are studying English, together with all the courses those students are enrolled on, would be:
SELECT name, GROUP_CONCAT(enrolments.class) AS classes
FROM students LEFT JOIN enrolments ON students.id=enrolments.student_id
WHERE students.id IN ( SELECT e.student_id
FROM enrolments AS e
WHERE e.class LIKE "english" )
GROUP BY students.id
This will give the expected results:
name  | classes
------+---------------------
paul  | biology,english
james | maths,english,french
Counting the number of students who study English would be trivial if it weren't for the fact that Magento automatically reuses portions of my first query. For the count, it modifies my original query as follows:
Removes the selected columns (here, name and classes).
Adds a COUNT(*) column to the SELECT.
Removes any GROUP BY clause.
After this butchery, my query above becomes:
SELECT COUNT(*)
FROM students LEFT JOIN enrolments ON students.id=enrolments.student_id
WHERE students.id IN ( SELECT e.student_id
FROM enrolments AS e
WHERE e.class LIKE "english" )
Which will not give me the number of students enrolled on the English course as I require. Instead it will give me the combined number of enrolments of all students who are enrolled on the English course.
I'm trying to come up with a query that can be used in both contexts, counting and getting the rows. I get to keep any JOIN clauses and WHERE clauses, and that's about it.
The problem with your original query is the GROUP BY clause. Selecting COUNT(*) while keeping the GROUP BY clause would return two rows, one per student, each holding that student's number of classes:
| COUNT(*) |
|----------|
| 2 |
| 3 |
Removing the GROUP BY clause will just return the number of all rows from the LEFT JOIN:
| COUNT(*) |
|----------|
| 5 |
The only way I see for Magento to solve that problem is to put the original query into a subquery (derived table) and count the rows of the result. But that might end up with terrible performance. I would also be fine with an exception complaining that a query with a GROUP BY clause cannot be used for pagination (or something like that). Returning an unexpected result is probably the worst thing a library can do.
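The difference between the count Magento produces and a derived-table count can be checked directly. This is a minimal sketch using SQLite from Python with the sample data from the question. SQLite's group_concat stands in for MySQL's GROUP_CONCAT, and single quotes are used for the string literal, since SQLite treats double quotes as identifiers first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrolments (student_id INTEGER, class TEXT);
    INSERT INTO students VALUES (1, 'paul'), (2, 'james'), (3, 'jo');
    INSERT INTO enrolments VALUES
        (1, 'biology'), (1, 'english'), (2, 'maths'), (2, 'english'),
        (2, 'french'), (3, 'physics'), (3, 'maths');
""")

from_where = """
    FROM students LEFT JOIN enrolments ON students.id = enrolments.student_id
    WHERE students.id IN (SELECT e.student_id FROM enrolments AS e
                          WHERE e.class LIKE 'english')
"""
original = ("SELECT name, GROUP_CONCAT(enrolments.class) AS classes"
            + from_where + " GROUP BY students.id")

# What the rewritten count query does: COUNT(*) with the GROUP BY removed.
butchered = conn.execute("SELECT COUNT(*)" + from_where).fetchone()[0]
# Wrapping the untouched original query in a derived table counts its rows.
wrapped = conn.execute("SELECT COUNT(*) FROM (" + original + ") AS t").fetchone()[0]
print(butchered, wrapped)  # 5 2
```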
Well, it just so happens I have a solution. :-)
Use a correlated subquery for GROUP_CONCAT in the SELECT clause. This way you will not need a GROUP BY clause.
SELECT name, (SELECT GROUP_CONCAT(enrolments.class)
FROM enrolments
WHERE enrolments.student_id = students.id
) AS classes
FROM students
WHERE students.id IN ( SELECT e.student_id
FROM enrolments AS e
WHERE e.class LIKE "english" )
However, I would rewrite the query to use an INNER JOIN instead of an IN condition:
SELECT s.name, (
SELECT GROUP_CONCAT(e2.class)
FROM enrolments e2
WHERE e2.student_id = s.id
) AS classes
FROM students s
JOIN enrolments e1
ON e1.student_id = s.id
WHERE e1.class = "english";
Both queries will return the same result as your original one.
| name | classes |
|-------|----------------------|
| paul | biology,english |
| james | maths,english,french |
But they also return the correct count when modified by Magento:
| COUNT(*) |
|----------|
| 2 |
Demo: http://rextester.com/OJRU38109
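As a local check of both contexts, here is a minimal SQLite sketch of the JOIN-based version with the question's sample data. It runs the query once with the correlated subquery in the SELECT list and once with that list replaced by COUNT(*), which mimics what Magento does. The ORDER BY is added only to make the output deterministic. GROUP_CONCAT element order is not guaranteed, so only the set of classes should be relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrolments (student_id INTEGER, class TEXT);
    INSERT INTO students VALUES (1, 'paul'), (2, 'james'), (3, 'jo');
    INSERT INTO enrolments VALUES
        (1, 'biology'), (1, 'english'), (2, 'maths'), (2, 'english'),
        (2, 'french'), (3, 'physics'), (3, 'maths');
""")

# The parts Magento keeps: the JOIN and WHERE clauses.
from_where = """
    FROM students s
    JOIN enrolments e1 ON e1.student_id = s.id
    WHERE e1.class = 'english'
"""
# The part Magento replaces: the SELECT list, including the correlated subquery.
select_list = """
    SELECT s.name,
           (SELECT GROUP_CONCAT(e2.class) FROM enrolments e2
            WHERE e2.student_id = s.id) AS classes
"""
rows = conn.execute(select_list + from_where + " ORDER BY s.id").fetchall()
count = conn.execute("SELECT COUNT(*)" + from_where).fetchone()[0]
print(rows)
print(count)  # 2
```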
Additionally, chances are good that it will even perform better, because MySQL's optimizer often creates bad execution plans for queries with JOINs and GROUP BY.
I've searched for this and found similar questions, but the solution that worked for most people (using an outer join) is not working for me. I originally had an inner join and switched it to an outer join, but I get the same results. The query is based on certain account numbers and shows their total sales. If an account has 0 sales, it does not show up, and I need it to show up. Here is my query:
Select a.accountnumber, SUM(a.totalsales) as Amount, c.companyname
FROM Sales a LEFT OUTER JOIN Accounts c on (a.Accountnumber = c.Accountnumber)
WHERE a.Salesdate between '1/1/2016' and '1/27/2016'
AND a.Accountnumber in ('1','2','3','4')
GROUP BY a.Accountnumber, c.companyname
And I get results like:
Accountnumber | Amount | Company
1 | 250.00 | A
3 | 500.00 | B
Since account numbers 2 and 4 don't have an amount, they don't show up. I would like them to show up like this:
Accountnumber | Amount | Company
1 | 250.00 | A
2 | 0 | B
3 | 250.00 | C
4 | 0 | D
How can I achieve this? Any help would be appreciated. Thank you!
I think that RIGHT JOIN will not work, since there are conditions in WHERE.
Try this:
SELECT
c.accountnumber,
COALESCE(SUM(a.totalsales),0) AS Amount,
c.companyname
FROM Accounts c
LEFT OUTER JOIN Sales a
ON a.Accountnumber = c.Accountnumber
AND a.Salesdate BETWEEN '1/1/2016' AND '1/27/2016'
WHERE
c.Accountnumber IN ('1', '2', '3', '4')
GROUP BY c.Accountnumber, c.companyname
Just to clarify: the problem is not which JOIN is used (it can be either), but the WHERE condition on non-existing (NULL) values. All unmatched columns from the outer-joined table are NULL, so any condition applied to them in the WHERE clause (other than IS NULL) effectively turns the outer join into an inner join. See: http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins
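The WHERE-versus-ON difference can be reproduced with a small SQLite sketch. The table and column names follow the question, but the data is invented. Dates are stored as ISO 'YYYY-MM-DD' strings because SQLite compares them as text. How literals like '1/1/2016' are interpreted depends on the database server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Accounts (Accountnumber TEXT PRIMARY KEY, companyname TEXT);
    CREATE TABLE Sales (Accountnumber TEXT, Salesdate TEXT, totalsales REAL);
    INSERT INTO Accounts VALUES ('1', 'A'), ('2', 'B'), ('3', 'C'), ('4', 'D');
    INSERT INTO Sales VALUES
        ('1', '2016-01-05', 250.0),
        ('3', '2016-01-10', 500.0),
        ('2', '2015-12-30', 75.0);  -- outside the date range
""")

# Date filter in WHERE: unmatched rows have NULL Salesdate and fail the test,
# so the outer join behaves like an inner join.
in_where = conn.execute("""
    SELECT c.Accountnumber, COALESCE(SUM(a.totalsales), 0)
    FROM Accounts c LEFT JOIN Sales a ON a.Accountnumber = c.Accountnumber
    WHERE a.Salesdate BETWEEN '2016-01-01' AND '2016-01-27'
      AND c.Accountnumber IN ('1', '2', '3', '4')
    GROUP BY c.Accountnumber ORDER BY c.Accountnumber
""").fetchall()

# Date filter in ON: it only decides which Sales rows match; every account stays.
in_on = conn.execute("""
    SELECT c.Accountnumber, COALESCE(SUM(a.totalsales), 0)
    FROM Accounts c LEFT JOIN Sales a
      ON a.Accountnumber = c.Accountnumber
     AND a.Salesdate BETWEEN '2016-01-01' AND '2016-01-27'
    WHERE c.Accountnumber IN ('1', '2', '3', '4')
    GROUP BY c.Accountnumber ORDER BY c.Accountnumber
""").fetchall()
print(in_where)  # [('1', 250.0), ('3', 500.0)]
print(in_on)     # [('1', 250.0), ('2', 0), ('3', 500.0), ('4', 0)]
```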
You have two options:
Modify the query to select from the Accounts table first and then join the Sales table afterwards.
FROM Accounts c
LEFT OUTER JOIN Sales a on (a.Accountnumber = c.Accountnumber)
Use a RIGHT join instead of a LEFT one.
FROM Sales a
RIGHT OUTER JOIN Accounts c on (a.Accountnumber = c.Accountnumber)
Change it to a RIGHT OUTER JOIN and it should work.
A left join says: keep everything in the left table, i.e. the table before the LEFT JOIN, even if it doesn't exist in the second table. In your case, I sure hope there aren't any sales without accounts. To keep all the accounts even if they don't have a sale, you need a RIGHT OUTER JOIN. Alternatively, you could change the order of the tables and do FROM Accounts c LEFT OUTER JOIN Sales a ...
Edit:
Blind got it. RIGHT OUTER JOIN is what you need given the way you wrote the query, but then every column from the Sales table is NULL for accounts 2 and 4. So a.Accountnumber can't be in ('1','2','3','4') and a.Salesdate can't be between the dates, and those rows don't make it into the results.
I have one users table and 10 tables (articles, news, ...) where I save users' publications. I want to show how many publications each user has, in one query:
| ID_USER | COUNT(id_article) | COUNT(id_news) | etc...
-------------------------------------------------
| 1 | 0 | 3 |
| 2 | 2 | 9 |
| 3 | 14 | 5 |
| 4 | 0 | 0 |
If I use this query to show the number of articles...
SELECT id_user,COUNT(articles.id_article) FROM users
LEFT JOIN articles ON articles.id_user_article=users.id_user
GROUP BY users.id_user
... it shows the information correctly. But if I start to add the second table...
SELECT id_user,COUNT(articles.id_article),COUNT(news.id_news) FROM users
LEFT JOIN articles ON articles.id_user_article=users.id_user
LEFT JOIN news ON news.id_user_news=users.id_user
GROUP BY users.id_user
... it doesn't show the correct information. And if I join all the remaining tables, it shows really strange results (thousands of articles for the first user, and NULL for the rest).
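The wrong numbers come from the same row multiplication as in any multi-join aggregate. Here is a minimal SQLite sketch with invented data (user 1 has 0 articles and 3 news items, user 2 has 2 articles and 3 news items) that reproduces it with the two-table query from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id_user INTEGER PRIMARY KEY);
    CREATE TABLE articles (id_article INTEGER PRIMARY KEY, id_user_article INTEGER);
    CREATE TABLE news     (id_news INTEGER PRIMARY KEY, id_user_news INTEGER);
    INSERT INTO users VALUES (1), (2);
    INSERT INTO articles VALUES (1, 2), (2, 2);
    INSERT INTO news VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 2);
""")

rows = conn.execute("""
    SELECT users.id_user, COUNT(articles.id_article), COUNT(news.id_news)
    FROM users
    LEFT JOIN articles ON articles.id_user_article = users.id_user
    LEFT JOIN news ON news.id_user_news = users.id_user
    GROUP BY users.id_user
    ORDER BY users.id_user
""").fetchall()
# User 2: 2 articles x 3 news = 6 joined rows, so both counts read 6.
print(rows)  # [(1, 0, 3), (2, 6, 6)]
```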
What is the correct way to show this information using only one query? Thank you!
You can use a subselect instead of a left join for each table. The final result will be the same, but it may be clearer this way.
SELECT u.id_user,
(SELECT COUNT(a.id_article)
FROM articles a
WHERE a.id_user_article = u.id_user) AS articles,
(SELECT COUNT(n.id_news)
FROM news n
WHERE n.id_user_news = u.id_user) AS news
FROM users u
Also, if you only use one column from each table, a subselect is a better option than multiple left joins.
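Here is the subselect approach run against the same invented data in SQLite. Because each scalar subquery counts only its own table, nothing is multiplied, and COUNT over no rows already yields 0, so users without publications need no special handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id_user INTEGER PRIMARY KEY);
    CREATE TABLE articles (id_article INTEGER PRIMARY KEY, id_user_article INTEGER);
    CREATE TABLE news     (id_news INTEGER PRIMARY KEY, id_user_news INTEGER);
    INSERT INTO users VALUES (1), (2);
    INSERT INTO articles VALUES (1, 2), (2, 2);
    INSERT INTO news VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 2);
""")

# One correlated subquery per publication table; each runs per user row.
rows = conn.execute("""
    SELECT u.id_user,
           (SELECT COUNT(a.id_article) FROM articles a
            WHERE a.id_user_article = u.id_user) AS articles,
           (SELECT COUNT(n.id_news) FROM news n
            WHERE n.id_user_news = u.id_user) AS news
    FROM users u
    ORDER BY u.id_user
""").fetchall()
print(rows)  # [(1, 0, 3), (2, 2, 3)]
```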
Your problem is that you are joining along different dimensions, which creates cartesian products for each user. The solution by #rafa is actually a fine solution in MySQL. The use of count(distinct) works okay, but only when the counts are not very large. Another approach is to pre-aggregate the results along each dimension:
SELECT u.id_user, COALESCE(a.articles, 0) AS articles, COALESCE(n.news, 0) AS news
FROM users u left outer join
(select id_user_article, count(*) as articles
from articles
group by id_user_article
) a
on u.id_user = a.id_user_article left outer join
(select id_user_news, count(*) as news
from news
group by id_user_news
) n
on u.id_user = n.id_user_news;
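Here is the pre-aggregation approach as a runnable SQLite sketch with the same invented data. It wraps the joined counts in COALESCE so that a user with no rows in a table shows 0 rather than NULL, which matches the output the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id_user INTEGER PRIMARY KEY);
    CREATE TABLE articles (id_article INTEGER PRIMARY KEY, id_user_article INTEGER);
    CREATE TABLE news     (id_news INTEGER PRIMARY KEY, id_user_news INTEGER);
    INSERT INTO users VALUES (1), (2);
    INSERT INTO articles VALUES (1, 2), (2, 2);
    INSERT INTO news VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 2);
""")

# Each derived table yields at most one row per user, so the joins can't fan out.
rows = conn.execute("""
    SELECT u.id_user, COALESCE(a.articles, 0), COALESCE(n.news, 0)
    FROM users u
    LEFT OUTER JOIN (SELECT id_user_article, COUNT(*) AS articles
                     FROM articles GROUP BY id_user_article) a
        ON u.id_user = a.id_user_article
    LEFT OUTER JOIN (SELECT id_user_news, COUNT(*) AS news
                     FROM news GROUP BY id_user_news) n
        ON u.id_user = n.id_user_news
    ORDER BY u.id_user
""").fetchall()
print(rows)  # [(1, 0, 3), (2, 2, 3)]
```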
EDIT:
If you are using the count(distinct) approach, then you are generating a cross product. If every user had 3 articles and 4 news items, each user's row would be repeated 12 times before the counts are taken. Probably feasible.
If every user had 300 articles and 400 news items, each user's row would be repeated 120,000 times. Probably not feasible.
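To make the cost visible, here is a sketch of the count(distinct) approach on the same invented SQLite data. It assumes that approach means COUNT(DISTINCT ...) over the original multi-join, since the answer does not spell out the query. The COUNT(*) column shows how many joined rows each user produces before the distinct counts are taken.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id_user INTEGER PRIMARY KEY);
    CREATE TABLE articles (id_article INTEGER PRIMARY KEY, id_user_article INTEGER);
    CREATE TABLE news     (id_news INTEGER PRIMARY KEY, id_user_news INTEGER);
    INSERT INTO users VALUES (1), (2);
    INSERT INTO articles VALUES (1, 2), (2, 2);
    INSERT INTO news VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 2);
""")

rows = conn.execute("""
    SELECT users.id_user,
           COUNT(*)                            AS joined_rows,
           COUNT(DISTINCT articles.id_article) AS articles,
           COUNT(DISTINCT news.id_news)        AS news
    FROM users
    LEFT JOIN articles ON articles.id_user_article = users.id_user
    LEFT JOIN news ON news.id_user_news = users.id_user
    GROUP BY users.id_user
    ORDER BY users.id_user
""").fetchall()
# The distinct counts are right, but user 2 still expands to 2 x 3 = 6 rows.
print(rows)  # [(1, 3, 0, 3), (2, 6, 2, 3)]
```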
I have a table named phpbb_pcp_market with these rows: http://pastebin.com/ZAFjawD8 (there are more, obviously).
And I have another table named phpbb_pcp_market_cart that looks like this:
+----+---------+-----------+------------+
| id | item_id | player_id | time |
+----+---------+-----------+------------+
| 14 | 49 | 3 | 1384806292 |
+----+---------+-----------+------------+
I need to join these two tables based on item_id, but for some reason it's not working.
This is the query I've used:
SELECT m.*, c.* FROM (phpbb_pcp_market_cart c)
LEFT JOIN phpbb_pcp_market m
ON (c.item_id = m.item_id)
WHERE c.player_id = 3
ORDER BY c.time
For some reason, it's returning nothing.
I can't figure what I did wrong in the query. And no, I'm not good at SQL.
Everything looks fine with your SQL code.
Look in the rest of your PHP code for something wrong. The bug is not in the SQL part ;)
First double check your data, your query seems to be OK.
If you want to select all items for a specific player_id, use a plain JOIN instead of a LEFT JOIN, because you don't need rows where the market columns could be NULL.
Also, the parentheses can be left out for simplicity:
SELECT m.*, c.* FROM phpbb_pcp_market_cart c
JOIN phpbb_pcp_market m
ON c.item_id = m.item_id
WHERE c.player_id = 3
ORDER BY c.time
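A quick way to see why both answers point away from the join itself: with phpbb_pcp_market_cart on the left of a LEFT JOIN, a cart row for player 3 comes back even when the market has no matching item_id, just with NULL market columns. So an empty result suggests that the database actually being queried has no matching cart row, or that the problem lies elsewhere. Here is a minimal SQLite sketch. The market table's `name` column and all market data are hypothetical, since the real columns are only in the pastebin. The cart row is the one from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phpbb_pcp_market (item_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE phpbb_pcp_market_cart (id INTEGER PRIMARY KEY, item_id INTEGER,
                                        player_id INTEGER, time INTEGER);
    INSERT INTO phpbb_pcp_market VALUES (50, 'Sword');  -- hypothetical; no item 49
    INSERT INTO phpbb_pcp_market_cart VALUES (14, 49, 3, 1384806292);
""")

def run(join_kind):
    # Same query shape as above, with the join type swapped in.
    return conn.execute(f"""
        SELECT m.*, c.* FROM phpbb_pcp_market_cart c
        {join_kind} phpbb_pcp_market m ON c.item_id = m.item_id
        WHERE c.player_id = 3
        ORDER BY c.time
    """).fetchall()

print(run("LEFT JOIN"))  # [(None, None, 14, 49, 3, 1384806292)]
print(run("JOIN"))       # []
```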