COUNT(*) FROM different tables with LEFT JOIN - mysql

I have one table of users and 10 tables (articles, news, ...) where I save users' publications. I want to show how many publications each user has, in one query:
| ID_USER | COUNT(id_article) | COUNT(id_news) | etc...
-------------------------------------------------
| 1 | 0 | 3 |
| 2 | 2 | 9 |
| 3 | 14 | 5 |
| 4 | 0 | 0 |
If I use this query to show the number of articles...
SELECT id_user,COUNT(articles.id_article) FROM users
LEFT JOIN articles ON articles.id_user_article=users.id_user
GROUP BY users.id_user
... it shows the information correctly. But if I start to add the second table...
SELECT id_user,COUNT(articles.id_article),COUNT(news.id_news) FROM users
LEFT JOIN articles ON articles.id_user_article=users.id_user
LEFT JOIN news ON news.id_user_news=users.id_user
GROUP BY users.id_user
... it doesn't show the correct information, and if I join all the remaining tables, it shows a really strange result (thousands of articles for the first user, and NULL for the rest).
What is the correct way to show this information using only one query? Thank you!

You can use a subselect instead of a left join for each table. The final result will be the same, but it may be clearer that way.
SELECT u.id_user,
(SELECT COUNT(a.id_article)
FROM articles a
WHERE a.id_user_article = u.id_user) AS articles,
(SELECT COUNT(n.id_news)
FROM news n
WHERE n.id_user_news = u.id_user) AS news
FROM users u
Also, if you only use one column from each table, a subselect is a better option than multiple left joins.

Your problem is that you are joining along different dimensions, which creates a Cartesian product for each user. The solution by @rafa is actually a fine solution in MySQL. Using count(distinct) works okay, but only when the counts are not very large. Another approach is to pre-aggregate the results along each dimension:
SELECT u.id_user, COALESCE(a.articles, 0) AS articles, COALESCE(n.news, 0) AS news
FROM users u left outer join
(select id_user_article, count(*) as articles
from articles
group by id_user_article
) a
on u.id_user = a.id_user_article left outer join
(select id_user_news, count(*) as news
from news
group by id_user_news
) n
on u.id_user = n.id_user_news;
EDIT:
If you are using the count(distinct) approach, then you are generating a cross product. If every user had 3 articles and 4 news items, then the users would be multiplied by 12. Probably feasible.
If every user had 300 articles and 400 news items, then every user would be multiplied by 120,000. Probably not feasible.
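The multiplication described above can be reproduced on a toy data set. This is only a minimal sketch: it uses Python's built-in sqlite3 module rather than MySQL so that it is self-contained (an assumption of this example), the table and column names follow the question, and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id_user INTEGER PRIMARY KEY);
    CREATE TABLE articles (id_article INTEGER PRIMARY KEY, id_user_article INTEGER);
    CREATE TABLE news     (id_news INTEGER PRIMARY KEY, id_user_news INTEGER);
    INSERT INTO users VALUES (1), (2);
    -- Invented sample data: user 1 has 2 articles and 3 news items, user 2 has none.
    INSERT INTO articles (id_user_article) VALUES (1), (1);
    INSERT INTO news (id_user_news) VALUES (1), (1), (1);
""")

# Joining both tables directly pairs every article with every news item.
naive = conn.execute("""
    SELECT u.id_user, COUNT(a.id_article), COUNT(n.id_news)
    FROM users u
    LEFT JOIN articles a ON a.id_user_article = u.id_user
    LEFT JOIN news n ON n.id_user_news = u.id_user
    GROUP BY u.id_user
    ORDER BY u.id_user
""").fetchall()

# Counting inside each subquery first means there is nothing left to multiply.
pre_aggregated = conn.execute("""
    SELECT u.id_user, COALESCE(a.articles, 0), COALESCE(n.news, 0)
    FROM users u
    LEFT JOIN (SELECT id_user_article, COUNT(*) AS articles
               FROM articles GROUP BY id_user_article) a
           ON a.id_user_article = u.id_user
    LEFT JOIN (SELECT id_user_news, COUNT(*) AS news
               FROM news GROUP BY id_user_news) n
           ON n.id_user_news = u.id_user
    ORDER BY u.id_user
""").fetchall()

print(naive)           # user 1 shows 2 * 3 = 6 in both columns
print(pre_aggregated)  # user 1 shows 2 articles and 3 news items
```

With 2 articles and 3 news items the naive query reports 6 for both counts, which is the same effect as the 12 and 120,000 row multipliers mentioned above.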

Related

SELECT using three tables w/ 1000+ entries

For transaction listing I need to provide the following columns:
log_out.timestamp
items.description
log_out.qty
category.name
storage.name
log_out.dnr ( Representing the users id )
Table structure from log_out looks like this:
| id | timestamp | storageid | itemid | qty | categoryid | dnr |
| | | | | | | |
| 1 | ........ | 2 | 23 | 3 | 999 | 123 |
As one could guess, I only store the corresponding IDs from other tables in this table. Note: log_out.id is the primary key in this table.
To get the corresponding strings, ints, or whatever back, I tried two queries.
Approach 1
SELECT i.description, c.name, s.name as sname, l.*
FROM items i, categories c, storages s, log_out l
WHERE l.itemid = i.id AND l.storageid = s.id AND l.categoryid = c.id
ORDER BY l.id DESC
Approach 2
SELECT log_out.id, items.description, storages.name, categories.name AS cat, timestamp, dnr, qty
FROM log_out
INNER JOIN items ON log_out.itemid = items.id
INNER JOIN storages ON log_out.storageid = storages.id
INNER JOIN categories ON log_out.categoryid = categories.id
ORDER BY log_out.id DESC
They both work fine on my development machine, which has approx. 99 dummy transactions stored in log_out. The DB on the main server has something like 1100+ transactions stored in the table. And that's where the trouble begins. No matter which of these two approaches I run on the main machine, it always returns 0 rows without any error *sigh*.
First I thought, it's because the main machine uses MariaDB instead of MySQL. But after I imported the remote's log_out table to my dev-machine, it does the same as the main machine -> return 0 rows w/o error.
You guys got any idea what's going on ?
If the table has the data then it probably has something to do with JOIN and related records in corresponding tables. I would start with log_out table and incrementally add the other tables in the JOIN, e.g.:
SELECT *
FROM log_out;
SELECT *
FROM log_out
INNER JOIN items ON log_out.itemid = items.id;
SELECT *
FROM log_out
INNER JOIN items ON log_out.itemid = items.id
INNER JOIN storages ON log_out.storageid = storages.id;
SELECT *
FROM log_out
INNER JOIN items ON log_out.itemid = items.id
INNER JOIN storages ON log_out.storageid = storages.id
INNER JOIN categories ON log_out.categoryid = categories.id;
I would execute all the queries one by one and see which one results in 0 records. The join added in that query would be the one with the data discrepancy.
Your queries look fine to me, which makes me think that it is probably something unexpected with the data. Most likely the ids in your joins are not maintained correctly (do all of them have a foreign key constraint?). I would dig around the data, with queries like SELECT COUNT(*) FROM items WHERE id IN (SELECT itemid FROM log_out), etc., and see if the results make sense. Sorry I can't offer more advice, but I would be interested in hearing if the problem is in the data itself.
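One way to do that digging is an anti-join: LEFT JOIN each parent table and keep only the log_out rows that found no match. The sketch below is an illustration under assumptions, not a diagnosis of the actual database: it uses Python's sqlite3 module instead of MySQL/MariaDB, a cut-down version of the tables from the question, invented rows, and a hypothetical helper named orphan_count.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL/MariaDB (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items      (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE log_out    (id INTEGER PRIMARY KEY, itemid INTEGER, categoryid INTEGER);
    INSERT INTO items VALUES (23, 'widget');
    INSERT INTO categories VALUES (1, 'tools');
    -- Invented row: categoryid 999 has no matching row in categories.
    INSERT INTO log_out VALUES (1, 23, 999);
""")

def orphan_count(child_column, parent_table):
    """Count log_out rows whose child_column has no match in parent_table.

    Hypothetical helper; identifiers are interpolated into the SQL text,
    so only pass trusted, hard-coded names.
    """
    sql = f"""
        SELECT COUNT(*)
        FROM log_out l
        LEFT JOIN {parent_table} p ON p.id = l.{child_column}
        WHERE p.id IS NULL
    """
    return conn.execute(sql).fetchone()[0]

orphans = {
    "items": orphan_count("itemid", "items"),
    "categories": orphan_count("categoryid", "categories"),
}
print(orphans)
```

Any table with a non-zero count is one whose INNER JOIN would drop rows; if every log_out row is an orphan for some table, the full query returns nothing.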

Mysql select between two table without limiting if record appear on the joined table

I have been trying to figure out how to select data related to one id across two tables without limiting the result to rows that exist in the joined table. I tried using UNION, INNER JOIN, and JOIN, but they only show records that are in both tables. For example:
Table 1 (users)
id | name | register
1 | John | 2014-03-01
2 | Kate | 2014-03-02
etc..
Table 2 (birthdays by example)
id | user | birthday
1 | 1 | 1989-09-09
Note that Kate doesn't have a record in the birthdays table. If I do:
SELECT U.id, name, register, B.birthday FROM users as U INNER JOIN birthday as B ON B.user = U.id
it will only show John's data. I would like to select all my users, even if a record does not exist in the joined table, something like:
id | name | register | birthday
1 | John | 2014-03-01 | 1989-09-09
2 | kate | 2014-03-02 | null or ''
3
4
etc.
Sorry if it's a stupid question, but I can't figure this one out. I would appreciate the help.
Regards
You need a LEFT OUTER JOIN instead of the plain JOIN (also known as INNER JOIN), like this:
SELECT U.id, name, register, B.birthday
FROM users as U
LEFT JOIN birthday as B
ON B.user = U.id
A LEFT JOIN between users and birthday tables will contain all records of the "left" table (users), even if the join-condition does not find any matching record in the "right" table (birthday).
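To make the difference concrete, here is a small sketch that runs both join types on the two sample rows from the question. It uses Python's sqlite3 module instead of MySQL so that it runs on its own; that substitution is an assumption of the example, and the SQL itself is the same.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT, register TEXT);
    CREATE TABLE birthday (id INTEGER PRIMARY KEY, user INTEGER, birthday TEXT);
    INSERT INTO users VALUES (1, 'John', '2014-03-01'), (2, 'Kate', '2014-03-02');
    INSERT INTO birthday VALUES (1, 1, '1989-09-09');
""")

# INNER JOIN keeps only users that have a matching birthday row.
inner_rows = conn.execute("""
    SELECT U.id, U.name, U.register, B.birthday
    FROM users AS U
    INNER JOIN birthday AS B ON B.user = U.id
    ORDER BY U.id
""").fetchall()

# LEFT JOIN keeps every user and fills the missing birthday with NULL.
left_rows = conn.execute("""
    SELECT U.id, U.name, U.register, B.birthday
    FROM users AS U
    LEFT JOIN birthday AS B ON B.user = U.id
    ORDER BY U.id
""").fetchall()

print(inner_rows)
print(left_rows)
```

Kate only appears in the LEFT JOIN result, with None (SQL NULL) in the birthday column.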
This excellent article on The Code Project will help you a lot: Visual Representation of SQL Joins.
Summary of all JOIN types: (diagram not reproduced here)
Note: MySQL does not support FULL OUTER JOIN, but it can be emulated. Useful articles:
https://stackoverflow.com/a/4796911
http://www.sql-tutorial.ru/en/book_full_join_and_mysql.html
http://www.xaprb.com/blog/2006/05/26/how-to-write-full-outer-join-in-mysql/
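One common emulation, sketched here under assumptions, is a LEFT JOIN combined via UNION ALL with the right-hand rows that have no match on the left. The example uses Python's sqlite3 module in place of MySQL, and the second birthday row (for a user id 3 that does not exist in users) is invented to show the right-only case.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE birthday (id INTEGER PRIMARY KEY, user INTEGER, birthday TEXT);
    INSERT INTO users VALUES (1, 'John'), (2, 'Kate');
    -- Invented data: the second birthday points at a user id that is not in users.
    INSERT INTO birthday VALUES (1, 1, '1989-09-09'), (2, 3, '1975-01-01');
""")

full_rows = conn.execute("""
    -- Part 1: every user, with the birthday when there is one.
    SELECT U.id AS user_id, U.name, B.birthday
    FROM users AS U
    LEFT JOIN birthday AS B ON B.user = U.id
    UNION ALL
    -- Part 2: birthdays that matched no user (the rows part 1 cannot produce).
    SELECT B.user, NULL, B.birthday
    FROM birthday AS B
    LEFT JOIN users AS U ON U.id = B.user
    WHERE U.id IS NULL
    ORDER BY 1
""").fetchall()

print(full_rows)
```

Using UNION ALL with an explicit "no match" filter in the second part avoids accidentally collapsing legitimate duplicate rows, which a plain UNION would do.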
Use a left outer join instead of an inner join:
SELECT U.id, name, register, B.birthday
FROM users as U left join birthday as B ON B.user = U.id

MySQL select distinct across multiple tables

I have a query that selects all columns from multiple tables, but it's returning multiples of the same values (I only want distinct values).
How can I incorporate something like this? When I try this, it still
Select Distinct A.*, B.*, C.*....
Does distinct only work when selecting the column names and not all (*) ? In this reference it says distinct in reference to column names, not across all of the tables. Is there any way that I can do this?
edit - I added more info below
Sorry guys, I just got back onto my computer. Also, I just realized that my query itself is the issue, and Distinct has nothing to do with it.
So, the overall goal of my Query is to do the following
Generate a list of friends that a user has
Go through the friends and check their activities (posting, adding friends, etc..)
Display a list of friends and their activities sorted by date (I guess like a facebook wall kind of deal).
Here are my tables
update_id | update | userid | timestamp //updates table
post_id | post | userid | timestamp //posts table
user_1 | user_2 | status | timestamp //friends table
Here is my query
SELECT U.* , P.* ,F.* FROM posts AS P
JOIN updates AS U ON P.userid = U.userid
JOIN friends AS F ON P.userid = F.user_2 or F.user_1
WHERE P.userid IN (
select user_1 from friends where user_2 = '1'
union
select user_2 from friends where user_1 = '1'
union
select userid from org_members where org_id = '1'
union
select org_id from org_members where userid = '1'
)
ORDER BY P.timestamp, U.timestamp, F.timestamp limit 30
The issue I'm having with this (that I thought was related to distinct), is that if values are found to meet the requirements in, say table Friends, a value for the Posts table will appear too. This means when I'm displaying the output of the SQL statement, it appears as if the Posts value is shown multiple times, when the actual values I'm looking for are also displayed
The output will appear something like this (notice difference between post value in the rows)
update_id | update | userid | timestamp | post_id | post | userid | timestamp | user_1 | user_2 | status | timestamp
1 | update1 | 1 | 02/01/2013 | 1 | post1| 1 | 2/02/2013| 1 | 2 | 1 | 01/30/2013
1 | update1 | 1 | 02/01/2013 | 2 | post2| 1 | 2/03/2013| 1 | 2 | 1 | 01/30/2013
So, as you can see, I thought I was having a distinct issue (because update1 appeared both times), but the query actually just selects all the values regardless. I get the results I'm looking for in the Post table, but all the other values are returned. So, when I display the table in PHP/HTML, the Post value will display, but I also get duplicates of the updates (just for this example)
When you select distinct *, you select every column, including the ones that make each row unique, so no rows are removed. If you want something better than what you are getting, you have to list the individual column names in your select clause.
It would be easier if you explained a little more about the connection between the tables you're querying, because you can use joins, unions (as mentioned above) or even GROUP BY ...
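A short sketch of that behaviour, using Python's sqlite3 module instead of MySQL (an assumption made so the example runs on its own). The tables are cut-down versions of the ones in the question, and the text column is called body here because update is a reserved word; that name is invented for the example.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE updates (update_id INTEGER PRIMARY KEY, userid INTEGER, body TEXT);
    CREATE TABLE posts   (post_id INTEGER PRIMARY KEY, userid INTEGER, body TEXT);
    INSERT INTO updates VALUES (1, 1, 'update1');
    INSERT INTO posts VALUES (1, 1, 'post1'), (2, 1, 'post2');
""")

# DISTINCT compares whole rows; the differing post columns keep both rows.
all_columns = conn.execute("""
    SELECT DISTINCT U.*, P.*
    FROM posts AS P
    JOIN updates AS U ON P.userid = U.userid
""").fetchall()

# Restricting the select list to the update columns lets DISTINCT collapse them.
update_only = conn.execute("""
    SELECT DISTINCT U.update_id, U.body
    FROM posts AS P
    JOIN updates AS U ON P.userid = U.userid
""").fetchall()

print(len(all_columns))
print(update_only)
```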
Your updated post shows one of the JOIN conditions as:
JOIN friends AS F ON P.userid = F.user_2 OR F.user_1
This is equivalent to:
JOIN friends AS F ON (P.userid = F.user_2 OR F.user_1 != 0)
and will include many rows that you did not intend to include.
You probably intended:
JOIN friends AS F ON (P.userid = F.user_2 OR P.userid = F.user_1)
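The effect of the two conditions can be compared by counting the joined rows. This is a minimal sketch with invented rows, run through Python's sqlite3 module rather than MySQL (an assumption; SQLite also treats a bare non-zero integer as true in this position).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts   (post_id INTEGER PRIMARY KEY, userid INTEGER);
    CREATE TABLE friends (user_1 INTEGER, user_2 INTEGER);
    INSERT INTO posts VALUES (1, 1);
    -- Invented data: only the first friendship involves user 1.
    INSERT INTO friends VALUES (1, 2), (3, 4), (5, 6);
""")

# "OR F.user_1" is evaluated on its own: any non-zero user_1 makes the row match.
loose = conn.execute("""
    SELECT COUNT(*)
    FROM posts AS P
    JOIN friends AS F ON P.userid = F.user_2 OR F.user_1
""").fetchone()[0]

# Comparing P.userid against both columns matches only real friendships.
strict = conn.execute("""
    SELECT COUNT(*)
    FROM posts AS P
    JOIN friends AS F ON (P.userid = F.user_2 OR P.userid = F.user_1)
""").fetchone()[0]

print(loose, strict)
```

The loose condition joins the single post to all three friendship rows, while the corrected condition joins it to one.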
I think you want this:
select *
from tableA
union
select *
from tableB
union
select *
from tableC
This assumes that the tables all have the same number of columns and that they are of the same data types. If not, you'll have to select specific columns to make it so.

Using SUM with multiple joins in mysql

I've been looking for a solution to this, there's plenty of similar questions but none have any proper answers that helped me solve the problem.
First up, my questions/problem:
I want to sum and count certain columns in a multiple join query
Is it not possible with multiple joins? Do I have to nest SELECT queries?
Here's a SQL dump of my database with sample data: http://pastie.org/private/vq7qkfer5mwyraudb5dh0a
This is the query I thought would do the trick:
SELECT firstname, lastname, sum(goal.goal), sum(assist.assist), sum(gw.gw), sum(win.win), count(played.idplayer) FROM player
LEFT JOIN goal USING (idplayer)
LEFT JOIN assist USING (idplayer)
LEFT JOIN gw USING (idplayer)
LEFT JOIN win USING (idplayer)
LEFT JOIN played USING (idplayer)
GROUP BY idplayer
What I'd like this to produce is a table where the columns for goal, assist, gw, win and played are a sum/count of every row in that column, like so: (with supplied sample data)
+-----------+----------+------+--------+----+-----+--------+
| firstname | lastname | goal | assist | gw | win | played |
+-----------+----------+------+--------+----+-----+--------+
| Gandalf | The White| 10 | 6 | 1 | 1 | 2 |
| Frodo | Baggins | 16 | 2 | 1 | 2 | 2 |
| Bilbo | Baggins | 7 | 3 | 0 | 0 | 2 |
+-----------+----------+------+--------+----+-----+--------+
So, to reiterate the above questions, is this possible with one query and multiple joins?
If you provide solutions/queries, please explain them! I'm new to proper relational databases and I have never used joins before this project. I'd also appreciate if you avoid aliases unless necessary.
I have run the above query without sum and grouping and I get a set of rows for each column I do a SELECT on, which I suspect is then multiplied or added together, but I was under the impression that grouping and/or doing sum(TABLE.COLUMN) would solve that.
Another thing is that, I think, doing a SELECT DISTINCT or any other DISTINCT operation won't work since that will leave out some ("duplicate") results.
PS. If it matters, my dev machine is a WAMP but release will be on ubuntu/apache/mysql/php.
To understand why you're not getting the answers you expect, take a look at this query:
SELECT * FROM player LEFT JOIN goal USING (idplayer)
As you can see, the rows on the left are duplicated for the matching rows on the right. That procedure is repeated for each join. Here's the raw data for your query:
SELECT * FROM player
LEFT JOIN goal USING (idplayer)
LEFT JOIN assist USING (idplayer)
LEFT JOIN gw USING (idplayer)
LEFT JOIN win USING (idplayer)
LEFT JOIN played USING (idplayer)
Those repeated values are then used for the SUM calculations. The SUMs need to be calculated before the rows are joined:
SELECT firstname, lastname, goals, assists, gws, wins, games_played
FROM player
INNER JOIN
(SELECT idplayer, SUM(goal) AS goals FROM goal GROUP BY idplayer) a
USING (idplayer)
INNER JOIN
(SELECT idplayer, SUM(assist) AS assists FROM assist GROUP BY idplayer) b
USING (idplayer)
INNER JOIN
(SELECT idplayer, SUM(gw) AS gws FROM gw GROUP BY idplayer) c
USING (idplayer)
INNER JOIN
(SELECT idplayer, SUM(win) AS wins FROM win GROUP BY idplayer) d
USING (idplayer)
INNER JOIN
(SELECT idplayer, COUNT(*) AS games_played FROM played GROUP BY idplayer) e
USING (idplayer)
SQLFiddle
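Here is a reduced, runnable sketch of the same idea with only two of the five tables. It is an illustration under assumptions: it runs on Python's sqlite3 module instead of MySQL, the rows are invented (they are not the data from the linked dump), and it swaps in LEFT JOIN plus COALESCE where the query above uses INNER JOIN, so that a player with no rows in one of the tables is kept with a 0 instead of being dropped.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player (idplayer INTEGER PRIMARY KEY, firstname TEXT);
    CREATE TABLE goal   (idplayer INTEGER, goal INTEGER);
    CREATE TABLE assist (idplayer INTEGER, assist INTEGER);
    INSERT INTO player VALUES (1, 'Gandalf'), (2, 'Bilbo');
    -- Invented data: Gandalf has two goal rows and two assist rows,
    -- Bilbo has one goal row and no assist rows.
    INSERT INTO goal VALUES (1, 4), (1, 6), (2, 7);
    INSERT INTO assist VALUES (1, 1), (1, 2);
""")

# Joining first: each goal row is paired with each assist row before SUM runs.
inflated = conn.execute("""
    SELECT firstname, SUM(goal.goal), SUM(assist.assist)
    FROM player
    LEFT JOIN goal USING (idplayer)
    LEFT JOIN assist USING (idplayer)
    GROUP BY player.idplayer
    ORDER BY player.idplayer
""").fetchall()

# Summing first: each subquery yields at most one row per player.
correct = conn.execute("""
    SELECT firstname, COALESCE(g.goals, 0), COALESCE(a.assists, 0)
    FROM player
    LEFT JOIN (SELECT idplayer, SUM(goal) AS goals
               FROM goal GROUP BY idplayer) g USING (idplayer)
    LEFT JOIN (SELECT idplayer, SUM(assist) AS assists
               FROM assist GROUP BY idplayer) a USING (idplayer)
    ORDER BY player.idplayer
""").fetchall()

print(inflated)  # Gandalf's totals are doubled by the row pairing
print(correct)
```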

Problem with one of my LEFT JOIN and SUM the result of it

I have a question about LEFT JOIN. This code returns different values for totalPoints depending on whether the user has the group or not (if the user doesn't have the group or event, it returns the correct value).
I just want to grasp how to get the LEFT JOIN flow__has__vote ON flow__has__vote.flow_id=flows.id to work every time. I had a solution before with three queries: one that gets the group and event rules, one that checks whether the user has the group or event considering the security, and one to get the flow...
And I guess I could solve this with two queries: one that gets the group and event rules and also checks whether the user has the group and event, and then one that gets the flow depending on whether the user should have access to it.
Right now I'm getting every piece of information needed in ONE query and then checking with IF statements whether it should be printed or not...
So, my question is, is it possible to get SUM(flow__has__vote.points) AS totalPoints to work this way? And do you know how?
I'm also a bit curious: is one query the best way to work with this? Would it be justified to use two when you take performance into account?
SELECT
flows.id AS flowId,
flows.security,
SUM(flow__has__vote.points) AS totalPoints,
users.id AS userId,
users.alias,
flows.event_id AS eventId,
events.group_id AS groupId,
events.membershipRules AS eMR,
groups.membershipRules AS gMR,
user__has__group.permission AS userHasGroup,
user__has__event.permission AS userHasEvent
FROM
users,
events LEFT JOIN user__has__event ON user__has__event.user_id = '.$userId.',
groups LEFT JOIN user__has__group ON user__has__group.user_id = '.$userId.',
flows LEFT JOIN flow__has__vote ON flow__has__vote.flow_id=flows.id
WHERE
flows.user_id = users.id AND
events.id = flows.event_id AND
groups.id = events.group_id AND
flows.id='.$flowId
And if you wonder what the SQL-statement is doing, getting the information for the flow(post), the information about the event and group that the flow is in, checking the user access to the group and event and also getting all the votes for the flow...
This is what the tables look like...
FLOWS id,security,event_id,user_id
USERS id, alias
EVENTS id, name, group_id, membershipRules
GROUPS id, name, membershipRules
USER__HAS__GROUP user_id,group_id,permission
USER__HAS__EVENT user_id,event_id,permission
FLOW__HAS__VOTE flow_id,user_id,points
This is the result I wish for...
+--------+----------+-------------+--------+--------+---------+---------+-----+-----+--------------+--------------+
| flowId | security | totalPoints | userId | alias | eventId | groupId | eMR | gMR | userHasGroup | userHasEvent |
+--------+----------+-------------+--------+--------+---------+---------+-----+-----+--------------+--------------+
| 1 | 2 | 1337 | 5 | Pontus | 15 | 2 | 2 | 2 | 4 | 4 |
+--------+----------+-------------+--------+--------+---------+---------+-----+-----+--------------+--------------+
and one more example...
+--------+----------+-------------+--------+--------+---------+---------+-----+-----+--------------+--------------+
| flowId | security | totalPoints | userId | alias | eventId | groupId | eMR | gMR | userHasGroup | userHasEvent |
+--------+----------+-------------+--------+--------+---------+---------+-----+-----+--------------+--------------+
| 1 | 2 | 1337 | 6 | Kezia | 15 | 2 | 2 | 2 | null | null |
+--------+----------+-------------+--------+--------+---------+---------+-----+-----+--------------+--------------+
Enjoy your life ~ Pontus
So, basically the main point (IMHO) is not to include conditions on tables you LEFT JOINed in the WHERE clause, since this makes the LEFT JOIN behave like an INNER JOIN.
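A small sketch of that effect, using Python's sqlite3 module instead of MySQL (an assumption made so the example is self-contained). The two tables are cut-down versions of the ones in the question and the rows are invented, loosely following the two example results.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, alias TEXT);
    CREATE TABLE user__has__group (user_id INTEGER, group_id INTEGER, permission INTEGER);
    INSERT INTO users VALUES (5, 'Pontus'), (6, 'Kezia');
    -- Invented data: only Pontus is a member of group 2.
    INSERT INTO user__has__group VALUES (5, 2, 4);
""")

# Condition in WHERE: Kezia's NULL group_id fails the test, so her row is removed.
filtered_in_where = conn.execute("""
    SELECT users.alias, user__has__group.permission
    FROM users
    LEFT JOIN user__has__group ON user__has__group.user_id = users.id
    WHERE user__has__group.group_id = 2
    ORDER BY users.id
""").fetchall()

# Condition in ON: it only limits which rows can match; every user is still kept.
filtered_in_on = conn.execute("""
    SELECT users.alias, user__has__group.permission
    FROM users
    LEFT JOIN user__has__group
           ON user__has__group.user_id = users.id
          AND user__has__group.group_id = 2
    ORDER BY users.id
""").fetchall()

print(filtered_in_where)
print(filtered_in_on)
```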
Start with trying this query (although I am sure you will have to make adjustments as I am not sure exactly what you want as a result, more about this later):
SELECT
flows.id AS flowId,
flows.security,
SUM(flow__has__vote.points) AS totalPoints,
users.id AS userId,
users.alias,
flows.event_id AS eventId,
events.group_id AS groupId,
events.membershipRules AS eMR,
groups.membershipRules AS gMR,
user__has__group.permission AS userHasGroup,
user__has__event.permission AS userHasEvent
FROM users
LEFT JOIN user__has__event
ON user__has__event.user_id = users.id
LEFT JOIN events
ON user__has__event.event_id = events.id
LEFT JOIN user__has__group
ON user__has__group.user_id = users.id
LEFT JOIN groups
ON user__has__group.group_id = groups.id
AND groups.id = events.group_id
LEFT JOIN flows
ON flows.user_id = users.id
AND events.id = flows.event_id
AND flows.id='.$flowId'
LEFT JOIN flow__has__vote
ON flow__has__vote.flow_id = flows.id
WHERE users.id = '.$userId.'
GROUP BY users.id
Here, I LEFT JOINed everything to the user, and also grouped by the user. I have a feeling you will want to add columns to the group by (flows.id?, events.id?)
Also, you may want to turn some of the LEFT JOINs to JOIN, so you will get only users who have a 'flow', for example.