MySQL left join counts - mysql

I have a left join to a table and want to count columns from it, after grouping by a column of the parent table:
SELECT * , COUNT(list.id) AS listcount, COUNT(uploads.id) AS uploadcount
FROM members
LEFT JOIN lists ON members.id= list.mid
LEFT JOIN uploads ON members.id= uploads.mid
GROUP BY members.id
Assume that a user can have either lists or uploads, depending on the type of user. Is the above query good enough then? If not, why?
Or do I have to use this query?
SELECT * , l.listcount, u.uploadcount
FROM members
LEFT JOIN (select count(lists.id) as listscount,mid from lists group by mid) as l
on l.mid = m.id
LEFT JOIN (select count(uploads.id) as uploadscount
,mid from uploads group by mid) as u on u.mid = m.id
GROUP BY members.id
Or correlated subqueries?
SELECT *,
(select count(lists.id) as listscount from lists as l where l.mid = m.id
group by mid) as listcount
(select count(uploads.id) from uploads as u where u.mid = m.id
group by mid) as uploadscount
FROM members
GROUP BY members.id
And which is the best solution?

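The alias m for members is missing in queries 2 and 3. Otherwise they should give the same numbers. (There are a few other typos too: query 1 refers to list instead of lists, query 2 selects l.listcount and u.uploadcount although the derived tables name them listscount and uploadscount, and query 3 is missing a comma between the two subqueries.)
Query 2 (fixed) will perform fastest.
Query 1 is different: if a member has both lists and uploads, it will give a higher number for uploads. After joining to lists there is one row per list for that member, and joining uploads to each of those rows multiplies them, which increases the count for uploads (and for lists as well). If every member really has only lists or only uploads, the counts happen to come out right, but nothing in the query guarantees that, so query 1 is probably wrong.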
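To make the row multiplication in query 1 concrete, here is a small sketch that runs a corrected query 1 and query 2 against an in-memory SQLite database (Python's built-in sqlite3 module standing in for MySQL). The sample rows are hypothetical: one member who has both 2 lists and 3 uploads, which is exactly the case where query 1 goes wrong.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL; table and column
# names follow the question (members, lists, uploads, mid).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE lists   (id INTEGER PRIMARY KEY, mid INTEGER);
    CREATE TABLE uploads (id INTEGER PRIMARY KEY, mid INTEGER);
    INSERT INTO members VALUES (1, 'alice');
    -- hypothetical data: alice has 2 lists and 3 uploads
    INSERT INTO lists   (mid) VALUES (1), (1);
    INSERT INTO uploads (mid) VALUES (1), (1), (1);
""")

# Query 1: both child tables joined directly, so every list row is paired
# with every upload row (2 x 3 = 6 joined rows) before counting.
joined = conn.execute("""
    SELECT m.id, COUNT(l.id), COUNT(u.id)
    FROM members m
    LEFT JOIN lists l   ON m.id = l.mid
    LEFT JOIN uploads u ON m.id = u.mid
    GROUP BY m.id
""").fetchall()

# Query 2: aggregate each child table first, then join the one-row-per-member
# results. COALESCE turns NULL (member with no lists/uploads) into 0.
preaggregated = conn.execute("""
    SELECT m.id, COALESCE(l.listcount, 0), COALESCE(u.uploadcount, 0)
    FROM members m
    LEFT JOIN (SELECT mid, COUNT(id) AS listcount FROM lists GROUP BY mid) l
           ON l.mid = m.id
    LEFT JOIN (SELECT mid, COUNT(id) AS uploadcount FROM uploads GROUP BY mid) u
           ON u.mid = m.id
""").fetchall()

print(joined)         # [(1, 6, 6)]
print(preaggregated)  # [(1, 2, 3)]
```

Because each derived table has at most one row per mid, the joins in query 2 cannot multiply rows, whatever combination of lists and uploads a member has.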
Also, NULL values are not counted. The manual informs:
COUNT(expr)
Returns a count of the number of non-NULL values of expr in the rows
retrieved by a SELECT statement. The result is a BIGINT value.

Related

Very slow sql query for count

I need to get the report count for each user role, but my SQL query is very slow (40 seconds on a good server). My SQL query:
SELECT `auth_assignment`.`item_name`, COUNT(*) as count
FROM `report`
LEFT JOIN `company` ON company.id = report.company_id
LEFT JOIN `auth_assignment`
ON auth_assignment.user_id = company.user_id
GROUP BY `auth_assignment`.`item_name`
ORDER BY `count`
auth_assignment.item_name is role type.
auth_assignment has ~23k rows.
company ~11k rows.
report ~12k rows (one company can have many reports).
report.id and company.id, have binding
First, you are aggregating on a column from the third table in a left join. I'm guessing you don't want NULL for the value, so use inner join or change the order of the tables.
Table aliases make the query easier to write and to read:
SELECT aa.item_name, COUNT(*) as cnt
FROM report r JOIN
company c
ON c.id = r.company_id JOIN
auth_assignment aa
ON aa.user_id = c.user_id
GROUP BY aa.item_name
ORDER BY cnt;
Assuming the joins are correct for the tables, then you just want to be sure that you have indexes. These should go on the columns used for the joins: company(id, user_id), auth_assignment(user_id, item_name).
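A small sketch of the NULL group the original LEFT JOIN version produces, using Python's sqlite3 module as a stand-in for MySQL and hypothetical sample rows: a report whose company's user has no auth_assignment row lands in an item_name = NULL group, which the inner-join version drops. The indexes are the ones suggested above; the index names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE report (id INTEGER PRIMARY KEY, company_id INTEGER);
    CREATE TABLE auth_assignment (user_id INTEGER, item_name TEXT);
    -- Indexes on the join columns (hypothetical index names).
    CREATE INDEX ix_company_user ON company (id, user_id);
    CREATE INDEX ix_aa_user_item ON auth_assignment (user_id, item_name);
    INSERT INTO company VALUES (1, 10), (2, 20);
    INSERT INTO auth_assignment VALUES (10, 'admin');  -- user 20 has no role
    INSERT INTO report (company_id) VALUES (1), (1), (2);
""")

# Original shape: LEFT JOINs keep report 3, whose user has no role.
left_rows = conn.execute("""
    SELECT aa.item_name, COUNT(*) AS cnt
    FROM report r
    LEFT JOIN company c ON c.id = r.company_id
    LEFT JOIN auth_assignment aa ON aa.user_id = c.user_id
    GROUP BY aa.item_name
    ORDER BY cnt
""").fetchall()

# Rewritten shape: inner joins drop reports without a matching role.
inner_rows = conn.execute("""
    SELECT aa.item_name, COUNT(*) AS cnt
    FROM report r
    JOIN company c ON c.id = r.company_id
    JOIN auth_assignment aa ON aa.user_id = c.user_id
    GROUP BY aa.item_name
    ORDER BY cnt
""").fetchall()

print(left_rows)   # [(None, 1), ('admin', 2)]
print(inner_rows)  # [('admin', 2)]
```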

How to get count of two fields from two different table with grouping a field from another table in mysql

I have three tables projects, discussions, and comments.
I have tried it like this:
SELECT p.PRO_Name, COUNT( d.DIS_Id ) AS nofdisc, COUNT( c.COM_Id ) AS nofcom
FROM projects p
LEFT JOIN discussions d ON p.PRO_Id = d.PRO_Id
LEFT JOIN comments c ON d.DIS_Id = c.DIS_Id
GROUP BY p.PRO_Name LIMIT 0 , 30
But it's taking all the rows from discussions and the count of comments is the same as the count of discussions.
COUNT counts the number of non-NULL values of the given expression. Your join creates one row per comment (plus one row for each discussion that has no comments), and d.DIS_Id is repeated on every one of those rows, so the discussion count is inflated to the number of joined rows and ends up (nearly) the same as the comment count. Since these are IDs, you can just count the distinct number of occurrences to get the result you want:
(EDIT: Added an order by clause as per the request in the comments)
SELECT p.PRO_Name,
COUNT(DISTINCT d.DIS_Id) AS nofdisc,
COUNT(DISTINCT c.COM_Id) AS nofcom
FROM projects p
LEFT JOIN discussions d ON p.PRO_Id = d.PRO_Id
LEFT JOIN comments c ON d.DIS_Id = c.DIS_Id
GROUP BY p.PRO_Name
ORDER BY 2,3
LIMIT 0 , 30
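A quick sketch of the difference, using Python's sqlite3 module in place of MySQL and hypothetical rows: one project with two discussions, where discussion 1 has three comments and discussion 2 has none. That produces four joined rows, which plain COUNT sees, while COUNT(DISTINCT ...) counts each ID once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects    (PRO_Id INTEGER PRIMARY KEY, PRO_Name TEXT);
    CREATE TABLE discussions (DIS_Id INTEGER PRIMARY KEY, PRO_Id INTEGER);
    CREATE TABLE comments    (COM_Id INTEGER PRIMARY KEY, DIS_Id INTEGER);
    INSERT INTO projects VALUES (1, 'Alpha');
    INSERT INTO discussions VALUES (1, 1), (2, 1);       -- two discussions
    INSERT INTO comments VALUES (1, 1), (2, 1), (3, 1);  -- all on discussion 1
""")

# Shared FROM/JOIN/GROUP BY part, same as in the question.
JOINS = """
    FROM projects p
    LEFT JOIN discussions d ON p.PRO_Id = d.PRO_Id
    LEFT JOIN comments c ON d.DIS_Id = c.DIS_Id
    GROUP BY p.PRO_Name
"""

plain = conn.execute(
    "SELECT p.PRO_Name, COUNT(d.DIS_Id), COUNT(c.COM_Id)" + JOINS
).fetchall()
distinct = conn.execute(
    "SELECT p.PRO_Name, COUNT(DISTINCT d.DIS_Id), COUNT(DISTINCT c.COM_Id)" + JOINS
).fetchall()

print(plain)     # [('Alpha', 4, 3)]
print(distinct)  # [('Alpha', 2, 3)]
```

COUNT(DISTINCT ...) is only safe here because DIS_Id and COM_Id are unique keys; counting distinct values of a non-unique column would undercount.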

Why doesn't my content field match my MAX(id) field in MySQL?

I'm trying to get a subset of data based on the latest id and dates. It seems that when selecting other fields in the table they are not in sync with the max id and dates returned.
Any idea how I can fix this?
MySQL:
SELECT MAX(m.id) as id, m.sender_id, m.receiver_id, MAX(m.date) as date, m.content, l.username, p.gender
FROM messages m
LEFT JOIN login_users l on l.user_id = m.sender_id
LEFT JOIN profiles p ON p.user_id = l.user_id
WHERE m.receiver_id=3
GROUP BY m.sender_id ORDER BY date DESC LIMIT 0, 7
The data for content isn't the correct one. It seems to be returning random content and not the content that is tied to the row for max id and max date.
Do I need to do some sort of sub select to fix this?
To answer the question in the title, "Why doesn't my content field match my MAX(id) field", that's because there is no guarantee that the values returned for the non-aggregate fields will be from the row where the MAX value is found. This is the documented behavior, and this is what we expect.
Most other DBMSs would throw an error on this statement. MySQL is just more lax: you are getting values from one row, but it's not guaranteed to be the row where either of the MAX values (id or date) is found.
You have two separate aggregate expressions, MAX(m.id) and MAX(m.date). Note that there is no guarantee that those values will come from the same row.
The rule in other databases is that every non-aggregate expression in the SELECT list needs to appear in the GROUP BY. (MySQL is more lax about that and doesn't make it a requirement unless the ONLY_FULL_GROUP_BY SQL mode is enabled, which it is by default as of MySQL 5.7.5.)
One way to "fix" the query so that it does return values from the row with the MAX value is to use an inline view (query) that gets the MAX(id) grouped by what you want to GROUP BY, and then a JOIN back to the original table to get other values on the row.
From your statement it's not clear what result set you want returned. If you want the row that has the maximum id and you also want the row with the maximum date, then you could do something like this:
SELECT m.id
, m.sender_id
, m.receiver_id
, m.date
, m.content
, l.username
, p.gender
FROM ( SELECT t.sender_id
, t.receiver_id
, MAX(t.id) AS max_id
, MAX(t.date) AS max_date
FROM messages t
WHERE t.receiver_id=3
GROUP
BY t.sender_id
, t.receiver_id
) s
JOIN messages m
ON m.sender_id = s.sender_id
AND m.receiver_id = s.receiver_id
AND ( m.id = s.max_id OR m.date = s.max_date)
LEFT
JOIN login_users l on l.user_id = m.sender_id
LEFT
JOIN profiles p ON p.user_id = l.user_id
ORDER BY m.date DESC LIMIT 0, 7
The inline view aliased as "s" returns the max values, and then that gets joined back to the messages table, aliased as "m".
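Here is a sketch of that inline-view-and-join-back query, trimmed to the messages table (the login_users and profiles joins are left out) and run through Python's sqlite3 module instead of MySQL. The sample rows are hypothetical: for sender 5 the highest id and the latest date are on different rows, so the OR in the join condition returns both rows, each with its own content.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY, sender_id INTEGER, receiver_id INTEGER,
        date TEXT, content TEXT
    );
    INSERT INTO messages VALUES
        (1, 5, 3, '2024-01-01', 'old'),
        (2, 5, 3, '2024-01-03', 'latest date'),
        (3, 5, 3, '2024-01-02', 'highest id'),
        (4, 6, 3, '2024-01-05', 'only message'),
        (5, 5, 4, '2024-02-01', 'other receiver');
""")

rows = conn.execute("""
    SELECT m.id, m.sender_id, m.content
    FROM ( SELECT t.sender_id, t.receiver_id,
                  MAX(t.id) AS max_id, MAX(t.date) AS max_date
           FROM messages t
           WHERE t.receiver_id = 3
           GROUP BY t.sender_id, t.receiver_id ) s
    JOIN messages m
      ON m.sender_id = s.sender_id
     AND m.receiver_id = s.receiver_id
     AND (m.id = s.max_id OR m.date = s.max_date)
    ORDER BY m.date DESC
""").fetchall()

print(rows)
# [(4, 6, 'only message'), (2, 5, 'latest date'), (3, 5, 'highest id')]
```

Since the content always comes from the joined row m, it can never be out of sync with that row's id and date; the trade-off is that a sender may appear twice.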
NOTE
In most cases, we find that a JOIN (query) will perform better than an IN (query), because of the different access plans. You can see the difference in plans with an EXPLAIN.
For performance, you'll want an index
... ON messages (`receiver_id`, `sender_id`, `id`, `date`)
There's an equality predicate on receiver_id, so that should be the leading column, to get a range scan (instead of a full scan). You want the sender_id column next, because that should allow MySQL to avoid a "Using filesort" operation to get the rows grouped. The id and date columns are included, so that the inline view query can be satisfied entirely from the index pages without a need to access the pages in the table. (The EXPLAIN should show "Using where; Using index".)
That same index should also be suitable for the outer query, though it does need to access the "content" column from the table pages, so the EXPLAIN will not show "Using index" for that step. (It's likely that the "content" column is much longer than we would want in the index.)
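MySQL's EXPLAIN output is specific to MySQL, but the idea can be sketched with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module: with the suggested index in place, the inline-view query should be answered from the index alone, which SQLite reports as a COVERING INDEX (its counterpart of MySQL's "Using index"). The index name ix_messages is hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY, sender_id INTEGER, receiver_id INTEGER,
        date TEXT, content TEXT
    );
    -- Same column order as the index suggested above (hypothetical name).
    CREATE INDEX ix_messages ON messages (receiver_id, sender_id, id, date);
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT t.sender_id, t.receiver_id, MAX(t.id) AS max_id, MAX(t.date) AS max_date
    FROM messages t
    WHERE t.receiver_id = 3
    GROUP BY t.sender_id, t.receiver_id
""").fetchall()

# The human-readable description is the last column of each plan row.
details = [row[-1] for row in plan]
for d in details:
    print(d)
```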
Using a join
SELECT LatestM.id, m.sender_id, m.receiver_id, m.date, m.content, l.username, p.gender
FROM
(
SELECT sender_id, MAX(id) AS id
FROM messages
WHERE receiver_id=3
GROUP BY sender_id
) LatestM
INNER JOIN messages m
ON LatestM.sender_id = m.sender_id AND LatestM.id = m.id
LEFT JOIN login_users l on l.user_id = m.sender_id
LEFT JOIN profiles p ON p.user_id = l.user_id
WHERE m.receiver_id = 3
ORDER BY date DESC
LIMIT 0, 7
The problem with this is that if the latest id does not correspond to the latest date, the date returned will not be the latest one.
Well, you could probably solve it without a subselect, but using one is fairly straightforward. Something like this should work: make the subselect return the ids of the interesting rows in messages, and fetch the data for only those rows.
SELECT m.id as id, m.sender_id, m.receiver_id, m.date as date,
m.content, l.username, p.gender
FROM messages m
LEFT JOIN login_users l on l.user_id = m.sender_id
LEFT JOIN profiles p ON p.user_id = l.user_id
WHERE m.id IN (
SELECT max(id) FROM messages
WHERE receiver_id=3
GROUP BY sender_id
)
ORDER BY date DESC
LIMIT 0, 7
The reason your original query does not match up fields is that GROUP BY really requires aggregate functions (like MAX/MIN/SUM/...) applied to every selected field that's not in the GROUP BY. The query runs at all only because MySQL does not enforce that, and instead returns indeterminate values from any matching row. As far as I know, most other SQL RDBMSs refuse to run the query.
EDIT: As for performance, a few indexes that are likely to help are:
CREATE INDEX ix_inner ON messages(receiver_id, sender_id, id);
CREATE INDEX ix_login_users ON login_users(user_id);
CREATE INDEX ix_profiles ON profiles(user_id);

Combining RIGHT JOIN with COUNT

I'm trying to get a list of the number of entries in the changes_cc table by each user. Not all users have made entries into it, however for some reason it's returning "1" for each user that has 0 entries. I'm assuming that it's because it's counting the entries in the JOINed table. How can I make it so that it is "0" instead?
SELECT COUNT(*) as num, users.id, realname, username
FROM changes_cc
RIGHT JOIN users
ON changes_cc.user_id = users.id
GROUP BY users.id
I think this should work -- count a specific field in the changes_cc table vs counting *:
SELECT u.id, realname, username, COUNT(c.id) as num
FROM users u
LEFT JOIN changes_cc c
ON c.user_id = u.id
GROUP BY u.id
I prefer reading a LEFT JOIN over a RIGHT JOIN, but they are both OUTER JOINs and work the same.
You should not be using COUNT(*), because it counts rows, including the NULL-filled row the outer join produces for a user with no entries. Since all records from the right table (users) are returned, it will always give at least 1. If you specify a column name to be counted, you get the result you want, because COUNT(column) counts only non-NULL values.
SELECT COUNT(changes_cc.user_id) as num,
users.id,
realname,
username
FROM changes_cc
RIGHT JOIN users
ON changes_cc.user_id = users.id
GROUP BY users.id
Instead of using count(*), use count(changes_cc.user_id).
The problem is that you are counting rows (with the *) rather than counting the non-NULL values in the "right-joined" table.
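A quick sketch of the COUNT(*) versus COUNT(column) difference, using Python's sqlite3 module as a stand-in for MySQL and hypothetical rows. It is written with LEFT JOIN, since SQLite only supports RIGHT JOIN from version 3.39.0; with the tables swapped this is equivalent to the RIGHT JOIN in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, realname TEXT, username TEXT);
    CREATE TABLE changes_cc (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'Ann Smith', 'ann'), (2, 'Bob Jones', 'bob');
    INSERT INTO changes_cc (user_id) VALUES (1), (1);  -- bob has no entries
""")

def counts(count_expr):
    # count_expr only ever comes from the two literals below, never user input.
    sql = f"""
        SELECT u.username, {count_expr}
        FROM users u
        LEFT JOIN changes_cc c ON c.user_id = u.id
        GROUP BY u.id
        ORDER BY u.id
    """
    return conn.execute(sql).fetchall()

star = counts("COUNT(*)")            # counts the NULL-filled row for bob
column = counts("COUNT(c.user_id)")  # skips it, since c.user_id is NULL

print(star)    # [('ann', 2), ('bob', 1)]
print(column)  # [('ann', 2), ('bob', 0)]
```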

SQL Join returns only one record

I'm writing a query in which I'm trying to count the total number of records in the report and assignment tables, while at the same time retrieving information from the main table group. Group has a primary key id, which is saved in the other tables as gid. This is the query:
SELECT `group`.`id` AS `gid`
, `group`.`name` AS `g_name`
, COUNT(`report`.`id`) AS `reports`
FROM `group`
LEFT OUTER JOIN `report` ON `report`.`gid` = `group`.`id`
LEFT OUTER JOIN `assignment` ON `assignment`.`gid` = `group`.`id`
WHERE `group`.`active` = 0
ORDER BY
`group`.`name`;
My problem is that whenever I execute this, only one record is returned, even if there are multiple groups.
Thanks in advance.
Well, your query is far from correct :) First of all, you should not use aggregate functions (in this case COUNT) without a GROUP BY clause: without one, all the joined rows are collapsed into a single group, which is why you get only one record back. Now, even if you add that clause, the query will summarize information, and you want both the detail and a summary in the same query. I'd recommend two separate queries to retrieve this information, but if you want it mixed in one query (the detail and also the "total number of records in report and assignment table"), try the following query:
SELECT
`group`.id AS gid,
`group`.name AS g_name,
(SELECT COUNT(*) FROM report) AS ReportTotalCount,
(SELECT COUNT(*) FROM assignment) AS AssignmentTotalCount
FROM `group`
-- No joins needed: the subqueries count the whole tables, and joining
-- report/assignment here would repeat each group once per matching row.
WHERE `group`.`active` = 0
ORDER BY `group`.name;
I wish I could understand exactly what you're looking for, but this might give you an idea of how to get the result you expect.
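To see why only one record comes back, here is a sketch with Python's sqlite3 module standing in for MySQL and hypothetical data (only the report join is kept, so assignments don't multiply the rows). Without GROUP BY the COUNT collapses every joined row into a single result row; with GROUP BY `group`.id there is one row per group. SQLite, like MySQL without ONLY_FULL_GROUP_BY, accepts the first query and takes g.id from an arbitrary row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE `group` (id INTEGER PRIMARY KEY, name TEXT, active INTEGER);
    CREATE TABLE report  (id INTEGER PRIMARY KEY, gid INTEGER);
    INSERT INTO `group` VALUES (1, 'A', 0), (2, 'B', 0), (3, 'C', 1);
    INSERT INTO report (gid) VALUES (1), (1), (2);
""")

# Aggregate without GROUP BY: all joined rows form one group -> one result row.
no_group_by = conn.execute("""
    SELECT g.id, COUNT(r.id)
    FROM `group` g
    LEFT JOIN report r ON r.gid = g.id
    WHERE g.active = 0
""").fetchall()

# With GROUP BY: one row per active group.
per_group = conn.execute("""
    SELECT g.id, g.name, COUNT(r.id)
    FROM `group` g
    LEFT JOIN report r ON r.gid = g.id
    WHERE g.active = 0
    GROUP BY g.id
    ORDER BY g.name
""").fetchall()

print(len(no_group_by), no_group_by[0][1])  # 1 3
print(per_group)  # [(1, 'A', 2), (2, 'B', 1)]
```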
I can't see anything obvious in your query that would limit it to returning one record.
You are going to have to break it up to see where the problem is against your existing data.
So how many groups are there where active = 0, how many of those have a corresponding assignment record, etc.?
Maybe this will help:
SELECT
AssignmentForGroup.groupid,
AssignmentForGroup.groupname,
ReportForGroup.reports,
AssignmentForGroup.assignments
FROM
(SELECT `group`.id AS groupid, COUNT(*) AS reports FROM `group`
INNER JOIN report ON (report.gid = `group`.id)
WHERE `group`.active = 0
GROUP BY `group`.id ) AS ReportForGroup
INNER JOIN
(SELECT `group`.id AS groupid, `group`.name AS groupname, COUNT(*) AS assignments FROM `group`
INNER JOIN assignment ON (assignment.gid = `group`.id)
WHERE `group`.active = 0
GROUP BY `group`.id ) AS AssignmentForGroup
ON (ReportForGroup.groupid = AssignmentForGroup.groupid)
ORDER BY AssignmentForGroup.groupname;
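I can't test this. Note that with a LEFT JOIN, COUNT(*) returns 1 (not 0) for a group with no matching rows, because the NULL-filled row is still counted. If you want groups without reports or assignments to show 0, change the INNER JOINs inside the subqueries to LEFT JOINs, count a column from the joined table instead (COUNT(report.gid) and COUNT(assignment.gid)), and keep the INNER JOIN between the two subqueries.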