Multiple Many to Many Relationships JOINed in MySQL

For a project I'm working on, I am trying to query a time clock, but when I LEFT JOIN multiple many-to-many tables (or, from the point of view of a single user's record, one-to-many) the joins create duplicate rows, so once the result is grouped, the aggregate totals are incorrect.
Given the below mock schema:
And a query:
SELECT
UserTbl.UserID,
CONCAT_WS(", ", UserTbl.LastName, UserTbl.FirstName) AS UserName,
SUM(TIMESTAMPDIFF(MINUTE, TimeClockTbl.StartDateTime, TimeClockTbl.EndDateTime)) AS ClockedInMinutes
FROM
Users AS UserTbl
LEFT JOIN
TimeClock AS TimeClockTbl
ON UserTbl.UserID = TimeClockTbl.UserID
LEFT JOIN
UserRoles AS UserRoleTbl
ON UserTbl.UserID = UserRoleTbl.UserID
WHERE
UserRoleTbl.RoleID IN (1,2,3)
GROUP BY
UserTbl.UserID
ORDER BY
UserTbl.LastName ASC,
UserTbl.FirstName ASC;
If the user has only one role assigned, it works fine, but if a second or third role is assigned, the final result appears to be multiplied. I considered using GROUP_CONCAT for the roles and filtering afterwards, but that doesn't seem efficient. I also considered subqueries to calculate the clocked-in hours for a given user, but I felt that would have the same result. It's also important to note that this needs to scale to a TimeClock table with multiple entries and a Scheduled table with multiple entries as well.
How can I do this with a decent amount of efficiency?

Simple solution:
SELECT UserTbl.UserID,
CONCAT_WS(", ", UserTbl.LastName, UserTbl.FirstName) AS UserName,
SUM(TIMESTAMPDIFF(MINUTE, TimeClockTbl.StartDateTime, TimeClockTbl.EndDateTime)) AS ClockedInMinutes
FROM Users AS UserTbl
LEFT JOIN TimeClock AS TimeClockTbl ON UserTbl.UserID = TimeClockTbl.UserID
WHERE UserTbl.UserID IN( SELECT UserID FROM UserRoles WHERE RoleID IN (1,2,3) )
GROUP BY UserTbl.UserID
ORDER BY UserTbl.LastName ASC, UserTbl.FirstName ASC;
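The multiplication described in the question, and the effect of moving the role filter into an IN subquery, can be reproduced with a small sketch. It makes several assumptions that are not in the original: SQLite (through Python's sqlite3 module) stands in for MySQL, a precomputed Minutes column replaces the TIMESTAMPDIFF expression, and the sample rows are invented.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; Minutes replaces TIMESTAMPDIFF(...).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT);
    CREATE TABLE TimeClock (UserID INTEGER, Minutes INTEGER);
    CREATE TABLE UserRoles (UserID INTEGER, RoleID INTEGER);
    INSERT INTO Users VALUES (1, 'Doe', 'Jane');
    INSERT INTO TimeClock VALUES (1, 60), (1, 30);   -- 90 minutes in total
    INSERT INTO UserRoles VALUES (1, 1), (1, 2);     -- two roles
""")

# Joining both child tables pairs every clock row with every role row.
naive = conn.execute("""
    SELECT u.UserID, SUM(t.Minutes)
    FROM Users u
    LEFT JOIN TimeClock t ON t.UserID = u.UserID
    LEFT JOIN UserRoles r ON r.UserID = u.UserID
    WHERE r.RoleID IN (1, 2, 3)
    GROUP BY u.UserID
""").fetchall()

# Filtering with IN (...) keeps one row per user before TimeClock is joined.
semi_join = conn.execute("""
    SELECT u.UserID, SUM(t.Minutes)
    FROM Users u
    LEFT JOIN TimeClock t ON t.UserID = u.UserID
    WHERE u.UserID IN (SELECT UserID FROM UserRoles WHERE RoleID IN (1, 2, 3))
    GROUP BY u.UserID
""").fetchall()

print(naive)      # [(1, 180)]  2 clock rows x 2 role rows
print(semi_join)  # [(1, 90)]
```

With two roles, each clock row is counted twice in the first query; the subquery version only tests whether a matching role exists, so it adds no rows.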
Concept for similar situations: join in stages, aggregating the roles in a derived table first and then joining the time clock:
SELECT A.*,
SUM(TIMESTAMPDIFF(MINUTE, TimeClockTbl.StartDateTime, TimeClockTbl.EndDateTime)) AS ClockedInMinutes,
MAX(A.RolesTitle) AS RolesTitle
FROM (
SELECT UserTbl.UserID,
CONCAT_WS(", ", UserTbl.LastName, UserTbl.FirstName) AS UserName,
FirstName, LastName,
GROUP_CONCAT(Roles.Title) as RolesTitle
FROM Users AS UserTbl
JOIN UserRoles AS UserRoleTbl ON UserTbl.UserID = UserRoleTbl.UserID
JOIN Roles ON Roles.RoleID=UserRoleTbl.RoleID
WHERE UserRoleTbl.RoleID IN (1,2,3)
GROUP BY UserTbl.UserID
) A
LEFT JOIN TimeClock AS TimeClockTbl ON A.UserID = TimeClockTbl.UserID
GROUP BY A.UserID
ORDER BY A.LastName ASC, A.FirstName ASC;

Related

Very slow sql query for count

I need to get the report count for each user role, but my SQL query is very slow (40 seconds on a good server). My SQL query:
SELECT `auth_assignment`.`item_name`, COUNT(*) as count
FROM `report`
LEFT JOIN `company` ON company.id = report.company_id
LEFT JOIN `auth_assignment`
ON auth_assignment.user_id = company.user_id
GROUP BY `auth_assignment`.`item_name`
ORDER BY `count`
auth_assignment.item_name is role type.
auth_assignment has ~23k rows.
company ~11k rows.
reports ~12k rows (one company can have many reports).
report and company are bound to each other through report.company_id and company.id.
First, you are aggregating on a column from the third table in a left join. I'm guessing you don't want NULL for the value, so use inner join or change the order of the tables.
Table aliases make the query easier to write and to read:
SELECT aa.item_name, COUNT(*) as cnt
FROM report r JOIN
company c
ON c.id = r.company_id JOIN
auth_assignment aa
ON aa.user_id = c.user_id
GROUP BY aa.item_name
ORDER BY cnt;
Assuming the joins are correct for the tables, you just want to be sure that you have indexes. These should go on the columns used for the joins: company(id, user_id), auth_assignment(user_id, item_name).
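The difference between the LEFT JOIN and inner JOIN versions, together with the suggested indexes, can be sketched as follows. This is only an illustration under assumptions not in the original: SQLite (via Python's sqlite3) stands in for MySQL, the rows are invented, and the index names are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; rows and index names are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE report (id INTEGER PRIMARY KEY, company_id INTEGER);
    CREATE TABLE auth_assignment (user_id INTEGER, item_name TEXT);
    -- Indexes on the join columns, as suggested in the answer.
    CREATE INDEX idx_company_id_user ON company (id, user_id);
    CREATE INDEX idx_auth_user_item ON auth_assignment (user_id, item_name);
    INSERT INTO company VALUES (1, 10), (2, 20);
    INSERT INTO report VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO auth_assignment VALUES (10, 'manager');  -- user 20 has no role
""")

query = """
    SELECT aa.item_name, COUNT(*) AS cnt
    FROM report r
    {join} company c ON c.id = r.company_id
    {join} auth_assignment aa ON aa.user_id = c.user_id
    GROUP BY aa.item_name
    ORDER BY cnt
"""

left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()
inner_rows = conn.execute(query.format(join="JOIN")).fetchall()

print(left_rows)   # [(None, 1), ('manager', 2)]  NULL group for the unmatched report
print(inner_rows)  # [('manager', 2)]
```

The LEFT JOIN keeps the report whose company owner has no role and groups it under NULL; the inner join drops it.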

How to select comment count, Sum of votes, and whether active user has voted

I'm having trouble structuring my MySQL query to return an accurate comment count, the sum of votes, and the active user's vote.
My tables are
wall_posts ( id, message, username, etc )
comments ( id, wall_id, username, text, etc )
votes ( id, wall_id, vote (+1 or -1), username )
My query looks like this
SELECT
wall_posts.*,
COUNT( comments.wall_id ) AS comment_count,
COALESCE( SUM( v1.vote ), 0 ) AS vote_tally,
v2.vote
FROM
wall_posts
LEFT JOIN comments ON wall_posts.id = comments.wall_id
LEFT JOIN votes v1 ON wall_posts.id = v1.wall_id
LEFT JOIN votes v2 ON wall_posts.id = v2.wall_id AND v2.username=:username
WHERE
symbol = :symbol
GROUP BY
wall_posts.id
ORDER BY
date DESC
LIMIT 15
It always returns the correct value for the active user's own vote (+1 or -1), or NULL if they haven't voted. If there are no comments on an item, the total vote sum is correct. If there are any comments, the vote sum is always equal to the comment count, possibly with a negative sign if there are down votes, but always equal to the number of comments.
I think it's obviously the way I've connected my tables, but I just can't figure out why it's copying the comment count. 1000000 points to someone who can explain this to me :)
You need to perform the aggregate operations in subqueries. Right now instead you're JOINing all of the tables (pre-aggregation) together. If you remove the aggregates (and the GROUP BY) you'll see the large mass of data which doesn't really mean anything.
Instead, try this (note I'm using a VIEW):
CREATE VIEW walls_posts_stats AS
SELECT
wall_posts.id,
COALESCE( comments_stats.comment_count, 0 ) AS comment_count,
COALESCE( votes_stats.vote_tally, 0 ) AS vote_tally
FROM
wall_posts
LEFT OUTER JOIN
(
SELECT
wall_id,
COUNT(*) AS comment_count
FROM
comments
GROUP BY
wall_id
) AS comments_stats ON wall_posts.id = comments_stats.wall_id
LEFT OUTER JOIN
(
SELECT
wall_id,
SUM( vote ) AS vote_tally
FROM
votes
GROUP BY
wall_id
) AS votes_stats ON wall_posts.id = votes_stats.wall_id
Then you can query it JOINed with your original wall data:
SELECT
wall_posts.*, -- note: avoid the use of * in production queries
stats.comment_count,
stats.vote_tally,
user_votes.vote
FROM
wall_posts
INNER JOIN walls_posts_stats AS stats ON wall_posts.id = stats.id
LEFT OUTER JOIN
(
SELECT
wall_id,
vote
FROM
votes
WHERE
username = :username
) AS user_votes ON wall_posts.id = user_votes.wall_id
ORDER BY
date DESC
LIMIT 15
Hypothetically you could combine it into a single large query (basically copy+paste the VIEW body into the INNER JOIN walls_posts_stats clause) but I feel that would introduce maintainability issues.
While MySQL does support views, it does not support parameterized views (aka composable table-valued functions; stored procedures are not composable) so that's why the user_votes subquery isn't in the walls_posts_stats VIEW.
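To see why the joined totals go wrong and how pre-aggregating in subqueries repairs them, here is a minimal sketch. It is not the answer's exact code: SQLite (via Python's sqlite3) stands in for MySQL, plain derived tables are used instead of a VIEW, the tables are cut down to the columns needed, and the rows are invented.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; tables are reduced to the needed columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wall_posts (id INTEGER PRIMARY KEY);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, wall_id INTEGER);
    CREATE TABLE votes (id INTEGER PRIMARY KEY, wall_id INTEGER, vote INTEGER);
    INSERT INTO wall_posts VALUES (1), (2);
    INSERT INTO comments (wall_id) VALUES (1), (1), (1);   -- 3 comments on post 1
    INSERT INTO votes (wall_id, vote) VALUES (1, 1), (1, 1);  -- 2 up votes on post 1
""")

# Joining both child tables first yields 3 x 2 = 6 rows for post 1.
joined = conn.execute("""
    SELECT w.id, COUNT(c.wall_id), COALESCE(SUM(v.vote), 0)
    FROM wall_posts w
    LEFT JOIN comments c ON c.wall_id = w.id
    LEFT JOIN votes v ON v.wall_id = w.id
    GROUP BY w.id
    ORDER BY w.id
""").fetchall()

# Aggregating each child table on its own keeps one row per post per subquery.
pre_aggregated = conn.execute("""
    SELECT w.id,
           COALESCE(cs.comment_count, 0),
           COALESCE(vs.vote_tally, 0)
    FROM wall_posts w
    LEFT JOIN (SELECT wall_id, COUNT(*) AS comment_count
               FROM comments GROUP BY wall_id) cs ON cs.wall_id = w.id
    LEFT JOIN (SELECT wall_id, SUM(vote) AS vote_tally
               FROM votes GROUP BY wall_id) vs ON vs.wall_id = w.id
    ORDER BY w.id
""").fetchall()

print(joined)          # [(1, 6, 6), (2, 0, 0)]
print(pre_aggregated)  # [(1, 3, 2), (2, 0, 0)]
```

In the joined version every vote is repeated once per comment and every comment once per vote, which is why both totals come out inflated.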

Add results of two mysql queries

How can I add together the results of the two queries below?
select firstname, surname, COUNT(*) as Built
from orders
join users on orders.builder = users.id
where bStop > 1461496211 and bStop < 1461582649
group by users.id;
select firstname, surname, COUNT(*) as Built
from production_points
join users on production_points.rewarded = users.id
where Date(datetime) = '2016-04-25'
group by users.id
The same user can be in both tables, so I want to sum his results. I don't want two separate lines, i.e. the first one showing 4 and the second one 6, just a total of 10.
You can use UNION.
If this is MySQL, you can see its syntax here: http://dev.mysql.com/doc/refman/5.7/en/union.html
It is similar in other database vendors.
Alternatively, you could fetch the result of each query into its own variable and add the variables together.
After researching along the lines suggested, here is the solution:
select uid, firstname, surname, Count(*) as Built from (
select users.id as uid, firstname, surname from orders join users on orders.builder = users.id where bStop > 1461542400 and bStop < 1461592622
union all
select users.id as uid, firstname, surname from production_points join users on production_points.rewarded = users.id where Date(datetime) ='2016-04-25'
) performance group by uid;
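The UNION ALL approach can be tried out on toy data with the sketch below. The assumptions here are not from the original: SQLite (via Python's sqlite3) stands in for MySQL, the date filters are left out for brevity, and the rows are invented so that one user has 4 orders and 6 production points.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the date filters are omitted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, firstname TEXT, surname TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, builder INTEGER);
    CREATE TABLE production_points (id INTEGER PRIMARY KEY, rewarded INTEGER);
    INSERT INTO users VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
""")
# User 1: 4 orders and 6 production points. User 2: 1 order only.
conn.executemany("INSERT INTO orders (builder) VALUES (?)", [(1,)] * 4 + [(2,)])
conn.executemany("INSERT INTO production_points (rewarded) VALUES (?)", [(1,)] * 6)

# Stack the matching rows from both sources, then count once per user.
totals = conn.execute("""
    SELECT uid, firstname, surname, COUNT(*) AS Built
    FROM (
        SELECT users.id AS uid, firstname, surname
        FROM orders JOIN users ON orders.builder = users.id
        UNION ALL
        SELECT users.id AS uid, firstname, surname
        FROM production_points JOIN users ON production_points.rewarded = users.id
    ) performance
    GROUP BY uid, firstname, surname
    ORDER BY uid
""").fetchall()

print(totals)  # [(1, 'Ann', 'Lee', 10), (2, 'Bob', 'Ray', 1)]
```

UNION ALL matters here: a plain UNION would remove duplicate rows and collapse each user's identical rows into one before the count.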

select a column corresponding to max value in two joined tables

I have two tables, say Users and Interviews. One user can have multiple interview records.
Users
-----
UserID
FirstName
LastName
Interviews
----------
InterviewID
UserID
DateOfInterview
I want to get only the latest interview records. Here's my query
select u.UserID, firstname, lastname, max(DateOfInterview) as latestDOI
from users u
left join interviews i
on u.UserID = i.UserID
GROUP BY u.UserID, firstname, lastname
ORDER BY max(DateOfInterview) DESC
How do I update the query to return the InterviewID as well (i.e. the one which corresponds to max(DateOfInterview))?
Instead of using an aggregate function in your select list, you can use an aggregate subquery in your WHERE clause:
select u.UserID, firstname, lastname, i.InterviewId, DateOfInterview as latestDOI
from users u
left join interviews i
on u.UserID = i.UserID
where i.UserId is null or i.DateOfInterview = (
select max(DateOfInterview)
from interviews i2
where i2.UserId = u.UserId
)
That does suppose that max(DateOfInterview) will be unique per user, but the question has no well-defined answer otherwise. Note that the main query is no longer an aggregate query, so the constraints of such queries do not apply.
There are other ways to approach the problem, and it is worthwhile to look into them because a correlated subquery such as I present can be a performance concern. For example, you could use an inline view to generate a table of the per-user latest interview dates, and use joins to that view to connect users with the ID of their latest interview:
select u.*, im.latestDOI, i2.InterviewId
from
users u
left join (
select UserID, max(DateOfInterview) as latestDOI
from interviews i
group by UserID
) im
on u.UserId = im.UserId
left join interviews i2
on im.UserId = i2.UserId and im.latestDOI = i2.DateOfInterview
There are other alternatives, too, some standard and others DB-specific.
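The inline-view variant can be exercised with a small sketch. The assumptions are not from the original: SQLite (via Python's sqlite3) stands in for the asker's database, dates are stored as ISO-8601 strings so that MAX compares them correctly, and the rows are invented.

```python
import sqlite3

# Assumption: SQLite stands in for the real database; dates are ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE Interviews (InterviewID INTEGER PRIMARY KEY,
                             UserID INTEGER, DateOfInterview TEXT);
    INSERT INTO Users VALUES (1, 'Jane', 'Doe'), (2, 'John', 'Roe');
    INSERT INTO Interviews VALUES (101, 1, '2020-01-05'), (102, 1, '2021-03-10');
    -- User 2 has no interviews and should still appear in the result.
""")

latest = conn.execute("""
    SELECT u.UserID, im.latestDOI, i2.InterviewID
    FROM Users u
    LEFT JOIN (
        SELECT UserID, MAX(DateOfInterview) AS latestDOI
        FROM Interviews
        GROUP BY UserID
    ) im ON u.UserID = im.UserID
    LEFT JOIN Interviews i2
        ON im.UserID = i2.UserID AND im.latestDOI = i2.DateOfInterview
    ORDER BY u.UserID
""").fetchall()

print(latest)  # [(1, '2021-03-10', 102), (2, None, None)]
```

As with the correlated subquery, two interviews sharing a user's latest date would produce two rows for that user.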
Rewrite the query to use an OUTER APPLY when grabbing the interview; that way you can use ORDER BY rather than MAX:
select u.UserID, firstname, lastname, LatestInterviewDetails.InterviewID, LatestInterviewDetails.DateOfInterview as latestDOI
from users u
OUTER APPLY (SELECT TOP 1 InterviewID, DateOfInterview
FROM interviews
WHERE interviews.UserID = u.UserId
ORDER BY interviews.DateOfInterview DESC
) as LatestInterviewDetails
Note: This assumes you are using Microsoft SQL Server.

Wrong use of inner join function / group function?

I have the following problem with my query:
I have two tables:
Customer
Subscriber
linked together by customer.id=subscriber.customer_id
In the subscriber table, I have records with customer_id=0 (these are email records that do not have a full customer account).
Now I want to show how many customers I have per day, how many subscribers with a customer_id, and how many subscribers with customer_id=0 (I call them emailonlies).
Somehow, I cannot manage to get those emailonlies.
Perhaps it has something to do with not using the right join type.
When I use a left join, I get the right number of customers, but not the right number of emailonlies. When I use an inner join, I get the wrong number of customers. Am I using the group function correctly? I think it has something to do with that.
THIS IS MY QUERY:
SELECT DATE(c.date_register),
COUNT(DISTINCT c.id) AS newcustomers,
COUNT(DISTINCT s.customer_id) AS newsubscribedcustomers,
COUNT(DISTINCT s.subscriber_id AND s.customer_id=0) AS emailonlies
FROM customer c
LEFT JOIN subscriber s ON s.customer_id=c.id
GROUP BY DATE(c.date_register)
ORDER BY DATE(c.date_register) DESC
LIMIT 10
;
I'm not entirely sure, but I think in DISTINCT s.subscriber_id AND s.customer_id=0, it runs the AND before the DISTINCT, so the DISTINCT only ever sees true and false.
Why don't you just take
COUNT(DISTINCT s.subscriber_id) - (COUNT(DISTINCT s.customer_id) - 1)?
(The -1 is there because DISTINCT s.customer_id will count 0.)
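The guess about the AND can be checked with a small sketch. It rests on assumptions not in the original: SQLite (via Python's sqlite3) stands in for MySQL, only the subscriber table is modelled, and the rows are invented. The CASE expression in the second column is a common conditional-counting alternative, not something proposed in the thread.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; only the subscriber table is modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriber (subscriber_id INTEGER PRIMARY KEY, customer_id INTEGER);
    -- Three email-only subscribers (customer_id = 0) and two with an account.
    INSERT INTO subscriber VALUES (1, 0), (2, 0), (3, 0), (4, 7), (5, 8);
""")

boolean_count, conditional_count = conn.execute("""
    SELECT
        -- The AND is evaluated first, so DISTINCT only sees 1 (true) and 0 (false).
        COUNT(DISTINCT subscriber_id AND customer_id = 0),
        -- Alternative: pass the id through only when the condition holds;
        -- the NULLs produced for other rows are ignored by COUNT.
        COUNT(DISTINCT CASE WHEN customer_id = 0 THEN subscriber_id END)
    FROM subscriber
""").fetchone()

print(boolean_count)      # 2
print(conditional_count)  # 3
```

The first count can never exceed 2, whatever the data, because it counts distinct truth values rather than distinct subscriber ids.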
Got it. The only risk is that I get no emailonlies if there are no customers on a given day, because of the left join. But this one works:
SELECT customers.regdatum,customers.customersqty,subscribers.emailonlies
FROM (
(SELECT DATE(c.date_register) AS regdatum,COUNT(DISTINCT c.id) AS customersqty
FROM customer c
GROUP BY DATE(c.date_register)
) AS customers
LEFT JOIN
(SELECT DATE(s.added) AS voegdatum,COUNT(DISTINCT s.subscriber_id) AS emailonlies
FROM subscriber s
WHERE s.customer_id=0
GROUP BY DATE(s.added)
) AS subscribers
ON customers.regdatum=subscribers.voegdatum
)
ORDER BY customers.regdatum DESC
;