Using count in mysql join - mysql

I have been trying to write a sql query to get some stats from bugzilla. Here is the query
select bugs.bug_id AS bug_id,
COUNT(map_pingpong.bug_when) AS re_open,
MAX(map_closetime.bug_when) AS closed_date
from bugs
LEFT JOIN bugs_activity AS map_pingpong
ON ((map_pingpong.bug_id = bugs.bug_id
and map_pingpong.fieldid=15))
LEFT JOIN bugs_activity AS map_closetime
ON ((bugs.bug_id = map_closetime.bug_id
and map_closetime.fieldid=8
and bugs.bug_status = 'CLOSED' ))
where (bugs.assigned_to = 480)
GROUP BY bugs.bug_id
ORDER BY bug_id;
So the query is supposed to return two things:
1) Count of an event happening
2) Date of an event happening
When I break the query into two separate queries, each returns the right values. If I run it as above, the count values are wrong (the date is correct, though). Am I not supposed to join the same table twice? Or should COUNT not be used together with a join?

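As Barmar said, you have to dissociate the result sets to get correct counts. With both LEFT JOINs in one query, every fieldid=15 row is paired with every fieldid=8 row of the same bug, so COUNT sees the product of the two sets. MAX is not affected by the duplicated rows, which is why the date still comes out right. Aggregate each set in its own subquery first: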
SELECT
bugs.bug_id AS bug_id,
COALESCE(map_pingpong.cnt, 0) AS re_open,
map_closetime.mx AS closed_date
FROM bugs
LEFT JOIN (
SELECT bug_id, COUNT(bug_when) AS cnt
FROM bugs_activity
WHERE fieldid = 15
GROUP BY bug_id
) AS map_pingpong ON map_pingpong.bug_id = bugs.bug_id
LEFT JOIN (
SELECT ba.bug_id, MAX(ba.bug_when) AS mx
FROM bugs_activity ba JOIN bugs ON bugs.bug_status = 'CLOSED' AND ba.bug_id = bugs.bug_id AND bugs.assigned_to = 480
WHERE ba.fieldid = 8
GROUP BY ba.bug_id
) AS map_closetime ON bugs.bug_id = map_closetime.bug_id
WHERE bugs.assigned_to = 480
GROUP BY bugs.bug_id
ORDER BY bug_id;
While this is functionally correct, it might be a complete disaster in terms of performance on large tables, so be careful...
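To make the row multiplication concrete, here is a minimal sketch. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), with made-up sample rows: one bug with two fieldid=15 events and three fieldid=8 events. The CLOSED check is kept in the ON clause of the second join instead of an inner join to bugs, which is a simplification of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, bug_status TEXT, assigned_to INTEGER);
CREATE TABLE bugs_activity (bug_id INTEGER, fieldid INTEGER, bug_when TEXT);
INSERT INTO bugs VALUES (1, 'CLOSED', 480);
-- hypothetical data: two fieldid=15 events and three fieldid=8 events
INSERT INTO bugs_activity VALUES
  (1, 15, '2013-01-01'), (1, 15, '2013-01-05'),
  (1, 8, '2013-01-02'), (1, 8, '2013-01-06'), (1, 8, '2013-01-09');
""")

# Both joins in one query: 2 x 3 = 6 joined rows before grouping.
joined = conn.execute("""
SELECT b.bug_id, COUNT(p.bug_when), MAX(c.bug_when)
FROM bugs b
LEFT JOIN bugs_activity p ON p.bug_id = b.bug_id AND p.fieldid = 15
LEFT JOIN bugs_activity c ON c.bug_id = b.bug_id AND c.fieldid = 8
                         AND b.bug_status = 'CLOSED'
WHERE b.assigned_to = 480
GROUP BY b.bug_id
""").fetchall()

# Each set aggregated on its own, then joined one row per bug.
separate = conn.execute("""
SELECT b.bug_id, COALESCE(p.cnt, 0), c.mx
FROM bugs b
LEFT JOIN (SELECT bug_id, COUNT(*) AS cnt
           FROM bugs_activity WHERE fieldid = 15
           GROUP BY bug_id) p ON p.bug_id = b.bug_id
LEFT JOIN (SELECT bug_id, MAX(bug_when) AS mx
           FROM bugs_activity WHERE fieldid = 8
           GROUP BY bug_id) c ON c.bug_id = b.bug_id
                             AND b.bug_status = 'CLOSED'
WHERE b.assigned_to = 480
""").fetchall()

print(joined)    # count is inflated to 6
print(separate)  # count is 2
```

The date is the same in both results; only the count changes, which matches what the question describes.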

Related

Query gets very slow if I add a WHERE

SELECT a.emp_id,s.name, s.department,s.register, z.Compoff_Count as Extra, ifnull(COUNT(DISTINCT TO_DAYS(a.punchdate)),0) as Monthly_Count
FROM machinedata a left join
(SELECT a.emp_id, ifnull(COUNT(DISTINCT TO_DAYS(a.punchdate)),0) as Compoff_Count
FROM machinedata a
RIght JOIN time_dimension c on c.db_date = a.punchdate
where ( year(c.db_date) = 2016 and month(c.db_date) = 8 and (c.holiday_flag = 't' or c.weekend_flag ='t' ))
GROUP BY a.emp_id) Z
on z.emp_id = a.emp_id
RIght JOIN time_dimension c on c.db_date = a.punchdate
left join emp s on s.emp_id = a.emp_id
where (year(c.db_date) = 2016 and month(c.db_date) = 8 and c.holiday_flag = 'f' and c.weekend_flag ='f' )
GROUP BY emp_id
The above query works fine, but if I add s.department='yes' to the last WHERE, the query takes more than 40 seconds.
What should I do to improve the query performance?
I believe your initial query can be simplified by using "conditional aggregates", which place CASE expressions inside the COUNT() function. This avoids repeated scans of the data and unnecessary joins to derived tables.
You should also avoid applying functions to column data in WHERE clause conditions, i.e. instead of YEAR() and MONTH(), simply use date boundaries. This allows an index on the date column to be used in the query execution.
I'm not sure if you really need to use TO_DAYS(), but I suspect it isn't needed either.
SELECT
a.emp_id
, s.name
, s.department
, s.register
, COUNT(DISTINCT CASE WHEN (c.holiday_flag = 't' OR
c.weekend_flag = 't') THEN c.db_date END) AS Compoff_Count
, COUNT(DISTINCT CASE WHEN NOT (c.holiday_flag = 't' OR
c.weekend_flag = 't') THEN a.punchdate END) AS Monthly_Count
FROM time_dimension c
LEFT JOIN machinedata a ON c.db_date = a.punchdate
LEFT JOIN emp s ON a.emp_id = s.emp_id
WHERE c.db_date >= '2016-08-01'
AND c.db_date < '2016-09-01'
GROUP BY
a.emp_id
, s.name
, s.department
, s.register
If this rewrite produces correct results, then you could try adding "and s.department='yes'" to the where clause to assess the impact. If it is still substantially slower, then get an explain plan and add it to the question. The most likely cause of slowness is lack of an index, but without an explain plan it's not possible to be certain.
Please note that this suggestion is just that; and is prepared without sample data and expected results.
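As a rough illustration of the conditional-aggregate idea, the sketch below runs a cut-down version of the query against SQLite via Python's sqlite3 module (a stand-in for MySQL, chosen only to keep the example runnable). The sample rows are invented, the emp table is left out, and an inner join is used instead of the LEFT JOIN so no NULL emp_id group appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dimension (db_date TEXT PRIMARY KEY, holiday_flag TEXT, weekend_flag TEXT);
CREATE TABLE machinedata (emp_id INTEGER, punchdate TEXT);
-- hypothetical calendar rows
INSERT INTO time_dimension VALUES
  ('2016-08-05', 'f', 'f'),
  ('2016-08-06', 'f', 't'),
  ('2016-08-08', 'f', 'f'),
  ('2016-09-01', 'f', 'f');
-- hypothetical punches; employee 1 punches twice on 2016-08-05
INSERT INTO machinedata VALUES
  (1, '2016-08-05'), (1, '2016-08-05'),
  (1, '2016-08-06'), (1, '2016-08-08'), (1, '2016-09-01'),
  (2, '2016-08-08');
""")

# One pass over the joined rows; each CASE yields NULL for rows that
# do not qualify, and COUNT(DISTINCT ...) ignores NULLs.
rows = conn.execute("""
SELECT a.emp_id,
       COUNT(DISTINCT CASE WHEN c.holiday_flag = 't' OR c.weekend_flag = 't'
                           THEN a.punchdate END) AS compoff_count,
       COUNT(DISTINCT CASE WHEN c.holiday_flag = 'f' AND c.weekend_flag = 'f'
                           THEN a.punchdate END) AS monthly_count
FROM time_dimension c
JOIN machinedata a ON a.punchdate = c.db_date
WHERE c.db_date >= '2016-08-01'   -- plain range comparison, index-friendly
  AND c.db_date <  '2016-09-01'
GROUP BY a.emp_id
ORDER BY a.emp_id
""").fetchall()

print(rows)
```

Employee 1 ends up with one weekend day and two regular days despite the duplicate punch, and the September row is excluded by the date boundaries without calling YEAR() or MONTH().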

SQL query that combines 2 into 1, specifically it counts the number of people in each group for each group

I've found a few posts in here that are similar, but they don't work for what I'd like to do...
similar post: Trying to write a query that counts multiple things with different where cases
similar post: Query that Counts records with a WHERE clause
What I want to do: I have some 200 groups, and within those groups are people with specific application dates. I want a count of how many people in each group have an application date that falls within a specific range.
This is the first method I've been using, but it only works for 1 group at a time:
SELECT count(*) as count
FROM membersapplication ma
INNER JOIN members mb on mb.mbr_id = ma.mbr_id
WHERE (GPL_ID = 20179) and (ma.mpl_effectivedate >= '2/01/2015' and ma.mpl_effectivedate <= '4/30/2015') and (ma.mpl_cancellationdate is null)
This code takes the count of anyone that falls under GPL_ID 20179 (group placement id). I have 200 GPL_IDs that I would like this to run for; there is never a duplicate GPL_ID.
SELECT Gr.GPL_ID, Gr.GPL_Effectivedate, G.GRP_Enrolltype, G.GRP_Name, G.GRP_ID, G.GRP_Executive
FROM groupsreview gr
INNER JOIN groups g on gr.grp_ID = g.grp_ID
WHERE (GRP_ENROLLTYPE = 1) and (gr.gpl_effectivedate >= '4/30/2014' and gr.gpl_effectivedate <= '4/30/2015')
order by grp_name asc
This code gives me a list of every GPL_ID that I want (based on GRP_Enrolltype = 1) that falls within my desired date range.
I basically would like to combine the two queries so that the second one adds another column that has a count based on the first one.
Seems you really just need to add GROUP BY to your query:
SELECT ma.GPL_ID, count(*) as count
FROM membersapplication ma
INNER JOIN members mb
ON mb.mbr_id = ma.mbr_id
where (ma.mpl_effectivedate >= '2/01/2015' and ma.mpl_effectivedate <= '4/30/2015')
AND (ma.mpl_cancellationdate is null)
GROUP BY ma.GPL_ID
This should do it. But I would double check the dates, I just used the ones you supplied; they don't match, and I am not sure if they should:
SELECT ma.GPL_ID, count(*) as count
FROM groups g
INNER JOIN groupsreview AS gr ON g.grp_ID = gr.grp_ID
INNER JOIN membersapplication AS ma ON gr.GPL_ID = ma.GPL_ID
INNER JOIN members AS mb ON mb.mbr_id = ma.mbr_id
WHERE g.GRP_ENROLLTYPE = 1
AND gr.gpl_effectivedate BETWEEN 20140430 AND 20150430
AND ma.mpl_effectivedate BETWEEN 20150201 and 20150430
AND ma.mpl_cancellationdate IS NULL
GROUP BY ma.GPL_ID
;
Judging from your question's wording, it feels a little odd to group by GPL_ID instead of grp_ID.
Not sure if this will work, but I can give it a try:
SELECT
*
FROM
(SELECT
count(*) as count, GPL_ID
FROM
membersapplication ma
inner join members mb ON mb.mbr_id = ma.mbr_id
where
(ma.mpl_effectivedate >= '2/01/2015'
and ma.mpl_effectivedate <= '4/30/2015')
and (ma.mpl_cancellationdate is null)
GROUP BY GPL_ID) T1
INNER JOIN
(SELECT
Gr.GPL_ID,
Gr.GPL_Effectivedate,
G.GRP_Enrolltype,
G.GRP_Name,
G.GRP_ID,
G.GRP_Executive
FROM
groupsreview gr
inner join groups g ON gr.grp_ID = g.grp_ID
WHERE
(GRP_ENROLLTYPE = 1)
and (gr.gpl_effectivedate >= '4/30/2014'
and gr.gpl_effectivedate <= '4/30/2015')) T2
ON T1.GPL_ID = T2.GPL_ID
Basically you should approach this by combining the joins and then grouping on GPL_ID. The member-level filters go in the WHERE clause rather than in HAVING, because HAVING is applied after grouping, when the individual ma rows are no longer available. Here's what I came up with.
SELECT gr.GPL_ID, gr.GPL_Effectivedate, g.GRP_Enrolltype, g.GRP_Name, g.GRP_ID, g.GRP_Executive,
count(*) as grp_count
FROM membersapplication ma
INNER JOIN members mb on mb.mbr_id = ma.mbr_id
INNER JOIN groupsreview gr on mb.GPL_ID = gr.GPL_ID
INNER JOIN groups g on gr.grp_ID = g.grp_ID
WHERE (g.GRP_ENROLLTYPE = 1) and (gr.gpl_effectivedate >= '4/30/2014' and gr.gpl_effectivedate <= '4/30/2015')
and (ma.mpl_effectivedate >= '2/01/2015' and ma.mpl_effectivedate <= '4/30/2015') and (ma.mpl_cancellationdate is null)
GROUP BY gr.GPL_ID, gr.GPL_Effectivedate, g.GRP_Enrolltype, g.GRP_Name, g.GRP_ID, g.GRP_Executive
order by g.grp_name asc
Hopefully this helps
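A small runnable sketch of the per-group count may help when checking the numbers. It uses SQLite via Python's sqlite3 module rather than the asker's database, ISO date strings instead of the '4/30/2015' style literals, and only two of the tables. The sample rows are invented, and it assumes that membersapplication carries GPL_ID, which the question does not state explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE groupsreview (GPL_ID INTEGER PRIMARY KEY, gpl_effectivedate TEXT);
CREATE TABLE membersapplication (
  mbr_id INTEGER, GPL_ID INTEGER,
  mpl_effectivedate TEXT, mpl_cancellationdate TEXT);
-- hypothetical placements; 20181 is outside the placement date range
INSERT INTO groupsreview VALUES
  (20179, '2014-06-01'), (20180, '2014-07-01'), (20181, '2013-01-01');
-- hypothetical members
INSERT INTO membersapplication VALUES
  (1, 20179, '2015-02-15', NULL),
  (2, 20179, '2015-03-01', NULL),
  (3, 20179, '2015-03-10', '2015-04-01'),  -- cancelled, not counted
  (4, 20180, '2015-04-30', NULL),
  (5, 20180, '2014-12-31', NULL),          -- before the member date range
  (6, 20181, '2015-02-20', NULL);          -- placement filtered out
""")

# Row-level filters sit in WHERE, so they run before the rows are grouped.
counts = conn.execute("""
SELECT gr.GPL_ID, COUNT(*) AS member_count
FROM groupsreview gr
JOIN membersapplication ma ON ma.GPL_ID = gr.GPL_ID
WHERE gr.gpl_effectivedate BETWEEN '2014-04-30' AND '2015-04-30'
  AND ma.mpl_effectivedate BETWEEN '2015-02-01' AND '2015-04-30'
  AND ma.mpl_cancellationdate IS NULL
GROUP BY gr.GPL_ID
ORDER BY gr.GPL_ID
""").fetchall()

print(counts)
```

Note that with inner joins a placement with no qualifying members simply does not appear in the output; whether it should show up with a count of 0 is a separate design decision.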

MySQL Help: Return invoices and payments by date

I am having trouble getting a MySQL query to work for me. Here is the setup.
A customer has asked me to compile a report from some accounting data. He wants to select a date (and possibly other criteria) and have it return all of the following (an OR statement):
1.) All invoices that were inserted on or after that date
2.) All invoices regardless of their insert date that have corresponding payments in a separate table whose insert dates are on or after the selected date.
The first clause is basic, but I am having trouble pairing it with the second.
I have assembled a comparable set of test data in an SQL Fiddle. The query that I currently have is provided.
http://www.sqlfiddle.com/#!2/d8d9c/3/2
As noted in the comments of the fiddle, I am working with July 1, 2013 as my selected date. For the test to work, I need invoices 1 through 5 to appear, but not invoice #6.
Try this: http://www.sqlfiddle.com/#!2/d8d9c/9
Here are the summarized changes
I got rid of your GROUP BY. You did not have any aggregate functions. I used DISTINCT instead to eliminate duplicate records
I removed your implicit joins and put explicit joins in their place for readability. Then I changed them to LEFT JOINs. I am not sure what your data looks like but at a minimum, I would assume you need the payments LEFT JOINed if you want to select an invoice that has no payments.
This will probably get you the records you want, but those subselects in the SELECT clause may perform better if rewritten as LEFT JOINs with SUM applied over the joined rows.
Here is the query
SELECT DISTINCT
a.abbr landowner,
CONCAT(f.ForestLabel, '-', l.serial, '-', l.revision) leasenumber,
i.iid,
FROM_UNIXTIME(i.dateadded,'%M %d, %Y') InvoiceDate,
(SELECT IFNULL(SUM(ch.amount), 0.00) n FROM test_charges ch WHERE ch.invoiceid = i.iid) totalBilled,
(SELECT SUM(p1.amount) n FROM test_payments p1 WHERE p1.invoiceid = i.iid AND p1.transtype = 'check' AND p1.status = 2) checks,
(SELECT SUM(p1.amount) n FROM test_payments p1 WHERE p1.invoiceid = i.iid AND p1.transtype = 'ach' AND p1.status = 2) ach,
CASE WHEN i.totalbilled < 0 THEN i.totalbilled * -1 ELSE 0.00 END credits,
CASE WHEN i.balance >= 0 THEN i.balance ELSE 0.00 END balance,
t.typelabel, g.groupname
FROM test_invoices i
LEFT JOIN test_contracts c
ON i.contractid = c.cid
LEFT JOIN test_leases l
ON c.leaseid = l.bid
LEFT JOIN test_forest f
ON l.forest = f.ForestID
LEFT JOIN test_leasetypes t
ON l.leasetype = t.tid
LEFT JOIN test_accounts a
ON l.account = a.aid
LEFT JOIN test_groups g
ON c.groupid = g.gid
LEFT JOIN test_payments p
ON p.invoiceid = i.iid
WHERE (i.dateadded >= #startdate) OR (p.dateadded >= #startdate)
Try this.
http://www.sqlfiddle.com/#!2/d8d9c/11/2
TL;DR:
… AND (i.dateadded >= #startdate
OR EXISTS (
SELECT * FROM test_payments
WHERE test_payments.invoiceid = i.iid
AND test_payments.dateadded >= #startdate))
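The difference between the two answers can be seen on a tiny data set. The sketch below uses SQLite via Python's sqlite3 module in place of MySQL, keeps only the invoice and payment tables, and uses small integers as stand-ins for the Unix timestamps in dateadded. The pid column and all sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test_invoices (iid INTEGER PRIMARY KEY, dateadded INTEGER);
CREATE TABLE test_payments (pid INTEGER PRIMARY KEY, invoiceid INTEGER, dateadded INTEGER);
-- hypothetical rows; the selected start date is 100
INSERT INTO test_invoices VALUES (1, 200), (2, 50), (3, 60);
INSERT INTO test_payments VALUES (1, 2, 150), (2, 2, 180), (3, 3, 70);
""")
start = 100

# LEFT JOIN without DISTINCT: an invoice appears once per matching payment.
plain_join = conn.execute("""
SELECT i.iid
FROM test_invoices i
LEFT JOIN test_payments p ON p.invoiceid = i.iid
WHERE i.dateadded >= ? OR p.dateadded >= ?
ORDER BY i.iid
""", (start, start)).fetchall()

# EXISTS only asks whether a qualifying payment exists, so no duplicates arise.
with_exists = conn.execute("""
SELECT i.iid
FROM test_invoices i
WHERE i.dateadded >= ?
   OR EXISTS (SELECT 1 FROM test_payments p
              WHERE p.invoiceid = i.iid AND p.dateadded >= ?)
ORDER BY i.iid
""", (start, start)).fetchall()

print(plain_join)   # invoice 2 is listed twice
print(with_exists)  # each qualifying invoice once
```

Invoice 2 has two late payments, so the join version needs DISTINCT to collapse it, while the EXISTS version returns it once without any de-duplication step.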

How to optimize count and group by subquery operation

My host is saying that the following query is taking lots of Server CPU. Please tell me how can I optimize it.
SELECT COUNT(*) FROM (SELECT COUNT(*) AS tot,wallpapers.*,resolutions.res_height,resolutions.res_width FROM wallpapers
INNER JOIN analytics ON analytics.`wall_id` = wallpapers.`wall_id`
INNER JOIN resolutions ON resolutions.`res_id` = wallpapers.`res_id`
WHERE analytics.ana_date >= '2013-09-01 16:36:56' AND wallpapers.wall_status = 'public'
GROUP BY analytics.`wall_id`) as Q
Please note that the analytics table contains the records for all the pageviews and clicks. So it is very very large.
As far as I can tell, your query just counts distinct wall_id values after filtering via the joins and the WHERE clause. Something like this should be close:
SELECT COUNT(DISTINCT analytics.wall_id)
FROM wallpapers
INNER JOIN analytics ON analytics.wall_id = wallpapers.wall_id
INNER JOIN resolutions ON resolutions.res_id = wallpapers.res_id
WHERE analytics.ana_date >= '2013-09-01 16:36:56'
AND wallpapers.wall_status = 'public'
This is your query:
SELECT COUNT(*)
FROM (SELECT COUNT(*) AS tot, wallpapers.*, resolutions.res_height, resolutions.res_width
FROM wallpapers INNER JOIN
analytics
ON analytics.`wall_id` = wallpapers.`wall_id` INNER JOIN
resolutions
ON resolutions.`res_id` = wallpapers.`res_id`
WHERE analytics.ana_date >= '2013-09-01 16:36:56' AND
wallpapers.wall_status = 'public'
GROUP BY analytics.`wall_id`
) as Q
The subquery requires extra effort as does the group by. You can replace this with:
SELECT COUNT(distinct analytics.wall_id)
FROM wallpapers INNER JOIN
analytics
ON analytics.`wall_id` = wallpapers.`wall_id` INNER JOIN
resolutions
ON resolutions.`res_id` = wallpapers.`res_id`
WHERE analytics.ana_date >= '2013-09-01 16:36:56' AND
wallpapers.wall_status = 'public';
You might then be able to do further optimizations using indexes, but it would be helpful to see an explain of this query and the current indexes on the tables.
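As a sanity check that the two forms agree, here is a minimal sketch on invented data. It runs on SQLite through Python's sqlite3 module instead of MySQL and leaves out the resolutions table, so it only demonstrates the equivalence of the counts, not any performance difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wallpapers (wall_id INTEGER PRIMARY KEY, wall_status TEXT);
CREATE TABLE analytics (wall_id INTEGER, ana_date TEXT);
-- hypothetical rows
INSERT INTO wallpapers VALUES (1, 'public'), (2, 'public'), (3, 'private'), (4, 'public');
INSERT INTO analytics VALUES
  (1, '2013-09-02 10:00:00'), (1, '2013-09-03 11:00:00'),
  (2, '2013-08-01 09:00:00'),
  (3, '2013-09-05 12:00:00'),
  (4, '2013-09-10 08:00:00');
""")

# Original shape: group in a subquery, then count the groups.
nested = conn.execute("""
SELECT COUNT(*) FROM (
  SELECT COUNT(*) AS tot, w.wall_id
  FROM wallpapers w
  JOIN analytics a ON a.wall_id = w.wall_id
  WHERE a.ana_date >= '2013-09-01 16:36:56' AND w.wall_status = 'public'
  GROUP BY a.wall_id
) AS q
""").fetchone()[0]

# Rewritten shape: count the distinct ids directly.
flat = conn.execute("""
SELECT COUNT(DISTINCT a.wall_id)
FROM wallpapers w
JOIN analytics a ON a.wall_id = w.wall_id
WHERE a.ana_date >= '2013-09-01 16:36:56' AND w.wall_status = 'public'
""").fetchone()[0]

print(nested, flat)
```

Both queries count wallpapers 1 and 4 here: wallpaper 2 has no recent views and wallpaper 3 is not public.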

Converting subquery to joins for performance

I have taken over a big project, and as the database is becoming large, some of the code has stopped working.
Here is the query to find those rendering_requests whose last rendering_log is pending. Sometimes there are log entries which have no status change and are recorded as noaction; we don't need to count them. That is what I understood from the query.
SELECT
COUNT(rr.rendering_id) AS recordCount
FROM
rendering_request rr, rendering_log rl
WHERE
rl.rendering_id = rr.rendering_id
AND rl.status = 'pending' AND
rl.log_id = (
SELECT rl1.log_id
FROM rendering_log rl1
WHERE
rl.rendering_id = rl1.rendering_id AND
rl1.status = 'pending'
AND rl1.log_id = (
SELECT rl2.log_id
FROM rendering_log rl2
WHERE rl1.rendering_id = rl2.rendering_id AND rl2.status!='noaction'
ORDER BY rl2.log_id DESC LIMIT 1
)
ORDER BY rl1.log_id DESC
LIMIT 1
)
for example
rendering_id=1 is having multiple logs
status=noaction
status=noaction
status=pending
and
rendering_id=2 is having multiple logs
status=noaction
status=assigned
status=noaction
status=pending
when we run this query it should display count=1 as only the rendering_id=1 is our desired record.
Right now this query has stopped working, and it hangs the mysql server
Not 100% sure I have got this right, but something like this. Think you still need to use a couple of subselects but (depending on the version of MySQL) doing it this way with JOINs should be a lot faster
SELECT COUNT(rr.rendering_id) AS recordCount
FROM rendering_request rr
INNER JOIN rendering_log rl
ON rl.rendering_id = rr.rendering_id
INNER JOIN (SELECT rendering_id, MAX(log_id) AS log_id FROM rendering_log WHERE status = 'pending' GROUP BY rendering_id) rl1
ON rl1.rendering_id = rl.rendering_id
AND rl1.log_id = rl.log_id
INNER JOIN (SELECT rendering_id, MAX(log_id) AS log_id FROM rendering_log WHERE status!='noaction' GROUP BY rendering_id) rl2
ON rl2.rendering_id = rl1.rendering_id
AND rl2.log_id = rl1.log_id
WHERE rl.status = 'pending'
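Below is a hedged sketch of the same idea on the example from the question, run on SQLite via Python's sqlite3 module rather than MySQL. It is a simplified variant with a single derived table, not the exact query above. It assumes that log_id increases over time and that the status lists in the question are written newest first, so the sample rows are inserted oldest first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rendering_request (rendering_id INTEGER PRIMARY KEY);
CREATE TABLE rendering_log (log_id INTEGER PRIMARY KEY, rendering_id INTEGER, status TEXT);
INSERT INTO rendering_request VALUES (1), (2);
-- assumption: a higher log_id means a newer entry
INSERT INTO rendering_log VALUES
  (1, 1, 'pending'), (2, 1, 'noaction'), (3, 1, 'noaction'),
  (4, 2, 'pending'), (5, 2, 'noaction'), (6, 2, 'assigned'), (7, 2, 'noaction');
""")

# Step 1: per request, find the newest log entry that is not 'noaction'.
# Step 2: join back to the log and keep requests where that entry is 'pending'.
count = conn.execute("""
SELECT COUNT(*)
FROM rendering_request rr
JOIN (SELECT rendering_id, MAX(log_id) AS log_id
      FROM rendering_log
      WHERE status != 'noaction'
      GROUP BY rendering_id) latest
  ON latest.rendering_id = rr.rendering_id
JOIN rendering_log rl ON rl.log_id = latest.log_id
WHERE rl.status = 'pending'
""").fetchone()[0]

print(count)
```

For rendering_id=2 the newest entry that is not noaction is assigned, so only rendering_id=1 is counted. The derived table is computed once rather than once per outer row, which is the point of replacing the correlated subqueries.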