Converting subquery to joins for performance - mysql

I have taken over a big project, and as the database has grown, some of the code has stopped working.
Here is the query that finds the rendering_request rows whose latest rendering_log entry is pending. Some log entries record no status change and are stored as noaction; we don't need to count them. That is what I understood from the query.
SELECT
COUNT(rr.rendering_id) AS recordCount
FROM
rendering_request rr, rendering_log rl
WHERE
rl.rendering_id = rr.rendering_id
AND rl.status = 'pending' AND
rl.log_id = (
SELECT rl1.log_id
FROM rendering_log rl1
WHERE
rl.rendering_id = rl1.rendering_id AND
rl1.status = 'pending'
AND rl1.log_id = (
SELECT rl2.log_id
FROM rendering_log rl2
WHERE rl1.rendering_id = rl2.rendering_id AND rl2.status!='noaction'
ORDER BY rl2.log_id DESC LIMIT 1
)
ORDER BY rl1.log_id DESC
LIMIT 1
)
For example, rendering_id=1 has multiple logs:
status=noaction
status=noaction
status=pending
and rendering_id=2 has multiple logs:
status=noaction
status=assigned
status=noaction
status=pending
When we run this query, it should return count=1, as only rendering_id=1 is the record we want.
Right now this query has stopped working, and it hangs the MySQL server.

I'm not 100% sure I have this right, but try something like the following. I think you still need a couple of subselects, but (depending on the version of MySQL) doing it this way with JOINs should be a lot faster:
SELECT COUNT(rr.rendering_id) AS recordCount
FROM rendering_request rr
INNER JOIN rendering_log rl
ON rl.rendering_id = rr.rendering_id
INNER JOIN (SELECT rendering_id, MAX(log_id) AS log_id FROM rendering_log WHERE status = 'pending' GROUP BY rendering_id) rl1
ON rl1.rendering_id = rl.rendering_id
AND rl1.log_id = rl.log_id
INNER JOIN (SELECT rendering_id, MAX(log_id) AS log_id FROM rendering_log WHERE status!='noaction' GROUP BY rendering_id) rl2
ON rl2.rendering_id = rl1.rendering_id
AND rl2.log_id = rl1.log_id
WHERE rl.status = 'pending'
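As a rough check of the JOIN approach above, here is a minimal sketch that loads the question's sample data into an in-memory SQLite database (used only as a stand-in for MySQL) and runs the same query. The table layouts are assumptions, as is the reading that the question lists each rendering's logs newest first, so the rows are inserted oldest first with increasing log_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rendering_request (rendering_id INTEGER PRIMARY KEY);
CREATE TABLE rendering_log (
    log_id INTEGER PRIMARY KEY,
    rendering_id INTEGER,
    status TEXT
);
INSERT INTO rendering_request VALUES (1), (2);
-- Assumption: a higher log_id means a newer log entry.
INSERT INTO rendering_log (log_id, rendering_id, status) VALUES
    (1, 1, 'pending'), (2, 1, 'noaction'), (3, 1, 'noaction'),
    (4, 2, 'pending'), (5, 2, 'noaction'), (6, 2, 'assigned'), (7, 2, 'noaction');
""")

query = """
SELECT COUNT(rr.rendering_id) AS recordCount
FROM rendering_request rr
INNER JOIN rendering_log rl
    ON rl.rendering_id = rr.rendering_id
-- latest 'pending' log per rendering
INNER JOIN (SELECT rendering_id, MAX(log_id) AS log_id
            FROM rendering_log WHERE status = 'pending'
            GROUP BY rendering_id) rl1
    ON rl1.rendering_id = rl.rendering_id AND rl1.log_id = rl.log_id
-- latest log per rendering that is not 'noaction'
INNER JOIN (SELECT rendering_id, MAX(log_id) AS log_id
            FROM rendering_log WHERE status != 'noaction'
            GROUP BY rendering_id) rl2
    ON rl2.rendering_id = rl1.rendering_id AND rl2.log_id = rl1.log_id
WHERE rl.status = 'pending'
"""
record_count = conn.execute(query).fetchone()[0]
print(record_count)
```

With this data only rendering_id=1 qualifies, because the latest non-noaction log of rendering_id=2 is assigned. On a large MySQL table, a composite index covering rendering_id, status and log_id would probably help the two grouped subqueries, but that is something to confirm with EXPLAIN.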

Related

MySQL Query to fetch Distinct Rows with latest status

I have 3 tables, namely - areas, works and jobs.
areas works jobs
----- ----- -----
area_id work_id area_id (FK)
area_name task work_id (FK)
area_type app_area status
updated_at
I'm trying to select the total list of areas cross joined with works such that I have all the permutations for areas vs works, then have the LATEST status of that combination, if it exists. I want distinct rows for each area_id-work_id combination.
I put together the query below, but some rows show the status as NULL when a status actually exists. My guess is that something is wrong with my inner SELECT statement, but try as I might, I could not get it to work. Any idea what's wrong with my statement?
SELECT area_name, works.task, jobs.status
FROM areas
CROSS JOIN works ON works.work_id = works.work_id
LEFT JOIN jobs ON jobs.status = (SELECT jobs.status FROM jobs ORDER BY jobs.updated_at DESC LIMIT 1) AND
(jobs.work_id = works.work_id AND jobs.area_id = areas.area_id)
WHERE works.app_area = 'zone' AND areas.area_type = 'zone'
ORDER BY areas.area_id, works.work_id, jobs.updated_at;
Your logic for the last status should be using the date not the status. The logic looks like this:
SELECT a.area_name, w.task, j.status
FROM areas a CROSS JOIN
works w LEFT JOIN
jobs j
ON j.work_id = w.work_id AND j.area_id = a.area_id AND
j.updated_at = (SELECT MAX(j2.updated_at)
FROM jobs j2
WHERE j2.work_id = w.work_id AND j2.area_id = a.area_id
)
WHERE w.app_area = 'zone' AND a.area_type = 'zone'
ORDER BY a.area_id, w.work_id, j.updated_at;
This also fixes some other problems, such as having an ON clause with CROSS JOIN.
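To see the "latest row by date" join in action, here is a small sketch using an in-memory SQLite database as a stand-in for MySQL. The sample rows and the column types are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE areas (area_id INTEGER PRIMARY KEY, area_name TEXT, area_type TEXT);
CREATE TABLE works (work_id INTEGER PRIMARY KEY, task TEXT, app_area TEXT);
CREATE TABLE jobs (area_id INTEGER, work_id INTEGER, status TEXT, updated_at TEXT);
-- Hypothetical sample data
INSERT INTO areas VALUES (1, 'North', 'zone'), (2, 'South', 'zone');
INSERT INTO works VALUES (1, 'inspect', 'zone');
INSERT INTO jobs VALUES
    (1, 1, 'open', '2020-01-01'),
    (1, 1, 'done', '2020-02-01');
""")

query = """
SELECT a.area_name, w.task, j.status
FROM areas a
CROSS JOIN works w
LEFT JOIN jobs j
    ON j.work_id = w.work_id AND j.area_id = a.area_id
   AND j.updated_at = (SELECT MAX(j2.updated_at)
                       FROM jobs j2
                       WHERE j2.work_id = w.work_id AND j2.area_id = a.area_id)
WHERE w.app_area = 'zone' AND a.area_type = 'zone'
ORDER BY a.area_id, w.work_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The North/inspect combination picks up only its newest job (done), while South/inspect has no job at all and therefore keeps a NULL status. If two jobs for the same combination share the same updated_at, both would be returned, so a tie-breaker may be needed in real data.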
If you would rather keep your own query, replace the subquery inside the LEFT JOIN with this line:
SELECT j.status FROM jobs j ORDER BY j.updated_at DESC LIMIT 1
Building on Gordon's solution, I think this is another way you can do it, using a correlated subquery in the SELECT list. You'll have to test to see which way works faster for you.
SELECT a.area_name, w.task, (SELECT j2.status
FROM jobs j2
WHERE j2.work_id = w.work_id AND j2.area_id = a.area_id
ORDER BY j2.updated_at DESC LIMIT 1
) AS status
FROM areas a CROSS JOIN
works w
WHERE w.app_area = 'zone' AND a.area_type = 'zone'
ORDER BY a.area_id, w.work_id;

Understanding why this query is slow

The query below is very slow (it takes around 1 second), but it only searches approximately 2500 records (plus the inner joined tables).
If I remove the ORDER BY, the query runs in much less time (0.05 seconds or less).
If I instead remove the nested select below the comment "# used to select where no ProfilePhoto specified", it also runs fast, but I need both of these included.
I have indexes (or a primary key) on: tPhoto_PhotoID, PhotoID, p.Enabled, CustomerID, tCustomer_CustomerID, ProfilePhoto (bool), u.UserName, e.PrivateEmail, m.tUser_UserID, Enabled, Active, m.tMemberStatuses_MemberStatusID, e.tCustomerMembership_MembershipID, e.DateCreated
(Do I have too many indexes? My understanding is that I should add them anywhere I use WHERE or ON.)
The Query :
SELECT e.CustomerID,
e.CustomerName,
e.Location,
SUBSTRING_INDEX(e.CustomerProfile,' ', 25) AS Description,
IFNULL(p.PhotoURL, PhotoTable.PhotoURL) AS PhotoURL
FROM tCustomer e
LEFT JOIN (tCustomerPhoto ep INNER JOIN tPhoto p ON (ep.tPhoto_PhotoID = p.PhotoID AND p.Enabled=1))
ON e.CustomerID = ep.tCustomer_CustomerID AND ep.ProfilePhoto = 1
# used to select where no ProfilePhoto specified
LEFT JOIN ((SELECT pp.PhotoURL, epp.tCustomer_CustomerID
FROM tPhoto pp
LEFT JOIN tCustomerPhoto epp ON epp.tPhoto_PhotoID = pp.PhotoID
GROUP BY epp.tCustomer_CustomerID) AS PhotoTable) ON e.CustomerID = PhotoTable.tCustomer_CustomerID
INNER JOIN tUser u ON u.UserName = e.PrivateEmail
INNER JOIN tmembers m ON m.tUser_UserID = u.UserID
WHERE e.Enabled=1
AND e.Active=1
AND m.tMemberStatuses_MemberStatusID = 2
AND e.tCustomerMembership_MembershipID != 6
ORDER BY e.DateCreated DESC
LIMIT 12
I have similar queries, but they run much faster.
Any opinions would be appreciated.
Until we get more clarity on how this differs from your other, faster queries, try running EXPLAIN {YourSelectQuery} in the MySQL client and use the execution plan it shows to look for ways to improve performance.

Using count in mysql join

I have been trying to write a sql query to get some stats from bugzilla. Here is the query
select bugs.bug_id AS bug_id,
COUNT(map_pingpong.bug_when) AS re_open,
MAX(map_closetime.bug_when) AS closed_date
from bugs
LEFT JOIN bugs_activity AS map_pingpong
ON ((map_pingpong.bug_id = bugs.bug_id
and map_pingpong.fieldid=15))
LEFT JOIN bugs_activity AS map_closetime
ON ((bugs.bug_id = map_closetime.bug_id
and map_closetime.fieldid=8
and bugs.bug_status = 'CLOSED' ))
where (bugs.assigned_to = 480)
GROUP BY bugs.bug_id
ORDER BY bug_id;
So the query is supposed to return two things:
1) The count of one event happening
2) The date of another event happening
When I break the query into two separate queries, they return the right values. If I run it as above, the count values are wrong (the date is correct, though). Am I not supposed to run two joins on the same table? Or should COUNT not be used together with a join?
As Barmar said, you have to separate the result sets to get correct counts:
SELECT
bugs.bug_id AS bug_id,
map_pingpong.cnt AS re_open,
map_closetime.mx AS closed_date
FROM bugs
LEFT JOIN (
SELECT bug_id, COUNT(bug_when) AS cnt
FROM bugs_activity
WHERE fieldid = 15
GROUP BY bug_id
) AS map_pingpong ON map_pingpong.bug_id = bugs.bug_id
LEFT JOIN (
SELECT ba.bug_id, MAX(ba.bug_when) AS mx
FROM bugs_activity ba JOIN bugs ON bugs.bug_status = 'CLOSED' AND ba.bug_id = bugs.bug_id AND bugs.assigned_to = 480
WHERE ba.fieldid = 8
GROUP BY ba.bug_id
) AS map_closetime ON bugs.bug_id = map_closetime.bug_id
WHERE bugs.assigned_to = 480
GROUP BY bugs.bug_id
ORDER BY bug_id;
While this is functionally correct, it might be a complete disaster in terms of performance, so be careful...
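The reason the original counts are wrong is that the two LEFT JOINs multiply each other's rows before COUNT runs. The following sketch reproduces that on an in-memory SQLite database (a stand-in for MySQL) with invented sample rows: one bug with two fieldid=15 activities and three fieldid=8 activities. The pre-aggregated query is a simplified variant of the answer above, not a copy of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, assigned_to INTEGER, bug_status TEXT);
CREATE TABLE bugs_activity (bug_id INTEGER, fieldid INTEGER, bug_when TEXT);
-- Hypothetical sample data
INSERT INTO bugs VALUES (1, 480, 'CLOSED');
INSERT INTO bugs_activity VALUES
    (1, 15, '2020-01-01'), (1, 15, '2020-01-02'),
    (1, 8, '2020-03-01'), (1, 8, '2020-03-02'), (1, 8, '2020-03-03');
""")

# Both joins against the raw table: 2 x 3 = 6 joined rows per bug.
naive = conn.execute("""
SELECT b.bug_id, COUNT(p.bug_when) AS re_open, MAX(c.bug_when) AS closed_date
FROM bugs b
LEFT JOIN bugs_activity p ON p.bug_id = b.bug_id AND p.fieldid = 15
LEFT JOIN bugs_activity c ON c.bug_id = b.bug_id AND c.fieldid = 8
                         AND b.bug_status = 'CLOSED'
WHERE b.assigned_to = 480
GROUP BY b.bug_id
ORDER BY b.bug_id
""").fetchall()

# Aggregate each activity type first, then join one row per bug.
fixed = conn.execute("""
SELECT b.bug_id, p.cnt AS re_open, c.mx AS closed_date
FROM bugs b
LEFT JOIN (SELECT bug_id, COUNT(bug_when) AS cnt
           FROM bugs_activity WHERE fieldid = 15 GROUP BY bug_id) p
    ON p.bug_id = b.bug_id
LEFT JOIN (SELECT bug_id, MAX(bug_when) AS mx
           FROM bugs_activity WHERE fieldid = 8 GROUP BY bug_id) c
    ON c.bug_id = b.bug_id AND b.bug_status = 'CLOSED'
WHERE b.assigned_to = 480
ORDER BY b.bug_id
""").fetchall()

print(naive)
print(fixed)
```

MAX is unaffected by the duplicated rows, which is why the date looked right in the original query while the count did not.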

optimize Mysql: get latest status of the sale

In the following query, I show the latest status of the sale (by stage, in this case the number 3). The query is based on a subquery in the status history of the sale:
SELECT v.id_sale,
IFNULL((
SELECT (CASE WHEN IFNULL( vec.description, '' ) = ''
THEN ve.name
ELSE vec.description
END)
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
WHERE veh.id_sale = v.id_sale
AND vec.id_stage = 3
ORDER BY veh.id_record DESC
LIMIT 1
), 'x') sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
WHERE 1 =1
AND v.flag =1
AND v.id_quarters =4
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
The query takes 0.0057 sec and returns 1011 records.
Because I have to filter the sales by the name of the state, I would have to repeat the subquery in a WHERE clause, so I decided to rewrite the same query using joins. In this case, I'm using the MAX function to obtain the latest status:
SELECT
v.id_sale,
IFNULL(veh3.State3,'x') AS sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
LEFT JOIN (
SELECT veh.id_sale,
(CASE WHEN IFNULL(vec.description,'') = ''
THEN ve.name
ELSE vec.description END) AS State3
FROM t_record veh
INNER JOIN (
SELECT id_sale, MAX(id_record) AS max_rating
FROM(
SELECT veh.id_sale, id_record
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign AND vec.id_stage = 3
) m
GROUP BY id_sale
) x ON x.max_rating = veh.id_record
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
) veh3 ON veh3.id_sale = v.id_sale
WHERE v.flag = 1
AND v.id_quarters = 4
This query returns the same results (1011 records), but the problem is that it takes 0.0753 sec.
Reviewing the possibilities, I found the factor that makes the difference in the speed of the query:
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
If I remove this clause, both queries take the same time... Why does it work better? Is there any way to use this clause with the joins? I hope you can help.
EDIT
I will show the results of EXPLAIN for each query respectively:
q1:
q2:
Interesting. That little statement basically determines whether there is a match between t_record.id_sale and t_sale.id_sale.
Why does this make your query run faster? Because the WHERE clause is applied before the subselects in the SELECT list are evaluated, so if there is no record to go with the sale, the subselect is never processed for it. That saves you some time, which is why it works better.
Is it going to work in your join syntax? I don't really know without having your tables to test against, but you can always just add it to the end and find out. Add the keyword EXPLAIN to the beginning of your query and you will get an execution plan, which will help you optimize things. Probably the best way to get better results with your join syntax is to add some indexes to your tables.
But I ask you, is this even necessary? You have a query returning in less than 8 hundredths of a second. Unless this query is being run thousands of times an hour, it is not really taxing your DB at all, and your time is probably better spent making improvements elsewhere in your application.
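As for expressing the EXISTS filter with joins, one possible shape is an INNER JOIN against the distinct id_sale values in t_record. The sketch below compares both forms on an in-memory SQLite database (a stand-in for MySQL) with invented rows and a reduced, assumed table layout; whether the join form is faster on the real tables is something only EXPLAIN and timing can tell.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_sale (id_sale INTEGER PRIMARY KEY, flag INTEGER, id_quarters INTEGER);
CREATE TABLE t_record (id_record INTEGER PRIMARY KEY, id_sale INTEGER);
-- Hypothetical sample data: sale 2 has no history records.
INSERT INTO t_sale VALUES (1, 1, 4), (2, 1, 4), (3, 1, 4);
INSERT INTO t_record VALUES (10, 1), (11, 1), (12, 3);
""")

# Filter with EXISTS, as in the original query.
with_exists = conn.execute("""
SELECT v.id_sale
FROM t_sale v
WHERE v.flag = 1 AND v.id_quarters = 4
  AND EXISTS (SELECT 1 FROM t_record r WHERE r.id_sale = v.id_sale)
ORDER BY v.id_sale
""").fetchall()

# Same filter as a join; DISTINCT keeps sales with several records
# from being returned more than once.
with_join = conn.execute("""
SELECT v.id_sale
FROM t_sale v
INNER JOIN (SELECT DISTINCT id_sale FROM t_record) r ON r.id_sale = v.id_sale
WHERE v.flag = 1 AND v.id_quarters = 4
ORDER BY v.id_sale
""").fetchall()

print(with_exists)
print(with_join)
```

Both forms drop sale 2, which has no rows in t_record, before any per-sale status lookup would have to run.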

Group by subkey but if new read if not 1 show 0

OK, I know this is going to sound stupid, but I have tried everything.
Here is my code to start off with:
SELECT toD.username AS ToUser,
fromD.username AS FromUser,
rvw.* FROM usermessages AS rvw
LEFT JOIN users AS toD
ON toD.id = rvw.touserid
LEFT JOIN users AS fromD ON fromD.id = rvw.fromuserid
WHERE touserid = '" . $this->userid . "'
AND deleted = '0'
GROUP BY subkey
ORDER BY rvw.read ASC, rvw.created DESC
While this does work, what I am finding is that if there is a new message in a group and its read value is 0, the group still shows read as 1. I know this is because I am grouping the rows together,
but I am not sure of any other way to do this.
It doesn't work because MySQL can return any row from the group, no matter how you try to order your set. To find the first row in each group using some custom order, you have to split it into two tasks: first find all distinct values of the column you group by, then use a subquery to find the first row for each of those values. So your query should look like:
SELECT toD.username AS ToUser, fromD.username as FromUser, msg.* FROM
( SELECT DISTINCT touserid, subkey
FROM usermessages
WHERE touserid = 'insert_your_id_here'
AND deleted=0 ) msgg
JOIN usermessages msg
ON msg.id = ( SELECT msgu.id
FROM usermessages msgu
WHERE msgu.touserid = msgg.touserid
AND msgu.subkey = msgg.subkey
AND deleted=0
ORDER BY msgu.read ASC, msgu.created DESC
LIMIT 1 )
JOIN users fromD ON msg.fromuserid = fromD.id
JOIN users toD ON msg.touserid = toD.id
Make sure you have an index on (touserid, subkey). Depending on how big your database is, you may need more.
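Here is a small sketch of the two-step "first row per group" query above, run against an in-memory SQLite database as a stand-in for MySQL. The table layouts, user names and messages are invented for illustration, and the query selects a few named columns instead of msg.* to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE usermessages (
    id INTEGER PRIMARY KEY,
    touserid INTEGER,
    fromuserid INTEGER,
    subkey TEXT,
    "read" INTEGER,
    deleted INTEGER,
    created TEXT
);
-- Hypothetical sample data: thread 'a' has an old read message and a new unread one.
INSERT INTO users VALUES (7, 'alice'), (8, 'bob');
INSERT INTO usermessages VALUES
    (1, 7, 8, 'a', 1, 0, '2020-01-01'),
    (2, 7, 8, 'a', 0, 0, '2020-01-02'),
    (3, 7, 8, 'b', 1, 0, '2020-01-03');
""")

query = """
SELECT toD.username AS ToUser, fromD.username AS FromUser, msg.id, msg.read
FROM (SELECT DISTINCT touserid, subkey
      FROM usermessages
      WHERE touserid = ? AND deleted = 0) msgg
JOIN usermessages msg
    ON msg.id = (SELECT msgu.id
                 FROM usermessages msgu
                 WHERE msgu.touserid = msgg.touserid
                   AND msgu.subkey = msgg.subkey
                   AND msgu.deleted = 0
                 ORDER BY msgu.read ASC, msgu.created DESC
                 LIMIT 1)
JOIN users fromD ON msg.fromuserid = fromD.id
JOIN users toD ON msg.touserid = toD.id
ORDER BY msg.id
"""
# The user id is passed as a bound parameter instead of being
# concatenated into the SQL string.
rows = conn.execute(query, (7,)).fetchall()
for row in rows:
    print(row)
```

Thread a is represented by message 2, the unread one, instead of an arbitrary row from the group.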