Query optimization - mysql

I'm having a problem with this slow query:
SELECT c.*, csc1.changed_status
FROM contract c
LEFT
JOIN contract_status_change csc1
ON csc1.contract_status_change_id =
( SELECT csc2.contract_status_change_id
FROM contract_status_change csc2
WHERE csc2.contract_id = c.contract_id
ORDER
BY csc2.date_changed DESC
LIMIT 1
)
;
I have a contract table and a contract_status_change table, which records status changes against the contract. This query joins each contract to its latest status change so you can get the contract's current status.
Please can you help me tidy it up?
-edit-
My apologies. I have updated the query so that it actually selects the latest status. Sorry for the confusion!

After formatting your query for readability (consistent whitespace and capitalization, removing unnecessary backticks and parentheses, more sensible aliases):
SELECT c.*
FROM contract c
LEFT
JOIN contract_status_change csc1
ON csc1.contract_status_change_id =
( SELECT csc2.contract_status_change_id
FROM contract_status_change csc2
WHERE csc2.contract_id = c.contract_id
ORDER
BY csc2.date_changed DESC
LIMIT 1
)
;
and assuming that contract_status_change.contract_status_change_id is a unique identifier, I'm forced to conclude that your query is equivalent to this, much more efficient one:
SELECT c.*
FROM contract c
;
You say that it "is joining on the latest status with the contract so you can get its current status", but it doesn't do anything with the current status — doesn't order by it, doesn't filter by it, doesn't include it in the query results — so there's no need for that.

This should help a bit.
SELECT c.*, csc1.changed_status
FROM contract c
LEFT JOIN contract_status_change csc1 ON c.contract_id = csc1.contract_id
INNER JOIN
(
SELECT contract_id, MAX(date_changed) AS max_date
FROM contract_status_change
GROUP BY contract_id
) csc2 ON csc1.contract_id = csc2.contract_id AND csc1.date_changed = csc2.max_date
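As a minimal sketch of the "join to the MAX(date_changed) per contract" idea, here is a runnable version using Python's sqlite3 module as a stand-in for MySQL. The table and column names follow the question; the sample rows, the index name and the `latest` alias are made up for illustration. This variant uses two LEFT JOINs, which is an assumption about what the asker wants: contracts with no status change stay in the result with a NULL status.

```python
import sqlite3

# SQLite stands in for MySQL here; the sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contract (contract_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE contract_status_change (
    contract_status_change_id INTEGER PRIMARY KEY,
    contract_id INTEGER,
    changed_status TEXT,
    date_changed TEXT
);
-- Composite index so the per-contract MAX(date_changed) can be read from the index.
CREATE INDEX idx_csc_contract_date
    ON contract_status_change (contract_id, date_changed);
INSERT INTO contract VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO contract_status_change VALUES
    (1, 1, 'draft',  '2020-01-01'),
    (2, 1, 'signed', '2020-02-01'),
    (3, 2, 'draft',  '2020-01-15');
""")

latest_status_sql = """
SELECT c.contract_id, csc1.changed_status
FROM contract c
LEFT JOIN (
    SELECT contract_id, MAX(date_changed) AS max_date
    FROM contract_status_change
    GROUP BY contract_id
) latest ON latest.contract_id = c.contract_id
LEFT JOIN contract_status_change csc1
    ON csc1.contract_id = latest.contract_id
   AND csc1.date_changed = latest.max_date
ORDER BY c.contract_id
"""
rows = conn.execute(latest_status_sql).fetchall()
print(rows)
```

Contract 3 has no status rows, so it comes back with a NULL (None) status instead of being dropped. If two status changes for one contract can share the same date_changed, this approach returns both, so a tie-breaker would be needed.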

Related

Speeding up mysql query

I have a MySQL query that joins four tables. I thought it was best to simply join the tables, but now that the data is getting bigger the query seems to cause the application to stop executing.
SELECT
`purchase_order`.`id`,
`purchase_order`.`po_date` AS po_date,
`purchase_order`.`po_number`,
`purchase_order`.`customer_id` AS customer_id ,
`customer`.`name` AS customer_name,
`purchase_order`.`status` AS po_status,
`purchase_order_items`.`product_id`,
`purchase_order_items`.`po_item_name`,
`product`.`weight` as product_weight,
`product`.`pending` as product_pending,
`product`.`company_owner` as company_owner,
`purchase_order_items`.`uom`,
`purchase_order_items`.`po_item_type`,
`purchase_order_items`.`order_sequence`,
`purchase_order_items`.`pending_balance`,
`purchase_order_items`.`quantity`,
`purchase_order_items`.`notes`,
`purchase_order_items`.`status` AS po_item_status,
`purchase_order_items`.`id` AS po_item_id
FROM `purchase_order`
INNER JOIN customer ON `customer`.`id` = `purchase_order`.`customer_id`
INNER JOIN purchase_order_items ON `purchase_order_items`.`po_id` = `purchase_order`.`id`
INNER JOIN product ON `purchase_order_items`.`product_id` = `product`.`id`
GROUP BY id ORDER BY `purchase_order`.`po_date` DESC LIMIT 0, 20
My real problem is that the query takes a long time to finish. Is there a way to speed it up, or to rewrite it for faster retrieval of the data?
Here's the EXPLAIN EXTENDED output, as requested in the comments.
Thanks in advance, I really hope this is the right channel for me to ask. If not please let me know.
Will this give you the correct list of ids?
SELECT id
FROM purchase_order
ORDER BY `po_date` DESC
LIMIT 0, 20
If so, then start with that before launching into the JOIN. You can also (I think) get rid of the GROUP BY that is causing an "explode-implode" of rows.
SELECT ...
FROM ( SELECT id ... (as above) ...) AS ids
JOIN purchase_order po ON po.id = ids.id
JOIN ... (the other tables)
GROUP BY ... -- (this may be problematic, especially with the LIMIT)
ORDER BY po.po_date DESC -- yes, this needs repeating
-- no LIMIT
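To make the "pick the ids first, then join" idea concrete, here is a small sketch using Python's sqlite3 module in place of MySQL. Only the table and column names come from the question; the data, the index names and the cut-down column list are invented for the example.

```python
import sqlite3
from datetime import date, timedelta

# SQLite stands in for MySQL; schema is trimmed to the columns the example needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchase_order (id INTEGER PRIMARY KEY, po_date TEXT);
CREATE TABLE purchase_order_items (
    id INTEGER PRIMARY KEY, po_id INTEGER, po_item_name TEXT);
CREATE INDEX idx_po_date ON purchase_order (po_date);
CREATE INDEX idx_poi_po_id ON purchase_order_items (po_id);
""")

# Made-up data: 50 orders on consecutive days, 3 items per order.
for po_id in range(1, 51):
    po_date = (date(2024, 1, 1) + timedelta(days=po_id)).isoformat()
    conn.execute("INSERT INTO purchase_order VALUES (?, ?)", (po_id, po_date))
    for n in range(3):
        conn.execute(
            "INSERT INTO purchase_order_items (po_id, po_item_name) VALUES (?, ?)",
            (po_id, f"item {n} of PO {po_id}"),
        )

# The LIMIT is applied to purchase_order alone, before any join happens,
# so only 20 orders ever reach the (more expensive) joins.
deferred_join_sql = """
SELECT po.id, po.po_date, poi.po_item_name
FROM (SELECT id FROM purchase_order ORDER BY po_date DESC LIMIT 20) AS ids
JOIN purchase_order po ON po.id = ids.id
JOIN purchase_order_items poi ON poi.po_id = po.id
ORDER BY po.po_date DESC, poi.id
"""
po_rows = conn.execute(deferred_join_sql).fetchall()
order_ids = {r[0] for r in po_rows}
print(len(po_rows), min(order_ids), max(order_ids))
```

Note that the result has one row per item of the 20 newest orders, not 20 rows in total. Whether that is what the page needs depends on what the original GROUP BY was meant to do.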
Something like this
SELECT
`purchase_order`.`id`,
`purchase_order`.`po_date` AS po_date,
`purchase_order`.`po_number`,
`purchase_order`.`customer_id` AS customer_id ,
`customer`.`name` AS customer_name,
`purchase_order`.`status` AS po_status,
`purchase_order_items`.`product_id`,
`purchase_order_items`.`po_item_name`,
`product`.`weight` as product_weight,
`product`.`pending` as product_pending,
`product`.`company_owner` as company_owner,
`purchase_order_items`.`uom`,
`purchase_order_items`.`po_item_type`,
`purchase_order_items`.`order_sequence`,
`purchase_order_items`.`pending_balance`,
`purchase_order_items`.`quantity`,
`purchase_order_items`.`notes`,
`purchase_order_items`.`status` AS po_item_status,
`purchase_order_items`.`id` AS po_item_id
FROM (SELECT id, po_date, po_number, customer_id, status
FROM purchase_order
ORDER BY `po_date` DESC
LIMIT 0, 5) as purchase_order
INNER JOIN customer ON `customer`.`id` = `purchase_order`.`customer_id`
INNER JOIN purchase_order_items
ON `purchase_order_items`.`po_id` = `purchase_order`.`id`
INNER JOIN product ON `purchase_order_items`.`product_id` = `product`.`id`
GROUP BY purchase_order.id DESC
LIMIT 0, 5
Make sure that purchase_order.po_date and all the id columns are indexed. You can check with the query below.
SHOW INDEX FROM yourtable;
Since you mentioned that the data is getting bigger, I would also suggest sharding, so that you can run multiple queries in parallel. Please refer to the following article:
Parallel Query for MySQL with Shard-Query
First, I cleaned up readability a bit. You don't need tick marks around every table.column reference. Also, for short-hand, using aliases works well. Ex: "po" instead of "purchase_order", "poi" instead of "purchase_order_items". The only time I would use tick marks is around reserved words that might cause a problem.
Second, you don't have any aggregations (sum, min, max, count, avg, etc.) in your query so you should be able to strip the GROUP BY clause.
As for indexes, I would have to assume you have an index on your reference tables on their respective "id" key columns.
For your purchase_order table, I would have an index with "po_date" in the first field position (check whether an existing index already starts with it). Since your ORDER BY is on that column, the engine can jump directly to the most recent records, and the descending order is resolved by the index.
SELECT
po.id,
po.po_date,
po.po_number,
po.customer_id,
c.`name` AS customer_name,
po.`status` AS po_status,
poi.product_id,
poi.po_item_name,
p.weight as product_weight,
p.pending as product_pending,
p.company_owner,
poi.uom,
poi.po_item_type,
poi.order_sequence,
poi.pending_balance,
poi.quantity,
poi.notes,
poi.`status` AS po_item_status,
poi.id AS po_item_id
FROM
purchase_order po
INNER JOIN customer c
ON po.customer_id = c.id
INNER JOIN purchase_order_items poi
ON po.id = poi.po_id
INNER JOIN product p
ON poi.product_id = p.id
ORDER BY
po.po_date DESC
LIMIT
0, 20

SQL query needs optimization

SELECT LM.user_id,LM.users_lineup_id, min( LM.total_score ) AS total_score
FROM vi_lineup_master LM JOIN
vi_contest AS C
ON C.contest_unique_id = LM.contest_unique_id join
(SELECT min( total_score ) as total_score
FROM vi_lineup_master
GROUP BY group_unique_id
) as preq
ON LM.total_score = preq.total_score
WHERE LM.contest_unique_id = 'iledhSBDO' AND
C.league_contest_type = 1
GROUP BY group_unique_id
The query above finds the loser in each group of a game. It returns accurate results, but it does not respond when run against large data. How can I optimize it?
You can try moving your JOINs into subqueries. You should also pay attention to the "wrong" GROUP BY usage in the outer query. In MySQL you can group by some columns and select others that are not in the GROUP BY clause without any aggregate function, but the database cannot guarantee which values it will return. For the sake of consistency in your application, wrap them in an aggregate function.
Check if this one helps:
SELECT
MIN(LM.user_id) AS user_id,
MIN(LM.users_lineup_id) AS users_lineup_id,
MIN(LM.total_score) AS total_score
FROM vi_lineup_master LM
WHERE 1=1
-- check if this "contest_unique_id" is equal
-- to 'iledhSBDO' for a "league_contest_type" valued 1
AND LM.contest_unique_id IN
(
SELECT C.contest_unique_id
FROM vi_contest AS C
WHERE 1=1
AND C.contest_unique_id = 'iledhSBDO'
AND C.league_contest_type = 1
)
-- check if this "total_score" is one of the
-- "min(total_score)" from each "group_unique_id"
AND LM.total_score IN
(
SELECT MIN(total_score)
FROM vi_lineup_master
GROUP BY group_unique_id
)
GROUP BY LM.group_unique_id
;
Also, some pieces of this query may seem redundant, but it's because I did not want to change the filters you wrote, just moved them.
Also, your query logic seems a bit strange to me, based on the table/column names and how you wrote it. Please check the comments in my query, which reflect what I understood of your implementation.
Hope it helps.

Count, Group By, Subquery, Left Join not working as expected

This is puzzling me and no amount of the Google is helping me, hoping someone can point me in the right direction.
Please note that I have omitted some fields from the tables that don't relate to the question just to simplify things.
contacts
contact_id
name
email
contact_uuids
uuid
contact_id
visitor_activity
uuid
event
contact_communications
comm_id
contact_id
employee_id
Query
SELECT
`c`.*,
(SELECT COUNT(`log_id`) FROM `contact_communications` `cc` WHERE `cc`.`contact_id` = `c`.`contact_id`) as `num_comms`,
(SELECT MAX(`date`) FROM `contact_communications` `cc` WHERE `cc`.`contact_id` = `c`.`contact_id`) as `latest_date`,
(SELECT MIN(`date`) FROM `contact_communications` `cc` WHERE `cc`.`contact_id` = `c`.`contact_id`) as `first_date`,
(SELECT COUNT(`vaid`) FROM `visitor_activity` `va` WHERE `va`.`uuid` = `cu`.`uuid`) as `num_act`
FROM `contacts` `c`
LEFT JOIN `contact_uuids` `cu` ON `c`.`contact_id` = `cu`.`contact_id`
GROUP BY `c`.`contact_id`
ORDER BY `c`.`name` ASC
Some contacts have multiple UUIDs (upwards of 20 or 30).
When I perform the query WITHOUT the GROUP BY statement, I get the results I expect - a row returned for each UUID that exists for that contact, with the correct "num_comms" and "num_act" numbers.
However, when I add the GROUP BY statement, the "num_comms" is a lot smaller than expected and the "num_act" returns only the value from the first row without the GROUP BY statement.
I tried doing a "WHERE NOT IN" in the subquery, however that simply crashed the server as it was far too intense.
So - how do I get this to add up all the COUNT values from the LEFT JOIN and not just return the first value?
Also if anyone can help me optimize this that would be great.
Two problems:
GROUP BY c.contact_id does not include all the non-aggregate columns. This is a MySQL extension. What you get is arbitrary values for the columns other than contact_id.
The JOIN adds confusion. Your only use for visitor_activity is the COUNT(*) on it. But that does not make sense, since the count is limited to one UUID, whereas the row is limited to one contact_id. Rethink the purpose of that.
Remove this line:
(SELECT COUNT(`vaid`) FROM `visitor_activity` `va` WHERE `va`.`uuid` = `cu`.`uuid`) as `num_act`
and the rest may work ok.
I will continue with the assumption that you want the COUNT of all rows in visitor_activity for all the uuids associated with the one contact_id.
See if this:
( SELECT COUNT(*)
FROM `contact_uuids` cu2
JOIN `visitor_activity` va USING(uuid)
WHERE cu2.contact_id = c.contact_id ) AS num_act
will work for the last subquery. At the same time, remove the JOIN:
LEFT JOIN `contact_uuids` `cu` ON `c`.`contact_id` = `cu`.`contact_id`
Now, back to the other problem (the non-standard usage of GROUP BY). Assuming that contact_id is the PRIMARY KEY, then simply remove the
GROUP BY `c`.`contact_id`
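Putting those pieces together, here is a rough runnable sketch of the "one row per contact, counts from correlated subqueries, no JOIN and no GROUP BY" shape. It uses Python's sqlite3 module instead of MySQL, counts with COUNT(*) because the question's log_id and vaid columns are not in the posted schema, and all sample rows are invented.

```python
import sqlite3

# SQLite stands in for MySQL; schema follows the question, data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE contact_uuids (uuid TEXT, contact_id INTEGER);
CREATE TABLE visitor_activity (uuid TEXT, event TEXT);
CREATE TABLE contact_communications (
    comm_id INTEGER PRIMARY KEY, contact_id INTEGER, employee_id INTEGER);
INSERT INTO contacts VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO contact_uuids VALUES ('u1', 1), ('u2', 1), ('u3', 2);
INSERT INTO visitor_activity VALUES
    ('u1', 'view'), ('u1', 'click'),
    ('u2', 'view'), ('u2', 'view'), ('u2', 'click'),
    ('u3', 'view');
INSERT INTO contact_communications VALUES (1, 1, 7), (2, 1, 8);
""")

# Each subquery is correlated on contact_id, so the outer query never
# multiplies rows and needs no GROUP BY.
per_contact_sql = """
SELECT c.contact_id, c.name,
    (SELECT COUNT(*)
       FROM contact_communications cc
      WHERE cc.contact_id = c.contact_id) AS num_comms,
    (SELECT COUNT(*)
       FROM contact_uuids cu
       JOIN visitor_activity va ON va.uuid = cu.uuid
      WHERE cu.contact_id = c.contact_id) AS num_act
FROM contacts c
ORDER BY c.name ASC
"""
contact_rows = conn.execute(per_contact_sql).fetchall()
print(contact_rows)
```

Ann has two UUIDs with 2 and 3 activity rows, and her num_act is the sum across both, which is the behavior the question was after.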

How to optimize this complicated query?

While running the following query on MySQL, it gets locked:
SELECT event_list.*
FROM event_list
INNER JOIN members
ON members.profilenam=event_list.even_loc
WHERE (even_own IN (SELECT frd_id
FROM network
WHERE mem_id='911'
GROUP BY frd_id)
OR even_own = '911' )
AND event_list.even_active = 'y'
GROUP BY event_list.even_id
ORDER BY event_list.even_stat ASC
The inner query inside the IN constraint returns many frd_id values, and because of that the query above is slow. Please help.
Thanks.
Try this:
SELECT el.*
FROM event_list el
INNER JOIN members m ON m.profilenam = el.even_loc
WHERE el.even_active = 'y' AND
(el.even_own = 911 OR EXISTS (SELECT 1 FROM network n WHERE n.mem_id=911 AND n.frd_id = el.even_own))
GROUP BY el.even_id
ORDER BY el.even_stat ASC
You don't need the GROUP BY on the inner query, that will be making the database engine do a lot of unneeded work.
If you put even_own = '911' before the select from network, then if even_own IS 911 then it will not have to do the subquery.
Also why do you have a group by on the subquery?
Also run EXPLAIN on the query to find out what is taking the time.
This might work better:
( SELECT e.*
FROM event_list AS e
INNER JOIN members AS m ON m.profilenam = e.even_loc
JOIN network AS n ON e.even_own = n.frd_id
WHERE n.mem_id = '911'
AND e.even_active = 'y' )
UNION DISTINCT
( SELECT e.*
FROM event_list AS e
INNER JOIN members AS m ON m.profilenam = e.even_loc
WHERE e.even_own = '911'
AND e.even_active = 'y' )
ORDER BY even_stat ASC
Since I don't know whether the JOINs are one-to-many (or what), I threw in DISTINCT to avoid dups. There may be a better way, or it may be unnecessary (that is, UNION ALL).
Notice how I avoid two things that are performance killers:
OR -- turned into UNION
IN (SELECT...) -- turned into JOIN.
I made aliases to cut down on the clutter. I moved the ORDER BY outside the UNION (and added parens to make it work right).
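Here is a small sketch to check that the OR form and the UNION form return the same rows. It runs on Python's sqlite3 module rather than MySQL, so the parentheses around each SELECT are dropped (SQLite does not accept them), and the select list is cut down to two columns. Table and column names are from the question; the rows are made up, including a duplicate network row to show why the dedup matters.

```python
import sqlite3

# SQLite stands in for MySQL; all sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (profilenam TEXT);
CREATE TABLE network (mem_id INTEGER, frd_id INTEGER);
CREATE TABLE event_list (
    even_id INTEGER PRIMARY KEY, even_own INTEGER, even_loc TEXT,
    even_active TEXT, even_stat INTEGER);
INSERT INTO members VALUES ('hall'), ('park');
INSERT INTO network VALUES (911, 1), (911, 2), (911, 2), (5, 3);
INSERT INTO event_list VALUES
    (10, 911, 'hall',    'y', 2),  -- owned by member 911
    (11, 1,   'park',    'y', 1),  -- owned by a friend
    (12, 2,   'hall',    'y', 3),  -- owned by a friend listed twice
    (13, 3,   'hall',    'y', 1),  -- owner is not a friend of 911
    (14, 1,   'hall',    'n', 1),  -- inactive
    (15, 2,   'nowhere', 'y', 1);  -- location matches no member
""")

or_sql = """
SELECT e.even_id, e.even_stat
FROM event_list e
JOIN members m ON m.profilenam = e.even_loc
WHERE (e.even_own = 911
       OR e.even_own IN (SELECT frd_id FROM network WHERE mem_id = 911))
  AND e.even_active = 'y'
ORDER BY e.even_stat ASC
"""

# UNION (without ALL) removes the duplicate produced by the repeated network row.
union_sql = """
SELECT e.even_id AS even_id, e.even_stat AS even_stat
FROM event_list e
JOIN members m ON m.profilenam = e.even_loc
JOIN network n ON n.frd_id = e.even_own
WHERE n.mem_id = 911 AND e.even_active = 'y'
UNION
SELECT e.even_id, e.even_stat
FROM event_list e
JOIN members m ON m.profilenam = e.even_loc
WHERE e.even_own = 911 AND e.even_active = 'y'
ORDER BY even_stat ASC
"""
or_rows = conn.execute(or_sql).fetchall()
union_rows = conn.execute(union_sql).fetchall()
print(or_rows == union_rows, union_rows)
```

Both forms agree on this data. Whether the UNION is actually faster depends on the indexes and the MySQL version, so compare the EXPLAIN output of both before switching.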

Why Does My MySQL Query Using a Subselect Hang?

The following query hangs (although the subqueries performed separately are fine):
I don't know how to make the explain table look ok. If someone tells me, I'll clean it up.
select
sum(grades.points) as p
from assignments
left join grades using (assignmentID)
where gradeID IN
(select grades.gradeID
from assignments
left join grades using (assignmentID)
where ... grades.date <= '1255503600' AND grades.date >= '984902400'
group by assignmentID order by grades.date DESC);
I think the problem is with the first grades table: the type ALL with that many rows seems to be the cause. Everything is indexed.
I uploaded the table as an image. Couldn't get the formatting right:
http://imgur.com/AjX34.png
A commenter wanted the full where clause:
explain extended select count(assignments.assignmentID) as asscount, sum(TRIM(TRAILING '-' FROM grades.points)) as p, sum(assignments.points) as t
from assignments left join grades using (assignmentID)
where gradeID IN
(select grades.gradeID from assignments left join grades using (assignmentID) left join as_types on as_types.ID = assignments.type
where assignments.classID = '7815'
and (assignments.type = 30170 )
and grades.contactID = 7141
and grades.points REGEXP '^[-]?[0-9]+[-]?'
and grades.points != '-'
and grades.points != ''
and (grades.pointsposs IS NULL or grades.pointsposs = '')
and grades.date <= '1255503600'
AND grades.date >= '984902400'
group by assignmentID
order by grades.date DESC);
See "The unbearable slowness of IN":
http://www.artfulsoftware.com/infotree/queries.php#568
Super messy, but: (thanks for everyone's help)
SELECT *
FROM grades
LEFT JOIN assignments ON grades.assignmentID = assignments.assignmentID
RIGHT JOIN (
SELECT g.gradeID
FROM assignments a
LEFT JOIN grades g
USING ( assignmentID )
WHERE a.classID = '7815'
AND (
a.type =30170
)
AND g.contactID =7141
AND g.points
REGEXP '^[-]?[0-9]+[-]?'
AND g.points != '-'
AND g.points != ''
AND (
g.pointsposs IS NULL
OR g.pointsposs = ''
)
AND g.date <= '1255503600'
AND g.date >= '984902400'
GROUP BY assignmentID
ORDER BY g.date DESC
) AS t1 ON t1.gradeID = grades.gradeID
Suppose you use a Real Database (ie, any database except MySQL, but I'll use Postgres as an example) to do this query :
SELECT * FROM ta WHERE aid IN (SELECT subquery)
a Real Database would look at the subquery and estimate its rowcount :
If the rowcount is small (say, less than a few million)
It would run the subquery, then build an in-memory hash of ids, which also makes them unique, which is a feature of IN().
Then, if the number of rows pulled from ta is a small part of ta, it would use a suitable index to pull the rows. Or, if a major part of the table is selected, it would just scan it entirely, and lookup each id in the hash, which is very fast.
If however the subquery rowcount is quite large
The database would probably rewrite it as a merge JOIN, adding a Sort+Unique to the subquery.
However, you are using MySQL. In this case, it will not do any of this (it is gonna re-execute the subquery for each row of your table) so it will take 1000 years. Sorry.
If your subquery performs fine when it is executed separately, then try using a JOIN rather than IN, like this:
select count(assignments.assignmentID) as asscount, sum(TRIM(TRAILING '-' FROM grades.points)) as p, sum(assignments.points) as t
from assignments left join grades using (assignmentID)
join
(select grades.gradeID from assignments left join grades using (assignmentID) left join as_types on as_types.ID = assignments.type
where assignments.classID = '7815'
and (assignments.type = 30170 )
and grades.contactID = 7141
and grades.points REGEXP '^[-]?[0-9]+[-]?'
and grades.points != '-'
and grades.points != ''
and (grades.pointsposs IS NULL or grades.pointsposs = '')
and grades.date <= '1255503600'
AND grades.date >= '984902400'
group by assignmentID
order by grades.date DESC) as t1 using (gradeID);
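For what it's worth, here is a stripped-down sketch of the IN-to-JOIN rewrite that can be run with Python's sqlite3 module (standing in for MySQL). The schema is reduced to the columns needed, the filter is simplified to classID and contactID, and the rows and the `picked` alias are invented. It only shows that both forms give the same aggregates, not how fast either one is.

```python
import sqlite3

# SQLite stands in for MySQL; data and the reduced filter are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assignments (
    assignmentID INTEGER PRIMARY KEY, classID INTEGER, points INTEGER);
CREATE TABLE grades (
    gradeID INTEGER PRIMARY KEY, assignmentID INTEGER,
    contactID INTEGER, points INTEGER, date INTEGER);
INSERT INTO assignments VALUES (1, 7815, 10), (2, 7815, 20), (3, 9999, 30);
INSERT INTO grades VALUES
    (100, 1, 7141, 8,  1000),
    (101, 2, 7141, 15, 2000),
    (102, 3, 7141, 25, 3000),  -- other class
    (103, 1, 5555, 9,  1000);  -- other contact
""")

# The subquery that selects the grade rows of interest.
picked_sql = """
SELECT g2.gradeID
FROM assignments a2
JOIN grades g2 ON g2.assignmentID = a2.assignmentID
WHERE a2.classID = 7815 AND g2.contactID = 7141
"""

in_sql = f"""
SELECT COUNT(a.assignmentID), SUM(g.points), SUM(a.points)
FROM assignments a
LEFT JOIN grades g ON g.assignmentID = a.assignmentID
WHERE g.gradeID IN ({picked_sql})
"""

# Same result, but the subquery becomes a derived table that is joined once.
# The derived table needs an alias (here: picked).
join_sql = f"""
SELECT COUNT(a.assignmentID), SUM(g.points), SUM(a.points)
FROM assignments a
LEFT JOIN grades g ON g.assignmentID = a.assignmentID
JOIN ({picked_sql}) AS picked ON picked.gradeID = g.gradeID
"""
in_row = conn.execute(in_sql).fetchone()
join_row = conn.execute(join_sql).fetchone()
print(in_row, join_row)
```

The rewrite is only safe as-is when the derived table returns each gradeID at most once; if it can return duplicates, add DISTINCT inside it, otherwise the join multiplies rows and inflates the sums.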
There really isn't enough information to answer your question, and you've put a ... in the middle of the where clause which is weird. How big are the tables involved and what are the indexes?
Having said that, if there are too many terms in an in clause, you can see seriously degraded performance. Replace the use of in with a right join.
For starters, the table as_types in the in clause is not used. Left joining it serves no purpose so get rid of it.
That leaves the in clause having only the assignments and grades tables from the outer query. Clearly the where conditions that modify assignments belong in the where clause of the outer query. You should move all of the where grades=whatever conditions into the on clause of the left join to grades.
The query is a little tough to follow, but I suspect that the subquery isn't necessary at all.
It seems like your query is basically thus:
SELECT FOO()
FROM assignments LEFT JOIN grades USING (assignmentID)
WHERE gradeID IN
(
SELECT grades.gradeID
FROM assignments LEFT JOIN grades USING (assignmentID)
WHERE your_conditions = TRUE
);
But, you're not doing anything really fancy in the where clause in the subquery.
I suspect something more like
SELECT FOO()
FROM assignments LEFT JOIN grades USING (assignmentID)
WHERE your_conditions_with_some_tweaks = TRUE
GROUP BY groupings;
would work just as well.
If I'm missing some key logic here please comment back and I'll edit/delete this post.