Please find below a SQL query I am trying to run. The final output is around 60k rows, but the query takes close to 5 hours to run. There is no problem with the connection, so I suspect the query itself needs heavy optimization. Can somebody please point me in the right direction?
SELECT
rapidview.name AS RapidView,
CASE
WHEN linktype.LINKNAME ="jira_subtask_link"
THEN sprintdest.name
ELSE sprint.name
END AS Sprint,
j.pkey AS CaseKey,
-- Subtasks aren't assigned sprint details; these are pulled directly from the parent task, so that
-- logic is implemented here for pulling all sprint-related info
CASE
WHEN linktype.LINKNAME ="jira_subtask_link"
THEN FROM_UNIXTIME(sprintdest.start_date/1000)
ELSE FROM_UNIXTIME(sprint.start_date/1000)
END AS SprintStartDate,
CASE
WHEN linktype.LINKNAME ="jira_subtask_link"
THEN FROM_UNIXTIME(sprintdest.END_DATE/1000)
ELSE FROM_UNIXTIME(sprint.END_DATE/1000)
END AS SprintEndDate,
StoryPoints.numbervalue AS StoryPoint,
c.cname AS Component,
it.pname AS Type,
p.pname AS Project,
iss.pname AS Status,
dest.pkey AS linkedissue,
dest.id AS destid,
dest.created AS linkedissuecreated,
(cglinkedissue.created) AS LinkedIssueClosedDate,
linktype.LINKNAME AS LinkType,
cfoowner.customvalue AS Owner,
j.created AS Created,
cg.created AS ClosedDate,
CASE
WHEN linktype.LINKNAME ="jira_subtask_link"
THEN (
CASE
WHEN sprintdest.started=true
AND sprintdest.closed=false
THEN "Current Sprint"
WHEN sprintdest.started=true
AND sprintdest.closed=true
THEN "Completed Sprint"
WHEN sprintdest.started=false
AND sprintdest.closed=false
THEN "Future Sprint"
END)
ELSE (
CASE
WHEN sprint.started=true
AND sprint.closed=false
THEN "Current Sprint"
WHEN sprint.started=true
AND sprint.closed=true
THEN "Completed Sprint"
WHEN sprint.started=false
AND sprint.closed=false
THEN "Future Sprint"
END)
END AS SprintStatus,
j.TIMEORIGINALESTIMATE/3600 AS EstimatedTime,
j.TIMEESTIMATE/3600 AS RemainingTime,
j.TIMESPENT/3600 AS LoggedHours ,
cg.id AS CGID,
ci.groupid AS cigroupid,
ci.field AS CIFIELD,
ci.newstring AS NEWSTRING
-- DevLead.stringvalue as DevLead,
-- PMLead.stringvalue as PMLead,
-- QaLead.stringvalue as QALead,
-- DevLeadName.display_name as DevleadDisplayName,
-- PMLeadName.display_name as PMLeadDisplayName,
-- QALeadName.display_name as QALeadDisplayName
FROM
jiraissue j
LEFT JOIN
customfieldvalue cfv
ON
cfv.issue=j.id
AND cfv.customfield=11002
LEFT JOIN
AO_60DB71_SPRINT sprint
ON
sprint.id=cfv.stringvalue
LEFT JOIN
AO_60DB71_RAPIDVIEW rapidview
ON
sprint.RAPID_VIEW_ID=rapidview.id
LEFT JOIN
nodeassociation na
ON
j.id=na.source_node_id
AND na.association_type = ('IssueComponent')
LEFT JOIN
component c
ON
na.sink_node_id=c.id
LEFT JOIN
customfieldvalue StoryPoints
ON
j.id=StoryPoints.issue
AND StoryPoints.customfield=10572
/*
LEFT JOIN
customfieldvalue PMLead
ON
j.id=PMLead.issue
AND PMLead.customfield=10382
LEFT JOIN
customfieldvalue DevLead
ON
j.id=DevLead.issue
AND DevLead.customfield=10380
LEFT JOIN
customfieldvalue QaLead
ON
j.id=QaLead.issue
AND QaLead.customfield=10381
left join cwd_user DevLeadName
on DevLead.stringvalue=DevLeadName.user_name
left join cwd_user PMLeadName
on PMLead.stringvalue=PMLeadName.user_name
left join cwd_user QALeadName
on QaLead.stringvalue=QALeadName.user_name
*/
LEFT JOIN
issuetype it -- To pull in issuetype
ON
j.issuetype=it.id
LEFT JOIN
project p -- To pull in project
ON
j.project=p.id
LEFT JOIN
issuestatus iss -- To pull in Case Status
ON
j.issuestatus=iss.id
LEFT JOIN
issuelink il -- To identify linked cases
ON
j.id=il.destination
LEFT JOIN
issuelinktype linktype
ON
il.linktype=linktype.id
LEFT JOIN
jiraissue dest -- To identify the component for the linked case
ON
dest.id=il.source
LEFT JOIN
customfieldvalue owner -- To pull in customfields
ON
j.id=owner.issue
AND owner.customfield=10310
LEFT JOIN
customfieldoption cfoowner -- To pull in customfields
ON
cfoowner.id=owner.stringvalue
LEFT JOIN
changegroup cg -- To pull in case history to identify status changes
ON
j.id=cg.issueid
LEFT JOIN
changeitem ci
ON
cg.id=ci.groupid
AND ci.field='status'
AND ci.newstring LIKE '%Closed%'
LEFT JOIN
changegroup cglinkedissue -- To pull in case history to identify status changes
ON
dest.id=cglinkedissue.issueid
LEFT JOIN
changeitem cilinkedissue
ON
cilinkedissue.groupid=cglinkedissue.id
AND cilinkedissue.field='status'
AND cilinkedissue.newstring LIKE '%Closed%'
LEFT JOIN
customfieldvalue cfvdest
ON
cfvdest.issue=dest.id
AND cfvdest.customfield=11002
LEFT JOIN
AO_60DB71_SPRINT sprintdest
ON
sprintdest.id=cfvdest.stringvalue
-- year( FROM_UNIXTIME(sprint.END_DATE/1000) /1000)>=2015
-- or year( FROM_UNIXTIME(sprintdest.END_DATE/1000) /1000)>=2015
-- where
-- j.pkey='CLQ-41441'
group by
j.id,
c.id,il.id,sprint.id
Execution Plan
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE j ALL (null) (null) (null) (null) 891945 (null)
1 SIMPLE cfv ref cfvalue_issue cfvalue_issue 18 jira_rnd_p.j.ID,const 1 (null)
1 SIMPLE sprint eq_ref PRIMARY PRIMARY 8 jira_rnd_p.cfv.STRINGVALUE 1 Using where
1 SIMPLE rapidview eq_ref PRIMARY PRIMARY 8 jira_rnd_p.sprint.RAPID_VIEW_ID 1 (null)
1 SIMPLE na ref PRIMARY,node_source PRIMARY 8 jira_rnd_p.j.ID 1 Using where; Using index
1 SIMPLE c eq_ref PRIMARY PRIMARY 8 jira_rnd_p.na.SINK_NODE_ID 1 (null)
1 SIMPLE StoryPoints ref cfvalue_issue cfvalue_issue 18 jira_rnd_p.j.ID,const 1 (null)
1 SIMPLE it eq_ref PRIMARY PRIMARY 182 jira_rnd_p.j.issuetype 1 Using where
1 SIMPLE p eq_ref PRIMARY PRIMARY 8 jira_rnd_p.j.PROJECT 1 (null)
1 SIMPLE iss eq_ref PRIMARY PRIMARY 182 jira_rnd_p.j.issuestatus 1 Using where
1 SIMPLE il ref issuelink_dest issuelink_dest 9 jira_rnd_p.j.ID 1 (null)
1 SIMPLE linktype eq_ref PRIMARY PRIMARY 8 jira_rnd_p.il.LINKTYPE 1 (null)
1 SIMPLE dest eq_ref PRIMARY PRIMARY 8 jira_rnd_p.il.SOURCE 1 (null)
1 SIMPLE owner ref cfvalue_issue cfvalue_issue 18 jira_rnd_p.j.ID,const 1 (null)
1 SIMPLE cfoowner eq_ref PRIMARY PRIMARY 8 jira_rnd_p.owner.STRINGVALUE 1 Using where
1 SIMPLE cg ref chggroup_issue chggroup_issue 9 jira_rnd_p.j.ID 4 (null)
1 SIMPLE ci ref chgitem_chggrp,chgitem_field chgitem_chggrp 9 jira_rnd_p.cg.ID 1 Using where
1 SIMPLE cglinkedissue ref chggroup_issue chggroup_issue 9 jira_rnd_p.dest.ID 4 (null)
1 SIMPLE cilinkedissue ref chgitem_chggrp,chgitem_field chgitem_chggrp 9 jira_rnd_p.cglinkedissue.ID 1 Using where
1 SIMPLE cfvdest ref cfvalue_issue cfvalue_issue 18 jira_rnd_p.dest.ID,const 1 (null)
1 SIMPLE sprintdest eq_ref PRIMARY PRIMARY 8 jira_rnd_p.cfvdest.STRINGVALUE 1 Using where
Try EXPLAIN.
Pay attention to possible_keys, key, rows.
Maybe you can post the EXPLAIN result, and we can see what to do.
Do you need LEFT? That is, are all those other tables optional?
If you can get rid of LEFT in certain cases, you can avoid scanning all 891K rows of j.
Are you only interested in "closed" items? If so, the query does not limit itself to them, again because of LEFT.
I would start by removing LEFT wherever practical. Then move the AND clauses that are not really part of the JOIN to a WHERE on the end. This might allow the query to filter stuff sooner, rather than lugging 891K (or more) rows (including lots of NULLs) around before getting to the GROUP BY.
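To make the effect of LEFT concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely so that it is self-contained; the tables issue and change_item are hypothetical, cut-down stand-ins for jiraissue and changeitem, and the data is invented. The LEFT JOIN behaviour shown is standard SQL, so the same distinction applies in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for jiraissue / changeitem
    CREATE TABLE issue (id INTEGER PRIMARY KEY, pkey TEXT);
    CREATE TABLE change_item (issue_id INTEGER, field TEXT, newstring TEXT);
    INSERT INTO issue VALUES (1, 'A-1'), (2, 'A-2'), (3, 'A-3');
    INSERT INTO change_item VALUES
        (1, 'status', 'Closed'),
        (2, 'status', 'In Progress');
""")

# Filter inside ON of a LEFT JOIN: every issue is kept,
# and issues without a matching row get NULL (None).
left_rows = conn.execute("""
    SELECT i.pkey, ci.newstring
    FROM issue i
    LEFT JOIN change_item ci
           ON ci.issue_id = i.id
          AND ci.field = 'status'
          AND ci.newstring LIKE '%Closed%'
    ORDER BY i.id
""").fetchall()

# Plain JOIN with the filter in WHERE: only closed issues survive,
# so far fewer rows reach any later GROUP BY.
inner_rows = conn.execute("""
    SELECT i.pkey, ci.newstring
    FROM issue i
    JOIN change_item ci ON ci.issue_id = i.id
    WHERE ci.field = 'status'
      AND ci.newstring LIKE '%Closed%'
    ORDER BY i.id
""").fetchall()

print(left_rows)
print(inner_rows)
```

Whether the second form is acceptable depends on whether issues that were never closed are supposed to appear in the report; if they are, the LEFT JOIN has to stay.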
I am dealing with a query with a decent number of joins and a lot of many-to-many relationships.
The only tables with a one-to-many relationship would be invoice, so, and xc_orders.
Each of these tables also has hundreds of thousands of rows:
invoice has 822,967 rows
invc_fee has 208,021 rows
invc_tender has 821,799 rows
customer has 377,515 rows
cust_address has 665,633 rows
invc_item has 1,975,436 rows
invn_sbs has 122,669 rows
so has 195,169 rows
xc_orders has 267,165 rows
If I split the query below into two separate queries based on the WHERE conditions, the run time drops from 56.8 seconds for the combined query to 5.36 seconds for the first query and 5.32 seconds for the second. I take it this is due to the OR clause? Is running the queries on their own, without caching the results, and comparing the run times the most obvious way to determine whether it's all right to combine the WHERE conditions? Is there something I am missing that would speed up the query while keeping the OR conditions in it? Thanks for the help.
For what it's worth, this is being run on a MySQL 5.5 database.
SELECT SQL_NO_CACHE i.invc_no, DATE_FORMAT(i.created_date, '%Y-%m-%d') AS invcdate, IF(i.so_no LIKE '%WEB%', substring(i.so_no,5,10),i.so_no) AS so, format(SUM(it.amt),2) AS invc_amt, i.invc_type, format(ii.qty,0) as qty, isb.description1, format(ii.price,2) As price, replace(isb.dcs_code,' ','') AS dcs, isb.siz, isb.attr, trim(i.note) AS invc_note, trim(so.note) AS so_note, trim(xo.notes) AS xcart_notes, trim(xo.customer_notes) AS xcart_cust_notes
FROM rp.invoice AS i
LEFT JOIN rp.invc_fee AS ife ON i.invc_sid = ife.invc_sid
LEFT JOIN rp.invc_tender AS it ON it.invc_sid = i.invc_sid
LEFT JOIN rp.customer AS c ON i.cust_sid = c.cust_sid
LEFT JOIN rp.cust_address AS ca ON c.cust_sid = ca.cust_sid /* NEW */
LEFT JOIN rp.invc_item AS ii ON ii.invc_sid = i.invc_sid
LEFT JOIN rp.invn_sbs AS isb ON isb.item_sid = ii.item_sid
LEFT JOIN rp.so AS so ON so.so_sid = i.so_sid
LEFT JOIN dev.xc_orders AS xo ON xo.orderid = REPLACE(so.so_no,'WEB0','')
WHERE i.invc_no != '0' AND (c.email_addr = 'email#gmail.com' OR (c.first_name = 'Eric' AND c.last_name = 'MXXXX' AND ca.address1 LIKE '1234%' AND ca.zip = '12345')) AND IFNULL(ife.fee_type, 0) >= 0
GROUP BY i.invc_no, i.created_date, i.so_no, i.invc_type, ii.qty, isb.description1, ii.price, isb.dcs_code, isb.siz, isb.attr, i.note, so.note, xo.notes, xo.customer_notes, ii.item_pos, ii.item_sid
ORDER BY i.created_date desc, i.invc_no, i.invc_type
Here are the EXPLAIN results:
id select table type possible_keys key_len ref rows filtered Extra
1 SIMPLE i ALL INVC_NO 822967 91.92 Using where; Using temporary; Using filesort
1 SIMPLE ife ref PRIMARY 8 rp.i.INVC_SID 2080 100.00 Using where; Using index
1 SIMPLE it ref PRIMARY 8 rp.i.INVC_SID 8217 100.00
1 SIMPLE c eq_ref PRIMARY 8 rp.i.CUST_SID 1 100.00 Using where
1 SIMPLE ca ref PRIMARY 8 rp.c.CUST_SID 6656 100.00 Using where
1 SIMPLE ii ref PRIMARY 8 rp.i.INVC_SID 19754 100.00
1 SIMPLE isb ref PRIMARY 8 rp.ii.ITEM_SID 1226 100.00 Using where
1 SIMPLE so eq_ref PRIMARY 8 rp.i.SO_SID 1 100.00 Using where
1 SIMPLE xo eq_ref PRIMARY 4 func 1 100.00 Using where
To improve performance, I would suggest replacing
LEFT JOIN rp.customer AS c ON i.cust_sid = c.cust_sid
LEFT JOIN rp.cust_address AS ca ON c.cust_sid = ca.cust_sid /* NEW */
....
WHERE i.invc_no != '0' AND (c.email_addr = 'email#gmail.com' OR (c.first_name = 'Eric' AND c.last_name = 'MXXXX' AND ca.address1 LIKE '1234%' AND ca.zip = '12345')) AND IFNULL(ife.fee_type, 0) >= 0
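with
LEFT JOIN
( SELECT * FROM rp.customer WHERE email_addr = 'email#gmail.com' OR (first_name = 'Eric' AND last_name = 'MXXXX') ) AS c ON i.cust_sid = c.cust_sid
LEFT JOIN (SELECT * FROM rp.cust_address WHERE address1 LIKE '1234%' AND zip = '12345') AS ca ON c.cust_sid = ca.cust_sid /* NEW */
....
WHERE i.invc_no != '0' AND IFNULL(ife.fee_type, 0) >= 0
Note that this is not exactly equivalent to the original. As written, the LEFT JOINs no longer filter out invoices whose customer does not match, so at least the join to c would have to become an inner JOIN. The address condition is also no longer tied to the name match, so check the results against the original query.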
I have a MySQL table (MyISAM) containing approximately two million rows of name, address, and company data. The first name and surname are held in separate columns, so I also have a second table (linked by the primary key of the first) which holds a single full name column.
The first name, surname, and company name (among others) in the first table are indexed, as is the full name column in the secondary table.
Taking this query as a starting point:
SELECT * FROM table_a INNER JOIN table_b ON table_a.ID = table_b.ID WHERE....
Searching for an exact match, or even a prefix LIKE, on the name columns works in milliseconds:
....table_a.first_name = 'Fred'
....table_a.surname = 'Bloggs'
....table_b.fullname = 'Fred Bloggs'
....table_a.first_name LIKE 'Mike%'
just a few examples.
Throw the COMPANY NAME in there as well..... the query suddenly takes 15 to 20 seconds:
....table_a.first_name = 'Fred' OR table_a.company_name = 'Widgets Inc'
for example
Both fields are indexed, it's an exact match.... why would the addition of a second indexed search column slow things down so much? Have I missed something about my table design?
Examples follow - there are a few other tables joined but I'm not sure these are affecting performance:
Example of name-only query which returns in 0.0123 seconds:
SELECT SQL_CALC_FOUND_ROWS
webmaster.dupe_master_id AS webmaster_id,
webmaster.first_name,
webmaster.family_name,
webmaster.job_title,
webmaster.company_name,
webmaster.address_1,
webmaster.address_2,
webmaster.town_city,
webmaster.state_county,
webmaster.post_code,
webmaster.email,
webmaster.ignored,
countries.country_name,
GROUP_CONCAT(DISTINCT titles.code ORDER BY code ASC) AS sub_string,
'' AS expo_string
FROM
(`webmaster`)
LEFT JOIN `countries` ON `countries`.`country_id` = `webmaster`.`country_id`
LEFT JOIN `red_subscriptions` ON `red_subscriptions`.`webmaster_id` = `webmaster`.`webmaster_id` AND red_subscriptions.subscription_status_id = 2
LEFT JOIN `titles` ON `titles`.`title_id` = `red_subscriptions`.`title_id`
LEFT JOIN `webmaster_tags` ON `webmaster_tags`.`webmaster_id` = `webmaster`.`webmaster_id`
LEFT JOIN `tags` ON `tags`.`tag_id` = `webmaster_tags`.`tag_id`
INNER JOIN `webmaster_search_data` ON `webmaster`.`webmaster_id` = `webmaster_search_data`.`webmaster_id`
WHERE
(full_name = '<name>')
GROUP BY
`webmaster`.`dupe_master_id`
LIMIT 50
Add in company_name (also indexed) and the query time goes through the roof:
SELECT SQL_CALC_FOUND_ROWS
webmaster.dupe_master_id AS webmaster_id,
webmaster.first_name,
webmaster.family_name,
webmaster.job_title,
webmaster.company_name,
webmaster.address_1,
webmaster.address_2,
webmaster.town_city,
webmaster.state_county,
webmaster.post_code,
webmaster.email,
webmaster.ignored,
countries.country_name,
GROUP_CONCAT(DISTINCT titles.code ORDER BY code ASC) AS sub_string,
'' AS expo_string
FROM
(`webmaster`)
LEFT JOIN `countries` ON `countries`.`country_id` = `webmaster`.`country_id`
LEFT JOIN `red_subscriptions` ON `red_subscriptions`.`webmaster_id` = `webmaster`.`webmaster_id` AND red_subscriptions.subscription_status_id = 2
LEFT JOIN `titles` ON `titles`.`title_id` = `red_subscriptions`.`title_id`
LEFT JOIN `webmaster_tags` ON `webmaster_tags`.`webmaster_id` = `webmaster`.`webmaster_id`
LEFT JOIN `tags` ON `tags`.`tag_id` = `webmaster_tags`.`tag_id`
INNER JOIN `webmaster_search_data` ON `webmaster`.`webmaster_id` = `webmaster_search_data`.`webmaster_id`
WHERE
(full_name = '<name>' OR company_name = '<name>')
GROUP BY
`webmaster`.`dupe_master_id`
LIMIT 50
EXPLAIN on full_name only:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE webmaster_search_data ref webmaster_id,full_name full_name 302 const 94 Using where; Using temporary; Using filesort
1 SIMPLE webmaster eq_ref PRIMARY PRIMARY 4 webmaster_search_data.webmaster_id 1
1 SIMPLE countries eq_ref PRIMARY PRIMARY 2 webmaster.country_id 1
1 SIMPLE red_subscriptions ref webmaster_id,subscription_status_id webmaster_id 4 webmaster_search_data.webmaster_id 1
1 SIMPLE titles eq_ref PRIMARY PRIMARY 2 red_subscriptions.title_id 1
1 SIMPLE webmaster_tags ref webmaster_id webmaster_id 4 webmaster_search_data.webmaster_id 5
1 SIMPLE tags eq_ref PRIMARY PRIMARY 2 webmaster_tags.tag_id 1 Using index
Explain when company_name is added:
1 SIMPLE webmaster index PRIMARY,company_name dupe_master_id 4 NULL 2072015 Using filesort
1 SIMPLE countries eq_ref PRIMARY PRIMARY 2 webmaster.country_id 1
1 SIMPLE red_subscriptions ref webmaster_id,subscription_status_id webmaster_id 4 webmaster.webmaster_id 1
1 SIMPLE titles eq_ref PRIMARY PRIMARY 2 red_subscriptions.title_id 1
1 SIMPLE webmaster_tags ref webmaster_id webmaster_id 4 webmaster.webmaster_id 5
1 SIMPLE tags eq_ref PRIMARY PRIMARY 2 webmaster_tags.tag_id 1 Using index
1 SIMPLE webmaster_search_data eq_ref webmaster_id,full_name webmaster_id 4 webmaster.webmaster_id 1 Using where
In most cases MySQL uses only one index per table in a query. When you add the company name with OR, it can no longer rely on the name index alone, because rows that match only on company name would be missed; there are now more columns it has to check to get an exact result.
It is probably doing a full scan, which would match the 2,072,015 rows estimated in your second EXPLAIN.
You could split the query in two with a UNION; that way each half can use the index on its own column.
SELECT * FROM
( SELECT * FROM table_a
INNER JOIN table_b ON table_a.ID = table_b.ID
WHERE table_a.first_name = 'Fred'
UNION
SELECT * FROM table_a
INNER JOIN table_b ON table_a.ID = table_b.ID
WHERE table_a.company_name = 'Widgets Inc'
) sub;
Each query is evaluated separately and can use the appropriate index. The UNION removes duplicates, so in the end you will have the same result.
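As a quick sanity check of the "same result" claim, here is a small sketch using Python's built-in sqlite3 module (chosen only so the example is self-contained; the sample rows and index names are invented). It runs the OR form and the UNION form over the same data and compares the results. The columns are listed explicitly rather than with SELECT *, on the assumption that a derived table should not contain two columns both named id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, first_name TEXT, company_name TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, fullname TEXT);
    CREATE INDEX idx_a_first_name ON table_a (first_name);
    CREATE INDEX idx_a_company ON table_a (company_name);
    INSERT INTO table_a VALUES
        (1, 'Fred', 'Widgets Inc'),
        (2, 'Fred', 'Acme'),
        (3, 'Mike', 'Widgets Inc'),
        (4, 'Jane', 'Acme');
    INSERT INTO table_b VALUES
        (1, 'Fred Bloggs'), (2, 'Fred Smith'), (3, 'Mike Jones'), (4, 'Jane Doe');
""")

# Single query with OR across two indexed columns.
or_rows = conn.execute("""
    SELECT a.id, a.first_name, a.company_name, b.fullname
    FROM table_a a JOIN table_b b ON a.id = b.id
    WHERE a.first_name = 'Fred' OR a.company_name = 'Widgets Inc'
    ORDER BY a.id
""").fetchall()

# Two single-condition queries combined with UNION.
# Row 1 matches both halves; UNION returns it only once.
union_rows = conn.execute("""
    SELECT * FROM (
        SELECT a.id, a.first_name, a.company_name, b.fullname
        FROM table_a a JOIN table_b b ON a.id = b.id
        WHERE a.first_name = 'Fred'
        UNION
        SELECT a.id, a.first_name, a.company_name, b.fullname
        FROM table_a a JOIN table_b b ON a.id = b.id
        WHERE a.company_name = 'Widgets Inc'
    ) sub
    ORDER BY id
""").fetchall()

print(or_rows == union_rows, len(union_rows))
```

This only demonstrates that the two forms return the same rows; whether the UNION is actually faster on the real tables still has to be confirmed with EXPLAIN and timing in MySQL.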
I have this complex query which produces 3744 rows in about 50ms.
SELECT
srl.event_id as eid
, srl.race_num as rnum
, bts.boat_id as bid_id
, srl.series_year as yr
, srl.id as id
, IFNULL(rfi.fleet,fleet_def) as flt_old,flt_match,s.series_id as sid
, s.series_year as syr
,IFNULL(ovr_pts,POINTS('4',IFNULL(ovr_place,place),num_start)) as points
FROM
(SELECT en1.boat_id,en1.boat_name,MAX(fleet) as fleet_def FROM entries en1
JOIN series_race_list srl1 ON srl1.event_id=en1.event_id
AND srl1.series_year=en1.race_year
LEFT JOIN entries_race er1 ON en1.boat_id= er1.boat_id
AND srl1.event_id=en1.event_id
AND srl1.series_year =en1.race_year
WHERE srl1.series_id ='3' AND srl1.series_year ='2012'
AND en1.entry_deleted='N'
GROUP BY boat_id) bts
JOIN series_race_list srl LEFT JOIN series as s ON s.series_id=srl.series_id
AND s.series_year =srl.series_year
LEFT JOIN entries as en ON srl.event_id=en.event_id
AND srl.series_year =en.race_year AND bts.boat_id =en.boat_id
LEFT JOIN entries_race er ON er.race_id= srl.event_id AND er.race_num=srl.race_num
AND er.yr = srl.series_year AND bts.boat_id =er.boat_id
LEFT JOIN event_race_info as eri ON eri.race_id= srl.event_id
AND eri.race_num=srl.race_num AND eri.yr = srl.series_year
AND er.line=eri.line AND status REGEXP 'prelim|final'
LEFT JOIN race_results as rr ON srl.event_id=rr.race_id
AND srl.race_num= rr.race_num AND srl.series_year =rr.yr
AND bts.boat_id= rr.boat_id AND checked_in='Y'
LEFT JOIN race_fleet_info as rfi ON rfi.race_id= srl.event_id
AND rfi.yr=srl.series_year AND srl.race_num= rfi.race_num
AND rfi.fleet=rr.flt AND complete='Y'
LEFT JOIN series_pts_override as spo ON srl.id =spo.id AND en.boat_id =spo.bid
WHERE s.series_id ='3' AND s.series_year ='2012' AND approved ='Y'
Sorry for the length. As I said, this query executes in around 50 ms. Now I want to use this data and run queries against this 3744-row result. As soon as I wrap it in a query like
SELECT eid FROM(
......previous query here.....
) data
the execution time goes from 50 ms to 2.5 seconds. Ouch!
I tried creating a temporary table; that took just as long. (This is actually my preferred approach, since I will need to run a few different queries on this result set.)
From reading on this site, I don't think this is a correlated subquery, but it seems to be acting like one.
It seems that the act of creating an aliased table is my issue, since the subquery has a derived-table alias and the temp table is obviously a table.
How can I get access to these 3744 rows of data without this time penalty?
If it would help, I can figure out how to post the EXPLAINs.
Explain for the longer query:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 3744
2 DERIVED s const PRIMARY PRIMARY 5 1
2 DERIVED srl ref series_id,series_id_2 series_id 5 16 Using where
2 DERIVED <derived3> ALL NULL NULL NULL NULL 208 Using join buffer
2 DERIVED en eq_ref PRIMARY,event_id,event_id_2 PRIMARY 9 race_reg_test.srl.event_id,bts.boat_id 1 Using index
2 DERIVED er ref PRIMARY,boat_id,boat_id_2 boat_id_2 5 bts.boat_id 5
2 DERIVED eri eq_ref PRIMARY PRIMARY 13 race_reg_test.srl.event_id,race_reg_test.srl.race_... 1
2 DERIVED rr ref PRIMARY,boat_id boat_id 4 bts.boat_id 9
2 DERIVED rfi eq_ref PRIMARY PRIMARY 31 race_reg_test.srl.event_id,race_reg_test.srl.race_... 1
2 DERIVED spo ref PRIMARY PRIMARY 8 race_reg_test.srl.id,race_reg_test.en.boat_id 1
3 DERIVED srl1 ref series_id,series_id_2 series_id 5 16 Using index; Using temporary; Using filesort
3 DERIVED en1 ref PRIMARY,event_id,event_id_2 PRIMARY 5 race_reg_test.srl1.event_id 11 Using where
3 DERIVED er1 ref boat_id,boat_id_2 boat_id 4 race_reg_test.en1.boat_id 9 Using index
You said you tried creating a temporary table, I am not sure if by that you mean a View or not.
I would create a View with that query and then perform any queries necessary on the View.
CREATE VIEW massive_query_view AS
SELECT
srl.event_id as eid
, srl.race_num as rnum
, bts.boat_id as bid_id
, srl.series_year as yr
, srl.id as id
, IFNULL(rfi.fleet,fleet_def) as flt_old,flt_match,s.series_id as sid
, s.series_year as syr
,IFNULL(ovr_pts,POINTS('4',IFNULL(ovr_place,place),num_start)) as points
FROM
(SELECT en1.boat_id,en1.boat_name,MAX(fleet) as fleet_def FROM entries en1
JOIN series_race_list srl1 ON srl1.event_id=en1.event_id
AND srl1.series_year=en1.race_year
LEFT JOIN entries_race er1 ON en1.boat_id= er1.boat_id
AND srl1.event_id=en1.event_id
AND srl1.series_year =en1.race_year
WHERE srl1.series_id ='3' AND srl1.series_year ='2012'
AND en1.entry_deleted='N'
GROUP BY boat_id) bts
JOIN series_race_list srl LEFT JOIN series as s ON s.series_id=srl.series_id
AND s.series_year =srl.series_year
LEFT JOIN entries as en ON srl.event_id=en.event_id
AND srl.series_year =en.race_year AND bts.boat_id =en.boat_id
LEFT JOIN entries_race er ON er.race_id= srl.event_id AND er.race_num=srl.race_num
AND er.yr = srl.series_year AND bts.boat_id =er.boat_id
LEFT JOIN event_race_info as eri ON eri.race_id= srl.event_id
AND eri.race_num=srl.race_num AND eri.yr = srl.series_year
AND er.line=eri.line AND status REGEXP 'prelim|final'
LEFT JOIN race_results as rr ON srl.event_id=rr.race_id
AND srl.race_num= rr.race_num AND srl.series_year =rr.yr
AND bts.boat_id= rr.boat_id AND checked_in='Y'
LEFT JOIN race_fleet_info as rfi ON rfi.race_id= srl.event_id
AND rfi.yr=srl.series_year AND srl.race_num= rfi.race_num
AND rfi.fleet=rr.flt AND complete='Y'
LEFT JOIN series_pts_override as spo ON srl.id =spo.id AND en.boat_id =spo.bid
WHERE s.series_id ='3' AND s.series_year ='2012' AND approved ='Y'
Then, you can perform queries on the View.
SELECT * FROM massive_query_view;
Hope that speeds things up. Another thing you can do is check your indexes. Indexes make where clauses faster but inserts slower. For more information, view the MySQL documentation on how MySQL uses indexes: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html.
A few things, but the biggest one I see is in your original query... at the point of
boat_id) bts
JOIN series_race_list srl
LEFT JOIN series as s
You have no "ON" condition between bts and srl, which will result in a Cartesian product and is probably a big killer for you. For every record in bts, it's creating an entry for every record in srl, then joining that product to series. The join from srl to series is OK, as it is joined on apparently valid criteria / keys.
Next, you have a few fields that are not qualified as alias.field, such as MAX(fleet) in the inner-most query that is aliased as "bts". In addition, why the MAX(fleet) if it's grouped by the boat ID, which I would interpret as a primary key and so unique? Would a boat ever change its fleet? If so, is this accurate? Say you have a table of fleets (each with its own auto-sequence ID), and a boat changes ownership/sponsorship (whatever) from fleet 93 to a pre-existing fleet that already had an ID on file of 47. Even though 47 is the newest relationship, it is the older, lower ID, so MAX() would still return 93. Is that what you really want?
Additional fields without alias.field: ovr_pts, ovr_place and place in the field list (and what is the POINTS() function?), status at the regular expression, checked_in at race results, complete at race fleet info, and finally approved in the final WHERE clause. Minor, but it could be helpful for index optimizing.
Lastly, your query has the WHERE clause on specific "s.series_id..." and "s.series_year...", yet you LEFT JOIN to series earlier in the query. This basically cancels out the LEFT JOIN and turns it into an implied INNER JOIN, since you are not allowing NULL as a valid option for inclusion.
After some clarification, I might even suggest restructuring the query somewhat, but the biggest thing I see is from the start... no "ON" condition joining bts and the series_race_list table.
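To show how quickly a JOIN without ON multiplies rows, here is a small sketch using Python's built-in sqlite3 module (only so that it runs on its own). The boats and races tables and their contents are hypothetical stand-ins for bts and series_race_list, and the event_id join condition is just an assumed example of a key the two sides might share.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the bts derived table and series_race_list
    CREATE TABLE boats (boat_id INTEGER PRIMARY KEY, event_id INTEGER);
    CREATE TABLE races (id INTEGER PRIMARY KEY, event_id INTEGER, race_num INTEGER);
    INSERT INTO boats VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO races VALUES (1, 10, 1), (2, 10, 2), (3, 20, 1), (4, 30, 1);
""")

# JOIN without ON: every boat is paired with every race (3 x 4 rows).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM boats b JOIN races r"
).fetchone()[0]

# JOIN with ON: only pairs that share an event_id are produced.
joined_count = conn.execute(
    "SELECT COUNT(*) FROM boats b JOIN races r ON r.event_id = b.event_id"
).fetchone()[0]

print(cross_count, joined_count)
```

If every boat really is meant to be paired with every race in the series, the Cartesian product is intentional; in that case writing it as an explicit CROSS JOIN at least makes the intent visible.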
I am writing a PHP script that creates an SQL query. The script and database are for the Joomla CMS; specifically, the query runs against the SOBIPro component's tables (to use the data entered in that component). However, because SOBIPro stores each instance of a field as its own row in a table, I have to join a separate instance of the table for every field I want to pull back. This doesn't seem very efficient, and in fact this one search times out.
The SQL query is as follows (this is after being generated by my PHP code):
SELECT DISTINCT o.id AS entryid, o.parent AS parentID, name.baseData AS title,business.baseData AS business_data,
contact_fn.baseData AS contact_fn_data ,contact_ln.baseData AS contact_ln_data ,position.baseData AS position_data,
civic1.baseData AS civic1_data ,civic2.baseData AS civic2_data ,mailing.baseData AS mailing_data,
community.baseData AS community_data ,municip.baseData AS municip_data ,county.baseData AS county_data,
province.baseData AS province_data ,country.baseData AS country_data ,postal.baseData AS descr_data,
phone.baseData AS phone_data ,tollfree.baseData AS tollfree_data ,fax.baseData AS fax_data,
email.baseData AS email_data ,web.baseData AS web_data ,empTotal.baseData AS empTotal_data
FROM jos_sobipro_object AS o
INNER JOIN jos_sobipro_field_data AS name ON name.sid = o.id
INNER JOIN jos_sobipro_relations AS r ON o.id = r.id
LEFT JOIN jos_sobipro_field_data AS business ON business.sid = o.id AND business.fid = 36
LEFT JOIN jos_sobipro_field_data AS contact_fn ON contact_fn.sid = o.id AND contact_fn.fid = 74
LEFT JOIN jos_sobipro_field_data AS contact_ln ON contact_ln.sid = o.id AND contact_ln.fid = 75
LEFT JOIN jos_sobipro_field_data AS position ON position.sid = o.id AND position.fid = 76
LEFT JOIN jos_sobipro_field_data AS civic1 ON civic1.sid = o.id AND civic1.fid = 77
LEFT JOIN jos_sobipro_field_data AS civic2 ON civic2.sid = o.id AND civic2.fid = 78
LEFT JOIN jos_sobipro_field_data AS mailing ON mailing.sid = o.id AND mailing.fid = 79
LEFT JOIN jos_sobipro_field_data AS community ON community.sid = o.id AND community.fid = 80
LEFT JOIN jos_sobipro_field_data AS municip ON municip.sid = o.id AND municip.fid = 81
LEFT JOIN jos_sobipro_field_data AS county ON county.sid = o.id AND county.fid = 82
LEFT JOIN jos_sobipro_field_data AS province ON province.sid = o.id AND province.fid = 83
LEFT JOIN jos_sobipro_field_data AS country ON country.sid = o.id AND country.fid = 84
LEFT JOIN jos_sobipro_field_data AS postal ON postal.sid = o.id AND postal.fid = 85
LEFT JOIN jos_sobipro_field_data AS phone ON phone.sid = o.id AND phone.fid = 86
LEFT JOIN jos_sobipro_field_data AS tollfree ON tollfree.sid = o.id AND tollfree.fid = 87
LEFT JOIN jos_sobipro_field_data AS fax ON fax.sid = o.id AND fax.fid = 88
LEFT JOIN jos_sobipro_field_data AS email ON email.sid = o.id AND email.fid = 89
LEFT JOIN jos_sobipro_field_data AS web ON web.sid = o.id AND web.fid = 90
LEFT JOIN jos_sobipro_field_data AS empTotal ON empTotal.sid = o.id AND empTotal.fid = 106
WHERE o.approved = 1 AND o.oType = 'entry' AND name.fid = 36 AND name.baseData <> ''
AND name.section = 54 AND r.pid IN (415,418,425,431,458) AND (municip.baseData = "Municipality Name")
ORDER BY name.baseData ASC
It seems to work decently fast as long as the municip.baseData search isn't involved, in which case it flops even at 15 entries in the directory. There has to be a better way to get this SQL code designed, while still bringing back all of the fields needed. This query is called via AJAX, and eventually there will be 2000+ entries in the directory.
EDIT: Here is the EXPLAIN output, as requested:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE name ref PRIMARY PRIMARY 8 const,const 15 Using where; Using temporary; Using filesort
1 SIMPLE municip ref PRIMARY PRIMARY 4 const 9 Using where
1 SIMPLE o eq_ref PRIMARY,oType PRIMARY 4 [[dbname]].municip.sid 1 Using where
1 SIMPLE county ref PRIMARY PRIMARY 4 const 10
1 SIMPLE province ref PRIMARY PRIMARY 4 const 10
1 SIMPLE country ref PRIMARY PRIMARY 4 const 8
1 SIMPLE postal ref PRIMARY PRIMARY 4 const 9
1 SIMPLE business ref PRIMARY PRIMARY 4 const 15
1 SIMPLE contact_fn ref PRIMARY PRIMARY 4 const 9
1 SIMPLE contact_ln ref PRIMARY PRIMARY 4 const 9
1 SIMPLE position ref PRIMARY PRIMARY 4 const 9
1 SIMPLE civic1 ref PRIMARY PRIMARY 4 const 10
1 SIMPLE civic2 ref PRIMARY PRIMARY 4 const 9
1 SIMPLE phone ref PRIMARY PRIMARY 4 const 10
1 SIMPLE tollfree ref PRIMARY PRIMARY 4 const 9
1 SIMPLE fax ref PRIMARY PRIMARY 4 const 10
1 SIMPLE email ref PRIMARY PRIMARY 4 const 9
1 SIMPLE mailing ref PRIMARY PRIMARY 4 const 11
1 SIMPLE community ref PRIMARY PRIMARY 4 const 9
1 SIMPLE web ref PRIMARY PRIMARY 4 const 10
1 SIMPLE empTotal ref PRIMARY PRIMARY 4 const 10
1 SIMPLE r ref PRIMARY PRIMARY 4 [[dbname]].name.sid 3 Using where; Using index; Distinct
Many times, when you have an overly extended JOIN/JOIN/JOIN/etc. as you have, the SQL engine will get hung up trying to start from small result sets and backfill the linking in a less efficient manner. Your query LOOKS good.
Your PRIMARY table (FROM jos_sobipro_object AS o) is really the KEY driving element to the query. Try adding the "STRAIGHT_JOIN" special keyword with MySQL..
SELECT STRAIGHT_JOIN DISTINCT ... rest of query ...
STRAIGHT_JOIN tells the optimizer to just DO the query in the order the tables are listed. It can then work faster, KNOWING the first table is the primary one for querying the data.
That said, and not exactly seeing index info, I would SPECIFICALLY have an index on jos_sobipro_field_data to GET the "lookup" data by (SID, FID).
I've had to take a similar approach with govt data of 14+ million records in the main table, joining to 22+ lookup tables. MySQL would hang after 30+ hours. By adding STRAIGHT_JOIN, the query finished in about 3 hours (as expected for what it was doing).
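The (SID, FID) index suggestion can be tried out in miniature. The sketch below uses Python's built-in sqlite3 module so that it is self-contained; the field_data table, its rows, and the index name idx_field_data_sid_fid are all hypothetical. It only shows that a lookup by sid and fid is served from the composite index. MySQL's optimizer is a different engine, so the real plan has to be checked there with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, cut-down version of jos_sobipro_field_data
    CREATE TABLE field_data (sid INTEGER, fid INTEGER, baseData TEXT);
    INSERT INTO field_data VALUES
        (1, 36, 'Acme Ltd'), (1, 81, 'Municipality Name'),
        (2, 36, 'Widgets Inc'), (2, 81, 'Other Town');
    -- Composite index in the order the joins look rows up: sid, then fid
    CREATE INDEX idx_field_data_sid_fid ON field_data (sid, fid);
""")

# Ask SQLite how it would execute one per-field lookup.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT baseData FROM field_data WHERE sid = 1 AND fid = 81
""").fetchall()
plan_text = " ".join(row[-1] for row in plan)  # last column is the plan detail
print(plan_text)

value = conn.execute(
    "SELECT baseData FROM field_data WHERE sid = 1 AND fid = 81"
).fetchone()[0]
print(value)
```

In MySQL the corresponding statement would be along the lines of CREATE INDEX idx_field_data_sid_fid ON jos_sobipro_field_data (sid, fid), followed by re-running EXPLAIN on the full query to see whether the joined aliases switch to it.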
I'm trying to run a query that takes 5 seconds to execute against 100,000 rows. The query is given below. I've tried every index I could think of. Please suggest what I am missing.
select distinct db_books.bookid as id
, request_type.name as book_type
, request_type.id as book_type_id
, db_books.subject as subject
, sender_user.uid as sender_user_id
, sender_user.username as sender_user
, sender_company.companyid as sender_company_id
, sender_company.companyname as sender_company
, sender_team_id.teamid as sender_team_id
, sender_team_id.name as sender_team
, GROUP_CONCAT(distinct receiver_user_details.uid separator '|') as receiver_user_id
, GROUP_CONCAT(distinct receiver_user_details.username separator '|') as receiver_user
, GROUP_CONCAT(distinct receiver_company.companyid separator '|') as receiver_company_id
, GROUP_CONCAT(distinct receiver_company.companyname separator '|') as receiver_company
, GROUP_CONCAT(distinct receiver_team_details.teamid separator '|') as receiver_team_id
, GROUP_CONCAT(distinct receiver_team_details.name separator '|') as receiver_team
, status.id as statusid
, status.name as status
, db_books.modifydate as modified_date
, db_books.createddate as creation_date
, state.id as stateid
, state.name as state
, assignee.uid as assignee_user_id
, assignee.username as assignee_user
, purpose.name as purpose
, purpose.id as purposeid
, g.name as entityname
, g.entityid as entityid
from db_books db_books
inner join db_users sender_user on (sender_user.deleted=0 and sender_user.uid=db_books.sndrUserid)
inner join db_companies sender_company on (sender_company.deleted=0 and sender_company.companyid=db_books.sndrCompanyid)
inner join db_companies receiver_company on (receiver_company.deleted=0 and receiver_company.companyid=db_books.target_company_id)
inner join db_request_types request_type on (request_type.id=db_books.book_type_id)
left outer join db_teams sender_team_id on (sender_team_id.deleted=0 and sender_team_id.teamid=db_books.sender_team_id)
left outer join db_books_to_users receiver_user on (receiver_user.bookid=db_books.bookid)
left outer join db_users receiver_user_details on (receiver_user_details.uid=receiver_user.userid)
left outer join db_books_to_teams receiver_teams on (receiver_teams.bookid=db_books.bookid)
left outer join db_teams receiver_team_details on (receiver_team_details.teamid=receiver_teams.teamid)
left outer join db_request_status status on (status.id=db_books.statusid)
left outer join db_request_state_types state on (state.id=db_books.request_state_id)
left outer join db_request_purpose purpose on (purpose.id=db_books.request_purpose_id)
left outer join db_users assignee on (assignee.uid=db_books.assignee)
left outer join db_books_details mdtl on (mdtl.deleted=0 and mdtl.bookid=db_books.bookid)
left outer join db_entities g on (g.deleted=0 and g.entityid=mdtl.entityid)
where 1=1
and
(db_books.sndrUserid=25000000003265
or db_books.sender_team_id in (
select a.teamid from db_team_users a
inner join db_teams b on (b.teamid=a.teamid and b.deleted=0)
where a.userid=25000000003265
)
or db_books.bookid in (
select distinct bookid from db_books_to_users where userid=25000000003265
union
select distinct bookid from db_books_to_teams where teamid in
(
select a.teamid from db_team_users a
inner join db_teams b on (b.teamid=a.teamid and b.deleted=0)
where a.deleted=0 AND a.userid=25000000003265
)
)
)
group by db_books.bookid
limit 20
The explain plan is as given below.
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | PRIMARY | sender_user | ALL | PRIMARY,u2 | | | | 14573 | Using where; Using temporary; Using filesort
1 | PRIMARY | db_books | ref | i_db_books_target_company_id,i_db_books_sndrUserid,i_db_books_sndrCompanyid,i_sndrUserid_sender_team_idbookid | i_db_books_sndrUserid | 7 | mde_staging.sender_user.uid | 41 | Using where
1 | PRIMARY | sender_company | eq_ref | PRIMARY,db_companies_icd | PRIMARY | 7 | mde_staging.db_books.sndrCompanyid | 1 | Using where
1 | PRIMARY | receiver_company | eq_ref | PRIMARY,db_companies_icd | PRIMARY | 7 | mde_staging.db_books.target_company_id | 1 | Using where
1 | PRIMARY | sender_team_id | eq_ref | PRIMARY,db_teams_i | PRIMARY | 7 | mde_staging.db_books.sender_team_id | 1 |
1 | PRIMARY | receiver_user | ref | i_db_books_to_users_bookid | i_db_books_to_users_bookid | 7 | mde_staging.db_books.bookid | 1 |
1 | PRIMARY | receiver_user_details | eq_ref | PRIMARY,u2 | PRIMARY | 7 | mde_staging.receiver_user.userid | 1 |
1 | PRIMARY | receiver_teams | ref | i_db_books_to_teams_bookid | i_db_books_to_teams_bookid | 7 | mde_staging.db_books.bookid | 1 |
1 | PRIMARY | receiver_team_details | eq_ref | PRIMARY,db_teams_i | PRIMARY | 7 | mde_staging.receiver_teams.teamid | 1 |
1 | PRIMARY | status | eq_ref | PRIMARY | PRIMARY | 4 | mde_staging.db_books.statusid | 1 |
1 | PRIMARY | state | eq_ref | PRIMARY | PRIMARY | 4 | mde_staging.db_books.request_state_id | 1 |
1 | PRIMARY | purpose | eq_ref | PRIMARY | PRIMARY | 4 | mde_staging.db_books.request_purpose_id | 1 |
1 | PRIMARY | assignee | eq_ref | PRIMARY,u2 | PRIMARY | 7 | mde_staging.db_books.assignee | 1 |
1 | PRIMARY | mdtl | ref | db_books_details_bookid | db_books_details_bookid | 7 | mde_staging.db_books.bookid | 1 |
1 | PRIMARY | request_type | ALL | PRIMARY | | | | 4 | Using where; Using join buffer
1 | PRIMARY | g | eq_ref | PRIMARY,db_entities7 | PRIMARY | 7 | mde_staging.mdtl.entityid | 1 |
3 | DEPENDENT SUBQUERY | db_books_to_users | ref | i_db_books_to_users_bookid | i_db_books_to_users_bookid | 7 | func | 1 | Using where; Using temporary
4 | DEPENDENT UNION | db_books_to_teams | ref | i_db_books_to_teams_bookid | i_db_books_to_teams_bookid | 7 | func | 1 | Using where; Using temporary
5 | DEPENDENT SUBQUERY | b | eq_ref | PRIMARY,db_teams_i | PRIMARY | 7 | func | 1 | Using where
5 | DEPENDENT SUBQUERY | a | ref | db_team_users_i | db_team_users_i | 11 | func,const | 1 | Using where
 | UNION RESULT | <union3,4> | ALL | | | | | |
2 | DEPENDENT SUBQUERY | b | eq_ref | PRIMARY,db_teams_i | PRIMARY | 7 | func | 1 | Using where
2 | DEPENDENT SUBQUERY | a | ref | db_team_users_i | db_team_users_i | 7 | func | 1 | Using where
If you look at the first row of the explain plan, it is not using either of the possible indexes, and it is using a temporary table and filesort. I'm not sure whether that is the problem. Please suggest how to fix this, or which indexes to use.
The biggest problem I see is the subquery qualifiers. The plan lists them as DEPENDENT SUBQUERY, which means they get hit for every row tested. I would change that portion of the WHERE clause into a prequery used as the first table: get the qualifying books there, join those to books, and the rest should be fine. In addition, the "STRAIGHT_JOIN" keyword tells the engine to do the joins in the order you've written them. Sometimes the optimizer gets ahead of you, starts from one of the "lookup" reference tables and tries to back-fill the rest. All that said,
CHANGE the SELECT at the top to
select STRAIGHT_JOIN distinct
and then your from clause from
from
db_books db_books
to
from
( SELECT distinct db.bookid
from
db_books db
-- sender team: is the user a member of the book's sender team (team not deleted)?
left join db_team_users TeamA
ON db.sender_team_id = TeamA.teamid
AND TeamA.userid = 25000000003265
LEFT JOIN db_teams TeamB
ON TeamA.teamid = TeamB.teamid
AND TeamB.deleted = 0
-- direct recipient: was the book sent to the user?
left join db_books_to_users ToUser
ON db.bookid = ToUser.bookid
AND ToUser.userid = 25000000003265
-- team recipient: was the book sent to a non-deleted team the user belongs to?
left join db_books_to_teams ToTeamA
ON db.bookid = ToTeamA.bookid
left join db_team_users ToTeamUser
ON ToTeamA.teamid = ToTeamUser.teamid
AND ToTeamUser.userid = 25000000003265
AND ToTeamUser.deleted = 0
left join db_teams ToTeamB
ON ToTeamUser.teamid = ToTeamB.teamid
AND ToTeamB.deleted = 0
where
db.sndrUserid = 25000000003265
OR NOT TeamB.teamid IS NULL
OR NOT ToUser.bookid IS NULL
OR NOT ToTeamB.teamid IS NULL
limit
20 ) PreQualBooks
JOIN db_books
ON PreQualBooks.bookid = db_books.bookid
And you can remove the final WHERE clause, since this prequery runs ONCE up front to pre-qualify every POSSIBLE book ID based on the user or team relationship via JOINs. With the LEFT JOINs, the books table is gone through ONCE along with the respective team / user relationships, and a book is returned only when it matches the sending user OR when the lowest level of one of the LEFT JOIN chains (TeamB, ToUser and ToTeamB) found a row. The prequery also applies the limit of 20 books, so the LIMIT clause at the end of your query is not needed either, as only 20 books will ever be POSSIBLE.
Leave your outer GROUP BY in place because of your group_concat.
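If the prequery still reads more rows than expected, composite indexes on the columns its joins use may help. This is only a sketch: it assumes the column names from the question (userid, teamid, bookid), the index names are made up, and equivalent indexes may already exist in your schema.

```sql
-- Hypothetical index names; run SHOW INDEX FROM <table> first to avoid duplicates.
-- Lets the team-membership joins look up (team, user) pairs directly.
CREATE INDEX ix_team_users_teamid_userid ON db_team_users (teamid, userid);
-- Lets the direct-recipient join look up (book, user) pairs directly.
CREATE INDEX ix_books_to_users_bookid_userid ON db_books_to_users (bookid, userid);
-- Lets the team-recipient join resolve a book's teams from the index alone.
CREATE INDEX ix_books_to_teams_bookid_teamid ON db_books_to_teams (bookid, teamid);
```

Re-run EXPLAIN afterwards to confirm the dependent subqueries are gone and to see which of these indexes the plan actually picks.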