I need to optimize the following query, which takes up to 10 minutes to run.
Running EXPLAIN, it appears to scan all 350815 rows of the "table_3" table and shows 1 row for all the others.
What are the general rules for placing indexes properly? Should I think about using multi-column (composite) indexes? Where should I index first: the JOINs, the WHERE, or the GROUP BY? If I remember right, there is a hierarchy to follow. Also, if the EXPLAIN shows 1 in the rows column for every table but one, how can I optimize further? Usually my optimization consists of ending up with only one row for every table but one.
All tables range from roughly 100k to over 1M rows.
CREATE TABLE datab1.sku_performance
SELECT
table1.sku,
CONCAT(table1.sku,' ',table1.fk_container ) as sku_container,
table1.price as price,
SUM( CASE WHEN ( table1.fk_table1_status = 82
OR table1.fk_table1_status = 119
OR table1.fk_table1_status = 124
OR table1.fk_table1_status = 141
OR table1.fk_table1_status = 131) THEN 1 ELSE 0 END)
/ COUNT( DISTINCT id_catalog_school_class) as qty_returned,
SUM( CASE WHEN ( table1.fk_table1_status In (23,13,44,65,6,75,8,171,12,166))
THEN 1 ELSE 0 END)
/ COUNT( DISTINCT id_catalog_school_class) as qt,
container.id_container as container_id,
container.idden as container_idden,
container.delivery_badge,
catalog_school.id_catalog_school,
LEFT(catalog_school.flight_fair,2) as departing_country,
catalog_school.weight,
catalog_school.flight_type,
catalog_school.price,
table_3.id_table_3,
table_3.fk_catalog_brand,
MAX( LEFT( table_3.note,3 )) AS supplier,
GROUP_CONCAT( product_number, ' by ',FORMAT(catalog_school_class.quantity,0)
ORDER BY product_number ASC SEPARATOR ' + ') as supplier_prod,
Sum( distinct( catalog_school_class.purch_pri * catalog_school_class.quantity)) AS final_purch_pri,
catalog_groupp.idden as supplier_idden,
catalog_category_details.id_catalog_category,
catalog_category_details.cat1 as product_cat1,
catalog_category_details.cat2 as product_cat2,
COUNT( distinct catalog_school_class.id_catalog_school_class) as setinfo,
datab1.pageviewgrouped.pv as page_views,
Sum(distinct(catalog_school_class.purch_pri * catalog_school_class.quantity)) AS purch_pri,
container_has_table_3.position,
max( table1.created_at ) as last_order_date
FROM
table1
LEFT JOIN container
ON table1.fk_container = container.id_container
LEFT JOIN catalog_school
ON table1.sku = catalog_school.sku
LEFT JOIN table_3
ON catalog_school.fk_table_3 = table_3.id_table_3
LEFT JOIN container_has_table_3
ON table_3.id_table_3 = container_has_table_3.fk_table_3
LEFT JOIN datab1.pageviewgrouped
on table_3.id_table_3 = datab1.pageviewgrouped.url
LEFT JOIN table_3_has_catalog_minority
ON table_3.id_table_3 = table_3_has_catalog_minority.fk_table_3
LEFT JOIN datab1.catalog_category_details
ON datab1.catalog_category_details.id_catalog_category = table_3_has_catalog_minority.fk_catalog_category
LEFT JOIN catalog_groupp
ON table_3.fk_catalog_groupp = catalog_groupp.id_catalog_groupp
LEFT JOIN catalog_school_class
ON catalog_school.id_catalog_school = catalog_school_class.fk_catalog_school
WHERE
table_3.status_ok = 1
AND catalog_school.status = 'active'
AND table_3_has_catalog_minority.is_primary = '1'
GROUP BY
table1.sku,
table1.fk_container;
Rows per table:
table1: 960,096 to 1.3M rows
container: 9,275 to 13,000 rows
catalog_school: 709,970 to 1M rows
table_3: 709,970 to 1M rows
container_has_table_3: 709,970 to 1M rows
pageviewgrouped: 500,000 rows
catalog_school_class: 709,970 to 1M rows
catalog_groupp: 3,000 rows
table_3_has_catalog_minority: 709,970 to 1M rows
catalog_category_details: 659 rows
Too much to put into a single comment, so I'll add it here and adjust later as needed... You have LEFT JOINs everywhere, but your WHERE clause filters on columns from table_3, catalog_school and table_3_has_catalog_minority. Filtering a LEFT-JOINed table in the WHERE clause effectively turns those joins into INNER JOINs.
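To illustrate the point with two of the tables from this query (a simplified sketch, not a drop-in replacement for the full statement):
-- Variant 1: a true LEFT JOIN. The filter sits in the ON clause, so catalog_school rows
-- with no matching table_3 row (or with status_ok <> 1) are kept, with NULL table_3 columns.
SELECT catalog_school.sku, table_3.id_table_3
FROM catalog_school
LEFT JOIN table_3
  ON catalog_school.fk_table_3 = table_3.id_table_3
 AND table_3.status_ok = 1;

-- Variant 2: what the original WHERE clause effectively produces, an INNER JOIN,
-- because rows where table_3.status_ok is NULL get filtered out after the join.
SELECT catalog_school.sku, table_3.id_table_3
FROM catalog_school
INNER JOIN table_3
  ON catalog_school.fk_table_3 = table_3.id_table_3
WHERE table_3.status_ok = 1;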
With respect to your where clause
WHERE
table_3.status_ok = 1
AND catalog_school.status = 'active'
AND table_3_has_catalog_minority.is_primary = '1'
Which table/column would return the smallest result set based on these criteria? For example, table_3.status_ok = 1 might match 500k records, while table_3_has_catalog_minority.is_primary = '1' may only match 65k and catalog_school.status = 'active' may match 430k.
Also, some of your columns, such as "id_catalog_school_class" and "product_number", are not qualified with the table they come from. Can you please confirm which tables they belong to?
SOMETIMES, changing the order of the tables (with good knowledge of the makeup of the data) and, in MySQL, adding the STRAIGHT_JOIN keyword can improve performance. I have done this in the past with a government database of contracts and grants with 20+ million records joined to about 15 lookup tables. It went from hanging the server to finishing the query in under 2 hours; considering the amount of data I was dealing with, that was actually a good time.
AFTER dissecting this thing some, I restructured it a bit more for readability, added aliases for the table references, changed the order of the query, and have some suggested indexes. To help the query, I tried moving the catalog_school table to the first position and added the STRAIGHT_JOIN. The index is based on the status first to match the WHERE clause; THEN I included the sku, as it is the first element of the GROUP BY, then the other columns used to join to the subsequent tables. By having these columns in the index, the engine can qualify the joins without having to go to the raw data.
By changing the GROUP BY to catalog_school.sku instead of table1.sku, the index on catalog_school can be used to help optimize that. It is the same value, since the join is on catalog_school.sku = table1.sku. I also added suggested indexes for table1 and table_3 -- again, to preemptively qualify the joins without going to the raw data pages of the tables.
I would be interested in knowing the final performance (better or worse) from your data.
Suggested indexes (table followed by index columns; a DDL sketch follows the query below):
catalog_school ( status, sku, fk_table_3, id_catalog_school )
table1 ( sku, fk_container )
table_3 ( id_table_3, status_ok, fk_catalog_groupp )
SELECT STRAIGHT_JOIN
CS.sku,
CONCAT(CS.sku,' ',T1.fk_container ) as sku_container,
T1.price as price,
SUM( CASE WHEN T1.fk_table1_status IN ( 82, 119, 124, 141, 131)
THEN 1 ELSE 0 END)
/ COUNT( DISTINCT CSC.id_catalog_school_class) as qty_returned,
SUM( CASE WHEN ( T1.fk_table1_status In (23,13,44,65,6,75,8,171,12,166))
THEN 1 ELSE 0 END)
/ COUNT( DISTINCT CSC.id_catalog_school_class) as qt,
CS.id_catalog_school,
LEFT(CS.flight_fair,2) as departing_country,
CS.weight,
CS.flight_type,
CS.price,
T3.id_table_3,
T3.fk_catalog_brand,
MAX( LEFT( T3.note,3 )) AS supplier,
C.id_container as container_id,
C.idden as container_idden,
C.delivery_badge,
GROUP_CONCAT( product_number, ' by ',FORMAT(CSC.quantity,0)
ORDER BY product_number ASC SEPARATOR ' + ') as supplier_prod,
Sum( distinct( CSC.purch_pri * CSC.quantity)) AS final_purch_pri,
CGP.idden as supplier_idden,
CCD.id_catalog_category,
CCD.cat1 as product_cat1,
CCD.cat2 as product_cat2,
COUNT( distinct CSC.id_catalog_school_class) as setinfo,
PVG.pv as page_views,
Sum(distinct(CSC.purch_pri * CSC.quantity)) AS purch_pri,
CHT3.position,
max( T1.created_at ) as last_order_date
FROM
catalog_school CS
JOIN table1 T1
ON CS.sku = T1.sku
LEFT JOIN container C
ON T1.fk_container = C.id_container
LEFT JOIN catalog_school_class CSC
ON CS.id_catalog_school = CSC.fk_catalog_school
JOIN table_3 T3
ON CS.fk_table_3 = T3.id_table_3
JOIN table_3_has_catalog_minority T3HCM
ON T3.id_table_3 = T3HCM.fk_table_3
LEFT JOIN datab1.catalog_category_details CCD
ON T3HCM.fk_catalog_category = CCD.id_catalog_category
LEFT JOIN container_has_table_3 CHT3
ON T3.id_table_3 = CHT3.fk_table_3
LEFT JOIN datab1.pageviewgrouped PVG
on T3.id_table_3 = PVG.url
LEFT JOIN catalog_groupp CGP
ON T3.fk_catalog_groupp = CGP.id_catalog_groupp
WHERE
CS.status = 'active'
AND T3.status_ok = 1
AND T3HCM.is_primary = '1'
GROUP BY
CS.sku,
T1.fk_container;
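For reference, a minimal sketch of how those suggested indexes could be declared in MySQL (the index names are placeholders; the column lists are taken from the table above):
ALTER TABLE catalog_school ADD INDEX idx_cs_status_sku (status, sku, fk_table_3, id_catalog_school);
ALTER TABLE table1 ADD INDEX idx_t1_sku_container (sku, fk_container);
ALTER TABLE table_3 ADD INDEX idx_t3_id_status_groupp (id_table_3, status_ok, fk_catalog_groupp);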
Related
I would like to find out what causes the slow execution of my MySQL query here. It reports 1 row fetched in 0.0017s (3.6215s). How can I optimize this?
SELECT hadmlog.hpercode as 'HOSPITAL NUMBER', FLOOR(hadmlog.patage) as 'AGE',
       CONCAT(hperson.patlast, ', ', hperson.patfirst, ' ', hperson.patmiddle) as 'PROFILE',
       hcity.ctyname as 'DISTRICT', hadmlog.disdate as 'DISCHARGED DATE'
FROM hadmlog
INNER JOIN hperson ON hadmlog.hpercode=hperson.hpercode
INNER JOIN haddr ON hadmlog.hpercode=haddr.hpercode
INNER JOIN hcity ON haddr.ctycode=hcity.ctycode
WHERE hadmlog.patage BETWEEN '1' AND '4'
AND hperson.patsex = 'M'
AND DATE(hadmlog.disdate) = DATE(curdate())
AND haddr.haddrdte = ( select max(haddrdte)
from haddr
where haddr.hpercode = hperson.hpercode )
ORDER BY Profile;
First, fix the query:
SELECT al.hpercode as HOSPITAL_NUMBER, FLOOR(al.patage) as AGE,
CONCAT_WS(' ', p.patlast, p.patfirst, p.patmiddle) as PROFILE,
c.ctyname as DISTRICT, al.disdate as DISCHARGED_DATE
FROM hadmlog al JOIN
hperson p
ON al.hpercode = p.hpercode JOIN
haddr a
ON al.hpercode = a.hpercode JOIN
hcity c
ON a.ctycode = c.ctycode
WHERE al.patage BETWEEN 1 AND 4 AND
p.patsex = 'M' AND
al.disdate >= curdate() AND
al.disdate < curdate() + INTERVAL 1 day AND
a.haddrdte = (select max(h2.haddrdte)
from haddr h2
where h2.hpercode = p.hpercode
)
ORDER BY Profile;
Some of the changes are cosmetic (such as the table aliases and CONCAT_WS()). The more relevant changes are:
patage appears to be a number, so make the comparisons numbers. Strings can impede the optimizer.
date(disdate) can also impede the optimizer.
Then, for this query, I am guessing that you want an index on:
hadmlog(disdate, patage)
And then you want indexes on the JOIN keys used for the other tables.
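For example, something along these lines (the index names are placeholders and the columns come from the join and filter conditions; if hpercode and ctycode are already primary keys of hperson and hcity, those two indexes are unnecessary):
ALTER TABLE hadmlog ADD INDEX idx_hadmlog_disdate_age (disdate, patage);
ALTER TABLE hperson ADD INDEX idx_hperson_hpercode (hpercode);
ALTER TABLE haddr ADD INDEX idx_haddr_hpercode_date (hpercode, haddrdte);
ALTER TABLE hcity ADD INDEX idx_hcity_ctycode (ctycode);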
Please help to sort out this issue.
I have a database with 150,000 business records; each business record has its own business category (e.g. Bars, Pubs, Restaurants).
I am using this SQL to get the category listing based on the visitor's location.
SELECT
ROUND(6371*acos(cos(radians('52.28231599999999'))*cos(radians(bizprof.vLatitude))*cos(radians(bizprof.vLongitude)-radians('-1.584927'))+sin(radians('52.28231599999999'))*sin(radians(bizprof.vLatitude))),2) AS distance,
`bizcat`.`vCategoryName`,
`bizcat`.`iCategoryId` FROM `business_profile` `bizprof`
LEFT JOIN `users` `u` ON u.iUserId = bizprof.iUserId
AND u.tiIsProfileSet = 1
AND u.tiIsActive = 1
AND u.tiIsDeleted = 0
LEFT JOIN `business_categories` `bizcat` ON bizcat.iCategoryId = bizprof.iCategoryId
GROUP BY `bizcat`.`iCategoryId`
HAVING distance >= 0 AND distance <= 10
This query takes too much time to return the data.
Any ideas?
Use ST_Distance_Sphere(g1, g2 [, radius]) and spatial indexes (see the sketch after the query below).
Move the AND u.tiIsProfileSet = 1 AND u.tiIsActive = 1 AND u.tiIsDeleted = 0 conditions to the WHERE clause.
Avoid the third join; fetch the data from business_categories with a separate query (via a relation, for example).
Try executing this query:
SELECT
ST_Distance_Sphere(Point('-1.584927','52.28231599999999'), Point(`bizprof`.`vLongitude`,`bizprof`.`vLatitude`), 6370986 ) AS `distance`,
`bizprof`.`iCategoryId`
FROM `business_profile` `bizprof`
LEFT JOIN `users` `u` ON `u`.`iUserId` = `bizprof`.`iUserId`
WHERE 1=1
AND `u`.`tiIsProfileSet` = 1
AND `u`.`tiIsActive` = 1
AND `u`.`tiIsDeleted` = 0
HAVING distance >= 0 AND distance <= 10*1000
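To actually use a spatial index, as suggested above, the coordinates would need to live in a POINT column. A rough sketch under that assumption (the column and index names are placeholders; note that the spatial index is used by MBR-based predicates such as MBRContains, not by ST_Distance_Sphere itself):
-- Add a geometry column, populate it from the existing lat/long columns,
-- then make it NOT NULL (required for SPATIAL indexes) and index it.
ALTER TABLE business_profile ADD COLUMN location POINT NULL;
UPDATE business_profile SET location = Point(vLongitude, vLatitude);
ALTER TABLE business_profile
  MODIFY location POINT NOT NULL,
  ADD SPATIAL INDEX idx_bp_location (location);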
Just some suggestions.
Be sure you have proper composite indexes on:
table business_profile, columns (iUserId, iCategoryId)
table users, columns (iUserId, tiIsProfileSet, tiIsActive, tiIsDeleted)
table business_categories, column (iCategoryId)
(A DDL sketch for these follows the query below.)
Then, you should not use GROUP BY without an aggregation function (if you need distinct results, add DISTINCT to the SELECT).
You could also filter the result in the WHERE clause (repeating the distance expression) instead of in HAVING:
SELECT
ROUND(6371*acos(cos(radians('52.28231599999999'))*cos(radians(bizprof.vLatitude))*cos(radians(bizprof.vLongitude)-radians('-1.584927'))+sin(radians('52.28231599999999'))*sin(radians(bizprof.vLatitude))),2) AS distance,
`bizcat`.`vCategoryName`,
`bizcat`.`iCategoryId`
FROM `business_profile` `bizprof`
LEFT JOIN `users` `u` ON u.iUserId = bizprof.iUserId
AND u.tiIsProfileSet = 1
AND u.tiIsActive = 1
AND u.tiIsDeleted = 0
LEFT JOIN `business_categories` `bizcat` ON bizcat.iCategoryId = bizprof.iCategoryId
WHERE ROUND(6371*acos(cos(radians('52.28231599999999'))*cos(radians(bizprof.vLatitude))*cos(radians(bizprof.vLongitude)-radians('-1.584927'))+sin(radians('52.28231599999999'))*sin(radians(bizprof.vLatitude))),2) >= 0
AND ROUND(6371*acos(cos(radians('52.28231599999999'))*cos(radians(bizprof.vLatitude))*cos(radians(bizprof.vLongitude)-radians('-1.584927'))+sin(radians('52.28231599999999'))*sin(radians(bizprof.vLatitude))),2) <= 10
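A minimal sketch of the composite indexes suggested above (the index names are placeholders):
ALTER TABLE business_profile ADD INDEX idx_bp_user_cat (iUserId, iCategoryId);
ALTER TABLE users ADD INDEX idx_u_user_flags (iUserId, tiIsProfileSet, tiIsActive, tiIsDeleted);
ALTER TABLE business_categories ADD INDEX idx_bc_category (iCategoryId);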
My SQL query takes too long to execute on the MySQL database server. A number of tables are joined with sb_tblproperty, the main table, which contains more than 100,000 rows; most of the other tables contain about 50,000 rows.
How can I optimize this SQL query for faster execution? I have also added indexes.
SELECT `t1`.`propertyId`, `t1`.`projectId`,
`t1`.`furnised`, `t1`.`ownerID`, `t1`.`subType`,
`t1`.`fors`, `t1`.`size`, `t1`.`unit`,
`t1`.`bedrooms`, `t1`.`address`, `t1`.`dateConfirm`,
`t1`.`dateAdded`, `t1`.`floor`, `t1`.`priceAmount`,
`t1`.`priceRate`, `t1`.`allInclusive`, `t1`.`booking`,
`t1`.`bookingRate`, `t1`.`paidPercetage`,
`t1`.`paidAmount`, `t1`.`is_sold`, `t1`.`remarks`,
`t1`.`status`, `t1`.`confirmedStatus`, `t1`.`source`,
`t1`.`companyName` as company, `t1`.`monthly_rent`,
`t1`.`per_sqft`, `t1`.`lease_duration`,
`t1`.`lease_commencement`, `t1`.`lock_in_period`,
`t1`.`security_deposit`, `t1`.`security_amount`,
`t1`.`total_area_leased`, `t1`.`lease_escalation_amount`,
`t1`.`lease_escalation_years`, `t2`.`propertyTypeName` as
propertyTypeName, `t3`.`propertySubTypeName` subType,
`t3`.`propertySubTypeId` subTypeId, `Owner`.`ContactName`
ownerName, `Owner`.`companyName`, `Owner`.`mobile1`,
`Owner`.`otherPhoneNo`, `Owner`.`mobile2`,
`Owner`.`email`, `Owner`.`address` as caddress,
`Owner`.`contactType`, `P`.`projectName` as project,
`P`.`developerName` as developer, `c`.`name` as city,
if(t1.projectId="", group_concat( distinct( L.locality)),
group_concat( distinct(L2.locality))) as locality, `U`.`firstname`
addedBy, `U1`.`firstname` confirmedBy
FROM `sb_tblproperty` as t1
JOIN `sb_contact` Owner ON `Owner`.`id` = `t1`.`ownerID`
JOIN `tbl_city` C ON `c`.`id` = `t1`.`city`
JOIN `sb_propertytype` t2 ON `t1`.`propertyType`= `t2`.`propertyTypeId`
JOIN `sb_propertysubtype` t3 ON `t1`.`subType` =`t3`.`propertySubTypeId`
LEFT JOIN `sb_tbluser` U ON `t1`.`addedBy` = `U`.`userId`
LEFT JOIN `sb_tbluser` U1 ON `t1`.`confirmedBy` = `U1`.`userId`
LEFT JOIN `sb_tblproject` P ON `P`.`id` = `t1`.`projectId`
LEFT JOIN `sb_tblpropertylocality` PL ON `t1`.`propertyId` = `PL`.`propertyId`
LEFT JOIN `sa_localitiez` L ON `L`.`id` = `PL`.`localityId`
LEFT JOIN `sb_tblprojectlocality` PROL ON `PROL`.`projectId` = `P`.`id`
LEFT JOIN `sa_localitiez` L2 ON `L2`.`id` = `PROL`.`localityId`
LEFT JOIN `sb_tblfloor` F
ON `F`.`floorName` =`t1`.`floor`
WHERE `t1`.`is_sold` != '1'
GROUP BY `t1`.`propertyId`
ORDER BY `t1`.`dateConfirm` DESC
LIMIT 1000
Please provide the EXPLAIN.
Meanwhile, try this:
SELECT ...
FROM (
SELECT propertyId
FROM sb_tblproperty
WHERE `is_sold` = 0
ORDER BY `dateConfirm` DESC
LIMIT 1000
) AS x
JOIN `sb_tblproperty` as t1 ON t1.propertyId = x.propertyId
JOIN `sb_contact` Owner ON `Owner`.`id` = `t1`.`ownerID`
JOIN `tbl_city` C ON `c`.`id` = `t1`.`city`
...
LEFT JOIN `sb_tblfloor` F ON `F`.`floorName` =`t1`.`floor`
ORDER BY `t1`.`dateConfirm` DESC -- yes, again
Together with
INDEX(is_sold, dateConfirm)
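In MySQL DDL that would be something like (the index name is a placeholder):
ALTER TABLE sb_tblproperty ADD INDEX idx_sold_dateconfirm (is_sold, dateConfirm);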
How can t1.projectId="" ? Isn't projectId the PRIMARY KEY? (This is one of many reasons for needing the SHOW CREATE TABLE.)
If my suggestion leads to "duplicate" rows (that is, multiple rows with the same propertyId), don't simply add back the GROUP BY propertyId. Instead figure out why, and avoid the need for the GROUP BY. (That is probably the performance issue.)
A likely case is the GROUP_CONCAT. A common workaround is to change from
GROUP_CONCAT( distinct( L.locality)) AS Localities,
...
LEFT JOIN `sa_localitiez` L ON `L`.`id` = `PL`.`localityId`
to
( SELECT GROUP_CONCAT(distinct locality)
FROM sa_localitiez
WHERE id = PL.localityId ) AS Localities
...
# and remove the JOIN
I want to SELECT a field based on an ID value.
Products
PRODUCT_ID Name
19 Chair
20 Table
Product_fields
ID PRODUCT_ID TYPE DESCRIPTION
1 19 C White
2 19 S Modern
3 20 C Black
4 20 S Classic
I need a result like:
Product Type_C Type_S
Chair White Modern
Table Black Classic
I am able to produce this using two LEFT JOINs on the product_fields table but this slows down the query too much. Is there a better way?
Slows down the query how much? What is acceptable?
If you really don't want to use joins (you must have one join), then use views or nested queries. But I don't think they will be any faster, though you can give it a try.
See views at sqlfiddle
select p.PRODUCT_ID, p.Name, f.CDescription, f.SDescription
from Products p
join(
SELECT PRODUCT_ID, Max( CDescription ) as CDescription,
Max( SDescription ) as SDescription
FROM(
select PRODUCT_ID,
case Type when 'C' then Description end as CDescription,
case Type when 'S' then Description end as SDescription
from Product_fields
) x
group by PRODUCT_ID
) f
on f.PRODUCT_ID = p.PRODUCT_ID;
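If you would rather take the view route mentioned above, a minimal sketch wrapping the same pivot (the view name is a placeholder):
CREATE VIEW product_fields_pivot AS
SELECT PRODUCT_ID,
       MAX(CASE TYPE WHEN 'C' THEN DESCRIPTION END) AS CDescription,
       MAX(CASE TYPE WHEN 'S' THEN DESCRIPTION END) AS SDescription
FROM Product_fields
GROUP BY PRODUCT_ID;

SELECT p.Name, f.CDescription, f.SDescription
FROM Products p
JOIN product_fields_pivot f ON f.PRODUCT_ID = p.PRODUCT_ID;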
The complete statement is:
SELECT
NL.product_name,
PRD.product_sku AS product_sku,
CF.virtuemart_product_id AS virtuemart_product_id,
GROUP_CONCAT(distinct CFA.customsforall_value_name
ORDER BY CFA.customsforall_value_name ASC
separator ' | ' ) AS Name_exp_3,
ROUND((((prices.product_price * CALC.calc_value) / 100) + prices.product_price),
2) AS Prijs,
VMCF_L.custom_value AS latijn,
VMCF_T.custom_value AS THT,
VMCF_B.custom_value AS Batch
from j25_virtuemart_products AS PRD
LEFT join j25_virtuemart_product_custom_plg_customsforall AS CF ON CF.virtuemart_product_id = PRD.virtuemart_product_id
join j25_virtuemart_product_prices AS prices ON PRD.virtuemart_product_id = prices.virtuemart_product_id
join j25_virtuemart_calcs AS CALC ON prices.product_tax_id = CALC.virtuemart_calc_id
join j25_virtuemart_products_nl_nl AS NL ON NL.virtuemart_product_id = PRD.virtuemart_product_id
LEFT join j25_virtuemart_product_customfields AS VMCF ON VMCF.virtuemart_product_id = PRD.virtuemart_product_id
LEFT join j25_virtuemart_custom_plg_customsforall_values AS CFA ON CFA.customsforall_value_id = CF.customsforall_value_id
LEFT JOIN j25_virtuemart_product_customfields AS VMCF_L ON VMCF.virtuemart_product_id = VMCF_L.virtuemart_product_id AND VMCF_L.virtuemart_custom_id = 16
LEFT JOIN j25_virtuemart_product_customfields AS VMCF_T ON VMCF.virtuemart_product_id = VMCF_T.virtuemart_product_id AND VMCF_T.virtuemart_custom_id = 3
LEFT JOIN j25_virtuemart_product_customfields AS VMCF_B ON VMCF.virtuemart_product_id = VMCF_B.virtuemart_product_id AND VMCF_B.virtuemart_custom_id = 18
WHERE
PRD.product_sku like '02.%'
group by PRD.virtuemart_product_id
order by NL.product_name;
The three SELECT results named 'Latijn', 'THT', and 'Batch' are the ones I compared earlier to the black/white and classic/modern values.
Hope this makes sense.
As you can see this involves a VirtueMart installation, so I cannot fiddle about too much with the schema.
When I exclude the bottom 3 JOINs and their related fields, the query takes approx 0.5 seconds. With the JOINs and fields included, the query takes almost 19 seconds.
I have created a view from this complete query which I query from my labeling application.
Thanks everyone! With your input I created:
select
NL.product_name AS product_name,
PRD.product_sku AS product_sku,
CF.virtuemart_product_id AS virtuemart_product_id,
group_concat(distinct CFA.customsforall_value_name
order by CFA.customsforall_value_name ASC
separator ' | ') AS Name_exp_3,
round((((prices.product_price * CALC.calc_value) / 100) + prices.product_price),
2) AS Prijs,
f.Latijn AS Latijn,
f.THT AS THT,
f.Batch AS Batch
from
(((((((j25_virtuemart_products PRD
left join j25_virtuemart_product_custom_plg_customsforall CF ON ((CF.virtuemart_product_id = PRD.virtuemart_product_id)))
join j25_virtuemart_product_prices prices ON ((PRD.virtuemart_product_id = prices.virtuemart_product_id)))
join j25_virtuemart_calcs CALC ON ((prices.product_tax_id = CALC.virtuemart_calc_id)))
join j25_virtuemart_products_nl_nl NL ON ((NL.virtuemart_product_id = PRD.virtuemart_product_id)))
left join j25_virtuemart_product_customfields VMCF ON ((VMCF.virtuemart_product_id = PRD.virtuemart_product_id)))
left join j25_virtuemart_custom_plg_customsforall_values CFA ON ((CFA.customsforall_value_id = CF.customsforall_value_id)))
left join vw_batch_Latijn_THT_grouped f ON ((f.virtuemart_product_id = PRD.virtuemart_product_id)))
where
(PRD.product_sku like '02.%')
group by PRD.virtuemart_product_id
order by NL.product_name;
This takes 1.4 seconds to execute, a whole lot faster than the 19 seconds I started with.
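For completeness, a guess at how the vw_batch_Latijn_THT_grouped view could be defined, based on the three custom-field joins it replaced (the custom_id values 16, 3 and 18 are taken from the earlier query; the rest is an assumption, not the actual view definition):
-- Assumed definition: pivot the three customfield rows into one row per product.
CREATE VIEW vw_batch_Latijn_THT_grouped AS
SELECT virtuemart_product_id,
       MAX(CASE WHEN virtuemart_custom_id = 16 THEN custom_value END) AS Latijn,
       MAX(CASE WHEN virtuemart_custom_id = 3 THEN custom_value END) AS THT,
       MAX(CASE WHEN virtuemart_custom_id = 18 THEN custom_value END) AS Batch
FROM j25_virtuemart_product_customfields
GROUP BY virtuemart_product_id;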
I am having some trouble putting together a SQL statement properly because I don't have much experience with SQL, especially aggregate functions. Safe to say I don't really know what I'm doing outside of the basic SQL structure. I can do regular joins, but not complex ones.
I have some tables: 'Survey', 'Questions', 'Session', 'ParentSurvey', and 'ParentSurveyQuestion'. Structurally, a survey can have questions, it can have users that started the survey (a session), and it can have a parent survey whose questions get imported into the current survey.
What I want to do is get information for each survey in the Survey table: the total questions it has, how many sessions have been started (conditionally, ones that have not finished), and the number of questions in the parent survey. The three joined tables can, but do not have to, contain matching rows, and if they don't then COUNT should return 0. The common field in three of the tables is a variation of 'survey_id'.
Here is my SQL so far, I put the table structure below it.
SELECT
`kp_survey_id`,
COALESCE( q.cnt, 0 ) AS questionsAmount,
COALESCE( s.cnt, 0 ) AS sessionsAmount
COALESCE( p.cnt, 0 ) AS parentQAmount,
FROM `Survey`
LEFT JOIN <-- I'd like the count of questions for this survey
( SELECT COUNT(*) AS cnt
FROM Questions
GROUP BY kf_survey_id ) q
ON Survey.kp_survey_id = Questions.kf_survey_id
LEFT JOIN
( SELECT COUNT(*) AS cnt <-- I'd like the count of started sessions for this survey
FROM Session
WHERE session_status = 'started' <-- should this be Session.session_status?
GROUP BY kf_survey_id ) s
ON Survey.kp_survey_id = Session.kf_survey_id
LEFT JOIN
( SELECT COUNT(*) AS cnt <-- I'd like the count of questions in the parent survey with this survey id
FROM ParentSurvey
GROUP BY kp_parent_survey_id ) p
ON Survey.kf_parent_survey_id = ParentSurveyQuestion.kf_parent_survey_id
'kp' prefix means primary key, while 'kf' prefix means foreign key
Structure:
Survey: 'kp_survey_id' | 'kf_parent_survey_id'
Question: 'kp_question_id' | 'kf_survey_id'
Session: 'kp_session_id' | 'kf_survey_id' | 'session_status'
ParentSurvey: 'kp_parent_survey_id' | 'survey_name'
ParentSurveyQuestion: 'kp_parent_question_id' | 'kf_parent_survey_id'
There are also other columns in each table, like 'name' or 'account_id', but I don't think they matter in this case.
I'd like to know if I'm doing this correctly or if I'm missing something. I'm repurposing some code I found here on Stack Overflow and modifying it to meet my needs, as I haven't seen conditional aggregation across more than three tables on this site.
My expected output is something like:
kp_survey_id | questionsAmount | sessionsAmount | parentQAmount
1 | 3 | 0 | 3
2 | 0 | 5 | 3
I think you were pretty close -- just need to fix your joins and include the survey id in the subqueries to use in those joins:
SELECT
`kp_survey_id`,
COALESCE( q.cnt, 0 ) AS questionsAmount,
COALESCE( s.cnt, 0 ) AS sessionsAmount,
COALESCE( p.cnt, 0 ) AS parentQAmount
FROM `Survey`
LEFT JOIN
( SELECT COUNT(*) AS cnt, kf_survey_id
FROM Questions
GROUP BY kf_survey_id ) q
ON Survey.kp_survey_id = q.kf_survey_id
LEFT JOIN
( SELECT COUNT(*) cnt, kf_survey_id
FROM Session
WHERE session_status = 'started'
GROUP BY kf_survey_id ) s
ON Survey.kp_survey_id = s.kf_survey_id
LEFT JOIN
( SELECT COUNT(*) cnt, kp_parent_survey_id
FROM ParentSurvey
GROUP BY kp_parent_survey_id ) p
ON Survey.kf_parent_survey_id = p.kp_parent_survey_id
One thing you need to do is correct your joins. When you are joining to a subquery, you need to use the alias of the subquery. In your case you are using the alias of the table being used in the subquery.
Another thing you need to change is to include the field you wish to use in your JOIN in the subquery.
Make these changes and try running. Do you get an error or the desired results?
SELECT
`kp_survey_id`,
COALESCE( q.cnt, 0 ) AS questionsAmount,
COALESCE( s.cnt, 0 ) AS sessionsAmount,
COALESCE( p.cnt, 0 ) AS parentQAmount
FROM `Survey`
LEFT JOIN <-- I'd like the count of questions for this survey
( SELECT kf_survey_id, COUNT(*) AS cnt
FROM Questions
GROUP BY kf_survey_id ) q
ON Survey.kp_survey_id = q.kf_survey_id
LEFT JOIN
( SELECT kf_survey_id, COUNT(*) AS cnt <-- I'd like the count of started sessions for this survey
FROM Session
WHERE session_status = 'started' <-- should this be Session.session_status?
GROUP BY kf_survey_id ) s
ON Survey.kp_survey_id = s.kf_survey_id
LEFT JOIN
( SELECT kp_parent_survey_id, COUNT(*) AS cnt <-- I'd like the count of questions in the parent survey with this survey id
FROM ParentSurvey
GROUP BY kp_parent_survey_id ) p
ON Survey.kf_parent_survey_id = p.kp_parent_survey_id
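One more thing worth checking, based on the expected output (parentQAmount = 3): if that column is meant to be the number of questions in the parent survey, the last subquery probably needs to aggregate ParentSurveyQuestion rather than ParentSurvey. A sketch of that variant (an assumption drawn from the stated requirement and the table structure, not from the answers above):
LEFT JOIN
( SELECT kf_parent_survey_id, COUNT(*) AS cnt   -- one row per parent survey, counting its questions
  FROM ParentSurveyQuestion
  GROUP BY kf_parent_survey_id ) p
ON Survey.kf_parent_survey_id = p.kf_parent_survey_id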