Unable to FORCE INDEX on LEFT JOIN subquery? - mysql

I have a complicated query that involves a subquery inside a LEFT JOIN. It takes ~2s to run.
If I replace the subquery with a temporary table, it still takes ~2s.
If I create an index on the temporary table, it drops to ~0.05s.
I am trying to figure out why I can't use MySQL's USE INDEX or FORCE INDEX directly on the subquery in the LEFT JOIN.
Here is a simple query that returns a 'check your MySQL syntax' error:
SELECT *
FROM table1 t1
LEFT JOIN (SELECT id_project FROM table2) t2
FORCE INDEX FOR JOIN (id_project)
ON (t2.id_project = t1.id_project);
Why can't I use a FORCE INDEX with a LEFT JOIN subquery?
Edit: here is the actual query that returns the MySQL error:
SELECT
p.id_projet, pa.piste_affaire, ty.type, p.e_force, gr.groupe, p.client, p.intitule_affaire,
ag.agence, st.statut, p.winratio, p.justification_nogo, p.webdoc, sp.sponsor, dis.domain_is,
cons.constructeur, p.date_chgt_statut, p.date_creation,
p.ca_estimatif, p.tcv, p.marge,
tarp.target_price, p.marketable_price,
p.remise_propal_initiale, p.remise_propal_reel,
p.date_de_signature, p.debut_build, p.fin_build,
his.date, his.login, co.contrat,
p.duree_contrat, p.fqp, cap.capitalisation,
p.international, p.risk_management, p.asi,
GROUP_CONCAT(com.commentaire,',') AS COMMENT
FROM
piste_affaire pa, statut st, TYPE ty, agence ag,
domain_is dis, sponsor sp, constructeur cons, groupe gr,
target_price tarp, contrat co, historique his, capitalisation cap,
projet p
LEFT JOIN commentaire com ON (p.id_projet=com.id_projet)
LEFT JOIN bizdev bizd ON (p.id_bizdev = bizd.id_bizdev )
LEFT JOIN (SELECT eco_projet.id_projet,
GROUP_CONCAT(
CONCAT(ecosysteme)
ORDER BY ecosysteme SEPARATOR ', ' ) AS ecosysteme
FROM ecosysteme NATURAL JOIN eco_projet, projet
WHERE eco_projet.id_projet = projet.id_projet
GROUP BY eco_projet.id_projet) AS tmpeco
/* if I delete that line, everything works as intended with ~2s query*/
FORCE INDEX FOR JOIN (id_projet)
/* ** */
ON (tmpeco.id_projet = p.id_projet)
WHERE
ag.id_division != 2 AND
dis.id_division != 2 AND
st.id_division != 2 AND
pa.id_division != 2 AND
cons.id_division != 2 AND
p.type_division = 1 AND
pa.id_piste_affaire = p.id_piste_affaire AND
p.id_statut = st.id_statut AND
p.id_type = ty.id_type AND
p.id_agence = ag.id_agence AND
p.id_domain_is = dis.id_domain_is AND
p.id_sponsor = sp.id_sponsor AND
p.id_constructeur = cons.id_constructeur AND
p.id_groupe = gr.id_groupe AND
p.id_target_price = tarp.id_target_price AND
p.id_contrat = co.id_contrat AND
p.id_projet = his.id_projet AND
p.id_capitalisation = cap.id_capitalisation
GROUP BY p.id_projet;

Why the subquery? Try this:
SELECT *
FROM table1 t1
LEFT JOIN table2 t2
FORCE INDEX FOR JOIN (id_project)
ON (t2.id_project = t1.id_project);
You probably won't even need to force the index with the above query.
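If the derived table cannot be dropped, as in the actual query where it aggregates with GROUP_CONCAT, note that an index hint can only follow a base table name: a derived table has no named index for the hint to refer to, which is why the parser rejects it. Below is a rough sketch of the temporary-table route that the question already timed at ~0.05s. The index name idx_id_projet is made up for illustration, and the SELECT that fills the table is copied from the question's subquery.

```sql
-- Materialize the aggregated subquery once (same SELECT as in the question).
CREATE TEMPORARY TABLE tmpeco AS
SELECT eco_projet.id_projet,
       GROUP_CONCAT(CONCAT(ecosysteme)
                    ORDER BY ecosysteme SEPARATOR ', ') AS ecosysteme
FROM ecosysteme NATURAL JOIN eco_projet, projet
WHERE eco_projet.id_projet = projet.id_projet
GROUP BY eco_projet.id_projet;

-- Hypothetical index name; the indexed column is the join key.
ALTER TABLE tmpeco ADD INDEX idx_id_projet (id_projet);

-- Join the indexed temporary table instead of the derived table.
SELECT p.id_projet, tmpeco.ecosysteme
FROM projet p
LEFT JOIN tmpeco ON tmpeco.id_projet = p.id_projet;
```

The final SELECT is cut down to two columns; the rest of the original query would join tmpeco the same way.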

Related

Alternative to Where Exist clause MySQL

I have this select statement that is taking quite a while to run on a larger dataset
select lookup_svcscat_svcscatnew.SVCSCAT_NEW_DESC as svc_type,
enrolid, msclmid, dx1, dx2, dx3,
proc1,msk_cpt_mapping.surg_length_cd as SL_CD,
msk_cpt_mapping.days as day_window,o.svcdate_form, pay,
table_label
from ccaeo190_ky o
left join lookup_svcscat_svcscatnew on o.svcscat = lookup_svcscat_svcscatnew.svcscat
left join msk_cpt_mapping on o.proc1 = msk_cpt_mapping.cpt_code
where EXISTS
(
select 1
from eoc_op_mapping e
where e.msclmid = o.msclmid
and e.enrolid = o.enrolid
and proc1 =27447
)
ORDER BY svcdate_form, msclmid;
I want to return any row in my ccaeo190_ky table that meets the requirements of the where EXISTS clause on table eoc_op_mapping. Is there any way to achieve these results using joins or select statements?
I was thinking something like:
select lookup_svcscat_svcscatnew.SVCSCAT_NEW_DESC as svc_type,
o.enrolid, o.msclmid, dx1, dx2, dx3,
proc1,msk_cpt_mapping.surg_length_cd as SL_CD,
msk_cpt_mapping.days as day_window,o.svcdate_form, pay,
table_label
from ccaeo190_ky o
left join lookup_svcscat_svcscatnew on o.svcscat = lookup_svcscat_svcscatnew.svcscat
left join msk_cpt_mapping on o.proc1 = msk_cpt_mapping.cpt_code
inner join
(select msclmid, SUM(IF(proc1 = 27447,1,0)) AS cpt
from eoc_op_mapping
group by enrolid
HAVING cpt > 0) e
on e.enrolid = o.enrolid
group by o.enrolid;
But I don't know if this is in the right direction
Usually EXISTS performs better than a join.
If you want to try a join, this is the equivalent of your WHERE EXISTS:
.......................................................
inner join (
select distinct msclmid, enrolid
from eoc_op_mapping
where proc1 = 27447
) e on e.msclmid = o.msclmid and e.enrolid = o.enrolid
.......................................................
You can remove distinct if there are no duplicate msclmid, enrolid combinations in eoc_op_mapping.

MySQL query takes 20 minutes for only 11,000 records: how to optimize a SELECT with NOT EXISTS and NOT IN in the WHERE clause

SELECT DISTINCT ACA.Application_No, AC.FirstName,AC.Id,AC.LastName,AC.MobileNo,CL.leadId
FROM ABSLI_PAYMENT_TRANSACTION APT
INNER JOIN ABSLI_CUSTOMER_APPLICATION ACA ON ACA.Policy_No=APT.policyId
INNER JOIN ABSLI_CUSTOMER AC ON AC.Id=ACA.CustomerId
LEFT JOIN ABSLI_CUSTOMER_LEAD CL ON CL.policyId = ACA.Policy_No
INNER JOIN ABSLI_Policy_Status_Tracking pst ON pst.policyId = APT.policyId
WHERE APT.paymentStatus='Y'
AND NOT EXISTS (SELECT 1 FROM ABSLI_SERVICE_STATUS WHERE PolicyNo=APT.policyId AND NAME = 'APEX_Validate')
AND ACA.Application_No NOT IN (SELECT RT.ApplicationNumber FROM ABSLI_REFUND_TRANSACTION RT WHERE RT.Status != 'Retain')
ORDER BY pst.updatedDate DESC;
Correlated subqueries can be costly on larger datasets; you might want to try converting
...
AND NOT EXISTS (SELECT 1 FROM ABSLI_SERVICE_STATUS WHERE PolicyNo=APT.policyId AND NAME = 'APEX_Validate')
...
to something like
...
LEFT JOIN ABSLI_SERVICE_STATUS AS ss ON APT.policyId = ss.PolicyNo AND ss.NAME = 'APEX_Validate'
...
AND ss.NAME IS NULL
Normally, I'd suggest an "id" field from ss for the IS NULL check, but NAME obviously exists (from the query you have) and cannot be both 'APEX_Validate' and NULL at the same time. Also, if there is a compound index on (PolicyNo, NAME), that index can probably be used without accessing the table itself.
Assuming you have an index on the ABSLI_REFUND_TRANSACTION.ApplicationNumber column, you can also try NOT EXISTS instead of NOT IN for the second subquery. That way the subquery can use this index, and NOT IN can be problematic for large data sets.
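As a sketch of the compound index mentioned above: the index name idx_policyno_name is invented for illustration, and whether such an index already exists is unknown, since the table definition was not posted. The EXPLAIN afterwards is one way to check whether the anti-join is served from the index alone.

```sql
-- Hypothetical index name; the columns are the two used in the join condition.
ALTER TABLE ABSLI_SERVICE_STATUS
  ADD INDEX idx_policyno_name (PolicyNo, NAME);

-- Reduced version of the anti-join, just to inspect the plan for ss.
EXPLAIN
SELECT APT.policyId
FROM ABSLI_PAYMENT_TRANSACTION APT
LEFT JOIN ABSLI_SERVICE_STATUS AS ss
       ON ss.PolicyNo = APT.policyId
      AND ss.NAME = 'APEX_Validate'
WHERE APT.paymentStatus = 'Y'
  AND ss.NAME IS NULL;
```

If the index covers the lookup, the Extra column for ss would be expected to mention "Using index"; the exact plan depends on the MySQL version and the data.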
SELECT DISTINCT ACA.Application_No, AC.FirstName, AC.Id, AC.LastName, AC.MobileNo, CL.leadId
FROM ABSLI_PAYMENT_TRANSACTION APT
INNER JOIN ABSLI_CUSTOMER_APPLICATION ACA
ON ACA.Policy_No = APT.policyId
INNER JOIN ABSLI_CUSTOMER AC
ON AC.Id = ACA.CustomerId
LEFT JOIN ABSLI_CUSTOMER_LEAD CL
ON CL.policyId = ACA.Policy_No
INNER JOIN ABSLI_Policy_Status_Tracking pst
ON pst.policyId = APT.policyId
WHERE APT.paymentStatus = 'Y'
AND NOT EXISTS (SELECT 1
FROM ABSLI_SERVICE_STATUS
WHERE PolicyNo = APT.policyId
AND NAME = 'APEX_Validate')
AND NOT EXISTS (SELECT 1
FROM ABSLI_REFUND_TRANSACTION RT
WHERE RT.Status != 'Retain'
AND RT.ApplicationNumber = ACA.Application_No)
ORDER BY pst.updatedDate DESC;
Without the execution plan, though, it is hard to say more about performance.
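One thing worth checking before swapping NOT IN for NOT EXISTS: the two differ when the subquery can return NULL. If ApplicationNumber is nullable (an assumption, since the table definition is not shown), a single NULL makes NOT IN evaluate to NULL for every non-matching row, so the row is filtered out, while NOT EXISTS simply ignores the NULL. A small self-contained illustration with literal values:

```sql
-- NOT IN against a list containing NULL evaluates to NULL,
-- which a WHERE clause treats as "not true".
SELECT 2 NOT IN (1, NULL) AS not_in_result;        -- NULL

-- NOT EXISTS only asks whether a matching row exists;
-- the NULL row never matches t.v = 2.
SELECT NOT EXISTS (
         SELECT 1
         FROM (SELECT 1 AS v UNION ALL SELECT NULL) AS t
         WHERE t.v = 2
       ) AS not_exists_result;                     -- 1
```

If the column is declared NOT NULL, the two forms return the same rows and the choice is purely about the plan.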

How to access outer table data in join where

How can I access data from an outer table in a SELECT and use it in a WHERE clause inside a JOIN structure?
Below is the current query:
SELECT
cvl.id caracteristica_valor_id,
cvl.nome caracteristica_valor_nome,
cvl.valor caracteristica_valor_valor,
ctp.id caracteristica_tipo_id,
ctp.nome caracteristica_tipo_nome,
ctp.codigo caracteristica_tipo_codigo,
ctp.tipo caracteristica_tipo_tipo,
COUNT(DISTINCT var.id_perfil_produto) quantidade_itens
FROM
caracteristica_variacao cvr
INNER JOIN caracteristica_valor cvl ON cvl.id = cvr.id_caracteristica_valor
INNER JOIN caracteristica_tipo ctp ON ctp.id = cvl.id_caracteristica_tipo
INNER JOIN variacao var ON var.id = cvr.id_variacao
INNER JOIN(
SELECT DISTINCT
ppr.id perfil_produto_id
FROM
perfil_produto ppr
INNER JOIN produto pro ON pro.id = ppr.id_produto
INNER JOIN(
SELECT ppr2.id AS id_perfil_sub,
COUNT(var.id) AS qtd_variacoes,
SUM(var.quantidade_estoque) AS quantidade_estoque,
COALESCE(SUM(var.quantidade_estoque_reservada),0) AS quantidade_estoque_reservada,
MIN(var.disponibilidade) AS disponibilidade,
MIN(var.frete_gratis) AS frete_gratis,
MIN(var.preco_venda) AS preco_venda,
MAX(var.preco_listagem) AS preco_listagem
FROM
variacao var
LEFT JOIN perfil_produto ppr2 ON ppr2.id = var.id_perfil_produto
LEFT JOIN caracteristica_variacao cvr_1 ON cvr_1.id_variacao = var.id
LEFT JOIN caracteristica_valor cvl_1 ON cvl_1.id = cvr_1.id_caracteristica_valor
LEFT JOIN caracteristica_tipo ctp_1 ON ctp_1.id = cvl_1.id_caracteristica_tipo
WHERE
var.disponibilidade = 1
AND(
ctp_1.codigo = 'tamanho' AND cvl_1.valor IN('p')
)
GROUP BY
ppr2.id
) AS grp_var ON grp_var.id_perfil_sub = ppr.id
INNER JOIN produto_categoria prc ON pro.id = prc.produto_id
INNER JOIN categoria cat ON prc.categoria_id = cat.id
WHERE
pro.disponibilidade = 1 AND prc.categoria_id IN (164, 165, 166)
) AS produto ON produto.perfil_produto_id = var.id_perfil_produto
GROUP BY
cvl.id
ORDER BY
ctp.tipo ASC,
ctp.id
I need the field ctp.codigo from the outer table inside this part:
WHERE
var.disponibilidade = 1
AND(
ctp_1.codigo = 'tamanho' AND cvl_1.valor IN('p')
)
for this section to be as follows:
WHERE
var.disponibilidade = 1
AND(
(ctp.codigo != 'tamanho' AND ctp_1.codigo = 'tamanho' AND cvl_1.valor IN('p'))
OR
(ctp.codigo = 'tamanho')
)
It's not possible to reference columns from the outer query from inside an inline view query.
In the MySQL vernacular, the inline view query is called a "derived table". That name makes sense given the way MySQL processes it: the execution plan first materializes the inline view query into a temporary(-ish) table. Once that is done, the outer query can run, referencing the contents of the derived table.
The columns from the outer query are not available to MySQL at the time the inline view query runs.
It is possible to reference columns from the outer query inside a subquery that appears for example in the SELECT list, or in the WHERE clause. We call a subquery that references columns from outer query a "correlated subquery".
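To make the distinction concrete, here is a minimal sketch of correlated subqueries in the two places mentioned. The tables customers(id, name) and orders(id, customer_id, total) are hypothetical and not part of the question's schema.

```sql
SELECT c.id,
       c.name,
       -- Correlated subquery in the SELECT list: c.id comes from the outer row.
       (SELECT COUNT(*)
        FROM orders o
        WHERE o.customer_id = c.id) AS order_count
FROM customers c
-- Correlated subquery in the WHERE clause, evaluated per outer row.
WHERE EXISTS (SELECT 1
              FROM orders o
              WHERE o.customer_id = c.id
                AND o.total > 100);
```

Moving either subquery into the FROM clause as a derived table would make the reference to c.id invalid, for the reason described above. MySQL 8.0.14 and later relax this for derived tables declared with the LATERAL keyword, which may refer to tables that precede them in the same FROM clause.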

MySQL takes too much time to execute a SQL query with multiple joins

My SQL query takes a long time to execute on the MySQL database server. A number of tables are joined to the sb_tblproperty table. sb_tblproperty is the main table and contains more than 100,000 rows; most of the other tables contain 50,000 rows.
How can I optimize the query so that it executes faster? I have already added indexes.
indexing Explain - query - structure
SELECT `t1`.`propertyId`, `t1`.`projectId`,
`t1`.`furnised`, `t1`.`ownerID`, `t1`.`subType`,
`t1`.`fors`, `t1`.`size`, `t1`.`unit`,
`t1`.`bedrooms`, `t1`.`address`, `t1`.`dateConfirm`,
`t1`.`dateAdded`, `t1`.`floor`, `t1`.`priceAmount`,
`t1`.`priceRate`, `t1`.`allInclusive`, `t1`.`booking`,
`t1`.`bookingRate`, `t1`.`paidPercetage`,
`t1`.`paidAmount`, `t1`.`is_sold`, `t1`.`remarks`,
`t1`.`status`, `t1`.`confirmedStatus`, `t1`.`source`,
`t1`.`companyName` as company, `t1`.`monthly_rent`,
`t1`.`per_sqft`, `t1`.`lease_duration`,
`t1`.`lease_commencement`, `t1`.`lock_in_period`,
`t1`.`security_deposit`, `t1`.`security_amount`,
`t1`.`total_area_leased`, `t1`.`lease_escalation_amount`,
`t1`.`lease_escalation_years`, `t2`.`propertyTypeName` as
propertyTypeName, `t3`.`propertySubTypeName` subType,
`t3`.`propertySubTypeId` subTypeId, `Owner`.`ContactName`
ownerName, `Owner`.`companyName`, `Owner`.`mobile1`,
`Owner`.`otherPhoneNo`, `Owner`.`mobile2`,
`Owner`.`email`, `Owner`.`address` as caddress,
`Owner`.`contactType`, `P`.`projectName` as project,
`P`.`developerName` as developer, `c`.`name` as city,
if(t1.projectId="", group_concat( distinct( L.locality)),
group_concat( distinct(L2.locality))) as locality, `U`.`firstname`
addedBy, `U1`.`firstname` confirmedBy
FROM `sb_tblproperty` as t1
JOIN `sb_contact` Owner ON `Owner`.`id` = `t1`.`ownerID`
JOIN `tbl_city` C ON `c`.`id` = `t1`.`city`
JOIN `sb_propertytype` t2 ON `t1`.`propertyType`= `t2`.`propertyTypeId`
JOIN `sb_propertysubtype` t3 ON `t1`.`subType` =`t3`.`propertySubTypeId`
LEFT JOIN `sb_tbluser` U ON `t1`.`addedBy` = `U`.`userId`
LEFT JOIN `sb_tbluser` U1 ON `t1`.`confirmedBy` = `U1`.`userId`
LEFT JOIN `sb_tblproject` P ON `P`.`id` = `t1`.`projectId`
LEFT JOIN `sb_tblpropertylocality` PL ON `t1`.`propertyId` = `PL`.`propertyId`
LEFT JOIN `sa_localitiez` L ON `L`.`id` = `PL`.`localityId`
LEFT JOIN `sb_tblprojectlocality` PROL ON `PROL`.`projectId` = `P`.`id`
LEFT JOIN `sa_localitiez` L2 ON `L2`.`id` = `PROL`.`localityId`
LEFT JOIN `sb_tblfloor` F
ON `F`.`floorName` =`t1`.`floor`
WHERE `t1`.`is_sold` != '1' GROUP BY `t1`.`propertyId`
ORDER BY `t1`.`dateConfirm`
DESC LIMIT 1000
Please provide the EXPLAIN.
Meanwhile, try this:
SELECT ...
FROM (
SELECT propertyId
FROM sb_tblproperty
WHERE `is_sold` = 0
ORDER BY `dateConfirm` DESC
LIMIT 1000
) AS x
JOIN `sb_tblproperty` as t1 ON t1.propertyId = x.propertyId
JOIN `sb_contact` Owner ON `Owner`.`id` = `t1`.`ownerID`
JOIN `tbl_city` C ON `c`.`id` = `t1`.`city`
...
LEFT JOIN `sb_tblfloor` F ON `F`.`floorName` =`t1`.`floor`
ORDER BY `t1`.`dateConfirm` DESC -- yes, again
Together with
INDEX(is_sold, dateConfirm)
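Spelled out as a statement, the index might be added like this. The index name idx_is_sold_dateConfirm is made up here; only the column list comes from the suggestion above.

```sql
-- Equality column first (is_sold = 0), then the ORDER BY column, so the
-- inner subquery can read rows in dateConfirm order and stop at the LIMIT.
ALTER TABLE sb_tblproperty
  ADD INDEX idx_is_sold_dateConfirm (is_sold, dateConfirm);
```

The column order matters: with dateConfirm first, the index could still help the sort, but the is_sold filter would have to be applied row by row while scanning.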
How can t1.projectId="" ? Isn't projectId the PRIMARY KEY? (This is one of many reasons for needing the SHOW CREATE TABLE.)
If my suggestion leads to "duplicate" rows (that is, multiple rows with the same propertyId), don't simply add back the GROUP BY propertyId. Instead figure out why, and avoid the need for the GROUP BY. (That is probably the performance issue.)
A likely case is the GROUP_CONCAT. A common workaround is to change from
GROUP_CONCAT( distinct( L.locality)) AS Localities,
...
LEFT JOIN `sa_localitiez` L ON `L`.`id` = `PL`.`localityId`
to
( SELECT GROUP_CONCAT(distinct locality)
FROM sa_localitiez
WHERE id = PL.localityId ) AS Localities
...
# and remove the JOIN

Poorly performing SQL query

I have got a query like below, executing in SQL Server 2008
SELECT
ipm.HEORG_REFNO,
ipm.HOTYP_REFNO,
ipm.CASLT_REFNO,
ipm.HOLVL_REFNO,
IPM.MAIN_IDENT,
...
FROM
dbo.HEALTH_ORGANISATIONS ipm (NOLOCK)
LEFT JOIN
(SELECT
s.heorg_refno, min(s.start_dttm) as start_dttm_SPONT, max(isnull(convert(datetime,s.end_dttm,120),convert(datetime,'9999-01-01', 120))) as end_dttm_SPONT
FROM
dbo.service_points s (NOLOCK)
INNER JOIN
dbo.reference_values rfval (NOLOCK) ON s.SPTYP_REFNO = rfval.RFVAL_REFNO
AND RFVAL.MAIN_CODE != 'PDT'
GROUP BY
s.heorg_refno) SPONT ON ipm.HEORG_REFNO = SPONT.HEORG_REFNO
-- Bring only Health Organisation records and also certain records,whose HOTYP_REFNO does not exist in REF_VALS
WHERE
NOT EXISTS ((SELECT 'x'
FROM REFERENCE_VALUES RVAL (NOLOCK)
WHERE RVAL.RFVAL_REFNO = ipm.HOTYP_REFNO
AND main_code IN ('011','012','015','016', '017','019','2','AANDE','AEB','AEC','CLINIC','DAYCC','DEPRT','GPSIT','HC','HOSPL','HOST','LOCTN','LOSYN','MIU','MISC','MRL', 'SITE','THEAT','WARD','PDT','NURHM','DAYCR')
or ipm.HEORG_REFNO IN(select distinct HEORG_REFNO from SERVICE_POINT_SESSIONS (NOLOCK) where OWNER_HEORG_REFNO = 2001934 and HEORG_REFNO != 2001934)
or ipm.HEORG_REFNO IN (select REFNO from LOR_IPM_SYNTH_STG_DEV.. STAGING_Activity_LOCATION_DCS (NOLOCK) where Sources='HEORG_REFNO' and REFNO != 2001934)
)
)
The query takes a very long time to execute.
When I comment out the two lines below, it runs faster:
or ipm.HEORG_REFNO IN(select distinct HEORG_REFNO from SERVICE_POINT_SESSIONS (NOLOCK) where OWNER_HEORG_REFNO = 2001934 and HEORG_REFNO != 2001934)
or ipm.HEORG_REFNO IN (select REFNO from LOR_IPM_SYNTH_STG_DEV.. STAGING_Activity_LOCATION_DCS (NOLOCK) where Sources='HEORG_REFNO' and REFNO != 2001934)
Thanks for any guidance provided in tuning the query
My first thought is that your query is massively complex; I would look for ways to simplify it.
IN clauses don't always perform well. I would be tempted to pull this information into a table variable of banned main_code values, left join to it, and test for NULL.
It is time to look at the execution plan, though, and see where your bottlenecks actually are, which will depend on your own environment (indexing, statistics, etc.).
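A rough T-SQL sketch of the table-variable idea, under some assumptions: the variable name @banned_codes and the VARCHAR(20) length are invented, and only three of the codes are listed here for brevity. The select list is reduced to two columns, and the two IN subqueries from the question are left out.

```sql
-- Hypothetical table variable holding the excluded codes.
DECLARE @banned_codes TABLE (main_code VARCHAR(20) PRIMARY KEY);

INSERT INTO @banned_codes (main_code)
VALUES ('011'), ('012'), ('015');  -- add the remaining codes from the IN list

SELECT ipm.HEORG_REFNO,
       ipm.HOTYP_REFNO
FROM dbo.HEALTH_ORGANISATIONS ipm
LEFT JOIN (
    -- Reference values whose code is in the banned list.
    SELECT DISTINCT rval.RFVAL_REFNO
    FROM REFERENCE_VALUES rval
    INNER JOIN @banned_codes b ON b.main_code = rval.MAIN_CODE
) banned ON banned.RFVAL_REFNO = ipm.HOTYP_REFNO
-- Keep only organisations with no banned reference value.
WHERE banned.RFVAL_REFNO IS NULL;
```

Whether this beats the literal IN list depends on the plan the optimizer picks, so it is worth comparing both against the actual execution plan.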
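Try converting those IN subqueries to JOINs like the one below, and make sure you have proper indexes on all the columns involved in the join conditions and WHERE filters. Because the original tests sit inside NOT EXISTS, keep only the unmatched rows by checking the joined column for NULL in the WHERE clause.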
LEFT JOIN SERVICE_POINT_SESSIONS sps ON ipm.HEORG_REFNO = sps.HEORG_REFNO
AND sps.OWNER_HEORG_REFNO = 2001934
AND sps.HEORG_REFNO != 2001934
I would modify your query as shown below. I can't do anything about your big IN list for now, but you should pull that list into a table variable and consider joining to it instead.
SELECT
ipm.HEORG_REFNO,
ipm.HOTYP_REFNO,
ipm.CASLT_REFNO,
ipm.HOLVL_REFNO,
IPM.MAIN_IDENT,
...
FROM
dbo.HEALTH_ORGANISATIONS ipm
LEFT JOIN
(SELECT
s.heorg_refno, min(s.start_dttm) as start_dttm_SPONT,
max(isnull(convert(datetime,s.end_dttm,120),convert(datetime,'9999-01-01', 120))) as end_dttm_SPONT
FROM
dbo.service_points s
INNER JOIN dbo.reference_values rfval
ON s.SPTYP_REFNO = rfval.RFVAL_REFNO
AND RFVAL.MAIN_CODE != 'PDT'
GROUP BY
s.heorg_refno) SPONT ON ipm.HEORG_REFNO = SPONT.HEORG_REFNO
LEFT JOIN SERVICE_POINT_SESSIONS sps
ON ipm.HEORG_REFNO = sps.HEORG_REFNO
AND sps.OWNER_HEORG_REFNO = 2001934
AND sps.HEORG_REFNO != 2001934
LEFT JOIN LOR_IPM_SYNTH_STG_DEV..STAGING_Activity_LOCATION_DCS sald
ON ipm.HEORG_REFNO = sald.REFNO -- the original IN subquery selects REFNO from this table
AND sald.Sources='HEORG_REFNO'
AND sald.REFNO != 2001934
WHERE NOT EXISTS (SELECT 1
FROM REFERENCE_VALUES RVAL
WHERE RVAL.RFVAL_REFNO = ipm.HOTYP_REFNO
AND RVAL.main_code IN ('011','012','015','016', '017','019','2','AANDE','AEB','AEC','CLINIC','DAYCC','DEPRT','GPSIT','HC','HOSPL','HOST','LOCTN','LOSYN','MIU','MISC','MRL', 'SITE','THEAT','WARD','PDT','NURHM','DAYCR'))
-- keep only rows with no match in either joined table, as the original NOT EXISTS (... OR ... IN ...) did
AND sps.HEORG_REFNO IS NULL
AND sald.REFNO IS NULL;