Help me change this single complex query to use temporary tables - mysql

About the system:
- There are tutors who create classes and packs
- A tags-based search approach is used. Tag relations are created when new tutors register and when tutors create packs (this makes tutors and packs searchable). For details, see the section How tags work in this system? below.
The query in question is below. Can anybody suggest an approach using temporary tables? We have indexed all the relevant fields, and this seems to be the fastest we can get with the current approach:
SELECT SUM(DISTINCT( t.tag LIKE "%Dictatorship%"
OR tt.tag LIKE "%Dictatorship%"
OR ttt.tag LIKE "%Dictatorship%" )) AS key_1_total_matches,
SUM(DISTINCT( t.tag LIKE "%democracy%"
OR tt.tag LIKE "%democracy%"
OR ttt.tag LIKE "%democracy%" )) AS key_2_total_matches,
COUNT(DISTINCT( od.id_od )) AS tutor_popularity,
CASE
WHEN ( IF(( wc.id_wc > 0 ), ( wc.wc_api_status = 1
AND wc.wc_type = 0
AND wc.class_date > '2010-06-01 22:00:56'
AND wccp.status = 1
AND ( wccp.country_code = 'IE'
OR wccp.country_code IN ( 'INT' )
) ), 0)
) THEN 1
ELSE 0
END AS 'classes_published',
CASE
WHEN ( IF(( lp.id_lp > 0 ), ( lp.id_status = 1
AND lp.published = 1
AND lpcp.status = 1
AND ( lpcp.country_code = 'IE'
OR lpcp.country_code IN ( 'INT' )
) ), 0)
) THEN 1
ELSE 0
END AS 'packs_published',
td.*,
u.*
FROM tutor_details AS td
JOIN users AS u
ON u.id_user = td.id_user
LEFT JOIN learning_packs_tag_relations AS lptagrels
ON td.id_tutor = lptagrels.id_tutor
LEFT JOIN learning_packs AS lp
ON lptagrels.id_lp = lp.id_lp
LEFT JOIN learning_packs_categories AS lpc
ON lpc.id_lp_cat = lp.id_lp_cat
LEFT JOIN learning_packs_categories AS lpcp
ON lpcp.id_lp_cat = lpc.id_parent
LEFT JOIN learning_pack_content AS lpct
ON ( lp.id_lp = lpct.id_lp )
LEFT JOIN webclasses_tag_relations AS wtagrels
ON td.id_tutor = wtagrels.id_tutor
LEFT JOIN webclasses AS wc
ON wtagrels.id_wc = wc.id_wc
LEFT JOIN learning_packs_categories AS wcc
ON wcc.id_lp_cat = wc.id_wp_cat
LEFT JOIN learning_packs_categories AS wccp
ON wccp.id_lp_cat = wcc.id_parent
LEFT JOIN order_details AS od
ON td.id_tutor = od.id_author
LEFT JOIN orders AS o
ON od.id_order = o.id_order
LEFT JOIN tutors_tag_relations AS ttagrels
ON td.id_tutor = ttagrels.id_tutor
LEFT JOIN tags AS t
ON t.id_tag = ttagrels.id_tag
LEFT JOIN tags AS tt
ON tt.id_tag = lptagrels.id_tag
LEFT JOIN tags AS ttt
ON ttt.id_tag = wtagrels.id_tag
WHERE ( u.country = 'IE'
OR u.country IN ( 'INT' ) )
AND CASE
WHEN ( ( tt.id_tag = lptagrels.id_tag )
AND ( lp.id_lp > 0 ) ) THEN lp.id_status = 1
AND lp.published = 1
AND lpcp.status = 1
AND ( lpcp.country_code = 'IE'
OR lpcp.country_code IN (
'INT'
) )
ELSE 1
END
AND CASE
WHEN ( ( ttt.id_tag = wtagrels.id_tag )
AND ( wc.id_wc > 0 ) ) THEN wc.wc_api_status = 1
AND wc.wc_type = 0
AND
wc.class_date > '2010-06-01 22:00:56'
AND wccp.status = 1
AND ( wccp.country_code = 'IE'
OR wccp.country_code IN (
'INT'
) )
ELSE 1
END
AND CASE
WHEN ( od.id_od > 0 ) THEN od.id_author = td.id_tutor
AND o.order_status = 'paid'
AND CASE
WHEN ( od.id_wc > 0 ) THEN od.can_attend_class = 1
ELSE 1
END
ELSE 1
END
AND ( t.tag LIKE "%Dictatorship%"
OR t.tag LIKE "%democracy%"
OR tt.tag LIKE "%Dictatorship%"
OR tt.tag LIKE "%democracy%"
OR ttt.tag LIKE "%Dictatorship%"
OR ttt.tag LIKE "%democracy%" )
GROUP BY td.id_tutor
HAVING key_1_total_matches = 1
AND key_2_total_matches = 1
ORDER BY tutor_popularity DESC,
u.surname ASC,
u.name ASC
LIMIT 0, 20
The problem
The results returned by the above query are correct (the AND logic works as expected), but the execution time rises alarmingly as the data grows. With my current data it takes about 10 seconds, compared with 0.0002 to 0.005 seconds for our other queries, which makes it unusable.
In my previous question, somebody suggested the following:
- create a temporary table and insert into it all the relevant data that might end up in the final result set
- run several updates on this table, joining the required tables one at a time instead of all of them at once
- finally, query this temporary table to extract the end result
According to them, all this was done in a stored procedure, the end result passed unit tests, and it is blazing fast.
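The three suggested steps can be sketched as follows. This is a minimal sketch, not the stored procedure the commenter described. It uses Python's built-in sqlite3 module as a stand-in for MySQL (in MySQL you would write CREATE TEMPORARY TABLE and INSERT IGNORE instead), and it assumes a simplified schema with only tutors_tag_relations, tags, order_details and orders. Packs, web classes and the country filters are left out. The function name search_tutors and the temp table candidates are hypothetical.

```python
import sqlite3


def search_tutors(conn, keywords, limit=20):
    """Staged tag search: temp table first, then one UPDATE per join, then a final query.

    Hypothetical, simplified schema: only tutors_tag_relations, tags,
    order_details and orders are modelled (packs and web classes are omitted).
    """
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS temp.candidates")
    cur.execute(
        "CREATE TEMP TABLE candidates ("
        " id_tutor INTEGER PRIMARY KEY,"
        " matched INTEGER NOT NULL DEFAULT 0,"
        " popularity INTEGER NOT NULL DEFAULT 0)"
    )
    tag_match = (
        "SELECT r.id_tutor FROM tutors_tag_relations r"
        " JOIN tags t ON t.id_tag = r.id_tag WHERE t.tag LIKE ?"
    )
    for kw in keywords:
        pattern = f"%{kw}%"
        # Step 1: add every tutor with a tag matching this keyword (duplicates ignored).
        cur.execute(f"INSERT OR IGNORE INTO candidates (id_tutor) {tag_match}", (pattern,))
        # Step 2: count how many keywords each candidate has matched (once per keyword).
        cur.execute(
            f"UPDATE candidates SET matched = matched + 1 WHERE id_tutor IN ({tag_match})",
            (pattern,),
        )
    # Step 3: join the order tables on their own, only for the candidate rows.
    cur.execute(
        "UPDATE candidates SET popularity = ("
        " SELECT COUNT(DISTINCT od.id_od) FROM order_details od"
        " JOIN orders o ON o.id_order = od.id_order"
        " WHERE od.id_author = candidates.id_tutor AND o.order_status = 'paid')"
    )
    # Step 4: keep only tutors that matched every keyword (the AND logic).
    return cur.execute(
        "SELECT id_tutor, popularity FROM candidates WHERE matched = ?"
        " ORDER BY popularity DESC, id_tutor LIMIT ?",
        (len(keywords), limit),
    ).fetchall()
```

Each statement touches only one or two tables, so the optimizer never has to plan the full many-way join; the trade-off is several round trips instead of one query.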
I have never worked with temporary tables before. Any hints, even a schematic outline, would help me get started.
Is there something faulty with the query?
What can be the reason behind 10+ seconds of execution time?
How tags work in this system?
When a tutor registers, tags are entered and tag relations are created with respect to the tutor's details, such as name, surname etc.
When a tutor creates packs, tags are again entered and tag relations are created with respect to the pack's details, such as pack name, description etc.
Tag relations for tutors are stored in tutors_tag_relations and those for packs in learning_packs_tag_relations. All individual tags are stored in the tags table.

Temporary tables are not a silver bullet. The fundamental problem with your queries lies with patterns like this:
t.tag LIKE "%Dictatorship%"
OR tt.tag LIKE "%Dictatorship%"
OR ttt.tag LIKE "%Dictatorship%"
Putting a wildcard at the start of a LIKE pattern guarantees that an index cannot be used to look up the matching rows. Effectively, you're scanning the tags table in full for each of its three aliases (t, tt and ttt)...
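One way to see this for yourself is to compare query plans. The sketch below uses Python's sqlite3 as a stand-in for MySQL (in MySQL you would run EXPLAIN and compare the access type); the tags table and index are a hypothetical minimal version of the one in the question. The column is declared COLLATE NOCASE because SQLite only applies its LIKE-prefix optimization when the index collation matches LIKE's default case-insensitivity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOCASE collation lets SQLite turn an anchored LIKE into an index range search.
conn.execute("CREATE TABLE tags (id_tag INTEGER PRIMARY KEY, tag TEXT COLLATE NOCASE)")
conn.execute("CREATE INDEX idx_tags_tag ON tags (tag)")


def plan(pattern_literal):
    """Return the EXPLAIN QUERY PLAN detail text for a LIKE against tags.tag."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT id_tag FROM tags WHERE tag LIKE {pattern_literal}"
    ).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the detail text


prefix_plan = plan("'Dictator%'")       # anchored pattern: index range SEARCH
infix_plan = plan("'%Dictatorship%'")   # leading wildcard: only a SCAN is possible
print(prefix_plan)
print(infix_plan)
```

With the anchored pattern the planner reports a SEARCH over an index range; with the leading wildcard it can only SCAN every entry.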
You need to leverage full-text search (FTS), either MySQL's native FTS or a third-party engine such as Sphinx. All the FTS engines I've known include a score/rank value indicating the strength of the match; see the MySQL documentation for the algorithm details. The score/rank is not the same as what you have now (SUM(DISTINCT LIKE ...)), but you could get the same thing using something like:
SELECT t.id_tag,
COUNT(*) AS num_matches
FROM tags AS t
WHERE MATCH(t.tag) AGAINST ('Dictatorship') -- requires a FULLTEXT index on tags.tag
GROUP BY t.id_tag

Related

Problems with query speed when using a nested query for item count

When I add the nested query for invCount, my query time goes from 0.03 seconds to 14 seconds. The query works and returns correct values, but it is very, very slow by comparison. Is that just because there are too many conditions in that query? When I take it out and keep the second nested query, the time is still 0.03 seconds. There is clearly something about the first nested query that the database doesn't like, but I can't see what it is. I have a foreign key set for all the INNER JOIN columns too. Any help or ideas would be appreciated.
SELECT a.*,
f.name,
f.partNumber,
f.showInAdminStore,
f.showInPublicStore,
f.productImage,
r.mastCatID,
(SELECT COUNT(b.inventoryID)
FROM storeInventory b
INNER JOIN events c ON c.eventID = b.eventID
WHERE b.pluID = a.pluID
AND b.listPrice = a.listPrice
AND b.unlimitedQty = a.unlimitedQty
AND (b.packageID = a.packageID OR (b.packageID IS NULL AND a.packageID IS NULL))
AND b.orderID IS NULL
AND c.isOpen = '1'
AND b.paymentTypeID <= '2'
AND (b.inCart < '$cartTime' OR b.inCart IS NULL) ) AS invCount,
(SELECT COUNT(x.inventoryID)
FROM storeInventory x
WHERE x.packageID = a.inventoryID) AS packageCount
FROM storeInventory a
INNER JOIN storePLUs f ON f.pluID = a.pluID
INNER JOIN storeCategories r ON r.catID = f.catID
INNER JOIN events d ON d.eventID = a.eventID
WHERE a.storeFrontID = '1'
AND a.orderID IS NULL
AND a.paymentTypeID <= '2'
AND d.isOpen = '1'
GROUP BY a.packageID, a.unlimitedQty, a.listPrice, a.pluID
Table from query output
UPDATE: 12/12/2022
I changed the line checking the packageID to "AND (b.packageID <=> a.packageID)" as suggested and that cut my query time down to 7.8 seconds from 14 seconds. Thanks for the pointer. I will definitely use that in the future for NULL comparisons.
Using COUNT(*) took about half a second off. When I take the first nested query out, the time drops to 0.05 seconds even with the other nested queries in place, so I feel something is still causing issues. I tried running it without the "AND (b.inCart < '$cartTime' OR b.inCart IS NULL)" line, and that took about a second off, but nowhere near what I was hoping for. Is there an operator that includes NULL in a less-than comparison? I also tried running it without the inner join in the nested query, and that didn't change much at all. Of course, removing any of these throws the values off and they become incorrect, so I can't run it that way.
Here is my current query setup that still pulls correct values.
SELECT a.*,
f.name,
f.partNumber,
f.showInAdminStore,
f.showInPublicStore,
f.productImage,
r.mastCatID,
(SELECT COUNT(*)
FROM storeInventory b
INNER JOIN events c ON c.eventID = b.eventID
WHERE b.pluID = a.pluID
AND b.listPrice = a.listPrice
AND b.unlimitedQty = a.unlimitedQty
AND (b.packageID <=> a.packageID)
AND b.orderID IS NULL
AND c.isOpen = '1'
AND b.paymentTypeID <= '2'
AND (b.inCart < '$cartTime' OR b.inCart IS NULL) ) AS invCount,
(SELECT COUNT(x.inventoryID)
FROM storeInventory x
WHERE x.packageID = a.inventoryID) AS packageCount
FROM storeInventory a
INNER JOIN storePLUs f ON f.pluID = a.pluID
INNER JOIN storeCategories r ON r.catID = f.catID
INNER JOIN events d ON d.eventID = a.eventID
WHERE a.storeFrontID = '1'
AND a.orderID IS NULL
AND a.paymentTypeID <= '2'
AND d.isOpen = '1'
GROUP BY a.packageID, a.unlimitedQty, a.listPrice, a.pluID
I am not familiar with the term "composite indexes". Is that something different from these?
Screenshot of ForeignKeys on Table a
I think
AND (b.packageID = a.packageID
OR (b.packageID IS NULL
AND a.packageID IS NULL)
)
can be simplified to ( https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html#operator_equal-to ):
AND ( b.packageID <=> a.packageID )
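A quick way to see what <=> buys you. SQLite has no <=> operator, but its IS operator is the null-safe equivalent. This sketch therefore uses Python's sqlite3 as a stand-in and compares plain =, the long-hand OR form and IS for a few value pairs. The helper name compare is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def compare(a, b):
    """Evaluate plain '=', the long-hand OR form, and SQLite's null-safe IS for two values."""
    return conn.execute(
        "SELECT ? = ?, (? = ? OR (? IS NULL AND ? IS NULL)), ? IS ?",
        (a, b, a, b, a, b, a, b),
    ).fetchone()


for pair in [(None, None), (5, 5), (5, 6), (5, None)]:
    print(pair, compare(*pair))
```

Plain = yields NULL whenever either side is NULL, so two NULL packageIDs never match. The long-hand form yields NULL where IS (like MySQL's <=>) yields 0, but both reject the row in a WHERE clause, so the two filters select the same rows.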
Use COUNT(*) instead of COUNT(x.inventoryID) unless you specifically need to skip rows where that column is NULL.
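To illustrate the difference: COUNT(*) counts rows, while COUNT(col) counts only rows where col is not NULL. This minimal sketch uses Python's sqlite3, which has the same semantics as MySQL here. The row with a NULL inventoryID is hypothetical and exists only to make the difference visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE storeInventory (inventoryID INTEGER, packageID INTEGER)")
# Hypothetical rows; the NULL inventoryID makes COUNT(col) differ from COUNT(*).
conn.executemany("INSERT INTO storeInventory VALUES (?, ?)", [(1, 7), (None, 7), (3, 8)])

count_star, count_col = conn.execute(
    "SELECT COUNT(*), COUNT(inventoryID) FROM storeInventory WHERE packageID = 7"
).fetchone()
print(count_star, count_col)
```

If inventoryID is the primary key it can never be NULL, so both forms return the same number and COUNT(*) is simply the clearer choice.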
The subquery to compute packageCount seems strange; you seem to count inventories but join on packages.
The need to reach into another table to check isOpen is part of the performance problem. If eventID is not the PRIMARY KEY for events, then add INDEX(eventID, isOpen).
Some other indexes that may help:
a: INDEX(storeFrontID, orderID, paymentTypeID)
a: INDEX(packageID, unlimitedQty, listPrice, pluID)
b: INDEX(pluID, listPrice, unlimitedQty, orderID)
f: INDEX(pluID, catID)
r: INDEX(catID, mastCatID)
x: INDEX(packageID, inventoryID)
After OP's Update
There is no index-friendly way to evaluate (x < y OR x IS NULL) other than splitting it into two queries, as a UNION would. In your case, the conversion is easy. Replace
( SELECT COUNT(*) ... AND ( b.inCart < '$cartTime'
OR b.inCart IS NULL ) ) AS invCount,
with
( SELECT COUNT(*) ... AND b.inCart < '$cartTime' ) +
( SELECT COUNT(*) ... AND b.inCart IS NULL ) AS invCount,
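The rewrite is safe because the two conditions never overlap. When inCart is NULL, inCart < '$cartTime' evaluates to NULL rather than true, so no row is counted twice. Here is a quick check using Python's sqlite3 as a stand-in for MySQL; the four sample rows and the cart_time value are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE storeInventory (inventoryID INTEGER PRIMARY KEY, inCart TEXT)")
conn.executemany(
    "INSERT INTO storeInventory (inCart) VALUES (?)",
    [("2022-12-12 09:00:00",), ("2022-12-12 11:00:00",), (None,), (None,)],
)
cart_time = "2022-12-12 10:00:00"  # hypothetical value standing in for '$cartTime'

# Original form: one query with OR.
(combined,) = conn.execute(
    "SELECT COUNT(*) FROM storeInventory WHERE inCart < ? OR inCart IS NULL",
    (cart_time,),
).fetchone()
# Split form: two disjoint counts added together, each able to use an index.
(split,) = conn.execute(
    "SELECT (SELECT COUNT(*) FROM storeInventory WHERE inCart < ?)"
    " + (SELECT COUNT(*) FROM storeInventory WHERE inCart IS NULL)",
    (cart_time,),
).fetchone()
print(combined, split)
```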
Revised indexes:
storePLUs:
INDEX(pluID, catID)
storeCategories:
INDEX(catID, mastCatID)
events:
INDEX(isOpen, eventID)
storeInventory:
INDEX(pluID, listPrice, unlimitedQty, orderID, packageID)
INDEX(pluID, listPrice, unlimitedQty, orderID, inCart)
INDEX(packageID, inventoryID)
INDEX(storeFrontID, orderID, paymentTypeID)

Why is the ranking incorrect when I join more than two tables?

I learned how to create the ranking from this website (http://www.sqlines.com/mysql/how-to/get_top_n_each_group), but I don't know why my ranking becomes 1 when I join more than two tables.
For example, the code below works great, but when I uncomment the commented lines, it doesn't work anymore. Any idea why?
Also, selecting only rows with row_number <= 3 doesn't work either, even when I wrap the working query in a subquery.
SELECT
c.clientname ,
worklist.clientreference ,
worklist.workordernumber ,
worklist.workordertype ,
worklist.approveddate ,
worklist.taskid ,
-- p.vacant
-- p.autosecure AS "Auto Secure",
@num:=IF(@grp = worklist.clientreference, @num + 1, 1) AS row_number,
@grp:=worklist.clientreference AS dummy
FROM
worklist
LEFT JOIN client c ON worklist.clientid = c.clientid
-- LEFT JOIN property p ON worklist.propertyid = p.propertyid
-- LEFT JOIN investor i ON i.investorid = worklist.investorid
WHERE
worklist.status IN ('crteria')
AND c.clientname IN ('crteria')
AND worklist.workorder = "crteria"
-- AND p.status = 'crteria'
-- AND p.vacant NOT REGEXP "crteria"
ORDER BY worklist.clientreference DESC
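For comparison, here is a minimal sketch of the same top-3-per-group logic using a window function instead of user variables. It does not come from the thread. It assumes MySQL 8.0 or later, because ROW_NUMBER() OVER (...) is not available in 5.7, and it uses Python's sqlite3 (SQLite 3.25+) as a stand-in. The table is a hypothetical, cut-down worklist, and ordering by approveddate within each clientreference is an assumption. The numbering is computed by the window function after the joins, so it does not depend on the order in which user-variable assignments are evaluated. The alias is rn because ROW_NUMBER is a reserved word in MySQL 8.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE worklist (workordernumber INTEGER, clientreference TEXT, approveddate TEXT);
INSERT INTO worklist VALUES
  (1, 'A', '2020-01-01'), (2, 'A', '2020-01-02'), (3, 'A', '2020-01-03'),
  (4, 'A', '2020-01-04'), (5, 'B', '2020-01-01'), (6, 'B', '2020-01-05');
""")

# Number rows within each clientreference, then keep the first three per group.
rows = conn.execute("""
SELECT clientreference, workordernumber, rn
FROM (
    SELECT clientreference, workordernumber,
           ROW_NUMBER() OVER (PARTITION BY clientreference
                              ORDER BY approveddate DESC) AS rn
    FROM worklist
) AS ranked
WHERE rn <= 3
ORDER BY clientreference, rn
""").fetchall()
print(rows)
```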

Include another select column based on data - MySQL

How do I include SUM((pm.Quantity * bl.TotalQty)) AS NextBOMItemCount WHERE projectbomlist.ParentPartNum = bl.PartNum?
The existing data should not change: the same rows should be retrieved, but with one additional column.
VIEW: NEWprojectBOMItemCount
select
`pm`.`ProjectCode` AS `ProjectCode`,
`bl`.`PartNum` AS `PartNum`,
sum((`pm`.`Quantity` * `bl`.`TotalQty`)) AS `BOMItemCount`,
`bl`.`mp` AS `mp`,
`p`.`complete` AS `complete`,
`bl`.`RMInd` AS `RMInd`,
`bl`.`M_PartNum` AS `M_PartNum`
from
(
(`projectmachine` `pm` join `projectbomlist` `bl`)
join `projects` `p`
)
where
(
(`pm`.`MachineListID` = `bl`.`MachineListID`)
and (`pm`.`ProjectCode` = `bl`.`ProjectCode`)
and (`pm`.`ProjectCode` = `p`.`ProjectCode`)
and (`p`.`AfterProjectHeirarchyInd` = 'Y')
)
and pm.ProjectCode = 'AB212323'
group by
`pm`.`ProjectCode` ,
`bl`.`PartNum`
order by
`pm`.`ProjectCode` ,
`bl`.`PartNum`
Alternatively, consider the view above as used in the query below. Please suggest changes to that query so that it includes the following (repeating the requirement here):
`sum((pm.Quantity * bl.TotalQty)) AS NextBOMItemCount where projectbomlist.ParentPartNum = bl.PartNum` - in place of `(select-NextBOMItemCount)`?
Note that PBLH.ParentPartNum is the column I should compare with BL.ProjectCode to get the NextBOMItemCount value.
QUERY calling view: NEWprojectBOMItemCount
Select
BL.PartNum PartNumber,
PBLH.ParentPartNum NextBOM,
(select-NextBOMItemCount),
BOMItemCount TotalQty,
PL.Description,
BL.MP as PartType,
PL.Vendor,
PL.QBType
from
NEWprojectBOMItemcount BL,
bomwiz.partslist PL,
bomwiz.projectbomlistheirarchy PBLH
Where
BL.PartNum = PL.PartNum
And BL.PartNum = PBLH.PartNum
And BL.ProjectCode = PBLH.ProjectCode
And BL.projectCode = 'AB212323'
Order By PartNumber
I think that you are looking for conditional aggregation. Your requirement could be expressed as follows:
SUM(
CASE WHEN blh.ParentPartNum = bl.PartNum
THEN pm.Quantity * bl.TotalQty
ELSE 0
END
) AS NextBOMItemCount
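Here is a small, self-contained check of the conditional aggregation above, using Python's sqlite3 as a stand-in for MySQL. The table lines is hypothetical. It stands in for the rows produced by joining projectmachine, projectbomlist and projectbomlistheirarchy, already flattened to one row per part line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened rows: (PartNum, ParentPartNum, Quantity, TotalQty).
conn.executescript("""
CREATE TABLE lines (PartNum TEXT, ParentPartNum TEXT, Quantity INTEGER, TotalQty INTEGER);
INSERT INTO lines VALUES
  ('P1', 'P1', 2, 3),
  ('P1', 'X9', 1, 4),
  ('P2', 'P1', 5, 1);
""")

# The unconditional SUM counts every line; the CASE only adds lines whose parent matches.
rows = conn.execute("""
SELECT PartNum,
       SUM(Quantity * TotalQty) AS BOMItemCount,
       SUM(CASE WHEN ParentPartNum = PartNum
                THEN Quantity * TotalQty ELSE 0 END) AS NextBOMItemCount
FROM lines
GROUP BY PartNum
ORDER BY PartNum
""").fetchall()
print(rows)
```

Both sums run over the same grouped rows, so the existing BOMItemCount is unchanged and the new column only adds the matching subset.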
Let me pinpoint other issues with your query:
- You have unneeded parentheses all around, and I am suspicious about the syntax of the JOINs; you need to move the conditions to the ON clause of the relevant JOIN.
- Every non-aggregated column must appear in the GROUP BY clause, and you have missing columns there.
- Backquotes are usually not needed.
Here is an updated version of the query:
SELECT
pm.ProjectCode AS ProjectCode,
bl.PartNum AS PartNum,
SUM(pm.Quantity * bl.TotalQty) AS BOMItemCount,
SUM(
CASE WHEN blh.ParentPartNum = bl.PartNum
THEN pm.Quantity * bl.TotalQty
ELSE 0
END
) AS NextBOMItemCount,
bl.mp AS mp,
p.complete AS complete,
bl.RMInd AS RMInd,
bl.M_PartNum AS M_PartNum
FROM
projectmachine AS pm
INNER JOIN projectbomlist AS bl
ON pm.MachineListID = bl.MachineListID
AND pm.ProjectCode = bl.ProjectCode
INNER JOIN projects AS p
ON pm.ProjectCode = p.ProjectCode
AND p.AfterProjectHeirarchyInd = 'Y'
INNER JOIN projectbomlistheirarchy blh
ON bl.ProjectCode = blh.ProjectCode
WHERE
pm.ProjectCode = 'AB212323'
GROUP BY
pm.ProjectCode,
bl.PartNum,
bl.mp,
p.complete,
bl.RMInd,
bl.M_PartNum
ORDER BY
pm.ProjectCode,
bl.PartNum

User variable inside multinested query

I have to optimize a rather long, complex query with multiple subqueries inside. There is a subquery that is repeated many times at the third level of nesting:
(SELECT mc.cotmoneda2 FROM monedacotizaciones mc WHERE date(mc.`FechaHora`) <= date(p.Fechacreacion) AND mc.tipo = 0 ORDER BY mc.`FechaHora` DESC LIMIT 1)
Here's a reduced version of the complete query:
SELECT p.ID
,p.Tipo
, p.Numero
, p.Nombre
, e.Empresa
,(CASE p.NroMoneda WHEN 1 Then (SELECT sum(fi.ImportePrecio1/(SELECT mc.cotmoneda2 FROM monedacotizaciones mc where date(mc.`FechaHora`)<= date( p.Fechacreacion) and mc.tipo=0 order by mc.`FechaHora`desc limit 1))
FROM facturasitems fi inner join facturas f on (fi.idFactura= f.Recid)
where (f.estado =0 or f.estado =1 or f.estado =3 ) and f.idpedido = p.`recid`)
ELSE (SELECT sum(fi.ImportePrecio2)
FROM facturasitems fi inner join facturas f on (fi.idFactura= f.Recid)
where (f.estado =0 or f.estado =1 or f.estado =3 ) and f.idpedido = p.`recid`) end) as FacturadoUSA
,(SELECT sum(Ci.ImportePrecio1/(SELECT mc.cotmoneda2 FROM monedacotizaciones mc where date(mc.`FechaHora`)<= date( p.Fechacreacion) and mc.tipo=0 order by mc.`FechaHora`desc limit 1))
FROM Comprasitems ci inner join Compras C on (ci.idCompra= C.Recid)
WHERE (c.estado =0 OR c.estado =1 ) AND C.idpedido = p.`recid`) as CostoRealUSA
,(SELECT sum(dgv.importe/(SELECT mc.cotmoneda2 FROM monedacotizaciones mc where date(mc.`FechaHora`)<= date( p.Fechacreacion) and mc.tipo=0 order by mc.`FechaHora`desc limit 1))
FROM detalles_gastosvarios23 dgv
where dgv.idref = p.Recid) as GastosReales
FROM Pedidos p INNER JOIN `contactos` ON (p.`idref`=`contactos`.`idcontacto`)
INNER JOIN `empresas` e ON (contactos.`idempresa`= e.`idempresa`)
INNER JOIN `talonarios` ON (`talonarios`.`recid`= p.`idtalonario`)
WHERE (p.`fechacreacion` BETWEEN '<%fechainicio%>' AND '<%fechafin%>')
AND talonarios.NroSucursal =1
GROUP BY p.Numero
What I want to do is create a user variable containing the subquery's result, so that it is evaluated for each record, but only once per record. As it is now, it works, but it takes over 3 minutes! I have tried many different options many times, but it looks like I'm not getting the syntax right. The problem is that the subquery contains a reference to p.
Thanks, and sorry for my poor English.
One way to speed up the query is to create a temporary table. There are probably other ways, but this is one. Part of what makes it slow is that conditions such as f.estado = 0 OR f.estado = 1 OR f.estado = 3 are evaluated over and over. If your temporary table only contains the records that meet those conditions, the query will run faster.
CREATE TEMPORARY TABLE <TableName> AS
SELECT <Desired Columns>, f.idpedido
FROM facturasitems fi INNER JOIN facturas f ON (fi.idFactura = f.Recid)
WHERE f.estado IN (0, 1, 3)
Then one of your queries may look like
SELECT SUM(ImportePrecio2)
FROM <TableName>
WHERE idpedido = p.`recid`
Your query will no longer have to look at all of those conditions each time.
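Here is a sketch of the temporary-table idea for this query, extended to the asker's actual goal of looking up the exchange rate once per order. It uses Python's sqlite3 as a stand-in for MySQL; in MySQL you would use CREATE TEMPORARY TABLE ... AS SELECT .... Table and column names follow the question but are lower-cased, and only the NroMoneda = 1 branch of FacturadoUSA is shown. The temp table names pedido_rate and facturas_ok and the function name facturado_usd are hypothetical.

```python
import sqlite3


def facturado_usd(conn):
    """Per-order invoiced total, converted with a rate looked up only once per order."""
    cur = conn.cursor()
    # Step 1: resolve the exchange rate once per order (pedido), not once per invoice line.
    cur.execute("DROP TABLE IF EXISTS temp.pedido_rate")  # hypothetical temp table name
    cur.execute(
        "CREATE TEMP TABLE pedido_rate AS"
        " SELECT p.recid,"
        "  (SELECT mc.cotmoneda2 FROM monedacotizaciones mc"
        "    WHERE date(mc.fechahora) <= date(p.fechacreacion) AND mc.tipo = 0"
        "    ORDER BY mc.fechahora DESC LIMIT 1) AS rate"
        " FROM pedidos p"
    )
    # Step 2: materialize only the invoice lines whose status qualifies.
    cur.execute("DROP TABLE IF EXISTS temp.facturas_ok")  # hypothetical temp table name
    cur.execute(
        "CREATE TEMP TABLE facturas_ok AS"
        " SELECT f.idpedido, fi.importeprecio1"
        " FROM facturasitems fi JOIN facturas f ON fi.idfactura = f.recid"
        " WHERE f.estado IN (0, 1, 3)"
    )
    # Step 3: aggregate once, dividing each line by its order's precomputed rate.
    return cur.execute(
        "SELECT r.recid, SUM(fo.importeprecio1 / r.rate)"
        " FROM pedido_rate r JOIN facturas_ok fo ON fo.idpedido = r.recid"
        " GROUP BY r.recid ORDER BY r.recid"
    ).fetchall()
```

The same pedido_rate table could then be joined by the CostoRealUSA and GastosReales calculations, so the rate lookup runs once per order instead of once per subquery.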

MySQL: how can I count number of articles by a join table

I have a table with news items and another table with media types. I want one simple query that reads the media types table and counts, for each record, how many news items exist.
The result will be turned into a JSON response that I will use for a chart. This is my SQL statement:
SELECT
gc.country AS "country"
, COUNT(*) AS "online"
FROM default_news_items AS ni
JOIN default_news_item_country AS nic ON (nic.id = ni.country)
JOIN default_country AS c ON (nic.country = c.id)
JOIN default_geo_country AS gc ON (gc.id = c.geo_country)
LEFT JOIN default_medias AS m ON (m.id = ni.media)
WHERE TRUE
AND ni.deleted = 0
AND ni.date_item > '2013-10-23'
AND ni.date_item < '2013-10-29'
AND gc.country <> 'unknown'
AND m.media_type = '14'
GROUP BY gc.country
ORDER BY `online` desc LIMIT 10
This is the JSON response I create from the MySQL result:
[
{"country":"New Zealand","online":"7"},
{"country":"Switzerland","online":"1"}
]
How do I add print and social data to my output? I would like the JSON response to look like this:
[
{"country":"New Zealand","online":"7", "social":"17", "print":"2"},
{"country":"Switzerland","online":"1", "social":"7", "print":"1"}
]
Can I use COUNT(*) in the SELECT statement to do something like this?
COUNT( * ) as online, COUNT( * ) as social, COUNT( * ) as print
Is it possible or do I have to do several SQL statement to get the data I'm looking for?
This is the general structure:
SELECT default_geo_country.country as country,
SUM(default_medias.media_type = 14) as online,
SUM(default_medias.media_type = XX) as social,
SUM(default_medias.media_type = YY) as print
FROM ...
JOIN ...
WHERE ...
GROUP BY country
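The SUM(condition) trick works because MySQL evaluates a comparison to 1 or 0 (or NULL), so summing it counts the matching rows. Below is a minimal sketch with Python's sqlite3, which treats comparisons the same way. The table items and the media type ids 20 and 21 are hypothetical, since the question only gives 14 for online.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (country TEXT, media_type INTEGER);
INSERT INTO items VALUES
  ('New Zealand', 14), ('New Zealand', 14), ('New Zealand', 20),
  ('Switzerland', 14), ('Switzerland', 21);
""")
ONLINE, SOCIAL, PRINT = 14, 20, 21  # 20 and 21 are hypothetical stand-ins for XX and YY

# Each comparison yields 1 or 0 per row, so SUM counts rows of that media type.
rows = conn.execute(
    "SELECT country,"
    " SUM(media_type = ?) AS online,"
    " SUM(media_type = ?) AS social,"
    " SUM(media_type = ?) AS print"
    " FROM items GROUP BY country ORDER BY online DESC",
    (ONLINE, SOCIAL, PRINT),
).fetchall()
print(rows)
```

One pass over the joined rows produces all three counts, so there is no need for several SQL statements.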
I think you want conditional aggregation. Your question, however, only shows the online media type.
Your query would be more readable by using table aliases and removing the back quotes. Also, if media_type is an integer, then you should not enclose the constant for comparison in single quotes -- I, for one, find it misleading to compare a string constant to an integer column.
I suspect this is the way you want to go. Where the . . . is, you want to fill in with the counts for the other media types.
SELECT gc.country AS country,
SUM(dm.media_type = 14) AS online,
SUM(dm.media_type = XX) AS social,
SUM(dm.media_type = YY) AS print
. . .
FROM default_news_items ni JOIN
default_news_item_country nic
ON nic.id = ni.country JOIN
default_country dc
ON nic.country = dc.id JOIN
default_geo_country gc
ON gc.id = dc.geo_country LEFT JOIN
default_medias dm
ON dm.id = ni.media
WHERE ni.deleted = 0
AND ni.date_item > '2013-10-23'
AND ni.date_item < '2013-10-29'
AND gc.country <> 'unknown'
GROUP BY gc.country
ORDER BY online desc
LIMIT 10