MySQL search query with multiple joins and subqueries running slow - mysql

I have the following query which is actually within a stored procedure, but I removed it as there is too much going on inside the stored procedure. Basically this is the end result which takes ages (more than a minute) to run and I know the reason why - as you will also see from looking at the result of the explain - but I just cannot get it sorted.
Just to quickly explain what this query is doing. It is fetching all products from companies that are "connected" to the company where li.nToObjectID = 37. The result also returns some other information about the other companies like its name, company id, etc.
SELECT DISTINCT
SQL_CALC_FOUND_ROWS
p.id,
p.sTitle,
p.sTeaser,
p.TimeStamp,
p.ExpiryDate,
p.InStoreDate,
p.sCreator,
p.sProductCode,
p.nRetailPrice,
p.nCostPrice,
p.bPublic,
c.id as nCompanyID,
c.sName as sCompany,
m.id as nMID,
m.sFileName as sHighResFileName,
m.nSize,
(
Select sName
FROM tblBrand
WHERE id = p.nBrandID
) as sBrand,
(
Select t.sFileName
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as sFileName,
(
Select t.nWidth
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as nWidth,
(
Select t.nHeight
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as nHeight,
IF (
(
SELECT COUNT(id) FROM tblLink
WHERE
sType = "company"
AND sStatus = "active"
AND nToObjectID = 37
AND nFromObjectID = u.nCompanyID
),
1,
0
) AS bLinked
FROM tblProduct p
INNER JOIN tblMedia m
ON (
m.nTypeID = p.id AND
m.sType = "product"
)
INNER JOIN tblUser u
ON u.id = p.nUserID
INNER JOIN tblCompany c
ON u.nCompanyID = c.id
LEFT JOIN tblLink li
ON (
li.sType = "company"
AND li.sStatus = "active"
AND li.nToObjectID = 37
AND li.nFromObjectID = u.nCompanyID
)
WHERE c.bActive = 1
AND p.bArchive = 0
AND p.bActive = 1
AND NOW() <= p.ExpiryDate
AND (
li.id IS NOT NULL
OR (
li.id IS NULL
AND p.bPublic = 1
)
)
ORDER BY p.TimeStamp DESC
LIMIT 0, 52
Click here to see the output for EXPLAIN. Sorry, just couldn't get the formatting correct.
http://i60.tinypic.com/2hdqjgj.png
And lastly the number of rows for all the tables in this query:
tblProduct: 5392
tblBrand: 194
tblCompany: 368
tblUser: 416
tblMedia: 5724
tblLink: 24800
tblThumbnail: 22207
So I have 2 questions:
1. Is there another way of writing this query which might potentially speed it up?
2. What index combination do I need on tblProduct so that not all the rows are searched through?
UPDATE 1
This is the new query after removing the subqueries and making use of left joins instead:
SELECT DISTINCT
SQL_CALC_FOUND_ROWS
p.id,
p.sTitle,
p.sTeaser,
p.TimeStamp,
p.ExpiryDate,
p.InStoreDate,
p.sCreator,
p.sProductCode,
p.nRetailPrice,
p.nCostPrice,
p.bPublic,
c.id as nCompanyID,
c.sName as sCompany,
m.id as nMID,
m.sFileName as sHighResFileName,
m.nSize,
brand.sName as sBrand,
thumb.sFilename,
thumb.nWidth,
thumb.nHeight,
IF (
(
SELECT COUNT(id) FROM tblLink
WHERE
sType = "company"
AND sStatus = "active"
AND nToObjectID = 37
AND nFromObjectID = u.nCompanyID
),
1,
0
) AS bLinked
FROM tblProduct p
INNER JOIN tblMedia m
ON (
m.nTypeID = p.id AND
m.sType = "product"
)
INNER JOIN tblUser u
ON u.id = p.nUserID
INNER JOIN tblCompany c
ON u.nCompanyID = c.id
LEFT JOIN tblLink li
ON (
li.sType = "company"
AND li.sStatus = "active"
AND li.nToObjectID = 37
AND li.nFromObjectID = u.nCompanyID
)
LEFT JOIN tblBrand AS brand
ON brand.id = p.nBrandID
LEFT JOIN tblThumbnail AS thumb
ON (
thumb.nMediaID = m.id
AND thumb.sType = 'thumbnail'
)
WHERE c.bActive = 1
AND p.bArchive = 0
AND p.bActive = 1
AND NOW() <= p.ExpiryDate
AND (
li.id IS NOT NULL
OR (
li.id IS NULL
AND p.bPublic = 1
)
)
ORDER BY p.TimeStamp DESC
LIMIT 0, 52;
UPDATE 2
ALTER TABLE tblThumbnail ADD INDEX (nMediaID,sType) USING BTREE;
ALTER TABLE tblMedia ADD INDEX (nTypeID,sType) USING BTREE;
ALTER TABLE tblProduct ADD INDEX (bArchive,bActive,ExpiryDate,bPublic,TimeStamp) USING BTREE;
After doing the above changes the explain showed that it is now only searching through 1464 rows on tblProduct instead of 5392.

That's a big query with a lot going on. It's going to take a few steps of work to optimize it. I will take the liberty of just presenting a couple of steps.
First step. Can you get rid of SQL_CALC_FOUND_ROWS and still have your program work correctly? If so, do that. When you specify SQL_CALC_FOUND_ROWS it sometimes means the server has to delay sending you the first row of your resultset until the last row is available.
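One way to drop SQL_CALC_FOUND_ROWS (which MySQL 8.0.17 and later also deprecate) is to run the LIMITed query without it and fetch the total with a separate, narrower query. The following is only a sketch under stated assumptions: the alias nTotal is made up for illustration, and it assumes every (product, media) pair yields exactly one output row. That holds for the original query, but not necessarily for a variant where a joined table can match several rows per media item.

```sql
-- Total number of rows the paginated query would return, without SQL_CALC_FOUND_ROWS.
-- Only the joins that decide whether a row qualifies are kept.
SELECT COUNT(*) AS nTotal
FROM (
    SELECT DISTINCT p.id, m.id AS nMID
    FROM tblProduct p
    INNER JOIN tblMedia m
        ON m.nTypeID = p.id
       AND m.sType = 'product'
    INNER JOIN tblUser u
        ON u.id = p.nUserID
    INNER JOIN tblCompany c
        ON c.id = u.nCompanyID
    LEFT JOIN tblLink li
        ON li.sType = 'company'
       AND li.sStatus = 'active'
       AND li.nToObjectID = 37
       AND li.nFromObjectID = u.nCompanyID
    WHERE c.bActive = 1
      AND p.bArchive = 0
      AND p.bActive = 1
      AND NOW() <= p.ExpiryDate
      -- same meaning as: li.id IS NOT NULL OR (li.id IS NULL AND p.bPublic = 1)
      AND (li.id IS NOT NULL OR p.bPublic = 1)
) AS counted;
```

Whether two cheaper queries beat one SQL_CALC_FOUND_ROWS query depends on the data, so it is worth timing both.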
Second step. Refactor the dependent subqueries to be JOINs instead.
Here's how you might approach that. Part of your query looks like this...
SELECT DISTINCT SQL_CALC_FOUND_ROWS
p.id,
...
c.id as nCompanyID,
...
m.id as nMID,
...
( /* dependent subquery to be removed */
Select sName
FROM tblBrand
WHERE id = p.nBrandID
) as sBrand,
( /* dependent subquery to be removed */
Select t.sFileName
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as sFileName,
( /* dependent subquery to be removed */
Select t.nWidth
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as nWidth,
( /* dependent subquery to be removed */
Select t.nHeight
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as nHeight,
...
Try this instead. Notice how the brand and thumbnail dependent subqueries disappear. You had three dependent subqueries for the thumbnail; they can disappear into a single JOIN.
SELECT DISTINCT SQL_CALC_FOUND_ROWS
p.id,
...
brand.sName,
thumb.sFilename,
thumb.nWidth,
thumb.nHeight,
...
FROM tblProduct p
INNER JOIN tblMedia AS m ON (m.nTypeID = p.id AND m.sType = 'product')
... (other table joins) ...
LEFT JOIN tblBrand AS brand ON brand.id = p.nBrandID
LEFT JOIN tblThumbnail AS thumb ON (thumb.nMediaID = m.id AND thumb.sType = 'thumbnail')
I used LEFT JOIN rather than INNER JOIN so MySQL will present NULL values if the joined rows are missing.
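The remaining dependent subquery, the one behind bLinked, repeats the conditions of the tblLink join that is already in the FROM clause. A possible simplification, assuming those conditions stay identical, is to test the joined row directly. In MySQL a comparison evaluates to 1 or 0, so no IF() is needed. The column list is trimmed here for brevity:

```sql
SELECT DISTINCT
    p.id,
    p.sTitle,
    -- 1 when an active company link to company 37 exists, otherwise 0
    (li.id IS NOT NULL) AS bLinked
FROM tblProduct p
INNER JOIN tblUser u
    ON u.id = p.nUserID
LEFT JOIN tblLink li
    ON li.sType = 'company'
   AND li.sStatus = 'active'
   AND li.nToObjectID = 37
   AND li.nFromObjectID = u.nCompanyID
WHERE p.bArchive = 0
  AND p.bActive = 1;
```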
Edit
You're using a join pattern that looks like this:
JOIN sometable AS s ON (s.someID = m.id AND s.sType = 'string')
You seem to do this for a few tables. You probably can speed up the JOIN operations by creating compound indexes in those tables. For example, try adding the following index to tblThumbnail: (sType, nMediaID). You can do that with this DDL statement.
ALTER TABLE tblThumbnail ADD INDEX (sType, nMediaID) USING BTREE
You can do similar things to other tables with the same join pattern.
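For instance, the same idea applied to the other tables in this query might look like the following. These are suggestions to try and confirm with EXPLAIN, not measured results. The index names are invented for illustration, and the column order for tblLink is an assumption based on the equality conditions in the join.

```sql
-- Compound indexes matching the "sType = constant AND id = other_table.id" join pattern.
ALTER TABLE tblMedia
    ADD INDEX idx_media_type_typeid (sType, nTypeID) USING BTREE;

ALTER TABLE tblLink
    ADD INDEX idx_link_lookup (sType, sStatus, nToObjectID, nFromObjectID) USING BTREE;

-- Check that the new tblLink index is actually chosen (see the "key" column).
EXPLAIN
SELECT li.id
FROM tblLink li
WHERE li.sType = 'company'
  AND li.sStatus = 'active'
  AND li.nToObjectID = 37;
```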

Related

MySQL select with group and one to many relations condition

For example, I have this structure:
CREATE TABLE clicks
(`date` varchar(50), `sum` int, `id` int)
;
CREATE TABLE marks
(`click_id` int, `name` varchar(50), `value` varchar(50))
;
where a click can have many marks.
Example data:
INSERT INTO clicks
(`sum`, `id`, `date`)
VALUES
(100, 1, '2017-01-01'),
(200, 2, '2017-01-01')
;
INSERT INTO marks
(`click_id`, `name`, `value`)
VALUES
(1, 'utm_source', 'test_source1'),
(1, 'utm_medium', 'test_medium1'),
(1, 'utm_term', 'test_term1'),
(2, 'utm_source', 'test_source1'),
(2, 'utm_medium', 'test_medium1')
;
I need to get aggregated click values, grouped by date, counting only the clicks that have all of the selected marks.
I run this query:
select
c.date,
sum(c.sum)
from clicks as c
left join marks as m ON m.click_id = c.id
where
(m.name = 'utm_source' AND m.value='test_source1') OR
(m.name = 'utm_medium' AND m.value='test_medium1') OR
(m.name = 'utm_term' AND m.value='test_term1')
group by date
and get 2017-01-01 = 700, but I want to get 100, because only click 1 has all of the marks.
Or, if the condition is
(m.name = 'utm_source' AND m.value='test_source1') OR
(m.name = 'utm_medium' AND m.value='test_medium1')
I need to get 300 instead of 600.
I found a solution that first fetches the distinct click_id values with one query and then sums and groups by date using a WHERE IN condition, but on the real database, which is very large and uses UUIDs as ids, this runs extremely slowly. Any advice on how to get it to work properly?
You can achieve it using the queries below.
When there are three conditions, use HAVING count(*) >= 3:
SELECT cc.DATE
,sum(cc.sum)
FROM clicks AS cc
INNER JOIN (
SELECT id
FROM clicks AS c
LEFT JOIN marks AS m ON m.click_id = c.id
WHERE (
m.NAME = 'utm_source'
AND m.value = 'test_source1'
)
OR (
m.NAME = 'utm_medium'
AND m.value = 'test_medium1'
)
OR (
m.NAME = 'utm_term'
AND m.value = 'test_term1'
)
GROUP BY id
HAVING count(*) >= 3
) AS t ON cc.id = t.id
GROUP BY cc.DATE
When there are two conditions, use HAVING count(*) >= 2:
SELECT cc.DATE
,sum(cc.sum)
FROM clicks AS cc
INNER JOIN (
SELECT id
FROM clicks AS c
LEFT JOIN marks AS m ON m.click_id = c.id
WHERE (
m.NAME = 'utm_source'
AND m.value = 'test_source1'
)
OR (
m.NAME = 'utm_medium'
AND m.value = 'test_medium1'
)
GROUP BY id
HAVING count(*) >= 2
) AS t ON cc.id = t.id
GROUP BY cc.DATE
Demo: http://sqlfiddle.com/#!9/fe571a/35
Hope this works for you...
You're getting 700 because the join generates multiple rows for each click. There are 3 rows in the marks table with click_id = 1 (sum = 100) and 2 rows with click_id = 2 (sum = 200). After the join we have 3 rows with sum = 100 and 2 rows with sum = 200, and adding these sums gives 700. To fix this you have to aggregate on click_id too, as illustrated below:
select
c.date,
sum(c.sum)
from clicks as c
inner join (select click_id from marks where (name = 'utm_source' AND
value='test_source1') OR (name = 'utm_medium' AND value='test_medium1')
OR (name = 'utm_term' AND value='test_term1')
group by click_id
having count(*) = 3 /* keep only clicks that matched all three conditions */) as m
ON m.click_id = c.id
group by c.date;
I found the right way myself, and it works on large amounts of data.
The main goal is to have the query build a single joined table, with one join per condition, so that the conditions do not depend on the amount of data in the result. The best way is:
select
c.date,
sum(c.sum)
from clicks as c
join marks as m1 ON m1.click_id = c.id
join marks as m2 ON m2.click_id = c.id
join marks as m3 ON m3.click_id = c.id
where
(m1.name = 'utm_source' AND m1.value='test_source1') AND
(m2.name = 'utm_medium' AND m2.value='test_medium1') AND
(m3.name = 'utm_term' AND m3.value='test_term1')
group by date
So we need as many joins as we have conditions.
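Each of those joins looks up marks by name and value and then needs click_id to match the click. A composite index covering all three columns may let MySQL resolve every join from the index alone. This is a sketch only: the index names are hypothetical, and whether the optimizer prefers the index led by name or the one led by click_id depends on the real data, so compare both with EXPLAIN.

```sql
-- Candidate 1: find matching marks first, then read click_id from the index.
ALTER TABLE marks
    ADD INDEX idx_marks_name_value_click (`name`, `value`, `click_id`);

-- Candidate 2: start from a click and probe its marks.
ALTER TABLE marks
    ADD INDEX idx_marks_click_name_value (`click_id`, `name`, `value`);

-- See which index each of the three joins picks.
EXPLAIN
SELECT c.`date`, SUM(c.`sum`)
FROM clicks AS c
JOIN marks AS m1 ON m1.click_id = c.id
JOIN marks AS m2 ON m2.click_id = c.id
JOIN marks AS m3 ON m3.click_id = c.id
WHERE m1.`name` = 'utm_source' AND m1.`value` = 'test_source1'
  AND m2.`name` = 'utm_medium' AND m2.`value` = 'test_medium1'
  AND m3.`name` = 'utm_term'   AND m3.`value` = 'test_term1'
GROUP BY c.`date`;
```

Note that the multi-join form counts a click more than once if the same name/value mark is stored twice for it, so it relies on marks being unique per click.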

How to Make This SQL Query More Efficient?

I'm not sure how to make the following SQL query more efficient. Right now, the query is taking 8 - 12 seconds on a pretty fast server, but that's not close to fast enough for a Website when users are trying to load a page with this code on it. It's looking through tables with many rows, for instance the "Post" table has 717,873 rows. Basically, the query lists all Posts related to what the user is following (newest to oldest).
Is there a way to make it faster by only getting the last 20 results total based on PostTimeOrder?
Any help would be much appreciated or insight on anything that can be done to improve this situation. Thank you.
Here's the full SQL query (lots of nesting):
SELECT DISTINCT p.Id, UNIX_TIMESTAMP(p.PostCreationTime) AS PostCreationTime, p.Content AS Content, p.Bu AS Bu, p.Se AS Se, UNIX_TIMESTAMP(p.PostCreationTime) AS PostTimeOrder
FROM Post p
WHERE (p.Id IN (SELECT pc.PostId
FROM PostCreator pc
WHERE (pc.UserId IN (SELECT uf.FollowedId
FROM UserFollowing uf
WHERE uf.FollowingId = '100')
OR pc.UserId = '100')
))
OR (p.Id IN (SELECT pum.PostId
FROM PostUserMentions pum
WHERE (pum.UserId IN (SELECT uf.FollowedId
FROM UserFollowing uf
WHERE uf.FollowingId = '100')
OR pum.UserId = '100')
))
OR (p.Id IN (SELECT ssp.PostId
FROM SStreamPost ssp
WHERE (ssp.SStreamId IN (SELECT ssf.SStreamId
FROM SStreamFollowing ssf
WHERE ssf.UserId = '100'))
))
OR (p.Id IN (SELECT psm.PostId
FROM PostSMentions psm
WHERE (psm.StockId IN (SELECT sf.StockId
FROM StockFollowing sf
WHERE sf.UserId = '100' ))
))
UNION ALL
SELECT DISTINCT p.Id AS Id, UNIX_TIMESTAMP(p.PostCreationTime) AS PostCreationTime, p.Content AS Content, p.Bu AS Bu, p.Se AS Se, UNIX_TIMESTAMP(upe.PostEchoTime) AS PostTimeOrder
FROM Post p
INNER JOIN UserPostE upe
on p.Id = upe.PostId
INNER JOIN UserFollowing uf
on (upe.UserId = uf.FollowedId AND (uf.FollowingId = '100' OR upe.UserId = '100'))
ORDER BY PostTimeOrder DESC;
Changing your p.Id IN (...) predicates to EXISTS predicates with correlated subqueries may help. Also, since both halves of your UNION ALL query pull from the Post table and possibly return nearly identical records, you might be able to combine the two into one query by left outer joining to UserPostE and adding upe.PostID is not null as an OR condition in the WHERE clause. UserFollowing will still inner join to UserPostE. If you want the same Post record twice, once with upe.PostEchoTime and once with p.PostCreationTime as the PostTimeOrder, you'll need to keep the UNION ALL.
SELECT
DISTINCT -- <<=- May not be needed
p.Id
, UNIX_TIMESTAMP(p.PostCreationTime) AS PostCreationTime
, p.Content AS Content
, p.Bu AS Bu
, p.Se AS Se
, UNIX_TIMESTAMP(coalesce( upe.PostEchoTime
, p.PostCreationTime)) AS PostTimeOrder
FROM Post p
LEFT JOIN UserPostE upe
INNER JOIN UserFollowing uf
on (upe.UserId = uf.FollowedId AND
(uf.FollowingId = '100' OR
upe.UserId = '100'))
on p.Id = upe.PostId
WHERE upe.PostID is not null
or exists (SELECT 1
FROM PostCreator pc
WHERE pc.PostId = p.ID
and (pc.UserId = '100'
or exists (SELECT 1
FROM UserFollowing uf
WHERE uf.FollowedId = pc.UserID
and uf.FollowingId = '100'))
)
OR exists (SELECT 1
FROM PostUserMentions pum
WHERE pum.PostId = p.ID
and (pum.UserId = '100'
or exists (SELECT 1
FROM UserFollowing uf
WHERE uf.FollowedId = pum.UserId
and uf.FollowingId = '100'))
)
OR exists (SELECT 1
FROM SStreamPost ssp
WHERE ssp.PostId = p.ID
and exists (SELECT 1
FROM SStreamFollowing ssf
WHERE ssf.SStreamId = ssp.SStreamId
and ssf.UserId = '100')
)
OR exists (SELECT 1
FROM PostSMentions psm
WHERE psm.PostId = p.ID
and exists (SELECT 1
FROM StockFollowing sf
WHERE sf.StockId = psm.StockId
and sf.UserId = '100' )
)
ORDER BY PostTimeOrder DESC
The from section could alternatively be rewritten to also use an existence clause with a correlated sub query:
FROM Post p
LEFT JOIN UserPostE upe
on p.Id = upe.PostId
and ( upe.UserId = '100'
or exists (select 1
from UserFollowing uf
where uf.FollowedId = upe.UserID
and uf.FollowingId = '100'))
Turn IN ( SELECT ... ) into a JOIN .. ON ... (see below)
Turn OR into UNION (see below)
Are some of the tables many:many mappings, such as SStreamFollowing? If so, follow the tips in http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
Example of IN:
SELECT ssp.PostId
FROM SStreamPost ssp
WHERE (ssp.SStreamId IN (
SELECT ssf.SStreamId
FROM SStreamFollowing ssf
WHERE ssf.UserId = '100' ))
-->
SELECT ssp.PostId
FROM SStreamPost ssp
JOIN SStreamFollowing ssf ON ssp.SStreamId = ssf.SStreamId
WHERE ssf.UserId = '100'
The big WHERE with all the INs becomes something like
JOIN ( ( SELECT pc.PostId AS id ... )
UNION ( SELECT pum.PostId ... )
UNION ( SELECT ssp.PostId ... )
UNION ( SELECT psm.PostId ... ) )
Get what you can out of those suggestions, then come back for more advice if you still need it. And bring SHOW CREATE TABLE with you.
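Put together, the JOIN and UNION suggestions might look roughly like this for the first half of the original query. It is a sketch, not a tested rewrite: it leaves out the UserPostE "echo" half of the UNION ALL, assumes Post.Id is unique, and uses LIMIT 20 only because the question asks for the last 20 results. The alias ids is made up.

```sql
SELECT p.Id,
       UNIX_TIMESTAMP(p.PostCreationTime) AS PostCreationTime,
       p.Content,
       p.Bu,
       p.Se,
       UNIX_TIMESTAMP(p.PostCreationTime) AS PostTimeOrder
FROM (
    -- Posts created by user 100 or by someone user 100 follows
    SELECT pc.PostId AS id
    FROM PostCreator pc
    WHERE pc.UserId = '100'
    UNION
    SELECT pc.PostId
    FROM PostCreator pc
    JOIN UserFollowing uf ON uf.FollowedId = pc.UserId
    WHERE uf.FollowingId = '100'
    UNION
    -- Posts mentioning user 100 or someone user 100 follows
    SELECT pum.PostId
    FROM PostUserMentions pum
    WHERE pum.UserId = '100'
    UNION
    SELECT pum.PostId
    FROM PostUserMentions pum
    JOIN UserFollowing uf ON uf.FollowedId = pum.UserId
    WHERE uf.FollowingId = '100'
    UNION
    -- Posts in streams user 100 follows
    SELECT ssp.PostId
    FROM SStreamPost ssp
    JOIN SStreamFollowing ssf ON ssf.SStreamId = ssp.SStreamId
    WHERE ssf.UserId = '100'
    UNION
    -- Posts mentioning stocks user 100 follows
    SELECT psm.PostId
    FROM PostSMentions psm
    JOIN StockFollowing sf ON sf.StockId = psm.StockId
    WHERE sf.UserId = '100'
) AS ids
JOIN Post p ON p.Id = ids.id
ORDER BY p.PostCreationTime DESC
LIMIT 20;
```

UNION (without ALL) removes duplicate post ids, so the outer DISTINCT is no longer needed. Each branch is a plain indexed lookup, which is the point of turning OR into UNION.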

Add complex MySQL query to search model in Yii2

I want to use the default search and pagination in Yii2, but the query is complex and I don't know how to add it to the search model. This is the query:
SELECT p.*,po_sum,rpo_sum,so_sum
FROM product p
LEFT JOIN (
SELECT id,product_id , IF(sum(quantity) IS NULL, 0, sum(quantity)) AS po_sum
FROM purchase_order_products inner join purchase_order on purchase_order.id = purchase_order_products.purchase_order_id
Where purchase_order.status = 'Approved'
GROUP BY product_id )
subcount ON p.id = subcount.product_id
LEFT JOIN (
SELECT id,product_id , sum(quantity) AS rpo_sum
FROM return_purchase_order_products inner join return_purchase_order on return_purchase_order.id = return_purchase_order_products.purchase_order_id Where return_purchase_order.status = 'Approved'
GROUP BY product_id )
subcount2 ON p.id = subcount2.product_id
LEFT JOIN (
SELECT product_id , sum(quantity_ordered) AS so_sum
FROM sales_order_item inner join sales_order on sales_order.id = sales_order_item.sales_order_id Where sales_order.order_status = 'complete'
GROUP BY product_id )
subcount3 ON p.id = subcount3.product_id
order by po_sum DESC,rpo_sum DESC
Any help?
If you use MySQL >= 5.7.7, the easiest way is to create a view with that query and use the view's name in the model's tableName method.
You need that version of MySQL because earlier versions do not allow a subquery in the FROM clause when creating a view.
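A sketch of such a view, assuming MySQL 5.7.7 or later. The view name product_stock_summary and the table aliases are invented for illustration, and the sketch assumes the quantity columns live in the *_products / *_item tables. Unlike the original query, it drops the non-aggregated id columns from the subqueries and wraps all three sums in COALESCE so that products without any matching rows show 0 instead of NULL.

```sql
CREATE VIEW product_stock_summary AS
SELECT p.*,
       COALESCE(po.po_sum, 0)   AS po_sum,
       COALESCE(rpo.rpo_sum, 0) AS rpo_sum,
       COALESCE(so.so_sum, 0)   AS so_sum
FROM product p
LEFT JOIN (
    -- Approved purchase orders per product
    SELECT pop.product_id, SUM(pop.quantity) AS po_sum
    FROM purchase_order_products pop
    INNER JOIN purchase_order poh ON poh.id = pop.purchase_order_id
    WHERE poh.status = 'Approved'
    GROUP BY pop.product_id
) AS po ON po.product_id = p.id
LEFT JOIN (
    -- Approved purchase returns per product
    SELECT rpop.product_id, SUM(rpop.quantity) AS rpo_sum
    FROM return_purchase_order_products rpop
    INNER JOIN return_purchase_order rpoh ON rpoh.id = rpop.purchase_order_id
    WHERE rpoh.status = 'Approved'
    GROUP BY rpop.product_id
) AS rpo ON rpo.product_id = p.id
LEFT JOIN (
    -- Completed sales per product
    SELECT soi.product_id, SUM(soi.quantity_ordered) AS so_sum
    FROM sales_order_item soi
    INNER JOIN sales_order soh ON soh.id = soi.sales_order_id
    WHERE soh.order_status = 'complete'
    GROUP BY soi.product_id
) AS so ON so.product_id = p.id;
```

The ORDER BY is left out of the view on purpose, so that the search model can apply its own sorting and pagination on po_sum and rpo_sum.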

Slow MySQL query with subquery from table

I am trying to bring back a string based on an IF statement but it is extremely slow.
It has something to do with the first subquery but I am unsure of how to rearrange this as to bring back the same results but faster.
Here is my SQL:
SELECT IF
(
(
SELECT COUNT(*)
FROM
(
SELECT DISTINCT enquiryId, type
FROM parts_enquiries, parts_service_types AS pst
WHERE parts_enquiries.serviceTypeId = pst.id
) AS parts
WHERE parts.enquiryId = enquiries.id
) > 1, 'Mixed',
(
SELECT DISTINCT type
FROM parts_enquiries, parts_service_types AS pst
WHERE parts_enquiries.serviceTypeId = pst.id AND enquiryId = enquiries.id
)
) AS partTypes
FROM enquiries,
entities
WHERE enquiries.entityId = entities.id
How can I make it faster?
I have modified my original query below, but I am getting the error that subquery returns more than one row:
SELECT
(SELECT
CASE WHEN COUNT(DISTINCT type) > 1 THEN 'Mixed' ELSE `type` END AS type
FROM parts_enquiries
INNER JOIN parts_service_types AS pst ON parts_enquiries.serviceTypeId = pst.id
INNER JOIN enquiries ON parts_enquiries.enquiryId = enquiries.id
INNER JOIN entities ON enquiries.entityId = entities.id
GROUP BY enquiryId) AS partTypes
FROM enquiries,
entities
WHERE enquiries.entityId = entities.id
Please have a look if this query yields the same results:
SELECT
enquiryId,
CASE WHEN COUNT(DISTINCT type) > 1 THEN 'Mixed' ELSE `type` END AS type
FROM parts_enquiries
INNER JOIN parts_service_types AS pst ON parts_enquiries.serviceTypeId = pst.id
INNER JOIN enquiries ON parts_enquiries.enquiryId = enquiries.id
INNER JOIN entities ON enquiries.entityId = entities.id
GROUP BY enquiryId
But N.B.'s comment is still valid. To see whether an index is used, and for other information, we need to see the EXPLAIN output and the table definitions.
This should get you what you want.
I would first pre-query your parts enquiries and parts service types, looking for both the distinct count and the MINIMUM of the part 'type', grouped by the enquiry ID.
Then, run your IF() against that result. If the distinct count is > 1, then 'Mixed'. If there is only one, since I did the MIN(), it would only have the description of that one value that you desire anyhow.
SELECT
E.ID,
IF ( PreQuery.DistTypes > 1, 'Mixed', PreQuery.FirstType ) as PartType
from
Enquiries E
JOIN ( SELECT
PE.EnquiryID,
COUNT( DISTINCT PE.ServiceTypeID ) as DistTypes,
MIN( PST.Type ) as FirstType
from
Parts_Enquiries PE
JOIN Parts_Service_Types PST
ON PE.ServiceTypeID = PST.ID
group by
PE.EnquiryID ) as PreQuery
ON E.ID = PreQuery.EnquiryID

MySQL Query Optimisation

Looking for some help with optimising the query below. There seem to be two bottlenecks at the moment, which cause the query to take around 90s to complete. There are only 5000 products, so it's not exactly a massive database/table. The bottlenecks are SQL_CALC_FOUND_ROWS and the ORDER BY statement. If I remove both of these, it takes around a second to run the query.
I've tried removing SQL_CALC_FOUND_ROWS and running a count() statement, but that takes a long time as well.
Is the best thing going to be to use INNER JOINs (which I'm not too familiar with) as per the following Stack Overflow post? Slow query when using ORDER BY
SELECT SQL_CALC_FOUND_ROWS *
FROM tbl_products
LEFT JOIN tbl_link_products_categories ON lpc_p_id = p_id
LEFT JOIN tbl_link_products_brands ON lpb_p_id = p_id
LEFT JOIN tbl_link_products_authors ON lpa_p_id = p_id
LEFT JOIN tbl_link_products_narrators ON lpn_p_id = p_id
LEFT JOIN tbl_linkfiles ON lf_id = p_id
AND (
lf_table = 'tbl_products'
OR lf_table IS NULL
)
LEFT JOIN tbl_files ON lf_file_id = file_id
AND (
file_nameid = 'p_main_image_'
OR file_nameid IS NULL
)
WHERE p_live = 'y'
ORDER BY p_title_clean ASC, p_title ASC
LIMIT 0 , 10
You could try reducing the size of the joins by using a derived table to retrieve the filtered and ordered products before joining. This assumes that p_live, p_title_clean and p_title are fields in your tbl_products table -
SELECT *
FROM (SELECT *
FROM tbl_products
WHERE p_live = 'y'
ORDER BY p_title_clean ASC, p_title ASC
LIMIT 0 , 10
) AS tbl_products
LEFT JOIN tbl_link_products_categories
ON lpc_p_id = p_id
LEFT JOIN tbl_link_products_brands
ON lpb_p_id = p_id
LEFT JOIN tbl_link_products_authors
ON lpa_p_id = p_id
LEFT JOIN tbl_link_products_narrators
ON lpn_p_id = p_id
LEFT JOIN tbl_linkfiles
ON lf_id = p_id
AND (
lf_table = 'tbl_products'
OR lf_table IS NULL
)
LEFT JOIN tbl_files
ON lf_file_id = file_id
AND (
file_nameid = 'p_main_image_'
OR file_nameid IS NULL
)
This is a "stab in the dark" as there is not enough detail in your question.
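If p_live, p_title_clean and p_title do live in tbl_products, two further things may be worth trying alongside the derived table. Both are sketches: the index name and the total_live_products alias are hypothetical, and the index assumes the title columns are reasonably short VARCHARs (long or TEXT columns would need prefix lengths, and a prefix index cannot serve the ORDER BY).

```sql
-- Lets MySQL read live products already sorted by title,
-- so the LIMIT 10 can stop early instead of sorting every row.
ALTER TABLE tbl_products
    ADD INDEX idx_products_live_title (p_live, p_title_clean, p_title);

-- Pagination total without SQL_CALC_FOUND_ROWS.
-- This counts products, not joined rows.
SELECT COUNT(*) AS total_live_products
FROM tbl_products
WHERE p_live = 'y';
```

Because the derived table limits products before joining, a product with several categories or authors still produces several rows, so counting products matches what the page lists.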