How to Make This SQL Query More Efficient? - mysql

I'm not sure how to make the following SQL query more efficient. Right now, the query is taking 8 - 12 seconds on a pretty fast server, but that's not close to fast enough for a Website when users are trying to load a page with this code on it. It's looking through tables with many rows, for instance the "Post" table has 717,873 rows. Basically, the query lists all Posts related to what the user is following (newest to oldest).
Is there a way to make it faster by only getting the last 20 results total based on PostTimeOrder?
Any help would be much appreciated or insight on anything that can be done to improve this situation. Thank you.
Here's the full SQL query (lots of nesting):
SELECT DISTINCT p.Id, UNIX_TIMESTAMP(p.PostCreationTime) AS PostCreationTime, p.Content AS Content, p.Bu AS Bu, p.Se AS Se, UNIX_TIMESTAMP(p.PostCreationTime) AS PostTimeOrder
FROM Post p
WHERE (p.Id IN (SELECT pc.PostId
FROM PostCreator pc
WHERE (pc.UserId IN (SELECT uf.FollowedId
FROM UserFollowing uf
WHERE uf.FollowingId = '100')
OR pc.UserId = '100')
))
OR (p.Id IN (SELECT pum.PostId
FROM PostUserMentions pum
WHERE (pum.UserId IN (SELECT uf.FollowedId
FROM UserFollowing uf
WHERE uf.FollowingId = '100')
OR pum.UserId = '100')
))
OR (p.Id IN (SELECT ssp.PostId
FROM SStreamPost ssp
WHERE (ssp.SStreamId IN (SELECT ssf.SStreamId
FROM SStreamFollowing ssf
WHERE ssf.UserId = '100'))
))
OR (p.Id IN (SELECT psm.PostId
FROM PostSMentions psm
WHERE (psm.StockId IN (SELECT sf.StockId
FROM StockFollowing sf
WHERE sf.UserId = '100' ))
))
UNION ALL
SELECT DISTINCT p.Id AS Id, UNIX_TIMESTAMP(p.PostCreationTime) AS PostCreationTime, p.Content AS Content, p.Bu AS Bu, p.Se AS Se, UNIX_TIMESTAMP(upe.PostEchoTime) AS PostTimeOrder
FROM Post p
INNER JOIN UserPostE upe
on p.Id = upe.PostId
INNER JOIN UserFollowing uf
on (upe.UserId = uf.FollowedId AND (uf.FollowingId = '100' OR upe.UserId = '100'))
ORDER BY PostTimeOrder DESC;

Changing your p.ID in (...) predicates to existence predicates with correlated subqueries may help. Also since both halves of your union all query are pulling from the Post table and possibly returning nearly identical records you might be able to combine the two into one query by left outer joining to UserPostE and adding upe.PostID is not null as an OR condition in the WHERE clause. UserFollowing will still inner join to UPE. If you want the same Post record twice once with upe.PostEchoTime and once with p.PostCreationTime as the PostTimeOrder you'll need keep the UNION ALL
SELECT
DISTINCT -- <<=- May not be needed
p.Id
, UNIX_TIMESTAMP(p.PostCreationTime) AS PostCreationTime
, p.Content AS Content
, p.Bu AS Bu
, p.Se AS Se
, UNIX_TIMESTAMP(coalesce( upe.PostEchoTime
, p.PostCreationTime)) AS PostTimeOrder
FROM Post p
LEFT JOIN UserPostE upe
INNER JOIN UserFollowing uf
on (upe.UserId = uf.FollowedId AND
(uf.FollowingId = '100' OR
upe.UserId = '100'))
on p.Id = upe.PostId
WHERE upe.PostID is not null
or exists (SELECT 1
FROM PostCreator pc
WHERE pc.PostId = p.ID
and pc.UserId = '100'
or exists (SELECT 1
FROM UserFollowing uf
WHERE uf.FollowedId = pc.UserID
and uf.FollowingId = '100')
)
OR exists (SELECT 1
FROM PostUserMentions pum
WHERE pum.PostId = p.ID
and pum.UserId = '100'
or exists (SELECT 1
FROM UserFollowing uf
WHERE uf.FollowedId = pum.UserId
and uf.FollowingId = '100')
)
OR exists (SELECT 1
FROM SStreamPost ssp
WHERE ssp.PostId = p.ID
and exists (SELECT 1
FROM SStreamFollowing ssf
WHERE ssf.SStreamId = ssp.SStreamId
and ssf.UserId = '100')
)
OR exists (SELECT 1
FROM PostSMentions psm
WHERE psm.PostId = p.ID
and exists (SELECT
FROM StockFollowing sf
WHERE sf.StockId = psm.StockId
and sf.UserId = '100' )
)
ORDER BY PostTimeOrder DESC
The from section could alternatively be rewritten to also use an existence clause with a correlated sub query:
FROM Post p
LEFT JOIN UserPostE upe
on p.Id = upe.PostId
and ( upe.UserId = '100'
or exists (select 1
from UserFollowing uf
where uf.FollwedID = upe.UserID
and uf.FollowingId = '100'))

Turn IN ( SELECT ... ) into a JOIN .. ON ... (see below)
Turn OR into UNION (see below)
Some the tables are many:many mappings? Such as SStreamFollowing? Follow the tips in http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
Example of IN:
SELECT ssp.PostId
FROM SStreamPost ssp
WHERE (ssp.SStreamId IN (
SELECT ssf.SStreamId
FROM SStreamFollowing ssf
WHERE ssf.UserId = '100' ))
-->
SELECT ssp.PostId
FROM SStreamPost ssp
JOIN SStreamFollowing ssf ON ssp.SStreamId = ssf.SStreamId
WHERE ssf.UserId = '100'
The big WHERE with all the INs becomes something like
JOIN ( ( SELECT pc.PostId AS id ... )
UNION ( SELECT pum.PostId ... )
UNION ( SELECT ssp.PostId ... )
UNION ( SELECT psm.PostId ... ) )
Get what you can done of that those suggestions, then come back for more advice if you still need it. And bring SHOW CREATE TABLE with you.

Related

how optimize prestashop category get product for random

this is prestashop 1.7 version category get product query. if use random, it is very slow, how optimize it?
SELECT
cp.id_category,
p.*,
product_shop.*,
stock.out_of_stock,
IFNULL( stock.quantity, 0 ) AS quantity,
IFNULL( product_attribute_shop.id_product_attribute, 0 ) AS id_product_attribute,
product_attribute_shop.minimal_quantity AS product_attribute_minimal_quantity,
pl.`description`,
pl.`description_short`,
pl.`available_now`,
pl.`available_later`,
pl.`link_rewrite`,
pl.`meta_description`,
pl.`meta_keywords`,
pl.`meta_title`,
pl.`name`,
image_shop.`id_image` id_image,
il.`legend` AS legend,
m.`name` AS manufacturer_name,
cl.`name` AS category_default,
DATEDIFF(
product_shop.`date_add`,
DATE_SUB( "2019-11-30 00:00:00", INTERVAL 7 DAY )) > 0 AS new,
product_shop.price AS orderprice
FROM
`ps_category_product` cp
LEFT JOIN `ps_product` p ON p.`id_product` = cp.`id_product`
INNER JOIN ps_product_shop product_shop ON ( product_shop.id_product = p.id_product AND product_shop.id_shop = 1 )
LEFT JOIN `ps_product_attribute_shop` product_attribute_shop ON ( p.`id_product` = product_attribute_shop.`id_product` AND product_attribute_shop.`default_on` = 1 AND product_attribute_shop.id_shop = 1 )
LEFT JOIN ps_stock_available stock ON ( stock.id_product = `p`.id_product AND stock.id_product_attribute = 0 AND stock.id_shop = 1 AND stock.id_shop_group = 0 )
LEFT JOIN `ps_category_lang` cl ON ( product_shop.`id_category_default` = cl.`id_category` AND cl.`id_lang` = 11 AND cl.id_shop = 1 )
LEFT JOIN `ps_product_lang` pl ON ( p.`id_product` = pl.`id_product` AND pl.`id_lang` = 11 AND pl.id_shop = 1 )
LEFT JOIN `ps_image_shop` image_shop ON ( image_shop.`id_product` = p.`id_product` AND image_shop.cover = 1 AND image_shop.id_shop = 1 )
LEFT JOIN `ps_image_lang` il ON ( image_shop.`id_image` = il.`id_image` AND il.`id_lang` = 11 )
LEFT JOIN `ps_manufacturer` m ON m.`id_manufacturer` = p.`id_manufacturer`
WHERE
product_shop.`id_shop` = 1
AND cp.`id_category` = 12
AND product_shop.`active` = 1
AND product_shop.`visibility` IN ( "both", "catalog" )
ORDER BY
RAND()
LIMIT 50
Please provide SHOW CREATE TABLE for each table. Meanwhile, ...
Let's start by optimizing the joins.
LEFT JOIN `ps_product_lang` pl ON ( p.`id_product` = pl.`id_product`
AND pl.`id_lang` = 11
AND pl.id_shop = 1 )
That needs INDEX(id_product, id_lang, id_shop) (The columns may be in any order.)
Don't use LEFT unless you really need to fetch a row from the righthand table as NULLs when it does not exist. In particular,
LEFT JOIN `ps_product` p
is probably getting in the way of optimization.
WHERE product_shop.`id_shop` = 1
AND product_shop.`active` = 1
AND product_shop.`visibility` IN ( "both", "catalog" )
would probably benefit from these indexes
INDEX(id_shop, active, visibility, id_product)
INDEX(id_product, id_shop, active, visibility)
product_category needs
INDEX(id_category, id_product) -- in this order.
In general many-to-many mapping tables need to follow the tips here: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
The query has the "explode-implode" syndrome. This is where it first does a JOINs, collecting a lot of data, then throws away much of it due, in your case, to the LIMIT 10. It can probably be cured by turning the query inside-out. The general ID is to start with a derived table that gets the 10 rows desired, then reaches into the other table for the rest of the desired columns. This "reaching" need happen only 10 times, not however many the JOINs currently require.
SELECT ...
FROM ( SELECT <<primary key columns from cp, p, and product_shop>>
FROM cp
JOIN p ON ...
JOIN product_shop ON ...
ORDER BY RAND()
LIMIT 10 ) AS x
JOIN <<p, product_shop ON their PKs>> -- to get p.*, product_shop.*>>
[LEFT] JOIN << each of the other tables>> -- to get the other tables
You should start by testing the subquery (a "derived" table) to verify that it is noticeably faster than the original query.

duplicates in mysql full join

I have two tables and I want to put them together and dont want to get the same id, instead I want to have the rows with the value and not with the null values - if there is an id with values.
This is my query
(SELECT users_antworten.user,
users_antworten.antwort,
profilfragen.id,
profilfragen.frage,
profilfragen.status,
profilfragen.aktiviert
FROM users_antworten
right JOIN profilfragen ON users_antworten.frage = NULL
WHERE profilfragen.aktiviert = 1
group by profilfragen.id
)
UNION
(SELECT users_antworten.user,
users_antworten.antwort,
profilfragen.id,
profilfragen.frage,
profilfragen.status,
profilfragen.aktiviert
FROM users_antworten
LEFT JOIN profilfragen ON users_antworten.frage = profilfragen.id
WHERE profilfragen.aktiviert = 1 AND users_antworten.user = 6
group by profilfragen.id
)
Thank you!
Finally after hours :-) I've got it:
(select null as user,
null as antwort,
p.id,
p.frage,
p.status,
p.aktiviert
FROM profilfragen p
WHERE not exists
( SELECT null
FROM users_antworten u
WHERE u.frage = p.id AND u.user=6
)
)
UNION
(SELECT u.user,
u.antwort,
p.id,
p.frage,
p.status,
p.aktiviert
FROM users_antworten u left join profilfragen p on p.id = u.frage WHERE u.user= :id
)

MySQL search query with multiple joins and subqueries running slow

I have the following query which is actually within a stored procedure, but I removed it as there is too much going on inside the stored procedure. Basically this is the end result which takes ages (more than a minute) to run and I know the reason why - as you will also see from looking at the result of the explain - but I just cannot get it sorted.
Just to quickly explain what this query is doing. It is fetching all products from companies that are "connected" to the company where li.nToObjectID = 37. The result also returns some other information about the other companies like its name, company id, etc.
SELECT DISTINCT
SQL_CALC_FOUND_ROWS
p.id,
p.sTitle,
p.sTeaser,
p.TimeStamp,
p.ExpiryDate,
p.InStoreDate,
p.sCreator,
p.sProductCode,
p.nRetailPrice,
p.nCostPrice,
p.bPublic,
c.id as nCompanyID,
c.sName as sCompany,
m.id as nMID,
m.sFileName as sHighResFileName,
m.nSize,
(
Select sName
FROM tblBrand
WHERE id = p.nBrandID
) as sBrand,
(
Select t.sFileName
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as sFileName,
(
Select t.nWidth
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as nWidth,
(
Select t.nHeight
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as nHeight,
IF (
(
SELECT COUNT(id) FROM tblLink
WHERE
sType = "company"
AND sStatus = "active"
AND nToObjectID = 37
AND nFromObjectID = u.nCompanyID
),
1,
0
) AS bLinked
FROM tblProduct p
INNER JOIN tblMedia m
ON (
m.nTypeID = p.id AND
m.sType = "product"
)
INNER JOIN tblUser u
ON u.id = p.nUserID
INNER JOIN tblCompany c
ON u.nCompanyID = c.id
LEFT JOIN tblLink li
ON (
li.sType = "company"
AND li.sStatus = "active"
AND li.nToObjectID = 37
AND li.nFromObjectID = u.nCompanyID
)
WHERE c.bActive = 1
AND p.bArchive = 0
AND p.bActive = 1
AND NOW() <= p.ExpiryDate
AND (
li.id IS NOT NULL
OR (
li.id IS NULL
AND p.bPublic = 1
)
)
ORDER BY p.TimeStamp DESC
LIMIT 0, 52
Click here to see the output for EXPLAIN. Sorry, just couldn't get the formatting correct.
http://i60.tinypic.com/2hdqjgj.png
And lastly the number of rows for all the tables in this query:
tblProducts
Count: 5392
tblBrand
Count: 194
tblCompany
Count: 368
tblUser
Count: 416
tblMedia
Count: 5724
tblLink
Count: 24800
tblThumbnail
Count: 22207
So I have 2 questions:
1. Is there another way of writing this query which might potentially speed it up?
2. What index combination do I need for tblProducts so that not all the rows are searched through?
UPDATE 1
This is the new query after removing the subqueries and making use of left joins instead:
SELECT DISTINCT DISTINCT
SQL_CALC_FOUND_ROWS
p.id,
p.sTitle,
p.sTeaser,
p.TimeStamp,
p.ExpiryDate,
p.InStoreDate,
p.sCreator,
p.sProductCode,
p.nRetailPrice,
p.nCostPrice,
p.bPublic,
c.id as nCompanyID,
c.sName as sCompany,
m.id as nMID,
m.sFileName as sHighResFileName,
m.nSize,
brand.sName as sBrand,
thumb.sFilename,
thumb.nWidth,
thumb.nHeight,
IF (
(
SELECT COUNT(id) FROM tblLink
WHERE
sType = "company"
AND sStatus = "active"
AND nToObjectID = 37
AND nFromObjectID = u.nCompanyID
),
1,
0
) AS bLinked
FROM tblProduct p
INNER JOIN tblMedia m
ON (
m.nTypeID = p.id AND
m.sType = "product"
)
INNER JOIN tblUser u
ON u.id = p.nUserID
INNER JOIN tblCompany c
ON u.nCompanyID = c.id
LEFT JOIN tblLink li
ON (
li.sType = "company"
AND li.sStatus = "active"
AND li.nToObjectID = 37
AND li.nFromObjectID = u.nCompanyID
)
LEFT JOIN tblBrand AS brand
ON brand.id = p.nBrandID
LEFT JOIN tblThumbnail AS thumb
ON (
thumb.nMediaID = m.id
AND thumb.sType = 'thumbnail'
)
WHERE c.bActive = 1
AND p.bArchive = 0
AND p.bActive = 1
AND NOW() <= p.ExpiryDate
AND (
li.id IS NOT NULL
OR (
li.id IS NULL
AND p.bPublic = 1
)
)
ORDER BY p.TimeStamp DESC
LIMIT 0, 52;
UPDATE 2
ALTER TABLE tblThumbnail ADD INDEX (nMediaID,sType) USING BTREE;
ALTER TABLE tblMedia ADD INDEX (nTypeID,sType) USING BTREE;
ALTER TABLE tblProduct ADD INDEX (bArchive,bActive,ExpiryDate,bPublic,TimeStamp) USING BTREE;
After doing the above changes the explain showed that it is now only searching through 1464 rows on tblProduct instead of 5392.
That's a big query with a lot going on. It's going to take a few steps of work to optimize it. I will take the liberty of just presenting a couple of steps.
First step. Can you get rid of SQL_CALC_FOUND_ROWS and still have your program work correctly? If so, do that. When you specify SQL_CALC_FOUND_ROWS it sometimes means the server has to delay sending you the first row of your resultset until the last row is available.
Second step. Refactor the dependent subqueries to be JOINs instead.
Here's how you might approach that. Part of your query looks like this...
SELECT DISTINCT SQL_CALC_FOUND_ROWS
p.id,
...
c.id as nCompanyID,
...
m.id as nMID,
...
( /* dependent subquery to be removed */
Select sName
FROM tblBrand
WHERE id = p.nBrandID
) as sBrand,
( /* dependent subquery to be removed */
Select t.sFileName
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as sFileName,
( /* dependent subquery to be removed */
Select t.nWidth
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as nWidth,
( /* dependent subquery to be removed */
Select t.nHeight
FROM tblThumbnail t
where t.nMediaID = m.id AND
t.sType = "thumbnail"
) as nHeight,
...
Try this instead. Notice how the brand and thumbnail dependent subqueries disappear. You had three dependent subqueries for the thumbnail; they can disappear into a single JOIN.
SELECT DISTINCT SQL_CALC_FOUND_ROWS
p.id,
...
brand.sName,
thumb.sFilename,
thumb.nWidth,
thumb.nHeight,
...
FROM tblProduct p
INNER JOIN tblMedia AS m ON (m.nTypeID = p.id AND m.sType = 'product')
... (other table joins) ...
LEFT JOIN tblBrand AS brand ON p.id = p.nBrandID
LEFT JOIN tblMedia AS thumb ON (t.nMediaID = m.id AND thumb.sType = 'thumbnail')
I used LEFT JOIN rather than INNER JOIN so MySQL will present NULL values if the joined rows are missing.
Edit
You're using a join pattern that looks like this:
JOIN sometable AS s ON (s.someID = m.id AND s.sType = 'string')
You seem to do this for a few tables. You probably can speed up the JOIN operations by creating compound indexes in those tables. For example, try adding the following index to tblThumbnail: (sType, nMediaID). You can do that with this DDL statement.
ALTER TABLE tblThumbnail ADD INDEX (sType, nMediaID) USING BTREE
You can do similar things to other tables with the same join pattern.

I need get all records from table joined where at least one case with condition

I have three tables:
cp_projeto (id, nome...)
cp_habilidade_projeto (id, id_projeto)
cp_habilidade (id, nome...)
I need all projects with all cp_habilidade where cp_projeto have one cp_habilidade. My actual query:
SELECT
p.id as id_projeto,
p.nome as nome_projeto,
p.id_tipo_projeto,
p.dhPostagem,
cp_habilidade_projeto.id as id_habilidade_projeto,
cp_habilidade.nome as nome_habilidade
FROM (
SELECT * FROM cp_projeto
WHERE (id_status_projeto = 2)
ORDER BY dhPostagem DESC LIMIT 0, 10
) AS p
inner JOIN cp_habilidade_projeto ON (cp_habilidade_projeto.id_projeto = p.id)
inner JOIN cp_habilidade ON (cp_habilidade.id = cp_habilidade_projeto.id_habilidade)
JOIN cp_sub_categoria ON (cp_sub_categoria.id = p.id_sub_categoria)
WHERE (
p.nome like '%CSS%'
OR cp_habilidade.nome like '%CSS%'
)
This returns only cp_habilidade.nome = %CSS%, I need it all.
Thanks!
SELECT
p.id as id_projeto,
p.nome as nome_projeto,
p.id_tipo_projeto,
p.dhPostagem,
hp.id as id_habilidade_projeto,
h.nome as nome_habilidade
FROM cp_projecto p
JOIN cp_habilidade_projeto hp ON p.id = hp.id_projeto
JOIN cp_habilidade h ON h.id = hp.id_habilidade
WHERE p.id IN ( SELECT cp_habilidade_projeto.id_projeto
FROM cp_habilidade
JOIN cp_habilidade_projeto ON cp_habilidade.id = cp_habilidade_projeto.id_habilidade
WHERE cp_habilidade.nome LIKE '%CSS%' )

SQL Query Problem

I've been at this for a bit now. Basically, I'm needing to add a derived column to count the hits to a weblog entry in the database. The problem is, the hits are being totaled and shown on only on the first record. Any Ideas? I've emboldened the parts of the query I'm talking about. The query is below:
SELECT DISTINCT(t.entry_id),
exp_categories.rank,
**exp_hits.hits,**
t.entry_id,
t.weblog_id,
t.forum_topic_id,
t.author_id,
t.ip_address,
t.title,
t.url_title,
t.status,
t.dst_enabled,
t.view_count_one,
t.view_count_two,
t.view_count_three,
t.view_count_four,
t.allow_comments,
t.comment_expiration_date,
t.allow_trackbacks,
t.sticky,
t.entry_date,
t.year,
t.month,
t.day,
t.entry_date,
t.edit_date,
t.expiration_date,
t.recent_comment_date,
t.comment_total,
t.trackback_total,
t.sent_trackbacks,
t.recent_trackback_date,
t.site_id as entry_site_id,
w.blog_title,
w.blog_name,
w.search_results_url,
w.search_excerpt,
w.blog_url,
w.comment_url,
w.tb_return_url,
w.comment_moderate,
w.weblog_html_formatting,
w.weblog_allow_img_urls,
w.weblog_auto_link_urls,
w.enable_trackbacks,
w.trackback_use_url_title,
w.trackback_field,
w.trackback_use_captcha,
w.trackback_system_enabled,
m.username,
m.email,
m.url,
m.screen_name,
m.location,
m.occupation,
m.interests,
m.aol_im,
m.yahoo_im,
m.msn_im,
m.icq,
m.signature,
m.sig_img_filename,
m.sig_img_width,
m.sig_img_height,
m.avatar_filename,
m.avatar_width,
m.avatar_height,
m.photo_filename,
m.photo_width,
m.photo_height,
m.group_id,
m.member_id,
m.bday_d,
m.bday_m,
m.bday_y,
m.bio,
md.*,
wd.*
FROM exp_weblog_titles AS t
LEFT JOIN exp_weblogs AS w ON t.weblog_id = w.weblog_id
LEFT JOIN exp_weblog_data AS wd ON t.entry_id = wd.entry_id
LEFT JOIN exp_members AS m ON m.member_id = t.author_id
LEFT JOIN exp_member_data AS md ON md.member_id = m.member_id
LEFT JOIN exp_category_posts ON wd.entry_id = exp_category_posts.entry_id
**LEFT JOIN
(
SELECT COUNT(*) AS hits, exp_hits.entry_id FROM exp_hits ORDER BY exp_hits.entry_id
) exp_hits ON t.entry_id = exp_hits.entry_id**
LEFT JOIN
(
SELECT exp_categories.cat_id, cat_name as rank
FROM exp_categories
WHERE exp_categories.group_id = '9'
) exp_categories ON exp_categories.cat_id = exp_category_posts.cat_id
WHERE t.entry_id IN (2,3,4) ORDER BY exp_categories.rank DESC, **exp_hits.hits DESC**, entry_date desc
Try changeing the subselect to
SELECT COUNT(*) AS hits,
exp_hits.entry_id
FROM exp_hits
GROUP BY exp_hits.entry_id
Out of curiosity, is your hits functionality something that can't be accomplished with the view_count_one/two/three/four fields already present in the database and supported by ExpressionEngine template tags?