MySQL optimisation: multiple GROUP_CONCAT and joins using HAVING

I've looked at similar GROUP_CONCAT MySQL optimisation threads, but none seem relevant to my issue, and my MySQL knowledge is being stretched with this one.
I have been tasked with improving the speed of a script that contains an extremely heavy MySQL query.
The query in question uses GROUP_CONCAT to create a list of the colours, tags and sizes relevant to a particular product. It then uses HAVING / FIND_IN_SET to filter these concatenated lists for the attributes set by the user controls, and displays the results.
In the example below it's looking for all products with product_tag=1, product_colour=18 and product_size=17. So this could be a blue product (colour) in medium (size) for a male (tag).
The shop_products table contains about 3500 rows, so it is not particularly large, but the query below takes around 30 seconds to execute. It works OK with 1 or 2 joins, but adding the third just kills it.
SELECT shop_products.id, shop_products.name, shop_products.default_image_id,
GROUP_CONCAT( DISTINCT shop_product_to_colours.colour_id ) AS product_colours,
GROUP_CONCAT( DISTINCT shop_products_to_tag.tag_id ) AS product_tags,
GROUP_CONCAT( DISTINCT shop_product_colour_to_sizes.tag_id ) AS product_sizes
FROM shop_products
LEFT JOIN shop_product_to_colours ON shop_products.id = shop_product_to_colours.product_id
LEFT JOIN shop_products_to_tag ON shop_products.id = shop_products_to_tag.product_id
LEFT JOIN shop_product_colour_to_sizes ON shop_products.id = shop_product_colour_to_sizes.product_id
WHERE shop_products.category_id = '50'
GROUP BY shop_products.id
HAVING((FIND_IN_SET( 1, product_tags ) >0)
AND(FIND_IN_SET( 18, product_colours ) >0)
AND(FIND_IN_SET( 17, product_sizes ) >0))
ORDER BY shop_products.name ASC
LIMIT 0 , 30
I was hoping somebody could advise on a better way to structure this query without restructuring the database (which isn't really an option at this point without weeks of data migration and script changes), or offer any general advice on optimisation. EXPLAIN currently returns the output below (as you can see, the indexes are all over the place!).
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | shop_products | ref | category_id,category_id_2 | category_id | 2 | const | 3225 | Using where; Using temporary; Using filesort
1 | SIMPLE | shop_product_to_colours | ref | product_id,product_id_2,product_id_3 | product_id | 4 | candymix_db.shop_products.id | 13 |
1 | SIMPLE | shop_products_to_tag | ref | product_id,product_id_2 | product_id | 4 | candymix_db.shop_products.id | 4 |
1 | SIMPLE | shop_product_colour_to_sizes | ref | product_id | product_id | 4 | candymix_db.shop_products.id | 133 |

Rewrite the query to use WHERE instead of HAVING. WHERE is applied while MySQL searches for rows, so it can use indexes. HAVING is applied after the rows have been selected and grouped, to filter that result, so by design it can't use indexes.
You can do it, for example, this way:
SELECT p.id, p.name, p.default_image_id,
GROUP_CONCAT( DISTINCT pc.colour_id ) AS product_colours,
GROUP_CONCAT( DISTINCT pt.tag_id ) AS product_tags,
GROUP_CONCAT( DISTINCT ps.tag_id ) AS product_sizes
FROM shop_products p
JOIN shop_product_to_colours pc_test ON p.id = pc_test.product_id AND pc_test.colour_id = 18
JOIN shop_products_to_tag pt_test ON p.id = pt_test.product_id AND pt_test.tag_id = 1
JOIN shop_product_colour_to_sizes ps_test ON p.id = ps_test.product_id AND ps_test.tag_id = 17
JOIN shop_product_to_colours pc ON p.id = pc.product_id
JOIN shop_products_to_tag pt ON p.id = pt.product_id
JOIN shop_product_colour_to_sizes ps ON p.id = ps.product_id
WHERE p.category_id = '50'
GROUP BY p.id
ORDER BY p.name ASC
Update
We join each table twice.
The first join checks that the table contains the required value (the condition that FIND_IN_SET tested).
The second join produces the data for GROUP_CONCAT, so that all of the product's values are selected from the table.
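The double-join idea can be tried on a toy dataset. This is only a sketch: it runs the query through Python's sqlite3 module, with SQLite standing in for MySQL, and the sample rows and the reduced column list are made up for illustration.

```python
import sqlite3

# Hypothetical miniature of the question's schema; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shop_products (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
CREATE TABLE shop_product_to_colours (product_id INTEGER, colour_id INTEGER);
CREATE TABLE shop_products_to_tag (product_id INTEGER, tag_id INTEGER);
CREATE TABLE shop_product_colour_to_sizes (product_id INTEGER, tag_id INTEGER);

INSERT INTO shop_products VALUES (1, 'Alpha shirt', 50), (2, 'Beta shirt', 50);
INSERT INTO shop_product_to_colours VALUES (1, 18), (1, 19), (2, 19);
INSERT INTO shop_products_to_tag VALUES (1, 1), (1, 2), (2, 1);
INSERT INTO shop_product_colour_to_sizes VALUES (1, 17), (1, 16), (2, 17);
""")

QUERY = """
SELECT p.id, p.name,
       GROUP_CONCAT(DISTINCT pc.colour_id) AS product_colours,
       GROUP_CONCAT(DISTINCT pt.tag_id)    AS product_tags,
       GROUP_CONCAT(DISTINCT ps.tag_id)    AS product_sizes
FROM shop_products p
-- filtering joins: keep only products that own the requested attribute
JOIN shop_product_to_colours pc_test
  ON p.id = pc_test.product_id AND pc_test.colour_id = ?
JOIN shop_products_to_tag pt_test
  ON p.id = pt_test.product_id AND pt_test.tag_id = ?
JOIN shop_product_colour_to_sizes ps_test
  ON p.id = ps_test.product_id AND ps_test.tag_id = ?
-- collecting joins: fetch every attribute of the surviving products
JOIN shop_product_to_colours pc ON p.id = pc.product_id
JOIN shop_products_to_tag pt ON p.id = pt.product_id
JOIN shop_product_colour_to_sizes ps ON p.id = ps.product_id
WHERE p.category_id = ?
GROUP BY p.id
ORDER BY p.name
"""

# Parameters in order: colour, tag, size, category.
rows = conn.execute(QUERY, (18, 1, 17, 50)).fetchall()
for row in rows:
    print(row)
```

With this sample data only product 1 survives the three filtering joins, and its concatenated lists still contain every colour, tag and size it has, not just the ones that were filtered on.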
Update 2
As @Matt Raines commented, if we don't need to list the product's values with GROUP_CONCAT, the query becomes even simpler:
SELECT p.id, p.name, p.default_image_id
FROM shop_products p
JOIN shop_product_to_colours pc ON p.id = pc.product_id
JOIN shop_products_to_tag pt ON p.id = pt.product_id
JOIN shop_product_colour_to_sizes ps ON p.id = ps.product_id
WHERE p.category_id = '50'
AND (pc.colour_id = 18 AND pt.tag_id = 1 AND ps.tag_id = 17)
GROUP BY p.id
ORDER BY p.name ASC
This will select all products with three filtered attributes.

If I understand this question correctly, what you need to do is:
Find a list of all of the shop_products.id values that have the correct tag/colour/size options
Get a list of all of the tag/colour/size combinations available for those product ids.
I was trying to make you a SQLFiddle for this, but the site seems broken at the moment. Try something like:
SELECT shop_products.id, shop_products.name, shop_products.default_image_id,
GROUP_CONCAT( DISTINCT shop_product_to_colours.colour_id ) AS product_colours,
GROUP_CONCAT( DISTINCT shop_products_to_tag.tag_id ) AS product_tags,
GROUP_CONCAT( DISTINCT shop_product_colour_to_sizes.tag_id ) AS product_sizes
FROM
shop_products INNER JOIN
(SELECT shop_products.id AS id
FROM
shop_products
LEFT JOIN shop_product_to_colours ON shop_products.id = shop_product_to_colours.product_id
LEFT JOIN shop_products_to_tag ON shop_products.id = shop_products_to_tag.product_id
LEFT JOIN shop_product_colour_to_sizes ON shop_products.id = shop_product_colour_to_sizes.product_id
WHERE
shop_products.category_id = '50'
AND shop_products_to_tag.tag_id=1
AND shop_product_to_colours.colour_id=18
AND shop_product_colour_to_sizes.tag_id=17
) matches ON shop_products.id = matches.id
LEFT JOIN shop_product_to_colours ON shop_products.id = shop_product_to_colours.product_id
LEFT JOIN shop_products_to_tag ON shop_products.id = shop_products_to_tag.product_id
LEFT JOIN shop_product_colour_to_sizes ON shop_products.id = shop_product_colour_to_sizes.product_id
GROUP BY shop_products.id
ORDER BY shop_products.name ASC
LIMIT 0 , 30;
The problem with your first approach is that it requires the database to build every combination of attributes for every product and only then filter. In my example, I filter down the product ids first and then generate the combinations.
My query is untested, as I don't have a MySQL environment handy and SQLFiddle is down, but it should give you the idea.

First, I aliased your tables to improve readability.
SP = shop_products
PC = shop_product_to_colours
PT = shop_products_to_tag
PS = shop_product_colour_to_sizes
Next, your HAVING should be a WHERE, since you are explicitly looking for something. There is no need to query the entire system just to throw records away after the result is returned. Third, you had LEFT JOINs, but when a condition in WHERE or HAVING applies to the joined table and does not allow for NULL, the join effectively becomes an inner JOIN (both sides required). Finally, your WHERE clause has quotes around the ID you are looking for, but that column is probably an integer anyhow, so remove the quotes.
Now, for indexes and optimization. To help with the criteria, grouping and JOINs, I would have the following composite (multi-column) indexes instead of indexes on individual columns only.
table | index
shop_products | ( category_id, id, name )
shop_product_to_colours | ( product_id, colour_id )
shop_products_to_tag | ( product_id, tag_id )
shop_product_colour_to_sizes | ( product_id, tag_id )
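As a rough illustration of one of these composite indexes, the sketch below creates it and asks the engine for its plan. It uses Python's sqlite3 module with SQLite standing in for MySQL, so the plan text differs from MySQL's EXPLAIN output; the index name idx_colours_product_colour is made up. The CREATE INDEX statement itself has the same form in MySQL.

```python
import sqlite3

# SQLite stands in for MySQL; the index name below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shop_product_to_colours (product_id INTEGER, colour_id INTEGER);
CREATE INDEX idx_colours_product_colour
    ON shop_product_to_colours (product_id, colour_id);
""")

# Ask the engine how it would look up one product/colour pair.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT colour_id FROM shop_product_to_colours "
    "WHERE product_id = ? AND colour_id = ?",
    (1, 18),
).fetchall()

# The fourth column of each plan row holds the human-readable description.
plan_text = " ".join(row[3] for row in plan)
print(plan_text)
```

Because both columns used by the join condition sit in one index, the lookup can be answered from the index alone. On the real database, running EXPLAIN on the full query is the way to confirm that MySQL actually picks the new indexes.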
Revised query
SELECT
SP.id,
SP.name,
SP.default_image_id,
GROUP_CONCAT( DISTINCT PC.colour_id ) AS product_colours,
GROUP_CONCAT( DISTINCT PT.tag_id ) AS product_tags,
GROUP_CONCAT( DISTINCT PS.tag_id ) AS product_sizes
FROM
shop_products SP
JOIN shop_product_to_colours PC
ON SP.id = PC.product_id
AND PC.colour_id = 18
JOIN shop_products_to_tag PT
ON SP.id = PT.product_id
AND PT.tag_id = 1
JOIN shop_product_colour_to_sizes PS
ON SP.id = PS.product_id
AND PS.tag_id = 17
WHERE
SP.category_id = 50
GROUP BY
SP.id
ORDER BY
SP.name ASC
LIMIT
0 , 30
One final comment. Since you are ordering by the name but grouping by the ID, the final sort might cause a delay. However, if you group by the name plus the ID, each group is still unique by ID, and an adjusted index on shop_products
table | index
shop_products | ( category_id, name, id )
will help both the GROUP BY and the ORDER BY, since the rows will come from the index in their natural order.
GROUP BY
SP.name,
SP.id
ORDER BY
SP.name ASC,
SP.ID

Related

Improve MySql query left outer joins with subquery

We are maintaining a history of Content. We want to get the latest entry for each content, but with the create time and update time taken from the first entry of that content. The query contains multiple selects and WHERE clauses with many left joins. The dataset is very large, so the query takes more than 60 seconds to execute. Kindly help in improving it. Query:
select * from (select * from (
SELECT c.*, initCMS.initcreatetime, initCMS.initupdatetime, user.name as partnerName, r.name as rightsName, r1.name as copyRightsName, a.name as agelimitName, ct.type as contenttypename, cat.name as categoryname, lang.name as languagename FROM ContentCMS c
left join ContentCategoryType ct on ct.id = c.contentType
left join User user on c.contentPartnerId = user.id
left join Category cat on cat.id = c.categoryId
left join Language lang on lang.id = c.languageCode
left join CopyRights r on c.rights = r.id
left join CopyRights r1 on c.copyrights = r1.id
left join Age a on c.ageLimit = a.id
left outer join (
SELECT contentId, createTime as initcreatetime, updateTime as initupdatetime from ContentCMS cms where cms.deleted='0'
) as initCMS on initCMS.contentId = c.contentId WHERE c.deleted='0' order by c.id DESC
) as temp group by contentId) as c where c.editedBy='0'
Any help would be highly appreciated. Thank you.
Just a partial evaluation and suggestion, because your query does not seem properly formed.
This left join seems to serve no purpose:
FROM ContentCMS c
......
left join (
SELECT contentId
, createTime as initcreatetime
, updateTime as initupdatetime
from ContentCMS cms
where cms.deleted='0'
) as initCMS on initCMS.contentId = c.contentId
because it joins the same table.
The ORDER BY (without LIMIT) in a joined subquery is useless, because joining ordered or unordered values produces the same result.
The GROUP BY contentId is strange, because there are no aggregate functions. Using GROUP BY without aggregate functions is deprecated in SQL,
and in the most recent versions of MySQL it is not allowed (by default). If you need distinct values, or just one row for each contentId, you should use DISTINCT or retrieve the values in a deterministic manner (GROUP BY without an aggregate function returns arbitrary values for the non-aggregated columns).
As a partial evaluation, your query should be refactored as
SELECT c.*
, c.createTime as initcreatetime
, c.updateTime as initupdatetime
, user.name as partnerName
, r.name as rightsName
, r1.name as copyRightsName
, a.name as agelimitName
, ct.type as contenttypename
, cat.name as categoryname
, lang.name as languagename
FROM ContentCMS c
left join ContentCategoryType ct on ct.id = c.contentType
left join User user on c.contentPartnerId = user.id
left join Category cat on cat.id = c.categoryId
left join Language lang on lang.id = c.languageCode
left join CopyRights r on c.rights = r.id
left join CopyRights r1 on c.copyrights = r1.id
left join Age a on c.ageLimit = a.id
WHERE c.deleted='0'
For the rest, you should explicitly select the columns you actually need and add proper aggregate functions for the others.
Also, nested subqueries that are used only to reduce the rows in an improper way don't help performance. You should also re-evaluate your data modelling and design.
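One deterministic way to get "the latest entry per contentId, with the times of the first entry" is to aggregate once per contentId and join back. The sketch below runs through Python's sqlite3 module, with SQLite standing in for MySQL. It assumes that id grows with every new revision and that the first entry is the one with the smallest createTime and updateTime; the reduced column list and sample rows are invented.

```python
import sqlite3

# Hypothetical, trimmed-down ContentCMS table; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ContentCMS (
    id INTEGER PRIMARY KEY, contentId INTEGER, title TEXT,
    createTime TEXT, updateTime TEXT, deleted TEXT, editedBy TEXT);
INSERT INTO ContentCMS VALUES
    (1, 100, 'draft',  '2020-01-01', '2020-01-01', '0', '0'),
    (2, 100, 'final',  '2020-02-01', '2020-02-01', '0', '0'),
    (3, 200, 'single', '2020-03-01', '2020-03-01', '0', '0');
""")

QUERY = """
SELECT c.id, c.contentId, c.title, h.initcreatetime, h.initupdatetime
FROM ContentCMS c
JOIN (
    -- one row per contentId: newest revision id plus the earliest timestamps
    SELECT contentId,
           MAX(id)         AS latest_id,
           MIN(createTime) AS initcreatetime,
           MIN(updateTime) AS initupdatetime
    FROM ContentCMS
    WHERE deleted = '0'
    GROUP BY contentId
) h ON h.latest_id = c.id
WHERE c.editedBy = '0'
ORDER BY c.contentId
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Every selected column is either aggregated or comes from exactly one row, so the result does not depend on which row the engine happens to pick for a group. The lookup joins to the other tables would then be added around this core.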

Pagination count query with multiple joins

I had this question about filtering different products by selecting options. That query has been solved here: Filter products by options.
My problem is now with the count query for the pagination. For instance this query returns 37 rows with the count of 1.
SELECT COUNT(DISTINCT p.id) AS number
FROM products p
LEFT JOIN product_categories pc ON p.id = pc.product_id
LEFT JOIN product_images pi ON p.id = pi.product_id
LEFT JOIN product_options po ON p.id = po.product_id
WHERE p.product_active = 1
AND po.option_id IN(1)
AND p.main_price BETWEEN 5250.00 AND 14000.00
GROUP BY(p.id)
HAVING COUNT(DISTINCT po.option_id) = 1
But if I remove DISTINCT:
SELECT COUNT(p.id) AS number
FROM products p
LEFT JOIN product_categories pc ON p.id = pc.product_id
LEFT JOIN product_images pi ON p.id = pi.product_id
LEFT JOIN product_options po ON p.id = po.product_id
WHERE p.product_active = 1
AND po.option_id IN(1)
AND p.main_price BETWEEN 5250.00 AND 14000.00
GROUP BY(p.id)
HAVING COUNT(DISTINCT po.option_id) = 1
this also returns 37 rows, with mixed numbers.
What am I doing wrong? I know I could work around this by running an additional count on this result set, but I don't think that is the right solution.
Also, in the previous question it was suggested that I should not need DISTINCT and that the query is flawed because of it. Can you tell me what the problem is?
The only difference in your queries is here:
SELECT COUNT(DISTINCT p.id) AS number
vs.
SELECT COUNT(p.id) AS number
So you get the same number of result rows, because FROM, WHERE, and HAVING are all the same. Only the data per row you select is different.
In the first case it's the number of distinct IDs that are not null, which is always 1, because you group by that ID. (You say: Look at all records with ID 5 and count how many different IDs you find in these records. Well, the ID in every record with ID 5 is 5. So it's only one ID.)
In the second case you count IDs that are not null. As the ID is never null, this is the same as counting records: COUNT(*). And you'd better make this clear by using COUNT(*) instead of the obfuscated COUNT(p.id).
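The difference is easy to see on a tiny dataset. This sketch uses Python's sqlite3 module with SQLite standing in for MySQL; the tables are cut down to two and the rows are invented.

```python
import sqlite3

# Hypothetical sample data: product 1 has two images, product 2 one, product 3 none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, product_active INTEGER);
CREATE TABLE product_images (product_id INTEGER, file TEXT);
INSERT INTO products VALUES (1, 1), (2, 1), (3, 1);
INSERT INTO product_images VALUES (1, 'a.jpg'), (1, 'b.jpg'), (2, 'c.jpg');
""")

JOINED = """
FROM products p
LEFT JOIN product_images pi ON p.id = pi.product_id
WHERE p.product_active = 1
"""

# Grouped: one result row per product; each group holds exactly one distinct id.
per_group_distinct = conn.execute(
    "SELECT COUNT(DISTINCT p.id) " + JOINED + " GROUP BY p.id ORDER BY p.id"
).fetchall()

# Grouped, without DISTINCT: counts the joined rows inside each group.
per_group_plain = conn.execute(
    "SELECT COUNT(p.id) " + JOINED + " GROUP BY p.id ORDER BY p.id"
).fetchall()

# Not grouped: a single row holding the number of matching products.
total = conn.execute("SELECT COUNT(DISTINCT p.id) " + JOINED).fetchone()[0]

print(per_group_distinct, per_group_plain, total)
```

Dropping the GROUP BY only gives the pagination total when no HAVING filter is needed. With a HAVING condition, as in the question, the grouped query has to be wrapped in a subquery and counted from outside.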
OK guys, it is clearer now. Is something like this good enough to solve it?
SELECT COUNT(*) AS number
FROM (SELECT p.id
FROM products p
LEFT JOIN product_categories pc ON p.id = pc.product_id
LEFT JOIN product_images pi ON p.id = pi.product_id
LEFT JOIN product_options po ON p.id = po.product_id
WHERE p.product_active = 1
AND po.option_id IN(1)
AND p.main_price BETWEEN 5250.00 AND 14000.00
GROUP BY(p.id)
HAVING COUNT(DISTINCT po.option_id) = 1) AS count_table
I will go with this one. Thanks.

Duplicated rows

SQL Query:
SELECT
T.*,
U.nick AS author_nick,
P.id AS post_id,
P.name AS post_name,
P.author AS post_author_id,
P.date AS post_date,
U2.nick AS post_author
FROM
zero_topics T
LEFT JOIN
zero_posts P
ON
T.id = P.topic_id
LEFT JOIN
zero_players U
ON
T.author = U.uuid
LEFT JOIN
zero_players U2
ON
P.author = U2.uuid
ORDER BY
CASE
WHEN P.date is null THEN T.date
ELSE P.date
END DESC
Output:
Topics:
Posts:
Question: Why do I have a duplicated topic id 22? I have two topics in MySQL (id 22 and 23) and two posts (id 24 and 25). I want to see each topic with its last post only.
If a join produces multiple results and you want only at most one result, you have to rewrite the join and/or filtering criteria to provide that result. If you want only the latest result of all the results, it's doable and reasonably easy once you use it a few times.
select a.Data, b.Data
from Table1 a
left join Table2 b
on b.JoinValue = a.JoinValue
and b.DateField =(
select Max( DateField )
from Table2
where JoinValue = b.JoinValue );
The correlated subquery pulls out the one date that is the highest (most recent) value of all the joinable candidates. That then becomes the row that takes part in the join -- or, of course, nothing if there are no candidates at all. This is a pattern I use quite a lot.
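Applied to the tables from the question, the pattern might look like the sketch below. It runs through Python's sqlite3 module, with SQLite standing in for MySQL. The sample rows are invented, and it is an assumption that both posts belong to topic 22. The player joins are left out to keep it short.

```python
import sqlite3

# Hypothetical sample data: topic 22 has two posts, topic 23 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE zero_topics (id INTEGER PRIMARY KEY, name TEXT, date TEXT);
CREATE TABLE zero_posts (id INTEGER PRIMARY KEY, topic_id INTEGER, name TEXT, date TEXT);
INSERT INTO zero_topics VALUES
    (22, 'first topic', '2016-01-01'), (23, 'second topic', '2016-01-02');
INSERT INTO zero_posts VALUES
    (24, 22, 'older post', '2016-01-03'), (25, 22, 'newer post', '2016-01-04');
""")

QUERY = """
SELECT T.id, P.id AS post_id, P.name AS post_name
FROM zero_topics T
LEFT JOIN zero_posts P
       ON P.topic_id = T.id
      -- only the newest post of the topic takes part in the join
      AND P.date = (SELECT MAX(P2.date)
                    FROM zero_posts P2
                    WHERE P2.topic_id = T.id)
ORDER BY COALESCE(P.date, T.date) DESC
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each topic now appears once, and a topic without posts still shows up with empty post columns. If two posts of one topic share the same maximum date, both would be returned, so a tie-breaker such as the highest post id may be needed.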

How can I get the sum of a column ?

I have 3 tables: activities, tasks and requirements. I want to return the total duration of all the tasks for a specific requirement. This is my query:
SELECT r.id as req_id,
r.project_id,
r.name as req_name,
r.cost,r.estimated,
p.name as project_name,
v.name AS `status` ,
t.taskid,
(SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(duration)))
FROM activities a
WHERE a.taskid = t.taskid) AS worked
FROM requirements r
INNER JOIN projects p
ON p.projectid = r.project_id
INNER JOIN `values` v
ON v.id = r.r_status_id
LEFT JOIN tasks t
on t.id_requirement = r.id
WHERE 1 = 1
ORDER BY req_id desc
And this is the result :
As you can see, the same req_id (48) appears twice. I want it to appear once, with worked being the sum of the last two rows. How can I manage that?
this is the activities structure :
this is tasks structure :
and this is the requirement structure :
Include your activities table in the JOIN, GROUP by all requirement columns you need and add a sum. Since you are aggregating tasks, you cannot have taskid in the SELECT clause.
SELECT r.id as req_id,
r.project_id,
r.name as req_name,
r.cost,r.estimated,
p.name as project_name,
v.name AS `status` ,
SEC_TO_TIME(SUM(TIME_TO_SEC(a.duration))) AS worked
FROM requirements r
INNER JOIN projects p ON p.projectid = r.project_id
INNER JOIN `values` v ON v.id = r.r_status_id
LEFT JOIN tasks t ON t.id_requirement = r.id
LEFT JOIN activities a ON a.taskid=t.taskid
WHERE 1 = 1
GROUP BY r.id, r.project_id, r.name,r.cost,r.estimated,p.name, v.name
ORDER BY req_id desc
The joins in your query appear to be creating extra rows. I'm sure there is a way to fix the logic directly, possibly by pre-aggregating some results in the from clause.
Your duplicates appear to be complete duplicates (every column is exactly the same). The easy way to fix the problem is to use select distinct. So, just start your query with:
SELECT DISTINCT r.id as req_id, r.project_id, r.name as req_name,
. . .
I suspect that one of your underlying tables has duplicated rows that you are not expecting, but that is another issue.
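Pre-aggregating in the FROM clause, as mentioned above, could look roughly like this. The sketch uses Python's sqlite3 module with SQLite standing in for MySQL. Since SQLite has no SEC_TO_TIME or TIME_TO_SEC, it sums a hypothetical duration_seconds column; in MySQL the inner SUM would be SEC_TO_TIME(SUM(TIME_TO_SEC(a.duration))). The table layouts and rows are invented.

```python
import sqlite3

# Hypothetical, trimmed-down tables; requirement 48 has two tasks, 47 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE requirements (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tasks (taskid INTEGER PRIMARY KEY, id_requirement INTEGER);
CREATE TABLE activities (id INTEGER PRIMARY KEY, taskid INTEGER, duration_seconds INTEGER);
INSERT INTO requirements VALUES (47, 'login'), (48, 'reports');
INSERT INTO tasks VALUES (1, 48), (2, 48);
INSERT INTO activities VALUES (1, 1, 3600), (2, 1, 1800), (3, 2, 600);
""")

QUERY = """
SELECT r.id AS req_id, r.name AS req_name, w.worked_seconds
FROM requirements r
LEFT JOIN (
    -- collapse tasks and activities to one row per requirement first
    SELECT t.id_requirement, SUM(a.duration_seconds) AS worked_seconds
    FROM tasks t
    JOIN activities a ON a.taskid = t.taskid
    GROUP BY t.id_requirement
) w ON w.id_requirement = r.id
ORDER BY r.id DESC
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because the derived table already has one row per requirement, the outer query needs neither GROUP BY nor DISTINCT, and requirements without any activity still appear with an empty total.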

Slow query execution time

SELECT p.id,
p.title,
p.slug,
p.content,
(SELECT url
FROM gallery
WHERE postid = p.id
LIMIT 1) AS url,
t.name
FROM posts AS p
INNER JOIN termrel AS tr
ON ( tr.object = p.id )
INNER JOIN termtax AS tx
ON ( tx.id = tr.termtax_id )
INNER JOIN terms AS t
ON ( t.id = tx.term_id )
WHERE tx.taxonomy_id = 3
AND p.post_status IS NULL
ORDER BY t.name ASC
This query takes about 0.2407s to execute. How can I make it faster?
Correlated subqueries can have subpar performance, as they are executed row by row.
To solve this, move your correlated subquery into a regular subquery/derived table and join to it. It will then not have to execute row by row for the entire returned result set, as it will be executed before the select statement.
MySQL-specific links that confirm correlated subqueries are not optimal choices in MySQL:
How to optimize
Answer indicating MySQL is notoriously bad at optimizing correlated subqueries
I use SQL Server, but I'm sure the principle is the same for MySQL, so I hope this at least points you in the right direction. You would need to partition the gallery rows and return one result per post; maybe someone could chime in on MySQL-specific syntax and I could update my answer.
select
p.id
,p.title
,p.slug
,p.content
,t.name
,mySubquery.url
from
posts as p
inner join termrel as tr
on ( tr.object = p.id )
inner join termtax as tx
on ( tx.id = tr.termtax_id )
inner join terms as t
on ( t.id = tx.term_id )
left join (
-- one url per post: MIN() picks a deterministic url, whereas LIMIT 1 without ORDER BY returned an arbitrary row
select
postid
,min(url) as url
from
gallery
group by
postid
) as mySubquery
on mySubquery.postid = p.id
where
tx.taxonomy_id = 3
and p.post_status is null
order by
t.name asc