Optimize SQL query with many inner joins on same table - mysql

I'm stuck with a performance issue:
A shop has an article filter with the categories "color", "size", "gender" and "feature". All of these details are stored in an article_criterias table, which has about 36,000 rows and looks like this:
article_id | group | option | option_val
100 | "size" | "35" | 35.00
100 | "size" | "36" | 36.00
100 | "size" | "36½" | 36.50
100 | "color" | "40" | 40.00
100 | "color" | "50" | 50.00
100 | "gender" | "1" | 1.00
101 | "size" | "40" | 40.00
...
We have a SQL query that is built dynamically, based on which criteria are currently selected. The query is fine for 2-3 criteria but becomes very slow when more than 5 options are selected (each additional INNER JOIN roughly doubles the execution time).
How can we make this SQL faster, perhaps by replacing the inner joins with a better-performing approach?
This is the query (the logic is correct, just the performance is bad):
-- This SQL is generated when the user selected the following criteria
-- gender: 1
-- color: 80 + 30
-- size 36 + 37 + 38 + 39 + 42 + 46
SELECT
criteria.group AS `key`,
criteria.option AS `value`
FROM articles
INNER JOIN article_criterias AS criteria ON articles.id = criteria.article_id
INNER JOIN article_criterias AS criteria_gender
ON criteria_gender.article_id = articles.id AND criteria_gender.group = "gender"
INNER JOIN article_criterias AS criteria_color1
ON criteria_color1.article_id = articles.id AND criteria_color1.group = "color"
INNER JOIN article_criterias AS criteria_size2
ON criteria_size2.article_id = articles.id AND criteria_size2.group = "size"
INNER JOIN article_criterias AS criteria_size3
ON criteria_size3.article_id = articles.id AND criteria_size3.group = "size"
INNER JOIN article_criterias AS criteria_size4
ON criteria_size4.article_id = articles.id AND criteria_size4.group = "size"
INNER JOIN article_criterias AS criteria_size5
ON criteria_size5.article_id = articles.id AND criteria_size5.group = "size"
INNER JOIN article_criterias AS criteria_size6
ON criteria_size6.article_id = articles.id AND criteria_size6.group = "size"
INNER JOIN article_criterias AS criteria_size7
ON criteria_size7.article_id = articles.id AND criteria_size7.group = "size"
WHERE
(criteria_gender.option IN ("1"))
AND (criteria_color1.option IN ("80", "30"))
AND (criteria_size2.option_val BETWEEN 35.500000 AND 36.500000)
AND (criteria_size3.option_val BETWEEN 36.500000 AND 37.500000)
AND (criteria_size4.option_val BETWEEN 37.500000 AND 38.500000)
AND (criteria_size5.option_val BETWEEN 38.500000 AND 39.500000)
AND (criteria_size6.option_val BETWEEN 41.500000 AND 42.500000)
AND (criteria_size7.option_val BETWEEN 45.500000 AND 46.500000)

Key/value tables are really a nuisance. However, to find the articles that match certain criteria, aggregate your data:
select
    a.*,
    ac.`group` AS `key`,
    ac.`option` AS `value`
from articles a
join article_criterias ac on ac.article_id = a.id
where a.id in
(
    -- one row per article; each sum() counts the rows that satisfy one criterion
    select article_id
    from article_criterias
    group by article_id
    having sum(`group` = 'gender' and `option` = '1') > 0
       and sum(`group` = 'color' and `option` in ('30','80')) > 0
       and sum(`group` = 'size' and option_val between 35.5 and 36.5) > 0
       and sum(`group` = 'size' and option_val between 36.5 and 37.5) > 0
       and sum(`group` = 'size' and option_val between 37.5 and 38.5) > 0
       and sum(`group` = 'size' and option_val between 38.5 and 39.5) > 0
       and sum(`group` = 'size' and option_val between 41.5 and 42.5) > 0
       and sum(`group` = 'size' and option_val between 45.5 and 46.5) > 0
)
order by a.id, ac.`group`, ac.`option`;
This gets you all articles that are available for gender 1, colors 30 and/or 80, and all listed size ranges, along with all their options. (The size ranges are a bit strange, though; a size of 36.5 would fall into two ranges, for instance.) You get the idea: group by article_id and use HAVING to keep only the article_ids that meet the criteria.
As for indexes, you'll want:
create index idx on article_criterias(article_id, `group`, `option`, option_val);
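To make the GROUP BY / HAVING idea easy to try out, here is a minimal sketch that runs the same kind of filter against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here (an assumption of this sketch), so identifiers are quoted with double quotes instead of backticks, and the three sample articles are made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL (assumption). Both evaluate a comparison to 0/1,
# so SUM(condition) counts the rows of an article that satisfy the condition.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article_criterias (
        article_id INTEGER, "group" TEXT, "option" TEXT, option_val REAL
    );
    -- hypothetical sample rows
    INSERT INTO article_criterias VALUES
        (100, 'size',   '36', 36.0),
        (100, 'size',   '37', 37.0),
        (100, 'color',  '80', 80.0),
        (100, 'gender', '1',   1.0),
        (101, 'size',   '36', 36.0),
        (101, 'color',  '80', 80.0),
        (101, 'gender', '1',   1.0),
        (102, 'size',   '36', 36.0),
        (102, 'size',   '37', 37.0),
        (102, 'color',  '50', 50.0),
        (102, 'gender', '1',   1.0);
""")

# Keep only articles that have at least one row for every selected criterion.
matching_ids = [row[0] for row in conn.execute("""
    SELECT article_id
    FROM article_criterias
    GROUP BY article_id
    HAVING SUM("group" = 'gender' AND "option" = '1') > 0
       AND SUM("group" = 'color'  AND "option" IN ('30', '80')) > 0
       AND SUM("group" = 'size'   AND option_val BETWEEN 35.5 AND 36.5) > 0
       AND SUM("group" = 'size'   AND option_val BETWEEN 36.5 AND 37.5) > 0
    ORDER BY article_id
""")]
print(matching_ids)
```

Article 101 drops out because it has no size in the 36.5-37.5 range, and article 102 drops out because its color is neither 30 nor 80.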

As suggested by @affan-pathan, adding indexes did solve the issue:
CREATE INDEX text_option
ON `article_criterias` (`article_id`, `group`, `option`);
CREATE INDEX numeric_option
ON `article_criterias` (`article_id`, `group`, `option_val`);
Those two indexes cut the execution time of the above query from nearly 1 minute to less than 50 milliseconds!

I understand that the indexes you created solved your problem, but just to play with a pseudo-alternative (which avoids the multiple INNER JOINs), can you try something like this? I tested with just three conditions. Your conditions go in the inner query. To select only the records that meet all conditions, change the last WHERE condition to the number of conditions you wrote (WHERE max = 3 here; with 5 conditions, write WHERE max = 5). I renamed the columns group and option for ease of use.
It's just an idea, so please run some tests, check the performance, and let me know...
CREATE TABLE CRITERIA (ARTICLE_ID INT, GROU VARCHAR(10), OPT VARCHAR(20), OPTION_VAL NUMERIC(12,2));
CREATE TABLE ARTICLES (ID INT);
INSERT INTO CRITERIA VALUES (100,'size','35',35);
INSERT INTO CRITERIA VALUES (100,'size','36',36);
INSERT INTO CRITERIA VALUES (100,'color','40',40);
INSERT INTO CRITERIA VALUES (100,'gender','1',1);
INSERT INTO CRITERIA VALUES (200,'size','36.2',36.2);
INSERT INTO CRITERIA VALUES (300,'size','36.2',36.2);
INSERT INTO ARTICLES VALUES (100);
INSERT INTO ARTICLES VALUES (200);
INSERT INTO ARTICLES VALUES (300);
-------------------------------------------------------
SELECT D.article_id, D.GROU, D.OPT
FROM (SELECT C.*
           , @o:=CASE WHEN @h=ARTICLE_ID THEN @o ELSE cumul END max
           , @h:=ARTICLE_ID AS a_id
      FROM (SELECT article_id,
                   B.GROU, B.OPT,
                   @r:= CASE WHEN @g = B.ARTICLE_ID THEN @r+1 ELSE 1 END cumul,
                   @g:= B.ARTICLE_ID g
            FROM CRITERIA B
            CROSS JOIN (SELECT @g:=0, @r:=0) T1
            WHERE (B.GROU='gender' AND B.OPT IN ('1'))
               OR (B.GROU='color' AND B.OPT IN ('40', '30'))
               OR (B.GROU='size' AND B.OPT BETWEEN 35.500000 AND 36.500000)
            ORDER BY article_id
           ) C
      CROSS JOIN (SELECT @o:=0, @h:=0) T2
      ORDER BY ARTICLE_ID, CUMUL DESC) D
WHERE max=3
;
Output:
article_id GROU OPT
100 gender 1
100 color 40
100 size 36

Related

Mysql Select Two Rows in One

I need to select, in one row grouped by the same key, two different values from the same field:
TABLE
attribute_id
entity_id
value
DATA
attribute_id|entity_id| value
85| 220| 4740
257| 220|image1.png
And need this result:
attribute_id 85 as SKU and attribute_id 257 as IMAGE, like this:
SKU | IMAGE
4740 | image1.png
How can I do this? TIA!
I think this does what you want:
select ts.entity_id, ts.value as sku, ti.value as image
from t ts join
t ti
on ts.entity_id = ti.entity_id and
ts.attribute_id = 85 and
ti.attribute_id = 257;
You can also solve this using conditional aggregation:
select t.entity_id,
max(case when t.attribute_id = 85 then t.value end) as sku,
max(case when t.attribute_id = 257 then t.value end) as image
from t
group by t.entity_id;
If attribute_id|entity_id combinations are unique across the table, you don't need to group the data; just join like this:
http://sqlfiddle.com/#!9/1b2f60/2
SELECT a.entity_id,
a.value AS some_attribute1,
b.value AS image
FROM attribs a
LEFT JOIN attribs b
ON a.entity_id = b.entity_id
AND b.attribute_id = 257
WHERE a.attribute_id = 85
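For anyone who wants to see the conditional aggregation variant run end to end, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption). The table name t and the first two rows come from the question; entity 221 is a hypothetical extra row added to show what happens when an attribute is missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (attribute_id INTEGER, entity_id INTEGER, value TEXT);
    INSERT INTO t VALUES
        (85,  220, '4740'),
        (257, 220, 'image1.png'),
        (85,  221, '4741');   -- hypothetical entity without an image row
""")

# Each CASE yields the value only for "its" attribute and NULL otherwise;
# MAX() then collapses the group to the single non-NULL value (or NULL).
pivot_rows = conn.execute("""
    SELECT entity_id,
           MAX(CASE WHEN attribute_id = 85  THEN value END) AS sku,
           MAX(CASE WHEN attribute_id = 257 THEN value END) AS image
    FROM t
    GROUP BY entity_id
    ORDER BY entity_id
""").fetchall()
print(pivot_rows)
```

An entity with no row for attribute 257 still appears, with a NULL image, which matches the behaviour of the LEFT JOIN variant.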

MySQL Request with IN clause, with array of couple identifier

I have a user table with a pair of columns as its identifier, id and type, like this:
id | type | name
----------------
15 | 1 | AAA
16 | 1 | BBB
15 | 2 | CCC
I would like to get a list, matching both id and type.
I currently use a concat system, which works :
SELECT u.id,
u.type,
u.name
FROM user u
WHERE CONCAT(u.id, '-', u.type) IN ('15-1', '16-1', '17-1', '10-2', '15-2')
But I have the feeling it could be better. What would be the proper way to do it?
Thank you !
You may use the following approach in mysql
with dat as
(
select 17 id, 1 type, 'AAA' t
union all
select 16 id, 1 type, 'BBB' t
union all
select 17 id, 2 type, 'CCC' t
)
-- end of testing data
select *
from dat
where (id, type) in (
-- query data
(17, 1), (16, 1)
)
IN can operate on "tuples" of values, like this (a, b) IN ((c,d), (e,f), ....). Using this method is (should be) faster as you are not doing a concat operation on "a" and "b" and then comparing strings; instead you are comparing pairs of values, unprocessed and with an appropriate comparison operation (i.e. not always string compares).
Additionally, if "a" and/or "b" are string values themselves using the concat technique risks ambiguous results. ("1-2","3") and ("1","2-3") pairs concat to the same result "1-2-3"
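Here is a rough sketch of both points, run through Python's sqlite3 module as a stand-in for MySQL (an assumption). Note one dialect difference: MySQL accepts `(id, type) IN ((15, 1), (16, 1))` directly, while SQLite wants the pair list written as `VALUES (...), (...)`. The helper concat_key is a hypothetical name used only to show the ambiguity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER, type INTEGER, name TEXT);
    INSERT INTO user VALUES (15, 1, 'AAA'), (16, 1, 'BBB'), (15, 2, 'CCC');
""")

# Pairs to look up; each pair is compared column by column, no string building.
wanted = [(15, 1), (16, 1), (17, 1), (10, 2)]
placeholders = ", ".join("(?, ?)" for _ in wanted)
params = [value for pair in wanted for value in pair]

tuple_rows = conn.execute(
    f"SELECT id, type, name FROM user "
    f"WHERE (id, type) IN (VALUES {placeholders}) "
    f"ORDER BY id, type",
    params,
).fetchall()
print(tuple_rows)


def concat_key(a, b):
    """Hypothetical helper mimicking CONCAT(a, '-', b)."""
    return f"{a}-{b}"


# Two different pairs collapse to the same concatenated key.
ambiguous = concat_key("1-2", "3") == concat_key("1", "2-3")
print(ambiguous)
```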
You can separate them out. Not sure if it's more efficient but at least you would save the concat part :
SELECT u.id,
u.type,
u.name
FROM user u
WHERE (u.id = 15 AND u.type = 1)
OR (u.id = 16 AND u.type = 1)
OR (u.id = 17 AND u.type = 1)
OR (u.id = 10 AND u.type = 2)
OR (u.id = 15 AND u.type = 2)
I think it depends a lot on how you obtain the id and type values that you filter on.
If they are the result of another computation, they can be saved in a temporary table and used in a join:
create TEMPORARY TABLE criteria as
select 15 as id, 1 as type
UNION
select 16 as id, 1 as type
UNION
select 17 as id, 1 as type
UNION
select 10 as id, 2 as type
UNION
select 15 as id, 2 as type;

SELECT u.id,
       u.type,
       u.name
FROM user u
inner join criteria c on u.type = c.type and u.id = c.id;
The other option is an inner query and then a join, or a WITH clause (which is a rather late addition to MySQL's arsenal of tricks).
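A minimal sketch of the temporary-table route, again using Python's sqlite3 module in place of MySQL (an assumption). The pairs are loaded with executemany, which is how they would typically arrive from application code; the sample user rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER, type INTEGER, name TEXT);
    INSERT INTO user VALUES (15, 1, 'AAA'), (16, 1, 'BBB'), (15, 2, 'CCC');
    CREATE TEMPORARY TABLE criteria (id INTEGER, type INTEGER);
""")

# Load the (id, type) pairs to match into the temporary table.
conn.executemany(
    "INSERT INTO criteria VALUES (?, ?)",
    [(15, 1), (16, 1), (17, 1), (10, 2), (15, 2)],
)

# The join keeps only users whose (id, type) pair exists in criteria.
joined_rows = conn.execute("""
    SELECT u.id, u.type, u.name
    FROM user u
    INNER JOIN criteria c ON u.type = c.type AND u.id = c.id
    ORDER BY u.id, u.type
""").fetchall()
print(joined_rows)
```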

How can I optimise mySQL to use JOINs instead of nested IN queries?

I have a query which combines a user's balance at a number of locations and uses a nested subquery to combine data from the customer_balance table and the merchant_groups table. There is a second piece of data required from the customer_balance table that is unique to each merchant.
I'd like to optimise my query to return a sum and a unique value i.e. the order of results is important.
For instance, there may be three merchants in a merchant_group:
id | group_id | group_member_id
1 | 12 | 36
2 | 12 | 70
3 | 12 | 106
The user may have a balance at 2 locations but not all in the customer_balance table:
id | group_member_id | user_id | balance | personal_note
1 | 36 | 420 | 1.00 | "Likes chocolate"
2 | 70 | 420 | 20.00 | null
Notice there isn't a 3rd row in the balance table.
What I'd like to end up with is the ability to pull the sum of the balance as well as the most appropriate personal_note.
So far I have this working in all situations with the following query:
SELECT sum(c.cash_balance) as cash_balance,n.customer_note FROM customer_balance AS c
LEFT JOIN (SELECT customer_note, user_id FROM customer_balance
WHERE user_id = 420 AND group_member_id = 36) AS n on c.user_id = n.user_id
WHERE c.user_id = 420 AND c.group_id IN (SELECT group_member_id FROM merchant_group WHERE group_id = 12)
I can change out the group_member_id appropriately and I will always get the combined balance as expected and the appropriate note. i.e. what I'm looking for is:
balance: 21.00
customer_note: "Likes Chocolate" OR null (depending on the group_member_id)
Is it possible to optimise this query without using resource heavy nested queries e.g. using a JOIN? (or some other method).
I have tried a number of options, but cannot get it working in all situations. The following is the closest I have gotten, except this doesn't return the correct note:
SELECT sum(cb.balance), cb.personal_note FROM customer_balance AS cb
LEFT JOIN merchant_group AS mg on mg.group_member_id = cb.group_member_id
WHERE cb.user_id = 420 && mg.group_id = 12
ORDER BY (mg.group_member_id = 106)
I also tried another option (but since lost the query) that works, but not when the group_member_id = 106 - because there was no record in one table (but this is a valid use case that I'd like to cater for).
Thanks!
This should be equivalent but without subselect
SELECT
sum(c.cash_balance) as cash_balance
, n.customer_note
FROM customer_balance AS c
LEFT JOIN customer_balance as n on ( c.user_id = n.user_id AND n.group_member_id = 36 AND n.user_id = 420 )
INNER JOIN merchant_group as mg on ( c.group_id = mg.group_member_id AND mg.group_id = 12)
WHERE c.user_id = 420
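To check the join-only version against the sample data, here is a sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption). It uses the column names from the sample tables (balance, personal_note, group_member_id) rather than the ones in the queries, wraps the note in MAX() so the select list is valid alongside SUM(), and balance_and_note is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE merchant_group (
        id INTEGER, group_id INTEGER, group_member_id INTEGER
    );
    INSERT INTO merchant_group VALUES (1, 12, 36), (2, 12, 70), (3, 12, 106);

    CREATE TABLE customer_balance (
        id INTEGER, group_member_id INTEGER, user_id INTEGER,
        balance REAL, personal_note TEXT
    );
    INSERT INTO customer_balance VALUES
        (1, 36, 420,  1.00, 'Likes chocolate'),
        (2, 70, 420, 20.00, NULL);
""")


def balance_and_note(conn, user_id, group_id, note_member_id):
    """Sum the user's balances across a merchant group and fetch the note
    stored for one specific merchant (NULL if that merchant has no row)."""
    return conn.execute("""
        SELECT SUM(c.balance)       AS balance,
               MAX(n.personal_note) AS personal_note
        FROM customer_balance AS c
        INNER JOIN merchant_group AS mg
                ON mg.group_member_id = c.group_member_id
               AND mg.group_id = ?
        LEFT JOIN customer_balance AS n
               ON n.user_id = c.user_id
              AND n.group_member_id = ?
        WHERE c.user_id = ?
    """, (group_id, note_member_id, user_id)).fetchone()


print(balance_and_note(conn, 420, 12, 36))
print(balance_and_note(conn, 420, 12, 106))
```

Merchant 106 has no balance row, so the LEFT JOIN finds nothing and the note comes back as NULL while the sum is unchanged.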

Conditional condition in ON clause

I am trying to apply a conditional condition inside ON clause of a LEFT JOIN. What I am trying to achieve is somewhat like this:
Pseudo Code
SELECT * FROM item AS i
LEFT JOIN sales AS s ON i.sku = s.item_no
AND (some condition)
AND (
IF (s.type = 0 AND s.code = 'me')
ELSEIF (s.type = 1 AND s.code = 'my-group')
ELSEIF (s.type = 2)
)
I want the query to return the row, if it matches any one of the conditions (Edit: and if it matches one, should omit the rest for the same item).
Sample Data
Sales
item_no | type | code | price
1 | 0 | me | 10
1 | 1 | my-group | 12
1 | 2 | | 14
2 | 1 | my-group | 20
2 | 2 | | 22
3 | 2 | | 30
4 | 0 | not-me | 40
I want the query to return
item_no | type | code | price
1 | 0 | me | 10
2 | 1 | my-group | 20
3 | 2 | | 30
Edit: The sales table is used to apply special prices for individual users, user groups, and/or all users.
if type = 0, code contains username. (for a single user)
if type = 1, code contains user-group. (for users in a group)
if type = 2, code contains empty-string (for all users).
Use the following SQL (assuming the sales table has a unique id field, as usual in Yii):
SELECT * FROM item AS i
LEFT JOIN sales AS s ON i.sku = s.item_no
AND s.id = (
    -- pick the single best-matching sales row for this item: lowest type wins
    SELECT id FROM sales
    WHERE item_no = i.sku
    AND ((type = 0 AND code = 'me') OR
         (type = 1 AND code = 'my-group') OR
         type = 2)
    ORDER BY type
    LIMIT 1
)
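Here is a quick way to check that approach against the sample data from the question, sketched with Python's sqlite3 module standing in for MySQL (an assumption). The item table contents and the autoincrementing id column on sales are assumptions of this sketch; the user name and group are bound as parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (sku INTEGER);
    INSERT INTO item VALUES (1), (2), (3), (4);

    CREATE TABLE sales (
        id INTEGER PRIMARY KEY, item_no INTEGER,
        type INTEGER, code TEXT, price INTEGER
    );
    INSERT INTO sales (item_no, type, code, price) VALUES
        (1, 0, 'me',       10),
        (1, 1, 'my-group', 12),
        (1, 2, '',         14),
        (2, 1, 'my-group', 20),
        (2, 2, '',         22),
        (3, 2, '',         30),
        (4, 0, 'not-me',   40);
""")

# For every item, join only the applicable sales row with the lowest type:
# user price (0) beats group price (1), which beats the price for all (2).
best_prices = conn.execute("""
    SELECT i.sku, s.type, s.code, s.price
    FROM item AS i
    LEFT JOIN sales AS s
           ON s.item_no = i.sku
          AND s.id = (
              SELECT s2.id
              FROM sales AS s2
              WHERE s2.item_no = i.sku
                AND ((s2.type = 0 AND s2.code = ?)
                  OR (s2.type = 1 AND s2.code = ?)
                  OR  s2.type = 2)
              ORDER BY s2.type
              LIMIT 1
          )
    ORDER BY i.sku
""", ("me", "my-group")).fetchall()
print(best_prices)
```

Item 4 only has a price for another user, so the LEFT JOIN returns it with NULLs instead of dropping it.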
Try the following:
SELECT *,
SUBSTRING_INDEX(GROUP_CONCAT(s.type ORDER BY s.type), ',', 1) AS `type`,
SUBSTRING_INDEX(GROUP_CONCAT(s.code ORDER BY s.type), ',', 1) AS `code`,
SUBSTRING_INDEX(GROUP_CONCAT(s.price ORDER BY s.type), ',', 1) AS `price`
FROM item AS i
LEFT JOIN sales AS s
ON i.sku = s.item_no AND (SOME CONDITION)
GROUP BY i.sku

JOIN vs UNION vs IN() - big tables and many WHERE conditions

I use MySQL 5.5 and I have 3 tables created for testing:
attributes (entity_id, cid, aid, value) - indexes: ALL
items (entity_id, price, currency) - indexes: entity_id
rates (currency_from, currency_to, rate) - indexes: NONE
I need to count the results for specified conditions (search by attributes) and select X rows ordered by some column.
The query should support searching in item attributes (attributes table).
I have a query like this at first:
SELECT i.entity_id, i.price * COALESCE(r.rate, 1) AS final_price
FROM items i
JOIN attributes a ON a.entity_id = i.entity_id
LEFT JOIN rates r ON i.currency = r.currency_from AND r.currency_to = 'EUR'
WHERE a.cid = 4 AND ( (a.aid >= 10 AND a.value > 2000) OR (a.aid <= 10 AND a.value > 5) )
HAVING final_price BETWEEN 0 AND 9000
ORDER BY final_price DESC
LIMIT 20
but it's quite slow on big tables. The WHERE conditions can grow (up to 30 parameters) and sometimes use CAST(a.value AS SIGNED) with BETWEEN for range values.
For example:
SELECT
i.entity_id,
i.price * COALESCE(r.rate, 1) AS final_price
FROM
attributes a
JOIN items i
ON a.entity_id = i.entity_id
LEFT JOIN rates r
ON i.currency = r.currency_from
AND r.currency_to = 'EUR'
WHERE
a.cid = 4 AND (
(a.aid = 10 AND CAST(a.value AS SIGNED) BETWEEN 2000 AND 2014)
OR (a.aid = 121 AND CAST(a.value AS SIGNED) BETWEEN 40 AND 60)
OR (a.aid = 45 AND CAST(a.value AS SIGNED) BETWEEN 770 AND 1500)
OR (a.aid = 95 AND CAST(a.value AS SIGNED) BETWEEN 12770 AND 15500)
OR (a.aid = 98 AND a.value = 'some value')
OR (a.aid = 199 AND a.value = 'some another value')
OR (a.aid = 102 AND a.value = 1)
OR (a.aid = 112 AND a.value = 42) )
GROUP BY
i.entity_id
HAVING
COUNT(i.entity_id) = 7
AND final_price BETWEEN 0 AND 9000
ORDER BY
final_price DESC
LIMIT 20
I group by COUNT() equal to 7 (number of attributes to search), because I need to find items with all these attributes.
EXPLAIN for the base query (the first one):
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | a | ALL | entity_id,value | NULL | NULL | NULL | 379999 | Using where; Using temporary; Using filesort
1 | SIMPLE | i | eq_ref | PRIMARY | PRIMARY | 4 | testowa.a.entity_id | 1 | Using where
1 | SIMPLE | r | ALL | NULL | NULL | NULL | NULL | 2 |
I have read many topics comparing UNION vs JOIN vs IN(). The second option (JOIN) gives the best results, but it is still too slow.
Is there any way to get better performance here? Why is it so slow?
Should I think about moving some logic (split this query to 3 small) to backend (php/ror) code?
I would restructure your query slightly and have the attributes table first
and then joined to the items. Also, I would have a covering index on the
items table via (entity_id, price) and an index on your attributes table
ON (cid, aid, value, entity_id), and your rates table index
ON (currency_from, currency_to, rate). This way, all are covering indexes
and the engine won't need to go to the raw data pages to get the data, it can
pull it from the indexes it is already using for the joining / criteria.
SELECT
i.entity_id,
i.price * COALESCE(r.rate, 1) AS final_price
FROM
attributes a
JOIN items i
ON a.entity_id = i.entity_id
LEFT JOIN rates r
ON i.currency = r.currency_from
AND r.currency_to = 'EUR'
WHERE
a.cid = 4 AND ( (a.aid >= 10 AND a.value > 2000) OR (a.aid <= 10 AND a.value > 5) )
HAVING
final_price BETWEEN 0 AND 9000
ORDER BY
final_price DESC
LIMIT 20
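To illustrate what "covering" means in practice, here is a small sketch that creates the suggested attributes index and asks the planner how it would run a lookup. It uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN (an assumption; MySQL reports the same situation as "Using index" in the Extra column). The index name idx_attr_cover is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attributes (
        entity_id INTEGER, cid INTEGER, aid INTEGER, value INTEGER
    );
    -- Filter columns first, then the column the query returns, so the
    -- index alone can answer the query without touching the table rows.
    CREATE INDEX idx_attr_cover ON attributes (cid, aid, value, entity_id);
""")

plan_rows = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT entity_id
    FROM attributes
    WHERE cid = 4 AND aid = 10 AND value > 2000
""").fetchall()

# The last column of each plan row holds the human-readable description.
plan_text = " ".join(row[-1] for row in plan_rows)
uses_covering_index = "COVERING INDEX" in plan_text
print(plan_text)
```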
So, although this would help the query you have provided, could you show some other where you would have many more criteria conditions... you mentioned it could be as many (or more) than 30. Looking at more might alter the query slightly.
As for your updated query with multiple criteria, I would then add an IN() clause for all the "aid" values after the "a.cid = 4". This way, before it has to hit all the "OR" conditions, if it fails on the "aid" not being one you consider, it never has to hit those... such as
a.cid = 4
AND a.aid in ( 10, 121, 45, 95, 98, 199, 102 )
AND ( rest of the complex aid, casting and between criteria )
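Putting the IN() pre-filter together with the "count the matched attributes" check from the question, a minimal sketch could look like the following. It runs on Python's sqlite3 module instead of MySQL (an assumption), so CAST(... AS INTEGER) replaces CAST(... AS SIGNED); the sample rows are made up, and COUNT(DISTINCT a.aid) is used instead of COUNT(i.entity_id) as a guard against duplicate attribute rows, which is a choice of this sketch and not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attributes (
        entity_id INTEGER, cid INTEGER, aid INTEGER, value TEXT
    );
    -- hypothetical sample rows
    INSERT INTO attributes VALUES
        (1, 4, 10,  '2005'), (1, 4, 121, '50'), (1, 4, 98, 'some value'),
        (2, 4, 10,  '2005'), (2, 4, 121, '99'), (2, 4, 98, 'some value'),
        (3, 4, 10,  '1990'), (3, 4, 121, '50'),
        (4, 4, 10,  '2010'), (4, 4, 121, '41'), (4, 4, 98, 'some value'),
        (4, 4, 7,   'x');
""")

# The IN() list discards irrelevant attribute rows before the OR chain is
# evaluated; HAVING then keeps entities that matched all three attributes.
entity_ids = [row[0] for row in conn.execute("""
    SELECT a.entity_id
    FROM attributes a
    WHERE a.cid = 4
      AND a.aid IN (10, 121, 98)
      AND ( (a.aid = 10  AND CAST(a.value AS INTEGER) BETWEEN 2000 AND 2014)
         OR (a.aid = 121 AND CAST(a.value AS INTEGER) BETWEEN 40 AND 60)
         OR (a.aid = 98  AND a.value = 'some value') )
    GROUP BY a.entity_id
    HAVING COUNT(DISTINCT a.aid) = 3
    ORDER BY a.entity_id
""")]
print(entity_ids)
```

Entity 2 fails the range check on attribute 121 and entity 3 fails on attribute 10 and has no attribute 98, so only entities 1 and 4 remain.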