Optimizing SQL request

Optimizing SQL request - mysql

Im using PDO Mysql, and made a request to select cheapest offers for a product in my database. It works fine, only problem is it is slow (for 200 offers (and still just 25 to return)) it takes almost a second, which is a lot higher than what I aim.
I'm no expert in SQL, so I seek for your help on this matter. Here is the request and I'll be happy to provide more info if needed :
SELECT
mo.id AS id,
mo.stock AS stock,
mo.price AS price,
mo.promotional_price AS promotional_price,
mo.picture_1 AS picture_1,
mo.picture_2 AS picture_2,
mo.picture_3 AS picture_3,
mo.picture_4 AS picture_4,
mo.picture_5 AS picture_5,
mo.title AS title,
mo.description AS description,
mo.state AS state,
mo.is_new AS is_new,
mo.is_original AS is_original,
c.name AS name,
u.id AS user_id,
u.username AS username,
u.postal_code AS postal_code,
p.name AS country_name,
ra.cache_rating_avg AS cache_rating_avg,
ra.cache_rating_nb AS cache_rating_nb,
GROUP_CONCAT(md.delivery_mode_id SEPARATOR ', ') AS delivery_mode_ids,
GROUP_CONCAT(ri.title SEPARATOR ', ') AS delivery_mode_titles
FROM
mp_offer mo, catalog_product_i18n c,
ref_country_i18n p, mp_offer_delivery_mode md,
ref_delivery_mode r,
ref_delivery_mode_i18n ri, user u
LEFT JOIN mp_user_review_rating_i18n ra
ON u.id = ra.user_id
WHERE (mo.product_id = c.id
AND mo.culture = c.culture
AND mo.user_id = u.id
AND u.country_id = p.id
AND mo.id = md.offer_id
AND md.delivery_mode_id = ri.id
AND mo.culture = ri.culture)
AND (mo.culture = 1
AND p.culture = 1)
AND mo.is_deleted = 0
AND mo.product_id = 60
AND ((u.holiday_start IS NULL)
OR (u.holiday_start = '0000-00-00')
OR (u.holiday_end IS NULL)
OR (u.holiday_end = '0000-00-00')
OR (u.holiday_start > '2012-05-03')
OR (u.holiday_end < '2012-05-03'))
AND mo.stock > 0
GROUP BY mo.id
ORDER BY IF (mo.promotional_price IS NULL,
mo.price,
LEAST(mo.price, mo.promotional_price)) ASC
LIMIT 25 OFFSET 0;
I take the offers for a particular product that have their "culture" set to 1, are not deleted, that have some stock and whose seller is not in holidays. I order by price (promotional_price when there is one).
Is LEAST a slow function?
Here is the output of EXPLAIN :
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE c const PRIMARY,catalog_product_i18n_product,catalog_product_i18n_culture PRIMARY 8 const,const 1 "Using temporary; Using filesort"
1 SIMPLE mo ref PRIMARY,culture,is_deleted,product_id,user_id culture 4 const 3 "Using where with pushed condition"
1 SIMPLE u eq_ref PRIMARY,user_country PRIMARY 4 database.mo.user_id 1 "Using where with pushed condition"
1 SIMPLE p eq_ref PRIMARY,ref_country_i18n_culture PRIMARY 8 database.u.country_id,const 1
1 SIMPLE r ALL NULL NULL NULL NULL 3 "Using join buffer"
1 SIMPLE ra ALL NULL NULL NULL NULL 4
1 SIMPLE md ref PRIMARY,fk_offer_has_delivery_mode_delivery_mode1,fk_offer_has_delivery_mode_offer1 PRIMARY 4 database.mo.id 2
1 SIMPLE ri eq_ref PRIMARY PRIMARY 2 database.md.delivery_mode_id,const 1
Thanks in advance for your help on optimizing this request.
J

You are not making use of ref_delivery_mode table that you have included in from clause. It's getting cause of Cartesian product of tables result.

Related

Why does OR in subquery make query so much slower?

I'm using MySQL and have the following query that I was trying to improve:
SELECT
*
FROM
overpayments AS op
JOIN payment_allocations AS overpayment_pa ON overpayment_pa.allocatable_id = op.id
AND overpayment_pa.allocatable_type = 'Overpayment'
JOIN (
SELECT
pa.payment_source_type,
pa.payment_source_id,
ft.conversion_rate
FROM
payment_allocations AS pa
LEFT JOIN line_items AS li ON pa.payment_source_id = li.id
LEFT JOIN credit_notes AS cn ON li.parent_document_id = cn.id
LEFT JOIN financial_transactions AS ft ON (
ft.commercial_document_id = pa.payment_source_id
AND ft.commercial_document_type = pa.payment_source_type
)
OR (
ft.commercial_document_id = cn.id
AND ft.commercial_document_type = 'CreditNote'
)
WHERE
pa.allocatable_type = 'Overpayment'
AND pa.company_id = 14792
AND ft.company_id = 14792
) AS op_bank_transaction_ft ON op_bank_transaction_ft.payment_source_id = overpayment_pa.payment_source_id
AND op_bank_transaction_ft.payment_source_type = overpayment_pa.payment_source_type;
It takes 10s to run. I was able to improve to 0.047s it by removing the OR statement in the subquery and using COALESCE to get the result:
SELECT
*
FROM
overpayments AS op
JOIN payment_allocations AS overpayment_pa ON overpayment_pa.allocatable_id = op.id
AND overpayment_pa.allocatable_type = 'Overpayment'
JOIN (
SELECT
pa.payment_source_type,
pa.payment_source_id,
coalesce(ft_one.conversion_rate, ft_two.conversion_rate)
FROM
payment_allocations AS pa
LEFT JOIN line_items AS li ON pa.payment_source_id = li.id
LEFT JOIN credit_notes AS cn ON li.parent_document_id = cn.id
LEFT JOIN financial_transactions AS ft_one ON (
ft_one.commercial_document_id = pa.payment_source_id
AND ft_one.commercial_document_type = pa.payment_source_type
AND ft_one.company_id = 14792
)
LEFT JOIN financial_transactions AS ft_two ON (
ft_two.commercial_document_id = cn.id
AND ft_two.commercial_document_type = 'CreditNote'
AND ft_two.company_id = 14792
)
WHERE
pa.allocatable_type = 'Overpayment'
AND pa.company_id = 14792
) AS op_bank_transaction_ft ON op_bank_transaction_ft.payment_source_id = overpayment_pa.payment_source_id
AND op_bank_transaction_ft.payment_source_type = overpayment_pa.payment_source_type;
However, I don't really understand why that worked? The original sub query ran very quickly and only returned 2 results, so why would it slow down the query by so much? Explain on the first query returns the following:
# id
select_type
table
partitions
type
possible_keys
key
key_len
ref
rows
filtered
Extra
FIELD13
1
SIMPLE
pa
ref
index_payment_allocations_on_payment_source_id
index_payment_allocations_on_company_id
index_payment_allocations_on_company_id
5
const
191
10.00
Using where
1
SIMPLE
overpayment_pa
ref
index_payment_allocations_on_payment_source_id
index_payment_allocations_on_allocatable_id
index_payment_allocations_on_payment_source_id
5
rails.pa.payment_source_id
1
3.42
Using where
1
SIMPLE
op
eq_ref
PRIMARY
PRIMARY
4
rails.overpayment_pa.allocatable_id
1
100.00
1
SIMPLE
li
eq_ref
PRIMARY
PRIMARY
4
rails.pa.payment_source_id
1
100.00
1
SIMPLE
cn
eq_ref
PRIMARY
PRIMARY
8
rails.li.parent_document_id
1
100.00
Using where; Using index
1
SIMPLE
ft
ALL
transactions_unique_by_commercial_doc
12587878
0.00
Range checked for each record (index map: 0x2)
And for the second I get the following:
# id
select_type
table
partitions
type
possible_keys
key
key_len
ref
rows
filtered
Extra
FIELD13
FIELD14
1
SIMPLE
pa
ref
index_payment_allocations_on_payment_source_id
index_payment_allocations_on_company_id
index_payment_allocations_on_company_id
5
const
191
10.00
Using where
1
SIMPLE
overpayment_pa
ref
index_payment_allocations_on_payment_source_id
index_payment_allocations_on_allocatable_id
index_payment_allocations_on_payment_source_id
5
rails.pa.payment_source_id
1
3.42
Using where
1
SIMPLE
op
eq_ref
PRIMARY
PRIMARY
4
rails.overpayment_pa.allocatable_id
1
100.00
1
SIMPLE
ft_one
ref
transactions_unique_by_commercial_doc
index_financial_transactions_on_company_id
transactions_unique_by_commercial_doc
773
rails.pa.payment_source_id
rails.pa.payment_source_type
1
100.00
Using where
1
SIMPLE
li
eq_ref
PRIMARY
PRIMARY
4
rails.pa.payment_source_id
1
100.00
1
SIMPLE
cn
eq_ref
PRIMARY
PRIMARY
8
rails.li.parent_document_id
1
100.00
Using where; Using index
1
SIMPLE
ft_two
ref
transactions_unique_by_commercial_doc
index_financial_transactions_on_company_id
transactions_unique_by_commercial_doc
773
rails.cn.id
const
1
100.00
Using where
but I don't really know how to interpret those results.

Look at the right side of the last row of your first EXPLAIN. It didn't use an index, and it had to scan through megarows. That's slow. Your second query used indexes for every step of the query, so it was much faster.
If your second query yields correct results, use it and don't look back. Congratulations! You've optimized a query.
OR operations, especially in ON clauses, are harder than usual for the query planner module to satisfy, because they often mean it has to take the union of two separate subqueries. It looks like the planner chose to brute-force it in your case. (brute force === scanning many rows.)
Without knowing your indexes, it's hard to help you further.
Read this to learn more. https://use-the-index-luke.com

These may further speed up the second formulation:
overpayment_pa:
INDEX(payment_source_id, payment_source_type, allocatable_type, allocatable_id)
pa: INDEX(allocatable_type, company_id, payment_source_id, payment_source_type)
financial_transactions:
INDEX(commercial_document_id, commercial_document_type, company_id, conversion_rate)

mysql query taking too long to response

Why such query in mysql takes too long to respond:
select s.sid,s.sname,
sum(case WHEN d.dgr_date='2014-12-31' then d.daily_gen end) as dgen,
round(sum(case WHEN d.dgr_date between '2013-12-01' and last_day
('2013-12-31') then d.daily_gen end)/1000000,2) as pmtd
from dgrs d ,locs l, spvs s
where l.mloc=d.mc_loc and s.sid=l.spid
group by s.sname;
index: compound(d.dgr_date, d.daily_gen), d.mc_loc(fk:l.mloc), l.mloc, s.sid
Main table: dgrs(400k rows).
Explain of query
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE s ALL PRIMARY NULL NULL NULL 20 Using temporary; Using filesort
1 SIMPLE l ref PRIMARY,fk_spid fk_spid 4 oms.s.sid 9 Using index
1 SIMPLE d ref fk_mcloc fk_mcloc 102 oms.l.mloc 485 NULL

You should use an indexed field in the GROUP BY clause. I guess s.sid is the PK of table spvs. Use it instead of s.sname.
Make sure the fields that appear in the WHERE clause are indexed.
Put an index on column dgr_date and move the conditions from the second CASE into the where clause:
SELECT s.sid, s.sname,
SUM(IF(d.dgr_date = '2014-12-31', d.daily_gen, 0)) as dgen,
ROUND(SUM(d.daily_gen)/1000000, 2) AS pmtd
FROM dgrs d, locs l, spvs s
WHERE l.mloc = d.mc_loc
AND s.sid = l.spid
AND d.dgr_date BETWEEN '2013-12-01' AND '2013-12-31'
GROUP BY s.sid

mysql join may decrease response time
select s.sid,s.sname,
sum(case WHEN d.dgr_date='2014-12-31' then d.daily_gen end) as dgen,
round(sum(case WHEN d.dgr_date between '2013-12-01' and last_day
('2013-12-31') then d.daily_gen end)/1000000,2) as pmtd
from dgrs d
inner join locs l on l.mloc=d.mc_loc
inner join spvs s on and s.sid=l.spid
group by s.sname;

need to make this query scalable/optimized for larger db in the future (remove full table reads)

Been working at this for awhile now and cannot seem to get it optimized. Although it does work, each left joined logs* table is reading every row in the database regardless if it is part of the set it is joined to (user_id's). While it returns correct results as is, this will be a problem as the user base and db as a whole grows.
Some quick background : given an account id there can be any number of computers to it. On each of those computers there can be any number of users linked to it. These user_id's are then linked in the logs tables. Each of these relationships is indexed (account_id, computer_id, user_id) for the necessary tables.
I have put the left joins in subqueries to prevent a cartesian product (a previous issue which subqueries solved).
Query :
SELECT
users.username as username,
computers.computer_name as computer_name,
l1.cnt as cnt1,
l2.cnt as cnt2,
l3.cnt as cnt3,
l4.cnt as cnt4,
l5.cnt as cnt5,
l6.cnt as cnt6
FROM computers
INNER JOIN users
on users.computer_id = computers.computer_id
LEFT JOIN
(SELECT
user_id,
count(*) as cnt
from logs1
group by user_id
) AS l1
on l1.user_id = users.user_id
LEFT JOIN
(SELECT
user_id,
count(*) as cnt
from logs2
group by user_id
) AS l2
on l2.user_id = users.user_id
LEFT JOIN
(SELECT
user_id,
count(*) as cnt
from logs3
group by user_id
) AS l3
on l3.user_id = users.user_id
LEFT JOIN
(SELECT
user_id,
count(*) as cnt
from logs4
group by user_id
) AS l4
on l4.user_id = users.user_id
LEFT JOIN
(SELECT
user_id,
count(*) as cnt
from logs5
group by user_id
) AS l5
on l5.user_id = users.user_id
LEFT JOIN
(SELECT
user_id,
count(*) as cnt
from logs6
group by user_id
) AS l6
on l6.user_id = users.user_id
WHERE computers.account_id = :cw_account_id AND computers.status = :cw_status
GROUP BY users.user_id
Plan :
computers 1 PRIMARY ref PRIMARY,unique_filter,status unique_filter 4 const 5 Using where; Using temporary; Using filesort
users 1 PRIMARY ref PRIMARY,unique_filter unique_filter 4 stephen_spcplus_inno.computers.computer_id 1 Using index
<derived2> 1 PRIMARY ref <auto_key0> <auto_key0> 4 stephen_spcplus_inno.users.user_id 3
logs1 2 DERIVED index user_id user_id 8 33 Using index
<derived3> 1 PRIMARY ref <auto_key0> <auto_key0> 4 stephen_spcplus_inno.users.user_id 10
logs2 3 DERIVED index user_id user_id 8 101 Using index
<derived4> 1 PRIMARY ref <auto_key0> <auto_key0> 4 stephen_spcplus_inno.users.user_id 4
logs3 4 DERIVED index user_id user_id 8 41 Using index
<derived5> 1 PRIMARY ref <auto_key0> <auto_key0> 4 stephen_spcplus_inno.users.user_id 2
logs4 5 DERIVED index user_id user_id 8 28 Using index
<derived6> 1 PRIMARY ref <auto_key0> <auto_key0> 4 stephen_spcplus_inno.users.user_id 2
logs5 6 DERIVED index user_id user_id 8 28 Using index
<derived7> 1 PRIMARY ref <auto_key0> <auto_key0> 4 stephen_spcplus_inno.users.user_id 275
logs6 7 DERIVED index user_id user_id 775 27516 Using index
example results :
username computer_name cnt1 cnt2 cnt3 cnt4 cnt5 cnt6
testuser COMPUTER_1 1 2 1 (null) (null) 3
testuser2 COMPUTER_1 (null) (null) (null) (null) (null) (null)
someuser COMPUTER_2 32 83 26 15 28 1157
As an example, for logs6 the plan is reading every row in the database (27516) yet there were only 1160 which 'should' have been joined.
I have tried lots of different things, but cannot get this to operate in an optimized manner. As it is currently the reason all the rows from each table are being read is due to the use of COUNT(*) within each joins subquery... removing this and only the needed rows are joined like I want, however, I do not know how to get the counts then in the same grouped result.
Help from any gurus would be great! Yes, I know I do not have a lot of rows in the db, but I can see the results are correct and see that the full table scans are going to be a problem as well.
EDIT (partial solution) :
I have found a partial solution to this problem, but it requires an additional query to get a list of user_ids. By adding WHERE user_id IN (17,22,23) where these are the user_ids which should be joined... to each log table I get the correct results and the entire table is not scanned.
If anyone knows of a way to make this work without this additional query and where additional please let me know.

I simplified your question to 2 log-tables and played around with it a bit on SQLFiddle.
=> http://sqlfiddle.com/#!2/a99e4a/2
It seems that using a sub-query makes things worse in my example data, but I wonder how it handles things when there are much more records in the tables that don't fit the criteria.
I'd suggest you give it a try and see what comes out. I don't have a MySql db to play around with here and I'd rather not bring SqlFiddle to its knees =)

Why does this WHERE clause make my query 180 times slower?

the following query executes in 1.6 seconds
SET #num :=0, #current_shop_id := NULL, #current_product_id := NULL;
#this query limits the results of the query within it by row number (so that only 250 products get displayed per store)
SELECT * FROM (
#this query adds row numbers to the query within it
SELECT *, #num := IF( #current_shop_id = shop_id, IF(#current_product_id=product_id,#num,#num+1), 0) AS row_number, #current_shop_id := shop_id AS shop_dummy, #current_product_id := product_id AS product_dummy FROM (
SELECT shop, shops.shop_id AS
shop_id, p1.product_id AS
product_id
FROM products p1 LEFT JOIN #this LEFT JOIN gets the favorites count for each product
(
SELECT fav3.product_id AS product_id, SUM(CASE
WHEN fav3.current = 1 AND fav3.closeted = 1 THEN 1
WHEN fav3.current = 1 AND fav3.closeted = 0 THEN -1
ELSE 0
END) AS favorites_count
FROM favorites fav3
GROUP BY fav3.product_id
) AS fav4 ON p1.product_id=fav4.product_id
INNER JOIN sex ON sex.product_id=p1.product_id AND
sex.sex=0 AND
sex.date >= SUBDATE(NOW(),INTERVAL 1 DAY)
INNER JOIN shops ON shops.shop_id = p1.shop_id
ORDER BY shop, sex.DATE, product_id
) AS testtable
) AS rowed_results WHERE
rowed_results.row_number>=0 AND
rowed_results.row_number<(7)
adding AND shops.shop_id=86 to the final WHERE clause causes the query to execute in 292 seconds:
SET #num :=0, #current_shop_id := NULL, #current_product_id := NULL;
#this query limits the results of the query within it by row number (so that only 250 products get displayed per store)
SELECT * FROM (
#this query adds row numbers to the query within it
SELECT *, #num := IF( #current_shop_id = shop_id, IF(#current_product_id=product_id,#num,#num+1), 0) AS row_number, #current_shop_id := shop_id AS shop_dummy, #current_product_id := product_id AS product_dummy FROM (
SELECT shop, shops.shop_id AS
shop_id, p1.product_id AS
product_id
FROM products p1 LEFT JOIN #this LEFT JOIN gets the favorites count for each product
(
SELECT fav3.product_id AS product_id, SUM(CASE
WHEN fav3.current = 1 AND fav3.closeted = 1 THEN 1
WHEN fav3.current = 1 AND fav3.closeted = 0 THEN -1
ELSE 0
END) AS favorites_count
FROM favorites fav3
GROUP BY fav3.product_id
) AS fav4 ON p1.product_id=fav4.product_id
INNER JOIN sex ON sex.product_id=p1.product_id AND
sex.sex=0 AND
sex.date >= SUBDATE(NOW(),INTERVAL 1 DAY)
INNER JOIN shops ON shops.shop_id = p1.shop_id AND
shops.shop_id=86
ORDER BY shop, sex.DATE, product_id
) AS testtable
) AS rowed_results WHERE
rowed_results.row_number>=0 AND
rowed_results.row_number<(7)
I would have thought limiting the shops table with AND shops.shop_id=86 would reduce execution time. Instead, execution time appears to depend upon the number of rows in the products table with products.shop_id equal to the specified shops.shop_id. There are about 34K rows in the products table with products.shop_id=86, and execution time is 292 seconds. For products.shop_id=50, there are about 28K rows, and execution time is 210 seconds. For products.shop_id=175, there are about 2K rows, and execution time is 2.8 seconds. What is going on?
EXPLAIN EXTENDED for the 1.6 second query is:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 1203 100.00 Using where
2 DERIVED <derived3> ALL NULL NULL NULL NULL 1203 100.00
3 DERIVED sex ALL product_id_2,product_id NULL NULL NULL 526846 75.00 Using where; Using temporary; Using filesort
3 DERIVED p1 eq_ref PRIMARY,shop_id,shop_id_2,product_id,shop_id_3 PRIMARY 4 mydatabase.sex.product_id 1 100.00
3 DERIVED <derived4> ALL NULL NULL NULL NULL 14752 100.00
3 DERIVED shops eq_ref PRIMARY PRIMARY 4 mydatabase.p1.shop_id 1 100.00
4 DERIVED fav3 ALL NULL NULL NULL NULL 15356 100.00 Using temporary; Using filesort
SHOW WARNINGS for this EXPLAIN EXTENDED is
-----+
| Note | 1003 | select `rowed_results`.`shop` AS `shop`,`rowed_results`.`shop_id` AS `shop_id`,`rowed_results`.`product_id` AS `product_id`,`rowed_results`.`row_number` AS `row_number`,`rowed_results`.`shop_dummy` AS `shop_dummy`,`rowed_results`.`product_dummy` AS `product_dummy` from (select `testtable`.`shop` AS `shop`,`testtable`.`shop_id` AS `shop_id`,`testtable`.`product_id` AS `product_id`,(#num:=if(((#current_shop_id) = `testtable`.`shop_id`),if(((#current_product_id) = `testtable`.`product_id`),(#num),((#num) + 1)),0)) AS `row_number`,(#current_shop_id:=`testtable`.`shop_id`) AS `shop_dummy`,(#current_product_id:=`testtable`.`product_id`) AS `product_dummy` from (select `mydatabase`.`shops`.`shop` AS `shop`,`mydatabase`.`shops`.`shop_id` AS `shop_id`,`mydatabase`.`p1`.`product_id` AS `product_id` from `mydatabase`.`products` `p1` left join (select `mydatabase`.`fav3`.`product_id` AS `product_id`,sum((case when ((`mydatabase`.`fav3`.`current` = 1) and (`mydatabase`.`fav3`.`closeted` = 1)) then 1 when ((`mydatabase`.`fav3`.`current` = 1) and (`mydatabase`.`fav3`.`closeted` = 0)) then -(1) else 0 end)) AS `favorites_count` from `mydatabase`.`favorites` `fav3` group by `mydatabase`.`fav3`.`product_id`) `fav4` on(((`mydatabase`.`p1`.`product_id` = `mydatabase`.`sex`.`product_id`) and (`fav4`.`product_id` = `mydatabase`.`sex`.`product_id`))) join `mydatabase`.`sex` join `mydatabase`.`shops` where ((`mydatabase`.`sex`.`sex` = 0) and (`mydatabase`.`p1`.`product_id` = `mydatabase`.`sex`.`product_id`) and (`mydatabase`.`shops`.`shop_id` = `mydatabase`.`p1`.`shop_id`) and (`mydatabase`.`sex`.`date` >= (now() - interval 1 day))) order by `mydatabase`.`shops`.`shop`,`mydatabase`.`sex`.`date`,`mydatabase`.`p1`.`product_id`) `testtable`) `rowed_results` where ((`rowed_results`.`row_number` >= 0) and (`rowed_results`.`row_number` < 7)) |
+------
EXPLAIN EXTENDED for the 292 second query is:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 36 100.00 Using where
2 DERIVED <derived3> ALL NULL NULL NULL NULL 36 100.00
3 DERIVED shops const PRIMARY PRIMARY 4 1 100.00 Using temporary; Using filesort
3 DERIVED p1 ref PRIMARY,shop_id,shop_id_2,product_id,shop_id_3 shop_id 4 11799 100.00
3 DERIVED <derived4> ALL NULL NULL NULL NULL 14752 100.00
3 DERIVED sex eq_ref product_id_2,product_id product_id_2 5 mydatabase.p1.product_id 1 100.00 Using where
4 DERIVED fav3 ALL NULL NULL NULL NULL 15356 100.00 Using temporary; Using filesort
SHOW WARNINGS for this EXPLAIN EXTENDED is
----+
| Note | 1003 | select `rowed_results`.`shop` AS `shop`,`rowed_results`.`shop_id` AS `shop_id`,`rowed_results`.`product_id` AS `product_id`,`rowed_results`.`row_number` AS `row_number`,`rowed_results`.`shop_dummy` AS `shop_dummy`,`rowed_results`.`product_dummy` AS `product_dummy` from (select `testtable`.`shop` AS `shop`,`testtable`.`shop_id` AS `shop_id`,`testtable`.`product_id` AS `product_id`,(#num:=if(((#current_shop_id) = `testtable`.`shop_id`),if(((#current_product_id) = `testtable`.`product_id`),(#num),((#num) + 1)),0)) AS `row_number`,(#current_shop_id:=`testtable`.`shop_id`) AS `shop_dummy`,(#current_product_id:=`testtable`.`product_id`) AS `product_dummy` from (select 'shop.nordstrom.com' AS `shop`,'86' AS `shop_id`,`mydatabase`.`p1`.`product_id` AS `product_id` from `mydatabase`.`products` `p1` left join (select `mydatabase`.`fav3`.`product_id` AS `product_id`,sum((case when ((`mydatabase`.`fav3`.`current` = 1) and (`mydatabase`.`fav3`.`closeted` = 1)) then 1 when ((`mydatabase`.`fav3`.`current` = 1) and (`mydatabase`.`fav3`.`closeted` = 0)) then -(1) else 0 end)) AS `favorites_count` from `mydatabase`.`favorites` `fav3` group by `mydatabase`.`fav3`.`product_id`) `fav4` on(((`fav4`.`product_id` = `mydatabase`.`p1`.`product_id`) and (`mydatabase`.`sex`.`product_id` = `mydatabase`.`p1`.`product_id`))) join `mydatabase`.`sex` join `mydatabase`.`shops` where ((`mydatabase`.`sex`.`sex` = 0) and (`mydatabase`.`sex`.`product_id` = `mydatabase`.`p1`.`product_id`) and (`mydatabase`.`p1`.`shop_id` = 86) and (`mydatabase`.`sex`.`date` >= (now() - interval 1 day))) order by 'shop.nordstrom.com',`mydatabase`.`sex`.`date`,`mydatabase`.`p1`.`product_id`) `testtable`) `rowed_results` where ((`rowed_results`.`row_number` >= 0) and (`rowed_results`.`row_number` < 7)) |
+-----
I am running MySQL client version: 5.1.56. The shops table has a primary index on shop_id:
Action Keyname Type Unique Packed Column Cardinality Collation Null Comment
Edit Drop PRIMARY BTREE Yes No shop_id 163 A
I have analyzed the shop table but this did not help.
I notice that if I remove the LEFT JOIN the difference in execution times drops to 0.12 seconds versus 0.28 seconds.
Cez's solution, namely to use the 1.6-second version of the query and remove irrelevant results by adding rowed_results.shop_dummy=86 to the outer query (as below), executes in 1.7 seconds. This circumvents the problem, but the mystery remains why 292-second query is so slow.
SET #num :=0, #current_shop_id := NULL, #current_product_id := NULL;
#this query limits the results of the query within it by row number (so that only 250 products get displayed per store)
SELECT * FROM (
#this query adds row numbers to the query within it
SELECT *, #num := IF( #current_shop_id = shop_id, IF(#current_product_id=product_id,#num,#num+1), 0) AS row_number, #current_shop_id := shop_id AS shop_dummy, #current_product_id := product_id AS product_dummy FROM (
SELECT shop, shops.shop_id AS
shop_id, p1.product_id AS
product_id
FROM products p1 LEFT JOIN #this LEFT JOIN gets the favorites count for each product
(
SELECT fav3.product_id AS product_id, SUM(CASE
WHEN fav3.current = 1 AND fav3.closeted = 1 THEN 1
WHEN fav3.current = 1 AND fav3.closeted = 0 THEN -1
ELSE 0
END) AS favorites_count
FROM favorites fav3
GROUP BY fav3.product_id
) AS fav4 ON p1.product_id=fav4.product_id
INNER JOIN sex ON sex.product_id=p1.product_id AND sex.sex=0
INNER JOIN shops ON shops.shop_id = p1.shop_id
WHERE sex.date >= SUBDATE(NOW(),INTERVAL 1 DAY)
ORDER BY shop, sex.DATE, product_id
) AS testtable
) AS rowed_results WHERE
rowed_results.row_number>=0 AND
rowed_results.row_number<(7) AND
rowed_results.shop_dummy=86;

After the chat room, and actually creating tables/columns to match the query, I've come up with the following query.
I have started my inner-most query to be on the sex, product (for shop_id) and favorites table. Since you described that ProductX at ShopA = Product ID = 1 but same ProductX at ShopB = Product ID = 2 (example only), each product is ALWAYS unique per shop and never duplicated. That said, I can get the product and shop_id WITH the count of favorites (if any) at this query, yet group on just the product_id .. as shop_id won't change per product I am using MAX(). Since you are always looking by a date of "yesterday" and gender (sex=0 female), I would have the SEX table indexed on ( date, sex, product_id )... I would guess you are not adding 1000's of items every day... Products obviously would have an index on product_id (primary key), and favorites SHOULD have an index on product_id.
From that result (alias "sxFav") we can then do a direct join to the sex and products table by that "Product_ID" to get any additional information you may want, such as name of shop, date product added, product description, etc. This result is then ordered by the shop_id the product is being sold from, date and finally product ID (but you may consider grabbing a description column at inner query and using that as sort-by). This results in alias "PreQuery".
With the order being all proper by shop, we can now add the #MySQLVariable references to get each product assigned a row number similar to how you originally attempted. However, only reset back to 1 when a shop ID changes.
SELECT
PreQuery.*,
#num := IF( #current_shop_id = PreQuery.shop_id, #num +1, 1 ) AS RowPerShop,
#current_shop_id := PreQuery.shop_id AS shop_dummy
from
( SELECT
sxFav.product_id,
sxFav.shop_id,
sxFav.Favorites_Count
from
( SELECT
sex.product_id,
MAX( p.shop_id ) shop_id,
SUM( CASE WHEN F.current = 1 AND F.closeted = 1 THEN 1
WHEN F.current = 1 AND F.closeted = 0 THEN -1
ELSE 0 END ) AS favorites_count
from
sex
JOIN products p
ON sex.Product_ID = p.Product_ID
LEFT JOIN Favorites F
ON sex.product_id = F.product_ID
where
sex.date >= subdate( now(), interval 1 day)
and sex.sex = 0
group by
sex.product_id ) sxFav
JOIN sex
ON sxFav.Product_ID = sex.Product_ID
JOIN products p
ON sxFav.Product_ID = p.Product_ID
order by
sxFav.shop_id,
sex.date,
sxFav.product_id ) PreQuery,
( select #num :=0,
#current_shop_id := 0 ) as SQLVars
Now, if you are looking for specific "paging" information (such as 7 entries per shop), wrap the ENTIRE query above into something like...
select * from ( entire query above ) where RowPerShop between 1 and 7
(or between 8 and 14, 15 and 21, etc as needed)
or even
RowPerShop between RowsPerPage*PageYouAreShowing and RowsPerPage*(PageYouAreShowing +1)

You should move the shops.shop_id=86 to the JOIN condition for shops. No reason to put it outside the JOIN, you run the risk of MySQL JOINing first, then filtering. A JOIN can do the same job the a WHERE clause does, especially if you are not referencing other tables.
....
INNER JOIN shops ON shops.shop_id = p1.shop_id AND shops.shop_id=86
....
Same thing with the sex join:
...
INNER JOIN shops ON shops.shop_id = p1.shop_id
AND sex.date >= SUBDATE(NOW(),INTERVAL 1 DAY)
...
Derived tables are great, but they have no indexes on them. Usually this doesn't matter since they are generally in RAM. But between filtering and sorting with no indexes, things can add up.
Note that in the second query that take much longer, the table processing order changes. The shop table is at the top in the slow query and the p1 table retrieves 11799 rows instead of 1 row in the fast query. It also doesn't use the primary key any more. That's likely where your problem is.
3 DERIVED p1 eq_ref PRIMARY,shop_id,shop_id_2,product_id,shop_id_3 PRIMARY 4 mydatabase.sex.product_id 1 100.00
3 DERIVED p1 ref PRIMARY,shop_id,shop_id_2,product_id,shop_id_3 shop_id 4 11799 100.00

Judging by the discussion, the query planner is performing badly when specifying the shop at a lower level.
Add rowed_results.shop_dummy=86 to the outer query to get the results that you are looking for.

How can I optimise my MySQL query?

I'm having real difficulties optimising a MySQL query. I have to use the existing database structure, but I am getting an extremely slow response under certain circumstances.
My query is:
SELECT
`t`.*,
`p`.`trp_name`,
`p`.`trp_lname`,
`trv`.`trv_prosceslevel`,
`trv`.`trv_id`,
`v`.`visa_destcountry`,
`track`.`track_id`,
`track`.`track_datetoembassy`,
`track`.`track_expectedreturn`,
`track`.`track_status`,
`track`.`track_comments`
FROM
(SELECT
*
FROM
`_transactions`
WHERE
DATE(`tr_datecreated`) BETWEEN DATE('2011-07-01 00:00:00') AND DATE('2011-08-01 23:59:59')) `t`
JOIN
`_trpeople` `p` ON `t`.`tr_id` = `p`.`trp_trid` AND `p`.`trp_name` = 'Joe' AND `p`.`trp_lname` = 'Bloggs'
JOIN
`_trvisas` `trv` ON `t`.`tr_id` = `trv`.`trv_trid`
JOIN
`_visas` `v` ON `trv`.`trv_visaid` = `v`.`visa_code`
JOIN
`_trtracking` `track` ON `track`.`track_trid` = `t`.`tr_id` AND `p`.`trp_id` = `track`.`track_trpid` AND `trv`.`trv_id` = `track`.`track_trvid` AND `track`.`track_status` IN ('New','Missing_Info',
'En_Route',
'Ready_Pickup',
'Received',
'Awaiting_Voucher',
'Sent_Client',
'Closed')
ORDER BY `tr_id` DESC
The results of an explain statement on the above is:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 164 Using temporary; Using filesort
1 PRIMARY track ALL status_index NULL NULL NULL 4677 Using where
1 PRIMARY p eq_ref PRIMARY PRIMARY 4 db.track.track_trpid 1 Using where
1 PRIMARY trv eq_ref PRIMARY PRIMARY 4 db.track.track_trvid 1 Using where
1 PRIMARY v eq_ref visa_code visa_code 4 db.trv.trv_visaid 1
2 DERIVED _transactions ALL NULL NULL NULL NULL 4276 Using where
The query times are acceptable until the value of 'Closed' is included in the very last track.track_status IN clause. The length of time is then increased about 10 to 15 times the other queries.
This makes sense as the 'Closed' status refers to all the clients whose transactions have been dealt with, wihich corresponds to about 90% to 95% of the database.
The issue is, is that in some cases, the search is taking about 45 seconds which is rediculous. I'm sure MySQL can do much better than that and it's just my query at fault, even if the tables do have 4000 rows, but I can't work out how to optimise this statement.
I'd be grateful for some advice about where I'm going wrong and how I should be implementing this query to produce a faster result.
Many thanks

Try this:
SELECT t.*,
p.trp_name,
p.trp_lname,
trv.trv_prosceslevel,
trv.trv_id,
v.visa_destcountry,
track.track_id,
track.track_datetoembassy,
track.track_expectedreturn,
track.track_status,
track.track_comments
FROM
_transactions t
JOIN _trpeople p ON t.tr_id = p.trp_trid
JOIN _trvisas trv ON t.tr_id = trv.trv_trid
JOIN _visas v ON trv.trv_visaid = v.visa_code
JOIN _trtracking track ON track.track_trid = t.tr_id
AND p.trp_id = track.track_trpid
AND trv.trv_id = track.track_trvid
WHERE DATE(t.tr_datecreated)
BETWEEN DATE('2011-07-01 00:00:00') AND DATE('2011-08-01 23:59:59')
AND track.track_status IN ('New','Missing_Info','En_Route','Ready_Pickup','Received','Awaiting_Voucher','Sent_Client', 'Closed')
AND p.trp_name = 'Joe' AND p.trp_lname = 'Bloggs'
ORDER BY tr_id DESC

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Optimizing SQL request - mysql

You are not making use of ref_delivery_mode table that you have included in from clause. It's getting cause of Cartesian product of tables result.

Related

Why does OR in subquery make query so much slower?

mysql query taking too long to response

need to make this query scalable/optimized for larger db in the future (remove full table reads)

Why does this WHERE clause make my query 180 times slower?

How can I optimise my MySQL query?

Categories

Resources