I have a question. The following query takes upwards of 2 to 3 seconds to execute and I'm not sure why. Two tables are involved: one with a list of items and another with a list of attributes for each item. The items table has a unique primary key and the attributes table has a foreign key constraint.
The items table has a one-to-many relationship with the attributes table.
I'm not sure how else to speed up the query and would appreciate any advice.
The database is MySQL (InnoDB).
EXPLAIN SELECT * FROM eshop_items AS ite WHERE (SELECT attValue FROM eshop_items_attributes WHERE attItemId=ite.ItemId AND attType=5 AND attValue='20')='20' ORDER BY itemAdded DESC LIMIT 0, 18;
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | PRIMARY | ite | ALL | NULL | NULL | NULL | NULL | 57179 | Using where; Using filesort
2 | DEPENDENT SUBQUERY | eshop_items_attributes | ref | attItemId | attItemId | 9 | gabriel_new.ite.itemId | 5 | Using where
Index: eshop_items_attributes
Name | Fieldnames | Index Type | Index method
attItemId | attItemId | Normal | BTREE
attType | attType | Normal | BTREE
attValue | attValue | Normal | BTREE
Index: eshop_items
Name | Fieldnames | Index Type | Index method
itemCode | itemCode | Unique | BTREE
itemCodeOrig | itemCodeOrig | Unique | BTREE
itemConfig | itemConfig | Normal | BTREE
itemStatus | itemStatus | Normal | BTREE
I can't use a join because eshop_items_attributes is a key -> value pair table, so each item can have many rows in the attributes table.
Here is a sample:
item_id | attribute_index | attribute_value
12345 | 10 | true
12345 | 2 | somevalue
12345 | 6 | some other value
32456 | 10 | true
32456 | 11 | another value
32456 | 2 | somevalue
So a join wouldn't work because I can't join multiple rows from the items_attributes table to one row in the items table.
I can't write a query where attribute_index = 2 AND attribute_index = 10; I always get no results back.
:(
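One common way around this (a different technique from the answers below, sketched here under assumptions) is to select the rows that match either attribute and keep only the items whose group contains both. The sketch uses Python's built-in sqlite3 as a stand-in for MySQL and loads the sample rows into a hypothetical items_attributes(item_id, attribute_index, attribute_value) table; it pairs attributes 10 and 11 so that only one item qualifies. The same GROUP BY ... HAVING COUNT(DISTINCT ...) SQL should also run on MySQL, but its performance there is untested.

```python
import sqlite3

# Sample rows from the question; the table name and column types are assumptions.
ROWS = [
    (12345, 10, "true"),
    (12345, 2, "somevalue"),
    (12345, 6, "some other value"),
    (32456, 10, "true"),
    (32456, 11, "another value"),
    (32456, 2, "somevalue"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items_attributes ("
    " item_id INTEGER, attribute_index INTEGER, attribute_value TEXT)"
)
conn.executemany("INSERT INTO items_attributes VALUES (?, ?, ?)", ROWS)

# A single row never has two different attribute_index values,
# so this returns nothing (the problem described in the question).
naive = conn.execute(
    "SELECT item_id FROM items_attributes"
    " WHERE attribute_index = 10 AND attribute_index = 11"
).fetchall()

# Match rows for either attribute, then keep items that matched both.
both = conn.execute(
    """
    SELECT item_id
    FROM items_attributes
    WHERE (attribute_index = 10 AND attribute_value = 'true')
       OR (attribute_index = 11 AND attribute_value = 'another value')
    GROUP BY item_id
    HAVING COUNT(DISTINCT attribute_index) = 2
    """
).fetchall()

print(naive)  # []
print(both)   # [(32456,)]
```

Item 12345 matches attribute 10 only, so its group has a count of 1 and is filtered out by HAVING.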
Change the query from a correlated subquery to IN and see what happens.
SELECT *
FROM eshop_items AS ite
WHERE ItemId IN (
SELECT attItemId
FROM eshop_items_attributes
WHERE attType=5
AND attValue='20')
ORDER BY itemAdded DESC
LIMIT 0, 18
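The two forms can be checked for equivalence on toy data. This sketch uses Python's built-in sqlite3 as a stand-in for MySQL with made-up rows, so it only shows that both queries return the same items; it says nothing about MySQL's plan or timing. One behavioral difference worth knowing: if an item had two matching attribute rows, MySQL would reject the correlated version with "Subquery returns more than 1 row", whereas the IN version simply matches the item.

```python
import sqlite3

# Miniature versions of the two tables (only the columns the queries use);
# the data is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE eshop_items (ItemId INTEGER PRIMARY KEY, itemAdded INTEGER);
    CREATE TABLE eshop_items_attributes (
        attItemId INTEGER, attType INTEGER, attValue TEXT);
    INSERT INTO eshop_items VALUES (1, 10), (2, 20), (3, 30), (4, 40), (5, 50);
    INSERT INTO eshop_items_attributes VALUES
        (1, 5, '20'), (2, 5, '30'), (3, 5, '20'), (3, 6, '20'), (5, 5, '20');
    """
)

# Original form: a correlated (dependent) subquery evaluated per outer row.
correlated = conn.execute(
    """
    SELECT * FROM eshop_items AS ite
    WHERE (SELECT attValue FROM eshop_items_attributes
           WHERE attItemId = ite.ItemId AND attType = 5 AND attValue = '20') = '20'
    ORDER BY itemAdded DESC LIMIT 0, 18
    """
).fetchall()

# Rewritten form: an uncorrelated IN subquery.
with_in = conn.execute(
    """
    SELECT * FROM eshop_items AS ite
    WHERE ItemId IN (SELECT attItemId FROM eshop_items_attributes
                     WHERE attType = 5 AND attValue = '20')
    ORDER BY itemAdded DESC LIMIT 0, 18
    """
).fetchall()

print(correlated)  # [(5, 50), (3, 30), (1, 10)]
print(with_in)     # [(5, 50), (3, 30), (1, 10)]
```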
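If your database supports bitmap indexes (Oracle does, for example), you may see further gains by using a bitmap index instead of a B-tree on eshop_items_attributes. But be warned: bitmap indexes make INSERT/UPDATE more expensive. Note that MySQL/InnoDB does not support bitmap indexes, so this option is not available here.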
The "DEPENDENT SUBQUERY" is what's killing performance in this query. It has to run the subquery once for every distinct ItemId in the outer query. It should be much better as a join:
SELECT ite.* FROM eshop_items AS ite
INNER JOIN eshop_items_attributes AS a ON ite.ItemId = a.attItemId
WHERE a.attType = 5 AND a.attValue = '20' -- quoted: attValue is a string column, and comparing it to a number prevents index use
ORDER BY ite.itemAdded DESC LIMIT 0, 18;
I find it much easier to think about such a query as a join:
SELECT ite.*
FROM eshop_items ite join
eshop_items_attributes ia
on ia.attItemId = ite.ItemId and
ia.attType = 5 and
ia.attValue='20'
ORDER BY ite.itemAdded DESC
LIMIT 0, 18;
This works if there is at most one matching attribute for each item. Otherwise, you need SELECT DISTINCT (which could hurt performance, except that you are already doing a sort).
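To see why the "at most one matching attribute" condition matters, here is a small sketch with made-up data in which item 3 has the matching attribute stored twice. It uses Python's built-in sqlite3 as a stand-in for MySQL. Without DISTINCT the item comes back twice, and since LIMIT is applied afterwards, duplicates also eat into the 18 rows.

```python
import sqlite3

# Toy data (made up): item 3 has the matching attribute row twice.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE eshop_items (ItemId INTEGER PRIMARY KEY, itemAdded INTEGER);
    CREATE TABLE eshop_items_attributes (
        attItemId INTEGER, attType INTEGER, attValue TEXT);
    INSERT INTO eshop_items VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO eshop_items_attributes VALUES
        (1, 5, '20'), (3, 5, '20'), (3, 5, '20');
    """
)

JOIN_SQL = """
    SELECT {distinct} ite.*
    FROM eshop_items ite
    JOIN eshop_items_attributes ia
      ON ia.attItemId = ite.ItemId AND ia.attType = 5 AND ia.attValue = '20'
    ORDER BY ite.itemAdded DESC
    LIMIT 0, 18
"""

plain = conn.execute(JOIN_SQL.format(distinct="")).fetchall()
deduped = conn.execute(JOIN_SQL.format(distinct="DISTINCT")).fetchall()

print(plain)    # [(3, 30), (3, 30), (1, 10)]
print(deduped)  # [(3, 30), (1, 10)]
```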
To facilitate this join, create the index eshop_items_attributes(attType, attValue, attItemId). The index should satisfy the join without having to read the table; the rest is just dealing with the result set.
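Whether an index really lets the join skip the table can be checked in the query plan. As a rough analogue (not MySQL itself), this sketch asks SQLite, via Python's built-in sqlite3, for the plan of the join with the suggested composite index in place; the index name is hypothetical. SQLite reports "USING COVERING INDEX" when a table is read through an index alone, which corresponds to "Using index" in the Extra column of MySQL's EXPLAIN. MySQL's own plan should still be confirmed with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE eshop_items (ItemId INTEGER PRIMARY KEY, itemAdded INTEGER);
    CREATE TABLE eshop_items_attributes (
        attItemId INTEGER, attType INTEGER, attValue TEXT);
    -- The suggested composite index: equality columns first, join column last.
    CREATE INDEX idx_type_value_item
        ON eshop_items_attributes (attType, attValue, attItemId);
    """
)

plan = conn.execute(
    """
    EXPLAIN QUERY PLAN
    SELECT ite.*
    FROM eshop_items ite
    JOIN eshop_items_attributes ia
      ON ia.attItemId = ite.ItemId AND ia.attType = 5 AND ia.attValue = '20'
    ORDER BY ite.itemAdded DESC
    LIMIT 0, 18
    """
).fetchall()

# The last column of each EXPLAIN QUERY PLAN row is the readable step.
details = [row[-1] for row in plan]
for step in details:
    print(step)
```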
The same index would probably help with the correlated subquery.
Related
I am using the Laravel query builder to get the desired results from the database. The following query works correctly but takes too much time to return results. Can you please help me with this?
select
`amz_ads_sp_campaigns`.*,
SUM(attributedUnitsOrdered7d) as order7d,
SUM(attributedUnitsOrdered30d) as order30d,
SUM(attributedSales7d) as sale7d,
SUM(attributedSales30d) as sale30d,
SUM(impressions) as impressions,
SUM(clicks) as clicks,
SUM(cost) as cost,
SUM(attributedConversions7d) as attributedConversions7d,
SUM(attributedConversions30d) as attributedConversions30d
from
`amz_ads_sp_product_targetings`
inner join `amz_ads_sp_report_product_targetings` on `amz_ads_sp_product_targetings`.`campaignId` = `amz_ads_sp_report_product_targetings`.`campaignId`
inner join `amz_ads_sp_campaigns` on `amz_ads_sp_report_product_targetings`.`campaignId` = `amz_ads_sp_campaigns`.`campaignId`
where
(
`amz_ads_sp_product_targetings`.`user_id` = ?
and `amz_ads_sp_product_targetings`.`profileId` = ?
)
group by
`amz_ads_sp_product_targetings`.`campaignId`
Result of EXPLAIN:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | amz_ads_sp_report_product_targetings | ALL | campaignId | NULL | NULL | NULL | 50061 | Using temporary; Using filesort
1 | SIMPLE | amz_ads_sp_campaigns | ref | campaignId | campaignId | 8 | pr-amz-ppc.amz_ads_sp_report_product_targetings.ca... | 1 |
1 | SIMPLE | amz_ads_sp_product_targetings | ref | campaignId | campaignId | 8 | pr-amz-ppc.amz_ads_sp_report_product_targetings.ca... | 33 | Using where
Your query could benefit from several indices to cover the WHERE clause as well as the join conditions:
CREATE INDEX idx1 ON amz_ads_sp_product_targetings (
user_id, profileId, campaignId);
CREATE INDEX idx2 ON amz_ads_sp_report_product_targetings (
campaignId);
CREATE INDEX idx3 ON amz_ads_sp_campaigns (campaignId);
The first index idx1 covers the entire WHERE clause, which might let MySQL throw away many records on the initial scan of the amz_ads_sp_product_targetings table. It also includes the campaignId column, which is needed for the first join. The second and third indices cover the join columns of each respective table. This might let MySQL do a more rapid lookup during the join process.
Note that selecting amz_ads_sp_campaigns.* is not valid unless campaignId is the primary key of that table. Also, there isn't much else we can do to speed up the query, as SUM, by its nature, requires touching every record in order to compute the sum.
I have two tables to make my search engine, one containing all keywords and the other contains all the possible targets for each keyword.
Table: keywords
id (int)
keyword (varchar)
Table: results
id (int)
keyword_id (int)
table_id (int)
target_id (int)
For both tables, I chose MyISAM as the storage engine, since 95% of the time I am just running SELECT queries on these tables and 5% of the time INSERT queries. And of course, I already compared the performance with InnoDB, and it was poor for the queries below.
I also added the following indexes
keywords.keyword (unique)
results.keyword_id (index)
results.table_id (index)
results.target_id (index)
In the keywords table I have about 1.2 million records, and in the results table about 9.8 million records.
Now the issue: when I run the following query, it completes in 0.0014 seconds
SELECT rs.table_id, rs.target_id
FROM keywords ky INNER JOIN results rs ON ky.id=rs.keyword_id
WHERE ky.keyword LIKE "x%" OR ky.keyword LIKE "y%"
But when I add GROUP BY, it takes 0.2 seconds
SELECT rs.table_id, rs.target_id
FROM keywords ky INNER JOIN results rs ON ky.id=rs.keyword_id
WHERE ky.keyword LIKE "x%" OR ky.keyword LIKE "y%"
GROUP BY rs.table_id, rs.target_id
I tested composite indexes, single-column indexes and even dropping the table_id and target_id indexes, but in all cases the performance is the same, and it seems the index is not applied to the GROUP BY clause.
The EXPLAIN plan shows:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | ky | range | PRIMARY,keyword | keyword | 767 | NULL | 3271 | Using index condition; Using where; Using temporary; Using filesort
1 | SIMPLE | rs | ref | keyword_id | keyword_id | 4 | ky.id | 3
I have the following composite key already added
ALTER TABLE results ADD INDEX `table_id` (`table_id`, `target_id`) USING BTREE;
Here is what the MySQL documentation on GROUP BY optimization says:
The most important preconditions for using indexes for GROUP BY are
that all GROUP BY columns reference attributes from the same index
So, if you have separate indexes on these two columns, they won't be used for GROUP BY. You should try creating a composite index on table_id and target_id.
Also, the query seems to be using the LIKE operator. Note that if the pattern in LIKE has a leading wildcard, MySQL can't use any index on that column. Have a look at the EXPLAIN plan of the query to see which indexes are used.
JOIN + GROUP BY (or DISTINCT) is what I call "explode-implode" -- First the JOIN multiplies the number of 'rows' to look at, then the GROUP BY deflates the row count.
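The explode-implode effect is easy to see by counting rows on toy data. In this sketch (Python's built-in sqlite3 as a stand-in for MySQL; keywords and targets are made up), three matching keywords point at overlapping targets: the JOIN produces one row per keyword/target match, and the GROUP BY then collapses them back to the distinct pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE keywords (id INTEGER PRIMARY KEY, keyword TEXT UNIQUE);
    CREATE TABLE results (
        id INTEGER PRIMARY KEY, keyword_id INTEGER,
        table_id INTEGER, target_id INTEGER);
    CREATE INDEX results_keyword_id ON results (keyword_id);
    INSERT INTO keywords VALUES (1, 'xa'), (2, 'xb'), (3, 'ya'), (4, 'za');
    -- Several keywords point at the same (table_id, target_id) targets.
    INSERT INTO results (keyword_id, table_id, target_id) VALUES
        (1, 1, 10), (1, 1, 11), (2, 1, 10), (2, 2, 20), (3, 1, 10), (4, 3, 30);
    """
)

JOINED = """
    SELECT rs.table_id, rs.target_id
    FROM keywords ky INNER JOIN results rs ON ky.id = rs.keyword_id
    WHERE ky.keyword LIKE 'x%' OR ky.keyword LIKE 'y%'
"""

exploded = conn.execute(JOINED).fetchall()  # one row per keyword/target match
imploded = conn.execute(JOINED + " GROUP BY rs.table_id, rs.target_id").fetchall()

print(len(exploded), len(imploded))  # 5 3
```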
One workaround to avoid this is to focus on the primary table, then check for EXISTS in the other table:
SELECT rs.table_id, rs.target_id
FROM results rs
WHERE EXISTS(
SELECT 1
FROM keywords ky
WHERE ky.id = rs.keyword_id
AND ( ky.keyword LIKE "x%"
OR ky.keyword LIKE "y%" )
);
rs requires INDEX(keyword_id).
An improvement on that might be to get rid of the OR via
WHERE ky.id = rs.keyword_id
AND ky.keyword REGEXP "^[xy]"
But that is not very helpful, since REGEXP cannot use the index on keyword, so every keyword still has to be checked.
Another improvement could be to turn the OR into UNION:
( SELECT rs.table_id, rs.target_id
FROM keywords ky
INNER JOIN results rs ON ky.id=rs.keyword_id
WHERE ky.keyword LIKE "x%"
) UNION ALL
( SELECT rs.table_id, rs.target_id
FROM keywords ky
INNER JOIN results rs ON ky.id=rs.keyword_id
WHERE ky.keyword LIKE "y%"
)
ky: INDEX(keyword, id)
rs: INDEX(keyword_id)
The advantage here (other than avoiding the inflate-deflate) is that the index on keyword can be used in each branch.
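Because no keyword can start with both x and y, the two branches never return the same results row, so UNION ALL yields exactly the rows of the OR version. This sketch checks that on made-up data with Python's built-in sqlite3 (a stand-in for MySQL). SQLite does not accept parenthesized UNION branches, so they are written bare here; MySQL accepts either form. Note that, like the JOIN without GROUP BY, UNION ALL keeps repeated (table_id, target_id) pairs; UNION (without ALL) would collapse them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE keywords (id INTEGER PRIMARY KEY, keyword TEXT UNIQUE);
    CREATE TABLE results (
        id INTEGER PRIMARY KEY, keyword_id INTEGER,
        table_id INTEGER, target_id INTEGER);
    CREATE INDEX ky_keyword_id ON keywords (keyword, id);
    CREATE INDEX rs_keyword_id ON results (keyword_id);
    INSERT INTO keywords VALUES (1, 'xa'), (2, 'xb'), (3, 'ya'), (4, 'za');
    INSERT INTO results (keyword_id, table_id, target_id) VALUES
        (1, 1, 10), (1, 1, 11), (2, 1, 10), (2, 2, 20), (3, 1, 10), (4, 3, 30);
    """
)

BASE = """
    SELECT rs.table_id, rs.target_id
    FROM keywords ky INNER JOIN results rs ON ky.id = rs.keyword_id
"""

or_rows = conn.execute(
    BASE + " WHERE ky.keyword LIKE 'x%' OR ky.keyword LIKE 'y%'"
).fetchall()

# SQLite rejects parenthesized UNION branches, so they are written bare.
union_rows = conn.execute(
    BASE + " WHERE ky.keyword LIKE 'x%'"
    + " UNION ALL "
    + BASE + " WHERE ky.keyword LIKE 'y%'"
).fetchall()

print(sorted(or_rows) == sorted(union_rows), len(union_rows))  # True 5
```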
(Please provide SHOW CREATE TABLE for both tables; there may be other tips.)
This query:
EXPLAIN SELECT ppi_loan.customerID,
loan_number,
CONCAT(forename, ' ', surname) AS agent,
name,
broker,
(SELECT timestamp
FROM ppi_sar_status
WHERE history = 0
AND (status = 10 || status = 13)
AND ppi_sar_status.loanID = ppi_loan.loanID) AS ppi_unsure_date,
fosSent,
letterSent,
(SELECT timestamp
FROM ppi_ques_status
WHERE status = 1
AND ppi_ques_status.loanID = ppi_loan.loanID
ORDER BY timestamp DESC LIMIT 1) AS sent_date,
ppi_ques_status.timestamp
FROM ppi_loan
LEFT JOIN ppi_assignments ON ppi_assignments.customerID = ppi_loan.customerID
LEFT JOIN italk.users ON italk.users.id = agentID
LEFT JOIN ppi_ques_status ON ppi_ques_status.loanID = ppi_loan.loanID
JOIN ppi_lenders ON ppi_lenders.id = ppi_loan.lender
JOIN ppi_status ON ppi_status.customerID = ppi_loan.customerID
JOIN ppi_statuses ON ppi_statuses.status = ppi_status.status
AND ppi_ques_status.status = 1
AND ppi_ques_status.history = 0
AND (cc_type = '' || (cc_type != '' AND cc_accepted = 'no'))
AND ppi_loan.deleted = 'no'
AND ppi_loan.customerID != 10
GROUP BY ppi_loan.customerID, loan_number
is very slow. Here are the full EXPLAIN results:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | PRIMARY | ppi_ques_status | ref | loanID,status,history | status | 3 | const | 91086 | Using where; Using temporary; Using filesort
1 | PRIMARY | ppi_loan | eq_ref | PRIMARY,customerID | PRIMARY | 8 | ppimm.ppi_ques_status.loanID | 1 | Using where
1 | PRIMARY | ppi_lenders | eq_ref | PRIMARY | PRIMARY | 4 | ppimm.ppi_loan.lender | 1 | Using where
1 | PRIMARY | ppi_assignments | eq_ref | customerID | customerID | 8 | ppimm.ppi_loan.customerID | 1 |
1 | PRIMARY | users | eq_ref | PRIMARY | PRIMARY | 8 | ppimm.ppi_assignments.agentID | 1 |
1 | PRIMARY | ppi_status | ref | status,customerID | customerID | 8 | ppimm.ppi_loan.customerID | 6 |
1 | PRIMARY | ppi_statuses | eq_ref | PRIMARY | PRIMARY | 4 | ppimm.ppi_status.status | 1 | Using where; Using index
3 | DEPENDENT SUBQUERY | ppi_ques_status | ref | loanID,status | loanID | 8 | func | 1 | Using where; Using filesort
2 | DEPENDENT SUBQUERY | ppi_sar_status | ref | loanID,status,history | loanID | 8 | func | 2 | Using where
Why is it scanning so many rows and why "Using temporary; Using filesort"?
I can't remove any of the subqueries, as I need all of the results they produce.
As already mentioned in a comment, the main cause of a slow query is that you seem to have single column indexes only, while you would need multi-column indexes to cover the joins, the filters, and the group by.
Also, your query has 2 other issues:
Even though you group by 2 fields only, several other fields are listed in the select list without being wrapped in an aggregate function such as MIN(). MySQL does allow such queries under certain SQL mode settings (when ONLY_FULL_GROUP_BY is disabled), but they are still against the SQL standard and may have unexpected side effects unless you really know what you are doing.
You have filters on the ppi_loan table, which is the left table of your left joins, in a join condition instead of a WHERE clause (the query has no WHERE clause at all). In the ON clause of a left join, such conditions do not eliminate ppi_loan records from the result set; MySQL simply does not join any values to them. These criteria should be moved to the WHERE clause.
The indexes I would create:
ppi_sar_status: a multi-column index on the loanID, status, history fields. I would also consider moving this subquery into the join section, because this table is not joined there.
ppi_ques_status: a multi-column index on the loanID, status, timestamp fields. This would support both the subquery and the join. Remember, the subquery also shows a filesort in the EXPLAIN.
ppi_loan: at a minimum, a multi-column index on the customerID, loan_number fields to support the GROUP BY clause and avoid the filesort. You may consider adding the other fields used in the join criteria to this index, based on their selectivity.
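To see why a (loanID, status, timestamp) index helps the sent_date subquery, this sketch compares SQLite's plan for that subquery before and after adding such an index, using Python's built-in sqlite3 as a rough stand-in for MySQL. The index name and the bound loanID value are hypothetical. SQLite's "USE TEMP B-TREE FOR ORDER BY" roughly corresponds to MySQL's "Using filesort"; MySQL's actual plan should be checked with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ppi_ques_status ("
    " loanID INTEGER, status INTEGER, history INTEGER, timestamp TEXT)"
)

# The sent_date subquery, with the outer loanID replaced by a bound parameter.
SUBQUERY_PLAN = """
    EXPLAIN QUERY PLAN
    SELECT timestamp FROM ppi_ques_status
    WHERE status = 1 AND ppi_ques_status.loanID = ?
    ORDER BY timestamp DESC LIMIT 1
"""

def plan_steps():
    # The last column of each EXPLAIN QUERY PLAN row is the readable step.
    return [row[-1] for row in conn.execute(SUBQUERY_PLAN, (42,))]

before = plan_steps()
conn.execute(
    "CREATE INDEX idx_loan_status_ts ON ppi_ques_status (loanID, status, timestamp)"
)
after = plan_steps()

print(before)  # a table scan plus a temp B-tree for the ORDER BY
print(after)   # an index search with no separate sort step
```

With loanID and status fixed by equality, the index entries are already ordered by timestamp, so the newest row can be read from the end of the range without sorting.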
I'm also not sure why you have the last 2 status tables in the join, since you are not retrieving any values from them. If you are using these tables to eliminate certain records, consider using an EXISTS() subquery instead of a join. In a join MySQL needs to fetch data from all joined tables, whereas an EXISTS() subquery only checks whether at least one matching record exists, without retrieving any actual data from the underlying tables.
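Besides not fetching data, EXISTS also avoids multiplying rows when the filtering table has several matches per row (the EXPLAIN above estimates about 6 ppi_status rows per customer). This sketch shows the difference on made-up data with Python's built-in sqlite3 as a stand-in for MySQL; only the columns needed for the join are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ppi_loan (loanID INTEGER PRIMARY KEY, customerID INTEGER);
    CREATE TABLE ppi_status (customerID INTEGER, status INTEGER);
    CREATE INDEX ppi_status_customer ON ppi_status (customerID, status);
    INSERT INTO ppi_loan VALUES (1, 100), (2, 200), (3, 300);
    -- Customer 100 has three status rows, 200 has one, 300 has none.
    INSERT INTO ppi_status VALUES (100, 1), (100, 2), (100, 3), (200, 1);
    """
)

# Join used only as a filter: each loan repeats once per status row.
joined = conn.execute(
    """
    SELECT l.loanID
    FROM ppi_loan l
    JOIN ppi_status s ON s.customerID = l.customerID
    ORDER BY l.loanID
    """
).fetchall()

# EXISTS: each loan appears at most once.
existing = conn.execute(
    """
    SELECT l.loanID
    FROM ppi_loan l
    WHERE EXISTS (SELECT 1 FROM ppi_status s WHERE s.customerID = l.customerID)
    ORDER BY l.loanID
    """
).fetchall()

print(joined)    # [(1,), (1,), (1,), (2,)]
print(existing)  # [(1,), (2,)]
```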
I have a very complex query that finds the locations of members, joins the subscription details, and sorts by distance.
Can someone advise on the correct indexes (and cardinality) I should add to make this load faster?
Right now, on 1 million records, it takes 75 seconds, and I know it can be improved.
Thank you.
SELECT SQL_CALC_FOUND_ROWS
(((acos(sin((33.987541*pi()/180)) * sin((users_data.lat*pi()/180))+cos((33.987541*pi()/180)) * cos((users_data.lat*pi()/180)) * cos(((-118.472153- users_data.lon)* pi()/180))))*180/pi())*60*1.1515) as distance,
subscription_types.location_limit as location_limit,
users_data.user_id, users_data.last_name, users_data.filename, users_data.user_id,
users_data.phone_number, users_data.city, users_data.state_code, users_data.zip_code, users_data.country_code,
users_data.quote, users_data.subscription_id, users_data.company, users_data.position,
users_data.profession_id, users_data.experience, users_data.account_type, users_data.verified, users_data.nationwide,
IF(listing_type = 'Company', company, last_name) as name
FROM `users_data`
LEFT JOIN `users_reviews` ON users_data.user_id=users_reviews.user_id AND users_reviews.review_status='2'
LEFT JOIN users_locations ON users_locations.user_id=users_data.user_id
LEFT JOIN subscription_types ON users_data.subscription_id=subscription_types.subscription_id
WHERE users_data.active='2'
AND subscription_types.searchable='1'
AND users_data.state_code='CA'
AND users_data.country_code='US'
GROUP BY users_data.user_id
HAVING distance <= '50'
OR location_limit='all'
OR users_data.nationwide='1'
ORDER BY subscription_types.search_priority ASC, distance ASC
LIMIT 0,10
EXPLAIN
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | users_reviews | system | user_id,review_status | NULL | NULL | NULL | 0 | const row not found
1 | SIMPLE | users_locations | system | user_id | NULL | NULL | NULL | 0 | const row not found
1 | SIMPLE | users_data | ref | subscription_id,active,state_code,country_code | state_code | 47 | const | 88241 | Using where; Using temporary; Using filesort
1 | SIMPLE | subscription_types | ALL | PRIMARY,searchable | NULL | NULL | NULL | 4 | Using where; Using join buffer
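For reference, the distance expression in this query is the spherical law of cosines: it computes the central angle between the search point and each member in degrees, multiplies by 60 (nautical miles per degree of arc) and then by 1.1515 (roughly statute miles per nautical mile). Below is a Python mirror of it, written as a sketch with a hypothetical function name. Because the value depends on each row's lat and lon, no index can answer distance <= 50 directly, so MySQL has to evaluate it for every row that passes the other filters. The clamp before acos is an addition: MySQL's ACOS() returns NULL for arguments outside [-1, 1], which floating-point rounding can produce when a member sits exactly at the search point.

```python
import math


def query_distance_miles(lat1, lon1, lat2, lon2):
    """Python mirror of the `distance` expression in the query.

    (lat1, lon1) plays the role of the fixed search point
    (33.987541, -118.472153); (lat2, lon2) plays users_data.lat/lon.
    """
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon1 - lon2)
    # Spherical law of cosines: cosine of the central angle between the points.
    cos_angle = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dlon)
    # Rounding can push the value slightly outside [-1, 1]; clamp before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    degrees = math.degrees(math.acos(cos_angle))
    # 60 nautical miles per degree of arc, times the query's 1.1515 factor.
    return degrees * 60 * 1.1515


SEARCH = (33.987541, -118.472153)
# One degree of latitude due north of the search point.
print(round(query_distance_miles(*SEARCH, SEARCH[0] + 1, SEARCH[1]), 2))  # 69.09
```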
Your query is not that complex. Effectively you have only one join that matters, on subscription_types, which is certainly a small table with no more than a few hundred rows (the EXPLAIN estimates 4); the EXPLAIN shows users_reviews and users_locations as empty ("const row not found").
Where are your indexes? The best way to improve your query is to create indexes on the fields you filter on, such as active, country_code, state_code and searchable.
Have you created the foreign key on users_data.subscription_id? You need an index on that column too.
FORCE INDEX is useless; let the RDBMS determine the best indexes to choose.
The LEFT JOIN to subscription_types is useless too, because the condition subscription_types.searchable='1' in the WHERE clause removes the unmatched rows anyway.
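The reason is that a WHERE condition on the right-hand table rejects the NULLs a LEFT JOIN produces for unmatched rows, which turns it into an inner join in effect. This sketch demonstrates that on made-up data with Python's built-in sqlite3 as a stand-in for MySQL; only the two columns needed for the join are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE subscription_types (subscription_id INTEGER PRIMARY KEY, searchable TEXT);
    CREATE TABLE users_data (user_id INTEGER PRIMARY KEY, subscription_id INTEGER);
    INSERT INTO subscription_types VALUES (10, '1'), (20, '0');
    -- User 3 references a subscription that does not exist.
    INSERT INTO users_data VALUES (1, 10), (2, 20), (3, 99);
    """
)


def user_ids(join_kind, where=""):
    sql = (
        f"SELECT users_data.user_id FROM users_data {join_kind} subscription_types "
        "ON users_data.subscription_id = subscription_types.subscription_id "
        f"{where} ORDER BY users_data.user_id"
    )
    return conn.execute(sql).fetchall()


FILTER = "WHERE subscription_types.searchable = '1'"
print(user_ids("LEFT JOIN"))           # [(1,), (2,), (3,)]: unmatched user 3 kept with NULLs
print(user_ids("LEFT JOIN", FILTER))   # [(1,)]
print(user_ids("INNER JOIN", FILTER))  # [(1,)]
```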
Ordering by search_priority implies that you need an index on this column too.
Filtering in HAVING can prevent the indexes from being used. You don't need to put these filters in HAVING: if I understand your table schema, none of them filters on an aggregate.
Your table contains 1 million rows, but how many rows are returned without the LIMIT? With the right indexes, the query should execute in under a second.
SELECT ...
FROM `users_data`
INNER JOIN subscription_types
ON users_data.subscription_id = subscription_types.subscription_id
WHERE users_data.active='2'
AND users_data.country_code='US'
AND users_data.state_code='CA'
AND subscription_types.searchable='1'
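AND (
-- MySQL does not allow the SELECT alias distance in WHERE, so the expression is repeated here
(((acos(sin((33.987541*pi()/180)) * sin((users_data.lat*pi()/180))+cos((33.987541*pi()/180)) * cos((users_data.lat*pi()/180)) * cos(((-118.472153- users_data.lon)* pi()/180))))*180/pi())*60*1.1515) <= 50
OR subscription_types.location_limit='all'
OR users_data.nationwide='1')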
GROUP BY users_data.user_id
ORDER BY subscription_types.search_priority ASC, distance ASC
LIMIT 0,10
I have three tables, all of which can have millions of rows. I have an actions table and a reactions table that holds reactions related to actions. Then there is an emotes table linked to reactions. What I would like to do with this particular query is find the most clicked emote for a certain action. The difficulty for me is that the query involves three tables instead of only two.
Table actions (postings):
PKY id
...
Table reactions (comments, emotes etc.):
PKY id
INT action_id (related to actions table)
...
Table emotes:
PKY id
INT react_id (related to reactions table)
INT emote_id (related to a hardcoded list of available emotes)
...
The SQL query I came up with seems to work, but it takes 12 seconds when the tables contain millions of rows. The SQL query looks like this:
select emote_id, count(*) as cnt from emotes
where react_id in (
select id from reactions where action_id=2942715
)
group by emote_id order by cnt desc limit 1
MySQL explain says the following:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | PRIMARY | emotes | index | NULL | react_id_2 | 21 | NULL | 4358594 | Using where; Using index; Using temporary; Using f...
2 | DEPENDENT SUBQUERY | reactions | unique_subquery | PRIMARY,action_id | PRIMARY | 8 | func | 1 | Using where
...I am grateful for any tips for improving on the query. Note that I will NOT call this query every time a list of actions is being built, but only when emotes are being added. Therefore it's no problem if the query takes maybe 0.5 seconds to finish. But 12 is too long!
What about this?
SELECT
emote_id,
count(*) as cnt
FROM emotes a
INNER JOIN reactions r
ON r.id = a.react_id
WHERE action_id = 2942715
GROUP BY emote_id
ORDER BY cnt DESC
LIMIT 1
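The JOIN returns the same answer as the IN version because reactions.id is the primary key, so each emote row joins to at most one reaction and the counts are not inflated. This sketch checks that on made-up data using Python's built-in sqlite3 as a stand-in for MySQL; whether MySQL then drives the join from reactions through the existing action_id index is something to confirm with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE reactions (id INTEGER PRIMARY KEY, action_id INTEGER);
    CREATE TABLE emotes (id INTEGER PRIMARY KEY, react_id INTEGER, emote_id INTEGER);
    INSERT INTO reactions VALUES (1, 2942715), (2, 2942715), (3, 111);
    -- Emote 8 is most common overall, but emote 7 wins for action 2942715.
    INSERT INTO emotes (react_id, emote_id) VALUES
        (1, 7), (1, 8), (2, 7), (3, 8), (3, 8);
    """
)

with_in = conn.execute(
    """
    SELECT emote_id, count(*) AS cnt FROM emotes
    WHERE react_id IN (SELECT id FROM reactions WHERE action_id = 2942715)
    GROUP BY emote_id ORDER BY cnt DESC LIMIT 1
    """
).fetchall()

with_join = conn.execute(
    """
    SELECT emote_id, count(*) AS cnt
    FROM emotes a
    INNER JOIN reactions r ON r.id = a.react_id
    WHERE action_id = 2942715
    GROUP BY emote_id ORDER BY cnt DESC LIMIT 1
    """
).fetchall()

print(with_in)    # [(7, 2)]
print(with_join)  # [(7, 2)]
```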