Optimize query with loads of IN entries - mysql

I have quite a massive query that I want to optimize; it consists of one base table and five LEFT JOINs.
The query takes 0.3428 sec to complete and returns 4,340 rows.
I am working with about 10,000 entries, a number which will definitely grow.
Now the query by itself is not the problem; the IN clauses are the biggest problem.
I have two IN clauses, both in the WHERE clause.
For this specific page load each contains a large list of IDs: 3,344 entries, for example (99, 1, 5, 8458, ...).
Both IN clauses contain the same set of 3,344 IDs. Example: ((cf.catid IN ( 99, 1, 5, 8458, ... ) AND cf.cid=c.id) OR p.category IN ( 99, 1, 5, 8458, ... ))
The query looks like this:
SELECT
p.id, c.id AS pCid, c.name AS cName, p.name, p.seo,
p.description AS pDescription, cd.description,
p.category, p.archive, cf.catid, cf.pid, p.order_nr,
c.order_nr AS cOrder, c.seo AS cSeo, cat.name AS catName,
cat.order_id, pr.price, pr.sale_price, pr.sale_expiry,
IF( pr.sale_price > 0, pr.sale_price, pr.price ) AS `oPrices`,
pr.member_price, p.`set`, p.get_the_look,
c.from_text_price, c.thumb, c.code AS colour_code,
p.code AS product_code, p.supplier_part_number,
p.oem_part_number, p.make, p.model, p.year, p.sub_model
FROM
products p
LEFT JOIN category_featured cf ON p.id=cf.pid
LEFT JOIN colours c ON c.pid=p.id
LEFT JOIN colour_descriptions cd ON c.id=cd.colour_id
LEFT JOIN category cat ON cat.id=p.category
LEFT JOIN pricing pr ON pr.cid=c.id
WHERE
(
(cf.catid IN ( .. 3344 ID entries .. ) AND cf.cid=c.id) OR p.category IN ( .. 3344 ID entries .. )
)
AND p.archive='0'
AND p.status='1' AND c.status='1'
AND c.archive='0'
AND cat.status IN (1,2)
GROUP BY `c`.`id`
ORDER BY `oPrices` DESC
Is there a better way to check for specific IDs in a table than the IN clause, or perhaps a different check altogether?
Speed is the main issue here; I want to achieve the best performance possible.
So far, this is what I have done and how some of the settings are set:
I created indexes on those tables (only the INT (integer) columns used in this query have indexes).
Some tables are MyISAM, some are InnoDB (other tables not used in this query have a relation with a few tables that are in it, so those had to be InnoDB).
No relations between the tables in the query exist.
To run the query I use PHP and MySQLi.
Thanks
UPDATE!!!!
I noticed why the query is so slow: the computed column I create with the IF statement, oPrices, together with the "ORDER BY oPrices DESC", makes the query slow. Once I remove it, the query takes only 0.00009 sec, which is amazing! But now I won't get correctly ordered data, and even if I do the ordering in PHP, I will have to write a new pagination function, which is not ideal.

IN can make a query very difficult to optimize as the index may not be used (you can verify this by using EXPLAIN). An alternative approach would be to load these IDs into a temporary table and then perform a JOIN.
From this link:
http://explainextended.com/2009/08/18/passing-parameters-in-mysql-in-list-vs-temporary-table/
We see that for a large list of parameters, passing them in a
temporary table is much faster than as a constant list, while for
small lists performance is almost the same.
Using a temporary table is the best way to pass large arrays of
parameters in MySQL.
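A minimal sketch of that approach, reusing the tables from the first query above (the temp table name tmp_cat_ids is invented, and only a handful of the 3,344 IDs are shown):

CREATE TEMPORARY TABLE tmp_cat_ids (
    catid INT NOT NULL,
    PRIMARY KEY (catid)
);

-- One multi-row INSERT built in PHP would carry all 3,344 IDs:
INSERT INTO tmp_cat_ids (catid) VALUES (99), (1), (5), (8458);

-- Membership then becomes an indexed JOIN instead of a huge constant list:
SELECT p.id, p.name
FROM products p
JOIN tmp_cat_ids t ON t.catid = p.category
WHERE p.archive = '0'
  AND p.status = '1';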

Related

Segmentation of 1 million customers getting slow

We are using MySQL Aurora. We have 1 million customers in the customer table, which has around 30 columns.
On first load we fetch 50 records, with a LIMIT and ORDER BY date, together with the total customer count. This first load takes approximately 60 sec. We also offer different segmentation options by which users can filter their data. We have indexed all the important columns that can help speed up the filter queries. We also tried increasing the server resources but did not see significant changes.
We are looking for a solution that loads the records in under 3 sec, both the first time and with the different segmentations.
We are open to any solution and are looking for answers to the following questions:
Is the AWS MySQL-compatible Aurora capable of this? Can it support such fast queries with complex filters?
Should we plan to sync our data to some other type of database server and load from there?
Any suggestions for a database server that supports frequent inserts/updates and can query records extremely fast?
Any help on the above points will be really appreciated.
UPDATES:
SELECT c.id as customer_id, `c`.first_name, `c`.last_name, `c`.email,
    `ssh`.`shop_name`, `ssh`.`domain`,
    `cx`.`is_push_access`, `cx`.`last_contacted_by`, `cx`.`last_email_open_type`,
    `cx`.`last_email_open_date`, `cx`.`last_contacted_date`, `cx`.`last_email_link_click_date`,
    `cx`.`avg_order_amount`, `cx`.`email_marketing`, `cx`.`loyalty_email`, `cx`.`newsletter_email`,
    `cx`.`review_email`, `cx`.`lyt_points_earned`, `cx`.`is_product_reviewed`, `cx`.`is_bounce`,
    `cx`.`lyt_points_redeemed`,
    (lyt_points_earned - cx.lyt_points_redeemed - cx.lyt_points_expire) AS `points_remaining`,
    `cx`.`lyt_is_refer`, `cx`.`lyt_cus_birthday`, `cx`.`lyt_customer_status`, `cx`.`lyt_points_expire`,
    `cx`.`contact_status`, `cx`.`source_detail`, `cx`.`total_sent`, `cx`.`last_order_date`,
    `cx`.`utm_source`, `cx`.`utm_medium`, `cx`.`utm_campaign`, `cx`.`utm_content`, `cx`.`utm_term`,
    `cx`.`total_campaign_sent`, `cx`.`total_campaign_opened`, `cx`.`total_campaign_clicked`,
    `cx`.`last_campaign_sent_date`, `cx`.`last_campaign_opened_date`, `cx`.`last_campaign_clicked_date`,
    `cx`.`total_campaign_delivered`,
    `ca`.`company`, `ca`.`address1`, `ca`.`address2`, `ca`.`city`, `ca`.`province`, `ca`.`country`,
    `ca`.`zip`, `ca`.`province_code`, `ca`.`country_code`,
    `ct`.`tier_name`, `aft`.`code` AS `affiliate_code`
FROM `tbl_customers` AS `c`
LEFT JOIN `tbl_shop_setting` AS `ssh` ON c.shop_id = ssh.id
LEFT JOIN `aio_customer_extra` AS `cx` ON (c.id = cx.customer_id)
LEFT JOIN `tbl_customers_address` AS `ca` ON ca.id = cx.customer_default_address_id
LEFT JOIN `aio_lyt_customer_tier` AS `ct` ON cx.lyt_customer_tier_id = ct.id
LEFT JOIN `aio_customer_custom` AS `acc` ON acc.customer_id = c.id
LEFT JOIN `aio_aft_affiliates` AS `aft` ON aft.customer_id = c.id
WHERE (c.shop_id = 'xxxx')
GROUP BY `c`.`id`
ORDER BY `c`.`last_seen_date` DESC, `c`.`id` DESC
LIMIT 20
Note:
All the foreign keys and the GROUP BY and ORDER BY columns are properly indexed.
If we remove the GROUP BY and ORDER BY clauses, the query executes extremely fast (under 1 sec), but we can't remove them permanently; with GROUP BY and ORDER BY it takes 45 sec.
Indexing
If you don't already have these indexes, I recommend adding them:
c: INDEX(shop_id, id, last_seen_date) -- in this order
cx: INDEX(customer_id)
aft: INDEX(customer_id, code)
acc: INDEX(customer_id)
(If any is already the PRIMARY KEY, don't redundantly add an INDEX.)
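In runnable form, mapping the aliases back to their tables, these would be something like (the index names are invented; skip any that duplicates an existing key):

ALTER TABLE tbl_customers       ADD INDEX idx_shop_id_seen (shop_id, id, last_seen_date);
ALTER TABLE aio_customer_extra  ADD INDEX idx_customer     (customer_id);
ALTER TABLE aio_aft_affiliates  ADD INDEX idx_cust_code    (customer_id, code);
ALTER TABLE aio_customer_custom ADD INDEX idx_customer     (customer_id);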
If that does not help enough, please provide EXPLAIN SELECT ... and move on to...
Reformulate to avoid "explode-implode":
What you have gathers lots of data, then GROUPs them and ORDERs them. Instead, let's turn the query inside out.
find the 20 customers.id that are needed -- do this with as little effort as possible.
gather all the other stuff needed
Step 1 is probably simply
SELECT id
FROM `tbl_customers` AS `c1`
WHERE (c1.shop_id = 'xxxx')
GROUP BY `c1`.`id`
ORDER BY `c1`.`last_seen_date` DESC, `c1`.`id` DESC
LIMIT 20
Note that the index I recommended above will be "covering" for this subquery.
Step 2:
SELECT [ lots of stuff ]
FROM ( [ the above query ] ) AS c2
JOIN tbl_customers AS c ON c.id = c2.id
LEFT JOIN [ all the LEFT JOINs, as you have ]
ORDER BY c.last_seen_date DESC, c.id DESC -- yes, this is repeated
(Note that the GROUP BY and LIMIT exist only in the subquery.) (I assume that the PK of tbl_customers is id.)
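Putting the two steps together, a minimal sketch of the full pattern (the column list is abbreviated; the real query would carry all the columns from the original SELECT):

SELECT c.id AS customer_id, c.first_name, c.last_name, c.email,
       cx.avg_order_amount, ct.tier_name, aft.code AS affiliate_code
FROM (
    SELECT id
    FROM tbl_customers
    WHERE shop_id = 'xxxx'
    GROUP BY id
    ORDER BY last_seen_date DESC, id DESC
    LIMIT 20
) AS c2
JOIN tbl_customers AS c ON c.id = c2.id
LEFT JOIN tbl_shop_setting AS ssh ON c.shop_id = ssh.id
LEFT JOIN aio_customer_extra AS cx ON c.id = cx.customer_id
LEFT JOIN tbl_customers_address AS ca ON ca.id = cx.customer_default_address_id
LEFT JOIN aio_lyt_customer_tier AS ct ON cx.lyt_customer_tier_id = ct.id
LEFT JOIN aio_aft_affiliates AS aft ON aft.customer_id = c.id
-- aio_customer_custom (acc) is never selected from, so it is left out here;
-- if it can match multiple rows per customer, re-adding it would also need the GROUP BY back.
ORDER BY c.last_seen_date DESC, c.id DESC;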

MySQL Slow performance with count() of matching records in a joined table

There is a table called "basket_status" in the query below. For each record in basket_status, a count of yarn balls in the basket is made from another table (yarn_ball_updates).
The basket_status table has 761 rows. The yarn_ball_updates table has 1,204,294 records. Running the query below takes 30 to 60 seconds (depending on how busy the server is) and returns 750 rows. Obviously my problem is matching against 1,204,294 records for each of the 761 basket_status records.
I tried making a view based on the query, but it offered no performance increase. I believe I read that views can't contain subqueries and complex joins.
What direction should I take to speed up this query? I've never set up a MySQL scheduled task or anything, but it seems like the "basket_status" table should already have a "yarn_ball_count" column in it, with an automated process updating that extra count column, maybe?
Thanks for any help or direction.
SELECT p.id, p.basket_name, p.high_quality, p.yarn_ball_count
FROM (
SELECT q.id, q.basket_name, q.high_quality,
CAST(SUM(IF (q.report_date = mxd.mxdate,1,0)) AS CHAR) yarn_ball_count
FROM (
SELECT bs.id, bs.basket_name, bs.high_quality,ybu.report_date
FROM yb.basket_status bs
JOIN yb.yarn_ball_updates ybu ON bs.basket_name = ybu.alpha_pmn
) q,
(SELECT MAX(ybu.report_date) mxdate FROM yb.yarn_ball_updates ybu) mxd
GROUP BY q.basket_name, q.high_quality ) p
I don't think you need nested queries for this. I'm not a MySQL developer but won't this work?
SELECT bs.id, bs.basket_name, bs.high_quality, count(*) yarn_ball_count
FROM yb.basket_status bs
JOIN yb.yarn_ball_updates ybu ON bs.basket_name = ybu.alpha_pmn
JOIN (SELECT MAX(report_date) mxdate FROM yb.yarn_ball_updates) mxd ON ybu.report_date = mxd.mxdate
GROUP BY bs.id, bs.basket_name, bs.high_quality
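One caveat with the INNER JOIN version: baskets whose count would be 0 disappear from the result, whereas the original SUM(IF(...)) query returned 0 for baskets that have updates but none on the latest date. If those zero rows matter, a LEFT JOIN variant (a sketch using the same tables and columns as above) keeps every basket, including ones with no updates at all:

SELECT bs.id, bs.basket_name, bs.high_quality,
       COUNT(ybu.alpha_pmn) AS yarn_ball_count  -- counts only non-NULL matches
FROM yb.basket_status bs
LEFT JOIN yb.yarn_ball_updates ybu
       ON bs.basket_name = ybu.alpha_pmn
      AND ybu.report_date = (SELECT MAX(report_date) FROM yb.yarn_ball_updates)
GROUP BY bs.id, bs.basket_name, bs.high_quality;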

Optimizing MySQL query with subselect

I am trying to make the following query run faster than 180 secs:
SELECT
x.di_on_g AS deviceid, SUM(1) AS amount
FROM
(SELECT
g.device_id AS di_on_g
FROM
guide g
INNER JOIN operator_guide_type ogt ON ogt.guide_type_id = g.guide_type_id
INNER JOIN operator_device od ON od.device_id = g.device_id
WHERE
g.operator_id IN (1 , 1)
AND g.locale_id = 1
AND (g.device_id IN ("many (~1500) comma separated IDs coming from my code"))
GROUP BY g.device_id , g.guide_type_id) x
GROUP BY x.di_on_g
ORDER BY amount;
Screenshot from EXPLAIN:
https://ibb.co/da5oAF
Even if I run the subquery as a separate query, it is still very slow:
SELECT
g.device_id AS di_on_g
FROM
guide g
INNER JOIN operator_guide_type ogt ON ogt.guide_type_id = g.guide_type_id
INNER JOIN operator_device od ON od.device_id = g.device_id
WHERE
g.operator_id IN (1 , 1)
AND g.locale_id = 1
AND (g.device_id IN ("many (~1500) comma separated IDs coming from my code"))
Screenshot from EXPLAIN:
ibb.co/gJHRVF
I have indexes on g.device_id and on other appropriate places.
Indexes:
SHOW INDEX FROM guide;
ibb.co/eVgmVF
SHOW INDEX FROM operator_guide_type;
ibb.co/f0TTcv
SHOW INDEX FROM operator_device;
ibb.co/mseqqF
All IDs are integers. I tried creating a new temp table for the IDs that come from my code and JOINing that table instead of using the slow IN clause, but that didn't make the query much faster (only about 10 secs faster).
None of the tables has more than 300,000 rows, and the MySQL configuration is good.
And the visual plan:
Query Plan
Any help will be appreciated !
Let's focus on the subquery. The main problem is "inflate-deflate", but I will get to that in a moment.
Add the composite index:
INDEX(locale_id, operator_id, device_id)
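In runnable form, that would be something like (the index name is invented):

ALTER TABLE guide ADD INDEX idx_locale_op_device (locale_id, operator_id, device_id);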
Why the duplicated "1" in
g.operator_id IN (1 , 1)
Why does the GROUP BY have 2 columns when you select only 1? Is there some reason for using GROUP BY instead of DISTINCT? (The latter seems to be your intent.)
The only reason for these
INNER JOIN operator_guide_type ogt ON ogt.guide_type_id = g.guide_type_id
INNER JOIN operator_device od ON od.device_id = g.device_id
would be to verify that there are guides and devices in those other tables. Is that correct? Are ogt.guide_type_id and od.device_id the PRIMARY KEYs, and hence unique? If so, why do you need the GROUP BY? Based on the EXPLAIN, it sounds like both of those are related 1:many. So...
SELECT g.device_id AS di_on_g
FROM guide g
WHERE EXISTS( SELECT * FROM operator_guide_type WHERE guide_type_id = g.guide_type_id )
AND EXISTS( SELECT * FROM operator_device WHERE device_id = g.device_id )
AND g.operator_id IN (1)
AND g.locale_id = 1
AND g.device_id IN (...)
Notes:
The GROUP BY is no longer needed.
The "inflate-deflate" of JOIN + GROUP BY is gone. The Explain points this out -- 139K rows inflated to 61M -- very costly.
EXISTS is a "semijoin", meaning that it does not collect all matches, but stops when it finds any match.
"the mysql configuration is good" -- How much RAM do you have? What Engine is the table? What is the value of innodb_buffer_pool_size?

mySQL query performance with INNER JOINs

I have what may be a basic performance question. I've done a lot of SQL queries, but not much in terms of complex inner joins and such. So, here it is:
I have a database with 4 tables, countries, territories, employees, and transactions.
The transactions table links up with the employees and countries; the employees table links up with the territories. In order to produce a required report, I'm running a PHP script that processes a SQL query against a MySQL database.
SELECT trans.transactionDate, agent.code, agent.type, trans.transactionAmount, agent.territory
FROM transactionTable as trans
INNER JOIN
(
SELECT agent1.code as code, agent1.type as type, territory.territory as territory FROM agentTable as agent1
INNER JOIN territoryTable as territory
ON agent1.zip=territory.zip
) AS agent
ON agent.code=trans.agent
ORDER BY trans.agent
There are about 50,000 records in the agent table and over 200,000 in the transaction table; the other two are relatively tiny. It takes about 7 minutes to run this query, and I haven't even brought in the fourth table yet, which needs to relate a field in the transactionTable (country) to a field in the countryTable (country) and return a field in the countryTable (region).
So, two questions:
Where would I logically put the connection between the transactionTable and the countryTable?
Can anyone suggest a way that this can be quickened up?
Thanks.
Your query should be equivalent to this:
SELECT tx.transactionDate,
a.code,
a.type,
tx.transactionAmount,
t.territory
FROM transactionTable tx,
agentTable a,
territoryTable t
WHERE tx.agent = a.code
AND a.zip = t.zip
ORDER BY tx.agent
or to this if you like to use JOIN:
SELECT tx.transactionDate,
a.code,
a.type,
tx.transactionAmount,
t.territory
FROM transactionTable tx
JOIN agentTable a ON tx.agent = a.code
JOIN territoryTable t ON a.zip = t.zip
ORDER BY tx.agent
In order for these to run fast, you must have the following indexes on your tables:
CREATE INDEX transactionTable_agent ON transactionTable(agent);
CREATE INDEX territoryTable_zip ON territoryTable(zip);
CREATE INDEX agentTable_code ON agentTable(code);
(basically any field that is part of WHERE or JOIN constraint should be indexed).
That said, your table structure looks suspicious in that it joins on apparently non-unique fields like zip code. You really want to join on more unique entities, like agent id, transaction id, and so on; otherwise expect your queries to generate a lot of redundant data and be really slow.
One more note: INNER JOIN is equivalent to simply JOIN; there is no reason to type the redundant keyword.
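As for the countryTable question: assuming the columns described in the question (transactionTable.country matching countryTable.country, and countryTable.region as the returned field), it is simply one more JOIN, plus an index on the country column to support it:

CREATE INDEX countryTable_country ON countryTable(country);

SELECT tx.transactionDate,
       a.code,
       a.type,
       tx.transactionAmount,
       t.territory,
       ct.region
FROM transactionTable tx
JOIN agentTable a ON tx.agent = a.code
JOIN territoryTable t ON a.zip = t.zip
JOIN countryTable ct ON tx.country = ct.country
ORDER BY tx.agent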

Multiple left joins and performance

I have following tables:
products - 4500 records
Fields: id, sku, name, alias, price, special_price, quantity, desc, photo, manufacturer_id, model_id, hits, publishing
products_attribute_rel - 35000 records
Fields: id, product_id, attribute_id, attribute_val_id
attribute_values - 243 records
Fields: id, attr_id, value, ordering
manufacturers - 29 records
Fields: id, title,publishing
models - 946 records
Fields: id, manufacturer_id, title, publishing
So I get data from these tables by one query:
SELECT jp.*,
jm.id AS jm_id,
jm.title AS jm_title,
jmo.id AS jmo_id,
jmo.title AS jmo_title
FROM `products` AS jp
LEFT JOIN `products_attribute_rel` AS jpar ON jpar.product_id = jp.id
LEFT JOIN `attribute_values` AS jav ON jav.attr_id = jpar.attribute_val_id
LEFT JOIN `manufacturers` AS jm ON jm.id = jp.manufacturer_id
LEFT JOIN `models` AS jmo ON jmo.id = jp.model_id
GROUP BY jp.id HAVING COUNT(DISTINCT jpar.attribute_val_id) >= 0
This query is slow as hell; it takes MySQL hundreds of seconds to handle it.
So how would it be possible to improve this query? With small data chunks it works
perfectly well, but I guess the products_attribute_rel table, which has 35,000 records,
ruins everything.
Your help would be appreciated.
EDITED
EXPLAIN results of the SELECT query:
The problem is that MySQL uses the join type ALL for 3 tables. That means MySQL performs 3 full table scans, putting every combination together before filtering out the rows that don't match the ON clause. To get a much faster join type (for instance eq_ref), you must put an index on the columns that are used in the ON clauses.
Be aware, though, that putting an index on every possible column is not recommended. Indexes do speed up SELECT statements, but they also create overhead, since each index must be stored and managed, which makes manipulation queries like UPDATE and DELETE slower. I've seen queries take half an hour to delete only 1000 records. It's a trade-off where you have to decide what happens more often and what is more important.
To get more info on MySQL join types, take a look at this.
More on indexes here.
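Applied to this query's ON clauses, the indexes would be something like (index names are invented; the id columns should already be PRIMARY KEYs and need nothing extra):

ALTER TABLE products_attribute_rel ADD INDEX idx_par_product    (product_id);
ALTER TABLE attribute_values       ADD INDEX idx_av_attr        (attr_id);
ALTER TABLE products               ADD INDEX idx_p_manufacturer (manufacturer_id);
ALTER TABLE products               ADD INDEX idx_p_model        (model_id);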
The table data is not so huge that it should take hundreds of seconds; something is wrong with the table schema. Please do proper indexing. That will surely speed things up.
select distinct
jm.id AS jm_id,
jm.title AS jm_title,
jmo.id AS jmo_id,
jmo.title AS jmo_title
from products jp,
products_attribute_rel jpar,
attribute_values jav,
manufacturers jm,
models jmo
where jpar.product_id = jp.id
and jav.attr_id = jpar.attribute_val_id
and jm.id = jp.manufacturer_id
and jmo.id = jp.model_id
You can do that if you want to select all the data. Hope it works.