I'm still having problems understanding how to read, understand and optimize MySQL EXPLAIN output. I know to create indices on ORDER BY columns, but that's about it. Therefore I am hoping you can help me tune this query:
EXPLAIN
SELECT specie.id, specie.commonname, specie.block_description, maximage.title,
maximage.karma, imagefile.file_name, imagefile.width, imagefile.height,
imagefile.transferred
FROM specie
INNER JOIN specie_map ON specie_map.specie_id = specie.id
INNER JOIN (
SELECT *
FROM image
ORDER BY karma DESC
) AS maximage ON specie_map.image_id = maximage.id
INNER JOIN imagefile ON imagefile.image_id = maximage.id
AND imagefile.type = 'small'
GROUP BY specie.commonname
ORDER BY commonname ASC
LIMIT 0 , 24
What this query does is find the photo with the most karma for each specie. You can see the result of this live:
http://www.jungledragon.com/species
I have a table of species, a table of images, a mapping table in between and an imagefile table, since there are multiple image files (formats) per image.
Explain output:
For the specie table, I have indices on its primary id and the field commonname. For the image table, I have indices on its id and karma field, and a few others not relevant to this question.
This query currently takes 0.8 to 1.1s, which is too slow in my opinion. I suspect that the right index would speed this up many times, but I don't know which one.
I think you'd go a long way by getting rid of the subquery. Look at the first and last rows of the EXPLAIN result: it's copying the entire image table to a temporary table. You could obtain the same result by replacing the subquery with INNER JOIN image and moving ORDER BY karma DESC to the final ORDER BY clause:
SELECT specie.id, specie.commonname, specie.block_description, maximage.title,
maximage.karma, imagefile.file_name, imagefile.width, imagefile.height,
imagefile.transferred
FROM specie
INNER JOIN specie_map ON specie_map.specie_id = specie.id
INNER JOIN image AS maximage ON specie_map.image_id = maximage.id
INNER JOIN imagefile ON imagefile.image_id = maximage.id
AND imagefile.type = 'small'
GROUP BY specie.commonname
ORDER BY commonname ASC, karma DESC
LIMIT 0 , 24
The real point is that there is no need to "optimize MySQL EXPLAIN". There is usually a query (or several queries) that you want to be efficient, and EXPLAIN is a way to see whether the execution of the query is going to happen as you expect it to.
That is, you need to understand what the execution plan should look like and why, and compare it with the results of the EXPLAIN command. To understand what the plan should look like, you need to understand how indexes in MySQL work.
In the meantime, your query is a tricky one, since efficient index use runs into two obstacles here: a) ordering by a field from one table, and b) finding the row with the highest karma in each group from another table (the latter is a tricky task in itself). Since your database is rather small, you are lucky that your current query is rather fast (even though you consider it slow).
I would rewrite the query in a slightly hacky manner (I assume that there is at least one photo for each specie):
SELECT
specie.id, specie.commonname, specie.block_description,
maximage.title, maximage.karma,
imagefile.file_name, imagefile.width, imagefile.height, imagefile.transferred
FROM (
SELECT s.id,
(SELECT i.id
FROM specie_map sm
JOIN image i ON sm.image_id = i.id
WHERE sm.specie_id = s.id
ORDER BY i.karma DESC
LIMIT 1) as image_id
FROM specie s
ORDER BY s.commonname
LIMIT 0, 24
) as ids
JOIN specie
ON ids.id = specie.id
JOIN image as maximage
ON maximage.id = ids.image_id
JOIN imagefile
ON imagefile.image_id = ids.image_id AND imagefile.type = 'small';
You will need the following indexes:
(commonname) on specie
a composite (specie_id, image_id) on specie_map
a composite (id, karma) on image
a composite (image_id, type) on imagefile
Paging now should happen within the subquery.
The idea is to make complex computations within a subquery that operates with ids only and join for the rest of the data at the top. The data would be ordered in the order of the results of the subquery.
It would be better if you could provide the table structures and indexes. I came up with this alternative; it would be nice if you could try it and tell me what happens (I am curious!):
SELECT t.*, imf.* FROM (
SELECT s.*, (SELECT id FROM image WHERE karma = MAX(i.karma) LIMIT 1) AS max_image_id
FROM image i
INNER JOIN specie_map smap ON smap.image_id = i.id
INNER JOIN specie s ON s.id = smap.specie_id
GROUP BY s.commonname
ORDER BY s.commonname ASC
LIMIT 24
) t INNER JOIN imagefile imf
ON t.max_image_id = imf.image_id AND imf.type = 'small'
Related
We are using MySQL Aurora. We have 1 million customers in the customer table, and the customer table has around 30 columns.
On the first load we fetch 50 records with a LIMIT, ordered by date, together with the total customer count. This first load takes approx 60 sec. We also have different segmentation options by which users can filter their data. We have indexed all the important columns that can help speed up the filter queries. We also tried increasing the server resources but did not see significant changes.
We are looking for a solution that loads the records in under 3 sec on the first load, and with the different segmentations as well.
We are open to any solution, and are looking for answers to the following questions:
Is the AWS MySQL-compatible Aurora capable of doing this? Can it support queries this fast with complex filters?
Should we plan to sync our data to another type of database server and load from there?
Any suggestions for a database server that supports frequent inserts/updates and can query records extremely fast?
Any help on the above points will be really appreciated.
UPDATES:
SELECT c.id AS customer_id, c.first_name, c.last_name, c.email,
ssh.shop_name, ssh.domain,
cx.is_push_access, cx.last_contacted_by, cx.last_email_open_type,
cx.last_email_open_date, cx.last_contacted_date, cx.last_email_link_click_date,
cx.avg_order_amount, cx.email_marketing, cx.loyalty_email, cx.newsletter_email,
cx.review_email, cx.lyt_points_earned, cx.is_product_reviewed, cx.is_bounce,
cx.lyt_points_redeemed,
(lyt_points_earned - cx.lyt_points_redeemed - cx.lyt_points_expire) AS points_remaining,
cx.lyt_is_refer, cx.lyt_cus_birthday, cx.lyt_customer_status, cx.lyt_points_expire,
cx.contact_status, cx.source_detail, cx.total_sent, cx.last_order_date,
cx.utm_source, cx.utm_medium, cx.utm_campaign, cx.utm_content, cx.utm_term,
cx.total_campaign_sent, cx.total_campaign_opened, cx.total_campaign_clicked,
cx.last_campaign_sent_date, cx.last_campaign_opened_date, cx.last_campaign_clicked_date,
cx.total_campaign_delivered,
ca.company, ca.address1, ca.address2, ca.city, ca.province, ca.country, ca.zip,
ca.province_code, ca.country_code,
ct.tier_name, aft.code AS affiliate_code
FROM tbl_customers AS c
LEFT JOIN tbl_shop_setting AS ssh ON c.shop_id = ssh.id
LEFT JOIN aio_customer_extra AS cx ON c.id = cx.customer_id
LEFT JOIN tbl_customers_address AS ca ON ca.id = cx.customer_default_address_id
LEFT JOIN aio_lyt_customer_tier AS ct ON cx.lyt_customer_tier_id = ct.id
LEFT JOIN aio_customer_custom AS acc ON acc.customer_id = c.id
LEFT JOIN aio_aft_affiliates AS aft ON aft.customer_id = c.id
WHERE c.shop_id = 'xxxx'
GROUP BY c.id
ORDER BY c.last_seen_date DESC, c.id DESC
LIMIT 20
Note:
All the foreign keys and the GROUP BY and ORDER BY columns are properly indexed.
If we remove the GROUP BY and ORDER BY clauses, the query executes extremely fast (under 1 sec), but we can't remove them permanently; with GROUP BY and ORDER BY it takes 45 sec.
Indexing
If you don't already have these indexes, I recommend adding them:
c: INDEX(shop_id, id, last_seen_date) -- in this order
cx: INDEX(customer_id)
aft: INDEX(customer_id, code)
acc: INDEX(customer_id)
(If any is already the PRIMARY KEY, don't redundantly add an INDEX.)
If that does not help enough, please provide EXPLAIN SELECT ... and move on to...
Reformulate to avoid "explode-implode":
What you have gathers lots of data, then GROUPs them and ORDERs them. Instead, let's turn the query inside out.
1. Find the 20 customers.id values that are needed -- do this with as little effort as possible.
2. Gather all the other stuff needed.
Step 1 is probably simply
SELECT id
FROM `tbl_customers` AS `c1`
WHERE (c1.shop_id = 'xxxx')
GROUP BY `c1`.`id`
ORDER BY `c1`.`last_seen_date` DESC, `c1`.`id` DESC
LIMIT 20
Note that the index I recommended above will be "covering" for this subquery.
Step 2:
SELECT [ lots of stuff ]
FROM ( [ the above query ] ) AS c2
JOIN tbl_customers AS c USING(customer_id)
LEFT JOIN [ all the LEFT JOINs, as you have ]
ORDER BY c.last_seen_date DESC, c.id DESC -- yes, this is repeated
(Note that the GROUP BY and LIMIT exist only in the subquery.) (I assume that the PK of tbl_customers is customer_id.)
So, two (really three) questions: is my query just badly coded or badly thought out? (Be kind, I only just discovered CROSS APPLY and am relatively new.) And is CROSS APPLY even the best sort of join to be using here -- or why is it slow?
I have a database table (Test_Table) of around 66 million records. I then have a temp table, ##MyTempTable, created with one column called Order_nbr (nchar(13)). These are basically the orders I wish to find.
Test_Table has 4 columns (Order_nbr, destination, shelf_no, dte_bought).
This is my current query. It works exactly the way I want it to, but its performance seems quite slow.
select ##MyTempTable.Order_nbr, test_table1.destination, test_table1.shelf_no, test_table1.dte_bought
from ##MyTempTable
cross apply(
    select top 1 Test_Table.destination, Test_Table.shelf_no, Test_Table.dte_bought
    from Test_Table
    where ##MyTempTable.Order_nbr = Test_Table.Order_nbr
    order by Test_Table.dte_bought desc) test_table1
If ##MyTempTable only has 17 orders to search for, it takes around 2 minutes. As you can see, I'm trying to get just the most recent dte_bought (in effect MAX(dte_bought)) for each order.
In terms of indexes, I ran the Database Engine Tuning Advisor and it says the table is optimized for the query, and I have all the relevant indexes created, such as a clustered index on Test_Table on dte_bought DESC including Order_nbr, etc.
The execution plan is using an index scan (on a non-clustered index) and a key lookup (on the clustered one).
The end result should be all the Order_nbrs in ##MyTempTable along with the destination, shelf_no and dte_bought columns for each Order_nbr, but only for the most recently bought one.
Sorry if I explained this awfully; any info needed that I can provide, just ask. I'm not asking for a downright "give me the code" -- more for guidance, advice and learning. Thank you in advance.
UPDATE
I have now tried a sort of LEFT JOIN. It runs reasonably quicker, but is still not instant or very fast (about 30 seconds), and it also doesn't return just the most recent dte_bought. Any ideas? See below for the LEFT JOIN code.
select a.Order_nbr, b.destination, b.shelf_no, b.dte_bought
from ##MyTempTable a
left join Test_Table b
    on a.Order_nbr = b.Order_nbr
where b.destination is not null
UPDATE 2
Attempted another LEFT JOIN with a MAX(dte_bought). It runs very quickly, but only the Order_nbr comes back; the other columns are NULL. Any suggestions?
select a.Order_nbr,b.Destination,b.Shelf_no,b.Dte_Bought
from ##MyTempTable a
left join
(select * from Test_Table where Dte_bought = (
select max(dte_bought) from Test_Table)
)b on b.Order_nbr = a.Order_nbr
order by Dte_bought asc
Instead of CROSS APPLY you can use an INNER JOIN with a subquery. Check the following query:
SELECT
TempT.Order_nbr
,TestT.destination
,TestT.shelf_no
,TestT.dte_bought
FROM ##MyTempTable TempT
INNER JOIN (
SELECT T.Order_nbr
,T.destination
,T.shelf_no
,T.dte_bought
,ROW_NUMBER() OVER(PARTITION BY T.Order_nbr ORDER BY T.dte_bought DESC) AS ID
FROM Test_Table T
) TestT
ON TestT.ID = 1 AND TempT.Order_nbr = TestT.Order_nbr
Query-
SELECT SUM(sale_data.total_sale) as totalsale,
sale_data_temp.customer_type_cy as customer_type,
distributor_list.customer_status
FROM distributor_list
LEFT JOIN sale_data
ON sale_data.depo_code = distributor_list.depo_code
and sale_data.customer_code = distributor_list.customer_code
LEFT JOIN sale_data_temp
ON distributor_list.address_coordinates = sale_data_temp.address_coordinates
LEFT JOIN item_master
ON sale_data.item_code = item_master.item_code
WHERE invoice_date BETWEEN "2017-04-01" and "2017-11-01"
AND item_master.id_category = 1
GROUP BY distributor_list.address_coordinates
DESC-
This query takes 7.5 seconds to run. My application contains 3-4 such queries, so loading time approaches 1 min on the server.
My sale_data table contains 450K records.
distributor_list contains 970 records.
item_master contains 7774 records, and sale_data_temp contains 324 records.
I am using indexing, but it is not being used for the sale_data table.
All of the ~400K records are searched, as is evident from the EXPLAIN.
If I reduce the duration of the BETWEEN clause, then the sale_data table uses the date index; otherwise it scans all 400K rows.
There are 84,000 rows between 2017-04-01 and 2017-11-01, but it still scans 400K rows.
MYSQL EXPLAIN-
I have modified queries two times with no success.
Modification 1:
SELECT SUM(sale_data.total_sale) as totalsale,
sale_data_temp.customer_type_cy as customer_type,
distributor_list.customer_status
FROM distributor_list
LEFT JOIN sale_data
ON sale_data.depo_code = distributor_list.depo_code
and sale_data.customer_code = distributor_list.customer_code
AND invoice_date BETWEEN "2017-04-01" and "2017-11-01"
LEFT JOIN sale_data_temp
ON distributor_list.address_coordinates = sale_data_temp.address_coordinates
LEFT JOIN item_master
ON sale_data.item_code = item_master.item_code
WHERE item_master.id_category = 1
GROUP BY distributor_list.address_coordinates
Modification 2
SELECT SQL_NO_CACHE SUM(sd.total_sale) AS totalsale,
sale_data_temp.customer_type_cy AS customer_type,
distributor_list.customer_status
FROM distributor_list
LEFT JOIN (SELECT * FROM sale_data
WHERE invoice_date BETWEEN "2017-04-01" AND "2017-11-01") sd
ON sd.depo_code = distributor_list.depo_code
AND sd.customer_code = distributor_list.customer_code
LEFT JOIN sale_data_temp
ON distributor_list.address_coordinates = sale_data_temp.address_coordinates
LEFT JOIN item_master
ON sd.item_code = item_master.item_code
WHERE item_master.id_category = 1
GROUP BY distributor_list.address_coordinates
HERE ARE MY INDEXES ON SALE DATA TABLE
Look at the key column of the EXPLAIN results: no key is being used at the moment, so MySQL is not using any of your indexes to filter out rows and is scanning the whole table on every query. This is why it is taking so long.
I have taken a look at your first query in relation to your sale_data indices. It looks like you will need to create a new composite index on this table containing the following columns only:
depo_code, customer_code, item_code, invoice_date, total_sale
I recommend that you name this index test1 and experiment with the ordering of the columns, testing each time with EXPLAIN EXTENDED, until the index is chosen -- you want to see test1 appear in the key column.
See this answer that has helped me before with this, and it will help you understand the importance of correctly ordering your composite indices.
Looking at the cardinality of the single field indices, here is my best attempt at giving you the correct index to apply:
ALTER TABLE `sale_data` ADD INDEX `test1` (`item_code`, `customer_code`, `invoice_date`, `depo_code`, `total_sale`);
Good luck with your mission!
A few things to notice about your query.
You are misusing the notorious MySQL extension to GROUP BY. Read this, then mention the same columns in your GROUP BY clause as you mention in your SELECT clause.
Your LEFT JOIN sale_data and LEFT JOIN item_master operations are actually ordinary JOIN operations. Why? You mention columns from those tables in your WHERE clause.
Your best bet for speedup is doing a date-range scan on an index on sale_data.invoice_date. For some reason known only to the MySQL query planner's feverish machinations, you're not getting it.
Try refactoring your query. Here's one suggestion:
SELECT SUM(sale_data.total_sale) as totalsale,
sale_data_temp.customer_type_cy as customer_type,
distributor_list.customer_status
FROM distributor_list
JOIN sale_data
ON sale_data.invoice_date BETWEEN "2017-04-01" and "2017-11-01"
and sale_data.depo_code = distributor_list.depo_code
and sale_data.customer_code = distributor_list.customer_code
LEFT JOIN sale_data_temp
ON distributor_list.address_coordinates = sale_data_temp.address_coordinates
JOIN item_master
ON sale_data.item_code = item_master.item_code
WHERE item_master.id_category = 1
GROUP BY sale_data_temp.customer_type_cy, distributor_list.customer_status
Try creating a covering index on sale_data for this query. You'll have to mess around a bit to get this right, but this is a starting point. (invoice_date, item_code, depo_code, customer_code, total_sale). The point of a covering index is to allow the query to be satisfied entirely from the index without having to refer back to the table's data. That's why I included total_sale in the index.
Please notice that the index I suggested makes your index on invoice_date redundant. You can drop that index.
I am trying to make the following query run faster than 180 secs:
SELECT
x.di_on_g AS deviceid, SUM(1) AS amount
FROM
(SELECT
g.device_id AS di_on_g
FROM
guide g
INNER JOIN operator_guide_type ogt ON ogt.guide_type_id = g.guide_type_id
INNER JOIN operator_device od ON od.device_id = g.device_id
WHERE
g.operator_id IN (1 , 1)
AND g.locale_id = 1
AND (g.device_id IN ("many (~1500) comma separated IDs coming from my code"))
GROUP BY g.device_id , g.guide_type_id) x
GROUP BY x.di_on_g
ORDER BY amount;
Screenshot from EXPLAIN:
https://ibb.co/da5oAF
Even if I run the subquery as a separate query, it is still very slow:
SELECT
g.device_id AS di_on_g
FROM
guide g
INNER JOIN operator_guide_type ogt ON ogt.guide_type_id = g.guide_type_id
INNER JOIN operator_device od ON od.device_id = g.device_id
WHERE
g.operator_id IN (1 , 1)
AND g.locale_id = 1
AND g.device_id IN ("many (~1500) comma separated IDs coming from my code")
Screenshot from EXPLAIN:
ibb.co/gJHRVF
I have indexes on g.device_id and on other appropriate places.
Indexes:
SHOW INDEX FROM guide;
ibb.co/eVgmVF
SHOW INDEX FROM operator_guide_type;
ibb.co/f0TTcv
SHOW INDEX FROM operator_device;
ibb.co/mseqqF
All the IDs are integers. I tried creating a new temp table for the IDs that come from my code and JOINing that table instead of using the slow IN clause, but that didn't make the query much faster (only about 10 secs faster).
None of the tables has more than 300,000 rows, and the MySQL configuration is good.
And the visual plan:
Query Plan
Any help will be appreciated !
Let's focus on the subquery. The main problem is "inflate-deflate", but I will get to that in a moment.
Add the composite index:
INDEX(locale_id, operator_id, device_id)
Why the duplicated "1" in
g.operator_id IN (1 , 1)
Why does the GROUP BY have 2 columns when you select only 1? Is there some reason for using GROUP BY instead of DISTINCT? (The latter seems to be your intent.)
The only reason for these
INNER JOIN operator_guide_type ogt ON ogt.guide_type_id = g.guide_type_id
INNER JOIN operator_device od ON od.device_id = g.device_id
would be to verify that matching rows exist in those other tables. Is that correct? Are ogt.guide_type_id and od.device_id the PRIMARY KEYs, hence unique? If so, why do you need the GROUP BY? Based on the EXPLAIN, it sounds like both of those are related 1:many. So...
SELECT g.device_id AS di_on_g
FROM guide g
WHERE EXISTS( SELECT * FROM operator_guide_type WHERE guide_type_id = g.guide_type_id )
AND EXISTS( SELECT * FROM operator_device WHERE device_id = g.device_id )
AND g.operator_id IN (1)
AND g.locale_id = 1
AND g.device_id IN (...)
Notes:
The GROUP BY is no longer needed.
The "inflate-deflate" of JOIN + GROUP BY is gone. The Explain points this out -- 139K rows inflated to 61M -- very costly.
EXISTS is a "semijoin", meaning that it does not collect all matches, but stops when it finds any match.
"the mysql configuration is good" -- How much RAM do you have? What Engine is the table? What is the value of innodb_buffer_pool_size?
This particular query pops up in the slow query log all the time for me. Any way to improve its efficiency?
SELECT
mov_id,
mov_title,
GROUP_CONCAT(DISTINCT genres.genre_name) as all_genres,
mov_desc,
mov_added,
mov_thumb,
mov_hits,
mov_numvotes,
mov_totalvote,
mov_imdb,
mov_release,
mov_type
FROM movies
LEFT JOIN _genres
ON movies.mov_id = _genres.gen_movieid
LEFT JOIN genres
ON _genres.gen_catid = genres.grenre_id
WHERE mov_status = 1 AND mov_incomplete = 0 AND mov_type = 1
GROUP BY mov_id
ORDER BY mov_added DESC
LIMIT 0, 20;
My main concern is the GROUP_CONCAT function, which outputs a comma-separated list of the genres associated with a particular film; I run it through a for loop to make clickable links.
Do you need the genre names? If you can do with just the genre_id, you can eliminate the second join. (You can fill in the genre name later, in the UI, using a cache).
What indexes do you have?
You probably want
create index idx_movies on movies
(mov_added, mov_type, mov_status, mov_incomplete)
and most certainly the join index
create index ind_genres_movies on _genres
(gen_mov_id, gen_cat_id)
Can you post the output of EXPLAIN? i.e. put EXPLAIN in front of the SELECT and post the results.
I've had quite a few wins using SELECT STRAIGHT_JOIN and ordering the tables according to their size.
STRAIGHT_JOIN stops MySQL from guessing which order to join the tables in and joins them in the order specified, so if you put your smallest table first you can reduce the number of rows being joined.
I'm assuming you have indexes on mov_id, gen_movieid, gen_catid and grenre_id?
The 'Using temporary; Using filesort' comes from the GROUP_CONCAT(DISTINCT genres.genre_name).
Trying to get a DISTINCT on a column without an index will cause a temp table to be used.
Try adding an index on genre_name column.