To pull a large set of columns (~15-20) from several joined tables, I put together 2 views that return the necessary information. In my local DB (only ~1k posts rows), joining these views worked fine; however, when I created the same views on our production DB (~30k posts rows) and attempted to join them, I realized that the solution wouldn't scale beyond a test dataset.
I attempted to migrate those 2 views (categories data, like categories.title, and creators' data, like users.display_name) into a CTE, post_data, which in theory would act as a keyed version of those views and let me get all post data for the eligible posts.
I have put together a sample DBFiddle with some test data to explain the table structure. The actual data has many more columns, but this is representative of the joins necessary to build the query.
table : posts
+-----+-----------+------------+------------------------------------------+----------------------------------------+
| id | parent_id | created_by | message | attachments |
+-----+-----------+------------+------------------------------------------+----------------------------------------+
| 8 | NULL | 8 | laptop for sale | [{"media_id": 1380}] |
| 9 | NULL | 4 | NEW lamp shade up for grabs | [{"media_id": 1442}, {"link_id": 103}] |
| 10 | 1 | 7 | Oooh I could be interested | |
| 11 | 1 | 7 | DMing you now! I've been looking for one | |
+-----+-----------+------------+------------------------------------------+----------------------------------------+
table : users
+----+------------------+---------------------------+
| id | display_name | created_at |
+----+------------------+---------------------------+
| 1 | John Appleseed | 2018-02-20T00:00:00+00:00 |
| 2 | Massimo Jenkins | 2018-05-14T00:00:00+00:00 |
| 3 | Johanna Marionna | 2018-06-05T00:00:00+00:00 |
| 4 | Jackson Creek | 2018-11-15T00:00:00+00:00 |
| 5 | Joe Schmoe | 2019-01-09T00:00:00+00:00 |
| 6 | John Johnson | 2019-02-14T00:00:00+00:00 |
| 7 | Donna Madison | 2019-05-14T00:00:00+00:00 |
| 8 | Jenna Kaplan | 2019-06-23T00:00:00+00:00 |
+----+------------------+---------------------------+
table : categories
+----+------------+------------+-------------------------------------------------------+
| id | created_by | title | description |
+----+------------+------------+-------------------------------------------------------+
| 1 | 2 | Technology | Anything tech; Consumer, business or education tools! |
| 2 | 2 | Home Goods | Anything for the home |
+----+------------+------------+-------------------------------------------------------+
table : categories_posts
+---------+-------------+
| post_id | category_id |
+---------+-------------+
| 8 | 1 |
| 9 | 1 |
| 10 | 1 |
| 11 | 1 |
+---------+-------------+
table : users_categories
+---------+-------------+
| user_id | category_id |
+---------+-------------+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
+---------+-------------+
table : posts_removed
+---------+----------------------+------------+
| post_id | removed_at | removed_by |
+---------+----------------------+------------+
| 10 | 2019-01-22 09:08:14 | 7 |
+---------+----------------------+------------+
In the below query, eligible posts are determined in the base SELECT; then, the post_data CTE is joined to the result set (limited to 25 rows) and all columns from the CTE are returned.
WITH post_data AS (
SELECT posts.id,
posts.parent_id,
posts.created_by,
posts.attachments,
categories_posts.category_id,
categories.title,
categories.created_by AS category_created_by,
creator.display_name AS creator_display_name,
creator.created_at AS creator_created_at
/* ... And a whole bunch of other fields from posts, categories_posts, users */
FROM posts
LEFT OUTER JOIN categories_posts
ON categories_posts.post_id = posts.id
LEFT OUTER JOIN categories
ON categories.id = categories_posts.category_id
LEFT OUTER JOIN users creator
ON creator.id = posts.created_by
/* ... And a whole bunch of other joins to facilitate the selected fields */
)
SELECT post_data.*
FROM posts
/* Set up the criteria for the posts selected before getting their data from the CTE */
LEFT OUTER JOIN posts_removed removed ON removed.post_id = posts.id
LEFT OUTER JOIN users user_me ON user_me.id = "1"
LEFT OUTER JOIN users_followed ON users_followed.user_id = posts.created_by
AND users_followed.followed_by = user_me.id
LEFT OUTER JOIN categories_posts ON categories_posts.post_id = posts.id
LEFT OUTER JOIN users_categories ON users_categories.category_id = categories_posts.category_id
LEFT OUTER JOIN posts_removed pp_removed ON pp_removed.post_id = posts.parent_id
/* Join our post_data on the post's ID */
JOIN post_data ON post_data.id = posts.id
WHERE
(
(
users_categories.user_id = user_me.id AND users_categories.left_at IS NULL
) OR categories_posts.category_id IS NULL
) AND (
posts.created_by = user_me.id
OR users_followed.followed_by = user_me.id
OR categories_posts.category_id IS NOT NULL
) AND removed.removed_at IS NULL
AND pp_removed.removed_at IS NULL
AND (post_data.id = posts.id OR post_data.id = posts.parent_id)
ORDER BY posts.id DESC
LIMIT 25
In theory, I thought this would work by selecting the rows based on the base select criteria, then doing an index scan for the CTE based on the Post ID; however, it seems that the query optimizer chooses instead to do a full table scan of the posts table.
The EXPLAIN SELECT gave me this information:
+----+-------------+------------------------+--------+-------------------------------+-------------+---------+---------------------------------------------+--------+----------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | extra |
+----+-------------+------------------------+--------+-------------------------------+-------------+---------+---------------------------------------------+--------+----------+----------------------------------------------------+
| 1 | PRIMARY | posts | ALL | PRIMARY,parent_id,created_by | | | | 33870 | 100 | Using temporary; Using filesort |
| 1 | PRIMARY | removed | eq_ref | PRIMARY | PRIMARY | 8 | posts.id | 1 | 19 | Using where |
| 1 | PRIMARY | user_me | const | PRIMARY | PRIMARY | 8 | const | 1 | 100 | Using where; Using index |
| 1 | PRIMARY | categories_posts | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.id | 1 | 100 | |
| 1 | PRIMARY | categories | eq_ref | PRIMARY | PRIMARY | 8 | categories_posts.category_id | 1 | 100 | Using index |
| 1 | PRIMARY | users_categories | eq_ref | user_id_2,user_id,category_id | user_id_2 | 16 | user_me.id,api.categories_posts.category_id | 1 | 100 | Using where |
| 1 | PRIMARY | users_followed | eq_ref | user_id,followed_by | user_id | 16 | posts.created_by,api.user_me.id | 1 | 100 | Using where; Using index |
| 1 | PRIMARY | pp_removed | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.parent_id | 1 | 19 | Using where |
| 1 | PRIMARY | <derived2> | ALL | | | | | 493911 | 19 | Using where; Using join buffer (Block Nested Loop) |
| 2 | DERIVED | posts | ALL | | | | | 33870 | 100 | Using temporary |
| 2 | DERIVED | categories_posts | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.id | 1 | 100 | |
| 2 | DERIVED | categories | eq_ref | PRIMARY | PRIMARY | 8 | api.categories_posts.category_id | 1 | 100 | |
| 2 | DERIVED | posts_votes | ref | post_id | post_id | 8 | api.posts.id | 1 | 100 | Using index |
| 2 | DERIVED | pp | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.parent_id | 1 | 100 | |
| 2 | DERIVED | pp_removed | eq_ref | PRIMARY | PRIMARY | 8 | api.pp.id | 1 | 100 | Using index |
| 2 | DERIVED | removed | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.id | 1 | 100 | Using index |
| 2 | DERIVED | creator | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.created_by | 1 | 100 | |
| 2 | DERIVED | usernames | ref | user_id | user_id | 8 | api.creator.id | 1 | 100 | |
| 2 | DERIVED | verifications | ALL | | | | | 4 | 100 | Using where; Using join buffer (Block Nested Loop) |
| 2 | DERIVED | categories_identifiers | ref | category_id | category_id | 8 | api.categories.id | 1 | 100 | |
+----+-------------+------------------------+--------+-------------------------------+-------------+---------+---------------------------------------------+--------+----------+----------------------------------------------------+
Beyond this, I tried refactoring my query to force key usage on the posts table, for example by using FORCE INDEX(PRIMARY) in the select, and by moving the CTE to be the base query and adding a filter WHERE id IN ({the original base query}), but the optimizer still does a full table scan.
In case it's helpful to decode what's happening in the query plan:
At time of writing, there are 33,387 posts rows, but the query plan shows different numbers:
The query plan shows a full table scan which returns 33,870 rows
The query plan also shows the derived table (<derived2>) as having 493,911 rows
My core questions are:
Am I correct when I say that subqueries should only be executed once per result row from the base select query? If so, then the CTE should also use the JOIN on posts.id and likely use the table index?
Why does the query plan show that it selects 33,870 rows when there are only 33,387? And where do the 493,911 rows come from?
How do you prevent a full table scan in this case?
Give this a try: apply the LIMIT 25 before JOINing to the CTE (the WITH post_data AS (...) definition stays in front of this SELECT):
SELECT * FROM
( SELECT ... FROM posts
JOIN categories_posts ...
ORDER BY posts.id DESC
LIMIT 25 ) AS x
JOIN post_data
ON post_data.id IN (x.id, x.parent_id)
ORDER BY x.id DESC
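To make the "limit first, join afterwards" shape concrete, here is a minimal sketch. It runs on SQLite through Python's sqlite3 module purely as a stand-in, so it says nothing about the plan MySQL will choose. The table contents and the created_by = 1 filter are made up for illustration, and the join is simplified to post_data.id = x.id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, display_name TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER,
        created_by INTEGER REFERENCES users(id),
        message TEXT
    );
""")
# Hypothetical data: 5 users and 200 top-level posts, authors cycling 1..5.
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(u, f"user {u}") for u in range(1, 6)])
conn.executemany("INSERT INTO posts VALUES (?, NULL, ?, ?)",
                 [(i, (i % 5) + 1, f"post {i}") for i in range(1, 201)])

query = """
WITH post_data AS (            -- the wide "all the columns" query
    SELECT posts.id, posts.message, creator.display_name
    FROM posts
    LEFT JOIN users creator ON creator.id = posts.created_by
)
SELECT post_data.id, post_data.display_name
FROM (                         -- decide on the 25 ids first
    SELECT id
    FROM posts
    WHERE created_by = 1       -- stand-in for the real eligibility rules
    ORDER BY id DESC
    LIMIT 25
) AS x
JOIN post_data ON post_data.id = x.id
ORDER BY x.id DESC
"""
rows = conn.execute(query).fetchall()
print(len(rows), rows[0], rows[-1])
```

The inner SELECT only has to produce 25 ids, so the wide joins are needed for at most 25 rows. Whether MySQL actually resolves post_data through an index lookup still has to be checked with EXPLAIN on the real schema.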
Related
Today we noticed that two apparently identical queries were resulting in vastly different execution plans, and in turn in vastly different performance.
Another startling fact is that the query aggregating over 50k+ rows runs roughly 30x faster than the query aggregating over 600 rows.
The "fast" query runs in ~400 ms and the "slow" query runs in ~10 s.
The slow query:
SELECT account_ownership_id, count(*)
FROM posts
JOIN accounts ON posts.account_id = accounts.id
JOIN platforms ON accounts.platform_id = platforms.id
JOIN sponsor_annotations ON sponsor_annotations.post_id = posts.id
JOIN rightsholders_placements
ON (rightsholders_placements.rightsholder_id = sponsor_annotations.rightsholder_id
AND rightsholders_placements.placement_id = sponsor_annotations.placement_id)
JOIN clients_sponsors_placements
ON (clients_sponsors_placements.rightsholder_id = sponsor_annotations.rightsholder_id
AND clients_sponsors_placements.sponsor_id = sponsor_annotations.sponsor_id
AND clients_sponsors_placements.placement_id = sponsor_annotations.placement_id)
WHERE clients_sponsors_placements.client_id = 1125 and accounts.platform_id = 5
GROUP BY sponsor_annotations.account_ownership_id LIMIT 1000
The fast query:
SELECT account_ownership_id, count(*)
FROM posts
JOIN accounts ON posts.account_id = accounts.id
JOIN platforms ON accounts.platform_id = platforms.id
JOIN sponsor_annotations ON sponsor_annotations.post_id = posts.id
JOIN rightsholders_placements
ON (rightsholders_placements.rightsholder_id = sponsor_annotations.rightsholder_id
AND rightsholders_placements.placement_id = sponsor_annotations.placement_id)
JOIN clients_sponsors_placements
ON (clients_sponsors_placements.rightsholder_id = sponsor_annotations.rightsholder_id
AND clients_sponsors_placements.sponsor_id = sponsor_annotations.sponsor_id
AND clients_sponsors_placements.placement_id = sponsor_annotations.placement_id)
WHERE clients_sponsors_placements.client_id = 1125 and accounts.platform_id = 1
GROUP BY sponsor_annotations.account_ownership_id LIMIT 1000
As you can see, the only difference between the two queries is the platform_id value in the WHERE clause. I would expect the query plans to be very similar, but they're not. And even though the slow query aggregates over far fewer rows, it takes significantly more time.
What's also interesting is that changing the condition to "accounts.platform_id in (1,5)" speeds up the query significantly (it becomes as fast as the one with accounts.platform_id = 1).
Here is the explain plan for the slow query:
+---+--------+-----------------------------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+----+--------------------------------------------------------------------------------------------------------------------------------+-----+-------+----------------------------------------------+
| 1 | SIMPLE | platforms | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.0 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | accounts | ref | PRIMARY,fk_accounts_platforms | fk_accounts_platforms | 4 | const | 354 | 100.0 | Using index |
| 1 | SIMPLE | posts | ref | PRIMARY,fk_posts_accounts_id | fk_posts_accounts_id | 4 | sports.accounts.id | 2 | 100.0 | Using index |
| 1 | SIMPLE | sponsor_annotations | ref | sponsor_annotations,fk_sponsor_annotations_sponsor_placements,fk_sponsor_annotations_rightsholder_placements,fk_sponsor_annotations_sponsors,fk_sponsor_annotations_account_ownership_types | sponsor_annotations | 4 | sports.posts.id | 29 | 100.0 | |
| 1 | SIMPLE | clients_sponsors_placements | eq_ref | PRIMARY,fk_client_sponsor_placements_clients_rightsholders_sponsors,fk_client_sponsor_placements_sponsor_placements | PRIMARY | 16 | const,sports.sponsor_annotations.placement_id,sports.sponsor_annotations.rightsholder_id,sports.sponsor_annotations.sponsor_id | 1 | 100.0 | Using index |
| 1 | SIMPLE | rightsholders_placements | eq_ref | PRIMARY,fk_rightsholders_placements_rightsholders | PRIMARY | 8 | sports.sponsor_annotations.placement_id,sports.sponsor_annotations.rightsholder_id | 1 | 100.0 | Using index |
+---+--------+-----------------------------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+----+--------------------------------------------------------------------------------------------------------------------------------+-----+-------+----------------------------------------------+
And the explain plan for the faster query:
+---+--------+-----------------------------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+----+--------------------------------------------------------------------------------------------------------------------------------------------------+-----+-------+----------------------------------------------+
| 1 | SIMPLE | platforms | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.0 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | clients_sponsors_placements | ref | PRIMARY,fk_client_sponsor_placements_clients_rightsholders_sponsors,fk_client_sponsor_placements_sponsor_placements | PRIMARY | 4 | const | 223 | 100.0 | Using index |
| 1 | SIMPLE | rightsholders_placements | eq_ref | PRIMARY,fk_rightsholders_placements_rightsholders | PRIMARY | 8 | sports.clients_sponsors_placements.placement_id,sports.clients_sponsors_placements.rightsholder_id | 1 | 100.0 | Using index |
| 1 | SIMPLE | sponsor_annotations | ref | sponsor_annotations,fk_sponsor_annotations_sponsor_placements,fk_sponsor_annotations_rightsholder_placements,fk_sponsor_annotations_sponsors,fk_sponsor_annotations_account_ownership_types | fk_sponsor_annotations_sponsor_placements | 12 | sports.clients_sponsors_placements.rightsholder_id,sports.clients_sponsors_placements.sponsor_id,sports.clients_sponsors_placements.placement_id | 158 | 100.0 | |
| 1 | SIMPLE | posts | eq_ref | PRIMARY,fk_posts_accounts_id | PRIMARY | 4 | sports.sponsor_annotations.post_id | 1 | 100.0 | |
| 1 | SIMPLE | accounts | eq_ref | PRIMARY,fk_accounts_platforms | PRIMARY | 4 | sports.posts.account_id | 1 | 100.0 | Using where |
+---+--------+-----------------------------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+----+--------------------------------------------------------------------------------------------------------------------------------------------------+-----+-------+----------------------------------------------+
How would I go about changing either my query or my schema to make sure that in both cases the execution plan is identical?
Thanks,
I'm creating an e-commerce website using MySQL. I have successfully created the database and inserted data into it.
Here is my database schema
table: categories table: product_types
+----+--------------+ +----+-------------+------------+
| id | name | | id | category_id | name |
+----+--------------+ +----+-------------+------------+
| 1 | Electronics | | 1 | 1 | Smartphone |
| 2 | Fashion | | 2 | 1 | Speakers |
+----+--------------+ +----+-------------+------------+
table: products
+----+-----------------+-------------+-------------------+-------+
| id | product_type_id | category_id | name | price |
+----+-----------------+-------------+-------------------+-------+
| 1 | 1 | 1 | Samsung Galaxy A3 | 300 |
| 2 | 1 | 1 | Samsung Galaxy A7 | 400 |
+----+-----------------+-------------+-------------------+-------+
table: options table: option_values
+----+-----------------+-------+ +----+-----------+------------+
| id | product_type_id | name | | id | option_id | name |
+----+-----------------+-------+ +----+-----------+------------+
| 1 | 1 | RAM | | 1 | 1 | 512 MB |
| 2 | 1 | Screen| | 2 | 1 | 1 GB |
| 3 | 1 | OS | | 3 | 3 | Android 5 |
+----+-----------------+-------+ | 4 | 3 | Android 6 |
| 5 | 2 | HD |
| 6 | 2 | FHD |
+----+-----------+------------+
table: product_option_values
+----+------------+-----------+-----------------+
| id | product_id | option_id | option_value_id |
+----+------------+-----------+-----------------+
| 15 | 1 | 1 | 1 |
| 16 | 1 | 2 | 5 |
| 17 | 1 | 3 | 3 |
| 18 | 2 | 1 | 2 |
| 19 | 2 | 2 | 6 |
| 20 | 2 | 3 | 4 |
+----+------------+-----------+-----------------+
The search must run against the name column of each table and return name and price from the products table.
The problem is that I don't know how to perform a full-text search that joins all those tables.
Is there any easy way to do it?
You need a query that LEFT JOINs each table to search, with a join condition that uses the full-text search function MATCH, plus a WHERE clause that filters out products with no match in any of the joined tables. SELECT DISTINCT ensures that each product is returned only once.
The JOIN criteria to products have to be written by hand for each table. option_values is the most complicated case, as it does not directly reference products: an additional join on product_option_values (aliased pov below) is needed.
SELECT DISTINCT p.name, p.price
FROM
products p
LEFT JOIN categories c
ON MATCH(c.name) AGAINST('foo' IN NATURAL LANGUAGE MODE)
AND c.id = p.category_id
LEFT JOIN product_types pt
ON MATCH(pt.name) AGAINST('foo' IN NATURAL LANGUAGE MODE)
AND pt.id = p.product_type_id
LEFT JOIN options o
ON MATCH(o.name) AGAINST('foo' IN NATURAL LANGUAGE MODE)
AND o.product_type_id = p.product_type_id
LEFT JOIN product_option_values pov
ON pov.product_id = p.id
LEFT JOIN option_values ov
ON MATCH(ov.name) AGAINST('foo' IN NATURAL LANGUAGE MODE)
AND ov.id = pov.option_value_id
WHERE
COALESCE(c.id, pt.id, o.id, ov.id) IS NOT NULL
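Two notes on the query above. MATCH ... AGAINST generally needs a FULLTEXT index on every column it searches, so each name column would need one. And the LEFT JOIN plus COALESCE filter can be tried out without MySQL. The sketch below is only an illustration under stated assumptions: it runs on SQLite through Python's sqlite3 module, uses a plain LIKE substring test as a stand-in for MATCH (so it is not word-based and has no relevance ranking), joins product_types on its id, and the search helper and :pat parameter are made-up names. The rows are the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_types (id INTEGER PRIMARY KEY, category_id INTEGER, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, product_type_id INTEGER,
                           category_id INTEGER, name TEXT, price INTEGER);
    CREATE TABLE options (id INTEGER PRIMARY KEY, product_type_id INTEGER, name TEXT);
    CREATE TABLE option_values (id INTEGER PRIMARY KEY, option_id INTEGER, name TEXT);
    CREATE TABLE product_option_values (id INTEGER PRIMARY KEY, product_id INTEGER,
                                        option_id INTEGER, option_value_id INTEGER);
    INSERT INTO categories VALUES (1, 'Electronics'), (2, 'Fashion');
    INSERT INTO product_types VALUES (1, 1, 'Smartphone'), (2, 1, 'Speakers');
    INSERT INTO products VALUES (1, 1, 1, 'Samsung Galaxy A3', 300),
                                (2, 1, 1, 'Samsung Galaxy A7', 400);
    INSERT INTO options VALUES (1, 1, 'RAM'), (2, 1, 'Screen'), (3, 1, 'OS');
    INSERT INTO option_values VALUES (1, 1, '512 MB'), (2, 1, '1 GB'),
        (3, 3, 'Android 5'), (4, 3, 'Android 6'), (5, 2, 'HD'), (6, 2, 'FHD');
    INSERT INTO product_option_values VALUES
        (15, 1, 1, 1), (16, 1, 2, 5), (17, 1, 3, 3),
        (18, 2, 1, 2), (19, 2, 2, 6), (20, 2, 3, 4);
""")

# LIKE :pat stands in for MATCH(...) AGAINST(...); the join shape is the point.
SEARCH_SQL = """
SELECT DISTINCT p.name, p.price
FROM products p
LEFT JOIN categories c
       ON c.id = p.category_id AND c.name LIKE :pat
LEFT JOIN product_types pt
       ON pt.id = p.product_type_id AND pt.name LIKE :pat
LEFT JOIN options o
       ON o.product_type_id = p.product_type_id AND o.name LIKE :pat
LEFT JOIN product_option_values pov
       ON pov.product_id = p.id
LEFT JOIN option_values ov
       ON ov.id = pov.option_value_id AND ov.name LIKE :pat
WHERE COALESCE(c.id, pt.id, o.id, ov.id) IS NOT NULL
ORDER BY p.name
"""

def search(term):
    """Return (name, price) of products with a match in any joined table."""
    return conn.execute(SEARCH_SQL, {"pat": f"%{term}%"}).fetchall()

print(search("Android 6"))
print(search("Electronics"))
```

Each LEFT JOIN contributes a non-NULL id only when its table matched, so the COALESCE test keeps a product as soon as one of the joined tables matched.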
I have a query that is doing what I want on a truncated dataset but when I run it on the full dataset (millions of rows) it takes forever to run.
I have two tables - microsat_table and coverage_table.
microsat_table:
+----+----------+-----------+---------+-------------------------------------------------+
| id | Seq_Name | SSR_Start | SSR_End | Sequence |
+----+----------+-----------+---------+-------------------------------------------------+
| 2 | chr2L | 11050 | 11067 | TTTAATTTAATTTAATTT |
| 3 | chr2L | 44173 | 44187 | TATGTATGTATGTAT |
| 5 | chr2L | 54431 | 54477 | ATAATAATATAATATAATATAATATAATATATAATAATATAATAATA |
| 6 | chr2L | 57571 | 57594 | ATATATATATATATATATATATAT |
| 7 | chr2L | 72439 | 72453 | CATACATACATACAT |
| 8 | chr2L | 74028 | 74042 | ATACATACATACATA |
| 9 | chr2L | 85573 | 85587 | ATTTTATTTTATTTT |
| 10 | chr2L | 92429 | 92443 | ACATACATACATACA |
| 11 | chr2L | 138132 | 138166 | TATATAGATATATAAATATATATATATATATATAT |
| 13 | chr2L | 162245 | 162259 | ATACATACATACATA |
+----+----------+-----------+---------+-------------------------------------------------+
coverage_table:
+----------+-------+-------+----------+
| Seq_Name | Start | Stop | Coverage |
+----------+-------+-------+----------+
| chr2L | 5716 | 5771 | 1 |
| chr2L | 8730 | 8824 | 1 |
| chr2L | 9894 | 9948 | 1 |
| chr2L | 19391 | 19491 | 1 |
| chr2L | 19575 | 19675 | 1 |
| chr2L | 19773 | 19776 | 1 |
| chr2L | 19776 | 19872 | 2 |
| chr2L | 21920 | 21959 | 1 |
| chr2L | 21959 | 22020 | 2 |
| chr2L | 22020 | 22059 | 1 |
+----------+-------+-------+----------+
I want to add a column to the microsat_table which calculates the average coverage (from the coverage_table) over all rows where the Start and Stop values in the coverage table fall within the SSR_Start and SSR_End values in the microsat_table.
Example result:
+-----+----------+-----------+---------+--------------------------------+---------+
| id | Seq_Name | SSR_Start | SSR_End | Sequence | avg |
+-----+----------+-----------+---------+--------------------------------+---------+
| 53 | chr2L | 402489 | 402503 | AAAACAAAACAAAAC | 3.0000 |
| 64 | chr2L | 447214 | 447233 | CAGCAGCAGCAGCAGCAGCA | 8.0000 |
| 66 | chr2L | 457839 | 457868 | CAGCAGCAGCAACAGCAGCAGCAGGCAGCA | 2.0000 |
| 105 | chr2L | 579589 | 579603 | TCGAATCGAATCGAA | 11.0000 |
| 123 | chr2L | 628484 | 628501 | TAATGTTAATGTTAATGT | 6.0000 |
+-----+----------+-----------+---------+--------------------------------+---------+
My query is:
UPDATE microsat_table
JOIN
(SELECT m.id, SUM(p.Coverage)/count(p.Start)
AS avg FROM microsat_table m
LEFT OUTER JOIN coverage_table p
ON m.Seq_Name LIKE p.Seq_Name
WHERE m.Seq_Name LIKE p.Seq_Name GROUP BY m.id) AS qt
ON microsat_table.id = qt.id
SET microsat_table.avg = qt.avg;
Explain results for the truncated table:
+----+-------------+----------------------+------------+-------+---------------------------------------------------+-------------+---------+--------------------------------+--------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------------------+------------+-------+---------------------------------------------------+-------------+---------+--------------------------------+--------+----------+----------------------------------------------------+
| 1 | UPDATE | microsat_table_short | NULL | ALL | PRIMARY | NULL | NULL | NULL | 40356 | 100.00 | NULL |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 4 | testdb.microsat_table_short.id | 1236 | 100.00 | NULL |
| 2 | DERIVED | m | NULL | index | PRIMARY,Sequence,Seq_Name,Motif,SSR_Start,SSR_End | Seq_Name | 53 | NULL | 40356 | 100.00 | Using index; Using temporary; Using filesort |
| 2 | DERIVED | p | NULL | ALL | NULL | NULL | NULL | NULL | 100163 | 1.23 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+----------------------+------------+-------+---------------------------------------------------+-------------+---------+--------------------------------+--------+----------+----------------------------------------------------+
I added indexes (including trying HASH and BTREE indexes) which sped it up considerably, but I've let it run for 1.5 days on the larger dataset and it still didn't finish.
Does anyone have any suggestions on how to make it run faster?
Thanks!!
There are a few relatively minor infelicities in your code. However the big problem is that while you say you want to calculate "the average coverage (from the coverage_table) over all rows where the Start and Stop values in the coverage table fall within the SSR_Start and SSR_End values in the microsat_table" you don't actually seem to limit the query to doing that. Instead you only coded a match on Seq_Name.
The code below attempts to fix that (I used >= and <= which may not be what you need) and the other more minor bits:
UPDATE microsat_table
JOIN
(
SELECT
m.id,
AVG(p.Coverage) AS avg -- MySQL has its own average function
FROM
microsat_table m
INNER JOIN coverage_table p ON -- Change to INNER JOIN, your old WHERE clause had this effect anyway
m.Seq_Name = p.Seq_Name -- Use '=' not 'Like' when looking for an exact match
WHERE
p.Start >= m.SSR_Start -- This WHERE clause is the most important change
AND p.Stop <= m.SSR_End -- You omitted it in your version (the column is named Stop in coverage_table)
GROUP BY
m.id) AS qt
ON microsat_table.id = qt.id
SET microsat_table.avg = qt.avg;
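To check the containment condition on a handful of rows before running it against millions, something like the following sketch can help. It is an illustration only: it runs on SQLite through Python's sqlite3 module as a stand-in, all rows are made up, the index name coverage_seq_start is hypothetical, and the UPDATE is written with a correlated subquery instead of the multi-table UPDATE ... JOIN form used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE microsat_table (id INTEGER PRIMARY KEY, Seq_Name TEXT,
                                 SSR_Start INTEGER, SSR_End INTEGER, avg REAL);
    CREATE TABLE coverage_table (Seq_Name TEXT, Start INTEGER, Stop INTEGER,
                                 Coverage INTEGER);
    -- Hypothetical composite index so the range lookup does not scan everything.
    CREATE INDEX coverage_seq_start ON coverage_table (Seq_Name, Start);

    INSERT INTO microsat_table (id, Seq_Name, SSR_Start, SSR_End) VALUES
        (1, 'chr2L', 100, 200),
        (2, 'chr2L', 300, 400),
        (3, 'chr2R', 100, 200);
    INSERT INTO coverage_table VALUES
        ('chr2L', 110, 150, 1),   -- inside microsat 1
        ('chr2L', 150, 190, 3),   -- inside microsat 1
        ('chr2L', 190, 250, 9),   -- runs past the end of microsat 1: ignored
        ('chr2L', 310, 390, 4),   -- inside microsat 2
        ('chr2R', 500, 600, 7);   -- right sequence, wrong position for microsat 3
""")

# Average only the coverage rows that lie fully inside [SSR_Start, SSR_End].
conn.execute("""
    UPDATE microsat_table
    SET avg = (
        SELECT AVG(p.Coverage)
        FROM coverage_table p
        WHERE p.Seq_Name = microsat_table.Seq_Name
          AND p.Start >= microsat_table.SSR_Start
          AND p.Stop  <= microsat_table.SSR_End
    )
""")
conn.commit()

result = conn.execute("SELECT id, avg FROM microsat_table ORDER BY id").fetchall()
print(result)
```

Microsat 3 ends up with NULL because no coverage row falls inside it. The UPDATE ... JOIN version above uses an INNER JOIN and would leave such rows unchanged instead.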
Maybe updating the table in 1 big transaction is simply too much for the system? (what is the size of the table you're updating?) You could try doing it in blocks. I'd also go for a simple sub-select here, seems easier to read IMHO.
Also take note of Steve Lovell's remark that your query doesn't seem to care about the start/stop columns. Since you probably forgot it by accident I've added it here too, removing it shouldn't be too difficult =)
-- T-SQL-style batch loop; in MySQL, DECLARE and WHILE are only allowed inside a stored program
DECLARE @min_id int,
@max_id int,
@blocksize int
SELECT @min_id = MIN(id),
@max_id = MAX(id),
@blocksize = 100000 -- adapt as needed
FROM microsat_table
WHILE @min_id <= @max_id
BEGIN
UPDATE microsat_table
SET microsat_table.avg = (SELECT SUM(p.Coverage)/count(p.Start) AS avg
FROM microsat_table m
LEFT OUTER JOIN coverage_table p
ON m.Seq_Name LIKE p.Seq_Name -- if possible use '=' here instead of LIKE
AND p.Start >= m.SSR_Start -- flagrantly "stolen" from Steve Lovell's answer
AND p.Stop <= m.SSR_End -- the column is named Stop in coverage_table
WHERE m.id = microsat_table.id)
-- limit update to this block:
WHERE microsat_table.id BETWEEN @min_id AND (@min_id + @blocksize - 1)
-- prepare for next block
SELECT @min_id = @min_id + @blocksize
END
You probably want the primary key on the id field of microsat_table and a key on the Seq_Name + Start columns of coverage_table.
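If a server-side loop is awkward, the same block-by-block idea can be driven from client code. The sketch below is a rough illustration, not a tuned implementation: it uses Python's sqlite3 module with SQLite as a stand-in database, made-up rows, and a deliberately tiny BLOCK_SIZE so the loop runs more than once. BLOCK_SIZE and blocks_run are names introduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE microsat_table (id INTEGER PRIMARY KEY, Seq_Name TEXT,
                                 SSR_Start INTEGER, SSR_End INTEGER, avg REAL);
    CREATE TABLE coverage_table (Seq_Name TEXT, Start INTEGER, Stop INTEGER,
                                 Coverage INTEGER);
    CREATE INDEX coverage_seq_start ON coverage_table (Seq_Name, Start);
""")
# Hypothetical data: 10 microsats, each containing one coverage row whose
# Coverage equals the microsat id, so the expected average is easy to see.
conn.executemany(
    "INSERT INTO microsat_table (id, Seq_Name, SSR_Start, SSR_End) "
    "VALUES (?, 'chr2L', ?, ?)",
    [(i, i * 100, i * 100 + 50) for i in range(1, 11)])
conn.executemany(
    "INSERT INTO coverage_table VALUES ('chr2L', ?, ?, ?)",
    [(i * 100 + 10, i * 100 + 20, i) for i in range(1, 11)])
conn.commit()

BLOCK_SIZE = 4  # adapt as needed; real tables would use something far larger

min_id, max_id = conn.execute(
    "SELECT MIN(id), MAX(id) FROM microsat_table").fetchone()

blocks_run = 0
while min_id <= max_id:
    conn.execute("""
        UPDATE microsat_table
        SET avg = (
            SELECT AVG(p.Coverage)
            FROM coverage_table p
            WHERE p.Seq_Name = microsat_table.Seq_Name
              AND p.Start >= microsat_table.SSR_Start
              AND p.Stop  <= microsat_table.SSR_End
        )
        WHERE id BETWEEN ? AND ?
    """, (min_id, min_id + BLOCK_SIZE - 1))
    conn.commit()            # one short transaction per block
    min_id += BLOCK_SIZE     # prepare for the next block
    blocks_run += 1

final = conn.execute("SELECT id, avg FROM microsat_table ORDER BY id").fetchall()
print(blocks_run, final[:3])
```

Committing after every block keeps each transaction small, and a run that is interrupted can be resumed from the last completed id range.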
The query works. With 10,000 products, it takes 11 seconds. If I don't use ORDER BY, it takes only 1 second. But I need the ORDER BY.
Can we optimize it and how?
SELECT
u.urunID,
i.urunadi,
u.seo,
u.stok_kodu,
u.kstok_sayisi,
u.stok_sayisi,
u.goruntuleme,
(SELECT SUM(su.adet) FROM siparis_urunler su LEFT JOIN siparis s ON s.siparisID = su.siparisID WHERE s.durum_id NOT IN (26, 24) AND su.urunID = u.urunID) AS sadet
FROM
urunler u
INNER JOIN urun_isim i ON u.urunID = i.urunID
WHERE
u.stok_sayisi <= u.kstok_sayisi
AND u.durum = 1
GROUP BY
u.urunID
ORDER BY
sadet DESC
LIMIT 0, 20
EXPLAIN:
+----+--------------------+-------+--------+---------------------------------+-----------+---------+-----------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+--------+---------------------------------+-----------+---------+-----------------------------+------+----------------------------------------------+
| 1 | PRIMARY | i | index | PRIMARY,urunadi2 | urunadi | 768 | NULL | 4997 | Using index; Using temporary; Using filesort |
| 1 | PRIMARY | u | eq_ref | PRIMARY,urunID,urunler,urunler2 | PRIMARY | 4 | katalog_db.i.urunID | 1 | Using where |
| 3 | DEPENDENT SUBQUERY | sp | ALL | NULL | NULL | NULL | NULL | 11 | Using where |
| 2 | DEPENDENT SUBQUERY | s | ALL | PRIMARY,siparis | NULL | NULL | NULL | 805 | Using where |
| 2 | DEPENDENT SUBQUERY | su | ref | surunler2 | surunler2 | 10 | katalog_db.s.siparisID,func | 1 | Using where |
+----+--------------------+-------+--------+---------------------------------+-----------+---------+-----------------------------+------+----------------------------------------------+
Does this run any faster?
SELECT
u.urunID,
i.urunadi,
u.seo,
u.stok_kodu,
u.kstok_sayisi,
u.stok_sayisi,
u.goruntuleme,
SUM(su.adet) AS sadet
FROM
urunler u
INNER JOIN urun_isim i ON u.urunID = i.urunID
INNER JOIN siparis_urunler su ON su.urunID = u.urunID
LEFT JOIN siparis s ON s.siparisID = su.siparisID
WHERE
u.stok_sayisi <= u.kstok_sayisi
AND s.durum_id NOT IN (26, 24)
AND u.durum = 1
GROUP BY
u.urunID,
i.urunadi,
u.seo,
u.stok_kodu,
u.kstok_sayisi,
u.stok_sayisi,
u.goruntuleme
ORDER BY sadet DESC
LIMIT 0, 20
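Another shape that may be worth trying is to compute the per-product totals once in a derived table and LEFT JOIN it, so that products without any counted order are still listed with a NULL total. The sketch below is an illustration under assumptions, not a measured improvement: it runs on SQLite through Python's sqlite3 module as a stand-in, the rows are made up, only the columns needed for the demo are created, and it assumes one urun_isim row per product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE urunler (urunID INTEGER PRIMARY KEY, stok_sayisi INTEGER,
                          kstok_sayisi INTEGER, durum INTEGER);
    CREATE TABLE urun_isim (urunID INTEGER PRIMARY KEY, urunadi TEXT);
    CREATE TABLE siparis (siparisID INTEGER PRIMARY KEY, durum_id INTEGER);
    CREATE TABLE siparis_urunler (siparisID INTEGER, urunID INTEGER, adet INTEGER);

    INSERT INTO urunler VALUES (1, 5, 10, 1), (2, 3, 10, 1),
                               (3, 50, 10, 1),  -- stock above threshold: filtered out
                               (4, 1, 10, 1);   -- never ordered
    INSERT INTO urun_isim VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO siparis VALUES (100, 1), (101, 26), (102, 2);  -- 101 has an excluded status
    INSERT INTO siparis_urunler VALUES
        (100, 1, 2), (101, 1, 5), (102, 1, 1), (100, 2, 7), (102, 3, 9);
""")

query = """
SELECT u.urunID, i.urunadi, t.sadet
FROM urunler u
INNER JOIN urun_isim i ON i.urunID = u.urunID
LEFT JOIN (                      -- totals are computed once, not per product row
    SELECT su.urunID, SUM(su.adet) AS sadet
    FROM siparis_urunler su
    INNER JOIN siparis s ON s.siparisID = su.siparisID
    WHERE s.durum_id NOT IN (26, 24)
    GROUP BY su.urunID
) AS t ON t.urunID = u.urunID
WHERE u.stok_sayisi <= u.kstok_sayisi
  AND u.durum = 1
ORDER BY t.sadet DESC, u.urunID
LIMIT 20
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Whether this beats the correlated subquery on the real tables depends on the indexes on siparis_urunler and siparis, so compare the EXPLAIN output before adopting it.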
Could you please help me with a monster query?
Do you see any issues with this one?
I would like to get the execution time below one second. Is that possible?
Please ask for any other data you may need to understand the structure of the DB. Any tips and tricks are welcome!
SELECT
ORD_CLI.COD_AGE,
ORD_CLI_RIGHE.DOC_ID,
OFF_CLI.off_cli_id,
ORD_CLI_RIGHE.DOC_RIGA_ID,
ORD_CLI_RIGHE.COD_ART,
ART_PESO.PESO_ART,
ORD_CLI.ANNO_DOC,
ORD_CLI.NUM_DOC,
ORD_CLI.SERIE_DOC,
ORD_CLI.DATA_DOC,
CF.RAG_SOC_CF,
AGENTI.NOME_AGE,
ORD_CLI.COD_CF,
ORD_CLI.COD_IVA,
ORD_CLI.COD_DEP,
ORD_CLI_TOT.IMPONIBILE_V1 AS IMPONIBILE_ORDINE,
FATT_CLI_TOT.IMPONIBILE_V1 AS IMPONIBILE_FATTURA,
ORD_CLI_TOT.IVA_V1,
SUM(ART_PESO.PESO_ART) AS weight,
SUM(FATT_CLI_RIGHE.QUANT_RIGA) AS quantity,
SUM(FATT_CLI_RIGHE.QUANT_RIGA*FATT_CLI_RIGHE.PREZZO_LORDO_VU1) AS sell_price,
SUM(FATT_CLI_RIGHE.QUANT_RIGA*DDT_FOR_RIGHE.PREZZO_LORDO_VU1) AS acqisition_price1,
SUM(FATT_CLI_RIGHE.QUANT_RIGA*FATT_FOR_RIGHE.PREZZO_LORDO_VU1) AS acqisition_price2,
SUM(FATT_CLI_RIGHE.QUANT_RIGA*FATT_CLI_RIGHE_PROVV.IMPORTO_PROVV_VU1) AS agent_reward,
SUM(FATT_CLI_RIGHE.QUANT_RIGA*ART_PESO.PESO_ART * 0.13) AS transport_price,
SUM(FATT_CLI_RIGHE.QUANT_RIGA*(
FATT_CLI_RIGHE.PREZZO_LORDO_VU1
- COALESCE(DDT_FOR_RIGHE.PREZZO_LORDO_VU1, 0)
- COALESCE(FATT_FOR_RIGHE.PREZZO_LORDO_VU1, 0)
- COALESCE(FATT_CLI_RIGHE_PROVV.IMPORTO_PROVV_VU1, 0)
- COALESCE(ART_PESO.PESO_ART, 0) * 0.13
)) AS net_earning,
OFF_CLI.stima_prezzo_acquisto,
OFF_CLI.stima_prezzo_trasporto,
OFF_CLI.stima_provvigioni_agenti,
OFF_CLI.stima_utile
FROM ORD_CLI
INNER JOIN ORD_CLI_RIGHE
ON ORD_CLI_RIGHE.DOC_ID = ORD_CLI.DOC_ID
LEFT JOIN ORD_CLI_RIGHE_SPEC
ON ORD_CLI_RIGHE.DOC_RIGA_ID = ORD_CLI_RIGHE_SPEC.DOC_RIGA_ID
INNER JOIN ART_PESO
ON ART_PESO.COD_ART = ORD_CLI_RIGHE.COD_ART
INNER JOIN ORD_CLI_TOT
ON ORD_CLI.DOC_ID = ORD_CLI_TOT.DOC_ID
INNER JOIN AGENTI
ON AGENTI.COD_AGE = ORD_CLI.COD_AGE
INNER JOIN CF
ON CF.COD_CF = ORD_CLI.COD_CF
LEFT JOIN FATT_CLI_RIGHE_SPEC
ON ORD_CLI_RIGHE.DOC_RIGA_ID = FATT_CLI_RIGHE_SPEC.ORD_RIGA_ID
LEFT JOIN FATT_CLI_RIGHE
ON FATT_CLI_RIGHE.DOC_RIGA_ID = FATT_CLI_RIGHE_SPEC.DOC_RIGA_ID
LEFT JOIN FATT_CLI_TOT
ON FATT_CLI_RIGHE.DOC_ID = FATT_CLI_TOT.DOC_ID
LEFT JOIN FATT_CLI_RIGHE_PROVV
ON FATT_CLI_RIGHE_PROVV.DOC_RIGA_ID = FATT_CLI_RIGHE_SPEC.DOC_RIGA_ID
LEFT JOIN FATT_CLI_RIGHE_LOTTI
ON FATT_CLI_RIGHE_LOTTI.DOC_RIGA_ID = FATT_CLI_RIGHE_SPEC.DOC_RIGA_ID
LEFT JOIN DDT_FOR_RIGHE_LOTTI
ON DDT_FOR_RIGHE_LOTTI.COD_LOT = FATT_CLI_RIGHE_LOTTI.COD_LOT
LEFT JOIN DDT_FOR_RIGHE
ON DDT_FOR_RIGHE.DOC_RIGA_ID = DDT_FOR_RIGHE_LOTTI.DOC_RIGA_ID
LEFT JOIN FATT_FOR_RIGHE
ON FATT_FOR_RIGHE.DOC_RIGA_ID = FATT_CLI_RIGHE_LOTTI.COD_LOT
LEFT JOIN OFF_CLI_RIGHE
ON OFF_CLI_RIGHE.DOC_RIGA_ID = ORD_CLI_RIGHE_SPEC.OFF_RIGA_ID
LEFT JOIN OFF_CLI
ON OFF_CLI.DOC_ID = OFF_CLI_RIGHE.DOC_ID
WHERE
ORD_CLI.COD_BUSN_UN='P'
AND OFF_CLI_RIGHE.DOC_RIGA_ID IS NOT NULL
AND ORD_CLI.DATA_DOC >= '2012-11-29'
AND ORD_CLI.DATA_DOC <= '2013-02-28'
GROUP BY ORD_CLI.DOC_ID
ORDER BY ORD_CLI.DATA_DOC DESC
LIMIT 30 OFFSET 0
Time of execution
Showing rows 0 - 29 ( 30 total, Query took 6.3458 sec)
EXPLAIN of the query
+----+-------------+----------------------+--------+-----------------------------------------------------------------------------+----------------------------------+---------+--------------------------------------------+------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------------------+--------+-----------------------------------------------------------------------------+----------------------------------+---------+--------------------------------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | ORD_CLI | range | PRIMARY,ORD_CLI_DATA_DOC,ORD_CLI_COD_CF,ORD_CLI_COD_BUSN_UN,ORD_CLI_COD_AGE | ORD_CLI_DATA_DOC | 4 | NULL | 3728 | 100.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | AGENTI | eq_ref | PRIMARY | PRIMARY | 38 | ORD_CLI.COD_AGE | 1 | 100.00 | Using where |
| 1 | SIMPLE | CF | eq_ref | PRIMARY | PRIMARY | 38 | ORD_CLI.COD_CF | 1 | 100.00 | |
| 1 | SIMPLE | ORD_CLI_TOT | eq_ref | PRIMARY | PRIMARY | 62 | ORD_CLI.DOC_ID | 1 | 100.00 | |
| 1 | SIMPLE | ORD_CLI_RIGHE | ref | PRIMARY,ORD_CLI_RIGHE_DOC_ID,ORD_CLI_RIGHE_COD_ART | ORD_CLI_RIGHE_DOC_ID | 62 | ORD_CLI_TOT.DOC_ID | 2 | 100.00 | Using where |
| 1 | SIMPLE | ART_PESO | eq_ref | PRIMARY | PRIMARY | 92 | ORD_CLI_RIGHE.COD_ART | 1 | 100.00 | |
| 1 | SIMPLE | ORD_CLI_RIGHE_SPEC | eq_ref | PRIMARY,ORD_CLI_RIGHE_SPEC_OFF_RIGA_ID | PRIMARY | 92 | ORD_CLI_RIGHE.DOC_RIGA_ID | 1 | 100.00 | Using where |
| 1 | SIMPLE | OFF_CLI_RIGHE | ref | DOC_RIGA_ID | DOC_RIGA_ID | 92 | ORD_CLI_RIGHE_SPEC.OFF_RIGA_ID | 1 | 100.00 | Using where |
| 1 | SIMPLE | OFF_CLI | ref | DOC_ID | DOC_ID | 63 | OFF_CLI_RIGHE.DOC_ID | 1 | 100.00 | |
| 1 | SIMPLE | FATT_CLI_RIGHE_SPEC | ref | FATT_CLI_RIGHE_SPEC_ORD_RIGA_ID | FATT_CLI_RIGHE_SPEC_ORD_RIGA_ID | 93 | ORD_CLI_RIGHE.DOC_RIGA_ID | 1 | 100.00 | Using index |
| 1 | SIMPLE | FATT_CLI_RIGHE | eq_ref | PRIMARY | PRIMARY | 92 | FATT_CLI_RIGHE_SPEC.DOC_RIGA_ID | 1 | 100.00 | |
| 1 | SIMPLE | FATT_CLI_TOT | eq_ref | PRIMARY | PRIMARY | 62 | FATT_CLI_RIGHE.DOC_ID | 1 | 100.00 | |
| 1 | SIMPLE | FATT_CLI_RIGHE_PROVV | ref | FATT_CLI_RIGHE_PROVV_DOC_RIGA_ID | FATT_CLI_RIGHE_PROVV_DOC_RIGA_ID | 92 | FATT_CLI_RIGHE_SPEC.DOC_RIGA_ID | 1 | 100.00 | |
| 1 | SIMPLE | FATT_CLI_RIGHE_LOTTI | ref | FATT_CLI_RIGHE_LOTTI_DOC_RIGA_ID | FATT_CLI_RIGHE_LOTTI_DOC_RIGA_ID | 92 | FATT_CLI_RIGHE_SPEC.DOC_RIGA_ID | 1 | 100.00 | |
| 1 | SIMPLE | DDT_FOR_RIGHE_LOTTI | ref | DDT_FOR_RIGHE_LOTTI_COD_LOT | DDT_FOR_RIGHE_LOTTI_COD_LOT | 92 | FATT_CLI_RIGHE_LOTTI.COD_LOT | 1 | 100.00 | |
| 1 | SIMPLE | DDT_FOR_RIGHE | eq_ref | PRIMARY | PRIMARY | 92 | DDT_FOR_RIGHE_LOTTI.DOC_RIGA_ID | 1 | 100.00 | |
| 1 | SIMPLE | FATT_FOR_RIGHE | eq_ref | PRIMARY | PRIMARY | 92 | FATT_CLI_RIGHE_LOTTI.COD_LOT | 1 | 100.00 | |
+----+-------------+----------------------+--------+-----------------------------------------------------------------------------+----------------------------------+---------+--------------------------------------------+------+----------+----------------------------------------------+
The following is the output of SHOW STATUS LIKE 'Handler%' run immediately after the query was executed:
Handler_commit, 2
Handler_delete, 0
Handler_discover, 0
Handler_prepare, 0
Handler_read_first, 0
Handler_read_key, 421001
Handler_read_last, 0
Handler_read_next, 240344
Handler_read_prev, 0
Handler_read_rnd, 30
Handler_read_rnd_next, 2412
Handler_rollback, 0
Handler_savepoint, 0
Handler_savepoint_rollback, 0
Handler_update, 31846
Handler_write, 2409
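For anyone reproducing these counters, a minimal sketch of one way to capture them for a single statement is shown here. It assumes the session is allowed to run FLUSH STATUS (which normally needs the RELOAD privilege), and the SELECT in the middle is only a stand-in for the real query.

```sql
-- Reset the session status counters before the measurement.
FLUSH STATUS;

-- Stand-in for the query being measured (an assumption for illustration;
-- substitute the full query from above).
SELECT COUNT(*)
FROM ORD_CLI
WHERE COD_BUSN_UN = 'P';

-- Read the handler counters accumulated by this session since the reset.
SHOW SESSION STATUS LIKE 'Handler%';
```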
Database structure: https://gist.github.com/moiseevigor/4988fc8868f92643c9fb
EDIT 1
After creation of index
ALTER TABLE `TCross5_NP`.`ORD_CLI`
ADD INDEX `ORD_CLI_MULTI` (`COD_BUSN_UN` ASC, `DATA_DOC` ASC, `DOC_ID` ASC) ;
After this, the execution time dropped by about half, but the query still hits the ORD_CLI_MULTI index.
First (this has helped in many similar queries that reference a lot of secondary "lookup" tables), change the start of the query to
SELECT STRAIGHT_JOIN
This directs the engine to join the tables in the exact order you have listed. It prevents the optimizer from treating a lookup table as the primary table and trying to work backwards or end-around to get the data. Sometimes this works well; other times (rarely, in my experience) it hinders performance.
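As a rough sketch of where the keyword goes, here is a trimmed-down version of the query. It uses only tables and columns from the question, but the select list and the join list are shortened for illustration, so treat it as a shape to copy and not as the final query.

```sql
-- STRAIGHT_JOIN directly after SELECT asks MySQL to join the tables
-- in the order they appear in the FROM clause.
SELECT STRAIGHT_JOIN
    ORD_CLI.DOC_ID,
    ORD_CLI.DATA_DOC
FROM ORD_CLI
INNER JOIN ORD_CLI_RIGHE
    ON ORD_CLI_RIGHE.DOC_ID = ORD_CLI.DOC_ID
INNER JOIN ORD_CLI_RIGHE_SPEC
    ON ORD_CLI_RIGHE_SPEC.DOC_RIGA_ID = ORD_CLI_RIGHE.DOC_RIGA_ID
WHERE ORD_CLI.COD_BUSN_UN = 'P'
  AND ORD_CLI.DATA_DOC >= '2012-11-29'
  AND ORD_CLI.DATA_DOC <= '2013-02-28'
GROUP BY ORD_CLI.DOC_ID
ORDER BY ORD_CLI.DATA_DOC DESC
LIMIT 30 OFFSET 0;
```

Because the join order is now fixed by you, it is worth comparing the EXPLAIN output and timing with and without the keyword before keeping it.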
Next, since you are filtering on "AND OFF_CLI_RIGHE.DOC_RIGA_ID IS NOT NULL", I would change your LEFT JOINs to INNER JOINs when joining to ORD_CLI_RIGHE_SPEC and OFF_CLI_RIGHE:
INNER JOIN ORD_CLI_RIGHE_SPEC
ON ORD_CLI_RIGHE.DOC_RIGA_ID = ORD_CLI_RIGHE_SPEC.DOC_RIGA_ID
INNER JOIN OFF_CLI_RIGHE
ON ORD_CLI_RIGHE_SPEC.OFF_RIGA_ID = OFF_CLI_RIGHE.DOC_RIGA_ID
and thus eliminate the "AND ... IS NOT NULL" condition from the WHERE clause.
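To see why the two forms are interchangeable here, a small self-contained sketch may help. The tables demo_spec and demo_off are hypothetical stand-ins for ORD_CLI_RIGHE_SPEC and OFF_CLI_RIGHE and are not part of the question's schema.

```sql
-- Hypothetical demo tables, not part of the question's schema.
CREATE TEMPORARY TABLE demo_spec (
    doc_riga_id INT PRIMARY KEY,
    off_riga_id INT NULL
);
CREATE TEMPORARY TABLE demo_off (
    doc_riga_id INT PRIMARY KEY
);

INSERT INTO demo_spec VALUES (1, 10), (2, NULL), (3, 30);
INSERT INTO demo_off VALUES (10), (20);

-- LEFT JOIN plus an IS NOT NULL filter on the right-hand table...
SELECT s.doc_riga_id
FROM demo_spec AS s
LEFT JOIN demo_off AS o
    ON o.doc_riga_id = s.off_riga_id
WHERE o.doc_riga_id IS NOT NULL;

-- ...keeps the same rows as a plain INNER JOIN.
SELECT s.doc_riga_id
FROM demo_spec AS s
INNER JOIN demo_off AS o
    ON o.doc_riga_id = s.off_riga_id;
```

Both statements return only the row with doc_riga_id = 1, since rows 2 and 3 have no match in demo_off. Writing the join as INNER states that intent directly instead of filtering the unmatched rows out afterwards.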
Finally, I would add a multi-part index that can be optimized for the query:
CREATE index MultipleParts on ORD_CLI ( COD_BUSN_UN, DATA_DOC, DOC_ID );
The multi-part index will help the WHERE, GROUP BY, and ORDER BY of the query.
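One way to check whether the optimizer actually picks the new index is to run EXPLAIN on the part of the query that touches ORD_CLI. This is a sketch that leaves out the joins and most of the select list, so the real plan may differ.

```sql
-- Look at the "key" column of the output: it should name the new
-- multi-part index if the optimizer chose it for this statement.
EXPLAIN
SELECT ORD_CLI.DOC_ID, ORD_CLI.DATA_DOC
FROM ORD_CLI
WHERE ORD_CLI.COD_BUSN_UN = 'P'
  AND ORD_CLI.DATA_DOC >= '2012-11-29'
  AND ORD_CLI.DATA_DOC <= '2013-02-28'
GROUP BY ORD_CLI.DOC_ID
ORDER BY ORD_CLI.DATA_DOC DESC
LIMIT 30;
```

If "Using temporary; Using filesort" still appears in the Extra column, the sort and grouping are not being resolved from the index alone, and that is worth rechecking against the full query.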