I have this code:
select transcript_features.*,
transcript_features_blast.hit, transcript_features_blast.evalue,
transcript_features_blast.swissprot_version,
transcript_features_premirna_family.family, transcript_features_premirna_family.evalue,
transcript_features_premirna_family.rfam_version,
transcript_features_premirna_homology.hit, transcript_features_premirna_homology.evalue,
transcript_features_premirna_homology.mirbase_version,
transcript_features_premirna.premirna,
transcript_features_transposons.hit, transcript_features_transposons.dfam_version,
confidence.confidence, expression.expression, expression.tissue, expression.conditions
from transcript_features_premirna, gene_transcripts, transcript_features_transposons, confidence, transcript_features
left join transcript_features_blast on transcript_features_blast.transcript_alias=transcript_features.transcript_alias
left join transcript_features_premirna_family on transcript_features_premirna_family.transcript_alias=transcript_features.transcript_alias
left join transcript_features_premirna_homology on transcript_features_premirna_homology.transcript_alias=transcript_features.transcript_alias
left join expression on expression.transcript_alias=transcript_features.transcript_alias
where transcript_features.transcript_alias=gene_transcripts.transcript_alias and gene_transcripts.gene_alias="AT1G19392" and transcript_features_premirna.transcript_alias=transcript_features.transcript_alias and transcript_features_transposons.transcript_alias=transcript_features.transcript_alias and confidence.transcript_alias=transcript_features.transcript_alias;
This code works fine. However, when I add one more left join:
select transcript_features.*,
transcript_features_blast.hit, transcript_features_blast.evalue,
transcript_features_blast.swissprot_version,
transcript_features_premirna_family.family, transcript_features_premirna_family.evalue,
transcript_features_premirna_family.rfam_version,
transcript_features_premirna_homology.hit, transcript_features_premirna_homology.evalue,
transcript_features_premirna_homology.mirbase_version,
transcript_features_premirna.premirna,
transcript_features_transposons.hit, transcript_features_transposons.dfam_version,
confidence.confidence, expression.expression, expression.tissue, expression.conditions
from transcript_features_premirna, gene_transcripts, transcript_features_transposons, confidence, transcript_features
left join transcript_features_blast on transcript_features_blast.transcript_alias=transcript_features.transcript_alias
left join transcript_features_premirna_family on transcript_features_premirna_family.transcript_alias=transcript_features.transcript_alias
left join transcript_features_premirna_homology on transcript_features_premirna_homology.transcript_alias=transcript_features.transcript_alias
left join expression on expression.transcript_alias=transcript_features.transcript_alias
left join transcript_names on transcript_names.transcript_alias=transcript_features.transcript_alias
where transcript_features.transcript_alias=gene_transcripts.transcript_alias and gene_transcripts.gene_alias="AT1G19392" and transcript_features_premirna.transcript_alias=transcript_features.transcript_alias and transcript_features_transposons.transcript_alias=transcript_features.transcript_alias and confidence.transcript_alias=transcript_features.transcript_alias;
This query takes an eternity; it never finishes. Why? If I remove any one of the left joins, it works nicely and I get results in less than 0.30 seconds. Are there perhaps too many left joins?
The output of EXPLAIN:
+----+-------------+---------------------------------------+--------+---------------+---------+---------+---------------------------------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------------------------+--------+---------------+---------+---------+---------------------------------------------+-------+-------------+
| 1 | SIMPLE | confidence | ALL | PRIMARY | NULL | NULL | NULL | 75858 | |
| 1 | SIMPLE | transcript_features | eq_ref | PRIMARY | PRIMARY | 42 | GreeNC.confidence.transcript_alias | 1 | |
| 1 | SIMPLE | transcript_features_blast | eq_ref | PRIMARY | PRIMARY | 42 | GreeNC.confidence.transcript_alias | 1 | |
| 1 | SIMPLE | transcript_features_premirna_family | eq_ref | PRIMARY | PRIMARY | 42 | GreeNC.confidence.transcript_alias | 1 | |
| 1 | SIMPLE | transcript_features_premirna_homology | ALL | NULL | NULL | NULL | NULL | 9530 | |
| 1 | SIMPLE | expression | ALL | NULL | NULL | NULL | NULL | 75844 | |
| 1 | SIMPLE | transcript_names | eq_ref | PRIMARY | PRIMARY | 42 | GreeNC.transcript_features.transcript_alias | 1 | Using index |
| 1 | SIMPLE | transcript_features_transposons | eq_ref | PRIMARY | PRIMARY | 42 | GreeNC.transcript_features.transcript_alias | 1 | Using where |
| 1 | SIMPLE | transcript_features_premirna | eq_ref | PRIMARY | PRIMARY | 42 | GreeNC.transcript_features.transcript_alias | 1 | Using where |
| 1 | SIMPLE | gene_transcripts | eq_ref | PRIMARY | PRIMARY | 42 | GreeNC.transcript_features.transcript_alias | 1 | Using where |
+----+-------------+---------------------------------------+--------+---------------+---------+---------+---------------------------------------------+-------+-------------+
A more proper way of writing this query might be as follows:
SELECT f.*
, b.hit b_hit
, b.evalue b_evalue
, b.swissprot_version
, pf.family
, pf.evalue pf_evalue
, pf.rfam_version
, ph.hit ph_hit
, ph.evalue ph_evalue
, ph.mirbase_version
, p.premirna
, t.hit t_hit
, t.dfam_version
, c.confidence
, e.expression
, e.tissue
, e.conditions
FROM transcript_features f
JOIN gene_transcripts gt
ON gt.transcript_alias = f.transcript_alias
JOIN transcript_features_premirna p
ON p.transcript_alias = f.transcript_alias
JOIN transcript_features_transposons t
ON t.transcript_alias = f.transcript_alias
JOIN confidence c
ON c.transcript_alias = f.transcript_alias
LEFT
JOIN transcript_features_blast b
ON b.transcript_alias = f.transcript_alias
LEFT
JOIN transcript_features_premirna_family pf
ON pf.transcript_alias = f.transcript_alias
LEFT
JOIN transcript_features_premirna_homology ph
ON ph.transcript_alias = f.transcript_alias
LEFT
JOIN expression e
ON e.transcript_alias = f.transcript_alias
WHERE gt.gene_alias = "AT1G19392";
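Separately from the rewrite, the EXPLAIN output in the question shows type ALL with no possible keys for transcript_features_premirna_homology (9530 rows) and expression (75844 rows), and the plan starts with a full scan of confidence instead of the single gene being searched. With nested-loop joins those scans multiply, which would explain why one more join tips the query over. A minimal sketch of indexes that might help, assuming transcript_alias is not already indexed on those two tables and gene_transcripts has no index on gene_alias (the index names are made up for illustration):

```sql
-- Hypothetical index names; check SHOW INDEX FROM <table> first to avoid duplicates.
ALTER TABLE transcript_features_premirna_homology
    ADD INDEX idx_tfph_transcript_alias (transcript_alias);

ALTER TABLE expression
    ADD INDEX idx_expression_transcript_alias (transcript_alias);

-- Assumption: this lets the optimizer start from the rows for one gene
-- rather than scanning another table in full.
ALTER TABLE gene_transcripts
    ADD INDEX idx_gene_transcripts_gene_alias (gene_alias);
```

If the indexes are picked up, re-running EXPLAIN should show ref or eq_ref instead of ALL for those tables.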
In attempting to pull a large series of columns (~15-20) from several joined tables, I put together 2 views that would pull the necessary information. In my local DB (only ~1k posts rows), joining these views worked fine; however, when I created those same views on our production DB (~30k posts rows) and attempted to join them, I realized that the solution wouldn't scale beyond a test dataset.
I attempted to migrate those 2 views (categories data, like categories.title, and creators' data, like users.display_name) into a CTE, post_data, which in theory would act as a keyed version of those views and allow me to get all post data for the eligible posts.
I have put together a sample DBFiddle with some test data to explain the table structure. The actual data has many more columns, but this is representative of the joins necessary to build the query.
table : posts
+-----+-----------+------------+------------------------------------------+----------------------------------------+
| id | parent_id | created_by | message | attachments |
+-----+-----------+------------+------------------------------------------+----------------------------------------+
| 8 | NULL | 8 | laptop for sale | [{"media_id": 1380}] |
| 9 | NULL | 4 | NEW lamp shade up for grabs | [{"media_id": 1442}, {"link_id": 103}] |
| 10 | 1 | 7 | Oooh I could be interested | |
| 11 | 1 | 7 | DMing you now! I've been looking for one | |
+-----+-----------+------------+------------------------------------------+----------------------------------------+
table : users
+----+------------------+---------------------------+
| id | display_name | created_at |
+----+------------------+---------------------------+
| 1 | John Appleseed | 2018-02-20T00:00:00+00:00 |
| 2 | Massimo Jenkins | 2018-05-14T00:00:00+00:00 |
| 3 | Johanna Marionna | 2018-06-05T00:00:00+00:00 |
| 4 | Jackson Creek | 2018-11-15T00:00:00+00:00 |
| 5 | Joe Schmoe | 2019-01-09T00:00:00+00:00 |
| 6 | John Johnson | 2019-02-14T00:00:00+00:00 |
| 7 | Donna Madison | 2019-05-14T00:00:00+00:00 |
| 8 | Jenna Kaplan | 2019-06-23T00:00:00+00:00 |
+----+------------------+---------------------------+
table : categories
+----+------------+------------+-------------------------------------------------------+
| id | created_by | title | description |
+----+------------+------------+-------------------------------------------------------+
| 1 | 2 | Technology | Anything tech; Consumer, business or education tools! |
| 2 | 2 | Home Goods | Anything for the home |
+----+------------+------------+-------------------------------------------------------+
table : categories_posts
+---------+-------------+
| post_id | category_id |
+---------+-------------+
| 8 | 1 |
| 9 | 1 |
| 10 | 1 |
| 11 | 1 |
+---------+-------------+
table : users_categories
+---------+-------------+
| user_id | category_id |
+---------+-------------+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
+---------+-------------+
table : posts_removed
+---------+----------------------+------------+
| post_id | removed_at | removed_by |
+---------+----------------------+------------+
| 10 | 2019-01-22 09:08:14 | 7 |
+---------+----------------------+------------+
In the query below, eligible posts are determined in the base SELECT; then the post_data CTE is joined to the result set (limited to 25 rows) and all columns from the CTE are returned.
WITH post_data AS (
SELECT posts.id,
posts.parent_id,
posts.created_by,
posts.attachments,
categories_posts.category_id,
categories.title,
categories.created_by AS category_created_by,
creator.display_name AS creator_display_name,
creator.created_at AS creator_created_at
/* ... And a whole bunch of other fields from posts, categories_posts, users */
FROM posts
LEFT OUTER JOIN categories_posts
ON categories_posts.post_id = posts.id
LEFT OUTER JOIN categories
ON categories.id = categories_posts.category_id
LEFT OUTER JOIN users creator
ON creator.id = posts.created_by
/* ... And a whole bunch of other joins to facilitate the selected fields */
)
SELECT post_data.*
FROM posts
/* Set up the criteria for the posts selected before getting their data from the CTE */
LEFT OUTER JOIN posts_removed removed ON removed.post_id = posts.id
LEFT OUTER JOIN users user_me ON user_me.id = "1"
LEFT OUTER JOIN users_followed ON users_followed.user_id = posts.created_by
AND users_followed.followed_by = user_me.id
LEFT OUTER JOIN categories_posts ON categories_posts.post_id = posts.id
LEFT OUTER JOIN users_categories ON users_categories.category_id = categories_posts.category_id
LEFT OUTER JOIN posts_removed pp_removed ON pp_removed.post_id = posts.parent_id
/* Join our post_data on the post's ID */
JOIN post_data ON post_data.id = posts.id
WHERE
(
(
users_categories.user_id = user_me.id AND users_categories.left_at IS NULL
) OR categories_posts.category_id IS NULL
) AND (
posts.created_by = user_me.id
OR users_followed.followed_by = user_me.id
OR categories_posts.category_id IS NOT NULL
) AND removed.removed_at IS NULL
AND pp_removed.removed_at IS NULL
AND (post_data.id = posts.id OR post_data.id = posts.parent_id)
ORDER BY posts.id DESC
LIMIT 25
In theory, I thought this would work by selecting the rows based on the base select criteria, then doing an index scan for the CTE based on the Post ID; however, it seems that the query optimizer chooses instead to do a full table scan of the posts table.
The EXPLAIN SELECT gave me this information:
+----+-------------+------------------------+--------+-------------------------------+-------------+---------+---------------------------------------------+--------+----------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | extra |
+----+-------------+------------------------+--------+-------------------------------+-------------+---------+---------------------------------------------+--------+----------+----------------------------------------------------+
| 1 | PRIMARY | posts | ALL | PRIMARY,parent_id,created_by | | | | 33870 | 100 | Using temporary; Using filesort |
| 1 | PRIMARY | removed | eq_ref | PRIMARY | PRIMARY | 8 | posts.id | 1 | 19 | Using where |
| 1 | PRIMARY | user_me | const | PRIMARY | PRIMARY | 8 | const | 1 | 100 | Using where; Using index |
| 1 | PRIMARY | categories_posts | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.id | 1 | 100 | |
| 1 | PRIMARY | categories | eq_ref | PRIMARY | PRIMARY | 8 | categories_posts.category_id | 1 | 100 | Using index |
| 1 | PRIMARY | users_categories | eq_ref | user_id_2,user_id,category_id | user_id_2 | 16 | user_me.id,api.categories_posts.category_id | 1 | 100 | Using where |
| 1 | PRIMARY | users_followed | eq_ref | user_id,followed_by | user_id | 16 | posts.created_by,api.user_me.id | 1 | 100 | Using where; Using index |
| 1 | PRIMARY | pp_removed | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.parent_id | 1 | 19 | Using where |
| 1 | PRIMARY | <derived2> | ALL | | | | | 493911 | 19 | Using where; Using join buffer (Block Nested Loop) |
| 2 | DERIVED | posts | ALL | | | | | 33870 | 100 | Using temporary |
| 2 | DERIVED | categories_posts | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.id | 1 | 100 | |
| 2 | DERIVED | categories | eq_ref | PRIMARY | PRIMARY | 8 | api.categories_posts.category_id | 1 | 100 | |
| 2 | DERIVED | posts_votes | ref | post_id | post_id | 8 | api.posts.id | 1 | 100 | Using index |
| 2 | DERIVED | pp | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.parent_id | 1 | 100 | |
| 2 | DERIVED | pp_removed | eq_ref | PRIMARY | PRIMARY | 8 | api.pp.id | 1 | 100 | Using index |
| 2 | DERIVED | removed | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.id | 1 | 100 | Using index |
| 2 | DERIVED | creator | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.created_by | 1 | 100 | |
| 2 | DERIVED | usernames | ref | user_id | user_id | 8 | api.creator.id | 1 | 100 | |
| 2 | DERIVED | verifications | ALL | | | | | 4 | 100 | Using where; Using join buffer (Block Nested Loop) |
| 2 | DERIVED | categories_identifiers | ref | category_id | category_id | 8 | api.categories.id | 1 | 100 | |
+----+-------------+------------------------+--------+-------------------------------+-------------+---------+---------------------------------------------+--------+----------+----------------------------------------------------+
Beyond this, I tried refactoring my query to force key usage in the posts table, such as using FORCE INDEX(PRIMARY) in the select, and moving the CTE to be the base query with a filter WHERE id IN ({the original base query}), but it seems the optimizer still does a full table scan.
In case it's helpful to decode what's happening in the query plan:
At the time of writing, there are 33,387 posts rows.
The query plan shows a full table scan of posts that returns 33,870 rows.
The query plan also shows the derived table (<derived2>) as having 493,911 rows.
My core questions are:
Am I correct when I say that subqueries should only be executed once per result row from the base select query? If so, then the CTE should also use the JOIN on posts.id and likely use the table index?
Why does the query plan show that it selects 33,870 rows when there are only 33,387? And where do the 493,911 rows come from?
How do you prevent a full table scan in this case?
Give this a try... Do the LIMIT 25 before JOINing to the WITH:
SELECT * FROM
( SELECT ... FROM posts
JOIN categories_posts ...
ORDER BY posts.id DESC
LIMIT 25 ) AS x
JOIN post_data
ON post_data.id IN (x.id, x.parent_id)
ORDER BY x.id DESC
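A pared-down sketch of the same "limit first, join afterwards" idea, using only tables shown in the question. It is not the full query: the eligibility rule is reduced to "the post has not been removed", the CTE name eligible is made up, and only a few columns are fetched. It assumes MySQL 8.0 or later for the WITH clause.

```sql
-- Step 1: pick the 25 newest eligible posts using narrow rows only.
-- (Simplified criteria: the real query has more joins and conditions.)
WITH eligible AS (
    SELECT posts.id, posts.parent_id
    FROM posts
    LEFT OUTER JOIN posts_removed removed
        ON removed.post_id = posts.id
    WHERE removed.post_id IS NULL
    ORDER BY posts.id DESC
    LIMIT 25
)
-- Step 2: fetch the wide data for those posts and their parents only.
SELECT eligible.id AS eligible_id,
       p.id,
       p.parent_id,
       p.created_by,
       p.attachments,
       creator.display_name AS creator_display_name,
       creator.created_at   AS creator_created_at
FROM eligible
JOIN posts p
    ON p.id IN (eligible.id, eligible.parent_id)
LEFT OUTER JOIN users creator
    ON creator.id = p.created_by
ORDER BY eligible.id DESC;
```

Joining the base tables directly in step 2, instead of a CTE that selects every post, gives the optimizer a chance to look up posts by primary key; whether it does so for the IN condition is worth confirming with EXPLAIN.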
This is how far I have gotten. I don't think it can be done with one SQL statement, but I want to confirm whether it is possible to do this with ONLY one statement:
SELECT * FROM users
INNER JOIN users_mentors ON users_mentors.id=users.mentoruser_id
INNER JOIN mentor_types ON (mentor_types.id=users_mentors.mentor_type OR users_mentors.mentor_type IS NULL)
INNER JOIN mentor_geographies ON mentor_geographies.mentor_id=users_mentors.id
INNER JOIN communes ON communes.id=mentor_geographies.commune_id
LIMIT 0,10
users table with foreign key to users_mentors:
+------+---------+---------------+
| id | user_id | mentoruser_id |
+------+---------+---------------+
| 1886 | NULL | 4 |
| 1885 | NULL | NULL |
| 1884 | NULL | NULL |
| 1883 | NULL | NULL |
| 1882 | NULL | NULL |
+------+---------+---------------+
users_mentors table (in a many-to-many relationship with communes):
+----+-------------+
| id | mentor_type |
+----+-------------+
| 4 | NULL |
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
+----+-------------+
communes table (in a many-to-many relationship with users_mentors):
+----+--------------+-------+----------+
| id | name | short | contract |
+----+--------------+-------+----------+
| 1 | København | NULL | 0 |
| 2 | Aarhus | NULL | 0 |
| 3 | Aalborg | NULL | 0 |
| 4 | Odense | NULL | 0 |
| 5 | Esbjerg | NULL | 0 |
+----+--------------+-------+----------+
mentor_geographies table (the m2m table that has FK to communes & users_mentors):
+----+-----------+------------+
| id | mentor_id | commune_id |
+----+-----------+------------+
| 1 | 4 | 1 |
| 2 | 4 | 2 |
+----+-----------+------------+
Is it possible to get all rows from users_mentors and a list of all their commune.type's, if they exist? (If mentor_geographies is empty, I want an empty list of commune.type.) In all cases I want the user.
If you want all users, use left join:
SELECT *
FROM users LEFT JOIN
users_mentors
ON users_mentors.id = users.mentoruser_id LEFT JOIN
mentor_types
ON mentor_types.id=users_mentors.mentor_type OR
users_mentors.mentor_type IS NULL LEFT JOIN
mentor_geographies
ON mentor_geographies.mentor_id = users_mentors.id LEFT JOIN
communes
ON communes.id = mentor_geographies.commune_id
LIMIT 0, 10;
I would also recommend that you use table aliases. They make the queries easier to write and to read:
SELECT *
FROM users u LEFT JOIN
users_mentors um
ON um.id = u.mentoruser_id LEFT JOIN
mentor_types mt
ON mt.id = um.mentor_type OR
um.mentor_type IS NULL LEFT JOIN
mentor_geographies mg
ON mg.mentor_id = um.id LEFT JOIN
communes c
ON c.id = mg.commune_id
LIMIT 0, 10
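If the goal is one row per user with the communes collapsed into a list, GROUP_CONCAT can be layered on top of the same left joins. This is a sketch under assumptions: the sample communes table shows no type column, so c.name stands in for the requested commune.type, and the mentor_types join is left out because none of its columns are needed for the list.

```sql
SELECT u.id AS user_id,
       um.id AS mentor_id,
       um.mentor_type,
       -- GROUP_CONCAT yields NULL when there are no communes;
       -- COALESCE turns that into an empty list.
       COALESCE(GROUP_CONCAT(c.name ORDER BY c.name SEPARATOR ', '), '') AS communes
FROM users u
LEFT JOIN users_mentors um
    ON um.id = u.mentoruser_id
LEFT JOIN mentor_geographies mg
    ON mg.mentor_id = um.id
LEFT JOIN communes c
    ON c.id = mg.commune_id
GROUP BY u.id, um.id, um.mentor_type
ORDER BY u.id DESC
LIMIT 0, 10;
```

With the sample data, user 1886 (mentor 4) would list København and Aarhus, while the users without a mentor still appear with an empty communes value.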
Thanks for your kind words and wisdom in helping me optimize the following query as I would like to use the "answers" table only once. As well, if there is any explanation accompanying it, I would be delighted.
SELECT
score.user_id,
name_email.NAME,
score.question_id,
TRUNCATE(GREATEST(0, SUM(answers.is_correct / IF(answers.is_correct = 1, possible_answers.good_answers, possible_answers.wrong_answers))) ,2) AS score
FROM score
INNER JOIN (
SELECT
id,
email,
CONCAT(users.f_name, " ", users.l_name) as NAME
FROM
users
)AS name_email
INNER JOIN answers
INNER JOIN (
SELECT
answers.question_id,
count(*) AS total_answers,
SUM(IF(is_correct = 1, 1, 0)) AS good_answers,
SUM(IF(is_correct = -1, 1, 0)) AS wrong_answers
FROM answers
GROUP BY answers.question_id
) AS possible_answers
WHERE score.answer_id = answers.id
AND possible_answers.question_id = score.question_id
AND score.user_id = name_email.id
-- AND score.user_id = 2
-- AND score.question_id = 1007
GROUP BY score.question_id, score.user_id
ORDER BY score.user_id, score.question_id
;
Note that all queries (including your original) are improved by adding a composite index on the score table, like so:
ALTER TABLE `score` ADD INDEX `score_comp_uaq` (`user_id`, `answer_id`, `question_id`);
Your final score formula refers to answers in 2 different contexts:
the possible answers, and
the actual answers
So, one way to avoid using the answers table twice is to use an outer join, so we can access both the possible and the actual answers in one pass. However, this approach has the problem of knowing which questions a user has attempted, so while it reduces the passes over the answers table, it requires 2 passes over the score table.
SELECT
d.user_id
, CONCAT(users.f_name, " ", users.l_name) as NAME
, d.question_id
, d.good_answers
, d.wrong_answers
, d.possible_good
, d.possible_wrong
, COALESCE((d.good_answers / NULLIF(d.possible_good,0)),0) x
, COALESCE((d.wrong_answers / NULLIF(d.possible_wrong,0)),0) y
FROM (
SELECT
u.user_id
, a.question_id
, SUM(case when s.user_id IS NOT NULL AND a.is_correct = 1 then 1 else 0 end) AS good_answers
, SUM(case when s.user_id IS NOT NULL AND a.is_correct =-1 then 1 else 0 end) AS wrong_answers
, SUM(case when a.is_correct = 1 then 1.0 else 0.0 end) AS possible_good
, SUM(case when a.is_correct =-1 then 1.0 else 0.0 end) AS possible_wrong
from answers a
INNER join (select distinct user_id, question_id from score) u
ON a.question_id = u.question_id
left join score s ON u.user_id = s.user_id
AND a.question_id = s.question_id
AND a.id = s.answer_id
GROUP BY
u.user_id
, a.question_id
) d
INNER JOIN users ON d.user_id = users.id
;
While that may be an option, I believe the next query offers a better explain plan:
SELECT
s.user_id
, CONCAT(users.f_name, " ", users.l_name) as NAME
, s.question_id
, s.good_answers
, s.wrong_answers
, pa.possible_good
, pa.possible_wrong
, COALESCE((s.good_answers / NULLIF(pa.possible_good,0)),0) x
, COALESCE((s.wrong_answers / NULLIF(pa.possible_wrong,0)),0) y
FROM (
select
score.user_id
, answers.question_id
, count(*) AS total_answers
, SUM(IF(is_correct = 1, 1, 0)) AS good_answers
, SUM(IF(is_correct = -1, 1, 0)) AS wrong_answers
from score
INNER JOIN answers ON score.answer_id = answers.id
GROUP BY
score.user_id
, answers.question_id
) s
INNER JOIN (
SELECT
answers.question_id
, count(*) AS total_answers
, SUM(IF(is_correct = 1, 1.0, 0.0)) AS possible_good
, SUM(IF(is_correct = -1, 1.0, 0.0)) AS possible_wrong
FROM answers
GROUP BY answers.question_id
) AS pa ON s.question_id = pa.question_id
INNER JOIN users ON s.user_id = users.id
;
Notice that I have not attempted to reproduce your final "score" formula. I feel that you may want to amend how you do that after seeing these options, but the columns aliased "x" and "y" should allow you to arrange a final formula.
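For reference, one way the original formula could be rebuilt from "x" and "y". In the original, each chosen correct answer contributes 1 / good_answers and each chosen wrong answer contributes -1 / wrong_answers, so the sum works out to x - y before the GREATEST and TRUNCATE steps. This sketch assumes is_correct only takes the values 1 and -1 (a 0 would contribute nothing) and that is_correct lives in the answers table; it reuses the second query's structure rather than introducing anything new.

```sql
SELECT
    s.user_id,
    CONCAT(users.f_name, ' ', users.l_name) AS NAME,
    s.question_id,
    -- x - y, floored at 0 and truncated to 2 decimals as in the original
    TRUNCATE(GREATEST(0,
          COALESCE(s.good_answers  / NULLIF(pa.possible_good, 0), 0)
        - COALESCE(s.wrong_answers / NULLIF(pa.possible_wrong, 0), 0)
    ), 2) AS score
FROM (
    -- what each user actually chose, per question
    SELECT score.user_id,
           answers.question_id,
           SUM(IF(answers.is_correct = 1, 1, 0))  AS good_answers,
           SUM(IF(answers.is_correct = -1, 1, 0)) AS wrong_answers
    FROM score
    INNER JOIN answers ON score.answer_id = answers.id
    GROUP BY score.user_id, answers.question_id
) s
INNER JOIN (
    -- what was possible, per question
    SELECT answers.question_id,
           SUM(IF(answers.is_correct = 1, 1.0, 0.0))  AS possible_good,
           SUM(IF(answers.is_correct = -1, 1.0, 0.0)) AS possible_wrong
    FROM answers
    GROUP BY answers.question_id
) AS pa ON s.question_id = pa.question_id
INNER JOIN users ON s.user_id = users.id
ORDER BY s.user_id, s.question_id;
```

One difference to be aware of: where the original would produce NULL because every division was by zero, this version returns 0.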
This SQLfiddle allows you to inspect both the above and your original.
first query:
+----+-------------+-------+--------+------------------------------------------------------+----------------+---------+------------------+------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+--------+------------------------------------------------------+----------------+---------+------------------+------+----------+----------------------------------------------+
| 1 | PRIMARY | | ALL | | | | | 4 | 100.00 | |
| 1 | PRIMARY | users | eq_ref | PRIMARY | PRIMARY | 4 | d.user_id | 1 | 100.00 | |
| 2 | DERIVED | | ALL | | | | | 4 | 100.00 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | a | ref | question_id | question_id | 4 | u.question_id | 1 | 100.00 | |
| 2 | DERIVED | s | ref | user_id,answer_choice,question_choice,score_comp_uaq | answer_choice | 5 | db_9_2522be.a.id | 1 | 100.00 | Using where |
| 3 | DERIVED | score | index | score_comp_uaq | score_comp_uaq | 14 | | 4 | 100.00 | Using index; Using temporary |
+----+-------------+-------+--------+------------------------------------------------------+----------------+---------+------------------+------+----------+----------------------------------------------+
second query:
+----+-------------+---------+--------+---------------+----------------+---------+-----------------------------+------+----------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+--------+---------------+----------------+---------+-----------------------------+------+----------+-----------------------------------------------------------+
| 1 | PRIMARY | | ALL | | | | | 4 | 100.00 | |
| 1 | PRIMARY | users | eq_ref | PRIMARY | PRIMARY | 4 | s.user_id | 1 | 100.00 | |
| 1 | PRIMARY | | ref | | | 4 | s.question_id | 2 | 100.00 | |
| 3 | DERIVED | answers | index | question_id | question_id | 4 | | 19 | 100.00 | |
| 2 | DERIVED | score | index | answer_choice | score_comp_uaq | 14 | | 4 | 100.00 | Using where; Using index; Using temporary; Using filesort |
| 2 | DERIVED | answers | eq_ref | PRIMARY | PRIMARY | 4 | db_9_2522be.score.answer_id | 1 | 100.00 | |
+----+-------------+---------+--------+---------------+----------------+---------+-----------------------------+------+----------+-----------------------------------------------------------+
original query:
+----+-------------+---------+--------+------------------------------------------------------+----------------+---------+-------------------------------+------+----------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+--------+------------------------------------------------------+----------------+---------+-------------------------------+------+----------+-----------------------------------------------------------+
| 1 | PRIMARY | score | index | user_id,answer_choice,question_choice,score_comp_uaq | score_comp_uaq | 14 | | 4 | 100.00 | Using where; Using index; Using temporary; Using filesort |
| 1 | PRIMARY | answers | eq_ref | PRIMARY | PRIMARY | 4 | db_9_2522be.score.answer_id | 1 | 100.00 | |
| 1 | PRIMARY | | ref | | | 4 | db_9_2522be.score.user_id | 2 | 100.00 | |
| 1 | PRIMARY | | ref | | | 4 | db_9_2522be.score.question_id | 2 | 100.00 | |
| 3 | DERIVED | answers | index | question_id | question_id | 4 | | 19 | 100.00 | |
| 2 | DERIVED | users | ALL | | | | | 9 | 100.00 | |
+----+-------------+---------+--------+------------------------------------------------------+----------------+---------+-------------------------------+------+----------+-----------------------------------------------------------+
The query works. With 10,000 products, it takes 11 seconds. If I don't use ORDER BY, it takes only 1 second. But I need ORDER BY.
Can we optimize it and how?
SELECT
u.urunID,
i.urunadi,
u.seo,
u.stok_kodu,
u.kstok_sayisi,
u.stok_sayisi,
u.goruntuleme,
(SELECT SUM(su.adet) FROM siparis_urunler su LEFT JOIN siparis s ON s.siparisID = su.siparisID WHERE s.durum_id NOT IN (26, 24) AND su.urunID = u.urunID) AS sadet
FROM
urunler u
INNER JOIN urun_isim i ON u.urunID = i.urunID
WHERE
u.stok_sayisi <= u.kstok_sayisi
AND u.durum = 1
GROUP BY
u.urunID
ORDER BY
sadet DESC
LIMIT 0, 20
EXPLAIN:
+----+--------------------+-------+--------+---------------------------------+-----------+---------+-----------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+--------+---------------------------------+-----------+---------+-----------------------------+------+----------------------------------------------+
| 1 | PRIMARY | i | index | PRIMARY,urunadi2 | urunadi | 768 | NULL | 4997 | Using index; Using temporary; Using filesort |
| 1 | PRIMARY | u | eq_ref | PRIMARY,urunID,urunler,urunler2 | PRIMARY | 4 | katalog_db.i.urunID | 1 | Using where |
| 3 | DEPENDENT SUBQUERY | sp | ALL | NULL | NULL | NULL | NULL | 11 | Using where |
| 2 | DEPENDENT SUBQUERY | s | ALL | PRIMARY,siparis | NULL | NULL | NULL | 805 | Using where |
| 2 | DEPENDENT SUBQUERY | su | ref | surunler2 | surunler2 | 10 | katalog_db.s.siparisID,func | 1 | Using where |
+----+--------------------+-------+--------+---------------------------------+-----------+---------+-----------------------------+------+----------------------------------------------+
Does this run any faster?
SELECT
u.urunID,
i.urunadi,
u.seo,
u.stok_kodu,
u.kstok_sayisi,
u.stok_sayisi,
u.goruntuleme,
SUM(su.adet) AS sadet
FROM
urunler u
INNER JOIN urun_isim i ON u.urunID = i.urunID
INNER JOIN siparis_urunler su ON su.urunID = u.urunID
LEFT JOIN siparis s ON s.siparisID = su.siparisID
WHERE
u.stok_sayisi <= u.kstok_sayisi
AND s.durum_id NOT IN (26, 24)
AND u.durum = 1
GROUP BY
u.urunID,
i.urunadi,
u.seo,
u.stok_kodu,
u.kstok_sayisi,
u.stok_sayisi,
u.goruntuleme
ORDER BY 8 DESC
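Note that the join version above drops products that have no matching order lines (the original returned them with a NULL sadet) and omits the LIMIT. A variant that keeps those products is to aggregate the order quantities once in a derived table and LEFT JOIN it. This is a sketch, assuming urun_isim has one row per product (otherwise the GROUP BY u.urunID from the original is still needed); the alias t is made up.

```sql
SELECT
    u.urunID,
    i.urunadi,
    u.seo,
    u.stok_kodu,
    u.kstok_sayisi,
    u.stok_sayisi,
    u.goruntuleme,
    t.sadet
FROM urunler u
INNER JOIN urun_isim i ON u.urunID = i.urunID
LEFT JOIN (
    -- Sum the ordered quantity per product in one pass,
    -- instead of running a dependent subquery for every product row.
    SELECT su.urunID, SUM(su.adet) AS sadet
    FROM siparis_urunler su
    INNER JOIN siparis s ON s.siparisID = su.siparisID
    WHERE s.durum_id NOT IN (26, 24)
    GROUP BY su.urunID
) t ON t.urunID = u.urunID
WHERE
    u.stok_sayisi <= u.kstok_sayisi
    AND u.durum = 1
ORDER BY
    t.sadet DESC
LIMIT 0, 20;
```

The inner join to siparis matches the original subquery's behavior, since its WHERE on s.durum_id already discarded order lines without a matching order.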
Could you please help me with a monster?
Do you see any issue with this one?
I would like to get the execution time below one second. Is that possible?
Please ask for any other data you may need to understand the structure of the DB. Any tips and tricks are welcome!
SELECT
ORD_CLI.COD_AGE,
ORD_CLI_RIGHE.DOC_ID,
OFF_CLI.off_cli_id,
ORD_CLI_RIGHE.DOC_RIGA_ID,
ORD_CLI_RIGHE.COD_ART,
ART_PESO.PESO_ART,
ORD_CLI.ANNO_DOC,
ORD_CLI.NUM_DOC,
ORD_CLI.SERIE_DOC,
ORD_CLI.DATA_DOC,
CF.RAG_SOC_CF,
AGENTI.NOME_AGE,
ORD_CLI.COD_CF,
ORD_CLI.COD_IVA,
ORD_CLI.COD_DEP,
ORD_CLI_TOT.IMPONIBILE_V1 AS IMPONIBILE_ORDINE,
FATT_CLI_TOT.IMPONIBILE_V1 AS IMPONIBILE_FATTURA,
ORD_CLI_TOT.IVA_V1,
SUM(ART_PESO.PESO_ART) AS weight,
SUM(FATT_CLI_RIGHE.QUANT_RIGA) AS quantity,
SUM(FATT_CLI_RIGHE.QUANT_RIGA*FATT_CLI_RIGHE.PREZZO_LORDO_VU1) AS sell_price,
SUM(FATT_CLI_RIGHE.QUANT_RIGA*DDT_FOR_RIGHE.PREZZO_LORDO_VU1) AS acqisition_price1,
SUM(FATT_CLI_RIGHE.QUANT_RIGA*FATT_FOR_RIGHE.PREZZO_LORDO_VU1) AS acqisition_price2,
SUM(FATT_CLI_RIGHE.QUANT_RIGA*FATT_CLI_RIGHE_PROVV.IMPORTO_PROVV_VU1) AS agent_reward,
SUM(FATT_CLI_RIGHE.QUANT_RIGA*ART_PESO.PESO_ART * 0.13) AS transport_price,
SUM(FATT_CLI_RIGHE.QUANT_RIGA*(
FATT_CLI_RIGHE.PREZZO_LORDO_VU1
- COALESCE(DDT_FOR_RIGHE.PREZZO_LORDO_VU1, 0)
- COALESCE(FATT_FOR_RIGHE.PREZZO_LORDO_VU1, 0)
- COALESCE(FATT_CLI_RIGHE_PROVV.IMPORTO_PROVV_VU1, 0)
- COALESCE(ART_PESO.PESO_ART, 0) * 0.13
)) AS net_earning,
OFF_CLI.stima_prezzo_acquisto,
OFF_CLI.stima_prezzo_trasporto,
OFF_CLI.stima_provvigioni_agenti,
OFF_CLI.stima_utile
FROM ORD_CLI
INNER JOIN ORD_CLI_RIGHE
ON ORD_CLI_RIGHE.DOC_ID = ORD_CLI.DOC_ID
LEFT JOIN ORD_CLI_RIGHE_SPEC
ON ORD_CLI_RIGHE.DOC_RIGA_ID = ORD_CLI_RIGHE_SPEC.DOC_RIGA_ID
INNER JOIN ART_PESO
ON ART_PESO.COD_ART = ORD_CLI_RIGHE.COD_ART
INNER JOIN ORD_CLI_TOT
ON ORD_CLI.DOC_ID = ORD_CLI_TOT.DOC_ID
INNER JOIN AGENTI
ON AGENTI.COD_AGE = ORD_CLI.COD_AGE
INNER JOIN CF
ON CF.COD_CF = ORD_CLI.COD_CF
LEFT JOIN FATT_CLI_RIGHE_SPEC
ON ORD_CLI_RIGHE.DOC_RIGA_ID = FATT_CLI_RIGHE_SPEC.ORD_RIGA_ID
LEFT JOIN FATT_CLI_RIGHE
ON FATT_CLI_RIGHE.DOC_RIGA_ID = FATT_CLI_RIGHE_SPEC.DOC_RIGA_ID
LEFT JOIN FATT_CLI_TOT
ON FATT_CLI_RIGHE.DOC_ID = FATT_CLI_TOT.DOC_ID
LEFT JOIN FATT_CLI_RIGHE_PROVV
ON FATT_CLI_RIGHE_PROVV.DOC_RIGA_ID = FATT_CLI_RIGHE_SPEC.DOC_RIGA_ID
LEFT JOIN FATT_CLI_RIGHE_LOTTI
ON FATT_CLI_RIGHE_LOTTI.DOC_RIGA_ID = FATT_CLI_RIGHE_SPEC.DOC_RIGA_ID
LEFT JOIN DDT_FOR_RIGHE_LOTTI
ON DDT_FOR_RIGHE_LOTTI.COD_LOT = FATT_CLI_RIGHE_LOTTI.COD_LOT
LEFT JOIN DDT_FOR_RIGHE
ON DDT_FOR_RIGHE.DOC_RIGA_ID = DDT_FOR_RIGHE_LOTTI.DOC_RIGA_ID
LEFT JOIN FATT_FOR_RIGHE
ON FATT_FOR_RIGHE.DOC_RIGA_ID = FATT_CLI_RIGHE_LOTTI.COD_LOT
LEFT JOIN OFF_CLI_RIGHE
ON OFF_CLI_RIGHE.DOC_RIGA_ID = ORD_CLI_RIGHE_SPEC.OFF_RIGA_ID
LEFT JOIN OFF_CLI
ON OFF_CLI.DOC_ID = OFF_CLI_RIGHE.DOC_ID
WHERE
ORD_CLI.COD_BUSN_UN='P'
AND OFF_CLI_RIGHE.DOC_RIGA_ID IS NOT NULL
AND ORD_CLI.DATA_DOC >= '2012-11-29'
AND ORD_CLI.DATA_DOC <= '2013-02-28'
GROUP BY ORD_CLI.DOC_ID
ORDER BY ORD_CLI.DATA_DOC DESC
LIMIT 30 OFFSET 0
Time of execution
Showing rows 0 - 29 ( 30 total, Query took 6.3458 sec)
EXPLAIN of the query
+----+-------------+----------------------+--------+-----------------------------------------------------------------------------+----------------------------------+---------+--------------------------------------------+------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------------------+--------+-----------------------------------------------------------------------------+----------------------------------+---------+--------------------------------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | ORD_CLI | range | PRIMARY,ORD_CLI_DATA_DOC,ORD_CLI_COD_CF,ORD_CLI_COD_BUSN_UN,ORD_CLI_COD_AGE | ORD_CLI_DATA_DOC | 4 | NULL | 3728 | 100.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | AGENTI | eq_ref | PRIMARY | PRIMARY | 38 | ORD_CLI.COD_AGE | 1 | 100.00 | Using where |
| 1 | SIMPLE | CF | eq_ref | PRIMARY | PRIMARY | 38 | ORD_CLI.COD_CF | 1 | 100.00 | |
| 1 | SIMPLE | ORD_CLI_TOT | eq_ref | PRIMARY | PRIMARY | 62 | ORD_CLI.DOC_ID | 1 | 100.00 | |
| 1 | SIMPLE | ORD_CLI_RIGHE | ref | PRIMARY,ORD_CLI_RIGHE_DOC_ID,ORD_CLI_RIGHE_COD_ART | ORD_CLI_RIGHE_DOC_ID | 62 | ORD_CLI_TOT.DOC_ID | 2 | 100.00 | Using where |
| 1 | SIMPLE | ART_PESO | eq_ref | PRIMARY | PRIMARY | 92 | ORD_CLI_RIGHE.COD_ART | 1 | 100.00 | |
| 1 | SIMPLE | ORD_CLI_RIGHE_SPEC | eq_ref | PRIMARY,ORD_CLI_RIGHE_SPEC_OFF_RIGA_ID | PRIMARY | 92 | ORD_CLI_RIGHE.DOC_RIGA_ID | 1 | 100.00 | Using where |
| 1 | SIMPLE | OFF_CLI_RIGHE | ref | DOC_RIGA_ID | DOC_RIGA_ID | 92 | ORD_CLI_RIGHE_SPEC.OFF_RIGA_ID | 1 | 100.00 | Using where |
| 1 | SIMPLE | OFF_CLI | ref | DOC_ID | DOC_ID | 63 | OFF_CLI_RIGHE.DOC_ID | 1 | 100.00 | |
| 1 | SIMPLE | FATT_CLI_RIGHE_SPEC | ref | FATT_CLI_RIGHE_SPEC_ORD_RIGA_ID | FATT_CLI_RIGHE_SPEC_ORD_RIGA_ID | 93 | ORD_CLI_RIGHE.DOC_RIGA_ID | 1 | 100.00 | Using index |
| 1 | SIMPLE | FATT_CLI_RIGHE | eq_ref | PRIMARY | PRIMARY | 92 | FATT_CLI_RIGHE_SPEC.DOC_RIGA_ID | 1 | 100.00 | |
| 1 | SIMPLE | FATT_CLI_TOT | eq_ref | PRIMARY | PRIMARY | 62 | FATT_CLI_RIGHE.DOC_ID | 1 | 100.00 | |
| 1 | SIMPLE | FATT_CLI_RIGHE_PROVV | ref | FATT_CLI_RIGHE_PROVV_DOC_RIGA_ID | FATT_CLI_RIGHE_PROVV_DOC_RIGA_ID | 92 | FATT_CLI_RIGHE_SPEC.DOC_RIGA_ID | 1 | 100.00 | |
| 1 | SIMPLE | FATT_CLI_RIGHE_LOTTI | ref | FATT_CLI_RIGHE_LOTTI_DOC_RIGA_ID | FATT_CLI_RIGHE_LOTTI_DOC_RIGA_ID | 92 | FATT_CLI_RIGHE_SPEC.DOC_RIGA_ID | 1 | 100.00 | |
| 1 | SIMPLE | DDT_FOR_RIGHE_LOTTI | ref | DDT_FOR_RIGHE_LOTTI_COD_LOT | DDT_FOR_RIGHE_LOTTI_COD_LOT | 92 | FATT_CLI_RIGHE_LOTTI.COD_LOT | 1 | 100.00 | |
| 1 | SIMPLE | DDT_FOR_RIGHE | eq_ref | PRIMARY | PRIMARY | 92 | DDT_FOR_RIGHE_LOTTI.DOC_RIGA_ID | 1 | 100.00 | |
| 1 | SIMPLE | FATT_FOR_RIGHE | eq_ref | PRIMARY | PRIMARY | 92 | FATT_CLI_RIGHE_LOTTI.COD_LOT | 1 | 100.00 | |
+----+-------------+----------------------+--------+-----------------------------------------------------------------------------+----------------------------------+---------+--------------------------------------------+------+----------+----------------------------------------------+
The following is the result of show status like 'Handler%', run immediately after the query was executed:
Handler_commit, 2
Handler_delete, 0
Handler_discover, 0
Handler_prepare, 0
Handler_read_first, 0
Handler_read_key, 421001
Handler_read_last, 0
Handler_read_next, 240344
Handler_read_prev, 0
Handler_read_rnd, 30
Handler_read_rnd_next, 2412
Handler_rollback, 0
Handler_savepoint, 0
Handler_savepoint_rollback, 0
Handler_update, 31846
Handler_write, 2409
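For anyone reproducing these counters, here is a minimal sketch of how they can be captured for a single statement. It assumes a MySQL session that is allowed to run FLUSH STATUS, and the SELECT is only a trimmed stand-in for the full query above:

```sql
-- Reset the session status counters so earlier statements are not counted.
FLUSH STATUS;

-- Trimmed stand-in for the full query (same driving table and filters).
SELECT ORD_CLI.DOC_ID, ORD_CLI.DATA_DOC
FROM ORD_CLI
WHERE ORD_CLI.COD_BUSN_UN = 'P'
  AND ORD_CLI.DATA_DOC >= '2012-11-29'
  AND ORD_CLI.DATA_DOC <= '2013-02-28'
ORDER BY ORD_CLI.DATA_DOC DESC
LIMIT 30 OFFSET 0;

-- Read the handler counters accumulated by the statement above.
SHOW SESSION STATUS LIKE 'Handler%';
```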
Database structure: https://gist.github.com/moiseevigor/4988fc8868f92643c9fb
EDIT 1
After creation of index
ALTER TABLE `TCross5_NP`.`ORD_CLI`
ADD INDEX `ORD_CLI_MULTI` (`COD_BUSN_UN` ASC, `DATA_DOC` ASC, `DOC_ID` ASC) ;
The execution time was roughly halved, but the query still hits the ORD_CLI_MULTI index.
First (this has helped in many other similar queries, where you appear to be dealing with a lot of "lookup" secondary table references), change the start of the query to
SELECT STRAIGHT_JOIN
This directs the engine to join the tables in exactly the order you have listed. It prevents the optimizer from treating a lookup table as the primary table and then working backwards or around the long way to get the data. Sometimes this works well; other times (rarely, in my experience) it hinders performance.
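Next, since your WHERE clause requires "AND OFF_CLI_RIGHE.DOC_RIGA_ID IS NOT NULL", rows without a matching OFF_CLI_RIGHE record are discarded anyway, so the LEFT JOINs on that path behave like INNER JOINs. I would change them to INNER JOIN when joining to ORD_CLI_RIGHE_SPEC and OFF_CLI_RIGHE: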
INNER JOIN ORD_CLI_RIGHE_SPEC
ON ORD_CLI_RIGHE.DOC_RIGA_ID = ORD_CLI_RIGHE_SPEC.DOC_RIGA_ID
INNER JOIN OFF_CLI_RIGHE
ON ORD_CLI_RIGHE_SPEC.OFF_RIGA_ID = OFF_CLI_RIGHE.DOC_RIGA_ID
and thus eliminate the "AND ... is not null" in the WHERE clause.
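Putting the first two suggestions together, a trimmed sketch of the query could look like the following. It keeps only the tables on the ORD_CLI to OFF_CLI_RIGHE path and uses COUNT(*) as a placeholder for the original select list (both are simplifications of mine), so treat it as an illustration of the join shape rather than a drop-in replacement:

```sql
-- STRAIGHT_JOIN: join the tables in the order they are written below.
SELECT STRAIGHT_JOIN
       ORD_CLI.DOC_ID,
       ORD_CLI.DATA_DOC,
       COUNT(*) AS matched_rows      -- placeholder for the original select list
FROM ORD_CLI
INNER JOIN ORD_CLI_RIGHE
        ON ORD_CLI_RIGHE.DOC_ID = ORD_CLI.DOC_ID
-- Formerly LEFT JOINs; INNER JOIN drops unmatched rows directly.
INNER JOIN ORD_CLI_RIGHE_SPEC
        ON ORD_CLI_RIGHE.DOC_RIGA_ID = ORD_CLI_RIGHE_SPEC.DOC_RIGA_ID
INNER JOIN OFF_CLI_RIGHE
        ON ORD_CLI_RIGHE_SPEC.OFF_RIGA_ID = OFF_CLI_RIGHE.DOC_RIGA_ID
WHERE ORD_CLI.COD_BUSN_UN = 'P'
  -- "OFF_CLI_RIGHE.DOC_RIGA_ID IS NOT NULL" is no longer needed here.
  AND ORD_CLI.DATA_DOC >= '2012-11-29'
  AND ORD_CLI.DATA_DOC <= '2013-02-28'
GROUP BY ORD_CLI.DOC_ID
ORDER BY ORD_CLI.DATA_DOC DESC
LIMIT 30 OFFSET 0;
```

With STRAIGHT_JOIN, each table has to appear after the tables its ON clause refers to, which is why OFF_CLI_RIGHE follows ORD_CLI_RIGHE_SPEC here.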
Finally, I would create a multi-part index that can be optimized for the query:
CREATE index MultipleParts on ORD_CLI ( COD_BUSN_UN, DATA_DOC, DOC_ID );
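The multi-part index is meant to help the WHERE, GROUP BY and ORDER BY of the query: COD_BUSN_UN is matched by equality, DATA_DOC serves the date range and the sort, and DOC_ID is the grouping column.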
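To check whether the optimizer actually picks the new index, one option is to run EXPLAIN on the driving-table part of the query and look at the key column. A minimal sketch, assuming the MultipleParts index above has been created and using a reduced query of my own rather than the full one:

```sql
-- Inspect the access path chosen for ORD_CLI alone.
EXPLAIN
SELECT ORD_CLI.DOC_ID, ORD_CLI.DATA_DOC
FROM ORD_CLI
WHERE ORD_CLI.COD_BUSN_UN = 'P'            -- equality on the first index column
  AND ORD_CLI.DATA_DOC >= '2012-11-29'     -- range on the second index column
  AND ORD_CLI.DATA_DOC <= '2013-02-28'
ORDER BY ORD_CLI.DATA_DOC DESC
LIMIT 30;
```

If the key column shows MultipleParts and Extra no longer lists "Using filesort" for this reduced query, the index is serving both the filter and the sort. The full query may still need a temporary table because of the GROUP BY across the joined rows.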