I have a query that is doing what I want on a truncated dataset but when I run it on the full dataset (millions of rows) it takes forever to run.
I have two tables - microsat_table and coverage_table.
microsat_table:
+----+----------+-----------+---------+-------------------------------------------------+
| id | Seq_Name | SSR_Start | SSR_End | Sequence |
+----+----------+-----------+---------+-------------------------------------------------+
| 2 | chr2L | 11050 | 11067 | TTTAATTTAATTTAATTT |
| 3 | chr2L | 44173 | 44187 | TATGTATGTATGTAT |
| 5 | chr2L | 54431 | 54477 | ATAATAATATAATATAATATAATATAATATATAATAATATAATAATA |
| 6 | chr2L | 57571 | 57594 | ATATATATATATATATATATATAT |
| 7 | chr2L | 72439 | 72453 | CATACATACATACAT |
| 8 | chr2L | 74028 | 74042 | ATACATACATACATA |
| 9 | chr2L | 85573 | 85587 | ATTTTATTTTATTTT |
| 10 | chr2L | 92429 | 92443 | ACATACATACATACA |
| 11 | chr2L | 138132 | 138166 | TATATAGATATATAAATATATATATATATATATAT |
| 13 | chr2L | 162245 | 162259 | ATACATACATACATA |
+----+----------+-----------+---------+-------------------------------------------------+
coverage_table:
+----------+-------+-------+----------+
| Seq_Name | Start | Stop | Coverage |
+----------+-------+-------+----------+
| chr2L | 5716 | 5771 | 1 |
| chr2L | 8730 | 8824 | 1 |
| chr2L | 9894 | 9948 | 1 |
| chr2L | 19391 | 19491 | 1 |
| chr2L | 19575 | 19675 | 1 |
| chr2L | 19773 | 19776 | 1 |
| chr2L | 19776 | 19872 | 2 |
| chr2L | 21920 | 21959 | 1 |
| chr2L | 21959 | 22020 | 2 |
| chr2L | 22020 | 22059 | 1 |
+----------+-------+-------+----------+
I want to add a column to the microsat_table which calculates the average coverage (from the coverage_table) over all rows where the Start and Stop values in the coverage table fall within the SSR_Start and SSR_End values in the microsat_table.
Example result:
+-----+----------+-----------+---------+--------------------------------+---------+
| id | Seq_Name | SSR_Start | SSR_End | Sequence | avg |
+-----+----------+-----------+---------+--------------------------------+---------+
| 53 | chr2L | 402489 | 402503 | AAAACAAAACAAAAC | 3.0000 |
| 64 | chr2L | 447214 | 447233 | CAGCAGCAGCAGCAGCAGCA | 8.0000 |
| 66 | chr2L | 457839 | 457868 | CAGCAGCAGCAACAGCAGCAGCAGGCAGCA | 2.0000 |
| 105 | chr2L | 579589 | 579603 | TCGAATCGAATCGAA | 11.0000 |
| 123 | chr2L | 628484 | 628501 | TAATGTTAATGTTAATGT | 6.0000 |
+-----+----------+-----------+---------+--------------------------------+---------+
My query is:
UPDATE microsat_table
JOIN
(SELECT m.id, SUM(p.Coverage)/count(p.Start)
AS avg FROM microsat_table m
LEFT OUTER JOIN coverage_table p
ON m.Seq_Name LIKE p.Seq_Name
WHERE m.Seq_Name LIKE p.Seq_Name GROUP BY m.id) AS qt
ON microsat_table.id = qt.id
SET microsat_table.avg = qt.avg;
Explain results for the truncated table:
+----+-------------+----------------------+------------+-------+---------------------------------------------------+-------------+---------+--------------------------------+--------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------------------+------------+-------+---------------------------------------------------+-------------+---------+--------------------------------+--------+----------+----------------------------------------------------+
| 1 | UPDATE | microsat_table_short | NULL | ALL | PRIMARY | NULL | NULL | NULL | 40356 | 100.00 | NULL |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 4 | testdb.microsat_table_short.id | 1236 | 100.00 | NULL |
| 2 | DERIVED | m | NULL | index | PRIMARY,Sequence,Seq_Name,Motif,SSR_Start,SSR_End | Seq_Name | 53 | NULL | 40356 | 100.00 | Using index; Using temporary; Using filesort |
| 2 | DERIVED | p | NULL | ALL | NULL | NULL | NULL | NULL | 100163 | 1.23 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+----------------------+------------+-------+---------------------------------------------------+-------------+---------+--------------------------------+--------+----------+----------------------------------------------------+
I added indexes (including trying HASH and BTREE indexes) which sped it up considerably, but I've let it run for 1.5 days on the larger dataset and it still didn't finish.
Does anyone have any suggestions on how to make it run faster?
Thanks!!
There are a few relatively minor infelicities in your code. However the big problem is that while you say you want to calculate "the average coverage (from the coverage_table) over all rows where the Start and Stop values in the coverage table fall within the SSR_Start and SSR_End values in the microsat_table" you don't actually seem to limit the query to doing that. Instead you only coded a match on Seq_Name.
The code below attempts to fix that (I used >= and <= which may not be what you need) and the other more minor bits:
UPDATE microsat_table
JOIN
(
SELECT
m.id,
AVG(p.Coverage) AS avg -- MySQL has its own average function
FROM
microsat_table m
INNER JOIN coverage_table p ON -- Change to INNER JOIN, your old WHERE clause had this effect anyway
m.Seq_Name = p.Seq_Name -- Use '=' not 'Like' when looking for an exact match
WHERE
p.Start >= m.SSR_Start -- This WHERE clause is the most important change
AND p.Stop <= m.SSR_End -- You omitted it in your version
GROUP BY
m.id) AS qt
ON microsat_table.id = qt.id
SET microsat_table.avg = qt.avg;
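For the range conditions to be cheap on millions of rows, the join presumably needs an index it can seek on. A minimal sketch, assuming a composite index on coverage_table (the name idx_cov_seq_start_stop is made up here) and assuming both Seq_Name columns share the same type and collation:

```sql
-- Hypothetical index name; lets the join seek on Seq_Name and range-scan Start.
CREATE INDEX idx_cov_seq_start_stop
    ON coverage_table (Seq_Name, Start, Stop);

-- Check the plan of the inner SELECT before running the full UPDATE.
EXPLAIN
SELECT m.id, AVG(p.Coverage) AS avg
FROM microsat_table m
INNER JOIN coverage_table p
        ON m.Seq_Name = p.Seq_Name
WHERE p.Start >= m.SSR_Start
  AND p.Stop <= m.SSR_End
GROUP BY m.id;
```

If the row for p still shows type ALL with a join buffer, the index is probably not being picked up and the UPDATE is unlikely to be much faster.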
Maybe updating the table in 1 big transaction is simply too much for the system? (what is the size of the table you're updating?) You could try doing it in blocks. I'd also go for a simple sub-select here, seems easier to read IMHO.
Also take note of Steve Lovell's remark that your query doesn't seem to care about the start/stop columns. Since you probably forgot it by accident I've added it here too, removing it shouldn't be too difficult =)
DECLARE @min_id int,
@max_id int,
@blocksize int
SELECT @min_id = MIN(id),
@max_id = MAX(id),
@blocksize = 100000 -- adapt as needed
FROM microsat_table
WHILE @min_id <= @max_id
BEGIN
UPDATE microsat_table
SET microsat_table.avg = (SELECT SUM(p.Coverage)/count(p.Start) AS avg
FROM microsat_table m
LEFT OUTER JOIN coverage_table p
ON m.Seq_Name LIKE p.Seq_Name -- if possible use '=' here instead of LIKE
AND p.Start >= m.SSR_Start -- flagrantly "stolen" from Steve Lovell's answer
AND p.Stop <= m.SSR_End
WHERE m.id = microsat_table.id)
-- limit update to this block:
WHERE microsat_table.id BETWEEN @min_id AND (@min_id + @blocksize - 1)
-- prepare for next block
SELECT @min_id = @min_id + @blocksize
END
You probably want the primary key on the id field of microsat_table and on the Seq_name + Start column of the coverage_table.
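The loop above is written in a T-SQL style; MySQL only accepts WHILE and DECLARE inside a stored program, and with slightly different syntax. A rough MySQL sketch of the same block-wise idea follows. The procedure name update_avg_in_blocks and the v_ variable names are invented for illustration, and the subquery reads coverage_table directly rather than re-joining microsat_table:

```sql
-- DELIMITER is a mysql command-line client directive, not SQL itself.
DELIMITER //

CREATE PROCEDURE update_avg_in_blocks()  -- hypothetical name
BEGIN
    DECLARE v_min_id INT;
    DECLARE v_max_id INT;
    DECLARE v_blocksize INT DEFAULT 100000;  -- adapt as needed

    SELECT MIN(id), MAX(id) INTO v_min_id, v_max_id
    FROM microsat_table;

    WHILE v_min_id <= v_max_id DO
        -- The subquery reads only coverage_table, so the table being
        -- updated does not appear in its FROM clause.
        UPDATE microsat_table m
        SET m.avg = (SELECT AVG(p.Coverage)
                     FROM coverage_table p
                     WHERE p.Seq_Name = m.Seq_Name
                       AND p.Start >= m.SSR_Start
                       AND p.Stop <= m.SSR_End)
        -- limit update to this block:
        WHERE m.id BETWEEN v_min_id AND v_min_id + v_blocksize - 1;

        -- prepare for next block
        SET v_min_id = v_min_id + v_blocksize;
    END WHILE;
END //

DELIMITER ;

CALL update_avg_in_blocks();
```

With autocommit enabled, each block's UPDATE commits separately, which is the point of splitting the work. Rows with no matching coverage interval end up with NULL in avg.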
In attempting to pull a large series of columns (~15-20) from several joined tables, I put together 2 views that would pull the necessary information. In my local DB (only ~1k posts rows), joining these views worked fine, however; when I created those same views on our production DB (~30k posts rows) and attempted to join the view, I realized that that solution wouldn't scale beyond a test dataset.
I attempted to migrate those 2 views (categories data—like categories.title—and creators' data—like users.display_name) into a CTE post_data which, in theory, would act as a keyed version of those views, and allow me to get all post data for the eligible posts.
I have put together a sample DBFiddle with some test data to explain the table structure. The actual data has many more columns, but this is representative of the joins necessary to build the query.
table : posts
+-----+-----------+------------+------------------------------------------+----------------------------------------+
| id | parent_id | created_by | message | attachments |
+-----+-----------+------------+------------------------------------------+----------------------------------------+
| 8 | NULL | 8 | laptop for sale | [{"media_id": 1380}] |
| 9 | NULL | 4 | NEW lamp shade up for grabs | [{"media_id": 1442}, {"link_id": 103}] |
| 10 | 1 | 7 | Oooh I could be interested | |
| 11 | 1 | 7 | DMing you now! I've been looking for one | |
+-----+-----------+------------+------------------------------------------+----------------------------------------+
table : users
+----+------------------+---------------------------+
| id | display_name | created_at |
+----+------------------+---------------------------+
| 1 | John Appleseed | 2018-02-20T00:00:00+00:00 |
| 2 | Massimo Jenkins | 2018-05-14T00:00:00+00:00 |
| 3 | Johanna Marionna | 2018-06-05T00:00:00+00:00 |
| 4 | Jackson Creek | 2018-11-15T00:00:00+00:00 |
| 5 | Joe Schmoe | 2019-01-09T00:00:00+00:00 |
| 6 | John Johnson | 2019-02-14T00:00:00+00:00 |
| 7 | Donna Madison | 2019-05-14T00:00:00+00:00 |
| 8 | Jenna Kaplan | 2019-06-23T00:00:00+00:00 |
+----+------------------+---------------------------+
table : categories
+----+------------+------------+-------------------------------------------------------+
| id | created_by | title | description |
+----+------------+------------+-------------------------------------------------------+
| 1 | 2 | Technology | Anything tech; Consumer, business or education tools! |
| 2 | 2 | Home Goods | Anything for the home |
+----+------------+------------+-------------------------------------------------------+
table : categories_posts
+---------+-------------+
| post_id | category_id |
+---------+-------------+
| 8 | 1 |
| 9 | 1 |
| 10 | 1 |
| 11 | 1 |
+---------+-------------+
table : users_categories
+---------+-------------+
| user_id | category_id |
+---------+-------------+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
+---------+-------------+
table : posts_removed
+---------+----------------------+------------+
| post_id | removed_at | removed_by |
+---------+----------------------+------------+
| 10 | 2019-01-22 09:08:14 | 7 |
+---------+----------------------+------------+
In the below query, eligible posts are determined in the base SELECT; then, the post_data CTE is joined to the result set (limited to 25 rows) and all columns from the CTE are returned.
WITH post_data AS (
SELECT posts.id,
posts.parent_id,
posts.created_by,
posts.attachments,
categories_posts.category_id,
categories.title,
categories.created_by AS category_created_by,
creator.display_name AS creator_display_name,
creator.created_at AS creator_created_at
/* ... And a whole bunch of other fields from posts, categories_posts, users */
FROM posts
LEFT OUTER JOIN categories_posts
ON categories_posts.post_id = posts.id
LEFT OUTER JOIN categories
ON categories.id = categories_posts.category_id
LEFT OUTER JOIN users creator
ON creator.id = posts.created_by
/* ... And a whole bunch of other joins to facilitate the selected fields */
)
SELECT post_data.*
FROM posts
/* Set up the criteria for the posts selected before getting their data from the CTE */
LEFT OUTER JOIN posts_removed removed ON removed.post_id = posts.id
LEFT OUTER JOIN users user_me ON user_me.id = "1"
LEFT OUTER JOIN users_followed ON users_followed.user_id = posts.created_by
AND users_followed.followed_by = user_me.id
LEFT OUTER JOIN categories_posts ON categories_posts.post_id = posts.id
LEFT OUTER JOIN users_categories ON users_categories.category_id = categories_posts.category_id
LEFT OUTER JOIN posts_removed pp_removed ON pp_removed.post_id = posts.parent_id
/* Join our post_data on the post's ID */
JOIN post_data ON post_data.id = posts.id
WHERE
(
(
users_categories.user_id = user_me.id AND users_categories.left_at IS NULL
) OR categories_posts.category_id IS NULL
) AND (
posts.created_by = user_me.id
OR users_followed.followed_by = user_me.id
OR categories_posts.category_id IS NOT NULL
) AND removed.removed_at IS NULL
AND pp_removed.removed_at IS NULL
AND (post_data.id = posts.id OR post_data.id = posts.parent_id)
ORDER BY posts.id DESC
LIMIT 25
In theory, I thought this would work by selecting the rows based on the base select criteria, then doing an index scan for the CTE based on the Post ID; however, it seems that the query optimizer chooses instead to do a full table scan of the posts table.
The EXPLAIN SELECT gave me this information:
+----+-------------+------------------------+--------+-------------------------------+-------------+---------+---------------------------------------------+--------+----------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | extra |
+----+-------------+------------------------+--------+-------------------------------+-------------+---------+---------------------------------------------+--------+----------+----------------------------------------------------+
| 1 | PRIMARY | posts | ALL | PRIMARY,parent_id,created_by | | | | 33870 | 100 | Using temporary; Using filesort |
| 1 | PRIMARY | removed | eq_ref | PRIMARY | PRIMARY | 8 | posts.id | 1 | 19 | Using where |
| 1 | PRIMARY | user_me | const | PRIMARY | PRIMARY | 8 | const | 1 | 100 | Using where; Using index |
| 1 | PRIMARY | categories_posts | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.id | 1 | 100 | |
| 1 | PRIMARY | categories | eq_ref | PRIMARY | PRIMARY | 8 | categories_posts.category_id | 1 | 100 | Using index |
| 1 | PRIMARY | users_categories | eq_ref | user_id_2,user_id,category_id | user_id_2 | 16 | user_me.id,api.categories_posts.category_id | 1 | 100 | Using where |
| 1 | PRIMARY | users_followed | eq_ref | user_id,followed_by | user_id | 16 | posts.created_by,api.user_me.id | 1 | 100 | Using where; Using index |
| 1 | PRIMARY | pp_removed | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.parent_id | 1 | 19 | Using where |
| 1 | PRIMARY | <derived2> | ALL | | | | | 493911 | 19 | Using where; Using join buffer (Block Nested Loop) |
| 2 | DERIVED | posts | ALL | | | | | 33870 | 100 | Using temporary |
| 2 | DERIVED | categories_posts | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.id | 1 | 100 | |
| 2 | DERIVED | categories | eq_ref | PRIMARY | PRIMARY | 8 | api.categories_posts.category_id | 1 | 100 | |
| 2 | DERIVED | posts_votes | ref | post_id | post_id | 8 | api.posts.id | 1 | 100 | Using index |
| 2 | DERIVED | pp | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.parent_id | 1 | 100 | |
| 2 | DERIVED | pp_removed | eq_ref | PRIMARY | PRIMARY | 8 | api.pp.id | 1 | 100 | Using index |
| 2 | DERIVED | removed | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.id | 1 | 100 | Using index |
| 2 | DERIVED | creator | eq_ref | PRIMARY | PRIMARY | 8 | api.posts.created_by | 1 | 100 | |
| 2 | DERIVED | usernames | ref | user_id | user_id | 8 | api.creator.id | 1 | 100 | |
| 2 | DERIVED | verifications | ALL | | | | | 4 | 100 | Using where; Using join buffer (Block Nested Loop) |
| 2 | DERIVED | categories_identifiers | ref | category_id | category_id | 8 | api.categories.id | 1 | 100 | |
+----+-------------+------------------------+--------+-------------------------------+-------------+---------+---------------------------------------------+--------+----------+----------------------------------------------------+
Beyond this, I tried refactoring my query to force key usage in the posts table, such as using FORCE INDEX(PRIMARY) in the select, and moving the CTE to be the base query and adding a filter WHERE id IN ({the original base query}), but it seems the optimizer still does a full table scan.
In case it's helpful to decode what's happening in the query plan:
At the time of writing, there are 33,387 posts rows, but the query plan shows a full table scan which returns 33,870 rows
The query plan also shows the derived table (<derived2>) as having 493,911 rows
My core questions are:
Am I correct when I say that subqueries should only be executed once per result row from the base select query? If so, then the CTE should also use the JOIN on posts.id and likely use the table index?
Why does the query plan show that it selects 33,870 rows when there are only 33,387? And where do the 493,911 rows come from?
How do you prevent a full table scan in this case?
Give this a try... Do the LIMIT 25 before JOINing to the WITH:
SELECT * FROM
( SELECT ... FROM posts
JOIN categories_posts ...
ORDER BY posts.id DESC
LIMIT 25 ) AS x
JOIN post_data
ON post_data.id IN (x.id, x.parent_id)
ORDER BY x.id DESC
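To make the "limit first, then fetch the wide columns" idea concrete, here is a simplified sketch, not the full query. It assumes MySQL 8.0 (for WITH), reduces the eligibility filter to the posts_removed check only, and joins the lookup tables directly instead of going through post_data; the CTE name page and the alias p are made up for illustration. Whether this avoids materializing a large derived table on the real schema is untested:

```sql
-- Step 1: pick the 25 newest eligible posts (filter simplified to the removed check).
WITH page AS (
    SELECT posts.id, posts.parent_id
    FROM posts
    LEFT OUTER JOIN posts_removed removed ON removed.post_id = posts.id
    WHERE removed.removed_at IS NULL
    ORDER BY posts.id DESC
    LIMIT 25
)
-- Step 2: fetch the wide columns only for those posts and their parents.
SELECT p.id,
       p.parent_id,
       p.created_by,
       p.attachments,
       categories_posts.category_id,
       categories.title,
       creator.display_name AS creator_display_name
FROM page
JOIN posts p ON p.id IN (page.id, page.parent_id)
LEFT OUTER JOIN categories_posts ON categories_posts.post_id = p.id
LEFT OUTER JOIN categories ON categories.id = categories_posts.category_id
LEFT OUTER JOIN users creator ON creator.id = p.created_by
ORDER BY page.id DESC;
```

The remaining eligibility conditions from the question would go into the page CTE, so that the expensive joins only ever see at most 25 posts plus their parents.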
I have the following schema:
+--+------+-----+----+
|id|device|token|cash|
+--+------+-----+----+
The device column is unique; token is not unique and is NULL by default.
What I want to achieve is to set all duplicate token values to the default (NULL), leaving only the one with the highest cash. If duplicates have the same cash, leave the first one.
I have heard about cursors, but it seems this can be done with an ordinary query.
I tried the following SELECT just to check whether my idea is right, but it seems I'm wrong.
SELECT
*
FROM
db.table
WHERE
db.table.token NOT IN (SELECT
*
FROM
(
SELECT DISTINCT
MAX(db.table.balance)
FROM
db.table
GROUP BY db.table.balance) temp
)
For example:
After the query, this table
+-----+---------+--------+-------+
| id | device | token | cash|
+-----+---------+--------+-------+
| 1 | dev_1 | tkn_1 | 3 |
| 2 | dev_2 | tkn_1 | 10 |
| 3 | dev_3 | tkn_2 | 10 |
| 4 | dev_4 | tkn_2 | 14 |
| 5 | dev_5 | tkn_3 | 10 |
| 6 | dev_6 | null | 10 |
| 7 | dev_7 | null | 10 |
| 8 | dev_8 | tkn_4 | 11 |
| 8 | dev_8 | tkn_4 | 11 |
| 8 | dev_8 | tkn_5 | 11 |
+-----+---------+--------+-------+
should be:
+-----+---------+--------+-------+
| id | device | token | cash|
+-----+---------+--------+-------+
| 1 | dev_1 | null | 3 |
| 2 | dev_2 | tkn_1 | 10 |
| 3 | dev_3 | null | 10 |
| 4 | dev_4 | tkn_2 | 14 |
| 5 | dev_5 | tkn_3 | 10 |
| 6 | dev_6 | null | 10 |
| 7 | dev_7 | null | 10 |
| 8 | dev_8 | tkn_4 | 11 |
| 8 | dev_8 | null | 11 |
| 8 | dev_8 | tkn_5 | 15 |
+-----+---------+--------+-------+
Thanks in advance :)
Try using an EXISTS subquery:
UPDATE yourTable t1
SET token = NULL
WHERE EXISTS (SELECT 1 FROM (SELECT * FROM yourTable) t2
WHERE t2.token = t1.token AND
t2.cash > t1.cash);
Note that this answer assumes that there would never be a tie for two token records having the same highest cash amount.
To keep exactly one row in the event of duplicates on the maximum cash, use the id:
update t join
(select tt.*,
(select t3.id
from t t3
where t3.token = tt.token
order by t3.cash desc, t3.id -- on a tie, the lowest id (the "first one") wins
limit 1
) as max_cash_id
from t tt
) tt
on t.id = tt.id and t.id <> tt.max_cash_id
set t.token = null;
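On MySQL 8.0 or later, a window function is another way to express "keep one row per token". This is a sketch under assumptions: the table is called t as in the answer above, id is unique (the sample data in the question repeats id 8, so that may not hold), and ranked and rn are names chosen here for illustration:

```sql
UPDATE t
JOIN (
    -- Rank rows within each token: highest cash first, lowest id breaks ties.
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY token
                              ORDER BY cash DESC, id ASC) AS rn
    FROM t
    WHERE token IS NOT NULL
) AS ranked ON ranked.id = t.id
SET t.token = NULL
WHERE ranked.rn > 1;  -- everything except the top-ranked row per token
```

Rows whose token is already NULL are left out of the ranking, so they are never touched.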
I can't seem to get this query to perform any faster than 8 hours! 0_0
I have read up on indexing and I am still not sure I am doing this right.
I am expecting my query to calculate a value for BROK_1_RATING based on dates and other row values - 500,000 records.
Using record #1 as an example - my query should:
get all other records that have the same ESTIMID
ignore records where ANALYST = ""
ignore records where ID is the same as the record being compared, i.e. ID != 1
the records must fall within a time frame, i.e. BB.ANNDATS_CONVERTED <= working.ANNDATS_CONVERTED and BB.REVDATS_CONVERTED > working.ANNDATS_CONVERTED
BB.IRECCD must = 1
then count the result
then write the count value to the BROK_1_RATING column for record #1
now do the same for record #2, #3, and so on for the entire table
In human terms - "Examine the date of record #1 - Now, within time frame from record #1 - count the number of times the number 1 exists with the same brokerage ESTIMID, do not count record #1, do not count blank ANALYST rows. Move on to record #2 and do the same"
UPDATE `working` SET `BROK_1_RATING` =
(SELECT COUNT(`ID`) FROM (SELECT `ID`, `IRECCD`, `ANALYST`, `ESTIMID`, `ANNDATS_CONVERTED`, `REVDATS_CONVERTED` FROM `working`) AS BB
WHERE
BB.`ANNDATS_CONVERTED` <= `working`.`ANNDATS_CONVERTED`
AND
BB.`REVDATS_CONVERTED` > `working`.`ANNDATS_CONVERTED`
AND
BB.`ID` != `working`.`ID`
AND
BB.`ESTIMID` = `working`.`ESTIMID`
AND
BB.`ANALYST` != ''
AND
BB.`IRECCD` = 1
)
WHERE `working`.`ANALYST` != '';
| ID | ANALYST | ESTIMID | IRECCD | ANNDATS_CONVERTED | REVDATS_CONVERTED | BROK_1_RATING | NO_TOP_RATING |
------------------------------------------------------------------------------------------------------------------
| 1 | DAVE | Brokerage000 | 4 | 1998-07-01 | 1998-07-04 | | 3 |
| 2 | DAVE | Brokerage000 | 1 | 1998-06-28 | 1998-07-10 | | 4 |
| 3 | DAVE | Brokerage000 | 5 | 1998-07-02 | 1998-07-08 | | 2 |
| 4 | DAVE | Brokerage000 | 1 | 1998-07-04 | 1998-12-04 | | 3 |
| 5 | SAM | Brokerage000 | 1 | 1998-06-14 | 1998-06-30 | | 4 |
| 6 | SAM | Brokerage000 | 1 | 1998-06-28 | 1999-08-08 | | 4 |
| 7 | | Brokerage000 | 1 | 1998-06-28 | 1999-08-08 | | 5 |
| 8 | DAVE | Brokerage111 | 2 | 1998-06-28 | 1999-08-08 | | 3 |
'EXPLAIN' results:
id| select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
----------------------------------------------------------------------------------------------------------------------------------------
1 | PRIMARY | working | index | ANALYST | PRIMARY | 4 | NULL | 467847 | Using where
2 | DEPENDENT SUBQUERY | <derived3> | ALL | NULL | NULL | NULL | NULL | 467847 | Using where
3 | DERIVED | working | index | NULL | test_combined_indexes | 226 | NULL | 467847 | Using index
I have indexes on the single columns, and I have also tried a multi-column index like this:
ALTER TABLE `working` ADD INDEX `test_combined_indexes` (`IRECCD`, `ID`, `ANALYST`, `ESTIMID`, `ANNDATS_CONVERTED`, `REVDATS_CONVERTED`) COMMENT '';
Well you can shorten the query a lot by just removing the extra stuff:
UPDATE `working` as AA SET `BROK_1_RATING` =
(SELECT COUNT(`ID`) FROM `working` AS BB
WHERE BB.`ANNDATS_CONVERTED` <= AA.`ANNDATS_CONVERTED`
AND BB.`REVDATS_CONVERTED` > AA.`ANNDATS_CONVERTED`
AND BB.`ID` != AA.`ID`
AND BB.`ESTIMID` = AA.`ESTIMID`
AND BB.`ANALYST` != ''
AND BB.`IRECCD` = 1 )
WHERE `ANALYST` != '';
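One caveat: MySQL normally rejects an UPDATE whose subquery reads the target table directly (error 1093, "You can't specify target table ... for update in FROM clause"), which is presumably why the question wrapped `working` in a derived table. A join-based sketch that sidesteps this and computes all counts in one grouped pass is shown below. The index name idx_estimid_ireccd_ann and the aliases A, B and counts are invented here, and the column order of the index is a guess that equality columns should come before the date range:

```sql
-- Hypothetical index: equality columns first, then the date used in the range test.
ALTER TABLE `working`
  ADD INDEX `idx_estimid_ireccd_ann` (`ESTIMID`, `IRECCD`, `ANNDATS_CONVERTED`);

UPDATE `working` AS AA
LEFT JOIN (
    -- One row per record A that has at least one matching record B.
    SELECT A.`ID`, COUNT(B.`ID`) AS cnt
    FROM `working` AS A
    JOIN `working` AS B
      ON  B.`ESTIMID` = A.`ESTIMID`
      AND B.`IRECCD` = 1
      AND B.`ANNDATS_CONVERTED` <= A.`ANNDATS_CONVERTED`
      AND B.`REVDATS_CONVERTED` >  A.`ANNDATS_CONVERTED`
      AND B.`ID` <> A.`ID`
      AND B.`ANALYST` <> ''
    WHERE A.`ANALYST` <> ''
    GROUP BY A.`ID`
) AS counts ON counts.`ID` = AA.`ID`
-- Records without any match get 0, as COUNT would return in the subquery form.
SET AA.`BROK_1_RATING` = COALESCE(counts.cnt, 0)
WHERE AA.`ANALYST` <> '';
```

Because the derived table contains GROUP BY, it is materialized before the update runs, so the target-table restriction does not apply to it.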
Could you please help me with a monster query?
Do you see any issue with this one?
I would like to get the execution time below one second; is that possible?
Please ask for any other data you may need to understand the structure of the DB. Any tips and tricks are welcome!
SELECT
ORD_CLI.COD_AGE,
ORD_CLI_RIGHE.DOC_ID,
OFF_CLI.off_cli_id,
ORD_CLI_RIGHE.DOC_RIGA_ID,
ORD_CLI_RIGHE.COD_ART,
ART_PESO.PESO_ART,
ORD_CLI.ANNO_DOC,
ORD_CLI.NUM_DOC,
ORD_CLI.SERIE_DOC,
ORD_CLI.DATA_DOC,
CF.RAG_SOC_CF,
AGENTI.NOME_AGE,
ORD_CLI.COD_CF,
ORD_CLI.COD_IVA,
ORD_CLI.COD_DEP,
ORD_CLI_TOT.IMPONIBILE_V1 AS IMPONIBILE_ORDINE,
FATT_CLI_TOT.IMPONIBILE_V1 AS IMPONIBILE_FATTURA,
ORD_CLI_TOT.IVA_V1,
SUM(ART_PESO.PESO_ART) AS weight,
SUM(FATT_CLI_RIGHE.QUANT_RIGA) AS quantity,
SUM(FATT_CLI_RIGHE.QUANT_RIGA*FATT_CLI_RIGHE.PREZZO_LORDO_VU1) AS sell_price,
SUM(FATT_CLI_RIGHE.QUANT_RIGA*DDT_FOR_RIGHE.PREZZO_LORDO_VU1) AS acqisition_price1,
SUM(FATT_CLI_RIGHE.QUANT_RIGA*FATT_FOR_RIGHE.PREZZO_LORDO_VU1) AS acqisition_price2,
SUM(FATT_CLI_RIGHE.QUANT_RIGA*FATT_CLI_RIGHE_PROVV.IMPORTO_PROVV_VU1) AS agent_reward,
SUM(FATT_CLI_RIGHE.QUANT_RIGA*ART_PESO.PESO_ART * 0.13) AS transport_price,
SUM(FATT_CLI_RIGHE.QUANT_RIGA*(
FATT_CLI_RIGHE.PREZZO_LORDO_VU1
- COALESCE(DDT_FOR_RIGHE.PREZZO_LORDO_VU1, 0)
- COALESCE(FATT_FOR_RIGHE.PREZZO_LORDO_VU1, 0)
- COALESCE(FATT_CLI_RIGHE_PROVV.IMPORTO_PROVV_VU1, 0)
- COALESCE(ART_PESO.PESO_ART, 0) * 0.13
)) AS net_earning,
OFF_CLI.stima_prezzo_acquisto,
OFF_CLI.stima_prezzo_trasporto,
OFF_CLI.stima_provvigioni_agenti,
OFF_CLI.stima_utile
FROM ORD_CLI
INNER JOIN ORD_CLI_RIGHE
ON ORD_CLI_RIGHE.DOC_ID = ORD_CLI.DOC_ID
LEFT JOIN ORD_CLI_RIGHE_SPEC
ON ORD_CLI_RIGHE.DOC_RIGA_ID = ORD_CLI_RIGHE_SPEC.DOC_RIGA_ID
INNER JOIN ART_PESO
ON ART_PESO.COD_ART = ORD_CLI_RIGHE.COD_ART
INNER JOIN ORD_CLI_TOT
ON ORD_CLI.DOC_ID = ORD_CLI_TOT.DOC_ID
INNER JOIN AGENTI
ON AGENTI.COD_AGE = ORD_CLI.COD_AGE
INNER JOIN CF
ON CF.COD_CF = ORD_CLI.COD_CF
LEFT JOIN FATT_CLI_RIGHE_SPEC
ON ORD_CLI_RIGHE.DOC_RIGA_ID = FATT_CLI_RIGHE_SPEC.ORD_RIGA_ID
LEFT JOIN FATT_CLI_RIGHE
ON FATT_CLI_RIGHE.DOC_RIGA_ID = FATT_CLI_RIGHE_SPEC.DOC_RIGA_ID
LEFT JOIN FATT_CLI_TOT
ON FATT_CLI_RIGHE.DOC_ID = FATT_CLI_TOT.DOC_ID
LEFT JOIN FATT_CLI_RIGHE_PROVV
ON FATT_CLI_RIGHE_PROVV.DOC_RIGA_ID = FATT_CLI_RIGHE_SPEC.DOC_RIGA_ID
LEFT JOIN FATT_CLI_RIGHE_LOTTI
ON FATT_CLI_RIGHE_LOTTI.DOC_RIGA_ID = FATT_CLI_RIGHE_SPEC.DOC_RIGA_ID
LEFT JOIN DDT_FOR_RIGHE_LOTTI
ON DDT_FOR_RIGHE_LOTTI.COD_LOT = FATT_CLI_RIGHE_LOTTI.COD_LOT
LEFT JOIN DDT_FOR_RIGHE
ON DDT_FOR_RIGHE.DOC_RIGA_ID = DDT_FOR_RIGHE_LOTTI.DOC_RIGA_ID
LEFT JOIN FATT_FOR_RIGHE
ON FATT_FOR_RIGHE.DOC_RIGA_ID = FATT_CLI_RIGHE_LOTTI.COD_LOT
LEFT JOIN OFF_CLI_RIGHE
ON OFF_CLI_RIGHE.DOC_RIGA_ID = ORD_CLI_RIGHE_SPEC.OFF_RIGA_ID
LEFT JOIN OFF_CLI
ON OFF_CLI.DOC_ID = OFF_CLI_RIGHE.DOC_ID
WHERE
ORD_CLI.COD_BUSN_UN='P'
AND OFF_CLI_RIGHE.DOC_RIGA_ID IS NOT NULL
AND ORD_CLI.DATA_DOC >= '2012-11-29'
AND ORD_CLI.DATA_DOC <= '2013-02-28'
GROUP BY ORD_CLI.DOC_ID
ORDER BY ORD_CLI.DATA_DOC
DESC LIMIT 30 OFFSET 0
Time of execution
Showing rows 0 - 29 ( 30 total, Query took 6.3458 sec)
EXPLAIN of the query
+----+-------------+----------------------+--------+-----------------------------------------------------------------------------+----------------------------------+---------+--------------------------------------------+------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------------------+--------+-----------------------------------------------------------------------------+----------------------------------+---------+--------------------------------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | ORD_CLI | range | PRIMARY,ORD_CLI_DATA_DOC,ORD_CLI_COD_CF,ORD_CLI_COD_BUSN_UN,ORD_CLI_COD_AGE | ORD_CLI_DATA_DOC | 4 | NULL | 3728 | 100.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | AGENTI | eq_ref | PRIMARY | PRIMARY | 38 | ORD_CLI.COD_AGE | 1 | 100.00 | Using where |
| 1 | SIMPLE | CF | eq_ref | PRIMARY | PRIMARY | 38 | ORD_CLI.COD_CF | 1 | 100.00 | |
| 1 | SIMPLE | ORD_CLI_TOT | eq_ref | PRIMARY | PRIMARY | 62 | ORD_CLI.DOC_ID | 1 | 100.00 | |
| 1 | SIMPLE | ORD_CLI_RIGHE | ref | PRIMARY,ORD_CLI_RIGHE_DOC_ID,ORD_CLI_RIGHE_COD_ART | ORD_CLI_RIGHE_DOC_ID | 62 | ORD_CLI_TOT.DOC_ID | 2 | 100.00 | Using where |
| 1 | SIMPLE | ART_PESO | eq_ref | PRIMARY | PRIMARY | 92 | ORD_CLI_RIGHE.COD_ART | 1 | 100.00 | |
| 1 | SIMPLE | ORD_CLI_RIGHE_SPEC | eq_ref | PRIMARY,ORD_CLI_RIGHE_SPEC_OFF_RIGA_ID | PRIMARY | 92 | ORD_CLI_RIGHE.DOC_RIGA_ID | 1 | 100.00 | Using where |
| 1 | SIMPLE | OFF_CLI_RIGHE | ref | DOC_RIGA_ID | DOC_RIGA_ID | 92 | ORD_CLI_RIGHE_SPEC.OFF_RIGA_ID | 1 | 100.00 | Using where |
| 1 | SIMPLE | OFF_CLI | ref | DOC_ID | DOC_ID | 63 | OFF_CLI_RIGHE.DOC_ID | 1 | 100.00 | |
| 1 | SIMPLE | FATT_CLI_RIGHE_SPEC | ref | FATT_CLI_RIGHE_SPEC_ORD_RIGA_ID | FATT_CLI_RIGHE_SPEC_ORD_RIGA_ID | 93 | ORD_CLI_RIGHE.DOC_RIGA_ID | 1 | 100.00 | Using index |
| 1 | SIMPLE | FATT_CLI_RIGHE | eq_ref | PRIMARY | PRIMARY | 92 | FATT_CLI_RIGHE_SPEC.DOC_RIGA_ID | 1 | 100.00 | |
| 1 | SIMPLE | FATT_CLI_TOT | eq_ref | PRIMARY | PRIMARY | 62 | FATT_CLI_RIGHE.DOC_ID | 1 | 100.00 | |
| 1 | SIMPLE | FATT_CLI_RIGHE_PROVV | ref | FATT_CLI_RIGHE_PROVV_DOC_RIGA_ID | FATT_CLI_RIGHE_PROVV_DOC_RIGA_ID | 92 | FATT_CLI_RIGHE_SPEC.DOC_RIGA_ID | 1 | 100.00 | |
| 1 | SIMPLE | FATT_CLI_RIGHE_LOTTI | ref | FATT_CLI_RIGHE_LOTTI_DOC_RIGA_ID | FATT_CLI_RIGHE_LOTTI_DOC_RIGA_ID | 92 | FATT_CLI_RIGHE_SPEC.DOC_RIGA_ID | 1 | 100.00 | |
| 1 | SIMPLE | DDT_FOR_RIGHE_LOTTI | ref | DDT_FOR_RIGHE_LOTTI_COD_LOT | DDT_FOR_RIGHE_LOTTI_COD_LOT | 92 | FATT_CLI_RIGHE_LOTTI.COD_LOT | 1 | 100.00 | |
| 1 | SIMPLE | DDT_FOR_RIGHE | eq_ref | PRIMARY | PRIMARY | 92 | DDT_FOR_RIGHE_LOTTI.DOC_RIGA_ID | 1 | 100.00 | |
| 1 | SIMPLE | FATT_FOR_RIGHE | eq_ref | PRIMARY | PRIMARY | 92 | FATT_CLI_RIGHE_LOTTI.COD_LOT | 1 | 100.00 | |
+----+-------------+----------------------+--------+-----------------------------------------------------------------------------+----------------------------------+---------+--------------------------------------------+------+----------+----------------------------------------------+
The following is the result of show status like 'Handler%' exactly after the query has been executed
Handler_commit, 2
Handler_delete, 0
Handler_discover, 0
Handler_prepare, 0
Handler_read_first, 0
Handler_read_key, 421001
Handler_read_last, 0
Handler_read_next, 240344
Handler_read_prev, 0
Handler_read_rnd, 30
Handler_read_rnd_next, 2412
Handler_rollback, 0
Handler_savepoint, 0
Handler_savepoint_rollback, 0
Handler_update, 31846
Handler_write, 2409
Database structure: https://gist.github.com/moiseevigor/4988fc8868f92643c9fb
EDIT 1
After creation of index
ALTER TABLE `TCross5_NP`.`ORD_CLI`
ADD INDEX `ORD_CLI_MULTI` (`COD_BUSN_UN` ASC, `DATA_DOC` ASC, `DOC_ID` ASC) ;
The execution time went down by a factor of two, but the query still hits the ORD_CLI_MULTI index
First (this has helped in many similar queries where you appear to be dealing with a lot of "lookup" secondary table references), change the start of the query to
SELECT STRAIGHT_JOIN
This directs the engine to run the query in the exact order you have listed. It prevents the engine from trying to use a lookup table as a primary consideration and working backwards or end-around to get the data. Sometimes it works well; other times (rarely, in my experience) it hinders performance.
Next, since you are looking for " AND OFF_CLI_RIGHE.DOC_RIGA_ID IS NOT NULL", I would change your LEFT JOINs to INNER JOINs when joining to ORD_CLI_RIGHE_SPEC and OFF_CLI_RIGHE:
INNER JOIN ORD_CLI_RIGHE_SPEC
ON ORD_CLI_RIGHE.DOC_RIGA_ID = ORD_CLI_RIGHE_SPEC.DOC_RIGA_ID
INNER JOIN OFF_CLI_RIGHE
ON ORD_CLI_RIGHE_SPEC.OFF_RIGA_ID = OFF_CLI_RIGHE.DOC_RIGA_ID
and thus eliminate the "AND ... is not null" in the WHERE clause.
Finally, I would have a multipart index that can be optimized for the query:
CREATE index MultipleParts on ORD_CLI ( COD_BUSN_UN, DATA_DOC, DOC_ID );
The multipart index will help the WHERE, GROUP BY AND ORDER BY of the query.
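Put together, a cut-down version of the query with these changes might look like the sketch below. Only a few of the original columns and joins are kept to show the shape, and it assumes DOC_ID is the primary key of ORD_CLI so that selecting DATA_DOC alongside GROUP BY DOC_ID is acceptable:

```sql
SELECT STRAIGHT_JOIN
       ORD_CLI.DOC_ID,
       ORD_CLI.DATA_DOC,
       SUM(ART_PESO.PESO_ART) AS weight
FROM ORD_CLI
INNER JOIN ORD_CLI_RIGHE
        ON ORD_CLI_RIGHE.DOC_ID = ORD_CLI.DOC_ID
-- INNER JOINs replace the LEFT JOINs plus the IS NOT NULL test.
INNER JOIN ORD_CLI_RIGHE_SPEC
        ON ORD_CLI_RIGHE.DOC_RIGA_ID = ORD_CLI_RIGHE_SPEC.DOC_RIGA_ID
INNER JOIN OFF_CLI_RIGHE
        ON ORD_CLI_RIGHE_SPEC.OFF_RIGA_ID = OFF_CLI_RIGHE.DOC_RIGA_ID
INNER JOIN ART_PESO
        ON ART_PESO.COD_ART = ORD_CLI_RIGHE.COD_ART
WHERE ORD_CLI.COD_BUSN_UN = 'P'
  AND ORD_CLI.DATA_DOC >= '2012-11-29'
  AND ORD_CLI.DATA_DOC <= '2013-02-28'
GROUP BY ORD_CLI.DOC_ID
ORDER BY ORD_CLI.DATA_DOC DESC
LIMIT 30 OFFSET 0;
```

Running EXPLAIN on this reduced form first is a quick way to see whether the multipart index is chosen before adding the remaining joins and sums back in.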
+--------------------+---------------+------+-----+---------+-------+
| ID | GKEY |GOODS | PRI | COUNTRY | Extra |
+--------------------+---------------+------+-----+---------+-------+
| 1 | BOOK-1 | 1 | 10 | | |
| 2 | PHONE-1 | 2 | 12 | | |
| 3 | BOOK-2 | 1 | 13 | | |
| 4 | BOOK-3 | 1 | 10 | | |
| 5 | PHONE-2 | 2 | 10 | | |
| 6 | PHONE-3 | 2 | 20 | | |
| 7 | BOOK-10 | 2 | 20 | | |
| 8 | BOOK-11 | 2 | 20 | | |
| 9 | BOOK-20 | 2 | 20 | | |
| 10 | BOOK-21 | 2 | 20 | | |
| 11 | PHONE-30 | 2 | 20 | | |
+--------------------+---------------+------+-----+---------+-------+
Above is my table. I want to get all records where GKEY > BOOK-2. Can anyone tell me the expression for this in MySQL?
Using " WHERE GKEY>'BOOK-2' " Cannot get the correct results.
How about (something like):
(this is MSSQL - I guess it will be similar in MySQL)
select
*
from
(
select
*,
index = convert(int,replace(GKEY,'BOOK-',''))
from table
where
GKEY like 'BOOK%'
) sub
where
sub.index > 2
By way of explanation: The inner query basically recreates your table, but only for BOOK rows, and with an extra column containing the index in the right data type to make a greater than comparison work numerically.
Alternatively something like this:
select
*
from table
where
(
case
when GKEY like 'BOOK%' then
case when convert(int,replace(GKEY,'BOOK-','')) > 2 then 1
else 0
end
else 0
end
) = 1
Essentially the problem is that you need to check for BOOK before you turn the index into a numeric, as the other values of GKEY would create an error (without doing some clunky string handling).
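Since the question is about MySQL, a rough MySQL equivalent of the same idea is sketched here. It assumes every BOOK key has the form 'BOOK-<number>' and uses the placeholder table name `table` from the snippets above:

```sql
SELECT *
FROM `table`
WHERE GKEY LIKE 'BOOK-%'
  -- Take the part after the last '-' and compare it as a number, not as text.
  AND CAST(SUBSTRING_INDEX(GKEY, '-', -1) AS UNSIGNED) > 2;
```

Comparing the numeric suffix is what makes BOOK-10 sort after BOOK-2; a plain string comparison on GKEY puts it before.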
SELECT * FROM `table` AS `t1` WHERE `t1`.`id` > (SELECT `id` FROM `table` AS `t2` WHERE `t2`.`GKEY`='BOOK-2' LIMIT 1)