I have a friggin bizarre situation going on. One of my nightly queries, which usually takes 5 mins, took >12 hours. Here is the query:
SELECT Z.id,
Z.seoAlias,
GROUP_CONCAT(DISTINCT LOWER(A.include)) AS include,
GROUP_CONCAT(DISTINCT LOWER(A.exclude)) AS exclude
FROM df_productsbystore AS X
INNER JOIN df_product_variants AS Y ON Y.id = X.id_variant
INNER JOIN df_products AS Z ON Z.id = Y.id_product
INNER JOIN df_advertisers AS A ON A.id = X.id_store
WHERE X.isActive > 0
AND Z.id > 60301433
GROUP BY Z.id
ORDER BY Z.id
LIMIT 45000;
I ran an EXPLAIN and got the following:
+----+-------------+-------+--------+------------------------------------------------------------------------------------+-----------+---------+---------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+------------------------------------------------------------------------------------+-----------+---------+---------------------+------+---------------------------------+
| 1 | SIMPLE | A | ALL | PRIMARY | NULL | NULL | NULL | 365 | Using temporary; Using filesort |
| 1 | SIMPLE | X | ref | UNIQUE_variantAndStore,idx_isActive,idx_store | idx_store | 4 | foenix.A.id | 600 | Using where |
| 1 | SIMPLE | Y | eq_ref | PRIMARY,UNIQUE,idx_prod | PRIMARY | 4 | foenix.X.id_variant | 1 | Using where |
| 1 | SIMPLE | Z | eq_ref | PRIMARY,UNIQUE_prods_seoAlias,idx_brand,idx_gender2,fk_df_products_id_category_idx | PRIMARY | 4 | foenix.Y.id_product | 1 | NULL |
+----+-------------+-------+--------+------------------------------------------------------------------------------------+-----------+---------+---------------------+------+---------------------------------+
Which looked different to my development environment. The df_advertisers section looked fishy to me, so I deleted and recreated the index on the X.id_store column, and now the EXPLAIN looks like this and the query is fast again:
+----+-------------+-------+--------+------------------------------------------------------------------------------------+------------------------+---------+-------------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+------------------------------------------------------------------------------------+------------------------+---------+-------------------+---------+-------------+
| 1 | SIMPLE | Z | range | PRIMARY,UNIQUE_prods_seoAlias,idx_brand,idx_gender2,fk_df_products_id_category_idx | PRIMARY | 4 | NULL | 2090691 | Using where |
| 1 | SIMPLE | Y | ref | PRIMARY,UNIQUE,idx_prod | UNIQUE | 4 | foenix.Z.id | 1 | Using index |
| 1 | SIMPLE | X | ref | UNIQUE_variantAndStore,idx_isActive,idx_id_store | UNIQUE_variantAndStore | 4 | foenix.Y.id | 1 | Using where |
| 1 | SIMPLE | A | eq_ref | PRIMARY | PRIMARY | 4 | foenix.X.id_store | 1 | NULL |
+----+-------------+-------+--------+------------------------------------------------------------------------------------+------------------------+---------+-------------------+---------+-------------+
It would appear that the index magically disappeared. Can anyone explain how this is possible? Am I mean to run a mysqlcheck command or similar on a regular basis to avoid this kind of thing? I'm stumped!
Thanks
Next time, simply do ANALYZE TABLE df_productsbystore; It will be very fast, and may solve the problem.
ANALYZE recomputes the statistics on which the Optimizer depends for deciding, in this case, which table to start with. In rare situations, the stats get out of date and need a kick in the shin.
Caveat: I am assuming you are using InnoDB on a somewhat recent version. If you are using MyISAM, then ANALYZE is needed more often.
Do you really need 45K rows? What will you do with so many?
A way to speed the query up (probably) is to do everything you can with X and Z in a subquery, then JOIN A to do the rest:
SELECT XYZ.id, XYZ.seoAlias,
GROUP_CONCAT(DISTINCT LOWER(A.include)) AS include,
GROUP_CONCAT(DISTINCT LOWER(A.exclude)) AS exclude
FROM
(
SELECT Z.id, Z.seoAlias, X.id_store
FROM df_productsbystore AS X
INNER JOIN df_product_variants AS Y ON Y.id = X.id_variant
INNER JOIN df_products AS Z ON Z.id = Y.id_product
WHERE X.isActive > 0
AND Z.id > 60301433
GROUP BY Z.id -- may not be necessary ??
ORDER BY Z.id
LIMIT 45000
) AS XYZ
INNER JOIN df_advertisers AS A ON A.id = XYZ.id_store
GROUP BY ZYZ.id
ORDER BY XYZ.id;
Useful indexes:
Y: INDEX(id_product, id)
X: INDEX(id_variant, isActive, id_store)
In order to fix the problem I tried removing and recreating the index + FK. This did not solve the problem the first time, while the machine was under load, but it did work the second time, on a quiet machine.
It just feels like mysql is flaky. I really don't know what else to say.
Thanks for the help though
Related
I have a problem with a mysql query that sometimes needs more than 80 secs to execute.
Could you please help me to optimize it?
Here my sql code
SELECT
codFinAtl,
nome,
cognome,
dataNascita AS annoNascita,
MIN(tempoRisultato) AS tempoRisultato
FROM
graduatorie
INNER JOIN anagrafica ON codFin = codFinAtl
INNER JOIN manifestazionigrad ON manifestazionigrad.codice = graduatorie.codMan
WHERE
anagrafica.eliminato = 'n'
AND graduatorie.eliminato = 'n'
AND codGara IN('01', '81')
AND sesso = 'F'
AND manifestazionigrad.aa = '2018/19'
AND graduatorie.baseVasca = '25'
AND tempoRisultato IS NOT NULL
AND dataNascita BETWEEN '20050101' AND '20061231'
GROUP BY
codFinAtl
ORDER BY
tempoRisultato,
cognome,
nome
And my db schema
[UPDATE]
Here there is the results of EXPLAIN query
+----+-------------+--------------------+------------+--------+--------------------------+-----------+---------+-------------------------------+--------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------------+------------+--------+--------------------------+-----------+---------+-------------------------------+--------+----------+----------------------------------------------+
| 1 | SIMPLE | anagrafica | NULL | ALL | codFin | NULL | NULL | NULL | 334094 | 0.11 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | graduatorie | NULL | ref | codMan,codFinAtl,codGara | codFinAtl | 33 | finsicilia.anagrafica.codFin | 20 | 0.24 | Using where |
| 1 | SIMPLE | manifestazionigrad | NULL | eq_ref | codice | codice | 32 | finsicilia.graduatorie.codMan | 1 | 10.00 | Using index condition; Using where |
+----+-------------+--------------------+------------+--------+--------------------------+-----------+---------+-------------------------------+--------+----------+----------------------------------------------+
graduatorie: INDEX(eliminato, baseVasca)
manifestazionigrad: INDEX(aa, codMan)
These may not be sufficient. Please qualify each column in the query so we know which table they come from.
But the real problem is probably "explode-implode". First it explodes by JOINing, then it implodes by GROUP BY.
Is it possible to compute MIN(tempoRisultato) without any JOINs? (I can't even tell which table is involved.) If so, provide that, then we can discuss what to do next. There may be two choices: a derived table with the MIN, or a subquery in the JOIN that may be correlated or uncorrelated.
(tempoRisultato seems to be in two tables!)
I need to optimize this query:
SELECT
cp.id_currency_purchase,
MIN(cr.buy_rate),
MAX(cr.buy_rate)
FROM currency_purchase cp
JOIN currency_rate cr ON cr.id_currency = cp.id_currency AND cr.id_currency_source = cp.id_currency_source
WHERE cr.rate_date >= cp.purchase_date
GROUP BY cp.id_currency_purchase
Now it takes about 0,5 s, which is way too long for me (it's used either updated very often, so query cache doesn't help in my case).
Here's EXPLAIN EXTENDED output:
+------+-------------+-------+-------+--------------------------------+-------------+---------+-------------------------+-------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------+-------+--------------------------------+-------------+---------+-------------------------+-------+----------+-------------+
| 1 | SIMPLE | cp | index | id_currency_source,id_currency | PRIMARY | 8 | NULL | 9 | 100.00 | |
| 1 | SIMPLE | cr | ref | id_currency,id_currency_source | id_currency | 8 | lokatnik.cp.id_currency | 21433 | 100.00 | Using where |
+------+-------------+-------+-------+--------------------------------+-------------+---------+-------------------------+-------+----------+-------------+
There are some great questions at stackoverflow for one table GROUP BY optimization (like: https://stackoverflow.com/a/11631480/1320081), but I really don't know how to optimize query based on JOIN-ed tables.
How can I speed it up?
The best index for this query:
SELECT cp.id_currency_purchase, MIN(cr.buy_rate), MAX(cr.buy_rate)
FROM currency_purchase cp JOIN
currency_rate cr
ON cr.id_currency = cp.id_currency AND
cr.id_currency_source = cp.id_currency_source
WHERE cr.rate_date >= cp.purchase_date
GROUP BY cp.id_currency_purchase;
is: currency_rate(id_currency, id_currency_source, rate_date, buy_rate). You might also try currency_purchase(id_currency, id_currency_source, rate_date).
SELECT GV.ID
, XS.SymbolId
, GS.ID
, XS.SymbolExchangeId
, XS.IssueId
, GE.ID
, XD.ACTIVE
, XD.ExchangeId
FROM TB_GDS_SECURITY GS
, WSOD_Vanilla.WSOD_XrefIssueSymbols XS
, WSOD_Vanilla.WSOD_XrefIssueData XD
, TB_GDS_EXCHANGE GE
, TB_GDS_VENDOR GV
WHERE XD.CompositeIssueID = GS.ISSUE_ID_TEMP
AND XD.IssueId = XS.IssueId
AND XD.ACTIVE = 'True'
AND GV.VENDOR_SHORT_NAME = XS.SymbolsetId
AND GE.EXCHANGE_SHORT_NAME = XD.ExchangeId
AND NOT EXISTS (SELECT ID
FROM TB_GDS_VENDOR_VENUE_SYMBOL VVS
WHERE VVS.ISSUE_ID = XS.IssueId
AND (SELECT ID
FROM TB_GDS_VENDOR GV
WHERE GV.VENDOR_SHORT_NAME = XS.SymbolsetId
)
);
This Particular query takes more than a minute. I want it to be done in lesser time.
the explain statement is as follow:
mysql> EXPLAIN SELECT GV.ID, XS.SymbolId, GS.ID, XS.SymbolExchangeId, XS.IssueId, GE.ID, XD.ACTIVE, XD.ExchangeId FROM TB_GDS_SECURITY GS, WSOD_Vanilla.WSOD_XrefIssueSymbols XS, WSOD_Vanilla.WSOD_XrefIssueData XD, TB_GDS_EXCHANGE GE, TB_GDS_VENDOR GV WHERE XD.CompositeIssueID = GS.ISSUE_ID_TEMP AND XD.IssueId = XS.IssueId AND XD.ACTIVE = 'True' AND GV.VENDOR_SHORT_NAME=XS.SymbolsetId AND GE.EXCHANGE_SHORT_NAME = XD.ExchangeId AND NOT exists (SELECT ID FROM TB_GDS_VENDOR_VENUE_SYMBOL VVS
-> WHERE VVS.ISSUE_ID = XS.IssueId AND (SELECT ID FROM TB_GDS_VENDOR GV WHERE GV.VENDOR_SHORT_NAME = XS.SymbolsetId));
+----+--------------------+-------+------+--------------------------------------------------------+-----------------------------+---------+----------------------------------+---------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+--------------------------------------------------------+-----------------------------+---------+----------------------------------+---------+------------------------------------+
| 1 | PRIMARY | XD | ref | WSOD_XrefIssueData_UINDX1,WSOD_XrefIssueData_INDX1 | WSOD_XrefIssueData_INDX1 | 21 | const | 3981788 | Using index condition; Using where |
| 1 | PRIMARY | GE | ref | TB_GDS_EXCHANGE_INDX1 | TB_GDS_EXCHANGE_INDX1 | 63 | WSOD_Vanilla.XD.ExchangeId | 1 | Using where; Using index |
| 1 | PRIMARY | GS | ref | TB_GDS_SECURITY_INDX1 | TB_GDS_SECURITY_INDX1 | 5 | WSOD_Vanilla.XD.CompositeIssueID | 1 | Using where; Using index |
| 1 | PRIMARY | XS | ref | PRIMARY | PRIMARY | 8 | WSOD_Vanilla.XD.IssueId | 1 | Using where |
| 1 | PRIMARY | GV | ref | TB_GDS_VENDOR_INDX1 | TB_GDS_VENDOR_INDX1 | 62 | WSOD_Vanilla.XS.SymbolsetId | 1 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | VVS | ref | TB_GDS_VVENUE_SYMBOL_UINDX1,TB_GDS_VVENUE_SYMBOL_INDX2 | TB_GDS_VVENUE_SYMBOL_UINDX1 | 5 | WSOD_Vanilla.XS.IssueId | 3 | Using where; Using index |
| 3 | DEPENDENT SUBQUERY | GV | ref | TB_GDS_VENDOR_INDX1 | TB_GDS_VENDOR_INDX1 | 62 | WSOD_Vanilla.XS.SymbolsetId | 1 | Using where; Using index |
+----+--------------------+-------+------+--------------------------------------------------------+-----------------------------+---------+----------------------------------+---------+------------------------------------+
7 rows in set (0.01 sec)
After optimizing still it takes around 1.35 min to execute.
Please help me out.
As per comment elsewhere, you've not shown us the index defintions.
Most likely the bog cost here is the NOT EXISTS sub-clause. MySQL does not handle push-predictates well, and you've also got a cartesian product in there. I'm struggling to understand what this actually does. MySQL (indeed no DBMS I'm familiar with) can effectively use an index for NOT EXISTS or <> or NOT LIKE. The nested nested query does not appear to serve any function since you already have an explicit join between these tables in the outer query:
WHERE...AND GV.VENDOR_SHORT_NAME = XS.SymbolsetId...
AND NOT EXISTS ( [a condition] AND
SELECT ID FROM TB_GDS_VENDOR GV
WHERE GV.VENDOR_SHORT_NAME = XS.SymbolsetId
Your only filtering apart from the joins and this ugly NOT EXISTS is "AND XD.ACTIVE = 'True'" - if this column only holds 2/3 values then it will likely be very inefficient to use an index.
I'm looking for some advice on how I might better optimize this query.
For each _piece_detail record that:
Contains at least one matching _scan record on (zip, zip_4,
zip_delivery_point, serial_number)
Belongs to a company from mailing_groups (through a chain of relationships)
Has either:
first_scan_date_time that is greater than the MIN(scan_date_time) of the related _scan records
latest_scan_date_time that is less than the MAX(scan_date_time) of
the related _scan records
I will need to:
Set _piece_detail.first_scan_date_time to MIN(_scan.scan_date_time)
Set _piece_detail.latest_scan_date_time to MAX(_scan.scan_date_time)
Since I'm dealing with millions upon millions of records, I am trying to reduce the number of records that I actually have to search through. Here are some facts about the data:
The _piece_details table is partitioned by job_id, so it seems to
make the most sense to run through these checks in the order of
_piece_detail.job_id, _piece_detail.piece_id.
The scan records table contains over 100,000,000 records right now and is partitioned by (zip, zip_4, zip_delivery_point,
serial_number, scan_date_time), which is the same key that is used
to match a _scan with a _piece_detail (aside from scan_date_time).
Only about 40% of the _piece_detail records belong to a mailing_group, but we don't know which ones these are until we run
through the full relationship of joins.
Only about 30% of the _scan records belong to a _piece_detail with a mailing_group.
There are typically between 0 and 4 _scan records per _piece_detail.
Now, I am having a hell of a time finding a way to execute this in a decent way. I had originally started with something like this:
UPDATE _piece_detail
INNER JOIN (
SELECT _piece_detail.job_id, _piece_detail.piece_id, MIN(_scan.scan_date_time) as first_scan_date_time, MAX(_scan.scan_date_time) as latest_scan_date_time
FROM _piece_detail
INNER JOIN _container_quantity
ON _piece_detail.cqt_database_id = _container_quantity.cqt_database_id
AND _piece_detail.job_id = _container_quantity.job_id
INNER JOIN _container_summary
ON _container_quantity.container_id = _container_summary.container_id
AND _container_summary.job_id = _container_quantity.job_id
INNER JOIN _mail_piece_unit
ON _container_quantity.mpu_id = _mail_piece_unit.mpu_id
AND _container_quantity.job_id = _mail_piece_unit.job_id
INNER JOIN _header
ON _header.job_id = _piece_detail.job_id
INNER JOIN mailing_groups
ON _mail_piece_unit.mpu_company = mailing_groups.mpu_company
INNER JOIN _scan
ON _scan.zip = _piece_detail.zip
AND _scan.zip_4 = _piece_detail.zip_4
AND _scan.zip_delivery_point = _piece_detail.zip_delivery_point
AND _scan.serial_number = _piece_detail.serial_number
GROUP BY _piece_detail.job_id, _piece_detail.piece_id, _scan.zip, _scan.zip_4, _scan.zip_delivery_point, _scan.serial_number
) as t1 ON _piece_detail.job_id = t1.job_id AND _piece_detail.piece_id = t1.piece_id
SET _piece_detail.first_scan_date_time = t1.first_scan_date_time, _piece_detail.latest_scan_date_time = t1.latest_scan_date_time
WHERE _piece_detail.first_scan_date_time < t1.first_scan_date_time
OR _piece_detail.latest_scan_date_time > t1.latest_scan_date_time;
I thought that this may have been trying to load too much into memory at once and might not be using the indexes properly.
Then I thought that I might be able to avoid doing that huge joined subquery and add two leftjoin subqueries to get the min/max like so:
UPDATE _piece_detail
INNER JOIN _container_quantity
ON _piece_detail.cqt_database_id = _container_quantity.cqt_database_id
AND _piece_detail.job_id = _container_quantity.job_id
INNER JOIN _container_summary
ON _container_quantity.container_id = _container_summary.container_id
AND _container_summary.job_id = _container_quantity.job_id
INNER JOIN _mail_piece_unit
ON _container_quantity.mpu_id = _mail_piece_unit.mpu_id
AND _container_quantity.job_id = _mail_piece_unit.job_id
INNER JOIN _header
ON _header.job_id = _piece_detail.job_id
INNER JOIN mailing_groups
ON _mail_piece_unit.mpu_company = mailing_groups.mpu_company
LEFT JOIN _scan fs ON (fs.zip, fs.zip_4, fs.zip_delivery_point, fs.serial_number) = (
SELECT zip, zip_4, zip_delivery_point, serial_number
FROM _scan
WHERE zip = _piece_detail.zip
AND zip_4 = _piece_detail.zip_4
AND zip_delivery_point = _piece_detail.zip_delivery_point
AND serial_number = _piece_detail.serial_number
ORDER BY scan_date_time ASC
LIMIT 1
)
LEFT JOIN _scan ls ON (ls.zip, ls.zip_4, ls.zip_delivery_point, ls.serial_number) = (
SELECT zip, zip_4, zip_delivery_point, serial_number
FROM _scan
WHERE zip = _piece_detail.zip
AND zip_4 = _piece_detail.zip_4
AND zip_delivery_point = _piece_detail.zip_delivery_point
AND serial_number = _piece_detail.serial_number
ORDER BY scan_date_time DESC
LIMIT 1
)
SET _piece_detail.first_scan_date_time = fs.scan_date_time, _piece_detail.latest_scan_date_time = ls.scan_date_time
WHERE _piece_detail.first_scan_date_time < fs.scan_date_time
OR _piece_detail.latest_scan_date_time > ls.scan_date_time
These are the explains when I convert them to SELECT statements:
+----+-------------+---------------------+--------+----------------------------------------------------+---------------+---------+------------------------------------------------------------------------------------------------------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------+--------+----------------------------------------------------+---------------+---------+------------------------------------------------------------------------------------------------------------------------+--------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 844161 | NULL |
| 1 | PRIMARY | _piece_detail | eq_ref | PRIMARY,first_scan_date_time,latest_scan_date_time | PRIMARY | 18 | t1.job_id,t1.piece_id | 1 | Using where |
| 2 | DERIVED | _header | index | PRIMARY | date_prepared | 3 | NULL | 87 | Using index; Using temporary; Using filesort |
| 2 | DERIVED | _piece_detail | ref | PRIMARY,cqt_database_id,zip | PRIMARY | 10 | odms._header.job_id | 9703 | NULL |
| 2 | DERIVED | _container_quantity | eq_ref | unique,mpu_id,job_id,job_id_container_quantity | unique | 14 | odms._header.job_id,odms._piece_detail.cqt_database_id | 1 | NULL |
| 2 | DERIVED | _mail_piece_unit | eq_ref | PRIMARY,company,job_id_mail_piece_unit | PRIMARY | 14 | odms._container_quantity.mpu_id,odms._header.job_id | 1 | Using where |
| 2 | DERIVED | mailing_groups | eq_ref | PRIMARY | PRIMARY | 27 | odms._mail_piece_unit.mpu_company | 1 | Using index |
| 2 | DERIVED | _container_summary | eq_ref | unique,container_id,job_id_container_summary | unique | 14 | odms._header.job_id,odms._container_quantity.container_id | 1 | Using index |
| 2 | DERIVED | _scan | ref | PRIMARY | PRIMARY | 28 | odms._piece_detail.zip,odms._piece_detail.zip_4,odms._piece_detail.zip_delivery_point,odms._piece_detail.serial_number | 1 | Using index |
+----+-------------+---------------------+--------+----------------------------------------------------+---------------+---------+------------------------------------------------------------------------------------------------------------------------+--------+----------------------------------------------+
+----+--------------------+---------------------+--------+--------------------------------------------------------------------+---------------+---------+------------------------------------------------------------------------------------------------------------------------+-----------+-----------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------------------+--------+--------------------------------------------------------------------+---------------+---------+------------------------------------------------------------------------------------------------------------------------+-----------+-----------------------------------------------------------------+
| 1 | PRIMARY | _header | index | PRIMARY | date_prepared | 3 | NULL | 87 | Using index |
| 1 | PRIMARY | _piece_detail | ref | PRIMARY,cqt_database_id,first_scan_date_time,latest_scan_date_time | PRIMARY | 10 | odms._header.job_id | 9703 | NULL |
| 1 | PRIMARY | _container_quantity | eq_ref | unique,mpu_id,job_id,job_id_container_quantity | unique | 14 | odms._header.job_id,odms._piece_detail.cqt_database_id | 1 | NULL |
| 1 | PRIMARY | _mail_piece_unit | eq_ref | PRIMARY,company,job_id_mail_piece_unit | PRIMARY | 14 | odms._container_quantity.mpu_id,odms._header.job_id | 1 | Using where |
| 1 | PRIMARY | mailing_groups | eq_ref | PRIMARY | PRIMARY | 27 | odms._mail_piece_unit.mpu_company | 1 | Using index |
| 1 | PRIMARY | _container_summary | eq_ref | unique,container_id,job_id_container_summary | unique | 14 | odms._header.job_id,odms._container_quantity.container_id | 1 | Using index |
| 1 | PRIMARY | fs | index | NULL | updated | 1 | NULL | 102462928 | Using where; Using index; Using join buffer (Block Nested Loop) |
| 1 | PRIMARY | ls | index | NULL | updated | 1 | NULL | 102462928 | Using where; Using index; Using join buffer (Block Nested Loop) |
| 3 | DEPENDENT SUBQUERY | _scan | ref | PRIMARY | PRIMARY | 28 | odms._piece_detail.zip,odms._piece_detail.zip_4,odms._piece_detail.zip_delivery_point,odms._piece_detail.serial_number | 1 | Using where; Using index; Using filesort |
| 2 | DEPENDENT SUBQUERY | _scan | ref | PRIMARY | PRIMARY | 28 | odms._piece_detail.zip,odms._piece_detail.zip_4,odms._piece_detail.zip_delivery_point,odms._piece_detail.serial_number | 1 | Using where; Using index; Using filesort |
+----+--------------------+---------------------+--------+--------------------------------------------------------------------+---------------+---------+------------------------------------------------------------------------------------------------------------------------+-----------+-----------------------------------------------------------------+
Now, looking at the explains generated by each, I really can't tell which is giving me the best bang for my buck. The first one shows fewer total rows when multiplying the rows column, but the second appears to execute a bit quicker.
Is there anything that I could do to achieve the same results while increasing performance through modifying the query structure?
Disable update of index while doing the bulk updates
ALTER TABLE _piece_detail DISABLE KEYS;
UPDATE ....;
ALTER TABLE _piece_detail ENABLE KEYS;
Refer to the mysql docs : http://dev.mysql.com/doc/refman/5.0/en/alter-table.html
EDIT:
After looking at the mysql docs I pointed to, I see the docs specify this for MyISAM table, and is nit clear for other table types. Further solutions here : How to disable index in innodb
There is something I was taught and I strictly follow till today - Create as many temporary table you want while avoiding the usage of derived tables. Especially it in case of UPDATE/DELETE/INSERTs as
you cant predict the index on derived tables
The derived tables might not be held in memory if the resultset is big
The table(MyIsam)/rows(Innodb) may be locked for longer time as each time the derived query is running. I prefer a temp table which has primary key join with parent table.
And most importantly it makes you code look neat and readable.
My approach will be
CREATE table temp xxx(...)
INSERT INTO xxx select q from y inner join z....;
UPDATE _piece_detail INNER JOIN xxx on (...) SET ...;
Always reduce you downtime!!
Why aren't you using sub-queries for each join? Including the inner joins?
INNER JOIN (SELECT field1, field2, field 3 from _container_quantity order by 1,2,3)
ON _piece_detail.cqt_database_id = _container_quantity.cqt_database_id
AND _piece_detail.job_id = _container_quantity.job_id
INNER JOIN (SELECT field1, field2, field3 from _container_summary order by 1,2,3)
ON _container_quantity.container_id = _container_summary.container_id
AND _container_summary.job_id = _container_quantity.job_id
You're definitely pulling a lot into memory by not limiting your selects on those inner joins. By using the order by 1,2,3 at the end of each sub-query you create an index on each sub-query. Your only index is on headers and you aren't joining on _headers....
A couple suggestions to optimize this query. Either create the indexes you need on each table, or use the Sub-query join clauses to create manually the indexes you need on the fly.
Also remember that when you do a left join on a "temporary" table full of aggregates you are just asking for performance trouble.
Contains at least one matching _scan record on (zip, zip_4,
zip_delivery_point, serial_number)
Umm...this is your first point in what you want to do, but none of these fields are indexed?
From your explain results it seems that the subquery is going through all the rows twice then, how about you keep the MIN/MAX from the first one and use just one left join instead of two?
First off, I've looked at several other questions about optimizing sql queries, but I'm still unclear for my situation what is causing my problem. I read a few articles on the topic as well and have tried implementing a couple possible solutions, as I'll describe below, but nothing has yet worked or even made an appreciable dent in the problem.
The application is a nutrition tracking system - users enter the foods they eat and based on an imported USDA database the application breaks down the foods to the individual nutrients and gives the user a breakdown of the nutrient quantities on a (for now) daily basis.
here's
A PDF of the abbreviated database schema
and here it is as a (perhaps poor quality) JPG. I made this in open office - if there are suggestions for better ways to visualize a database, I'm open to suggestions on that front as well! The blue tables are directly from the USDA, and the green and black tables are ones I've made. I've omitted a lot of data in order to not clutter things up unnecessarily.
Here's the query I'm trying to run that takes a very long time:
SELECT listing.date_time,listing.nutrdesc,data.total_nutr_mass,listing.units
FROM
(SELECT nutrdesc, nutr_no, date_time, units
FROM meals, nutr_def
WHERE meals.users_userid = '2'
AND date_time BETWEEN '2009-8-12' AND '2009-9-12'
AND (nutr_no <100000
OR nutr_no IN
(SELECT nutr_def_nutr_no
FROM nutr_rights
WHERE nutr_rights.users_userid = '2'))
) as listing
LEFT JOIN
(SELECT nutrdesc, date_time, nut_data.nutr_no, sum(ingred_gram_mass*entry_qty_num*nutr_val/100) AS total_nutr_mass
FROM nut_data, recipe_ingredients, food_entries, meals, nutr_def
WHERE nut_data.nutr_no = nutr_def.nutr_no
AND ndb_no = ingred_ndb_no
AND foods_food_id = entry_ident
AND meals_meal_id = meal_id
AND users_userid = '2'
AND date_time BETWEEN '2009-8-12' AND '2009-9-12'
GROUP BY date_time,nut_data.nutr_no ) as data
ON data.date_time = listing.date_time
AND listing.nutr_no = data.nutr_no
ORDER BY listing.date_time,listing.nutrdesc,listing.units
So I know that's rather complex - The first select gets a listing of all the nutrients that the user consumed within the given date range, and the second fills in all the quantities.
When I implement them separately, the first query is really fast, but the second is slow and gets very slow when the date ranges get large. The join makes the whole thing ridiculously slow. I know that the 'main' problem is the join between these two derived tables, and I can get rid of that and do the join by hand basically in php much faster, but I'm not convinced that's the whole story.
For example: for 1 month of data, the query takes about 8 seconds, which is slow, but not completely terrible. Separately, each query takes ~.01 and ~2 seconds respectively. 2 seconds still seems high to me.
If I try to retrieve a year's worth of data, it takes several (>10) minutes to run the whole query, which is problematic - the client-server connection sometimes times out, and in any case we don't want I don't want to sit there with a spinning 'please wait' icon. Mainly, I feel like there's a problem because it takes more than 12x as long to retrieve 12x more information, when it should take less than 12x as long, if I were doing things right.
Here's the 'explain' for each of the slow queries: (the whole thing, and just the second half).
Whole thing:
+----+--------------------+--------------------+----------------+-------------------------------+------------------+---------+-----------------------------------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------------+----------------+-------------------------------+------------------+---------+-----------------------------------------------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 5053 | Using temporary; Using filesort |
| 1 | PRIMARY | <derived4> | ALL | NULL | NULL | NULL | NULL | 4341 | |
| 4 | DERIVED | meals | range | PRIMARY,day_ind | day_ind | 9 | NULL | 30 | Using where; Using temporary; Using filesort |
| 4 | DERIVED | food_entries | ref | meals_meal_id | meals_meal_id | 5 | nutrition.meals.meal_id | 15 | Using where |
| 4 | DERIVED | recipe_ingredients | ref | foods_food_id,ingred_ndb_no | foods_food_id | 4 | nutrition.food_entries.entry_ident | 2 | |
| 4 | DERIVED | nutr_def | ALL | PRIMARY | NULL | NULL | NULL | 174 | |
| 4 | DERIVED | nut_data | ref | PRIMARY | PRIMARY | 36 | nutrition.nutr_def.nutr_no,nutrition.recipe_ingredients.ingred_ndb_no | 1 | |
| 2 | DERIVED | meals | range | day_ind | day_ind | 9 | NULL | 30 | Using where |
| 2 | DERIVED | nutr_def | ALL | PRIMARY | NULL | NULL | NULL | 174 | Using where |
| 3 | DEPENDENT SUBQUERY | nutr_rights | index_subquery | users_userid,nutr_def_nutr_no | nutr_def_nutr_no | 19 | func | 1 | Using index; Using where |
+----+--------------------+--------------------+----------------+-------------------------------+------------------+---------+-----------------------------------------------------------------------+------+----------------------------------------------+
10 rows in set (2.82 sec)
Second chunk (data):
+----+-------------+--------------------+-------+-----------------------------+---------------+---------+-----------------------------------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------+-------+-----------------------------+---------------+---------+-----------------------------------------------------------------------+------+----------------------------------------------+
| 1 | SIMPLE | meals | range | PRIMARY,day_ind | day_ind | 9 | NULL | 30 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | food_entries | ref | meals_meal_id | meals_meal_id | 5 | nutrition.meals.meal_id | 15 | Using where |
| 1 | SIMPLE | recipe_ingredients | ref | foods_food_id,ingred_ndb_no | foods_food_id | 4 | nutrition.food_entries.entry_ident | 2 | |
| 1 | SIMPLE | nutr_def | ALL | PRIMARY | NULL | NULL | NULL | 174 | |
| 1 | SIMPLE | nut_data | ref | PRIMARY | PRIMARY | 36 | nutrition.nutr_def.nutr_no,nutrition.recipe_ingredients.ingred_ndb_no | 1 | |
+----+-------------+--------------------+-------+-----------------------------+---------------+---------+-----------------------------------------------------------------------+------+----------------------------------------------+
5 rows in set (0.00 sec)
I've 'analyzed' all the tables involved in the query, and added an index on the datetime field that is joining meals and food entries. I called it 'day_ind'. I hoped that would accelerate things, but it didn't seem to make a difference. I also tried removing the 'sum' function, as I understand that having a function in the query will frequently mean a full table scan, which is obviously much slower. Unfortunately removing the 'sum' didn't seem to make a difference either (well, about 3-5% or so, but not the order magnitude that I'm looking for).
I would love any suggestions and will be happy to provide any more information you need to help diagnose and improve this problem. Thanks in advance!
There are a few type All in your explain suggest full table scan. and hence create temp table. You could re-index if it is not there already.
Sort and Group By are usually the performance killer, you can adjust Mysql memory settings to avoid physical i/o to tmp table if you have extra memory available.
Lastly, try to make sure the data type of the join attributes matches. Ie data.date_time = listing.date_time has same data format.
Hope that helps.
Okay, so I eventually figured out what I'm gonna end up doing. I couldn't make the 'data' query any faster - that's still the bottleneck. But now I've made it so the total query process is pretty close to linear, not exponential.
I split the query into two parts and made each one into a temporary table. Then I added an index for each of those temp tables and did the join separately afterwards. This made the total execution time for 1 month of data drop from 8 to 2 seconds, and for 1 year of data from ~10 minutes to ~30 seconds. Good enough for now, I think. I can work with that.
Thanks for the suggestions. Here's what I ended up doing:
create table listing (
SELECT nutrdesc, nutr_no, date_time, units
FROM meals, nutr_def
WHERE meals.users_userid = '2'
AND date_time BETWEEN '2009-8-12' AND '2009-9-12'
AND (
nutr_no <100000 OR nutr_no IN (
SELECT nutr_def_nutr_no
FROM nutr_rights
WHERE nutr_rights.users_userid = '2'
)
)
);
create table data (
SELECT nutrdesc, date_time, nut_data.nutr_no, sum(ingred_gram_mass*entry_qty_num*nutr_val/100) AS total_nutr_mass
FROM nut_data, recipe_ingredients, food_entries, meals, nutr_def
WHERE nut_data.nutr_no = nutr_def.nutr_no
AND ndb_no = ingred_ndb_no
AND foods_food_id = entry_ident
AND meals_meal_id = meal_id
AND users_userid = '2'
AND date_time BETWEEN '2009-8-12' AND '2009-9-12'
GROUP BY date_time,nut_data.nutr_no
);
create index joiner on data(nutr_no, date_time);
create index joiner on listing(nutr_no, date_time);
SELECT listing.date_time,listing.nutrdesc,data.total_nutr_mass,listing.units
FROM listing
LEFT JOIN data
ON data.date_time = listing.date_time
AND listing.nutr_no = data.nutr_no
ORDER BY listing.date_time,listing.nutrdesc,listing.units;