I am trying to optimize a stored procedure. I have identified some issues with it, but I don't know enough to actually correct the problems. One of the subqueries looks like this:
select d.districtCode,
b.year Year,
if(bli.releaseAdjustment = 3 and bli.id not in (select billLineItem_id from AppliedDiscount),
bli.amount ,0) *-1 Refund
from Bill b
join BillLineItem bli on bli.bill_id = b.id
left join Bill_District d on d.bill_id = b.id
and bli.type = d.type
left join AppliedDiscount a on a.billLineItem_id = bli.id
where bli.releaseAdjustment in (1,2,3)
and bli.type in (4,6)
group by d.districtCode, b.year
The EXPLAIN output is:
+----+------------------------+---------------+--------------+--------------------------------+-----------------+---------+---------------------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+------------------------+---------------+--------------+--------------------------------+-----------------+---------+---------------------+---------+----------------------------------------------+
| 1 | PRIMARY | bli | ALL | FKF4236A,type,releaseAdjustment| NULL | NULL | NULL | 2787322 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | b | eq_ref | PRIMARY | PRIMARY | 8 | tax.bli.bill_id | 1 | |
| 1 | PRIMARY | d | ref | bill_id,type | bill_id | 8 | tax.bli.bill_id | 1 | |
| 1 | PRIMARY | a | ref | billLineItem_idx | billLineItem_idx| 8 | tax.bli.id | 1 | Using index |
| 1 | DEPENDENT SUBQUERY |AppliedDiscount|index_subquery| billLineItem_idx | billLineItem_idx| 8 | func | 1 | Using index |
+----+------------------------+---------------+--------------+--------------------------------+-----------------+---------+---------------------+---------+----------------------------------------------+
How would you suggest I fix this? This problem, or one very similar, is found throughout this stored procedure numerous times. AppliedDiscount only consists of 3 columns, all of which are indexed already.
Edit: Removing the group by changes the first row of the explain to
| 1 | PRIMARY | bli | ALL | FKF4236A,type,releaseAdjustment,bill_id| NULL | NULL | NULL | 2613847 | Using where |
That's better and technically answers my question, but that just means that I was asking the wrong question.
The 'type' is still ALL. What can I do to improve that?
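One possible direction, offered as a sketch rather than a verified fix: the query already LEFT JOINs AppliedDiscount as a, so the NOT IN subquery can probably be replaced by a NULL test on that join, which would remove the DEPENDENT SUBQUERY row from the plan. This assumes AppliedDiscount.billLineItem_id is never NULL. The composite index and its name idx_bli_type_release are hypothetical; whether the optimizer prefers it over a full scan depends on how selective type and releaseAdjustment are on the real data.

```sql
-- Hypothetical composite index on the two filter columns plus the join column.
ALTER TABLE BillLineItem
  ADD INDEX idx_bli_type_release (type, releaseAdjustment, bill_id);

-- Same query, with the NOT IN subquery replaced by a test on the existing LEFT JOIN.
-- Assumes AppliedDiscount.billLineItem_id contains no NULLs.
select d.districtCode,
       b.year Year,
       if(bli.releaseAdjustment = 3 and a.billLineItem_id is null,
          bli.amount, 0) * -1 Refund
from Bill b
join BillLineItem bli on bli.bill_id = b.id
left join Bill_District d on d.bill_id = b.id
                         and bli.type = d.type
left join AppliedDiscount a on a.billLineItem_id = bli.id
where bli.releaseAdjustment in (1,2,3)
  and bli.type in (4,6)
group by d.districtCode, b.year;
```

Running EXPLAIN again afterwards would show whether the bli row still reports type ALL.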
I have the following query that runs really slowly on MySQL (83 seconds) but really fast on MariaDB (0.4 seconds).
I verified that both databases have the same indexes and data. The MariaDB server has less CPU (1 vCPU) and less memory (2 GB).
The MySQL servers have 8 to 32 GB of RAM and full quad-core processors (I tried 5.6, 5.7, and 8.0 with similar results).
The phppos_inventory table has ~170,000 rows and the phppos_items table has ~3,000 rows.
Here are the query, the tables, and the EXPLAIN output:
SELECT /*+ SEMIJOIN(#subq MATERIALIZATION) */ SQL_CALC_FOUND_ROWS
1 AS _h,
`phppos_location_items`.`location_id` AS `location_id`,
`phppos_items`.`item_id`,
`phppos_items`.`name`,
`phppos_categories`.`id` AS `category_id`,
`phppos_categories`.`name` AS `category`,
`location`,
`company_name`,
`phppos_items`.`item_number`,
`size`,
`product_id`,
Coalesce(phppos_location_item_variations.cost_price,
phppos_item_variations.cost_price, phppos_location_items.cost_price,
phppos_items.cost_price, 0) AS cost_price,
Coalesce(phppos_location_item_variations.unit_price,
phppos_item_variations.unit_price, phppos_location_items.unit_price,
phppos_items.unit_price, 0) AS unit_price,
Sum(Coalesce(inv.trans_current_quantity, 0)) AS quantity,
Coalesce(phppos_location_item_variations.reorder_level,
phppos_item_variations.reorder_level, phppos_location_items.reorder_level,
phppos_items.reorder_level) AS reorder_level,
Coalesce(phppos_location_item_variations.replenish_level,
phppos_item_variations.replenish_level, phppos_location_items.replenish_level,
phppos_items.replenish_level) AS replenish_level,
description
FROM `phppos_inventory` `inv`
LEFT JOIN `phppos_items`
ON `phppos_items`.`item_id` = `inv`.`trans_items`
LEFT JOIN `phppos_location_items`
ON `phppos_location_items`.`item_id` = `phppos_items`.`item_id`
AND `phppos_location_items`.`location_id` = `inv`.`location_id`
LEFT JOIN `phppos_item_variations`
ON `phppos_items`.`item_id` = `phppos_item_variations`.`item_id`
AND `phppos_item_variations`.`id` = `inv`.`item_variation_id`
AND `phppos_item_variations`.`deleted` = 0
LEFT JOIN `phppos_location_item_variations`
ON `phppos_location_item_variations`.`item_variation_id` =
`phppos_item_variations`.`id`
AND `phppos_location_item_variations`.`location_id` =
`inv`.`location_id`
LEFT OUTER JOIN `phppos_suppliers`
ON `phppos_items`.`supplier_id` =
`phppos_suppliers`.`person_id`
LEFT OUTER JOIN `phppos_categories`
ON `phppos_items`.`category_id` = `phppos_categories`.`id`
WHERE inv.trans_id = (SELECT Max(inv1.trans_id)
FROM phppos_inventory inv1
WHERE inv1.trans_items = inv.trans_items
AND ( inv1.item_variation_id =
phppos_item_variations.id
OR phppos_item_variations.id IS NULL )
AND inv1.location_id = inv.location_id
AND inv1.trans_date < '2019-12-31 23:59:59')
AND inv.location_id IN( 1 )
AND `phppos_items`.`system_item` = 0
AND `phppos_items`.`deleted` = 0
AND `is_service` != 1
GROUP BY `phppos_items`.`item_id`
LIMIT 20
EXPLAIN on MySQL (slightly different from MariaDB's, but I tried USE INDEX to match the execution plan and it was still slow):
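+----+--------------------+---------------------------------+------------+--------+------------------------------+-------------------------+---------+------------------------------------+-------+----------+------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+---------------------------------+------------+--------+------------------------------+-------------------------+---------+------------------------------------+-------+----------+------------------------------------+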
| 1 | PRIMARY | phppos_items | NULL | ref | PRIMARY,item_number,product_id,phppos_items_ibfk_1,deleted,phppos_items_ibfk_3,phppos_items_ibfk_4,phppos_items_ibfk_5,description,size,reorder_level,cost_price,unit_price,promo_price,last_modified,name,phppos_items_ibfk_6,deleted_system_item,custom_field_1_value,custom_field_2_value,custom_field_3_value,custom_field_4_value,custom_field_5_value,custom_field_6_value,custom_field_7_value,custom_field_8_value,custom_field_9_value,custom_field_10_value,verify_age,phppos_items_ibfk_7,item_inactive_index,tags,full_search,name_search,item_number_search,product_id_search,description_search,size_search,custom_field_1_value_search,custom_field_2_value_search,custom_field_3_value_search,custom_field_4_value_search,custom_field_5_value_search,custom_field_6_value_search,custom_field_7_value_search,custom_field_8_value_search,custom_field_9_value_search,custom_field_10_value_search | deleted | 4 | const | 21188 | 9.00 | Using index condition; Using where |
| 1 | PRIMARY | inv | NULL | ref | phppos_inventory_ibfk_1,location_id,phppos_inventory_custom | phppos_inventory_custom | 8 | pos.phppos_items.item_id,const | 3 | 100.00 | NULL |
| 1 | PRIMARY | phppos_location_items | NULL | eq_ref | PRIMARY,phppos_location_items_ibfk_2 | PRIMARY | 8 | const,pos.phppos_items.item_id | 1 | 100.00 | NULL |
| 1 | PRIMARY | phppos_item_variations | NULL | eq_ref | PRIMARY,phppos_item_variations_ibfk_1 | PRIMARY | 4 | pos.inv.item_variation_id | 1 | 100.00 | Using where |
| 1 | PRIMARY | phppos_location_item_variations | NULL | eq_ref | PRIMARY,phppos_item_attribute_location_values_ibfk_2 | PRIMARY | 8 | pos.phppos_item_variations.id,const | 1 | 100.00 | NULL |
| 1 | PRIMARY | phppos_suppliers | NULL | ref | person_id | person_id | 4 | pos.phppos_items.supplier_id | 1 | 100.00 | NULL |
| 1 | PRIMARY | phppos_categories | NULL | eq_ref | PRIMARY | PRIMARY | 4 | pos.phppos_items.category_id | 1 | 100.00 | NULL |
| 2 | DEPENDENT SUBQUERY | inv1 | NULL | ref | phppos_inventory_ibfk_1,location_id,trans_date,phppos_inventory_ibfk_4,phppos_inventory_custom | phppos_inventory_custom | 8 | pos.inv.trans_items,pos.inv.location_id | 3 | 50.00 | Using where; Using index |
+----+--------------------+---------------------------------+------------+--------+------------------------------+-------------------------+---------+------------------------------------+-------+----------+------------------------------------+
Explain maria db:
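+------+--------------------+---------------------------------+--------+------------------------------------------------------------------------------------------------+-------------------------+---------+---------------------------------------------------------------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+---------------------------------+--------+------------------------------------------------------------------------------------------------+-------------------------+---------+---------------------------------------------------------------+-------+--------------------------+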
| 1 | PRIMARY | phppos_items | ref | PRIMARY,deleted,deleted_system_item | deleted | 4 | const | 23955 | Using where |
| 1 | PRIMARY | inv | ref | phppos_inventory_ibfk_1,location_id,phppos_inventory_custom | phppos_inventory_ibfk_1 | 4 | freelance_pos5.phppos_items.item_id | 2 | Using where |
| 1 | PRIMARY | phppos_location_items | eq_ref | PRIMARY,phppos_location_items_ibfk_2 | PRIMARY | 8 | const,freelance_pos5.phppos_items.item_id | 1 | |
| 1 | PRIMARY | phppos_item_variations | eq_ref | PRIMARY,phppos_item_variations_ibfk_1 | PRIMARY | 4 | freelance_pos5.inv.item_variation_id | 1 | Using where |
| 1 | PRIMARY | phppos_location_item_variations | eq_ref | PRIMARY,phppos_item_attribute_location_values_ibfk_2 | PRIMARY | 8 | freelance_pos5.phppos_item_variations.id,const | 1 | Using where |
| 1 | PRIMARY | phppos_suppliers | ref | person_id | person_id | 4 | freelance_pos5.phppos_items.supplier_id | 1 | Using where |
| 1 | PRIMARY | phppos_categories | eq_ref | PRIMARY | PRIMARY | 4 | freelance_pos5.phppos_items.category_id | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | inv1 | ref | phppos_inventory_ibfk_1,location_id,trans_date,phppos_inventory_ibfk_4,phppos_inventory_custom | phppos_inventory_custom | 8 | freelance_pos5.inv.trans_items,freelance_pos5.inv.location_id | 2 | Using where; Using index |
+------+--------------------+---------------------------------+--------+------------------------------------------------------------------------------------------------+-------------------------+---------+---------------------------------------------------------------+-------+--------------------------+
Tables described (Reached StackOverflow char limit)
https://pastebin.com/nhngSHb8
Create tables:
https://pastebin.com/aWMeriqt
MYSQL (DEV BOX)
mysql> SHOW GLOBAL STATUS LIKE '%thread%';
+------------------------------------------+-------+
| Variable_name | Value |
+------------------------------------------+-------+
| Delayed_insert_threads | 0 |
| Performance_schema_thread_classes_lost | 0 |
| Performance_schema_thread_instances_lost | 0 |
| Slow_launch_threads | 0 |
| Threads_cached | 4 |
| Threads_connected | 1 |
| Threads_created | 5 |
| Threads_running | 1 |
+------------------------------------------+-------+
8 rows in set (0.06 sec)
MARIA DB
MariaDB [freelance_pos5]> SHOW GLOBAL STATUS LIKE '%thread%';
+------------------------------------------+-------+
| Variable_name | Value |
+------------------------------------------+-------+
| Delayed_insert_threads | 0 |
| Performance_schema_thread_classes_lost | 0 |
| Performance_schema_thread_instances_lost | 0 |
| Slow_launch_threads | 0 |
| Threadpool_idle_threads | 0 |
| Threadpool_threads | 0 |
| Threads_cached | 3 |
| Threads_connected | 2 |
| Threads_created | 5 |
| Threads_running | 1 |
| wsrep_applier_thread_count | 0 |
| wsrep_rollbacker_thread_count | 0 |
| wsrep_thread_count | 0 |
+------------------------------------------+-------+
13 rows in set (0.00 sec)
Moving the correlated subquery
WHERE inv.trans_id = (SELECT Max(inv1.trans_id)
into a derived table joined with INNER JOIN is the game changer:
INNER JOIN (
SELECT inv1.trans_items, inv1.item_variation_id, inv1.location_id, MAX(inv1.trans_id) as trans_id
FROM phppos_inventory inv1
WHERE inv1.trans_date < '2019-12-31 23:59:59'
GROUP BY inv1.trans_items, inv1.item_variation_id, inv1.location_id
ORDER BY inv1.trans_items, inv1.item_variation_id, inv1.location_id
) inv1 on inv1.trans_id = inv.trans_id
AND inv1.trans_items = inv.trans_items
AND (inv1.item_variation_id = phppos_item_variations.id OR phppos_item_variations.id IS NULL)
AND inv1.location_id = inv.location_id
The execution time drops from 80+ s to under 0.4 s on MySQL 8.0.
MariaDB's and MySQL's optimizers started diverging significantly at 5.6. Certain queries will run faster in one than in the other.
I think I see a way to speed up the query, perhaps on both versions.
Don't use LEFT JOIN when it is the same as JOIN, which seems to be the case for at least phppos_items: its columns are tested in the WHERE clause, which discards the NULL-extended rows and so overrides LEFT.
Please provide SHOW CREATE TABLE; meanwhile, I will guess which indexes you have and don't have, and that each table has PRIMARY KEY(id).
Use composite indexes where appropriate. (More below.)
Get the 20 rows before JOINing to the rest of the tables:
SELECT ...
FROM ( SELECT inv.trans_id, pi.item_id -- distinct column names; a derived table cannot expose two columns called id
FROM `phppos_inventory` AS inv
JOIN `phppos_items` AS pi
ON pi.`item_id` = `inv`.`trans_items`
AND inv.location_id IN( 1 )
AND pi.`system_item` = 0
AND pi.`deleted` = 0
AND `is_service` != 1 -- Which table is this in???
GROUP BY pi.`item_id`
LIMIT 20 ) AS first20 -- a derived table must have an alias
LEFT JOIN .... (( all the other tables ))
-- no GROUP BY or LIMIT needed (I think)
phppos_items: INDEX(item_id, deleted, system_item, is_service)
phppos_items: INDEX(deleted, system_item, is_service)
phppos_inventory: INDEX(trans_items, location_id, item_variation_id, trans_date, trans_id)
phppos_inventory: INDEX(location_id)
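As a rough sketch, the index suggestions above could be created as follows. The index names are hypothetical, and the statements assume is_service is a column of phppos_items, which the answer itself is unsure about. The EXPLAIN output in the question already lists keys named location_id and deleted_system_item, so some of these may duplicate existing indexes; checking SHOW CREATE TABLE first would avoid that.

```sql
-- Index names are hypothetical; column lists follow the suggestions above.
-- Assumes is_service lives in phppos_items.
ALTER TABLE phppos_items
  ADD INDEX idx_items_id_filter (item_id, deleted, system_item, is_service),
  ADD INDEX idx_items_filter (deleted, system_item, is_service);

-- location_id appears only once here: MySQL rejects a column repeated within one index.
ALTER TABLE phppos_inventory
  ADD INDEX idx_inv_latest (trans_items, location_id, item_variation_id, trans_date, trans_id);

-- Skip this one if an index on location_id alone already exists.
ALTER TABLE phppos_inventory
  ADD INDEX idx_inv_location (location_id);
```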
Aside from the fact that the query is misleading, since the outer join is effectively discarded, the main difference is in the second operation of each plan. Both engines access inv through an index lookup (ref), but according to the EXPLAIN output MySQL uses the phppos_inventory_custom index while MariaDB uses phppos_inventory_ibfk_1.
However, without the definitions of these two indexes it's difficult to assess why the engines may have chosen different paths.
Please add to your question the definitions of these indexes, and also their selectivity (estimated rows selected as a percentage of total table rows), so that this can be elaborated further.
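The information requested here could be gathered with something like the following. It is only a sketch: the two key names are taken from the EXPLAIN output in the question, and the selectivity query is a simple approximation computed from the data rather than from optimizer statistics.

```sql
-- Column lists and cardinality estimates of the two indexes the engines chose between.
SHOW INDEX FROM phppos_inventory
WHERE Key_name IN ('phppos_inventory_ibfk_1', 'phppos_inventory_custom');

-- Rough selectivity figures for the columns used to reach inv.
SELECT SUM(location_id = 1) / COUNT(*)        AS location_fraction,
       COUNT(DISTINCT trans_items) / COUNT(*) AS trans_items_distinct_ratio
FROM phppos_inventory;
```

A location_fraction close to 1 would suggest that the location_id part of an index filters out very few rows for this query.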
This is my mysql query:
SELECT DISTINCT a.lineid
FROM (SELECT DISTINCT tmd.lineid, a.linename
FROM tagmodeldata tmd
INNER JOIN
tagline a
ON a.documentid = tmd.documentid AND tmd.tagvalue = 3
WHERE tmd.documentid = 926980) a
INNER JOIN
(SELECT DISTINCT tmd.lineid, b.linename
FROM tagmodeldata tmd
INNER JOIN
tagline b
ON b.documentid = tmd.documentid AND tmd.tagvalue IN (0 , 1)
WHERE tmd.documentid = 926980) b
ON b.linename = a.linename;
It is taking ~160 s to run, which is too slow for me. The basic idea is to retrieve those lineids where the linename with tagvalue 3 matches the linename with tagvalue 0 or 1.
+----+-------------+------------+------+---------------------------+----------------+---------+------+-------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------------------+----------------+---------+------+-------+--------------------------------+
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 14760 | Using temporary |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 72160 | Using where; Using join buffer |
| 3 | DERIVED | b | ref | documentid | documentid | 5 | | 593 | Using where; Using temporary |
| 3 | DERIVED | tmd | ref | documentid,document_index | document_index | 4 | | 66784 | Using where |
| 2 | DERIVED | a | ref | documentid | documentid | 5 | | 593 | Using where; Using temporary |
| 2 | DERIVED | tmd | ref | documentid,document_index | document_index | 4 | | 66784 | Using where |
+----+-------------+------------+------+---------------------------+----------------+---------+------+-------+--------------------------------+
You seem to want lines for a particular document that have both 3 and either 0 or 1. If so, you can just use conditional aggregation. The resulting query is something like this:
SELECT tmd.lineid
FROM tagmodeldata tmd INNER JOIN
tagline a
ON a.documentid = tmd.documentid AND tmd.tagvalue IN (0, 1, 3)
WHERE tmd.documentid = 926980
GROUP BY tmd.lineid
HAVING SUM(tmd.tagvalue = 3) > 0 AND
SUM(tmd.tagvalue IN (0, 1)) > 0;
It is unclear what the relationship is between tagline.linename and tagline.lineid. The above assumes that they are the same.
I have been having issues with MySQL (version 5.5) left join performance on a number of queries. In all cases I have been able to work around the issue by restructuring the queries with unions and subselects (I saw some examples of this in the book High Performance MySQL). The problem is that this leads to very messy queries.
Below is an example of two queries that produce the exact same results. The first query is roughly two orders of magnitude slower than the second. The second query is much less readable than the first.
As far as I can tell these sorts of queries are not performing poorly because of bad indexing. In all cases when I restructure the query it runs just fine. I have also tried carefully looking at the indexes and using hints to no avail.
Has anyone else run into similar issues with MySQL? Are there any server parameters I should try tweaking? Has anyone found a cleaner way to work around this sort of issue?
Query 1
select
i.id,
sum(vp.measurement * pol.quantity_ordered) measurement_on_order
from items i
left join (vendor_products vp, purchase_order_lines pol, purchase_orders po) on
vp.item_id = i.id and
pol.vendor_product_id = vp.id and
pol.purchase_order_id = po.id and
po.received_at is null and
po.closed_at is null
group by i.id
explain:
+----+-------------+-------+--------+-------------------------------+-------------------+---------+-------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------------------+-------------------+---------+-------------------------------------+------+-------------+
| 1 | SIMPLE | i | index | NULL | PRIMARY | 4 | NULL | 241 | Using index |
| 1 | SIMPLE | po | ref | PRIMARY,received_at,closed_at | received_at | 9 | const | 2 | |
| 1 | SIMPLE | pol | ref | purchase_order_id | purchase_order_id | 4 | nutkernel_dev.po.id | 7 | |
| 1 | SIMPLE | vp | eq_ref | PRIMARY,item_id | PRIMARY | 4 | nutkernel_dev.pol.vendor_product_id | 1 | |
+----+-------------+-------+--------+-------------------------------+-------------------+---------+-------------------------------------+------+-------------+
Query 2
select
i.id,
sum(on_order.measurement_on_order) measurement_on_order
from (
(
select
i.id item_id,
sum(vp.measurement * pol.quantity_ordered) measurement_on_order
from purchase_orders po
join purchase_order_lines pol on pol.purchase_order_id = po.id
join vendor_products vp on pol.vendor_product_id = vp.id
join items i on vp.item_id = i.id
where
po.received_at is null and po.closed_at is null
group by i.id
)
union all
(select id, 0 from items)
) on_order
join items i on on_order.item_id = i.id
group by i.id
explain:
+------+--------------+------------+--------+-------------------------------+--------------------------------+---------+-------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------+------------+--------+-------------------------------+--------------------------------+---------+-------------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 3793 | Using temporary; Using filesort |
| 1 | PRIMARY | i | eq_ref | PRIMARY | PRIMARY | 4 | on_order.item_id | 1 | Using index |
| 2 | DERIVED | po | ALL | PRIMARY,received_at,closed_at | NULL | NULL | NULL | 20 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | pol | ref | purchase_order_id | purchase_order_id | 4 | nutkernel_dev.po.id | 7 | |
| 2 | DERIVED | vp | eq_ref | PRIMARY,item_id | PRIMARY | 4 | nutkernel_dev.pol.vendor_product_id | 1 | |
| 2 | DERIVED | i | eq_ref | PRIMARY | PRIMARY | 4 | nutkernel_dev.vp.item_id | 1 | Using index |
| 3 | UNION | items | index | NULL | index_new_items_on_external_id | 257 | NULL | 3380 | Using index |
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | NULL | NULL | |
+------+--------------+------------+--------+-------------------------------+--------------------------------+---------+-------------------------------------+------+----------------------------------------------+
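One shape that may be easier to read than Query 2, given only as an untested sketch: aggregate the open purchase orders per item in a derived table, then LEFT JOIN items to it. The alias on_order is reused from Query 2; nothing else is new. Like Query 1, this returns NULL for items with no open orders, whereas Query 2 returns 0. Note that MySQL 5.5 materializes derived tables without indexes, so whether this is actually faster would need to be measured.

```sql
select
  i.id,
  on_order.measurement_on_order
from items i
left join (
  -- One row per item that has at least one open purchase order.
  select
    vp.item_id,
    sum(vp.measurement * pol.quantity_ordered) measurement_on_order
  from purchase_orders po
  join purchase_order_lines pol on pol.purchase_order_id = po.id
  join vendor_products vp on pol.vendor_product_id = vp.id
  where po.received_at is null and po.closed_at is null
  group by vp.item_id
) on_order on on_order.item_id = i.id;
```

Wrapping the second column in coalesce(on_order.measurement_on_order, 0) would reproduce the zeros that Query 2 returns.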
I have a query that is taking way too long to execute (4 seconds) even though all the fields I am querying against are indexed. Below are the query and the EXPLAIN results. Any ideas what the problem is? (MySQL CPU usage shoots up to 100% when executing the query.)
EXPLAIN SELECT count(hd.did) as NumPo, `hd`.`sid`, `src`.`Name`
FROM (`hd`)
JOIN `result` ON `result`.`did` = `hd`.`did`
JOIN `sf` ON `sf`.`fid` = `hd`.`fid`
JOIN `src` ON `src`.`sid` = `hd`.`sid`
WHERE `sf`.`tid` = 2
AND `result`.`set` = 'xxxxxxx'
GROUP BY `hd`.`sid`
ORDER BY `NumPo` DESC
LIMIT 10;
+----+-------------+--------------+--------+-------------------------+---------+---------+--------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+-------------------------+---------+---------+--------------------------+------+----------------------------------------------+
| 1 | SIMPLE | sf | ref | PRIMARY,type | type | 2 | const | 4 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | hd | ref | PRIMARY,sid,fid | FeedID | 4 | f2.sf.fid | 3 | |
| 1 | SIMPLE | result | ALL | resultset | NULL | NULL | NULL | 5322 | Using where; Using join buffer |
| 1 | SIMPLE | src | eq_ref | PRIMARY | PRIMARY | 4 | f2.hd.sid | 1 | |
+----+-------------+--------------+--------+-------------------------+---------+---------+--------------------------+------+----------------------------------------------+
| 1 | SIMPLE | result | ALL | resultset | NULL | NULL | NULL | 5322 | Using where; Using join buffer |
It looks like it's not using an index on the biggest table. I'm having trouble guessing what this query is supposed to do, but it looks like you have an index on result.set, so I'd try adding one to result.did and see if it helps.
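A minimal sketch of that suggestion follows. The index names are hypothetical, and the composite variant is an extra assumption on my part: it may serve both the filter on set and the join on did, but if set is a long text column it might need a prefix length.

```sql
-- Index on the join column, as suggested above. The name is hypothetical.
ALTER TABLE `result` ADD INDEX idx_result_did (`did`);

-- Possible alternative: one composite index for the filter and the join together.
-- `set` is a reserved word in MySQL, so it must be quoted with backticks.
ALTER TABLE `result` ADD INDEX idx_result_set_did (`set`, `did`);
```

Re-running the EXPLAIN afterwards should show whether the result row still reports type ALL with "Using join buffer".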