TPCH Query Optimization - mysql

The following query has been running for 5 hours so far:
INSERT $LINEITEM_PUBLIC SELECT *
FROM LINEITEM
WHERE L_PARTKEY IN ( SELECT P_PARTKEY FROM $PART_PUBLIC )
AND L_SUPPKEY IN ( SELECT S_SUPPKEY FROM $SUPPLIER_PUBLIC )
AND L_ORDERKEY IN ( SELECT O_ORDERKEY FROM $ORDERS_PUBLIC );
I added all the required indexes, but nothing seems to help. EXPLAIN prints the following plan:
+----+-------------+------------------+------------+--------+--------------------------------+-------------+---------+--------------------------------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+--------------------------------+-------------+---------+--------------------------------+----------+----------+-------------+
| 1 | INSERT | $LINEITEM_PUBLIC | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 1 | SIMPLE | $ORDERS_PUBLIC | NULL | index | PRIMARY | O_ORDERDATE | 3 | NULL | 12826617 | 100.00 | Using index |
| 1 | SIMPLE | LINEITEM | NULL | ref | PRIMARY,LINEITEM_FK2,L_SUPPKEY | PRIMARY | 4 | TPCH.$ORDERS_PUBLIC.O_ORDERKEY | 3 | 100.00 | NULL |
| 1 | SIMPLE | $SUPPLIER_PUBLIC | NULL | eq_ref | PRIMARY | PRIMARY | 4 | TPCH.LINEITEM.L_SUPPKEY | 1 | 100.00 | Using index |
| 1 | SIMPLE | $PART_PUBLIC | NULL | eq_ref | PRIMARY | PRIMARY | 4 | TPCH.LINEITEM.L_PARTKEY | 1 | 100.00 | Using index |
+----+-------------+------------------+------------+--------+--------------------------------+-------------+---------+--------------------------------+----------+----------+-------------+
Any recommendations on how this query can be optimized?
Update:
The size of the tables in the previous query is as follows:
LINEITEM: 60M records
$ORDERS_PUBLIC: 13M records
$SUPPLIER_PUBLIC: 92K records
$PART_PUBLIC: 2M records

Make sure $ORDERS_PUBLIC has an index starting with O_ORDERKEY.
IN ( SELECT ... ) may be optimized poorly, depending on the MySQL version; try rewriting it with EXISTS:
INSERT $LINEITEM_PUBLIC
SELECT l.*
FROM LINEITEM AS l
WHERE EXISTS( SELECT * FROM $PART_PUBLIC WHERE P_PARTKEY = L_PARTKEY )
AND EXISTS( SELECT * FROM $SUPPLIER_PUBLIC WHERE S_SUPPKEY = L_SUPPKEY )
AND EXISTS( SELECT * FROM $ORDERS_PUBLIC WHERE O_ORDERKEY = L_ORDERKEY );
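To check that the EXISTS rewrite keeps the same rows as the IN form, here is a minimal sketch. It uses Python's sqlite3 module only so that it runs anywhere; the table and column names are simplified, hypothetical stand-ins for the TPC-H tables, and it says nothing about how MySQL will plan either form.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for the TPC-H tables;
# the real tables have many more columns and use $-prefixed names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lineitem (l_orderkey INT, l_partkey INT, l_suppkey INT);
    CREATE TABLE part_public (p_partkey INT PRIMARY KEY);
    CREATE TABLE supplier_public (s_suppkey INT PRIMARY KEY);
    CREATE TABLE orders_public (o_orderkey INT PRIMARY KEY);

    INSERT INTO lineitem VALUES (1, 10, 100), (1, 11, 100), (2, 10, 101), (3, 12, 100);
    INSERT INTO part_public VALUES (10), (11);
    INSERT INTO supplier_public VALUES (100);
    INSERT INTO orders_public VALUES (1), (2);
""")

# Original shape: three IN ( SELECT ... ) filters.
in_form = """
    SELECT * FROM lineitem
    WHERE l_partkey IN (SELECT p_partkey FROM part_public)
      AND l_suppkey IN (SELECT s_suppkey FROM supplier_public)
      AND l_orderkey IN (SELECT o_orderkey FROM orders_public)
    ORDER BY l_orderkey, l_partkey
"""

# Rewritten shape: three correlated EXISTS filters.
exists_form = """
    SELECT l.* FROM lineitem AS l
    WHERE EXISTS (SELECT 1 FROM part_public WHERE p_partkey = l.l_partkey)
      AND EXISTS (SELECT 1 FROM supplier_public WHERE s_suppkey = l.l_suppkey)
      AND EXISTS (SELECT 1 FROM orders_public WHERE o_orderkey = l.l_orderkey)
    ORDER BY l_orderkey, l_partkey
"""

rows_in = conn.execute(in_form).fetchall()
rows_exists = conn.execute(exists_form).fetchall()
print(rows_in)
print(rows_in == rows_exists)
```

Both forms keep only the line items whose part, supplier and order all appear in the public tables, so switching between them should only change the plan, not the result. Compare EXPLAIN for both on the real server before committing to one.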

Related

MySQL: Out of sort memory, consider increasing server sort buffer size

I can't find an explanation for this MySQL behavior. Examples are below.
The fields name and name_full are of type text and the field price_steps is of type json. None of them is indexed.
SELECT
name,
name_full,
price_steps
FROM
`lots`
WHERE
EXISTS (
SELECT
*
FROM
`categories`
INNER JOIN `category_lot` ON `categories`.`id` = `category_lot`.`category_id`
WHERE
`lots`.`id` = `category_lot`.`lot_id`
AND `category_id` IN (25)
)
ORDER BY
`created_at` DESC
LIMIT 31 OFFSET 0
MySQL throws the error: [Err] 1038 - Out of sort memory, consider increasing server sort buffer size.
OK, so be it.
Then I added an extra field to the select list:
SUBSTRING(`name_full`, 1, 200000000000) as name_full2
and the query runs successfully. (Why? An extra field should require extra memory, shouldn't it?)
Then I decided to make the query heavier and replaced the line
AND `category_id` IN (25)
with
AND `category_id` IN (1,2,3,4,5,6,7,8,9,10, 25)
and the query also finishes successfully.
Only about 250 rows have category = 25, but about 40000 rows fall into categories (1,2,3,4,5,6,7,8,9,10, 25). That should need more memory, yet MySQL doesn't throw the error. Why?
Is there any explanation for this paradox? Thanks in advance!
UPDATE1
EXPLAIN for the failing query:
mysql> EXPLAIN SELECT name, name_full, price_steps FROM `lots` WHERE EXISTS ( SELECT * FROM `categories` INNER JOIN `category_lot` ON `categories`.`id` = `category_lot`.`category_id` WHERE `lots`.`id` = `category_lot`.`lot_id` AND `category_id` IN(25) ) ORDER BY `created_at` DESC;
+----+-------------+--------------+------------+--------+--------------------------------------------------------------+----------------------------------+---------+---------------------------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+--------+--------------------------------------------------------------+----------------------------------+---------+---------------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | categories | NULL | const | PRIMARY | PRIMARY | 8 | const | 1 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | category_lot | NULL | ref | category_lot_lot_id_foreign,category_lot_category_id_foreign | category_lot_category_id_foreign | 8 | const | 1099 | 100.00 | Start temporary |
| 1 | SIMPLE | lots | NULL | eq_ref | PRIMARY | PRIMARY | 8 | torgs.category_lot.lot_id | 1 | 100.00 | End temporary |
+----+-------------+--------------+------------+--------+--------------------------------------------------------------+----------------------------------+---------+---------------------------+------+----------+----------------------------------------------+
3 rows in set, 2 warnings (0.00 sec)
EXPLAIN for the successful query (4th field added):
mysql> EXPLAIN SELECT name, name_full, price_steps,SUBSTRING(`name_full`, 1, 200000000000) as name_full2 FROM `lots` WHERE EXISTS ( SELECT * FROM `categories` INNER JOIN `category_lot` ON `categories`.`id` = `category_lot`.`category_id` WHERE `lots`.`id` = `category_lot`.`lot_id` AND `category_id` IN(25) ) ORDER BY `created_at` DESC;
+----+-------------+--------------+------------+--------+--------------------------------------------------------------+----------------------------------+---------+---------------------------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+--------+--------------------------------------------------------------+----------------------------------+---------+---------------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | categories | NULL | const | PRIMARY | PRIMARY | 8 | const | 1 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | category_lot | NULL | ref | category_lot_lot_id_foreign,category_lot_category_id_foreign | category_lot_category_id_foreign | 8 | const | 1099 | 100.00 | Start temporary |
| 1 | SIMPLE | lots | NULL | eq_ref | PRIMARY | PRIMARY | 8 | torgs.category_lot.lot_id | 1 | 100.00 | End temporary |
+----+-------------+--------------+------------+--------+--------------------------------------------------------------+----------------------------------+---------+---------------------------+------+----------+----------------------------------------------+
3 rows in set, 2 warnings (0.00 sec)
EXPLAIN for the successful query (4th field added and category_id in (1,2,3,4,5,6,7,8,9,10,25)):
mysql> EXPLAIN SELECT name, name_full, price_steps, SUBSTRING(`name_full`, 1, 200000000000) as name_full2 FROM `lots` WHERE EXISTS ( SELECT * FROM `categories` INNER JOIN `category_lot` ON `categories`.`id` = `category_lot`.`category_id` WHERE `lots`.`id` = `category_lot`.`lot_id` AND `category_id` IN(1,2,3,4,5,6,7,8,9,10,25) ) ORDER BY `created_at` DESC;
+----+--------------+--------------+------------+--------+--------------------------------------------------------------+----------------------------------+---------+---------------------+------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+--------------+------------+--------+--------------------------------------------------------------+----------------------------------+---------+---------------------+------+----------+---------------------------------+
| 1 | SIMPLE | <subquery2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | lots | NULL | eq_ref | PRIMARY | PRIMARY | 8 | <subquery2>.lot_id | 1 | 100.00 | NULL |
| 2 | MATERIALIZED | categories | NULL | range | PRIMARY | PRIMARY | 8 | NULL | 11 | 100.00 | Using where; Using index |
| 2 | MATERIALIZED | category_lot | NULL | ref | category_lot_lot_id_foreign,category_lot_category_id_foreign | category_lot_category_id_foreign | 8 | torgs.categories.id | 1883 | 100.00 | NULL |
+----+--------------+--------------+------------+--------+--------------------------------------------------------------+----------------------------------+---------+---------------------+------+----------+---------------------------------+
4 rows in set, 2 warnings (0.00 sec)
UPDATE2
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1276 | Field or reference 'torgs.lots.id' of SELECT #2 was resolved in SELECT #1 |
| Note | 1003 | /* select#1 */ select `torgs`.`lots`.`name` AS `name`,`torgs`.`lots`.`name_full` AS `name_full`,`torgs`.`lots`.`price_steps` AS `price_steps`,substr(`torgs`.`lots`.`name_full`,1,200000000000) AS `name_full2` from `torgs`.`lots` semi join (`torgs`.`categories` join `torgs`.`category_lot`) where ((`torgs`.`category_lot`.`category_id` = `torgs`.`categories`.`id`) and (`torgs`.`lots`.`id` = `<subquery2>`.`lot_id`) and (`torgs`.`categories`.`id` in (1,2,3,4,5,6,7,8,9,10,25))) order by `torgs`.`lots`.`created_at` desc |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

Optimizing join on derived table - EXPLAIN different on local and server

I have the following ugly query, which runs okay but not great, on my local machine (1.4 secs, running v5.7). On the server I'm using, which is running an older version of MySQL (v5.5), the query just hangs. It seems to get caught on "Copying to tmp table":
SELECT
SQL_CALC_FOUND_ROWS
DISTINCT p.parcel_number,
p.street_number,
p.street_name,
p.site_address_city_state,
p.number_of_units,
p.number_of_stories,
p.bedrooms,
p.bathrooms,
p.lot_area_sqft,
p.cost_per_sq_ft,
p.year_built,
p.sales_date,
p.sales_price,
p.id
FROM (
SELECT APN, property_case_detail_id FROM property_inspection AS pi
GROUP BY APN, property_case_detail_id
HAVING
COUNT(IF(status='Resolved Date', 1, NULL)) = 0
) as open_cases
JOIN property AS p
ON p.parcel_number = open_cases.APN
LIMIT 0, 1000;
mysql> show processlist;
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
| 21120 | headsupcity | localhost | lead_housing | Query | 21 | Copying to tmp table | SELECT
SQL_CALC_FOUND_ROWS
DISTINCT p.parcel_number,
p.street_numbe |
| 21121 | headsupcity | localhost | lead_housing | Query | 0 | NULL | show processlist |
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
The EXPLAIN output differs between my local machine and the server. I'm assuming the only reason the query runs at all locally is the key that is automatically created on the derived table:
Explain (local):
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
| 1 | PRIMARY | p | NULL | ALL | NULL | NULL | NULL | NULL | 40319 | 100.00 | Using temporary |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 8 | lead_housing.p.parcel_number | 40 | 100.00 | NULL |
| 2 | DERIVED | pi | NULL | ALL | NULL | NULL | NULL | NULL | 1623978 | 100.00 | Using temporary; Using filesort |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
Explain (server):
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
| 1 | PRIMARY | p | ALL | NULL | NULL | NULL | NULL | 41369 | Using temporary |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 122948 | Using where; Distinct; Using join buffer |
| 2 | DERIVED | pi | ALL | NULL | NULL | NULL | NULL | 1718586 | Using temporary; Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
Schemas:
mysql> explain property_inspection;
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| lblCaseNo | int(11) | NO | MUL | NULL | |
| APN | bigint(10) | NO | MUL | NULL | |
| date | varchar(50) | NO | | NULL | |
| status | varchar(500) | NO | | NULL | |
| property_case_detail_id | int(11) | YES | MUL | NULL | |
| case_type_id | int(11) | YES | MUL | NULL | |
| date_modified | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| update_status | tinyint(1) | YES | | 1 | |
| created_date | datetime | NO | | NULL | |
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
10 rows in set (0.02 sec)
mysql> explain property; (not all columns, but you get the gist)
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| parcel_number | bigint(10) | NO | | 0 | |
| date_modified | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| created_date | datetime | NO | | NULL | |
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
Variables that might be relevant:
tmp_table_size: 16777216
innodb_buffer_pool_size: 8589934592
Any ideas on how to optimize this, and any idea why the explains are so different?
Since this is where the two optimizers differ most, let's try to optimize the derived table:
( SELECT APN, property_case_detail_id FROM property_inspection AS pi
GROUP BY APN, property_case_detail_id
HAVING
COUNT(IF(status='Resolved Date', 1, NULL)) = 0
) as open_cases
Give this a try:
SELECT ...
FROM property AS p
WHERE NOT EXISTS ( SELECT 1 FROM property_inspection
WHERE status = 'Resolved Date'
AND p.parcel_number = APN )
ORDER BY ??? -- without this, the `LIMIT` is unpredictable
LIMIT 0, 1000;
or...
SELECT ...
FROM property AS p
LEFT JOIN property_inspection AS pi ON p.parcel_number = pi.APN
AND pi.status = 'Resolved Date' -- must be in ON, not WHERE, or no row can match
WHERE pi.APN IS NULL
ORDER BY ??? -- without this, the `LIMIT` is unpredictable
LIMIT 0, 1000;
Index:
property_inspection: INDEX(status, APN) -- in either order
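One detail worth checking in the LEFT JOIN form is where the status filter sits. It has to be part of the ON clause: in the WHERE clause, pi.status = 'Resolved Date' and pi.APN IS NULL can never both hold for the same row. The sketch below shows this on a tiny, made-up data set, using Python's sqlite3 module only to keep it self-contained; the join semantics shown are standard SQL, not MySQL-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cut-down versions of the two tables; the rows are invented for illustration.
    CREATE TABLE property (id INTEGER PRIMARY KEY, parcel_number INT);
    CREATE TABLE property_inspection (id INTEGER PRIMARY KEY, APN INT, status TEXT);
    INSERT INTO property (id, parcel_number) VALUES (1, 111), (2, 222), (3, 333);
    INSERT INTO property_inspection (APN, status) VALUES
        (111, 'Opened'), (111, 'Resolved Date'),
        (222, 'Opened');
""")

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Anti-join written with NOT EXISTS.
not_exists = ids("""
    SELECT p.id FROM property AS p
    WHERE NOT EXISTS (SELECT 1 FROM property_inspection AS pi
                      WHERE pi.status = 'Resolved Date'
                        AND pi.APN = p.parcel_number)
    ORDER BY p.id
""")

# Anti-join written with LEFT JOIN ... IS NULL, status filter in the ON clause.
filter_in_on = ids("""
    SELECT p.id FROM property AS p
    LEFT JOIN property_inspection AS pi
           ON pi.APN = p.parcel_number AND pi.status = 'Resolved Date'
    WHERE pi.APN IS NULL
    ORDER BY p.id
""")

# Same join with the status filter in WHERE: the two conditions contradict each other.
filter_in_where = ids("""
    SELECT p.id FROM property AS p
    LEFT JOIN property_inspection AS pi ON pi.APN = p.parcel_number
    WHERE pi.status = 'Resolved Date' AND pi.APN IS NULL
    ORDER BY p.id
""")

print(not_exists, filter_in_on, filter_in_where)
```

Note that property 3, which has no inspections at all, is returned by both anti-join forms. The original query joins against the derived table with an inner join, so it would not return such a property; check which behavior you actually want.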
MySQL 5.5 and 5.7 are quite different, and the latter has a better optimizer, so it is no surprise that the EXPLAIN plans differ.
Please also post the output of SHOW CREATE TABLE property; and SHOW CREATE TABLE property_inspection;, since it shows the indexes on your tables.
Your subquery is the issue.
- The server has to process 1.6M rows with no usable index and group all of them.
- HAVING is quite an expensive operation, so avoid it where you can, especially in subqueries.
- Grouping is a bad idea here. You do not need aggregation or counting; you only need to check whether a row with the 'Resolved Date' status exists.
Based on the information provided, I'd recommend:
- Alter table property_inspection to reduce the length of the status column.
- Add an index on that column. Use a covering index (APN, property_case_detail_id, status) if possible (in this column order).
- Change the query to something like this:
SELECT
SQL_CALC_FOUND_ROWS
DISTINCT p.parcel_number,
...
p.id
FROM
property_inspection AS `pi1`
INNER JOIN property AS p ON (
p.parcel_number = `pi1`.APN
)
LEFT JOIN (
SELECT
`pi2`.property_case_detail_id
, `pi2`. APN
FROM
property_inspection AS `pi2`
WHERE
`status` = 'Resolved Date'
) AS exclude ON (
exclude.APN = `pi1`.APN
AND exclude.property_case_detail_id = `pi1`.property_case_detail_id
)
WHERE
exclude.APN IS NULL
LIMIT
0, 1000;

How to create an index on a CONCAT("string" ,column) in mysql?

I have a table where id is the primary key.
CREATE TABLE t1 (
id INT NOT NULL AUTO_INCREMENT,
col1 VARCHAR(45) NULL,
PRIMARY KEY (id));
I have another table, t2, which is joined to t1 like this:
t2 LEFT JOIN t1 ON CONCAT("USER_", t1.id) = t2.user_id
I want to create an index on the CONCAT("USER_", t1.id) values, in any order.
I tried
ALTER TABLE t1 ADD INDEX ((CONCAT('user_',id) DESC);
but it gives an error.
I followed the official MySQL documentation.
Note: I do not want to create a new CONCAT("user_", id) column.
https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-column-prefixes
InnoDB supports secondary indexes on virtual generated columns.
https://dev.mysql.com/doc/refman/5.7/en/create-table-secondary-indexes.html
From 5.7 onward you can use a generated column and then index that column.
Here is an example that extracts the integer from the string to create an efficient join:
CREATE TABLE myusers (
id mediumint(8) unsigned NOT NULL auto_increment
, name varchar(255) default NULL,
PRIMARY KEY (`id`)
) AUTO_INCREMENT=1
;
INSERT INTO myusers (`name`) VALUES ('Imelda'),('Hamish'),('Brandon'),('Amity'),('Jillian'),('Lionel'),('Faith'),('Dai'),('Reed'),('Molly');
CREATE TABLE mytable (
id mediumint(8) unsigned NOT NULL auto_increment
, user_id VARCHAR(20)
, ex_user_id integer GENERATED ALWAYS AS (0+substring(user_id,6,20))
, password varchar(255)
, PRIMARY KEY (`id`)
, INDEX idx_ex_user_id (ex_user_id)
) AUTO_INCREMENT=1
;
INSERT INTO mytable (`user_id`,`password`) VALUES
('user_1','PYX68BIC9RD')
,('user_2','LPY07EIN0UA')
,('user_3','UGC24TKI3JL')
,('user_4','YQU18ALB8YA')
,('user_5','DEL56AGR6AD')
,('user_6','YQN87UOB0PO')
,('user_7','CPC15JFU6MC')
,('user_8','MWC40ZWD2EE')
,('user_9','HEB34QQH0UM')
,('user_10','GVP36PLP5PW')
;
select
*
from myusers
inner join mytable on myusers.id = mytable.ex_user_id
;
id | name | id | user_id | ex_user_id | password
-: | :------ | -: | :------ | ---------: | :----------
1 | Imelda | 1 | user_1 | 1 | PYX68BIC9RD
2 | Hamish | 2 | user_2 | 2 | LPY07EIN0UA
3 | Brandon | 3 | user_3 | 3 | UGC24TKI3JL
4 | Amity | 4 | user_4 | 4 | YQU18ALB8YA
5 | Jillian | 5 | user_5 | 5 | DEL56AGR6AD
6 | Lionel | 6 | user_6 | 6 | YQN87UOB0PO
7 | Faith | 7 | user_7 | 7 | CPC15JFU6MC
8 | Dai | 8 | user_8 | 8 | MWC40ZWD2EE
9 | Reed | 9 | user_9 | 9 | HEB34QQH0UM
10 | Molly | 10 | user_10 | 10 | GVP36PLP5PW
explain select
*
from myusers
inner join mytable on myusers.id = mytable.ex_user_id
;
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
-: | :---------- | :------ | :--------- | :--- | :------------- | :------------- | :------ | :------------------------------------- | ---: | -------: | :----------
1 | SIMPLE | myusers | null | ALL | PRIMARY | null | null | null | 10 | 100.00 | null
1 | SIMPLE | mytable | null | ref | idx_ex_user_id | idx_ex_user_id | 5 | fiddle_HNTHMETRTFAHHKBIGWZM.myusers.id | 1 | 100.00 | Using where
Note that the conversion of user_id from string to integer is "implicit". As the MySQL manual puts it:
To cast a string to a number, you normally need do nothing other than use the string value in numeric context:
https://dev.mysql.com/doc/refman/5.7/en/create-table-secondary-indexes.html
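To see what the extraction expression does to the sample values, here is a small sketch. It is an assumption-laden stand-in, not the MySQL setup above: it uses Python's sqlite3 module, and because SQLite handles this differently, it swaps the indexed generated column for an index on the expression itself (CAST(substr(user_id, 6) AS INTEGER) plays the role of 0+substring(user_id,6,20)). Only a few of the sample rows are used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myusers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, user_id TEXT, password TEXT);

    -- SQLite analogue of the indexed generated column: an index on the
    -- expression that strips the 5-character 'user_' prefix and casts to integer.
    CREATE INDEX idx_ex_user_id ON mytable (CAST(substr(user_id, 6) AS INTEGER));

    INSERT INTO myusers (name) VALUES ('Imelda'), ('Hamish'), ('Brandon');
    INSERT INTO mytable (user_id, password) VALUES
        ('user_1', 'PYX68BIC9RD'), ('user_3', 'UGC24TKI3JL'), ('user_10', 'GVP36PLP5PW');
""")

# The numeric part starts at character 6 (1-based), right after 'user_'.
extracted = conn.execute(
    "SELECT user_id, CAST(substr(user_id, 6) AS INTEGER) FROM mytable ORDER BY id"
).fetchall()

# Join on the extracted integer instead of comparing concatenated strings.
joined = conn.execute("""
    SELECT u.id, u.name, t.user_id
    FROM myusers AS u
    JOIN mytable AS t ON u.id = CAST(substr(t.user_id, 6) AS INTEGER)
    ORDER BY u.id
""").fetchall()

print(extracted)
print(joined)
```

The point of either variant is the same: compare integers to integers, so the join can use an index rather than evaluating a string expression for every row.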

Trouble with subquery but only in view

I have some trouble with the following query:
DROP VIEW IF EXISTS own_inst_stakes_detail_all;
CREATE VIEW `own_inst_stakes_detail_all` AS
SELECT
`own_inst_stakes_detail`.`FactSet_Entity_ID` AS `factset_entity_id`,
`own_inst_stakes_detail`.`FSYM_ID` AS `fsym_id`,
`own_inst_stakes_detail`.`as_of_date` AS `report_date`,
`own_inst_stakes_detail`.`Position` AS `adj_holding`,
`own_inst_stakes_detail`.`Position` / (SELECT own_sec_prices.adj_shares_outstanding FROM own_sec_prices WHERE own_sec_prices.`FSYM_ID` = own_inst_stakes_detail.`FSYM_ID` ORDER BY ABS(DATEDIFF(own_sec_prices.price_date, `own_inst_stakes_detail`.`as_of_date`)) DESC LIMIT 1) AS `adj_ratio`,
`own_inst_stakes_detail`.`Position` AS `reported_holding`,
`sym_coverage`.`Proper_Name` AS `security_proper_name`,
`sym_entity`.`Entity_Proper_Name` AS `entity_proper_name`
FROM
`own_inst_stakes_detail`
LEFT JOIN `sym_entity` ON `sym_entity`.`FactSet_Entity_ID` = `own_inst_stakes_detail`.`FactSet_Entity_ID`
LEFT JOIN `sym_coverage` ON `sym_coverage`.`FSYM_ID` = `own_inst_stakes_detail`.`FSYM_ID`;
It uses the following tables, which are of particular interest:
CREATE TABLE `own_inst_stakes_detail` (
`FactSet_Entity_ID` CHAR(8) NOT NULL,
`FSYM_ID` CHAR(8) NOT NULL,
`as_of_date` DATE NOT NULL,
`Position` DOUBLE DEFAULT NULL,
PRIMARY KEY (`FactSet_Entity_ID`,`FSYM_ID`,`as_of_date`),
KEY `idx_own_inst_stakes_detail_FactSet_Entity_ID` (`FactSet_Entity_ID`),
KEY `idx_own_inst_stakes_detail_FSYM_ID` (`FSYM_ID`),
KEY `idx_own_inst_stakes_detail_as_of_date` (`as_of_date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `own_sec_prices` (
`FSYM_ID` CHAR(8) NOT NULL,
`price_date` DATE NOT NULL,
`unadj_shares_outstanding` DOUBLE NOT NULL,
`adj_shares_outstanding` DOUBLE NOT NULL,
PRIMARY KEY (`FSYM_ID`,`price_date`),
KEY `idx_own_sec_prices_FSYM_ID` (`FSYM_ID`),
KEY `idx_own_sec_prices_price_date` (`price_date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
The SELECT works really well when I run it on its own, but it is a catastrophe once it is wrapped in a view: the query never completes, probably because of a full table scan. The tables involved are really big.
I tried this with MySQL Server 5.7 (with MariaDB 10.1, by the way, there are no problems). Is there a way to improve it?
The EXPLAINs look as follows (MariaDB first, then MySQL):
MariaDB [factset]> explain SELECT * FROM own_inst_stakes_detail_all WHERE FSYM_ID = 'WK13LJ-S' ORDER BY report_date DESC, adj_ratio DESC;
+------+--------------------+------------------------+--------+------------------------------------+------------------------------------+---------+--------------------------------------------------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+------------------------+--------+------------------------------------+------------------------------------+---------+--------------------------------------------------+------+----------------------------------------------------+
| 1 | PRIMARY | own_inst_stakes_detail | ref | idx_own_inst_stakes_detail_FSYM_ID | idx_own_inst_stakes_detail_FSYM_ID | 8 | const | 8 | Using index condition; Using where; Using filesort |
| 1 | PRIMARY | sym_entity | eq_ref | PRIMARY | PRIMARY | 8 | factset.own_inst_stakes_detail.FactSet_Entity_ID | 1 | |
| 1 | PRIMARY | sym_coverage | const | PRIMARY | PRIMARY | 8 | const | 1 | Using where |
| 3 | DEPENDENT SUBQUERY | own_sec_prices | ref | PRIMARY,idx_own_sec_prices_FSYM_ID | idx_own_sec_prices_FSYM_ID | 8 | factset.own_inst_stakes_detail.FSYM_ID | 10 | Using temporary; Using filesort |
+------+--------------------+------------------------+--------+------------------------------------+------------------------------------+---------+--------------------------------------------------+------+----------------------------------------------------+
4 rows in set (0.01 sec)
mysql> explain SELECT * FROM own_inst_stakes_detail_all WHERE FSYM_ID = 'WK13LJ-S' ORDER BY report_date DESC, adj_ratio DESC;
+----+--------------------+------------------------+------------+--------+------------------------------------+----------------------------+---------+--------------------------------------------------+--------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+------------------------+------------+--------+------------------------------------+----------------------------+---------+--------------------------------------------------+--------+----------+---------------------------------+
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 8 | const | 10 | 100.00 | Using where; Using filesort |
| 2 | DERIVED | own_inst_stakes_detail | NULL | ALL | NULL | NULL | NULL | NULL | 817961 | 100.00 | NULL |
| 2 | DERIVED | sym_entity | NULL | eq_ref | PRIMARY | PRIMARY | 8 | factset.own_inst_stakes_detail.FactSet_Entity_ID | 1 | 100.00 | NULL |
| 2 | DERIVED | sym_coverage | NULL | eq_ref | PRIMARY | PRIMARY | 8 | factset.own_inst_stakes_detail.FSYM_ID | 1 | 100.00 | NULL |
| 3 | DEPENDENT SUBQUERY | own_sec_prices | NULL | ref | PRIMARY,idx_own_sec_prices_FSYM_ID | idx_own_sec_prices_FSYM_ID | 8 | factset.own_inst_stakes_detail.FSYM_ID | 10 | 100.00 | Using temporary; Using filesort |
+----+--------------------+------------------------+------------+--------+------------------------------------+----------------------------+---------+--------------------------------------------------+--------+----------+---------------------------------+
5 rows in set, 3 warnings (0.08 sec)
To me it looks like MySQL cannot use the index idx_own_inst_stakes_detail_FSYM_ID. But if I change the column with the subquery to "1 as adj_ratio", the following happens:
mysql> explain SELECT * FROM own_inst_stakes_detail_all WHERE FSYM_ID = 'WK13LJ-S';
+----+-------------+------------------------+------------+--------+------------------------------------+------------------------------------+---------+--------------------------------------------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------------+------------+--------+------------------------------------+------------------------------------+---------+--------------------------------------------------+------+----------+-------+
| 1 | SIMPLE | own_inst_stakes_detail | NULL | ref | idx_own_inst_stakes_detail_FSYM_ID | idx_own_inst_stakes_detail_FSYM_ID | 8 | const | 8 | 100.00 | NULL |
| 1 | SIMPLE | sym_entity | NULL | eq_ref | PRIMARY | PRIMARY | 8 | factset.own_inst_stakes_detail.FactSet_Entity_ID | 1 | 100.00 | NULL |
| 1 | SIMPLE | sym_coverage | NULL | const | PRIMARY | PRIMARY | 8 | const | 1 | 100.00 | NULL |
+----+-------------+------------------------+------------+--------+------------------------------------+------------------------------------+---------+--------------------------------------------------+------+----------+-------+
3 rows in set, 1 warning (0.00 sec)
That works well! Any help is greatly appreciated. Thanks!

MySQL left join performance issues

I have been having issues with MySQL (version 5.5) left join performance on a number of queries. In all cases I have been able to work around the issue by restructuring the queries with unions and subselects (I saw some examples of this in the book High Performance MySQL). The problem is that this leads to very messy queries.
Below is an example of two queries that produce the exact same results. The first query is roughly two orders of magnitude slower than the second. The second query is much less readable than the first.
As far as I can tell these sorts of queries are not performing poorly because of bad indexing. In all cases when I restructure the query it runs just fine. I have also tried carefully looking at the indexes and using hints to no avail.
Has anyone else run into similar issues with MySQL? Are there any server parameters I should try tweaking? Has anyone found a cleaner way to work around this sort of issue?
Query 1
select
i.id,
sum(vp.measurement * pol.quantity_ordered) measurement_on_order
from items i
left join (vendor_products vp, purchase_order_lines pol, purchase_orders po) on
vp.item_id = i.id and
pol.vendor_product_id = vp.id and
pol.purchase_order_id = po.id and
po.received_at is null and
po.closed_at is null
group by i.id
explain:
+----+-------------+-------+--------+-------------------------------+-------------------+---------+-------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------------------+-------------------+---------+-------------------------------------+------+-------------+
| 1 | SIMPLE | i | index | NULL | PRIMARY | 4 | NULL | 241 | Using index |
| 1 | SIMPLE | po | ref | PRIMARY,received_at,closed_at | received_at | 9 | const | 2 | |
| 1 | SIMPLE | pol | ref | purchase_order_id | purchase_order_id | 4 | nutkernel_dev.po.id | 7 | |
| 1 | SIMPLE | vp | eq_ref | PRIMARY,item_id | PRIMARY | 4 | nutkernel_dev.pol.vendor_product_id | 1 | |
+----+-------------+-------+--------+-------------------------------+-------------------+---------+-------------------------------------+------+-------------+
Query 2
select
i.id,
sum(on_order.measurement_on_order) measurement_on_order
from (
(
select
i.id item_id,
sum(vp.measurement * pol.quantity_ordered) measurement_on_order
from purchase_orders po
join purchase_order_lines pol on pol.purchase_order_id = po.id
join vendor_products vp on pol.vendor_product_id = vp.id
join items i on vp.item_id = i.id
where
po.received_at is null and po.closed_at is null
group by i.id
)
union all
(select id, 0 from items)
) on_order
join items i on on_order.item_id = i.id
group by i.id
explain:
+------+--------------+------------+--------+-------------------------------+--------------------------------+---------+-------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------+------------+--------+-------------------------------+--------------------------------+---------+-------------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 3793 | Using temporary; Using filesort |
| 1 | PRIMARY | i | eq_ref | PRIMARY | PRIMARY | 4 | on_order.item_id | 1 | Using index |
| 2 | DERIVED | po | ALL | PRIMARY,received_at,closed_at | NULL | NULL | NULL | 20 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | pol | ref | purchase_order_id | purchase_order_id | 4 | nutkernel_dev.po.id | 7 | |
| 2 | DERIVED | vp | eq_ref | PRIMARY,item_id | PRIMARY | 4 | nutkernel_dev.pol.vendor_product_id | 1 | |
| 2 | DERIVED | i | eq_ref | PRIMARY | PRIMARY | 4 | nutkernel_dev.vp.item_id | 1 | Using index |
| 3 | UNION | items | index | NULL | index_new_items_on_external_id | 257 | NULL | 3380 | Using index |
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | NULL | NULL | |
+------+--------------+------------+--------+-------------------------------+--------------------------------+---------+-------------------------------------+------+----------------------------------------------+