Optimize mysql query (use or create index) - mysql

I have a SQL query (MySQL 5.1.51) that makes PHP time out.
I would like to optimize it but I can't find what is missing.
The query is:
SELECT s_i.incident,
s.hostname,
a.application,
s_ie.problem_status,
s_i.open_time,
s_i.close_time,
s_ie.open_group,
s_ie.primary_assignment,
s_ie.closed_by_group,
s_ie.contact_first_name,
s_ie.contact_last_name,
s_ie.description,
s_ie.resolution,
s_ie.famille_1,
s_ie.famille_2,
s_ie.famille_3,
YEARWEEK(s_i.open_time) AS 'semaine_ouverture',
DATE_FORMAT(s_i.open_time, '%Y-%m') AS 'mois_ouverture',
YEARWEEK(s_i.close_time) AS 'semaine_cloture',
DATE_FORMAT(s_i.close_time, '%Y-%m') AS 'mois_cloture',
p.nom,
s.exploite_par,
t.environnement,
a.tdb
FROM t_link_serveur_eac t USE KEY(nna)
INNER JOIN serveur s ON s.id = t.id_serveur
INNER JOIN plateau p ON p.id = t.id_plateau
INNER JOIN applications a ON a.nna = t.nna
INNER JOIN scope_i s_i USE KEY (id_serveur) ON s_i.id_serveur = t.id_serveur
INNER JOIN scope_i_extended s_ie USE KEY (id_scope_i) ON s_ie.id_scope_i = s_i.id
WHERE s_ie.problem_status = 'Closed'
AND s_ie.contact_first_name = 'AUTOMATE'
AND s_ie.contact_last_name LIKE '%BEM%'
AND p.id = 4
AND open_time >= CURDATE() - INTERVAL 52 WEEK AND open_time <= CURDATE()
AND s_i.close_time < CURDATE() - INTERVAL DAYOFMONTH(CURDATE()) - 1 DAY
ORDER BY mois_cloture
When I ask MySQL to explain it, I get a row with type 'ALL' for the join with the table s_ie.
I tried creating/modifying every possible index, but none of my attempts made any difference:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE p const PRIMARY PRIMARY 4 const 1 Using temporary; Using filesort
1 SIMPLE a ALL PRIMARY NULL NULL NULL 957
1 SIMPLE t ref nna nna 26 inspire.a.nna 10 Using where
1 SIMPLE s eq_ref PRIMARY PRIMARY 4 inspire.t.id_serveur 1
1 SIMPLE s_i ref id_serveur id_serveur 4 inspire.t.id_serveur 135 Using where
1 SIMPLE s_ie eq_ref id_scope_i id_scope_i 4 inspire.s_i.id 1 Using where
s_ie has 712,000 rows and s_i 740,000, so I think the problem comes from this join.
Here is the structure of the table s_ie
CREATE TABLE IF NOT EXISTS `scope_i_extended` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`id_scope_i` int(11) NOT NULL,
`problem_status` varchar(16) NOT NULL,
`open_group` varchar(32) NOT NULL,
`primary_assignment` varchar(32) NOT NULL,
`closed_by_group` varchar(32) NOT NULL,
`contact_first_name` varchar(32) NOT NULL,
`contact_last_name` varchar(32) NOT NULL,
`description` text NOT NULL,
`resolution` text NOT NULL,
`famille_1` text NOT NULL,
`famille_2` text NOT NULL,
`famille_3` text NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_scope_i` (`id_scope_i`),
UNIQUE KEY `problem_status` (`id_scope_i`, `problem_status`, `contact_first_name`, `contact_last_name`),
KEY `contact_last_name` (`contact_last_name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
And the structure of s_i
CREATE TABLE IF NOT EXISTS `scope_i` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`incident` varchar(20) NOT NULL,
`statut` varchar(20) NOT NULL,
`id_serveur` int(11) NOT NULL,
`open_time` datetime NOT NULL,
`close_time` datetime DEFAULT NULL,
`affectation` varchar(50) NOT NULL,
`titre` varchar(200) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `incident` (`incident`),
KEY `serveur` (`id_serveur`),
KEY `serveur_open_time` (`id_serveur`,`open_time`),
KEY `id_serveur` (`id_serveur`,`close_time`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=738862 ;
Can you help/save me?
Regards,
Olivier

Sorry for stating the obvious but I'd suggest using:
"YEARWEEK(open_time) <= '201246' AND YEARWEEK(open_time) >= '201146'"
instead of
"YEARWEEK(open_time) IN (...)"
A long IN list can slow things down dramatically.

In your conditions, you have date columns wrapped inside a MySQL function,
e.g. YEARWEEK(open_time) and DATE_FORMAT(s_i.close_time, '%Y-%m-%d').
You should avoid this: a function-wrapped column is not sargable, so MySQL cannot use an index on it and has to evaluate the function for every row of the table.
Can you try to replace
DATE_FORMAT(s_i.close_time, '%Y-%m-%d') < DATE_FORMAT(NOW(), '%Y-%m-01')
by
s_i.close_time < CURDATE() - INTERVAL DAYOFMONTH(CURDATE()) - 1 DAY
and
YEARWEEK(open_time) IN ('201246', '201245'....)
by this (the condition below returns all records whose open_time falls within the last year; I am not sure if that matches your case):
open_time >= CURDATE() - INTERVAL 1 YEAR AND open_time <= CURDATE()
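Putting the two rewrites together, the difference looks like this (just a sketch of the principle; with the bare columns, the existing serveur_open_time (id_serveur, open_time) index on scope_i can serve both the join and the open_time range):
-- Not sargable: the function is evaluated for every row, so no index range scan
WHERE YEARWEEK(s_i.open_time) IN ('201246', '201245', ...)
  AND DATE_FORMAT(s_i.close_time, '%Y-%m-%d') < DATE_FORMAT(NOW(), '%Y-%m-01')
-- Sargable: plain comparisons on the bare columns can use an index
WHERE s_i.open_time >= CURDATE() - INTERVAL 1 YEAR
  AND s_i.open_time <= CURDATE()
  AND s_i.close_time < CURDATE() - INTERVAL DAYOFMONTH(CURDATE()) - 1 DAY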

Related

Query speed of insert/update SMA (simple moving average)

I would like to include a column in my table with the simple moving average of stock data. I have been able to create several queries which successfully do so, however the query speed is slow. My goal is to improve the query speed.
I have the following table:
CREATE TABLE `timeseries_test` (
`timeseries_id` int(11) NOT NULL AUTO_INCREMENT,
`stock_id` int(10) NOT NULL,
`date` date NOT NULL,
`open` decimal(16,8) NOT NULL,
`high` decimal(16,8) NOT NULL,
`low` decimal(16,8) NOT NULL,
`close` decimal(16,8) NOT NULL,
`adjusted_close` double(16,8) NOT NULL,
`volume` int(16) NOT NULL,
`dividend` double(16,8) NOT NULL,
`split_coefficient` double(16,15) NOT NULL,
`100sma` decimal(16,8) NOT NULL,
PRIMARY KEY (`timeseries_id`),
KEY `stock` (`stock_id`),
KEY `date` (`date`),
KEY `date_stock` (`stock_id`,`date`)
) ENGINE=InnoDB AUTO_INCREMENT=5444325 DEFAULT CHARSET=latin1
I have tried many different query formats, but they all take about 25 seconds per 5000 rows. The select query only takes less than a second. Below an example query:
UPDATE stock.timeseries_test t1 INNER JOIN (
SELECT a.timeseries_id,
Round( ( SELECT SUM(b.close) / COUNT(b.close)
FROM timeseries_test AS b
WHERE DATEDIFF(a.date, b.date) BETWEEN 0 AND 99 AND a.stock_id = b.stock_id
), 2 ) AS '100sma'
FROM timeseries_test AS a) t2
ON t1.`timeseries_id` = t2.`timeseries_id`
SET t1.100sma = t2.100SMA
WHERE t2.100sma = null
Below the explain query:
1 PRIMARY <derived2> NULL ALL NULL NULL NULL NULL 10385 10.00 Using where
1 UPDATE t1 NULL eq_ref PRIMARY PRIMARY 4 t2.timeseries_id 1 100.00 NULL
2 DERIVED a NULL index NULL date_stock 7 NULL 10385 100.00 Using index
3 DEPENDENT SUBQUERY b NULL ref stock,date_stock stock 4 stock.a.stock_id 5192 100.00 Using where
Any help is appreciated.
If you are running MySQL 8.0, I recommend window functions with a range specification; this avoids the need for a correlated subquery.
update stock.timeseries_test t1
inner join (
select timeseries_id,
avg(close) over(
partition by stock_id
order by date
range between interval 99 day preceding and current row
) `100sma`
from timeseries_test
) t2 on t1.timeseries_id = t2.timeseries_id
set t1.`100sma` = t2.`100sma`
It is quite unclear what the purpose of the original, outer where clause is, so I removed it:
WHERE t2.`100sma` = null
If you do want to check for nullness, then you need is null; but doing so would pretty much defeat the whole logic of the update statement. Maybe you meant:
WHERE t1.`100sma` is null
Functions are not sargable. Instead of
DATEDIFF(a.date, b.date) BETWEEN 0 AND 99
use
a.date BETWEEN b.date AND b.date + INTERVAL 99 DAY
(or maybe a and b should be swapped)
I suspect (from the column names) that the pair (stock_id,date) is unique and that timeseries_id is never really used. If those are correct, then
PRIMARY KEY (`timeseries_id`),
KEY `date_stock` (`stock_id`,`date`)
-->
PRIMARY KEY(`stock_id`,`date`)
The ON (timeseries_id = ...) join condition would then need to be changed to test both of those columns.
Also, toss this since there is another index that starts with the same column(s):
KEY `stock` (`stock_id`),
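For concreteness, the schema change suggested above could look like this (a sketch only, assuming (stock_id, date) really is unique and timeseries_id is not referenced anywhere else; InnoDB rebuilds the table for this):
ALTER TABLE timeseries_test
  DROP PRIMARY KEY,
  DROP COLUMN timeseries_id,
  DROP INDEX `stock`,                    -- redundant: date_stock already starts with stock_id
  ADD PRIMARY KEY (`stock_id`, `date`);
The UPDATE's join would then have to match on both columns, e.g. ON t1.stock_id = t2.stock_id AND t1.`date` = t2.`date`.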

How to improve query speed in mysql query

I'm trying to optimize my query speed as much as possible. A side problem is that I cannot see the exact query time, because it is rounded to a whole second. The query does return the expected result and takes about 1 second. The final query will be extended even further, and for this reason I am trying to improve it. How can this query be improved?
The database models an electricity utility company. The query should eventually calculate an invoice. I basically have 4 tables: APX price, powerdeals, powerload, eans_power.
APX price is an hourly price and powerload is a quarter-hourly volume. The first step is joining these two together for each quarter of an hour.
The second step is selecting the EAN that is indicated in the eans_power table.
Finally I will join the powerdeals, which currently consists of only a single row and indicates from which hour until which hour, and on which weekdays, it should be applicable. It consists of an hourly volume and a price. Currently it is only joined on the hours, but it will be extended to weekdays as well.
MYSQL Query:
SELECT l.DATE, l.PERIOD_FROM, a.PRICE, l.POWERLOAD,
SUM(a.PRICE*l.POWERLOAD), SUM(d.hourly_volume/4)
FROM timeseries.powerload l
INNER JOIN timeseries.apxprice a ON l.DATE = a.DATE
INNER JOIN contracts.eans_power c ON l.ean = c.ean
LEFT OUTER JOIN timeseries.powerdeals d ON d.period_from <= l.period_from
AND d.period_until >= l.period_until
WHERE l.PERIOD_FROM >= a.PERIOD_FROM
AND l.PERIOD_FROM < a.PERIOD_UNTIL
AND l.DATE >= '2018-01-01'
AND l.DATE <= '2018-12-31'
GROUP BY l.date
Explain:
1 SIMPLE c NULL system PRIMARY,ean NULL NULL NULL 1 100.00 Using temporary; Using filesort
1 SIMPLE l NULL ref EAN EAN 21 const 35481 11.11 Using index condition
1 SIMPLE d NULL ALL NULL NULL NULL NULL 1 100.00 Using where; Using join buffer (Block Nested Loop)
1 SIMPLE a NULL ref DATE DATE 4 timeseries.l.date 24 11.11 Using index condition
Create table queries:
apxprice
CREATE TABLE `apxprice` (
 `apx_id` int(11) NOT NULL AUTO_INCREMENT,
 `date` date DEFAULT NULL,
 `period_from` time DEFAULT NULL,
 `period_until` time DEFAULT NULL,
 `price` decimal(10,2) DEFAULT NULL,
 PRIMARY KEY (`apx_id`),
 KEY `DATE` (`date`,`period_from`,`period_until`)
) ENGINE=MyISAM AUTO_INCREMENT=29664 DEFAULT CHARSET=latin1
powerdeals
CREATE TABLE `powerdeals` (
 `deal_id` int(11) NOT NULL AUTO_INCREMENT,
 `date_deal` date NOT NULL,
 `start_date` date NOT NULL,
 `end_date` date NOT NULL,
 `weekday_from` int(11) NOT NULL,
 `weekday_until` int(11) NOT NULL,
 `period_from` time NOT NULL,
 `period_until` time NOT NULL,
 `hourly_volume` int(11) NOT NULL,
 `price` int(11) NOT NULL,
 `type_deal_id` int(11) NOT NULL,
 `contract_id` int(11) NOT NULL,
 PRIMARY KEY (`deal_id`)
) ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=latin1
powerload
CREATE TABLE `powerload` (
 `powerload_id` int(11) NOT NULL AUTO_INCREMENT,
 `ean` varchar(18) DEFAULT NULL,
 `date` date DEFAULT NULL,
 `period_from` time DEFAULT NULL,
 `period_until` time DEFAULT NULL,
 `powerload` int(11) DEFAULT NULL,
 PRIMARY KEY (`powerload_id`),
 KEY `EAN` (`ean`,`date`,`period_from`,`period_until`)
) ENGINE=MyISAM AUTO_INCREMENT=61039 DEFAULT CHARSET=latin1
eans_power
CREATE TABLE `eans_power` (
 `ean` char(19) NOT NULL,
 `contract_id` int(11) NOT NULL,
 `invoicing_id` int(11) NOT NULL,
 `street` varchar(255) NOT NULL,
 `number` int(11) NOT NULL,
 `affix` char(11) NOT NULL,
 `postal` char(6) NOT NULL,
 `city` varchar(255) NOT NULL,
 PRIMARY KEY (`ean`),
 KEY `ean` (`ean`,`contract_id`,`invoicing_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
Sample data tables
apx_prices
apx_id,date,period_from,period_until,price
1,2016-01-01,00:00:00,01:00:00,23.86
2,2016-01-01,01:00:00,02:00:00,22.39
powerdeals
deal_id,date_deal,start_date,end_date,weekday_from,weekday_until,period_from,period_until,hourly_volume,price,type_deal_id,contract_id
1,2019-05-15,2018-01-01,2018-12-31,1,5,08:00:00,20:00:00,1000,50,3,1
powerload
powerload_id,ean,date,period_from,period_until,powerload
1,871688520000xxxxxx,2018-01-01,00:00:00,00:15:00,9
2,871688520000xxxxxx,2018-01-01,00:15:00,00:30:00,11
eans_power
ean,contract_id,invoicing_id,street,number,affix,postal,city
871688520000xxxxxx,1,1,road,14,postal,city
Result, without sum() and group by:
DATE,PERIOD_FROM,PRICE,POWERLOAD,a.PRICE*l.POWERLOAD,d.hourly_volume/4,
2018-01-01,00:00:00,27.20,9,244.80,NULL
2018-01-01,00:15:00,27.20,11,299.20,NULL
Result, with sum() and group by:
DATE, PERIOD_FROM, PRICE, POWERLOAD, SUM(a.PRICE*l.POWERLOAD), SUM(d.hourly_volume/4)
2018-01-01,08:00:00,26.33,21,46193.84,12250.0000
2018-01-02, 08:00:00,47.95,43,90623.98,12250.0000
Preliminary optimizations:
Use InnoDB, not MyISAM.
Use CHAR only for constant-length strings
Use consistent datatypes (see ean, for example); a sketch of the first three points follows this list
For an alternative to using time-to-the-second, check out the Handler counts.
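As a sketch of the first three points (illustrative only; keep ean at whatever length your data really uses, the samples are 18 characters):
ALTER TABLE timeseries.apxprice   ENGINE=InnoDB;
ALTER TABLE timeseries.powerdeals ENGINE=InnoDB;
ALTER TABLE timeseries.powerload  ENGINE=InnoDB,
  MODIFY `ean` char(18) DEFAULT NULL;    -- same type as eans_power.ean
ALTER TABLE contracts.eans_power  ENGINE=InnoDB,
  MODIFY `ean` char(18) NOT NULL;        -- only if every ean really is 18 characters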
Because range tests (such as l.PERIOD_FROM >= a.PERIOD_FROM AND l.PERIOD_FROM < a.PERIOD_UNTIL) are essentially impossible to optimize, I recommend you expand the table to have one entry per hour (or 1 per quarter hour, if necessary). Looking up a row via a key is much faster than doing a scan of "ALL" the table. 9K rows for an entire year is trivial.
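To illustrate, once the price table holds one row per quarter-hour (apxprice_quarter below is a hypothetical expanded copy), the range test becomes an exact, index-friendly match; the rest of the joins stay as they are:
SELECT l.`date`, l.period_from, a.price, l.powerload
FROM timeseries.powerload l
JOIN timeseries.apxprice_quarter a       -- hypothetical table: one row per quarter-hour
  ON a.`date` = l.`date`
 AND a.period_from = l.period_from       -- exact match instead of a range test
WHERE l.`date` >= '2018-01-01'
  AND l.`date` <= '2018-12-31';
With a key on (date, period_from), each powerload row is then resolved with a single index lookup.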
When you get past these recommendations (and the Comments), I will have more tips on optimizing the indexes, especially InnoDB's PRIMARY KEY.

MYSQL: Still using filesort, using temporary, even though the index is properly added

I have this query that I think I have indexed properly, but I still get 'Using filesort' and 'Using temporary'.
The query is as follow:
SELECT * FROM
(SELECT PIH.timestamp, PIH.practice_id, PIH.timestamp as invoice_num, PIH.custom_invnum,
CEIL(PIH.total_invoice + PIH.tax + PIH.other_bill) as grand_total, PIH.total_invoice, PIH.extra_charge_ph as extra_charge,
PIH.tax, PIH.other_bill, PIH.changed, PIH.source,
PIH.notes, PIH.is_active, PIH.paid as pay,
PIH.covered_amount, IF(PIH.is_active = 1, IF(PIH.total_invoice = 0 OR PIH.total_invoice + PIH.tax + PIH.other_bill - PIH.covered_amount <= PIH.paid, 1, IF(PIH.paid = 0, 0, 2)), '') as invoice_st,
RPP.patient_id, RPP.first_name as pfname, RPP.last_name as plname, RPP.dob as p_dob, RPP.gender as p_gender, RPP.reff_id as p_reff_id, RPP.mobile_number as p_mobile, IF(PIH.group_doctors IS NOT NULL, NULL, D.doc_title) as doc_title, IF(PIH.group_doctors IS NOT NULL,
PIH.group_doctors, D.first_name) as doc_fname, IF(PIH.group_doctors IS NOT NULL, PIH.group_doctors, D.last_name) as doc_lname, IF(PIH.group_doctors IS NOT NULL, NULL, D.spc_dsg) as spc_dsg, PA.username, TL.timestamp as checkout_time, IP.name as ip_name, PMM.timestamp as mcu_id
FROM practice_invoice_header PIH
INNER JOIN practice_invoice_detail PID ON PID.timestamp = PIH.timestamp
AND PID.practice_id = PIH.practice_id
INNER JOIN practice_queue_list PQL ON PQL.encounter_id = PID.encounter_id
AND PQL.practice_place_id = PIH.practice_id
INNER JOIN temp_search_view D ON D.id = PQL.doctor_id
AND D.pp_id = PQL.practice_place_id
INNER JOIN practice_place PP ON PP.id = PIH.practice_id
INNER JOIN ref_practice_patient RPP ON RPP.patient_id = PIH.patient_id
AND RPP.practice_id = PP.parent_id
LEFT JOIN practice_mcu_module PMM ON PMM.id = PID.mcu_module_id
AND PMM.practice_id = PID.practice_id
LEFT JOIN transaction_log TL ON TL.reff_id = PIH.timestamp
AND TL.practice_id = PIH.practice_id
AND TL.activity = "CHK"
LEFT JOIN practice_admin PA ON PA.id = TL.admin_id
LEFT JOIN insurance_plan IP ON IP.id = PIH.insurance_plan_id
WHERE PIH.source <> 'P'
AND PIH.practice_id = 28699
AND PIH.is_active = 1
AND PQL.cal_id >= 201807010
AND PQL.cal_id <= 201807312
GROUP BY PIH.timestamp, PIH.practice_id
) AS U LIMIT 0,20
NOTE: I only show some of the main tables used in this query and the ones that sort using filesort/temporary; of course, if I posted everything it would be too much information.
The query is about a list of invoices; it has the header (practice_invoice_header) and the details (practice_invoice_detail), and it joins with the practice_place table:
CREATE TABLE `practice_invoice_header` (
`timestamp` bigint(20) NOT NULL,
`practice_id` int(11) NOT NULL,
`cal_id` int(11) NOT NULL,
`patient_id` int(11) NOT NULL DEFAULT 0,
`source` char(1) NOT NULL COMMENT 'E = ENCOUNTER; P = OTHER (PHARM / LAB)',
`total_invoice` float(30,2) NOT NULL DEFAULT 0.00,
`tax` float(30,2) NOT NULL DEFAULT 0.00,
`other_bill` float(30,2) NOT NULL DEFAULT 0.00,
`changed` float(30,2) NOT NULL DEFAULT 0.00,
`paid` float(30,2) NOT NULL DEFAULT 0.00,
`covered_amount` float(30,2) NOT NULL DEFAULT 0.00,
`notes` varchar(300) DEFAULT NULL,
`custom_invnum` varchar(30) DEFAULT NULL,
`insurance_plan_id` varchar(20) DEFAULT NULL,
`is_active` int(11) NOT NULL DEFAULT 1,
`cancel_reason` varchar(200) DEFAULT NULL,
PRIMARY KEY (`timestamp`,`practice_id`),
KEY `custom_invnum` (`custom_invnum`),
KEY `insurance_plan_id` (`insurance_plan_id`),
KEY `practice_id_3` (`practice_id`,`xxx_reff_id`),
KEY `ph_check_status` (`ph_checked_by`),
KEY `cal_id` (`cal_id`),
KEY `practice_id_5` (`practice_id`,`outpx_id`),
KEY `practice_id_6` (`practice_id`,`cal_id`,`source`,`is_active`),
KEY `total_invoice` (`total_invoice`),
KEY `patient_id` (`patient_id`),
CONSTRAINT `practice_invoice_header_ibfk_1` FOREIGN KEY (`practice_id`)
REFERENCES `practice_place` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `practice_invoice_detail` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`timestamp` bigint(20) NOT NULL,
`practice_id` int(11) NOT NULL,
`item_id` int(11) NOT NULL,
`item_sub_id` int(11) DEFAULT NULL,
`item_type` char(1) NOT NULL COMMENT 'D = DRUG; P = PROCEDURE; L = LAB',
`item_qty` float NOT NULL,
`item_price` float(22,2) NOT NULL,
`discount` float NOT NULL DEFAULT 0,
`is_active` int(11) NOT NULL DEFAULT 1,
PRIMARY KEY (`id`),
KEY `item_type` (`item_type`),
KEY `timestamp` (`timestamp`,`practice_id`),
KEY `practice_id` (`practice_id`),
KEY `item_id_2` (`item_id`,`item_sub_id`,`item_type`),
KEY `timestamp_2` (`timestamp`,`practice_id`,`item_id`,`item_sub_id`,`item_type`),
KEY `practice_id_3` (`practice_id`,`item_type`),
KEY `the_id` (`id`,`practice_id`) USING BTREE,
KEY `timestamp_3` (`timestamp`,`practice_id`,`item_type`,`item_comission`,
`item_comission_type`, `doctor_id`,`item_id`,`item_sub_id`,`id`) USING BTREE,
KEY `timestamp_4` (`timestamp`,`practice_id`,`item_id`,`item_sub_id`,`item_type`,
`item_comission_2`,`item_comission_2_type`,`doctor_id_2`,`id`) USING BTREE,
KEY `request_id` (`request_id`,`request_practice`),
KEY `timestamp_5` (`timestamp`,`practice_id`,`is_active`),
KEY `practice_id_6` (`practice_id`,`encounter_id`,`is_active`),
KEY `practice_id_7` (`practice_id`,`item_type`,`encounter_id`,`is_active`),
CONSTRAINT `practice_invoice_detail_ibfk_1` FOREIGN KEY (`timestamp`)
REFERENCES `practice_invoice_header` (`timestamp`) ON DELETE CASCADE,
CONSTRAINT `practice_invoice_detail_ibfk_2` FOREIGN KEY (`practice_id`)
REFERENCES `practice_place` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1447348 DEFAULT CHARSET=latin1
CREATE TABLE `ref_practice_patient` (
`practice_id` int(11) NOT NULL,
`patient_id` int(11) NOT NULL,
`reff_id` varchar(35) DEFAULT NULL,
`is_user` int(11) NOT NULL DEFAULT 0,
`parent_user_id` int(11) NOT NULL DEFAULT 0,
PRIMARY KEY (`practice_id`,`patient_id`),
KEY `patient_id` (`patient_id`),
KEY `reff_id` (`reff_id`),
KEY `practice_id` (`practice_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `practice_place` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(75) NOT NULL,
`statement` text DEFAULT NULL,
`address` varchar(200) NOT NULL,
`phone` varchar(15) NOT NULL,
`wa_number` varchar(15) DEFAULT NULL,
`fax` varchar(15) NOT NULL,
`email` varchar(50) NOT NULL,
`is_branch` int(11) NOT NULL,
`parent_id` int(11) NOT NULL,
`editted_by` int(11) DEFAULT NULL,
`editted_date` bigint(20) DEFAULT NULL,
`status` int(11) NOT NULL DEFAULT 1,
PRIMARY KEY (`id`),
KEY `parent_id` (`parent_id`),
KEY `reff_id` (`reff_id`)
) ENGINE=InnoDB AUTO_INCREMENT=29058 DEFAULT CHARSET=latin1
And below is the EXPLAIN produced by the query; I highlight the one using filesort (no. 2):
1 PRIMARY ALL NULL NULL NULL NULL 14028
2 DERIVED PP const PRIMARY,parent_id PRIMARY 4 const 1 Using temporary; Using filesort
2 DERIVED PIH ref PRIMARY,practice_id_3,practice_id_5,practice_id_6,practice_id_8,pharm_read,lab_read,rad_read,patient_id
practice_id_5 4 const 7014 Using where
2 DERIVED RPP eq_ref PRIMARY,patient_id,practice_id,practice_id_2,practice_id_3
PRIMARY 8 const,k6064619_lokadok.PIH.patient_id 1
2 DERIVED PID ref timestamp,practice_id,timestamp_2,practice_id_2,practice_id_3,timestamp_3,timestamp_4,practice_id_4,practice_id_5,timestamp_5,practice_id_6,practice_id_7
timestamp 12 k6064619_lokadok.PIH.timestamp,const 1
2 DERIVED PMM eq_ref PRIMARY,id,practice_id
PRIMARY 4 k6064619_lokadok.PID.mcu_module_id 1 Using where
2 DERIVED TL ref reff_id reff_id 12 k6064619_lokadok.PIH.timestamp,const 1 Using where
2 DERIVED PA eq_ref PRIMARY PRIMARY 4 k6064619_lokadok.TL.admin_id 1 Using where
2 DERIVED IP ref PRIMARY,id PRIMARY 22 k6064619_lokadok.PIH.insurance_plan_id 1 Using where
2 DERIVED PQL ref PRIMARY,encounter_id,cal_id_2
encounter_id 5 k6064619_lokadok.PID.encounter_id 2 Using where; Using index
2 DERIVED D ref doc_id,pp_id,id_2,pp_doc doc_id 4 k6064619_lokadok.PQL.doctor_id 1 Using where
I believe I have indexed the parent_id in practice_place table, and also in ref_practice_patient the patient_id and practice_id is PRIMARY.
Why have the outer query? The Optimizer is free to shuffle the result of the inner query, thereby leaving the LIMIT to pick an ordering that you are not expecting. At least add an ORDER BY, and preferably also toss the outer select.
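For example (a sketch): drop the outer SELECT * FROM ( ... ) AS U wrapper and end the remaining single-level query with:
GROUP BY PIH.timestamp, PIH.practice_id
ORDER BY PIH.timestamp, PIH.practice_id   -- pins the order so the LIMIT is deterministic
LIMIT 0, 20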
Main Index
Let's analyze the likely place to design an index:
WHERE PIH.source <> 'P'
AND PIH.practice_id = 28699
AND PIH.is_active = 1
AND PQL.cal_id >= 201807010
AND PQL.cal_id <= 201807312
GROUP BY PIH.timestamp, PIH.practice_id
Since there is a mixture of tables involved, it is not possible to have an index that handles all the WHERE.
Since the tests are not all =, it is not possible to reach beyond the WHERE and include columns of the GROUP BY.
So, I see two indexes:
PIH: INDEX(practice_id, is_active, -- in either order
source)
PQL: INDEX(cal_id)
Since we can't get into the GROUP BY, the Optimizer has no choice but to gather all the rows based on WHERE, do some grouping, and do an ORDER BY (as I said, that is missing, but necessary).
Therefore, the GROUP BY and the ORDER BY will require one or two temps and filesorts. No, you can't get away from it, at least not without changing the query in some way. (Please note that "filesort" might actually be done in RAM.)
Your extra SELECT layer may be adding an extra temp and filesort.
EXPLAIN fails to point out when there are two sorts. EXPLAIN FORMAT=JSON has that sort of detail.
Other issues...
Having a timestamp in a PRIMARY KEY is risky unless you are sure that no two rows can occur with the same timestamp, or there is another column in the PK to assure uniqueness.
Don't use FLOAT for money. It will incur extra rounding errors, and it cannot store more than about 7 significant digits (that's less than $100K to the penny). Don't use float(30,2); it is even worse because you are forcing an extra rounding. Use DECIMAL(30,2), but pick something reasonable, not 30: it takes 14 bytes, mostly a waste of space.
Whenever you have INDEX(a,b), you don't need INDEX(a); it is redundant and slows down (slightly) INSERTs.
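Applied to the posted schema, those two points would look something like this (a sketch; the DECIMAL precision is illustrative, and the same change applies to tax, other_bill, changed, paid and covered_amount):
ALTER TABLE practice_invoice_header
  MODIFY total_invoice DECIMAL(12,2) NOT NULL DEFAULT 0.00;
ALTER TABLE practice_invoice_detail
  DROP INDEX practice_id,     -- redundant: practice_id_3 (practice_id, item_type) starts with it
  DROP INDEX `timestamp`;     -- redundant: timestamp_2 (timestamp, practice_id, ...) starts with it
The foreign keys stay covered because practice_id_3 and timestamp_2 still begin with the constrained columns.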
LEFT JOIN transaction_log TL
ON TL.reff_id = PIH.timestamp
AND TL.practice_id = PIH.practice_id
AND TL.activity = "CHK"
needs
INDEX(reff_id, practice_id, activity) -- in any order
Also
INNER JOIN practice_invoice_detail PID ON PID.timestamp = PIH.timestamp
AND PID.practice_id = PIH.practice_id
PIH: INDEX(practice_id, timestamp) -- not the opposite order
PIH: INDEX(practice_id, is_active, timestamp)
INNER JOIN practice_queue_list PQL ON PQL.encounter_id = PID.encounter_id
AND PQL.practice_place_id = PIH.practice_id
PQL: INDEX(encounter_id, cal_id)
PQL: INDEX(encounter_id, practice_place_id, cal_id)
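As DDL, the indexes sketched above could be written like this (index names are illustrative; pick the variant per table that matches the join order the Optimizer actually ends up using):
ALTER TABLE practice_invoice_header
  ADD INDEX pih_practice_active_source (practice_id, is_active, source),
  ADD INDEX pih_practice_active_ts     (practice_id, is_active, `timestamp`);
ALTER TABLE transaction_log
  ADD INDEX tl_reff_practice_activity  (reff_id, practice_id, activity);
ALTER TABLE practice_queue_list
  ADD INDEX pql_cal                    (cal_id),
  ADD INDEX pql_enc_pp_cal             (encounter_id, practice_place_id, cal_id);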
Some discussion...
In a JOIN, EXPLAIN shows one order of working through the tables; it gives you no clues of how things would be if it worked through the tables some other way.
I have attempted to show what indexes might be needed if PQL were used first, or if PIH were used first -- namely, the WHERE stuff for that table, then the optimal index for joining to the other table.
Probably the Optimizer will not start with any table not mentioned in the WHERE clause, but this is not a certainty.
I have not listed the optimal indexes for getting to each of the other tables.
More discussion: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

Mysql sub query with MAX

I'm trying to speed up a query which currently takes 1.2 seconds to run. The query is:
select
process.exchange,
process.market,
process.volume,
process.bid,
process.ask,
(select MAX(last) FROM a where a.exchange = process.exchange
AND a.market = process.market
AND a.created_date > NOW() - INTERVAL 5 MINUTE LIMIT 1) as Ask1,
Ask2,
((Ask2 / (select MAX(last) FROM a where a.exchange = process.exchange
AND a.market = process.market
AND a.created_date > NOW() - INTERVAL 5 MINUTE LIMIT 1 ))
* 100) - 100 as percentage
FROM process
WHERE process.exchange IN('BLAH','BLAH2')
ORDER BY percentage ASC
Table structures:
CREATE TABLE `a` (
`id` int(10) UNSIGNED NOT NULL,
`exchange` varchar(15) NOT NULL,
`market` varchar(15) NOT NULL,
`volume` double UNSIGNED NOT NULL,
`bid` double NOT NULL,
`ask` double UNSIGNED NOT NULL,
`last` double NOT NULL,
`created_date` datetime NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
ALTER TABLE `a`
ADD PRIMARY KEY (`id`),
ADD KEY `market` (`market`),
ADD KEY `exchange` (`exchange`),
ADD KEY `created_date` (`created_date`);
SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO";
SET time_zone = "+00:00";
CREATE TABLE `process` (
`id` int(11) NOT NULL,
`exchange` varchar(20) NOT NULL,
`market` varchar(10) NOT NULL,
`volume` double NOT NULL,
`bid` double NOT NULL,
`ask` double NOT NULL,
`Ask2` double NOT NULL,
`created_date` datetime NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
ALTER TABLE `process`
ADD PRIMARY KEY (`id`),
ADD KEY `created_date` (`created_date`),
ADD KEY `exchange` (`exchange`),
ADD KEY `market` (`market`);
ALTER TABLE `process`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;
I need to work out the percentage change between Ask1 and Ask2. Ask1 information is in another table called a that has lots of rows of the prices for each market and exchange.
This is what the data in table "a" looks like:
Exchange Market Volume Bid Ask Last Created Date
BLAH APL 3000 1.2 1.3 1.3 2017-07-26 16:31:00
BLAH APL 3000 1.4 1.5 1.45 2017-07-26 16:30:00
I need Ask1 to come from here, taking the highest value in the last 5 minutes. So the subquery I have is:
select MAX(last)
FROM a
where a.exchange = process.exchange
AND a.market = process.market
AND a.created_date > NOW() - INTERVAL 5 MINUTE
LIMIT 1
I need to select it and also do a calculation with it, so at the moment I have to have the subquery in there twice.
The MAX seems to slow the subquery down a lot.
This is what EXPLAIN says:
id select type table partitions type possible keys key key_len ref rows filtered Extra
1 PRIMARY process NULL ALL exchange NULL NULL NULL 272 99.63 Using where; Using temporary; Using filesort
2 DEPENDENT SUBQUERY a NULL ref market,exchange,created_date market 47 cms.process.market 1603 0.17 Using index condition; Using where
3 DEPENDENT SUBQUERY a NULL ref market,exchange,created_date market 47 cms.process.market 1603 0.17 Using index condition; Using where
The "a" table has about a million rows and "process" table has about 630 rows.
Is it possible to speed it up?
Thanks in advance for your help.
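(Not an answer from the thread, but one common rewrite for this situation: compute MAX(last) once per exchange/market in a derived table and LEFT JOIN it back, so the correlated subquery is not repeated. A sketch, assuming the same 5-minute window:)
SELECT p.exchange,
       p.market,
       p.volume,
       p.bid,
       p.ask,
       m.ask1 AS Ask1,
       p.Ask2,
       (p.Ask2 / m.ask1) * 100 - 100 AS percentage
FROM process p
LEFT JOIN (
    SELECT exchange, market, MAX(last) AS ask1
    FROM a
    WHERE created_date > NOW() - INTERVAL 5 MINUTE
    GROUP BY exchange, market
) m ON m.exchange = p.exchange
   AND m.market   = p.market
WHERE p.exchange IN ('BLAH', 'BLAH2')
ORDER BY percentage ASC;
The LEFT JOIN keeps rows with no trade in the last 5 minutes (Ask1 and percentage become NULL), matching the behavior of the original scalar subquery.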

A correct MySQL Left Join Query overburdens a small-sized DB although rows are indexed

I have this working SQL query but it almost makes my DB crash:
SELECT MASTER.master_id,
MASTER.master_summary,
MASTER.master_start,
MASTER.master_end,
MASTER.master_risk,
MASTER.master_source,
MASTER.master_veto,
master.master_tags,
NULL AS HAS_CE,
C2C.c2c_customer
FROM `cer_master` MASTER
LEFT JOIN `cer_c2customer` C2C
ON ( C2C.c2c_id = MASTER.master_id AND C2C.c2c_source = MASTER.master_source )
WHERE ( MASTER.master_id NOT LIKE 'TAV%' )
AND (( MASTER.master_class <> 'type2' ) OR ( MASTER.master_class <> 'type3' ))
AND ( MASTER.master_status <> 'Cancelled' )
AND ( MASTER.master_end >= Now() AND MASTER.master_start >= Date_sub(Now(), INTERVAL 1 day) )
If I try to run this in phpMyAdmin I literally have to wait 5 minutes, and then get this result: 3,699 total, Query took 0.9358 sec
I have indexed MASTER.master_id, MASTER.master_start, MASTER.master_end, MASTER.master_source as well as C2C.c2c_id, C2C.c2c_source and C2C.c2c_customer, but it doesn't seem to help.
Additional Info: cer_master MASTER table has 277,502 rows and cer_c2customer C2C table has 72,788 rows.
Can someone help me optimize this query? I need it badly and cannot think of another way.
EDIT: Results from the EXPLAIN query:
+----+-------------+--------+-------+-------------------------------------------------------+---------------------------------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+-------------------------------------------------------+---------------------------------+---------+------+-------+-------------+
| 1 | SIMPLE | MASTER | range | CHM_MASTER_SCHEDULED_START_DATE,CHM_MASTER_SCHEDUL... | CHM_MASTER_SCHEDULED_START_DATE | 4 | NULL | 5042 | Using where |
+----+-------------+--------+-------+-------------------------------------------------------+---------------------------------+---------+------+-------+-------------+
| 1 | SIMPLE | C2C | ALL | CER_C2C_CHANGE_ID | NULL | NULL | NULL | 72788 | |
+----+-------------+--------+-------+-------------------------------------------------------+---------------------------------+---------+------+-------+-------------+
Table Create Table
master CREATE TABLE `master` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
` MASTER_LAST_MODIFIED_DATE` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
` MASTER_SOURCE` varchar(16) NOT NULL,
` MASTER_ID` varchar(16) NOT NULL,
` MASTER_SUMMARY` text NOT NULL,
` MASTER_NOTES` text NOT NULL,
` MASTER_SERVICE` varchar(255) NOT NULL,
` MASTER_SITE` text NOT NULL,
` MASTER_START` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
` MASTER_END` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
` MASTER_DEPARTMENT_FLAG` varchar(8) NOT NULL,
` MASTER_RISK` int(8) NOT NULL DEFAULT '1',
` MASTER_IMPACT_LEVEL` varchar(64) NOT NULL,
` MASTER_TOOL_STATUS` varchar(32) NOT NULL,
` MASTER_IMPACT_RISK_NOTES` text NOT NULL,
` MASTER_CALENDAR_WEEK` varchar(16) NOT NULL,
` MASTER_TAGS` varchar(1024) NOT NULL,
` MASTER_VETO` tinyint(1) NOT NULL DEFAULT '0',
` MASTER_LAYER_TAGS` text NOT NULL,
` MASTER_ORAKEL_ID` int(11) NOT NULL DEFAULT '0',
` MASTER_USED_TEMPLATE` text NOT NULL,
PRIMARY KEY (`ID`),
KEY ` MASTER_ID` (` MASTER_CHANGE_ID`),
KEY ` MASTER_LAST_MODIFIED_DATE` (` MASTER_LAST_MODIFIED_DATE`),
KEY ` MASTER_SERVICE` (` MASTER_SERVICE`),
KEY ` MASTER_START` (` MASTER_START`),
KEY ` MASTER_END` (` MASTER_END_`),
KEY ` MASTER_SOURCE` (` MASTER_SOURCE`)
) ENGINE=MyISAM AUTO_INCREMENT=278315 DEFAULT CHARSET=utf8
and this is show create table from c2c_customer:
cerberus_change2customer
CREATE TABLE `c2c_customer` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`C2C_SOURCE` text NOT NULL,
`C2C_ID` text NOT NULL,
`C2C_CUSTOMER` text NOT NULL,
`C2C_LAST_MODFIED` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`ID`),
FULLTEXT KEY `C2C_ID` (`C2C_ID`),
FULLTEXT KEY `C2C_CUSTOMER` (`C2C_CUSTOMER`)
) ENGINE=MyISAM AUTO_INCREMENT=516044 DEFAULT CHARSET=utf8
You want one composite index covering all the relevant columns, not separate indexes for each column.
For example:
ALTER TABLE `cer_c2customer` ADD INDEX `cer_c2customer_ID_SOURCE_CUSTOMER` (c2c_id, c2c_source, c2c_customer)
This means that the one index can be used to locate the data and also supply all of the columns required from this table in the query.
In addition you probably want the clustering index on cer_master to be the start date or end date.
One trick is to order your where clause properly
WHERE ( MASTER.master_id NOT LIKE 'TAV%' )
AND (( MASTER.master_class <> 'type2' ) OR ( MASTER.master_class <> 'type3' ))
AND ( MASTER.master_status <> 'Cancelled' )
AND ( MASTER.master_end >= Now() AND MASTER.master_start >= Date_sub(Now(), INTERVAL 1 day) )
Move up the condition that is evaluated fastest and has the smallest result set.
For example, if you know this condition has a small result set, move it up:
( MASTER.master_status <> 'Cancelled' )
So, the query becomes
WHERE ( MASTER.master_status <> 'Cancelled' )
AND (( MASTER.master_class <> 'type2' ) OR ( MASTER.master_class <> 'type3' ))
AND ( MASTER.master_id NOT LIKE 'TAV%' )
AND ( MASTER.master_end >= Now() AND MASTER.master_start >= Date_sub(Now(), INTERVAL 1 day) )
In order of importance:
Fix the OR that is always TRUE. (Bug fix, not optimization; see the sketch below.)
Add Ben's composite (and covering) index.
Change to InnoDB.
INDEX(master_start), INDEX(master_end) -- Yes, two separate, not one composite, index. The optimizer may pick one of them and benefit.
Order the WHERE clause, as Shikhar suggests. (Even after doing the rest, this will provide only an insignificant improvement.)
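A sketch of the first point (assuming the intent of the OR was to exclude both classes):
WHERE MASTER.master_id NOT LIKE 'TAV%'
  AND MASTER.master_class NOT IN ('type2', 'type3')   -- was: <> 'type2' OR <> 'type3', which is always TRUE
  AND MASTER.master_status <> 'Cancelled'
  AND MASTER.master_end >= NOW()
  AND MASTER.master_start >= DATE_SUB(NOW(), INTERVAL 1 DAY)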