Optimizing query to use the primary key instead of another index - mysql

I am trying to optimize the execution of a MySQL query that joins two tables as follows:
CREATE TABLE `CPP` (
`RecordEntryType` varchar(7) NOT NULL default '',
`PositionNumber` mediumint(9) NOT NULL default '0',
`FundId` smallint(6) default NULL,
`QuantityHeld` decimal(14,2) default NULL,
`MarketValue` decimal(14,2) default NULL,
`PeriodBeginDate` date default NULL,
`PeriodEndDate` date NOT NULL default '0000-00-00',
PRIMARY KEY (`PositionNumber`,`PeriodEndDate`,`RecordEntryType`),
KEY `Index1` (`FundId`,`PeriodBeginDate`,`PeriodEndDate`),
KEY `FundId_idx` (`FundId`),
KEY `PeriodBeginDate_idx` (`PeriodBeginDate`),
KEY `PeriodEndDate_idx` (`PeriodEndDate`),
KEY `PositionNumber_id` (`PositionNumber`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `classification_entity_map` (
`entity_id` varchar(32) NOT NULL,
`entity_type` varchar(32) NOT NULL,
`scheme_id` int(11) NOT NULL,
`class_id` varchar(24) NOT NULL,
PRIMARY KEY (`entity_id`,`entity_type`,`scheme_id`),
KEY `fk_classification_entity_map_1` (`scheme_id`),
KEY `fk_class` (`class_id`),
CONSTRAINT `fk_class` FOREIGN KEY (`class_id`) REFERENCES `classification_hierarchy` (`external_id`),
CONSTRAINT `fk_scheme` FOREIGN KEY (`scheme_id`) REFERENCES `classification_schemes` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
select cpp.*, cem.class_id from cpp
LEFT JOIN classification_entity_map cem
ON cem.entity_id = cpp.PositionNumber and cem.entity_type = 'Security'
AND cem.scheme_id = 9901
WHERE cpp.RecordEntryType = 'CURRENT'
AND ( cpp.MarketValue != 0 OR cpp.QuantityHeld != 0 )
AND FundId = 28
AND cpp.PeriodEndDate = '2013-09-30';
The issue is that the query takes longer than expected in MySQL Workbench (9.4 seconds) because it uses the fk_classification_entity_map_1 index rather than the primary key of the classification_entity_map table. cpp has 626,648 rows and cem has 63,487 rows.
I suspect that the issue is related to the datatypes of cem.entity_id and cpp.PositionNumber (varchar(32) versus mediumint), but I am not sure, and they cannot be changed. Please help. I can upload the EXPLAIN output if that is helpful.
More info: changing the join to use convert(cpp.PositionNumber, char(32)), as below, does not help; the time goes up to 10 seconds:
ON cem.entity_id = convert(cpp.PositionNumber, char(32)) and cem.entity_type = 'Security'
AND cem.scheme_id = 9901
The EXPLAIN output for the query without convert is below; it lists PRIMARY as a possible key (the query with convert does not):
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | cpp | index_merge | Index1,FundId_idx,PeriodEndDate_idx | FundId_idx,PeriodEndDate_idx | 3,3 | | 402 | Using intersect(FundId_idx,PeriodEndDate_idx); Using where |
| 1 | SIMPLE | cem | ref | PRIMARY,fk_classification_entity_map_1 | fk_classification_entity_map_1 | 4 | const | 24100 | |
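One possible reason the datatype mismatch matters, offered as an assumption rather than a diagnosis of this particular plan: when a string column such as cem.entity_id is compared with a number such as cpp.PositionNumber, MySQL compares the two as floating-point numbers, and several different strings can equal the same number, so a string-ordered index cannot be used for a direct lookup. The Python sketch below only imitates that idea. The sample strings are made up, and Python's float() stands in for MySQL's own conversion rules, which differ in detail.

```python
# Hypothetical entity_id values as they might be stored in a varchar column.
entity_ids = [" 1", "+1", "0.5", "01", "1", "1.0", "10", "2"]


def numeric_matches(values, number):
    """Return the strings that equal `number` when compared numerically."""
    return [v for v in values if float(v) == number]


# An index on a string column keeps its entries in string order.
index_order = sorted(entity_ids)

# A numeric comparison with 1 matches several distinct strings...
matches = numeric_matches(index_order, 1)

# ...and they do not form one contiguous run in the string-ordered index.
positions = [index_order.index(v) for v in matches]

print(index_order)
print(matches)
print(positions)
```

Because the matching strings are scattered through the string ordering, a single index seek on the string value cannot find them all, which is consistent with the plan above falling back to the scheme_id index.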

Slow SQL query with GROUP BY optimization

I have a MySQL database. Acquired data is stored in the raw_data_headers, raw_data_rows and raw_data_row_details tables.
raw_data_row_details has a foreign key that references raw_data_rows.ID, and raw_data_rows has one that references raw_data_headers.ID.
raw_data_headers stores the data headers, raw_data_rows stores every stage of the acquisition program, and raw_data_row_details stores the details for each stage.
This is the query:
SELECT
q1.ProcessTypeID,
q1.TestTypeID,
q1.ComponentID,
q1.TestResultID,
COUNT(*) AS Counter
FROM (
SELECT
raw_data_headers.batch_id AS BatchID,
raw_data_test_outputs.test_output_type_id AS TestOutputTypeID,
raw_data_test_types.process_type_id AS ProcessTypeID,
raw_data_test_types.ID AS TestTypeID,
raw_data_row_details.component_id AS ComponentID,
raw_data_test_results.ID AS TestResultID
FROM raw_data_row_details
INNER JOIN raw_data_rows ON raw_data_rows.ID = raw_data_row_details.row_id
INNER JOIN raw_data_headers ON raw_data_headers.ID = raw_data_rows.header_id
INNER JOIN raw_data_test_results ON raw_data_test_results.ID = raw_data_row_details.Value
INNER JOIN raw_data_test_outputs ON raw_data_test_outputs.ID = raw_data_row_details.test_output_id
INNER JOIN raw_data_test_types ON raw_data_test_types.ID = raw_data_test_outputs.test_type_id
HAVING TestOutputTypeID = 2 AND BatchID = 1
) AS q1
GROUP BY q1.ProcessTypeID, q1.TestTypeID, q1.ComponentID, q1.TestResultID
raw_data_headers has 989'180 entries, raw_data_rows has 2'967'540 entries and raw_data_row_details has 13'848'520 entries.
The subquery q1 takes about 3 minutes, but the final query takes about 25 minutes. I think the bottleneck is the GROUP BY.
How can I improve performance?
EDIT 1:
SELECT
gnuhmi.raw_data_test_types.process_type_id AS ProcessTypeID,
gnuhmi.raw_data_test_types.ID AS TestTypeID,
gnuhmi.raw_data_row_details.component_id AS ComponentID,
gnuhmi.raw_data_test_results.ID AS TestResultID,
COUNT(*) AS Counter
FROM gnuhmi.raw_data_row_details
INNER JOIN gnuhmi.raw_data_rows ON gnuhmi.raw_data_rows.ID = gnuhmi.raw_data_row_details.row_id
INNER JOIN gnuhmi.raw_data_headers ON gnuhmi.raw_data_headers.ID = gnuhmi.raw_data_rows.header_id
INNER JOIN gnuhmi.raw_data_test_results ON gnuhmi.raw_data_test_results.ID = gnuhmi.raw_data_row_details.Value
INNER JOIN gnuhmi.raw_data_test_outputs ON gnuhmi.raw_data_test_outputs.ID = gnuhmi.raw_data_row_details.test_output_id
INNER JOIN gnuhmi.raw_data_test_types ON gnuhmi.raw_data_test_types.ID = gnuhmi.raw_data_test_outputs.test_type_id
WHERE gnuhmi.raw_data_test_outputs.test_output_type_id = 2 AND gnuhmi.raw_data_headers.batch_id = 1
GROUP BY
gnuhmi.raw_data_test_results.ID,
gnuhmi.raw_data_row_details.component_id,
gnuhmi.raw_data_test_types.ID,
gnuhmi.raw_data_test_types.process_type_id
This is the new query, without the subquery and with WHERE instead of HAVING. It improved performance (thanks @Yogesh Sharma).
This is the raw_data_headers structure:
CREATE TABLE `raw_data_headers` (
`ID` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Univocal record key',
`ProductID` int(11) NOT NULL COMMENT 'Product numeric ID',
`Datetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Univocal record creation date',
`batch_id` int(11) DEFAULT NULL COMMENT 'Univocal batch key',
`RecipeName` varchar(80) DEFAULT NULL COMMENT 'Used recipe name',
`RecipeVersion` smallint(6) DEFAULT NULL COMMENT 'Used recipe version',
`process_result_id` smallint(6) DEFAULT NULL COMMENT 'Process result key',
`invalidated` tinyint(1) NOT NULL DEFAULT '0' COMMENT 'invalidation after counters reset',
PRIMARY KEY (`ID`),
KEY `FK_raw_data_headers_batches_ID` (`batch_id`),
KEY `FK_raw_data_headers_process_re` (`process_result_id`),
CONSTRAINT `FK_raw_data_headers_batches_ID` FOREIGN KEY (`batch_id`) REFERENCES `batches` (`ID`) ON UPDATE CASCADE,
CONSTRAINT `FK_raw_data_headers_process_re` FOREIGN KEY (`process_result_id`) REFERENCES `process_result` (`ID`) ON DELETE NO ACTION ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Stores raw data headers'
This is raw_data_rows:
CREATE TABLE `raw_data_rows` (
`ID` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Univocal record key',
`Datetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Univocal record creation date',
`header_id` int(11) unsigned NOT NULL COMMENT 'Univocal raw data header key',
`process_type_id` smallint(6) NOT NULL COMMENT 'Univocal process type key',
`process_result_id` smallint(6) NOT NULL COMMENT 'Univocal process result key',
PRIMARY KEY (`ID`),
KEY `FK_raw_data_rows_header_id` (`header_id`),
KEY `FK_raw_data_rows_process_resu2` (`process_result_id`),
KEY `FK_raw_data_rows_process_resul` (`process_type_id`),
CONSTRAINT `FK_raw_data_rows_header_id` FOREIGN KEY (`header_id`) REFERENCES `raw_data_headers` (`ID`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `FK_raw_data_rows_process_resu2` FOREIGN KEY (`process_result_id`) REFERENCES `process_result` (`ID`) ON DELETE NO ACTION ON UPDATE CASCADE,
CONSTRAINT `FK_raw_data_rows_process_resul` FOREIGN KEY (`process_type_id`) REFERENCES `process_types` (`ID`) ON DELETE NO ACTION ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=2967541 DEFAULT CHARSET=utf8 COMMENT='Stores row data rows'
and finally this is the raw_data_row_details one:
CREATE TABLE `raw_data_row_details` (
`ID` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'Univocal row detail key',
`row_id` int(11) unsigned NOT NULL COMMENT 'Univocal row key',
`test_output_id` int(11) NOT NULL COMMENT 'Univocal test output key',
`component_id` int(11) NOT NULL COMMENT 'The component that take the measurement',
`Value` double NOT NULL COMMENT 'Output value',
PRIMARY KEY (`ID`),
KEY `FK_raw_data_row_details_row_id` (`row_id`),
KEY `FK_raw_data_rows_raw_data_test_outputs_ID` (`test_output_id`),
KEY `raw_data_row_details_components_FK` (`component_id`),
CONSTRAINT `FK_raw_data_row_details_row_id` FOREIGN KEY (`row_id`) REFERENCES `raw_data_rows` (`ID`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `FK_raw_data_rows_raw_data_test_outputs_ID` FOREIGN KEY (`test_output_id`) REFERENCES `raw_data_test_outputs` (`ID`) ON UPDATE CASCADE,
CONSTRAINT `raw_data_row_details_components_FK` FOREIGN KEY (`component_id`) REFERENCES `components` (`ID`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=13848521 DEFAULT CHARSET=utf8 COMMENT='Stores raw data rows details'
You don't need a subquery; just use a WHERE clause with GROUP BY:
SELECT raw_data_test_types.process_type_id AS ProcessTypeID,
raw_data_test_types.ID AS TestTypeID,
raw_data_row_details.component_id AS ComponentID,
raw_data_test_results.ID AS TestResultID, COUNT(*) AS Counter
FROM raw_data_row_details INNER JOIN
raw_data_rows
ON raw_data_rows.ID = raw_data_row_details.row_id INNER JOIN
raw_data_headers
ON raw_data_headers.ID = raw_data_rows.header_id INNER JOIN
raw_data_test_results
ON raw_data_test_results.ID = raw_data_row_details.Value INNER JOIN
raw_data_test_outputs
ON raw_data_test_outputs.ID = raw_data_row_details.test_output_id INNER JOIN
raw_data_test_types
ON raw_data_test_types.ID = raw_data_test_outputs.test_type_id
WHERE raw_data_headers.batch_id = 1 AND raw_data_test_outputs.test_output_type_id = 2
GROUP BY raw_data_test_types.process_type_id, raw_data_test_types.ID,
raw_data_row_details.component_id, raw_data_test_results.ID;
Add indexes. TestOutputTypeID and BatchID need to be covered and probably are not.
To see what's currently going on, use EXPLAIN in the MySQL console. You will probably see an indication that a full table scan is happening i.e. the join type is marked as ALL.
It's often the case that the query optimiser will use the same execution plan for different queries e.g. by expanding the subquery as if you hadn't used it. Only EXPLAIN will show you what's what.
Here's the docs on how to interpret the EXPLAIN output: https://dev.mysql.com/doc/refman/8.0/en/explain-output.html
HAVING TestOutputTypeID = 2 AND BatchID = 1
Change that from HAVING to WHERE, and have indexes on each of those columns.
Also have these indexes:
raw_data_row_details: (row_id)
raw_data_rows: (header_id)
raw_data_row_details: (test_output_id)
raw_data_test_outputs: (test_type_id)
Get rid of raw_data_ from the table names; it just clutters the queries.
If those do not help enough, please provide EXPLAIN SELECT ... and SHOW CREATE TABLE.

MYSQL: Still using filesort, using temporary, even though the index is properly added

I have this query, and I think I have indexed the tables properly, but EXPLAIN still shows filesort and a temporary table.
The query is as follows:
SELECT * FROM
(SELECT PIH.timestamp, PIH.practice_id, PIH.timestamp as invoice_num, PIH.custom_invnum,
CEIL(PIH.total_invoice + PIH.tax + PIH.other_bill) as grand_total, PIH.total_invoice, PIH.extra_charge_ph as extra_charge,
PIH.tax, PIH.other_bill, PIH.changed, PIH.source,
PIH.notes, PIH.is_active, PIH.paid as pay,
PIH.covered_amount, IF(PIH.is_active = 1, IF(PIH.total_invoice = 0 OR PIH.total_invoice + PIH.tax + PIH.other_bill - PIH.covered_amount <= PIH.paid, 1, IF(PIH.paid = 0, 0, 2)), '') as invoice_st,
RPP.patient_id, RPP.first_name as pfname, RPP.last_name as plname, RPP.dob as p_dob, RPP.gender as p_gender, RPP.reff_id as p_reff_id, RPP.mobile_number as p_mobile, IF(PIH.group_doctors IS NOT NULL, NULL, D.doc_title) as doc_title, IF(PIH.group_doctors IS NOT NULL,
PIH.group_doctors, D.first_name) as doc_fname, IF(PIH.group_doctors IS NOT NULL, PIH.group_doctors, D.last_name) as doc_lname, IF(PIH.group_doctors IS NOT NULL, NULL, D.spc_dsg) as spc_dsg, PA.username, TL.timestamp as checkout_time, IP.name as ip_name, PMM.timestamp as mcu_id
FROM practice_invoice_header PIH
INNER JOIN practice_invoice_detail PID ON PID.timestamp = PIH.timestamp
AND PID.practice_id = PIH.practice_id
INNER JOIN practice_queue_list PQL ON PQL.encounter_id = PID.encounter_id
AND PQL.practice_place_id = PIH.practice_id
INNER JOIN temp_search_view D ON D.id = PQL.doctor_id
AND D.pp_id = PQL.practice_place_id
INNER JOIN practice_place PP ON PP.id = PIH.practice_id
INNER JOIN ref_practice_patient RPP ON RPP.patient_id = PIH.patient_id
AND RPP.practice_id = PP.parent_id
LEFT JOIN practice_mcu_module PMM ON PMM.id = PID.mcu_module_id
AND PMM.practice_id = PID.practice_id
LEFT JOIN transaction_log TL ON TL.reff_id = PIH.timestamp
AND TL.practice_id = PIH.practice_id
AND TL.activity = "CHK"
LEFT JOIN practice_admin PA ON PA.id = TL.admin_id
LEFT JOIN insurance_plan IP ON IP.id = PIH.insurance_plan_id
WHERE PIH.source <> 'P'
AND PIH.practice_id = 28699
AND PIH.is_active = 1
AND PQL.cal_id >= 201807010
AND PQL.cal_id <= 201807312
GROUP BY PIH.timestamp, PIH.practice_id
) AS U LIMIT 0,20
NOTE: I only show the main tables used in this query and the ones that are sorted using filesort/temporary; posting everything would be too much information.
The query lists invoices. It has a header (practice_invoice_header) and details (practice_invoice_detail), and it joins with the practice_place table.
CREATE TABLE `practice_invoice_header` (
`timestamp` bigint(20) NOT NULL,
`practice_id` int(11) NOT NULL,
`cal_id` int(11) NOT NULL,
`patient_id` int(11) NOT NULL DEFAULT 0,
`source` char(1) NOT NULL COMMENT 'E = ENCOUNTER; P = OTHER (PHARM / LAB)',
`total_invoice` float(30,2) NOT NULL DEFAULT 0.00,
`tax` float(30,2) NOT NULL DEFAULT 0.00,
`other_bill` float(30,2) NOT NULL DEFAULT 0.00,
`changed` float(30,2) NOT NULL DEFAULT 0.00,
`paid` float(30,2) NOT NULL DEFAULT 0.00,
`covered_amount` float(30,2) NOT NULL DEFAULT 0.00,
`notes` varchar(300) DEFAULT NULL,
`custom_invnum` varchar(30) DEFAULT NULL,
`insurance_plan_id` varchar(20) DEFAULT NULL,
`is_active` int(11) NOT NULL DEFAULT 1,
`cancel_reason` varchar(200) DEFAULT NULL,
PRIMARY KEY (`timestamp`,`practice_id`),
KEY `custom_invnum` (`custom_invnum`),
KEY `insurance_plan_id` (`insurance_plan_id`),
KEY `practice_id_3` (`practice_id`,`xxx_reff_id`),
KEY `ph_check_status` (`ph_checked_by`),
KEY `cal_id` (`cal_id`),
KEY `practice_id_5` (`practice_id`,`outpx_id`),
KEY `practice_id_6` (`practice_id`,`cal_id`,`source`,`is_active`),
KEY `total_invoice` (`total_invoice`),
KEY `patient_id` (`patient_id`),
CONSTRAINT `practice_invoice_header_ibfk_1` FOREIGN KEY (`practice_id`)
REFERENCES `practice_place` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `practice_invoice_detail` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`timestamp` bigint(20) NOT NULL,
`practice_id` int(11) NOT NULL,
`item_id` int(11) NOT NULL,
`item_sub_id` int(11) DEFAULT NULL,
`item_type` char(1) NOT NULL COMMENT 'D = DRUG; P = PROCEDURE; L = LAB',
`item_qty` float NOT NULL,
`item_price` float(22,2) NOT NULL,
`discount` float NOT NULL DEFAULT 0,
`is_active` int(11) NOT NULL DEFAULT 1,
PRIMARY KEY (`id`),
KEY `item_type` (`item_type`),
KEY `timestamp` (`timestamp`,`practice_id`),
KEY `practice_id` (`practice_id`),
KEY `item_id_2` (`item_id`,`item_sub_id`,`item_type`),
KEY `timestamp_2` (`timestamp`,`practice_id`,`item_id`,`item_sub_id`,`item_type`),
KEY `practice_id_3` (`practice_id`,`item_type`),
KEY `the_id` (`id`,`practice_id`) USING BTREE,
KEY `timestamp_3` (`timestamp`,`practice_id`,`item_type`,`item_comission`,
`item_comission_type`, `doctor_id`,`item_id`,`item_sub_id`,`id`) USING BTREE,
KEY `timestamp_4` (`timestamp`,`practice_id`,`item_id`,`item_sub_id`,`item_type`,
`item_comission_2`,`item_comission_2_type`,`doctor_id_2`,`id`) USING BTREE,
KEY `request_id` (`request_id`,`request_practice`),
KEY `timestamp_5` (`timestamp`,`practice_id`,`is_active`),
KEY `practice_id_6` (`practice_id`,`encounter_id`,`is_active`),
KEY `practice_id_7` (`practice_id`,`item_type`,`encounter_id`,`is_active`),
CONSTRAINT `practice_invoice_detail_ibfk_1` FOREIGN KEY (`timestamp`)
REFERENCES `practice_invoice_header` (`timestamp`) ON DELETE CASCADE,
CONSTRAINT `practice_invoice_detail_ibfk_2` FOREIGN KEY (`practice_id`)
REFERENCES `practice_place` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1447348 DEFAULT CHARSET=latin1
CREATE TABLE `ref_practice_patient` (
`practice_id` int(11) NOT NULL,
`patient_id` int(11) NOT NULL,
`reff_id` varchar(35) DEFAULT NULL,
`is_user` int(11) NOT NULL DEFAULT 0,
`parent_user_id` int(11) NOT NULL DEFAULT 0,
PRIMARY KEY (`practice_id`,`patient_id`),
KEY `patient_id` (`patient_id`),
KEY `reff_id` (`reff_id`),
KEY `practice_id` (`practice_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `practice_place` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(75) NOT NULL,
`statement` text DEFAULT NULL,
`address` varchar(200) NOT NULL,
`phone` varchar(15) NOT NULL,
`wa_number` varchar(15) DEFAULT NULL,
`fax` varchar(15) NOT NULL,
`email` varchar(50) NOT NULL,
`is_branch` int(11) NOT NULL,
`parent_id` int(11) NOT NULL,
`editted_by` int(11) DEFAULT NULL,
`editted_date` bigint(20) DEFAULT NULL,
`status` int(11) NOT NULL DEFAULT 1,
PRIMARY KEY (`id`),
KEY `parent_id` (`parent_id`),
KEY `reff_id` (`reff_id`)
) ENGINE=InnoDB AUTO_INCREMENT=29058 DEFAULT CHARSET=latin1
Below is the EXPLAIN output produced by the query; the row using filesort is the second one (PP):
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | PRIMARY | | ALL | NULL | NULL | NULL | NULL | 14028 | |
| 2 | DERIVED | PP | const | PRIMARY,parent_id | PRIMARY | 4 | const | 1 | Using temporary; Using filesort |
| 2 | DERIVED | PIH | ref | PRIMARY,practice_id_3,practice_id_5,practice_id_6,practice_id_8,pharm_read,lab_read,rad_read,patient_id | practice_id_5 | 4 | const | 7014 | Using where |
| 2 | DERIVED | RPP | eq_ref | PRIMARY,patient_id,practice_id,practice_id_2,practice_id_3 | PRIMARY | 8 | const,k6064619_lokadok.PIH.patient_id | 1 | |
| 2 | DERIVED | PID | ref | timestamp,practice_id,timestamp_2,practice_id_2,practice_id_3,timestamp_3,timestamp_4,practice_id_4,practice_id_5,timestamp_5,practice_id_6,practice_id_7 | timestamp | 12 | k6064619_lokadok.PIH.timestamp,const | 1 | |
| 2 | DERIVED | PMM | eq_ref | PRIMARY,id,practice_id | PRIMARY | 4 | k6064619_lokadok.PID.mcu_module_id | 1 | Using where |
| 2 | DERIVED | TL | ref | reff_id | reff_id | 12 | k6064619_lokadok.PIH.timestamp,const | 1 | Using where |
| 2 | DERIVED | PA | eq_ref | PRIMARY | PRIMARY | 4 | k6064619_lokadok.TL.admin_id | 1 | Using where |
| 2 | DERIVED | IP | ref | PRIMARY,id | PRIMARY | 22 | k6064619_lokadok.PIH.insurance_plan_id | 1 | Using where |
| 2 | DERIVED | PQL | ref | PRIMARY,encounter_id,cal_id_2 | encounter_id | 5 | k6064619_lokadok.PID.encounter_id | 2 | Using where; Using index |
| 2 | DERIVED | D | ref | doc_id,pp_id,id_2,pp_doc | doc_id | 4 | k6064619_lokadok.PQL.doctor_id | 1 | Using where |
I believe I have indexed parent_id in the practice_place table, and in ref_practice_patient the patient_id and practice_id columns form the PRIMARY KEY.
Why have the outer query? The Optimizer is free to shuffle the result of the inner query, thereby leaving the LIMIT to pick an ordering that you are not expecting. At least add ORDER BY; preferably also toss the outer SELECT.
Main Index
Let's analyze the likely place to design an index:
WHERE PIH.source <> 'P'
AND PIH.practice_id = 28699
AND PIH.is_active = 1
AND PQL.cal_id >= 201807010
AND PQL.cal_id <= 201807312
GROUP BY PIH.timestamp, PIH.practice_id
Since there is a mixture of tables involved, it is not possible to have an index that handles all the WHERE.
Since the tests are not all =, it is not possible to reach beyond the WHERE and include columns of the GROUP BY.
So, I see two indexes:
PIH: INDEX(practice_id, is_active, -- in either order
source)
PQL: INDEX(cal_id)
Since we can't get into the GROUP BY, the Optimizer has no choice but to gather all the rows based on WHERE, do some grouping, and do an ORDER BY (as I said, that is missing, but necessary).
Therefore, the GROUP BY and the ORDER BY will require one or two temps and filesorts. No, you can't get away from it, at least not without changing the query in some way. (Please note that "filesort" might actually be done in RAM.)
Your extra SELECT layer may be adding an extra temp and filesort.
EXPLAIN fails to point out when there are two sorts. EXPLAIN FORMAT=JSON has that sort of detail.
Other issues...
Having a timestamp in a PRIMARY KEY is risky unless you are sure that two rows cannot have the same timestamp, or there is another column in the PK to assure uniqueness.
Don't use FLOAT for money. It will incur extra rounding errors, and it cannot store more than about 7 significant digits (that's less than $100K to the penny). Don't use float(30,2); it is even worse because you are forcing an extra rounding. Use DECIMAL, but pick a reasonable precision, not 30: DECIMAL(30,2) takes 14 bytes, mostly a waste of space.
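The precision limit can be illustrated outside the database. The following Python sketch round-trips a made-up invoice total through a 4-byte IEEE 754 float (the storage size of a MySQL FLOAT) and compares it with an exact decimal; the amount and the helper name as_float32 are assumptions for illustration only.

```python
import struct
from decimal import Decimal


def as_float32(value):
    """Round-trip a number through a 4-byte IEEE 754 single-precision float."""
    return struct.unpack("f", struct.pack("f", value))[0]


amount = "1234567.89"                      # hypothetical invoice total
stored_float = as_float32(float(amount))   # what a 4-byte float can hold
stored_decimal = Decimal(amount)           # exact, like a DECIMAL column

print(stored_float)     # the cents are no longer exact
print(stored_decimal)   # unchanged
```

At this magnitude, neighbouring single-precision values are 0.125 apart, so the cents cannot be represented exactly, whereas the decimal value is preserved digit for digit.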
Whenever you have INDEX(a,b), you don't need INDEX(a); it is redundant and slows down (slightly) INSERTs.
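The redundancy follows from the leftmost-prefix property of B-tree indexes: a lookup on a alone can be served by an index on (a, b). The sketch below demonstrates the idea with Python's built-in sqlite3 module rather than MySQL, so it is only an analogy; the table t, its columns, and the index name idx_a_b are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT)"
)
# Only the composite index exists; there is no separate index on (a).
conn.execute("CREATE INDEX idx_a_b ON t (a, b)")

# Ask SQLite how it would run a query that filters on `a` alone.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT c FROM t WHERE a = 5"
).fetchall()

# The last column of each plan row is the human-readable detail text.
plan_text = " ".join(str(row[-1]) for row in plan_rows)
print(plan_text)
```

The plan names idx_a_b for the lookup on a, which is why an additional single-column index on a would add write overhead without helping reads. In MySQL, EXPLAIN plays the equivalent role for checking which index is chosen.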
LEFT JOIN transaction_log TL
ON TL.reff_id = PIH.timestamp
AND TL.practice_id = PIH.practice_id
AND TL.activity = "CHK"
needs
INDEX(reff_id, practice_id, activity) -- in any order
Also
INNER JOIN practice_invoice_detail PID ON PID.timestamp = PIH.timestamp
AND PID.practice_id = PIH.practice_id
PIH: INDEX(practice_id, timestamp) -- not the opposite order
PIH: INDEX(practice_id, is_active, timestamp)
INNER JOIN practice_queue_list PQL ON PQL.encounter_id = PID.encounter_id
AND PQL.practice_place_id = PIH.practice_id
PQL: INDEX(encounter_id, cal_id)
PQL: INDEX(encounter_id, practice_place_id, cal_id)
Some discussion...
In a JOIN, EXPLAIN shows one order of working through the tables; it gives you no clues of how things would be if it worked through the tables some other way.
I have attempted to show what index might be needed if PQL were used first, or if PIH were used first: namely, the index for that table's WHERE conditions, and then the optimal index for joining to the other table.
Probably the Optimizer will not start with any table not mentioned in the WHERE clause, but this is not a certainty.
I have not listed the optimal indexes for getting to each of the other tables.
More discussion: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

Slow MySql query with order by limit with index

I have a query generated by Entity Framework, that looks like this:
SELECT
`Extent1`.`Id`,
`Extent1`.`Name`,
`Extent1`.`ExpireAfterUTC`,
`Extent1`.`FileId`,
`Extent1`.`FileHash`,
`Extent1`.`PasswordHash`,
`Extent1`.`Size`,
`Extent1`.`TimeStamp`,
`Extent1`.`TimeStampOffset`
FROM `files` AS `Extent1` INNER JOIN `containers` AS `Extent2` ON `Extent1`.`ContainerId` = `Extent2`.`Id`
ORDER BY
`Extent1`.`Id` ASC LIMIT 0,10
It runs painfully slow.
I have indexes on files.Id (PK), files.ContainerId (FK) and containers.Id (PK), and I don't understand why MySQL seems to be doing a full sort before returning the required records, even though there is already an index on the Id column.
Furthermore, this data is displayed in a grid that supports filters, sorting and pagination, so good use of the indexes is essential.
Here are the table definitions:
CREATE TABLE `files` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`FileId` varchar(100) NOT NULL,
`ContainerId` int(11) NOT NULL,
`ContainerGuid` binary(16) NOT NULL,
`Guid` binary(16) NOT NULL,
`Name` varchar(1000) NOT NULL,
`ExpireAfterUTC` datetime DEFAULT NULL,
`PasswordHash` binary(32) DEFAULT NULL,
`FileHash` tinyblob NOT NULL,
`Size` bigint(20) NOT NULL,
`TimeStamp` double NOT NULL,
`TimeStampOffset` double NOT NULL,
`FilePostId` int(11) NOT NULL,
`FilePostGuid` binary(16) NOT NULL,
`AttributeId` int(11) NOT NULL,
PRIMARY KEY (`Id`),
UNIQUE KEY `FileId_UNIQUE` (`FileId`),
KEY `Files_ContainerId_FK` (`ContainerId`),
KEY `Files_AttributeId_FK` (`AttributeId`),
KEY `Files_FileId_index` (`FileId`),
KEY `Files_FilePostId_index` (`FilePostId`),
KEY `Files_Guid_index` (`Guid`),
CONSTRAINT `Files_AttributeId_FK` FOREIGN KEY (`AttributeId`) REFERENCES `attributes` (`Id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `Files_ContainerId_FK` FOREIGN KEY (`ContainerId`) REFERENCES `containers` (`Id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `Files_FilePostsId_FK` FOREIGN KEY (`FilePostId`) REFERENCES `fileposts` (`Id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=977942 DEFAULT CHARSET=utf8;
CREATE TABLE `containers` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`Name` varchar(255) NOT NULL,
`Guid` binary(16) NOT NULL,
`AesKey` binary(32) NOT NULL,
`FileCount` int(10) unsigned NOT NULL DEFAULT '0',
`Size` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`Id`),
KEY `Containers_Guid_index` (`Guid`),
KEY `Containers_Name_index` (`Name`)
) ENGINE=InnoDB AUTO_INCREMENT=76 DEFAULT CHARSET=utf8;
You will notice there are some other relationships in the files table, which I have left out just to simplify the query without affecting the observed behavior.
Here is also an output from EXPLAIN EXTENDED:
+----+-------------+---------+-------+----------------------+-----------------------+---------+----------------------------------+-------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+-------+----------------------+-----------------------+---------+----------------------------------+-------+----------+----------------------------------------------+
| 1 | SIMPLE | Extent2 | index | PRIMARY | Containers_Guid_index | 16 | NULL | 9 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | Extent1 | ref | Files_ContainerId_FK | Files_ContainerId_FK | 4 | netachmentgeneraltest.Extent2.Id | 73850 | 100.00 | |
+----+-------------+---------+-------+----------------------+-----------------------+---------+----------------------------------+-------+----------+----------------------------------------------+
Files table has ~900000 records (and counting) and containers has 9.
This issue only occurs when ORDER BY is present.
Also, I can't do much in terms of modifying the query because it is generated by Entity Framework. I did as much as I could with the LINQ query in order to simplify it (at first it had some horrible sub queries which executed even slower).
Query hints (as in force index) are not a solution here either, because EF does not support such features.
I am mostly hoping to find some database level optimizations to do.
For those who didn't spot the tags, the database in question is MySql.
MySQL generally uses only one index per table in a query (index merge is the main exception). Right now, it's preferring to use the foreign key index so the join is efficient, but that means that the sort is not using an index.
Try creating a compound index on ContainerId, filedID
This is essentially your query:
SELECT e1.*
FROM `files` e1 INNER JOIN
`containers` e2
ON e1.`ContainerId` = e2.`Id`
ORDER BY e1.`Id` ASC
LIMIT 0, 10;
You can try an index on files(id, ContainerId). This might inspire MySQL to use the composite index, focused on the order by.
It would probably be more likely if the query were phrased as:
SELECT e1.*
FROM `files` e1
WHERE EXISTS (SELECT 1 FROM containers e2 WHERE e1.`ContainerId` = e2.`Id`)
ORDER BY e1.`Id` ASC
LIMIT 0, 10;
There is one way that does work to use the indexes. However, it depends on something in MySQL that is not documented to work (although it does in practice). The following will read the data in order, but it incurs the overhead of materializing the subquery -- but not for a sort:
SELECT e1.*
FROM (SELECT e1.*
FROM files e1
ORDER BY e1.id ASC
) e1
WHERE EXISTS (SELECT 1 FROM containers e2 WHERE e1.`ContainerId` = e2.`Id`)
LIMIT 0, 10;

Collect data from 2 different tables, following conditions

I have 2 MySQL tables. One is pastsergicalhistory_type and the other one is pastsurgicalhistory
Below is pastsergicalhistory_type
CREATE TABLE `pastsergicalhistory_type` (
`idPastSergicalHistory_Type` int(11) NOT NULL AUTO_INCREMENT,
`idUser` int(11) DEFAULT NULL,
`Name` varchar(45) NOT NULL,
PRIMARY KEY (`idPastSergicalHistory_Type`),
KEY `fk_PastSergicalHistory_Type_User1_idx` (`idUser`),
CONSTRAINT `fk_PastSergicalHistory_Type_User1` FOREIGN KEY (`idUser`) REFERENCES `user` (`idUser`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=13 DEFAULT CHARSET=utf8
Below is pastsurgicalhistory
CREATE TABLE `pastsurgicalhistory` (
`idPastSurgicalHistory` int(11) NOT NULL AUTO_INCREMENT,
`idPatient` int(11) NOT NULL,
`idPastSergicalHistory_Type` int(11) NOT NULL,
`Comment` varchar(45) DEFAULT NULL,
`ActiveStatus` tinyint(1) NOT NULL,
`LastUpdated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`idPastSurgicalHistory`),
KEY `fk_PastSurgicalHistory_Patient1_idx` (`idPatient`),
KEY `fk_PastSurgicalHistory_PastSergicalHistory_Type1_idx` (`idPastSergicalHistory_Type`),
CONSTRAINT `fk_PastSurgicalHistory_PastSergicalHistory_Type1` FOREIGN KEY (`idPastSergicalHistory_Type`) REFERENCES `pastsergicalhistory_type` (`idPastSergicalHistory_Type`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `fk_PastSurgicalHistory_Patient1` FOREIGN KEY (`idPatient`) REFERENCES `patient` (`idPatient`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8
My requirement is as follows, in point form:
1. Get all the data from pastsergicalhistory_type where idUser is NULL or idUser is 1.
2. Get all the data from pastsurgicalhistory where idPatient is 2.
As you can see, the foreign key of pastsurgicalhistory is the primary key of pastsergicalhistory_type.
I tried the query below, but it gave me the wrong results. It only displayed what is available in pastsurgicalhistory. The data in pastsergicalhistory_type that satisfies the condition in point 1 but is not in pastsurgicalhistory is not displayed.
SELECT pastsergicalhistory_type.*,
pastsurgicalhistory.*
FROM pastsergicalhistory_type
LEFT JOIN pastsurgicalhistory ON pastsurgicalhistory.`idPastSergicalHistory_Type` = pastsergicalhistory_type.`idPastSergicalHistory_Type`
WHERE pastsergicalhistory_type.idUser = NULL OR pastsergicalhistory_type.idUser=1 AND pastsurgicalhistory.idPatient=2
So, how can I solve this problem?
EDIT
If I use AND pastsurgicalhistory.idPatient=2 in my WHERE clause, it filters the entire result set, so I only get rows related to idPatient 2. But as I mentioned, I also need the data that is not available in the pastsurgicalhistory table.
Try
SELECT pastsergicalhistory_type.*,
pastsurgicalhistory.*
FROM pastsergicalhistory_type
LEFT JOIN pastsurgicalhistory ON
(pastsurgicalhistory.`idPastSergicalHistory_Type` =
pastsergicalhistory_type.`idPastSergicalHistory_Type` and
pastsurgicalhistory.idPatient=2)
WHERE (pastsergicalhistory_type.idUser IS NULL OR
pastsergicalhistory_type.idUser=1) ;
Move pastsurgicalhistory.idPatient=2 to the join condition:
SELECT pastsergicalhistory_type.*,
pastsurgicalhistory.*
FROM pastsergicalhistory_type
LEFT JOIN pastsurgicalhistory ON pastsurgicalhistory.`idPastSergicalHistory_Type` = pastsergicalhistory_type.`idPastSergicalHistory_Type`
AND pastsurgicalhistory.idPatient=2
WHERE pastsergicalhistory_type.idUser IS NULL OR pastsergicalhistory_type.idUser=1
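The difference between filtering in the ON clause and in the WHERE clause can be checked with a small experiment. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (LEFT JOIN semantics are the same here), with trimmed-down versions of the two tables and made-up sample rows; the aliases t and h are only for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pastsergicalhistory_type (
    idPastSergicalHistory_Type INTEGER PRIMARY KEY,
    idUser INTEGER,
    Name TEXT NOT NULL
);
CREATE TABLE pastsurgicalhistory (
    idPastSurgicalHistory INTEGER PRIMARY KEY,
    idPatient INTEGER NOT NULL,
    idPastSergicalHistory_Type INTEGER NOT NULL
);
-- Made-up sample rows
INSERT INTO pastsergicalhistory_type VALUES
    (1, NULL, 'Appendectomy'), (2, 1, 'Tonsillectomy'), (3, 7, 'Other user');
INSERT INTO pastsurgicalhistory VALUES (10, 2, 1), (11, 5, 2);
""")

# Patient filter in ON: unmatched types survive with NULLs on the right side.
filter_in_on = conn.execute("""
    SELECT t.Name, h.idPatient
    FROM pastsergicalhistory_type t
    LEFT JOIN pastsurgicalhistory h
           ON h.idPastSergicalHistory_Type = t.idPastSergicalHistory_Type
          AND h.idPatient = 2
    WHERE t.idUser IS NULL OR t.idUser = 1
    ORDER BY t.idPastSergicalHistory_Type
""").fetchall()

# Patient filter in WHERE: rows without a matching patient are discarded.
filter_in_where = conn.execute("""
    SELECT t.Name, h.idPatient
    FROM pastsergicalhistory_type t
    LEFT JOIN pastsurgicalhistory h
           ON h.idPastSergicalHistory_Type = t.idPastSergicalHistory_Type
    WHERE (t.idUser IS NULL OR t.idUser = 1) AND h.idPatient = 2
    ORDER BY t.idPastSergicalHistory_Type
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```

With the filter in ON, the 'Tonsillectomy' type is still listed (with no patient data); with the filter in WHERE, it disappears, which is the behaviour described in the question.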
Use parentheses?
WHERE pastsergicalhistory_type.idUser = NULL OR pastsergicalhistory_type.idUser=1 AND pastsurgicalhistory.idPatient=2
I believe this would return rows where idUser is 1 and idPatient is 2, or where idUser is NULL, because AND binds more tightly than OR. (Also note that idUser = NULL never matches; use IS NULL.)
Try this:
WHERE (pastsergicalhistory_type.idUser IS NULL OR pastsergicalhistory_type.idUser=1) AND pastsurgicalhistory.idPatient=2
If I understand you correctly?
SELECT pastsergicalhistory_type.*,
pastsurgicalhistory.*
FROM pastsergicalhistory_type
RIGHT JOIN pastsurgicalhistory ON pastsurgicalhistory.`idPastSergicalHistory_Type` = pastsergicalhistory_type.`idPastSergicalHistory_Type`
WHERE (pastsergicalhistory_type.idUser IS NULL OR pastsergicalhistory_type.idUser=1) AND pastsurgicalhistory.idPatient=2
Even if it works without parentheses for you, I would say it's better to use them to make the query more readable.
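The precedence point can be checked with a tiny experiment. This is only a sketch: it uses Python's sqlite3 module rather than MySQL (AND binds more tightly than OR in both), a hypothetical one-table layout named t, and made-up rows. It also uses IS NULL, since a comparison with = NULL never matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (idUser INTEGER, idPatient INTEGER)")
# Made-up (idUser, idPatient) combinations covering each case.
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(None, 2), (None, 3), (1, 2), (1, 3)],
)

# Parsed as: idUser IS NULL OR (idUser = 1 AND idPatient = 2)
without_parens = conn.execute("""
SELECT idUser, idPatient FROM t
WHERE idUser IS NULL OR idUser = 1 AND idPatient = 2
ORDER BY rowid
""").fetchall()

# Explicit grouping: the patient filter now applies to both user cases.
with_parens = conn.execute("""
SELECT idUser, idPatient FROM t
WHERE (idUser IS NULL OR idUser = 1) AND idPatient = 2
ORDER BY rowid
""").fetchall()

print(without_parens)  # [(None, 2), (None, 3), (1, 2)]
print(with_parens)     # [(None, 2), (1, 2)]
```

Without parentheses the row (NULL, 3) slips through, because the idPatient condition is attached only to the idUser = 1 branch.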

MySQL SELECT DISTINCT with multiple JOINS

I'm trying to SELECT, and get a unique result set, from a MySQL database, as shown below. My problem is, I think, I don't understand LEFT Joins well enough. Or, maybe I need to use a different Join approach.
Here's a description of the database.
tblAdult (Adults) have x number of tblChild (Children), and use a cross-ref table called tbladultchildxref. This table has an f-key to both Adult and Child. I have to use an x-ref table, because there's a many-to-many relationship between these two tables, and there's other data that's kept in the x-ref, which I have removed for simplicity.
In turn, each Child belongs to a Program (tblprogram).
Each Program has x number of Cameras (tblCamera). Again, I have to use an x-ref table between tblProgram and tblCamera due to a many-to-many relationship, and other reasons.
What I am trying to get at, is a unique list of Cameras for a given Parent.
For example, Parent 675 has three children, Child ID's 789,788, and 789. Those three children, in turn, belong to Program ID's 4, 5, and 6.
Program ID 4 has Camera ID's 1,2,3
Program ID 5 has Camera ID's 4,5,6
Program ID 6 has Camera ID's 1,6,7,8
What I would like the result set to be is 1,2,3,4,5,6,7,8
I have tried different combinations of SELECT DISTINCT, LEFT JOINS on the various x-ref tables, etc. but I just can't seem to get it.
My other problem, along the way, is that I need to check that the "Active" fields in Adult, Child, and Program equal 1 (true) for the result set.
Thanks in advance.
CREATE TABLE `tbladult` (
`pkAdultID` int(11) NOT NULL AUTO_INCREMENT,
`fldAdultActive` tinyint(1) DEFAULT '1',
`fldAdultLogin` varchar(30) DEFAULT NULL,
`fldAdultPassword` varchar(45) DEFAULT NULL,
`fldAdultFirstName` varchar(60) DEFAULT NULL,
`fldAdultLastName` varchar(60) DEFAULT NULL,
PRIMARY KEY (`pkAdultID`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin1;
/*Table structure for table `tblchild` */
CREATE TABLE `tblchild` (
`pkChildID` int(11) NOT NULL AUTO_INCREMENT,
`fldChildActive` tinyint(4) DEFAULT NULL,
`fldChildFirstName` varchar(45) DEFAULT NULL,
`fldChildLastName` varchar(45) DEFAULT NULL,
`fkChildProgram` int(1) DEFAULT NULL,
PRIMARY KEY (`pkChildID`),
KEY `FK_tblchild` (`fkChildProgram`),
CONSTRAINT `FK_tblchild` FOREIGN KEY (`fkChildProgram`) REFERENCES `tblprogram` (`pkProgramID`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1;
/*Table structure for table `tbladultchildxref` */
CREATE TABLE `tbladultchildxref` (
`pkAdultChildxRefID` int(11) NOT NULL AUTO_INCREMENT,
`fldAdultChildxRefActive` tinyint(1) DEFAULT '1',
`fkAdultID` int(11) DEFAULT NULL,
`fkChildID` int(11) DEFAULT NULL,
PRIMARY KEY (`pkAdultChildxRefID`),
KEY `FK_tbladultchildxref` (`fkAdultID`),
KEY `FK_tbladultchildxref2` (`fkChildID`),
CONSTRAINT `FK_tbladultchildxref` FOREIGN KEY (`fkAdultID`) REFERENCES `tbladult` (`pkAdultID`),
CONSTRAINT `FK_tbladultchildxref2` FOREIGN KEY (`fkChildID`) REFERENCES `tblchild` (`pkChildID`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1;
/*Table structure for table `tblprogram` */
CREATE TABLE `tblprogram` (
`pkProgramID` int(11) NOT NULL AUTO_INCREMENT,
`fldProgamActive` tinyint(1) DEFAULT '1',
`fldProgramName` varchar(50) DEFAULT NULL,
PRIMARY KEY (`pkProgramID`)
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=latin1;
/*Table structure for table `tblcamera` */
CREATE TABLE `tblcamera` (
`pkCameraID` int(11) NOT NULL AUTO_INCREMENT,
`fldCameraName` varchar(50) DEFAULT NULL,
`fldCameralocation` varchar(50) DEFAULT NULL,
`fldCameraURL` varchar(250) DEFAULT NULL,
PRIMARY KEY (`pkCameraID`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=latin1;
/*Table structure for table `tblprogramcameraxref` */
CREATE TABLE `tblprogramcameraxref` (
`pkProgramCameraXrefID` int(11) NOT NULL AUTO_INCREMENT,
`fkProgramID` int(11) DEFAULT NULL,
`fkCameraID` int(11) DEFAULT NULL,
PRIMARY KEY (`pkProgramCameraXrefID`),
KEY `FK_tblprogramcameraxref` (`fkProgramID`),
KEY `FK_camerasforprograms` (`fkCameraID`),
CONSTRAINT `FK_camerasforprograms` FOREIGN KEY (`fkCameraID`) REFERENCES `tblcamera` (`pkCameraID`),
CONSTRAINT `FK_tblprogramcameraxref` FOREIGN KEY (`fkProgramID`) REFERENCES `tblprogram` (`pkProgramID`)
);
No LEFT JOINs necessary:
SELECT DISTINCT tblprogramcameraxref.fkcameraid
FROM tblprogramcameraxref
JOIN tblprogram ON tblprogramcameraxref.fkprogramid = tblprogram.pkprogramid
AND tblprogram.fldProgamActive = 1
JOIN tblchild ON tblprogramcameraxref.fkprogramid = tblchild.fkchildprogram
AND tblchild.fldChildActive = 1
JOIN tbladultchildxref ON tblchild.pkchildid = tbladultchildxref.fkchildid
AND tbladultchildxref.fldAdultChildxRefActive = 1
WHERE tbladultchildxref.fkadultid = 675
Also, you may want to check the fkChildProgram int(1) DEFAULT NULL, in tblchild - the column it references is defined as int(11)
At this point you shouldn't really need to check if Adult is active (since that's the search criteria you started with), but if you must - just add this to the end of the join list:
JOIN tbladult ON tbladultchildxref.fkadultid = tbladult.pkadultid
AND tbladult.fldAdultActive = 1
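As a rough check of the join chain above, the following sketch loads the example from the question into an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for MySQL here, the tables are reduced to the columns the query touches, and the child IDs 787 to 789, the extra child 790, the inactive program 7 and its camera 9 are made up to exercise the Active checks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbladult (pkAdultID INTEGER PRIMARY KEY, fldAdultActive INTEGER);
CREATE TABLE tblprogram (pkProgramID INTEGER PRIMARY KEY, fldProgamActive INTEGER);
CREATE TABLE tblchild (pkChildID INTEGER PRIMARY KEY, fldChildActive INTEGER,
                       fkChildProgram INTEGER);
CREATE TABLE tbladultchildxref (fkAdultID INTEGER, fkChildID INTEGER,
                                fldAdultChildxRefActive INTEGER);
CREATE TABLE tblprogramcameraxref (fkProgramID INTEGER, fkCameraID INTEGER);
""")

conn.execute("INSERT INTO tbladult VALUES (675, 1)")
# Programs 4-6 are active; program 7 is a made-up inactive program.
conn.executemany("INSERT INTO tblprogram VALUES (?, ?)",
                 [(4, 1), (5, 1), (6, 1), (7, 0)])
# Hypothetical child IDs, one child per program.
conn.executemany("INSERT INTO tblchild VALUES (?, ?, ?)",
                 [(787, 1, 4), (788, 1, 5), (789, 1, 6), (790, 1, 7)])
conn.executemany("INSERT INTO tbladultchildxref VALUES (?, ?, ?)",
                 [(675, 787, 1), (675, 788, 1), (675, 789, 1), (675, 790, 1)])
# Program-to-camera links from the question, plus camera 9 for program 7.
cameras = {4: [1, 2, 3], 5: [4, 5, 6], 6: [1, 6, 7, 8], 7: [9]}
conn.executemany("INSERT INTO tblprogramcameraxref VALUES (?, ?)",
                 [(p, c) for p, cams in cameras.items() for c in cams])

rows = conn.execute("""
SELECT DISTINCT x.fkCameraID
FROM tblprogramcameraxref x
JOIN tblprogram p ON x.fkProgramID = p.pkProgramID
                 AND p.fldProgamActive = 1
JOIN tblchild c ON x.fkProgramID = c.fkChildProgram
               AND c.fldChildActive = 1
JOIN tbladultchildxref ac ON c.pkChildID = ac.fkChildID
                         AND ac.fldAdultChildxRefActive = 1
JOIN tbladult a ON ac.fkAdultID = a.pkAdultID
               AND a.fldAdultActive = 1
WHERE ac.fkAdultID = 675
ORDER BY x.fkCameraID
""").fetchall()

camera_ids = [r[0] for r in rows]
print(camera_ids)  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Cameras 1 and 6 are shared by two programs but appear once thanks to DISTINCT, and camera 9 is excluded because its program is not active.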
It is a long description. If I have understood the question correctly this query should help you -
SELECT DISTINCT pcref.fkCameraID
FROM tbladult adult,
tblchild child,
tbladultchildxref acref,
tblprogram prog,
tblcamera camera,
tblprogramcameraxref pcref
WHERE adult.pkAdultID = 675
AND adult.fldAdultActive = TRUE
AND adult.pkAdultID = acref.fkAdultID
AND acref.fkChildID = child.pkChildID
AND child.fldChildActive = TRUE
AND child.fkChildProgram = prog.pkProgramID
AND prog.fldProgamActive = TRUE
AND prog.pkProgramID = pcref.fkProgramID