I have found a very long-running query in my system. The MySQL slow log reports the following:
# Time: 2018-07-08T18:47:02.273314Z
# User@Host: server[server] @ localhost [] Id: 1467
# Query_time: 97.251247 Lock_time: 0.000210 Rows_sent: 50 Rows_examined: 41646378
SET timestamp=1531075622;
SELECT n1.full_name AS sender_full_name, s1.email AS sender_email,
e.subject, e.body, e.attach, e.date, e.id, r.status,
n2.full_name AS receiver_full_name, s2.email AS receiver_email,
r.basket
FROM email_routing r
JOIN email e ON e.id = r.message_id
JOIN people_emails s1 ON s1.id = r.sender_email_id
JOIN people n1 ON n1.id = s1.people_id
JOIN people_emails s2 ON s2.id = r.receiver_email_id
JOIN people n2 ON n2.id = s2.people_id
WHERE r.sender_email_id = 21897 ORDER BY e.date desc LIMIT 0, 50;
EXPLAIN shows no full table scan, and the query is using indexes:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE s1 NULL const PRIMARY PRIMARY 4 const 1 100.00 Using temporary; Using filesort
1 SIMPLE n1 NULL const PRIMARY,ppl PRIMARY 4 const 1 100.00 NULL
1 SIMPLE n2 NULL index PRIMARY,ppl ppl 771 NULL 1 100.00 Using index
1 SIMPLE s2 NULL index PRIMARY s2 771 NULL 3178 10.00 Using where; Using index; Using join buffer (Block Nested Loop)
1 SIMPLE r NULL ref bk1,bk2,msgid bk1 4 server.s2.id 440 6.60 Using where; Using index
1 SIMPLE e NULL eq_ref PRIMARY PRIMARY 4 server.r.message_id 1 100.00 NULL
Here are the SHOW CREATE TABLE statements for the tables involved:
CREATE TABLE `email_routing` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`message_id` int(11) NOT NULL,
`sender_email_id` int(11) NOT NULL,
`receiver_email_id` int(11) NOT NULL,
`basket` int(11) NOT NULL,
`status` int(11) NOT NULL,
`popup` int(11) NOT NULL DEFAULT '0',
`tm` int(11) NOT NULL DEFAULT '0',
KEY `id` (`id`),
KEY `bk1` (`receiver_email_id`,`status`,`sender_email_id`,`message_id`,`basket`),
KEY `bk2` (`sender_email_id`,`tm`),
KEY `msgid` (`message_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1055796 DEFAULT CHARSET=utf8
-
CREATE TABLE `email` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`subject` text NOT NULL,
`body` text NOT NULL,
`date` datetime NOT NULL,
`attach` text NOT NULL,
`attach_dir` varchar(255) CHARACTER SET cp1251 DEFAULT NULL,
`attach_subject` varchar(255) DEFAULT NULL,
`attach_content` longtext,
`sphinx_synced` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `Index_2` (`attach_dir`),
KEY `dt` (`date`)
) ENGINE=InnoDB AUTO_INCREMENT=898001 DEFAULT CHARSET=utf8
-
CREATE TABLE `people_emails` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`nick` varchar(255) NOT NULL,
`email` varchar(255) NOT NULL,
`key_name` varchar(255) NOT NULL,
`people_id` int(11) NOT NULL,
`status` int(11) NOT NULL DEFAULT '0',
`activity` int(11) NOT NULL,
`internal_user_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `s2` (`email`,`people_id`)
) ENGINE=InnoDB AUTO_INCREMENT=22146 DEFAULT CHARSET=utf8
-
CREATE TABLE `people` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`fname` varchar(255) CHARACTER SET cp1251 NOT NULL,
`lname` varchar(255) CHARACTER SET cp1251 NOT NULL,
`patronymic` varchar(255) CHARACTER SET cp1251 NOT NULL,
`gender` tinyint(1) NOT NULL,
`full_name` varchar(255) NOT NULL DEFAULT ' ',
`category` int(11) NOT NULL,
`people_type_id` int(255) DEFAULT NULL,
`tags` varchar(255) CHARACTER SET cp1251 NOT NULL,
`job` varchar(255) CHARACTER SET cp1251 NOT NULL,
`post` varchar(255) CHARACTER SET cp1251 NOT NULL,
`profession` varchar(255) CHARACTER SET cp1251 DEFAULT NULL,
`zip` varchar(16) CHARACTER SET cp1251 NOT NULL,
`country` int(11) DEFAULT NULL,
`region` varchar(10) NOT NULL,
`city` varchar(255) CHARACTER SET cp1251 NOT NULL,
`address` varchar(255) CHARACTER SET cp1251 NOT NULL,
`address_date` date DEFAULT NULL,
`last_update_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `ppl` (`id`,`full_name`)
) ENGINE=InnoDB AUTO_INCREMENT=415040 DEFAULT CHARSET=utf8
Here is the SHOW TABLE STATUS output for those 4 tables:
Name Engine Version Row_format Rows Avg_row_length Data_length Max_data_length Index_length Data_free Auto_increment
email InnoDB 10 Dynamic 753748 12079 9104785408 0 61112320 4194304 898167
email_routing InnoDB 10 Dynamic 900152 61 55132160 0 69419008 6291456 1056033
people InnoDB 10 Dynamic 9538 386 3686400 0 2785280 4194304 415040
people_emails InnoDB 10 Dynamic 3178 752 2392064 0 98304 4194304 22146
MySQL version 5.7.22 on Ubuntu 16.04.
However, I have noticed one thing: if I take the ORDER BY out of the query but leave the LIMIT, the query runs almost instantly, taking no more than 0.2 seconds. So I started thinking about running the query without ORDER BY and sorting in PHP instead, but that eventually seems too complicated, because with LIMIT and no ORDER BY I get the wrong range of rows to sort.
Is there anything else I could do to speed up or optimize that query?
AS AN ALTERNATIVE I could do the sorting and paging in my PHP code. I add an additional column to the SELECT, ..., UNIX_TIMESTAMP(e.date) AS ts, and then do:
<?php
// ...
$main_query = $server->query($query);
$emails_list = $main_query->fetch_all(MYSQLI_ASSOC);
// Newest first, to match ORDER BY e.date DESC (ts is numeric)
function cmp($a, $b) {
    return $b['ts'] - $a['ts'];
}
usort($emails_list, "cmp"); // sorts in place; returns bool, not the sorted array
for ($i = $start; $i < $length; $i++)
{
    $single_email = $emails_list[$i];
    // Format the output
}
But when I do that I get
Fatal error: Allowed memory size of 134217728 bytes exhausted
at the usort($emails_list, "cmp") line.
A caveat: I'm not very familiar with MySQL; in fact, I'm mostly projecting MSSQL experience onto things I have (mostly) read about MySQL.
1) Potential workaround: is it safe to assume that email.id and email.date are always in the same order? From a functional point of view this seems logical, as emails get added to the table over time and thus have an ever-increasing auto-number... but maybe the initial load of the data was in a different/random order? Anyway, if the assumption holds, what happens if you ORDER BY e.id instead of ORDER BY e.date?
2) Does adding a composite index on email (id, date) (in that order!) help? (See the sketch after point 3 for the DDL.)
3) If all of that does not help, splitting the query into two parts might help out the optimizer. (You may need to fix the syntax for MySQL.)
-- Locate what we want first
CREATE TEMPORARY TABLE results (
SELECT e.id,
       r.sender_email_id,  -- carry the routing columns the second query needs
       r.receiver_email_id,
       r.status,
       r.basket
FROM email_routing r
JOIN email e ON e.id = r.message_id
WHERE r.sender_email_id = 21897
ORDER BY e.date desc LIMIT 0, 50 );
-- Again, having an index on email (id, date) seems like a good idea to me
-- (As a test you may want to add an index on results (id) here, shouldn't take long and
-- in MSSQl it would help build a better query plan, can't tell with MySQL)
-- return actual results
SELECT n1.full_name AS sender_full_name,
s1.email AS sender_email,
e.subject, e.body, e.attach, e.date, e.id, r.status,
n2.full_name AS receiver_full_name,
s2.email AS receiver_email,
r.basket
FROM results r
JOIN email e ON e.id = r.id
JOIN people_emails s1 ON s1.id = r.sender_email_id
JOIN people n1 ON n1.id = s1.people_id
JOIN people_emails s2 ON s2.id = r.receiver_email_id
JOIN people n2 ON n2.id = s2.people_id
ORDER BY e.date desc
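For reference, here is the DDL for the composite index suggested in point 2 (the index name is my own choice):
ALTER TABLE email ADD INDEX id_date (id, date);
For point 1, you would simply change the last line of the original query to ORDER BY e.id desc LIMIT 0, 50;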
If your data comes back that quickly without the ORDER BY, how about wrapping it? But how many rows would actually be returned WITHOUT the LIMIT? Maybe you would still get better performance with something such as...
select PQ.*
from ( YourQueryWithoutOrderByAndLimit ) PQ
order by PQ.date desc
LIMIT 0, 50;
I suspect this is a case where the MySQL Join Optimizer overestimates the benefits of Block Nested Loop (BNL) join. You can try to turn off BNL by doing:
set optimizer_switch='block_nested_loop=off';
Hopefully this will provide a better join order. You could also try:
set optimizer_prune_level = 0;
to force the join optimizer to explore all possible join orders.
Another option is to use STRAIGHT_JOIN to force a particular join order. In this case, it seems the order as specified in the query text would be good. Hence, to force this particular join order you could write
SELECT STRAIGHT_JOIN ...
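Spelled out against the original query, that would be:
SELECT STRAIGHT_JOIN n1.full_name AS sender_full_name, s1.email AS sender_email,
e.subject, e.body, e.attach, e.date, e.id, r.status,
n2.full_name AS receiver_full_name, s2.email AS receiver_email,
r.basket
FROM email_routing r
JOIN email e ON e.id = r.message_id
JOIN people_emails s1 ON s1.id = r.sender_email_id
JOIN people n1 ON n1.id = s1.people_id
JOIN people_emails s2 ON s2.id = r.receiver_email_id
JOIN people n2 ON n2.id = s2.people_id
WHERE r.sender_email_id = 21897 ORDER BY e.date desc LIMIT 0, 50;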
Note that whatever you do, you cannot expect the query to be as fast as it is without the ORDER BY. As long as you need to find the latest emails from a particular sender, and there is no sender information in the email table, it is not possible to use an index to avoid sorting without going through all emails from all senders. Things would be different if you had date information in the email_routing table; then an index on that table could be used to avoid sorting.
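For example (a sketch only; the column and index names are mine, and it assumes you can afford to denormalize the date into email_routing and keep it in sync):
ALTER TABLE email_routing ADD COLUMN e_date datetime NOT NULL;
UPDATE email_routing r JOIN email e ON e.id = r.message_id SET r.e_date = e.date;
ALTER TABLE email_routing ADD INDEX snd_date (sender_email_id, e_date);
-- ORDER BY r.e_date DESC can then read the 50 newest rows for a sender
-- straight from the index instead of sorting all of that sender's rows.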
MySQL cannot use an index for the ORDER BY in your query because:
The query joins many tables, and the columns in the ORDER BY are not
all from the first nonconstant table that is used to retrieve rows.
(This is the first table in the EXPLAIN output that does not have a
const join type.)
MySQL Order By Optimization
I'm trying to optimize my query speed as much as possible. A side problem is that I cannot see the exact query time, because it is rounded to a whole second. The query does return the expected result and takes about 1 second. The final query will be extended even further, and for this reason I am trying to improve it now. How can this query be improved?
The database models an electricity utility company. The query should eventually calculate an invoice. I basically have 4 tables: APX price, powerdeals, powerload, and eans_power.
APX price is an hourly price; powerload is a quarter-hour volume. The first step is joining these two together for each quarter of an hour.
The second step is selecting the EAN that is indicated in the table eans_power.
Finally, I join the powerdeals, which currently consist of only a single line indicating from which hour until which hour, and on which weekdays, a deal applies. It consists of an hourly volume and price. Currently it is only joined on the hours, but it will be extended to weekdays as well.
MySQL query:
SELECT l.DATE, l.PERIOD_FROM, a.PRICE, l.POWERLOAD,
SUM(a.PRICE*l.POWERLOAD), SUM(d.hourly_volume/4)
FROM timeseries.powerload l
INNER JOIN timeseries.apxprice a ON l.DATE = a.DATE
INNER JOIN contracts.eans_power c ON l.ean = c.ean
LEFT OUTER JOIN timeseries.powerdeals d ON d.period_from <= l.period_from
AND d.period_until >= l.period_until
WHERE l.PERIOD_FROM >= a.PERIOD_FROM
AND l.PERIOD_FROM < a.PERIOD_UNTIL
AND l.DATE >= '2018-01-01'
AND l.DATE <= '2018-12-31'
GROUP BY l.date
Explain:
1 SIMPLE c NULL system PRIMARY,ean NULL NULL NULL 1 100.00 Using temporary; Using filesort
1 SIMPLE l NULL ref EAN EAN 21 const 35481 11.11 Using index condition
1 SIMPLE d NULL ALL NULL NULL NULL NULL 1 100.00 Using where; Using join buffer (Block Nested Loop)
1 SIMPLE a NULL ref DATE DATE 4 timeseries.l.date 24 11.11 Using index condition
Create table queries:
apxprice
CREATE TABLE `apxprice` (
`apx_id` int(11) NOT NULL AUTO_INCREMENT,
`date` date DEFAULT NULL,
`period_from` time DEFAULT NULL,
`period_until` time DEFAULT NULL,
`price` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`apx_id`),
KEY `DATE` (`date`,`period_from`,`period_until`)
) ENGINE=MyISAM AUTO_INCREMENT=29664 DEFAULT CHARSET=latin1
powerdeals
CREATE TABLE `powerdeals` (
`deal_id` int(11) NOT NULL AUTO_INCREMENT,
`date_deal` date NOT NULL,
`start_date` date NOT NULL,
`end_date` date NOT NULL,
`weekday_from` int(11) NOT NULL,
`weekday_until` int(11) NOT NULL,
`period_from` time NOT NULL,
`period_until` time NOT NULL,
`hourly_volume` int(11) NOT NULL,
`price` int(11) NOT NULL,
`type_deal_id` int(11) NOT NULL,
`contract_id` int(11) NOT NULL,
PRIMARY KEY (`deal_id`)
) ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=latin1
powerload
CREATE TABLE `powerload` (
`powerload_id` int(11) NOT NULL AUTO_INCREMENT,
`ean` varchar(18) DEFAULT NULL,
`date` date DEFAULT NULL,
`period_from` time DEFAULT NULL,
`period_until` time DEFAULT NULL,
`powerload` int(11) DEFAULT NULL,
PRIMARY KEY (`powerload_id`),
KEY `EAN` (`ean`,`date`,`period_from`,`period_until`)
) ENGINE=MyISAM AUTO_INCREMENT=61039 DEFAULT CHARSET=latin1
eans_power
CREATE TABLE `eans_power` (
`ean` char(19) NOT NULL,
`contract_id` int(11) NOT NULL,
`invoicing_id` int(11) NOT NULL,
`street` varchar(255) NOT NULL,
`number` int(11) NOT NULL,
`affix` char(11) NOT NULL,
`postal` char(6) NOT NULL,
`city` varchar(255) NOT NULL,
PRIMARY KEY (`ean`),
KEY `ean` (`ean`,`contract_id`,`invoicing_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
Sample data tables
apx_prices
apx_id,date,period_from,period_until,price
1,2016-01-01,00:00:00,01:00:00,23.86
2,2016-01-01,01:00:00,02:00:00,22.39
powerdeals
deal_id,date_deal,start_date,end_date,weekday_from,weekday_until,period_from,period_until,hourly_volume,price,type_deal_id,contract_id
1,2019-05-15,2018-01-01,2018-12-31,1,5,08:00:00,20:00:00,1000,50,3,1
powerload
powerload_id,ean,date,period_from,period_until,powerload
1,871688520000xxxxxx,2018-01-01,00:00:00,00:15:00,9
2,871688520000xxxxxx,2018-01-01,00:15:00,00:30:00,11
eans_power
ean,contract_id,invoicing_id,street,number,affix,postal,city
871688520000xxxxxx,1,1,road,14,postal,city
Result, without sum() and group by:
DATE,PERIOD_FROM,PRICE,POWERLOAD,a.PRICE*l.POWERLOAD,d.hourly_volume/4,
2018-01-01,00:00:00,27.20,9,244.80,NULL
2018-01-01,00:15:00,27.20,11,299.20,NULL
Result, with sum() and group by:
DATE, PERIOD_FROM, PRICE, POWERLOAD, SUM(a.PRICE*l.POWERLOAD), SUM(d.hourly_volume/4)
2018-01-01,08:00:00,26.33,21,46193.84,12250.0000
2018-01-02, 08:00:00,47.95,43,90623.98,12250.0000
Preliminary optimizations:
Use InnoDB, not MyISAM.
Use CHAR only for constant-length strings.
Use consistent datatypes (see ean, for example).
For an alternative to timings that are rounded to the second, check out the Handler counts (see the sketch just below).
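The pattern for Handler counts uses standard MySQL statements; the counters show how many rows the engine actually touched, which is a repeatable measure of query cost even when the elapsed time rounds to a whole second:
FLUSH STATUS;
-- run the query under test here, then:
SHOW SESSION STATUS LIKE 'Handler%';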
Because range tests (such as l.PERIOD_FROM >= a.PERIOD_FROM AND l.PERIOD_FROM < a.PERIOD_UNTIL) are essentially impossible to optimize, I recommend you expand the table to have one entry per hour (or one per quarter hour, if necessary). Looking up a row via a key is much faster than scanning "ALL" of the table, and 9K rows for an entire year is trivial.
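For illustration only, and assuming apxprice rows always start on the hour: once each load row matches exactly one price row, the range test collapses into an equality join that the DATE index can use (the hour-truncation expression here is my own):
SELECT l.date, l.period_from, a.price, l.powerload
FROM timeseries.powerload l
INNER JOIN timeseries.apxprice a
        ON a.date = l.date
       AND a.period_from = SEC_TO_TIME(FLOOR(TIME_TO_SEC(l.period_from) / 3600) * 3600);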
When you get past these recommendations (and the Comments), I will have more tips on optimizing the indexes, especially InnoDB's PRIMARY KEY.
I have this query for which I think I have indexed everything properly, but I still get a filesort and a temporary table.
The query is as follows:
SELECT * FROM
(SELECT PIH.timestamp, PIH.practice_id, PIH.timestamp as invoice_num, PIH.custom_invnum,
CEIL(PIH.total_invoice + PIH.tax + PIH.other_bill) as grand_total, PIH.total_invoice, PIH.extra_charge_ph as extra_charge,
PIH.tax, PIH.other_bill, PIH.changed, PIH.source,
PIH.notes, PIH.is_active, PIH.paid as pay,
PIH.covered_amount, IF(PIH.is_active = 1, IF(PIH.total_invoice = 0 OR PIH.total_invoice + PIH.tax + PIH.other_bill - PIH.covered_amount <= PIH.paid, 1, IF(PIH.paid = 0, 0, 2)), '') as invoice_st,
RPP.patient_id, RPP.first_name as pfname, RPP.last_name as plname, RPP.dob as p_dob, RPP.gender as p_gender, RPP.reff_id as p_reff_id, RPP.mobile_number as p_mobile, IF(PIH.group_doctors IS NOT NULL, NULL, D.doc_title) as doc_title, IF(PIH.group_doctors IS NOT NULL,
PIH.group_doctors, D.first_name) as doc_fname, IF(PIH.group_doctors IS NOT NULL, PIH.group_doctors, D.last_name) as doc_lname, IF(PIH.group_doctors IS NOT NULL, NULL, D.spc_dsg) as spc_dsg, PA.username, TL.timestamp as checkout_time, IP.name as ip_name, PMM.timestamp as mcu_id
FROM practice_invoice_header PIH
INNER JOIN practice_invoice_detail PID ON PID.timestamp = PIH.timestamp
AND PID.practice_id = PIH.practice_id
INNER JOIN practice_queue_list PQL ON PQL.encounter_id = PID.encounter_id
AND PQL.practice_place_id = PIH.practice_id
INNER JOIN temp_search_view D ON D.id = PQL.doctor_id
AND D.pp_id = PQL.practice_place_id
INNER JOIN practice_place PP ON PP.id = PIH.practice_id
INNER JOIN ref_practice_patient RPP ON RPP.patient_id = PIH.patient_id
AND RPP.practice_id = PP.parent_id
LEFT JOIN practice_mcu_module PMM ON PMM.id = PID.mcu_module_id
AND PMM.practice_id = PID.practice_id
LEFT JOIN transaction_log TL ON TL.reff_id = PIH.timestamp
AND TL.practice_id = PIH.practice_id
AND TL.activity = "CHK"
LEFT JOIN practice_admin PA ON PA.id = TL.admin_id
LEFT JOIN insurance_plan IP ON IP.id = PIH.insurance_plan_id
WHERE PIH.source <> 'P'
AND PIH.practice_id = 28699
AND PIH.is_active = 1
AND PQL.cal_id >= 201807010
AND PQL.cal_id <= 201807312
GROUP BY PIH.timestamp, PIH.practice_id
) AS U LIMIT 0,20
NOTE: I only show some of the main tables used in this query, plus the ones that sort using filesort/temporary; if I posted everything it would be too much information.
The query produces a list of invoices; it has the header (practice_invoice_header) and the details (practice_invoice_detail), and it joins with the practice_place table.
CREATE TABLE `practice_invoice_header` (
`timestamp` bigint(20) NOT NULL,
`practice_id` int(11) NOT NULL,
`cal_id` int(11) NOT NULL,
`patient_id` int(11) NOT NULL DEFAULT 0,
`source` char(1) NOT NULL COMMENT 'E = ENCOUNTER; P = OTHER (PHARM / LAB)',
`total_invoice` float(30,2) NOT NULL DEFAULT 0.00,
`tax` float(30,2) NOT NULL DEFAULT 0.00,
`other_bill` float(30,2) NOT NULL DEFAULT 0.00,
`changed` float(30,2) NOT NULL DEFAULT 0.00,
`paid` float(30,2) NOT NULL DEFAULT 0.00,
`covered_amount` float(30,2) NOT NULL DEFAULT 0.00,
`notes` varchar(300) DEFAULT NULL,
`custom_invnum` varchar(30) DEFAULT NULL,
`insurance_plan_id` varchar(20) DEFAULT NULL,
`is_active` int(11) NOT NULL DEFAULT 1,
`cancel_reason` varchar(200) DEFAULT NULL,
PRIMARY KEY (`timestamp`,`practice_id`),
KEY `custom_invnum` (`custom_invnum`),
KEY `insurance_plan_id` (`insurance_plan_id`),
KEY `practice_id_3` (`practice_id`,`xxx_reff_id`),
KEY `ph_check_status` (`ph_checked_by`),
KEY `cal_id` (`cal_id`),
KEY `practice_id_5` (`practice_id`,`outpx_id`),
KEY `practice_id_6` (`practice_id`,`cal_id`,`source`,`is_active`),
KEY `total_invoice` (`total_invoice`),
KEY `patient_id` (`patient_id`),
CONSTRAINT `practice_invoice_header_ibfk_1` FOREIGN KEY (`practice_id`)
REFERENCES `practice_place` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `practice_invoice_detail` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`timestamp` bigint(20) NOT NULL,
`practice_id` int(11) NOT NULL,
`item_id` int(11) NOT NULL,
`item_sub_id` int(11) DEFAULT NULL,
`item_type` char(1) NOT NULL COMMENT 'D = DRUG; P = PROCEDURE; L = LAB',
`item_qty` float NOT NULL,
`item_price` float(22,2) NOT NULL,
`discount` float NOT NULL DEFAULT 0,
`is_active` int(11) NOT NULL DEFAULT 1,
PRIMARY KEY (`id`),
KEY `item_type` (`item_type`),
KEY `timestamp` (`timestamp`,`practice_id`),
KEY `practice_id` (`practice_id`),
KEY `item_id_2` (`item_id`,`item_sub_id`,`item_type`),
KEY `timestamp_2` (`timestamp`,`practice_id`,`item_id`,`item_sub_id`,`item_type`),
KEY `practice_id_3` (`practice_id`,`item_type`),
KEY `the_id` (`id`,`practice_id`) USING BTREE,
KEY `timestamp_3` (`timestamp`,`practice_id`,`item_type`,`item_comission`,
`item_comission_type`, `doctor_id`,`item_id`,`item_sub_id`,`id`) USING BTREE,
KEY `timestamp_4` (`timestamp`,`practice_id`,`item_id`,`item_sub_id`,`item_type`,
`item_comission_2`,`item_comission_2_type`,`doctor_id_2`,`id`) USING BTREE,
KEY `request_id` (`request_id`,`request_practice`),
KEY `timestamp_5` (`timestamp`,`practice_id`,`is_active`),
KEY `practice_id_6` (`practice_id`,`encounter_id`,`is_active`),
KEY `practice_id_7` (`practice_id`,`item_type`,`encounter_id`,`is_active`),
CONSTRAINT `practice_invoice_detail_ibfk_1` FOREIGN KEY (`timestamp`)
REFERENCES `practice_invoice_header` (`timestamp`) ON DELETE CASCADE,
CONSTRAINT `practice_invoice_detail_ibfk_2` FOREIGN KEY (`practice_id`)
REFERENCES `practice_place` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1447348 DEFAULT CHARSET=latin1
CREATE TABLE `ref_practice_patient` (
`practice_id` int(11) NOT NULL,
`patient_id` int(11) NOT NULL,
`reff_id` varchar(35) DEFAULT NULL,
`is_user` int(11) NOT NULL DEFAULT 0,
`parent_user_id` int(11) NOT NULL DEFAULT 0,
PRIMARY KEY (`practice_id`,`patient_id`),
KEY `patient_id` (`patient_id`),
KEY `reff_id` (`reff_id`),
KEY `practice_id` (`practice_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `practice_place` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(75) NOT NULL,
`statement` text DEFAULT NULL,
`address` varchar(200) NOT NULL,
`phone` varchar(15) NOT NULL,
`wa_number` varchar(15) DEFAULT NULL,
`fax` varchar(15) NOT NULL,
`email` varchar(50) NOT NULL,
`is_branch` int(11) NOT NULL,
`parent_id` int(11) NOT NULL,
`editted_by` int(11) DEFAULT NULL,
`editted_date` bigint(20) DEFAULT NULL,
`status` int(11) NOT NULL DEFAULT 1,
PRIMARY KEY (`id`),
KEY `parent_id` (`parent_id`),
KEY `reff_id` (`reff_id`)
) ENGINE=InnoDB AUTO_INCREMENT=29058 DEFAULT CHARSET=latin1
And below is the EXPLAIN produced by the query; I have highlighted the part using filesort (query block no. 2):
1 PRIMARY ALL NULL NULL NULL NULL 14028
2 DERIVED PP const PRIMARY,parent_id PRIMARY 4 const 1 Using temporary; Using filesort
2 DERIVED PIH ref PRIMARY,practice_id_3,practice_id_5,practice_id_6,practice_id_8,pharm_read,lab_read,rad_read,patient_id
practice_id_5 4 const 7014 Using where
2 DERIVED RPP eq_ref PRIMARY,patient_id,practice_id,practice_id_2,practice_id_3
PRIMARY 8 const,k6064619_lokadok.PIH.patient_id 1
2 DERIVED PID ref timestamp,practice_id,timestamp_2,practice_id_2,practice_id_3,timestamp_3,timestamp_4,practice_id_4,practice_id_5,timestamp_5,practice_id_6,practice_id_7
timestamp 12 k6064619_lokadok.PIH.timestamp,const 1
2 DERIVED PMM eq_ref PRIMARY,id,practice_id
PRIMARY 4 k6064619_lokadok.PID.mcu_module_id 1 Using where
2 DERIVED TL ref reff_id reff_id 12 k6064619_lokadok.PIH.timestamp,const 1 Using where
2 DERIVED PA eq_ref PRIMARY PRIMARY 4 k6064619_lokadok.TL.admin_id 1 Using where
2 DERIVED IP ref PRIMARY,id PRIMARY 22 k6064619_lokadok.PIH.insurance_plan_id 1 Using where
2 DERIVED PQL ref PRIMARY,encounter_id,cal_id_2
encounter_id 5 k6064619_lokadok.PID.encounter_id 2 Using where; Using index
2 DERIVED D ref doc_id,pp_id,id_2,pp_doc doc_id 4 k6064619_lokadok.PQL.doctor_id 1 Using where
I believe I have indexed parent_id in the practice_place table, and in ref_practice_patient the pair (practice_id, patient_id) is the PRIMARY KEY.
Why have the outer query? The Optimizer is free to shuffle the result of the inner query, thereby leaving the LIMIT to pick an ordering you are not expecting. At least add an ORDER BY; preferably also toss the outer SELECT. (A sketch follows.)
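A sketch of that reshaping, trimmed here to the joins that carry the WHERE conditions (re-add the remaining joins and select columns from your original query); the ORDER BY choice is mine, so pick whatever ordering the page actually needs:
SELECT PIH.timestamp, PIH.practice_id -- plus the rest of the select list
FROM practice_invoice_header PIH
INNER JOIN practice_invoice_detail PID ON PID.timestamp = PIH.timestamp
AND PID.practice_id = PIH.practice_id
INNER JOIN practice_queue_list PQL ON PQL.encounter_id = PID.encounter_id
AND PQL.practice_place_id = PIH.practice_id
WHERE PIH.source <> 'P'
AND PIH.practice_id = 28699
AND PIH.is_active = 1
AND PQL.cal_id >= 201807010
AND PQL.cal_id <= 201807312
GROUP BY PIH.timestamp, PIH.practice_id
ORDER BY PIH.timestamp DESC, PIH.practice_id -- makes the paging deterministic
LIMIT 0, 20;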
Main Index
Let's analyze the likely place to design an index:
WHERE PIH.source <> 'P'
AND PIH.practice_id = 28699
AND PIH.is_active = 1
AND PQL.cal_id >= 201807010
AND PQL.cal_id <= 201807312
GROUP BY PIH.timestamp, PIH.practice_id
Since there is a mixture of tables involved, it is not possible to have an index that handles all the WHERE.
Since the tests are not all =, it is not possible to reach beyond the WHERE and include columns of the GROUP BY.
So, I see two indexes:
PIH: INDEX(practice_id, is_active, -- in either order
source)
PQL: INDEX(cal_id)
Since we can't get into the GROUP BY, the Optimizer has no choice but to gather all the rows based on WHERE, do some grouping, and do an ORDER BY (as I said, that is missing, but necessary).
Therefore, the GROUP BY and the ORDER BY will require one or two temp tables and filesorts. No, you can't get away from it, at least not without changing the query in some way. (Please note that a "filesort" might actually be done in RAM.)
Your extra SELECT layer may be adding an extra temp and filesort.
EXPLAIN fails to point out when there are two sorts. EXPLAIN FORMAT=JSON has that sort of detail.
Other issues...
Having a timestamp in a PRIMARY KEY is risky unless you are sure that two rows cannot occur with the same timestamp, or there is another column in the PK to assure uniqueness.
Don't use FLOAT for money. It incurs extra rounding errors, and it cannot store more than about 7 significant digits (that's less than $100K to the penny). Don't use FLOAT(30,2); it is even worse, because you are forcing an extra rounding. Use DECIMAL, but pick something reasonable, not 30 digits; DECIMAL(30,2) takes 14 bytes, mostly wasted space.
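For example, a sketch (pick a precision that suits your amounts; DECIMAL(12,2) here is my own choice):
ALTER TABLE practice_invoice_header
MODIFY `total_invoice` DECIMAL(12,2) NOT NULL DEFAULT 0.00,
MODIFY `tax` DECIMAL(12,2) NOT NULL DEFAULT 0.00,
MODIFY `other_bill` DECIMAL(12,2) NOT NULL DEFAULT 0.00,
MODIFY `changed` DECIMAL(12,2) NOT NULL DEFAULT 0.00,
MODIFY `paid` DECIMAL(12,2) NOT NULL DEFAULT 0.00,
MODIFY `covered_amount` DECIMAL(12,2) NOT NULL DEFAULT 0.00;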
Whenever you have INDEX(a,b), you don't need INDEX(a); it is redundant and slows down (slightly) INSERTs.
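In practice_invoice_detail, for instance, KEY `practice_id` (practice_id) is a left prefix of `practice_id_3`, and KEY `timestamp` (timestamp, practice_id) is a left prefix of `timestamp_2`; assuming your full table matches what you posted, those two could be dropped:
ALTER TABLE practice_invoice_detail
DROP INDEX `practice_id`,
DROP INDEX `timestamp`;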
LEFT JOIN transaction_log TL
ON TL.reff_id = PIH.timestamp
AND TL.practice_id = PIH.practice_id
AND TL.activity = "CHK"
needs
INDEX(reff_id, practice_id, activity) -- in any order
Also
INNER JOIN practice_invoice_detail PID ON PID.timestamp = PIH.timestamp
AND PID.practice_id = PIH.practice_id
PIH: INDEX(practice_id, timestamp) -- not the opposite order
PIH: INDEX(practice_id, is_active, timestamp)
INNER JOIN practice_queue_list PQL ON PQL.encounter_id = PID.encounter_id
AND PQL.practice_place_id = PIH.practice_id
PQL: INDEX(encounter_id, cal_id)
PQL: INDEX(encounter_id, practice_place_id, cal_id)
Some discussion...
In a JOIN, EXPLAIN shows one order of working through the tables; it gives you no clues of how things would be if it worked through the tables some other way.
I have attempted to show which indexes might be needed if PQL were used first, or if PIH were used first -- namely, use the WHERE stuff for that table, then the optimal index for joining to the other table.
Probably the Optimizer will not start with any table not mentioned in the WHERE clause, but this is not a certainty.
I have not listed the optimal indexes for getting to each of the other tables.
More discussion: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
I have two tables, main_part (3k records) and part_details (25k records)
I tried the following indexes, but EXPLAIN always shows a full table scan of the 25k records (as opposed to the roughly 2k matched records), together with Using where; Using temporary; Using filesort:
ALTER TABLE `main_part` ADD INDEX `main_part_index_1` (`unit`);
ALTER TABLE `part_details` ADD INDEX `part_details_index_1` (`approved`, `display`, `country`, `id`, `price`);
Here is my query:
SELECT a.part_id, b.my_title,
b.price, a.type,
a.unit, a.certification,
b.my_image,
b.price/a.unit AS priceW
FROM main_part AS a
INNER JOIN part_details AS b ON a.part_id=b.id
WHERE b.approved = 'Yes'
AND b.display = 'On'
AND b.country = 'US'
AND a.unit >= 300
ORDER BY b.price ASC LIMIT 50
One thing that I am aware of is that a.part_id is not a primary key in the main_part table. Could this be the culprit?
Create tables SQL:
CREATE TABLE `main_part` (
`id` smallint(6) NOT NULL AUTO_INCREMENT,
`part_id` mediumint(9) NOT NULL DEFAULT '0',
`type` varchar(50) NOT NULL DEFAULT '',
`unit` varchar(50) NOT NULL DEFAULT '',
`certification` varchar(50) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `part_details` (
`id` mediumint(9) NOT NULL AUTO_INCREMENT,
`asn` varchar(50) NOT NULL DEFAULT '',
`country` varchar(5) NOT NULL DEFAULT '',
`my_title` varchar(200) NOT NULL DEFAULT '',
`display` varchar(5) NOT NULL DEFAULT 'On',
`approved` varchar(5) NOT NULL DEFAULT 'No',
`price` decimal(7,3) NOT NULL DEFAULT '0.000',
`my_image` varchar(250) NOT NULL DEFAULT '',
`update_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `countryasn` (`country`,`asn`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
For your query, the more important index is the one for the JOIN condition. As you are already aware, a.part_id isn't a primary key, so it doesn't have an index by default, and your first try should be:
ALTER TABLE `main_part` ADD INDEX `main_part_index_1` (`part_id`,`unit`);
Because we are interested in the JOIN condition first, you should also change the second index to
ALTER TABLE `part_details` ADD INDEX `part_details_index_1`
(`id`, `approved`, `display`, `country`, `price`);
Order matters in the index.
Another tip: start with the basic query:
SELECT *
FROM main_part AS a
INNER JOIN part_details AS b ON a.part_id=b.id
Add indexes for part_id and id, check the EXPLAIN plan, and then start adding conditions and updating the indexes as required.
It seems that most of the columns used for filtering in part_details aren't very selective (display is probably an on/off switch, country is probably the same for many products, etc.).
In some cases, when the WHERE clause is not very selective, MySQL may choose to use an index that better suits the ORDER BY clause.
I would try to create this index as well and check in the EXPLAIN plan whether anything changes:
ALTER TABLE `part_details` ADD INDEX `part_details_price_index` (`price`);
For this query:
SELECT mp.part_id, pd.my_title, pd.price, mp.type,
mp.unit, mp.certification, pd.my_image,
pd.price/mp.unit AS priceW
FROM main_part mp INNER JOIN
part_details pd
ON mp.part_id = pd.id
WHERE pd.approved = 'Yes' AND
pd.display = 'On' AND
pd.country = 'US' AND
mp.unit >= 300
ORDER BY pd.price ASC
LIMIT 50;
For this query, I would start with indexes on part_details(country, display, approved, id, price) and main_part(part_id, unit).
The index on part_details can be used for filtering before the join. It is not easy to get rid of the sort for the order by.
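In DDL form (the index names are my own):
ALTER TABLE `part_details`
  ADD INDEX `pd_filter_order` (`country`, `display`, `approved`, `id`, `price`);
ALTER TABLE `main_part`
  ADD INDEX `mp_join` (`part_id`, `unit`);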
I have three tables.
One table contains submissions and has about 75,000 rows
One table contains submission ratings and only has < 10 rows
One table contains submission => competition mappings and for my test data also has about 75,000 rows.
What I want to do is
Get the top 50 submissions in a round of a competition.
"Top" is defined as highest average rating, followed by highest number of votes
Here is the query I am using, which works, but the problem is that it takes over 45 seconds to complete! I profiled the query (results at the bottom), and the bottlenecks are copying the data to a tmp table and then sorting it. So how can I speed this up?
SELECT `submission_submissions`.*
FROM `submission_submissions`
JOIN `competition_submissions`
ON `competition_submissions`.`submission_id` = `submission_submissions`.`id`
LEFT JOIN `submission_ratings`
ON `submission_submissions`.`id` = `submission_ratings`.`submission_id`
WHERE `top_round` = 1
AND `competition_id` = '2'
AND `submission_submissions`.`date_deleted` IS NULL
GROUP BY submission_submissions.id
ORDER BY AVG(submission_ratings.`stars`) DESC,
COUNT(submission_ratings.`id`) DESC
LIMIT 50
submission_submissions
CREATE TABLE `submission_submissions` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`account_id` int(11) NOT NULL,
`title` varchar(255) NOT NULL,
`description` varchar(255) DEFAULT NULL,
`genre` int(11) NOT NULL,
`goals` text,
`submission` text NOT NULL,
`date_created` datetime DEFAULT NULL,
`date_modified` datetime DEFAULT NULL,
`date_deleted` datetime DEFAULT NULL,
`cover_image` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `genre` (`genre`),
KEY `account_id` (`account_id`),
KEY `date_created` (`date_created`)
) ENGINE=InnoDB AUTO_INCREMENT=115037 DEFAULT CHARSET=latin1;
submission_ratings
CREATE TABLE `submission_ratings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`account_id` int(11) NOT NULL,
`submission_id` int(11) NOT NULL,
`stars` tinyint(1) NOT NULL,
`date_created` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `submission_id` (`submission_id`),
KEY `account_id` (`account_id`),
KEY `stars` (`stars`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=latin1;
competition_submissions
CREATE TABLE `competition_submissions` (
`competition_id` int(11) NOT NULL,
`submission_id` int(11) NOT NULL,
`top_round` int(11) DEFAULT '1',
PRIMARY KEY (`submission_id`),
KEY `competition_id` (`competition_id`),
KEY `top_round` (`top_round`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
SHOW PROFILE Result (ordered by duration)
state duration (summed) in sec percentage
Copying to tmp table 33.15621 68.46924
Sorting result 11.83148 24.43260
removing tmp table 3.06054 6.32017
Sending data 0.37560 0.77563
... insignificant amounts removed ...
Total 48.42497 100.00000
EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE competition_submissions index_merge PRIMARY,competition_id,top_round competition_id,top_round 4,5 18596 Using intersect(competition_id,top_round); Using where; Using index; Using temporary; Using filesort
1 SIMPLE submission_submissions eq_ref PRIMARY PRIMARY 4 inkstakes.competition_submissions.submission_id 1 Using where
1 SIMPLE submission_ratings ALL submission_id 5 Using where; Using join buffer (flat, BNL join)
Assuming that in reality you won't be interested in unrated submissions, and that a given submission only has a single competition_submissions entry for a given competition and top_round, I suggest:
SELECT s.*
FROM (SELECT `submission_id`,
AVG(`stars`) AvgStars,
COUNT(`id`) CountId
FROM `submission_ratings`
GROUP BY `submission_id`
ORDER BY AVG(`stars`) DESC, COUNT(`id`) DESC
LIMIT 50) r
JOIN `submission_submissions` s
ON r.`submission_id` = s.`id` AND
s.`date_deleted` IS NULL
JOIN `competition_submissions` c
ON c.`submission_id` = s.`id` AND
c.`top_round` = 1 AND
c.`competition_id` = '2'
ORDER BY r.AvgStars DESC,
r.CountId DESC
(If there can be more than one competition_submissions entry per submission for a given competition and top_round, then you can add the GROUP BY clause back into the main query.)
If you do want to see unrated submissions, you can UNION the results of this query with a LEFT JOIN ... WHERE NULL query, as sketched below.
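A sketch of that unrated branch (an anti-join: the LEFT JOIN plus the IS NULL test keeps only submissions with no rating row):
SELECT s.*
FROM `submission_submissions` s
JOIN `competition_submissions` c
  ON c.`submission_id` = s.`id` AND
     c.`top_round` = 1 AND
     c.`competition_id` = '2'
LEFT JOIN `submission_ratings` r
  ON r.`submission_id` = s.`id`
WHERE s.`date_deleted` IS NULL
  AND r.`id` IS NULL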
There is a simple trick that works on MySQL and helps avoid copying/sorting huge temp tables in queries like this (with LIMIT X).
Just avoid SELECT *: it copies all columns into the temporary table, then that huge table is sorted, and in the end the query takes only 50 records from it (50 / 70,000 ≈ 0.07%).
Select only the columns that are really necessary to perform the sort and limit, and then join the missing columns only for the selected 50 records, by id.
select ss.*
from submission_submissions ss
join (
SELECT `submission_submissions`.id,
AVG(submission_ratings.`stars`) stars,
COUNT(submission_ratings.`id`) cnt
FROM `submission_submissions`
JOIN `competition_submissions`
ON `competition_submissions`.`submission_id` = `submission_submissions`.`id`
LEFT JOIN `submission_ratings`
ON `submission_submissions`.`id` = `submission_ratings`.`submission_id`
WHERE `top_round` = 1
AND `competition_id` = '2'
AND `submission_submissions`.`date_deleted` IS NULL
GROUP BY submission_submissions.id
ORDER BY AVG(submission_ratings.`stars`) DESC,
COUNT(submission_ratings.`id`) DESC
LIMIT 50
) xx
ON ss.id = xx.id
ORDER BY xx.stars DESC,
xx.cnt DESC;
I have a Joomla site that uses JomSocial. I have a .NET web app that I'm working on that will eventually replace Joomla since I prefer .NET over PHP. Right now I have .NET mobile site that users are using.
LINQ to Entities has made development very speedy, but I'm now in the process of trying to fix performance issues. Sending messages to one another is the #1 activity, and over 40k messages have been sent so far. This is also where I have a performance issue. Below are the two tables JomSocial uses for storing messages, followed by the LINQ code I'm currently using; it returns the results I want, it just takes two seconds to do it.
I think from the column names you can probably figure out what the data looks like, but if not I can create some sample data and post it here in a bit, as I have to run out for a little while. I should mention that I'm using Entity Framework with .NET 3.5 and MySQL with the MySQL .NET Connector.
Tables:
delimiter $$
CREATE TABLE `jos_community_msg` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`from` int(10) unsigned NOT NULL,
`parent` int(10) unsigned NOT NULL,
`deleted` tinyint(3) unsigned DEFAULT '0',
`from_name` varchar(45) NOT NULL,
`posted_on` datetime DEFAULT NULL,
`subject` tinytext NOT NULL,
`body` text NOT NULL,
PRIMARY KEY (`id`),
KEY `parent` (`parent`),
KEY `deleted` (`deleted`),
KEY `from` (`from`)
) ENGINE=MyISAM AUTO_INCREMENT=340 DEFAULT CHARSET=utf8$$
delimiter $$
CREATE TABLE `jos_community_msg_recepient` (
`msg_id` int(10) unsigned NOT NULL,
`msg_parent` int(10) unsigned NOT NULL DEFAULT '0',
`msg_from` int(10) unsigned NOT NULL,
`to` int(10) unsigned NOT NULL,
`bcc` tinyint(3) unsigned DEFAULT '0',
`is_read` tinyint(3) unsigned DEFAULT '0',
`deleted` tinyint(3) unsigned DEFAULT '0',
UNIQUE KEY `un` (`msg_id`,`to`),
KEY `msg_id` (`msg_id`),
KEY `to` (`to`),
KEY `idx_isread_to_deleted` (`is_read`,`to`,`deleted`),
KEY `from` (`msg_from`),
KEY `parent` (`msg_parent`),
KEY `deleted` (`deleted`),
KEY `to_deleted` (`deleted`,`to`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8$$
LINQ:
var messages = (
from b in context.jos_community_msg
join i in (
from i in context.jos_community_msg_recepient
join a in context.jos_community_msg on i.msg_id equals a.id
where i.to == userId && a.deleted == 0
group a by a.parent into g
select g.Max(p => p.id)) on b.id equals i
join a in context.jos_community_msg_recepient on i equals a.msg_id
orderby b.id descending
select new MessageHeaderItem()
{
IsDeleted = false,
IsRead = (a.is_read.Value == 0) ? false : true,
MessageId = b.parent,
Sent = b.posted_on.Value,
Subject = b.subject,
UserId = a.msg_from
});
total = messages.Count();
return messages.Skip(start).Take(max).ToList();
I've tried a bunch of variations, but nothing has made it any quicker. Having the sub select is not good for performance, but I'm not sure how else to get just the last message in the message chain from that table.
Update:
Here's the SQL being generated:
SELECT
`Limit1`.`C1`,
`Limit1`.`C2`,
`Limit1`.`C3`,
`Limit1`.`parent`,
`Limit1`.`posted_on`,
`Limit1`.`subject`,
`Limit1`.`msg_from`,
`Limit1`.`C4`,
`Limit1`.`C5`,
`Limit1`.`C6`
FROM (SELECT
`Extent1`.`id`,
`Extent1`.`parent`,
`Extent1`.`posted_on`,
`Extent1`.`subject`,
`Extent6`.`msg_from`,
1 AS `C1`,
cast(0 as decimal(0,0)) AS `C2`,
CASE WHEN (0 = (`Extent6`.`is_read`)) THEN (cast(0 as decimal(0,0))) ELSE (cast(1 as decimal(0,0))) END AS `C3`,
'Test' AS `C4`,
'' AS `C5`,
'' AS `C6`
FROM `jos_community_msg` AS `Extent1` INNER JOIN (SELECT
(SELECT
Max(`Extent5`.`id`) AS `A1`
FROM (SELECT
`jos_community_msg_recepient`.`bcc`,
`jos_community_msg_recepient`.`deleted`,
`jos_community_msg_recepient`.`is_read`,
`jos_community_msg_recepient`.`msg_from`,
`jos_community_msg_recepient`.`msg_id`,
`jos_community_msg_recepient`.`msg_parent`,
`jos_community_msg_recepient`.`to`
FROM `jos_community_msg_recepient` AS `jos_community_msg_recepient`) AS `Extent4` INNER JOIN `jos_community_msg` AS `Extent5` ON (`Extent4`.`msg_id` = `Extent5`.`id`) OR ((`Extent4`.`msg_id` IS NULL) AND (`Extent5`.`id` IS NULL))
WHERE ((`Extent4`.`to` = 62) AND (0 = (`Extent5`.`deleted`))) AND ((`Extent5`.`parent` = `Project2`.`parent`) OR ((`Extent5`.`parent` IS NULL) AND (`Project2`.`parent` IS NULL)))) AS `C1`
FROM (SELECT
62 AS `p__linq__5`,
`Distinct1`.`parent`
FROM (SELECT DISTINCT
`Extent3`.`parent`
FROM (SELECT
`jos_community_msg_recepient`.`bcc`,
`jos_community_msg_recepient`.`deleted`,
`jos_community_msg_recepient`.`is_read`,
`jos_community_msg_recepient`.`msg_from`,
`jos_community_msg_recepient`.`msg_id`,
`jos_community_msg_recepient`.`msg_parent`,
`jos_community_msg_recepient`.`to`
FROM `jos_community_msg_recepient` AS `jos_community_msg_recepient`) AS `Extent2` INNER JOIN `jos_community_msg` AS `Extent3` ON (`Extent2`.`msg_id` = `Extent3`.`id`) OR ((`Extent2`.`msg_id` IS NULL) AND (`Extent3`.`id` IS NULL))
WHERE (`Extent2`.`to` = 62) AND (0 = (`Extent3`.`deleted`))) AS `Distinct1`) AS `Project2`) AS `Project3` ON (`Extent1`.`id` = `Project3`.`C1`) OR ((`Extent1`.`id` IS NULL) AND (`Project3`.`C1` IS NULL)) INNER JOIN (SELECT
`jos_community_msg_recepient`.`bcc`,
`jos_community_msg_recepient`.`deleted`,
`jos_community_msg_recepient`.`is_read`,
`jos_community_msg_recepient`.`msg_from`,
`jos_community_msg_recepient`.`msg_id`,
`jos_community_msg_recepient`.`msg_parent`,
`jos_community_msg_recepient`.`to`
FROM `jos_community_msg_recepient` AS `jos_community_msg_recepient`) AS `Extent6` ON (`Project3`.`C1` = `Extent6`.`msg_id`) OR ((`Project3`.`C1` IS NULL) AND (`Extent6`.`msg_id` IS NULL))
ORDER BY
`id` DESC LIMIT 0,16) AS `Limit1`;
Here's the explain from MySQL:
1 PRIMARY <derived2> ALL 16
2 DERIVED <derived3> ALL 55 Using temporary; Using filesort
2 DERIVED Extent1 eq_ref PRIMARY PRIMARY 4 Project3.C1 1 Using where
2 DERIVED <derived9> ALL 333 Using where; Using join buffer
9 DERIVED jos_community_msg_recepient ALL 333
3 DERIVED <derived6> ALL 55
6 DERIVED <derived7> ALL 55
7 DERIVED <derived8> ALL 333 Using where; Using temporary
7 DERIVED Extent3 eq_ref PRIMARY,deleted PRIMARY 4 Extent2.msg_id 1 Using where; Distinct
8 DERIVED jos_community_msg_recepient ALL 333
4 DEPENDENT SUBQUERY Extent5 ref PRIMARY,parent,deleted parent 4 Project2.parent 2 Using where
4 DEPENDENT SUBQUERY <derived5> ALL 333 Using where; Using join buffer
5 DERIVED jos_community_msg_recepient ALL 333
Your LINQ query looks pretty good. You use a projection to a DTO (MessageHeaderItem), which allows LINQ to Entities to create a fairly optimal query. You should, however, use a SQL profiler to check the actual SQL query that is executed; perhaps LINQ to Entities fires many queries under the covers. It is also possible that you need some index tuning. Copy the executed query from the profiler into the SQL tuning wizard (part of SQL Management Studio) and see what advice it comes up with.
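If the profiler shows that the generated SQL itself is the bottleneck, one escape hatch is to hand-write the "latest message per thread" part and map the results yourself. A sketch in raw MySQL, assuming (as the LINQ does) that the highest id in a thread is the latest message; the 62 and LIMIT 0,16 values are taken from the generated SQL above:
SELECT m.id, m.parent, m.posted_on, m.subject,
       rec.msg_from, rec.is_read
FROM (
    -- latest message id per parent thread for this recipient
    SELECT MAX(r.msg_id) AS last_id
    FROM jos_community_msg_recepient r
    JOIN jos_community_msg msg ON msg.id = r.msg_id
    WHERE r.`to` = 62
      AND msg.deleted = 0
    GROUP BY msg.parent
) latest
JOIN jos_community_msg m ON m.id = latest.last_id
JOIN jos_community_msg_recepient rec
  ON rec.msg_id = m.id AND rec.`to` = 62
ORDER BY m.id DESC
LIMIT 0, 16;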