Can this MySQL Query be optimized?

I'm currently trying to optimize a MySQL query that runs a little slow on tables with 10,000+ rows.
CREATE TABLE IF NOT EXISTS `person` (
`_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`_oid` char(8) NOT NULL,
`firstname` varchar(255) NOT NULL,
`lastname` varchar(255) NOT NULL,
PRIMARY KEY (`_id`),
KEY `_oid` (`_oid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `person_cars` (
`_id` int(11) NOT NULL AUTO_INCREMENT,
`_oid` char(8) NOT NULL,
`idx` varchar(255) NOT NULL,
`val` blob NOT NULL,
PRIMARY KEY (`_id`),
KEY `_oid` (`_oid`),
KEY `idx` (`idx`),
KEY `val` (`val`(64))
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
# Insert some 10000+ rows…
INSERT INTO `person` (`_oid`,`firstname`,`lastname`)
VALUES
('1', 'John', 'Doe'),
('2', 'Jack', 'Black'),
('3', 'Jim', 'Kirk'),
('4', 'Forrest', 'Gump');
INSERT INTO `person_cars` (`_oid`,`idx`,`val`)
VALUES
('1', '0', 'BMW'),
('1', '1', 'PORSCHE'),
('2', '0', 'BMW'),
('3', '1', 'MERCEDES'),
('3', '0', 'TOYOTA'),
('3', '1', 'NISSAN'),
('4', '0', 'OLDMOBILE');
SELECT `_person`.`_oid`,
`_person`.`firstname`,
`_person`.`lastname`,
`_person_cars`.`cars[0]`,
`_person_cars`.`cars[1]`
FROM `person` `_person`
LEFT JOIN (
SELECT `_person`.`_oid`,
IFNULL(GROUP_CONCAT(IF(`_person_cars`.`idx`=0, `_person_cars`.`val`, NULL)), NULL) AS `cars[0]`,
IFNULL(GROUP_CONCAT(IF(`_person_cars`.`idx`=1, `_person_cars`.`val`, NULL)), NULL) AS `cars[1]`
FROM `person` `_person`
JOIN `person_cars` `_person_cars` ON `_person`.`_oid` = `_person_cars`.`_oid`
GROUP BY `_person`.`_oid`
) `_person_cars` ON `_person_cars`.`_oid` = `_person`.`_oid`
WHERE `cars[0]` = 'BMW' OR `cars[1]` = 'BMW';
The above SELECT query takes ~170ms on my virtual machine running MySQL 5.1.53, with approx. 10,000 rows in each of the two tables.
When I EXPLAIN the above query, the results differ depending on how many rows are in each table. With just the sample rows above, I get:
+----+-------------+--------------+-------+---------------+------+---------+------+------+---------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+---------------+------+---------+------+------+---------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4 | Using where |
| 1 | PRIMARY | _person | ALL | _oid | NULL | NULL | NULL | 4 | Using where; Using join buffer |
| 2 | DERIVED | _person_cars | ALL | _oid | NULL | NULL | NULL | 7 | Using temporary; Using filesort |
| 2 | DERIVED | _person | index | _oid | _oid | 24 | NULL | 4 | Using where; Using index; Using join buffer |
+----+-------------+--------------+-------+---------------+------+---------+------+------+---------------------------------------------+
With some 10,000 rows in each table, I get the following result:
+----+-------------+--------------+------+---------------+------+---------+------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------------------------+------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 6613 | Using where |
| 1 | PRIMARY | _person | ref | _oid | _oid | 24 | _person_cars._oid | 10 | |
| 2 | DERIVED | _person_cars | ALL | _oid | NULL | NULL | NULL | 9913 | Using temporary; Using filesort |
| 2 | DERIVED | _person | ref | _oid | _oid | 24 | test._person_cars._oid | 10 | Using index |
+----+-------------+--------------+------+---------------+------+---------+------------------------+------+---------------------------------+
Things get worse when I leave out the WHERE clause or when I LEFT JOIN another table similar to person_cars.
Does anyone have an idea how to optimize the SELECT query to make things a little faster?

It's slow because this will force three full table scans on persons that then get joined together:
LEFT JOIN (
...
GROUP BY `_person`.`_oid` -- the group by here
) `_person_cars` ...
WHERE ... -- and the where clauses on _person_cars.
For one, given the WHERE clause the LEFT JOIN is really an INNER JOIN. You could also push the conditions down so they are applied before the join with person happens. And that join against person is needlessly applied twice (once inside the subquery, once outside).
This will make it faster, but if you have an ORDER BY/LIMIT clause it will still lead to a full table scan on person (i.e. still not good) because of the GROUP BY in the subquery:
JOIN (
SELECT `_person_cars`.`_oid`,
IFNULL(GROUP_CONCAT(IF(`_person_cars`.`idx`=0, `_person_cars`.`val`, NULL)), NULL) AS `cars[0]`,
IFNULL(GROUP_CONCAT(IF(`_person_cars`.`idx`=1, `_person_cars`.`val`, NULL)), NULL) AS `cars[1]`
FROM `person_cars`
GROUP BY `_person_cars`.`_oid`
HAVING IFNULL(GROUP_CONCAT(IF(`_person_cars`.`idx`=0, `_person_cars`.`val`, NULL)), NULL) = 'BMW' OR
IFNULL(GROUP_CONCAT(IF(`_person_cars`.`idx`=1, `_person_cars`.`val`, NULL)), NULL) = 'BMW'
) `_person_cars` ... -- smaller number of rows
If you apply an order by/limit, you'll get better results with two queries, i.e.:
SELECT `_person`.`_oid`,
`_person`.`firstname`,
`_person`.`lastname`
FROM `_person`
JOIN `_person_cars`
ON `_person_cars`.`_oid` = `_person`.`_oid`
AND `_person_cars`.`val` = 'BMW'
GROUP BY -- pre-sort the result before grouping, so as to not do the work twice
`_person`.`lastname`,
`_person`.`firstname`,
-- eliminate users with multiple BMWs
`_person`.`_oid`
ORDER BY `_person`.`lastname`,
`_person`.`firstname`,
`_person`.`_oid`
LIMIT 10
And then select the cars with an IN () clause using the resulting ids.
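Roughly like this, assuming the first query returned the _oid values '1' and '2' (placeholders, obviously):
SELECT `_person_cars`.`_oid`,
GROUP_CONCAT(IF(`_person_cars`.`idx` = 0, `_person_cars`.`val`, NULL)) AS `cars[0]`,
GROUP_CONCAT(IF(`_person_cars`.`idx` = 1, `_person_cars`.`val`, NULL)) AS `cars[1]`
FROM `person_cars` `_person_cars`
WHERE `_person_cars`.`_oid` IN ('1', '2') -- the _oid values returned by the first query
GROUP BY `_person_cars`.`_oid`;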
Oh, and your val column should probably be a varchar.
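Something along these lines, assuming 255 characters is enough for your values (the existing `val`(64) prefix index stays valid):
ALTER TABLE `person_cars` MODIFY `val` varchar(255) NOT NULL;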

Check this:
SELECT
p._oid AS oid,
p.firstname AS firstname,
p.lastname AS lastname,
pc.val AS car1,
pc2.val AS car2
FROM person AS p
LEFT JOIN person_cars AS pc
ON pc._oid = p._oid
AND pc.idx = 0
LEFT JOIN person_cars AS pc2
ON pc2._oid = p._oid
AND pc2.idx = 1
WHERE pc.val = 'BMW'
OR pc2.val = 'BMW'

Related

Complex query extremely slow when grouping or ordering - index suggestions?

So, I've been given the task of replicating functionality we currently handle via code, within MySQL.
The query below works beautifully, bringing back 245,000 rows in 40ms; however, as soon as you touch it with a GROUP BY or ORDER BY, it takes over 6s.
Does anyone have any suggestions on what changes need to be made to the indexes, or potentially how to modify the query, to improve it?
Thanks
Without any grouping or ordering
select
s.id as sensorid,
s.sensortypeid,
COALESCE(s.pulserate, 1) as pulserate,
COALESCE(s.correctionFactor, 1) as correctionFactor,
ur.id as unitrateid,
COALESCE(ur.priceperkwh, 0) as priceperkwh,
COALESCE(ur.duosCharges, 0) as duosCharges,
IF(t.blnnonunitratecharges, t.nonunitratecharge/48, 0) as nonunitratecost,
IF(t.blnFeedIn, COALESCE(t.feedInRate, 0), 0) as feedInRate,
IF(t.blnRoc, COALESCE(t.rocRate, 0), 0) as rocRate,
from_unixtime(FLOOR(UNIX_TIMESTAMP(srs.dateTimeStamp)/(30*60))*(30*60)) as timeKey
from sensorreadings srs
inner join sensorpoints sp on (sp.id = srs.sensorpointid)
inner join sensors s on (s.id = sp.sensorid)
left join unitrates ur on ur.id = (
select
ur.id
from unitrates ur, tariffs t, companyhubs ch
where
ur.tariffid = t.id and
t.companyid = ch.companyid and
ch.hubid = s.hubid and
t.utilitytypeid = s.utilitytypeid and
(srs.dateTimeStamp between t.startdate and t.enddate) and
((time(srs.dateTimeStamp) between ur.starttime and ur.endtime) and
(ur.dayMask & POW(2, WEEKDAY(srs.dateTimeStamp)) <> 0) and
(ur.monthMask & POW(2, MONTH(srs.dateTimeStamp) - 1) <> 0))
order by
t.startdate desc,
ur.starttime desc
limit 0, 1
)
left join tariffs t on (t.id = ur.tariffid)
where
s.id = 5289
With grouping and ordering
select
s.id as sensorid,
s.sensortypeid,
COALESCE(s.pulserate, 1) as pulserate,
COALESCE(s.correctionFactor, 1) as correctionFactor,
ur.id as unitrateid,
COALESCE(ur.priceperkwh, 0) as priceperkwh,
COALESCE(ur.duosCharges, 0) as duosCharges,
IF(t.blnnonunitratecharges, t.nonunitratecharge/48, 0) as nonunitratecost,
IF(t.blnFeedIn, COALESCE(t.feedInRate, 0), 0) as feedInRate,
IF(t.blnRoc, COALESCE(t.rocRate, 0), 0) as rocRate,
min(srs.reading) as minReading,
avg(srs.reading) as avgReading,
from_unixtime(FLOOR(UNIX_TIMESTAMP(srs.dateTimeStamp)/(30*60))*(30*60)) as timeKey
from sensorreadings srs
inner join sensorpoints sp on (sp.id = srs.sensorpointid)
inner join sensors s on (s.id = sp.sensorid)
left join unitrates ur on ur.id = (
select
ur.id
from unitrates ur, tariffs t, companyhubs ch
where
ur.tariffid = t.id and
t.companyid = ch.companyid and
ch.hubid = s.hubid and
t.utilitytypeid = s.utilitytypeid and
(srs.dateTimeStamp between t.startdate and t.enddate) and
((time(srs.dateTimeStamp) between ur.starttime and ur.endtime) and
(ur.dayMask & POW(2, WEEKDAY(srs.dateTimeStamp)) <> 0) and
(ur.monthMask & POW(2, MONTH(srs.dateTimeStamp) - 1) <> 0))
order by
t.startdate desc,
ur.starttime desc
limit 0, 1
)
left join tariffs t on (t.id = ur.tariffid)
where
s.id = 5289
group by timeKey
order by timeKey desc
Schemas
CREATE TABLE `sensorreadings` (
`sensorpointid` int(11) NOT NULL DEFAULT '0',
`reading` decimal(15,5) NOT NULL,
`dateTimeStamp` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`sensorpointid`,`dateTimeStamp`),
KEY `sensormetricid` (`sensormetricid`),
KEY `sensorreadings_timestamp` (`dateTimeStamp`,`sensorpointid`),
KEY `sensorpointid` (`sensorpointid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `sensorpoints` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`sensorid` int(11) DEFAULT NULL,
`hubpointid` int(11) DEFAULT NULL,
`pointlabel` varchar(255) NOT NULL,
`pointhash` varchar(255) NOT NULL,
`target` decimal(10,0) DEFAULT NULL,
`tolerance` decimal(10,0) DEFAULT '0',
`blnlivepoint` int(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `FK_sensorpoints_sensors` (`sensorid`),
CONSTRAINT `FK_sensorpoints_sensors` FOREIGN KEY (`sensorid`) REFERENCES `sensors` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=8824 DEFAULT CHARSET=latin1;
CREATE TABLE `sensors` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`hubid` int(11) DEFAULT NULL,
`sensortypeid` int(11) NOT NULL DEFAULT '5',
`pulserate` decimal(10,6) DEFAULT NULL,
`utilitytypeid` int(11) NOT NULL DEFAULT '1',
`correctionfactor` decimal(10,3) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `FK_sensors_sensortypes` (`sensortypeid`),
KEY `FK_sensors_hubs` (`hubid`),
KEY `FK_sensors_utilitytypes` (`utilitytypeid`),
CONSTRAINT `FK_sensors_hubs` FOREIGN KEY (`hubid`) REFERENCES `hubs` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `FK_sensors_sensortypes` FOREIGN KEY (`sensortypeid`) REFERENCES `sensortypes` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=5503 DEFAULT CHARSET=latin1;
CREATE TABLE `tariffs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`companyid` int(11) DEFAULT NULL,
`utilitytypeid` int(11) DEFAULT NULL,
`startdate` date NOT NULL,
`enddate` date NOT NULL,
`blnnonunitratecharges` int(1) DEFAULT '0',
`nonunitratecharge` decimal(16,8) DEFAULT '0.00000000',
`blnFeedIn` int(1) DEFAULT '0',
`blnRoc` int(1) DEFAULT '0',
`rocRate` decimal(16,8) DEFAULT '0.00000000',
`feedInRate` decimal(16,8) DEFAULT '0.00000000',
PRIMARY KEY (`id`),
KEY `companyid` (`companyid`,`utilitytypeid`,`startdate`,`enddate`),
KEY `startdate` (`startdate`,`enddate`)
) ENGINE=InnoDB AUTO_INCREMENT=1107 DEFAULT CHARSET=latin1;
CREATE TABLE `unitrates` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`tariffid` int(11) NOT NULL,
`priceperkwh` decimal(16,8) NOT NULL,
`starttime` time NOT NULL,
`endtime` time NOT NULL,
`duoscharges` decimal(10,5) DEFAULT NULL,
`dayMask` int(11) DEFAULT '127',
`monthMask` int(11) DEFAULT '4095',
PRIMARY KEY (`id`),
KEY `FK_unitrates_tariffs` (`tariffid`),
KEY `times` (`starttime`,`endtime`),
KEY `masks` (`dayMask`,`monthMask`),
CONSTRAINT `FK_unitrates_tariffs` FOREIGN KEY (`tariffid`) REFERENCES `tariffs` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=3104 DEFAULT CHARSET=latin1;
Explains
Without groups/ordering
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|----|--------------------|-------|--------|---------------------------------|-------------------------|---------|-------------------------------|------|----------------------------------------------|
| 1 | PRIMARY | s | const | PRIMARY | PRIMARY | 4 | const | 1 | NULL |
| 1 | PRIMARY | sp | ref | PRIMARY,FK_sensorpoints_sensors | FK_sensorpoints_sensors | 5 | const | 1 | Using index |
| 1 | PRIMARY | srs | ref | PRIMARY,sensorpointid | PRIMARY | 4 | dbnameprod.sp.id | 211 | Using index |
| 1 | PRIMARY | ur | eq_ref | PRIMARY | PRIMARY | 4 | func | 1 | Using where |
| 1 | PRIMARY | t | eq_ref | PRIMARY | PRIMARY | 4 | dbnameprod.ur.tariffid | 1 | NULL |
| 2 | DEPENDENT SUBQUERY | ch | ref | hubid | hubid | 5 | const | 1 | Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | t | ref | PRIMARY,companyid,startdate | companyid | 10 | dbnameprod.ch.companyid,const | 1 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | ur | ref | FK_unitrates_tariffs,times | FK_unitrates_tariffs | 4 | dbnameprod.t.id | 1 | Using where |
With ordering/grouping
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|----|--------------------|-------|--------|---------------------------------------------------------------|-------------------------|---------|-------------------------------|------|----------------------------------------------|
| 1 | PRIMARY | s | const | PRIMARY | PRIMARY | 4 | const | 1 | Using temporary; Using filesort |
| 1 | PRIMARY | sp | ref | PRIMARY,FK_sensorpoints_sensors | FK_sensorpoints_sensors | 5 | const | 1 | Using index |
| 1 | PRIMARY | srs | ref | PRIMARY,sensormetricid,sensorreadings_timestamp,sensorpointid | PRIMARY | 4 | dbnameprod.sp.id | 211 | Using index |
| 1 | PRIMARY | ur | eq_ref | PRIMARY | PRIMARY | 4 | func | 1 | Using where |
| 1 | PRIMARY | t | eq_ref | PRIMARY | PRIMARY | 4 | dbnameprod.ur.tariffid | 1 | NULL |
| 2 | DEPENDENT SUBQUERY | ch | ref | hubid | hubid | 5 | const | 1 | Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | t | ref | PRIMARY,companyid,startdate | companyid | 10 | dbnameprod.ch.companyid,const | 1 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | ur | ref | FK_unitrates_tariffs,times | FK_unitrates_tariffs | 4 | dbnameprod.t.id | 1 | Using where |
Well, you are grouping and ordering by a calculated field, timeKey, and the database doesn't have any index on that field.
So the database has to compute that value for every row before it can do the GROUP BY, then do the ordering, and without an index it can't speed those calculations up.
Suggestion: create a real time column for that value in your table and add an index on it.
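A rough sketch of that idea (the column and index names are made up; new rows would need the value set on insert, via the application or a trigger):
ALTER TABLE sensorreadings
ADD COLUMN timeKey datetime NULL,
ADD KEY idx_srs_timekey (sensorpointid, timeKey);
-- backfill once, using the same half-hour bucket expression the query already computes
UPDATE sensorreadings
SET timeKey = FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(dateTimeStamp) / (30*60)) * (30*60));
The query could then group and order by srs.timeKey directly instead of recomputing the bucket per row.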
Before looking into the performance, let's discuss the likelihood that the query is broken.
When doing a GROUP BY, all the non-aggregate SELECT values should be included in the GROUP BY. Otherwise, any random value can be delivered.
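For illustration only, making that explicit in your grouped query would mean replacing its final GROUP BY/ORDER BY with something along these lines (MySQL accepts the column aliases in GROUP BY):
group by
timeKey,
sensorid, s.sensortypeid, pulserate, correctionFactor,
unitrateid, priceperkwh, duosCharges,
nonunitratecost, feedInRate, rocRate
order by timeKey desc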
Furthermore, this pattern:
SELECT ..., AVG(a.x)
FROM a
JOIN b ON ...
GROUP BY a.id
usually leads to an inflation of the number of rows (due to the JOIN), followed by computing the aggregates over the inflated number of rows. Add COUNT(*) to see if I am right for your case. For COUNT, the answer can be blatantly wrong; for AVG it can be subtly wrong; for MIN it is probably correct. And finally the GROUP BY deflates the number of rows.
The usual cure is to compute the aggregates without the JOINs (I am not sure if it is possible in your case). Maybe something like...
...
JOIN (
SELECT min(srs.reading) as minReading,
avg(srs.reading) as avgReading,
from_unixtime(FLOOR(UNIX_TIMESTAMP(srs.dateTimeStamp)/
(30*60))*(30*60)) as timeKey
FROM sensorreadings AS srs
GROUP BY timeKey
) AS r
JOIN ...
It is usually a 'bad' idea to have date and time in separate columns. A DATETIME or TIMESTAMP is easier to compare against, etc. (I am unclear on what you are doing with your separate date and time.) This can also be a performance issue.
The 3 tables lead to a bunch of JOINing, making the WHERE s.id = 5289 hard to transfer to srs. You may need to rethink the schema; that is another potential performance issue.
I realize the values are different, but could
order by
t.startdate desc,
ur.starttime desc
be replaced by
order by srs.dateTimeStamp
That might make it possible to use an index.
I'm surprised you are using DECIMAL(m,n) instead of FLOAT for sensor readings.
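If approximate values are acceptable for readings, that change would be something like (a sketch, not a blanket recommendation):
ALTER TABLE sensorreadings MODIFY reading FLOAT NOT NULL;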

Eliminating Sub query in MySQL

I have a table in MySQL as below.
CREATE TABLE `myTable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`employeeNumber` int(11) DEFAULT NULL,
`approveDate` date DEFAULT NULL,
`documentNumber` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `INDEX_1` (`documentNumber`)
) ENGINE=InnoDB;
I want to write a query that tells whether a documentNumber is approved by all employeeNumbers, approved by some employeeNumbers, or not approved by any employeeNumber.
I made a query as below.
SELECT T1.documentNumber,
(CASE WHEN T2.currentNum = '0' THEN '1' WHEN T2.currentNum < T2.totalNum THEN '2' ELSE '3' END) AS approveStatusNumber
FROM myTable AS T1 LEFT JOIN
(SELECT documentNumber, COUNT(*) AS totalNum, SUM(CASE WHEN approveDate IS NOT NULL THEN '1' ELSE '0' END) AS currentNum
FROM myTable GROUP by documentNumber) AS T2 ON T1.documentNumber = T2.documentNumber
GROUP BY T1.documentNumber;
This SQL works, but it is very slow.
I tried explain on this SQL, the result is as below.
+----+-------------+------------+-------+---------------+---------+---------+------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+---------+---------+------+------+----------------------------------------------+
| 1 | PRIMARY | T1 | range | INDEX_1 | INDEX_1 | 153 | NULL | 27 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 5517 | |
| 2 | DERIVED | myTable | index | NULL | INDEX_1 | 153 | NULL | 5948 | Using where |
+----+-------------+------------+-------+---------------+---------+---------+------+------+----------------------------------------------+
I think I have to eliminate the subquery to improve my query.
How can I do the same thing without a subquery? Or is there another way to improve my query?
The expression to return currentNum could be more succinctly expressed (in MySQL) as
SUM(approveDate IS NOT NULL)
And there's no need for an inline view. This will return an equivalent result:
SELECT t.documentNumber
, CASE
WHEN SUM(t.approveDate IS NOT NULL) = 0
THEN '1'
WHEN SUM(t.approveDate IS NOT NULL) < COUNT(*)
THEN '2'
ELSE '3'
END AS approveStatusNumber
FROM myTable t
GROUP BY t.documentNumber
ORDER BY t.documentNumber

Improve performance or redesign 'greatest-n-per-group' mysql query

I'm using MySQL 5 and I currently have a query that gets me the info I need, but I feel like it could be improved in terms of performance.
Here's the query I built (roughly following this guide) :
SELECT d.*, dc.date_change, dc.cwd, h.name as hub
FROM livedata_dom AS d
LEFT JOIN ( SELECT dc1.*
FROM livedata_domcabling as dc1
LEFT JOIN livedata_domcabling AS dc2
ON dc1.dom_id = dc2.dom_id AND dc1.date_change < dc2.date_change
WHERE dc2.dom_id IS NULL
ORDER BY dc1.date_change desc) AS dc ON (d.id = dc.dom_id)
LEFT JOIN livedata_hub AS h ON (d.id = dc.dom_id AND dc.hub_id = h.id)
WHERE d.cluster = 'localhost'
GROUP BY d.id;
EDIT: Using ORDER BY + GROUP BY to avoid getting multiple dom entries in case 'domcabling' has an entry with null date_change and another one with a date for the same 'dom'.
I feel like I'm killing a mouse with a bazooka. This query takes more than 3 seconds with only about 5k entries in 'livedata_dom' and 'livedata_domcabling'. Also, EXPLAIN tells me that 2 filesorts are used:
+----+-------------+------------+--------+-----------------------------+-----------------------------+---------+-----------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+-----------------------------+-----------------------------+---------+-----------------+------+----------------------------------------------+
| 1 | PRIMARY | d | ALL | NULL | NULL | NULL | NULL | 3 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 3 | |
| 1 | PRIMARY | h | eq_ref | PRIMARY | PRIMARY | 4 | dc.hub_id | 1 | |
| 2 | DERIVED | dc1 | ALL | NULL | NULL | NULL | NULL | 4 | Using filesort |
| 2 | DERIVED | dc2 | ref | livedata_domcabling_dc592d9 | livedata_domcabling_dc592d9 | 4 | live.dc1.dom_id | 2 | Using where; Not exists |
+----+-------------+------------+--------+-----------------------------+-----------------------------+---------+-----------------+------+----------------------------------------------+
How could I change this query to make it more efficient?
Using the dummy data (provided below), this is the expected result:
+-----+-------+---------+--------+----------+------------+-----------+---------------------+------+-----------+
| id | mb_id | prod_id | string | position | name | cluster | date_change | cwd | hub |
+-----+-------+---------+--------+----------+------------+-----------+---------------------+------+-----------+
| 249 | 47 | 47 | 47 | 47 | SuperDOM47 | localhost | NULL | NULL | NULL |
| 250 | 48 | 48 | 48 | 48 | SuperDOM48 | localhost | 2014-04-16 05:23:00 | 32A | megahub01 |
| 251 | 49 | 49 | 49 | 49 | SuperDOM49 | localhost | NULL | 22B | megahub01 |
+-----+-------+---------+--------+----------+------------+-----------+---------------------+------+-----------+
Basically I need 1 row for every 'dom' entry, with
the 'domcabling' record with the highest date_change
if no such record exists, I need null fields
ONE entry may have a null date_change field per dom (null datetime field considered older than any other datetime)
the name of the 'hub', when a 'domcabling' entry is found, null otherwise
CREATE TABLE + dummy INSERT for the 3 tables:
livedata_dom (about 5000 entries)
CREATE TABLE `livedata_dom` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`mb_id` varchar(12) NOT NULL,
`prod_id` varchar(8) NOT NULL,
`string` int(11) NOT NULL,
`position` int(11) NOT NULL,
`name` varchar(30) NOT NULL,
`cluster` varchar(9) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `mb_id` (`mb_id`),
UNIQUE KEY `prod_id` (`prod_id`),
UNIQUE KEY `name` (`name`),
UNIQUE KEY `livedata_domgood_string_7bff074107b0e5a0_uniq` (`string`,`position`,`cluster`)
) ENGINE=InnoDB AUTO_INCREMENT=5485 DEFAULT CHARSET=latin1;
INSERT INTO `livedata_dom` VALUES (251,'49','49',49,49,'SuperDOM49','localhost'),(250,'48','48',48,48,'SuperDOM48','localhost'),(249,'47','47',47,47,'SuperDOM47','localhost');
livedata_domcabling (about 10000 entries and growing slowly)
CREATE TABLE `livedata_domcabling` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`dom_id` int(11) NOT NULL,
`hub_id` int(11) NOT NULL,
`cwd` varchar(3) NOT NULL,
`date_change` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `livedata_domcabling_dc592d9` (`dom_id`),
KEY `livedata_domcabling_4366aa6e` (`hub_id`),
CONSTRAINT `dom_id_refs_id_73e89ce0c50bf0a6` FOREIGN KEY (`dom_id`) REFERENCES `livedata_dom` (`id`),
CONSTRAINT `hub_id_refs_id_179c89d8bfd74cdf` FOREIGN KEY (`hub_id`) REFERENCES `livedata_hub` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5397 DEFAULT CHARSET=latin1;
INSERT INTO `livedata_domcabling` VALUES (1,251,1,'22B',NULL),(2,250,1,'33A',NULL),(6,250,1,'32A','2014-04-16 05:23:00'),(5,250,1,'22B','2013-05-22 00:00:00');
livedata_hub (about 100 entries)
CREATE TABLE `livedata_hub` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(14) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=98 DEFAULT CHARSET=latin1;
INSERT INTO `livedata_hub` VALUES (1,'megahub01');
Try this rewrite (tested in SQL Fiddle):
SELECT
d.*, dc.date_change, dc.cwd, h.name as hub
FROM
livedata_dom AS d
LEFT JOIN
livedata_domcabling as dc
ON dc.id =
( SELECT id
FROM livedata_domcabling AS dcc
WHERE dcc.dom_id = d.id
ORDER BY date_change DESC
LIMIT 1
)
LEFT JOIN
livedata_hub AS h
ON dc.hub_id = h.id
WHERE
d.cluster = 'localhost' ;
An index on (dom_id, date_change) would help efficiency.
I'm not sure about the selectivity of d.cluster = 'localhost' (how many rows of the livedata_dom table match this condition?), but adding an index on (cluster) might help as well.
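Something like this (the index names are made up):
ALTER TABLE livedata_domcabling ADD KEY idx_domcabling_dom_date (dom_id, date_change);
ALTER TABLE livedata_dom ADD KEY idx_dom_cluster (cluster);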
set @rn := 0, @dom_id := 0;
select d.*, dc.date_change, dc.cwd, h.name as hub
from
livedata_dom d
left join (
select
hub_id, date_change, cwd, dom_id,
if(@dom_id = dom_id, @rn := @rn + 1, @rn := 1) as rn,
@dom_id := dom_id as dm_id
from
livedata_domcabling
order by dom_id, date_change desc
) dc on d.id = dc.dom_id
left join
livedata_hub h on h.id = dc.hub_id
where rn = 1 or rn is null
order by dom_id
The data you posted does not have dom_id 249 in livedata_domcabling. And #250 has one null date, so it comes first. So your expected result does not reflect what I understand from your question.

Optimize LEFT JOIN on table with 30 000+ rows

I have a website where visitors can leave comments. I want to add the ability to answer comments (i.e. nested comments).
At first this query was fast, but after I populated the table with the existing comments (about 30,000), a simple query like:
SELECT c.id, c2.id
FROM (SELECT id
FROM swb_comments
WHERE pageId = 1411
ORDER BY id DESC
LIMIT 10) AS c
LEFT JOIN swb_comments AS c2 ON c.id = c2.parentId
took over 2 seconds, even with no child comments (!).
How do I optimize a query like this? One possible solution would be http://www.ferdychristant.com/blog//articles/DOMM-7QJPM7 (scroll to "The Flat Table Model done right"), but this makes pagination rather difficult (how do I limit to 10 parent comments within 1 query?).
The table has 3 indexes, id, pageId and ParentId.
Thanks in advance!
EDIT:
Table definition added. This is the full definition, with some differences from the above SELECT query (i.e. pageId instead of numberId, to avoid confusion).
CREATE TABLE `swb_comments` (
`id` mediumint(9) NOT NULL auto_increment,
`userId` mediumint(9) unsigned NOT NULL default '0',
`numberId` mediumint(9) unsigned default NULL,
`orgId` mediumint(9) unsigned default NULL,
`author` varchar(100) default NULL,
`email` varchar(255) NOT NULL,
`message` text NOT NULL,
`IP` varchar(40) NOT NULL,
`timestamp` varchar(25) NOT NULL,
`editedTimestamp` varchar(25) default NULL COMMENT 'last edited timestamp',
`status` varchar(20) NOT NULL default 'publish',
`parentId` mediumint(9) unsigned NOT NULL default '0',
`locale` varchar(10) NOT NULL,
PRIMARY KEY (`id`),
KEY `userId` (`userId`),
KEY `numberId` (`numberId`),
KEY `orgId` (`orgId`),
KEY `parentId` (`parentId`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=34748 ;
The issue is that MySQL cannot apply an index when it has to deal with the result of a derived query (that's why you have NULL in the possible_keys column). So I suggest first filtering out the ten comments that you need:
SELECT * FROM swb_comments WHERE pageId = 1411 ORDER BY id DESC LIMIT 10
And after that, send a separate request to get the answers for those comment ids:
SELECT * FROM swb_comments WHERE parentId IN ($commentId1, $commentId2, ..., $commentId10)
In this case the database engine will be able to apply the pageId and parentId indexes efficiently.
If Mr Fedorenko is correct and the subquery is causing the optimiser difficulties, could you not try...
SELECT c.id, c2.id
FROM swb_comments c LEFT JOIN swb_comments c2 ON c.id = c2.parentID
WHERE c.pageId = 1411
ORDER BY c.id DESC
LIMIT 10;
and see if it's any improvement?
Later - I have created a table using your definition, filled it in with 30,000 skeletal rows, and tried both the queries. They both complete in too short a time to notice. The explain plans are here...
mysql> EXPLAIN SELECT c.id, c2.id
FROM swb_comments c LEFT JOIN swb_comments c2 ON c.id = c2.parentID
WHERE c.numberId = 1411 ORDER BY c.id DESC LIMIT 10;
+----+-------------+-------+------+---------------+----------+---------+------------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+----------+---------+------------+------+-----------------------------+
| 1 | SIMPLE | c | ref | numberId | numberId | 4 | const | 1 | Using where; Using filesort |
| 1 | SIMPLE | c2 | ref | parentId | parentId | 3 | books.c.id | 14 | |
+----+-------------+-------+------+---------------+----------+---------+------------+------+-----------------------------+
mysql> EXPLAIN SELECT c.id, c2.id
FROM swb_comments c LEFT JOIN swb_comments c2 ON c.id = c2.parentID
WHERE c.numberId = 1411 ORDER BY c.id DESC LIMIT 10;
+----+-------------+-------+------+---------------+----------+---------+------------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+----------+---------+------------+------+-----------------------------+
| 1 | SIMPLE | c | ref | numberId | numberId | 4 | const | 1 | Using where; Using filesort |
| 1 | SIMPLE | c2 | ref | parentId | parentId | 3 | books.c.id | 14 | |
+----+-------------+-------+------+---------------+----------+---------+------------+------+-----------------------------+
and are exactly what I'd expect.
This is very mysterious.
I'll think about it a bit more to see if there's anything else we can try.

MySQL: Usage of indices in UNION subselects

In MySQL 5.0.75-0ubuntu10.2 I've got a fixed table layout like this:
Table parent with an id
Table parent2 with an id
Table children1 with a parentId
CREATE TABLE `Parent` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(200) default NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB
CREATE TABLE `Parent2` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(200) default NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB
CREATE TABLE `Children1` (
`id` int(11) NOT NULL auto_increment,
`parentId` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `parent` (`parentId`)
) ENGINE=InnoDB
A child has a parent in one of the tables Parent or Parent2. When I need to get the children I use a query like this:
select * from Children1 c
inner join (
select id as parentId from Parent
union
select id as parentId from Parent2
) p on p.parentId = c.parentId
Explaining this query yields:
+----+--------------+------------+-------+---------------+---------+---------+------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+-------+---------------+---------+---------+------+------+-----------------------------------------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after reading const tables |
| 2 | DERIVED | Parent | index | NULL | PRIMARY | 4 | NULL | 1 | Using index |
| 3 | UNION | Parent2 | index | NULL | PRIMARY | 4 | NULL | 1 | Using index |
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+------------+-------+---------------+---------+---------+------+------+-----------------------------------------------------+
4 rows in set (0.00 sec)
which is reasonable given the layout.
Now the problem: the previous query is somewhat useless, since it returns no columns from the parent elements. The moment I add more columns to the inner query, no index is used anymore:
mysql> explain select * from Children1 c inner join ( select id as parentId,name from Parent union select id as parentId,name from Parent2 ) p on p.parentId = c.parentId;
+----+--------------+------------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after reading const tables |
| 2 | DERIVED | Parent | ALL | NULL | NULL | NULL | NULL | 1 | |
| 3 | UNION | Parent2 | ALL | NULL | NULL | NULL | NULL | 1 | |
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+------------+------+---------------+------+---------+------+------+-----------------------------------------------------+
4 rows in set (0.00 sec)
Can anyone explain why the (PRIMARY) indices are not used any more? Is there a workaround for this problem, ideally without having to change the DB layout?
Thanks!
I think that the optimizer falls down once you start pulling out multiple columns in the derived query because of the possibility that it would need to convert data types on the union (not in this case, but in general). It may also be due to the fact that your query essentially wants to be a correlated derived subquery, which isn't possible (from dev.mysql.com):
Subqueries in the FROM clause cannot be correlated subqueries, unless used within the ON clause of a JOIN operation.
What you are trying to do (but which isn't valid) is:
select * from Children1 c
inner join (
select id as parentId from Parent where Parent.id = c.parentId
union
select id as parentId from Parent2 where Parent2.id = c.parentId
) p
Result: "Unknown column 'c.parentId' in 'where clause'.
Is there a reason you don't prefer two left joins and IFNULLs:
select *, IFNULL(p1.name, p2.name) AS name from Children1 c
left join Parent p1 ON p1.id = c.parentId
left join Parent2 p2 ON p2.id = c.parentId
The only difference between the queries is that in yours you'll get two rows if there is a parent in each table. If that's what you want/need then this will work well also and joins will be fast and always make use of the indexes:
(select * from Children1 c join Parent p1 ON p1.id = c.parentId)
union
(select * from Children1 c join Parent2 p2 ON p2.id = c.parentId)
My first thought is to insert a "significant" number of records in the tables and use ANALYZE TABLE to update the statistics. A table with 4 records will always be faster to read using a full scan rather than going via the index!
Further, you can try USE INDEX to force the usage of the index and see how the plan changes.
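For example (parent is the existing index on Children1.parentId; treat this as an experiment to inspect the plan, not a fix):
ANALYZE TABLE Parent, Parent2, Children1;
EXPLAIN
select * from Children1 c USE INDEX (parent)
inner join (
select id as parentId, name from Parent
union
select id as parentId, name from Parent2
) p on p.parentId = c.parentId;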
I would also recommend reading this documentation and seeing which bits are relevant:
MYSQL::Optimizing Queries with EXPLAIN
This article can also be useful
7 ways to convince MySQL to use the right index