My problem is that I have a MySQL query that runs really fast (0.3 seconds) even though it has a large number of LEFT JOINs and a few conditions on the joined columns, but when I add one more condition the query takes upwards of 180 seconds! I understand that the extra condition may force the execution plan to pull all potential records first and then apply the condition in a loop. What's weird to me is that the fast query without the additional condition returns only 16 rows, and even wrapping the fast query and applying the condition in the outer query takes a crazy amount of time, when you would think it would just add one more pass over 16 rows...
If it matters, this is running on Amazon Aurora Serverless, which should be compatible with MySQL 5.7.
Here's what the query looks like; the additional condition is commented out. (The general table structure of the DB cannot change right now, so please refrain from suggesting a full database restructuring.)
select
e1.entityId as _id,
v1.Value as v1,
v2.Value as v2,
v3.Value as v3,
v4.Value as v4,
v5.Value as v5,
v6.Value as v6,
v7.Value as v7,
v8.Value as v8,
v9.Value as v9,
v10.Value as v10,
v11.Value as v11,
v12.Value as v12
from entity e1
left join val as v1 on (v1.entityId = e1.entityId and v1.attributeId = 1189)
left join val as v2 on (v2.entityId = e1.entityId and v2.attributeId = 1190)
left join entity as e2 on e2.entityId = (select entityId from entity where code = v1.Value and type = 88 limit 1)
left join val as v3 on (v3.entityId = e2.entityId and v3.attributeId = 507)
left join val as v4 on (v4.entityId = e2.entityId and v4.attributeId = 522)
left join val as v5 on (v5.entityId = e2.entityId and v5.attributeId = 558)
left join val as v6 on (v6.entityId = e2.entityId and v6.attributeId = 516)
left join val as v7 on (v7.entityId = e2.entityId and v7.attributeId = 518)
left join val as v8 on (v8.entityId = e2.entityId and v8.attributeId = 1384)
left join val as v9 on (v9.entityId = e2.entityId and v9.attributeId = 659)
left join val as v10 on (v10.entityId = e2.entityId and v10.attributeId = 519)
left join val as v11 on (v11.entityId = e2.entityId and v11.attributeId = 1614)
left join entity as e3 on e3.entityId = (select entityId from entity where code = v9.Value and type = 97 limit 1)
left join val as v12 on (v12.entityId = e3.entityId and v12.attributeId = 661)
where e1.type = 154
and v2.Value = 'foo'
and v5.Value = 'bar'
and v10.Value = 'foo2'
-- and v11.Value = 'bar2'
order by v3.Value asc;
And wrapping that in something like this still takes forever...
select *
from (
<query from above>
) sub
where sub.v11 = 'bar2';
[Screenshot: query execution plan with the condition commented out (fast)]
[Screenshot: query execution plan with the condition included (slow)]
I'm going to fiddle with indexing on the entity table to improve the execution plan regardless, which will likely help... but can someone explain what's going on here, and what I should look for in the execution plan that would indicate such bad performance? And why does wrapping the fast query in a subquery, so that the outer query only has to loop over 16 rows, still take a really long time?
EDIT: I noticed that in the slow query, the leftmost step of the execution plan is a non-unique key lookup (on val.entityId) for "68e9145e-43eb-4581-9727-4212be41bef5" (v11), instead of the unique key lookup the other joins use (the composite index on entityId, attributeId). I presume this might be part of the issue, but why can't it use the composite index there like it does for the rest?
PS: For now, since we know the result set will be small, we are applying that last condition server-side by filtering the result set in our Node.js server.
Here are the results of SHOW CREATE TABLE entity and SHOW CREATE TABLE val:
CREATE TABLE `entity` (
`entityId` int(11) NOT NULL AUTO_INCREMENT,
`UID` varchar(64) NOT NULL,
`type` int(11) NOT NULL,
`code` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
PRIMARY KEY (`entityId`),
UNIQUE KEY `UID` (`UID`),
KEY `IX_Entity_Type` (`type`),
CONSTRAINT `FK_Entities_Types` FOREIGN KEY (`type`) REFERENCES `entityTypes` (`typeId`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=296138 DEFAULT CHARSET=latin1
CREATE TABLE `val` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`UID` varchar(64) NOT NULL,
`attributeId` int(11) NOT NULL,
`entityId` int(11) NOT NULL,
`Value` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
PRIMARY KEY (`id`),
UNIQUE KEY `UID` (`UID`),
UNIQUE KEY `idx_val_entityId_attributeId` (`entityId`,`attributeId`),
KEY `IX_val_attributeId` (`attributeId`),
KEY `IX_val_entityId` (`entityId`)
) ENGINE=InnoDB AUTO_INCREMENT=2325375 DEFAULT CHARSET=latin1
Please provide SHOW CREATE TABLE.
I would hope to see these composite indexes:
`val`: (entityId, attributeId) -- order is not critical
Alas, because code is LONGTEXT, this is not possible for entity: INDEX(type, code, entityId). Hence this will not be very efficient:
SELECT entityId
from entity
where code = v9.Value
and type = 97
limit 1
I see LIMIT without an ORDER BY -- do you care which value you get?
Probably that would be better written as
WHERE EXISTS ( SELECT 1 FROM entity
WHERE entityID = e3.entityID
AND code = v9.Value
AND type = 97 )
(Are you sure about the mixture of e3 and v9?)
Wrapping...
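Testing v11.Value in the outer WHERE still forces the LEFT JOIN to v11 to become a plain JOIN, because the Optimizer can merge the subquery into the outer query. It may also get rid of the now-inner ORDER BY.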
Then the Optimizer probably decides it is best to start with 68e9145e-43eb-4581-9727-4212be41bef5, which I call val AS v11:
JOIN val AS v11 ON (v11.entityId = e2.entityId
AND v11.attributeId = 1614
AND v11.Value = 'bar2')
If this is an EAV table, then all it does is verify that the (entityId, attributeId) pair (e2.entityId, 1614) has the value 'bar2'. This does not seem like a sensible test.
in addition to my former recommendation.
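To see the LEFT JOIN to JOIN effect described above in isolation, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for MySQL (an assumption for illustration only; the plans MySQL chooses will differ). The tables are cut down to the columns the join needs, and the data is made up. Once the WHERE clause tests v11.Value, the rows that the LEFT JOIN padded with NULLs are filtered out, so the result is the same as an inner JOIN with that test in its ON clause, and an inner join leaves the Optimizer free to start from v11.

```python
import sqlite3

# Sketch: SQLite (stdlib sqlite3) stands in for MySQL here; toy data only.


def demo_left_join_filter():
    """Compare a LEFT JOIN filtered in WHERE with an inner JOIN."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE entity (entityId INTEGER PRIMARY KEY);
        CREATE TABLE val (
            entityId    INTEGER NOT NULL,
            attributeId INTEGER NOT NULL,
            Value       TEXT,
            PRIMARY KEY (entityId, attributeId)
        );
        INSERT INTO entity VALUES (1), (2), (3);
        -- entity 1: 'bar2', entity 2: some other value, entity 3: no row at all
        INSERT INTO val VALUES (1, 1614, 'bar2'), (2, 1614, 'other');
    """)
    join_on = "v11.entityId = e.entityId AND v11.attributeId = 1614"
    # Plain LEFT JOIN: every entity survives, missing values come back as NULL.
    left_all = con.execute(
        "SELECT e.entityId, v11.Value FROM entity e "
        f"LEFT JOIN val v11 ON {join_on} ORDER BY e.entityId").fetchall()
    # Same LEFT JOIN, but WHERE tests v11.Value: the NULL-padded rows are dropped.
    left_filtered = con.execute(
        "SELECT e.entityId FROM entity e "
        f"LEFT JOIN val v11 ON {join_on} "
        "WHERE v11.Value = 'bar2' ORDER BY e.entityId").fetchall()
    # Inner JOIN with the test in ON: the same rows come back.
    inner = con.execute(
        "SELECT e.entityId FROM entity e "
        f"JOIN val v11 ON {join_on} AND v11.Value = 'bar2' "
        "ORDER BY e.entityId").fetchall()
    con.close()
    return left_all, left_filtered, inner


if __name__ == "__main__":
    for rows in demo_left_join_filter():
        print(rows)
```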
I would prefer to see the output of EXPLAIN SELECT ...
EAV
Assuming val is a traditional EAV table, this would probably be much better:
CREATE TABLE `val` (
`attributeId` int(11) NOT NULL,
`entityId` int(11) NOT NULL,
`Value` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
PRIMARY KEY(`entityId`,`attributeId`),
KEY `IX_val_attributeId` (`attributeId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
The two IDs (id and UID) have no practical use (unless I am missing something). If you are forced to use them because of a framework, that is unfortunate. Promoting (entityId, attributeId) to be the PK makes fetching Value a little faster.
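SQLite's WITHOUT ROWID tables are stored inside their primary-key B-tree, which is roughly analogous to InnoDB clustering rows on the PRIMARY KEY, so the sketch below uses one to mimic the proposed val layout. Using sqlite3 as a stand-in for MySQL is an assumption, and the sample rows are invented. A lookup by (entityId, attributeId) lands directly on the row holding Value, and the key also guarantees at most one Value per entity and attribute.

```python
import sqlite3

# Sketch: SQLite (stdlib sqlite3) stands in for InnoDB here; invented rows.


def build_val():
    """Create the proposed val layout: no surrogate ids, PK (entityId, attributeId)."""
    con = sqlite3.connect(":memory:")
    # WITHOUT ROWID stores rows inside the primary-key B-tree, roughly like
    # InnoDB's clustered PRIMARY KEY (an analogy, not identical behavior).
    con.execute("""
        CREATE TABLE val (
            attributeId INTEGER NOT NULL,
            entityId    INTEGER NOT NULL,
            Value       TEXT,
            PRIMARY KEY (entityId, attributeId)
        ) WITHOUT ROWID""")
    con.execute("CREATE INDEX IX_val_attributeId ON val (attributeId)")
    # Rows are (attributeId, entityId, Value).
    con.executemany("INSERT INTO val VALUES (?, ?, ?)",
                    [(1189, 10, 'X-1'), (1190, 10, 'foo'), (1190, 11, 'baz')])
    return con


def get_value(con, entity_id, attribute_id):
    """Fetch one attribute of one entity through the primary key."""
    row = con.execute(
        "SELECT Value FROM val WHERE entityId = ? AND attributeId = ?",
        (entity_id, attribute_id)).fetchone()
    return row[0] if row else None


def second_value_rejected(con):
    """The PK allows only one Value per (entityId, attributeId)."""
    try:
        con.execute("INSERT INTO val VALUES (1190, 10, 'again')")
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    con = build_val()
    print(get_value(con, 10, 1190), second_value_rejected(con))
```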
There is no useful way to include a LONGTEXT in any index, so some of my previous suggestions need changing.
/* The query below fetches all completed tasks. */
insert ignore into NodeInstanceLog_Dump
select nil.id, nil.connection, nil.log_date, nil.externalId,
nil.nodeContainerId, nil.nodeId ,nil.nodeInstanceId,
coalesce(nil.nodename, nil3.name) as nodename, nil.nodeType, nil.processId,
nil.processInstanceId , nil.referenceId, nil.slaCompliance, nil.sla_due_date,
nil.type, nil.workItemId, 0 as activeStatus
from bpm.NodeInstanceLog nil
inner join bpm.VariableInstanceLog vil
ON nil.processInstanceId=vil.processInstanceId
and vil.value='Success'
and vil.variableId in ('oltOrderStatus','orderStatus')
and nodeType='EndNode'
and type=0
left join
(
SELECT distinct nil2.*, nil1.nodeName as name
from bpm.NodeInstanceLog nil1 inner join
(
SELECT max(convert(nodeinstanceid, signed)) as id, processInstanceId
from bpm.NodeInstanceLog
where nodetype='HumanTaskNode' group by processInstanceId
) nil2 ON nil1.nodeinstanceid=nil2.id
and nil1.processInstanceId=nil2.processInstanceId
) nil3 ON nil.processInstanceId=nil3.processInstanceId;
/* The query below fetches all aborted tasks. */
insert ignore into NodeInstanceLog_Dump
select nil.id, nil.connection, nil.log_date, nil.externalId,
nil.nodeContainerId, nil.nodeId ,nil.nodeInstanceId,
coalesce(nil.nodename, nil3.name) as nodename, nil.nodeType, nil.processId,
nil.processInstanceId , nil.referenceId, nil.slaCompliance, nil.sla_due_date,
nil.type, nil.workItemId, 0 as activeStatus
from bpm.NodeInstanceLog nil
inner join bpm.VariableInstanceLog vil
ON nil.processInstanceId=vil.processInstanceId
and vil.value='Aborted'
and vil.variableId in ('oltOrderStatus','orderStatus')
and nodeType='EndNode'
and type=0
left join
(
SELECT distinct nil2.*, nil1.nodeName as name
from bpm.NodeInstanceLog nil1 inner join
(
SELECT max(convert(nodeinstanceid, signed)) as id, processInstanceId
from bpm.NodeInstanceLog
where nodetype='HumanTaskNode' group by processInstanceId
) nil2 ON nil1.nodeinstanceid=nil2.id
and nil1.processInstanceId=nil2.processInstanceId
) nil3 ON nil.processInstanceId=nil3.processInstanceId;
(from comment)
Table: NodeInstanceLog. Columns:
id bigint(20) AI PK
connection varchar(255)
log_date datetime
externalId varchar(255)
nodeId varchar(255)
nodeInstanceId varchar(255)
nodeName varchar(255)
nodeType varchar(255)
processId varchar(255)
processInstanceId bigint(20)
sla_due_date datetime
slaCompliance int(11)
type int(11)
workItemId bigint(20)
nodeContainerId varchar(255)
referenceId bigint(20)
Some of these indexes may help:
NodeInstanceLog: INDEX(processInstanceId)
NodeInstanceLog: INDEX(nodeinstanceid, nodeName, processInstanceId)
VariableInstanceLog: INDEX(processInstanceId, value, variableId)
When adding a composite index, DROP index(es) with the same leading columns.
That is, when you have both INDEX(a) and INDEX(a,b), toss the former.
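As a quick illustration of the leftmost-prefix rule behind that advice, the sketch below creates only INDEX(a, b) and asks the planner how it would run a query that filters on a alone. It uses Python's built-in sqlite3 module standing in for MySQL (an assumption; both use B-tree indexes, but their planners differ), and the table t and index idx_ab are hypothetical. The plan text is expected to name idx_ab, which is why a separate INDEX(a) is redundant.

```python
import sqlite3

# Sketch: SQLite (stdlib sqlite3) stands in for MySQL here.
# Hypothetical table: only the composite index (a, b) exists, no INDEX(a).
SETUP = """
    CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT);
    CREATE INDEX idx_ab ON t (a, b);
"""


def query_plan(query):
    """Return the planner's description of how it would run `query`."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    # Each EXPLAIN QUERY PLAN row ends with a human-readable 'detail' column.
    details = [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + query)]
    con.close()
    return " | ".join(details)


if __name__ == "__main__":
    # A filter on the leading column alone can still use idx_ab.
    print(query_plan("SELECT c FROM t WHERE a = 5"))
```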
max(convert(nodeinstanceid,signed)) -- Does this mean that nodeinstanceid is a VARCHAR, but needs to be compared as a number? I recommend you find a way to store it in an INT or other numeric column; this may allow the query to run much faster. Also, is that column in NodeInstanceLog? Is it the PRIMARY KEY? What indexes (including the PK) exist now on the tables?
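Why numbers stored in a VARCHAR hurt: compared as strings, '10' sorts before '9', so MAX() over the raw column picks the wrong value unless every value is converted first, and once every value must be converted, an index on the raw string column cannot supply that maximum directly. The sketch below shows the comparison difference with Python's sqlite3 module standing in for MySQL (an assumption), where CAST(... AS INTEGER) plays the role of MySQL's CONVERT(..., SIGNED); the sample IDs are invented.

```python
import sqlite3

# Sketch: SQLite (stdlib sqlite3) stands in for MySQL here; invented IDs.


def max_both_ways(ids):
    """MAX over a VARCHAR column, compared as text and as numbers."""
    con = sqlite3.connect(":memory:")
    # VARCHAR gets TEXT affinity in SQLite, so the values stay strings.
    con.execute("CREATE TABLE NodeInstanceLog (nodeInstanceId VARCHAR(255))")
    con.executemany("INSERT INTO NodeInstanceLog VALUES (?)", [(i,) for i in ids])
    as_text = con.execute(
        "SELECT MAX(nodeInstanceId) FROM NodeInstanceLog").fetchone()[0]
    # CAST here plays the role of MySQL's CONVERT(nodeInstanceId, SIGNED).
    as_number = con.execute(
        "SELECT MAX(CAST(nodeInstanceId AS INTEGER)) FROM NodeInstanceLog"
    ).fetchone()[0]
    con.close()
    return as_text, as_number


if __name__ == "__main__":
    print(max_both_ways(["2", "9", "10"]))  # text MAX vs numeric MAX
```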
To help readers understand the query, please use ON for specifying how the tables relate, use WHERE for filtering.
Please qualify all columns with their table alias -- I can't tell, for example, which table type and nodeType are in. Hence, the INDEX recommendations above may not be complete.
Once upon a time, I had a table like this:
CREATE TABLE `Events` (
`EvtId` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`AlarmId` INT UNSIGNED,
-- Other fields omitted for brevity
PRIMARY KEY (`EvtId`)
);
AlarmId was permitted to be NULL.
Now, because I want to expand from zero-or-one alarm per event to zero-or-more alarms per event, in a software update I'm changing instances of my database to have this instead:
CREATE TABLE `Events` (
`EvtId` INT UNSIGNED NOT NULL AUTO_INCREMENT,
-- Other fields omitted for brevity
PRIMARY KEY (`EvtId`)
);
CREATE TABLE `EventAlarms` (
`EvtId` INT UNSIGNED NOT NULL,
`AlarmId` INT UNSIGNED NOT NULL,
PRIMARY KEY (`EvtId`, `AlarmId`),
CONSTRAINT `fk_evt` FOREIGN KEY (`EvtId`) REFERENCES `Events` (`EvtId`)
ON DELETE CASCADE ON UPDATE CASCADE
);
So far so good.
The data is easy to migrate, too:
INSERT INTO `EventAlarms`
SELECT `EvtId`, `AlarmId` FROM `Events` WHERE `AlarmId` IS NOT NULL;
ALTER TABLE `Events` DROP COLUMN `AlarmId`;
Thing is, my system requires that a downgrade also be possible. I accept that downgrades will sometimes be lossy in terms of data, and that's okay. However, they do need to work where possible, and result in the older database structure while making a best effort to keep as much original data as is reasonably possible.
In this case, that means going from zero-or-more alarms per event, to zero-or-one alarm per event. I could do it like this:
ALTER TABLE `Events` ADD COLUMN `AlarmId` INT UNSIGNED;
UPDATE `Events`
LEFT JOIN `EventAlarms` USING(`EvtId`)
SET `Events`.`AlarmId` = `EventAlarms`.`AlarmId`;
DROP TABLE `EventAlarms`;
… which is kind of fine, since I don't really care which one gets kept (it's best-effort, remember). However, as warned, this is not good for replication as the result may be unpredictable:
> SHOW WARNINGS;
Unsafe statement written to the binary log using statement format since
BINLOG_FORMAT = STATEMENT. Statements writing to a table with an auto-
increment column after selecting from another table are unsafe because the
order in which rows are retrieved determines what (if any) rows will be
written. This order cannot be predicted and may differ on master and the
slave.
Is there a way to somehow "order" or "limit" the join in the update, or shall I just skip this whole enterprise and stop trying to be clever? If the latter, how can I leave the downgraded AlarmId as NULL iff there were multiple rows in the new table between which we cannot safely distinguish? I do want to migrate the AlarmId if there is only one.
As a downgrade is a "one-time" maintenance operation, it doesn't have to be exactly real-time, but speed would be nice. Both tables could potentially have thousands of rows.
(MariaDB 5.5.56 on CentOS 7, but must also work on whatever ships with CentOS 6.)
First, we can perform a bit of analysis, with a self-join:
SELECT `A`.`EvtId`, COUNT(`B`.`EvtId`) AS `N`
FROM `EventAlarms` AS `A`
LEFT JOIN `EventAlarms` AS `B` ON (`A`.`EvtId` = `B`.`EvtId`)
GROUP BY `B`.`EvtId`
The result will look something like this:
EvtId N
--------------
370 1
371 1
372 4
379 1
380 1
382 16
383 1
384 1
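A note on the N column: because every row of A is joined to every row of B with the same EvtId, an event with k alarms produces k × k joined rows, so N is k² rather than k (the 4 and 16 above are consistent with 2 and 4 alarms). The n = 1 test used below still works, since k² = 1 only when k = 1, but a plain GROUP BY without the self-join returns the actual count. Here is a small check, using Python's sqlite3 module as a stand-in for MariaDB (an assumption) and made-up rows:

```python
import sqlite3

# Sketch: SQLite (stdlib sqlite3) stands in for MariaDB here; made-up rows.


def count_alarms(rows):
    """Per-event counts via the self-join above and via a plain GROUP BY."""
    con = sqlite3.connect(":memory:")
    con.execute("""
        CREATE TABLE EventAlarms (
            EvtId   INTEGER NOT NULL,
            AlarmId INTEGER NOT NULL,
            PRIMARY KEY (EvtId, AlarmId)
        )""")
    con.executemany("INSERT INTO EventAlarms VALUES (?, ?)", rows)
    # The self-join pairs each alarm with every alarm of the same event: k * k rows.
    self_join = con.execute("""
        SELECT A.EvtId, COUNT(B.EvtId) AS N
        FROM EventAlarms AS A
        LEFT JOIN EventAlarms AS B ON (A.EvtId = B.EvtId)
        GROUP BY B.EvtId
        ORDER BY 1""").fetchall()
    # Counting the rows of each group directly gives k.
    plain = con.execute("""
        SELECT EvtId, COUNT(*) AS N
        FROM EventAlarms
        GROUP BY EvtId
        ORDER BY EvtId""").fetchall()
    con.close()
    return self_join, plain


if __name__ == "__main__":
    sample = [(370, 1), (372, 1), (372, 2), (382, 1), (382, 2), (382, 3), (382, 4)]
    print(count_alarms(sample))
```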
Now you can, if you like, drop all the rows representing events that map to more than one alarm (which you suggest as a fallback solution; I think this makes sense, though you could modify the below to leave one of them in place if you really wanted).
Instead of actually DELETEing anything, though, it's easier to introduce a new table, populated using the self-joining query shown above:
CREATE TEMPORARY TABLE `_migrate` (
`EvtId` INT UNSIGNED,
`n` INT UNSIGNED,
PRIMARY KEY (`EvtId`),
KEY `idx_n` (`n`)
);
INSERT INTO `_migrate`
SELECT `A`.`EvtId`, COUNT(`B`.`EvtId`) AS `n`
FROM `EventAlarms` AS `A`
LEFT JOIN `EventAlarms` AS `B` ON(`A`.`EvtId` = `B`.`EvtId`)
GROUP BY `B`.`EvtId`;
Then your update becomes:
UPDATE `Events`
LEFT JOIN `_migrate` ON (`Events`.`EvtId` = `_migrate`.`EvtId` AND `_migrate`.`n` = 1)
LEFT JOIN `EventAlarms` ON (`_migrate`.`EvtId` = `EventAlarms`.`EvtId`)
SET `Events`.`AlarmId` = `EventAlarms`.`AlarmId`
WHERE `EventAlarms`.`AlarmId` IS NOT NULL;
And, finally, clean up after yourself:
DROP TABLE `_migrate`;
DROP TABLE `EventAlarms`;
MySQL still kicks out the same warning as before, but since we know that at most one value will be pulled from the source tables, we can basically just ignore it.
It should even be reasonably efficient, as we can tell from the equivalent EXPLAIN SELECT:
EXPLAIN SELECT `Events`.`EvtId` FROM `Events`
LEFT JOIN `_migrate` ON (`Events`.`EvtId` = `_migrate`.`EvtId` AND `_migrate`.`n` = 1)
LEFT JOIN `EventAlarms` ON (`_migrate`.`EvtId` = `EventAlarms`.`EvtId`)
WHERE `EventAlarms`.`AlarmId` IS NOT NULL
id select_type table type possible_keys key key_len ref rows Extra
---------------------------------------------------------------------------------------------------------------------
1 SIMPLE _migrate ref PRIMARY,idx_n idx_n 5 const 6 Using index
1 SIMPLE EventAlarms ref PRIMARY,fk_AlarmId PRIMARY 8 db._migrate.EvtId 1 Using where; Using index
1 SIMPLE Events eq_ref PRIMARY PRIMARY 8 db._migrate.EvtId 1 Using where; Using index
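If you would rather avoid the temporary table, the same "copy the alarm only when there is exactly one" rule can be written as a single-table UPDATE with subqueries. This is a swapped-in alternative, not the approach above, sketched with Python's sqlite3 module standing in for MariaDB (an assumption); whether MariaDB 5.5 still logs the unsafe-statement warning for this form is not verified here. Because the update only touches events that have exactly one row in EventAlarms, MIN() sees a single value and the result does not depend on row order.

```python
import sqlite3

# Sketch: SQLite (stdlib sqlite3) stands in for MariaDB here; made-up rows.


def downgrade(event_ids, alarm_rows):
    """Copy AlarmId back into Events only where an event has exactly one alarm."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Events (EvtId INTEGER PRIMARY KEY, AlarmId INTEGER);
        CREATE TABLE EventAlarms (
            EvtId   INTEGER NOT NULL,
            AlarmId INTEGER NOT NULL,
            PRIMARY KEY (EvtId, AlarmId)
        );
    """)
    con.executemany("INSERT INTO Events (EvtId) VALUES (?)", [(e,) for e in event_ids])
    con.executemany("INSERT INTO EventAlarms VALUES (?, ?)", alarm_rows)
    # Only events with exactly one alarm are updated; all others stay NULL.
    con.execute("""
        UPDATE Events
        SET AlarmId = (SELECT MIN(EventAlarms.AlarmId) FROM EventAlarms
                       WHERE EventAlarms.EvtId = Events.EvtId)
        WHERE EvtId IN (SELECT EvtId FROM EventAlarms
                        GROUP BY EvtId HAVING COUNT(*) = 1)""")
    result = con.execute("SELECT EvtId, AlarmId FROM Events ORDER BY EvtId").fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(downgrade([370, 371, 372], [(370, 5), (372, 7), (372, 8)]))
```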
Use a subquery and user variables to select just one EventAlarms row per EvtId.
In your UPDATE, use this derived table instead of EventAlarms:
( SELECT `EvtId`, `AlarmId`
FROM ( SELECT `EvtId`, `AlarmId`,
@rn := IF( @EvtId = `EvtId`,
@rn + 1,
IF( @EvtId := `EvtId`, 1, 1 )
) AS rn
FROM `EventAlarms`
CROSS JOIN ( SELECT @EvtId := 0, @rn := 0 ) AS vars
ORDER BY `EvtId`, `AlarmId`
) AS t
WHERE rn = 1
) AS SingleEventAlarms
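A simpler way to get one row per event, swapped in here rather than taken from the answer above, is plain aggregation: GROUP BY EvtId with MIN(AlarmId) keeps the smallest AlarmId, which is the row rn = 1 picks when rows are numbered in EvtId, AlarmId order. It also sidesteps the fact that MySQL documents the evaluation order of expressions involving user variables as undefined. The sketch uses Python's sqlite3 module as a stand-in (an assumption) with invented rows:

```python
import sqlite3

# Sketch: SQLite (stdlib sqlite3) stands in for MySQL/MariaDB here; invented rows.


def single_event_alarms(rows):
    """One row per EvtId, keeping the smallest AlarmId (what rn = 1 selects)."""
    con = sqlite3.connect(":memory:")
    con.execute("""
        CREATE TABLE EventAlarms (
            EvtId   INTEGER NOT NULL,
            AlarmId INTEGER NOT NULL,
            PRIMARY KEY (EvtId, AlarmId)
        )""")
    con.executemany("INSERT INTO EventAlarms VALUES (?, ?)", rows)
    result = con.execute("""
        SELECT EvtId, MIN(AlarmId) AS AlarmId
        FROM EventAlarms
        GROUP BY EvtId
        ORDER BY EvtId""").fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(single_event_alarms([(1, 30), (1, 10), (1, 20), (2, 5)]))
```

In MySQL, the derived table in the UPDATE would then be ( SELECT `EvtId`, MIN(`AlarmId`) AS `AlarmId` FROM `EventAlarms` GROUP BY `EvtId` ) AS SingleEventAlarms.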
I have a table that occasionally has duplicate rows, so I want to flag every copy except the first one as a duplicate. Currently I'm using this, but it can be very slow:
UPDATE _gtemp X
JOIN _gtemp Y
ON CONCAT(X.gt_spid, "-", X.gt_cov) = CONCAT(Y.gt_spid, "-", Y.gt_cov)
AND Y.gt_dna = 0
AND Y.gt_gtid < X.gt_gtid
SET X.gt_dna = 1;
gt_spid is a numeric ID, and gt_cov is CHAR(3). I have an index on gt_spid and a second index on (gt_spid, gt_cov). At times this table can have upwards of 250,000 rows, but even at 30,000 it takes forever.
Is there a better way to accomplish this? I can change the table as needed.
CREATE TABLE `_gtemp` (
`gt_gtid` int(11) NOT NULL AUTO_INCREMENT,
`gt_group` varchar(10) DEFAULT NULL,
`gt_spid` int(11) DEFAULT NULL,
`gt_cov` char(3) DEFAULT NULL,
`gt_dna` tinyint(1) DEFAULT '0',
PRIMARY KEY (`gt_gtid`),
KEY `spid` (`gt_spid`),
KEY `spidcov` (`gt_spid`,`gt_cov`) USING HASH
)
The way you have used CONCAT prevents the MySQL optimizer from using your indexes, resulting in a very slow query.
That's why you need to replace the CONCAT comparison with separate conditions joined by AND, like below:
UPDATE
_gtemp X
JOIN
_gtemp Y
ON
X.gt_spid = Y.gt_spid
AND
X.gt_cov = Y.gt_cov
AND
Y.gt_dna = 0
AND
Y.gt_gtid < X.gt_gtid
SET X.gt_dna = 1;
You can eliminate CONCAT in the ON clause and replace it with AND, as follows.
I have also moved one restriction from the ON clause to the WHERE clause.
Also, add an index on gt_dna.
UPDATE _gtemp X
JOIN _gtemp Y
ON X.gt_spid = Y.gt_spid
AND X.gt_cov = Y.gt_cov
AND Y.gt_dna = 0
SET X.gt_dna = 1
WHERE Y.gt_gtid < X.gt_gtid
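For comparison, here is the same "flag every copy except the first" rule checked on a few made-up rows. It uses Python's sqlite3 module as a stand-in (an assumption) and a correlated EXISTS subquery, because SQLite does not accept MySQL's UPDATE ... JOIN syntax; MySQL, in turn, rejects a subquery that selects from the table being updated (error 1093), so in MySQL keep the JOIN form from the answers above. The sketch leaves out the Y.gt_dna = 0 test: the first row of each group is never flagged, so every later copy always finds it as a match with gt_dna = 0, and the test does not change which rows get flagged.

```python
import sqlite3

# Sketch: SQLite (stdlib sqlite3) stands in for MySQL here; made-up rows.


def flag_duplicates(rows):
    """Set gt_dna = 1 on every row except the first of each (gt_spid, gt_cov)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE _gtemp (
            gt_gtid INTEGER PRIMARY KEY AUTOINCREMENT,
            gt_spid INTEGER,
            gt_cov  CHAR(3),
            gt_dna  INTEGER DEFAULT 0
        );
        CREATE INDEX spidcov ON _gtemp (gt_spid, gt_cov);
    """)
    con.executemany("INSERT INTO _gtemp (gt_spid, gt_cov) VALUES (?, ?)", rows)
    # Compare the columns directly (no CONCAT) so the (gt_spid, gt_cov) index
    # can be used to find earlier rows with the same key.
    con.execute("""
        UPDATE _gtemp SET gt_dna = 1
        WHERE EXISTS (SELECT 1 FROM _gtemp AS y
                      WHERE y.gt_spid = _gtemp.gt_spid
                        AND y.gt_cov  = _gtemp.gt_cov
                        AND y.gt_gtid < _gtemp.gt_gtid)""")
    result = con.execute(
        "SELECT gt_gtid, gt_dna FROM _gtemp ORDER BY gt_gtid").fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(flag_duplicates([(1, 'abc'), (1, 'abc'), (2, 'abc'), (1, 'xyz'), (1, 'abc')]))
```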
I have this table:
CREATE TABLE `page` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`sortorder` SMALLINT(5) UNSIGNED NOT NULL,
PRIMARY KEY (`id`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
This is the data I have:
id sortorder
1 0
2 1
And I want to run this query:
select id from page where (sortorder = (select sortorder from page where id = 1) - 1)
(I'm trying to find the previous page, ie the one with the lower sortorder, if it exists. If none exists, I want an empty result set.)
The error I receive from mysql:
SQL Error (1690): BIGINT UNSIGNED value is out of range in '((select '0' from `page` where 1) - 1)'
And more specifically when I run:
select sortorder - 1 from page where id = 1
I get:
SQL Error (1690): BIGINT UNSIGNED value is out of range in '('0' - 1)'
What can I do to prevent this?
I usually use JOINs for this purpose because they can be optimized better than subqueries. This query should produce the same result as yours, but probably faster:
SELECT pp.*
FROM page cp # 'cp' from 'current page'
LEFT JOIN page pp # 'pp' from 'previous page'
ON pp.sortorder = cp.sortorder - 1
WHERE cp.id = 1
Unfortunately, it fails with the same error: when cp.sortorder is 0, cp.sortorder - 1 would be -1, which is out of range for an UNSIGNED result.
It can be fixed by writing the JOIN condition as:
ON pp.sortorder + 1 = cp.sortorder
I moved the -1 to the other side of the equal sign and it turned to +1.
You can also fix your original query by using the same trick: moving -1 to the other side of the equal sign; this way it becomes +1 and there is no error any more:
select id
from page
where (sortorder + 1 = (select sortorder from page where id = 1))
The problem with both queries now is that, because there is no index on column sortorder, MySQL has to check every row one by one against the WHERE (or ON) condition, which takes a lot of time and resources on a large table.
Fortunately, this can be fixed easily by adding an index on column sortorder:
ALTER TABLE page ADD INDEX(sortorder);
Now both queries can be used. The one using JOIN (and the ON condition with +1) is slightly faster.
The original query doesn't return any rows when the condition is not met. The JOIN query returns a row full of NULLs. It can be modified to return no rows by replacing LEFT JOIN with INNER JOIN.
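SQLite has no UNSIGNED integers, so it cannot reproduce error 1690, but it can show the LEFT JOIN versus INNER JOIN behavior described just above. The sketch below (Python's sqlite3 module as a stand-in for MySQL, an assumption) builds the two-row page table from the question, adds the suggested index on sortorder, and uses the pp.sortorder + 1 = cp.sortorder form.

```python
import sqlite3

# Sketch: SQLite (stdlib sqlite3) stands in for MySQL here.


def build_pages():
    """The two-row page table from the question, plus the suggested index."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE page (id INTEGER PRIMARY KEY, sortorder INTEGER NOT NULL)")
    con.execute("CREATE INDEX idx_sortorder ON page (sortorder)")
    con.executemany("INSERT INTO page VALUES (?, ?)", [(1, 0), (2, 1)])
    return con


def previous_page(con, page_id, join_type="LEFT JOIN"):
    """Ids of the page whose sortorder is one less than page_id's."""
    # join_type is interpolated into SQL, so only allow the two fixed keywords.
    if join_type not in ("LEFT JOIN", "INNER JOIN"):
        raise ValueError("join_type must be 'LEFT JOIN' or 'INNER JOIN'")
    sql = f"""
        SELECT pp.id
        FROM page AS cp
        {join_type} page AS pp ON pp.sortorder + 1 = cp.sortorder
        WHERE cp.id = ?"""
    return con.execute(sql, (page_id,)).fetchall()


if __name__ == "__main__":
    con = build_pages()
    print(previous_page(con, 2))                # page 2 has a previous page
    print(previous_page(con, 1))                # LEFT JOIN: one row of NULL
    print(previous_page(con, 1, "INNER JOIN"))  # INNER JOIN: no rows
```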
You can circumvent the error altogether (and use any version of these queries) by removing the UNSIGNED attribute from column sortorder:
ALTER TABLE page
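CHANGE COLUMN `sortorder` `sortorder` SMALLINT(5) NOT NULL;
Try adding NO_UNSIGNED_SUBTRACTION to your SQL mode. Note that this statement replaces the entire session sql_mode, so include any other modes you rely on in the same string:
SET sql_mode = 'NO_UNSIGNED_SUBTRACTION';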
I have a table, friends:
CREATE TABLE IF NOT EXISTS `friends` (
`fr_id` int(11) NOT NULL AUTO_INCREMENT,
`fr_sender` int(11) NOT NULL,
`fr_receiver` int(11) NOT NULL,
`fr_validate` enum('0','1','2') NOT NULL DEFAULT '0',
PRIMARY KEY (`fr_id`),
KEY `fr_sender` (`fr_sender`),
KEY `fr_receiver` (`fr_receiver`),
KEY `fr_validate` (`fr_validate`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=2397953 ;
fr_id => row ID (primary key)
fr_sender => sender of the friend request
fr_receiver => the receiver
fr_validate => 0 = no reply, 1 = request accepted, 2 = request refused.
My mysql-slow.log has many entries for this query:
SELECT fr_id FROM friends WHERE (fr_sender = '113405' OR fr_receiver = '113405') && fr_validate = "1";
# Query_time: 5.607869 Lock_time: 0.000052 Rows_sent: 106 Rows_examined: 833517
How can I optimise my indexes for this query?
Thank you.
It's tricky to optimize when you use OR between conditions for two different columns. It often results in a costly table-scan.
Here's a workaround:
ALTER TABLE friends
ADD INDEX (fr_validate, fr_sender),
ADD INDEX (fr_validate, fr_receiver);
SELECT fr_id FROM friends WHERE fr_validate = '1' AND fr_sender = '113405'
UNION
SELECT fr_id FROM friends WHERE fr_validate = '1' AND fr_receiver = '113405'
The reason to create two indexes is so that each subquery has a respective index to minimize the rows examined as much as possible. Then the results from each subquery are combined, which gives an equivalent result set as your original query.
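Here is a quick equivalence check of the rewrite, using Python's sqlite3 module as a stand-in for MySQL (an assumption) and a handful of made-up rows. Row 5 is a contrived case where the same user is both sender and receiver; it matches both branches, which is why the rewrite needs UNION (which removes duplicates) rather than UNION ALL.

```python
import sqlite3

# Sketch: SQLite (stdlib sqlite3) stands in for MySQL here; made-up rows.


def build_friends(rows):
    """A cut-down friends table with the two suggested composite indexes."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE friends (
            fr_id       INTEGER PRIMARY KEY,
            fr_sender   INTEGER NOT NULL,
            fr_receiver INTEGER NOT NULL,
            fr_validate TEXT NOT NULL DEFAULT '0'
        );
        CREATE INDEX ix_validate_sender   ON friends (fr_validate, fr_sender);
        CREATE INDEX ix_validate_receiver ON friends (fr_validate, fr_receiver);
    """)
    con.executemany("INSERT INTO friends VALUES (?, ?, ?, ?)", rows)
    return con


OR_QUERY = """
    SELECT fr_id FROM friends
    WHERE (fr_sender = ? OR fr_receiver = ?) AND fr_validate = '1'"""

UNION_QUERY = """
    SELECT fr_id FROM friends WHERE fr_validate = '1' AND fr_sender = ?
    UNION
    SELECT fr_id FROM friends WHERE fr_validate = '1' AND fr_receiver = ?"""


def accepted_ids(con, sql, user_id):
    """Run either query for one user and return the sorted fr_id values."""
    return sorted(row[0] for row in con.execute(sql, (user_id, user_id)))


if __name__ == "__main__":
    con = build_friends([
        (1, 113405, 7, '1'),       # sent and accepted
        (2, 8, 113405, '1'),       # received and accepted
        (3, 113405, 9, '0'),       # no reply yet: excluded
        (4, 5, 6, '1'),            # someone else's friendship
        (5, 113405, 113405, '1'),  # contrived: matches both UNION branches
    ])
    print(accepted_ids(con, OR_QUERY, 113405), accepted_ids(con, UNION_QUERY, 113405))
```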
PS: Please use single-quotes for string literals and date literals. MySQL allows double-quotes to serve the same role by default, but if you use another RDBMS brand, or if you SET SQL_MODE=ANSI_QUOTES in MySQL, you'll find the standard meaning of double-quotes is for delimiting table names and column names, not strings.