Percona performance issue vs MySQL

Our server company advised us to switch to Percona when setting up our new DB servers. We're currently on Percona Server (GPL), Release 82.0 (version 5.6.36-82.0), Revision 58e846a, and there's one behavior I'm trying to wrap my head around that we definitely weren't experiencing before with MySQL, so I thought I'd reach out.
This is a query we run fairly regularly to pull an article from our db:
SELECT * FROM table_a a, table_b b
WHERE a.id = b.id AND a.status_field = 'open' AND b.filter_field = 'no_filter' AND b.view_field = 'article'
ORDER BY a.unixtimestamp DESC LIMIT 1
This used to complete very quickly, but under Percona the combination of the WHERE conditions on table b and the ordering on table a makes the whole query take ~3s. I don't fully understand this behavior.
If I alter it to:
SELECT * FROM table_a a, table_b b
WHERE a.id = b.id AND a.status_field = 'open' AND b.filter_field = 'no_filter' AND b.view_field = 'article'
ORDER BY b.unixtimestamp DESC LIMIT 1
Then it completes very quickly (< 0.05s).
Is this expected behavior with Percona?
I just wanted to know before changing any db structure to compensate.
Edit:
For the explain, I simplified the query and it still has the same issue (id = entry_id):
Slow Query (1.5122389793):
SELECT * FROM table_a a, table_b b
WHERE a.id = b.id AND b.special_filter = 'no_filter'
ORDER BY a.id DESC LIMIT 1
Slow Query explain:
id  select_type  table    type    possible_keys                    key             key_len  ref                       rows    Extra
1   SIMPLE       table_b  ref     PRIMARY,entry_id,special_filter  special_filter  26       const                     130733  Using where; Using temporary; Using filesort
1   SIMPLE       table_a  eq_ref  PRIMARY                          PRIMARY         4        db_name.table_b.entry_id  1       Using index
Fast Query (0.0006549358):
SELECT * FROM table_a a, table_b b
WHERE a.id = b.id AND b.special_filter = 'no_filter'
ORDER BY b.id DESC LIMIT 1
Fast Query explain:
id  select_type  table    type    possible_keys                    key             key_len  ref                       rows    Extra
1   SIMPLE       table_b  ref     PRIMARY,entry_id,special_filter  special_filter  26       const                     130733  Using where
1   SIMPLE       table_a  eq_ref  PRIMARY                          PRIMARY         4        db_name.table_b.entry_id  1       Using index
I've tried to omit as much info from the tables as possible for security reasons, but if I'm grossly missing something I can add it back in.
table_a:
Relevant Keys:
table_a 0 PRIMARY 1 entry_id A 321147 BTREE
Create table:
table_a CREATE TABLE table_a ( entry_id int(10) unsigned NOT NULL AUTO_INCREMENT) ENGINE=InnoDB AUTO_INCREMENT=356198 DEFAULT CHARSET=utf8 DELAY_KEY_WRITE=1
table_b:
Relevant Keys:
table_b 0 PRIMARY 1 entry_id A 261467 BTREE
table_b 1 entry_id 1 entry_id A 261467 BTREE
table_b 1 special_filter 1 special_filter A 14 8 BTREE
Create Table:
table_b CREATE TABLE table_b ( entry_id int(10) unsigned NOT NULL DEFAULT '0', special_filter text NOT NULL, ) ENGINE=InnoDB DEFAULT CHARSET=utf8 DELAY_KEY_WRITE=1
Both tables have ~ 350k rows

It seems to be a MySQL optimizer issue where it chooses not to join on the primary key under some conditions. It is closely related to, or the same as, this issue: https://dba.stackexchange.com/questions/53274/mysql-innodb-issue-not-using-indexes-correctly-percona-5-6
Explicitly writing the query as a STRAIGHT_JOIN with the tables in a specific order solves the issue. Adding a USE INDEX (PRIMARY) hint after the name of the joined table is easier, though, and doesn't rely on the table order.

Related

Optimize an Update query using IN subquery on Self

I have an 80000-row database with a result number between 130000000 and 168000000. The results are paired using the field pid. I need to change the status of the rows from 'G' to 'X' where the result pair has a difference of 4300000.
I have come up with the query below, which works but is very slow. Can it be improved for speed?
UPDATE table1 SET status = 'X'
WHERE id IN (
SELECT id FROM (
SELECT a.id AS id FROM table1 a, table1 b
WHERE a.result = b.result + 4300000
AND a.pid = b.pid
AND a.result between 130000000 and 168000000
AND a.status = 'G'
) AS c
);
The indexes are:
table1 0 PRIMARY 1 id A 80233 NULL NULL BTREE
table1 1 id 1 id A 80233 NULL NULL BTREE
table1 1 id 2 result A 80233 NULL NULL BTREE
table1 1 id 3 status A 80233 4 NULL YES BTREE
table1 1 id 4 name A 80233 32 NULL BTREE
table1 1 id 5 pid A 80233 16 NULL BTREE
Using a subquery inside the IN(..) clause is generally inefficient in MySQL. Instead, you can rewrite the update using the UPDATE .. JOIN syntax with a self-join:
UPDATE table1 AS a
JOIN table1 AS b
ON b.pid = a.pid
AND b.result = a.result - 4300000
SET a.status = 'X'
WHERE a.result between 130000000 and 168000000
AND a.status = 'G'
For good performance (and if I understand NLJ (Nested-Loop Join) correctly), you would need two indexes: (status,result) and (pid).
The first (composite) index will be used to select rows from the table alias a. Since there is a range condition on result, it is better to define status first; if result came first, MySQL would stop using the index at the result column because of the range condition.
The second index will be used for lookups in the joined table alias b, using the NLJ algorithm.
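As a rough, runnable sketch of the two suggested indexes and of which rows the pairing condition selects, the snippet below uses Python's built-in sqlite3 module as a stand-in for MySQL. The index names, the reduced column list, and the sample rows are assumptions made for illustration. SQLite does not accept MySQL's UPDATE .. JOIN syntax, so the same pairing condition is expressed with EXISTS here; it says nothing about how MySQL would actually plan the statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced, hypothetical version of table1 (only the columns the query uses).
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, pid TEXT, result INTEGER, status TEXT);

    -- The two indexes suggested above; the index names are made up.
    CREATE INDEX idx_status_result ON table1 (status, result);
    CREATE INDEX idx_pid ON table1 (pid);

    INSERT INTO table1 VALUES
        (1, 'p1', 130000000, 'G'),
        (2, 'p1', 134300000, 'G'),  -- exactly 4300000 above row 1, same pid
        (3, 'p2', 140000000, 'G'),
        (4, 'p2', 150000000, 'G');  -- difference is not 4300000
""")

# Same pairing condition as the UPDATE .. JOIN, written with EXISTS for SQLite.
conn.execute("""
    UPDATE table1 SET status = 'X'
    WHERE status = 'G'
      AND result BETWEEN 130000000 AND 168000000
      AND EXISTS (SELECT 1 FROM table1 b
                  WHERE b.pid = table1.pid
                    AND b.result = table1.result - 4300000)
""")

updated_ids = [row[0] for row in
               conn.execute("SELECT id FROM table1 WHERE status = 'X' ORDER BY id")]
print(updated_ids)
```

Only the higher member of the matching pair (row 2) is flagged, which mirrors a.result = b.result + 4300000 in the original query.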

MYSQL Insert query fails with total number of locks exceeds the lock table size

The table table1 contains 1500000 rows and 80 fields. I want to remove the duplicates based on field1 and field2. The ID field is unique, so I keep the row with the maximum ID in each group.
Option1: Insert Option
insert into table2_unique
select * from table1 a
where a.id = ( select max(b.id) from table1 b
where a.field1 = b.field1
and a.field2 = b.field2 );
But the query fails because of the below error.
Error Code: 1206. The total number of locks exceeds the lock table size
Explain Statement:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 INSERT table2 NULL ALL NULL NULL NULL NULL NULL NULL NULL
1 PRIMARY a NULL ALL NULL NULL NULL NULL 1387764 100 Using where
2 DEPENDENT SUBQUERY b NULL ref field1x,field2x field1x 39 a.field1 537 10 Using where
Option2 DELETE Statement:
DELETE n1 FROM table1 n1, table1 n2 WHERE n1.id > n2.id AND n1.field1 = n2.field1 AND n1.field2 = n2.field2
When I execute this, a deadlock occurs.
I am not able to increase the buffer pool size. Please let me know whether I should write the query in a different way.
I increased INNODB_BUFFER_POOL_SIZE in the my.ini file, and the query ran in 27 minutes for that volume.
I'm not sure how it will impact the locks, but using dependent subqueries (i.e. pushed predicates) in MySQL has never worked very well in my experience. I would have written the first query as:
insert into table2_unique (id, col1, col2, ...col79)
select a.id, a.col1, a.col2, ...a.col79
from table1 a
inner join (
select max(b.id) as id
from table1 b
group by b.col1, b.col2
) as dedup
on a.id = dedup.id;
Trying to update a table using a join is always a bit dodgy. When it's a self-join, it's not surprising that it fails. Using a temporary table and splitting the operation into two steps avoids this.
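A small runnable sketch of the grouped-derived-table rewrite, using Python's sqlite3 module in place of MySQL. The table is cut down to a hypothetical four columns (payload stands in for the other fields) and the rows are invented; the lock behaviour that triggered error 1206 is specific to InnoDB and is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical cut-down schema: payload stands in for the remaining fields.
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT, payload TEXT);
    CREATE TABLE table2_unique (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT, payload TEXT);

    INSERT INTO table1 VALUES
        (1, 'a', 'x', 'old'),
        (2, 'a', 'x', 'new'),    -- duplicate of row 1 on (field1, field2)
        (3, 'b', 'y', 'only'),
        (4, 'a', 'z', 'other');
""")

# The derived table is computed once (one MAX(id) per group) and then joined,
# instead of re-running a dependent subquery for every row of table1.
conn.execute("""
    INSERT INTO table2_unique (id, field1, field2, payload)
    SELECT a.id, a.field1, a.field2, a.payload
    FROM table1 a
    INNER JOIN (
        SELECT MAX(b.id) AS id
        FROM table1 b
        GROUP BY b.field1, b.field2
    ) AS dedup ON a.id = dedup.id
""")

kept_ids = [row[0] for row in
            conn.execute("SELECT id FROM table2_unique ORDER BY id")]
print(kept_ids)
```

Row 1 is dropped because row 2 has the same (field1, field2) and a higher id.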

Nested Where clause can access global variable but inner join can't

The example below is just a demonstration of the problem I am facing.
I have two tables, A and B, that I want to query. Their schemas are as follows:
create table A
(
id int,
B_id int,
D_id int,
created date
);
create table B
(
id int,
C_id int
);
Table A can have multiple rows for a given B_id.
I insert test data as below :
insert into A(id, B_id, D_id, created) values(2, 1, 0, now());
insert into A(id, B_id, D_id, created) values(3, 1, 0, now());
Now, I want to fetch the newest row in A (the one whose created has the highest value) that has B_id = 1.
Now, the problem:
I tried the double inner join below, which did not work:
select * from A
inner join B on A.B_id = B.id
inner join ( select * from A where A.B_id = B.id order by created desc limit 1) as A1 on A.id = A1.id
and A.B_id = 1;
This fails with the error "Unknown column 'B.id' in 'where clause'".
However, if I replace the second inner join with a where clause as below, it works:
select * from A
inner join B on A.B_id = B.id
and A.id = ( select id from A where A.B_id = B.id order by created desc limit 1)
and A.B_id = 1;
Why can the where clause access B.id in the global scope when the inner join can't?
This is a good point. It took me a while to get my head around. First of all I personally would not call it "global" scope. But I've got your point :)
Here is how I understand it. Please correct me if I'm wrong.
First query: I replaced B.id with 1 in the subquery so that the query would run. It became the following:
select * from A
inner join B on A.B_id = B.id
inner join ( select * from A where A.B_id = 1 order by created desc limit 1) as A1 on A.id = A1.id
and A.B_id = 1;
After changing it, I ran explain select ... to see how it works. Here is what I got:
id select_type table type possible_keys key key_len ref rows Extra
--------------------------------------------------------------------------------------------------------
1 PRIMARY <derived2> system null null null null 1 null
1 PRIMARY B ALL null null null null 1 Using where
1 PRIMARY A ALL null null null null 4 Using where; Using join buffer (Block Nested Loop)
2 DERIVED A ALL null null null null 4 Using where; Using filesort
It seems that your subquery select * from A where A.B_id = 1 order by created desc limit 1 is executed as a derived table, during or before the INNER JOIN. B.id is not yet available because the INNER JOIN hasn't been done yet.
Second query: I ran the same explain select ...
id select_type table type possible_keys key key_len ref rows Extra
----------------------------------------------------------------------------------------------------
1 PRIMARY B ALL null null null null 1 Using where
1 PRIMARY A ALL null null null null 4 Using where; Using join buffer (Block Nested Loop)
2 DEPENDENT SUBQUERY A ALL null null null null 4 Using where; Using filesort
As you can see, your subquery is executed after INNER JOIN. Therefore B.id is available.
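The difference between the two forms can be reproduced with a short script. This is only a sketch that uses Python's sqlite3 module as a stand-in; SQLite happens to apply the same scoping rule, but its error text differs from MySQL's. The created dates, the C_id value, and the inner alias A2 are assumptions added so that the result is deterministic and unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INT, B_id INT, D_id INT, created DATE);
    CREATE TABLE B (id INT, C_id INT);
    -- Distinct dates (made up) so that "newest" is well defined.
    INSERT INTO A (id, B_id, D_id, created) VALUES (2, 1, 0, '2020-01-01');
    INSERT INTO A (id, B_id, D_id, created) VALUES (3, 1, 0, '2020-01-02');
    INSERT INTO B (id, C_id) VALUES (1, 10);
""")

# Form 1: the subquery sits in the FROM clause (a derived table), where it
# cannot see columns of the sibling table B.
derived_error = None
try:
    conn.execute("""
        SELECT A.id FROM A
        INNER JOIN B ON A.B_id = B.id
        INNER JOIN (SELECT * FROM A WHERE A.B_id = B.id
                    ORDER BY created DESC LIMIT 1) AS A1 ON A.id = A1.id
        WHERE A.B_id = 1
    """)
except sqlite3.OperationalError as exc:
    derived_error = str(exc)

# Form 2: the subquery is a correlated scalar subquery in the condition,
# evaluated per joined row, so B.id is in scope.
newest_ids = [row[0] for row in conn.execute("""
    SELECT A.id FROM A
    INNER JOIN B ON A.B_id = B.id
    WHERE A.id = (SELECT A2.id FROM A AS A2
                  WHERE A2.B_id = B.id
                  ORDER BY A2.created DESC LIMIT 1)
      AND A.B_id = 1
""")]

print(derived_error)
print(newest_ids)
```

The first statement is rejected before any row is read, while the second returns only the row with the latest created value.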

Optimize Subquery in Join

I have the following query:
SELECT *
FROM s
JOIN b ON s.borrowerId = b.id
JOIN (
SELECT MIN(id) AS id
FROM tbl
WHERE dealId IS NULL
GROUP BY borrowerId, created
) s2 ON s.id = s2.id
Is there a simple way to optimize this so that I can do the JOIN directly and utilize indexes?
UPDATE
The created field is part of the GROUP BY clause because, due to the limitations of our version of MySQL and the ORM being used, it is possible to have multiple records with the same created timestamp value. As a result, I need to find the first record for each combination of borrowerId and created.
Typically I might attempt something like this:
SELECT *
FROM s
INNER JOIN b ON s.borrowerId = b.id
LEFT OUTER JOIN s2
ON s.borrowerId = s2.borrowerId
AND s.created = s2.created
AND s.id <> s2.id
AND s.id < s2.id
WHERE s2.id IS NULL
AND s.dealId IS NULL;
But I'm not sure if that works 100% the way I want.
EXPLAIN from MySQL outputs the following:
1 PRIMARY b ALL NULL NULL NULL NULL 129690
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 317751 Using join buffer
1 PRIMARY s eq_ref PRIMARY,borrowerId_2,borrowerId PRIMARY 4 s2.id 1 Using where
2 DERIVED statuses ref dealId dealId 5 183987 Using where; Using temporary; Using filesort
As you can see, it has to read a massive number of records to build the subquery data set, and when joining to the derived subquery no usable indexes are found, so none are used.
The first query needs this composite index:
INDEX(borrowerId, created, id)
Note that MySQL rarely uses two indexes for one SELECT, but a composite index is often very handy.
The second query seems grossly inefficient.
Please provide SHOW CREATE TABLE for each table.
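To make the suggested composite index concrete, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. The index name, the column types, and the sample rows are assumptions; the real table definition was not posted. It only shows the index definition and what the grouped MIN(id) query returns, not how MySQL would use the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schema for the table called tbl in the first query.
    CREATE TABLE tbl (id INTEGER PRIMARY KEY, borrowerId INTEGER,
                      created TEXT, dealId INTEGER);

    -- The composite index from the answer; the index name is made up.
    CREATE INDEX idx_borrower_created_id ON tbl (borrowerId, created, id);

    INSERT INTO tbl VALUES
        (1, 7, '2020-01-01 10:00:00', NULL),
        (2, 7, '2020-01-01 10:00:00', NULL),  -- same borrower and timestamp as row 1
        (3, 7, '2020-01-02 09:00:00', NULL),
        (4, 8, '2020-01-01 10:00:00', 99),    -- has a deal, so it is filtered out
        (5, 8, '2020-01-01 10:00:00', NULL);
""")

# First record per (borrowerId, created) among rows without a deal.
first_ids = sorted(row[0] for row in conn.execute("""
    SELECT MIN(id) AS id
    FROM tbl
    WHERE dealId IS NULL
    GROUP BY borrowerId, created
"""))
print(first_ids)
```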

MySQL Slow double join

I have a Link table with from_uid and to_uid (both indexed) and I want to filter out certain ids. So I do:
SELECT l.uid
FROM Link l
JOIN filter_ids t1 ON l.from_uid = t1.id
JOIN filter_ids t2 ON l.to_uid = t2.id
Now for some reason this is unexpectedly slow :( whereas each individual join is very fast. Can it not use the index right?
EXPLAIN tells me:
id select table type possible_keys key key_len ref rows Extra
1 SIMPLE t1 index Null PRIMARY 34 Null 12205 Using index
1 SIMPLE l ref from_uid,to_uid from_uid 96 func 6 Using where
1 SIMPLE t2 index Null PRIMARY 34 Null 12205 Using where; Using index; Using join buffer
No idea if it'll help but try:
select l.uid
from Link l
where l.from_uid in (select id from filter_ids)
and l.to_uid in (select id from filter_ids)
Maybe it will make better use of the indexes.
The EXPLAIN tells you that the JOIN actually starts from the t1 table. That means you need to add a new index on Link (or, better, extend the current from_uid index):
(from_uid, to_uid, uid)
or if uid is the primary key, just:
(from_uid, to_uid)
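A minimal sketch of the extended index, again using Python's sqlite3 module as a stand-in for MySQL. The column types, the index name, and the sample rows are assumptions, and the script only checks that the double join returns the expected links; it does not show which join order or index MySQL would pick.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema; only the columns named in the question are included.
    CREATE TABLE filter_ids (id TEXT PRIMARY KEY);
    CREATE TABLE Link (uid TEXT, from_uid TEXT, to_uid TEXT);

    -- Composite index covering both join columns plus the selected column.
    CREATE INDEX idx_from_to_uid ON Link (from_uid, to_uid, uid);

    INSERT INTO filter_ids VALUES ('u1'), ('u2'), ('u3');
    INSERT INTO Link VALUES
        ('l1', 'u1', 'u2'),   -- both ends are in filter_ids
        ('l2', 'u1', 'u9'),   -- to_uid is not in filter_ids
        ('l3', 'u8', 'u2'),   -- from_uid is not in filter_ids
        ('l4', 'u3', 'u1');   -- both ends are in filter_ids
""")

matching_uids = sorted(row[0] for row in conn.execute("""
    SELECT l.uid
    FROM Link l
    JOIN filter_ids t1 ON l.from_uid = t1.id
    JOIN filter_ids t2 ON l.to_uid = t2.id
"""))
print(matching_uids)
```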
UPD
What you are describing is strange. You can try running just:
SELECT STRAIGHT_JOIN l.uid
FROM Link l
JOIN filter_ids t1 ON l.from_uid = t1.id
JOIN filter_ids t2 ON l.to_uid = t2.id