I have the following query:
SELECT *
FROM s
JOIN b ON s.borrowerId = b.id
JOIN (
SELECT MIN(id) AS id
FROM tbl
WHERE dealId IS NULL
GROUP BY borrowerId, created
) s2 ON s.id = s2.id
Is there a simple way to optimize this so that I can do the JOIN directly and utilize indexes?
UPDATE
The created field is part of the GROUP BY because, due to limitations of our MySQL version and the ORM we use, multiple records can share the same created timestamp. As a result I need to find the first record for each combination of borrowerId and created.
Typically I might attempt something like this:
SELECT *
FROM s
INNER JOIN b ON s.borrowerId = b.id
LEFT OUTER JOIN s AS s2
ON s.borrowerId = s2.borrowerId
AND s.created = s2.created
AND s2.dealId IS NULL
AND s.id > s2.id
WHERE s2.id IS NULL
AND s.dealId IS NULL;
But I'm not sure if that works 100% the way I want.
EXPLAIN from MySQL outputs the following:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY b ALL NULL NULL NULL NULL 129690
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 317751 Using join buffer
1 PRIMARY s eq_ref PRIMARY,borrowerId_2,borrowerId PRIMARY 4 s2.id 1 Using where
2 DERIVED statuses ref dealId dealId 5 183987 Using where; Using temporary; Using filesort
As you can see, it has to scan a massive number of records to build the derived data set, and when joining to that derived table no indexes are available, so none are used.
The first query needs this composite index:
INDEX(borrowerId, created, id)
Note that MySQL rarely uses two indexes for one SELECT, but a composite index is often very handy.
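A minimal sketch of adding it, assuming the subquery's table really is named tbl as shown (the index name is my own):

ALTER TABLE tbl
  ADD INDEX idx_borrower_created_id (borrowerId, created, id);

With this in place, MIN(id) for each (borrowerId, created) group can often be read straight from the index instead of from a temporary table.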
The second query seems grossly inefficient.
Please provide SHOW CREATE TABLE for each table.
Related
I have an 80,000-row table with a result value between 130000000 and 168000000; the rows are paired using the field pid. I need to change the status of rows from 'G' to 'X' where the paired results differ by 4300000.
I have come up with the query below, which works but is very slow. Can it be improved for speed?
UPDATE table1 SET status = 'X'
WHERE id IN (
SELECT id FROM (
SELECT a.id AS id FROM table1 a, table1 b
WHERE a.result = b.result + 4300000
AND a.pid = b.pid
AND a.result between 130000000 and 168000000
AND a.status = 'G'
) AS c
);
The indexes are (SHOW INDEX output: Table, Non_unique, Key_name, Seq_in_index, Column_name, Collation, Cardinality, Sub_part, Packed, Null, Index_type):
table1 0 PRIMARY 1 id A 80233 NULL NULL BTREE
table1 1 id 1 id A 80233 NULL NULL BTREE
table1 1 id 2 result A 80233 NULL NULL BTREE
table1 1 id 3 status A 80233 4 NULL YES BTREE
table1 1 id 4 name A 80233 32 NULL BTREE
table1 1 id 5 pid A 80233 16 NULL BTREE
Using a subquery inside the IN (...) clause is often inefficient in MySQL, especially in older versions that re-execute it as a dependent subquery. Instead, you can rewrite the UPDATE using the multi-table UPDATE ... JOIN syntax with a self-join:
UPDATE table1 AS a
JOIN table1 AS b
ON b.pid = a.pid
AND b.result = a.result - 4300000
SET a.status = 'X'
WHERE a.result between 130000000 and 168000000
AND a.status = 'G'
For good performance (and if I understand NLJ (Nested-Loop Join) correctly), you need two indexes: (status, result) and (pid).
The first (composite) index will be used to select rows from table alias a. Since there is a range condition on result, it is better to put status first; otherwise MySQL would stop using the index at the result column because of the range condition.
The second index will be used for lookups into the joined table alias b by the NLJ algorithm.
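A sketch of the corresponding DDL (index names are my own):

ALTER TABLE table1
  ADD INDEX idx_status_result (status, result),
  ADD INDEX idx_pid (pid);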
In MySQL, I have a simple join between 2 tables. Something like
select a.id, SUM(b.qty) from a inner join b on a.id=b.id
where a.id=12345
group by a.id
It runs normally as a direct query. But when I put the query
select a.id, SUM(b.qty) from a inner join b on a.id=b.id
group by a.id
in a view called view_ab, the view takes an enormous amount of time when I run the following query on it.
select * from view_ab where id = 12345
Both of these tables are large. I'm unable to figure out the reason for such a drop in performance. Please help me resolve this performance issue.
EDIT:
This is the view SQL
CREATE VIEW view_ab AS SELECT
r.drid AS drid,
SUM(s.return_qty) AS return_qty
FROM tbl_deliveryroute r INNER JOIN tbl_deliveryroute_sku s ON r.drid =
s.drid GROUP BY r.drid;
This is the query
SELECT
r.drid AS drid,
SUM(s.return_qty) AS return_qty
FROM tbl_deliveryroute r INNER JOIN tbl_deliveryroute_sku s ON r.drid =
s.drid WHERE r.drid=12718651
GROUP BY r.drid;
This is the query on the VIEW
SELECT * FROM view_ab WHERE drid=12718651;
Execution plan of the view
EXPLAIN EXTENDED SELECT * FROM view_ab WHERE drid=12718651;
id  select_type  table       partitions  type    possible_keys                   key          key_len  ref                 rows      filtered  Extra
1   PRIMARY      <derived2>  (NULL)      ref     <auto_key0>                     <auto_key0>  4        const               10        100.00    (NULL)
2   DERIVED      s           (NULL)      ALL     idx_tbl_deliverroute_sku_drid   (NULL)       (NULL)   (NULL)              15060913  100.00    Using temporary; Using filesort
2   DERIVED      r           (NULL)      eq_ref  PRIMARY,FK_tbl_deliveryroute_1  PRIMARY      4        humdemotest.s.drid  1         100.00    Using index
EXPLAIN EXTENDED SELECT
r.drid AS drid,
SUM(s.return_qty) AS return_qty
FROM tbl_deliveryroute r INNER JOIN tbl_deliveryroute_sku s ON r.drid =
s.drid WHERE r.drid=12718651
GROUP BY r.drid;
id  select_type  table  partitions  type   possible_keys                  key                            key_len  ref    rows  filtered  Extra
1   SIMPLE       r      (NULL)      const  PRIMARY                        PRIMARY                        4        const  1     100.00    Using index
1   SIMPLE       s      (NULL)      ref    idx_tbl_deliverroute_sku_drid  idx_tbl_deliverroute_sku_drid  4        const  22    100.00    (NULL)
From what I am seeing, you don't even need a join: since you are joining on the same key column from A to B, the key already exists in table B, so just query that table and group by the key. Also, I would add an index on your DeliveryRoute_SKU table on its route ID column (see the sketch after the query below).
SELECT
s.drid,
sum( s.return_qty ) Return_Qty
from
tbl_DeliveryRoute_Sku s
where
s.drID = 12718651
group by
s.drID;
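For the route-ID index mentioned above (the EXPLAIN output suggests one may already exist as idx_tbl_deliverroute_sku_drid), a sketch:

ALTER TABLE tbl_DeliveryRoute_Sku
  ADD INDEX idx_drid (drID);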
Since you are only selecting the key and the sum, you don't even NEED the other table. Now, if you needed other columns from the first table OTHER THAN the key, then yes, you would need the join. You can even simplify one step further, since you are only querying a single key ID:
SELECT
sum( s.return_qty ) Return_Qty
from
tbl_DeliveryRoute_Sku s
where
s.drID = 12718651;
The reason the view is slow is simple. You are executing:
SELECT *
FROM view_ab
WHERE drid = 12718651;
What you want to execute is:
select a.id, SUM(b.qty)
from a inner join
b
on a.id = b.id
where a.id = 12345
group by a.id;
What is actually being executed is:
select ab.*
from (select a.id, SUM(b.qty)
from a inner join
b
on a.id = b.id
group by a.id
) ab
where ab.id = 12345;
That is, the entire aggregation is performed first, and only then is the where applied. What you want is for the predicate to be pushed down into the derived table (MySQL does this by merging the view into the outer query, which it cannot do here because of the GROUP BY). You can review the documentation on derived-table merging for details.
One solution would seem to be rephrasing the query as a correlated subquery:
select a.id,
(select sum(b.qty) from b where b.id = a.id) as qty
from a
where a.id = 12345;
Alas, subqueries in the select have the same effect, so this doesn't work.
I don't know of a solution that uses a view, so you may need to avoid views here. The ultimate solution would be a trigger that maintains the summarized results in another table -- effectively materializing the view.
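A rough sketch of that trigger approach, assuming an AFTER INSERT trigger and a hypothetical summary table (the table name, trigger name, and quantity type are my own; matching UPDATE and DELETE triggers would be needed to keep the totals consistent):

-- Hypothetical table holding pre-aggregated totals per route
CREATE TABLE route_return_totals (
  drid INT UNSIGNED NOT NULL PRIMARY KEY,
  return_qty DECIMAL(12,2) NOT NULL DEFAULT 0  -- type assumed
);

DELIMITER //
CREATE TRIGGER trg_sku_after_insert
AFTER INSERT ON tbl_deliveryroute_sku
FOR EACH ROW
BEGIN
  -- Fold the new row's quantity into its route's running total
  INSERT INTO route_return_totals (drid, return_qty)
  VALUES (NEW.drid, NEW.return_qty)
  ON DUPLICATE KEY UPDATE return_qty = return_qty + NEW.return_qty;
END//
DELIMITER ;

A lookup such as SELECT * FROM route_return_totals WHERE drid = 12718651 then becomes a single primary-key read.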
Our server company advised us to switch to Percona when setting up our new DB servers, so we're currently on Percona Server (GPL), Release 82.0 (version 5.6.36-82.0), Revision 58e846a. There's one behavior I'm trying to wrap my head around that we definitely weren't experiencing before with stock MySQL:
This is a query we perform fairly regularly to pull an article from our db
SELECT * FROM table_a a, table_b b
WHERE a.id = b.id AND a.status_field = 'open' AND b.filter_field = 'no_filter' AND b.view_field = 'article'
ORDER BY a.unixtimestamp DESC LIMIT 1
This used to complete very quickly, but under Percona the combination of the WHERE conditions on table_b and the ordering on table_a makes the whole query take ~3s. I don't fully understand this behaviour.
If I alter it to:
SELECT * FROM table_a a, table_b b
WHERE a.id = b.id AND a.status_field = 'open' AND b.filter_field = 'no_filter' AND b.view_field = 'article'
ORDER BY b.unixtimestamp DESC LIMIT 1
Then it completes very quickly (< 0.05s)
Is this sort of an expected behavior with Percona?
I just wanted to know before changing any db structure to compensate.
Edit:
For the EXPLAIN, I simplified the query and it still shows the same issue (id = entry_id):
Slow Query (1.5122389793):
SELECT * FROM table_a a, table_b b
WHERE a.id = b.id AND b.special_filter = 'no_filter'
ORDER BY a.id DESC LIMIT 1
Slow Query explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table_b ref PRIMARY,entry_id,special_filter special_filter 26 const 130733 Using where; Using temporary; Using filesort
1 SIMPLE table_a eq_ref PRIMARY PRIMARY 4 db_name.table_b.entry_id 1 Using index
Fast Query (0.0006549358):
SELECT * FROM table_a a, table_b b
WHERE a.id = b.id AND b.special_filter = 'no_filter'
ORDER BY b.id DESC LIMIT 1
Fast Query explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table_b ref PRIMARY,entry_id,special_filter special_filter 26 const 130733 Using where
1 SIMPLE table_a eq_ref PRIMARY PRIMARY 4 db_name.table_b.entry_id 1 Using index
I've tried to omit as much info from the table as possible for security reasons but if I'm grossly missing something I can add it back in
table_a:
Relevant Keys:
table_a 0 PRIMARY 1 entry_id A 321147 BTREE
Create table:
table_a CREATE TABLE table_a ( entry_id int(10) unsigned NOT NULL AUTO_INCREMENT) ENGINE=InnoDB AUTO_INCREMENT=356198 DEFAULT CHARSET=utf8 DELAY_KEY_WRITE=1
table_b:
Relevant Keys:
table_b 0 PRIMARY 1 entry_id A 261467 BTREE
table_b 1 entry_id 1 entry_id A 261467 BTREE
table_b 1 special_filter 1 special_filter A 14 8 BTREE
Create Table:
table_b CREATE TABLE table_b ( entry_id int(10) unsigned NOT NULL DEFAULT '0', special_filter text NOT NULL, ) ENGINE=InnoDB DEFAULT CHARSET=utf8 DELAY_KEY_WRITE=1
Both tables have ~ 350k rows
It seems to be a MySQL optimizer issue where it chooses not to join on the primary key under some conditions. It is closely related to, or the same as, this issue: https://dba.stackexchange.com/questions/53274/mysql-innodb-issue-not-using-indexes-correctly-percona-5-6
Explicitly writing the query with STRAIGHT_JOIN and the tables in a specific order solves the issue. Adding USE INDEX (PRIMARY) to the joined table is easier, though, and doesn't rely on the table order.
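A sketch of both workarounds against the simplified query from the edit (exact hint placement is my assumption; MySQL index hints follow the table name or alias):

-- Force the join order so table_a is read first and its PRIMARY key
-- can satisfy the ORDER BY ... LIMIT without a sort:
SELECT STRAIGHT_JOIN a.*, b.*
FROM table_a a
JOIN table_b b ON a.id = b.id
WHERE b.special_filter = 'no_filter'
ORDER BY a.id DESC LIMIT 1;

-- Or hint the index on the joined table without fixing the table order:
SELECT a.*, b.*
FROM table_a a
JOIN table_b b USE INDEX (PRIMARY) ON a.id = b.id
WHERE b.special_filter = 'no_filter'
ORDER BY a.id DESC LIMIT 1;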
I've got a composite key table CUSTOMER_PRODUCT_XREF
__________________________________________________________________
|CUSTOMER_ID (PK NN VARCHAR(191)) | PRODUCT_ID(PK NN VARCHAR(191))|
-------------------------------------------------------------------
In my batch program I need to select 500 updated customers, get each customer's purchased PRODUCT_IDs as a comma-separated list, and update our SOLR index. In my query I'm selecting 500 customers and doing a left join to CUSTOMER_PRODUCT_XREF:
SELECT
customer.*, group_concat(xref.PRODUCT_ID separator ', ')
FROM
CUSTOMER customer
LEFT JOIN CUSTOMER_PRODUCT_XREF xref ON customer.CUSTOMER_ID=xref.CUSTOMER_ID
group by customer.CUSTOMER_ID
LIMIT 500;
EDIT: EXPLAIN QUERY
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE customer ALL PRIMARY NULL NULL NULL 74236 Using where; Using temporary; Using filesort
1 SIMPLE xref index NULL PRIMARY 1532 NULL 121627 Using where; Using index; Using join buffer (Block Nested Loop)
I got a lost-connection exception after the above query had run for 20 minutes.
I tried the following (correlated subquery) and it took 1.7 seconds to get the result, which is better but still slow.
SELECT
customer.*, (SELECT group_concat(PRODUCT_ID separator ', ')
FROM CUSTOMER_PRODUCT_XREF xref
WHERE customer.CUSTOMER_ID=xref.CUSTOMER_ID
GROUP BY customer.CUSTOMER_ID)
FROM
CUSTOMER customer
LIMIT 500;
EDIT: EXPLAIN QUERY produces
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY customer ALL NULL NULL NULL NULL 74236 NULL
2 DEPENDENT SUBQUERY xref index NULL PRIMARY 1532 NULL 121627 Using where; Using index; Using temporary; Using filesort
Question
CUSTOMER_PRODUCT_XREF already has both columns in the PRIMARY KEY (and NOT NULL), so why is my query still very slow? I thought having a primary key on a column was enough to build an index for it. Do I need further indexing?
DATABASE INFO:
All the IDs in my database are VARCHAR(191) because the IDs can contain letters.
I'm using the utf8mb4_unicode_ci collation.
I'm using SET group_concat_max_len := @@max_allowed_packet to allow as many product IDs as possible per customer. I prefer using group_concat in one main query so that I don't have to run separate queries to get the products for each customer.
Your original version of the query does the join first and then sorts all the resulting data -- which is probably pretty big given how large the fields are.
You can "fix" that version by selecting the 500 customers first and then doing the join:
SELECT c.*, group_concat(xref.PRODUCT_ID separator ', ')
FROM (select c.*
from CUSTOMER c
order by c.customer_id
limit 500
) c LEFT JOIN
CUSTOMER_PRODUCT_XREF xref
ON c.CUSTOMER_ID=xref.CUSTOMER_ID
group by c.CUSTOMER_ID ;
An alternative that might or might not have a big impact would be to do the aggregation by customer in a subquery and join to that, as in:
SELECT c.*, xref.products
FROM (select c.*
from CUSTOMER c
order by c.customer_id
limit 500
) c LEFT JOIN
(select customer_id, group_concat(xref.PRODUCT_ID separator ', ') as products
from CUSTOMER_PRODUCT_XREF xref
group by customer_id
) xref
ON c.CUSTOMER_ID=xref.CUSTOMER_ID;
What you have discovered is that the MySQL optimizer does not recognize this situation (where the LIMIT has a big impact on performance). Some other database engines do a better job of optimizing this case.
Alright, the speed of the queries in my question shot up when I created an index on just the CUSTOMER_ID column of CUSTOMER_PRODUCT_XREF. Presumably the composite primary key leads with PRODUCT_ID, so lookups by CUSTOMER_ID alone could not use its leftmost prefix, and the separate index fixes that.
So I've got two indexes now:
PRIMARY_KEY_INDEX on PRODUCT_ID and CUSTOMER_ID
CUSTOMER_ID_INDEX on CUSTOMER_ID
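A sketch of the added index (the index name is my own):

ALTER TABLE CUSTOMER_PRODUCT_XREF
  ADD INDEX idx_customer_id (CUSTOMER_ID);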
I have a Link table with from_uid and to_uid (both indexed) and I want to filter out certain ids. So I do:
SELECT l.uid
FROM Link l
JOIN filter_ids t1 ON l.from_uid = t1.id
JOIN filter_ids t2 ON l.to_uid = t2.id
Now, for some reason, this is unexpectedly slow :( whereas each individual join is very fast. Can it not use the indexes correctly?
EXPLAIN tells me:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t1 index Null PRIMARY 34 Null 12205 Using index
1 SIMPLE l ref from_uid,to_uid from_uid 96 func 6 Using where
1 SIMPLE t2 index Null PRIMARY 34 Null 12205 Using where; Using index; Using join buffer
No idea if it'll help but try:
select l.uid
from Link l
where l.from_uid in (select id from filter_ids)
and l.to_uid in (select id from filter_ids)
Maybe it will make better use of the indexes.
The EXPLAIN tells you that the JOIN actually starts from the t1 table. That is, you need to add a new index on Link (or, better, extend the current from_uid index):
(from_uid, to_uid, uid)
or if uid is the primary key, just:
(from_uid, to_uid)
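A sketch of that change (index names are my own):

ALTER TABLE Link ADD INDEX idx_from_to_uid (from_uid, to_uid, uid);
-- or, if uid is the primary key (InnoDB appends it to secondary indexes anyway):
ALTER TABLE Link ADD INDEX idx_from_to (from_uid, to_uid);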
UPD
What you are describing is strange. You can try running just:
SELECT STRAIGHT_JOIN l.uid
FROM Link l
JOIN filter_ids t1 ON l.from_uid = t1.id
JOIN filter_ids t2 ON l.to_uid = t2.id