Improving query performance by example - mysql

I'm trying to work out a way to improve a query. The schema involved is like this:
CREATE TABLE `orders` (
`id` int PRIMARY KEY NOT NULL AUTO_INCREMENT,
`store_id` INTEGER NOT NULL,
`billing_profile_id` INTEGER NOT NULL,
`billing_address_id` INTEGER NULL,
`total` DECIMAL(8, 2) NOT NULL
);
CREATE TABLE `billing_profiles` (
`id` int PRIMARY KEY NOT NULL AUTO_INCREMENT,
`name` TEXT NOT NULL
);
CREATE TABLE `billing_addresses` (
`id` int PRIMARY KEY NOT NULL AUTO_INCREMENT,
`address` TEXT NOT NULL
);
CREATE TABLE `stores` (
`id` int PRIMARY KEY NOT NULL AUTO_INCREMENT,
`name` TEXT NOT NULL
);
The query I'm executing:
SELECT bp.name,
ba.address,
s.name,
Sum(o.total) AS total
FROM billing_profiles bp,
stores s,
orders o
LEFT JOIN billing_addresses ba
ON o.billing_address_id = ba.id
WHERE o.billing_profile_id = bp.id
AND s.id = o.store_id
GROUP BY bp.name,
ba.address,
s.name;
And here is the EXPLAIN:
+----+-------------+-------+------------+--------+---------------+---------+---------+------------------------------+-------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------------------------+-------+----------+--------------------------------------------+
| 1 | SIMPLE | bp | NULL | ALL | PRIMARY | NULL | NULL | NULL |155000 | 100.00 | Using temporary |
| 1 | SIMPLE | o | NULL | ALL | NULL | NULL | NULL | NULL |220000 | 33.33 | Using where; Using join buffer (hash join) |
| 1 | SIMPLE | ba | NULL | eq_ref | PRIMARY | PRIMARY | 4 | factory.o.billing_address_id | 1 | 100.00 | NULL |
| 1 | SIMPLE | s | NULL | eq_ref | PRIMARY | PRIMARY | 4 | factory.o.store_id | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------------------------+-------+----------+--------------------------------------------+
The problem I'm facing is that this query takes 30+ seconds to execute; we have over 200,000 orders and 150,000+ billing_profiles/billing_addresses.
What should I do regarding index/constraints so that this query becomes faster to execute?
Edit: after some suggestions in the comments I edited the query to:
SELECT bp.name,
ba.address,
s.name,
Sum(o.total) AS total
FROM orders o
INNER JOIN billing_profiles bp
ON o.billing_profile_id = bp.id
INNER JOIN stores s
ON s.id = o.store_id
LEFT JOIN billing_addresses ba
ON o.billing_address_id = ba.id
GROUP BY bp.name,
ba.address,
s.name;
But it still takes too much time.

One thing I have used in the past that has helped in many instances with MySQL is the STRAIGHT_JOIN clause, which tells the engine to join the tables in the order listed.
I have cleaned up your query into proper JOIN syntax. Since the orders table is the primary basis of the data, and the other 3 are lookup references by their respective IDs, I put the orders table first.
SELECT STRAIGHT_JOIN
bp.name,
ba.address,
s.name,
Sum(o.total) AS total
FROM
orders o
JOIN stores s
ON o.store_id = s.id
JOIN billing_profiles bp
on o.billing_profile_id = bp.id
LEFT JOIN billing_addresses ba
ON o.billing_address_id = ba.id
GROUP BY
bp.name,
ba.address,
s.name
Now, your data tables don't appear that large, but if you are going to be grouping by 3 of the columns in the orders table, I would have an index on the underlying basis of them, which are the "ID" keys linking to the other tables. Adding the total to allow a covering index for the aggregate query, I would index on
( store_id, billing_profile_id, billing_address_id, total )
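In DDL form that would be something like the following (the index name is just illustrative):
ALTER TABLE orders
ADD INDEX idx_orders_covering (store_id, billing_profile_id, billing_address_id, total); -- hypothetical name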
I'm sure that in reality you have many other columns associated with an order and are just showing the context for this query. Then, I would change to a pre-query so the aggregation is all done once for the orders table by its ID keys, THEN the result is joined to the lookup tables and you just need to apply an ORDER BY clause for your final output. Something like..
SELECT
bp.name,
ba.address,
s.name,
o.total
FROM
( select
store_id,
billing_profile_id,
billing_address_id,
sum( total ) total
from
orders
group by
store_id,
billing_profile_id,
billing_address_id ) o
JOIN stores s
ON o.store_id = s.id
JOIN billing_profiles bp
on o.billing_profile_id = bp.id
LEFT JOIN billing_addresses ba
ON o.billing_address_id = ba.id
ORDER BY
bp.name,
ba.address,
s.name

Add this index to o, being sure to start with billing_profile_id:
INDEX(billing_profile_id, store_id, billing_address_id, total)
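As DDL, that could be written like this (index name is illustrative):
ALTER TABLE orders
ADD INDEX idx_orders_bp_store_addr_total -- hypothetical name
(billing_profile_id, store_id, billing_address_id, total);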
Discussion of the Explain:
The Optimizer saw that it needed to do a full scan of some table.
bp was smaller than o, so it picked bp as the "first" table.
Then it reached into the next table repeatedly.
It did not see a suitable index (one starting with billing_profile_id) and decided to do "Using join buffer (hash join)", which involves loading the entire table into a hash in RAM.
"Using temporary", though mentioned on the "first" table, really does not show up until just before the GROUP BY. (The GROUP BY references multiple tables, so there is no way to optimize it.)
Potential bug: please check the results of Sum(o.total) AS total. It is computed after all the JOINing, so if any join matches more than one row per order, the sum may be inflated. Notice how DRapp's formulation does the SUM before the JOINs.

Related

MySQL slow query with SELECT/ORDER BY on one table with WHERE on another, LIMIT results

I'm trying to query the top N rows from a couple of tables. The WHERE clause refers to a list of columns in one table, whereas the ORDER BY clause refers to columns in the other. It looks like MySQL is choosing the table involved in my WHERE clause for its first pass of filtering (which doesn't filter much) whereas it's the ORDER BY that affects the rows returned once I apply the LIMIT. If I force MySQL to use a covering index for the ORDER BY, the query returns immediately with the desired rows. Unfortunately I can't pass index hints to MySQL through JPA, and rewriting everything using native queries would be a substantial amount of work. Here's an illustrative example:
CREATE TABLE person (
id INTEGER PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(255),
last_name VARCHAR(255)
) engine=InnoDB;
CREATE TABLE membership (
id INTEGER PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(255) NOT NULL
) engine=InnoDB;
CREATE TABLE employee (
id INTEGER PRIMARY KEY AUTO_INCREMENT,
membership_id INTEGER NOT NULL,
type VARCHAR(15),
enabled BIT NOT NULL,
person_id INTEGER NOT NULL REFERENCES person ( id ),
CONSTRAINT fk_employee_membership_id FOREIGN KEY ( membership_id ) REFERENCES membership ( id ),
CONSTRAINT fk_employee_person_id FOREIGN KEY ( person_id ) REFERENCES person ( id )
) engine=InnoDB;
CREATE UNIQUE INDEX uk_employee_person_id ON employee ( person_id );
CREATE INDEX idx_person_first_name_last_name ON person ( first_name, last_name );
I wrote a script to output a bunch of INSERT statements to populate the tables with 200'000 rows:
#!/bin/bash
#
echo "INSERT INTO membership ( id, name ) VALUES ( 1, 'Default Membership' );"
for seq in {1..200000}; do
echo "INSERT INTO person ( id, first_name, last_name ) VALUES ( $seq, 'firstName$seq', 'lastName$seq' );"
echo "INSERT INTO employee ( id, membership_id, type, enabled, person_id ) VALUES ( $seq, 1, 'INDIVIDUAL', 1, $seq );"
done
My first attempt:
SELECT e.*
FROM person p INNER JOIN employee e ON p.id = e.person_id
WHERE e.membership_id = 1 AND type = 'INDIVIDUAL' AND enabled = 1
ORDER BY p.first_name ASC, p.last_name ASC, p.id ASC
LIMIT 100;
-- 100 rows in set (1.43 sec)
and the EXPLAIN:
+----+-------------+-------+------------+--------+-------------------------------------------------+---------------------------+---------+--------------------+-------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+-------------------------------------------------+---------------------------+---------+--------------------+-------+----------+----------------------------------------------+
| 1 | SIMPLE | e | NULL | ref | uk_employee_person_id,fk_employee_membership_id | fk_employee_membership_id | 4 | const | 99814 | 5.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | p | NULL | eq_ref | PRIMARY | PRIMARY | 4 | qsuite.e.person_id | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+-------------------------------------------------+---------------------------+---------+--------------------+-------+----------+----------------------------------------------+
Now I force MySQL to use the ( first_name, last_name ) index on person:
SELECT e.*
FROM person p USE INDEX ( idx_person_first_name_last_name )
INNER JOIN employee e ON p.id = e.person_id
WHERE e.membership_id = 1 AND type = 'INDIVIDUAL' AND enabled = 1
ORDER BY p.first_name ASC, p.last_name ASC, p.id ASC
LIMIT 100;
-- 100 rows in set (0.00 sec)
It returns instantly. And the explain:
+----+-------------+-------+------------+--------+-------------------------------------------------+---------------------------------+---------+-------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+-------------------------------------------------+---------------------------------+---------+-------------+------+----------+-------------+
| 1 | SIMPLE | p | NULL | index | NULL | idx_person_first_name_last_name | 2046 | NULL | 100 | 100.00 | Using index |
| 1 | SIMPLE | e | NULL | eq_ref | uk_employee_person_id,fk_employee_membership_id | uk_employee_person_id | 4 | qsuite.p.id | 1 | 5.00 | Using where |
+----+-------------+-------+------------+--------+-------------------------------------------------+---------------------------------+---------+-------------+------+----------+-------------+
Note the WHERE clause in the example doesn't end up actually filtering any rows. This is largely representative of the data I have and the bulk of queries against this table. Is there a way to coax MySQL into using that index or some not-quite-destructive way of restructuring this to improve the performance?
Thanks.
Edit: I dropped the original covering index and added one to each of the tables:
CREATE INDEX idx_person_id_first_name_last_name ON person ( id, first_name, last_name );
CREATE INDEX idx_employee_etc ON employee ( membership_id, type, enabled, person_id );
It seems to speed it up a little, but MySQL still insists on running through the employee table first:
+----+-------------+-------+------------+--------+--------------------------------------------+------------------+---------+--------------------+-------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+--------------------------------------------+------------------+---------+--------------------+-------+----------+----------------------------------------------+
| 1 | SIMPLE | e | NULL | ref | uk_employee_person_id,idx_employee_etc | idx_employee_etc | 68 | const,const,const | 97311 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | p | NULL | eq_ref | PRIMARY,idx_person_id_first_name_last_name | PRIMARY | 4 | qsuite.e.person_id | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+--------------------------------------------+------------------+---------+--------------------+-------+----------+----------------------------------------------+
I would change the secondary index on the person table to be on (id, first_name, last_name) and get rid of the (first_name, last_name) index unless you will really be querying by a person's first name as the primary criterion.
For the employee table, have an index on (membership_id, type, enabled, person_id).
Having the proper index on the employee table will help get all qualifying records back. Having the person's name and ID info in the index prevents the engine from going to the raw data pages to extract those columns for the final ordering / limit.
SELECT
e.*
FROM
employee e
INNER JOIN person p
ON e.person_id = p.id
WHERE
e.membership_id = 1
AND e.type = 'INDIVIDUAL'
AND e.enabled = 1
ORDER BY
p.first_name ASC,
p.last_name ASC,
p.id ASC
LIMIT
100;
Storing first and last names redundantly in the employee table is an option, but it has drawbacks: you will have to manage the redundancy. To guarantee consistency, you can make those columns part of the foreign key. ON UPDATE CASCADE will take some of that work off your hands, but you will still need to rewrite your INSERT statements or use triggers. With first_name and last_name being part of the employee table, you would be able to create an optimal index for your query. The table would look the following way:
CREATE TABLE employee (
id INTEGER PRIMARY KEY AUTO_INCREMENT,
membership_id INTEGER NOT NULL,
type VARCHAR(15),
enabled BIT NOT NULL,
person_id INTEGER NOT NULL REFERENCES person ( id ),
first_name VARCHAR(255),
last_name VARCHAR(255),
CONSTRAINT fk_employee_membership_id FOREIGN KEY ( membership_id ) REFERENCES membership ( id ),
CONSTRAINT fk_employee_person FOREIGN KEY ( person_id, first_name, last_name )
REFERENCES person ( id, first_name, last_name ),
INDEX (membership_id, type, enabled, first_name, last_name, person_id)
) engine=InnoDB;
The query would change to:
SELECT e.*
FROM employee e
WHERE e.membership_id = 1 AND e.type = 'INDIVIDUAL' AND e.enabled = 1
ORDER BY e.first_name ASC, e.last_name ASC, e.person_id ASC
LIMIT 100;
However, I would avoid such changes if possible. There might be other ways to use an index for the ORDER BY. I would first try to move the WHERE conditions into a correlated EXISTS subquery:
SELECT e.*
FROM person p INNER JOIN employee e ON p.id = e.person_id
WHERE EXISTS (
SELECT *
FROM employee e1
WHERE e1.person_id = p.id
AND e1.membership_id = 1
AND e1.type = 'INDIVIDUAL'
AND e1.enabled = 1
)
ORDER BY p.first_name ASC, p.last_name ASC, p.id ASC
LIMIT 100;
Now, to evaluate the subquery, the engine needs p.id, so it has to start reading the data from the person table first (which you will see in the execution plan). And I guess it will be smart enough to read it from the index. Note that in InnoDB the primary key is always part of any secondary index, so the idx_person_first_name_last_name index is actually on (first_name, last_name, id).

How to optimize select query with case statements?

I have 3 tables with over 1,000,000 records. My select query has been running for hours.
How can I optimize it? I'm a newbie.
I tried adding an index on name, but it still takes hours to load.
Like this,
ALTER TABLE table2 ADD INDEX(name);
and like this also,
CREATE INDEX INDEX1 ON table2(name);
SELECT MS.*, P.Counts FROM
(SELECT M.*,
TIMESTAMPDIFF(YEAR, M.date, CURDATE()) AS age,
CASE V.name
WHEN 'text' THEN M.name
WHEN V.name IS NULL THEN M.name
ELSE V.name
END col1
FROM table1 M
LEFT JOIN table2 V ON M.id=V.id) AS MS
LEFT JOIN
(select E.id, count(E.id) Counts
from table3 E
where E.field2 = 'value1'
group by E.id) AS P
ON MS.id=P.id;
Explain <above query>;
output:
+----+-------------+------------+------------+-------+---------------------------------------------+------------------+---------+------------------------+---------+----------+-----------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+---------------------------------------------+------------------+---------+------------------------+---------+----------+-----------------------------------------------------------------+
| 1 | PRIMARY | M | NULL | ALL | NULL | NULL | NULL | NULL | 344763 | 100.00 | NULL |
| 1 | PRIMARY | <derived3> | NULL | ref | <auto_key0> | <auto_key0> | 8 | CP.M.id | 10 | 100.00 | NULL |
| 1 | PRIMARY | V | NULL | index | NULL | INDEX1 | 411 | NULL | 1411083 | 100.00 | Using where; Using index; Using join buffer (Block Nested Loop) |
| 3 | DERIVED | E | NULL | ref | PRIMARY,f2,f3 | f2 | 43 | const | 966442 | 100.00 | Using index |
+----+-------------+------------+------------+-------+---------------------------------------------+------------------+---------+------------------------+---------+----------+-----------------------------------------------------------------+
I expect to get the result in less than 1 minute.
The query, indented for clarity:
SELECT MS.*, P.Counts
FROM (
SELECT M.*,
TIMESTAMPDIFF(YEAR, M.date, CURDATE()) AS age,
CASE V.name
WHEN 'text' THEN M.name
WHEN V.name IS NULL THEN M.name
ELSE V.name
END col1
FROM table1 M
LEFT JOIN table2 V ON M.id=V.id
) AS MS
LEFT JOIN (
select E.id, count(E.id) Counts
from table3 E
where E.field2 = 'value1'
group by E.id
) AS P ON MS.id=P.id;
Your query has no filtering predicate, so it's essentially retrieving all the rows. That is 1,000,000+ rows from table1. Then it's joining them with table2, and then with another table expression/derived table.
Why do you expect this query to be fast? A massive query like this one will normally run as a batch process at night. I assume this query is not for an online process, right?
Maybe you need to rethink the process. Do you really need to process millions of rows at once interactively? Will the user read a million rows in the web page?
Subqueries are not always well-optimized.
I think you can flatten it out something like:
SELECT M.*, V.*,
TIMESTAMPDIFF(YEAR, M.date, CURDATE()) AS age,
CASE V.name WHEN 'text' THEN M.name
WHEN V.name IS NULL THEN M.name
ELSE V.name END col1,
( SELECT COUNT(*) FROM table3 WHERE field2 = 'value1' AND id = M.id
) AS Counts
FROM table1 AS M
LEFT JOIN table2 AS V ON M.id = V.id
I may have some parts not quite right; see if you can make this formulation work.
For starters, you are returning the same result for 'col1' when v.name is null and when v.name = 'text'. That said, you can include that extra condition in your join with table2 and use the IFNULL function.
As you are filtering table3 by field2, you could probably create an index on table3 that includes field2.
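For example, a composite index covering both the WHERE and the GROUP BY of that subquery could look like this (index name is illustrative):
ALTER TABLE table3
ADD INDEX idx_table3_field2_id (field2, id); -- hypothetical name: field2 serves the WHERE, id serves the GROUP BY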
You should also check whether you can include any additional filter for any of those tables, and if you do, you can consider using a stored procedure to get the results.
Also, I don't see why you need to aggregate the first join into 'MS'; you can easily do all the joins in one go like this:
SELECT
M.*,
TIMESTAMPDIFF(YEAR, M.date, CURDATE()) AS age,
IFNULL(V.name, M.name) as col1,
P.Counts
FROM table1 M
LEFT JOIN table2 V ON M.id=V.id AND V.name <> 'text'
LEFT JOIN
(SELECT
E.id,
COUNT(E.id) Counts
FROM table3 E
WHERE E.field2 = 'value1'
GROUP BY E.id) AS P ON M.id=P.id;
I'm also assuming that you do have clustered indexes on the id fields in all three of these tables, but with no filter, if you are dealing with millions of records, this will always be a big, heavy query. To say the least, you are doing a table scan on table1.
I've included this additional information after your comment.
I mentioned clustered indexes, but according to the official documentation about indexes here:
When you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index. So if you already have a primary key defined you don't need to do anything else.
As the documentation also points out, you should define a primary key for each table that you create.
If you don't have a primary key, here is the code snippet you requested.
ALTER TABLE table1 ADD CONSTRAINT pk_table1
PRIMARY KEY (id);
ATTENTION: Keep in mind that creating a clustered index is a big operation for tables like yours with tons of data.
This isn't something you want to do on a production server without planning. The operation will also take a long time, and the table will be locked during the process.

ORDER BY column from the joined table performance

How can I improve this? It takes around half a second, and it's just a demo query.
The problem here is the ORDER BY, but I can't really do without it. I also need the empty rows from the LEFT JOIN for missing records in the something table.
SELECT c.name
FROM customers c
LEFT JOIN something s USING(customer_id)
ORDER BY s.test DESC LIMIT 25
DB schema:
CREATE TABLE customers (
customer_id int(11) NOT NULL AUTO_INCREMENT,
name text NOT NULL,
PRIMARY KEY (customer_id),
KEY namne (name(999))
) ENGINE=MyISAM AUTO_INCREMENT=100001 DEFAULT CHARSET=latin1
CREATE TABLE something (
id int(11) NOT NULL AUTO_INCREMENT,
customer_id int(11) NOT NULL,
text longtext NOT NULL,
test varchar(5) NOT NULL,
PRIMARY KEY (id),
KEY customer_id (customer_id),
KEY text (text(999)),
KEY test (test),
KEY asdasd (customer_id,test)
) ENGINE=MyISAM AUTO_INCREMENT=12 DEFAULT CHARSET=latin1
EXPLAIN:
+------+-------------+-------+------+--------------------+--------+---------+--------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+--------------------+--------+---------+--------------------+--------+---------------------------------+
| 1 | SIMPLE | c | ALL | NULL | NULL | NULL | NULL | 100000 | Using temporary; Using filesort |
| 1 | SIMPLE | s | ref | customer_id,asdasd | asdasd | 4 | test.c.customer_id | 2 | Using index |
+------+-------------+-------+------+--------------------+--------+---------+--------------------+--------+---------------------------------+
It doesn't look like the LEFT JOIN makes sense here. If you replace it with an INNER JOIN, the engine would be able to use the KEY test (test) for the ORDER BY clause. So all you need might be this:
SELECT c.name
FROM customers c
INNER JOIN something s USING(customer_id)
ORDER BY s.test DESC LIMIT 25
But to get exactly the same result as with the LEFT JOIN, you can combine two fast queries with UNION ALL:
(
SELECT c.name
FROM customers c
INNER JOIN something s USING(customer_id)
ORDER BY s.test DESC LIMIT 25
) UNION ALL (
SELECT c.name
FROM customers c
LEFT JOIN something s USING(customer_id)
WHERE s.customer_id IS NULL
LIMIT 25
)
LIMIT 25
Use InnoDB, not MyISAM.
Use some sensible limit in a VARCHAR(..) instead of TEXT.
Then get rid of "prefix indexing" if possible.
INDEX(a) is redundant when you have INDEX(a,b).
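Applied to the schema above: KEY customer_id (customer_id) is redundant because KEY asdasd (customer_id, test) starts with the same column, so it could be dropped.
ALTER TABLE something DROP INDEX customer_id; -- asdasd (customer_id, test) already covers lookups on customer_id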
For more speed:
SELECT name
FROM (
( SELECT c.name, s.test
FROM customers c
JOIN something s USING(customer_id)
ORDER BY s.test DESC
LIMIT 25
)
UNION ALL
( SELECT c.name, NULL
FROM customers c
LEFT JOIN something s USING(customer_id)
WHERE s.test IS NULL
LIMIT 25
)
) AS x
ORDER BY test DESC
LIMIT 25
And have
INDEX(test, customer_id)
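In DDL form (index name is illustrative):
ALTER TABLE something
ADD INDEX idx_test_customer_id (test, customer_id); -- hypothetical name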

SQL improvement in MySQL

I have these tables in MySQL.
CREATE TABLE `tableA` (
`id_a` int(11) NOT NULL,
`itemCode` varchar(50) NOT NULL,
`qtyOrdered` decimal(15,4) DEFAULT NULL,
:
PRIMARY KEY (`id_a`),
KEY `INDEX_A1` (`itemCode`)
) ENGINE=InnoDB
CREATE TABLE `tableB` (
`id_b` int(11) NOT NULL AUTO_INCREMENT,
`qtyDelivered` decimal(15,4) NOT NULL,
`id_a` int(11) DEFAULT NULL,
`opType` int(11) NOT NULL, -- '0' delivered to customer, '1' returned from customer
:
PRIMARY KEY (`id_b`),
KEY `INDEX_B1` (`id_a`),
KEY `INDEX_B2` (`opType`)
) ENGINE=InnoDB
tableA shows what quantity we received on order from the customer; tableB shows what quantity we delivered to the customer for each order.
I want to write a SQL query that counts the quantity remaining to be delivered for each itemCode.
The SQL is as below. It works, but it is slow.
SELECT T1.itemCode,
SUM(IFNULL(T1.qtyOrdered,'0')-IFNULL(T2.qtyDelivered,'0')+IFNULL(T3.qtyReturned,'0')) as qty
FROM tableA AS T1
LEFT JOIN (SELECT id_a,SUM(qtyDelivered) as qtyDelivered FROM tableB WHERE opType = '0' GROUP BY id_a)
AS T2 on T1.id_a = T2.id_a
LEFT JOIN (SELECT id_a,SUM(qtyDelivered) as qtyReturned FROM tableB WHERE opType = '1' GROUP BY id_a)
AS T3 on T1.id_a = T3.id_a
WHERE T1.itemCode = '?'
GROUP BY T1.itemCode
I tried explain on this SQL, and the result is as below.
+----+-------------+------------+------+----------------+----------+---------+-------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+----------------+----------+---------+-------+-------+----------------------------------------------+
| 1 | PRIMARY | T1 | ref | INDEX_A1 | INDEX_A1 | 152 | const | 1 | Using where |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 21211 | |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 10 | |
| 3 | DERIVED | tableB | ref | INDEX_B2 | INDEX_B2 | 4 | | 96 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | tableB | ref | INDEX_B2 | INDEX_B2 | 4 | | 55614 | Using where; Using temporary; Using filesort |
+----+-------------+------------+------+----------------+----------+---------+-------+-------+----------------------------------------------+
I want to improve my query. How can I do that?
First, your tableB has an int for opType, but you are comparing it to strings via '0' and '1'. Leave them as numeric 0 and 1. To optimize your pre-aggregates, you should not have individual column indexes but a composite, and in this case a covering, index: INDEX tableB ON (opType, id_a, qtyDelivered) as a single index. The opType optimizes the WHERE, id_a optimizes the GROUP BY, and qtyDelivered serves the aggregate from the index without going to the raw data pages.
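Using the column names from the schema above, that composite index might be created like this (index name is illustrative):
ALTER TABLE tableB
ADD INDEX idx_optype_ida_qty (opType, id_a, qtyDelivered); -- hypothetical name; serves the WHERE, GROUP BY, and SUM
-- The single-column INDEX_B2 (opType) then becomes redundant and could be dropped.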
Since you are looking for the two types, you can roll them up into a single subquery testing for either in a single-pass result. THEN join that to your tableA results.
SELECT
T1.itemCode,
SUM( IFNULL(T1.qtyOrdered, 0 )
- IFNULL(T2.qtyDelivered, 0)
+ IFNULL(T2.qtyReturned, 0)) as qty
FROM
tableA AS T1
LEFT JOIN ( SELECT
id_a,
SUM( IF( opType=0,qtyDelivered, 0)) as qtyDelivered,
SUM( IF( opType=1,qtyDelivered, 0)) as qtyReturned
FROM
tableB
WHERE
opType IN ( 0, 1 )
GROUP BY
id_a) AS T2
on T1.id_a = T2.id_a
WHERE
T1.itemCode = '?'
GROUP BY
T1.itemCode
Now, depending on the size of your tables, you might be better off doing a JOIN from your inner table to tableA so you only get those of the itemCode you are expecting. If you have 50k items and you are only looking for the items that qualify, say 120 items, then your inner query is STILL aggregating based on the 50k; in that case it would be overkill. In this case, I would suggest an index on tableA by ( itemCode, id_a ) and adjust the inner query to
LEFT JOIN ( SELECT
b.id_a,
SUM( IF( b.opType = 0, b.qtyDelivered, 0)) as qtyDelivered,
SUM( IF( b.opType = 1, b.qtyDelivered, 0)) as qtyReturned
FROM
( select distinct id_a
from tableA
where itemCode = '?' ) pqA
JOIN tableB b
on PQA.id_A = b.id_a
AND b.opType IN ( 0, 1 )
GROUP BY
id_a) AS T2
My Query against your SQLFiddle

MySQL grouping query optimization

I have three tables: categories, articles, and article_events, with the following structure
categories: id, name (100,000 rows)
articles: id, category_id (6000 rows)
article_events: id, article_id, status_id (20,000 rows)
The highest article_events.id for each article row describes the current status of each article.
I'm returning a table of categories and how many articles are in them with a most-recent-event status_id of '1'.
What I have so far works, but is fairly slow (10 seconds) with the size of my tables. Wondering if there's a way to make this faster. All the tables have proper indexes as far as I know.
SELECT c.id,
c.name,
SUM(CASE WHEN e.status_id = 1 THEN 1 ELSE 0 END) article_count
FROM categories c
LEFT JOIN articles a ON a.category_id = c.id
LEFT JOIN (
SELECT article_id, MAX(id) event_id
FROM article_events
GROUP BY article_id
) most_recent ON most_recent.article_id = a.id
LEFT JOIN article_events e ON most_recent.event_id = e.id
GROUP BY c.id
Basically I have to join to the events table twice, since asking for the status_id along with the MAX(id) just returns the first status_id it finds, and not the one associated with the MAX(id) row.
Any way to make this better? or do I just have to live with 10 seconds? Thanks!
Edit:
Here's my EXPLAIN for the query:
ID | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
---------------------------------------------------------------------------------------------------------------------------
1 | PRIMARY | c | index | NULL | PRIMARY | 4 | NULL | 124044 | Using index; Using temporary; Using filesort
1 | PRIMARY | a | ref | category_id | category_id | 4 | c.id | 3 |
1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 6351 |
1 | PRIMARY | e | eq_ref | PRIMARY | PRIMARY | 4 | most_recent.event_id | 1 |
2 | DERIVED | article_events | ALL | NULL | NULL | NULL | NULL | 19743 | Using temporary; Using filesort
If you can eliminate subqueries with JOINs, it often performs better because derived tables can't use indexes. Here's your query without subqueries:
SELECT c.id,
c.name,
COUNT(ae1.article_id) AS article_count
FROM categories c
LEFT JOIN articles a ON a.category_id = c.id
LEFT JOIN article_events ae1
ON ae1.article_id = a.id
LEFT JOIN article_events ae2
ON ae2.article_id = a.id
AND ae2.id > ae1.id
WHERE ae2.id IS NULL
GROUP BY c.id
You'll want to experiment with the indexes and use EXPLAIN to test, but here's my guess (I'm assuming id fields are primary keys and you are using InnoDB):
categories: `name`
articles: `category_id`
article_events: (`article_id`, `id`)
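In DDL, that guess translates to something like this (index names are illustrative, and some of these may already exist):
ALTER TABLE categories ADD INDEX idx_categories_name (name); -- assumes name is a VARCHAR short enough to index directly
ALTER TABLE articles ADD INDEX idx_articles_category_id (category_id); -- the EXPLAIN above suggests this one already exists
ALTER TABLE article_events ADD INDEX idx_events_article_id (article_id, id);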
Didn't try it, but I'm thinking this will save a bit of work for the database:
SELECT ae.article_id AS ref_article_id,
MAX(ae.id) event_id,
ae.status_id,
(select a.category_id from articles a where a.id = ref_article_id) AS cat_id,
(select c.name from categories c where c.id = cat_id) AS cat_name
FROM article_events ae
GROUP BY ae.article_id
Hope that helps
EDIT:
By the way... keep in mind that joins have to go through each row, so you should start your selection from the small end and work your way up if you can help it. In this case, the query has to run through 100,000 records and join each one, then join those 100,000 again, and again, and again; even when values are null, it still has to go through those.
Hope this all helps...
I don't like that the index on categories.id is being used, since you're selecting the whole table.
Try running:
ANALYZE TABLE categories;
ANALYZE TABLE article_events;
and re-run the query.