AWS RDS Aurora - Update table with subquery is very slow - mysql

Currently, I have two tables, Customer and Orders. The Customer table is more of a report table in which I want to store the first and last order of each customer in the store. The tables are related by the email column, which has an index; the filtering column "store" also has an index. When I run the following query:
update Customer a
set
a.first_order_id = (select min(c.order_id) from Orders c
where store = "somename"
and c.email = a.email);
The Customer table has 7k records and the Orders table has 7 million. The issue is that this query takes forever to finish. The EXPLAIN result is:
1 - select type: update, type: index, key: primary, rows 6992, filtered: 100
2 - select type: dependent subquery, type: ref, key: orders_email_store, rows: 3 million
So I am guessing this does roughly 6992 x 3 million operations. How can I improve the performance?
The only solution I found is to use a stored procedure with a loop, which makes the subquery much faster.
I also tried to do an inner join on the update but the condition to get the minimum order id is my problem.
Any suggestions? Maybe my SQL is just bad.

Orders (aliased as c) needs a composite INDEX(store, email, order_id), in that order.
Also, it might be better to use a "multi-table Update" instead of a subquery.
If you need further discussion, please provide SHOW CREATE TABLE; we need to verify that engines, datatypes, etc are not causing trouble.
Separate indexes on each column are not the same as a composite index.
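A minimal sketch of both suggestions, assuming the table and column names from the question. The index name and the derived-table alias m are hypothetical. One behavioural difference to be aware of: the original subquery sets first_order_id to NULL for customers with no matching order, while this join form leaves those rows untouched.

```sql
-- Composite index in the suggested column order (index name is an assumption).
ALTER TABLE Orders
    ADD INDEX idx_store_email_order (store, email, order_id);

-- Multi-table UPDATE: compute the minimum order per email once,
-- then join that small result to Customer instead of running
-- a dependent subquery for every Customer row.
UPDATE Customer AS a
JOIN (
    SELECT email, MIN(order_id) AS first_order_id
    FROM Orders
    WHERE store = 'somename'
    GROUP BY email
) AS m ON m.email = a.email
SET a.first_order_id = m.first_order_id;
```

With the composite index in place, the derived table can typically be built from the index alone, since it contains every column the inner query touches.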

Related

How to improve MySQL query speed with indexes?

I must run this query with MySQL:
select requests.id, requests.id_temp, categories.id
from opadithree.requests inner join
opadi.request_detail_2
on substring(requests.id_sub_temp, 3) = request_detail_2.id inner join
opadithree.categories
on request_detail_2.theme = categories.cu_code
where categories.atc = false and id_sub_temp like "2_%";
However, for some reason the query is too slow. The table requests has 15583 rows, the table request_detail_2 has 66469 rows, and the table categories has 13452 rows.
The most problematic column id_sub_temp has data strings in the following formats: "2_number" or "3_number".
Do you know some trick to make the query faster?
Here are the indexes I'd try:
First, I need an index so your WHERE condition on id_sub_temp can find the rows needed efficiently. Then add the column id_temp so the result can select that column from the index instead of forcing it to read the row.
CREATE INDEX bk1 ON requests (id_sub_temp, id_temp);
Next I'd like the join to categories to filter by atc=false and then match the cu_code. I tried reversing the order of these columns so cu_code was first, but that resulted in an expensive index-scan instead of a lookup. Maybe that was only because I was testing with empty tables. Anyway, I don't think the column order is important in this case.
CREATE INDEX bk2 ON categories (atc, cu_code);
The join to request_detail_2 is currently by primary key, which is already pretty efficient.

mysql index column order for join

I have two tables (requests, results):
requests:
email
results:
email, processed_at
I now want to get all results that have a request with the same email and that have not been processed:
SELECT * FROM results
INNER JOIN requests ON requests.email = results.email
AND results.processed_at IS NULL
I have an index on each individual column, but the query is very slow. So I assume I need a multi column index on results:
I am just not sure which order the columns have to be:
ALTER TABLE results
ADD INDEX results_email_processed_at (email,processed_at)
ALGORITHM=INPLACE LOCK=NONE;
or
ALTER TABLE results
ADD INDEX results_processed_at_email (processed_at,email)
ALGORITHM=INPLACE LOCK=NONE;
Either composite index will be equally beneficial.
However, if you are fetching 40% of the table, then the Optimizer may choose to ignore any index and simply scan the table.
Is that SELECT the actual query? If not, please show us the actual query; a number of seemingly minor changes could make a big difference in optimization options.
Please provide EXPLAIN SELECT ... so we can see what it thinks with the current index(es). And please provide SHOW CREATE TABLE in case there are datatype issues that are relevant.
Notwithstanding any indexing issues, you explicitly asked about everything that was NOT processed. An INNER JOIN returns only rows that match on both sides, so if "not processed" means there is no matching row in results, your NULL check would never qualify.
You need a LEFT JOIN to the results table.
As for the index, since the join is on email, I would make email the leading column. Including processed_at as well makes the lookup faster, because the engine does not have to go to the raw data pages to qualify the rows. Order the index specifically as (email, processed_at): email is the first qualifier, and processed_at comes along for the ride to complete the columns the query condition needs.
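For reference, a sketch of the (email, processed_at) index next to the two join shapes discussed in this thread. The index name comes from the question. Which query is the right one depends on what "not processed" means in the data, which is an assumption here, not something the question confirms.

```sql
-- Composite index with email as the leading column.
ALTER TABLE results
    ADD INDEX results_email_processed_at (email, processed_at);

-- Reading 1: result rows that match a request and whose
-- processed_at column is still NULL.
SELECT results.*
FROM results
INNER JOIN requests ON requests.email = results.email
WHERE results.processed_at IS NULL;

-- Reading 2: requests that have no result row at all,
-- or that have an unprocessed result row.
SELECT requests.email, results.processed_at
FROM requests
LEFT JOIN results ON results.email = requests.email
WHERE results.processed_at IS NULL;
```

Running EXPLAIN on whichever form matches the requirement shows whether the optimizer actually picks the new index.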

Need some clarification on indexes (WHERE, JOIN)

We are facing performance issues in some reports that work on millions of rows. I tried optimizing the SQL queries, but that only cut the execution time in half.
The next step is to analyse and modify or add some indexes, so I have some questions:
1- The SQL queries contain a lot of joins: do I have to create an index for each foreign key?
2- Imagine the query SELECT * FROM A LEFT JOIN B on a.b_id = b.id where a.attribute2 = 'someValue', with an index on table A on (b_id, attribute2): does my query use this index for the WHERE part? (I know that if both conditions were in the WHERE clause the index would be used.)
3- If an index is based on columns C1, C2 and C3, and I decide to add an index based on C2, do I need to remove C2 from the first index?
Thanks for your time
You can run EXPLAIN on the query to see what MySQL will do when executing it. This helps a LOT when trying to figure out why it's slow.
JOIN-ing happens one table at a time, and the order is determined by MySQL analyzing the query and trying to find the fastest order. You will see it in the EXPLAIN result.
Only one index can be used per JOIN and it has to be on the table being joined. In your example the index used will be the id (primary key) on table B. Creating an index on every FK will give MySQL more options for the query plan, which may help in some cases.
There is only a difference between WHERE and JOIN conditions when there are NULL (missing rows) for the joined table (there is no difference at all for INNER JOIN). For your example the index on b_id does nothing. If you change it to an INNER JOIN (e.g. by adding b.something = 42 in the where clause), then it might be used if MySQL determines that it should do the query in reverse (first b, then a).
No. It is 100% OK to have a column in multiple indexes. If you have an index on (A,B,C) and you add another one on (A), that will be redundant and pointless (because it is a prefix of another index). An index on B is perfectly fine.
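The prefix rule from the last point can be sketched as follows. The table t, its columns, and the index names are hypothetical and used only for illustration.

```sql
-- Hypothetical table: names are not from the question.
CREATE TABLE t (
    id INT AUTO_INCREMENT PRIMARY KEY,
    c1 INT NOT NULL,
    c2 INT NOT NULL,
    c3 INT NOT NULL,
    INDEX idx_c1_c2_c3 (c1, c2, c3),  -- also serves filters on (c1) and (c1, c2)
    INDEX idx_c2 (c2)                 -- not a prefix of the index above, so not redundant
);

-- Can use idx_c1_c2_c3: the filter starts at the leftmost column.
EXPLAIN SELECT * FROM t WHERE c1 = 1 AND c2 = 2;

-- No leading c1 condition, so an ordinary lookup in idx_c1_c2_c3
-- is not possible; idx_c2 is the index that serves this filter.
EXPLAIN SELECT * FROM t WHERE c2 = 2;
```

An extra INDEX (c1) on this table would be the redundant case, since idx_c1_c2_c3 already begins with c1.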

Optimise Select Query containing inner join

MySQL version: 5.5.39
I have these two tables Bugs and BugStatus
I want to fetch the Open and Closed bug counts for a given user.
I am currently using this query
SELECT BugStatus.name,
count(BugStatus.name) AS count
FROM bugs
INNER JOIN BugStatus ON bugs.status = BugStatus.id
WHERE bugs.assignee='irakam'
GROUP BY BugStatus.name;
Now let's assume I am going to have 100,000 rows in my Bugs table. Does this query still stand, or how should I modify it? I did use EXPLAIN but I am still confused. Can this query be optimised?
Select bs.name,
count(*) as count -- simply count(*) unless you are avoiding nulls
from bugs
inner join BugStatus AS bs ON bugs.status = bs.id
where bugs.assignee='irakam'
group by bs.name;
bugs: INDEX(assignee) -- since filtering occurs first
Index Cookbook
You can further optimize your table by creating an index on bugs.status and bugs.assignee:
CREATE INDEX idx_bugs_assignee_status on bugs(assignee, status);
As far as the execution plan goes:
Select Type: Simple
This means you are executing a simple query, without any subqueries or unions.
Type: ALL
This means that a full-table scan is being done on the BugStatus table (every row is inspected). This should be avoided for large tables, but it is OK for the BugStatus table, since it only contains 2 rows.
Type: ref
This means all rows with the matching index values are read from the Bugs table, for each combination of rows found in BugStatus.
possible_keys
This lists out the possible indexes that might be used to answer your query (The primary key of BugStatus, and the foreign key on bugs.status)
Key
This is the actual index that the optimizer chose to answer your query: none in the case of the BugStatus table, since a full-table scan is being performed on it, and the foreign key on status in the case of the bugs table.
ref
This shows which columns or constants are compared against the index named in the key column.
rows
This column indicates the estimated number of rows MySQL expects to examine.
extra: Using temporary; Using filesort
'Using temporary' means that MySQL needs to create a temporary table to hold your results, which is done because of your GROUP BY clause.
'Using filesort' means the database had to perform an extra pass over your results to work out how to retrieve the rows in sorted order.
extra: Using where
Means a WHERE condition is applied to restrict the rows read from this table.
See: https://dev.mysql.com/doc/refman/5.5/en/explain-output.html

How can I improve the performance of this MySQL query?

I have a MySQL query:
SELECT DISTINCT
c.id,
c.company_name,
cd.firstname,
cd.surname,
cis.description AS industry_sector
FROM (clients c)
JOIN clients_details cd ON c.id = cd.client_id
LEFT JOIN clients_industry_sectors cis ON cd.industry_sector_id = cis.id
WHERE c.record_type='virgin'
ORDER BY date_action, company_name asc, id desc
LIMIT 30
The clients table has about 60-70k rows and has an index for 'id', 'record_type', 'date_action' and 'company_name' - unfortunately the query still takes 5+ secs to complete. Removing the 'ORDER BY' reduces this to about 30ms since a filesort is not required. Is there any way I can alter this query to improve upon the 5+ sec response time?
See: http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
Especially:
In some cases, MySQL cannot use indexes to resolve the ORDER BY (..). These cases include the following:
(..)
You are joining many tables, and the columns in the ORDER BY are not all from the first nonconstant table that is used to retrieve rows. (This is the first table in the EXPLAIN output that does not have a const join type.)
You have an index for id, record_type, date_action. But if you want to order by date_action, you really need an index that has date_action as the first field in the index, preferably matching the exact fields in the order by. Otherwise yes, it will be a slow query.
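One possible shape for such an index, as a sketch only. It assumes MySQL 8.0 or later (earlier versions parse DESC in an index definition but ignore it), that date_action, company_name and id all belong to the clients table, and a hypothetical index name. Putting record_type first relies on the rule that a column compared to a constant in the WHERE clause may precede the ORDER BY columns in the index.

```sql
-- Equality column first, then the ORDER BY columns in the same
-- order and direction as the query (id descending).
ALTER TABLE clients
    ADD INDEX idx_type_action_name_id
        (record_type, date_action, company_name, id DESC);

-- Check whether "Using filesort" disappears from the Extra column.
EXPLAIN
SELECT c.id, c.company_name
FROM clients c
WHERE c.record_type = 'virgin'
ORDER BY c.date_action, c.company_name ASC, c.id DESC
LIMIT 30;
```

The DISTINCT and the joins in the full query may still force a temporary table, so it is worth running EXPLAIN on the complete statement as well.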
Without seeing all your tables and indexes, it's hard to tell. When asking a question about speeding up a query, the query is just part of the equation.
Does clients have an index on id?
Does clients have an index on record_type?
Does clients_details have an index on client_id?
Does clients_industry_sectors have an index on id?
These are the minimum you need for this query to have any chance of working quickly.
Thanks so much for the input and suggestions. In the end I've decided to create a new DB table whose sole purpose is to return results for this query, so no joins are required. I just update that table whenever records are added to or deleted from the master clients table. Not ideal from a data storage point of view, but it solves the problem and means I'm getting results fantastically fast. :)