I have a query which seems to be taking a long time to run on occasion. The slowness may be unrelated but I wanted to check what could be done to make this more efficient.
The user table has about 40k rows. The code table has about 30k rows. user_id and code are unique values.
SELECT *
FROM `user`, code
WHERE `user`.user_id = code.user_id
AND code.code = '50816ef96210415d1cad824bdb43';
I have an index setup on the code.user_id field. Anything else I can do? Should I have other indexes in place here?
Output from EXPLAIN on that query:
>> +----+-------------+-------+--------+---------------+---------+---------+----------------------+-------+-------------+
>> | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
>> +----+-------------+-------+--------+---------------+---------+---------+----------------------+-------+-------------+
>> | 1 | SIMPLE | code | ALL | user_id | NULL | NULL | NULL 35696 | Using where |
>> | 1 | SIMPLE | user | eq_ref | PRIMARY | PRIMARY | 4 | mydb.code.user_id | 1 | |
>> +----+-------------+-------+--------+---------------+---------+---------+----------------------+-------+-------------+
>> 2 rows in set (10.11 sec)
You also need to add indexes on code.code and user.user_id fields, and it should start flying
Beside adding an index to code.code another thing you can do is to select only the columns you need ( i dont like using SELECT * )
Related
I've been struggling when it comes to optimizing the following query (Example 1):
SELECT `service`.*
FROM
(
SELECT `storeUser`.`storeId`
FROM `storeUser`
WHERE `storeUser`.`userId` = 1
UNION
SELECT `store`.`storeId`
FROM `companyUser`
INNER JOIN `store` ON `companyUser`.`companyId` = `store`.`companyId`
WHERE `companyUser`.`userId` = 1
UNION
SELECT `store`.`storeId`
FROM `accountUser`
INNER JOIN `company` ON `company`.`accountId` = `accountUser`.`accountId`
INNER JOIN `store` ON `company`.`companyId` = `store`.`companyId`
WHERE `accountUser`.`userId` = 1
) AS `storeUser`
INNER JOIN `service` ON `storeUser`.`storeId` = `service`.`storeId`
LIMIT 10;
The subquery should be returning something like "1","2","3,"4"
Anyway it's super slow and takes about 48 seconds to give a response, even though the subquery by itself, ran in a different console, takes about 0,0020ms to give results.
The same applies if I place the subquery inside an IN instead (Example 2):
SELECT `service`.*
FROM `service`
WHERE 1
AND `service`.`storeId` IN (
SELECT `storeUser`.`storeId` FROM `storeUser` WHERE `storeUser`.`userId` = 1
UNION
SELECT `store`.`storeId` FROM `companyUser`
INNER JOIN `store` ON `companyUser`.`companyId` = `store`.`companyId`
WHERE `companyUser`.`userId` = 1
UNION
SELECT `store`.`storeId`
FROM `accountUser`
INNER JOIN `company` ON `company`.`accountId` = `accountUser`.`accountId`
INNER JOIN `store` ON `company`.`companyId` = `store`.`companyId`
WHERE `accountUser`.`userId` = 1
)
LIMIT 10;
However if I simply put the values returned by that query, manually, it's basically instantly:
SELECT
`service`.*
FROM
`service`
WHERE 1
AND `service`.`storeId` IN (
"1", "2", "3", "4", "5"
)
LIMIT 10;
Important to mention that'd I've reviewed the indexes in the joins and everything seems to be in place, and the EXPLAIN [query] returns a filtered score of 100 for basically everything.
Edit:
Sorry for not providing enough information before, hope this can be more helpful:
MySQL 5.7,
Storage engine: InnoDB
EXPLAINs
1.) StoreUser
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | storeUser | NULL | ref | PRIMARY, storeUserUser | PRIMARY | 4 | const | 1 |100.00 | Using index
2.) CompanyUser
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | companyUser | NULL | ref | PRIMARY,companyUserCompany,companyUserUser | companyUserUser | 4 | const | 30 | 100.00 | Using index
1 | SIMPLE | store | NULL | ref | storeCompany | storeCompany | 4 | Table.companyUser.companyId | 5 | 100.00 | Using index
3.) AccountUser
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | accountUser | NULL | ref | PRIMARY,accountUserUser | accountUserUser | 4 | const | 1 | 100.00 | Using index
1 | SIMPLE | company | NULL | ref | PRIMARY,companyAccount | companyAccount | 4 | Table.accountUser.accountId | 305 | 100.00 | Using index
1 | SIMPLE | store | NULL | ref | storeCompany | storeCompany | 4 | Table.company.companyId | 5 | 100.00 | Using index
4.) Whole query (Example 2)
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | PRIMARY | service | NULL | ALL | NULL | NULL | NULL | NULL | 2836046 | 100.00 | Using where
2 | DEPENDENT SUBQUERY | storeUser | NULL | eq_ref | PRIMARY,storeUserStore,storeUserUser | PRIMARY | 8 | const,func | 1 | 100.00 | Using index
3 | DEPENDENT UNION | store | NULL | eq_ref | PRIMARY,storeCompany | PRIMARY | 4 | func | 1 | 100.00 | NULL
3 | DEPENDENT UNION | companyUser | NULL | eq_ref | PRIMARY,companyUserCompany,companyUserUser | PRIMARY | 8 | const,Table.store.companyId | 1 | 100.00 | Using index
4 | DEPENDENT UNION | companyUser | NULL | ref | PRIMARY,accountUserUser | accountUserUser | 4 | const | 1 | 100.00 | Using index
4 | DEPENDENT UNION | store | NULL | eq_ref | PRIMARY,storeCompany | PRIMARY | 4 | func | 1 | 100.00 | NULL
4 | DEPENDENT UNION | company | NULL | eq_ref | PRIMARY,companyAccount | PRIMARY | 4 | Table.store.companyId | 1 | 100.00 | Using where
NULL | UNION RESULT | <union2,3,4>| NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary
You didn't show us your indexes or EXPLAIN output, so all this is guesswork.
Clearly it's the subquery in your second example that's not optimized. That subquery is a UNION with three branches. The way you address performance trouble? Analyze and optimize each branch of the UNION separately.
You certainly need some better indexes, unless your database server is too small or misconfigured. That's very rare, so let's work on indexes.
The first branch is
SELECT storeUser.storeId
FROM storeUser
WHERE storeUser.userId = 1
This compound index covers that query. Try adding it. If you have a separate index on just userId, drop it when you add this one.
ALTER TABLE storeUser ADD INDEX userId_storeId (userId, storeId);
The second branch is
SELECT store.storeId
FROM companyUser
INNER JOIN store ON companyUser.companyId = store.companyId
WHERE companyUser.userId = 1
Subqueries with JOIN operations are a little tricker to optimize without access to EXPLAIN output, so this is guesswork. I guess these indexes will help, though. (Assuming you use InnoDB and the PK on store is storeId.)
ALTER TABLE companyUser ADD INDEX userId_companyId (userId, companyId);
ALTER TABLE store ADD INDEX companyId (companyId);
Similar analysis applies to the third branch of the UNION.
And, add this index. Your EXPLAIN points to it being missing, and so a full table scan of that large table being required.
ALTER TABLE service ADD INDEX storeId (storeId);
Again, helping you would be far easier if you showed us your table definitions with indexes. SHOW CREATE TABLE service; for example, would show us what we need for your service table. Pro tip when troubleshooting this kind of performance stuff always doublecheck your indexes. Ask me how I know that when you have a couple of hours to spare.
Pro tip Be obsessive about formatting your queries so they're readable. You, yourself a year from now, and your co-workers yet unborn need to read and reason about them. To my way of thinking that means skipping those silly backticks.
Perhaps you need to rethink the schema. It seems like you need a table for "user" instead of, or in addition to, the 3 tables for different types of "users".
Meanwhile, these composite indexes are likely to help performance in either formulation:
storeUser: INDEX(storeId, userId)
storeUser: INDEX(userId, storeId)
service: INDEX(storeId)
store: INDEX(companyId, storeId)
companyUser: INDEX(userId, companyId)
company: INDEX(accountId, companyId)
accountUser: INDEX(userId, accounted)
When adding a composite index, DROP index(es) with the same leading columns.
That is, when you have both INDEX(a) and INDEX(a,b), toss the former.
In particular, storeUser smells like a many-to-many mapping table. If so, see Many:many mapping for more discussion.
In general IN( SELECT ... ) does not optimize well, but you might find otherwise for your query.
Sorry to not give more details about the schemas but I wasn't allowed to share it here, anyway, the problem happened to be elsewhere:
The service table was receiving a huge amount of requests, some actions were even locking it up, ending up on slow times whenever we were accesing that table, we have fixed our other proccess and it's working great now. Hugely appreciate your time and effort, thanks.
I have a table with the following structure with almost 120000 rows,
desc user_group_report
+------------------+----------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+----------+------+-----+-------------------+-------+
| user_id | int | YES | MUL | NULL | |
| group_id | int(11) | YES | MUL | NULL | |
| type_id | int(11) | YES | | NULL | |
| group_desc | varchar(128)| NO| | NULL |
| status | enum('open','close')|NO| | NULL | |
| last_updated | datetime | NO | | CURRENT_TIMESTAMP | |
+------------------+----------+------+-----+-------------------+-------+
I have indexes on the following keys :
user_group_type(user_id,group_id,group_type)
group_type(group_id,type_id)
user_type(user_id,type_id)
user_group(user_id,group_id)
My issue is I am running a count(*) aggregation on above table group by group_id and with a clause on type_id
Here is the query :
select count(*) user_count, group_id
from user_group_report
where type_id = 1
group by group_id;
and here is the explain plan (query taking 0.3 secs on average):
+----+-------------+------------------+-------+---------------------------------+---------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+---------------------------------+---------+---------+------+--------+--------------------------+
| 1 | SIMPLE | user_group_report | index | user_group_type,group_type,user_group | group_type | 10 | NULL | 119811 | Using where; Using index |
+----+-------------+------------------+-------+---------------------------------+---------+---------+------+--------+--------------------------+
Here as I understand the query almost does a full table scan because of complex indices and When I am trying to add an index on group_id, the rows in explain plan shows a less number (almost half the rows) but the time taking for query execution is increased to 0.4-0.5 secs.
I have tried different ways to add/remove indices but none of them is reducing the time taken.
Assuming the table structure cannot be changed and querying is independent of other tables, Can someone suggest me a better way to optimize the above query or If i am missing anything here.
PS:
I have already tried to modify the query to the following but couldn't find any improvement.
select count(user_id) user_count, group_id
from user_group_report
where type_id = 1
group by group_id;
Any little help is appreciated.
Edit:
As per the suggestions, I added a new index
type_group on (type_id,group_id)
This is the new explain plan. The number of rows in explain,reduced but the query execution time is still the same
+----+-------------+------------------+------+---------------------------------+---------+---------+-------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+------+---------------------------------+---------+---------+-------+-------+--------------------------+
| 1 | SIMPLE | user_group_report | ref | user_group_type,type_group,user_group | type_group | 5 | const | 59846 | Using where; Using index |
+----+-------------+------------------+------+---------------------------------+---------+---------+-------+-------+--------------------------+
EDIT 2:
Adding details as suggested in answers/comments
select count(*)
from user_group_report
where type_id = 1
This query itself is taking 0.25 secs to execute.
and here is the explain plan:
+----+-------------+------------------+------+---------------+---------+---------+-------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+------+---------------+---------+---------+-------+-------+-------------+
| 1 | SIMPLE | user_group_report | ref | type_group | type_group | 5 | const | 59866 | Using index |
+----+-------------+------------------+------+---------------+---------+---------+-------+-------+-------------+
I believe that your group_type is wrong. Try to switch the attributes.
create index ix_type_group on user_group_report(type_id,group_id)
This index is better for your query because you specify the type_id = 1 in the where clause. Therefore, the query processor finds the first record with type_id = 1 in your index and then it scans the records in the index with this type_id and performs the aggregation. With such index, only relevant records in the index are accessed which is not possible with the group_type index.
If type_id is selective (i.e. it reduces the search space significantly), creating an index on type_id, group_id should help significantly.
This is because it reduces the number of records that need to be grouped first (remove everything where type_id != 1), and only then does the grouping/summing.
EDIT:
Following on from the comments, it seems we need to figure out more about where the bottleneck is - finding the records, or grouping/summing.
The first step would be to measure the performance of:
select count(*)
from user_group_report
where type_id = 1
If that is significantly faster, the challenge is likely in the grouping than in finding the records. If that's just as slow, it's in finding the records in the first place.
Do most of the columns really need to be NULLable? Change to NOT NULL where applicable.
What percentage of the table has type_id = 1? If it is most of the table, then that would explain why you don't see much improvement. Meanwhile, the EXPLAIN seems to be thinking there are only two distinct values for type_id, hence it says only half the table will be scanned -- this number cannot be trusted.
To get more insight into what is going on, please do these:
EXPLAIN FORMAT=JSON SELECT...;
And
FLUSH STATUS;
SELECT ...
SHOW SESSION STATUS LIKE 'Handler%';
We can help interpret the data you get there. (Here is a brief discussion of such.)
SELECT AVG(table1.column1) as a,
table2.column2
FROM table1
LEFT OUTER JOIN table2
ON table2.column2 = table1.column2
GROUP BY table2.column2 ORDER BY a DESC LIMIT 10
This is MySQL code. I have 1.5 Million rows in table1, 200.000 rows in table2.
I am still waiting for the query to finish.
Does anybody know a way to work in a shorter time?
Lot of comments in the same vein but I thought I'd give a thorough answer. I'm gonna use one of my own tables/databases here for explanation. Let's take this query:
SELECT A.id, B.asin FROM AmazonWishlistItems A LEFT JOIN AmazonWishlistItemPrices B ON (B.asin = A.asin) WHERE A.asin LIKE "%C%"
This query returns about 851 and takes 0.5 seconds. If we add the word EXPLAIN to the query, MySQL tells us what this query is doing.
mysql> EXPLAIN SELECT A.id, B.asin FROM AmazonWishlistItems A LEFT JOIN AmazonWishlistItemPrices B ON (B.asin = A.asin) WHERE A.asin LIKE "%C%";
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | A | ALL | NULL | NULL | NULL | NULL | 1183 | Using where |
| 1 | SIMPLE | B | ALL | NULL | NULL | NULL | NULL | 6594 | |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
2 rows in set (0.00 sec)
Important column to look at here is the rows as this is the number of records MySQL is having to look at and in this case for tables A and B it is having to look up all the rows even though there's only 851 that fit the condition. This is how tables can get out of hand quickly, this only has 6594 record to search through but left alone this could easily reach your 1.5 million rows.
So we can cut this down by adding an index to the table, allowing MySQL to store a reference for each record.
ALTER TABLE AmazonWishlistItemPrices ADD INDEX idx_asin (asin)
This simply says create an index called idx_asin and use the column asin to do the indexing. If we re run our EXPLAIN...
mysql> EXPLAIN SELECT A.id, B.asin FROM AmazonWishlistItems A LEFT JOIN AmazonWishlistItemPrices B ON (B.asin = A.asin) WHERE A.asin LIKE "%C%";
+----+-------------+-------+------+---------------+----------+---------+---------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+----------+---------+---------------------+------+-------------+
| 1 | SIMPLE | A | ALL | NULL | NULL | NULL | NULL | 1183 | Using where |
| 1 | SIMPLE | B | ref | idx_asin | idx_asin | 12 | mah_database.A.asin | 6 | Using index |
+----+-------------+-------+------+---------------+----------+---------+---------------------+------+-------------+
2 rows in set (0.00 sec)
We're down to six rows and you can see in the possible_keys it's found our index. You may find that with certain joins and where clauses your indexes are being ignored that's simply MySQL saying "I'm going to have to get all the data anyway" because of the conditions you've provided in the WHERE condition.
It's best to use numeric keys for indexing, you can get away with some varchars but they do take up disk space. You should have a PRIMARY KEY on each table where possible. So look at your database structure and consider adding some indexes.
Final thing to check if your table has indexes you can use SHOW CREATE TABLE followed by the table name.
I am willing to get information about the progress of a BIG insert of 10M rows into out_table during the INSERT execution.
I found that:
query1: EXPLAIN SELECT COUNT(*) FROM out_table; gives:
+----+-------------+-----------+-------+---------------+-----------------------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+-----------------------+---------+------+---------+-------------+
| 1 | SIMPLE | out_table| index | NULL | periodo_cuit_cuil_idx | 22 | NULL | 2518148 | Using index |
+----+-------------+-----------+-------+---------------+-----------------------+---------+------+---------+-------------+
As I want to show the advance of the INSERT script I will run an script that keeps showing the progress. The problem I have is that I cannot get the 'rows'
column alone from the former query.
I've tried to make a SELECT rows FROM (query1) and to asign the output to a variable with no luck. Any idea?
Got a noob question. Say I create the following table:
temp1
up, varchar(15)
dn, varchar(15)
and I add a couple of indeces:
create table temp1 (up varchar(15), dn varchar(15), index id1(up), index id2(dn))
After I populate the table with some random data, I do the following explain select
explain select * from temp1 as t1, temp1 as t2 where t1.up = t2.up
and get
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | t1 | ALL | id1 | NULL | NULL | NULL | 4 | |
| 1 | SIMPLE | t2 | ALL | id1 | NULL | NULL | NULL | 3 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
Why isn't the optimizer using the keys?! I must be missing something very simple . . .
(I'm asking this question because a similar query with the tables I'm actually using (700K rows) is running awfully slow, and I'm guessing it has to do with indeces).
Thanks for the help!
Since you select all the rows from temp t1 (and almost all from t2) - mysql decides to use fullscan, due to it is more suitable in such case.
Correct me if i'm wrong, but this would return ALL values of your table temp1. It would not help you to use any indexes, because you are not looking for a subset of anything.