Very simple self-join, yet indexes ignored? - mysql

Got a noob question. Say I create the following table:
temp1
up, varchar(15)
dn, varchar(15)
and I add a couple of indexes:
create table temp1 (up varchar(15), dn varchar(15), index id1(up), index id2(dn))
After I populate the table with some random data, I run the following EXPLAIN SELECT:
explain select * from temp1 as t1, temp1 as t2 where t1.up = t2.up
and get
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | t1    | ALL  | id1           | NULL | NULL    | NULL |    4 |             |
|  1 | SIMPLE      | t2    | ALL  | id1           | NULL | NULL    | NULL |    3 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
Why isn't the optimizer using the keys?! I must be missing something very simple . . .
(I'm asking this question because a similar query with the tables I'm actually using (700K rows) is running awfully slow, and I'm guessing it has to do with indexes.)
Thanks for the help!

Since you select all the rows from temp1 t1 (and almost all from t2), MySQL decides to use a full scan, because it is cheaper in such a case.

Correct me if I'm wrong, but this would return ALL rows of your table temp1. No index would help you here, because you are not looking for a subset of anything.
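For contrast, a hedged sketch (the constant 'abc' is made up): once the query asks for a subset, the optimizer has a reason to use id1, though on a 4-row table it may still prefer a full scan, since reading everything is cheaper than index lookups.
-- Hypothetical filter value; on larger data you should see ref/range
-- access on id1 instead of ALL.
EXPLAIN SELECT *
FROM temp1 AS t1
JOIN temp1 AS t2 ON t1.up = t2.up
WHERE t1.up = 'abc';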

Related

MySQL Subquery making query super slow

I've been struggling when it comes to optimizing the following query (Example 1):
SELECT `service`.*
FROM
(
SELECT `storeUser`.`storeId`
FROM `storeUser`
WHERE `storeUser`.`userId` = 1
UNION
SELECT `store`.`storeId`
FROM `companyUser`
INNER JOIN `store` ON `companyUser`.`companyId` = `store`.`companyId`
WHERE `companyUser`.`userId` = 1
UNION
SELECT `store`.`storeId`
FROM `accountUser`
INNER JOIN `company` ON `company`.`accountId` = `accountUser`.`accountId`
INNER JOIN `store` ON `company`.`companyId` = `store`.`companyId`
WHERE `accountUser`.`userId` = 1
) AS `storeUser`
INNER JOIN `service` ON `storeUser`.`storeId` = `service`.`storeId`
LIMIT 10;
The subquery should be returning something like "1","2","3","4"
Anyway, it's super slow and takes about 48 seconds to give a response, even though the subquery by itself, run in a separate console, takes about 0.0020ms to give results.
The same applies if I place the subquery inside an IN instead (Example 2):
SELECT `service`.*
FROM `service`
WHERE 1
AND `service`.`storeId` IN (
SELECT `storeUser`.`storeId` FROM `storeUser` WHERE `storeUser`.`userId` = 1
UNION
SELECT `store`.`storeId` FROM `companyUser`
INNER JOIN `store` ON `companyUser`.`companyId` = `store`.`companyId`
WHERE `companyUser`.`userId` = 1
UNION
SELECT `store`.`storeId`
FROM `accountUser`
INNER JOIN `company` ON `company`.`accountId` = `accountUser`.`accountId`
INNER JOIN `store` ON `company`.`companyId` = `store`.`companyId`
WHERE `accountUser`.`userId` = 1
)
LIMIT 10;
However if I simply put the values returned by that query, manually, it's basically instantly:
SELECT
`service`.*
FROM
`service`
WHERE 1
AND `service`.`storeId` IN (
"1", "2", "3", "4", "5"
)
LIMIT 10;
Important to mention that I've reviewed the indexes in the joins and everything seems to be in place, and EXPLAIN [query] returns a filtered score of 100 for basically everything.
Edit:
Sorry for not providing enough information before, hope this can be more helpful:
MySQL 5.7,
Storage engine: InnoDB
EXPLAINs
1.) StoreUser
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | storeUser | NULL | ref | PRIMARY, storeUserUser | PRIMARY | 4 | const | 1 |100.00 | Using index
2.) CompanyUser
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | companyUser | NULL | ref | PRIMARY,companyUserCompany,companyUserUser | companyUserUser | 4 | const | 30 | 100.00 | Using index
1 | SIMPLE | store | NULL | ref | storeCompany | storeCompany | 4 | Table.companyUser.companyId | 5 | 100.00 | Using index
3.) AccountUser
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | accountUser | NULL | ref | PRIMARY,accountUserUser | accountUserUser | 4 | const | 1 | 100.00 | Using index
1 | SIMPLE | company | NULL | ref | PRIMARY,companyAccount | companyAccount | 4 | Table.accountUser.accountId | 305 | 100.00 | Using index
1 | SIMPLE | store | NULL | ref | storeCompany | storeCompany | 4 | Table.company.companyId | 5 | 100.00 | Using index
4.) Whole query (Example 2)
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | PRIMARY | service | NULL | ALL | NULL | NULL | NULL | NULL | 2836046 | 100.00 | Using where
2 | DEPENDENT SUBQUERY | storeUser | NULL | eq_ref | PRIMARY,storeUserStore,storeUserUser | PRIMARY | 8 | const,func | 1 | 100.00 | Using index
3 | DEPENDENT UNION | store | NULL | eq_ref | PRIMARY,storeCompany | PRIMARY | 4 | func | 1 | 100.00 | NULL
3 | DEPENDENT UNION | companyUser | NULL | eq_ref | PRIMARY,companyUserCompany,companyUserUser | PRIMARY | 8 | const,Table.store.companyId | 1 | 100.00 | Using index
4 | DEPENDENT UNION | companyUser | NULL | ref | PRIMARY,accountUserUser | accountUserUser | 4 | const | 1 | 100.00 | Using index
4 | DEPENDENT UNION | store | NULL | eq_ref | PRIMARY,storeCompany | PRIMARY | 4 | func | 1 | 100.00 | NULL
4 | DEPENDENT UNION | company | NULL | eq_ref | PRIMARY,companyAccount | PRIMARY | 4 | Table.store.companyId | 1 | 100.00 | Using where
NULL | UNION RESULT | <union2,3,4>| NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary
You didn't show us your indexes or EXPLAIN output, so all this is guesswork.
Clearly it's the subquery in your second example that's not optimized. That subquery is a UNION with three branches. The way you address performance trouble? Analyze and optimize each branch of the UNION separately.
You certainly need some better indexes, unless your database server is too small or misconfigured. That's very rare, so let's work on indexes.
The first branch is
SELECT storeUser.storeId
FROM storeUser
WHERE storeUser.userId = 1
This compound index covers that query. Try adding it. If you have a separate index on just userId, drop it when you add this one.
ALTER TABLE storeUser ADD INDEX userId_storeId (userId, storeId);
The second branch is
SELECT store.storeId
FROM companyUser
INNER JOIN store ON companyUser.companyId = store.companyId
WHERE companyUser.userId = 1
Subqueries with JOIN operations are a little trickier to optimize without access to EXPLAIN output, so this is guesswork. I guess these indexes will help, though. (Assuming you use InnoDB and the PK on store is storeId.)
ALTER TABLE companyUser ADD INDEX userId_companyId (userId, companyId);
ALTER TABLE store ADD INDEX companyId (companyId);
Similar analysis applies to the third branch of the UNION.
And, add this index. Your EXPLAIN points to it being missing, which forces a full table scan of that large table.
ALTER TABLE service ADD INDEX storeId (storeId);
Again, helping you would be far easier if you showed us your table definitions with indexes. SHOW CREATE TABLE service; for example, would show us what we need for your service table. Pro tip: when troubleshooting this kind of performance problem, always double-check your indexes. Ask me how I know that when you have a couple of hours to spare.
Pro tip: Be obsessive about formatting your queries so they're readable. You yourself a year from now, and your co-workers yet unborn, need to read and reason about them. To my way of thinking, that means skipping those silly backticks.
Perhaps you need to rethink the schema. It seems like you need a table for "user" instead of, or in addition to, the 3 tables for different types of "users".
Meanwhile, these composite indexes are likely to help performance in either formulation:
storeUser: INDEX(storeId, userId)
storeUser: INDEX(userId, storeId)
service: INDEX(storeId)
store: INDEX(companyId, storeId)
companyUser: INDEX(userId, companyId)
company: INDEX(accountId, companyId)
accountUser: INDEX(userId, accountId)
When adding a composite index, DROP index(es) with the same leading columns.
That is, when you have both INDEX(a) and INDEX(a,b), toss the former.
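For example, a hedged sketch with a hypothetical index name:
-- If a single-column INDEX(userId) already exists alongside the new
-- composite index, the single-column one is redundant and just costs writes.
ALTER TABLE storeUser
DROP INDEX userId,
ADD INDEX userId_storeId (userId, storeId);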
In particular, storeUser smells like a many-to-many mapping table. If so, see Many:many mapping for more discussion.
In general IN( SELECT ... ) does not optimize well, but you might find otherwise for your query.
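If the IN( SELECT ... ) form stays slow, here is a hedged workaround sketch (not from the answers above; it assumes the UNION really returns only a handful of ids, as reported): materialize the ids once into a temporary table, then join, so the UNION is evaluated a single time rather than once per service row.
-- Sketch only: table and column names are taken from the question.
CREATE TEMPORARY TABLE tmp_store_ids (storeId INT PRIMARY KEY);
INSERT INTO tmp_store_ids
SELECT storeId FROM storeUser WHERE userId = 1
UNION
SELECT store.storeId FROM companyUser
INNER JOIN store ON companyUser.companyId = store.companyId
WHERE companyUser.userId = 1
UNION
SELECT store.storeId FROM accountUser
INNER JOIN company ON company.accountId = accountUser.accountId
INNER JOIN store ON company.companyId = store.companyId
WHERE accountUser.userId = 1;

-- With the suggested INDEX(storeId) on service, this join is cheap.
SELECT service.*
FROM service
INNER JOIN tmp_store_ids USING (storeId)
LIMIT 10;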
Sorry for not giving more details about the schemas, but I wasn't allowed to share them here. Anyway, the problem happened to be elsewhere:
The service table was receiving a huge number of requests, and some actions were even locking it up, which caused slow responses whenever we accessed that table. We have fixed the other process and it's working great now. Hugely appreciate your time and effort, thanks.

MySQL - Improvement on count(*) aggregation with composite index keys

I have a table with the following structure, with almost 120000 rows:
desc user_group_report
+--------------+----------------------+------+-----+-------------------+-------+
| Field        | Type                 | Null | Key | Default           | Extra |
+--------------+----------------------+------+-----+-------------------+-------+
| user_id      | int                  | YES  | MUL | NULL              |       |
| group_id     | int(11)              | YES  | MUL | NULL              |       |
| type_id      | int(11)              | YES  |     | NULL              |       |
| group_desc   | varchar(128)         | NO   |     | NULL              |       |
| status       | enum('open','close') | NO   |     | NULL              |       |
| last_updated | datetime             | NO   |     | CURRENT_TIMESTAMP |       |
+--------------+----------------------+------+-----+-------------------+-------+
I have indexes on the following keys:
user_group_type(user_id,group_id,type_id)
group_type(group_id,type_id)
user_type(user_id,type_id)
user_group(user_id,group_id)
My issue is that I am running a count(*) aggregation on the above table, grouping by group_id, with a clause on type_id.
Here is the query :
select count(*) user_count, group_id
from user_group_report
where type_id = 1
group by group_id;
and here is the explain plan (query taking 0.3 secs on average):
+----+-------------+------------------+-------+---------------------------------+---------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+---------------------------------+---------+---------+------+--------+--------------------------+
| 1 | SIMPLE | user_group_report | index | user_group_type,group_type,user_group | group_type | 10 | NULL | 119811 | Using where; Using index |
+----+-------------+------------------+-------+---------------------------------+---------+---------+------+--------+--------------------------+
Here, as I understand it, the query almost does a full index scan because of the composite indexes. When I try adding an index on group_id alone, the row count in the explain plan drops (to almost half), but the query execution time increases to 0.4-0.5 secs.
I have tried different ways to add/remove indexes, but none of them reduces the time taken.
Assuming the table structure cannot be changed and the query is independent of other tables, can someone suggest a better way to optimize the above query, or point out anything I am missing here?
PS:
I have already tried to modify the query to the following but couldn't find any improvement.
select count(user_id) user_count, group_id
from user_group_report
where type_id = 1
group by group_id;
Any little help is appreciated.
Edit:
As per the suggestions, I added a new index
type_group on (type_id,group_id)
This is the new explain plan. The number of rows in the explain output is reduced, but the query execution time is still the same:
+----+-------------+------------------+------+---------------------------------+---------+---------+-------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+------+---------------------------------+---------+---------+-------+-------+--------------------------+
| 1 | SIMPLE | user_group_report | ref | user_group_type,type_group,user_group | type_group | 5 | const | 59846 | Using where; Using index |
+----+-------------+------------------+------+---------------------------------+---------+---------+-------+-------+--------------------------+
EDIT 2:
Adding details as suggested in answers/comments
select count(*)
from user_group_report
where type_id = 1
This query itself is taking 0.25 secs to execute.
and here is the explain plan:
+----+-------------+------------------+------+---------------+---------+---------+-------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+------+---------------+---------+---------+-------+-------+-------------+
| 1 | SIMPLE | user_group_report | ref | type_group | type_group | 5 | const | 59866 | Using index |
+----+-------------+------------------+------+---------------+---------+---------+-------+-------+-------------+
I believe that your group_type index is wrong. Try switching the attributes:
create index ix_type_group on user_group_report(type_id,group_id)
This index is better for your query because you specify type_id = 1 in the WHERE clause. The query processor finds the first record with type_id = 1 in the index, then scans only the index records with this type_id and performs the aggregation. With such an index, only the relevant records in the index are accessed, which is not possible with the group_type index.
If type_id is selective (i.e. it reduces the search space significantly), creating an index on type_id, group_id should help significantly.
This is because it reduces the number of records that need to be grouped first (removing everything where type_id != 1), and only then does the grouping/summing.
EDIT:
Following on from the comments, it seems we need to figure out more about where the bottleneck is - finding the records, or grouping/summing.
The first step would be to measure the performance of:
select count(*)
from user_group_report
where type_id = 1
If that is significantly faster, the challenge is more likely in the grouping than in finding the records. If it's just as slow, it's in finding the records in the first place.
Do most of the columns really need to be NULLable? Change to NOT NULL where applicable.
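A minimal sketch of that change, assuming the columns never legitimately hold NULL (the ALTER fails if existing rows contain NULLs, so check first):
-- Assumption: no existing NULLs in these columns; backfill first otherwise.
ALTER TABLE user_group_report
MODIFY user_id int(11) NOT NULL,
MODIFY group_id int(11) NOT NULL,
MODIFY type_id int(11) NOT NULL;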
What percentage of the table has type_id = 1? If it is most of the table, then that would explain why you don't see much improvement. Meanwhile, the EXPLAIN seems to be thinking there are only two distinct values for type_id, hence it says only half the table will be scanned -- this number cannot be trusted.
To get more insight into what is going on, please do these:
EXPLAIN FORMAT=JSON SELECT...;
And
FLUSH STATUS;
SELECT ...
SHOW SESSION STATUS LIKE 'Handler%';
We can help interpret the data you get there. (Here is a brief discussion of such.)

MySQL SELECT ... IN not using index

I have a big table (10M rows) with 3 columns: x, y, status.
I have a primary index on (x, y).
A query like
SELECT * FROM table where (x,y) in (select 1234,5678)
takes approximately 5 seconds, whereas the query
SELECT * FROM table where (x,y) in (1234,5678)
gives the same result in less than 0.01s.
I assume it's an issue with indexes; I've tried to add FORCE INDEX, but without success.
When I run EXPLAIN on both queries, the first one is not using indexes:
EXPLAIN SELECT * FROM table where (x,y) in (select 1234,5678)
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows     | filtered | Extra          |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+----------------+
|  1 | PRIMARY     | table | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 10794773 |   100.00 | Using where    |
|  2 | SUBQUERY    | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL     |     NULL | No tables used |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+----------------+
EXPLAIN SELECT * FROM table where (x,y) in (1234,5678)
+----+-------------+-------+------------+-------+---------------+---------+---------+-------------+------+----------+-------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref         | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------------+------+----------+-------+
|  1 | SIMPLE      | table | NULL       | const | PRIMARY       | PRIMARY | 8       | const,const |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------------+------+----------+-------+
Of course, I'd like to use the first syntax, because the real query is like UPDATE table SET status=123 WHERE (x,y) IN (SELECT x,y FROM table2 WHERE ...);
I really don't understand this behaviour.
You do not need the select 1234,5678 subquery; use ... in ((1234,5678)) instead (please note the double parentheses around the values):
SELECT * FROM table where (x,y) in ((1234,5678))
If you check multiple fields with the in() operator against a list of constant values, then you need to enclose each set of values in parentheses:
SELECT * FROM table where (x,y) in ((1,1),(2,3),...(n,m))
The above syntax would enable MySQL to match the x,y fields against constant values, thus the query can utilise the multi-column index on x,y fields.
However, this may not be effective for the update query with a subquery. In this case, I would rewrite the update with a join instead of a subquery:
UPDATE table
INNER JOIN table2 on table.x=table2.x and table.y=table2.y
SET table.status=123
WHERE table2.fieldname=...
If x,y are indexed in both tables, then the joins should be fast. Moreover, if the table2 indexes are extended to cover the where criteria, then such a query can be really fast.
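A sketch of that last suggestion; fieldname is the placeholder column from the example above, not a real schema:
-- Extends table2's index so the WHERE criterion and the join columns
-- are all served from the index (a covering index for this UPDATE).
ALTER TABLE table2 ADD INDEX fieldname_x_y (fieldname, x, y);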

Calculating user's age on the fly, optimization - mysql

I have the following (strange) query:
SELECT DISTINCT c.id
FROM z1 INNER JOIN c c ON (z1.id=c.id)
INNER JOIN i ON (c.member_id=i.member_id)
WHERE DATE_FORMAT(CONCAT(i.birthyear,"-",i.birthmonth,"-",i.birthday),"%Y%m%d000000") BETWEEN '19820605000000' AND '19930604235959' AND c.id NOT IN (658887)
GROUP BY c.id
The user's birthday is kept in the db in three different columns, but the task is to find users whose ages fall within a specific range.
The worst thing is that MySQL will calculate the age for each selected record and compare it with the condition, which is not good :( Is there any way to make it faster?
this is the plan
+----+-------------+-------+--------+-------------------+---------+---------+--------------------+--------+----------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+--------+-------------------+---------+---------+--------------------+--------+----------+-----------------------------------------------------------+
| 1 | SIMPLE | z1 | index | PRIMARY | PRIMARY | 4 | NULL | 176659 | 100.00 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | c | eq_ref | PRIMARY,member_id | PRIMARY | 4 | z1.id | 1 | 100.00 | |
| 1 | SIMPLE | i | eq_ref | PRIMARY | PRIMARY | 4 | c.member_id | 1 | 100.00 | Using where |
+----+-------------+-------+--------+-------------------+---------+---------+--------------------+--------+----------+-----------------------------------------------------------+
As usual, the right answer is to fix your schema. i.e. data should be normalized, use native keys wherever practical and use the right data types.
Looking at your post, at least you've provided an EXPLAIN plan - but the table structures would help too.
Why is the table z1 in the query? You don't explicitly filter using it, and you don't use the result anywhere.
Why do you do both a DISTINCT and a GROUP BY - you're asking the DBMS to do the same work twice.
Why do you use 'c' as an alias for 'c'?
Why are you using NOT IN to exclude a single value?
Why do you compare your date values as strings?
It's possible that the optimizer is getting confused about the best way to resolve the query - but you've not provided any information to support this - what proportion of the data is filtered by the age rule? You may get better results using the birthday / i table to drive the query:
SELECT DISTINCT c.id
FROM c
INNER JOIN i ON (c.member_id=i.member_id)
WHERE STR_TO_DATE(
CONCAT(i.birthyear,'-', i.birthmonth,'-',i.birthday)
,"%Y-%m-%d")
BETWEEN '1982-06-05' AND '1993-06-04'
AND c.id <> 658887
AND i.birthyear BETWEEN 1982 AND 1993
Alter the i table and add a TIMESTAMP or DATETIME column named date_of_birth with an INDEX on it:
ALTER TABLE i ADD date_of_birth DATETIME NOT NULL, ADD INDEX (date_of_birth);
UPDATE i SET date_of_birth = CONCAT(i.birthyear,"-",i.birthmonth,"-",i.birthday);
And use this query which should be faster:
SELECT
c.id
FROM
i
INNER JOIN c
ON c.member_id=i.member_id
WHERE
i.date_of_birth BETWEEN '1982-06-05 00:00:00' AND '1993-06-04 23:59:59'
AND c.id NOT IN (658887)
GROUP BY
c.id
ORDER BY
NULL
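On MySQL 5.7+, a hedged alternative to the manual UPDATE above: a stored generated column stays in sync with the three source columns automatically (a sketch; verify against your server version):
-- Assumes MySQL 5.7+: the STORED generated column is computed on write,
-- so it never drifts from birthyear/birthmonth/birthday.
ALTER TABLE i
ADD date_of_birth DATE GENERATED ALWAYS AS
(STR_TO_DATE(CONCAT(birthyear,'-',birthmonth,'-',birthday), '%Y-%m-%d')) STORED,
ADD INDEX idx_date_of_birth (date_of_birth);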
You've asked me to explain what I mean. Unfortunately there are two problems with that.
The first is that I don't think that this can be adequately explained in a simple comments box.
The second is that I don't really know what I'm talking about, but I'll have a go...
Consider the following example - a simple utility table containing dates up to 2038 (when the whole UNIX_TIMESTAMP thing stops working anyway)...
CREATE TABLE calendar (
dt date NOT NULL DEFAULT '0000-00-00',
PRIMARY KEY (`dt`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Now, the following queries are logically identical...
SELECT * FROM calendar WHERE UNIX_TIMESTAMP(dt) BETWEEN 1370521405 AND 1370732400;
+------------+
| dt |
+------------+
| 2013-06-07 |
| 2013-06-08 |
| 2013-06-09 |
+------------+
SELECT * FROM calendar WHERE dt BETWEEN FROM_UNIXTIME(1370521405) AND FROM_UNIXTIME(1370732400);
+------------+
| dt |
+------------+
| 2013-06-07 |
| 2013-06-08 |
| 2013-06-09 |
+------------+
...and MySQL is clever enough to utilise the (PK) index to resolve both queries (rather than reading the table itself - yuk).
But while the first requires a full scan over the entire index (good but not great), the second is able to access the table with a key over one (or more) value ranges (terrific)...
EXPLAIN EXTENDED
SELECT * FROM calendar WHERE UNIX_TIMESTAMP(dt) BETWEEN 1370521405 AND 1370732400;
+----+-------------+----------+-------+---------------+---------+---------+------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------+-------+--------------------------+
| 1 | SIMPLE | calendar | index | NULL | PRIMARY | 3 | NULL | 10957 | Using where; Using index |
+----+-------------+----------+-------+---------------+---------+---------+------+-------+--------------------------+
EXPLAIN EXTENDED
SELECT * FROM calendar WHERE dt BETWEEN FROM_UNIXTIME(1370521405) AND FROM_UNIXTIME(1370732400);
+----+-------------+----------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | calendar | range | PRIMARY | PRIMARY | 3 | NULL | 3 | Using where; Using index |
+----+-------------+----------+-------+---------------+---------+---------+------+------+--------------------------+

How to improve performance of this SELECT query JOIN

I have a query which seems to be taking a long time to run on occasion. The slowness may be unrelated but I wanted to check what could be done to make this more efficient.
The user table has about 40k rows. The code table has about 30k rows. user_id and code are unique values.
SELECT *
FROM `user`, code
WHERE `user`.user_id = code.user_id
AND code.code = '50816ef96210415d1cad824bdb43';
I have an index setup on the code.user_id field. Anything else I can do? Should I have other indexes in place here?
Output from EXPLAIN on that query:
+----+-------------+-------+--------+---------------+---------+---------+-------------------+-------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref               | rows  | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+-------------------+-------+-------------+
|  1 | SIMPLE      | code  | ALL    | user_id       | NULL    | NULL    | NULL              | 35696 | Using where |
|  1 | SIMPLE      | user  | eq_ref | PRIMARY       | PRIMARY | 4       | mydb.code.user_id |     1 |             |
+----+-------------+-------+--------+---------------+---------+---------+-------------------+-------+-------------+
2 rows in set (10.11 sec)
You also need to add indexes on the code.code and user.user_id fields, and it should start flying.
Besides adding an index on code.code, another thing you can do is select only the columns you need (I don't like using SELECT *).
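A sketch of the suggested index; UNIQUE is an assumption based on the question saying code values are unique (a plain INDEX works just as well for the lookup):
-- code.code is reportedly unique, so a UNIQUE index both speeds the
-- lookup and enforces that property; drop UNIQUE if duplicates can occur.
ALTER TABLE code ADD UNIQUE INDEX idx_code (code);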