Bad mysql index selection - mysql

I have a large table (250M rows) with a column group_id that broadly divides the table into groups. It has the following index:
mysql> show indexes from table\G;
*************************** 13. row ***************************
Table: table
Non_unique: 1
Key_name: myindex
Seq_in_index: 1
Column_name: group_id
Collation: A
Cardinality: 181819
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
*************************** 14. row ***************************
Table: table
Non_unique: 1
Key_name: myindex
Seq_in_index: 2
Column_name: id
Collation: A
Cardinality: 213456239
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
I want to execute the following query:
mysql> explain select * from `table` WHERE (`table`.`type_id` IN (11, 17, 12, 19) AND `table`.`group_id` = 310248) ORDER BY `table`.`id` ASC LIMIT 201\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: table
type: index
possible_keys: [SOME INDEX NAMES]
key: PRIMARY
key_len: 4
ref: NULL
rows: 257386914
Extra: Using where
1 row in set (0.00 sec)
I understand that it will need to scan some rows because of the problems with indexing for WHERE ... IN (). To my surprise, however, it chooses to scan almost as many rows as possible by using the primary key index.
The following seems unambiguously (and obviously) superior:
mysql> explain select * from `table` USE INDEX (myindex) WHERE (`table`.`type_id` IN (11, 17, 12, 19) AND `table`.`group_id` = 310248) ORDER BY `table`.`id` ASC LIMIT 201\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: table
type: ref
possible_keys: myindex
key: myindex
key_len: 5
ref: const
rows: 1883760
Extra: Using where
1 row in set (0.00 sec)
Using a larger value for LIMIT (2000), using different values of group_id, removing the ORDER BY and removing the type_id filter all cause it to use the index. I have run ANALYZE TABLE.
It's worth noting that the row estimates are quite high:
mysql> select count(*) from table where group_id=310248 and type_id in (11, 17, 12, 19) ;
+----------+
| count(*) |
+----------+
| 583868 |
+----------+
1 row in set (0.61 sec)
mysql version:
Ver 5.1.57-rel12.8-log for debian-linux-gnu on x86_64 ((Percona Server (GPL), 12.8, Revision 233))
Why would mysql choose a plan that it thinks will involve scanning 257386914 rows rather than 1883760? I understand that it might value sequential reads, but why would it choose the index for 2000 rows, but not for 200 rows? Why would filtering by a different group_id change the plan?
Edited: I have also tried creating the index (group_id, id, type_id) so that all sorting can be done using only an index scan, but I can't get it to ever select that index.

Did you have a question?
Note that because the predicate on the type_id column has to be checked, and because your query is returning at least one column that is not in the index, MySQL will have to visit the data pages of the table in order to access the values for those columns.
So, MySQL may be favoring the clustered (primary) key, since that's where the data pages are; scanning the clustered key in id order also allows MySQL to avoid a sort operation ("Using filesort"). (We do note that the execution plan that uses your index also avoids a sort operation.)
If you want MySQL to favor your index, you might consider including type_id as a third column in that index, if that is at all selective.
Alternatively, you might consider modifying your query to "ORDER BY group_id, id" to influence the optimizer.
Have you measured the performance of the query, both with the hint and without the hint?
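The two suggestions above could be tried roughly as follows. This is only a sketch: the table, index and literal values are copied from the question, the composite ORDER BY is merely a nudge that the optimizer may or may not follow on a given version, and SQL_NO_CACHE is used here on the assumption that the query cache would otherwise distort the timings.

```sql
-- Variant 1: order by the full prefix of myindex (group_id, id).
-- group_id is fixed by the WHERE clause, so the rows come back in the
-- same order as with ORDER BY id alone.
EXPLAIN SELECT * FROM `table`
WHERE `table`.`type_id` IN (11, 17, 12, 19)
  AND `table`.`group_id` = 310248
ORDER BY `table`.`group_id` ASC, `table`.`id` ASC
LIMIT 201\G

-- Variant 2: time the hinted and unhinted queries on the real data.
SELECT SQL_NO_CACHE * FROM `table` USE INDEX (myindex)
WHERE `table`.`type_id` IN (11, 17, 12, 19)
  AND `table`.`group_id` = 310248
ORDER BY `table`.`id` ASC
LIMIT 201;

SELECT SQL_NO_CACHE * FROM `table`
WHERE `table`.`type_id` IN (11, 17, 12, 19)
  AND `table`.`group_id` = 310248
ORDER BY `table`.`id` ASC
LIMIT 201;
```

Comparing wall-clock times for the last two statements answers the measurement question directly; the EXPLAIN row estimates alone do not.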

Related

MySQL query stuck at "Sorting Result" for a single row result set

I am building a star schema to act as the backend for an analytics app I am building. My query generator is building queries using a regular star-join pattern. A sample query is below, whereby a fact table is joined to two dimension tables and the dimension tables are filtered by constant values chosen by the end user.
I am using MySQL 5.5 and all tables are MyISAM.
In this problem, I am simply trying to pull the first N rows (in this case, the first 1 row)
EXPLAIN
SELECT fact_table.*
FROM
fact_table
INNER JOIN
dim1 ON (fact_table.dim1_key = dim1.pkey)
INNER JOIN
dim2 ON (fact_table.dim2_key = dim2.pkey)
WHERE
dim1.constant_value = 123
AND dim2.constant_value = 456
ORDER BY
measure1 ASC LIMIT 1
The explain output follows. Both the dimension keys resolve to constant values since there is a unique key applied to their value.
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: dim1
type: const
possible_keys: PRIMARY,dim1_uk
key: dim1_uk
key_len: 8
ref: const
rows: 1
Extra: Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: dim2
type: const
possible_keys: PRIMARY,dim2_uk
key: dim2_uk
key_len: 8
ref: const
rows: 1
Extra:
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: fact_table
type: ref
possible_keys: my_idx
key: my_idx
key_len: 16
ref: const,const
rows: 50010
Extra: Using where
And here is the index on the fact table:
show indexes from fact_table
*************************** 10. row ***************************
Table: fact_table
Non_unique: 1
Key_name: my_idx
Seq_in_index: 1
Column_name: dim1_key
Collation: A
Cardinality: 24
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
*************************** 11. row ***************************
Table: fact_table
Non_unique: 1
Key_name: my_idx
Seq_in_index: 2
Column_name: dim2_key
Collation: A
Cardinality: 70
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
*************************** 12. row ***************************
Table: fact_table
Non_unique: 1
Key_name: my_idx
Seq_in_index: 3
Column_name: measure1
Collation: A
Cardinality: 5643
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
When profiling this query, I see the query spends the majority of its time performing a filesort operation "sorting result". My question is, even when using the correct index, why can't this query simply pull out the first value without doing a sort? The my_idx is already sorted on the right column and the two columns appearing first in the index resolve as constants, as shown in the plan.
If I rewrite the query, as follows, I am able to get the plan I want, with no file sorting.
SELECT fact_table.*
FROM
fact_table
WHERE
dim1_key = (select pkey from dim1 where constant_value = 123)
AND dim2_key = (select pkey from dim2 where constant_value = 456)
ORDER BY
measure1 ASC LIMIT 1
It would be expensive to change the tool generating these SQL commands so I would like to avoid this filesort even when the query is written in the original format.
My question is, why is MySQL opting to do a filesort even when the first keys on the index are constants (via an INNER JOIN) and the index is sorted in the right order? Is there a way around this?
Because the order of the result set depends on the index used for reading the first table in the JOIN, but, as you can see in the EXPLAIN output, the JOIN actually starts from the dim1 table.
It might seem strange, but to implicitly force MySQL to start from fact_table you would need to change the indexes on the dimension tables to (pkey, constant_value) instead of (constant_value); otherwise the MySQL optimizer will start with the table for which the condition constant_value = some_value returns the fewest rows. The problem is that you might need those indexes for other queries.
Instead, you may try adding the STRAIGHT_JOIN option to the SELECT to explicitly force the join order.
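Applied to the query from the question, the STRAIGHT_JOIN suggestion might look like the sketch below. With SELECT STRAIGHT_JOIN, MySQL joins the tables in the order they are listed in the FROM clause, so fact_table is listed first. Whether the filesort actually disappears is not guaranteed and should be confirmed with EXPLAIN on the real schema.

```sql
-- Join order is forced to fact_table -> dim1 -> dim2.
EXPLAIN
SELECT STRAIGHT_JOIN fact_table.*
FROM
    fact_table
    INNER JOIN dim1 ON (fact_table.dim1_key = dim1.pkey)
    INNER JOIN dim2 ON (fact_table.dim2_key = dim2.pkey)
WHERE
    dim1.constant_value = 123
    AND dim2.constant_value = 456
ORDER BY
    measure1 ASC
LIMIT 1;
```

If the query generator can only append a keyword after SELECT, this is a smaller change than rewriting the joins as subqueries.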

Best way to index table for speeding up order by

I have the following table structure.
town:
id (MEDIUMINT, PRIMARY KEY, auto_increment),
town (VARCHAR(150), NOT NULL),
lat (FLOAT(10,6), NOT NULL),
lng (FLOAT(10,6), NOT NULL)
I frequently use the query "SELECT * FROM town ORDER BY town". I tried indexing town, but the index is not being used. What is the best way to index the table so that I can speed up my queries?
Using EXPLAIN (a UNIQUE index is present on town):
mysql> EXPLAIN SELECT * FROM studpoint_town order by town \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: studpoint_town
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 3
Extra: Using filesort
1 row in set (0.00 sec)
Regards,
Ravi.
Your EXPLAIN output indicates that currently the studpoint_town table has only 3 rows. As explained in the manual:
The output from EXPLAIN shows ALL in the type column when MySQL uses a table scan to resolve a query. This usually happens under the following conditions:
[...]
The table is so small that it is faster to perform a table scan than to bother with a key lookup. This is common for tables with fewer than 10 rows and a short row length. Don't worry in this case.

MySQL Indexes on InnoDB table - not working right?

I am reading the book High Performance MySQL and messing around with a new database, testing some things.
I am not sure if I am doing something wrong though..
I have a table called table_users
Structure:
ID(Integer)
FullName(Char)
UserName(Char)
Password(Char)
SecurityID(TinyINT)
LocationID(TinyINT)
Active(TinyINT)
My indexes are as follows:
PRIMARY : ID
FullName : UNIQUE : FullName
FK_table_users_LocationID (foreign key reference) : INDEX : LocationID
FK_table_users_SecurityID (foreign key reference) : INDEX : SecurityID
Active : INDEX : Active
All are BTREE
While reading the book, I am trying to use the following mysql statement to view the extras involved with a SELECT statement
EXPLAIN
SELECT * FROM table_users WHERE
FullName = 'Jeff';
No matter what the WHERE clause refers to in this call, the Extra column is either empty or shows Using where. If I SELECT ID ... WHERE FullName = 'Jeff' it returns Using where; Using index, but not when I do SELECT FullName ... WHERE FullName = 'Jeff'.
I am not familiar with indexes at all and am trying to wrap my head around them, but I'm a bit confused by this. Shouldn't the query report Using index whenever I reference an indexed column?
Thanks.
Using index doesn't mean what it seems to mean. Have a look at covering indexes. If it says "Using index", it means that MySQL could return the data for your query from the index alone, without reading the actual rows. SELECT * is only going to be able to use a covering index if every column of the table is in the index. Usually this is not the case.
I seem to remember a chapter in High Performance MySQL that talks about covering indexes and how to read EXPLAIN results.
What version of MySQL are you using? Here's a test I ran on Percona Server 5.5.16:
mysql> create table table_users (
id int auto_increment primary key,
fullname char(20),
username char(20),
unique key (fullname)
);
Query OK, 0 rows affected (0.03 sec)
mysql> insert into table_users values (default, 'billk', 'billk');
Query OK, 1 row affected (0.00 sec)
mysql> explain select * from table_users where fullname='billk'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: table_users
type: const
possible_keys: fullname
key: fullname
key_len: 21
ref: const
rows: 1
Extra:
1 row in set (0.00 sec)
This shows that it's using the fullname index, looking up by a constant value, but it's not an index-only query.
mysql> explain select fullname from table_users where fullname='billk'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: table_users
type: const
possible_keys: fullname
key: fullname
key_len: 21
ref: const
rows: 1
Extra: Using index
1 row in set (0.00 sec)
This is as expected, it's able to get the fullname column from the fullname index, so this is an index-only query.
mysql> explain select id from table_users where fullname='billk'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: table_users
type: const
possible_keys: fullname
key: fullname
key_len: 21
ref: const
rows: 1
Extra: Using index
1 row in set (0.00 sec)
Searching on fullname but fetching the primary key is also an index-only query, because the leaf nodes of InnoDB secondary indexes (e.g. the unique key) implicitly contain the primary key value. So this query is able to traverse the BTREE for fullname, and as a bonus it gets the id too.
mysql> explain select fullname, username from table_users where fullname='billk'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: table_users
type: const
possible_keys: fullname
key: fullname
key_len: 21
ref: const
rows: 1
Extra:
1 row in set (0.00 sec)
As soon as the select-list includes any column that's not part of the index, it can no longer be an index-only query. First it searches the BTREE for fullname, to find the primary key value. Then it uses that id value to traverse the BTREE for the clustered index, which is how InnoDB stores the whole table. There it finds the other columns for the given row, including username.
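To make the last query index-only as well, the extra column has to be part of the index that is used. The sketch below assumes a hypothetical index name, fullname_username, and adds a FORCE INDEX hint because the optimizer might otherwise still prefer the unique key on fullname. If the hinted index is used, the Extra column is expected to include Using index, but this has not been verified on the server version from the test above.

```sql
-- Hypothetical covering index: both selected columns live in the index.
ALTER TABLE table_users
    ADD INDEX fullname_username (fullname, username);

-- Ask the optimizer to read from the covering index.
EXPLAIN SELECT fullname, username
FROM table_users FORCE INDEX (fullname_username)
WHERE fullname = 'billk'\G
```

The trade-off is a wider index that must be maintained on every write, so this is usually only worthwhile for frequently run queries.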

Why isn't MySQL using any of these possible keys?

I have the following query:
SELECT t.id
FROM account_transaction t
JOIN transaction_code tc ON t.transaction_code_id = tc.id
JOIN account a ON t.account_number = a.account_number
GROUP BY tc.id
When I do an EXPLAIN the first row shows, among other things, this:
table: t
type: ALL
possible_keys: account_id,transaction_code_id,account_transaction_transaction_code_id,account_transaction_account_number
key: NULL
rows: 465663
Why is key NULL?
Another issue you may be encountering is a data type mismatch. For example, if your column is a string data type (CHAR, for example) and your query compares it to an unquoted number, then MySQL won't use the index.
SELECT * FROM tbl WHERE col = 12345; # No index
SELECT * FROM tbl WHERE col = '12345'; # Index
Source: Just fought this same issue today, and learned the hard way on MySQL 5.1. :)
Edit: Additional information to verify this:
mysql> desc das_table \G
*************************** 1. row ***************************
Field: das_column
Type: varchar(32)
Null: NO
Key: PRI
Default:
Extra:
*************************** 2. row ***************************
[SNIP!]
mysql> explain select * from das_table where das_column = 189017 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: das_column
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 874282
Extra: Using where
1 row in set (0.00 sec)
mysql> explain select * from das_table where das_column = '189017' \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: das_column
type: const
possible_keys: PRIMARY
key: PRIMARY
key_len: 34
ref: const
rows: 1
Extra:
1 row in set (0.00 sec)
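When the literal cannot easily be quoted (for instance, when it comes from a numeric variable), converting it to a string explicitly should have the same effect as quoting it. This is a sketch that reuses the das_table example above; the expectation that the plan matches the quoted-literal case is an assumption to confirm with EXPLAIN.

```sql
-- The comparison is now string-to-string, so the PRIMARY key on the
-- varchar column can be considered for the lookup.
EXPLAIN SELECT * FROM das_table
WHERE das_column = CAST(189017 AS CHAR)\G
```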
It might be because the index statistics are stale or inaccurate, or because the optimizer knows that you always have a 1:1 ratio between the two tables.
You can force an index to be used in the query and see whether that speeds things up. If it does, try running ANALYZE TABLE to make sure the statistics are up to date.
By specifying USE INDEX (index_list), you can tell MySQL to use only one of the named indexes to find rows in the table. The alternative syntax IGNORE INDEX (index_list) can be used to tell MySQL to not use some particular index or indexes. These hints are useful if EXPLAIN shows that MySQL is using the wrong index from the list of possible indexes.
You can also use FORCE INDEX, which acts like USE INDEX (index_list) but with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the given indexes to find rows in the table.
Each hint requires the names of indexes, not the names of columns. The name of a PRIMARY KEY is PRIMARY. To see the index names for a table, use SHOW INDEX.
From http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
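Applied to the query from the question, a hint might look like the sketch below. The index name transaction_code_id is taken from the possible_keys list in the EXPLAIN output; that it is the most useful index for this join is an assumption to check by comparing the plans and timings.

```sql
-- The hint goes directly after the table name (and alias) it applies to.
EXPLAIN
SELECT t.id
FROM account_transaction t FORCE INDEX (transaction_code_id)
JOIN transaction_code tc ON t.transaction_code_id = tc.id
JOIN account a ON t.account_number = a.account_number
GROUP BY tc.id;

-- Refresh the index statistics afterwards if the hinted plan is faster.
ANALYZE TABLE account_transaction;
```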
Index for the group by (=implicit order by)
...
GROUP BY tc.id
The GROUP BY does an implicit sort on tc.id.
tc.id is not listed as a possible key for table t,
but t.transaction_code_id is.
Change the code to
SELECT t.id
FROM account_transaction t
JOIN transaction_code tc ON t.transaction_code_id = tc.id
JOIN account a ON t.account_number = a.account_number
GROUP BY t.transaction_code_id
This will put the potential index transaction_code_id into view.
Indexes for the joins
If the joins (nearly) fully join the three tables, there's no need to use the index, so MySQL doesn't.
Other reasons for not using an index
If a large percentage of the rows under consideration (40%, IIRC) contain the same value, MySQL does not use the index, because not using the index is faster.

The best way to delete 5K rows from Innodb table with 30M rows

table:
foreign_id_1
foreign_id_2
integer
date1
date2
primary(foreign_id_1, foreign_id_2)
Query: delete from table where (foreign_id_1 = ? or foreign_id_2 = ?) and date2 < ?
Without the date condition the query takes about 40 seconds, which is too slow. With the date condition it takes much longer.
The options are:
create another table and insert select, then rename
use limit and run query multiple times
split query to run for foreign_id_1 then foreign_id_2
use select then delete by single row
Is there any faster way?
mysql> explain select * from compatibility where user_id = 193 or person_id = 193 \G
id: 1
select_type: SIMPLE
table: compatibility
type: index_merge
possible_keys: PRIMARY,compatibility_person_id_user_id
key: PRIMARY,compatibility_person_id_user_id
key_len: 4,4
ref: NULL
rows: 2
Extra: Using union(PRIMARY,compatibility_person_id_user_id); Using where
1 row in set (0.00 sec)
mysql> explain select * from compatibility where (user_id = 193 or person_id = 193) and updated_at < '2010-12-02 22:55:33' \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: compatibility
type: index_merge
possible_keys: PRIMARY,compatibility_person_id_user_id
key: PRIMARY,compatibility_person_id_user_id
key_len: 4,4
ref: NULL
rows: 2
Extra: Using union(PRIMARY,compatibility_person_id_user_id); Using where
1 row in set (0.00 sec)
Having an OR in your WHERE clause makes MySQL reluctant to use the indexes on your user_id and/or person_id fields, if it doesn't refuse outright (assuming such indexes exist; showing the CREATE TABLE would confirm that).
If you can add indexes (or modify existing ones, since I'm thinking of compound indexes), I'd likely add two:
ALTER TABLE compatibility
ADD INDEX user_id_updated_at (user_id, updated_at),
ADD INDEX person_id_updated_at (person_id, updated_at);
Correspondingly, assuming the rows don't have to be deleted atomically (i.e. at the same instant), split the DELETE into two statements:
DELETE FROM compatibility WHERE user_id = 193 AND updated_at < '2010-12-02 22:55:33';
DELETE FROM compatibility WHERE person_id = 193 AND updated_at < '2010-12-02 22:55:33';
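If the two deletes do need to take effect together, they could be wrapped in a single InnoDB transaction. Separately, the "use limit and run query multiple times" option from the question can be sketched with DELETE ... LIMIT. The batch size of 1000 is an arbitrary assumption, and the loop that repeats the statement has to live in the calling application.

```sql
-- Atomic variant: both deletes become visible at COMMIT.
START TRANSACTION;
DELETE FROM compatibility WHERE user_id = 193 AND updated_at < '2010-12-02 22:55:33';
DELETE FROM compatibility WHERE person_id = 193 AND updated_at < '2010-12-02 22:55:33';
COMMIT;

-- Batched variant: delete at most 1000 rows per statement.
DELETE FROM compatibility
WHERE user_id = 193 AND updated_at < '2010-12-02 22:55:33'
LIMIT 1000;

-- Rows removed by the previous statement; repeat the DELETE until this is 0.
SELECT ROW_COUNT();
```

Smaller batches hold locks for a shorter time, at the cost of more round trips.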
By now the table has grown to 40M rows (+33%) and is still growing rapidly, so I've started looking at other solutions, possibly a NoSQL one.
Thanks.