I am performing a very simple select over a simple table, where the column that I am filtering over has an index.
Here is the schema:
CREATE TABLE IF NOT EXISTS `tmp_inventory_items` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`transmission_id` int(11) unsigned NOT NULL,
`inventory_item_id` int(11) unsigned DEFAULT NULL,
`material_id` int(11) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `transmission_id` (`transmission_id`)
KEY `inventory_item_id` (`inventory_item_id`),
KEY `material_id` (`material_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=21 ;
Here is the SQL:
SELECT * FROM `tmp_inventory_items` WHERE `transmission_id` = 330
However, when explaining the query, I see that the index is NOT being used, why is that (the table has about 20 rows on my local machine)?
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE tmp_inventory_items... ALL transmission_id NULL NULL NULL 13 Using where
No key is being used even if I hint the mysql with USE INDEX(transmission_id)... this looks very strange to me (MySQL Version 5.5.28)
Because MySQL's algorithms tell it that preparing an index and using it would use more resources than simply performing the query without one.
When you feed query syntax to a DBMS, one of the things it does is attempts to determine the most efficient way to process the query (usually there are at least tens of ways).
If you want to, you can use FORCE INDEX(transmission_id) (documented here) which will inform MySQL that a table scan is assumed to be very expensive, but it's not recommended as to determine for 20 rows, it's just not valuable.
Related
So basically I created a table:
CREATE TABLE IF NOT EXISTS `student` (
`id` int(4) unsigned NOT NULL AUTO_INCREMENT,
`campus` enum('CAMPUS1', 'CAMPUS2') NOT NULL,
`fullname` char(32) NOT NULL,
`gender` enum('MALE', 'FEMALE') NOT NULL,
`birthday` char(16) NOT NULL,
`phone` char(32) NOT NULL,
`emergency` char(32) NOT NULL,
`address` char(128) NOT NULL,
PRIMARY KEY (`idx`),
KEY `key_student` (`campus`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
I have like 20 rows with only 12 in CAMPUS1
But when I use query it: SELECT * FROM student WHERE campus='CAMPUS1'; The EXPLAIN is this:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE student ALL key_student NULL NULL NULL 20 Using where
I am new to this thing, how does a KEY really works? I read documentation but I cant understand that much.
MySQL is trying to be smart (with varying success) when deciding which index to use for a query.
There are cases where it is faster to query the entire table instead of using the index. E.g: if your table has 500 records for CAMPUS1 and 100 records for CAMPUS2 it is faster to do a full (600 records) scan when looking for campus='CAMPUS1'.
When you have only 20 rows you run into the edge cases of the algorithm. Try adding some more rows, and see what happens.
Also, it seems this index will have a very low cardinality (an even split between only 2 values). It will probably not be very useful.
hope you will allow me to pick your brains so I can gain some knowledge in the process.
We have 3 tables - data_product, data_issuer, data_accountbalance
CREATE TABLE `data_issuer` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`issuer_name` varchar(128) NOT NULL
PRIMARY KEY (`id`)
) ENGINE=InnoDB
CREATE TABLE `data_product` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL,
`issuer_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `data_product_name_issuer_id_260fec65_uniq` (`name`,`issuer_id`),
KEY `data_product_issuer_id_d07fa696_fk_data_issuer_id` (`issuer_id`),
CONSTRAINT `data_product_issuer_id_d07fa696_fk_data_issuer_id` FOREIGN KEY
(`issuer_id`) REFERENCES `data_issuer` (`id`)
) ENGINE=InnoDB
CREATE TABLE `data_accountbalance` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`date` date NOT NULL,
`nominee_name` varchar(128) NOT NULL,
`beneficiary_name` varchar(128) NOT NULL,
`nominee_id` varchar(128) NOT NULL,
`account_id` varchar(16) NOT NULL,
`product_id` int(11) NOT NULL,
`register_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `data_accountbalance_date_product_id_nominee__7b8d2c6a_uniq` (`date`,`product_id`,`nominee_id`,`beneficiary_name`),
KEY `data_accountbalance_product_id_nominee_id_date_8ef8754f_idx` (`product_id`,`nominee_id`,`date`),
KEY `data_accountbalance_register_id_4e78ec16_fk_data_register_id` (`register_id`),
KEY `data_accountbalance_product_id_date_nominee_i_c3a41e39_idx` (`product_id`,`date`,`nominee_id`,`beneficiary_name`,`balance_amount`),
CONSTRAINT `data_accountbalance_product_id_acfb18f6_fk_data_product_id` FOREIGN KEY (`product_id`) REFERENCES `data_product` (`id`),
CONSTRAINT `data_accountbalance_register_id_4e78ec16_fk_data_register_id` FOREIGN KEY (`register_id`) REFERENCES `data_register` (`id`)
) ENGINE=InnoDB
When running the query below, the system takes about an hour to respond -
SELECT SQL_NO_CACHE *
from data_product
INNER JOIN `data_issuer` ON (`data_issuer`.`id` = `data_product`.`issuer_id`)
INNER JOIN `data_accountbalance` ON (`data_accountbalance`.`product_id` = `data_product`.`id`)
LIMIT 100000000;
Both data_issuer and data_product only have few 100 records in them, but the data_accountbalance is huge with about 15,384,358 records.
The explain plan produced is below -
# id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE data_product ALL PRIMARY,data_product_issuer_id_d07fa696_fk_data_issuer_id 459 100
1 SIMPLE data_issuer eq_ref PRIMARY PRIMARY 4 pnl.data_product.issuer_id 1 100
1 SIMPLE data_accountbalance ref data_accountbalance_product_id_nominee_id_date_8ef8754f_idx,data_accountbalance_product_id_date_nominee_i_c3a41e39_idx data_accountbalance_product_id_date_nominee_i_c3a41e39_idx 4 pnl.data_product.id 493 100
Can someone help tune the query so it does not take an hour to run please? Appreciate any pointers you might have for me.
If your query is literally what you are showing there... Then thats the problem. It has no WHERE clause.
That query would literally return 15,384,358 results. As the two smaller tables are typical domain tables with NOT NULL relations all the way across, it will return 1 to 1 results for every row in data_accountbalance.
The actual time cost will probably be in creating a Massive temp table (tho I'm not sure about that). Just to download the entire database, all 3 tables, you could look into optimize your temp table MySQL config to possibly speed this up, OR preferably make it so that when you start executing the query that you can read the results as MySQL gets them ready (avoids a temp table). Alternatively, maybe your script that runs this query is trying to read the whole data set into memory, which takes a long time?
Is there a particular reason to download All the data? Usually you just download the data you are meaning to operate on. Or have MySQL do the grouping, summing, etc then return the answer you wanted based on All the data.
How many rows did you expect the query to return? If you are thinking something less than 15 million, then the answer is to add some kind of WHERE statement, or an aggregate function. Depending on what table and column in you use to reduce the result set, those columns will have to be indexed.
I hope this helps. :)
I have a query.
SELECT id_id FROM videos_member ORDER BY date_id DESC LIMIT 0,30
Here is the table
CREATE TABLE IF NOT EXISTS `videos` (
`id_id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`date_id` int(11) NOT NULL,
PRIMARY KEY (`id_id`),
KEY `date_id` (`date_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=3 ;
I keep getting this
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE videos ALL NULL NULL NULL NULL 342 Using filesort
Why isn't is using the index?
The table contains (or at least MySQL thinks it contains) 342 rows. This is tiny and likely fits into a single block of physical storage, which means it can be read in a single read operation. Using the index would require at least two read operations. So MySQL might be smart here and realize that reading the whole table at once is just more efficient than reading the index and then using it to access the table.
In other words if you insert more rows into the table the plan might change to using index.
I have the following mysql query
select points_for_user from items where user_id = '38415';
explain on the query returns this
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE items index NULL points_for_user_index 2 NULL 1000511 Using index
The problem is, shouldn't the number of rows be FAR less then the number of rows in the table because of the index?
user_id is the primary index, so I tried creating an index on just points_for_user and that still look through every row. An index on user_id AND points_for_user still searches every row.
What am I missing?
Thanks!
CREATE TABLE IF NOT EXISTS `items` (
`capture_id` int(11) NOT NULL AUTO_INCREMENT,
`id` int(11) NOT NULL,
`creator_user_id` bigint(20) NOT NULL DEFAULT '0',
`user_id` int(11) NOT NULL,
`accuracy` int(11) NOT NULL,
`captured_at` timestamp NOT NULL DEFAULT '2011-01-01 06:00:00',
`ip` varchar(30) NOT NULL,
`capture_type_id` smallint(6) NOT NULL DEFAULT '0',
`points` smallint(6) NOT NULL DEFAULT '5',
`points_for_user` smallint(6) NOT NULL DEFAULT '3',
PRIMARY KEY (`capture_id`),
KEY `capture_user` (`capture_id`,`id`,`user_id`),
KEY `user_id` (`user_id`,`id`),
KEY `id` (`id`),
KEY `capture_creator_index` (`capture_id`,`creator_user_id`),
KEY `points_capture_index` (`points_for_user`,`creator_user_id`),
KEY `points_for_user_index` (`points_for_user`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1008992 ;
select count(*) from items where user_id = '38415'
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE captures ref user_munzee_id user_munzee_id 4 const 81 Using index
the mysql optimizer try to use the best possible index during the query.
In your first query the optimizer is considering points_for_user_index the best choice, in fact the Extra column show the "Using index" status, this means a "Covering index".
The "Covering index" occurs when all fields required for a query (in your case select points_for_user from ... ) are contained in an index, this avoid the access to the full mysql data (.MYD) in favour of the direct index access (.MYI)
First of all you can try to rebuild the index tree analyzing table
ANALYZE TABLE itemes;
Note for very large tables:
ANALYZE TABLE analyzes and stores the key distribution for a table.
During the analysis, the table is locked with a read lock for InnoDB
and MyISAM. This statement works with InnoDB, NDB, and MyISAM tables.
For MyISAM tables, this statement is equivalent to using myisamchk
--analyze.
If "the problem" persist and you want to bypass the optimizer choice you can explicit try to force the usage of an index
EXPLAIN SELECT points_for_user FROM items USE INDEX ( user_id ) WHERE user_id = '38415'
More details: http://dev.mysql.com/doc/refman/5.5/en/index-hints.html
Cristian
This query:
explain
SELECT `Lineitem`.`id`, `Donation`.`id`, `Donation`.`order_line_id`
FROM `order_line` AS `Lineitem`
LEFT JOIN `donations` AS `Donation`
ON (`Donation`.`order_line_id` = `Lineitem`.`id`)
WHERE `Lineitem`.`session_id` = '1'
correctly uses the Donation.order_line_id and Lineitem.id indexes, shown in this EXPLAIN output:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE Lineitem ref session_id session_id 97 const 1 Using where; Using index
1 SIMPLE Donation ref order_line_id order_line_id 4 Lineitem.id 2 Using index
However, this query, which simply includes another field:
explain
SELECT `Lineitem`.`id`, `Donation`.`id`, `Donation`.`npo_id`,
`Donation`.`order_line_id`
FROM `order_line` AS `Lineitem`
LEFT JOIN `donations` AS `Donation`
ON (`Donation`.`order_line_id` = `Lineitem`.`id`)
WHERE `Lineitem`.`session_id` = '1'
Shows that the Donation table does not use an index:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE Lineitem ref session_id session_id 97 const 1 Using where; Using index
1 SIMPLE Donation ALL order_line_id NULL NULL NULL 3
All of the _id fields in the tables are indexed, but I can't figure out how adding this field into the list of selected fields causes the index to be dropped.
As requested by James C, here are the table definitions:
CREATE TABLE `donations` (
`id` int(10) unsigned NOT NULL auto_increment,
`npo_id` int(10) unsigned NOT NULL,
`order_line_detail_id` int(10) unsigned NOT NULL default '0',
`order_line_id` int(10) unsigned NOT NULL default '0',
`created` datetime default NULL,
`modified` datetime default NULL,
PRIMARY KEY (`id`),
KEY `npo_id` (`npo_id`),
KEY `order_line_id` (`order_line_id`),
KEY `order_line_detail_id` (`order_line_detail_id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8
CREATE TABLE `order_line` (
`id` bigint(20) unsigned NOT NULL auto_increment,
`order_id` bigint(20) NOT NULL,
`npo_id` bigint(20) NOT NULL default '0',
`session_id` varchar(32) collate utf8_unicode_ci default NULL,
`created` datetime default NULL,
PRIMARY KEY (`id`),
KEY `order_id` (`order_id`),
KEY `npo_id` (`npo_id`),
KEY `session_id` (`session_id`)
) ENGINE=InnoDB AUTO_INCREMENT=23 DEFAULT CHARSET=utf8
I also did some reading about cardinality, and it looks like both the Donations.npo_id and Donations.order_line_id have a cardinality of 2. Hopefully this suggests something useful?
I'm thinking that a USE INDEX might solve the problem, but I'm using an ORM that makes this a bit tricky, and I don't understand why it wouldn't grab the correct index when the JOIN specifically names indexed fields?!?
Thanks for your brainpower!
The first explain has "uses index" at the end. This means that it was able to find the rows and return the result for the query by just looking at the index and not having to fetch/analyse any row data.
In the second query you add a row that's likely not indexed. This means that MySQL has to look at the data of the table. I'm not sure why the optimiser chose to do a table scan but I think it's likely that if the table is fairly small it's easier for it to just read everything than trying to pick out details for individual rows.
edit: I think adding the following indexes will improve things even more and let all of the join use indexes only:
ALTER TABLE order_line ADD INDEX(session_id, id);
ALTER TABLE donations ADD INDEX(order_line_id, npo_id, id)
This will allow order_line to to find the rows using session_id and then return id and also allow donations to join onto order_line_id and then return the other two columns.
Looking at the auto_increment values can I assume that there's not much data in there. It's worth noting that the amount of data in the tables will have an effect on the query plan and it's good practice to put some sample data in there to test things out. For more detail have a look in this blog post I made some time back: http://webmonkeyuk.wordpress.com/2010/09/27/what-makes-a-good-mysql-index-part-2-cardinality/