MySQL gurus: Why 2 queries give different 'explain' index use results? - mysql

This query:
explain
SELECT `Lineitem`.`id`, `Donation`.`id`, `Donation`.`order_line_id`
FROM `order_line` AS `Lineitem`
LEFT JOIN `donations` AS `Donation`
ON (`Donation`.`order_line_id` = `Lineitem`.`id`)
WHERE `Lineitem`.`session_id` = '1'
correctly uses the Donation.order_line_id and Lineitem.id indexes, shown in this EXPLAIN output:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE Lineitem ref session_id session_id 97 const 1 Using where; Using index
1 SIMPLE Donation ref order_line_id order_line_id 4 Lineitem.id 2 Using index
However, this query, which simply includes another field:
explain
SELECT `Lineitem`.`id`, `Donation`.`id`, `Donation`.`npo_id`,
`Donation`.`order_line_id`
FROM `order_line` AS `Lineitem`
LEFT JOIN `donations` AS `Donation`
ON (`Donation`.`order_line_id` = `Lineitem`.`id`)
WHERE `Lineitem`.`session_id` = '1'
Shows that the Donation table does not use an index:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE Lineitem ref session_id session_id 97 const 1 Using where; Using index
1 SIMPLE Donation ALL order_line_id NULL NULL NULL 3
All of the _id fields in the tables are indexed, but I can't figure out how adding this field into the list of selected fields causes the index to be dropped.
As requested by James C, here are the table definitions:
CREATE TABLE `donations` (
`id` int(10) unsigned NOT NULL auto_increment,
`npo_id` int(10) unsigned NOT NULL,
`order_line_detail_id` int(10) unsigned NOT NULL default '0',
`order_line_id` int(10) unsigned NOT NULL default '0',
`created` datetime default NULL,
`modified` datetime default NULL,
PRIMARY KEY (`id`),
KEY `npo_id` (`npo_id`),
KEY `order_line_id` (`order_line_id`),
KEY `order_line_detail_id` (`order_line_detail_id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8
CREATE TABLE `order_line` (
`id` bigint(20) unsigned NOT NULL auto_increment,
`order_id` bigint(20) NOT NULL,
`npo_id` bigint(20) NOT NULL default '0',
`session_id` varchar(32) collate utf8_unicode_ci default NULL,
`created` datetime default NULL,
PRIMARY KEY (`id`),
KEY `order_id` (`order_id`),
KEY `npo_id` (`npo_id`),
KEY `session_id` (`session_id`)
) ENGINE=InnoDB AUTO_INCREMENT=23 DEFAULT CHARSET=utf8
I also did some reading about cardinality, and it looks like both the Donations.npo_id and Donations.order_line_id have a cardinality of 2. Hopefully this suggests something useful?
I'm thinking that a USE INDEX might solve the problem, but I'm using an ORM that makes this a bit tricky, and I don't understand why it wouldn't grab the correct index when the JOIN specifically names indexed fields?!?
Thanks for your brainpower!

The first explain has "uses index" at the end. This means that it was able to find the rows and return the result for the query by just looking at the index and not having to fetch/analyse any row data.
In the second query you add a row that's likely not indexed. This means that MySQL has to look at the data of the table. I'm not sure why the optimiser chose to do a table scan but I think it's likely that if the table is fairly small it's easier for it to just read everything than trying to pick out details for individual rows.
edit: I think adding the following indexes will improve things even more and let all of the join use indexes only:
ALTER TABLE order_line ADD INDEX(session_id, id);
ALTER TABLE donations ADD INDEX(order_line_id, npo_id, id)
This will allow order_line to to find the rows using session_id and then return id and also allow donations to join onto order_line_id and then return the other two columns.
Looking at the auto_increment values can I assume that there's not much data in there. It's worth noting that the amount of data in the tables will have an effect on the query plan and it's good practice to put some sample data in there to test things out. For more detail have a look in this blog post I made some time back: http://webmonkeyuk.wordpress.com/2010/09/27/what-makes-a-good-mysql-index-part-2-cardinality/

Related

Mysql GROUP BY really slow on a view

I have these tables.
CREATE TABLE `movements` (
`movementId` mediumint(8) UNSIGNED NOT NULL,
`movementType` tinyint(3) UNSIGNED NOT NULL,
`deleted` tinyint(1) UNSIGNED NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `movements`
ADD PRIMARY KEY (`movementId`),
ADD KEY `movementType` (`movementType`) USING BTREE,
ADD KEY `deleted` (`deleted`),
ADD KEY `movementId` (`movementId`,`deleted`);
CREATE TABLE `movements_items` (
`movementId` mediumint(8) UNSIGNED NOT NULL,
`itemId` mediumint(8) UNSIGNED NOT NULL,
`qty` decimal(10,3) UNSIGNED NOT NULL,
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `movements_items`
ADD KEY `movementId` (`movementId`),
ADD KEY `itemId` (`itemId`),
ADD KEY `movementId_2` (`movementId`,`itemId`);
and this view called "movements_items_view".
SELECT
movements_items.itemId, movements_items.qty,
movements.movementId, movements.movementType
FROM movements_items
JOIN movements ON (movements.movementId=movements_items.movementId
AND movements.deleted=0)
The first table has 5913 rows, the second one has 144992.
The view is very fast, it loads 20 result in PhpMyAdmin in 0.0011s but as soon as I ask for a GROUP BY on it (I need it to do statistics with SUM()) es:
SELECT * FROM movements_items_view GROUP BY itemId LIMIT 0,20
time jumps to 0.2s or more and it causes "Using where; Using temporary; Using filesort" on movements join.
Any help appreciated, thanks.
EDIT:
I also run via phpMyAdmin this query to try to not use the view:
SELECT movements.movementId, movements.movementType, movements_items.qty
FROM movements_items
JOIN movements ON movements.movementId=movements_items.movementId
GROUP BY itemId LIMIT 0,20
And the performance is the same.
Edit. Here is the EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE movements index PRIMARY,movementid movement_type 1 NULL 5913 Using index; Using temporary; Using filesort
1 SIMPLE movements_items ref movementId,itemId,movementId_2 movementId_2 3 movements.movementId 12 Using index
Turn it inside out. See if this works:
SELECT movementId, m.movementType, mi.qty
FROM
( SELECT movementId, qty
FROM movements_items
GROUP BY itemId
ORDER BY itemId
LIMIT 20
) AS mi
JOIN movements AS m USING(movementId)
The trick is to do the LIMIT sooner. The original way had all the data being hauled around, not just 20 rows.
In movements_items, is no column or combination of columns 'unique'? If so, make it the PRIMARY KEY.
In movement, KEY movementId (movementId,deleted) is redundant and should be dropped.
In movement_items, KEY movementId (movementId) is redundant and should be dropped.

MySQL ORDER BY takes a very long time even if I have indexes

I have the following MySQL query which takes about 40 seconds on a linux VM:
SELECT
* FROM `clients_event_log`
WHERE
`ex_long` = 1475461 AND
`type` in (2, 1) AND NOT
(
(category=1 AND error=-2147212542) OR
(category=7 AND error=67)
)
ORDER BY `ev_time` DESC LIMIT 100
The table has around 7 million rows, aprox. 800 MB in size and it has indexes on all the fields used in the WHERE and ORDER BY clauses.
Now if I change the query in such a way that the ordering is done in an outer SELECT, everything works much faster (around 100ms):
SELECT res.* FROM
(
SELECT * FROM `clients_event_log`
WHERE
`ex_long` = 1475461 AND
`type` in (2, 1) AND NOT
(
(category=1 AND error=-2147212542) OR
(category=7 AND error=67)
)
) AS res
ORDER BY res.ev_time DESC LIMIT 0, 100
Do you have any idea why the first query takes such a long time? Thank you.
Later Update:
1st Query EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE clients_event_log index category,ex_long,type,error,categ_error ev_time 4 NULL 5636 Using where
2nd Query EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> system NULL NULL NULL NULL 1
2 DERIVED clients_event_log ref category,ex_long,type,error,categ_error ex_long 5 131264 Using where
Table definition:
CREATE TABLE `clients_event_log` (
`ev_id` int(11) NOT NULL,
`type` int(6) NOT NULL,
`ev_time` int(11) NOT NULL,
`category` smallint(6) NOT NULL,
`error` int(11) NOT NULL,
`ev_text` varchar(1024) DEFAULT NULL,
`userid` varchar(20) DEFAULT NULL,
`ex_long` int(11) DEFAULT NULL,
`client_ex_long` int(11) DEFAULT NULL,
`ex_text` varchar(1024) DEFAULT NULL,
PRIMARY KEY (`ev_id`),
KEY `category` (`category`),
KEY `ex_long` (`ex_long`),
KEY `type` (`type`),
KEY `ev_time` (`ev_time`),
KEY `error` (`error`),
KEY `categ_error` (`category`,`error`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
I ended up using the second query (inner SELECT) because the MySQL optimiser decided to always use the ev_time index even if I tried multiple versions of a composite index containing the columns in the WHERE and ORDER BY clauses.
Using force index (ex_long) also worked.
The MySQL version was 5.5.38
Thank you.
Add these
INDEX(ev_long, ev_time),
INDEX(ev_long, type)
and use the first format of the query and let the optimizer decide which is better based on the statistics.

MySQL optimize select count distinct group by

I have this query:
SELECT `variationID`, count(DISTINCT(`userID`))
FROM data WHERE `testID` = XXXX AND `visit` = 1 GROUP BY `variationID`
;
that takes a lot of time to query.How I can speed up the query.
select_type table type possible_keys key key_len ref rows
filtered Extra SIMPLE data
ref dc3_testIDPage,dc3_testIDvarIDPage,user_test_varID_url
dc3_testIDvarIDPage 8 const 33106102 100.00 Using where
This is the output of the create table:
CREATE TABLE `data` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`userID` bigint(17) NOT NULL,
`testID` bigint(20) NOT NULL,
`variationID` bigint(20) NOT NULL,
`url` bigint(20) NOT NULL,
`time` bigint(20) NOT NULL,
`visit` bigint(20) NOT NULL DEFAULT '1',
`isTestPage` tinyint(1) NOT NULL,
PRIMARY KEY (`id`,`testID`),
KEY `url` (`url`),
KEY `dc3_testIDPage` (`testID`,`url`),
KEY `dc3_testIDvarIDPage` (`testID`,`variationID`,`url`),
KEY `user_test_url` (`userID`,`testID`,`url`),
KEY `user_test_varID_url` (`userID`,`testID`,`variationID`,`url`)
) ENGINE=InnoDB AUTO_INCREMENT=67153224 DEFAULT CHARSET=latin1
The easiest thing you can do to speed up your query is to make sure you are not doing full table scans. All the columns in your where clause should appear in indexes. So in your case testID and visit should have indexes and even better you can create a single index with both testID and visit. If visit is a true/false boolean that won't help narrow the index search much but testID certainly will.
Create index documentation is here: http://dev.mysql.com/doc/refman/5.7/en/create-index.html
Based on the create table you have id and testID are in a single primary key. Add a new key or index that only has testID in it. That should help quite a bit. Since it looks like visit is not a boolean adding an index with both visit and testID will give you the best performance boost.

indexed query, but still searching every row

I have the following mysql query
select points_for_user from items where user_id = '38415';
explain on the query returns this
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE items index NULL points_for_user_index 2 NULL 1000511 Using index
The problem is, shouldn't the number of rows be FAR less then the number of rows in the table because of the index?
user_id is the primary index, so I tried creating an index on just points_for_user and that still look through every row. An index on user_id AND points_for_user still searches every row.
What am I missing?
Thanks!
CREATE TABLE IF NOT EXISTS `items` (
`capture_id` int(11) NOT NULL AUTO_INCREMENT,
`id` int(11) NOT NULL,
`creator_user_id` bigint(20) NOT NULL DEFAULT '0',
`user_id` int(11) NOT NULL,
`accuracy` int(11) NOT NULL,
`captured_at` timestamp NOT NULL DEFAULT '2011-01-01 06:00:00',
`ip` varchar(30) NOT NULL,
`capture_type_id` smallint(6) NOT NULL DEFAULT '0',
`points` smallint(6) NOT NULL DEFAULT '5',
`points_for_user` smallint(6) NOT NULL DEFAULT '3',
PRIMARY KEY (`capture_id`),
KEY `capture_user` (`capture_id`,`id`,`user_id`),
KEY `user_id` (`user_id`,`id`),
KEY `id` (`id`),
KEY `capture_creator_index` (`capture_id`,`creator_user_id`),
KEY `points_capture_index` (`points_for_user`,`creator_user_id`),
KEY `points_for_user_index` (`points_for_user`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1008992 ;
select count(*) from items where user_id = '38415'
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE captures ref user_munzee_id user_munzee_id 4 const 81 Using index
the mysql optimizer try to use the best possible index during the query.
In your first query the optimizer is considering points_for_user_index the best choice, in fact the Extra column show the "Using index" status, this means a "Covering index".
The "Covering index" occurs when all fields required for a query (in your case select points_for_user from ... ) are contained in an index, this avoid the access to the full mysql data (.MYD) in favour of the direct index access (.MYI)
First of all you can try to rebuild the index tree analyzing table
ANALYZE TABLE itemes;
Note for very large tables:
ANALYZE TABLE analyzes and stores the key distribution for a table.
During the analysis, the table is locked with a read lock for InnoDB
and MyISAM. This statement works with InnoDB, NDB, and MyISAM tables.
For MyISAM tables, this statement is equivalent to using myisamchk
--analyze.
If "the problem" persist and you want to bypass the optimizer choice you can explicit try to force the usage of an index
EXPLAIN SELECT points_for_user FROM items USE INDEX ( user_id ) WHERE user_id = '38415'
More details: http://dev.mysql.com/doc/refman/5.5/en/index-hints.html
Cristian

Mysql query optimisation, EXPLAIN and slow execution

Having some real issues with a few queries, this one inparticular. Info below.
tgmp_games, about 20k rows
CREATE TABLE IF NOT EXISTS `tgmp_games` (
`g_id` int(8) NOT NULL AUTO_INCREMENT,
`site_id` int(6) NOT NULL,
`g_name` varchar(255) NOT NULL,
`g_link` varchar(255) NOT NULL,
`g_url` varchar(255) NOT NULL,
`g_platforms` varchar(128) NOT NULL,
`g_added` datetime NOT NULL,
`g_cover` varchar(255) NOT NULL,
`g_impressions` int(8) NOT NULL,
PRIMARY KEY (`g_id`),
KEY `g_platforms` (`g_platforms`),
KEY `site_id` (`site_id`),
KEY `g_link` (`g_link`),
KEY `g_release` (`g_release`),
KEY `g_genre` (`g_genre`),
KEY `g_name` (`g_name`),
KEY `g_impressions` (`g_impressions`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
tgmp_reviews - about 200k rows
CREATE TABLE IF NOT EXISTS `tgmp_reviews` (
`r_id` int(8) NOT NULL AUTO_INCREMENT,
`site_id` int(6) NOT NULL,
`r_source` varchar(128) NOT NULL,
`r_date` date NOT NULL,
`r_score` int(3) NOT NULL,
`r_copy` text NOT NULL,
`r_link` text NOT NULL,
`r_int_link` text NOT NULL,
`r_parent` int(8) NOT NULL,
`r_platform` varchar(12) NOT NULL,
`r_impressions` int(8) NOT NULL,
PRIMARY KEY (`r_id`),
KEY `site_id` (`site_id`),
KEY `r_parent` (`r_parent`),
KEY `r_platform` (`r_platform`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
Here is the query, takes 3 seconds ish
SELECT * FROM tgmp_games g
RIGHT JOIN tgmp_reviews r ON g_id = r.r_parent
WHERE g.site_id = '34'
GROUP BY g_name
ORDER BY g_impressions DESC LIMIT 15
EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE r ALL r_parent NULL NULL NULL 201133 Using temporary; Using filesort
1 SIMPLE g eq_ref PRIMARY,site_id PRIMARY 4 engine_comp.r.r_parent 1 Using where
I am just trying to grab the 15 most viewed games, then grab a single review (doesnt really matter which, I guess highest rated would be ideal, r_score) for each game.
Can someone help me figure out why this is so horribly inefficient?
I don't understand what is the purpose of having a GROUP BY g_name in your query, but this makes MySQL performing aggregates on the columns selected, or all columns from both table. So please try to exclude it and check if it helps.
Also, RIGHT JOIN makes database to query tgmp_reviews first, which is not what you want. I suppose LEFT JOIN is a better choice here. Please, try to change the join type.
If none of the first options helps, you need to redesign your query. As you need to obtain 15 most viewed games for the site, the query will be:
SELECT g_id
FROM tgmp_games g
WHERE site_id = 34
ORDER BY g_impressions DESC
LIMIT 15;
This is the very first part that should be executed by the database, as it provides the best selectivity. Then you can get the desired reviews for the games:
SELECT r_parent, max(r_score)
FROM tgmp_reviews r
WHERE r_parent IN (/*1st query*/)
GROUP BY r_parent;
Such construct will force database to execute the first query first (sorry for the tautology) and will give you the maximal score for each of the wanted games. I hope you will be able to use the obtained results for your purpose.
Your MyISAM table is small, you can try converting it to see if that resolves the issue. Do you have a reason for using MyISAM instead of InnoDB for that table?
You can also try running an analyze on each table to update the statistics to see if the optimizer chooses something different.