MySQL selecting wrong index - mysql

The table in question has millions of records. The following query is really slow (takes up to four minutes to perform), because MySQL picks the primary key as index.
SELECT
*
FROM
`activities`
WHERE
`integration_id` = 11
ORDER BY `id` DESC
LIMIT 10 OFFSET 2
The table has multiple indexes, including an index on integration. When forced, the use of this index reduces the query time to about a second:
SELECT
*
FROM
`activities`
USE INDEX (integration)
WHERE
`integration_id` = 11
ORDER BY `id` DESC
LIMIT 10 OFFSET 2
Using EXPLAIN, MySQL shows the following:
first query: (the slow one)
key: PRIMARY
key_len: 4
rows: 5472
second query: (the fast one)
key: integration
key_len: 5
rows: 24028
Seeing MySQL's explanation, I understand why it picks the primary key. However, in reality the number of rows for the first query should be in the millions, and it is indeed excruciatingly slow to execute.
Unfortunately I cannot just force the index in the query, because the query is dynamic (other clauses can be included) and an ORM is used. So I'm looking for a solution in MySQL's configuration, if possible.
Extra info
I found that altering the query in some cases resulted in MySQL picking the correct index, e.g. removing the ORDER BY clause, or replacing SELECT * with SELECT <insert every table column here>. I've no clue why. Removing the ORDER BY is not an option for me, and manually selecting every column is tricky with the ORM (apart from that, I would like to know why this fixes the index picking).
The EXPLAIN
id: '1'
select_type: 'SIMPLE'
table: 'activities'
type: 'index'
possible_keys: 'integration_category,integration_level,integration_category_level,integration'
key: PRIMARY
key_len: 4
ref: null
rows: '5473'
Extra: Using where
SHOW CREATE TABLE
CREATE TABLE `activities` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
`content` text COLLATE utf8_unicode_ci NOT NULL,
`parameters` text COLLATE utf8_unicode_ci,
`input` text COLLATE utf8_unicode_ci,
`output` text COLLATE utf8_unicode_ci,
`response` text COLLATE utf8_unicode_ci,
`integration_id` int(10) unsigned DEFAULT NULL,
`level` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`category` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `integration_category` (`integration_id`,`category`),
KEY `integration_level` (`integration_id`,`level`),
KEY `integration_category_level` (`integration_id`,`category`,`level`),
KEY `integration` (`integration_id`),
CONSTRAINT `activities_integration_id_foreign` FOREIGN KEY (`integration_id`) REFERENCES `integrations` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=11262471 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
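One possible configuration-level angle, offered as a hedged sketch rather than a confirmed fix: the symptoms above (PRIMARY chosen only when ORDER BY `id` ... LIMIT is present) match the optimizer's preference for an index that already delivers rows in the requested order. Recent MySQL releases expose this preference as the `prefer_ordering_index` flag of `optimizer_switch`. To my knowledge it exists from 8.0.21 and 5.7.33 onward, so treat the version numbers as an assumption and check your server first.

```sql
-- Assumption: the server is recent enough to know the flag.
-- Returns 1 if prefer_ordering_index is listed in optimizer_switch.
SELECT @@optimizer_switch LIKE '%prefer_ordering_index%' AS has_flag;

-- Turn the preference off for the current session only.
SET SESSION optimizer_switch = 'prefer_ordering_index=off';

-- Re-check the plan; the hope is that key now shows `integration`.
EXPLAIN
SELECT *
FROM `activities`
WHERE `integration_id` = 11
ORDER BY `id` DESC
LIMIT 10 OFFSET 2;
```

Because this is a session or global variable, it does not require changing the SQL the ORM generates. It affects every ORDER BY ... LIMIT query in its scope, so time the other queries as well.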

Related

MySQL query is slow, adding indexes doesn't work

I have a table with 100k+ rows but my queries are slow (they take about 3 seconds).
I tried making an index like this but this doesn't seem to do anything.
ALTER TABLE pm ADD INDEX (sender,reciever)
This is my query:
SELECT id,message FROM pm WHERE reciever = '28075' OR sender = '28075'
That takes 3 seconds more or less.
SHOW CREATE TABLE PM:
CREATE TABLE `pm` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`datetime` int(11) NOT NULL,
`sender` int(11) NOT NULL,
`reciever` int(11) NOT NULL,
`users` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`readm` int(11) NOT NULL DEFAULT '0',
`forOp` int(11) NOT NULL DEFAULT '0',
`bussy` int(11) NOT NULL DEFAULT '0',
`bericht` longtext CHARACTER SET utf8mb4,
`aantal` int(11) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`),
KEY `users` (`users`(191)),
KEY `sender` (`sender`,`reciever`)
) ENGINE=InnoDB AUTO_INCREMENT=1637118 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
The query cannot use the index because it uses OR, and your compound index (sender, reciever) can't be used to match reciever on its own: a compound index requires that you match the leftmost column before matching the second one.
MySQL 5.0 added the index merge optimization, which allows using multiple indexes for the same query, so if you have separate indexes on sender and reciever it could pick those.
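A minimal sketch of the separate-index setup, assuming the table from the question. The index name `idx_reciever` is made up for illustration. The existing `sender` (sender, reciever) key already covers lookups on sender as its leftmost column, so only reciever needs a new index.

```sql
-- Hypothetical index name; gives reciever an index of its own.
ALTER TABLE pm ADD INDEX idx_reciever (reciever);

-- Check the plan. If the optimizer chooses index merge, EXPLAIN reports
-- type: index_merge and lists both indexes in the key column.
EXPLAIN
SELECT id, message
FROM pm
WHERE reciever = 28075 OR sender = 28075;
```

Whether the optimizer actually picks index merge depends on its cost estimates, so compare timings before and after.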
An alternative would be to rewrite the query to use UNION and again use separate indexes instead of compound one:
SELECT id,message FROM pm WHERE reciever = '28075'
UNION
SELECT id,message FROM pm WHERE sender = '28075'

How to break a find-duplicates SQL query on a large table into multiple parts

I have a large table with ~3 million records in a MySQL database. I am trying to find duplicate rows in this table using the following query:
SELECT package_id
FROM version
WHERE metadata IS NOT NULL AND metadata <> '{}'
GROUP BY package_id, metadata HAVING COUNT(package_id) > 1
This query takes ~23 seconds to run on the database. Our database host, however, kills any query taking longer than 3 seconds using pt-kill. So I need to find a way to break this query down such that each subpart is a separate query that takes less than 3 seconds. Just adding a LIMIT constraint doesn't do it for this query, so how do I break the query up to work on different parts of the table?
Result of SHOW CREATE TABLE version
CREATE TABLE `version` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`package_id` bigint(20) unsigned NOT NULL,
`version_number` int(11) unsigned NOT NULL,
`current_state_id` tinyint(2) unsigned NOT NULL,
`md5sum` varchar(32) CHARACTER SET utf8 COLLATE utf8_general_cs NOT NULL DEFAULT '',
`uri` varchar(1024) CHARACTER SET utf8 COLLATE utf8_general_cs NOT NULL DEFAULT '',
`filename` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_cs NOT NULL DEFAULT '',
`size` bigint(11) unsigned NOT NULL DEFAULT '0',
`metadata` varchar(1024) CHARACTER SET utf8 COLLATE utf8_general_cs DEFAULT NULL,
`storage_type_id` tinyint(2) unsigned NOT NULL DEFAULT '1',
PRIMARY KEY (`id`),
UNIQUE KEY `idx_version_package_id_version_number` (`package_id`,`version_number`),
KEY `idx_version_md5sum` (`md5sum`),
KEY `idx_version_metadata` (`metadata`(255)),
KEY `idx_version_current_state_id` (`current_state_id`),
KEY `storage_type_id` (`storage_type_id`),
CONSTRAINT `_fk_version_current_state_id` FOREIGN KEY (`current_state_id`) REFERENCES `state` (`id`),
CONSTRAINT `_fk_version_package_id` FOREIGN KEY (`package_id`) REFERENCES `package` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=3248761 DEFAULT CHARSET=utf8
As can be seen, there are many indexes on the table, including an index on the package_id + version_number combination of fields. The problem is that this table is only going to get bigger, and I don't think optimization would scale, even if it pulls me back into the 3-second range. So I need a way to partition this table and run the queries on separate parts.
Steps to improve speed:
Create a table version_small with just the columns id and package_id, with an index on package_id.
INSERT INTO version_small SELECT id, package_id FROM version;
Run your original query on the smaller table above; it should be much faster there.
OR
Create a table version_small with just the columns id and package_id plus an int counter, with a unique index on package_id.
INSERT INTO version_small (id, package_id, counter) SELECT id, package_id, 1 FROM version ON DUPLICATE KEY UPDATE counter = counter + 1;
The rows with counter > 1 are the package_id values that have more than one entry.
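Another way to get separate sub-3-second queries, sketched under assumptions: since the query groups by package_id, no group can straddle two package_id ranges, so the original query can be run once per range. The bounds and the chunk size of 100000 below are made up; tune them until each chunk finishes within the limit.

```sql
-- One chunk of the duplicate search. The range predicate can use the
-- leftmost column of idx_version_package_id_version_number.
SELECT package_id
FROM version
WHERE package_id >= 0
  AND package_id < 100000          -- hypothetical chunk size
  AND metadata IS NOT NULL
  AND metadata <> '{}'
GROUP BY package_id, metadata
HAVING COUNT(package_id) > 1;

-- Next chunk: same query with package_id >= 100000 AND package_id < 200000,
-- and so on up to MAX(package_id).
```

The union of the per-chunk results should equal the result of the single large query, because each (package_id, metadata) group lies entirely inside one chunk.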

Any chance of speeding up this simple MySQL Query?

I'm stumped on this query. It searches an important table which contains about 213k rows. The purpose of the query is to report traffic data for a month: the amount of traffic for each day of that month, and the sum of a decimal value for each day. This query is run frequently, so I need to optimize it as much as possible. It currently takes about 2 seconds on average.
SQL Fiddle: http://sqlfiddle.com/#!2/171f5/3/0
All suggestions will be greatly appreciated! Thank you.
Query:
SELECT `date_day`, COUNT(*) AS num, SUM(decval) AS sum_decval FROM (`tbl_traffic`)
WHERE `uuid` = '1' AND `date_year` = '2012' AND `date_month` = '11'
GROUP BY `date_day`;
Explain Result:
id: 1
select_type: SIMPLE
table: adb1_analytics
type: ref
possible_keys: keys1,keys2,keys3
key: keys1
key_len: 7
ref: const,const,const
rows: 106693
Extra: Using where
1 row in set (0.13 sec)
Table structure:
CREATE TABLE IF NOT EXISTS `tbl_traffic` (
`id` int(100) unsigned NOT NULL AUTO_INCREMENT,
`uuid` int(100) unsigned NOT NULL,
`country` char(2) CHARACTER SET latin1 DEFAULT NULL,
`browser` varchar(50) CHARACTER SET latin1 DEFAULT NULL,
`platform` varchar(50) CHARACTER SET latin1 DEFAULT NULL,
`referrer` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`decval` decimal(15,5) NOT NULL,
`date_year` smallint(4) unsigned NOT NULL,
`date_month` tinyint(2) unsigned NOT NULL,
`date_day` tinyint(2) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `keys1` (`uuid`,`date_year`,`date_month`,`date_day`),
KEY `keys2` (`date_year`,`date_month`,`referrer`),
KEY `keys3` (`date_year`,`date_month`,`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
You have no effective indexes on date_day.
I would recommend creating a key specifically for what you are fetching and calculating: (date_day, decval)
Use count(id) and add keys on date_day and decval.
A covering key over both fields might be even better
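A sketch of what a covering key could look like for this query, assuming the table definition from the question. The index name `idx_traffic_cover` is made up. It keeps the equality columns first, then the grouping column, then the summed column, so the query can be answered from the index alone.

```sql
-- Hypothetical covering index: filter columns, then GROUP BY column,
-- then the column that is summed.
ALTER TABLE `tbl_traffic`
  ADD KEY `idx_traffic_cover`
    (`uuid`, `date_year`, `date_month`, `date_day`, `decval`);

-- If the index covers the query, Extra should include "Using index".
EXPLAIN
SELECT `date_day`, COUNT(*) AS num, SUM(decval) AS sum_decval
FROM `tbl_traffic`
WHERE `uuid` = 1 AND `date_year` = 2012 AND `date_month` = 11
GROUP BY `date_day`;
```

The new key makes `keys1` redundant as a prefix of it, but drop `keys1` only after timing the other queries that rely on it.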
A GROUP BY can force MySQL to read every matching row and build a temporary table or sort it, unless an index supplies the rows in grouping order.
http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html
If you can't index it better (see @njk's answer), you might want to select your specific values first and then do the GROUP BY, SUM, etc. on that subset. If nothing else, it will at least be a smaller set to sort.

optimize query (2 simple left joins)

SELECT fcat.id,fcat.title,fcat.description,
count(DISTINCT ftopic.id) as number_topics,
count(DISTINCT fpost.id) as number_posts FROM fcat
LEFT JOIN ftopic ON fcat.id=ftopic.cat_id
LEFT JOIN fpost ON ftopic.id=fpost.topic_id
GROUP BY fcat.id
ORDER BY fcat.ord
LIMIT 100;
Indexes on ftopic.cat_id, fpost.topic_id and fcat.ord.
EXPLAIN:
id  select_type  table   type  possible_keys     key         key_len  ref              rows  Extra
1   SIMPLE       fcat    ALL   PRIMARY           NULL        NULL     NULL             11    Using temporary; Using filesort
1   SIMPLE       ftopic  ref   PRIMARY,cat_id_2  cat_id_2    4        bloki.fcat.id    72
1   SIMPLE       fpost   ref   topic_id_2        topic_id_2  4        bloki.ftopic.id  245
fcat - 11 rows,
ftopic - 1106 rows,
fpost - 363000 rows
The query takes 4.2 seconds.
TABLES:
CREATE TABLE IF NOT EXISTS `fcat` (
`id` int(11) NOT NULL auto_increment,
`title` varchar(250) collate utf8_unicode_ci NOT NULL,
`description` varchar(250) collate utf8_unicode_ci NOT NULL,
`created` datetime NOT NULL,
`visible` tinyint(4) NOT NULL default '1',
`ord` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `ord` (`ord`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=12 ;
CREATE TABLE IF NOT EXISTS `ftopic` (
`id` int(11) NOT NULL auto_increment,
`cat_id` int(11) NOT NULL,
`title` varchar(100) collate utf8_unicode_ci NOT NULL,
`created` datetime NOT NULL,
`updated` timestamp NOT NULL default CURRENT_TIMESTAMP,
`lastname` varchar(200) collate utf8_unicode_ci NOT NULL,
`visible` tinyint(4) NOT NULL default '1',
`closed` tinyint(4) NOT NULL default '0',
`views` int(11) NOT NULL default '1',
PRIMARY KEY (`id`),
KEY `cat_id_2` (`cat_id`,`updated`,`visible`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1116 ;
CREATE TABLE IF NOT EXISTS `fpost` (
`id` int(11) NOT NULL auto_increment,
`topic_id` int(11) NOT NULL,
`pet_id` int(11) NOT NULL,
`content` text collate utf8_unicode_ci NOT NULL,
`imageName` varchar(300) collate utf8_unicode_ci NOT NULL,
`created` datetime NOT NULL,
`reply_id` int(11) NOT NULL,
`visible` tinyint(4) NOT NULL default '1',
`md5` varchar(100) collate utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `md5` (`md5`),
KEY `topic_id_2` (`topic_id`,`created`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=390971 ;
Thanks,
hamlet
You need to create a key on both fcat.id and fcat.ord.
Bold rewrite
This code is not functionally identical, but...
Because you want to know about distinct ftopic.id and fpost.id, I'm going to be bold and suggest two INNER JOINs instead of LEFT JOINs.
Each fpost row then appears exactly once in the joined result, so you can drop the DISTINCT on fpost.id. Keep it on ftopic.id, though: a topic is repeated once per post, and without DISTINCT number_topics would simply equal number_posts.
SELECT
fcat.id
, fcat.title
, fcat.description
, count(DISTINCT ftopic.id) as number_topics
, count(fpost.id) as number_posts
FROM fcat
INNER JOIN ftopic ON fcat.id = ftopic.cat_id
INNER JOIN fpost ON ftopic.id = fpost.topic_id
GROUP BY fcat.id
ORDER BY fcat.ord
LIMIT 100;
It depends on your data if this is what you are looking for, but I'm guessing it will be faster.
All your indexes seem to be in order though.
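A different rewrite that keeps the LEFT JOIN semantics, offered as an untimed sketch and not as the answer's own method: count the posts per topic first in a derived table, so the outer query joins one row per topic and needs no DISTINCT. The aliases `c`, `t` and `p` are introduced here for illustration.

```sql
SELECT c.id,
       c.title,
       c.description,
       COUNT(t.id)                      AS number_topics,  -- one joined row per topic
       COALESCE(SUM(p.number_posts), 0) AS number_posts    -- 0 for empty categories
FROM fcat AS c
LEFT JOIN ftopic AS t ON t.cat_id = c.id
LEFT JOIN (
    -- Pre-aggregate: one row per topic with its post count.
    SELECT topic_id, COUNT(*) AS number_posts
    FROM fpost
    GROUP BY topic_id
) AS p ON p.topic_id = t.id
GROUP BY c.id, c.title, c.description, c.ord
ORDER BY c.ord
LIMIT 100;
```

The derived table can be built from the `topic_id_2` index, which starts with topic_id. Whether this beats the original depends on the MySQL version and the data, so compare timings.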
MySQL does not use indexes for small sample sizes!
Note that the EXPLAIN output shows that MySQL has only 11 rows to consider for fcat. That is not enough for MySQL to really start worrying about indexes, so it doesn't,
because going to the index for small row counts slows things down.
MySQL is trying to speed things up, so it chooses not to use the index. This confuses a lot of people, because we are trained so hard on indexes. Small sample sizes don't give good EXPLAINs!
Increase the size of the test data so MySQL has more rows to consider and you should start seeing the index being used.
Common misconceptions about force index
FORCE INDEX does not force MySQL to use an index as such.
It hints to MySQL to use a different index from the one it might naturally use, and it pushes MySQL into using an index by setting a very high cost on a table scan.
(In your case MySQL is not using a table scan, so force index has no effect)
MySQL (like most other DBMSs on the planet) has a very strong urge to use indexes, so if it doesn't use any, that's because it estimates that using no index at all is faster.
How does MySQL know which index to use
One of the parameters the query optimizer uses is the stored cardinality of the indexes.
Over time these values change... But studying the table takes time, so MySQL doesn't do that unless you tell it to.
Another parameter that affects index selection is the predicted disk-seek-times that MySQL expects to encounter when performing the query.
Tips to improve index usage
ANALYZE TABLE will instruct MySQL to re-evaluate the indexes and update its key distribution (cardinality). (consider running it daily/weekly in a cron job)
SHOW INDEX FROM table will display the key distribution.
MyISAM tables and indexes fragment over time. Use OPTIMIZE TABLE to defragment the tables and recreate the indexes.
FORCE/USE/IGNORE INDEX limits the options MySQL's query optimizer has to perform your query. Only consider it on complex queries.
Time the effect of your meddling with indexes on a regular basis. A forced index that speeds up your query today might slow it down tomorrow because the underlying data has changed.
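The maintenance statements from the tips above, written out for the three tables in the question as a short sketch:

```sql
-- Refresh the stored key distribution (cardinality) used by the optimizer.
ANALYZE TABLE fcat, ftopic, fpost;

-- Inspect the result: the Cardinality column is the optimizer's estimate
-- of the number of distinct values per index part.
SHOW INDEX FROM fpost;

-- Defragment the MyISAM data file and rebuild its indexes.
-- This locks the table while it runs, so schedule it for a quiet period.
OPTIMIZE TABLE fpost;
```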

Getting MySQL to use an index/key, 1 column in where and 2 in order by

How do I get MySQL to use a key/index with the following table structure and query?
-- the table
CREATE TABLE `country` (
`id` int(11) NOT NULL auto_increment,
`expiry_date` datetime NOT NULL,
`name` varchar(50) collate utf8_unicode_ci NOT NULL,
`symbol` varchar(5) collate utf8_unicode_ci NOT NULL,
`exchange_rate` decimal(11,5) NOT NULL default '1.00000',
`code` char(3) collate utf8_unicode_ci NOT NULL,
`currency_code` varchar(3) collate utf8_unicode_ci NOT NULL,
`display_order` smallint(6) unsigned NOT NULL default '0',
PRIMARY KEY (`id`),
KEY `code` (`code`),
KEY `currency_code` (`currency_code`),
KEY `display_order` (`expiry_date`,`name`,`display_order`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
-- the query
SELECT `country`.*
FROM `country`
WHERE `country`.`expiry_date` = 0
ORDER BY `country`.`display_order` ASC, `country`.`name` ASC;
I'm trying to get it to use a key because the query, with 180 rows in the result, takes 0.0013s and is by far the slowest query on the page (3x longer than the next slowest). From my understanding, the query should use the display_order index/key.
Change it to:
CREATE TABLE `country` (
`id` int(11) NOT NULL auto_increment,
`expiry_date` datetime NOT NULL,
`name` varchar(50) collate utf8_unicode_ci NOT NULL,
`symbol` varchar(5) collate utf8_unicode_ci NOT NULL,
`exchange_rate` decimal(11,5) NOT NULL default '1.00000',
`code` char(3) collate utf8_unicode_ci NOT NULL,
`currency_code` varchar(3) collate utf8_unicode_ci NOT NULL,
`display_order` smallint(6) unsigned NOT NULL default '0',
PRIMARY KEY (`id`),
KEY `code` (`code`),
KEY `currency_code` (`currency_code`),
KEY `expiry` (`expiry_date`,`name`,`display_order`), -- renamed key for clarity
/* always name compound keys for their left-most parts */
KEY `name` (`name`), -- new key here
KEY `display` (`display_order`) -- new key here
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
-- the query
SELECT `country`.*
FROM `country`
WHERE `country`.`expiry_date` = 0
ORDER BY `country`.`display_order` ASC, `country`.`name` ASC;
Compound indexes are tricky
MySQL did not use the name part of the compound index, because name is in the middle, and MySQL only uses a part of a compound index if all the parts to its left are used as well.
The same goes for the field display_order. The compound index that contains display_order has that field as its right-most part, and therefore will not be used to sort.
Solution
Make a separate index for field name,
and a separate index for field display_order.
Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.) However, if such a query uses LIMIT to retrieve only some of the rows, MySQL uses an index anyway, because it can much more quickly find the few rows to return in the result.
Also if a large percentage of rows have the same value for a field (> 40% (IIRC)) then MySQL will not use the index.
See: http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html
See: http://dev.mysql.com/doc/refman/5.1/en/index-hints.html for how to force indexes, as per FractalizeR's suggestion.
Make sure to time your select after forcing the index
On such a simple query MySQL seems unlikely to be wrong, and your select time of 0.0013 seconds suggests that there are few rows in the table.
Indexes don't work as you'd expect when there are few rows in a table, because of the percentage rule stated above.
Note that in this case forcing the index would not have worked, because you cannot force MySQL to use the rightmost part of a compound index. It just cannot do that.
If you think MySQL chooses indexes unwisely, and you are sure of that, use the FORCE INDEX index hint: http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
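For reference, a sketch of the hint applied to the query from the question, using the `display_order` key that already exists in the original table definition:

```sql
-- The hint goes directly after the table name.
EXPLAIN
SELECT `country`.*
FROM `country` FORCE INDEX (`display_order`)
WHERE `country`.`expiry_date` = 0
ORDER BY `country`.`display_order` ASC, `country`.`name` ASC;
```

With this key the hint can help the WHERE on expiry_date, which is the leftmost column, but the rows still have to be sorted afterwards, because display_order and name are not stored in the requested order within the key.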
Your query has an ORDER BY on the columns {display_order}+{name}, while
your index named "display_order" is in fact defined on the columns {expiry_date}+{name}+{display_order}.
The order of columns in the index does matter. You can benefit from an index if you need sorting or filtering on columns that are at the beginning of the index.
This becomes obvious if you keep in mind that indexes are pre-sorted information.
If you want to benefit from an index on {display_order}+{name}, then you need an index that begins with {display_order}+{name}. For example {display_order}+{name} or {display_order}+{name}+{expiry_date}.
So in order to optimize your query, you have to change either the index in the table or the ORDER BY clause in the query.
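One index layout that fits this particular query, sketched with a made-up index name: because the WHERE clause pins expiry_date to a single constant, an index that starts with expiry_date and continues with the ORDER BY columns in the same order can serve both the filter and the sort.

```sql
-- Hypothetical index: equality column first, then the ORDER BY columns.
ALTER TABLE `country`
  ADD KEY `expiry_display_name` (`expiry_date`, `display_order`, `name`);

-- If the index is used for sorting, "Using filesort" should disappear
-- from the Extra column.
EXPLAIN
SELECT `country`.*
FROM `country`
WHERE `country`.`expiry_date` = 0
ORDER BY `country`.`display_order` ASC, `country`.`name` ASC;
```

On a table this small the optimizer may still prefer a full scan, so the plan and the timing are worth checking before keeping the index.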
The last thing you can do is use "FORCE INDEX", as mentioned by FractalizeR.